
Meeting Notes Pipeline with Whisper and LLM

Build an automated meeting notes pipeline that transcribes recordings with Whisper, extracts action items with an LLM, and distributes structured summaries on a GPU server.

You will build a pipeline that takes a recorded meeting audio file, transcribes it with Whisper, and produces structured notes: a summary, key decisions, action items with owners, and follow-up questions. The end result: upload a 45-minute team meeting recording and receive formatted notes in under 3 minutes. No meeting audio or transcripts leave your infrastructure — critical for board meetings, HR discussions, and client calls. Here is the full pipeline on dedicated GPU infrastructure.

Pipeline Architecture

| Stage | Tool | Output | VRAM |
|---|---|---|---|
| 1. Transcription | Whisper Large v3 | Timestamped transcript | ~3GB |
| 2. Summarisation | LLaMA 3.1 8B | Structured meeting notes | ~6GB |
| 3. Distribution | Email / Slack webhook | Formatted notes delivery | CPU |

Stage 1: Audio Transcription

from faster_whisper import WhisperModel

whisper = WhisperModel("large-v3", device="cuda", compute_type="float16")

def transcribe_meeting(audio_path: str) -> dict:
    segments, info = whisper.transcribe(
        audio_path, beam_size=5, word_timestamps=True
    )
    transcript = []
    for seg in segments:
        transcript.append({
            "start": round(seg.start, 1),
            "end": round(seg.end, 1),
            "text": seg.text.strip()
        })
    return {
        "language": info.language,
        "duration": info.duration,
        "segments": transcript,
        "full_text": " ".join([s["text"] for s in transcript])
    }

Whisper Large v3 handles multi-speaker meetings well. For improved speaker attribution, add a diarisation step (see the podcast transcription recipe).
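If you do add a diarisation step, you still need to merge its speaker turns back into Whisper's segments. A minimal sketch of that merge, assuming the diariser's output has been converted to start/end/speaker dicts (the dict shape here is an assumption, not any library's native format): each transcript segment takes the label of the turn it overlaps most.

```python
def assign_speakers(segments: list[dict], turns: list[dict]) -> list[dict]:
    """Attach a speaker label to each transcript segment by picking the
    diarisation turn with the greatest time overlap."""
    labelled = []
    for seg in segments:
        best, best_overlap = "unknown", 0.0
        for turn in turns:
            # Overlap length of [seg.start, seg.end] and [turn.start, turn.end]
            overlap = min(seg["end"], turn["end"]) - max(seg["start"], turn["start"])
            if overlap > best_overlap:
                best, best_overlap = turn["speaker"], overlap
        labelled.append({**seg, "speaker": best})
    return labelled
```

This greedy assignment is crude (a segment spanning a speaker change gets one label), but it is usually good enough for meeting notes.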

Stage 2: Structured Extraction

from openai import OpenAI
import json

client = OpenAI(base_url="http://localhost:8000/v1", api_key="none")

def extract_meeting_notes(transcript: str) -> dict:
    # Process in chunks if transcript exceeds context window
    response = client.chat.completions.create(
        model="meta-llama/Llama-3.1-8B-Instruct",
        messages=[{
            "role": "system",
            "content": """Analyse this meeting transcript and return JSON:
{"title": "Meeting topic",
 "date": "if mentioned",
 "attendees": ["names mentioned"],
 "summary": "3-5 sentence overview",
 "key_decisions": ["list of decisions made"],
 "action_items": [{"task": "", "owner": "", "deadline": "", "priority": "high|medium|low"}],
 "open_questions": ["unresolved items"],
 "next_meeting": "if discussed"}"""
        }, {"role": "user", "content": transcript}],
        max_tokens=1500, temperature=0.1
    )
    return json.loads(response.choices[0].message.content)

The vLLM server handles the extraction. For meetings longer than the model’s context window, split the transcript into overlapping chunks and merge the extracted items.
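The chunk-and-merge step can be sketched as follows. This is a minimal version: the 6000-word budget and 200-word overlap are assumptions to tune against your model's actual context window, and the dedupe key (lowercased task plus owner) is a simple heuristic for action items extracted twice from overlapping chunks.

```python
def chunk_transcript(text: str, max_words: int = 6000, overlap: int = 200) -> list[str]:
    """Split a transcript into overlapping word-count chunks."""
    words = text.split()
    if len(words) <= max_words:
        return [text]
    chunks, start = [], 0
    while start < len(words):
        chunks.append(" ".join(words[start:start + max_words]))
        if start + max_words >= len(words):
            break
        start += max_words - overlap  # step back by `overlap` words each chunk
    return chunks

def merge_action_items(per_chunk: list[list[dict]]) -> list[dict]:
    """Merge action items extracted per chunk, dropping near-duplicates."""
    seen, merged = set(), []
    for items in per_chunk:
        for item in items:
            key = (item["task"].lower().strip(), item.get("owner", "").lower())
            if key not in seen:
                seen.add(key)
                merged.append(item)
    return merged
```

Run extract_meeting_notes on each chunk, then merge the per-chunk action_items, key_decisions, and open_questions lists the same way.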

Formatting and Distribution

def format_notes_html(notes: dict) -> str:
    html = f"<h1>{notes['title']}</h1>"
    html += f"<p><strong>Summary:</strong> {notes['summary']}</p>"
    html += "<h2>Key Decisions</h2><ul>"
    for decision in notes["key_decisions"]:
        html += f"<li>{decision}</li>"
    html += "</ul><h2>Action Items</h2>"
    html += "<table><tr><th>Task</th><th>Owner</th><th>Deadline</th><th>Priority</th></tr>"
    for item in notes["action_items"]:
        html += (
            f"<tr><td>{item['task']}</td><td>{item['owner']}</td>"
            f"<td>{item['deadline']}</td><td>{item['priority']}</td></tr>"
        )
    html += "</table>"
    return html
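Stage 3 in the pipeline table lists Slack webhook delivery. A hedged sketch of that leg using Slack's incoming-webhook Block Kit format; the payload builder below is illustrative (the block layout is a choice, not the only valid one), and the webhook URL comes from your Slack app configuration.

```python
import json
import urllib.request

def build_slack_payload(notes: dict) -> dict:
    """Build a Slack Block Kit message from the extracted notes."""
    action_lines = [
        f"- {i['task']} ({i.get('owner') or 'unassigned'}, due {i.get('deadline') or 'TBD'})"
        for i in notes["action_items"]
    ]
    return {
        "blocks": [
            {"type": "header", "text": {"type": "plain_text", "text": notes["title"]}},
            {"type": "section", "text": {"type": "mrkdwn", "text": notes["summary"]}},
            {"type": "section", "text": {"type": "mrkdwn",
                "text": "*Action items*\n" + "\n".join(action_lines)}},
        ]
    }

def post_to_slack(webhook_url: str, payload: dict) -> None:
    """POST the payload to a Slack incoming webhook."""
    req = urllib.request.Request(
        webhook_url,
        data=json.dumps(payload).encode(),
        headers={"Content-Type": "application/json"},
    )
    urllib.request.urlopen(req)
```

Email delivery follows the same shape: feed format_notes_html output into smtplib with a text/html MIME part.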

Upload API

from fastapi import FastAPI, UploadFile
app = FastAPI()

@app.post("/process-meeting")
async def process_meeting(audio: UploadFile):
    # Save and transcribe
    path = save_upload(audio)
    transcript = transcribe_meeting(path)

    # Extract structured notes
    notes = extract_meeting_notes(transcript["full_text"])
    notes["duration_minutes"] = round(transcript["duration"] / 60)
    notes["transcript"] = transcript["segments"]

    # Format and optionally distribute
    html = format_notes_html(notes)
    return {"notes": notes, "html": html}

Production Features

For production, extend the pipeline in four directions: add speaker diarisation to attribute statements to specific attendees; build a searchable archive of past meeting notes with RAG retrieval over ChromaDB; create follow-up reminders from action items that carry deadlines; and integrate with calendar APIs to auto-process scheduled meeting recordings. For confidential meetings (HR, board, legal), deploy on private infrastructure with access controls. Larger models improve extraction accuracy if you have the VRAM headroom.
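The follow-up-reminder feature can be sketched in a few lines, under the assumption that the LLM returns deadlines in ISO format (real output is often fuzzier, e.g. "next Friday", and would need a tolerant date parser in front of this):

```python
from datetime import datetime, timedelta

def build_reminders(action_items: list[dict], lead_days: int = 1) -> list[dict]:
    """Turn action items with ISO-format deadlines into reminder entries,
    scheduled `lead_days` before the deadline. Items whose deadline does
    not parse are skipped rather than failing the batch."""
    reminders = []
    for item in action_items:
        try:
            due = datetime.fromisoformat(item.get("deadline", ""))
        except ValueError:
            continue  # "next week", "", "TBD" etc. fall through here
        reminders.append({
            "task": item["task"],
            "owner": item.get("owner", ""),
            "remind_at": (due - timedelta(days=lead_days)).isoformat(),
        })
    return reminders
```

Feed the resulting remind_at timestamps into whatever scheduler you already run (cron, Celery beat, a calendar API).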

Meeting AI GPU Servers

Dedicated GPU servers for audio transcription and meeting intelligence. Process confidential recordings on isolated UK infrastructure.

Browse GPU Servers
