
Meeting Notes Pipeline with Whisper and LLM

Build an automated meeting notes pipeline that transcribes recordings with Whisper, extracts action items with an LLM, and distributes structured summaries on a GPU server.

You will build a pipeline that takes a recorded meeting audio file, transcribes it with Whisper, and produces structured notes: a summary, key decisions, action items with owners, and follow-up questions. The end result: upload a 45-minute team meeting recording and receive formatted notes in under 3 minutes. No meeting audio or transcripts leave your infrastructure — critical for board meetings, HR discussions, and client calls. Here is the full pipeline on dedicated GPU infrastructure.

Pipeline Architecture

| Stage | Tool | Output | VRAM |
|---|---|---|---|
| 1. Transcription | Whisper Large v3 | Timestamped transcript | ~3GB |
| 2. Summarisation | LLaMA 3.1 8B | Structured meeting notes | ~6GB |
| 3. Distribution | Email / Slack webhook | Formatted notes delivery | CPU |

Stage 1: Audio Transcription

from faster_whisper import WhisperModel

whisper = WhisperModel("large-v3", device="cuda", compute_type="float16")

def transcribe_meeting(audio_path: str) -> dict:
    segments, info = whisper.transcribe(
        audio_path, beam_size=5, word_timestamps=True
    )
    transcript = []
    for seg in segments:
        transcript.append({
            "start": round(seg.start, 1),
            "end": round(seg.end, 1),
            "text": seg.text.strip()
        })
    return {
        "language": info.language,
        "duration": info.duration,
        "segments": transcript,
        "full_text": " ".join([s["text"] for s in transcript])
    }

Whisper Large v3 handles multi-speaker meetings well. For improved speaker attribution, add a diarisation step (see the podcast transcription recipe).
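If you do add a diarisation step, you still need to merge its speaker turns back into Whisper's segments. A minimal sketch of that merge, assuming the diariser's output has been converted to start/end/speaker dicts (the dict shape here is an assumption, not any library's native format): each transcript segment takes the label of the turn it overlaps most.

```python
def assign_speakers(segments: list[dict], turns: list[dict]) -> list[dict]:
    """Attach a speaker label to each transcript segment by picking the
    diarisation turn with the greatest time overlap."""
    labelled = []
    for seg in segments:
        best, best_overlap = "unknown", 0.0
        for turn in turns:
            # Overlap length of [seg.start, seg.end] and [turn.start, turn.end]
            overlap = min(seg["end"], turn["end"]) - max(seg["start"], turn["start"])
            if overlap > best_overlap:
                best, best_overlap = turn["speaker"], overlap
        labelled.append({**seg, "speaker": best})
    return labelled
```

This greedy assignment is crude (a segment spanning a speaker change gets one label), but it is usually good enough for meeting notes.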

Stage 2: Structured Extraction

from openai import OpenAI
import json

client = OpenAI(base_url="http://localhost:8000/v1", api_key="none")

def extract_meeting_notes(transcript: str) -> dict:
    # Process in chunks if transcript exceeds context window
    response = client.chat.completions.create(
        model="meta-llama/Llama-3.1-8B-Instruct",
        messages=[{
            "role": "system",
            "content": """Analyse this meeting transcript and return JSON:
{"title": "Meeting topic",
 "date": "if mentioned",
 "attendees": ["names mentioned"],
 "summary": "3-5 sentence overview",
 "key_decisions": ["list of decisions made"],
 "action_items": [{"task": "", "owner": "", "deadline": "", "priority": "high|medium|low"}],
 "open_questions": ["unresolved items"],
 "next_meeting": "if discussed"}"""
        }, {"role": "user", "content": transcript}],
        max_tokens=1500, temperature=0.1
    )
    return json.loads(response.choices[0].message.content)

The vLLM server handles the extraction. For meetings longer than the model’s context window, split the transcript into overlapping chunks and merge the extracted items.
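The chunk-and-merge step can be sketched as follows. This is a minimal version: the 6000-word budget and 200-word overlap are assumptions to tune against your model's actual context window, and the dedupe key (lowercased task plus owner) is a simple heuristic for action items extracted twice from overlapping chunks.

```python
def chunk_transcript(text: str, max_words: int = 6000, overlap: int = 200) -> list[str]:
    """Split a transcript into overlapping word-count chunks."""
    words = text.split()
    if len(words) <= max_words:
        return [text]
    chunks, start = [], 0
    while start < len(words):
        chunks.append(" ".join(words[start:start + max_words]))
        if start + max_words >= len(words):
            break
        start += max_words - overlap  # step back by `overlap` words each chunk
    return chunks

def merge_action_items(per_chunk: list[list[dict]]) -> list[dict]:
    """Merge action items extracted per chunk, dropping near-duplicates."""
    seen, merged = set(), []
    for items in per_chunk:
        for item in items:
            key = (item["task"].lower().strip(), item.get("owner", "").lower())
            if key not in seen:
                seen.add(key)
                merged.append(item)
    return merged
```

Run extract_meeting_notes on each chunk, then merge the per-chunk action_items, key_decisions, and open_questions lists the same way.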

Formatting and Distribution

def format_notes_html(notes: dict) -> str:
    html = f"<h1>{notes['title']}</h1>"
    html += f"<p><strong>Summary:</strong> {notes['summary']}</p>"
    html += "<h2>Key Decisions</h2><ul>"
    for decision in notes["key_decisions"]:
        html += f"<li>{decision}</li>"
    html += "</ul><h2>Action Items</h2>"
    html += "<table><tr><th>Task</th><th>Owner</th><th>Deadline</th><th>Priority</th></tr>"
    for item in notes["action_items"]:
        html += (
            f"<tr><td>{item['task']}</td><td>{item['owner']}</td>"
            f"<td>{item['deadline']}</td><td>{item['priority']}</td></tr>"
        )
    html += "</table>"
    return html
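Stage 3 in the pipeline table lists Slack webhook delivery. A hedged sketch of that leg using Slack's incoming-webhook Block Kit format; the payload builder below is illustrative (the block layout is a choice, not the only valid one), and the webhook URL comes from your Slack app configuration.

```python
import json
import urllib.request

def build_slack_payload(notes: dict) -> dict:
    """Build a Slack Block Kit message from the extracted notes."""
    action_lines = [
        f"- {i['task']} ({i.get('owner') or 'unassigned'}, due {i.get('deadline') or 'TBD'})"
        for i in notes["action_items"]
    ]
    return {
        "blocks": [
            {"type": "header", "text": {"type": "plain_text", "text": notes["title"]}},
            {"type": "section", "text": {"type": "mrkdwn", "text": notes["summary"]}},
            {"type": "section", "text": {"type": "mrkdwn",
                "text": "*Action items*\n" + "\n".join(action_lines)}},
        ]
    }

def post_to_slack(webhook_url: str, payload: dict) -> None:
    """POST the payload to a Slack incoming webhook."""
    req = urllib.request.Request(
        webhook_url,
        data=json.dumps(payload).encode(),
        headers={"Content-Type": "application/json"},
    )
    urllib.request.urlopen(req)
```

Email delivery follows the same shape: feed format_notes_html output into smtplib with a text/html MIME part.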

Upload API

from fastapi import FastAPI, UploadFile
app = FastAPI()

@app.post("/process-meeting")
async def process_meeting(audio: UploadFile):
    # Save and transcribe
    path = save_upload(audio)
    transcript = transcribe_meeting(path)

    # Extract structured notes
    notes = extract_meeting_notes(transcript["full_text"])
    notes["duration_minutes"] = round(transcript["duration"] / 60)
    notes["transcript"] = transcript["segments"]

    # Format and optionally distribute
    html = format_notes_html(notes)
    return {"notes": notes, "html": html}

Production Features

For production, extend the pipeline in four directions: add speaker diarisation to attribute statements to specific attendees; build a searchable archive of past meeting notes with RAG retrieval over ChromaDB; create follow-up reminders from action items that carry deadlines; and integrate with calendar APIs to auto-process scheduled meeting recordings. For confidential meetings (HR, board, legal), deploy on private infrastructure with access controls. Larger models improve extraction accuracy if you have the VRAM headroom.
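The follow-up-reminder feature can be sketched in a few lines, under the assumption that the LLM returns deadlines in ISO format (real output is often fuzzier, e.g. "next Friday", and would need a tolerant date parser in front of this):

```python
from datetime import datetime, timedelta

def build_reminders(action_items: list[dict], lead_days: int = 1) -> list[dict]:
    """Turn action items with ISO-format deadlines into reminder entries,
    scheduled `lead_days` before the deadline. Items whose deadline does
    not parse are skipped rather than failing the batch."""
    reminders = []
    for item in action_items:
        try:
            due = datetime.fromisoformat(item.get("deadline", ""))
        except ValueError:
            continue  # "next week", "", "TBD" etc. fall through here
        reminders.append({
            "task": item["task"],
            "owner": item.get("owner", ""),
            "remind_at": (due - timedelta(days=lead_days)).isoformat(),
        })
    return reminders
```

Feed the resulting remind_at timestamps into whatever scheduler you already run (cron, Celery beat, a calendar API).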

Meeting AI GPU Servers

Dedicated GPU servers for audio transcription and meeting intelligence. Process confidential recordings on isolated UK infrastructure.

Browse GPU Servers
