You will build a pipeline that takes a recorded meeting audio file, transcribes it with Whisper, and produces structured notes: a summary, key decisions, action items with owners, and follow-up questions. The end result: upload a 45-minute team meeting recording and receive formatted notes in under 3 minutes. No meeting audio or transcripts leave your infrastructure — critical for board meetings, HR discussions, and client calls. Here is the full pipeline on dedicated GPU infrastructure.
## Pipeline Architecture
| Stage | Tool | Output | VRAM |
|---|---|---|---|
| 1. Transcription | Whisper Large v3 | Timestamped transcript | ~3GB |
| 2. Summarisation | LLaMA 3.1 8B | Structured meeting notes | ~6GB |
| 3. Distribution | Email / Slack webhook | Formatted notes delivery | CPU |
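Stages 1 and 2 fit together on a single GPU (roughly 9GB VRAM combined). Stage 2 assumes an OpenAI-compatible server listening on `localhost:8000`; with vLLM that is a one-line launch. The flags shown here are typical values, not prescriptive:

```shell
# Launch vLLM's OpenAI-compatible server for Stage 2.
# The model name matches the extraction code below; --max-model-len is
# an illustrative choice and can be raised on GPUs with spare VRAM.
vllm serve meta-llama/Llama-3.1-8B-Instruct --port 8000 --max-model-len 16384
```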
## Stage 1: Audio Transcription

```python
from faster_whisper import WhisperModel

# Load once at startup; large-v3 needs ~3GB VRAM in float16
whisper = WhisperModel("large-v3", device="cuda", compute_type="float16")

def transcribe_meeting(audio_path: str) -> dict:
    segments, info = whisper.transcribe(
        audio_path, beam_size=5, word_timestamps=True
    )
    transcript = []
    for seg in segments:
        transcript.append({
            "start": round(seg.start, 1),
            "end": round(seg.end, 1),
            "text": seg.text.strip(),
        })
    return {
        "language": info.language,
        "duration": info.duration,
        "segments": transcript,
        "full_text": " ".join(s["text"] for s in transcript),
    }
```
Whisper Large v3 handles multi-speaker meetings well. For improved speaker attribution, add a diarisation step (see the podcast transcription recipe).
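Once a diarisation model has produced speaker turns, attaching them to Whisper segments is plain Python: pick, for each segment, the turn with the largest time overlap. A minimal sketch — the turn format with `speaker`/`start`/`end` keys is an assumption about your diarisation output, not a fixed interface:

```python
def assign_speakers(segments: list[dict], turns: list[dict]) -> list[dict]:
    """Attach a speaker label to each Whisper segment by choosing the
    diarisation turn with the largest time overlap (assumed turn format:
    {"speaker": ..., "start": ..., "end": ...})."""
    labelled = []
    for seg in segments:
        best, best_overlap = "unknown", 0.0
        for turn in turns:
            # Overlap between [seg.start, seg.end] and [turn.start, turn.end]
            overlap = min(seg["end"], turn["end"]) - max(seg["start"], turn["start"])
            if overlap > best_overlap:
                best, best_overlap = turn["speaker"], overlap
        labelled.append({**seg, "speaker": best})
    return labelled
```

Segments that overlap no turn at all keep the `"unknown"` label, which is easy to spot in review.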
## Stage 2: Structured Extraction

```python
from openai import OpenAI
import json

# Point the OpenAI client at the local vLLM server
client = OpenAI(base_url="http://localhost:8000/v1", api_key="none")

def extract_meeting_notes(transcript: str) -> dict:
    # Process in chunks if the transcript exceeds the context window
    response = client.chat.completions.create(
        model="meta-llama/Llama-3.1-8B-Instruct",
        messages=[{
            "role": "system",
            "content": """Analyse this meeting transcript and return JSON:
{"title": "Meeting topic",
 "date": "if mentioned",
 "attendees": ["names mentioned"],
 "summary": "3-5 sentence overview",
 "key_decisions": ["list of decisions made"],
 "action_items": [{"task": "", "owner": "", "deadline": "", "priority": "high|medium|low"}],
 "open_questions": ["unresolved items"],
 "next_meeting": "if discussed"}"""
        }, {"role": "user", "content": transcript}],
        max_tokens=1500,
        temperature=0.1,
    )
    return json.loads(response.choices[0].message.content)
```
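Small instruct models sometimes wrap their JSON in markdown fences or add a sentence of prose around it, which makes a bare `json.loads` raise. A defensive parsing helper — this guard is an addition to the recipe, not part of it — extracts the first JSON object from the raw response:

```python
import json

def parse_json_response(raw: str) -> dict:
    """Extract the first JSON object from a model response, tolerating
    markdown fences or stray prose before and after it."""
    start = raw.find("{")
    end = raw.rfind("}")
    if start == -1 or end < start:
        raise ValueError("no JSON object found in model output")
    return json.loads(raw[start:end + 1])
```

Swapping this in for the final `json.loads` call makes the extraction step tolerant of slightly messy model output.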
The vLLM server handles the extraction. For meetings longer than the model’s context window, split the transcript into overlapping chunks and merge the extracted items.
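The chunk-and-merge step can stay simple: character-based overlapping chunks, then list-wise deduplication of the extracted items. A minimal sketch — the chunk sizes and the dedup keys here are assumptions, tune them for your transcripts:

```python
def chunk_text(text: str, chunk_chars: int = 12000, overlap: int = 1000) -> list[str]:
    """Split a long transcript into overlapping character chunks."""
    chunks, start = [], 0
    while start < len(text):
        chunks.append(text[start:start + chunk_chars])
        if start + chunk_chars >= len(text):
            break
        start += chunk_chars - overlap
    return chunks

def merge_notes(parts: list[dict]) -> dict:
    """Merge per-chunk extractions, deduplicating repeated items."""
    merged = {"key_decisions": [], "action_items": [], "open_questions": []}
    seen = set()
    for part in parts:
        for key in ("key_decisions", "open_questions"):
            for item in part.get(key, []):
                if item not in seen:
                    seen.add(item)
                    merged[key].append(item)
        for item in part.get("action_items", []):
            fingerprint = (item.get("task"), item.get("owner"))
            if fingerprint not in seen:
                seen.add(fingerprint)
                merged["action_items"].append(item)
    if parts:
        # Title from the first chunk; summaries concatenated (a final
        # LLM pass over the merged notes is an optional refinement).
        merged["title"] = parts[0].get("title", "")
        merged["summary"] = " ".join(p.get("summary", "") for p in parts).strip()
    return merged
```

Because the chunks overlap, the same decision or action item can surface twice; the fingerprint check keeps only the first occurrence.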
## Formatting and Distribution

```python
def format_notes_html(notes: dict) -> str:
    html = f"<h1>{notes['title']}</h1>"
    html += f"<p><strong>Summary:</strong> {notes['summary']}</p>"
    html += "<h2>Key Decisions</h2><ul>"
    for decision in notes["key_decisions"]:
        html += f"<li>{decision}</li>"
    html += "</ul><h2>Action Items</h2>"
    html += "<table><tr><th>Task</th><th>Owner</th><th>Deadline</th><th>Priority</th></tr>"
    for item in notes["action_items"]:
        html += f"<tr><td>{item['task']}</td><td>{item['owner']}</td>"
        html += f"<td>{item['deadline']}</td><td>{item['priority']}</td></tr>"
    html += "</table>"
    return html
```
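The architecture table lists Slack webhooks for distribution. A hedged sketch of that step using only the standard library: build a Block Kit payload from the extracted notes and POST it to an incoming-webhook URL (the payload layout and helper names here are illustrative, not a fixed interface):

```python
import json
import urllib.request

def build_slack_payload(notes: dict) -> dict:
    """Build a Slack Block Kit payload from the extracted meeting notes."""
    actions = "\n".join(
        f"• {i['task']} ({i['owner']}, due {i['deadline']})"
        for i in notes.get("action_items", [])
    ) or "None recorded."
    return {
        "blocks": [
            {"type": "header", "text": {"type": "plain_text", "text": notes["title"]}},
            {"type": "section", "text": {"type": "mrkdwn", "text": notes["summary"]}},
            {"type": "section", "text": {"type": "mrkdwn", "text": f"*Action items*\n{actions}"}},
        ]
    }

def post_to_slack(webhook_url: str, notes: dict) -> None:
    """POST the notes payload to a Slack incoming-webhook URL."""
    req = urllib.request.Request(
        webhook_url,
        data=json.dumps(build_slack_payload(notes)).encode(),
        headers={"Content-Type": "application/json"},
    )
    urllib.request.urlopen(req)
```

Email delivery follows the same shape: render `format_notes_html` and hand the result to your SMTP client of choice.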
## Upload API

```python
import shutil
import tempfile

from fastapi import FastAPI, UploadFile

app = FastAPI()

def save_upload(upload: UploadFile) -> str:
    """Persist the uploaded audio to a temporary file and return its path."""
    suffix = "_" + (upload.filename or "audio")
    with tempfile.NamedTemporaryFile(delete=False, suffix=suffix) as tmp:
        shutil.copyfileobj(upload.file, tmp)
        return tmp.name

@app.post("/process-meeting")
async def process_meeting(audio: UploadFile):
    # Save and transcribe
    path = save_upload(audio)
    transcript = transcribe_meeting(path)
    # Extract structured notes
    notes = extract_meeting_notes(transcript["full_text"])
    notes["duration_minutes"] = round(transcript["duration"] / 60)
    notes["transcript"] = transcript["segments"]
    # Format and optionally distribute
    html = format_notes_html(notes)
    return {"notes": notes, "html": html}
```
## Production Features

For production, consider:

- **Speaker diarisation** to attribute statements to specific attendees.
- **A searchable archive** of past meeting notes, with RAG retrieval over ChromaDB.
- **Follow-up reminders** generated automatically from action items with deadlines.
- **Calendar integration** to auto-process scheduled meeting recordings.

For confidential meetings (HR, board, legal), deploy on private infrastructure with access controls. Larger models improve extraction accuracy, and a chatbot interface over the notes archive enables meeting Q&A.