
Speech Translation Pipeline with Whisper, LLM, and TTS

Build a speech-to-speech translation pipeline using Whisper for transcription, an LLM for translation, and TTS for output synthesis across multiple languages on a GPU server.

You will build a pipeline that takes spoken audio in one language, transcribes it with Whisper, translates it with an LLM, and synthesises the translated text as natural speech. The end result: upload a 5-minute German customer call and receive an English audio translation within 90 seconds. No cloud translation APIs, no per-minute charges, no audio data leaving your server. This tutorial walks through the complete multilingual pipeline on dedicated GPU infrastructure.

Pipeline Architecture

| Stage | Tool | Input | Output | VRAM |
|---|---|---|---|---|
| 1. Transcription | Whisper Large v3 | Source-language audio | Source-language text | ~3GB |
| 2. Translation | LLaMA 3.1 8B | Source text | Target-language text | ~6GB |
| 3. Synthesis | Coqui XTTS v2 | Translated text | Target-language audio | ~2GB |

Stage 1: Multilingual Transcription

from faster_whisper import WhisperModel

whisper = WhisperModel("large-v3", device="cuda", compute_type="float16")

def transcribe(audio_path: str) -> dict:
    segments, info = whisper.transcribe(audio_path, beam_size=5)
    text_segments = []
    for seg in segments:
        text_segments.append({
            "start": seg.start, "end": seg.end, "text": seg.text
        })
    return {
        "language": info.language,
        "text": " ".join([s["text"] for s in text_segments]),
        "segments": text_segments
    }

Whisper Large v3 automatically detects the source language and transcribes with timestamps. The segment-level output preserves timing for subtitle generation.
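Because timing survives transcription, subtitles come almost for free. A minimal sketch (the helper names format_timestamp and to_srt are illustrative, not part of faster-whisper) that renders the segment list from transcribe() as an SRT file:

def format_timestamp(seconds: float) -> str:
    # SRT timestamps use HH:MM:SS,mmm
    ms = int(seconds * 1000)
    h, ms = divmod(ms, 3_600_000)
    m, ms = divmod(ms, 60_000)
    s, ms = divmod(ms, 1_000)
    return f"{h:02d}:{m:02d}:{s:02d},{ms:03d}"

def to_srt(segments: list[dict]) -> str:
    # Render the transcribe() segment list as numbered SRT blocks
    blocks = []
    for i, seg in enumerate(segments, start=1):
        blocks.append(
            f"{i}\n{format_timestamp(seg['start'])} --> "
            f"{format_timestamp(seg['end'])}\n{seg['text'].strip()}\n"
        )
    return "\n".join(blocks)

Write the returned string to a .srt file alongside the audio for players and editors to pick up.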

Stage 2: LLM Translation

from openai import OpenAI
client = OpenAI(base_url="http://localhost:8000/v1", api_key="none")

def translate_text(text: str, source_lang: str, target_lang: str = "English") -> str:
    response = client.chat.completions.create(
        model="meta-llama/Llama-3.1-8B-Instruct",
        messages=[{
            "role": "system",
            "content": f"Translate the following {source_lang} text to {target_lang}. "
                       f"Preserve the original meaning, tone, and technical terminology. "
                       f"Return only the translation, no commentary."
        }, {"role": "user", "content": text}],
        max_tokens=2000, temperature=0.2
    )
    return response.choices[0].message.content

LLMs often produce higher-quality translations than traditional MT systems for conversational and domain-specific content because they condition on the full passage rather than translating sentence by sentence. The vLLM server handles batched translation of multiple segments efficiently, as the sketch below shows.
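A minimal sketch of segment-level translation, assuming the vLLM server was started with something like vllm serve meta-llama/Llama-3.1-8B-Instruct on port 8000; the helper name translate_segments and the worker count are our choices. Firing requests concurrently lets vLLM's continuous batching merge them on the GPU:

from concurrent.futures import ThreadPoolExecutor

def translate_segments(segments: list[dict], source_lang: str,
                       target_lang: str = "English") -> list[dict]:
    # Submit segment translations in parallel; vLLM batches the
    # concurrent requests into shared forward passes where possible.
    with ThreadPoolExecutor(max_workers=8) as pool:
        futures = [
            pool.submit(translate_text, seg["text"], source_lang, target_lang)
            for seg in segments
        ]
        # Futures are in submit order, so results stay aligned with segments
        return [
            {**seg, "translation": f.result()}
            for seg, f in zip(segments, futures)
        ]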

Stage 3: Speech Synthesis

from TTS.api import TTS

tts = TTS("tts_models/multilingual/multi-dataset/xtts_v2").to("cuda")

def synthesise(text: str, output_path: str, target_lang: str = "en",
               speaker_wav: str = "reference_speaker.wav"):
    # XTTS v2 is multi-speaker: pass a reference clip via speaker_wav
    # (the default path here is a placeholder; supply your own sample)
    tts.tts_to_file(text=text, file_path=output_path,
                    language=target_lang, speaker_wav=speaker_wav)
    return output_path

Coqui XTTS v2 supports multiple output languages with natural prosody. Because it is a multi-speaker model, every call needs a reference sample (via speaker_wav) or a built-in speaker name; for voice cloning, provide a clip from the target speaker.
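A usage sketch of voice cloning; the file names are placeholders:

# Clone a specific voice: the reference clip determines the output voice.
# caller_sample.wav is a hypothetical 6-30 second clean recording.
synthesise(
    "Hello, thanks for calling. How can I help you today?",
    "cloned_output.wav",
    target_lang="en",
    speaker_wav="caller_sample.wav",
)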

Combined Translation Endpoint

from fastapi import FastAPI, UploadFile
from fastapi.responses import FileResponse
import tempfile

app = FastAPI()

@app.post("/translate-audio")
async def translate_audio(audio: UploadFile, target_lang: str = "English"):
    with tempfile.NamedTemporaryFile(suffix=".wav", delete=False) as f:
        f.write(await audio.read())
        input_path = f.name

    # Pipeline: transcribe -> translate -> synthesise
    transcript = transcribe(input_path)
    translated = translate_text(transcript["text"], transcript["language"], target_lang)
    lang_code = {"English": "en", "French": "fr", "Spanish": "es", "German": "de"}
    output_path = input_path.replace(".wav", f"_{target_lang}.wav")
    synthesise(translated, output_path, lang_code.get(target_lang, "en"))

    return FileResponse(output_path, media_type="audio/wav")
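
A quick client-side check, assuming the app runs under uvicorn on port 8080 (vLLM already occupies 8000); file names are placeholders:

import requests

# Post a source-language recording and save the translated audio.
with open("german_call.wav", "rb") as f:
    resp = requests.post(
        "http://localhost:8080/translate-audio",
        files={"audio": ("german_call.wav", f, "audio/wav")},
        params={"target_lang": "English"},  # query parameter on the endpoint
        timeout=600,  # the full pipeline can take a while on long files
    )
resp.raise_for_status()
with open("translated_en.wav", "wb") as out:
    out.write(resp.content)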

Production Considerations

For production deployments:

- Process long audio files in segments (5-minute chunks) to stay within model context limits; a chunking sketch follows this list.
- Implement a queue for batch processing of multiple files.
- Add language-detection validation to catch Whisper misidentification.
- Cache translations of repeated phrases.
- Monitor translation quality with periodic human review.

For domain-specific terminology (medical, legal, technical), add glossary terms to the translation prompt. Deploy on private infrastructure for confidential audio. See model options for multilingual specialists, chatbot hosting for real-time voice interfaces, more tutorials, and industry use cases for translation deployments.
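A minimal sketch of the chunking step, using pydub (our choice; ffmpeg-based splitting works equally well). The helper name split_audio is illustrative:

from pydub import AudioSegment

CHUNK_MS = 5 * 60 * 1000  # 5-minute chunks

def split_audio(path: str) -> list[str]:
    # Slice the recording into 5-minute chunks; transcribe and
    # translate each chunk independently, then concatenate in order.
    audio = AudioSegment.from_file(path)
    chunk_paths = []
    for i in range(0, len(audio), CHUNK_MS):
        chunk_path = f"{path}.part{i // CHUNK_MS}.wav"
        audio[i:i + CHUNK_MS].export(chunk_path, format="wav")
        chunk_paths.append(chunk_path)
    return chunk_paths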
