Tutorials

MCP Server with a Self-Hosted LLM

Model Context Protocol lets any compatible client wire tools and resources into an LLM. Self-hosted MCP servers stay private on your GPU server.

Model Context Protocol (MCP) is Anthropic's standard for connecting tools, data, and prompts to LLM clients. The protocol is model-agnostic: once you have an MCP server, any MCP client (Claude Desktop, Cursor, or a custom client) can use it. On our dedicated GPU hosting you can run both the MCP server and the LLM entirely on private infrastructure.


What MCP Provides

Three primitives:

  • Tools: functions the LLM can call
  • Resources: read-only data the LLM can fetch
  • Prompts: parameterised prompt templates

A single MCP server can expose any combination of these; clients discover them at connection time.

Writing a Server

from mcp.server.fastmcp import FastMCP

mcp = FastMCP("internal-docs")

def do_search(query: str) -> str:
    # Placeholder: swap in your real search backend here.
    return f"results for {query!r}"

def list_all_doc_paths() -> list[str]:
    # Placeholder: return every document path in your corpus.
    return ["docs/auth.md", "docs/deploy.md"]

@mcp.tool()
def search_docs(query: str) -> str:
    """Search internal documentation."""
    return do_search(query)

@mcp.resource("docs://index")
def list_docs() -> str:
    return "\n".join(list_all_doc_paths())

if __name__ == "__main__":
    mcp.run(transport="stdio")

For servers reachable over HTTP, use the SSE transport instead of stdio; both are supported by recent MCP clients.

Client

Custom clients can use the mcp Python SDK:

import asyncio

from mcp import ClientSession
from mcp.client.sse import sse_client

async def main() -> None:
    async with sse_client("http://your-gpu-server:8080/sse") as (read, write):
        async with ClientSession(read, write) as session:
            await session.initialize()
            tools = await session.list_tools()
            result = await session.call_tool("search_docs", {"query": "auth"})

asyncio.run(main())

With Self-Hosted LLM

Orchestrator pattern: your application talks to the LLM via OpenAI-compatible API, and when the LLM emits a tool call, the orchestrator forwards that to the MCP server. The LLM and MCP server can live on the same GPU box or separate hosts.
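The orchestrator can be sketched in two parts: a pure function that converts MCP tool listings into the OpenAI function-calling schema, and a loop that forwards any tool call the LLM emits to the MCP session. The names `mcp_tools_to_openai` and `run_turn`, the model name `"local"`, and the single-round tool loop are all assumptions for illustration, not a fixed API:

```python
import json

def mcp_tools_to_openai(tools: list[dict]) -> list[dict]:
    # Convert MCP tool listings (name / description / inputSchema dicts,
    # as serialised from session.list_tools()) into the OpenAI tool schema.
    return [
        {
            "type": "function",
            "function": {
                "name": t["name"],
                "description": t.get("description", ""),
                "parameters": t.get("inputSchema", {"type": "object", "properties": {}}),
            },
        }
        for t in tools
    ]

async def run_turn(llm, session, messages: list, tools: list[dict]) -> str:
    # llm: an AsyncOpenAI-style client pointed at your self-hosted endpoint;
    # session: an initialised MCP ClientSession. Single tool round for brevity.
    resp = await llm.chat.completions.create(model="local", messages=messages, tools=tools)
    msg = resp.choices[0].message
    if msg.tool_calls:
        messages.append(msg)
        for call in msg.tool_calls:
            # Forward the LLM's tool call to the MCP server.
            result = await session.call_tool(
                call.function.name, json.loads(call.function.arguments)
            )
            messages.append({
                "role": "tool",
                "tool_call_id": call.id,
                "content": result.content[0].text,
            })
        resp = await llm.chat.completions.create(model="local", messages=messages, tools=tools)
        msg = resp.choices[0].message
    return msg.content
```

The schema conversion is where the "define once, use anywhere" benefit shows up: the same MCP tool listing can feed any OpenAI-compatible endpoint without restating the definitions.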

The advantage over direct function calling: tools defined once in the MCP server work with any compatible client. You avoid duplicating tool definitions across codebases.

MCP + Self-Hosted LLM Hosting

UK dedicated GPU servers with MCP server and vLLM preconfigured.

Browse GPU Servers

See function calling with Llama 3.3 and tool use with Qwen Coder.
