Tutorials

MCP Server with a Self-Hosted LLM

Model Context Protocol lets any compatible client wire tools and resources into an LLM. Self-hosted MCP servers stay private on your GPU server.

Model Context Protocol (MCP) is Anthropic's standard for connecting tools, data, and prompts to LLM clients. The protocol is model-agnostic: once you have an MCP server, any MCP client (Claude Desktop, Cursor, or a custom client) can use it. On our dedicated GPU hosting you can run both the MCP server and the LLM entirely on private infrastructure.


What MCP Provides

Three primitives:

  • Tools: functions the LLM can call
  • Resources: read-only data the LLM can fetch
  • Prompts: parameterised prompt templates

A single MCP server can expose any combination of these; clients discover them at connection time.

Writing a Server

from mcp.server.fastmcp import FastMCP

mcp = FastMCP("internal-docs")

def do_search(query: str) -> str:
    # Placeholder: swap in your real search backend here.
    return f"results for {query!r}"

def list_all_doc_paths() -> list[str]:
    # Placeholder: return every document path in your corpus.
    return ["docs/auth.md", "docs/deploy.md"]

@mcp.tool()
def search_docs(query: str) -> str:
    """Search internal documentation."""
    return do_search(query)

@mcp.resource("docs://index")
def list_docs() -> str:
    return "\n".join(list_all_doc_paths())

if __name__ == "__main__":
    mcp.run(transport="stdio")

For servers reachable over HTTP, use the SSE transport instead of stdio; both are supported by recent MCP clients.

Client

Custom clients can use the mcp Python SDK:

import asyncio

from mcp import ClientSession
from mcp.client.sse import sse_client

async def main() -> None:
    async with sse_client("http://your-gpu-server:8080/sse") as (read, write):
        async with ClientSession(read, write) as session:
            await session.initialize()
            tools = await session.list_tools()
            result = await session.call_tool("search_docs", {"query": "auth"})

asyncio.run(main())

With Self-Hosted LLM

Orchestrator pattern: your application talks to the LLM via OpenAI-compatible API, and when the LLM emits a tool call, the orchestrator forwards that to the MCP server. The LLM and MCP server can live on the same GPU box or separate hosts.
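The orchestrator can be sketched in two parts: a pure function that converts MCP tool listings into the OpenAI function-calling schema, and a loop that forwards any tool call the LLM emits to the MCP session. The names `mcp_tools_to_openai` and `run_turn`, the model name `"local"`, and the single-round tool loop are all assumptions for illustration, not a fixed API:

```python
import json

def mcp_tools_to_openai(tools: list[dict]) -> list[dict]:
    # Convert MCP tool listings (name / description / inputSchema dicts,
    # as serialised from session.list_tools()) into the OpenAI tool schema.
    return [
        {
            "type": "function",
            "function": {
                "name": t["name"],
                "description": t.get("description", ""),
                "parameters": t.get("inputSchema", {"type": "object", "properties": {}}),
            },
        }
        for t in tools
    ]

async def run_turn(llm, session, messages: list, tools: list[dict]) -> str:
    # llm: an AsyncOpenAI-style client pointed at your self-hosted endpoint;
    # session: an initialised MCP ClientSession. Single tool round for brevity.
    resp = await llm.chat.completions.create(model="local", messages=messages, tools=tools)
    msg = resp.choices[0].message
    if msg.tool_calls:
        messages.append(msg)
        for call in msg.tool_calls:
            # Forward the LLM's tool call to the MCP server.
            result = await session.call_tool(
                call.function.name, json.loads(call.function.arguments)
            )
            messages.append({
                "role": "tool",
                "tool_call_id": call.id,
                "content": result.content[0].text,
            })
        resp = await llm.chat.completions.create(model="local", messages=messages, tools=tools)
        msg = resp.choices[0].message
    return msg.content
```

The schema conversion is where the "define once, use anywhere" benefit shows up: the same MCP tool listing can feed any OpenAI-compatible endpoint without restating the definitions.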

The advantage over direct function calling: tools defined once in the MCP server work with any compatible client. You avoid duplicating tool definitions across codebases.

MCP + Self-Hosted LLM Hosting

UK dedicated GPU servers with MCP server and vLLM preconfigured.

Browse GPU Servers

See function calling with Llama 3.3 and tool use with Qwen Coder.
