Model Context Protocol (MCP) is Anthropic’s open standard for connecting tools, data, and prompts to LLM clients. The protocol is model-agnostic: once you have an MCP server, any MCP client (Claude Desktop, Cursor, or a custom client) can use it. On our dedicated GPU hosting you can run both the MCP server and the LLM entirely on private infrastructure.
What MCP Provides
Three primitives:
- Tools: functions the LLM can call
- Resources: read-only data the LLM can fetch
- Prompts: parameterised prompt templates
A single MCP server can expose any combination of these; clients discover them at connection time.
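At connection time the client issues a `tools/list` request and gets back each tool's name, description, and a JSON Schema for its arguments. A sketch of that response shape (the field names follow the MCP spec; the `search_docs` entry is illustrative):

```python
import json

# Approximate shape of a tools/list result as seen by the client.
tools_list = {
    "tools": [
        {
            "name": "search_docs",
            "description": "Search internal documentation.",
            "inputSchema": {
                "type": "object",
                "properties": {"query": {"type": "string"}},
                "required": ["query"],
            },
        }
    ]
}

print(json.dumps(tools_list, indent=2))
```

The `inputSchema` is what lets any client render or validate tool arguments without prior knowledge of the server.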
Writing a Server
```python
from mcp.server.fastmcp import FastMCP

mcp = FastMCP("internal-docs")

@mcp.tool()
def search_docs(query: str) -> str:
    """Search internal documentation."""
    return do_search(query)  # do_search: your own search backend

@mcp.resource("docs://index")
def list_docs() -> str:
    """Expose the list of document paths as a read-only resource."""
    return "\n".join(list_all_doc_paths())  # list_all_doc_paths: your own helper

if __name__ == "__main__":
    mcp.run(transport="stdio")
```
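To use a stdio server from Claude Desktop, register it in `claude_desktop_config.json`. A minimal entry, assuming the code above is saved as `server.py` (path and Python invocation are illustrative):

```json
{
  "mcpServers": {
    "internal-docs": {
      "command": "python",
      "args": ["/path/to/server.py"]
    }
  }
}
```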
For HTTP-accessible servers, use the SSE transport instead of stdio (e.g. `mcp.run(transport="sse")`). Both transports are supported by recent MCP clients.
Client
Custom clients can use the mcp Python SDK:
```python
import asyncio

from mcp import ClientSession
from mcp.client.sse import sse_client

async def main():
    async with sse_client("http://your-gpu-server:8080/sse") as (read, write):
        async with ClientSession(read, write) as session:
            await session.initialize()  # protocol handshake before any requests
            tools = await session.list_tools()
            result = await session.call_tool("search_docs", {"query": "auth"})

asyncio.run(main())
```
With Self-Hosted LLM
Orchestrator pattern: your application talks to the LLM via OpenAI-compatible API, and when the LLM emits a tool call, the orchestrator forwards that to the MCP server. The LLM and MCP server can live on the same GPU box or separate hosts.
The advantage over direct function calling: tools defined once in the MCP server work with any compatible client. You avoid duplicating tool definitions across codebases.
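The bridging step in the orchestrator is mechanical: each MCP tool definition (name, description, JSON Schema) maps directly onto the OpenAI function-calling format that vLLM and similar servers accept. A minimal sketch of that translation, using plain dicts rather than either SDK (field names follow the two specs; the sample tool is illustrative):

```python
def mcp_tool_to_openai(tool: dict) -> dict:
    """Translate one MCP tool definition into OpenAI function-calling format."""
    return {
        "type": "function",
        "function": {
            "name": tool["name"],
            "description": tool.get("description", ""),
            "parameters": tool.get("inputSchema", {"type": "object", "properties": {}}),
        },
    }

# Illustrative MCP tool definition, as returned by tools/list.
mcp_tool = {
    "name": "search_docs",
    "description": "Search internal documentation.",
    "inputSchema": {
        "type": "object",
        "properties": {"query": {"type": "string"}},
        "required": ["query"],
    },
}

openai_tool = mcp_tool_to_openai(mcp_tool)
# The orchestrator passes [openai_tool] in the `tools` field of each
# chat-completion request; when the model emits a tool call, the
# orchestrator forwards its name and arguments to session.call_tool()
# on the MCP side and returns the result as a tool message.
```

Because the translation is lossless in this direction, the MCP server remains the single source of truth for tool definitions.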
MCP + Self-Hosted LLM Hosting
UK dedicated GPU servers with MCP server and vLLM preconfigured.
Browse GPU Servers
See function calling with Llama 3.3 and tool use with Qwen Coder.