Tutorials

RTX 5060 Ti 16GB LangChain Quickstart

Connect LangChain to your self-hosted vLLM on Blackwell 16GB - RAG chains, agents, and structured outputs.

LangChain builds LLM pipelines (retrieval, tool use, agents). Point it at the vLLM server running on your self-hosted RTX 5060 Ti 16GB and everything below works against that endpoint.


Install

uv pip install langchain langchain-openai langchain-chroma langchain-huggingface chromadb

Connect to vLLM

from langchain_openai import ChatOpenAI

llm = ChatOpenAI(
    model="meta-llama/Llama-3.1-8B-Instruct",
    openai_api_base="http://localhost:8000/v1",
    openai_api_key="none",  # vLLM doesn't require a key by default
    temperature=0.3,
)

result = llm.invoke("In one sentence, what is LangChain?")
print(result.content)

RAG Chain

from langchain_chroma import Chroma  # Chroma moved out of langchain_community; uv pip install langchain-chroma
from langchain_huggingface import HuggingFaceEmbeddings
from langchain_core.prompts import ChatPromptTemplate
from langchain_core.runnables import RunnablePassthrough

embeddings = HuggingFaceEmbeddings(model_name="BAAI/bge-base-en-v1.5")
vs = Chroma(embedding_function=embeddings, persist_directory="./vs")

retriever = vs.as_retriever(search_kwargs={"k": 4})

prompt = ChatPromptTemplate.from_template(
    "Answer based only on the context:\n\n{context}\n\nQuestion: {question}"
)

rag = {"context": retriever, "question": RunnablePassthrough()} | prompt | llm
print(rag.invoke("What was our Q3 revenue?").content)

Tool-Use Agent

from langchain.agents import AgentExecutor, create_tool_calling_agent
from langchain_core.tools import tool

@tool
def get_weather(city: str) -> str:
    """Get the current weather for a city."""
    return f"The weather in {city} is sunny."

tools = [get_weather]
prompt = ChatPromptTemplate.from_messages([
    ("system", "You are a helpful assistant."),
    ("user", "{input}"),
    ("placeholder", "{agent_scratchpad}"),
])
agent = create_tool_calling_agent(llm, tools, prompt)
exe = AgentExecutor(agent=agent, tools=tools)
print(exe.invoke({"input": "What's the weather in Manchester?"})["output"])

Llama 3.1 8B handles simple tool calls fine; for complex multi-tool agents, Qwen 2.5 14B is stronger.

LangChain + Self-Hosted LLM

Full LangChain toolkit on Blackwell 16GB. UK dedicated hosting.

Order the RTX 5060 Ti 16GB

See also: LlamaIndex quickstart, RAG stack install, vLLM setup, embedding server.


admin

We benchmark, deploy, and optimise GPU infrastructure for AI workloads. All data in our guides comes from real-world testing on our UK-based dedicated GPU servers.
