LangChain builds LLM pipelines (retrieval, tool use, agents). Point it at your self-hosted vLLM on the RTX 5060 Ti 16GB via our hosting:
Install
uv pip install langchain langchain-openai langchain-community langchain-huggingface chromadb
Connect to vLLM
from langchain_openai import ChatOpenAI
llm = ChatOpenAI(
    model="meta-llama/Llama-3.1-8B-Instruct",
    openai_api_base="http://localhost:8000/v1",
    openai_api_key="none",  # vLLM doesn't require a key by default
    temperature=0.3,
)
result = llm.invoke("In one sentence, what is LangChain?")
print(result.content)
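Under the hood, ChatOpenAI simply POSTs an OpenAI-style chat-completions body to openai_api_base, which is why vLLM works as a drop-in backend. A minimal sketch of that payload (build_chat_request is a hypothetical helper for illustration, not part of LangChain):

```python
import json

# Sketch of the OpenAI-compatible request that reaches vLLM's
# /v1/chat/completions endpoint. Any OpenAI-style client works;
# no proprietary fields are needed.
def build_chat_request(model: str, user_msg: str, temperature: float = 0.3) -> str:
    payload = {
        "model": model,
        "messages": [{"role": "user", "content": user_msg}],
        "temperature": temperature,
    }
    return json.dumps(payload)

body = build_chat_request(
    "meta-llama/Llama-3.1-8B-Instruct",
    "In one sentence, what is LangChain?",
)
print(body)
```

Because the wire format is the standard one, swapping localhost for a hosted endpoint is just a change of openai_api_base.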
RAG Chain
from langchain_community.vectorstores import Chroma
from langchain_huggingface import HuggingFaceEmbeddings
from langchain_core.prompts import ChatPromptTemplate
from langchain_core.runnables import RunnablePassthrough
embeddings = HuggingFaceEmbeddings(model_name="BAAI/bge-base-en-v1.5")
vs = Chroma(embedding_function=embeddings, persist_directory="./vs")
retriever = vs.as_retriever(search_kwargs={"k": 4})
prompt = ChatPromptTemplate.from_template(
    "Answer based only on the context:\n\n{context}\n\nQuestion: {question}"
)
def format_docs(docs):
    # Join retrieved Documents into plain text; without this, the raw
    # Document list's repr gets pasted into the prompt
    return "\n\n".join(d.page_content for d in docs)

rag = {"context": retriever | format_docs, "question": RunnablePassthrough()} | prompt | llm
print(rag.invoke("What was our Q3 revenue?").content)
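The retriever step amounts to embedding the question and returning the k most similar chunks. A dependency-free sketch of that top-k cosine-similarity lookup (cosine, top_k, and the toy corpus are illustrative, not LangChain APIs):

```python
import math

def cosine(a, b):
    # Cosine similarity between two embedding vectors
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb)

def top_k(query_vec, docs, k=4):
    # docs: list of (text, embedding) pairs, as a vector store holds them
    ranked = sorted(docs, key=lambda d: cosine(query_vec, d[1]), reverse=True)
    return [text for text, _ in ranked[:k]]

# Toy 3-dimensional "embeddings" for illustration only
corpus = [
    ("Q3 revenue was £2.1M.", [0.9, 0.1, 0.0]),
    ("The office dog is called Biscuit.", [0.0, 0.2, 0.9]),
    ("Q3 costs rose 4%.", [0.8, 0.3, 0.1]),
]
print(top_k([1.0, 0.0, 0.0], corpus, k=2))
```

Chroma does the same ranking over real 768-dimensional bge-base vectors, with an index instead of a linear scan.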
Tool-Use Agent
from langchain.agents import AgentExecutor, create_tool_calling_agent
from langchain_core.tools import tool
@tool
def get_weather(city: str) -> str:
    """Get the current weather for a city."""
    return f"The weather in {city} is sunny."
tools = [get_weather]
prompt = ChatPromptTemplate.from_messages([
    ("system", "You are a helpful assistant."),
    ("user", "{input}"),
    ("placeholder", "{agent_scratchpad}"),
])
agent = create_tool_calling_agent(llm, tools, prompt)
exe = AgentExecutor(agent=agent, tools=tools)
print(exe.invoke({"input": "What's the weather in Manchester?"})["output"])
Llama 3.1 8B handles simple tool calls fine; for complex multi-tool agents, Qwen 2.5 14B is stronger.
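What AgentExecutor does with a tool call is straightforward: look the tool up by name, invoke it with the model-supplied arguments, and feed the result back to the model. A stripped-down sketch of that dispatch step (dispatch and TOOLS are illustrative, not LangChain internals):

```python
# The model returns a tool call as {"name": ..., "args": ...};
# the executor resolves the name against its tool registry and runs it.
def get_weather(city: str) -> str:
    return f"The weather in {city} is sunny."

TOOLS = {"get_weather": get_weather}

def dispatch(tool_call: dict) -> str:
    fn = TOOLS[tool_call["name"]]
    return fn(**tool_call["args"])

print(dispatch({"name": "get_weather", "args": {"city": "Manchester"}}))
```

The real executor also appends the tool result to the agent scratchpad and loops until the model answers without requesting a tool.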
LangChain + Self-Hosted LLM
Full LangChain toolkit on Blackwell 16GB. UK dedicated hosting.
Order the RTX 5060 Ti 16GB
See also: LlamaIndex quickstart, RAG stack install, vLLM setup, embedding server.