
Connect JetBrains IDE to Self-Hosted AI on GPU

Connect IntelliJ, PyCharm, or WebStorm to a self-hosted AI model on GPU for private code completion and chat. This guide covers the Continue plugin for JetBrains IDEs, API configuration, and getting AI assistance from your own inference endpoint.

What You’ll Connect

After this guide, your JetBrains IDE will have AI code completion and chat powered by your own GPU server — no API costs, no rate limits. Whether you use IntelliJ IDEA, PyCharm, WebStorm, or any other JetBrains product, the Continue plugin connects to your vLLM or Ollama endpoint on dedicated GPU hardware.

The setup delivers inline code suggestions, refactoring assistance, and an AI chat panel — essentially a private AI coding assistant that runs entirely on your infrastructure. Your source code stays within your network at all times.

Continue Plugin            HTTP Client               GPU Server (vLLM)
(IntelliJ, PyCharm,   -->  /v1/chat/ or         -->  Code model inference
WebStorm, etc.)            /v1/completions           on dedicated GPU
reads editor context           |                          |
        |                      |                          |
Inline completion     <--  JSON response        <--  Model output
or chat response           from server               returned
rendered in editor UI

Prerequisites

  • A GigaGPU server with a code model running behind an OpenAI-compatible API (vLLM guide)
  • Any JetBrains IDE (2023.1 or later) — IntelliJ IDEA, PyCharm, WebStorm, GoLand, etc.
  • The Continue plugin installed from the JetBrains Marketplace
  • HTTPS access to your GPU server from your development machine (Nginx proxy guide)
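Before touching the IDE, it is worth confirming that the endpoint is reachable from your development machine. The sketch below (Python, assuming your vLLM server exposes the standard OpenAI-compatible /v1/models route; the hostname and API key are placeholders) lists the model IDs the server advertises:

```python
import json
import urllib.request

# Placeholders -- substitute your actual endpoint and key.
API_BASE = "https://your-gpu-server.gigagpu.com/v1"
API_KEY = "sk-your-gpu-api-key"

def build_models_request(api_base: str, api_key: str) -> urllib.request.Request:
    """Build a GET request for the OpenAI-compatible /models route."""
    return urllib.request.Request(
        f"{api_base}/models",
        headers={"Authorization": f"Bearer {api_key}"},
    )

def list_models(api_base: str, api_key: str) -> list[str]:
    """Return the model IDs the server advertises."""
    req = build_models_request(api_base, api_key)
    with urllib.request.urlopen(req, timeout=10) as resp:
        return [m["id"] for m in json.load(resp)["data"]]

# Uncomment once your server is up:
# print(list_models(API_BASE, API_KEY))
```

The IDs this returns are exactly the values that belong in the "model" fields of the Continue config below.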

Integration Steps

Open your JetBrains IDE and navigate to Settings > Plugins > Marketplace. Search for “Continue” and install the plugin. Restart the IDE. The Continue tool window appears in the sidebar.

The plugin uses the same configuration file as the VS Code version: ~/.continue/config.json. If you have already configured it for VS Code, JetBrains will use the same settings automatically. Otherwise, create the config file and add your GPU server as an OpenAI-compatible provider.

Configure both a chat model (larger, for explanations and refactoring) and an autocomplete model (smaller, for inline tab completions). Point both at your GPU inference endpoint with the appropriate model identifiers from your vLLM deployment.

Code Example

Create or update ~/.continue/config.json to connect your JetBrains IDE to your GPU inference server:

{
  "models": [
    {
      "title": "CodeLlama 70B (Chat)",
      "provider": "openai",
      "model": "codellama/CodeLlama-70b-Instruct-hf",
      "apiBase": "https://your-gpu-server.gigagpu.com/v1",
      "apiKey": "sk-your-gpu-api-key",
      "contextLength": 16384
    }
  ],
  "tabAutocompleteModel": {
    "title": "StarCoder2 3B (Autocomplete)",
    "provider": "openai",
    "model": "bigcode/starcoder2-3b",
    "apiBase": "https://your-gpu-server.gigagpu.com/v1",
    "apiKey": "sk-your-gpu-api-key"
  },
  "allowAnonymousTelemetry": false,
  "tabAutocompleteOptions": {
    "debounceDelay": 500,
    "maxPromptTokens": 2048
  }
}
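Because a malformed config can fail without an obvious error in the IDE, a quick sanity check can save debugging time. This is a hypothetical helper, not part of Continue itself; it only checks that the JSON above carries the fields this setup relies on:

```python
import json
from pathlib import Path

# Fields this guide's setup relies on for each chat model entry.
REQUIRED_MODEL_KEYS = {"title", "provider", "model", "apiBase"}

def validate_config(config: dict) -> list[str]:
    """Return a list of problems found in a Continue config dict."""
    problems = []
    for i, model in enumerate(config.get("models", [])):
        missing = REQUIRED_MODEL_KEYS - model.keys()
        if missing:
            problems.append(f"models[{i}] missing: {sorted(missing)}")
    tab = config.get("tabAutocompleteModel")
    if tab is None:
        problems.append("no tabAutocompleteModel: inline completion will be disabled")
    elif "apiBase" not in tab:
        problems.append("tabAutocompleteModel has no apiBase")
    return problems

def check_file(path: str = "~/.continue/config.json") -> list[str]:
    """Load the config from disk and validate it."""
    return validate_config(json.loads(Path(path).expanduser().read_text()))
```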

Testing Your Integration

Open a project in your JetBrains IDE and begin writing code. Inline suggestions should appear as ghost text after a brief pause — press Tab to accept. Open the Continue panel from the sidebar and ask the AI to explain the current function or suggest improvements.

Test across different languages supported by your model. Verify completions are contextually relevant to your code, not generic snippets. Check the Continue plugin logs (available in the IDE’s log file) for connection status and any errors.

Production Tips

JetBrains IDEs run their own indexing alongside AI suggestions. Increase the debounceDelay in the config if autocomplete suggestions interfere with the IDE’s native completions. A 500ms delay gives the IDE time to show its own suggestions first, with AI filling in when the native completer has no match.
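For example, only the tabAutocompleteOptions block needs to change (the 750ms value here is illustrative; tune it to your machine and model latency):

```json
"tabAutocompleteOptions": {
  "debounceDelay": 750,
  "maxPromptTokens": 2048
}
```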

For enterprise teams, centralise the Continue config by distributing it through your team’s shared configuration. Every developer uses the same GPU endpoint and model versions, ensuring consistent AI assistance across the organisation. Restrict model access using your API key management.
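One lightweight way to distribute a shared config is a small install script each developer runs (a sketch only; the shared-drive path is hypothetical, and your team may prefer a dotfiles repo or configuration-management tool instead):

```python
import shutil
from pathlib import Path

def deploy_config(team_config: Path, home: Path = Path.home()) -> Path:
    """Copy the team's shared Continue config into place, backing up any local one."""
    dest = home / ".continue" / "config.json"
    dest.parent.mkdir(parents=True, exist_ok=True)
    if dest.exists():
        # Preserve the developer's previous config as config.json.bak.
        shutil.copy2(dest, dest.with_suffix(".json.bak"))
    shutil.copy2(team_config, dest)
    return dest

# Example (hypothetical shared path):
# deploy_config(Path("/mnt/shared/dev-tools/continue-config.json"))
```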

Self-hosted open-source code models on dedicated GPU infrastructure give JetBrains users Copilot-style features without per-seat licensing or data leaving the network. Explore more tutorials or get started with GigaGPU to power your JetBrains AI integration.

