
Connect Terraform to Manage GPU Server Infrastructure

Manage your GPU server infrastructure as code with Terraform. This guide covers provider configuration, resource definitions for GPU instances, and automating the deployment of AI inference servers with reproducible, version-controlled infrastructure.

What You’ll Connect

After this guide, your GPU server infrastructure will be fully managed through Terraform — no manual provisioning, no configuration drift. Terraform definitions describe your dedicated GPU servers, networking, and AI inference stack as code, enabling reproducible deployments that your team can version control and review like any other codebase.

The integration uses Terraform providers to provision GPU instances and then applies configuration resources to install and start your vLLM or Ollama inference endpoint. Infrastructure changes go through pull request reviews, not ad-hoc SSH sessions.

terraform plan (previews changes from .tf files in version control) –> Provider API –> GPU server created (provisioned with NVIDIA drivers)

terraform apply (executes the plan) –> Provisioner (runs setup scripts over SSH/cloud-init, installs dependencies) –> vLLM running on the dedicated GPU

Prerequisites

  • Terraform 1.5+ installed on your workstation or CI runner
  • API credentials for your GPU infrastructure provider
  • A Git repository for storing Terraform configuration files
  • SSH key pair for post-provisioning configuration
  • Familiarity with your vLLM deployment requirements
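The Terraform version requirement above can be enforced in the configuration itself, so an older binary fails fast at init rather than producing a confusing plan. A minimal sketch (the filename is a convention, not a requirement):

```hcl
# versions.tf — pin the minimum Terraform CLI version this configuration expects
terraform {
  required_version = ">= 1.5.0"
}
```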

Integration Steps

Create a new Terraform project directory with main.tf, variables.tf, and outputs.tf. Configure the provider block with your GPU hosting credentials. Define a GPU server resource specifying the instance type, GPU model (RTX 6000 Pro, RTX 5090, etc.), operating system, and SSH key.

Add a provisioner block that runs after the server is created. The provisioner SSHs into the new instance and executes a setup script: installing NVIDIA drivers, Docker, and pulling your inference container image. This automates the entire server bring-up process described in our self-host LLM guide.

Define networking resources — a firewall or security group that opens only the inference API port (8000) and SSH (22) to allowed IP ranges. Use Terraform outputs to expose the server’s public IP and API endpoint URL for downstream configuration.

Code Example

Terraform configuration for provisioning a GPU server and deploying a vLLM inference server:

# main.tf
terraform {
  required_providers {
    cloud = { source = "your-provider/cloud", version = "~> 1.0" }
  }
}

variable "gpu_api_key" {
  type      = string
  sensitive = true
}
variable "ssh_public_key" { type = string }

resource "cloud_ssh_key" "ai_server" {
  name       = "ai-inference-key"
  public_key = var.ssh_public_key
}

resource "cloud_server" "gpu_inference" {
  name        = "ai-inference-prod"
  image       = "ubuntu-22.04-cuda-12"
  size        = "gpu-a100-80gb"
  region      = "london"
  ssh_keys    = [cloud_ssh_key.ai_server.id]

  # Runs once after the server is created. Assumes the image ships the
  # NVIDIA driver and the provider's apt mirror carries the
  # nvidia-container-toolkit package.
  provisioner "remote-exec" {
    inline = [
      "apt-get update && apt-get install -y docker.io nvidia-container-toolkit",
      "nvidia-ctk runtime configure --runtime=docker",
      "systemctl enable docker && systemctl restart docker",
      join(" ", [
        "docker run -d --gpus all -p 8000:8000 --name vllm",
        "vllm/vllm-openai:latest",
        "--model meta-llama/Llama-3-70b-chat-hf",
        "--api-key ${var.gpu_api_key}",
        "--tensor-parallel-size 1",
      ]),
    ]

    connection {
      type        = "ssh"
      user        = "root"
      private_key = file("~/.ssh/id_rsa")
      host        = self.ipv4_address
    }
  }
}

resource "cloud_firewall" "inference_rules" {
  name = "ai-inference-fw"

  inbound_rule {
    protocol         = "tcp"
    port_range       = "8000"
    source_addresses = ["YOUR_CIDR/32"]
  }

  inbound_rule {
    protocol         = "tcp"
    port_range       = "22"
    source_addresses = ["YOUR_CIDR/32"]
  }
}

output "inference_url" {
  # vLLM serves plain HTTP unless you terminate TLS in front of it
  value = "http://${cloud_server.gpu_inference.ipv4_address}:8000/v1"
}

Testing Your Integration

Run terraform init to download the provider, then terraform plan to preview the resources. Review the plan output to confirm the GPU instance type, region, and firewall rules match your requirements. Apply with terraform apply and monitor the provisioner output for installation progress.

Once provisioning completes, test the inference endpoint using the output URL: curl -H "Authorization: Bearer YOUR_API_KEY" http://SERVER_IP:8000/v1/models (the endpoint serves plain HTTP unless you put TLS in front of it). Verify the model is loaded and accepting requests. Run terraform destroy on a test environment to confirm clean teardown.
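The health check can also live in the configuration itself. A sketch using a Terraform 1.5+ check block with a scoped hashicorp/http data source (the resource and variable names match the example above; whether a warning is acceptable for your workflow is a judgment call):

```hcl
# Polls the endpoint on every plan/apply; a failed assert surfaces as a
# warning rather than blocking the run.
check "inference_health" {
  data "http" "models" {
    url = "http://${cloud_server.gpu_inference.ipv4_address}:8000/v1/models"

    request_headers = {
      Authorization = "Bearer ${var.gpu_api_key}"
    }
  }

  assert {
    condition     = data.http.models.status_code == 200
    error_message = "vLLM endpoint did not return HTTP 200"
  }
}
```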

Production Tips

Store Terraform state in a remote backend (S3, GCS, or Terraform Cloud) so your team shares a single source of truth for infrastructure state. Use workspaces to manage separate staging and production GPU environments from the same configuration files.
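An S3 backend is one way to set this up; a minimal sketch, assuming placeholder bucket and lock-table names you would replace with your own:

```hcl
# backend.tf — remote state with locking (names are placeholders)
terraform {
  backend "s3" {
    bucket         = "your-terraform-state"
    key            = "gpu-inference/terraform.tfstate"
    region         = "eu-west-2"
    dynamodb_table = "terraform-locks" # prevents concurrent applies
    encrypt        = true
  }
}
```

With the backend in place, terraform workspace new staging creates an isolated state for a second environment from the same files.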

Separate the GPU provisioning from the application deployment. Use Terraform for the infrastructure layer (server, networking, DNS) and a configuration management tool like Ansible for the application layer (model updates, config changes). This lets you update models without reprovisioning servers.
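One way to hand off between the two layers is to have Terraform render an Ansible inventory from its outputs. A sketch using the hashicorp/local provider and the server resource from the example above:

```hcl
# Writes inventory.ini next to the configuration so Ansible can pick up
# the provisioned host without manual copy-paste.
resource "local_file" "ansible_inventory" {
  filename = "${path.module}/inventory.ini"
  content  = <<-EOT
    [inference]
    ${cloud_server.gpu_inference.ipv4_address} ansible_user=root
  EOT
}
```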

For teams scaling open-source AI across multiple GPU servers, Terraform’s count and for_each meta-arguments let you provision fleets of inference servers with a single configuration change. Pair with a load balancer resource to distribute traffic. Secure endpoints with our API security guide, explore more tutorials, or get started with GigaGPU to manage your GPU infrastructure as code.
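A for_each sketch, reusing the hypothetical provider and SSH key resource from the example above — adding a name to the set and re-applying provisions another identically configured node:

```hcl
resource "cloud_server" "gpu_fleet" {
  for_each = toset(["inference-1", "inference-2", "inference-3"])

  name     = each.key
  image    = "ubuntu-22.04-cuda-12"
  size     = "gpu-a100-80gb"
  region   = "london"
  ssh_keys = [cloud_ssh_key.ai_server.id]
}

output "fleet_ips" {
  value = { for k, s in cloud_server.gpu_fleet : k => s.ipv4_address }
}
```

The fleet_ips map is what you would feed to a load balancer resource or an inventory template.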
