Axolotl lets you describe a fine-tune in a YAML config and run it. On our dedicated GPU hosting it is the right tool when you want reproducible, checked-in training runs rather than bespoke scripts.
Install
pip install packaging ninja
pip install "axolotl[flash-attn,deepspeed]"
Verify that your PyTorch and CUDA versions match (the Axolotl docs list the supported combinations). The extras are quoted above because brackets are globbed by some shells, such as zsh.
Config
A typical QLoRA config for Llama 3.1 8B:
base_model: meta-llama/Meta-Llama-3.1-8B-Instruct
model_type: LlamaForCausalLM
tokenizer_type: AutoTokenizer
load_in_4bit: true
adapter: qlora
lora_r: 16
lora_alpha: 32
lora_dropout: 0.05
lora_target_modules:
  - q_proj
  - k_proj
  - v_proj
  - o_proj
datasets:
  - path: your_dataset.jsonl
    type: chat_template
sequence_len: 4096
sample_packing: true
pad_to_sequence_len: true
micro_batch_size: 2
gradient_accumulation_steps: 8
num_epochs: 3
learning_rate: 2e-4
optimizer: paged_adamw_8bit
bf16: true
gradient_checkpointing: true
output_dir: ./out
logging_steps: 10
save_steps: 200
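The chat_template dataset type expects one conversation per JSONL line. A minimal sketch of what your_dataset.jsonl might contain, assuming the common role/content messages schema (check the Axolotl dataset docs for the exact fields your chosen template expects):

```python
import json

# One training example per line: a list of chat messages.
# The "messages" field name and role/content schema are assumptions;
# verify against the Axolotl dataset-format documentation.
example = {
    "messages": [
        {"role": "user", "content": "What is QLoRA?"},
        {
            "role": "assistant",
            "content": "QLoRA fine-tunes a 4-bit quantized base model "
                       "by training low-rank adapters on top of it.",
        },
    ]
}

with open("your_dataset.jsonl", "w") as f:
    f.write(json.dumps(example) + "\n")
```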
Launch
accelerate launch -m axolotl.cli.train config.yml
For multi-GPU training on two or more cards, Axolotl picks up your accelerate config and can route training through DeepSpeed or FSDP.
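To enable DeepSpeed ZeRO, point the config at a DeepSpeed JSON file. Axolotl's repo ships reference configs; the path below follows the conventional layout in that repo, so verify it exists in your checkout:

```yaml
# added to config.yml — path is an assumption, check your Axolotl checkout
deepspeed: deepspeed_configs/zero2.json
```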
Why Axolotl
Compared to hand-rolled SFTTrainer scripts:
- Reproducibility – config is data, commits cleanly
- DeepSpeed integration is smoother
- Dataset format support is broader (ChatML, Alpaca, ShareGPT, etc.)
- Sample packing for higher training efficiency is a toggle
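Sample packing concatenates several short examples into one sequence_len-long sequence so fewer tokens are wasted on padding. A toy sketch of the idea, using greedy first-fit bin packing (not Axolotl's actual implementation):

```python
def pack(examples, seq_len):
    """Greedily pack tokenized examples into bins of at most seq_len tokens."""
    bins = []
    for ex in examples:
        for b in bins:
            # First-fit: drop the example into the first bin with room left.
            if sum(len(e) for e in b) + len(ex) <= seq_len:
                b.append(ex)
                break
        else:
            bins.append([ex])
    return bins

# Four short examples (6, 4, 7, 3 tokens) fit into two 10-token
# sequences instead of four padded ones.
lengths = [[0] * n for n in (6, 4, 7, 3)]
packed = pack(lengths, seq_len=10)
print(len(packed))  # → 2
```

In practice the packing also has to keep attention masks separated per example so packed samples do not attend to each other; Axolotl handles that internally.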
Axolotl Preinstalled on Dedicated GPUs
We set up UK dedicated servers with Axolotl preinstalled and your training data already mounted.
Browse GPU Servers. See also Unsloth (faster on small GPUs) and QLoRA on the 5090.