Concepts

Fine-Tuning

Process of specializing a pre-trained model for a specific task or domain through additional training with curated data, adapting its behavior without starting from scratch.

#seed #fine-tuning #llm #transfer-learning #lora #rlhf #training

What it is

Fine-tuning is the process of taking a pre-trained language model and training it further with specific data to adapt it to a particular task, domain, or style. Instead of training from scratch (costly and impractical), it leverages the base model's general knowledge and specializes it.
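The idea can be sketched with a toy model: instead of starting from random initialization, training resumes from "pretrained" weights and takes a few extra gradient steps on a small domain dataset. Everything here (the linear model, the numbers, the data) is invented purely for illustration; a real fine-tune works the same way, just on billions of parameters.

```python
# Toy illustration of fine-tuning: continue gradient descent from
# "pretrained" weights on a small domain-specific dataset, rather than
# training from scratch. (Hypothetical 1-D linear model, not a real LLM.)

def sgd_step(w, b, data, lr=0.05):
    """One gradient step for y = w*x + b under mean-squared-error loss."""
    n = len(data)
    gw = sum(2 * (w * x + b - y) * x for x, y in data) / n
    gb = sum(2 * (w * x + b - y) for x, y in data) / n
    return w - lr * gw, b - lr * gb

def mse(w, b, data):
    return sum((w * x + b - y) ** 2 for x, y in data) / len(data)

# "Pre-trained" parameters, assumed learned earlier on broad general data.
w, b = 2.0, 0.0

# Small domain dataset with a slightly different relationship: y = 2.5x + 1.
domain_data = [(x, 2.5 * x + 1.0) for x in [-2.0, -1.0, 0.0, 1.0, 2.0]]

loss_before = mse(w, b, domain_data)
for _ in range(200):                  # a short round of additional training
    w, b = sgd_step(w, b, domain_data)
loss_after = mse(w, b, domain_data)

print(f"loss before: {loss_before:.3f}, after: {loss_after:.6f}")
```

The base parameters were already close to the target, so a brief, cheap training run adapts them; that head start is exactly what fine-tuning buys over training from scratch.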

When to use fine-tuning vs RAG

Scenario                    | Better option
--------------------------- | -----------------
Updatable factual knowledge | RAG
Specific writing style      | Fine-tuning
Consistent output format    | Fine-tuning
Frequently changing data    | RAG
Domain terminology          | Fine-tuning + RAG
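As a quick sanity check, the table above can be encoded as a lookup (the scenario keys and fallback message are invented for the example):

```python
# Hypothetical helper encoding the fine-tuning vs RAG decision table.
GUIDE = {
    "updatable factual knowledge": "RAG",
    "specific writing style": "Fine-tuning",
    "consistent output format": "Fine-tuning",
    "frequently changing data": "RAG",
    "domain terminology": "Fine-tuning + RAG",
}

def better_option(scenario: str) -> str:
    return GUIDE.get(scenario.lower(), "unclear: prototype both")

print(better_option("Frequently changing data"))   # prints "RAG"
```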

Fine-tuning techniques

Full fine-tuning

Updates all model parameters. Usually the strongest results, but it comes at a cost:

  • Requires GPUs with lots of memory
  • Needs large datasets (thousands of examples)
  • Risks "catastrophic forgetting" of the base model's general knowledge

LoRA (Low-Rank Adaptation)

Freezes the base model and trains small adaptation matrices. Advantages:

  • Orders of magnitude fewer trainable parameters (often 100-1,000x)
  • Separate adapters can be trained per task and swapped at inference time
  • Adapters are small files, easy to share and combine
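The core trick fits in a few lines: the frozen base weight W is left untouched, and only two small matrices A (r × d) and B (d × r) are trained, giving an effective weight W + (alpha / r) · B·A. A plain-Python sketch (dimensions and init values are assumed for illustration; real implementations such as the peft library apply this to transformer weight matrices):

```python
import random

# Minimal LoRA sketch: freeze W, train only the low-rank pair (A, B).
d, r, alpha = 64, 4, 8            # hidden size, LoRA rank, scaling (assumed)
scale = alpha / r

random.seed(0)
W = [[random.gauss(0, 0.02) for _ in range(d)] for _ in range(d)]  # frozen
A = [[random.gauss(0, 0.02) for _ in range(d)] for _ in range(r)]  # trainable
B = [[0.0] * r for _ in range(d)]                                  # trainable, zero-init

def effective_weight():
    # W_eff = W + (alpha / r) * B @ A
    return [
        [W[i][j] + scale * sum(B[i][k] * A[k][j] for k in range(r))
         for j in range(d)]
        for i in range(d)
    ]

# Because B starts at zero, the adapted model is initially identical to the base.
assert effective_weight() == W

# Trainable-parameter comparison at a more realistic hidden size:
D = 4096
full, lora = D * D, 2 * r * D
print(f"full: {full:,} params; LoRA (r={r}): {lora:,} ({full // lora}x fewer)")
```

The zero-initialized B is a real LoRA property: training starts from exactly the base model's behavior and only gradually departs from it.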

QLoRA

Applies LoRA on top of a base model quantized to 4-bit precision. The memory savings are large enough to fine-tune big models on consumer hardware.
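A simplified sketch of the 4-bit idea: each block of weights is mapped to 16 integer levels plus one scale factor. (This is plain absmax quantization for illustration; QLoRA actually uses the NF4 data type with per-block quantization, and the weight values below are invented.)

```python
# Simplified 4-bit quantization: floats -> integers in [-8, 7] plus a scale.
def quantize_4bit(weights):
    scale = max(abs(w) for w in weights) / 7 or 1.0   # guard against all-zero blocks
    return [max(-8, min(7, round(w / scale))) for w in weights], scale

def dequantize(q, scale):
    return [v * scale for v in q]

block = [0.31, -0.12, 0.07, -0.44, 0.25, 0.01, -0.33, 0.18]  # invented weights
q, s = quantize_4bit(block)
restored = dequantize(q, s)

max_err = max(abs(a - b) for a, b in zip(block, restored))
print(f"quantized: {q}")
print(f"max reconstruction error: {max_err:.3f}")
```

Storing 4 bits instead of 16 per weight cuts base-model memory roughly 4x, which is what makes room for the LoRA adapters and optimizer state.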

RLHF (Reinforcement Learning from Human Feedback)

Aligns the model with human preferences: a reward model is first trained on human comparisons of responses, then the base model is optimized against it with reinforcement learning. This is part of how Claude, GPT-4, and other chat models are aligned.
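The reward model is typically trained with a pairwise preference loss (the Bradley-Terry formulation): it is penalized unless it scores the human-preferred response above the rejected one. A sketch with invented reward scores:

```python
import math

# Pairwise preference loss for an RLHF reward model:
# loss = -log sigmoid(r_chosen - r_rejected)
def preference_loss(reward_chosen, reward_rejected):
    margin = reward_chosen - reward_rejected
    return -math.log(1 / (1 + math.exp(-margin)))

# The loss shrinks as the chosen response is scored further above the rejected one.
for margin in (-1.0, 0.0, 1.0, 3.0):
    print(f"margin {margin:+.1f} -> loss {preference_loss(margin, 0.0):.3f}")
```

At zero margin the loss is log 2 (the model is indifferent); a negative margin, where the rejected response scores higher, is punished hardest.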

Typical process

  1. Prepare data: input/output pairs in the desired format
  2. Choose base model: balance between capability and cost
  3. Configure training: learning rate, epochs, batch size
  4. Train: monitor loss and validation metrics
  5. Evaluate: test on real use cases
  6. Iterate: adjust data or hyperparameters based on results
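Step 1 usually means serializing the pairs as JSONL, one example per line. A minimal sketch (the field names `input`/`output` and the examples are assumed; providers differ on the exact schema):

```python
import json

# Format input/output pairs as JSONL, one training example per line.
examples = [
    {"input": "Summarize: The meeting covered Q3 targets...",
     "output": "Q3 targets were discussed."},
    {"input": "Translate to French: Good morning",
     "output": "Bonjour"},
]

jsonl = "\n".join(json.dumps(ex, ensure_ascii=False) for ex in examples)
print(jsonl)

# Round-trip check: every line parses back into the original pair.
parsed = [json.loads(line) for line in jsonl.splitlines()]
assert parsed == examples
```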

Considerations

  • Quality > quantity: 100 excellent examples outperform 10,000 mediocre ones
  • Consistent format: the model learns patterns — maintain uniform structure
  • Rigorous evaluation: it's easy to overfit to training data
  • Maintenance cost: each update requires retraining
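The evaluation point is worth making concrete: score the model only on held-out cases it never saw during training, with a metric as strict as your use case allows. A sketch using exact-match accuracy (the references and model outputs are invented):

```python
# Evaluate on held-out examples the model never trained on.
def exact_match_accuracy(predictions, references):
    assert len(predictions) == len(references)
    hits = sum(p.strip() == r.strip() for p, r in zip(predictions, references))
    return hits / len(references)

held_out_refs = ["Bonjour", "Merci", "Bonne nuit"]       # hypothetical gold answers
model_outputs = ["Bonjour", "Merci beaucoup", "Bonne nuit"]  # hypothetical outputs

acc = exact_match_accuracy(model_outputs, held_out_refs)
print(f"held-out exact match: {acc:.2f}")   # 2 of 3 correct
```

A large gap between training-set and held-out scores is the classic sign of overfitting to the fine-tuning data.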

Why it matters

Fine-tuning lets you adapt a general model to a specific domain using your own data. It is the technique that turns a generic LLM into an expert in your terminology, response formats, and particular use cases, for the situations where prompting alone is not enough.
