Process of specializing a pre-trained model for a specific task or domain through additional training with curated data, adapting its behavior without starting from scratch.
Fine-tuning is the process of taking a pre-trained language model and training it further with specific data to adapt it to a particular task, domain, or style. Instead of training from scratch (costly and impractical), it leverages the base model's general knowledge and specializes it.
The decision between fine-tuning, RAG, and prompt engineering depends on the problem:
| Criterion | Prompt engineering | RAG | Fine-tuning |
|---|---|---|---|
| Initial cost | Low | Medium | High |
| Production latency | Low | Medium (retrieval) | Low |
| Updatable knowledge | No | Yes | No (requires retraining) |
| Consistent style/format | Limited | Limited | Excellent |
| Domain terminology | Limited | Good | Excellent |
| Data needed | 0 | Documents | 100-10,000 examples |
| Maintenance | Low | Medium (index) | High (retraining) |
Practical rule: start with prompt engineering, add RAG if external knowledge is needed, and resort to fine-tuning only when the model can't achieve the desired format, style, or terminology.
Full fine-tuning updates all model parameters. It produces the best results but requires substantial GPU memory (gradients and optimizer states for every weight), a larger dataset, and care to avoid catastrophic forgetting.
LoRA (Low-Rank Adaptation) freezes the base model and trains small low-rank adaptation matrices. Instead of updating a weight matrix W of dimension d×d, LoRA trains two matrices A (d×r) and B (r×d), where r is much smaller than d (typically 8-64), and adds their product to W.
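A quick back-of-the-envelope calculation makes the savings concrete (the dimensions here are illustrative, e.g. a 4096-wide attention projection):

```python
# Trainable parameters for one weight matrix: full update vs. LoRA
d = 4096   # hidden dimension (illustrative)
r = 16     # LoRA rank

full = d * d           # updating W directly
lora = d * r + r * d   # A (d x r) plus B (r x d)

print(full, lora, lora / full)  # → 16777216 131072 0.0078125
```

Under 1% of the matrix's parameters are trained, which is why the overall trainable fraction of the model can drop to a fraction of a percent.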
QLoRA applies LoRA on top of a 4-bit quantized base model. The original QLoRA work fine-tuned a 65B-parameter model on a single 48GB GPU; an 8B model fits comfortably on a 24GB card.
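Recent versions of transformers favor an explicit `BitsAndBytesConfig` over the `load_in_4bit` shortcut; a sketch of QLoRA-style quantization settings (NF4 with double quantization, following the QLoRA paper's defaults):

```python
import torch
from transformers import AutoModelForCausalLM, BitsAndBytesConfig

bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_quant_type="nf4",              # NormalFloat4, introduced by QLoRA
    bnb_4bit_use_double_quant=True,         # also quantize the quantization constants
    bnb_4bit_compute_dtype=torch.bfloat16,  # dtype used for the actual matmuls
)
model = AutoModelForCausalLM.from_pretrained(
    "meta-llama/Llama-3.1-8B", quantization_config=bnb_config
)
```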
RLHF (Reinforcement Learning from Human Feedback) aligns the model with human preferences using a reward model trained on human comparisons. This is how Claude, GPT-4, and other chat models are aligned.
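The reward-model step can be illustrated with the pairwise (Bradley-Terry) loss: the model should score the human-preferred response above the rejected one. A minimal sketch with hypothetical reward scores:

```python
import math

def preference_loss(reward_chosen: float, reward_rejected: float) -> float:
    """Pairwise preference loss: -log sigmoid(r_chosen - r_rejected).
    Small when the chosen response scores clearly higher, large otherwise."""
    margin = reward_chosen - reward_rejected
    return -math.log(1.0 / (1.0 + math.exp(-margin)))

print(preference_loss(2.0, -1.0))  # small: the preference is respected
print(preference_loss(-1.0, 2.0))  # large: the preference is violated
```

Training the reward model minimizes this loss over many human-labeled comparison pairs; the resulting scores then guide the policy-optimization phase.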
```python
from datasets import load_dataset
from transformers import AutoModelForCausalLM, AutoTokenizer, TrainingArguments
from peft import LoraConfig, get_peft_model
from trl import SFTTrainer

model_id = "meta-llama/Llama-3.1-8B"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id, load_in_4bit=True)  # QLoRA

# LoRA configuration
lora_config = LoraConfig(
    r=16,                                 # adaptation rank
    lora_alpha=32,                        # scaling factor
    target_modules=["q_proj", "v_proj"],  # layers to adapt
    lora_dropout=0.05,
    task_type="CAUSAL_LM",
)
model = get_peft_model(model, lora_config)
model.print_trainable_parameters()  # ~0.1% of total

# Dataset in instruction-response format
dataset = load_dataset("json", data_files="training_data.jsonl")

trainer = SFTTrainer(
    model=model,
    train_dataset=dataset["train"],
    args=TrainingArguments(
        output_dir="./output",
        num_train_epochs=3,
        per_device_train_batch_size=4,
        learning_rate=2e-4,
    ),
    peft_config=lora_config,
)
trainer.train()
model.save_pretrained("./lora-adapter")  # only saves the adapter (~50MB)
```

The resulting adapter weighs ~50MB instead of the full model's ~16GB, and can be loaded on top of the base model in production.
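Reattaching the adapter in production is a matter of loading the base model and applying the saved weights with peft's `PeftModel` (a sketch; the path and model id are taken from the example above):

```python
from transformers import AutoModelForCausalLM
from peft import PeftModel

base = AutoModelForCausalLM.from_pretrained("meta-llama/Llama-3.1-8B")
model = PeftModel.from_pretrained(base, "./lora-adapter")  # applies the LoRA weights

# Optionally fold the adapter into the base weights to remove inference overhead
model = model.merge_and_unload()
```

Keeping the adapter separate lets several task-specific adapters share one copy of the base model; merging trades that flexibility for slightly faster inference.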
Dataset quality is the single most important factor. Training examples in instruction-response JSONL format:

```jsonl
{"instruction": "Classify the sentiment", "input": "The service was excellent", "output": "positive"}
{"instruction": "Classify the sentiment", "input": "They took 2 hours to serve me", "output": "negative"}
```

A decreasing training loss is not enough; evaluate the model in its real usage context.
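Before training, it pays to validate the dataset mechanically; a minimal sketch (the required field names are assumed to match the JSONL format shown above):

```python
import json

REQUIRED = {"instruction", "output"}

def validate_jsonl(lines):
    """Return (valid, errors): parsed examples and messages for bad lines."""
    valid, errors = [], []
    for i, line in enumerate(lines, 1):
        try:
            example = json.loads(line)
        except json.JSONDecodeError:
            errors.append(f"line {i}: invalid JSON")
            continue
        missing = REQUIRED - example.keys()
        if missing:
            errors.append(f"line {i}: missing {sorted(missing)}")
        else:
            valid.append(example)
    return valid, errors

lines = [
    '{"instruction": "Classify the sentiment", "input": "Great!", "output": "positive"}',
    '{"instruction": "Classify the sentiment"}',  # missing output field
]
valid, errors = validate_jsonl(lines)
print(len(valid), errors)
```

Checks like duplicate detection, length statistics, and label balance are natural extensions of the same loop.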
Fine-tuning allows adapting a general model to a specific domain with your own data. With LoRA and QLoRA, hardware cost dropped dramatically — fine-tuning Llama 3.1 8B fits on a 24GB GPU. The key decision is not how to fine-tune, but whether you actually need it: prompt engineering and RAG solve most cases without the maintenance cost of a custom model.