Fine-tuning a large language model — training it further on your own data — is often proposed as the solution to AI application limitations. Here is an honest framework for when it actually helps.
What Fine-Tuning Does
Fine-tuning adjusts the model’s weights by training it on additional data. The result: the model better reflects the patterns in your training data. This is distinct from RAG (retrieval-augmented generation), which adds information at query time without changing the model. Fine-tuning changes the model; RAG changes what information the model has access to during a query. The confusion: people who want the model to “know” their data often reach for fine-tuning when they actually want RAG. Fine-tuning is not primarily about adding knowledge — it is about changing behaviour and style.
When Fine-Tuning Genuinely Helps
Four good use cases for fine-tuning: (1) Consistent output format — if you need the model to always return a specific JSON schema or response structure that is difficult to enforce purely through prompting; (2) Tone and style — if you want the model to consistently write in a very specific voice or style (legal documents in your firm’s style, customer service responses in your brand’s tone); (3) Task specialisation — if you have a very specific, narrow task where a base model is wasteful (classifying medical ICD codes, extracting specific fields from structured documents); (4) Inference cost reduction — a smaller, fine-tuned model can match a larger general model on a specific task at lower cost and latency. These use cases are real, but all four require labelled training data that is specific, correct, and sufficient in quantity (typically 500–5,000+ examples for meaningful fine-tuning).
When Fine-Tuning Doesn’t Help
Common cases where people reach for fine-tuning but shouldn’t: “I want the model to know my company’s product documentation” — this is a RAG problem, not fine-tuning; “I want the model to be more accurate on facts” — fine-tuning on your data doesn’t improve factual accuracy, it changes the style of being wrong; “I want the model to follow complex instructions better” — better prompt engineering usually addresses this without fine-tuning; “I want to add knowledge from after the training cutoff” — again RAG, not fine-tuning. The test: if you can address the problem with a better prompt or a knowledge base, do that first. Fine-tuning is the option when those approaches have been exhausted.
Practical Options in 2026
OpenAI’s fine-tuning API: GPT-4o fine-tuning is available (costly), as is GPT-3.5 Turbo fine-tuning (more affordable). Anthropic: does not offer public fine-tuning via API currently (as of 2025). Open-source models: Llama 3, Mistral, and Qwen 2 are all fine-tunable via LoRA (Low-Rank Adaptation) on moderate GPU hardware. LoRA significantly reduces the compute required for fine-tuning by only updating a small subset of model weights. For teams with a clear use case and labelled data: LoRA fine-tuning of an open-source model on cloud GPU (A100, €2–5/hour) is often more cost-effective than OpenAI’s API fine-tuning for production scale.



