Fine-tuning a language model and improving prompts are often described as alternatives, but they solve different problems. Understanding which to use — and when to skip both in favour of retrieval — is a key decision for LLM application development.
What Prompt Engineering Can Do
A well-crafted system prompt can dramatically change how a model behaves: its persona, its output format, its constraints, the topics it addresses, and its tone. For most application requirements, prompt engineering is sufficient and is always the right starting point. It requires no data, no training, and can be iterated in minutes. The limits: you cannot teach a model knowledge it does not have, you cannot permanently change its core reasoning style, and prompts add latency and cost to every request.
What Fine-Tuning Actually Does
Fine-tuning updates the model’s weights by training on a curated dataset of examples. It is useful for: teaching a very specific output format consistently (JSON with a fixed schema, specific domain language), teaching a tone or style so distinctive it cannot be reliably prompted into the base model, or reducing prompt size by baking instructions into model behaviour. It is not useful for: teaching new factual knowledge (RAG is better for this), fixing fundamental model limitations, or achieving results cheaper than prompting.
RAG as the Better Alternative for Knowledge
Retrieval-Augmented Generation (RAG) — retrieving relevant documents from a vector database and including them in the prompt — is almost always better than fine-tuning for “the model needs to know X.” Fine-tuning on knowledge is expensive, requires regular updates when knowledge changes, and produces less reliable results than just including the relevant text in the context. RAG is the right answer for most knowledge-grounding needs.
The Actual Decision Tree
1. Can you solve it with a better system prompt? Start here — always. 2. Do you need the model to consistently know specific proprietary facts? Use RAG. 3. Do you need a highly specific output format or style that prompting cannot reliably produce? Consider fine-tuning. 4. Fine-tuning should be the last option explored, not the first.




