RAG vs Fine-Tuning
To improve the capabilities of pre-trained LLMs, two strategies are available:
- A Retrieval-Augmented Generation (RAG) approach
- Fine-tuning the pre-trained model
RAG
The RAG approach enhances LLM capabilities by pairing the model with a retrieval system. This system extracts relevant document fragments from a large information corpus and supplies them to the model as an external knowledge base, enabling more accurate and detailed responses (paper).
Different implementation variants exist: RAG-Sequence uses the same retrieved document for the entire generated sequence, while RAG-Token can draw on a different document for each token. A practical walkthrough is available in this video series by Donato Capitella.
Central to RAG is the concept of vector retrieval, explored in depth in this book.
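The retrieval step above can be sketched minimally: embed the corpus and the query, rank fragments by cosine similarity, and prepend the best match to the prompt. This is an illustrative sketch with made-up 4-dimensional embeddings; in practice the vectors come from an embedding model and live in a vector database.

```python
import numpy as np

# Toy corpus (in practice, many fragments with embeddings from a real model).
corpus = {
    "doc1": "The Eiffel Tower is in Paris.",
    "doc2": "Photosynthesis converts light into chemical energy.",
}
# Hypothetical 4-dimensional embeddings for illustration only.
embeddings = {
    "doc1": np.array([0.9, 0.1, 0.0, 0.1]),
    "doc2": np.array([0.1, 0.9, 0.2, 0.0]),
}

def retrieve(query_vec: np.ndarray, k: int = 1) -> list:
    """Return the ids of the k fragments most similar to the query."""
    def cos(a, b):
        return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))
    ranked = sorted(embeddings, key=lambda d: cos(query_vec, embeddings[d]),
                    reverse=True)
    return ranked[:k]

# A query about Paris lands closest to doc1; its text augments the prompt.
query = np.array([0.8, 0.2, 0.1, 0.1])
top = retrieve(query, k=1)
prompt = f"Context: {corpus[top[0]]}\nQuestion: Where is the Eiffel Tower?"
```

The retrieved fragment is concatenated into the prompt, which is then passed to the LLM for generation.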
Fine-tuning
Fine-tuning consists of adapting a pre-trained LLM to a specific task through additional training on a smaller, focused dataset.
Key approaches:
- RLHF — Reinforcement Learning from Human Feedback, which aligns the model via a reward model trained on human preferences
- DPO — Direct Preference Optimization, which removes the need for a separate reward model
- Self-Rewarding — the reward model is iteratively improved
- LoRA — Low-Rank Adaptation, which freezes the pre-trained weights and trains only small low-rank update matrices
- QLoRA — LoRA with quantization
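To make the DPO idea concrete, here is a hedged sketch of its per-pair loss: the policy is optimized directly on (chosen, rejected) response pairs, using a frozen reference model's log-probabilities in place of a separate reward model. Function and parameter names are illustrative, not from any particular library.

```python
import math

def dpo_loss(logp_chosen: float, logp_rejected: float,
             ref_logp_chosen: float, ref_logp_rejected: float,
             beta: float = 0.1) -> float:
    """DPO loss for a single preference pair (illustrative sketch).

    The log-prob differences against the frozen reference model act as
    implicit rewards, so no separate reward model is needed.
    """
    r_chosen = beta * (logp_chosen - ref_logp_chosen)
    r_rejected = beta * (logp_rejected - ref_logp_rejected)
    # Negative log-sigmoid of the reward margin: small when the policy
    # already prefers the chosen response over the rejected one.
    margin = r_chosen - r_rejected
    return -math.log(1.0 / (1.0 + math.exp(-margin)))
```

When the margin is zero the loss is log 2, and it shrinks as the policy assigns relatively more probability to the chosen response than the reference does.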
A practical example of fine-tuning LLaMA2 is available on Maxime Labonne’s blog.
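The LoRA idea from the list above can also be sketched in a few lines: the pre-trained weight stays frozen, and only two small low-rank matrices are trained, with B initialized to zero so training starts from the unmodified model. Dimensions and initialization scales here are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)

d, k, r = 64, 64, 4  # layer dimensions and a small LoRA rank (illustrative)

W = rng.normal(size=(d, k))           # frozen pre-trained weight
A = rng.normal(size=(r, k)) * 0.01    # trainable low-rank factor
B = np.zeros((d, r))                  # zero-init: the update starts at zero

def lora_forward(x: np.ndarray) -> np.ndarray:
    """Forward pass with the low-rank correction: W x + B (A x)."""
    return W @ x + B @ (A @ x)
```

Only A and B are updated during fine-tuning (here 2 × 4 × 64 = 512 parameters versus 4096 in W), which is what makes LoRA cheap; QLoRA additionally stores W in quantized form.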
RAG vs Fine-Tuning
The article RAG vs Fine-Tuning compares performance when applying RAG, fine-tuning, or both. RAG is effective when answers depend on contextually relevant, up-to-date data; fine-tuning teaches the model domain-specific skills and behavior. The two methodologies are complementary and can be combined.