To improve the capabilities of pre-trained LLMs, two strategies are available:

  1. A Retrieval-Augmented Generation (RAG) approach
  2. Fine-tuning the pre-trained model

RAG

The RAG approach enhances LLM capabilities by integrating a retrieval system. This system extracts relevant document fragments from a large corpus, giving the model an external knowledge base to consult when generating more accurate and detailed responses (paper).

Different implementation methods exist: RAG-Sequence uses the same retrieved document for the entire generated sequence, while RAG-Token can condition each generated token on a different document. Practical references are available in this video series by Donato Capitella.

Central to both variants is the concept of vector retrieval, explored in this book.
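To make the retrieval step concrete, here is a minimal sketch of vector retrieval: documents and query are embedded as vectors, and the fragments closest to the query (by cosine similarity) are returned. The bag-of-words embedding, the `retrieve` helper, and the toy corpus are illustrative assumptions; a real RAG system would use a trained dense-embedding model and a vector index.

```python
import math

def embed(text, vocab):
    # Toy bag-of-words embedding over a fixed vocabulary; a real RAG
    # system would use a trained dense-embedding model instead.
    words = text.lower().split()
    v = [float(words.count(term)) for term in vocab]
    norm = math.sqrt(sum(x * x for x in v)) or 1.0
    return [x / norm for x in v]

def cosine(a, b):
    # Vectors are already normalized, so the dot product is the cosine.
    return sum(x * y for x, y in zip(a, b))

def retrieve(query, corpus, k=1):
    # Return the k corpus fragments most similar to the query.
    vocab = sorted({w for doc in corpus for w in doc.lower().split()})
    q = embed(query, vocab)
    ranked = sorted(corpus, key=lambda d: cosine(q, embed(d, vocab)),
                    reverse=True)
    return ranked[:k]

corpus = [
    "fine-tuning adapts a pre-trained model with additional training",
    "retrieval provides the model with an external knowledge base",
    "quantization reduces the memory footprint of model weights",
]
print(retrieve("which external knowledge base does retrieval use", corpus))
```

The retrieved fragments would then be prepended to the prompt so the model can consult them during generation.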

Fine-tuning

Fine-tuning adapts a pre-trained LLM to a specific task through additional training on a smaller, focused dataset.

Key approaches:

  • RLHF — Reinforcement Learning from Human Feedback
  • DPO — Direct Preference Optimization, which removes the need for a separate reward model
  • Self-Rewarding — the model itself provides the reward signal, which is iteratively improved
  • LoRA — Low-Rank Adaptation, which freezes the pre-trained weights and trains only small low-rank update matrices
  • QLoRA — LoRA applied on top of a quantized base model
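The LoRA idea from the list above can be sketched in a few lines: the pre-trained weight matrix W stays frozen, and only two low-rank factors B and A are trained, so the effective weight becomes W + BA. The dimensions and initialization below are illustrative assumptions, not taken from any specific implementation.

```python
import numpy as np

rng = np.random.default_rng(0)

d, r = 512, 8                        # hidden size and LoRA rank (r << d), assumed values
W = rng.normal(size=(d, d))          # frozen pre-trained weight
A = rng.normal(size=(r, d)) * 0.01   # trainable low-rank factor
B = np.zeros((d, r))                 # trainable, zero-initialized so the
                                     # adapter contributes nothing at the start

def adapted_forward(x):
    # Original path plus the low-rank update: (W + B @ A) @ x,
    # computed without ever materializing the full d x d update.
    return W @ x + B @ (A @ x)

# Trainable parameters: 2*d*r for the adapter instead of d*d for full fine-tuning.
full_params = d * d        # 262,144
lora_params = 2 * d * r    # 8,192 — a 32x reduction at this rank
```

Because B is zero-initialized, the adapted model is exactly the pre-trained model before training begins; QLoRA applies the same scheme with W stored in quantized form.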

A practical example of fine-tuning LLaMA2 is available on Maxime Labonne’s blog.

RAG vs Fine-Tuning

The article RAG vs Fine-Tuning compares the performance obtained by applying RAG, fine-tuning, or both. RAG is effective when answers depend on contextually relevant data; fine-tuning teaches the model domain-specific skills. The two methodologies are complementary.