Retrieval-Augmented Generation (RAG)
A complete guide to RAG: Embeddings, Vector Databases, Chunking, and Semantic Search.
What is RAG?
Retrieval-Augmented Generation (RAG) is a technique that gives an LLM access to data it was not trained on, such as your private documents. Before the question is sent to the LLM, the system searches a database for relevant documents and injects them into the prompt, so the model can answer from the retrieved text.
Data Concept
User: 'What is our refund policy?' System: [Searches Database for 'Refund'] -> Finds Document -> Sends (Question + Document) to LLM.
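The flow above can be sketched in a few lines of Python. This is a minimal illustration, not a production design: the document texts, the `retrieve` and `build_prompt` helpers, and the prompt wording are all hypothetical, and retrieval here is plain keyword overlap rather than semantic search. The final call to an LLM is left out because it depends on the provider you use.

```python
import re

# Hypothetical in-memory "database"; these document texts are made up.
DOCUMENTS = [
    "Refund policy: customers may request a refund within 30 days of purchase.",
    "Shipping policy: orders ship within 2 business days.",
    "Privacy policy: we do not sell customer data.",
]


def tokenize(text):
    """Lowercase the text and return its set of alphabetic words."""
    return set(re.findall(r"[a-z]+", text.lower()))


def retrieve(question, documents, top_k=1):
    """Rank documents by how many words they share with the question."""
    question_words = tokenize(question)
    ranked = sorted(
        documents,
        key=lambda doc: len(question_words & tokenize(doc)),
        reverse=True,
    )
    return ranked[:top_k]


def build_prompt(question, context_docs):
    """Combine the retrieved documents and the question into one prompt."""
    context = "\n".join(f"- {doc}" for doc in context_docs)
    return (
        "Answer the question using only the context below.\n\n"
        f"Context:\n{context}\n\n"
        f"Question: {question}"
    )


question = "What is our refund policy?"
retrieved = retrieve(question, DOCUMENTS)
prompt = build_prompt(question, retrieved)
print(prompt)  # This string is what would be sent to the LLM.
```

The point of the sketch is the order of operations: search first, then build a prompt that contains both the question and the retrieved document.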
Interview Preparation
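Why use RAG instead of Fine-Tuning? (Answer: RAG is usually cheaper, reduces hallucinations by grounding answers in retrieved text, allows access control at retrieval time, and makes updates fast: changing the stored documents changes the answers without retraining the model.)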
Embeddings and Vector Databases
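To search text semantically (by meaning, not just by exact keyword), text is converted into 'Embeddings': fixed-length lists of numbers produced by an embedding model. These embeddings are stored in a Vector Database (such as Pinecone) or in a vector extension for an existing database (such as pgvector for PostgreSQL).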
Data Concept
With a good embedding model, the words 'Dog' and 'Puppy' map to very similar embedding vectors, so a search for 'Dog' also finds text that only mentions 'Puppy'.
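To make the idea concrete, here is a toy in-memory vector store. Everything in it is an assumption for illustration: the three-dimensional vectors are hand-picked rather than produced by a real embedding model, and `ToyVectorStore` is a hypothetical class, not the API of Pinecone or pgvector. It scales each vector to unit length and ranks stored items by dot product with the query.

```python
import math

# Hypothetical 3-dimensional embeddings, hand-picked for illustration.
# Real embedding models produce hundreds or thousands of dimensions.
TOY_EMBEDDINGS = {
    "dog": [0.90, 0.80, 0.10],
    "puppy": [0.85, 0.90, 0.15],
    "car": [0.10, 0.20, 0.95],
}


def normalize(vector):
    """Scale a vector to length 1 so only its direction matters."""
    length = math.sqrt(sum(x * x for x in vector))
    return [x / length for x in vector]


class ToyVectorStore:
    """Hypothetical stand-in for a vector database."""

    def __init__(self):
        self._items = []  # list of (text, unit_vector) pairs

    def add(self, text, vector):
        self._items.append((text, normalize(vector)))

    def search(self, query_vector, top_k=2):
        """Return the top_k stored texts, most similar first."""
        query = normalize(query_vector)
        scored = [
            (sum(q * v for q, v in zip(query, vector)), text)
            for text, vector in self._items
        ]
        scored.sort(reverse=True)
        return [text for _, text in scored[:top_k]]


store = ToyVectorStore()
for word in ("puppy", "car"):
    store.add(word, TOY_EMBEDDINGS[word])

# Query with the vector for "dog"; "puppy" ranks above "car".
results = store.search(TOY_EMBEDDINGS["dog"], top_k=1)
print(results)
```

A real vector database does the same ranking, but uses approximate nearest-neighbor indexes so it does not have to compare the query against every stored vector.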
Interview Preparation
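What is Cosine Similarity? (Answer: It is a metric that measures how similar two embedding vectors are by taking the cosine of the angle between them. It ranges from -1 (opposite directions) to 1 (same direction) and ignores the length of the vectors.)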
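A short sketch of the calculation, using only the standard library. The function name `cosine_similarity` and the sample vectors are illustrative choices. The formula is the dot product of the two vectors divided by the product of their lengths.

```python
import math


def cosine_similarity(a, b):
    """Return dot(a, b) / (|a| * |b|) for two equal-length vectors."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same number of dimensions")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        raise ValueError("cosine similarity is undefined for a zero vector")
    return dot / (norm_a * norm_b)


# Same direction, different length: the score is close to 1.
same_direction = cosine_similarity([1.0, 2.0], [2.0, 4.0])

# Perpendicular vectors: the score is 0.
orthogonal = cosine_similarity([1.0, 0.0], [0.0, 1.0])

# Opposite directions: the score is close to -1.
opposite = cosine_similarity([1.0, 2.0], [-1.0, -2.0])

print(same_direction, orthogonal, opposite)
```

The first example shows why the metric suits embeddings: doubling a vector's length does not change its score, because only the direction is compared.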