Glossary
RAG (Retrieval-Augmented Generation)
Retrieval + generation: the model reads sources you control, then composes the reply.
Definition
RAG (Retrieval-Augmented Generation) is an architecture in which the system first retrieves relevant chunks from a knowledge base (vector DB, search index, CRM exports, support articles), and only then does the model write the user-facing answer, grounded in those chunks. This differs from “pure parametric” answers, which come only from weights learned at training time. For brand teams, RAG is why updating docs, schema, and crawlable FAQs can change what ChatGPT-style products say without retraining the base model.
How it's computed
Operationally: documents are chunked and embedded ahead of time; at query time the system embeds the query, runs a similarity search against the stored chunks, attaches the top-k passages to the prompt as context, and asks the model to stay faithful to those passages. Quality depends on chunking, deduplication, recency filters, and guardrails for the case where no good chunk exists.
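The steps above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not how any particular product implements RAG: the bag-of-words `embed` function stands in for a real embedding model, the example.com URLs and their text are made up, and every name and default here (`chunk`, `build_index`, `retrieve`, `build_prompt`, `k`, `min_score`) is hypothetical. Recency filtering is left out.

```python
import math
import re
from collections import Counter

# Hypothetical knowledge base: source URL -> text (made-up content).
DOCS = {
    "https://example.com/pricing": "The Pro plan costs 49 USD per month and includes five seats.",
    "https://example.com/regions": "The service is available in the EU and US regions.",
}

def chunk(text, max_words=40):
    """Split text into fixed-size word windows (a deliberately naive chunker)."""
    words = text.split()
    return [" ".join(words[i:i + max_words]) for i in range(0, len(words), max_words)]

def embed(text):
    """Toy embedding: lowercase word counts. A real system would call an embedding model."""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))

def cosine(a, b):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[t] * b[t] for t in a if t in b)
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0

def build_index(docs):
    """Chunk and embed every document, skipping exact duplicate chunks."""
    seen, index = set(), []
    for url, text in docs.items():
        for piece in chunk(text):
            if piece in seen:
                continue
            seen.add(piece)
            index.append((url, piece, embed(piece)))
    return index

def retrieve(index, query, k=2, min_score=0.2):
    """Return up to k (score, url, chunk) hits whose similarity is at least min_score."""
    q = embed(query)
    scored = sorted(((cosine(q, vec), url, piece) for url, piece, vec in index), reverse=True)
    return [hit for hit in scored[:k] if hit[0] >= min_score]

def build_prompt(query, hits):
    """Attach retrieved passages as context; return None when nothing good was retrieved."""
    if not hits:
        return None  # guardrail: the caller should decline or escalate instead of guessing
    context = "\n".join(f"[{url}] {piece}" for _, url, piece in hits)
    return (
        "Answer using only the sources below and cite their URLs.\n\n"
        f"Sources:\n{context}\n\nQuestion: {query}"
    )

index = build_index(DOCS)
question = "How much does the Pro plan cost per month?"
print(build_prompt(question, retrieve(index, question)))
```

Because each hit carries its source URL into the prompt, the same structure also answers the auditability question of which page backed a given reply. The `min_score` cutoff is the guardrail: a query that shares no vocabulary with the knowledge base yields no hits, so `build_prompt` returns `None` rather than handing the model irrelevant context.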
How it works in practice
What to optimize
How to read it
RAG reduces but does not eliminate hallucinations: if the retriever surfaces a wrong snippet, the model may still amplify it. Pair RAG initiatives with fanout queries to see whether fixes hold across phrasings.
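One way to check whether a fix holds across phrasings is to run several wordings of the same question through the retriever and count how often the intended source comes back. The sketch below assumes that "fanout queries" here means a set of alternative phrasings of one question. It uses a toy word-overlap retriever and made-up example.com pages; `top_source` and `fanout_coverage` are hypothetical helper names, not part of any library.

```python
import re

# Hypothetical knowledge base: source URL -> text (made-up content).
DOCS = {
    "https://example.com/pricing": "The Pro plan costs 49 USD per month and includes five seats.",
    "https://example.com/regions": "The service is available in the EU and US regions.",
}

def tokens(text):
    """Lowercased set of alphanumeric words."""
    return set(re.findall(r"[a-z0-9]+", text.lower()))

def top_source(docs, query):
    """Toy retriever: URL with the highest Jaccard word overlap, or None if nothing overlaps."""
    q = tokens(query)

    def score(url):
        d = tokens(docs[url])
        union = q | d
        return len(q & d) / len(union) if union else 0.0

    best = max(docs, key=score)
    return best if score(best) > 0 else None

def fanout_coverage(docs, phrasings, expected_url):
    """Map each phrasing to its retrieved source and compute the share that hit expected_url."""
    results = {p: top_source(docs, p) for p in phrasings}
    hit_rate = sum(1 for url in results.values() if url == expected_url) / len(phrasings)
    return results, hit_rate

PHRASINGS = [
    "How much is the Pro plan per month?",
    "Pro plan price",
    "What will I pay each billing cycle?",
]

results, hit_rate = fanout_coverage(DOCS, PHRASINGS, "https://example.com/pricing")
for phrasing, url in results.items():
    print(f"{phrasing!r} -> {url}")
print(f"hit rate: {hit_rate:.2f}")
```

A phrasing that retrieves nothing, or the wrong page, points to wording the source content does not yet cover. With a real embedding-based retriever the misses would differ, but the measurement stays the same.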
When to use
- When product facts change weekly (pricing, regions, integrations).
- When support and marketing disagree on wording and you need to unify the retrieved layer.
- When you need auditability: which URL backed this answer?