Glossary

RAG (Retrieval-Augmented Generation)

RAG lets an LLM answer using retrieved documents (your site, help center, PDFs) instead of relying only on training weights — the core pattern behind many grounded brand answers.
  • Retrieval + generation: the model reads sources you control, then composes the reply.

  • Strong RAG signals improve factual answers about your brand — see GEO and citations.

Definition

RAG (Retrieval-Augmented Generation) is an architecture where the model first pulls relevant chunks from a knowledge base (vector DB, search index, CRM exports, support articles) and only then writes the user-facing answer. That is different from “pure parametric” answers that come only from weights learned at training time. For brand teams, RAG is why updating docs, schema, and crawlable FAQs can change what ChatGPT-style products say without retraining the base model.

How it's computed

Operationally: documents are chunked and embedded; at query time the system runs a similarity search, attaches top-k passages to the prompt as context, and asks the model to stay faithful to those passages. Quality depends on chunking, deduplication, recency filters, and guardrails when no good chunk exists.
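The pipeline above can be sketched in a few lines. This is a minimal illustration, not a production retriever: the bag-of-words cosine scorer stands in for a real embedding model, the fixed-size `chunk` helper stands in for heading-aware splitting, and the "Acme" corpus, thresholds, and function names are all hypothetical.

```python
import math
from collections import Counter

def embed(text: str) -> Counter:
    # Stand-in for a real embedding model: a bag-of-words count vector.
    return Counter(text.lower().split())

def cosine(a: Counter, b: Counter) -> float:
    dot = sum(a[t] * b[t] for t in a)
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0

def chunk(doc: str, size: int = 8) -> list[str]:
    # Naive fixed-size chunking by word count; real systems split on
    # headings/paragraphs and deduplicate near-identical chunks.
    words = doc.split()
    return [" ".join(words[i:i + size]) for i in range(0, len(words), size)]

def retrieve(query: str, chunks: list[str], k: int = 2, threshold: float = 0.1) -> list[str]:
    # Similarity search: score every chunk, keep top-k above a floor.
    # An empty result is the "no good chunk exists" guardrail case.
    qv = embed(query)
    scored = sorted(((cosine(qv, embed(c)), c) for c in chunks), reverse=True)
    return [c for s, c in scored[:k] if s >= threshold]

corpus = chunk(
    "Acme Widgets ships to the EU. Pricing starts at 49 USD per month. "
    "Support is available 24/7 via chat."
)
passages = retrieve("what does acme pricing start at", corpus)
prompt = (
    "Answer using ONLY the passages below.\n\n"
    + "\n".join(f"- {p}" for p in passages)
    + "\n\nQuestion: what does Acme pricing start at?"
)
```

The key design point is the final prompt: the model is instructed to stay faithful to the attached passages rather than answer from its weights.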

How it works in practice

What to optimize

  • Source-of-truth pages — clear H1/H2, dated facts, canonical URLs the retriever can fetch.
  • llms.txt + robots — make sure AI crawlers you care about can reach the corpus you want retrieved.
  • Measurement — compare LLM-Score and quote-level citations before/after you publish or restructure content.
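The robots checkpoint in the list above is easy to verify programmatically with Python's standard-library `urllib.robotparser`. The robots.txt content and example.com URLs here are hypothetical; in practice you would fetch the live file from your own domain and test the user-agent strings of the AI crawlers you care about (e.g. OpenAI's GPTBot).

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt; a real check would fetch
# https://yourdomain.com/robots.txt instead.
robots_txt = """\
User-agent: GPTBot
Allow: /docs/
Disallow: /internal/

User-agent: *
Allow: /
"""

rp = RobotFileParser()
rp.parse(robots_txt.splitlines())

# Can the AI crawler reach the corpus you want retrieved?
docs_ok = rp.can_fetch("GPTBot", "https://example.com/docs/pricing")
internal_ok = rp.can_fetch("GPTBot", "https://example.com/internal/notes")
```

Here `docs_ok` should be true and `internal_ok` false: the pages you want in the retrieved layer must be allowed, and anything you deliberately block will never be available as grounding.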

How to read it

RAG reduces but does not eliminate hallucinations: if the retriever surfaces a wrong snippet, the model may still amplify it. Pair RAG initiatives with fanout queries to see whether fixes hold across phrasings.
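A fanout check can be sketched as: retrieve with several phrasings of the same question and see whether every phrasing lands on the same passage. The toy lexical scorer, passages, and query list below are all hypothetical stand-ins for a real retriever.

```python
def overlap_score(query: str, passage: str) -> float:
    # Toy lexical scorer standing in for embedding-based retrieval.
    q, p = set(query.lower().split()), set(passage.lower().split())
    return len(q & p) / len(q) if q else 0.0

passages = [
    "Acme pricing starts at 49 USD per month.",
    "Acme support is available 24/7 via chat.",
]

# Fan out: several phrasings a user (or an LLM product) might try.
fanout = [
    "how much does acme cost",
    "acme pricing per month",
    "what is the price of acme",
]

tops = [max(passages, key=lambda p: overlap_score(q, p)) for q in fanout]
consistent = len(set(tops)) == 1  # did every phrasing retrieve the same passage?
```

In this toy run the third phrasing drifts to the support passage, so `consistent` comes back false: exactly the failure mode fanout testing is meant to surface before users hit it.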

When to use

  • When product facts change weekly (pricing, regions, integrations).
  • When support and marketing disagree on wording — unify the retrieved layer.
  • When you need auditability: which URL backed this answer?
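The auditability case can be sketched by carrying the source URL alongside each retrieved chunk, so every numbered citation in the prompt maps back to a page. The `Passage` type, URLs, and `build_prompt` helper are hypothetical; the point is only that provenance travels with the text.

```python
from dataclasses import dataclass

@dataclass
class Passage:
    url: str   # canonical source URL, kept for the audit trail
    text: str

# Hypothetical retrieved layer: each chunk remembers where it came from.
retrieved = [
    Passage("https://example.com/pricing", "Plans start at 49 USD per month."),
    Passage("https://example.com/regions", "Available in the EU and the US."),
]

def build_prompt(question: str, passages: list[Passage]) -> str:
    # Number the passages so the model can cite [1], [2] inline, and so
    # each claim in the answer can be traced back to a URL afterwards.
    ctx = "\n".join(f"[{i + 1}] ({p.url}) {p.text}" for i, p in enumerate(passages))
    return f"Answer using only these passages:\n{ctx}\n\nQ: {question}"

prompt = build_prompt("Where is the product available?", retrieved)
audit_map = {f"[{i + 1}]": p.url for i, p in enumerate(retrieved)}
```

When the model's answer cites `[2]`, `audit_map` answers the question in the bullet above: which URL backed this answer.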