Glossary
llms.txt
Think of it as a curated map: “read these sections first, ignore noise.”
Works alongside robots.txt rules for GPTBot, PerplexityBot, ClaudeBot, etc.
Definition
llms.txt is a community-driven convention, popularised as LLM crawlers proliferated, for publishing a short manifest of the URLs or sections that site owners want language-model systems to prioritise. It does not replace robots.txt or legal contracts, but it reduces ambiguity: it can steer crawlers toward marketing landing pages rather than legal boilerplate, and toward canonical pricing pages rather than tag archives.
How it's computed
There is no formal standard, and parsers differ by vendor. Files are usually plain Markdown or plaintext, conventionally served at the site root as /llms.txt, with headings and bullet lists of URLs. Crawlers that honour the file may fetch listed paths more often or weight them higher during retrieval.
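A minimal manifest following the common shape (a title, a one-line summary, then sections of linked URLs) might look like this; the domain and page names are hypothetical:

```markdown
# Example Co

> Example Co sells a hosted analytics platform; the pages below are canonical.

## Key pages
- [Pricing](https://example.com/pricing): current plans and limits
- [Security](https://example.com/security): compliance and data handling
- [FAQ](https://example.com/faq): common pre-sales questions
```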
How it works in practice
Practical checklist
- Keep the file small and stable; link to canonical product, pricing, security, and FAQ hubs.
- Update dates when facts change so retrieval systems can prefer fresher URLs.
- Pair with schema.org markup so answers have both narrative and structured hooks.
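Because vendors parse the file differently, it helps to keep the format trivially machine-readable. A minimal sketch of the kind of URL extraction a retrieval system might run over an llms.txt file, handling both Markdown-linked and bare URLs (assumed formats, not any vendor's actual parser):

```python
import re

def extract_urls(llms_txt: str) -> list[str]:
    """Pull linked and bare URLs out of an llms.txt manifest.

    A sketch only: real crawler parsers differ by vendor, and this
    ignores headings, notes, and relative paths.
    """
    urls = []
    for line in llms_txt.splitlines():
        # Markdown link form: - [Pricing](https://example.com/pricing)
        md = re.findall(r"\]\((https?://[^)\s]+)\)", line)
        # Bare URL form: - https://example.com/pricing
        bare = re.findall(r"(?<!\()(https?://[^)\s]+)", line)
        for url in md or bare:
            if url not in urls:  # de-duplicate, keep manifest order
                urls.append(url)
    return urls

sample = """# Example Co
## Key pages
- [Pricing](https://example.com/pricing)
- https://example.com/security
"""
print(extract_urls(sample))
```

The takeaway for authors: if a simple parser like this recovers your URLs cleanly, most vendor parsers probably will too.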
How to read it
If models still ignore your content, the blocker is often robots.txt, authentication, or thin content; llms.txt cannot override a hard Disallow.
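For instance, a robots.txt like the following (hypothetical) blocks GPTBot outright, and no llms.txt entry restores that access:

```text
# A hard Disallow wins regardless of what llms.txt lists
User-agent: GPTBot
Disallow: /

User-agent: PerplexityBot
Allow: /docs/
Disallow: /
```

Check robots.txt first in any audit; llms.txt only shapes priorities among pages a crawler is already allowed to fetch.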
When to use
- After a GEO audit when crawl paths are messy.
- When you want PMMs to own a single manifest without editing nginx maps.
- Before large site migrations (point crawlers to the new canonical tree).