llms.txt

llms.txt is a simple text file (often at /llms.txt) that tells LLM crawlers which paths matter for training or retrieval — a lightweight complement to robots.txt for GEO.
  • Think of it as a curated map: “read these sections first, ignore noise.”

  • Works alongside robots.txt rules for GPTBot, PerplexityBot, ClaudeBot, etc.
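For concreteness, here is a minimal llms.txt in the shape most parsers expect: an H1 title, a blockquote summary, and H2 sections containing Markdown link lists, following the llmstxt.org proposal. The domain and URLs are hypothetical:

```markdown
# Example Corp

> Example Corp sells widgets. Canonical product, pricing, and security pages are listed below.

## Products

- [Widget overview](https://example.com/products/widget): what the product does and who it is for
- [Pricing](https://example.com/pricing): canonical plans and limits

## Docs

- [Security & compliance](https://example.com/security): data handling and subprocessors
- [FAQ](https://example.com/faq): common pre-sales questions

## Optional

- [Blog](https://example.com/blog): announcements; lower priority for retrieval
```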

Definition

llms.txt is a community-driven convention (proposed by Jeremy Howard of Answer.AI in 2024 and adopted informally since) for publishing a short manifest of the URLs or sections that brand owners want language-model systems to prioritise. It does not replace robots.txt or legal contracts, but it reduces ambiguity about what matters: canonical product and pricing pages rather than tag archives, marketing landing pages rather than legal boilerplate.

How it's computed

There is no formal standard and parsers differ by vendor. Files are usually plain Markdown or plaintext with headings and bullet lists of URLs. Crawlers that honour the file may fetch the listed paths more often or weight them higher during retrieval.
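As a rough illustration of what a vendor-side parser might do, here is a minimal sketch that fetches /llms.txt and pulls out the Markdown-linked URLs. This assumes the link-list convention shown above; the function names and example.com URL are hypothetical, not any vendor's actual implementation:

```python
import re
from urllib.request import urlopen

# Matches Markdown links of the form [title](https://...)
MD_LINK = re.compile(r"\[([^\]]*)\]\((https?://[^\s)]+)\)")

def fetch_llms_txt(base_url: str) -> str:
    """Fetch /llms.txt from the site root (error handling omitted for brevity)."""
    with urlopen(base_url.rstrip("/") + "/llms.txt") as resp:
        return resp.read().decode("utf-8")

def listed_urls(text: str) -> list[tuple[str, str]]:
    """Return (title, url) pairs for every Markdown link in the file."""
    return [(m.group(1), m.group(2)) for m in MD_LINK.finditer(text)]

# Example: print the curated paths a crawler might prioritise.
for title, url in listed_urls(fetch_llms_txt("https://example.com")):
    print(f"{title}: {url}")
```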

How it works in practice

Practical checklist

  • Keep the file small and stable; link to canonical product, pricing, security, and FAQ hubs.
  • Update dates when facts change so retrieval systems can prefer fresher URLs.
  • Pair with schema.org markup so answers have both narrative and structured hooks (see the JSON-LD sketch after this list).
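As one example of that schema.org pairing, a minimal FAQPage JSON-LD block could sit on an FAQ hub listed in llms.txt; the question, answer, and URL below are made up for illustration:

```html
<script type="application/ld+json">
{
  "@context": "https://schema.org",
  "@type": "FAQPage",
  "mainEntity": [{
    "@type": "Question",
    "name": "What does the widget cost?",
    "acceptedAnswer": {
      "@type": "Answer",
      "text": "Plans start at $10/month; see https://example.com/pricing for current tiers."
    }
  }]
}
</script>
```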

How to read it

If models still ignore your content, the blocker is usually robots.txt, authentication, or thin content; llms.txt cannot override a hard Disallow.
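To make that precedence concrete, a robots.txt fragment like the one below blocks GPTBot from /internal/ regardless of anything llms.txt says, while leaving the curated marketing tree crawlable. The paths are hypothetical, and note that Allow is widely but not universally honoured:

```text
# robots.txt: crawler directives take precedence over llms.txt hints
User-agent: GPTBot
Disallow: /internal/
Allow: /products/
Allow: /pricing

User-agent: PerplexityBot
Disallow: /internal/
```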

When to use

  • After a GEO audit when crawl paths are messy.
  • When you want product marketing managers (PMMs) to own a single manifest without editing nginx maps.
  • Before large site migrations (point crawlers to the new canonical tree).