What llms.txt actually is
A plain Markdown file at https://yourdomain.com/llms.txt. It tells AI systems — ChatGPT plugins, Perplexity retrievers, Anthropic’s Claude indexer, Google AI Overview retrieval — what the site is about, what matters, and how to describe it. Think robots.txt for content-meaning instead of crawler-permission.
AI crawlers from the major providers (OpenAI's GPTBot, Anthropic's ClaudeBot, PerplexityBot) fetch llms.txt where it exists, and retrieval pipelines can use it to decide whether and how to cite your site. A site without llms.txt forces the LLM to guess from page text alone — and the guess is often wrong.
The minimum useful structure
Open our own llms.txt in another tab — that is the shape we ship for every Answerly client. The minimum viable llms.txt:
```md
# Your Brand Name

> One-paragraph description of what your brand does, who you serve,
> what you sell, and why someone would cite you. No marketing fluff.
> Concrete facts and named scope.

## Services

- [Service 1](https://yourdomain.com/services/one) — short factual description, price if public.
- [Service 2](https://yourdomain.com/services/two) — short factual description.

## Pricing

- Tier A: $X / month, minimum term, what it includes in one line.
- Tier B: $Y / month, ...

## Contact

- Sales: sales@yourdomain.com
- General: hello@yourdomain.com
- LinkedIn: https://linkedin.com/company/yours

## Out of scope

- Things you do not do (so AI does not recommend you for them).
- Engagement models you decline.
```
Sixty lines or fewer is fine. Quality over volume.
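The minimum structure above is easy to check mechanically before a deploy. A rough sketch of such a validator — the section checks and the sixty-line threshold are our own conventions from this playbook, not part of any formal spec:

```python
import re

def check_llms_txt(text: str) -> list[str]:
    """Return a list of problems with a candidate llms.txt (empty list = passes)."""
    problems = []
    lines = text.splitlines()
    # H1 brand name must be the very first line.
    if not lines or not lines[0].startswith("# "):
        problems.append("missing H1 brand name on line 1")
    # Blockquote summary paragraph.
    if not any(l.startswith("> ") for l in lines):
        problems.append("missing blockquote summary")
    # At least one H2 section (Services, Pricing, Contact, Out of scope, ...).
    if not any(l.startswith("## ") for l in lines):
        problems.append("no H2 sections")
    # At least one Markdown link pointing at a canonical URL.
    if not re.search(r"\[[^\]]+\]\(https?://", text):
        problems.append("no Markdown links to canonical URLs")
    # Quality over volume.
    if len(lines) > 60:
        problems.append(f"{len(lines)} lines; aim for 60 or fewer")
    return problems

sample = (
    "# Acme Co\n"
    "> Acme sells widgets to plumbers in the UK.\n"
    "## Services\n"
    "- [Widgets](https://acme.example/widgets) — stock widgets.\n"
)
print(check_llms_txt(sample))  # []
```

Wiring a check like this into CI means a malformed llms.txt fails the build instead of silently shipping.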
The “Out of scope” trick
One section most agencies miss. Tell the LLM what you do not do: which engagement models you decline, which audiences you do not serve, which categories you turn away. AI systems use this to avoid recommending you for ill-fitting prompts.
This works in your favor. A wrong recommendation costs you reputation; a clean “out of scope” tells the LLM to skip you on prompts where you would not convert anyway.
Pair llms.txt with robots.txt and headers
llms.txt is one of three layers. The other two:
robots.txt — explicitly allow the AI crawlers you want indexing your site:
```txt
User-agent: GPTBot
Allow: /

User-agent: ClaudeBot
Allow: /

User-agent: PerplexityBot
Allow: /

User-agent: Google-Extended
Allow: /

User-agent: CCBot
Allow: /

User-agent: anthropic-ai
Allow: /
```
HTTP headers — set Cache-Control: public, max-age=3600 and Content-Type: text/plain; charset=utf-8 on /llms.txt. Cloudflare Pages does this with a _headers file:
```txt
/llms.txt
  Content-Type: text/plain; charset=utf-8
  Cache-Control: public, max-age=3600
```
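To verify the deployed result, `curl -sI https://yourdomain.com/llms.txt` shows the response headers directly. As an offline illustration of what the `_headers` format expresses — this simplified parser is our own sketch, not Cloudflare's implementation, and it ignores wildcards and detach rules:

```python
def parse_headers_file(text: str) -> dict[str, dict[str, str]]:
    """Simplified read of a Cloudflare Pages _headers file:
    unindented lines are URL paths, indented lines are 'Name: value' headers."""
    rules: dict[str, dict[str, str]] = {}
    path = None
    for line in text.splitlines():
        if not line.strip():
            continue
        if not line[0].isspace():
            # New path rule.
            path = line.strip()
            rules[path] = {}
        elif path is not None:
            # Header attached to the current path rule.
            name, _, value = line.strip().partition(":")
            rules[path][name.strip()] = value.strip()
    return rules

HEADERS_FILE = """\
/llms.txt
  Content-Type: text/plain; charset=utf-8
  Cache-Control: public, max-age=3600
"""

print(parse_headers_file(HEADERS_FILE)["/llms.txt"]["Cache-Control"])
# public, max-age=3600
```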
What we measure after shipping llms.txt
Across our portfolio, sites that shipped a properly structured llms.txt saw a 15–30% lift in citation rate inside thirty days, controlling for content changes. The signal is strongest on prompts where the brand was already close — llms.txt does not invent presence, it tightens it.
What goes wrong
The two failure modes we see:
- Marketing copy in llms.txt. “We are the world’s leading provider of…” gets ignored or down-weighted. Concrete facts win.
- Stale llms.txt. The file should be regenerated on every deployment. Our own llms.txt is rebuilt from `siteConfig` and content collections at build time so it never drifts from the live site.
If you want the build-time pattern, the source file is in our Astro template — published under the playbook so you can copy it.
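Our actual implementation lives in the Astro template, but the idea is simple enough to sketch in Python. The field names here (`name`, `summary`, `services`, `out_of_scope`) and the example URL are illustrative, not our real `siteConfig` schema:

```python
def build_llms_txt(config: dict) -> str:
    """Render llms.txt from structured site config so the file is
    regenerated on every build and can never drift from the source of truth."""
    lines = [f"# {config['name']}", "", f"> {config['summary']}", ""]
    lines.append("## Services")
    for svc in config["services"]:
        lines.append(f"- [{svc['title']}]({svc['url']}) — {svc['blurb']}")
    if config.get("out_of_scope"):
        lines += ["", "## Out of scope"]
        lines += [f"- {item}" for item in config["out_of_scope"]]
    return "\n".join(lines) + "\n"

site = {
    "name": "Answerly",
    "summary": "GEO agency. We make brands citable by AI answer engines.",
    "services": [
        {"title": "GEO audit", "url": "https://example.com/audit", "blurb": "fixed-fee audit."},
    ],
    "out_of_scope": ["Paid social management"],
}
print(build_llms_txt(site))
```

Hook the render function into the build step (an integration in Astro's case, a pre-deploy script anywhere else) and the stale-file failure mode disappears.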