What Is llms.txt?
Nearly every website publishes a robots.txt file that tells search engine crawlers where they can and cannot go. But AI systems — ChatGPT, Claude, Perplexity, Google AI Overviews, and the growing wave of autonomous AI agents — don’t use robots.txt the same way. They need a different kind of guidance.
That’s where llms.txt comes in. Proposed by Jeremy Howard and documented at llmstxt.org, it’s a simple markdown file that sits at your site root (e.g. yoursite.com/llms.txt) and tells AI systems three things: who you are, what your site is about, and which pages are most important. Think of it as your site’s elevator pitch to every AI system that encounters it.
The critical distinction: robots.txt is an exclusion protocol — it tells crawlers what they cannot access. llms.txt is a curation protocol — it tells AI agents what you want them to find first. You need both, and they serve completely different purposes. If robots.txt is the bouncer at the door, llms.txt is the concierge who says “let me show you the best rooms.”
Why llms.txt Matters for AI Visibility
The sceptics have a point. As of early 2026, no major AI platform has officially committed to parsing llms.txt as part of its retrieval pipeline. Google’s John Mueller has repeatedly called it unnecessary. An SE Ranking study of 300,000 domains found no measurable impact on AI citations. So why bother?
Because over 844,000 websites have already implemented it — including Anthropic (the company behind Claude), Cloudflare, Stripe, Vercel, and hundreds of enterprise technology companies. The same early-adoption pattern played out with sitemap.xml, structured data, and Core Web Vitals: the specification existed before the platforms officially supported it, and the sites that implemented early were ready when support arrived.
But the practical value extends beyond the file itself. The act of creating an llms.txt forces a content audit that most sites desperately need. You have to answer: which pages represent our deepest expertise? Which content would we want an AI system to cite? Which pages are authoritative enough to stand as primary references? That editorial discipline — deciding what belongs and what doesn’t — often reveals content gaps, outdated pages, and structural weaknesses that improve your broader LLM Optimisation regardless of whether any AI platform ever parses the file directly.
The file also serves a more immediate function in an AI Agent Optimisation (AAO) context. As autonomous AI agents begin researching vendors, comparing services, and making recommendations on behalf of decision-makers, they need efficient ways to understand what a site offers. A curated llms.txt file gives an agent a structured overview in seconds — rather than forcing it to crawl and parse dozens of pages to build the same understanding.
The llms.txt Specification
The format is deliberately simple. It’s a plain text file written in markdown — the same lightweight formatting that developers use for documentation and that LLMs already parse natively. The specification defines a clear hierarchy:
H1 heading (#) — your site or company name. One per file, always first. This establishes the entity identity.
Blockquote (>) — a 2-3 sentence description of who you are, what you do, and why you’re authoritative. This is the first thing an AI system reads, so make it count. Write it as if you’re introducing your company to someone who knows nothing about you.
H2 sections (##) — logical groupings of your content. Services, Guides, Case Studies, About — whatever structure reflects your site’s information architecture.
Markdown links with descriptions — each entry follows the pattern - [Page Title](URL): One-line description of what this page covers. The description is critical — it tells the AI system what it will find before it follows the link.
Here’s the basic structure:
# Your Company Name
> A 2-3 sentence description of your business,
> what it does, and why it's authoritative.
## Core Services
- [Service Name](https://yoursite.com/service/): One-line description
- [Another Service](https://yoursite.com/another/): Description here
## Guides & Resources
- [Complete Guide](https://yoursite.com/guide/): What this guide covers
- [Case Study](https://yoursite.com/case-study/): Key outcome and metrics
## About & Contact
- [About Us](https://yoursite.com/about/): Company background
- [Contact](https://yoursite.com/contact/): How to get in touch
Our llms.txt: A Worked Example
Rather than relying on generic examples, here’s what our actual llms.txt file looks like. You can view the live version at seostrategy.co.uk/llms.txt — and we documented the entire implementation process in our llms.txt case study.
# SEO Strategy Ltd — llms.txt
# AI-readable content index for seostrategy.co.uk
# Last updated: February 2026 | Theme v5.0.9
> SEO Strategy Ltd
> AI-powered SEO consultancy based in Southampton, UK.
> Specialising in LLM optimisation, entity SEO, schema
> implementation, and AI visibility systems.
## Core Services
- /technical-seo/: Site architecture, crawlability & indexation
- /content-seo/: Strategy, clusters & content optimisation
- /on-page-seo/: Content optimisation & technical elements
- /off-page-seo/: Authority building & digital PR
## AI & LLM Services
- /llm-optimisation/: AIO, AEO & GEO for AI visibility
- /llm-optimisation/aio/: AI Overview Optimisation
- /llm-optimisation/aeo/: Answer Engine Optimisation
- /llm-optimisation/geo/: Generative Engine Optimisation
- /entity-seo/: Knowledge graph & brand authority
- /schema-structured-data/: JSON-LD markup for search & AI
The full file contains 76 curated URLs across 12 sections. Notice what’s included: every service page, every guide, every case study, every blog post, location pages, and about/contact. Notice what’s excluded: the homepage (AI agents land there anyway), individual FAQ anchors, tag archives, and utility pages. Every URL points to a live, non-redirecting page — no 301 chains, no broken links, no contradictory signals.
The editorial decisions matter as much as the format. We excluded marketing-heavy pages that don’t contain substantive information. We excluded frequently changing content that might go stale between updates. We grouped pages by topic cluster rather than site hierarchy, because that’s how AI systems organise knowledge internally. And we wrote descriptions that answer “what will an AI system learn from this page?” rather than generic marketing copy.
How to Implement llms.txt on WordPress
WordPress gives you direct access to your site root, which makes implementation straightforward. You have three options, ranging from manual to fully automated.
Option 1: Manual Upload via FTP/SFTP
The simplest approach. Create your llms.txt file locally, connect to your server via FTP (FileZilla works well), navigate to the root directory where wp-config.php lives, and upload the file. Visit yoursite.com/llms.txt to confirm it’s accessible. This takes 5 minutes but requires manual updates whenever your content changes.
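Once the file is uploaded, it is worth confirming it is actually reachable before assuming the job is done. A quick check with curl, where yoursite.com is a placeholder for your own domain:

```shell
# Confirm the file is accessible; the first header line should report 200.
# yoursite.com is a placeholder - substitute your own domain.
URL="https://yoursite.com/llms.txt"
curl -sI "$URL" | head -1

# Preview the first lines of the served markdown to confirm the content
curl -s "$URL" | head -10
```

If you see a 404, the file is probably sitting in a subdirectory rather than the true document root next to wp-config.php.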
Option 2: Theme Rewrite Rule
For developers who want the file served through WordPress rather than as a static file, you can add a rewrite rule in your theme’s functions.php. This stores the content in the database and serves it dynamically — useful if your host restricts file writing or you want to generate the content programmatically. The downside: you need to flush permalinks after adding the rule, and you’re adding a database query to serve what should be a static file.
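A minimal sketch of what such a rewrite rule could look like. This is an illustrative implementation, not a drop-in: it assumes the file body is stored in a hypothetical `llms_txt_content` option, and you must flush permalinks (Settings → Permalinks → Save) after adding it.

```php
// Sketch: serve /llms.txt dynamically from WordPress.
// Assumes the markdown body is stored in a hypothetical
// 'llms_txt_content' option - adapt to your own storage.

// Map /llms.txt onto a custom query variable.
add_action( 'init', function () {
    add_rewrite_rule( '^llms\.txt$', 'index.php?llms_txt=1', 'top' );
} );

// Register the query variable so WordPress preserves it.
add_filter( 'query_vars', function ( $vars ) {
    $vars[] = 'llms_txt';
    return $vars;
} );

// Intercept the request and emit the file with the right headers.
add_action( 'template_redirect', function () {
    if ( get_query_var( 'llms_txt' ) ) {
        header( 'Content-Type: text/markdown; charset=utf-8' );
        header( 'X-Robots-Tag: noindex' );
        echo get_option( 'llms_txt_content', '' );
        exit;
    }
} );
```

Note the trade-off described above: every request now runs through WordPress and hits the database, so a static file remains the better default where the host allows it.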
Option 3: Use a Plugin
We built a free WordPress plugin — llms.txt Generator — that automates the entire process. On activation, it scans your published pages and posts, auto-categorises them into logical sections, pulls descriptions from your SEO plugin (RankMath, Yoast, AIOSEO, SEOPress, or The SEO Framework), and writes the file to your site root. A drag-and-drop admin interface lets you curate sections, reorder pages, override descriptions, and preview the output. Auto-regeneration keeps the file current whenever you publish or update content.
The plugin also validates your configuration — checking for broken links, noindex conflicts, canonical mismatches, duplicate URLs, and file size warnings. It’s the same tool-first approach we took with the SEO Strategy website build and the MDS drink driving calculator: build the tool, use it yourself, then offer it to others.
Response Headers
Whichever method you use, set two response headers on the file:
Content-Type: text/markdown; charset=utf-8
X-Robots-Tag: noindex
The Content-Type header tells AI agents they’re receiving markdown, not HTML. The X-Robots-Tag: noindex prevents the file from appearing in Google search results — it’s intended for AI systems, not human searchers. You can set these in your .htaccess, Nginx config, or through your theme/plugin.
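As a sketch, the headers can be set like this at the server level. The Apache version requires mod_headers; the Nginx version clears the built-in extension-to-type map for this one location so the markdown type actually applies.

```
# Apache (.htaccess) - requires mod_headers
<Files "llms.txt">
    Header set Content-Type "text/markdown; charset=utf-8"
    Header set X-Robots-Tag "noindex"
</Files>
```

```
# Nginx - inside your server block
location = /llms.txt {
    types { }
    default_type "text/markdown; charset=utf-8";
    add_header X-Robots-Tag "noindex";
}
```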
How to Implement llms.txt on HubSpot
HubSpot is a different story. Unlike WordPress, HubSpot doesn’t give you root directory access. Files uploaded through the CMS go to HubSpot’s CDN, not your domain root. As of early 2026, HubSpot has no native llms.txt support and has publicly stated they’re “monitoring” the situation but not building anything yet.
The workaround is functional but inelegant:
Step 1: Create your llms.txt file locally using a text editor, following the markdown specification above.
Step 2: Upload to HubSpot File Manager. Go to Marketing → Files and Templates → Files. Upload your llms.txt file and set it to Public. Copy the file URL — it will look something like https://f.hubspotusercontent10.net/hubfs/123456/llms.txt.
Step 3: Create a URL redirect. Go to Settings → Website → Domains & URLs → URL Redirects. Add a 301 redirect from /llms.txt to your HubSpot CDN file URL.
Step 4: Test. Visit yoursite.com/llms.txt — you should see the file content after the redirect resolves.
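The redirect chain can be verified from the command line as well. The -L flag makes curl follow the 301 to the CDN file, so you can confirm both the hop and the final status in one pass (yoursite.com is a placeholder):

```shell
# Follow the /llms.txt redirect end-to-end and show the key headers.
# Expect a 301 with a Location header, then a final 200 response.
# yoursite.com is a placeholder - substitute your own domain.
URL="https://yoursite.com/llms.txt"
curl -sIL "$URL" | grep -iE "^(HTTP|location:|content-type:)" || true
```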
The limitations are significant. There’s no auto-regeneration: every time you add or remove a page, you have to manually update the file and re-upload it. The redirect also adds a hop that some AI bots may not follow. HubSpot Enterprise users with access to serverless functions can serve the file dynamically, but that’s an expensive solution for a text file. For HubSpot sites with significant AI visibility goals, this is one of those areas where the platform’s walled-garden approach creates a genuine disadvantage compared to self-hosted WordPress.
Other CMS Platforms
Squarespace and Wix both allow file uploads to the root directory, though the process varies. Squarespace users can place files via the Files panel; Wix users typically need to use the /public folder approach. Neither platform offers native llms.txt support or auto-regeneration.
Shopify doesn’t allow direct root file access. The workaround is similar to HubSpot: host the file content as a page or asset and configure a redirect. Some developers serve it via a Liquid template with the text/markdown content type.
Custom/headless setups (Next.js, Gatsby, Hugo, Jekyll) — you have full control. Add llms.txt to your static assets or public directory, configure headers in your server/CDN config, and automate generation as part of your build pipeline.
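For static builds, generation can be a small script in the pipeline. Here is a sketch in Python; the site URL, section names, paths and descriptions are placeholder assumptions, and you would write the result into your framework’s static output directory (e.g. public/).

```python
# Sketch: generate llms.txt at build time from a curated page list.
# SITE, section names, paths and descriptions are placeholder assumptions.
SITE = "https://yoursite.com"

SECTIONS = {
    "Core Services": [
        ("/technical-seo/", "Site architecture, crawlability & indexation"),
        ("/content-seo/", "Strategy, clusters & content optimisation"),
    ],
    "Guides & Resources": [
        ("/complete-guide/", "What this guide covers"),
    ],
}

def build_llms_txt(name: str, description: str, sections: dict) -> str:
    """Assemble the markdown body: H1, blockquote, then H2 link sections."""
    lines = [f"# {name}", ""]
    # Each description line becomes one blockquote line.
    lines += [f"> {line}" for line in description.splitlines()]
    lines.append("")
    for heading, pages in sections.items():
        lines.append(f"## {heading}")
        for path, desc in pages:
            # Derive a readable title from the URL slug.
            title = path.strip("/").replace("-", " ").title()
            lines.append(f"- [{title}]({SITE}{path}): {desc}")
        lines.append("")
    return "\n".join(lines)

# In a real pipeline, write this to public/llms.txt instead of printing.
print(build_llms_txt("Your Company Name",
                     "A 2-3 sentence description of the business.",
                     SECTIONS))
```

Hooked into the build step, the file regenerates on every deploy, which removes the staleness problem entirely.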
What to Include (and What to Leave Out)
The most common mistake is treating llms.txt like a sitemap — dumping every URL on your site into the file. That defeats the purpose entirely. An AI agent benefits more from a focused list of 20-30 carefully curated pages than a 200-page dump of everything.
Always include: Core service or product pages. In-depth guides and documentation. Case studies with specific results and methodology. Technical resources that demonstrate expertise. About and contact pages that establish entity identity.
Usually include: Blog posts that represent substantial, evergreen analysis (not news commentary or roundups). Location pages for businesses that serve specific areas. FAQ hubs or knowledge bases.
Usually exclude: The homepage (AI agents typically land there anyway). Tag and category archives. Author pages. Cart, checkout, and account pages. Privacy/cookie policies (unless they contain substantive data handling information). Blog posts that are time-sensitive or thin. Landing pages built for PPC rather than informational value.
The editorial test: For each page, ask “if an AI system could only read 25 pages from my site, would this be one of them?” If the answer isn’t a confident yes, leave it out. You can always add it later — but starting lean is better than starting bloated.
Synchronising with robots.txt and Sitemaps
Your llms.txt, robots.txt and XML sitemap should tell a consistent story. Three rules:
Don’t include pages in llms.txt that are blocked in robots.txt. If you’re telling crawlers they can’t access a page, don’t simultaneously tell AI systems it’s important. That’s a contradictory signal.
Don’t include pages marked noindex. If you’ve told Google not to index a page, including it in llms.txt sends mixed signals about whether you want the content discovered.
Add an llms.txt reference to robots.txt. This helps AI systems discover your file even if they don’t check the standard location. Add this line to your robots.txt:
# llms.txt — AI content guidance
Llms-Txt: https://yoursite.com/llms.txt
Common Mistakes
Including too many URLs. If your file exceeds 50KB, some AI systems may truncate it. Prioritise ruthlessly.
Stale URLs. Pages that have been deleted, redirected, or password-protected. Every URL should resolve to a live, accessible page with a 200 status code. Run validation periodically — our WordPress plugin checks for this automatically.
Generic descriptions. “Click here to learn more” tells an AI system nothing. Write descriptions that answer: “What specific information will an AI find on this page?” Compare: “Our services page” vs “Full-service SEO consultancy covering technical audits, content strategy, link building and AI visibility for B2B companies.” The second version gives an AI agent everything it needs to decide whether to follow the link.
Missing the blockquote. The site description at the top is the first thing AI systems read. Skipping it or writing a vague tagline wastes your most valuable real estate.
Never updating. If you publish new service pages, case studies or guides and don’t update llms.txt, you’re actively hiding your best new content from AI systems. Set a reminder — monthly at minimum, or use auto-regeneration if your CMS supports it.
Contradicting other signals. Including pages blocked by robots.txt, marked noindex, or behind login walls. AI systems will notice the contradiction and may lose trust in the file’s reliability.
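Several of these checks are mechanical and easy to script. The sketch below, in Python, flags duplicate URLs and files over the 50KB truncation threshold mentioned above, with an optional live-status check that also catches redirects; function names are our own, and the network check requires internet access.

```python
# Sketch: audit an llms.txt body for duplicate URLs and oversized files,
# with an optional per-URL liveness check. Function names are illustrative.
import re
from urllib.request import Request, urlopen

# Matches markdown list entries of the form "- [Title](URL)".
LINK_RE = re.compile(r"^-\s*\[([^\]]+)\]\(([^)]+)\)", re.MULTILINE)
MAX_BYTES = 50 * 1024  # some AI systems may truncate beyond ~50KB

def audit_llms_txt(text: str) -> dict:
    """Return URL count, duplicate URLs, and an oversized-file flag."""
    urls = [m.group(2) for m in LINK_RE.finditer(text)]
    seen, duplicates = set(), []
    for url in urls:
        if url in seen:
            duplicates.append(url)
        seen.add(url)
    return {
        "url_count": len(urls),
        "duplicates": duplicates,
        "oversized": len(text.encode("utf-8")) > MAX_BYTES,
    }

def check_live(url: str, timeout: int = 10) -> bool:
    """True if the URL answers 200 without redirecting (requires network)."""
    req = Request(url, method="HEAD", headers={"User-Agent": "llms-txt-audit"})
    try:
        with urlopen(req, timeout=timeout) as resp:
            # urlopen follows redirects, so a changed URL means a 301/302 hop.
            return resp.status == 200 and resp.geturl() == url
    except Exception:
        return False
```

Run something like this monthly, or on every content publish, and the stale-URL and duplicate problems largely take care of themselves.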
llms.txt and the Broader AI Visibility Stack
llms.txt doesn’t work in isolation. It’s one component of a broader AI visibility strategy that includes:
Schema markup makes your content machine-understandable. JSON-LD structured data tells AI systems what your entities are, what your pages contain, and how they relate to each other. If llms.txt is the index, schema is the metadata catalogue.
Entity SEO establishes your brand as a recognisable entity in knowledge graphs. When AI systems can confidently identify who you are — through consistent NAP data, Wikidata entries, knowledge panel signals, and structured entity declarations — they cite you with higher confidence.
Cloudflare Markdown for Agents automatically converts your HTML pages to clean markdown when AI agents request them, reducing token consumption and improving comprehension. llms.txt tells AI which pages to look at; Cloudflare Markdown makes those pages lightweight and easy to process. They’re complementary, not competing.
AI Agent Optimisation (AAO) is the umbrella discipline that encompasses all of this: making your site discoverable, understandable, and selectable by autonomous AI agents. llms.txt is the front door of your AAO implementation — the first touchpoint between an AI agent and your curated content.
We implemented all of these on our own site before recommending them to clients. Our llms.txt case study documents the implementation process, editorial decisions, and what we’re tracking. Our vibe coding case study covers the broader AI-assisted development approach that makes this level of technical implementation commercially viable.