What Is llms.txt?
Nearly every website publishes a robots.txt file that tells search engine crawlers where they can and cannot go. But AI systems — ChatGPT, Claude, Perplexity, Google AI Overviews, and the growing wave of autonomous AI agents — don’t use robots.txt the same way. They need a different kind of guidance.
That’s where llms.txt comes in. Proposed by Jeremy Howard and documented at llmstxt.org, it’s a simple markdown file that sits at your site root (e.g. yoursite.com/llms.txt) and tells AI systems three things: who you are, what your site is about, and which pages are most important. Think of it as your site’s elevator pitch to every AI system that encounters it.
The critical distinction: robots.txt is an exclusion protocol — it tells crawlers what they cannot access. llms.txt is a curation protocol — it tells AI agents what you want them to find first. You need both, and they serve completely different purposes. If robots.txt is the bouncer at the door, llms.txt is the concierge who says “let me show you the best rooms.”
Why llms.txt Matters for AI Visibility
The sceptics have a point. As of early 2026, no major AI platform has officially committed to parsing llms.txt as part of its retrieval pipeline. Google’s John Mueller has repeatedly called it unnecessary. An SE Ranking study of 300,000 domains found no measurable impact on AI citations. So why bother?
Because over 844,000 websites have already implemented it — including Anthropic (the company behind Claude), Cloudflare, Stripe, Vercel, and hundreds of enterprise technology companies. The same early-adoption pattern played out with sitemap.xml, structured data, and Core Web Vitals: the specification existed before the platforms officially supported it, and the sites that implemented early were ready when support arrived.
But the practical value extends beyond the file itself. The act of creating an llms.txt forces a content audit that most sites desperately need. You have to answer: which pages represent our deepest expertise? Which content would we want an AI system to cite? Which pages are authoritative enough to stand as primary references? That editorial discipline — deciding what belongs and what doesn’t — often reveals content gaps, outdated pages, and structural weaknesses that improve your broader LLM Optimisation regardless of whether any AI platform ever parses the file directly.
The file also serves a more immediate function in an AI Agent Optimisation (AAO) context. As autonomous AI agents begin researching vendors, comparing services, and making recommendations on behalf of decision-makers, they need efficient ways to understand what a site offers. A curated llms.txt file gives an agent a structured overview in seconds — rather than forcing it to crawl and parse dozens of pages to build the same understanding.
The llms.txt Specification
The format is deliberately simple. It’s a plain text file written in markdown — the same lightweight formatting that developers use for documentation and that LLMs already parse natively. The specification defines a clear hierarchy:
H1 heading (#) — your site or company name. One per file, always first. This establishes the entity identity.
Blockquote (>) — a 2-3 sentence description of who you are, what you do, and why you’re authoritative. This is the first thing an AI system reads, so make it count. Write it as if you’re introducing your company to someone who knows nothing about you.
H2 sections (##) — logical groupings of your content. Services, Guides, Case Studies, About — whatever structure reflects your site’s information architecture.
Markdown links with descriptions — each entry follows the pattern - [Page Title](URL): One-line description of what this page covers. The description is critical — it tells the AI system what it will find before it follows the link.
Here’s the basic structure:
# Your Company Name
> A 2-3 sentence description of your business,
> what it does, and why it's authoritative.
## Core Services
- [Service Name](https://yoursite.com/service/): One-line description
- [Another Service](https://yoursite.com/another/): Description here
## Guides & Resources
- [Complete Guide](https://yoursite.com/guide/): What this guide covers
- [Case Study](https://yoursite.com/case-study/): Key outcome and metrics
## About & Contact
- [About Us](https://yoursite.com/about/): Company background
- [Contact](https://yoursite.com/contact/): How to get in touch
Our llms.txt: A Worked Example
Rather than relying on generic examples, here’s what our actual llms.txt file looks like. You can view the live version at seostrategy.co.uk/llms.txt — and we documented the entire implementation process in our llms.txt case study.
# SEO Strategy Ltd — llms.txt
# AI-readable content index for seostrategy.co.uk
# Last updated: February 2026 | Theme v5.0.9
> SEO Strategy Ltd
> AI-powered SEO consultancy based in Southampton, UK.
> Specialising in LLM optimisation, entity SEO, schema
> implementation, and AI visibility systems.
## Core Services
- /technical-seo/: Site architecture, crawlability & indexation
- /content-seo/: Strategy, clusters & content optimisation
- /on-page-seo/: Content optimisation & technical elements
- /off-page-seo/: Authority building & digital PR
## AI & LLM Services
- /llm-optimisation/: AIO, AEO & GEO for AI visibility
- /llm-optimisation/aio/: AI Overview Optimisation
- /llm-optimisation/aeo/: Answer Engine Optimisation
- /llm-optimisation/geo/: Generative Engine Optimisation
- /entity-seo/: Knowledge graph & brand authority
- /schema-structured-data/: JSON-LD markup for search & AI
The full file contains 76 curated URLs across 12 sections. Notice what’s included: every service page, every guide, every case study, every blog post, location pages, and about/contact. Notice what’s excluded: the homepage (AI agents land there anyway), individual FAQ anchors, tag archives, and utility pages. Every URL points to a live, non-redirecting page — no 301 chains, no broken links, no contradictory signals.
The editorial decisions matter as much as the format. We excluded marketing-heavy pages that don’t contain substantive information. We excluded frequently changing content that might go stale between updates. We grouped pages by topic cluster rather than site hierarchy, because that’s how AI systems organise knowledge internally. And we wrote descriptions that answer “what will an AI system learn from this page?” rather than generic marketing copy.
How to Implement llms.txt on WordPress
WordPress gives you direct access to your site root, which makes implementation straightforward. You have three options, ranging from manual to fully automated.
Option 1: Manual Upload via FTP/SFTP
The simplest approach. Create your llms.txt file locally, connect to your server via FTP (FileZilla works well), navigate to the root directory where wp-config.php lives, and upload the file. Visit yoursite.com/llms.txt to confirm it’s accessible. This takes 5 minutes but requires manual updates whenever your content changes.
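Once the file is uploaded, it is worth confirming it is actually reachable before assuming the job is done. A quick check with curl, where yoursite.com is a placeholder for your own domain:

```shell
# Confirm the file is accessible; the first header line should report 200.
# yoursite.com is a placeholder - substitute your own domain.
URL="https://yoursite.com/llms.txt"
curl -sI "$URL" | head -1

# Preview the first lines of the served markdown to confirm the content
curl -s "$URL" | head -10
```

If you see a 404, the file is probably sitting in a subdirectory rather than the true document root next to wp-config.php.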
Option 2: Theme Rewrite Rule
For developers who want the file served through WordPress rather than as a static file, you can add a rewrite rule in your theme’s functions.php. This stores the content in the database and serves it dynamically — useful if your host restricts file writing or you want to generate the content programmatically. The downside: you need to flush permalinks after adding the rule, and you’re adding a database query to serve what should be a static file.
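A minimal sketch of what such a rewrite rule could look like. This is an illustrative implementation, not a drop-in: it assumes the file body is stored in a hypothetical `llms_txt_content` option, and you must flush permalinks (Settings → Permalinks → Save) after adding it.

```php
// Sketch: serve /llms.txt dynamically from WordPress.
// Assumes the markdown body is stored in a hypothetical
// 'llms_txt_content' option - adapt to your own storage.

// Map /llms.txt onto a custom query variable.
add_action( 'init', function () {
    add_rewrite_rule( '^llms\.txt$', 'index.php?llms_txt=1', 'top' );
} );

// Register the query variable so WordPress preserves it.
add_filter( 'query_vars', function ( $vars ) {
    $vars[] = 'llms_txt';
    return $vars;
} );

// Intercept the request and emit the file with the right headers.
add_action( 'template_redirect', function () {
    if ( get_query_var( 'llms_txt' ) ) {
        header( 'Content-Type: text/markdown; charset=utf-8' );
        header( 'X-Robots-Tag: noindex' );
        echo get_option( 'llms_txt_content', '' );
        exit;
    }
} );
```

Note the trade-off described above: every request now runs through WordPress and hits the database, so a static file remains the better default where the host allows it.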
Option 3: Use a Plugin
We built a free WordPress plugin — llms.txt Generator — that automates the entire process. On activation, it scans your published pages and posts, auto-categorises them into logical sections, pulls descriptions from your SEO plugin (RankMath, Yoast, AIOSEO, SEOPress, or The SEO Framework), and writes the file to your site root. A drag-and-drop admin interface lets you curate sections, reorder pages, override descriptions, and preview the output. Auto-regeneration keeps the file current whenever you publish or update content.
The plugin also validates your configuration — checking for broken links, noindex conflicts, canonical mismatches, duplicate URLs, and file size warnings. It’s the same tool-first approach we took with the SEO Strategy website build and the MDS drink driving calculator: build the tool, use it yourself, then offer it to others.
Response Headers
Whichever method you use, set two response headers on the file:
Content-Type: text/markdown; charset=utf-8
X-Robots-Tag: noindex
The Content-Type header tells AI agents they’re receiving markdown, not HTML. The X-Robots-Tag: noindex prevents the file from appearing in Google search results — it’s intended for AI systems, not human searchers. You can set these in your .htaccess, Nginx config, or through your theme/plugin.
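As a sketch, the headers can be set like this at the server level. The Apache version requires mod_headers; the Nginx version clears the built-in extension-to-type map for this one location so the markdown type actually applies.

```
# Apache (.htaccess) - requires mod_headers
<Files "llms.txt">
    Header set Content-Type "text/markdown; charset=utf-8"
    Header set X-Robots-Tag "noindex"
</Files>
```

```
# Nginx - inside your server block
location = /llms.txt {
    types { }
    default_type "text/markdown; charset=utf-8";
    add_header X-Robots-Tag "noindex";
}
```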
How to Implement llms.txt on HubSpot
HubSpot is a different story. Unlike WordPress, HubSpot doesn’t give you root directory access. Files uploaded through the CMS go to HubSpot’s CDN, not your domain root. As of early 2026, HubSpot has no native llms.txt support and has publicly stated they’re “monitoring” the situation but not building anything yet.
The workaround is functional but inelegant:
Step 1: Create your llms.txt file locally using a text editor, following the markdown specification above.
Step 2: Upload to HubSpot File Manager. Go to Marketing → Files and Templates → Files. Upload your llms.txt file and set it to Public. Copy the file URL — it will look something like https://f.hubspotusercontent10.net/hubfs/123456/llms.txt.
Step 3: Create a URL redirect. Go to Settings → Website → Domains & URLs → URL Redirects. Add a 301 redirect from /llms.txt to your HubSpot CDN file URL.
Step 4: Test. Visit yoursite.com/llms.txt — you should see the file content after the redirect resolves.
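The redirect chain can be verified from the command line as well. The -L flag makes curl follow the 301 to the CDN file, so you can confirm both the hop and the final status in one pass (yoursite.com is a placeholder):

```shell
# Follow the /llms.txt redirect end-to-end and show the key headers.
# Expect a 301 with a Location header, then a final 200 response.
# yoursite.com is a placeholder - substitute your own domain.
URL="https://yoursite.com/llms.txt"
curl -sIL "$URL" | grep -iE "^(HTTP|location:|content-type:)" || true
```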
The limitations are significant. There’s no auto-regeneration: every time you add or remove a page, you have to manually update the file and re-upload it. The redirect also adds a hop that some AI bots may not follow. HubSpot Enterprise users with access to serverless functions can serve the file dynamically, but that’s an expensive solution for a text file. For HubSpot sites with significant AI visibility goals, this is one of those areas where the platform’s walled-garden approach creates a genuine disadvantage compared to self-hosted WordPress.
Other CMS Platforms
Squarespace and Wix both allow file uploads to the root directory, though the process varies. Squarespace users can place files via the Files panel; Wix users typically need to use the /public folder approach. Neither platform offers native llms.txt support or auto-regeneration.
Shopify doesn’t allow direct root file access. The workaround is similar to HubSpot: host the file content as a page or asset and configure a redirect. Some developers serve it via a Liquid template with the text/markdown content type.
Custom/headless setups (Next.js, Gatsby, Hugo, Jekyll) — you have full control. Add llms.txt to your static assets or public directory, configure headers in your server/CDN config, and automate generation as part of your build pipeline.
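For static builds, generation can be a small script in the pipeline. Here is a sketch in Python; the site URL, section names, paths and descriptions are placeholder assumptions, and you would write the result into your framework’s static output directory (e.g. public/).

```python
# Sketch: generate llms.txt at build time from a curated page list.
# SITE, section names, paths and descriptions are placeholder assumptions.
SITE = "https://yoursite.com"

SECTIONS = {
    "Core Services": [
        ("/technical-seo/", "Site architecture, crawlability & indexation"),
        ("/content-seo/", "Strategy, clusters & content optimisation"),
    ],
    "Guides & Resources": [
        ("/complete-guide/", "What this guide covers"),
    ],
}

def build_llms_txt(name: str, description: str, sections: dict) -> str:
    """Assemble the markdown body: H1, blockquote, then H2 link sections."""
    lines = [f"# {name}", ""]
    # Each description line becomes one blockquote line.
    lines += [f"> {line}" for line in description.splitlines()]
    lines.append("")
    for heading, pages in sections.items():
        lines.append(f"## {heading}")
        for path, desc in pages:
            # Derive a readable title from the URL slug.
            title = path.strip("/").replace("-", " ").title()
            lines.append(f"- [{title}]({SITE}{path}): {desc}")
        lines.append("")
    return "\n".join(lines)

# In a real pipeline, write this to public/llms.txt instead of printing.
print(build_llms_txt("Your Company Name",
                     "A 2-3 sentence description of the business.",
                     SECTIONS))
```

Hooked into the build step, the file regenerates on every deploy, which removes the staleness problem entirely.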
What to Include (and What to Leave Out)
The most common mistake is treating llms.txt like a sitemap — dumping every URL on your site into the file. That defeats the purpose entirely. An AI agent benefits more from a focused list of 20-30 carefully curated pages than a 200-page dump of everything.
Always include: Core service or product pages. In-depth guides and documentation. Case studies with specific results and methodology. Technical resources that demonstrate expertise. About and contact pages that establish entity identity.
Usually include: Blog posts that represent substantial, evergreen analysis (not news commentary or roundups). Location pages for businesses that serve specific areas. FAQ hubs or knowledge bases.
Usually exclude: The homepage (AI agents typically land there anyway). Tag and category archives. Author pages. Cart, checkout, and account pages. Privacy/cookie policies (unless they contain substantive data handling information). Blog posts that are time-sensitive or thin. Landing pages built for PPC rather than informational value.
The editorial test: For each page, ask “if an AI system could only read 25 pages from my site, would this be one of them?” If the answer isn’t a confident yes, leave it out. You can always add it later — but starting lean is better than starting bloated.
Synchronising with robots.txt and Sitemaps
Your llms.txt, robots.txt and XML sitemap should tell a consistent story. Three rules:
Don’t include pages in llms.txt that are blocked in robots.txt. If you’re telling crawlers they can’t access a page, don’t simultaneously tell AI systems it’s important. That’s a contradictory signal.
Don’t include pages marked noindex. If you’ve told Google not to index a page, including it in llms.txt sends mixed signals about whether you want the content discovered.
Add an llms.txt reference to robots.txt. This helps AI systems discover your file even if they don’t check the standard location. Add this line to your robots.txt:
# llms.txt — AI content guidance
Llms-Txt: https://yoursite.com/llms.txt
Common Mistakes
Including too many URLs. If your file exceeds 50KB, some AI systems may truncate it. Prioritise ruthlessly.
Stale URLs. Pages that have been deleted, redirected, or password-protected. Every URL should resolve to a live, accessible page with a 200 status code. Run validation periodically — our WordPress plugin checks for this automatically.
Generic descriptions. “Click here to learn more” tells an AI system nothing. Write descriptions that answer: “What specific information will an AI find on this page?” Compare: “Our services page” vs “Full-service SEO consultancy covering technical audits, content strategy, link building and AI visibility for B2B companies.” The second version gives an AI agent everything it needs to decide whether to follow the link.
Missing the blockquote. The site description at the top is the first thing AI systems read. Skipping it or writing a vague tagline wastes your most valuable real estate.
Never updating. If you publish new service pages, case studies or guides and don’t update llms.txt, you’re actively hiding your best new content from AI systems. Set a reminder — monthly at minimum, or use auto-regeneration if your CMS supports it.
Contradicting other signals. Including pages blocked by robots.txt, marked noindex, or behind login walls. AI systems will notice the contradiction and may lose trust in the file’s reliability.
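Several of these checks are mechanical and easy to script. The sketch below, in Python, flags duplicate URLs and files over the 50KB truncation threshold mentioned above, with an optional live-status check that also catches redirects; function names are our own, and the network check requires internet access.

```python
# Sketch: audit an llms.txt body for duplicate URLs and oversized files,
# with an optional per-URL liveness check. Function names are illustrative.
import re
from urllib.request import Request, urlopen

# Matches markdown list entries of the form "- [Title](URL)".
LINK_RE = re.compile(r"^-\s*\[([^\]]+)\]\(([^)]+)\)", re.MULTILINE)
MAX_BYTES = 50 * 1024  # some AI systems may truncate beyond ~50KB

def audit_llms_txt(text: str) -> dict:
    """Return URL count, duplicate URLs, and an oversized-file flag."""
    urls = [m.group(2) for m in LINK_RE.finditer(text)]
    seen, duplicates = set(), []
    for url in urls:
        if url in seen:
            duplicates.append(url)
        seen.add(url)
    return {
        "url_count": len(urls),
        "duplicates": duplicates,
        "oversized": len(text.encode("utf-8")) > MAX_BYTES,
    }

def check_live(url: str, timeout: int = 10) -> bool:
    """True if the URL answers 200 without redirecting (requires network)."""
    req = Request(url, method="HEAD", headers={"User-Agent": "llms-txt-audit"})
    try:
        with urlopen(req, timeout=timeout) as resp:
            # urlopen follows redirects, so a changed URL means a 301/302 hop.
            return resp.status == 200 and resp.geturl() == url
    except Exception:
        return False
```

Run something like this monthly, or on every content publish, and the stale-URL and duplicate problems largely take care of themselves.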
llms.txt and the Broader AI Visibility Stack
llms.txt doesn’t work in isolation. It’s one component of a broader AI visibility strategy that includes:
Schema markup makes your content machine-understandable. JSON-LD structured data tells AI systems what your entities are, what your pages contain, and how they relate to each other. If llms.txt is the index, schema is the metadata catalogue.
Entity SEO establishes your brand as a recognisable entity in knowledge graphs. When AI systems can confidently identify who you are — through consistent NAP data, Wikidata entries, knowledge panel signals, and structured entity declarations — they cite you with higher confidence.
Cloudflare Markdown for Agents automatically converts your HTML pages to clean markdown when AI agents request them, reducing token consumption and improving comprehension. llms.txt tells AI which pages to look at; Cloudflare Markdown makes those pages lightweight and easy to process. They’re complementary, not competing.
AI Agent Optimisation (AAO) is the umbrella discipline that encompasses all of this: making your site discoverable, understandable, and selectable by autonomous AI agents. llms.txt is the front door of your AAO implementation — the first touchpoint between an AI agent and your curated content.
We implemented all of these on our own site before recommending them to clients. Our llms.txt case study documents the implementation process, editorial decisions, and what we’re tracking. Our vibe coding case study covers the broader AI-assisted development approach that makes this level of technical implementation commercially viable.