
From Speakable Schema to AI Summary Control: What Google Tried Early — and What Matters Now

Voice search is declining. AI voice delivery is rising. Speakable schema was Google's first attempt at publisher control over spoken delivery — and while its current deployment is limited, the content architecture it demands is exactly what AI Overviews, ChatGPT and Perplexity need to cite you accurately.

11 min read · 2,256 words · Updated Mar 2026

Here’s what that means for structured data, content architecture, and the businesses that want to be cited when AI speaks.

Google built Speakable schema for a world where voice assistants read web content aloud. That world is arriving — just not in the way anyone expected.

Voice search queries are collapsing. In the UK, “voice search SEO” dropped 81% year-on-year. “Voice search optimisation” fell by the same margin. “Voice search schema” dropped 100% — to zero measurable volume. The command-based “Hey Google, find me a plumber” era peaked and is now in structural decline.

But AI voice delivery is surging. ChatGPT’s voice mode has tens of millions of active users. Google AI Overviews — up 179% year-on-year in the UK, 234% in the US — increasingly power spoken summaries. Perplexity reads synthesised answers aloud. Every major AI platform is moving toward delivering content audibly to users who never typed a search query.

This is not the same thing as voice search. And that distinction changes everything about how you should think about structured data, content architecture, and what it means to be “optimised” for the AI delivery layer.

The Two Waves of Voice: Command vs. Conversation

Voice search — the first wave — was transactional and command-based. “What’s the weather?” “Call Mum.” “Navigate to Tesco.” Google built Speakable schema for this wave: a structured data type that lets publishers mark specific content sections as suitable for text-to-speech playback.

Speakable was a sensible idea for its time. If a voice assistant needed to read something from your website, you could tell it which bits to read. The cssSelector property points at specific HTML elements — your page title, your opening paragraph, your key headings — and says “these are the parts that work as spoken content.”
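As a sketch, that markup looks something like this — the URL and the .article-title / .article-summary selectors are illustrative placeholders, not values from any real site:

```json
{
  "@context": "https://schema.org",
  "@type": "WebPage",
  "name": "Example article",
  "url": "https://www.example.com/example-article",
  "speakable": {
    "@type": "SpeakableSpecification",
    "cssSelector": [".article-title", ".article-summary"]
  }
}
```

The selectors must match elements that actually exist in the rendered HTML; a speakable block pointing at classes your template never outputs is just invalid markup.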

The problem: almost nobody implemented it. It remains in beta, limited to news content in English. Google’s own documentation still lists it as an experimental feature. In the UK, “speakable schema” gets 10 searches per month. In the US, 50 — though that 50 represents +150% year-on-year growth, which tells you something about early adopters paying attention.

The second wave is fundamentally different. AI voice delivery doesn’t read your content aloud — it synthesises new content from your content, then speaks that synthesis. ChatGPT doesn’t visit your page and read paragraph three into a microphone. It ingests your content, extracts the relevant information, generates a new response, and delivers that response conversationally — sometimes with voice, sometimes with text, always as a reformulated summary rather than a direct reading.

This means the entire paradigm has shifted. The question is no longer “which parts of my page sound good when read aloud?” The question is “which parts of my page will an AI system reliably extract, understand, and cite when generating its own answer?”

What Google Was Trying to Solve — and Why It Still Matters

Speakable schema was Google’s first attempt at solving a real problem: how do publishers maintain some influence over how their content is delivered when the delivery mechanism isn’t a blue link on a results page?

That problem hasn’t gone away. It’s become vastly more important. AI Overviews, ChatGPT, Perplexity, and every emerging AI search platform all face the same challenge: they need to select, extract, and reformat content from publishers. The publishers who make that extraction reliable, unambiguous, and high-quality get cited. The ones who don’t get ignored or, worse, misrepresented.

Speakable was a narrow, mechanical solution — point at CSS selectors, mark them as speakable. The broader solution, the one that actually works for the AI summary layer, is structural. It’s about how your content is architected, not which HTML classes you tag.

But here’s the thing: the mechanical solution and the structural solution aren’t in conflict. They’re layers. And the businesses that implement both are the ones building compounding advantage.

The AI Summary Layer: What Actually Controls How AI Delivers Your Content

If you want AI systems to accurately represent your brand, cite your expertise, and recommend your services when they generate spoken or written summaries, you need to control four things.

1. The Opening Declaration

The first 120–150 words of any page are disproportionately influential in AI extraction. This is where large language models form their initial understanding of what the page is about, who wrote it, and what authority it carries.

Most websites waste this space. They open with narrative scene-setting, vague value propositions, or marketing language that sounds good to humans but tells an AI system nothing useful. “In today’s rapidly evolving digital landscape, businesses are increasingly looking for ways to stand out” — that’s 15 words of zero-information filler that an AI will skip entirely.

Compare that with a declarative opening: “Entity SEO is the practice of building machine-readable identity for your brand across search engines, knowledge graphs, and AI systems. It works by establishing clear, unambiguous connections between your business, your expertise, and the topics you’re authoritative for.”

The second version gives an AI system exactly what it needs: a definition, a mechanism, and a scope. It can extract that confidently, cite it accurately, and use it as the foundation for a synthesised answer.

This isn’t about keyword stuffing or SEO tricks. It’s about information density in the position where AI systems look first.

2. Entity Clarity

AI systems don’t rank pages. They evaluate entities — people, organisations, concepts, products — and assess whether those entities are authoritative for a given query.

Your structured data is the machine-readable API for your entity identity. Organisation schema tells AI systems what you are, where you operate, what you’re known for, and how to verify your identity across platforms (via sameAs links to LinkedIn, Wikidata, industry directories). Person schema does the same for individual experts — their credentials, their expertise areas, their published work.

The combination of Organisation schema with knowsAbout properties, Person schema with credentials and sameAs links, and consistent entity references across your content creates what we call entity clarity. An AI system encountering your content can immediately verify: “This is [Organisation], founded by [Person], authoritative for [Topics], verified across [Platforms].”
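In JSON-LD, that entity foundation might look like the following sketch — every name, URL and identifier here is a placeholder for illustration, not a real profile:

```json
{
  "@context": "https://schema.org",
  "@type": "Organization",
  "@id": "https://www.example.com/#organization",
  "name": "Example Consultancy Ltd",
  "url": "https://www.example.com/",
  "knowsAbout": ["Entity SEO", "Structured data", "AI search visibility"],
  "sameAs": [
    "https://www.linkedin.com/company/example-consultancy",
    "https://www.wikidata.org/wiki/Q00000000"
  ],
  "founder": {
    "@type": "Person",
    "@id": "https://www.example.com/#founder",
    "name": "Jane Smith",
    "jobTitle": "Founder",
    "knowsAbout": ["Entity SEO"],
    "sameAs": ["https://www.linkedin.com/in/janesmith"]
  }
}
```

Note that the schema.org type name uses the US spelling, Organization, even where your prose writes “Organisation schema”; the vocabulary term is fixed.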

Without entity clarity, your content is just another anonymous web page. With it, you’re a known, verified source that AI systems can cite with confidence.

3. Content Architecture for Synthesis

AI systems don’t extract content the way search engines index it. Search engines care about keywords, links, and page authority. AI systems care about whether they can decompose your content into discrete, reliable facts that can be reassembled into a new response.

This is what query fan-out looks like in practice. When a user asks an AI system a complex question, the system breaks it into sub-questions, retrieves relevant content for each sub-question, extracts specific claims from that content, evaluates the reliability of those claims, and synthesises a response. Your content needs to survive every step of that process.

That means clear heading hierarchies that map to specific questions. Standalone summary paragraphs that can be extracted without losing meaning. Structured definitions that an AI system can quote or paraphrase with confidence. And — critically — no ambiguity about what claims you’re making and what evidence supports them.

4. Structured Data as the Clarity Layer

This is where speakable fits into the bigger picture — not as a voice search optimisation tactic, but as part of the structured data stack that gives AI systems explicit signals about your content.

FAQPage schema provides ready-made question-answer pairs that AI systems can extract with high confidence. HowTo schema breaks procedural content into discrete steps. Article schema with proper author attribution provides provenance — who wrote this, when, and what authority do they carry? Organisation schema provides entity context.
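For example, a single question-answer pair in FAQPage markup might look like this sketch (the wording is drawn from this article; your own pages would use your own audience questions):

```json
{
  "@context": "https://schema.org",
  "@type": "FAQPage",
  "mainEntity": [
    {
      "@type": "Question",
      "name": "What is speakable schema markup?",
      "acceptedAnswer": {
        "@type": "Answer",
        "text": "Speakable schema (SpeakableSpecification) lets publishers identify which sections of a page are best suited for text-to-speech playback, using CSS selectors to point at those elements."
      }
    }
  ]
}
```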

And Speakable schema, despite its limited current deployment, does something none of the others do: it explicitly tells AI systems which parts of your content are designed to be delivered as spoken output. As AI voice delivery expands — and every trend line says it will — that signal becomes increasingly valuable.

The implementation cost is minimal. A few CSS selectors pointing at your headings and opening paragraphs. The potential upside, as voice delivery scales, is that you’ve already told AI systems exactly which parts of your content work as spoken answers.

The Decision Framework: Where Speakable Fits in Your Priority Stack

Not every business needs to implement speakable schema today. Here’s how to think about it.

News publishers should implement it now. Google’s speakable support is currently limited to news content in English, which means news publishers are the only ones who can see direct results today. The summary paragraph control that speakable provides is directly relevant to how Google News and AI Overviews select content for spoken delivery.

SaaS companies and consultancies should treat it as experimental but low-cost. The structured data itself takes minutes to implement. The real work — ensuring your content has clear, extractable summary paragraphs and entity-grounded authority — benefits your AI visibility regardless of whether speakable specifically drives results.

E-commerce businesses have lower priority for speakable but should focus heavily on product schema clarity, review aggregation, and the entity foundation that helps AI shopping assistants understand and recommend their products.

For everyone: the content architecture work that makes speakable effective — declarative openings, entity clarity, structured definitions, clear heading hierarchies — is the same work that makes your content more visible across AI Overviews, ChatGPT, Perplexity, and every other AI delivery platform. You’re not implementing speakable schema in isolation. You’re building the content infrastructure that the entire AI summary layer depends on.

What Actually Matters in 2026

Let’s be direct about what speakable does and doesn’t do today.

Speakable does not influence AI Overviews. Google does not use speakable selectors when deciding what to include in its AI-generated summaries. AI Overviews use their own content extraction and synthesis pipeline, which is based on content quality, entity authority, and structural clarity — not on which CSS classes you’ve marked as speakable.

Google does not rely on speakable for summarisation. The AI rewriting pipeline generates novel text based on source content. It doesn’t read your speakable-marked paragraphs aloud; it synthesises new responses informed by your content.

AI systems rewrite your content regardless. No amount of structured data will make ChatGPT quote you verbatim. The goal isn’t to control the exact words an AI speaks — it’s to ensure your content is selected as a source, accurately understood, and properly attributed.

Control comes from structure, not tags. The businesses that dominate AI visibility aren’t the ones with the most schema markup. They’re the ones whose content is architecturally designed for extraction: clear entities, declarative content, structured definitions, and unambiguous authority signals.

Speakable is one signal in that architecture. A small one today. Potentially a significant one tomorrow. But the architecture itself — that’s what you should be building now.

The Market Signal

The keyword data tells a clear story. “AI overviews” is at 8,100 monthly searches in the UK (+179% YoY) and 49,500 in the US (+234% YoY). “LLM optimisation” grew 600% year-on-year. “Answer engine optimisation” is up 125%. “Entity SEO” is up 29%.

Meanwhile, “voice search SEO” is down 81%. “Voice search optimisation” is down 81%. “Schema for voice search” shows zero sustained interest.

The market is not asking how to optimise for voice search. The market is asking how to optimise for AI delivery — AI Overviews, LLM responses, answer engines, entity-grounded citation systems.

Speakable sits at the intersection. It’s a structured data type designed for spoken delivery, in a market where spoken delivery is shifting from command-based voice search to AI-synthesised voice output. Its current implementation is limited. Its directional signal is strong.

The businesses that are building entity authority, structuring content for AI extraction, implementing comprehensive schema, and — yes — adding speakable selectors to their most important content sections are the ones positioning for where AI delivery is heading. Not where voice search has been.

Implementation: The Practical Steps

If you’ve read this far and want to act on it, here’s the priority order.

First, fix your content architecture. Audit your top 20 pages. Does each one open with a declarative summary in the first 120 words? Can you extract a clear, standalone definition or answer from each major section? Do your headings map to specific questions that an AI system might decompose from a complex query?

Second, build your entity foundation. Ensure your Organisation schema includes knowsAbout, areaServed, founder, and sameAs links to every verifiable profile (LinkedIn, Wikidata, industry directories). Add Person schema for key team members with credentials, expertise areas, and cross-platform verification.

Third, implement your answer schema stack. FAQPage for pages with question-answer content. HowTo for step-by-step processes. Article with proper author attribution on all content pages. VideoObject for embedded video with transcript where possible.

Fourth, add speakable selectors. On each page, identify the CSS selectors that point to your most important content: the page title, the opening summary paragraph, and the primary heading structure. Add SpeakableSpecification to your WebPage schema with those selectors. Validate through the Schema.org validator to ensure every selector actually matches an element on the page — referencing classes that don’t exist in your HTML will generate validation errors.
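Beyond the online validators, the “does every selector actually match something” check is easy to script. The sketch below is my own illustration, not a published tool: a stdlib-only Python check that handles the simple .class and #id selector forms recommended here (a full CSS selector engine would need a proper HTML library).

```python
from html.parser import HTMLParser


class SelectorAudit(HTMLParser):
    """Collects every CSS class and id present in an HTML document."""

    def __init__(self):
        super().__init__()
        self.classes, self.ids = set(), set()

    def handle_starttag(self, tag, attrs):
        for name, value in attrs:
            if name == "class" and value:
                self.classes.update(value.split())
            elif name == "id" and value:
                self.ids.add(value)


def unmatched_selectors(html: str, schema: dict) -> list[str]:
    """Return speakable cssSelector entries (simple .class / #id forms)
    that match nothing in the page markup."""
    audit = SelectorAudit()
    audit.feed(html)
    selectors = schema.get("speakable", {}).get("cssSelector", [])
    missing = []
    for sel in selectors:
        if sel.startswith(".") and sel[1:] not in audit.classes:
            missing.append(sel)
        elif sel.startswith("#") and sel[1:] not in audit.ids:
            missing.append(sel)
    return missing


# Example: a page that renders .article-summary but not .headline
page = '<article><p class="article-summary">Summary text</p></article>'
spec = {"speakable": {"cssSelector": [".article-summary", ".headline"]}}
print(unmatched_selectors(page, spec))  # → ['.headline']
```

A non-empty result means your speakable markup references elements your template never outputs — exactly the mismatch the validators flag.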

The first three steps drive measurable AI visibility results today. The fourth costs almost nothing and positions you for the expansion of AI voice delivery that every trend line is pointing toward.


This article is part of our structured data and schema markup series. For a comprehensive guide to entity authority and AI visibility, see Entity SEO: The Complete Guide. To assess your current AI visibility, book a free consultation.

How to Optimise Your Content for AI Summary Control

A four-step process for structuring your content so AI systems reliably extract, understand, and cite your business when generating spoken or written summaries.

1. Fix your content architecture

Audit your top 20 pages. Ensure each one opens with a declarative summary in the first 120–150 words that states what the page covers, who it is for, and what authority it carries. Check that each major section contains a standalone definition or answer that can be extracted without losing meaning. Verify that your headings map to specific questions an AI system might decompose from a complex query. Remove vague introductions and marketing filler from the opening paragraphs.

2. Build your entity foundation

Ensure your Organisation schema includes knowsAbout properties for your core expertise areas, areaServed for your geographic coverage, founder linked to a Person entity, and sameAs links to every verifiable profile — LinkedIn company page, Wikidata entry, industry directories, Companies House. Add Person schema for key team members with credentials, expertise areas, alumniOf and cross-platform sameAs verification. This gives AI systems the entity clarity they need to cite you with confidence.

3. Implement your answer schema stack

Add FAQPage schema on any page with question-answer content — each question should be a genuine audience query, each answer 2–4 sentences of direct response. Add HowTo schema for step-by-step processes. Ensure Article schema with proper author attribution exists on all content pages, linked to the Person entity via @id. Add VideoObject with transcript metadata for any embedded video. Validate all schema through Google's Rich Results Test and resolve any errors before deployment.

4. Add speakable selectors

Identify the CSS selectors that target your most important page content: the page title element, the opening summary paragraph, and the primary H2 heading structure. Add a SpeakableSpecification block to your WebPage schema's JSON-LD, using the cssSelector property to reference those elements. Validate through the Schema.org validator to confirm every selector matches an element that actually exists on the page — referencing classes that don't exist in your HTML will generate validation errors. For WordPress sites, this can be automated at the theme level using a filter on the SEO plugin's JSON-LD output.

Frequently Asked Questions

What is speakable schema markup?

Speakable schema (SpeakableSpecification) is a structured data type defined at Schema.org that lets publishers identify which sections of a web page are best suited for text-to-speech playback. It uses the cssSelector property to point at specific HTML elements — page titles, summary paragraphs, key headings — telling voice assistants and AI systems "these are the parts designed to be spoken aloud." Google currently supports it in beta for news content in English, but its relevance is growing as AI platforms increasingly deliver content via voice.

Should news publishers implement speakable schema?

Yes — news publishers should implement speakable now. Google's current speakable support is specifically limited to news content in English, making news publishers the only category that can see direct results today. The summary paragraph control that speakable provides is directly relevant to how Google News and AI Overviews select content for spoken delivery. Given the minimal implementation cost — a few CSS selectors in your WebPage JSON-LD — the risk-reward ratio strongly favours implementation.

Should SaaS companies and consultancies implement speakable schema?

SaaS companies and consultancies should treat speakable as experimental but low-cost. The structured data implementation takes minutes. The real work — ensuring content has clear, extractable summary paragraphs and entity-grounded authority — benefits AI visibility across every platform regardless of whether speakable specifically drives results. Think of it as a low-cost positioning bet: the content architecture speakable demands is the same architecture that AI Overviews, ChatGPT and Perplexity already reward.

Does speakable schema influence Google AI Overviews?

No — not directly. Google does not currently use speakable selectors when deciding what to include in AI Overviews. AI Overviews use their own content extraction and synthesis pipeline based on content quality, entity authority, and structural clarity. However, the content architecture that makes speakable effective — declarative openings, clear heading hierarchies, entity-grounded authority — is the same architecture that improves AI Overview citations. Speakable is one signal in a broader content infrastructure strategy.

What is the difference between voice search and AI voice delivery?

Voice search (the first wave) was transactional and command-based: "What's the weather?" "Find me a plumber." The voice assistant found a result and read it aloud. AI voice delivery (the second wave) is fundamentally different: the AI system ingests your content, synthesises a new response from it, and delivers that synthesis conversationally — sometimes spoken, sometimes text, always reformulated rather than directly read. This distinction matters because optimising for AI voice delivery requires content architecture for extraction and synthesis, not just marking content as readable aloud.

How do I implement speakable schema on my website?

Add a SpeakableSpecification block within your WebPage JSON-LD schema. The cssSelector property should reference the HTML elements containing your most important content: typically the page title, the opening summary paragraph, and primary headings. For WordPress sites, this can be automated at the theme level by filtering the SEO plugin's JSON-LD output — hooking into rank_math/json_ld or wpseo_schema_webpage to inject the speakable property programmatically. Validate through the Schema.org validator to confirm every referenced selector exists in your page markup.

What content architecture helps AI systems cite my business accurately?

Four elements matter most: declarative openings (the first 120–150 words should provide definitions, mechanisms and scope rather than marketing filler), entity clarity (Organisation and Person schema with knowsAbout, sameAs and credentials), content structured for synthesis (clear heading hierarchies mapping to specific questions, standalone extractable paragraphs, structured definitions), and a comprehensive structured data stack (FAQPage, HowTo, Article with author attribution, speakable selectors). Together, these ensure AI systems can reliably extract, understand and attribute your content.

Is voice search dead?

Command-based voice search is in structural decline — UK search volumes for "voice search SEO" and "voice search optimisation" are both down 81% year-on-year. But spoken delivery of information is growing rapidly through a different mechanism: AI-synthesised voice output via ChatGPT voice mode, Google AI Overviews, Perplexity and other platforms. The technology isn't dying — it's transforming from "search by voice" into "AI answers delivered by voice." The businesses that recognised this shift early are optimising for AI delivery rather than voice search, which is where the compounding advantage lies.

Sean Mullins

Founder of SEO Strategy Ltd with 20+ years in SEO, web development and digital marketing. Specialising in healthcare IT, legal services and SaaS — from technical audits to AI-assisted development.

Ready to improve your search visibility?

Book a free 30-minute consultation and let's discuss your SEO strategy.

Get in Touch