Where voice fits in the AI-search stack
The buyer who types “best crypto licensing firms for fintech startups” into ChatGPT at their desk is the same buyer who asks Siri “what’s the best crypto licensing firm” in the car. Same buyer, different surface, different answer-length budget.
Voice is part of the AEO surface, not separate from it. The structural recipe — Hero, X-is-Y intro, Quick Facts, H2-as-question, FAQ — works for both. What changes for voice is the answer-length constraint.
What voice favours
Three things voice extracts more aggressively than text-AI:
- Direct answer ≤ 25 words — even tighter than the FAQ block’s 30-word rule
- One-sentence definitions — no paragraph-level extraction for voice; it picks one sentence
- Schema validation as a hard gate — voice has no fallback; if schema is malformed, the assistant reads the page text raw and usually picks the wrong sentence
The FAQ block with direct answers ≤ 30 words is the bridge. If your FAQ is structured for text-AI extraction, it is 80% of the way to voice extraction too. Tighten the answers slightly (target ≤ 25 words) and add HowTo schema where there is a process — that is the voice-specific layer.
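As a concrete reference point, a minimal FAQPage block with a voice-ready direct answer might look like the sketch below. The question and answer are invented for illustration, not taken from a real page:

```json
{
  "@context": "https://schema.org",
  "@type": "FAQPage",
  "mainEntity": [{
    "@type": "Question",
    "name": "What is a crypto licensing firm?",
    "acceptedAnswer": {
      "@type": "Answer",
      "text": "A crypto licensing firm helps fintech companies obtain the regulatory licences needed to operate crypto services in a given jurisdiction."
    }
  }]
}
```

The answer is one sentence of 20 words that reads cleanly aloud, which is exactly what a voice assistant needs to pick.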
Voice-first vs. voice-secondary
Voice-first niches. Local services, retail, and food / hospitality. Buyers ask voice assistants for "best dentist near me", "what's open right now", "is X gluten-free". For these niches voice is 30–50% of the AI-search surface and you optimise primarily for it.

Voice-secondary niches. B2B SaaS, fintech, legal, edtech. Buyers research these on screens. Voice plays a 5–15% role — useful but not central. The optimisation is the same recipe, no extra voice-specific layer beyond the schema.
For voice-first niches we add LocalBusiness (or specific subtype) schema and prioritise HowTo schema for process pages. For voice-secondary, the standard stack covers it.
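For the voice-first case, a minimal LocalBusiness-subtype sketch follows (here a hypothetical Dentist; every value is a placeholder):

```json
{
  "@context": "https://schema.org",
  "@type": "Dentist",
  "name": "Example Dental Clinic",
  "address": {
    "@type": "PostalAddress",
    "streetAddress": "123 High Street",
    "addressLocality": "Example Town",
    "postalCode": "AB1 2CD",
    "addressCountry": "GB"
  },
  "telephone": "+44 1234 567890",
  "openingHoursSpecification": [{
    "@type": "OpeningHoursSpecification",
    "dayOfWeek": ["Monday", "Tuesday", "Wednesday", "Thursday", "Friday"],
    "opens": "09:00",
    "closes": "17:30"
  }]
}
```

The openingHoursSpecification is what lets an assistant answer a "what's open right now" query without guessing from page text.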
What does not work for voice
- Long-form content with no direct-answer block — voice cannot pick a quote
- Answer paragraphs with conditions (“it depends on…”) — voice flattens to a single sentence
- Brand-name-stuffed answers (“at AcmeCorp we believe…”) — voice strips them
- Marketing fluff in the FAQ (“our award-winning approach to…”) — voice ignores it
The Speakable schema question
Schema.org has a Speakable property designed for voice. Our experience: useful for news and editorial content, ignored on commercial / B2B content. Voice assistants (Google, Siri, Alexa) primarily extract from FAQPage and HowTo — not from Speakable.
We do not deploy Speakable on commercial sites. The investment-to-return is poor compared to tightening FAQPage answers.
What you should do this month
If you run a local services brand: add LocalBusiness (or specific subtype) schema if you do not have it. Tighten FAQ direct answers to ≤ 25 words. Validate. That is the cheap voice layer and it is the right entry point.
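The tightening step is easy to automate. A minimal sketch, assuming your FAQ answers are already extracted into a list; the helper name `over_budget` is hypothetical and the 25-word threshold is simply the rule above:

```python
# Flag FAQ answers that exceed the 25-word voice budget.
# Word counting is a plain whitespace split; answers containing
# markup would need stripping first.

def over_budget(answers, limit=25):
    """Return (index, word_count) for each answer longer than `limit` words."""
    flagged = []
    for i, text in enumerate(answers):
        words = len(text.split())
        if words > limit:
            flagged.append((i, words))
    return flagged

faq_answers = [
    "A crypto licensing firm helps fintech companies obtain the regulatory "
    "licences needed to operate crypto services in a given jurisdiction.",
    "It depends on the jurisdiction, the product, the target market, the "
    "regulator's current guidance, the firm's risk appetite, your timeline, "
    "your budget, and several other factors that vary case by case.",
]

print(over_budget(faq_answers))  # → [(1, 31)]
```

Run it over every FAQ on the site before re-validating the schema; anything flagged is a candidate for the “it depends” flattening problem described above.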
If you run a B2B SaaS or fintech: voice is secondary. Focus on the text-AI optimisation. The 30-word FAQ rule from the four-layer recipe covers 90% of voice incidentally.
If you have an active AEO programme already: ask your team whether they have stress-tested your top-5 prompts in voice (Siri / Google Assistant / Alexa) and whether the assistant returns the brand. If not, that is a 30-minute audit, and tightening the FAQ afterwards is a likely 10–15% citation lift on voice surfaces.