Methodology

The AEO/GEO playbook we run on every engagement.

The recipe below is what we apply to every priority page on every engagement, from an $890 Starter audit punch-list to a $17,800 crypto Enterprise programme. The structural rules don't change with budget — only how many pages we get to apply them to.

The recipe at a glance

Page recipe: Hero · X-is-Y intro · Quick Facts · H2-question sections · FAQ · CTA
Hero rule: ≤3 sentences, human hook, not a duplicate of the intro
Intro rule: X-is-Y form, covering regulator + license types + audience + advantages + timeline + cost
Quick Facts: Parameter / Value table, ≥5 rows, mandatory even if minimal
H2 rule: each H2 is a user question; first sentence directly answers it
FAQ rule: direct answer ≤30 words, then optional 2–3 sentence depth
Sentence target: 15–20 words avg, active voice ≥70%
Schema stack: Article, FAQPage, HowTo, Person, Organization, BreadcrumbList

Four-layer AI extraction structure

Every page renders as four layers AI systems can extract independently: hook, X-is-Y intro, Quick Facts table, then H2-question sections.

Layer 1 is the hero — two or three sentences that make a human want to keep reading. It is not a duplicate of the intro and it does not try to be SEO copy. The hero is where you sound like a human.

Layer 2 is the intro block — no H2, just a paragraph that names what the page is about in X-is-Y form, plus the regulator if any, license or service types, target audience, advantages, timeline, and cost. AI systems lift this block whole when answering "what is X?" queries.

Layer 3 is the Quick Facts table — Parameter / Value, at least five rows. Mandatory on every priority page. AI systems quote table rows verbatim into answers. If your data isn't tabular elsewhere, this is where to compress it.

Layer 4 is the body of H2-question sections. Each H2 is the user's question in question form — ending in "?". The first sentence directly answers the question in plain language; everything after it is depth. We keep adjacent H2s from conflicting in intent and we never use "this/that/it" when we can name the entity.
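Put together, a hypothetical priority page follows this skeleton (all copy below is placeholder, not client content):

```markdown
<!-- Layer 1: hero — 2–3 sentences, human hook, no heading -->
Most VASP applications stall on one missing document. Ours don't.

<!-- Layer 2: X-is-Y intro — no H2 -->
A Lithuanian VASP license is a crypto authorisation issued by ... for ...
Typical timeline: 4 months. Typical cost: from €20,000.

<!-- Layer 3: Quick Facts — Parameter / Value, ≥5 rows -->
| Parameter | Value |
|-----------|-------|
| Regulator | ...   |
| Timeline  | ...   |

<!-- Layer 4: H2-question sections — first sentence answers directly -->
## How long does a VASP license take?
A VASP license takes about four months from filing to approval.
```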

Readability gates and human signals

Sentences average 15–20 words. Active voice ≥70%. No paragraph above 4 sentences. Every page carries ≥4 explicit human signals.

AI-detection signals are real and the platforms increasingly down-weight content that reads like an LLM wrote it without an editor. We enforce a blocklist of noun, verb, adjective and phrase markers — banned lexicon includes "delve", "tapestry", "landscape", "robust", "navigate", "in a world where", "in today's", "in conclusion", and the rest of the LLM tell-tale set.

On the positive side, every published page must show at least four human signals: at least one em-dash, one or two contractions, one sentence starting with But / And / So, mixed paragraph lengths, one concrete number or date, one explicit opinion or provocative claim, and Oxford-comma drops in one or two places. Drafts under the bar get rewritten until they're over it.
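A minimal sketch of how those gates could be automated. The blocklist and signal set are abbreviated here for illustration — the full ~40-marker lexicon and all seven signals aren't reproduced:

```python
import re

# Abbreviated blocklist of LLM tell-tale markers (full list is ~40 entries).
BANNED = ["delve", "tapestry", "landscape", "robust", "navigate",
          "in a world where", "in today's", "in conclusion"]

def banned_hits(text: str) -> list[str]:
    """Return the banned markers that appear in the draft."""
    low = text.lower()
    return [w for w in BANNED if re.search(r"\b" + re.escape(w) + r"\b", low)]

def human_signals(text: str) -> int:
    """Count a subset of the required human signals (abbreviated)."""
    signals = 0
    signals += "\u2014" in text                                      # em-dash
    signals += bool(re.search(r"\b\w+'(s|t|re|ll|ve|d)\b", text))    # contraction
    signals += bool(re.search(r"(^|[.!?]\s+)(But|And|So)\b", text))  # But/And/So opener
    signals += bool(re.search(r"\b\d", text))                        # concrete number/date
    return signals

draft = "But here's the thing \u2014 73% of audits fail. We don't delve; we fix."
assert banned_hits(draft) == ["delve"]   # draft fails the blocklist gate
assert human_signals(draft) == 4         # but clears the signal bar
```

A draft that returns any `banned_hits`, or fewer than four signals, goes back for rewrite.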

Schema engineering — what we deploy and why

Article, FAQPage, HowTo, Person, Organization, BreadcrumbList everywhere. Service + Offer on commercial pages. ItemList + Review on comparison pages.

Schema is the structured-data layer that makes a page legible to machines. AI systems lean heavily on schema for citation decisions — a page with proper Article + FAQPage schema will out-cite an identical-content page without it. We deploy JSON-LD only (no mixed microdata) and validate every page with the Schema.org validator and Google Rich Results Test before publish.
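For illustration, a minimal Article + FAQPage pair in one JSON-LD graph might look like this (headline, date and author are placeholders), served in a `<script type="application/ld+json">` tag:

```json
{
  "@context": "https://schema.org",
  "@graph": [
    {
      "@type": "Article",
      "headline": "What is AEO?",
      "dateModified": "2026-01-15",
      "author": { "@type": "Person", "name": "Jane Doe" }
    },
    {
      "@type": "FAQPage",
      "mainEntity": [
        {
          "@type": "Question",
          "name": "What is AEO?",
          "acceptedAnswer": { "@type": "Answer", "text": "AEO is ..." }
        }
      ]
    }
  ]
}
```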

For named experts we ship schema.org Person with sameAs pointing to verifiable external profiles — LinkedIn, ORCID, professional registries. This is the E-E-A-T trick that crypto and fintech engagements live on: AI systems treat schema-validated Person identities very differently from anonymous bylines.
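A sketch of such a Person block (name, title and profile URLs are placeholders, not a real expert):

```json
{
  "@context": "https://schema.org",
  "@type": "Person",
  "name": "Jane Doe",
  "jobTitle": "Head of Compliance",
  "sameAs": [
    "https://www.linkedin.com/in/janedoe",
    "https://orcid.org/0000-0002-1825-0097"
  ]
}
```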

For commercial pages we ship Service + Offer with priceSpecification — base USD price, monthly unit, availability. This is what makes the pricing on this site machine-readable to ChatGPT when someone asks "how much does AEO cost in 2026". Try it.
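A hedged sketch of that Service + Offer shape, using the Growth base price from the worked example below as the illustrative figure:

```json
{
  "@context": "https://schema.org",
  "@type": "Service",
  "name": "AEO Growth programme",
  "offers": {
    "@type": "Offer",
    "availability": "https://schema.org/InStock",
    "priceSpecification": {
      "@type": "UnitPriceSpecification",
      "price": 2400,
      "priceCurrency": "USD",
      "unitText": "MONTH"
    }
  }
}
```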

Niche-aware multipliers — the four-factor formula

Base × the average of four factors: YMYL pressure, lead value, AEO competitiveness, technical complexity. Crypto runs ×2.0, SaaS ×1.15, local services ×0.65.

The base rates on the pricing page are universal. Real engagements price as base × a niche multiplier that averages four factors: YMYL/regulatory pressure (crypto and fintech high, B2B SaaS medium, local services low), lead value (premium niches with $1,500+ leads carry the AEO cost easily), AEO competitiveness (premium verticals are contested, local ones aren't yet), and technical complexity (multi-region, multi-language builds hit hardest).

Worked example: an EdTech SaaS startup with a $400 lead, regional B2B competition and a headless multi-region site. YMYL = 1.0 (education). Lead = 1.25. Competitiveness = 1.0. Technical = 1.5. Average: 1.19. Growth base of $2,400 × 1.19 = $2,856 / month. We'll show this calculation on the discovery call.
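The same arithmetic as a quick sketch — the function name is illustrative, not a client-facing tool:

```python
def engagement_price(base: float, ymyl: float, lead: float,
                     competitiveness: float, technical: float) -> int:
    """Monthly price = base rate x the average of the four niche factors."""
    multiplier = round((ymyl + lead + competitiveness + technical) / 4, 2)
    return round(base * multiplier)

# Worked example from the text: EdTech SaaS on the $2,400 Growth base.
price = engagement_price(2400, ymyl=1.0, lead=1.25, competitiveness=1.0, technical=1.5)
assert price == 2856  # average rounds to 1.19, matching the figure above
```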

Frequently asked questions

Why does the structural recipe matter for AI citation?

AI systems extract whole compact blocks, not scattered sentences. A page that follows the recipe is extractable by construction; one that doesn't isn't.

Google AI Overviews and Perplexity in particular pick clean blocks (intro paragraph, FAQ Q&A, table row) and quote them verbatim or near-verbatim. Walls of text without H3s, facts scattered across paragraphs, or buried answers force the AI to summarise rather than cite — and summarisation hides your URL.

Why do you ban certain words from generated content?

Because LLM-tells like "delve", "tapestry", "robust", "in conclusion" make AI-detection trivial and erode trust signals.

We maintain a blocklist of about 40 noun, verb, adjective and phrase markers. Every generated page gets a regex check + a Claude editing pass that enforces the bans. We also require ≥4 explicit human signals — em-dashes, contractions, sentences starting with But/And/So, mixed paragraph lengths, one concrete number, one explicit opinion.

What schema do you actually deploy?

Article (with dateModified), FAQPage, HowTo where there's a process, Person for named experts, Organization, BreadcrumbList, and Service for commercial pages.

For ratings/comparison pages we add ItemList + Review where applicable. For local services we add LocalBusiness or its subtype (Attorney, MedicalOrganization, RealEstateAgent). All schema is JSON-LD; we don't mix microdata.

How do you choose what AI crawlers to allow?

Default: allow GPTBot, ClaudeBot, PerplexityBot, Google-Extended, CCBot, Applebot-Extended, anthropic-ai, cohere-ai. Block the noisy ones case by case.

For most B2B sites we allow all major AI crawlers because the goal is being cited. For some YMYL clients with regulatory exposure, we restrict by user-agent and content type; the llms.txt at /llms.txt is always a strict subset of what robots.txt allows.
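A hypothetical robots.txt fragment under the default allow policy (paths and the restricted example are illustrative):

```
User-agent: GPTBot
Allow: /

User-agent: ClaudeBot
Allow: /

User-agent: PerplexityBot
Allow: /

User-agent: Google-Extended
Allow: /

# YMYL variant: restrict a crawler by path instead of allowing everything.
# User-agent: CCBot
# Disallow: /clients/
```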

How long until the structural changes show in AI?

Google AI Overviews: 14–60 days. ChatGPT and Perplexity: 7–30 days for new content; longer for restructured legacy.

New content tends to surface faster because AI systems index it without legacy citation patterns to break. Restructured legacy pages need to outrank their old extractable signals — usually a 30–60 day cycle.

Want this recipe applied to your site?

Starter is the cheapest entry — $890/month gets you a baseline, the technical AEO audit, and a punch-list of 10–15 priority changes. Three months minimum, no implementation.