In 2026, the question every brand asks is no longer "do we rank on Google" but "are we cited by ChatGPT, Claude, Perplexity or Gemini when a buyer describes their problem?" An AEO audit (Answer Engine Optimization audit) answers that question with numbers. This guide explains the practical methodology we apply when running an AEO audit — from query design to scoring to the deliverable — so you can either run one yourself or pressure-test the one your agency hands you.
If you are still looking for the conceptual definition of AEO, read the companion piece first: What is AEO (Answer Engine Optimization)? Guide 2026. This article focuses on the audit itself — the workflow, the metrics and the deliverable.
AEO vs GEO audit: what is the practical difference?
AEO and GEO are two angles on the same underlying problem: optimising for AI-generated answers instead of blue links. In short:
- AEO (Answer Engine Optimization) centres on answer engines that return a single synthesised reply: ChatGPT, Claude, Perplexity, the conversational mode of Gemini, You.com.
- GEO (Generative Engine Optimization) covers a broader surface that includes generative experiences embedded in search: Google AI Overviews (which replaced the SGE experiment), Bing Copilot, multimodal answers.
In practice an audit covers both: you measure how the brand is named, cited and contextualised across these surfaces. For a deeper comparison see our SEO vs AEO: Differences and Complementarities and the GEO Audit guide. The methodology below applies to either label.
What an AEO audit actually measures
A serious AEO audit is not a screenshot of a single ChatGPT conversation. It produces structured metrics that can be compared across competitors and over time. The core measurement set:
- Citation rate — share of answers where the brand appears with a clickable link or named source. This is what drives referral traffic and the strongest authority signal.
- Mention rate — share of answers where the brand is named without a link. Mentions still influence perception even when no click follows.
- Share of Voice (SoV) — the brand's share of all named entities across a prompt set, benchmarked against the chosen competitors. The replacement metric for "keyword ranking". See AI Share of Voice 2026.
- Sentiment — positive, neutral or negative framing when the brand is named. A high mention rate with a negative tilt is a risk, not a win.
- Position in lists — when a model returns a ranked list, the brand's average rank. Top-3 placements concentrate the attention.
- Source authority — the URLs cited as sources alongside (or instead of) the brand. This reveals whether owned domains, review sites or wiki pages drive the answer.
- Hallucination rate — share of answers containing factual errors about the brand: invented products, wrong pricing, false claims, fabricated URLs. Critical for risk reporting.
These metrics work together. A 70% mention rate with 0% citation rate, neutral sentiment and a competitor leading three of the four model lists is a very different story from the headline number alone.
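To make these definitions concrete, here is a minimal Python sketch that computes the core metrics from a list of scored answers. The `AnswerRow` fields, the `audit_metrics` function and the "Acme" sample data are hypothetical and chosen for illustration; they are not a standard schema. Sentiment is reported as the share of negative answers among those that name the brand, which is one reasonable choice among several.

```python
from dataclasses import dataclass, field
from statistics import mean
from typing import Optional


@dataclass
class AnswerRow:
    """One scored answer. Field names are illustrative, not a standard schema."""
    model: str
    prompt_id: str
    brand_linked: bool            # cited with a link or as a named source
    brand_named: bool             # named anywhere (True whenever brand_linked is True)
    entities_named: list[str] = field(default_factory=list)  # every brand named, incl. ours
    list_position: Optional[int] = None   # brand's 1-based rank when the answer is a list
    sentiment: Optional[str] = None       # "positive", "neutral" or "negative"
    has_factual_error: bool = False       # any hallucination about the brand


def audit_metrics(rows: list[AnswerRow], brand: str) -> dict[str, float]:
    n = len(rows)
    named = [r for r in rows if r.brand_named]
    entities = [e for r in rows for e in r.entities_named]
    positions = [r.list_position for r in rows if r.list_position is not None]
    return {
        # Citation rate: answers with a link or named source for the brand.
        "citation_rate": sum(r.brand_linked for r in rows) / n,
        # Mention rate: answers naming the brand without a link.
        "mention_rate": sum(r.brand_named and not r.brand_linked for r in rows) / n,
        # Share of Voice: the brand's share of all entity mentions in the set.
        "share_of_voice": entities.count(brand) / len(entities) if entities else 0.0,
        # Sentiment risk: share of brand-naming answers with negative framing.
        "negative_share": (sum(r.sentiment == "negative" for r in named) / len(named)
                           if named else 0.0),
        # Average rank, only over answers where the brand appears in a list.
        "avg_list_position": mean(positions) if positions else float("nan"),
        "hallucination_rate": sum(r.has_factual_error for r in rows) / n,
    }


sample_rows = [
    AnswerRow("chatgpt", "p1", True, True, ["Acme", "Rival"], 1, "positive"),
    AnswerRow("chatgpt", "p2", False, True, ["Acme", "Rival", "Other"], 3, "negative"),
    AnswerRow("claude", "p1", False, False, ["Rival"]),
    AnswerRow("claude", "p2", False, True, ["Acme"], None, "neutral", True),
]
metrics = audit_metrics(sample_rows, "Acme")
```

Keeping every metric in one function over one row-per-answer dataset makes it easy to recompute the whole set for each competitor by changing only the `brand` argument.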
The 6-step methodology of an AEO audit
1. Scoping: target queries and competitor set
Before any prompt runs, the auditor needs three things from the client: the target audience (decision maker, geography, language), the buying journey questions the brand wants to win, and the competitor list — typically four to six direct rivals plus one or two adjacent challengers. Without that frame, prompts drift toward generic queries and the score becomes meaningless.
Concretely, the scoping note records: industry, sub-segment, geography, languages to audit, brand variants (legal name, trade name, common misspellings), and a written description of the offer in 200 to 400 words. The last point is essential: AI models paraphrase what the public web says about the offer, so how accurately the web reflects that description sets the ceiling for what the audit can measure.
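A scoping note is easier to enforce when it lives as structured data rather than a free-form document. The sketch below assumes a hypothetical `ScopingNote` class whose `problems()` method flags departures from the ranges this guide suggests (four to six direct rivals, one or two adjacent challengers, a 200 to 400 word offer description). Treat the checks as soft warnings, since the guide presents these ranges as typical rather than mandatory.

```python
from dataclasses import dataclass


@dataclass
class ScopingNote:
    """Hypothetical structure for the scoping note described above."""
    audience: str                    # decision maker targeted by the audit
    industry: str
    sub_segment: str
    geography: str
    languages: list[str]
    brand_variants: list[str]        # legal name, trade name, common misspellings
    direct_competitors: list[str]
    adjacent_challengers: list[str]
    offer_description: str

    def problems(self) -> list[str]:
        """Soft checks against the typical ranges given in this guide."""
        issues = []
        words = len(self.offer_description.split())
        if not 200 <= words <= 400:
            issues.append(f"offer description has {words} words; aim for 200 to 400")
        if not 4 <= len(self.direct_competitors) <= 6:
            issues.append("expected four to six direct competitors")
        if not 1 <= len(self.adjacent_challengers) <= 2:
            issues.append("expected one or two adjacent challengers")
        if not self.languages:
            issues.append("no audit language listed")
        if not self.brand_variants:
            issues.append("no brand variants listed")
        return issues
```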
2. Prompt generation
Manual prompt writing does not scale and introduces author bias. The most reliable approach is to use an LLM to generate prompts from the scoping note, with explicit coverage targets: branded, non-branded, comparative, problem-led, persona-led, geographic. Our own platform generates a tailored prompt bank per audit; the principle is documented in AI Prompt Generation.
A typical bank for a single audit contains 40 to 120 prompts in each language. Coverage matters more than volume: ten prompts well distributed across the funnel beat one hundred near-duplicate variants of "best X tool".
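Coverage and near-duplicates can be checked mechanically before any prompt runs. This sketch assumes each prompt carries a single category tag (a simplification, since a prompt can be both branded and comparative) and uses token-set Jaccard similarity with an arbitrary 0.8 threshold to flag near-duplicate variants. The helper names and the threshold are illustrative, not part of any specific tool.

```python
from collections import Counter
from itertools import combinations

# Coverage targets named in this guide; one tag per prompt is a simplification.
CATEGORIES = ("branded", "non-branded", "comparative",
              "problem-led", "persona-led", "geographic")


def coverage(bank: list[tuple[str, str]]) -> dict[str, int]:
    """Count prompts per category; zero counts reveal coverage gaps."""
    counts = Counter(category for category, _ in bank)
    return {c: counts.get(c, 0) for c in CATEGORIES}


def jaccard(a: str, b: str) -> float:
    """Token-set Jaccard similarity between two prompts (case-insensitive)."""
    ta, tb = set(a.lower().split()), set(b.lower().split())
    union = ta | tb
    return len(ta & tb) / len(union) if union else 1.0


def near_duplicates(bank: list[tuple[str, str]], threshold: float = 0.8):
    """Pairs of prompts whose token overlap meets the (arbitrary) threshold."""
    prompts = [p for _, p in bank]
    return [(a, b) for a, b in combinations(prompts, 2) if jaccard(a, b) >= threshold]


bank = [
    ("non-branded", "best crm tool for startups"),
    ("non-branded", "best crm tool for startups 2026"),
    ("comparative", "acme vs rival for invoicing"),
]
```

Token overlap is crude, but it is cheap and catches the "best X tool" variants the paragraph warns about; an embedding-based similarity would catch paraphrases as well.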
3. Multi-model run
Testing one model is not an audit; it is an anecdote. The minimum credible set in 2026 is ChatGPT, Claude, Perplexity and Gemini, with Mistral, Llama or Grok added depending on the geography. Each prompt is fired against every model, with two distinct passes when relevant: native mode (the model answers from training data alone) and web mode (the model is allowed to browse). The two passes measure very different things and are worth separating in the report. We discuss the trade-off in Native vs Web Score.
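The run itself is a Cartesian product of prompts, models and modes. In this sketch, `ask` is a hypothetical callable you would implement by wrapping each vendor's SDK; no real client library is assumed, and retries, rate limits and logging are left out.

```python
from itertools import product
from typing import Callable, Iterable

# Hypothetical adapter signature: (model, prompt, web_mode) -> answer text.
AskFn = Callable[[str, str, bool], str]


def run_audit(prompts: Iterable[str], models: Iterable[str], ask: AskFn,
              modes: tuple[str, ...] = ("native", "web")) -> list[dict]:
    """Fire every prompt at every model in every mode; one row per answer."""
    rows = []
    for model, prompt, mode in product(models, prompts, modes):
        answer = ask(model, prompt, mode == "web")
        rows.append({"model": model, "prompt": prompt, "mode": mode, "answer": answer})
    return rows


# Usage with a stub in place of real API calls.
def fake_ask(model: str, prompt: str, web: bool) -> str:
    return f"{model}:{'web' if web else 'native'}:{prompt}"


rows = run_audit(["q1", "q2"], ["chatgpt", "claude", "perplexity", "gemini"], fake_ask)
```

Keeping the vendor call behind one function signature makes it straightforward to add or drop a model without touching the rest of the pipeline.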
4. Scoring and extraction
Each answer is parsed for: the brand's presence, the form of presence (cited with link, mentioned by name, listed in a comparison), the sentiment, the position when relevant, the competitors named, the URLs cited and any factual error. Manual scoring is feasible for a sample of fifty answers; beyond that, automation is required. The output is a row-per-answer dataset that powers the rest of the analysis.
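Presence detection is the most mechanical part of extraction. The sketch below classifies one answer as `cited`, `mentioned` or `absent` using a regex for URLs and word-boundary matching on brand variants; the function names, argument names and classification rule are assumptions for illustration. Sentiment, list position and factual errors usually need an LLM classifier or human review and are not covered here.

```python
import re
from urllib.parse import urlparse

URL_RE = re.compile(r"https?://[^\s)\]>\"']+")


def _named(text: str, name: str) -> bool:
    # Word-boundary match; assumes the name starts and ends with a letter or digit.
    return re.search(rf"\b{re.escape(name.lower())}\b", text) is not None


def extract(answer: str, brand_variants: list[str], brand_domains: list[str],
            competitors: list[str]) -> dict:
    """Classify brand presence in one answer and collect URLs and competitors."""
    urls = [u.rstrip(".,;:!?") for u in URL_RE.findall(answer)]  # drop trailing punctuation
    hosts = {urlparse(u).hostname or "" for u in urls}
    linked = any(h == d or h.endswith("." + d) for h in hosts for d in brand_domains)
    text = answer.lower()
    named = linked or any(_named(text, v) for v in brand_variants)
    return {
        "presence": "cited" if linked else "mentioned" if named else "absent",
        "urls": urls,
        "competitors_named": [c for c in competitors if _named(text, c)],
    }


result = extract("Top picks: Rival (https://rival.io) and Acme, see https://www.acme.com/pricing.",
                 ["Acme"], ["acme.com"], ["Rival", "Other"])
```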
Score normalisation matters. A brand named in five of 50 answers has a 10% mention rate. That figure only becomes meaningful next to a competitor named in 22 of the same 50 answers (44%) and next to the previous quarter's baseline.
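The same arithmetic, written out: counts are normalised by the number of answers so that brand, competitor and previous run sit on one scale. The previous-quarter count (3 of 50) is a hypothetical value, and the comparison only holds if the prompt bank and answer total stay the same between runs.

```python
def normalised_view(brand_hits: int, competitor_hits: int,
                    previous_hits: int, total: int) -> dict[str, float]:
    """Express raw counts as percentages of the same answer total."""
    def pct(hits: int) -> float:
        return round(100 * hits / total, 1)

    return {
        "brand_pct": pct(brand_hits),
        "competitor_pct": pct(competitor_hits),
        "gap_vs_competitor_pts": round(pct(competitor_hits) - pct(brand_hits), 1),
        "change_vs_previous_pts": round(pct(brand_hits) - pct(previous_hits), 1),
    }


# 5 and 22 come from the example above; 3 is a hypothetical previous-quarter count.
view = normalised_view(brand_hits=5, competitor_hits=22, previous_hits=3, total=50)
```

Reporting gaps in percentage points rather than raw counts keeps the comparison valid when the answer total changes, for example after adding a model.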
5. Comparative analysis
This is where the audit earns its keep. Useful cuts:
- Share of Voice by model — does the brand collapse on one specific engine?
- Share of Voice by funnel stage — visible on "awareness" prompts but invisible on "compare" prompts?
- Source authority overlap — which URLs are cited for both the brand and its competitors?
- Sentiment gap — does a competitor get systematically warmer framing?
- Native vs web delta — strong on web but weak on native means the brand's authority signals are too recent for training data.
Read AI Competitive Analysis for the full benchmarking template.
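Most of these cuts reduce to grouping the answer dataset by one column. This sketch assumes each row is a dict with hypothetical `model`, `mode`, `entities` (brands named) and `domains` (cited source domains) keys; the same `sov_by` helper handles funnel stage if rows also carry a `stage` key.

```python
from collections import defaultdict


def sov_by(rows: list[dict], brand: str, key: str) -> dict[str, float]:
    """Share of Voice of `brand` within each value of `key` (model, mode, stage...)."""
    hits: dict[str, int] = defaultdict(int)
    totals: dict[str, int] = defaultdict(int)
    for row in rows:
        for entity in row["entities"]:
            totals[row[key]] += 1
            if entity == brand:
                hits[row[key]] += 1
    return {k: hits[k] / totals[k] for k in totals}


def native_web_delta(rows: list[dict], brand: str) -> float:
    """Positive values: the brand does better when models can browse."""
    sov = sov_by(rows, brand, "mode")
    return sov.get("web", 0.0) - sov.get("native", 0.0)


def source_overlap(rows: list[dict], brand: str, competitor: str) -> set[str]:
    """Cited domains appearing both in answers naming the brand and in answers naming the competitor."""
    def domains_for(name: str) -> set[str]:
        return {d for row in rows if name in row["entities"] for d in row["domains"]}
    return domains_for(brand) & domains_for(competitor)


rows = [
    {"model": "chatgpt", "mode": "native", "entities": ["Rival"], "domains": ["g2.com"]},
    {"model": "chatgpt", "mode": "web", "entities": ["Acme", "Rival"], "domains": ["g2.com", "acme.com"]},
    {"model": "perplexity", "mode": "web", "entities": ["Acme"], "domains": ["wikipedia.org"]},
    {"model": "perplexity", "mode": "native", "entities": ["Rival", "Other", "Third"], "domains": ["g2.com"]},
]
```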
6. Prioritised action plan
The audit ends with a written action plan, not just dashboards. Each recommendation carries an estimated effort, an expected metric to move and an owner (content, technical, PR, partnerships). A useful rule: never deliver more than ten priority actions in the first iteration — beyond that, nothing gets done.
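One way to enforce the ten-action rule is to make the cap part of the data model. The sketch below assumes a hypothetical `Action` record with an owner, an effort estimate and an expected gain on a named metric, and ranks actions by expected gain per day of effort. That ranking rule is an assumption; the guide does not prescribe one.

```python
from dataclasses import dataclass

OWNERS = {"content", "technical", "PR", "partnerships"}


@dataclass
class Action:
    """Hypothetical action-plan record; units are illustrative."""
    title: str
    owner: str                 # one of OWNERS
    effort_days: float         # estimated effort
    metric: str                # metric expected to move, e.g. "citation rate"
    expected_gain_pts: float   # expected change in that metric, percentage points

    def __post_init__(self):
        if self.owner not in OWNERS:
            raise ValueError(f"unknown owner: {self.owner}")
        if self.effort_days <= 0:
            raise ValueError("effort must be positive")


def prioritise(actions: list[Action], cap: int = 10) -> list[Action]:
    """Rank by expected gain per day of effort and keep at most `cap` actions."""
    ranked = sorted(actions, key=lambda a: a.expected_gain_pts / a.effort_days,
                    reverse=True)
    return ranked[:cap]
```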
How many prompts make an AEO audit credible?
There is no magic number, but there are a few practical anchors. For a single market, a single language and four to six competitors, a credible audit runs 40 to 120 prompts. Below 30, statistical noise overwhelms the signal: if the brand is named in only ten of those answers, a single hostile answer swings the sentiment score by ten points. Above 200, you are paying for resolution you cannot actually act on.
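The noise argument can be quantified with a standard confidence interval for a proportion. This sketch uses the Wilson score interval and treats each answer as an independent sample, which is optimistic because answers to the same prompt from different models tend to be correlated; read the numbers as a lower bound on the real uncertainty.

```python
from math import sqrt


def wilson_half_width(p_hat: float, n: int, z: float = 1.96) -> float:
    """Half-width of the Wilson score interval (95% by default) for a proportion."""
    denom = 1 + z**2 / n
    return (z / denom) * sqrt(p_hat * (1 - p_hat) / n + z**2 / (4 * n**2))


for n in (30, 60, 120, 200):
    # Margin of error, in percentage points, around an observed 40% rate.
    print(n, round(100 * wilson_half_width(0.4, n), 1))
```

For an observed rate of 40%, the half-width is roughly 16.5 points at 30 answers, about 8.6 at 120 and about 6.7 at 200: going from 120 to 200 answers buys only about two points of precision.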
The smarter lever is stratification: split the prompt bank by funnel stage (discovery, comparison, decision, post-sale), by persona, by geography. A 60-prompt audit with 15 per stage gives more usable insight than a 200-prompt flat list. For tracking over time, freeze the prompt bank and re-run it monthly or quarterly — see Scheduled Audits.
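Stratifying and freezing can both be enforced in code. In this sketch, `stratify` refuses to build a bank when any funnel stage is under quota, and `fingerprint` hashes the frozen bank with SHA-256 so a later run can confirm it is asking exactly the same questions. The stage names follow this guide; the function names and the JSON-then-hash approach are illustrative choices.

```python
import hashlib
import json
from collections import defaultdict

STAGES = ("discovery", "comparison", "decision", "post-sale")


def stratify(bank: list[tuple[str, str]], per_stage: int = 15) -> dict[str, list[str]]:
    """Keep `per_stage` prompts per funnel stage; fail loudly if a stage is short.

    Prompts tagged with a stage outside STAGES are ignored.
    """
    by_stage: dict[str, list[str]] = defaultdict(list)
    for stage, prompt in bank:
        by_stage[stage].append(prompt)
    short = {s: per_stage - len(by_stage[s]) for s in STAGES if len(by_stage[s]) < per_stage}
    if short:
        raise ValueError(f"stages under quota (missing prompts): {short}")
    return {s: by_stage[s][:per_stage] for s in STAGES}


def fingerprint(frozen: dict[str, list[str]]) -> str:
    """SHA-256 of a canonical JSON dump; store it with each run's results."""
    payload = json.dumps(frozen, sort_keys=True, ensure_ascii=False).encode("utf-8")
    return hashlib.sha256(payload).hexdigest()
```

If the fingerprint stored with this month's results differs from last month's, the trend line is comparing different questions and should not be reported as a trend.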
Tools you need to run an AEO audit
You can run an AEO audit manually with a spreadsheet, a stopwatch and four browser tabs. We do not recommend it past the first proof of concept — the manual approach struggles with two things: model coverage at scale, and reproducibility across audits.
A purpose-built platform like AI Labs Audit automates the prompt run across more than 50 AI models, scores each answer and produces a comparable report. New accounts receive 600 free credits, enough to run an initial audit before committing. For a wider view of the category, see our 2026 AEO/GEO platforms comparison and the monitoring tools roundup.
Automated scoring is not optional past the proof-of-concept stage. Manual annotation of a hundred answers is feasible; doing it monthly across four models and six competitors is not.
How to read and present the results of an AEO audit
The deliverable is where most audits fail. Dashboards alone do not convert into action. A useful AEO audit report has four sections:
- Executive summary — three to five sentences and one chart. Headline Share of Voice, position vs the leading competitor, the single biggest gap and the recommended first move.
- Metrics deep dive — citation rate, mention rate, sentiment, source authority, hallucination examples. One model per page so the comparison stays readable.
- Competitive benchmark — a single page that lines up the brand against the chosen competitor set on each metric.
- Action plan — ten priorities maximum, each with effort, expected impact and owner.
The methodology overlaps with the broader visibility audit covered in our AI Visibility Audit Methodology. For the metrics layer specifically, the AI Visibility Metrics guide is the reference.
Common mistakes in AEO audits
- Too few prompts — under 30, the noise floor swallows the signal. Stratify rather than inflate.
- No competitive baseline — a 25% mention rate means nothing without competitor numbers next to it.
- Conflating sentiment and citation — being mentioned often with a hostile tone is a problem, not a success.
- One model only — ChatGPT-only audits miss the Perplexity citation pattern and the Gemini AI Overviews surface entirely.
- Ignoring hallucinations — a model that invents a product or a price hurts the brand even with zero citations. Track it explicitly. Read AI Hallucinations & Brand Reputation.
- One-shot audits — AI answers shift constantly. Without a recurring schedule, the first audit ages within weeks.
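Fabricated URLs are the easiest hallucination to catch automatically. The sketch below compares URLs cited on the brand's own domains against a known inventory of real pages (for example, exported from the sitemap) and flags anything missing. The function names and the normalisation rule (drop `www.`, trailing slashes, query strings and fragments) are assumptions; invented products or wrong prices still need a human or a fact-checking step.

```python
from urllib.parse import urlparse


def _host(url: str) -> str:
    return (urlparse(url).hostname or "").removeprefix("www.")


def normalise(url: str) -> str:
    """Host plus path, without www., trailing slash, query or fragment."""
    return _host(url) + (urlparse(url).path.rstrip("/") or "/")


def suspect_urls(cited: list[str], brand_domains: list[str],
                 known_urls: list[str]) -> list[str]:
    """Brand-domain URLs cited by a model that are not in the known inventory."""
    known = {normalise(u) for u in known_urls}
    flagged = []
    for url in cited:
        host = _host(url)
        on_brand = any(host == d or host.endswith("." + d) for d in brand_domains)
        if on_brand and normalise(url) not in known:
            flagged.append(url)
    return flagged


known_pages = ["https://acme.com/", "https://acme.com/pricing"]
cited = ["https://www.acme.com/pricing/", "https://acme.com/enterprise-plan", "https://g2.com/acme"]
flagged = suspect_urls(cited, ["acme.com"], known_pages)
```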
AEO Audit FAQ
How long does an AEO audit take?
An automated audit of 60 to 100 prompts across four models runs in a few hours of compute, then a day of analysis to produce a written report. Manually, the same scope takes a week of careful work.
How often should an AEO audit be re-run?
Quarterly is the floor for slow-moving categories, monthly is healthier for competitive markets. The prompt bank must stay frozen between runs — otherwise the trend line is comparing different questions.
Can a small brand run an AEO audit?
Yes. A small brand with a clear positioning and three direct competitors only needs around 40 prompts. The complexity scales with the number of geographies and languages, not with the brand's size.
Should branded queries be included?
Yes — they reveal what AI models say spontaneously about the brand, including sentiment and hallucinations. Around 20 to 30% of the prompt bank should be branded; the rest stays non-branded to measure organic visibility.
Does an AEO audit replace an SEO audit?
No, they complement each other. Search authority still influences what AI models cite, especially in web-search mode. An AEO audit measures the AI surface; the SEO work continues to underpin it.
What does an AEO audit cost?
It varies with prompt volume, language coverage and model count. A self-served run on a platform like ours can be done with the 600 free credits offered at sign-up; agency-led audits with strategic interpretation are priced per project.
Is an AEO audit useful for a brand already strong in SEO?
Often more so. Strong SEO does not guarantee AI citations: the source-selection logic differs, and how clearly content is structured matters as much as where it ranks. See the AEO Checklist 2026 for the action layer.
Next step
An AEO audit is the foundation of any serious AI visibility programme — not the finish line. Once the baseline exists, the work shifts to closing the gaps the audit reveals: content, authority, structured data, partnerships. The audit also becomes a recurring measurement instrument that proves whether those efforts move the needle.
If you want to test the methodology described here, you can run a first AEO audit on AI Labs Audit with the 600 credits offered at sign-up. The output gives you the structured baseline, the competitor benchmark and the prioritised action plan described in this guide.