A GEO (Generative Engine Optimization) audit is not just an AEO audit with a different label. In 2026, generative engines like ChatGPT, Claude, Gemini, Perplexity and Google AI Overviews do far more than cite sources: they rewrite, summarise, recommend and sometimes invent facts about your brand. A serious GEO audit measures all of that, including hallucinations and narrative sentiment. This guide gives you the full methodology our team uses inside AI Labs Audit to run a defensible, repeatable GEO audit.
GEO vs AEO vs traditional audit: what actually changes
Many practitioners use AEO and GEO interchangeably. They are related, but the scope is not the same. For the conceptual definition, see our guide on what GEO is. From a methodology standpoint, here is how the three audits differ.
- Traditional SEO/web audit: technical crawl, on-page signals, backlinks, rankings on Google or Bing. It tells you if a search engine can find and rank you.
- AEO audit: focuses on whether answer engines cite you. It looks at citation rate, mention rate, source URLs. See our AEO audit guide for the complementary angle.
- GEO audit: covers everything an AEO audit does, plus the generative output itself. What is the AI actually saying about your brand? Is it accurate? Is the sentiment positive? Are URLs hallucinated? Does it recommend you against competitors in a multimodal context (text, voice, AI Overviews)?
In short: AEO asks "am I cited?", while GEO asks "am I cited, well represented, recommended, and free of dangerous hallucinations?" A GEO audit therefore needs to look at a wider surface area, including AI Overviews, voice assistants and image-grounded answers.
The 7 dimensions a serious GEO audit must measure
An audit that only reports "we are cited 32% of the time" is incomplete. A robust GEO audit in 2026 measures seven distinct dimensions, each producing its own KPI.
1. Citation rate
Percentage of prompts where the AI explicitly cites your URL or domain. Strong signal of source authority. Best correlated with Bing top-10 ranking and structured data quality.
2. Mention rate
Percentage of prompts where the brand name appears, even without a URL. A brand can be mentioned without being cited as a source, which is itself useful information.
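As a rough illustration of how these first two KPIs could be computed, here is a minimal sketch. It assumes the model responses have already been collected as plain strings; the brand "Acme", the domain "acme.example" and the helper names are hypothetical, and only explicit http(s) URLs are counted as citations.

```python
import re
from urllib.parse import urlparse

# Simplified URL matcher: stops at whitespace and common closing punctuation.
URL_RE = re.compile(r"https?://[^\s)\]>\"']+")

def cites_domain(text, domain):
    """True if the text contains a URL on the domain or one of its subdomains."""
    domain = domain.lower()
    for raw in URL_RE.findall(text):
        try:
            host = (urlparse(raw.rstrip(".,;:!?")).hostname or "").lower()
        except ValueError:
            continue  # malformed URL, ignore it
        if host == domain or host.endswith("." + domain):
            return True
    return False

def mentions_brand(text, brand):
    """True if the brand name appears as a whole word, case-insensitively."""
    return re.search(r"\b" + re.escape(brand) + r"\b", text, re.IGNORECASE) is not None

def rate(responses, predicate):
    """Share of responses (0.0 to 1.0) for which the predicate holds."""
    if not responses:
        return 0.0
    return sum(1 for r in responses if predicate(r)) / len(responses)

responses = [
    "Acme is a solid choice, see https://www.acme.example/pricing.",
    "Popular options include Acme and Globex.",
    "Globex leads the category (https://globex.example).",
    "I would look at Initech first.",
]
citation = rate(responses, lambda r: cites_domain(r, "acme.example"))
mention = rate(responses, lambda r: mentions_brand(r, "Acme"))
print(f"citation rate: {citation:.0%}, mention rate: {mention:.0%}")
```

Keeping the two predicates separate makes the gap between "mentioned" and "cited as a source" visible in the report.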
3. Share of Voice (SoV)
Your mentions divided by the total mentions across the competitive set. Read more in our deep dive on AI Share of Voice. SoV is the single best metric to track over time.
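The ratio can be sketched in a few lines. This is an illustrative version only: it assumes each brand counts at most once per response, and the brand names and sample responses are invented.

```python
import re
from collections import Counter

def share_of_voice(responses, brands):
    """Return each brand's share of all brand mentions across the responses.

    Assumption: a brand is counted at most once per response.
    """
    counts = Counter()
    for text in responses:
        for brand in brands:
            if re.search(r"\b" + re.escape(brand) + r"\b", text, re.IGNORECASE):
                counts[brand] += 1
    total = sum(counts.values())
    return {brand: (counts[brand] / total if total else 0.0) for brand in brands}

responses = [
    "Acme is a solid choice for small teams.",
    "Popular options include Acme and Globex.",
    "Globex leads the category.",
    "I would look at Initech first.",
]
for brand, sov in share_of_voice(responses, ["Acme", "Globex", "Initech"]).items():
    print(f"{brand}: {sov:.0%}")
```

Because the denominator is the total across the competitive set, the shares sum to 1 whenever at least one brand is mentioned.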
4. Sentiment
Tone of the generative output when your brand is discussed: positive, neutral or negative. Sentiment matters because a response that mentions your brand unfavourably while recommending a competitor is functionally worse than not being mentioned at all.
5. Source authority
Which URLs and domains does the model rely on when discussing your category? The answer reveals which third-party sources you must influence (press, Wikipedia, review platforms). See our analysis of source authority.
6. Hallucination rate
Percentage of generated outputs containing factual errors about your brand, invented features, fake URLs or fabricated quotes. This is the brand safety dimension specific to GEO. Detailed in our work on hallucinated URLs.
7. Position relative to competitors
When you are cited alongside competitors, are you first, second or last? In list-style answers, position correlates with click-through to the brand.
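One simple way to approximate position is to rank brands by where they first appear in the answer. The sketch below takes that approach; it is an assumption rather than a standard definition, and the brand names are hypothetical.

```python
def brand_position(text, brands, target):
    """1-based rank of target by order of first appearance, or None if absent."""
    lowered = text.lower()
    first_seen = {}
    for brand in brands:
        index = lowered.find(brand.lower())
        if index != -1:
            first_seen[brand] = index
    if target not in first_seen:
        return None
    ordered = sorted(first_seen, key=first_seen.get)
    return ordered.index(target) + 1

def mean_position(answers, brands, target):
    """Average position over the answers in which the target appears."""
    positions = [brand_position(a, brands, target) for a in answers]
    positions = [p for p in positions if p is not None]
    return sum(positions) / len(positions) if positions else None

answers = [
    "For invoicing, consider: 1. Globex 2. Acme 3. Initech",
    "Acme and Globex both work well here.",
]
print(mean_position(answers, ["Acme", "Globex", "Initech"], "Acme"))
```

Substring matching is crude (it ignores word boundaries), so a production version would likely reuse the same brand matcher as the mention-rate KPI.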
GEO audit methodology in 8 steps
Below is the methodology our team applies internally and that we have packaged into the AI Labs Audit platform. It is engine-agnostic; you can run it manually or with a tool.
Step 1: Scoping
Define the brand, the geographies, the languages and the category. A French B2B SaaS audited in English from the United States produces almost no signal. Scope determines everything that follows.
Step 2: Prompt design
Generate a prompt corpus that reflects how real users speak to AI. Mix discovery prompts ("who are the leading players in X?"), comparative prompts ("X vs Y"), recommendation prompts ("which tool for Z?") and factual prompts ("when was X founded?"). Inside AI Labs Audit, prompts are generated by an AI based on the client brief - because who better than an AI to interrogate another AI?
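To make the four prompt families concrete, here is a minimal template-based sketch. It is a deliberately simpler stand-in for AI-generated prompts, not a description of how AI Labs Audit builds its corpus; the function name, brief fields and sample values are all hypothetical.

```python
def build_corpus(brand, category, competitors, use_cases):
    """Return (prompt_type, prompt) pairs covering the four prompt families."""
    corpus = [("discovery", f"Who are the leading players in {category}?")]
    for competitor in competitors:
        corpus.append(("comparative", f"{brand} vs {competitor}, which one should I choose?"))
    for use_case in use_cases:
        corpus.append(("recommendation", f"Which tool for {use_case}?"))
    corpus.append(("factual", f"When was {brand} founded?"))
    corpus.append(("factual", f"How much does {brand} cost?"))
    return corpus

corpus = build_corpus(
    brand="Acme",
    category="invoicing software",
    competitors=["Globex", "Initech"],
    use_cases=["freelancers"],
)
for prompt_type, prompt in corpus:
    print(f"[{prompt_type}] {prompt}")
```

Tagging each prompt with its family makes it possible to report KPIs per family later, rather than as one blended score.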
Step 3: Multi-model runs
Query at least 5 generative engines per prompt. A single-engine audit is not statistically defensible in 2026 because each model draws on different training data and a different search backend, so its results do not generalise to the others.
Step 4: Native vs web testing
Run each prompt twice: once with web search disabled (native knowledge) and once with web search enabled (RAG mode). The gap between the two scores is critical, as explained in our native vs web score analysis.
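The gap itself is simple arithmetic. The sketch below assumes each run has already been reduced to a boolean per prompt (for example "brand was cited"); the function name and sample values are illustrative.

```python
def mode_gap(native_hits, web_hits):
    """Compare hit rates with web search disabled (native) and enabled (web).

    Each argument is a list of booleans, one per prompt.
    A positive gap means visibility comes mostly from live web retrieval.
    """
    native = sum(native_hits) / len(native_hits)
    web = sum(web_hits) / len(web_hits)
    return {"native": native, "web": web, "gap": web - native}

# Same four prompts, run once in each mode (made-up results).
result = mode_gap(
    native_hits=[True, False, False, False],
    web_hits=[True, True, True, False],
)
print(result)
```

Reporting the two rates side by side, rather than a single blended average, keeps the source of visibility explicit.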
Step 5: Scoring
Apply the seven dimensions to every response. Automated scoring (regex + LLM classifier) is far more reliable than manual reading at scale.
Step 6: Hallucination detection
Flag every URL the model produced and crawl it. Any URL that returns a 404 is treated as a hallucinated URL, after ruling out pages that once existed and were later removed. Cross-check factual claims about the brand against the official website. This step is the most often skipped, and the most dangerous to skip.
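The URL part of this step could be sketched with the Python standard library as follows. The function names are hypothetical, HEAD requests are an assumption (some servers reject them), and a flagged URL is only a candidate: a 404 can also be a page that was removed.

```python
import re
import urllib.error
import urllib.request

URL_RE = re.compile(r"https?://[^\s)\]>\"']+")

def extract_urls(text):
    """Pull http(s) URLs out of a response, trimming trailing punctuation."""
    return [u.rstrip(".,;:!?") for u in URL_RE.findall(text)]

def http_status(url, timeout=10):
    """Return the HTTP status code, or None if the request fails outright."""
    request = urllib.request.Request(url, method="HEAD")
    try:
        with urllib.request.urlopen(request, timeout=timeout) as response:
            return response.status
    except urllib.error.HTTPError as err:
        return err.code  # e.g. 404, 410, 500
    except (OSError, ValueError):
        return None  # DNS failure, timeout, refused connection, bad URL

def flag_urls(text, fetch_status=http_status):
    """Return (url, status) pairs for URLs that are unreachable or return 404."""
    flagged = []
    for url in extract_urls(text):
        status = fetch_status(url)
        if status is None or status == 404:
            flagged.append((url, status))
    return flagged
```

Passing the fetcher in as a parameter means the logic can be tested offline, for example `flag_urls(answer, fetch_status=known_statuses.get)` with a dictionary of known status codes.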
Step 7: Competitive benchmark
Run the same prompts targeting 3 to 5 competitors. Without a baseline, raw scores are meaningless.
Step 8: Action plan
Translate findings into prioritised actions: schema markup, content rewrites, third-party placements, Wikipedia/Wikidata work, FAQ pages. The plan should rank actions by expected lift per unit of effort.
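Ranking by expected lift per unit of effort is a plain sort. In the sketch below the lift and effort scores are invented placeholders on an arbitrary scale; how they are estimated is outside the scope of this example.

```python
def prioritise(actions):
    """Sort actions by expected lift per unit of effort, highest first."""
    return sorted(actions, key=lambda a: a["lift"] / a["effort"], reverse=True)

# Hypothetical scores: lift and effort on an arbitrary scale.
actions = [
    {"name": "Content rewrites", "lift": 5, "effort": 5},
    {"name": "Schema markup", "lift": 3, "effort": 1},
    {"name": "Wikipedia/Wikidata work", "lift": 4, "effort": 2},
]
for action in prioritise(actions):
    ratio = action["lift"] / action["effort"]
    print(f"{action['name']}: {ratio:.1f}")
```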
The sensitive topic: detecting hallucinations about your brand
Hallucinations are the single most underestimated risk in GEO. A model can confidently state that your product offers a feature it does not have, quote a fictional case study, or send users to a URL that returns 404. Worse, it can attribute statements to your CEO that were never made.
A GEO audit must therefore include systematic hallucination detection across three layers:
- URL hallucinations: every URL the model generates is crawled. Any 404 or domain mismatch is logged.
- Factual hallucinations: pricing, founding date, headcount, features, certifications. Cross-checked against the official site and a curated knowledge base.
- Attribution hallucinations: fake quotes, invented partnerships, fabricated awards.
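For the factual layer, the cross-check against a curated knowledge base might look like the sketch below. It assumes the claims have already been extracted from the model output into field/value pairs (that extraction step is not shown), and the field names and values are hypothetical.

```python
def check_claims(claims, knowledge_base):
    """Compare extracted claims with curated facts.

    Returns (field, claimed, expected, verdict) tuples for every claim that
    contradicts the knowledge base ("mismatch") or cannot be checked against
    it ("unverifiable"). Matching claims are not returned.
    """
    findings = []
    for field, claimed in claims.items():
        if field not in knowledge_base:
            findings.append((field, claimed, None, "unverifiable"))
            continue
        expected = knowledge_base[field]
        if str(claimed).strip().lower() != str(expected).strip().lower():
            findings.append((field, claimed, expected, "mismatch"))
    return findings

knowledge_base = {"founded": "2015", "soc2": "yes"}
claims = {"founded": "2012", "soc2": "Yes", "iso27001": "yes"}
for finding in check_claims(claims, knowledge_base):
    print(finding)
```

Unverifiable claims are kept separate from mismatches because they usually need manual review: the model may be inventing a certification, or the knowledge base may simply be incomplete.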
The brand safety dimension of GEO is covered in depth in our AI brand safety guide. Skipping this step means delivering an audit that misses the most legally and reputationally sensitive issues.
Multi-model GEO audit: how many engines should you test?
The most common methodological mistake is testing a single engine, usually ChatGPT, and calling it a GEO audit. The reality is that ChatGPT, Claude, Gemini and Perplexity behave very differently because they rely on different training data and different web backends.
- 1 model: anecdotal, not statistically defensible.
- 3 models: minimum acceptable for a paid audit, covers the three dominant providers.
- 5 to 10 models: the AI Labs Audit standard, captures regional engines and reasoning variants.
- 50+ models: the platform tests against 50+ generative engines because each variant of a model family can give different answers, particularly for sensitive prompts.
Sample stability matters too. Generative models are non-deterministic: the same prompt sent to the same model several times can return different answers. A serious GEO audit therefore runs each prompt several times and reports an average across runs, not a single point-in-time snapshot.
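Averaging across runs can be sketched with the standard library. The per-run scores below are made up, and reporting the spread alongside the mean is a suggestion rather than part of the methodology described here.

```python
from statistics import mean, pstdev

def summarise_runs(run_scores):
    """Summarise one KPI measured over repeated runs of the same prompt set."""
    return {
        "runs": len(run_scores),
        "mean": mean(run_scores),
        "stdev": pstdev(run_scores),  # population standard deviation across runs
    }

# Citation rate observed on three repeated runs (illustrative values).
summary = summarise_runs([0.5, 0.25, 0.75])
print(summary)
```

A large standard deviation relative to the mean is a sign that more runs are needed before the figure is reported.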
Which prompts should you use in a GEO audit?
The audit is only as good as the prompts. A bad prompt corpus produces statistically meaningless results regardless of how many models you query.
Discovery prompts
"Who are the leading providers of X in Europe?" These prompts test whether you appear in the consideration set at all. They are the hardest prompts to win because they generate long lists where only the top names get cited.
Comparative prompts
"X vs Y, which one should I choose?" These prompts test how the model frames your brand relative to a specific competitor. Sentiment and feature framing matter as much as citation here.
Recommendation prompts
"What is the best tool for Z?" High commercial intent. These prompts are the most valuable to win because users querying them are close to a purchase decision.
Factual prompts
"When was X founded?" or "How much does X cost?" These prompts test the model's factual accuracy about your brand and reveal hallucinations.
For a deeper look at AI-generated prompt corpora, see our analysis on tailored GEO audits.
What a GEO audit report should actually contain
A GEO audit deliverable that an agency can hand to a client should always include the following sections.
- Executive summary: 1 page with the seven KPIs and one headline finding per dimension.
- Per-model breakdown: how the brand performs on each engine separately.
- Competitive benchmark: Share of Voice chart against the agreed competitor set.
- Hallucination log: every flagged hallucination with the prompt, the model, the date and the corrected version.
- Source authority map: top 20 URLs the model relies on for your category.
- Prioritised action plan: 10 to 20 actions ranked by expected lift per effort.
- Re-audit schedule: the audit must be repeated at a defined cadence because LLMs and their web backends update continuously.
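The hallucination log entry described in the list above maps naturally onto a small record type. The sketch below is one possible shape; the class name, field names and sample values are hypothetical.

```python
from dataclasses import asdict, dataclass
from datetime import date

@dataclass(frozen=True)
class HallucinationLogEntry:
    prompt: str             # the prompt that triggered the output
    model: str              # engine that produced it
    observed_on: date       # date of the run
    flagged_claim: str      # what the model said
    corrected_version: str  # what the report says instead

def to_row(entry):
    """Flatten an entry into a dict of strings, e.g. for CSV or PDF export."""
    row = asdict(entry)
    row["observed_on"] = entry.observed_on.isoformat()
    return row

entry = HallucinationLogEntry(
    prompt="When was Acme founded?",
    model="example-model",
    observed_on=date(2026, 3, 2),
    flagged_claim="Acme was founded in 2012.",
    corrected_version="Acme was founded in 2015.",
)
print(to_row(entry))
```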
For agencies, the report is typically delivered as a PDF. Our team has published a detailed view on premium PDF reports for AI visibility audits.
Common mistakes when running a GEO audit
- Testing a single model: produces a partial picture that may be flatly wrong.
- No competitive baseline: without competitors, KPIs have no meaning.
- Skipping hallucination detection: the most dangerous shortcut, since it hides the brand-safety risk.
- One-shot audits: GEO results drift weekly. A one-off audit is obsolete within 30 days. Use scheduled audits instead.
- Mixing native and web modes: combining the two scores in a single average hides where the visibility actually comes from.
- No prompt diversity: testing only recommendation prompts under-represents the discovery and factual layers.
Tools to conduct a GEO audit
You can run a GEO audit manually by querying each model through its consumer interface, logging results in a spreadsheet, and crawling the URLs by hand. It works for a one-off proof of concept, but it does not scale and it cannot be repeated reliably.
Dedicated platforms - AI Labs Audit, Profound, Otterly and others - automate the heavy lifting: multi-model querying, scoring, hallucination detection, competitive benchmarking and PDF delivery. AI Labs Audit is the European option, built for agencies, with white-label PDF reports, a read-only client portal, 50+ engines tested and 600 credits offered on signup so you can run a full audit before committing. See the broader comparison in our review of AEO/GEO monitoring tools.
FAQ: GEO audit
How long does a GEO audit take?
A manual GEO audit on 50 prompts across 5 models takes one to two days of consultant time. The same audit through an automated platform takes minutes to launch and produces a PDF report within an hour.
How often should I re-run a GEO audit?
Monthly at minimum. LLMs update their backends continuously, and a brand that was cited 40% of the time in March can drop to 12% in April after a model update.
Is a GEO audit different from an AEO audit?
Yes. AEO focuses on citation. GEO adds the generative dimension: sentiment, hallucinations, narrative framing, AI Overviews, multimodal answers. The two audits are complementary.
Can I trust a GEO audit that tested only ChatGPT?
No. ChatGPT behaves very differently from Claude, Gemini and Perplexity. A single-model audit is anecdotal and should not be used for strategic decisions.
How do I detect hallucinations at scale?
Automated URL crawling for hallucinated links, plus an LLM-as-judge classifier that cross-checks factual claims against a curated brand knowledge base. Manual review confirms the most sensitive flags.
Do generative engines remember previous queries?
Within a single session, sometimes. Across sessions, generally not, unless an account-level memory feature is enabled. Each GEO audit run should therefore start from a clean slate, which is why repeatable methodology matters more than clever prompt phrasing.
Conclusion: GEO audit as a recurring discipline
A GEO audit in 2026 is not a one-off exercise. Generative engines change too fast and their backends shift weekly. The right mental model is closer to financial reporting than to a launch checklist: measure, benchmark, act, repeat.
If you want to skip the manual setup, you can launch a full GEO audit on AI Labs Audit with the 600 free credits offered on signup. The platform was built by our European team for agencies and consultants who need a defensible, repeatable methodology and a white-label PDF deliverable they can hand to clients.