Known limits & measurement variability
Measuring a brand's visibility in generative AIs is not an exact science: the models are non-deterministic and evolve over time. Rather than hiding this reality behind a "clean" score, we document it — and we explain how our method accounts for it.
1. AI responses vary from one run to the next
The same prompt can yield different responses depending on when it is run and on the model's sampling randomness. On certain queries, the observed variability is typically on the order of 15 to 20%.
Our answer: a jury of several judges, robust aggregation, and a displayed Wilson confidence interval, so you see the margin of uncertainty and not just a single figure.
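To make the interval concrete, here is a minimal sketch of the standard Wilson score interval for a proportion, for example the share of runs in which a brand is mentioned. The function name, the 12-out-of-20 figures and the choice of z = 1.96 (roughly 95% confidence) are illustrative assumptions; the page does not state which proportion or confidence level the AGS report uses.

```python
import math


def wilson_interval(successes: int, trials: int, z: float = 1.96) -> tuple[float, float]:
    """Wilson score interval for a binomial proportion.

    z = 1.96 corresponds to roughly 95% confidence (an assumed default).
    """
    if trials <= 0:
        raise ValueError("trials must be positive")
    if not 0 <= successes <= trials:
        raise ValueError("successes must be between 0 and trials")

    p_hat = successes / trials
    denom = 1 + z**2 / trials
    centre = (p_hat + z**2 / (2 * trials)) / denom
    half_width = (
        z * math.sqrt(p_hat * (1 - p_hat) / trials + z**2 / (4 * trials**2)) / denom
    )
    # Clamp to [0, 1] to absorb floating-point noise at the extremes.
    return max(0.0, centre - half_width), min(1.0, centre + half_width)


# Hypothetical example: the brand is mentioned in 12 of 20 runs of one prompt.
low, high = wilson_interval(12, 20)
print(f"observed rate 0.60, interval {low:.2f} to {high:.2f}")
```

With only 20 runs the interval spans roughly 0.39 to 0.78, which is why a single figure without its margin would overstate what the measurement can support.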
2. Each model has its own behaviour
Models are not all equal: some cite technical sources more readily, others are less stable on local queries, and others vary depending on the language.
Our answer: a multi-model audit (several judges, dozens of models queried) with per-model detail in the report, rather than a single overall grade that would mask these differences.
3. Scores shift when providers change their models
When OpenAI, Google or Anthropic update a model, the responses — and therefore the scores — can shift without the brand having changed anything.
Our answer: the anchor set (benchmark brands re-measured continuously) isolates this model drift from the brand's actual performance, and the client's score is corrected accordingly.
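One way to picture the anchor-set idea is sketched below. It assumes a simple additive correction: the drift is estimated as the median score change across anchor brands measured in both periods, then subtracted from the client's raw score. The page does not specify the actual correction formula, so the function names, the median, the additive form and all figures are assumptions for illustration only.

```python
from statistics import median


def estimate_drift(baseline: dict[str, float], current: dict[str, float]) -> float:
    """Median score change across anchor brands present in both measurements.

    Assumption: anchor brands did not change their real-world visibility,
    so any common shift is attributed to the model update.
    """
    shared = sorted(baseline.keys() & current.keys())
    if not shared:
        raise ValueError("no anchor brand was measured in both periods")
    return median(current[brand] - baseline[brand] for brand in shared)


def correct_score(raw_score: float, drift: float) -> float:
    """Remove the estimated model drift from a raw score (additive assumption)."""
    return raw_score - drift


# Hypothetical anchor scores before and after a provider updated its model.
anchors_before = {"anchor_a": 60.0, "anchor_b": 45.0, "anchor_c": 70.0}
anchors_after = {"anchor_a": 57.0, "anchor_b": 41.0, "anchor_c": 67.0}

drift = estimate_drift(anchors_before, anchors_after)
client_corrected = correct_score(52.0, drift)
print(drift, client_corrected)
```

The median is used here only because it tolerates one anchor brand behaving unusually; a real pipeline could equally use a trimmed mean or a per-model estimate.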
4. A single point-in-time measurement is not enough
AI visibility is unstable over time: a single measurement is misleading.
Our answer: scheduled, recurring audits and trend tracking. We look at the evolution, not a snapshot.
5. The "web search" vs "model memory" share
A response changes depending on whether the model performed a web search (retrieval) or answered from its internal memory (parametric).
Our answer: a differential diagnosis, native (model memory) versus web (retrieval), that tells the two apart and shows which lever to pull.
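As a rough illustration of such a differential diagnosis, the sketch below compares the brand's mention rate when the model answers from memory with its rate when web search is enabled. The class, the three labels and the 0.15 gap threshold are hypothetical choices made for this example, not the rules applied in an actual audit.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ModeResult:
    """Mention counts for one answering mode (native memory or web search)."""
    mentions: int
    runs: int

    @property
    def rate(self) -> float:
        if self.runs <= 0:
            raise ValueError("runs must be positive")
        return self.mentions / self.runs


def diagnose(native: ModeResult, web: ModeResult, gap_threshold: float = 0.15) -> str:
    """Label which mode drives visibility, using an assumed gap threshold."""
    gap = web.rate - native.rate
    if gap > gap_threshold:
        return "web-led"      # visible mainly when the model retrieves pages
    if gap < -gap_threshold:
        return "memory-led"   # visible mainly from parametric knowledge
    return "balanced"


# Hypothetical counts: 4/20 mentions from memory, 12/20 with web search.
print(diagnose(ModeResult(4, 20), ModeResult(12, 20)))
```

A "web-led" result would point towards content the model can retrieve today, while a "memory-led" result would suggest the brand is already present in what the model learned during training.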
6. Results depend on how the prompt is worded
Two ways of asking the same question can produce different responses — and therefore different scores.
Our answer: standardised, versioned and documented prompts (a prompt taxonomy by intent). We don't improvise the question; we apply a reproducible protocol and follow the same grid over time.
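A versioned prompt can be represented as a small immutable record, as in the sketch below. The field names, the "recommendation" intent label, the key format and the template wording are all hypothetical; they only illustrate how fixing the wording per intent and version keeps the question identical from one audit to the next.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class PromptSpec:
    """One entry in a prompt taxonomy (illustrative structure)."""
    intent: str    # hypothetical taxonomy label, e.g. "recommendation"
    version: int   # bumped whenever the wording changes
    template: str  # str.format-style placeholders

    @property
    def key(self) -> str:
        """Stable identifier to store next to every measurement."""
        return f"{self.intent}@v{self.version}"

    def render(self, **fields: str) -> str:
        """Fill the template; raises KeyError if a placeholder is missing."""
        return self.template.format(**fields)


spec = PromptSpec(
    intent="recommendation",
    version=2,
    template="Which {category} brands would you recommend in {market}?",
)
prompt = spec.render(category="running shoe", market="Belgium")
print(spec.key, "->", prompt)
```

Storing the key with each result means two scores are only compared when they come from the same wording.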
7. Geographic and linguistic context changes the response
An AI may respond differently depending on the country (France, Belgium, Canada, United States, etc.) and the language.
Our answer: each audit is carried out in a documented linguistic and geographic context (audits by language, with market contextualisation and local competitors). We always specify which market the measurement was taken in, rather than reporting a context-free score.
What this honestly implies
The AGS is a calibrated and reproducible estimate, not an absolute truth. It is designed to be comparable over time (same judge_config_hash) and honest about its uncertainty (confidence intervals, GRC, corrected drift). Our goal is not a flattering figure, but a measurement that you — and your clients — can understand, verify and challenge.
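The idea behind comparing only scores that share a judge_config_hash can be sketched as a fingerprint of the judging configuration. How the real hash is computed is not stated here; the version below assumes a SHA-256 digest of canonical JSON, and the configuration fields are invented for the example.

```python
import hashlib
import json


def config_hash(config: dict) -> str:
    """SHA-256 of a canonical JSON encoding (assumed scheme, for illustration).

    sort_keys and fixed separators make the encoding independent of key
    order and whitespace, so equal configurations give equal hashes.
    """
    canonical = json.dumps(config, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()


# Hypothetical judge configurations: same content, different key order.
config_a = {"judges": ["judge_1", "judge_2"], "aggregation": "median", "prompt_set": "v2"}
config_b = {"prompt_set": "v2", "aggregation": "median", "judges": ["judge_1", "judge_2"]}
config_c = {**config_a, "aggregation": "mean"}

print(config_hash(config_a) == config_hash(config_b))  # same configuration
print(config_hash(config_a) == config_hash(config_c))  # aggregation changed
```

Any change to the judges, the aggregation rule or the prompt set yields a different hash, which signals that two scores should not be compared directly.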