Method & Transparency

Known limits & measurement variability

Measuring a brand's visibility in generative AIs is not an exact science: the models are non-deterministic and evolve over time. Rather than hiding this reality behind a "clean" score, we document it — and we explain how our method accounts for it.

1. AI responses vary from one run to the next

The same prompt can yield different responses depending on when it is run and on the model's sampling randomness (the observed variability is typically on the order of 15 to 20% on certain queries).

Our answer: a jury of several judges, robust aggregation, and a Wilson confidence interval shown alongside the score, so you see the margin of uncertainty and not just a single figure.
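To make the Wilson confidence interval mentioned above concrete, here is a minimal sketch in Python. It assumes each run is reduced to a yes/no outcome ("the brand was cited") and that judges are combined by a simple majority vote. Both choices, and the names `majority_vote` and `wilson_interval`, are illustrative assumptions and not a description of the actual AGS pipeline. Only the Wilson formula itself is standard.

```python
import math


def majority_vote(votes: list[bool]) -> bool:
    """Combine several judges' yes/no verdicts for one run (assumed aggregation rule)."""
    return sum(votes) * 2 > len(votes)


def wilson_interval(successes: int, trials: int, z: float = 1.96) -> tuple[float, float]:
    """Wilson score interval for a proportion; z = 1.96 gives roughly 95% coverage."""
    if trials <= 0:
        raise ValueError("trials must be positive")
    if not 0 <= successes <= trials:
        raise ValueError("successes must be between 0 and trials")
    p = successes / trials
    denom = 1 + z**2 / trials
    centre = (p + z**2 / (2 * trials)) / denom
    half_width = z * math.sqrt(p * (1 - p) / trials + z**2 / (4 * trials**2)) / denom
    return max(0.0, centre - half_width), min(1.0, centre + half_width)


# Hypothetical data: 20 runs of one prompt, each scored by three judges.
runs = [[True, True, False]] * 14 + [[False, False, True]] * 6
hits = sum(majority_vote(votes) for votes in runs)
low, high = wilson_interval(hits, len(runs))
print(f"cited in {hits}/{len(runs)} runs, 95% interval: {low:.3f} to {high:.3f}")
```

With only 20 runs the interval is wide (roughly 0.48 to 0.85 for 14 hits), which is why a single figure without its interval would overstate the precision.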

2. Each model has its own behaviour

Models are not all equal: some cite technical sources more readily, others are less stable on local queries, and others vary depending on the language.

Our answer: a multi-model audit (several judges, dozens of models queried) with per-model detail in the report, rather than a single overall grade that would mask these differences.

3. Scores shift when providers change their models

When OpenAI, Google or Anthropic update a model, the responses — and therefore the scores — can shift without the brand having changed anything.

Our answer: the anchor set (benchmark brands re-measured continuously) isolates this model drift from the brand's actual performance, and the client's score is corrected accordingly.
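One way to picture the anchor-set correction is the sketch below. It assumes drift is estimated as the median change in the anchor brands' scores between two measurement dates, and that the correction is a plain subtraction. The actual correction formula is not specified here, so `estimate_drift`, `correct_score` and the additive model are hypothetical.

```python
from statistics import median


def estimate_drift(anchor_before: dict[str, float], anchor_after: dict[str, float]) -> float:
    """Median score change across anchor brands measured at both dates (assumed estimator)."""
    common = anchor_before.keys() & anchor_after.keys()
    if not common:
        raise ValueError("no anchor brand was measured at both dates")
    return median(anchor_after[brand] - anchor_before[brand] for brand in common)


def correct_score(raw_score: float, drift: float) -> float:
    """Remove the estimated model drift from a client's raw score (assumed additive model)."""
    return raw_score - drift


# Hypothetical anchor scores before and after a provider updates its model.
before = {"anchor_a": 60.0, "anchor_b": 40.0, "anchor_c": 75.0}
after = {"anchor_a": 56.0, "anchor_b": 35.0, "anchor_c": 71.0}

drift = estimate_drift(before, after)   # anchors dropped by about 4 points
corrected = correct_score(48.0, drift)  # client's raw post-update score is 48
print(drift, corrected)
```

The median is used so that one anchor brand with a genuine change of its own does not distort the drift estimate.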

4. A single point-in-time measurement is not enough

AI visibility is unstable over time: a single measurement is misleading.

Our answer: scheduled, recurring audits and trend tracking. We look at the evolution, not a snapshot.

5. The "web search" vs "model memory" share

A response changes depending on whether the model performed a web search (retrieval) or answered from its internal memory (parametric).

Our answer: a differential diagnosis that compares native responses (no web search) with web-search responses, to tell the two apart and know which lever to pull.
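As a rough illustration of the native versus web comparison, the sketch below computes how often a brand is mentioned in each mode and takes the difference. It assumes responses have already been collected in both modes, and it detects mentions with a crude case-insensitive substring match. The function names and the brand "Acme" are hypothetical, and the real diagnosis may rely on entirely different signals.

```python
def mention_rate(responses: list[str], brand: str) -> float:
    """Share of responses that mention the brand (naive substring match, for illustration)."""
    if not responses:
        raise ValueError("responses must not be empty")
    hits = sum(brand.lower() in response.lower() for response in responses)
    return hits / len(responses)


def native_vs_web_gap(native: list[str], web: list[str], brand: str) -> float:
    """Positive gap: the brand surfaces more with web search than from model memory."""
    return mention_rate(web, brand) - mention_rate(native, brand)


# Hypothetical responses to the same prompt in both modes.
native_responses = ["Try Acme or Globex", "Globex is popular", "Consider Initech", "Globex"]
web_responses = ["Acme leads the market", "Acme and Globex", "Acme", "Globex"]

gap = native_vs_web_gap(native_responses, web_responses, "Acme")
print(f"native vs web gap: {gap:+.2f}")
```

A large positive gap would suggest the brand is findable through retrieval but weakly present in the model's memory, and a negative gap the opposite.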

6. Results depend on how the prompt is worded

Two ways of asking the same question can produce different responses — and therefore different scores.

Our answer: standardised, versioned and documented prompts (a prompt taxonomy by intent). We don't improvise the question; we apply a reproducible protocol and follow the same grid over time.

7. Geographic and linguistic context changes the response

An AI may respond differently depending on the country (France, Belgium, Canada, United States, etc.) and the language.

Our answer: each audit is carried out in a documented linguistic and geographic context (audits by language with market contextualisation and local competitors). We always specify which market the measurement was taken in, rather than giving a context-free score.

What this honestly implies

The AGS is a calibrated and reproducible estimate, not an absolute truth. It is designed to be comparable over time (same judge_config_hash) and honest about its uncertainty (confidence intervals, GRC, corrected drift). Our goal is not a flattering figure, but a measurement that you — and your clients — can understand, verify and challenge.
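The idea behind comparing scores only under the same `judge_config_hash` can be sketched as follows. How that hash is actually computed is not documented here; this example assumes a SHA-256 digest over a canonical JSON serialisation of the configuration, and the configuration fields shown are invented for illustration.

```python
import hashlib
import json


def config_hash(config: dict) -> str:
    """Stable fingerprint of a configuration: key order does not affect the result."""
    canonical = json.dumps(config, sort_keys=True, separators=(",", ":"), ensure_ascii=True)
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()


# Hypothetical judge configurations; the field names are assumptions.
config_march = {"judges": ["judge-1", "judge-2", "judge-3"], "prompt_set": "v3", "runs": 20}
config_april = {"runs": 20, "prompt_set": "v3", "judges": ["judge-1", "judge-2", "judge-3"]}
config_changed = {"judges": ["judge-1", "judge-2", "judge-3"], "prompt_set": "v4", "runs": 20}

# Same settings in a different key order give the same hash, so scores are comparable.
print(config_hash(config_march) == config_hash(config_april))
# A changed prompt set gives a different hash, flagging a break in comparability.
print(config_hash(config_march) == config_hash(config_changed))
```

Storing such a fingerprint next to every score makes it easy to check that two measurements were produced under identical settings before comparing them.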
