Method & Transparency

The proof: from an AI response to an AGS score

Here, step by step, is how an AGS score is produced: we put a question to an AI model, retrieve its raw response, extract facts from it (is the brand there? in what position? which competitors? which claims?), have a jury of AIs grade it, and feed everything into the score. An anonymised example ("B2B IT Brand") follows.

Illustrative anonymised example: no real names, no URLs, no exact claims. The case is masked ("B2B IT Brand", competitors "A" and "B") and the prompts are reworded. The aim is to show the measurement mechanics, not to publish a client audit.

The pipeline

  1. PROMPT            2. AI RESPONSE       3. EXTRACTION        4. GRADING         5. REPORT
  (question      -->   (raw text      -->   (brand? rank?    -->  (AI jury:     -->  (P/I/Q ->
   asked)               from model)          competitors?         accuracy,          AGS + CI)
                                             claims?)              sentiment...)

Case no. 1: a positive mention… but an unverified claim

Prompt (reworded, anonymised)

"For enterprise IT infrastructure support and maintenance, which provider is best and why?"

AI response (excerpt, anonymised)

"B2B IT Brand comes out best for critical support… it highlights a predictive analytics tool that reportedly resolves 'over 80% of incidents automatically'… Competitor A remains competitive on cost…"

Extraction (what the system detects)

  Brand detected         Position   Competitors                  Sentiment              Source cited
  Yes (direct mention)   1st        Competitor A, Competitor B   positive (≈ 0.8 / 1)   none
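
To make the extraction step concrete, here is a minimal sketch in Python. It is an illustration under stated assumptions, not the production extractor: the names `Extraction` and `extract_mentions` are hypothetical, position is approximated by order of first appearance in the text, and sentiment and claim detection are left out.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Extraction:
    brand_detected: bool
    position: Optional[int]   # 1-based rank by first appearance, None if absent
    competitors: List[str]    # competitors found, in order of appearance


def extract_mentions(response: str, brand: str, known_competitors: List[str]) -> Extraction:
    """Hypothetical helper: detect names by case-insensitive substring search."""
    text = response.lower()
    # Character index of the first occurrence of each name present in the response.
    first_seen = {
        name: text.find(name.lower())
        for name in [brand, *known_competitors]
        if name.lower() in text
    }
    ranking = sorted(first_seen, key=first_seen.get)
    detected = brand in first_seen
    return Extraction(
        brand_detected=detected,
        position=ranking.index(brand) + 1 if detected else None,
        competitors=[name for name in ranking if name != brand],
    )


# Shortened version of the anonymised excerpt above (elisions kept as-is).
response = ("B2B IT Brand comes out best for critical support… "
            "Competitor A remains competitive on cost…")
result = extract_mentions(response, "B2B IT Brand", ["Competitor A", "Competitor B"])
print(result)
```

On this shortened excerpt only Competitor A is found, because Competitor B does not appear in the quoted passage; the table above reflects the full response.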

Grading by the AI jury

  • Accuracy: medium — one judge flags an unverifiable claim: "the '80% of incidents resolved automatically' figure is not verifiable / possibly outdated".
  • evidence_grounded: false (the response is not backed by any verifiable source).
  • Judge confidence: medium.
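
One way to combine several judges into the grades above can be sketched as follows. This is an assumption-laden illustration: the `Verdict` structure, the three-level accuracy scale, the median rule and the "grounded only if every judge agrees" rule are all hypothetical choices made for this sketch, and the individual verdicts are made-up values.

```python
from dataclasses import dataclass, field
from statistics import median_low
from typing import List

# Assumed ordinal scale for accuracy (hypothetical).
LEVELS = {"low": 0, "medium": 1, "high": 2}
NAMES = {value: name for name, value in LEVELS.items()}


@dataclass
class Verdict:
    accuracy: str                       # "low", "medium" or "high"
    evidence_grounded: bool
    flags: List[str] = field(default_factory=list)


def aggregate(verdicts: List[Verdict]) -> dict:
    """Combine judge verdicts: median accuracy, conservative grounding, all flags kept."""
    accuracy_level = median_low([LEVELS[v.accuracy] for v in verdicts])
    return {
        "accuracy": NAMES[accuracy_level],
        # Conservative assumption: one dissenting judge is enough to mark it ungrounded.
        "evidence_grounded": all(v.evidence_grounded for v in verdicts),
        # Every flag is surfaced, even if only one judge raised it.
        "flags": [flag for v in verdicts for flag in v.flags],
    }


# Illustrative verdicts only, not real jury output.
jury = [
    Verdict("medium", False, ["'80% of incidents resolved automatically' is not verifiable"]),
    Verdict("high", False),
    Verdict("medium", False),
]
summary = aggregate(jury)
print(summary)
```

Keeping every flag, rather than only those a majority agrees on, mirrors the case above where a single judge's objection is enough to reach the report.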

What this looks like in the report

The brand is mentioned in 1st position with positive sentiment, but brand-safety is lowered because of the unsupported claim: this is the classic "well cited but poorly backed" case. Without this analysis, you would simply have seen "cited as no. 1, great"; the system flags the risk instead.

Case no. 2: the brand isn't genuinely known (an honest case)

Prompt (reworded, anonymised)

"What can you tell me about the reputation of B2B IT Brand?"

AI response (excerpt, anonymised)

"I don't have any verifiable information about B2B IT Brand in my knowledge…"

Extraction & grading

  • Brand detected: the name is repeated, but is_genuinely_known: false (the model doesn't actually know the brand — it is paraphrasing the question).
  • Low accuracy, presence "by echo" rather than spontaneous.

In the report

This case does not inflate the score: "echo" presence (the model repeats the name it was given) is distinguished from spontaneous visibility. This is precisely what prevents a flattering but false score.
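
The echo / spontaneous distinction can be sketched as a small classification rule. This is a simplified assumption, not the actual detector: `classify_presence` is a hypothetical name, `is_genuinely_known` is taken as an input already produced by the jury, and the extra "prompted" label (brand named in the question and genuinely known) is a choice made for this sketch.

```python
def classify_presence(prompt: str, response: str, brand: str,
                      is_genuinely_known: bool) -> str:
    """Return 'absent', 'echo', 'prompted' or 'spontaneous' (hypothetical labels)."""
    in_prompt = brand.lower() in prompt.lower()
    in_response = brand.lower() in response.lower()
    if not in_response:
        return "absent"
    if not in_prompt:
        # The model brought up the brand on its own.
        return "spontaneous"
    # The brand was named in the question: repeating it proves nothing by itself.
    return "prompted" if is_genuinely_known else "echo"


case_1 = classify_presence(
    "For enterprise IT infrastructure support and maintenance, "
    "which provider is best and why?",
    "B2B IT Brand comes out best for critical support…",
    "B2B IT Brand",
    is_genuinely_known=True,
)
case_2 = classify_presence(
    "What can you tell me about the reputation of B2B IT Brand?",
    "I don't have any verifiable information about B2B IT Brand in my knowledge…",
    "B2B IT Brand",
    is_genuinely_known=False,
)
print(case_1, case_2)
```

Under this rule, only the first case would count toward spontaneous visibility; the second is recorded as an echo and adds nothing to the score.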

How these responses become a score

Each response goes through the jury (several AIs) and receives its P / I / Q grades. The AGS aggregates these grades using a geometric mean and is published with a confidence interval and the GRC (inter-judge agreement). For the detail of the formula, see the AGS Methodology page. For the limits we own up to, see the Limits & variability page.
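
As a rough illustration of the aggregation, the sketch below computes an equal-weight geometric mean of three grades and a percentile bootstrap confidence interval. Everything specific here is an assumption for the example: grades scaled to [0, 1], equal weights, averaging each grade across responses before taking the geometric mean, the bootstrap itself, and the made-up grade values. The actual formula is the one on the AGS Methodology page.

```python
import math
import random
from typing import List, Tuple

Grades = Tuple[float, float, float]  # (P, I, Q) for one response, each assumed in [0, 1]


def geometric_mean(values: List[float]) -> float:
    return math.prod(values) ** (1.0 / len(values))


def score(grades: List[Grades]) -> float:
    """Average each grade across responses, then take the geometric mean of the three."""
    column_means = [sum(column) / len(column) for column in zip(*grades)]
    return geometric_mean(column_means)


def bootstrap_ci(grades: List[Grades], n_resamples: int = 2000,
                 alpha: float = 0.05, seed: int = 0) -> Tuple[float, float]:
    """Percentile bootstrap: resample responses with replacement and rescore."""
    rng = random.Random(seed)
    scores = sorted(
        score(rng.choices(grades, k=len(grades))) for _ in range(n_resamples)
    )
    lower = scores[int((alpha / 2) * n_resamples)]
    upper = scores[int((1 - alpha / 2) * n_resamples) - 1]
    return lower, upper


# Made-up grades for four responses, for illustration only.
grades = [(0.9, 0.4, 0.6), (0.8, 0.5, 0.7), (0.2, 0.1, 0.3), (1.0, 0.6, 0.8)]
single = geometric_mean([0.9, 0.4, 0.6])      # cube root of 0.216, about 0.6
collapsed = geometric_mean([0.9, 0.0, 0.6])   # a single zero grade gives 0.0
point = score(grades)
low, high = bootstrap_ci(grades)
print(round(point, 3), round(low, 3), round(high, 3))
```

A property of the geometric mean worth noting: unlike an arithmetic mean, one very low grade pulls the whole score down, so a strong grade on one axis cannot fully compensate for a weak one on another.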

A note of honesty on this page

This example is anonymised and reworded to illustrate the mechanics without exposing a real client. The figures and wording are representative, not a named audit. Soon, we will publish a self-audit of AI Labs Audit (us, by name), with real raw data — including our weak points.
