AGS Methodology — AI Grading System | AI Labs Audit

AGS (AI Grading System) is AI Labs Audit's scoring engine. Every AI response is graded by 5 AI judges that are calibrated against one another, and AGS then publishes an inter-judge reliability coefficient so you can see how defensible your score is.

What is AGS?

AGS is an open-source multi-judge scoring protocol. Relying on a single LLM to assess your brand's visibility exposes the score to that model's bias, hallucinations, and drift. Instead, AGS queries 5 AI judges in parallel (GPT-4o, Claude Sonnet, Gemini Pro, Mistral Large, Llama 3.1) and publishes the spread between their scores. The smaller the spread, the more reliable the score.
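A minimal Python sketch of the parallel fan-out, assuming each judge can be wrapped as an async function that returns a numeric score. The judges below are stubs with fixed scores standing in for real API calls. The names `make_stub_judge`, `grade_in_parallel`, and `spread` are hypothetical, and since the page does not say which spread statistic AGS publishes, this sketch uses the simple max-minus-min range.

```python
import asyncio
from typing import Awaitable, Callable

# Hypothetical judge signature: (prompt, AI response) -> score
Judge = Callable[[str, str], Awaitable[float]]

JUDGE_NAMES = ["gpt-4o", "claude-sonnet", "gemini-pro", "mistral-large", "llama-3.1"]


def make_stub_judge(score: float) -> Judge:
    """Return a fake judge that always answers with a fixed score (stands in for an API call)."""
    async def judge(prompt: str, response: str) -> float:
        await asyncio.sleep(0)  # yield to the event loop, as a network call would
        return score
    return judge


async def grade_in_parallel(judges: dict[str, Judge], prompt: str, response: str) -> dict[str, float]:
    """Send the same instructions to every judge concurrently and collect their scores."""
    names = list(judges)
    scores = await asyncio.gather(*(judges[name](prompt, response) for name in names))
    return dict(zip(names, scores))


def spread(scores: dict[str, float]) -> float:
    """Distance between the most and the least generous judge (assumed spread measure)."""
    values = list(scores.values())
    return max(values) - min(values)


if __name__ == "__main__":
    stub_scores = [78.0, 82.0, 80.0, 75.0, 85.0]
    judges = {name: make_stub_judge(s) for name, s in zip(JUDGE_NAMES, stub_scores)}
    scores = asyncio.run(grade_in_parallel(judges, "Best CRM for SMBs?", "Example answer text"))
    print(scores, spread(scores))
```

`asyncio.gather` keeps the results in the order the coroutines were passed, which is why zipping them back onto `names` is safe.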

The 3 evaluated dimensions

  • P (Precision): does the answer mention your brand correctly, without confusion with a competitor or homonym? Measures hallucinations and attribution errors.
  • I (Informativeness): does the answer provide useful and differentiating information about your brand, or just name it? Measures the depth of the citation.
  • Q (Quality): is the answer factually correct and up-to-date? Measures information freshness and conformity to verifiable facts.

Evaluation protocol

For each audited prompt, AGS makes 5 parallel calls to the AI judges with identical instructions (zero-shot scoring). The scores are aggregated as a weighted average, with each judge weighted by its declared confidence. The final result includes the weighted average score, the inter-judge standard deviation, and a 95% bootstrap confidence interval.
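The aggregation step can be sketched as follows. This is an illustration under assumptions, not the AGS implementation: scores are assumed to be numbers on a 0-100 scale, each judge's declared confidence is assumed to be a positive weight, and the interval is a plain percentile bootstrap that resamples judges with replacement. `JudgeScore`, `weighted_mean`, `bootstrap_ci`, and `aggregate` are hypothetical names.

```python
import random
import statistics
from dataclasses import dataclass


@dataclass
class JudgeScore:
    judge: str
    score: float       # the judge's grade for one prompt (0-100 scale assumed)
    confidence: float  # the judge's declared confidence, assumed > 0


def weighted_mean(items: list[JudgeScore]) -> float:
    """Average of the scores, each weighted by the judge's declared confidence."""
    total_weight = sum(i.confidence for i in items)
    return sum(i.score * i.confidence for i in items) / total_weight


def bootstrap_ci(items: list[JudgeScore], n_resamples: int = 2000,
                 alpha: float = 0.05, seed: int = 0) -> tuple[float, float]:
    """Percentile bootstrap: resample judges with replacement, recompute the weighted mean."""
    rng = random.Random(seed)  # fixed seed so the interval is reproducible
    means = sorted(weighted_mean(rng.choices(items, k=len(items))) for _ in range(n_resamples))
    lower = means[int((alpha / 2) * n_resamples)]
    upper = means[int((1 - alpha / 2) * n_resamples) - 1]
    return lower, upper


def aggregate(items: list[JudgeScore]) -> dict:
    return {
        "mean": weighted_mean(items),
        # unweighted sample standard deviation (n - 1) of the raw judge scores
        "std": statistics.stdev([i.score for i in items]),
        "ci95": bootstrap_ci(items),
    }
```

With only five judges per prompt, a bootstrap interval is coarse; it mainly signals how much the result depends on which judges happened to agree.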

Inter-judge reliability coefficient

For each audit, AGS publishes Fleiss' kappa, a measure of agreement among multiple raters. A kappa above 0.80 indicates strong consensus among the judges (a highly reliable score). Between 0.60 and 0.80, consensus is moderate. Below 0.60, consensus is weak: interpret the score with caution and rephrase the prompt.
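Fleiss' kappa compares the observed agreement among raters (P̄) with the agreement expected by chance (Pe): kappa = (P̄ - Pe) / (1 - Pe). The self-contained sketch below assumes each judge gives one categorical label per prompt, for example whether the brand is cited; the function names are hypothetical, and the thresholds in `interpret` are the ones stated above.

```python
from collections import Counter
from typing import Hashable, Sequence


def fleiss_kappa(ratings: Sequence[Sequence[Hashable]]) -> float:
    """Fleiss' kappa; ratings[i] holds the labels that every judge gave to item i."""
    n_items = len(ratings)
    if n_items == 0:
        raise ValueError("need at least one rated item")
    n_raters = len(ratings[0])
    if n_raters < 2 or any(len(row) != n_raters for row in ratings):
        raise ValueError("every item needs the same number (>= 2) of ratings")
    counts = [Counter(row) for row in ratings]
    categories = {label for row in ratings for label in row}
    total = n_items * n_raters
    # p_j: share of all ratings that fall in category j
    p = [sum(c[cat] for c in counts) / total for cat in categories]
    # P_i: fraction of rater pairs that agree on item i
    p_item = [(sum(v * v for v in c.values()) - n_raters) / (n_raters * (n_raters - 1))
              for c in counts]
    p_bar = sum(p_item) / n_items   # mean observed agreement
    p_e = sum(x * x for x in p)     # agreement expected by chance
    if p_e == 1.0:
        raise ValueError("kappa is undefined when all ratings fall in a single category")
    return (p_bar - p_e) / (1 - p_e)


def interpret(kappa: float) -> str:
    """Map kappa onto the consensus bands used by AGS."""
    if kappa > 0.80:
        return "strong"
    if kappa >= 0.60:
        return "moderate"
    return "weak"
```

For example, `fleiss_kappa([[True] * 5, [False] * 5])` describes two prompts on which all five judges agree, which gives a kappa of 1.0.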

Transparency and reproducibility

Each AGS audit produces a cryptographic hash of the prompts, raw responses, and individual scores. Recomputing this hash shows whether any of that data was altered after the audit. The AGS code is open source (MIT license) at github.com/sarsator/aqa-specification, and the scoring formula is versioned and published, so any customer can verify or challenge a score.
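One way to build such a fingerprint is SHA-256 over a canonical JSON serialisation, so the same audit data always yields the same hash. This is a sketch under assumptions: the page does not name the hash function or the serialisation, and `audit_fingerprint`, `verify_audit`, and the payload field names are hypothetical.

```python
import hashlib
import json
from typing import Any


def audit_fingerprint(prompts: list, responses: list, scores: Any) -> str:
    """SHA-256 hex digest of the audit data in a canonical JSON form."""
    payload = {"prompts": prompts, "responses": responses, "scores": scores}
    # sort_keys and fixed separators: identical data always serialises to identical bytes
    canonical = json.dumps(payload, sort_keys=True, separators=(",", ":"), ensure_ascii=False)
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()


def verify_audit(published_hash: str, prompts: list, responses: list, scores: Any) -> bool:
    """True if recomputing the fingerprint reproduces the published hash."""
    return audit_fingerprint(prompts, responses, scores) == published_hash
```

A bare hash detects changes only relative to a hash that was published or stored beforehand; it does not by itself identify who produced the data.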

AGS Acronyms

GRC
Generative Response Coverage: percentage of prompts for which at least one judge reports that the response cites the brand.
GIS
Generative Inclusion Score: weighted average score based on brand position in the response (first mentioned = 100%, last = 0%).
ASR
Answer Sentiment Rating: tone of the mention (positive, neutral, or negative) on a scale from -1 to +1.
BVI
Brand Visibility Index: composite score (GRC × GIS × ASR), from 0 to 100, that summarises the brand's overall performance across tested AIs.
CIA
Citation Inter-judge Agreement: Fleiss kappa coefficient measuring agreement among the 5 AI judges on citation presence.
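The metric definitions above can be turned into code roughly as follows. Several details are assumptions and are flagged in comments: position scores are interpolated linearly between the first mention (100%) and the last (0%), GIS uses equal weights because the page does not give the weighting, and because ASR can be negative while BVI is stated to run from 0 to 100, the sketch rescales ASR from [-1, 1] to [0, 1] before multiplying. All function names are hypothetical.

```python
from statistics import mean
from typing import List, Tuple

SENTIMENT = {"positive": 1.0, "neutral": 0.0, "negative": -1.0}


def grc(citations: List[List[bool]]) -> float:
    """% of prompts where at least one judge reports that the brand is cited."""
    hits = sum(1 for per_judge in citations if any(per_judge))
    return 100.0 * hits / len(citations)


def position_score(rank: int, n_brands: int) -> float:
    """100 for the first brand mentioned (rank 0), 0 for the last; linear in between (assumed)."""
    if n_brands == 1:
        return 100.0
    return 100.0 * (1 - rank / (n_brands - 1))


def gis(positions: List[Tuple[int, int]]) -> float:
    """Mean position score over (rank, n_brands) pairs; equal weights assumed."""
    return mean(position_score(rank, n) for rank, n in positions)


def asr(labels: List[str]) -> float:
    """Mean sentiment of the brand mentions on a -1 to +1 scale."""
    return mean(SENTIMENT[label] for label in labels)


def bvi(grc_pct: float, gis_pct: float, asr_value: float) -> float:
    """GRC x GIS x ASR on a 0-100 scale; ASR rescaled from [-1, 1] to [0, 1] (assumed)."""
    return 100.0 * (grc_pct / 100) * (gis_pct / 100) * ((asr_value + 1) / 2)
```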

30 Advanced GEO Checks 2026

Sprint 15 delivered 30 new GEO/AEO signals, all measured passively (no scraping that violates terms of service). They complement AGS scoring through a sixth category, 'advanced_signals', which carries 15% of the composite weight.
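A sketch of how a category carrying 15% of the composite can be folded in, assuming the existing category weights are scaled down proportionally so that all weights still sum to 1. The page does not list the other five categories or their weights, so `add_advanced_category`, `composite_score`, and the placeholder names `category_1` to `category_5` are hypothetical.

```python
ADVANCED_WEIGHT = 0.15  # stated on the page: 'advanced_signals' carries 15% of the composite


def add_advanced_category(base_weights: dict[str, float]) -> dict[str, float]:
    """Rescale the existing weights to share the remaining 85%, then add advanced_signals."""
    total = sum(base_weights.values())
    weights = {name: w / total * (1 - ADVANCED_WEIGHT) for name, w in base_weights.items()}
    weights["advanced_signals"] = ADVANCED_WEIGHT
    return weights


def composite_score(category_scores: dict[str, float], weights: dict[str, float]) -> float:
    """Weighted sum of per-category scores, each assumed to be on a 0-100 scale."""
    missing = set(weights) - set(category_scores)
    if missing:
        raise KeyError(f"no score for categories: {sorted(missing)}")
    return sum(weights[name] * category_scores[name] for name in weights)


if __name__ == "__main__":
    base = {f"category_{i}": 1.0 for i in range(1, 6)}  # placeholder names, equal weights
    weights = add_advanced_category(base)
    scores = {name: 100.0 for name in base}
    scores["advanced_signals"] = 40.0
    print(composite_score(scores, weights))
```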

6 market differentiators

  • A08: Specificity score. Density of statistics backed by tier-1 sources (Princeton GEO study, KDD 2024: +27% to +40% LLM citations).
  • A09: Counter-argument markers. No competing tool measures balanced-argumentation markers.
  • A07: Date-stamped statements
  • S05: Common Crawl inclusion
  • S08: llms.txt RFC validation
  • B10: Stack Overflow brand mentions
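S08 concerns llms.txt. The llms.txt proposal published at llmstxt.org describes a Markdown file that starts with an H1 title (the only required element), may continue with a blockquote summary and free text without headings, and then lists links in sections introduced by H2 headings, each entry written as `[name](url)` optionally followed by `: notes`. A minimal structural check along those lines might look like this. It is a hypothetical sketch, not the validator AI Labs Audit uses, and it covers only that basic layout.

```python
import re
from typing import List

HEADING = re.compile(r"^(#{1,6})\s")
# "- [name](url)" or "* [name](url)", optionally followed by ": notes"
LINK_ITEM = re.compile(r"^[-*] \[[^\]]+\]\([^)\s]+\)(:\s.*)?$")


def validate_llms_txt(text: str) -> List[str]:
    """Return a list of structural problems; an empty list means the layout looks valid."""
    lines = [(n, line.rstrip()) for n, line in enumerate(text.splitlines(), start=1) if line.strip()]
    if not lines or not lines[0][1].startswith("# "):
        return ["file must begin with an H1 title, e.g. '# Project name'"]
    problems = []
    in_section = False  # becomes True once the first H2 section starts
    for n, line in lines[1:]:
        heading = HEADING.match(line)
        if heading:
            level = len(heading.group(1))
            if level == 2:
                in_section = True
            elif level == 1:
                problems.append(f"line {n}: only one H1 title is allowed")
            elif not in_section:
                problems.append(f"line {n}: no headings are allowed before the first H2 section")
            continue
        item = line.lstrip()
        if in_section and item.startswith(("- ", "* ")) and not LINK_ITEM.match(item):
            problems.append(f"line {n}: list item is not a '[name](url)' link")
    return problems
```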

Module 6 — External Authority Signals

New module dedicated to external authority signals: LinkedIn, ProductHunt, G2/Capterra, Stack Overflow, GitHub, Substack/Medium.

Checks by GEO module

SSR / Crawlability
mainEntity · QAPage · Video transcripts · Speakable · @graph @id · inLanguage · Common Crawl · llms.txt · IndexNow · ai.txt · Verifications · HTTP/3 · Brotli
Entity Health
Wikidata · DBpedia
Citation Readiness
Sourced stats · Balanced argumentation · Inline dating · ItemList · Dataset · Blockquote cite · Internal links · Anchor entropy · News Sitemap
External Authority
Stack Overflow · LinkedIn · GitHub · ProductHunt · B2B reviews · Newsletter

All checks use safe_external_call (retry + cache + circuit breaker) and store their results in audits.advanced_checks_v2 (JSONB + GIN index).
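The page names the three safeguards but not their parameters. A minimal sketch of such a wrapper, assuming a per-key time-to-live cache, retries with exponential backoff, and a breaker that opens after a number of consecutive failed calls and lets a single probe through once a cool-down has passed. `SafeCaller`, `CircuitOpenError`, and all parameters are hypothetical; the clock and sleep functions are injectable so the behaviour can be tested without waiting.

```python
import time
from typing import Any, Callable, Dict, Optional, Tuple


class CircuitOpenError(RuntimeError):
    """Raised while the breaker is open and calls are short-circuited."""


class SafeCaller:
    """Hypothetical retry + cache + circuit-breaker wrapper for external calls."""

    def __init__(self, max_retries: int = 3, backoff_s: float = 0.5, cache_ttl_s: float = 3600.0,
                 failure_threshold: int = 5, reset_after_s: float = 60.0,
                 clock: Callable[[], float] = time.monotonic,
                 sleep: Callable[[float], None] = time.sleep) -> None:
        if max_retries < 1:
            raise ValueError("max_retries must be at least 1")
        self.max_retries = max_retries
        self.backoff_s = backoff_s
        self.cache_ttl_s = cache_ttl_s
        self.failure_threshold = failure_threshold
        self.reset_after_s = reset_after_s
        self._clock = clock
        self._sleep = sleep
        self._cache: Dict[str, Tuple[float, Any]] = {}
        self._failures = 0  # consecutive calls that exhausted all retries
        self._opened_at: Optional[float] = None

    def call(self, key: str, fn: Callable[[], Any]) -> Any:
        now = self._clock()
        # 1. Cache: a fresh result is returned without contacting the service.
        cached = self._cache.get(key)
        if cached is not None and now - cached[0] < self.cache_ttl_s:
            return cached[1]
        # 2. Circuit breaker: fail fast while open; after the cool-down, let one probe through.
        if self._opened_at is not None:
            if now - self._opened_at < self.reset_after_s:
                raise CircuitOpenError("external service temporarily disabled")
            self._opened_at = None
        # 3. Retries with exponential backoff between attempts.
        last_exc: Exception = RuntimeError("no attempt made")
        for attempt in range(self.max_retries):
            try:
                result = fn()
            except Exception as exc:  # any failure counts as a failed attempt
                last_exc = exc
                if attempt < self.max_retries - 1:
                    self._sleep(self.backoff_s * 2 ** attempt)
                continue
            self._failures = 0
            self._cache[key] = (self._clock(), result)
            return result
        # Every attempt failed: count it and open the breaker once the threshold is reached.
        self._failures += 1
        if self._failures >= self.failure_threshold:
            self._opened_at = self._clock()
        raise last_exc
```

Checking the cache before the breaker means results that are already cached stay available while an external service is down.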

Full article on the blog: 30 new GEO/AEO 2026 signals — State of the art audited.