
AI panel (LLM-as-a-judge)

The AI panel, or "LLM-as-a-judge", has several language models act as judges and assess the same answer. By cross-referencing models from different providers, you limit self-preference bias and obtain a more robust score than a single model would give.

The principle

Rather than scoring an AI answer with a single model, which would tend to favour its own output, you convene a panel of several judge models from different providers. Each judge scores the answer against precise rubrics (presence, accuracy, sentiment), and the scores are then aggregated.
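As an illustration of the aggregation step, here is a minimal sketch. It assumes each judge returns a score between 0 and 1 for each rubric and that scores are combined with a plain mean; the judge names, the scale, and the choice of mean are assumptions made for the example, not the documented method.

```python
from statistics import mean

# Hypothetical scores on a 0-1 scale: judge -> rubric -> score.
panel_scores = {
    "judge_a": {"presence": 1.0, "accuracy": 0.8, "sentiment": 0.6},
    "judge_b": {"presence": 1.0, "accuracy": 0.6, "sentiment": 0.8},
    "judge_c": {"presence": 0.5, "accuracy": 0.7, "sentiment": 0.7},
}


def aggregate_by_rubric(scores):
    """Average each rubric across all judges that scored it."""
    rubrics = sorted({r for per_judge in scores.values() for r in per_judge})
    return {
        r: mean(per_judge[r] for per_judge in scores.values() if r in per_judge)
        for r in rubrics
    }


per_rubric = aggregate_by_rubric(panel_scores)
# Unweighted mean of the rubric averages (an assumption: rubrics may be weighted).
overall = mean(per_rubric.values())
print(per_rubric, round(overall, 3))
```

Keeping the per-rubric averages separate from the overall figure makes it possible to see which rubric pulls a score down.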

Why several judges?

  • Anti self-preference: in a diverse panel, a model that favours its own answers is only one voice among several, so its bias carries limited weight in the final score.
  • Robustness: the occasional errors of one judge are smoothed out by the others.
  • Hallucination detection: an unverifiable claim flagged by the judges lowers the score.
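The first two points can be sketched in a few lines. This example assumes two design choices that the text does not specify: a judge from the same provider as the answer is left out, and the remaining scores are combined with a median so that one misfiring judge has little effect. Provider names and scores are hypothetical.

```python
from statistics import mean, median

# Hypothetical accuracy scores (0-1) given to one answer by four judges.
scores = [
    {"provider": "provider_a", "score": 0.70},
    {"provider": "provider_b", "score": 0.72},
    {"provider": "provider_c", "score": 0.68},
    {"provider": "provider_d", "score": 0.10},  # one judge misfires
]


def panel_score(scores, answer_provider=None):
    """Median of judge scores, skipping judges from the answer's own provider."""
    kept = [s["score"] for s in scores if s["provider"] != answer_provider]
    if not kept:
        raise ValueError("no eligible judges left on the panel")
    return median(kept)


plain_mean = mean(s["score"] for s in scores)      # pulled down by the outlier
robust = panel_score(scores)                       # barely moved by it
without_author = panel_score(scores, "provider_a") # author's provider excluded
print(round(plain_mean, 2), round(robust, 2), round(without_author, 2))
```

With these numbers the mean drops to about 0.55 while the median stays near 0.69, which is the smoothing effect described above.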

Reliability and traceability

The agreement between judges is measured by an inter-rater reliability coefficient. The exact composition of the panel for each audit is tracked via a configuration fingerprint, ensuring comparability over time. This panel is at the core of the AGS; see the methodology and the demonstration.
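Both ideas can be sketched briefly. The text does not name the coefficient or the fingerprint scheme, so this example makes two assumptions: agreement is measured as the mean pairwise Cohen's kappa over categorical labels (sometimes called Light's kappa), and the fingerprint is a SHA-256 hash of the panel configuration serialised as canonical JSON. The configuration fields shown are hypothetical.

```python
import hashlib
import json
from collections import Counter
from itertools import combinations


def config_fingerprint(config):
    """SHA-256 of the canonical JSON form of a panel configuration.

    Keys are sorted so dict ordering does not matter; list order still does.
    """
    canonical = json.dumps(config, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()


def cohen_kappa(a, b):
    """Cohen's kappa for two raters labelling the same items."""
    n = len(a)
    observed = sum(x == y for x, y in zip(a, b)) / n
    count_a, count_b = Counter(a), Counter(b)
    expected = sum(count_a[k] * count_b[k] for k in count_a) / (n * n)
    if expected == 1.0:
        return 1.0  # both raters used one identical label throughout
    return (observed - expected) / (1 - expected)


def mean_pairwise_kappa(ratings):
    """Average Cohen's kappa over every pair of judges."""
    pairs = list(combinations(ratings.values(), 2))
    return sum(cohen_kappa(a, b) for a, b in pairs) / len(pairs)


# Hypothetical sentiment labels given by three judges to four answers.
ratings = {
    "judge_a": ["pos", "neg", "pos", "neu"],
    "judge_b": ["pos", "neg", "pos", "neu"],
    "judge_c": ["pos", "neg", "neu", "neu"],
}
# Hypothetical configuration fields.
panel_config = {"judges": ["judge_a", "judge_b", "judge_c"], "rubric_version": 1}

print(config_fingerprint(panel_config)[:12], round(mean_pairwise_kappa(ratings), 3))
```

Two audits with the same fingerprint were scored by the same panel setup, so their scores can be compared directly; a changed fingerprint signals that a difference may come from the panel rather than from the answers.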
