Glossary
Prompt Win Rate
Binary per prompt: did we satisfy the rule (win) or not?
Complements LLM-Score by highlighting which intents still fail.
Definition
Prompt Win Rate measures how many prompts in a fixed prompt pack pass a clear success criterion once the models have answered, expressed as a share of eligible prompts. The criterion can be “brand mentioned correctly”, “no hallucinated price”, “included in top-3 alternatives”, etc. Unlike a blended 0–100 score, win rate is easy to explain to stakeholders: “we won 37 of 50 category prompts this week.”
How it's computed
For each prompt × model pair, run the answer through automated checks (regex + NER + classifier), with optional human review for edge cases. Win rate = wins ÷ eligible prompts (or eligible prompt × model pairs when several models are scored together). Eligibility rules exclude prompts that are out of scope or blocked by safety filters, so the denominator stays meaningful.
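The computation above can be sketched in a few lines of Python. This is a minimal illustration, not a reference implementation: the `Answer` record, the `eligible` flag, the brand name "Acme", and the single regex check `brand_mentioned` are all hypothetical stand-ins for a real pipeline, which would also run NER, classifier, and human-review steps.

```python
import re
from dataclasses import dataclass
from typing import Callable, Optional, Sequence


@dataclass
class Answer:
    prompt_id: str
    model: str
    text: str
    # Assumption: eligibility is decided upstream (out of scope, safety block).
    eligible: bool = True


def brand_mentioned(text: str, brand: str = "Acme") -> bool:
    """Hypothetical success criterion: the brand appears as a whole word."""
    pattern = rf"\b{re.escape(brand)}\b"
    return re.search(pattern, text, flags=re.IGNORECASE) is not None


def win_rate(answers: Sequence[Answer],
             check: Callable[[str], bool]) -> Optional[float]:
    """Return wins / eligible answers, or None if nothing is eligible."""
    eligible = [a for a in answers if a.eligible]
    if not eligible:
        return None  # avoid dividing by zero on an empty denominator
    wins = sum(1 for a in eligible if check(a.text))
    return wins / len(eligible)


# Made-up answers for illustration only.
answers = [
    Answer("p1", "model-a", "Acme is a popular option."),
    Answer("p2", "model-a", "Consider other vendors."),
    Answer("p3", "model-a", "I can't help with that.", eligible=False),
    Answer("p4", "model-a", "Top picks: Foo, Bar, and acme."),
]

rate = win_rate(answers, brand_mentioned)
print(f"{rate:.0%}")  # 2 wins out of 3 eligible answers
```

Returning `None` for an empty denominator, rather than 0, keeps "nothing was eligible" distinct from "everything failed" in reports.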
How it works in practice
How teams use it
- Sprint retros — compare win rate before/after publishing new FAQ blocks.
- Model triage — if ChatGPT wins but YandexGPT fails, invest in regional sources.
- Pair with fanout — compute win rate across fanout queries to ensure wins are not fragile wording luck.
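As a rough sketch of the model-triage and fanout uses above, the same win/loss records can be grouped by model and by intent. The record layout, the model names, and the "fragile" rule (an intent that wins on some fanout variants but not all) are assumptions made for illustration, and the data is invented.

```python
from collections import defaultdict

# Each record: (model, intent, fanout variant index, won).
# All values are made up for illustration.
results = [
    ("model-a", "pricing", 0, True),
    ("model-a", "pricing", 1, True),
    ("model-a", "alternatives", 0, True),
    ("model-a", "alternatives", 1, False),
    ("model-b", "pricing", 0, False),
    ("model-b", "pricing", 1, False),
    ("model-b", "alternatives", 0, True),
    ("model-b", "alternatives", 1, False),
]


def rates_by(records, key):
    """Group records by key(record) and return the win rate per group."""
    tally = defaultdict(lambda: [0, 0])  # group -> [wins, total]
    for rec in records:
        group = key(rec)
        tally[group][0] += int(rec[3])
        tally[group][1] += 1
    return {group: wins / total for group, (wins, total) in tally.items()}


# Model triage: one win rate per model.
per_model = rates_by(results, key=lambda r: r[0])

# Fanout check: one win rate per (model, intent) across its variants.
per_intent = rates_by(results, key=lambda r: (r[0], r[1]))

# Assumed rule: a partial win across variants suggests wording luck.
fragile = sorted(k for k, rate in per_intent.items() if 0 < rate < 1)

print(per_model)
print(fragile)
```

With this sample data, model-a wins 3 of 4 records and model-b wins 1 of 4, while the "alternatives" intent is flagged as fragile for both models because only one of its two variants wins.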
How to read it
A high win rate can still hide toxic sentiment, and those answers need content fixes. Always read the underlying quotes alongside the percentage.
When to use
- When leadership wants a KPI simpler than LLM-Score.
- When you track compliance-style prompts (claims, medical, finance).
- When agencies report weekly progress on a fixed test deck.