Quick Start
AI Model Comparator lets you send one prompt to multiple AI models simultaneously and see their responses side-by-side with quality scores, disagreement detection, and cost tracking.
1. Type any question or task in the prompt box on the Compare page. Optionally prefix it with a slash command for advanced analysis.
2. Choose 2–4 AI models from the list. Trial users get 2 models; Pro and Mega users can run all models simultaneously.
3. Click 'Run Comparison'. Results appear in seconds with quality scores, a consensus summary, and disagreement highlights.
Slash Commands
Prefix your prompt with a slash command to unlock specialised analysis modes. Commands are processed server-side and add structured scoring on top of the raw model responses.
/rank (Quality Ranking): Automatically ranks all model responses by quality score. Each response receives a 0–100 score based on completeness, clarity, and relevance. A ranked leaderboard is shown above the responses.
Best for: Evaluating subjective questions, getting a definitive recommendation, or comparing response quality on technical topics.
/estimate (Consensus Estimate): Generates a consensus estimate by aggregating numerical or range-based answers from all models. Returns a central estimate, confidence range, and a breakdown of each model's contribution.
Best for: Time estimates, cost projections, probability assessments, and any question with a numerical answer.
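For a sense of what /estimate does with the numbers, here is a minimal sketch of median-style aggregation. The model names and values are made up, and the real service's weighting and interval method are not documented; this is just the general idea.

```python
from statistics import median

# Hypothetical point estimates extracted from each model's answer,
# e.g. for "How many hours will this migration take?"
model_estimates = {
    "gpt-4o": 40.0,
    "claude-3.5-sonnet": 55.0,
    "gemini-pro": 35.0,
}

values = sorted(model_estimates.values())

# Central estimate: the median is robust to a single outlier model.
central = median(values)

# Confidence range: here simply the spread of the individual estimates;
# the real service may weight models or compute a statistical interval.
low, high = values[0], values[-1]

print(f"Central estimate: {central}")
print(f"Range: {low} to {high}")
for model, value in model_estimates.items():
    print(f"  {model}: {value} ({value - central:+.1f} vs. central)")
```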
Choosing Models
Different models excel at different tasks. Here's a quick reference to help you pick the right combination:
| Model | Strengths | Best for |
|---|---|---|
| GPT-4o | Reasoning, code, multimodal | Technical questions, code review, analysis |
| GPT-4.1 | Long context, instruction following | Document analysis, complex instructions |
| Claude 3.5 Sonnet | Writing quality, nuance, safety | Creative writing, nuanced ethical questions |
| Claude Opus 4.5 | Deep reasoning, research | Research tasks, complex multi-step problems |
| Gemini Pro | Speed, factual recall, Google data | Current events, factual lookups, speed |
Model availability depends on your plan tier. Trial users can select up to 2 models per query.
Reading Results
Quality Score
A 0–100 score generated by an AI evaluator assessing completeness, clarity, and relevance. Higher is better; scores above 80 are generally excellent. Note that this is not a measure of factual accuracy.
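If it helps to picture how sub-scores could roll up into a single number, here is a small illustrative sketch. The equal weights and the quality_score function are assumptions for illustration, not the evaluator's actual rubric.

```python
def quality_score(completeness: float, clarity: float, relevance: float) -> int:
    """Combine three 0-100 sub-scores into a single 0-100 quality score.

    The equal weighting here is an assumption; the real evaluator's
    rubric and weights are not documented.
    """
    weights = {"completeness": 1 / 3, "clarity": 1 / 3, "relevance": 1 / 3}
    raw = (
        weights["completeness"] * completeness
        + weights["clarity"] * clarity
        + weights["relevance"] * relevance
    )
    return round(raw)

# A response that is thorough and relevant but a little rambling:
print(quality_score(completeness=90, clarity=70, relevance=85))  # -> 82
```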
Consensus Analysis
The percentage of models that agree on the core answer. High consensus (>80%) means the models broadly agree. Low consensus (<50%) signals genuine ambiguity or a subjective question — the most interesting cases.
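As a rough mental model only (not the product's actual algorithm), you can think of consensus as the share of models whose core answer matches the most common one:

```python
from collections import Counter

# Hypothetical "core answers" extracted from each model's response.
core_answers = {
    "gpt-4o": "yes",
    "gpt-4.1": "yes",
    "claude-3.5-sonnet": "yes",
    "gemini-pro": "no",
}

counts = Counter(core_answers.values())
most_common_answer, agreeing = counts.most_common(1)[0]
consensus = 100 * agreeing / len(core_answers)

print(f"{consensus:.0f}% of models agree on '{most_common_answer}'")  # 75% agree on 'yes'
```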
Disagreement Alert
When models diverge significantly, a red alert banner appears at the top of results. This highlights sycophancy risk — where one model may be telling you what you want to hear rather than the truth.
Response Time & Cost
Each result card shows the model's response time in seconds and the estimated token cost. Use this to balance quality vs. speed vs. cost for your use case.
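Token cost follows the standard per-token pricing arithmetic. The sketch below shows the calculation with illustrative prices, not any provider's actual rates:

```python
def estimated_cost(prompt_tokens: int, completion_tokens: int,
                   price_in_per_1k: float, price_out_per_1k: float) -> float:
    """Estimate query cost in USD from token counts and per-1k-token prices."""
    return (prompt_tokens / 1000) * price_in_per_1k \
        + (completion_tokens / 1000) * price_out_per_1k

# Example: 1,200 prompt tokens and 600 completion tokens at illustrative rates.
cost = estimated_cost(1200, 600, price_in_per_1k=0.005, price_out_per_1k=0.015)
print(f"${cost:.4f}")  # -> $0.0150
```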
Tips & Tricks
When models strongly disagree, it often means the question is genuinely hard or ambiguous. Dig deeper rather than picking the highest-scored answer.
Toggle 'Optimise Prompt' before running a comparison. The AI rewriter tightens your prompt for clarity and removes filler, which typically improves response quality across all models.
Run the same topic with different phrasings to see how sensitive models are to wording. Fragile answers that change with small rephrasing are less reliable.
Paste a job description and ask models to rank candidate profiles. The disagreements reveal which qualities are genuinely debatable vs. clearly important.
If you run the same type of comparison regularly (code review, market research), save it as a template in the Library. One click to reuse with a new input.
New models are added regularly. Check the Changelog page to see what's been added and when — you may find a better model for your use case.
Your API Keys
By default, AI Model Comparator uses shared system API keys so you don't need to supply your own. If you have your own OpenAI, Anthropic, or Google API keys, you can add them in Settings → API Keys.
Security note
Your API keys are encrypted with AES-256 before storage and are never logged or exposed in API responses. They are decrypted only at query time on the server.
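For readers who want to picture what that means in practice, the sketch below shows generic AES-256-GCM encryption using Python's cryptography library. It is an illustration of the technique, not the app's actual storage code.

```python
import os
from cryptography.hazmat.primitives.ciphers.aead import AESGCM

# A 256-bit master key, held server-side (never stored with the ciphertext).
master_key = AESGCM.generate_key(bit_length=256)
aesgcm = AESGCM(master_key)

def encrypt_api_key(plaintext_key: str) -> bytes:
    """Encrypt a user's API key for at-rest storage (AES-256-GCM)."""
    nonce = os.urandom(12)                      # unique nonce per encryption
    ciphertext = aesgcm.encrypt(nonce, plaintext_key.encode(), None)
    return nonce + ciphertext                   # store nonce alongside the ciphertext

def decrypt_api_key(stored: bytes) -> str:
    """Decrypt only at query time, on the server."""
    nonce, ciphertext = stored[:12], stored[12:]
    return aesgcm.decrypt(nonce, ciphertext, None).decode()

stored = encrypt_api_key("sk-example-not-a-real-key")
assert decrypt_api_key(stored) == "sk-example-not-a-real-key"
```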
Using your own keys means queries are billed to your provider accounts directly. This is useful if you have negotiated rates, need higher rate limits, or want full control over your usage data.