Quick Start
AI Model Comparator lets you send one prompt to multiple AI models simultaneously and see their responses side-by-side with quality scores, disagreement detection, and cost tracking.
1. Type any question or task in the prompt box on the Compare page. Optionally prefix it with a slash command for advanced analysis.
2. Choose 2–4 AI models from the list. Trial users get 2 models; Pro and Mega users can run all models simultaneously.
3. Click 'Run Comparison'. Results appear in seconds with quality scores, a consensus summary, and disagreement highlights.
Slash Commands
Prefix your prompt with a slash command to unlock specialised analysis modes. Commands are processed server-side and add structured scoring on top of the raw model responses.
/rank (Quality Ranking): Automatically ranks all model responses by quality score. Each response receives a 0–100 score based on completeness, clarity, and relevance. A ranked leaderboard is shown above the responses.
Best for: Evaluating subjective questions, getting a definitive recommendation, or comparing response quality on technical topics.
/estimate (Consensus Estimate): Generates a consensus estimate by aggregating numerical or range-based answers from all models. Returns a central estimate, confidence range, and a breakdown of each model's contribution.
Best for: Time estimates, cost projections, probability assessments, and any question with a numerical answer.
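For a sense of what /estimate does with the numbers, here is a minimal sketch of median-style aggregation. The model names and values are made up, and the real service's weighting and interval method are not documented; this is just the general idea.

```python
from statistics import median

# Hypothetical point estimates extracted from each model's answer,
# e.g. for "How many hours will this migration take?"
model_estimates = {
    "gpt-4o": 40.0,
    "claude-3.5-sonnet": 55.0,
    "gemini-pro": 35.0,
}

values = sorted(model_estimates.values())

# Central estimate: the median is robust to a single outlier model.
central = median(values)

# Confidence range: here simply the spread of the individual estimates;
# the real service may weight models or compute a statistical interval.
low, high = values[0], values[-1]

print(f"Central estimate: {central}")
print(f"Range: {low} to {high}")
for model, value in model_estimates.items():
    print(f"  {model}: {value} ({value - central:+.1f} vs. central)")
```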
Choosing Models
Different models excel at different tasks. Here's a quick reference to help you pick the right combination:
| Model | Strengths | Best for |
|---|---|---|
| GPT-4o | Reasoning, code, multimodal | Technical questions, code review, analysis |
| GPT-4.1 | Long context, instruction following | Document analysis, complex instructions |
| Claude 3.5 Sonnet | Writing quality, nuance, safety | Creative writing, nuanced ethical questions |
| Claude Opus 4.5 | Deep reasoning, research | Research tasks, complex multi-step problems |
| Gemini Pro | Speed, factual recall, Google data | Current events, factual lookups, speed |
Model availability depends on your plan tier. Trial users can select up to 2 models per query.
Reading Results
Quality Score
A 0–100 score generated by an AI evaluator assessing completeness, clarity, and relevance. Higher is better; scores above 80 are generally excellent. Note that this is not a measure of factual accuracy.
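If it helps to picture how sub-scores could roll up into a single number, here is a small illustrative sketch. The equal weights and the quality_score function are assumptions for illustration, not the evaluator's actual rubric.

```python
def quality_score(completeness: float, clarity: float, relevance: float) -> int:
    """Combine three 0-100 sub-scores into a single 0-100 quality score.

    The equal weighting here is an assumption; the real evaluator's
    rubric and weights are not documented.
    """
    weights = {"completeness": 1 / 3, "clarity": 1 / 3, "relevance": 1 / 3}
    raw = (
        weights["completeness"] * completeness
        + weights["clarity"] * clarity
        + weights["relevance"] * relevance
    )
    return round(raw)

# A response that is thorough and relevant but a little rambling:
print(quality_score(completeness=90, clarity=70, relevance=85))  # -> 82
```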
Consensus Analysis
The percentage of models that agree on the core answer. High consensus (>80%) means the models broadly agree. Low consensus (<50%) signals genuine ambiguity or a subjective question — the most interesting cases.
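As a rough mental model only (not the product's actual algorithm), you can think of consensus as the share of models whose core answer matches the most common one:

```python
from collections import Counter

# Hypothetical "core answers" extracted from each model's response.
core_answers = {
    "gpt-4o": "yes",
    "gpt-4.1": "yes",
    "claude-3.5-sonnet": "yes",
    "gemini-pro": "no",
}

counts = Counter(core_answers.values())
most_common_answer, agreeing = counts.most_common(1)[0]
consensus = 100 * agreeing / len(core_answers)

print(f"{consensus:.0f}% of models agree on '{most_common_answer}'")  # 75% agree on 'yes'
```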
Disagreement Alert
When models diverge significantly, a red alert banner appears at the top of results. This highlights sycophancy risk — where one model may be telling you what you want to hear rather than the truth.
Response Time & Cost
Each result card shows the model's response time in seconds and the estimated token cost. Use this to balance quality vs. speed vs. cost for your use case.
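Token cost follows the standard per-token pricing arithmetic. The sketch below shows the calculation with illustrative prices, not any provider's actual rates:

```python
def estimated_cost(prompt_tokens: int, completion_tokens: int,
                   price_in_per_1k: float, price_out_per_1k: float) -> float:
    """Estimate query cost in USD from token counts and per-1k-token prices."""
    return (prompt_tokens / 1000) * price_in_per_1k \
        + (completion_tokens / 1000) * price_out_per_1k

# Example: 1,200 prompt tokens and 600 completion tokens at illustrative rates.
cost = estimated_cost(1200, 600, price_in_per_1k=0.005, price_out_per_1k=0.015)
print(f"${cost:.4f}")  # -> $0.0150
```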
Tips & Tricks
When models strongly disagree, it often means the question is genuinely hard or ambiguous. Dig deeper rather than picking the highest-scored answer.
Toggle 'Optimise Prompt' before running a comparison. The AI rewriter tightens your prompt for clarity and removes filler, which typically improves response quality across all models.
Run the same topic with different phrasings to see how sensitive models are to wording. Fragile answers that change with small rephrasing are less reliable.
Paste a job description and ask models to rank candidate profiles. The disagreements reveal which qualities are genuinely debatable vs. clearly important.
If you run the same type of comparison regularly (code review, market research), save it as a template in the Library. One click to reuse with a new input.
New models are added regularly. Check the Changelog page to see what's been added and when — you may find a better model for your use case.
Your API Keys
By default, AI Model Comparator uses shared system API keys so you don't need to supply your own. If you have your own OpenAI, Anthropic, or Google API keys, you can add them in Settings → API Keys.
Security note
Your API keys are encrypted with AES-256 before storage and are never logged or exposed in API responses. They are decrypted only at query time on the server.
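For readers who want to picture what that means in practice, the sketch below shows generic AES-256-GCM encryption using Python's cryptography library. It is an illustration of the technique, not the app's actual storage code.

```python
import os
from cryptography.hazmat.primitives.ciphers.aead import AESGCM

# A 256-bit master key, held server-side (never stored with the ciphertext).
master_key = AESGCM.generate_key(bit_length=256)
aesgcm = AESGCM(master_key)

def encrypt_api_key(plaintext_key: str) -> bytes:
    """Encrypt a user's API key for at-rest storage (AES-256-GCM)."""
    nonce = os.urandom(12)                      # unique nonce per encryption
    ciphertext = aesgcm.encrypt(nonce, plaintext_key.encode(), None)
    return nonce + ciphertext                   # store nonce alongside the ciphertext

def decrypt_api_key(stored: bytes) -> str:
    """Decrypt only at query time, on the server."""
    nonce, ciphertext = stored[:12], stored[12:]
    return aesgcm.decrypt(nonce, ciphertext, None).decode()

stored = encrypt_api_key("sk-example-not-a-real-key")
assert decrypt_api_key(stored) == "sk-example-not-a-real-key"
```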
Using your own keys means queries are billed to your provider accounts directly. This is useful if you have negotiated rates, need higher rate limits, or want full control over your usage data.