Documentation

Everything you need to get the most out of AI Model Comparator — from your first comparison to advanced tips used by power users.

Quick Start

AI Model Comparator lets you send one prompt to multiple AI models simultaneously and see their responses side-by-side with quality scores, disagreement detection, and cost tracking.
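Conceptually, the fan-out works like parallel calls to each selected provider. The sketch below is illustrative only — `query_model` is a hypothetical stand-in for the real server-side provider calls, not the product's actual API:

```python
import concurrent.futures

def query_model(model: str, prompt: str) -> dict:
    # Hypothetical stand-in for a real provider call (OpenAI, Anthropic, Google).
    return {"model": model, "response": f"{model} answer to: {prompt}"}

def run_comparison(prompt: str, models: list[str]) -> list[dict]:
    # Send the same prompt to every selected model in parallel,
    # then collect responses in selection order for side-by-side display.
    with concurrent.futures.ThreadPoolExecutor() as pool:
        futures = [pool.submit(query_model, m, prompt) for m in models]
        return [f.result() for f in futures]

results = run_comparison("What is a REST API?", ["gpt-4o", "claude-3.5-sonnet"])
```

Because responses are collected from a list of futures in submission order, result cards always appear in the order you selected the models.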

1. Write your prompt

Type any question or task in the prompt box on the Compare page. Optionally prefix with a slash command for advanced analysis.

2. Select models

Choose 2–4 AI models from the list. Trial users get 2 models; Pro and Mega users can run all models simultaneously.

3. Run & compare

Click 'Run Comparison'. Results appear in seconds with quality scores, a consensus summary, and disagreement highlights.

Slash Commands

Prefix your prompt with a slash command to unlock specialised analysis modes. Commands are processed server-side and add structured scoring on top of the raw model responses.

/rank: Quality Ranking

Automatically ranks all model responses by quality score. Each response receives a 0–100 score based on completeness, clarity, and relevance. A ranked leaderboard is shown above the responses.

/rank Which Python web framework should I use for a REST API?

Best for: Evaluating subjective questions, getting a definitive recommendation, or comparing response quality on technical topics.
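The leaderboard behaviour amounts to a sort on the evaluator's 0–100 scores. This is a sketch of the idea, not the product's code; the scoring itself happens server-side and the model names and scores here are illustrative:

```python
def rank_responses(scored: dict[str, int]) -> list[tuple[str, int]]:
    # Sort model responses by quality score, highest first,
    # producing the leaderboard shown above the responses.
    return sorted(scored.items(), key=lambda kv: kv[1], reverse=True)

scores = {"gpt-4o": 87, "claude-3.5-sonnet": 92, "gemini-pro": 74}
leaderboard = rank_responses(scores)  # highest-scored model first
```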

/estimate: Consensus Estimate

Generates a consensus estimate by aggregating numerical or range-based answers from all models. Returns a central estimate, confidence range, and a breakdown of each model's contribution.

/estimate How long will it take to build a mobile app MVP?

Best for: Time estimates, cost projections, probability assessments, and any question with a numerical answer.
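One simple way to picture the aggregation is a median with a min-max range over the models' numerical answers. The actual server-side method may differ; this is a minimal sketch with hypothetical answers (in weeks) to the MVP question above:

```python
import statistics

def consensus_estimate(answers: dict[str, float]) -> dict:
    # Aggregate each model's numerical answer into a central estimate
    # (the median) plus a simple min-max confidence range.
    values = list(answers.values())
    return {
        "estimate": statistics.median(values),
        "range": (min(values), max(values)),
        "breakdown": answers,  # each model's individual contribution
    }

result = consensus_estimate({"gpt-4o": 10, "claude": 14, "gemini": 12})
# result["estimate"] is 12, result["range"] is (10, 14)
```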

Choosing Models

Different models excel at different tasks. Here's a quick reference to help you pick the right combination:

| Model | Strengths | Best for |
|---|---|---|
| GPT-4o | Reasoning, code, multimodal | Technical questions, code review, analysis |
| GPT-4.1 | Long context, instruction following | Document analysis, complex instructions |
| Claude 3.5 Sonnet | Writing quality, nuance, safety | Creative writing, nuanced ethical questions |
| Claude Opus 4.5 | Deep reasoning, research | Research tasks, complex multi-step problems |
| Gemini Pro | Speed, factual recall, Google data | Current events, factual lookups, speed |

Model availability depends on your plan tier. Trial users can select up to 2 models per query.

Reading Results

Quality Score

A 0–100 score generated by an AI evaluator assessing completeness, clarity, and relevance. Higher is better; scores above 80 are generally excellent. This is not a measure of factual accuracy.

Consensus Analysis

The percentage of models that agree on the core answer. High consensus (>80%) means the models broadly agree. Low consensus (<50%) signals genuine ambiguity or a subjective question — the most interesting cases.
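The percentage can be thought of as the share of models backing the most common core answer. A sketch of that idea (the real extraction of "core answers" is done server-side; the answers below are illustrative):

```python
from collections import Counter

def consensus_pct(core_answers: list[str]) -> float:
    # Share of models that agree on the most common core answer.
    top_count = Counter(core_answers).most_common(1)[0][1]
    return 100 * top_count / len(core_answers)

consensus_pct(["yes", "yes", "no", "yes"])  # → 75.0
```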

Disagreement Alert

When models diverge significantly, a red alert banner appears at the top of results. This highlights sycophancy risk — where one model may be telling you what you want to hear rather than the truth.

Response Time & Cost

Each result card shows the model's response time in seconds and the estimated token cost. Use this to balance quality vs. speed vs. cost for your use case.
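The cost figure follows the usual per-token pricing model: tokens consumed times the provider's rate per 1,000 tokens. The rates below are hypothetical placeholders, not the product's actual pricing:

```python
# Hypothetical per-1K-token rates in USD; real provider rates vary.
RATES = {"gpt-4o": 0.005, "claude-3.5-sonnet": 0.003}

def estimate_cost(model: str, tokens: int) -> float:
    # Estimated cost = tokens consumed x price per 1,000 tokens.
    return tokens / 1000 * RATES[model]

estimate_cost("gpt-4o", 2000)  # → 0.01
```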

Sharing & Export

Every comparison result can be shared as a public link. Recipients don't need an account to view shared results.

Share Link

Click the Share button in the results panel to generate a unique URL. The link is permanent and includes all responses, scores, and analysis.

Share on X

When a significant disagreement is detected, a Share on X button appears in the alert banner. It pre-fills a tweet with the disagreement summary and your share link.

Export

Pro and Mega users can export results as JSON, CSV, or Markdown using the Export button in the results panel. Useful for reports, research, or feeding results into other tools.
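To give a feel for the export formats, here is a sketch of turning result rows into CSV and Markdown. The field names and values are illustrative, not the product's actual export schema:

```python
import csv
import io

results = [
    {"model": "gpt-4o", "score": 87, "cost": 0.012},
    {"model": "claude-3.5-sonnet", "score": 92, "cost": 0.009},
]

def to_csv(rows: list[dict]) -> str:
    # Serialise result rows as CSV with a header line.
    buf = io.StringIO()
    writer = csv.DictWriter(buf, fieldnames=list(rows[0]))
    writer.writeheader()
    writer.writerows(rows)
    return buf.getvalue()

def to_markdown(rows: list[dict]) -> str:
    # Serialise result rows as a Markdown pipe table.
    header = "| model | score | cost |\n|---|---|---|"
    lines = [f"| {r['model']} | {r['score']} | {r['cost']} |" for r in rows]
    return "\n".join([header] + lines)
```

CSV suits spreadsheets and research pipelines; Markdown drops straight into reports and READMEs.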

Tips & Tricks

Use disagreements as a signal

When models strongly disagree, it often means the question is genuinely hard or ambiguous. Dig deeper rather than picking the highest-scored answer.

Optimise your prompt first

Toggle 'Optimise Prompt' before running a comparison. The AI rewriter tightens your prompt for clarity and removes filler, which typically improves response quality across all models.

Compare the same question differently

Run the same topic with different phrasings to see how sensitive models are to wording. Fragile answers that change with small rephrasing are less reliable.

Use /rank for hiring decisions

Paste a job description and ask models to rank candidate profiles. The disagreements reveal which qualities are genuinely debatable vs. clearly important.

Save templates for recurring prompts

If you run the same type of comparison regularly (code review, market research), save it as a template in the Library. One click to reuse with a new input.

Check the Changelog for new models

New models are added regularly. Check the Changelog page to see what's been added and when — you may find a better model for your use case.

Your API Keys

By default, AI Model Comparator uses shared system API keys so you don't need to supply your own. If you have your own OpenAI, Anthropic, or Google API keys, you can add them in Settings → API Keys.

Security note

Your API keys are encrypted with AES-256 before storage and are never logged or exposed in API responses. They are decrypted only at query time on the server.

Using your own keys means queries are billed to your provider accounts directly. This is useful if you have negotiated rates, need higher rate limits, or want full control over your usage data.

