📅 Data snapshot: December 2025

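| Model | Family | Copilot | $/task | SWE-bench | Aider | Arena |
|---|---|---|---|---|---|---|
| Claude Opus 4.5 | Anthropic | | $0.50 | 74.4% | - | 1480 |
| Claude Opus 4.5 thinking | Anthropic | - | $0.50 | - | - | 1520 |
| Claude Opus 4.1 | Anthropic | 10× | $1.50 | 67.6% | - | - |
| Claude Sonnet 4.5 | Anthropic | | $0.30 | 70.6% | - | 1387 |
| Claude Sonnet 4.5 thinking | Anthropic | - | $0.30 | - | - | 1393 |
| Claude 3.5 Sonnet | Anthropic | - | $0.30 | - | 84.2% | - |
| Claude Haiku 4.5 | Anthropic | 0.33× | $0.10 | - | - | 1290 |
| Claude 3.5 Haiku | Anthropic | - | $0.08 | - | 75.2% | - |
| Claude 3 Opus | Anthropic | - | $1.50 | - | 68.4% | - |
| Claude 3 Haiku | Anthropic | - | $0.025 | - | 47.4% | - |
| GPT-5.2 high | OpenAI | | $0.23 | 71.8% | - | 1484 |
| GPT-5.2 | OpenAI | | $0.23 | 69.0% | - | - |
| GPT-5 | OpenAI | | $0.16 | 65.0% | - | - |
| GPT-4.1 | OpenAI | | $0.18 | 39.6% | - | - |
| GPT-4o | OpenAI | | $0.23 | - | 72.9% | - |
| GPT-4o-mini | OpenAI | - | $0.01 | - | 55.6% | - |
| GPT-5 mini | OpenAI | | $0.03 | - | - | - |
| o1 | OpenAI | - | $1.35 | - | 84.2% | - |
| o3 | OpenAI | - | $0.18 | 58.4% | - | 1417 |
| o4-mini | OpenAI | - | $0.10 | 45.0% | - | - |
| Gemini 3 Pro | Google | | $0.22 | 74.2% | - | 1478 |
| Gemini 3 Flash | Google | 0.33× | $0.06 | - | - | 1465 |
| Gemini 2.5 Pro | Google | | $0.16 | 53.6% | - | - |
| Gemini 2.5 Flash | Google | - | $0.04 | 28.7% | - | - |
| Gemini 2.0 Flash | Google | - | $0.01 | - | - | - |
| DeepSeek Coder V2 | DeepSeek | - | - | - | 72.9% | - |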
💡 Tip: Sort by the "Copilot" column to see the cheapest models first (free 0× models at the top), or by "SWE-bench" to find the top performers. The best-value models combine good scores with a low Copilot cost.

Column guide

| Column | What it means |
|---|---|
| Copilot | GitHub Copilot premium request multiplier (0× = free, 1× = standard, 3× = expensive, - = not available) |
| $/task | Estimated cost per task when calling the APIs directly (50K input + 10K output tokens). Useful for comparing relative model costs; Copilot users pay via the multiplier instead. |
| SWE-bench | % of real GitHub issues the model can fix autonomously |
| Aider | % correct on multi-language code editing |
| Arena | Elo rating from human preference voting on web dev tasks |
Data sources: SWE-bench · Aider · Chatbot Arena WebDev · GitHub Copilot
API pricing: Anthropic · OpenAI · Google
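
The $/task estimate can be reproduced from any provider's price list. The following is a minimal sketch, assuming prices are quoted in dollars per million tokens; the function name `cost_per_task` and the example prices are hypothetical and chosen only for illustration, not taken from the table.

```python
# Token budget per task, as stated in the column guide.
INPUT_TOKENS = 50_000
OUTPUT_TOKENS = 10_000


def cost_per_task(input_price_per_mtok: float, output_price_per_mtok: float) -> float:
    """Estimate the API cost in dollars for one task.

    Prices are in dollars per million tokens (an assumed unit).
    """
    input_cost = INPUT_TOKENS / 1_000_000 * input_price_per_mtok
    output_cost = OUTPUT_TOKENS / 1_000_000 * output_price_per_mtok
    # Round to hide floating-point noise in the sum.
    return round(input_cost + output_cost, 4)


# Hypothetical prices, for illustration only.
example = cost_per_task(input_price_per_mtok=3.00, output_price_per_mtok=15.00)
print(f"${example:.2f} per task")  # prints "$0.30 per task"
```

Because the budget is fixed at 50K input and 10K output tokens, the estimate is only a relative yardstick. Tasks with a different input/output mix will shift the ranking between models whose input and output prices differ.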

Best value picks

Based on the data:

| Use case | Best value model | Why |
|---|---|---|
| Daily coding (Copilot) | Claude Sonnet 4.5 | 70.6% SWE-bench at 1× cost |
| Free in Copilot | GPT-4o | 72.9% Aider, costs nothing extra |
| Cheap in Copilot | Gemini 3 Flash | Arena #5 at 0.33× and only $0.06/task |
| When you need the best | Gemini 3 Pro | 74.2% SWE-bench at 1× (within 0.2 points of Opus 4.5, ahead of Opus 4.1) |
| Cheapest API | GPT-4o-mini | $0.01/task, about 23× cheaper than GPT-4o ($0.23/task) at the table's rounded prices |
| Best $/performance | Gemini 2.5 Flash | $0.04/task, 28.7% SWE-bench |
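
One way to check the "Best $/performance" pick is to divide each SWE-bench score by its $/task figure. The sketch below assumes that "SWE-bench points per dollar" is a reasonable reading of $/performance, which the source does not define. The rows are copied from the table for a subset of models that list both values, and the helper name `points_per_dollar` is hypothetical.

```python
# (model, $/task, SWE-bench %) copied from the table above.
rows = [
    ("Claude Opus 4.5", 0.50, 74.4),
    ("Claude Sonnet 4.5", 0.30, 70.6),
    ("GPT-5.2 high", 0.23, 71.8),
    ("GPT-5", 0.16, 65.0),
    ("Gemini 3 Pro", 0.22, 74.2),
    ("o4-mini", 0.10, 45.0),
    ("Gemini 2.5 Flash", 0.04, 28.7),
]


def points_per_dollar(cost: float, score: float) -> float:
    """SWE-bench percentage points bought per dollar of API spend."""
    return score / cost


# Highest points per dollar first.
ranked = sorted(rows, key=lambda r: points_per_dollar(r[1], r[2]), reverse=True)

for model, cost, score in ranked:
    print(f"{model:<18} {points_per_dollar(cost, score):7.1f} points per dollar")
```

On these rows Gemini 2.5 Flash ranks first and Claude Opus 4.5 last. A plain ratio like this rewards low price regardless of absolute score, so it suits high-volume workloads better than tasks where a failed fix is costly.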