📅 Data snapshot: December 2025

| Model | Family | Copilot | $/task | SWE-bench | Aider | Arena |
|-------|--------|---------|--------|-----------|-------|-------|
| Claude Opus 4.5 | Anthropic | | $0.50 | 74.4% | - | 1480 |
| Claude Opus 4.5 thinking | Anthropic | - | $0.50 | - | - | 1520 |
| Claude Opus 4.1 | Anthropic | 10× | $1.50 | 67.6% | - | - |
| Claude Sonnet 4.5 | Anthropic | | $0.30 | 70.6% | - | 1387 |
| Claude Sonnet 4.5 thinking | Anthropic | - | $0.30 | - | - | 1393 |
| Claude 3.5 Sonnet | Anthropic | - | $0.30 | - | 84.2% | - |
| Claude Haiku 4.5 | Anthropic | 0.33× | $0.10 | - | - | 1290 |
| Claude 3.5 Haiku | Anthropic | - | $0.08 | - | 75.2% | - |
| Claude 3 Opus | Anthropic | - | $1.50 | - | 68.4% | - |
| Claude 3 Haiku | Anthropic | - | $0.025 | - | 47.4% | - |
| GPT-5.2 high | OpenAI | | $0.23 | 71.8% | - | 1484 |
| GPT-5.2 | OpenAI | | $0.23 | 69.0% | - | - |
| GPT-5 | OpenAI | | $0.16 | 65.0% | - | - |
| GPT-4.1 | OpenAI | | $0.18 | 39.6% | - | - |
| GPT-4o | OpenAI | | $0.23 | - | 72.9% | - |
| GPT-4o-mini | OpenAI | - | $0.01 | - | 55.6% | - |
| GPT-5 mini | OpenAI | | $0.03 | - | - | - |
| o1 | OpenAI | - | $1.35 | - | 84.2% | - |
| o3 | OpenAI | - | $0.18 | 58.4% | - | 1417 |
| o4-mini | OpenAI | - | $0.10 | 45.0% | - | - |
| Gemini 3 Pro | Google | | $0.22 | 74.2% | - | 1478 |
| Gemini 3 Flash | Google | 0.33× | $0.06 | - | - | 1465 |
| Gemini 2.5 Pro | Google | | $0.16 | 53.6% | - | - |
| Gemini 2.5 Flash | Google | - | $0.04 | 28.7% | - | - |
| Gemini 2.0 Flash | Google | - | $0.01 | - | - | - |
| DeepSeek V3 | DeepSeek | - | $0.02 | - | - | - |
| DeepSeek R1 | DeepSeek | - | $0.05 | 60.0% | - | - |
| DeepSeek Coder V2 | DeepSeek | - | - | - | 72.9% | - |

Column guide

| Column | What it means |
|--------|---------------|
| Copilot | GitHub Copilot premium request multiplier (0× = free, 1× = standard, higher = more expensive, - = not available) |
| $/task | Estimated cost per task when using the APIs directly (50K input + 10K output tokens). Useful for comparing relative model costs; Copilot users pay via the multiplier instead. See the cost sketch below this table. |
| SWE-bench | % of real GitHub issues the model can fix autonomously (source) |
| Aider | % correct on multi-language code editing (source) |
| Arena | Elo rating from human preference voting on web dev tasks (source); see the Elo sketch below |
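The $/task column is plain linear token pricing: 50K input plus 10K output tokens at the provider's per-million-token rates. Below is a minimal sketch of that arithmetic plus the Copilot-side multiplier accounting; the $3/$15 rates are illustrative assumptions, not quoted prices (see the API pricing links at the end of this section for current numbers).

```python
# Sketch of the $/task estimate and the Copilot multiplier accounting.
# The per-million-token rates are illustrative assumptions; check the
# API pricing links at the end of this section for current numbers.

TOKENS_IN, TOKENS_OUT = 50_000, 10_000  # the fixed task profile used above

def cost_per_task(usd_per_m_in: float, usd_per_m_out: float) -> float:
    """Direct-API cost of one 50K-in + 10K-out task."""
    return (TOKENS_IN * usd_per_m_in + TOKENS_OUT * usd_per_m_out) / 1_000_000

def premium_requests_used(n_requests: int, multiplier: float) -> float:
    """Copilot-side accounting: each request consumes `multiplier`
    premium requests from the plan's monthly allowance."""
    return n_requests * multiplier

# An assumed $3/M-input, $15/M-output rate reproduces the $0.30 rows:
print(f"${cost_per_task(3.00, 15.00):.2f}")      # $0.30
# 20 requests to a 0.33x model draw 6.6 premium requests:
print(f"{premium_requests_used(20, 0.33):.1f}")  # 6.6
```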
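Arena scores are Elo-style ratings, so only the gaps between models carry meaning. The live leaderboard fits ratings with a Bradley-Terry model, but the classic Elo expected-score formula gives the same logistic reading of a gap; this is a sketch for intuition, not the Arena's exact pipeline.

```python
def elo_expected_win(rating_a: float, rating_b: float) -> float:
    """Expected probability that A beats B on the 400-point logistic
    Elo scale: 1 / (1 + 10^((R_b - R_a) / 400))."""
    return 1.0 / (1.0 + 10 ** ((rating_b - rating_a) / 400))

# A 40-point gap, e.g. Opus 4.5 thinking (1520) vs. Opus 4.5 (1480),
# is only about a 56% expected win rate in head-to-head votes.
print(f"{elo_expected_win(1520, 1480):.0%}")  # 56%
```

Differences of a few dozen Elo points are therefore close to a coin flip in head-to-head voting.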
Data sources: SWE-bench · Aider · Chatbot Arena WebDev · GitHub Copilot
API pricing: Anthropic · OpenAI · Google