📅 Data snapshot: March 2026

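| Model | Family | Copilot | $/task | SWE-bench | Aider | Arena | LiveBench |
|---|---|---|---|---|---|---|---|
| GPT-5.4 | OpenAI | | $0.28 | - | - | - | 80.3 |
| Gemini 3.1 Pro | Google | | $0.22 | 69.6% | - | - | 79.9 |
| Claude Opus 4.6 | Anthropic | | $0.50 | 75.6% | - | - | 76.3 |
| Claude Opus 4.5 thinking-32k | Anthropic | - | $0.50 | 76.8% | 72.0% | 1497 | 76.0 |
| Claude Sonnet 4.6 | Anthropic | | $0.30 | - | - | - | 75.5 |
| Claude Opus 4.5 | Anthropic | | $0.50 | 76.8% | 70.7% | 1468 | 59.1 |
| Minimax M2.5 | Minimax | - | $0.07 | 75.8% | - | - | 60.1 |
| GPT-5.2 high reasoning | OpenAI | | $0.23 | 72.8% | 88.0% | 1470 | 74.8 |
| GLM-5 | Zhipu | - | $0.05 | 72.8% | - | - | 68.9 |
| Claude Sonnet 4.5 | Anthropic | | $0.30 | 71.4% | 82.4% | 1383 | 53.7 |
| Kimi K2.5 | Moonshot | - | $0.15 | 70.8% | - | - | 69.1 |
| DeepSeek V3.2 Reasoner | DeepSeek | - | $0.02 | 70.0% | 74.2% | 1350 | 62.2 |
| GPT-5.2 | OpenAI | | $0.23 | 72.8% | 88.0% | 1432 | 48.9 |
| Claude Opus 4.1 | Anthropic | 10× | $1.50 | 67.6% | 82.1% | 1431 | 54.5 |
| Claude Haiku 4.5 | Anthropic | 0.33× | $0.10 | 66.6% | 73.5% | 1290 | 45.3 |
| GPT-5 | OpenAI | | $0.16 | 65.0% | 88.0% | 1407 | 70.5 |
| Kimi K2 Thinking Turbo | Moonshot | - | $0.06 | 63.4% | 59.1% | 1356 | 61.6 |
| Minimax M2 | Minimax | - | $0.03 | 61.0% | - | 1408 | - |
| o3 | OpenAI | - | $0.18 | 58.4% | 81.3% | 1417 | - |
| GPT-5 mini | OpenAI | | $0.03 | 56.2% | 50.2% | 1145 | - |
| GLM-4.6 | Zhipu | - | $0.05 | 55.4% | - | - | 55.2 |
| Devstral 2 | Mistral | - | - | 53.8% | - | 1363 | 41.2 |
| Gemini 2.5 Pro | Google | | $0.16 | 53.6% | 83.1% | 1372 | 58.3 |
| GPT-5.1 | OpenAI | | $0.16 | - | - | - | 72.0 |
| Grok 4 | xAI | - | - | - | 79.6% | - | 62.0 |
| Grok 4.1 Fast | xAI | 0.25× | - | - | - | 1393 | 60.0 |
| Gemini 3 Flash | Google | 0.33× | $0.06 | 75.8% | - | 1443 | 72.4 |
| GPT-4o | OpenAI | | $0.23 | 48.9% | 72.9% | 1372 | - |
| GLM-4.7 | Zhipu | - | $0.05 | - | - | 1440 | 58.1 |
| Minimax M2.1 preview | Minimax | - | $0.03 | - | - | 1408 | - |
| o4-mini | OpenAI | - | $0.10 | 45.0% | 72.0% | 1310 | - |
| GPT-4.1 | OpenAI | | $0.18 | 39.6% | 52.4% | 1305 | - |
| DeepSeek V3.2 Chat | DeepSeek | - | $0.02 | 39.0% | 70.2% | 1287 | 51.8 |
| Gemini 2.5 Flash | Google | - | $0.04 | 28.7% | 55.1% | 1233 | 47.7 |
| Gemini 2.0 Flash | Google | - | $0.01 | 22.0% | 58.0% | 1214 | - |
| GPT-4o-mini | OpenAI | - | $0.01 | 18.6% | 55.6% | 1176 | - |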

Column guide

| Column | What it means |
|---|---|
| Copilot | GitHub Copilot premium request multiplier (0× = free, 1× = standard, 3× = expensive, - = not available). |
| $/task | Estimated cost per task when calling the API directly, assuming 50K input tokens and 10K output tokens. Useful for comparing relative model costs; Copilot users pay via the multiplier instead. |
| SWE-bench | Percentage of real GitHub issues the model can fix autonomously. February 2026 data (standardized harness, high reasoning mode). |
| Aider | Percentage correct on multi-language code editing. October 2025 data. |
| Arena | Elo rating from human preference voting in the Code category. February 2026 data. |
| LiveBench | Global average score across 23 diverse tasks. January 2026 data, contamination-free. |
Data sources: SWE-bench (Feb 2026) · Aider (Oct 2025) · Arena Code (Feb 2026, not refreshed) · LiveBench (Jan 2026) · GitHub Copilot
API pricing: Anthropic · OpenAI · Google · DeepSeek · Zhipu (GLM)
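The two cost columns can be reproduced with simple arithmetic. The sketch below assumes API prices are quoted per million tokens and uses the 50K input / 10K output token budget from the column guide. The function names and the example rates ($3 and $15 per million tokens) are illustrative assumptions, not figures taken from the sources listed above.

```python
# Token budget per task, as stated in the column guide.
INPUT_TOKENS = 50_000
OUTPUT_TOKENS = 10_000


def cost_per_task(input_price_per_mtok, output_price_per_mtok,
                  input_tokens=INPUT_TOKENS, output_tokens=OUTPUT_TOKENS):
    """Estimate the direct-API cost in dollars for one task.

    Prices are assumed to be in dollars per million tokens.
    """
    input_cost = input_tokens * input_price_per_mtok / 1_000_000
    output_cost = output_tokens * output_price_per_mtok / 1_000_000
    return input_cost + output_cost


def premium_requests_used(multiplier, requests):
    """Premium requests consumed in Copilot for a number of prompts.

    Each prompt is assumed to count as `multiplier` premium requests.
    """
    return multiplier * requests


if __name__ == "__main__":
    # Hypothetical rates: $3 per million input tokens, $15 per million output tokens.
    print(f"${cost_per_task(3.0, 15.0):.2f} per task")
    # 100 prompts to a model with a 0.33x multiplier.
    print(f"{premium_requests_used(0.33, 100):.0f} premium requests")
```

Because the token budget is fixed, the $/task column is best read as a relative ranking; a workload with a different input/output mix would shift the absolute numbers.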