📅 Data snapshot: June 2026

Using GitHub Copilot? The Copilot column marks the models available in Copilot. Since June 2026, Copilot uses per-token AI credit billing, so the $/task column directly reflects your cost.
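| Model | Family | Copilot | $/task | SWE-bench | Aider | Arena | LiveBench |
|---|---|---|---|---|---|---|---|
| GPT-5.5 | OpenAI | ✓ | $0.55 | - | - | - | 80.7 |
| GPT-5.4 | OpenAI | ✓ | $0.28 | - | - | - | 80.3 |
| Gemini 3.1 Pro | Google | ✓ | $0.22 | 69.6% | - | - | 79.9 |
| Claude Fable 5 | Anthropic | ✓ | $1.00 | - | - | - | 78.3 |
| Claude Opus 4.8 | Anthropic | ✓ | $0.50 | - | - | - | 77.2 |
| Claude Opus 4.7 | Anthropic | ✓ | $0.50 | - | - | - | 76.9 |
| Claude Opus 4.6 | Anthropic | ✓ | $0.50 | 75.6% | - | - | 76.3 |
| Claude Opus 4.5 thinking-32k | Anthropic | ✓ | $0.50 | 76.8% | 72.0% | 1497 | 76.0 |
| Claude Sonnet 4.6 | Anthropic | ✓ | $0.30 | - | - | - | 75.5 |
| Gemini 3.5 Flash | Google | ✓ | $0.17 | - | - | - | 75.0 |
| GPT-5.2 high reasoning | OpenAI | ✓ | $0.23 | 72.8% | 88.0% | 1470 | 74.8 |
| DeepSeek V4 Pro | DeepSeek | - | $0.03 | - | - | - | 73.6 |
| GPT-5.3 Codex | OpenAI | ✓ | $0.23 | - | - | - | 72.8 |
| Gemini 3 Flash | Google | ✓ | $0.06 | 75.8% | - | 1443 | 72.4 |
| Kimi K2.6 Thinking | Moonshot | - | - | - | - | - | 72.2 |
| GPT-5.1 | OpenAI | ✓ | $0.16 | - | - | - | 72.0 |
| GLM-5 | Zhipu | - | $0.05 | 72.8% | - | - | 68.9 |
| GPT-5 | OpenAI | ✓ | $0.16 | 65.0% | 88.0% | 1407 | 70.5 |
| GPT-5.4 nano | OpenAI | ✓ | $0.02 | - | - | - | 70.1 |
| Minimax M3 | Minimax | - | - | - | - | - | 70.0 |
| Kimi K2.5 | Moonshot | - | $0.15 | 70.8% | - | - | 69.1 |
| DeepSeek V4 Flash | DeepSeek | - | $0.01 | 70.0% | 74.2% | 1350 | 67.3 |
| GPT-5.4 mini | OpenAI | ✓ | $0.08 | - | - | - | 67.5 |
| Grok 4 | xAI | - | - | - | 79.6% | - | 62.0 |
| Grok 4.1 Fast | xAI | - | - | - | - | 1393 | 60.0 |
| Kimi K2 Thinking Turbo | Moonshot | - | $0.06 | 63.4% | 59.1% | 1356 | 61.6 |
| Minimax M2.5 | Minimax | - | $0.07 | 75.8% | - | - | 60.1 |
| Claude Opus 4.5 | Anthropic | ✓ | $0.50 | 76.8% | 70.7% | 1468 | 59.1 |
| Gemini 2.5 Pro | Google | ✓ | $0.16 | 53.6% | 83.1% | 1372 | 58.3 |
| GLM-4.7 | Zhipu | - | $0.05 | - | - | 1440 | 58.1 |
| Claude Opus 4.1 | Anthropic | ✓ | $1.50 | 67.6% | 82.1% | 1431 | 54.5 |
| Claude Sonnet 4.5 | Anthropic | ✓ | $0.30 | 71.4% | 82.4% | 1383 | 53.7 |
| GPT-5.2 | OpenAI | ✓ | $0.23 | 72.8% | 88.0% | 1432 | 48.9 |
| Claude Haiku 4.5 | Anthropic | ✓ | $0.10 | 66.6% | 73.5% | 1290 | 45.3 |
| GPT-4o | OpenAI | - | $0.23 | 48.9% | 72.9% | 1372 | - |
| Gemini 2.5 Flash | Google | - | $0.04 | 28.7% | 55.1% | 1233 | 47.7 |
| GPT-5 mini | OpenAI | ✓ | $0.03 | 56.2% | 50.2% | 1145 | - |
| GPT-4.1 | OpenAI | - | $0.18 | 39.6% | 52.4% | 1305 | - |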

Column guide

| Column | What it means |
|---|---|
| Copilot | Available in GitHub Copilot (✓ = yes, - = not listed). Since June 1, 2026, Copilot uses token-based AI credit billing at the same per-token rates as the direct API. |
| $/task | Estimated cost per task when using the APIs directly (50K input + 10K output tokens). Also reflects the Copilot AI credit cost since the June 2026 billing change. |
| SWE-bench | % of real GitHub issues the model can fix autonomously. February 2026 data (standardized harness, high reasoning mode). |
| Aider | % correct on multi-language code editing. June 2025 data. Best signal for CLI/agentic use cases. |
| Arena | Elo rating from human preference voting in the Code category. February 2026 data. |
| LiveBench | Global average score across 23 diverse tasks. June 2026 data, contamination-free. |
Data sources: SWE-bench (Feb 2026) · Aider (Oct 2025) · Arena Code (Feb 2026, not refreshed) · LiveBench (Jun 2026) · GitHub Copilot (AI credit billing since Jun 2026)
API pricing: Anthropic · OpenAI · Google · DeepSeek · Zhipu (GLM)
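
The $/task figure is described as the cost of 50K input plus 10K output tokens at direct API rates. A minimal sketch of that arithmetic follows, assuming prices are quoted per million tokens and that nothing else (caching discounts, separately billed reasoning tokens) enters the estimate; the page does not say either way. The function name `cost_per_task` and the example prices are hypothetical, not taken from any provider's price list.

```python
def cost_per_task(
    input_price_per_mtok: float,
    output_price_per_mtok: float,
    input_tokens: int = 50_000,
    output_tokens: int = 10_000,
) -> float:
    """Estimated USD cost of one task.

    Prices are USD per 1,000,000 tokens (an assumption about the units).
    The default token counts are the 50K in + 10K out from the column guide.
    """
    total = (
        input_tokens * input_price_per_mtok
        + output_tokens * output_price_per_mtok
    )
    return total / 1_000_000


if __name__ == "__main__":
    # Hypothetical prices: $2.00 per 1M input tokens, $10.00 per 1M output tokens.
    print(f"${cost_per_task(2.00, 10.00):.2f}")  # prints $0.20
```

Passing different `input_tokens` and `output_tokens` values gives an estimate for workloads that do not match the 50K/10K split assumed by the table.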