AI coding model comparison
📅 Data snapshot: June 2026
Using GitHub Copilot? The Copilot column marks the models available in Copilot. Since June 2026, Copilot uses per-token AI credit billing, so the $/task column directly reflects your cost.
| Model | Family | Copilot | $/task | SWE-bench | Aider | Arena | LiveBench |
|---|---|---|---|---|---|---|---|
| GPT-5.5 | OpenAI | ✓ | $0.55 | - | - | - | 80.7 |
| GPT-5.4 | OpenAI | ✓ | $0.28 | - | - | - | 80.3 |
| Gemini 3.1 Pro | Google | ✓ | $0.22 | 69.6% | - | - | 79.9 |
| Claude Fable 5 | Anthropic | ✓ | $1.00 | - | - | - | 78.3 |
| Claude Opus 4.8 | Anthropic | ✓ | $0.50 | - | - | - | 77.2 |
| Claude Opus 4.7 | Anthropic | ✓ | $0.50 | - | - | - | 76.9 |
| Claude Opus 4.6 | Anthropic | ✓ | $0.50 | 75.6% | - | - | 76.3 |
| Claude Opus 4.5 thinking-32k | Anthropic | ✓ | $0.50 | 76.8% | 72.0% | 1497 | 76.0 |
| Claude Sonnet 4.6 | Anthropic | ✓ | $0.30 | - | - | - | 75.5 |
| Gemini 3.5 Flash | Google | ✓ | $0.17 | - | - | - | 75.0 |
| GPT-5.2 high reasoning | OpenAI | ✓ | $0.23 | 72.8% | 88.0% | 1470 | 74.8 |
| DeepSeek V4 Pro | DeepSeek | - | $0.03 | - | - | - | 73.6 |
| GPT-5.3 Codex | OpenAI | ✓ | $0.23 | - | - | - | 72.8 |
| Gemini 3 Flash | Google | ✓ | $0.06 | 75.8% | - | 1443 | 72.4 |
| Kimi K2.6 Thinking | Moonshot | - | - | - | - | - | 72.2 |
| GPT-5.1 | OpenAI | ✓ | $0.16 | - | - | - | 72.0 |
| GLM-5 | Zhipu | - | $0.05 | 72.8% | - | - | 68.9 |
| GPT-5 | OpenAI | ✓ | $0.16 | 65.0% | 88.0% | 1407 | 70.5 |
| GPT-5.4 nano | OpenAI | ✓ | $0.02 | - | - | - | 70.1 |
| Minimax M3 | Minimax | - | - | - | - | - | 70.0 |
| Kimi K2.5 | Moonshot | - | $0.15 | 70.8% | - | - | 69.1 |
| DeepSeek V4 Flash | DeepSeek | - | $0.01 | 70.0% | 74.2% | 1350 | 67.3 |
| GPT-5.4 mini | OpenAI | ✓ | $0.08 | - | - | - | 67.5 |
| Grok 4 | xAI | - | - | - | 79.6% | - | 62.0 |
| Grok 4.1 Fast | xAI | - | - | - | - | 1393 | 60.0 |
| Kimi K2 Thinking Turbo | Moonshot | - | $0.06 | 63.4% | 59.1% | 1356 | 61.6 |
| Minimax M2.5 | Minimax | - | $0.07 | 75.8% | - | - | 60.1 |
| Claude Opus 4.5 | Anthropic | ✓ | $0.50 | 76.8% | 70.7% | 1468 | 59.1 |
| Gemini 2.5 Pro | Google | ✓ | $0.16 | 53.6% | 83.1% | 1372 | 58.3 |
| GLM-4.7 | Zhipu | - | $0.05 | - | - | 1440 | 58.1 |
| Claude Opus 4.1 | Anthropic | ✓ | $1.50 | 67.6% | 82.1% | 1431 | 54.5 |
| Claude Sonnet 4.5 | Anthropic | ✓ | $0.30 | 71.4% | 82.4% | 1383 | 53.7 |
| GPT-5.2 | OpenAI | ✓ | $0.23 | 72.8% | 88.0% | 1432 | 48.9 |
| Claude Haiku 4.5 | Anthropic | ✓ | $0.10 | 66.6% | 73.5% | 1290 | 45.3 |
| GPT-4o | OpenAI | - | $0.23 | 48.9% | 72.9% | 1372 | - |
| Gemini 2.5 Flash | Google | - | $0.04 | 28.7% | 55.1% | 1233 | 47.7 |
| GPT-5 mini | OpenAI | ✓ | $0.03 | 56.2% | 50.2% | 1145 | - |
| GPT-4.1 | OpenAI | - | $0.18 | 39.6% | 52.4% | 1305 | - |
Column guide
| Column | What it means |
|---|---|
| Copilot | Available in GitHub Copilot (✓ = yes, - = not listed). Since June 1, 2026, Copilot has used token-based AI credit billing at the same per-token rates as the direct API. |
| $/task | Estimated cost per task when calling the API directly, assuming 50K input + 10K output tokens. Also reflects the Copilot AI credit cost since the June 2026 billing change. |
| SWE-bench | % of real GitHub issues the model can fix autonomously. February 2026 data (standardized harness, high reasoning mode). |
| Aider | % correct on multi-language code editing. June 2025 data. Best signal for CLI/agentic use cases. |
| Arena | Elo rating from human preference voting in the Code category. February 2026 data. |
| LiveBench | Global average score across 23 diverse tasks. June 2026 data, contamination-free. |
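
The $/task estimate can be reproduced from per-token prices. The sketch below assumes the column is computed as a simple sum of input and output cost over the stated 50K-in / 10K-out budget; the function name `cost_per_task` and the example prices are hypothetical and are not taken from any provider's price list.

```python
# Token budget stated in the column guide: 50K input + 10K output per task.
INPUT_TOKENS = 50_000
OUTPUT_TOKENS = 10_000


def cost_per_task(input_usd_per_mtok: float, output_usd_per_mtok: float,
                  input_tokens: int = INPUT_TOKENS,
                  output_tokens: int = OUTPUT_TOKENS) -> float:
    """Return the estimated USD cost of one task at the given per-million-token prices."""
    input_cost = input_tokens / 1_000_000 * input_usd_per_mtok
    output_cost = output_tokens / 1_000_000 * output_usd_per_mtok
    return input_cost + output_cost


# Hypothetical prices for illustration only (assumption, not a real price list):
# $2.00 per million input tokens, $10.00 per million output tokens.
example = cost_per_task(input_usd_per_mtok=2.00, output_usd_per_mtok=10.00)
print(f"${example:.2f} per task")  # prints "$0.20 per task"
```

To check a row against your own workload, pass different `input_tokens` and `output_tokens` values. Real tasks may use far more or fewer tokens than the fixed budget assumed here.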
Data sources: SWE-bench (Feb 2026) · Aider (Oct 2025) · Arena Code (Feb 2026, not refreshed) · LiveBench (Jun 2026) · GitHub Copilot (AI credit billing since Jun 2026)
API pricing: Anthropic · OpenAI · Google · DeepSeek · Zhipu (GLM)