AI coding model comparison

📅 Data snapshot: June 2026

Using GitHub Copilot? The Copilot column marks the models available there. Since June 2026, Copilot uses per-token AI credit billing, so the $/task column directly reflects your cost.

Model	Family	Copilot	$/task	SWE-bench	Aider	Arena	LiveBench
GPT-5.5	OpenAI	✓	$0.55	-	-	-	80.7
GPT-5.4	OpenAI	✓	$0.28	-	-	-	80.3
Gemini 3.1 Pro	Google	✓	$0.22	69.6%	-	-	79.9
Claude Fable 5	Anthropic	✓	$1.00	-	-	-	78.3
Claude Opus 4.8	Anthropic	✓	$0.50	-	-	-	77.2
Claude Opus 4.7	Anthropic	✓	$0.50	-	-	-	76.9
Claude Opus 4.6	Anthropic	✓	$0.50	75.6%	-	-	76.3
Claude Opus 4.5 thinking-32k	Anthropic	✓	$0.50	76.8%	72.0%	1497	76.0
Claude Sonnet 4.6	Anthropic	✓	$0.30	-	-	-	75.5
Gemini 3.5 Flash	Google	✓	$0.17	-	-	-	75.0
GPT-5.2 high reasoning	OpenAI	✓	$0.23	72.8%	88.0%	1470	74.8
DeepSeek V4 Pro	DeepSeek	-	$0.03	-	-	-	73.6
GPT-5.3 Codex	OpenAI	✓	$0.23	-	-	-	72.8
Gemini 3 Flash	Google	✓	$0.06	75.8%	-	1443	72.4
Kimi K2.6 Thinking	Moonshot	-	-	-	-	-	72.2
GPT-5.1	OpenAI	✓	$0.16	-	-	-	72.0
GLM-5	Zhipu	-	$0.05	72.8%	-	-	68.9
GPT-5	OpenAI	✓	$0.16	65.0%	88.0%	1407	70.5
GPT-5.4 nano	OpenAI	✓	$0.02	-	-	-	70.1
Minimax M3	Minimax	-	-	-	-	-	70.0
Kimi K2.5	Moonshot	-	$0.15	70.8%	-	-	69.1
DeepSeek V4 Flash	DeepSeek	-	$0.01	70.0%	74.2%	1350	67.3
GPT-5.4 mini	OpenAI	✓	$0.08	-	-	-	67.5
Grok 4	xAI	-	-	-	79.6%	-	62.0
Grok 4.1 Fast	xAI	-	-	-	-	1393	60.0
Kimi K2 Thinking Turbo	Moonshot	-	$0.06	63.4%	59.1%	1356	61.6
Minimax M2.5	Minimax	-	$0.07	75.8%	-	-	60.1
Claude Opus 4.5	Anthropic	✓	$0.50	76.8%	70.7%	1468	59.1
Gemini 2.5 Pro	Google	✓	$0.16	53.6%	83.1%	1372	58.3
GLM-4.7	Zhipu	-	$0.05	-	-	1440	58.1
Claude Opus 4.1	Anthropic	✓	$1.50	67.6%	82.1%	1431	54.5
Claude Sonnet 4.5	Anthropic	✓	$0.30	71.4%	82.4%	1383	53.7
GPT-5.2	OpenAI	✓	$0.23	72.8%	88.0%	1432	48.9
Claude Haiku 4.5	Anthropic	✓	$0.10	66.6%	73.5%	1290	45.3
GPT-4o	OpenAI	-	$0.23	48.9%	72.9%	1372	-
Gemini 2.5 Flash	Google	-	$0.04	28.7%	55.1%	1233	47.7
GPT-5 mini	OpenAI	✓	$0.03	56.2%	50.2%	1145	-
GPT-4.1	OpenAI	-	$0.18	39.6%	52.4%	1305	-
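
The Copilot filter mentioned above can be approximated offline. The following is a minimal sketch, assuming the table has been saved as tab-separated text; the function names `parse_cost` and `copilot_models` are hypothetical, and only three rows from the table are included for brevity.

```python
import csv
import io

# Three rows copied from the table above, tab-separated like the source.
TABLE = (
    "Model\tFamily\tCopilot\t$/task\tSWE-bench\tAider\tArena\tLiveBench\n"
    "GPT-5.5\tOpenAI\t✓\t$0.55\t-\t-\t-\t80.7\n"
    "DeepSeek V4 Pro\tDeepSeek\t-\t$0.03\t-\t-\t-\t73.6\n"
    "Gemini 3 Flash\tGoogle\t✓\t$0.06\t75.8%\t-\t1443\t72.4\n"
)


def parse_cost(cell):
    """Turn '$0.55' into 0.55; a '-' cell (no data) becomes None."""
    return None if cell == "-" else float(cell.lstrip("$"))


def copilot_models(tsv_text):
    """Return (model, cost) pairs for rows marked ✓ in Copilot, cheapest first."""
    rows = csv.DictReader(io.StringIO(tsv_text), delimiter="\t")
    picked = [
        (row["Model"], parse_cost(row["$/task"]))
        for row in rows
        if row["Copilot"] == "✓"
    ]
    # Rows without a cost estimate sort after all priced rows.
    return sorted(picked, key=lambda pair: (pair[1] is None, pair[1] or 0.0))


print(copilot_models(TABLE))
```

Sorting by cost is only one possible ordering; swapping the sort key for the LiveBench column would reproduce the table's approximate default order instead.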

Column guide

Column	What it means
Copilot	Available in GitHub Copilot (✓ = yes, - = not listed). Since June 1, 2026 Copilot uses token-based AI credit billing — same per-token rates as direct API.
$/task	Estimated cost per task if using APIs directly (50K in + 10K out tokens). Also reflects Copilot AI credit cost since Jun 2026 billing change.
SWE-bench	% of real GitHub issues the model can fix autonomously · February 2026 data (standardized harness, high reasoning mode)
Aider	% correct on multi-language code editing · June 2025 data · Best signal for CLI/agentic use cases
Arena	Elo rating from human preference voting on Code category · February 2026 data
LiveBench	Global average score across 23 diverse tasks · June 2026 data, contamination-free
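
The $/task estimate can be reproduced from published per-token rates. The sketch below assumes a plain linear formula over the stated 50K input and 10K output tokens, with no caching or batch discounts; the page does not spell out its exact method, and the function name `cost_per_task` and the example rates are illustrative only.

```python
# Token counts stated in the column guide for one "task".
IN_TOKENS = 50_000
OUT_TOKENS = 10_000


def cost_per_task(input_usd_per_mtok, output_usd_per_mtok,
                  in_tokens=IN_TOKENS, out_tokens=OUT_TOKENS):
    """Estimate USD cost of one task from prices quoted per million tokens.

    Assumes simple linear pricing (an assumption, not the page's stated method).
    """
    total = in_tokens * input_usd_per_mtok + out_tokens * output_usd_per_mtok
    return total / 1_000_000


# Illustrative rates: $3 per million input tokens, $15 per million output tokens.
print(round(cost_per_task(3.00, 15.00), 2))  # 0.3
```

With these example rates the input and output halves each contribute $0.15, which shows how strongly the output price weighs on the total even at a 5:1 input-to-output ratio.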

Data sources: SWE-bench (Feb 2026) · Aider (Oct 2025) · Arena Code (Feb 2026, not refreshed) · LiveBench (Jun 2026) · GitHub Copilot (AI credit billing since Jun 2026)
API pricing: Anthropic · OpenAI · Google · DeepSeek · Zhipu (GLM)

Ben Hall
