Getting Started: AI Coding Quick Reference

📅 Last updated: June 2026

No-nonsense reference for developers who just want to know which AI model to pick. Bookmark this and stop Googling.

📌 Quick pick - just tell me what to use

Features / bugs / tests

Claude Sonnet 4.5/4.6
Gemini 3.1 Pro
GPT-5.2

Agentic / CLI / scaffolding

Claude Haiku 4.5
Gemini 3 Flash
GPT-5.4 mini

Architecture / refactors

Claude Fable 5 / Opus 4.8
Gemini 3.1 Pro
GPT-5.5 / 5.4

Code review

Claude Sonnet 4.5/4.6
Gemini 3.1 Pro
GPT-5.4

Documentation

Claude Sonnet 4.5/4.6
Gemini 3.1 Pro
GPT-4o

GitHub Copilot users

Filter the comparison table by Copilot to see what's available and at what cost

Design / planning

Claude Fable 5 / Opus 4.8
Gemini 3.1 Pro
GPT-5.5 / 5.4

A note on versions

You’ll see version numbers everywhere: Sonnet 3.5, Sonnet 4, Sonnet 4.5. Gemini 2.5, Gemini 3. GPT-4o, GPT-5.

Don’t overthink it. The tier matters more than the version. “Sonnet” is the mid-tier Claude. “Opus” is the heavyweight Claude. “Flash” is the fast/cheap Gemini. Your IDE usually offers the latest version of each tier - just pick the tier that fits your task.

When this page says “Sonnet”, it means whatever the current Sonnet is. Same for the others.
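If you'd rather encode "pick the tier, not the version" in a script, here is a minimal sketch. It maps the task categories from the quick-pick list to Claude tiers only. The names `TASK_TO_TIER` and `pick_tier` are made up for illustration, and the Sonnet fallback is an assumption based on Sonnet being the mid-tier.

```python
# Hypothetical lookup: task category -> Claude tier, following the quick-pick list.
# Tier names only, no versions: use whatever version of the tier is current.
TASK_TO_TIER = {
    "features": "Sonnet",
    "bugs": "Sonnet",
    "tests": "Sonnet",
    "agentic": "Haiku",
    "cli": "Haiku",
    "scaffolding": "Haiku",
    "architecture": "Opus",  # the quick pick also lists Fable here
    "refactors": "Opus",
    "code review": "Sonnet",
    "documentation": "Sonnet",
    "design": "Opus",        # the quick pick also lists Fable here
    "planning": "Opus",
}


def pick_tier(task: str, default: str = "Sonnet") -> str:
    """Return the Claude tier for a task category, falling back to the mid-tier."""
    return TASK_TO_TIER.get(task.strip().lower(), default)


print(pick_tier("Tests"))        # Sonnet
print(pick_tier("scaffolding"))  # Haiku
```

Unknown categories fall through to the default, so a typo costs you a mid-tier model rather than an error.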

The big three model families

Speed key: ⚡⚡⚡ Fast · ⚡⚡ Medium · ⚡ Slow

Anthropic (Claude)

Model	What it’s for	Speed	Cost
Haiku	Fast tasks, scaffolding, CLI	⚡⚡⚡	💰
Sonnet	Everyday coding	⚡⚡	💰💰
Opus	Complex reasoning, design	⚡	💰💰💰
Fable	Frontier reasoning, highest quality	⚡	💰💰💰💰

Start with Sonnet. It's the workhorse. Reach for Opus when Sonnet can't handle the complexity. Fable 5 is the new top tier; at $1/task, it's only worth it for genuinely hard reasoning tasks.

OpenAI (GPT)

Model	What it’s for	Speed	Cost
GPT-5 mini	Quick questions, completions	⚡⚡⚡	💰
GPT-5.4 mini / nano	Budget options, high volume	⚡⚡⚡	💰
GPT-5.2 / 5.3-Codex	Reliable everyday coding	⚡⚡	💰💰💰
GPT-5.4	Versatile, all-round	⚡⚡	💰💰💰
GPT-5.5	Latest, highest performance	⚡	💰💰💰💰

Skip the "o-series" reasoning models (o1, o3, o4) for everyday coding. They think longer and cost more - o1 is particularly expensive at ~$1.35/task. Save them for:

Implementing algorithms (graph traversal, dynamic programming)
Debugging race conditions or complex state machines
Mathematical proofs or formal verification

Google (Gemini)

Model	What it’s for	Speed	Cost
Gemini 2.5 Flash	Fast tasks, high volume	⚡⚡⚡	💰
Gemini 2.5 Pro	Complex reasoning	⚡	💰💰💰
Gemini 3 Flash	Everyday coding	⚡⚡	💰💰
Gemini 3.1 Flash-Lite	Ultra-cheap agentic tasks	⚡⚡⚡	💰
Gemini 3.1 Pro	Heavy lifting	⚡	💰💰💰
Gemini 3.5 Flash	Latest, high performance	⚡⚡	💰💰💰

Benchmarks

Want numbers?

Compare all models - sortable table, filter by Copilot cost
Benchmark details - methodology, sources, caveats

The TLDR:

GPT-5.5 is the new LiveBench #1 (80.7) — edges out GPT-5.4 (80.3) and Gemini 3.1 Pro (79.9)
Anthropic has two new top-tier models: Claude Fable 5 (78.3 LiveBench, $1/task) and Claude Opus 4.8 (77.2 LiveBench, $0.50/task). Fable 5 is a new model tier above Opus
Gemini 3.5 Flash is Google’s new model (75.0 LiveBench) — available in Copilot
Gemini 2.0 Flash is gone — shut down June 1, 2026. Migrate to Gemini 2.5 Flash or 3 Flash
Copilot billing changed June 1, 2026 — moved from “premium request multipliers” to token-based AI credits. Cost per interaction = tokens × model price (same as direct API). See benchmark details for the full breakdown
Claude Opus 4.5 leads SWE-bench at 76.8%; DeepSeek V4 Pro (73.6 LiveBench) and V4 Flash (70% SWE-bench) are the standout budget options
GPT-5 high reasoning dominates Aider at 88% — Gemini 2.5 Pro thinking (83.1%) and Sonnet 4.5 (82.4%) follow

Benchmarks are useful for gut-checking, but the real test is running a model on your own work.
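The token-based billing rule in the TLDR (cost per interaction = tokens × model price) is easy to sanity-check yourself. This is a rough sketch, not Copilot's actual billing code. It assumes input and output tokens are priced separately per million tokens, and the prices in the example are placeholders rather than real rates. `ModelPrice` and `interaction_cost` are hypothetical names.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ModelPrice:
    """USD per one million tokens. Values used below are placeholders, not real prices."""
    input_per_mtok: float
    output_per_mtok: float


def interaction_cost(price: ModelPrice, input_tokens: int, output_tokens: int) -> float:
    """Cost of one interaction: tokens x model price, summed over input and output."""
    total = input_tokens * price.input_per_mtok + output_tokens * price.output_per_mtok
    return total / 1_000_000


# Hypothetical price, for illustration only.
example_price = ModelPrice(input_per_mtok=3.0, output_per_mtok=15.0)
cost = interaction_cost(example_price, input_tokens=20_000, output_tokens=2_000)
print(f"${cost:.2f}")  # $0.09
```

Swap in the real per-token prices for your model and a typical prompt size to see what a day of requests would cost.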

Marketing BS decoder

They say	It means
“Most intelligent”	Bigger, slower, pricier
“Balanced”	Mid-tier - usually right
“Fast” / “efficient”	Smaller, cheaper, simpler
“Reasoning” / “thinking”	Extra thinking time - see below
“Preview” / “experimental”	Unstable - skip it
“200K context”	Can see lots of code - but should it?

Opus is NOT a "thinking" model. It's just big and slow. "Thinking" models (o1, o3, Opus-thinking, Sonnet-thinking) explicitly reason step-by-step before responding - you'll see them labeled with "thinking" or "reasoning" in the model name. Regular Opus/Sonnet/GPT-5 are slower because they're larger, not because they're doing extra reasoning passes.

💡 "Thinking..." in the UI ≠ reasoning model. When your IDE shows "Thinking..." or a spinner, that's just the model processing your request - every model does this. True reasoning models show you their actual chain-of-thought (sometimes in a collapsible section), and are explicitly labeled "thinking" or "reasoning" in the model picker. Don't confuse a slow response with deep reasoning.

When do “reasoning” models actually help?

Reasoning models (o1, o3, “thinking” variants) work through problems step-by-step before responding.

Worth it for:

Implementing complex algorithms (A*, red-black trees, constraint solvers)
Debugging concurrency issues, race conditions, deadlocks
Untangling deeply nested dependency chains
Mathematical proofs or formal logic

Overkill for:

Adding a new API endpoint
Fixing a null pointer exception
Writing unit tests
Refactoring for readability
Most day-to-day feature work

A standard model with a good prompt is faster and cheaper for 90% of coding tasks.
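If you route requests in a script or agent, the two lists above boil down to a simple rule: use a reasoning model only when the task is on the "worth it" list. A minimal sketch, assuming you tag tasks yourself. The label strings and the `choose_model_kind` function are hypothetical.

```python
# Labels mirror the "Worth it for" list; the exact strings are made up for this sketch.
REASONING_WORTHWHILE = {
    "complex-algorithm",
    "concurrency-bug",
    "dependency-untangling",
    "formal-proof",
}


def choose_model_kind(task_labels: set[str]) -> str:
    """Return "reasoning" only if a label is on the worth-it list, else "standard"."""
    return "reasoning" if task_labels & REASONING_WORTHWHILE else "standard"


print(choose_model_kind({"api-endpoint", "unit-tests"}))  # standard
print(choose_model_kind({"concurrency-bug"}))             # reasoning
```

Defaulting to "standard" matches the advice above: everything not explicitly hard stays on the faster, cheaper model.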

What about context window size?

Context window = how much code the model can “see” at once. Bigger sounds better, but:

More context = more noise. The model gets distracted.
More context = slower and pricier. You pay per token.
You rarely need it. Most tasks involve a few files, not hundreds.

Big windows help for: exploring unfamiliar codebases, analysing logs, multi-file refactors. For everyday coding, focused context beats massive context.
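One way to keep context focused is to set a token budget and only include the files that fit. A rough sketch, assuming about 4 characters per token (a crude estimate, not a real tokenizer) and that you pass files in priority order. `estimate_tokens` and `select_context` are hypothetical helpers.

```python
def estimate_tokens(text: str) -> int:
    """Very rough estimate: about 4 characters per token (an assumption, not a tokenizer)."""
    return max(1, len(text) // 4)


def select_context(files: dict[str, str], budget_tokens: int) -> list[str]:
    """Walk files in the given priority order and keep those that fit the budget."""
    chosen: list[str] = []
    used = 0
    for path, source in files.items():
        cost = estimate_tokens(source)
        if used + cost > budget_tokens:
            continue  # this file would overflow the budget, so leave it out
        chosen.append(path)
        used += cost
    return chosen


# Toy example: file contents are stand-in strings of 400, 4000, and 200 characters.
files = {"handler.py": "x" * 400, "generated_client.py": "y" * 4000, "models.py": "z" * 200}
print(select_context(files, budget_tokens=200))  # ['handler.py', 'models.py']
```

The large generated file is skipped and the two small, relevant files go in, which is usually what you want for everyday tasks.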

Ben Hall
