Getting Started: AI Coding Quick Reference
📅 Last updated: June 2026
No-nonsense reference for developers who just want to know which AI model to pick. Bookmark this and stop Googling.
📌 Quick pick - just tell me what to use
| Google (Gemini) | OpenAI (GPT) |
|---|---|
| Gemini 3.1 Pro | GPT-5.2 |
| Gemini 3 Flash | GPT-5.4 mini |
| Gemini 3.1 Pro | GPT-5.5 / 5.4 |
| Gemini 3.1 Pro | GPT-5.4 |
| Gemini 3.1 Pro | GPT-4o |
| Gemini 3.1 Pro | GPT-5.5 / 5.4 |
A note on versions
You’ll see version numbers everywhere: Sonnet 3.5, Sonnet 4, Sonnet 4.5. Gemini 2.5, Gemini 3. GPT-4o, GPT-5.
Don’t overthink it. The tier matters more than the version. “Sonnet” is the mid-tier Claude. “Opus” is the heavyweight Claude. “Flash” is the fast/cheap Gemini. Your IDE usually offers the latest version of each tier - just pick the tier that fits your task.
When this page says “Sonnet”, it means whatever the current Sonnet is. Same for the others.
The big three model families
Speed key: ⚡⚡⚡ Fast · ⚡⚡ Medium · ⚡ Slow
Anthropic (Claude)
| Model | What it’s for | Speed | Cost |
|---|---|---|---|
| Haiku | Fast tasks, scaffolding, CLI | ⚡⚡⚡ | 💰 |
| Sonnet | Everyday coding | ⚡⚡ | 💰💰 |
| Opus | Complex reasoning, design | ⚡ | 💰💰💰 |
| Fable | Frontier reasoning, highest quality | ⚡ | 💰💰💰💰 |
OpenAI (GPT)
| Model | What it’s for | Speed | Cost |
|---|---|---|---|
| GPT-5 mini | Quick questions, completions | ⚡⚡⚡ | 💰 |
| GPT-5.4 mini / nano | Budget options, high volume | ⚡⚡⚡ | 💰 |
| GPT-5.2 / 5.3-Codex | Reliable everyday coding | ⚡⚡ | 💰💰💰 |
| GPT-5.4 | Versatile, all-round | ⚡⚡ | 💰💰💰 |
| GPT-5.5 | Latest, highest performance | ⚡ | 💰💰💰💰 |
- Implementing algorithms (graph traversal, dynamic programming)
- Debugging race conditions or complex state machines
- Mathematical proofs or formal verification
Google (Gemini)
| Model | What it’s for | Speed | Cost |
|---|---|---|---|
| Gemini 2.5 Flash | Fast tasks, high volume | ⚡⚡⚡ | 💰 |
| Gemini 2.5 Pro | Complex reasoning | ⚡ | 💰💰💰 |
| Gemini 3 Flash | Everyday coding | ⚡⚡ | 💰💰 |
| Gemini 3.1 Flash-Lite | Ultra-cheap agentic tasks | ⚡⚡⚡ | 💰 |
| Gemini 3.1 Pro | Heavy lifting | ⚡ | 💰💰💰 |
| Gemini 3.5 Flash | Latest, high performance | ⚡⚡ | 💰💰💰 |
Benchmarks
Want numbers?
- Compare all models - sortable table, filter by Copilot cost
- Benchmark details - methodology, sources, caveats
The TLDR:
- GPT-5.5 is the new LiveBench #1 (80.7) — edges out GPT-5.4 (80.3) and Gemini 3.1 Pro (79.9)
- Anthropic has two new top-tier models: Claude Fable 5 (78.3, $1/task) and Claude Opus 4.8 (77.2, $0.50/task) — Fable 5 is a new model tier above Opus
- Gemini 3.5 Flash is Google’s new model (75.0 LiveBench) — available in Copilot
- Gemini 2.0 Flash is gone — shut down June 1, 2026. Migrate to Gemini 2.5 Flash or 3 Flash
- Copilot billing changed June 1, 2026 — moved from “premium request multipliers” to token-based AI credits. Cost per interaction = tokens × model price (same as direct API). See benchmark details for the full breakdown
- Claude Opus 4.5 leads SWE-bench at 76.8%; DeepSeek V4 Pro (73.6 on LiveBench) and DeepSeek V4 Flash (70% on SWE-bench) are the standout budget options
- GPT-5 high reasoning dominates Aider at 88% — Gemini 2.5 Pro thinking (83.1%) and Sonnet 4.5 (82.4%) follow
Benchmarks are useful for gut-checking, but the real test is running a model on your own work.
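To make the "tokens × model price" billing rule from the TLDR concrete, here is a minimal sketch of the arithmetic. The prices are made-up placeholders, not real rates, and the split into separate input and output prices is an assumption based on how token-priced APIs are commonly billed; `ModelPrice` and `interaction_cost` are hypothetical names.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ModelPrice:
    """Hypothetical per-million-token prices in USD (not real rates)."""
    input_per_million: float
    output_per_million: float


def interaction_cost(input_tokens: int, output_tokens: int, price: ModelPrice) -> float:
    """Cost of one interaction: token counts multiplied by the model's token prices."""
    input_cost = input_tokens / 1_000_000 * price.input_per_million
    output_cost = output_tokens / 1_000_000 * price.output_per_million
    return input_cost + output_cost


# Placeholder prices for illustration only; check your provider's price list.
cheap_tier = ModelPrice(input_per_million=0.50, output_per_million=2.00)
top_tier = ModelPrice(input_per_million=5.00, output_per_million=20.00)

# The same request (8,000 tokens in, 1,000 tokens out) priced on both tiers.
for name, price in [("cheap tier", cheap_tier), ("top tier", top_tier)]:
    print(f"{name}: ${interaction_cost(8_000, 1_000, price):.4f}")
```

With these placeholder numbers the same request costs ten times more on the pricier tier, which is why tier choice matters more under token-based billing than it did under flat multipliers.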
Marketing BS decoder
| They say | It means |
|---|---|
| “Most intelligent” | Bigger, slower, pricier |
| “Balanced” | Mid-tier - usually right |
| “Fast” / “efficient” | Smaller, cheaper, simpler |
| “Reasoning” / “thinking” | Extra thinking time - see below |
| “Preview” / “experimental” | Unstable - skip it |
| “200K context” | Can see lots of code - but should it? |
When do “reasoning” models actually help?
Reasoning models (o1, o3, “thinking” variants) work through problems step-by-step before responding.
Worth it for:
- Implementing complex algorithms (A*, red-black trees, constraint solvers)
- Debugging concurrency issues, race conditions, deadlocks
- Untangling deeply nested dependency chains
- Mathematical proofs or formal logic
Overkill for:
- Adding a new API endpoint
- Fixing a null pointer exception
- Writing unit tests
- Refactoring for readability
- Most day-to-day feature work
A standard model with a good prompt is faster and cheaper for 90% of coding tasks.
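If you want that rule of thumb in a script or team convention, a rough sketch could look like the following. The category labels are illustrative shorthand for the two lists above, and `pick_mode` is a hypothetical helper, not part of any tool.

```python
# Task kinds where a reasoning model is usually worth the extra time and cost.
# Labels are illustrative shorthand for the "Worth it for" list above.
REASONING_WORTH_IT = {
    "complex algorithm",
    "concurrency bug",
    "dependency untangling",
    "formal proof",
}


def pick_mode(task_kind: str) -> str:
    """Return 'reasoning' for the hard cases, 'standard' for everything else."""
    return "reasoning" if task_kind in REASONING_WORTH_IT else "standard"


print(pick_mode("concurrency bug"))  # reasoning
print(pick_mode("unit tests"))       # standard
```

Defaulting to "standard" mirrors the advice above: reach for a reasoning model only when the task clearly calls for it.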
What about context window size?
Context window = how much code the model can “see” at once. Bigger sounds better, but:
- More context = more noise. The model gets distracted.
- More context = slower and pricier. You pay per token.
- You rarely need it. Most tasks involve a few files, not hundreds.
Big windows help for: exploring unfamiliar codebases, analysing logs, multi-file refactors. For everyday coding, focused context beats massive context.
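To get a feel for how little of a big window a focused task needs, here is a rough sketch that estimates token usage for a handful of files. It assumes about 4 characters per token, which is only a rule of thumb (real tokenizers vary by model and language). The file contents are stand-ins, and `estimate_tokens` and `context_report` are hypothetical helpers.

```python
CHARS_PER_TOKEN = 4  # Rough rule of thumb; real tokenizers vary by model and language.


def estimate_tokens(text: str) -> int:
    """Very rough token estimate from character count."""
    return len(text) // CHARS_PER_TOKEN


def context_report(files: dict[str, str], window_tokens: int) -> dict[str, object]:
    """Summarise how much of a context window a set of files would use."""
    per_file = {name: estimate_tokens(body) for name, body in files.items()}
    total = sum(per_file.values())
    return {
        "per_file": per_file,
        "total_tokens": total,
        "fits": total <= window_tokens,
        "share_of_window": total / window_tokens,
    }


# Hypothetical file contents standing in for real source files.
focused = {"handler.py": "x" * 4_000, "test_handler.py": "y" * 2_000}
report = context_report(focused, window_tokens=200_000)
print(report["total_tokens"], report["fits"])
```

Under these assumptions, two modest files take up well under 1% of a 200K window, so sending more than the task needs mostly adds cost and noise.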