AI Coding FAQ
Jargon-free explanations of AI concepts for developers.
What are parameters?
When you see “7B model” or “32B model”, the B means billions of parameters.
A parameter is a number the model learned during training. When you give the model input, it multiplies that input by these numbers in various ways to produce output. Each parameter captures a tiny piece of “knowledge” - like how strongly one concept relates to another.
So a 7B model has 7 billion of these numbers stored. When you “load” a model, you’re loading those 7 billion floats into memory - which is why VRAM matters.
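To see why, here's a quick back-of-the-envelope sketch (plain Python, nothing model-specific - the 2-bytes-per-parameter figure assumes FP16 storage, covered under quantization below):

```python
# Back-of-the-envelope VRAM math: parameters x bytes per parameter.
# Assumes FP16 storage (2 bytes per parameter) and ignores extra
# memory for the KV cache and activations, which add more on top.

def fp16_memory_gb(num_params_billions: float) -> float:
    bytes_total = num_params_billions * 1e9 * 2  # 2 bytes per FP16 param
    return bytes_total / 1e9                     # convert to GB

for size in (7, 32, 70):
    print(f"{size}B model: ~{fp16_memory_gb(size):.0f} GB in FP16")
# 7B model: ~14 GB, 32B model: ~64 GB, 70B model: ~140 GB
```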
Parameters and weights mean essentially the same thing in this context.
What do we mean by “patterns”?
When people say a model has learned “patterns”, they mean learned associations from training data. The model has seen millions of examples like:
- `if (user == null)` → usually followed by `throw` or `return`
- `async function` → usually has `await` somewhere inside
- `SELECT * FROM` → followed by a table name
- `try {` → needs a `catch` block
- “authentication” → related to “token”, “JWT”, “password”, “session”
Each parameter helps store these associations. More parameters = can store more complex relationships.
A smaller model (7B) knows the basic patterns well - syntax, common idioms, typical function structures. Where it struggles is holding multiple complex patterns active simultaneously - like “refactor this authentication flow while maintaining backward compatibility with the legacy API and handling token expiration edge cases.”
That requires juggling several concerns at once. Bigger models (32B, 70B+) have more “headroom” to do this.
Analogy: Like a junior dev who knows syntax perfectly but struggles with complex architectural decisions that require holding many concerns in mind at once.
What’s a context window?
The context window is how much text the model can “see” at once - measured in tokens (roughly 0.75 words per token).
- 4K context ≈ ~500 lines of code
- 32K context ≈ ~4,000 lines of code
- 128K context ≈ ~15,000 lines of code
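Those line counts come from simple arithmetic. A minimal sketch, assuming roughly 8 tokens per line of code - an assumed average that varies a lot by language and formatting:

```python
# Rough conversion from context window size to lines of code.
# The 8 tokens-per-line figure is an assumption; real code varies
# with language, line length, and the model's tokenizer.

TOKENS_PER_LINE = 8  # assumed average

for context in (4_000, 32_000, 128_000):
    lines = context // TOKENS_PER_LINE
    print(f"{context // 1000}K context ≈ ~{lines:,} lines of code")
# 4K ≈ ~500 lines, 32K ≈ ~4,000 lines, 128K ≈ ~16,000 lines
# (the ~15,000 figure above just rounds more conservatively)
```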
This is separate from parameters. A 7B model and a 32B model might both have 128K context - they can both “see” the same amount of code, but the 32B model reasons about it more deeply.
Context matters because:
- Longer context = can include more of your codebase
- But models often recall information buried in the middle of a long context less reliably than content near the beginning or end
- More context = slower inference and higher VRAM usage
What’s quantization?
Quantization compresses a model to use less memory, with minor quality loss.
Models are normally stored in FP16 (16-bit floating point) - each parameter uses 2 bytes. A 32B model needs ~64GB in FP16.
Quantized versions (Q4, Q5, Q8) use fewer bits per parameter:
- Q4 = 4 bits per parameter → ~25% of FP16 size, ~2-5% quality drop
- Q5 = 5 bits per parameter → ~31% of FP16 size, ~1-3% quality drop
- Q8 = 8 bits per parameter → ~50% of FP16 size, minimal quality drop
For local deployment, Q4 or Q5 is the sweet spot - significant VRAM savings with barely noticeable quality difference.
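The sizes fall out of simple arithmetic. A minimal sketch (real quantization formats add small per-block overhead, so actual files run slightly larger than these figures):

```python
# Approximate model memory at different quantization levels.
# Ignores the per-block scaling-factor overhead that real formats
# add, so actual files are slightly larger than these numbers.

def quantized_gb(num_params_billions: float, bits: int) -> float:
    return num_params_billions * 1e9 * bits / 8 / 1e9

params_b = 32  # the 32B model from above
for name, bits in [("FP16", 16), ("Q8", 8), ("Q5", 5), ("Q4", 4)]:
    gb = quantized_gb(params_b, bits)
    pct = bits / 16 * 100
    print(f"{name}: {bits} bits -> ~{gb:.0f} GB ({pct:.0f}% of FP16)")
# FP16: ~64 GB, Q8: ~32 GB (50%), Q5: ~20 GB (31%), Q4: ~16 GB (25%)
```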
What’s the difference between inference and training?
Training = teaching the model (adjusting all those billions of parameters based on examples). Requires massive compute - weeks on thousands of GPUs.
Inference = using the model to generate responses. The model’s parameters are fixed, you’re just running input through them to get predictions.
When you use ChatGPT, Claude, or run a local model, you’re doing inference - not training or modifying the model.
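A toy PyTorch sketch of the difference - a stand-in linear layer, not a real LLM, but the mechanics are the same:

```python
# Toy illustration of training vs. inference in PyTorch.
import torch
import torch.nn as nn

model = nn.Linear(4, 2)  # stand-in "model"; real LLMs have billions of params
opt = torch.optim.SGD(model.parameters(), lr=0.01)

# Training: run input through, compare to a target, adjust parameters.
x, target = torch.randn(1, 4), torch.randn(1, 2)
loss = nn.functional.mse_loss(model(x), target)
loss.backward()   # compute a gradient for every parameter
opt.step()        # nudge the parameters - this is the "learning" step

# Inference: parameters stay fixed, you just compute outputs.
model.eval()
with torch.no_grad():       # no gradients, no updates
    prediction = model(x)   # pure forward pass
```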
What’s a “frontier” model?
Industry term for the most capable models available - currently GPT-4, Claude Sonnet/Opus, Gemini Ultra. These are:
- Massive (hundreds of billions to trillions of parameters)
- Trained on enormous datasets with extensive fine-tuning
- Only available via API (cloud)
- Expensive to run
“Local” or “open-source” models are smaller and less capable, but runnable on consumer hardware.
What’s “fine-tuning”?
Taking a pre-trained model and training it further on specific data.
Qwen 2.5 Coder is Qwen 2.5 (general model) fine-tuned on code. It starts with general language understanding, then learns coding patterns specifically.
You can fine-tune models yourself on your own codebase, but it requires significant compute and expertise. For most developers, using pre-fine-tuned coding models is simpler.
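For a feel of what that looks like, here's a hedged sketch using the Hugging Face transformers Trainer. The training file path is a placeholder, and a small Qwen checkpoint is used so the sketch is even feasible on modest hardware; fine-tuning at useful scale usually needs parameter-efficient methods like LoRA, which this omits:

```python
# Minimal causal-LM fine-tuning sketch with Hugging Face transformers.
# "my_code.txt" is a placeholder; a real run needs substantial GPU
# memory, or a parameter-efficient method like LoRA/QLoRA.
from transformers import (AutoModelForCausalLM, AutoTokenizer,
                          DataCollatorForLanguageModeling,
                          Trainer, TrainingArguments)
from datasets import load_dataset

model_name = "Qwen/Qwen2.5-0.5B"  # small model so the sketch is feasible
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(model_name)

dataset = load_dataset("text", data_files={"train": "my_code.txt"})
tokenized = dataset["train"].map(
    lambda batch: tokenizer(batch["text"], truncation=True, max_length=512),
    batched=True, remove_columns=["text"])

trainer = Trainer(
    model=model,
    args=TrainingArguments(output_dir="out", num_train_epochs=1),
    train_dataset=tokenized,
    data_collator=DataCollatorForLanguageModeling(tokenizer, mlm=False),
)
trainer.train()  # adjusts the pre-trained weights on your data
```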
What’s a “system prompt”?
Instructions given to the model before your actual question. Sets the model’s persona, constraints, and behavior.
Example: “You are a senior Python developer. Write clean, well-documented code. Always include error handling.”
System prompts are part of your context window - longer system prompts leave less room for your actual code.
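In API terms, the system prompt is typically just the first message in the conversation. A minimal sketch in the OpenAI-style chat format that most hosted APIs and local servers accept:

```python
# System prompt as the first message in a chat-format request.
# This "messages" structure is the OpenAI-style format most hosted
# APIs and OpenAI-compatible local servers accept; no particular
# model or client library is assumed here.
messages = [
    {"role": "system",
     "content": ("You are a senior Python developer. Write clean, "
                 "well-documented code. Always include error handling.")},
    {"role": "user",
     "content": "Write a function that parses a CSV file."},
]
# Every token in the system message counts against the context
# window, exactly like the user's code and question do.
```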