Research: The Future of Software Development Teams with AI

Compiled: February 2026
Purpose: Complete research synthesis from top commentators and evidence-based sources on how AI is reshaping software teams, workflows, and engineering practice.


Sources Reviewed

Source | Type | Date
Thoughtworks Future of Software Engineering Retreat | Multi-day retreat synthesis (Chatham House Rule) | Feb 2026
Dan Shapiro | Five Levels of AI Coding (blog) | Jan 2026
StrongDM “Software Factory” | Simon Willison write-up of production deployment | Jan 2026
Mitchell Hashimoto | AI Adoption Journey - 6 steps (blog) | Jan 2026
Addy Osmani | “Agents Need a Manager” / Agentic Engineering (blog) | Jan 2026
HBR / Berkeley Haas | “AI Doesn’t Reduce Work — It Intensifies It” (study, 200 employees) | Late 2025
GitClear | AI Code Quality Research (211M lines analysed) | 2024–2025
DORA | AI Capabilities Model, 2025 Accelerate State of DevOps Report | 2025
Tom Dale (Ember.js) | Mental health impact commentary | 2025

Theme 1: The “Junior Dev” Framing is Obsolete

Almost every early take on AI coding settled on the same metaphor: treat it like a junior developer. That framing now holds teams back. It undersells what AI agents can do (parallel execution, zero onboarding, instant duplication) and ignores genuinely new risks (epistemic debt, drift, decision bottlenecks).

Old framing vs New framing

Old Framing | New Framing
AI is a junior dev | AI is an entire class of agent worker with different physics
Review its code | Invest in specs, tests, and constraints so code review becomes secondary
Pair with it | Orchestrate parallel streams and calibrate trust per task
It makes mistakes | Non-determinism requires verification infrastructure, not just eyeballs
Use it for boilerplate | Delegate entire work packages with acceptance criteria
Measure lines of code | Measure coherence, comprehension, and system stability

Theme 2: Where Does the Rigor Go? (Thoughtworks Retreat)

This was the single most important question from the retreat. If AI takes over code production, engineering discipline doesn’t disappear; it migrates. The retreat identified five destinations:

1. Upstream to specification review

2. Into test suites as first-class artifacts

“I’ve gotten better results from TDD and agent coding than I’ve ever gotten anywhere else, because it stops a particular mental error where the agent writes a test that verifies the broken behaviour.”

3. Into type systems and constraints

4. Into risk mapping

5. Into continuous comprehension

“Paired programming solves all of this. If it’s important to understand the system, then do it all the time.”
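
Destinations 2 and 3 can be illustrated with a small sketch. This is not code from the retreat; the names `Percentage`, `apply_discount`, and both tests are hypothetical. The idea is that the tests are written from the specification before any implementation exists, so an agent cannot write a test that simply confirms whatever its code happens to do, and the range constraint lives in the type rather than in reviewer attention.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Percentage:
    """Hypothetical constrained type: only values in 0..100 can be constructed."""
    value: float

    def __post_init__(self) -> None:
        if not 0 <= self.value <= 100:
            raise ValueError(f"percentage out of range: {self.value}")


def apply_discount(price_cents: int, discount: Percentage) -> int:
    """The part an agent would be asked to write; it must satisfy the tests below."""
    return round(price_cents * (100 - discount.value) / 100)


def test_discount_spec() -> None:
    # Written from the specification, before the implementation exists.
    assert apply_discount(10_000, Percentage(25)) == 7_500
    assert apply_discount(10_000, Percentage(0)) == 10_000
    assert apply_discount(10_000, Percentage(100)) == 0


def test_invalid_discount_rejected() -> None:
    # The type, not the reviewer, rejects an impossible discount.
    try:
        Percentage(150)
    except ValueError:
        return
    raise AssertionError("out-of-range percentage was accepted")
```

In this arrangement the human reviews the two tests and the type; the body of `apply_discount` is the part that can be regenerated freely.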


Theme 3: The Middle Loop — A New Category of Work (Thoughtworks Retreat)

Nobody in the industry has named this yet.

Software development has two recognised loops:

The retreat identified a third: the middle loop — supervisory engineering work sitting between them.

What middle loop work involves:

Who excels at it:

Career identity crisis:

PM convergence:


Theme 4: Maturity Models and Adoption Journeys

Dan Shapiro’s Five Levels of AI Coding

Level | Name | Description
1 | Spicy Autocomplete | Tab-complete on steroids
2 | Chat Pair Programmer | Conversational back-and-forth
3 | The Trap | Agents write lots of code, humans lose comprehension: “the uncanny valley of AI coding”
4 | AI-Native Development | Rearchitected workflows where AI writes and humans specify/verify
5 | Dark Factory | Full autonomous operation (theoretical)

Key insight: Level 3 is where most teams stall. Productivity numbers look great but system understanding erodes. Teams that don’t deliberately move to Level 4 practices accumulate epistemic debt.

Mitchell Hashimoto’s 6-Step Adoption Journey

  1. Drop the chatbot. Stop using ChatGPT in a browser tab. Use AI inside the IDE.
  2. Reproduce your existing work. Use AI to redo tasks you already know how to do. You can verify quality because you know the answer.
  3. End-of-day agents. Queue up agent tasks at end of day. Review results next morning. Builds trust calibration.
  4. Outsource slam dunks. Give agents the straightforward, well-defined work. Free human time for hard problems.
  5. Engineer the harness. Build AGENTS.md, custom rules, project context files. The harness is more valuable than any single prompt.
  6. Always have an agent running. Continuous background agent work on lower-priority tasks. You review and redirect.

Key insight: “Invest in the harness, not the prompts.”
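
One possible reading of "harness" is a set of deterministic checks that any agent output must pass, regardless of the prompt that produced it. The sketch below is an assumption about what such a gate could look like, not Hashimoto's implementation; `CheckResult`, `run_checks`, `accept`, and `EXAMPLE_CHECKS` are hypothetical names, and the two example commands are placeholders for a project's real test, lint, and type-check commands.

```python
import subprocess
import sys
from dataclasses import dataclass


@dataclass
class CheckResult:
    name: str
    passed: bool
    output: str


def run_checks(checks: dict[str, list[str]]) -> list[CheckResult]:
    """Run each named command and record whether it exited with status 0."""
    results = []
    for name, command in checks.items():
        proc = subprocess.run(command, capture_output=True, text=True)
        results.append(
            CheckResult(name, proc.returncode == 0, proc.stdout + proc.stderr)
        )
    return results


def accept(results: list[CheckResult]) -> bool:
    """Accept the agent's changes only if every check passed."""
    return all(r.passed for r in results)


# Hypothetical placeholder checks; a real project would list its own commands.
EXAMPLE_CHECKS = {
    "smoke": [sys.executable, "-c", "print('ok')"],
    "always_fails": [sys.executable, "-c", "raise SystemExit(1)"],
}
```

Calling `accept(run_checks(EXAMPLE_CHECKS))` returns `False` here because the second placeholder check exits non-zero. The design point is that the gate is the same for every task, which is what makes it reusable in a way a single prompt is not.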

Addy Osmani’s “Agents Need a Manager”


Theme 5: The Evidence — What’s Actually Happening

HBR / Berkeley Haas Study (200 employees, real workplace)

GitClear Research (211M lines of code analysed)

Tom Dale (Ember.js creator) on Mental Health

DORA AI Capabilities Model (2025 Report)


Theme 6: Agent Topologies and Enterprise Architecture (Thoughtworks Retreat)

Conway’s Law didn’t retire. It got more complicated.

Speed mismatch

Agent drift

Decision fatigue as new bottleneck

The StrongDM “Software Factory” (via Simon Willison)


Theme 7: Self-Healing Systems (Thoughtworks Retreat)

Prerequisites that don’t exist yet:

The latent knowledge problem

Incident commander problem

Agent coordination risks


Theme 8: Security, Governance, and Agile (Thoughtworks Retreat)

Security is dangerously behind

Agile is evolving, not dying

Batch size regression


Theme 9: The Human Side — Roles, Skills, Experience (Thoughtworks Retreat)

Productivity/experience paradox

Staff engineers under pressure

Juniors are more valuable, not less

Mid-levels are the real concern

University of Waterloo co-op model highlighted


Theme 10: Agent Swarms (Thoughtworks Retreat)

First barrier is mental, not technical

Collective convergence > individual accuracy

“Patrol workers on loops” — the more common pattern


Theme 11: Technical Foundations (Thoughtworks Retreat)

Programming languages for agents

Semantic layers and knowledge graphs

The agentic operating system


Key Open Questions (Retreat)

On work and identity

On organisational design

On trust and verification

On speed and stability


Synthesis: What to Act On Now

  1. Invest in the harness, not the prompts. AGENTS.md, test infrastructure, scenario holdouts, code quality metrics, WIP limits.
  2. Rigor migrates — track where yours is going. Specs, tests, constraints, risk mapping, comprehension practices.
  3. Name the middle loop. Recognise supervisory engineering as real work. Update career ladders.
  4. Watch batch size. AI makes large changesets easy — this is a stability regression. Keep batches small.
  5. TDD is the strongest form of prompt engineering. Tests before code is the single highest-leverage practice for AI-assisted development.
  6. Staff engineers are your leverage. Reposition them as friction killers, not just architects.
  7. Mid-levels need a plan. The retraining problem is real and unsolved. Don’t ignore it.
  8. Security can’t wait. Agent access = full access. Platform engineering must drive secure defaults.
  9. Measure comprehension, not just output. Epistemic debt is invisible until it’s catastrophic.
  10. Start now, start small. Hashimoto’s step 1: stop using a browser chatbot. Move AI into the IDE.
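
Item 4 (watch batch size) lends itself to an automated gate. The following is a minimal sketch, assuming the team feeds it the output of `git diff --numstat`, which prints added lines, deleted lines, and path separated by tabs, with `-` in both count columns for binary files. The 400-line limit and the function names are hypothetical; the source does not recommend a specific threshold.

```python
MAX_CHANGED_LINES = 400  # hypothetical team limit, not a figure from the research


def changed_lines(numstat: str) -> int:
    """Sum added and deleted lines from `git diff --numstat` output."""
    total = 0
    for line in numstat.splitlines():
        parts = line.split("\t")
        if len(parts) < 3:
            continue  # skip blank or malformed lines
        added, deleted = parts[0], parts[1]
        if added == "-" or deleted == "-":
            continue  # binary files report "-" instead of line counts
        total += int(added) + int(deleted)
    return total


def within_batch_limit(numstat: str, limit: int = MAX_CHANGED_LINES) -> bool:
    """Return True when the changeset is small enough to review as one batch."""
    return changed_lines(numstat) <= limit


# Illustrative input only; the paths are made up.
SAMPLE = "10\t2\tsrc/app.py\n-\t-\tassets/logo.png\n300\t120\tsrc/generated.py\n"
```

For `SAMPLE`, `changed_lines` counts 432 lines, so `within_batch_limit(SAMPLE)` is `False` under the assumed limit. A check like this could run in CI or as part of the agent harness to flag oversized changesets before review.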