AI for Engineers: Navigating the Developer Evolution
March 2026 · 10 min read · Written by @claude & @adityamohta
I recently ran a consulting session with the engineering team at a friend's company, Ablespace, on adopting agentic AI workflows. The conversation kept circling back to the same concerns: "How do we go agentic? We can't just trust AI with our entire codebase." This post is a distillation of that session: part framework, part practical playbook.
Note: This space is moving incredibly fast. Some of the tools and workflows I mention here may be outdated within weeks. Focus on the principles, not the specific tools.
On November 24, 2025, Anthropic released Opus 4.5. Before that, AI could write maybe 90% of your code: impressive, but that last 10% still needed a human at the keyboard. Opus 4.5 felt like a turning point, the first time it seemed like AI could realistically handle 100% of the code.
A month later, Boris Cherny, the creator of Claude Code, said something that stuck with me:
"In the last thirty days, 100% of my contributions to Claude Code were written by Claude Code."
At the time, I couldn't wrap my head around how that was possible. It was also the holiday season, so I didn't give it much thought. But when I came back to work and gave Claude Code another try, something had clearly shifted. It captured my intent so well that it genuinely caught me off guard; it felt like a big leap from what I'd been used to.
How the Industry Is Reacting
From conversations with friends who work at companies of various sizes, the split is pretty clear.
Large enterprises are cautious. Security and compliance reviews slow everything down, and honestly, that's important. AI performs best when it has more freedom and power, but with that power comes real security risk. Giving an agent broad access to your codebase, CI pipelines, and production tooling is a genuine attack surface. These reviews exist for a reason.
Startups are moving fast. CLI-first workflows with tools like Claude Code are becoming the default. The speed of adoption is significantly higher because there are fewer gates to pass through.
Neither approach is wrong; they're just operating under different constraints. But the direction is the same.
The 8 Stages of Developer Evolution
Steve Yegge wrote a great post called Welcome to GAS TOWN that lays out eight stages describing how developers evolve their relationship with AI agents, split into two phases. It resonated with my own experience, so I'm borrowing his framing here.
Stages 1–4: The IDE Era
In the first phase, everything happens inside your editor:
| Stage | Name | What it looks like |
|---|---|---|
| 1 | Near-Zero AI | Tab completions, occasional chat questions |
| 2 | Agent + Permissions | A sidebar agent that asks permission before running tools |
| 3 | YOLO Mode On | Trust builds, you stop clicking "approve" on every action |
| 4 | Wide Agent in IDE | The agent fills your screen; the editor is just for reviewing diffs |
Stages 5–8: The CLI & Multi-Agent Era
The second phase moves beyond the IDE entirely:
| Stage | Name | What it looks like |
|---|---|---|
| 5 | CLI, Single Agent | YOLO mode in the terminal. Diffs scroll by. IDE-less development. |
| 6 | Multi-Agent YOLO | 3–5 parallel agent instances. You become very fast. |
| 7 | 10+ Agents | Pushing the limits of what you can hand-manage |
| 8 | Orchestrator | You build your own coordination system |
```mermaid
graph TD
    S1[Stage 1: Near-Zero AI] --> S2[Stage 2: Agent + Permissions]
    S2 --> S3[Stage 3: YOLO Mode On]
    S3 --> S4[Stage 4: Wide Agent in IDE]
    S4 -.->|Hardest Jump| S5[Stage 5: CLI Single Agent]
    S5 --> S6[Stage 6: Multi-Agent YOLO]
    S6 --> S7[Stage 7: 10+ Agents]
    S7 --> S8[Stage 8: Orchestrator]
    style S1 fill:#161b22,stroke:#30363d,color:#c9d1d9
    style S2 fill:#161b22,stroke:#30363d,color:#c9d1d9
    style S3 fill:#161b22,stroke:#30363d,color:#c9d1d9
    style S4 fill:#161b22,stroke:#da3633,color:#c9d1d9
    style S5 fill:#161b22,stroke:#238636,color:#c9d1d9
    style S6 fill:#161b22,stroke:#238636,color:#c9d1d9
    style S7 fill:#161b22,stroke:#a371f7,color:#c9d1d9
    style S8 fill:#161b22,stroke:#a371f7,color:#c9d1d9
```

The dotted line between Stage 4 and Stage 5 is worth paying attention to.
The Hardest Jump: Stage 4 to Stage 5
I used to toggle between Stages 4 and 5. For a while I was comfortable with AI in the IDE: the agent fills the screen, you review diffs, life is good.
The natural question I kept asking was: "Why would I leave the IDE?" It's familiar, visual, and it works.
What convinced me: CLI tooling scales beyond the editor. A CLI agent works in GitHub CI/CD, in automated testing pipelines, in staging environments, anywhere a terminal can run. An IDE plugin can't follow you there. For the kind of work I do, it opened up a lot.
That said, the jump from Stage 4 to 5 was genuinely uncomfortable for me. I was trusting an agent to operate across my entire project without a visual safety net. It took me a while to get used to it. But once I got past that initial discomfort, moving to Stage 6 felt more incremental.
I'll be honest: I haven't cracked Stages 7 and 8 yet. And I think the jump from Stage 6 to 7 is its own big challenge. Managing 3–5 agents is one thing; coordinating 10+ is a fundamentally different problem that pushes the limits of what you can hand-manage effectively.
The rest of this post covers what helped me make that jump.
CLAUDE.md Files
Most people using Claude Code are already familiar with CLAUDE.md: markdown files that Claude reads at the start of every session to pick up project context.
| File | Purpose |
|---|---|
| `~/.claude/CLAUDE.md` | Global preferences: your personal style, language, tools. Applies to all projects. |
| Project's `.claude/CLAUDE.md` | Team conventions: build commands, coding standards, architecture decisions. Shared via Git. |
What I found useful is how this compounds over time. You correct the agent once, it updates CLAUDE.md, and that knowledge persists across every future session. Check the file into Git, and your whole team picks it up too.
Just like humans accumulate institutional knowledge over months and years, CLAUDE.md files accumulate project knowledge, except it's explicit, versioned, and available to every new team member (human or AI) from day one.
If you haven't set one up yet, `/init` generates a starting point based on your codebase.
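To make this concrete, here's a sketch of what a project-level CLAUDE.md might contain. Every command, path, and rule below is a hypothetical placeholder; yours will look different:

```markdown
# Project conventions

## Commands
- Setup: `yarn install`
- Test: `yarn test` (run before every commit)
- Lint: `yarn lint --fix`

## Standards
- TypeScript strict mode; no `any` without a comment explaining why
- Every new endpoint needs an integration test in `tests/api/`

## Architecture notes
- All DB access goes through `src/db/queries.ts`; never inline SQL in handlers
```

The value isn't any individual rule; it's that a correction made once stays corrected for every future session.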
Lesson #1: Treat AI as a Co-worker
This was probably the biggest shift for me personally.
When I first started using AI agents, I treated them like a code generator: type a prompt, get some code, paste it in, manually fix whatever's wrong. That works fine at Stage 2. But at Stage 5, I found it didn't hold up.
What worked better was treating the agent more like a junior co-worker. Instead of making changes behind its back or dictating every line, I started guiding it toward solutions, clarifying misunderstandings about the project, and reviewing its approach, not just its output.
One thing I learned the hard way: when you manually edit a file without going through the agent, it loses that context. And you miss a chance for a CLAUDE.md update that would have helped the whole team. Every clarification you give the agent is an opportunity for that knowledge to be captured permanently.
Lesson #2: You Own the Change
This is something I keep reminding myself of.
When you're leading changes with AI, you still own those changes. AI is a tool, not an excuse. If the output breaks production, that's on me, not the model. Whether I wrote the code or the agent did, every PR that merges has my name on it.
What helped me: building trust gradually. I started with low-risk changes (test suite refactors, documentation, small bug fixes). Once I was confident the workflow held up, I moved to more critical paths. Trying to jump straight to rewriting core systems on day one is a recipe for a bad time; I've seen it, and I've been tempted myself.
Lesson #3: Give Your AI Good Tools
This one might be obvious in hindsight, but I underestimated it at first.
The agent's output improved noticeably once I gave it a proper feedback loop. It's the same way we work: write code, run tests, see failures, fix them, repeat. The agent benefits from the same cycle.
```mermaid
graph LR
    R[Red: Test Fails] --> F[Fix: Agent Patches Code]
    F --> G[Green: Tests Pass]
    G --> C[Commit]
    C -.->|Next change| R
    style R fill:#161b22,stroke:#da3633,color:#c9d1d9
    style F fill:#161b22,stroke:#58a6ff,color:#c9d1d9
    style G fill:#161b22,stroke:#238636,color:#c9d1d9
    style C fill:#161b22,stroke:#a371f7,color:#c9d1d9
```

Here's a practical checklist. Anything you have access to, the agent should have too, via the CLI:
| Command | Purpose |
|---|---|
| `setup` | Set up your project from scratch |
| `dev` | Run your project locally |
| `test` | Run your test suite |
| `logs` | Check server / Sentry logs for issues |
| `pre-commit` | Git hooks to block untested commits |
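One way to expose these uniformly is through package scripts. A sketch, assuming a Node project; every command on the right is a placeholder for whatever your project actually uses:

```json
{
  "scripts": {
    "setup": "npm install && node scripts/seed-dev-db.js",
    "dev": "node server.js",
    "test": "node --test",
    "logs": "tail -n 100 logs/app.log",
    "pre-commit": "npm run lint && npm test"
  }
}
```

The shape matters more than the tool: one stable name per workflow, so an agent never has to guess how this particular repo runs its tests.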
A simple example that surprised me: error messages. When the agent hits `Error: Module not found`, it might not know which package manager you're using, whether you have a custom install command, or if there's a setup script it should run first. So it starts guessing, trying `npm install`, then `yarn`, then something else, wasting tokens along the way. But if your tooling says `Setup your env first: yarn install`, the agent self-corrects immediately.
I've started thinking of error messages as instructions for a co-worker, not just debug info for myself. Writing clear, actionable error messages massively improves the quality of the agent's feedback loop.
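Here's a minimal sketch of that idea: a guard you might put in front of a dev command. The `node_modules` check and the yarn message are assumptions; substitute whatever your project's real setup step is:

```shell
# check_env: emit an actionable, agent-friendly error instead of a raw failure.
# Assumes a Node project where "yarn install" is the setup step (hypothetical).
check_env() {
  if [ ! -d node_modules ]; then
    echo "Setup your env first: yarn install" >&2
    return 1
  fi
}

# Simulate a fresh checkout with an empty temp dir: instead of a cryptic
# module-resolution error, the agent is told exactly which command to run next.
cd "$(mktemp -d)"
check_env || true
```

The specific check is beside the point; what matters is that every failure path names the exact recovery command.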
Lesson #4: Set Up Good Permissions
One thing that made a big difference for longer sessions was setting up a proper `settings.json` file with detailed allow/deny permissions tailored to the project.
By default, the agent pauses to ask permission for most shell commands. That's fine when you're getting started, but it breaks your flow quickly, especially if you're trying to let the agent run uninterrupted while you focus on something else.
The idea is simple: pre-approve commands you know are safe, and explicitly deny access to things that are sensitive. For example:
| Permission | Why |
|---|---|
| Allow `gh pr view`, `gh pr list` | Safe read-only commands, useful for the agent to search through past changes and context |
| Allow reading project files and directories | The agent needs to explore the codebase to do its job |
| Allow `npm test`, `npm run lint` | Running tests and linters is the core feedback loop |
| Deny reading `.env`, credentials, or secret files | The agent doesn't need access to secrets, and denying this instantly without a prompt is safer |
| Deny `rm -rf`, `git push --force` | Destructive commands should require explicit human approval |
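As a sketch, the table above might translate into a `.claude/settings.json` along these lines. Treat the specific rule strings as assumptions and check the current Claude Code docs for the exact syntax, since it may have changed:

```json
{
  "permissions": {
    "allow": [
      "Bash(gh pr view:*)",
      "Bash(gh pr list:*)",
      "Bash(npm test)",
      "Bash(npm run lint)"
    ],
    "deny": [
      "Read(.env)",
      "Read(.env.*)",
      "Bash(rm -rf:*)",
      "Bash(git push --force:*)"
    ]
  }
}
```

Deny rules win over allow rules, so secrets stay off-limits even during long unattended runs.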
A well-tuned permission setup means the agent can run for longer stretches without interrupting you for approvals on routine operations, while still being blocked from anything risky. It's the difference between babysitting the agent and actually delegating to it.
Preparing for Multi-Agent: Worktrees / Dev Containers
Once you're comfortable at Stage 5, scaling to multiple agents is a process problem. The core principle is simple: give each agent an isolated environment where it can work without stepping on other agents' files.
How you do that depends on your setup. There are multiple approaches:
- Git worktrees: lightweight, and they work well for many startups. Each agent gets its own worktree of the same repo.
- Local dev containers: more isolation, useful if your project has complex dependencies or environment-specific tooling.
- Cloud dev environments: some larger companies spin up remote containers per agent, especially when security or resource constraints make local execution impractical.
The analogy is the same regardless of approach: humans work on different laptops because they need isolated environments. Agents need the same thing.
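For the worktree route, the mechanics are only a few commands. A minimal sketch using a throwaway repo; the branch names are made up:

```shell
# Create a throwaway repo, then give each "agent" its own worktree + branch.
base=$(mktemp -d)
git init -q "$base/main"
cd "$base/main"
git -c user.name=demo -c user.email=demo@example.com \
    commit -q --allow-empty -m "init"

# One isolated working directory per agent, each on its own branch:
git worktree add -b agent-1-task "$base/agent-1"
git worktree add -b agent-2-task "$base/agent-2"

# Each directory can now be handed to a separate agent process;
# they share history but never step on each other's working files.
git worktree list
```

When an agent's branch merges, `git worktree remove` cleans up its directory.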
```mermaid
graph TD
    M[Main Repo] --> W1[Agent 1 Worktree]
    M --> W2[Agent 2 Worktree]
    M --> W3[Agent 3 Worktree]
    W1 -.->|PR| M
    W2 -.->|PR| M
    W3 -.->|PR| M
    style M fill:#161b22,stroke:#58a6ff,color:#c9d1d9
    style W1 fill:#161b22,stroke:#238636,color:#c9d1d9
    style W2 fill:#161b22,stroke:#238636,color:#c9d1d9
    style W3 fill:#161b22,stroke:#238636,color:#c9d1d9
```

If you're running a monorepo, this gets even better. Shared CLAUDE.md conventions apply across all services. A single worktree setup covers the entire codebase. Agents can see the full dependency graph and make cross-service changes without you having to coordinate.
```mermaid
graph LR
    S5[Stage 5: CLI Ready] --> S6[Stage 6: Multi-Agent]
    S6 --> S7[Stage 7: 10+ Agents]
    S7 --> S8[Stage 8: Orchestrator]
    style S5 fill:#161b22,stroke:#238636,color:#c9d1d9
    style S6 fill:#161b22,stroke:#238636,color:#c9d1d9
    style S7 fill:#161b22,stroke:#a371f7,color:#c9d1d9
    style S8 fill:#161b22,stroke:#a371f7,color:#c9d1d9
```

After Stage 5, each step is a process improvement, not a paradigm shift.
Advanced: Building Custom Skills
Once your team is comfortable with CLI agents, something worth exploring is encoding your repeatable workflows as skills: packaged instructions that any team member can invoke with a single command.
Every team has workflows that get repeated: handling on-call alerts, triaging Sentry errors, debugging user-specific data, running deployments. Each of these can become a skill.
| Command | What it does |
|---|---|
| `/on-call` | Agent receives a PagerDuty alert → checks runbooks → identifies root cause → suggests or applies a fix → updates the incident channel |
| `/sentry-fix` | Agent pulls an error from Sentry → traces it to source → writes a fix with a test case → opens a PR |
| `/debug-user` | Agent queries a read-only production DB → investigates user-specific data issues → generates a report |
| `/deploy` | Runs pre-deploy checks, bumps the version, creates release notes, deploys to staging, verifies health checks |
Building a skill is straightforward. Create a directory with a `SKILL.md` file:

```
my-skill/
└── SKILL.md    # the skill's instructions
```
The frontmatter tells Claude what the skill does:

```yaml
---
name: on-call-handler
description: Handle on-call alerts using our runbooks
---
```
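Below the frontmatter, the body is plain markdown instructions. A hypothetical sketch of what an on-call handler's body might contain; the runbook path and channel name are made up:

```markdown
1. Read the alert payload and identify the affected service.
2. Check `docs/runbooks/` for a runbook matching the alert name.
3. Reproduce the issue locally if possible; otherwise inspect recent logs.
4. Propose a fix as a draft PR. Never merge without human approval.
5. Post a summary to the `#incidents` channel.
```

Note the explicit guardrail in step 4: skills are a good place to encode not just what the agent should do, but where it must stop and hand back control.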
A good way to start: take a prompt you find yourself typing every week, and turn it into a skill. Test it on low-stakes work, refine based on output, and check it into your repo. The whole team benefits immediately.
Think of skills as your team's institutional knowledge โ codified, versioned, and executable by AI.
Who to Follow
This space changes weekly. Here's who I follow to stay current:
- Boris Cherny: Creator and Head of Claude Code at Anthropic. Shares practical, grounded workflows with zero hype. @bcherny on X / Threads
- Anthropic Blog: Official model releases, safety research, and product updates. Straight from the source.
- Geoffrey Huntley: Created the "Ralph Wiggum" autonomous loop technique. Pushes agents to their absolute limits with autonomous multi-agent loops and evolutionary software. A bit far-fetched at times, but worth watching for where things are headed. ghuntley.com
- Steve Yegge: Another power user pushing agentic workflows to the edge. His GAS TOWN post on the eight stages of developer evolution is referenced earlier in this post. Writes long, opinionated, and consistently insightful takes on where AI development is heading.
Wrapping Up
The jump from Stage 4 to Stage 5 was the hardest part for me. It's more of a mindset shift than a tooling problem. But once I got past it, the path to multi-agent workflows felt much more incremental.
Here's what I keep coming back to:
- The CLI opens doors: it scales beyond the editor into CI/CD, testing, and automation
- Treat AI as a co-worker: guide, don't micro-manage. Every correction is an investment
- Own your changes: accountability doesn't transfer to the model
- Invest in tooling: `setup`, `test`, `logs`, `pre-commit`. The agent is only as good as its feedback loops
- Prepare for scale: git worktrees and monorepos make multi-agent workflows smoother
Your mileage will vary; every team and codebase is different. But these principles have held up for me so far, and I hope some of them are useful to you too.