I Had 40 Engineers and AI Tools. Here's Why Adoption Still Stalled.
An engineering director's honest story of failed AI adoption: stalled workflows, review bottlenecks, and how to build an AI workflow your dev team will actually use.
On day one, I set up AI tooling: CodeRabbit for automated code review, Claude Code for development. I was an engineering director scaling a team from 20 to 40 software engineers across five teams, and I genuinely believed rolling out AI tools to my engineering team would change how we shipped code.
Six months later, I had pockets of brilliance sitting right next to PRs rotting in queues. Senior devs were calling each other’s work “AI slop” in Slack threads. And my VP was asking me to show proof that our investment was paying off.
I had proof for one team. Anecdotes for two others. And nothing for the rest.
This is the story of how AI adoption failed on my engineering team, and what I’d do differently now.
I Treated AI Like a Perk
Here’s the thing. I bought seats and hoped culture would do the rest. What I should have done was treat AI adoption like any other major process change - one that needed explicit expectations, training, and a shared playbook.
I had a budget for seats but not for slowing down long enough to decide how we’d actually use them.
Three Ways Our AI Rollout Broke
1. Nobody Agreed on How to Use the Tools
Some of my senior engineers quietly avoided the AI tooling altogether. It “got in the way of their flow.” They’d been writing code for fifteen years and didn’t want a copilot.
Other seniors went deep. They built workflows for debugging production issues from log files. They used Claude to write postmortem reports after incidents. They spent time on YouTube and Reddit learning from other agentic engineers - picking up techniques, testing approaches, sharing what worked in side channels.
Both groups were acting rationally. The problem was mine: I never stopped to answer the basic questions. Where in our development lifecycle should AI be mandatory? Where should it be optional? Where should it be off-limits?
I never carved out that time because I was buried in hiring loops and sprint planning across five teams.
The result: five different AI cultures inside one org. No shared playbook, no reusable prompts, no consistent expectations.
2. The Review Bottleneck
Here’s where things got ugly.
We had AI-assisted code review and generation tooling that could produce PRs - refactors, bug fixes, test additions. The output was often decent. The problem: nobody owned those PRs. We had no process for who would shepherd an AI-generated PR to merge, so they piled up in the queue and rotted.
Meanwhile, the “AI slop” fights started. A few senior devs started pushing back hard against other seniors. The tells were obvious to anyone looking: dead code left in, tests that didn’t make sense, patterns that screamed “a model wrote this and nobody reviewed it.” PRs got longer, not clearer. Reviewers started asking, “Did you write this, or did the model?”
They weren’t wrong to push back. But the conversation turned into finger-pointing instead of process-building.
The fix was clear - a rotation system where dedicated engineers handled support tickets, code reviews, and AI-generated PRs each week. But the org never got there before I moved on.
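For illustration only, here is a minimal sketch of what a weekly rotation like that could look like. We never built it, so everything here is an assumption: the `build_rotation` name, the `per_week` group size, and the idea that the on-duty group owns all three queues for the week.

```python
# Hypothetical duties, taken from the rotation described above.
DUTIES = ("support tickets", "code reviews", "AI-generated PRs")


def build_rotation(engineers, weeks, per_week=2):
    """Return one on-duty group per week, cycling through the roster.

    Assumption: the on-duty group owns every queue in DUTIES that week.
    """
    if not 0 < per_week <= len(engineers):
        raise ValueError("per_week must be between 1 and the roster size")
    schedule = []
    for week in range(weeks):
        # Start where the previous week's group ended, wrapping around.
        start = (week * per_week) % len(engineers)
        on_duty = [
            engineers[(start + i) % len(engineers)] for i in range(per_week)
        ]
        schedule.append(on_duty)
    return schedule


if __name__ == "__main__":
    roster = ["ana", "ben", "cho", "dev", "eli"]  # placeholder names
    for number, group in enumerate(build_rotation(roster, weeks=3), start=1):
        print(f"week {number}: {', '.join(group)} own {', '.join(DUTIES)}")
```

The point of the sketch is ownership, not scheduling cleverness: every week, a named group is responsible for getting AI-generated PRs to merge or closing them.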
The bottleneck didn’t disappear when we added AI; it moved. Code got written faster, but reviewing that code got harder. More diff lines, more generated tests, same number of reviewers. We had a budget for seats but not for the review workflow those seats would require.
3. No Definition of Success
My VP wasn’t pushing for ROI dashboards. He was pushing to “find things that worked” - proof that our AI rollout to developers was actually moving the needle. That’s a reasonable ask, and I didn’t have a systematic answer for him.
Here’s what I didn’t do before rolling out the tools:
- No baseline metrics. I didn’t capture lead time, PR size, or review time before the rollout.
- No target behaviors. I never defined what “good AI usage” looked like for our teams.
- No guardrails. No agreement on when to say “no” to AI output.
Without those basics, I had no way of measuring AI adoption success. All I had were stories. Not evidence.
Six months in, I could tell a good story in a leadership meeting. But I couldn’t point to a dashboard and say “here’s what changed.” That gap was on me.
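A baseline doesn't need a dashboard to start. Below is a minimal sketch of the kind of snapshot I could have captured before the rollout. The `PullRequest` fields and the metric definitions are assumptions for illustration: lead time is measured from PR opened to merged, and review time from PR opened to first review. Your team may define them differently.

```python
from dataclasses import dataclass
from datetime import datetime
from statistics import median


@dataclass
class PullRequest:
    # Hypothetical record shape; map it from whatever your Git host exports.
    opened_at: datetime
    first_review_at: datetime
    merged_at: datetime
    lines_changed: int


def _hours(delta):
    return delta.total_seconds() / 3600


def baseline(prs):
    """Return median lead time, review wait, and PR size for merged PRs."""
    if not prs:
        raise ValueError("need at least one merged PR")
    return {
        "median_lead_time_hours": median(
            [_hours(pr.merged_at - pr.opened_at) for pr in prs]
        ),
        "median_review_wait_hours": median(
            [_hours(pr.first_review_at - pr.opened_at) for pr in prs]
        ),
        "median_pr_size_lines": median([pr.lines_changed for pr in prs]),
    }
```

Run it once on the months before the rollout and again on the months after, and you have a before-and-after comparison to show instead of stories.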
The One Thing That Worked
Despite all of that, one use case actually worked.
We had one massive real-time application: sprawling microservice architecture, test coverage under 15%, sparse documentation, and key institutional knowledge lost to team turnover. We were flying partially blind on the most important app in the business.
I pointed AI at that. Not at feature work. Not at the shiny stuff. At the ugliest, most thankless problem we had: getting documentation written and test coverage up on an app that was too big for the team to fully understand without help.
It worked. Documentation got to a place where engineers could actually onboard to the codebase. Coverage went from under 15% to about 20%. CI was green and stayed green. On an application that large, with a team that small, every percentage point of coverage was a fight. We had SDETs on each team building out automated end-to-end testing alongside the AI-assisted unit test work.
That team loved the AI tools. The codebase was too expansive for humans alone; they needed the help and they knew it.
The lesson is right there in the contrast: AI adoption in engineering teams works when you point it at a real, painful problem the team already feels. Hand out seats and say “go be productive” and you get five teams doing five different things. Point the tools at a specific pain point everyone agrees on and you get buy-in, momentum, and results.
What I Didn’t Get To Build
I was scaling a team from 20 to 40 in six months. Hiring engineers, interviewing, onboarding. I’d just started hiring engineering managers - one had been on the team for a month, another for three days.
Something had to give. And what gave was the thing that would have made the biggest difference: building a shared workflow.
I never got to create the shared repository of skills, commands, and plugins that would’ve given every developer the same starting point. I never got to present findings across all five teams and say “here’s what’s working, here’s what’s not, here’s how we’re going to standardize.” I never got to answer the questions I should have answered before day one.
Those are the exact things that would’ve fixed the failure modes I described above. A shared workflow fixes the “five different AI cultures” problem. A shared repo and playbook, with clear ownership of reviews, fix the review bottleneck. Defining target behaviors and tracking metrics fixes the “no definition of success” problem.
None of it got built.
What I Built After
This might sound like a pitch. It’s just what happened next.
After I left, the frustration stuck with me. I kept thinking about what I would have built with three more months:
- A standardized workflow so we weren’t improvising on every team.
- Clear separation between “who writes the code” and “who reviews it.”
- A way to classify which work should get heavy AI involvement and which shouldn’t.
So I built it. I call it the A(i)-Team, because I love it when a plan comes together.
It’s a multi-agent pipeline - an engineering team AI workflow where different agents handle different jobs. One classifies work items: features get behavioral tests, tasks get smoke tests, bugs get regression tests. One writes code. One reviews everything with fresh context. The agent that writes the code never gets the final say.
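To make the two rules concrete, here is a simplified sketch. It is not the actual A(i)-Team code: the `WorkItemType`, `required_test_level`, and `may_approve` names are hypothetical, and only the type-to-test mapping and the "writer never gets the final say" rule come from the description above.

```python
from enum import Enum


class WorkItemType(Enum):
    FEATURE = "feature"
    TASK = "task"
    BUG = "bug"


# Mapping from the description above: features get behavioral tests,
# tasks get smoke tests, bugs get regression tests.
REQUIRED_TEST_LEVEL = {
    WorkItemType.FEATURE: "behavioral",
    WorkItemType.TASK: "smoke",
    WorkItemType.BUG: "regression",
}


def required_test_level(kind: str) -> str:
    """Look up the test level for a work item type given as a string.

    Raises ValueError for a type the classifier does not know.
    """
    return REQUIRED_TEST_LEVEL[WorkItemType(kind.strip().lower())]


def may_approve(author_agent: str, reviewer_agent: str) -> bool:
    """The agent that wrote the code never gets the final say."""
    return author_agent != reviewer_agent


if __name__ == "__main__":
    print(required_test_level("bug"))        # regression
    print(may_approve("coder", "coder"))     # False
    print(may_approve("coder", "reviewer"))  # True
```

Keeping the mapping in one table means an unknown work item type fails loudly instead of getting an improvised test strategy.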
Each design choice maps directly back to something that broke during that six-month rollout:
- Classification and constraints address “nobody agreed how to use it.” Now there are explicit rules for what gets AI involvement and how much.
- Separate agents for writing vs. reviewing address the review bottleneck and the “AI slop” problem. Code gets reviewed by something that didn’t write it.
- Work item classification with defined test levels addresses “no definition of success.” Every type of work has clear expectations for what “done” looks like.
If you want to see how the pieces fit together under the hood:
- I Told AI to Write Tests First. It Wrote 3,400 of Them. - what happens when your AI instructions are too vague, and how classification fixed it
- MCPs Are Dead - why I ripped out the tool layer everyone was recommending
- Ralph Wiggum and the Art of AI Subagents - how the agent orchestration actually works
If This Sounds Familiar
If you’re an engineering leader dealing with stalled AI adoption - a Copilot line item, a team with mixed usage, and no clear story about whether it’s working - that was me six months ago. The tools are fine. They were never the problem.
The missing piece is the workflow around them: who uses what, when, how, and who reviews it. It’s the boring part. It’s also the part that makes everything else work.
I’m building that process now and working with teams who want to skip the six months of figuring it out the hard way. If you’re staring at half-used seats and no shared playbook, let’s talk.