The standup looked great. Lukas, four months into the job, had submitted more pull requests than the rest of the team combined. Forty-seven in fourteen days. His manager pulled up the sprint dashboard and the bars were taller than anything they'd seen since launching the product.
Then someone asked how many were actually in production.
Lukas's merge rate was 23%. Out of 47 pull requests, 11 made it through CI, passed review, and got merged. The rest were scattered across the PR queue in various states of red. Fourteen had failed lint checks. Nine had failing tests on dependencies Lukas's agent hadn't accounted for. Thirteen had been pushed, ignored, and eventually closed by a stale-bot someone had set up months earlier. The ones still open sat untouched, slowly becoming invisible.
Lukas was vibecoding. Cursor open all day, agent generating code, PRs pushed the moment the output looked reasonable. He wasn't doing anything wrong, exactly. He was doing what the tool is designed to help you do. Write code faster. Ship more often. Move to the next thing.
The problem is what "ship" means when you measure it at the PR level instead of the merge level.
The gap nobody tracks
I co-authored a study covering 24,560 pull requests across 447 GitHub repositories. One of the things we measured was what happens after a PR fails CI, for human-written and AI-generated PRs alike. Does somebody come back and fix it?
For human PRs, 23.4% of failures eventually get repaired and pass on a later run. Someone looks at the error, adjusts the code, pushes again. For AI-generated PRs, that number drops to 9.3%.
Roughly ninety-one percent of failed AI pull requests are abandoned. The agent generates, submits, and moves on. Nobody returns.
This creates a specific kind of accounting problem. If your team tracks productivity by PR count, or by lines of code submitted, or by tickets touched, vibecoding will make your numbers look incredible. The dashboard fills up. The activity charts spike. In a standup where people report what they worked on, the vibecoder always has the longest list.
But the actual output (code that compiled, passed CI, got reviewed, and reached production) is often a fraction of what the numbers suggest.
Where the waste hides
Lukas's 36 unmerged PRs weren't free. Each one triggered a full CI pipeline run. Fifteen minutes per run on their setup. That's 9 hours of compute time on code that would never ship. At GitHub Actions pricing, roughly $4.30 in raw minutes. Not much for one sprint. But Lukas's pace was consistent, and he wasn't the only person on the team experimenting with AI tooling.
Across the full team, they were burning about 120 hours of runner time per month on builds that produced nothing. Around $58 worth of GitHub Actions minutes going to code that nobody would look at again.
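If you want to rerun that arithmetic on your own numbers, here is a minimal Python sketch. The per-minute rate of $0.008 is an assumption: it is the rate the figures above imply, not a quoted price, so substitute whatever your plan actually charges. The name `ci_cost` is made up for illustration.

```python
# Assumed rate in USD per runner-minute. This is what the article's figures
# imply (about $4.30 for 540 minutes); check your own plan before relying on it.
ASSUMED_USD_PER_MINUTE = 0.008


def ci_cost(minutes: float, usd_per_minute: float = ASSUMED_USD_PER_MINUTE) -> float:
    """Cost in USD of a given number of CI runner-minutes."""
    return minutes * usd_per_minute


# One sprint of Lukas's unmerged PRs: 36 runs at 15 minutes each.
lukas_minutes = 36 * 15
lukas_hours = lukas_minutes / 60
lukas_cost = ci_cost(lukas_minutes)

# Whole team: about 120 runner-hours per month on builds that never shipped.
team_cost = ci_cost(120 * 60)

print(f"Lukas: {lukas_hours:.0f} h, ${lukas_cost:.2f}")
print(f"Team:  120 h, ${team_cost:.2f}")
```

Under that assumed rate this comes out to about $4.32 for the sprint and $57.60 for the team month, which matches the rounded figures above.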
The compute cost was real but honestly manageable. What wasn't manageable was the confusion.
New engineers joining the project would open the PR list and find dozens of open branches, most failing, some weeks old. They'd spend an afternoon reading through them trying to understand what was in progress and what was dead. The PR queue had become noise. Actual work in progress was buried under speculative attempts that happened to have the same visual weight in the GitHub UI.
Velocity is a downstream metric
Lukas's manager eventually sat down with him and walked through the numbers. Not to punish him. To recalibrate.
Forty-seven PRs submitted sounds like 4x productivity. Eleven merged is still above the team average, which was around eight per two-week sprint. Lukas was genuinely faster with Cursor. The tool was working. But the multiplier he thought he was getting, the one he felt while using it, was off by a factor of four.
The problem is that velocity feels immediate. You prompt the agent, code appears, you push. The feedback loop is 90 seconds. It feels like shipping. The failure signal comes later, asynchronously, in a CI notification that lands in a Slack channel full of other failures, most of which are also from bot PRs that nobody is going to fix.
After a few weeks in that environment, you stop checking. The channel becomes noise. And once that happens, you also stop noticing when your own PRs fail.
What his team changed
They didn't restrict Cursor. That would have been pointless. What they changed was how they counted.
The sprint dashboard got a new column: merge rate. Not total PRs opened. Not lines submitted. Merged pull requests as a percentage of opened pull requests. Lukas's number was 23%. The senior engineer who wrote everything by hand and submitted four PRs per sprint was at 100%.
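The column itself is one division. A minimal sketch in Python, where the function name `merge_rate` and the zero-PR convention are my own choices, not something taken from the team's dashboard:

```python
def merge_rate(merged: int, opened: int) -> float:
    """Merged PRs as a fraction of opened PRs.

    Returns 0.0 when nothing was opened, so an idle sprint does not
    raise a ZeroDivisionError (a convention chosen for this sketch).
    """
    if opened == 0:
        return 0.0
    return merged / opened


lukas = merge_rate(merged=11, opened=47)
senior = merge_rate(merged=4, opened=4)

print(f"Lukas:  {lukas:.0%}")   # 23%
print(f"Senior: {senior:.0%}")  # 100%
```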
Neither number alone told the whole story. But together they showed something useful: Lukas was generating more attempts, and the senior was generating fewer but more reliable ones. The ideal was somewhere in between. Use the tool, but review before pushing. Run the linter locally. Check if someone else already has a branch touching the same files.
They also added a pre-check gate (lint and type check only, 30 seconds) that blocked the full pipeline on obvious failures. This cut their wasted CI minutes by about 60%. And they set up auto-close for any AI-labeled PR with failing checks and no activity for 48 hours. Cleaned the queue overnight.
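The auto-close rule is simple enough to write down as a predicate. What follows is a hedged sketch of the decision logic only, not the team's actual bot: the `PullRequest` record, the `ai-generated` label name, and the function names are all hypothetical, and fetching PRs or closing them through the GitHub API is left out entirely.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta, timezone

AI_LABEL = "ai-generated"           # hypothetical label name
STALE_AFTER = timedelta(hours=48)   # the 48-hour window from the rule above


@dataclass(frozen=True)
class PullRequest:
    """Minimal, made-up PR record; a real one would come from your Git host."""
    number: int
    labels: frozenset
    checks_passing: bool
    last_activity: datetime


def should_auto_close(pr: PullRequest, now: datetime) -> bool:
    """True for an AI-labeled PR with failing checks and 48h of inactivity."""
    return (
        AI_LABEL in pr.labels
        and not pr.checks_passing
        and now - pr.last_activity >= STALE_AFTER
    )


def select_for_auto_close(prs, now):
    """Return the numbers of the PRs the rule would close."""
    return [pr.number for pr in prs if should_auto_close(pr, now)]


now = datetime(2024, 1, 10, 12, 0, tzinfo=timezone.utc)
queue = [
    PullRequest(101, frozenset({AI_LABEL}), False, now - timedelta(hours=72)),
    PullRequest(102, frozenset({AI_LABEL}), False, now - timedelta(hours=5)),
    PullRequest(103, frozenset(), False, now - timedelta(days=30)),
    PullRequest(104, frozenset({AI_LABEL}), True, now - timedelta(hours=72)),
]
to_close = select_for_auto_close(queue, now)
print(to_close)  # [101]
```

Only PR 101 meets all three conditions. The recently active one, the unlabeled human one, and the passing one are left alone, which is the point: the rule clears dead attempts without touching real work in progress.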
Lukas's merge rate went from 23% to 61% within a month. His PR volume dropped from 47 per sprint to around 20. His actual merged output went from 11 to 12. Almost identical. But the waste around it shrank dramatically, and the team stopped drowning in dead PRs.
The number your standup should include
If your team is adopting AI coding tools, the most important metric is not how many PRs get opened. It is how many get merged. And more specifically, it is the ratio between the two.
A vibecoder with a 25% merge rate is generating three units of waste for every unit of shipped code. A vibecoder with an 80% merge rate, after lint gates and config files and local checks, is genuinely outproducing their pre-AI self.
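The "three units of waste" figure follows directly from the merge rate: for a rate r, the unmerged share per merged PR is (1 - r) / r. A small Python sketch of that conversion, with `waste_per_merged` being a name I picked for illustration:

```python
def waste_per_merged(merge_rate: float) -> float:
    """Unmerged PRs generated for every merged PR, given a merge rate in (0, 1]."""
    if not 0 < merge_rate <= 1:
        raise ValueError("merge_rate must be greater than 0 and at most 1")
    return (1 - merge_rate) / merge_rate


low = waste_per_merged(0.25)
high = waste_per_merged(0.80)

print(f"25% merge rate: {low:.2f} wasted PRs per merged PR")   # 3.00
print(f"80% merge rate: {high:.2f} wasted PRs per merged PR")  # 0.25
```

At 25% that is three dead PRs for each one that ships. At 80% it is one dead PR for every four that ship.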
The difference between those two engineers is not talent. It is four config files and an afternoon of CI setup.
Based on an empirical study of 24,560 PRs across 447 open-source repositories. Names and details have been changed. I help startups and teams set up CI/CD pipelines, DevOps infrastructure, and development workflows that work with AI tools instead of against them.