There’s a scoreboard on GitHub that doesn’t track people at all. It tracks bots. The project is called PR Arena, and for months it has been quietly tallying how well artificial intelligence coding agents perform when they try their hand at one of the most sacred rituals in programming: the pull request.
A pull request, or PR, is the formal way developers propose changes to a codebase. Normally it’s a careful process involving review, discussion, and testing. But PR Arena shows just how much of that work is now being attempted by machines.
The numbers are staggering. OpenAI’s Codex agent has produced more than 1.7 million ready-to-merge pull requests, with about 1.5 million of them merged. GitHub’s own Copilot agent boasts a success rate above 92 percent. Even newer entrants like Cursor and Devin have logged tens of thousands of pull requests each, with Cursor reaching roughly 100,000 marked ready. The site updates every few hours, turning code contribution into a kind of leaderboard for bots.
| Rank | Agent | Success Rate | Draft PRs | Ready PRs | Merged PRs |
|---|---|---|---|---|---|
| #1 | OpenAI Codex | 87.6% | 14.5K | 1.74M | 1.53M |
| #2 | GitHub Copilot Agent | 92.8% | 85.1K | 168K | 156K |
| #3 | Cursor Agents | 93.9% | 64.9K | 100K | 93.6K |
| #4 | Devin | 63.6% | 1.3K | 41.8K | 26.6K |
| #5 | Codegen | 61.1% | 2.9K | 5.0K | 3.1K |
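The table does not say how the success rate is calculated, but the figures look consistent with merged PRs divided by ready PRs. Here is a minimal Python sketch that checks that reading against the rounded values shown above. The formula is an assumption rather than PR Arena's documented method, and the names `parse_count` and `merge_ratio` are made up for illustration.

```python
# Figures copied from the leaderboard table (rounded as displayed there).
# Columns: agent, listed success rate, ready PRs, merged PRs.
ROWS = [
    ("OpenAI Codex", "87.6%", "1.74M", "1.53M"),
    ("GitHub Copilot Agent", "92.8%", "168K", "156K"),
    ("Cursor Agents", "93.9%", "100K", "93.6K"),
    ("Devin", "63.6%", "41.8K", "26.6K"),
    ("Codegen", "61.1%", "5.0K", "3.1K"),
]

SUFFIXES = {"K": 1_000, "M": 1_000_000}


def parse_count(text: str) -> float:
    """Turn a display value such as '1.74M' or '168K' into a number."""
    text = text.strip()
    if text[-1] in SUFFIXES:
        return float(text[:-1]) * SUFFIXES[text[-1]]
    return float(text)


def merge_ratio(ready: str, merged: str) -> float:
    """ASSUMED definition: merged PRs as a percentage of ready PRs."""
    return 100 * parse_count(merged) / parse_count(ready)


if __name__ == "__main__":
    for agent, listed, ready, merged in ROWS:
        recomputed = merge_ratio(ready, merged)
        print(f"{agent:22} listed {listed:>6}  recomputed {recomputed:5.1f}%")
```

Because the table rounds its counts to three significant figures, the recomputed percentages land within about a point of the listed ones rather than matching them exactly. Draft PRs do not enter into this ratio under the assumed definition.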
What makes this notable isn’t just the raw volume. It’s the fact that AI agents are no longer experimental toys. They are participating in the same workflows as human developers, generating code, submitting it for review, and getting it merged into real projects. In other words, AI is quietly becoming a teammate.
For developers, this shift is double-edged. On one hand, agents can churn out boilerplate code, fix bugs, or keep dependencies up to date, freeing humans to focus on bigger design questions. On the other hand, trust is still fragile. Ask around and you’ll hear stories of AI-authored PRs that look clean but conceal subtle errors, or require heavy rewriting before they’re safe to merge. One academic study found that more than 80 percent of AI pull requests were accepted, but only about half of those went in without further modification. Put together, that is roughly four in ten AI pull requests merged untouched, which means humans remain firmly in the loop.
The popularity of these tools speaks for itself. Codex, Copilot, Cursor, Devin, and Codegen each have their own style, some preferring to draft in private before revealing a polished PR, others posting early and often to iterate in public. Developers can watch their bots battle it out on PR Arena’s charts. It feels like a strange new kind of competition, one where productivity is measured not in late-night hours but in the steady hum of servers cranking out commits.
What this says about the future of programming is still unsettled. Optimists see the rise of agents as the next logical step in automation, a way to handle the tedious plumbing of software so people can innovate faster. Skeptics worry about bloat, hidden bugs, or over-reliance on code no one fully understands. The truth is probably somewhere in between.
But one thing is clear: the era when AI was just an autocomplete assistant is already behind us. Agents are shipping code at industrial scale, and projects like PR Arena give us a window into how much of modern software is already touched by machine hands.