Software developers appear to evaluate the performance of artificial intelligence with a surprising degree of optimism, often overestimating its benefits in their own work. New research reveals a significant disconnect between developers’ perception of AI-driven productivity gains and their actual performance, suggesting a level of trust that stands in contrast to the critical evaluation typically applied to human-generated code. This gap highlights a growing paradox where the enthusiasm for AI capabilities may overshadow objective outcomes.
This phenomenon was starkly illustrated in a recent study in which seasoned developers, despite believing AI tools made them substantially faster, were in fact slower when using them on familiar, complex projects. While developers widely report using AI tools, a deeper look reveals significant hesitation to deploy them for high-stakes tasks, along with growing frustration at debugging AI-generated solutions that are nearly, but not entirely, correct. This suggests that while developers are embracing AI, their skepticism is not always directed where it matters most, leaving them prone to misjudging its practical impact on their daily workflow.
Perception Collides with Performance Metrics
A recent randomized controlled trial conducted by the research nonprofit METR provided startling evidence of the disparity between belief and reality in AI-assisted coding. The study focused on experienced open-source developers working within their own large, established repositories. Before starting their tasks, the developers estimated that using AI coding assistants would speed up their work by an average of 24%. After completing the work, their conviction held firm, with participants reporting a perceived productivity gain of 20%.
However, the recorded data told a completely different story. The study’s objective measurements found that using AI tools actually increased the time it took to complete tasks by 19%. This counterintuitive result, a near-20% slowdown, directly contradicted both the developers’ expectations and their post-task assessments. The lead authors of the study expressed their own surprise at the findings, having initially anticipated a significant acceleration, perhaps even a doubling of productivity. The results challenge the prevailing narrative that AI integration seamlessly boosts efficiency for all software engineering tasks.
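To put those percentages in concrete terms, consider a rough back-of-the-envelope calculation. The sketch below (in Python) uses the study's reported figures of a 24% forecast speedup, a 20% perceived speedup, and a 19% measured slowdown; the two-hour baseline task time is a hypothetical illustration, not a number drawn from the study's data.

```python
# Illustrative arithmetic only: the percentages come from the METR study's
# reported findings, but the baseline task time is a hypothetical example.

baseline_minutes = 120          # time to finish a task without AI (assumed)
forecast_speedup = 0.24         # developers' pre-task estimate: 24% faster
perceived_speedup = 0.20        # developers' post-task belief: 20% faster
measured_slowdown = 0.19        # study's measurement: 19% slower

expected = baseline_minutes * (1 - forecast_speedup)   # what they predicted
felt = baseline_minutes * (1 - perceived_speedup)      # what they believed afterward
actual = baseline_minutes * (1 + measured_slowdown)    # what the clock recorded

print(f"Predicted with AI: {expected:.0f} min")        # ~91 min
print(f"Perceived with AI: {felt:.0f} min")            # ~96 min
print(f"Measured with AI:  {actual:.0f} min")          # ~143 min
print(f"Perception gap:    {actual - felt:.0f} min")   # ~47 min
```

On that hypothetical task, the distance between what developers felt and what the clock recorded comes to roughly 47 minutes, nearly half the original task length.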
The Crucial Role of Experience and Familiarity
The METR study’s findings are particularly relevant because they focused on a specific, high-value scenario: experts working in codebases they know intimately. Previous studies that found significant AI-driven speed increases, some as high as 56%, often involved less experienced developers or tasks in unfamiliar environments where the AI could provide more foundational support. In contrast, seasoned developers working on million-line repositories possess a deep, nuanced understanding of the existing code’s architecture, complexities, and idiosyncrasies—a level of insight that current AI models cannot replicate.
For these experts, the process of writing prompts, managing context for the AI, and carefully reviewing generated code introduced significant overhead. Correcting AI outputs that were plausible but flawed often proved more time-consuming than writing the code from scratch. One chief technology officer with over two decades of experience noted that this highlights a core challenge: experienced developers often know their complex codebases better than any AI model does. The study suggests that for experts engaged in intricate work, AI's contributions can disrupt an already efficient workflow rather than enhance it.
A Broader Landscape of Developer Skepticism
While some developers may overestimate AI’s benefits in specific coding tasks, the broader community remains cautious about its role in critical functions. Data from a 2025 Stack Overflow survey of nearly 50,000 developers reveals widespread reluctance to entrust AI with high-responsibility work. For instance, 76% of developers do not plan to use AI for deployment and monitoring, and 69% resist its use for project planning. This indicates a clear-eyed skepticism about the technology’s reliability and strategic capabilities.
This caution is rooted in practical experience. A primary frustration cited by developers is the challenge of dealing with “AI solutions that are almost right, but not quite,” which can turn a potential time-saver into a time-consuming debugging session. This sentiment is reflected in declining trust metrics; one report found that developer trust in AI-generated code fell from 40% to 29% in a single year. When the stakes are high, the preference is clear: 75% of developers report they would rather turn to a human colleague than an AI for assistance.
Explaining the Trust-Performance Gap
The Challenge of Self-Assessment
One of the core issues highlighted by the METR study is the fundamental difficulty humans have in accurately estimating their own productivity. The developers’ persistent belief that they were working faster with AI, even when the data showed the opposite, points to a powerful cognitive bias. The interaction with a novel and powerful tool may create a feeling of efficiency that is not borne out in reality. The act of generating large blocks of code quickly can feel like progress, even if that code requires extensive review and revision.
The “Black Box” Problem
A significant barrier to effective skepticism and trust is the opaque nature of many AI systems. Teams often struggle to understand the reasoning behind an AI’s output, whether it’s a generated test case or a self-healing mechanism in code. This “black box” nature can create hesitation and slow adoption, as developers are accustomed to having full control and understanding of their tools and environments. Without transparency, it is difficult to build the kind of reliable, earned trust that developers have in their experienced human colleagues.
Implications for the Future of Software Development
The METR researchers caution against interpreting their findings as a blanket declaration that “AI slows down developers.” Instead, the results provide a crucial snapshot of early-2025 AI capabilities within a specific but important context. The study underscores the need for a more nuanced conversation about where and how AI tools should be applied. The goal is to better inform developers on how to use AI appropriately, leveraging its strengths while being aware of its current limitations, particularly in complex, mature software projects.
Organizational culture will play a pivotal role in navigating this transition. Leaders who view AI as a purely technical solution without addressing the human element are likely to see disappointing results. Data shows that three out of four workers will abandon an AI tool mid-task over accuracy concerns, and nearly half question the quality of AI-assisted work from their colleagues. Building confidence requires a human-centered approach, starting with pilot projects in noncritical areas and maintaining manual oversight to allow teams to adapt. Ultimately, aligning AI technology with human readiness and fostering a culture of constructive, well-placed skepticism will be the key to unlocking its true potential.