Steve Smith argues that in the age of generative AI, organizations are falling into the trap of measuring superficial developer activity metrics—like lines of code or AI prompt counts—instead of focusing on meaningful, outcome-based measures. He urges companies to prioritize team-level outcomes and business value, using validated frameworks like Accelerate, rather than being misled by vendor hype and easily gamed productivity metrics.
The video, hosted by Steve Smith, explores the current obsession with measuring developer productivity in the age of generative AI (GenAI). Smith contends that the rush to quantify productivity is reviving easily measurable but low-value metrics, and he draws parallels between the hype around GenAI and previous technology bubbles, warning that this rush often leads to misguided management practices reminiscent of Taylorism. While he acknowledges that GenAI is both a genuine innovation and a source of hype, Smith emphasizes that the industry's renewed focus on activity metrics is a step backward.
Smith provides a brief history of software productivity measurement, referencing Martin Fowler’s skepticism about measuring productivity and the significant shift brought by Dr. Nicole Forsgren’s research. Forsgren’s work, particularly the Accelerate book and the State of DevOps reports, introduced outcome-based metrics such as deployment frequency, lead time, change failure rate, and time to restore. These metrics, validated through rigorous statistical analysis, focus on team-level outcomes and business success, rather than individual activity or output. Smith credits these approaches with providing the first scientifically robust framework for measuring software delivery performance.
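The four Accelerate metrics named above can be sketched as a small computation over team-level deployment records. The record format, field names, and one-week window below are illustrative assumptions for this sketch, not anything specified in the talk:

```python
from datetime import datetime, timedelta
from statistics import median

def four_keys(deployments, window_days):
    """Compute the four Accelerate metrics from deployment records.

    Each record is a dict with 'committed'/'deployed' datetimes, a
    'failed' flag, and (if failed) a 'restored' datetime.
    """
    # Deployment frequency: deployments per day over the window.
    frequency = len(deployments) / window_days
    # Lead time for changes: median commit-to-deploy duration.
    lead_time = median(d["deployed"] - d["committed"] for d in deployments)
    # Change failure rate: share of deployments that caused a failure.
    failures = [d for d in deployments if d["failed"]]
    change_failure_rate = len(failures) / len(deployments)
    # Time to restore: median failure-to-recovery duration.
    time_to_restore = (median(d["restored"] - d["deployed"] for d in failures)
                       if failures else timedelta(0))
    return frequency, lead_time, change_failure_rate, time_to_restore

# Hypothetical one-week window with three deployments, one of which failed.
deploys = [
    {"committed": datetime(2024, 1, 1, 9),  "deployed": datetime(2024, 1, 1, 17),
     "failed": False, "restored": None},
    {"committed": datetime(2024, 1, 3, 10), "deployed": datetime(2024, 1, 4, 10),
     "failed": True,  "restored": datetime(2024, 1, 4, 14)},
    {"committed": datetime(2024, 1, 5, 8),  "deployed": datetime(2024, 1, 5, 12),
     "failed": False, "restored": None},
]
freq, lead, cfr, mttr = four_keys(deploys, window_days=7)
```

Note that all four figures describe the team's delivery pipeline as a whole; nothing in the computation attributes activity to an individual developer, which is precisely the point Smith credits Forsgren's research with.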
Turning to the impact of GenAI, Smith notes that while AI tools can make coding faster and more automated, organizations are reverting to measuring superficial activity and output metrics because they are easier to implement. He invokes Goodhart’s Law: “When a measure becomes a target, it ceases to be a good measure,” warning that developers will game whatever metric they are measured against. For example, tracking the number of AI prompts or the percentage of AI-generated code can lead to distorted behaviors that do not actually improve business outcomes.
Smith critiques a recent industry report from DX, a developer intelligence platform, which claims high AI adoption rates and productivity gains based on activity metrics like AI usage, self-reported time savings, and pull request counts. He points out methodological flaws, such as lack of clarity on sampling and reliance on self-reported data, and argues that these metrics are not predictive of real business value. Smith stresses that output metrics like pull request throughput are noisy and can be misleading, as they do not necessarily correlate with faster or more reliable delivery to end users.
In conclusion, Smith advises organizations to resist the temptation to over-measure individual developer activity and instead focus on team-level outcomes using validated frameworks like Accelerate. He recommends measuring deployment throughput, service reliability, and technical quality, then connecting these to broader business outcomes such as profitability or customer lifetime value. While it is reasonable to track some AI usage metrics, Smith cautions that these should be clearly understood as low-value activity measures and not confused with true indicators of success. Ultimately, he urges viewers to think critically, avoid vendor hype, and prioritize outcome-based measurement for genuine improvement.