GitHub Copilot: Productivity boost or DORA metrics disaster?

Imagine a world where measuring developer productivity is as straightforward as checking your fitness stats on a smartwatch. With AI programming assistants like GitHub Copilot, this seems within reach. GitHub Copilot claims to turbocharge developer productivity with context-aware code completions and snippet generation. By suggesting entire lines or modules of code, GitHub Copilot aims to reduce manual coding effort, much like a supercharged assistant that helps you code faster and frees you to focus on complex problem-solving.

Organizations have used DevOps Research and Assessment (DORA) metrics as a structured approach to evaluating the performance of their software development and DevOps teams. This data-driven approach enables teams to deliver software faster, with greater reliability and improved system stability. By focusing on deployment frequency, lead time for changes, change failure rate, and mean time to restore (MTTR), teams gain invaluable insights into their workflows.

AI impact on DORA metrics

Here’s the kicker—DORA metrics are not all sunshine and rainbows. Misusing them can lead to a narrow focus on quantity over quality. Developers might game the system just to improve their metrics, like students cramming for exams without truly understanding the material. This can create disparities, as developers working on modern microservices-based applications will naturally shine in DORA metrics compared to those handling older, monolithic systems.

The advent of AI-generated code exacerbates this issue significantly. While tools like GitHub Copilot can boost productivity metrics, the results might not necessarily reflect better deployment practices or system stability. The auto-generated code could inflate productivity stats without genuinely improving development processes.

Despite their potential, AI coding assistants introduce new challenges. Besides concerns about developer skill atrophy and ethical issues surrounding the use of public code, experts predict a massive increase in QA and security issues in software production, directly impacting your DORA metrics.

Trained on vast amounts of public code, AI coding assistants might inadvertently suggest snippets with bugs or vulnerabilities. Imagine the AI generating code that doesn’t properly sanitize user inputs, opening the door to SQL injection attacks. Additionally, the AI’s lack of project-specific context can lead to misaligned code with the unique business logic or architectural standards of a project, causing functionality issues discovered late in the development cycle or even in production.
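The SQL injection risk described above can be made concrete. Here is a minimal Python sketch (using an in-memory SQLite database and illustrative function names, not code from any real assistant) contrasting the unsafe string-interpolated query an AI might suggest with a parameterized version:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE users (name TEXT, role TEXT)")
conn.execute("INSERT INTO users VALUES ('alice', 'admin'), ('bob', 'user')")

def find_user_unsafe(name: str):
    # Vulnerable pattern an assistant might generate: user input is
    # interpolated directly into the SQL string.
    query = f"SELECT name, role FROM users WHERE name = '{name}'"
    return conn.execute(query).fetchall()

def find_user_safe(name: str):
    # Parameterized query: the driver treats the input as data, not SQL.
    return conn.execute(
        "SELECT name, role FROM users WHERE name = ?", (name,)
    ).fetchall()

payload = "' OR '1'='1"
print(find_user_unsafe(payload))  # injected predicate leaks every row
print(find_user_safe(payload))    # no such user: returns []
```

The two functions look nearly identical in a code review, which is exactly why this class of bug slips through when reviewers skim AI-generated suggestions.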

There’s also the risk of developers becoming overly reliant on AI-generated code, leading to a lax attitude toward code review and testing. Subtle bugs and inefficiencies could slip through, increasing the likelihood of defects in production.

These issues can directly impact your DORA metrics. More defects due to AI-generated code can raise the change failure rate, negatively affecting deployment pipeline stability. Bugs reaching production can increase the mean time to restore (MTTR), as developers spend more time fixing issues caused by the AI. Additionally, the need for extra reviews and tests to catch errors introduced by AI assistants can slow down the development process, increasing the lead time for changes.
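To see how these effects surface in the numbers, here is a minimal sketch of computing change failure rate, MTTR, and deployment frequency from a hypothetical log of deployment records (the data and field layout are invented for illustration):

```python
from datetime import datetime

# Hypothetical deployment log: (timestamp, failed?, minutes to restore)
deployments = [
    (datetime(2024, 5, 1), False, 0),
    (datetime(2024, 5, 3), True, 90),
    (datetime(2024, 5, 8), False, 0),
    (datetime(2024, 5, 10), True, 30),
    (datetime(2024, 5, 15), False, 0),
]

failures = [d for d in deployments if d[1]]

# Change failure rate: share of deployments that caused a failure
change_failure_rate = len(failures) / len(deployments)

# MTTR: average time to restore service across failed deployments
mttr_minutes = sum(d[2] for d in failures) / len(failures)

# Deployment frequency: deployments per day over the observed window
window_days = (deployments[-1][0] - deployments[0][0]).days or 1
deploys_per_day = len(deployments) / window_days

print(f"Change failure rate: {change_failure_rate:.0%}")  # 40%
print(f"MTTR: {mttr_minutes:.0f} minutes")                # 60 minutes
print(f"Deploy frequency: {deploys_per_day:.2f}/day")
```

Every defect that AI-generated code pushes into production adds a row with `failed = True`, raising the failure rate and dragging the MTTR average upward.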

Guidelines for development teams

To mitigate these impacts, development teams must maintain rigorous code review practices and establish comprehensive testing strategies. The ever-growing volume of AI-generated code should be tested as thoroughly as manually written code. Organizations must invest in end-to-end test automation and test management solutions that provide visibility into code quality earlier in the cycle and systematically automate testing throughout. Development teams must manage the increased load of AI-generated code by becoming smarter about how they conduct code reviews, apply security tests, and automate their testing. This ensures the continued delivery of high-quality software with the right level of trust.

Here are some guidelines for software development teams to consider:

Code reviews — Incorporate testing best practices during code reviews to maintain code quality even with AI-generated code. AI assistants like GitHub Copilot can actually contribute to this process by suggesting improvements to test coverage, identifying areas where additional testing may be required, and highlighting potential edge cases that need to be addressed. This helps teams uphold high standards of code quality and reliability.

Security reviews — Treat every input in your code as a potential threat. To harden your application against common threats like SQL injection or cross-site scripting (XSS) attacks that can creep in through AI-generated code, validate and sanitize all inputs rigorously. Create robust governance policies to protect sensitive data, such as personal information and credit card numbers, which demand additional layers of security.
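As a sketch of that advice, the snippet below (illustrative names, Python standard library only) pairs allow-list input validation with output escaping so that an XSS payload is rendered as inert text:

```python
import html
import re

# Allow-list pattern: letters, digits, underscore, 3-32 chars (illustrative)
USERNAME_RE = re.compile(r"[A-Za-z0-9_]{3,32}")

def validate_username(raw: str) -> str:
    # Reject anything outside the strict allow-list rather than
    # trying to strip "bad" characters after the fact.
    if not USERNAME_RE.fullmatch(raw):
        raise ValueError(f"invalid username: {raw!r}")
    return raw

def render_comment(text: str) -> str:
    # Escape user-supplied text before embedding it in HTML so a
    # script tag becomes harmless markup instead of executing.
    return f"<p>{html.escape(text)}</p>"

print(validate_username("alice_01"))
print(render_comment("<script>alert(1)</script>"))
```

Allow-list validation is the safer default: it fails closed on inputs nobody anticipated, including the ones an AI-generated code path forgot to handle.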

Automated testing — Automate the creation of test cases, enabling teams to quickly generate steps for unit, functional, and integration tests. This will help manage the massive surge of AI-generated code in applications. Expand beyond developers and traditional QA engineers by bringing in non-technical users to create and maintain tests for automated end-to-end testing.
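Table-driven tests are one lightweight way to follow this advice: new cases, including ones proposed by an AI assistant or a non-technical contributor, become data rows rather than new test code. A minimal sketch with a hypothetical slugify function:

```python
# Hypothetical function under test
def slugify(title: str) -> str:
    return "-".join(title.lower().split())

# Each case is a data row; adding coverage means adding a tuple,
# which is easy to review even when an AI assistant proposed it.
CASES = [
    ("Hello World", "hello-world"),
    ("  extra   spaces  ", "extra-spaces"),
    ("Already-slugged", "already-slugged"),
]

def run_cases():
    # Collect all failures instead of stopping at the first one
    return [
        (raw, slugify(raw), expected)
        for raw, expected in CASES
        if slugify(raw) != expected
    ]

print(run_cases())  # an empty list means every case passed
```

Test frameworks such as pytest offer the same pattern natively via parameterized tests; the point is that the case table, not the harness, is what grows as AI-generated code grows.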

API testing — Using open specifications, create an AI-augmented testing approach for your APIs, including the creation and maintenance of API tests and contracts. Seamlessly integrate these API tests with developer tools to accelerate development, reduce costs, and maintain current tests with ongoing code changes.
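A contract check can be as simple as validating a response payload against the field types an open specification declares. The following is a hand-rolled sketch with an invented schema; real projects would typically drive this from the OpenAPI document itself with a schema validator:

```python
# Field names and types an open specification might declare (illustrative)
SCHEMA = {"id": int, "name": str, "active": bool}

def check_contract(payload: dict, schema: dict) -> list:
    # Report every mismatch between the payload and the declared contract
    errors = []
    for field, expected_type in schema.items():
        if field not in payload:
            errors.append(f"missing field: {field}")
        elif not isinstance(payload[field], expected_type):
            errors.append(f"{field}: expected {expected_type.__name__}")
    return errors

good = {"id": 1, "name": "widget", "active": True}
bad = {"id": "1", "name": "widget"}  # wrong type for id, missing field

print(check_contract(good, SCHEMA))  # []
print(check_contract(bad, SCHEMA))   # two contract violations
```

Running a check like this in CI on every code change keeps AI-generated handler code from silently drifting away from the published API contract.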

Better test management — AI can help with intelligent decision-making, risk analysis, and optimizing the testing process. AI can analyze vast amounts of data to provide insights on test coverage, effectiveness, and areas that need attention.

While GitHub Copilot and other AI coding assistants promise a productivity boost, they raise serious concerns that could make DORA metrics unmanageable. Developer productivity might be superficially enhanced, but at what cost? The hidden effort of scrutinizing and correcting AI-generated code could overshadow any initial gains, leading to a potential disaster if not carefully managed. With an approach that is ready for AI-generated code, organizations should re-evaluate their DORA metrics and align them with how AI changes developer productivity. By setting the right expectations, teams can achieve new heights of productivity and efficiency.

Madhup Mishra is senior vice president of product marketing at SmartBear. With over two decades of technology experience at companies like Hitachi Vantara, Volt Active Data, HPE SimpliVity, Dell, and Dell-EMC, Madhup has held a variety of roles in product management, sales engineering, and product marketing. He has a passion for how artificial intelligence is changing the world.

Generative AI Insights provides a venue for technology leaders—including vendors and other outside contributors—to explore and discuss the challenges and opportunities of generative artificial intelligence. The selection is wide-ranging, from technology deep dives to case studies to expert opinion, but also subjective, based on our judgment of which topics and treatments will best serve InfoWorld’s technically sophisticated audience. InfoWorld does not accept marketing collateral for publication and reserves the right to edit all contributed content. Contact doug_dineley@foundryco.com.
