Data scientists and developers need a better working relationship for AI

Good teamwork is key to any successful AI project, but combining data scientists and software engineers into an effective force is no easy task.

According to Gartner, at least 30 percent of generative AI projects will be abandoned after proof of concept by the end of 2025, due to factors such as poor data quality, escalating costs and a lack of clear business value. Data scientists are pessimistic, too, expecting just 22 percent of their projects to make it through to deployment.

Much of the debate about turning these figures around has focused on technology, but little attention has been paid to the relationship between the data scientists and software engineers who build AI in the first place.

This is surprising: although both groups are crucial to AI, their working practices don’t align neatly, and in fact they can be downright incompatible. Failing to resolve these differences can scupper project delivery, jeopardize data security and break machine learning models in production.

Data scientists and software engineers need a better working relationship, but what does that look like, and how do we achieve it?

DevOps forgot the data science people

As cloud computing has grown, much of the industry’s attention has gone to bringing developers and operations together, making software delivery and lifecycle management more predictable and improving build quality.

Data scientists, during this time, have flown under the radar. Drafted into enterprise IT to work on AI projects, they are joining an environment that’s not quite ready for them.

What do I mean? Data scientists have a broad remit, taking a research-driven approach to solving business- and domain-level challenges through data manipulation and analysis. They work outside the software delivery lifecycle, building models with specialized tools and test platforms and in a subset of the languages developers use.

Software engineering, while a creative, problem-solving discipline, takes a different approach. Engineers are delivery-focused and tackle jobs in priority order, delivering results in sprints to hit specific goals. Their toolchains are built on shared workflows, integrated and automated for team-based collaboration and communication.

These differences have bred friction in four notable areas:

Process. Data scientists’ longer cycles don’t fit neatly into the process- and priority-driven flow of Agile. Accomplish five tasks in two days or deliver a new release every few hours? Such targets run counter to the nature of data science, and failing to accommodate this will soon leave the data science and software engineering wheels of an AI project running out of sync.
Deployment. Automated delivery is a key tenet of Agile: it has removed many of the problems of manual delivery in large, complex cloud-based environments and helps ensure uptime. But a deployment target of, say, 15 to 30 minutes cannot work for today’s large, data-heavy LLMs. One to two hours is more realistic, yet that is an unacceptable length of time for a service to be offline. Force a model into the shorter window and you risk breaking it.
Lifecycle. When data scientists use their own tools and build processes, the resulting machine learning model code lives outside the shared repository, where the engineering team would normally inspect and understand it. It can also escape Quality Assurance. This is a fast track to black-box AI, where engineers cannot explain the code to identify and fix problems, nor carry out meaningful updates and lifecycle management downstream.
Data Security. There’s a strong chance that data scientists in any team will train their models on data that’s commercially sensitive or that identifies individuals, such as customers or patients. If that data isn’t anonymized, masked or otherwise treated before it reaches the DevOps pipeline or production environment, there’s a real risk it will leak.
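
To make the data-security point more concrete, here is a minimal Python sketch of one way identifying fields might be treated before records enter a shared pipeline. It assumes records are plain dictionaries, that the list of identifying fields (the hypothetical IDENTIFYING_FIELDS) comes from the team’s own data classification, and that the secret key is managed elsewhere; the function name pseudonymize is illustrative and not part of any library. Identifying values are replaced with keyed HMAC-SHA256 tokens, so the same customer maps to the same token across datasets without exposing the raw value.

```python
import hashlib
import hmac

# Hypothetical list of fields that identify individuals. A real project would
# derive this from its own data classification, not from this sketch.
IDENTIFYING_FIELDS = ("name", "email", "patient_id")


def pseudonymize(record: dict, secret_key: bytes) -> dict:
    """Return a copy of `record` with identifying fields replaced by keyed hashes.

    The input record is not modified. Fields not listed in IDENTIFYING_FIELDS
    are passed through unchanged.
    """
    cleaned = dict(record)  # shallow copy so the caller's data stays intact
    for field in IDENTIFYING_FIELDS:
        value = cleaned.get(field)
        if value is None:
            continue  # field absent or empty: nothing to treat
        digest = hmac.new(secret_key, str(value).encode("utf-8"), hashlib.sha256)
        # Truncated to 16 hex characters for readability; a shorter token
        # slightly raises the odds of two values colliding.
        cleaned[field] = digest.hexdigest()[:16]
    return cleaned


if __name__ == "__main__":
    key = b"example-key-kept-in-a-secrets-manager"  # hypothetical key
    row = {"name": "Jane Doe", "email": "jane@example.com", "age": 42}
    print(pseudonymize(row, key))
```

Keyed hashing is pseudonymization, not anonymization: remaining fields (here, age) can still help re-identify someone, so a step like this would sit alongside the team’s own review of what data the model actually needs.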

No right or wrong answer

We need to find a collaborative path, and we can achieve that by fostering a good working environment that bridges the two disciplines to deliver products. That means data scientists internalizing the pace of software engineering, and software engineers adopting more flexible ways of working to accommodate the scientists.

Here are my top three recommendations for putting this into practice:

Establish shared goals. This will help the teams sync. For example, is the project goal to deliver a finished product such as a chatbot? Or is it a feature update, where all users receive the update at the same time? With shared goals in place, it’s possible to set and align project and team priorities. For data scientists, that will mean finding ways to accelerate parts of their work to hit engineering sprints, for example by adopting coding best practices. This is a soft way for data scientists to adopt a more product-oriented mindset toward delivery, and it also means software engineers can begin to factor research backlogs into delivery timelines.
Create a shared workflow to deliver transparent code and robust AI. Join the different pieces of the AI project team puzzle: make sure the data scientists working on the model are connected to both the back-end production system and the front end, while software engineers focus on making sure everything works together. That means working with shared tools according to established best practices and following common procedures such as shared source control, versioning and QA.
Appoint a project leader who can step in when needed on product engineering and delivery management. This person should have experience building a product and understand the basics of the product lifecycle, so they can identify problems and offer answers for the team. They should have the skills and experience to make tactical decisions, such as reconciling research timelines with software sprints. Ultimately, they should be a project polyglot: able to understand both scientists and engineers, act as translator and lead both.

Data scientists and software developers operate differently, but they share a common interest in project success; the trick is to exploit that. If data scientists can align with Agile-driven delivery in software engineering, and software engineers can accommodate the pace of their data-diving colleagues, it will be a win for all concerned. Better collaboration between the teams will improve code quality, speed up releases and, ultimately, deliver AI systems that make it through deployment and start meeting the needs of the business.
