Leland Hyman, Lead Data Scientist at Sherlock Biosciences - Interview Series

Leland Hyman is the Lead Data Scientist at Sherlock Biosciences. He is an experienced computer scientist and researcher with a background in machine learning and molecular diagnostics.

Sherlock Biosciences is a biotechnology company based in Cambridge, Massachusetts, that develops diagnostic tests using CRISPR. The company aims to disrupt molecular diagnostics with better, faster, and more affordable tests.

What initially attracted you to computer science?

I started programming at a very young age, but I was mainly interested in making video games with my friends. My interest in other computer science applications grew during college and graduate school, particularly with all of the groundbreaking machine learning work happening in the early 2010s. The whole field seemed like such an exciting new frontier that could directly impact scientific research and our daily lives, and I couldn’t help but be hooked by it.

You also pursued a Ph.D. in Cellular and Molecular Biology. When did you first realize that the two fields would intersect?

I started doing this type of intersectional work with computer science and biology early on in graduate school. My lab focused on solving protein engineering problems through collaborations between hardcore biochemists, computer scientists, and everyone in between. I quickly recognized that machine learning could provide valuable insights into biological systems and make experimentation much easier. Conversely, I also gained an appreciation for the value of biological intuition when constructing machine learning models. In my view, framing the problem accurately is the crucial element in machine learning. This is why I believe collaborative efforts across different fields can have a profound impact.

Since 2022, you’ve been working at Sherlock Biosciences. Could you share some details on what your role entails?

I currently lead the computational team at Sherlock Biosciences. Our group is responsible for designing the components that go into our diagnostic assays, interfacing with the experimentalists who test these designs in the wet lab, and building new computational capabilities to improve designs. Beyond coordinating these activities, I work on the machine learning portions of our codebase, experimenting with new model architectures and new ways to simulate the DNA and RNA physics involved in our assays.

Machine learning is at the core of Sherlock Biosciences. Could you describe the type and volume of data being collected, and how ML then parses that data?

During assay development, we test dozens to hundreds of candidate assays for each new pathogen. While the vast majority of those candidates won’t make it into a commercial test, we see them as an opportunity to learn from our mistakes. In these experiments, we’re measuring two key things: sensitivity and speed. Our models take the DNA and RNA sequences in each assay as input and then learn to predict the assay’s sensitivity and speed.

How does ML predict which molecular diagnostic components will perform with the greatest speed and accuracy?

When we think about how a human learns, there are two major strategies. On one hand, a person could learn how to do a task through pure trial and error. They could repeat the task, and after many failures, they’d eventually figure out the rules of the task on their own. This strategy was pretty popular before the internet. On the other hand, we could provide this person with a teacher to tell them the rules of the task right away. The student with the teacher could learn much faster than with the trial-and-error approach, but only if they have a good teacher who fully understands the task.

Our approach to training machine learning models is partway between these two strategies. While we don’t have a perfect “teacher” for our machine learning models, we can start them off with some knowledge about the physics of the DNA and RNA strands in our assays. This helps them learn to make better predictions with less data. To do this, we run several biophysical simulations on our assay’s DNA and RNA sequences. We then feed the results into the model and ask it to predict the speed and sensitivity of the assay. We repeat this process for all of the experiments we’ve performed in the lab, and the model is shown the difference between its predictions and what really happened. Through enough repetition, it eventually learns how the DNA and RNA physics relate to the speed and sensitivity of each assay.
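To make the training loop described above concrete, here is a minimal Python sketch, not Sherlock's actual pipeline. Everything in it is an assumption made for illustration: the two sequence features (GC content and longest single-base run) are crude stand-ins for real biophysical simulations, the sequences and "minutes to result" values are invented, and the model is a plain linear regressor trained by gradient descent.

```python
# Illustrative sketch only: the features, sequences, and measurements below
# are hypothetical and do not come from Sherlock Biosciences.

def gc_content(seq):
    """Fraction of G/C bases; a crude stand-in for a biophysical simulation."""
    seq = seq.upper()
    return sum(base in "GC" for base in seq) / len(seq)

def longest_run_fraction(seq):
    """Length of the longest single-base run, divided by sequence length."""
    longest = run = 1
    for prev, cur in zip(seq, seq[1:]):
        run = run + 1 if cur == prev else 1
        longest = max(longest, run)
    return longest / len(seq)

def featurize(seq):
    """Turn a sequence into model inputs: a bias term plus two 'physics' features."""
    return [1.0, gc_content(seq), longest_run_fraction(seq)]

def predict(weights, features):
    return sum(w * x for w, x in zip(weights, features))

def mean_squared_error(weights, examples):
    return sum((predict(weights, f) - t) ** 2 for f, t in examples) / len(examples)

def train(examples, lr=0.1, epochs=2000):
    """Repeatedly show the model the gap between its prediction and the lab result."""
    weights = [0.0] * len(examples[0][0])
    for _ in range(epochs):
        for features, target in examples:
            error = predict(weights, features) - target  # prediction vs. reality
            weights = [w - lr * error * x for w, x in zip(weights, features)]
    return weights

# Invented assay sequences paired with invented time-to-result values (minutes).
lab_results = [
    ("ATGCGCGCTAGCTAGGCTA", 22.0),
    ("ATATATATTTAAATATTA", 35.0),
    ("GGGCCCGGGCGCGCCGGG", 30.0),
    ("ATGCATGCATGCATGCAT", 20.0),
    ("TTTTTTTTAGCTAGCAAA", 38.0),
]
examples = [(featurize(seq), minutes) for seq, minutes in lab_results]

untrained = [0.0] * 3
trained = train(examples)
mse_before = mean_squared_error(untrained, examples)
mse_after = mean_squared_error(trained, examples)
```

The point of the sketch is the shape of the loop: precomputed physics-style features go in, a prediction comes out, and the weights are nudged by the prediction error on each lab experiment. A second model of the same form could be trained on sensitivity measurements.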

What are some other ways that AI algorithms are used by Sherlock Biosciences?

We have used machine learning algorithms to solve a wide variety of problems. A few examples that come to mind are related to market research and image analysis. For market research, we were able to train models that learn about different types of customers and estimate how many people might have an unmet need for disease testing. We have also built models to analyze pictures of lateral flow strips (the type of test commonly used in over-the-counter COVID tests) and automatically predict whether a positive band is present. While this seems like a trivial task for a human, I can say first-hand that it’s an incredibly convenient alternative to manually annotating thousands of pictures.

What are some of the challenges behind building ML models that work hand in hand with cutting-edge bioscience technology such as CRISPR?

Data availability is the main challenge with applying machine learning models to any bioscience technology. CRISPR and other DNA- or RNA-based technologies face a distinctive challenge, mainly because the structural datasets available for nucleic acids are significantly smaller than those available for proteins. This is why we’ve seen huge protein ML advances in recent years (with AlphaFold2 and others), but DNA and RNA ML advances are still lagging behind.

What is your vision for the future of how AI will integrate with CRISPR and bioscience?

We are seeing a massive AI boom in the protein engineering and drug discovery fields right now, and I expect this will continue to accelerate development in the pharmaceutical industry. I would love to see the same happen with CRISPR and other DNA and RNA–based technologies in the coming years. This could be incredibly impactful in diagnostics, human medicine, and synthetic biology. We have already seen the benefits of computational tools in our development of diagnostics and CRISPR technologies here at Sherlock, and I hope that this type of work will encourage a “snowball” effect to push the field forward.

Thank you for the great interview. Readers who wish to learn more should visit Sherlock Biosciences.