Researchers at the University of Michigan and Carnegie Mellon University have launched a new computational tool designed to make sustainable chemical synthesis more accessible to a broader range of scientists. The platform, known as CATNIP (Catalytic Nitrogen Incorporation Prediction), addresses a significant bottleneck in the adoption of biocatalysis, a greener alternative to traditional chemical manufacturing methods. Detailed in a study published on October 1 in the journal Nature, the open-access online tool leverages machine learning to predict which enzymes are best suited to carry out specific chemical reactions, thereby reducing the need for expensive and toxic reagents.
The development of CATNIP is a direct response to the long-standing challenge of enzyme specificity in industrial applications. While enzymes, or biocatalysts, are highly efficient and can perform complex chemistry at room temperature in water, they have evolved to work with a very specific set of molecules found in nature. This has limited their use in synthesizing the vast array of compounds needed for pharmaceuticals, polymers, and other advanced materials. By creating a data-driven system that can predict new enzyme-substrate pairings, the research team has built a bridge between the world of natural enzymes and the needs of modern chemists, potentially accelerating the transition to more environmentally friendly manufacturing processes.
Overcoming Biocatalysis Barriers
The core challenge in using biocatalysis for industrial chemistry lies in the highly selective nature of enzymes. These proteins have evolved over millions of years to interact with specific molecules, known as substrates, found within their natural environments. This specificity, while a strength in biological systems, becomes a significant hurdle when chemists want to use enzymes to create novel molecules not found in nature. Alison Narayan, a professor of chemistry at the University of Michigan, noted that while biocatalysis offers a more sustainable path to building molecules, the known substrates for these enzymes represent a small fraction of the compounds chemists work with. Consequently, finding the right enzyme for a desired chemical transformation has often been a matter of trial and error, a resource-intensive and time-consuming process that has hindered the widespread adoption of greener chemistry.
Traditional chemical synthesis frequently relies on harsh conditions, such as high temperatures and pressures, and often involves toxic solvents and heavy-metal catalysts. Biocatalysis presents an opportunity to perform these reactions under much milder, water-based conditions, reducing both energy consumption and hazardous waste. However, without a reliable method for identifying which enzymes can act on non-natural substrates, the potential of biocatalysis has remained largely untapped by the broader chemical industry. The CATNIP project was conceived to address this gap directly by systematically mapping the interactions between enzymes and a wide range of chemical compounds, creating a foundational dataset for predictive modeling.
A Data-Driven Approach to Enzyme Discovery
The creation of CATNIP was a multi-stage process that combined large-scale experimental work with sophisticated machine learning techniques. The researchers aimed to build a predictive tool grounded in a comprehensive and diverse dataset of enzyme-substrate interactions.
High-Throughput Screening
The project began by focusing on a single family of enzymes. Alexandra Paton, then a postdoctoral fellow in Narayan’s lab, developed a high-throughput reaction platform that allowed the team to test over 100 different substrates against each protein in the enzyme family. This systematic approach revealed hundreds of previously unknown connections between the enzymes and various chemical substrates. The resulting dataset provided the empirical evidence needed to understand reactivity patterns within the enzyme family far beyond the enzymes’ naturally known partners. Paton, now an assistant professor at the University of Rochester, noted that the diversity of this dataset prompted the team to think about what could be built with all the accumulated data.
Machine Learning Model
With a robust dataset in hand, Narayan’s team collaborated with a group at Carnegie Mellon University led by Gabe Gomes, an assistant professor of chemical engineering and chemistry. Gomes and his graduate student, Daniil Boiko, applied machine learning techniques to analyze the data and build a predictive model. Their approach is unique in the field, employing a two-step process to recommend enzymes. The model first maps a user’s desired substrate in chemical space and identifies similar compounds with known enzyme reactivity data. Based on this neighborhood of known reactions, it constructs a preliminary list of potentially compatible enzymes. A second model then reranks this list, pushing the most promising candidates to the top. This method effectively translates complex data on protein sequences and molecular structures into actionable recommendations for laboratory chemists.
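The two-step recommendation described above can be illustrated with a minimal Python sketch. This is not the CATNIP implementation: the toy data, the set-based substrate features, the Tanimoto similarity measure, and the similarity-weighted vote standing in for the second reranking model are all assumptions made for illustration, and every name in the block is hypothetical.

```python
from collections import defaultdict

# Hypothetical toy data: substrate name -> (feature set, enzymes observed to react).
# Real substrates would be described by molecular fingerprints; small integer
# sets stand in for them here.
KNOWN = {
    "sub_A": (frozenset({1, 2, 3, 4}), {"enz1", "enz2"}),
    "sub_B": (frozenset({1, 2, 3, 5}), {"enz2"}),
    "sub_C": (frozenset({7, 8, 9}), {"enz3"}),
}


def tanimoto(a, b):
    """Tanimoto (Jaccard) similarity between two feature sets."""
    if not a and not b:
        return 0.0
    return len(a & b) / len(a | b)


def retrieve_neighbors(query, known, k=2):
    """Step 1: find the k known substrates most similar to the query."""
    scored = [
        (name, tanimoto(query, feats), enzymes)
        for name, (feats, enzymes) in known.items()
    ]
    scored.sort(key=lambda item: item[1], reverse=True)
    return scored[:k]


def rerank(neighbors):
    """Step 2 (stand-in): score each candidate enzyme by the summed similarity
    of the neighboring substrates it is known to react with."""
    scores = defaultdict(float)
    for _, similarity, enzymes in neighbors:
        for enzyme in enzymes:
            scores[enzyme] += similarity
    # Highest score first; ties broken alphabetically for a stable order.
    return sorted(scores.items(), key=lambda kv: (-kv[1], kv[0]))


def recommend(query, known=KNOWN, k=2):
    """Return (enzyme, score) pairs, most promising first."""
    return rerank(retrieve_neighbors(query, known, k))
```

With this toy data, `recommend(frozenset({1, 2, 3, 4}))` ranks `enz2` (score of roughly 1.6) ahead of `enz1` (1.0), because `enz2` reacts with both of the query's nearest neighbors. In the system the article describes, the second step is a separate learned model, so the simple vote used here should be read only as a placeholder for it.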
How the CATNIP Platform Works
The CATNIP platform is an open-access online tool designed to be intuitive for chemists. It functions in a way that is conceptually similar to a web search engine, providing ranked results to help scientists quickly identify the most promising candidates for their experiments. A user can approach the tool in two ways. First, a chemist can input a starting compound, or substrate, and CATNIP will return a ranked list of biocatalysts from the studied protein family that are most likely to facilitate the desired chemical transformation.
Alternatively, a researcher who is interested in a particular enzyme can input it into the platform to receive a list of its potential substrates. This dual functionality allows for greater flexibility in experimental design. According to Gomes, the model’s predictive capability helps scientists “derisk their experimental planning” when selecting an enzyme. While the top-ranked result is not guaranteed to be the absolute best choice, it provides a highly educated starting point, significantly narrowing the field of candidates and reducing the need for extensive trial-and-error screening.
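The two query directions can be pictured as two views of the same table of predicted compatibility scores. The Python sketch below assumes a hypothetical score table keyed by (enzyme, substrate) pairs; the names and values are invented for illustration and do not reflect CATNIP's actual interface or outputs.

```python
# Hypothetical predicted compatibility scores keyed by (enzyme, substrate).
SCORES = {
    ("enz1", "sub_A"): 0.91,
    ("enz1", "sub_B"): 0.40,
    ("enz2", "sub_A"): 0.75,
    ("enz2", "sub_C"): 0.88,
}


def enzymes_for_substrate(substrate, scores=SCORES):
    """Substrate-first query: rank enzymes predicted to accept the substrate."""
    hits = [(enz, s) for (enz, sub), s in scores.items() if sub == substrate]
    return sorted(hits, key=lambda hit: -hit[1])


def substrates_for_enzyme(enzyme, scores=SCORES):
    """Enzyme-first query: rank substrates the enzyme is predicted to accept."""
    hits = [(sub, s) for (enz, sub), s in scores.items() if enz == enzyme]
    return sorted(hits, key=lambda hit: -hit[1])
```

Here `enzymes_for_substrate("sub_A")` returns `enz1` ahead of `enz2`, while `substrates_for_enzyme("enz2")` returns `sub_C` ahead of `sub_A`. Either way, the ranking is a starting point for experiments and does not guarantee that the top entry is the best choice.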
Accelerating Sustainable Chemical Manufacturing
The introduction of CATNIP could have a significant impact on the chemical industry, particularly in specialty chemicals and advanced polymers. By broadening access to biocatalysis, the tool enables a wider range of researchers and manufacturers, including smaller companies, to innovate in sustainable process design. The platform’s ability to quickly identify promising biocatalysts could reduce research and development costs and shorten the time it takes to bring new products to market.
This data-driven approach aligns with growing industry and regulatory demands for greener and safer manufacturing methods. As companies face increasing pressure to meet environmental, social, and governance (ESG) requirements, tools like CATNIP offer practical ways to reduce reliance on hazardous materials and adopt more sustainable practices. By bridging the gap between the specificity of natural enzymes and the diverse needs of industrial synthesis, CATNIP lowers a major technical barrier and gives companies new opportunities to innovate and to compete on the environmental profile of their processes.
The Future of Biocatalytic Design
The current version of CATNIP is just the beginning. The research team is already working on expanding the platform’s database beyond the initial enzyme family. The long-term vision is to incorporate a wider range of enzyme families, which would broaden the scope of accessible biocatalytic reactions for chemists. As more data is generated and integrated, the predictive accuracy of the machine learning model is expected to improve, further enhancing its utility. This ongoing development reflects a broader trend at the intersection of life sciences, data science, and engineering, where innovation is driving the transition toward more sustainable technologies.
The project is a cross-institutional collaboration supported by the National Science Foundation and the Novartis Global Scholars Program. By making the tool open-access, the creators are fostering a global community of users who can apply biocatalysis to their own challenges, ultimately helping to realize the promise of green chemistry. CATNIP represents a shift from serendipitous enzyme discovery to a more rational, data-driven design process that could accelerate the development of the next generation of chemical manufacturing.