MCS-Designed “Artificial Chemist” Promises Record-Performing Materials with Unequaled Speed of Discovery

Using PSC’s Bridges-2 system, scientists at Mellon College of Science (MCS) have created an “artificial chemist,” a computer program that mimics the expertise of human chemists. The artificial intelligence (AI) program, or algorithm, is capable of directing an automated laboratory to synthesize new contrast agents for medical magnetic resonance imaging (MRI). The new contrast agents, thanks to the AI, have a signal-to-noise ratio as much as 50 percent higher than previous state-of-the-art, human-designed materials. This performance boost offers the possibility of improved diagnosis as well as a host of possible breakthroughs in other fields relying on new chemical compounds.

The MCS team built their AI using advanced research computers at PSC as well as at the Texas Advanced Computing Center (TACC). The robotic lab instrument is located at the University of North Carolina at Chapel Hill (UNC). The collaborators plan to develop the software so that it is capable of more general chemical design for other applications in medicine, chemistry and materials science.

“Previous efforts in materials discovery have relied on either luck or human intuition, which both suffer from inherent biases and limitations in knowledge,” said Olexandr Isayev, Assistant Professor of Chemistry at MCS. Because humans learn chemistry from other humans, there was always the concern that human chemists were all stuck on the same “track,” unaware of unintuitive new possibilities for chemical design.

The AI software was designed by Filipp Gusev, a graduate student in Isayev’s group in the joint CMU-Pitt PhD Program in Computational Biology and a co-first author of the work. By contrast (pun noted), it does not rely on knowledge as experienced by humans. Instead, in a process called machine learning, or ML, it starts with a training set of successful MRI contrast agents. By a process of trial and error against a testing set of more MRI agents and using a limited set of reagents, it explores chemistry on its own. The MCS scientists used the approach because the traditional “Edisonian,” or trial-and-error, approach to materials design is extremely inefficient.

Unique Synthesis Tool

The work was enabled by the robotic system built by the Frank Leibfarth group at UNC Chapel Hill. Leibfarth’s group had built an automated continuous-flow system designed to build polymers that can be used to create plastics, packaging and a number of useful materials. The lab provides the “hands” of the operation. The “brain” was supplied by the Isayev team’s AI.

Graduate student Marcus Reis of Leibfarth’s group was the co-first author for the work from UNC.

“This was a true collaboration between two labs, an experimental group at the University of North Carolina and my group at CMU,” said Isayev. “We implemented a bold vision of a self-driving laboratory for materials discovery; in this case, an example of polymers and soft materials.”

The continuous-flow operation and its automation of complex reaction sequences with software gave scientists across the world the ability to work with the UNC team remotely. These capabilities allowed the MCS team to perform the experiment during the COVID-19 pandemic, despite the widespread shutdowns.

“The idea is not to replace human chemists, but substantially enhance human creativity and intuition with AI to design new materials,” Isayev added.


Thanks to the back-and-forth between the MCS AI’s exploration of improved MRI contrast agents (left) and lab testing of the candidates by a robotic instrument (center), the work produced 397 candidates with improved performance compared with known contrast agents. Surprisingly, although the best-performing of these candidates had, as expected, a high number of fluorine-19 atoms in their structure, they outperformed known contrast agents that contain more of the element (right).

Challenges of AI-Orchestrated Polymer Design

The overall project posed serious challenges. The first was the truly vast number of possible polymers that could be produced from the automated lab’s reagent set. Unlike work with simulated or historical data, developing an AI algorithm while acquiring new data on the fly from real experiments required taking into account the cost and number of experiments. To some extent, the team could control this by giving the AI a limited set of reagents. But even for a small set of six organic “monomers,” the space of possible experiments in this case exceeded 50,000. The MCS team would need powerful computing resources. They would also have to refine the AI model in repeated training steps to cover the huge “multidimensionality” of the problem in search of the best-performing polymers while conducting only a small fraction of possible experiments.
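To give a feel for how six monomers can yield more than 50,000 possible experiments, here is a minimal sketch in Python. It assumes, purely for illustration, that each polymer is defined by the fractions of the six monomers and that those fractions vary in 5 percent steps; the article does not say how the team actually discretized the space. The function names are hypothetical.

```python
from itertools import product
from math import comb


def count_compositions(n_monomers: int, n_steps: int) -> int:
    """Count the ways to split n_steps equal increments among n_monomers.

    This is the classic "stars and bars" count: C(n_steps + n_monomers - 1, n_monomers - 1).
    """
    return comb(n_steps + n_monomers - 1, n_monomers - 1)


def enumerate_compositions(n_monomers: int, n_steps: int):
    """Yield each composition as a tuple of integer increments that sum to n_steps."""
    for head in product(range(n_steps + 1), repeat=n_monomers - 1):
        rest = n_steps - sum(head)
        if rest >= 0:
            yield head + (rest,)


N_MONOMERS = 6  # from the article
N_STEPS = 20    # assumption: fractions vary in 5 percent increments (20 steps to 100 percent)

total = count_compositions(N_MONOMERS, N_STEPS)
print(total)  # 53130
```

Under that assumed grid, the count comes to 53,130 recipes, the same order of magnitude as the figure quoted above. Halving the step size to 2.5 percent would push the count past 1.2 million, which is why the choice of reagents and resolution matters so much for experimental cost.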

Because of the limited number of properly characterized MRI contrast agents, “the initial training dataset was relatively small, which means that model has to be prepared in a way to maximize information gain,” said Gusev. “Otherwise, a not-optimal construction of an AI algorithm could lead to performing uninformative experiments. Moreover, even a small imperfection in the performance of the model” would amplify the training process in terms of the number of experiments needed.

In other words, the AI, starting from limited examples, could initially draw wrong conclusions—like the child who thinks that red means stop, green means go and yellow means go really fast. Fortunately, this problem can be avoided by giving the AI reasonable examples early.

Another challenge was that the team wanted to create a general-purpose AI chemist that could be put to any chemical task, not one that could only design MRI agents.

“This is a critical weakness in the current ML pipelines,” Isayev said. “There is no single ML method that works for all applications.” Almost all ML models are designed and tuned by hand. Typically, a scientific paper will report only the successful application of a particular method. This introduces biases, both in the selection of the model and the data samples. As a result, researchers select suboptimal models or invest a significant amount of time in tuning a model for a particular application.

“An automatic algorithm that requires no expert insight is required,” Isayev said.

The scientists used an approach called automated machine learning to try to overcome this limitation of ML. The first step in the process is the usual one of selecting the best performer from a large group of possible models. The scientists then refine the winning model by real-world testing of the resulting contrast-agent candidates in the UNC lab, putting the results of that testing back into the AI. By going back and forth between the computer and the lab, the AI could correct its mistakes.
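The back-and-forth described above can be sketched as a short Python loop. This is a toy illustration, not the team’s software: the “lab” is a made-up scoring function, the candidate models are simple nearest-neighbor predictors chosen by leave-one-out error, and the proposal rule is plain greedy selection. All function names and parameters are hypothetical. Only the count of eight refinement rounds is taken from the article.

```python
import random


def run_experiment(x):
    """Stand-in for the robotic lab: a made-up score that peaks at one composition."""
    a, b, c = x
    return -(a - 0.5) ** 2 - (b - 0.3) ** 2 - (c - 0.2) ** 2


def knn_predict(k, data, x):
    """Predict the score at x as the mean of its k nearest measured neighbors."""
    nearest = sorted(data, key=lambda item: sum((p - q) ** 2 for p, q in zip(item[0], x)))[:k]
    return sum(y for _, y in nearest) / len(nearest)


def loo_error(k, data):
    """Leave-one-out mean squared error of the k-nearest-neighbor model."""
    errors = []
    for i, (x, y) in enumerate(data):
        rest = data[:i] + data[i + 1:]
        errors.append((knn_predict(k, rest, x) - y) ** 2)
    return sum(errors) / len(errors)


def closed_loop(candidates, n_initial=5, n_rounds=8, batch_size=4, seed=0):
    """Alternate between model selection, candidate proposal and 'lab' testing."""
    rng = random.Random(seed)
    untested = list(candidates)
    rng.shuffle(untested)
    # Seed the loop with a small initial training set.
    measured = [(x, run_experiment(x)) for x in untested[:n_initial]]
    untested = untested[n_initial:]
    for _ in range(n_rounds):
        # Automated model selection: keep the model with the lowest validation error.
        best_k = min((1, 2, 3), key=lambda k: loo_error(k, measured))
        # Propose the untested candidates that the chosen model rates highest.
        untested.sort(key=lambda x: knn_predict(best_k, measured, x), reverse=True)
        batch, untested = untested[:batch_size], untested[batch_size:]
        # "Run" the batch and feed the results back into the training data.
        measured += [(x, run_experiment(x)) for x in batch]
    return measured


# Toy design space: 3 hypothetical monomers in 10 percent steps (66 compositions).
candidates = [(a / 10, b / 10, (10 - a - b) / 10) for a in range(11) for b in range(11 - a)]
measured = closed_loop(candidates)
best_x, best_y = max(measured, key=lambda item: item[1])
print(len(measured), best_x, best_y)
```

With these settings the loop measures 37 of the 66 toy candidates. A greedy rule like this one only exploits what the model already believes; practical active-learning schemes usually also weigh model uncertainty so that some experiments are spent exploring poorly sampled regions.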

Computational Challenge

The need to train the AI on a computational space that ballooned as the reagents combined, and re-run the computations repeatedly, surpassed the ability of any personal computer or even a typical university supercomputing “cluster.” PSC’s Bridges-2 system and its AI-optimized components offered the team powerful new graphics processing units (GPUs) and massive memory (RAM) to carry out the work.

“Even a small model space leads to intensive computing requirements,” said Gusev. “Because the calculations are done over and over again exhaustively to get maximum information from the data, we needed a lot of computing power. PSC helped us speed up this project.”

GPUs were originally designed primarily for computer games. But their unique capabilities for “parallel computing” proved to be ideal for AI research. Starting in 2012, a GPU revolution swept the AI field, powering many of the groundbreaking AI tools we now take for granted.

Bridges-2/Bridges-AI offered the team 264 NVIDIA Tesla V100 GPUs and 16 Volta V100 GPUs, the latter in a powerful DGX-2 unit. The machine enabled just the kind of state-of-the-art ML training paired with Big Data handling that they needed. Along with the Frontera system at TACC, Bridges-2 gave the project enough GPU heft to perform the massive calculations needed.

Surprising Results

Through a series of eight refinements, Gusev’s AI was able to narrow a potential 50,000 polymers to a list of only 397 that were experimentally synthesized. Going back and forth between the computer and the lab identified the best-performing of these candidates, which performed as much as 50 percent better than current MRI contrast agents.

“This is a prime example of the transfer of the decision power from human experts to AI,” Isayev noted.

These winning candidates posed a surprise to the human chemists. Clinical MRI works by detecting changes that substances in the human body create in a strong magnetic field. One family of MRI contrast agents uses the element fluorine-19 (19F), which has the ability to interact with dissolved oxygen in body fluids. This interaction can be detected in a strong magnetic signal. This signal tells doctors where oxygen is concentrated in living tissues.

Scientists had long thought that more is better in terms of “19F solution concentration”: the more 19F atoms that a contrast agent could pack in a smaller space, the better. But 19F also makes the polymer less soluble in water, and if the polymer can’t be dissolved, it can’t be injected.

The leading candidates the AI picked did contain enough 19F to create a strong signal. But their fluorine concentration was not as high as the scientists had expected. As they had hoped, the AI had found a “Goldilocks” point of just enough 19F to give a strong signal while still being soluble, a point that humans had not predicted. The result offers hope that AI-guided design can create chemical tools that surpass what human experts can design.

“The majority of the top chosen agents contain fewer amounts per molecular weight of 19F compared with known literature examples, but perform better,” said Gusev. “This was unexpected compared with naïve chemical intuition.”

The MCS team reported their results in a paper in the Journal of the American Chemical Society in November 2021. They would next like to extend the approach toward other types of polymers and organic materials.