A Galactic Choice
Artificial Intelligence Running on Bridges Surpasses Humans at Classifying Galaxies
New telescope surveys are discovering hundreds of millions of new galaxies—far more than humans can classify. A National Center for Supercomputing Applications (NCSA)-led team has employed “deep learning” artificial intelligence (AI) on the graphics processing units (GPUs) of PSC’s Bridges platform to produce a galaxy-classifying artificial intelligence with better-than-human accuracy and capacity.
Why It’s Important
Astronomers estimate there are at least 100 billion galaxies in the observable universe.
Scientists would like to get a better handle on these huge collections of stars for a number of reasons. For one, most of the mass of the universe seems to be invisible. One way we “see” the presence of this dark matter is through its effects on galaxies. Also, the motions of galaxies tell us that the expansion of the universe is accelerating. The reason for this may be that most of the energy of the universe is in an unknown form called dark energy. Astrophysical surveys such as the recent Dark Energy Survey (DES) and the upcoming Legacy Survey of Space and Time (LSST) are collecting data to study these fundamental questions.
“Cataloging all the galaxies in the universe are of fundamental interest in science for a number of reasons. For instance, combining gravitational wave observations with large scale galaxy catalogs has enabled the first gravitational wave standard-siren measurement of the Hubble constant which tells us how fast the universe is expanding … Astronomers have been trying to use AI to automate these tasks for quite some time, but traditional machine-learning algorithms, while promising, couldn’t achieve human-level accuracy.”—Asad Khan, NCSA
As a first step, scientists are studying the shapes of galaxies. The shape of a galaxy tends to be strongly intertwined with the history of its evolution. Shape also sheds light on a galaxy’s star-formation rate, past mergers and interactions with other galaxies and so on.
The logical starting point for astronomers in modern surveys is to classify and sort the vast number of galaxies observed. The main classification is whether a galaxy is a spiral, with curving arms like the Milky Way, or an elliptical, which looks like a uniform ball of stars.
This simple task is enormous. Astronomers initially turned to crowdsourcing to solve it. One highly successful effort was Galaxy Zoo. It used thousands of volunteers to classify galaxies. They classified 900,000 in the project’s first phase. Volunteers will continue to have a role. But newer surveys of farther-away galaxies will dwarf that effort. The earlier Sloan Digital Sky Survey (SDSS) identified 50 million galaxies. The DES has identified more than 300 million. Even with thousands of volunteers, astronomers could never classify that many.
Graduate student Asad Khan, his advisor Eliu Huerta, and colleagues at NCSA at the University of Illinois Urbana-Champaign, as well as at Argonne National Laboratory, decided to solve this problem using deep learning on Bridges.
To gain astronomers’ confidence, the NCSA team demonstrated how their AI was classifying galaxies.
Image caption: Visualizing how the AI classified galaxies helped give astronomers confidence. Shown are the classifications of the labeled Dark Energy Survey test set and the Sloan Digital Sky Survey test set, along with the AI's predictions for unlabelled galaxies.
How PSC Helped
Previous attempts to apply AI to galaxy classification couldn’t achieve human-level accuracy. To improve on that, the NCSA-led team turned to a type of machine learning called deep learning (DL). In DL, the computer learns a representation of the data, using a multi-level artificial neural network. Bridges was particularly useful for their work because its NVIDIA Tesla P100 GPUs were the most advanced processors optimized for deep learning at the time.
For the data set, the scientists used a subset of the SDSS classified by the volunteers of Galaxy Zoo and verified as being above 90 percent accurate. They divided the data into three subsets: a roughly 36,000-galaxy training data set, a 1,000-galaxy validation data set and a 12,500-galaxy testing data set. They chose the latter two data sets so that the galaxies in them lie in parts of the sky that both the SDSS and the DES had surveyed, taking advantage of the lessons learned by the earlier study.
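A three-way split of this kind can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the team's actual pipeline: the record fields `id`, `label` and `in_des_overlap` are hypothetical, the toy catalog is invented, and sending unused overlap galaxies to the training set is a choice made here for simplicity.

```python
import random

def split_catalog(galaxies, n_val, n_test, seed=0):
    """Split galaxy records into training, validation and testing lists.

    Validation and testing galaxies are drawn only from records flagged as
    lying in the region surveyed by both SDSS and DES. All remaining records
    (including unused overlap galaxies, an assumption of this sketch) go to
    the training list.
    """
    overlap = [g for g in galaxies if g["in_des_overlap"]]
    if len(overlap) < n_val + n_test:
        raise ValueError("not enough overlap galaxies for the requested split")
    rng = random.Random(seed)          # fixed seed keeps the split reproducible
    rng.shuffle(overlap)
    val = overlap[:n_val]
    test = overlap[n_val:n_val + n_test]
    held_out = {g["id"] for g in val + test}
    train = [g for g in galaxies if g["id"] not in held_out]
    return train, val, test

# Toy catalog of 60 records; every third one sits in the hypothetical overlap.
catalog = [
    {"id": i,
     "label": "spiral" if i % 2 else "elliptical",
     "in_des_overlap": i % 3 == 0}
    for i in range(60)
]
train, val, test = split_catalog(catalog, n_val=5, n_test=10)
print(len(train), len(val), len(test))
```

Restricting the validation and testing lists to the overlap region means every held-out galaxy can be looked up in both surveys, which is the property the paragraph above describes.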
“In order to accelerate the adoption of AI tools for big-data analytics, it is essential to understand how these algorithms process data and extract information to make trustworthy predictions. For this article, we first designed AI algorithms that significantly outperform humans at classification and data labelling tasks, and then produced scientific visualizations that shed new and detailed information about how neural networks perform these tasks.”—Eliu Huerta, NCSA
In the testing phase, the AI matched the Galaxy Zoo classifications 85 percent of the time. But when the scientists adjusted for the known error rate in Galaxy Zoo, they found their AI was over 99 percent accurate, better than the humans. As a last step, the scientists applied their AI to predict galaxy types in a set of about 10,000 not-yet-labelled galaxies. In addition, they had built their AI so that its processes for classifying the galaxies could be examined by humans. This step, which explained how the AI works, was important for convincing astronomers that the AI’s methods can be trusted.
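The "matched the Galaxy Zoo classifications" figure is an agreement rate: the fraction of test galaxies for which the AI's label equals the volunteers' label. A minimal sketch of that calculation follows, using invented labels. The function names are hypothetical, and the team's adjustment for Galaxy Zoo's known error rate is not described in this article, so it is not reproduced here.

```python
from collections import Counter

def agreement_rate(predicted, reference):
    """Fraction of galaxies where the predicted label equals the reference label."""
    if len(predicted) != len(reference):
        raise ValueError("label lists must have the same length")
    if not predicted:
        raise ValueError("label lists must not be empty")
    matches = sum(p == r for p, r in zip(predicted, reference))
    return matches / len(predicted)

def confusion_counts(predicted, reference):
    """Count (reference label, predicted label) pairs to show where labels differ."""
    return Counter(zip(reference, predicted))

# Invented example: four galaxies, one disagreement.
ai_labels = ["spiral", "spiral", "elliptical", "elliptical"]
zoo_labels = ["spiral", "elliptical", "elliptical", "elliptical"]

rate = agreement_rate(ai_labels, zoo_labels)
counts = confusion_counts(ai_labels, zoo_labels)
print(rate)
print(counts)
```

The confusion counts show which class the disagreements fall in, which is the kind of information needed before deciding whether a mismatch reflects an error by the AI or by the reference labels.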
The team reported their results in the journal Physics Letters B in August 2019. They presented their visualization the following November at the SC19 conference. Future work will apply the method to larger groups of unidentified galaxies, automating galaxy identification to keep pace with the hundreds of millions expected to be discovered in the near future. The team has also begun using Bridges-AI, whose NVIDIA Tesla V100 GPUs are currently the most advanced GPUs for deep learning. The platform’s NVIDIA DGX-2 enterprise AI research system enables high-performance deep learning across sixteen V100s.