Finding Cause

CMU Group Uses PSC’s Bridges to Nail Down Cause in Brain-Region Activity

April 3, 2018

Why It’s Important:

At first blush, functional magnetic resonance imaging (fMRI) seems magical. Doctors can put a person in an fMRI scanner and watch their brain work as they think, perform simple mental tasks or just sit and daydream. But the field has also seen its controversies. Scientists have doubts about what fMRI can teach us about how the brain works. Part of the problem is how to identify cause and effect when so much activity in the brain is going on at the same time.

Clark Glymour, Joseph Ramsey and their colleagues in Carnegie Mellon University’s Department of Philosophy are trying to tackle this problem with a powerful set of statistical tools. Their work, as part of the Center for Causal Discovery (CCD) at Pitt, CMU and PSC, focuses on the difference between correlation and causation. Correlation is when events tend to happen together—for example, back in the days of unfiltered cigarettes, smokers tended to have yellow-stained fingers. They also tended to develop lung cancer. Yellow fingers and cancer were correlated. But neither caused the other. Instead, as we know today, smoking cigarettes caused both yellow fingers and lung cancer. The difference is important. You can’t prevent lung cancer by giving smokers better hand soap. You have to tackle the common cause: smoking causes damage to lung tissues that leads to cancer and other diseases. Its relationship with cancer is causal.
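The common-cause idea can be made concrete with a small simulation. The sketch below is only an illustration: the probabilities are invented for the example, not taken from any study, and the helper names `pearson` and `simulate` are hypothetical. It generates people whose yellow fingers and lung cancer each depend only on smoking, then compares the correlation across everyone with the correlation inside the smoker and non-smoker groups.

```python
import random

def pearson(xs, ys):
    """Plain Pearson correlation between two equal-length lists of numbers."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    vx = sum((x - mx) ** 2 for x in xs)
    vy = sum((y - my) ** 2 for y in ys)
    return cov / (vx * vy) ** 0.5

def simulate(n=50_000, seed=0):
    """Smoking is the only cause of both outcomes (made-up probabilities)."""
    rng = random.Random(seed)
    rows = []
    for _ in range(n):
        smoker = 1 if rng.random() < 0.30 else 0
        yellow = 1 if rng.random() < (0.60 if smoker else 0.05) else 0
        cancer = 1 if rng.random() < (0.20 if smoker else 0.02) else 0
        rows.append((smoker, yellow, cancer))
    return rows

rows = simulate()
smokers = [r for r in rows if r[0] == 1]
nonsmokers = [r for r in rows if r[0] == 0]

# Across everyone, yellow fingers and cancer move together.
overall = pearson([r[1] for r in rows], [r[2] for r in rows])
# Within each smoking group, the association should nearly vanish.
within_smokers = pearson([r[1] for r in smokers], [r[2] for r in smokers])
within_nonsmokers = pearson([r[1] for r in nonsmokers], [r[2] for r in nonsmokers])

print(f"overall: {overall:.3f}")
print(f"smokers only: {within_smokers:.3f}")
print(f"non-smokers only: {within_nonsmokers:.3f}")
```

Once smoking is held fixed, knowing whether someone has yellow fingers says almost nothing about their cancer risk, which is the signature of a common cause rather than a direct causal link.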

“Understanding connectivity in the brain is just about the most extraordinarily difficult statistical problem that you can imagine … Nothing is more fun for me than realizing we can do something that five years ago people said was impossible.”—Clark Glymour, Carnegie Mellon University

The CMU scientists wanted to test a statistical method called fast greedy equivalence search (fGES), which is designed to detect causal relationships rather than only correlations, on fMRI brain scans. But they had a Big Data problem to deal with. The fMRI method can scan the brain in bits, called “voxels,” as small as a cubic millimeter. That’s not much more than the size of a pinhead. To test whether activity in some parts of the brain was causing activity in other parts, they needed to test the mutual interactions among about 51,000 voxels. That’s more than 2.6 billion possible interactions to consider, with an average of four bytes of data measured for each. Calculations on those data grow much larger, and the algorithm couldn’t run on a cloud or a cluster (a high performance computer system, or HPC, made up of many smaller, commodity computers), let alone a laptop.
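The size of the problem follows from simple arithmetic. The short sketch below assumes that the “2.6 billion” figure counts every voxel paired with every voxel (51,000 squared), which is the reading that matches the number quoted; the variable names are illustrative only.

```python
n_voxels = 51_000
bytes_per_interaction = 4  # average size quoted in the article

# Every voxel paired with every voxel (assumed reading of "2.6 billion").
interactions = n_voxels * n_voxels
raw_bytes = interactions * bytes_per_interaction

print(f"{interactions:,} interactions")
print(f"{raw_bytes / 1e9:.1f} GB of raw values")
```

Under that assumption the raw values alone come to roughly 10 gigabytes, before any of the intermediate calculations that, as noted above, grow much larger.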

How PSC and XSEDE Helped:

To help the CMU team solve their problem, Nick Nystrom, PSC interim director and the CCD co-investigator at PSC, recommended first Greenfield and then the even more powerful XSEDE-allocated Bridges, both HPC systems at PSC. Before turning to real scans, the scientists needed to test their “discovery procedure.” They did that by programming a much larger, simulated causal system with a million variables on Greenfield and then Bridges, using the PSC systems to simulate signals from all those variables.

Ramsey, the team’s computational specialist, built a network of causal relationships between simulated brain areas that approximated what doctors were seeing in real brains. Eventually, they simulated the influences among one million voxels. Applying their causal discovery algorithm to the simulated data, the scientists found that fGES was able to discover the causal relations behind the data with excellent accuracy. With this simulation result in hand, the group applied the fGES algorithm to data from 51,000 voxels of the brain of a subject resting in the magnetic resonance machine. The result is an estimate of effective connections in the entire human cortex—the outer part of the brain responsible for higher functions.

Bridges’ large-memory (LM) nodes were critical in the scaled-up simulation. The team’s work with the 128-gigabyte nodes of Greenfield (eight times the RAM in a high-end personal computer, and typical of HPC systems) sometimes crashed because of insufficient memory. But the calculations ran without this problem on Bridges’ 3-terabyte LM nodes, which have 24 times as much memory. Better still, once Ramsey had the simulation working on Bridges, other members of the group were able to use the system without further refinement.
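The memory comparisons in this paragraph can be checked in a few lines. This sketch assumes binary units (1 terabyte = 1,024 gigabytes), which is the convention under which the quoted factor of 24 is exact; the 16-gigabyte personal computer is implied by the “eight times” comparison rather than stated.

```python
greenfield_node_gb = 128
bridges_lm_node_gb = 3 * 1024  # 3 TB, assuming 1 TB = 1,024 GB

# RAM of the "high-end personal computer" implied by the eight-fold comparison.
implied_pc_gb = greenfield_node_gb / 8
# How much more memory a Bridges LM node has than a Greenfield node.
lm_ratio = bridges_lm_node_gb / greenfield_node_gb

print(f"implied PC RAM: {implied_pc_gb:.0f} GB")
print(f"Bridges LM vs. Greenfield node: {lm_ratio:.0f}x")
```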

“We’re working with pretty big numbers here. The memory capacity and parallelization capacity of the PSC systems is essential. Thanks to PSC, it took only one work day—12 hours—[of computation]. If you think of it as a percentage of the time invested in a research project, you want to say, ‘Thank you.’”—Clark Glymour, Carnegie Mellon University

The CMU team was able to compare fGES with several other possible methods on the simulated data, finding that fGES identified which simulated brain areas were causally related with over 90 percent accuracy. It also correctly identified the directions of those relationships (which of the two voxels caused activity in the other) over 80 percent of the time. This performance was the best of any statistical method tested to date.
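To show what the two kinds of accuracy measure, here is a toy scoring sketch. It is not the team’s evaluation code: the article does not say exactly how accuracy was defined, so the precision-style definitions, the function name `score` and the five-node example graph are all assumptions made for illustration.

```python
def score(true_edges, estimated_edges):
    """Compare directed edge sets; each edge is a (cause, effect) pair.

    Assumed definitions (not from the article):
      adjacency precision   = estimated links that connect a truly linked pair
      orientation precision = of those, the share pointing the right way
    """
    true_pairs = {frozenset(e) for e in true_edges}
    est_pairs = {frozenset(e) for e in estimated_edges}
    shared_pairs = true_pairs & est_pairs

    adjacency_precision = len(shared_pairs) / len(est_pairs)
    right_direction = sum(1 for e in estimated_edges if e in true_edges)
    orientation_precision = right_direction / len(shared_pairs)
    return adjacency_precision, orientation_precision

# Hypothetical ground truth and a hypothetical estimate from some algorithm.
true_edges = {("A", "B"), ("B", "C"), ("C", "D"), ("A", "D")}
estimated_edges = {("A", "B"), ("C", "B"), ("C", "D"), ("A", "E")}

adjacency_precision, orientation_precision = score(true_edges, estimated_edges)
print(f"adjacency precision: {adjacency_precision:.2f}")
print(f"orientation precision: {orientation_precision:.2f}")
```

In this toy case three of the four estimated links join truly connected nodes, and two of those three point in the right direction. A simulation makes this kind of scoring possible because the true causal network is known in advance.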

The team’s current goal is to create a more sophisticated causal map of the entire brain, charting out forward and backward communication between brain voxels and both direct and indirect relationships. Such a map would be an important piece of understanding how the brain works, along with the anatomical wiring maps and the genetic changes in brain cells being discovered by other methods. They say that PSC’s HPC systems will be indispensable to the upcoming work.

Related Reading:

Zebrafish Study Reveals First Fine Structure of a Complete Vertebrate Brain

CMU, PSC and Pitt to Build Brain Data Repository

Using “Cloud”-Based “Backfill Cycles” on XSEDE Machines Enables Very High Resolution Functional Brain Imaging