Tom Mitchell, Carnegie Mellon University
Indrayana Rustandi
In his thesis, Rustandi acknowledged PSC scientists Raghu Reddy and Joel Welling. “The thesis research became tremendously more productive,” says Rustandi, “in the beginning of 2009 when I finally had working access to Star-P running on Pople at PSC. Thanks to Joel Welling for facilitating access to Pople, and to Raghu Reddy for all his help early on resolving issues with Star-P and for being a liaison to the Star-P developers.”

Relying on PSC consulting, software and hardware resources, a Carnegie Mellon computer model that predicts what you’re thinking gets even smarter

How do you know what you know? How does your brain, for instance, hold onto the meaning of words, so you can form thoughts? Or — to focus the question more sharply — what parts of your brain become active when you read and think about a word? Which regions of your brain, which neurons, are activated, for instance, when you think about the noun celery?

Now try a different noun. Airplane. Surely your brain does something at least a little differently than when you thought celery. Suppose there were a computer connected to your brain that could detect which brain cells are firing: a very smart, cryptanalytic computer that could tell from the patterns of activated neurons what you’re thinking, a computer that could read minds. Science fiction?

Tom Mitchell, who chairs the Department of Machine Learning at Carnegie Mellon University — the first such department in the world — and his collaborators have shown that such a mind-reading computer, though a long reach from current reality, is within the realm of the possible. Their intriguing experiments with functional magnetic resonance imaging (fMRI) and the meaning of nouns — reported in Science (May 2008) — showed that a computer model can predict with 77-percent accuracy whether you’re thinking celery or airplane.

Their paper, titled “Predicting Human Brain Activity Associated with the Meanings of Nouns,” captured the attention of many scientists, and laypeople too — especially after Mitchell and his CMU colleague Marcel Just demonstrated their model’s ability in an episode of “60 Minutes” (Jan. 4, 2009). Since then, the model has become even smarter.

A person undergoes a brain scan in an MRI scanner (left) while his brain activity as he thinks about specific nouns displays on the monitor (right).

For his Ph.D. thesis supervised by Mitchell and completed this year at CMU, computer scientist Indrayana Rustandi — now working on Wall Street — found a way to improve the model, and it can now predict whether you’re thinking celery or airplane (or other choices between two nouns among a list of 60) with 85-percent accuracy. Working with PSC staff and making effective use of PSC’s shared-memory SGI system, Pople, Rustandi designed an algorithm that makes it possible to integrate fMRI datasets, basically images of the brain in action, from different people, each with their own individual brain-activation patterns.

The result: the model is freed from the limitation of using fMRI images from a single individual to predict that individual’s responses. “This improvement,” says Mitchell, “that Indra was able to make using PSC resources is really important to this line of work.”

Doing the Two Step

How can a computer read minds? As presented in Science, prior to Rustandi’s work, the model implemented a two-step algorithm, the first step of which derives from research by computational linguists. The statistics of word association show that a word’s meaning can be represented as a statistical relationship to words and phrases with which it commonly occurs. A noun like breakfast, for instance, often occurs in close association with the verb eat and also, less often, with drink.

Drawing from this idea, the model used a list of 60 concrete nouns (e.g., celery, airplane, telephone, screwdriver), and for each of them searched a huge lexical database (from Google) and gathered data on how each of the nouns correlated with a list of 25 sensory verbs (see, hear, listen, taste, smell, etc.). From this data, the model encoded each noun as a collection of statistical “semantic features.”
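
One way to picture this encoding step, as a minimal sketch: count how often each noun appears near each verb, then scale the counts to unit length. The tiny corpus, the three-verb list, the window size and the function name below are all made up for illustration. The study used 25 verbs and a far larger text collection, and its exact normalization may differ.

```python
import math

# Hypothetical toy inputs; the study used 25 verbs and a very large text corpus.
VERBS = ["eat", "taste", "ride"]
CORPUS = [
    "we eat celery and taste celery soup",
    "they ride the airplane and eat on the airplane",
    "i taste the celery",
]

def semantic_features(noun, corpus, verbs, window=3):
    """Count verb occurrences within `window` words of `noun`, then scale to unit length."""
    counts = [0] * len(verbs)
    for sentence in corpus:
        words = sentence.split()
        for i, word in enumerate(words):
            if word != noun:
                continue
            # Words within `window` positions on either side of this occurrence.
            nearby = words[max(0, i - window): i + window + 1]
            for j, verb in enumerate(verbs):
                counts[j] += nearby.count(verb)
    norm = math.sqrt(sum(c * c for c in counts))
    return [c / norm for c in counts] if norm else [0.0] * len(verbs)

celery = semantic_features("celery", CORPUS, VERBS)
airplane = semantic_features("airplane", CORPUS, VERBS)
```

In this toy corpus, celery ends up weighted toward taste and eat, while airplane picks up ride, which is the kind of contrast the semantic features are meant to capture.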

Images from fMRI show brain activation (colored dots) overlaid on a transverse slice of the corresponding structural brain image (grayscale).

The model’s second step relies on image data from fMRI experiments. For the results reported in Science, nine college-age participants viewed each of the 60 nouns, along with an associated picture, while in an MRI scanner. The researchers repeated this process six times for each participant, with the nouns in random order. For each participant, then, the model found a statistical mean fMRI image for each of the 60 nouns — building a separate dataset for each participant.
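
The averaging step can be sketched as follows, assuming each scan has already been reduced to a list of voxel intensities. The data layout, the function name and the numbers are hypothetical; the real images had many thousands of voxels, 60 nouns and six repetitions per noun.

```python
from collections import defaultdict

def mean_images(trials):
    """Average the voxel vectors of all trials that share a noun.

    `trials` is a list of (noun, voxel_values) pairs for one participant.
    """
    sums = {}
    counts = defaultdict(int)
    for noun, voxels in trials:
        if noun not in sums:
            sums[noun] = [0.0] * len(voxels)
        for k, value in enumerate(voxels):
            sums[noun][k] += value
        counts[noun] += 1
    return {noun: [s / counts[noun] for s in total] for noun, total in sums.items()}

# Hypothetical data: 2 nouns, 3 voxels, 2 repetitions each.
trials = [
    ("celery", [1.0, 0.0, 2.0]),
    ("airplane", [0.0, 4.0, 1.0]),
    ("celery", [3.0, 0.0, 0.0]),
    ("airplane", [2.0, 2.0, 1.0]),
]
per_noun = mean_images(trials)
```

Running this once per participant yields the separate per-participant datasets described above.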

Using these fMRI datasets, the model then trains itself to predict the fMRI image associated with a given set of semantic features. How do we know if the training worked? If the model has learned? The test, explains Mitchell, is to train it on 58 of the nouns, matching semantic features with fMRI activation patterns, then present it with fMRI images from the other two nouns in the list and have it decide which image (from nouns it doesn’t yet know) goes with which noun. “It got that right 77 percent of the time in the original publication, and with some optimizations we got it up to 79 percent. That means the model is predicting something correctly about the neural activity in the brain, though not perfectly, and that’s where we were.”
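
The leave-two-out test can be sketched roughly as below. This is an illustration under stated assumptions, not the researchers' code: it uses NumPy, synthetic data in place of real scans, an ordinary least-squares fit as the trained model, and cosine similarity to compare predicted and observed images. All names are hypothetical.

```python
import itertools
import numpy as np

def cosine(a, b):
    """Cosine similarity between two voxel vectors."""
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))

def leave_two_out_accuracy(features, images):
    """Fraction of held-out noun pairs whose images are matched to the right nouns."""
    n = len(features)
    pairs = list(itertools.combinations(range(n), 2))
    correct = 0
    for i, j in pairs:
        train = [k for k in range(n) if k not in (i, j)]
        # Fit a linear map from semantic features to voxel activations
        # using only the nouns that are not held out.
        weights, *_ = np.linalg.lstsq(features[train], images[train], rcond=None)
        pred_i, pred_j = features[i] @ weights, features[j] @ weights
        # Compare the correct pairing of predictions and images with the swapped one.
        right = cosine(pred_i, images[i]) + cosine(pred_j, images[j])
        wrong = cosine(pred_i, images[j]) + cosine(pred_j, images[i])
        correct += right > wrong
    return correct / len(pairs)

# Synthetic stand-in data: 12 nouns, 4 semantic features, 30 voxels.
rng = np.random.default_rng(0)
features = rng.normal(size=(12, 4))
true_map = rng.normal(size=(4, 30))
images = features @ true_map + 0.1 * rng.normal(size=(12, 30))
accuracy = leave_two_out_accuracy(features, images)
```

A model that guessed at random would score about 50 percent on this test, which is the baseline against which figures such as 77 percent should be read.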

Looking for ways to improve the model, Mitchell and his colleagues wanted to overcome the limitation of separate datasets for each participant. “Because we use this kind of algorithm where you train it on data, we felt like the limiting factor was that we were data starved. The model is predicting something that involves 20,000 different locations in your brain, but we had only 60 words of data from each person.” One possible direction was to combine fMRI datasets from different people, to give the model more information from which to train itself.

One Big Brain

In 2009, Rustandi began working with PSC staff to get the model to run on PSC’s Pople, which allowed the entire dataset (expanded to 20 human subjects) to reside in memory simultaneously, a significant advantage. The model, however, was written in MATLAB, a serial processing software environment. To gain the benefit of parallelism and the improved performance it makes possible, Rustandi turned to Star-P, proprietary software for which PSC holds a license, that allows a MATLAB program to run on a parallel system such as Pople.

With computational tools in place, Rustandi explored various approaches to integrating datasets. A major problem he faced is that, like fingerprints, no two brains are alike. The simplistic approach is to try to register all the subjects’ brains within a common spatial framework and pool all the data as if it were all from one brain. Rustandi tried several such approaches and achieved no significant improvement in prediction accuracy. “A major challenge in doing predictive analysis with fMRI data from multiple subjects and multiple studies,” says Rustandi, “is having a model that can effectively account for variations among subjects’ brains.”

His breakthrough came with “canonical correlation analysis” (CCA), a statistical method that finds linear combinations of multiple sets of variables that are maximally correlated with each other. “CCA isn’t a new method,” says Rustandi, “but the application to fMRI hasn’t been widespread.” Unlike the spatial-registration approaches, CCA has the advantage that it integrates the datasets without disturbing the essential distinctness of the data for each subject. The data doesn’t have to fit into a mold.
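
For readers who want to see the core idea in code, here is a minimal sketch of classical CCA between just two datasets, written with NumPy. It is not Rustandi's algorithm, which integrated many subjects at once. The sketch also assumes far fewer voxels than nouns, so that no regularization is needed, and it uses synthetic data in which two "subjects" share two hidden factors. All names are hypothetical.

```python
import numpy as np

def cca(x, y):
    """Classical two-set CCA via QR and SVD (assumes more rows than columns, full rank)."""
    x = x - x.mean(axis=0)
    y = y - y.mean(axis=0)
    qx, rx = np.linalg.qr(x)   # orthonormal basis for each dataset's column space
    qy, ry = np.linalg.qr(y)
    u, corrs, vt = np.linalg.svd(qx.T @ qy)
    k = min(x.shape[1], y.shape[1])
    a = np.linalg.solve(rx, u[:, :k])      # projection weights for x
    b = np.linalg.solve(ry, vt.T[:, :k])   # projection weights for y
    return a, b, corrs[:k]

# Synthetic stand-in data: 60 nouns, two subjects with 5 and 6 voxels each.
rng = np.random.default_rng(1)
shared = rng.normal(size=(60, 2))  # latent factors common to both subjects
subj1 = shared @ rng.normal(size=(2, 5)) + 0.2 * rng.normal(size=(60, 5))
subj2 = shared @ rng.normal(size=(2, 6)) + 0.2 * rng.normal(size=(60, 6))

a, b, corrs = cca(subj1, subj2)
proj1 = (subj1 - subj1.mean(axis=0)) @ a  # subject 1 in the shared space
proj2 = (subj2 - subj2.mean(axis=0)) @ b  # subject 2 in the shared space
```

Each subject keeps its own voxels and its own projection weights; only the projected features are required to line up, which is the sense in which the data doesn't have to fit into a mold.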

These two schematics compare the initial predictive model (top image) with the CCA approach (bottom image) developed by Rustandi, which finds underlying common factors in different fMRI datasets.

CCA in effect, explains Mitchell, looks at the different datasets and mathematically replaces the semantic features derived from the 25 verbs with a combination of intensities computed from the fMRI images, and these replacement features are maximally correlated across the 20 brains. “It’s a way of framing the question we were really interested in, which is ‘What features can represent word meaning across all the fMRI datasets and are the best set of features possible in how they map to the fMRI data?’ This algorithm that Indra developed based on CCA is a very nice way to solve that, but it’s computationally intensive, and we needed PSC.”

The improved accuracy with the CCA integrated dataset is substantial, adds Mitchell. “In terms of error, it’s reduced from 21 percent to 15 percent.” In other words, Rustandi removed nearly a third of the room left for improving the model. In future work, Mitchell and his colleagues plan to apply the CCA method to integrate fMRI data from different studies, even when the datasets aren’t drawn from an identical set of nouns. That would further free fMRI studies from the limitation of sparse data relative to the complexity of the phenomenon studied, the human brain.
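
The arithmetic behind that comparison is simple enough to check directly. The snippet below uses only the accuracy figures quoted in this article; the function name is made up for illustration.

```python
def relative_error_reduction(acc_before, acc_after):
    """Share of the remaining error that an accuracy improvement removes."""
    err_before = 1.0 - acc_before
    err_after = 1.0 - acc_after
    return (err_before - err_after) / err_before

# 79 percent accuracy before CCA, 85 percent after.
reduction = relative_error_reduction(0.79, 0.85)
```

Going from 21 percent error to 15 percent removes 6 of 21 points, or a little under 30 percent of the error that remained.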

The idea that there’s commonality among different people in how our brains represent similar thoughts, which the model tends to confirm, is important, Mitchell believes, not only for demonstrating what a computer model can do but also, more importantly, for understanding how the brain works. “It bodes well,” he says, “for the feasibility of developing a real theory of how the brain represents things. If there’s something in common, we can aspire to develop a unifying theory.”
