Recovered History


Massive Data Analysis Uncovering Black Women's Experiences

The saying goes that “history is written by the victors.” But it’s probably truer to say it’s written by the people who have the opportunity to write. One example of this is the study of Black women, their lives and their experiences. Many earlier documents don’t mention them directly, though these works may offer clues. Those that do mention Black women are often historically obscure, hidden away in vast library collections and titled or cataloged in ways that are unintentionally misleading. Until recently researchers had no good way of recovering this “lost history.”

Ruby Mendenhall of the University of Illinois at Urbana-Champaign is leading a collaboration of social scientists, humanities scholars and digital researchers that hopes to harness the power of high performance computing to find and understand the historical experiences of Black women, searching two massive databases of written works from the 18th through 20th centuries. The work also offers a common toolbox that can help other digital humanities projects.

“With a Big Data approach we get a chance to make use of hundreds of thousands of texts—journals, books, periodicals,” says Mendenhall. “The number is greater than what you would normally be able to look at during an entire career.”


Mendenhall’s team realized that to search tens or even hundreds of thousands of books, articles and letters, they’d need considerably more computing power than is available on a typical campus “cluster” of commodity computers. They consulted with colleagues on campus who were members of the NSF’s XSEDE network of supercomputing centers. With these new collaborators, they identified PSC’s now-retired Blacklight supercomputer as a good fit for their project. With help from PSC’s Sergiu Sanielevici, they adapted their earlier work to Blacklight and then moved the project to PSC’s interim Greenfield system, a precursor to the new Bridges system.

“We chose Blacklight specifically because the tools we’re using need huge amounts of shared memory,” a strength of that system, says Mark Van Moer, an XSEDE staff member at the National Center for Supercomputing Applications at the University of Illinois who worked as the team’s visualization specialist. PSC’s continued focus on memory-intensive computing with Greenfield, and soon the new Bridges system, supports such work well.

Using Blacklight, the researchers analyzed 20,000 documents in the HathiTrust and JSTOR databases that were known to contain information about Black women, and used them to create a computational model. They’re now using this model on Greenfield to study all 800,000 documents in both databases.



To make sense of the numbers, the investigators turned to two sets of computational techniques: topic modeling and data visualization.

Topic modeling looks at how often certain key words associate with each other. For example, a book that contains the word “negro” (at the time considered the most respectful term to describe Black men and women), the word “vote” and the word “women” might offer clues about Black women’s participation in the women’s suffrage movement. Mike Black, then at the University of Illinois and currently at the University of Massachusetts, headed this topic-modeling project for the team.
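The idea of key words “associating with each other” can be illustrated with a minimal Python sketch. It is not the team’s method, which the article does not specify; it only counts how many documents contain both words of each key-word pair, a much simpler stand-in for a real topic model. The function name, the toy corpus and the key-word list are all hypothetical.

```python
from collections import Counter
from itertools import combinations
import re


def keyword_cooccurrence(documents, keywords):
    """For each pair of key words, count the documents containing both."""
    wanted = {k.lower() for k in keywords}
    pair_counts = Counter()
    for text in documents:
        # Reduce the document to its set of distinct lowercase words.
        tokens = set(re.findall(r"[a-z']+", text.lower()))
        present = sorted(wanted & tokens)
        # Every pair of key words found in this document co-occurs once.
        for pair in combinations(present, 2):
            pair_counts[pair] += 1
    return pair_counts


# Hypothetical toy corpus, invented for illustration only.
docs = [
    "The women of the club argued for the vote.",
    "Women organized to win the vote in the state.",
    "The harvest was poor that year.",
]
counts = keyword_cooccurrence(docs, ["women", "vote", "club"])
for pair, n in counts.most_common():
    print(pair, n)
```

Pairs that co-occur in many documents are candidates for a shared topic; a production topic model would go further and infer the word groupings statistically rather than starting from a fixed key-word list.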

“We’re hoping, in the next stage, to ramp up and check these topics against the larger corpus of works,” Mendenhall adds. “Do the ‘recovery’ part.”

Van Moer’s data visualization work is building displays that help humans make more intuitive sense of the results. A “tree map” displays key words in boxes sized according to each word’s frequency. A “network graph” charts how often key words appear close to each other, also offering insight into how those words are being used and what they mean in context.
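The quantities behind these two displays can be sketched in a few lines of Python. This is an illustration under stated assumptions, not the team’s code: the function names, the sample sentence and the `window` parameter (how many words apart two key words may be and still count as “close”) are all hypothetical. Word frequencies would set the box sizes in a tree map, and the proximity counts would become edge weights in a network graph.

```python
from collections import Counter
import re


def tokenize(text):
    """Split text into lowercase word tokens."""
    return re.findall(r"[a-z']+", text.lower())


def keyword_frequencies(text, keywords):
    """Count occurrences of each key word (tree-map box sizes)."""
    wanted = {k.lower() for k in keywords}
    return Counter(t for t in tokenize(text) if t in wanted)


def proximity_edges(text, keywords, window=5):
    """Count key-word pairs appearing within `window` words of each other
    (network-graph edge weights)."""
    wanted = {k.lower() for k in keywords}
    positions = [(i, t) for i, t in enumerate(tokenize(text)) if t in wanted]
    edges = Counter()
    for a, (i, first) in enumerate(positions):
        for j, second in positions[a + 1:]:
            if j - i > window:
                break  # positions are sorted, so later ones are farther away
            if first != second:
                edges[tuple(sorted((first, second)))] += 1
    return edges


# Hypothetical sample text, invented for illustration only.
sample = "Women in the club met weekly. The club urged women to vote."
terms = ["women", "club", "vote"]
freqs = keyword_frequencies(sample, terms)
edges = proximity_edges(sample, terms, window=4)
print(freqs)
print(edges)
```

The resulting counters could be handed to any plotting library to draw the actual tree map or graph; the choice of window size is a judgment call that changes which words count as related.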

Yet another visualization technique plots key terms in histographs that allow users to track the emergence and prominence of a given topic over time.
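Tracking a term over time amounts to counting its mentions per time period. The sketch below groups documents by decade and prints a simple text bar for each one. The function name, the publication years and the texts are hypothetical, and grouping by decade is an assumption; the article does not say what time resolution the team uses.

```python
from collections import Counter
import re


def mentions_by_decade(dated_docs, term):
    """Total the occurrences of `term` in documents grouped by decade."""
    term = term.lower()
    counts = Counter()
    for year, text in dated_docs:
        decade = (year // 10) * 10
        counts[decade] += re.findall(r"[a-z']+", text.lower()).count(term)
    return dict(sorted(counts.items()))


# Hypothetical (year, text) pairs, invented for illustration only.
dated_docs = [
    (1912, "The vote was debated; the vote was delayed."),
    (1918, "They demanded the vote."),
    (1925, "The club met in the spring."),
]
timeline = mentions_by_decade(dated_docs, "vote")
for decade, n in timeline.items():
    print(f"{decade}s {'#' * n} {n}")
```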



An initial finding on Blacklight confirmed the prediction that the same documents referenced the post-World War I Black Women’s Club and New Negro movements. This raises interesting questions about how the two movements, which historians knew were contemporaneous, may have interacted. The Illinois researchers hope to begin answering these questions in their current work on Greenfield, as well as proposed work on Bridges.

“The beauty of computation and Big Data lies in how it complements the traditional close reading,” says Nicole Brown, a postdoctoral fellow in Mendenhall’s group who is interpreting the computational results in light of Black feminist theory. “The two methods complement each other to give you a full picture of what’s going on.”

“Working with the social science and humanities people has been a real eye opener in a lot of ways,” adds Van Moer. “In the previous seven years I pretty much worked with physical scientists. Humanities and social science researchers have to be worried about not just what the numbers mean at a surface level. They have a whole theory behind how you go about interpreting things as it relates to the larger society—that’s really an interesting aspect of the project for me.”

Another of the group’s goals will be to create a set of computational tools that researchers in many fields will be able to use to search various texts for topics of interest—and to understand how those topics interrelate. Topic modeling and visualization methods can be modules in a larger toolbox for digital humanities research.

“We’re generally interested in Black women and their life experience,” Mendenhall says. “But we also see this as a tool that social scientists and people in the humanities can use to study many topics.”
