This post was going to be an introduction to Hadoop, but I found there are many, many, many pages doing just that. (There's also an hour-long O'Reilly webcast, which I highly recommend.) Instead, I thought I'd give you a little tour of what Hadoop does for us at PSC.
Hadoop allows us to make better use of our newly upgraded disk storage system. The tricky part is that we are now tracking approximately 4,000 disk drives, seven servers, and Ethernet ports. Each of these items generates elephantine log files that would otherwise be impractical to analyze. This is where the little yellow elephant, Hadoop, comes in.
With Hadoop, we can easily analyze the vast amounts of information coming in from all of the disks, servers, and ports. (Not that we have failed disks, but you know, just in case!) We can follow trends and pinpoint emerging issues.
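For readers curious what "analyzing the logs" looks like in Hadoop terms, here is a minimal local sketch of the MapReduce pattern, assuming a hypothetical log format of `<timestamp> <disk_id> <event>`. The mapper emits a `(disk_id, 1)` pair for each error line, a sort stands in for Hadoop's shuffle phase, and the reducer totals errors per disk. The log format and all function names are made up for illustration; this is not PSC's actual job.

```python
from itertools import groupby
from operator import itemgetter
from typing import Dict, Iterable, Iterator, Tuple


def mapper(lines: Iterable[str]) -> Iterator[Tuple[str, int]]:
    """Emit (disk_id, 1) for every log line that reports an error."""
    for line in lines:
        fields = line.split()
        # Hypothetical log format: <timestamp> <disk_id> <event>
        if len(fields) == 3 and fields[2] == "ERROR":
            yield fields[1], 1


def reducer(pairs: Iterable[Tuple[str, int]]) -> Iterator[Tuple[str, int]]:
    """Sum the counts for each disk; input must already be sorted by key."""
    for disk_id, group in groupby(pairs, key=itemgetter(0)):
        yield disk_id, sum(count for _, count in group)


def run_job(lines: Iterable[str]) -> Dict[str, int]:
    """Run map, a local stand-in for Hadoop's shuffle/sort, then reduce."""
    shuffled = sorted(mapper(lines), key=itemgetter(0))
    return dict(reducer(shuffled))


if __name__ == "__main__":
    sample = [
        "t1 disk0042 OK",
        "t2 disk0042 ERROR",
        "t3 disk0107 ERROR",
        "t4 disk0042 ERROR",
    ]
    print(run_job(sample))  # {'disk0042': 2, 'disk0107': 1}
```

On a real cluster, the map and reduce steps would run in parallel across many machines and many log files, which is what makes thousands of drives' worth of logs manageable.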
One way to analyze issues and trends is with Mahout. (We did the Wikipedia search for you: a mahout is a person who drives an elephant. Cool, right?) Apache Mahout provides machine-learning libraries for collaborative filtering, clustering, frequent pattern mining, and genetic algorithms. It is one of a number of add-ons for Hadoop, which is the beauty of open source.
If you'd like to learn more about Hadoop, Mahout, or anything else related to managing, storing, and analyzing big data, come visit PSC for our monthly Pittsburgh Hadoop Users Group Meetup.
Our next meetup will be Wednesday, February 15, and we'll be joined by Shannon Quinn, who will talk to us about Mahout; Bryon Gill, who will talk about how he used Hadoop to revitalize old hardware and build a new cluster; and JRay Scott, who will give a brief status update on building a Hadoop cluster on his Lenovo X200 Windows laptop.
The Hadoop Users Group Pittsburgh consists of members from local universities and companies. The meetup runs from 6 to 8 p.m. at the Pittsburgh Supercomputing Center, 300 S. Craig St., in the Stiles Lecture Hall, Room 103.
Editor’s note: Beth Albert is an occasional blog contributor. She works in our Facilities group as a Technical Writer.