This post was going to be an introduction to Hadoop, but I found there are many, many, many pages doing just that. (And an hour long O’Reilly Webcast, which I highly recommend.) Instead I thought I’d give you a little tour of what Hadoop does for us at PSC.

Hadoop allows us to better utilize our newly upgraded disk storage system. The tricky part is that we are now tracking approximately 4,000 disk drives, 7 servers, and Ethernet ports. Each of these items generates elephantine log files that would otherwise be impractical to analyze. This is where the little yellow elephant, Hadoop, comes in.

Hadoop allows us to easily analyze the vast amounts of information coming in from all of the disks, servers, and ports. (Not that we have failed disks, but you know, just in case!) We can follow trends and pinpoint any arising issues.

One way to analyze issues and trends is via Mahout. (We did the Wikipedia search for you: a mahout is a person who drives an elephant. Cool, right?) Apache Mahout has machine-learning libraries built in: collaborative filtering, clustering, frequent pattern mining, and genetic algorithms. There are a number of add-ons for Hadoop, which is the beauty of open source.

If you’d like to learn more about Hadoop, Mahout, or anything else relating to managing, storing, and analyzing big data, you can come visit PSC for our monthly Pittsburgh Hadoop Users Group Meetup.

Our next meetup will be Wednesday, February 15 and we’ll be joined by Shannon Quinn, who will talk to us about Mahout; Bryon Gill, who will talk about how he used Hadoop to revitalize old hardware and build a new cluster; and JRay Scott, who will give a brief status update on building a Hadoop cluster on his Lenovo X200 Windows laptop.

The Hadoop Users Group Pittsburgh consists of members from local universities and companies. The meetup runs from 6 to 8 p.m. at the Pittsburgh Supercomputing Center, 300 S. Craig St., in the Stiles Lecture Hall, Room 103.

Editor’s note: Beth Albert is an occasional blog contributor.  She works in our Facilities group as a Technical Writer.

Posted in General

Here at the Pittsburgh Supercomputing Center we provide a vast amount of computing resources to scientists and researchers all over the country. In many cases, these users need to transfer an incredibly large amount of data into and out of PSC in order to do their work. This may seem like a simple task, but when you are transferring terabytes of data every day, even a small inefficiency can add hours to the transfer time. At the same time, we need to see to the security of our systems and user data. This means we have to insist that all of our users make use of cryptographically secure methods to log in and move data around. There have been a number of applications that have tried to make data transfer fast, secure, and easy but getting all three at the same time was a hard problem to solve.

Quite a few of our users really wanted to make use of SSH and SCP – easy-to-use, cryptographically secure applications that are available on almost every computer in the world. The problem is that they’re very slow. With that in mind, I started looking into why SSH was so slow and I (along with many others) noticed that it was very fast for local connections, but the further apart the two hosts were, the slower it became. In the networking world this is what we call delay dependency. Simply put, it means that the longer it takes for a packet to go from one computer to the other, the more of an effect you’ll see. In this case, the effect was slowness.

Why would this be the case? After all, it’s a fast network from one end to the other, so why would the distance matter in terms of performance? This happens because when you use TCP (Transmission Control Protocol – a reliable method to send data packets) the data packets sent from one computer have to be acknowledged by the receiver before more packets can be sent. This is what we call flow control, and it makes sure that we don’t flood the network or the computers with too much data too quickly. The amount of data that can be whizzing across the network before the sender must stop and wait for an acknowledgement is called the ‘window’. The protocol allows this window to slide open so that more data can be sent before pausing to wait for the acknowledgement. The bigger the window, the more data you can send each time. In a perfect world, the window ends up being the same size as the carrying capacity of the path (better known as the bandwidth-delay product: the bandwidth of the path multiplied by its round-trip time). If it’s smaller than that, you just aren’t making the best use of the network.
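To put rough numbers on the bandwidth-delay product, here is a minimal sketch in Python. The link speed and round-trip time are made-up values chosen for illustration, not measurements from PSC's network, and the function name is hypothetical.

```python
def bandwidth_delay_product_bytes(bandwidth_bits_per_sec: float, rtt_seconds: float) -> float:
    """Bytes that can be in flight on a path: bandwidth times round-trip time."""
    return bandwidth_bits_per_sec * rtt_seconds / 8  # divide by 8 to convert bits to bytes


# Assumed example path: a 1 Gbit/s link with a 70 ms round trip.
bdp = bandwidth_delay_product_bytes(1e9, 0.070)
print(f"Window needed to fill the path: {bdp / 1e6:.2f} MB")
```

Under these assumptions the path can hold several megabytes at once, so a window much smaller than that leaves the sender idle for most of each round trip.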

So what does this have to do with SSH and SCP? Well, it turns out that the developers had to create a kind of flow control window in SSH and SCP. This window sits ‘on top’ of the one that TCP makes use of. This wouldn’t be a problem if the windows were the same size. Unfortunately this wasn’t the case, and the one used by SSH and SCP was very small (64 KB) in comparison to the windows used on high performance networks (4,000 KB and higher). Because the smaller window sets the limit, the end result was that SSH and SCP were taking far too long to get data into and out of PSC. Since this is what our users wanted to use for data transfers, I, along with Ben Bennett and Mike Stevens, decided to fix the problem.
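A rough way to see why the small window hurts: at most one window of data can arrive per round trip, so the throughput ceiling is the window size divided by the round-trip time, and with two stacked windows the smaller one wins. The sketch below is a simplified model, not HPN-SSH code. The 70 ms round-trip time is an assumed value, the function names are hypothetical, and "K" is taken to mean 1,024 bytes.

```python
def effective_window_bytes(tcp_window_bytes: int, ssh_window_bytes: int) -> int:
    """With one window layered on top of another, the smaller one sets the limit."""
    return min(tcp_window_bytes, ssh_window_bytes)


def max_throughput_bits_per_sec(window_bytes: int, rtt_seconds: float) -> float:
    """Upper bound on throughput: one full window delivered per round trip."""
    return window_bytes * 8 / rtt_seconds


RTT_SECONDS = 0.070        # assumed round-trip time, for illustration only
SSH_WINDOW = 64 * 1024     # the 64K SSH/SCP window (assuming K = 1,024 bytes)
TCP_WINDOW = 4000 * 1024   # the 4000K TCP window (assuming K = 1,024 bytes)

capped_by_ssh = max_throughput_bits_per_sec(
    effective_window_bytes(TCP_WINDOW, SSH_WINDOW), RTT_SECONDS
)
capped_by_tcp = max_throughput_bits_per_sec(TCP_WINDOW, RTT_SECONDS)

print(f"Ceiling with the small SSH window: {capped_by_ssh / 1e6:.1f} Mbit/s")
print(f"Ceiling with only the TCP window:  {capped_by_tcp / 1e6:.1f} Mbit/s")
```

These are theoretical ceilings only. Real transfers also depend on CPU load, packet loss, and how the windows grow over time, so measured speedups will differ from the ratio this model gives.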

We were able to change the SSH and SCP flow control so that its window would slide open at the same time and at the same rate as the TCP window. This alone gave many users a 10 to 30 times performance improvement. We then found a way to turn off the data encryption after people have logged in. Many times, people are not transferring sensitive information and just want to move the data as fast as possible. By disabling encryption after they securely logged in, we didn’t need to use as much of the CPU and that improved performance even more. We then went a step further and made one of the encryption methods significantly faster by allowing parts of it to work in parallel. In some instances, we could move fully encrypted data as fast as unencrypted data. We released all of these changes as HPN-SSH and it has, over the years, become an invaluable tool used widely by Google, Facebook, NASA, scientists, and computer users all over the world.

Just recently, HPN-SSH was made the default version of SSH in all distributions of FreeBSD 9.0. As the lead author of HPN-SSH, it’s gratifying to see my work coming out to a wider audience. There is a lot more work to be done too, and I’m looking forward to it as a way to help both my users and the whole internet community. You can find out more about HPN-SSH at http://www.psc.edu/networking/projects/hpn-ssh

Posted in Networking

I love science, and being around smart women, and chocolate. The combination of all three is nearly unbeatable, and made for a fun event last weekend. I had a great time with three of my female co-workers on Saturday talking with ten middle and high school girls about scientific computing, networking (the computer kind), and computational biology.

The girls came to PSC with Nina Barbuto of the Girls, Math and Science Partnership, sponsored by Carnegie Science Center, for a Tour Your Future event. I got to tag along as Kathy Benninger, Marcela Madrid, and Laura McGinnis spoke to the girls about careers in computational science.

Part of the "cloud" lives here.

Kathy explained what she does as a network engineer and took the girls to tour one of the machine rooms containing networking and cloud storage equipment. They wanted to know what her favorite project was (designing the network for the Software Engineering Institute building) and asked – or more precisely, shouted over the cooling units in the machine room – what happens if the air conditioning is off (best guess: machines would start to shut themselves down within an hour, but let’s hope we don’t have to find out.)

Manipulating myoglobin.

Marcela introduced the girls to computational biology by showing them how to access the Protein Data Bank online and manipulate a myoglobin molecule, and demonstrated how the design of new drugs can be affected by molecular structure. Better yet, she explained that, yes kids, you can try this at home, for free!, which left several girls chattering excitedly about the possibilities.

Parallel processing to count M&Ms

Collision management: this jar isn't big enough for the both of us!

By now you’re asking, “But what about the chocolate?” Laura brought it when she engaged the girls in computational thinking to solve the big question, “How many M&Ms are in this family-size bag?” Using spoons (processors) to scoop M&Ms out of a narrow-necked jar, the group learned about idle time (waiting for a turn with one of the spoons), collision management (trying to get more than one spoon into the jar at once), and the advantages of parallel processing (many girls counting) over serial processing (just one girl counting the M&Ms). And then they got to eat them.

Thank you, Nina, for bringing the girls in, and thank you, Ally, Sai, Sarah, Tajah, Caroline, Olivia, Lindsey, Arianna, Mariya, and Tristyn for coming and for proving – again – that women and science is a very cool combination. Good luck in school, and we hope to see you around the lab soon!

Posted in Outreach

We’ve been talking about starting a PSC blog for a while now. We ran out of excuses for not blogging, so here we are, ready to write for your reading pleasure!

Deb Nigra was one of the first people to be hired when PSC opened back in 1986. Deb’s the go-to gal around here for most things pertaining to our website. She’s actually in the process of acquiring as many college degrees as possible. Currently she has three (a B.A. in Chemistry from Thiel College, an M.S. in Chemistry from Carnegie Mellon University, and an M.S. in Information Science from the University of Pittsburgh) and is taking classes and applying to the M.A. in Professional Writing program at Carnegie Mellon. When not in a classroom or working at PSC, Deb enjoys gardening and has been seen lugging around grocery bags of bulbs and plants that she’s dug up from her garden to share with those of us who still have room in our gardens.

Chris Rapier is one of our “networking peeps” – my affectionate term for the staffers who keep our network up and running. Chris graduated from Carnegie Mellon with a B.S. in Applied History. I’m sure you’re wondering how this history geek ended up becoming a networking peep. After working in a variety of different jobs, he went to work for a friend at Telerama, where he excelled as a help desk manager and system administrator. Nowadays, he’s involved in research to improve network performance and diagnostics. When Chris isn’t researching how to get our emails sent faster, he enjoys cooking, eating, and talking to me about various home improvement projects.

Shandra Williams joined the PSC team in 2007, officially as a graphic designer, and unofficially as the PSC jokester. She attended the Art Institute, Tuskegee University, and Auburn University. She handles most of the graphic design work around here. I’ve been told that in her spare time one of her hobbies is “model train engineer.” However, I’ve yet to see one of these model trains in a photo or in person… She enjoys German-style board games and video games when she’s not doing graphic design, model train engineering, or enjoying a weekday gluten-free lunch with Deb Nigra.

Robin Scibek: that’s me. I began working at PSC as a student and was hired as a staffer after graduating with my B.S. in Computer Science in 2003 from the University of Pittsburgh. I began working in the database group; since finishing an M.B.A. at Pitt, I’ve also been responsible for organizing PSC events like our annual open house and other outreach projects. When I need a break at work, I’ve made it my personal mission to help my co-workers develop a passion for fashion and transform their terrible “offbeat” fashion choices. I haven’t been very successful though. Recently, I issued a fashion citation to Shandra Williams for wearing chartreuse socks with sandals – she has it proudly posted on her office door.

Well, now that you’ve been officially introduced to all of us, I hope you check back often to see what we’re all up to here at PSC.

Posted in People