Here at the Pittsburgh Supercomputing Center we provide a vast array of computing resources to scientists and researchers all over the country. In many cases, these users need to transfer an incredibly large amount of data into and out of PSC in order to do their work. This may seem like a simple task, but when you are transferring terabytes of data every day, even a small inefficiency can add hours to the transfer time. At the same time, we need to see to the security of our systems and user data. This means we have to insist that all of our users make use of cryptographically secure methods to log in and move data around. A number of applications have tried to make data transfer fast, secure, and easy, but getting all three at the same time has been a hard problem to solve.
Quite a few of our users really wanted to use SSH and SCP: easy-to-use, cryptographically secure applications that are available on almost every computer in the world. The problem is that they can be very slow. With that in mind, I started looking into why SSH was so slow, and I (along with many others) noticed that it was very fast for local connections, but the farther apart the two hosts were, the slower it became. In the networking world this is what we call delay dependency. Simply put, it means that the longer it takes for a packet to go from one computer to the other, the more pronounced the effect becomes. In this case, the effect was lower throughput.
Why would this be the case? After all, it’s a fast network from one end to the other, so why would the distance matter in terms of performance? This happens because when you use TCP (Transmission Control Protocol, a reliable method to send data packets), the data sent from one computer has to be acknowledged by the receiver, and the sender may only have a limited amount of unacknowledged data outstanding at any time. This is what we call flow control, and it makes sure that we don’t flood the network or the computers with too much data too quickly. The amount of data that can be whizzing across the network before the sender must stop and wait for an acknowledgement is called the ‘window’. The protocol allows this window to slide open so that more data can be sent before pausing to wait for the acknowledgement. The bigger the window, the more data you can send each time. In a perfect world, the window ends up being the same size as the carrying capacity of the path (better known as the bandwidth-delay product, which is the bandwidth of the path multiplied by its round-trip time). If it’s smaller than that, you just aren’t making the best use of the network.
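To put a number on the carrying capacity of a path, here is a minimal Python sketch of the bandwidth-delay product calculation. The function name and the example path (1 Gbit/s with a 50 ms round trip) are assumptions chosen for illustration; they are not measurements from PSC's network.

```python
def bandwidth_delay_product_bytes(bandwidth_bits_per_sec: float,
                                  rtt_seconds: float) -> float:
    """Bytes that must be in flight to keep the path completely full."""
    # bits per second * seconds = bits in flight; divide by 8 for bytes.
    return bandwidth_bits_per_sec * rtt_seconds / 8


# Hypothetical path: 1 Gbit/s of bandwidth and a 50 ms round trip.
bdp = bandwidth_delay_product_bytes(1e9, 0.050)
print(f"Window needed to fill the path: {bdp / 1024:.0f} KB")
```

For this made-up path the answer is a little over 6 million bytes, so any window much smaller than that leaves most of the network idle.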
So what does this have to do with SSH and SCP? Well, it turns out that the developers had to create a kind of flow control window in SSH and SCP. This window sits ‘on top’ of the one that TCP makes use of. This wouldn’t be a problem if the windows were the same size. Unfortunately this wasn’t the case, and the one used by SSH and SCP was very small (64 KB) in comparison to the windows used on high performance networks (4,000 KB and higher). Because the smaller of the two windows sets the limit, the end result was that SSH and SCP were taking far too long to get data into and out of PSC. Since this is what our users wanted to use for data transfers, I, along with Ben Bennett and Mike Stevens, decided to fix the problem.
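The effect of stacking a small window on top of a large one can be sketched in a few lines of Python. This is a simplified model that assumes the sender can push at most one window of data per round trip and ignores packet loss, TCP slow start, and protocol overhead. The 64 KB and 4,000 KB window sizes come from the text above; the function names and the round-trip times are hypothetical.

```python
def effective_window_bytes(tcp_window_bytes: int, ssh_window_bytes: int) -> int:
    """The smaller of the two stacked windows limits the data in flight."""
    return min(tcp_window_bytes, ssh_window_bytes)


def max_throughput_mbps(window_bytes: int, rtt_seconds: float) -> float:
    """Upper bound on throughput: one full window per round trip."""
    return window_bytes * 8 / rtt_seconds / 1e6


TCP_WINDOW = 4000 * 1024  # tuned TCP window on a high performance network
SSH_WINDOW = 64 * 1024    # the small window inside SSH and SCP

window = effective_window_bytes(TCP_WINDOW, SSH_WINDOW)

# Hypothetical round-trip times, from a local network to a long-haul path.
for rtt_ms in (1, 10, 50, 100):
    mbps = max_throughput_mbps(window, rtt_ms / 1000)
    print(f"RTT {rtt_ms:>3} ms: at most {mbps:8.1f} Mbit/s")
```

Under this model a 64 KB window tops out at about 10.5 Mbit/s on a 50 ms path no matter how fast the underlying network is, which is the delay dependency in action.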
We were able to change the SSH and SCP flow control so that its window would slide open at the same time and at the same rate as the TCP window. This alone gave many users a 10 to 30 times performance improvement. We then found a way to turn off the data encryption after people had logged in. Many times, people are not transferring sensitive information and just want to move the data as fast as possible. By disabling encryption after they had securely logged in, we didn’t need to use as much of the CPU, and that improved performance even more. We then went a step further and made one of the encryption methods significantly faster by allowing parts of it to work in parallel. In some instances, we could move fully encrypted data as fast as unencrypted data. We released all of these changes as HPN-SSH and it has, over the years, become an invaluable tool used widely by Google, Facebook, NASA, scientists, and computer users all over the world.
Just recently, HPN-SSH was made the default version of SSH in all distributions of FreeBSD 9.0. As the lead author of HPN-SSH, it’s gratifying to see my work coming out to a wider audience. There is a lot more work to be done too, and I’m looking forward to it as a way to help both my users and the whole internet community. You can find out more about HPN-SSH at http://www.psc.edu/networking/projects/hpn-ssh