Chris Rapier, Senior Research Programmer at PSC

From the beginning, Chris and the team have worked to make HPN-SSH ubiquitous, easy to use, and setup-free for data transfers. These efforts are part of the team’s goal to continuously enhance the scientific research workflow.

SSH, the precursor to HPN-SSH, stands for secure shell. This ubiquitous program is widely used by macOS, Linux, and Windows users for remote logins and data transfer. In the HPC world there are other dedicated applications for data transfer, but those programs don’t always translate well to smaller labs, making SSH and HPN-SSH (High Performance Networking Secure Shell) popular tools for transferring both small and large data sets.

HPN-SSH AND DATA TRANSFERS

Chris Rapier, Senior Research Programmer at PSC, has headed up the HPN-SSH project since 2004. In his career at PSC, Chris’ focus has primarily been on the performance, monitoring, and diagnosis of data transport over high performance networks. His overarching goal has been to improve the way in which networks support the process of scientific discovery. Prior to this project, Chris also worked on the Web10g, TestRig, and XSight projects at PSC.

A question that SSH users often ask is: if I have a high performance network, why is my connection so slow? To answer that question, we need to look at how data transfers work and the factors that affect the transfer rate. In any file transfer, when you are sending data from computer A to computer B using Transmission Control Protocol (TCP), the speed of transmission depends largely on the distance between the computers. For computers sitting next to each other, the transfer of a single packet can be nearly instantaneous. By comparison, if the two computers are separated by an ocean or a continent, the same packet could take up to 150 milliseconds to reach its destination.

Another factor is the size of the receive buffer, known as rmem in Linux systems. The receive buffer is a type of memory on the receiving computer that temporarily stores data as they are being transferred. Protocols like TCP/IP (internet protocol) break data down into packets, initially sending one packet at a time, and verifying each one was received (using acknowledgement packets or ACKs) before sending the next packet. In order to send an increasing number of packets, a method known as “sliding windows” is used to control the flow of packets. In this method, the receiving system will tell the sender, through data in the ACKs, how much data it can receive at once. This amount corresponds to the available space in the receive buffer. If this buffer gets “full” of unprocessed data packets, then the receiver will tell the sender to pause. As this buffer is cleared the transmission can continue. Historically, most receive buffers started out with only 64 KB of memory. Later work, inspired by research conducted at PSC, has allowed this buffer to grow to almost any size required by the network path.
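To see why a small receive buffer hurts on a long path, consider a simplified model in which the sender can have at most one full window of data in flight per round trip. The sketch below is only an illustration under that assumption: the function name is hypothetical, and the 150 millisecond figure is treated as a round trip time for the sake of the example.

```python
def window_limited_throughput_mbps(window_bytes: int, rtt_seconds: float) -> float:
    """Upper bound on throughput when one window is delivered per round trip."""
    bits_per_round_trip = window_bytes * 8
    return bits_per_round_trip / rtt_seconds / 1e6


KB = 1024

# A historical 64 KB receive buffer on a long path (assumed 150 ms round trip).
long_path = window_limited_throughput_mbps(64 * KB, 0.150)

# The same buffer on a short local path (assumed 1 ms round trip).
short_path = window_limited_throughput_mbps(64 * KB, 0.001)

print(f"150 ms path: {long_path:.1f} Mbps")
print(f"  1 ms path: {short_path:.1f} Mbps")
```

In this model the 64 KB buffer caps the long path at a few megabits per second, no matter how fast the underlying network is, while the same buffer on the short path allows hundreds of megabits per second.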

How big should this buffer be? The basic formula for calculating the size of the buffer is the bandwidth delay product (BDP), where bandwidth times round trip time (BW x RTT) gives the optimal size of the receive buffer. Using this formula, if your network bandwidth is 10 gigabits per second and your round trip time is 10 milliseconds, your optimal buffer size is about 12.5 MB.
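The BDP arithmetic can be checked with a few lines of Python. This is a minimal sketch; the function name is hypothetical, and the only step beyond the formula itself is dividing by 8 to convert bits to bytes.

```python
def bdp_bytes(bandwidth_bits_per_sec: float, rtt_seconds: float) -> float:
    """Bandwidth delay product, converted from bits to bytes."""
    return bandwidth_bits_per_sec * rtt_seconds / 8


# 10 gigabits per second with a 10 millisecond round trip time.
size = bdp_bytes(10e9, 0.010)

print(f"{size / 1e6:.1f} MB (decimal)")
print(f"{size / 2**20:.1f} MiB (binary)")
```

The result is 12.5 MB in decimal units, or about 11.9 MiB in binary units, which is why the figure is often rounded to roughly 12 MB.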

SSH is a multiplexed protocol in that there can be multiple independent data channels inside of one TCP session. Each channel is unaware of the flow control provided by TCP and each channel must have its own flow control implemented within the program itself. In OpenSSH, this is accomplished using a method similar to sliding windows with an independent receive buffer for each channel. However, OpenSSH is unaware of what’s happening with the TCP receive buffer. That’s due, in part, to what is known as the 7-layer OSI model. OSI is a conceptual model for standard communications between computers. 

The Seven Layers of the OSI Model

Typically, each layer in the OSI model is unaware of what’s happening in the underlying layer. In order to gain some performance improvements, Chris and his colleagues started to experiment with certain features of SSH. They noticed that even though the TCP receive buffers would grow to meet network needs, the SSH receive buffers remained static at just 64 KB (later increased to 2 MB). They realized that if they poked a hole through layer seven of the OSI model, SSH could become aware of the size of the TCP receive buffer and change its own buffer size to match. With HPN-SSH, this dynamic buffer growth means there is much less drop-off in throughput as round trip times increase.
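The effect of a static SSH buffer can be illustrated with a simple model: the data in flight is limited by whichever window is smaller, the TCP receive buffer or the SSH channel buffer. The sketch below is not HPN-SSH code. It assumes a 10 gigabit per second link, assumes TCP has already grown its buffer to the bandwidth delay product, and treats 2 MB as 2 MiB. All names are hypothetical.

```python
import math

MIB = 1024 * 1024


def bdp_bytes(link_mbps: float, rtt_s: float) -> int:
    """Bandwidth delay product in bytes, rounded up."""
    return math.ceil(link_mbps * 1e6 * rtt_s / 8)


def throughput_mbps(link_mbps, rtt_s, tcp_buf_bytes, ssh_buf_bytes):
    """Throughput capped by the link and by the smaller of the two windows."""
    window_bytes = min(tcp_buf_bytes, ssh_buf_bytes)
    window_limit_mbps = window_bytes * 8 / rtt_s / 1e6
    return min(link_mbps, window_limit_mbps)


LINK_MBPS = 10_000  # assumed 10 Gbps link

for rtt_ms in (1, 10, 50, 100):
    rtt_s = rtt_ms / 1000
    tcp_buf = bdp_bytes(LINK_MBPS, rtt_s)  # assumes TCP autotuned to the BDP
    static = throughput_mbps(LINK_MBPS, rtt_s, tcp_buf, 2 * MIB)
    matched = throughput_mbps(LINK_MBPS, rtt_s, tcp_buf, tcp_buf)
    print(f"RTT {rtt_ms:>3} ms: static 2 MiB buffer {static:8.1f} Mbps, "
          f"matched buffer {matched:8.1f} Mbps")
```

In this model the static buffer keeps up at short round trip times but falls far behind as the round trip time grows, while the matched buffer stays at the link rate. Real transfers are also limited by CPU, ciphers, and packet loss, which the model ignores.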

ENCRYPTION WITH HPN-SSH

It should be noted that the data being transferred often doesn’t need to be encrypted, for example when it is already public or not sensitive. So the HPN-SSH team devised a way to give users the option of sending this data in the clear. While the login process still uses cryptographically secure methods to authenticate the user, the data itself is sent without any encryption. When data is sent in the clear, throughput can reach up to 9,400 megabits per second with HPN-SSH. Transferring encrypted data, using a cipher such as ChaCha20 or AES-CTR, can take longer because encryption consumes additional CPU resources. So the team also works on making such ciphers more efficient. Areas of exploration include multithreading, hardware acceleration, and the use of optimized assembly language. Even though such improvements may yield only a 15 to 20% gain, with large amounts of data that can mean completing a transfer in 8 to 10 hours versus taking most of a day. Chris and the team view this as a way of enhancing the scientific research workflow.
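To get a feel for how throughput translates into wall-clock time, the following sketch computes transfer times for a large data set. The data set size (20 TB) and the baseline encrypted throughput (4,000 megabits per second) are hypothetical values chosen only for illustration; only the 9,400 megabits per second figure and the 20% gain come from the text above.

```python
def transfer_hours(dataset_bytes: float, throughput_mbps: float) -> float:
    """Hours needed to move dataset_bytes at a sustained throughput."""
    seconds = dataset_bytes * 8 / (throughput_mbps * 1e6)
    return seconds / 3600


DATASET_BYTES = 20e12   # hypothetical 20 TB data set
BASELINE_MBPS = 4_000   # hypothetical encrypted throughput
UNENCRYPTED_MBPS = 9_400

baseline = transfer_hours(DATASET_BYTES, BASELINE_MBPS)
improved = transfer_hours(DATASET_BYTES, BASELINE_MBPS * 1.20)  # 20% faster cipher
in_clear = transfer_hours(DATASET_BYTES, UNENCRYPTED_MBPS)

print(f"baseline cipher:    {baseline:.1f} hours")
print(f"20% faster cipher:  {improved:.1f} hours")
print(f"sent in the clear:  {in_clear:.1f} hours")
```

Under these assumptions a 20% throughput gain saves close to two hours on a transfer that would otherwise take about eleven, and the saving grows in proportion to the size of the data set.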



The performance of HPN-SSH for various ciphers at different roundtrip times.

Other approaches to improving the functionality of HPN-SSH include inline network telemetry, automatically resuming failed SCP transfers, and deploying via data transfer nodes (DTNs). Inline network telemetry gathers TCP stack data from both sides of the connection, providing network engineers with better insights. When a Secure Copy Protocol (SCP) transfer is interrupted by a dropped connection, HPN-SSH can recover by locating the cutoff point and picking the transfer back up from there. If the transfer is resumed from the side you are sending your data to (the receive side), you will see better performance. Some institutions may not want, or may not be able, to install HPN-SSH on every system. In these cases, they can install it on their DTNs, dedicated systems deployed and configured specifically for transferring data over networks.
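The general idea behind resuming a transfer from its cutoff point can be sketched with local files. This is not how HPN-SSH implements resume; it is a generic illustration that uses the size of the partial destination file as the cutoff point and assumes the partial file is an intact prefix of the source (a robust implementation would verify that, for example with a checksum). The function name is hypothetical.

```python
import os
import tempfile

CHUNK = 64 * 1024


def resume_copy(src_path: str, dst_path: str) -> int:
    """Append the part of src that dst is missing; return the bytes written."""
    # The cutoff point is however much of the file already arrived.
    offset = os.path.getsize(dst_path) if os.path.exists(dst_path) else 0
    written = 0
    with open(src_path, "rb") as src, open(dst_path, "ab") as dst:
        src.seek(offset)  # skip what the destination already has
        while True:
            chunk = src.read(CHUNK)
            if not chunk:
                break
            dst.write(chunk)
            written += len(chunk)
    return written


with tempfile.TemporaryDirectory() as tmp:
    src = os.path.join(tmp, "source.bin")
    dst = os.path.join(tmp, "partial.bin")
    data = os.urandom(200_000)
    with open(src, "wb") as f:
        f.write(data)
    with open(dst, "wb") as f:
        f.write(data[:70_000])  # simulate a transfer that dropped partway

    sent = resume_copy(src, dst)
    with open(dst, "rb") as f:
        complete = f.read() == data

print(f"resumed {sent} bytes, file complete: {complete}")
```

Only the missing 130,000 bytes are sent on the second attempt, which is the saving that matters when the interrupted file is many gigabytes in size.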

Chris’ grant and this work are ongoing through April. HPN-SSH was initially supported by a grant from Cisco Systems, but has also received support from the National Library of Medicine (part of NIH) and the NSF. Chris also maintained the project for years when it didn’t have any funding support. In all, Chris and the team have been working on this effort for 18 years. Besides Chris, the idea guy, contributors Mike Stevens, Ben Bennet, Mike Desoto, Brian Learn, and Mitch Dorrell have turned Chris’ theories into reality.

All of their HPN-SSH work is covered by open source licenses and is available via GitHub and as binary packages for Linux distributions. The code has also been incorporated into GSI-SSH, a modified version of OpenSSH that adds support for GSI authentication and credential forwarding (delegation), allowing single sign-on remote login and file transfer. From the beginning, Chris and the team have worked to make HPN-SSH ubiquitous, easy to use and administer, and “just work” for data transfers of 10 GB and beyond.

Recently, the team has been working more closely with OpenSSH and opening up new avenues focused on bulk data transfers. As to future development, Chris plans to talk to lots of people. It’s important to ensure that any development efforts are useful to the entire community. Ease of use and implementation remain key areas of focus.

The performance of OpenSSH for various ciphers at different roundtrip times. 

PLUGGING A SECURITY HOLE

Chris is also credited with finding a security flaw in a related program: OpenSSL (Open Secure Sockets Layer), a widely used cryptographic library. Finding the issue was really an accident. The AES-CTR cipher used in HPN-SSH is a multi-threaded cipher built on OpenSSL routines. The code of this cipher substituted custom routines for certain standard routines using API calls. This method worked well with OpenSSL (OSSL) version 1.1. When OSSL 3 was released, this ability to substitute routines was deprecated. While the technique could still work, it actually contributed to the security issue.

As it turns out, there is a very specific value, the NID value, that needed to be set or OSSL would pass everything in the clear. At the time, it wasn’t obvious that the data was being sent unencrypted. This was only discovered when testing the program against the older version (1.1). Once the problem was discovered, it was reported to the OSSL developers, who patched the code and filed a CVE (Common Vulnerabilities and Exposures) entry, a record in the public catalog of disclosed computer security flaws. HPN-SSH now works natively with OSSL 3. There is really no way to accurately estimate how many applications would have been caught up in this flaw. After all, it wouldn’t affect people (or applications) that use OSSL conventionally. Additionally, around the same time, this NID-related issue was overshadowed by an even bigger flaw in OSSL.

Chris and the team will continue to keep their focus on making things easier for the user: no configuration needed, a program that just works. One thing is certain: As computers get faster and bandwidth increases, this becomes a scaling problem and gets more and more difficult.

View more information on this topic on our HPN-SSH page.