High Performance SSH/SCP - HPN-SSH
(PI) Chris Rapier PSC, Michael Stevens CMU, Benjamin Bennett PSC, Mike Tasota PSC/CMU
Notes and News
The 6.3 patches are now available, which is probably the fastest I've put out a new round of patches in ages. The patches are pretty much straightforward ports, except for some minor changes in the cipher subsystem. In 6.3, cipher.c made the array of known ciphers a const struct. This broke the method of redefining the AES cipher post authentication to use the threaded cipher. This was resolved by removing the const qualifier from the cipher array and from the cipher_by_name function. There shouldn't be any impact on performance or security, but if I'm wrong please let me know.
I should also point out that the patches are now available on SourceForge (http://hpnssh.sourceforge.net/). This location may prove easier for people who don't want to deal with the Remository interface I've been required to use.
If you care about HPN-SSH, there is no better way to show your support than making a donation to the Pittsburgh Supercomputing Center. I do not personally receive any money from these donations, but your donation does support our work. When you donate, please note in the comments field that the donation is for HPN-SSH. Any amount is worthwhile; even a dollar will show PSC and CMU your support for our work.
We've been lucky enough to have volunteers translate this page into Belorussian, German, Hindi, and Russian. You can find these translations here:
- HPN-SSH in Belorussian provided by fatcow
- HPN-SSH in German provided by Andreas Beraz
- HPN-SSH in Hindi provided by Ashish Jha
- HPN-SSH in Russian provided by portablecomponentsforall
SCP and the underlying SSH2 protocol implementation in OpenSSH are limited in network performance by statically defined internal flow control buffers. These buffers often act as a bottleneck for SCP throughput, especially on long, high-bandwidth network links. Modifying the SSH code to allow the buffers to be defined at run time eliminates this bottleneck. We have created a patch that removes these bottlenecks in OpenSSH and is fully interoperable with other servers and clients. In addition, HPN clients will be able to download faster from non-HPN servers, and HPN servers will be able to receive uploads faster from non-HPN clients. However, the host receiving the data must have a properly tuned TCP/IP stack. Please refer to this tuning page for more information.
The amount of improvement any specific user will see depends on a number of factors. Transfer rates cannot exceed the capacity of the network or the throughput of the I/O subsystem, including disk and memory speed. The improvement will also be strongly influenced by the processor's capacity to perform encryption and decryption. Less computationally expensive ciphers will often provide better throughput than more complex ciphers.
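As a rough illustration of the point above, the sketch below treats a transfer as a chain of stages (network, storage, cipher) and reports the slowest one. It assumes every byte passes through each stage and that the stages overlap perfectly, which real systems only approximate; the stage names and rates are hypothetical, not measurements.

```python
# Illustrative bottleneck model: every byte passes through each stage, so the
# sustained rate is capped by the slowest stage.


def bottleneck(limits_mbps: dict[str, float]) -> tuple[str, float]:
    """Return (stage, rate) for the slowest stage; rates are in Mbit/s."""
    stage = min(limits_mbps, key=limits_mbps.__getitem__)
    return stage, limits_mbps[stage]


if __name__ == "__main__":
    # Hypothetical figures: a 10 Gbit/s network, a disk that sustains about
    # 1600 Mbit/s, and a cipher that one core runs at about 800 Mbit/s.
    limits = {"network": 10_000, "disk": 1_600, "cipher": 800}
    print(bottleneck(limits))  # ('cipher', 800)
```

In this toy case a faster network would not help at all; only a cheaper cipher or more cores for encryption would raise the rate.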
With many high-bandwidth connections, there is a gap between the throughput SSH is capable of and the capacity of the network link. This performance gap is the underutilized portion of your network connection. In most situations, the gap is the direct result of undersized receive buffers in SSH's flow control mechanism. The graph below shows the optimal receive buffer versus the effective SSH channel receive buffer for various round-trip times along a 100Mbps path.
The difference between the red and blue lines is, essentially, wasted throughput potential along the path.
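Both curves follow from simple arithmetic, sketched here in Python. The optimal receive buffer is the bandwidth-delay product (link rate times RTT), while a fixed window can deliver at most one window of data per round trip. The 64 KiB window and the RTT values below are illustrative assumptions standing in for the effective SSH channel buffer, not figures read from the graph.

```python
KIB = 1024


def optimal_buffer_bytes(link_bps: float, rtt_s: float) -> float:
    """Bandwidth-delay product: bytes that must be in flight to keep the link full."""
    return link_bps * rtt_s / 8


def window_limited_bps(window_bytes: float, rtt_s: float) -> float:
    """Best-case throughput when only `window_bytes` may be outstanding per round trip."""
    return window_bytes * 8 / rtt_s


if __name__ == "__main__":
    link_bps = 100e6       # the 100 Mbit/s path discussed above
    ssh_window = 64 * KIB  # assumed effective SSH channel receive buffer
    for rtt_ms in (10, 25, 50, 100):
        rtt_s = rtt_ms / 1000
        need = optimal_buffer_bytes(link_bps, rtt_s)
        got = min(link_bps, window_limited_bps(ssh_window, rtt_s))
        print(f"RTT {rtt_ms:>3} ms: optimal buffer {need / KIB:7.1f} KiB, "
              f"64 KiB window reaches {got / 1e6:5.1f} of 100 Mbit/s")
```

At a 50 ms RTT, for example, the path needs about 610 KiB in flight, but a 64 KiB window delivers only about 10.5 Mbit/s.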
Normal vs. HPN SCP Performance
The effect of raising the SSH buffer sizes can be seen in the following chart. The standard SSH throughput, represented by the red columns, closely matches the expected throughput for this path if the receive buffer were limited to 64KB. By increasing the size of the SSH channel receive buffers, throughput (represented by the blue columns) improved by as much as 1000%. The variation now seen is due to the complexity of the cipher and the limits of the hard drive.
Clearly, the HPN patches significantly boost throughput performance. This enhancement is entirely from tuning the SSH buffer sizes.
All patches should be applied to the OpenSSH source files using the 'patch' utility from the command line. Building SSH from source is actually quite easy and is the recommended method. Some binary packages will be made available as a convenience but will not be officially supported.
Solaris Users: Some versions of Solaris ship older versions of the patch and diff commands that are incompatible with this patch. Please make sure you are using a recent version of GNU patch.
This is the first revision of the 14th major version of the HPN patch set. The HPN13 patches will remain available on this page for the time being, and the HPN12 patch set remains available here. There are two fundamental differences between the HPN13 and HPN14 patch sets. The most significant is the inclusion of a fully functional Multi-Threaded AES CTR (MT-AES-CTR) mode cipher. A paper and presentation about this work are available.
The previous version of MT-AES-CTR failed when the process forked to the background or, starting in OpenSSH 6.1, when the rlimit sandbox was used. In the former case, the threads lost their context from the parent during the fork. In the latter, the rlimit sandbox prevented the creation of new threads/processes by setting RLIMIT_NPROC to 0. This issue was resolved by using the single-process AES CTR cipher during the pre-authentication phase. After authentication takes place, the pointer to the AES CTR cipher is replaced with a pointer to the MT-AES-CTR cipher. The application then forces a rekey, which starts up the threads. No real change was made to the cipher itself, just to how it is called in SSH.
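The following is a minimal Python model of the ordering described above, not the actual C code in the patch. It assumes a mutable cipher table (the reason the const had to be removed) whose AES-CTR entry is re-pointed once authentication succeeds, followed by a rekey that builds the new context. The class and function names are hypothetical, and the `threaded` flag stands in for actually starting worker threads. Changing the table is a process-wide change here, which is only reasonable because each connection is assumed to run in its own process.

```python
class SingleThreadCTR:
    """Stand-in for the stock single-process AES-CTR context."""
    threaded = False

    def __init__(self, key: bytes) -> None:
        self.key = key


class MultiThreadCTR(SingleThreadCTR):
    """Stand-in for MT-AES-CTR; a real version would start keystream threads here."""
    threaded = True


# Mutable cipher table: the entry is replaced at runtime, so (in C terms)
# the array can no longer be const.
CIPHERS = {"aes128-ctr": SingleThreadCTR}


class Session:
    def __init__(self, cipher_name: str, key: bytes) -> None:
        self.cipher_name = cipher_name
        # Pre-authentication: single-threaded, so forking or an RLIMIT_NPROC
        # sandbox cannot break it.
        self.ctx = CIPHERS[cipher_name](key)

    def rekey(self, new_key: bytes) -> None:
        # Re-resolve the cipher by name so a swapped table entry takes effect.
        self.ctx = CIPHERS[self.cipher_name](new_key)

    def on_authenticated(self, new_key: bytes) -> None:
        CIPHERS[self.cipher_name] = MultiThreadCTR  # re-point the table entry
        self.rekey(new_key)  # forced rekey builds the threaded context


if __name__ == "__main__":
    session = Session("aes128-ctr", b"pre-auth key")
    print(session.ctx.threaded)  # False
    session.on_authenticated(b"post-auth key")
    print(session.ctx.threaded)  # True
```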
This cipher mode introduces multi-threading into OpenSSH so that it can make full use of the CPU resources available on multi-core systems. Because the canonical distribution of OpenSSH cannot use more than one core, high-performance transfers can be bottlenecked by cryptographic overhead. HPN12 dealt with this by introducing None Cipher Switching. However, this technique is limited to users who are willing to allow their data to be transferred without encryption. It was also, by design, limited to bulk data transfers, which further restricts its value to some users. On multi-core platforms, the MT-AES-CTR mode allows users to attain throughput comparable or equal to that of unencrypted transfers. In both lab and real-world tests, throughput at full GigE line rate, with full encryption, was commonly seen.
Clearly, the MT-AES-CTR mode cipher breaks through the single-core bottleneck.
MT-AES-CTR produces a cipherstream that is indistinguishable from that of the Single-Threaded AES-CTR (ST-AES-CTR) mode cipher distributed with OpenSSH, and it is fully compatible with all other AES-CTR mode implementations. In other words, it's completely backward compatible and will function in heterogeneous connections with no problem. However, MT-AES-CTR does impose additional overhead and may carry a performance penalty on single-core machines. Additionally, the MT-AES-CTR mode cipher replaces the default ST-AES-CTR mode cipher only after authentication.
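Why can a multi-threaded implementation produce exactly the same cipherstream? In CTR mode, keystream block i depends only on the key and counter value i, so blocks can be generated on several cores and concatenated in counter order. The sketch below demonstrates that equivalence. It uses a SHA-256-based stand-in for the AES block function (not AES, and not suitable for protecting data) and an assumed layout of an 8-byte nonce followed by an 8-byte counter. Because of Python's global interpreter lock, it shows identical output rather than a speedup.

```python
import hashlib
from concurrent.futures import ThreadPoolExecutor

BLOCK = 16  # AES block size in bytes


def block_fn(key: bytes, ctr: bytes) -> bytes:
    """STAND-IN for AES encryption of one counter block: a SHA-256-based
    pseudorandom function, NOT AES, and not for real encryption."""
    return hashlib.sha256(key + ctr).digest()[:BLOCK]


def counter_block(nonce: bytes, i: int) -> bytes:
    """Assumed layout: 8-byte nonce followed by an 8-byte big-endian counter."""
    return nonce + i.to_bytes(8, "big")


def keystream_serial(key: bytes, nonce: bytes, nblocks: int) -> bytes:
    return b"".join(block_fn(key, counter_block(nonce, i)) for i in range(nblocks))


def keystream_parallel(key: bytes, nonce: bytes, nblocks: int, workers: int = 4) -> bytes:
    # Each block depends only on its own counter value, so workers compute
    # blocks independently; pool.map yields results in counter order.
    with ThreadPoolExecutor(max_workers=workers) as pool:
        blocks = pool.map(lambda i: block_fn(key, counter_block(nonce, i)), range(nblocks))
        return b"".join(blocks)


def ctr_xor(data: bytes, keystream: bytes) -> bytes:
    """CTR encryption and decryption are the same XOR with the keystream."""
    return bytes(d ^ k for d, k in zip(data, keystream))


if __name__ == "__main__":
    key, nonce = b"example key", bytes(8)
    msg = b"the same bytes either way" * 4
    nblocks = -(-len(msg) // BLOCK)  # ceiling division
    ks_serial = keystream_serial(key, nonce, nblocks)
    ks_parallel = keystream_parallel(key, nonce, nblocks)
    print(ks_serial == ks_parallel)  # True
    print(ctr_xor(ctr_xor(msg, ks_parallel), ks_serial) == msg)  # True
```

This is the property that lets MT-AES-CTR interoperate with any single-threaded AES-CTR peer: only the scheduling of block computations changes, never their values or order on the wire.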
The second major difference between HPN13 and HPN14 is that the NONE cipher switching routines have been split off into their own patch. In some circumstances users may need the NONE cipher without the additional overhead of dynamic windowing (packet radio under an amateur license, for example). It also helps keep the patches a little cleaner. Please note that it's not always trivial to layer the patches on top of each other. If you don't have much experience dealing with the reject files produced by patch, I suggest using either the kitchen sink patch or the dynwindow-noneswitch patch set.
HPN-14 Kitchen Sink
|OpenSSH Version||HPN-SSH Patch|
|OpenSSH 6.2p2||OpenSSH-6.2p2-hpn14 v1|
|OpenSSH 6.3p1||OpenSSH-6.3p1-hpn14 v2|
HPN-14 A la Carte
|Dynamic Window & None Switch||This is the most commonly implemented patch set. It provides dynamic windowing in SSH and the ability to switch to a NONE cipher post authentication. This patch is gzipped.||openssh6.2-dynwindow_noneswitch.diff.gz
|Dynamic Window||This patch is the basis for HPN-SSH. It allows users to make optimal use of long/fat network paths. Without this patch it's not really HPN!||openssh6.2-dynwindows.diff.gz
|None Switch||This patch *only* provides the ability to switch to the NONE cipher after authentication.||openssh6.2-none_switch.diff
|Multithreaded AES-CTR Cipher||This patch adds threading to the CTR block mode for AES and other supported ciphers. This allows SSH to make use of multiple cores/cpus during transfers and significantly increase throughput.||openssh6.2-CTR-threading.diff
|Peak Throughput||This patch displays the current (1 second average) throughput for SCP transfers.||openssh6.2-peaktput.diff
|Server Logging||This patch adds logging to the SSHD server, including the encryption used, remote address and port, user name, remote version information, total bytes transferred, and average throughput. In order to use this patch, you *must* direct syslogd to use an additional logging socket. This socket is located in the sshd chroot, typically /var/empty, so you will need to create a /var/empty/dev directory and add '-a /var/empty/dev/log' to your syslogd configuration. Example output can be seen here.||openssh6.2-server-logging.diff
HPN-13 Kitchen Sink
Note: This patch has been gzipped. You must gunzip it before applying it.
|OpenSSH Version||HPN-SSH Patch|
|OpenSSH 4.7p1||OpenSSH-4.7p1-hpn13 v3|
|OpenSSH 5.0p1||OpenSSH-5.0p1-hpn13 v4|
|OpenSSH 5.1p1||OpenSSH-5.1p1-hpn13 v5|
|OpenSSH 5.2p1||OpenSSH-5.2p1-hpn13 v6|
|OpenSSH 5.3p1||OpenSSH-5.3p1-hpn13 v7|
|OpenSSH 5.4p1||OpenSSH-5.4p1-hpn13 v8|
|OpenSSH 5.5p1||OpenSSH-5.5p1-hpn13 v9|
|OpenSSH 5.6p1||OpenSSH-5.6p1-hpn13 v10|
|OpenSSH 5.8p1||OpenSSH-5.8p1-hpn13 v11|
|OpenSSH 5.9p1||OpenSSH-5.9p1-hpn13 v12|
|OpenSSH 6.0p1||OpenSSH-6.0p1-hpn13 v13|
|OpenSSH 6.1p1||OpenSSH-6.1p1-hpn13 v14|
HPN-13 A la Carte
These are the a la carte patches and some of the version numbers may skew from time to time. For example, if the peak throughput patch doesn't need to be updated for various OpenSSH releases the patch number won't be updated. Not all of the patches are available just yet, as the NONE cipher switching still needs to be broken out from the HPN12 patch set.
|Dynamic Windows and None Cipher||This is the basis of the HPN-SSH patch set. It provides dynamic windowing in SSH and the ability to switch to a NONE cipher post authentication. Based on the HPN12 v20 patch. This patch is gzipped.||
|Threaded CTR cipher mode||This patch adds threading to the CTR block mode for AES and other supported ciphers. This may allow SSH to make use of multiple cores/cpus during transfers and significantly increase throughput. This patch should be considered experimental at this time.||
|Peak Throughput||This patch modifies the progress bar to display the 1 second throughput average. On completion of the transfer, it displays the peak throughput over the life of the connection.||
|Server Logging||This patch adds logging to the SSHD server, including the encryption used, remote address and port, user name, remote version information, total bytes transferred, and average throughput. In order to use this patch, you *must* direct syslogd to use an additional logging socket. This socket is located in the sshd chroot, typically /var/empty, so you will need to create a /var/empty/dev directory and add '-a /var/empty/dev/log' to your syslogd configuration. Example output can be seen here. For OpenSSH 4.7p1.||
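The peak throughput feature described in the table can be approximated with a sliding one-second window: keep recent (timestamp, bytes) samples, drop those older than the window, and track the maximum rate seen. This is a hypothetical Python sketch of that idea, not the patch's implementation; timestamps are passed in explicitly so the logic is easy to test.

```python
from collections import deque


class ThroughputMeter:
    """Hypothetical sketch: moving average over the last `window_s` seconds
    plus the peak of that average over the life of the transfer."""

    def __init__(self, window_s: float = 1.0) -> None:
        self.window_s = window_s
        self.samples = deque()  # (timestamp_s, nbytes), oldest first
        self.in_window = 0      # bytes currently inside the window
        self.peak_rate = 0.0    # highest windowed rate seen, bytes/s

    def record(self, t: float, nbytes: int) -> float:
        """Add `nbytes` written at time `t`; return the current rate in bytes/s."""
        self.samples.append((t, nbytes))
        self.in_window += nbytes
        # Evict samples that have fallen out of the window.
        while self.samples and self.samples[0][0] <= t - self.window_s:
            self.in_window -= self.samples.popleft()[1]
        rate = self.in_window / self.window_s
        self.peak_rate = max(self.peak_rate, rate)
        return rate


if __name__ == "__main__":
    meter = ThroughputMeter()
    for t, n in [(0.0, 1000), (0.5, 1000), (1.2, 500)]:
        meter.record(t, n)
    print(meter.peak_rate)  # 2000.0
```

In a real progress meter, record() would be called with time.monotonic() after each write, and peak_rate would be printed when the transfer completes.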
How to apply the patches:
- Get the OpenSSH source code from OpenSSH.org.
- Untar the OpenSSH source.
- cd into the OpenSSH source directory.
- If the patch is gzipped, type 'zcat pathtopatch/patchfile | patch -p1'
  Otherwise, type 'patch -p1 < pathtopatch/patchfile'
- Type './configure && make'
- Type 'make install' (usually as root)
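For repeated builds, the patching step can be scripted. The hypothetical helper below mirrors the two patch commands in the list: it detects gzip data by its magic bytes (so it works whether or not the file was already gunzipped), pipes the patch text into 'patch -p1' inside the source directory, and first runs GNU patch's --dry-run option to catch rejects before any file is touched. The script name and argument order are assumptions; configuring, building, and installing stay as shown above.

```python
from __future__ import annotations

import gzip
import subprocess
import sys
from pathlib import Path

GZIP_MAGIC = b"\x1f\x8b"  # first two bytes of every gzip stream


def read_patch(path: str | Path) -> bytes:
    """Return the patch text, gunzipping it first if it is gzip data
    (the same effect as `zcat patchfile`)."""
    raw = Path(path).read_bytes()
    return gzip.decompress(raw) if raw[:2] == GZIP_MAGIC else raw


def apply_patch(src_dir: str | Path, patch_path: str | Path, dry_run: bool = False) -> None:
    """Pipe the patch into `patch -p1` from the top of the OpenSSH source tree.
    Raises subprocess.CalledProcessError if patch exits non-zero."""
    cmd = ["patch", "-p1"]
    if dry_run:
        cmd.append("--dry-run")  # GNU patch: report what would happen, change nothing
    subprocess.run(cmd, input=read_patch(patch_path), cwd=src_dir, check=True)


if __name__ == "__main__":
    if len(sys.argv) != 3:
        print("usage: apply_hpn.py OPENSSH_SRC_DIR PATCHFILE")
    else:
        apply_patch(sys.argv[1], sys.argv[2], dry_run=True)  # check for rejects first
        apply_patch(sys.argv[1], sys.argv[2])
```

Older non-GNU patch utilities (see the Solaris note above) may not accept --dry-run.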
Problems with buffer_append_space in HPN-SSH: if you are experiencing disconnects due to a failure in buffer_append_space, please let us know. We're currently tracking some problems with this and are trying to gather more information to help resolve it. You may want to try using -oHPNBufferSize=16384 to restrict the growth of the buffer. Let us know if that helps.
|This work was made possible in part by grants from Cisco Systems, Inc., The National Science Foundation, and The National Library of Medicine|