High Performance Enabled SSH/SCP In Depth

Chris Rapier PSC, Michael Stevens CMU

Abstract: SCP and the underlying SSH protocol are limited in network performance by statically defined internal flow control buffers. These buffers often act as a brake on the network throughput of SCP, especially on long and wide paths. Modifying the ssh code to allow the flow control buffers to be defined at run time eliminates this bottleneck.


High bandwidth, high latency links are becoming more prevalent in corporate and academic institutions. Applications that use windowing therefore need to ensure that the window size is at least equal to the Bandwidth Delay Product, or BDP, if they are to obtain maximum utilization of the link. The BDP is the product of the bandwidth of the narrowest portion of the network path and the round trip delay time, and represents the total data carrying capacity of the path. For TCP it is already possible to tune the window size manually or to use an autotuning mechanism, such as the Web100 Linux kernel patch, to ensure maximum throughput. However, when applications above the TCP layer implement their own windowing, throughput is limited by the lesser of the TCP window and the application window. In OpenSSH the limitation comes from the static window sizes that appear in channels.h as defined values.
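
As a rough illustration of this reasoning, the sketch below computes the BDP of a path and the throughput ceiling that a fixed window imposes on it. The function names and the 65,536 byte application window are assumptions made for the example and are not taken from OpenSSH; the 1Gbps bandwidth, 0.04 second delay and 10,000,000 byte TCP window are the values of the test path described in this document.

```python
def bdp_bytes(bandwidth_bps: float, rtt_seconds: float) -> float:
    """Bandwidth delay product in bytes: bottleneck bandwidth times round trip time."""
    return bandwidth_bps * rtt_seconds / 8


def window_limited_throughput(window_bytes: float, rtt_seconds: float) -> float:
    """Throughput ceiling in bytes/s when at most one window is in flight per round trip."""
    return window_bytes / rtt_seconds


def effective_throughput(tcp_window: float, app_window: float,
                         bandwidth_bps: float, rtt_seconds: float) -> float:
    """Throughput is capped by the smaller of the two windows, and by the link itself."""
    limiting_window = min(tcp_window, app_window)
    return min(window_limited_throughput(limiting_window, rtt_seconds),
               bandwidth_bps / 8)


if __name__ == "__main__":
    link_bps = 1_000_000_000   # 1Gbps bottleneck
    rtt = 0.04                 # 0.04 second round trip delay
    tcp_window = 10_000_000    # tuned TCP window, in bytes
    app_window = 65_536        # hypothetical static application window, in bytes

    print(f"BDP: {bdp_bytes(link_bps, rtt):,.0f} bytes")
    print(f"Ceiling with the static window: "
          f"{effective_throughput(tcp_window, app_window, link_bps, rtt):,.0f} bytes/s")
```

Under these assumptions the application window, not the TCP window or the link, sets the ceiling, since it is far smaller than the 5,000,000 byte BDP.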


Simply changing the static size to a larger value would waste space whenever it is larger than the underlying protocol’s window size. Asking the user to specify the size would require users to be knowledgeable in network performance tuning. The ideal solution is to make the window large enough that it no longer limits throughput, but not much larger than is needed to obtain the desired performance.

Only two changes were needed to adjust the SSH window based on the TCP window. The first was to enable the buffer code to allocate larger sizes. This was done by replacing the constant that set the maximum size allowed by the buffer code with a variable, and by adding a function to raise the variable’s default value. The second change was to get the TCP window size from getsockopt and adjust the SSH window size to match, but only if the new size was larger than the old one. The value returned from getsockopt is also doubled, because OpenSSH only sends a WINDOW_ADJUST message when the window is half full; this reduces the number of WINDOW_ADJUST messages sent at the cost of doubling the buffer size.
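
The two changes can be sketched as follows. This is a minimal Python model of the logic, not the patch itself, which modifies the OpenSSH C source. The ChannelWindow class, its method names and the initial sizes are hypothetical, and reading the TCP window through the SO_RCVBUF socket option is an assumption, since the text only says that getsockopt is used.

```python
import socket


class ChannelWindow:
    """Hypothetical stand-in for an SSH channel's flow control state."""

    def __init__(self, window_bytes: int, buffer_max_bytes: int):
        self.window_bytes = window_bytes          # current SSH window size
        self.buffer_max_bytes = buffer_max_bytes  # a variable, no longer a fixed constant

    def raise_buffer_max(self, new_max: int) -> None:
        """First change: let the buffer ceiling grow at run time (it is never lowered)."""
        if new_max > self.buffer_max_bytes:
            self.buffer_max_bytes = new_max

    def grow_to_match(self, tcp_window_bytes: int) -> bool:
        """Second change: grow the SSH window to twice the TCP window.

        The window is only ever enlarged. Returns True if it was changed.
        """
        target = 2 * tcp_window_bytes  # doubled, as described in the text
        if target <= self.window_bytes:
            return False
        self.raise_buffer_max(target)
        self.window_bytes = target
        return True


def tcp_window_bytes(sock: socket.socket) -> int:
    """Read the socket's receive buffer size (assumed here to be SO_RCVBUF)."""
    return sock.getsockopt(socket.SOL_SOCKET, socket.SO_RCVBUF)
```

With a connected socket sock and a channel chan, a caller would run chan.grow_to_match(tcp_window_bytes(sock)). A TCP window smaller than half the current SSH window leaves the channel unchanged.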


The following hosts were used in the performance tests. kirana was running a 2.6 Linux kernel with the Web100 patch. tg-login was running a 2.6 kernel without autotuning, but with a TCP window size of 10,000,000 bytes. The BDP of a 1Gbps link with a 0.04 second delay is 40,000,000 bits, or 5,000,000 bytes. A 300MB file was copied from /dev/shm on one machine to /dev/null on the other.


  • kirana.psc.edu
    • Dual PIII 1.0GHz (Coppermine)
    • 1GB RAM
    • Gigabit Ethernet
  • tg-login.ncsa.teragrid.org
    • Quad Itanium2 1.3GHz
    • 8GB RAM
    • Gigabit Ethernet

Traceroute log:

 1  bar-kirana-ge-0-2-0-0.psc.net      9.452 ms   0.204 ms
 2  beast-bar-g4-0-1.psc.net           0.099 ms   0.094 ms
 3  abilene-psc.abilene.ucaid.edu      9.792 ms   9.805 ms
 4  nycmng-washng.abilene.ucaid.edu   14.036 ms  14.138 ms
 5  chinng-nycmng.abilene.ucaid.edu   41.711 ms  34.326 ms
 6  mren-chin-ge.abilene.ucaid.edu    34.466 ms  34.417 ms
 7  sbr0-lsd6509.gw.ncsa.edu          36.949 ms  36.920 ms
 8  acb-2-vlan101.gw.ncsa.edu         36.957 ms  36.943 ms
 9  core-10-acb-2.gw.ncsa.edu         36.965 ms  36.958 ms
10  hg-core-core-10.gw.ncsa.edu       38.866 ms  38.312 ms
11  hg-1-hg-core.ncsa.teragrid.org    39.187 ms  38.340 ms
12  tg-login1.ncsa.teragrid.org       36.959 ms  36.950 ms

Unmodified SCP Performance

Graph of Normal SCP
1.8MB/s

Modified SCP Performance

Graph of Modified SCP
12.2MB/s


The tests showed that throughput increased dramatically, and that the limitation was no longer the TCP or SSH window size but the ability of the host to encrypt fast enough to fill the Gigabit Ethernet link. This is clearly demonstrated by the vast performance difference between 3des-cbc, the slowest cipher, and arcfour, the fastest cipher.

Security implications

There are no security implications that we know of, with the following caveat: the none cipher in the hpn+none patch is experimental, and you use it at your own risk. Using it via the -z switch in scp will transfer your bulk data in the clear, even though your authentication is encrypted. This should, naturally, be seen as riskier than transferring data with an encryption cipher. Also, while we did our best to make sure that the none cipher can only be used to transfer bulk data via scp, it may be possible to run an interactive session with the none cipher (see the note of 15 January 2005). We're investigating this, but we think this situation is unlikely. If this concerns you, use the approved non-experimental hpn patch.

This work was made possible in part by grants from the National Science Foundation and Cisco Systems, Inc.