Pittsburgh Supercomputing Center 

Advancing the state-of-the-art in high-performance computing,
communications and data analytics.

Enabling High Performance Data Transfers

Specific notes for Historical Operating Systems

The information on this page covers three groups of systems: obsolete operating systems that are no longer supported by the original vendor; systems for which our information is too far out of date and for which we have no access to current systems for testing; and, for consistency, current systems that also appear on the main performance tuning page. We will continue to keep archival information on historical systems as long as it appears to be useful to the community.

For operating systems that are still supported but do not appear on the main performance tuning page, please accept our apologies and drop us a note. We will be happy to update the main tuning page with fresh information.

For all other information (the TCP tuning tutorial, etc.), please see the main performance tuning page.

 

 

TCP Features Support by Various Operating Systems

Operating System (alphabetical) | RFC1191 Path MTU Discovery | RFC1323 Support | Default maximum socket buffer size | Default TCP socket buffer size | Default UDP socket buffer size | Applications (if any) which are user tunable | RFC2018 SACK Support
BSD/OS 2.0 | No | Yes | 256kB | 8kB | 9216 snd 41600 rcv | None | Hari Balakrishnan's BSD/OS 2.1 implementation
BSD/OS 3.0 | Yes | Yes | 256kB | 8kB | 9216 snd 41600 rcv | None |
CRI Unicos 8.0 | Yes | Yes | | | | FTP |
(Compaq) Digital Unix 3.2 | | Yes Winscale, No Timestamps | 128kB | 32kB | | None |
(Compaq) Digital Unix 4.0 | Yes | Yes Winscale, No Timestamps | 128kB | 32kB | 9216 snd 41600 rcv | None | PSC Research version
FreeBSD | Yes | Yes | 256kB | 32kB | 40kB | None | Yes
HPUX 9.X | No | 9.05 and 9.07 provide patches for RFC1323 | 1 MB (?) | 8kB | 9216 | FTP (with patches) |
HPUX 10.{00,01,10,20,30} | Yes | Yes | 256kB | 32kB | 9216 | FTP |
HPUX 11 | Yes | Yes | >31MB? | 32kB | 65535 | FTP |
IBM AIX 3.2 & 4.1 | No | Yes | 64kB | 16kB | 41600 bytes receive / 9216 bytes send | None |
IBM MVS TCP stack by Interlink, v2.0 or greater | No | Yes | 1MB | | | |
Linux 2.4 and 2.6 | Yes | Yes | 64kB | 32kB (see notes) | 32kB(?) | None | Yes
Mac OS X | Yes | Yes | 256kB | 32kB | 42kB (receive) | ftp (for a terminal shell) | Yes, as of 10.4.6
NetBSD 1.1/1.2 | No | Yes | 256kB | 16kB | | None | PSC Research version
FTP Software (NetManage) OnNet Kernel 4.0 for Win95/98 | Yes | Yes | 963.75 MB | 8K [146K for Satellite tuning] | 8K send 48K recv | FTP server | Yes
Novell Netware5 | Yes | No | 64kB | 31kB | | None |
SGI IRIX 6.5 | Yes | Yes | Unlimited | 60kB | 60kB | None | Yes, as of 6.5.7. It is on by default.
Sun Solaris 10 | Yes | Yes | 1MB TCP, 256kB UDP | 48kB | 8kB | Unknown | Yes
Microsoft Windows NT 3.5/4.0 | Yes | No | 64kB | max(~8kB, min(4*MSS, 64kB)) | | | No
Microsoft Windows NT 5.0 Beta | | Yes | | | | | Yes
Microsoft Win98 | | Yes | 1GB(?!) | 8kB | | | Yes (on by default)
Microsoft Windows 2000 | | Yes | 1GB(?!) | 8kB | | | Yes (on by default)

Detailed procedures for system tuning under various operating systems

Procedure for raising network limits under BSD/OS 2.1 and 3.0 (BSDi)

MTU discovery is now supported in BSD/OS 3.0. RFC1323 is also supported, and the procedure for setting the relevant kernel variable uses the "sysctl" interface described for FreeBSD. See sysctl(1) and sysctl(3) for more information.


Procedure for raising network limits on CRI systems under Unicos 8.0

System configuration parameters are tunable via the command "/etc/netvar". Running "/etc/netvar" with no arguments shows all configurable variables:

% /etc/netvar
Network configuration variables
        tcp send space is 32678
        tcp recv space is 32678
        tcp time to live is 60
        tcp keepalive delay is 14400
        udp send space is 65536
        udp recv space is 68096
        udp time to live is 60
        ipforwarding is on
        ipsendredirects is on
        subnetsarelocal is on
        dynamic MTU discovery is on
        adminstrator mtu override is on
        maximum number of allocated sockets is 3750
        maximum socket buffer space is 409600
        operator message delay interval is 5
        per-session sockbuf space limit is 0

The following variables can be set:

  • dynamic MTU discovery: This is "off" by default and should be changed to "on".
  • maximum socket buffer space: This should be set to the desired maximum socket buffer size (in bytes).
  • tcp send space, tcp recv space: These are the default buffer sizes used by applications. These should be changed with caution.

Once variables have been changed by /etc/netvar, they take effect immediately for new processes. Processes that are already running with open sockets are not modified.

 



Procedure for raising network limits on (Compaq) DEC Alpha systems under Digital Unix 3.2c

 

  • By default, the maximum allowable socket buffer size on this operating system is 128kB.
  • To raise this maximum, increase the kernel variable sb_max by running the following commands as root:
    # dbx -k /vmunix
    (dbx) assign sb_max = (u_long) 524288
    (dbx) patch sb_max = (u_long) 524288
    

    In this example, sb_max is increased to 512kB. The first command changes the variable for the running system, and the second command patches the kernel so it will continue to use the new value, even after rebooting the system. Note, however, that reinstalling (overwriting) the kernel will undo this change.

  • The Digital Unix manuals also recommend increasing mbclusters to at least 832.
  • Standard applications do not have a mechanism for setting the socket buffer size to anything but the default. However, you can change the kernel default by modifying the kernel variables tcp_sendspace and tcp_recvspace.

 

 


Procedure for raising network limits on (Compaq) DEC Alpha systems under Digital Unix 4.0

 

  • Under version 4.0 of Digital Unix, many variables can now be tuned with the sysconfig command. Some (but not all!) of the relevant variables from sysconfig are shown here:
    	    % /sbin/sysconfig -q inet
    	    inet:
    	    tcp_sendspace = 32768
    	    tcp_recvspace = 32768
    	    tcp_keepidle = 14400
    	    tcp_keepintvl = 150
    	    tcp_keepinit = 150
    	    tcp_keepcnt = 8
    	    tcp_ttl = 60
    	    tcp_mssdflt = 536
    	    tcp_rttdflt = 3
    	    tcp_dont_winscale = 0
    	    tcpnodelack = 0
    	    tcptwreorder = 1
    	    udp_sendspace = 9216
    	    udp_recvspace = 41600
    	    udpcksum = 1
    	    udp_ttl = 30
    	    pmtu_enabled = 1
    	    pmtu_rt_check_intvl = 20
    	    pmtu_decrease_intvl = 1200
    	    pmtu_increase_intvl = 240
    	    ...
    
    	    % /sbin/sysconfig -q socket
    	    socket:
    	    sominconn = 0
    	    somaxconn = 1024
    	    sb_max = 131072
    
    
    To make a change (for example):
                # /sbin/sysconfig -r inet tcp_sendspace 65536
                # /sbin/sysconfig -r inet tcp_recvspace 65536
    
  • Specific advice for tuning (Compaq) Digital UNIX systems (for both V4.0 releases and many of the V3.2x releases) may be found at http://www.unix.digital.com/internet/tuning.htm

    This document contains information on other important parameters (not just the ones directly associated with the socket, IP, and TCP layers) and gives instructions on how to modify things. It also includes important patch information, and is updated every few months.

 

 


Procedure for raising network limits under FreeBSD

All system parameters can be read or set with 'sysctl'. E.g.:

sysctl [parameter]
sysctl -w [parameter]=[value]

You can raise the maximum socket buffer size by, for example:

	sysctl -w kern.ipc.maxsockbuf=4000000

FreeBSD 7.0 implements automatic receive and send buffer tuning, both of which are enabled by default. The default maximum value is 256KB, which is likely too small. The maximums should probably be increased, e.g. as follows:

    net.inet.tcp.sendbuf_max=16777216
    net.inet.tcp.recvbuf_max=16777216

You can also set the TCP and UDP default buffer sizes using the variables

	net.inet.tcp.sendspace
	net.inet.tcp.recvspace
	net.inet.udp.recvspace

When using larger socket buffers, make sure that the TCP window scaling option is enabled. (It is not enabled by default!) Check for 'tcp_extensions="YES"' in /etc/rc.conf and confirm that it is enabled via the sysctl variable:

        net.inet.tcp.rfc1323

FreeBSD's TCP has a feature called "inflight limiting" turned on by default, which can be detrimental to TCP throughput in some situations. If you want "normal" TCP behavior, you should disable it:

         sysctl -w net.inet.tcp.inflight_enable=0

You may also want to confirm that SACK is enabled (working since FreeBSD 5.3):

        net.inet.tcp.sack.enable

MTU discovery is on by default in FreeBSD. If you wish to disable MTU discovery, you can toggle it with the sysctl variable:

        net.inet.tcp.path_mtu_discovery

Contributors: Pekka Savola and David Malone.
Checked for FreeBSD 7.0, Sept 2008


Procedure for raising network limits under HPUX 9.X

HP-UX 9.X does not support Path MTU discovery.

There are patches for 9.05 and 9.07 that provide 1323 support. To enable it, one must poke the kernel variables tcp_dont_tsecho and tcp_dont_winscale to 0 with adb (the patch includes a script, but I don't recall the patch number).

Without the 9.05/9.07 patch, the maximum socket buffer size is somewhere around 58254 bytes. With the patch it is somewhere around 1MB (there is a small chance it is as much as 4MB).

The FTP provided with the up to date patches should offer an option to change the socket buffer size. The default socket buffer size for this could be 32KB or 56KB.

There is no support for SACK in 9.X.

Procedure for raising network limits under HPUX 10.X

HP-UX 10.00, 10.01, 10.10, 10.20, and 10.30 support Path MTU discovery. It is on by default for TCP, and off by default for UDP. It can be toggled on or off with nettune.

Up through 10.20, RFC 1323 support is like the 9.05 patch, except the maximum socket buffer size is somewhere between 240 and 256KB. In other words, you need to do the same adb "pokes" as described above.

10.30 does not require adb "pokes" to enable RFC1323. 10.30 also replaces nettune with ndd. The 10.X default TCP socket buffer size is 32768; the default UDP socket buffer size remains unchanged from 9.X. Both can be tweaked with nettune.

FTP should be as it is in patched 9.X.

There is no support for SACK in 10.X up through 10.20.

Procedure for raising network limits under HPUX 11

HP-UX 11 supports PMTU discovery and enables it by default. This is controlled through the ndd setting ip_pmtu_strategy.

Note: Additional (extensive) information is available at ftp://ftp.cup.hp.com/dist/networking/briefs/annotated_ndd.txt

RFC 1323 support is enabled automagically in HP-UX 11. If an application requests a window/socket buffer size greater than 64 KB, window scaling and timestamps will be used automatically.

The default TCP window size in HP-UX 11 remains 32768 bytes and can be altered through ndd and the settings:

    tcp_recv_hiwater_def
    tcp_recv_hiwater_lfp
    tcp_recv_hiwater_lnp
    tcp_xmit_hiwater_def
    tcp_xmit_hiwater_lfp
    tcp_xmit_hiwater_lnp

FTP in HP-UX 11 uses the new sendfile() system call. This allows data to be sent directly from the filesystem buffer cache through the network without intervening data copies.

HP-UX 11 (patches) and 11i (patches or base depending on the revision) have commercial support for SACK (based on feedback from HP - Thanks!)

Here is some ndd -h parm output for a few of the settings mentioned above. For those not mentioned, use ndd -h on an HP-UX 11 system, or consult the online manuals at http://docs.hp.com/

# ndd -h ip_pmtu_strategy

ip_pmtu_strategy:

    Set the Path MTU Discovery strategy: 0 disables Path MTU
    Discovery; 1 enables Strategy 1; 2 enables Strategy 2.

    Because of problems encountered with some firewalls, hosts,
    and low-end routers, IP provides for selection of either
    of two discovery strategies, or for completely disabling the
    algorithm. The tunable parameter ip_pmtu_strategy controls
    the selection.

    Strategy 1: All outbound datagrams have the "Don't Fragment"
    bit set. This should result in notification from any intervening
    gateway that needs to forward a datagram down a path that would
    require additional fragmentation. When the ICMP "Fragmentation
    Needed" message is received, IP updates its MTU for the remote
    host. If the responding gateway implements the recommendations
    for gateways in RFC 1191, then the next hop MTU will be included
    in the "Fragmentation Needed" message, and IP will use it.
    If the gateway does not provide next hop information, then IP
    will reduce the MTU to the next lower value taken from a table
    of "popular" media MTUs.

    Strategy 2: When a new routing table entry is created for a
    destination on a locally connected subnet, the "Don't Fragment"
    bit is never turned on. When a new routing table entry for a
    non-local destination is created, the "Don't Fragment" bit is
    not immediately turned on. Instead,

    o  An ICMP "Echo Request" of full MTU size is generated and
       sent out with the "Don't Fragment" bit on.

    o  The datagram that initiated creation of the routing table
       entry is sent out immediately, without the "Don't Fragment"
       bit. Traffic is not held up waiting for a response to the
       "Echo Request".

    o  If no response to the "Echo Request" is received, the
       "Don't Fragment" bit is never turned on for that route;
       IP won't time-out or retry the ping. If an ICMP "Fragmentation
       Needed" message is received in response to the "Echo Request",
       the Path MTU is reduced accordingly, and a new "Echo Request"
       is sent out using the updated Path MTU. This step repeats as
       needed.

    o  If a response to the "Echo Request" is received, the
       "Don't Fragment" bit is turned on for all further packets
       for the destination, and Path MTU discovery proceeds as for
       Strategy 1.

    Assuming that all routers properly implement Path MTU Discovery,
    Strategy 1 is generally better - there is no extra overhead for the
    ICMP "Echo Request" and response. Strategy 2 is available
    only because some routers, or firewalls, or end hosts have been
    observed simply to drop packets that have the DF bit on without
    issuing the "Fragmentation Needed" message. Strategy 2 is more
    conservative in that IP will never fail to communicate when using
    it. [0,2] Default: Strategy 2

# ndd -h tcp_recv_hiwater_def | more

tcp_recv_hiwater_def:

    The maximum size for the receive window. [4096,-]
    Default: 32768 bytes

# ndd -h tcp_xmit_hiwater_def

tcp_xmit_hiwater_def:

    The amount of unsent data that triggers write-side flow control.
    [4096,-] Default: 32768 bytes

HP has detailed networking performance information online, including information about the "netperf" tool and a large database of system performance results obtained with netperf:

http://www.netperf.org/netperf/NetperfPage.html


Procedure for raising network limits on IBM RS/6000 systems under AIX 3.2 or AIX 4.1

RFC1323 options and defaults are tunable via the "no" command.

See the "no" man page for options; additional information is available in the IBM manual AIX Versions 3.2 and 4.1 Performance Tuning Guide, which is available on AIX machines through the InfoExplorer hypertext interface.


Procedure for raising network limits on IBM MVS systems under the Interlink TCP stack

The default send and receive buffer sizes are specified at startup, through a configuration file. The range is from 4K to 1MByte. The syntax is as follows:

  • TCP SCALE(4) - specifies to support window scaling of 4 bits. Range is 0 (suppress both window scaling and timestamps) to 14 bits.

    If SCALE is not zero, and the user bufferspace is > 65535, negotiating window scaling and timestamps will be attempted.

    If SCALE is not zero, and the remote user negotiates window scaling or timestamps, we will accept those requests.

  • FTP IBUF(4 20480) - would specify a receive bufferspace of 81920 bytes, and thus eligible for window scaling and timestamps.

FTP and user programs can be configured to use Window Scaling and Timestamps. This is done through the use of SITE commands:

  • QUOTE SITE IBUF(num size) - specifies the input bufferspace for file transfers. When the product is larger than 65535, negotiating window scaling and timestamps will be attempted (if SCALE is not zero).
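
The eligibility rule described above can be sketched as a small calculation. This is only an illustration of the arithmetic stated in the text (bufferspace is num times size, and negotiation is attempted when SCALE is non-zero and the bufferspace exceeds 65535); the function names are hypothetical and are not part of the Interlink product.

```python
def ibuf_bufferspace(num, size):
    """Bufferspace implied by IBUF(num size): the product of the two values."""
    return num * size


def scaling_attempted(scale_bits, num, size):
    """True if window scaling and timestamps would be negotiated.

    Follows the rule in the text: SCALE must be non-zero and the
    bufferspace must be larger than 65535 bytes.
    """
    if not 0 <= scale_bits <= 14:
        raise ValueError("SCALE must be in the range 0 to 14 bits")
    return scale_bits != 0 and ibuf_bufferspace(num, size) > 65535


if __name__ == "__main__":
    # The IBUF(4 20480) example from the text, with TCP SCALE(4).
    print(ibuf_bufferspace(4, 20480))
    print(scaling_attempted(4, 4, 20480))
```

With SCALE(0), the same IBUF setting would not trigger negotiation, since a zero scale suppresses both window scaling and timestamps.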

 


Tuning TCP for Linux 2.4 and 2.6

NB: Recent versions of Linux (version 2.6.17 and later) have full autotuning with 4 MB maximum buffer sizes. Except in some rare cases, manual tuning is unlikely to substantially improve the performance of these kernels over most network paths, and is not generally recommended.

Since autotuning and large default buffer sizes were released progressively over a succession of different kernel versions, it is best to inspect and only adjust the tuning as needed. When you upgrade kernels, you may want to consider removing any local tuning.

All system parameters can be read or set by accessing special files in the /proc file system. E.g.:

	cat /proc/sys/net/ipv4/tcp_moderate_rcvbuf

If the parameter tcp_moderate_rcvbuf is present and has value 1 then autotuning is in effect. With autotuning, the receiver buffer size (and TCP window size) is dynamically updated (autotuned) for each connection. (Sender side autotuning has been present and unconditionally enabled for many years now).
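
The same check can be scripted. The following is a minimal sketch, assuming a Linux host with the /proc file system mounted; the function name is hypothetical, and the three-way result (True, False, or None when the parameter is absent) mirrors the "present and has value 1" wording above.

```python
from pathlib import Path


def receive_autotuning_state(path="/proc/sys/net/ipv4/tcp_moderate_rcvbuf"):
    """Return True or False for the parameter value, or None if it is absent."""
    proc_file = Path(path)
    if not proc_file.exists():
        return None  # parameter not present on this kernel
    return proc_file.read_text().strip() == "1"


if __name__ == "__main__":
    messages = {
        True: "receiver autotuning is in effect",
        False: "receiver autotuning is turned off",
        None: "tcp_moderate_rcvbuf is not present",
    }
    print(messages[receive_autotuning_state()])
```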

The per-connection memory space defaults are set with two three-element arrays:

	/proc/sys/net/ipv4/tcp_rmem       - memory reserved for TCP rcv buffers
	/proc/sys/net/ipv4/tcp_wmem       - memory reserved for TCP snd buffers

These are arrays of three values: minimum, initial and maximum buffer size. They are used to set the bounds on autotuning and balance memory usage while under memory stress. Note that these are controls on the actual memory usage (not just TCP window size) and include memory used by the socket data structures as well as memory wasted by short packets in large buffers. The maximum values have to be larger than the BDP of the path by some suitable overhead.
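
As a rough illustration of the sizing rule above, the sketch below computes the bandwidth-delay product (BDP) for a path and compares it with the maximum value of a tcp_rmem-style triple. The function names, the example path (100 Mbit/s, 70 ms round trip), and the 25% overhead margin are assumptions chosen for illustration; the text only says the maximum must exceed the BDP "by some suitable overhead".

```python
def bdp_bytes(bandwidth_bits_per_sec, rtt_ms):
    """Bandwidth-delay product in bytes (integer arithmetic)."""
    return bandwidth_bits_per_sec * rtt_ms // 8000


def parse_triple(text):
    """Parse a 'minimum initial maximum' string as found in tcp_rmem or tcp_wmem."""
    minimum, initial, maximum = (int(field) for field in text.split())
    return minimum, initial, maximum


def maximum_covers_path(triple_text, bandwidth_bits_per_sec, rtt_ms, overhead=1.25):
    """True if the triple's maximum is at least BDP times the assumed overhead."""
    _, _, maximum = parse_triple(triple_text)
    return maximum >= bdp_bytes(bandwidth_bits_per_sec, rtt_ms) * overhead


if __name__ == "__main__":
    triple = "4096 87380 4194304"
    print(bdp_bytes(100_000_000, 70))
    print(maximum_covers_path(triple, 100_000_000, 70))
    print(maximum_covers_path(triple, 1_000_000_000, 70))
```

Under these assumptions a 4 MB maximum covers the 100 Mbit/s path but not a 1 Gbit/s path with the same round-trip time.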

With autotuning, the middle value just determines the initial buffer size. It is best to set it to some optimal value for typical small flows. With autotuning, excessively large initial buffers waste memory and can even hurt performance.

If autotuning is not present (Linux 2.4 before 2.4.27 or Linux 2.6 before 2.6.7), you may want to get a newer kernel. Alternatively, you can adjust the default socket buffer size for all TCP connections by setting the middle tcp_rmem value to the calculated BDP. This is NOT recommended for kernels with autotuning. Since the sending side is autotuned, this is never recommended for tcp_wmem.

The maximum buffer size that applications can request (the maximum acceptable values for SO_SNDBUF and SO_RCVBUF arguments to the setsockopt() system call) can be limited with /proc variables:

	/proc/sys/net/core/rmem_max       - maximum receive window
	/proc/sys/net/core/wmem_max       - maximum send window

The kernel sets the actual memory limit to twice the requested value (effectively doubling rmem_max and wmem_max) to provide for sufficient memory overhead. You do not need to adjust these unless you are planning to use some form of application tuning.

NB: Manually adjusting socket buffer sizes with setsockopt() disables autotuning. Applications that are optimized for other operating systems may implicitly defeat Linux autotuning.
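
The effect of an application-level request can be observed directly. The sketch below uses Python's standard socket module to request a receive buffer size and read back what the kernel granted; the helper name and the 65536-byte request are illustrative assumptions. On Linux the value read back is typically twice the request, capped by rmem_max, while other systems may report something different. Note that, as stated above, making this call on a real connection turns off autotuning for that socket.

```python
import socket


def granted_receive_buffer(requested_bytes):
    """Request SO_RCVBUF on a fresh TCP socket and return the size the kernel reports."""
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as sock:
        sock.setsockopt(socket.SOL_SOCKET, socket.SO_RCVBUF, requested_bytes)
        return sock.getsockopt(socket.SOL_SOCKET, socket.SO_RCVBUF)


if __name__ == "__main__":
    requested = 65536
    print("requested:", requested, "granted:", granted_receive_buffer(requested))
```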

The following values (which are the defaults for 2.6.17 with more than 1 GByte of memory) would be reasonable for all paths with a 4MB BDP or smaller (you must be root):

	echo 1 > /proc/sys/net/ipv4/tcp_moderate_rcvbuf
       	echo 108544 > /proc/sys/net/core/wmem_max 
       	echo 108544 > /proc/sys/net/core/rmem_max 
       	echo "4096 87380 4194304" > /proc/sys/net/ipv4/tcp_rmem
       	echo "4096 16384 4194304" > /proc/sys/net/ipv4/tcp_wmem

Do not adjust tcp_mem unless you know exactly what you are doing. This array (in units of pages) determines how the system balances the total network buffer space against all other LOWMEM memory usage. The three elements are initialized at boot time to appropriate fractions of the available system memory.

You do not need to adjust rmem_default or wmem_default (at least not for TCP tuning). These are the default buffer sizes for non-TCP sockets (e.g. unix domain and UDP sockets).

All standard advanced TCP features are on by default. You can check them by:

	cat /proc/sys/net/ipv4/tcp_timestamps
	cat /proc/sys/net/ipv4/tcp_window_scaling
	cat /proc/sys/net/ipv4/tcp_sack 

Linux supports both /proc and sysctl (using alternate forms of the variable names - e.g. net.core.rmem_max) for inspecting and adjusting network tuning parameters. The following is a useful shortcut for inspecting all tcp parameters:

sysctl -a | fgrep tcp

For additional information on kernel variables, look at the documentation included with your kernel source, typically in some location such as /usr/src/linux-<version>/Documentation/networking/ip-sysctl.txt. There is a very good (but slightly out of date) tutorial on network sysctl's at http://ipsysctl-tutorial.frozentux.net/ipsysctl-tutorial.html.

If you would like these changes to be preserved across reboots, you can add the tuning commands to the file /etc/rc.d/rc.local.

Autotuning was prototyped under the Web100 project. Web100 also provides complete TCP instrumentation and some additional features to improve performance on paths with very large BDP.

Contributors: John Heffner and Matt Mathis

Checked for Linux 2.6.18, 12/5/2006

Tuning TCP for Mac OS X

Mac OS X has a single sysctl parameter, kern.ipc.maxsockbuf, to set the maximum combined buffer size for both sides of a TCP (or other) socket. In general, it can be set to at least twice the BDP. E.g:

 sysctl -w kern.ipc.maxsockbuf=8000000 

The default send and receive buffer sizes can be set using the following sysctl variables:

 sysctl -w net.inet.tcp.sendspace=4000000
 sysctl -w net.inet.tcp.recvspace=4000000
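
The values in the commands above can be derived from the path characteristics. The following is a minimal sketch, assuming the "at least twice the BDP" rule for kern.ipc.maxsockbuf and, as a further assumption that simply mirrors the example values, default send and receive spaces equal to one BDP each. The function names and the example path (100 Mbit/s, 320 ms round trip) are hypothetical.

```python
def bdp_bytes(bandwidth_bits_per_sec, rtt_ms):
    """Bandwidth-delay product in bytes (integer arithmetic)."""
    return bandwidth_bits_per_sec * rtt_ms // 8000


def sysctl_commands(bandwidth_bits_per_sec, rtt_ms):
    """Build sysctl command lines sized for the given path."""
    bdp = bdp_bytes(bandwidth_bits_per_sec, rtt_ms)
    return [
        # Combined limit for both sides of the socket: twice the BDP.
        f"sysctl -w kern.ipc.maxsockbuf={2 * bdp}",
        # Assumption: default buffer sizes of one BDP each.
        f"sysctl -w net.inet.tcp.sendspace={bdp}",
        f"sysctl -w net.inet.tcp.recvspace={bdp}",
    ]


if __name__ == "__main__":
    for command in sysctl_commands(100_000_000, 320):
        print(command)
```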

If you would like these changes to be preserved across reboots you can edit /etc/sysctl.conf.

RFC1323 features are supported and on by default. SACK is present and enabled by default as of OS X version 10.4.6.

Although we have never tested it, there is a commercial product to tune TCP on Macintoshes. The URL is http://www.sustworks.com/products/prod_ottuner.html. I don't endorse the product they are selling (since I've never tried it). However, it is available for a free trial, and they appear to do an excellent job of describing perf-tune issues for Macs.

Tested for 10.3, MBM 5/15/05


Procedure for raising network limits under NetBSD

RFC1323 is on by default in NetBSD 1.1 and above. Under NetBSD 1.2, it can be verified to be on by typing:

      sysctl net.inet.tcp.rfc1323

The maximum socket buffer size can be modified by changing SB_MAX in /usr/src/sys/sys/socketvar.h.

The default socket buffer sizes can be modified by changing TCP_SENDSPACE and TCP_RECVSPACE in /usr/src/sys/netinet/tcp_usrreq.c.

It may also be necessary to increase the number of mbuf clusters, NMBCLUSTERS, in /usr/src/sys/arch/*/include/param.h.

Update: It is also possible to set these parameters in the kernel configuration file.

options		SB_MAX=1048576		# maximum socket buffer size
options		TCP_SENDSPACE=65536	# default send socket buffer size
options		TCP_RECVSPACE=65536	# default recv socket buffer size
options		NMBCLUSTERS=1024	# maximum number of mbuf clusters

Procedure for raising network limits under FTP Software (NetManage) OnNet 4.0 for Win95/98

OnNet Kernel has a check box, "Enable Satellite tuning", which was intended and tested for a 2Mb satellite link with 600ms delay. This sets the TCP window to 146K.

Many default settings, all of the above and more, may be overridden with registry entries. We plan to make tuning guidelines available at "some future time". The default TCP window may also be set with the Statistics app, which is installed with OnNet Kernel.

The product "readme" discusses changing TCP window size and Initial slow start threshold with the Windows registry.

Statistics also has interesting graphs of TCP/UDP/IP/ICMP traffic. In addition, the IPtrace app is shipped with OnNet Kernel to view unicast / multicast / broadcast traffic (no unicast traffic for other hosts, since it does not run in promiscuous mode).


Procedure for raising network limits on SGI systems under IRIX 6.5

Under this version, there are two locations where configuration is done. Although I list the BSD information first, SGI recommends using systune which is described below.

The BSD values are now stored in /var/sysgen/mtune/bsd.

For instance from the file:

* name                  default         minimum   maximum
*
* TCP window sizes/socket space reservation; limited to 1Gbyte by RFC 1323
*
tcp_sendspace                   61440   2048    1073741824
tcp_recvspace                   61440   2048    1073741824

These variables are used similarly to earlier IRIX 5 and 6 versions.

There is now a systune command. This command allows you to configure other networking variables. systune keeps track of the changes you make in a file called stune so that you can see them all in one place. Also note that changes made using systune are permanent. Here is a sample of things which can be tuned using systune:

/usr/sbin/systune (which is like sysctl for BSD) is what you use for
tuneable values.

 group: net_stp (statically changeable)
        stp_ttl = 60 (0x3c)
        stp_ipsupport = 0 (0x0)
        stp_oldapi = 0 (0x0)

 group: net_udp (dynamically changeable)
        soreceive_alt = 1 (0x1)
        arpreq_alias = 0 (0x0)
        udp_recvgrams = 2 (0x2)
        udp_sendspace = 61440 (0xf000)
        udp_ttl = 60 (0x3c)

 group: net_tcp (dynamically changeable)
        tcp_gofast = 0 (0x0)
        tcp_recvspace = 61440 (0xf000)
        tcp_sendspace = 61440 (0xf000)
        tcprexmtthresh = 3 (0x3)
        tcp_2msl = 60 (0x3c)
        tcp_mtudisc = 1 (0x1)
        tcp_maxpersistidle = 7200 (0x1c20)
        tcp_keepintvl = 75 (0x4b)
        tcp_keepidle = 7200 (0x1c20)
        tcp_ttl = 60 (0x3c)

 group: net_rsvp (statically changeable)
        ps_num_batch_pkts = 0 (0x0)
        ps_rsvp_bandwidth = 50 (0x32)
        ps_enabled = 1 (0x1)

 group: net_mbuf (statically changeable)
        mbretain = 20 (0x14)
        mbmaxpages = 16383 (0x3fff)

 group: net_ip (dynamically changeable)
        tcpiss_md5 = 0 (0x0)
        subnetsarelocal = 1 (0x1)
        allow_brdaddr_srcaddr = 0 (0x0)
        ipdirected_broadcast = 0 (0x0)
        ipsendredirects = 1 (0x1)
        ipforwarding = 1 (0x1)
        ipfilterd_inactive_behavior = 1 (0x1)
        icmp_dropredirects = 0 (0x0)

 group: network (statically changeable)
        netthread_float = 0 (0x0)

 group: inpcb (statically changeable)
        udp_hashtablesz = 2048 (0x800)
        tcp_hashtablesz = 8184 (0x1ff8)

Changes made using systune may or may not require a reboot. This can be easily determined by looking at the 'group' heading for each section of tunables. If the group heading says dynamic, changes can be made on the fly. Group headings labelled static require a reboot.
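
The rule above is easy to apply mechanically to saved systune output. The sketch below is a hypothetical helper (not an SGI tool) that reads text in the format shown in the listing and reports, for each tunable, whether its group is statically changeable and therefore needs a reboot. The regular expressions assume the exact "group: name (statically|dynamically changeable)" and "name = value (0xhex)" layout of the sample.

```python
import re

GROUP_RE = re.compile(r"group:\s+(\S+)\s+\((statically|dynamically) changeable\)")
VARIABLE_RE = re.compile(r"(\w+)\s*=\s*\d+")


def reboot_required_by_variable(systune_output):
    """Map each tunable name to True if its group is statically changeable."""
    result = {}
    group_is_static = None
    for line in systune_output.splitlines():
        group = GROUP_RE.search(line)
        if group:
            group_is_static = group.group(2) == "statically"
            continue
        variable = VARIABLE_RE.match(line.strip())
        if variable and group_is_static is not None:
            result[variable.group(1)] = group_is_static
    return result


# A short excerpt in the same format as the listing above.
SAMPLE = """\
 group: net_tcp (dynamically changeable)
        tcp_recvspace = 61440 (0xf000)
        tcp_sendspace = 61440 (0xf000)

 group: net_mbuf (statically changeable)
        mbretain = 20 (0x14)
        mbmaxpages = 16383 (0x3fff)
"""

if __name__ == "__main__":
    for name, needs_reboot in reboot_required_by_variable(SAMPLE).items():
        print(name, "requires a reboot" if needs_reboot else "can be changed on the fly")
```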

Finally, the tcp_sendspace and tcp_recvspace can be tuned on a per-interface basis using the rspace and sspace options to ifconfig.

SACK: As of 6.5.7, SACK is included in the IRIX operating system and is on by default.


Procedure for raising network limits under Solaris

All system TCP parameters are set with the 'ndd' tool (man 1 ndd). Parameter values can be read with:

  ndd /dev/tcp [parameter]

and set with:

  ndd -set /dev/tcp [parameter] [value]

RFC1323 timestamps, window scaling and RFC2018 SACK should be enabled by default. You can double check that these are correct:

  ndd /dev/tcp tcp_wscale_always  #(should be 1)
  ndd /dev/tcp tcp_tstamp_if_wscale  #(should be 1)
  ndd /dev/tcp tcp_sack_permitted  #(should be 2)

Set the maximum (send or receive) TCP buffer size an application can request:

  ndd -set /dev/tcp tcp_max_buf 4000000

Set the maximum congestion window:

  ndd -set /dev/tcp tcp_cwnd_max 4000000

Set the default send and receive buffer sizes:

  ndd -set /dev/tcp tcp_xmit_hiwat 4000000
  ndd -set /dev/tcp tcp_recv_hiwat 4000000

Contributors: John Heffner (PSC), Nicolas Williams (Sun Microsystems, Inc)

Checked for Solaris 10.?, 4/12/06


Misc Info about Windows NT

Editor's note: See the Windows 98 section below for a detailed description of how this all works. In NT land, the Registry Editor is called regedt32.

Any Registry Values listed appear in:
	HKEY_LOCAL_MACHINE\SYSTEM\CurrentControlSet\Services\Tcpip\Parameters

Receive Window
	maximum value = 64kB, since window scaling is not supported
	default value = min( max( 4 x MSS, 
				  8kB rounded up to nearest multiple of MSS),
			     64kB) 
	Registry Value: 
		TcpWindowSize
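
The default receive window formula above can be worked through numerically. This is a sketch of the stated formula only, not of the actual NT implementation; the function name is hypothetical, and "64kB" is taken here to mean 65535 bytes, the largest window that fits in the unscaled 16-bit TCP window field (an assumption).

```python
def nt_default_receive_window(mss):
    """Evaluate min(max(4 x MSS, 8kB rounded up to a multiple of MSS), 64kB)."""
    # Ceiling division: smallest multiple of MSS that is at least 8192 bytes.
    eight_kb_rounded_up = -(-8192 // mss) * mss
    return min(max(4 * mss, eight_kb_rounded_up), 65535)


if __name__ == "__main__":
    for mss in (536, 1460, 4352):
        print(mss, nt_default_receive_window(mss))
```

For an Ethernet MSS of 1460 bytes the rounding term dominates (six segments), while for larger MSS values the 4 x MSS term takes over.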

Path MTU Discovery Variables:
		EnablePMTUDiscovery	(default = enabled)
			turn on/off path MTU discovery
		EnablePMTUBHDetect	(default = disabled)
			turn on/off Black Hole detection

Using Path MTU Discovery:
        EnablePMTUDiscovery     REG_DWORD
	Range: 0 (false) or 1 (true)
	Default: 1

    Determines whether TCP uses a fixed, default maximum transmission unit
    (MTU) or attempts to find the actual MTU. If the value of this entry is
    0, TCP uses an MTU of 576 bytes for all connections to computers outside
    of the local subnet. If the value of this entry is 1, TCP attempts to
    discover the MTU (largest packet size) over the path to a remote host.

Using Path MTU Discovery's "Blackhole Detection" algorithm:
        EnablePMTUBHDetect     REG_DWORD
	Range: 0 (false) or 1 (true)
	Default: 0 

    If the value of this entry is 1, TCP tries to detect black hole routers
    while doing Path MTU Discovery. TCP will try to send segments without
    the Don't Fragment bit set if several retransmissions of a segment go
    unacknowledged. If the segment is acknowledged as a result, the MSS will
    be decreased and the Don't Fragment bit will be set in future packets on
    the connection.

I received the following additional notes about the Windows TCP implementation.

PMTU Discovery. If PMTU is turned on, NT 3.1 cannot cope with routers that have the BSD 4.2 bug (see RFC 1191, section 5). It loops resending the same packet. Only confirmed on NT 3.1.


Procedure for raising network limits under Microsoft Windows 98

New: Some folks at NLANR/MOAT in SDSC have written a tool to guide you through some of this stuff. It can be found at http://moat.nlanr.net/Software/TCPtune/.

Even newer: I've updated some sending window information that was inaccurate. See below.

Several folks have recently helped me to figure out how to accomplish the necessary tuning under Windows98, and the features do appear to exist and work. Thanks to everyone for the assistance! The new description below should be useful to even the complete Windows novice (such as me :-).

Windows98 includes implementations of RFC1323 and RFC2018. Both are on by default. (However, with a default buffer size of only about 8kB, window scaling doesn't do much.)

Windows stores the tuning parameters in the Windows Registry. In the registry are settings to toggle on/off Large Windows, Timestamps, and SACK. In addition, default socket buffer sizes can be specified in the registry.

In order to modify registry variables, do the following steps:

  1. Click on Start -> Run and then type in "regedit". This will fire up the Registry Editor.
  2. In the Registry Editor, double click on the appropriate folders to walk the tree to the parameter you wish to modify. For the parameters below, this means clicking on HKEY_LOCAL_MACHINE -> System -> CurrentControlSet -> Services -> VxD -> MSTCP.
  3. Once there, you should see a list of parameters in the right half of your screen, and MSTCP should be highlighted in the left half. The parameters you wish to modify will probably not appear in the right half of your screen; this is OK.
  4. In the menu bar, Click on "Edit -> New -> String Value". It is important to create the parameter with the correct type. All of the parameters listed below are strings.
  5. A box will appear with "New Value #1"; change the name to the name listed below, exactly as shown. Hit return.
  6. On the menu, click on "Edit -> Modify" (your new entry should still be selected). Then type in the value you wish to assign to the parameter.
  7. Exit the registry editor, and reboot Windows. (The rebooting is important, *sigh*.)
  8. When your system comes back up, you should have access to the features you have just turned on. The only real way to verify this is through packet traces (or by noticing a significant performance improvement).
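If you need to make the same changes on several machines, the manual steps above could in principle be replaced by importing a registry file. The following is a minimal sketch that only generates the text of such a file. It assumes the REGEDIT4 text format used by the Windows 9x Registry Editor, and the helper name make_reg_file is hypothetical. I have not tested the result on a real Windows 98 system, and values containing quotes or backslashes would need escaping that this sketch does not do.

```python
# Key path from step 2 above; all parameters there are string values.
MSTCP_KEY = r"HKEY_LOCAL_MACHINE\System\CurrentControlSet\Services\VxD\MSTCP"

def make_reg_file(values, key=MSTCP_KEY):
    """Return the text of a REGEDIT4-format file setting string values under key."""
    lines = ["REGEDIT4", "", f"[{key}]"]
    for name, value in values.items():
        # Written as "name"="value", i.e. a string value as in step 4.
        lines.append(f'"{name}"="{value}"')
    # Registry files conventionally use CR/LF line endings.
    return "\r\n".join(lines) + "\r\n"

if __name__ == "__main__":
    print(make_reg_file({"Tcp1323Opts": "3", "SackOpts": "1"}))
```

A reboot is still needed after importing such a file, just as in step 7.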

TCP/IP Stack Variables

Support for TCP Large Windows (TCPLW)

Win98 TCP/IP supports TCP large windows as documented in RFC 1323. TCP large windows can be used for networks that have large bandwidth delay products such as high-speed trans-continental connections or satellite links. Large windows support is controlled by a registry key value in:

HKLM\system\currentcontrolset\services\VXD\MSTCP

The registry key Tcp1323Opts is a string value type. The values for Tcp1323Opts are

Value  Meaning
0      No Window scaling and Timestamp options
1      Window scaling but no Timestamp options
3      Window scaling and Timestamp options

The default value for Tcp1323Opts is 3: Window Scaling and Timestamp options. Large window support is enabled if an application requests a Winsock socket to use buffer sizes greater than 64K. The current default value for TCP receive window size in Memphis TCP is 8196 bytes. In previous implementations the TCP window size was limited to 64K; TCP large window support raises this limit to 2**30.
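To get a feel for how large a window a given path needs, the sketch below computes the bandwidth delay product and the smallest RFC 1323 window scale shift that covers it. The function names and the example path (100 Mbit/s, 70 ms round trip) are assumptions for illustration. The shift count is capped at 14 by RFC 1323, which is where the roughly 2**30 limit comes from.

```python
def bandwidth_delay_product(bits_per_second, rtt_seconds):
    """Bytes that must be in flight to keep the path full."""
    return round(bits_per_second * rtt_seconds / 8)

def window_scale_shift(window_bytes):
    """Smallest shift (0..14) whose scaled 16-bit window covers window_bytes."""
    for shift in range(15):
        if (65535 << shift) >= window_bytes:
            return shift
    raise ValueError("window exceeds the RFC 1323 maximum of 65535 << 14")

# Assumed example path: 100 Mbit/s with a 70 ms round-trip time.
bdp = bandwidth_delay_product(100e6, 0.070)
print("bandwidth delay product:", bdp, "bytes")
print("window scale shift needed:", window_scale_shift(bdp))
```

Any path whose bandwidth delay product exceeds 65535 bytes needs a non-zero shift, which matches the statement above that large window support comes into play for buffer sizes greater than 64K.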

Support for Selective Acknowledgements (SACK)

Win98 TCP supports Selective Acknowledgements as documented in RFC 2018. Selective acknowledgements allow TCP to recover from IP packet loss without resending packets that were already received by the receiver. Selective Acknowledgements is most useful when employed with TCP large windows. SACK support is controlled by a registry key value in:

HKLM\system\currentcontrolset\services\VXD\MSTCP

The registry key SackOpts is a string value type. The values for SackOpts are

Value  Meaning
0      No SACK options
1      SACK option enabled
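For readers unfamiliar with what a receiver actually reports when SACK is enabled, here is a simplified sketch of how SACK blocks could be derived from out-of-order data. It is not the Windows implementation: the function name is hypothetical, sequence number wraparound is ignored, and the RFC 2018 rule that the most recently received block is reported first is left out. Ranges are half-open, so the right edge is the sequence number just past the last byte received.

```python
def sack_blocks(cumulative_ack, received, max_blocks=4):
    """Merge received (start, end) byte ranges above cumulative_ack into blocks.

    Assumes cumulative_ack has already been advanced past any in-order data.
    """
    blocks = []
    for start, end in sorted(received):
        if end <= cumulative_ack:
            continue  # already covered by the cumulative ACK
        start = max(start, cumulative_ack)
        if blocks and start <= blocks[-1][1]:
            # Overlaps or touches the previous block: extend it.
            blocks[-1] = (blocks[-1][0], max(blocks[-1][1], end))
        else:
            blocks.append((start, end))
    # A TCP option field has room for at most 4 SACK blocks.
    return blocks[:max_blocks]

# Bytes 1000-1999 and 4000-4999 are missing; everything else arrived.
print(sack_blocks(1000, [(2000, 3000), (3000, 4000), (5000, 6000)]))
```

The sender can then retransmit only the two missing ranges instead of everything after byte 1000.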

Support for Fast Retransmission and Fast Recovery

Win98 TCP/IP supports Fast Retransmission and Fast Recovery of TCP connections that are encountering IP packet loss in the network. These mechanisms allow a TCP sender to quickly infer a single packet loss by reception of duplicate acknowledgements for a previously sent and acknowledged TCP/IP packet. This mechanism is useful when the network is intermittently congested. The reception of 3 (default value) successive duplicate acknowledgements indicates to the TCP sender that it can resend the last unacknowledged TCP/IP packet (fast retransmit) and not go into TCP slow start due to a single packet loss (fast recovery). Fast Retransmission and Recovery support is controlled by a registry key value in:

HKLM\system\currentcontrolset\services\VXD\MSTCP\Parameters

The registry key MaxDupAcks is a DWORD taking integer values from 2 to N. If MaxDupAcks is not defined, the default value is 3.
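The role of the MaxDupAcks threshold can be illustrated with a small sketch of the sender-side duplicate ACK counter. This is a simplified model, not the Windows code: the function name is hypothetical, and it only reports where a fast retransmit would be triggered, without modeling the congestion window changes of fast recovery.

```python
def fast_retransmit_points(acks, max_dup_acks=3):
    """Return the ACK numbers at which a sender would fast-retransmit.

    acks is the sequence of cumulative ACK numbers seen by the sender.
    """
    retransmits = []
    last_ack = None
    dup_count = 0
    for ack in acks:
        if ack == last_ack:
            dup_count += 1
            if dup_count == max_dup_acks:
                # Threshold reached: resend the segment starting at ack.
                retransmits.append(ack)
        else:
            # New data acknowledged: reset the duplicate counter.
            last_ack = ack
            dup_count = 0
    return retransmits

# The segment starting at 2000 is lost; later segments produce duplicate ACKs.
print(fast_retransmit_points([1000, 2000, 2000, 2000, 2000, 3000]))
```

A lower threshold reacts to loss sooner but is more likely to mistake simple packet reordering for loss.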

Update: If you wish to set the default receiver window for applications, you should set the following key:

DefaultRcvWindow

HKLM\system\currentcontrolset\services\VXD\MSTCP

DefaultRcvWindow is a string type and the value describes the default receive window size for the TCP stack. Otherwise the window size has to be set in each application with setsockopt.
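For applications that set the window themselves, the call looks roughly like the sketch below. It uses Python's standard socket module rather than the Winsock C API, and the helper name and the 256 kB request are assumptions for illustration. The operating system may grant a different size than requested, so the sketch reads the value back.

```python
import socket

def open_socket_with_rcvbuf(size_bytes):
    """Create a TCP socket and request a receive buffer of size_bytes."""
    sock = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    # Request the buffer before connect() or listen(): the window scale
    # option is negotiated only when the connection is set up.
    sock.setsockopt(socket.SOL_SOCKET, socket.SO_RCVBUF, size_bytes)
    return sock

sock = open_socket_with_rcvbuf(256 * 1024)
granted = sock.getsockopt(socket.SOL_SOCKET, socket.SO_RCVBUF)
print("requested 262144 bytes, system reports", granted)
sock.close()
```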

For a long time, I had the following sentence on this page:

  • I presume that there is also a DefaultSndWindow that you would want to use on servers sending data to get higher performance. I have not yet verified this, however.

 

It turns out that there is not in fact such a variable. My limited experience has shown that, in some cases, it is possible to see very large send windows from Microsoft boxes. However, recent reports on the tcpsat mailing list have also stated that a number of applications under Windows severely limit the sending window. These applications appear to include FTP and possibly also the CIFS protocol which is used for file sharing. With these applications, it appears to be impossible to exceed the performance limit dictated by this sending window.

If anyone has any further information on these specific applications under Windows, I would be happy to include it here.


Procedure for raising network limits under Microsoft Windows 2000

New: The following URL: http://rdweb.cns.vt.edu/public/notes/win2k-tcpip.htm appears to be a pretty good summary of the procedure for TCP tuning under Windows 2000. It also has the URL for the Windows 2000 TCP tuning document from Microsoft.

We are not sure if it is still necessary to set DefaultReceiveWindow even after setting the parameters indicated in the URL above.

If your machine does a lot of large outbound transfers, it will be necessary to set DefaultSendWindow in addition to the suggestions mentioned above.


Matt Mathis and Raghu Reddy
(with help from many others, especially Jamshid Mahdavi)