In ATOMIC-2, security focuses on fast authentication.
Stand-alone authentication algorithm performance
This section compares the performance of various authentication algorithms running on in-memory data. ATOMIC-2 is investigating mechanisms for high-speed encryption and authentication. We have investigated the following authentication mechanisms and have not found one suited to 640 Mbps implementation, nor one whose software implementation can keep pace with IP over ATOMIC speeds (140 Mbps).
- MD5
- MD5 is an authentication algorithm proposed as a “required option” for the next IP (IPv6).
- We measured the performance of MD5 on various platforms:
We measured the reference implementation of MD5 as provided in its RFC.
For comparison, we also show UDP/IP application bandwidth on a Sun SPARC-10/51 (100 Mbps) and SPARC-20/71 (120 Mbps). In each case, MD5 runs at one-third to one-half of the speed required to keep pace with IP.
The code was measured with caching disabled for Raw and Optimized. The Optimized code was also measured with external caching and internal caching enabled.
Here are some performance results. The modified assembly code is available here.
- Without rearrangement (MD5-Opt)
- md5 -c 60 -l 100000 -t -s -r -d
MD5 time trial. Digesting 60 100000-byte blocks (no reordering required) … done
Digest = 36023ea6e959c9e678702bdba0f95ff8
Time = 1.01703 U : 0.001894 S :: 1.01892 seconds
Speed = 5.88858e+06 bytes/second, 4.71086e+07 bits/sec
minflt 0 majflt 0 nswap 0 nvcsw 0 nivcsw 11
MD5 (“”) = d41d8cd98f00b204e9800998ecf8427e
- After rearrangement
- md5 -c 60 -l 100000 -t -s -r -d
MD5 time trial. Digesting 60 100000-byte blocks (no reordering required) … done
Digest = 36023ea6e959c9e678702bdba0f95ff8
Time = 0.478771 U : 0 S :: 0.478771 seconds
Speed = 1.25321e+07 bytes/second, 1.00257e+08 bits/sec
minflt 0 majflt 0 nswap 0 nvcsw 0 nivcsw 5
MD5 (“”) = d41d8cd98f00b204e9800998ecf8427e
- Cache modifications (mddriver.c)
- -l num : specify length of test block (used with -t, was DEFINE’d)
- -c num : specify block repeat count (used with -t, was DEFINE’d)
- -s : skip initialization of test block (to avoid first-touch of data)
- -r : pseudo-random test block init (determines data-dependent perf)
- -d : double-buffer test block (switch-off – forces data out of the cache)
- Performance optimizations (md5c.c)
- use memset() and memcpy() (as suggested)
- force state variables into registers
- avoid Decode() on little-endian machines (Intel ix86, DEC Alpha)
- avoid block copy on little-endian machines (Intel ix86, DEC Alpha)
- unroll swap loop in Decode()
- use optimized byte reordering code (C code that compiles better)
- Other changes
- replace time() with getrusage()
- change block length to 1M from 1K
- print bits/sec
- change LEN and COUNT in status print – they were incorrect
- Alternate Hash Algorithm (AHA)
The Alternate Hash Algorithm is designed as a replacement for MD5 in IPv6. AHA runs over twice as fast as MD5.
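The "avoid Decode()" item in the md5c.c optimization list above can be illustrated with a short sketch. The portable routine below mirrors the byte-by-byte Decode() of the RFC 1321 reference code; the fast path is only an assumption about how such an optimization might look, not the modified ATOMIC-2 code. The names decode_portable, decode_fast, and host_is_little_endian are hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Portable decode, equivalent to Decode() in the RFC 1321 reference code:
 * each 32-bit word is assembled from four input bytes, least significant
 * byte first. len must be a multiple of 4. */
static void decode_portable(uint32_t *out, const unsigned char *in, size_t len)
{
    size_t i, j;

    assert(len % 4 == 0);
    for (i = 0, j = 0; j < len; i++, j += 4)
        out[i] = (uint32_t)in[j]
               | ((uint32_t)in[j + 1] << 8)
               | ((uint32_t)in[j + 2] << 16)
               | ((uint32_t)in[j + 3] << 24);
}

/* Hypothetical helper: runtime byte-order probe. Real code would more
 * likely select the path at compile time. */
static int host_is_little_endian(void)
{
    const uint32_t probe = 1;
    unsigned char first;

    memcpy(&first, &probe, 1);
    return first == 1;
}

/* Sketch of the optimization: on a little-endian host the input bytes
 * already have the in-memory layout of the decoded words, so the per-byte
 * shifts can be skipped. memcpy() is used here to stay clear of alignment
 * problems; code that also avoids the block copy would read the input
 * buffer in place instead. */
static void decode_fast(uint32_t *out, const unsigned char *in, size_t len)
{
    assert(len % 4 == 0);
    if (host_is_little_endian())
        memcpy(out, in, len);
    else
        decode_portable(out, in, len);
}
```

Both routines produce identical words on any host; only the amount of per-byte work differs, which is why the savings apply to little-endian machines alone.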
Performance of Authentication in IPv4
This section compares the performance of various algorithms in IPv4 in a SunOS 4.1.3 kernel. It also includes an analysis of the components of the IP AH processing itself, i.e., header processing, data-touching overheads, etc.
We compare the following algorithms: MD5, MD5-optimized, AHA, the Internet checksum, a null checksum (to measure data-touching overheads), MD5-opt in network-standard byte order (to measure the overhead of endian reordering), Phil Rogaway’s AH (alternate hash) that combines buckets in other ways, no AH (baseline TCP), and null-AH (to measure the cost of adding the AH headers).
The hardware configuration is:
- Sun SPARCStation 20/71’s running SunOS 4.1.3.
- Myricom’s Myrinet, a 640-Mbps packet-switched LAN. TCP process-to-process throughput is generally about 90 Mbps and can exceed 100 Mbps with careful tuning.
- A SAM-300 solid state disk from Texas Memory Systems. Although the SAM-300 is capable of 1200 MB/sec, our SBus configuration can only use roughly 85 Mbits/sec of it, due to SBus and Unix file system overheads.
We also used these kernel patch files.
Calculating the authentication digest requires touching every byte of the packet, and data-touching overhead is a significant part of the overall processing overhead. Instead of traversing the packet twice (once for the UDP checksum and once for the digest), Integrated Layer Processing (ILP) combines the two operations: it calculates the UDP checksum and the authentication digest in a single loop.
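The single-loop idea can be sketched as follows. This is a minimal illustration under stated assumptions, not the kernel code measured here: the checksum is the 16-bit one's-complement Internet checksum over the data only (the UDP pseudo-header is omitted), and the digest is a stand-in FNV-1a hash rather than MD5 or AHA, chosen only to keep the example short. The names inet_checksum, toy_digest, ilp_pass, and struct ilp_result are hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

struct ilp_result {
    uint16_t checksum;  /* Internet checksum of the data */
    uint32_t digest;    /* stand-in digest (FNV-1a), NOT MD5 */
};

/* Pass 1 of the non-integrated version: Internet checksum alone. */
static uint16_t inet_checksum(const unsigned char *p, size_t len)
{
    uint32_t sum = 0;
    size_t i;

    assert(len < 65536);            /* keeps the 32-bit accumulator safe */
    for (i = 0; i + 1 < len; i += 2)
        sum += ((uint32_t)p[i] << 8) | p[i + 1];
    if (len & 1)
        sum += (uint32_t)p[len - 1] << 8;   /* pad odd byte with zero */
    while (sum >> 16)
        sum = (sum & 0xffff) + (sum >> 16); /* fold carries back in */
    return (uint16_t)~sum;
}

/* Pass 2 of the non-integrated version: stand-in digest alone. */
static uint32_t toy_digest(const unsigned char *p, size_t len)
{
    uint32_t h = 2166136261u;       /* FNV-1a offset basis */
    size_t i;

    for (i = 0; i < len; i++)
        h = (h ^ p[i]) * 16777619u; /* FNV-1a prime */
    return h;
}

/* Integrated version: each byte is loaded once and fed to both
 * computations, so the packet is traversed a single time. */
static struct ilp_result ilp_pass(const unsigned char *p, size_t len)
{
    struct ilp_result r;
    uint32_t sum = 0, h = 2166136261u;
    size_t i;

    assert(len < 65536);
    for (i = 0; i + 1 < len; i += 2) {
        unsigned char hi = p[i], lo = p[i + 1];

        sum += ((uint32_t)hi << 8) | lo;
        h = (h ^ hi) * 16777619u;
        h = (h ^ lo) * 16777619u;
    }
    if (len & 1) {
        sum += (uint32_t)p[len - 1] << 8;
        h = (h ^ p[len - 1]) * 16777619u;
    }
    while (sum >> 16)
        sum = (sum & 0xffff) + (sum >> 16);
    r.checksum = (uint16_t)~sum;
    r.digest = h;
    return r;
}
```

The fused loop returns the same two values as the separate passes. The benefit is that the packet data crosses the memory system once instead of twice, which matters most when the data is not already in the cache.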