ATOMIC2 Fast Security

In ATOMIC-2, security focuses on fast authentication.

Stand-alone authentication algorithm performance

This section compares the performance of various authentication algorithms running on in-memory data. ATOMIC-2 is investigating mechanisms for high-speed encryption and authentication. We have investigated the following authentication mechanisms and have not found one suited to implementation at 640 Mbps, or to a software implementation that can keep pace with IP-over-ATOMIC speeds (140 Mbps).

  • MD5
MD5 is an authentication algorithm proposed as a “required option” for the next IP (IPv6).
We measured the performance of the reference implementation of MD5, as provided in its RFC (RFC 1321), on various platforms.

For comparison, we also show UDP/IP application bandwidth on a Sun SPARC-10/51 (100 Mbps) and SPARC-20/71 (120 Mbps). In each case, MD5 runs at one-third to one-half of the speed required to keep pace with IP.

The code was measured with caching disabled for Raw and Optimized. The Optimized code was also measured with external caching and internal caching enabled.

This code is available here.

Here are some performance results. The modified assembly code is available here.

  • Without rearrangement (MD5-Opt)
    • md5 -c 60 -l 100000 -t -s -r -d
      MD5 time trial. Digesting 60 100000-byte blocks (no reordering required) … done
      Digest = 36023ea6e959c9e678702bdba0f95ff8
      Time = 1.01703 U : 0.001894 S :: 1.01892 seconds
      Speed = 5.88858e+06 bytes/second, 4.71086e+07 bits/sec
      minflt 0 majflt 0 nswap 0 nvcsw 0 nivcsw 11
      MD5 (“”) = d41d8cd98f00b204e9800998ecf8427e
  • After rearrangement
    • md5 -c 60 -l 100000 -t -s -r -d
      MD5 time trial. Digesting 60 100000-byte blocks (no reordering required) … done
      Digest = 36023ea6e959c9e678702bdba0f95ff8
      Time = 0.478771 U : 0 S :: 0.478771 seconds
      Speed = 1.25321e+07 bytes/second, 1.00257e+08 bits/sec
      minflt 0 majflt 0 nswap 0 nvcsw 0 nivcsw 5
      MD5 (“”) = d41d8cd98f00b204e9800998ecf8427e
  • Cache modifications (mddriver.c)
    • -l num : specify length of test block (used with -t, was DEFINE’d)
    • -c num : specify block repeat count (used with -t, was DEFINE’d)
    • -s : skip initialization of test block (to avoid first-touch of data)
    • -r : pseudo-random test block init (determines data-dependent perf)
    • -d : double-buffer test block (switch-off – forces data out of the cache)
  • Performance optimizations (md5c.c)
    • use memset() and memcpy() (as suggested)
    • force state variables into registers
    • avoid Decode() on little-endian machines (Intel ix86, DEC Alpha)
    • avoid block copy on little-endian machines (Intel ix86, DEC Alpha)
    • unroll swap loop in Decode()
    • use optimized byte reordering code (C code that compiles better)
  • Other changes
    • replace time() with getrusage()
    • change block length to 1M from 1K
    • print bits/sec
    • change LEN and COUNT in status print – they were incorrect
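
The two little-endian items under "Performance optimizations" rest on one observation: MD5 reads its input as 32-bit words stored least-significant byte first, which is already the in-memory layout on a little-endian machine. The following is a minimal sketch of that idea, not the modified md5c.c. The names decode_portable, decode_fast, and host_is_little_endian are hypothetical, and the fast path still copies with memcpy() for alignment safety, whereas the optimized code is described as avoiding the block copy altogether.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Portable path: build each 32-bit word from four bytes, least-significant
 * byte first, in the manner of the RFC 1321 reference Decode(). */
void decode_portable(uint32_t *out, const unsigned char *in, size_t len)
{
    size_t i, j;

    for (i = 0, j = 0; j + 3 < len; i++, j += 4)
        out[i] = (uint32_t)in[j]
               | ((uint32_t)in[j + 1] << 8)
               | ((uint32_t)in[j + 2] << 16)
               | ((uint32_t)in[j + 3] << 24);
}

/* Hypothetical run-time probe: returns 1 when the lowest-addressed byte of
 * a stored 32-bit 1 is itself 1, i.e. the host is little-endian. */
int host_is_little_endian(void)
{
    const uint32_t probe = 1;
    unsigned char first;

    memcpy(&first, &probe, 1);
    return first == 1;
}

/* Fast path: on a little-endian host the input bytes already have the word
 * layout MD5 expects, so a plain copy replaces the shift-and-or loop. */
void decode_fast(uint32_t *out, const unsigned char *in, size_t len)
{
    if (host_is_little_endian())
        memcpy(out, in, len - (len % 4));
    else
        decode_portable(out, in, len);
}
```

Both functions produce the same words for any input whose length is a multiple of four; only the work done per block differs.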
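
The "Other changes" above replace time() with getrusage() and add a bits/sec figure. A minimal sketch of how such a report could be computed follows. It is not the modified mddriver.c, and the helper names (tv_seconds, cpu_seconds, bytes_per_second, print_speed) are hypothetical. As a check on the arithmetic, 60 blocks of 100000 bytes in 1.01892 seconds works out to roughly 5.89e+06 bytes/second, in line with the Speed line of the first time trial.

```c
#include <stdio.h>
#include <sys/resource.h>
#include <sys/time.h>

/* Convert a struct timeval to seconds. */
double tv_seconds(struct timeval tv)
{
    return (double)tv.tv_sec + (double)tv.tv_usec / 1e6;
}

/* User and system CPU time consumed by this process so far.
 * Returns 0 on success, -1 if getrusage() fails. */
int cpu_seconds(double *user, double *sys)
{
    struct rusage ru;

    if (getrusage(RUSAGE_SELF, &ru) != 0)
        return -1;
    *user = tv_seconds(ru.ru_utime);
    *sys = tv_seconds(ru.ru_stime);
    return 0;
}

/* Throughput for `count` blocks of `len` bytes processed in `seconds`. */
double bytes_per_second(unsigned long count, unsigned long len, double seconds)
{
    if (seconds <= 0.0)
        return 0.0;
    return (double)count * (double)len / seconds;
}

/* Print a report shaped like the time-trial output (assumed layout). */
void print_speed(unsigned long count, unsigned long len,
                 double user, double sys)
{
    double total = user + sys;
    double rate = bytes_per_second(count, len, total);

    printf("Time = %g U : %g S :: %g seconds\n", user, sys, total);
    printf("Speed = %g bytes/second, %g bits/sec\n", rate, 8.0 * rate);
}
```

A driver would call cpu_seconds() before and after the digest loop and pass the differences to print_speed(). Unlike time(), which has one-second resolution, getrusage() reports CPU time with microsecond fields, which matters for trials that finish in about a second.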
  • Alternate Hash Algorithm (AHA)

The Alternate Hash Algorithm is designed as a replacement for MD5 in IPv6. AHA runs over twice as fast as MD5.

The code is available here.

Performance of Authentication in IPv4

This section compares the performance of various algorithms in IPv4 in a SunOS 4.1.3 kernel. It also includes an analysis of the components of the IP AH processing itself, e.g., header processing and data-touching overheads.

We compare the following algorithms: MD5, MD5-optimized (MD5-opt), AHA, the Internet checksum, a null checksum (to measure data-touching overheads), MD5-opt in network-standard byte order (to measure the overhead of endian reordering), Phil Rogaway’s AH (alternate hash) that combines buckets in other ways, no AH (baseline TCP), and null-AH (to measure the cost of adding the AH headers).

The hardware configuration is:

  • Sun SPARCstation 20/71s running SunOS 4.1.3.
  • Myricom’s Myrinet, a 640-Mbps packet-switched LAN. TCP process-to-process throughput is generally about 90 Mbps and can exceed 100 Mbps with careful tuning.
  • A SAM-300 solid-state disk from Texas Memory Systems. Although the SAM-300 is capable of 1200 MB/sec, our SBus configuration can use only roughly 85 Mbits/sec of it, due to SBus and Unix file system overheads.

We also used these kernel patch files.

Calculating the authentication digest requires touching every byte of the packet, and data-touching overhead contributes a significant part of the overall processing overhead. Instead of traversing the packet twice (once for the UDP checksum and once for the digest), ILP (integrated layer processing) combines the two operations: it calculates the UDP checksum and the authentication digest in a single loop.
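
The single-loop idea can be sketched as follows. This is an illustration only, not the kernel code. The checksum is the 16-bit one's-complement Internet checksum (RFC 1071) over the buffer alone, without the UDP pseudo-header. The digest is stood in for by 32-bit FNV-1a, a deliberately simple non-cryptographic hash chosen to keep the example short; MD5 or AHA would consume the same bytes block by block. All function and type names are hypothetical.

```c
#include <stddef.h>
#include <stdint.h>

#define FNV_OFFSET 2166136261u   /* FNV-1a 32-bit offset basis */
#define FNV_PRIME  16777619u     /* FNV-1a 32-bit prime */

struct ilp_result {
    uint16_t checksum;   /* Internet checksum of the buffer */
    uint32_t digest;     /* stand-in digest (FNV-1a, not MD5) */
};

/* Separate pass 1: Internet checksum over big-endian 16-bit words. */
uint16_t checksum_only(const unsigned char *p, size_t len)
{
    uint64_t sum = 0;
    size_t i;

    for (i = 0; i + 1 < len; i += 2)
        sum += ((uint32_t)p[i] << 8) | p[i + 1];
    if (i < len)                         /* odd trailing byte, zero-padded */
        sum += (uint32_t)p[i] << 8;
    while (sum >> 16)                    /* fold carries back into 16 bits */
        sum = (sum & 0xffff) + (sum >> 16);
    return (uint16_t)~sum;
}

/* Separate pass 2: stand-in digest over the same bytes. */
uint32_t digest_only(const unsigned char *p, size_t len)
{
    uint32_t h = FNV_OFFSET;
    size_t i;

    for (i = 0; i < len; i++)
        h = (h ^ p[i]) * FNV_PRIME;
    return h;
}

/* Fused pass: each byte is loaded once and fed to both computations. */
struct ilp_result ilp_checksum_and_digest(const unsigned char *p, size_t len)
{
    struct ilp_result r;
    uint64_t sum = 0;
    uint32_t h = FNV_OFFSET;
    size_t i;

    for (i = 0; i + 1 < len; i += 2) {
        unsigned char hi = p[i], lo = p[i + 1];

        sum += ((uint32_t)hi << 8) | lo;
        h = (h ^ hi) * FNV_PRIME;
        h = (h ^ lo) * FNV_PRIME;
    }
    if (i < len) {
        sum += (uint32_t)p[i] << 8;
        h = (h ^ p[i]) * FNV_PRIME;
    }
    while (sum >> 16)
        sum = (sum & 0xffff) + (sum >> 16);
    r.checksum = (uint16_t)~sum;
    r.digest = h;
    return r;
}
```

The fused function returns the same values as the two separate passes; the saving comes from walking the packet once instead of twice, which matters most when the data is not already in the cache.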