How to Maximize Software Performance of Symmetric Primitives on Pentium III and 4

Mitsuru MATSUI  Sayaka FUKUDA  

IEICE TRANSACTIONS on Fundamentals of Electronics, Communications and Computer Sciences   Vol.E89-A   No.1   pp.2-10
Publication Date: 2006/01/01
Online ISSN: 1745-1337
DOI: 10.1093/ietfec/e89-a.1.2
Print ISSN: 0916-8508
Type of Manuscript: Special Section PAPER (Special Section on Cryptography and Information Security)
Category: Symmetric Key Cryptography
fast software encryption,  optimization,  Pentium,  

Full Text: PDF>>
Buy this Article

This paper studies the state-of-the-art software optimization methodology for symmetric cryptographic primitives on Pentium III and 4 processors. We aim at maximizing speed by considering the internal pipeline architecture of these processors. This is the first paper studying an optimization of ciphers on Prescott, a new core of Pentium 4. Our AES program with 128-bit key achieves 251 cycles/block on Pentium 4, which is, to our best knowledge, the fastest implementation of AES on Pentium 4. We also optimize SNOW2.0 keystream generator. Our program of SNOW2.0 runs at the rate of 2.75 µops/cycle on Pentium III, which seems the most efficient code ever made for a real-world cipher primitive. Our another interest is to optimize cryptographic primitives that essentially utilize 64-bit operations on Pentium processors. For the first example, the FOX128 block cipher, we propose a technique for speeding-up by interleaving two independent blocks using a register group separation. For another examples, we consider fast implementation of SHA512 and Whirlpool. It will be shown that the new SIMD instruction sets introduced in Pentium 4 excellently contribute to fast hashing of SHA512.