Configurable AES CCM/GCM Processor with Scalable Throughput
AES (Advanced Encryption Standard) has become the preferred encryption algorithm for use in modern communication systems. AES defines how a block of 128 bits may be encrypted using a key. AES 'Modes' define different ways of utilising the basic encryption algorithm, and the system modes CCM, CMAC, GCM and GMAC then set policies for dealing with the encryption and authentication of a whole packet. Above this there are standards such as CCMP, GCMP, MACsec and IPsec, which are merely applications of CCM, CMAC, GCM and GMAC. The following sections detail these levels in the context of the BRC AES Crypto Processor.
AES is defined in the NIST FIPS 197 specification. A block of 128 bits is encrypted using a key, which may be of 128, 192 or 256 bits. The output is 128 bits. The process involves 10, 12 or 14 rounds (or iterations) for each key size respectively. We implement each round in 4, 2 or 1 clock-cycles according to the hardware configuration, and add a further clock-cycle for IO. In the 4 cycles/round mode with a 128 bit key, a block therefore takes 41 clock-cycles, resulting in a throughput of 3.12 bits/clock-cycle. At 500 MHz (on a modern ASIC process) the throughput is 1560 Mbit/s with a 128 bit key. Higher throughput is achievable in the 2 and 1 cycles/round modes (but note that timing closure may demand a reduction in the clock speed).
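The throughput arithmetic above can be reproduced with a short calculation. This is only a sketch of the formula implied by the text (rounds times cycles per round, plus one IO cycle per block); the function names are illustrative and the results are not measured product figures.

```python
# Rounds per key size as defined by FIPS 197.
ROUNDS = {128: 10, 192: 12, 256: 14}
BLOCK_BITS = 128
IO_CYCLES = 1  # one extra clock-cycle per block for IO, as stated in the text


def bits_per_cycle(key_bits: int, cycles_per_round: int) -> float:
    """Bits processed per clock-cycle for one AES unit (illustrative model)."""
    cycles_per_block = ROUNDS[key_bits] * cycles_per_round + IO_CYCLES
    return BLOCK_BITS / cycles_per_block


def throughput_mbit_s(key_bits: int, cycles_per_round: int, clock_mhz: float) -> float:
    """Throughput in Mbit/s at the given clock frequency."""
    return bits_per_cycle(key_bits, cycles_per_round) * clock_mhz


if __name__ == "__main__":
    for cpr in (4, 2, 1):
        rate = throughput_mbit_s(128, cpr, 500.0)
        print(f"{cpr} cycles/round, 128 bit key, 500 MHz: {rate:.0f} Mbit/s")
```

The 2 and 1 cycles/round results assume the clock stays at 500 MHz, which the text notes may not hold once timing closure is taken into account.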
There are two problems with simply passing your data-stream through the AES function in order to encrypt it. The first is that if two input blocks are the same then the output blocks will be the same. This can compromise security. The second problem is that the AES decryption algorithm is costly to implement and there is actually a way that AES decryption can be avoided completely. So let's look at the modes (these are the ones supported in the BRC product):
Electronic Code Book (ECB) Mode. This is what we have just described: an input block is simply encrypted according to the algorithm to form the output block.
Counter (CTR) Mode. This is the key mode in most systems. The 128 bit AES input is not the data to encrypt. Instead it is a block formed from information that is known at both ends, and includes a count that increments with each code block. The resulting stream of encrypted counter blocks is XOR'd with the data-stream to produce an encrypted stream. Decryption is then a symmetric process, which means the receiver simply reproduces the encrypted counter blocks and XORs them with the received stream to recover the original data. CTR mode is very useful because it allows decryption without actually implementing the AES decryption algorithm.
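The symmetry of CTR mode can be illustrated with a minimal sketch. Two assumptions are made purely for illustration: `toy_block_encrypt` is a hash-based stand-in for the AES forward function (it is not AES and is not secure), and the counter block layout (12-byte nonce plus 4-byte count) is hypothetical rather than the layout used by any particular standard.

```python
import hashlib

BLOCK_BYTES = 16


def toy_block_encrypt(key: bytes, block: bytes) -> bytes:
    """Stand-in for the AES forward function. NOT AES; illustration only."""
    return hashlib.sha256(key + block).digest()[:BLOCK_BYTES]


def counter_block(nonce: bytes, count: int) -> bytes:
    """Hypothetical layout: 12-byte nonce followed by a 4-byte big-endian count."""
    return nonce + count.to_bytes(4, "big")


def ctr_process(key: bytes, nonce: bytes, data: bytes) -> bytes:
    """Encrypts or decrypts: the same operation is used in both directions."""
    out = bytearray()
    for offset in range(0, len(data), BLOCK_BYTES):
        count = offset // BLOCK_BYTES
        keystream = toy_block_encrypt(key, counter_block(nonce, count))
        chunk = data[offset:offset + BLOCK_BYTES]
        # XOR the data with the encrypted counter block (zip trims a partial block).
        out.extend(d ^ k for d, k in zip(chunk, keystream))
    return bytes(out)


key = b"k" * 16
nonce = b"n" * 12
plaintext = b"A" * 32 + b"tail"  # two identical blocks plus a partial block
ciphertext = ctr_process(key, nonce, plaintext)
recovered = ctr_process(key, nonce, ciphertext)
```

Only the forward block function is ever called, and the two identical plaintext blocks produce different ciphertext blocks because each is XOR'd with a different encrypted counter block.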
Cipher Block Chaining (CBC) is where each input block is XOR'd with the previous output block before being encrypted, which ensures that identical input blocks will not map to identical output blocks. This mode is not easily decrypted, but is useful in authentication. Authentication is where a stream of blocks is processed to form a single output block that represents a hash or check-sum. This is sometimes called the MAC (Message Authentication Code) but due to a clash with another well-known meaning for MAC, the term MIC (Message Integrity Code) is more often used. The sender builds the MIC and appends it to the transmitted data. The receiver then recovers the transmitted data (i.e. decrypts the encrypted parts, typically using a separate AES-CTR function) and generates its own MIC. If this matches the transmitted one then it is known that the data has not been tampered with (this assumes that a party attempting to alter the data does not have access to the AES key).
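The chaining used to build a MIC can be sketched as follows. As an assumption for illustration, `toy_block_encrypt` is a hash-based stand-in for the AES forward function (not AES, not secure), and the zero-padding of the last block is a simplification rather than the policy of any specific standard.

```python
import hashlib
import hmac

BLOCK_BYTES = 16


def toy_block_encrypt(key: bytes, block: bytes) -> bytes:
    """Stand-in for the AES forward function. NOT AES; illustration only."""
    return hashlib.sha256(key + block).digest()[:BLOCK_BYTES]


def cbc_mic(key: bytes, message: bytes) -> bytes:
    """Chain every block through the cipher and keep only the final output."""
    # Zero-pad to a whole number of blocks (simplified padding assumption).
    padded = message + b"\x00" * (-len(message) % BLOCK_BYTES)
    state = bytes(BLOCK_BYTES)  # all-zero starting value
    for offset in range(0, len(padded), BLOCK_BYTES):
        block = padded[offset:offset + BLOCK_BYTES]
        # XOR with the previous output, then encrypt.
        state = toy_block_encrypt(key, bytes(s ^ b for s, b in zip(state, block)))
    return state


def verify(key: bytes, message: bytes, received_mic: bytes) -> bool:
    """Receiver side: regenerate the MIC and compare it with the received one."""
    return hmac.compare_digest(cbc_mic(key, message), received_mic)


key = b"k" * 16
message = b"header and payload bytes to authenticate"
mic = cbc_mic(key, message)
```

Because each step depends on the previous output, the loop cannot be split across parallel units, and again only the forward block function is needed.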
Counter with CBC-MAC (CCM; the CBC-MAC is the MIC described above) is a system mode with very specific policies on how a complete packet of information is to be encrypted and authenticated. A packet is split into 'Payload' that is to be encrypted and 'Additional' data that is sent as plain text but is still subject to authentication. The Additional data precedes the payload and, in practice, it is the header part of a packet. Additionally a CCM specific header block is prepended, which contains the Nonce and packet length. The Nonce is a number that is only used 'once' and forms part of the CTR input block. It is static for the packet but must change on subsequent packets, all to improve security.
The BRC Crypto Processor manages one or two AES sub-modules in order to implement CCM on-the-fly. In the two module mode, the first module encrypts the payload in AES-CTR mode, while the second module operates in AES-CBC mode to authenticate the additional data as well as the payload to form a MIC. The MIC is then passed to the AES-CTR unit for encryption and attachment to the end of the transmitted packet.
In CCM decryption mode it will be noted that payload data must first be decrypted before it can be authenticated. Thus in decryption mode the AES-CBC block receives payload data from the AES-CTR block. At the end of the packet the transmitted MIC is decrypted and compared with the locally generated MIC to establish authenticity (which is indicated by an inValidPacket output signal).
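The encrypt and decrypt data flows described above can be sketched in software. This is a simplified model under stated assumptions, not the CCM wire format: `toy_block_encrypt` stands in for AES (it is not AES), the header block is reduced to a 12-byte nonce plus a 4-byte payload length, zero-padding is used throughout, and all function names are hypothetical. Encrypting the MIC with counter value 0 follows the usual CCM convention.

```python
import hashlib
import hmac

BLOCK = 16


def toy_block_encrypt(key: bytes, block: bytes) -> bytes:
    """Stand-in for the AES forward function. NOT AES; illustration only."""
    return hashlib.sha256(key + block).digest()[:BLOCK]


def xor(a: bytes, b: bytes) -> bytes:
    return bytes(x ^ y for x, y in zip(a, b))


def pad(data: bytes) -> bytes:
    return data + b"\x00" * (-len(data) % BLOCK)


def ctr(key: bytes, nonce: bytes, data: bytes, first_count: int) -> bytes:
    """CTR path: XOR data with encrypted counter blocks."""
    out = b""
    for i in range(0, len(data), BLOCK):
        counter = nonce + (first_count + i // BLOCK).to_bytes(4, "big")
        out += xor(data[i:i + BLOCK], toy_block_encrypt(key, counter))
    return out


def cbc_mic(key: bytes, nonce: bytes, additional: bytes, payload: bytes) -> bytes:
    """CBC path: header block, then Additional data, then plain payload."""
    header = nonce + len(payload).to_bytes(4, "big")  # simplified header block
    stream = header + pad(additional) + pad(payload)
    state = bytes(BLOCK)
    for i in range(0, len(stream), BLOCK):
        state = toy_block_encrypt(key, xor(state, stream[i:i + BLOCK]))
    return state


def ccm_like_encrypt(key, nonce, additional, payload):
    mic = cbc_mic(key, nonce, additional, payload)  # MIC over the plain payload
    return ctr(key, nonce, payload, 1) + ctr(key, nonce, mic, 0)


def ccm_like_decrypt(key, nonce, additional, packet):
    body, encrypted_mic = packet[:-BLOCK], packet[-BLOCK:]
    payload = ctr(key, nonce, body, 1)  # decrypt first ...
    local_mic = cbc_mic(key, nonce, additional, payload)  # ... then authenticate
    received_mic = ctr(key, nonce, encrypted_mic, 0)
    return payload, hmac.compare_digest(local_mic, received_mic)


key, nonce = b"k" * 16, b"n" * 12
packet = ccm_like_encrypt(key, nonce, b"hdr", b"secret payload bytes")
payload, mic_ok = ccm_like_decrypt(key, nonce, b"hdr", packet)
```

In the decrypt direction the CBC path consumes the output of the CTR path, which mirrors the ordering constraint described for the hardware.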
In single AES module mode, one AES unit is time-multiplexed between the authentication and encryption functions. This saves gate-count but reduces throughput by a factor of 2 (compared to that listed in the AES section). The processing is still 'on-the-fly' because the time-multiplexing is performed as each block is processed.
CCM is restricted in its maximum throughput by the fact that the MIC processing uses AES-CBC, which cannot be parallelised due to the recursive nature of the algorithm. Note that AES-CTR mode could be parallelised; however, it is the MIC processing that is the bottleneck.
CMAC is an authentication algorithm used in 802.11 (WiFi). While CCM with zero payload can provide authentication, CMAC is a little more sophisticated and involves generation of 2 sub-keys from the primary input key.
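The sub-key generation can be sketched as below, following the standard CMAC definition (NIST SP 800-38B, RFC 4493). The sketch assumes the value L, the AES encryption of an all-zero block under the primary key, has already been produced by the AES unit; the function names are illustrative.

```python
MASK_128 = (1 << 128) - 1
RB = 0x87  # reduction constant for a 128-bit block


def double_gf128(block: bytes) -> bytes:
    """Shift left by one bit; XOR in RB if the bit shifted out was 1."""
    value = int.from_bytes(block, "big")
    shifted = (value << 1) & MASK_128
    if value >> 127:
        shifted ^= RB
    return shifted.to_bytes(16, "big")


def cmac_subkeys(l_block: bytes) -> tuple[bytes, bytes]:
    """Derive K1 and K2 from L = AES_K(0^128), which is computed elsewhere."""
    k1 = double_gf128(l_block)
    k2 = double_gf128(k1)
    return k1, k2


# L for the example key in RFC 4493 (2b7e1516 28aed2a6 abf71588 09cf4f3c).
L = bytes.fromhex("7df76b0c1ab899b33e42f047b91b546f")
k1, k2 = cmac_subkeys(L)
```

In CMAC, K1 is XOR'd into the final message block when that block is complete and K2 when it has been padded, before the last CBC step.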
Galois Counter Mode (GCM) is functionally similar to CCM, but the MIC generation is performed using a Galois Hash (GHASH) function. The actual encryption, however, is still performed using AES-CTR. The GHASH is much faster than AES and also much lower cost to implement. This brings about a possibility of increased throughput because multiple AES-CTR units can be used to speed up encryption, without the GHASH being a bottleneck. In the BRC implementation the number of AES units is configurable and the throughput increases in proportion.
GCM is different from CCM in a few other respects. In encryption mode the GHASH is performed on the encrypted payload whereas CCM mode generates the MIC using the plain payload. This means that GCM decryption performs the GHASH on the received input (that being encrypted already). Also, instead of a CCM header there is a GCM trailer block.
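A bit-serial software model of GHASH is sketched below, following the definition in NIST SP 800-38D. It is written for clarity rather than speed and says nothing about how the hardware is structured. The hash key `h` is assumed to be the AES encryption of an all-zero block, computed elsewhere; the final length block in this model appears to correspond to the trailer block mentioned above.

```python
R = 0xE1 << 120  # reduction polynomial in GCM's bit ordering


def gf128_mul(x: int, y: int) -> int:
    """Multiply two elements of GF(2^128) using GCM's bit ordering."""
    z, v = 0, y
    for i in range(128):
        if (x >> (127 - i)) & 1:  # bit i of x, counting from the left
            z ^= v
        v = (v >> 1) ^ R if v & 1 else v >> 1
    return z


def ghash(h: bytes, additional: bytes, ciphertext: bytes) -> bytes:
    """Hash the Additional data and the ENCRYPTED payload, then the lengths."""
    def pad(data: bytes) -> bytes:
        return data + b"\x00" * (-len(data) % 16)

    # Trailing block: bit lengths of the Additional data and the ciphertext.
    lengths = (8 * len(additional)).to_bytes(8, "big") \
        + (8 * len(ciphertext)).to_bytes(8, "big")
    stream = pad(additional) + pad(ciphertext) + lengths

    h_int = int.from_bytes(h, "big")
    y = 0
    for i in range(0, len(stream), 16):
        block = int.from_bytes(stream[i:i + 16], "big")
        y = gf128_mul(y ^ block, h_int)
    return y.to_bytes(16, "big")
```

Calling `ghash` with an empty ciphertext hashes only the Additional data, which is the authentication-only case. Each step is a single field multiplication rather than a full AES operation, which is why the hash path is comparatively cheap.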
GMAC is GCM with zero payload, so it provides authentication without encryption.
The AES Crypto Processor
The preceding mode descriptions outline the capability of the Crypto Processor. The cryptographic functionality is implemented by one or more AES sub-modules and, in the case of GCM, a GHASH sub-module. These sub-modules are instanced in a framework with cross-coupled controllers for the MIC and encryption paths and multiplexing units to steer the input and output of each sub-module according to the mode of operation. This provides a single IP block with a broad range of applications, which is ideal if your SOC (or FPGA) is to support multiple standards.
CCMP, GCMP, MACsec, IPsec etc.
CCMP and GCMP are merely CCM and GCM as applied to 802.11ac and 802.11ad. Similarly MACsec and IPsec, which are suites of encryption capabilities, directly use CCM and GCM without modification. The difference is that these top level standards define the nature of the packet with regard to headers and set policies for how the Nonce is generated. Furthermore, packets are typically subject to forwarding, where parts of the header may change after each forwarding step, so these standards define which parts of the headers (transmitted as Additional data) are subject to modification, and these are then masked (zeroed) before authentication.
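The masking step is the kind of header manipulation that can be done in software before the data reaches the module. A minimal sketch follows; the header layout and field positions are invented for illustration and do not correspond to any of the standards named above.

```python
def mask_mutable_fields(header: bytes, mutable_ranges) -> bytes:
    """Zero the header fields that may change in transit.

    mutable_ranges is a list of (offset, length) pairs; the positions used
    in the example below are hypothetical.
    """
    masked = bytearray(header)
    for offset, length in mutable_ranges:
        if offset < 0 or offset + length > len(masked):
            raise ValueError("mutable field lies outside the header")
        masked[offset:offset + length] = bytes(length)
    return bytes(masked)


# Hypothetical 8-byte header with a 2-byte field at offset 1 that a
# forwarding node is allowed to rewrite.
MUTABLE = [(1, 2)]
sent = bytes.fromhex("a1000fb2c3d4e5f6")
forwarded = bytes.fromhex("a1ffeeb2c3d4e5f6")  # same header after forwarding

additional_at_sender = mask_mutable_fields(sent, MUTABLE)
additional_at_receiver = mask_mutable_fields(forwarded, MUTABLE)
```

Both ends present the same masked bytes as Additional data, so the MIC still matches even though the header was modified in transit.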
The AES Crypto Processor is agnostic to the system level formatting: you provide the formatting for CCMP etc. externally to the module. In typical systems this merely involves manipulation of headers, which can be done by software. By making the module independent of this formatting we make it more general purpose, which avoids the need to manage multiple product variants and helps you by providing an IP block with a broad range of applications.