WARNING :- RijnType has
NOT been subjected to independent assessment.
I guarantee only that it
represents my best efforts.
I give no warranty that
it is fit for any purpose!
This document is one of 5 inter linked
documents. The others are :-
(a) The RijnType home
page (no technical content).
(b) RijnType.cpp C++
code.
(c) The RijnType.h
file to accompany (b).
(d) Technical
references.
This document describes the block cipher RijnType. This cipher
takes all of its inspiration from the block cipher Rijndael, the winner
of the AES competition in 2000.
[Note the version of Rijndael submitted to the AES had a fixed data
block bit size of 128 bits, and key block bit size of 128 or 192 or 256
bits. Rijndael itself is defined with data block bit sizes of 128 to 256
bits, in steps of 32 bits, and key block bit sizes of 128 to 256 bits,
also in steps of 32 bits. Below, I shall refer to these 2 versions as either
Rijndael (implying the full version) or AES Rijndael (implying the reduced
capability version).]
In modern cryptography, ciphers generally consist of 2 components :- a 'public key' component (which might be RSA, or an elliptic curve system, to name but the two most common systems) followed by a block cipher. The public key component arranges a 'session key' for use by a block cipher which carries out the message encryption/decryption. Public key ciphers are typically several hundred times slower than block ciphers in operation, but are included to solve the difficult problem of 'session key' set up. Hence these two components combine to provide a cipher with a good balance of ease of use and speed.
The RSA algorithm has been in the public domain for over 20 years and has been the focus of relentless attack. It is now so weak that it would not be considered in 'new build' projects. Elliptic curve (EC) systems have superseded RSA for public key components.
From the security viewpoint, the two cipher components can be thought of as a chain with two links; with the weaker of the two dictating overall penetration resistance. When an elliptic curve system is the provider of the public key role, the bit size of the Field prime of the elliptic curve has to be double the bit size of the block cipher if the two components are to be matched in strength. (see e.g. ref 1 Section 1.3) (The ratio of '2' is best thought of as a 'rule of thumb' rather than an iron law. A few percent difference in the ratio of '2' is neither here nor there. The ratio could in any case be changed at any time by advances in theory or technology.)
Currently, (mid 2008), the main purpose of my business is the
sale of elliptic curves for embedding into the cryptographic products of
other software teams. These curves are some of the strongest elliptic curves
available to the general public and are large enough to provide communications
and database security, for any commercial purpose, over at least a medium
term time scale.
If sensitive commercial information is to remain secure for,
say, 10 years, then security planning has to estimate the computer power
available to an attacker 10 years hence. Contemplation of the advances
in processor power over the past 10 years, leads to the conclusion that
block cipher bit strengths in the mid hundreds of bits matched to elliptic curve bit
strengths up to 1000 bits are needed to guarantee security.
Block ciphers of this strength are hard to find and elliptic
curves to match them even harder. As a result of developing software to
implement the Atkin Morain method for setting up elliptic curves, I can
generate prime order elliptic curves of any bit size at will. However,
the commercial value of the elliptic curves that I can generate is much
reduced by the absence of a 'matching' strength block cipher to go with
them. For this reason, I have reluctantly been obliged to enter the world
of block cipher design, and RijnType is the result.
The assumptions underlying the design of RijnType :-
(a) RijnType is founded on the design principles of block cipher 'Rijndael' as given in refs. [2] and [3].
(b) Where I have gone beyond the scope of Rijndael, I have introduced some new features but these remain 'in the spirit' of Rijndael.
(c) If a mode of operation for RijnType is within the limits of Rijndael, RijnType produces identical results to Rijndael. (however, see the section on 'compatibility' below)
(d) Rijndael was developed to be satisfactory
on smart cards and equipment with tiny memory resources. No such considerations
have influenced the design of RijnType. My target machine is a PC with
a 32 bit Intel x86 processor and with unrestricted memory resources.
Both Rijndael and RijnType benefit considerably
in speed when lookup tables are introduced. Tables T0[], T1[], T2[]
and T3[] for encryption and iT0[], iT1[], iT2[] and iT3[] for decryption
are used in the code file rtdotcpp.html. The
theory and use of these tables is discussed in ref.
2 section 4.2.
The speed gain that results from these
tables, and the functional equivalence of RijnType to Rijndael (so long
as RijnType is restricted to 256 bit (or below) block size and key size)
makes RijnType of interest to those software teams looking for nothing
more than a fast implementation of Rijndael.
(e) RijnType is intended to interface with
elliptic curve public key cipher systems.
/* ------------------------------------------------
The interface between elliptic
curves and block ciphers
Since
elliptic curve public key cipher systems are several hundred times slower than block
ciphers, It is generally expedient to call them just once per 'session'.
The
secret information passed through the elliptic curve is a scalar multiplier less than
the 'order' of the elliptic curve. Hasse's Theorem indicates that the 'order' is limited
to (very nearly) the same bit size as the elliptic curve Field bit size. As mentioned
above, for balanced security the elliptic curve module Field bit size has to be double
the bit size of the block cipher. Hence the possibility exists for the
transmission of a key of bit size up to double that of the block cipher.
[From here down, I shall use the label ECscalar[] when referring to
this item. From above, ECscalar[] will be a word32 array of double the
word32 size of the block cipher.]
In fact, operating RijnType with a double sized key would be
unusual, since ECscalar[] would typically include fields for items like
MAC's (message authenticating codes) time stamps and others, along with
the block cipher key material. If the elliptic curve is to be called just once, and
it is not considered wise to include these fields into the block cipher
key, this will reduce the space available for block cipher key material.
Hence the key material needs to be adjustable in size to handle different
project requirements. Allocating fields for the input to the elliptic curve stage needs
careful thought on a project by project basis.
For RijnType the minimum cipher block bit size = the minimum
key bit size = 128 bits (also the Rijndael minimum); and these bit sizes
can be independently increased in steps of 32 bits to a maximum cipher
block bit size of 512 bits and a maximum key bit size of 1024 bits. This
has been done to enable RijnType to be adaptable to handling the considerations
in the paragraphs above.
[RijnType (as Rijndael) processes data in a rectangular array of bytes
with 4 rows and a variable number of columns (given the variable name Nb).
The upper limit of 512 bits on the data block bit size causes the rectangular
array of bytes used by RijnType to be 4 rows by 16 columns. This compares
to a maximum of 4 rows by 8 columns in Rijndael and a maximum of 4 rows
by 4 columns for AES Rijndael.
[At some point in the future I may develop a RijnType8 block cipher
which will have 8 rows and 8-16 columns for data block bit sizes of 512-1024
bits in steps of 64 bits. RijnType8::MixColumn() would need a mutually
inverse pair of degree 8 polynomials over GF(2^8) ideally with small coefficients.
Such a block cipher would work very well on a processor capable of 64 bit
arithmetic.]]
-------------------------------------------------- */
Rijndael (and RijnType) uses the following variables to control its
working :-
Nr = the number of rounds for the block cipher, dependent
on Nb and Nk.
Nb = the number of 32 bit words for the data block = data
block bit size/32.
Nk = the number of 32 bit words for the key block = key
block bit size/32.
C0, C1, C2, C3 = the rotations applied to the 4 rows of
the data block within ShiftRow().
From the data already given Nb for Rijndael is el{4, 5, 6, 7,
8} and {4} for AES Rijndael.
Similarly Nk for Rijndael is el{4, 5, 6, 7, 8} or el{4, 6, 8}
for AES Rijndael.
[For RijnType Nb is el{4,...16} and Nk is el{4, ...32}.]
Nr is obtained from Nk and Nb as :- Nr = 6 + max(Nb, Nk)
(For Rijndael 10 <= Nr <= 14)
[This formula is also used in RijnType; for which 10 <= Nr <=
38]
The values of C0, C1, C2, C3 are given in ref.
2 table 3.1 (C0 is always 0, and C1 always 1). These values are used unchanged
in RijnType. For values in RijnType beyond the limits of Rijndael, see
comments on class ShiftRow, below.
I shall now discuss the RijnType classes, one by one, to explain the working of the software. These classes are largely self documenting in the files rtdotcpp.html and rtdoth.html which contain code listings, hence the following merely fills in a few gaps.
Class RT_ByteOps
This
carries out the standard Rijndael S-box substitutions and word32 rotations.
Class ShiftRow
This class both holds the intermediate data values as word32
items, and it's do_shift() method performs the same task as ShiftRow()
in the Rijndael reference code. Although functionally identical to ShiftRow(),
it works differently in that no word32 shifts left or right are done.
Two word32 arrays colsource[] and coldest[] are used to hold
the data block. In methods RijnType::Encrypt() and RijnType::Decrypt()
(see below) data is read from colsource[] and written to coldest[] after
use of the look up tables Tn[] and iTn[]. The ShiftRow::do_shift() is called
at the end of each round simultaneously moving the data from coldest[]
back to colsource[]. This is done for 2 reasons :-
(a) It avoids 'byte' MOV's which on Pentium machines involve
a time penalty (this processor actually 'moves' 64 bit quantities and then
has to shift the data to extract a byte. This shift carries a time penalty.
32 bit items (so long as the last 2 bits of their address are zero) are
not affected by this. Failure to align data on a pentium in this way carries
a typical 30-40% time penalty and possibly more. (Alignment of data is
ensured at compile time by setting 'flags' - an important detail when preparing
a RijnType executable.) RijnType is intended to be very fast and a time
penalty of this sort is unacceptable.
(b) No shifts (as in '<<' or '>>' C code) are needed in
ShiftRow::do_shift(). These operations translate into SHL and SHR machine
operations which are not what Intel calls 'UV' operations. This means that
they cause V-pipeline stalls, which implies time penalties. A glance down
the code of ShiftRow::do_shift() indicates that only AND, OR, MOV will
be needed all of which are in the 'UV' set; hence V pipeline stalls (time
penalties) will be avoided.
Given that C0=0 and C1=1 always (as Rijndael), the C2 and C3
values used in ShiftRow were derived using the following formulae :-
C2 = max (2, floor(Nb/2) - 1) ;
if (Nb > 8) {
if (Nb > 12)
C2-- ;
while (gcd (Nb,
C2) != 1) {C2--} ;
}
C3 = min (3, ceil(Nb/2)
[These formulae were set using the following deliberations :-
(a) They give the Rijndael values when Nb <= 8.
(No formulae for these coefficients is given in the Rijndael references
and these formulae are not unique.)
(b) Row 3 is intended to perform the largest shifts possible,
and this requires C3 to be near to the 'half cyclic shift' value Nb/2.
(c) C2 is intended to shift row 2 roughly mid way between row
1 and row 3; with the additional condition that row 2 should 'visit' every
possible position in Nb calls, which leads to the gcd() condition.]
Class KeyedLFSR
This class is the only item in RijnType which has no connection
to Rijndael. It can be viewed as performing a secondary key-expansion.
[Any user wanting a 'pure' Rijndael extension, can just go through
RijnType.h and RijnType.cpp stripping out all references to this class.
Please rename the block cipher that results to something of your choice,
perhaps RijndaelE - 'E' indicating 'extended'.]
With RijnType working at its maximum capacity, the expanded
key (array KeySpace[]) occupies about 5 times the space needed by Rijndael
at its maximum. During development of RijnType, it occurred to me that
the key-expansion method used in Rijndael would benefit from a 'boost'
to handle this larger expanded key.
KeyedLFSR uses a highly populated 32 bit LFSR together with
the key material handed to RijnType to form a moderate quality key driven
32 bit random number generator. It takes 3 successive word32's of key material
and 'expands' them into 8 word32's for use in KeyHandler::KeyExpansion();
then starts again with the next 3 word32's of key material .... The output
word32's are intermittently XOR'd into the expanded key formed using the
Rijndael key expansion. Hence the Rijndael key expansion process is left
unchanged and this secondary key expansion technique is added to reinforce
it.
[If Nb and Nk are both <= 8 (the Rijndael limits) then KeyedLFSR
is not called into action - this is necessary to maintain RijnType as a
Rijndael 'clone']
LFSR's, when operated in the 'multiply by x' mode, are very
fast in software, and, for cryptographic products, have some attractive
properties and some unattractive ones. The main problem with them is that
the auto correlation curve of the output bit pattern is not ideal. Various
schemes are in use to improve the quality of LFSR output e.g. see ref.4
chapter 16.
For this application a very high quality output is not essential
- the quality of the output of KeyedLFSR::next_value() is mainly dependent
on the quality of the key material. Nevertheless, I have included some
quality enhancing features.
Three variables are used to improve the output quality - 'state',
'run length' and 'selector' all set from the 'raw' key material. After
8 word32's have been output from method KeyedLFSR::next_value(), all 3
variables are replaced by using new 'raw' key material. The quality enhancing
technique uses multiple output values from the LFSR and XOR's them together
to form the composite output value of KeyedLFSR::next_value(). The 3 variables
control this process - 'State' is the LFSR initial contents, nibbles from
'run length' give the number of raw LFSR outputs that are XOR'd together,
and the bit pattern in 'selector' is used to decide whether an LFSR output
value is XOR'd into the output value of KeyedLFSR::next_value(), or not.
Class KeyHandler
This class performs the same function as Rijndael's KeyExpansion()
method, but with the following exceptions :-
(a) When Nk is beyond the limit of that covered by Rijndael
(max Nk=8) then KeyedLFSR discussed above is introduced. This is a secondary
key expansion process that has no part in Rijndael.
(b) As mentioned above, RijnType makes a greater effort for
speed than Rijndael. The main speed enhancing step is the introduction
of the arrays Tn[] and iTn[], which are used in the RijnType::Encrypt()
and RijnType::Decrypt() methods discussed below. This technique is introduced
in ref. 2 section 4.2, (Ref.2. Fig 3.12
is particularly useful for clarifying the process.). The use of these tables
requires changes to the expanded key when decrypting a message. Excluding
the first and last Nb word32's of the expanded key, the expanded key has
then to be passed through Rijndael's InvMixColumns() method.
[If you search through rtdotcpp.html for method MixColumns() you will
not find it. The Rijndael methods MixColumns() and InvMixColumns() are
redundant (for encryption and decryption) once tables Tn[] and iTn[] are
brought into play. However, InvMixColumns() is needed by KeyHandler, and
is a private method (cannot be accessed by other classes) in KeyHandler.]
The method KeyHandler::InvMixColumns() is functionally equivalent
to Rijndael's InvMixColumns() but performs the calculation on word32's
rather than bytes. This, together with KeyHandler::Mod(), which 'mod's
4 bytes simultaneously, provides another small speed gain.
[see Ref.2 eqn 3.15 to understand the code
in KeyHandler::InvMixColumns(). Matrix multiplication eqn 3.15 of Ref.2
can be rearranged into multiplications by 0xE, 0xB, 0xD and 0x9; then adding
(XORing) after suitable rotations.]
Class RijnType
This class contains the Encrypt() and Decrypt() methods and
inherits all the above. From the comments on class KeyHandler, it will
be clear that because the use of the lookup arrays Tn[] and iTn[] is built
into RijnType, the expanded key when encrypting differs from the expanded
key when decrypting.
This means that only one of RijnType::Encrypt()
or RijnType::Decrypt() can be used with any specific RijnType object.
Which of these is accessible is fixed by 'Mode' in the RijnType constructor;
following which a call to RijnType::Process_block() is sent to the appropriate
function. Users must be aware of this behaviour! Clearly, where simultaneous
encryption's and decryption's (with a key held constant for a long period)
can occur (a server perhaps), 2 objects must be set up - 1 to access Encrypt()
and 1 to access Decrypt().
The 2 methods Encrypt() and Decrypt() organize the computational
core of RijnType. The other methods handle administrative details only.
Inspecting the listing for, say, Encrypt() the use of the Tn[] arrays at
the center of the nested loops will be clear. In brief, the 4 table look-ups
replace the Rijndael calls to (SubBytes() + ShiftRows() + MixColumns())
and this saves a great deal of time; indeed it saves more time than all
the other RijnType speed enhancing features put together. (see ref.2
section 4.2 and Fig. 3.12 for details) (A similar sentence applies to Decrypt()
with all references to tables/methods using inverses.)
[The tables Tn[] and iTn[] are derived from Rijndael's (SubBytes()
+ ShiftRows() + MixColumns()).]
RijnType::Process_block() performs what is technically called
an 'Electronic Code Book' (ECB) encryption or decryption. This gets its
name from the fact that (given a fixed key) if asked to encrypt some plaintext
block repeatedly it will always return an identical ciphertext - a behaviour
which mimics 'code books'.
[These were popular up to about WW II. They enabled common letters,
words, phrases and even sentences to be changed into typically 5 digit
numbers. Code book security systems can be steadily 'solved' by statistical
analysis of letter and word frequencies typical of any language. A good
flow of ciphertext had to be available and the process was analogous to
putting together a huge jig-saw puzzle. To foil statistical attack, code
book's needed regular replacement, something very difficult to organize
in the chaos of war. An even better attack method was to go and steal one
of the books! Better yet, photo-copy the code book so as to avoid alerting
the user to the collapse of his message security system!]
A similar situation might arise with RijnType working in ECB
mode. Consider the following scenario :- suppose a secure server, holding
simple text files, is operated with a fixed key in ECB mode for a long
period. If an attacker could obtain a large portion of the encrypted files
(but not the key), it is just conceivable that a statistical attack, on
the lines of the 'Code Book' attack above, might succeed. Of course, if
he can obtain the key as well, the attacker can skip the hard work involved
in a statistical attack, and use the source code of RijnType (or any block
cipher) to decrypt away.
For the above and other reasons ECB mode is rarely used in cryptographic
projects; rather it's importance is that it forms a building block for
more sophisticated schemes (see RT_CBC next).
An Initialization Vector (I.V.) is a random bit string the same
size as the block length (Nb), which can (optionally) be fed into the RijnType
constructor. If RijnType is paired with an elliptic curve (the target situation)
the I.V. will have to be embedded into ECscalar[], reducing the space for
key material. The decryption of a ciphertext which was obtained using an
I.V., requires the same I.V. to be available at decrypt time and this requires
organizing.
The use of IV's will defeat the statistical attack discussed
above, but at the cost of having to securely archive the I.V.'s themselves.
Hence, the use, or otherwise, of I.V.'s requires careful thought on a project
by project basis.
Class RT_CBC
This class performs 'Cipher Block Chaining' mode encryption/decryption
with RijnType as the block cipher. ( ref. 4 Article
9.3 guidelines are used) In essence, the output of each block encryption
is mixed (XOR'd) into the next input block. This might be thought of as
internal I.V. generation process to prevent the 'Code Book' statistical
attack discussed above. When RijnType is operated in CBC mode, only the
first block is susceptible to statistical attack and, ordinarily, there
will be too few of these for statistical attack to have any chance of success.
RT_CBC can accept an I.V. for which similar comments to RijnType
I.V.'s apply.
An alternative scheme to protect the first block when RT_CBC
is in use is to part (or completely) fill the first block from a secure
random number generator. Then, on decryption, this part can be stripped
out.
Class RT_MAC
This class generates a Message Authenticating Code a.k.a. a
MAC a.k.a. a keyed Hash. These are a 'Modification Detection Code' (see
RT_Hash below) with a bit string 'signature', authenticating the sender,
attached.
[These items are invariably a component of paper less office schemes.
Such schemes seem further from fruition than ever, but hope springs eternal.]
RT_MAC inherits RT_CBC which it uses to implement the CBC_MAC
algorithm (see ref. 5 algorithm 9.58).
The idea is that someone wishing to send a message such that
:-
(a) any attempt to modify it will be detected,
(b) it can be irrefutably linked to the sender,
can use a MAC to achieve these ends.
[The 'signature' referred to is actually a very large number, uniquely
associated with an individual. For very large networks, e.g. the World
Wide Web, they would be issued by some business founded on that line of
work. For smaller networks a nominated 'Key Master' might issue them.]
When MAC's are in use, the message itself need not be encrypted.
Many messages in commerce are of the form "Deliver 10 lengths of 4x2 to
...". Such messages are meat and drink for many businesses and are of no
interest whatsoever to an eavesdropper, but carry financial connotations.
The receiver wants to know whether the person sending the message is authorized
to do so by a valid client (i.e. he will get paid!) and that the message
is not either a total invention or a slight adjustment of a valid order
(e.g. the number of items surreptitiously changed from '1' to '10').
[The discussions in the paragraph above may seem (on first sight) to
be the germ of an idea for a huge business empire, since it is a service
with universal appeal. Beware :- many very clever people have tried to
set up paper less systems; some have had slight success, most have
failed.]
Class RT_Hash
This class generates a 'Modification Detection Code' a.k.a.
a 'Message Digest' a.k.a. an unkeyed-Hash a.k.a. a Hash a.k.a. a message/file
'fingerprint'.
RT_Hash uses the Miyaguchi-Preneel algorithm (see ref.
5, algorithm 9.43) with RijnType to achieve this. The output of RT_Hash
depends only on the message/file fed into it; whereas the output of RT_MAC
depends on both the message/file and the signature material (used in RT_MAC
as the key material) fed into it. Hashes are used for the purpose of detecting
tampering with a message/file. A typical use is to prevent 'hackers' replacing
operating system 'executables' with versions of their own devising with
the objective of easing future system penetration.
[Hashes are actually cryptographic primitives with a wider use than
the last sentence may indicate.]
The Miyaguchi-Preneel algorithm requires the block cipher key
to be updated after each block is entered; hence RT_Hash has to call KeyHandler::KeyExpansion()
at the end of each block, to update the key. For RijnType, this step takes
as much time as encrypting 1.5 blocks, and makes RT_Hash 2.5 times slower
than RT_MAC (see the comments on 'speed performance' below for the time
to execute the constructor - most of of which is due to KeyExpansion()).
It is possible to generate a MAC by using any Hash generator
by pre pending the message with the 'signature number', then Hash the composite
message.
++++++++++++++++++++++++++++++++++
Feed 'buff' to Rijndael
and feed pbuff32 into RijnType
...
/***************/
will generate
different ciphertext between Rijndael and RijnType. The little endian effect
is the cause of this.
H (0x48) | o (0x6F) | i (0x69) | . (0x2E) |
e (0x65) | " " (0x20) | l (0x6C) | " " (0x20) |
l (0x6C) | s (0x73) | o (0x6F) | C (0x43) |
l (0x6C) | a (0x61) | r (0x72) | o (0x6F) |
This encrypts in Rijndael to :-
0xE0 | 0x46 | 0x20 | 0x2E |
0x68 | 0xDA | 0xF0 | 0xFD |
0x0F | 0x20 | 0xD3 | 0xE5 |
0x2E | 0x24 | 0x3F | 0xEA |
Using a 'cast' to set pbuff32, as in the code snippet above, we might print out the array pbuff32[] to find [0x6C6C6548, 0x6173206F, 0x726F6C69, 0x6F43202E] which, it will be noticed, has it's bytes swapped round (the little endian effect at work!). Encrypting this in RijnType the output is [0x2F2AA57C, 0x3EDD46AE, 0x8FEC3C30, 0x2B7E2965] which is unrelated to the Rijndael ciphertext.
If function byte_swap() (listed towards the end of this section) is used on the array pbuff32[], it becomes [0x48656C6C, 0x6F207361, 0x696C6F72, 0x2E20436F] (the little endian effect having been reversed). Encrypting this with RijnType we get a ciphertext of [0xE0680F2E, 0x46DA2024, 0x20F0D33F, 0x2EFDE5EA] which are seen to be the columns of the Rijndael ciphertext.
The little endian problem is still not quite sorted. Suppose that the
ciphertext now has to be filed. If we want a Rijndael compatible result
(which requires the bytes copied out of the 2-D ciphertext array, column
by column), we might use a C 'cast' such as :-
/****************/
word8 *pOP ;
word32 ciphertext[] ;
** execute a RijnType encryption into ciphertext[] here, then **
pOP = (word8 *) ciphertext ;
... then start feeding the pOP[] array
byte by byte to the required file.
/****************/
Doing this sends bytes [0x2E, 0x0F, 0x68, 0xE0, ...], in that order,
to the file. The little endian effect strikes again! To prevent this use
byte_swap() on ciphertext[] to reverse the order, then the 'cast', then
write to the file.
A high level way to load/unload RijnType data arrays, if results
identical to Rijndael are required, is as follows :-
(This was part of a real test program that I used to prove compatibility.
In order to make a full test program, you will need to include the Rijndael
reference code from ref. 2, Appendix E.)
/***************/
int main()
/* A DOS test program to run
Rijndael and RijnType head-to-head. */
{
word16 i, j, k, Nk, Nb ;
word8 aIP[4][8], a[4][8],
rk[8+6+1][4][8] ; /* Rijndael variables */
word8 sk[4][8] ;
word32 RTa[8], RTk[8], RTop[8],
RTop2[8] ; /* RijnType variables */
word32 random ;
BOOL flag ;
for (Nb=4; Nb<=8; Nb++)
{
for (Nk=4; Nk<=8;
Nk++) {
printf("Nb=%u
Nk=%u\n", Nb, Nk) ;
for
(k=0; k<1000; k++) {
for (i=0; i<Nb; i++) {
/* fill variable 'random' from a random number generator here, then */
RTa[i] = random ;
/* Rijndael overwrites the IP array a[][] with the encrypted
results. Using aIP[][] as a record holder of the IP. */
a[0][i] = aIP[0][i] = (word8) (random>>24) ;
a[1][i] = aIP[1][i] = (word8) (random>>16) ;
a[2][i] = aIP[2][i] = (word8) (random>>8) ;
a[3][i] = aIP[3][i] = (word8) random ;
}
for (i=0; i<Nk; i++) {
/* fill variable 'random' from a random number generator here, then */
RTk[i] = random ;
sk[0][i] = (word8) (random>>24) ;
sk[1][i] = (word8) (random>>16) ;
sk[2][i] = (word8) (random>>8) ;
sk[3][i] = (word8) random ;
}
/* recall that BC, KC, ROUNDS are global variables in the Rijndael reference
code. In RijnType Nb=BC, Nk=KC, and Nr=ROUNDS */
BC = Nb ;
KC = Nk ;
ROUNDS = numrounds[KC-4][BC-4] ;
/* do a Rijndael encryption :- */
KeyExpansion (sk, rk) ;
Encrypt (a, rk) ;
/* and now a RijnType encryption :- */
RijnType rt (ENCRYPT_MODE, Nb, RTk, Nk) ;
rt.Process_block (RTa, RTop) ;
/* Check encryption's are identical :- */
j = 0 ;
flag = TRUE ;
while ((j<Nb)&& flag) {
if (a[0][j] != (word8) (RTop[j] >> 24)) flag = FALSE ;
if (a[1][j] != (word8) (RTop[j] >> 16)) flag = FALSE ;
if (a[2][j] != (word8) (RTop[j] >> 8)) flag = FALSE ;
if (a[3][j] != (word8) (RTop[j])) flag = FALSE ;
j++ ;
}
if (!flag) {
printf("Cipher failure with Nb=%u Nk=%u j=%u k=%u.\n", Nb, Nk, j,
k) ;
exit(0) ;
}
/* do a RijnType decryption :- */
RijnType rt2 (DECRYPT_MODE, Nb, RTk, Nk) ;
rt2.Process_block (RTop, RTop2) ;
/* and compare this with the original IP :- */
j = 0 ;
flag = TRUE ;
while ((j<Nb) && flag) {
if (aIP[0][j] != (word8) (RTop2[j] >> 24)) flag = FALSE ;
if (aIP[1][j] != (word8) (RTop2[j] >> 16)) flag = FALSE ;
if (aIP[2][j] != (word8) (RTop2[j] >> 8)) flag = FALSE ;
if (aIP[3][j] != (word8) (RTop2[j])) flag = FALSE ;
j++ ;
}
if (!flag) {
printf("Plain failure with Nb=%u Nk=%u j=%u k=%u.\n", Nb, Nk,
j, k) ;
exit(0) ;
}
}
}
}
puts("All OK") ;
return 0 ;
}
/***************/
Note carefully how RijnType's word32's are packed into (and unpacked from) Rijndael's two dimensional arrays, byte by byte.
These deliberations impact on project planning as follows :-
(a) If you are looking only for a fast implementation of Rijndael,
and the extensions in RijnType are of no interest to you, then pack/unpack
data arrays in one of the ways discussed above. This must be done if your
target network, now or at any point in the future, uses, or might use,
a mixture of Rijndael and RijnType code. This will generally be the case
if your objective is no more than to speed up sluggish secure servers,
without having to bother with upgrading the client's whole network.
[A fast packing/unpacking technique, if the machines to be upgraded
are all Intel powered, is to use the Intel (486 and later) assembler instruction
BSWAP (byte-swap) (a 1 clock operation). This achieves precisely the same
effect as all the sliding and ORing in the high level code above, with
a negligible time penalty.
A possible implementation in C :-
/************/
word32 byte_swap (word32 IP)
{
asm {
mov eax,IP
bswap eax
}
return _EAX ;
}
/************/]
(b) If you are looking to enhance the security of a subset of your target
network, already using Rijndael, and RijnType meets the enhanced security
requirements, then the considerations in (a) apply again. This would mean
that only the subset would need a software upgrade, and it's interface
with the remainder of the network would be unaffected.
(c) In summary, only if you are starting out on a self-contained 'green
field' security project with no prospect of ever having to interface with
a Rijndael equipped user should you not pack/unpack data arrays as discussed
in this section.
Nb,Nk | 4 | 5 | 6 | 7 | 8 |
Rijndael | 309 | 379 | 498 | 627 | 785 |
RijnType | 22 | 28 | 36 | 45 | 55 |
This gives an average speed gain ratio for RijnType of nearly
14.
The main cause of this speed gain is the use of the look-up
tables Tn[] and iTn[] which are built into RijnType. They could just as
easily be built into Rijndael, since they are derived from the main Rijndael
reference: ref.2 section 4.2. All the other features
included in RijnType contribute only a few % speed gain.
Inspecting RijnType::Encrypt() with a view to obtaining yet greater speed, the method ShiftRow::do_shift() seems a likely candidate. It is possible to write modules dedicated to each permutation of Encrypt/Decrypt and Nb. A total of 10 such modules would be needed for a Rijndael clone, and 26 for a full RijnType class. The module for Encrypt/Nb=8 would look something like :-
/************/
void
ShiftRow::do_shift_E8 (void)
// dedicated to ENCRYPT_MODE
with Nb=8 with the code 'straight-lined'.
// Notice that C2 and C3 are
embedded in this code
{
word32 blank[4] = {0xFF000000L,
0xFF0000L, 0xFF00L, 0xFFL} ;
colsource[0] = (coldest[0] &
blank[0])
| (coldest[1] & blank[1])
| (coldest[3] & blank[2])
| (coldest[4] & blank[3]) ;
colsource[1] = (coldest[1]
& blank[0])
| (coldest[2] & blank[1])
| (coldest[4] & blank[2])
| (coldest[5] & blank[3]) ;
colsource[2] = (coldest[2]
& blank[0])
| (coldest[3] & blank[1])
| (coldest[5] & blank[2])
| (coldest[6] & blank[3]) ;
colsource[3] = (coldest[3]
& blank[0])
| (coldest[4] & blank[1])
| (coldest[6] & blank[2])
| (coldest[7] & blank[3]) ;
colsource[4] = (coldest[4]
& blank[0])
| (coldest[5] & blank[1])
| (coldest[7] & blank[2])
| (coldest[0] & blank[3]) ;
colsource[5] = (coldest[5]
& blank[0])
| (coldest[6] & blank[1])
| (coldest[0] & blank[2])
| (coldest[1] & blank[3]) ;
colsource[6] = (coldest[6]
& blank[0])
| (coldest[7] & blank[1])
| (coldest[1] & blank[2])
| (coldest[2] & blank[3]) ;
colsource[7] = (coldest[7]
& blank[0])
| (coldest[0] & blank[1])
| (coldest[2] & blank[2])
| (coldest[3] & blank[3]) ;
}
/************/
Test results from the above were disappointing - the figure of '55' in the table above dropping to 50, which raises the speed gain to about 15.5. This seems hardly worth the effort for all but the most speed sensitive of projects.
Rijndael's speed for decryption is slightly slower than encryption;
this being caused by InvMixColumns() being slower than MixColumns(). This
effect is absent from RijnType - the lookup tables Tn[] and iTn[] nullify
it.
However, the time to execute a RijnType ENCRYPT_MODE constructor
differs from that for a DECRYPT_MODE constructor. Recall, from above, that
a DECRYPT_MODE constructor has to call InvMixColumns() on all the columns
of all the intermediate blocks of the expanded key, and this makes the
DECRYPT_MODE constructor roughly 4 times slower than the ENCRYPT_MODE constructor.
Very roughly, the time to execute an ENCRYPT_MODE constructor is 1.5 *
(the time to encrypt 1 block), and the time to execute a DECRYPT_MODE constructor
is 6 * (the time to encrypt 1 block).
A large set of timed results could be generated for RijnType when worked beyond the scope of Rijndael since Nb and Nk can vary independently. The following might be enough to indicate trends :-
Block encryption time (relative to Nb=Nk=8 case) with Nb=Nk and 8 <= Nb <= 16
Nb,Nk | 8 | 9 | 10 | 11 | 12 | 13 | 14 | 15 | 16 |
rel. time | 1.0 | 1.2 | 1.4 | 1.63 | 1.85 | 2.11 | 2.38 | 2.67 | 3.0 |
Block encryption time (relative to Nb=Nk=8 case) with Nb=4 and 4 <= Nk <= 32
Nk | 4 | 6 | 8 | 12 | 16 | 20 | 24 | 28 | 32 |
rel. time | 0.49 | 0.6 | 0.69 | 0.89 | 1.11 | 1.27 | 1.45 | 1.65 | 1.85 |
Block encryption time (relative to Nb=Nk=8 case) with Nb=8 and 8 <= Nk <= 32
Nk | 8 | 12 | 16 | 20 | 24 | 28 | 32 |
rel. time | 1.0 | 1.26 | 1.55 | 1.85 | 2.08 | 2.37 | 2.64 |
Block encryption time (relative to Nb=Nk=8 case) with Nb=16 and 8 <= Nk <= 32
Nk | 8 | 12 | 16 | 20 | 24 | 28 | 32 |
rel. time | 3.0 | 3.0 | 3.0 | 3.50 | 4.05 | 4.58 | 5.11 |
From the above we can find that the speed ratio for RijnType at it's maximum security level (Nb=16, Nk=32) relative to it's minimum (Nb=4, Nk=4) (also the AES Rijndael and the Rijndael minima) is 5.11/0.49=10.4. Since RijnType is about 14 times faster than Rijndael, this means that RijnType at its maximum security level is faster than Rijndael or AES Rijndael at their minima. In other words substituting RijnType for Rijndael always brings a speed benefit, even when comparing worst case to best case, along with a huge increase in security.