Tuesday, November 17, 2009

More on TMTO and Rainbow Tables

I recently posted about the new project to compute rainbow tables for the A5/1 encryption algorithm of GSM. As is often the case, I end up writing more than can fit in a reasonable length post (though that does not stop me from occasionally writing apparently unreasonable length posts such as here and here). Rainbow tables, and more generally Time-Memory trade-off (TMTO) tables, are fascinating subjects that I have a bit more to say about.

Babies and Giants

I think the best place to start with understanding TMTO is to return to one its first applications, which turns out to have implications for the security of public key systems. The Baby-steps Giant-steps method was proposed by the number theorist Daniel Shanks in the early 70's to solve the discrete logarithm problem (DLP). We recall that in the DLP we are given a generator g of the prime p and a value 0 < X < p, and we seek the value x such that

X = g^x mod p

Since g is a generator then { g, g^2, g^3, ..., g^(p-1) } is equal to {1, 2, 3, ..., (p-1)}. That is, every non-zero number less than p can be expressed as a power of g mod p. So we know that X has a logarithm to the base g, but how do we find it? The brute force approach is to just start searching through 1, 2, 3, ... until we reach x, which is equivalent to computing the following sequence of exponentiations (note that we omit the "mod p")

g, g^2, g^3, ... , g^{x-1}, g^x = X

That is, we start with g and keep multiplying by g until we reach X. But there is another approach to finding X based on the following sequence

g*X, g^2*X, g^3*X, ... , g^{p-2}, g^{p-1} = 1

In this second sequence we start with X and keep multiplying by g and stop when we reach 1. We know from Fermat's Little theorem that the log of 1 must be (p-1), and since g is a generator, no other value smaller than (p-1) will exponentiate to 1. Now to determine x, if it took k multiples to reach 1 from X then it follows that x = (p-1) - k.

So which approach is better? Well if we write the sequences as exponents, then the first sequence is 1, 2, 3, ..., x while the second is x, x+1, x+2, ..., (p-1). On average both sequences are about p/2 steps in length if we assume x is randomly chosen. However the second sequence contains the germ of a good idea, which is as follows. Even though we do not know the value of x we can increment it towards a value whose log we do know, and then subtract off the number of increments to determine x. In another way, repeatedly multiplying X by g increments the target value x towards 1 which has the known log of (p-1), and x is just (p-1) less the number of increments.

Now we can generalise this approach by creating more values with known logs. For example, we can first compute g^{(p-1)/2} and then start our search sequence as before

g*X, g^2*X, g^3*X, ...

but this time stopping if we reach 1 = g^{p-1} or g^{(p-1)/2}. If we consider the sequence 1, 2, 3, ..., (p-1), then (p-1)/2 is the midpoint and (p-1) is the endpoint, and x will reach (p-1)/2 first if it is in the first half of the list, and will reach (p-1) first if it is in the second half of the sequence. The point is that the average distance from x to one of these two known points is now about p/4, and in fact x must reach one of the known value after at most p/2 steps.

But we can just keep going and introduce more known points. For 3 points we could use the exponents (p-1)/3, 2(p-1)/3 and (p-1), where we need to round to the nearest integer as required. Now the search sequence x, x+1, x+2, ... can be stopped whenever it encounters one of these three values, which will occur after at most p/3 steps or p/6 steps on average. In general to pre-compute k known points that can be used to partition 1, 2, 3, ..., (p-1) into k non-overlapping segments, take multiples of p/k (rounded to the nearest integer).

As it turns out the optimal choice of k that trades-off pre-computation time of the known points and the search time is k = sqrt(p). The Baby-steps Giant-steps (BSGS) name comes from first computing the exponent sequence sqrt(p), 2*sqrt(p), 3*sqrt(p) ..., which are considered the giants steps space at intervals of sqrt(p). Then to find the nearest Giant step we next compute the exponent sequence x, x+1, x+2, ... x + sqrt(p), which we think of as Baby steps of unit increments.

What principle is at work here?

The point is to improve over the brute force approach of just searching through 1, 2, 3, ... until we reach x, and by "improve" we mean reduce the number of elements to be searched before we find x. In the BSGS the improvement is achieved by searching forward from X to a known value rather than starting at an arbitrary value and searching forward towards X. We can pre-compute a set of values or points in the keyspace which limits the number of values that need to be considered in the searching forward process. But the whole searching forward process exists because we are dealing with a cyclic group, and repeated multiplication by g the generator moves us forward through the keyspace. The more values that are pre-computed, the shorter the time for forward search.

Hellman's TMTO for Block Ciphers

In 1980 (around the time Kerberos was coming into being) the famous cryptographer Martin Hellman, co-inventor of public key cryptography, proposed what he called a time-memory trade-off (TMTO) for recovering keys for block ciphers. His goal was to pre-compute a set of values that would limit the fraction of the keyspace that would need searching to find a particular key. This sounds like BSGS, but we have a big problem in that practical block ciphers are not cyclic - no one encryption transformation can be used to generate all the other possible set of encryptions under different keys.

The Ideal TMTO

To explain TMTO let's consider a cipher for which the length of the plaintext, ciphertext and key are all the same. So this is the case for AES-128 for example, but is not true for DES or AES-256, and the TMTO attack has to be modified for these latter cases. The idea is to build a TMTO table that covers all the possible encryptions of a given plaintext P.

The attacker starts by selecting a random key R1 and then encrypts P to produce C1 = E(R1, P), which can also be interpreted as another key K1 = C1 since keys are ciphertexts are bit strings of the same length. The attacker then encrypts P with K1, to yield K2 = E(K1, P) an so on. Assume that the attacker repeats this operation N times to produce a sequence of keys

R1, K1, K2…, K(N-1), KN = S1

where R1 is called the start point and S1 is called the endpoint, and overall the sequence of encryption is called a chain. The attacker then stores the pair (R1, S1) to represent the first row of the TMTO table. Then the attacker selects another start point key R2, and produces another chain of length N represented by the pair (R2, S2). More chains are generated, using different random keys for starting points, until all or a large fraction of the key space is expected to be covered by (or represented in) the chains of the table.

Let the attacker produce M table rows in this fashion at a cost of computing M chains of length N and a storage cost of 2*N*M (the start and end point for each chain). How large should N and M be made? Well the attacker wants each possible key to appear in some chain of the table. So if the key has n bits, then selecting N = M = sqrt(2^n) = 2^{n/2}, will ensure that N*M is approximately 2^n. Note that, if everything works out nicely, we get a compressed form of the keyspace, where the keys have been reordered into chains and each chain is compressed just be storing its start- and endpoints.

Computing these tables will be a large effort, proportional to the number of possible keys. But the advantage here is that once computed, the tables can be repeatedly used to break keys efficiently, which is achieved as follows. Say an attacker has a plaintext-ciphertext P, C computed under some unknown key K. What the attacker can do is then create chain beginning with C as the startpoint, and extends the chain until an endpoint in the table is reached. The found endpoint then defines the row of the table where K lies, and the attacker can then use the startpoint for the row to re-compute the keys of the row, searching until K is found. The advantage over a brute-force search is that the attacker will need to only search over one row in the table, which will be much smaller than searching over the whole key space.

The TMTO Principle at Work

Before we consider the practical issues with implementing TMTO as described above, let's step back and look at the principle of TMTO and draw out its advantages. If an attacker is given a plaintext-ciphertext pair (P,C) then to perform an exhaustive search the attacker selects some order for the keys, and then performs trial encryptions until a key K is found that encrypts P to C. So the attacker has only one point in the key space which can be used to stop the search. This is pretty much the same as trying to solve the DLP as described above by searching through 1, 2, 3, ..., x - you only have one stopping point in your search.

The principle of TMTO is to identify more keys that can be used to terminate a search or define a relatively small interval of the keyspace where K must lie. Each (key) chain of a TMTO table defines such a interval, and when the attacker starts a chain using C as the initial key, then reaching an endpoint indicates the row that contains K. In the ideal case described above, the attacker can store about 2^{n/2} end points which can then be used to identify a subset of 2^{n/2} keys that must contain K. Thus the search for the key K is reduced to about 2^{n/2} encryptions as opposed to 2^{n-1} encryptions on average for an exhaustive search. So the endpoints in the table correspond to the giant-steps in the BSBG method, and the searching forward from a given ciphertext roughly corresponds to the baby-steps.

TMTO in Practice

We have left out many details in the description above but we are mainly giving the idea behind the TMTO principle. The main problem in practice is that the chains as not very well-behaved. What we want is that each chain produces distinct values and also that distinct chains produce distinct values as well - in short, one new and distinct key for each chain encryption. However due to the random properties of encryption we cannot expect this, which reduces the coverage of the keyspace in the tables. More chains need to be generated to compensate.

The diagram below (taken from here) shows nicely behaved chains, with startpoints (SP) and endpoints (EP) that neatly cover some distinct subset of the keyspace. The red dot shows a given ciphertext that when iterated will lead to the endpoint EP1, and then the chain can be reconstituted from SP1 and searched forward for the key K, which is the predecessor to C in the chain.

zrclip_001n7476813b

In practice, constructing TMTO tables that give a high probability of recovering keys is much more messy than described for the ideal case. The main issue is that each key from the keyspace should appear in some row of the TMTO table. However chains based on iterating (repeatedly applying) an encryption function need not produce distinct keys in a given chain, or produce distinct keys between different chains. A given chain can loop back upon itself or two chains may hit upon a common key and then merge in the table. In both cases the coverage of the keyspace in the TMTO table is reduced. It is therefore possible that the chain constructed by the attacker using C as the initial key may not reach an endpoint in the table. Even if it does, regenerating the endpoint chain may not lead to C since the chain for C may start outside the table and only merge into some row of the table. In practice chains can look more like the diagram below, snaking and merging so that K need not even be a predecessor on the chain beginning at SP, even though iterating C forward leads to EP.

zrclip_002n7c477e3e

Also we have assumed that each encryption in a chain defines a new key, but this is only true when keys and ciphertexts are the same bit length. When this is not the case then an additional function must be used to reduce (or expand) the ciphertext to match the stated key length.

Most of these difficulties are addressed in a collection of optimizations on TMTO tables that result in rainbow tables, originally devised by Philippe Oechslin in 2003 in the context of improving TMTO for recovering Windows NT passwords. The optimizations are subtle but the basic idea is to introduce a series of iteration functions (not just encryption) that also incorporates the position in the chain as a parameter in the iteration, which reduces the looping and merging of chains. Even so, the parameters need to be carefully chosen to ensure appropriate coverage of the keyspace.

Rainbow Tables for A5/1

Given this background we can now take a more informed look at the rainbow chain parameters being proposed by Karsten Knol in his recent project for producing rainbow tables for A5/1. The opening sentence on the page states that the tables are a mixture of 2 concepts, namely distinguished points and rainbow tables.

Distinguished Points (DP) are an optimisation, suggested by Ron Rivest, to reduce the number of endpoint lookups. In theory such a lookup must be made after each iteration, which in practice means a costly disk access outside the cosy realm of RAM. The DP optimisation is to create endpoints which have a simple distinguishing property, such as the first or last 10 bits being all zero. Thus the iteration function need only be interrupted for a disk search if the next chain value has this distinguished property.

Each A5/1 chain has an average length of 2^{20}, and each rainbow table is expected to have 2^{28.5} such chains. To get the right coverage just over 2^8 tables need to be generated and stored for a total effort of 2^{57}. These 2^8 tables will be generated in a distributed manner (by different nodes), and it is part of Knol's project to find volunteers for this effort. The total storage for the rainbow tables is about 2.5 TB or just 6 GB per node for the distributed network that Knol is proposing. To find a key from a given ciphertext requires examining about 3.5 billion chain values, which on one machine with a good graphics card will take about 2 hours, or just over a minute from the network.

However the project page cautions about the accuracy of these estimates and some data is presented on the likelihood and extent of chain merging. The objective of the above chain calculation is to cover just under 2^{49} elements in the keyspace, but this apparently is optimistic. Some heuristics for coping with degenerate chains are discussed but it seems that we will have to wait for the project to complete (that is, for the tables to be generated) to determine the success rate.

Final Remarks

Research into TMTO-based cryptanalysis continues, since on the one hand, there are many variations and optimizations of TMTO that can be applied, while on the other, cheaper hardware and faster processors make this attack approach more viable, particularly for ciphers like A5/1 for which brute-forcing the key length is on the boundary of a feasible computation. If you want to read more about TMTO then Karsten Knol has a great article here on the topic, as well as this excellent survey paper from Mark Stamp.

1 comment:

Anonymous said...

thanks for sharing your formula.


Laby[big and tall suits]