Wednesday, November 19, 2008

On the DNS Birthday Probability

bb In a recent post I stated that a common reason why the Birthday Paradox (BP) seems puzzling is because people often confuse a 1-to-N matching problem (matching their birthday to someone else in a group) with the required N-to-N matching problem (matching a birthday amongst any pair of people in a group, including themselves). I called this the Johnny Fallacy after famed American TV Johnny Carson host mistook these two events. The post also looked at a recent variant of the Johnny Fallacy applied to DNA matching, and here we consider another variant applied to a cache poisoning attack on the Internet Domain Naming System (DNS).

DNS is a fundamental Internet service to resolve symbolic names into IP addresses (for example, DNS tells us that www.google.com is currently at IP address 209.05.129.99). While people are quite good at remembering symbolic names they are lousy at remembering IP addresses, and packets can't be routed using names like www.google.com. So DNS is the bridge between what people can remember and what Internet packets require.

When you attach your laptop to the Internet you are assigned a local (default) DNS server. As you enter new symbolic references to services, your laptop contacts your local DNS server for resolution, and your laptop stores the returned result in a local cache. If your local DNS server can't resolve your request then it passes it onto a "better informed" DNS server to for resolution. A resolution that involves a chain of DNS servers is called recursive. To your laptop the final resolution still looks like it was returned by your local DNS server. The trust between your laptop and your local DNA server is implicit - there is no formal authentication for resolution requests. What is returned by the local DNS server is accepted by your laptop as a correct resolution.

In 2002 it was announced that BIND, a common implementation of the DNS protocol, permitted DNS server resolutions to be falsified, called cache poisoning. The diagram below gives an overview of the attack. The goal of the attacker is to poison the cache of DNS server with false resolutions, and then poison the client cache on the next request to the DNS server.
image
So how does the attacker get the DNS server to accept false resolutions? The attacker can make a name request to the targeted (victim) DNS server that requires a recursive resolution and then also replies to the recursive request with false information! This is entirely possible due to the unauthenticated nature of DNS requests and replies, and a feature of an earlier version of BIND that allows simultaneous requests for the same name. The only piece of information that links a request to its reply is a 16-bit identifier, that takes on at most 65,536 distinct values. This identifier is chosen by the victim DNS server and the attacker must guess it to have a false response accepted. But then the attacker sends n recursive requests for the same domain and n replies with random identifiers. When n is about 700 the birthday paradox shows that one of the name requests generated by the victim DNS server will match one of the responses generated by the attacker with high probability.

There are many excellent articles that explain the DNS/BIND Birthday Attack and consider the following ones here and here. Both give details on the success of the birthday attack using the following formula for a collision (match) amongst n uniform samples from t possible values

collision

But this collision formula is not quite correct. When n > t then a collision occurs with probability one, but the collision formula above will never be one since (1-1/t)^k is never zero for any value of k. What's the glitch?

Well n*(n-1)/2 is the number of possible pairings amongst n objects, each of which could yield a match. But instead of computing the match as an n-to-n problem, the collision formula computes the probability of a match as a 1-to-(n*(n-1)/2) problem. That is, the collision formula assumes that each of the potential matches from pairing n objects are independent. The collision formula therefore provides an underestimate on the correct probability, which is given as

Now when n = t + 1 the term (1-(n-1)/t) = 0 and the formula yields a collision with probability one. The collision formula is convincing because it looks like the correct function when graphed

image

The graph says that there is a 50% likelihood of a match around 300 samples (or DNS queries in this case), which seems fine since we expect a repeat at slightly more than sqrt(65536) = 256 samples, and the correct value is about 280. As we mentioned above the collision formula underestimates the correct probability, but for increasing n the error as compared to the correct value is decreasing. The yellow part of the graph shows the probability of the conventional spoofing attack where the attacker has a likelihood of n/65536 to match. In a conventional spoofing attack the attacker generates false replies to a single DNS request.

1 comment:

boy labyog said...

Google make some updates about this.


Laby[big suits for men]