After quite a delay (since the initial idea) I have finally completed a good draft of some research on traffic confirmation attacks. Traffic analysis is a collection of techniques for inferring communication relationships without having access to the content that is being communicated. Traffic confirmation is a special subcase that attempts to infer communication relationships amongst users whose communications are mediated through an anonymity system.
We assume that the underlying anonymity system is a threshold MIX, with batch size B, supporting N participants, each of whom may act as both a sender or receiver. Typical MIX system parameters are N = 20,000 participants (all potentially send and receiving), and a batch of size B = 50 (the MIX accepts 50 messages, randomises their order, then delivers them). The MIX makes it difficult to correlate the 50 senders to the 50 receivers.
The traffic confirmation attack targets a specific user Alice and attempts to infer her M communication partners (who she is sending to) by observing her communication patterns through the MIX. Alice for example might be sending messages to M = 20 people on a regular basis.
The attacker observes when Alice sends a message in a batch, and can also observe who receives a message from that batch. This will include at least one of Alice's communication partners, but which one? The uncertainty that the attacker has with respect to the M communication of Alice is called the entropy her partners.
It is well-known that given the properties of the MIX and the power of the adversary (what she can see), that the entropy of Alice's partners is decreasing as the number of message batches that Alice participates in are observed (a higher entropy means more uncertainty). Eventually the entropy of her partners will go to zero and their identities will be known to the attacker. But how many such observations need to be made before the anonymity of Alice's communications partners is effectively zero?
My paper shows that after the attacker has observed approximately
M * log N
messages, subsequent messages drive the entropy to zero at a geometric rate. That is, the MIX system protects Alice's partners to a point but then essentially collapses from that point forward. So if the attacker is prepared to wait for (and observe) those first M * log N messages from Alice, then the uncertainty of her partners reduces rapidly as Alice sends additional messages. One reason that the attacker has to wait for that first group of messages to be sent by Alice is that she won't be able to uniquely determine Alice's communication partners until Alice has communicated with all of them at least once.
By way of an example, consider a system with N = 20,000 participants, a batch size of B = 50, and Alice having 20 communicating partners. In this case, after observing 292 initial messages, the attacker can identify Alice's partners with 99% certainty after observing an additional 110 messages (that's 402 messages in total). So way before she has sent 400 messages, Alice should either change her partners or change her MIX system (probably the later).
You can find the draft paper here. Comments welcome!