(repost from October 2008 for SBN)
In June I blogged about an anonymity incident concerning the ToR network. The draft version of the article was quite long and I eventually edited out a lengthier introduction. I would like to return to one interesting topic here. The popularity of commercial and open source internet anonymity services has waxed and waned over the last decade. A high profile company from the late 90's was Zero Knowledge Systems (recently renamed to Radialpoint) that possessed a collection of anonymity services, notably their flagship offering, the Freedom Network. For various reasons, the company was downsized to near extinction during the dot.com bust, though much of the technology, experience and expertise lives on in other products and former employees. ToR was a major beneficiary, and luminaries, such as Adam Shostack and Stefan Brands, are now employees of Microsoft (Ian Goldberg returned to academia). Shostack has an interesting presentation entitled Will People Ever Pay For Privacy? where his gives his views on the lack of commercial success of privacy and anonymity offerings, circa 2003.
The company name, Zero Knowledge Systems, is not just a puzzling juxtaposition of words but actually a deliberate reference to an underlying technology that formed the basis of their intellectual property. In this post we look more closely at the paradoxical-sounding concept of Zero Knowledge.
Let's start by recalling a proof that is a few thousand years old which at the time it was discovered caused considerable consternation amongst integer-loving Greeks. We are referring to the (unpleasant) fact that the square root of two is irrational (can't be expressed of the ratio of two whole numbers). Here's one proof. Assume root 2 is rational and let sqrt(2) = P/Q where P/Q are in lowest terms (share no factors). Squaring and rearranging yields that P^2 = 2*Q^2, so P must be even, from which it follows that Q must be odd (not in lowest terms otherwise). Since P is even then P^2 must be divisible by 4, but since Q is odd then Q^2 must also be odd. Hence 2*Q^2 is not divisible by 4 and we have our proof (by contradiction).
So we began with a statement S = "root 2 is irrational" and the proof is the evidence that the statement is correct. We read the proof and are convinced that is it correct. Not only that, we can remember the proof and use it to prove to someone else that root 2 is irrational. We might say that this proof is transferable. This seems a natural property since we commonly think of proofs as being available to be read and verified by anyone who has the interest and patience to do so. Would we ever have a need to create proofs that are not transferable? Does this notion even make sense? For security, yes.
In security we deal with proofs pertaining to authentication (S = "I claim this is my identity") and authorization (S = "I claim these privileges and access rights"). The proof of these assertions is typically achieved by a user Alice providing the appropriate credentials to a target system for verification. Proof in this case can be equated to presentation, or to the transfer of credentials from Alice to the target system. If the target system is a phishing site then Alice gives away (transfers) her security information to a potentially malicious party. Could Alice prove statements concerning her authenticity and authorization privileges without transferring the full details of those credentials to the target system?
Zero Knowledge proofs are a technique created in theoretical computer science which show how one party can convince another party that a given statement is true but not reveal any other information. In particular, if Alice uses a Zero Knowledge Proof to convince Bob that a statement S is true, then Bob cannot use the proof supplied by Alice to prove that S is true to another party Carol. This is the zero knowledge aspect - Bob is convinced that S is true (he learns something) but he cannot convince anyone else (what he learns has zero value in the sense Bob can't transfer it). This seems counterintuitive and indeed it is.
Recall vs. Capability
Passwords are a common authentication mechanism used to prove your identity to a system. You prove that you know the password associated with a given user identifier by providing the password to the target system. The target system will have a copy of your password in a database and it simply compares its (hashed) copy of the password with the purported copy that you provide, and a match completes the authentication proof. So in this case demonstrating (proving) that you know a password is achieved by revealing it (or a suitable hashed derivative).
We might call this a full knowledge proof in the sense that what you know is fully transferred to the system for authentication. And if it happens that you are responding to a phishing email or your DNS service has been poisoned then your full authentication details are transferred to a rogue system.
To improve the situation consider the user answering 5 questions when they register for a system account, such that 3 are randomly chosen to be asked at each login. There are 10 ways 3-from-5 questions can be selected, so if a rogue phishing system observes the answers then the probability that the same three answers can be used to logon again is 0.1. If the number of questions is increased to 10 then there are 120 ways 3-from-10 questions and the probability of transfer is under 0.009.
So you prove your identity by answering a few simple questions that hopefully someone who is not you would find difficult to all answer correctly. The proof of identity can be made non-transferable with high probability by having a K-from-N question system where getting the same K questions from one login to another is unlikely.
The idea is sound but there are problems in practice. Our memory is not that good so the number of questions can't be made too large. Also think of Sarah Palin who recently had the password to her email account reset by someone answering the 3 questions: what is your birthday, what is your postal code, and where did you meet your spouse? Unfortunately, this is public information for Palin. We can make the questions more personal and hopefully easier to remember but people may object for privacy purposes.
A better approach is to move away from knowledge-based questions & answers to capability-based tasks & solutions. That is, you prove who you are not by answering questions that you should know but by performing some tasks that only you should be able to do.
Consider Alice proving her identity to Bob by proving her capability to solve the Rubik's cube (yes many people can do this, but the example is instructive). Alice could opt for a full transfer proof by writing a set of instructions for solving the cube which she could give to Bob for verification. But let's assume that Alice is too lazy to create such a set of instructions, and that Bob is equally too impatient to read them in any case. They both agree that a practical demonstration is better: Bob will take a cube, repeatedly twist and turn it until he is satisfied that it is in a random-looking configuration, and then ask Alice to return the cube to its monochrome state (one colour on each face).
He hands the scrambled cube to Alice, who turns her back to Bob and starts solving the cube (she turns away so if Bob does not know how to solve the cube he is not improving by watching her). If Alice can actually solve the cube then she can use her skills to return Bob's cube to the monochrome state. If she does not have the cube-solving capability then she can take a stab at the solution but she will have to be lucky.
While Bob may be sceptical of Alice's cube-solving abilities, he is prepared to believe that Alice may have memorized a large number of solution sequences to specific configurations. If Bob arranged the cube into one of these configurations then Alice is able to solve the cube directly from memory. Let's be wildly optimistic here and assume that Alice has memorized the solutions for half of the possible 10^(19) configurations.
So Bob can proceed as follows. He randomizes the cube and asks Alice to solve it. If she can't then he knows that she has falsely claimed to have the cube-solving capability. If she does solve the cube, he repeats the test again by re-randomizing the cube and asking Alice for the solution. Each time Bob poses a cube to be solved by Alice there is a 50% chance that she will be caught out if she can't in fact solve the cube (Bob did not randomize the cube to one of her memorized configurations). After 20 iterations there is only 1 chance in a million that Alice could fake having the cube-solving capability.
So here we have replaced questions (do you know the answer to this?) by tasks (do you know the solution to this?). The advantage of tasks is that they are capability-based rather than knowledge-based, and in general more tasks can be posed than questions answered from recall. Also we don't have to recall a skill, we just use it. As we mentioned the cube has 10^(19) configurations, which means that someone with the cube-solving capability can solve 10^(19) tasks of the form "return the cube to its monochrome state from this configuration". There is really no comparison to the number of knowledge-based questions that we could answer accurately on a regular basis.
Finally, Bob is not learning anything about the cube-solving capability. If he does not have this capability, after interacting with Alice he is none wiser. He has observed that she unscrambled a series of scrambled cubes chosen by him. He is convinced that she can unscramble the cube but he will not be able to convince Carol that either he or Alice can unscramble the cube. This is the Zero Knowledge aspect.
Zero Knowledge Proofs
The example above explained to essence of Zero Knowledge proof (ZKP) which we can summarize here as
- Alice knows a secret and wants to convince Bob of this without revealing the secret.
- Alice and Bob go through a series of exchanges (rounds) where Bob sets Alice a task to solve using her claimed secret as a capability. If she has the capability then each task is easy to solve for Alice.
- If Alice does not have this capability then after a few rounds she will fail to solve one of the tasks set by Bob, and he then knows that she does not have the capability implied by knowledge of the claimed secret.
- If Alice successfully solves enough tasks set by Bob (say 20 to 50) then Bob is convinced that Alice knows the secret
- As a result of undertaking the exchange with Alice, Bob does not learn anything about the capability and cannot convince a third party Carol that he knows the secret associated with the capability.
But how will this work in practice? The capabilities used in practice for ZKPs are derived from computational secrets such as the prime factors of a number or the discrete logarithm of a number modulo a prime. Alice can easily create a public and private key pair suitable for this purpose. The private key takes the role of Alice's secret, and her capability is being able to easily solve specific equations involving her secret. The role of the public key is to verify that her solutions are correct. So when Bob wants proof that Alice knows her claimed secret, he creates tasks in the form of equations that Alice can solve using her secret key. If Alice does not know the secret key then she can guess the solution but will be caught out after a few rounds.
In a future post we will talk about some practical applications of ZKPs, but if you just can't wait then take a look at the Direct Anonymous Attestation (DAA) protocol used in the Trusted Computing Platform (TPM) from the Trusted Computing Group.