Saturday, March 1, 2008

Counting Restricted Password Spaces

(A second post on this topic - May 29, 08)

I came across an August 2007 post from Bruce Marshall in the Password Research blog where he was considering the following issue:

A recent discussion on a SecurityFocus.com mailing list raised a concern about password policies that I hadn't previously given much thought to. The post author commented that he wanted to enforce a policy that required passwords to have lowercase, uppercase, number, and symbol characters. By this he meant every password must have at least one character from each of these characters sets. One reader responded that by requiring the use of all four character sets he would reduce the total number of possible passwords, causing a negative impact on password security.

The concern here is that the security implied by a large password space (the set of all possible passwords) can be significantly reduced by restricting the actual set of passwords that can be chosen by a user through additional character composition requirements. To understand the implications we need to count the number of restricted passwords (those having at least one character that is lowercase, uppercase, number, and a symbol) as compared to the unrestricted password space (any combination of characters).

Choosing a Counting Tool

There are several ways to count the number of restricted passwords and the method I describe here will be the Inclusion-Exclusion Principle, or IEP for short (sometimes the letters are written as PIE because it rolls of the tongue better). I have used the IEP to solve many problems and here is an example from the security of anonymity systems. The IEP is a general counting method used to solve problems of the following type: given a set of objects and a corresponding set of properties, determine

  • How many of the objects have none of the properties?
  • How many of the objects have at least one of the properties?
  • How many objects have some combination of the properties?

To explain the IEP, imagine that we wish to calculate the size of the union of the sets A, B, and C shown in the diagram below

This can be achieved as follows

size(A + B + C) = size(A) + size(B) + size(C) - size(AB + AC + BC) + size(ABC)

You can verify that each region of the diagram is counted exactly once, and thus produces the correct value for the size of the union of A, B and C. The rule for the calculation is easily observed. Start by adding the sizes of the individual sets, then subtract off the sizes of intersecting sets taken 2 at a time, then add back the size of intersecting sets taken 3 at a time. If we had 5 sets to consider (A, B, C, D, E) we would continue this process of intersecting sets taken 4 and then 5 at a time. When we take an odd number of intersections, the term is added; when a even number of intersections is taken, the term is subtracted. The IEP is the underlying theory as to why adding and subtracting the sizes of increasingly larger intersections can be used to count the size of a set union.

Counting Restricted Passwords

Returning to our present context, we are interested in counting the set of passwords that satisfy a given restrictive policy (filter). Let the underlying password characters be the 94 printable ASCII characters, which consists of 26 uppercase characters, 26 lowercase characters, 10 numbers and 32 symbols. If passwords have length 8, then the unrestricted password space will have size (94)^8. We will determine the size of the restrictive password set as follows:

  1. Define a collection of sets whose union fully characterises the set of passwords that do not satisfy the restrictive policy.
  2. Use the IEP to compute the size of the union of the sets defined in Step 1.
  3. Subtract the result from Step 2 away from the number of unrestricted passwords (the full password space).

In our case the judicious choice of sets for Step 1 is

  • L - the set of passwords that do not contain a lowercase letter
  • U - the set of passwords that do not contain an uppercase letter
  • N - the set of passwords that do not contain a number
  • S - as the set of passwords that do not contain a symbol

The union of L, U, N and S is the set of passwords that has at least one of the four character sets missing, and this represents all passwords that do not satisfy the restrictive policy. The table below shows the full IEP expansion for computing the size of the union of L, U, N and S, where n is the general length of the password. The table has 15 = 2^4-1 rows corresponding to the 15 non-empty intersections of the 4 sets. The Size column shows the number of passwords contained in each intersection. For example, the LUN row represents the set of passwords that do not have lower, upper or number characters, of which there are (94 - 26 -26 - 10)^n = 32^n. Equivalently, this is simply the number of passwords that consist only of symbols. The last row corresponding to the intersection LUNS must have a zero size since there are no passwords that do not contain a character from L, U, N or S.

Intersection Sign Size
L + (94 - 26)^n
U + (94 - 26)^n
N + (94 - 10)^n
S + (94 - 32)^n
LU - (94 - 26 -26)^n
LN - (94 - 26 -10)^n
LS - (94 - 26 -32)^n
UN - (94 - 26 -10)^n
US - (94 - 26 -32)^n
NS - (94 - 10 - 32)^n
LUN + (94 - 26 -26 - 10)^n
LUS + (94 - 26 - 26 -32)^n
LNS + (94 - 26 - 10 -32)^n
UNS + (94 - 26 - 10 -32)^n
LUNS - 0

The size of the union is simply the sum of the terms in the Size column, added or subtracted according to the Sign column. Then following Step 3 above in our process, the table below shows the number of restricted passwords and their percentage as compared to unrestricted passwords, for lengths in the range 6 to 20 characters.

Length Unrestricted Restricted Percentage
6 .689 x 10^12 .188 x 10^12 27%
7 .648 x 10^14 .241 x 10^14 37%
8 .609 x 10^16 .280 x 10^16 46%
9 .572 x 10^18 .307 x 10^18 53%
10 .538 x 10^20 .323 x 10^20 60%
11 .506 x 10^22 .331 x 10^22 65%
12 .475 x 10^24 .333 x 10^24 70%
13 .447 x 10^26 .330 x 10^26 74%
14 .420 x 10^28 .324 x 10^28 77%
15 .395 x 10^30 .315 x 10^30 80%
16 .371 x 10^32 .305 x 10^32 82%
17 .349x 10^34 .294 x 10^34 84%
18 .328x 10^36 .282 x 10^36 86%
19 .308 x 10^38 .270 x 10^38 88%
20 .290 x 10^40 .258 x 10^40 89%

For practical password lengths (6 to 10 characters) we see that the restrictive policy reduces the set of possible passwords by at least 40%. For passwords with 8 or less characters the reduction is more than 50%. On the other hand, as the password length increases, the fraction of passwords that satisfy the restrictive policy also increases. This just merely indicates that it is increasingly unlikely that one of the character sets will be excluded in longer and longer passwords. For 50 (!) character passwords, 99.6% of all passwords satisfy the restrictive policy.

Discussion

The original question we set out to consider was how to quantify the impact of a restrictive password policy on an unrestricted password space - to what extent are password choices reduced? The IEP shows us that the reduction in the password space due to the restrictive policy is between 73% to 40% for password lengths in the range 6 to 10 characters.

This may seem a significant reduction in the password space over practical password lengths, but I don't think this observation is too alarming. For length 8 passwords the password space is halved by the restrictive policy, but even so, we are still talking about 10^16 passwords to the nearest power of ten. That is still a large number of passwords to search or guess.