Ishaan - Randomness of One Time Passwords

Introduction

At school, a couple of my friends were discussing the ubiquitous OTPs or One Time Passwords that are used for authentication to various apps and websites. They were saying that these 6 digit OTPs are not really random but that they contain repeated digits much more often than would be expected in truly random strings of 6 digits. They were surmising that this is by design since repeated digits tend to make numbers easier to remember at one glance. This can reduce errors that users may make when entering an OTP after receiving it by text message on a mobile phone or over email. In other words, they were saying that repeated digits make OTPs user friendly.

I found this line of thought quite intriguing. While there was some sense to their point of view, I felt that such predictability would dramatically reduce the security provided by OTPs. Since these OTPs are extensively used for financial transactions and identity verification, such laxity in security would be unacceptable.

So I decided to carry out some experimentation in this area with the aim of drawing my own conclusions.

Approach

Let us stick to 6 digit OTPs as they are the most common and also the topic of the discussion my friends were having.

We will generate a large number of random 6 digits strings using a cryptographically secure random number generator and then check the distribution of duplicate pairs of digits in that large set of generated 6 digit random strings. Then we will check if they match the probabilities that would be expected in theory if the 6 digit strings are truly random.

We will repeat the above experiment using a standard Python package pyotp that generates 6 digit OTPs. This is a typical library that is used in real-world apps.

Theoretical Probabilities of duplicate pairs

So, first let us figure out the expected theoretical probabilities using combinatorics.

Let N be the number of OTPs. For 6 digit OTPs, N = $10^6$. We have to calculate the number of ways of arranging digits into 6 slots. This will give us the number of OTPs that have the desired characteristic.

We define four cases based on the number of duplicate pairs of digits in an OTP i.e. 0,1,2 and 3.

Case i): all distinct digits i.e. 0 duplicate pairs

We have to fill 6 slots by selecting 6 digits from the set of 10 available and for each selection, we can permute the digits in 6! ways. In other words this is ${}^{10} _6 ! $ which is actually ${}^{10} \mathrm { P }_6$, and it leads to the probability $p_1$.

$p_1 = {}^{10} \mathrm{ P }_6 / N$
Case ii): exactly 1 duplicate pair of digits, other 4 unique

We have to select the duplicate digit and this can be done in ${}^{10} \mathrm { C }_1$ ways. For each choice of the duplicate digit, we have to select 2 out of the 6 slots which this digit will occupy. And the remaining 4 slots can be filled by selecting 4 out of the remaining 9 digits and permuting these 4 for each choice. The probability $p_2$ of getting exactly 1 duplicate pair if digits is thus,

$p_2 = {}^{10} \mathrm { C }_1 \cdot {}^{6} \mathrm { C }_2 \cdot {}^{9} \mathrm { P }_4 / N$
Case iii): exactly 2 duplicate pairs of digits, other 2 unique

We have to select the 2 duplicated digits out of the 10. Next we select 2 out of the 6 slots for the first duplicated digit and then 2 out of the remaining 4 slots for the second duplicated digit. Finally we select 2 from the remaining 8 digits and permute them in the 2 remaining slots. The relevant probablity $p_3$ works out to

$p_3 = {}^{10} \mathrm { C }_2 \cdot {}^{6} \mathrm { C }_2 \cdot {}^{4} \mathrm { C }_2 \cdot {}^{8} \mathrm { P }_2 / N$
Case iv): exactly 3 duplicate pairs of digits

Extending the logic in Case iii), we can select the 3 duplicated digits and assign the pairs of duplicate digits into the slots successively. Finally, this gives us the probablity $p_4$.

$p_4 = {}^{10} \mathrm { C }_3 \cdot {}^{6} \mathrm { C }_2 \cdot {}^{4} \mathrm { C }_2 \cdot {}^{2} \mathrm { C }_2 / N$

Next we calculate the above theoretical probabilities using the Python math library. Note that these four cases cover only duplicate pairs and not 3 through 6 repetitions of the same digit. Since all possibilties are not included, the sum of the probababilities $p_1$ through $p_4$ is less than 1.

Code

N = 1_000_000
# 0 duplicate pairs i.e. all 6 distinct digits
n_0d = math.perm(10,6) / N

# exactly 1 duplicate pair, others unique
n_1d = math.comb(10,1) * math.comb(6,2) * math.perm(9, 4) / N

# exactly 2 duplicate pairs, others unique
n_2d = math.comb(10,2) * math.comb(6,2) * math.comb(4,2) * math.perm(8,2) / N

# exactly 3 duplicate pairs, other unique
n_3d = math.comb(10,3) * math.comb(6,2) * math.comb(4,2) * math.comb(2,2) / N

print(f"Case 1: p1=%0.3f \nCase 2: p2=%0.3f \nCase 3: p3=%0.3f \nCase 4: p4=%0.3f " 
      % (n_0d, n_1d, n_2d, n_3d))

Case 1: p1=0.151 
Case 2: p2=0.454 
Case 3: p3=0.227 
Case 4: p4=0.011

The results are summarized in the table below for convenience.

Case	Description	Probability Expression	Computed Probability
1	all distinct	${}^{10} \mathrm{ P }_6 / N$	0.151
2	1 duplicate pair	${}^{10} \mathrm { C }_1 \cdot {}^{6} \mathrm { C }_2 \cdot {}^{9} \mathrm { P }_4 / N$	0.454
3	2 duplicate pairs	${}^{10} \mathrm { C }_2 \cdot {}^{6} \mathrm { C }_2 \cdot {}^{4} \mathrm { C }_2 \cdot {}^{8} \mathrm { P }_2 / N$	0.227
4	3 duplicate pairs	${}^{10} \mathrm { C }_3 \cdot {}^{6} \mathrm { C }_2 \cdot {}^{4} \mathrm { C }_2 \cdot {}^{2} \mathrm { C }_2 / N$	0.011

The theoretical probabilities themselves are a bit unintuitive and definitely seem to challenge my friends’ hypothesis. For example, you are likely to encounter a pair of duplicate digits in an OTP almost 45% of the time, whereas the likelihood of seeing all unique digits is only around 15%. Thus, the probability of one pair of duplicate digit is about 3 times as much as all unique digits. And the probability of two duplicate pairs is 1.5 times that of all unique digits. However, the probability of three duplicate pairs is indeed pretty low.

Upon reflection, this isn’t particularly surprising. After all, OTPs are 6-digit strings selected from 10 possible digits, making repetitions fairly likely. Notably, if OTPs were only 4 digits long, the probability of having all unique digits would be signiicantly higher (p=${}^{10} \mathrm {P}_4 / {10}^6$ = 0.504).

Verifying against random 6-digit strings

We generate K=10,000 random 6-digit strings using a cryptographically secure random number generator. Specifically, we utilize secrets.SecureRandom, which leverages the operating system’s entropy device (/dev/random) to ensure randomess. We instruct the generator to produce uniformly distributed random numbers within the range [0, 999999] and pad with leading zeroes if a generated number has less than 6 digits. This gives us 10000 6-digit strings, and for each string, we calculate the number of pairs of duplicate digits, taking care to exclude cases where a digit is repeated more than twice. Finally, we compute the probability for each of the four cases under consideration based on the results from our experimental run.

Code

# return None for cases where the unpaired are not unique (i.e. repeated more than twice)
# e.g. 121333 => None as the digits other than 1,1 are not unique
def getDigitPairCount(otp_str):
    counts = np.zeros(10)
    for d in list(otp_str):
        counts[int(d)]+=1
    
    numPairs = 0
    for c in counts:
        if c == 2:
            numPairs+=1                

    uniqueDigits = np.unique(list(otp_str)).size
    if not uniqueDigits == 6 - numPairs:
        numPairs = None
        
    return numPairs


K=10_000

pairCounts = np.zeros(4) # No. of pairs can be 0,1,2 or 3
sample_otps = []
my_secure_rng = secrets.SystemRandom()
for i in range(K):
    otp = my_secure_rng.randrange(0, 1_000_000) # otp E [000000,999999]
    otp_str = "%06d" % otp
    if(i%100) == 0:    # print every 100th OTP
        sample_otps.append(otp_str) #print(otp_str, end=' ')
    if getDigitPairCount(otp_str) != None:
        pairCounts[getDigitPairCount(otp_str)]+=1

for i, count in enumerate(pairCounts):
    print(f"Case %d: p=%0.3f" % (i, count/K))

print(f"25 Sample OTPs: %s" % (" ".join(sample_otps[0:25])))

Case 0: p=0.157
Case 1: p=0.451
Case 2: p=0.227
Case 3: p=0.011
25 Sample OTPs: 481855 431434 659052 894006 149413 357159 507715 288934 236784 648318 249206 002777 912940 462657 826182 617526 003026 497279 112231 744754 054510 801436 743241 834209 392919

The output above clearly show that the experimental probabilities closely match the theoretical probabilities we computed earlier. A list of 25 6-digit OTPs from among the 10000 produced during the run are also provided. A quick scan through it shows that a significant number of them have one or more duplicate pairs and only pretty few have all unique digits. This matches what we expect based on our analysis so far.

As we mentioned before, this might not be so intuitive when a lay person thinks about random strings of 6 digits. No wonder my friends found it unexpected and started contemplating that the generated OTPs are filtered to include only those with repeated digits. However, our analysis shows that they were clearly off the mark.

Finally we repeat the experiment using a standard OTP generation library PyOTP instead of a base secure random number generator.

Verifying against OTPs generated by PyOTP library

Code

hotp = pyotp.HOTP('base32secret3232') # arbitrary secret string is used only for illustration. Such strings should not be displayed in serious usage !!

pairCounts = np.zeros(4) # No. of pairs can be 0,1,2 or 3
sample_otps = []

for i in range(K):
    otp_str = hotp.at(i)
    if(i%100) == 0:    # print every 100th OTP
        sample_otps.append(otp_str)
    if getDigitPairCount(otp_str) != None:
        pairCounts[getDigitPairCount(otp_str)]+=1

for i, count in enumerate(pairCounts):
    print(f"Case %d: p=%0.3f" % (i, count/K))

print(f"25 Sample OTPs: %s" % (" ".join(sample_otps[0:25])))

Case 0: p=0.153
Case 1: p=0.455
Case 2: p=0.224
Case 3: p=0.012
25 Sample OTPs: 260182 702511 837273 522557 197509 746497 629786 978190 106958 417581 673467 517053 514859 219121 764321 770353 463739 383506 139672 782891 316341 009184 101881 187781 450406

In this run also, we find that the experimental probabilities are closely aligned with the theoretical ones.

Conclusion

We find no good reason to doubt that the OTPs we encounter every day are random. Probably our intuitive sense of random numbers in not too strong and we get a better feel only after acquiring experience and familiarity with them. In the absence of that, we tend to assume incorrectly that there are far too many patterns in the numbers.

Although we looked for duplicate pairs only , it gave us good insight into the topic. For a more detailed analysis we should consider higher numbers of repeated digits, sequences of digits in ascending or descending order, frequencies of digits and so on. But the approach for analysing them will be roughly the same i.e. compute the theoretical probability assuming uniformly distributed random numbers and then verify against the distribution observed in a large set of genererated OTPs.

References

Easy Copy MathJax