A common operation in software is to reduce a larger number to a smaller one by the modulo operation. As a toy example, suppose we have random numbers in the range 0–7 but need to reduce those values to the range 0–4. The standard way of doing that is to reduce the first set modulo 5.
Now suppose the first set is uniformly distributed. That is, each value from 0 to 7 is equally likely. Suppose that we want the reduced set to be uniformly distributed too. Let’s see what happens:
n | n (mod 5) |
---|---|
0 | 0 |
1 | 1 |
2 | 2 |
3 | 3 |
4 | 4 |
5 | 0 |
6 | 1 |
7 | 2 |
Notice that the values 0, 1, 2 occur twice but the numbers 3 and 4 occur only once so the reduced set is definitely not uniformly distributed. That’s because 5 does not divide the number of values in the first set (8) evenly. If the original set had been 0–9, the reduced set would have been uniformly distributed too.
This problem is called modulo bias. I wrote about it in passing in my post on a Diceware implementation. Yolan Romailler has a nice post on modulo bias over at Kudelski Security Research. He gives a slightly larger example of modulo bias, and describes some of the ways of dealing with it. More importantly, he describes how it can be a significant problem, especially in cryptographical applications, where reduction with the modulo operation is common.
If you ever use the modulo operation, you should take a look at Romailler’s post to make sure you aren’t introducing unintended biases.