Try to think of completely random integers between and . Each number should have no relationship at all to the others. It’s not that easy right?
Human brains follow directions very well. However, this makes them very bad at being completely random, or even random at all. Similarly, computers struggle with being truly random. Yet, a \verb|random| function is ubiquitous across programming languages, calculators, and more. Where do these devices get their capabilities?
It turns out that these random functions are just very good at pretending to be random. Many such functions follow a known formula that just happens to be hard to guess by hand. In this article, we will examine pseudorandom functions, particularly the one in Java.
First, for the skeptics. Run the following Java code as many times as you want, and verify for yourself that the result is always the same.
import java.util.Random; public class JavaIsNotRandom { public static void main(String[] args) { randomNumbers(6); } public static void randomNumbers(long seed) { Random r = new Random(seed); for (int i = 0; i < 100; i++) { System.out.println(r.nextInt(32)); } } } You can find the same pattern by changing any of the numbers (, , , ). While the output differs between different parameters, the output is the same no matter how many times you run your code with the same set of parameters. Very suspicious for a “random” function.
Java employs what is known as a Linear Congruential Generator (LCG). This is a fancy way to describe the sequence where for constant integers , , . When you create a variable in the Random class, you (or Java) provide , a seed. Each successive time you use the same random variable, Java just takes the next term in the sequence and modifies it in a known way based on what you query for (e.g. truncate if you set a upper bound, add a decimal point, etc.). For instance, Java in essence returns the five bits from each of for in the code above.
The principle behind an LCG is that the period of will be for good choices of of , , and . For instance, if , is prime, and and are relatively prime, then by Fermat’s Little Theorem. If one makes really big, e.g. , then the period becomes large enough where it is not immediately obvious something deterministic is happening.
In fact, we don’t even need to be so picky about the values of the constants. The Hull-Dobell theorem states that the period of will be so long as and are relatively prime, is divisible by all factors of , and is divisible by if is divisible by . This allows developers to set to powers of , which makes modular arithmetic really simple computationally.
Specifically, Java has , , and . Other languages, such as certain random functions in C++, utilize different hyperparameters. Java also tries to make its random function even more random by letting the default seed be the current Unix time, which constantly varies. In other words, don’t set a seed for more random results.
LCGs have a number of advantages, including efficiency and simplicity. However, it is very easy to reverse an LCG, and LCGs also fail many statistical tests. Other pseudorandom functions, such as shift-register generators, which use bit-shift operations instead of addition and multiplications, are more random. For instance, Python uses a Mersenne twister, a shift-generator with a very large modulus. One can also pair multiple pseudorandom functions, or apply output mixing functions, to make the resuts even more random. Nevertheless, only certain natural phenomenon, such as atmospheric noise and nuclear decay are considered truly random.
All of this is not to say that pseudorandom number generators are bad. Randomness is important in applications ranging from clinical trials to gambling, and pseudomrandom number generators have served us well so far.