Recently I decided to pick up probability theory again. This is a subject that always puzzles me. One thing I noticed is that sometimes the notations used in probability theory can be quite misleading. In contrast, a programming language is meant to be precise on what it says. So can we express concepts in probability theory using some programming language constructs so that one can understand them better? This post is such an attempt.
Modern probability theory is based on the concept of Sets. A probability space consists a set $\Omega$, a probability function $P$ that maps a subset (event) of $\Omega$ to a real number. (To be complete, it also includes a $\sigma$-algebra of subsets of $\Omega$ which I will not talk about.) The probability function $P$ satisfies two properties:
- $P(\emptyset) = 0, P(\Omega) = 1$
- If $A_1, A_2, \dots$ satisfy $A_i\cap A_j = \emptyset$ when $i \ne j$, we have
$$P(\cup A_i) = \sum P(A_i)$$
As in any complex system, we have (see SICP for an excellent introduction)
- Some primitive building blocks, e.g., elements of a set.
- Ways of combining primitive building block to form more complex object, e.g., forming of subsets, and union and intersection of sets
It is thus very important to understand to how to combine events to form new events. However, I found that the operator $\cap$ is often so heavily overloaded that its usage can be quite confusing. In set theory, $A\cap B$ means a new subset where $x \in A$ and $x \in B$ must both be true for any element $x$ of it. But in probability theory, it means both event $A$ and event $B$ occur in an experiment. This can mean the same thing as in set theory. But it can also mean quite different thing, namely a Cartesian product formed from two different sets, e.g., the definition of independency of events: $P(A\cap B) = P(A)P(B)$. If you think of $\cap$ as intersection of $A$ and $B$, this will never happen.
Tossing dice
We can use types in OCaml to experiment and clarify these concepts. Let’s use dice as an example. We can define dice as follows,
To be more general, let’s assume there are two types of dice, namely, fair dice and biased dice. For any dice, there are 6 outcomes of tossing it. These 6 outcomes are the primitive elements of the probability space. We can represent such an element as Five(Biased_dice)
, which means that we toss the fair dice and get a Five
. If we type it in the REPL, we have
|
|
Note the type of five
is biased_dice toss
. How nice it is! Now following the definition of probability space, we need to define a function that assign a number to each set element.
|
|
Given the above definitions, we can now form various events.
Sum-event
I call an event a sum-event when it is a subset of the probability space $\Omega$ which is the set containing all the primitive elements. For example, consider the event when tossing a biased dice, the outcome is either 3, 5, or 6. Let’s call this event a
. For simplicity, we can use list to represent a set:
The probability space itself can be represented as:
Using a plain list to represent a set, we have to be careful not to have duplicate entries, and it does not support union and intersection operations automatically. A better way is to use a proper set library such as Core.Set.
Now we can define the probability function for the probability space of dice tossing.
The above two functions simply sum the probabilities of each of the elements in an event. For example, for event a
, its probability is
It can be verified that they satisfy the two requirement for $P$. In programming language theory, the types toss
and fair_dice
are called sum types.
Come back to the $\cap$ operator, for sum-events, $\cap$ means intersection, this is because the result of tossing a dice can only have one value. Two events can both happen only when they share some common elements. Sum- events can not be independent, since occurrence of one value means other values are excluded.
Product-event
Suppose we wish to study the probability of tossing both a fair dice and a biased dice. For example, we can ask what is the probability that fair dice has value 3 and biased dice has value 6. Starting again from the definition of probability space, we can first ask what the primitive elements are. Clearly we can use an ordered pair $(x, y)$ to record the outcomes, where $x$ is the result of tossing the fair dice, $y$ is the result of biased dice. Since each of the dice has six faces, there are $6\times 6 = 36$ such ordered pairs. Next we can ask what probability we shall assign to each such pair. The answer is the product of the probabilities of each dice. The reason is that among all the outcomes, there are $P(x)$ of chance to get $x$ for the fair dice, and among the all the outcomes of tossing the two dice with the fair dice having a value $x$, the probability is $P(y)$. This is essentially the multiplication rule for counting outcomes of an experiment. So new treating these ordered pair as building blocks, we can proceed to construct probability function and events.
In OCaml, such an ordered pair can be nicely represented as a tuple, e.g., (Three(Fair_dice), Six(Biased_dice))
. Because these new building blocks have much richer internal structures, they have more interesting properties. A general event in these two-dice space looks like [(Five(Fair_dice), Four(Biased_dice)); (One(Fair_dice), Six(Biased_dice))]
, which is simply a set (list) of ordered pairs. However, if examine them closely, it is easy to see that we can at least classify them into two classes of sets.
- Events for the fair and biased dice are independent
For this class of events, we can always represent them as a product of the event by tossing fair dice and that of the biased dice. One example:1let two_dice_event = ([Two(Fair_dice); Three(Fair_dice)], [One(Biased_dice); Five(Biased_dice); Six(Biased_dice)])
OCaml will report the type of it is val two_dice_event : fair_dice toss list * biased_dice toss list
. This is an example of product type. As for sum-event, we can call such an event product-event. Consider again $\cap$ operator, if we have an event $A$ for fair dice, and an event $B$ for biased dice, the event $A\cap B$ actually means forming a product-event, and we have $P(A\cap B) = P(A)P(B)$, meaning $A$ and $B$ are independent events.
There is a relation or constraint between the events of fair and biased dice
For example, the event where the sum of the face values when tossing the two dice is 5:1let sum_five = [(Two(Fair_dice), Three(Biased_dice)); (Three(Fair_dice), Two(Biased_dice)); (One(Fair_dice), Four(Biased_dice)); (Four(Fair_dice), One(Biased_dice)]Such an event cannot represented as a product of fair dice event and biased dice event, it is a sum-event with a constraint.
After I used computer programs to solve probability problems, I used to implicitly use these concepts without realizing them. Programming using a good typed language forced me to think more clearly. And doing so actually helped me to understand both probability and programming better.