Types in OCaml and Sets in Probability

Recently I decided to pick up probability theory again. This is a subject that always puzzles me. One thing I noticed is that sometimes the notations used in probability theory can be quite misleading. In contrast, a programming language is meant to be precise on what it says. So can we express concepts in probability theory using some programming language constructs so that one can understand them better? This post is such an attempt.

Modern probability theory is based on the concept of Sets. A probability space consists a set $\Omega$, a probability function $P$ that maps a subset (event) of $\Omega$ to a real number. (To be complete, it also includes a $\sigma$-algebra of subsets of $\Omega$ which I will not talk about.) The probability function $P$ satisfies two properties:

$P(\emptyset) = 0, P(\Omega) = 1$
If $A_1, A_2, \dots$ satisfy $A_i\cap A_j = \emptyset$ when $i \ne j$, we have
$$P(\cup A_i) = \sum P(A_i)$$

As in any complex system, we have (see SICP for an excellent introduction)

Some primitive building blocks, e.g., elements of a set.
Ways of combining primitive building block to form more complex object, e.g., forming of subsets, and union and intersection of sets

It is thus very important to understand to how to combine events to form new events. However, I found that the operator $\cap$ is often so heavily overloaded that its usage can be quite confusing. In set theory, $A\cap B$ means a new subset where $x \in A$ and $x \in B$ must both be true for any element $x$ of it. But in probability theory, it means both event $A$ and event $B$ occur in an experiment. This can mean the same thing as in set theory. But it can also mean quite different thing, namely a Cartesian product formed from two different sets, e.g., the definition of independency of events: $P(A\cap B) = P(A)P(B)$. If you think of $\cap$ as intersection of $A$ and $B$, this will never happen.

Tossing dice

We can use types in OCaml to experiment and clarify these concepts. Let’s use dice as an example. We can define dice as follows,

type 'a toss =
  | One   of 'a
  | Two   of 'a
  | Three of 'a
  | Four  of 'a
  | Five  of 'a
  | Six   of 'a
type fair_dice   = Fair_dice
type biased_dice = Biased_dice

To be more general, let’s assume there are two types of dice, namely, fair dice and biased dice. For any dice, there are 6 outcomes of tossing it. These 6 outcomes are the primitive elements of the probability space. We can represent such an element as Five(Biased_dice), which means that we toss the fair dice and get a Five. If we type it in the REPL, we have

1 2	let five = Five(Biased_dice);; (* val five : biased_dice toss = Five Biased_dice *)

Note the type of five is biased_dice toss. How nice it is! Now following the definition of probability space, we need to define a function that assign a number to each set element.

let prob_fair_elem x =
  match x with
  | One(Fair_dice) -> 1.0 /. 6.0
  | _              -> 1.0 /. 6.0
let prob_biased_elem x =
  match x with
  | One(Biased_dice)   -> 1.0 /. 12.0
  | Two(Biased_dice)   -> 2.0 /. 12.0
  | Three(Biased_dice) -> 3.0 /. 12.0
  | Four(Biased_dice)  -> 1.0 /. 12.0
  | Five(Biased_dice)  -> 1.0 /. 12.0
  | Six(Biased_dice)   -> 4.0 /. 12.0

Given the above definitions, we can now form various events.

Sum-event

I call an event a sum-event when it is a subset of the probability space $\Omega$ which is the set containing all the primitive elements. For example, consider the event when tossing a biased dice, the outcome is either 3, 5, or 6. Let’s call this event a. For simplicity, we can use list to represent a set:

1	let a = [Three(Biased_dice); Five(Biased_dice); Six(Biased_dice)]

The probability space itself can be represented as:

1	let omega_biased = [One(Biased_dice); Two(Biased_dice); Three(Biased_dice); Four(Biased_dice); Five(Biased_dice); Six(Biased_dice)]

Using a plain list to represent a set, we have to be careful not to have duplicate entries, and it does not support union and intersection operations automatically. A better way is to use a proper set library such as Core.Set.

Now we can define the probability function for the probability space of dice tossing.

let prob_fair event =
  List.fold ~f:(fun x y -> x +. prob_fair_elem y) ~init:0.0 event
let prob_biased event =
  List.fold ~f:(fun x y -> x +. prob_biased_elem y) ~init:0.0 event

The above two functions simply sum the probabilities of each of the elements in an event. For example, for event a, its probability is

1	(prob_biased_elem Three(Biased_dice)) +. (prob_biased_elem Five(Biased_dice)) +. (prob_biased_elem Six(Biased_dice))

It can be verified that they satisfy the two requirement for $P$. In programming language theory, the types toss and fair_dice are called sum types.

Come back to the $\cap$ operator, for sum-events, $\cap$ means intersection, this is because the result of tossing a dice can only have one value. Two events can both happen only when they share some common elements. Sum- events can not be independent, since occurrence of one value means other values are excluded.

Product-event

Suppose we wish to study the probability of tossing both a fair dice and a biased dice. For example, we can ask what is the probability that fair dice has value 3 and biased dice has value 6. Starting again from the definition of probability space, we can first ask what the primitive elements are. Clearly we can use an ordered pair $(x, y)$ to record the outcomes, where $x$ is the result of tossing the fair dice, $y$ is the result of biased dice. Since each of the dice has six faces, there are $6\times 6 = 36$ such ordered pairs. Next we can ask what probability we shall assign to each such pair. The answer is the product of the probabilities of each dice. The reason is that among all the outcomes, there are $P(x)$ of chance to get $x$ for the fair dice, and among the all the outcomes of tossing the two dice with the fair dice having a value $x$, the probability is $P(y)$. This is essentially the multiplication rule for counting outcomes of an experiment. So new treating these ordered pair as building blocks, we can proceed to construct probability function and events.

In OCaml, such an ordered pair can be nicely represented as a tuple, e.g., (Three(Fair_dice), Six(Biased_dice)). Because these new building blocks have much richer internal structures, they have more interesting properties. A general event in these two-dice space looks like [(Five(Fair_dice), Four(Biased_dice)); (One(Fair_dice), Six(Biased_dice))], which is simply a set (list) of ordered pairs. However, if examine them closely, it is easy to see that we can at least classify them into two classes of sets.

Events for the fair and biased dice are independent
For this class of events, we can always represent them as a product of the event by tossing fair dice and that of the biased dice. One example:
1
let two_dice_event = ([Two(Fair_dice); Three(Fair_dice)], [One(Biased_dice); Five(Biased_dice); Six(Biased_dice)])

OCaml will report the type of it is val two_dice_event : fair_dice toss list * biased_dice toss list. This is an example of product type. As for sum-event, we can call such an event product-event. Consider again $\cap$ operator, if we have an event $A$ for fair dice, and an event $B$ for biased dice, the event $A\cap B$ actually means forming a product-event, and we have $P(A\cap B) = P(A)P(B)$, meaning $A$ and $B$ are independent events.

There is a relation or constraint between the events of fair and biased dice
For example, the event where the sum of the face values when tossing the two dice is 5:
1
let sum_five = [(Two(Fair_dice), Three(Biased_dice)); (Three(Fair_dice), Two(Biased_dice)); (One(Fair_dice), Four(Biased_dice)); (Four(Fair_dice), One(Biased_dice)]
Such an event cannot represented as a product of fair dice event and biased dice event, it is a sum-event with a constraint.

After I used computer programs to solve probability problems, I used to implicitly use these concepts without realizing them. Programming using a good typed language forced me to think more clearly. And doing so actually helped me to understand both probability and programming better.