After visualizing a binary coin toss with Bayesian updating in our earlier post, we move on to its generalization: the multinomial random variable. What’s the example? When we tossed the coin, the outcome was either T (tail) or H (head). For the multinomial, we will roll a die. So instead of two outcomes, there are now more than two (hence multinomial).


Unfortunately, a space of six, or even four, dimensions is hard to visualize, as you shall see later on. So for the sake of visualization we simplify: we reduce the six possible outcomes to just three. Is there a die with three faces? There is! Take a regular die, replace the labels 4, 5, 6 with 3, 2, 1, and you’re good to go!
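To make the relabeling trick concrete, here is a minimal sketch of such a three-outcome die in Python. The names `RELABEL` and `roll_three_outcome_die` are made up for illustration, and the underlying six-faced die is assumed to be fair.

```python
import random
from collections import Counter

# Relabeling from the text: faces 4, 5, 6 become 3, 2, 1.
# RELABEL and roll_three_outcome_die are illustrative names only.
RELABEL = {1: 1, 2: 2, 3: 3, 4: 3, 5: 2, 6: 1}


def roll_three_outcome_die(rng):
    """Roll a fair six-faced die and return its relabeled value (1, 2 or 3)."""
    return RELABEL[rng.randint(1, 6)]


rng = random.Random(0)  # fixed seed so the run is repeatable
rolls = [roll_three_outcome_die(rng) for _ in range(6000)]
counts = Counter(rolls)

for face in (1, 2, 3):
    print(face, counts[face] / len(rolls))
```

Each label covers two of the six faces, so with a fair die every outcome should come up about a third of the time.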

Of course, it doesn’t need to be a die. It could be a choice among political parties, a presidential election (with two candidates it’s binomial again), a choice of shoe brand, or anything else that results in more than two outcomes.


Mathematicians are very fond of generalization. In the previous experiment, we worked with two outcomes and used the “beta” distribution as the conjugate prior. But that is not enough now. We need a higher-dimensional conjugate prior: the Dirichlet. The Dirichlet distribution is often aptly called a distribution over “distributions”. Depending on its parameters, its PDF can take many different, sometimes odd, shapes. And yet every point we sample from it is a set of probability values that sum to 1. A Dirichlet distribution can have as many dimensions as you like. For a three-dimensional Dirichlet, the values are sampled from a two-dimensional surface (a triangle). For a four-dimensional Dirichlet, they are sampled from a three-dimensional object (a tetrahedron). That is the “simplex” of the Dirichlet: each point you pick in it gives you one set of probabilities. Combined with the multinomial as the likelihood, this becomes the Dirichlet-Multinomial update, which is mathematically convenient: the posterior is again a Dirichlet, so you can reuse it as the prior for the next update as often as you like.
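Here is a minimal sketch of both ideas, using only the Python standard library. It assumes a uniform prior Dirichlet(1, 1, 1), and the counts passed to the update are made up. The function names `sample_dirichlet` and `update` are illustrative. Sampling uses the standard construction of normalizing independent Gamma draws.

```python
import random


def sample_dirichlet(alpha, rng):
    """Draw one point from Dirichlet(alpha) by normalizing Gamma(alpha_i, 1) draws."""
    draws = [rng.gammavariate(a, 1.0) for a in alpha]
    total = sum(draws)
    return [d / total for d in draws]


def update(alpha, counts):
    """Conjugate update: posterior parameters = prior parameters + observed counts."""
    return [a + c for a, c in zip(alpha, counts)]


rng = random.Random(42)
prior = [1.0, 1.0, 1.0]  # assumption: uniform prior over the triangle

# One sampled point is a full set of probabilities for outcomes 1, 2, 3.
point = sample_dirichlet(prior, rng)
print(point, sum(point))

# Made-up data: outcome 1 seen three times, outcome 2 once, outcome 3 never.
posterior = update(prior, [3, 1, 0])
print(posterior)
```

The update is nothing more than adding counts to the parameters, which is why the posterior can be fed straight back in as the next prior.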


Take a look at the sequence of die rolls I visualize below: 12331121211112121. Look! The peak of the distribution moves downwards, away from the top. The top corner represents the value 3. This suggests that the die is not fair, because it tends to show 1 or 2. If you want to simulate how the distribution changes over this sequence of rolls, you can look at this code or at this GitHub link (with Python code).
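For readers who just want the numbers behind the picture, here is a minimal sketch that replays the sequence roll by roll. It is not the code linked above. It assumes a uniform Dirichlet(1, 1, 1) prior, and it tracks the posterior mean rather than the peak of the density.

```python
from collections import Counter

rolls = "12331121211112121"  # the sequence from the text
alpha = [1, 1, 1]            # assumption: uniform Dirichlet prior

history = []  # posterior mean after each roll
for r in rolls:
    alpha[int(r) - 1] += 1   # conjugate update: add one count to the observed face
    total = sum(alpha)
    history.append([a / total for a in alpha])

counts = Counter(rolls)
posterior_mean = history[-1]

print(dict(counts))
print(alpha)
print(posterior_mean)
```

The sequence contains ten 1s, five 2s and two 3s. Under the assumed prior, that gives posterior parameters (11, 6, 3) and a posterior mean of (0.55, 0.30, 0.15). The value 3 ends up well below one third, though 17 rolls is still a small sample.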


Note: if you are wondering why I don’t draw a hexagon for a six-faced die to show the average probabilities, it is because that would oversimplify the problem. A Dirichlet distribution can have 6 dimensions or, through its generalization the Dirichlet process, even infinitely many, and those cannot easily be reduced to a planar area for visualization.