A mathematical diversion
I ran across the following problem (paraphrased) at the website of Shlomi Fish.
Throw 4 standard dice, remove the lowest valued die (if there is more than one minimal value die, remove only one). What is the expected value of the sum of the three remaining dice?
The author presents a solution which I found a bit hard to follow, so I decided to try my hand at it. Here is my solution, along with some thoughts on how to approach this sort of problem.
The post that follows is long, detailed, and continued after the fold.
Problems in probability and combinatorics can be deceptively tricky. It is easy to screw up. To guard against this, it is useful to approach the problem a couple of different ways. If different methods come up with different answers, you know something is wrong.
Often it is easy to run a computer simulation to get a ball park idea of the answer. This problem is very easy to simulate. I like to do this kind of stuff in Mathematica. Here’s the code for the simulation. I put in boldface certain terms which will be used throughout this discussion.
toss[howManyDice_] := Table[Random[Integer, {1, 6}], {howManyDice}]
score[aRoll_] := Total @ aRoll - Min @ aRoll
Aside: in Mathematica, f @ x means apply the function f to the argument x; this can also be written as f[x]. Square brackets mean functional application (method call, subroutine call), not array indexing as in many languages. I tend to use both f@x and f[x] notation; many Mma programmers prefer to stick to the latter. Total adds the elements of a list, Min selects the minimum.
Let’s put these definitions to work. A trial will consist of 50000 rolls of 4 dice. We run 10 trials.
trial := Mean@Table[score @ toss @ 4, {50000}] // N
trials = Table[trial, {10}]
The mysterious //N instructs Mathematica to present the answers as decimal numbers rather than as exact fractions. It is easier to read the answers that way. Functional application is right associative: f @ g @ h means f @ ( g @ h ). Mean computes the arithmetic mean, aka the average.
The results of the trials:
{12.2479, 12.2313, 12.2438, 12.2524, 12.2329,
12.2441, 12.2445, 12.2297, 12.2354, 12.2494}
The mean of the trials is 12.2412, and the variance is 0.0000664099, so we have a pretty good idea what the answer should look like.
Typically, one resorts to simulation when the sample space is too large to explore exhaustively. That is not the case here. Determining the exact answer is as easy as a simulation. The next bit of code is not long, but requires some explanation.
cp[a_, b_, c_, d_] := Flatten[Outer[List, a, b, c, d], 3] cp[a_] := cp[a, a, a, a] outcomes = cp @ Range @ 6;
Here cp denotes the set theoretic cross product, in this case of four sets, one per die. The single argument version of cp is syntactic sugar. Range@6 denotes the set {1,2,3,4,5,6}. The details need not concern us. The set called outcomes turns out to have 6^4 = 1296 elements and looks like this:
{{1, 1, 1, 1}, {1, 1, 1, 2}, {1, 1, 1, 3}, {1, 1, 1, 4},
{1, 1, 1, 5}, {1, 1, 1, 6}, {1, 1, 2, 1},
lots of stuff omitted
{6, 6, 6, 1}, {6, 6, 6, 2}, {6, 6, 6, 3}, {6, 6, 6, 4},
{6, 6, 6, 5}, {6, 6, 6, 6}}
The expected value of the score function over the set of outcomes is given by
Mean @ (score /@ outcomes)
which has value 15869/1296 or 12.2246. Recall that f @ x denotes f applied to x; f /@ x denotes the set {f @ y} taken over all elements y in x.
The exact answer shows that the simulation was pretty much on target. That’s not terribly surprising here, but in more complicated problems it might be hard to enumerate the sample space. In those cases simulation might be the way to go.
We have the answer to the original question, but this type of exhaustive computation does not provide the insight of a mathematical analysis. In particular, it is hard to generalize. It is easy to enumerate or sample rolling four dice. Rolling 100 dice results in a sample space too large to enumerate in practice. With the answer in hand to keep us on track, let’s do the math.
The approach is divide and conquer. Suppose you wanted to measure the area of an irregular polygon. You could divide it into non-overlapping triangles, find the area of each triangle, and then take the sum. We are going to do something similar. We will divide outcomes into suitable disjoint sets, compute the expected value of the score on each set, and combine these results to get the overall answer.
We need to decompose the sample space, outcomes, into suitable disjoint sets. Let L denote the value of the minimal die in a toss. Let {L = k} denote the subset of outcomes where the minimal value is k. The disjoint sets we want are precisely the {L=k} for k in {1, 2, 3, 4, 5, 6}.
Example: {L=5} is the set
{{5, 5, 5, 5}, {5, 5, 5, 6}, {5, 5, 6, 5}, {5, 5, 6, 6}, {5, 6, 5, 5}, {5, 6, 5, 6}, {5, 6, 6, 5}, {5, 6, 6, 6}, {6, 5, 5, 5}, {6, 5, 5, 6}, {6, 5, 6, 5}, {6, 5, 6, 6}, {6, 6, 5, 5}, {6, 6, 5, 6}, {6, 6, 6, 5}, {6, 6, 6, 6}}
We need to figure out how many elements are in {L=k} for each k, a quantity we write as #{L=k}. We also need the expected value of score over each set {L=k}. It would be simple to let the computer do the work, but not very revealing.
The key turns out to be the sets {L >= k}. These sets are not disjoint, but are easier to deal with initially. From their sizes and expected values we can derive those of the {L=k}.
First we compute the sizes #{L>=k}. Recall {L=5}. In particular, it looks like the sample space resulting from throwing four 2-sided dice with face values 5 and 6. In general, {L>=k} looks like the sample space of 4 dice with 7-k sides numbered k through 6. A bit of thought shows that #{L>=k} = (7-k)^4.
Next consider the expected value of the sum of the four dice in element of {L>=k}. It is four times the expected value of a single such die. This in turn is the average of the numbers k through 6, which is (k+6)/2. Thus the expected value of the sum of the four dice over {L>=k} is 2(k+6). Let us write this expected value as ES{L>=k}. Note that ES is the expected value of the sum of the dice, not the score of the dice. Still, it will be useful later. Armed with these results, we turn to the sets {L=k}.
First let’s figure out the sizes #{L=k}. The key is that {L=6}={L>=6}, and for k in 1…5,
#{L=k} = #{L>=k} - #{L>=k+1}. This is because {L=k} = {L>=k} - {L>=k+1}. where the minus sign denote a set difference: A-B is the set of elements of A not in B. In particular:
#{L=6} = 1
#{L=5} = #{L>=5} - #{L>=6} = (7-5)^4 - (7-6)^4 = 15
#{L=4} = #{L>=4} - #{L>=5} = (7-4)^4 - (7-5)^4 = 65
#{L=3} = #{L>=3} - #{L>=4} = (7-3)^4 - (7-4)^4 = 175
#{L=2} = #{L>=2} - #{L>=3} = (7-2)^4 - (7-3)^4 = 369
#{L=1} = #{L>=1} - #{L>=2} = (7-1)^4 - (7-2)^4 = 671
As a sanity check, we see that 671+369+175+65+15+1 = 1296 = 6^4.
We are almost there. Here is the key:
( ES{L>=k} * #{L>=k} - ES{L>=k+1} * #{L>=k+1} ) / (#{L>=k} - #{L>=k+1}) = ES{L=k}
Let’s unpack this, taking it from the top. {L>=k} is a subset of outcomes. In particular the set of rolls whose least element is k. #{L>=k} is the number of elements in {L>=k}. ES{L>=k} is the average of the sums of the elements in {L>=k}.
Recall that mean@X is sum@X divided by #X. Then mean@X * #X = sum@X. Using that general result, the product ES{L>=k} * #{L>=k} is seen to be the total of the sums of the elements in {L>=k}, and ES{L>=k+1} * #{L>=k+1} is the total of the sums of the elements in {L>=k+1}. Since {L=k} = {L>=k} - {L>=k+1}, we have computed the total of the sums of the rolls in {L=k}. Dividing by the number of elements in {L=k} gives us ES{L=k}
ES{L=k} is not what we really want. It is the expected value when all four dice are summed, we want the expected value of the three dice remaining after the least is thrown out, what I have been calling the expected value of the score. But all the elements of {L=k} have the same least value, k. The expected value of the score is simply ES{L=k}-k.
The final step is to put the pieces together. We simply weight the expected value of the score over {L=k} by the probability of {L=k} and sum the products over k. The probabilities are easy: P{L=k} = #{L=k}/6^4.
Now to go full circle, we can express the formula above as a program, evaluate it, and see whether it works.
The expected sum of four dice when L=k, aka ES{L=k}
ESum[k_] := 2*(k + 6)
The number of elements in the set {L>=k}, aka #{L>=k}
count[k_] := (7 - k)^4
The expected score over the set {L=k}
EScore[k_] :=
(ESum[k]*count[k] - ESum[k+1]*count[k+1])/(count[k] - count[k+1]) - k
expectations = Table[EScore@k, {k, 1, 6}]
probs = Table[count[k] - count[k + 1], {k, 1, 6}]/6^4
Mathematica use the period symbol to denote the inner (dot) product of two vectors:
probs . expectations // N
yields the value 12.2446, as expected.
The code is simple enough, but it suffers from magic numbers. We can abstract them out. Let maxpoint denote the highest possible point on a single die. It has been 6 so far, but we might want to allow for dice with more or fewer faces. We have been tossing four dice at a time; that could change, so we introduce ndice. The new code:
maxpoint = 6
ndice = 4
ESum[k_] := ndice * (k + maxpoint) / 2
count[k_] := (maxpoint + 1 - k)^ndice
EScore[k_] := (ESum[k]*count[k] - ESum[k + 1]*count[k + 1])/(count[k] - count[k + 1]) - k
expectations = Table[EScore@k, {k, 1, maxpoint}]
probs = Table[count[k] - count[k + 1], {k, 1, maxpoint}]/maxpoint^ndice
In trials not included here, I was able to vary the values of maxpoint and ndice, and the results agreed with the outcomes of (suitably modified) simulations. Changing the number of dice removed is left as an exercise for the reader.
Posted on July 29th, 2006 by pwyll
Filed under: General
Entries RSS