The shortcut method can be justified formally, but there are some additional details that need to be checked for it to work. To see why, consider an alternate ruleset for the game: imagine that
- You begin with flipping all the coins, as before.
- However, in the second phase, you're only allowed to select an even number of coins to flip again.
Then the shortcut reasoning still seems to say "after the first phase, we have $4$ heads on average, and we can reflip the other $4$ coins and get $2$ more heads on average". However, the expected value in the alternate version of the game is not $6$. It is smaller, because whenever the number of tails after the first phase is odd, there is a coin left over that you don't get to flip again.
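To make the gap concrete, here is a small exact computation for the alternate ruleset. It is only a sketch: the function name `expected_heads_even_rule` is made up for illustration, and it assumes the strategy of reflipping the largest even number of tails available (so one tail is left alone whenever the number of tails is odd).

```python
from fractions import Fraction
from math import comb

N = 8  # number of coins, as in the original game


def expected_heads_even_rule(n=N):
    """Exact expected number of final heads under the even-number rule.

    Assumption (not from the original text): we reflip the largest even
    number of tails available, leaving one tail alone when the count is odd.
    """
    total = Fraction(0)
    for y in range(n + 1):
        p_y = Fraction(comb(n, y), 2**n)   # P(Y = y) for n fair coins
        tails = n - y
        reflipped = tails - (tails % 2)    # largest even number <= tails
        # Each reflipped coin contributes 1/2 a head on average.
        total += p_y * (y + Fraction(reflipped, 2))
    return total


print(expected_heads_even_rule())  # 23/4
```

Under that assumption the expected value comes out to $\frac{23}{4} = 5.75$ rather than $6$: the number of tails is odd half the time, and each leftover tail costs half a head on average.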
To justify a shortcut like this formally, and see what needs to happen for it to work, we use the law of total expectation.
The law of total expectation says $\mathbb E[\mathbb E[X \mid Y]] = \mathbb E[X]$, but the first time I saw that, it made no sense to me, so I'll try to explain it in a bit more detail.
In this case, let $Y$ be the number of heads in the first part of the game, and let $X$ be the number of heads at the end. Then the key is to work out, for every possible value of $Y$: what is the expected value of $X$ given that value of $Y$?
If there are $Y$ coins that land heads in the first phase, then there are $8-Y$ coins that we can flip again. Of those coins, on average, $\frac{8-Y}{2}$ will land heads. Together with the $Y$ coins that were already heads, that's a total of $\frac{8-Y}{2} + Y = \frac{8+Y}{2}$. This calculation - "when there are $Y$ heads in the first phase, on average there are $\frac{8+Y}{2}$ heads at the end" - is summarized as the equation "$\mathbb E[X \mid Y] = \frac{8+Y}{2}$".
The idea behind the law of total expectation is: to figure out the average value of $X$ (whose distribution is a complicated thing we don't understand yet), we can instead figure out the average value of $\frac{8+Y}{2}$. Here, we have $$\mathbb E\left[\frac{8+Y}{2}\right] = 4 + \mathbb E\left[\frac{Y}{2}\right] = 4 + \frac{\mathbb E[Y]}{2} = 4 + \frac{4}{2} = 6.$$ So the arithmetic is almost as simple as in the shortcut calculation.
However, an important ingredient is computing the formula $\mathbb E[X \mid Y] = \frac{8+Y}{2}$. This required us to figure out the average value of $X$ for every possible value of $Y$, not just for the average value of $Y$. Only once we have the formula $\frac{8+Y}{2}$ can we start looking at the average value of $Y$.
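The two steps can also be checked by brute force. The sketch below (with made-up helper names `binom_pmf` and `cond_expectation`) first computes $\mathbb E[X \mid Y = y]$ directly from the second-phase flips for each $y$, confirms it agrees with $\frac{8+y}{2}$, and only then averages over the distribution of $Y$.

```python
from fractions import Fraction
from math import comb

N = 8  # number of coins


def binom_pmf(n, k):
    """P(exactly k heads among n fair coin flips), as an exact fraction."""
    return Fraction(comb(n, k), 2**n)


def cond_expectation(y, n=N):
    """E[X | Y = y]: the y heads are kept, the other n - y coins are reflipped."""
    tails = n - y
    return sum(binom_pmf(tails, k) * (y + k) for k in range(tails + 1))


# Step 1: the conditional expectation matches (8 + y) / 2 for every value of y.
for y in range(N + 1):
    assert cond_expectation(y) == Fraction(N + y, 2)

# Step 2: law of total expectation, averaging E[X | Y = y] over the values of Y.
expected_x = sum(binom_pmf(N, y) * cond_expectation(y) for y in range(N + 1))
print(expected_x)  # 6
```

Exact fractions are used instead of floats so that the comparison in step 1 is an equality check rather than an approximate one.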
There is actually another approach that lets us calculate the final distribution, not just its average. Consider what happens from the perspective of a single coin:
- It gets flipped in the first phase.
- If it lands tails, it gets flipped again in the second phase.
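The coin ends up tails only if it lands tails both times, which happens with probability $\frac12 \cdot \frac12 = \frac14$. As a result, the probability of the coin being heads at the end is $\frac34$, and the probability of it being tails is $\frac14$. Since the coins behave independently of each other, the distribution of the number of heads at the end is simply $\text{Binomial}(n=8, p = \frac34)$.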
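As a sanity check, the following sketch builds the distribution of the final number of heads phase by phase and compares it with the $\text{Binomial}(8, \frac34)$ probabilities. The function names `two_phase_pmf` and `binomial_pmf` are illustrative, not from any library.

```python
from fractions import Fraction
from math import comb

N = 8  # number of coins


def two_phase_pmf(n=N):
    """Exact distribution of final heads: flip n coins, then reflip every tail once."""
    half = Fraction(1, 2)
    pmf = [Fraction(0)] * (n + 1)
    for y in range(n + 1):                 # y heads after the first phase
        p_first = comb(n, y) * half**n
        tails = n - y
        for k in range(tails + 1):         # k of the reflipped tails become heads
            p_second = comb(tails, k) * half**tails
            pmf[y + k] += p_first * p_second
    return pmf


def binomial_pmf(n, p):
    """Probabilities P(X = x) for x = 0..n when X ~ Binomial(n, p)."""
    return [comb(n, x) * p**x * (1 - p)**(n - x) for x in range(n + 1)]


direct = two_phase_pmf()
shortcut = binomial_pmf(N, Fraction(3, 4))
print(direct == shortcut)                          # True
print(sum(x * q for x, q in enumerate(shortcut)))  # 6
```

The mean of this distribution is $8 \cdot \frac34 = 6$, in agreement with the law of total expectation calculation.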