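In probability theory, the law of total variance[1] or variance decomposition formula or conditional variance formula or law of iterated variances, also known as Eve's law,[2] states that if $X$ and $Y$ are random variables on the same probability space, and the variance of $Y$ is finite, then

$$\operatorname{Var}(Y) = \operatorname{E}[\operatorname{Var}(Y \mid X)] + \operatorname{Var}(\operatorname{E}[Y \mid X]).$$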
In language perhaps better known to statisticians than to probability theorists, the two terms are the "unexplained" and the "explained" components of the variance respectively (cf. fraction of variance unexplained, explained variation). In actuarial science, specifically credibility theory, the first component is called the expected value of the process variance (EVPV) and the second is called the variance of the hypothetical means (VHM).[3] These two components are also the source of the term "Eve's law", from the initials EV VE for "expectation of variance" and "variance of expectation".
To understand the formula above, we need to comprehend the random variables $\operatorname{E}[Y \mid X]$ and $\operatorname{Var}(Y \mid X)$. These variables depend on the value of $X$: for a given $x$, $\operatorname{E}[Y \mid X = x]$ and $\operatorname{Var}(Y \mid X = x)$ are constant numbers. Essentially, we use the possible values of $X$ to group the outcomes and then compute the expected values and variances for each group.
The "unexplained" component $\operatorname{E}[\operatorname{Var}(Y \mid X)]$ is simply the average of all the variances of $Y$ within each group.
The "explained" component $\operatorname{Var}(\operatorname{E}[Y \mid X])$ is the variance of the expected values, i.e., it represents the part of the variance that is explained by the variation of the average value of $Y$ for each group.
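For a discrete grouping variable, the two components can be written out as sums over the groups. The following is a sketch under that assumption, writing $Y$ for the variable of interest and $X$ for the grouping variable; the shorthand $p_x = \Pr(X = x)$ is introduced here for brevity and is not part of the original notation.

```latex
\begin{align}
\operatorname{E}[\operatorname{Var}(Y \mid X)]
  &= \sum_x p_x \operatorname{Var}(Y \mid X = x), \\
\operatorname{Var}(\operatorname{E}[Y \mid X])
  &= \sum_x p_x \bigl(\operatorname{E}[Y \mid X = x] - \operatorname{E}[Y]\bigr)^2,
  \qquad \operatorname{E}[Y] = \sum_x p_x \operatorname{E}[Y \mid X = x].
\end{align}
```

The first sum weights each within-group variance by the probability of the group; the second measures how far the group means spread around the overall mean.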
Weight of dogs by breed
For an illustration, consider the example of a dog show (a selected excerpt of Analysis of variance § Example). Let the random variable $Y$ correspond to the dog weight and $X$ correspond to the breed. In this situation, it is reasonable to expect that the breed explains a major portion of the variance in weight since there is a big variance in the breeds' average weights. Of course, there is still some variance in weight for each breed, which is taken into account in the "unexplained" term.
Note that the "explained" term actually means "explained by the averages." If variances for each fixed $X$ (e.g., for each breed in the example above) are very distinct, those variances are still combined in the "unexplained" term.
Five graduate students take an exam that is graded from 0 to 100. Let $Y$ denote the student's grade and $X$ indicate whether the student is international or domestic. The data is summarized as follows:
| Student | Grade $Y$ | Status $X$ |
|---|---|---|
| 1 | 20 | International |
| 2 | 30 | International |
| 3 | 100 | International |
| 4 | 40 | Domestic |
| 5 | 60 | Domestic |
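Among international students, the mean is $\operatorname{E}[Y \mid X = \text{International}] = 50$ and the variance is $\operatorname{Var}(Y \mid X = \text{International}) = 3800/3 = 1266.\overline{6}$.
Among domestic students, the mean is $\operatorname{E}[Y \mid X = \text{Domestic}] = 50$ and the variance is $\operatorname{Var}(Y \mid X = \text{Domestic}) = 100$.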
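| $X$ | $\Pr(X)$ | $\operatorname{E}[Y \mid X]$ | $\operatorname{Var}(Y \mid X)$ |
|---|---|---|---|
| International | 3/5 | 50 | $1266.\overline{6}$ |
| Domestic | 2/5 | 50 | 100 |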
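The part of the variance of $Y$ "unexplained" by $X$ is the mean of the variances for each group. In this case, it is $(3/5)(1266.\overline{6}) + (2/5)(100) = 800$. The part of the variance of $Y$ "explained" by $X$ is the variance of the means of $Y$ inside each group defined by the values of $X$. In this case, it is zero, since the mean is the same for each group. So the total variation is

$$\operatorname{Var}(Y) = \operatorname{E}[\operatorname{Var}(Y \mid X)] + \operatorname{Var}(\operatorname{E}[Y \mid X]) = 800 + 0 = 800.$$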
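The arithmetic of the exam example can be checked with a short script. This is a minimal sketch using only the Python standard library; the variable names are chosen here for illustration, and variances are population variances (dividing by the group size), which is what the figures in the example use.

```python
from fractions import Fraction
from statistics import mean, pvariance

# Grades from the example, keyed by group; Fractions keep the arithmetic exact.
grades = {
    "International": [Fraction(20), Fraction(30), Fraction(100)],
    "Domestic": [Fraction(40), Fraction(60)],
}

n = sum(len(ys) for ys in grades.values())
prob = {g: Fraction(len(ys), n) for g, ys in grades.items()}   # P(X = g)
cond_mean = {g: mean(ys) for g, ys in grades.items()}          # E[Y | X = g]
cond_var = {g: pvariance(ys) for g, ys in grades.items()}      # Var(Y | X = g)

# "Unexplained" component: probability-weighted average of the group variances.
unexplained = sum(prob[g] * cond_var[g] for g in grades)

# "Explained" component: probability-weighted variance of the group means.
overall_mean = sum(prob[g] * cond_mean[g] for g in grades)
explained = sum(prob[g] * (cond_mean[g] - overall_mean) ** 2 for g in grades)

# Total variance computed directly from all five grades.
total = pvariance([y for ys in grades.values() for y in ys])

print(unexplained, explained, total)
```

The three values come out as 800, 0 and 800, so the directly computed variance agrees with the sum of the two components.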
Suppose $X$ is a coin flip with the probability of heads being $h$. Suppose that when $X$ = heads then $Y$ is drawn from a normal distribution with mean $\mu_h$ and standard deviation $\sigma_h$, and that when $X$ = tails then $Y$ is drawn from a normal distribution with mean $\mu_t$ and standard deviation $\sigma_t$. Then the first, "unexplained" term on the right-hand side of the above formula is the weighted average of the variances, $h\sigma_h^2 + (1 - h)\sigma_t^2$, and the second, "explained" term is the variance of the distribution that gives $\mu_h$ with probability $h$ and gives $\mu_t$ with probability $1 - h$.
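The coin-flip mixture can be turned into a small calculation. The sketch below is illustrative only: the function name `mixture_variance` and its parameter names are made up here, and the parameter values in the usage line are arbitrary.

```python
def mixture_variance(h, mu_h, sigma_h, mu_t, sigma_t):
    """Return (unexplained, explained, total) variance of Y for the coin-flip mixture."""
    # E[Var(Y | X)]: weighted average of the two conditional variances.
    unexplained = h * sigma_h**2 + (1 - h) * sigma_t**2
    # Var(E[Y | X]): variance of the two-point distribution on {mu_h, mu_t}.
    mean = h * mu_h + (1 - h) * mu_t
    explained = h * (mu_h - mean) ** 2 + (1 - h) * (mu_t - mean) ** 2
    return unexplained, explained, unexplained + explained


# Arbitrary illustrative parameters: a fair coin, unit standard deviations, means 0 and 2.
result = mixture_variance(0.5, 0.0, 1.0, 2.0, 1.0)
```

For a two-point distribution the explained term simplifies to $h(1 - h)(\mu_h - \mu_t)^2$, so it vanishes when the two means coincide and is largest, for fixed means, when $h = 1/2$.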
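There is a general variance decomposition formula for $c \geq 2$ components (see below).[4] For example, with two conditioning random variables:

$$\operatorname{Var}(Y) = \operatorname{E}[\operatorname{Var}(Y \mid X_1, X_2)] + \operatorname{E}[\operatorname{Var}(\operatorname{E}[Y \mid X_1, X_2] \mid X_1)] + \operatorname{Var}(\operatorname{E}[Y \mid X_1]),$$

which follows from the law of total conditional variance:[4]

$$\operatorname{Var}(Y \mid X_1) = \operatorname{E}[\operatorname{Var}(Y \mid X_1, X_2) \mid X_1] + \operatorname{Var}(\operatorname{E}[Y \mid X_1, X_2] \mid X_1).$$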
Note that the conditional expected value $\operatorname{E}[Y \mid X]$ is a random variable in its own right, whose value depends on the value of $X$. Notice that the conditional expected value of $Y$ given the event $X = x$ is a function of $x$ (this is where adherence to the conventional and rigidly case-sensitive notation of probability theory becomes important!). If we write $\operatorname{E}[Y \mid X = x] = g(x)$, then the random variable $\operatorname{E}[Y \mid X]$ is just $g(X)$. Similar comments apply to the conditional variance.
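One special case (similar to the law of total expectation) states that if $A_1, \ldots, A_n$ is a partition of the whole outcome space, that is, these events are mutually exclusive and exhaustive, then

$$\operatorname{Var}(Y) = \sum_{i=1}^{n} \operatorname{Var}(Y \mid A_i)\Pr(A_i) + \sum_{i=1}^{n} \operatorname{E}[Y \mid A_i]^2 \,(1 - \Pr(A_i))\Pr(A_i) - 2\sum_{i=2}^{n}\sum_{j=1}^{i-1} \operatorname{E}[Y \mid A_i]\Pr(A_i)\operatorname{E}[Y \mid A_j]\Pr(A_j).$$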
In this formula, the first component is the expectation of the conditional variance; the other two components are the variance of the conditional expectation.
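The three-component partition formula can be checked numerically. The sketch below assumes a small made-up data set: six equally likely outcomes of the variable, split into three events. The data and variable names are hypothetical and serve only to exercise the three sums.

```python
from fractions import Fraction
from statistics import mean, pvariance

# Hypothetical data: six equally likely outcomes, partitioned into A_1, A_2, A_3.
events = [
    [Fraction(1), Fraction(3)],
    [Fraction(10)],
    [Fraction(4), Fraction(6), Fraction(8)],
]

n = sum(len(a) for a in events)
p = [Fraction(len(a), n) for a in events]   # Pr(A_i)
m = [mean(a) for a in events]               # conditional mean given A_i
v = [pvariance(a) for a in events]          # conditional variance given A_i
k = len(events)

# First component: expectation of the conditional variance.
term1 = sum(v[i] * p[i] for i in range(k))
# Second and third components: together, the variance of the conditional expectation.
term2 = sum(m[i] ** 2 * (1 - p[i]) * p[i] for i in range(k))
term3 = -2 * sum(m[i] * p[i] * m[j] * p[j] for i in range(1, k) for j in range(i))

# Variance computed directly from all outcomes, for comparison.
total = pvariance([x for a in events for x in a])
```

With this data the three terms are 5/3, 214/9 and -146/9, which sum to 83/9, the same value obtained for `total`.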
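The law of total variance can be proved using the law of total expectation.[5] First,

$$\operatorname{Var}(Y) = \operatorname{E}[Y^2] - \operatorname{E}[Y]^2$$

from the definition of variance. Applying the law of total expectation to $\operatorname{E}[Y^2]$, and then the definition of variance to the conditional second moment of $Y$, we have

$$\operatorname{E}[Y^2] = \operatorname{E}[\operatorname{E}[Y^2 \mid X]] = \operatorname{E}[\operatorname{Var}(Y \mid X) + \operatorname{E}[Y \mid X]^2].$$

Now we substitute this expression for the second moment of $Y$ and apply the law of total expectation to $\operatorname{E}[Y]$ on the right-hand side:

$$\operatorname{E}[Y^2] - \operatorname{E}[Y]^2 = \operatorname{E}[\operatorname{Var}(Y \mid X) + \operatorname{E}[Y \mid X]^2] - \operatorname{E}[\operatorname{E}[Y \mid X]]^2.$$

Since the expectation of a sum is the sum of expectations, the terms can now be regrouped:

$$= \left(\operatorname{E}[\operatorname{Var}(Y \mid X)]\right) + \left(\operatorname{E}[\operatorname{E}[Y \mid X]^2] - \operatorname{E}[\operatorname{E}[Y \mid X]]^2\right).$$

Finally, we recognize the terms in the second set of parentheses as the variance of the conditional expectation $\operatorname{E}[Y \mid X]$:

$$= \operatorname{E}[\operatorname{Var}(Y \mid X)] + \operatorname{Var}(\operatorname{E}[Y \mid X]).$$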
General variance decomposition applicable to dynamic systems
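The following formula shows how to apply the general, measure theoretic variance decomposition formula[4] to stochastic dynamic systems. Let $Y(t)$ be the value of a system variable at time $t$. Suppose we have the internal histories (natural filtrations) $H_{1t}, H_{2t}, \ldots, H_{c-1,t}$, each one corresponding to the history (trajectory) of a different collection of system variables. The collections need not be disjoint. The variance of $Y(t)$ can be decomposed, for all times $t$, into $c \geq 2$ components as follows:

$$\operatorname{Var}(Y(t)) = \operatorname{E}[\operatorname{Var}(Y(t) \mid H_{1t}, H_{2t}, \ldots, H_{c-1,t})] + \sum_{j=2}^{c-1} \operatorname{E}[\operatorname{Var}(\operatorname{E}[Y(t) \mid H_{1t}, H_{2t}, \ldots, H_{jt}] \mid H_{1t}, H_{2t}, \ldots, H_{j-1,t})] + \operatorname{Var}(\operatorname{E}[Y(t) \mid H_{1t}]).$$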
The decomposition is not unique. It depends on the order of the conditioning in the sequential decomposition.
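The dependence on the conditioning order can be seen in a toy setting. The sketch below is an assumption-laden illustration, not the dynamic-system formula itself: two static conditioning variables stand in for the histories, the four equally likely outcomes are made up, and the function names are hypothetical. It uses the fact that each component of the sequential decomposition equals the mean squared difference between consecutive nested conditional means (the overall mean, the mean given the first variable, the mean given both, and finally the variable itself).

```python
from collections import defaultdict
from fractions import Fraction


def conditional_means(outcomes, keys):
    """Conditional mean of y given the coordinates of x listed in keys, per outcome."""
    groups = defaultdict(list)
    for x, y in outcomes:
        groups[tuple(x[k] for k in keys)].append(Fraction(y))
    means = {g: sum(ys) / len(ys) for g, ys in groups.items()}
    return [means[tuple(x[k] for k in keys)] for x, _ in outcomes]


def sequential_components(outcomes, order):
    """Components of the variance of y when conditioning in the given order.

    Returned from the outermost term (variance of the mean given the first
    variable) to the innermost (expected variance given all variables).
    """
    n = len(outcomes)
    # Nested conditional means: no conditioning, first variable, first two, ...
    levels = [conditional_means(outcomes, order[:k]) for k in range(len(order) + 1)]
    # The last level is y itself.
    levels.append([Fraction(y) for _, y in outcomes])
    # Each component is the mean squared increment between consecutive levels.
    return [
        sum((b - a) ** 2 for a, b in zip(lo, hi)) / n
        for lo, hi in zip(levels, levels[1:])
    ]


# Hypothetical equally likely outcomes ((x1, x2), y).
outcomes = [((0, 0), 0), ((0, 0), 2), ((0, 1), 4), ((1, 1), 10)]

first_then_second = sequential_components(outcomes, (0, 1))
second_then_first = sequential_components(outcomes, (1, 0))
```

Conditioning on the first variable and then the second gives components 12, 3/2 and 1/2; reversing the order gives 9, 9/2 and 1/2. Both sets sum to the total variance of 14, but the share attributed to each step differs.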
The square of the correlation and explained (or informational) variation
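In cases where $(Y, X)$ are such that the conditional expected value is linear; that is, in cases where

$$\operatorname{E}[Y \mid X] = aX + b,$$

it follows from the bilinearity of covariance that

$$a = \frac{\operatorname{Cov}(Y, X)}{\operatorname{Var}(X)}$$

and

$$b = \operatorname{E}[Y] - \frac{\operatorname{Cov}(Y, X)}{\operatorname{Var}(X)}\operatorname{E}[X],$$

and the explained component of the variance divided by the total variance is just the square of the correlation between $Y$ and $X$; that is, in such cases,

$$\frac{\operatorname{Var}(\operatorname{E}[Y \mid X])}{\operatorname{Var}(Y)} = \operatorname{Corr}(X, Y)^2.$$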
One example of this situation is when $(X, Y)$ have a bivariate normal (Gaussian) distribution.
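As a worked check of the bivariate normal case, assume the pair $(X, Y)$ is bivariate normal with means $\mu_X, \mu_Y$, standard deviations $\sigma_X, \sigma_Y$ and correlation $\rho$ (symbols introduced here for illustration). The standard formulas for the conditional distribution of $Y$ given $X$ then give:

```latex
\begin{align}
\operatorname{E}[Y \mid X] &= \mu_Y + \rho\,\frac{\sigma_Y}{\sigma_X}\,(X - \mu_X),
  & \operatorname{Var}(Y \mid X) &= (1 - \rho^2)\,\sigma_Y^2, \\
\operatorname{Var}(\operatorname{E}[Y \mid X]) &= \rho^2\,\frac{\sigma_Y^2}{\sigma_X^2}\,\operatorname{Var}(X) = \rho^2 \sigma_Y^2,
  & \frac{\operatorname{Var}(\operatorname{E}[Y \mid X])}{\operatorname{Var}(Y)} &= \rho^2 .
\end{align}
```

The explained part $\rho^2\sigma_Y^2$ and the unexplained part $(1 - \rho^2)\sigma_Y^2$ add up to $\sigma_Y^2$, as the law of total variance requires.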
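More generally, when the conditional expectation $\operatorname{E}[Y \mid X]$ is a non-linear function of $X$,[4]

$$\iota_{Y \mid X} = \frac{\operatorname{Var}(\operatorname{E}[Y \mid X])}{\operatorname{Var}(Y)} = \operatorname{Corr}(\operatorname{E}[Y \mid X], Y)^2,$$

which can be estimated as the $R$ squared from a non-linear regression of $Y$ on $X$, using data drawn from the joint distribution of $(X, Y)$. When $\operatorname{E}[Y \mid X]$ has a Gaussian distribution (and is an invertible function of $X$), or $Y$ itself has a (marginal) Gaussian distribution, this explained component of variation sets a lower bound on the mutual information:[4]

$$\operatorname{I}(Y; X) \geq \ln\left([1 - \iota_{Y \mid X}]^{-1/2}\right).$$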
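To illustrate the non-linear case, the sketch below uses a made-up joint distribution in which the conditional mean of the second variable is the square of the first. All names and data are hypothetical. It computes the explained fraction of variance exactly and compares it with two squared correlations: one against the conditional mean and one against the conditioning variable itself.

```python
from fractions import Fraction

# Hypothetical joint distribution: six equally likely (x, y) outcomes whose
# conditional mean of y given x equals x**2, a non-linear function of x.
outcomes = [(-1, 0), (-1, 2), (0, -1), (0, 1), (1, 0), (1, 2)]
n = len(outcomes)


def expect(values):
    """Expectation over the equally likely outcomes."""
    return sum(Fraction(v) for v in values) / n


def cov(a, b):
    """Covariance of two variables given as per-outcome value lists."""
    return expect(p * q for p, q in zip(a, b)) - expect(a) * expect(b)


xs = [x for x, _ in outcomes]
ys = [y for _, y in outcomes]

# Conditional mean of y given x, evaluated at each outcome.
group = {x: [y for x2, y in outcomes if x2 == x] for x in set(xs)}
m = [Fraction(sum(group[x]), len(group[x])) for x in xs]

explained_fraction = cov(m, m) / cov(ys, ys)                   # Var(E[Y|X]) / Var(Y)
corr_m_y_sq = cov(m, ys) ** 2 / (cov(m, m) * cov(ys, ys))      # Corr(E[Y|X], Y)^2
corr_x_y_sq = cov(xs, ys) ** 2 / (cov(xs, xs) * cov(ys, ys))   # Corr(X, Y)^2
```

Here the explained fraction is 2/11 and equals the squared correlation with the conditional mean, while the squared correlation with the conditioning variable is 0. The linear correlation therefore misses dependence that the explained fraction captures.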
Law of propagation of errors – Effect of variables' uncertainties on the uncertainty of a function based on them
Bowsher, C.G. and P.S. Swain, "Identifying sources of variation and the flow of information in biochemical networks", PNAS, May 15, 2012, 109 (20) E1320–E1328.
Neil A. Weiss, A Course in Probability, Addison–Wesley, 2005, pages 380–383.