Marginal distribution

In probability theory and statistics, the marginal distribution of a subset of a collection of random variables is the probability distribution of the variables contained in the subset. It gives the probabilities of various values of the variables in the subset without reference to the values of the other variables. This contrasts with a conditional distribution, which gives the probabilities contingent upon the values of the other variables.

Marginal variables are those variables in the subset of variables being retained. These concepts are "marginal" because they can be found by summing values in a table along rows or columns, and writing the sum in the margins of the table.^[1] The distribution of the marginal variables (the marginal distribution) is obtained by marginalizing (that is, focusing on the sums in the margin) over the distribution of the variables being discarded, and the discarded variables are said to have been marginalized out.

The context here is that the theoretical studies being undertaken, or the data analysis being done, involve a wider set of random variables but that attention is being limited to a reduced number of those variables. In many applications, an analysis may start with a given collection of random variables, then first extend the set by defining new ones (such as the sum of the original random variables) and finally reduce the number by placing interest in the marginal distribution of a subset (such as the sum). Several different analyses may be done, each treating a different subset of variables as the marginal variables.

Definition

Marginal probability mass function

Given a known joint distribution of two discrete random variables, say, $X$ and $Y$, the marginal distribution of either variable – $X$ for example – is the probability distribution of $X$ when the values of $Y$ are not taken into consideration. This can be calculated by summing the joint probability distribution over all values of $Y$. Naturally, the converse is also true: the marginal distribution can be obtained for $Y$ by summing over the separate values of $X$.

$p_{X}(x_{i})=\sum _{j}p(x_{i},y_{j})$, and

$p_{Y}(y_{j})=\sum _{i}p(x_{i},y_{j})$

Joint and marginal distributions of a pair of discrete random variables, X and Y, that are dependent and thus have nonzero mutual information $I(X; Y)$. The values of the joint distribution are in the 3×4 rectangle; the values of the marginal distributions are along the right and bottom margins.
X Y	x₁	x₂	x₃	x₄	p_Y(y) ↓
y₁	⁠4/32⁠	⁠2/32⁠	⁠1/32⁠	⁠1/32⁠	⁠8/32⁠
y₂	⁠3/32⁠	⁠6/32⁠	⁠3/32⁠	⁠3/32⁠	⁠15/32⁠
y₃	⁠9/32⁠	0	0	0	⁠9/32⁠
p_X(x) →	⁠16/32⁠	⁠8/32⁠	⁠4/32⁠	⁠4/32⁠	⁠32/32⁠
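
The row and column sums in the table above can be reproduced with a short script. The following is a minimal sketch in Python using exact fractions; the variable names (`joint_counts`, `p_x`, `p_y`) are illustrative and not part of any standard notation.

```python
from fractions import Fraction

# Joint pmf p(x_i, y_j) from the table above, in units of 1/32.
# Rows are y1..y3, columns are x1..x4.
joint_counts = [
    [4, 2, 1, 1],
    [3, 6, 3, 3],
    [9, 0, 0, 0],
]
joint = [[Fraction(c, 32) for c in row] for row in joint_counts]

# Marginal of Y: sum each row over all values of X.
p_y = [sum(row) for row in joint]

# Marginal of X: sum each column over all values of Y.
p_x = [sum(col) for col in zip(*joint)]

print("p_X:", [str(p) for p in p_x])
print("p_Y:", [str(p) for p in p_y])
```

Because `Fraction` reduces automatically, the printed values appear in lowest terms (for example 1/2 rather than 16/32).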

A marginal probability can always be written as an expected value: $p_{X}(x)=\int _{y}p_{X\mid Y}(x\mid y)\,p_{Y}(y)\,\mathrm {d} y=\operatorname {E} _{Y}[p_{X\mid Y}(x\mid y)]\;.$

Intuitively, the marginal probability of X is computed by examining the conditional probability of X given a particular value of Y, and then averaging this conditional probability over the distribution of all values of Y.

This follows from the definition of expected value (after applying the law of the unconscious statistician): $\operatorname {E} _{Y}[f(Y)]=\int _{y}f(y)p_{Y}(y)\,\mathrm {d} y.$

Therefore, marginalization provides the rule for transforming the probability distribution of a random variable Y into that of another random variable $X = g(Y)$: $p_{X}(x)=\int _{y}p_{X\mid Y}(x\mid y)\,p_{Y}(y)\,\mathrm {d} y=\int _{y}\delta {\big (}x-g(y){\big )}\,p_{Y}(y)\,\mathrm {d} y.$
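
For a discrete Y the delta function reduces to an indicator, so the rule amounts to adding up the probability of every y that maps to the same x. The sketch below illustrates this under assumed inputs: the helper `pushforward`, the uniform distribution on {-2, ..., 2}, and the map g(y) = y² are all hypothetical choices made for the example.

```python
from collections import defaultdict
from fractions import Fraction

def pushforward(p_y, g):
    """Return the pmf of X = g(Y), given the pmf of Y as a dict {y: probability}."""
    p_x = defaultdict(Fraction)
    for y, prob in p_y.items():
        # Each y contributes its whole mass to x = g(y): the discrete
        # counterpart of the delta function delta(x - g(y)).
        p_x[g(y)] += prob
    return dict(p_x)

# Hypothetical example: Y uniform on {-2, -1, 0, 1, 2} and g(y) = y**2.
p_y = {y: Fraction(1, 5) for y in range(-2, 3)}
p_x = pushforward(p_y, lambda y: y * y)

for x in sorted(p_x):
    print(x, p_x[x])
```

Here the values y = -1 and y = 1 both map to x = 1, so their probabilities are pooled, and likewise for x = 4.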

Marginal probability density function

Given two continuous random variables X and Y whose joint distribution is known, the marginal probability density function of X can be obtained by integrating the joint probability density, $f$, over Y, and vice versa. That is,

$f_{X}(x)=\int _{c}^{d}f(x,y)\,dy$

$f_{Y}(y)=\int _{a}^{b}f(x,y)\,dx$

where $x\in [a,b]$ and $y\in [c,d]$.
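
When the integral has no convenient closed form, it can be approximated numerically. The following sketch assumes an illustrative density f(x, y) = x + y on the unit square (not taken from the text above) and integrates out y with the midpoint rule; the function names and the step count are arbitrary choices.

```python
def joint_density(x, y):
    """Assumed example density f(x, y) = x + y on the unit square [0, 1] x [0, 1]."""
    return x + y

def marginal_x(x, c=0.0, d=1.0, steps=1000):
    """Approximate f_X(x), the integral of f(x, y) over y in [c, d], by the midpoint rule."""
    h = (d - c) / steps
    return h * sum(joint_density(x, c + (k + 0.5) * h) for k in range(steps))

# For this density the exact marginal is f_X(x) = x + 1/2, printed alongside for comparison.
for x in (0.0, 0.3, 1.0):
    print(x, marginal_x(x), x + 0.5)
```

The midpoint rule is used only for simplicity; any quadrature method would serve the same purpose.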

Marginal cumulative distribution function

Finding the marginal cumulative distribution function from the joint cumulative distribution function is easy. Recall that:

For discrete random variables, $F(x,y)=P(X\leq x,Y\leq y)$
For continuous random variables, $F(x,y)=\int _{a}^{x}\int _{c}^{y}f(x',y')\,dy'dx'$

If X and Y jointly take values on [a,b] × [c,d] then

$F_{X}(x)=F(x,d)$

and

$F_{Y}(y)=F(b,y)$

If d is ∞, then this becomes a limit $F_{X}(x)=\lim _{y\to \infty }F(x,y)$. Likewise for $F_{Y}(y)$.
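
As a worked illustration, assume the joint density $f(x,y)=x+y$ on $[0,1]\times [0,1]$ (an example chosen here, not one given above). The joint and marginal cumulative distribution functions on that square are then:

```latex
F(x,y) = \int_{0}^{x}\int_{0}^{y} (x'+y')\,dy'\,dx' = \frac{xy(x+y)}{2},
\qquad
F_{X}(x) = F(x,1) = \frac{x(x+1)}{2},
\qquad
F_{Y}(y) = F(1,y) = \frac{y(y+1)}{2}.
```

Differentiating $F_{X}$ gives $x+\tfrac{1}{2}$, which agrees with integrating the assumed density over $y$ directly.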

Marginal distribution vs. conditional distribution

Definition

The marginal probability is the probability of a single event occurring, without regard to other events. A conditional probability, on the other hand, is the probability that an event occurs given that another specific event has already occurred. This means that the calculation for one variable is dependent on another variable.^[2]

The conditional distribution of a variable given another variable is the joint distribution of both variables divided by the marginal distribution of the other variable.^[3] That is,

For discrete random variables, $p_{Y|X}(y|x)=P(Y=y\mid X=x)={\frac {P(X=x,Y=y)}{p_{X}(x)}}$
For continuous random variables, $f_{Y|X}(y|x)={\frac {f_{X,Y}(x,y)}{f_{X}(x)}}$

Example

Suppose there is data from a classroom of 200 students on the amount of time studied (X) and the percentage of correct answers (Y).^[4] Assuming that X and Y are discrete random variables, the joint distribution of X and Y can be described by listing all the possible values of $p(x_{i},y_{j})$, as shown in Table 3.

Two-way table of a dataset relating the amount of time studied to the percentage of correct answers in a classroom of 200 students
X Y	Time studied (minutes)
% correct		x₁(0-20)	x₂(21-40)	x₃(41-60)	x₄(>60)	p_Y(y) ↓
	y₁(0-20)	⁠2/200⁠	0	0	⁠8/200⁠	⁠10/200⁠
	y₂(21-40)	⁠10/200⁠	⁠2/200⁠	⁠8/200⁠	0	⁠20/200⁠
	y₃(41-59)	⁠2/200⁠	⁠4/200⁠	⁠32/200⁠	⁠32/200⁠	⁠70/200⁠
	y₄(60-79)	0	⁠20/200⁠	⁠30/200⁠	⁠10/200⁠	⁠60/200⁠
	y₅(80-100)	0	⁠4/200⁠	⁠16/200⁠	⁠20/200⁠	⁠40/200⁠
	p_X(x) →	⁠14/200⁠	⁠30/200⁠	⁠86/200⁠	⁠70/200⁠	1

The marginal distribution can be used to determine how many students scored 20 or below: $p_{Y}(y_{1})=P(Y=y_{1})=\sum _{i=1}^{4}p(x_{i},y_{1})={\frac {2}{200}}+0+0+{\frac {8}{200}}={\frac {10}{200}}$, meaning 10 students or 5%.

The conditional distribution can be used to determine the probability that a student who studied for more than 60 minutes obtains a score of 20 or below: $p_{Y|X}(y_{1}|x_{4})=P(Y=y_{1}|X=x_{4})={\frac {P(X=x_{4},Y=y_{1})}{P(X=x_{4})}}={\frac {8/200}{70/200}}={\frac {8}{70}}={\frac {4}{35}}$, meaning there is about an 11% probability of scoring 20 or below after having studied for more than 60 minutes.
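
Both calculations can be checked directly from the two-way table. The following is a minimal sketch in Python with exact fractions; the list layout and variable names are assumptions made for this illustration.

```python
from fractions import Fraction

# Student counts from the two-way table.
# Rows are y1..y5 (% correct), columns are x1..x4 (time studied).
counts = [
    [2, 0, 0, 8],
    [10, 2, 8, 0],
    [2, 4, 32, 32],
    [0, 20, 30, 10],
    [0, 4, 16, 20],
]
total = sum(map(sum, counts))  # 200 students
joint = [[Fraction(c, total) for c in row] for row in counts]

# Marginals: sum over columns for p_Y, over rows for p_X.
p_y = [sum(row) for row in joint]
p_x = [sum(col) for col in zip(*joint)]

# Conditional pmf of Y given X = x4 (column index 3):
# the joint column divided by the marginal p_X(x4).
x_index = 3
p_y_given_x4 = [row[x_index] / p_x[x_index] for row in joint]

print("P(Y = y1)          =", p_y[0])
print("P(Y = y1 | X = x4) =", p_y_given_x4[0])
```

Note that the conditional probabilities in `p_y_given_x4` sum to 1, as any probability distribution must.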

Real-world example

Suppose that the probability that a pedestrian will be hit by a car, while crossing the road at a pedestrian crossing, without paying attention to the traffic light, is to be computed. Let H be a discrete random variable taking one value from {Hit, Not Hit}. Let L (for traffic light) be a discrete random variable taking one value from {Red, Yellow, Green}.

Realistically, H will be dependent on L. That is, P(H = Hit) will take different values depending on whether L is red, yellow or green (and likewise for P(H = Not Hit)). A person is, for example, far more likely to be hit by a car when trying to cross while the lights for perpendicular traffic are green than if they are red. In other words, for any given possible pair of values for H and L, one must consider the joint probability distribution of H and L to find the probability of that pair of events occurring together if the pedestrian ignores the state of the light.

However, in trying to calculate the marginal probability P(H = Hit), what is being sought is the probability that H = Hit in the situation in which the particular value of L is unknown and in which the pedestrian ignores the state of the light. In general, a pedestrian can be hit if the lights are red OR if the lights are yellow OR if the lights are green. So, the answer for the marginal probability can be found by summing P(H | L) for all possible values of L, with each value of L weighted by its probability of occurring.

Here is a table showing the conditional probabilities of being hit, depending on the state of the lights. (Note that the columns in this table must add up to 1 because the probability of being hit or not hit is 1 regardless of the state of the light.)

Conditional distribution: $P(H\mid L)$
L H	Red	Yellow	Green
Not Hit	0.99	0.9	0.2
Hit	0.01	0.1	0.8

To find the joint probability distribution, more data is required. For example, suppose P(L = red) = 0.2, P(L = yellow) = 0.1, and P(L = green) = 0.7. Multiplying each column in the conditional distribution by the probability of that column occurring results in the joint probability distribution of H and L, given in the central 2×3 block of entries. (Note that the cells in this 2×3 block add up to 1.)

Joint distribution: $P(H,L)$
L H	Red	Yellow	Green	Marginal probability P(H)
Not Hit	0.198	0.09	0.14	0.428
Hit	0.002	0.01	0.56	0.572
Total	0.2	0.1	0.7	1

The marginal probability P(H = Hit) is the sum 0.572 along the H = Hit row of this joint distribution table, as this is the probability of being hit when the lights are red OR yellow OR green. Similarly, the marginal probability P(H = Not Hit) is the sum 0.428 along the H = Not Hit row.
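
The same weighted sum can be written out in a few lines of code. This is a sketch using the probabilities from the example; the dictionary names are illustrative only.

```python
# Conditional probabilities P(H = Hit | L) and light probabilities P(L) from the example.
p_hit_given_light = {"red": 0.01, "yellow": 0.1, "green": 0.8}
p_light = {"red": 0.2, "yellow": 0.1, "green": 0.7}

# Joint probabilities P(H = Hit, L = light) = P(H = Hit | L = light) * P(L = light).
joint_hit = {light: p_hit_given_light[light] * p_light[light] for light in p_light}

# Marginalize out L by summing the joint probabilities over all light states.
p_hit = sum(joint_hit.values())
p_not_hit = 1.0 - p_hit

print("P(H = Hit)     =", round(p_hit, 3))
print("P(H = Not Hit) =", round(p_not_hit, 3))
```

The values are rounded when printed because floating-point arithmetic can introduce small errors in the last digits.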

Multivariate distributions

Many samples from a bivariate normal distribution. The marginal distributions are shown in red and blue. The marginal distribution of X is also approximated by creating a histogram of the X coordinates without consideration of the Y coordinates.
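
The histogram idea described in the caption can be imitated with simulated data: draw correlated pairs, discard the Y coordinates, and inspect what remains. The sketch below assumes standard normal marginals and a correlation of 0.8, values chosen only for illustration, and summarizes the X coordinates by their sample mean and variance instead of plotting a histogram.

```python
import random

random.seed(0)  # fixed seed so the sketch is reproducible
rho = 0.8       # assumed correlation between X and Y
n = 20_000      # number of simulated pairs

samples = []
for _ in range(n):
    z1, z2 = random.gauss(0.0, 1.0), random.gauss(0.0, 1.0)
    x = z1
    y = rho * z1 + (1.0 - rho**2) ** 0.5 * z2  # Y is correlated with X
    samples.append((x, y))

# Marginalizing out Y empirically: keep the X coordinates and discard Y.
xs = [x for x, _ in samples]
mean_x = sum(xs) / n
var_x = sum((x - mean_x) ** 2 for x in xs) / n

# Under these assumptions the marginal of X is standard normal,
# so the estimates should be close to 0 and 1.
print(mean_x, var_x)
```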

For multivariate distributions, formulae similar to those above apply with the symbols X and/or Y being interpreted as vectors. In particular, each summation or integration would be over all variables except those contained in X.^[5]

That is, if X₁, X₂, …, X_n are discrete random variables, then the marginal probability mass function is $p_{X_{i}}(k)=\sum p(x_{1},x_{2},\dots ,x_{i-1},k,x_{i+1},\dots ,x_{n})$, where the sum runs over all values of every variable other than $x_{i}$; if X₁, X₂, …, X_n are continuous random variables, then the marginal probability density function is $f_{X_{i}}(x_{i})=\int _{-\infty }^{\infty }\cdots \int _{-\infty }^{\infty }f(x_{1},x_{2},\dots ,x_{n})\,dx_{1}\,dx_{2}\cdots dx_{i-1}\,dx_{i+1}\cdots dx_{n}.$
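
In the discrete case, the sum over all discarded variables can be sketched as a single pass over the joint distribution. The helper `marginalize` and the three-coin example below are hypothetical, introduced only to illustrate the formula.

```python
from collections import defaultdict
from fractions import Fraction
from itertools import product

def marginalize(joint, keep):
    """Sum a joint pmf {(x_1, ..., x_n): prob} over every coordinate not listed in `keep`."""
    marginal = defaultdict(Fraction)
    for outcome, prob in joint.items():
        # Outcomes that agree on the kept coordinates are pooled together.
        marginal[tuple(outcome[i] for i in keep)] += prob
    return dict(marginal)

# Hypothetical joint pmf: three independent fair coin flips (0 or 1).
joint = {outcome: Fraction(1, 8) for outcome in product((0, 1), repeat=3)}

p_first = marginalize(joint, keep=[0])            # marginal of X_1
p_first_third = marginalize(joint, keep=[0, 2])   # joint marginal of (X_1, X_3)

print(p_first)
print(p_first_third)
```

The indices in `keep` are zero-based, so `keep=[0, 2]` retains the first and third variables and sums out the second.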

References

[1] Trumpler, Robert J. & Harold F. Weaver (1962). Statistical Astronomy. Dover Publications. pp. 32–33.
[2] "Marginal & Conditional Probability Distributions: Definition & Examples". Study.com. Retrieved 2019-11-16.
[3] "Exam P [FSU Math]". www.math.fsu.edu. Retrieved 2019-11-16.
[4] Marginal and conditional distributions, retrieved 2019-11-16.
[5] A modern introduction to probability and statistics: understanding why and how. Dekking, Michel, 1946-. London: Springer. 2005. ISBN 9781852338961. OCLC 262680588.

Bibliography

Everitt, B. S.; Skrondal, A. (2010). Cambridge Dictionary of Statistics. Cambridge University Press.
Dekking, F. M.; Kraaikamp, C.; Lopuhaä, H. P.; Meester, L. E. (2005). A modern introduction to probability and statistics. London: Springer. ISBN 9781852338961.
