Jensen's inequality
In mathematics, Jensen's inequality, named after the Danish mathematician Johan Jensen, relates the value of a convex function of an integral to the integral of the convex function. It was proved by Jensen in 1906,[1] building on an earlier proof of the same inequality for doubly-differentiable functions by Otto Hölder in 1889.[2] Given its generality, the inequality appears in many forms depending on the context, some of which are presented below. In its simplest form the inequality states that the convex transformation of a mean is less than or equal to the mean applied after the convex transformation; it is a simple corollary that the opposite is true of concave transformations.[3]
Jensen's inequality generalizes the statement that the secant line of a convex function lies above the graph of the function, which is Jensen's inequality for two points: the secant line consists of weighted means of the convex function (for $t \in [0,1]$),

$$t f(x_1) + (1-t) f(x_2),$$

while the graph of the function is the convex function of the weighted means,

$$f(t x_1 + (1-t) x_2).$$

Thus, Jensen's inequality is

$$f(t x_1 + (1-t) x_2) \le t f(x_1) + (1-t) f(x_2).$$
In the context of probability theory, it is generally stated in the following form: if $X$ is a random variable and $\varphi$ is a convex function, then

$$\varphi(\mathbb{E}[X]) \le \mathbb{E}[\varphi(X)].$$

The difference between the two sides of the inequality, $\mathbb{E}[\varphi(X)] - \varphi(\mathbb{E}[X])$, is called the Jensen gap.[4]
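The probabilistic form lends itself to a quick numerical sanity check. The sketch below uses illustrative choices (the convex function $\varphi(x) = x^2$ and a sampled standard normal); for this particular $\varphi$ the Jensen gap is exactly the sample variance, so it is always non-negative:

```python
import random

random.seed(0)

# Sample a random variable X; phi(x) = x**2 is an illustrative convex choice.
xs = [random.gauss(0.0, 1.0) for _ in range(100_000)]
phi = lambda x: x ** 2

mean_x = sum(xs) / len(xs)                       # E[X]
mean_phi = sum(phi(x) for x in xs) / len(xs)     # E[phi(X)]

# Jensen gap: E[phi(X)] - phi(E[X]) >= 0; here it equals the sample variance.
jensen_gap = mean_phi - phi(mean_x)
print(jensen_gap >= 0)  # True
```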
Statements
The classical form of Jensen's inequality involves several numbers and weights. The inequality can be stated quite generally using either the language of measure theory or (equivalently) probability. In the probabilistic setting, the inequality can be further generalized to its full strength.
Finite form
For a real convex function $\varphi$, numbers $x_1, x_2, \ldots, x_n$ in its domain, and positive weights $a_i$, Jensen's inequality can be stated as:

$$\varphi\!\left(\frac{\sum a_i x_i}{\sum a_i}\right) \le \frac{\sum a_i \varphi(x_i)}{\sum a_i} \qquad (1)$$

and the inequality is reversed if $\varphi$ is concave, which is

$$\varphi\!\left(\frac{\sum a_i x_i}{\sum a_i}\right) \ge \frac{\sum a_i \varphi(x_i)}{\sum a_i}. \qquad (2)$$

Equality holds if and only if $x_1 = x_2 = \cdots = x_n$ or $\varphi$ is linear on a domain containing $x_1, x_2, \ldots, x_n$.

As a particular case, if the weights $a_i$ are all equal, then (1) and (2) become

$$\varphi\!\left(\frac{\sum x_i}{n}\right) \le \frac{\sum \varphi(x_i)}{n} \qquad (3)$$

$$\varphi\!\left(\frac{\sum x_i}{n}\right) \ge \frac{\sum \varphi(x_i)}{n} \qquad (4)$$
For instance, the function $\log(x)$ is concave, so substituting $\varphi(x) = \log(x)$ in the previous formula (4) establishes the (logarithm of the) familiar arithmetic-mean/geometric-mean inequality:

$$\log\!\left(\frac{x_1 + x_2 + \cdots + x_n}{n}\right) \ge \frac{\log(x_1) + \log(x_2) + \cdots + \log(x_n)}{n}, \quad \text{i.e.} \quad \frac{x_1 + x_2 + \cdots + x_n}{n} \ge \sqrt[n]{x_1 x_2 \cdots x_n}.$$
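The AM–GM consequence is easy to verify numerically. A minimal sketch (the list of positive numbers is an arbitrary illustrative choice), computing the geometric mean via the mean of logarithms as in the derivation above:

```python
import math

xs = [1.5, 2.0, 8.0, 0.5]  # arbitrary positive numbers (illustrative)

arithmetic_mean = sum(xs) / len(xs)
# exp(mean of logs) is the geometric mean, mirroring the log-based derivation.
geometric_mean = math.exp(sum(math.log(x) for x in xs) / len(xs))

# Jensen with concave log: log(AM) >= mean(log), hence AM >= GM.
print(arithmetic_mean >= geometric_mean)  # True
```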
A common application has $x$ as a function of another variable (or set of variables) $t$, that is, $x_i = g(t_i)$. All of this carries over directly to the general continuous case: the weights $a_i$ are replaced by a non-negative integrable function $f(x)$, such as a probability density, and the summations are replaced by integrals.
Measure-theoretic form
Let $(\Omega, A, \mu)$ be a probability space. Let $f : \Omega \to \mathbb{R}$ be a $\mu$-measurable function and $\varphi : \mathbb{R} \to \mathbb{R}$ be convex. Then:[5]

$$\varphi\!\left(\int_\Omega f \, d\mu\right) \le \int_\Omega \varphi \circ f \, d\mu.$$
In real analysis, we may require an estimate on

$$\varphi\!\left(\int_a^b f(x)\, dx\right),$$

where $a, b \in \mathbb{R}$, and $f : [a, b] \to \mathbb{R}$ is a non-negative Lebesgue-integrable function. In this case, the Lebesgue measure of $[a, b]$ need not be unity. However, by integration by substitution, the interval can be rescaled so that it has measure unity. Then Jensen's inequality can be applied to get[6]

$$\varphi\!\left(\frac{1}{b-a}\int_a^b f(x)\, dx\right) \le \frac{1}{b-a} \int_a^b \varphi(f(x))\, dx.$$
Probabilistic form
The same result can be equivalently stated in a probability theory setting, by a simple change of notation. Let $(\Omega, \mathfrak{F}, P)$ be a probability space, $X$ an integrable real-valued random variable and $\varphi$ a convex function. Then:

$$\varphi(\mathbb{E}[X]) \le \mathbb{E}[\varphi(X)].$$

In this probability setting, the measure $\mu$ is intended as the probability $P$, the integral with respect to $\mu$ as an expected value $\mathbb{E}$, and the function $f$ as a random variable $X$.

Note that equality holds if and only if $\varphi$ is a linear function on some convex set $A$ such that $P(X \in A) = 1$ (which follows by inspecting the measure-theoretic proof below).
General inequality in a probabilistic setting
More generally, let $T$ be a real topological vector space, and $X$ a $T$-valued integrable random variable. In this general setting, integrable means that there exists an element $\mathbb{E}[X]$ in $T$, such that for any element $z$ in the dual space of $T$: $\mathbb{E}\,|\langle z, X\rangle| < \infty$, and $\langle z, \mathbb{E}[X]\rangle = \mathbb{E}[\langle z, X\rangle]$. Then, for any measurable convex function $\varphi$ and any sub-$\sigma$-algebra $\mathfrak{G}$ of $\mathfrak{F}$:

$$\varphi\!\left(\mathbb{E}[X \mid \mathfrak{G}]\right) \le \mathbb{E}\left[\varphi(X) \mid \mathfrak{G}\right].$$

Here $\mathbb{E}[\,\cdot \mid \mathfrak{G}]$ stands for the expectation conditioned to the $\sigma$-algebra $\mathfrak{G}$. This general statement reduces to the previous ones when the topological vector space $T$ is the real axis, and $\mathfrak{G}$ is the trivial $\sigma$-algebra $\{\varnothing, \Omega\}$ (where $\varnothing$ is the empty set, and $\Omega$ is the sample space).[8]
A sharpened and generalized form
Let $X$ be a one-dimensional random variable with mean $\mu$ and variance $\sigma^2 \ge 0$. Let $\varphi(x)$ be a twice differentiable function, and define the function

$$h(x) = \frac{\varphi(x) - \varphi(\mu)}{(x - \mu)^2} - \frac{\varphi'(\mu)}{x - \mu}.$$

Then[9]

$$\sigma^2 \inf_x h(x) \;\le\; \mathbb{E}[\varphi(X)] - \varphi(\mathbb{E}[X]) \;\le\; \sigma^2 \sup_x h(x).$$

In particular, when $\varphi(x)$ is convex, then $h(x) \ge 0$, and the standard form of Jensen's inequality immediately follows for the case where $\varphi(x)$ is additionally assumed to be twice differentiable.
Proofs
Intuitive graphical proof

Jensen's inequality can be proved in several ways, and three different proofs corresponding to the different statements above will be offered. Before embarking on these mathematical derivations, however, it is worth analyzing an intuitive graphical argument based on the probabilistic case where $X$ is a real number (see figure). Assuming a hypothetical distribution of $X$ values, one can immediately identify the position of $\mathbb{E}[X]$ and its image $\varphi(\mathbb{E}[X])$ in the graph. Noticing that for convex mappings $Y = \varphi(X)$ of some $X$ values the corresponding distribution of $Y$ values is increasingly "stretched up" for increasing values of $X$, it is easy to see that the distribution of $Y$ is broader in the interval corresponding to $X > X_0$ and narrower in $X < X_0$ for any $X_0$; in particular, this is also true for $X_0 = \mathbb{E}[X]$. Consequently, in this picture the expectation of $Y$ will always shift upwards with respect to the position of $\varphi(\mathbb{E}[X])$. A similar reasoning holds if the distribution of $X$ covers a decreasing portion of the convex function, or both a decreasing and an increasing portion of it. This "proves" the inequality, i.e.

$$\varphi(\mathbb{E}[X]) \le \mathbb{E}[\varphi(X)],$$

with equality when $\varphi(X)$ is not strictly convex, e.g. when it is a straight line, or when $X$ follows a degenerate distribution (i.e. is a constant).
The proofs below formalize this intuitive notion.
Proof 1 (finite form)
If $\lambda_1$ and $\lambda_2$ are two arbitrary nonnegative real numbers such that $\lambda_1 + \lambda_2 = 1$ then convexity of $\varphi$ implies

$$\varphi(\lambda_1 x_1 + \lambda_2 x_2) \le \lambda_1 \varphi(x_1) + \lambda_2 \varphi(x_2) \quad \text{for any } x_1, x_2.$$

This can be generalized: if $\lambda_1, \ldots, \lambda_n$ are nonnegative real numbers such that $\lambda_1 + \cdots + \lambda_n = 1$, then

$$\varphi(\lambda_1 x_1 + \cdots + \lambda_n x_n) \le \lambda_1 \varphi(x_1) + \cdots + \lambda_n \varphi(x_n)$$

for any $x_1, \ldots, x_n$.
The finite form of Jensen's inequality can be proved by induction: by the convexity hypothesis, the statement is true for $n = 2$. Suppose the statement is true for some $n$, so

$$\varphi\!\left(\sum_{i=1}^{n} \lambda_i x_i\right) \le \sum_{i=1}^{n} \lambda_i \varphi(x_i)$$

for any $\lambda_1, \ldots, \lambda_n$ such that $\lambda_1 + \cdots + \lambda_n = 1$.

One needs to prove it for $n + 1$. At least one of the $\lambda_i$ is strictly smaller than $1$, say $\lambda_{n+1}$; therefore by the convexity inequality:

$$\varphi\!\left(\sum_{i=1}^{n+1} \lambda_i x_i\right) = \varphi\!\left((1 - \lambda_{n+1}) \sum_{i=1}^{n} \frac{\lambda_i}{1 - \lambda_{n+1}} x_i + \lambda_{n+1} x_{n+1}\right) \le (1 - \lambda_{n+1}) \varphi\!\left(\sum_{i=1}^{n} \frac{\lambda_i}{1 - \lambda_{n+1}} x_i\right) + \lambda_{n+1} \varphi(x_{n+1}).$$

Since $\lambda_1 + \cdots + \lambda_n + \lambda_{n+1} = 1$,

$$\sum_{i=1}^{n} \frac{\lambda_i}{1 - \lambda_{n+1}} = 1,$$

applying the inductive hypothesis gives

$$\varphi\!\left(\sum_{i=1}^{n} \frac{\lambda_i}{1 - \lambda_{n+1}} x_i\right) \le \sum_{i=1}^{n} \frac{\lambda_i}{1 - \lambda_{n+1}} \varphi(x_i),$$

therefore

$$\varphi\!\left(\sum_{i=1}^{n+1} \lambda_i x_i\right) \le \sum_{i=1}^{n+1} \lambda_i \varphi(x_i).$$

We deduce the inequality is true for $n + 1$; by induction it follows that the result is also true for all integers $n$ greater than 2.
In order to obtain the general inequality from this finite form, one needs to use a density argument. The finite form can be rewritten as:

$$\varphi\!\left(\int x \, d\mu_n(x)\right) \le \int \varphi(x) \, d\mu_n(x),$$

where $\mu_n$ is a measure given by an arbitrary convex combination of Dirac deltas:

$$\mu_n = \sum_{i=1}^{n} \lambda_i \delta_{x_i}.$$

Since convex functions are continuous, and since convex combinations of Dirac deltas are weakly dense in the set of probability measures (as could be easily verified), the general statement is obtained simply by a limiting procedure.
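The finite form just proved can be checked directly on random weights. A minimal sketch (the convex function $\varphi(x) = |x|$ and the sampled points and weights are illustrative choices):

```python
import random

random.seed(1)

phi = lambda x: abs(x)  # |x| is convex (triangle inequality)
n = 5
xs = [random.uniform(-10, 10) for _ in range(n)]
raw = [random.uniform(0, 1) for _ in range(n)]
lam = [w / sum(raw) for w in raw]  # nonnegative weights summing to 1

# Finite Jensen: phi(sum lam_i x_i) <= sum lam_i phi(x_i)
lhs = phi(sum(l * x for l, x in zip(lam, xs)))
rhs = sum(l * phi(x) for l, x in zip(lam, xs))
print(lhs <= rhs)  # True
```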
Proof 2 (measure-theoretic form)
Let $g$ be a real-valued $\mu$-integrable function on a probability space $\Omega$, and let $\varphi$ be a convex function on the real numbers. Since $\varphi$ is convex, at each real number $x$ we have a nonempty set of subderivatives, which may be thought of as lines touching the graph of $\varphi$ at $x$, but which are at or below the graph of $\varphi$ at all points (support lines of the graph).

Now, if we define

$$x_0 := \int_\Omega g \, d\mu,$$

because of the existence of subderivatives for convex functions, we may choose $a$ and $b$ such that

$$ax + b \le \varphi(x)$$

for all real $x$ and

$$ax_0 + b = \varphi(x_0).$$

But then we have that

$$\varphi(g(\omega)) \ge a\, g(\omega) + b$$

for almost all $\omega \in \Omega$. Since we have a probability measure, the integral is monotone with $\mu(\Omega) = 1$ so that

$$\int_\Omega \varphi(g(\omega)) \, d\mu \ge \int_\Omega \left(a\, g(\omega) + b\right) d\mu = a \int_\Omega g \, d\mu + b = a x_0 + b = \varphi(x_0) = \varphi\!\left(\int_\Omega g \, d\mu\right),$$

as desired.
Proof 3 (general inequality in a probabilistic setting)
Let $X$ be an integrable random variable that takes values in a real topological vector space $T$. Since $\varphi : T \to \mathbb{R}$ is convex, for any $x, y \in T$, the quantity

$$\frac{\varphi(x + \theta y) - \varphi(x)}{\theta}$$

is decreasing as $\theta$ approaches $0^+$. In particular, the subdifferential of $\varphi$ evaluated at $x$ in the direction $y$ is well-defined by

$$(D\varphi)(x) \cdot y := \lim_{\theta \downarrow 0} \frac{\varphi(x + \theta y) - \varphi(x)}{\theta} = \inf_{\theta \ne 0} \frac{\varphi(x + \theta y) - \varphi(x)}{\theta}.$$

The subdifferential is linear in $y$ (proving this rigorously requires the Hahn–Banach theorem) and, since the infimum taken in the right-hand side of the previous formula is smaller than the value of the same term for $\theta = 1$, one gets

$$\varphi(x) \le \varphi(x + y) - (D\varphi)(x) \cdot y.$$

In particular, for an arbitrary sub-$\sigma$-algebra $\mathfrak{G}$ we can evaluate the last inequality when $x = \mathbb{E}[X \mid \mathfrak{G}]$, $y = X - \mathbb{E}[X \mid \mathfrak{G}]$ to obtain

$$\varphi\!\left(\mathbb{E}[X \mid \mathfrak{G}]\right) \le \varphi(X) - (D\varphi)\!\left(\mathbb{E}[X \mid \mathfrak{G}]\right) \cdot \left(X - \mathbb{E}[X \mid \mathfrak{G}]\right).$$

Now, if we take the expectation conditioned to $\mathfrak{G}$ on both sides of the previous expression, we get the result since:

$$\mathbb{E}\!\left[\left. (D\varphi)\!\left(\mathbb{E}[X \mid \mathfrak{G}]\right) \cdot \left(X - \mathbb{E}[X \mid \mathfrak{G}]\right) \,\right|\, \mathfrak{G}\right] = (D\varphi)\!\left(\mathbb{E}[X \mid \mathfrak{G}]\right) \cdot \mathbb{E}\!\left[\left. X - \mathbb{E}[X \mid \mathfrak{G}] \,\right|\, \mathfrak{G}\right] = 0,$$

by the linearity of the subdifferential in the $y$ variable, and the following well-known property of the conditional expectation:

$$\mathbb{E}\!\left[\left. \mathbb{E}[X \mid \mathfrak{G}] \,\right|\, \mathfrak{G}\right] = \mathbb{E}[X \mid \mathfrak{G}].$$
Applications and special cases
Form involving a probability density function
Suppose $\Omega$ is a measurable subset of the real line and $f(x)$ is a non-negative function such that

$$\int_{-\infty}^{\infty} f(x)\, dx = 1.$$

In probabilistic language, $f$ is a probability density function.

Then Jensen's inequality becomes the following statement about convex integrals:

If $g$ is any real-valued measurable function and $\varphi$ is convex over the range of $g$, then

$$\varphi\!\left(\int_{-\infty}^{\infty} g(x) f(x)\, dx\right) \le \int_{-\infty}^{\infty} \varphi(g(x)) f(x)\, dx.$$

If $g(x) = x$, then this form of the inequality reduces to a commonly used special case:

$$\varphi\!\left(\int_{-\infty}^{\infty} x f(x)\, dx\right) \le \int_{-\infty}^{\infty} \varphi(x) f(x)\, dx.$$
This is applied in Variational Bayesian methods.
If $g(x) = x^{2n}$, and $X$ is a random variable, then $g$ is convex as

$$\frac{d^2 g}{dx^2}(x) = 2n(2n - 1)x^{2n-2} \ge 0 \quad \forall x \in \mathbb{R},$$

and so

$$g(\mathbb{E}[X]) = (\mathbb{E}[X])^{2n} \le \mathbb{E}\left[X^{2n}\right].$$

In particular, if some even moment $2n$ of $X$ is finite, $X$ has a finite mean. An extension of this argument shows $X$ has finite moments of every order $l \in \mathbb{N}$ dividing $n$.
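The even-moment bound can be illustrated empirically; applied to the empirical distribution of a sample, it holds exactly since $x^{2n}$ is convex. A minimal sketch (the Gaussian sample and $n = 2$ are illustrative choices):

```python
import random

random.seed(2)
n = 2  # g(x) = x**(2n) = x**4 is convex
xs = [random.gauss(1.0, 3.0) for _ in range(50_000)]

mean = sum(xs) / len(xs)                          # E[X]
moment_2n = sum(x ** (2 * n) for x in xs) / len(xs)  # E[X^(2n)]

# Jensen for g(x) = x^(2n): (E[X])^(2n) <= E[X^(2n)]
print(mean ** (2 * n) <= moment_2n)  # True
```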
Alternative finite form
Let $\Omega = \{x_1, \ldots, x_n\}$, and take $\mu$ to be the counting measure on $\Omega$; then the general form reduces to a statement about sums:

$$\varphi\!\left(\sum_{i=1}^{n} \lambda_i x_i\right) \le \sum_{i=1}^{n} \lambda_i \varphi(x_i),$$

provided that $\lambda_i \ge 0$ and

$$\lambda_1 + \cdots + \lambda_n = 1.$$

There is also an infinite discrete form.
Statistical physics
Jensen's inequality is of particular importance in statistical physics when the convex function is an exponential, giving:

$$e^{\mathbb{E}[X]} \le \mathbb{E}\left[e^X\right],$$

where the expected values are with respect to some probability distribution in the random variable $X$.

Proof: Let $\varphi(x) = e^x$ in

$$\varphi(\mathbb{E}[X]) \le \mathbb{E}[\varphi(X)].$$
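The exponential form $e^{\mathbb{E}[X]} \le \mathbb{E}[e^X]$ holds exactly for the empirical distribution of any sample, so it can be demonstrated with a few lines (the uniform sample is an illustrative choice):

```python
import math
import random

random.seed(3)
xs = [random.uniform(-2, 2) for _ in range(10_000)]

mean_x = sum(xs) / len(xs)
exp_of_mean = math.exp(mean_x)                        # exp(E[X])
mean_of_exp = sum(math.exp(x) for x in xs) / len(xs)  # E[exp(X)]

# exp is convex, so exp(E[X]) <= E[exp(X)]
print(exp_of_mean <= mean_of_exp)  # True
```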
Information theory
If $p(x)$ is the true probability density for $X$, and $q(x)$ is another density, then applying Jensen's inequality for the random variable $Y(X) = q(X)/p(X)$ and the convex function $\varphi(y) = -\log(y)$ gives

$$\mathbb{E}[\varphi(Y)] \ge \varphi(\mathbb{E}[Y]).$$

Therefore:

$$-D(p \,\|\, q) = \int p(x) \log\!\left(\frac{q(x)}{p(x)}\right) dx \le \log\!\left(\int p(x) \frac{q(x)}{p(x)}\, dx\right) = \log\!\left(\int q(x)\, dx\right) = 0,$$

a result called Gibbs' inequality.

It shows that the average message length is minimised when codes are assigned on the basis of the true probabilities $p$ rather than any other distribution $q$. The quantity that is non-negative is called the Kullback–Leibler divergence of $q$ from $p$, defined as

$$D(p \,\|\, q) = \int p(x) \log\!\left(\frac{p(x)}{q(x)}\right) dx.$$

Since $-\log(x)$ is a strictly convex function for $x > 0$, it follows that equality holds when $p(x)$ equals $q(x)$ almost everywhere.
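Gibbs' inequality can be checked on a small discrete example. A minimal sketch (the two distributions below are illustrative values on a three-point support):

```python
import math

# Two discrete distributions on the same support (illustrative values).
p = [0.5, 0.3, 0.2]
q = [0.25, 0.25, 0.5]

# Kullback-Leibler divergence D(p || q) = sum p_i * log(p_i / q_i)
kl = sum(pi * math.log(pi / qi) for pi, qi in zip(p, q))
print(kl >= 0)  # True; equals 0 only when p == q
```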
Rao–Blackwell theorem
If $L$ is a convex function and $\mathfrak{G}$ a sub-sigma-algebra, then, from the conditional version of Jensen's inequality, we get

$$L(\mathbb{E}[\delta(X) \mid \mathfrak{G}]) \le \mathbb{E}[L(\delta(X)) \mid \mathfrak{G}] \quad \Longrightarrow \quad \mathbb{E}[L(\mathbb{E}[\delta(X) \mid \mathfrak{G}])] \le \mathbb{E}[L(\delta(X))].$$

So if $\delta(X)$ is some estimator of an unobserved parameter $\theta$ given a vector of observables $X$, and if $T(X)$ is a sufficient statistic for $\theta$, then an improved estimator, in the sense of having a smaller expected loss $L$, can be obtained by calculating

$$\delta_1(X) = \mathbb{E}\left[\delta(X') \mid T(X') = T(X)\right],$$

the expected value of $\delta$ with respect to $\theta$, taken over all possible vectors of observations $X$ compatible with the same value of $T(X)$ as that observed. Further, because $T$ is a sufficient statistic, $\delta_1(X)$ does not depend on $\theta$, hence becomes a statistic.

This result is known as the Rao–Blackwell theorem.
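A simulation makes the improvement concrete. In the sketch below (all choices illustrative: normal samples of size 5, squared-error loss, and the crude estimator "first observation"), the sample mean is both the sufficient statistic and the Rao–Blackwellized estimator $\mathbb{E}[X_1 \mid \bar{X}] = \bar{X}$, so its average loss should be markedly smaller:

```python
import random

random.seed(4)
theta = 2.0          # true parameter (mean of a normal), illustrative
m, n = 20_000, 5     # number of simulated datasets, sample size

loss_naive, loss_rb = 0.0, 0.0
for _ in range(m):
    sample = [random.gauss(theta, 1.0) for _ in range(n)]
    delta = sample[0]               # crude estimator: first observation
    delta_rb = sum(sample) / n      # E[delta | sample mean] = sample mean
    loss_naive += (delta - theta) ** 2   # squared-error loss (convex)
    loss_rb += (delta_rb - theta) ** 2

# Rao-Blackwellization should not increase expected convex loss.
print(loss_rb / m <= loss_naive / m)
```
With 20,000 replications the average losses concentrate near the theoretical variances ($1$ versus $1/5$), so the comparison comes out in favor of the Rao–Blackwellized estimator with overwhelming probability.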
Risk aversion
The relation between risk aversion and declining marginal utility for scalar outcomes can be stated formally with Jensen's inequality: risk aversion can be stated as preferring a certain outcome $\mathbb{E}[x]$ to a fair gamble with potentially larger but uncertain outcome of $x$:

$$U(\mathbb{E}[x]) \ge \mathbb{E}[U(x)].$$

But this is simply Jensen's inequality for a concave $U(x)$: a utility function that exhibits declining marginal utility.[11]
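A worked example with a concave utility illustrates the preference. The sketch below uses illustrative choices ($U(x) = \sqrt{x}$ and a 50/50 gamble between 0 and 100):

```python
import math

# Concave utility with declining marginal utility (illustrative choice).
U = math.sqrt

# Fair gamble: 50/50 between 0 and 100, versus its certain expected value.
expected_outcome = 0.5 * 0 + 0.5 * 100          # E[x] = 50
expected_utility = 0.5 * U(0) + 0.5 * U(100)    # E[U(x)] = 5.0

# Risk aversion: utility of the sure thing exceeds expected utility of the gamble.
print(U(expected_outcome) >= expected_utility)  # True: sqrt(50) ~ 7.07 >= 5.0
```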
See also
- Karamata's inequality for a more general inequality
- Popoviciu's inequality
- Law of averages
- A proof without words of Jensen's inequality
Notes
- ^ Jensen, J. L. W. V. (1906). "Sur les fonctions convexes et les inégalités entre les valeurs moyennes". Acta Mathematica. 30 (1): 175–193. doi:10.1007/BF02418571.
- ^ Guessab, A.; Schmeisser, G. (2013). "Necessary and sufficient conditions for the validity of Jensen's inequality". Archiv der Mathematik. 100 (6): 561–570. doi:10.1007/s00013-013-0522-3. MR 3069109. S2CID 56372266.
- ^ Dekking, F. M.; Kraaikamp, C.; Lopuhaa, H. P.; Meester, L. E. (2005). A Modern Introduction to Probability and Statistics: Understanding Why and How. Springer Texts in Statistics. London: Springer. doi:10.1007/1-84628-168-7. ISBN 978-1-85233-896-1.
- ^ Gao, Xiang; Sitharam, Meera; Roitberg, Adrian (2019). "Bounds on the Jensen Gap, and Implications for Mean-Concentrated Distributions" (PDF). The Australian Journal of Mathematical Analysis and Applications. 16 (2). arXiv:1712.05267.
- ^ p. 25 of Rick Durrett (2019). Probability: Theory and Examples (5th ed.). Cambridge University Press. ISBN 978-1108473682.
- ^ Niculescu, Constantin P. "Integral inequalities", p. 12.
- ^ p. 29 of Rick Durrett (2019). Probability: Theory and Examples (5th ed.). Cambridge University Press. ISBN 978-1108473682.
- ^ Attention: In this generality additional assumptions on the convex function and/or the topological vector space are needed; see Example (1.3) on p. 53 in Perlman, Michael D. (1974). "Jensen's Inequality for a Convex Vector-Valued Function on an Infinite-Dimensional Space". Journal of Multivariate Analysis. 4 (1): 52–65. doi:10.1016/0047-259X(74)90005-0. hdl:11299/199167.
- ^ Liao, J.; Berg, A. (2018). "Sharpening Jensen's Inequality". American Statistician. 73 (3): 278–281. arXiv:1707.08644. doi:10.1080/00031305.2017.1419145. S2CID 88515366.
- ^ Bradley, C. J. (2006). Introduction to Inequalities. Leeds, United Kingdom: United Kingdom Mathematics Trust. p. 97. ISBN 978-1-906001-11-7.
- ^ Back, Kerry (2010). Asset Pricing and Portfolio Choice Theory. Oxford University Press. p. 5. ISBN 978-0-19-538061-3.
External links
- Jensen's Operator Inequality of Hansen and Pedersen.
- "Jensen inequality", Encyclopedia of Mathematics, EMS Press, 2001 [1994]
- Weisstein, Eric W. "Jensen's inequality". MathWorld.
- Arthur Lohwater (1982). "Introduction to Inequalities". Online e-book in PDF format.