
Jensen's inequality

From Wikipedia, the free encyclopedia

[Figure: Visualizing convexity and Jensen's inequality. Jensen's inequality generalizes the statement that a secant line of a convex function lies above its graph.]

In mathematics, Jensen's inequality, named after the Danish mathematician Johan Jensen, relates the value of a convex function of an integral to the integral of the convex function. It was proved by Jensen in 1906,[1] building on an earlier proof of the same inequality for doubly-differentiable functions by Otto Hölder in 1889.[2] Given its generality, the inequality appears in many forms depending on the context, some of which are presented below. In its simplest form the inequality states that the convex transformation of a mean is less than or equal to the mean applied after the convex transformation; it is a simple corollary that the opposite is true of concave transformations.[3]

Jensen's inequality generalizes the statement that the secant line of a convex function lies above the graph of the function, which is Jensen's inequality for two points: the secant line consists of weighted means of the convex function (for $t \in [0,1]$),

\[ t f(x_1) + (1-t) f(x_2), \]

while the graph of the function is the convex function of the weighted means,

\[ f\left(t x_1 + (1-t) x_2\right). \]

Thus, Jensen's inequality is

\[ f\left(t x_1 + (1-t) x_2\right) \le t f(x_1) + (1-t) f(x_2). \]

In the context of probability theory, it is generally stated in the following form: if $X$ is a random variable and $\varphi$ is a convex function, then

\[ \varphi(\mathbb{E}[X]) \le \mathbb{E}[\varphi(X)]. \]

The difference between the two sides of the inequality, $\mathbb{E}[\varphi(X)] - \varphi(\mathbb{E}[X])$, is called the Jensen gap.[4]
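As an illustration of the probabilistic form, the following minimal Python sketch (the exponential distribution and the convex choice $\varphi(x) = x^2$ are arbitrary, not part of the statement above) estimates both sides by Monte Carlo and confirms a non-negative Jensen gap:

```python
import numpy as np

rng = np.random.default_rng(0)
x = rng.exponential(scale=2.0, size=100_000)  # any integrable random variable

phi = np.square  # a convex function: x -> x**2

lhs = phi(x.mean())   # phi(E[X])
rhs = phi(x).mean()   # E[phi(X)]

print(f"phi(E[X]) = {lhs:.3f}")
print(f"E[phi(X)] = {rhs:.3f}")
print(f"Jensen gap = {rhs - lhs:.3f}")  # non-negative, up to sampling error
```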

Statements


The classical form of Jensen's inequality involves several numbers and weights. The inequality can be stated quite generally using either the language of measure theory or (equivalently) probability. In the probabilistic setting, the inequality can be further generalized to its full strength.

Finite form


For a real convex function $\varphi$, numbers $x_1, x_2, \ldots, x_n$ in its domain, and positive weights $a_i$, Jensen's inequality can be stated as:

\[ \varphi\!\left(\frac{\sum a_i x_i}{\sum a_i}\right) \le \frac{\sum a_i \varphi(x_i)}{\sum a_i} \tag{1} \]

and the inequality is reversed if $\varphi$ is concave, which is

\[ \varphi\!\left(\frac{\sum a_i x_i}{\sum a_i}\right) \ge \frac{\sum a_i \varphi(x_i)}{\sum a_i} \tag{2} \]

Equality holds if and only if $x_1 = x_2 = \cdots = x_n$ or $\varphi$ is linear on a domain containing $x_1, x_2, \ldots, x_n$.

As a particular case, if the weights $a_i$ are all equal, then (1) and (2) become

\[ \varphi\!\left(\frac{\sum x_i}{n}\right) \le \frac{\sum \varphi(x_i)}{n} \tag{3} \]

\[ \varphi\!\left(\frac{\sum x_i}{n}\right) \ge \frac{\sum \varphi(x_i)}{n} \tag{4} \]

For instance, the function $\log(x)$ is concave, so substituting $\varphi(x) = \log(x)$ in the previous formula (4) establishes the (logarithm of the) familiar arithmetic-mean/geometric-mean inequality:

\[ \log\!\left(\frac{x_1 + x_2 + \cdots + x_n}{n}\right) \ge \frac{\log(x_1) + \log(x_2) + \cdots + \log(x_n)}{n}, \quad\text{i.e.}\quad \frac{x_1 + x_2 + \cdots + x_n}{n} \ge \sqrt[n]{x_1 x_2 \cdots x_n}. \]

A common application has $x$ as a function of another variable (or set of variables) $t$, that is, $x_i = g(t_i)$. All of this carries directly over to the general continuous case: the weights $a_i$ are replaced by a non-negative integrable function $f(x)$, such as a probability distribution, and the summations are replaced by integrals.
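A quick numerical sanity check of the weighted finite form (1) and of the AM–GM special case; the particular weights, points, and the convex choice $\varphi(x) = e^x$ are illustrative only:

```python
import numpy as np

x = np.array([0.5, 1.0, 4.0, 9.0])
a = np.array([1.0, 2.0, 3.0, 4.0])   # positive weights, not necessarily normalized
w = a / a.sum()

phi = np.exp  # a convex function

assert phi(np.dot(w, x)) <= np.dot(w, phi(x))  # inequality (1)

# AM-GM: the arithmetic mean dominates the geometric mean (equal weights)
am = x.mean()
gm = np.exp(np.log(x).mean())  # exp of mean log = geometric mean
assert am >= gm
print(f"AM = {am:.4f}, GM = {gm:.4f}")
```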

Measure-theoretic form


Let $(\Omega, A, \mu)$ be a probability space. Let $f : \Omega \to \mathbb{R}$ be a $\mu$-measurable function and $\varphi : \mathbb{R} \to \mathbb{R}$ be convex. Then:[5]

\[ \varphi\!\left(\int_\Omega f \, d\mu\right) \le \int_\Omega \varphi \circ f \, d\mu. \]

In real analysis, we may require an estimate on

\[ \varphi\!\left(\int_a^b f(x)\, dx\right), \]

where $a, b \in \mathbb{R}$, and $f : [a, b] \to \mathbb{R}$ is a non-negative Lebesgue-integrable function. In this case, the Lebesgue measure of $[a, b]$ need not be unity. However, by integration by substitution, the interval can be rescaled so that it has measure unity. Then Jensen's inequality can be applied to get[6]

\[ \varphi\!\left(\frac{1}{b-a}\int_a^b f(x)\, dx\right) \le \frac{1}{b-a} \int_a^b \varphi(f(x))\, dx. \]
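For example (a standard special case, chosen here as an illustration), taking $\varphi(t) = t^2$ in the rescaled form yields

\[ \left(\frac{1}{b-a}\int_a^b f(x)\,dx\right)^{\!2} \le \frac{1}{b-a}\int_a^b f(x)^2\, dx, \]

i.e. the square of the average of $f$ never exceeds the average of its square.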

Probabilistic form


The same result can be equivalently stated in a probability theory setting, by a simple change of notation. Let $(\Omega, \mathfrak{F}, P)$ be a probability space, $X$ an integrable real-valued random variable and $\varphi$ a convex function. Then:

\[ \varphi(\mathbb{E}[X]) \le \mathbb{E}[\varphi(X)]. \][7]

In this probability setting, the measure $\mu$ is intended as a probability $P$, the integral with respect to $\mu$ as an expected value $\mathbb{E}$, and the function $f$ as a random variable $X$.

Note that the equality holds if and only if $\varphi$ is a linear function on some convex set $A$ such that $P(X \in A) = 1$ (which follows by inspecting the measure-theoretic proof below).
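Two elementary instances of this form, included here as illustrations: taking $\varphi(x) = |x|$ and then $\varphi(x) = x^2$ gives

\[ |\mathbb{E}[X]| \le \mathbb{E}[|X|], \qquad (\mathbb{E}[X])^2 \le \mathbb{E}[X^2], \]

the second of which is exactly the non-negativity of the variance.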

General inequality in a probabilistic setting


More generally, let $T$ be a real topological vector space, and $X$ a $T$-valued integrable random variable. In this general setting, integrable means that there exists an element $\mathbb{E}[X]$ in $T$, such that for any element $z$ in the dual space of $T$: $\mathbb{E}\left|\langle z, X\rangle\right| < \infty$, and $\langle z, \mathbb{E}[X]\rangle = \mathbb{E}[\langle z, X\rangle]$. Then, for any measurable convex function $\varphi$ and any sub-$\sigma$-algebra $\mathfrak{G}$ of $\mathfrak{F}$:

\[ \varphi\!\left(\mathbb{E}[X \mid \mathfrak{G}]\right) \le \mathbb{E}\!\left[\varphi(X) \mid \mathfrak{G}\right]. \]

Here $\mathbb{E}[\,\cdot \mid \mathfrak{G}]$ stands for the expectation conditioned to the $\sigma$-algebra $\mathfrak{G}$. This general statement reduces to the previous ones when the topological vector space $T$ is the real axis, and $\mathfrak{G}$ is the trivial $\sigma$-algebra $\{\varnothing, \Omega\}$ (where $\varnothing$ is the empty set, and $\Omega$ is the sample space).[8]
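As an illustration (not part of the statement above), taking $T = \mathbb{R}$ and $\varphi(x) = x^2$ in the conditional form gives

\[ \left(\mathbb{E}[X \mid \mathfrak{G}]\right)^2 \le \mathbb{E}[X^2 \mid \mathfrak{G}], \]

which is exactly the statement that the conditional variance $\operatorname{Var}(X \mid \mathfrak{G}) = \mathbb{E}[X^2 \mid \mathfrak{G}] - (\mathbb{E}[X \mid \mathfrak{G}])^2$ is non-negative.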

A sharpened and generalized form


Let $X$ be a one-dimensional random variable with mean $\mu$ and variance $\sigma^2 \ge 0$. Let $\varphi(x)$ be a twice differentiable function, and define the function

\[ h(x) \triangleq \frac{\varphi(x) - \varphi(\mu)}{(x-\mu)^2} - \frac{\varphi'(\mu)}{x-\mu}. \]

Then[9]

\[ \sigma^2 \inf_x h(x) \;\le\; \mathbb{E}[\varphi(X)] - \varphi(\mathbb{E}[X]) \;\le\; \sigma^2 \sup_x h(x). \]

In particular, when $\varphi(x)$ is convex, then $h(x) \ge 0$, and the standard form of Jensen's inequality immediately follows for the case where $\varphi(x)$ is additionally assumed to be twice differentiable.
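A worked check of the sharpened bound (our own example, chosen for illustration): for $\varphi(x) = x^2$ one computes

\[ h(x) = \frac{x^2 - \mu^2}{(x-\mu)^2} - \frac{2\mu}{x-\mu} = \frac{(x+\mu) - 2\mu}{x-\mu} = 1, \]

so both the infimum and the supremum equal $1$ and the two bounds collapse to the identity $\mathbb{E}[X^2] - (\mathbb{E}[X])^2 = \sigma^2$.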

Proofs


Intuitive graphical proof

[Figure: A graphical "proof" of Jensen's inequality for the probabilistic case. The dashed curve along the X axis is the hypothetical distribution of X, while the dashed curve along the Y axis is the corresponding distribution of Y values. Note that the convex mapping Y(X) increasingly "stretches" the distribution for increasing values of X.]

[Figure: A proof without words of Jensen's inequality for n variables. Without loss of generality, the sum of the positive weights is 1. It follows that the weighted point lies in the convex hull of the original points, which lies above the function itself by the definition of convexity. The conclusion follows.[10]]

Jensen's inequality can be proved in several ways, and three different proofs corresponding to the different statements above will be offered. Before embarking on these mathematical derivations, however, it is worth analyzing an intuitive graphical argument based on the probabilistic case where $X$ is a real number (see figure). Assuming a hypothetical distribution of $X$ values, one can immediately identify the position of $\mathbb{E}[X]$ and its image $\varphi(\mathbb{E}[X])$ in the graph. Noticing that for convex mappings $Y = \varphi(X)$ of some $x$ values the corresponding distribution of $Y$ values is increasingly "stretched up" for increasing values of $X$, it is easy to see that the distribution of $Y$ is broader in the interval corresponding to $X > X_0$ and narrower in $X < X_0$ for any $X_0$; in particular, this is also true for $X_0 = \mathbb{E}[X]$. Consequently, in this picture the expectation of $Y$ will always shift upwards with respect to the position of $\varphi(\mathbb{E}[X])$. A similar reasoning holds if the distribution of $X$ covers a decreasing portion of the convex function, or both a decreasing and an increasing portion of it. This "proves" the inequality, i.e.

\[ \varphi(\mathbb{E}[X]) \le \mathbb{E}[\varphi(X)], \]

with equality when $\varphi(X)$ is not strictly convex, e.g. when it is a straight line, or when $X$ follows a degenerate distribution (i.e. is a constant).

The proofs below formalize this intuitive notion.

Proof 1 (finite form)


If $\lambda_1$ and $\lambda_2$ are two arbitrary nonnegative real numbers such that $\lambda_1 + \lambda_2 = 1$ then convexity of $\varphi$ implies

\[ \varphi(\lambda_1 x_1 + \lambda_2 x_2) \le \lambda_1 \varphi(x_1) + \lambda_2 \varphi(x_2) \quad \text{for all } x_1, x_2. \]

This can be generalized: if $\lambda_1, \ldots, \lambda_n$ are nonnegative real numbers such that $\lambda_1 + \cdots + \lambda_n = 1$, then

\[ \varphi(\lambda_1 x_1 + \cdots + \lambda_n x_n) \le \lambda_1 \varphi(x_1) + \cdots + \lambda_n \varphi(x_n) \]

for any $x_1, \ldots, x_n$.

The finite form of Jensen's inequality can be proved by induction: by the convexity hypothesis, the statement is true for $n = 2$. Suppose the statement is true for some $n$, so

\[ \varphi\!\left(\sum_{i=1}^n \lambda_i x_i\right) \le \sum_{i=1}^n \lambda_i \varphi(x_i) \]

for any $\lambda_1, \ldots, \lambda_n$ such that $\lambda_1 + \cdots + \lambda_n = 1$.

One needs to prove it for $n + 1$. At least one of the $\lambda_i$ is strictly smaller than $1$, say $\lambda_{n+1}$; therefore by the convexity inequality:

\[ \varphi\!\left(\sum_{i=1}^{n+1} \lambda_i x_i\right) = \varphi\!\left((1 - \lambda_{n+1}) \sum_{i=1}^n \frac{\lambda_i}{1 - \lambda_{n+1}}\, x_i + \lambda_{n+1} x_{n+1}\right) \le (1 - \lambda_{n+1})\, \varphi\!\left(\sum_{i=1}^n \frac{\lambda_i}{1 - \lambda_{n+1}}\, x_i\right) + \lambda_{n+1}\, \varphi(x_{n+1}). \]

Since $\lambda_1 + \cdots + \lambda_n + \lambda_{n+1} = 1$,

\[ \sum_{i=1}^n \frac{\lambda_i}{1 - \lambda_{n+1}} = 1, \]

applying the inductive hypothesis gives

\[ \varphi\!\left(\sum_{i=1}^n \frac{\lambda_i}{1 - \lambda_{n+1}}\, x_i\right) \le \sum_{i=1}^n \frac{\lambda_i}{1 - \lambda_{n+1}}\, \varphi(x_i), \]

therefore

\[ \varphi\!\left(\sum_{i=1}^{n+1} \lambda_i x_i\right) \le \sum_{i=1}^{n+1} \lambda_i \varphi(x_i). \]

We deduce that the inequality is true for $n + 1$; by induction it follows that the result is also true for all integers $n$ greater than 2.
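To make the induction step concrete, here is the $n = 3$ case unrolled from the two-point inequality (an illustration, not part of the original proof): with $\lambda_1 + \lambda_2 + \lambda_3 = 1$ and $\lambda_3 < 1$,

\[ \varphi(\lambda_1 x_1 + \lambda_2 x_2 + \lambda_3 x_3) \le (1-\lambda_3)\,\varphi\!\left(\tfrac{\lambda_1}{1-\lambda_3}\, x_1 + \tfrac{\lambda_2}{1-\lambda_3}\, x_2\right) + \lambda_3\, \varphi(x_3) \le \lambda_1 \varphi(x_1) + \lambda_2 \varphi(x_2) + \lambda_3 \varphi(x_3), \]

where the second step applies the two-point case to the normalized weights $\tfrac{\lambda_1}{1-\lambda_3} + \tfrac{\lambda_2}{1-\lambda_3} = 1$.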

In order to obtain the general inequality from this finite form, one needs to use a density argument. The finite form can be rewritten as:

\[ \varphi\!\left(\int x \, d\mu_n(x)\right) \le \int \varphi(x) \, d\mu_n(x), \]

where $\mu_n$ is a measure given by an arbitrary convex combination of Dirac deltas:

\[ \mu_n = \sum_{i=1}^n \lambda_i \delta_{x_i}. \]

Since convex functions are continuous, and since convex combinations of Dirac deltas are weakly dense in the set of probability measures (as could be easily verified), the general statement is obtained simply by a limiting procedure.

Proof 2 (measure-theoretic form)


Let $f$ be a real-valued $\mu$-integrable function on a probability space $\Omega$, and let $\varphi$ be a convex function on the real numbers. Since $\varphi$ is convex, at each real number $x$ we have a nonempty set of subderivatives, which may be thought of as lines touching the graph of $\varphi$ at $x$, but which are below the graph of $\varphi$ at all points (support lines of the graph).

Now, if we define

\[ x_0 \coloneqq \int_\Omega f \, d\mu, \]

because of the existence of subderivatives for convex functions, we may choose $a$ and $b$ such that

\[ ax + b \le \varphi(x) \]

for all real $x$ and

\[ ax_0 + b = \varphi(x_0). \]

But then we have that

\[ a f(\omega) + b \le \varphi(f(\omega)) \]

for almost all $\omega \in \Omega$. Since we have a probability measure, the integral is monotone with $\mu(\Omega) = 1$ so that

\[ \varphi\!\left(\int_\Omega f \, d\mu\right) = a x_0 + b = \int_\Omega (a f + b) \, d\mu \le \int_\Omega \varphi \circ f \, d\mu, \]

as desired.
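As a concrete instance of the support-line device (our own example, not part of the proof): for $\varphi(x) = x^2$ the support line at $x_0 = \mathbb{E}[X]$ is $y = 2x_0 x - x_0^2$, and integrating it gives

\[ \mathbb{E}\!\left[2 x_0 X - x_0^2\right] = 2x_0\, \mathbb{E}[X] - x_0^2 = x_0^2 = \varphi(\mathbb{E}[X]) \le \mathbb{E}[X^2]. \]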

Proof 3 (general inequality in a probabilistic setting)


Let $X$ be an integrable random variable that takes values in a real topological vector space $T$. Since $\varphi : T \to \mathbb{R}$ is convex, for any $x, y \in T$, the quantity

\[ \frac{\varphi(x + \theta y) - \varphi(x)}{\theta} \]

is decreasing as $\theta$ approaches $0^+$. In particular, the subdifferential of $\varphi$ evaluated at $x$ in the direction $y$ is well-defined by

\[ (D\varphi)(x) \cdot y \coloneqq \lim_{\theta \downarrow 0} \frac{\varphi(x + \theta y) - \varphi(x)}{\theta} = \inf_{\theta > 0} \frac{\varphi(x + \theta y) - \varphi(x)}{\theta}. \]

It is easily seen that the subdifferential is linear in $y$[citation needed] (that claim is false, and the assertion requires the Hahn–Banach theorem to be proved) and, since the infimum taken in the right-hand side of the previous formula is smaller than the value of the same term for $\theta = 1$, one gets

\[ \varphi(x) \le \varphi(x + y) - (D\varphi)(x) \cdot y. \]

In particular, for an arbitrary sub-$\sigma$-algebra $\mathfrak{G}$ we can evaluate the last inequality when $x = \mathbb{E}[X \mid \mathfrak{G}]$, $y = X - \mathbb{E}[X \mid \mathfrak{G}]$ to obtain

\[ \varphi\!\left(\mathbb{E}[X \mid \mathfrak{G}]\right) \le \varphi(X) - (D\varphi)\!\left(\mathbb{E}[X \mid \mathfrak{G}]\right) \cdot \left(X - \mathbb{E}[X \mid \mathfrak{G}]\right). \]

Now, if we take the expectation conditioned to $\mathfrak{G}$ on both sides of the previous expression, we get the result since:

\[ \mathbb{E}\!\left[(D\varphi)\!\left(\mathbb{E}[X \mid \mathfrak{G}]\right) \cdot \left(X - \mathbb{E}[X \mid \mathfrak{G}]\right) \,\middle|\, \mathfrak{G}\right] = (D\varphi)\!\left(\mathbb{E}[X \mid \mathfrak{G}]\right) \cdot \mathbb{E}\!\left[X - \mathbb{E}[X \mid \mathfrak{G}] \,\middle|\, \mathfrak{G}\right] = 0, \]

by the linearity of the subdifferential in the $y$ variable, and the following well-known property of the conditional expectation:

\[ \mathbb{E}\!\left[\, \mathbb{E}[X \mid \mathfrak{G}] \,\middle|\, \mathfrak{G}\,\right] = \mathbb{E}[X \mid \mathfrak{G}]. \]

Applications and special cases


Form involving a probability density function


Suppose $\Omega$ is a measurable subset of the real line and $f(x)$ is a non-negative function such that

\[ \int_{-\infty}^{\infty} f(x)\, dx = 1. \]

In probabilistic language, $f$ is a probability density function.

Then Jensen's inequality becomes the following statement about convex integrals:

If $g$ is any real-valued measurable function and $\varphi$ is convex over the range of $g$, then

\[ \varphi\!\left(\int_{-\infty}^{\infty} g(x) f(x)\, dx\right) \le \int_{-\infty}^{\infty} \varphi(g(x))\, f(x)\, dx. \]

If $g(x) = x$, then this form of the inequality reduces to a commonly used special case:

\[ \varphi\!\left(\int_{-\infty}^{\infty} x f(x)\, dx\right) \le \int_{-\infty}^{\infty} \varphi(x)\, f(x)\, dx. \]

This is applied in Variational Bayesian methods.
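For instance (a standard use, sketched here as an illustration), variational methods apply the concave case $\varphi = \log$ to a latent-variable model with evidence $p(x)$ and an arbitrary density $q(z)$, giving the evidence lower bound:

\[ \log p(x) = \log \int q(z)\, \frac{p(x, z)}{q(z)} \, dz \;\ge\; \int q(z) \log \frac{p(x, z)}{q(z)}\, dz. \]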

Example: even moments of a random variable


If $g(x) = x^{2n}$, and $X$ is a random variable, then $g$ is convex as

\[ \frac{d^2 g}{dx^2}(x) = 2n(2n - 1)\, x^{2n-2} \ge 0 \quad \forall x \in \mathbb{R}, \]

and so

\[ g(\mathbb{E}[X]) = (\mathbb{E}[X])^{2n} \le \mathbb{E}\!\left[X^{2n}\right]. \]

In particular, if some even moment $2n$ of $X$ is finite, $X$ has a finite mean. An extension of this argument shows $X$ has finite moments of every order $l \in \mathbb{N}$ dividing $n$.

Alternative finite form


Let $\Omega = \{x_1, \ldots, x_n\}$, and take $\mu$ to be the counting measure on $\Omega$; then the general form reduces to a statement about sums:

\[ \varphi\!\left(\sum_{i=1}^n g(x_i)\, \lambda_i\right) \le \sum_{i=1}^n \varphi(g(x_i))\, \lambda_i, \]

provided that $\lambda_i \ge 0$ and

\[ \lambda_1 + \cdots + \lambda_n = 1. \]

There is also an infinite discrete form.

Statistical physics


Jensen's inequality is of particular importance in statistical physics when the convex function is an exponential, giving:

\[ e^{\mathbb{E}[X]} \le \mathbb{E}\!\left[e^{X}\right], \]

where the expected values are with respect to some probability distribution in the random variable $X$.

Proof: Let $\varphi(x) = e^{x}$ in

\[ \varphi(\mathbb{E}[X]) \le \mathbb{E}[\varphi(X)]. \]
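A classic consequence (stated here only as an illustration) is the Gibbs–Bogoliubov bound of free-energy perturbation theory: writing $\langle\,\cdot\,\rangle_0$ for the average in a reference ensemble with Hamiltonian $H_0$ and free energy $F_0$,

\[ e^{-\beta(F - F_0)} = \left\langle e^{-\beta(H - H_0)} \right\rangle_0 \ge e^{-\beta \langle H - H_0\rangle_0}, \quad\text{hence}\quad F \le F_0 + \langle H - H_0 \rangle_0. \]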

Information theory


If $p(x)$ is the true probability density for $X$, and $q(x)$ is another density, then applying Jensen's inequality for the random variable $Y(X) = q(X)/p(X)$ and the convex function $\varphi(y) = -\log(y)$ gives

\[ \mathbb{E}[\varphi(Y)] \ge \varphi(\mathbb{E}[Y]). \]

Therefore:

\[ -D(p \,\|\, q) = \int p(x) \log\!\left(\frac{q(x)}{p(x)}\right) dx \;\le\; \log\!\left(\int p(x)\, \frac{q(x)}{p(x)}\, dx\right) = \log\!\left(\int q(x)\, dx\right) = 0, \]

a result called Gibbs' inequality.

It shows that the average message length is minimised when codes are assigned on the basis of the true probabilities $p$ rather than any other distribution $q$. The quantity that is non-negative is called the Kullback–Leibler divergence of $q$ from $p$, where

\[ D(p \,\|\, q) = \int p(x) \log\!\left(\frac{p(x)}{q(x)}\right) dx. \]

Since $-\log(x)$ is a strictly convex function for $x > 0$, it follows that equality holds when $p(x)$ equals $q(x)$ almost everywhere.
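A minimal numerical check in Python (the two discrete distributions are arbitrary choices for illustration):

```python
import numpy as np

p = np.array([0.5, 0.3, 0.2])   # "true" distribution
q = np.array([0.4, 0.4, 0.2])   # any other distribution on the same support

# Kullback-Leibler divergence D(p || q) = sum p * log(p / q)
kl = np.sum(p * np.log(p / q))
print(f"D(p || q) = {kl:.6f}")   # non-negative by Gibbs' inequality

assert kl >= 0.0
assert np.isclose(np.sum(p * np.log(p / p)), 0.0)  # equality when q == p
```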

Rao–Blackwell theorem


If $L$ is a convex function and $\mathfrak{G}$ a sub-sigma-algebra, then, from the conditional version of Jensen's inequality, we get

\[ L(\mathbb{E}[\delta(X) \mid \mathfrak{G}]) \le \mathbb{E}[L(\delta(X)) \mid \mathfrak{G}] \quad \Longrightarrow \quad \mathbb{E}[L(\mathbb{E}[\delta(X) \mid \mathfrak{G}])] \le \mathbb{E}[L(\delta(X))]. \]

So if $\delta(X)$ is some estimator of an unobserved parameter $\theta$ given a vector of observables $X$, and if $T(X)$ is a sufficient statistic for $\theta$, then an improved estimator, in the sense of having a smaller expected loss $L$, can be obtained by calculating

\[ \delta_1(X) = \mathbb{E}_{\theta}\!\left[\delta(X') \mid T(X') = T(X)\right], \]

the expected value of $\delta$ with respect to $\theta$, taken over all possible vectors of observations $X$ compatible with the same value of $T(X)$ as that observed. Further, because $T$ is a sufficient statistic, $\delta_1(X)$ does not depend on $\theta$, hence becomes a statistic.

This result is known as the Rao–Blackwell theorem.
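A small simulation sketch of the effect (the Bernoulli model, the naive estimator $\delta(X) = X_1$, and squared-error loss are our own illustrative choices): conditioning on the sufficient statistic $T(X) = \sum_i X_i$ turns the first observation into the sample mean, whose risk is visibly smaller:

```python
import numpy as np

rng = np.random.default_rng(0)
theta, n, trials = 0.3, 20, 100_000

x = rng.binomial(1, theta, size=(trials, n))  # Bernoulli(theta) observations

delta = x[:, 0]            # naive unbiased estimator: the first observation
t = x.sum(axis=1)          # sufficient statistic T(X) = sum of observations
delta1 = t / n             # E[X_1 | T] = T/n: the Rao-Blackwellized estimator

risk = lambda est: np.mean((est - theta) ** 2)  # squared-error loss
print(f"risk of delta  = {risk(delta):.5f}")    # ~ theta*(1-theta) = 0.21
print(f"risk of delta1 = {risk(delta1):.5f}")   # ~ theta*(1-theta)/n = 0.0105
```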

Risk aversion


The relation between risk aversion and declining marginal utility for scalar outcomes can be stated formally with Jensen's inequality: risk aversion can be stated as preferring a certain outcome $\mathbb{E}[x]$ to a fair gamble with potentially larger but uncertain outcome of $x$:

\[ U(\mathbb{E}[x]) \ge \mathbb{E}[U(x)]. \]

But this is simply Jensen's inequality for a concave $U(x)$: a utility function that exhibits declining marginal utility.[11]
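A worked numeric illustration (our own example): with the concave utility $U(x) = \sqrt{x}$ and a fair gamble paying $0$ or $100$ with probability $\tfrac12$ each,

\[ \mathbb{E}[U(x)] = \tfrac12\sqrt{0} + \tfrac12\sqrt{100} = 5 \;<\; U(\mathbb{E}[x]) = \sqrt{50} \approx 7.07, \]

so the certain payment of $50$ is preferred to the gamble.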


Notes

  1. ^ Jensen, J. L. W. V. (1906). "Sur les fonctions convexes et les inégalités entre les valeurs moyennes". Acta Mathematica. 30 (1): 175–193. doi:10.1007/BF02418571.
  2. ^ Guessab, A.; Schmeisser, G. (2013). "Necessary and sufficient conditions for the validity of Jensen's inequality". Archiv der Mathematik. 100 (6): 561–570. doi:10.1007/s00013-013-0522-3. MR 3069109. S2CID 56372266.
  3. ^ Dekking, F.M.; Kraaikamp, C.; Lopuhaa, H.P.; Meester, L.E. (2005). A Modern Introduction to Probability and Statistics: Understanding Why and How. Springer Texts in Statistics. London: Springer. doi:10.1007/1-84628-168-7. ISBN 978-1-85233-896-1.
  4. ^ Gao, Xiang; Sitharam, Meera; Roitberg, Adrian (2019). "Bounds on the Jensen Gap, and Implications for Mean-Concentrated Distributions" (PDF). The Australian Journal of Mathematical Analysis and Applications. 16 (2). arXiv:1712.05267.
  5. ^ p. 25 of Rick Durrett (2019). Probability: Theory and Examples (5th ed.). Cambridge University Press. ISBN 978-1108473682.
  6. ^ Niculescu, Constantin P. "Integral inequalities", p. 12.
  7. ^ p. 29 of Rick Durrett (2019). Probability: Theory and Examples (5th ed.). Cambridge University Press. ISBN 978-1108473682.
  8. ^ Attention: In this generality additional assumptions on the convex function and/or the topological vector space are needed; see Example (1.3) on p. 53 in Perlman, Michael D. (1974). "Jensen's Inequality for a Convex Vector-Valued Function on an Infinite-Dimensional Space". Journal of Multivariate Analysis. 4 (1): 52–65. doi:10.1016/0047-259X(74)90005-0. hdl:11299/199167.
  9. ^ Liao, J.; Berg, A. (2018). "Sharpening Jensen's Inequality". American Statistician. 73 (3): 278–281. arXiv:1707.08644. doi:10.1080/00031305.2017.1419145. S2CID 88515366.
  10. ^ Bradley, C.J. (2006). Introduction to Inequalities. Leeds, United Kingdom: United Kingdom Mathematics Trust. p. 97. ISBN 978-1-906001-11-7.
  11. ^ Back, Kerry (2010). Asset Pricing and Portfolio Choice Theory. Oxford University Press. p. 5. ISBN 978-0-19-538061-3.
