Jensen's inequality
In mathematics, Jensen's inequality, named after the Danish mathematician Johan Jensen, relates the value of a convex function of an integral to the integral of the convex function. It was proved by Jensen in 1906,[1] building on an earlier proof of the same inequality for doubly-differentiable functions by Otto Hölder in 1889.[2] Given its generality, the inequality appears in many forms depending on the context, some of which are presented below. In its simplest form the inequality states that the convex transformation of a mean is less than or equal to the mean applied after the convex transformation; it is a simple corollary that the opposite is true of concave transformations.[3]
Jensen's inequality generalizes the statement that the secant line of a convex function lies above the graph of the function, which is Jensen's inequality for two points: the secant line consists of weighted means of the convex function (for $t \in [0,1]$),

$$t f(x_1) + (1-t) f(x_2),$$

while the graph of the function is the convex function of the weighted means,

$$f(t x_1 + (1-t) x_2).$$

Thus, Jensen's inequality is

$$f(t x_1 + (1-t) x_2) \le t f(x_1) + (1-t) f(x_2).$$
In the context of probability theory, it is generally stated in the following form: if $X$ is a random variable and $\varphi$ is a convex function, then

$$\varphi(\mathbb{E}[X]) \le \mathbb{E}[\varphi(X)].$$

The difference between the two sides of the inequality, $\mathbb{E}[\varphi(X)] - \varphi(\mathbb{E}[X])$, is called the Jensen gap.[4]
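The probabilistic form lends itself to a quick numerical sanity check. The sketch below uses illustrative choices (the convex function $\varphi(x) = x^2$ and a sampled standard normal); for this particular $\varphi$ the Jensen gap is exactly the sample variance, so it is always non-negative:

```python
import random

random.seed(0)

# Sample a random variable X; phi(x) = x**2 is an illustrative convex choice.
xs = [random.gauss(0.0, 1.0) for _ in range(100_000)]
phi = lambda x: x ** 2

mean_x = sum(xs) / len(xs)                       # E[X]
mean_phi = sum(phi(x) for x in xs) / len(xs)     # E[phi(X)]

# Jensen gap: E[phi(X)] - phi(E[X]) >= 0; here it equals the sample variance.
jensen_gap = mean_phi - phi(mean_x)
print(jensen_gap >= 0)  # True
```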
Statements
The classical form of Jensen's inequality involves several numbers and weights. The inequality can be stated quite generally using either the language of measure theory or (equivalently) probability. In the probabilistic setting, the inequality can be further generalized to its full strength.
Finite form
For a real convex function $\varphi$, numbers $x_1, x_2, \ldots, x_n$ in its domain, and positive weights $a_i$, Jensen's inequality can be stated as:

$$\varphi\!\left(\frac{\sum a_i x_i}{\sum a_i}\right) \le \frac{\sum a_i \varphi(x_i)}{\sum a_i} \qquad (1)$$

and the inequality is reversed if $\varphi$ is concave, which is

$$\varphi\!\left(\frac{\sum a_i x_i}{\sum a_i}\right) \ge \frac{\sum a_i \varphi(x_i)}{\sum a_i}. \qquad (2)$$

Equality holds if and only if $x_1 = x_2 = \cdots = x_n$ or $\varphi$ is linear on a domain containing $x_1, x_2, \ldots, x_n$.

As a particular case, if the weights $a_i$ are all equal, then (1) and (2) become

$$\varphi\!\left(\frac{\sum x_i}{n}\right) \le \frac{\sum \varphi(x_i)}{n} \qquad (3)$$

$$\varphi\!\left(\frac{\sum x_i}{n}\right) \ge \frac{\sum \varphi(x_i)}{n} \qquad (4)$$
For instance, the function $\log(x)$ is concave, so substituting $\varphi(x) = \log(x)$ in the previous formula (4) establishes the (logarithm of the) familiar arithmetic-mean/geometric-mean inequality:

$$\log\!\left(\frac{x_1 + x_2 + \cdots + x_n}{n}\right) \ge \frac{\log(x_1) + \log(x_2) + \cdots + \log(x_n)}{n}, \quad \text{i.e.} \quad \frac{x_1 + x_2 + \cdots + x_n}{n} \ge \sqrt[n]{x_1 x_2 \cdots x_n}.$$
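The AM–GM consequence is easy to verify numerically. A minimal sketch (the list of positive numbers is an arbitrary illustrative choice), computing the geometric mean via the mean of logarithms as in the derivation above:

```python
import math

xs = [1.5, 2.0, 8.0, 0.5]  # arbitrary positive numbers (illustrative)

arithmetic_mean = sum(xs) / len(xs)
# exp(mean of logs) is the geometric mean, mirroring the log-based derivation.
geometric_mean = math.exp(sum(math.log(x) for x in xs) / len(xs))

# Jensen with concave log: log(AM) >= mean(log), hence AM >= GM.
print(arithmetic_mean >= geometric_mean)  # True
```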
A common application has $x$ as a function of another variable (or set of variables) $t$, that is, $x_i = g(t_i)$. All of this carries over directly to the general continuous case: the weights $a_i$ are replaced by a non-negative integrable function $f(x)$, such as a probability density, and the summations are replaced by integrals.
Measure-theoretic form
Let $(\Omega, A, \mu)$ be a probability space. Let $f : \Omega \to \mathbb{R}$ be a $\mu$-measurable function and $\varphi : \mathbb{R} \to \mathbb{R}$ be convex. Then:[5]

$$\varphi\!\left(\int_\Omega f \, d\mu\right) \le \int_\Omega \varphi \circ f \, d\mu.$$
In real analysis, we may require an estimate on

$$\varphi\!\left(\int_a^b f(x)\, dx\right),$$

where $a, b \in \mathbb{R}$, and $f : [a, b] \to \mathbb{R}$ is a non-negative Lebesgue-integrable function. In this case, the Lebesgue measure of $[a, b]$ need not be unity. However, by integration by substitution, the interval can be rescaled so that it has measure unity. Then Jensen's inequality can be applied to get[6]

$$\varphi\!\left(\frac{1}{b-a}\int_a^b f(x)\, dx\right) \le \frac{1}{b-a} \int_a^b \varphi(f(x))\, dx.$$
Probabilistic form
The same result can be equivalently stated in a probability theory setting, by a simple change of notation. Let $(\Omega, \mathfrak{F}, P)$ be a probability space, $X$ an integrable real-valued random variable and $\varphi$ a convex function. Then:

$$\varphi(\mathbb{E}[X]) \le \mathbb{E}[\varphi(X)].$$

In this probability setting, the measure $\mu$ is intended as the probability $P$, the integral with respect to $\mu$ as an expected value $\mathbb{E}$, and the function $f$ as a random variable $X$.

Note that equality holds if and only if $\varphi$ is a linear function on some convex set $A$ such that $P(X \in A) = 1$ (which follows by inspecting the measure-theoretic proof below).
General inequality in a probabilistic setting
More generally, let $T$ be a real topological vector space, and $X$ a $T$-valued integrable random variable. In this general setting, integrable means that there exists an element $\mathbb{E}[X]$ in $T$, such that for any element $z$ in the dual space of $T$: $\mathbb{E}\,|\langle z, X\rangle| < \infty$, and $\langle z, \mathbb{E}[X]\rangle = \mathbb{E}[\langle z, X\rangle]$. Then, for any measurable convex function $\varphi$ and any sub-$\sigma$-algebra $\mathfrak{G}$ of $\mathfrak{F}$:

$$\varphi\!\left(\mathbb{E}[X \mid \mathfrak{G}]\right) \le \mathbb{E}\left[\varphi(X) \mid \mathfrak{G}\right].$$

Here $\mathbb{E}[\,\cdot \mid \mathfrak{G}]$ stands for the expectation conditioned to the $\sigma$-algebra $\mathfrak{G}$. This general statement reduces to the previous ones when the topological vector space $T$ is the real axis, and $\mathfrak{G}$ is the trivial $\sigma$-algebra $\{\varnothing, \Omega\}$ (where $\varnothing$ is the empty set, and $\Omega$ is the sample space).[8]
A sharpened and generalized form
Let $X$ be a one-dimensional random variable with mean $\mu$ and variance $\sigma^2 \ge 0$. Let $\varphi(x)$ be a twice differentiable function, and define the function

$$h(x) = \frac{\varphi(x) - \varphi(\mu)}{(x - \mu)^2} - \frac{\varphi'(\mu)}{x - \mu}.$$

Then[9]

$$\sigma^2 \inf_x h(x) \;\le\; \mathbb{E}[\varphi(X)] - \varphi(\mathbb{E}[X]) \;\le\; \sigma^2 \sup_x h(x).$$

In particular, when $\varphi(x)$ is convex, then $h(x) \ge 0$, and the standard form of Jensen's inequality immediately follows for the case where $\varphi(x)$ is additionally assumed to be twice differentiable.
Proofs
Intuitive graphical proof

Jensen's inequality can be proved in several ways, and three different proofs corresponding to the different statements above will be offered. Before embarking on these mathematical derivations, however, it is worth analyzing an intuitive graphical argument based on the probabilistic case where $X$ is a real number (see figure). Assuming a hypothetical distribution of $X$ values, one can immediately identify the position of $\mathbb{E}[X]$ and its image $\varphi(\mathbb{E}[X])$ in the graph. Noticing that for convex mappings $Y = \varphi(X)$ of some $X$ values the corresponding distribution of $Y$ values is increasingly "stretched up" for increasing values of $X$, it is easy to see that the distribution of $Y$ is broader in the interval corresponding to $X > X_0$ and narrower in $X < X_0$ for any $X_0$; in particular, this is also true for $X_0 = \mathbb{E}[X]$. Consequently, in this picture the expectation of $Y$ will always shift upwards with respect to the position of $\varphi(\mathbb{E}[X])$. A similar reasoning holds if the distribution of $X$ covers a decreasing portion of the convex function, or both a decreasing and an increasing portion of it. This "proves" the inequality, i.e.

$$\varphi(\mathbb{E}[X]) \le \mathbb{E}[\varphi(X)],$$

with equality when $\varphi(X)$ is not strictly convex, e.g. when it is a straight line, or when $X$ follows a degenerate distribution (i.e. is a constant).
The proofs below formalize this intuitive notion.
Proof 1 (finite form)
If $\lambda_1$ and $\lambda_2$ are two arbitrary nonnegative real numbers such that $\lambda_1 + \lambda_2 = 1$ then convexity of $\varphi$ implies

$$\varphi(\lambda_1 x_1 + \lambda_2 x_2) \le \lambda_1 \varphi(x_1) + \lambda_2 \varphi(x_2) \quad \text{for any } x_1, x_2.$$

This can be generalized: if $\lambda_1, \ldots, \lambda_n$ are nonnegative real numbers such that $\lambda_1 + \cdots + \lambda_n = 1$, then

$$\varphi(\lambda_1 x_1 + \cdots + \lambda_n x_n) \le \lambda_1 \varphi(x_1) + \cdots + \lambda_n \varphi(x_n)$$

for any $x_1, \ldots, x_n$.
The finite form of Jensen's inequality can be proved by induction: by the convexity hypothesis, the statement is true for $n = 2$. Suppose the statement is true for some $n$, so

$$\varphi\!\left(\sum_{i=1}^{n} \lambda_i x_i\right) \le \sum_{i=1}^{n} \lambda_i \varphi(x_i)$$

for any $\lambda_1, \ldots, \lambda_n$ such that $\lambda_1 + \cdots + \lambda_n = 1$.

One needs to prove it for $n + 1$. At least one of the $\lambda_i$ is strictly smaller than $1$, say $\lambda_{n+1}$; therefore by the convexity inequality:

$$\varphi\!\left(\sum_{i=1}^{n+1} \lambda_i x_i\right) = \varphi\!\left((1 - \lambda_{n+1}) \sum_{i=1}^{n} \frac{\lambda_i}{1 - \lambda_{n+1}} x_i + \lambda_{n+1} x_{n+1}\right) \le (1 - \lambda_{n+1}) \varphi\!\left(\sum_{i=1}^{n} \frac{\lambda_i}{1 - \lambda_{n+1}} x_i\right) + \lambda_{n+1} \varphi(x_{n+1}).$$

Since $\lambda_1 + \cdots + \lambda_n + \lambda_{n+1} = 1$,

$$\sum_{i=1}^{n} \frac{\lambda_i}{1 - \lambda_{n+1}} = 1,$$

applying the inductive hypothesis gives

$$\varphi\!\left(\sum_{i=1}^{n} \frac{\lambda_i}{1 - \lambda_{n+1}} x_i\right) \le \sum_{i=1}^{n} \frac{\lambda_i}{1 - \lambda_{n+1}} \varphi(x_i),$$

therefore

$$\varphi\!\left(\sum_{i=1}^{n+1} \lambda_i x_i\right) \le \sum_{i=1}^{n+1} \lambda_i \varphi(x_i).$$

We deduce the inequality is true for $n + 1$; by induction it follows that the result is also true for all integers $n$ greater than 2.
In order to obtain the general inequality from this finite form, one needs to use a density argument. The finite form can be rewritten as:

$$\varphi\!\left(\int x \, d\mu_n(x)\right) \le \int \varphi(x) \, d\mu_n(x),$$

where $\mu_n$ is a measure given by an arbitrary convex combination of Dirac deltas:

$$\mu_n = \sum_{i=1}^{n} \lambda_i \delta_{x_i}.$$

Since convex functions are continuous, and since convex combinations of Dirac deltas are weakly dense in the set of probability measures (as could be easily verified), the general statement is obtained simply by a limiting procedure.
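The finite form just proved can be checked directly on random weights. A minimal sketch (the convex function $\varphi(x) = |x|$ and the sampled points and weights are illustrative choices):

```python
import random

random.seed(1)

phi = lambda x: abs(x)  # |x| is convex (triangle inequality)
n = 5
xs = [random.uniform(-10, 10) for _ in range(n)]
raw = [random.uniform(0, 1) for _ in range(n)]
lam = [w / sum(raw) for w in raw]  # nonnegative weights summing to 1

# Finite Jensen: phi(sum lam_i x_i) <= sum lam_i phi(x_i)
lhs = phi(sum(l * x for l, x in zip(lam, xs)))
rhs = sum(l * phi(x) for l, x in zip(lam, xs))
print(lhs <= rhs)  # True
```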
Proof 2 (measure-theoretic form)
Let $g$ be a real-valued $\mu$-integrable function on a probability space $\Omega$, and let $\varphi$ be a convex function on the real numbers. Since $\varphi$ is convex, at each real number $x$ we have a nonempty set of subderivatives, which may be thought of as lines touching the graph of $\varphi$ at $x$, but which are at or below the graph of $\varphi$ at all points (support lines of the graph).

Now, if we define

$$x_0 := \int_\Omega g \, d\mu,$$

because of the existence of subderivatives for convex functions, we may choose $a$ and $b$ such that

$$ax + b \le \varphi(x)$$

for all real $x$ and

$$ax_0 + b = \varphi(x_0).$$

But then we have that

$$\varphi(g(\omega)) \ge a\, g(\omega) + b$$

for almost all $\omega \in \Omega$. Since we have a probability measure, the integral is monotone with $\mu(\Omega) = 1$ so that

$$\int_\Omega \varphi(g(\omega)) \, d\mu \ge \int_\Omega \left(a\, g(\omega) + b\right) d\mu = a \int_\Omega g \, d\mu + b = a x_0 + b = \varphi(x_0) = \varphi\!\left(\int_\Omega g \, d\mu\right),$$

as desired.
Proof 3 (general inequality in a probabilistic setting)
Let $X$ be an integrable random variable that takes values in a real topological vector space $T$. Since $\varphi : T \to \mathbb{R}$ is convex, for any $x, y \in T$, the quantity

$$\frac{\varphi(x + \theta y) - \varphi(x)}{\theta}$$

is decreasing as $\theta$ approaches $0^+$. In particular, the subdifferential of $\varphi$ evaluated at $x$ in the direction $y$ is well-defined by

$$(D\varphi)(x) \cdot y := \lim_{\theta \downarrow 0} \frac{\varphi(x + \theta y) - \varphi(x)}{\theta} = \inf_{\theta \ne 0} \frac{\varphi(x + \theta y) - \varphi(x)}{\theta}.$$

The subdifferential is linear in $y$ (proving this rigorously requires the Hahn–Banach theorem) and, since the infimum taken in the right-hand side of the previous formula is smaller than the value of the same term for $\theta = 1$, one gets

$$\varphi(x) \le \varphi(x + y) - (D\varphi)(x) \cdot y.$$

In particular, for an arbitrary sub-$\sigma$-algebra $\mathfrak{G}$ we can evaluate the last inequality when $x = \mathbb{E}[X \mid \mathfrak{G}]$, $y = X - \mathbb{E}[X \mid \mathfrak{G}]$ to obtain

$$\varphi\!\left(\mathbb{E}[X \mid \mathfrak{G}]\right) \le \varphi(X) - (D\varphi)\!\left(\mathbb{E}[X \mid \mathfrak{G}]\right) \cdot \left(X - \mathbb{E}[X \mid \mathfrak{G}]\right).$$

Now, if we take the expectation conditioned to $\mathfrak{G}$ on both sides of the previous expression, we get the result since:

$$\mathbb{E}\!\left[\left. (D\varphi)\!\left(\mathbb{E}[X \mid \mathfrak{G}]\right) \cdot \left(X - \mathbb{E}[X \mid \mathfrak{G}]\right) \,\right|\, \mathfrak{G}\right] = (D\varphi)\!\left(\mathbb{E}[X \mid \mathfrak{G}]\right) \cdot \mathbb{E}\!\left[\left. X - \mathbb{E}[X \mid \mathfrak{G}] \,\right|\, \mathfrak{G}\right] = 0,$$

by the linearity of the subdifferential in the $y$ variable, and the following well-known property of the conditional expectation:

$$\mathbb{E}\!\left[\left. \mathbb{E}[X \mid \mathfrak{G}] \,\right|\, \mathfrak{G}\right] = \mathbb{E}[X \mid \mathfrak{G}].$$
Applications and special cases
Form involving a probability density function
Suppose $\Omega$ is a measurable subset of the real line and $f(x)$ is a non-negative function such that

$$\int_{-\infty}^{\infty} f(x)\, dx = 1.$$

In probabilistic language, $f$ is a probability density function.

Then Jensen's inequality becomes the following statement about convex integrals:

If $g$ is any real-valued measurable function and $\varphi$ is convex over the range of $g$, then

$$\varphi\!\left(\int_{-\infty}^{\infty} g(x) f(x)\, dx\right) \le \int_{-\infty}^{\infty} \varphi(g(x)) f(x)\, dx.$$

If $g(x) = x$, then this form of the inequality reduces to a commonly used special case:

$$\varphi\!\left(\int_{-\infty}^{\infty} x f(x)\, dx\right) \le \int_{-\infty}^{\infty} \varphi(x) f(x)\, dx.$$
This is applied in Variational Bayesian methods.
If $g(x) = x^{2n}$, and $X$ is a random variable, then $g$ is convex as

$$\frac{d^2 g}{dx^2}(x) = 2n(2n - 1)x^{2n-2} \ge 0 \quad \forall x \in \mathbb{R},$$

and so

$$g(\mathbb{E}[X]) = (\mathbb{E}[X])^{2n} \le \mathbb{E}\left[X^{2n}\right].$$

In particular, if some even moment $2n$ of $X$ is finite, $X$ has a finite mean. An extension of this argument shows $X$ has finite moments of every order $l \in \mathbb{N}$ dividing $n$.
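The even-moment bound can be illustrated empirically; applied to the empirical distribution of a sample, it holds exactly since $x^{2n}$ is convex. A minimal sketch (the Gaussian sample and $n = 2$ are illustrative choices):

```python
import random

random.seed(2)
n = 2  # g(x) = x**(2n) = x**4 is convex
xs = [random.gauss(1.0, 3.0) for _ in range(50_000)]

mean = sum(xs) / len(xs)                          # E[X]
moment_2n = sum(x ** (2 * n) for x in xs) / len(xs)  # E[X^(2n)]

# Jensen for g(x) = x^(2n): (E[X])^(2n) <= E[X^(2n)]
print(mean ** (2 * n) <= moment_2n)  # True
```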
Alternative finite form
Let $\Omega = \{x_1, \ldots, x_n\}$, and take $\mu$ to be the counting measure on $\Omega$; then the general form reduces to a statement about sums:

$$\varphi\!\left(\sum_{i=1}^{n} \lambda_i x_i\right) \le \sum_{i=1}^{n} \lambda_i \varphi(x_i),$$

provided that $\lambda_i \ge 0$ and

$$\lambda_1 + \cdots + \lambda_n = 1.$$

There is also an infinite discrete form.
Statistical physics
Jensen's inequality is of particular importance in statistical physics when the convex function is an exponential, giving:

$$e^{\mathbb{E}[X]} \le \mathbb{E}\left[e^X\right],$$

where the expected values are with respect to some probability distribution in the random variable $X$.

Proof: Let $\varphi(x) = e^x$ in

$$\varphi(\mathbb{E}[X]) \le \mathbb{E}[\varphi(X)].$$
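The exponential form $e^{\mathbb{E}[X]} \le \mathbb{E}[e^X]$ holds exactly for the empirical distribution of any sample, so it can be demonstrated with a few lines (the uniform sample is an illustrative choice):

```python
import math
import random

random.seed(3)
xs = [random.uniform(-2, 2) for _ in range(10_000)]

mean_x = sum(xs) / len(xs)
exp_of_mean = math.exp(mean_x)                        # exp(E[X])
mean_of_exp = sum(math.exp(x) for x in xs) / len(xs)  # E[exp(X)]

# exp is convex, so exp(E[X]) <= E[exp(X)]
print(exp_of_mean <= mean_of_exp)  # True
```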
Information theory
If $p(x)$ is the true probability density for $X$, and $q(x)$ is another density, then applying Jensen's inequality for the random variable $Y(X) = q(X)/p(X)$ and the convex function $\varphi(y) = -\log(y)$ gives

$$\mathbb{E}[\varphi(Y)] \ge \varphi(\mathbb{E}[Y]).$$

Therefore:

$$-D(p \,\|\, q) = \int p(x) \log\!\left(\frac{q(x)}{p(x)}\right) dx \le \log\!\left(\int p(x) \frac{q(x)}{p(x)}\, dx\right) = \log\!\left(\int q(x)\, dx\right) = 0,$$

a result called Gibbs' inequality.

It shows that the average message length is minimised when codes are assigned on the basis of the true probabilities $p$ rather than any other distribution $q$. The quantity that is non-negative is called the Kullback–Leibler divergence of $q$ from $p$, defined as

$$D(p \,\|\, q) = \int p(x) \log\!\left(\frac{p(x)}{q(x)}\right) dx.$$

Since $-\log(x)$ is a strictly convex function for $x > 0$, it follows that equality holds when $p(x)$ equals $q(x)$ almost everywhere.
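Gibbs' inequality can be checked on a small discrete example. A minimal sketch (the two distributions below are illustrative values on a three-point support):

```python
import math

# Two discrete distributions on the same support (illustrative values).
p = [0.5, 0.3, 0.2]
q = [0.25, 0.25, 0.5]

# Kullback-Leibler divergence D(p || q) = sum p_i * log(p_i / q_i)
kl = sum(pi * math.log(pi / qi) for pi, qi in zip(p, q))
print(kl >= 0)  # True; equals 0 only when p == q
```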
Rao–Blackwell theorem
If $L$ is a convex function and $\mathfrak{G}$ a sub-sigma-algebra, then, from the conditional version of Jensen's inequality, we get

$$L(\mathbb{E}[\delta(X) \mid \mathfrak{G}]) \le \mathbb{E}[L(\delta(X)) \mid \mathfrak{G}] \quad \Longrightarrow \quad \mathbb{E}[L(\mathbb{E}[\delta(X) \mid \mathfrak{G}])] \le \mathbb{E}[L(\delta(X))].$$

So if $\delta(X)$ is some estimator of an unobserved parameter $\theta$ given a vector of observables $X$, and if $T(X)$ is a sufficient statistic for $\theta$, then an improved estimator, in the sense of having a smaller expected loss $L$, can be obtained by calculating

$$\delta_1(X) = \mathbb{E}\left[\delta(X') \mid T(X') = T(X)\right],$$

the expected value of $\delta$ with respect to $\theta$, taken over all possible vectors of observations $X$ compatible with the same value of $T(X)$ as that observed. Further, because $T$ is a sufficient statistic, $\delta_1(X)$ does not depend on $\theta$, hence becomes a statistic.

This result is known as the Rao–Blackwell theorem.
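A simulation makes the improvement concrete. In the sketch below (all choices illustrative: normal samples of size 5, squared-error loss, and the crude estimator "first observation"), the sample mean is both the sufficient statistic and the Rao–Blackwellized estimator $\mathbb{E}[X_1 \mid \bar{X}] = \bar{X}$, so its average loss should be markedly smaller:

```python
import random

random.seed(4)
theta = 2.0          # true parameter (mean of a normal), illustrative
m, n = 20_000, 5     # number of simulated datasets, sample size

loss_naive, loss_rb = 0.0, 0.0
for _ in range(m):
    sample = [random.gauss(theta, 1.0) for _ in range(n)]
    delta = sample[0]               # crude estimator: first observation
    delta_rb = sum(sample) / n      # E[delta | sample mean] = sample mean
    loss_naive += (delta - theta) ** 2   # squared-error loss (convex)
    loss_rb += (delta_rb - theta) ** 2

# Rao-Blackwellization should not increase expected convex loss.
print(loss_rb / m <= loss_naive / m)
```
With 20,000 replications the average losses concentrate near the theoretical variances ($1$ versus $1/5$), so the comparison comes out in favor of the Rao–Blackwellized estimator with overwhelming probability.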
Risk aversion
The relation between risk aversion and declining marginal utility for scalar outcomes can be stated formally with Jensen's inequality: risk aversion can be stated as preferring a certain outcome $\mathbb{E}[x]$ to a fair gamble with potentially larger but uncertain outcome of $x$:

$$U(\mathbb{E}[x]) \ge \mathbb{E}[U(x)].$$

But this is simply Jensen's inequality for a concave $U(x)$: a utility function that exhibits declining marginal utility.[11]
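A worked example with a concave utility illustrates the preference. The sketch below uses illustrative choices ($U(x) = \sqrt{x}$ and a 50/50 gamble between 0 and 100):

```python
import math

# Concave utility with declining marginal utility (illustrative choice).
U = math.sqrt

# Fair gamble: 50/50 between 0 and 100, versus its certain expected value.
expected_outcome = 0.5 * 0 + 0.5 * 100          # E[x] = 50
expected_utility = 0.5 * U(0) + 0.5 * U(100)    # E[U(x)] = 5.0

# Risk aversion: utility of the sure thing exceeds expected utility of the gamble.
print(U(expected_outcome) >= expected_utility)  # True: sqrt(50) ~ 7.07 >= 5.0
```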
See also
- Karamata's inequality for a more general inequality
- Popoviciu's inequality
- Law of averages
- A proof without words of Jensen's inequality
Notes
- ^ Jensen, J. L. W. V. (1906). "Sur les fonctions convexes et les inégalités entre les valeurs moyennes". Acta Mathematica. 30 (1): 175–193. doi:10.1007/BF02418571.
- ^ Guessab, A.; Schmeisser, G. (2013). "Necessary and sufficient conditions for the validity of Jensen's inequality". Archiv der Mathematik. 100 (6): 561–570. doi:10.1007/s00013-013-0522-3. MR 3069109. S2CID 56372266.
- ^ Dekking, F. M.; Kraaikamp, C.; Lopuhaa, H. P.; Meester, L. E. (2005). A Modern Introduction to Probability and Statistics: Understanding Why and How. Springer Texts in Statistics. London: Springer. doi:10.1007/1-84628-168-7. ISBN 978-1-85233-896-1.
- ^ Gao, Xiang; Sitharam, Meera; Roitberg, Adrian (2019). "Bounds on the Jensen Gap, and Implications for Mean-Concentrated Distributions" (PDF). The Australian Journal of Mathematical Analysis and Applications. 16 (2). arXiv:1712.05267.
- ^ p. 25 of Rick Durrett (2019). Probability: Theory and Examples (5th ed.). Cambridge University Press. ISBN 978-1108473682.
- ^ Niculescu, Constantin P. "Integral inequalities", p. 12.
- ^ p. 29 of Rick Durrett (2019). Probability: Theory and Examples (5th ed.). Cambridge University Press. ISBN 978-1108473682.
- ^ Attention: In this generality additional assumptions on the convex function and/or the topological vector space are needed; see Example (1.3) on p. 53 in Perlman, Michael D. (1974). "Jensen's Inequality for a Convex Vector-Valued Function on an Infinite-Dimensional Space". Journal of Multivariate Analysis. 4 (1): 52–65. doi:10.1016/0047-259X(74)90005-0. hdl:11299/199167.
- ^ Liao, J.; Berg, A. (2018). "Sharpening Jensen's Inequality". American Statistician. 73 (3): 278–281. arXiv:1707.08644. doi:10.1080/00031305.2017.1419145. S2CID 88515366.
- ^ Bradley, C. J. (2006). Introduction to Inequalities. Leeds, United Kingdom: United Kingdom Mathematics Trust. p. 97. ISBN 978-1-906001-11-7.
- ^ Back, Kerry (2010). Asset Pricing and Portfolio Choice Theory. Oxford University Press. p. 5. ISBN 978-0-19-538061-3.
External links
- Jensen's Operator Inequality of Hansen and Pedersen.
- "Jensen inequality", Encyclopedia of Mathematics, EMS Press, 2001 [1994]
- Weisstein, Eric W. "Jensen's inequality". MathWorld.
- Arthur Lohwater (1982). "Introduction to Inequalities". Online e-book in PDF format.