Generalization of the Legendre transformation
In mathematics and mathematical optimization, the convex conjugate of a function is a generalization of the Legendre transformation which applies to non-convex functions. It is also known as the Legendre–Fenchel transformation, Fenchel transformation, or Fenchel conjugate (after Adrien-Marie Legendre and Werner Fenchel). The convex conjugate is widely used for constructing the dual problem in optimization theory, thus generalizing Lagrangian duality.
Let $X$ be a real topological vector space and let $X^{*}$ be the dual space to $X$. Denote by
$$\langle \cdot ,\cdot \rangle : X^{*} \times X \to \mathbb{R}$$
the canonical dual pairing, which is defined by $\left\langle x^{*}, x \right\rangle \mapsto x^{*}(x)$.
For a function $f : X \to \mathbb{R} \cup \{-\infty, +\infty\}$ taking values on the extended real number line, its convex conjugate is the function $f^{*} : X^{*} \to \mathbb{R} \cup \{-\infty, +\infty\}$ whose value at $x^{*} \in X^{*}$ is defined to be the supremum:
$$f^{*}\left(x^{*}\right) := \sup \left\{ \left\langle x^{*}, x \right\rangle - f(x) : x \in X \right\},$$
or, equivalently, in terms of the infimum:
$$f^{*}\left(x^{*}\right) := -\inf \left\{ f(x) - \left\langle x^{*}, x \right\rangle : x \in X \right\}.$$
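As a quick numerical illustration of the definition (a sketch added here, not part of the source article): the supremum can be approximated by maximizing over a finite grid of points. The helper name `conjugate` and the grid ranges below are illustrative assumptions.

```python
import numpy as np

def conjugate(f, xs):
    """Approximate f*(x*) = sup { <x*, x> - f(x) : x in X } over a finite grid xs."""
    fx = f(xs)
    return lambda x_star: np.max(x_star * xs - fx)

# For f(x) = x^2 the conjugate is f*(x*) = (x*)^2 / 4; at x* = 2 the
# supremum of 2x - x^2 is attained at x = 1 with value 1.
xs = np.linspace(-10, 10, 100001)
f_star = conjugate(lambda x: x**2, xs)
print(f_star(2.0))  # ≈ 1.0
```

The grid must be wide enough to contain the maximizer, otherwise the discrete supremum underestimates the true one.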
This definition can be interpreted as an encoding of the convex hull of the function's epigraph in terms of its supporting hyperplanes.[1]
For more examples, see § Table of selected convex conjugates.
The convex conjugate of an affine function $f(x) = \langle a, x \rangle - b$ is
$$f^{*}\left(x^{*}\right) = \begin{cases} b, & x^{*} = a \\ +\infty, & x^{*} \neq a. \end{cases}$$
The convex conjugate of a power function $f(x) = \tfrac{1}{p}|x|^{p}$, $1 < p < \infty$, is $f^{*}\left(x^{*}\right) = \tfrac{1}{q}|x^{*}|^{q}$, $1 < q < \infty$, where $\tfrac{1}{p} + \tfrac{1}{q} = 1$.
The convex conjugate of the absolute value function $f(x) = |x|$ is
$$f^{*}\left(x^{*}\right) = \begin{cases} 0, & |x^{*}| \leq 1 \\ \infty, & |x^{*}| > 1. \end{cases}$$
The convex conjugate of the exponential function $f(x) = e^{x}$ is
$$f^{*}\left(x^{*}\right) = \begin{cases} x^{*} \ln x^{*} - x^{*}, & x^{*} > 0 \\ 0, & x^{*} = 0 \\ \infty, & x^{*} < 0. \end{cases}$$
The convex conjugate and Legendre transform of the exponential function agree, except that the domain of the convex conjugate is strictly larger, as the Legendre transform is only defined for positive real numbers.
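The exponential case can be checked numerically (a discretized sketch under assumed grid parameters, not from the source): the supremum of $x^{*}x - e^{x}$ is attained at $x = \ln x^{*}$, recovering $x^{*}\ln x^{*} - x^{*}$ for $x^{*} > 0$.

```python
import numpy as np

# Discrete supremum of x*·x - e^x over a grid, compared with the
# closed-form conjugate x* ln(x*) - x* for several x* > 0.
xs = np.linspace(-20, 20, 400001)

def exp_conjugate(x_star):
    return np.max(x_star * xs - np.exp(xs))

for s in (0.5, 1.0, 3.0):
    exact = s * np.log(s) - s
    assert abs(exp_conjugate(s) - exact) < 1e-4
```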
Connection with expected shortfall (average value at risk)
See this article for an example.
Let F denote the cumulative distribution function of a random variable X. Then (integrating by parts),
$$f(x) := \int_{-\infty}^{x} F(u)\,du = \operatorname{E}\left[\max(0, x - X)\right] = x - \operatorname{E}\left[\min(x, X)\right]$$
has the convex conjugate
$$f^{*}(p) = \int_{0}^{p} F^{-1}(q)\,dq = (p-1)F^{-1}(p) + \operatorname{E}\left[\min(F^{-1}(p), X)\right] = pF^{-1}(p) - \operatorname{E}\left[\max(0, F^{-1}(p) - X)\right].$$
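A concrete check of this conjugate pair (an illustrative sketch, not from the source): for X uniform on [0, 1], $F^{-1}(q) = q$, so the formula gives $f^{*}(p) = p^{2}/2$ on $[0, 1]$, which should agree with the direct definition of the conjugate.

```python
import numpy as np

# For X ~ Uniform(0, 1): F(u) = u on [0, 1], so f(x) = ∫_{-∞}^x F(u) du
# equals 0 for x ≤ 0, x²/2 on [0, 1], and x - 1/2 for x ≥ 1.
def f(x):
    return np.where(x <= 0, 0.0, np.where(x >= 1, x - 0.5, 0.5 * x**2))

# Direct definition f*(p) = sup_x { p·x - f(x) } versus ∫_0^p F^{-1} = p²/2.
xs = np.linspace(-2.0, 3.0, 50001)
for p in (0.25, 0.5, 0.9):
    direct = np.max(p * xs - f(xs))
    assert abs(direct - p**2 / 2) < 1e-4
```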
A particular interpretation has the transform
$$f^{\text{inc}}(x) := \arg\sup_{t}\, t \cdot x - \int_{0}^{1} \max\{t - f(u), 0\}\,du,$$
as this is a nondecreasing rearrangement of the initial function f; in particular, $f^{\text{inc}} = f$ for f nondecreasing.
The convex conjugate of a closed convex function is again a closed convex function. The convex conjugate of a polyhedral convex function (a convex function with polyhedral epigraph) is again a polyhedral convex function.
Declare that $f \leq g$ if and only if $f(x) \leq g(x)$ for all $x$. Then convex-conjugation is order-reversing, which by definition means that if $f \leq g$ then $f^{*} \geq g^{*}$.
For a family of functions $\left(f_{\alpha}\right)_{\alpha}$ it follows from the fact that supremums may be interchanged that
$$\left(\inf_{\alpha} f_{\alpha}\right)^{*}\left(x^{*}\right) = \sup_{\alpha} f_{\alpha}^{*}\left(x^{*}\right),$$
and from the max–min inequality that
$$\left(\sup_{\alpha} f_{\alpha}\right)^{*}\left(x^{*}\right) \leq \inf_{\alpha} f_{\alpha}^{*}\left(x^{*}\right).$$
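The first identity can be sanity-checked on a grid (a numerical sketch with an assumed two-member family of shifted parabolas):

```python
import numpy as np

# Check (inf_α f_α)* = sup_α f_α* for f_1(x) = (x-1)², f_2(x) = (x+1)².
xs = np.linspace(-10, 10, 20001)
f1, f2 = (xs - 1)**2, (xs + 1)**2

def conj(h, s):                      # discrete sup { s·x - h(x) }
    return np.max(s * xs - h)

for s in (-2.0, 0.0, 1.5):
    lhs = conj(np.minimum(f1, f2), s)       # conjugate of the pointwise infimum
    rhs = max(conj(f1, s), conj(f2, s))     # supremum of the conjugates
    assert abs(lhs - rhs) < 1e-9
```

On a finite grid the identity even holds exactly, since $\sup_x \max_\alpha$ and $\max_\alpha \sup_x$ coincide term by term.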
The convex conjugate of a function is always lower semi-continuous. The biconjugate $f^{**}$ (the convex conjugate of the convex conjugate) is also the closed convex hull, i.e. the largest lower semi-continuous convex function with $f^{**} \leq f$. For proper functions $f$, $f = f^{**}$ if and only if $f$ is convex and lower semi-continuous, by the Fenchel–Moreau theorem.
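The hull property of the biconjugate can be seen numerically (a discretized sketch under assumed grid parameters): for a nonconvex double well, $f^{**}$ drops below $f$ on the nonconvex region.

```python
import numpy as np

# f(x) = min((x-1)², (x+1)²) is nonconvex; its biconjugate f** is the
# closed convex hull, which equals 0 on [-1, 1] while f(0) = 1.
xs = np.linspace(-4, 4, 2001)
f  = np.minimum((xs - 1)**2, (xs + 1)**2)

slopes = np.linspace(-10, 10, 2001)
f_star = np.max(slopes[:, None] * xs[None, :] - f[None, :], axis=1)   # f*
f_bi   = np.max(xs[:, None] * slopes[None, :] - f_star[None, :], axis=1)  # f**

i = len(xs) // 2                       # index of x = 0 on the grid
assert abs(f[i] - 1.0) < 1e-12         # f(0) = 1
assert abs(f_bi[i]) < 1e-6             # f**(0) ≈ 0, the convex hull value
```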
Fenchel's inequality
For any function f and its convex conjugate f*, Fenchel's inequality (also known as the Fenchel–Young inequality) holds for every $x \in X$ and $p \in X^{*}$:
$$\left\langle p, x \right\rangle \leq f(x) + f^{*}(p).$$
Furthermore, the equality holds only when $p \in \partial f(x)$.
The proof follows from the definition of the convex conjugate:
$$f^{*}(p) = \sup_{\tilde{x}} \left\{ \langle p, \tilde{x} \rangle - f(\tilde{x}) \right\} \geq \langle p, x \rangle - f(x).$$
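The inequality is easy to exercise numerically (an illustrative sketch with an assumed quadratic example): for $f(x) = x^{2}/2$ the conjugate is $f^{*}(p) = p^{2}/2$, and the inequality reduces to $px \leq (x^{2} + p^{2})/2$, with equality exactly when $p = f'(x) = x$.

```python
import numpy as np

# Fenchel's inequality <p, x> ≤ f(x) + f*(p) for f(x) = x²/2, f*(p) = p²/2.
f      = lambda x: 0.5 * x**2
f_star = lambda p: 0.5 * p**2

rng = np.random.default_rng(0)
for x, p in rng.normal(size=(100, 2)):
    assert p * x <= f(x) + f_star(p) + 1e-12

# Equality at p = x, i.e. p in the subdifferential of f at x.
assert abs(3.0 * 3.0 - (f(3.0) + f_star(3.0))) < 1e-12
```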
For two functions $f_{0}$ and $f_{1}$ and a number $0 \leq \lambda \leq 1$ the convexity relation
$$\left((1 - \lambda)f_{0} + \lambda f_{1}\right)^{*} \leq (1 - \lambda)f_{0}^{*} + \lambda f_{1}^{*}$$
holds. The ${*}$ operation is a convex mapping itself.
Infimal convolution
The infimal convolution (or epi-sum) of two functions $f$ and $g$ is defined as
$$\left(f \mathbin{\Box} g\right)(x) = \inf \left\{ f(x - y) + g(y) \mid y \in \mathbb{R}^{n} \right\}.$$
Let $f_{1}, \ldots, f_{m}$ be proper, convex and lower semicontinuous functions on $\mathbb{R}^{n}$. Then the infimal convolution is convex and lower semicontinuous (but not necessarily proper),[2] and satisfies
$$\left(f_{1} \mathbin{\Box} \cdots \mathbin{\Box} f_{m}\right)^{*} = f_{1}^{*} + \cdots + f_{m}^{*}.$$
The infimal convolution of two functions has a geometric interpretation: the (strict) epigraph of the infimal convolution of two functions is the Minkowski sum of the (strict) epigraphs of those functions.[3]
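The conjugation identity above can be sanity-checked on a grid (a discretized sketch; the helper names `inf_conv` and `conj` are illustrative, and grid truncation limits the accuracy). For $f(x) = x^{2}$ and $g(x) = 2x^{2}$ one has $f \mathbin{\Box} g = \tfrac{2}{3}x^{2}$, and both sides of the identity equal $3s^{2}/8$.

```python
import numpy as np

# Check (f □ g)* = f* + g* for f(x) = x² and g(x) = 2x² on a grid.
xs = np.linspace(-20, 20, 2001)
f, g = xs**2, 2 * xs**2

def inf_conv(x):                       # discrete inf_y { f(x - y) + g(y) }
    shifted = np.interp(x - xs, xs, f, left=np.inf, right=np.inf)
    return np.min(shifted + g)

def conj(h, s):                        # discrete sup { s·x - h(x) }
    return np.max(s * xs - h)

epi_sum = np.array([inf_conv(x) for x in xs])
for s in (-1.0, 0.0, 2.0):
    assert abs(conj(epi_sum, s) - (conj(f, s) + conj(g, s))) < 1e-2
```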
Maximizing argument
If the function $f$ is differentiable, then its derivative is the maximizing argument in the computation of the convex conjugate:
$$f'(x) = x^{*}(x) := \arg\sup_{x^{*}} \langle x, x^{*} \rangle - f^{*}\left(x^{*}\right)$$
and
$$f^{*\prime}\left(x^{*}\right) = x\left(x^{*}\right) := \arg\sup_{x} \langle x, x^{*} \rangle - f(x);$$
hence
$$x = \nabla f^{*}\left(\nabla f(x)\right),$$
$$x^{*} = \nabla f\left(\nabla f^{*}\left(x^{*}\right)\right),$$
and moreover
$$f''(x) \cdot f^{*\prime\prime}\left(x^{*}(x)\right) = 1,$$
$$f^{*\prime\prime}\left(x^{*}\right) \cdot f''\left(x\left(x^{*}\right)\right) = 1.$$
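These relations can be illustrated with the exponential pair from the examples (a sketch, not from the source): for $f(x) = e^{x}$ with $f^{*}(x^{*}) = x^{*}\ln x^{*} - x^{*}$, the gradients $\nabla f(x) = e^{x}$ and $\nabla f^{*}(s) = \ln s$ are mutually inverse, and the second-derivative products equal 1.

```python
import numpy as np

# Gradient maps of the conjugate pair f(x) = eˣ, f*(s) = s ln s - s.
grad_f      = lambda x: np.exp(x)      # ∇f
grad_f_star = lambda s: np.log(s)      # ∇f*

for x in (-1.0, 0.0, 2.5):
    # x = ∇f*(∇f(x)) and f''(x)·f*''(eˣ) = eˣ·(1/eˣ) = 1
    assert abs(grad_f_star(grad_f(x)) - x) < 1e-12
    assert abs(np.exp(x) * (1.0 / np.exp(x)) - 1.0) < 1e-12
for s in (0.5, 1.0, 3.0):
    # s = ∇f(∇f*(s))
    assert abs(grad_f(grad_f_star(s)) - s) < 1e-12
```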
If for some $\gamma > 0$, $g(x) = \alpha + \beta x + \gamma \cdot f\left(\lambda x + \delta\right)$, then
$$g^{*}\left(x^{*}\right) = -\alpha - \delta\,\frac{x^{*} - \beta}{\lambda} + \gamma \cdot f^{*}\left(\frac{x^{*} - \beta}{\lambda\gamma}\right).$$
Let $A : X \to Y$ be a bounded linear operator. For any convex function $f$ on $X$,
$$\left(Af\right)^{*} = f^{*}A^{*}$$
where
$$(Af)(y) = \inf\{f(x) : x \in X,\ Ax = y\}$$
is the preimage of $f$ with respect to $A$ and $A^{*}$ is the adjoint operator of $A$.[4]
A closed convex function $f$ is symmetric with respect to a given set $G$ of orthogonal linear transformations,
$$f(Ax) = f(x) \quad \text{for all } x \text{ and all } A \in G,$$
if and only if its convex conjugate $f^{*}$ is symmetric with respect to $G$.
Table of selected convex conjugates
The following table provides Legendre transforms for many common functions as well as a few useful properties.[5]
| $g(x)$ | $\operatorname{dom}(g)$ | $g^{*}(x^{*})$ | $\operatorname{dom}(g^{*})$ |
| --- | --- | --- | --- |
| $f(ax)$ (where $a \neq 0$) | $X$ | $f^{*}\left(\frac{x^{*}}{a}\right)$ | $X^{*}$ |
| $f(x + b)$ | $X$ | $f^{*}(x^{*}) - \langle b, x^{*} \rangle$ | $X^{*}$ |
| $af(x)$ (where $a > 0$) | $X$ | $af^{*}\left(\frac{x^{*}}{a}\right)$ | $X^{*}$ |
| $\alpha + \beta x + \gamma \cdot f(\lambda x + \delta)$ | $X$ | $-\alpha - \delta\,\frac{x^{*} - \beta}{\lambda} + \gamma \cdot f^{*}\left(\frac{x^{*} - \beta}{\gamma\lambda}\right) \quad (\gamma > 0)$ | $X^{*}$ |
| $\frac{\lvert x \rvert^{p}}{p}$ (where $p > 1$) | $\mathbb{R}$ | $\frac{\lvert x^{*} \rvert^{q}}{q}$ (where $\frac{1}{p} + \frac{1}{q} = 1$) | $\mathbb{R}$ |
| $\frac{-x^{p}}{p}$ (where $0 < p < 1$) | $\mathbb{R}_{+}$ | $\frac{-(-x^{*})^{q}}{q}$ (where $\frac{1}{p} + \frac{1}{q} = 1$) | $\mathbb{R}_{--}$ |
| $\sqrt{1 + x^{2}}$ | $\mathbb{R}$ | $-\sqrt{1 - (x^{*})^{2}}$ | $[-1, 1]$ |
| $-\log(x)$ | $\mathbb{R}_{++}$ | $-(1 + \log(-x^{*}))$ | $\mathbb{R}_{--}$ |
| $e^{x}$ | $\mathbb{R}$ | $\begin{cases} x^{*}\log(x^{*}) - x^{*} & \text{if } x^{*} > 0 \\ 0 & \text{if } x^{*} = 0 \end{cases}$ | $\mathbb{R}_{+}$ |
| $\log\left(1 + e^{x}\right)$ | $\mathbb{R}$ | $\begin{cases} x^{*}\log(x^{*}) + (1 - x^{*})\log(1 - x^{*}) & \text{if } 0 < x^{*} < 1 \\ 0 & \text{if } x^{*} = 0, 1 \end{cases}$ | $[0, 1]$ |
| $-\log\left(1 - e^{x}\right)$ | $\mathbb{R}_{--}$ | $\begin{cases} x^{*}\log(x^{*}) - (1 + x^{*})\log(1 + x^{*}) & \text{if } x^{*} > 0 \\ 0 & \text{if } x^{*} = 0 \end{cases}$ | $\mathbb{R}_{+}$ |
^ "Legendre Transform". Retrieved April 14, 2019.
^ Phelps, Robert (1993). Convex Functions, Monotone Operators and Differentiability (2nd ed.). Springer. p. 42. ISBN 0-387-56715-1.
^ Bauschke, Heinz H.; Goebel, Rafal; Lucet, Yves; Wang, Xianfu (2008). "The Proximal Average: Basic Theory". SIAM Journal on Optimization. 19 (2): 766. CiteSeerX 10.1.1.546.4270. doi:10.1137/070687542.
^ Ioffe, A.D.; Tichomirov, V.M. (1979). Theorie der Extremalaufgaben. Deutscher Verlag der Wissenschaften. Satz 3.4.3.
^ Borwein, Jonathan; Lewis, Adrian (2006). Convex Analysis and Nonlinear Optimization: Theory and Examples (2nd ed.). Springer. pp. 50–51. ISBN 978-0-387-29570-1.