Jump to content

Mercer's theorem

From Wikipedia, the free encyclopedia

Inmathematics,specificallyfunctional analysis,Mercer's theoremis a representation of a symmetricpositive-definitefunction on a square as a sum of a convergent sequence of product functions. This theorem, presented in (Mercer 1909), is one of the most notable results of the work ofJames Mercer(1883–1932). It is an important theoretical tool in the theory ofintegral equations;it is used in theHilbert spacetheory ofstochastic processes,for example theKarhunen–Loève theorem;and it is also used in thereproducing kernel Hilbert spacetheory where it characterizes a symmetricpositive-definite kernelas a reproducing kernel.[1]

Introduction[edit]

To explain Mercer'stheorem,we first consider an important special case; seebelowfor a more general formulation. Akernel,in this context, is a symmetric continuous function

where symmetric means thatfor all.

Kis said to be apositive-definite kernelif and only if

for all finite sequences of pointsx1,...,xnof [a,b] and all choices of real numbersc1,...,cn.Note that the term "positive-definite" is well-established in literature despite the weak inequality in the definition.[2][3]

Associated toKis alinear operator(more specifically aHilbert–Schmidt integral operator) on functions defined by the integral

For technical considerations we assumecan range through the space L2[a,b] (seeLp space) of square-integrable real-valued functions. SinceTKis a linear operator, we can talk abouteigenvaluesandeigenfunctionsofTK.

Theorem.SupposeKis a continuous symmetric positive-definite kernel. Then there is anorthonormal basis {ei}iofL2[a,b] consisting of eigenfunctions ofTKsuch that the corresponding sequence of eigenvalues {λi}iis nonnegative. The eigenfunctions corresponding to non-zero eigenvalues are continuous on [a,b] andKhas the representation

where the convergence is absolute and uniform.

Details[edit]

We now explain in greater detail the structure of the proof of Mercer's theorem, particularly how it relates tospectral theory of compact operators.

  • The mapKTKisinjective.
  • TKis a non-negative symmetric compact operator onL2[a,b]; moreoverK(x,x) ≥ 0.

To show compactness, show that the image of theunit ballofL2[a,b] underTKequicontinuousand applyAscoli's theorem,to show that the image of the unit ball is relatively compact in C([a,b]) with theuniform normanda fortioriinL2[a,b].

Now apply thespectral theoremfor compact operators on Hilbert spaces toTKto show the existence of the orthonormal basis {ei}iof L2[a,b]

If λi≠ 0, the eigenvector (eigenfunction)eiis seen to be continuous on [a,b]. Now

which shows that the sequence

converges absolutely and uniformly to a kernelK0which is easily seen to define the same operator as the kernelK.HenceK=K0from which Mercer's theorem follows.

Finally, to show non-negativity of the eigenvalues one can writeand expressing the right hand side as an integral well-approximated by its Riemann sums, which are non-negative by positive-definiteness ofK,implying,implying.

Trace[edit]

The following is immediate:

Theorem.SupposeKis a continuous symmetric positive-definite kernel;TKhas a sequence of nonnegative eigenvalues {λi}i.Then

This shows that the operatorTKis atrace classoperator and

Generalizations[edit]

Mercer's theorem itself is a generalization of the result that anysymmetricpositive-semidefinite matrixis theGramian matrixof a set of vectors.

The first generalization[citation needed]replaces the interval [a,b] with anycompact Hausdorff spaceand Lebesgue measure on [a,b] is replaced by a finite countably additive measure μ on theBorel algebraofXwhose support isX.This means that μ(U) > 0 for any nonempty open subsetUofX.

A recent generalization[citation needed]replaces these conditions by the following: the setXis afirst-countabletopological space endowed with a Borel (complete) measure μ.Xis the support of μ and, for allxinX,there is an open setUcontainingxand having finite measure. Then essentially the same result holds:

Theorem.SupposeKis a continuous symmetric positive-definite kernel onX.If the function κ isL1μ(X), where κ(x)=K(x,x), for allxinX,then there is anorthonormal set {ei}iofL2μ(X) consisting of eigenfunctions ofTKsuch that corresponding sequence of eigenvalues {λi}iis nonnegative. The eigenfunctions corresponding to non-zero eigenvalues are continuous onXandKhas the representation

where the convergence is absolute and uniform on compact subsets ofX.

The next generalization[citation needed]deals with representations ofmeasurablekernels.

Let (X,M,μ) be a σ-finite measure space. AnL2(or square-integrable) kernel onXis a function

L2kernels define a bounded operatorTKby the formula

TKis a compact operator (actually it is even aHilbert–Schmidt operator). If the kernelKis symmetric, by thespectral theorem,TKhas an orthonormal basis of eigenvectors. Those eigenvectors that correspond to non-zero eigenvalues can be arranged in a sequence {ei}i(regardless of separability).

Theorem.IfKis a symmetric positive-definite kernel on (X,M,μ), then

where the convergence in theL2norm. Note that when continuity of the kernel is not assumed, the expansion no longer converges uniformly.

Mercer's condition[edit]

Inmathematics,areal-valuedfunctionK(x,y) is said to fulfillMercer's conditionif for allsquare-integrable functionsg(x) one has

Discrete analog[edit]

This is analogous to the definition of apositive-semidefinite matrix.This is a matrixof dimension,which satisfies, for all vectors,the property

.

Examples[edit]

A positive constant function

satisfies Mercer's condition, as then the integral becomes byFubini's theorem

which is indeednon-negative.

See also[edit]

Notes[edit]

  1. ^Bartlett, Peter (2008)."Reproducing Kernel Hilbert Spaces"(PDF).Lecture notes of CS281B/Stat241B Statistical Learning Theory.University of California at Berkeley.
  2. ^Mohri, Mehryar (2018).Foundations of machine learning.Afshin Rostamizadeh, Ameet Talwalkar (Second ed.). Cambridge, Massachusetts.ISBN978-0-262-03940-6.OCLC1041560990.{{cite book}}:CS1 maint: location missing publisher (link)
  3. ^Berlinet, A. (2004).Reproducing kernel Hilbert spaces in probability and statistics.Christine Thomas-Agnan. New York: Springer Science+Business Media.ISBN1-4419-9096-8.OCLC844346520.

References[edit]

  • Adriaan Zaanen,Linear Analysis,North Holland Publishing Co., 1960,
  • Ferreira, J. C., Menegatto, V. A.,Eigenvalues of integral operators defined by smooth positive definite kernels,Integral equation and Operator Theory, 64 (2009), no. 1, 61–81. (Gives the generalization of Mercer's theorem for metric spaces. The result is easily adapted to first countable topological spaces)
  • Konrad Jörgens,Linear integral operators,Pitman, Boston, 1982,
  • Richard CourantandDavid Hilbert,Methods of Mathematical Physics,vol 1, Interscience 1953,
  • Robert Ash,Information Theory,Dover Publications, 1990,
  • Mercer, J. (1909), "Functions of positive and negative type and their connection with the theory of integral equations",Philosophical Transactions of the Royal Society A,209(441–458): 415–446,Bibcode:1909RSPTA.209..415M,doi:10.1098/rsta.1909.0016,
  • "Mercer theorem",Encyclopedia of Mathematics,EMS Press,2001 [1994]
  • H. König,Eigenvalue distribution of compact operators,Birkhäuser Verlag, 1986. (Gives the generalization of Mercer's theorem for finite measures μ.)