
Adjoint state method

From Wikipedia, the free encyclopedia

The adjoint state method is a numerical method for efficiently computing the gradient of a function or operator in a numerical optimization problem.[1] It has applications in geophysics, seismic imaging, photonics and, more recently, in neural networks.[2]

The adjoint state space is chosen to simplify the physical interpretation of equation constraints.[3]

Adjoint state techniques allow the use of integration by parts, resulting in a form which explicitly contains the physically interesting quantity. An adjoint state equation is introduced, including a new unknown variable.

The adjoint method formulates the gradient of a function with respect to its parameters in a constrained optimization form. By using the dual form of this constrained optimization problem, the gradient can be computed very quickly. A useful property is that the number of computations is independent of the number of parameters for which the gradient is sought. The adjoint method is derived from the dual problem[4] and is used e.g. in the Landweber iteration method.[5]

The name adjoint state method refers to the dual form of the problem, where the adjoint matrix $A^* = \overline{A}^T$ is used.

When the initial problem consists of calculating the product $s^T x$ and $x$ must satisfy $A x = b$, the dual problem can be realized as calculating the product $r^T b$ ($= s^T x$), where $r$ must satisfy $A^* r = s$. And $r$ is called the adjoint state vector.
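A minimal numerical sketch of this duality (real-valued, so $A^* = A^T$; the matrix $A$, vector $s$ and right-hand sides $b$ below are arbitrary illustrative data): one adjoint solve yields the product $s^T x$ for any number of right-hand sides, without solving for $x$ each time.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 5
A = rng.standard_normal((n, n)) + n * np.eye(n)  # a well-conditioned system matrix
s = rng.standard_normal(n)

# Adjoint approach: a single solve of A^T r = s gives the
# adjoint state vector r; then s^T x = r^T b for any b.
r = np.linalg.solve(A.T, s)

for _ in range(3):
    b = rng.standard_normal(n)
    x = np.linalg.solve(A, b)      # direct approach: one solve per b
    assert np.isclose(s @ x, r @ b)  # both routes give the same product
```

The saving is exactly the one described above: the adjoint state $r$ is computed once and then reused, so the cost no longer grows with the number of products to evaluate.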

General case


The original adjoint calculation method goes back to Jean Cea,[6] with the use of the Lagrangian of the optimization problem to compute the derivative of a functional with respect to a shape parameter.

For a state variable $u \in \mathcal{U}$, an optimization variable $v \in \mathcal{V}$, an objective functional $J : \mathcal{U} \times \mathcal{V} \to \mathbb{R}$ is defined. The state variable $u$ is often implicitly dependent on $v$ through the (direct) state equation $D_v(u) = 0$ (usually the weak form of a partial differential equation), thus the considered objective is $j(v) = J(u_v, v)$. Usually, one would be interested in calculating $\nabla j(v)$ using the chain rule:

$$\nabla j(v) = \nabla_v J(u_v, v) + (\nabla_v u_v)^T \nabla_u J(u_v, v).$$

Unfortunately, the term $\nabla_v u_v$ is often very hard to differentiate analytically, since the dependence is defined through an implicit equation. The Lagrangian functional can be used as a workaround for this issue. Since the state equation can be considered as a constraint in the minimization of $j$, the problem

$$\min_{u, v} \; J(u, v) \quad \text{subject to} \quad D_v(u) = 0$$

has an associated Lagrangian functional $\mathcal{L} : \mathcal{U} \times \mathcal{V} \times \mathcal{U} \to \mathbb{R}$ defined by

$$\mathcal{L}(u, v, \lambda) = J(u, v) + \langle D_v(u), \lambda \rangle,$$

where $\lambda \in \mathcal{U}$ is a Lagrange multiplier or adjoint state variable and $\langle \cdot, \cdot \rangle$ is an inner product on $\mathcal{U}$. The method of Lagrange multipliers states that a solution to the problem has to be a stationary point of the Lagrangian, namely

$$d_u \mathcal{L}(u, v, \lambda; \delta_u) = 0 \qquad \forall \delta_u \in \mathcal{U},$$
$$d_v \mathcal{L}(u, v, \lambda; \delta_v) = 0 \qquad \forall \delta_v \in \mathcal{V},$$
$$d_\lambda \mathcal{L}(u, v, \lambda; \delta_\lambda) = 0 \qquad \forall \delta_\lambda \in \mathcal{U},$$

where $d_x \mathcal{L}(u, v, \lambda; \delta_x)$ is the Gateaux derivative of $\mathcal{L}$ with respect to $x$ in the direction $\delta_x$. The last equation is equivalent to $\langle D_v(u), \delta_\lambda \rangle = 0$ for all $\delta_\lambda \in \mathcal{U}$, the state equation, to which the solution is $u_v$. The first equation is the so-called adjoint state equation,

$$\langle \lambda, d_u D_v(u_v; \delta_u) \rangle = - d_u J(u_v, v; \delta_u) \qquad \forall \delta_u \in \mathcal{U},$$

because the operator involved is the adjoint operator of $D_v$, namely $(D_v)^*$. Resolving this equation yields the adjoint state $\lambda_v$. The gradient of the quantity of interest $j$ with respect to $v$ is then $\langle \nabla j(v), \delta_v \rangle = d_v \mathcal{L}(u_v, v, \lambda_v; \delta_v)$ (the second equation, with $u = u_v$ and $\lambda = \lambda_v$), so it can be obtained by successively solving the direct and adjoint state equations. The process is even simpler when the operator $D_v$ is self-adjoint or symmetric, since the direct and adjoint state equations then differ only by their right-hand side.
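The general recipe (solve the direct state equation, solve the adjoint state equation, then read off the gradient from $d_v \mathcal{L}$) can be illustrated on a deliberately small toy problem. The scalar state equation $D_v(u) = u^3 + u - v = 0$ and objective $J(u, v) = u^2$ below are invented for illustration only; the adjoint gradient is checked against a finite difference.

```python
import math

def solve_state(v, u0=0.0, tol=1e-12):
    """Newton's method for the state equation D_v(u) = u**3 + u - v = 0."""
    u = u0
    for _ in range(100):
        f = u**3 + u - v
        u -= f / (3 * u**2 + 1)   # d_u D = 3u^2 + 1 > 0, so the root is unique
        if abs(f) < tol:
            break
    return u

v = 2.0
u = solve_state(v)                 # direct state u_v (here u_v = 1)
# adjoint equation: lam * d_u D(u_v) = -d_u J(u_v), with J(u, v) = u**2
lam = -2 * u / (3 * u**2 + 1)
# gradient: d_v L = lam * d_v D, with d_v D = -1
grad = -lam

# finite-difference check of dj/dv where j(v) = u_v**2
eps = 1e-6
fd = (solve_state(v + eps)**2 - solve_state(v - eps)**2) / (2 * eps)
assert abs(grad - fd) < 1e-6
```

Note that the implicit dependence $\nabla_v u_v$ is never formed: the adjoint state $\lambda_v$ replaces it, exactly as in the derivation above.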

Example: Linear case


In a real finite-dimensional linear programming context, the objective function could be $J(u, v) = \langle A u, v \rangle$, for $v \in \mathbb{R}^n$, $u \in \mathbb{R}^m$ and $A \in \mathbb{R}^{n \times m}$, and let the state equation be $B_v u = b$, with $B_v \in \mathbb{R}^{m \times m}$ and $b \in \mathbb{R}^m$.

The Lagrangian function of the problem is $\mathcal{L}(u, v, \lambda) = \langle A u, v \rangle + \langle B_v u - b, \lambda \rangle$, where $\lambda \in \mathbb{R}^m$.

The derivative of $\mathcal{L}$ with respect to $\lambda$ yields the state equation as shown before, and the state variable is $u_v = B_v^{-1} b$. The derivative of $\mathcal{L}$ with respect to $u$ is equivalent to the adjoint equation, which is, for every $\delta_u \in \mathbb{R}^m$,

$$\langle A \delta_u, v \rangle + \langle B_v \delta_u, \lambda \rangle = 0, \quad \text{i.e.} \quad B_v^T \lambda = - A^T v.$$

Thus, we can write symbolically $\lambda_v = - B_v^{-T} A^T v$. The gradient would be

$$\nabla j(v) = A u_v + d_v B_v : (\lambda_v \otimes u_v),$$

where $d_v B_v$ is a third-order tensor, $\lambda_v \otimes u_v$ is the dyadic product between the direct and adjoint states, and $:$ denotes a double tensor contraction. It is assumed that $B_v$ has a known analytic expression that can be differentiated easily.
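A numerical sketch of this linear example (all matrices here are random illustrative data; $B_v$ is taken affine in $v$, so each slice of the third-order tensor $d_v B_v$ is a constant matrix), with the adjoint gradient verified against finite differences:

```python
import numpy as np

rng = np.random.default_rng(1)
m, n = 4, 3
A = rng.standard_normal((n, m))         # objective J(u, v) = <A u, v>
b = rng.standard_normal(m)
B0 = rng.standard_normal((m, m)) + m * np.eye(m)
C = rng.standard_normal((n, m, m))      # d_v B_v: slice C[k] = dB_v/dv_k
v = rng.standard_normal(n)

def B(v):
    return B0 + np.einsum('k,kij->ij', v, C)   # B_v, affine in v

def j(v):
    u = np.linalg.solve(B(v), b)               # u_v = B_v^{-1} b
    return v @ (A @ u)                         # j(v) = <A u_v, v>

u = np.linalg.solve(B(v), b)                   # direct state
lam = np.linalg.solve(B(v).T, -A.T @ v)        # adjoint state: B_v^T lam = -A^T v
# gradient: A u_v plus the double contraction d_v B_v : (lam ⊗ u)
grad = A @ u + np.einsum('kij,i,j->k', C, lam, u)

eps = 1e-6
fd = np.array([(j(v + eps * e) - j(v - eps * e)) / (2 * eps) for e in np.eye(n)])
assert np.allclose(grad, fd, rtol=1e-4, atol=1e-4)
```

Two linear solves (one direct, one adjoint) give the full gradient in $\mathbb{R}^n$, regardless of $n$.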

Numerical consideration for the self-adjoint case


If the operator $B_v$ is self-adjoint, $B_v = B_v^T$, the direct state equation and the adjoint state equation have the same left-hand side. To avoid ever inverting a matrix, which is numerically very slow, an LU decomposition can be used instead to solve the state equation, in $O(m^3)$ operations for the decomposition and $O(m^2)$ operations for the resolution. The same decomposition can then be reused to solve the adjoint state equation in only $O(m^2)$ operations, since the matrices are the same.
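A sketch of the factor-once, solve-twice idea. Here $B$ is taken symmetric positive definite for illustration, so a Cholesky factorization (the symmetric analogue of LU) is used, and the $O(m^2)$ triangular solves are written out explicitly:

```python
import numpy as np

rng = np.random.default_rng(2)
m = 6
M = rng.standard_normal((m, m))
B = M @ M.T + m * np.eye(m)    # symmetric positive definite, hence self-adjoint
b = rng.standard_normal(m)     # right-hand side of the direct state equation
c = rng.standard_normal(m)     # right-hand side of the adjoint state equation

def forward_sub(L, y):
    """Solve L z = y for lower-triangular L in O(m^2) operations."""
    z = np.zeros_like(y)
    for i in range(len(y)):
        z[i] = (y[i] - L[i, :i] @ z[:i]) / L[i, i]
    return z

def back_sub(U, y):
    """Solve U z = y for upper-triangular U in O(m^2) operations."""
    z = np.zeros_like(y)
    for i in reversed(range(len(y))):
        z[i] = (y[i] - U[i, i+1:] @ z[i+1:]) / U[i, i]
    return z

L = np.linalg.cholesky(B)      # one O(m^3) factorization: B = L L^T
u = back_sub(L.T, forward_sub(L, b))     # direct state, O(m^2)
lam = back_sub(L.T, forward_sub(L, c))   # adjoint state reuses the same factor

assert np.allclose(B @ u, b) and np.allclose(B @ lam, c)
```

Only one cubic-cost factorization is paid; every subsequent direct or adjoint solve is quadratic.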


References

  1. ^ Pollini, Nicolò; Lavan, Oren; Amir, Oded (2018-06-01). "Adjoint sensitivity analysis and optimization of hysteretic dynamic systems with nonlinear viscous dampers". Structural and Multidisciplinary Optimization. 57 (6): 2273–2289. doi:10.1007/s00158-017-1858-2. ISSN 1615-1488. S2CID 125712091.
  2. ^ Chen, Ricky T. Q.; Rubanova, Yulia; Bettencourt, Jesse; Duvenaud, David. "Neural Ordinary Differential Equations". Available online.
  3. ^ Plessix, R-E. "A review of the adjoint-state method for computing the gradient of a functional with geophysical applications". Geophysical Journal International, 2006, 167 (2): 495–503. Free access on GJI website.
  4. ^ McNamara, Antoine; Treuille, Adrien; Popović, Zoran; Stam, Jos (August 2004). "Fluid control using the adjoint method" (PDF). ACM Transactions on Graphics. 23 (3): 449–456. doi:10.1145/1015706.1015744. Archived (PDF) from the original on 29 January 2022. Retrieved 28 October 2022.
  5. ^ Lundvall, Johan (2007). "Data Assimilation in Fluid Dynamics using Adjoint Optimization" (PDF). Sweden: Linköping University of Technology. Archived (PDF) from the original on 9 October 2022. Retrieved 28 October 2022.
  6. ^ Cea, Jean (1986). "Conception optimale ou identification de formes, calcul rapide de la dérivée directionnelle de la fonction coût" [Optimal design or shape identification: fast computation of the directional derivative of the cost function]. ESAIM: Mathematical Modelling and Numerical Analysis (in French). 20 (3): 371–402. doi:10.1051/m2an/1986200303711.
External links
  • A well-written explanation by Errico: What is an adjoint model?
  • Another well-written explanation with worked examples, written by Bradley[1]
  • More technical explanation: A review of the adjoint-state method for computing the gradient of a functional with geophysical applications
  • MIT course[2]
  • MIT notes[3]