
Adjoint state method

From Wikipedia, the free encyclopedia

The adjoint state method is a numerical method for efficiently computing the gradient of a function or operator in a numerical optimization problem.[1] It has applications in geophysics, seismic imaging, photonics and, more recently, in neural networks.[2]

The adjoint state space is chosen to simplify the physical interpretation of equation constraints.[3]

Adjoint state techniques allow the use of integration by parts, resulting in a form which explicitly contains the physically interesting quantity. An adjoint state equation is introduced, including a new unknown variable.

The adjoint method formulates the gradient of a function with respect to its parameters in a constrained optimization form. By using the dual form of this constrained optimization problem, the gradient can be computed very quickly. A useful property is that the number of computations is independent of the number of parameters for which the gradient is sought. The adjoint method is derived from the dual problem[4] and is used e.g. in the Landweber iteration method.[5]

The name adjoint state method refers to the dual form of the problem, where the adjoint matrix $A^* = \overline{A}^T$ is used.

When the initial problem consists of calculating the product $s^T x$ and $x$ must satisfy $A x = b$, the dual problem can be realized as calculating the product $r^T b$ ($= s^T x$), where $r$ must satisfy $A^* r = s$. And $r$ is called the adjoint state vector.
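A minimal numerical sketch of this duality (real-valued, so $A^* = A^T$; the matrix $A$, vector $s$ and right-hand sides $b$ below are arbitrary illustrative data): one adjoint solve yields the product $s^T x$ for any number of right-hand sides, without solving for $x$ each time.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 5
A = rng.standard_normal((n, n)) + n * np.eye(n)  # a well-conditioned system matrix
s = rng.standard_normal(n)

# Adjoint approach: a single solve of A^T r = s gives the
# adjoint state vector r; then s^T x = r^T b for any b.
r = np.linalg.solve(A.T, s)

for _ in range(3):
    b = rng.standard_normal(n)
    x = np.linalg.solve(A, b)      # direct approach: one solve per b
    assert np.isclose(s @ x, r @ b)  # both routes give the same product
```

The saving is exactly the one described above: the adjoint state $r$ is computed once and then reused, so the cost no longer grows with the number of products to evaluate.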

General case


The original adjoint calculation method goes back to Jean Cea,[6] with the use of the Lagrangian of the optimization problem to compute the derivative of a functional with respect to a shape parameter.

For a state variable $u \in \mathcal{U}$, an optimization variable $v \in \mathcal{V}$, an objective functional $J : \mathcal{U} \times \mathcal{V} \to \mathbb{R}$ is defined. The state variable $u$ is often implicitly dependent on $v$ through the (direct) state equation $D_v(u) = 0$ (usually the weak form of a partial differential equation), thus the considered objective is $j(v) = J(u_v, v)$. Usually, one would be interested in calculating $\nabla j(v)$ using the chain rule:

$$\nabla j(v) = \nabla_v J(u_v, v) + (\nabla_v u_v)^T \nabla_u J(u_v, v).$$

Unfortunately, the term $\nabla_v u_v$ is often very hard to differentiate analytically, since the dependence is defined through an implicit equation. The Lagrangian functional can be used as a workaround for this issue. Since the state equation can be considered as a constraint in the minimization of $j$, the problem

$$\min_{u, v} \; J(u, v) \quad \text{subject to} \quad D_v(u) = 0$$

has an associated Lagrangian functional $\mathcal{L} : \mathcal{U} \times \mathcal{V} \times \mathcal{U} \to \mathbb{R}$ defined by

$$\mathcal{L}(u, v, \lambda) = J(u, v) + \langle D_v(u), \lambda \rangle,$$

where $\lambda \in \mathcal{U}$ is a Lagrange multiplier or adjoint state variable and $\langle \cdot, \cdot \rangle$ is an inner product on $\mathcal{U}$. The method of Lagrange multipliers states that a solution to the problem has to be a stationary point of the Lagrangian, namely

$$d_u \mathcal{L}(u, v, \lambda; \delta_u) = 0 \qquad \forall \delta_u \in \mathcal{U},$$
$$d_v \mathcal{L}(u, v, \lambda; \delta_v) = 0 \qquad \forall \delta_v \in \mathcal{V},$$
$$d_\lambda \mathcal{L}(u, v, \lambda; \delta_\lambda) = 0 \qquad \forall \delta_\lambda \in \mathcal{U},$$

where $d_x \mathcal{L}(u, v, \lambda; \delta_x)$ is the Gateaux derivative of $\mathcal{L}$ with respect to $x$ in the direction $\delta_x$. The last equation is equivalent to $\langle D_v(u), \delta_\lambda \rangle = 0$ for all $\delta_\lambda \in \mathcal{U}$, the state equation, to which the solution is $u_v$. The first equation is the so-called adjoint state equation,

$$\langle \lambda, d_u D_v(u_v; \delta_u) \rangle = - d_u J(u_v, v; \delta_u) \qquad \forall \delta_u \in \mathcal{U},$$

because the operator involved is the adjoint operator of $D_v$, namely $(D_v)^*$. Resolving this equation yields the adjoint state $\lambda_v$. The gradient of the quantity of interest $j$ with respect to $v$ is then $\langle \nabla j(v), \delta_v \rangle = d_v \mathcal{L}(u_v, v, \lambda_v; \delta_v)$ (the second equation, with $u = u_v$ and $\lambda = \lambda_v$), so it can be obtained by successively solving the direct and adjoint state equations. The process is even simpler when the operator $D_v$ is self-adjoint or symmetric, since the direct and adjoint state equations then differ only by their right-hand side.
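The general recipe (solve the direct state equation, solve the adjoint state equation, then read off the gradient from $d_v \mathcal{L}$) can be illustrated on a deliberately small toy problem. The scalar state equation $D_v(u) = u^3 + u - v = 0$ and objective $J(u, v) = u^2$ below are invented for illustration only; the adjoint gradient is checked against a finite difference.

```python
import math

def solve_state(v, u0=0.0, tol=1e-12):
    """Newton's method for the state equation D_v(u) = u**3 + u - v = 0."""
    u = u0
    for _ in range(100):
        f = u**3 + u - v
        u -= f / (3 * u**2 + 1)   # d_u D = 3u^2 + 1 > 0, so the root is unique
        if abs(f) < tol:
            break
    return u

v = 2.0
u = solve_state(v)                 # direct state u_v (here u_v = 1)
# adjoint equation: lam * d_u D(u_v) = -d_u J(u_v), with J(u, v) = u**2
lam = -2 * u / (3 * u**2 + 1)
# gradient: d_v L = lam * d_v D, with d_v D = -1
grad = -lam

# finite-difference check of dj/dv where j(v) = u_v**2
eps = 1e-6
fd = (solve_state(v + eps)**2 - solve_state(v - eps)**2) / (2 * eps)
assert abs(grad - fd) < 1e-6
```

Note that the implicit dependence $\nabla_v u_v$ is never formed: the adjoint state $\lambda_v$ replaces it, exactly as in the derivation above.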

Example: Linear case


In a real finite-dimensional linear programming context, the objective function could be $J(u, v) = \langle A u, v \rangle$, for $v \in \mathbb{R}^n$, $u \in \mathbb{R}^m$ and $A \in \mathbb{R}^{n \times m}$, and let the state equation be $B_v u = b$, with $B_v \in \mathbb{R}^{m \times m}$ and $b \in \mathbb{R}^m$.

The Lagrangian function of the problem is $\mathcal{L}(u, v, \lambda) = \langle A u, v \rangle + \langle B_v u - b, \lambda \rangle$, where $\lambda \in \mathbb{R}^m$.

The derivative of $\mathcal{L}$ with respect to $\lambda$ yields the state equation as shown before, and the state variable is $u_v = B_v^{-1} b$. The derivative of $\mathcal{L}$ with respect to $u$ is equivalent to the adjoint equation, which is, for every $\delta_u \in \mathbb{R}^m$,

$$\langle A \delta_u, v \rangle + \langle B_v \delta_u, \lambda \rangle = 0, \quad \text{i.e.} \quad B_v^T \lambda = - A^T v.$$

Thus, we can write symbolically $\lambda_v = - B_v^{-T} A^T v$. The gradient would be

$$\nabla j(v) = A u_v + d_v B_v : (\lambda_v \otimes u_v),$$

where $d_v B_v$ is a third-order tensor, $\lambda_v \otimes u_v$ is the dyadic product between the direct and adjoint states, and $:$ denotes a double tensor contraction. It is assumed that $B_v$ has a known analytic expression that can be differentiated easily.
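A numerical sketch of this linear example (all matrices here are random illustrative data; $B_v$ is taken affine in $v$, so each slice of the third-order tensor $d_v B_v$ is a constant matrix), with the adjoint gradient verified against finite differences:

```python
import numpy as np

rng = np.random.default_rng(1)
m, n = 4, 3
A = rng.standard_normal((n, m))         # objective J(u, v) = <A u, v>
b = rng.standard_normal(m)
B0 = rng.standard_normal((m, m)) + m * np.eye(m)
C = rng.standard_normal((n, m, m))      # d_v B_v: slice C[k] = dB_v/dv_k
v = rng.standard_normal(n)

def B(v):
    return B0 + np.einsum('k,kij->ij', v, C)   # B_v, affine in v

def j(v):
    u = np.linalg.solve(B(v), b)               # u_v = B_v^{-1} b
    return v @ (A @ u)                         # j(v) = <A u_v, v>

u = np.linalg.solve(B(v), b)                   # direct state
lam = np.linalg.solve(B(v).T, -A.T @ v)        # adjoint state: B_v^T lam = -A^T v
# gradient: A u_v plus the double contraction d_v B_v : (lam ⊗ u)
grad = A @ u + np.einsum('kij,i,j->k', C, lam, u)

eps = 1e-6
fd = np.array([(j(v + eps * e) - j(v - eps * e)) / (2 * eps) for e in np.eye(n)])
assert np.allclose(grad, fd, rtol=1e-4, atol=1e-4)
```

Two linear solves (one direct, one adjoint) give the full gradient in $\mathbb{R}^n$, regardless of $n$.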

Numerical consideration for the self-adjoint case


If the operator $B_v$ is self-adjoint, $B_v = B_v^T$, the direct state equation and the adjoint state equation have the same left-hand side. To avoid ever inverting a matrix, which is numerically very slow, an LU decomposition can be used instead to solve the state equation, in $O(m^3)$ operations for the decomposition and $O(m^2)$ operations for the resolution. The same decomposition can then be reused to solve the adjoint state equation in only $O(m^2)$ operations, since the matrices are the same.
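A sketch of the factor-once, solve-twice idea. Here $B$ is taken symmetric positive definite for illustration, so a Cholesky factorization (the symmetric analogue of LU) is used, and the $O(m^2)$ triangular solves are written out explicitly:

```python
import numpy as np

rng = np.random.default_rng(2)
m = 6
M = rng.standard_normal((m, m))
B = M @ M.T + m * np.eye(m)    # symmetric positive definite, hence self-adjoint
b = rng.standard_normal(m)     # right-hand side of the direct state equation
c = rng.standard_normal(m)     # right-hand side of the adjoint state equation

def forward_sub(L, y):
    """Solve L z = y for lower-triangular L in O(m^2) operations."""
    z = np.zeros_like(y)
    for i in range(len(y)):
        z[i] = (y[i] - L[i, :i] @ z[:i]) / L[i, i]
    return z

def back_sub(U, y):
    """Solve U z = y for upper-triangular U in O(m^2) operations."""
    z = np.zeros_like(y)
    for i in reversed(range(len(y))):
        z[i] = (y[i] - U[i, i+1:] @ z[i+1:]) / U[i, i]
    return z

L = np.linalg.cholesky(B)      # one O(m^3) factorization: B = L L^T
u = back_sub(L.T, forward_sub(L, b))     # direct state, O(m^2)
lam = back_sub(L.T, forward_sub(L, c))   # adjoint state reuses the same factor

assert np.allclose(B @ u, b) and np.allclose(B @ lam, c)
```

Only one cubic-cost factorization is paid; every subsequent direct or adjoint solve is quadratic.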


References

  1. ^ Pollini, Nicolò; Lavan, Oren; Amir, Oded (2018-06-01). "Adjoint sensitivity analysis and optimization of hysteretic dynamic systems with nonlinear viscous dampers". Structural and Multidisciplinary Optimization. 57 (6): 2273–2289. doi:10.1007/s00158-017-1858-2. ISSN 1615-1488. S2CID 125712091.
  2. ^ Chen, Ricky T. Q.; Rubanova, Yulia; Bettencourt, Jesse; Duvenaud, David. "Neural Ordinary Differential Equations". Available online.
  3. ^ Plessix, R-E. "A review of the adjoint-state method for computing the gradient of a functional with geophysical applications". Geophysical Journal International, 2006, 167 (2): 495–503. Free access on GJI website.
  4. ^ McNamara, Antoine; Treuille, Adrien; Popović, Zoran; Stam, Jos (August 2004). "Fluid control using the adjoint method" (PDF). ACM Transactions on Graphics. 23 (3): 449–456. doi:10.1145/1015706.1015744. Archived (PDF) from the original on 29 January 2022. Retrieved 28 October 2022.
  5. ^ Lundvall, Johan (2007). "Data Assimilation in Fluid Dynamics using Adjoint Optimization" (PDF). Sweden: Linköping University of Technology. Archived (PDF) from the original on 9 October 2022. Retrieved 28 October 2022.
  6. ^ Cea, Jean (1986). "Conception optimale ou identification de formes, calcul rapide de la dérivée directionnelle de la fonction coût" [Optimal design or shape identification: fast computation of the directional derivative of the cost function]. ESAIM: Mathematical Modelling and Numerical Analysis (in French). 20 (3): 371–402. doi:10.1051/m2an/1986200303711.
External links
  • A well-written explanation by Errico: What is an adjoint model?
  • Another well-written explanation with worked examples, written by Bradley[1]
  • More technical explanation: A review of the adjoint-state method for computing the gradient of a functional with geophysical applications
  • MIT course[2]
  • MIT notes[3]