
Daniel Alazard

Pontryagin’s minimum principle (case : Linear system, Quadratic performance index, finite time horizon)


Optimal control

Simplified case : Linear system, quadratic performance index, fixed horizon and final state


Problem :

Let us consider the linear system :

\dot{\mathbf{x}}(t)=\mathbf{A}\mathbf{x}(t)+\mathbf{B} \mathbf{u}(t)~; \quad \mathbf{x}\in\mathbb{R}^n~; \quad \mathbf{u}\in\mathbb{R}^m\quad (1)

From a given initial state \mathbf{x}_0=\mathbf{x}(0), the objective is to drive the state to 0 within a given time horizon t_f (i.e. \mathbf{x}(t_f)=0) while minimizing the quadratic performance index :

 J=\frac{1}{2}\int_0^{t_f} (\mathbf{x}^T(t)\mathbf{Q}\mathbf{x}(t)+ \mathbf{u}^T(t)\mathbf{R}\mathbf{u}(t))dt

where \mathbf{Q} and \mathbf{R} are given weighting matrices with \mathbf{Q}\ge 0 and \mathbf{R}>0.

Solution using Pontryagin’s minimum principle :

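  • The Hamiltonian reads :

 \mathcal{H}=\frac{1}{2}(\mathbf{x}^T\mathbf{Q}\mathbf{x}+ \mathbf{u}^T\mathbf{R}\mathbf{u})+\mathbf{\Psi}^T(\mathbf{A}\mathbf{x}+\mathbf{B}\mathbf{u})

where \mathbf{\Psi}\in\mathbb{R}^n is the costate vector.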

  • the optimal control minimizes \mathcal{H}\quad \forall t :

 \frac{\partial \mathcal{H}}{\partial  \mathbf{u}}_{|\mathbf{u}=\widehat{\mathbf{u}}}=0=\mathbf{R}\widehat{\mathbf{u}}+\mathbf{B}^T\mathbf{\Psi}\quad\Rightarrow\quad\widehat{\mathbf{u}}=-\mathbf{R}^{-1}\mathbf{B}^T\mathbf{\Psi}\quad(2)

  • Costate dynamics :

\dot{\mathbf{\Psi}}=-\frac{\partial \mathcal{H}}{\partial \mathbf{x}} \Rightarrow \dot{\mathbf{\Psi}}=-\mathbf{Q}\mathbf{x}- \mathbf{A}^T\mathbf{\Psi} \quad (3)

  • State-costate dynamics : (1), (2) and (3) lead to :

 \left\{\begin{array}{ccccc}\dot{\mathbf{x}} & = & \mathbf{A}\mathbf{x} &-& \mathbf{B}\mathbf{R}^{-1}\mathbf{B}^T\mathbf{\Psi}\\ \dot{\mathbf{\Psi}} & = & -\mathbf{Q} \mathbf{x}&-& \mathbf{A}^T\mathbf{\Psi}\end{array}\right.\Rightarrow \left[\begin{array}{c}\dot{\mathbf{x}} \\ \dot{\mathbf{\Psi}}\end{array}\right]= \mathbf{H}\left[\begin{array}{c}\mathbf{x} \\ \mathbf{\Psi}\end{array}\right] \quad (4)

with :

\mathbf{H}=\left[\begin{array}{cc} \mathbf{A} & -\mathbf{B}\mathbf{R}^{-1}\mathbf{B}^T\\ -\mathbf{Q} & -\mathbf{A}^T\end{array}\right].

\mathbf{H} is the 2n\times 2n Hamiltonian matrix associated with this control problem. Equation (4) can be integrated by taking into account the boundary conditions on the augmented state-costate vector [\mathbf{x}^T~;\mathbf{\Psi}^T]^T :

  • initial conditions on \mathbf{x} : \mathbf{x}(0)=\mathbf{x}_0 (5),
  • terminal conditions on \mathbf{x} : \mathbf{x}(t_f)=0 (6).

The set of equations (4), (5) and (6) is also called a two-point boundary-value problem.

  • Integration of the two-point boundary-value problem :

 \left[\begin{array}{c}\mathbf{x}(t_f)=0 \\ \mathbf{\Psi}(t_f)\end{array}\right]=e^{\mathbf{H}t_f}\left[\begin{array}{c}\mathbf{x}(0)=\mathbf{x}_0 \\ \mathbf{\Psi}(0)\end{array}\right]=\left[\begin{array}{cc} e^{\mathbf{H}t_f}_{11} & e^{\mathbf{H}t_f}_{12} \\ e^{\mathbf{H}t_f}_{21}  & e^{\mathbf{H}t_f}_{22}\end{array}\right]\left[\begin{array}{c}\mathbf{x}(0)=\mathbf{x}_0 \\ \mathbf{\Psi}(0)\end{array}\right]

where e^{\mathbf{H}t_f}_{ij}, i,j=1,2 are the four n\times n submatrices partitioning e^{\mathbf{H}t_f} (WARNING !! : e^{\mathbf{H}t_f}_{ij}\neq e^{\mathbf{H}_{ij}t_f}).
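As a small illustration of this warning, consider the scalar case n=m=1 with \mathbf{A}=a, \mathbf{B}=b, \mathbf{Q}=q, \mathbf{R}=r (lower-case symbols introduced here for illustration only) and assume \omega>0. The matrix exponential then has a closed form, and its upper-left block differs from e^{at} as soon as q\,b^2\neq 0 :

```latex
\mathbf{H}=\left[\begin{array}{cc} a & -b^2/r\\ -q & -a\end{array}\right],\qquad
\mathbf{H}^2=\omega^2\,\mathbf{I}_2,\qquad \omega^2=a^2+\frac{q\,b^2}{r},

e^{\mathbf{H}t}=\cosh(\omega t)\,\mathbf{I}_2+\frac{\sinh(\omega t)}{\omega}\,\mathbf{H}
\quad\Rightarrow\quad
e^{\mathbf{H}t}_{11}=\cosh(\omega t)+\frac{a}{\omega}\sinh(\omega t),\qquad
e^{\mathbf{H}t}_{12}=-\frac{b^2}{r\,\omega}\sinh(\omega t).
```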

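The first block row reads 0=e^{\mathbf{H}t_f}_{11}\,\mathbf{x}_0+e^{\mathbf{H}t_f}_{12}\,\mathbf{\Psi}(0). Provided e^{\mathbf{H}t_f}_{12} is invertible, one can then derive the initial value of the costate :

 \mathbf{\Psi}(0)=\mathbf{P}(0)\,\mathbf{x}_0

where \mathbf{P}(0)=-\left[e^{\mathbf{H}t_f}_{12}\right]^{-1}\,e^{\mathbf{H}t_f}_{11} depends only on the problem data : \mathbf{A}, \mathbf{B}, \mathbf{Q}, \mathbf{R}, t_f and not on \mathbf{x}_0.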
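The computation of \mathbf{P}(0) can be sketched numerically as follows. This is only a minimal sketch: the double-integrator data (A, B, Q, R, t_f, x0) and the helper names `hamiltonian_matrix` and `costate_gain` are illustrative assumptions, not part of the original notes, and SciPy's `expm` is used for the matrix exponential.

```python
import numpy as np
from scipy.linalg import expm

def hamiltonian_matrix(A, B, Q, R):
    """Build the 2n x 2n Hamiltonian matrix H of equation (4)."""
    BRB = B @ np.linalg.solve(R, B.T)          # B R^{-1} B^T
    return np.block([[A, -BRB], [-Q, -A.T]])

def costate_gain(A, B, Q, R, horizon):
    """Return -[e^{H*horizon}]_12^{-1} [e^{H*horizon}]_11 (horizon > 0)."""
    n = A.shape[0]
    E = expm(hamiltonian_matrix(A, B, Q, R) * horizon)
    E11, E12 = E[:n, :n], E[:n, n:]            # blocks of exp(H*horizon), not exp of blocks
    return -np.linalg.solve(E12, E11)

# Illustrative data (assumed): double integrator, unit weights
A = np.array([[0.0, 1.0], [0.0, 0.0]])
B = np.array([[0.0], [1.0]])
Q = np.eye(2)
R = np.array([[1.0]])
t_f = 2.0
x0 = np.array([1.0, 0.0])

P0 = costate_gain(A, B, Q, R, t_f)
psi0 = P0 @ x0                                 # initial costate Psi(0) = P(0) x0

# Sanity check: propagating [x0; Psi(0)] over t_f should bring the state to 0
z_f = expm(hamiltonian_matrix(A, B, Q, R) * t_f) @ np.concatenate([x0, psi0])
x_f = z_f[:2]
print("P(0) =\n", P0)
print("x(t_f) =", x_f)
```

Solving the linear system with `np.linalg.solve` rather than forming the inverse explicitly is a numerical design choice; it returns the same \mathbf{P}(0) up to rounding.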

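  • Optimal control initial value : from equation (2) :

 \widehat{\mathbf{u}}(0)=-\mathbf{R}^{-1}\mathbf{B}^T\mathbf{\Psi}(0)=-\mathbf{R}^{-1}\mathbf{B}^T\mathbf{P}(0)\,\mathbf{x}_0
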
  • Closed-loop optimal control at any time t : at time t\in [0, t_f), assuming that the current state \mathbf{x}(t) is known (from a measurement system), the objective is still to drive the final state to 0 (\mathbf{x}(t_f)=0), but the remaining time horizon is now t_f-t. Computing the current optimal control \widehat{\mathbf{u}}(t) is the same problem as the previous one, with \mathbf{x}_0 replaced by \mathbf{x}(t) and t_f by t_f-t. Thus :

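\mathbf{\Psi}(t)=\mathbf{P}(t)\,\mathbf{x}(t)\quad \mbox{with~:}\quad\mathbf{P}(t)=-\left[e^{\mathbf{H}(t_f-t)}_{12}\right]^{-1}\,e^{\mathbf{H}(t_f-t)}_{11},

and, from equation (2) :

 \widehat{\mathbf{u}}(t)=-\mathbf{K}(t)\,\mathbf{x}(t)

with :

 \mathbf{K}(t)=\mathbf{R}^{-1}\mathbf{B}^T\mathbf{P}(t)

the time-varying state feedback gain to be implemented in closed-loop according to the following Figure :

[Figure : closed-loop implementation of the time-varying state feedback]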

Remark : \mathbf{P}(t_f) is not defined, since e^{\mathbf{H}0}=\mathbf{I}_{2n} and therefore e^{\mathbf{H}0}_{12}=\mathbf{0}_{n\times n}, which is not invertible.
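The closed-loop law can be checked in simulation. The sketch below writes the feedback as \widehat{\mathbf{u}}(t)=-\mathbf{K}(t)\mathbf{x}(t) with \mathbf{K}(t)=\mathbf{R}^{-1}\mathbf{B}^T\mathbf{P}(t) (a sign convention assumed here), uses the same illustrative double-integrator data as an assumption, and stops the integration at 0.9\,t_f because the gain grows without bound as t\to t_f. It compares the closed-loop state with the open-loop optimal state obtained from equation (4).

```python
import numpy as np
from scipy.linalg import expm
from scipy.integrate import solve_ivp

# Illustrative data (assumed): double integrator, unit weights
A = np.array([[0.0, 1.0], [0.0, 0.0]])
B = np.array([[0.0], [1.0]])
Q = np.eye(2)
R = np.array([[1.0]])
t_f = 2.0
x0 = np.array([1.0, 0.0])
n = A.shape[0]

BRB = B @ np.linalg.solve(R, B.T)               # B R^{-1} B^T
H = np.block([[A, -BRB], [-Q, -A.T]])           # Hamiltonian matrix of (4)

def P(t):
    """P(t) = -[e^{H(t_f-t)}]_12^{-1} [e^{H(t_f-t)}]_11, defined for t < t_f."""
    E = expm(H * (t_f - t))
    return -np.linalg.solve(E[:n, n:], E[:n, :n])

def K(t):
    """Time-varying gain K(t) = R^{-1} B^T P(t), used as u = -K(t) x."""
    return np.linalg.solve(R, B.T @ P(t))

def closed_loop(t, x):
    """Closed-loop dynamics x' = (A - B K(t)) x."""
    return (A - B @ K(t)) @ x

t_end = 0.9 * t_f                               # stop before the gain blows up
sol = solve_ivp(closed_loop, (0.0, t_end), x0, rtol=1e-9, atol=1e-12)
x_cl = sol.y[:, -1]

# Open-loop optimal state at t_end, from equation (4) with Psi(0) = P(0) x0
z0 = np.concatenate([x0, P(0.0) @ x0])
x_ol = (expm(H * t_end) @ z0)[:n]

print("closed-loop x(0.9 t_f):", x_cl)
print("open-loop   x(0.9 t_f):", x_ol)
print("|K(0)| =", np.linalg.norm(K(0.0)), " |K(0.9 t_f)| =", np.linalg.norm(K(t_end)))
```

In a real implementation the gain would typically be frozen or saturated shortly before t_f; how to do so is a design decision outside the scope of this sketch.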

  • Optimal state trajectories : The integration of equation (4) between 0 and t (\forall  t \in [0, t_f]) leads to (first n rows) :

 \mathbf{x}(t)=e^{\mathbf{H}t}_{11}\,\mathbf{x}_0+e^{\mathbf{H}t}_{12}\,\mathbf{\Psi}(0)= \left(e^{\mathbf{H}t}_{11} - e^{\mathbf{H}t}_{12}\left[e^{\mathbf{H}t_f}_{12}\right]^{-1}\,e^{\mathbf{H}t_f}_{11}\right)\,\mathbf{x}_0= \mathbf{\Phi}(t_f,t)\,\mathbf{x}_0.

where :

\mathbf{\Phi}(t_f,t)=e^{\mathbf{H}t}_{11} - e^{\mathbf{H}t}_{12}\left[e^{\mathbf{H}t_f}_{12}\right]^{-1}\,e^{\mathbf{H}t_f}_{11}

is called the transition matrix.
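A quick numerical check of this transition matrix is sketched below, again on assumed double-integrator data (the function name `Phi` is illustrative). By construction \mathbf{\Phi}(t_f,0)=\mathbf{I}_n and \mathbf{\Phi}(t_f,t_f)=\mathbf{0}, which the code verifies up to rounding.

```python
import numpy as np
from scipy.linalg import expm

# Illustrative data (assumed): double integrator, unit weights
A = np.array([[0.0, 1.0], [0.0, 0.0]])
B = np.array([[0.0], [1.0]])
Q = np.eye(2)
R = np.array([[1.0]])
t_f = 2.0
x0 = np.array([1.0, 0.0])
n = A.shape[0]

BRB = B @ np.linalg.solve(R, B.T)               # B R^{-1} B^T
H = np.block([[A, -BRB], [-Q, -A.T]])           # Hamiltonian matrix of (4)
E_f = expm(H * t_f)                             # e^{H t_f}, computed once

def Phi(t):
    """Phi(t_f, t) = E11(t) - E12(t) [E12(t_f)]^{-1} E11(t_f)."""
    E_t = expm(H * t)
    return E_t[:n, :n] - E_t[:n, n:] @ np.linalg.solve(E_f[:n, n:], E_f[:n, :n])

# Optimal state trajectory x(t) = Phi(t_f, t) x0 on a coarse time grid
for t in np.linspace(0.0, t_f, 5):
    print(f"t = {t:.2f}  x(t) = {Phi(t) @ x0}")
```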

  • Optimal performance index :

For any t\in [0, t_f) and a current state \mathbf{x}, one can define the cost-to-go function (or value function) \mathcal{R}(\mathbf{x},t) as :

\mathcal{R}(\mathbf{x},t)=\frac{1}{2}\int_t^{t_f} (\mathbf{x}^T\mathbf{Q}\mathbf{x}+ \mathbf{u}^T\mathbf{R}\mathbf{u})d\tau

and the optimal cost-to-go function as :

\widehat{\mathcal{R}}(\mathbf{x},t)=\frac{1}{2}\int_t^{t_f} (\mathbf{x}^T\mathbf{Q}\mathbf{x}+ \widehat{\mathbf{u}}^T\mathbf{R}\widehat{\mathbf{u}})d\tau

\widehat{\mathcal{R}}(\mathbf{x},t)=\frac{1}{2}\int_t^{t_f} (\mathbf{x}^T\mathbf{Q}\mathbf{x}+ \mathbf{\Psi}^T\mathbf{B}\mathbf{R}^{-1}\mathbf{B}^T\mathbf{\Psi})d\tau.

From equation (4), one can derive that :

\mathbf{B}\mathbf{R}^{-1}\mathbf{B}^T\mathbf{\Psi}=\mathbf{A}\mathbf{x}-\dot{\mathbf{x}}

\mathbf{Q}\mathbf{x}=- \mathbf{A}^T\mathbf{\Psi}-\dot{\mathbf{\Psi}}

Thus (after simplification) :

\widehat{\mathcal{R}}(\mathbf{x},t)=\frac{1}{2}\int_t^{t_f} (-\mathbf{x}^T\dot{\mathbf{\Psi}}-\mathbf{\Psi}^T\dot{\mathbf{x}}) d\tau=-\frac{1}{2}\int_t^{t_f}\frac{d\,(\mathbf{x}^T\mathbf{\Psi})}{d\tau}d\tau=0+\frac{1}{2}\mathbf{x}^T(t)\mathbf{\Psi}(t)
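The simplification can be spelled out as follows. It uses only the two rows of equation (4) and the fact that \mathbf{x}^T\mathbf{A}^T\mathbf{\Psi} is a scalar, hence equal to its transpose \mathbf{\Psi}^T\mathbf{A}\mathbf{x}, so that the two terms in \mathbf{A} cancel :

```latex
\begin{aligned}
\mathbf{x}^T\mathbf{Q}\mathbf{x} &= -\mathbf{x}^T\mathbf{A}^T\mathbf{\Psi}-\mathbf{x}^T\dot{\mathbf{\Psi}},\\
\mathbf{\Psi}^T\mathbf{B}\mathbf{R}^{-1}\mathbf{B}^T\mathbf{\Psi} &= \mathbf{\Psi}^T\mathbf{A}\mathbf{x}-\mathbf{\Psi}^T\dot{\mathbf{x}},\\
\mathbf{x}^T\mathbf{Q}\mathbf{x}+\mathbf{\Psi}^T\mathbf{B}\mathbf{R}^{-1}\mathbf{B}^T\mathbf{\Psi}
 &= -\mathbf{x}^T\dot{\mathbf{\Psi}}-\mathbf{\Psi}^T\dot{\mathbf{x}}
 = -\frac{d\,(\mathbf{x}^T\mathbf{\Psi})}{d\tau}.
\end{aligned}
```

The boundary term at \tau=t_f vanishes because \mathbf{x}(t_f)=0, which is where the "0" in the last equality comes from.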

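Thus, using \mathbf{\Psi}(t)=\mathbf{P}(t)\,\mathbf{x}(t) :

 \widehat{\mathcal{R}}(\mathbf{x},t)=\frac{1}{2}\mathbf{x}^T(t)\,\mathbf{P}(t)\,\mathbf{x}(t)

From this last equation, and since \mathbf{P}(t) is symmetric, one recovers the definition of the costate \mathbf{\Psi} used to solve the Hamilton–Jacobi–Bellman equation, i.e. the gradient of the optimal cost-to-go function w.r.t. \mathbf{x} :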

 \mathbf{\Psi}(t)=\frac{\partial \widehat{\mathcal{R}}(\mathbf{x},t)}{\partial \mathbf{x}}

The optimal performance index is : \widehat{J}=\widehat{\mathcal{R}}(\mathbf{x}_0,0)=\frac{1}{2}\mathbf{x}^T_0\mathbf{P}(0)\mathbf{x}_0.
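The closed-form value of \widehat{J} can be compared with a direct numerical quadrature of the performance index along the optimal trajectory. The sketch below assumes the same illustrative double-integrator data and uses SciPy's `quad`; it is a sanity check, not a proof.

```python
import numpy as np
from scipy.linalg import expm
from scipy.integrate import quad

# Illustrative data (assumed): double integrator, unit weights
A = np.array([[0.0, 1.0], [0.0, 0.0]])
B = np.array([[0.0], [1.0]])
Q = np.eye(2)
R = np.array([[1.0]])
t_f = 2.0
x0 = np.array([1.0, 0.0])
n = A.shape[0]

BRB = B @ np.linalg.solve(R, B.T)               # B R^{-1} B^T
H = np.block([[A, -BRB], [-Q, -A.T]])           # Hamiltonian matrix of (4)

E_f = expm(H * t_f)
P0 = -np.linalg.solve(E_f[:n, n:], E_f[:n, :n]) # P(0)
z0 = np.concatenate([x0, P0 @ x0])              # [x0; Psi(0)]

def integrand(t):
    """0.5 * (x^T Q x + u^T R u) along the optimal state-costate trajectory."""
    z = expm(H * t) @ z0
    x, psi = z[:n], z[n:]
    u = -np.linalg.solve(R, B.T @ psi)          # equation (2)
    return 0.5 * (x @ Q @ x + u @ R @ u)

J_quad, _ = quad(integrand, 0.0, t_f)           # numerical value of J
J_formula = 0.5 * x0 @ P0 @ x0                  # 0.5 * x0^T P(0) x0

print("J by quadrature :", J_quad)
print("J by formula    :", J_formula)
```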


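  • Exercise 1 : show that \mathbf{P}(t) is the solution of the matrix Riccati differential equation :

 \dot{\mathbf{P}}=-\mathbf{P}\mathbf{A}-\mathbf{A}^T\mathbf{P}+\mathbf{P}\mathbf{B}\mathbf{R}^{-1}\mathbf{B}^T\mathbf{P}-\mathbf{Q}

also written as :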

 \dot{\mathbf{P}}=\left[-\mathbf{P}\quad\mathbf{I}_n\right]\mathbf{H}\left[\begin{array}{c}\mathbf{I}_n \\ \mathbf{P} \end{array}\right].
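Before attempting the proof, the claim can be checked numerically. The sketch below is only a sanity check under assumed double-integrator data: it approximates \dot{\mathbf{P}} by a central finite difference at an arbitrary time and compares it with both forms of the right-hand side, written here as -\mathbf{P}\mathbf{A}-\mathbf{A}^T\mathbf{P}+\mathbf{P}\mathbf{B}\mathbf{R}^{-1}\mathbf{B}^T\mathbf{P}-\mathbf{Q} and as the block product with \mathbf{H}.

```python
import numpy as np
from scipy.linalg import expm

# Illustrative data (assumed): double integrator, unit weights
A = np.array([[0.0, 1.0], [0.0, 0.0]])
B = np.array([[0.0], [1.0]])
Q = np.eye(2)
R = np.array([[1.0]])
t_f = 2.0
n = A.shape[0]

BRB = B @ np.linalg.solve(R, B.T)               # B R^{-1} B^T
H = np.block([[A, -BRB], [-Q, -A.T]])           # Hamiltonian matrix of (4)

def P(t):
    """P(t) = -[e^{H(t_f-t)}]_12^{-1} [e^{H(t_f-t)}]_11, defined for t < t_f."""
    E = expm(H * (t_f - t))
    return -np.linalg.solve(E[:n, n:], E[:n, :n])

def riccati_rhs(Pm):
    """Expanded right-hand side: -P A - A^T P + P B R^{-1} B^T P - Q."""
    return -Pm @ A - A.T @ Pm + Pm @ BRB @ Pm - Q

def riccati_rhs_block(Pm):
    """Compact right-hand side: [-P  I] H [I; P]."""
    I = np.eye(n)
    return np.hstack([-Pm, I]) @ H @ np.vstack([I, Pm])

t, h = 0.5, 1e-5                                # test time and finite-difference step
dP_fd = (P(t + h) - P(t - h)) / (2.0 * h)       # central difference for dP/dt
residual = np.max(np.abs(dP_fd - riccati_rhs(P(t))))
print("max |dP/dt - rhs| =", residual)
```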

  • Exercise 2 : considering now that \mathbf{x}(t_f)=\mathbf{x}_f\neq \mathbf{0}, compute the time-varying state feedback gain \mathbf{K}(t) and the time-varying feedforward gain \mathbf{H}(t) of the optimal closed-loop control law to be implemented according to the following Figure.
