Optimal control
Simplified case: Linear system, quadratic performance index, fixed horizon and final state
Contents
- Problem
- Solution using Pontryagin’s minimum principle
- Exercises
Problem:
Let us consider the linear system:
$$\dot{\mathbf{x}} = \mathbf{A}\mathbf{x} + \mathbf{B}\mathbf{u} \quad (1)$$
where $\mathbf{x}\in\mathbb{R}^n$ is the state vector and $\mathbf{u}$ the control input. From a given initial state $\mathbf{x}_0$, the objective is to bring the state back to $\mathbf{0}$ within a given time horizon $t_f$ ($\mathbf{x}(t_f)=\mathbf{0}$) while minimizing the quadratic performance index:
$$J = \frac{1}{2}\int_0^{t_f}\left(\mathbf{x}^T\mathbf{Q}\,\mathbf{x} + \mathbf{u}^T\mathbf{R}\,\mathbf{u}\right)dt$$
where $\mathbf{Q}$ and $\mathbf{R}$ are given weighting matrices with $\mathbf{Q}=\mathbf{Q}^T\geq 0$ and $\mathbf{R}=\mathbf{R}^T>0$.
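To make the developments below concrete, here is a minimal numerical sketch in Python/NumPy. The double-integrator data $\mathbf{A}$, $\mathbf{B}$, $\mathbf{Q}$, $\mathbf{R}$, $t_f$ and $\mathbf{x}_0$ are illustrative values chosen for this sketch, not part of the original statement; the later sketches on this page reuse these names.

```python
import numpy as np

# Illustrative problem data (hypothetical values): a double integrator
# driven by a force input, to be brought back to the origin at t = tf.
A = np.array([[0.0, 1.0],
              [0.0, 0.0]])      # state matrix (n x n)
B = np.array([[0.0],
              [1.0]])           # input matrix (n x m)
Q = np.diag([1.0, 0.1])         # state weighting, Q = Q^T >= 0
R = np.array([[0.1]])           # control weighting, R = R^T > 0
tf = 2.0                        # fixed time horizon
x0 = np.array([1.0, 0.0])       # initial state, to be driven to 0 at t = tf
```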
Solution using Pontryagin’s minimum principle:
- The Hamiltonian reads:
$$\mathcal{H} = \frac{1}{2}\left(\mathbf{x}^T\mathbf{Q}\,\mathbf{x} + \mathbf{u}^T\mathbf{R}\,\mathbf{u}\right) + \mathbf{\Psi}^T\left(\mathbf{A}\mathbf{x} + \mathbf{B}\mathbf{u}\right)$$
where $\mathbf{\Psi}$ is the costate vector.
- The optimal control minimizes $\mathcal{H}$:
$$\frac{\partial \mathcal{H}}{\partial \mathbf{u}} = \mathbf{R}\,\mathbf{u} + \mathbf{B}^T\mathbf{\Psi} = \mathbf{0} \;\Rightarrow\; \mathbf{u} = -\mathbf{R}^{-1}\mathbf{B}^T\mathbf{\Psi} \quad (2)$$
- Costate dynamics:
$$\dot{\mathbf{\Psi}} = -\frac{\partial \mathcal{H}}{\partial \mathbf{x}} = -\mathbf{Q}\,\mathbf{x} - \mathbf{A}^T\mathbf{\Psi} \quad (3)$$
- State-costate dynamics: (1), (2) and (3) lead to:
$$\left\{\begin{array}{ccccc}\dot{\mathbf{x}} & = & \mathbf{A}\mathbf{x} & - & \mathbf{B}\mathbf{R}^{-1}\mathbf{B}^T\mathbf{\Psi}\\ \dot{\mathbf{\Psi}} & = & -\mathbf{Q}\,\mathbf{x} & - & \mathbf{A}^T\mathbf{\Psi}\end{array}\right. \;\Rightarrow\; \left[\begin{array}{c}\dot{\mathbf{x}} \\ \dot{\mathbf{\Psi}}\end{array}\right] = \mathbf{H}\left[\begin{array}{c}\mathbf{x} \\ \mathbf{\Psi}\end{array}\right] \quad (4)$$
with
$$\mathbf{H} = \left[\begin{array}{cc}\mathbf{A} & -\mathbf{B}\mathbf{R}^{-1}\mathbf{B}^T\\ -\mathbf{Q} & -\mathbf{A}^T\end{array}\right].$$
$\mathbf{H}$ is the $2n\times 2n$ Hamiltonian matrix associated with such a control problem. (4) can be integrated taking into account the boundary conditions on the state-costate augmented vector $\left[\mathbf{x}^T\;\mathbf{\Psi}^T\right]^T$:
  - initial conditions on $\mathbf{x}$: $\mathbf{x}(0)=\mathbf{x}_0$ (5),
  - terminal conditions on $\mathbf{x}$: $\mathbf{x}(t_f)=\mathbf{0}$ (6).
The set of equations (4), (5) and (6) is also called a two-point boundary-value problem.
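As an illustration, the Hamiltonian matrix of (4) can be assembled directly from the problem data. This continues the Python/NumPy sketch above; the helper name `hamiltonian_matrix` and the structural check are ours, not part of the original text.

```python
import numpy as np
from numpy.linalg import inv

def hamiltonian_matrix(A, B, Q, R):
    """Assemble the 2n x 2n Hamiltonian matrix of the state-costate dynamics (4)."""
    return np.block([[A,  -B @ inv(R) @ B.T],
                     [-Q, -A.T]])

H = hamiltonian_matrix(A, B, Q, R)
n = A.shape[0]

# Sanity check of the Hamiltonian structure: with J2n = [[0, I], [-I, 0]],
# the product J2n @ H is symmetric (equivalently, J2n^-1 H^T J2n = -H).
J2n = np.block([[np.zeros((n, n)), np.eye(n)],
                [-np.eye(n),       np.zeros((n, n))]])
assert np.allclose(J2n @ H, (J2n @ H).T)
```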
- Integration of the two-point boundary-value problem:
$$\left[\begin{array}{c}\mathbf{x}(t_f)=\mathbf{0} \\ \mathbf{\Psi}(t_f)\end{array}\right] = e^{\mathbf{H}t_f}\left[\begin{array}{c}\mathbf{x}(0)=\mathbf{x}_0 \\ \mathbf{\Psi}(0)\end{array}\right] = \left[\begin{array}{cc}e^{\mathbf{H}t_f}_{11} & e^{\mathbf{H}t_f}_{12}\\ e^{\mathbf{H}t_f}_{21} & e^{\mathbf{H}t_f}_{22}\end{array}\right]\left[\begin{array}{c}\mathbf{x}(0)=\mathbf{x}_0 \\ \mathbf{\Psi}(0)\end{array}\right]$$
where the $e^{\mathbf{H}t_f}_{ij}$, $i,j=1,2$, are the 4 $n\times n$ submatrices partitioning $e^{\mathbf{H}t_f}$ (WARNING!!: $e^{\mathbf{H}t_f}_{ij}\neq e^{\mathbf{H}_{ij}t_f}$, i.e. the blocks of the matrix exponential are not the exponentials of the blocks of $\mathbf{H}$).
Then one can easily derive the initial value of the costate from the first block row ($\mathbf{x}(t_f)=\mathbf{0}$):
$$\mathbf{\Psi}(0) = -\left(e^{\mathbf{H}t_f}_{12}\right)^{-1} e^{\mathbf{H}t_f}_{11}\,\mathbf{x}_0 = \mathbf{F}(t_f)\,\mathbf{x}_0$$
where $\mathbf{F}(t_f)$ depends only on the problem data $\mathbf{A}$, $\mathbf{B}$, $\mathbf{Q}$, $\mathbf{R}$, $t_f$, and not on $\mathbf{x}_0$.
- Optimal control initial value: from equation (2):
$$\mathbf{u}(0) = -\mathbf{R}^{-1}\mathbf{B}^T\mathbf{\Psi}(0) = -\mathbf{R}^{-1}\mathbf{B}^T\mathbf{F}(t_f)\,\mathbf{x}_0$$
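Continuing the running sketch (the names `F_matrix`, `F_tf`, `psi0`, `u0` are ours), the matrix $\mathbf{F}(t_f)$, the initial costate and the initial control can be computed from the partitioned matrix exponential:

```python
import numpy as np
from numpy.linalg import inv
from scipy.linalg import expm

def F_matrix(H, n, horizon):
    """F(horizon) = -(e^{H*horizon}_{12})^{-1} e^{H*horizon}_{11}, built from the
    n x n blocks of the full matrix exponential (not from exponentials of blocks of H)."""
    E = expm(H * horizon)
    E11, E12 = E[:n, :n], E[:n, n:]
    return -inv(E12) @ E11

F_tf = F_matrix(H, n, tf)        # depends only on A, B, Q, R and tf
psi0 = F_tf @ x0                 # initial costate  Psi(0) = F(tf) x0
u0 = -inv(R) @ B.T @ psi0        # optimal initial control, from equation (2)
print(u0)
```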
- Closed-loop optimal control at any time $t$: at time $t$, assuming that the current state $\mathbf{x}(t)$ is known (using a measurement system), the objective is still to bring the state back to $\mathbf{0}$ at the final time ($\mathbf{x}(t_f)=\mathbf{0}$), but the time horizon is now $t_f-t$. The computation of the current optimal control $\mathbf{u}(t)$ is the same problem as the previous one, just replacing $\mathbf{x}_0$ by $\mathbf{x}(t)$ and $t_f$ by $t_f-t$. Thus:
$$\mathbf{\Psi}(t) = \mathbf{F}(t_f-t)\,\mathbf{x}(t)$$
$$\mathbf{u}(t) = -\mathbf{R}^{-1}\mathbf{B}^T\mathbf{F}(t_f-t)\,\mathbf{x}(t) = -\mathbf{K}(t)\,\mathbf{x}(t)$$
with:
$$\mathbf{K}(t) = \mathbf{R}^{-1}\mathbf{B}^T\mathbf{F}(t_f-t)$$
the time-varying state feedback to be implemented in closed loop according to the following Figure:
[Figure: closed-loop implementation of the time-varying state feedback $\mathbf{u}(t)=-\mathbf{K}(t)\,\mathbf{x}(t)$ around the plant $\dot{\mathbf{x}}=\mathbf{A}\mathbf{x}+\mathbf{B}\mathbf{u}$.]
Remark: $\mathbf{K}(t_f)$ is not defined since $e^{\mathbf{H}\cdot 0}_{12}=\mathbf{0}$, which is not invertible.
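A short closed-loop sketch, reusing the definitions above (`K_gain` is our name and the Euler step size is illustrative):

```python
import numpy as np
from numpy.linalg import inv

def K_gain(t):
    """Time-varying state feedback K(t) = R^{-1} B^T F(tf - t), defined for t < tf."""
    return inv(R) @ B.T @ F_matrix(H, n, tf - t)

# At t = 0 the feedback law reproduces the open-loop initial control computed above.
assert np.allclose(-K_gain(0.0) @ x0, u0)

# Rough closed-loop simulation (explicit Euler, illustrative step size).
# K(t) grows without bound as t -> tf, so the loop stops a little before tf.
dt = 1e-3
t, x = 0.0, x0.copy()
while t < 0.95 * tf:
    x = x + dt * (A @ x - B @ (K_gain(t) @ x))
    t += dt
print(x)   # much closer to the origin than x0; the exact optimal state reaches 0 at t = tf
```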
- Optimal state trajectories: the integration of equation (4) between 0 and $t$ ($0\leq t\leq t_f$) leads to (first $n$ rows):
$$\mathbf{x}(t) = \left(e^{\mathbf{H}t}_{11} + e^{\mathbf{H}t}_{12}\,\mathbf{F}(t_f)\right)\mathbf{x}_0 = \mathbf{\Phi}(t)\,\mathbf{x}_0$$
where:
$$\mathbf{\Phi}(t) = e^{\mathbf{H}t}_{11} + e^{\mathbf{H}t}_{12}\,\mathbf{F}(t_f)$$
is called the transition matrix.
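Still within the same sketch (the helper name `Phi` is ours), the optimal state trajectory can be evaluated without any simulation, and its boundary values can be checked:

```python
import numpy as np
from scipy.linalg import expm

def Phi(t):
    """Transition matrix Phi(t) = e^{Ht}_{11} + e^{Ht}_{12} F(tf) of the optimal trajectory."""
    E = expm(H * t)
    return E[:n, :n] + E[:n, n:] @ F_tf

# The optimal state trajectory x(t) = Phi(t) x0 starts at x0 and ends at 0:
assert np.allclose(Phi(0.0) @ x0, x0)
assert np.allclose(Phi(tf) @ x0, np.zeros(n), atol=1e-8)
```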
- Optimal performance index:
For any $t$ ($0\leq t\leq t_f$) and a current state $\mathbf{x}(t)$, one can define the cost-to-go function (or value function) $V(\mathbf{x}(t),t)$ as:
$$V(\mathbf{x}(t),t) = \frac{1}{2}\int_t^{t_f}\left(\mathbf{x}^T\mathbf{Q}\,\mathbf{x} + \mathbf{u}^T\mathbf{R}\,\mathbf{u}\right)d\tau$$
and the optimal cost-to-go function as:
$$V^*(\mathbf{x}(t),t) = \min_{\mathbf{u}}\; V(\mathbf{x}(t),t),$$
i.e. the value of $V$ along the optimal state and control trajectories.
From equation (4) one can derive that, along these optimal trajectories:
$$\frac{d}{d\tau}\left(\mathbf{\Psi}^T\mathbf{x}\right) = \dot{\mathbf{\Psi}}^T\mathbf{x} + \mathbf{\Psi}^T\dot{\mathbf{x}} = -\mathbf{x}^T\mathbf{Q}\,\mathbf{x} - \mathbf{\Psi}^T\mathbf{B}\mathbf{R}^{-1}\mathbf{B}^T\mathbf{\Psi}$$
$$\mathbf{x}^T\mathbf{Q}\,\mathbf{x} + \mathbf{u}^T\mathbf{R}\,\mathbf{u} = -\frac{d}{d\tau}\left(\mathbf{\Psi}^T\mathbf{x}\right)\quad\textrm{since}\quad \mathbf{u}^T\mathbf{R}\,\mathbf{u} = \mathbf{\Psi}^T\mathbf{B}\mathbf{R}^{-1}\mathbf{B}^T\mathbf{\Psi}.$$
Thus (after simplification):
$$V^*(\mathbf{x}(t),t) = -\frac{1}{2}\left[\mathbf{\Psi}^T\mathbf{x}\right]_t^{t_f} = \frac{1}{2}\left(\mathbf{\Psi}^T(t)\,\mathbf{x}(t) - \mathbf{\Psi}^T(t_f)\,\mathbf{x}(t_f)\right) = \frac{1}{2}\mathbf{\Psi}^T(t)\,\mathbf{x}(t)$$
since $\mathbf{x}(t_f)=\mathbf{0}$. Thus:
$$V^*(\mathbf{x}(t),t) = \frac{1}{2}\mathbf{x}^T(t)\,\mathbf{F}(t_f-t)\,\mathbf{x}(t).$$
From this last equation, one can find again the definition of the costate $\mathbf{\Psi}$ used to solve the Hamilton–Jacobi–Bellman equation, i.e. the gradient of the optimal cost-to-go function w.r.t. $\mathbf{x}$:
$$\mathbf{\Psi}(t) = \frac{\partial V^*(\mathbf{x}(t),t)}{\partial \mathbf{x}} = \mathbf{F}(t_f-t)\,\mathbf{x}(t).$$
The optimal performance index is: $J^* = V^*(\mathbf{x}_0,0) = \frac{1}{2}\mathbf{x}_0^T\,\mathbf{F}(t_f)\,\mathbf{x}_0$.
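A quick numerical cross-check of this last formula, again reusing the running sketch: the closed-form cost $\frac{1}{2}\mathbf{x}_0^T\mathbf{F}(t_f)\mathbf{x}_0$ is compared with a direct quadrature of the integrand along the optimal trajectory (the grid size is illustrative).

```python
import numpy as np
from numpy.linalg import inv

# Closed-form optimal cost:  J* = 1/2 x0^T F(tf) x0
J_star = 0.5 * x0 @ F_tf @ x0

# Direct quadrature of 1/2 * integral of (x^T Q x + u^T R u) along the optimal
# trajectory x(t) = Phi(t) x0, with u(t) = -R^{-1} B^T F(tf - t) x(t).
ts = np.linspace(0.0, tf, 2001)[:-1]      # skip t = tf, where F(0) is undefined
vals = []
for t in ts:
    x = Phi(t) @ x0
    u = -inv(R) @ B.T @ F_matrix(H, n, tf - t) @ x
    vals.append(x @ Q @ x + u @ R @ u)
J_quad = 0.5 * np.sum(vals) * (ts[1] - ts[0])   # left-endpoint Riemann sum
print(J_star, J_quad)                            # should agree up to the quadrature error
```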
Exercises
- Exercise #1: show that $\mathbf{P}(t)=\mathbf{F}(t_f-t)$ is the solution of the matrix Riccati differential equation (a numerical sanity check is sketched at the end of this page):
$$-\dot{\mathbf{P}}(t) = \mathbf{P}(t)\,\mathbf{A} + \mathbf{A}^T\mathbf{P}(t) - \mathbf{P}(t)\,\mathbf{B}\mathbf{R}^{-1}\mathbf{B}^T\mathbf{P}(t) + \mathbf{Q}$$
also written as:
$$\dot{\mathbf{P}}(t) + \mathbf{P}(t)\,\mathbf{A} + \mathbf{A}^T\mathbf{P}(t) - \mathbf{P}(t)\,\mathbf{B}\mathbf{R}^{-1}\mathbf{B}^T\mathbf{P}(t) + \mathbf{Q} = \mathbf{0}.$$
- Exercise #2: considering now that the required final state is a given non-zero value ($\mathbf{x}(t_f)=\mathbf{x}_f$), compute the time-varying state feedback gain and the time-varying feedforward gain of the optimal closed-loop control law to be implemented according to the following Figure.
[Figure: closed-loop implementation combining a time-varying state feedback gain and a time-varying feedforward gain.]
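Not a proof, but a numerical sanity check related to Exercise #1, continuing the running sketch: the Riccati right-hand side is compared with a finite-difference estimate of $\dot{\mathbf{P}}(t)$ for $\mathbf{P}(t)=\mathbf{F}(t_f-t)$ at an arbitrary interior time (the time value and step are illustrative, and `P_of_t` is our name).

```python
import numpy as np
from numpy.linalg import inv

def P_of_t(t):
    """Candidate Riccati solution P(t) = F(tf - t) obtained from the TPBVP solution."""
    return F_matrix(H, n, tf - t)

t, h = 0.7, 1e-5                   # arbitrary interior time and finite-difference step
P = P_of_t(t)
Pdot_fd = (P_of_t(t + h) - P_of_t(t - h)) / (2 * h)            # central difference
Pdot_riccati = -(P @ A + A.T @ P - P @ B @ inv(R) @ B.T @ P + Q)
assert np.allclose(Pdot_fd, Pdot_riccati, rtol=1e-4, atol=1e-6)
```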