Daniel Alazard > Goodies & demos

Pontryagin’s minimum principle (case: linear system, quadratic performance index, finite time horizon)

Wednesday 16 November 2016.

Optimal control:

Linear system, quadratic performance index, fixed horizon and final state


Problem:

Let us consider the linear system:

\dot{\mathbf{x}}(t)=\mathbf{A}\mathbf{x}(t)+\mathbf{B} \mathbf{u}(t) \quad \mathbf{x}\in\mathbb{R}^n;\; \mathbf{u}\in\mathbb{R}^m\; (1)

From a given initial state \mathbf{x}_0=\mathbf{x}(0), the objective is to bring the state back to 0 within a given time horizon t_f (\mathbf{x}(t_f)=0) while minimizing the quadratic performance index:

 J=\frac{1}{2}\int_0^{t_f} (\mathbf{x}^T(t)\mathbf{Q}\mathbf{x}(t)+ \mathbf{u}^T(t)\mathbf{R}\mathbf{u}(t))dt

where \mathbf{Q} and \mathbf{R} are given weighting matrices with \mathbf{Q}\ge 0 and \mathbf{R}>0.

Solution using Pontryagin’s minimum principle:

  • The Hamiltonian reads:

 \mathcal{H}=\frac{1}{2}(\mathbf{x}^T\mathbf{Q}\mathbf{x}+\mathbf{u}^T\mathbf{R}\mathbf{u})+\mathbf{\Psi}^T(\mathbf{A}\mathbf{x}+\mathbf{B}\mathbf{u})

where \mathbf{\Psi}\in\mathbb{R}^n is the costate vector.

  • The optimal control minimizes \mathcal{H}\;\forall t:

 \frac{\partial \mathcal{H}}{\partial  \mathbf{u}}_{|\mathbf{u}=\widehat{\mathbf{u}}}=0=\mathbf{R}\widehat{\mathbf{u}}+\mathbf{B}^T\mathbf{\Psi}\;\Rightarrow\;\widehat{\mathbf{u}}=-\mathbf{R}^{-1}\mathbf{B}^T\mathbf{\Psi}\;(2)

  • Costate dynamics:

\dot{\mathbf{\Psi}}=-\frac{\partial \mathcal{H}}{\partial \mathbf{x}}\; \Rightarrow\; \dot{\mathbf{\Psi}}=-\mathbf{Q}\mathbf{x}- \mathbf{A}^T\mathbf{\Psi}\; (3)

  • State-costate dynamics: equations (1), (2) and (3) lead to:

 \left\{\begin{array}{ccccc}\dot{\mathbf{x}} & = & \mathbf{A}\mathbf{x} &-& \mathbf{B}\mathbf{R}^{-1}\mathbf{B}^T\mathbf{\Psi}\\ \dot{\mathbf{\Psi}} & = & -\mathbf{Q} \mathbf{x}&-& \mathbf{A}^T\mathbf{\Psi}\end{array}\right.\Rightarrow \left[\begin{array}{c}\dot{\mathbf{x}} \\ \dot{\mathbf{\Psi}}\end{array}\right]= \mathbf{H}\left[\begin{array}{c}\mathbf{x} \\ \mathbf{\Psi}\end{array}\right] (4)

with

\mathbf{H}=\left[\begin{array}{cc} \mathbf{A} & -\mathbf{B}\mathbf{R}^{-1}\mathbf{B}^T\\ -\mathbf{Q} & -\mathbf{A}^T\end{array}\right]\;.

\mathbf{H} is the 2n\times 2n Hamiltonian matrix associated with this control problem. Equation (4) can be integrated taking into account the boundary conditions on the augmented state-costate vector [\mathbf{x}^T\;\;\mathbf{\Psi}^T]^T:

  • initial condition on \mathbf{x}: \mathbf{x}(0)=\mathbf{x}_0 (5),
  • terminal condition on \mathbf{x}: \mathbf{x}(t_f)=0 (6).

The set of equations (4), (5) and (6) is also called a two-point boundary-value problem.
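As a minimal numerical sketch of the state-costate dynamics (4), the Hamiltonian matrix \mathbf{H} can be assembled directly from the problem data. The system and weights below (a double integrator with identity weights) are illustrative assumptions, not data from the text:

```python
import numpy as np

# Illustrative data (assumed): double integrator, identity weights
A = np.array([[0.0, 1.0],
              [0.0, 0.0]])
B = np.array([[0.0],
              [1.0]])
Q = np.eye(2)          # Q >= 0
R = np.array([[1.0]])  # R > 0

# Hamiltonian matrix of equation (4), block by block
H = np.block([[A,  -B @ np.linalg.inv(R) @ B.T],
              [-Q, -A.T]])

print(H.shape)  # 2n x 2n
```

Note that \mathbf{H} always has zero trace (the diagonal blocks \mathbf{A} and -\mathbf{A}^T cancel), a quick sanity check on the assembly.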

  • Integration of the two-point boundary-value problem:

 \left[\begin{array}{c}\mathbf{x}(t_f)=0 \\ \mathbf{\Psi}(t_f)\end{array}\right]=e^{\mathbf{H}t_f}\left[\begin{array}{c}\mathbf{x}(0)=\mathbf{x}_0 \\ \mathbf{\Psi}(0)\end{array}\right]=\left[\begin{array}{cc} e^{\mathbf{H}t_f}_{11} & e^{\mathbf{H}t_f}_{12} \\ e^{\mathbf{H}t_f}_{21}  & e^{\mathbf{H}t_f}_{22}\end{array}\right]\left[\begin{array}{c}\mathbf{x}(0)=\mathbf{x}_0 \\ \mathbf{\Psi}(0)\end{array}\right]

where e^{\mathbf{H}t_f}_{ij}, i,j=1,2, are the four n\times n submatrices partitioning e^{\mathbf{H}t_f} (WARNING: e^{\mathbf{H}t_f}_{ij}\neq e^{\mathbf{H}_{ij}t_f}).

Then one can easily derive the initial value of the costate:

 \mathbf{\Psi}(0)=-\left[e^{\mathbf{H}t_f}_{12}\right]^{-1}\,e^{\mathbf{H}t_f}_{11}\,\mathbf{x}_0=\mathbf{P}(0)\,\mathbf{x}_0\;.

where \mathbf{P}(0)=-\left[e^{\mathbf{H}t_f}_{12}\right]^{-1}\,e^{\mathbf{H}t_f}_{11} depends only on the problem data \mathbf{A}, \mathbf{B}, \mathbf{Q}, \mathbf{R}, t_f, and not on \mathbf{x}_0.
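The computation of \mathbf{P}(0) and \mathbf{\Psi}(0) can be sketched numerically; the example data (double integrator, identity weights, t_f=2, x_0=[1,0]^T) are assumed for illustration:

```python
import numpy as np
from scipy.linalg import expm

# Illustrative data (assumed): double integrator, identity weights
A = np.array([[0.0, 1.0], [0.0, 0.0]])
B = np.array([[0.0], [1.0]])
Q = np.eye(2)
R = np.array([[1.0]])
tf = 2.0
x0 = np.array([1.0, 0.0])

n = A.shape[0]
H = np.block([[A,  -B @ np.linalg.inv(R) @ B.T],
              [-Q, -A.T]])

# Partition e^{H tf} into its four n x n blocks
E = expm(H * tf)
E11, E12 = E[:n, :n], E[:n, n:]

# P(0) = -[e12]^{-1} e11  (solve instead of forming the inverse)
P0 = -np.linalg.solve(E12, E11)
psi0 = P0 @ x0   # initial costate
```

By construction, propagating [\mathbf{x}_0^T\;\;\mathbf{\Psi}(0)^T]^T through e^{\mathbf{H}t_f} must return \mathbf{x}(t_f)=0, which is a direct way to validate the code.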

  • Optimal control initial value: from equation (2):

\widehat{\mathbf{u}}(0)=-\mathbf{R}^{-1}\mathbf{B}^T\mathbf{P}(0)\,\mathbf{x}_0\;.

  • Closed-loop optimal control at any time t: at time t\in [0,\; t_f[, assuming that the current state \mathbf{x}(t) is known (from a measurement system), the objective is still to bring the final state back to 0 (\mathbf{x}(t_f)=0), but the time horizon is now t_f-t. The computation of the current optimal control \widehat{\mathbf{u}}(t) is the same problem as the previous one, replacing \mathbf{x}_0 with \mathbf{x}(t) and t_f with t_f-t. Thus:

\mathbf{\Psi}(t)=\mathbf{P}(t)\,\mathbf{x}(t)\; \mbox{with:}\;\mathbf{P}(t)=-\left[e^{\mathbf{H}(t_f-t)}_{12}\right]^{-1}\,e^{\mathbf{H}(t_f-t)}_{11}\;,

\widehat{\mathbf{u}}(t)=-\mathbf{R}^{-1}\mathbf{B}^T\mathbf{P}(t)\,\mathbf{x}(t)=-\mathbf{K}(t)\,\mathbf{x}(t)\;.

with :

\mathbf{K}(t)=\mathbf{R}^{-1}\mathbf{B}^T\mathbf{P}(t)

the time-varying state feedback to be implemented in closed loop according to the following Figure.

Remark: \mathbf{P}(t_f) is not defined, since e^{\mathbf{H}0}_{12}=\mathbf{0}_{n\times n} is not invertible.
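A quick numerical consistency check of \mathbf{\Psi}(t)=\mathbf{P}(t)\,\mathbf{x}(t) and of the gain \mathbf{K}(t): propagate the optimal state/costate pair to an intermediate time and compare with the receding-horizon formula for \mathbf{P}(t). The example data are the same assumed double integrator as above:

```python
import numpy as np
from scipy.linalg import expm

# Illustrative data (assumed): double integrator, identity weights
A = np.array([[0.0, 1.0], [0.0, 0.0]])
B = np.array([[0.0], [1.0]])
Q = np.eye(2)
R = np.array([[1.0]])
Rinv = np.linalg.inv(R)
tf, n = 2.0, 2
x0 = np.array([1.0, 0.0])
H = np.block([[A, -B @ Rinv @ B.T], [-Q, -A.T]])

def P(t):
    """P(t) = -[e^{H(tf-t)}]_12^{-1} [e^{H(tf-t)}]_11, for t < tf."""
    F = expm(H * (tf - t))
    return -np.linalg.solve(F[:n, n:], F[:n, :n])

def K(t):
    """Time-varying optimal state-feedback gain K(t) = R^{-1} B^T P(t)."""
    return Rinv @ B.T @ P(t)

# Propagate the optimal state/costate to an intermediate time t = 1
z0 = np.concatenate([x0, P(0.0) @ x0])
z1 = expm(H * 1.0) @ z0
x1, psi1 = z1[:n], z1[n:]
```

At t=1, the propagated costate psi1 should equal P(1.0) @ x1, so the feedback -K(t)\mathbf{x}(t) reproduces the open-loop optimal control -\mathbf{R}^{-1}\mathbf{B}^T\mathbf{\Psi}(t).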

  • Optimal state trajectories: integrating equation (4) between 0 and t (\forall\;t\in [0,\; t_f[) leads to (first n rows):

 \mathbf{x}(t)=e^{\mathbf{H}t}_{11}\,\mathbf{x}_0+e^{\mathbf{H}t}_{12}\,\mathbf{\Psi}(0)= \left(e^{\mathbf{H}t}_{11} - e^{\mathbf{H}t}_{12}\left[e^{\mathbf{H}t_f}_{12}\right]^{-1}\,e^{\mathbf{H}t_f}_{11}\right)\,\mathbf{x}_0= \mathbf{\Phi}(t_f,t)\,\mathbf{x}_0\;.

where:

\mathbf{\Phi}(t_f,t)=e^{\mathbf{H}t}_{11} - e^{\mathbf{H}t}_{12}\left[e^{\mathbf{H}t_f}_{12}\right]^{-1}\,e^{\mathbf{H}t_f}_{11}

is called the transition matrix.
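The transition matrix has two easy-to-check limit values: \mathbf{\Phi}(t_f,0)=\mathbf{I}_n (the trajectory starts at \mathbf{x}_0) and \mathbf{\Phi}(t_f,t_f)=\mathbf{0} (the state is driven to zero). A sketch, with the same assumed example data as above:

```python
import numpy as np
from scipy.linalg import expm

# Illustrative data (assumed): double integrator, identity weights
A = np.array([[0.0, 1.0], [0.0, 0.0]])
B = np.array([[0.0], [1.0]])
Q = np.eye(2)
R = np.array([[1.0]])
tf, n = 2.0, 2
H = np.block([[A, -B @ np.linalg.inv(R) @ B.T], [-Q, -A.T]])
E = expm(H * tf)

def Phi(t):
    """Phi(tf, t) = e11(t) - e12(t) [e12(tf)]^{-1} e11(tf)."""
    G = expm(H * t)
    return G[:n, :n] - G[:n, n:] @ np.linalg.solve(E[:n, n:], E[:n, :n])
```

At t=0, e^{\mathbf{H}0}_{12}=\mathbf{0} so \mathbf{\Phi} reduces to the identity; at t=t_f the two terms cancel exactly.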

  • Optimal performance index:

For any t\in [0,\; t_f[ and a current state \mathbf{x}, one can define the cost-to-go function (or value function) \mathcal{R}(\mathbf{x},t) as:

\mathcal{R}(\mathbf{x},t)=\frac{1}{2}\int_t^{t_f} (\mathbf{x}^T\mathbf{Q}\mathbf{x}+ \mathbf{u}^T\mathbf{R}\mathbf{u})d\tau

and the optimal cost-to-go function as:

\widehat{\mathcal{R}}(\mathbf{x},t)=\frac{1}{2}\int_t^{t_f} (\mathbf{x}^T\mathbf{Q}\mathbf{x}+ \widehat{\mathbf{u}}^T\mathbf{R}\widehat{\mathbf{u}})d\tau

\widehat{\mathcal{R}}(\mathbf{x},t)=\frac{1}{2}\int_t^{t_f} (\mathbf{x}^T\mathbf{Q}\mathbf{x}+ \mathbf{\Psi}^T\mathbf{B}\mathbf{R}^{-1}\mathbf{B}^T\mathbf{\Psi})d\tau\;.

From equation (4), one can derive:

 \mathbf{B}\mathbf{R}^{-1}\mathbf{B}^T\mathbf{\Psi}=\mathbf{A}\mathbf{x}-\dot{\mathbf{x}}\;,

\mathbf{Q}\mathbf{x}=- \mathbf{A}^T\mathbf{\Psi}-\dot{\mathbf{\Psi}}

Thus (after simplification):

\widehat{\mathcal{R}}(\mathbf{x},t)=\frac{1}{2}\int_t^{t_f} (-\mathbf{x}^T\dot{\mathbf{\Psi}}-\mathbf{\Psi}^T\dot{\mathbf{x}}) d\tau=-\frac{1}{2}\int_t^{t_f}\frac{d\,(\mathbf{x}^T\mathbf{\Psi})}{d\tau}d\tau=\frac{1}{2}\mathbf{x}^T(t)\mathbf{\Psi}(t)\;\;(\mbox{since}\; \mathbf{x}(t_f)=0)

Thus:

\widehat{\mathcal{R}}(\mathbf{x},t)=\frac{1}{2}\mathbf{x}^T(t)\mathbf{P}(t)\mathbf{x}(t)

From this last equation, one can recover the definition of the costate \mathbf{\Psi} used to solve the Hamilton–Jacobi–Bellman equation, i.e. the gradient of the optimal cost-to-go function w.r.t. \mathbf{x}:

 \mathbf{\Psi}(t)=\frac{\partial \widehat{\mathcal{R}}(\mathbf{x},t)}{\partial \mathbf{x}}

The optimal performance index is: \widehat{J}=\widehat{\mathcal{R}}(\mathbf{x}_0,0)=\frac{1}{2}\mathbf{x}^T_0\mathbf{P}(0)\mathbf{x}_0.
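The closed-form cost \widehat{J}=\frac{1}{2}\mathbf{x}_0^T\mathbf{P}(0)\mathbf{x}_0 can be cross-checked by integrating the running cost along the optimal state/costate trajectory. A sketch with the same assumed example data (double integrator, identity weights, t_f=2), using a simple trapezoidal rule:

```python
import numpy as np
from scipy.linalg import expm

# Illustrative data (assumed): double integrator, identity weights
A = np.array([[0.0, 1.0], [0.0, 0.0]])
B = np.array([[0.0], [1.0]])
Q = np.eye(2)
R = np.array([[1.0]])
Rinv = np.linalg.inv(R)
tf, n = 2.0, 2
x0 = np.array([1.0, 0.0])
H = np.block([[A, -B @ Rinv @ B.T], [-Q, -A.T]])
E = expm(H * tf)
P0 = -np.linalg.solve(E[:n, n:], E[:n, :n])

# Closed-form optimal cost
J_hat = 0.5 * x0 @ P0 @ x0

# Numerical check: integrate (1/2)(x'Qx + u'Ru) along the optimal
# trajectory, with u = -R^{-1} B' Psi from equation (2)
ts = np.linspace(0.0, tf, 2001)
z0 = np.concatenate([x0, P0 @ x0])
integrand = []
for t in ts:
    z = expm(H * t) @ z0
    x, psi = z[:n], z[n:]
    u = -Rinv @ B.T @ psi
    integrand.append(0.5 * (x @ Q @ x + u @ R @ u))
integrand = np.array(integrand)
J_num = np.sum(0.5 * (integrand[:-1] + integrand[1:]) * np.diff(ts))
```

The integrated cost should agree with \frac{1}{2}\mathbf{x}_0^T\mathbf{P}(0)\mathbf{x}_0 up to the quadrature error of the trapezoidal rule.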

Exercises

  • Exo #1: show that \mathbf{P}(t) is the solution of the matrix Riccati differential equation:

 \dot{\mathbf{P}}=-\mathbf{P}\mathbf{A}-\mathbf{A}^T\mathbf{P}+\mathbf{P}\mathbf{B}\mathbf{R}^{-1}\mathbf{B}^T\mathbf{P}-\mathbf{Q}

also written as:

 \dot{\mathbf{P}}=\left[-\mathbf{P}\;\;\mathbf{I}_n\right]\mathbf{H}\left[\begin{array}{c}\mathbf{I}_n \\ \mathbf{P} \end{array}\right]\;.
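This is not a proof of Exo #1, but the Riccati equation can be checked numerically: compare a finite-difference estimate of \dot{\mathbf{P}} with the right-hand side, again on the assumed double-integrator example:

```python
import numpy as np
from scipy.linalg import expm

# Illustrative data (assumed): double integrator, identity weights
A = np.array([[0.0, 1.0], [0.0, 0.0]])
B = np.array([[0.0], [1.0]])
Q = np.eye(2)
R = np.array([[1.0]])
Rinv = np.linalg.inv(R)
tf, n = 2.0, 2
H = np.block([[A, -B @ Rinv @ B.T], [-Q, -A.T]])

def P(t):
    """P(t) from the partitioned matrix exponential, t < tf."""
    F = expm(H * (tf - t))
    return -np.linalg.solve(F[:n, n:], F[:n, :n])

t, h = 1.0, 1e-5
Pdot_fd = (P(t + h) - P(t - h)) / (2 * h)   # central difference
Pt = P(t)
Pdot_ric = -Pt @ A - A.T @ Pt + Pt @ B @ Rinv @ B.T @ Pt - Q
```

The two estimates of \dot{\mathbf{P}}(t) should agree up to the O(h^2) finite-difference error.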

  • Exo #2: considering now that \mathbf{x}(t_f)=\mathbf{x}_f\neq \mathbf{0}, compute the time-varying state feedback gain \mathbf{K}(t) and the time-varying feedforward gain \mathbf{H}(t) of the optimal closed-loop control law to be implemented according to the following Figure.