Robust-Adaptive Control of Linear Systems: beyond Quadratic Costs


Edouard Leurent,
Denis Efimov,
Odalric-Ambrym Maillard

Motivation

Most RL algorithms rely on trial and error

  • Random exploration
  • Optimism in the face of uncertainty


Neither is suitable for safety-critical applications. Instead, we want to:

  • Act with pessimism in the face of uncertainty
  • Learn from observations to improve performance

Setting

Linear dynamics with structured uncertainty

$ \dot{x}(t) = \color{orange}{A(\theta)}x(t) + Bu(t) + \omega(t), $ $ \quad\text{ where }\color{orange}{A(\theta)} = A + \sum_{i=1}^d\color{orange}{\theta_i}\phi_i, $

$A$, $B$ and the features $\phi_1,\dots,\phi_d$ are known, the parameter $\color{orange}{\theta}$ is unknown, and the disturbance $\omega(t)\in[\underline\omega(t),\overline\omega(t)]$ is bounded.
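For concreteness, a minimal NumPy sketch of this structured dynamics model (all names and shapes are illustrative, not from the paper's code):

```python
import numpy as np

def A_of_theta(A, phis, theta):
    """Structured uncertainty: A(theta) = A + sum_i theta_i * phi_i."""
    return A + sum(t * phi for t, phi in zip(theta, phis))

def dynamics(x, u, theta, omega, A, B, phis):
    """Continuous-time state derivative: dx/dt = A(theta) x + B u + omega."""
    return A_of_theta(A, phis, theta) @ x + B @ u + omega
```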

Robust control framework

  1. Build a confidence region for the dynamics \[\mathbb{P}\left[\color{orange}{\theta}\in\color{crimson}{\mathcal{C}_{N,\delta}}\right]\geq 1-\delta\]
  2. Plan robustly against the worst-case outcome $\color{crimson}{V^r}$ \[\color{limegreen}{\sup_{u}} \underbrace{\color{crimson}{\inf_{\substack{\theta \in \mathcal{C}_{N,\delta}\\ \omega\in[\underline\omega,\overline\omega]}}} \mathbb{E} \left[\sum_{n=0}^\infty \gamma^n R(x(t_n)) \middle| \color{limegreen}{u}, \color{crimson}{{\theta}}, \color{crimson}{\omega}\right]}_{\color{crimson}{V^r(u)}}\]
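To make the objective concrete, here is a rough sketch that approximates the inner infimum over finitely many candidate parameters and disturbance sequences (the `step` simulator and all names are hypothetical):

```python
import numpy as np

def robust_value(u_seq, x0, step, reward, thetas, omega_seqs, gamma):
    """Approximate the robust value V^r(u): the worst-case discounted return
    over finite candidate sets of parameters and disturbance sequences."""
    worst = np.inf
    for theta in thetas:                     # candidates from C_{N,delta}
        for omega_seq in omega_seqs:         # candidate disturbance signals
            x, ret = x0, 0.0
            for n, u in enumerate(u_seq):
                ret += gamma ** n * reward(x)
                x = step(x, u, theta, omega_seq[n])  # simulate one step
            worst = min(worst, ret)
    return worst

# Robust planning maximizes this pessimistic value over candidate controls:
#   u_star = max(candidate_controls, key=lambda u: robust_value(u, x0, ...))
```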

Related work

  • Robust Dynamic Programming → finite $\mathcal{S}$
  • Quadratic costs (LQ) → stabilization only


In contrast, we only require the reward function $R$ to be bounded.

Algorithm

1. Model Estimation

Confidence ellipsoid $\color{crimson}{\mathcal{C}_{N,\delta}}$ from (Abbasi-Yadkori et al., 2011): \[\small \mathbb{P}\left[\|\color{orange}{\theta}-\color{crimson}{\theta_{N}}\|_{\color{crimson}{G_{N}}} \leq \color{crimson}{\beta_N(\delta)}\right]\geq 1-\delta\]
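Here $\theta_N$ and the Gram matrix $G_N$ come from regularized least squares on observed transitions. A rough NumPy sketch, assuming direct (noisy) derivative observations `dxs` (the paper works with sampled state measurements; all names here are hypothetical):

```python
import numpy as np

def estimate(xs, dxs, us, A, B, phis, lam=1.0):
    """Regularized least squares for theta in:
    dx - A x - B u = sum_i theta_i (phi_i x) + omega."""
    d = len(phis)
    # Feature matrix: one row per (sample, state dimension) pair.
    Phi = np.concatenate([np.stack([phi @ x for phi in phis], axis=1) for x in xs])
    y = np.concatenate([dx - A @ x - B @ u for x, dx, u in zip(xs, dxs, us)])
    G_N = Phi.T @ Phi + lam * np.eye(d)        # regularized design (Gram) matrix
    theta_N = np.linalg.solve(G_N, Phi.T @ y)  # center of the confidence ellipsoid
    return theta_N, G_N
```

The confidence region is then the ellipsoid $\color{crimson}{\mathcal{C}_{N,\delta}} = \{\theta : \|\theta-\theta_N\|_{G_N} \leq \beta_N(\delta)\}$, with radius $\beta_N(\delta)$ given by the self-normalized bound above.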

2. Interval Prediction

Propagate the uncertainty $\color{crimson}{\theta\in\mathcal{C}_{N,\delta}}$ through time and bound the reachable states \[\color{lightskyblue}{\underline x(t)} \leq x(t) \leq \color{lightskyblue}{\overline x(t)}\]

(Leurent et al., 2019) \[ \scriptsize \begin{aligned} \color{lightskyblue}{\dot{\underline{x}}(t)} & = \color{crimson}{A(\theta_N)}\color{lightskyblue}{\underline{x}(t)}-\color{crimson}{\Delta A_{+}}\underline{x}^{-}(t)-\color{crimson}{\Delta A_{-}}\overline{x}^{+}(t) +Bu(t)+D^{+}\underline{\omega}(t)-D^{-}\overline{\omega}(t),\\ \color{lightskyblue}{\dot{\overline{x}}(t)} & = \color{crimson}{A(\theta_N)}\color{lightskyblue}{\overline{x}(t)}+\color{crimson}{\Delta A_{+}}\overline{x}^{+}(t)+\color{crimson}{\Delta A_{-}}\underline{x}^{-}(t) +Bu(t)+D^{+}\overline{\omega}(t)-D^{-}\underline{\omega}(t), \\ \end{aligned} \]
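A minimal Euler discretization of this predictor, assuming elementwise bounds $A(\theta_N) \pm \Delta A$ on the uncertain dynamics (a sketch, not the paper's code):

```python
import numpy as np

def pos(M):  # elementwise positive part, M^+ = max(M, 0); note M^- = pos(-M)
    return np.maximum(M, 0.0)

def interval_step(x_lo, x_hi, u, w_lo, w_hi, A_N, dA, B, D, dt):
    """One Euler step of the interval predictor: propagates bounds
    x_lo(t) <= x(t) <= x_hi(t) under A(theta) in [A_N - dA, A_N + dA]."""
    dA_p, dA_m = pos(dA), pos(-dA)
    D_p, D_m = pos(D), pos(-D)
    dx_lo = (A_N @ x_lo - dA_p @ pos(-x_lo) - dA_m @ pos(x_hi)
             + B @ u + D_p @ w_lo - D_m @ w_hi)
    dx_hi = (A_N @ x_hi + dA_p @ pos(x_hi) + dA_m @ pos(-x_lo)
             + B @ u + D_p @ w_hi - D_m @ w_lo)
    return x_lo + dt * dx_lo, x_hi + dt * dx_hi
```

Applying `interval_step` repeatedly along a candidate control sequence yields the interval bounds used by the planner below.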

2. Interval Prediction (stability)

[Figure: interval predictions of a naive solution vs. our predictor]

3. Pessimistic planning

Use the predicted intervals in a pessimistic surrogate objective \[\small \color{orange}{\hat{V}^r(u)} = \sum_{n=N+1}^\infty \gamma^n \color{orange}{\min_{\color{lightskyblue}{\underline{x}(t_n)}\leq x \leq\color{lightskyblue}{\overline{x}(t_n)}} R(x)} \]
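The inner minimum is tractable for simple rewards. For instance, for a linear reward $R(x) = c^\top x$ (an illustrative special case only; the paper merely assumes $R$ bounded), the minimum over each predicted box is attained at a corner:

```python
import numpy as np

def box_min_linear(c, x_lo, x_hi):
    """Exact min of R(x) = c @ x over the box [x_lo, x_hi] (attained at a corner)."""
    return float(np.where(c >= 0, c * x_lo, c * x_hi).sum())

def surrogate_value(c, intervals, gamma):
    """Pessimistic surrogate value: discounted sum of worst-case rewards over the
    predicted boxes (the constant discount offset gamma^{N+1} is dropped)."""
    return sum(gamma ** n * box_min_linear(c, lo, hi)
               for n, (lo, hi) in enumerate(intervals))
```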

Results

Theorem (Lower bound)

\[\color{orange}{\underbrace{\hat{V}^r(u)}_{\substack{\text{surrogate}\\\text{value}}}} \leq \color{crimson}{\underbrace{{V}^r(u)}_{\substack{\text{robust}\\\text{value}}}} \leq \color{limegreen}{\underbrace{{V}(u)}_{\substack{\text{true}\\\text{performance}}}}\]

Bounded suboptimality

Theorem. Under two conditions:

  1. Lipschitz reward $R$;
  2. Stability condition (a numerical feasibility check is sketched after the theorem): there exist $P>0,\,Q_0,\,\rho,\,N_0$ such that \[\forall \color{orange}{N}>N_0,\quad\begin{bmatrix} \color{orange}{A({\theta}_{N})}^\top P + P \color{orange}{A({\theta}_{N})} + Q_0 & P|D| \\ |D|^\top P & -\rho I_r \\ \end{bmatrix}< 0;\]

with probability at least $1-\delta$, \[ \underbrace{V(a_\star) - V(a_K)}_{\substack{\text{suboptimality}}} \leq \color{crimson}{\underbrace{\Delta_\omega}_{\substack{\text{robustness to}\\ \text{disturbances}}}} + \color{lightskyblue}{\underbrace{\mathcal{O}\left(\frac{\beta_N(\delta)^2}{\lambda_{\min}(G_{N,\lambda})}\right)}_{\text{estimation error}}} + \color{limegreen}{\underbrace{\mathcal{O}\left(K^{-\frac{\log 1/\gamma}{\log \kappa}}\right)}_{\text{planning error}}} \] where $a_K$ is the action returned with a planning budget of $K$ expansions and $\kappa$ is the effective branching factor of the planning tree.
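Condition 2 is a linear matrix inequality in $(P,\rho)$, so its feasibility can be checked numerically. Below is a minimal sketch using CVXPY (an assumed dependency, not the paper's code); $Q_0$ is fixed to a small multiple of the identity and the strict inequalities are enforced with a margin `eps`:

```python
import cvxpy as cp
import numpy as np

def lmi_feasible(A_N, D_abs, eps=1e-6):
    """Check existence of P > 0, rho > 0 satisfying the stability LMI."""
    n, r = A_N.shape[0], D_abs.shape[1]
    P = cp.Variable((n, n), symmetric=True)
    rho = cp.Variable(nonneg=True)
    Q0 = eps * np.eye(n)  # a fixed small Q0 > 0 suffices for feasibility
    M = cp.bmat([[A_N.T @ P + P @ A_N + Q0, P @ D_abs],
                 [D_abs.T @ P,              -rho * np.eye(r)]])
    problem = cp.Problem(cp.Minimize(0),
                         [P >> eps * np.eye(n), M << -eps * np.eye(n + r)])
    problem.solve(solver=cp.SCS)
    return problem.status == cp.OPTIMAL
```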

Asymptotic Near-optimality

Corollary. Under an additional persistent excitation (PE) assumption: \[\exists \underline{\phi},\overline{\phi}>0: \forall n\geq n_0,\quad \underline{\phi}^2 \leq \lambda_{\min}(\Phi_{n}^\top\Sigma_{p}^{-1}\Phi_{n}) \leq \overline{\phi}^2,\] the stability condition 2 only needs to hold at the true parameter $\color{orange}{\theta}$: $$\begin{bmatrix} \color{orange}{A(\theta)}^\top P + P \color{orange}{A(\theta)} + Q_0 & P|D| \\ |D|^\top P & -\rho I_r \\ \end{bmatrix}< 0,$$ and the bound takes the more explicit form

\[V(a_\star) - V(a_K) \leq \color{crimson}{\Delta_\omega} + \color{lightskyblue}{\mathcal{O}\left(\frac{\log\left(N^{d/2}/\delta\right)}{N}\right)} + \color{limegreen}{\mathcal{O}\left(K^{-\frac{\log 1/\gamma}{\log \kappa}}\right)}\]

Experiments

Obstacle avoidance

Estimated dynamics $\color{orange}{A({\theta}_{N})}$

Obstacle avoidance

Robust planning with $\color{crimson}{\mathcal{C}_{N,\delta}}$

Obstacle avoidance (2)

Estimated dynamics $\color{orange}{A({\theta}_{N})}$

Obstacle avoidance (2)

Robust planning with $\color{crimson}{\mathcal{C}_{N,\delta}}$

Results

Agent          failures  min return  avg return $\pm$ std
Oracle         $0\%$     $11.6$      $14.2 \pm 1.3$
Nominal        $4\%$     $2.8$       $13.8 \pm 2.0$
DQN (trained)  $6\%$     $1.7$       $12.3 \pm 2.5$
Robust         $0\%$     $10.4$      $13.0 \pm 1.5$

Empirical suboptimality

Multi-model extension

What if our modelling assumption
$\color{orange}{A(\theta)} = \color{lightskyblue}{A} + \sum_{i=1}^d\color{orange}{\theta_i}\color{lightskyblue}{\phi_i}$ is wrong?

  • The adequacy of a model $\color{lightskyblue}{(A,\phi)}$ can be evaluated from data
  • Use multiple models $\color{lightskyblue}{(A^m,\phi^m)}$ through a robust selection mechanism (sketched below)
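A minimal sketch of such a mechanism (all names hypothetical): evaluate the candidate behaviour under each model's interval predictor and keep the worst value, so the plan is safe under every model the data has not ruled out.

```python
def robust_selection_value(u, models, value_under_model):
    """Pessimistic value of a control u across candidate models (A^m, phi^m)."""
    return min(value_under_model(m, u) for m in models)

# Planning maximizes this value, so the chosen control performs acceptably
# under every candidate model retained by the data.
```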

Driving

github.com/eleurent/highway-env/

Driving (2)

github.com/eleurent/highway-env/

Driving (3)

github.com/eleurent/highway-env/

Results

Agent          failures  min return  avg return $\pm$ std
Oracle         $0\%$     $6.9$       $7.4 \pm 0.5$
Nominal 1      $4\%$     $5.2$       $7.3 \pm 1.5$
Nominal 2      $33\%$    $3.5$       $6.4 \pm 0.3$
DQN (trained)  $3\%$     $5.4$       $6.3 \pm 0.6$
Robust         $0\%$     $6.8$       $7.1 \pm 0.3$

Thank You!


Code available at
eleurent.github.io/robust-beyond-quadratic

I am looking for a postdoctoral position.