kosuke55/control-playground

control-playground

Comparison of Control Algorithms on a Double Inverted Pendulum

(Demo video) NonlinearMPCCasADi_.-180.0_0.0_dt-0.02-0.02_dead-0.0_f-0.0-0.0_n-False_e-False_q-False-72.mp4

This repository implements a double inverted pendulum simulator and compares the following control algorithms:

  • PID
  • Pole Placement
  • LQR
  • iLQR
  • Linear MPC
  • Nonlinear MPC

Setup

Install rye by following the official installation guide.

https://rye.astral.sh/guide/installation/

curl -sSf https://rye.astral.sh/get | bash

Run

Set up a virtual environment with rye.

source .rye/env
rye sync
source .venv/bin/activate
python control_double_inverted_pendulum.py

The default controller_mode is LQR. You can change the controller mode by modifying the following line.

controller_mode = "LQR"

Also, if you set the following option to True, the selected controller will run the simulation only once from the given initial angle state.

run_once = True

If you set this to False, you can run control repeatedly from various initial values to find the ROA (Region of Attraction).

ROA

Troubleshooting

If matplotlib does not display properly in the virtual environment, run the following command.

sudo apt-get install python3-tk

Detailed Article

The goal is not only to observe control response graphs, but to experience the full workflow of control engineering, from physical system modeling to controller implementation, and to develop practical intuition for each control method.

The double inverted pendulum with single-axis torque input was chosen as the target system because it is an unstable system with considerable control difficulty, making the performance differences between controllers more visible.

  • Unstable system: A system that naturally departs from its equilibrium point without control input (e.g., an inverted pendulum falls over). Continuous and appropriate control input is required to maintain a specific state (e.g., standing upright).
  • Stable system: A system where the error does not grow when control input is removed (e.g., a vehicle stops in place during path tracking). The system state remains within a bounded range.


Physical System Modeling

Modeling a physical system using the Lagrangian equations of motion from analytical mechanics is one of the major approaches. The equations of motion can be derived from kinetic energy and potential energy.

The desired form is the state-space representation where higher-order derivatives are expressed in terms of lower-order ones:

$$ \dot{x} = f(x, u) $$

For a single-axis torque-controlled double inverted pendulum, the state vector is $x = [\theta_1, \theta_2, \dot{\theta}_1, \dot{\theta}_2]$ and the control input is $u = \tau$.

This form allows us to compute the infinitesimal change when input $u$ is applied at the current state $x$, making it possible to calculate the next state.
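As a minimal sketch of this idea (with a hypothetical toy dynamics function, not the pendulum model), the next state can be computed by numerically integrating $\dot{x} = f(x, u)$, here with a simple forward Euler step:

```python
import numpy as np

def euler_step(f, x, u, dt):
    """One integration step of x_dot = f(x, u): x_next = x + f(x, u) * dt."""
    return x + f(x, u) * dt

# Toy first-order dynamics for illustration (not the pendulum): x_dot = -x + u
f = lambda x, u: -x + u

x = np.array([1.0])
for _ in range(1000):
    x = euler_step(f, x, np.array([0.0]), 0.001)
# With u = 0 the state decays toward exp(-1) ~= 0.368 after 1 second
```

The simulator in this repository uses the same principle with a higher-order (Runge-Kutta) integrator for accuracy.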

The simulator always requires a model to reproduce the system's behavior when a control input is applied. On the controller side, whether a model is needed depends on the method:

  • PID control does not require a model; it performs feedback control based on the error between observed and target values.
  • MPC requires a model to predict future behavior and compute optimal control inputs.
  • LQR requires a linear state-space model of the system.

$$ \dot{x}(t) = A x(t) + B u(t) $$

In this project, the simulator's model is assumed to be known, and the controllers use the same (or nearly identical) model. In real-world control, if the physical model is unknown, system identification would be needed to estimate it from input/output measurements.

The following articles were helpful references for modeling. They derive equations of motion for wheeled and cart-type inverted pendulums using SymPy:

For linearization using SymPy (needed for linear controllers like LQR):

Simple Double Pendulum

To build intuition, let's start with a simplified model where the links have no mass β€” only the joints have mass. No control input is considered. The following PDF was a useful reference: http://www.phys.lsu.edu/faculty/gonzalez/Teaching/Phys7221/DoublePendulum.pdf

Simple Double Pendulum Model

The equations of motion derived from the Lagrangian are:

$$ (m_1 + m_2)l_1\ddot{\theta}_1 + m_2l_2\ddot{\theta}_2\cos(\theta_2 - \theta_1) = m_2l_2\dot{\theta}_2^2\sin(\theta_2 - \theta_1) - (m_1 + m_2)g\sin\theta_1 $$

$$ l_2\ddot{\theta}_2 + l_1\ddot{\theta}_1\cos(\theta_2 - \theta_1) = -l_1\dot{\theta}_1^2\sin(\theta_2 - \theta_1) - g\sin\theta_2 $$

Solving these simultaneous equations for $\ddot{\theta}_1$ and $\ddot{\theta}_2$ using SymPy yields surprisingly complex expressions even for this simple model.

For the inverted pendulum, we want the upright equilibrium as the target state, so we define $\theta_1 = 0, \theta_2 = 0$ as the upright position with the y-axis positive upward. Adding control input and re-deriving the Lagrangian equations:

import sympy as sp
from sympy.physics.mechanics import dynamicsymbols

t = sp.symbols('t')
theta1 = dynamicsymbols('theta1')
theta2 = dynamicsymbols('theta2')
l1, l2, m1, m2, g = sp.symbols('L1 L2 m1 m2 g')

x1 = l1 * sp.sin(theta1)
y1 = l1 * sp.cos(theta1)
x2 = l1 * sp.sin(theta1) + l2 * sp.sin(theta2)
y2 = l1 * sp.cos(theta1) + l2 * sp.cos(theta2)

T1 = m1 * (sp.diff(x1, t)**2 + sp.diff(y1, t)**2) / 2
T2 = m2 * (sp.diff(x2, t)**2 + sp.diff(y2, t)**2) / 2
T = T1 + T2

V1 = m1 * g * y1
V2 = m2 * g * y2
V = V1 + V2

L = T - V

lagrange_eq1 = sp.diff(sp.diff(L, sp.diff(theta1, t)), t) - sp.diff(L, theta1)
tau = sp.symbols('tau')
lagrange_eq1 = lagrange_eq1 - tau
lagrange_eq2 = sp.diff(sp.diff(L, sp.diff(theta2, t)), t) - sp.diff(L, theta2)

solutions = sp.solve([lagrange_eq1, lagrange_eq2], [theta1.diff(t, 2), theta2.diff(t, 2)])

This gives us the state-space form:

$$\begin{bmatrix} \dot{\theta}_1 \\ \dot{\theta}_2 \\ \ddot{\theta}_1 \\ \ddot{\theta}_2 \end{bmatrix} = \begin{bmatrix} \dot{\theta}_1 \\ \dot{\theta}_2 \\ f_1(\theta_1, \theta_2, \dot{\theta}_1, \dot{\theta}_2) \\ f_2(\theta_1, \theta_2, \dot{\theta}_1, \dot{\theta}_2) \end{bmatrix}$$

Linearizing around $\theta_1 = 0, \theta_2 = 0$ yields the state-space matrices $A$ and $B$, which can be used for linear controllers like LQR.
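The linearization itself is a Jacobian evaluation at the equilibrium. A minimal sketch on a single pendulum (a toy stand-in for the double pendulum, with unit mass and length assumed); the same recipe applies to the full model:

```python
import sympy as sp

# State x = [theta, theta_dot], input u = tau; upright equilibrium at theta = 0
theta, theta_dot, tau, g = sp.symbols('theta theta_dot tau g')
x = sp.Matrix([theta, theta_dot])
u = sp.Matrix([tau])

# Nonlinear dynamics x_dot = f(x, u) for an inverted pendulum (unit m, l)
f = sp.Matrix([theta_dot, g * sp.sin(theta) + tau])

# A = df/dx and B = df/du evaluated at the equilibrium
A = f.jacobian(x).subs({theta: 0, theta_dot: 0, tau: 0})
B = f.jacobian(u).subs({theta: 0, theta_dot: 0, tau: 0})
# A = [[0, 1], [g, 0]], B = [[0], [1]]
```

The positive entry $g$ in $A$ reflects the instability of the upright equilibrium: without feedback, perturbations grow.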

Double Pendulum with Link Mass and Friction

A more realistic model considers link mass, moments of inertia, viscous friction, and kinetic friction. Based on the equations from Learning Control Engineering with Inverted Pendulums (Equations 3.54, 3.55), with added kinetic friction:

# Simultaneous equations with friction (SymPy Eq; sign() is sympy.sign,
# and alpha*, mu*, f*, theta* are symbols defined beforehand)
eq1 = Eq(
    alpha1 * theta1_ddot + alpha3 * cos(theta12) * theta2_ddot,
    -alpha3 * theta2_dot**2 * sin(theta12)
    + alpha4 * sin(theta1)
    - (mu1 + mu2) * theta1_dot + mu2 * theta2_dot
    + tau1 - sign(theta1_dot) * f1
)

eq2 = Eq(
    alpha3 * cos(theta12) * theta1_ddot + alpha2 * theta2_ddot,
    alpha3 * theta1_dot**2 * sin(theta12)
    + alpha5 * sin(theta2)
    + mu2 * theta1_dot - mu2 * theta2_dot
    - sign(theta2_dot - theta1_dot) * f2
)

Where:

  • $\alpha_1, \alpha_2, \alpha_3, \alpha_4, \alpha_5$ are common constants derived from mass, link length, center of gravity position, moment of inertia, and gravity
  • $\mu_1, \mu_2$ are viscous friction coefficients
  • $f_1, f_2$ are kinetic friction coefficients

The kinetic friction introduces discontinuous sign() terms, which makes differentiation-based linearization fail. To linearize, we remove the discontinuous terms first and then compute the state-space matrices around $\theta_1 = 0, \theta_2 = 0$.

The resulting DoubleInvertedPendulumDynamics class contains both nonlinear and linearized dynamics. The simulator uses Runge-Kutta numerical integration to update the state at each time step.

Additional simulator features:

  • Observation noise
  • Observation quantization
  • Controller/simulation period mismatch
  • Control input delay

Controller Implementation and Evaluation

The controllers attempt to stabilize the double inverted pendulum from various initial states to the upright target state ($\theta_1 = 0, \theta_2 = 0$).

The initial states $\theta_1, \theta_2$ are varied from 0 to 90 degrees at regular intervals. The resulting success/failure grid represents the ROA (Region of Attraction), an indicator of control performance.

ROA Example (LQR)

Success criteria: The simulation runs for 10 seconds. If $|\theta_1| + |\theta_2| < 5°$ is maintained for at least 1 second, the trial is considered successful.
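This criterion can be sketched as follows (a hypothetical helper, not the repository's implementation; angle trajectories in degrees, sampled at the simulation period):

```python
import numpy as np

def is_success(theta1, theta2, dt, thresh_deg=5.0, hold_sec=1.0):
    """Success if |theta1| + |theta2| < thresh_deg holds for hold_sec consecutively."""
    ok = np.abs(theta1) + np.abs(theta2) < thresh_deg
    need = int(round(hold_sec / dt))
    run = 0
    for flag in ok:
        run = run + 1 if flag else 0  # count consecutive in-threshold samples
        if run >= need:
            return True
    return False

t = np.arange(0, 10, 0.02)
theta1 = 30.0 * np.exp(-t)       # decays below 5 degrees and stays there
theta2 = np.zeros_like(t)
# is_success(theta1, theta2, 0.02) -> True
```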

Note: The following implementations are simplified and not fully optimized. They are intended to provide a rough comparison of each method's performance. There is significant room for improvement in both logic and computation speed.

PID

PID control (Proportional-Integral-Derivative Control) is said to be used in about 90% of control systems in the world. For example, Autoware (an autonomous driving OSS) uses PID control for speed control: https://autowarefoundation.github.io/autoware.universe/main/control/autoware_pid_longitudinal_controller/

PID generates control output based on the current error (difference between desired and actual values):

$$ u(t) = K_p \cdot e(t) + K_i \cdot \int_0^t e(\tau) \, d\tau + K_d \cdot \frac{d}{dt}e(t) $$

  • Proportional (P): Output proportional to the error. Gain $K_p$ determines the response strength.
  • Integral (I): Considers the accumulated past error. Eliminates steady-state error. Gain $K_i$.
  • Derivative (D): Considers the rate of change of error. Prevents overshoot and smooths the response. Gain $K_d$.

import numpy as np

class PIDController(Controller):
    def __init__(self, Kp, Ki, Kd, dt):
        super().__init__(dt)
        self.Kp = Kp
        self.Ki = Ki
        self.Kd = Kd
        self.integral = 0.0
        self.previous_error = 0.0

    def control(self, state, desired_state, t):
        error = desired_state - state
        self.integral += error * self.dt
        derivative = (error - self.previous_error) / self.dt
        self.u = np.array([
            np.dot(self.Kp, error)
            + np.dot(self.Ki, self.integral)
            + np.dot(self.Kd, derivative)
        ])
        self.previous_error = error
        return self.u
Result: Despite extensive gain tuning, simple PID could not even maintain $\theta_1 = 1°, \theta_2 = 0°$, let alone stabilize from a tilted state.

PID Failure

This system is SIMO (Single Input Multiple Output): one torque input controls multiple outputs ($\theta_1, \dot{\theta}_1, \theta_2, \dot{\theta}_2$). PID is primarily designed for SISO (Single Input Single Output) systems and is not suitable for this kind of complex system without a more sophisticated multi-loop design.

ROA: Only succeeds from $\theta_1 = 0, \theta_2 = 0$ (already upright with no disturbance), so effectively 0% success.

ROA PID

Pole Placement

Pole placement designs a state feedback gain $K$ to place the closed-loop system's eigenvalues (poles) at desired locations. The control input is $u = -K(x - x_{des})$.

The gain $K$ is computed such that the eigenvalues of $(A - BK)$ are at the desired positions. When all poles have negative real parts, the system is stable. The following article provides a clear explanation of pole positions and response characteristics: https://controlabo.com/pole/

Pole Placement in Complex Plane

class PolePlacementController(Controller):
    def __init__(self, A, B, desired_poles, dt):
        super().__init__(dt)
        self.K = control.place(A, B, desired_poles)

    def control(self, state, desired_state, t):
        self.u = -self.K @ (state - desired_state)
        return self.u

The control.place() function from python-control (wrapping scipy.signal.place_poles()) computes the gain in one line: https://python-control.readthedocs.io/en/0.10.0/generated/control.place.html

Choosing appropriate pole locations is the key challenge. Generally, avoiding repeated poles improves robustness. The real part of poles is typically set as a multiple of the system's natural frequency.
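To sanity-check a choice of poles, one can verify that the closed-loop eigenvalues of $(A - BK)$ land where requested. A minimal sketch on a toy double integrator (not the pendulum model), using scipy.signal.place_poles directly:

```python
import numpy as np
from scipy.signal import place_poles

# Toy double-integrator system: x = [position, velocity], u = acceleration
A = np.array([[0.0, 1.0], [0.0, 0.0]])
B = np.array([[0.0], [1.0]])
desired = np.array([-1.0, -2.0])  # both poles in the left half-plane

K = place_poles(A, B, desired).gain_matrix
closed_loop_poles = np.linalg.eigvals(A - B @ K)
# sorted real parts should match [-2, -1]
```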

Result: With poles at [-1, -2, -3, -4], the controller successfully stabilizes from $\theta_1 = 110°, \theta_2 = 0°$ with an acrobatic swing-up:

Pole Placement Success

Changing poles to half ([-0.5, -1, -1.5, -2]) or double causes failure:

Pole Placement Failure

ROA: Succeeds from large angles. Interestingly, control fails around $\theta_1 = 80°$ but succeeds again around $110°$, a non-contiguous ROA.

ROA Pole Placement

LQR

LQR (Linear Quadratic Regulator) is an optimal control method for linear systems. For a detailed explanation, see: https://qiita.com/taka_horibe/items/5d053cdaaf4c886d27b5

It minimizes a quadratic cost function:

$$ J = \int_{0}^{\infty} \left( x(t)^T Q x(t) + u(t)^T R u(t) \right) \, dt $$

  • $Q$: Positive semi-definite weight matrix for state deviation cost
  • $R$: Positive definite weight matrix for control input cost

The optimal control input is a linear state feedback: $u(t) = -Kx(t)$, where $K$ is computed by solving the continuous algebraic Riccati equation.

import numpy as np
from numpy.linalg import inv
from scipy.linalg import solve_continuous_are

class LQRController(Controller):
    def __init__(self, A, B, Q, R, dt):
        super().__init__(dt)
        # Solve the continuous algebraic Riccati equation, then K = R^{-1} B^T X
        X = solve_continuous_are(A, B, Q, R)
        self.K = inv(R) @ B.T @ X

    def control(self, state, desired_state, t):
        self.u = -self.K @ (state - desired_state)
        return self.u

Result: Successfully stabilizes from $\theta_1 = 40°, \theta_2 = 0°$. Very simple implementation with good performance and fast computation.

LQR Success

ROA: Smaller than pole placement, since LQR uses a model linearized at 0°, which becomes inaccurate far from the equilibrium.

ROA LQR

iLQR

iLQR (Iterative Linear Quadratic Regulator) extends LQR to nonlinear systems by iteratively performing linear approximations. It uses the nonlinear model directly and finds optimal control inputs through forward and backward passes. The following articles provide detailed explanations:

The cost function is:

$$ J = \sum_{t=0}^{T-1} \left( x(t)^T Q x(t) + u(t)^T R u(t) \right) + x(T)^T Q_f x(T) $$

The algorithm proceeds as follows:

  1. Initial trajectory and input setup: Start from initial state $x_0$ with an initial input sequence $\{u_0, u_1, ..., u_{N-1}\}$.
  2. Forward pass (initial trajectory): Compute state trajectory using the current input sequence.
  3. Backward pass: Quadratically approximate the cost function around states and inputs, compute feedback gains and feedforward corrections using differential dynamic programming (DDP) based on the Bellman equation.
  4. Forward pass (trajectory update): Generate new input sequence using the gains from the backward pass and update the state trajectory.
  5. Convergence check: Stop if the cost function change is sufficiently small; otherwise return to step 3.
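In the special case of linear dynamics and quadratic cost, the backward pass reduces to the finite-horizon discrete Riccati recursion. A sketch of that recursion (assumed toy discrete-time system, not the pendulum model):

```python
import numpy as np

def backward_pass_lqr(Ad, Bd, Q, R, Qf, N):
    """Finite-horizon discrete Riccati recursion: what the iLQR backward pass
    computes when dynamics are linear and cost is quadratic. Returns the
    time-varying feedback gains K_t, so that u_t = -K_t x_t."""
    P = Qf
    gains = []
    for _ in range(N):
        # K = (R + B^T P B)^{-1} B^T P A
        K = np.linalg.solve(R + Bd.T @ P @ Bd, Bd.T @ P @ Ad)
        # Riccati update for the cost-to-go matrix
        P = Q + Ad.T @ P @ (Ad - Bd @ K)
        gains.append(K)
    return gains[::-1]  # gains[t] applies at step t

# Toy discretized double integrator (dt = 0.1)
Ad = np.array([[1.0, 0.1], [0.0, 1.0]])
Bd = np.array([[0.0], [0.1]])
gains = backward_pass_lqr(Ad, Bd, np.eye(2), np.array([[1.0]]), np.eye(2), 50)
```

For the nonlinear pendulum, iLQR re-linearizes the dynamics around the current trajectory at each iteration and repeats this recursion.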

This implementation uses Bharath2/iLQR with Numba acceleration. A forked version with pip install support is available at https://github.com/kosuke55/iLQR. The nonlinear model is passed to iLQR (without the discontinuous kinetic friction terms, since iLQR relies on differentiation for optimization).

Result: With Q=diag([100, 1, 100, 1]), R=diag([0.1]), horizon of 2 seconds (N=20, horizon_dt=0.1), successfully stabilizes from $\theta_1 = 180°, \theta_2 = 20°$:

iLQR Success

ROA: Much wider than LQR since the nonlinear model handles large angles better. Total computation time is 22.97s (vs 0.3s for LQR), though there is significant room for optimization.

ROA iLQR

Linear MPC

Model Predictive Control (MPC) predicts future system behavior over a horizon and computes optimal control inputs by solving an optimization problem at each time step. Only the first input is applied, then the process repeats with updated measurements (receding horizon). For a detailed explanation, see: https://qiita.com/taka_horibe/items/47f86e02e2db83b0c570

The algorithm:

  1. Model-based prediction: Predict future states using the system model.
  2. Solve optimization problem: Minimize a cost function over the prediction horizon.
  3. Apply control input: Apply only the first computed input.
  4. Re-predict with feedback: Observe the new state and repeat.

The cost function:

$$ J = \sum_{t=0}^{T-1} \left( x(t)^T Q x(t) + u(t)^T R u(t) \right) + x(T)^T Q_f x(T) $$

Linear MPC uses a linearized model for prediction, formulating the optimization as a Quadratic Program (QP) β€” much faster and more stable than nonlinear optimization. For example, Autoware uses Linear MPC for steering control in path tracking: https://autowarefoundation.github.io/autoware.universe/main/control/autoware_mpc_lateral_controller/model_predictive_control_algorithm/

The optimization problem hierarchy: NLP $\supset$ Convex Optimization $\supset$ Conic Programming $\supset$ QP $\supset$ LP

Narrowing the problem class allows the use of specialized, faster solvers. This implementation uses CVXPY with the OSQP solver.

The continuous system is discretized via Euler method for prediction:

$$ A_d = I + A \cdot dt, \quad B_d = B \cdot dt, \quad x[k+1] = A_d x[k] + B_d u[k] $$
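Euler discretization is a first-order approximation of the exact zero-order-hold discretization $A_d = e^{A \cdot dt}$; for small $dt$ the error is $O(dt^2)$. A quick sketch comparing the two on an assumed toy matrix:

```python
import numpy as np
from scipy.linalg import expm

# Toy linearized system matrix (not the pendulum's A)
A = np.array([[0.0, 1.0], [2.0, 0.0]])
dt = 0.02

Ad_euler = np.eye(2) + A * dt   # first-order (Euler) approximation
Ad_exact = expm(A * dt)         # exact matrix-exponential discretization
err = np.max(np.abs(Ad_euler - Ad_exact))  # small, on the order of dt^2
```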

import cvxpy as cp
import numpy as np

class MPCController(Controller):
    def __init__(self, A, B, Q, R, N, dt, horizon_dt):
        super().__init__(dt)
        self.N = N
        self.Q = Q
        self.R = R
        nu = B.shape[1]
        # Euler discretization of the linearized model for prediction
        self.Ad = np.eye(A.shape[0]) + A * horizon_dt
        self.Bd = B * horizon_dt
        self.cp_U = cp.Variable((N, nu))

    def control(self, state, desired_state, t):
        cost = 0
        constraints = []
        x = state
        for i in range(self.N):
            u = self.cp_U[i, :]
            x = self.Ad @ x + self.Bd @ u
            cost += cp.quad_form(x - desired_state, self.Q) + cp.quad_form(u, self.R)
        constraints += [self.cp_U <= 100.0, self.cp_U >= -100.0]
        problem = cp.Problem(cp.Minimize(cost), constraints)
        problem.solve(solver=cp.OSQP)
        # Apply only the first computed input (receding horizon)
        self.u = self.cp_U.value[0, :]
        return self.u

Result: With Q=diag([10, 1, 10, 1]), R=diag([0.01]), N=10, horizon_dt=0.2, stabilizes from $\theta_1 = 170°, \theta_2 = 35°$. It appears to succeed partly due to the physics naturally settling the rotation at a favorable moment.

Linear MPC Success

ROA: Self-initiated swing-up works up to about $\theta_1 = 75°$. Computation time is 65.84 s, significantly slower than previous methods.

ROA Linear MPC

Nonlinear MPC

Nonlinear MPC uses the nonlinear model for prediction and solves the optimization as a Nonlinear Program (NLP). This directly handles nonlinearity, enabling control over a much wider range, though at higher computational cost and with potential convergence issues.

The implementation uses CasADi for symbolic expression and automatic differentiation. For a detailed reference on CasADi-based MPC, see Learning Model Predictive Control with Python and CasADi. The NLP solver used is IPOPT (Interior Point Optimizer), which uses the primal-dual interior point method. For an explanation of this method, see: https://qiita.com/taka_horibe/items/0c9b0993e0bd1c0135fa

import casadi as ca

class NonlinearMPCControllerCasADi(Controller):
    def __init__(self, dynamics, A, B, Q, R, N, dt, horizon_dt):
        # CasADi symbolic variables for automatic differentiation
        # (self.f is the nonlinear dynamics x_dot = f(x, t, u); nx, nu are
        # the state and input dimensions)
        self.state_sym = ca.MX.sym("state", nx)
        self.u_sym = ca.MX.sym("u", nu)
        self.dxdt_sym = self.f(self.state_sym, 0, self.u_sym)
        self.f_func = ca.Function("f_func", [self.state_sym, self.u_sym], [self.dxdt_sym])

    def predict_state(self, x, u, dt):
        # Runge-Kutta 4th order integration
        k1 = self.f_func(x, u)
        k2 = self.f_func(x + 0.5 * dt * k1, u)
        k3 = self.f_func(x + 0.5 * dt * k2, u)
        k4 = self.f_func(x + dt * k3, u)
        return x + (dt / 6.0) * (k1 + 2 * k2 + 2 * k3 + k4)

    def control(self, state, desired_state, t):
        opti = ca.Opti()
        U = opti.variable(self.N * self.nu)
        # ... minimize cost with IPOPT solver
        opti.solver("ipopt", {"ipopt.print_level": 0, "print_time": 0})

Result: With Q=diag([100, 1, 100, 1]), R=diag([0.01]), N=10, horizon_dt=0.1, successfully stabilizes even from the fully inverted state $\theta_1 = 180°, \theta_2 = 180°$:

Nonlinear MPC from 180/180

ROA: Dramatically wider; succeeds from nearly all initial states (1284/1296). Computation time is 84.08 s.

ROA Nonlinear MPC

Comparison Table

| Controller | Avg Processing Time [s] | Avg Stabilization Time [s] | ROA (success/total) |
|---|---|---|---|
| PID | 0.33 | - | 1/1296 |
| Pole Placement | 0.32 | 5.664 | 44/1296 |
| LQR | 0.30 | 2.890 | 30/1296 |
| iLQR | 22.97 | 6.690 | 73/1296 |
| Linear MPC | 65.84 | 5.070 | 77/1296 |
| Nonlinear MPC | 84.08 | 6.180 | 1284/1296 |

Despite higher computation cost, Nonlinear MPC achieves overwhelmingly the best ROA. iLQR is faster than Nonlinear MPC while covering a reasonably wide range. Pole Placement and LQR are much faster but limited in range.

Note: Average Processing Time includes the entire 10-second simulation (not just controller computation). Trials were run in parallel with multiprocessing, so per-process times are higher than single-process execution.


Making Control Harder

Kinetic Friction

Adding kinetic friction ($f_1 = 0.3, f_2 = 0.3$) to both links introduces discontinuous terms that make optimization-based control more difficult. All controllers exhibited steady-state error with friction.

LQR with friction:

LQR with Friction

To address this, an integral term was added to LQR along with friction compensation (applying a force opposite to the friction direction). The method of augmenting the state equation with an integral term is described in: https://www.flight.t.u-tokyo.ac.jp/~tsuchiya/Control/27Servo.pdf

The state equation is augmented with an integral error state:

class LQRControllerWithIntegral(Controller):
    def __init__(self, A, B, C, Q, R, dt):
        super().__init__(dt)
        n = A.shape[0]
        # Augment the state with the integral of the theta1 error
        # (including the theta2 error as well breaks controllability)
        A_aug = np.zeros((n + 1, n + 1))
        A_aug[:n, :n] = A
        A_aug[n:, :n] = -np.array([[1, 0, 0, 0]])
        B_aug = np.vstack((B, np.zeros((1, B.shape[1]))))

        # Verify controllability of the augmented system
        Wc = control.ctrb(A_aug, B_aug)
        assert np.linalg.matrix_rank(Wc) == n + 1  # must be controllable

        X = solve_continuous_are(A_aug, B_aug, Q, R)
        self.K = inv(R) @ B_aug.T @ X
        self.integral_error = 0.0

    def control(self, state, desired_state, t):
        self.integral_error += (desired_state[0] - state[0]) * self.dt
        augmented_state = np.hstack((state - desired_state, self.integral_error))
        self.u = -self.K @ augmented_state
        # Friction compensation: oppose the estimated friction torque
        # (state = [theta1, theta2, theta1_dot, theta2_dot]; f1, f2 are the
        # kinetic friction coefficients from the model)
        self.u += f1 * np.sign(state[2]) + f2 * np.sign(state[3] - state[2])
        return self.u

Controllability indicates whether the system can be driven from any state to any desired state. Including only $\theta_1$ error in the augmented state maintains controllability; including both $\theta_1$ and $\theta_2$ does not.

With the integral term, the steady-state error is eventually reduced, but overshoot occurs β€” achieving precise stabilization remains difficult:

LQR with Integral + Friction

Setting the second link's friction to $f_2 = 0$ allows successful convergence, showing that kinetic friction on the unactuated link significantly increases difficulty:

LQR with Integral + Friction (f2=0)

Control Input Delay

Dead time (delay) is added to the control input. With a 0.02 s control period and a 0.05 s delay, LQR can still stabilize from $\theta_1 = 30°$, though with visible oscillation:

LQR with 0.05s Delay
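Dead time can be simulated by buffering commanded inputs; a minimal sketch (hypothetical, not the repository's implementation, assuming the delay is rounded to an integer number of control periods):

```python
from collections import deque

class DelayedInput:
    """Applies each commanded input only after a fixed dead time."""
    def __init__(self, delay_steps, initial=0.0):
        self.buffer = deque([initial] * delay_steps)

    def apply(self, u_command):
        self.buffer.append(u_command)
        return self.buffer.popleft()  # input commanded delay_steps periods ago

# A 0.05 s delay with a 0.02 s control period is not an integer multiple;
# here it is rounded to 2 steps (0.04 s) for illustration.
delay = DelayedInput(delay_steps=2)
outputs = [delay.apply(u) for u in [1.0, 2.0, 3.0, 4.0]]
# outputs -> [0.0, 0.0, 1.0, 2.0]
```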

Effect on ROA:

No delay:

ROA LQR (no delay)

With a 0.05 s delay, the ROA shrinks:

ROA LQR (0.05s delay)

With a 0.1 s delay, control fails completely (0% success):

ROA LQR (0.1s delay)

A well-known approach for handling delay is the Smith Predictor: https://jp.mathworks.com/help/control/ug/control-of-processes-with-long-dead-time-the-smith-predictor.html
