Comparison of Control Algorithms on a Double Inverted Pendulum
This repository implements a double inverted pendulum simulator and compares the following control algorithms:
- PID
- Pole Placement
- LQR
- iLQR
- Linear MPC
- Nonlinear MPC
Install rye by following the official guide:
https://rye.astral.sh/guide/installation/

```
curl -sSf https://rye.astral.sh/get | bash
```

Set up a virtual environment with rye:

```
source .rye/env
rye sync
source .venv/bin/activate
```

Then run the simulation:

```
python control_double_inverted_pendulum.py
```

The default controller_mode is LQR. You can change the controller by modifying the following line:

```python
controller_mode = "LQR"
```

Also, if you set the following option to True, the selected controller will run the simulation only once from the given initial angle state:

```python
run_once = True
```

If you set this to False, the controller runs repeatedly from various initial values to find the ROA (Region of Attraction).

If matplotlib does not display properly in the virtual environment, run the following command:

```
sudo apt-get install python3-tk
```

The goal is not only to observe control response graphs, but to experience the full workflow of control engineering, from physical system modeling to controller implementation, and to develop practical intuition for each control method.
The double inverted pendulum with single-axis torque input was chosen as the target system because it is an unstable system with considerable control difficulty, making the performance differences between controllers more visible.
- Unstable system: A system that naturally departs from its equilibrium point without control input (e.g., an inverted pendulum falls over). Continuous and appropriate control input is required to maintain a specific state (e.g., standing upright).
- Stable system: A system whose error does not grow when the control input is removed (e.g., in path tracking, the vehicle simply stops in place). The system state stays within a bounded range.
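To see the instability concretely, here is a minimal sketch (parameters, step size, and the initial tilt are illustrative assumptions) of a single pendulum near upright with no control input; even a tiny initial tilt grows on its own:

```python
import math

# Single pendulum about the upright position: theta'' = (g/l) * sin(theta).
# All parameters below are illustrative assumptions.
g, l, dt = 9.81, 1.0, 0.001
theta, omega = 0.01, 0.0            # 0.01 rad initial tilt, at rest
max_tilt = theta
for _ in range(4000):               # 4 seconds of explicit Euler integration
    omega += (g / l) * math.sin(theta) * dt
    theta += omega * dt
    max_tilt = max(max_tilt, abs(theta))
print(max_tilt)  # far larger than the initial 0.01 rad: the pendulum falls
```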
Modeling a physical system using the Lagrangian equations of motion from analytical mechanics is one of the major approaches. The equations of motion can be derived from kinetic energy and potential energy.
The desired form is the state-space representation, in which the highest-order derivatives are expressed in terms of the lower-order ones:

$$\dot{x} = f(x, u)$$

For a single-axis torque-controlled double inverted pendulum, the state vector is

$$x = \begin{bmatrix} \theta_1 & \dot{\theta}_1 & \theta_2 & \dot{\theta}_2 \end{bmatrix}^T$$

This form lets us compute the infinitesimal change in the state when an input $u$ (the joint torque $\tau$) is applied, which is exactly what the simulator needs to integrate the state forward in time.
The simulator always requires a model to reproduce the system's behavior when a control input is applied. On the controller side, whether a model is needed depends on the method:
- PID control does not require a model: it performs feedback control based on the error between observed and target values.
- MPC requires a model to predict future behavior and compute optimal control inputs.
- LQR requires a linear state-space model of the system.
In this project, the simulator's model is assumed to be known, and the controllers use the same (or nearly identical) model. In real-world control, if the physical model is unknown, system identification would be needed to estimate it from input/output measurements.
The following articles were helpful references for modeling. They derive equations of motion for wheeled and cart-type inverted pendulums using SymPy:
- https://negligible.hatenablog.com/entry/2023/04/27/184908
- https://negligible.hatenablog.com/entry/2023/03/06/190928
For linearization using SymPy (needed for linear controllers like LQR):
- https://negligible.hatenablog.com/entry/2023/05/17/015508
- https://negligible.hatenablog.com/entry/2023/03/29/040000
To build intuition, let's start with a simplified model where the links have no mass β only the joints have mass. No control input is considered. The following PDF was a useful reference: http://www.phys.lsu.edu/faculty/gonzalez/Teaching/Phys7221/DoublePendulum.pdf
The equations of motion derived from the Lagrangian are:
Solving these simultaneous equations for $\ddot{\theta}_1$ and $\ddot{\theta}_2$ yields expressions for the angular accelerations.
For the inverted pendulum, we want the upright equilibrium as the target state, so we define the angles as measured from the vertical (upright) direction; the target state is then $\theta_1 = \theta_2 = 0$.
```python
import sympy as sp
from sympy.physics.mechanics import dynamicsymbols

t = sp.symbols('t')
theta1 = dynamicsymbols('theta1')
theta2 = dynamicsymbols('theta2')
l1, l2, m1, m2, g = sp.symbols('L1 L2 m1 m2 g')

# Joint positions (angles measured from the upright vertical)
x1 = l1 * sp.sin(theta1)
y1 = l1 * sp.cos(theta1)
x2 = l1 * sp.sin(theta1) + l2 * sp.sin(theta2)
y2 = l1 * sp.cos(theta1) + l2 * sp.cos(theta2)

# Kinetic and potential energy
T1 = m1 * (sp.diff(x1, t)**2 + sp.diff(y1, t)**2) / 2
T2 = m2 * (sp.diff(x2, t)**2 + sp.diff(y2, t)**2) / 2
T = T1 + T2
V1 = m1 * g * y1
V2 = m2 * g * y2
V = V1 + V2

# Lagrangian and the Euler-Lagrange equations
L = T - V
lagrange_eq1 = sp.diff(sp.diff(L, sp.diff(theta1, t)), t) - sp.diff(L, theta1)
tau = sp.symbols('tau')
lagrange_eq1 = lagrange_eq1 - tau  # torque input on the first joint
lagrange_eq2 = sp.diff(sp.diff(L, sp.diff(theta2, t)), t) - sp.diff(L, theta2)

# Solve for the angular accelerations
solutions = sp.solve([lagrange_eq1, lagrange_eq2],
                     [theta1.diff(t, 2), theta2.diff(t, 2)])
```

This gives us the state-space form:
Linearizing around the upright equilibrium $\theta_1 = \theta_2 = 0$ (with zero angular velocities) yields the $A$ and $B$ matrices used by the linear controllers.
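As a sketch of that linearization step (using a toy one-link model in place of the full double-pendulum expressions; all symbols here are illustrative), SymPy's `jacobian` gives the $A$ and $B$ matrices directly:

```python
import sympy as sp

# Toy model standing in for the full dynamics: dx/dt = f(x, u) with
# x = [theta, omega]; theta is measured from upright.
theta, omega, u, g, l = sp.symbols('theta omega u g l')
x = sp.Matrix([theta, omega])
f = sp.Matrix([omega, (g / l) * sp.sin(theta) + u])  # state-space form
equilibrium = {theta: 0, omega: 0}
A = f.jacobian(x).subs(equilibrium)      # df/dx at the upright equilibrium
B = f.diff(u).subs(equilibrium)          # df/du at the upright equilibrium
print(A)  # Matrix([[0, 1], [g/l, 0]])
print(B)  # Matrix([[0], [1]])
```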
A more realistic model considers link mass, moments of inertia, viscous friction, and kinetic friction. Based on the equations from Learning Control Engineering with Inverted Pendulums (Equations 3.54, 3.55), with added kinetic friction:
```python
# Simultaneous equations with friction
eq1 = Eq(
    alpha1 * theta1_ddot + alpha3 * cos(theta12) * theta2_ddot,
    -alpha3 * theta2_dot**2 * sin(theta12)
    + alpha4 * sin(theta1)
    - (mu1 + mu2) * theta1_dot + mu2 * theta2_dot
    + tau1 - sign(theta1_dot) * f1
)
eq2 = Eq(
    alpha3 * cos(theta12) * theta1_ddot + alpha2 * theta2_ddot,
    alpha3 * theta1_dot**2 * sin(theta12)
    + alpha5 * sin(theta2)
    + mu2 * theta1_dot - mu2 * theta2_dot
    - sign(theta2_dot - theta1_dot) * f2
)
```

Where:
- $\alpha_1, \alpha_2, \alpha_3, \alpha_4, \alpha_5$ are common constants derived from mass, link length, center-of-gravity position, moment of inertia, and gravity
- $\mu_1, \mu_2$ are viscous friction coefficients
- $f_1, f_2$ are kinetic friction coefficients
The kinetic friction introduces discontinuous sign() terms, which make differentiation-based linearization fail. To linearize, we first remove the discontinuous terms and then compute the state-space matrices around the upright equilibrium.
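One way to do this in SymPy (a sketch; the real code operates on the full equations of motion) is to substitute the sign() terms away before taking derivatives:

```python
import sympy as sp

# Illustrative friction expression: viscous term plus discontinuous
# kinetic friction. Substituting sign(omega) -> 0 removes the
# non-differentiable part before linearization.
omega, mu, f1 = sp.symbols('omega mu f1')
friction = -mu * omega - sp.sign(omega) * f1
smooth = friction.subs(sp.sign(omega), 0)
print(smooth)  # -mu*omega
```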
The resulting DoubleInvertedPendulumDynamics class contains both nonlinear and linearized dynamics. The simulator uses Runge-Kutta numerical integration to update the state at each time step.
Additional simulator features:
- Observation noise
- Observation quantization
- Controller/simulation period mismatch
- Control input delay
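The Runge-Kutta update mentioned above can be sketched as follows, for a generic dynamics function f(x, u); the decay system used to check it and the step sizes are illustrative:

```python
import math
import numpy as np

def rk4_step(f, x, u, dt):
    # Classic 4th-order Runge-Kutta update for dx/dt = f(x, u)
    k1 = f(x, u)
    k2 = f(x + 0.5 * dt * k1, u)
    k3 = f(x + 0.5 * dt * k2, u)
    k4 = f(x + dt * k3, u)
    return x + (dt / 6.0) * (k1 + 2 * k2 + 2 * k3 + k4)

# Check against a system with a known solution: x' = -x, so x(1) = e^-1
f = lambda x, u: -x
x = np.array([1.0])
for _ in range(100):
    x = rk4_step(f, x, None, 0.01)
print(x[0])  # ~0.367879..., matching exp(-1)
```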
The controllers attempt to stabilize the double inverted pendulum from various initial states to the upright target state ($\theta_1 = \theta_2 = 0$ with zero angular velocities).
The initial states are swept over a grid of initial angle pairs $(\theta_1, \theta_2)$, 1296 combinations in total.
Success criteria: The simulation runs for 10 seconds. If the pendulum converges to the upright state and stays there within that time, the trial counts as a success.
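The sweep can be pictured as follows. The 10° grid resolution and zero initial velocities are assumptions chosen here to reproduce the 1296 trial count reported in the comparison tables:

```python
import itertools
import numpy as np

# Grid of initial (theta1, theta2) pairs in 10-degree steps (assumed)
theta1_grid = np.deg2rad(np.arange(-180, 180, 10))
theta2_grid = np.deg2rad(np.arange(-180, 180, 10))
initial_states = [
    np.array([t1, 0.0, t2, 0.0])   # [theta1, dtheta1, theta2, dtheta2]
    for t1, t2 in itertools.product(theta1_grid, theta2_grid)
]
print(len(initial_states))  # 36 * 36 = 1296
```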
Note: The following implementations are simplified and not fully optimized. They are intended to provide a rough comparison of each method's performance. There is significant room for improvement in both logic and computation speed.
PID control (Proportional-Integral-Derivative Control) is said to be used in about 90% of control systems in the world. For example, Autoware (an autonomous driving OSS) uses PID control for speed control: https://autowarefoundation.github.io/autoware.universe/main/control/autoware_pid_longitudinal_controller/
PID generates control output based on the current error (difference between desired and actual values):
- Proportional (P): Output proportional to the current error; gain $K_p$ determines the response strength.
- Integral (I): Accounts for the accumulated past error and eliminates steady-state error; gain $K_i$.
- Derivative (D): Accounts for the rate of change of the error, suppressing overshoot and smoothing the response; gain $K_d$.
```python
class PIDController(Controller):
    def __init__(self, Kp, Ki, Kd, dt):
        super().__init__(dt)
        self.Kp = Kp
        self.Ki = Ki
        self.Kd = Kd
        self.integral = 0.0
        self.previous_error = 0.0

    def control(self, state, desired_state, t):
        error = desired_state - state
        self.integral += error * self.dt
        derivative = (error - self.previous_error) / self.dt
        self.u = np.array([
            np.dot(self.Kp, error)
            + np.dot(self.Ki, self.integral)
            + np.dot(self.Kd, derivative)
        ])
        self.previous_error = error
        return self.u
```

Result: Despite extensive gain tuning, simple PID could not even maintain the upright state.
This system is SIMO (Single Input Multiple Output): one torque input must control multiple outputs ($\theta_1$, $\theta_2$, and their angular velocities), so the error fed to PID is a weighted combination of the individual state errors.
ROA: Succeeds from only 1 of the 1296 initial states.
Pole placement designs a state feedback gain $K$ so that the closed-loop system $\dot{x} = (A - BK)x$ has its poles (eigenvalues) at desired locations.
The gain $K$ is obtained directly from the model matrices $A$, $B$ and the desired pole locations:
```python
class PolePlacementController(Controller):
    def __init__(self, A, B, desired_poles, dt):
        super().__init__(dt)
        self.K = control.place(A, B, desired_poles)

    def control(self, state, desired_state, t):
        self.u = -self.K @ (state - desired_state)
        return self.u
```

The control.place() function from python-control (wrapping scipy.signal.place_poles()) computes the gain in one line:
https://python-control.readthedocs.io/en/0.10.0/generated/control.place.html
Choosing appropriate pole locations is the key challenge. Generally, avoiding repeated poles improves robustness. The real part of poles is typically set as a multiple of the system's natural frequency.
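As a quick sanity check of the underlying routine (on an illustrative double integrator, not the pendulum model), scipy.signal.place_poles returns a gain whose closed-loop eigenvalues match the request:

```python
import numpy as np
from scipy.signal import place_poles

# Double integrator: x'' = u (illustrative system)
A = np.array([[0.0, 1.0], [0.0, 0.0]])
B = np.array([[0.0], [1.0]])
result = place_poles(A, B, [-1.0, -2.0])
K = result.gain_matrix
closed_loop_eigs = np.linalg.eigvals(A - B @ K)
print(np.sort(closed_loop_eigs.real))  # [-2. -1.]
```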
Result: With poles at [-1, -2, -3, -4], the controller successfully stabilizes the pendulum.
Changing the poles to half ([-0.5, -1, -1.5, -2]) or to double causes failure:
ROA: Succeeds from comparatively large initial angles (44/1296). Interestingly, control fails around some intermediate initial angles.
LQR (Linear Quadratic Regulator) is an optimal control method for linear systems. For a detailed explanation, see: https://qiita.com/taka_horibe/items/5d053cdaaf4c886d27b5
It minimizes a quadratic cost function:

$$J = \int_0^\infty \left( x^T Q x + u^T R u \right) dt$$

- $Q$: positive semi-definite weight matrix for the state deviation cost
- $R$: positive definite weight matrix for the control input cost

The optimal control input is a linear state feedback $u = -Kx$, where $K = R^{-1} B^T X$ and $X$ solves the continuous-time algebraic Riccati equation:
```python
class LQRController(Controller):
    def __init__(self, A, B, Q, R, dt):
        super().__init__(dt)
        # Solve the continuous-time algebraic Riccati equation
        X = solve_continuous_are(A, B, Q, R)
        self.K = inv(R) @ B.T @ X

    def control(self, state, desired_state, t):
        self.u = -self.K @ (state - desired_state)
        return self.u
```

Result: Successfully stabilizes from initial angles near the upright equilibrium.
ROA: Smaller than pole placement, since LQR uses a model linearized at 0Β°, which becomes inaccurate far from the equilibrium.
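The same gain computation as in the class above can be checked on an illustrative double integrator; the closed loop should be stable (all eigenvalues in the left half-plane):

```python
import numpy as np
from scipy.linalg import solve_continuous_are, inv

A = np.array([[0.0, 1.0], [0.0, 0.0]])   # double integrator (illustrative)
B = np.array([[0.0], [1.0]])
Q = np.diag([1.0, 1.0])
R = np.array([[1.0]])
X = solve_continuous_are(A, B, Q, R)
K = inv(R) @ B.T @ X
eigs = np.linalg.eigvals(A - B @ K)
print(eigs.real)  # both strictly negative: the closed loop is stable
```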
iLQR (Iterative Linear Quadratic Regulator) extends LQR to nonlinear systems by iteratively performing linear approximations. It uses the nonlinear model directly and finds optimal control inputs through forward and backward passes. The following articles provide detailed explanations:
- https://zenn.dev/takuya_fukatsu/articles/3c7ef8a68275c7
- https://qiita.com/PozihameBaystar/items/45fa9fd96ae0a69ec31a
- https://qiita.com/PozihameBaystar/items/6c1c6abc7724728ac637
The cost function is a finite-horizon sum of stage costs plus a terminal cost:

$$J = \phi(x_N) + \sum_{k=0}^{N-1} \ell(x_k, u_k)$$
The algorithm proceeds as follows:
1. Initial trajectory and input setup: start from initial state $x_0$ with an initial input sequence $\{u_0, u_1, \ldots, u_{N-1}\}$.
2. Forward pass (initial trajectory): compute the state trajectory using the current input sequence.
3. Backward pass: quadratically approximate the cost function around the current states and inputs, and compute feedback gains and feedforward corrections using differential dynamic programming (DDP) based on the Bellman equation.
4. Forward pass (trajectory update): generate a new input sequence using the gains from the backward pass and update the state trajectory.
5. Convergence check: stop if the change in the cost function is sufficiently small; otherwise return to step 3.
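The five steps above can be exercised on a deliberately tiny problem. The sketch below runs iLQR on scalar linear dynamics with a quadratic cost (all parameters are illustrative); for this linear-quadratic case the iteration converges almost immediately, which makes the structure easy to follow:

```python
import numpy as np

# Toy iLQR: dynamics x_{k+1} = a*x_k + b*u_k, cost 0.5*sum(q*x^2 + r*u^2).
a, b, q, r, N, x0 = 1.1, 0.5, 1.0, 0.1, 20, 1.0

def rollout(u):
    x = np.empty(N + 1)
    x[0] = x0
    for k in range(N):
        x[k + 1] = a * x[k] + b * u[k]
    return x

def cost(x, u):
    return 0.5 * (q * np.sum(x**2) + r * np.sum(u**2))

u = np.zeros(N)                          # step 1: initial input sequence
x = rollout(u)                           # step 2: initial trajectory
for it in range(10):
    # step 3: backward pass, value function V = Vx*dx + 0.5*Vxx*dx^2
    Vx, Vxx = q * x[N], q
    K = np.empty(N)
    d = np.empty(N)
    for k in reversed(range(N)):
        Qx = q * x[k] + a * Vx
        Qu = r * u[k] + b * Vx
        Qxx = q + a * Vxx * a
        Quu = r + b * Vxx * b
        Qux = b * Vxx * a
        K[k] = -Qux / Quu                # feedback gain
        d[k] = -Qu / Quu                 # feedforward correction
        Vx = Qx + K[k] * Quu * d[k] + K[k] * Qu + Qux * d[k]
        Vxx = Qxx + K[k] * Quu * K[k] + K[k] * Qux + Qux * K[k]
    # step 4: forward pass with the updated policy
    x_new = np.empty(N + 1)
    x_new[0] = x0
    u_new = np.empty(N)
    for k in range(N):
        u_new[k] = u[k] + d[k] + K[k] * (x_new[k] - x[k])
        x_new[k + 1] = a * x_new[k] + b * u_new[k]
    # step 5: convergence check on the cost decrease
    if abs(cost(x, u) - cost(x_new, u_new)) < 1e-10:
        break
    x, u = x_new, u_new
print(cost(x, u))  # far below the initial cost of the zero-input rollout
```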
This implementation uses Bharath2/iLQR with Numba acceleration. A forked version with pip install support is available at https://github.com/kosuke55/iLQR. The nonlinear model is passed to iLQR (without the discontinuous kinetic friction terms, since iLQR relies on differentiation for optimization).
Result: With Q=diag([100, 1, 100, 1]), R=diag([0.1]), and a 2-second horizon (N=20, horizon_dt=0.1), it successfully stabilizes from large initial angles.
ROA: Much wider than LQR since the nonlinear model handles large angles better. Total computation time is 22.97s (vs 0.3s for LQR), though there is significant room for optimization.
Model Predictive Control (MPC) predicts future system behavior over a horizon and computes optimal control inputs by solving an optimization problem at each time step. Only the first input is applied, then the process repeats with updated measurements (receding horizon). For a detailed explanation, see: https://qiita.com/taka_horibe/items/47f86e02e2db83b0c570
The algorithm:
- Model-based prediction: Predict future states using the system model.
- Solve optimization problem: Minimize a cost function over the prediction horizon.
- Apply control input: Apply only the first computed input.
- Re-predict with feedback: Observe the new state and repeat.
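The receding-horizon loop itself is independent of which optimizer is used; a minimal skeleton (with hypothetical solve_ocp and plant_step callables standing in for the optimizer and the plant) looks like this:

```python
def receding_horizon(x0, n_steps, solve_ocp, plant_step):
    """Apply only the first input of each optimized sequence, then re-plan."""
    x = x0
    history = []
    for _ in range(n_steps):
        u_seq = solve_ocp(x)       # optimize inputs over the whole horizon
        u = u_seq[0]               # apply only the first input
        x = plant_step(x, u)       # the plant moves; observe the new state
        history.append((x, u))
    return history

# Toy usage: scalar plant x_next = x + u, "optimizer" proposing u = -0.5 * x
history = receding_horizon(1.0, 5, lambda x: [-0.5 * x], lambda x, u: x + u)
print(history[-1][0])  # 0.03125: the state halves at every step
```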
The cost function penalizes deviation from the reference and input effort over the horizon:

$$J = \sum_{k=1}^{N} \left( (x_k - x_{ref})^T Q (x_k - x_{ref}) + u_k^T R u_k \right)$$
Linear MPC uses a linearized model for prediction, formulating the optimization as a Quadratic Program (QP), which is much faster and more stable than nonlinear optimization. For example, Autoware uses Linear MPC for steering control in path tracking: https://autowarefoundation.github.io/autoware.universe/main/control/autoware_mpc_lateral_controller/model_predictive_control_algorithm/
The optimization problem hierarchy:
NLP ⊃ QP ⊃ LP
Narrowing the problem class allows the use of specialized, faster solvers. This implementation uses CVXPY with the OSQP solver.
The continuous system is discretized via Euler method for prediction:
```python
class MPCController(Controller):
    def __init__(self, A, B, Q, R, N, dt, horizon_dt):
        super().__init__(dt)
        self.Q, self.R, self.N = Q, R, N
        # Euler discretization of the continuous model
        self.Ad = np.eye(A.shape[0]) + A * horizon_dt
        self.Bd = B * horizon_dt
        self.cp_U = cp.Variable((N, B.shape[1]))

    def control(self, state, desired_state, t):
        cost = 0
        x = state
        for i in range(self.N):
            u = self.cp_U[i, :]
            x = self.Ad @ x + self.Bd @ u
            cost += cp.quad_form(x - desired_state, self.Q) \
                + cp.quad_form(u, self.R)
        # Input constraints (torque limits)
        constraints = [self.cp_U <= 100.0, self.cp_U >= -100.0]
        problem = cp.Problem(cp.Minimize(cost), constraints)
        problem.solve(solver=cp.OSQP)
        self.u = self.cp_U.value[0, :]
        return self.u
```

Result: With Q=diag([10, 1, 10, 1]), R=diag([0.01]), N=10, horizon_dt=0.2, it stabilizes from moderately large initial angles.
ROA: Self-initiated swing-up succeeds from a wider range of initial angles than LQR (77/1296).
Nonlinear MPC uses the nonlinear model for prediction and solves the optimization as a Nonlinear Program (NLP). This directly handles nonlinearity, enabling control over a much wider range, though at higher computational cost and with potential convergence issues.
The implementation uses CasADi for symbolic expression and automatic differentiation. For a detailed reference on CasADi-based MPC, see Learning Model Predictive Control with Python and CasADi. The NLP solver used is IPOPT (Interior Point Optimizer), which uses the primal-dual interior point method. For an explanation of this method, see: https://qiita.com/taka_horibe/items/0c9b0993e0bd1c0135fa
```python
class NonlinearMPCControllerCasADi(Controller):
    def __init__(self, dynamics, A, B, Q, R, N, dt, horizon_dt):
        # CasADi symbolic variables for automatic differentiation
        self.state_sym = ca.MX.sym("state", nx)
        self.u_sym = ca.MX.sym("u", nu)
        self.dxdt_sym = self.f(self.state_sym, 0, self.u_sym)
        self.f_func = ca.Function("f_func",
                                  [self.state_sym, self.u_sym],
                                  [self.dxdt_sym])

    def predict_state(self, x, u, dt):
        # Runge-Kutta 4th order integration
        k1 = self.f_func(x, u)
        k2 = self.f_func(x + 0.5 * dt * k1, u)
        k3 = self.f_func(x + 0.5 * dt * k2, u)
        k4 = self.f_func(x + dt * k3, u)
        return x + (dt / 6.0) * (k1 + 2 * k2 + 2 * k3 + k4)

    def control(self, state, desired_state, t):
        opti = ca.Opti()
        U = opti.variable(self.N * self.nu)
        # ... build the cost over the horizon and minimize with IPOPT
        opti.solver("ipopt", {"ipopt.print_level": 0, "print_time": 0})
```

Result: With Q=diag([100, 1, 100, 1]), R=diag([0.01]), N=10, horizon_dt=0.1, it successfully stabilizes even from the fully inverted (hanging-down) state.
ROA: Dramatically wider: succeeds from nearly all initial states (1284/1296). Computation time is 84.08 s.
| Controller | Avg Processing Time [s] | Avg Stabilization Time [s] | ROA (success/total) |
|---|---|---|---|
| PID | 0.33 | - | 1/1296 |
| Pole Placement | 0.32 | 5.664 | 44/1296 |
| LQR | 0.30 | 2.890 | 30/1296 |
| iLQR | 22.97 | 6.690 | 73/1296 |
| Linear MPC | 65.84 | 5.070 | 77/1296 |
| Nonlinear MPC | 84.08 | 6.180 | 1284/1296 |
Despite higher computation cost, Nonlinear MPC achieves overwhelmingly the best ROA. iLQR is faster than Nonlinear MPC while covering a reasonably wide range. Pole Placement and LQR are much faster but limited in range.
Note: Average Processing Time includes the entire 10-second simulation (not just controller computation). Trials were run in parallel with multiprocessing, so per-process times are higher than single-process execution.
Adding kinetic friction ($f_1$, $f_2$) to the simulator leaves a steady-state error that the controllers above cannot eliminate.
LQR with friction:
To address this, an integral term was added to LQR along with friction compensation (applying a force opposite to the friction direction). The method of augmenting the state equation with an integral term is described in: https://www.flight.t.u-tokyo.ac.jp/~tsuchiya/Control/27Servo.pdf
The state equation is augmented with an integral error state:
```python
class LQRControllerWithIntegral(Controller):
    def __init__(self, A, B, C, Q, R, dt):
        # Augment the state with the integral of the theta1 error
        # (including theta2 as well breaks controllability)
        A_aug = np.zeros((n + 1, n + 1))
        A_aug[:n, :n] = A
        A_aug[n:, :n] = -np.array([[1, 0, 0, 0]])
        B_aug = np.vstack((B, np.zeros((1, B.shape[1]))))
        # Verify controllability of the augmented system
        Wc = control.ctrb(A_aug, B_aug)
        assert np.linalg.matrix_rank(Wc) == n + 1  # must be controllable
        self.integral_error = 0.0
        # (gain K computed from the augmented Riccati equation, omitted)

    def control(self, state, desired_state, t):
        self.integral_error += (desired_state[0] - state[0]) * self.dt
        augmented_state = np.hstack((state - desired_state,
                                     self.integral_error))
        self.u = -self.K @ augmented_state
        # Friction compensation: push against the estimated friction force
        self.u += f1 * sign(state[1]) + f2 * sign(state[1] + state[3])
        return self.u
```

Controllability indicates whether the system can be driven from any initial state to any desired state by some input. Including only the $\theta_1$ error in the augmented state maintains controllability; including both $\theta_1$ and $\theta_2$ does not.
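The rank test used for this check can be reproduced without python-control by building the controllability matrix by hand (shown here on an illustrative double integrator, not the augmented pendulum model):

```python
import numpy as np

def ctrb(A, B):
    # Controllability matrix Wc = [B, AB, A^2 B, ..., A^(n-1) B]
    n = A.shape[0]
    return np.hstack([np.linalg.matrix_power(A, i) @ B for i in range(n)])

A = np.array([[0.0, 1.0], [0.0, 0.0]])
B = np.array([[0.0], [1.0]])
Wc = ctrb(A, B)
print(np.linalg.matrix_rank(Wc))  # 2: full rank, so the system is controllable
```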
With the integral term, the steady-state error is eventually reduced, but overshoot occurs; achieving precise stabilization remains difficult:
Setting the second link's friction to
Adding dead time (delay) to the control input: with a 0.02 s control period and a 0.05 s delay, LQR can still stabilize the pendulum.
Effect on ROA:
No delay:
0.05 s delay (the ROA shrinks):
0.1 s delay (complete failure, 0% success):
A well-known approach for handling delay is the Smith Predictor: https://jp.mathworks.com/help/control/ug/control-of-processes-with-long-dead-time-the-smith-predictor.html
- Learning Control Engineering with Inverted Pendulums (Kawata, Higashi, Ichihara, Urakubo, Otsuka, 2017)
- Learning Model Predictive Control with Python and CasADi (Fukatsu, Hishinuma, Aramaki)
- Derivation of Equations of Motion for a Wheeled Inverted Pendulum in Python
- Simulation of a Cart-type Inverted Pendulum in Python
- Linearization and Optimal Regulator for a Wheeled Inverted Pendulum
- Linear Optimal Control (LQR) and Riccati Solver
- Convex Optimization and Primal-Dual Interior Point Method
- Complete Guide to iLQR: Theory
- Understanding LQR/iLQR
- Nonlinear MPC of a Differential-Drive Vehicle using iLQR
- Model Predictive Control (MPC) for Trajectory Tracking
- Pole Placement and Response Characteristics