原始論文:Rudolf E. Kálmán, A New Approach to Linear Filtering and Prediction Problems, ASME 1960 經典教材:Thrun, Burgard, Fox — Probabilistic Robotics(MIT Press, 2005) 現代應用代表:SORT (2016)、DeepSORT (2017)、ORB-SLAM、Apollo Guidance Computer
TL;DR
上幾篇 blog 我哋睇咗 deep learning 點樣解 ReID、pose estimation 等 high-level 嘅 vision 問題。但好多 production system 嘅 backbone 其實仲係一條 1960 年寫嘅遞推公式——Kalman Filter(卡爾曼濾波器)。
佢嘅核心問題簡單到不可思議:有 noisy 嘅 measurement,又有 noisy 嘅 dynamic model,點樣最 optimal 咁估計 state?
核心重點:
- 🎯 Recursive Bayesian Estimation:唔需要儲存歷史 measurement,每步只用上一個 posterior + 今次 measurement 更新
- 🧩 Predict-Update Cycle:先用 motion model predict 出 prior,再用 measurement 修正成 posterior
- 📐 Kalman Gain 係靈魂:自動 trade-off prediction uncertainty 同 measurement uncertainty——邊個準 listen 邊個多啲
- ⚡ Linear + Gaussian 下係 MMSE optimal:唔係 heuristic,係數學上證明 optimal 嘅 estimator
- 🚀 由 Apollo 11 飛到 SORT:60 年嚟由 navigation、radar tracking、IMU fusion、SLAM、目標追蹤無一缺席
- 🔧 延伸:EKF / UKF / Particle Filter 處理 non-linear / non-Gaussian 嘅現實系統
目錄
背景:State Estimation 嘅古老問題
想象你係 1960 年代 NASA 嘅工程師。Apollo 太空船要由地球飛去月球——你有:
- Noisy IMU measurement(加速度計每秒搖兩搖)
- 間斷嘅 ground radar fix(每幾分鐘先有一次)
- 理論上嘅 orbital mechanics model(重力、推進力)
問題:任何一個 sensor 都唔可靠,但合埋一齊應該可以估到準確嘅位置。點樣 fuse?
三條 naive 路線
| 做法 | 問題 |
|---|---|
| 只信 sensor | 每秒抖動嚴重,noise 累積 |
| 只信 motion model | 長期會 drift(積分 noise 爆炸) |
| 固定權重平均 | 權重唔可能適用所有情況(高速 vs 低速、近 radar vs 遠 radar) |
💡 Kalman 嘅突破:用 Bayesian inference 做 sensor fusion,而且權重唔係 hand-tuned,係由 prediction 同 measurement 嘅 uncertainty 自動 derive。Apollo 11 嘅 onboard computer 嘅 navigation 模組就用咗 Kalman Filter——係佢將「估計」變成可計算嘅工程問題。
核心 Setup:Linear Gaussian State Space Model
Kalman Filter 假設系統可以由兩條 linear equation 描述:
Process Model(State Transition)
Observation Model
| 符號 | 意義 | Tracking example(2D 物件) |
|---|---|---|
| State vector | (位置 + 速度) | |
| State transition matrix | constant-velocity model | |
| Control input | tracking 通常 = 0 | |
| Process noise | 未建模嘅加速度 | |
| Measurement | detector 出嘅 bbox 中心 | |
| Observation matrix | 只觀察位置,唔觀察速度 | |
| Measurement noise | detector 抖動 | |
| Process / Measurement covariance | 整個 KF 嘅 hyper-parameter |
🎯 注意三個 critical assumptions:
Linear:state transition 同 observation 都係 linear
Gaussian noise:所有 noise 都 zero-mean Gaussian
Markov: 只依賴
呢三個假設一齊就保證咗:posterior 永遠都係 Gaussian——所以只需要 track 兩個 sufficient statistics 就夠。
Predict-Update Cycle:5 條方程改變世界
Predict(事前估計)
直觀解讀:用 motion model 將 belief 推前一步。Covariance 一定變大(加咗 )——你冇睇到 sensor 之前,自信心一定下降。
Update(事後修正)
Innovation(measurement 同 prediction 嘅 residual):
Innovation covariance:
Kalman Gain:
Posterior state + covariance:
🔑 Kalman Gain 嘅靈魂:
當 (sensor 完美)→ → posterior ≈ measurement(信晒 sensor)
當 (sensor 廢)→ → posterior ≈ prediction(信晒 model)
中間自動 interpolate,權重由 covariance 自然 derive,唔需要 hand-tune
一個具體 1D 例子:估體溫
你身體實際體溫係 (unknown)。你有一支體溫計,每次量度有 ±0.5°C noise。
Setup:
- (體溫慢慢漂)、(sensor noise variance = 0.5² = 0.25)
- 初始 belief:
Round 1:量到 37.3°C
Predict:
Update:
觀察:初始 belief 唔可靠(P=1.0 >> R=0.25),Kalman Gain ≈ 0.8 → 主要信 measurement。
Round 2:量到 36.8°C
Predict:
Update:
💡 注意 Kalman Gain 自動 down-weight 咗第二次 measurement(由 0.802 → 0.457)。原因:經過 Round 1,model 對 state 嘅 confidence 已經提升(P 由 1.0 → 0.200),所以下一次 measurement 嘅 relative weight 自然細啲。呢個正係 KF 嘅 elegance——你冇 tune 過任何嘢,行為自動 adapt。
Bayesian 推導:點解係 MMSE Optimal?
Kalman Filter 本質上係 recursive Bayesian filter 喺 linear-Gaussian 假設下嘅 closed-form solution。
Prediction Step(Chapman-Kolmogorov)
Gaussian × linear transformation 嘅 marginal 仲係 Gaussian → 直接 reduce 到 mean 同 covariance 嘅 propagation(即上面 Predict 嘅兩條方程)。
Update Step(Bayes Rule)
Gaussian × Gaussian = Gaussian(exponent 相加,still quadratic)。經過代數整理(completing the square),就得到上面 5 條 Update 方程。
🎯 MMSE Optimality 嘅意義:喺 linear-Gaussian 假設下,Kalman Filter 嘅 estimate 同時係:
MMSE estimator:minimize
MAP estimator:maximize
BLUE estimator(Best Linear Unbiased Estimator)
三個 criteria 一致——呢個係好少數情況下「優化準則 ambiguity 消失」嘅 happy case,亦解釋咗點解 KF 喺工程上咁穩定。
NumPy 實作:50 行寫齊核心 logic
pythonimport numpy as np
class KalmanFilter:
def __init__(self, F, H, Q, R, x0, P0, B=None):
"""
F: (n, n) state transition
H: (m, n) observation matrix
Q: (n, n) process noise cov
R: (m, m) measurement noise cov
x0: (n,) initial state mean
P0: (n, n) initial state cov
B: (n, p) control matrix (optional)
"""
self.F, self.H, self.Q, self.R = F, H, Q, R
self.x, self.P = x0.copy(), P0.copy()
self.B = B
self.I = np.eye(F.shape[0])
def predict(self, u=None):
self.x = self.F @ self.x
if u is not None and self.B is not None:
self.x = self.x + self.B @ u
self.P = self.F @ self.P @ self.F.T + self.Q
return self.x.copy()
def update(self, z):
y = z - self.H @ self.x # innovation
S = self.H @ self.P @ self.H.T + self.R # innovation cov
K = self.P @ self.H.T @ np.linalg.inv(S) # Kalman gain
self.x = self.x + K @ y
# Joseph form 更 numerically stable:
# self.P = (I - KH) P (I - KH)ᵀ + K R Kᵀ
self.P = (self.I - K @ self.H) @ self.P
return self.x.copy()
def mahalanobis(self, z):
"""用嚟 data association(SORT / DeepSORT 用嘅 gating)。"""
y = z - self.H @ self.x
S = self.H @ self.P @ self.H.T + self.R
return float(y @ np.linalg.inv(S) @ y)
用嚟 track 一個 2D constant-velocity 物件
pythondt = 1.0
F = np.array([[1, 0, dt, 0],
[0, 1, 0, dt],
[0, 0, 1, 0],
[0, 0, 0, 1]])
H = np.array([[1, 0, 0, 0],
[0, 1, 0, 0]])
Q = np.diag([0.01, 0.01, 0.1, 0.1]) # 細加速度 noise
R = np.diag([1.0, 1.0]) # detector noise
x0 = np.array([0.0, 0.0, 1.0, 0.0]) # 估計從 (0,0) 開始以 vₓ=1 移動
P0 = np.eye(4) * 10.0
kf = KalmanFilter(F, H, Q, R, x0, P0)
for t, z in enumerate(measurements): # noisy 2D detections
kf.predict()
kf.update(z)
print(f"t={t}: pos=({kf.x[0]:.2f}, {kf.x[1]:.2f}), vel=({kf.x[2]:.2f}, {kf.x[3]:.2f})")
🔑 Numerical Stability Tip:production code 永遠用 Joseph form 更新 P,因為 floating-point error 可能令 失去 symmetry / positive-definiteness。Joseph form guaranteed 保 symmetry。Apollo 嗰陣已經發現呢個問題。
Case Study 1:SORT / DeepSORT 入面嘅 KF
SORT (2016) 同 DeepSORT (2017) 係 multi-object tracking 嘅經典 baseline,到 2026 年仲係好多 production system 嘅默認 tracker。佢哋嘅 backbone 就係一個 8 維 constant-velocity Kalman Filter:
State Vector
- :bbox 中心
- :bbox 面積(scale)
- :aspect ratio(假設 constant,唔 track 速度)
- 點解 r 冇速度?因為大部分物件 aspect ratio 變化好細,加 反而加 noise
Predict-Update Cycle 同 detector 結合
Mahalanobis Gating
DeepSORT 用 KF predict 嘅 covariance 計 Mahalanobis distance 嚟 reject 唔合理 association:
如果 → reject。
🎯 SORT 嘅 Lesson:一個 1960 年嘅 algorithm,配上一個 modern detector,就足以喺 MOT challenge 攞高分。複雜嘅 deep learning tracker(如 ByteTrack、OC-SORT)本質上都係 SORT + 一啲 modification——KF 一直都係靈魂。
Case Study 2:IMU + GPS Sensor Fusion
智能手機、無人機、自駕車入面,IMU(100Hz, drift)+ GPS(1Hz, accurate but jittery)嘅 fusion 係 KF 嘅 textbook application。
| Sensor | 特性 | 對 KF 嘅角色 |
|---|---|---|
| IMU (accel + gyro) | 高頻率 (100-1000Hz)、長期 drift、短期準 | Process model(用嚟 predict) |
| GPS | 低頻率 (1-10Hz)、絕對位置、有 jitter | Measurement(用嚟 correct drift) |
| Magnetometer | 絕對方向、受磁場干擾 | Measurement(correct heading) |
| Wheel encoder | 高頻、accumulate error、轉彎打滑 | Measurement (擺喺 H 入面) |
⚡ Trade-off 嘅美感:IMU 短期準、GPS 長期準——KF 完美咁將兩者結合。冇 KF 嘅手機定位會抖到嚇親,冇 GPS 嘅純 IMU 行幾秒就 drift 幾米。(IMU noise)同 (GPS noise)嘅 tuning 就係 navigation engineer 嘅 art。
Extended Kalman Filter (EKF):處理 Non-Linearity
現實系統好少係 linear——車軚轉彎、camera projection、機械臂 forward kinematics 都係 nonlinear。
Setup
EKF 嘅核心 idea:Linearize via Jacobian
喺當前 estimate 附近用 Taylor 展開 first-order:
之後 predict / update 嘅 5 條方程一樣,只係 state propagation 用真 ,covariance 傳播用 Jacobian。
經典應用:EKF-SLAM
機器人位置 + 所有 landmark 位置一齊放入 state vector,sensor model(如 range-bearing)係 nonlinear → EKF。Calculus 上漂亮,但 state 維度爆炸(n landmarks → 2n+3 維),covariance 矩陣係 dense 嘅 → 後嚟被 FastSLAM、Graph SLAM 取代。
⚠️ EKF 嘅死穴:linearization error。如果系統高度 nonlinear(例如車軚急轉),first-order Taylor 展開唔夠準,EKF 嘅 covariance 估錯 → filter 會 over-confident → 之後 reject 正確 measurement → 進一步 drift → filter divergence。
Unscented Kalman Filter (UKF):Sigma Points 嘅藝術
UKF (Julier & Uhlmann, 2000) 用 deterministic sampling 取代 linearization:
- 由 生成 個 sigma points(cover ±σ 嘅幾個關鍵方向)
- 將每個 sigma point 透過 nonlinear / propagate(無需 Jacobian)
- 由 transformed sigma points 重新計算 weighted mean + covariance
| Filter | Linearization | Accuracy on nonlinear | Cost |
|---|---|---|---|
| KF | 不適用(要求 linear) | — | O(n³) |
| EKF | First-order Jacobian | 衰(高度 nonlinear) | O(n³) + 計 Jacobian |
| UKF | Sigma point sampling | Up to 3rd order moment | O(n³),但常數較大 |
💡 UKF 嘅實戰優勢:
唔使計 Jacobian——對於複雜 dynamics(如四旋翼飛行、機械臂)大幅減 implementation effort
Nonlinearity 高嘅情況 accuracy 明顯好過 EKF
Discontinuity 都 work(EKF 喺 discontinuity 上會死,因為 derivative undefined)
但 UKF 同樣假設 noise 係 Gaussian——遇到 multi-modal posterior 仲係要上 Particle Filter。
同 Particle Filter 嘅對比
| Filter | Posterior 表達 | Linearity | Gaussianity | Cost |
|---|---|---|---|---|
| KF | Linear only | Gaussian only | O(n³) | |
| EKF | Mildly nonlinear | Gaussian only | O(n³) | |
| UKF | Highly nonlinear | Gaussian only | O(n³) | |
| Particle Filter | Any | Any (包括 multi-modal) | O(Nn),N 大 |
🎯 何時揀邊個?
System linear + Gaussian noise → KF(永遠係 optimal)
Mildly nonlinear, well-localized posterior → EKF(最常用嘅 SLAM、IMU fusion)
Highly nonlinear, 唔想計 Jacobian → UKF
Multi-modal posterior(如 robot kidnapping problem、ambiguous data association)→ Particle Filter
Posterior 維度極高 + sparse → Information Filter / Graph SLAM
Practical Tips:點 Tune Q 同 R?
呢個係 KF 工程上最痛嘅問題。Q(process noise)同 R(measurement noise)唔啱,filter 行為就會崩。
Tuning Strategy
| 症狀 | 可能原因 | 修正方向 |
|---|---|---|
| Filter 反應慢,跟唔到 sudden motion | Q 太細(過信 model) | 增加 Q(讓 KF 容許更大 motion deviation) |
| Filter 抖動嚴重,跟住 noise 行 | R 太細(過信 sensor) | 增加 R(讓 KF 平滑 measurement) |
| Innovation 一直 biased | Model 有 systematic error | 檢查 F / H 矩陣係咪 mis-specified |
| Covariance P 越嚟越細,但 estimate 衰 | Filter divergence | EKF:減細 step 或者轉 UKF;KF:檢查 Q 係咪太細 |
Innovation 統計診斷
如果 model 正確,Normalized Innovation Squared (NIS) 應該服從 分佈:
喺一個 sliding window 上面計平均,如果經常 outside 95% confidence interval → KF tuning 有問題。
pythondef chi2_check(innovations, S_matrices, window=100):
"""快速診斷 KF 係咪 tune 啱。"""
nis = [y @ np.linalg.inv(S) @ y for y, S in zip(innovations, S_matrices)]
avg_nis = np.mean(nis[-window:])
m = innovations[0].shape[0]
# 預期 avg ≈ m,下界 χ²(0.025), 上界 χ²(0.975)
return avg_nis, m
⚠️ 自動 tuning:可以用 EM 或者 ALS(Autocovariance Least Squares)由 innovation 序列自動估 Q, R——但需要 stationary system。對於 production tracker,通常 hand-tune + grid search 就夠。SORT 直接寫死 Q, R 而冇任何 adaptive logic——簡單就係強大。
PyTorch Differentiable Kalman Filter
2020 年代開始流行 將 KF 嵌入 deep learning 嘅 end-to-end pipeline——用 NN 學 Q, R, F,KF 做 inductive bias。
pythonimport torch
import torch.nn as nn
class DifferentiableKF(nn.Module):
def __init__(self, state_dim, obs_dim):
super().__init__()
self.state_dim, self.obs_dim = state_dim, obs_dim
# F, H 可以係 learnable 或者 fixed
self.F = nn.Parameter(torch.eye(state_dim))
self.H = nn.Parameter(torch.zeros(obs_dim, state_dim))
# Q, R 用 NN 預測(input-dependent noise model)
self.Q_net = nn.Sequential(
nn.Linear(state_dim, 64), nn.ReLU(),
nn.Linear(64, state_dim),
)
self.R_net = nn.Sequential(
nn.Linear(obs_dim, 64), nn.ReLU(),
nn.Linear(64, obs_dim),
)
def forward(self, x, P, z):
# Predict
x_pred = x @ self.F.T
Q = torch.diag_embed(self.Q_net(x).exp())
P_pred = self.F @ P @ self.F.T + Q
# Update
y = z - x_pred @ self.H.T
R = torch.diag_embed(self.R_net(z).exp())
S = self.H @ P_pred @ self.H.T + R
K = P_pred @ self.H.T @ torch.linalg.inv(S)
x_new = x_pred + (y.unsqueeze(-1).transpose(-1, -2) @ K.transpose(-1, -2)).squeeze(-2)
I = torch.eye(self.state_dim, device=x.device)
P_new = (I - K @ self.H) @ P_pred
return x_new, P_new
💡 Differentiable KF 嘅 use case:
Learnable sensor model:CNN feature → measurement,再 KF fuse(如 DeepVO、BackpropKF)
Input-dependent Q, R:用 NN 預測 noise(如 occluded frame 應該大 R)
End-to-end training:loss 可以由 final trajectory error backprop 到 NN
但記住:Gradient through matrix inverse 同 batched linear solve 喺 GPU 上面係 expensive 操作——production 通常仲係 NumPy/Eigen CPU。
限制同常見坑
1. Linear + Gaussian 假設太強
現實世界好少完全 linear,更少完全 Gaussian。重 tail(如 GPS occasional huge error)→ KF 會 over-react → divergence。Robust KF(Huber loss innovation、Student-t noise) 係 mitigation。
2. Cold Start 問題
Initial 揀錯 → 頭幾步 estimate 飄走。常見做法: 設大啲(high prior uncertainty),等 KF 自己 converge。
3. Observability
如果 state 嘅某啲維度從 measurement 上面 unobservable(如 SLAM 入面 robot 唔郁時嘅 landmark depth),KF 嘅 covariance 喺呢個方向永遠唔減 → estimate 飄走。需要 motion 提供 information。
4. Outliers
Detector miss 一個 frame、GPS 跳一下——一個 outlier 就可能令 KF 飄走。Mitigation:
- Gating:用 Mahalanobis distance reject outlier
- Robust update:Innovation 太大就跳過 update step(只 predict)
5. Asynchronous Sensors
IMU 100Hz、GPS 1Hz、camera 30Hz——同 KF 嘅 fixed-rate 假設衝突。Solution:
- Multi-rate KF:predict step 用最快 sensor 嘅 rate,update step 喺各 sensor available 時做
- Event-based KF:完全 asynchronous,每次 measurement 嚟到就 update
技術啟示
1. Bayesian Inference + Closed-Form 嘅威力
Kalman Filter 嘅本質係 Bayesian inference 喺特殊條件下嘅 closed-form solution。現代 deep learning 入面好多 trick(VAE、diffusion model、normalizing flow)都可以追溯到 Gaussian-based probabilistic reasoning——掌握 KF 等於掌握咗呢個傳統嘅核心。
2. Recursive 比 Batch 永遠更實用
KF 嘅 recursive structure 令佢可以喺 embedded device 上面 real-time 跑,唔需要 buffer 歷史 data。同樣嘅 philosophy 反映喺:
- Online learning vs batch learning
- Streaming sketches vs full pass
- Transformer 嘅 KV cache vs full attention
3. Uncertainty 比 Point Estimate 更值錢
KF output 嘅唔只係 ,仲有 (covariance)。下游 decision making(如自駕車嘅 safety margin、SORT 嘅 gating)正正需要呢個 uncertainty——deep learning 點解咁多 model 走去學 uncertainty(Bayesian NN、ensemble)?因為佢哋仲未追到 KF 嘅水平。
4. Simple Beats Complex(when assumptions hold)
SORT 用 1960 年嘅 KF 喺 2016 年攞 SOTA,2026 年仲喺 production——只要 assumption 合理,simple model 永遠 beat complex model。Always check if a Kalman Filter would work before reaching for a neural network.
總結
Kalman Filter 唔係一個 algorithm——係一個 mindset:
- 用 probability 表達 belief
- 用 dynamics 推前
- 用 measurement 修正
- 永遠 track uncertainty
60 年來,由 Apollo 飛去月球,到你手機嘅 GPS、到 SORT 追物件、到 SLAM 起地圖——核心都係呢條遞推。深度學習革命之後,KF 唔但冇被淘汰,反而以 differentiable form 同 inductive bias 嘅角色再生。
一張 cheat sheet
| 場景 | 推薦 | 原因 |
|---|---|---|
| Linear system + Gaussian noise | KF | MMSE optimal,5 條公式搞掂 |
| 多目標追蹤(MOT) | KF + Hungarian (SORT-family) | Mahalanobis gating + constant velocity |
| IMU + GPS fusion | EKF(mildly nonlinear) | 已成 navigation 業界 standard |
| Robot SLAM | EKF / UKF / Graph SLAM | 細場景 EKF,大場景 Graph SLAM |
| Quadrotor / 機械臂 control | UKF | 高度 nonlinear,唔想計 Jacobian |
| Robot kidnapping / ambiguous | Particle Filter | Multi-modal posterior |
| End-to-end learnable system | Differentiable KF | NN 學 Q, R, F,KF 做 inductive bias |
相關資源
- 📄 原始論文:Kalman 1960 — A New Approach to Linear Filtering and Prediction Problems
- 📚 教科書 (必讀):Thrun, Burgard, Fox — Probabilistic Robotics (MIT Press, 2005)
- 📚 應用導向:Bar-Shalom — Estimation with Applications to Tracking and Navigation
- 💻 Python 庫:FilterPy(Roger Labbe 寫嘅 KF/EKF/UKF/PF 庫)
- 📖 Tutorial(強烈推薦):Kalman and Bayesian Filters in Python(Jupyter notebook 互動學習)
- 🔗 SORT:arXiv:1602.00763
- 🔗 DeepSORT:arXiv:1703.07402
- 🔗 BackpropKF:arXiv:1605.07148(differentiable KF 嘅 seminal paper)
- 🔗 UKF:Julier & Uhlmann, Unscented filtering and nonlinear estimation, Proc. IEEE 2004
- 🔗 延伸閱讀:RTMO:點樣將 Coordinate Classification 塞入 YOLO,做到 One-Stage Real-Time 多人 Pose Estimation?(multi-person detection,可以接 KF 做 tracking)
Kalman Filter 教咗我哋一件事:好嘅 estimator 唔係冇 noise,而係識得 quantify noise。一個會講「我估佢喺度,但我都唔太肯定」嘅 system,永遠比「我覺得佢一定喺度」嘅 system 更可靠。下次 design system 嘅時候,問自己一句:你個 model 識唔識講「我唔知」? 📡✨