Object Detection


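1. What Is Object Detection?

Object detection is one of the core tasks in computer vision. It must both recognize which objects appear in an image (classification) and localize each of them (with a bounding box).

2. How It Differs from Feature Extraction and Matching

Feature extraction: find keypoints in an image (where)
Feature matching: find correspondences between different images (which matches which)
Object detection: find specific objects and box their locations (what and where)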

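3. Technical Evolution

Traditional methods (2000-2012) → Deep learning era (2012 to present)
├── Haar features + cascade classifier (Viola-Jones)
├── HOG features + SVM classifier
├── DPM (Deformable Part Model)
└── Deep learning detectors
    ├── Two-stage: R-CNN family (R-CNN, Fast R-CNN, Faster R-CNN)
    └── One-stage: YOLO family, SSD, RetinaNet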

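4. Learning Path

Follow the order in which the techniques were developed:

1. Traditional methods: Haar features and cascade classifiers (the Viola-Jones algorithm)
2. Traditional methods: HOG features and SVM classifiers
3. Deep learning basics: core concepts of convolutional neural networks (CNNs)
4. Deep learning detectors: principles of the YOLO family
5. K230 deployment considerations: model optimization and deployment strategies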

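Haar Features and Cascade Classifiers (the Viola-Jones Algorithm)

1. Core Idea

The Viola-Jones algorithm (2001) is widely regarded as the first face detector to run in real time. Its core ideas are:

Describe the image with simple rectangle features (Haar-like features)
Compute feature values quickly with an integral image
Select the important features with AdaBoost
Build a cascade of classifiers for fast detection

2. How Haar Features Work

2.1 What Is a Haar Feature?

A Haar feature is the difference between the pixel sums of adjacent rectangular regions. The basic set has three kinds (two-, three-, and four-rectangle features); counting the two edge orientations separately gives the four templates below: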

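# Haar feature templates
# 1. Edge features (two rectangles, horizontal and vertical)
#    ┌───┬───┐    ┌───┐
#    │ A │ B │    │ A │
#    └───┴───┘    ├───┤
#                 │ B │
#                 └───┘
#   feature value = pixel sum of region A - pixel sum of region B

# 2. Line features (three rectangles)
#    ┌───┬───┬───┐
#    │ A │ B │ A │
#    └───┴───┴───┘
#   feature value = pixel sum of regions A - pixel sum of region B

# 3. Four-rectangle (diagonal) feature
#    ┌───┬───┐
#    │ A │ B │
#    ├───┼───┤
#    │ B │ A │
#    └───┴───┘
#   feature value = pixel sum of regions A - pixel sum of regions B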

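2.2 Why Rectangle Features?

Cheap to compute: only additions and subtractions are needed
Clear physical meaning: they capture edges, lines, and light/dark contrast patterns
Match the structure of faces: face regions usually show light/dark contrast (the eyes are darker than the cheeks, the bridge of the nose is brighter than its sides)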

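3. Speeding Up Computation with the Integral Image

3.1 Definition

The integral image is a precomputed data structure for quickly obtaining the pixel sum of any rectangular region of an image. Its core idea is trading space for time.

Computing the integral image

Given an original image $I(x,y)$, the integral image $S(x,y)$ is defined as

$S(x, y) = \sum_{i \le x,\ j \le y} I(i, j)$

In words: the value of the integral image at point (x,y) is the sum of all pixel values of the original image inside the rectangle spanned by (0,0) and (x,y).

Preprocessing: traverse the whole image once and compute the integral-image value at every pixel. Time complexity $O(n)$, where $n$ is the total number of pixels
Looking up an integral value: a direct table lookup, time complexity $O(1)$
Pixel sum of a rectangle $R=(x_1,y_1,x_2,y_2)$ with inclusive top-left corner $(x_1,y_1)$ and bottom-right corner $(x_2,y_2)$; any term with index $-1$ is taken as 0:

$sum = S(x_2, y_2) - S(x_1-1, y_2) - S(x_2, y_1-1) + S(x_1-1, y_1-1)$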

Intuition

# Example original image (3×3)
original = [
    [1, 2, 3],
    [4, 5, 6],
    [7, 8, 9]
]
# The corresponding integral image
integral = [
    [1,  3,  6],    # row 1: 1, 1+2, 1+2+3
    [5,  12, 21],   # row 2: 1+4, 1+2+4+5, 1+2+3+4+5+6
    [12, 27, 45]    # row 3: 1+4+7, 1+2+4+5+7+8, sum of all pixels
]
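
The definition and the four-term rectangle formula can be checked on the 3×3 example with a minimal sketch. It assumes NumPy and uses (row, column) indexing instead of (x, y); the helper names `integral_image` and `rect_sum` are illustrative, not part of any library.

```python
import numpy as np

def integral_image(img):
    """Cumulative sum down the rows, then across the columns."""
    return np.asarray(img, dtype=np.int64).cumsum(axis=0).cumsum(axis=1)

def rect_sum(S, r1, c1, r2, c2):
    """Pixel sum of the inclusive rectangle rows r1..r2, columns c1..c2."""
    total = S[r2, c2]
    if r1 > 0:
        total -= S[r1 - 1, c2]          # strip above the rectangle
    if c1 > 0:
        total -= S[r2, c1 - 1]          # strip to the left of the rectangle
    if r1 > 0 and c1 > 0:
        total += S[r1 - 1, c1 - 1]      # corner that was subtracted twice
    return int(total)

original = np.array([[1, 2, 3], [4, 5, 6], [7, 8, 9]])
S = integral_image(original)
print(S)
print(rect_sum(S, 1, 1, 2, 2))  # bottom-right 2x2 block: 5 + 6 + 8 + 9 = 28
```

The three `if` checks implement the convention that terms with index -1 count as 0.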

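3.2 Computational Complexity

Naive method: O(wh) per rectangle (w and h are the rectangle's width and height)
Integral-image method: O(1) per rectangle (four table lookups plus three additions/subtractions)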
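
Because every rectangle sum costs a constant number of lookups, a Haar feature does too. The sketch below evaluates a horizontal two-rectangle (edge) feature. It assumes NumPy and a zero-padded integral image (one extra row and column of zeros) so that no border checks are needed; `padded_integral`, `box_sum`, and `edge_feature_horizontal` are hypothetical helper names.

```python
import numpy as np

def padded_integral(img):
    """Integral image with a leading zero row and column."""
    S = np.asarray(img, dtype=np.int64).cumsum(axis=0).cumsum(axis=1)
    return np.pad(S, ((1, 0), (1, 0)))

def box_sum(P, top, left, height, width):
    """Pixel sum of a height x width box whose top-left pixel is (top, left)."""
    return int(P[top + height, left + width] - P[top, left + width]
               - P[top + height, left] + P[top, left])

def edge_feature_horizontal(P, top, left, height, width):
    """Two-rectangle feature: left half (A) minus right half (B); width must be even."""
    half = width // 2
    a = box_sum(P, top, left, height, half)
    b = box_sum(P, top, left + half, height, half)
    return a - b

# 4x4 window with a bright left half and a dark right half
window = np.array([[9, 9, 1, 1]] * 4)
P = padded_integral(window)
value = edge_feature_horizontal(P, 0, 0, 4, 4)
print(value)  # 9 * 8 - 1 * 8 = 64
```

A uniform window gives a feature value of 0, which is why these features respond to contrast rather than to absolute brightness.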


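4. Feature Selection with AdaBoost

1. Background

1.1 Explosion in the Number of Features

For a 24×24-pixel detection window:

- The number of possible Haar features exceeds 160,000
- Evaluating every feature takes time
- Most features contribute very little to classification

1.2 The Core Challenge

How do we pick the few hundred most discriminative features out of 160,000?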
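
The "more than 160,000" figure can be reproduced by brute-force counting. The sketch below assumes five base templates, given as (width, height) in pixels: two edge orientations, two line orientations, and one four-rectangle shape. That template set is an assumption made for this illustration; other template sets give different totals.

```python
def count_haar_features(window=24,
                        shapes=((2, 1), (1, 2), (3, 1), (1, 3), (2, 2))):
    """Count every scale and position of each base shape inside the window."""
    total = 0
    for base_w, base_h in shapes:
        for w in range(base_w, window + 1, base_w):       # widths: multiples of base_w
            for h in range(base_h, window + 1, base_h):   # heights: multiples of base_h
                total += (window - w + 1) * (window - h + 1)
    return total

print(count_haar_features())  # 162336
```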

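2. The Basic Idea of AdaBoost

AdaBoost (Adaptive Boosting) is an ensemble learning algorithm. Here it is used for binary classification: deciding whether or not a window contains the target (a face).

2.1 Key Concepts

Weak classifier: a simple classifier that is only slightly better than random guessing (accuracy > 50%)
Strong classifier: a weighted combination of many weak classifiers
Adaptive: sample weights are adjusted according to the previous round's classification results

2.2 Algorithm Flow

1. Initialize the sample weights (all samples start with equal weight)
2. For each round t = 1 to T:
   a. Train a weak classifier using the current weights
   b. Compute the weak classifier's weighted error rate
   c. Compute the weak classifier's weight from its error rate
   d. Update the sample weights: increase the weights of misclassified samples and decrease those of correctly classified ones, so the next round concentrates on the samples that are still hard rather than selecting a similar feature again
3. Combine all weak classifiers into a strong classifier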

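3. Application to Haar Detection

3.1 Definition of a Weak Classifier

Each weak classifier is based on a single Haar feature:

class WeakClassifier:
    """Weak classifier based on a single Haar feature"""

    def __init__(self, feature, threshold, polarity):
        """
        feature: Haar feature object
        threshold: classification threshold
        polarity: polarity (1 or -1), sets the direction of the inequality
        """
        self.feature = feature
        self.threshold = threshold
        self.polarity = polarity  # 1: positive when feature value < threshold; -1: the opposite

    def predict(self, x):
        """
        Classify a single sample
        x: sample (image window)
        """
        # Compute the Haar feature value
        feature_value = self.feature.compute(x)

        # Classification decision
        if self.polarity * feature_value < self.polarity * self.threshold:
            return 1  # positive sample (face)
        else:
            return 0  # negative sample (non-face)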

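3.2 Mathematical Form

The decision rule of weak classifier $h_j(x)$:

$$
h_j(x) = \begin{cases} 1 & \text{if } p_j \cdot f_j(x) < p_j \cdot \theta_j \\ 0 & \text{otherwise} \end{cases}
$$

where:

- $f_j(x)$: the value of the j-th Haar feature. The feature comes from a pre-generated pool (horizontal edges, vertical edges, and so on); AdaBoost's job is to learn which of them suit the task.
- $\theta_j$: the threshold (learned; the value that gives the lowest weighted classification error)
- $p_j \in \{+1, -1\}$: the polarity (learned). It sets the direction of the inequality: with $p_j = +1$ a window is positive when its feature value is below the threshold, with $p_j = -1$ when it is above. Training keeps whichever direction gives the lower error, since for some features the target produces large values and for others small ones.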
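
The effect of the polarity is easiest to see on plain numbers. The following minimal sketch applies the decision rule to four made-up feature values; the function name `stump` and the values are illustrative only.

```python
def stump(feature_value, threshold, polarity):
    """Return 1 (positive) if polarity * f < polarity * threshold, else 0."""
    return 1 if polarity * feature_value < polarity * threshold else 0

values = [2.0, 4.0, 6.0, 8.0]
below = [stump(v, 5.0, +1) for v in values]
above = [stump(v, 5.0, -1) for v in values]
print(below)  # [1, 1, 0, 0]: values below the threshold are positive
print(above)  # [0, 0, 1, 1]: values above the threshold are positive
```

Flipping the polarity inverts the decision for every value that is not exactly equal to the threshold.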

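4. AdaBoost Training Procedure (for reference)

4.1 Initialization

import numpy as np

def initialize_weights(positive_samples, negative_samples):
    """
    Initialize the sample weights.
    Both sample collections are expected to be Python lists.
    """
    m = len(positive_samples)  # number of positive samples
    n = len(negative_samples)  # number of negative samples

    # Weight of each positive sample (the positives share half of the total weight)
    pos_weight = 1.0 / (2 * m)
    # Weight of each negative sample (the negatives share the other half)
    neg_weight = 1.0 / (2 * n)

    weights = np.concatenate([
        np.full(m, pos_weight),  # positive-sample weights
        np.full(n, neg_weight)   # negative-sample weights
    ])

    labels = np.concatenate([
        np.ones(m),   # positive samples are labeled 1
        np.zeros(n)   # negative samples are labeled 0
    ])

    return weights, labels, positive_samples + negative_samples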

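4.2 One Training Round: Selecting the Best Weak Classifier

def train_weak_classifier(features, samples, weights, labels):
    """
    Train one weak classifier (select the best feature, threshold, and polarity)
    """
    best_error = float('inf')  # start from positive infinity
    best_classifier = None
    best_feature_idx = -1

    # Total weight of each class (the same for every feature, so computed once)
    total_pos_weight = np.sum(weights[labels == 1])
    total_neg_weight = np.sum(weights[labels == 0])

    # Iterate over every feature in the pre-generated pool
    for feature_idx, feature in enumerate(features):
        # 1. Compute this feature's value on every sample
        feature_values = np.array([feature.compute(sample) for sample in samples])

        # 2. Sort the samples by feature value
        sorted_indices = np.argsort(feature_values)
        sorted_values = feature_values[sorted_indices]
        sorted_labels = labels[sorted_indices]
        sorted_weights = weights[sorted_indices]

        # 3. Find the best threshold (try every split point)

        # Running weights of the samples at or to the left of the split
        current_pos_weight = 0.0
        current_neg_weight = 0.0

        for i in range(len(sorted_values)):
            # Update the running weights
            if sorted_labels[i] == 1:
                current_pos_weight += sorted_weights[i]
            else:
                current_neg_weight += sorted_weights[i]

            # Equal feature values cannot be separated: split only after the last of them
            if i + 1 < len(sorted_values) and sorted_values[i + 1] == sorted_values[i]:
                continue

            # The split lies just after sample i, so samples 0..i are on its left.
            # Use the midpoint to the next value (or a small offset after the last sample).
            if i + 1 < len(sorted_values):
                threshold = (sorted_values[i] + sorted_values[i + 1]) / 2.0
            else:
                threshold = sorted_values[i] + 0.0001

            # Polarity 1: feature value < threshold is classified as positive
            # error = positives right of the split + negatives left of the split
            error1 = (total_pos_weight - current_pos_weight) + current_neg_weight

            # Polarity -1: feature value > threshold is classified as positive
            # error = positives left of the split + negatives right of the split
            error2 = current_pos_weight + (total_neg_weight - current_neg_weight)

            # Keep the polarity with the smaller error
            if error1 < error2:
                error, polarity = error1, 1
            else:
                error, polarity = error2, -1

            # Update the best classifier
            if error < best_error:
                best_error = error
                best_classifier = WeakClassifier(
                    feature=feature,
                    threshold=threshold,
                    polarity=polarity
                )
                best_feature_idx = feature_idx

    return best_classifier, best_error, best_feature_idx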

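4.3 Computing the Weak Classifier's Weight

def compute_classifier_weight(error):
    """
    Compute the weight (importance) of a weak classifier
    """
    # Clamp the error to avoid division by zero and log(0)
    error = max(error, 1e-10)
    error = min(error, 1 - 1e-10)

    # Weight formula: α = 0.5 * ln((1 - ε) / ε)
    alpha = 0.5 * np.log((1.0 - error) / error)

    return alpha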

4.4 Updating the Sample Weights

def update_weights(weights, labels, predictions, alpha):
    """
    Update the sample weights
    """
    new_weights = weights.copy()

    for i in range(len(weights)):
        # Was this sample classified correctly?
        correct = (predictions[i] == labels[i])

        if correct:
            # Correctly classified: lower the weight
            # (multiply by exp(-α))
            new_weights[i] *= np.exp(-alpha)
        else:
            # Misclassified: raise the weight
            # (multiply by exp(α))
            new_weights[i] *= np.exp(alpha)

    # Normalize so that the weights sum to 1
    new_weights /= np.sum(new_weights)

    return new_weights
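
A small worked example shows what the weight formula and the update achieve together. With α = 0.5 · ln((1 − ε) / ε), the misclassified samples hold exactly half of the total weight after normalization, so the classifier just chosen looks no better than chance on the reweighted data and the next round has to pick something different. The sketch uses only the standard library; the labels and predictions are made up, and the uniform starting weights are a simplification.

```python
import math

labels      = [1, 1, 1, 0, 0, 0, 0, 0]
predictions = [1, 1, 0, 0, 0, 0, 0, 1]   # two mistakes (indices 2 and 7)
weights     = [1.0 / len(labels)] * len(labels)

wrong = [p != y for p, y in zip(predictions, labels)]
error = sum(w for w, bad in zip(weights, wrong) if bad)    # weighted error: 0.25
alpha = 0.5 * math.log((1.0 - error) / error)              # 0.5 * ln(3)

# Multiply by exp(alpha) for mistakes and exp(-alpha) otherwise, then normalize
updated = [w * math.exp(alpha if bad else -alpha) for w, bad in zip(weights, wrong)]
norm = sum(updated)
updated = [w / norm for w in updated]

wrong_mass = sum(w for w, bad in zip(updated, wrong) if bad)
print(f"error={error:.2f}, alpha={alpha:.4f}, wrong_mass={wrong_mass:.2f}")
```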

5. Complete AdaBoost Training Algorithm (for reference)

class AdaBoostTrainer:
    """AdaBoost trainer"""

    def __init__(self, max_classifiers=100, min_error=0.001):
        self.max_classifiers = max_classifiers
        self.min_error = min_error
        self.classifiers = []  # list of weak classifiers
        self.alphas = []       # list of the corresponding weights

    def train(self, features, positive_samples, negative_samples):
        """
        Train the AdaBoost classifier
        """
        print("Starting AdaBoost training...")
        print(f"Positive samples: {len(positive_samples)}")
        print(f"Negative samples: {len(negative_samples)}")
        print(f"Features: {len(features)}")
        print()

        # 1. Initialize the weights
        weights, labels, samples = initialize_weights(
            positive_samples, negative_samples
        )

        # 2. Iterative training
        for t in range(self.max_classifiers):
            print(f"Round {t+1}...")

            # 2.1 Select the best weak classifier
            classifier, error, feature_idx = train_weak_classifier(
                features, samples, weights, labels
            )

            print(f"  Best feature index: {feature_idx}")
            print(f"  Error rate: {error:.4f}")

            # Stop if the error rate is close to 0.5 (random guessing)
            if error > 0.5 - 1e-5:
                print("Error rate is close to random guessing, stopping training")
                break

            # 2.2 Compute the classifier weight
            alpha = compute_classifier_weight(error)
            print(f"  Classifier weight (α): {alpha:.4f}")

            # 2.3 Store the classifier
            self.classifiers.append(classifier)
            self.alphas.append(alpha)

            # 2.4 Compute the current classifier's predictions
            predictions = []
            for sample in samples:
                pred = classifier.predict(sample)
                predictions.append(pred)
            predictions = np.array(predictions)

            # 2.5 Update the sample weights
            weights = update_weights(weights, labels, predictions, alpha)

            # 2.6 Evaluate the current strong classifier
            strong_predictions = self._predict_strong(samples)
            strong_error = np.mean(strong_predictions != labels)
            print(f"  Current strong-classifier error rate: {strong_error:.4f}")
            print()

            # Check the stopping condition
            if strong_error < self.min_error:
                print(f"Reached the minimum error rate {self.min_error}, stopping training")
                break

        print(f"Training finished, {len(self.classifiers)} features selected")
        return self

    def _predict_strong(self, samples):
        """Predict with all weak classifiers trained so far"""
        if len(self.classifiers) == 0:
            return np.zeros(len(samples), dtype=int)

        predictions = np.zeros(len(samples))

        for classifier, alpha in zip(self.classifiers, self.alphas):
            for i, sample in enumerate(samples):
                pred = classifier.predict(sample)
                # Weighted vote: +α for "positive", -α for "negative"
                if pred == 1:
                    predictions[i] += alpha
                else:
                    predictions[i] -= alpha

        # Final decision: positive if the weighted sum is > 0
        final_predictions = (predictions > 0).astype(int)

        return final_predictions

    def predict(self, sample):
        """Predict a single sample"""
        if len(self.classifiers) == 0:
            return 0

        score = 0.0
        for classifier, alpha in zip(self.classifiers, self.alphas):
            pred = classifier.predict(sample)
            if pred == 1:
                score += alpha
            else:
                score -= alpha

        return 1 if score > 0 else 0
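5. Cascade Classifier

5.1 Cascade Structure

Input image → Stage 1 classifier → Stage 2 classifier → ... → Stage N classifier → Detection result
              ↓ reject non-face    ↓ reject non-face          ↓ reject non-face

5.2 Design Rationale

Early rejection: simple classifiers come first and quickly discard windows that are clearly not faces
Progressive refinement: later classifiers are more complex and are used to confirm the hard cases
Efficiency: most windows are rejected in the first few stages, and only a few windows go through the full cascade

5.3 Performance Advantage

Assume:

- Each stage rejects 50% of the non-face windows that reach it
- Face windows pass every stage (100%)
- The cascade has 10 stages

Then the fraction of non-face windows that pass all stages is (0.5)^10 ≈ 0.1%, and most of them are discarded after only a few cheap stages, which greatly speeds up detection.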
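
The arithmetic above generalizes to any per-stage rates. The sketch below computes the overall detection and false-positive rates, plus the average number of stages a non-face window goes through. It assumes the stages behave independently, which is a simplification; the function names are illustrative.

```python
def cascade_rates(n_stages, stage_detection_rate, stage_false_positive_rate):
    """Overall rates when a window must pass every stage to be accepted."""
    return (stage_detection_rate ** n_stages,
            stage_false_positive_rate ** n_stages)

def expected_stages_for_non_face(n_stages, stage_false_positive_rate):
    """Average number of stages evaluated for a non-face window."""
    # Stage k (0-based) runs only if the window survived all k earlier stages.
    return sum(stage_false_positive_rate ** k for k in range(n_stages))

detection, false_positive = cascade_rates(10, 1.0, 0.5)
avg_stages = expected_stages_for_non_face(10, 0.5)
print(f"{false_positive:.4%}")   # 0.0977%
print(f"{avg_stages:.3f}")       # 1.998
```

Under these assumptions a typical non-face window costs about two stage evaluations rather than ten. If each stage kept only 99% of the faces instead of 100%, the overall detection rate would drop to 0.99^10 ≈ 0.904, which is why each stage is tuned to keep almost every face.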

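6. OpenCV Example

import cv2
import numpy as np

def haar_face_detection():
    """
    Face detection with Haar features
    """
    # 1. Load the pretrained cascade classifier
    # OpenCV ships with a trained frontal-face model
    face_cascade = cv2.CascadeClassifier(cv2.data.haarcascades + 'haarcascade_frontalface_default.xml')
    if face_cascade.empty():
        raise IOError("Failed to load haarcascade_frontalface_default.xml")

    # 2. Read the image
    img = cv2.imread('test_face.jpg')
    if img is None:
        # No test image found: draw a synthetic one so the script still runs
        # (the detector will most likely not report this drawing as a face)
        print("Test image not found, creating a synthetic image...")
        img = np.zeros((400, 600, 3), dtype=np.uint8)
        # Draw a schematic face
        cv2.rectangle(img, (200, 150), (400, 350), (200, 200, 200), -1)  # face
        cv2.circle(img, (250, 200), 20, (100, 100, 100), -1)  # left eye
        cv2.circle(img, (350, 200), 20, (100, 100, 100), -1)  # right eye
        cv2.rectangle(img, (280, 280), (320, 300), (150, 150, 150), -1)  # mouth

    gray = cv2.cvtColor(img, cv2.COLOR_BGR2GRAY)

    # 3. Detect faces
    # scaleFactor: image scaling ratio between pyramid levels (multi-scale detection)
    # minNeighbors: minimum number of neighboring detections needed to keep a box
    # minSize: minimum detection size
    # maxSize: maximum detection size
    faces = face_cascade.detectMultiScale(
        gray,
        scaleFactor=1.1,      # scale changes by 10% per step
        minNeighbors=5,       # keep a box only if it has at least 5 neighbors
        minSize=(30, 30),     # smallest face size
        maxSize=(300, 300)    # largest face size
    )

    # 4. Draw the detections
    for (x, y, w, h) in faces:
        cv2.rectangle(img, (x, y), (x+w, y+h), (0, 255, 0), 2)
        # Mark approximate landmark positions (estimated from the box, not detected)
        cv2.circle(img, (x + w//3, y + h//3), 5, (0, 0, 255), -1)  # left eye
        cv2.circle(img, (x + 2*w//3, y + h//3), 5, (0, 0, 255), -1)  # right eye
        cv2.rectangle(img, (x + w//2 - 10, y + 2*h//3),
                     (x + w//2 + 10, y + 2*h//3 + 10), (0, 0, 255), -1)  # mouth

    # 5. Show the result
    cv2.imshow('Haar Face Detection', img)
    print(f"Detected {len(faces)} face(s)")

    # 6. Effect of the parameters
    print("\nParameter tuning notes:")
    print("1. scaleFactor (1.1-1.3): smaller values search more scales but run slower")
    print("2. minNeighbors (3-6): larger values reduce false positives but may miss faces")
    print("3. minSize: set it from the expected face size to discard tiny false positives")

    cv2.waitKey(0)
    cv2.destroyAllWindows()

if __name__ == "__main__":
    haar_face_detection()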

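7. Connections and Implementation Notes

7.1 Relation to Traditional Feature Extraction

Harris/Shi-Tomasi: detect corners, suited to feature matching
Haar features: detect regional contrast, suited to object detection
In common: both rely on local statistics of the image

7.2 Relevance to the K230

Considerations when deploying a Haar detector on the K230:

1. Computation:
   - Integral-image computation is well suited to hardware acceleration
   - The cascade structure is well suited to pipelined processing
2. Memory:
   - The model is small (a few hundred KB)
   - Suitable for embedded devices
3. Real-time performance:
   - Early rejection keeps detection fast
   - The detection scales can be adjusted to trade speed against accuracy

7.3 Limitations

Poor rotation invariance: mainly detects frontal faces
Sensitive to lighting: relies on grayscale contrast
Poor handling of occlusion: partial occlusion easily causes missed detections
Hard to extend to multiple classes: each object class needs its own separately trained cascade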

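HOG Features + SVM Classifier

1. Overview: From Haar to HOG

1.1 Limitations of Haar Features

Rely mainly on grayscale contrast (light/dark relationships)
Sensitive to lighting changes
Not robust to rotation and deformation
Mainly suited to rigid objects (such as frontal faces)

1.2 What HOG Brings

HOG (Histogram of Oriented Gradients):

- Based on statistics of local gradient orientations
- Insensitive to lighting changes (thanks to normalization)
- Describes object contours better
- Particularly well suited to pedestrian detection

2. How HOG Features Work

2.1 Core Idea

Describe the shape of an object by the distribution of gradient orientations in local regions

2.2 Computation Steps

1. Compute the image gradients (magnitude and orientation)
2. Divide the image into small cells
3. Build a gradient-orientation histogram for each cell
4. Group several cells into blocks and normalize each block
5. Concatenate the histograms of all blocks into the final feature vector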

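3. Computing HOG Features

3.1 Gradient Computation

def compute_gradients(image):
    """
    Compute the gradient magnitude and orientation of an image
    """
    # Sobel operator (ksize=1 gives the simple [-1, 0, 1] filter)
    gx = cv2.Sobel(image, cv2.CV_32F, 1, 0, ksize=1)  # gradient along x
    gy = cv2.Sobel(image, cv2.CV_32F, 0, 1, ksize=1)  # gradient along y

    # Gradient magnitude
    magnitude = np.sqrt(gx**2 + gy**2)

    # Gradient orientation (unsigned angle, 0-180 degrees)
    angle = np.arctan2(gy, gx) * 180 / np.pi
    angle = np.where(angle < 0, angle + 180, angle)  # map to 0-180; np.where acts like an element-wise ternary

    return magnitude, angle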
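
To see what "unsigned orientation" means, the following NumPy-only sketch applies the same [-1, 0, 1] differences to two synthetic ramps. It does not call OpenCV; `central_gradients` is a hypothetical stand-in that leaves border pixels at zero, and its handling of the 180° case (modulo instead of `np.where`) is a choice made for this illustration.

```python
import numpy as np

def central_gradients(image):
    """Gradients with the [-1, 0, 1] filter; border pixels are left at zero."""
    img = np.asarray(image, dtype=np.float64)
    gx = np.zeros_like(img)
    gy = np.zeros_like(img)
    gx[:, 1:-1] = img[:, 2:] - img[:, :-2]
    gy[1:-1, :] = img[2:, :] - img[:-2, :]
    magnitude = np.hypot(gx, gy)
    angle = np.degrees(np.arctan2(gy, gx)) % 180.0   # fold onto the 0-180 range
    return magnitude, angle

# Brightness increases from left to right: the gradient points along +x
ramp = np.tile(np.arange(5, dtype=np.float64), (5, 1))
mag, ang = central_gradients(ramp)
print(mag[2, 2], ang[2, 2])

# Brightness decreases from left to right: the gradient points along -x,
# which is treated as the same orientation after folding
mag_r, ang_r = central_gradients(ramp[:, ::-1])
print(mag_r[2, 2], ang_r[2, 2])
```

A dark-to-bright edge and a bright-to-dark edge therefore vote for the same orientation, which makes the descriptor independent of contrast polarity.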

3.2 Cell Histograms

def compute_cell_histogram(magnitude, angle, cell_size=8, bins=9):
    """
    Compute the gradient-orientation histogram of every cell
    cell_size: cell side length in pixels (e.g. 8×8 pixels)
    bins: number of orientation bins (usually 9, each 20° wide: 0°-20°, ..., 160°-180°)
    """
    h, w = magnitude.shape  # magnitude is the gradient-magnitude map
    cell_h = h // cell_size  # floor division: number of cells along the vertical axis
    cell_w = w // cell_size  # number of cells along the horizontal axis
    bin_width = 180.0 / bins  # 20° per bin when bins=9

    histogram = np.zeros((cell_h, cell_w, bins))

    for y in range(cell_h):
        for x in range(cell_w):
            # Extract the current cell
            y_start = y * cell_size
            y_end = y_start + cell_size
            x_start = x * cell_size
            x_end = x_start + cell_size

            cell_mag = magnitude[y_start:y_end, x_start:x_end]
            cell_ang = angle[y_start:y_end, x_start:x_end]  # gradient-orientation map

            # Build the histogram by visiting every pixel of the cell
            for i in range(cell_size):
                for j in range(cell_size):
                    mag = cell_mag[i, j]
                    ang = cell_ang[i, j]

                    # Position of the angle measured in bins, relative to the
                    # bin centers (10°, 30°, ..., 170° when bins=9)
                    pos = ang / bin_width - 0.5
                    lower = int(np.floor(pos))  # bin whose center is at or below ang
                    frac = pos - lower          # 0 at the lower center, 1 at the upper one

                    # Linear interpolation: split the vote between the two nearest bins.
                    # The modulo wraps around because 0° and 180° are the same orientation.
                    histogram[y, x, lower % bins] += mag * (1 - frac)
                    histogram[y, x, (lower + 1) % bins] += mag * frac

    return histogram

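Note: why is $weight$ (linear interpolation between neighboring bins) needed?

Core reason: smoothness

- Avoids abrupt changes at bin boundaries:
  - Without interpolation, 39.9° and 40.1° fall into different bins
  - The feature vector then changes sharply at the boundary
  - With interpolation, the change across the boundary is smooth
- Improves robustness:
  - A slight rotation of the image does not change the feature completely
  - The feature is insensitive to small changes in orientation
- Mathematical continuity:
  - Makes the HOG feature a continuous function of the orientation angle
  - Helps the machine-learning algorithm that follows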
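
The boundary effect can be demonstrated with a single vote. The sketch below compares hard binning with linear interpolation for two angles on either side of the 40° boundary, assuming 9 unsigned bins of 20° with centers at 10°, 30°, ..., 170°. `soft_bin` and `hard_bin` are illustrative helpers, not library functions.

```python
import math

def soft_bin(angle, magnitude=1.0, bins=9):
    """Split one vote between the two nearest bin centers (unsigned, 0-180)."""
    bin_width = 180.0 / bins
    pos = angle / bin_width - 0.5      # position in bins, relative to the bin centers
    lower = math.floor(pos)
    frac = pos - lower                 # share that goes to the upper neighbor
    hist = [0.0] * bins
    hist[lower % bins] += magnitude * (1.0 - frac)
    hist[(lower + 1) % bins] += magnitude * frac
    return hist

def hard_bin(angle, magnitude=1.0, bins=9):
    """Put the whole vote into the single bin that contains the angle."""
    hist = [0.0] * bins
    hist[int(angle / (180.0 / bins)) % bins] += magnitude
    return hist

for a in (39.9, 40.1):
    hard = [round(v, 3) for v in hard_bin(a)][:3]
    soft = [round(v, 3) for v in soft_bin(a)][:3]
    print(a, hard, soft)
```

With hard binning the vote jumps from bin 1 to bin 2 between the two angles; with interpolation both angles give roughly a 50/50 split between those bins, so the histogram barely changes.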

3.3 Block Normalization

def block_normalization(histogram, block_size=2, stride=1):
    """
    histogram: grid of cell histograms, shape (cell rows, cell columns, bins)
    Block normalization improves robustness to lighting changes
    block_size: number of cells per block side (e.g. 2×2 cells)
    stride: sliding step in cells (usually 1)
    """
    cell_h, cell_w, bins = histogram.shape  # rows and columns of the cell grid, number of bins
    block_h = (cell_h - block_size) // stride + 1  # number of block rows
    block_w = (cell_w - block_size) // stride + 1  # number of block columns

    hog_features = []  # normalized block vectors

    for y in range(0, cell_h - block_size + 1, stride):
        for x in range(0, cell_w - block_size + 1, stride):
            # Extract the block
            block = histogram[y:y+block_size, x:x+block_size, :]

            # Flatten
            block_vector = block.flatten()

            # L2 normalization
            norm = np.sqrt(np.sum(block_vector**2) + 1e-6)  # small constant avoids division by zero
            block_vector = block_vector / norm

            hog_features.append(block_vector)

    # Concatenate the features of all blocks
    hog_feature = np.concatenate(hog_features)

    return hog_feature
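
The length of the final vector follows directly from the cell and block layout. The sketch below works it out for a 64×128 window, a size commonly used for pedestrian detection (an assumption, not stated above), with the defaults used in this section: 8-pixel cells, 2×2-cell blocks, a stride of one cell, and 9 bins. `hog_feature_length` is a hypothetical helper.

```python
def hog_feature_length(win_w, win_h, cell_size=8, block_size=2, stride=1, bins=9):
    """Length of the concatenated HOG vector; stride is measured in cells."""
    cells_x = win_w // cell_size
    cells_y = win_h // cell_size
    blocks_x = (cells_x - block_size) // stride + 1
    blocks_y = (cells_y - block_size) // stride + 1
    return blocks_x * blocks_y * block_size * block_size * bins

print(hog_feature_length(64, 128))  # 7 * 15 blocks * 36 values = 3780
```

Because neighboring blocks overlap, each interior cell contributes to four blocks, each time normalized against a different neighborhood.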

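4. How the SVM Classifier Works

4.1 Basic Idea

SVM (Support Vector Machine):

- Finds an optimal hyperplane that separates the two classes
- Maximizes the margin
- Support vectors: the sample points closest to the hyperplane

4.2 Mathematical Form (simplified)

For the linearly separable case:

- Hyperplane: $w^T x + b = 0$
- Decision function: $f(x) = \text{sign}(w^T x + b)$
- Objective: maximize the margin $\frac{2}{\|w\|}$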
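
As a concrete illustration of the decision function and the margin, the sketch below uses a made-up 2-D hyperplane; the values of `w` and `b` are hypothetical, not learned from data. The margin 2/‖w‖ assumes the usual scaling in which the support vectors satisfy |wᵀx + b| = 1.

```python
import numpy as np

# Hypothetical separating hyperplane for 2-D points: x1 + x2 - 3 = 0
w = np.array([1.0, 1.0])
b = -3.0

def decide(x):
    """f(x) = sign(w^T x + b), reported as +1 or -1."""
    return 1 if float(w @ x) + b >= 0 else -1

margin = 2.0 / np.linalg.norm(w)   # distance between the two margin hyperplanes

print(decide(np.array([3.0, 2.0])))   # 3 + 2 - 3 = 2     -> +1
print(decide(np.array([0.5, 1.0])))   # 0.5 + 1 - 3 = -1.5 -> -1
print(round(float(margin), 4))        # 2 / sqrt(2) = 1.4142
```

For HOG-based detection, x would be the HOG feature vector of a window, and the sign would say whether the window contains the object.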

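4.3 The Kernel Trick

For data that are not linearly separable, a kernel function implicitly maps the data into a higher-dimensional space:

- Linear kernel: $K(x_i, x_j) = x_i^T x_j$
- Polynomial kernel: $K(x_i, x_j) = (x_i^T x_j + c)^d$
- RBF (Gaussian) kernel: $K(x_i, x_j) = \exp(-\gamma \|x_i - x_j\|^2)$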
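
The three kernels translate directly into a few lines of NumPy. This is a minimal sketch for single pairs of vectors; the default values of `c`, `d`, and `gamma` are arbitrary choices for the example.

```python
import numpy as np

def linear_kernel(xi, xj):
    return float(xi @ xj)

def polynomial_kernel(xi, xj, c=1.0, d=2):
    return float((xi @ xj + c) ** d)

def rbf_kernel(xi, xj, gamma=0.5):
    diff = xi - xj
    return float(np.exp(-gamma * (diff @ diff)))

a = np.array([1.0, 2.0])
b = np.array([2.0, 0.0])
print(linear_kernel(a, b))       # 1*2 + 2*0 = 2.0
print(polynomial_kernel(a, b))   # (2 + 1)^2 = 9.0
print(rbf_kernel(a, a))          # identical points -> 1.0
print(rbf_kernel(a, b))          # exp(-0.5 * 5), about 0.082
```

The RBF value decays from 1 toward 0 as the two points move apart, and `gamma` controls how quickly.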