@wanghuijiao 2021-09-22T16:40:36.000000Z · Word count: 14802 · Views: 3749

A Beginner's Guide to YOLOv4: Human Pose Detection & Human Detection

Technical Documentation


Preface

1. Getting the YOLOv4 source code running

2. Datasets

Human pose dataset (self-built)

Human dataset (public)

3. Preparing the configuration files

Configuration file walkthrough

"Yolov4-tiny human detector" configuration file contents

"Yolov4-tiny human pose detector" configuration file contents

4. Model training

Training process walkthrough

"Yolov4-tiny human detector" training

    # 1. First train 6000 batches on COCO person, starting from the official
    #    pretrained weights yolov4-tiny.conv.29 (presumably trained on the 80 COCO classes).
    ./darknet detector train \
        cfg/yolov4-tiny_cocohuman_416.data \
        cfg/yolov4-tiny_human_416.cfg \
        yolov4-tiny.conv.29 \
        -dont_show \
        -map \
        -gpus 4,5,6,7

    # 2. Then continue training for another 14000 batches on the CrowdHuman person class,
    #    starting from the best checkpoint of the 6000-batch COCO person run above.
    ./darknet detector train \
        cfg/crowdhuman-416.data \
        cfg/yolov4-tiny_human_416.cfg \
        backup/yolov4-tiny_cocohuman_416/yolov4-tiny_human_416_best.weights \
        -dont_show \
        -map \
        -gpus 4,5,6,7
    # Adding -clear would restart training from batch 0; without it,
    # training resumes from batch 6000.
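For reference, a darknet `.data` file such as `cfg/crowdhuman-416.data` has the shape sketched below. The paths here are illustrative assumptions, not the actual files used in this project; `classes = 1` matches the single `person` class reported in the logs.

```ini
classes = 1
train   = data/crowdhuman/train.txt   ; list of training image paths (assumed location)
valid   = data/crowdhuman/test.txt    ; list of validation image paths (assumed location)
names   = data/crowdhuman.names       ; one class name per line ("person")
backup  = backup/crowdhuman-416/      ; where checkpoints (.weights) are saved
```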

"Yolov4-tiny human pose detector" training

    ./darknet detector train \
        cfg/pose.data \
        cfg/yolov4-tiny_human_pose_416.cfg \
        backup/yolov4-tiny_manfu_humanPose_416/yolov4-tiny_human_pose_416_best.weights \
        -dont_show \
        -map \
        -gpus 4,5,6,7 \
        -clear
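The `pose.data` file above points at a `.names` file in the standard darknet format: one class name per line, in `class_id` order. The six names below are taken from the evaluation logs in Appendix 1; the exact file contents are otherwise an assumption.

```text
stand
sit
squat
lie
half_lie
other_pose
```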

5. Model testing and performance

Performance metrics explained

"Yolov4-tiny human detector" results

"Yolov4-tiny human pose detector" results

Appendix 1: Debugging log

0806

    # test_v1.0.txt
    # weights: ./backup_manfu_human_v1.0_512/pose_yolov4_best.weights
    class_id = 0, name = stand, ap = 64.18% (TP = 2450, FP = 1119)
    class_id = 1, name = sit, ap = 72.67% (TP = 2426, FP = 924)
    class_id = 2, name = squat, ap = 33.98% (TP = 84, FP = 96)
    class_id = 3, name = lie, ap = 23.06% (TP = 10, FP = 5)
    class_id = 4, name = half_lie, ap = 33.38% (TP = 2, FP = 2)
    class_id = 5, name = other_pose, ap = 1.21% (TP = 0, FP = 0)
    for conf_thresh = 0.25, precision = 0.70, recall = 0.58, F1-score = 0.63
    for conf_thresh = 0.25, TP = 4972, FP = 2146, FN = 3591, average IoU = 48.36 %
    IoU threshold = 50 %, used Area-Under-Curve for each unique Recall
    mean average precision (mAP@0.50) = 0.380806, or 38.08 %

    # test_v1.0.txt
    # weights: ./backup_manfu_human_v1.0_b_512/pose_yolov4_b_best.weights
    class_id = 0, name = stand, ap = 68.38% (TP = 536, FP = 238)
    class_id = 1, name = sit, ap = 79.06% (TP = 429, FP = 143)
    class_id = 2, name = squat, ap = 63.94% (TP = 125, FP = 36)
    class_id = 3, name = lie, ap = 55.34% (TP = 132, FP = 78)
    class_id = 4, name = half_lie, ap = 30.82% (TP = 58, FP = 47)
    class_id = 5, name = other_pose, ap = 34.12% (TP = 15, FP = 6)
    for conf_thresh = 0.25, precision = 0.70, recall = 0.58, F1-score = 0.64
    for conf_thresh = 0.25, TP = 1295, FP = 548, FN = 919, average IoU = 53.73 %
    IoU threshold = 50 %, used Area-Under-Curve for each unique Recall
    mean average precision (mAP@0.50) = 0.552780, or 55.28 %
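The precision/recall/F1 lines in these logs follow directly from the TP/FP/FN counts at `conf_thresh = 0.25`. A minimal Python sketch that reproduces the numbers from the second run above:

```python
# Reproduce darknet's precision / recall / F1 from raw detection counts.
def detection_metrics(tp, fp, fn):
    """Return (precision, recall, f1) from TP/FP/FN counts."""
    precision = tp / (tp + fp)   # fraction of detections that are correct
    recall = tp / (tp + fn)      # fraction of ground-truth objects that were found
    f1 = 2 * precision * recall / (precision + recall)  # harmonic mean
    return precision, recall, f1

# Counts from the second log block: TP = 1295, FP = 548, FN = 919
p, r, f1 = detection_metrics(tp=1295, fp=548, fn=919)
print(f"precision = {p:.2f}, recall = {r:.2f}, F1-score = {f1:.2f}")
# matches the log line: precision = 0.70, recall = 0.58, F1-score = 0.64
```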

0809

    calculation mAP (mean average precision)...
    Detection layer: 139 - type = 27
    Detection layer: 150 - type = 27
    Detection layer: 161 - type = 27
    4708
    detections_count = 13112, unique_truth_count = 8563
    class_id = 0, name = stand, ap = 73.59% (TP = 2901, FP = 856)
    class_id = 1, name = sit, ap = 86.05% (TP = 2810, FP = 462)
    class_id = 2, name = squat, ap = 57.60% (TP = 138, FP = 96)
    class_id = 3, name = lie, ap = 38.77% (TP = 126, FP = 105)
    class_id = 4, name = half_lie, ap = 44.31% (TP = 513, FP = 330)
    class_id = 5, name = other_pose, ap = 39.18% (TP = 25, FP = 17)
    for conf_thresh = 0.25, precision = 0.78, recall = 0.76, F1-score = 0.77
    for conf_thresh = 0.25, TP = 6513, FP = 1866, FN = 2050, average IoU = 63.42 %
    IoU threshold = 50 %, used Area-Under-Curve for each unique Recall
    mean average precision (mAP@0.50) = 0.565842, or 56.58 %
    Total Detection Time: 90 Seconds
    Set -points flag:
    `-points 101` for MS COCO
    `-points 11` for PascalVOC 2007 (uncomment `difficult` in voc.data)
    `-points 0` (AUC) for ImageNet, PascalVOC 2010-2012, your custom dataset
    mean_average_precision (mAP@0.5) = 0.565842
    Saving weights to ./backup_manfu_human_v1.0_512/pose_yolov4_120000.weights

    1036
    detections_count = 2797, unique_truth_count = 2214
    class_id = 0, name = stand, ap = 68.25% (TP = 530, FP = 140)
    class_id = 1, name = sit, ap = 80.36% (TP = 457, FP = 104)
    class_id = 2, name = squat, ap = 63.12% (TP = 148, FP = 38)
    class_id = 3, name = lie, ap = 61.53% (TP = 164, FP = 50)
    class_id = 4, name = half_lie, ap = 33.34% (TP = 133, FP = 114)
    class_id = 5, name = other_pose, ap = 54.09% (TP = 35, FP = 16)
    for conf_thresh = 0.25, precision = 0.76, recall = 0.66, F1-score = 0.71
    for conf_thresh = 0.25, TP = 1467, FP = 462, FN = 747, average IoU = 61.19 %
    IoU threshold = 50 %, used Area-Under-Curve for each unique Recall
    mean average precision (mAP@0.50) = 0.601145, or 60.11 %
    Total Detection Time: 19 Seconds
    Set -points flag:
    `-points 101` for MS COCO
    `-points 11` for PascalVOC 2007 (uncomment `difficult` in voc.data)
    `-points 0` (AUC) for ImageNet, PascalVOC 2010-2012, your custom dataset
    mean_average_precision (mAP@0.5) = 0.601145
    Saving weights to ./backup_manfu_human_v1.0_b_512/pose_yolov4_b_120000.weights
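darknet's reported mAP@0.50 is simply the unweighted mean of the per-class APs. Averaging the six APs from the last log above reproduces the reported value (up to the rounding of the printed APs):

```python
# Per-class APs (in %) copied from the second log block above.
per_class_ap = {
    "stand": 68.25, "sit": 80.36, "squat": 63.12,
    "lie": 61.53, "half_lie": 33.34, "other_pose": 54.09,
}

# mAP@0.50 = arithmetic mean over classes.
map50 = sum(per_class_ap.values()) / len(per_class_ap)
print(f"mAP@0.50 = {map50:.3f} %")  # close to the 60.11 % in the log
```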


0817 human detector yolov4-tiny

    calculation mAP (mean average precision)...
    Detection layer: 30 - type = 27
    Detection layer: 37 - type = 27
    2696
    detections_count = 57081, unique_truth_count = 11004
    class_id = 0, name = person, ap = 59.12% (TP = 5692, FP = 1653)
    for conf_thresh = 0.25, precision = 0.77, recall = 0.52, F1-score = 0.62
    for conf_thresh = 0.25, TP = 5692, FP = 1653, FN = 5312, average IoU = 58.60 %
    IoU threshold = 50 %, used Area-Under-Curve for each unique Recall
    mean average precision (mAP@0.50) = 0.591248, or 59.12 %
    Total Detection Time: 20 Seconds
    Set -points flag:
    `-points 101` for MS COCO
    `-points 11` for PascalVOC 2007 (uncomment `difficult` in voc.data)
    `-points 0` (AUC) for ImageNet, PascalVOC 2010-2012, your custom dataset
    mean_average_precision (mAP@0.5) = 0.591248

    calculation mAP (mean average precision)...
    Detection layer: 30 - type = 27
    Detection layer: 37 - type = 27
    2696
    detections_count = 63890, unique_truth_count = 11004
    class_id = 0, name = person, ap = 60.66% (TP = 5477, FP = 1682)
    for conf_thresh = 0.25, precision = 0.77, recall = 0.50, F1-score = 0.60
    for conf_thresh = 0.25, TP = 5477, FP = 1682, FN = 5527, average IoU = 58.19 %
    IoU threshold = 50 %, used Area-Under-Curve for each unique Recall
    mean average precision (mAP@0.50) = 0.606624, or 60.66 %
    Total Detection Time: 22 Seconds
    Set -points flag:
    `-points 101` for MS COCO
    `-points 11` for PascalVOC 2007 (uncomment `difficult` in voc.data)
    `-points 0` (AUC) for ImageNet, PascalVOC 2010-2012, your custom dataset
    mean_average_precision (mAP@0.5) = 0.606624
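When collecting many such runs, it helps to extract the per-class AP lines programmatically rather than by hand. A small sketch; the regex targets the exact line format shown in the logs above:

```python
import re

# Matches lines like:
#   class_id = 0, name = person, ap = 59.12%   (TP = 5692, FP = 1653)
AP_LINE = re.compile(
    r"class_id = (?P<cid>\d+), name = (?P<name>\w+), ap = (?P<ap>[\d.]+)%"
    r"\s*\(TP = (?P<tp>\d+), FP = (?P<fp>\d+)\)"
)

def parse_ap_line(line):
    """Parse one darknet per-class AP line; return a dict or None if no match."""
    m = AP_LINE.search(line)
    if m is None:
        return None
    d = m.groupdict()
    return {"class_id": int(d["cid"]), "name": d["name"],
            "ap": float(d["ap"]), "tp": int(d["tp"]), "fp": int(d["fp"])}

row = parse_ap_line("class_id = 0, name = person, ap = 59.12% (TP = 5692, FP = 1653)")
print(row)
```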

0820 v4-tiny human pose detection update

    calculation mAP (mean average precision)...
    Detection layer: 30 - type = 27
    Detection layer: 37 - type = 27
    4708
    detections_count = 49418, unique_truth_count = 8563
    class_id = 0, name = stand, ap = 68.94% (TP = 2448, FP = 969)
    class_id = 1, name = sit, ap = 84.11% (TP = 2684, FP = 656)
    class_id = 2, name = squat, ap = 49.37% (TP = 107, FP = 68)
    class_id = 3, name = lie, ap = 32.50% (TP = 60, FP = 55)
    class_id = 4, name = half_lie, ap = 34.15% (TP = 182, FP = 144)
    class_id = 5, name = other_pose, ap = 14.10% (TP = 5, FP = 4)
    for conf_thresh = 0.25, precision = 0.74, recall = 0.64, F1-score = 0.69
    for conf_thresh = 0.25, TP = 5486, FP = 1896, FN = 3077, average IoU = 58.11 %
    IoU threshold = 50 %, used Area-Under-Curve for each unique Recall
    mean average precision (mAP@0.50) = 0.471944, or 47.19 %
    Total Detection Time: 75 Seconds
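The "average IoU" figure in these logs is the mean intersection-over-union between matched predicted and ground-truth boxes. For reference, a minimal IoU implementation for corner-format `(x1, y1, x2, y2)` boxes:

```python
# Intersection-over-union of two axis-aligned boxes in (x1, y1, x2, y2) format.
def iou(a, b):
    # Corners of the intersection rectangle.
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    # Clamp to zero when the boxes do not overlap.
    inter = max(0, ix2 - ix1) * max(0, iy2 - iy1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    return inter / (area_a + area_b - inter)

print(iou((0, 0, 10, 10), (5, 5, 15, 15)))  # 25 / 175 ≈ 0.1429
```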

Appendix 2: Tips for improving YOLOv4 training results (by the author, AlexeyAB)

  1. Before training

    • Set random=1 in the .cfg file; this trains on varying resolutions and improves precision.
    • Increase the network resolution in the .cfg file, e.g. height=608, width=608.
    • Check that every object in the training set is labeled, with no missing or incorrect labels.
    • Is very high loss with very low mAP a training error?
      • Append -show_imgs to the end of the training command and check whether correctly placed bounding boxes appear on the objects (in the window, or in the aug...jpg images).
    • For each object you want to detect, the training set should contain at least one similar object: similar shape, side of the object, relative size, rotation angle, tilt, illumination, etc. So the training set should include images of the objects at different scales, rotations, lightings, sides, and backgrounds; ideally you have 2000 or more distinct images per class and train for 2000*classes iterations or more.
    • It is desirable that the training set also include images with non-labeled objects that you do not want to detect: negative samples without bounding boxes (empty .txt files). Use as many negative-sample images as there are images with objects.
    • The best way to label: mark only the visible part of the object, mark the visible and occluded parts together, or mark the whole object with a small margin; any of these works. What matters is how you want the object to be detected.
    • To detect a large number of objects in one image, add max=200 or higher to the last [yolo] or [region] layer in the .cfg file (YOLOv3 can detect at most 0.0615234375*(width*height) objects globally per image).
    • For small-object detection (smaller than 16*16 after the image is resized to 416*416): in the .cfg, change line 895 to layers = 23, line 892 to stride=4, and line 989 to stride=4. (Why???)
    • To train on both small and large objects at the same time, use a modified model.
    • If the model must distinguish left from right (e.g. left and right hands), disable flip data augmentation by setting flip=0 in the .cfg file.
    • General rule: the training set should contain objects at the same relative sizes as the test set; for every object in the test set, the training set should contain at least one object of the same class at the same relative size. train_network_width * train_obj_width / train_image_width ~= detection_network_width * detection_obj_width / detection_image_width
    • To speed up training (at the cost of accuracy), set stopbackward=1 at layer 136 of the .cfg file.
    • Each variation of an object's shape, side, illumination, and scale, and each 30 degrees of turn or tilt, is a different object from the network's internal point of view. The more distinct objects you want to detect, the more complex the network model should be.
    • To make detected bounding boxes more precise, add three parameters ignore_thresh = .9 iou_normalizer=0.5 iou_loss=giou to each [yolo] layer; this increases mAP@0.9 and decreases mAP@0.5.
    • Don't change the anchors unless you are an expert.
  2. After training: for detection

    • Increase the network resolution in the .cfg; this improves precision and helps detect small objects.
    • There is no need to retrain the network; just keep using the .weights trained at 416*416.
    • To get higher accuracy, raise the model resolution; if Out of memory occurs, change subdivisions in the .cfg to 16, 32, or 64.
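The .cfg options mentioned in the tips above, gathered into one illustrative fragment. This is not a complete config, only the lines the tips touch, with the suggested values:

```ini
[net]
width=608          ; higher input resolution: better precision, small objects
height=608
subdivisions=16    ; raise to 32 or 64 on Out of memory
flip=0             ; disable horizontal flip if left/right must be distinguished

[yolo]
random=1           ; train across multiple resolutions
max=200            ; allow more detections per image in crowded scenes
ignore_thresh = .9 ; tighter boxes: raises mAP@0.9, lowers mAP@0.5
iou_normalizer=0.5
iou_loss=giou
```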