@wanghuijiao
2021-09-22T16:40:36.000000Z
Technical documentation
/ssd01/wanghuijiao/pose_detector02
Environment setup (a Linux system is assumed)

Build with the `make` command. The following Makefile options can be set:

- `GPU=1`: build with CUDA (CUDA is expected under /usr/local/cuda).
- `CUDNN=1`: build with cuDNN v5-v7 (cuDNN is expected under /usr/local/cudnn).
- `CUDNN_HALF=1`: use half precision for training and detection; on Titan V / Tesla V100 / DGX-2 or newer hardware this gives about a 3x speedup for detection and 2x for training.
- `OPENCV=1`: build with OpenCV 4.x/3.x/2.4.x so that video files and webcams can be used at test time.
- `DEBUG=1`: build a debug version of YOLO.
- `OPENMP=1`: use OpenMP to accelerate on multi-core CPUs.
- `LIBSO=1`: build darknet.so and an executable that uses this library.

Download the pre-trained model
# Test on a single image
./darknet detector test ./cfg/coco.data ./cfg/yolov4.cfg ./yolov4.weights data/dog.jpg -i 0 -thresh 0.25
# Run the demo on a video file
./darknet detector demo ./cfg/coco.data ./cfg/yolov4.cfg ./yolov4.weights test50.mp4 -i 0 -thresh 0.25
# Run the demo on an RTSP stream
./darknet detector demo ./cfg/coco.data ./cfg/yolov4.cfg ./yolov4.weights rtsp://admin:admin12345@192.168.0.228:554 -i 0 -thresh 0.25
Dataset location: 10.0.10.56:/ssd01/wanghuijiao/dataset/manfu_human/v1.0
Label format

Each line of a label file has the form <object-class> <x_center> <y_center> <width> <height>; one line describes one bbox.

- <object-class> is the class index, ranging from 0 to classes-1.
- <x_center> <y_center> <width> <height> are the center coordinates and the width/height of the bbox, all floating-point values relative to the image width and height, computed as e.g. <x> = <absolute_x> / <image_width> or <height> = <absolute_height> / <image_height>.

For example, close_2m_stand_coldlight_longsleeve_backshot_withoutshelter_CJ67_P0964.mp4#t=1.966667.txt contains:
0 0.41476345840130513 0.4560756579689358 0.20309951060358902 0.9121513159378716
0 0.4942903752039152 0.3217391304347827 0.07014681892332793 0.6434782608695654
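The normalization above can be sketched as a small helper. The function name and the absolute (x_min, y_min, w, h) input convention here are illustrative assumptions, not part of the original tooling:

```python
def to_yolo(box, img_w, img_h):
    """Convert an absolute pixel box (x_min, y_min, w, h) into YOLO's
    normalized (x_center, y_center, width, height) fractions."""
    x_min, y_min, w, h = box
    return ((x_min + w / 2) / img_w,  # <x> = <absolute_x> / <image_width>
            (y_min + h / 2) / img_h,
            w / img_w,                # <width> = <absolute_width> / <image_width>
            h / img_h)

# A 100x100 box at the top-left corner of a 200x200 image:
print(to_yolo((0, 0, 100, 100), 200, 200))  # (0.25, 0.25, 0.5, 0.5)
```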
Configuration files

Model config: darknet/cfg/yolov4_pose.cfg. YOLOv4 defines the network structure in a .cfg file; these files normally live under the darknet/cfg directory.

Data config: prepare darknet/cfg/pose.data. YOLOv4 records dataset information in a .data file; an example darknet/cfg/pose.data looks like this:
classes = 6
train = /ssd01/wanghuijiao/dataset/manfu_human/v1.0/train_v1.0.txt
valid = /ssd01/wanghuijiao/dataset/manfu_human/v1.0/test_v1.0.txt
names = /ssd01/wanghuijiao/pose_detector02/data/pose.names
backup = ./backup_manfu_human_v1.0_512
eval = coco
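A .data file is just key = value lines, so it is easy to read programmatically. A minimal sketch (the function name parse_data_cfg is an assumption here, not darknet's API):

```python
def parse_data_cfg(text):
    """Parse the 'key = value' lines of a darknet .data file into a dict."""
    cfg = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        key, _, value = line.partition("=")
        cfg[key.strip()] = value.strip()
    return cfg

example = """\
classes = 6
names = /ssd01/wanghuijiao/pose_detector02/data/pose.names
eval = coco
"""
print(parse_data_cfg(example)["classes"])  # 6
```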
Names file: darknet/data/pose.names records the label names of the dataset. In this experiment darknet/data/pose.names contains:
stand
sit
squat
lie
half_lie
other_pose
Config files used in this experiment:
/ssd01/wanghuijiao/pose_detector02/cfg/yolov4-tiny_human_416.cfg
/ssd01/wanghuijiao/pose_detector02/cfg/crowdhuman-416.data
/ssd01/wanghuijiao/pose_detector02/data/crowdhuman.names
/ssd01/wanghuijiao/pose_detector02/cfg/yolov4-tiny_human_pose_416.cfg
/ssd01/wanghuijiao/pose_detector02/cfg/pose.data
/ssd01/wanghuijiao/pose_detector02/data/pose.names
Training command

Download csdarknet53-omega.conv.105 from 10.0.10.56:/ssd01/wanghuijiao/pose_detector02/csdarknet53-omega.conv.105
./darknet detector train \
cfg/pose_b.data \                # data config file
cfg/pose_yolov4.cfg \            # model config file
./csdarknet53-omega.conv.105 \   # ImageNet-pretrained weights
-dont_show \
-map \
-gpus 4,5,6,7
Tuning tips

- Learning rate: for multi-GPU training use 0.00065 (roughly the single-GPU rate divided by the number of GPUs); on a single GPU keep the default 0.00261.
- Mosaic augmentation: add mosaic=1 to the parameter block of the .cfg (around line 24); comment the line out to turn it off. cutmix=1 is enabled the same way.
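The learning-rate rule above (divide the single-GPU rate by the GPU count) is easy to check. This helper is a sketch of the rule of thumb, not darknet code:

```python
def multi_gpu_lr(base_lr, num_gpus):
    """Scale the single-GPU learning rate down by the number of GPUs."""
    return base_lr / num_gpus

# The default 0.00261 spread over 4 GPUs lands near the 0.00065 used here.
print(round(multi_gpu_lr(0.00261, 4), 5))  # 0.00065
```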
# 1. First train for 6000 batches on COCO person; the pretrained weights are the official yolov4-tiny.conv.29 (presumably trained on the 80 COCO classes)
./darknet detector train \
cfg/yolov4-tiny_cocohuman_416.data \
cfg/yolov4-tiny_human_416.cfg \
yolov4-tiny.conv.29 \
-dont_show \
-map \
-gpus 4,5,6,7
# 2. Then continue training for 14000 batches on the person class of CrowdHuman
./darknet detector train \
cfg/crowdhuman-416.data \
cfg/yolov4-tiny_human_416.cfg \
backup/yolov4-tiny_cocohuman_416/yolov4-tiny_human_416_best.weights \ # weights from 6000 batches of pretraining on COCO person
-dont_show \
-map \
-gpus 4,5,6,7 \
# -clear restarts training from batch 0; omit it to continue from batch 6000

# 3. Finally, fine-tune on the pose dataset
./darknet detector train \
cfg/pose.data \
cfg/yolov4-tiny_human_pose_416.cfg \
backup/yolov4-tiny_manfu_humanPose_416/yolov4-tiny_human_pose_416_best.weights \
-dont_show \
-map \
-gpus 4,5,6,7 \
-clear
Performance

Experiment / branch | Settings | Accuracy |
---|---|---|
0817 yolov4_tiny_crowdhuman_416 | IOU_th=0.25; w=416, h=416; pretrained model: yolov4-tiny.conv.29 | person ap = 59.12% |
Weights: /ssd01/wanghuijiao/pose_detector02/backup/yolov4_tiny_crowdhuman_416/yolov4-tiny_human_416_best.weights
Model performance analysis & demo

Training curves: mAP, loss and other metrics over training (only the last 20000 of the 40000 training batches were saved)

Performance

Experiment / branch | Settings | Accuracy |
---|---|---|
0820 yolov4-tiny_manfu_humanPose_416 | IOU_th=0.25; w=416, h=416; test set: test_v1.0.txt; pretrained model: v4-tiny human detector | stand ap = 68.94%; sit ap = 84.11%; squat ap = 49.37%; lie ap = 32.50%; half_lie ap = 34.15%; other_pose ap = 14.10%; mAP = 47.19% |
Weights: /ssd01/wanghuijiao/pose_detector02/backup/yolov4-tiny_manfu_humanPose_416/yolov4-tiny_human_pose_416_best.weights
# test_v1.0.txt
# weights: ./backup_manfu_human_v1.0_512/pose_yolov4_best.weights
class_id = 0, name = stand, ap = 64.18% (TP = 2450, FP = 1119)
class_id = 1, name = sit, ap = 72.67% (TP = 2426, FP = 924)
class_id = 2, name = squat, ap = 33.98% (TP = 84, FP = 96)
class_id = 3, name = lie, ap = 23.06% (TP = 10, FP = 5)
class_id = 4, name = half_lie, ap = 33.38% (TP = 2, FP = 2)
class_id = 5, name = other_pose, ap = 1.21% (TP = 0, FP = 0)
for conf_thresh = 0.25, precision = 0.70, recall = 0.58, F1-score = 0.63
for conf_thresh = 0.25, TP = 4972, FP = 2146, FN = 3591, average IoU = 48.36 %
IoU threshold = 50 %, used Area-Under-Curve for each unique Recall
mean average precision (mAP@0.50) = 0.380806, or 38.08 %
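The precision/recall/F1 lines in these logs follow directly from the TP/FP/FN counts. As a sanity check (plain metric definitions, not darknet code), the run above with TP=4972, FP=2146, FN=3591 reproduces 0.70/0.58/0.63:

```python
def prf1(tp, fp, fn):
    """Precision, recall and F1 from raw detection counts."""
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    f1 = 2 * precision * recall / (precision + recall)
    return precision, recall, f1

p, r, f1 = prf1(tp=4972, fp=2146, fn=3591)
print(round(p, 2), round(r, 2), round(f1, 2))  # 0.7 0.58 0.63
```

Note also that TP + FN = 4972 + 3591 = 8563, which matches the unique_truth_count reported for this test set.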
# test_v1.0.txt
# weights: ./backup_manfu_human_v1.0_b_512/pose_yolov4_b_best.weights
class_id = 0, name = stand, ap = 68.38% (TP = 536, FP = 238)
class_id = 1, name = sit, ap = 79.06% (TP = 429, FP = 143)
class_id = 2, name = squat, ap = 63.94% (TP = 125, FP = 36)
class_id = 3, name = lie, ap = 55.34% (TP = 132, FP = 78)
class_id = 4, name = half_lie, ap = 30.82% (TP = 58, FP = 47)
class_id = 5, name = other_pose, ap = 34.12% (TP = 15, FP = 6)
for conf_thresh = 0.25, precision = 0.70, recall = 0.58, F1-score = 0.64
for conf_thresh = 0.25, TP = 1295, FP = 548, FN = 919, average IoU = 53.73 %
IoU threshold = 50 %, used Area-Under-Curve for each unique Recall
mean average precision (mAP@0.50) = 0.552780, or 55.28 %
calculation mAP (mean average precision)...
Detection layer: 139 - type = 27
Detection layer: 150 - type = 27
Detection layer: 161 - type = 27
4708
detections_count = 13112, unique_truth_count = 8563
class_id = 0, name = stand, ap = 73.59% (TP = 2901, FP = 856)
class_id = 1, name = sit, ap = 86.05% (TP = 2810, FP = 462)
class_id = 2, name = squat, ap = 57.60% (TP = 138, FP = 96)
class_id = 3, name = lie, ap = 38.77% (TP = 126, FP = 105)
class_id = 4, name = half_lie, ap = 44.31% (TP = 513, FP = 330)
class_id = 5, name = other_pose, ap = 39.18% (TP = 25, FP = 17)
for conf_thresh = 0.25, precision = 0.78, recall = 0.76, F1-score = 0.77
for conf_thresh = 0.25, TP = 6513, FP = 1866, FN = 2050, average IoU = 63.42 %
IoU threshold = 50 %, used Area-Under-Curve for each unique Recall
mean average precision (mAP@0.50) = 0.565842, or 56.58 %
Total Detection Time: 90 Seconds
Set -points flag:
`-points 101` for MS COCO
`-points 11` for PascalVOC 2007 (uncomment `difficult` in voc.data)
`-points 0` (AUC) for ImageNet, PascalVOC 2010-2012, your custom dataset
mean_average_precision (mAP@0.5) = 0.565842
Saving weights to ./backup_manfu_human_v1.0_512/pose_yolov4_120000.weights
1036
detections_count = 2797, unique_truth_count = 2214
class_id = 0, name = stand, ap = 68.25% (TP = 530, FP = 140)
class_id = 1, name = sit, ap = 80.36% (TP = 457, FP = 104)
class_id = 2, name = squat, ap = 63.12% (TP = 148, FP = 38)
class_id = 3, name = lie, ap = 61.53% (TP = 164, FP = 50)
class_id = 4, name = half_lie, ap = 33.34% (TP = 133, FP = 114)
class_id = 5, name = other_pose, ap = 54.09% (TP = 35, FP = 16)
for conf_thresh = 0.25, precision = 0.76, recall = 0.66, F1-score = 0.71
for conf_thresh = 0.25, TP = 1467, FP = 462, FN = 747, average IoU = 61.19 %
IoU threshold = 50 %, used Area-Under-Curve for each unique Recall
mean average precision (mAP@0.50) = 0.601145, or 60.11 %
Total Detection Time: 19 Seconds
Set -points flag:
`-points 101` for MS COCO
`-points 11` for PascalVOC 2007 (uncomment `difficult` in voc.data)
`-points 0` (AUC) for ImageNet, PascalVOC 2010-2012, your custom dataset
mean_average_precision (mAP@0.5) = 0.601145
Saving weights to ./backup_manfu_human_v1.0_b_512/pose_yolov4_b_120000.weights
calculation mAP (mean average precision)...
Detection layer: 30 - type = 27
Detection layer: 37 - type = 27
2696
detections_count = 57081, unique_truth_count = 11004
class_id = 0, name = person, ap = 59.12% (TP = 5692, FP = 1653)
for conf_thresh = 0.25, precision = 0.77, recall = 0.52, F1-score = 0.62
for conf_thresh = 0.25, TP = 5692, FP = 1653, FN = 5312, average IoU = 58.60 %
IoU threshold = 50 %, used Area-Under-Curve for each unique Recall
mean average precision (mAP@0.50) = 0.591248, or 59.12 %
Total Detection Time: 20 Seconds
Set -points flag:
`-points 101` for MS COCO
`-points 11` for PascalVOC 2007 (uncomment `difficult` in voc.data)
`-points 0` (AUC) for ImageNet, PascalVOC 2010-2012, your custom dataset
mean_average_precision (mAP@0.5) = 0.591248
calculation mAP (mean average precision)...
Detection layer: 30 - type = 27
Detection layer: 37 - type = 27
2696
detections_count = 63890, unique_truth_count = 11004
class_id = 0, name = person, ap = 60.66% (TP = 5477, FP = 1682)
for conf_thresh = 0.25, precision = 0.77, recall = 0.50, F1-score = 0.60
for conf_thresh = 0.25, TP = 5477, FP = 1682, FN = 5527, average IoU = 58.19 %
IoU threshold = 50 %, used Area-Under-Curve for each unique Recall
mean average precision (mAP@0.50) = 0.606624, or 60.66 %
Total Detection Time: 22 Seconds
Set -points flag:
`-points 101` for MS COCO
`-points 11` for PascalVOC 2007 (uncomment `difficult` in voc.data)
`-points 0` (AUC) for ImageNet, PascalVOC 2010-2012, your custom dataset
mean_average_precision (mAP@0.5) = 0.606624
/ssd01/wanghuijiao/pose_detector02/backup/yolov4-tiny_manfu_humanPose_416/yolov4-tiny_human_pose_416_best.weights
calculation mAP (mean average precision)...
Detection layer: 30 - type = 27
Detection layer: 37 - type = 27
4708
detections_count = 49418, unique_truth_count = 8563
class_id = 0, name = stand, ap = 68.94% (TP = 2448, FP = 969)
class_id = 1, name = sit, ap = 84.11% (TP = 2684, FP = 656)
class_id = 2, name = squat, ap = 49.37% (TP = 107, FP = 68)
class_id = 3, name = lie, ap = 32.50% (TP = 60, FP = 55)
class_id = 4, name = half_lie, ap = 34.15% (TP = 182, FP = 144)
class_id = 5, name = other_pose, ap = 14.10% (TP = 5, FP = 4)
for conf_thresh = 0.25, precision = 0.74, recall = 0.64, F1-score = 0.69
for conf_thresh = 0.25, TP = 5486, FP = 1896, FN = 3077, average IoU = 58.11 %
IoU threshold = 50 %, used Area-Under-Curve for each unique Recall
mean average precision (mAP@0.50) = 0.471944, or 47.19 %
Total Detection Time: 75 Seconds
Before training

- Set random=1 in the .cfg file; this helps precision across different resolutions.
- Increase the model resolution in the .cfg file, e.g. height=608, width=608.
- Train with -show_imgs and check that correct bboxes are drawn on the objects (in the windows, or in the aug...jpg images).
- To detect many objects per image, add max=200 to the last [yolo] or [region] layer of the .cfg (YOLOv3 can detect at most 0.0615234375*(width*height) objects globally per image).
- For small objects, change line 895 of the .cfg to layers = 23, line 892 to stride=4, and line 989 to stride=4. (Why???)
- Set flip=0 in the .cfg to turn off left-right flip augmentation.
- Keep relative object sizes consistent between training and detection: train_network_width * train_obj_width / train_image_width ~= detection_network_width * detection_obj_width / detection_image_width.
- Set stopbackward=1 at layer 136 of the .cfg (freezes the earlier layers to speed up training).
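The global detection cap quoted above is easy to evaluate for a given input size. A quick check (the constant is the one from the note, assumed to apply per image):

```python
def max_detections(width, height):
    """Upper bound on objects YOLOv3 can detect per image, per the note above."""
    return int(0.0615234375 * width * height)

print(max_detections(416, 416))  # 10647
print(max_detections(608, 608))  # 22743
```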
After training - for detection

- Increase the network resolution in the .cfg file; this improves precision and helps detect small objects.
- There is no need to retrain: just use the same .weights file.
- If you get an Out of memory error, increase subdivisions in the .cfg file to 16, 32 or 64.