@w460461339 2020-05-20T08:57:31.000000Z

PyTorch-YOLOv3: single-machine multi-GPU training

MachineLearning

1. Multi-GPU training primer

https://zhuanlan.zhihu.com/p/72939003

2. pytorch-yolov3

Original code:
https://github.com/WuZifan/PyTorch-YOLOv3

Reference 1:
https://github.com/ujsyehao/yolov3-multigpu
This repo does not work as written; it still fails with the error below. Several of its changes are still worth consulting.

noobj_mask[b[i], anchor_ious > ignore_thres, gj[i], gi[i]] = 0 # i is index
RuntimeError: CUDA error: device-side assert triggered

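Reference 2:
https://github.com/zhangyongshun/Training_YOLOv3_with_Multi-GPUs
This one runs. I verified that every GPU is indeed fully utilized, but the author does not explain why the changes were made.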

Reference 3:
https://github.com/yinghuang/yolov3_pytorch
The "说明" (notes) document in this repo explains the reasons for some of the changes, but I have not run it.

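The rest of this post takes reference 2 as the main basis and uses the notes from references 1 and 3 to look at some of the pitfalls of multi-GPU training in PyTorch.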

3. Code notes

The code is based on ...

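After testing, these are the points to watch out for in PyTorch:

1) Wrap the model: model = nn.DataParallel(model)
2) Within a batch, every image must carry the same number of targets, so pad each image's targets up to the largest count in the batch (utils/datasets.py). nn.DataParallel splits every input tensor along dimension 0, so the targets only split at the same image boundaries as the images if each image contributes the same number of rows.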


    # Needs `import random`, `import torch`, and the repo's `resize` helper.
    def collate_fn(self, batch):
        paths, imgs, targets = list(zip(*batch))
        # Treat images without labels as having zero targets, so that every
        # image still contributes the same number of rows after padding
        # (dropping them would shift the image index written below).
        targets = [boxes if boxes is not None else torch.zeros((0, 6)) for boxes in targets]
        # Make every image in the batch carry the same number of targets.
        # E.g. with batch=8 and per-image counts [1,2,3,4,5,6,7,8],
        # padding turns the counts into [8,8,8,8,8,8,8,8].
        # Largest target count of any image in this batch
        max_targets = max(boxes.size(0) for boxes in targets)
        # Pad every image's targets with all-zero rows up to that count
        padded_targets = []
        for i, boxes in enumerate(targets):
            boxes[:, 0] = i  # column 0 stores the image's index within the batch
            absent = max_targets - boxes.size(0)
            if absent > 0:
                boxes = torch.cat((boxes, torch.zeros((absent, 6))), 0)
            padded_targets.append(boxes)
        targets = torch.cat(padded_targets, 0)
        # Select a new image size every 10 batches
        if self.multiscale and self.batch_count % 10 == 0:
            self.img_size = random.choice(range(self.min_size, self.max_size + 1, 32))
        # Resize the (already pad-to-square) images to the new size
        imgs = torch.stack([resize(img, self.img_size) for img in imgs])
        self.batch_count += 1
        return paths, imgs, targets
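
To see the effect of the padding in isolation, here is a minimal sketch that runs on CPU. `pad_targets` is a hypothetical helper written for this illustration (it is not part of any of the repos above), and `chunk` stands in for the way nn.DataParallel splits a tensor along dimension 0.

```python
import torch


def pad_targets(targets):
    """Pad per-image target tensors of shape (n_i, 6) to a common row count.

    Hypothetical helper for illustration only. Rows are
    (image_index, class, x, y, w, h); padding rows are all zeros.
    """
    max_targets = max(t.size(0) for t in targets)
    padded = []
    for i, boxes in enumerate(targets):
        boxes = boxes.clone()
        boxes[:, 0] = i  # image index within the batch
        absent = max_targets - boxes.size(0)
        if absent > 0:
            boxes = torch.cat((boxes, torch.zeros((absent, 6))), 0)
        padded.append(boxes)
    return torch.cat(padded, 0)


# Two images: the first has 1 target, the second has 3
targets = [torch.rand(1, 6) + 0.1, torch.rand(3, 6) + 0.1]
batch_targets = pad_targets(targets)
print(batch_targets.shape)  # torch.Size([6, 6])

# Splitting along dim 0 into two equal chunks (one per GPU) now gives each
# chunk exactly the rows that belong to its own image
per_gpu = batch_targets.chunk(2, dim=0)
```

Without the padding, the 4 target rows would be split 2 and 2, and the first GPU would receive a target that belongs to the second image.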

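3) When building the targets (labels), remove the empty rows that were added for alignment.
4) Most importantly: nn.DataParallel works in a parameter-server style, so each batch is split evenly across the GPUs. With batch_size=8 and n_gpus=4, each GPU gets two images.

Without any extra handling, the batch ids that arrive on the four GPUs are [0,1], [2,3], [4,5], [6,7].
They all have to be mapped into the range 0 .. batch_size/n_gpus - 1 (here 0 and 1), so every id is taken modulo 2.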

def build_targets(pred_boxes, pred_cls, target, anchors, ignore_thres):
    ByteTensor = torch.cuda.ByteTensor if pred_boxes.is_cuda else torch.ByteTensor
    FloatTensor = torch.cuda.FloatTensor if pred_boxes.is_cuda else torch.FloatTensor
    nB = pred_boxes.size(0)
    nA = pred_boxes.size(1)
    nC = pred_cls.size(-1)
    nG = pred_boxes.size(2)
    # Output tensors
    obj_mask = ByteTensor(nB, nA, nG, nG).fill_(0)
    noobj_mask = ByteTensor(nB, nA, nG, nG).fill_(1)
    class_mask = FloatTensor(nB, nA, nG, nG).fill_(0)
    iou_scores = FloatTensor(nB, nA, nG, nG).fill_(0)
    tx = FloatTensor(nB, nA, nG, nG).fill_(0)
    ty = FloatTensor(nB, nA, nG, nG).fill_(0)
    tw = FloatTensor(nB, nA, nG, nG).fill_(0)
    th = FloatTensor(nB, nA, nG, nG).fill_(0)
    tcls = FloatTensor(nB, nA, nG, nG, nC).fill_(0)
    # Drop the all-zero rows that collate_fn added as padding
    target = target[target.sum(dim=1) != 0]
    # Convert to position relative to box
    target_boxes = target[:, 2:6] * nG
    gxy = target_boxes[:, :2]
    gwh = target_boxes[:, 2:]
    # Get anchors with best iou
    ious = torch.stack([bbox_wh_iou(anchor, gwh) for anchor in anchors])
    best_ious, best_n = ious.max(0)
    # Separate target values
    b, target_labels = target[:, :2].long().t()
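    # Number of GPUs. n_gpu is not a parameter here; it is assumed to be a
    # module-level, comma-separated device-id string such as "0,1,2,3"
    n_gpus = len(n_gpu.split(','))
    # Images per GPU. batch_size is likewise assumed to be a module-level global
    img_cnt_per_gpu = batch_size // n_gpus
    # Map the batch ids seen on this GPU from the global range
    # 0 .. batch_size-1 down to the local range 0 .. img_cnt_per_gpu-1
    b = b % img_cnt_per_gpu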
    gx, gy = gxy.t()
    gw, gh = gwh.t()
    gi, gj = gxy.long().t()
    # Set masks
    ......
    ......
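
The two changes above (dropping the padding rows and taking the batch id modulo the per-GPU image count) can be tried on CPU with a small sketch. The values batch_size=8 and n_gpus=4 and the dummy target rows are assumptions for illustration, and `chunk` again stands in for nn.DataParallel's split along dimension 0. Since each GPU only holds predictions for its own images, a batch id outside the local range indexes past the end of noobj_mask, which is a plausible cause of the device-side assert quoted earlier.

```python
import torch

batch_size, n_gpus = 8, 4  # assumed values for illustration
img_cnt_per_gpu = batch_size // n_gpus

# One real target per image: rows are (image_index, class, x, y, w, h)
targets = torch.zeros(batch_size, 6)
targets[:, 0] = torch.arange(batch_size)
targets[:, 2:] = 0.5

# Interleave one all-zero padding row after each real row
padded = torch.stack((targets, torch.zeros_like(targets)), dim=1).reshape(-1, 6)

results = []
for gpu_id, chunk in enumerate(padded.chunk(n_gpus, dim=0)):
    chunk = chunk[chunk.sum(dim=1) != 0]      # drop the padding rows
    global_ids = chunk[:, 0].long()
    local_ids = global_ids % img_cnt_per_gpu  # index into this GPU's predictions
    results.append((global_ids.tolist(), local_ids.tolist()))
    print(gpu_id, global_ids.tolist(), local_ids.tolist())
# 0 [0, 1] [0, 1]
# 1 [2, 3] [0, 1]
# 2 [4, 5] [0, 1]
# 3 [6, 7] [0, 1]
```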

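5) Previously model.yolo_layers gave direct access to attributes of the custom model. Now that the model is wrapped in nn.DataParallel, use model.module.yolo_layers instead.
6) Loss: with single-GPU training, loss.backward() is enough. With multiple GPUs the returned loss is a tensor with one value per GPU, so backpropagate with loss.mean().backward(). Most other places that use the loss also need loss.mean().

loss.sum() also works here, though the gradients are then n_gpus times larger than with mean.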
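
Points 1), 5) and 6) can be sketched together as follows. `TinyDetector` is a hypothetical stand-in for the real Darknet model, and its `yolo_layers` list only mimics the attribute name. The snippet also runs on a machine without GPUs, where nn.DataParallel simply calls the wrapped module.

```python
import torch
import torch.nn as nn


class TinyDetector(nn.Module):
    """Hypothetical stand-in for the Darknet model; not the real network."""

    def __init__(self):
        super().__init__()
        self.conv = nn.Conv2d(3, 4, kernel_size=3, padding=1)
        # The real model exposes its YOLO heads as `yolo_layers`; a plain
        # list of placeholder modules mimics that attribute here.
        self.yolo_layers = [nn.Identity()]

    def forward(self, x):
        # Return a scalar loss, as each replica does during training
        return self.conv(x).pow(2).mean()


device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
model = nn.DataParallel(TinyDetector().to(device))

# Custom attributes live on the wrapped module, not on the wrapper
yolo_layers = model.module.yolo_layers

imgs = torch.randn(8, 3, 16, 16, device=device)
loss = model(imgs)   # one value per GPU with several GPUs, a scalar otherwise
loss = loss.mean()   # reduce to a scalar before calling backward
loss.backward()
```

Calling mean() on a loss that is already a scalar leaves it unchanged, so the same training loop works with one GPU or several.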
