@w460461339 2020-05-20T16:57:31.000000Z  Word count: 3322  Views: 1982

PyTorch-YOLOv3: Single-Machine Multi-GPU Training

MachineLearning


1. A primer on multi-GPU training

https://zhuanlan.zhihu.com/p/72939003
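For orientation, here is a minimal sketch of how `nn.DataParallel` divides a batch before the per-GPU forward passes. It assumes 4 GPUs and simulates the split on CPU with `Tensor.chunk`, which is how DataParallel's scatter step cuts each input tensor along dim 0 when no explicit chunk sizes are given. The helper `simulate_scatter` is hypothetical.

```python
import torch


def simulate_scatter(batch: torch.Tensor, n_gpus: int) -> list:
    """Hypothetical helper: split a batch along dim 0 the way DataParallel's scatter does."""
    return list(batch.chunk(n_gpus, dim=0))


# 8 "images", each tagged with its index in the batch
imgs = torch.arange(8).view(8, 1)
chunks = simulate_scatter(imgs, n_gpus=4)
per_gpu = [chunk.view(-1).tolist() for chunk in chunks]
print(per_gpu)  # [[0, 1], [2, 3], [4, 5], [6, 7]]

# In a real training script the wrapping itself is a single line, e.g.
#   model = torch.nn.DataParallel(model).cuda()
# and every tensor passed to model(...) is split like `imgs` above.
```

Every tensor argument is cut the same way, independently of the others. That is why the targets need the special handling in section 3.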

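2. pytorch-yolov3

Original code:
https://github.com/WuZifan/PyTorch-YOLOv3

Reference 1:
https://github.com/ujsyehao/yolov3-multigpu
This author's version did not work for me. It still raises the error below, but several of his improvements are worth looking at.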

```
noobj_mask[b[i], anchor_ious > ignore_thres, gj[i], gi[i]] = 0  # i is index
RuntimeError: CUDA error: device-side assert triggered
```
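When tracking down a `device-side assert`, it can help to make CUDA kernel launches synchronous so that the Python traceback points at the indexing line that actually failed. This is a minimal sketch, assuming the environment variable is set at the very top of the training script, before PyTorch initializes CUDA.

```python
import os

# Must be set before CUDA is initialized, so do it before importing torch
os.environ["CUDA_LAUNCH_BLOCKING"] = "1"

import torch  # noqa: E402

# With synchronous launches, errors such as "device-side assert triggered"
# are reported at the kernel launch that caused them rather than at a later,
# unrelated CUDA call.
print(os.environ["CUDA_LAUNCH_BLOCKING"])
```

This slows training down, so it is only meant for debugging runs.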

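Reference 2:
https://github.com/zhangyongshun/Training_YOLOv3_with_Multi-GPUs
This author's version runs, and I checked that every GPU is indeed fully utilized, but he does not explain why the changes were made.

Reference 3:
https://github.com/yinghuang/yolov3_pytorch
The notes document ("说明") in this repo explains the reasons for some of the changes, but I have not run it.

From here on, I mainly follow Reference 2 and use the explanations in References 1 and 3 to go through some pitfalls of multi-GPU training in PyTorch.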

3. Code notes

The code is based on

From my testing, these are the points to watch out for in PyTorch:

```python
import random

import torch

# ListDataset.collate_fn in utils/datasets.py; resize() is the helper
# defined in the same module.
def collate_fn(self, batch):
    paths, imgs, targets = list(zip(*batch))
    # Drop images without labels (their targets are None)
    targets = [boxes for boxes in targets if boxes is not None]
    # The padding below makes every image in the batch carry the same number
    # of target rows. Example: with batch=8 and per-image target counts
    # [1,2,3,4,5,6,7,8], padding turns the counts into [8,8,8,8,8,8,8,8].
    # This matters because nn.DataParallel splits every input tensor into
    # equal chunks along dim 0; only if each image contributes the same number
    # of rows does the chunk of targets sent to a GPU line up with the images
    # that GPU receives.
    # Find the image with the most targets in this batch
    max_targets = max(boxes.size(0) for boxes in targets)
    # Pad every image's targets with all-zero rows up to max_targets
    padded_targets = list()
    for i, boxes in enumerate(targets):
        # Column 0 holds the (global) index of the image within the batch
        boxes[:, 0] = i
        absent = max_targets - boxes.size(0)
        if absent > 0:
            # Same dtype/device as boxes; these rows are removed in build_targets()
            boxes = torch.cat((boxes, boxes.new_zeros((absent, boxes.size(1)))), 0)
        padded_targets.append(boxes)
    #########################
    targets = torch.cat(padded_targets, 0)
    # Select a new image size every 10 batches
    if self.multiscale and self.batch_count % 10 == 0:
        self.img_size = random.choice(range(self.min_size, self.max_size + 1, 32))
    # Resize images (already padded to square) to the new size
    imgs = torch.stack([resize(img, self.img_size) for img in imgs])
    self.batch_count += 1
    return paths, imgs, targets
```
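To see why the padding matters, the sketch below simulates the DataParallel split on CPU with `Tensor.chunk`, assuming 2 GPUs and 4 images with 1, 2, 3 and 2 boxes. The helpers `make_targets`, `pad_targets` and `images_per_chunk` are hypothetical; `pad_targets` mirrors the padding step in `collate_fn`.

```python
import torch


def make_targets(counts):
    """Hypothetical helper: one (n, 6) tensor per image, rows = [batch_idx, class, x, y, w, h]."""
    return [
        torch.cat([torch.full((n, 1), float(i)), torch.rand(n, 5) + 0.1], dim=1)
        for i, n in enumerate(counts)
    ]


def pad_targets(targets):
    """Pad every image to the same number of rows with all-zero rows, as collate_fn does."""
    max_targets = max(t.size(0) for t in targets)
    padded = []
    for t in targets:
        absent = max_targets - t.size(0)
        if absent > 0:
            t = torch.cat((t, t.new_zeros((absent, t.size(1)))), 0)
        padded.append(t)
    return torch.cat(padded, 0)


def images_per_chunk(targets, n_gpus):
    """Which image indices end up in each GPU's chunk (zero padding rows ignored)."""
    result = []
    for chunk in targets.chunk(n_gpus, dim=0):
        real = chunk[chunk.sum(dim=1) != 0]
        result.append(sorted(set(real[:, 0].long().tolist())))
    return result


counts = [1, 2, 3, 2]  # boxes per image; images 0,1 go to GPU 0, images 2,3 to GPU 1
unpadded = torch.cat(make_targets(counts), 0)  # 8 rows
padded = pad_targets(make_targets(counts))     # 4 images x 3 rows = 12 rows

print(images_per_chunk(unpadded, n_gpus=2))  # [[0, 1, 2], [2, 3]]: GPU 0 gets a box of image 2
print(images_per_chunk(padded, n_gpus=2))    # [[0, 1], [2, 3]]: aligned with the image split
```

Without padding, GPU 0 would receive a box belonging to image 2, which it never sees and whose index is out of range for its 2-image output tensors.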

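Without further handling, the batch indices in the targets each GPU receives are [0,1], [2,3], [4,5] and [6,7] (batch_size = 8 on 4 GPUs). However, the output tensors that build_targets creates on each GPU only have batch_size/n_gpus entries along the batch dimension. All indices therefore have to be mapped into [0, batch_size/n_gpus), here [0, 1], by taking each index modulo 2. Otherwise the out-of-range index triggers the device-side assert quoted in section 2.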
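A minimal CPU sketch of that mapping, assuming batch_size = 8 on 4 GPUs and looking at the targets GPU 1 receives. On CPU the out-of-range index raises a Python exception; on CUDA the same mistake shows up as the device-side assert.

```python
import torch

batch_size, n_gpus = 8, 4
img_cnt_per_gpu = batch_size // n_gpus  # 2 images per GPU

# Global batch indices found in the targets that GPU 1 receives (images 2 and 3)
b = torch.tensor([2, 2, 3])

# Per-GPU output tensor: dim 0 only has img_cnt_per_gpu entries
obj_mask = torch.zeros(img_cnt_per_gpu, 3, dtype=torch.bool)

try:
    obj_mask[b, 0] = True  # indices 2 and 3 are out of range for size 2
    failed = False
except (IndexError, RuntimeError):
    failed = True

# Map global indices into this GPU's local range
b_local = b % img_cnt_per_gpu
obj_mask[b_local, 0] = True
print(failed, b_local.tolist())  # True [0, 0, 1]
```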

```python
import torch

# bbox_wh_iou() is the helper defined next to build_targets() in utils/utils.py


def build_targets(pred_boxes, pred_cls, target, anchors, ignore_thres):
    ByteTensor = torch.cuda.ByteTensor if pred_boxes.is_cuda else torch.ByteTensor
    FloatTensor = torch.cuda.FloatTensor if pred_boxes.is_cuda else torch.FloatTensor

    nB = pred_boxes.size(0)  # number of images on THIS GPU
    nA = pred_boxes.size(1)  # anchors per grid cell
    nC = pred_cls.size(-1)   # number of classes
    nG = pred_boxes.size(2)  # grid size

    # Output tensors
    obj_mask = ByteTensor(nB, nA, nG, nG).fill_(0)
    noobj_mask = ByteTensor(nB, nA, nG, nG).fill_(1)
    class_mask = FloatTensor(nB, nA, nG, nG).fill_(0)
    iou_scores = FloatTensor(nB, nA, nG, nG).fill_(0)
    tx = FloatTensor(nB, nA, nG, nG).fill_(0)
    ty = FloatTensor(nB, nA, nG, nG).fill_(0)
    tw = FloatTensor(nB, nA, nG, nG).fill_(0)
    th = FloatTensor(nB, nA, nG, nG).fill_(0)
    tcls = FloatTensor(nB, nA, nG, nG, nC).fill_(0)

    # Drop the all-zero rows that collate_fn() added as padding
    target = target[target.sum(dim=1) != 0]

    # Convert to position relative to box
    target_boxes = target[:, 2:6] * nG
    gxy = target_boxes[:, :2]
    gwh = target_boxes[:, 2:]
    # Get anchors with best iou
    ious = torch.stack([bbox_wh_iou(anchor, gwh) for anchor in anchors])
    best_ious, best_n = ious.max(0)
    # Separate target values
    b, target_labels = target[:, :2].long().t()

    # Map the global batch indices (0 ~ batch_size-1) down to this GPU's
    # range (0 ~ img_cnt_per_gpu-1). The original computed
    #   n_gpus = len(n_gpu.split(','))          # n_gpu such as "0,1,2,3"
    #   img_cnt_per_gpu = int(batch_size / n_gpus)
    # from training-script variables that are not in scope here; nB holds the
    # same value as long as batch_size is divisible by the number of GPUs.
    img_cnt_per_gpu = nB
    b = b % img_cnt_per_gpu

    gx, gy = gxy.t()
    gw, gh = gwh.t()
    gi, gj = gxy.long().t()
    # Set masks
    # ......
    # ......
```
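One caveat worth checking with this scheme: both the padding and the modulo mapping assume every batch divides evenly across the GPUs, and the last batch of an epoch often does not. The sketch below simulates that case on CPU with `Tensor.chunk`, assuming a final batch of 6 images, 4 GPUs and `max_targets = 3`. Passing `drop_last=True` to `DataLoader` is one way to avoid the problem.

```python
import torch
from torch.utils.data import DataLoader, TensorDataset

n_gpus, max_targets = 4, 3

# A last batch of 6 images split across 4 GPUs
imgs = torch.arange(6)
img_chunks = [len(c) for c in imgs.chunk(n_gpus)]  # [2, 2, 2]: only 3 GPUs get images
target_rows = torch.arange(6 * max_targets)  # 18 padded target rows
target_chunks = [len(c) for c in target_rows.chunk(n_gpus)]  # [5, 5, 5, 3]: not multiples of 3
print(img_chunks, target_chunks)

# Dropping the incomplete last batch keeps every batch at exactly batch_size
dataset = TensorDataset(torch.arange(10).float())
loader = DataLoader(dataset, batch_size=8, drop_last=True)
batch_lengths = [len(batch[0]) for batch in loader]
print(batch_lengths)  # [8]
```

With 5-row target chunks, image boundaries no longer coincide with chunk boundaries, so some boxes land on the wrong GPU.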

Of course, sum could also be used here.
