- Benchmark and Model Zoo
  - Mirror sites
  - Common settings
  - Baselines
    - RPN
    - Faster R-CNN
    - Mask R-CNN
    - Fast R-CNN (with pre-computed proposals)
    - RetinaNet
    - Cascade R-CNN and Cascade Mask R-CNN
    - Hybrid Task Cascade (HTC)
    - SSD
    - Group Normalization (GN)
    - Weight Standardization
    - Deformable Convolution v2
    - CARAFE: Content-Aware ReAssembly of FEatures
    - Instaboost
    - Libra R-CNN
    - Guided Anchoring
    - FCOS
    - FoveaBox
    - RepPoints
    - FreeAnchor
    - Grid R-CNN (plus)
    - GHM
    - GCNet
    - HRNet
    - Mask Scoring R-CNN
    - Train from Scratch
    - NAS-FPN
    - ATSS
    - FSAF
    - RegNetX
    - Res2Net
    - GRoIE
    - Dynamic R-CNN
    - PointRend
    - DetectoRS
    - Generalized Focal Loss
    - CornerNet
    - YOLOv3
  - Other datasets
  - Pre-trained Models
  - Speed benchmark
  - Comparison with Detectron2
    - Hardware
    - Software environment
    - Performance
    - Training Speed
    - Inference Speed
    - Training memory
Benchmark and Model Zoo
Mirror sites
We use AWS as the main site to host our model zoo and maintain a mirror on aliyun. You can replace https://s3.ap-northeast-2.amazonaws.com/open-mmlab with https://open-mmlab.oss-cn-beijing.aliyuncs.com in model URLs.
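Switching an existing model URL to the mirror is just a prefix swap. A minimal sketch in Python follows; the checkpoint path used in the example is a made-up placeholder, not a real file.

```python
# Sketch: rewrite an open-mmlab model URL to point at the aliyun mirror.
AWS_PREFIX = 'https://s3.ap-northeast-2.amazonaws.com/open-mmlab'
MIRROR_PREFIX = 'https://open-mmlab.oss-cn-beijing.aliyuncs.com'


def to_mirror(url: str) -> str:
    """Replace the AWS host prefix with the aliyun mirror prefix."""
    return url.replace(AWS_PREFIX, MIRROR_PREFIX, 1)


# Example usage with a hypothetical checkpoint path:
print(to_mirror(AWS_PREFIX + '/mmdetection/example/checkpoint.pth'))
```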
Common settings
- All models were trained on `coco_2017_train` and tested on `coco_2017_val`.
- We use distributed training.
- All PyTorch-style pretrained backbones on ImageNet are from the PyTorch model zoo; Caffe-style pretrained backbones are converted from the newly released models of Detectron2.
- For fair comparison with other codebases, we report the GPU memory as the maximum value of `torch.cuda.max_memory_allocated()` over all 8 GPUs (see the sketch after this list). Note that this value is usually less than what `nvidia-smi` shows.
- We report the inference time as the total time of network forwarding and post-processing, excluding the data loading time. Results are obtained with the script benchmark.py, which computes the average time on 2000 images.
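A minimal sketch of how the reported memory number can be reproduced, assuming a single node with 8 visible GPUs; this is not the exact logging code used by mmdetection.

```python
# Sketch (assumption: single node, 8 visible GPUs): the reported figure is the
# maximum of torch.cuda.max_memory_allocated() over all GPUs, in GB.
import torch


def peak_allocated_memory_gb(num_gpus: int = 8) -> float:
    """Return the peak allocated memory (GB) over all visible GPUs."""
    peaks = [torch.cuda.max_memory_allocated(device=i) for i in range(num_gpus)]
    return max(peaks) / 1024**3


# Call this after training finishes, e.g.:
# print(f'max memory: {peak_allocated_memory_gb():.1f} GB')
```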
Baselines
RPN
Please refer to RPN for details.
Faster R-CNN
Please refer to Faster R-CNN for details.
Mask R-CNN
Please refer to Mask R-CNN for details.
Fast R-CNN (with pre-computed proposals)
Please refer to Fast R-CNN for details.
RetinaNet
Please refer to RetinaNet for details.
Cascade R-CNN and Cascade Mask R-CNN
Please refer to Cascade R-CNN for details.
Hybrid Task Cascade (HTC)
Please refer to HTC for details.
SSD
Please refer to SSD for details.
Group Normalization (GN)
Please refer to Group Normalization for details.
Weight Standardization
Please refer to Weight Standardization for details.
Deformable Convolution v2
Please refer to Deformable Convolutional Networks for details.
CARAFE: Content-Aware ReAssembly of FEatures
Please refer to CARAFE for details.
Instaboost
Please refer to Instaboost for details.
Libra R-CNN
Please refer to Libra R-CNN for details.
Guided Anchoring
Please refer to Guided Anchoring for details.
FCOS
Please refer to FCOS for details.
FoveaBox
Please refer to FoveaBox for details.
RepPoints
Please refer to RepPoints for details.
FreeAnchor
Please refer to FreeAnchor for details.
Grid R-CNN (plus)
Please refer to Grid R-CNN for details.
GHM
Please refer to GHM for details.
GCNet
Please refer to GCNet for details.
HRNet
Please refer to HRNet for details.
Mask Scoring R-CNN
Please refer to Mask Scoring R-CNN for details.
Train from Scratch
Please refer to Rethinking ImageNet Pre-training for details.
NAS-FPN
Please refer to NAS-FPN for details.
ATSS
Please refer to ATSS for details.
FSAF
Please refer to FSAF for details.
RegNetX
Please refer to RegNet for details.
Res2Net
Please refer to Res2Net for details.
GRoIE
Please refer to GRoIE for details.
Dynamic R-CNN
Please refer to Dynamic R-CNN for details.
PointRend
Please refer to PointRend for details.
DetectoRS
Please refer to DetectoRS for details.
Generalized Focal Loss
Please refer to Generalized Focal Loss for details.
CornerNet
Please refer to CornerNet for details.
YOLOv3
Please refer to YOLOv3 for details.
Other datasets
We also benchmark some methods on PASCAL VOC, Cityscapes and WIDER FACE.
Pre-trained Models
We also train Faster R-CNN and Mask R-CNN using ResNet-50 and RegNetX-3.2G with multi-scale training and longer schedules. For convenience, these models can serve as strong pre-trained models for downstream tasks.
Speed benchmark
We compare the training speed of Mask R-CNN with some other popular frameworks (the data is copied from Detectron2). For mmdetection, we benchmark with mask_rcnn_r50_caffe_fpn_poly_1x_coco_v1.py, which should have the same setting as mask_rcnn_R_50_FPN_noaug_1x.yaml of Detectron2. We also provide the checkpoint and training log for reference. The throughput is computed as the average throughput in iterations 100-500 to skip GPU warmup time.
Implementation | Throughput (img/s)
---|---
Detectron2 | 62
MMDetection | 61
maskrcnn-benchmark | 53
tensorpack | 50
simpledet | 39
Detectron | 19
matterport/Mask_RCNN | 14
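The warmup-skipping average described above can be reproduced with a short sketch; `iter_times` and `batch_size` are hypothetical inputs, not values emitted by mmdetection itself.

```python
# Sketch: derive throughput (img/s) from per-iteration timings, averaging only
# iterations 100-500 so that GPU warmup is excluded. Inputs are hypothetical.
def throughput(iter_times, batch_size, start=100, end=500):
    """Average img/s over iterations [start, end), ignoring warmup."""
    window = iter_times[start:end]
    if not window:
        raise ValueError('not enough iterations to skip the warmup window')
    avg_iter_time = sum(window) / len(window)
    return batch_size / avg_iter_time


# e.g. throughput(times_in_seconds, batch_size=16)  # 2 imgs/GPU on 8 GPUs
```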
Comparison with Detectron2
We compare mmdetection with Detectron2 in terms of speed and performance. We use the commit id 185c27e (30/4/2020) of Detectron2. For a fair comparison, we install and run both frameworks on the same machine.
Hardware
- 8 NVIDIA Tesla V100 (32G) GPUs
- Intel(R) Xeon(R) Gold 6148 CPU @ 2.40GHz
Software environment
- Python 3.7
- PyTorch 1.4
- CUDA 10.1
- CUDNN 7.6.03
- NCCL 2.4.08
Performance
Type | Lr schd | Detectron2 | mmdetection | Download
---|---|---|---|---
Faster R-CNN | 1x | 37.9 | 38.0 | model / log
Mask R-CNN | 1x | 38.6 & 35.2 | 38.8 & 35.4 | model / log
RetinaNet | 1x | 36.5 | 37.0 | model / log
Training Speed
The training speed is measured in s/iter; the lower, the better.
Type | Detectron2 | mmdetection
---|---|---
Faster R-CNN | 0.210 | 0.216
Mask R-CNN | 0.261 | 0.265
RetinaNet | 0.200 | 0.205
Inference Speed
The inference speed is measured in fps (img/s) on a single GPU; the higher, the better. To be consistent with Detectron2, we report the pure inference speed (without the time of data loading). For Mask R-CNN, we exclude the time of RLE encoding in post-processing. We also include the officially reported speed in parentheses, which is slightly higher than the results tested on our server due to hardware differences.
Type | Detectron2 | mmdetection
---|---|---
Faster R-CNN | 25.6 (26.3) | 22.2
Mask R-CNN | 22.5 (23.3) | 19.6
RetinaNet | 17.8 (18.2) | 20.6
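A rough sketch of the pure-inference timing described above; `run_forward` is a hypothetical zero-argument callable wrapping a single forward pass on pre-loaded data (so data loading is excluded), and the RLE-encoding exclusion for Mask R-CNN is not modeled here.

```python
# Sketch: time only the forward pass on a single GPU and report img/s.
# torch.cuda.synchronize() ensures CUDA work is finished before timing stops.
import time

import torch


@torch.no_grad()
def pure_inference_fps(run_forward, num_iters=200, warmup=10, batch_size=1):
    """Return img/s for `run_forward`, skipping the first `warmup` iterations."""
    elapsed, counted = 0.0, 0
    for i in range(num_iters):
        torch.cuda.synchronize()
        start = time.perf_counter()
        run_forward()                       # one forward pass, data pre-loaded
        torch.cuda.synchronize()
        if i >= warmup:                     # discard GPU warmup iterations
            elapsed += time.perf_counter() - start
            counted += batch_size
    return counted / elapsed
```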
Training memory
The training memory is reported in GB; the lower, the better.
Type | Detectron2 | mmdetection
---|---|---
Faster R-CNN | 3.0 | 3.8
Mask R-CNN | 3.4 | 3.9
RetinaNet | 3.9 | 3.4