## Mirror sites
We use AWS as the main site to host our model zoo, and maintain a mirror on aliyun.
You can replace `` with `` in model urls.
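Switching a model URL to the mirror amounts to a prefix swap. The sketch below is only illustrative: the function name `to_mirror_url` is hypothetical, and the two prefixes are passed in as arguments because they are not shown here. The `example.com` addresses are placeholders, not the real sites.

```python
def to_mirror_url(url: str, main_prefix: str, mirror_prefix: str) -> str:
    """Swap the main-site prefix of a model URL for the mirror prefix."""
    if not url.startswith(main_prefix):
        # Leave URLs that do not point at the main site untouched.
        return url
    return mirror_prefix + url[len(main_prefix):]


# Placeholder prefixes (assumptions, not the real hosts).
MAIN = "https://main.example.com"
MIRROR = "https://mirror.example.com"

print(to_mirror_url(MAIN + "/models/some_model.pth", MAIN, MIRROR))
```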
- All FPN baselines and RPN-C4 baselines were trained using 8 GPUs with a batch size of 16 (2 images per GPU). Other C4 baselines were trained using 8 GPUs with a batch size of 8 (1 image per GPU).
- All models were trained on `coco_2017_train` and tested on `coco_2017_val`.
- We use distributed training, and BN layer statistics are fixed.
- We adopt the same training schedules as Detectron. 1x indicates 12 epochs and 2x indicates 24 epochs, which corresponds to slightly fewer iterations than Detectron; the difference is negligible.
- All pytorch-style backbones pretrained on ImageNet are from the PyTorch model zoo.
- For fair comparison with other codebases, we report the GPU memory as the maximum value of `torch.cuda.max_memory_allocated()` across all 8 GPUs. Note that this value is usually less than what `nvidia-smi` shows.
- We report the inference time as the overall time, including data loading, network forwarding and post-processing.
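The arithmetic behind these settings can be sketched in a few lines. The helper names are hypothetical, and two figures are assumptions that do not come from this page: 118,287 images in the COCO 2017 training split, and 90,000 iterations for Detectron's 1x schedule at batch size 16. The memory helper only shows the "maximum across GPUs" reduction; in practice each value would come from `torch.cuda.max_memory_allocated()` on one device.

```python
import math


def total_batch_size(num_gpus: int, imgs_per_gpu: int) -> int:
    """Effective batch size when every GPU processes imgs_per_gpu images."""
    return num_gpus * imgs_per_gpu


def iterations_for_schedule(num_images: int, batch_size: int, epochs: int) -> int:
    """Iterations needed to run `epochs` full passes over the dataset."""
    return math.ceil(num_images / batch_size) * epochs


def reported_memory_mb(per_gpu_peak_bytes: list) -> int:
    """Largest per-GPU peak allocation, converted from bytes to whole MB."""
    return max(per_gpu_peak_bytes) // (1024 * 1024)


COCO_TRAIN_IMAGES = 118_287   # assumption: size of the COCO 2017 train split
DETECTRON_1X_ITERS = 90_000   # assumption: Detectron 1x schedule at batch size 16

batch = total_batch_size(num_gpus=8, imgs_per_gpu=2)           # 16
iters = iterations_for_schedule(COCO_TRAIN_IMAGES, batch, 12)  # 88716
print(batch, iters, iters < DETECTRON_1X_ITERS)
```

Under these assumptions, 12 epochs come out about 1.4% short of 90,000 iterations, which is consistent with the "slightly fewer iterations" remark.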
Please refer to [RPN]( for details.
Please refer to [Faster R-CNN]( for details.
Please refer to [Mask R-CNN]( for details.
Please refer to [Fast R-CNN]( for details.
Please refer to [RetinaNet]( for details.
Please refer to [Cascade R-CNN]( for details.
### Hybrid Task Cascade (HTC)
Please refer to [HTC]( for details.
Please refer to [SSD]( for details.
Please refer to [Group Normalization]( for details.
Please refer to [Weight Standardization]( for details.
Please refer to [Deformable Convolutional Networks]( for details.
### CARAFE: Content-Aware ReAssembly of FEatures
Please refer to [CARAFE]( for details.
### Instaboost
Please refer to [Instaboost]( for details.
Jiangmiao Pang
### Libra R-CNN
Please refer to [Libra R-CNN]( for details.
### Guided Anchoring
Please refer to [Guided Anchoring]( for details.
Please refer to [FCOS]( for details.
Please refer to [FoveaBox]( for details.
Please refer to [RepPoints]( for details.
Please refer to [FreeAnchor]( for details.
Please refer to [Grid R-CNN]( for details.
Please refer to [GHM]( for details.
Please refer to [GCNet]( for details.
Please refer to [HRNet]( for details.
Please refer to [Mask Scoring R-CNN]( for details.
Please refer to [Rethinking ImageNet Pre-training]( for details.
Please refer to [NAS-FPN]( for details.
### ATSS
Please refer to [ATSS]( for details.
We also benchmark some methods on [PASCAL VOC](, [Cityscapes]( and [WIDER FACE](
## Speed benchmark
We compare the training speed of Mask R-CNN with some other popular frameworks (The data is copied from [detectron2](
| Implementation | Throughput (img/s) |
| --- | --- |
| [Detectron2]( | 61 |
| [MMDetection]( | 60 |
| [maskrcnn-benchmark]( | 51 |
| [tensorpack]( | 50 |
| [simpledet]( | 39 |
| [Detectron]( | 19 |
| [matterport/Mask_RCNN]( | 14 |
We compare MMDetection with [Detectron2]( in terms of speed and performance.
We use the commit id [185c27e]( of Detectron2.
For fair comparison, we install and run both frameworks on the same machine.
- 8 NVIDIA Tesla V100 (32G) GPUs
- Intel(R) Xeon(R) Gold 6148 CPU @ 2.40GHz
- Python 3.7
- PyTorch 1.4
- CUDA 10.1
- CUDNN 7.6.03
- NCCL 2.4.08
<td rowspan="2">Faster R-CNN</td>
<td>38.6 & 35.2</td>
<td>38.8 & 35.4</td>
<td>41.0 & 37.2 </td>
The training speed is measured in s/iter; lower is better.
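A time-per-iteration figure can be related to throughput in img/s by dividing the total batch size by the iteration time. The snippet below is a minimal sketch of that conversion. The function name is hypothetical and the numbers in the usage line are made up for illustration; they are not measurements from this benchmark.

```python
def images_per_second(sec_per_iter: float, total_batch_size: int) -> float:
    """Convert a training speed in s/iter into throughput in img/s."""
    if sec_per_iter <= 0:
        raise ValueError("sec_per_iter must be positive")
    return total_batch_size / sec_per_iter


# Illustrative values only: 0.25 s/iter at a total batch size of 16.
print(images_per_second(0.25, 16))  # 64.0
```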
The inference speed is measured in fps (img/s) on a single GPU; higher is better.
To be consistent with Detectron2, we report the pure inference speed (without the time of data loading).
For Mask R-CNN, we exclude the time of RLE encoding in post-processing.
We also include the officially reported speed in parentheses, which is slightly higher
than the results tested on our server due to differences in hardware.
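One way to measure "pure" inference speed is to load all inputs up front and time only the forward calls. The sketch below illustrates that idea and is not the benchmark script used for these results: `measure_fps` and `infer_fn` are hypothetical names, and the warm-up count is an arbitrary choice. On a GPU, one would also need to synchronize before reading the clock (for example with `torch.cuda.synchronize()`), which this CPU-only sketch omits.

```python
import time


def measure_fps(infer_fn, samples, warmup: int = 5) -> float:
    """Return images per second for infer_fn, excluding data loading."""
    # Materialize the inputs first so that loading is not part of the timing.
    samples = list(samples)
    if len(samples) <= warmup:
        raise ValueError("need more samples than warm-up iterations")

    # Warm-up calls are executed but not timed.
    for sample in samples[:warmup]:
        infer_fn(sample)

    timed = samples[warmup:]
    start = time.perf_counter()
    for sample in timed:
        infer_fn(sample)
    elapsed = time.perf_counter() - start

    # Guard against a zero reading on very coarse clocks.
    return len(timed) / max(elapsed, 1e-12)


# Stand-in for a model forward pass, for illustration only.
def fake_forward(x):
    return sum(i * i for i in range(10_000))


print(f"{measure_fps(fake_forward, range(50)):.1f} img/s")
```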
<td>Faster R-CNN</td>
<td>Mask R-CNN</td>