Benchmark and Model Zoo

Environment

Hardware

8 NVIDIA Tesla V100 GPUs
Intel Xeon 4114 CPU @ 2.20GHz


Software environment

Python 3.6 / 3.7
PyTorch 1.1
CUDA 9.0.176
CUDNN 7.0.4
NCCL 2.1.15


Mirror sites
We use AWS as the main site to host our model zoo and maintain a mirror on Aliyun.
You can replace https://s3.ap-northeast-2.amazonaws.com/open-mmlab with https://open-mmlab.oss-cn-beijing.aliyuncs.com in model URLs.
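Because the mirror keeps the same paths, switching sites is a plain prefix substitution. Below is a minimal Python sketch. It assumes every model URL begins with the AWS prefix quoted above. The helper name `to_aliyun_mirror` and the example path are hypothetical and not part of mmdetection.

```python
# Prefixes quoted in this document; the Aliyun mirror is assumed to keep the same object paths.
AWS_PREFIX = "https://s3.ap-northeast-2.amazonaws.com/open-mmlab"
ALIYUN_PREFIX = "https://open-mmlab.oss-cn-beijing.aliyuncs.com"


def to_aliyun_mirror(url: str) -> str:
    """Return the Aliyun mirror URL for a model hosted on the AWS main site.

    URLs that do not start with the AWS prefix are returned unchanged.
    """
    if url.startswith(AWS_PREFIX):
        return ALIYUN_PREFIX + url[len(AWS_PREFIX):]
    return url


if __name__ == "__main__":
    # Hypothetical model path, for illustration only.
    print(to_aliyun_mirror(AWS_PREFIX + "/mmdetection/models/example.pth"))
```

Leaving non-matching URLs untouched lets you run the helper over a mixed list of links without having to filter them first.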

Common settings

All FPN baselines and RPN-C4 baselines were trained using 8 GPUs with a batch size of 16 (2 images per GPU). Other C4 baselines were trained using 8 GPUs with a batch size of 8 (1 image per GPU).
All models were trained on coco_2017_train and tested on coco_2017_val.
We use distributed training, and BN layer statistics are frozen.
We adopt the same training schedules as Detectron. 1x indicates 12 epochs and 2x indicates 24 epochs. This corresponds to slightly fewer iterations than Detectron, and the difference can be ignored.
All pytorch-style backbones pretrained on ImageNet come from the PyTorch model zoo.
For fair comparison with other codebases, we report GPU memory as the maximum value of torch.cuda.max_memory_allocated() over all 8 GPUs. Note that this value is usually less than what nvidia-smi shows.
We report inference time as the overall time, including data loading, network forwarding, and post-processing.
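You can check with simple arithmetic that 1x (12 epochs) and 2x (24 epochs) come to slightly fewer iterations than Detectron. This sketch makes three assumptions:

- all 118,287 coco_2017_train images are used in each epoch;
- each iteration processes 16 images;
- Detectron's 1x and 2x schedules run 90k and 180k iterations.

The constant and function names are illustrative only.

```python
import math

# Assumptions for this sketch (not read from any config):
COCO_2017_TRAIN_IMAGES = 118_287  # every image used once per epoch
IMAGES_PER_ITER = 16              # 8 GPUs x 2 images per GPU (FPN baselines)
DETECTRON_ITERS = {"1x": 90_000, "2x": 180_000}


def schedule_iterations(epochs, num_images=COCO_2017_TRAIN_IMAGES, batch_size=IMAGES_PER_ITER):
    """Total iterations for an epoch-based schedule.

    The last, partial batch of each epoch still counts as one iteration.
    """
    return epochs * math.ceil(num_images / batch_size)


if __name__ == "__main__":
    for name, epochs in (("1x", 12), ("2x", 24)):
        ours = schedule_iterations(epochs)
        ref = DETECTRON_ITERS[name]
        print(f"{name}: {ours} iterations vs. {ref} in Detectron ({ours / ref:.1%})")
```

Under these assumptions, the 1x schedule comes to 88,716 iterations and the 2x schedule to 177,432. Both are about 1.4% fewer than Detectron's. If images without annotations are filtered out of the training set, the counts drop slightly further.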


Baselines
More models with different backbones will be added to the model zoo.

RPN


| Backbone | Style | Lr schd | Mem (GB) | Train time (s/iter) | Inf time (fps) | AR1000 | Download |
|:---:|:---:|:---:|:---:|:---:|:---:|:---:|:---:|
| R-50-C4 | caffe | 1x | - | - | 20.5 | 51.1 | model |
| R-50-C4 | caffe | 2x | 2.2 | 0.17 | 20.3 | 52.2 | model |
| R-50-C4 | pytorch | 1x | - | - | 20.1 | 50.2 | model |
| R-50-C4 | pytorch | 2x | - | - | 20.0 | 51.1 | model |
| R-50-FPN | caffe | 1x | 3.3 | 0.253 | 16.9 | 58.2 | - |
| R-50-FPN | pytorch | 1x | 3.5 | 0.276 | 17.7 | 57.1 | model |
| R-50-FPN | pytorch | 2x | - | - | - | 57.6 | model |
| R-101-FPN | caffe | 1x | 5.2 | 0.379 | 13.9 | 59.4 | - |
| R-101-FPN | pytorch | 1x | 5.4 | 0.396 | 14.4 | 58.6 | model |
| R-101-FPN | pytorch | 2x | - | - | - | 59.1 | model |
| X-101-32x4d-FPN | pytorch | 1x | 6.6 | 0.589 | 11.8 | 59.4 | model |
| X-101-32x4d-FPN | pytorch | 2x | - | - | - | 59.9 | model |
| X-101-64x4d-FPN | pytorch | 1x | 9.5 | 0.955 | 8.3 | 59.8 | model |
| X-101-64x4d-FPN | pytorch | 2x | - | - | - | 60.0 | model |


Faster R-CNN


| Backbone | Style | Lr schd | Mem (GB) | Train time (s/iter) | Inf time (fps) | box AP | Download |
|:---:|:---:|:---:|:---:|:---:|:---:|:---:|:---:|
| R-50-C4 | caffe | 1x | - | - | 9.5 | 34.9 | model |
| R-50-C4 | caffe | 2x | 4.0 | 0.39 | 9.3 | 36.5 | model |
| R-50-C4 | pytorch | 1x | - | - | 9.3 | 33.9 | model |
| R-50-C4 | pytorch | 2x | - | - | 9.4 | 35.9 | model |
| R-50-FPN | caffe | 1x | 3.6 | 0.333 | 13.5 | 36.6 | - |
| R-50-FPN | pytorch | 1x | 3.8 | 0.353 | 13.6 | 36.4 | model |
| R-50-FPN | pytorch | 2x | - | - | - | 37.7 | model |
| R-101-FPN | caffe | 1x | 5.5 | 0.465 | 11.5 | 38.8 | - |
| R-101-FPN | pytorch | 1x | 5.7 | 0.474 | 11.9 | 38.5 | model |
| R-101-FPN | pytorch | 2x | - | - | - | 39.4 | model |
| X-101-32x4d-FPN | pytorch | 1x | 6.9 | 0.672 | 10.3 | 40.1 | model |
| X-101-32x4d-FPN | pytorch | 2x | - | - | - | 40.4 | model |
| X-101-64x4d-FPN | pytorch | 1x | 9.8 | 1.040 | 7.3 | 41.3 | model |
| X-101-64x4d-FPN | pytorch | 2x | - | - | - | 40.7 | model |
| HRNetV2p-W18 | pytorch | 1x | - | - | - | 36.1 | model |
| HRNetV2p-W18 | pytorch | 2x | - | - | - | 38.3 | model |
| HRNetV2p-W32 | pytorch | 1x | - | - | - | 39.5 | model |
| HRNetV2p-W32 | pytorch | 2x | - | - | - | 40.6 | model |
| HRNetV2p-W48 | pytorch | 1x | - | - | - | 40.9 | model |
| HRNetV2p-W48 | pytorch | 2x | - | - | - | 41.5 | model |


Mask R-CNN


| Backbone | Style | Lr schd | Mem (GB) | Train time (s/iter) | Inf time (fps) | box AP | mask AP | Download |
|:---:|:---:|:---:|:---:|:---:|:---:|:---:|:---:|:---:|
| R-50-C4 | caffe | 1x | - | - | 8.1 | 35.9 | 31.5 | model |
| R-50-C4 | caffe | 2x | 4.2 | 0.43 | 8.1 | 37.9 | 32.9 | model |
| R-50-C4 | pytorch | 1x | - | - | 7.9 | 35.1 | 31.2 | model |
| R-50-C4 | pytorch | 2x | - | - | 8.0 | 37.2 | 32.5 | model |
| R-50-FPN | caffe | 1x | 3.8 | 0.430 | 10.2 | 37.4 | 34.3 | - |
| R-50-FPN | pytorch | 1x | 3.9 | 0.453 | 10.6 | 37.3 | 34.2 | model |
| R-50-FPN | pytorch | 2x | - | - | - | 38.5 | 35.1 | model |
| R-101-FPN | caffe | 1x | 5.7 | 0.534 | 9.4 | 39.9 | 36.1 | - |
| R-101-FPN | pytorch | 1x | 5.8 | 0.571 | 9.5 | 39.4 | 35.9 | model |
| R-101-FPN | pytorch | 2x | - | - | - | 40.3 | 36.5 | model |
| X-101-32x4d-FPN | pytorch | 1x | 7.1 | 0.759 | 8.3 | 41.1 | 37.1 | model |
| X-101-32x4d-FPN | pytorch | 2x | - | - | - | 41.4 | 37.1 | model |
| X-101-64x4d-FPN | pytorch | 1x | 10.0 | 1.102 | 6.5 | 42.1 | 38.0 | model |
| X-101-64x4d-FPN | pytorch | 2x | - | - | - | 42.0 | 37.7 | model |
| HRNetV2p-W18 | pytorch | 1x | - | - | - | 37.3 | 34.2 | model |
| HRNetV2p-W18 | pytorch | 2x | - | - | - | 39.2 | 35.7 | model |
| HRNetV2p-W32 | pytorch | 1x | - | - | - | 40.7 | 36.8 | model |
| HRNetV2p-W32 | pytorch | 2x | - | - | - | 41.7 | 37.5 | model |
| HRNetV2p-W48 | pytorch | 1x | - | - | - | 42.4 | 38.1 | model |
| HRNetV2p-W48 | pytorch | 2x | - | - | - | 42.9 | 38.3 | model |


Fast R-CNN (with pre-computed proposals)


| Backbone | Style | Type | Lr schd | Mem (GB) | Train time (s/iter) | Inf time (fps) | box AP | mask AP | Download |
|:---:|:---:|:---:|:---:|:---:|:---:|:---:|:---:|:---:|:---:|
| R-50-C4 | caffe | Faster | 1x | - | - | 6.7 | 35.0 | - | model |
| R-50-C4 | caffe | Faster | 2x | 3.8 | 0.34 | 6.6 | 36.4 | - | model |
| R-50-C4 | pytorch | Faster | 1x | - | - | 6.3 | 34.2 | - | model |
| R-50-C4 | pytorch | Faster | 2x | - | - | 6.1 | 35.8 | - | model |
| R-50-FPN | caffe | Faster | 1x | 3.3 | 0.242 | 18.4 | 36.6 | - | - |
| R-50-FPN | pytorch | Faster | 1x | 3.5 | 0.250 | 16.5 | 35.8 | - | model |
| R-50-C4 | caffe | Mask | 1x | - | - | 8.1 | 35.9 | 31.5 | model |
| R-50-C4 | caffe | Mask | 2x | 4.2 | 0.43 | 8.1 | 37.9 | 32.9 | model |
| R-50-C4 | pytorch | Mask | 1x | - | - | 7.9 | 35.1 | 31.2 | model |
| R-50-C4 | pytorch | Mask | 2x | - | - | 8.0 | 37.2 | 32.5 | model |
| R-50-FPN | pytorch | Faster | 2x | - | - | - | 37.1 | - | model |
| R-101-FPN | caffe | Faster | 1x | 5.2 | 0.355 | 14.4 | 38.6 | - | - |
| R-101-FPN | pytorch | Faster | 1x | 5.4 | 0.388 | 13.2 | 38.1 | - | model |
| R-101-FPN | pytorch | Faster | 2x | - | - | - | 38.8 | - | model |
| R-50-FPN | caffe | Mask | 1x | 3.4 | 0.328 | 12.8 | 37.3 | 34.5 | - |
| R-50-FPN | pytorch | Mask | 1x | 3.5 | 0.346 | 12.7 | 36.8 | 34.1 | model |
| R-50-FPN | pytorch | Mask | 2x | - | - | - | 37.9 | 34.8 | model |
| R-101-FPN | caffe | Mask | 1x | 5.2 | 0.429 | 11.2 | 39.4 | 36.1 | - |
| R-101-FPN | pytorch | Mask | 1x | 5.4 | 0.462 | 10.9 | 38.9 | 35.8 | model |
| R-101-FPN | pytorch | Mask | 2x | - | - | - | 39.9 | 36.4 | model |


RetinaNet


| Backbone | Style | Lr schd | Mem (GB) | Train time (s/iter) | Inf time (fps) | box AP | Download |
|:---:|:---:|:---:|:---:|:---:|:---:|:---:|:---:|
| R-50-FPN | caffe | 1x | 3.4 | 0.285 | 12.5 | 35.8 | - |
| R-50-FPN | pytorch | 1x | 3.6 | 0.308 | 12.1 | 35.6 | model |
| R-50-FPN | pytorch | 2x | - | - | - | 36.4 | model |
| R-101-FPN | caffe | 1x | 5.3 | 0.410 | 10.4 | 37.8 | - |
| R-101-FPN | pytorch | 1x | 5.5 | 0.429 | 10.9 | 37.7 | model |
| R-101-FPN | pytorch | 2x | - | - | - | 38.1 | model |
| X-101-32x4d-FPN | pytorch | 1x | 6.7 | 0.632 | 9.3 | 39.0 | model |
| X-101-32x4d-FPN | pytorch | 2x | - | - | - | 39.3 | model |
| X-101-64x4d-FPN | pytorch | 1x | 9.6 | 0.993 | 7.0 | 40.0 | model |
| X-101-64x4d-FPN | pytorch | 2x | - | - | - | 39.6 | model |


Cascade R-CNN


| Backbone | Style | Lr schd | Mem (GB) | Train time (s/iter) | Inf time (fps) | box AP | Download |
|:---:|:---:|:---:|:---:|:---:|:---:|:---:|:---:|
| R-50-C4 | caffe | 1x | 8.7 | 0.92 | 5.0 | 38.7 | model |
| R-50-FPN | caffe | 1x | 3.9 | 0.464 | 10.9 | 40.5 | - |
| R-50-FPN | pytorch | 1x | 4.1 | 0.455 | 11.9 | 40.4 | model |
| R-50-FPN | pytorch | 20e | - | - | - | 41.1 | model |
| R-101-FPN | caffe | 1x | 5.8 | 0.569 | 9.6 | 42.4 | - |
| R-101-FPN | pytorch | 1x | 6.0 | 0.584 | 10.3 | 42.0 | model |
| R-101-FPN | pytorch | 20e | - | - | - | 42.5 | model |
| X-101-32x4d-FPN | pytorch | 1x | 7.2 | 0.770 | 8.9 | 43.6 | model |
| X-101-32x4d-FPN | pytorch | 20e | - | - | - | 44.0 | model |
| X-101-64x4d-FPN | pytorch | 1x | 10.0 | 1.133 | 6.7 | 44.5 | model |
| X-101-64x4d-FPN | pytorch | 20e | - | - | - | 44.7 | model |
| HRNetV2p-W18 | pytorch | 20e | - | - | - | 41.2 | model |
| HRNetV2p-W32 | pytorch | 20e | - | - | - | 43.7 | model |
| HRNetV2p-W48 | pytorch | 20e | - | - | - | 44.6 | model |


Cascade Mask R-CNN


| Backbone | Style | Lr schd | Mem (GB) | Train time (s/iter) | Inf time (fps) | box AP | mask AP | Download |
|:---:|:---:|:---:|:---:|:---:|:---:|:---:|:---:|:---:|
| R-50-C4 | caffe | 1x | 9.1 | 0.99 | 4.5 | 39.3 | 32.8 | model |
| R-50-FPN | caffe | 1x | 5.1 | 0.692 | 7.6 | 40.9 | 35.5 | - |
| R-50-FPN | pytorch | 1x | 5.3 | 0.683 | 7.4 | 41.2 | 35.7 | model |
| R-50-FPN | pytorch | 20e | - | - | - | 42.3 | 36.6 | model |
| R-101-FPN | caffe | 1x | 7.0 | 0.803 | 7.2 | 43.1 | 37.2 | - |
| R-101-FPN | pytorch | 1x | 7.2 | 0.807 | 6.8 | 42.6 | 37.0 | model |
| R-101-FPN | pytorch | 20e | - | - | - | 43.3 | 37.6 | model |
| X-101-32x4d-FPN | pytorch | 1x | 8.4 | 0.976 | 6.6 | 44.4 | 38.2 | model |
| X-101-32x4d-FPN | pytorch | 20e | - | - | - | 44.7 | 38.6 | model |
| X-101-64x4d-FPN | pytorch | 1x | 11.4 | 1.33 | 5.3 | 45.4 | 39.1 | model |
| X-101-64x4d-FPN | pytorch | 20e | - | - | - | 45.7 | 39.4 | model |
| HRNetV2p-W18 | pytorch | 20e | - | - | - | 41.9 | 36.4 | model |
| HRNetV2p-W32 | pytorch | 20e | - | - | - | 44.5 | 38.5 | model |
| HRNetV2p-W48 | pytorch | 20e | - | - | - | 46.0 | 39.5 | model |


Notes:

The 20e schedule in Cascade (Mask) R-CNN decreases the lr after epochs 16 and 19, for a total of 20 epochs.
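A step schedule like this can be written as a function of the epoch index. This is a minimal sketch and not mmdetection's scheduler. It assumes 0-indexed epochs, a base lr of 0.02, and a decay factor of 0.1, and it ignores any warmup phase. The name `step_lr` is hypothetical.

```python
def step_lr(epoch, base_lr=0.02, steps=(16, 19), gamma=0.1):
    """Learning rate for a 0-indexed epoch under a step schedule.

    The lr is multiplied by `gamma` once for every milestone in `steps` that
    has been reached. With the 20e defaults, epochs 0-15 use base_lr,
    epochs 16-18 use base_lr * 0.1, and epoch 19 uses base_lr * 0.01.
    Warmup is not modeled.
    """
    decays = sum(1 for milestone in steps if epoch >= milestone)
    return base_lr * gamma ** decays


if __name__ == "__main__":
    for epoch in range(20):
        print(epoch, step_lr(epoch))
```

The same function with `steps=(8, 11)` would describe a 12-epoch 1x schedule, assuming the 1x configs decay after epochs 8 and 11.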


Hybrid Task Cascade (HTC)


| Backbone | Style | Lr schd | Mem (GB) | Train time (s/iter) | Inf time (fps) | box AP | mask AP | Download |
|:---:|:---:|:---:|:---:|:---:|:---:|:---:|:---:|:---:|
| R-50-FPN | pytorch | 1x | 7.4 | 0.936 | 4.1 | 42.1 | 37.3 | model |
| R-50-FPN | pytorch | 20e | - | - | - | 43.2 | 38.1 | model |
| R-101-FPN | pytorch | 20e | 9.3 | 1.051 | 4.0 | 44.9 | 39.4 | model |
| X-101-32x4d-FPN | pytorch | 20e | 5.8 | 0.769 | 3.8 | 46.1 | 40.3 | model |
| X-101-64x4d-FPN | pytorch | 20e | 7.5 | 1.120 | 3.5 | 46.9 | 40.8 | model |
| HRNetV2p-W18 | pytorch | 20e | - | - | - | 43.1 | 37.9 | model |
| HRNetV2p-W32 | pytorch | 20e | - | - | - | 45.3 | 39.6 | model |
| HRNetV2p-W48 | pytorch | 20e | - | - | - | 46.8 | 40.7 | model |
| HRNetV2p-W48 | pytorch | 28e | - | - | - | 47.0 | 41.0 | model |


Notes:

Please refer to Hybrid Task Cascade for details and a more powerful model (50.7 box AP / 43.9 mask AP).


SSD


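| Backbone | Size | Style | Lr schd | Mem (GB) | Train time (s/iter) | Inf time (fps) | box AP | Download |
|:---:|:---:|:---:|:---:|:---:|:---:|:---:|:---:|:---:|
| VGG16 | 300 | caffe | 120e | 3.5 | 0.256 | 25.9 / 34.6 | 25.7 | model |
| VGG16 | 512 | caffe | 120e | 7.6 | 0.412 | 20.7 / 25.4 | 29.3 | model |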


Notes:


cudnn.benchmark is set to True for SSD training and testing.
Inference speed is reported for batch size 1 and batch size 8, shown as "batch size 1 / batch size 8" in the Inf time column.
The speed differs between COCO and VOC because of differences in model parameters and NMS.
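The sketch below is a timing loop that matches the definition used on this page, with data loading, network forwarding, and post-processing all inside the timed region. The stage callables, the warmup count, and the name `measure_fps` are assumptions for illustration. The loop counts `len(batch)` images per step, so the same code can report figures for both batch size 1 and batch size 8.

```python
import time


def measure_fps(batches, load, forward, postprocess, warmup=5):
    """End-to-end throughput in images per second.

    `load(item)` must return a sequence of images (one batch). The timed
    region covers loading, forwarding, and post-processing. The first
    `warmup` batches are processed but not timed.
    """
    num_images = 0
    start = None
    for i, item in enumerate(batches):
        if i == warmup:
            start = time.perf_counter()  # timing starts after the warmup batches
        batch = load(item)  # data loading
        # For GPU models, `forward` should call torch.cuda.synchronize() before
        # returning; otherwise queued kernels are not fully timed.
        postprocess(forward(batch))
        if start is not None:
            num_images += len(batch)
    if start is None or num_images == 0:
        raise ValueError("need more batches than warmup iterations")
    return num_images / (time.perf_counter() - start)


if __name__ == "__main__":
    def fake_forward(batch):
        time.sleep(0.001)  # stand-in for a ~1 ms network forward
        return batch

    for bs in (1, 8):
        batches = [list(range(bs))] * 50
        fps = measure_fps(batches, load=lambda b: b, forward=fake_forward,
                          postprocess=lambda out: out)
        print(f"batch size {bs}: {fps:.0f} img/s")
```

With a fixed per-batch cost, larger batches spread that cost over more images. This is one reason batch size 8 reports higher fps than batch size 1.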


Group Normalization (GN)
Please refer to Group Normalization for details.

Weight Standardization
Please refer to Weight Standardization for details.

Deformable Convolution v2
Please refer to Deformable Convolutional Networks for details.

Libra R-CNN
Please refer to Libra R-CNN for details.

Guided Anchoring
Please refer to Guided Anchoring for details.

FCOS
Please refer to FCOS for details.

FoveaBox
Please refer to FoveaBox for details.

RepPoints
Please refer to RepPoints for details.

FreeAnchor
Please refer to FreeAnchor for details.

Grid R-CNN (plus)
Please refer to Grid R-CNN for details.

GHM
Please refer to GHM for details.

GCNet
Please refer to GCNet for details.

HRNet
Please refer to HRNet for details.

Mask Scoring R-CNN
Please refer to Mask Scoring R-CNN for details.

Train from Scratch
Please refer to Rethinking ImageNet Pre-training for details.

NAS-FPN
Please refer to NAS-FPN for details.

Other datasets
We also benchmark some methods on PASCAL VOC, Cityscapes and WIDER FACE.

Comparison with Detectron and maskrcnn-benchmark
We compare mmdetection with Detectron and maskrcnn-benchmark. The backbone used is R-50-FPN.
In general, mmdetection has 3 advantages over Detectron:


Higher performance (especially in terms of mask AP)
Faster training
Lower memory usage


Performance
Detectron and maskrcnn-benchmark use caffe-style ResNet as the backbone.
We report results using both caffe-style (weights converted from here) and pytorch-style (weights from the official model zoo) ResNet backbones, written as pytorch-style result / caffe-style result.
The metrics are:

- AR1000 for RPN;
- box AP for Faster R-CNN and Fast R-CNN;
- box AP & mask AP for models with a mask branch.

We find that pytorch-style ResNet usually converges more slowly than caffe-style ResNet. This gives slightly lower results with the 1x schedule, but the final results with the 2x schedule are higher.

  
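| Type | Lr schd | Detectron | maskrcnn-benchmark | mmdetection |
|:---:|:---:|:---:|:---:|:---:|
| RPN | 1x | 57.2 | - | 57.1 / 58.2 |
| RPN | 2x | - | - | 57.6 / - |
| Faster R-CNN | 1x | 36.7 | 36.8 | 36.4 / 36.6 |
| Faster R-CNN | 2x | 37.9 | - | 37.7 / - |
| Mask R-CNN | 1x | 37.7 & 33.9 | 37.8 & 34.2 | 37.3 & 34.2 / 37.4 & 34.3 |
| Mask R-CNN | 2x | 38.6 & 34.5 | - | 38.5 & 35.1 / - |
| Fast R-CNN | 1x | 36.4 | - | 35.8 / 36.6 |
| Fast R-CNN | 2x | 36.8 | - | 37.1 / - |
| Fast R-CNN (w/mask) | 1x | 37.3 & 33.7 | - | 36.8 & 34.1 / 37.3 & 34.5 |
| Fast R-CNN (w/mask) | 2x | 37.7 & 34.0 | - | 37.9 & 34.8 / - |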

Training Speed
Training speed is measured in s/iter. Lower is better.

  
| Type | Detectron (P100¹) | maskrcnn-benchmark (V100) | mmdetection (V100²) |
|:---:|:---:|:---:|:---:|
| RPN | 0.416 | - | 0.253 |
| Faster R-CNN | 0.544 | 0.353 | 0.333 |
| Mask R-CNN | 0.889 | 0.454 | 0.430 |
| Fast R-CNN | 0.285 | - | 0.242 |
| Fast R-CNN (w/mask) | 0.377 | - | 0.328 |

¹ Facebook's Big Basin servers (P100/V100) are slightly faster than the servers we use, and mmdetection also runs slightly faster on FB's servers.
² For fair comparison, we list the caffe-style results here.
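An s/iter figure covers one iteration across all 8 GPUs. Multiplying it by the number of iterations therefore gives a rough wall-clock estimate. This sketch assumes a fixed 90,000 iterations for all three codebases and ignores evaluation, checkpointing, and start-up time. The name `training_hours` is hypothetical.

```python
def training_hours(sec_per_iter, iterations):
    """Rough wall-clock training time from a measured s/iter figure.

    Ignores evaluation, checkpointing, and start-up overhead.
    """
    return sec_per_iter * iterations / 3600.0


if __name__ == "__main__":
    # Mask R-CNN s/iter from the table above. Detectron was measured on P100s,
    # the other two on V100s, so this is not a like-for-like hardware comparison.
    measured = {"Detectron": 0.889, "maskrcnn-benchmark": 0.454, "mmdetection": 0.430}
    iters = 90_000  # assumption: Detectron's 1x length, applied to all three
    for name, s in measured.items():
        print(f"{name}: ~{training_hours(s, iters):.1f} h")
```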

Inference Speed
Inference speed is measured in fps (img/s) on a single GPU. Higher is better.

  
| Type | Detectron (P100) | maskrcnn-benchmark (V100) | mmdetection (V100) |
|:---:|:---:|:---:|:---:|
| RPN | 12.5 | - | 16.9 |
| Faster R-CNN | 10.3 | 7.9 | 13.5 |
| Mask R-CNN | 8.5 | 7.7 | 10.2 |
| Fast R-CNN | 12.5 | - | 18.4 |
| Fast R-CNN (w/mask) | 9.9 | - | 12.8 |

Training Memory (GB)

  
| Type | Detectron | maskrcnn-benchmark | mmdetection |
|:---:|:---:|:---:|:---:|
| RPN | 6.4 | - | 3.3 |
| Faster R-CNN | 7.2 | 4.4 | 3.6 |
| Mask R-CNN | 8.6 | 5.2 | 3.8 |
| Fast R-CNN | 6.0 | - | 3.3 |
| Fast R-CNN (w/mask) | 7.9 | - | 3.4 |

maskrcnn-benchmark and mmdetection are clearly more memory efficient than Detectron, mainly thanks to PyTorch itself. We have also applied some memory optimizations to reduce usage further.
Note that Caffe2 and PyTorch report memory usage through different APIs with different implementations.
For all codebases, nvidia-smi shows larger memory usage than the numbers reported in the table above.
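To reproduce the reported figure rather than the nvidia-smi reading, reduce the per-GPU peaks returned by `torch.cuda.max_memory_allocated()` to a single value. The minimal sketch below makes these assumptions:

- "GB" means 1024³ bytes;
- the 8 per-GPU values have already been collected (gathering them across processes, e.g. with `torch.distributed`, is omitted).

`peak_memory_gb` is a hypothetical helper.

```python
def peak_memory_gb(per_gpu_peak_bytes, ndigits=1):
    """Combine per-GPU peak allocations into the single figure used in the tables.

    per_gpu_peak_bytes: one value per GPU, e.g. what torch.cuda.max_memory_allocated()
    returned on each training GPU. The largest value is reported, converted to
    GB (assumed to mean 1024 ** 3 bytes) and rounded like the tables.
    """
    if not per_gpu_peak_bytes:
        raise ValueError("need at least one GPU measurement")
    return round(max(per_gpu_peak_bytes) / 1024 ** 3, ndigits)


if __name__ == "__main__":
    try:
        import torch
    except ImportError:  # the reduction above does not need PyTorch
        torch = None
    if torch is not None and torch.cuda.is_available():
        # Peak bytes occupied by tensors on the current device. nvidia-smi
        # typically shows more, because it also counts memory cached by
        # PyTorch's allocator and the CUDA context.
        print(peak_memory_gb([torch.cuda.max_memory_allocated()]), "GB on this GPU")
```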