@@ -185,6 +189,39 @@ run it on V100, so we report the speed on TITAN XP.
\*3. The speed of pytorch-style ResNet is approximately 5% slower than caffe-style,
and we report the pytorch-style results here.
\*4. We also run the models on a DGX-1 server (P100) and the speed is almost the same as our V100 servers.
### Inference Speed
The inference speed is measured with fps (img/s) on a single GPU. The higher, the better.
<table>
<tr>
<th>Type</th>
<th>Detectron (P100)</th>
<th>Detectron.pytorch (XP)</th>
<th>mmdetection (V100 / XP)</th>
</tr>
<tr>
<td>RPN</td>
<td>12.5</td>
<td>-</td>
<td>14.5 / 15.4</td>
</tr>
<tr>
<td>Faster R-CNN</td>
<td>10.3</td>
<td></td>
<td>9.9 / 9.8</td>
</tr>
<tr>
<td>Mask R-CNN</td>
<td>8.5</td>
<td></td>
<td>7.7 / 7.4</td>
</tr>
</table>
### Training memory
We perform various tests and there is no doubt that mmdetection is more memory
...
...
@@ -195,5 +232,5 @@ whose implementation is not exactly the same.
`nvidia-smi` shows a larger memory usage for both detectron and mmdetection, e.g.,
we observe a much higher memory usage when we train Mask R-CNN with 2 images per GPU using detectron (10.6G) and mmdetection (9.3G), which is obviously more than actually required.
**Note**: With mmdetection, we can train R-50 FPN Mask R-CNN with **4** images per GPU (TITAN XP, 12G),
> With mmdetection, we can train R-50 FPN Mask R-CNN with **4** images per GPU (TITAN XP, 12G),