diff --git a/configs/reppoints/README.md b/configs/reppoints/README.md
deleted file mode 100644
index 2937113ce4d0403d52ca218d24de284aea0cbb3f..0000000000000000000000000000000000000000
--- a/configs/reppoints/README.md
+++ /dev/null
@@ -1,62 +0,0 @@
-# RepPoints: Point Set Representation for Object Detection
-
-By [Ze Yang](https://yangze.tech/), [Shaohui Liu](http://b1ueber2y.me/), and [Han Hu](https://ancientmooner.github.io/).
-
-We provide code and configuration files to reproduce the results in the paper
-["RepPoints: Point Set Representation for Object Detection"](https://arxiv.org/abs/1904.11490) on COCO object detection.
-
-## Introduction
-
-**RepPoints**, initially described in [arXiv](https://arxiv.org/abs/1904.11490), is a new representation method for visual objects, on which visual understanding tasks are typically centered. Visual object representation, aiming at both geometric description and appearance feature extraction, is conventionally achieved by `bounding box + RoIPool (RoIAlign)`. The bounding box representation is convenient to use; however, it provides only a rectangular localization of objects that lacks geometric precision and may consequently degrade feature quality. Our new representation, RepPoints, models objects by a `point set` instead of a `bounding box`; the points learn to adaptively position themselves over an object in a manner that circumscribes the object's `spatial extent` and enables `semantically aligned feature extraction`. This richer and more flexible representation maintains the convenience of bounding boxes while facilitating various visual understanding applications. This repo demonstrates the effectiveness of RepPoints for COCO object detection.
-
-Another feature of this repo is the demonstration of an `anchor-free detector`, which can be as effective as state-of-the-art anchor-based detection methods. The anchor-free detector can utilize either a `bounding box` or `RepPoints` as the basic object representation.
-
-<div align="center">
-  <img src="reppoints.png" width="400px" />
-  <p>Learning RepPoints in Object Detection.</p>
-</div>
-
-## Citing RepPoints
-
-```
-@inproceedings{yang2019reppoints,
-  title={RepPoints: Point Set Representation for Object Detection},
-  author={Yang, Ze and Liu, Shaohui and Hu, Han and Wang, Liwei and Lin, Stephen},
-  booktitle={The IEEE International Conference on Computer Vision (ICCV)},
-  month={Oct},
-  year={2019}
-}
-```
-
-## Results and models
-
-The results on COCO 2017val are shown in the table below.
-
-| Method | Backbone | Anchor | convert func | Lr schd | box AP | Download |
-| :----: | :------: | :----: | :----------: | :-----: | :----: | :------: |
-| BBox | R-50-FPN | single | - | 1x | 36.3 | [model](https://drive.google.com/open?id=1TaVAFGZP2i7RwtlQjy3LBH1WI-YRH774) |
-| BBox | R-50-FPN | none | - | 1x | 37.3 | [model](https://drive.google.com/open?id=1hpfu-I7gtZnIb0NU2WvUvaZz_dm-THuZ) |
-| RepPoints | R-50-FPN | none | partial MinMax | 1x | 38.1 | [model](https://drive.google.com/open?id=11zFtdKH-QGz_zH7vlcIih6FQAjV84CWc) |
-| RepPoints | R-50-FPN | none | MinMax | 1x | 38.2 | [model](https://drive.google.com/open?id=1Cg9818dpkL-9qjmYdkhrY_BRiQFjV4xu) |
-| RepPoints | R-50-FPN | none | moment | 1x | 38.2 | [model](https://drive.google.com/open?id=1rQg-lE-5nuqO1bt6okeYkti4Q-EaBsu_) |
-| RepPoints | R-50-FPN | none | moment | 2x | 38.6 | [model](https://drive.google.com/open?id=1TfR-5geVviKhRoXL9JP6cG3fkN2itbBU) |
-| RepPoints | R-50-FPN | none | moment | 2x (ms train) | 40.8 | [model](https://drive.google.com/open?id=1oaHTIaP51oB5HJ6GWV3WYK19lMm9iJO6) |
-| RepPoints | R-50-FPN | none | moment | 2x (ms train&ms test) | 42.2 | |
-| RepPoints | R-101-FPN | none | moment | 2x | 40.3 | [model](https://drive.google.com/open?id=1BAmGeUQ_zVQi2u7rgOuPQem2EjXDLgWm) |
-| RepPoints | R-101-FPN | none | moment | 2x (ms train) | 42.3 | [model](https://drive.google.com/open?id=14Lf0p4fXElXaxFu8stk3hek3bY8tNENX) |
-| RepPoints | R-101-FPN | none | moment | 2x (ms train&ms test) | 44.1 | |
-| RepPoints | R-101-FPN-DCN | none | moment | 2x | 43.0 | [model](https://drive.google.com/open?id=1hpptxpb4QtNuB-HnV5wHbDltPHhlYq4z) |
-| RepPoints | R-101-FPN-DCN | none | moment | 2x (ms train) | 44.8 | [model](https://drive.google.com/open?id=1fsTckK99HYjOURwcFeHfy5JRRtsCajfX) |
-| RepPoints | R-101-FPN-DCN | none | moment | 2x (ms train&ms test) | 46.4 | |
-| RepPoints | X-101-FPN-DCN | none | moment | 2x | 44.5 | [model](https://drive.google.com/open?id=1Y8vqaqU88-FEqqwl6Zb9exD5O246yrMR) |
-| RepPoints | X-101-FPN-DCN | none | moment | 2x (ms train) | 45.6 | [model](https://drive.google.com/open?id=1nr9gcVWxzeakbfPC6ON9yvKOuLzj_RrJ) |
-| RepPoints | X-101-FPN-DCN | none | moment | 2x (ms train&ms test) | 46.8 | |
-
-**Notes:**
-
-- `R-xx`, `X-xx` denote the ResNet and ResNeXt architectures, respectively.
-- `DCN` denotes replacing the 3x3 conv with a 3x3 deformable convolution in the `c3-c5` stages of the backbone.
-- `none` in the `anchor` column means a 2-d `center point` (x,y) is used to represent the initial object hypothesis, while `single` denotes that one 4-d anchor box (x,y,w,h) with an IoU-based label assignment criterion is adopted.
-- `moment`, `partial MinMax`, `MinMax` in the `convert func` column are three functions to convert a point set to a pseudo box, as sketched below.
-- `ms` denotes multi-scale training or multi-scale testing.
-- The results here are slightly different from those reported in the paper due to the framework change: the original paper uses an [MXNet](https://mxnet.apache.org/) implementation, while we re-implement the method in [PyTorch](https://pytorch.org/) based on mmdetection.
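The `convert func` column is easiest to read with a concrete sketch. The following is a minimal, illustrative version of the three point-set-to-pseudo-box conversions, not the exact `RepPointsHead.points2bbox` implementation from this repo; it assumes each point set is an `(N, 2 * num_points)` tensor laid out x-first, and the fixed `moment_mul` pair is a stand-in for the head's learnable moment-transfer parameter.

```python
import torch


def points_to_pseudo_bbox(pts, method='moment', moment_mul=(0.0, 0.0)):
    """Convert point sets (N, 2 * num_points), laid out (x1, y1, x2, y2, ...),
    into pseudo boxes (N, 4) as (x_min, y_min, x_max, y_max)."""
    pts = pts.view(pts.shape[0], -1, 2)              # (N, num_points, 2)
    if method == 'minmax':
        # tightest axis-aligned box around all points
        xy_min, xy_max = pts.min(dim=1).values, pts.max(dim=1).values
    elif method == 'partial_minmax':
        # min-max over only the first half of the points
        half = pts[:, :pts.shape[1] // 2]
        xy_min, xy_max = half.min(dim=1).values, half.max(dim=1).values
    elif method == 'moment':
        # box from the point set's mean and std, scaled by multipliers
        # (fixed here; the real head learns this transfer in log space)
        mean, std = pts.mean(dim=1), pts.std(dim=1)
        scale = torch.tensor(moment_mul).exp()
        xy_min, xy_max = mean - std * scale, mean + std * scale
    else:
        raise ValueError(f'unknown transform_method: {method}')
    return torch.cat([xy_min, xy_max], dim=1)


# example: 8 objects with 9 points each, as in the configs below
boxes = points_to_pseudo_bbox(torch.rand(8, 18) * 100, method='minmax')
```

Whatever the conversion, it only defines how a learned point set is interpreted as a box (for the IoU-based assignment in the refine stage and for box AP evaluation); the points themselves are predicted the same way.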
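Similarly, the `none` anchor setting amounts to laying down one 2-d center point per feature-map location at each pyramid level. A small illustration using the `point_strides` from the configs below (input size and shapes are only examples; the repo's `PointGenerator`, deleted further down, additionally carries the stride alongside each point):

```python
import torch


def grid_center_points(featmap_size, stride):
    # one (x, y) center hypothesis per feature-map location, in image coords
    h, w = featmap_size
    ys, xs = torch.meshgrid(
        torch.arange(h, dtype=torch.float32) * stride,
        torch.arange(w, dtype=torch.float32) * stride,
        indexing='ij')
    return torch.stack([xs.reshape(-1), ys.reshape(-1)], dim=-1)  # (h*w, 2)


# e.g. five FPN levels for a padded 800 x 1344 input
points_per_level = [grid_center_points((800 // s, 1344 // s), s)
                    for s in [8, 16, 32, 64, 128]]
```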
diff --git a/configs/reppoints/bbox_r50_grid_center_fpn_1x.py b/configs/reppoints/bbox_r50_grid_center_fpn_1x.py deleted file mode 100644 index d2ab61d0a2d89554aaaf5d43ce216e31d4bcd8ab..0000000000000000000000000000000000000000 --- a/configs/reppoints/bbox_r50_grid_center_fpn_1x.py +++ /dev/null @@ -1,143 +0,0 @@ -# model settings -norm_cfg = dict(type='GN', num_groups=32, requires_grad=True) - -model = dict( - type='RepPointsDetector', - pretrained='torchvision://resnet50', - backbone=dict( - type='ResNet', - depth=50, - num_stages=4, - out_indices=(0, 1, 2, 3), - frozen_stages=1, - style='pytorch'), - neck=dict( - type='FPN', - in_channels=[256, 512, 1024, 2048], - out_channels=256, - start_level=1, - add_extra_convs=True, - num_outs=5, - norm_cfg=norm_cfg), - bbox_head=dict( - type='RepPointsHead', - num_classes=81, - in_channels=256, - feat_channels=256, - point_feat_channels=256, - stacked_convs=3, - num_points=9, - gradient_mul=0.1, - point_strides=[8, 16, 32, 64, 128], - point_base_scale=4, - norm_cfg=norm_cfg, - loss_cls=dict( - type='FocalLoss', - use_sigmoid=True, - gamma=2.0, - alpha=0.25, - loss_weight=1.0), - loss_bbox_init=dict(type='SmoothL1Loss', beta=0.11, loss_weight=0.5), - loss_bbox_refine=dict(type='SmoothL1Loss', beta=0.11, loss_weight=1.0), - transform_method='minmax', - use_grid_points=True)) -# training and testing settings -train_cfg = dict( - init=dict( - assigner=dict(type='PointAssigner', scale=4, pos_num=1), - allowed_border=-1, - pos_weight=-1, - debug=False), - refine=dict( - assigner=dict( - type='MaxIoUAssigner', - pos_iou_thr=0.5, - neg_iou_thr=0.4, - min_pos_iou=0, - ignore_iof_thr=-1), - allowed_border=-1, - pos_weight=-1, - debug=False)) -test_cfg = dict( - nms_pre=1000, - min_bbox_size=0, - score_thr=0.05, - nms=dict(type='nms', iou_thr=0.5), - max_per_img=100) -# dataset settings -dataset_type = 'CocoDataset' -data_root = 'data/coco/' -img_norm_cfg = dict( - mean=[123.675, 116.28, 103.53], std=[58.395, 57.12, 57.375], to_rgb=True) -train_pipeline = [ - dict(type='LoadImageFromFile'), - dict(type='LoadAnnotations', with_bbox=True), - dict(type='Resize', img_scale=(1333, 800), keep_ratio=True), - dict(type='RandomFlip', flip_ratio=0.5), - dict(type='Normalize', **img_norm_cfg), - dict(type='Pad', size_divisor=32), - dict(type='DefaultFormatBundle'), - dict(type='Collect', keys=['img', 'gt_bboxes', 'gt_labels']), -] -test_pipeline = [ - dict(type='LoadImageFromFile'), - dict( - type='MultiScaleFlipAug', - img_scale=(1333, 800), - flip=False, - transforms=[ - dict(type='Resize', keep_ratio=True), - dict(type='RandomFlip'), - dict(type='Normalize', **img_norm_cfg), - dict(type='Pad', size_divisor=32), - dict(type='ImageToTensor', keys=['img']), - dict(type='Collect', keys=['img']), - ]) -] -data = dict( - imgs_per_gpu=2, - workers_per_gpu=2, - train=dict( - type=dataset_type, - ann_file=data_root + 'annotations/instances_train2017.json', - img_prefix=data_root + 'train2017/', - pipeline=train_pipeline), - val=dict( - type=dataset_type, - ann_file=data_root + 'annotations/instances_val2017.json', - img_prefix=data_root + 'val2017/', - pipeline=test_pipeline), - test=dict( - type=dataset_type, - ann_file=data_root + 'annotations/instances_val2017.json', - img_prefix=data_root + 'val2017/', - pipeline=test_pipeline)) -# optimizer -optimizer = dict(type='SGD', lr=0.01, momentum=0.9, weight_decay=0.0001) -optimizer_config = dict(grad_clip=dict(max_norm=35, norm_type=2)) -# learning policy -lr_config = dict( - policy='step', - warmup='linear', - 
warmup_iters=500, - warmup_ratio=1.0 / 3, - step=[8, 11]) -checkpoint_config = dict(interval=1) -# yapf:disable -log_config = dict( - interval=50, - hooks=[ - dict(type='TextLoggerHook'), - # dict(type='TensorboardLoggerHook') - ]) -# yapf:enable -# runtime settings -total_epochs = 12 -device_ids = range(8) -dist_params = dict(backend='nccl') -log_level = 'INFO' -work_dir = './work_dirs/bbox_r50_grid_center_fpn_1x' -load_from = None -resume_from = None -auto_resume = True -workflow = [('train', 1)] diff --git a/configs/reppoints/bbox_r50_grid_fpn_1x.py b/configs/reppoints/bbox_r50_grid_fpn_1x.py deleted file mode 100644 index 79e3c76ff4b927eca34ed3489d17f14bbe11f708..0000000000000000000000000000000000000000 --- a/configs/reppoints/bbox_r50_grid_fpn_1x.py +++ /dev/null @@ -1,148 +0,0 @@ -# model settings -norm_cfg = dict(type='GN', num_groups=32, requires_grad=True) - -model = dict( - type='RepPointsDetector', - pretrained='torchvision://resnet50', - backbone=dict( - type='ResNet', - depth=50, - num_stages=4, - out_indices=(0, 1, 2, 3), - frozen_stages=1, - style='pytorch'), - neck=dict( - type='FPN', - in_channels=[256, 512, 1024, 2048], - out_channels=256, - start_level=1, - add_extra_convs=True, - num_outs=5, - norm_cfg=norm_cfg), - bbox_head=dict( - type='RepPointsHead', - num_classes=81, - in_channels=256, - feat_channels=256, - point_feat_channels=256, - stacked_convs=3, - num_points=9, - gradient_mul=0.1, - point_strides=[8, 16, 32, 64, 128], - point_base_scale=4, - norm_cfg=norm_cfg, - loss_cls=dict( - type='FocalLoss', - use_sigmoid=True, - gamma=2.0, - alpha=0.25, - loss_weight=1.0), - loss_bbox_init=dict(type='SmoothL1Loss', beta=0.11, loss_weight=0.5), - loss_bbox_refine=dict(type='SmoothL1Loss', beta=0.11, loss_weight=1.0), - transform_method='minmax', - use_grid_points=True)) -# training and testing settings -train_cfg = dict( - init=dict( - assigner=dict( - type='MaxIoUAssigner', - pos_iou_thr=0.5, - neg_iou_thr=0.4, - min_pos_iou=0, - ignore_iof_thr=-1), - allowed_border=-1, - pos_weight=-1, - debug=False), - refine=dict( - assigner=dict( - type='MaxIoUAssigner', - pos_iou_thr=0.5, - neg_iou_thr=0.4, - min_pos_iou=0, - ignore_iof_thr=-1), - allowed_border=-1, - pos_weight=-1, - debug=False)) -test_cfg = dict( - nms_pre=1000, - min_bbox_size=0, - score_thr=0.05, - nms=dict(type='nms', iou_thr=0.5), - max_per_img=100) -# dataset settings -dataset_type = 'CocoDataset' -data_root = 'data/coco/' -img_norm_cfg = dict( - mean=[123.675, 116.28, 103.53], std=[58.395, 57.12, 57.375], to_rgb=True) -train_pipeline = [ - dict(type='LoadImageFromFile'), - dict(type='LoadAnnotations', with_bbox=True), - dict(type='Resize', img_scale=(1333, 800), keep_ratio=True), - dict(type='RandomFlip', flip_ratio=0.5), - dict(type='Normalize', **img_norm_cfg), - dict(type='Pad', size_divisor=32), - dict(type='DefaultFormatBundle'), - dict(type='Collect', keys=['img', 'gt_bboxes', 'gt_labels']), -] -test_pipeline = [ - dict(type='LoadImageFromFile'), - dict( - type='MultiScaleFlipAug', - img_scale=(1333, 800), - flip=False, - transforms=[ - dict(type='Resize', keep_ratio=True), - dict(type='RandomFlip'), - dict(type='Normalize', **img_norm_cfg), - dict(type='Pad', size_divisor=32), - dict(type='ImageToTensor', keys=['img']), - dict(type='Collect', keys=['img']), - ]) -] -data = dict( - imgs_per_gpu=2, - workers_per_gpu=2, - train=dict( - type=dataset_type, - ann_file=data_root + 'annotations/instances_train2017.json', - img_prefix=data_root + 'train2017/', - pipeline=train_pipeline), - val=dict( - 
type=dataset_type, - ann_file=data_root + 'annotations/instances_val2017.json', - img_prefix=data_root + 'val2017/', - pipeline=test_pipeline), - test=dict( - type=dataset_type, - ann_file=data_root + 'annotations/instances_val2017.json', - img_prefix=data_root + 'val2017/', - pipeline=test_pipeline)) -# optimizer -optimizer = dict(type='SGD', lr=0.01, momentum=0.9, weight_decay=0.0001) -optimizer_config = dict(grad_clip=dict(max_norm=35, norm_type=2)) -# learning policy -lr_config = dict( - policy='step', - warmup='linear', - warmup_iters=500, - warmup_ratio=1.0 / 3, - step=[8, 11]) -checkpoint_config = dict(interval=1) -# yapf:disable -log_config = dict( - interval=50, - hooks=[ - dict(type='TextLoggerHook'), - # dict(type='TensorboardLoggerHook') - ]) -# yapf:enable -# runtime settings -total_epochs = 12 -device_ids = range(8) -dist_params = dict(backend='nccl') -log_level = 'INFO' -work_dir = './work_dirs/bbox_r50_grid_fpn_1x' -load_from = None -resume_from = None -auto_resume = True -workflow = [('train', 1)] diff --git a/configs/reppoints/reppoints.png b/configs/reppoints/reppoints.png deleted file mode 100644 index a9306d9ba6c659a670822213bf198099f9e125b1..0000000000000000000000000000000000000000 Binary files a/configs/reppoints/reppoints.png and /dev/null differ diff --git a/configs/reppoints/reppoints_minmax_r50_fpn_1x.py b/configs/reppoints/reppoints_minmax_r50_fpn_1x.py deleted file mode 100644 index 0103beb937ab80b9b5a6b875f050c475a13f1b36..0000000000000000000000000000000000000000 --- a/configs/reppoints/reppoints_minmax_r50_fpn_1x.py +++ /dev/null @@ -1,142 +0,0 @@ -# model settings -norm_cfg = dict(type='GN', num_groups=32, requires_grad=True) - -model = dict( - type='RepPointsDetector', - pretrained='torchvision://resnet50', - backbone=dict( - type='ResNet', - depth=50, - num_stages=4, - out_indices=(0, 1, 2, 3), - frozen_stages=1, - style='pytorch'), - neck=dict( - type='FPN', - in_channels=[256, 512, 1024, 2048], - out_channels=256, - start_level=1, - add_extra_convs=True, - num_outs=5, - norm_cfg=norm_cfg), - bbox_head=dict( - type='RepPointsHead', - num_classes=81, - in_channels=256, - feat_channels=256, - point_feat_channels=256, - stacked_convs=3, - num_points=9, - gradient_mul=0.1, - point_strides=[8, 16, 32, 64, 128], - point_base_scale=4, - norm_cfg=norm_cfg, - loss_cls=dict( - type='FocalLoss', - use_sigmoid=True, - gamma=2.0, - alpha=0.25, - loss_weight=1.0), - loss_bbox_init=dict(type='SmoothL1Loss', beta=0.11, loss_weight=0.5), - loss_bbox_refine=dict(type='SmoothL1Loss', beta=0.11, loss_weight=1.0), - transform_method='minmax')) -# training and testing settings -train_cfg = dict( - init=dict( - assigner=dict(type='PointAssigner', scale=4, pos_num=1), - allowed_border=-1, - pos_weight=-1, - debug=False), - refine=dict( - assigner=dict( - type='MaxIoUAssigner', - pos_iou_thr=0.5, - neg_iou_thr=0.4, - min_pos_iou=0, - ignore_iof_thr=-1), - allowed_border=-1, - pos_weight=-1, - debug=False)) -test_cfg = dict( - nms_pre=1000, - min_bbox_size=0, - score_thr=0.05, - nms=dict(type='nms', iou_thr=0.5), - max_per_img=100) -# dataset settings -dataset_type = 'CocoDataset' -data_root = 'data/coco/' -img_norm_cfg = dict( - mean=[123.675, 116.28, 103.53], std=[58.395, 57.12, 57.375], to_rgb=True) -train_pipeline = [ - dict(type='LoadImageFromFile'), - dict(type='LoadAnnotations', with_bbox=True), - dict(type='Resize', img_scale=(1333, 800), keep_ratio=True), - dict(type='RandomFlip', flip_ratio=0.5), - dict(type='Normalize', **img_norm_cfg), - dict(type='Pad', 
size_divisor=32), - dict(type='DefaultFormatBundle'), - dict(type='Collect', keys=['img', 'gt_bboxes', 'gt_labels']), -] -test_pipeline = [ - dict(type='LoadImageFromFile'), - dict( - type='MultiScaleFlipAug', - img_scale=(1333, 800), - flip=False, - transforms=[ - dict(type='Resize', keep_ratio=True), - dict(type='RandomFlip'), - dict(type='Normalize', **img_norm_cfg), - dict(type='Pad', size_divisor=32), - dict(type='ImageToTensor', keys=['img']), - dict(type='Collect', keys=['img']), - ]) -] -data = dict( - imgs_per_gpu=2, - workers_per_gpu=2, - train=dict( - type=dataset_type, - ann_file=data_root + 'annotations/instances_train2017.json', - img_prefix=data_root + 'train2017/', - pipeline=train_pipeline), - val=dict( - type=dataset_type, - ann_file=data_root + 'annotations/instances_val2017.json', - img_prefix=data_root + 'val2017/', - pipeline=test_pipeline), - test=dict( - type=dataset_type, - ann_file=data_root + 'annotations/instances_val2017.json', - img_prefix=data_root + 'val2017/', - pipeline=test_pipeline)) -# optimizer -optimizer = dict(type='SGD', lr=0.01, momentum=0.9, weight_decay=0.0001) -optimizer_config = dict(grad_clip=dict(max_norm=35, norm_type=2)) -# learning policy -lr_config = dict( - policy='step', - warmup='linear', - warmup_iters=500, - warmup_ratio=1.0 / 3, - step=[8, 11]) -checkpoint_config = dict(interval=1) -# yapf:disable -log_config = dict( - interval=50, - hooks=[ - dict(type='TextLoggerHook'), - # dict(type='TensorboardLoggerHook') - ]) -# yapf:enable -# runtime settings -total_epochs = 12 -device_ids = range(8) -dist_params = dict(backend='nccl') -log_level = 'INFO' -work_dir = './work_dirs/reppoints_minmax_r50_fpn_1x' -load_from = None -resume_from = None -auto_resume = True -workflow = [('train', 1)] diff --git a/configs/reppoints/reppoints_moment_r101_dcn_fpn_2x.py b/configs/reppoints/reppoints_moment_r101_dcn_fpn_2x.py deleted file mode 100644 index 864cec03d1ebf8c748f8d2d8d43c2e396f49c209..0000000000000000000000000000000000000000 --- a/configs/reppoints/reppoints_moment_r101_dcn_fpn_2x.py +++ /dev/null @@ -1,145 +0,0 @@ -# model settings -norm_cfg = dict(type='GN', num_groups=32, requires_grad=True) - -model = dict( - type='RepPointsDetector', - pretrained='torchvision://resnet101', - backbone=dict( - type='ResNet', - depth=101, - num_stages=4, - out_indices=(0, 1, 2, 3), - frozen_stages=1, - style='pytorch', - dcn=dict( - modulated=False, deformable_groups=1, fallback_on_stride=False), - stage_with_dcn=(False, True, True, True)), - neck=dict( - type='FPN', - in_channels=[256, 512, 1024, 2048], - out_channels=256, - start_level=1, - add_extra_convs=True, - num_outs=5, - norm_cfg=norm_cfg), - bbox_head=dict( - type='RepPointsHead', - num_classes=81, - in_channels=256, - feat_channels=256, - point_feat_channels=256, - stacked_convs=3, - num_points=9, - gradient_mul=0.1, - point_strides=[8, 16, 32, 64, 128], - point_base_scale=4, - norm_cfg=norm_cfg, - loss_cls=dict( - type='FocalLoss', - use_sigmoid=True, - gamma=2.0, - alpha=0.25, - loss_weight=1.0), - loss_bbox_init=dict(type='SmoothL1Loss', beta=0.11, loss_weight=0.5), - loss_bbox_refine=dict(type='SmoothL1Loss', beta=0.11, loss_weight=1.0), - transform_method='moment')) -# training and testing settings -train_cfg = dict( - init=dict( - assigner=dict(type='PointAssigner', scale=4, pos_num=1), - allowed_border=-1, - pos_weight=-1, - debug=False), - refine=dict( - assigner=dict( - type='MaxIoUAssigner', - pos_iou_thr=0.5, - neg_iou_thr=0.4, - min_pos_iou=0, - ignore_iof_thr=-1), - allowed_border=-1, 
- pos_weight=-1, - debug=False)) -test_cfg = dict( - nms_pre=1000, - min_bbox_size=0, - score_thr=0.05, - nms=dict(type='nms', iou_thr=0.5), - max_per_img=100) -# dataset settings -dataset_type = 'CocoDataset' -data_root = 'data/coco/' -img_norm_cfg = dict( - mean=[123.675, 116.28, 103.53], std=[58.395, 57.12, 57.375], to_rgb=True) -train_pipeline = [ - dict(type='LoadImageFromFile'), - dict(type='LoadAnnotations', with_bbox=True), - dict(type='Resize', img_scale=(1333, 800), keep_ratio=True), - dict(type='RandomFlip', flip_ratio=0.5), - dict(type='Normalize', **img_norm_cfg), - dict(type='Pad', size_divisor=32), - dict(type='DefaultFormatBundle'), - dict(type='Collect', keys=['img', 'gt_bboxes', 'gt_labels']), -] -test_pipeline = [ - dict(type='LoadImageFromFile'), - dict( - type='MultiScaleFlipAug', - img_scale=(1333, 800), - flip=False, - transforms=[ - dict(type='Resize', keep_ratio=True), - dict(type='RandomFlip'), - dict(type='Normalize', **img_norm_cfg), - dict(type='Pad', size_divisor=32), - dict(type='ImageToTensor', keys=['img']), - dict(type='Collect', keys=['img']), - ]) -] -data = dict( - imgs_per_gpu=2, - workers_per_gpu=2, - train=dict( - type=dataset_type, - ann_file=data_root + 'annotations/instances_train2017.json', - img_prefix=data_root + 'train2017/', - pipeline=train_pipeline), - val=dict( - type=dataset_type, - ann_file=data_root + 'annotations/instances_val2017.json', - img_prefix=data_root + 'val2017/', - pipeline=test_pipeline), - test=dict( - type=dataset_type, - ann_file=data_root + 'annotations/instances_val2017.json', - img_prefix=data_root + 'val2017/', - pipeline=test_pipeline)) -# optimizer -optimizer = dict(type='SGD', lr=0.01, momentum=0.9, weight_decay=0.0001) -optimizer_config = dict(grad_clip=dict(max_norm=35, norm_type=2)) -# learning policy -lr_config = dict( - policy='step', - warmup='linear', - warmup_iters=500, - warmup_ratio=1.0 / 3, - step=[16, 22]) -checkpoint_config = dict(interval=1) -# yapf:disable -log_config = dict( - interval=50, - hooks=[ - dict(type='TextLoggerHook'), - # dict(type='TensorboardLoggerHook') - ]) -# yapf:enable -# runtime settings -total_epochs = 24 -device_ids = range(8) -dist_params = dict(backend='nccl') -log_level = 'INFO' -work_dir = './work_dirs/reppoints_moment_r101_dcn_fpn_2x' -load_from = None -resume_from = None -auto_resume = True -workflow = [('train', 1)] diff --git a/configs/reppoints/reppoints_moment_r101_dcn_fpn_2x_mt.py b/configs/reppoints/reppoints_moment_r101_dcn_fpn_2x_mt.py deleted file mode 100644 index ac6d93a9cb00b5f1ddd51dcaaac58c7fea432719..0000000000000000000000000000000000000000 --- a/configs/reppoints/reppoints_moment_r101_dcn_fpn_2x_mt.py +++ /dev/null @@ -1,149 +0,0 @@ -# model settings -norm_cfg = dict(type='GN', num_groups=32, requires_grad=True) - -model = dict( - type='RepPointsDetector', - pretrained='torchvision://resnet101', - backbone=dict( - type='ResNet', - depth=101, - num_stages=4, - out_indices=(0, 1, 2, 3), - frozen_stages=1, - style='pytorch', - dcn=dict( - modulated=False, deformable_groups=1, fallback_on_stride=False), - stage_with_dcn=(False, True, True, True)), - neck=dict( - type='FPN', - in_channels=[256, 512, 1024, 2048], - out_channels=256, - start_level=1, - add_extra_convs=True, - num_outs=5, - norm_cfg=norm_cfg), - bbox_head=dict( - type='RepPointsHead', - num_classes=81, - in_channels=256, - feat_channels=256, - point_feat_channels=256, - stacked_convs=3, - num_points=9, - gradient_mul=0.1, - point_strides=[8, 16, 32, 64, 128], - point_base_scale=4, - 
norm_cfg=norm_cfg, - loss_cls=dict( - type='FocalLoss', - use_sigmoid=True, - gamma=2.0, - alpha=0.25, - loss_weight=1.0), - loss_bbox_init=dict(type='SmoothL1Loss', beta=0.11, loss_weight=0.5), - loss_bbox_refine=dict(type='SmoothL1Loss', beta=0.11, loss_weight=1.0), - transform_method='moment')) -# training and testing settings -train_cfg = dict( - init=dict( - assigner=dict(type='PointAssigner', scale=4, pos_num=1), - allowed_border=-1, - pos_weight=-1, - debug=False), - refine=dict( - assigner=dict( - type='MaxIoUAssigner', - pos_iou_thr=0.5, - neg_iou_thr=0.4, - min_pos_iou=0, - ignore_iof_thr=-1), - allowed_border=-1, - pos_weight=-1, - debug=False)) -test_cfg = dict( - nms_pre=1000, - min_bbox_size=0, - score_thr=0.05, - nms=dict(type='nms', iou_thr=0.5), - max_per_img=100) -# dataset settings -dataset_type = 'CocoDataset' -data_root = 'data/coco/' -img_norm_cfg = dict( - mean=[123.675, 116.28, 103.53], std=[58.395, 57.12, 57.375], to_rgb=True) -train_pipeline = [ - dict(type='LoadImageFromFile'), - dict(type='LoadAnnotations', with_bbox=True), - dict( - type='Resize', - img_scale=[(1333, 480), (1333, 960)], - keep_ratio=True, - multiscale_mode='range'), - dict(type='RandomFlip', flip_ratio=0.5), - dict(type='Normalize', **img_norm_cfg), - dict(type='Pad', size_divisor=32), - dict(type='DefaultFormatBundle'), - dict(type='Collect', keys=['img', 'gt_bboxes', 'gt_labels']), -] -test_pipeline = [ - dict(type='LoadImageFromFile'), - dict( - type='MultiScaleFlipAug', - img_scale=(1333, 800), - flip=False, - transforms=[ - dict(type='Resize', keep_ratio=True), - dict(type='RandomFlip'), - dict(type='Normalize', **img_norm_cfg), - dict(type='Pad', size_divisor=32), - dict(type='ImageToTensor', keys=['img']), - dict(type='Collect', keys=['img']), - ]) -] -data = dict( - imgs_per_gpu=2, - workers_per_gpu=2, - train=dict( - type=dataset_type, - ann_file=data_root + 'annotations/instances_train2017.json', - img_prefix=data_root + 'train2017/', - pipeline=train_pipeline), - val=dict( - type=dataset_type, - ann_file=data_root + 'annotations/instances_val2017.json', - img_prefix=data_root + 'val2017/', - pipeline=test_pipeline), - test=dict( - type=dataset_type, - ann_file=data_root + 'annotations/instances_val2017.json', - img_prefix=data_root + 'val2017/', - pipeline=test_pipeline)) -# optimizer -optimizer = dict(type='SGD', lr=0.01, momentum=0.9, weight_decay=0.0001) -optimizer_config = dict(grad_clip=dict(max_norm=35, norm_type=2)) -# learning policy -lr_config = dict( - policy='step', - warmup='linear', - warmup_iters=500, - warmup_ratio=1.0 / 3, - step=[16, 22]) -checkpoint_config = dict(interval=1) -# yapf:disable -log_config = dict( - interval=50, - hooks=[ - dict(type='TextLoggerHook'), - # dict(type='TensorboardLoggerHook') - ]) -# yapf:enable -# runtime settings -total_epochs = 24 -device_ids = range(8) -dist_params = dict(backend='nccl') -log_level = 'INFO' -work_dir = './work_dirs/reppoints_moment_r101_dcn_fpn_2x_mt' -load_from = None -resume_from = None -auto_resume = True -workflow = [('train', 1)] diff --git a/configs/reppoints/reppoints_moment_r101_fpn_2x.py b/configs/reppoints/reppoints_moment_r101_fpn_2x.py deleted file mode 100644 index a4732a279bd43fe0f158aaf753ff1e1d910c65ec..0000000000000000000000000000000000000000 --- a/configs/reppoints/reppoints_moment_r101_fpn_2x.py +++ /dev/null @@ -1,142 +0,0 @@ -# model settings -norm_cfg = dict(type='GN', num_groups=32, requires_grad=True) - -model = dict( - type='RepPointsDetector', - pretrained='torchvision://resnet101', - 
backbone=dict( - type='ResNet', - depth=101, - num_stages=4, - out_indices=(0, 1, 2, 3), - frozen_stages=1, - style='pytorch'), - neck=dict( - type='FPN', - in_channels=[256, 512, 1024, 2048], - out_channels=256, - start_level=1, - add_extra_convs=True, - num_outs=5, - norm_cfg=norm_cfg), - bbox_head=dict( - type='RepPointsHead', - num_classes=81, - in_channels=256, - feat_channels=256, - point_feat_channels=256, - stacked_convs=3, - num_points=9, - gradient_mul=0.1, - point_strides=[8, 16, 32, 64, 128], - point_base_scale=4, - norm_cfg=norm_cfg, - loss_cls=dict( - type='FocalLoss', - use_sigmoid=True, - gamma=2.0, - alpha=0.25, - loss_weight=1.0), - loss_bbox_init=dict(type='SmoothL1Loss', beta=0.11, loss_weight=0.5), - loss_bbox_refine=dict(type='SmoothL1Loss', beta=0.11, loss_weight=1.0), - transform_method='moment')) -# training and testing settings -train_cfg = dict( - init=dict( - assigner=dict(type='PointAssigner', scale=4, pos_num=1), - allowed_border=-1, - pos_weight=-1, - debug=False), - refine=dict( - assigner=dict( - type='MaxIoUAssigner', - pos_iou_thr=0.5, - neg_iou_thr=0.4, - min_pos_iou=0, - ignore_iof_thr=-1), - allowed_border=-1, - pos_weight=-1, - debug=False)) -test_cfg = dict( - nms_pre=1000, - min_bbox_size=0, - score_thr=0.05, - nms=dict(type='nms', iou_thr=0.5), - max_per_img=100) -# dataset settings -dataset_type = 'CocoDataset' -data_root = 'data/coco/' -img_norm_cfg = dict( - mean=[123.675, 116.28, 103.53], std=[58.395, 57.12, 57.375], to_rgb=True) -train_pipeline = [ - dict(type='LoadImageFromFile'), - dict(type='LoadAnnotations', with_bbox=True), - dict(type='Resize', img_scale=(1333, 800), keep_ratio=True), - dict(type='RandomFlip', flip_ratio=0.5), - dict(type='Normalize', **img_norm_cfg), - dict(type='Pad', size_divisor=32), - dict(type='DefaultFormatBundle'), - dict(type='Collect', keys=['img', 'gt_bboxes', 'gt_labels']), -] -test_pipeline = [ - dict(type='LoadImageFromFile'), - dict( - type='MultiScaleFlipAug', - img_scale=(1333, 800), - flip=False, - transforms=[ - dict(type='Resize', keep_ratio=True), - dict(type='RandomFlip'), - dict(type='Normalize', **img_norm_cfg), - dict(type='Pad', size_divisor=32), - dict(type='ImageToTensor', keys=['img']), - dict(type='Collect', keys=['img']), - ]) -] -data = dict( - imgs_per_gpu=2, - workers_per_gpu=2, - train=dict( - type=dataset_type, - ann_file=data_root + 'annotations/instances_train2017.json', - img_prefix=data_root + 'train2017/', - pipeline=train_pipeline), - val=dict( - type=dataset_type, - ann_file=data_root + 'annotations/instances_val2017.json', - img_prefix=data_root + 'val2017/', - pipeline=test_pipeline), - test=dict( - type=dataset_type, - ann_file=data_root + 'annotations/instances_val2017.json', - img_prefix=data_root + 'val2017/', - pipeline=test_pipeline)) -# optimizer -optimizer = dict(type='SGD', lr=0.01, momentum=0.9, weight_decay=0.0001) -optimizer_config = dict(grad_clip=dict(max_norm=35, norm_type=2)) -# learning policy -lr_config = dict( - policy='step', - warmup='linear', - warmup_iters=500, - warmup_ratio=1.0 / 3, - step=[16, 22]) -checkpoint_config = dict(interval=1) -# yapf:disable -log_config = dict( - interval=50, - hooks=[ - dict(type='TextLoggerHook'), - # dict(type='TensorboardLoggerHook') - ]) -# yapf:enable -# runtime settings -total_epochs = 24 -device_ids = range(8) -dist_params = dict(backend='nccl') -log_level = 'INFO' -work_dir = './work_dirs/reppoints_moment_r101_fpn_2x' -load_from = None -resume_from = None -auto_resume = True -workflow = [('train', 1)] diff --git 
a/configs/reppoints/reppoints_moment_r101_fpn_2x_mt.py b/configs/reppoints/reppoints_moment_r101_fpn_2x_mt.py deleted file mode 100644 index 2f481e7ac66197a886a97cb1c81fadc5608e828c..0000000000000000000000000000000000000000 --- a/configs/reppoints/reppoints_moment_r101_fpn_2x_mt.py +++ /dev/null @@ -1,146 +0,0 @@ -# model settings -norm_cfg = dict(type='GN', num_groups=32, requires_grad=True) - -model = dict( - type='RepPointsDetector', - pretrained='torchvision://resnet101', - backbone=dict( - type='ResNet', - depth=101, - num_stages=4, - out_indices=(0, 1, 2, 3), - frozen_stages=1, - style='pytorch'), - neck=dict( - type='FPN', - in_channels=[256, 512, 1024, 2048], - out_channels=256, - start_level=1, - add_extra_convs=True, - num_outs=5, - norm_cfg=norm_cfg), - bbox_head=dict( - type='RepPointsHead', - num_classes=81, - in_channels=256, - feat_channels=256, - point_feat_channels=256, - stacked_convs=3, - num_points=9, - gradient_mul=0.1, - point_strides=[8, 16, 32, 64, 128], - point_base_scale=4, - norm_cfg=norm_cfg, - loss_cls=dict( - type='FocalLoss', - use_sigmoid=True, - gamma=2.0, - alpha=0.25, - loss_weight=1.0), - loss_bbox_init=dict(type='SmoothL1Loss', beta=0.11, loss_weight=0.5), - loss_bbox_refine=dict(type='SmoothL1Loss', beta=0.11, loss_weight=1.0), - transform_method='moment')) -# training and testing settings -train_cfg = dict( - init=dict( - assigner=dict(type='PointAssigner', scale=4, pos_num=1), - allowed_border=-1, - pos_weight=-1, - debug=False), - refine=dict( - assigner=dict( - type='MaxIoUAssigner', - pos_iou_thr=0.5, - neg_iou_thr=0.4, - min_pos_iou=0, - ignore_iof_thr=-1), - allowed_border=-1, - pos_weight=-1, - debug=False)) -test_cfg = dict( - nms_pre=1000, - min_bbox_size=0, - score_thr=0.05, - nms=dict(type='nms', iou_thr=0.5), - max_per_img=100) -# dataset settings -dataset_type = 'CocoDataset' -data_root = 'data/coco/' -img_norm_cfg = dict( - mean=[123.675, 116.28, 103.53], std=[58.395, 57.12, 57.375], to_rgb=True) -train_pipeline = [ - dict(type='LoadImageFromFile'), - dict(type='LoadAnnotations', with_bbox=True), - dict( - type='Resize', - img_scale=[(1333, 480), (1333, 960)], - keep_ratio=True, - multiscale_mode='range'), - dict(type='RandomFlip', flip_ratio=0.5), - dict(type='Normalize', **img_norm_cfg), - dict(type='Pad', size_divisor=32), - dict(type='DefaultFormatBundle'), - dict(type='Collect', keys=['img', 'gt_bboxes', 'gt_labels']), -] -test_pipeline = [ - dict(type='LoadImageFromFile'), - dict( - type='MultiScaleFlipAug', - img_scale=(1333, 800), - flip=False, - transforms=[ - dict(type='Resize', keep_ratio=True), - dict(type='RandomFlip'), - dict(type='Normalize', **img_norm_cfg), - dict(type='Pad', size_divisor=32), - dict(type='ImageToTensor', keys=['img']), - dict(type='Collect', keys=['img']), - ]) -] -data = dict( - imgs_per_gpu=2, - workers_per_gpu=2, - train=dict( - type=dataset_type, - ann_file=data_root + 'annotations/instances_train2017.json', - img_prefix=data_root + 'train2017/', - pipeline=train_pipeline), - val=dict( - type=dataset_type, - ann_file=data_root + 'annotations/instances_val2017.json', - img_prefix=data_root + 'val2017/', - pipeline=test_pipeline), - test=dict( - type=dataset_type, - ann_file=data_root + 'annotations/instances_val2017.json', - img_prefix=data_root + 'val2017/', - pipeline=test_pipeline)) -# optimizer -optimizer = dict(type='SGD', lr=0.01, momentum=0.9, weight_decay=0.0001) -optimizer_config = dict(grad_clip=dict(max_norm=35, norm_type=2)) -# learning policy -lr_config = dict( - policy='step', - 
warmup='linear', - warmup_iters=500, - warmup_ratio=1.0 / 3, - step=[16, 22]) -checkpoint_config = dict(interval=1) -# yapf:disable -log_config = dict( - interval=50, - hooks=[ - dict(type='TextLoggerHook'), - # dict(type='TensorboardLoggerHook') - ]) -# yapf:enable -# runtime settings -total_epochs = 24 -device_ids = range(8) -dist_params = dict(backend='nccl') -log_level = 'INFO' -work_dir = './work_dirs/reppoints_moment_r101_fpn_2x_mt' -load_from = None -resume_from = None -auto_resume = True -workflow = [('train', 1)] diff --git a/configs/reppoints/reppoints_moment_r50_fpn_1x.py b/configs/reppoints/reppoints_moment_r50_fpn_1x.py deleted file mode 100644 index 671b9e2655b5ca886febd2b09937e4206b272533..0000000000000000000000000000000000000000 --- a/configs/reppoints/reppoints_moment_r50_fpn_1x.py +++ /dev/null @@ -1,142 +0,0 @@ -# model settings -norm_cfg = dict(type='GN', num_groups=32, requires_grad=True) - -model = dict( - type='RepPointsDetector', - pretrained='torchvision://resnet50', - backbone=dict( - type='ResNet', - depth=50, - num_stages=4, - out_indices=(0, 1, 2, 3), - frozen_stages=1, - style='pytorch'), - neck=dict( - type='FPN', - in_channels=[256, 512, 1024, 2048], - out_channels=256, - start_level=1, - add_extra_convs=True, - num_outs=5, - norm_cfg=norm_cfg), - bbox_head=dict( - type='RepPointsHead', - num_classes=81, - in_channels=256, - feat_channels=256, - point_feat_channels=256, - stacked_convs=3, - num_points=9, - gradient_mul=0.1, - point_strides=[8, 16, 32, 64, 128], - point_base_scale=4, - norm_cfg=norm_cfg, - loss_cls=dict( - type='FocalLoss', - use_sigmoid=True, - gamma=2.0, - alpha=0.25, - loss_weight=1.0), - loss_bbox_init=dict(type='SmoothL1Loss', beta=0.11, loss_weight=0.5), - loss_bbox_refine=dict(type='SmoothL1Loss', beta=0.11, loss_weight=1.0), - transform_method='moment')) -# training and testing settings -train_cfg = dict( - init=dict( - assigner=dict(type='PointAssigner', scale=4, pos_num=1), - allowed_border=-1, - pos_weight=-1, - debug=False), - refine=dict( - assigner=dict( - type='MaxIoUAssigner', - pos_iou_thr=0.5, - neg_iou_thr=0.4, - min_pos_iou=0, - ignore_iof_thr=-1), - allowed_border=-1, - pos_weight=-1, - debug=False)) -test_cfg = dict( - nms_pre=1000, - min_bbox_size=0, - score_thr=0.05, - nms=dict(type='nms', iou_thr=0.5), - max_per_img=100) -# dataset settings -dataset_type = 'CocoDataset' -data_root = 'data/coco/' -img_norm_cfg = dict( - mean=[123.675, 116.28, 103.53], std=[58.395, 57.12, 57.375], to_rgb=True) -train_pipeline = [ - dict(type='LoadImageFromFile'), - dict(type='LoadAnnotations', with_bbox=True), - dict(type='Resize', img_scale=(1333, 800), keep_ratio=True), - dict(type='RandomFlip', flip_ratio=0.5), - dict(type='Normalize', **img_norm_cfg), - dict(type='Pad', size_divisor=32), - dict(type='DefaultFormatBundle'), - dict(type='Collect', keys=['img', 'gt_bboxes', 'gt_labels']), -] -test_pipeline = [ - dict(type='LoadImageFromFile'), - dict( - type='MultiScaleFlipAug', - img_scale=(1333, 800), - flip=False, - transforms=[ - dict(type='Resize', keep_ratio=True), - dict(type='RandomFlip'), - dict(type='Normalize', **img_norm_cfg), - dict(type='Pad', size_divisor=32), - dict(type='ImageToTensor', keys=['img']), - dict(type='Collect', keys=['img']), - ]) -] -data = dict( - imgs_per_gpu=2, - workers_per_gpu=2, - train=dict( - type=dataset_type, - ann_file=data_root + 'annotations/instances_train2017.json', - img_prefix=data_root + 'train2017/', - pipeline=train_pipeline), - val=dict( - type=dataset_type, - ann_file=data_root + 
'annotations/instances_val2017.json', - img_prefix=data_root + 'val2017/', - pipeline=test_pipeline), - test=dict( - type=dataset_type, - ann_file=data_root + 'annotations/instances_val2017.json', - img_prefix=data_root + 'val2017/', - pipeline=test_pipeline)) -# optimizer -optimizer = dict(type='SGD', lr=0.01, momentum=0.9, weight_decay=0.0001) -optimizer_config = dict(grad_clip=dict(max_norm=35, norm_type=2)) -# learning policy -lr_config = dict( - policy='step', - warmup='linear', - warmup_iters=500, - warmup_ratio=1.0 / 3, - step=[8, 11]) -checkpoint_config = dict(interval=1) -# yapf:disable -log_config = dict( - interval=50, - hooks=[ - dict(type='TextLoggerHook'), - # dict(type='TensorboardLoggerHook') - ]) -# yapf:enable -# runtime settings -total_epochs = 12 -device_ids = range(8) -dist_params = dict(backend='nccl') -log_level = 'INFO' -work_dir = './work_dirs/reppoints_moment_r50_fpn_1x' -load_from = None -resume_from = None -auto_resume = True -workflow = [('train', 1)] diff --git a/configs/reppoints/reppoints_moment_r50_fpn_2x.py b/configs/reppoints/reppoints_moment_r50_fpn_2x.py deleted file mode 100644 index 53824301418db1e46a467fe2547b0f2a5b420c32..0000000000000000000000000000000000000000 --- a/configs/reppoints/reppoints_moment_r50_fpn_2x.py +++ /dev/null @@ -1,142 +0,0 @@ -# model settings -norm_cfg = dict(type='GN', num_groups=32, requires_grad=True) - -model = dict( - type='RepPointsDetector', - pretrained='torchvision://resnet50', - backbone=dict( - type='ResNet', - depth=50, - num_stages=4, - out_indices=(0, 1, 2, 3), - frozen_stages=1, - style='pytorch'), - neck=dict( - type='FPN', - in_channels=[256, 512, 1024, 2048], - out_channels=256, - start_level=1, - add_extra_convs=True, - num_outs=5, - norm_cfg=norm_cfg), - bbox_head=dict( - type='RepPointsHead', - num_classes=81, - in_channels=256, - feat_channels=256, - point_feat_channels=256, - stacked_convs=3, - num_points=9, - gradient_mul=0.1, - point_strides=[8, 16, 32, 64, 128], - point_base_scale=4, - norm_cfg=norm_cfg, - loss_cls=dict( - type='FocalLoss', - use_sigmoid=True, - gamma=2.0, - alpha=0.25, - loss_weight=1.0), - loss_bbox_init=dict(type='SmoothL1Loss', beta=0.11, loss_weight=0.5), - loss_bbox_refine=dict(type='SmoothL1Loss', beta=0.11, loss_weight=1.0), - transform_method='moment')) -# training and testing settings -train_cfg = dict( - init=dict( - assigner=dict(type='PointAssigner', scale=4, pos_num=1), - allowed_border=-1, - pos_weight=-1, - debug=False), - refine=dict( - assigner=dict( - type='MaxIoUAssigner', - pos_iou_thr=0.5, - neg_iou_thr=0.4, - min_pos_iou=0, - ignore_iof_thr=-1), - allowed_border=-1, - pos_weight=-1, - debug=False)) -test_cfg = dict( - nms_pre=1000, - min_bbox_size=0, - score_thr=0.05, - nms=dict(type='nms', iou_thr=0.5), - max_per_img=100) -# dataset settings -dataset_type = 'CocoDataset' -data_root = 'data/coco/' -img_norm_cfg = dict( - mean=[123.675, 116.28, 103.53], std=[58.395, 57.12, 57.375], to_rgb=True) -train_pipeline = [ - dict(type='LoadImageFromFile'), - dict(type='LoadAnnotations', with_bbox=True), - dict(type='Resize', img_scale=(1333, 800), keep_ratio=True), - dict(type='RandomFlip', flip_ratio=0.5), - dict(type='Normalize', **img_norm_cfg), - dict(type='Pad', size_divisor=32), - dict(type='DefaultFormatBundle'), - dict(type='Collect', keys=['img', 'gt_bboxes', 'gt_labels']), -] -test_pipeline = [ - dict(type='LoadImageFromFile'), - dict( - type='MultiScaleFlipAug', - img_scale=(1333, 800), - flip=False, - transforms=[ - dict(type='Resize', keep_ratio=True), - 
dict(type='RandomFlip'), - dict(type='Normalize', **img_norm_cfg), - dict(type='Pad', size_divisor=32), - dict(type='ImageToTensor', keys=['img']), - dict(type='Collect', keys=['img']), - ]) -] -data = dict( - imgs_per_gpu=2, - workers_per_gpu=2, - train=dict( - type=dataset_type, - ann_file=data_root + 'annotations/instances_train2017.json', - img_prefix=data_root + 'train2017/', - pipeline=train_pipeline), - val=dict( - type=dataset_type, - ann_file=data_root + 'annotations/instances_val2017.json', - img_prefix=data_root + 'val2017/', - pipeline=test_pipeline), - test=dict( - type=dataset_type, - ann_file=data_root + 'annotations/instances_val2017.json', - img_prefix=data_root + 'val2017/', - pipeline=test_pipeline)) -# optimizer -optimizer = dict(type='SGD', lr=0.01, momentum=0.9, weight_decay=0.0001) -optimizer_config = dict(grad_clip=dict(max_norm=35, norm_type=2)) -# learning policy -lr_config = dict( - policy='step', - warmup='linear', - warmup_iters=500, - warmup_ratio=1.0 / 3, - step=[16, 22]) -checkpoint_config = dict(interval=1) -# yapf:disable -log_config = dict( - interval=50, - hooks=[ - dict(type='TextLoggerHook'), - # dict(type='TensorboardLoggerHook') - ]) -# yapf:enable -# runtime settings -total_epochs = 24 -device_ids = range(8) -dist_params = dict(backend='nccl') -log_level = 'INFO' -work_dir = './work_dirs/reppoints_moment_r50_fpn_2x' -load_from = None -resume_from = None -auto_resume = True -workflow = [('train', 1)] diff --git a/configs/reppoints/reppoints_moment_r50_fpn_2x_mt.py b/configs/reppoints/reppoints_moment_r50_fpn_2x_mt.py deleted file mode 100644 index ad86d74c458000784ee1d2c832a23500cafe8169..0000000000000000000000000000000000000000 --- a/configs/reppoints/reppoints_moment_r50_fpn_2x_mt.py +++ /dev/null @@ -1,146 +0,0 @@ -# model settings -norm_cfg = dict(type='GN', num_groups=32, requires_grad=True) - -model = dict( - type='RepPointsDetector', - pretrained='torchvision://resnet50', - backbone=dict( - type='ResNet', - depth=50, - num_stages=4, - out_indices=(0, 1, 2, 3), - frozen_stages=1, - style='pytorch'), - neck=dict( - type='FPN', - in_channels=[256, 512, 1024, 2048], - out_channels=256, - start_level=1, - add_extra_convs=True, - num_outs=5, - norm_cfg=norm_cfg), - bbox_head=dict( - type='RepPointsHead', - num_classes=81, - in_channels=256, - feat_channels=256, - point_feat_channels=256, - stacked_convs=3, - num_points=9, - gradient_mul=0.1, - point_strides=[8, 16, 32, 64, 128], - point_base_scale=4, - norm_cfg=norm_cfg, - loss_cls=dict( - type='FocalLoss', - use_sigmoid=True, - gamma=2.0, - alpha=0.25, - loss_weight=1.0), - loss_bbox_init=dict(type='SmoothL1Loss', beta=0.11, loss_weight=0.5), - loss_bbox_refine=dict(type='SmoothL1Loss', beta=0.11, loss_weight=1.0), - transform_method='moment')) -# training and testing settings -train_cfg = dict( - init=dict( - assigner=dict(type='PointAssigner', scale=4, pos_num=1), - allowed_border=-1, - pos_weight=-1, - debug=False), - refine=dict( - assigner=dict( - type='MaxIoUAssigner', - pos_iou_thr=0.5, - neg_iou_thr=0.4, - min_pos_iou=0, - ignore_iof_thr=-1), - allowed_border=-1, - pos_weight=-1, - debug=False)) -test_cfg = dict( - nms_pre=1000, - min_bbox_size=0, - score_thr=0.05, - nms=dict(type='nms', iou_thr=0.5), - max_per_img=100) -# dataset settings -dataset_type = 'CocoDataset' -data_root = 'data/coco/' -img_norm_cfg = dict( - mean=[123.675, 116.28, 103.53], std=[58.395, 57.12, 57.375], to_rgb=True) -train_pipeline = [ - dict(type='LoadImageFromFile'), - dict(type='LoadAnnotations', with_bbox=True), 
- dict( - type='Resize', - img_scale=[(1333, 480), (1333, 960)], - keep_ratio=True, - multiscale_mode='range'), - dict(type='RandomFlip', flip_ratio=0.5), - dict(type='Normalize', **img_norm_cfg), - dict(type='Pad', size_divisor=32), - dict(type='DefaultFormatBundle'), - dict(type='Collect', keys=['img', 'gt_bboxes', 'gt_labels']), -] -test_pipeline = [ - dict(type='LoadImageFromFile'), - dict( - type='MultiScaleFlipAug', - img_scale=(1333, 800), - flip=False, - transforms=[ - dict(type='Resize', keep_ratio=True), - dict(type='RandomFlip'), - dict(type='Normalize', **img_norm_cfg), - dict(type='Pad', size_divisor=32), - dict(type='ImageToTensor', keys=['img']), - dict(type='Collect', keys=['img']), - ]) -] -data = dict( - imgs_per_gpu=2, - workers_per_gpu=2, - train=dict( - type=dataset_type, - ann_file=data_root + 'annotations/instances_train2017.json', - img_prefix=data_root + 'train2017/', - pipeline=train_pipeline), - val=dict( - type=dataset_type, - ann_file=data_root + 'annotations/instances_val2017.json', - img_prefix=data_root + 'val2017/', - pipeline=test_pipeline), - test=dict( - type=dataset_type, - ann_file=data_root + 'annotations/instances_val2017.json', - img_prefix=data_root + 'val2017/', - pipeline=test_pipeline)) -# optimizer -optimizer = dict(type='SGD', lr=0.01, momentum=0.9, weight_decay=0.0001) -optimizer_config = dict(grad_clip=dict(max_norm=35, norm_type=2)) -# learning policy -lr_config = dict( - policy='step', - warmup='linear', - warmup_iters=500, - warmup_ratio=1.0 / 3, - step=[16, 22]) -checkpoint_config = dict(interval=1) -# yapf:disable -log_config = dict( - interval=50, - hooks=[ - dict(type='TextLoggerHook'), - # dict(type='TensorboardLoggerHook') - ]) -# yapf:enable -# runtime settings -total_epochs = 24 -device_ids = range(8) -dist_params = dict(backend='nccl') -log_level = 'INFO' -work_dir = './work_dirs/reppoints_moment_r50_fpn_2x_mt' -load_from = None -resume_from = None -auto_resume = True -workflow = [('train', 1)] diff --git a/configs/reppoints/reppoints_moment_x101_dcn_fpn_2x.py b/configs/reppoints/reppoints_moment_x101_dcn_fpn_2x.py deleted file mode 100644 index bc0bd6636d37d5c5cd95a4c7af9fe0cbc6c7d3c2..0000000000000000000000000000000000000000 --- a/configs/reppoints/reppoints_moment_x101_dcn_fpn_2x.py +++ /dev/null @@ -1,150 +0,0 @@ -# model settings -norm_cfg = dict(type='GN', num_groups=32, requires_grad=True) - -model = dict( - type='RepPointsDetector', - pretrained='open-mmlab://resnext101_32x4d', - backbone=dict( - type='ResNeXt', - depth=101, - groups=32, - base_width=4, - num_stages=4, - out_indices=(0, 1, 2, 3), - frozen_stages=1, - style='pytorch', - dcn=dict( - modulated=False, - groups=32, - deformable_groups=1, - fallback_on_stride=False), - stage_with_dcn=(False, True, True, True)), - neck=dict( - type='FPN', - in_channels=[256, 512, 1024, 2048], - out_channels=256, - start_level=1, - add_extra_convs=True, - num_outs=5, - norm_cfg=norm_cfg), - bbox_head=dict( - type='RepPointsHead', - num_classes=81, - in_channels=256, - feat_channels=256, - point_feat_channels=256, - stacked_convs=3, - num_points=9, - gradient_mul=0.1, - point_strides=[8, 16, 32, 64, 128], - point_base_scale=4, - norm_cfg=norm_cfg, - loss_cls=dict( - type='FocalLoss', - use_sigmoid=True, - gamma=2.0, - alpha=0.25, - loss_weight=1.0), - loss_bbox_init=dict(type='SmoothL1Loss', beta=0.11, loss_weight=0.5), - loss_bbox_refine=dict(type='SmoothL1Loss', beta=0.11, loss_weight=1.0), - transform_method='moment')) -# training and testing settings -train_cfg = dict( - 
init=dict( - assigner=dict(type='PointAssigner', scale=4, pos_num=1), - allowed_border=-1, - pos_weight=-1, - debug=False), - refine=dict( - assigner=dict( - type='MaxIoUAssigner', - pos_iou_thr=0.5, - neg_iou_thr=0.4, - min_pos_iou=0, - ignore_iof_thr=-1), - allowed_border=-1, - pos_weight=-1, - debug=False)) -test_cfg = dict( - nms_pre=1000, - min_bbox_size=0, - score_thr=0.05, - nms=dict(type='nms', iou_thr=0.5), - max_per_img=100) -# dataset settings -dataset_type = 'CocoDataset' -data_root = 'data/coco/' -img_norm_cfg = dict( - mean=[123.675, 116.28, 103.53], std=[58.395, 57.12, 57.375], to_rgb=True) -train_pipeline = [ - dict(type='LoadImageFromFile'), - dict(type='LoadAnnotations', with_bbox=True), - dict(type='Resize', img_scale=(1333, 800), keep_ratio=True), - dict(type='RandomFlip', flip_ratio=0.5), - dict(type='Normalize', **img_norm_cfg), - dict(type='Pad', size_divisor=32), - dict(type='DefaultFormatBundle'), - dict(type='Collect', keys=['img', 'gt_bboxes', 'gt_labels']), -] -test_pipeline = [ - dict(type='LoadImageFromFile'), - dict( - type='MultiScaleFlipAug', - img_scale=(1333, 800), - flip=False, - transforms=[ - dict(type='Resize', keep_ratio=True), - dict(type='RandomFlip'), - dict(type='Normalize', **img_norm_cfg), - dict(type='Pad', size_divisor=32), - dict(type='ImageToTensor', keys=['img']), - dict(type='Collect', keys=['img']), - ]) -] -data = dict( - imgs_per_gpu=2, - workers_per_gpu=2, - train=dict( - type=dataset_type, - ann_file=data_root + 'annotations/instances_train2017.json', - img_prefix=data_root + 'train2017/', - pipeline=train_pipeline), - val=dict( - type=dataset_type, - ann_file=data_root + 'annotations/instances_val2017.json', - img_prefix=data_root + 'val2017/', - pipeline=test_pipeline), - test=dict( - type=dataset_type, - ann_file=data_root + 'annotations/instances_val2017.json', - img_prefix=data_root + 'val2017/', - pipeline=test_pipeline)) -# optimizer -optimizer = dict(type='SGD', lr=0.01, momentum=0.9, weight_decay=0.0001) -optimizer_config = dict(grad_clip=dict(max_norm=35, norm_type=2)) -# learning policy -lr_config = dict( - policy='step', - warmup='linear', - warmup_iters=500, - warmup_ratio=1.0 / 3, - step=[16, 22]) -checkpoint_config = dict(interval=1) -# yapf:disable -log_config = dict( - interval=50, - hooks=[ - dict(type='TextLoggerHook'), - # dict(type='TensorboardLoggerHook') - ]) -# yapf:enable -# runtime settings -total_epochs = 24 -device_ids = range(8) -dist_params = dict(backend='nccl') -log_level = 'INFO' -work_dir = './work_dirs/reppoints_moment_x101_dcn_fpn_2x' -load_from = None -resume_from = None -auto_resume = True -workflow = [('train', 1)] diff --git a/configs/reppoints/reppoints_moment_x101_dcn_fpn_2x_mt.py b/configs/reppoints/reppoints_moment_x101_dcn_fpn_2x_mt.py deleted file mode 100644 index 93b5ac83abde9dc530f4105fd0ca71d9c42d329c..0000000000000000000000000000000000000000 --- a/configs/reppoints/reppoints_moment_x101_dcn_fpn_2x_mt.py +++ /dev/null @@ -1,154 +0,0 @@ -# model settings -norm_cfg = dict(type='GN', num_groups=32, requires_grad=True) - -model = dict( - type='RepPointsDetector', - pretrained='open-mmlab://resnext101_32x4d', - backbone=dict( - type='ResNeXt', - depth=101, - groups=32, - base_width=4, - num_stages=4, - out_indices=(0, 1, 2, 3), - frozen_stages=1, - style='pytorch', - dcn=dict( - modulated=False, - groups=32, - deformable_groups=1, - fallback_on_stride=False), - stage_with_dcn=(False, True, True, True)), - neck=dict( - type='FPN', - in_channels=[256, 512, 1024, 2048], - out_channels=256, - 
start_level=1, - add_extra_convs=True, - num_outs=5, - norm_cfg=norm_cfg), - bbox_head=dict( - type='RepPointsHead', - num_classes=81, - in_channels=256, - feat_channels=256, - point_feat_channels=256, - stacked_convs=3, - num_points=9, - gradient_mul=0.1, - point_strides=[8, 16, 32, 64, 128], - point_base_scale=4, - norm_cfg=norm_cfg, - loss_cls=dict( - type='FocalLoss', - use_sigmoid=True, - gamma=2.0, - alpha=0.25, - loss_weight=1.0), - loss_bbox_init=dict(type='SmoothL1Loss', beta=0.11, loss_weight=0.5), - loss_bbox_refine=dict(type='SmoothL1Loss', beta=0.11, loss_weight=1.0), - transform_method='moment')) -# training and testing settings -train_cfg = dict( - init=dict( - assigner=dict(type='PointAssigner', scale=4, pos_num=1), - allowed_border=-1, - pos_weight=-1, - debug=False), - refine=dict( - assigner=dict( - type='MaxIoUAssigner', - pos_iou_thr=0.5, - neg_iou_thr=0.4, - min_pos_iou=0, - ignore_iof_thr=-1), - allowed_border=-1, - pos_weight=-1, - debug=False)) -test_cfg = dict( - nms_pre=1000, - min_bbox_size=0, - score_thr=0.05, - nms=dict(type='nms', iou_thr=0.5), - max_per_img=100) -# dataset settings -dataset_type = 'CocoDataset' -data_root = 'data/coco/' -img_norm_cfg = dict( - mean=[123.675, 116.28, 103.53], std=[58.395, 57.12, 57.375], to_rgb=True) -train_pipeline = [ - dict(type='LoadImageFromFile'), - dict(type='LoadAnnotations', with_bbox=True), - dict( - type='Resize', - img_scale=[(1333, 480), (1333, 960)], - keep_ratio=True, - multiscale_mode='range'), - dict(type='RandomFlip', flip_ratio=0.5), - dict(type='Normalize', **img_norm_cfg), - dict(type='Pad', size_divisor=32), - dict(type='DefaultFormatBundle'), - dict(type='Collect', keys=['img', 'gt_bboxes', 'gt_labels']), -] -test_pipeline = [ - dict(type='LoadImageFromFile'), - dict( - type='MultiScaleFlipAug', - img_scale=(1333, 800), - flip=False, - transforms=[ - dict(type='Resize', keep_ratio=True), - dict(type='RandomFlip'), - dict(type='Normalize', **img_norm_cfg), - dict(type='Pad', size_divisor=32), - dict(type='ImageToTensor', keys=['img']), - dict(type='Collect', keys=['img']), - ]) -] -data = dict( - imgs_per_gpu=2, - workers_per_gpu=2, - train=dict( - type=dataset_type, - ann_file=data_root + 'annotations/instances_train2017.json', - img_prefix=data_root + 'train2017/', - pipeline=train_pipeline), - val=dict( - type=dataset_type, - ann_file=data_root + 'annotations/instances_val2017.json', - img_prefix=data_root + 'val2017/', - pipeline=test_pipeline), - test=dict( - type=dataset_type, - ann_file=data_root + 'annotations/instances_val2017.json', - img_prefix=data_root + 'val2017/', - pipeline=test_pipeline)) -# optimizer -optimizer = dict(type='SGD', lr=0.01, momentum=0.9, weight_decay=0.0001) -optimizer_config = dict(grad_clip=dict(max_norm=35, norm_type=2)) -# learning policy -lr_config = dict( - policy='step', - warmup='linear', - warmup_iters=500, - warmup_ratio=1.0 / 3, - step=[16, 22]) -checkpoint_config = dict(interval=1) -# yapf:disable -log_config = dict( - interval=50, - hooks=[ - dict(type='TextLoggerHook'), - # dict(type='TensorboardLoggerHook') - ]) -# yapf:enable -# runtime settings -total_epochs = 24 -device_ids = range(8) -dist_params = dict(backend='nccl') -log_level = 'INFO' -work_dir = './work_dirs/reppoints_moment_x101_dcn_fpn_2x_mt' -load_from = None -resume_from = None -auto_resume = True -workflow = [('train', 1)] diff --git a/configs/reppoints/reppoints_partial_minmax_r50_fpn_1x.py b/configs/reppoints/reppoints_partial_minmax_r50_fpn_1x.py deleted file mode 100644 index 
2296163c8a254defb65e0340bb796f3d45c127d8..0000000000000000000000000000000000000000 --- a/configs/reppoints/reppoints_partial_minmax_r50_fpn_1x.py +++ /dev/null @@ -1,142 +0,0 @@ -# model settings -norm_cfg = dict(type='GN', num_groups=32, requires_grad=True) - -model = dict( - type='RepPointsDetector', - pretrained='torchvision://resnet50', - backbone=dict( - type='ResNet', - depth=50, - num_stages=4, - out_indices=(0, 1, 2, 3), - frozen_stages=1, - style='pytorch'), - neck=dict( - type='FPN', - in_channels=[256, 512, 1024, 2048], - out_channels=256, - start_level=1, - add_extra_convs=True, - num_outs=5, - norm_cfg=norm_cfg), - bbox_head=dict( - type='RepPointsHead', - num_classes=81, - in_channels=256, - feat_channels=256, - point_feat_channels=256, - stacked_convs=3, - num_points=9, - gradient_mul=0.1, - point_strides=[8, 16, 32, 64, 128], - point_base_scale=4, - norm_cfg=norm_cfg, - loss_cls=dict( - type='FocalLoss', - use_sigmoid=True, - gamma=2.0, - alpha=0.25, - loss_weight=1.0), - loss_bbox_init=dict(type='SmoothL1Loss', beta=0.11, loss_weight=0.5), - loss_bbox_refine=dict(type='SmoothL1Loss', beta=0.11, loss_weight=1.0), - transform_method='partial_minmax')) -# training and testing settings -train_cfg = dict( - init=dict( - assigner=dict(type='PointAssigner', scale=4, pos_num=1), - allowed_border=-1, - pos_weight=-1, - debug=False), - refine=dict( - assigner=dict( - type='MaxIoUAssigner', - pos_iou_thr=0.5, - neg_iou_thr=0.4, - min_pos_iou=0, - ignore_iof_thr=-1), - allowed_border=-1, - pos_weight=-1, - debug=False)) -test_cfg = dict( - nms_pre=1000, - min_bbox_size=0, - score_thr=0.05, - nms=dict(type='nms', iou_thr=0.5), - max_per_img=100) -# dataset settings -dataset_type = 'CocoDataset' -data_root = 'data/coco/' -img_norm_cfg = dict( - mean=[123.675, 116.28, 103.53], std=[58.395, 57.12, 57.375], to_rgb=True) -train_pipeline = [ - dict(type='LoadImageFromFile'), - dict(type='LoadAnnotations', with_bbox=True), - dict(type='Resize', img_scale=(1333, 800), keep_ratio=True), - dict(type='RandomFlip', flip_ratio=0.5), - dict(type='Normalize', **img_norm_cfg), - dict(type='Pad', size_divisor=32), - dict(type='DefaultFormatBundle'), - dict(type='Collect', keys=['img', 'gt_bboxes', 'gt_labels']), -] -test_pipeline = [ - dict(type='LoadImageFromFile'), - dict( - type='MultiScaleFlipAug', - img_scale=(1333, 800), - flip=False, - transforms=[ - dict(type='Resize', keep_ratio=True), - dict(type='RandomFlip'), - dict(type='Normalize', **img_norm_cfg), - dict(type='Pad', size_divisor=32), - dict(type='ImageToTensor', keys=['img']), - dict(type='Collect', keys=['img']), - ]) -] -data = dict( - imgs_per_gpu=2, - workers_per_gpu=2, - train=dict( - type=dataset_type, - ann_file=data_root + 'annotations/instances_train2017.json', - img_prefix=data_root + 'train2017/', - pipeline=train_pipeline), - val=dict( - type=dataset_type, - ann_file=data_root + 'annotations/instances_val2017.json', - img_prefix=data_root + 'val2017/', - pipeline=test_pipeline), - test=dict( - type=dataset_type, - ann_file=data_root + 'annotations/instances_val2017.json', - img_prefix=data_root + 'val2017/', - pipeline=test_pipeline)) -# optimizer -optimizer = dict(type='SGD', lr=0.01, momentum=0.9, weight_decay=0.0001) -optimizer_config = dict(grad_clip=dict(max_norm=35, norm_type=2)) -# learning policy -lr_config = dict( - policy='step', - warmup='linear', - warmup_iters=500, - warmup_ratio=1.0 / 3, - step=[8, 11]) -checkpoint_config = dict(interval=1) -# yapf:disable -log_config = dict( - interval=50, - hooks=[ - 
dict(type='TextLoggerHook'), - # dict(type='TensorboardLoggerHook') - ]) -# yapf:enable -# runtime settings -total_epochs = 12 -device_ids = range(8) -dist_params = dict(backend='nccl') -log_level = 'INFO' -work_dir = './work_dirs/reppoints_partial_minmax_r50_fpn_1x' -load_from = None -resume_from = None -auto_resume = True -workflow = [('train', 1)] diff --git a/mmdet/core/anchor/__init__.py b/mmdet/core/anchor/__init__.py index dfeb3b407300beee4d530c111b4187693b190e2b..a5f070f8bbe9a7293acf13fd89d480827eba3a69 100644 --- a/mmdet/core/anchor/__init__.py +++ b/mmdet/core/anchor/__init__.py @@ -1,10 +1,8 @@ from .anchor_generator import AnchorGenerator from .anchor_target import anchor_inside_flags, anchor_target from .guided_anchor_target import ga_loc_target, ga_shape_target -from .point_generator import PointGenerator -from .point_target import point_target __all__ = [ 'AnchorGenerator', 'anchor_target', 'anchor_inside_flags', 'ga_loc_target', - 'ga_shape_target', 'PointGenerator', 'point_target' + 'ga_shape_target' ] diff --git a/mmdet/core/anchor/point_generator.py b/mmdet/core/anchor/point_generator.py deleted file mode 100644 index c1a34dddd7a76946cf8177f0aea529a29cfa4a78..0000000000000000000000000000000000000000 --- a/mmdet/core/anchor/point_generator.py +++ /dev/null @@ -1,34 +0,0 @@ -import torch - - -class PointGenerator(object): - - def _meshgrid(self, x, y, row_major=True): - xx = x.repeat(len(y)) - yy = y.view(-1, 1).repeat(1, len(x)).view(-1) - if row_major: - return xx, yy - else: - return yy, xx - - def grid_points(self, featmap_size, stride=16, device='cuda'): - feat_h, feat_w = featmap_size - shift_x = torch.arange(0., feat_w, device=device) * stride - shift_y = torch.arange(0., feat_h, device=device) * stride - shift_xx, shift_yy = self._meshgrid(shift_x, shift_y) - stride = shift_x.new_full((shift_xx.shape[0], ), stride) - shifts = torch.stack([shift_xx, shift_yy, stride], dim=-1) - all_points = shifts.to(device) - return all_points - - def valid_flags(self, featmap_size, valid_size, device='cuda'): - feat_h, feat_w = featmap_size - valid_h, valid_w = valid_size - assert valid_h <= feat_h and valid_w <= feat_w - valid_x = torch.zeros(feat_w, dtype=torch.uint8, device=device) - valid_y = torch.zeros(feat_h, dtype=torch.uint8, device=device) - valid_x[:valid_w] = 1 - valid_y[:valid_h] = 1 - valid_xx, valid_yy = self._meshgrid(valid_x, valid_y) - valid = valid_xx & valid_yy - return valid diff --git a/mmdet/core/anchor/point_target.py b/mmdet/core/anchor/point_target.py deleted file mode 100644 index 1ab8d0260c93e479783fff9fbb02d680589ed28e..0000000000000000000000000000000000000000 --- a/mmdet/core/anchor/point_target.py +++ /dev/null @@ -1,165 +0,0 @@ -import torch - -from ..bbox import PseudoSampler, assign_and_sample, build_assigner -from ..utils import multi_apply - - -def point_target(proposals_list, - valid_flag_list, - gt_bboxes_list, - img_metas, - cfg, - gt_bboxes_ignore_list=None, - gt_labels_list=None, - label_channels=1, - sampling=True, - unmap_outputs=True): - """Compute corresponding GT box and classification targets for proposals. - - Args: - proposals_list (list[list]): Multi-level points of each image. - valid_flag_list (list[list]): Multi-level valid flags of each image. - gt_bboxes_list (list[Tensor]): Ground truth bboxes of each image. - img_metas (list[dict]): Meta info of each image. - cfg (dict): Training sample config.
- - Returns: - tuple: (labels_list, label_weights_list, bbox_gt_list, proposals_list, - proposal_weights_list, num_total_pos, num_total_neg) - """ - num_imgs = len(img_metas) - assert len(proposals_list) == len(valid_flag_list) == num_imgs - - # number of points at each level - num_level_proposals = [points.size(0) for points in proposals_list[0]] - - # concat all level points and flags to a single tensor - for i in range(num_imgs): - assert len(proposals_list[i]) == len(valid_flag_list[i]) - proposals_list[i] = torch.cat(proposals_list[i]) - valid_flag_list[i] = torch.cat(valid_flag_list[i]) - - # compute targets for each image - if gt_bboxes_ignore_list is None: - gt_bboxes_ignore_list = [None for _ in range(num_imgs)] - if gt_labels_list is None: - gt_labels_list = [None for _ in range(num_imgs)] - (all_labels, all_label_weights, all_bbox_gt, all_proposals, - all_proposal_weights, pos_inds_list, neg_inds_list) = multi_apply( - point_target_single, - proposals_list, - valid_flag_list, - gt_bboxes_list, - gt_bboxes_ignore_list, - gt_labels_list, - cfg=cfg, - label_channels=label_channels, - sampling=sampling, - unmap_outputs=unmap_outputs) - # no valid points - if any([labels is None for labels in all_labels]): - return None - # total number of sampled points over all images - num_total_pos = sum([max(inds.numel(), 1) for inds in pos_inds_list]) - num_total_neg = sum([max(inds.numel(), 1) for inds in neg_inds_list]) - labels_list = images_to_levels(all_labels, num_level_proposals) - label_weights_list = images_to_levels(all_label_weights, - num_level_proposals) - bbox_gt_list = images_to_levels(all_bbox_gt, num_level_proposals) - proposals_list = images_to_levels(all_proposals, num_level_proposals) - proposal_weights_list = images_to_levels(all_proposal_weights, - num_level_proposals) - return (labels_list, label_weights_list, bbox_gt_list, proposals_list, - proposal_weights_list, num_total_pos, num_total_neg) - - -def images_to_levels(target, num_level_grids): - """Convert targets by image to targets by feature level. - - [target_img0, target_img1] -> [target_level0, target_level1, ...]
- """ - target = torch.stack(target, 0) - level_targets = [] - start = 0 - for n in num_level_grids: - end = start + n - level_targets.append(target[:, start:end].squeeze(0)) - start = end - return level_targets - - -def point_target_single(flat_proposals, - valid_flags, - gt_bboxes, - gt_bboxes_ignore, - gt_labels, - cfg, - label_channels=1, - sampling=True, - unmap_outputs=True): - inside_flags = valid_flags - if not inside_flags.any(): - return (None, ) * 7 - # assign gt and sample proposals - proposals = flat_proposals[inside_flags, :] - - if sampling: - assign_result, sampling_result = assign_and_sample( - proposals, gt_bboxes, gt_bboxes_ignore, None, cfg) - else: - bbox_assigner = build_assigner(cfg.assigner) - assign_result = bbox_assigner.assign(proposals, gt_bboxes, - gt_bboxes_ignore, gt_labels) - bbox_sampler = PseudoSampler() - sampling_result = bbox_sampler.sample(assign_result, proposals, - gt_bboxes) - - num_valid_proposals = proposals.shape[0] - bbox_gt = proposals.new_zeros([num_valid_proposals, 4]) - pos_proposals = torch.zeros_like(proposals) - proposals_weights = proposals.new_zeros([num_valid_proposals, 4]) - labels = proposals.new_zeros(num_valid_proposals, dtype=torch.long) - label_weights = proposals.new_zeros(num_valid_proposals, dtype=torch.float) - - pos_inds = sampling_result.pos_inds - neg_inds = sampling_result.neg_inds - if len(pos_inds) > 0: - pos_gt_bboxes = sampling_result.pos_gt_bboxes - bbox_gt[pos_inds, :] = pos_gt_bboxes - pos_proposals[pos_inds, :] = proposals[pos_inds, :] - proposals_weights[pos_inds, :] = 1.0 - if gt_labels is None: - labels[pos_inds] = 1 - else: - labels[pos_inds] = gt_labels[sampling_result.pos_assigned_gt_inds] - if cfg.pos_weight <= 0: - label_weights[pos_inds] = 1.0 - else: - label_weights[pos_inds] = cfg.pos_weight - if len(neg_inds) > 0: - label_weights[neg_inds] = 1.0 - - # map up to original set of proposals - if unmap_outputs: - num_total_proposals = flat_proposals.size(0) - labels = unmap(labels, num_total_proposals, inside_flags) - label_weights = unmap(label_weights, num_total_proposals, inside_flags) - bbox_gt = unmap(bbox_gt, num_total_proposals, inside_flags) - pos_proposals = unmap(pos_proposals, num_total_proposals, inside_flags) - proposals_weights = unmap(proposals_weights, num_total_proposals, - inside_flags) - - return (labels, label_weights, bbox_gt, pos_proposals, proposals_weights, - pos_inds, neg_inds) - - -def unmap(data, count, inds, fill=0): - """ Unmap a subset of item (data) back to the original set of items (of - size count) """ - if data.dim() == 1: - ret = data.new_full((count, ), fill) - ret[inds] = data - else: - new_size = (count, ) + data.size()[1:] - ret = data.new_full(new_size, fill) - ret[inds, :] = data - return ret diff --git a/mmdet/core/bbox/assigners/__init__.py b/mmdet/core/bbox/assigners/__init__.py index 93eebb775be7720f232f122050d5f753117f7731..594e8406b5dad0ef381a9dd9d2ec9fbb75e0efd7 100644 --- a/mmdet/core/bbox/assigners/__init__.py +++ b/mmdet/core/bbox/assigners/__init__.py @@ -2,9 +2,7 @@ from .approx_max_iou_assigner import ApproxMaxIoUAssigner from .assign_result import AssignResult from .base_assigner import BaseAssigner from .max_iou_assigner import MaxIoUAssigner -from .point_assigner import PointAssigner __all__ = [ - 'BaseAssigner', 'MaxIoUAssigner', 'ApproxMaxIoUAssigner', 'AssignResult', - 'PointAssigner' + 'BaseAssigner', 'MaxIoUAssigner', 'ApproxMaxIoUAssigner', 'AssignResult' ] diff --git a/mmdet/core/bbox/assigners/point_assigner.py 
b/mmdet/core/bbox/assigners/point_assigner.py deleted file mode 100644 index fe81e7d57e0a00ebbd732638927d629c4e87960a..0000000000000000000000000000000000000000 --- a/mmdet/core/bbox/assigners/point_assigner.py +++ /dev/null @@ -1,116 +0,0 @@ -import torch - -from .assign_result import AssignResult -from .base_assigner import BaseAssigner - - -class PointAssigner(BaseAssigner): - """Assign a corresponding gt bbox or background to each point. - - Each proposal will be assigned `0` or a positive integer - indicating the ground truth index. - - - 0: negative sample, no assigned gt - - positive integer: positive sample, index (1-based) of assigned gt - - """ - - def __init__(self, scale=4, pos_num=3): - self.scale = scale - self.pos_num = pos_num - - def assign(self, points, gt_bboxes, gt_bboxes_ignore=None, gt_labels=None): - """Assign gt to points. - - This method assigns a gt bbox to every point; each point will - be assigned 0 or a positive number. - 0 means negative sample; a positive number is the index (1-based) of - the assigned gt. - The assignment is done in the following steps (the order matters): - - 1. Assign every point to 0. - 2. A point is assigned to a gt bbox if - (i) the point is among the k closest points to the gt center, and - (ii) its distance to this gt is smaller than to any other gt. - - Args: - points (Tensor): Points to be assigned, shape (n, 3), where the - last dimension represents (x, y, stride). - gt_bboxes (Tensor): Ground truth boxes, shape (k, 4). - gt_bboxes_ignore (Tensor, optional): Ground truth bboxes that are - labelled as `ignored`, e.g., crowd boxes in COCO. - gt_labels (Tensor, optional): Labels of gt_bboxes, shape (k, ). - - Returns: - :obj:`AssignResult`: The assign result. - """ - if points.shape[0] == 0 or gt_bboxes.shape[0] == 0: - raise ValueError('No gt or bboxes') - points_xy = points[:, :2] - points_stride = points[:, 2] - points_lvl = torch.log2( - points_stride).int() # [3...,4...,5...,6...,7...]
- lvl_min, lvl_max = points_lvl.min(), points_lvl.max() - num_gts, num_points = gt_bboxes.shape[0], points.shape[0] - - # map each gt box to a pyramid level according to its size - gt_bboxes_xy = (gt_bboxes[:, :2] + gt_bboxes[:, 2:]) / 2 - gt_bboxes_wh = (gt_bboxes[:, 2:] - gt_bboxes[:, :2]).clamp(min=1e-6) - scale = self.scale - gt_bboxes_lvl = ((torch.log2(gt_bboxes_wh[:, 0] / scale) + - torch.log2(gt_bboxes_wh[:, 1] / scale)) / 2).int() - gt_bboxes_lvl = torch.clamp(gt_bboxes_lvl, min=lvl_min, max=lvl_max) - - # stores the assigned gt index of each point - assigned_gt_inds = points.new_zeros((num_points, ), dtype=torch.long) - # stores the assigned gt dist (to this point) of each point - assigned_gt_dist = points.new_full((num_points, ), float('inf')) - points_range = torch.arange(points.shape[0]) - - for idx in range(num_gts): - gt_lvl = gt_bboxes_lvl[idx] - # get the index of points in this level - lvl_idx = gt_lvl == points_lvl - points_index = points_range[lvl_idx] - # get the points in this level - lvl_points = points_xy[lvl_idx, :] - # get the center point of gt - gt_point = gt_bboxes_xy[[idx], :] - # get width and height of gt - gt_wh = gt_bboxes_wh[[idx], :] - # compute the distance between gt center and - # all points in this level - points_gt_dist = ((lvl_points - gt_point) / gt_wh).norm(dim=1) - # find the nearest k points to gt center in this level - min_dist, min_dist_index = torch.topk( - points_gt_dist, self.pos_num, largest=False) - # the index of nearest k points to gt center in this level - min_dist_points_index = points_index[min_dist_index] - # less_than_recorded_index stores the indices of min_dist that - # are less than assigned_gt_dist, where assigned_gt_dist stores - # the dist from the previously assigned gt (if any) to each point - less_than_recorded_index = min_dist < assigned_gt_dist[ - min_dist_points_index] - # min_dist_points_index stores the indices of points that satisfy: - # (1) among the k nearest to the current gt center in this level; - # (2) closer to the current gt center than to any other gt center.
- min_dist_points_index = min_dist_points_index[ - less_than_recorded_index] - # assign the result - assigned_gt_inds[min_dist_points_index] = idx + 1 - assigned_gt_dist[min_dist_points_index] = min_dist[ - less_than_recorded_index] - - if gt_labels is not None: - assigned_labels = assigned_gt_inds.new_zeros((num_points, )) - pos_inds = torch.nonzero(assigned_gt_inds > 0).squeeze() - if pos_inds.numel() > 0: - assigned_labels[pos_inds] = gt_labels[ - assigned_gt_inds[pos_inds] - 1] - else: - assigned_labels = None - - return AssignResult( - num_gts, assigned_gt_inds, None, labels=assigned_labels) diff --git a/mmdet/models/anchor_heads/__init__.py b/mmdet/models/anchor_heads/__init__.py index 5df25d04e16975e71067028f7622d8174eb7a7b7..f5a54ce4cfe1aadf1cf61d2bda714b5413bf013d 100644 --- a/mmdet/models/anchor_heads/__init__.py +++ b/mmdet/models/anchor_heads/__init__.py @@ -3,13 +3,11 @@ from .fcos_head import FCOSHead from .ga_retina_head import GARetinaHead from .ga_rpn_head import GARPNHead from .guided_anchor_head import FeatureAdaption, GuidedAnchorHead -from .reppoints_head import RepPointsHead from .retina_head import RetinaHead from .rpn_head import RPNHead from .ssd_head import SSDHead __all__ = [ 'AnchorHead', 'GuidedAnchorHead', 'FeatureAdaption', 'RPNHead', - 'GARPNHead', 'RetinaHead', 'GARetinaHead', 'SSDHead', 'FCOSHead', - 'RepPointsHead' + 'GARPNHead', 'RetinaHead', 'GARetinaHead', 'SSDHead', 'FCOSHead' ] diff --git a/mmdet/models/anchor_heads/reppoints_head.py b/mmdet/models/anchor_heads/reppoints_head.py deleted file mode 100644 index 1ce7abd16f917c1ca07dcf8eb78bc3633eb75704..0000000000000000000000000000000000000000 --- a/mmdet/models/anchor_heads/reppoints_head.py +++ /dev/null @@ -1,596 +0,0 @@ -from __future__ import division - -import numpy as np -import torch -import torch.nn as nn -from mmcv.cnn import normal_init - -from mmdet.core import (PointGenerator, multi_apply, multiclass_nms, - point_target) -from mmdet.ops import DeformConv -from ..builder import build_loss -from ..registry import HEADS -from ..utils import ConvModule, bias_init_with_prob - - -@HEADS.register_module -class RepPointsHead(nn.Module): - """RepPoints head. - - Args: - in_channels (int): Number of channels in the input feature map. - feat_channels (int): Number of channels of the feature map. - point_feat_channels (int): Number of channels of point features. - stacked_convs (int): How many conv layers are used. - gradient_mul (float): The multiplier applied to gradients from - points refinement and recognition. - point_strides (Iterable): Strides of points on each feature level. - point_base_scale (int): Bbox scale for assigning labels. - loss_cls (dict): Config of classification loss. - loss_bbox_init (dict): Config of initial points loss. - loss_bbox_refine (dict): Config of points loss in refinement. - use_grid_points (bool): If True, use the bounding box representation; - the RepPoints are then laid out as grid points on the bounding box. - center_init (bool): Whether to use center point assignment. - transform_method (str): The method used to transform RepPoints into - a bbox.
- """ # noqa: W605 - - def __init__(self, - num_classes, - in_channels, - feat_channels=256, - point_feat_channels=256, - stacked_convs=3, - num_points=9, - gradient_mul=0.1, - point_strides=[8, 16, 32, 64, 128], - point_base_scale=4, - conv_cfg=None, - norm_cfg=None, - loss_cls=dict( - type='FocalLoss', - use_sigmoid=True, - gamma=2.0, - alpha=0.25, - loss_weight=1.0), - loss_bbox_init=dict( - type='SmoothL1Loss', beta=1.0 / 9.0, loss_weight=0.5), - loss_bbox_refine=dict( - type='SmoothL1Loss', beta=1.0 / 9.0, loss_weight=1.0), - use_grid_points=False, - center_init=True, - transform_method='moment', - moment_mul=0.01): - super(RepPointsHead, self).__init__() - self.in_channels = in_channels - self.num_classes = num_classes - self.feat_channels = feat_channels - self.point_feat_channels = point_feat_channels - self.stacked_convs = stacked_convs - self.num_points = num_points - self.gradient_mul = gradient_mul - self.point_base_scale = point_base_scale - self.point_strides = point_strides - self.conv_cfg = conv_cfg - self.norm_cfg = norm_cfg - self.use_sigmoid_cls = loss_cls.get('use_sigmoid', False) - self.sampling = loss_cls['type'] not in ['FocalLoss'] - self.loss_cls = build_loss(loss_cls) - self.loss_bbox_init = build_loss(loss_bbox_init) - self.loss_bbox_refine = build_loss(loss_bbox_refine) - self.use_grid_points = use_grid_points - self.center_init = center_init - self.transform_method = transform_method - if self.transform_method == 'moment': - self.moment_transfer = nn.Parameter( - data=torch.zeros(2), requires_grad=True) - self.moment_mul = moment_mul - if self.use_sigmoid_cls: - self.cls_out_channels = self.num_classes - 1 - else: - self.cls_out_channels = self.num_classes - self.point_generators = [PointGenerator() for _ in self.point_strides] - # we use deformable conv to extract points features - self.dcn_kernel = int(np.sqrt(num_points)) - self.dcn_pad = int((self.dcn_kernel - 1) / 2) - assert self.dcn_kernel * self.dcn_kernel == num_points, \ - "The points number should be a square number." - assert self.dcn_kernel % 2 == 1, \ - "The points number should be an odd square number." 
- dcn_base = np.arange(-self.dcn_pad, - self.dcn_pad + 1).astype(np.float64) - dcn_base_y = np.repeat(dcn_base, self.dcn_kernel) - dcn_base_x = np.tile(dcn_base, self.dcn_kernel) - dcn_base_offset = np.stack([dcn_base_y, dcn_base_x], axis=1).reshape( - (-1)) - self.dcn_base_offset = torch.tensor(dcn_base_offset).view(1, -1, 1, 1) - self._init_layers() - - def _init_layers(self): - self.relu = nn.ReLU(inplace=True) - self.cls_convs = nn.ModuleList() - self.reg_convs = nn.ModuleList() - for i in range(self.stacked_convs): - chn = self.in_channels if i == 0 else self.feat_channels - self.cls_convs.append( - ConvModule( - chn, - self.feat_channels, - 3, - stride=1, - padding=1, - conv_cfg=self.conv_cfg, - norm_cfg=self.norm_cfg)) - self.reg_convs.append( - ConvModule( - chn, - self.feat_channels, - 3, - stride=1, - padding=1, - conv_cfg=self.conv_cfg, - norm_cfg=self.norm_cfg)) - pts_out_dim = 4 if self.use_grid_points else 2 * self.num_points - self.reppoints_cls_conv = DeformConv(self.feat_channels, - self.point_feat_channels, - self.dcn_kernel, 1, self.dcn_pad) - self.reppoints_cls_out = nn.Conv2d(self.point_feat_channels, - self.cls_out_channels, 1, 1, 0) - self.reppoints_pts_init_conv = nn.Conv2d(self.feat_channels, - self.point_feat_channels, 3, - 1, 1) - self.reppoints_pts_init_out = nn.Conv2d(self.point_feat_channels, - pts_out_dim, 1, 1, 0) - self.reppoints_pts_refine_conv = DeformConv(self.feat_channels, - self.point_feat_channels, - self.dcn_kernel, 1, - self.dcn_pad) - self.reppoints_pts_refine_out = nn.Conv2d(self.point_feat_channels, - pts_out_dim, 1, 1, 0) - - def init_weights(self): - for m in self.cls_convs: - normal_init(m.conv, std=0.01) - for m in self.reg_convs: - normal_init(m.conv, std=0.01) - bias_cls = bias_init_with_prob(0.01) - normal_init(self.reppoints_cls_conv, std=0.01) - normal_init(self.reppoints_cls_out, std=0.01, bias=bias_cls) - normal_init(self.reppoints_pts_init_conv, std=0.01) - normal_init(self.reppoints_pts_init_out, std=0.01) - normal_init(self.reppoints_pts_refine_conv, std=0.01) - normal_init(self.reppoints_pts_refine_out, std=0.01) - - def points2bbox(self, pts, y_first=True): - """ - Convert a point set into a bounding box. - :param pts: the input point sets (fields); each point set - is represented by 2n scalars. - :param y_first: if y_first=True, the point set is represented as - [y1, x1, y2, x2 ... yn, xn]; otherwise the point set is - represented as [x1, y1, x2, y2 ... xn, yn]. - :return: each point set converted to a bbox [x1, y1, x2, y2]. - """ - pts_reshape = pts.view(pts.shape[0], -1, 2, *pts.shape[2:]) - pts_y = pts_reshape[:, :, 0, ...] if y_first else pts_reshape[:, :, 1, - ...] - pts_x = pts_reshape[:, :, 1, ...] if y_first else pts_reshape[:, :, 0, - ...] - if self.transform_method == 'minmax': - bbox_left = pts_x.min(dim=1, keepdim=True)[0] - bbox_right = pts_x.max(dim=1, keepdim=True)[0] - bbox_up = pts_y.min(dim=1, keepdim=True)[0] - bbox_bottom = pts_y.max(dim=1, keepdim=True)[0] - bbox = torch.cat([bbox_left, bbox_up, bbox_right, bbox_bottom], - dim=1) - elif self.transform_method == 'partial_minmax': - pts_y = pts_y[:, :4, ...] - pts_x = pts_x[:, :4, ...]
- bbox_left = pts_x.min(dim=1, keepdim=True)[0] - bbox_right = pts_x.max(dim=1, keepdim=True)[0] - bbox_up = pts_y.min(dim=1, keepdim=True)[0] - bbox_bottom = pts_y.max(dim=1, keepdim=True)[0] - bbox = torch.cat([bbox_left, bbox_up, bbox_right, bbox_bottom], - dim=1) - elif self.transform_method == 'moment': - pts_y_mean = pts_y.mean(dim=1, keepdim=True) - pts_x_mean = pts_x.mean(dim=1, keepdim=True) - pts_y_std = torch.std(pts_y - pts_y_mean, dim=1, keepdim=True) - pts_x_std = torch.std(pts_x - pts_x_mean, dim=1, keepdim=True) - moment_transfer = (self.moment_transfer * self.moment_mul) + ( - self.moment_transfer.detach() * (1 - self.moment_mul)) - moment_width_transfer = moment_transfer[0] - moment_height_transfer = moment_transfer[1] - half_width = pts_x_std * torch.exp(moment_width_transfer) - half_height = pts_y_std * torch.exp(moment_height_transfer) - bbox = torch.cat([ - pts_x_mean - half_width, pts_y_mean - half_height, - pts_x_mean + half_width, pts_y_mean + half_height - ], - dim=1) - else: - raise NotImplementedError - return bbox - - def gen_grid_from_reg(self, reg, previous_boxes): - """ - Based on the previous bboxes and regression values, compute the - regressed bboxes and generate grids on them. - :param reg: the regression values relative to previous bboxes. - :param previous_boxes: previous bboxes. - :return: the grids generated on the regressed bboxes. - """ - b, _, h, w = reg.shape - bxy = (previous_boxes[:, :2, ...] + previous_boxes[:, 2:, ...]) / 2. - bwh = (previous_boxes[:, 2:, ...] - - previous_boxes[:, :2, ...]).clamp(min=1e-6) - grid_topleft = bxy + bwh * reg[:, :2, ...] - 0.5 * bwh * torch.exp( - reg[:, 2:, ...]) - grid_wh = bwh * torch.exp(reg[:, 2:, ...]) - grid_left = grid_topleft[:, [0], ...] - grid_top = grid_topleft[:, [1], ...] - grid_width = grid_wh[:, [0], ...] - grid_height = grid_wh[:, [1], ...] - interval = torch.linspace(0., 1., self.dcn_kernel).view( - 1, self.dcn_kernel, 1, 1).type_as(reg) - grid_x = grid_left + grid_width * interval - grid_x = grid_x.unsqueeze(1).repeat(1, self.dcn_kernel, 1, 1, 1) - grid_x = grid_x.view(b, -1, h, w) - grid_y = grid_top + grid_height * interval - grid_y = grid_y.unsqueeze(2).repeat(1, 1, self.dcn_kernel, 1, 1) - grid_y = grid_y.view(b, -1, h, w) - grid_yx = torch.stack([grid_y, grid_x], dim=2) - grid_yx = grid_yx.view(b, -1, h, w) - regressed_bbox = torch.cat([ - grid_left, grid_top, grid_left + grid_width, grid_top + grid_height - ], 1) - return grid_yx, regressed_bbox - - def forward_single(self, x): - dcn_base_offset = self.dcn_base_offset.type_as(x) - # If center_init is used, the initial RepPoints are the center points. - # If the bounding box representation is used, the initial RepPoints - # are a regular grid placed on a pre-defined bbox.
- if self.use_grid_points or not self.center_init: - scale = self.point_base_scale / 2 - points_init = dcn_base_offset / dcn_base_offset.max() * scale - bbox_init = x.new_tensor([-scale, -scale, scale, - scale]).view(1, 4, 1, 1) - else: - points_init = 0 - cls_feat = x - pts_feat = x - for cls_conv in self.cls_convs: - cls_feat = cls_conv(cls_feat) - for reg_conv in self.reg_convs: - pts_feat = reg_conv(pts_feat) - # initialize reppoints - pts_out_init = self.reppoints_pts_init_out( - self.relu(self.reppoints_pts_init_conv(pts_feat))) - if self.use_grid_points: - pts_out_init, bbox_out_init = self.gen_grid_from_reg( - pts_out_init, bbox_init.detach()) - else: - pts_out_init = pts_out_init + points_init - # refine and classify reppoints - pts_out_init_grad_mul = (1 - self.gradient_mul) * pts_out_init.detach( - ) + self.gradient_mul * pts_out_init - dcn_offset = pts_out_init_grad_mul - dcn_base_offset - cls_out = self.reppoints_cls_out( - self.relu(self.reppoints_cls_conv(cls_feat, dcn_offset))) - pts_out_refine = self.reppoints_pts_refine_out( - self.relu(self.reppoints_pts_refine_conv(pts_feat, dcn_offset))) - if self.use_grid_points: - pts_out_refine, bbox_out_refine = self.gen_grid_from_reg( - pts_out_refine, bbox_out_init.detach()) - else: - pts_out_refine = pts_out_refine + pts_out_init.detach() - return cls_out, pts_out_init, pts_out_refine - - def forward(self, feats): - return multi_apply(self.forward_single, feats) - - def get_points(self, featmap_sizes, img_metas): - """Get points according to feature map sizes. - - Args: - featmap_sizes (list[tuple]): Multi-level feature map sizes. - img_metas (list[dict]): Image meta info. - - Returns: - tuple: points of each image, valid flags of each image - """ - num_imgs = len(img_metas) - num_levels = len(featmap_sizes) - - # since feature map sizes of all images are the same, we only compute - # point centers once - multi_level_points = [] - for i in range(num_levels): - points = self.point_generators[i].grid_points( - featmap_sizes[i], self.point_strides[i]) - multi_level_points.append(points) - points_list = [[point.clone() for point in multi_level_points] - for _ in range(num_imgs)] - - # for each image, we compute valid flags of multi level grids - valid_flag_list = [] - for img_id, img_meta in enumerate(img_metas): - multi_level_flags = [] - for i in range(num_levels): - point_stride = self.point_strides[i] - feat_h, feat_w = featmap_sizes[i] - h, w, _ = img_meta['pad_shape'] - valid_feat_h = min(int(np.ceil(h / point_stride)), feat_h) - valid_feat_w = min(int(np.ceil(w / point_stride)), feat_w) - flags = self.point_generators[i].valid_flags( - (feat_h, feat_w), (valid_feat_h, valid_feat_w)) - multi_level_flags.append(flags) - valid_flag_list.append(multi_level_flags) - - return points_list, valid_flag_list - - def centers_to_bboxes(self, point_list): - """Get bboxes according to center points. Only used with MaxIoUAssigner. - """ - bbox_list = [] - for i_img, point in enumerate(point_list): - bbox = [] - for i_lvl in range(len(self.point_strides)): - scale = self.point_base_scale * self.point_strides[i_lvl] * 0.5 - bbox_shift = torch.Tensor([-scale, -scale, scale, - scale]).view(1, 4).type_as(point[0]) - bbox_center = torch.cat( - [point[i_lvl][:, :2], point[i_lvl][:, :2]], dim=1) - bbox.append(bbox_center + bbox_shift) - bbox_list.append(bbox) - return bbox_list - - def offset_to_pts(self, center_list, pred_list): - """Convert point offsets to point coordinates.
- """ - pts_list = [] - for i_lvl in range(len(self.point_strides)): - pts_lvl = [] - for i_img in range(len(center_list)): - pts_center = center_list[i_img][i_lvl][:, :2].repeat( - 1, self.num_points) - pts_shift = pred_list[i_lvl][i_img] - yx_pts_shift = pts_shift.permute(1, 2, 0).view( - -1, 2 * self.num_points) - y_pts_shift = yx_pts_shift[..., 0::2] - x_pts_shift = yx_pts_shift[..., 1::2] - xy_pts_shift = torch.stack([x_pts_shift, y_pts_shift], -1) - xy_pts_shift = xy_pts_shift.view(*yx_pts_shift.shape[:-1], -1) - pts = xy_pts_shift * self.point_strides[i_lvl] + pts_center - pts_lvl.append(pts) - pts_lvl = torch.stack(pts_lvl, 0) - pts_list.append(pts_lvl) - return pts_list - - def loss_single(self, cls_score, pts_pred_init, pts_pred_refine, labels, - label_weights, bbox_gt_init, bbox_weights_init, - bbox_gt_refine, bbox_weights_refine, stride, - num_total_samples_init, num_total_samples_refine): - # classification loss - labels = labels.reshape(-1) - label_weights = label_weights.reshape(-1) - cls_score = cls_score.permute(0, 2, 3, - 1).reshape(-1, self.cls_out_channels) - loss_cls = self.loss_cls( - cls_score, - labels, - label_weights, - avg_factor=num_total_samples_refine) - - # points loss - bbox_gt_init = bbox_gt_init.reshape(-1, 4) - bbox_weights_init = bbox_weights_init.reshape(-1, 4) - bbox_pred_init = self.points2bbox( - pts_pred_init.reshape(-1, 2 * self.num_points), y_first=False) - bbox_gt_refine = bbox_gt_refine.reshape(-1, 4) - bbox_weights_refine = bbox_weights_refine.reshape(-1, 4) - bbox_pred_refine = self.points2bbox( - pts_pred_refine.reshape(-1, 2 * self.num_points), y_first=False) - normalize_term = self.point_base_scale * stride - loss_pts_init = self.loss_bbox_init( - bbox_pred_init / normalize_term, - bbox_gt_init / normalize_term, - bbox_weights_init, - avg_factor=num_total_samples_init) - loss_pts_refine = self.loss_bbox_refine( - bbox_pred_refine / normalize_term, - bbox_gt_refine / normalize_term, - bbox_weights_refine, - avg_factor=num_total_samples_refine) - return loss_cls, loss_pts_init, loss_pts_refine - - def loss(self, - cls_scores, - pts_preds_init, - pts_preds_refine, - gt_bboxes, - gt_labels, - img_metas, - cfg, - gt_bboxes_ignore=None): - featmap_sizes = [featmap.size()[-2:] for featmap in cls_scores] - assert len(featmap_sizes) == len(self.point_generators) - label_channels = self.cls_out_channels if self.use_sigmoid_cls else 1 - - # target for initial stage - center_list, valid_flag_list = self.get_points(featmap_sizes, - img_metas) - pts_coordinate_preds_init = self.offset_to_pts(center_list, - pts_preds_init) - if cfg.init.assigner['type'] == 'PointAssigner': - # Assign target for center list - candidate_list = center_list - else: - # transform center list to bbox list and - # assign target for bbox list - bbox_list = self.centers_to_bboxes(center_list) - candidate_list = bbox_list - cls_reg_targets_init = point_target( - candidate_list, - valid_flag_list, - gt_bboxes, - img_metas, - cfg.init, - gt_bboxes_ignore_list=gt_bboxes_ignore, - gt_labels_list=gt_labels, - label_channels=label_channels, - sampling=self.sampling) - (*_, bbox_gt_list_init, candidate_list_init, bbox_weights_list_init, - num_total_pos_init, num_total_neg_init) = cls_reg_targets_init - num_total_samples_init = ( - num_total_pos_init + - num_total_neg_init if self.sampling else num_total_pos_init) - - # target for refinement stage - center_list, valid_flag_list = self.get_points(featmap_sizes, - img_metas) - pts_coordinate_preds_refine = self.offset_to_pts( - center_list, 
pts_preds_refine) - bbox_list = [] - for i_img, center in enumerate(center_list): - bbox = [] - for i_lvl in range(len(pts_preds_refine)): - bbox_preds_init = self.points2bbox( - pts_preds_init[i_lvl].detach()) - bbox_shift = bbox_preds_init * self.point_strides[i_lvl] - bbox_center = torch.cat( - [center[i_lvl][:, :2], center[i_lvl][:, :2]], dim=1) - bbox.append(bbox_center + - bbox_shift[i_img].permute(1, 2, 0).reshape(-1, 4)) - bbox_list.append(bbox) - cls_reg_targets_refine = point_target( - bbox_list, - valid_flag_list, - gt_bboxes, - img_metas, - cfg.refine, - gt_bboxes_ignore_list=gt_bboxes_ignore, - gt_labels_list=gt_labels, - label_channels=label_channels, - sampling=self.sampling) - (labels_list, label_weights_list, bbox_gt_list_refine, - candidate_list_refine, bbox_weights_list_refine, num_total_pos_refine, - num_total_neg_refine) = cls_reg_targets_refine - num_total_samples_refine = ( - num_total_pos_refine + - num_total_neg_refine if self.sampling else num_total_pos_refine) - - # compute loss - losses_cls, losses_pts_init, losses_pts_refine = multi_apply( - self.loss_single, - cls_scores, - pts_coordinate_preds_init, - pts_coordinate_preds_refine, - labels_list, - label_weights_list, - bbox_gt_list_init, - bbox_weights_list_init, - bbox_gt_list_refine, - bbox_weights_list_refine, - self.point_strides, - num_total_samples_init=num_total_samples_init, - num_total_samples_refine=num_total_samples_refine) - loss_dict_all = { - 'loss_cls': losses_cls, - 'loss_pts_init': losses_pts_init, - 'loss_pts_refine': losses_pts_refine - } - return loss_dict_all - - def get_bboxes(self, - cls_scores, - pts_preds_init, - pts_preds_refine, - img_metas, - cfg, - rescale=False, - nms=True): - assert len(cls_scores) == len(pts_preds_refine) - bbox_preds_refine = [ - self.points2bbox(pts_pred_refine) - for pts_pred_refine in pts_preds_refine - ] - num_levels = len(cls_scores) - mlvl_points = [ - self.point_generators[i].grid_points(cls_scores[i].size()[-2:], - self.point_strides[i]) - for i in range(num_levels) - ] - result_list = [] - for img_id in range(len(img_metas)): - cls_score_list = [ - cls_scores[i][img_id].detach() for i in range(num_levels) - ] - bbox_pred_list = [ - bbox_preds_refine[i][img_id].detach() - for i in range(num_levels) - ] - img_shape = img_metas[img_id]['img_shape'] - scale_factor = img_metas[img_id]['scale_factor'] - proposals = self.get_bboxes_single(cls_score_list, bbox_pred_list, - mlvl_points, img_shape, - scale_factor, cfg, rescale, nms) - result_list.append(proposals) - return result_list - - def get_bboxes_single(self, - cls_scores, - bbox_preds, - mlvl_points, - img_shape, - scale_factor, - cfg, - rescale=False, - nms=True): - assert len(cls_scores) == len(bbox_preds) == len(mlvl_points) - mlvl_bboxes = [] - mlvl_scores = [] - for i_lvl, (cls_score, bbox_pred, points) in enumerate( - zip(cls_scores, bbox_preds, mlvl_points)): - assert cls_score.size()[-2:] == bbox_pred.size()[-2:] - cls_score = cls_score.permute(1, 2, - 0).reshape(-1, self.cls_out_channels) - if self.use_sigmoid_cls: - scores = cls_score.sigmoid() - else: - scores = cls_score.softmax(-1) - bbox_pred = bbox_pred.permute(1, 2, 0).reshape(-1, 4) - nms_pre = cfg.get('nms_pre', -1) - if nms_pre > 0 and scores.shape[0] > nms_pre: - if self.use_sigmoid_cls: - max_scores, _ = scores.max(dim=1) - else: - max_scores, _ = scores[:, 1:].max(dim=1) - _, topk_inds = max_scores.topk(nms_pre) - points = points[topk_inds, :] - bbox_pred = bbox_pred[topk_inds, :] - scores = scores[topk_inds, :] - bbox_pos_center = 
torch.cat([points[:, :2], points[:, :2]], dim=1) - bboxes = bbox_pred * self.point_strides[i_lvl] + bbox_pos_center - x1 = bboxes[:, 0].clamp(min=0, max=img_shape[1]) - y1 = bboxes[:, 1].clamp(min=0, max=img_shape[0]) - x2 = bboxes[:, 2].clamp(min=0, max=img_shape[1]) - y2 = bboxes[:, 3].clamp(min=0, max=img_shape[0]) - bboxes = torch.stack([x1, y1, x2, y2], dim=-1) - mlvl_bboxes.append(bboxes) - mlvl_scores.append(scores) - mlvl_bboxes = torch.cat(mlvl_bboxes) - if rescale: - mlvl_bboxes /= mlvl_bboxes.new_tensor(scale_factor) - mlvl_scores = torch.cat(mlvl_scores) - if self.use_sigmoid_cls: - padding = mlvl_scores.new_zeros(mlvl_scores.shape[0], 1) - mlvl_scores = torch.cat([padding, mlvl_scores], dim=1) - if nms: - det_bboxes, det_labels = multiclass_nms(mlvl_bboxes, mlvl_scores, - cfg.score_thr, cfg.nms, - cfg.max_per_img) - return det_bboxes, det_labels - else: - return mlvl_bboxes, mlvl_scores diff --git a/mmdet/models/detectors/__init__.py b/mmdet/models/detectors/__init__.py index 189c823bdec5f0cbf0d3e157702c53c5ed75c934..d613a3bf7bdb3e95fa165a37c3036ecad36b4b34 100644 --- a/mmdet/models/detectors/__init__.py +++ b/mmdet/models/detectors/__init__.py @@ -8,7 +8,6 @@ from .grid_rcnn import GridRCNN from .htc import HybridTaskCascade from .mask_rcnn import MaskRCNN from .mask_scoring_rcnn import MaskScoringRCNN -from .reppoints_detector import RepPointsDetector from .retinanet import RetinaNet from .rpn import RPN from .single_stage import SingleStageDetector @@ -17,6 +16,5 @@ from .two_stage import TwoStageDetector __all__ = [ 'BaseDetector', 'SingleStageDetector', 'TwoStageDetector', 'RPN', 'FastRCNN', 'FasterRCNN', 'MaskRCNN', 'CascadeRCNN', 'HybridTaskCascade', - 'DoubleHeadRCNN', 'RetinaNet', 'FCOS', 'GridRCNN', 'MaskScoringRCNN', - 'RepPointsDetector' + 'DoubleHeadRCNN', 'RetinaNet', 'FCOS', 'GridRCNN', 'MaskScoringRCNN' ] diff --git a/mmdet/models/detectors/reppoints_detector.py b/mmdet/models/detectors/reppoints_detector.py deleted file mode 100644 index 53d698f1f69a9aeae3cb139820efc3e5c033142a..0000000000000000000000000000000000000000 --- a/mmdet/models/detectors/reppoints_detector.py +++ /dev/null @@ -1,81 +0,0 @@ -import torch - -from mmdet.core import bbox2result, bbox_mapping_back, multiclass_nms -from ..registry import DETECTORS -from .single_stage import SingleStageDetector - - -@DETECTORS.register_module -class RepPointsDetector(SingleStageDetector): - """RepPoints: Point Set Representation for Object Detection. - - This detector is the implementation of: - - RepPoints detector (https://arxiv.org/pdf/1904.11490) - """ - - def __init__(self, - backbone, - neck, - bbox_head, - train_cfg=None, - test_cfg=None, - pretrained=None): - super(RepPointsDetector, - self).__init__(backbone, neck, bbox_head, train_cfg, test_cfg, - pretrained) - - def merge_aug_results(self, aug_bboxes, aug_scores, img_metas): - """Merge augmented detection bboxes and scores. - - Args: - aug_bboxes (list[Tensor]): shape (n, 4) - aug_scores (list[Tensor] or None): shape (n, #class) - img_metas (list[list[dict]]): Meta info of each augmented image, - including img_shape, scale_factor and flip.
- - Returns: - tuple: (bboxes, scores) - """ - recovered_bboxes = [] - for bboxes, img_info in zip(aug_bboxes, img_metas): - img_shape = img_info[0]['img_shape'] - scale_factor = img_info[0]['scale_factor'] - flip = img_info[0]['flip'] - bboxes = bbox_mapping_back(bboxes, img_shape, scale_factor, flip) - recovered_bboxes.append(bboxes) - bboxes = torch.cat(recovered_bboxes, dim=0) - if aug_scores is None: - return bboxes - else: - scores = torch.cat(aug_scores, dim=0) - return bboxes, scores - - def aug_test(self, imgs, img_metas, rescale=False): - # recompute feats to save memory - feats = self.extract_feats(imgs) - - aug_bboxes = [] - aug_scores = [] - for x, img_meta in zip(feats, img_metas): - # only one image in the batch - outs = self.bbox_head(x) - bbox_inputs = outs + (img_meta, self.test_cfg, False, False) - det_bboxes, det_scores = self.bbox_head.get_bboxes(*bbox_inputs)[0] - aug_bboxes.append(det_bboxes) - aug_scores.append(det_scores) - - # after merging, bboxes will be rescaled to the original image size - merged_bboxes, merged_scores = self.merge_aug_results( - aug_bboxes, aug_scores, img_metas) - det_bboxes, det_labels = multiclass_nms(merged_bboxes, merged_scores, - self.test_cfg.score_thr, - self.test_cfg.nms, - self.test_cfg.max_per_img) - - if rescale: - _det_bboxes = det_bboxes - else: - _det_bboxes = det_bboxes.clone() - _det_bboxes[:, :4] *= img_metas[0][0]['scale_factor'] - bbox_results = bbox2result(_det_bboxes, det_labels, - self.bbox_head.num_classes) - return bbox_results
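
The deleted 2x ms-train configs enable multi-scale training via `img_scale=[(1333, 480), (1333, 960)]` with `multiscale_mode='range'`: the long edge stays at 1333 while the short edge is re-drawn for every iteration. A minimal sketch of that sampling rule, assuming mmdetection v1's `Resize` semantics (`sample_scale` is a hypothetical helper, not repo API):

```
import random

def sample_scale(scale_endpoints=((1333, 480), (1333, 960))):
    # multiscale_mode='range': sample the long and short edge independently
    # and uniformly between the two endpoints; here the long edge is fixed
    # at 1333 and the short edge varies in [480, 960].
    long_edges = [max(s) for s in scale_endpoints]
    short_edges = [min(s) for s in scale_endpoints]
    long_edge = random.randint(min(long_edges), max(long_edges))
    short_edge = random.randint(min(short_edges), max(short_edges))
    return (long_edge, short_edge)

print(sample_scale())  # e.g. (1333, 712)
```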
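`PointGenerator.grid_points` (deleted above) enumerates one `(x, y, stride)` triple per feature-map cell; these points play the role that anchors play in anchor-based heads. A standalone sketch of the same computation on CPU tensors (`grid_points` here is a free function for illustration):

```
import torch

def grid_points(featmap_size, stride=16):
    # One point per feature-map cell, placed in image coordinates, with the
    # stride carried along as a third column.
    feat_h, feat_w = featmap_size
    shift_x = torch.arange(0., feat_w) * stride
    shift_y = torch.arange(0., feat_h) * stride
    xx = shift_x.repeat(feat_h)                         # row-major x coords
    yy = shift_y.view(-1, 1).repeat(1, feat_w).view(-1)
    strides = xx.new_full((xx.numel(),), stride)
    return torch.stack([xx, yy, strides], dim=-1)       # shape (h*w, 3)

print(grid_points((2, 3), 16))
# tensor([[ 0.,  0., 16.], [16.,  0., 16.], [32.,  0., 16.],
#         [ 0., 16., 16.], [16., 16., 16.], [32., 16., 16.]])
```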
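The bookkeeping in the deleted `point_target.py` rests on two helpers: `images_to_levels` regroups per-image targets into per-level tensors, and `unmap` scatters results computed on valid points back into the full flat layout. A self-contained sketch with toy inputs (the squeeze in the repo version is dropped for clarity):

```
import torch

def images_to_levels(per_image, num_per_level):
    # [t_img0, t_img1, ...] -> [t_level0, t_level1, ...]
    stacked = torch.stack(per_image, 0)
    out, start = [], 0
    for n in num_per_level:
        out.append(stacked[:, start:start + n])
        start += n
    return out

def unmap(data, count, inds, fill=0):
    # Scatter per-valid-point results back to the original flat layout so
    # downstream code can keep indexing by the original point position.
    ret = data.new_full((count,) + data.size()[1:], fill)
    ret[inds] = data
    return ret

# Two images, 4 + 2 points on two levels:
labels = [torch.tensor([1, 0, 2, 0, 3, 0]), torch.tensor([0, 4, 0, 0, 0, 5])]
print(images_to_levels(labels, [4, 2]))       # level 0: (2, 4); level 1: (2, 2)
valid = torch.tensor([True, False, True, False])
print(unmap(torch.tensor([5, 7]), 4, valid))  # tensor([5, 0, 7, 0])
```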
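The deleted `PointAssigner.assign` implements the rule its docstring describes: map each gt box to a pyramid level by its size, take the `pos_num` points of that level nearest to the gt center (distances normalized by box size), and let each point keep only its closest gt. A hedged, standalone re-implementation of that rule (`assign_points` is illustrative, not the repo's API):

```
import torch

def assign_points(points, gt_bboxes, scale=4, pos_num=1):
    # points: (n, 3) as (x, y, stride); gt_bboxes: (k, 4) as x1y1x2y2.
    # Returns (n,): 0 for negatives, j + 1 for points assigned to gt j.
    points_lvl = torch.log2(points[:, 2]).int()
    gt_xy = (gt_bboxes[:, :2] + gt_bboxes[:, 2:]) / 2
    gt_wh = (gt_bboxes[:, 2:] - gt_bboxes[:, :2]).clamp(min=1e-6)
    gt_lvl = ((torch.log2(gt_wh[:, 0] / scale)
               + torch.log2(gt_wh[:, 1] / scale)) / 2).int()
    gt_lvl = gt_lvl.clamp(int(points_lvl.min()), int(points_lvl.max()))

    assigned = torch.zeros(points.shape[0], dtype=torch.long)
    best_dist = torch.full((points.shape[0],), float('inf'))
    all_inds = torch.arange(points.shape[0])
    for j in range(gt_bboxes.shape[0]):
        in_lvl = points_lvl == gt_lvl[j]
        cand = all_inds[in_lvl]
        # size-normalized distance so small and large boxes compete fairly
        dist = ((points[in_lvl, :2] - gt_xy[j]) / gt_wh[j]).norm(dim=1)
        d, top = dist.topk(min(pos_num, dist.numel()), largest=False)
        cand = cand[top]
        keep = d < best_dist[cand]  # closer than any previously assigned gt
        assigned[cand[keep]] = j + 1
        best_dist[cand[keep]] = d[keep]
    return assigned

points = torch.tensor([[8., 8., 16.], [24., 8., 16.],
                       [8., 24., 16.], [24., 24., 16.]])
gts = torch.tensor([[0., 0., 16., 16.]])
print(assign_points(points, gts))  # tensor([1, 0, 0, 0])
```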
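`points2bbox` (deleted above) supports three `transform_method` choices for collapsing a point set into a pseudo box: `minmax` takes the tight enclosing box, `partial_minmax` does the same over the first four points only, and `moment` uses the mean as the box center and the per-axis standard deviation, rescaled by a learned factor, as the half extent. A minimal sketch on `(n_sets, n_points, 2)` tensors in `(x, y)` order, with a plain tuple standing in for the head's learned `moment_transfer` parameter:

```
import torch

def minmax_box(pts):
    # Tight axis-aligned box around each point set.
    return torch.cat([pts.min(dim=1).values, pts.max(dim=1).values], dim=-1)

def moment_box(pts, moment_transfer=(0.0, 0.0)):
    # Mean -> box center; std, rescaled by exp(learned factor) -> half size.
    mean = pts.mean(dim=1)                            # (n_sets, 2)
    std = pts.std(dim=1)                              # (n_sets, 2)
    half = std * torch.tensor(moment_transfer).exp()  # (w, h) rescale
    return torch.cat([mean - half, mean + half], dim=-1)

pts = torch.tensor([[[0., 0.], [4., 0.], [2., 6.], [1., 1.], [3., 5.]]])
print(minmax_box(pts))  # tensor([[0., 0., 4., 6.]])
print(moment_box(pts))  # a smaller box, centered on the point-set mean
```

The moment variant is differentiable in a smoother way than a hard min/max, which is presumably why it is the default and the best-performing convert func in the README's table.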
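At test time, `get_bboxes_single` (deleted above) decodes refined pseudo boxes that are predicted in stride units relative to each grid point: scale by the level's stride, shift by the point's center, then clip to the image. A compact sketch of that decode step (`decode_boxes` is a hypothetical helper):

```
import torch

def decode_boxes(bbox_pred, points, stride, img_shape):
    # bbox_pred: (n, 4) offsets in stride units; points: (n, 3) as (x, y, s).
    centers = torch.cat([points[:, :2], points[:, :2]], dim=1)  # (n, 4)
    boxes = bbox_pred * stride + centers
    h, w = img_shape[:2]
    boxes[:, 0::2] = boxes[:, 0::2].clamp(min=0, max=w)  # x1, x2 in [0, w]
    boxes[:, 1::2] = boxes[:, 1::2].clamp(min=0, max=h)  # y1, y2 in [0, h]
    return boxes

points = torch.tensor([[16., 16., 16.]])
pred = torch.tensor([[-0.5, -0.5, 0.5, 0.5]])
print(decode_boxes(pred, points, 16, (64, 64)))  # tensor([[8., 8., 24., 24.]])
```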