RoIAlign¶

class mmcv.ops.RoIAlign(output_size: tuple, spatial_scale: float = 1.0, sampling_ratio: int = 0, pool_mode: str = 'avg', aligned: bool = True, use_torchvision: bool = False)[source]¶

RoI align pooling layer.

Parameters

output_size (tuple) – h, w
spatial_scale (float) – scale the input boxes by this number
sampling_ratio (int) – number of inputs samples to take for each output sample. 0 to take samples densely for current models.
pool_mode (str, 'avg' or 'max') – pooling mode in each bin.
aligned (bool) – if False, use the legacy implementation in MMDetection. If True, align the results more perfectly.
use_torchvision (bool) – whether to use roi_align from torchvision.

Note

The implementation of RoIAlign when aligned=True is modified from https://github.com/facebookresearch/detectron2/

The meaning of aligned=True:

Given a continuous coordinate c, its two neighboring pixel indices (in our pixel model) are computed by floor(c - 0.5) and ceil(c - 0.5). For example, c=1.3 has pixel neighbors with discrete indices [0] and [1] (which are sampled from the underlying signal at continuous coordinates 0.5 and 1.5). But the original roi_align (aligned=False) does not subtract the 0.5 when computing neighboring pixel indices and therefore it uses pixels with a slightly incorrect alignment (relative to our pixel model) when performing bilinear interpolation.

With aligned=True, we first appropriately scale the ROI and then shift it by -0.5 prior to calling roi_align. This produces the correct neighbors;

The difference does not make a difference to the model’s performance if ROIAlign is used together with conv layers.

forward(input: torch.Tensor, rois: torch.Tensor) → torch.Tensor [source]¶

Parameters

input – NCHW images
rois – Bx5 boxes. First column is the index into N. The other 4 columns are xyxy.