MultiScaleDeformableAttention

class mmcv.ops.MultiScaleDeformableAttention(embed_dims: int = 256, num_heads: int = 8, num_levels: int = 4, num_points: int = 4, im2col_step: int = 64, dropout: float = 0.1, batch_first: bool = False, norm_cfg: Optional[dict] = None, init_cfg: Optional[mmengine.config.config.ConfigDict] = None, value_proj_ratio: float = 1.0)

An attention module used in Deformable DETR.

See the paper "Deformable DETR: Deformable Transformers for End-to-End Object Detection".

Parameters

embed_dims (int) – The embedding dimension of Attention. Default: 256.
num_heads (int) – Parallel attention heads. Default: 8.
num_levels (int) – The number of feature maps used in Attention. Default: 4.
num_points (int) – The number of sampling points for each query in each head. Default: 4.
im2col_step (int) – The step used in image_to_column. Default: 64.
dropout (float) – Dropout probability applied to the attention output before it is added to identity (inp_identity). Default: 0.1.
batch_first (bool) – If True, Key, Query and Value have shape (batch, n, embed_dim); otherwise (n, batch, embed_dim). Default: False.
norm_cfg (dict) – Config dict for normalization layer. Default: None.
init_cfg (mmengine.ConfigDict) – The Config for initialization. Default: None.
value_proj_ratio (float) – The expansion ratio of value_proj. Default: 1.0.
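As an illustration of how these parameters fit together, the following is a minimal sketch of a config fragment whose keys mirror the constructor arguments above. The `type` key follows the usual OpenMMLab registry convention, and `check_attn_cfg` is a hypothetical helper, not part of mmcv. It assumes that `embed_dims` must be divisible by `num_heads` and that `num_points` applies per head and per level, as in the Deformable DETR paper.

```python
# Hypothetical config fragment; keys mirror the constructor parameters.
attn_cfg = dict(
    type="MultiScaleDeformableAttention",  # assumed registry name
    embed_dims=256,
    num_heads=8,
    num_levels=4,
    num_points=4,
    im2col_step=64,
    dropout=0.1,
    batch_first=False,
    value_proj_ratio=1.0,
)


def check_attn_cfg(cfg):
    """Illustrative helper (not part of mmcv).

    Returns (dim_per_head, samples_per_query), where samples_per_query
    counts sampling points over all heads and levels for one query.
    """
    embed_dims = cfg["embed_dims"]
    num_heads = cfg["num_heads"]
    # Assumption: the embedding is split evenly across heads.
    if embed_dims % num_heads != 0:
        raise ValueError(
            f"embed_dims ({embed_dims}) must be divisible by "
            f"num_heads ({num_heads})"
        )
    dim_per_head = embed_dims // num_heads
    samples_per_query = num_heads * cfg["num_levels"] * cfg["num_points"]
    return dim_per_head, samples_per_query


dim_per_head, samples_per_query = check_attn_cfg(attn_cfg)
print(dim_per_head, samples_per_query)  # 32 128
```

With the default values, each head works on 32 channels and each query aggregates 128 sampled locations in total.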

forward(query: torch.Tensor, key: Optional[torch.Tensor] = None, value: Optional[torch.Tensor] = None, identity: Optional[torch.Tensor] = None, query_pos: Optional[torch.Tensor] = None, key_padding_mask: Optional[torch.Tensor] = None, reference_points: Optional[torch.Tensor] = None, spatial_shapes: Optional[torch.Tensor] = None, level_start_index: Optional[torch.Tensor] = None, **kwargs) → torch.Tensor

Forward function of MultiScaleDeformableAttention.

Parameters

query (torch.Tensor) – Query of Transformer with shape (num_query, bs, embed_dims).
key (torch.Tensor) – The key tensor with shape (num_key, bs, embed_dims).
value (torch.Tensor) – The value tensor with shape (num_key, bs, embed_dims).
identity (torch.Tensor) – The tensor used for addition, with the same shape as query. Default: None. If None, query is used.
query_pos (torch.Tensor) – The positional encoding for query. Default: None.
key_padding_mask (torch.Tensor) – ByteTensor mask over the key/value positions, with shape (bs, num_key).
reference_points (torch.Tensor) – The normalized reference points with shape (bs, num_query, num_levels, 2). All elements are in the range [0, 1], with (0, 0) at the top-left and (1, 1) at the bottom-right, including the padding area. Alternatively, shape (bs, num_query, num_levels, 4), where the two additional values (w, h) turn each point into a reference box.
spatial_shapes (torch.Tensor) – Spatial shape of features in different levels. With shape (num_levels, 2), last dimension represents (h, w).
level_start_index (torch.Tensor) – The start index of each level in the flattened key/value sequence. A tensor of shape (num_levels,) that can be represented as [0, h_0*w_0, h_0*w_0+h_1*w_1, …].
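The relationship between spatial_shapes, level_start_index and num_key can be sketched in plain Python. The helper name `compute_level_start_index` and the example feature-map sizes are illustrative assumptions, not part of mmcv. In practice the same values would be held in tensors.

```python
def compute_level_start_index(spatial_shapes):
    """Illustrative helper (not part of mmcv).

    spatial_shapes is a sequence of (h, w) pairs, one per feature level.
    Returns the start offset of each level in the flattened key/value
    sequence, plus the total length num_key.
    """
    starts = []
    offset = 0
    for h, w in spatial_shapes:
        starts.append(offset)  # this level begins where the previous ended
        offset += h * w        # each level contributes h * w positions
    return starts, offset


# Assumed example: four levels, each half the resolution of the previous.
spatial_shapes = [(32, 32), (16, 16), (8, 8), (4, 4)]
starts, num_key = compute_level_start_index(spatial_shapes)
print(starts)   # [0, 1024, 1280, 1344]
print(num_key)  # 1360
```

The total num_key is the first dimension expected for key and value when batch_first is False.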

Returns

The output tensor with shape (num_query, bs, embed_dims).

Return type

torch.Tensor
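A minimal usage sketch of forward, assuming a full mmcv build (with compiled ops) and PyTorch are installed. The batch size, query count and feature-map sizes are arbitrary example values. The imports are guarded so that the snippet skips cleanly where mmcv is unavailable, and key is left as None here on the assumption that value carries the features to sample from.

```python
try:
    import torch
    from mmcv.ops import MultiScaleDeformableAttention
    HAVE_MMCV = True
except ImportError:  # mmcv (full build) or torch is not installed
    HAVE_MMCV = False


def run_example():
    """Run one forward pass on random inputs and return the output shape."""
    bs, num_query, embed_dims = 2, 10, 256

    # (h, w) of each feature level; example values only.
    spatial_shapes = torch.tensor(
        [[32, 32], [16, 16], [8, 8], [4, 4]], dtype=torch.long)
    num_levels = spatial_shapes.size(0)
    sizes = spatial_shapes.prod(dim=1)          # h * w per level
    num_key = int(sizes.sum())
    # [0, h_0*w_0, h_0*w_0 + h_1*w_1, ...]
    level_start_index = torch.cat(
        (sizes.new_zeros((1,)), sizes.cumsum(0)[:-1]))

    attn = MultiScaleDeformableAttention(
        embed_dims=embed_dims, num_heads=8,
        num_levels=num_levels, num_points=4)
    attn.eval()  # disable dropout for the example

    # batch_first=False, so sequences come before the batch dimension.
    query = torch.rand(num_query, bs, embed_dims)
    value = torch.rand(num_key, bs, embed_dims)
    # Normalized (x, y) reference points in [0, 1].
    reference_points = torch.rand(bs, num_query, num_levels, 2)

    with torch.no_grad():
        out = attn(
            query,
            value=value,
            reference_points=reference_points,
            spatial_shapes=spatial_shapes,
            level_start_index=level_start_index)
    return tuple(out.shape)


if HAVE_MMCV:
    # Per the documented return shape: (num_query, bs, embed_dims).
    print(run_example())
```

Passing identity=None means query itself is used for the final addition, as described in the parameter list.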

init_weights() → None

Default initialization for Parameters of Module.