API Reference

fileio

class mmcv.fileio.BaseStorageBackend[source]

Abstract class of storage backends.

All backends need to implement two APIs: get() and get_text(). get() reads the file as a byte stream and get_text() reads the file as text.

class mmcv.fileio.FileClient(backend='disk', **kwargs)[source]

A general file client to access files in different backends.

The client loads a file or text in a specified backend from its path and returns it as a binary file. It can also register other backend accessors with a given name and backend class.

backend

The storage backend type. Options are “disk”, “ceph”, “memcached”, “lmdb” and “http”.

Type:str
client

The backend object.

Type:BaseStorageBackend
classmethod register_backend(name, backend=None, force=False)[source]

Register a backend to FileClient.

This method can be used as a normal class method or a decorator.

class NewBackend(BaseStorageBackend):

    def get(self, filepath):
        return filepath

    def get_text(self, filepath):
        return filepath

FileClient.register_backend('new', NewBackend)

or

@FileClient.register_backend('new')
class NewBackend(BaseStorageBackend):

    def get(self, filepath):
        return filepath

    def get_text(self, filepath):
        return filepath
Parameters:
  • name (str) – The name of the registered backend.
  • backend (class, optional) – The backend class to be registered, which must be a subclass of BaseStorageBackend. When this method is used as a decorator, backend is None. Defaults to None.
  • force (bool, optional) – Whether to override the backend if the name has already been registered. Defaults to False.
mmcv.fileio.load(file, file_format=None, **kwargs)[source]

Load data from json/yaml/pickle files.

This method provides a unified API for loading data from serialized files.

Parameters:
  • file (str or Path or file-like object) – Filename or a file-like object.
  • file_format (str, optional) – If not specified, the file format will be inferred from the file extension, otherwise use the specified one. Currently supported formats include “json”, “yaml/yml” and “pickle/pkl”.
Returns:

The content from the file.

mmcv.fileio.dump(obj, file=None, file_format=None, **kwargs)[source]

Dump data to json/yaml/pickle strings or files.

This method provides a unified API for dumping data as strings or to files, and also supports custom arguments for each file format.

Parameters:
  • obj (any) – The python object to be dumped.
  • file (str or Path or file-like object, optional) – If not specified, then the object is dumped to a str, otherwise to a file specified by the filename or file-like object.
  • file_format (str, optional) – Same as load().
Returns:

True for success, False otherwise.

Return type:

bool
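
For illustration, a minimal load/dump round trip; the filename data.json is just an example:

>>> import mmcv
>>> data = dict(a=1, b=[2, 3])
>>> mmcv.dump(data, 'data.json')  # format inferred from the extension
>>> mmcv.load('data.json')
{'a': 1, 'b': [2, 3]}
>>> s = mmcv.dump(data, file_format='yaml')  # no file given, returns a str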

mmcv.fileio.list_from_file(filename, prefix='', offset=0, max_num=0, encoding='utf-8')[source]

Load a text file and parse the content as a list of strings.

Parameters:
  • filename (str) – Filename.
  • prefix (str) – The prefix to be inserted at the beginning of each item.
  • offset (int) – The number of lines to skip from the beginning of the file.
  • max_num (int) – The maximum number of lines to be read; zero or negative values mean no limitation.
  • encoding (str) – Encoding used to open the file. Default: 'utf-8'.
Returns:

A list of strings.

Return type:

list[str]

mmcv.fileio.dict_from_file(filename, key_type=<class 'str'>)[source]

Load a text file and parse the content as a dict.

Each line of the text file is split into two or more columns by whitespaces or tabs. The first column is parsed as dict keys, and the remaining columns are parsed as dict values.

Parameters:
  • filename (str) – Filename.
  • key_type (type) – Type of the dict keys. str is used by default and type conversion will be performed if specified.
Returns:

The parsed contents.

Return type:

dict
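
A short sketch of both text-file helpers; the file contents described in the comments are hypothetical:

>>> import mmcv
>>> # suppose a.txt contains three lines: "1", "2", "3"
>>> mmcv.list_from_file('a.txt')
['1', '2', '3']
>>> mmcv.list_from_file('a.txt', offset=1, max_num=1)
['2']
>>> # suppose b.txt contains the lines "1 cat" and "2 dog cow"
>>> mmcv.dict_from_file('b.txt', key_type=int)
{1: 'cat', 2: ['dog', 'cow']}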

image

mmcv.image.bgr2gray(img, keepdim=False)[source]

Convert a BGR image to grayscale image.

Parameters:
  • img (ndarray) – The input image.
  • keepdim (bool) – If False (by default), then return the grayscale image with 2 dims, otherwise 3 dims.
Returns:

The converted grayscale image.

Return type:

ndarray

mmcv.image.bgr2hls(img)

Convert a BGR image to HLS image.

Parameters:img (ndarray or str) – The input image.
Returns:The converted HLS image.
Return type:ndarray
mmcv.image.bgr2hsv(img)

Convert a BGR image to HSV image.

Parameters:img (ndarray or str) – The input image.
Returns:The converted HSV image.
Return type:ndarray
mmcv.image.bgr2rgb(img)

Convert a BGR image to RGB image.

Parameters:img (ndarray or str) – The input image.
Returns:The converted RGB image.
Return type:ndarray
mmcv.image.gray2bgr(img)[source]

Convert a grayscale image to BGR image.

Parameters:img (ndarray) – The input image.
Returns:The converted BGR image.
Return type:ndarray
mmcv.image.gray2rgb(img)[source]

Convert a grayscale image to RGB image.

Parameters:img (ndarray) – The input image.
Returns:The converted RGB image.
Return type:ndarray
mmcv.image.hls2bgr(img)

Convert an HLS image to BGR image.

Parameters:img (ndarray or str) – The input image.
Returns:The converted BGR image.
Return type:ndarray
mmcv.image.hsv2bgr(img)

Convert an HSV image to BGR image.

Parameters:img (ndarray or str) – The input image.
Returns:The converted BGR image.
Return type:ndarray
mmcv.image.imconvert(img, src, dst)[source]

Convert an image from the src colorspace to dst colorspace.

Parameters:
  • img (ndarray) – The input image.
  • src (str) – The source colorspace, e.g., ‘rgb’, ‘hsv’.
  • dst (str) – The destination colorspace, e.g., ‘rgb’, ‘hsv’.
Returns:

The converted image.

Return type:

ndarray
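
A quick sketch of the color conversion helpers on a random BGR array:

>>> import mmcv
>>> import numpy as np
>>> img = np.random.randint(0, 256, (4, 4, 3), dtype=np.uint8)  # BGR
>>> rgb = mmcv.bgr2rgb(img)
>>> hsv = mmcv.imconvert(img, 'bgr', 'hsv')  # equivalent to mmcv.bgr2hsv(img)
>>> mmcv.bgr2gray(img, keepdim=True).shape
(4, 4, 1)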

mmcv.image.rgb2bgr(img)

Convert an RGB image to BGR image.

Parameters:img (ndarray or str) – The input image.
Returns:The converted BGR image.
Return type:ndarray
mmcv.image.rgb2gray(img, keepdim=False)[source]

Convert an RGB image to grayscale image.

Parameters:
  • img (ndarray) – The input image.
  • keepdim (bool) – If False (by default), then return the grayscale image with 2 dims, otherwise 3 dims.
Returns:

The converted grayscale image.

Return type:

ndarray

mmcv.image.imrescale(img, scale, return_scale=False, interpolation='bilinear', backend=None)[source]

Resize image while keeping the aspect ratio.

Parameters:
  • img (ndarray) – The input image.
  • scale (float | tuple[int]) – The scaling factor or maximum size. If it is a float number, then the image will be rescaled by this factor, else if it is a tuple of 2 integers, then the image will be rescaled as large as possible within the scale.
  • return_scale (bool) – Whether to return the scaling factor besides the rescaled image.
  • interpolation (str) – Same as resize().
  • backend (str | None) – Same as resize().
Returns:

The rescaled image.

Return type:

ndarray

mmcv.image.imresize(img, size, return_scale=False, interpolation='bilinear', out=None, backend=None)[source]

Resize image to a given size.

Parameters:
  • img (ndarray) – The input image.
  • size (tuple[int]) – Target size (w, h).
  • return_scale (bool) – Whether to return w_scale and h_scale.
  • interpolation (str) – Interpolation method, accepted values are “nearest”, “bilinear”, “bicubic”, “area”, “lanczos” for ‘cv2’ backend, “nearest”, “bilinear” for ‘pillow’ backend.
  • out (ndarray) – The output destination.
  • backend (str | None) – The image resize backend type. Options are cv2, pillow, None. If backend is None, the global imread_backend specified by mmcv.use_backend() will be used. Default: None.
Returns:

(resized_img, w_scale, h_scale) or resized_img.

Return type:

tuple | ndarray
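
For illustration, resizing to an exact size versus rescaling with a preserved aspect ratio; note that size is (w, h) while array shapes are (h, w, c):

>>> import mmcv
>>> import numpy as np
>>> img = np.random.rand(100, 200, 3)  # (h, w, c)
>>> mmcv.imresize(img, (50, 40)).shape
(40, 50, 3)
>>> resized, w_scale, h_scale = mmcv.imresize(img, (50, 40), return_scale=True)
>>> mmcv.imrescale(img, 0.5).shape  # factor rescale keeps the aspect ratio
(50, 100, 3)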

mmcv.image.imresize_like(img, dst_img, return_scale=False, interpolation='bilinear', backend=None)[source]

Resize image to the same size of a given image.

Parameters:
  • img (ndarray) – The input image.
  • dst_img (ndarray) – The target image.
  • return_scale (bool) – Whether to return w_scale and h_scale.
  • interpolation (str) – Same as resize().
  • backend (str | None) – Same as resize().
Returns:

(resized_img, w_scale, h_scale) or resized_img.

Return type:

tuple or ndarray

mmcv.image.imresize_to_multiple(img, divisor, size=None, scale_factor=None, keep_ratio=False, return_scale=False, interpolation='bilinear', out=None, backend=None)[source]

Resize an image according to a given size or scale factor and then round up the resized or rescaled image size to the nearest value that can be divided by the divisor.

Parameters:
  • img (ndarray) – The input image.
  • divisor (int | tuple) – Resized image size will be a multiple of divisor. If divisor is a tuple, divisor should be (w_divisor, h_divisor).
  • size (None | int | tuple[int]) – Target size (w, h). Default: None.
  • scale_factor (None | float | tuple[float]) – Multiplier for spatial size. Should match input size if it is a tuple and the 2D style is (w_scale_factor, h_scale_factor). Default: None.
  • keep_ratio (bool) – Whether to keep the aspect ratio when resizing the image. Default: False.
  • return_scale (bool) – Whether to return w_scale and h_scale.
  • interpolation (str) – Interpolation method, accepted values are “nearest”, “bilinear”, “bicubic”, “area”, “lanczos” for ‘cv2’ backend, “nearest”, “bilinear” for ‘pillow’ backend.
  • out (ndarray) – The output destination.
  • backend (str | None) – The image resize backend type. Options are cv2, pillow, None. If backend is None, the global imread_backend specified by mmcv.use_backend() will be used. Default: None.
Returns:

(resized_img, w_scale, h_scale) or resized_img.

Return type:

tuple | ndarray

mmcv.image.rescale_size(old_size, scale, return_scale=False)[source]

Calculate the new size to be rescaled to.

Parameters:
  • old_size (tuple[int]) – The old size (w, h) of image.
  • scale (float | tuple[int]) – The scaling factor or maximum size. If it is a float number, then the image will be rescaled by this factor, else if it is a tuple of 2 integers, then the image will be rescaled as large as possible within the scale.
  • return_scale (bool) – Whether to return the scaling factor besides the rescaled image size.
Returns:

The new rescaled image size.

Return type:

tuple[int]
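
A sketch of the size arithmetic performed by rescale_size():

>>> import mmcv
>>> mmcv.rescale_size((1024, 768), 0.5)
(512, 384)
>>> mmcv.rescale_size((1024, 768), (256, 256), return_scale=True)
((256, 192), 0.25)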

mmcv.image.imcrop(img, bboxes, scale=1.0, pad_fill=None)[source]

Crop image patches.

3 steps: scale the bboxes -> clip bboxes -> crop and pad.

Parameters:
  • img (ndarray) – Image to be cropped.
  • bboxes (ndarray) – Shape (k, 4) or (4, ), location of cropped bboxes.
  • scale (float, optional) – Scale ratio of bboxes, the default value 1.0 means no scaling.
  • pad_fill (Number | list[Number]) – Value to be filled for padding. Default: None, which means no padding.
Returns:

The cropped image patches.

Return type:

list[ndarray] | ndarray

mmcv.image.imflip(img, direction='horizontal')[source]

Flip an image horizontally or vertically.

Parameters:
  • img (ndarray) – Image to be flipped.
  • direction (str) – The flip direction, either “horizontal”, “vertical” or “diagonal”.
Returns:

The flipped image.

Return type:

ndarray

mmcv.image.imflip_(img, direction='horizontal')[source]

Inplace flip an image horizontally or vertically.

Parameters:
  • img (ndarray) – Image to be flipped.
  • direction (str) – The flip direction, either “horizontal”, “vertical” or “diagonal”.
Returns:

The flipped image (inplace).

Return type:

ndarray

mmcv.image.impad(img, *, shape=None, padding=None, pad_val=0, padding_mode='constant')[source]

Pad the given image to a certain shape or pad on all sides with specified padding mode and padding value.

Parameters:
  • img (ndarray) – Image to be padded.
  • shape (tuple[int]) – Expected padding shape (h, w). Default: None.
  • padding (int or tuple[int]) – Padding on each border. If a single int is provided this is used to pad all borders. If tuple of length 2 is provided this is the padding on left/right and top/bottom respectively. If a tuple of length 4 is provided this is the padding for the left, top, right and bottom borders respectively. Default: None. Note that shape and padding can not be both set.
  • pad_val (Number | Sequence[Number]) – Values to be filled in padding areas when padding_mode is ‘constant’. Default: 0.
  • padding_mode (str) –

    Type of padding. Should be: constant, edge, reflect or symmetric. Default: constant.

    • constant: pads with a constant value, this value is specified with pad_val.
    • edge: pads with the last value at the edge of the image.
    • reflect: pads with reflection of image without repeating the last value on the edge. For example, padding [1, 2, 3, 4] with 2 elements on both sides in reflect mode will result in [3, 2, 1, 2, 3, 4, 3, 2].
    • symmetric: pads with reflection of image repeating the last value on the edge. For example, padding [1, 2, 3, 4] with 2 elements on both sides in symmetric mode will result in [2, 1, 1, 2, 3, 4, 4, 3].
Returns:

The padded image.

Return type:

ndarray
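
A small sketch of the two mutually exclusive modes of impad():

>>> import mmcv
>>> import numpy as np
>>> img = np.ones((10, 10, 3), dtype=np.uint8)
>>> mmcv.impad(img, shape=(12, 16), pad_val=0).shape  # pad to a target (h, w)
(12, 16, 3)
>>> # padding is (left, top, right, bottom)
>>> mmcv.impad(img, padding=(1, 2, 3, 4), padding_mode='reflect').shape
(16, 14, 3)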

mmcv.image.impad_to_multiple(img, divisor, pad_val=0)[source]

Pad an image to ensure each edge is a multiple of some number.

Parameters:
  • img (ndarray) – Image to be padded.
  • divisor (int) – Padded image edges will be multiples of divisor.
  • pad_val (Number | Sequence[Number]) – Same as impad().
Returns:

The padded image.

Return type:

ndarray

mmcv.image.imrotate(img, angle, center=None, scale=1.0, border_value=0, interpolation='bilinear', auto_bound=False)[source]

Rotate an image.

Parameters:
  • img (ndarray) – Image to be rotated.
  • angle (float) – Rotation angle in degrees, positive values mean clockwise rotation.
  • center (tuple[float], optional) – Center point (w, h) of the rotation in the source image. If not specified, the center of the image will be used.
  • scale (float) – Isotropic scale factor.
  • border_value (int) – Border value.
  • interpolation (str) – Same as resize().
  • auto_bound (bool) – Whether to adjust the image size to cover the whole rotated image.
Returns:

The rotated image.

Return type:

ndarray

mmcv.image.imfrombytes(content, flag='color', channel_order='bgr', backend=None)[source]

Read an image from bytes.

Parameters:
  • content (bytes) – Image bytes obtained from files or other streams.
  • flag (str) – Same as imread().
  • channel_order (str) – Same as imread().
  • backend (str | None) – The image decoding backend type. Options are cv2, pillow, turbojpeg, None. If backend is None, the global imread_backend specified by mmcv.use_backend() will be used. Default: None.
Returns:

Loaded image array.

Return type:

ndarray

mmcv.image.imread(img_or_path, flag='color', channel_order='bgr', backend=None)[source]

Read an image.

Parameters:
  • img_or_path (ndarray or str or Path) – Either a numpy array or str or pathlib.Path. If it is a numpy array (loaded image), then it will be returned as is.
  • flag (str) – Flags specifying the color type of a loaded image, candidates are color, grayscale, unchanged, color_ignore_orientation and grayscale_ignore_orientation. By default, cv2 and pillow backend would rotate the image according to its EXIF info unless called with unchanged or *_ignore_orientation flags. turbojpeg and tifffile backend always ignore image’s EXIF info regardless of the flag. The turbojpeg backend only supports color and grayscale.
  • channel_order (str) – Order of channel, candidates are bgr and rgb.
  • backend (str | None) – The image decoding backend type. Options are cv2, pillow, turbojpeg, tifffile, None. If backend is None, the global imread_backend specified by mmcv.use_backend() will be used. Default: None.
Returns:

Loaded image array.

Return type:

ndarray

mmcv.image.imwrite(img, file_path, params=None, auto_mkdir=True)[source]

Write image to file.

Parameters:
  • img (ndarray) – Image array to be written.
  • file_path (str) – Image file path.
  • params (None or list) – Same as opencv imwrite() interface.
  • auto_mkdir (bool) – If the parent folder of file_path does not exist, whether to create it automatically.
Returns:

Successful or not.

Return type:

bool
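
For illustration, a typical read/write round trip; the file paths are hypothetical:

>>> import mmcv
>>> img = mmcv.imread('test.jpg')  # BGR ndarray
>>> img_rgb = mmcv.imread('test.jpg', channel_order='rgb')
>>> gray = mmcv.imread('test.jpg', flag='grayscale')
>>> mmcv.imwrite(img, 'out/test_copy.jpg')  # parent dir created automatically
True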

mmcv.image.use_backend(backend)[source]

Select a backend for image decoding.

Parameters:
  • backend (str) – The image decoding backend type. Options are cv2, pillow, turbojpeg (see https://github.com/lilohuang/PyTurboJPEG) and tifffile. turbojpeg is faster but it only supports the .jpeg file format.
mmcv.image.imnormalize(img, mean, std, to_rgb=True)[source]

Normalize an image with mean and std.

Parameters:
  • img (ndarray) – Image to be normalized.
  • mean (ndarray) – The mean to be used for normalization.
  • std (ndarray) – The std to be used for normalization.
  • to_rgb (bool) – Whether to convert to rgb.
Returns:

The normalized image.

Return type:

ndarray
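
A sketch of a common normalization setup; the ImageNet statistics below are only an example. Note that with to_rgb=True the BGR input is converted to RGB first, so mean and std are given in RGB order:

>>> import mmcv
>>> import numpy as np
>>> img = mmcv.imread('test.jpg')  # BGR
>>> mean = np.array([123.675, 116.28, 103.53])
>>> std = np.array([58.395, 57.12, 57.375])
>>> norm = mmcv.imnormalize(img, mean, std, to_rgb=True)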

mmcv.image.imnormalize_(img, mean, std, to_rgb=True)[source]

Inplace normalize an image with mean and std.

Parameters:
  • img (ndarray) – Image to be normalized.
  • mean (ndarray) – The mean to be used for normalization.
  • std (ndarray) – The std to be used for normalization.
  • to_rgb (bool) – Whether to convert to rgb.
Returns:

The normalized image.

Return type:

ndarray

mmcv.image.iminvert(img)[source]

Invert (negate) an image.

Parameters:img (ndarray) – Image to be inverted.
Returns:The inverted image.
Return type:ndarray
mmcv.image.posterize(img, bits)[source]

Posterize an image (reduce the number of bits for each color channel).

Parameters:
  • img (ndarray) – Image to be posterized.
  • bits (int) – Number of bits (1 to 8) to use for posterizing.
Returns:

The posterized image.

Return type:

ndarray

mmcv.image.solarize(img, thr=128)[source]

Solarize an image (invert all pixel values above a threshold).

Parameters:
  • img (ndarray) – Image to be solarized.
  • thr (int) – Threshold for solarizing (0 - 255).
Returns:

The solarized image.

Return type:

ndarray

mmcv.image.rgb2ycbcr(img, y_only=False)[source]

Convert an RGB image to YCbCr image.

This function produces the same results as Matlab’s rgb2ycbcr function. It implements the ITU-R BT.601 conversion for standard-definition television. See more details in https://en.wikipedia.org/wiki/YCbCr#ITU-R_BT.601_conversion.

It differs from a similar function in cv2.cvtColor: RGB <-> YCrCb. In OpenCV, it implements a JPEG conversion. See more details in https://en.wikipedia.org/wiki/YCbCr#JPEG_conversion.

Parameters:
  • img (ndarray) – The input image. It accepts: 1. np.uint8 type with range [0, 255]; 2. np.float32 type with range [0, 1].
  • y_only (bool) – Whether to only return Y channel. Default: False.
Returns:

The converted YCbCr image. The output image has the same type and range as input image.

Return type:

ndarray

mmcv.image.bgr2ycbcr(img, y_only=False)[source]

Convert a BGR image to YCbCr image.

The bgr version of rgb2ycbcr. It implements the ITU-R BT.601 conversion for standard-definition television. See more details in https://en.wikipedia.org/wiki/YCbCr#ITU-R_BT.601_conversion.

It differs from a similar function in cv2.cvtColor: BGR <-> YCrCb. In OpenCV, it implements a JPEG conversion. See more details in https://en.wikipedia.org/wiki/YCbCr#JPEG_conversion.

Parameters:
  • img (ndarray) – The input image. It accepts: 1. np.uint8 type with range [0, 255]; 2. np.float32 type with range [0, 1].
  • y_only (bool) – Whether to only return Y channel. Default: False.
Returns:

The converted YCbCr image. The output image has the same type and range as input image.

Return type:

ndarray

mmcv.image.ycbcr2rgb(img)[source]

Convert a YCbCr image to RGB image.

This function produces the same results as Matlab’s ycbcr2rgb function. It implements the ITU-R BT.601 conversion for standard-definition television. See more details in https://en.wikipedia.org/wiki/YCbCr#ITU-R_BT.601_conversion.

It differs from a similar function in cv2.cvtColor: YCrCb <-> RGB. In OpenCV, it implements a JPEG conversion. See more details in https://en.wikipedia.org/wiki/YCbCr#JPEG_conversion.

Parameters:img (ndarray) – The input image. It accepts: 1. np.uint8 type with range [0, 255]; 2. np.float32 type with range [0, 1].
Returns:The converted RGB image. The output image has the same type and range as input image.
Return type:ndarray
mmcv.image.ycbcr2bgr(img)[source]

Convert a YCbCr image to BGR image.

The bgr version of ycbcr2rgb. It implements the ITU-R BT.601 conversion for standard-definition television. See more details in https://en.wikipedia.org/wiki/YCbCr#ITU-R_BT.601_conversion.

It differs from a similar function in cv2.cvtColor: YCrCb <-> BGR. In OpenCV, it implements a JPEG conversion. See more details in https://en.wikipedia.org/wiki/YCbCr#JPEG_conversion.

Parameters:img (ndarray) – The input image. It accepts: 1. np.uint8 type with range [0, 255]; 2. np.float32 type with range [0, 1].
Returns:The converted BGR image. The output image has the same type and range as input image.
Return type:ndarray
mmcv.image.tensor2imgs(tensor, mean=(0, 0, 0), std=(1, 1, 1), to_rgb=True)[source]

Convert tensor to 3-channel images.

Parameters:
  • tensor (torch.Tensor) – Tensor that contains multiple images, shape (N, C, H, W).
  • mean (tuple[float], optional) – Mean of images. Defaults to (0, 0, 0).
  • std (tuple[float], optional) – Standard deviation of images. Defaults to (1, 1, 1).
  • to_rgb (bool, optional) – Whether the tensor was converted to RGB format in the first place. If so, convert it back to BGR. Defaults to True.
Returns:

A list that contains multiple images.

Return type:

list[np.ndarray]

mmcv.image.imshear(img, magnitude, direction='horizontal', border_value=0, interpolation='bilinear')[source]

Shear an image.

Parameters:
  • img (ndarray) – Image to be sheared with format (h, w) or (h, w, c).
  • magnitude (int | float) – The magnitude used for shear.
  • direction (str) – The shear direction, either “horizontal” or “vertical”.
  • border_value (int | tuple[int]) – Value used in case of a constant border.
  • interpolation (str) – Same as resize().
Returns:

The sheared image.

Return type:

ndarray

mmcv.image.imtranslate(img, offset, direction='horizontal', border_value=0, interpolation='bilinear')[source]

Translate an image.

Parameters:
  • img (ndarray) – Image to be translated with format (h, w) or (h, w, c).
  • offset (int | float) – The offset used for translation.
  • direction (str) – The translate direction, either “horizontal” or “vertical”.
  • border_value (int | tuple[int]) – Value used in case of a constant border.
  • interpolation (str) – Same as resize().
Returns:

The translated image.

Return type:

ndarray

mmcv.image.adjust_color(img, alpha=1, beta=None, gamma=0)[source]

It blends the source image and its gray image:

\[output = img * alpha + gray\_img * beta + gamma\]
Parameters:
  • img (ndarray) – The input source image.
  • alpha (int | float) – Weight for the source image. Default 1.
  • beta (int | float) – Weight for the converted gray image. If None, it’s assigned the value (1 - alpha).
  • gamma (int | float) – Scalar added to each sum. Same as cv2.addWeighted(). Default 0.
Returns:

Colored image which has the same size and dtype as input.

Return type:

ndarray

mmcv.image.imequalize(img)[source]

Equalize the image histogram.

This function applies a non-linear mapping to the input image, in order to create a uniform distribution of grayscale values in the output image.

Parameters:img (ndarray) – Image to be equalized.
Returns:The equalized image.
Return type:ndarray
mmcv.image.adjust_brightness(img, factor=1.0)[source]

Adjust image brightness.

This function controls the brightness of an image. An enhancement factor of 0.0 gives a black image. A factor of 1.0 gives the original image. This function blends the source image and the degenerated black image:

\[output = img * factor + degenerated * (1 - factor)\]
Parameters:
  • img (ndarray) – Image to be brightened.
  • factor (float) – A value that controls the enhancement. Factor 1.0 returns the original image; lower factors mean less color (brightness, contrast, etc.), and higher values mean more. Default 1.0.
Returns:

The brightened image.

Return type:

ndarray

mmcv.image.adjust_contrast(img, factor=1.0)[source]

Adjust image contrast.

This function controls the contrast of an image. An enhancement factor of 0.0 gives a solid grey image. A factor of 1.0 gives the original image. It blends the source image and the degenerated mean image:

\[output = img * factor + degenerated * (1 - factor)\]
Parameters:
  • img (ndarray) – Image to be contrasted. BGR order.
  • factor (float) – Same as mmcv.adjust_brightness().
Returns:

The contrasted image.

Return type:

ndarray

mmcv.image.lut_transform(img, lut_table)[source]

Transform array by look-up table.

The function lut_transform fills the output array with values from the look-up table. Indices of the entries are taken from the input array.

Parameters:
  • img (ndarray) – Image to be transformed.
  • lut_table (ndarray) – look-up table of 256 elements; in case of multi-channel input array, the table should either have a single channel (in this case the same table is used for all channels) or the same number of channels as in the input array.
Returns:

The transformed image.

Return type:

ndarray

mmcv.image.clahe(img, clip_limit=40.0, tile_grid_size=(8, 8))[source]

Use CLAHE method to process the image.

See Zuiderveld, K., “Contrast Limited Adaptive Histogram Equalization”, Graphics Gems, 1994: 474-485, for more information.

Parameters:
  • img (ndarray) – Image to be processed.
  • clip_limit (float) – Threshold for contrast limiting. Default: 40.0.
  • tile_grid_size (tuple[int]) – Size of grid for histogram equalization. Input image will be divided into equally sized rectangular tiles. It defines the number of tiles in row and column. Default: (8, 8).
Returns:

The processed image.

Return type:

ndarray

mmcv.image.adjust_sharpness(img, factor=1.0, kernel=None)[source]

Adjust image sharpness.

This function controls the sharpness of an image. An enhancement factor of 0.0 gives a blurred image. A factor of 1.0 gives the original image. And a factor of 2.0 gives a sharpened image. It blends the source image and the degenerated mean image:

\[output = img * factor + degenerated * (1 - factor)\]

Parameters:
  • img (ndarray) – Image to be sharpened. BGR order.
  • factor (float) – Same as mmcv.adjust_brightness().
  • kernel (np.ndarray, optional) – Filter kernel to be applied on the img to obtain the degenerated img. Defaults to None.

Notes

No value sanity check is enforced on the kernel set by users. So with an inappropriate kernel, adjust_sharpness may fail to perform the function its name indicates and instead apply whatever transform the kernel determines.

Returns:The sharpened image.
Return type:ndarray
mmcv.image.auto_contrast(img, cutoff=0)[source]

Auto adjust image contrast.

This function maximizes (normalizes) image contrast by first removing the cutoff percent of the lightest and darkest pixels from the histogram and then remapping the image so that the darkest pixel becomes black (0) and the lightest becomes white (255).

Parameters:
  • img (ndarray) – Image to be contrasted. BGR order.
  • cutoff (int | float | tuple) – The cutoff percent of the lightest and darkest pixels to be removed. If given as tuple, it shall be (low, high). Otherwise, the single value will be used for both. Defaults to 0.
Returns:

The contrasted image.

Return type:

ndarray

mmcv.image.cutout(img, shape, pad_val=0)[source]

Randomly cut out a rectangle from the original img.

Parameters:
  • img (ndarray) – Image to be cutout.
  • shape (int | tuple[int]) – Expected cutout shape (h, w). If given as an int, the value will be used for both h and w.
  • pad_val (int | float | tuple[int | float]) – Values to be filled in the cut area. Defaults to 0.
Returns:

The cutout image.

Return type:

ndarray

mmcv.image.adjust_lighting(img, eigval, eigvec, alphastd=0.1, to_rgb=True)[source]

AlexNet-style PCA jitter.

This data augmentation is proposed in ImageNet Classification with Deep Convolutional Neural Networks.

Parameters:
  • img (ndarray) – Image whose lighting is to be adjusted. BGR order.
  • eigval (ndarray) – The eigenvalues of the covariance matrix of pixel values.
  • eigvec (ndarray) – The eigenvectors of the covariance matrix of pixel values.
  • alphastd (float) – The standard deviation for the distribution of alpha. Defaults to 0.1.
  • to_rgb (bool) – Whether to convert img to rgb.
Returns:

The adjusted image.

Return type:

ndarray

video

class mmcv.video.VideoReader(filename, cache_capacity=10)[source]

A video class with similar usage to a list object.

This video wrapper class provides convenient APIs to access frames. OpenCV’s VideoCapture class has an issue where jumping to a certain frame may be inaccurate; this class fixes it by checking the position after each jump. A cache is used when decoding videos, so if the same frame is visited a second time, it does not need to be decoded again.

Example:
>>> import mmcv
>>> v = mmcv.VideoReader('sample.mp4')
>>> len(v)  # get the total frame number with `len()`
120
>>> for img in v:  # v is iterable
>>>     mmcv.imshow(img)
>>> v[5]  # get the 6th frame
current_frame()[source]

Get the current frame (frame that is just visited).

Returns:If the video is fresh (no frame has been read yet), return None, otherwise return the frame.
Return type:ndarray or None
cvt2frames(frame_dir, file_start=0, filename_tmpl='{:06d}.jpg', start=0, max_num=0, show_progress=True)[source]

Convert a video to frame images.

Parameters:
  • frame_dir (str) – Output directory to store all the frame images.
  • file_start (int) – Filenames will start from the specified number.
  • filename_tmpl (str) – Filename template with the index as the placeholder.
  • start (int) – The starting frame index.
  • max_num (int) – Maximum number of frames to be written.
  • show_progress (bool) – Whether to show a progress bar.
fourcc

“Four character code” of the video.

Type:str
fps

FPS of the video.

Type:float
frame_cnt

Total frames of the video.

Type:int
get_frame(frame_id)[source]

Get frame by index.

Parameters:frame_id (int) – Index of the expected frame, 0-based.
Returns:Return the frame if successful, otherwise None.
Return type:ndarray or None
height

Height of video frames.

Type:int
opened

Indicate whether the video is opened.

Type:bool
position

Current cursor position, indicating which frame has been decoded.

Type:int
read()[source]

Read the next frame.

If the next frame has been decoded before and is in the cache, then return it directly, otherwise decode, cache and return it.

Returns:Return the frame if successful, otherwise None.
Return type:ndarray or None
resolution

Video resolution (width, height).

Type:tuple
vcap

The raw VideoCapture object.

Type:cv2.VideoCapture
width

Width of video frames.

Type:int
mmcv.video.frames2video(frame_dir, video_file, fps=30, fourcc='XVID', filename_tmpl='{:06d}.jpg', start=0, end=0, show_progress=True)[source]

Read the frame images from a directory and join them as a video.

Parameters:
  • frame_dir (str) – The directory containing video frames.
  • video_file (str) – Output filename.
  • fps (float) – FPS of the output video.
  • fourcc (str) – Fourcc of the output video, this should be compatible with the output file type.
  • filename_tmpl (str) – Filename template with the index as the variable.
  • start (int) – Starting frame index.
  • end (int) – Ending frame index.
  • show_progress (bool) – Whether to show a progress bar.
mmcv.video.convert_video(in_file, out_file, print_cmd=False, pre_options='', **kwargs)[source]

Convert a video with ffmpeg.

This provides a general API for ffmpeg; the executed command is:

`ffmpeg -y <pre_options> -i <in_file> <options> <out_file>`

Options (kwargs) are mapped to ffmpeg commands with the following rules (a usage sketch follows the parameter list):

  • key=val: “-key val”
  • key=True: “-key”
  • key=False: “”
Parameters:
  • in_file (str) – Input video filename.
  • out_file (str) – Output video filename.
  • pre_options (str) – Options appearing before “-i <in_file>”.
  • print_cmd (bool) – Whether to print the final ffmpeg command.
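
For illustration, a hypothetical call and, roughly, the command it executes:

>>> import mmcv
>>> # runs roughly: ffmpeg -y -i in.mp4 -vcodec h264 -an out.mp4
>>> mmcv.convert_video('in.mp4', 'out.mp4', vcodec='h264', an=True)
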
mmcv.video.resize_video(in_file, out_file, size=None, ratio=None, keep_ar=False, log_level='info', print_cmd=False)[source]

Resize a video.

Parameters:
  • in_file (str) – Input video filename.
  • out_file (str) – Output video filename.
  • size (tuple) – Expected size (w, h), e.g., (320, 240) or (320, -1).
  • ratio (tuple or float) – Expected resize ratio, (2, 0.5) means (w*2, h*0.5).
  • keep_ar (bool) – Whether to keep original aspect ratio.
  • log_level (str) – Logging level of ffmpeg.
  • print_cmd (bool) – Whether to print the final ffmpeg command.
mmcv.video.cut_video(in_file, out_file, start=None, end=None, vcodec=None, acodec=None, log_level='info', print_cmd=False)[source]

Cut a clip from a video.

Parameters:
  • in_file (str) – Input video filename.
  • out_file (str) – Output video filename.
  • start (None or float) – Start time (in seconds).
  • end (None or float) – End time (in seconds).
  • vcodec (None or str) – Output video codec, None for unchanged.
  • acodec (None or str) – Output audio codec, None for unchanged.
  • log_level (str) – Logging level of ffmpeg.
  • print_cmd (bool) – Whether to print the final ffmpeg command.
mmcv.video.concat_video(video_list, out_file, vcodec=None, acodec=None, log_level='info', print_cmd=False)[source]

Concatenate multiple videos into a single one.

Parameters:
  • video_list (list) – A list of video filenames.
  • out_file (str) – Output video filename.
  • vcodec (None or str) – Output video codec, None for unchanged.
  • acodec (None or str) – Output audio codec, None for unchanged.
  • log_level (str) – Logging level of ffmpeg.
  • print_cmd (bool) – Whether to print the final ffmpeg command.
mmcv.video.flowread(flow_or_path, quantize=False, concat_axis=0, *args, **kwargs)[source]

Read an optical flow map.

Parameters:
  • flow_or_path (ndarray or str) – A flow map or filepath.
  • quantize (bool) – Whether to read a quantized pair; if set to True, remaining args will be passed to dequantize_flow().
  • concat_axis (int) – The axis that dx and dy are concatenated, can be either 0 or 1. Ignored if quantize is False.
Returns:

Optical flow represented as a (h, w, 2) numpy array.

Return type:

ndarray

mmcv.video.flowwrite(flow, filename, quantize=False, concat_axis=0, *args, **kwargs)[source]

Write optical flow to file.

If the flow is not quantized, it will be saved as a .flo file losslessly, otherwise a jpeg image which is lossy but of much smaller size. (dx and dy will be concatenated horizontally into a single image if quantize is True.)

Parameters:
  • flow (ndarray) – (h, w, 2) array of optical flow.
  • filename (str) – Output filepath.
  • quantize (bool) – Whether to quantize the flow and save it to a jpeg image. If set to True, remaining args will be passed to quantize_flow().
  • concat_axis (int) – The axis that dx and dy are concatenated, can be either 0 or 1. Ignored if quantize is False.
mmcv.video.quantize_flow(flow, max_val=0.02, norm=True)[source]

Quantize flow to [0, 255].

After this step, the size of flow will be much smaller, and can be dumped as jpeg images.

Parameters:
  • flow (ndarray) – (h, w, 2) array of optical flow.
  • max_val (float) – Maximum value of flow, values beyond [-max_val, max_val] will be truncated.
  • norm (bool) – Whether to divide flow values by image width/height.
Returns:

Quantized dx and dy.

Return type:

tuple[ndarray]

mmcv.video.dequantize_flow(dx, dy, max_val=0.02, denorm=True)[source]

Recover from quantized flow.

Parameters:
  • dx (ndarray) – Quantized dx.
  • dy (ndarray) – Quantized dy.
  • max_val (float) – Maximum value used when quantizing.
  • denorm (bool) – Whether to multiply flow values with width/height.
Returns:

Dequantized flow.

Return type:

ndarray
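
A lossy round-trip sketch with quantize_flow() and dequantize_flow():

>>> import mmcv
>>> import numpy as np
>>> flow = np.random.rand(4, 4, 2).astype(np.float32)
>>> dx, dy = mmcv.quantize_flow(flow, max_val=0.02, norm=True)  # two uint8 maps
>>> restored = mmcv.dequantize_flow(dx, dy, max_val=0.02, denorm=True)
>>> restored.shape
(4, 4, 2)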

mmcv.video.flow_warp(img, flow, filling_value=0, interpolate_mode='nearest')[source]

Use flow to warp img.

Parameters:
  • img (ndarray, float or uint8) – Image to be warped.
  • flow (ndarray, float) – Optical Flow.
  • filling_value (int) – The missing pixels will be set with filling_value.
  • interpolate_mode (str) – bilinear -> Bilinear Interpolation; nearest -> Nearest Neighbor.
Returns:

Warped image with the same shape as img.

Return type:

ndarray

arraymisc

mmcv.arraymisc.quantize(arr, min_val, max_val, levels, dtype=<class 'numpy.int64'>)[source]

Quantize an array of (-inf, inf) to [0, levels-1].

Parameters:
  • arr (ndarray) – Input array.
  • min_val (scalar) – Minimum value to be clipped.
  • max_val (scalar) – Maximum value to be clipped.
  • levels (int) – Quantization levels.
  • dtype (np.type) – The type of the quantized array.
Returns:

Quantized array.

Return type:

ndarray

mmcv.arraymisc.dequantize(arr, min_val, max_val, levels, dtype=<class 'numpy.float64'>)[source]

Dequantize an array.

Parameters:
  • arr (ndarray) – Input array.
  • min_val (scalar) – Minimum value to be clipped.
  • max_val (scalar) – Maximum value to be clipped.
  • levels (int) – Quantization levels.
  • dtype (np.type) – The type of the dequantized array.
Returns:

Dequantized array.

Return type:

ndarray
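
A minimal quantize/dequantize round trip; the values are examples only, and the recovery is lossy:

>>> import mmcv
>>> import numpy as np
>>> arr = np.random.rand(3, 3)
>>> q = mmcv.quantize(arr, min_val=0, max_val=1, levels=10)
>>> deq = mmcv.dequantize(q, min_val=0, max_val=1, levels=10)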

visualization

class mmcv.visualization.Color[source]

An enum that defines common colors.

Contains red, green, blue, cyan, yellow, magenta, white and black.

mmcv.visualization.color_val(color)[source]

Convert various input to color tuples.

Parameters:color (Color/str/tuple/int/ndarray) – Color inputs.
Returns:A tuple of 3 integers indicating BGR channels.
Return type:tuple[int]
mmcv.visualization.imshow(img, win_name='', wait_time=0)[source]

Show an image.

Parameters:
  • img (str or ndarray) – The image to be displayed.
  • win_name (str) – The window name.
  • wait_time (int) – Value of waitKey param.
mmcv.visualization.imshow_bboxes(img, bboxes, colors='green', top_k=-1, thickness=1, show=True, win_name='', wait_time=0, out_file=None)[source]

Draw bboxes on an image.

Parameters:
  • img (str or ndarray) – The image to be displayed.
  • bboxes (list or ndarray) – A list of ndarray of shape (k, 4).
  • colors (list[str or tuple or Color]) – A list of colors.
  • top_k (int) – Plot the first k bboxes only if set positive.
  • thickness (int) – Thickness of lines.
  • show (bool) – Whether to show the image.
  • win_name (str) – The window name.
  • wait_time (int) – Value of waitKey param.
  • out_file (str, optional) – The filename to write the image.
Returns:

The image with bboxes drawn on it.

Return type:

ndarray

mmcv.visualization.imshow_det_bboxes(img, bboxes, labels, class_names=None, score_thr=0, bbox_color='green', text_color='green', thickness=1, font_scale=0.5, show=True, win_name='', wait_time=0, out_file=None)[source]

Draw bboxes and class labels (with scores) on an image.

Parameters:
  • img (str or ndarray) – The image to be displayed.
  • bboxes (ndarray) – Bounding boxes (with scores), shaped (n, 4) or (n, 5).
  • labels (ndarray) – Labels of bboxes.
  • class_names (list[str]) – Names of each class.
  • score_thr (float) – Minimum score of bboxes to be shown.
  • bbox_color (str or tuple or Color) – Color of bbox lines.
  • text_color (str or tuple or Color) – Color of texts.
  • thickness (int) – Thickness of lines.
  • font_scale (float) – Font scale of texts.
  • show (bool) – Whether to show the image.
  • win_name (str) – The window name.
  • wait_time (int) – Value of waitKey param.
  • out_file (str or None) – The filename to write the image.
Returns:

The image with bboxes drawn on it.

Return type:

ndarray

mmcv.visualization.flowshow(flow, win_name='', wait_time=0)[source]

Show optical flow.

Parameters:
  • flow (ndarray or str) – The optical flow to be displayed.
  • win_name (str) – The window name.
  • wait_time (int) – Value of waitKey param.
mmcv.visualization.flow2rgb(flow, color_wheel=None, unknown_thr=1000000.0)[source]

Convert flow map to RGB image.

Parameters:
  • flow (ndarray) – Array of optical flow.
  • color_wheel (ndarray or None) – Color wheel used to map flow field to RGB colorspace. Default color wheel will be used if not specified.
  • unknown_thr (float) – Values above this threshold will be marked as unknown and thus ignored.
Returns:

RGB image that can be visualized.

Return type:

ndarray

mmcv.visualization.make_color_wheel(bins=None)[source]

Build a color wheel.

Parameters:bins (list or tuple, optional) – Specify the number of bins for each color range, corresponding to six ranges: red -> yellow, yellow -> green, green -> cyan, cyan -> blue, blue -> magenta, magenta -> red. [15, 6, 4, 11, 13, 6] is used by default (see Middlebury).
Returns:Color wheel of shape (total_bins, 3).
Return type:ndarray

utils

class mmcv.utils.Config(cfg_dict=None, cfg_text=None, filename=None)[source]

A facility for config and config files.

It supports common file formats as configs: python/json/yaml. The interface is the same as a dict object and also allows accessing config values as attributes.

Example

>>> cfg = Config(dict(a=1, b=dict(b1=[0, 1])))
>>> cfg.a
1
>>> cfg.b
{'b1': [0, 1]}
>>> cfg.b.b1
[0, 1]
>>> cfg = Config.fromfile('tests/data/config/a.py')
>>> cfg.filename
"/home/kchen/projects/mmcv/tests/data/config/a.py"
>>> cfg.item4
'test'
>>> cfg
"Config [path: /home/kchen/projects/mmcv/tests/data/config/a.py]: "
"{'item1': [1, 2], 'item2': {'a': 0}, 'item3': True, 'item4': 'test'}"
static auto_argparser(description=None)[source]

Generate argparser from config file automatically (experimental).

static fromstring(cfg_str, file_format)[source]

Generate config from config str.

Parameters:
  • cfg_str (str) – Config str.
  • file_format (str) – Config file format corresponding to the config str. Only py/yml/yaml/json types are supported now.
Returns:

The Config object.

Return type:

Config

merge_from_dict(options, allow_list_keys=True)[source]

Merge list into cfg_dict.

Merge the dict parsed by MultipleKVAction into this cfg.

Examples

>>> options = {'model.backbone.depth': 50,
...            'model.backbone.with_cp':True}
>>> cfg = Config(dict(model=dict(backbone=dict(type='ResNet'))))
>>> cfg.merge_from_dict(options)
>>> cfg_dict = super(Config, self).__getattribute__('_cfg_dict')
>>> assert cfg_dict == dict(
...     model=dict(backbone=dict(depth=50, with_cp=True)))

# Merge list element
>>> cfg = Config(dict(pipeline=[
...     dict(type='LoadImage'), dict(type='LoadAnnotations')]))
>>> options = dict(pipeline={'0': dict(type='SelfLoadImage')})
>>> cfg.merge_from_dict(options, allow_list_keys=True)
>>> cfg_dict = super(Config, self).__getattribute__('_cfg_dict')
>>> assert cfg_dict == dict(pipeline=[
...     dict(type='SelfLoadImage'), dict(type='LoadAnnotations')])

Parameters:
  • options (dict) – dict of configs to merge from.
  • allow_list_keys (bool) – If True, int string keys (e.g. ‘0’, ‘1’) are allowed in options and will replace the element of the corresponding index in the config if the config is a list. Default: True.
class mmcv.utils.ConfigDict(*args, **kwargs)[source]
class mmcv.utils.DictAction(option_strings, dest, nargs=None, const=None, default=None, type=None, choices=None, required=False, help=None, metavar=None)[source]

argparse action to split an argument into KEY=VALUE form on the first “=” and append to a dictionary. List options can be passed as comma separated values, i.e. ‘KEY=V1,V2,V3’, or with explicit brackets, i.e. ‘KEY=[V1,V2,V3]’. It also supports nested brackets to build list/tuple values, e.g. ‘KEY=[(V1,V2),(V3,V4)]’.
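
A minimal sketch of DictAction with argparse; the option name --cfg-options is arbitrary:

>>> import argparse
>>> from mmcv.utils import DictAction
>>> parser = argparse.ArgumentParser()
>>> _ = parser.add_argument('--cfg-options', nargs='+', action=DictAction)
>>> args = parser.parse_args(['--cfg-options', 'lr=0.01', 'steps=[10,20]'])
>>> args.cfg_options
{'lr': 0.01, 'steps': [10, 20]}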

mmcv.utils.collect_env()[source]

Collect the information of the running environments.

Returns:The environment information. The following fields are contained.
  • sys.platform: The variable of sys.platform.
  • Python: Python version.
  • CUDA available: Bool, indicating if CUDA is available.
  • GPU devices: Device type of each GPU.
  • CUDA_HOME (optional): The env var CUDA_HOME.
  • NVCC (optional): NVCC version.
  • GCC: GCC version, “n/a” if GCC is not installed.
  • PyTorch: PyTorch version.
  • PyTorch compiling details: The output of torch.__config__.show().
  • TorchVision (optional): TorchVision version.
  • OpenCV: OpenCV version.
  • MMCV: MMCV version.
  • MMCV Compiler: The GCC version for compiling MMCV ops.
  • MMCV CUDA Compiler: The CUDA version for compiling MMCV ops.
Return type:dict
mmcv.utils.get_logger(name, log_file=None, log_level=20, file_mode='w')[source]

Initialize and get a logger by name.

If the logger has not been initialized, this method will initialize the logger by adding one or two handlers, otherwise the initialized logger will be directly returned. During initialization, a StreamHandler will always be added. If log_file is specified and the process rank is 0, a FileHandler will also be added.

Parameters:
  • name (str) – Logger name.
  • log_file (str | None) – The log filename. If specified, a FileHandler will be added to the logger.
  • log_level (int) – The logger level. Note that only the process of rank 0 is affected, and other processes will set the level to “Error” thus be silent most of the time.
  • file_mode (str) – The file mode used in opening log file. Defaults to ‘w’.
Returns:

The expected logger.

Return type:

logging.Logger
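
A usage sketch; the logger name and log file are hypothetical:

>>> from mmcv.utils import get_logger
>>> logger = get_logger('myproj', log_file='run.log')
>>> logger.info('training started')  # printed and, on rank 0, written to run.log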

mmcv.utils.print_log(msg, logger=None, level=20)[source]

Print a log message.

Parameters:
  • msg (str) – The message to be logged.
  • logger (logging.Logger | str | None) – The logger to be used. Some special loggers are: “silent” (no message will be printed), any other str (the logger obtained with get_root_logger(logger)), and None (the print() method will be used to print log messages).
  • level (int) – Logging level. Only available when logger is a Logger object or “root”.
mmcv.utils.is_str(x)[source]

Whether the input is a string instance.

Note: This method is deprecated since Python 2 is no longer supported.

mmcv.utils.iter_cast(inputs, dst_type, return_type=None)[source]

Cast elements of an iterable object into some type.

Parameters:
  • inputs (Iterable) – The input object.
  • dst_type (type) – Destination type.
  • return_type (type, optional) – If specified, the output object will be converted to this type, otherwise an iterator.
Returns:

The converted object.

Return type:

iterator or specified type

mmcv.utils.list_cast(inputs, dst_type)[source]

Cast elements of an iterable object into a list of some type.

A partial method of iter_cast().

mmcv.utils.tuple_cast(inputs, dst_type)[source]

Cast elements of an iterable object into a tuple of some type.

A partial method of iter_cast().

mmcv.utils.is_seq_of(seq, expected_type, seq_type=None)[source]

Check whether it is a sequence of some type.

Parameters:
  • seq (Sequence) – The sequence to be checked.
  • expected_type (type) – Expected type of sequence items.
  • seq_type (type, optional) – Expected sequence type.
Returns:

Whether the sequence is valid.

Return type:

bool

mmcv.utils.is_list_of(seq, expected_type)[source]

Check whether it is a list of some type.

A partial method of is_seq_of().

mmcv.utils.is_tuple_of(seq, expected_type)[source]

Check whether it is a tuple of some type.

A partial method of is_seq_of().

mmcv.utils.slice_list(in_list, lens)[source]

Slice a list into several sub-lists by a list of given lengths.

Parameters:
  • in_list (list) – The list to be sliced.
  • lens (int or list) – The expected length of each output list.
Returns:

A list of sliced lists.

Return type:

list

mmcv.utils.concat_list(in_list)[source]

Concatenate a list of lists into a single flat list.

Parameters:in_list (list) – The list of lists to be merged.
Returns:The concatenated flat list.
Return type:list
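
A quick sketch of slice_list() and concat_list() as inverses of each other:

>>> import mmcv
>>> mmcv.slice_list([1, 2, 3, 4, 5], [2, 3])
[[1, 2], [3, 4, 5]]
>>> mmcv.concat_list([[1, 2], [3, 4, 5]])
[1, 2, 3, 4, 5]
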
mmcv.utils.check_prerequisites(prerequisites, checker, msg_tmpl='Prerequisites "{}" are required in method "{}" but not found, please install them first.')[source]

A decorator factory to check if prerequisites are satisfied.

Parameters:
  • prerequisites (str or list[str]) – Prerequisites to be checked.
  • checker (callable) – The checker method that returns True if a prerequisite is met, False otherwise.
  • msg_tmpl (str) – The message template with two variables.
Returns:

A specific decorator.

Return type:

decorator

mmcv.utils.requires_package(prerequisites)[source]

A decorator to check if some python packages are installed.

Example

>>> @requires_package('numpy')
>>> def func(arg1, args):
>>>     return numpy.zeros(1)
array([0.])
>>> @requires_package(['numpy', 'non_package'])
>>> def func(arg1, args):
>>>     return numpy.zeros(1)
ImportError
mmcv.utils.requires_executable(prerequisites)[source]

A decorator to check if some executable files are installed.

Example

>>> @requires_executable('ffmpeg')
>>> def func(arg1, args):
>>>     print(1)
1
mmcv.utils.scandir(dir_path, suffix=None, recursive=False)[source]

Scan a directory to find the interested files.

Parameters:
  • dir_path (str | Path) – Path of the directory.
  • suffix (str | tuple(str), optional) – File suffix that we are interested in. Default: None.
  • recursive (bool, optional) – If set to True, recursively scan the directory. Default: False.
Returns:

A generator for all the interested files with relative paths.

class mmcv.utils.ProgressBar(task_num=0, bar_width=50, start=True, file=<_io.TextIOWrapper name='<stdout>' mode='w' encoding='UTF-8'>)[source]

A progress bar which can print the progress.

mmcv.utils.track_progress(func, tasks, bar_width=50, file=<_io.TextIOWrapper name='<stdout>' mode='w' encoding='UTF-8'>, **kwargs)[source]

Track the progress of tasks execution with a progress bar.

Tasks are done with a simple for-loop.

Parameters:
  • func (callable) – The function to be applied to each task.
  • tasks (list or tuple[Iterable, int]) – A list of tasks or (tasks, total num).
  • bar_width (int) – Width of progress bar.
Returns:

The task results.

Return type:

list
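
For illustration (a progress bar is printed to stdout while the tasks run):

>>> import mmcv
>>> def square(x):
>>>     return x * x
>>> mmcv.track_progress(square, [1, 2, 3])
[1, 4, 9]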

mmcv.utils.track_iter_progress(tasks, bar_width=50, file=<_io.TextIOWrapper name='<stdout>' mode='w' encoding='UTF-8'>)[source]

Track the progress of tasks iteration or enumeration with a progress bar.

Tasks are yielded with a simple for-loop.

Parameters:
  • tasks (list or tuple[Iterable, int]) – A list of tasks or (tasks, total num).
  • bar_width (int) – Width of progress bar.
Yields:

list – The task results.

mmcv.utils.track_parallel_progress(func, tasks, nproc, initializer=None, initargs=None, bar_width=50, chunksize=1, skip_first=False, keep_order=True, file=<_io.TextIOWrapper name='<stdout>' mode='w' encoding='UTF-8'>)[source]

Track the progress of parallel task execution with a progress bar.

The built-in multiprocessing module is used for process pools and tasks are done with Pool.map() or Pool.imap_unordered().

Parameters:
  • func (callable) – The function to be applied to each task.
  • tasks (list or tuple[Iterable, int]) – A list of tasks or (tasks, total num).
  • nproc (int) – Process (worker) number.
  • initializer (None or callable) – Refer to multiprocessing.Pool for details.
  • initargs (None or tuple) – Refer to multiprocessing.Pool for details.
  • chunksize (int) – Refer to multiprocessing.Pool for details.
  • bar_width (int) – Width of progress bar.
  • skip_first (bool) – Whether to skip the first sample for each worker when estimating fps, since the initialization step may take longer.
  • keep_order (bool) – If True, Pool.imap() is used, otherwise Pool.imap_unordered() is used.
Returns:

The task results.

Return type:

list

class mmcv.utils.Registry(name, build_func=None, parent=None, scope=None)[source]

A registry to map strings to classes.

Registered objects can be built from the registry.

Example

>>> MODELS = Registry('models')
>>> @MODELS.register_module()
>>> class ResNet:
>>>     pass
>>> resnet = MODELS.build(dict(type='ResNet'))

Please refer to https://mmcv.readthedocs.io/en/latest/registry.html for advanced usage.

Parameters:
  • name (str) – Registry name.
  • build_func (func, optional) – Build function to construct an instance from the Registry. build_from_cfg is used if neither parent nor build_func is specified. If parent is specified and build_func is not given, build_func will be inherited from parent. Default: None.
  • parent (Registry, optional) – Parent registry. Classes registered in a child registry can be built from the parent. Default: None.
  • scope (str, optional) – The scope of registry. It is the key to search for children registry. If not specified, scope will be the name of the package where class is defined, e.g. mmdet, mmcls, mmseg. Default: None.
get(key)[source]

Get the registry record.

Parameters:key (str) – The class name in string format.
Returns:The corresponding class.
Return type:class
static infer_scope()[source]

Infer the scope of registry.

The name of the package where registry is defined will be returned.

Example

# in mmdet/models/backbone/resnet.py
>>> MODELS = Registry('models')
>>> @MODELS.register_module()
>>> class ResNet:
>>>     pass

The scope of ResNet will be mmdet.

Returns:The inferred scope name.
Return type:scope (str)
register_module(name=None, force=False, module=None)[source]

Register a module.

A record will be added to self._module_dict, whose key is the class name or the specified name, and value is the class itself. It can be used as a decorator or a normal function.

Example

>>> backbones = Registry('backbone')
>>> @backbones.register_module()
>>> class ResNet:
>>>     pass
>>> backbones = Registry('backbone')
>>> @backbones.register_module(name='mnet')
>>> class MobileNet:
>>>     pass
>>> backbones = Registry('backbone')
>>> class ResNet:
>>>     pass
>>> backbones.register_module(ResNet)
Parameters:
  • name (str | None) – The module name to be registered. If not specified, the class name will be used.
  • force (bool, optional) – Whether to override an existing class with the same name. Default: False.
  • module (type) – Module class to be registered.
static split_scope_key(key)[source]

Split scope and key.

The first scope will be split from key.

Examples

>>> Registry.split_scope_key('mmdet.ResNet')
'mmdet', 'ResNet'
>>> Registry.split_scope_key('ResNet')
None, 'ResNet'
Returns:
  • scope (str, None) – The first scope.
  • key (str) – The remaining key.
mmcv.utils.build_from_cfg(cfg, registry, default_args=None)[source]

Build a module from config dict.

Parameters:
  • cfg (dict) – Config dict. It should at least contain the key “type”.
  • registry (Registry) – The registry to search the type from.
  • default_args (dict, optional) – Default initialization arguments.
Returns:

The constructed object.

Return type:

object
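
A minimal sketch tying Registry and build_from_cfg together:

>>> from mmcv.utils import Registry, build_from_cfg
>>> MODELS = Registry('models')
>>> @MODELS.register_module()
>>> class ResNet:
>>>     def __init__(self, depth):
>>>         self.depth = depth
>>> model = build_from_cfg(dict(type='ResNet', depth=50), MODELS)
>>> model.depth
50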

class mmcv.utils.Timer(start=True, print_tmpl=None)[source]

A flexible Timer class.

Example:
>>> import time
>>> import mmcv
>>> with mmcv.Timer():
>>>     # simulate a code block that will run for 1s
>>>     time.sleep(1)
1.000
>>> with mmcv.Timer(print_tmpl='it takes {:.1f} seconds'):
>>>     # simulate a code block that will run for 1s
>>>     time.sleep(1)
it takes 1.0 seconds
>>> timer = mmcv.Timer()
>>> time.sleep(0.5)
>>> print(timer.since_start())
0.500
>>> time.sleep(0.5)
>>> print(timer.since_last_check())
0.500
>>> print(timer.since_start())
1.000
is_running

Indicates whether the timer is running.

Type:bool
since_last_check()[source]

Time since the last checking.

Either since_start() or since_last_check() is a checking operation.

Returns (float): Time in seconds.

since_start()[source]

Total time since the timer is started.

Returns (float): Time in seconds.

start()[source]

Start the timer.

exception mmcv.utils.TimerError(message)[source]
mmcv.utils.check_time(timer_id)[source]

Add check points in a single line.

This method is suitable for running a task on a list of items. A timer will be registered when the method is called for the first time.

Example:
>>> import time
>>> import mmcv
>>> for i in range(1, 6):
>>>     # simulate a code block
>>>     time.sleep(i)
>>>     mmcv.check_time('task1')
2.000
3.000
4.000
5.000
Parameters:timer_id (str) – Timer identifier.
class mmcv.utils.SyncBatchNorm(num_features: int, eps: float = 1e-05, momentum: float = 0.1, affine: bool = True, track_running_stats: bool = True, process_group: Optional[Any] = None, device=None, dtype=None)[source]
class mmcv.utils.BuildExtension(*args, **kwargs)[source]

A custom setuptools build extension.

This setuptools.build_ext subclass takes care of passing the minimum required compiler flags (e.g. -std=c++14) as well as mixed C++/CUDA compilation (and support for CUDA files in general).

When using BuildExtension, it is allowed to supply a dictionary for extra_compile_args (rather than the usual list) that maps from languages (cxx or nvcc) to a list of additional compiler flags to supply to the compiler. This makes it possible to supply different flags to the C++ and CUDA compiler during mixed compilation.

use_ninja (bool): If use_ninja is True (default), then we attempt to build using the Ninja backend. Ninja greatly speeds up compilation compared to the standard setuptools.build_ext. Falls back to the standard distutils backend if Ninja is not available.

Note

By default, the Ninja backend uses #CPUS + 2 workers to build the extension. This may use up too many resources on some systems. One can control the number of workers by setting the MAX_JOBS environment variable to a non-negative number.

finalize_options() → None[source]

Set final values for all the options that this command supports. This is always called as late as possible, i.e. after any option assignments from the command line or from other commands have been done. Thus, this is the place to code option dependencies: if ‘foo’ depends on ‘bar’, then it is safe to set ‘foo’ from ‘bar’ as long as ‘foo’ still has the same value it was assigned in initialize_options().

This method must be implemented by all command classes.

get_ext_filename(ext_name)[source]

Convert the name of an extension (e.g. “foo.bar”) into the name of the file from which it will be loaded (e.g. “foo/bar.so”, or “foo\bar.pyd”).

classmethod with_options(**options)[source]

Returns a subclass with alternative constructor that extends any original keyword arguments to the original constructor with the given options.
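
For instance, a hedged sketch that disables the Ninja backend via the use_ninja option documented above:

>>> cmdclass = {'build_ext': BuildExtension.with_options(use_ninja=False)}
>>> # pass cmdclass to setup() as in the CppExtension example below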

mmcv.utils.CppExtension(name, sources, *args, **kwargs)[source]

Creates a setuptools.Extension for C++.

Convenience method that creates a setuptools.Extension with the bare minimum (but often sufficient) arguments to build a C++ extension.

All arguments are forwarded to the setuptools.Extension constructor.

Example

>>> from setuptools import setup
>>> from torch.utils.cpp_extension import BuildExtension, CppExtension
>>> setup(
        name='extension',
        ext_modules=[
            CppExtension(
                name='extension',
                sources=['extension.cpp'],
                extra_compile_args=['-g']),
        ],
        cmdclass={
            'build_ext': BuildExtension
        })
mmcv.utils.CUDAExtension(name, sources, *args, **kwargs)[source]

Creates a setuptools.Extension for CUDA/C++.

Convenience method that creates a setuptools.Extension with the bare minimum (but often sufficient) arguments to build a CUDA/C++ extension. This includes the CUDA include path, library path and runtime library.

All arguments are forwarded to the setuptools.Extension constructor.

Example

>>> from setuptools import setup
>>> from torch.utils.cpp_extension import BuildExtension, CUDAExtension
>>> setup(
        name='cuda_extension',
        ext_modules=[
            CUDAExtension(
                    name='cuda_extension',
                    sources=['extension.cpp', 'extension_kernel.cu'],
                    extra_compile_args={'cxx': ['-g'],
                                        'nvcc': ['-O2']})
        ],
        cmdclass={
            'build_ext': BuildExtension
        })

Compute capabilities:

By default the extension will be compiled to run on all archs of the cards visible during the building process of the extension, plus PTX. If down the road a new card is installed the extension may need to be recompiled. If a visible card has a compute capability (CC) that’s newer than the newest version for which your nvcc can build fully-compiled binaries, PyTorch will make nvcc fall back to building kernels with the newest version of PTX your nvcc does support (see below for details on PTX).

You can override the default behavior using TORCH_CUDA_ARCH_LIST to explicitly specify which CCs you want the extension to support:

TORCH_CUDA_ARCH_LIST="6.1 8.6" python build_my_extension.py
TORCH_CUDA_ARCH_LIST="5.2 6.0 6.1 7.0 7.5 8.0 8.6+PTX" python build_my_extension.py

The +PTX option causes extension kernel binaries to include PTX instructions for the specified CC. PTX is an intermediate representation that allows kernels to runtime-compile for any CC >= the specified CC (for example, 8.6+PTX generates PTX that can runtime-compile for any GPU with CC >= 8.6). This improves your binary’s forward compatibility. However, relying on older PTX to provide forward compat by runtime-compiling for newer CCs can modestly reduce performance on those newer CCs. If you know exact CC(s) of the GPUs you want to target, you’re always better off specifying them individually. For example, if you want your extension to run on 8.0 and 8.6, “8.0+PTX” would work functionally because it includes PTX that can runtime-compile for 8.6, but “8.0 8.6” would be better.

Note that while it’s possible to include all supported archs, the more archs get included the slower the building process will be, as it will build a separate kernel image for each arch.

class mmcv.utils.DataLoader(dataset: torch.utils.data.dataset.Dataset[+T_co][T_co], batch_size: Optional[int] = 1, shuffle: bool = False, sampler: Optional[torch.utils.data.sampler.Sampler[int][int]] = None, batch_sampler: Optional[torch.utils.data.sampler.Sampler[typing.Sequence[int]][Sequence[int]]] = None, num_workers: int = 0, collate_fn: Optional[Callable[[List[T]], Any]] = None, pin_memory: bool = False, drop_last: bool = False, timeout: float = 0, worker_init_fn: Optional[Callable[[int], None]] = None, multiprocessing_context=None, generator=None, *, prefetch_factor: int = 2, persistent_workers: bool = False)[source]

Data loader. Combines a dataset and a sampler, and provides an iterable over the given dataset.

The DataLoader supports both map-style and iterable-style datasets with single- or multi-process loading, customizing loading order and optional automatic batching (collation) and memory pinning.

See torch.utils.data documentation page for more details.

Parameters:
  • dataset (Dataset) – dataset from which to load the data.
  • batch_size (int, optional) – how many samples per batch to load (default: 1).
  • shuffle (bool, optional) – set to True to have the data reshuffled at every epoch (default: False).
  • sampler (Sampler or Iterable, optional) – defines the strategy to draw samples from the dataset. Can be any Iterable with __len__ implemented. If specified, shuffle must not be specified.
  • batch_sampler (Sampler or Iterable, optional) – like sampler, but returns a batch of indices at a time. Mutually exclusive with batch_size, shuffle, sampler, and drop_last.
  • num_workers (int, optional) – how many subprocesses to use for data loading. 0 means that the data will be loaded in the main process. (default: 0)
  • collate_fn (callable, optional) – merges a list of samples to form a mini-batch of Tensor(s). Used when using batched loading from a map-style dataset.
  • pin_memory (bool, optional) – If True, the data loader will copy Tensors into CUDA pinned memory before returning them. If your data elements are a custom type, or your collate_fn returns a batch that is a custom type, see the example below.
  • drop_last (bool, optional) – set to True to drop the last incomplete batch, if the dataset size is not divisible by the batch size. If False and the size of dataset is not divisible by the batch size, then the last batch will be smaller. (default: False)
  • timeout (numeric, optional) – if positive, the timeout value for collecting a batch from workers. Should always be non-negative. (default: 0)
  • worker_init_fn (callable, optional) – If not None, this will be called on each worker subprocess with the worker id (an int in [0, num_workers - 1]) as input, after seeding and before data loading. (default: None)
  • generator (torch.Generator, optional) – If not None, this RNG will be used by RandomSampler to generate random indexes and multiprocessing to generate base_seed for workers. (default: None)
  • prefetch_factor (int, optional, keyword-only arg) – Number of samples loaded in advance by each worker. 2 means there will be a total of 2 * num_workers samples prefetched across all workers. (default: 2)
  • persistent_workers (bool, optional) – If True, the data loader will not shutdown the worker processes after a dataset has been consumed once. This allows to maintain the workers Dataset instances alive. (default: False)

Warning

If the spawn start method is used, worker_init_fn cannot be an unpicklable object, e.g., a lambda function. See multiprocessing-best-practices on more details related to multiprocessing in PyTorch.

Warning

len(dataloader) heuristic is based on the length of the sampler used. When dataset is an IterableDataset, it instead returns an estimate based on len(dataset) / batch_size, with proper rounding depending on drop_last, regardless of multi-process loading configurations. This represents the best guess PyTorch can make because PyTorch trusts user dataset code in correctly handling multi-process loading to avoid duplicate data.

However, if sharding results in multiple workers having incomplete last batches, this estimate can still be inaccurate, because (1) an otherwise complete batch can be broken into multiple ones and (2) more than one batch worth of samples can be dropped when drop_last is set. Unfortunately, PyTorch can not detect such cases in general.

See `Dataset Types`_ for more details on these two types of datasets and how IterableDataset interacts with `Multi-process data loading`_.

Warning

See reproducibility, and dataloader-workers-random-seed, and data-loading-randomness notes for random seed related questions.
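
A minimal usage sketch (the TensorDataset here is only for illustration):

>>> import torch
>>> from torch.utils.data import TensorDataset
>>> dataset = TensorDataset(torch.arange(10).float())
>>> loader = DataLoader(dataset, batch_size=4)
>>> [batch[0].shape for batch in loader]
[torch.Size([4]), torch.Size([4]), torch.Size([2])]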

mmcv.utils.PoolDataLoader

alias of torch.utils.data.dataloader.DataLoader

mmcv.utils.deprecated_api_warning(name_dict, cls_name=None)[source]

A decorator to check if some arguments are deprecated and, if so, replace the deprecated src_arg_name with dst_arg_name.

Parameters:name_dict (dict) – key (str): Deprecated argument name. val (str): Expected argument name.
Returns:New function.
Return type:func
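A minimal sketch; the function and argument names below are hypothetical:

>>> @deprecated_api_warning(name_dict=dict(old_key='new_key'))
>>> def func(new_key=None):
>>>     return new_key
>>> func(old_key=1)  # warns that old_key is deprecated, then forwards the value to new_key
1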
mmcv.utils.digit_version(version_str: str, length: int = 4)[source]

Convert a version string into a tuple of integers.

This method is usually used for comparing two versions. For pre-release versions: alpha < beta < rc.

Parameters:
  • version_str (str) – The version string.
  • length (int) – The maximum number of version levels. Default: 4.
Returns:

The version info in digits (integers).

Return type:

tuple[int]
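
Example (versions compare via ordinary tuple ordering, so pre-releases sort before releases):

>>> digit_version('1.3.1') > digit_version('1.3.0')
True
>>> digit_version('1.0.0rc1') < digit_version('1.0.0')
True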

mmcv.utils.get_git_hash(fallback='unknown', digits=None)[source]

Get the git hash of the current repo.

Parameters:
  • fallback (str, optional) – The fallback string when git hash is unavailable. Defaults to ‘unknown’.
  • digits (int, optional) – kept digits of the hash. Defaults to None, meaning all digits are kept.
Returns:

Git commit hash.

Return type:

str

mmcv.utils.import_modules_from_strings(imports, allow_failed_imports=False)[source]

Import modules from the given list of strings.

Parameters:
  • imports (list | str | None) – The given module names to be imported.
  • allow_failed_imports (bool) – If True, the failed imports will return None. Otherwise, an ImportError is raised. Default: False.
Returns:

The imported modules.

Return type:

list[module] | module | None

Examples

>>> osp, sys = import_modules_from_strings(
...     ['os.path', 'sys'])
>>> import os.path as osp_
>>> import sys as sys_
>>> assert osp == osp_
>>> assert sys == sys_
mmcv.utils.assert_dict_contains_subset(dict_obj: Dict[Any, Any], expected_subset: Dict[Any, Any]) → bool[source]

Check if the dict_obj contains the expected_subset.

Parameters:
  • dict_obj (Dict[Any, Any]) – Dict object to be checked.
  • expected_subset (Dict[Any, Any]) – Subset expected to be contained in dict_obj.
Returns:

Whether the dict_obj contains the expected_subset.

Return type:

bool
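
For example:

>>> assert_dict_contains_subset(dict(a=1, b=2), dict(a=1))
True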

mmcv.utils.assert_attrs_equal(obj: Any, expected_attrs: Dict[str, Any]) → bool[source]

Check if attribute of class object is correct.

Parameters:
  • obj (object) – Class object to be checked.
  • expected_attrs (Dict[str, Any]) – Dict of the expected attrs.
Returns:

Whether the attribute of class object is correct.

Return type:

bool

mmcv.utils.assert_dict_has_keys(obj: Dict[str, Any], expected_keys: List[str]) → bool[source]

Check if the obj has all the expected_keys.

Parameters:
  • obj (Dict[str, Any]) – Object to be checked.
  • expected_keys (List[str]) – Keys expected to be contained in the keys of the obj.
Returns:

Whether the obj has the expected keys.

Return type:

bool

mmcv.utils.assert_keys_equal(result_keys: List[str], target_keys: List[str]) → bool[source]

Check if target_keys is equal to result_keys.

Parameters:
  • result_keys (List[str]) – Result keys to be checked.
  • target_keys (List[str]) – Target keys to be checked.
Returns:

Whether target_keys is equal to result_keys.

Return type:

bool

mmcv.utils.assert_is_norm_layer(module) → bool[source]

Check if the module is a norm layer.

Parameters:module (nn.Module) – The module to be checked.
Returns:Whether the module is a norm layer.
Return type:bool
mmcv.utils.assert_params_all_zeros(module) → bool[source]

Check if the parameters of the module are all zeros.

Parameters:module (nn.Module) – The module to be checked.
Returns:Whether the parameters of the module are all zeros.
Return type:bool
mmcv.utils.check_python_script(cmd)[source]

Run the python cmd script with __main__. The difference from os.system is that this function executes code in the current process, so that it can be tracked by coverage tools. Currently it supports two forms:

  • ./tests/data/scripts/hello.py zz
  • python tests/data/scripts/hello.py zz
mmcv.utils.is_method_overridden(method, base_class, derived_class)[source]

Check if a method of base class is overridden in derived class.

Parameters:
  • method (str) – the method name to check.
  • base_class (type) – the class of the base class.
  • derived_class (type | Any) – the class or instance of the derived class.
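
Example (a minimal sketch):

>>> class Base:
>>>     def foo(self):
>>>         pass
>>> class Derived(Base):
>>>     def foo(self):
>>>         pass
>>> is_method_overridden('foo', Base, Derived)
True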

cnn

class mmcv.cnn.AlexNet(num_classes=-1)[source]

AlexNet backbone.

Parameters:num_classes (int) – number of classes for classification.
forward(x)[source]

Defines the computation performed at every call.

Should be overridden by all subclasses.

Note

Although the recipe for forward pass needs to be defined within this function, one should call the Module instance afterwards instead of this since the former takes care of running the registered hooks while the latter silently ignores them.

class mmcv.cnn.VGG(depth, with_bn=False, num_classes=-1, num_stages=5, dilations=(1, 1, 1, 1, 1), out_indices=(0, 1, 2, 3, 4), frozen_stages=-1, bn_eval=True, bn_frozen=False, ceil_mode=False, with_last_pool=True)[source]

VGG backbone.

Parameters:
  • depth (int) – Depth of vgg, from {11, 13, 16, 19}.
  • with_bn (bool) – Use BatchNorm or not.
  • num_classes (int) – number of classes for classification.
  • num_stages (int) – VGG stages, normally 5.
  • dilations (Sequence[int]) – Dilation of each stage.
  • out_indices (Sequence[int]) – Output from which stages.
  • frozen_stages (int) – Stages to be frozen (all param fixed). -1 means not freezing any parameters.
  • bn_eval (bool) – Whether to set BN layers as eval mode, namely, freeze running stats (mean and var).
  • bn_frozen (bool) – Whether to freeze weight and bias of BN layers.
forward(x)[source]

Defines the computation performed at every call.

Should be overridden by all subclasses.

Note

Although the recipe for forward pass needs to be defined within this function, one should call the Module instance afterwards instead of this since the former takes care of running the registered hooks while the latter silently ignores them.

train(mode=True)[source]

Sets the module in training mode.

This has any effect only on certain modules. See documentations of particular modules for details of their behaviors in training/evaluation mode, if they are affected, e.g. Dropout, BatchNorm, etc.

Parameters:mode (bool) – whether to set training mode (True) or evaluation mode (False). Default: True.
Returns:self
Return type:Module
class mmcv.cnn.ResNet(depth, num_stages=4, strides=(1, 2, 2, 2), dilations=(1, 1, 1, 1), out_indices=(0, 1, 2, 3), style='pytorch', frozen_stages=-1, bn_eval=True, bn_frozen=False, with_cp=False)[source]

ResNet backbone.

Parameters:
  • depth (int) – Depth of resnet, from {18, 34, 50, 101, 152}.
  • num_stages (int) – Resnet stages, normally 4.
  • strides (Sequence[int]) – Strides of the first block of each stage.
  • dilations (Sequence[int]) – Dilation of each stage.
  • out_indices (Sequence[int]) – Output from which stages.
  • style (str) – pytorch or caffe. If set to “pytorch”, the stride-two layer is the 3x3 conv layer, otherwise the stride-two layer is the first 1x1 conv layer.
  • frozen_stages (int) – Stages to be frozen (all param fixed). -1 means not freezing any parameters.
  • bn_eval (bool) – Whether to set BN layers as eval mode, namely, freeze running stats (mean and var).
  • bn_frozen (bool) – Whether to freeze weight and bias of BN layers.
  • with_cp (bool) – Use checkpoint or not. Using checkpoint will save some memory while slowing down the training speed.
forward(x)[source]

Defines the computation performed at every call.

Should be overridden by all subclasses.

Note

Although the recipe for forward pass needs to be defined within this function, one should call the Module instance afterwards instead of this since the former takes care of running the registered hooks while the latter silently ignores them.

train(mode=True)[source]

Sets the module in training mode.

This has any effect only on certain modules. See documentations of particular modules for details of their behaviors in training/evaluation mode, if they are affected, e.g. Dropout, BatchNorm, etc.

Parameters:mode (bool) – whether to set training mode (True) or evaluation mode (False). Default: True.
Returns:self
Return type:Module
mmcv.cnn.bias_init_with_prob(prior_prob)[source]

Initialize conv/fc bias value according to a given probability value.
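
A hedged usage sketch; the returned bias b is chosen so that sigmoid(b) equals the given prior probability:

>>> b = bias_init_with_prob(0.01)
>>> # torch.sigmoid(torch.tensor(b)) is approximately 0.01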

class mmcv.cnn.ConvModule(in_channels, out_channels, kernel_size, stride=1, padding=0, dilation=1, groups=1, bias='auto', conv_cfg=None, norm_cfg=None, act_cfg={'type': 'ReLU'}, inplace=True, with_spectral_norm=False, padding_mode='zeros', order=('conv', 'norm', 'act'))[source]

A conv block that bundles conv/norm/activation layers.

This block simplifies the usage of convolution layers, which are commonly used with a norm layer (e.g., BatchNorm) and activation layer (e.g., ReLU). It is based upon three build methods: build_conv_layer(), build_norm_layer() and build_activation_layer().

Besides, we add some additional features in this module:

  1. Automatically set bias of the conv layer.
  2. Spectral norm is supported.
  3. More padding modes are supported. Before PyTorch 1.5, nn.Conv2d only supports zero and circular padding, and we add “reflect” padding mode.

Parameters:
  • in_channels (int) – Number of channels in the input feature map. Same as that in nn._ConvNd.
  • out_channels (int) – Number of channels produced by the convolution. Same as that in nn._ConvNd.
  • kernel_size (int | tuple[int]) – Size of the convolving kernel. Same as that in nn._ConvNd.
  • stride (int | tuple[int]) – Stride of the convolution. Same as that in nn._ConvNd.
  • padding (int | tuple[int]) – Zero-padding added to both sides of the input. Same as that in nn._ConvNd.
  • dilation (int | tuple[int]) – Spacing between kernel elements. Same as that in nn._ConvNd.
  • groups (int) – Number of blocked connections from input channels to output channels. Same as that in nn._ConvNd.
  • bias (bool | str) – If specified as auto, it will be decided by the norm_cfg. Bias will be set as True if norm_cfg is None, otherwise False. Default: “auto”.
  • conv_cfg (dict) – Config dict for convolution layer. Default: None, which means using conv2d.
  • norm_cfg (dict) – Config dict for normalization layer. Default: None.
  • act_cfg (dict) – Config dict for activation layer. Default: dict(type=’ReLU’).
  • inplace (bool) – Whether to use inplace mode for activation. Default: True.
  • with_spectral_norm (bool) – Whether use spectral norm in conv module. Default: False.
  • padding_mode (str) – If the padding_mode has not been supported by current Conv2d in PyTorch, we will use our own padding layer instead. Currently, we support [‘zeros’, ‘circular’] with official implementation and [‘reflect’] with our own implementation. Default: ‘zeros’.
  • order (tuple[str]) – The order of conv/norm/activation layers. It is a sequence of “conv”, “norm” and “act”. Common examples are (“conv”, “norm”, “act”) and (“act”, “conv”, “norm”). Default: (‘conv’, ‘norm’, ‘act’).
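Example (a minimal sketch of a conv + BN + ReLU block; the shapes are illustrative):

>>> import torch
>>> from mmcv.cnn import ConvModule
>>> block = ConvModule(3, 16, 3, padding=1, norm_cfg=dict(type='BN'))
>>> x = torch.rand(1, 3, 32, 32)
>>> block(x).shape
torch.Size([1, 16, 32, 32])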
forward(x, activate=True, norm=True)[source]

Defines the computation performed at every call.

Should be overridden by all subclasses.

Note

Although the recipe for forward pass needs to be defined within this function, one should call the Module instance afterwards instead of this since the former takes care of running the registered hooks while the latter silently ignores them.

mmcv.cnn.build_activation_layer(cfg)[source]

Build activation layer.

Parameters:cfg (dict) – The activation layer config, which should contain: - type (str): Layer type. - layer args: Args needed to instantiate an activation layer.
Returns:Created activation layer.
Return type:nn.Module
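For example, a minimal sketch:

>>> from mmcv.cnn import build_activation_layer
>>> act = build_activation_layer(dict(type='ReLU', inplace=True))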
mmcv.cnn.build_conv_layer(cfg, *args, **kwargs)[source]

Build convolution layer.

Parameters:
  • cfg (None or dict) – The conv layer config, which should contain: - type (str): Layer type. - layer args: Args needed to instantiate an conv layer.
  • args (argument list) – Arguments passed to the __init__ method of the corresponding conv layer.
  • kwargs (keyword arguments) – Keyword arguments passed to the __init__ method of the corresponding conv layer.
Returns:

Created conv layer.

Return type:

nn.Module

mmcv.cnn.build_norm_layer(cfg, num_features, postfix='')[source]

Build normalization layer.

Parameters:
  • cfg (dict) –

    The norm layer config, which should contain:

    • type (str): Layer type.
    • layer args: Args needed to instantiate a norm layer.
    • requires_grad (bool, optional): Whether stop gradient updates.
  • num_features (int) – Number of input channels.
  • postfix (int | str) – The postfix to be appended into norm abbreviation to create named layer.
Returns: The first element is the layer name consisting of abbreviation and postfix, e.g., bn1, gn. The second element is the created norm layer.

Return type:

(str, nn.Module)
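
Example (a minimal sketch; ‘BN’ is assumed to be the registered type for BatchNorm2d, whose abbreviation is ‘bn’):

>>> from mmcv.cnn import build_norm_layer
>>> name, layer = build_norm_layer(dict(type='BN'), num_features=16, postfix=1)
>>> name
'bn1'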

mmcv.cnn.build_padding_layer(cfg, *args, **kwargs)[source]

Build padding layer.

Parameters:cfg (None or dict) – The padding layer config, which should contain: - type (str): Layer type. - layer args: Args needed to instantiate a padding layer.
Returns:Created padding layer.
Return type:nn.Module
mmcv.cnn.build_upsample_layer(cfg, *args, **kwargs)[source]

Build upsample layer.

Parameters:
  • cfg (dict) –

    The upsample layer config, which should contain:

    • type (str): Layer type.
    • scale_factor (int): Upsample ratio, which is not applicable to
      deconv.
    • layer args: Args needed to instantiate a upsample layer.
  • args (argument list) – Arguments passed to the __init__ method of the corresponding conv layer.
  • kwargs (keyword arguments) – Keyword arguments passed to the __init__ method of the corresponding conv layer.
Returns:

Created upsample layer.

Return type:

nn.Module
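
For example, a hedged sketch building a 2x nearest-neighbor upsample layer (assuming ‘nearest’ is a registered upsample type):

>>> from mmcv.cnn import build_upsample_layer
>>> up = build_upsample_layer(dict(type='nearest', scale_factor=2))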

mmcv.cnn.build_plugin_layer(cfg, postfix='', **kwargs)[source]

Build plugin layer.

Parameters:
  • cfg (None or dict) – cfg should contain: type (str): identify plugin layer type. layer args: args needed to instantiate a plugin layer.
  • postfix (int, str) – appended into norm abbreviation to create named layer. Default: ‘’.
Returns:
  • name (str) – abbreviation + postfix.
  • layer (nn.Module) – the created plugin layer.

Return type:

tuple[str, nn.Module]

mmcv.cnn.is_norm(layer, exclude=None)[source]

Check if a layer is a normalization layer.

Parameters:
  • layer (nn.Module) – The layer to be checked.
  • exclude (type | tuple[type]) – Types to be excluded.
Returns:

Whether the layer is a norm layer.

Return type:

bool

class mmcv.cnn.NonLocal1d(in_channels, sub_sample=False, conv_cfg={'type': 'Conv1d'}, **kwargs)[source]

1D Non-local module.

Parameters:
  • in_channels (int) – Same as NonLocalND.
  • sub_sample (bool) – Whether to apply max pooling after pairwise function (Note that the sub_sample is applied on spatial only). Default: False.
  • conv_cfg (None | dict) – Same as NonLocalND. Default: dict(type=’Conv1d’).
class mmcv.cnn.NonLocal2d(in_channels, sub_sample=False, conv_cfg={'type': 'Conv2d'}, **kwargs)[source]

2D Non-local module.

Parameters:
  • in_channels (int) – Same as NonLocalND.
  • sub_sample (bool) – Whether to apply max pooling after pairwise function (Note that the sub_sample is applied on spatial only). Default: False.
  • conv_cfg (None | dict) – Same as NonLocalND. Default: dict(type=’Conv2d’).
class mmcv.cnn.NonLocal3d(in_channels, sub_sample=False, conv_cfg={'type': 'Conv3d'}, **kwargs)[source]

3D Non-local module.

Parameters:
  • in_channels (int) – Same as NonLocalND.
  • sub_sample (bool) – Whether to apply max pooling after pairwise function (Note that the sub_sample is applied on spatial only). Default: False.
  • conv_cfg (None | dict) – Same as NonLocalND. Default: dict(type=’Conv3d’).
class mmcv.cnn.ContextBlock(in_channels, ratio, pooling_type='att', fusion_types=('channel_add', ))[source]

ContextBlock module in GCNet.

See ‘GCNet: Non-local Networks Meet Squeeze-Excitation Networks and Beyond’ (https://arxiv.org/abs/1904.11492) for details.

Parameters:
  • in_channels (int) – Channels of the input feature map.
  • ratio (float) – Ratio of channels of transform bottleneck
  • pooling_type (str) – Pooling method for context modeling. Options are ‘att’ and ‘avg’, stand for attention pooling and average pooling respectively. Default: ‘att’.
  • fusion_types (Sequence[str]) – Fusion method for feature fusion. Options are ‘channel_add’ and ‘channel_mul’, standing for channel-wise addition and multiplication respectively. Default: (‘channel_add’,)
forward(x)[source]

Defines the computation performed at every call.

Should be overridden by all subclasses.

Note

Although the recipe for forward pass needs to be defined within this function, one should call the Module instance afterwards instead of this since the former takes care of running the registered hooks while the latter silently ignores them.

class mmcv.cnn.HSigmoid(bias=1.0, divisor=2.0, min_value=0.0, max_value=1.0)[source]

Hard Sigmoid Module.

This module applies the hard sigmoid function:

\[Hsigmoid(x) = min(max((x + bias) / divisor, min_value), max_value)\]

By default: \[Hsigmoid(x) = min(max((x + 1) / 2, 0), 1)\]

Parameters:
  • bias (float) – Bias of the input feature map. Default: 1.0.
  • divisor (float) – Divisor of the input feature map. Default: 2.0.
  • min_value (float) – Lower bound value. Default: 0.0.
  • max_value (float) – Upper bound value. Default: 1.0.
Returns:

The output tensor.

Return type:

Tensor

forward(x)[source]

Defines the computation performed at every call.

Should be overridden by all subclasses.

Note

Although the recipe for forward pass needs to be defined within this function, one should call the Module instance afterwards instead of this since the former takes care of running the registered hooks while the latter silently ignores them.

class mmcv.cnn.Swish[source]

Swish Module.

This module applies the swish function:

\[Swish(x) = x * Sigmoid(x)\]
Returns:The output tensor.
Return type:Tensor
forward(x)[source]

Defines the computation performed at every call.

Should be overridden by all subclasses.

Note

Although the recipe for forward pass needs to be defined within this function, one should call the Module instance afterwards instead of this since the former takes care of running the registered hooks while the latter silently ignores them.

class mmcv.cnn.HSwish(inplace=False)[source]

Hard Swish Module.

This module applies the hard swish function:

\[Hswish(x) = x * ReLU6(x + 3) / 6\]
Parameters:inplace (bool) – can optionally do the operation in-place. Default: False.
Returns:The output tensor.
Return type:Tensor
forward(x)[source]

Defines the computation performed at every call.

Should be overridden by all subclasses.

Note

Although the recipe for forward pass needs to be defined within this function, one should call the Module instance afterwards instead of this since the former takes care of running the registered hooks while the latter silently ignores them.

class mmcv.cnn.GeneralizedAttention(in_channels, spatial_range=-1, num_heads=9, position_embedding_dim=-1, position_magnitude=1, kv_stride=2, q_stride=1, attention_type='1111')[source]

GeneralizedAttention module.

See ‘An Empirical Study of Spatial Attention Mechanisms in Deep Networks’ (https://arxiv.org/abs/1904.05873) for details.

Parameters:
  • in_channels (int) – Channels of the input feature map.
  • spatial_range (int) – The spatial range. -1 indicates no spatial range constraint. Default: -1.
  • num_heads (int) – The head number of empirical_attention module. Default: 9.
  • position_embedding_dim (int) – The position embedding dimension. Default: -1.
  • position_magnitude (int) – A multiplier acting on coord difference. Default: 1.
  • kv_stride (int) – The feature stride acting on key/value feature map. Default: 2.
  • q_stride (int) – The feature stride acting on query feature map. Default: 1.
  • attention_type (str) –

    A binary indicator string for indicating which items in generalized empirical_attention module are used. Default: ‘1111’.

    • ’1000’ indicates ‘query and key content’ (appr - appr) item,
    • ’0100’ indicates ‘query content and relative position’ (appr - position) item,
    • ’0010’ indicates ‘key content only’ (bias - appr) item,
    • ’0001’ indicates ‘relative position only’ (bias - position) item.
forward(x_input)[source]

Defines the computation performed at every call.

Should be overridden by all subclasses.

Note

Although the recipe for forward pass needs to be defined within this function, one should call the Module instance afterwards instead of this since the former takes care of running the registered hooks while the latter silently ignores them.

class mmcv.cnn.Scale(scale=1.0)[source]

A learnable scale parameter.

This layer scales the input by a learnable factor. It multiplies a learnable scale parameter of shape (1,) with input of any shape.

Parameters:scale (float) – Initial value of scale factor. Default: 1.0
forward(x)[source]

Defines the computation performed at every call.

Should be overridden by all subclasses.

Note

Although the recipe for forward pass needs to be defined within this function, one should call the Module instance afterwards instead of this since the former takes care of running the registered hooks while the latter silently ignores them.

mmcv.cnn.get_model_complexity_info(model, input_shape, print_per_layer_stat=True, as_strings=True, input_constructor=None, flush=False, ost=<_io.TextIOWrapper name='<stdout>' mode='w' encoding='UTF-8'>)[source]

Get complexity information of a model.

This method can calculate FLOPs and parameter counts of a model with corresponding input shape. It can also print complexity information for each layer in a model.

Supported layers are listed as below:
  • Convolutions: nn.Conv1d, nn.Conv2d, nn.Conv3d.
  • Activations: nn.ReLU, nn.PReLU, nn.ELU, nn.LeakyReLU,
    nn.ReLU6.
  • Poolings: nn.MaxPool1d, nn.MaxPool2d, nn.MaxPool3d,
    nn.AvgPool1d, nn.AvgPool2d, nn.AvgPool3d, nn.AdaptiveMaxPool1d, nn.AdaptiveMaxPool2d, nn.AdaptiveMaxPool3d, nn.AdaptiveAvgPool1d, nn.AdaptiveAvgPool2d, nn.AdaptiveAvgPool3d.
  • BatchNorms: nn.BatchNorm1d, nn.BatchNorm2d,
    nn.BatchNorm3d, nn.GroupNorm, nn.InstanceNorm1d, InstanceNorm2d, InstanceNorm3d, nn.LayerNorm.
  • Linear: nn.Linear.
  • Deconvolution: nn.ConvTranspose2d.
  • Upsample: nn.Upsample.
Parameters:
  • model (nn.Module) – The model for complexity calculation.
  • input_shape (tuple) – Input shape used for calculation.
  • print_per_layer_stat (bool) – Whether to print complexity information for each layer in a model. Default: True.
  • as_strings (bool) – Output FLOPs and params counts in a string form. Default: True.
  • input_constructor (None | callable) – If specified, it takes a callable method that generates input. Otherwise, it will generate a random tensor with input shape to calculate FLOPs. Default: None.
  • flush (bool) – same as that in print(). Default: False.
  • ost (stream) – same as file param in print(). Default: sys.stdout.
Returns: If as_strings is set to True, it will return FLOPs and parameter counts in a string format; otherwise, it will return them as floats.

Return type:

tuple[float | str]
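
A hedged sketch using the AlexNet backbone documented above:

>>> from mmcv.cnn import AlexNet, get_model_complexity_info
>>> model = AlexNet()
>>> flops, params = get_model_complexity_info(
        model, (3, 224, 224), as_strings=True, print_per_layer_stat=False)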

class mmcv.cnn.ConvAWS2d(in_channels, out_channels, kernel_size, stride=1, padding=0, dilation=1, groups=1, bias=True)[source]

AWS (Adaptive Weight Standardization)

This is a variant of Weight Standardization (https://arxiv.org/pdf/1903.10520.pdf). It is used in DetectoRS to avoid NaN (https://arxiv.org/pdf/2006.02334.pdf).

Parameters:
  • in_channels (int) – Number of channels in the input image
  • out_channels (int) – Number of channels produced by the convolution
  • kernel_size (int or tuple) – Size of the conv kernel
  • stride (int or tuple, optional) – Stride of the convolution. Default: 1
  • padding (int or tuple, optional) – Zero-padding added to both sides of the input. Default: 0
  • dilation (int or tuple, optional) – Spacing between kernel elements. Default: 1
  • groups (int, optional) – Number of blocked connections from input channels to output channels. Default: 1
  • bias (bool, optional) – If set True, adds a learnable bias to the output. Default: True
forward(x)[source]

Defines the computation performed at every call.

Should be overridden by all subclasses.

Note

Although the recipe for forward pass needs to be defined within this function, one should call the Module instance afterwards instead of this since the former takes care of running the registered hooks while the latter silently ignores them.

class mmcv.cnn.ConvWS2d(in_channels, out_channels, kernel_size, stride=1, padding=0, dilation=1, groups=1, bias=True, eps=1e-05)[source]
forward(x)[source]

Defines the computation performed at every call.

Should be overridden by all subclasses.

Note

Although the recipe for forward pass needs to be defined within this function, one should call the Module instance afterwards instead of this since the former takes care of running the registered hooks while the latter silently ignores them.

mmcv.cnn.fuse_conv_bn(module)[source]

Recursively fuse conv and bn in a module.

During inference, the functionality of batch norm layers is turned off; only the per-channel mean and variance are used, which exposes the chance to fuse them with the preceding conv layers to save computation and simplify network structures.

Parameters:module (nn.Module) – Module to be fused.
Returns:Fused module.
Return type:nn.Module
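For example, a minimal sketch (the BN statistics are folded into the preceding conv):

>>> import torch.nn as nn
>>> from mmcv.cnn import fuse_conv_bn
>>> net = nn.Sequential(nn.Conv2d(3, 8, 3), nn.BatchNorm2d(8)).eval()
>>> fused = fuse_conv_bn(net)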
class mmcv.cnn.DepthwiseSeparableConvModule(in_channels, out_channels, kernel_size, stride=1, padding=0, dilation=1, norm_cfg=None, act_cfg={'type': 'ReLU'}, dw_norm_cfg='default', dw_act_cfg='default', pw_norm_cfg='default', pw_act_cfg='default', **kwargs)[source]

Depthwise separable convolution module.

See https://arxiv.org/pdf/1704.04861.pdf for details.

This module can replace a ConvModule with the conv block replaced by two conv blocks: a depthwise conv block and a pointwise conv block. The depthwise conv block contains depthwise-conv/norm/activation layers. The pointwise conv block contains pointwise-conv/norm/activation layers. It should be noted that there will be norm/activation layers in the depthwise conv block only if norm_cfg and act_cfg are specified.

Parameters:
  • in_channels (int) – Number of channels in the input feature map. Same as that in nn._ConvNd.
  • out_channels (int) – Number of channels produced by the convolution. Same as that in nn._ConvNd.
  • kernel_size (int | tuple[int]) – Size of the convolving kernel. Same as that in nn._ConvNd.
  • stride (int | tuple[int]) – Stride of the convolution. Same as that in nn._ConvNd. Default: 1.
  • padding (int | tuple[int]) – Zero-padding added to both sides of the input. Same as that in nn._ConvNd. Default: 0.
  • dilation (int | tuple[int]) – Spacing between kernel elements. Same as that in nn._ConvNd. Default: 1.
  • norm_cfg (dict) – Default norm config for both depthwise ConvModule and pointwise ConvModule. Default: None.
  • act_cfg (dict) – Default activation config for both depthwise ConvModule and pointwise ConvModule. Default: dict(type=’ReLU’).
  • dw_norm_cfg (dict) – Norm config of depthwise ConvModule. If it is ‘default’, it will be the same as norm_cfg. Default: ‘default’.
  • dw_act_cfg (dict) – Activation config of depthwise ConvModule. If it is ‘default’, it will be the same as act_cfg. Default: ‘default’.
  • pw_norm_cfg (dict) – Norm config of pointwise ConvModule. If it is ‘default’, it will be the same as norm_cfg. Default: ‘default’.
  • pw_act_cfg (dict) – Activation config of pointwise ConvModule. If it is ‘default’, it will be the same as act_cfg. Default: ‘default’.
  • kwargs (optional) – Other shared arguments for depthwise and pointwise ConvModule. See ConvModule for ref.
forward(x)[source]

Defines the computation performed at every call.

Should be overridden by all subclasses.

Note

Although the recipe for forward pass needs to be defined within this function, one should call the Module instance afterwards instead of this since the former takes care of running the registered hooks while the latter silently ignores them.

class mmcv.cnn.Linear(in_features: int, out_features: int, bias: bool = True, device=None, dtype=None)[source]
forward(x)[source]

Defines the computation performed at every call.

Should be overridden by all subclasses.

Note

Although the recipe for forward pass needs to be defined within this function, one should call the Module instance afterwards instead of this since the former takes care of running the registered hooks while the latter silently ignores them.

class mmcv.cnn.Conv2d(in_channels: int, out_channels: int, kernel_size: Union[int, Tuple[int, int]], stride: Union[int, Tuple[int, int]] = 1, padding: Union[str, int, Tuple[int, int]] = 0, dilation: Union[int, Tuple[int, int]] = 1, groups: int = 1, bias: bool = True, padding_mode: str = 'zeros', device=None, dtype=None)[source]
forward(x)[source]

Defines the computation performed at every call.

Should be overridden by all subclasses.

Note

Although the recipe for forward pass needs to be defined within this function, one should call the Module instance afterwards instead of this since the former takes care of running the registered hooks while the latter silently ignores them.

class mmcv.cnn.ConvTranspose2d(in_channels: int, out_channels: int, kernel_size: Union[int, Tuple[int, int]], stride: Union[int, Tuple[int, int]] = 1, padding: Union[int, Tuple[int, int]] = 0, output_padding: Union[int, Tuple[int, int]] = 0, groups: int = 1, bias: bool = True, dilation: int = 1, padding_mode: str = 'zeros', device=None, dtype=None)[source]
forward(x)[source]

Defines the computation performed at every call.

Should be overridden by all subclasses.

Note

Although the recipe for forward pass needs to be defined within this function, one should call the Module instance afterwards instead of this since the former takes care of running the registered hooks while the latter silently ignores them.

class mmcv.cnn.MaxPool2d(kernel_size: Union[int, Tuple[int, ...]], stride: Union[int, Tuple[int, ...], None] = None, padding: Union[int, Tuple[int, ...]] = 0, dilation: Union[int, Tuple[int, ...]] = 1, return_indices: bool = False, ceil_mode: bool = False)[source]
forward(x)[source]

Defines the computation performed at every call.

Should be overridden by all subclasses.

Note

Although the recipe for forward pass needs to be defined within this function, one should call the Module instance afterwards instead of this since the former takes care of running the registered hooks while the latter silently ignores them.

class mmcv.cnn.ConvTranspose3d(in_channels: int, out_channels: int, kernel_size: Union[int, Tuple[int, int, int]], stride: Union[int, Tuple[int, int, int]] = 1, padding: Union[int, Tuple[int, int, int]] = 0, output_padding: Union[int, Tuple[int, int, int]] = 0, groups: int = 1, bias: bool = True, dilation: Union[int, Tuple[int, int, int]] = 1, padding_mode: str = 'zeros', device=None, dtype=None)[source]
forward(x)[source]

Defines the computation performed at every call.

Should be overridden by all subclasses.

Note

Although the recipe for forward pass needs to be defined within this function, one should call the Module instance afterwards instead of this since the former takes care of running the registered hooks while the latter silently ignores them.

class mmcv.cnn.MaxPool3d(kernel_size: Union[int, Tuple[int, ...]], stride: Union[int, Tuple[int, ...], None] = None, padding: Union[int, Tuple[int, ...]] = 0, dilation: Union[int, Tuple[int, ...]] = 1, return_indices: bool = False, ceil_mode: bool = False)[source]
forward(x)[source]

Defines the computation performed at every call.

Should be overridden by all subclasses.

Note

Although the recipe for forward pass needs to be defined within this function, one should call the Module instance afterwards instead of this since the former takes care of running the registered hooks while the latter silently ignores them.

class mmcv.cnn.Conv3d(in_channels: int, out_channels: int, kernel_size: Union[int, Tuple[int, int, int]], stride: Union[int, Tuple[int, int, int]] = 1, padding: Union[str, int, Tuple[int, int, int]] = 0, dilation: Union[int, Tuple[int, int, int]] = 1, groups: int = 1, bias: bool = True, padding_mode: str = 'zeros', device=None, dtype=None)[source]
forward(x)[source]

Defines the computation performed at every call.

Should be overridden by all subclasses.

Note

Although the recipe for forward pass needs to be defined within this function, one should call the Module instance afterwards instead of this since the former takes care of running the registered hooks while the latter silently ignores them.

mmcv.cnn.initialize(module, init_cfg)[source]

Initialize a module.

Parameters:
  • module (torch.nn.Module) – the module will be initialized.
  • init_cfg (dict | list[dict]) – initialization configuration dict to define initializer. OpenMMLab has implemented 6 initializers including Constant, Xavier, Normal, Uniform, Kaiming, and Pretrained.

Example

>>> module = nn.Linear(2, 3, bias=True)
>>> init_cfg = dict(type='Constant', layer='Linear', val=1, bias=2)
>>> initialize(module, init_cfg)
>>> module = nn.Sequential(nn.Conv1d(3, 1, 3), nn.Linear(1,2))
>>> # define key ``'layer'`` for initializing layer with different
>>> # configuration
>>> init_cfg = [dict(type='Constant', layer='Conv1d', val=1),
        dict(type='Constant', layer='Linear', val=2)]
>>> initialize(module, init_cfg)
>>> # define key``'override'`` to initialize some specific part in
>>> # module
>>> class FooNet(nn.Module):
>>>     def __init__(self):
>>>         super().__init__()
>>>         self.feat = nn.Conv2d(3, 16, 3)
>>>         self.reg = nn.Conv2d(16, 10, 3)
>>>         self.cls = nn.Conv2d(16, 5, 3)
>>> model = FooNet()
>>> init_cfg = dict(type='Constant', val=1, bias=2, layer='Conv2d',
>>>     override=dict(type='Constant', name='reg', val=3, bias=4))
>>> initialize(model, init_cfg)
>>> model = ResNet(depth=50)
>>> # Initialize weights with the pretrained model.
>>> init_cfg = dict(type='Pretrained',
        checkpoint='torchvision://resnet50')
>>> initialize(model, init_cfg)
>>> # Initialize weights of a sub-module with the specific part of
>>> # a pretrained model by using "prefix".
>>> url = 'http://download.openmmlab.com/mmdetection/v2.0/retinanet/' \
        'retinanet_r50_fpn_1x_coco/' \
        'retinanet_r50_fpn_1x_coco_20200130-c2398f9e.pth'
>>> init_cfg = dict(type='Pretrained',
        checkpoint=url, prefix='backbone.')
class mmcv.cnn.ConstantInit(val, **kwargs)[source]

Initialize module parameters with constant values.

Parameters:
  • val (int | float) – the value to fill the weights in the module with
  • bias (int | float) – the value to fill the bias. Defaults to 0.
  • bias_prob (float, optional) – the probability for bias initialization. Defaults to None.
  • layer (str | list[str], optional) – the layer will be initialized. Defaults to None.
class mmcv.cnn.XavierInit(gain=1, distribution='normal', **kwargs)[source]

Initialize module parameters with values according to the method described in Understanding the difficulty of training deep feedforward neural networks - Glorot, X. & Bengio, Y. (2010).

Parameters:
  • gain (int | float) – an optional scaling factor. Defaults to 1.
  • bias (int | float) – the value to fill the bias. Defaults to 0.
  • bias_prob (float, optional) – the probability for bias initialization. Defaults to None.
  • distribution (str) – distribution either be 'normal' or 'uniform'. Defaults to 'normal'.
  • layer (str | list[str], optional) – the layer will be initialized. Defaults to None.
class mmcv.cnn.NormalInit(mean=0, std=1, **kwargs)[source]

Initialize module parameters with the values drawn from the normal distribution \(\mathcal{N}(\text{mean}, \text{std}^2)\).

Parameters:
  • mean (int | float) – the mean of the normal distribution. Defaults to 0.
  • std (int | float) – the standard deviation of the normal distribution. Defaults to 1.
  • bias (int | float) – the value to fill the bias. Defaults to 0.
  • bias_prob (float, optional) – the probability for bias initialization. Defaults to None.
  • layer (str | list[str], optional) – the layer will be initialized. Defaults to None.
class mmcv.cnn.TruncNormalInit(mean: float = 0, std: float = 1, a: float = -2, b: float = 2, **kwargs)[source]

Initialize module parameters with the values drawn from the normal distribution \(\mathcal{N}(\text{mean}, \text{std}^2)\), with values outside \([a, b]\) redrawn until they are within the bounds.

Parameters:
  • mean (float) – the mean of the normal distribution. Defaults to 0.
  • std (float) – the standard deviation of the normal distribution. Defaults to 1.
  • a (float) – The minimum cutoff value.
  • b (float) – The maximum cutoff value.
  • bias (float) – the value to fill the bias. Defaults to 0.
  • bias_prob (float, optional) – the probability for bias initialization. Defaults to None.
  • layer (str | list[str], optional) – the layer will be initialized. Defaults to None.
class mmcv.cnn.UniformInit(a=0, b=1, **kwargs)[source]

Initialize module parameters with values drawn from the uniform distribution \(\mathcal{U}(a, b)\).

Parameters:
  • a (int | float) – the lower bound of the uniform distribution. Defaults to 0.
  • b (int | float) – the upper bound of the uniform distribution. Defaults to 1.
  • bias (int | float) – the value to fill the bias. Defaults to 0.
  • bias_prob (float, optional) – the probability for bias initialization. Defaults to None.
  • layer (str | list[str], optional) – the layer will be initialized. Defaults to None.
class mmcv.cnn.KaimingInit(a=0, mode='fan_out', nonlinearity='relu', distribution='normal', **kwargs)[source]

Initialize module parameters with the values according to the method described in Delving deep into rectifiers: Surpassing human-level performance on ImageNet classification - He, K. et al. (2015).

Parameters:
  • a (int | float) – the negative slope of the rectifier used after this layer (only used with 'leaky_relu'). Defaults to 0.
  • mode (str) – either 'fan_in' or 'fan_out'. Choosing 'fan_in' preserves the magnitude of the variance of the weights in the forward pass. Choosing 'fan_out' preserves the magnitudes in the backwards pass. Defaults to 'fan_out'.
  • nonlinearity (str) – the non-linear function (nn.functional name), recommended to use only with 'relu' or 'leaky_relu' . Defaults to ‘relu’.
  • bias (int | float) – the value to fill the bias. Defaults to 0.
  • bias_prob (float, optional) – the probability for bias initialization. Defaults to None.
  • distribution (str) – distribution either be 'normal' or 'uniform'. Defaults to 'normal'.
  • layer (str | list[str], optional) – the layer will be initialized. Defaults to None.
class mmcv.cnn.PretrainedInit(checkpoint, prefix=None, map_location=None)[source]

Initialize module by loading a pretrained model.

Parameters:
  • checkpoint (str) – the checkpoint file of the pretrained model to be loaded.
  • prefix (str, optional) – the prefix of a sub-module in the pretrained model. It is for loading a part of the pretrained model to initialize. For example, if we would like to only load the backbone of a detector model, we can set prefix='backbone.'. Defaults to None.
  • map_location (str) – map tensors into proper locations.
class mmcv.cnn.Caffe2XavierInit(**kwargs)[source]
mmcv.cnn.build_model_from_cfg(cfg, registry, default_args=None)[source]

Build a PyTorch model from config dict(s). Different from build_from_cfg, if cfg is a list, a nn.Sequential will be built.

Parameters:
  • cfg (dict, list[dict]) – The config of modules; it is either a config dict or a list of config dicts. If cfg is a list, the built modules will be wrapped with nn.Sequential.
  • registry (Registry) – A registry the module belongs to.
  • default_args (dict, optional) – Default arguments to build the module. Defaults to None.
Returns:

A built nn module.

Return type:

nn.Module

runner

class mmcv.runner.BaseRunner(model, batch_processor=None, optimizer=None, work_dir=None, logger=None, meta=None, max_iters=None, max_epochs=None)[source]

The base class of Runner, a training helper for PyTorch.

All subclasses should implement the following APIs:

  • run()
  • train()
  • val()
  • save_checkpoint()
Parameters:
  • model (torch.nn.Module) – The model to be run.
  • batch_processor (callable) – A callable method that processes a data batch. The interface of this method should be batch_processor(model, data, train_mode) -> dict
  • optimizer (dict or torch.optim.Optimizer) – It can be either an optimizer (in most cases) or a dict of optimizers (in models that require more than one optimizer, e.g., GAN).
  • work_dir (str, optional) – The working directory to save checkpoints and logs. Defaults to None.
  • logger (logging.Logger) – Logger used during training. Defaults to None. (The default value is just for backward compatibility)
  • meta (dict | None) – A dict that records some important information such as environment info and seed, which will be logged in the logger hook. Defaults to None.
  • max_epochs (int, optional) – Total training epochs.
  • max_iters (int, optional) – Total training iterations.
call_hook(fn_name)[source]

Call all hooks.

Parameters:fn_name (str) – The function name in each hook to be called, such as “before_train_epoch”.
current_lr()[source]

Get current learning rates.

Returns: Current learning rates of all param groups. If the runner has a dict of optimizers, this method will return a dict.
Return type:list[float] | dict[str, list[float]]
current_momentum()[source]

Get current momentums.

Returns: Current momentums of all param groups. If the runner has a dict of optimizers, this method will return a dict.
Return type:list[float] | dict[str, list[float]]
epoch

Current epoch.

Type:int
hooks

A list of registered hooks.

Type:list[Hook]
inner_iter

Iteration in an epoch.

Type:int
iter

Current iteration.

Type:int
max_epochs

Maximum training epochs.

Type:int
max_iters

Maximum training iterations.

Type:int
model_name

Name of the model, usually the module class name.

Type:str
rank

Rank of current process. (distributed training)

Type:int
register_hook(hook, priority='NORMAL')[source]

Register a hook into the hook list.

The hook will be inserted into a priority queue, with the specified priority (See Priority for details of priorities). For hooks with the same priority, they will be triggered in the same order as they are registered.

Parameters:
  • hook (Hook) – The hook to be registered.
  • priority (int or str or Priority) – Hook priority. Lower value means higher priority.
register_hook_from_cfg(hook_cfg)[source]

Register a hook from its cfg.

Parameters:hook_cfg (dict) – Hook config. It should have at least keys ‘type’ and ‘priority’ indicating its type and priority.

Notes

The specific hook class to register should not use ‘type’ and ‘priority’ arguments during initialization.

register_training_hooks(lr_config, optimizer_config=None, checkpoint_config=None, log_config=None, momentum_config=None, timer_config={'type': 'IterTimerHook'}, custom_hooks_config=None)[source]

Register default and custom hooks for training.

Default and custom hooks include:

Hooks                 Priority
LrUpdaterHook         VERY_HIGH (10)
MomentumUpdaterHook   HIGH (30)
OptimizerStepperHook  ABOVE_NORMAL (40)
CheckpointSaverHook   NORMAL (50)
IterTimerHook         LOW (70)
LoggerHook(s)         VERY_LOW (90)
CustomHook(s)         defaults to NORMAL (50)

If custom hooks have the same priority as default hooks, custom hooks will be triggered after default hooks.

world_size

Number of processes participating in the job. (distributed training)

Type:int
class mmcv.runner.Runner(*args, **kwargs)[source]

Deprecated name of EpochBasedRunner.

class mmcv.runner.EpochBasedRunner(model, batch_processor=None, optimizer=None, work_dir=None, logger=None, meta=None, max_iters=None, max_epochs=None)[source]

Epoch-based Runner.

This runner trains models epoch by epoch.

run(data_loaders, workflow, max_epochs=None, **kwargs)[source]

Start running.

Parameters:
  • data_loaders (list[DataLoader]) – Dataloaders for training and validation.
  • workflow (list[tuple]) – A list of (phase, epochs) to specify the running order and epochs. E.g, [(‘train’, 2), (‘val’, 1)] means running 2 epochs for training and 1 epoch for validation, iteratively.
save_checkpoint(out_dir, filename_tmpl='epoch_{}.pth', save_optimizer=True, meta=None, create_symlink=True)[source]

Save the checkpoint.

Parameters:
  • out_dir (str) – The directory that checkpoints are saved.
  • filename_tmpl (str, optional) – The checkpoint filename template, which contains a placeholder for the epoch number. Defaults to ‘epoch_{}.pth’.
  • save_optimizer (bool, optional) – Whether to save the optimizer to the checkpoint. Defaults to True.
  • meta (dict, optional) – The meta information to be saved in the checkpoint. Defaults to None.
  • create_symlink (bool, optional) – Whether to create a symlink “latest.pth” to point to the latest checkpoint. Defaults to True.
class mmcv.runner.IterBasedRunner(model, batch_processor=None, optimizer=None, work_dir=None, logger=None, meta=None, max_iters=None, max_epochs=None)[source]

Iteration-based Runner.

This runner trains models iteration by iteration.

register_training_hooks(lr_config, optimizer_config=None, checkpoint_config=None, log_config=None, momentum_config=None, custom_hooks_config=None)[source]

Register default hooks for iter-based training.

Checkpoint hook, optimizer stepper hook and logger hooks will be set to by_epoch=False by default.

Default hooks include:

Hooks                 Priority
LrUpdaterHook         VERY_HIGH (10)
MomentumUpdaterHook   HIGH (30)
OptimizerStepperHook  ABOVE_NORMAL (40)
CheckpointSaverHook   NORMAL (50)
IterTimerHook         LOW (70)
LoggerHook(s)         VERY_LOW (90)
CustomHook(s)         defaults to NORMAL (50)

If custom hooks have the same priority as default hooks, custom hooks will be triggered after default hooks.

resume(checkpoint, resume_optimizer=True, map_location='default')[source]

Resume model from checkpoint.

Parameters:
  • checkpoint (str) – Checkpoint to resume from.
  • resume_optimizer (bool, optional) – Whether to resume the optimizer(s) if the checkpoint file includes optimizer(s). Defaults to True.
  • map_location (str, optional) – Same as torch.load(). Defaults to ‘default’.
run(data_loaders, workflow, max_iters=None, **kwargs)[source]

Start running.

Parameters:
  • data_loaders (list[DataLoader]) – Dataloaders for training and validation.
  • workflow (list[tuple]) – A list of (phase, iters) to specify the running order and iterations. E.g, [(‘train’, 10000), (‘val’, 1000)] means running 10000 iterations for training and 1000 iterations for validation, iteratively.
save_checkpoint(out_dir, filename_tmpl='iter_{}.pth', meta=None, save_optimizer=True, create_symlink=True)[source]

Save checkpoint to file.

Parameters:
  • out_dir (str) – Directory to save checkpoint files.
  • filename_tmpl (str, optional) – Checkpoint file template. Defaults to ‘iter_{}.pth’.
  • meta (dict, optional) – Metadata to be saved in checkpoint. Defaults to None.
  • save_optimizer (bool, optional) – Whether save optimizer. Defaults to True.
  • create_symlink (bool, optional) – Whether create symlink to the latest checkpoint file. Defaults to True.
class mmcv.runner.CheckpointHook(interval=-1, by_epoch=True, save_optimizer=True, out_dir=None, max_keep_ckpts=-1, save_last=True, sync_buffer=False, **kwargs)[source]

Save checkpoints periodically.

Parameters:
  • interval (int) – The saving period. If by_epoch=True, interval indicates epochs, otherwise it indicates iterations. Default: -1, which means “never”.
  • by_epoch (bool) – Saving checkpoints by epoch or by iteration. Default: True.
  • save_optimizer (bool) – Whether to save optimizer state_dict in the checkpoint. It is usually used for resuming experiments. Default: True.
  • out_dir (str, optional) – The directory to save checkpoints. If not specified, runner.work_dir will be used by default.
  • max_keep_ckpts (int, optional) – The maximum checkpoints to keep. In some cases we want only the latest few checkpoints and would like to delete old ones to save the disk space. Default: -1, which means unlimited.
  • save_last (bool) – Whether to force the last checkpoint to be saved regardless of interval. Default: True.
  • sync_buffer (bool) – Whether to synchronize buffers in different gpus. Default: False.
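
Example

A minimal sketch of registering this hook on an existing runner (the runner instance is assumed):

>>> from mmcv.runner import CheckpointHook
>>> # keep only the 3 most recent checkpoints, saved once per epoch
>>> hook = CheckpointHook(interval=1, by_epoch=True, max_keep_ckpts=3)
>>> runner.register_hook(hook)
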
class mmcv.runner.LrUpdaterHook(by_epoch=True, warmup=None, warmup_iters=0, warmup_ratio=0.1, warmup_by_epoch=False)[source]

LR Scheduler in MMCV.

Parameters:
  • by_epoch (bool) – Whether the LR changes epoch by epoch.
  • warmup (string) – Type of warmup used. It can be None (use no warmup), ‘constant’, ‘linear’ or ‘exp’.
  • warmup_iters (int) – The number of iterations or epochs that warmup lasts.
  • warmup_ratio (float) – The LR used at the beginning of warmup equals warmup_ratio * initial_lr.
  • warmup_by_epoch (bool) – When warmup_by_epoch == True, warmup_iters means the number of epochs that warmup lasts; otherwise it means the number of iterations that warmup lasts.
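
Example

For illustration, a linear warmup over the first 500 iterations combined with the step policy might be configured as below; register_lr_hook resolves the policy key to the matching LrUpdaterHook subclass, and the runner instance and values are assumptions:

>>> lr_config = dict(policy='step', step=[8, 11],
...                  warmup='linear', warmup_iters=500,
...                  warmup_ratio=0.1)
>>> runner.register_lr_hook(lr_config)
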
class mmcv.runner.DistSamplerSeedHook[source]

Data-loading sampler for distributed training.

In distributed training, it is only useful in conjunction with EpochBasedRunner, while IterBasedRunner achieves the same purpose with IterLoader.

class mmcv.runner.LoggerHook(interval=10, ignore_last=True, reset_flag=False, by_epoch=True)[source]

Base class for logger hooks.

Parameters:
  • interval (int) – Logging interval (every k iterations).
  • ignore_last (bool) – Ignore the log of the last iterations in each epoch if fewer than interval iterations remain.
  • reset_flag (bool) – Whether to clear the output buffer after logging.
  • by_epoch (bool) – Whether EpochBasedRunner is used.
get_iter(runner, inner_iter=False)[source]

Get the current training iteration step.

static is_scalar(val, include_np=True, include_torch=True)[source]

Tell whether the input variable is a scalar or not.

Parameters:
  • val – Input variable.
  • include_np (bool) – Whether to include 0-d np.ndarray as a scalar.
  • include_torch (bool) – Whether to include 0-d torch.Tensor as a scalar.
Returns:

True or False.

Return type:

bool

class mmcv.runner.PaviLoggerHook(init_kwargs=None, add_graph=False, add_last_ckpt=False, interval=10, ignore_last=True, reset_flag=False, by_epoch=True, img_key='img_info')[source]
get_step(runner)[source]

Get the total training step/epoch.

class mmcv.runner.TextLoggerHook(by_epoch=True, interval=10, ignore_last=True, reset_flag=False, interval_exp_name=1000)[source]

Logger hook in text.

In this logger hook, the information will be printed on terminal and saved in json file.

Parameters:
  • by_epoch (bool) – Whether EpochBasedRunner is used.
  • interval (int) – Logging interval (every k iterations).
  • ignore_last (bool) – Ignore the log of the last iterations in each epoch if fewer than interval iterations remain.
  • reset_flag (bool) – Whether to clear the output buffer after logging.
  • interval_exp_name (int) – Logging interval for experiment name. This feature is to help users conveniently get the experiment information from screen or log file. Default: 1000.
class mmcv.runner.TensorboardLoggerHook(log_dir=None, interval=10, ignore_last=True, reset_flag=False, by_epoch=True)[source]
class mmcv.runner.NeptuneLoggerHook(init_kwargs=None, interval=10, ignore_last=True, reset_flag=True, with_step=True, by_epoch=True)[source]

Class to log metrics to NeptuneAI.

It requires neptune-client to be installed.

Parameters:
  • init_kwargs (dict) –

    a dict containing the initialization keys as below:

    • project (str): Name of a project in the form namespace/project_name. If None, the value of the NEPTUNE_PROJECT environment variable will be taken.
    • api_token (str): User’s API token. If None, the value of the NEPTUNE_API_TOKEN environment variable will be taken. Note: It is strongly recommended to use the NEPTUNE_API_TOKEN environment variable rather than placing your API token in plain text in your source code.
    • name (str, optional, default is ‘Untitled’): Editable name of the run. The name is displayed in the run’s Details and as a column in the Runs table.

    Check https://docs.neptune.ai/api-reference/neptune#init for more init arguments.
  • interval (int) – Logging interval (every k iterations).
  • ignore_last (bool) – Ignore the log of the last iterations in each epoch if fewer than interval iterations remain.
  • reset_flag (bool) – Whether to clear the output buffer after logging.
  • by_epoch (bool) – Whether EpochBasedRunner is used.
class mmcv.runner.WandbLoggerHook(init_kwargs=None, interval=10, ignore_last=True, reset_flag=False, commit=True, by_epoch=True, with_step=True)[source]
class mmcv.runner.MlflowLoggerHook(exp_name=None, tags=None, log_model=True, interval=10, ignore_last=True, reset_flag=False, by_epoch=True)[source]
class mmcv.runner.DvcliveLoggerHook(path, interval=10, ignore_last=True, reset_flag=True, by_epoch=True)[source]

Class to log metrics with dvclive.

It requires dvclive to be installed.

Parameters:
  • path (str) – Directory where dvclive will write TSV log files.
  • interval (int) – Logging interval (every k iterations). Default 10.
  • ignore_last (bool) – Ignore the log of the last iterations in each epoch if fewer than interval iterations remain. Default: True.
  • reset_flag (bool) – Whether to clear the output buffer after logging. Default: True.
  • by_epoch (bool) – Whether EpochBasedRunner is used. Default: True.
mmcv.runner.load_state_dict(module, state_dict, strict=False, logger=None)[source]

Load state_dict to a module.

This method is modified from torch.nn.Module.load_state_dict(). Default value for strict is set to False and the message for param mismatch will be shown even if strict is False.

Parameters:
  • module (Module) – Module that receives the state_dict.
  • state_dict (OrderedDict) – Weights.
  • strict (bool) – Whether to strictly enforce that the keys in state_dict match the keys returned by this module’s state_dict() function. Default: False.
  • logger (logging.Logger, optional) – Logger to log the error message. If not specified, print function will be used.
mmcv.runner.load_checkpoint(model, filename, map_location=None, strict=False, logger=None, revise_keys=[('^module\\.', '')])[source]

Load checkpoint from a file or URI.

Parameters:
  • model (Module) – Module to load checkpoint.
  • filename (str) – Accepts a local filepath, URL, torchvision://xxx or open-mmlab://xxx. Please refer to docs/model_zoo.md for details.
  • map_location (str) – Same as torch.load().
  • strict (bool) – Whether to allow different params for the model and checkpoint.
  • logger (logging.Logger or None) – The logger for error message.
  • revise_keys (list) – A list of customized keywords to modify the state_dict in checkpoint. Each item is a (pattern, replacement) pair of regular expression operations. Default: strip the prefix ‘module.’ by [(r’^module\.’, ‘’)].
Returns:

The loaded checkpoint.

Return type:

dict or OrderedDict
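
Example

A short usage sketch; model and the checkpoint path are placeholders:

>>> from mmcv.runner import load_checkpoint
>>> ckpt = load_checkpoint(model, 'work_dir/epoch_12.pth',
...                        map_location='cpu', strict=False)
>>> # the default revise_keys strips the 'module.' prefix, so
>>> # checkpoints saved from (Distributed)DataParallel models load as-is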

mmcv.runner.weights_to_cpu(state_dict)[source]

Copy a model state_dict to cpu.

Parameters:state_dict (OrderedDict) – Model weights on GPU.
Returns:Model weights on CPU.
Return type:OrderedDict
mmcv.runner.save_checkpoint(model, filename, optimizer=None, meta=None)[source]

Save checkpoint to file.

The checkpoint will have 3 fields: meta, state_dict and optimizer. By default meta will contain version and time info.

Parameters:
  • model (Module) – Module whose params are to be saved.
  • filename (str) – Checkpoint filename.
  • optimizer (Optimizer, optional) – Optimizer to be saved.
  • meta (dict, optional) – Metadata to be saved in checkpoint.
class mmcv.runner.Priority[source]

Hook priority levels.

Level           Value
HIGHEST         0
VERY_HIGH       10
HIGH            30
ABOVE_NORMAL    40
NORMAL          50
BELOW_NORMAL    60
LOW             70
VERY_LOW        90
LOWEST          100
mmcv.runner.get_priority(priority)[source]

Get priority value.

Parameters:priority (int or str or Priority) – Priority.
Returns:The priority value.
Return type:int
mmcv.runner.obj_from_dict(info, parent=None, default_args=None)[source]

Initialize an object from dict.

The dict must contain the key “type”, which indicates the object type; it can be either a string or a type, such as “list” or list. The remaining fields are treated as the arguments for constructing the object.

Parameters:
  • info (dict) – Object types and arguments.
  • parent (module) – Module which may contain the expected object classes.
  • default_args (dict, optional) – Default arguments for initializing the object.
Returns:

Object built from the dict.

Return type:

any type

class mmcv.runner.DefaultOptimizerConstructor(optimizer_cfg, paramwise_cfg=None)[source]

Default constructor for optimizers.

By default, each parameter shares the same optimizer settings, and we provide an argument paramwise_cfg to specify parameter-wise settings. It is a dict and may contain the following fields:

  • custom_keys (dict): Specifies parameter-wise settings by keys. If one of the keys in custom_keys is a substring of the name of a parameter, then the setting of the parameter will be specified by custom_keys[key], and other settings like bias_lr_mult etc. will be ignored. Note that the aforementioned key is the longest key that is a substring of the name of the parameter. If there are multiple matched keys with the same length, the key that comes first in alphabetical order will be chosen. custom_keys[key] should be a dict and may contain the fields lr_mult and decay_mult. See Example 2 below.
  • bias_lr_mult (float): It will be multiplied with the learning rate of all bias parameters (except for those in normalization layers and the offset layers of DCN).
  • bias_decay_mult (float): It will be multiplied with the weight decay of all bias parameters (except for those in normalization layers, depthwise conv layers, and the offset layers of DCN).
  • norm_decay_mult (float): It will be multiplied with the weight decay of all weight and bias parameters of normalization layers.
  • dwconv_decay_mult (float): It will be multiplied with the weight decay of all weight and bias parameters of depthwise conv layers.
  • dcn_offset_lr_mult (float): It will be multiplied with the learning rate of the parameters of the offset layers in the deformable convs of a model.
  • bypass_duplicate (bool): If True, duplicate parameters will not be added to the optimizer. Default: False.

Note

  1. If the option dcn_offset_lr_mult is used, the constructor will override the effect of bias_lr_mult in the bias of the offset layer. So be careful when using both bias_lr_mult and dcn_offset_lr_mult. If you wish to apply both of them to the offset layer in deformable convs, set dcn_offset_lr_mult to the original dcn_offset_lr_mult * bias_lr_mult.
  2. If the option dcn_offset_lr_mult is used, the constructor will apply it to all the DCN layers in the model. So be careful when the model contains multiple DCN layers in places other than the backbone.
Parameters:
  • model (nn.Module) – The model with parameters to be optimized.
  • optimizer_cfg (dict) –

    The config dict of the optimizer. Positional fields are

    • type: class name of the optimizer.

    Optional fields are

    • any arguments of the corresponding optimizer type, e.g., lr, weight_decay, momentum, etc.
  • paramwise_cfg (dict, optional) – Parameter-wise options.
Example 1:
>>> model = torch.nn.modules.Conv1d(1, 1, 1)
>>> optimizer_cfg = dict(type='SGD', lr=0.01, momentum=0.9,
>>>                      weight_decay=0.0001)
>>> paramwise_cfg = dict(norm_decay_mult=0.)
>>> optim_builder = DefaultOptimizerConstructor(
>>>     optimizer_cfg, paramwise_cfg)
>>> optimizer = optim_builder(model)
Example 2:
>>> # assume model have attribute model.backbone and model.cls_head
>>> optimizer_cfg = dict(type='SGD', lr=0.01, weight_decay=0.95)
>>> paramwise_cfg = dict(custom_keys={
>>>     '.backbone': dict(lr_mult=0.1, decay_mult=0.9)})
>>> optim_builder = DefaultOptimizerConstructor(
>>>     optimizer_cfg, paramwise_cfg)
>>> optimizer = optim_builder(model)
>>> # Then the `lr` and `weight_decay` for model.backbone is
>>> # (0.01 * 0.1, 0.95 * 0.9). `lr` and `weight_decay` for
>>> # model.cls_head is (0.01, 0.95).
add_params(params, module, prefix='', is_dcn_module=None)[source]

Add all parameters of module to the params list.

The parameters of the given module will be added to the list of param groups, with specific rules defined by paramwise_cfg.

Parameters:
  • params (list[dict]) – A list of param groups, it will be modified in place.
  • module (nn.Module) – The module to be added.
  • prefix (str) – The prefix of the module.
  • is_dcn_module (int|float|None) – If the current module is a submodule of DCN, is_dcn_module will be passed to control conv_offset layer’s learning rate. Defaults to None.
mmcv.runner.set_random_seed(seed, deterministic=False, use_rank_shift=False)[source]

Set random seed.

Parameters:
  • seed (int) – Seed to be used.
  • deterministic (bool) – Whether to set the deterministic option for CUDNN backend, i.e., set torch.backends.cudnn.deterministic to True and torch.backends.cudnn.benchmark to False. Default: False.
  • use_rank_shift (bool) – Whether to add the rank number to the random seed so that different processes (ranks) use different seeds. Default: False.
mmcv.runner.auto_fp16(apply_to=None, out_fp32=False)[source]

Decorator to enable fp16 training automatically.

This decorator is useful when you write custom modules and want to support mixed precision training. If input arguments are fp32 tensors, they will be converted to fp16 automatically. Arguments other than fp32 tensors are ignored. If you are using PyTorch >= 1.6, torch.cuda.amp is used as the backend; otherwise, the original mmcv implementation will be adopted.

Parameters:
  • apply_to (Iterable, optional) – The argument names to be converted. None indicates all arguments.
  • out_fp32 (bool) – Whether to convert the output back to fp32.

Example

>>> import torch.nn as nn
>>> class MyModule1(nn.Module):
>>>
>>>     # Convert x and y to fp16
>>>     @auto_fp16()
>>>     def forward(self, x, y):
>>>         pass
>>> import torch.nn as nn
>>> class MyModule2(nn.Module):
>>>
>>>     # convert pred to fp16
>>>     @auto_fp16(apply_to=('pred', ))
>>>     def do_something(self, pred, others):
>>>         pass
mmcv.runner.force_fp32(apply_to=None, out_fp16=False)[source]

Decorator to forcibly convert input arguments to fp32.

This decorator is useful when you write custom modules and want to support mixed precision training. If there are some inputs that must be processed in fp32 mode, then this decorator can handle it. If input arguments are fp16 tensors, they will be converted to fp32 automatically. Arguments other than fp16 tensors are ignored. If you are using PyTorch >= 1.6, torch.cuda.amp is used as the backend; otherwise, the original mmcv implementation will be adopted.

Parameters:
  • apply_to (Iterable, optional) – The argument names to be converted. None indicates all arguments.
  • out_fp16 (bool) – Whether to convert the output back to fp16.

Example

>>> import torch.nn as nn
>>> class MyModule1(nn.Module):
>>>
>>>     # Convert x and y to fp32
>>>     @force_fp32()
>>>     def loss(self, x, y):
>>>         pass
>>> import torch.nn as nn
>>> class MyModule2(nn.Module):
>>>
>>>     # convert pred to fp32
>>>     @force_fp32(apply_to=('pred', ))
>>>     def post_process(self, pred, others):
>>>         pass
mmcv.runner.wrap_fp16_model(model)[source]

Wrap the FP32 model to FP16.

If you are using PyTorch >= 1.6, torch.cuda.amp is used as the backend; otherwise, the original mmcv implementation will be adopted.

For PyTorch >= 1.6, this function will:

  1. Set the fp16_enabled flag inside the model to True.

Otherwise, it will:

  1. Convert the FP32 model to FP16.
  2. Keep some necessary layers (e.g., normalization layers) in FP32.
  3. Set the fp16_enabled flag inside the model to True.

Parameters:model (nn.Module) – Model in FP32.
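
Example

A hedged sketch of combining this with the Fp16OptimizerHook documented below; the model and runner instances are assumed to exist:

>>> from mmcv.runner import wrap_fp16_model, Fp16OptimizerHook
>>> wrap_fp16_model(model)
>>> runner.register_hook(Fp16OptimizerHook(loss_scale='dynamic'))
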
class mmcv.runner.Fp16OptimizerHook(grad_clip=None, coalesce=True, bucket_size_mb=-1, loss_scale=512.0, distributed=True)[source]

FP16 optimizer hook (using PyTorch’s implementation).

If you are using PyTorch >= 1.6, torch.cuda.amp is used as the backend, to take care of the optimization procedure.

Parameters:loss_scale (float | str | dict) – Scale factor configuration. If loss_scale is a float, static loss scaling will be used with the specified scale. If loss_scale is a string, it must be ‘dynamic’, then dynamic loss scaling will be used. It can also be a dict containing the arguments of GradScaler. Defaults to 512. For PyTorch >= 1.6, mmcv uses the official implementation of GradScaler. If you use a dict version of loss_scale to create a GradScaler, please refer to https://pytorch.org/docs/stable/amp.html#torch.cuda.amp.GradScaler for the parameters.

Examples

>>> loss_scale = dict(
...     init_scale=65536.0,
...     growth_factor=2.0,
...     backoff_factor=0.5,
...     growth_interval=2000
... )
>>> optimizer_hook = Fp16OptimizerHook(loss_scale=loss_scale)
after_train_iter(runner)[source]

Backward optimization steps for Mixed Precision Training. For dynamic loss scaling, please refer to https://pytorch.org/docs/stable/amp.html#torch.cuda.amp.GradScaler.

  1. Scale the loss by a scale factor.
  2. Backward the loss to obtain the gradients.
  3. Unscale the optimizer’s gradient tensors.
  4. Call optimizer.step() and update scale factor.
  5. Save loss_scaler state_dict for resume purpose.
before_run(runner)[source]

Preparing steps before Mixed Precision Training.

copy_grads_to_fp32(fp16_net, fp32_weights)[source]

Copy gradients from fp16 model to fp32 weight copy.

copy_params_to_fp16(fp16_net, fp32_weights)[source]

Copy updated params from fp32 weight copy to fp16 model.

class mmcv.runner.SyncBuffersHook(distributed=True)[source]

Synchronize model buffers such as running_mean and running_var in BN at the end of each epoch.

Parameters:distributed (bool) – Whether distributed training is used. It is effective only for distributed training. Defaults to True.
after_epoch(runner)[source]

All-reduce model buffers at the end of each epoch.

class mmcv.runner.EMAHook(momentum=0.0002, interval=1, warm_up=100, resume_from=None)[source]

Exponential Moving Average Hook.

Use Exponential Moving Average on all parameters of the model during training. All parameters have an EMA backup, which is updated by the formula below. EMAHook takes priority over EvalHook and CheckpointSaverHook.

\[X_{\text{ema},t+1} = (1 - \text{momentum}) \times X_{\text{ema},t} + \text{momentum} \times X_t\]
Parameters:
  • momentum (float) – The momentum used for updating the EMA parameters. Defaults to 0.0002.
  • interval (int) – Update the EMA parameters every interval iterations. Defaults to 1.
  • warm_up (int) – During the first warm_up steps, a smaller momentum may be used to update the EMA parameters more slowly. Defaults to 100.
  • resume_from (str) – The checkpoint path. Defaults to None.
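
Example

A minimal registration sketch; the runner instance is assumed, and the high priority makes the hook run before EvalHook and CheckpointHook, consistent with the note above:

>>> from mmcv.runner import EMAHook
>>> ema_hook = EMAHook(momentum=0.0002, interval=1, warm_up=100)
>>> runner.register_hook(ema_hook, priority='HIGH')
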
after_train_epoch(runner)[source]

We load the parameter values from the EMA backup into the model before the EvalHook.

after_train_iter(runner)[source]

Update ema parameter every self.interval iterations.

before_run(runner)[source]

To resume the model with its EMA parameters more conveniently.

Register the EMA parameters as named buffers of the model.

before_train_epoch(runner)[source]

We recover the model’s parameters from the EMA backup after the last epoch’s EvalHook.

mmcv.runner.allreduce_grads(params, coalesce=True, bucket_size_mb=-1)[source]

Allreduce gradients.

Parameters:
  • params (list[torch.Parameters]) – List of parameters of a model
  • coalesce (bool, optional) – Whether allreduce parameters as a whole. Defaults to True.
  • bucket_size_mb (int, optional) – Size of bucket, the unit is MB. Defaults to -1.
mmcv.runner.allreduce_params(params, coalesce=True, bucket_size_mb=-1)[source]

Allreduce parameters.

Parameters:
  • params (list[torch.Parameters]) – List of parameters or buffers of a model.
  • coalesce (bool, optional) – Whether allreduce parameters as a whole. Defaults to True.
  • bucket_size_mb (int, optional) – Size of bucket, the unit is MB. Defaults to -1.
class mmcv.runner.LossScaler(init_scale=4294967296, mode='dynamic', scale_factor=2.0, scale_window=1000)[source]

Class that manages loss scaling in mixed precision training which supports both dynamic or static mode.

The implementation refers to https://github.com/NVIDIA/apex/blob/master/apex/fp16_utils/loss_scaler.py. Dynamic loss scaling is selected by supplying mode='dynamic'. It’s important to understand how LossScaler operates. Loss scaling is designed to combat the problem of underflowing gradients encountered when training fp16 networks for long periods. Dynamic loss scaling begins by attempting a very high loss scale. Ironically, this may result in OVERflowing gradients. If overflowing gradients are encountered, FP16_Optimizer then skips the update step for this particular iteration/minibatch, and LossScaler adjusts the loss scale to a lower value. If a certain number of iterations occur without overflowing gradients being detected, LossScaler increases the loss scale once more. In this way LossScaler attempts to “ride the edge” of always using the highest loss scale possible without incurring overflow.

Parameters:
  • init_scale (float) – Initial loss scale value. Default: 2**32.
  • scale_factor (float) – Factor used when adjusting the loss scale. Default: 2.
  • mode (str) – Loss scaling mode, ‘dynamic’ or ‘static’. Default: ‘dynamic’.
  • scale_window (int) – Number of consecutive iterations without an overflow to wait before increasing the loss scale. Default: 1000.
has_overflow(params)[source]

Check if params contain overflow.

load_state_dict(state_dict)[source]

Loads the loss_scaler state dict.

Parameters:state_dict (dict) – scaler state.
state_dict()[source]

Returns the state of the scaler as a dict.

update_scale(overflow)[source]

Update the current loss scale value when overflow happens.

class mmcv.runner.CheckpointLoader[source]

A general checkpoint loader to manage all schemes.

classmethod load_checkpoint(filename, map_location=None, logger=None)[source]

Load a checkpoint through a URL scheme path.

Parameters:
  • filename (str) – Checkpoint file name with a given prefix.
  • map_location (str, optional) – Same as torch.load(). Default: None
  • logger (logging.Logger, optional) – The logger for message. Default: None
Returns:

The loaded checkpoint.

Return type:

dict or OrderedDict

classmethod register_scheme(prefixes, loader=None, force=False)[source]

Register a loader to CheckpointLoader.

This method can be used as a normal class method or a decorator.

Parameters:
  • prefixes (str or list[str] or tuple[str]) – The prefix(es) of the registered loader.
  • loader (function, optional) – The loader function to be registered. When this method is used as a decorator, loader is None. Defaults to None.
  • force (bool, optional) – Whether to override the loader if the prefix has already been registered. Defaults to False.
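
Example

A decorator sketch; the mybackend:// prefix and the loader body are purely hypothetical:

>>> @CheckpointLoader.register_scheme(prefixes='mybackend://')
... def load_from_mybackend(filename, map_location=None):
...     pass  # hypothetical: fetch the file and torch.load() it
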
class mmcv.runner.BaseModule(init_cfg=None)[source]

Base module for all modules in openmmlab.

BaseModule is a wrapper of torch.nn.Module with additional functionality of parameter initialization. Compared with torch.nn.Module, BaseModule mainly adds three attributes.

  • init_cfg: the config to control the initialization.
  • init_weights: The function of parameter
    initialization and recording initialization information.
  • _params_init_info: Used to track the parameter initialization information. This attribute only exists while init_weights is executing.
Parameters:init_cfg (dict, optional) – Initialization config dict.
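
Example

A small sketch of the init_cfg mechanism; the ToyNet name and the chosen init_cfg values are illustrative:

>>> import torch.nn as nn
>>> from mmcv.runner import BaseModule
>>> class ToyNet(BaseModule):
...     def __init__(self):
...         super().__init__(
...             init_cfg=dict(type='Kaiming', layer='Conv2d'))
...         self.conv = nn.Conv2d(3, 8, 3)
>>> net = ToyNet()
>>> net.init_weights()  # applies the Kaiming init from init_cfg
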
init_weights()[source]

Initialize the weights.

class mmcv.runner.EvalHook(dataloader, start=None, interval=1, by_epoch=True, save_best=None, rule=None, test_fn=None, greater_keys=None, less_keys=None, **eval_kwargs)[source]

Non-Distributed evaluation hook.

This hook will regularly perform evaluation at a given interval when running in a non-distributed environment.

Parameters:
  • dataloader (DataLoader) – A PyTorch dataloader, whose dataset has implemented evaluate function.
  • start (int | None, optional) – Evaluation starting epoch. It enables evaluation before the training starts if start <= the resuming epoch. If None, whether to evaluate is merely decided by interval. Default: None.
  • interval (int) – Evaluation interval. Default: 1.
  • by_epoch (bool) – Determines whether to perform evaluation by epoch or by iteration. If set to True, it will perform evaluation by epoch; otherwise, by iteration. Default: True.
  • save_best (str, optional) – If a metric is specified, it would measure the best checkpoint during evaluation. The information about the best checkpoint would be saved in runner.meta['hook_msgs'] to keep the best score value and best checkpoint path, which will also be loaded when resuming a checkpoint. Options are the evaluation metrics on the test dataset, e.g., bbox_mAP, segm_mAP for bbox detection and instance segmentation, AR@100 for proposal recall. If save_best is auto, the first key of the returned OrderedDict result will be used. Default: None.
  • rule (str | None, optional) – Comparison rule for the best score. If set to None, it will infer a reasonable rule. Keys such as ‘acc’, ‘top’, etc. will be inferred by the ‘greater’ rule. Keys containing ‘loss’ will be inferred by the ‘less’ rule. Options are ‘greater’, ‘less’, None. Default: None.
  • test_fn (callable, optional) – Tests a model with samples from a dataloader and returns the test results. If None, the default test function mmcv.engine.single_gpu_test will be used. (default: None)
  • greater_keys (List[str] | None, optional) – Metric keys that will be inferred by ‘greater’ comparison rule. If None, _default_greater_keys will be used. (default: None)
  • less_keys (List[str] | None, optional) – Metric keys that will be inferred by ‘less’ comparison rule. If None, _default_less_keys will be used. (default: None)
  • **eval_kwargs – Evaluation arguments fed into the evaluate function of the dataset.

Notes

If new arguments are added for EvalHook, tools/test.py, tools/eval_metric.py may be affected.
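
Example

A minimal sketch of attaching the hook; val_loader is a placeholder dataloader whose dataset implements evaluate(), and the runner instance is assumed:

>>> from mmcv.runner import EvalHook
>>> eval_hook = EvalHook(val_loader, interval=1, save_best='auto')
>>> runner.register_hook(eval_hook)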

after_train_epoch(runner)[source]

Called after every training epoch to evaluate the results.

after_train_iter(runner)[source]

Called after every training iter to evaluate the results.

before_train_epoch(runner)[source]

Evaluate the model only at the start of training by epoch.

before_train_iter(runner)[source]

Evaluate the model only at the start of training by iteration.

evaluate(runner, results)[source]

Evaluate the results.

Parameters:
  • runner (mmcv.Runner) – The underlying training runner.
  • results (list) – Output results.
class mmcv.runner.DistEvalHook(dataloader, start=None, interval=1, by_epoch=True, save_best=None, rule=None, test_fn=None, greater_keys=None, less_keys=None, broadcast_bn_buffer=True, tmpdir=None, gpu_collect=False, **eval_kwargs)[source]

Distributed evaluation hook.

This hook will regularly perform evaluation at a given interval when running in a distributed environment.

Parameters:
  • dataloader (DataLoader) – A PyTorch dataloader, whose dataset has implemented evaluate function.
  • start (int | None, optional) – Evaluation starting epoch. It enables evaluation before the training starts if start <= the resuming epoch. If None, whether to evaluate is merely decided by interval. Default: None.
  • interval (int) – Evaluation interval. Default: 1.
  • by_epoch (bool) – Determines whether to perform evaluation by epoch or by iteration. If set to True, it will perform evaluation by epoch; otherwise, by iteration. Default: True.
  • save_best (str, optional) – If a metric is specified, it would measure the best checkpoint during evaluation. The information about the best checkpoint would be saved in runner.meta['hook_msgs'] to keep the best score value and best checkpoint path, which will also be loaded when resuming a checkpoint. Options are the evaluation metrics on the test dataset, e.g., bbox_mAP, segm_mAP for bbox detection and instance segmentation, AR@100 for proposal recall. If save_best is auto, the first key of the returned OrderedDict result will be used. Default: None.
  • rule (str | None, optional) – Comparison rule for the best score. If set to None, it will infer a reasonable rule. Keys such as ‘acc’, ‘top’, etc. will be inferred by the ‘greater’ rule. Keys containing ‘loss’ will be inferred by the ‘less’ rule. Options are ‘greater’, ‘less’, None. Default: None.
  • test_fn (callable, optional) – Tests a model with samples from a dataloader in a multi-gpu manner, and returns the test results. If None, the default test function mmcv.engine.multi_gpu_test will be used. (default: None)
  • tmpdir (str | None) – Temporary directory to save the results of all processes. Default: None.
  • gpu_collect (bool) – Whether to use gpu or cpu to collect results. Default: False.
  • broadcast_bn_buffer (bool) – Whether to broadcast the buffer(running_mean and running_var) of rank 0 to other rank before evaluation. Default: True.
  • **eval_kwargs – Evaluation arguments fed into the evaluate function of the dataset.
class mmcv.runner.Sequential(*args, init_cfg=None)[source]

Sequential module in openmmlab.

Parameters:init_cfg (dict, optional) – Initialization config dict.
class mmcv.runner.ModuleList(modules=None, init_cfg=None)[source]

ModuleList in openmmlab.

Parameters:
  • modules (iterable, optional) – an iterable of modules to add.
  • init_cfg (dict, optional) – Initialization config dict.

ops

mmcv.ops.bbox_overlaps(bboxes1, bboxes2, mode='iou', aligned=False, offset=0)[source]

Calculate the overlap between two sets of bboxes.

If aligned is False, then calculate the ious between each bbox of bboxes1 and bboxes2, otherwise the ious between each aligned pair of bboxes1 and bboxes2.

Parameters:
  • bboxes1 (Tensor) – shape (m, 4) in <x1, y1, x2, y2> format or empty.
  • bboxes2 (Tensor) – shape (n, 4) in <x1, y1, x2, y2> format or empty. If aligned is True, then m and n must be equal.
  • mode (str) – “iou” (intersection over union) or iof (intersection over foreground).
Returns:

shape (m, n) if aligned == False else shape (m, 1)

Return type:

ious(Tensor)

Example

>>> bboxes1 = torch.FloatTensor([
>>>     [0, 0, 10, 10],
>>>     [10, 10, 20, 20],
>>>     [32, 32, 38, 42],
>>> ])
>>> bboxes2 = torch.FloatTensor([
>>>     [0, 0, 10, 20],
>>>     [0, 10, 10, 19],
>>>     [10, 10, 20, 20],
>>> ])
>>> bbox_overlaps(bboxes1, bboxes2)
tensor([[0.5000, 0.0000, 0.0000],
        [0.0000, 0.0000, 1.0000],
        [0.0000, 0.0000, 0.0000]])

Example

>>> empty = torch.FloatTensor([])
>>> nonempty = torch.FloatTensor([
>>>     [0, 0, 10, 9],
>>> ])
>>> assert tuple(bbox_overlaps(empty, nonempty).shape) == (0, 1)
>>> assert tuple(bbox_overlaps(nonempty, empty).shape) == (1, 0)
>>> assert tuple(bbox_overlaps(empty, empty).shape) == (0, 0)
class mmcv.ops.CARAFE(kernel_size, group_size, scale_factor)[source]

CARAFE: Content-Aware ReAssembly of FEatures

Please refer to https://arxiv.org/abs/1905.02188 for more details.

Parameters:
  • kernel_size (int) – reassemble kernel size
  • group_size (int) – reassemble group size
  • scale_factor (int) – upsample ratio
Returns:

upsampled feature map

forward(features, masks)[source]

Defines the computation performed at every call.

Should be overridden by all subclasses.

Note

Although the recipe for forward pass needs to be defined within this function, one should call the Module instance afterwards instead of this since the former takes care of running the registered hooks while the latter silently ignores them.

class mmcv.ops.CARAFENaive(kernel_size, group_size, scale_factor)[source]
forward(features, masks)[source]

Defines the computation performed at every call.

Should be overridden by all subclasses.

Note

Although the recipe for forward pass needs to be defined within this function, one should call the Module instance afterwards instead of this since the former takes care of running the registered hooks while the latter silently ignores them.

class mmcv.ops.CARAFEPack(channels, scale_factor, up_kernel=5, up_group=1, encoder_kernel=3, encoder_dilation=1, compressed_channels=64)[source]

A unified package of CARAFE upsampler that contains: 1) channel compressor 2) content encoder 3) CARAFE op.

Official implementation of ICCV 2019 paper CARAFE: Content-Aware ReAssembly of FEatures Please refer to https://arxiv.org/abs/1905.02188 for more details.

Parameters:
  • channels (int) – input feature channels
  • scale_factor (int) – upsample ratio
  • up_kernel (int) – kernel size of CARAFE op
  • up_group (int) – group size of CARAFE op
  • encoder_kernel (int) – kernel size of content encoder
  • encoder_dilation (int) – dilation of content encoder
  • compressed_channels (int) – output channels of the channel compressor
Returns:

upsampled feature map

forward(x)[source]

Defines the computation performed at every call.

Should be overridden by all subclasses.

Note

Although the recipe for forward pass needs to be defined within this function, one should call the Module instance afterwards instead of this since the former takes care of running the registered hooks while the latter silently ignores them.

class mmcv.ops.CornerPool(mode)[source]

Corner Pooling.

Corner Pooling is a new type of pooling layer that helps a convolutional network better localize corners of bounding boxes.

Please refer to https://arxiv.org/abs/1808.01244 for more details. Code is modified from https://github.com/princeton-vl/CornerNet-Lite.

Parameters:mode (str) –

Pooling orientation for the pooling layer

  • ‘bottom’: Bottom Pooling
  • ‘left’: Left Pooling
  • ‘right’: Right Pooling
  • ‘top’: Top Pooling
Returns:Feature map after pooling.
forward(x)[source]

Defines the computation performed at every call.

Should be overridden by all subclasses.

Note

Although the recipe for forward pass needs to be defined within this function, one should call the Module instance afterwards instead of this since the former takes care of running the registered hooks while the latter silently ignores them.

class mmcv.ops.DeformConv2d(in_channels: int, out_channels: int, kernel_size: Union[int, Tuple[int, ...]], stride: Union[int, Tuple[int, ...]] = 1, padding: Union[int, Tuple[int, ...]] = 0, dilation: Union[int, Tuple[int, ...]] = 1, groups: int = 1, deform_groups: int = 1, bias: bool = False)[source]

Deformable 2D convolution.

Applies a deformable 2D convolution over an input signal composed of several input planes. DeformConv2d was described in the paper Deformable Convolutional Networks

Parameters:
  • in_channels (int) – Number of channels in the input image.
  • out_channels (int) – Number of channels produced by the convolution.
  • kernel_size (int, tuple) – Size of the convolving kernel.
  • stride (int, tuple) – Stride of the convolution. Default: 1.
  • padding (int or tuple) – Zero-padding added to both sides of the input. Default: 0.
  • dilation (int or tuple) – Spacing between kernel elements. Default: 1.
  • groups (int) – Number of blocked connections from input channels to output channels. Default: 1.
  • deform_groups (int) – Number of deformable group partitions.
  • bias (bool) – If True, adds a learnable bias to the output. Default: False.
forward(x: torch.Tensor, offset: torch.Tensor) → torch.Tensor[source]

Deformable Convolutional forward function.

Parameters:
  • x (Tensor) – Input feature, shape (B, C_in, H_in, W_in)
  • offset (Tensor) –

    Offset for deformable convolution, shape (B, deform_groups*kernel_size[0]*kernel_size[1]*2, H_out, W_out), H_out, W_out are equal to the output’s.

    An offset is like [y0, x0, y1, x1, y2, x2, …, y8, x8]. The spatial arrangement is like:

    (x0, y0) (x1, y1) (x2, y2)
    (x3, y3) (x4, y4) (x5, y5)
    (x6, y6) (x7, y7) (x8, y8)
    
Returns:

Output of the layer.

Return type:

Tensor
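
Example

A shape-level sketch, assuming a CUDA build of the mmcv ops; a zero offset reduces to a regular convolution:

>>> import torch
>>> from mmcv.ops import DeformConv2d
>>> x = torch.randn(1, 16, 32, 32).cuda()
>>> conv = DeformConv2d(16, 32, kernel_size=3, padding=1).cuda()
>>> # offset channels = deform_groups * 2 * kh * kw = 1 * 2 * 3 * 3 = 18
>>> offset = torch.zeros(1, 18, 32, 32).cuda()
>>> out = conv(x, offset)  # shape (1, 32, 32, 32)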

class mmcv.ops.DeformConv2dPack(*args, **kwargs)[source]

A Deformable Conv Encapsulation that acts as normal Conv layers.

The offset tensor is like [y0, x0, y1, x1, y2, x2, …, y8, x8]. The spatial arrangement is like:

(x0, y0) (x1, y1) (x2, y2)
(x3, y3) (x4, y4) (x5, y5)
(x6, y6) (x7, y7) (x8, y8)
Parameters:
  • in_channels (int) – Same as nn.Conv2d.
  • out_channels (int) – Same as nn.Conv2d.
  • kernel_size (int or tuple[int]) – Same as nn.Conv2d.
  • stride (int or tuple[int]) – Same as nn.Conv2d.
  • padding (int or tuple[int]) – Same as nn.Conv2d.
  • dilation (int or tuple[int]) – Same as nn.Conv2d.
  • groups (int) – Same as nn.Conv2d.
  • bias (bool or str) – If specified as auto, it will be decided by the norm_cfg. Bias will be set as True if norm_cfg is None, otherwise False.
forward(x)[source]

Deformable Convolutional forward function.

Parameters:
  • x (Tensor) – Input feature, shape (B, C_in, H_in, W_in)
  • offset (Tensor) –

    Offset for deformable convolution, shape (B, deform_groups*kernel_size[0]*kernel_size[1]*2, H_out, W_out), H_out, W_out are equal to the output’s.

    An offset is like [y0, x0, y1, x1, y2, x2, …, y8, x8]. The spatial arrangement is like:

    (x0, y0) (x1, y1) (x2, y2)
    (x3, y3) (x4, y4) (x5, y5)
    (x6, y6) (x7, y7) (x8, y8)
    
Returns:

Output of the layer.

Return type:

Tensor

class mmcv.ops.DeformRoIPool(output_size, spatial_scale=1.0, sampling_ratio=0, gamma=0.1)[source]
forward(input, rois, offset=None)[source]

Defines the computation performed at every call.

Should be overridden by all subclasses.

Note

Although the recipe for forward pass needs to be defined within this function, one should call the Module instance afterwards instead of this since the former takes care of running the registered hooks while the latter silently ignores them.

class mmcv.ops.DeformRoIPoolPack(output_size, output_channels, deform_fc_channels=1024, spatial_scale=1.0, sampling_ratio=0, gamma=0.1)[source]
forward(input, rois)[source]

Defines the computation performed at every call.

Should be overridden by all subclasses.

Note

Although the recipe for forward pass needs to be defined within this function, one should call the Module instance afterwards instead of this since the former takes care of running the registered hooks while the latter silently ignores them.

class mmcv.ops.ModulatedDeformRoIPoolPack(output_size, output_channels, deform_fc_channels=1024, spatial_scale=1.0, sampling_ratio=0, gamma=0.1)[source]
forward(input, rois)[source]

Defines the computation performed at every call.

Should be overridden by all subclasses.

Note

Although the recipe for forward pass needs to be defined within this function, one should call the Module instance afterwards instead of this since the former takes care of running the registered hooks while the latter silently ignores them.

class mmcv.ops.SigmoidFocalLoss(gamma, alpha, weight=None, reduction='mean')[source]
forward(input, target)[source]

Defines the computation performed at every call.

Should be overridden by all subclasses.

Note

Although the recipe for forward pass needs to be defined within this function, one should call the Module instance afterwards instead of this since the former takes care of running the registered hooks while the latter silently ignores them.

class mmcv.ops.SoftmaxFocalLoss(gamma, alpha, weight=None, reduction='mean')[source]
forward(input, target)[source]

Defines the computation performed at every call.

Should be overridden by all subclasses.

Note

Although the recipe for forward pass needs to be defined within this function, one should call the Module instance afterwards instead of this since the former takes care of running the registered hooks while the latter silently ignores them.

class mmcv.ops.MaskedConv2d(in_channels, out_channels, kernel_size, stride=1, padding=0, dilation=1, groups=1, bias=True)[source]

A MaskedConv2d which inherits the official Conv2d.

The masked forward doesn’t implement the backward function and currently only supports a stride of 1.

forward(input, mask=None)[source]

Defines the computation performed at every call.

Should be overridden by all subclasses.

Note

Although the recipe for forward pass needs to be defined within this function, one should call the Module instance afterwards instead of this since the former takes care of running the registered hooks while the latter silently ignores them.

class mmcv.ops.ModulatedDeformConv2d(in_channels, out_channels, kernel_size, stride=1, padding=0, dilation=1, groups=1, deform_groups=1, bias=True)[source]
forward(x, offset, mask)[source]

Defines the computation performed at every call.

Should be overridden by all subclasses.

Note

Although the recipe for forward pass needs to be defined within this function, one should call the Module instance afterwards instead of this since the former takes care of running the registered hooks while the latter silently ignores them.

class mmcv.ops.ModulatedDeformConv2dPack(*args, **kwargs)[source]

A ModulatedDeformable Conv Encapsulation that acts as normal Conv layers.

Parameters:
  • in_channels (int) – Same as nn.Conv2d.
  • out_channels (int) – Same as nn.Conv2d.
  • kernel_size (int or tuple[int]) – Same as nn.Conv2d.
  • stride (int) – Same as nn.Conv2d, while tuple is not supported.
  • padding (int) – Same as nn.Conv2d, while tuple is not supported.
  • dilation (int) – Same as nn.Conv2d, while tuple is not supported.
  • groups (int) – Same as nn.Conv2d.
  • bias (bool or str) – If specified as auto, it will be decided by the norm_cfg. Bias will be set as True if norm_cfg is None, otherwise False.
forward(x)[source]

Defines the computation performed at every call.

Should be overridden by all subclasses.

Note

Although the recipe for forward pass needs to be defined within this function, one should call the Module instance afterwards instead of this since the former takes care of running the registered hooks while the latter silently ignores them.

mmcv.ops.batched_nms(boxes, scores, idxs, nms_cfg, class_agnostic=False)[source]

Performs non-maximum suppression in a batched fashion.

Modified from https://github.com/pytorch/vision/blob/505cd6957711af790211896d32b40291bea1bc21/torchvision/ops/boxes.py#L39. In order to perform NMS independently per class, we add an offset to all the boxes. The offset is dependent only on the class idx, and is large enough so that boxes from different classes do not overlap.

Parameters:
  • boxes (torch.Tensor) – boxes in shape (N, 4).
  • scores (torch.Tensor) – scores in shape (N, ).
  • idxs (torch.Tensor) – each index value corresponds to a bbox cluster, and NMS will not be applied between elements of different idxs, shape (N, ).
  • nms_cfg (dict) –

    specify the nms type and other parameters like iou_thr. Possible keys include the following.

    • iou_thr (float): IoU threshold used for NMS.
    • split_thr (float): threshold number of boxes. In some cases the
      number of boxes is large (e.g., 200k). To avoid OOM during training, the users could set split_thr to a small value. If the number of boxes is greater than the threshold, it will perform NMS on each group of boxes separately and sequentially. Defaults to 10000.
  • class_agnostic (bool) – If True, NMS is class-agnostic, i.e. IoU thresholding happens over all boxes, regardless of the predicted class.
Returns:

kept dets and indices.

Return type:

tuple
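
Example

A small sketch, assuming mmcv's compiled ops are available; the nms_cfg keyword names follow the installed version (recent releases use iou_threshold):

>>> import torch
>>> from mmcv.ops import batched_nms
>>> boxes = torch.tensor([[0., 0., 10., 10.],
...                       [0., 0., 10., 10.],
...                       [20., 20., 30., 30.]])
>>> scores = torch.tensor([0.9, 0.8, 0.7])
>>> idxs = torch.tensor([0, 1, 0])  # per-box class indices
>>> dets, keep = batched_nms(boxes, scores, idxs,
...                          dict(type='nms', iou_threshold=0.5))
>>> # the two identical boxes both survive: they belong to different
>>> # classes, so NMS is not applied between them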

mmcv.ops.nms(boxes, scores, iou_threshold, offset=0, score_threshold=0, max_num=-1)[source]

Dispatch to either CPU or GPU NMS implementations.

The input can be either torch tensor or numpy array. GPU NMS will be used if the input is gpu tensor, otherwise CPU NMS will be used. The returned type will always be the same as inputs.

Parameters:
  • boxes (torch.Tensor or np.ndarray) – boxes in shape (N, 4).
  • scores (torch.Tensor or np.ndarray) – scores in shape (N, ).
  • iou_threshold (float) – IoU threshold for NMS.
  • offset (int, 0 or 1) – boxes’ width or height is (x2 - x1 + offset).
  • score_threshold (float) – score threshold for NMS.
  • max_num (int) – maximum number of boxes after NMS.
Returns:

kept dets (boxes and scores) and indices, which are always the same data type as the inputs.

Return type:

tuple

Example

>>> boxes = np.array([[49.1, 32.4, 51.0, 35.9],
>>>                   [49.3, 32.9, 51.0, 35.3],
>>>                   [49.2, 31.8, 51.0, 35.4],
>>>                   [35.1, 11.5, 39.1, 15.7],
>>>                   [35.6, 11.8, 39.3, 14.2],
>>>                   [35.3, 11.5, 39.9, 14.5],
>>>                   [35.2, 11.7, 39.7, 15.7]], dtype=np.float32)
>>> scores = np.array([0.9, 0.9, 0.5, 0.5, 0.5, 0.4, 0.3],
>>>                   dtype=np.float32)
>>> iou_threshold = 0.6
>>> dets, inds = nms(boxes, scores, iou_threshold)
>>> assert len(inds) == len(dets) == 3
mmcv.ops.soft_nms(boxes, scores, iou_threshold=0.3, sigma=0.5, min_score=0.001, method='linear', offset=0)[source]

Dispatch to the CPU-only Soft NMS implementation.

The input can be either a torch tensor or numpy array. The returned type will always be the same as inputs.

Parameters:
  • boxes (torch.Tensor or np.ndarray) – boxes in shape (N, 4).
  • scores (torch.Tensor or np.ndarray) – scores in shape (N, ).
  • iou_threshold (float) – IoU threshold for NMS.
  • sigma (float) – hyperparameter for gaussian method
  • min_score (float) – score filter threshold
  • method (str) – either ‘linear’ or ‘gaussian’
  • offset (int, 0 or 1) – boxes’ width or height is (x2 - x1 + offset).
Returns:

kept dets (boxes and scores) and indices, which are always the same data type as the inputs.

Return type:

tuple

Example

>>> boxes = np.array([[4., 3., 5., 3.],
>>>                   [4., 3., 5., 4.],
>>>                   [3., 1., 3., 1.],
>>>                   [3., 1., 3., 1.],
>>>                   [3., 1., 3., 1.],
>>>                   [3., 1., 3., 1.]], dtype=np.float32)
>>> scores = np.array([0.9, 0.9, 0.5, 0.5, 0.4, 0.0], dtype=np.float32)
>>> iou_threshold = 0.6
>>> dets, inds = soft_nms(boxes, scores, iou_threshold, sigma=0.5)
>>> assert len(inds) == len(dets) == 5
mmcv.ops.nms_match(dets, iou_threshold)[source]

Match dets into different groups by NMS.

NMS match is similar to NMS, but when a bbox is suppressed, nms_match will record the index of the suppressed bbox and form a group with the index of the kept bbox. In each group, the indices are sorted in score order.

Parameters:
  • dets (torch.Tensor | np.ndarray) – Det boxes with scores, shape (N, 5).
  • iou_threshold (float) – IoU threshold for NMS.
Returns:

The outer list corresponds to different matched groups; the inner Tensors hold the indices for each group, in score order.

Return type:

List[torch.Tensor | np.ndarray]

class mmcv.ops.RoIAlign(output_size, spatial_scale=1.0, sampling_ratio=0, pool_mode='avg', aligned=True, use_torchvision=False)[source]

RoI align pooling layer.

Parameters:
  • output_size (tuple) – h, w
  • spatial_scale (float) – scale the input boxes by this number
  • sampling_ratio (int) – number of input samples to take for each output sample. 0 to take samples densely for current models.
  • pool_mode (str, 'avg' or 'max') – pooling mode in each bin.
  • aligned (bool) – if False, use the legacy implementation in MMDetection. If True, align the results more perfectly.
  • use_torchvision (bool) – whether to use roi_align from torchvision.

Note

The implementation of RoIAlign when aligned=True is modified from https://github.com/facebookresearch/detectron2/

The meaning of aligned=True:

Given a continuous coordinate c, its two neighboring pixel indices (in our pixel model) are computed by floor(c - 0.5) and ceil(c - 0.5). For example, c=1.3 has pixel neighbors with discrete indices [0] and [1] (which are sampled from the underlying signal at continuous coordinates 0.5 and 1.5). But the original roi_align (aligned=False) does not subtract the 0.5 when computing neighboring pixel indices and therefore it uses pixels with a slightly incorrect alignment (relative to our pixel model) when performing bilinear interpolation.

With aligned=True, we first appropriately scale the ROI and then shift it by -0.5 prior to calling roi_align. This produces the correct neighbors.

The difference does not matter to the model’s performance if ROIAlign is used together with conv layers.

forward(input, rois)[source]
Parameters:
  • input – NCHW images
  • rois – Bx5 boxes. First column is the index into N. The other 4 columns are xyxy.
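
Example

A shape-level sketch; the values are illustrative, and use_torchvision=True can be set to fall back to torchvision's kernel if mmcv's compiled op is unavailable:

>>> import torch
>>> from mmcv.ops import RoIAlign
>>> feat = torch.randn(2, 256, 32, 32)
>>> # one roi: (batch_idx, x1, y1, x2, y2) in input-image coordinates
>>> rois = torch.tensor([[0., 8., 8., 64., 64.]])
>>> layer = RoIAlign(output_size=(7, 7), spatial_scale=1 / 8.,
...                  sampling_ratio=0, aligned=True)
>>> out = layer(feat, rois)  # shape (1, 256, 7, 7)
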
class mmcv.ops.RoIPool(output_size, spatial_scale=1.0)[source]
forward(input, rois)[source]

Defines the computation performed at every call.

Should be overridden by all subclasses.

Note

Although the recipe for forward pass needs to be defined within this function, one should call the Module instance afterwards instead of this since the former takes care of running the registered hooks while the latter silently ignores them.

class mmcv.ops.SyncBatchNorm(num_features, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True, group=None)[source]
forward(input)[source]

Defines the computation performed at every call.

Should be overridden by all subclasses.

Note

Although the recipe for forward pass needs to be defined within this function, one should call the Module instance afterwards instead of this since the former takes care of running the registered hooks while the latter silently ignores them.

mmcv.ops.Conv2d

alias of mmcv.ops.deprecated_wrappers.Conv2d_deprecated

mmcv.ops.ConvTranspose2d

alias of mmcv.ops.deprecated_wrappers.ConvTranspose2d_deprecated

mmcv.ops.Linear

alias of mmcv.ops.deprecated_wrappers.Linear_deprecated

mmcv.ops.MaxPool2d

alias of mmcv.ops.deprecated_wrappers.MaxPool2d_deprecated

class mmcv.ops.CrissCrossAttention(in_channels)[source]

Criss-Cross Attention Module.

forward(x)[source]

Defines the computation performed at every call.

Should be overridden by all subclasses.

Note

Although the recipe for forward pass needs to be defined within this function, one should call the Module instance afterwards instead of this since the former takes care of running the registered hooks while the latter silently ignores them.

class mmcv.ops.PSAMask(psa_type, mask_size=None)[source]
forward(input)[source]

Defines the computation performed at every call.

Should be overridden by all subclasses.

Note

Although the recipe for forward pass needs to be defined within this function, one should call the Module instance afterwards instead of this since the former takes care of running the registered hooks while the latter silently ignores them.

mmcv.ops.point_sample(input, points, align_corners=False, **kwargs)[source]

A wrapper around grid_sample() to support 3D point_coords tensors. Unlike torch.nn.functional.grid_sample(), it assumes point_coords to lie inside the [0, 1] x [0, 1] square.

Parameters:
  • input (Tensor) – Feature map, shape (N, C, H, W).
  • points (Tensor) – Image based absolute point coordinates (normalized), range [0, 1] x [0, 1], shape (N, P, 2) or (N, Hgrid, Wgrid, 2).
  • align_corners (bool) – Whether align_corners. Default: False
Returns:

Features of points on the input, shape (N, C, P) or (N, C, Hgrid, Wgrid).

Return type:

Tensor
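
Example

Since point_sample is a thin wrapper over grid_sample, a tiny sketch suffices to show the [0, 1] coordinate convention:

>>> import torch
>>> from mmcv.ops import point_sample
>>> feat = torch.arange(16.).view(1, 1, 4, 4)
>>> points = torch.tensor([[[0.5, 0.5]]])  # center of the map
>>> point_sample(feat, points)
tensor([[[7.5000]]])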

mmcv.ops.rel_roi_point_to_rel_img_point(rois, rel_roi_points, img, spatial_scale=1.0)[source]

Convert roi based relative point coordinates to image based absolute point coordinates.

Parameters:
  • rois (Tensor) – RoIs or BBoxes, shape (N, 4) or (N, 5)
  • rel_roi_points (Tensor) – Point coordinates inside RoI, relative to RoI location. Range (0, 1), shape (N, P, 2).
  • img (tuple/Tensor) – (height, width) of image or feature map.
  • spatial_scale (float) – Scale points by this factor. Default: 1.
Returns:

Image based relative point coordinates for sampling, shape (N, P, 2).

Return type:

Tensor

class mmcv.ops.SimpleRoIAlign(output_size, spatial_scale, aligned=True)[source]
forward(features, rois)[source]

Defines the computation performed at every call.

Should be overridden by all subclasses.

Note

Although the recipe for forward pass needs to be defined within this function, one should call the Module instance afterwards instead of this since the former takes care of running the registered hooks while the latter silently ignores them.

class mmcv.ops.SAConv2d(in_channels, out_channels, kernel_size, stride=1, padding=0, dilation=1, groups=1, bias=True, use_deform=False)[source]

SAC (Switchable Atrous Convolution)

This is an implementation of SAC in DetectoRS (https://arxiv.org/pdf/2006.02334.pdf).

Parameters:
  • in_channels (int) – Number of channels in the input image
  • out_channels (int) – Number of channels produced by the convolution
  • kernel_size (int or tuple) – Size of the convolving kernel
  • stride (int or tuple, optional) – Stride of the convolution. Default: 1
  • padding (int or tuple, optional) – Zero-padding added to both sides of the input. Default: 0
  • padding_mode (string, optional) – 'zeros', 'reflect', 'replicate' or 'circular'. Default: 'zeros'
  • dilation (int or tuple, optional) – Spacing between kernel elements. Default: 1
  • groups (int, optional) – Number of blocked connections from input channels to output channels. Default: 1
  • bias (bool, optional) – If True, adds a learnable bias to the output. Default: True
  • use_deform – If True, replace convolution with deformable convolution. Default: False.
forward(x)[source]

Defines the computation performed at every call.

Should be overridden by all subclasses.

Note

Although the recipe for forward pass needs to be defined within this function, one should call the Module instance afterwards instead of this since the former takes care of running the registered hooks while the latter silently ignores them.

class mmcv.ops.TINShift[source]

Temporal Interlace Shift.

Temporal Interlace shift is a differentiable temporal-wise frame shifting which is proposed in “Temporal Interlacing Network”

Please refer to https://arxiv.org/abs/2001.06499 for more details. Code is modified from https://github.com/mit-han-lab/temporal-shift-module

forward(input, shift)[source]

Perform temporal interlace shift.

Parameters:
  • input (Tensor) – Feature map with shape [N, num_segments, C, H * W].
  • shift (Tensor) – Shift tensor with shape [N, num_segments].
Returns:

Feature map after temporal interlace shift.

mmcv.ops.box_iou_rotated(bboxes1, bboxes2, mode='iou', aligned=False)[source]

Return intersection-over-union (Jaccard index) of boxes.

Both sets of boxes are expected to be in (x_center, y_center, width, height, angle) format.

If aligned is False, then calculate the ious between each bbox of bboxes1 and bboxes2, otherwise the ious between each aligned pair of bboxes1 and bboxes2.

Parameters:
  • bboxes1 (Tensor) – rotated bboxes 1. It has shape (N, 5), indicating (x, y, w, h, theta) for each row. Note that theta is in radians.
  • bboxes2 (Tensor) – rotated bboxes 2. It has shape (M, 5), indicating (x, y, w, h, theta) for each row. Note that theta is in radians.
  • mode (str) – “iou” (intersection over union) or iof (intersection over foreground).
Returns:

shape (N, M) if aligned == False else shape (N,)

Return type:

ious(Tensor)
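
Example

A small numeric sketch, assuming the compiled ops are available; the second box is the first one rotated by roughly 90 degrees:

>>> import torch
>>> from mmcv.ops import box_iou_rotated
>>> a = torch.tensor([[5., 5., 10., 4., 0.]])
>>> b = torch.tensor([[5., 5., 10., 4., 1.5708]])
>>> box_iou_rotated(a, b)  # ~0.25: intersection 4x4=16, union 64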

mmcv.ops.nms_rotated(dets, scores, iou_threshold, labels=None)[source]

Performs non-maximum suppression (NMS) on the rotated boxes according to their intersection-over-union (IoU).

Rotated NMS iteratively removes lower scoring rotated boxes which have an IoU greater than iou_threshold with another (higher scoring) rotated box.

Parameters:
  • dets (Tensor) – Rotated boxes in shape (N, 5). They are expected to be in (x_ctr, y_ctr, width, height, angle_radian) format.
  • scores (Tensor) – scores in shape (N, ).
  • iou_threshold (float) – IoU thresh for NMS.
  • labels (Tensor) – boxes’ label in shape (N,).
Returns:

kept dets (boxes and scores) and indices, which are always the same data type as the inputs.

Return type:

tuple

mmcv.ops.upfirdn2d(input, kernel, up=1, down=1, pad=(0, 0))[source]

UpFIRDn for 2D features.

UpFIRDn is short for upsample, apply FIR filter and downsample. More details can be found in: https://www.mathworks.com/help/signal/ref/upfirdn.html

Parameters:
  • input (Tensor) – Tensor with shape of (n, c, h, w).
  • kernel (Tensor) – Filter kernel.
  • up (int | tuple[int], optional) – Upsampling factor. If given a number, it will be used for both the height and width sides. Defaults to 1.
  • down (int | tuple[int], optional) – Downsampling factor. If given a number, it will be used for both the height and width sides. Defaults to 1.
  • pad (tuple[int], optional) – Padding for tensors, (x_pad, y_pad) or (x_pad_0, x_pad_1, y_pad_0, y_pad_1). Defaults to (0, 0).
Returns:

Tensor after UpFIRDn.

Return type:

Tensor
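
A minimal sketch of a 2x upsample with a separable smoothing filter. In mmcv this op is CUDA-only, so GPU tensors are assumed; the particular filter is an arbitrary illustrative choice:

import torch
from mmcv.ops import upfirdn2d

x = torch.randn(1, 3, 16, 16, device='cuda')

# A small binomial (smoothing) FIR kernel, normalized to sum to 1.
k = torch.tensor([1., 3., 3., 1.], device='cuda')
kernel = k[:, None] * k[None, :]
kernel = kernel / kernel.sum()

out = upfirdn2d(x, kernel, up=2, down=1, pad=(1, 1))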

class mmcv.ops.FusedBiasLeakyReLU(num_channels, negative_slope=0.2, scale=1.4142135623730951)[source]

Fused bias leaky ReLU.

This function is introduced in the StyleGAN2: http://arxiv.org/abs/1912.04958

The bias term comes from the convolution operation. In addition, to keep the variance of the feature map or gradients unchanged, they also adopt a scale similar to Kaiming initialization. However, since \(1 + \alpha^2\) is too small, we can just ignore it. Therefore, the final scale is just \(\sqrt{2}\). Of course, you may change it with your own scale.

TODO: Implement the CPU version.

Parameters:
  • num_channels (int) – The channel number of the feature map.
  • negative_slope (float, optional) – Same as nn.LeakyReLU. Defaults to 0.2.
  • scale (float, optional) – A scalar to adjust the variance of the feature map. Defaults to 2**0.5.
forward(input)[source]

Defines the computation performed at every call.

Should be overridden by all subclasses.

Note

Although the recipe for forward pass needs to be defined within this function, one should call the Module instance afterwards instead of this since the former takes care of running the registered hooks while the latter silently ignores them.

mmcv.ops.fused_bias_leakyrelu(input, bias, negative_slope=0.2, scale=1.4142135623730951)[source]

Fused bias leaky ReLU function.

This function is introduced in the StyleGAN2: http://arxiv.org/abs/1912.04958

The bias term comes from the convolution operation. In addition, to keep the variance of the feature map or gradients unchanged, they also adopt a scale similar to Kaiming initialization. However, since \(1 + \alpha^2\) is too small, we can just ignore it. Therefore, the final scale is just \(\sqrt{2}\). Of course, you may change it with your own scale.

Parameters:
  • input (torch.Tensor) – Input feature map.
  • bias (nn.Parameter) – The bias from convolution operation.
  • negative_slope (float, optional) – Same as nn.LeakyReLU. Defaults to 0.2.
  • scale (float, optional) – A scalar to adjust the variance of the feature map. Defaults to 2**0.5.
Returns:

Feature map after non-linear activation.

Return type:

torch.Tensor
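
A minimal sketch of the functional form; since the docs note the CPU version is still a TODO, a CUDA device is assumed:

import torch
from mmcv.ops import fused_bias_leakyrelu

x = torch.randn(2, 64, 8, 8, device='cuda')
bias = torch.zeros(64, device='cuda', requires_grad=True)

out = fused_bias_leakyrelu(x, bias)  # same shape as x

The module form, FusedBiasLeakyReLU(num_channels=64), wraps the same op with the bias stored as a learnable parameter.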

class mmcv.ops.RoIAlignRotated(out_size, spatial_scale, sample_num=0, aligned=True, clockwise=False)[source]

RoI align pooling layer for rotated proposals.

It accepts a feature map of shape (N, C, H, W) and rois with shape (n, 6) with each roi decoded as (batch_index, center_x, center_y, w, h, angle). The angle is in radian.

Parameters:
  • out_size (tuple) – The output size (h, w).
  • spatial_scale (float) – The scale applied to the input boxes.
  • sample_num (int) – Number of input samples to take for each output sample. 0 means taking samples densely for current models.
  • aligned (bool) – If False, use the legacy implementation in MMDetection. If True, align the results more precisely. Default: True.
  • clockwise (bool) – If True, the angle in each proposal follows a clockwise fashion in image space; otherwise, the angle is counterclockwise. Default: False.

Note

The implementation of RoIAlign when aligned=True is modified from https://github.com/facebookresearch/detectron2/

The meaning of aligned=True:

Given a continuous coordinate c, its two neighboring pixel indices (in our pixel model) are computed by floor(c - 0.5) and ceil(c - 0.5). For example, c=1.3 has pixel neighbors with discrete indices [0] and [1] (which are sampled from the underlying signal at continuous coordinates 0.5 and 1.5). But the original roi_align (aligned=False) does not subtract the 0.5 when computing neighboring pixel indices and therefore it uses pixels with a slightly incorrect alignment (relative to our pixel model) when performing bilinear interpolation.

With aligned=True, we first appropriately scale the ROI and then shift it by -0.5 prior to calling roi_align. This produces the correct neighbors.

This difference does not affect the model’s performance if ROIAlign is used together with conv layers.

forward(features, rois)[source]

Defines the computation performed at every call.

Should be overridden by all subclasses.

Note

Although the recipe for forward pass needs to be defined within this function, one should call the Module instance afterwards instead of this since the former takes care of running the registered hooks while the latter silently ignores them.
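
A minimal sketch with the roi layout described above (depending on your mmcv build, the op may require CUDA tensors):

import torch
from mmcv.ops import RoIAlignRotated

roi_align = RoIAlignRotated(out_size=(7, 7), spatial_scale=0.25,
                            sample_num=2)
feats = torch.randn(1, 256, 32, 32)
# Each roi: (batch_index, center_x, center_y, w, h, angle in radian),
# given in the original image scale; spatial_scale maps it to the map.
rois = torch.tensor([[0., 60., 60., 40., 20., 0.3]])
pooled = roi_align(feats, rois)  # (num_rois, 256, 7, 7)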

mmcv.ops.pixel_group(score, mask, embedding, kernel_label, kernel_contour, kernel_region_num, distance_threshold)[source]

Group pixels into text instances, which is widely used in text detection methods.

Parameters:
  • score (np.array or Tensor) – The foreground score with size hxw.
  • mask (np.array or Tensor) – The foreground mask with size hxw.
  • embedding (np.array or Tensor) – The embedding with size hxwxc to distinguish instances.
  • kernel_label (np.array or Tensor) – The instance kernel index with size hxw.
  • kernel_contour (np.array or Tensor) – The kernel contour with size hxw.
  • kernel_region_num (int) – The instance kernel region number.
  • distance_threshold (float) – The embedding distance threshold between kernel and pixel in one instance.
Returns:

The instance coordinate list.

Each element consists of averaged confidence, pixel number, and coordinates (x_i, y_i for all pixels) in order.

Return type:

pixel_assignment (List[List[float]])

mmcv.ops.contour_expand(kernel_mask, internal_kernel_label, min_kernel_area, kernel_num)[source]

Expand kernel contours so that foreground pixels are assigned into instances.

Parameters:
  • kernel_mask (np.array or Tensor) – The instance kernel mask with size hxw.
  • internal_kernel_label (np.array or Tensor) – The instance internal kernel label with size hxw.
  • min_kernel_area (int) – The minimum kernel area.
  • kernel_num (int) – The instance kernel number.
Returns:

The instance index map with size hxw.

Return type:

label (np.array or Tensor)

class mmcv.ops.MultiScaleDeformableAttention(embed_dims=256, num_heads=8, num_levels=4, num_points=4, im2col_step=64, dropout=0.1, batch_first=False, norm_cfg=None, init_cfg=None)[source]

An attention module used in Deformable DETR. See Deformable DETR: Deformable Transformers for End-to-End Object Detection for details.

Parameters:
  • embed_dims (int) – The embedding dimension of Attention. Default: 256.
  • num_heads (int) – Parallel attention heads. Default: 8.
  • num_levels (int) – The number of feature map used in Attention. Default: 4.
  • num_points (int) – The number of sampling points for each query in each head. Default: 4.
  • im2col_step (int) – The step used in image_to_column. Default: 64.
  • dropout (float) – The dropout rate applied to the attention output before it is added to the identity. Default: 0.1.
  • batch_first (bool) – If True, key, query and value have shape (batch, n, embed_dims); otherwise (n, batch, embed_dims). Default: False.
  • norm_cfg (dict) – Config dict for normalization layer. Default: None.
  • init_cfg (mmcv.ConfigDict, optional) – The config for initialization. Default: None.
forward(query, key=None, value=None, identity=None, query_pos=None, key_padding_mask=None, reference_points=None, spatial_shapes=None, level_start_index=None, **kwargs)[source]

Forward Function of MultiScaleDeformAttention.

Parameters:
  • query (Tensor) – Query of Transformer with shape (num_query, bs, embed_dims).
  • key (Tensor) – The key tensor with shape (num_key, bs, embed_dims).
  • value (Tensor) – The value tensor with shape (num_key, bs, embed_dims).
  • identity (Tensor) – The tensor used for addition, with the same shape as query. Default None. If None, query will be used.
  • query_pos (Tensor) – The positional encoding for query. Default: None.
  • key_pos (Tensor) – The positional encoding for key. Default None.
  • reference_points (Tensor) – The normalized reference points with shape (bs, num_query, num_levels, 2), where all elements are in the range [0, 1]; (0, 0) is the top-left and (1, 1) the bottom-right, including the padding area. Alternatively, shape (N, Length_{query}, num_levels, 4), where the additional two dimensions (w, h) form reference boxes.
  • key_padding_mask (Tensor) – ByteTensor for query, with shape [bs, num_key].
  • spatial_shapes (Tensor) – Spatial shape of features in different levels. With shape (num_levels, 2), last dimension represents (h, w).
  • level_start_index (Tensor) – The start index of each level. A tensor has shape (num_levels, ) and can be represented as [0, h_0*w_0, h_0*w_0+h_1*w_1, …].
Returns:

forwarded results with shape [num_query, bs, embed_dims].

Return type:

Tensor

init_weights()[source]

Default initialization for Parameters of Module.
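
A minimal sketch wiring up the shapes described by forward(). The sampling kernel is CUDA-only, so a GPU build is assumed, and all sizes below are arbitrary illustrative choices:

import torch
from mmcv.ops import MultiScaleDeformableAttention

attn = MultiScaleDeformableAttention(
    embed_dims=256, num_heads=8, num_levels=2, num_points=4).cuda()

bs, num_query = 2, 100
spatial_shapes = torch.tensor([[16, 16], [8, 8]], device='cuda')
num_value = int(spatial_shapes.prod(dim=1).sum())  # 16*16 + 8*8 = 320
level_start_index = torch.cat(
    (spatial_shapes.new_zeros(1),
     spatial_shapes.prod(dim=1).cumsum(0)[:-1]))

query = torch.randn(num_query, bs, 256, device='cuda')
value = torch.randn(num_value, bs, 256, device='cuda')
# (bs, num_query, num_levels, 2) reference points, normalized to [0, 1].
reference_points = torch.rand(bs, num_query, 2, 2, device='cuda')

out = attn(query, value=value, reference_points=reference_points,
           spatial_shapes=spatial_shapes,
           level_start_index=level_start_index)  # (num_query, bs, 256)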

class mmcv.ops.BorderAlign(pool_size)[source]

Border align pooling layer.

Applies border_align over the input feature based on predicted bboxes. The details are described in the paper BorderDet: Border Feature for Dense Object Detection.

For each border line (e.g. top, left, bottom or right) of each box, border_align does the following:

  1. uniformly samples `pool_size`+1 positions on this line, including the start and end points.
  2. the corresponding features at these points are computed by bilinear interpolation.
  3. max pooling over all the `pool_size`+1 positions is used to compute the pooled feature.
Parameters: pool_size (int) – number of positions sampled over the boxes’ borders (e.g. top, bottom, left, right).
forward(input, boxes)[source]
Parameters:
  • input – Features with shape [N,4C,H,W]. Channels ranged in [0,C), [C,2C), [2C,3C), [3C,4C) represent the top, left, bottom, right features respectively.
  • boxes – Boxes with shape [N,H*W,4]. Coordinate format (x1,y1,x2,y2).
Returns:

Pooled features with shape [N, C, H*W, 4]. The order of the last dimension is (top, left, bottom, right).

Return type:

Tensor
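
A minimal sketch matching the shapes above; border_align is a GPU op in mmcv, so CUDA tensors are assumed, and the random boxes are illustrative only:

import torch
from mmcv.ops import BorderAlign

border_align = BorderAlign(pool_size=10)
c, h, w = 16, 8, 8
feats = torch.randn(1, 4 * c, h, w, device='cuda')  # [N, 4C, H, W]

# Build [N, H*W, 4] boxes in (x1, y1, x2, y2) format with x2 > x1, y2 > y1.
x1y1 = torch.rand(1, h * w, 2, device='cuda') * 4
wh = torch.rand(1, h * w, 2, device='cuda') * 3 + 1
boxes = torch.cat([x1y1, x1y1 + wh], dim=-1)

out = border_align(feats, boxes)  # [N, C, H*W, 4]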