fileio¶
- class mmcv.fileio.BaseStorageBackend[source]¶
Abstract class of storage backends.
All backends need to implement two apis: get() and get_text(). get() reads the file as a byte stream and get_text() reads the file as text.
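A minimal sketch of a custom disk-like backend (hypothetical class, shown only to illustrate the two required apis; registration is covered under register_backend below):

class PlainDiskBackend(BaseStorageBackend):

    def get(self, filepath):
        # read the file as a byte stream
        with open(filepath, 'rb') as f:
            return f.read()

    def get_text(self, filepath):
        # read the file as text
        with open(filepath, 'r', encoding='utf-8') as f:
            return f.read()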
- class mmcv.fileio.FileClient(backend=None, prefix=None, **kwargs)[source]¶
A general file client to access files in different backends.
The client loads a file or text in a specified backend from its path and returns it as a binary or text file. There are two ways to choose a backend: the name of the backend and the prefix of the path. Although both can be used to choose a storage backend, backend has a higher priority: if both are set, the storage backend will be chosen by the backend argument. If both are None, the disk backend will be chosen. Note that other backend accessors can also be registered with a given name, prefixes, and backend class. In addition, the singleton pattern is used to avoid repeated object creation: if the arguments are the same, the same object will be returned.
- Parameters
backend (str, optional) – The storage backend type. Options are “disk”, “ceph”, “memcached”, “lmdb”, “http” and “petrel”. Default: None.
prefix (str, optional) – The prefix of the registered storage backend. Options are “s3”, “http”, “https”. Default: None.
Examples
>>> # only set backend
>>> file_client = FileClient(backend='petrel')
>>> # only set prefix
>>> file_client = FileClient(prefix='s3')
>>> # set both backend and prefix but use backend to choose client
>>> file_client = FileClient(backend='petrel', prefix='s3')
>>> # if the arguments are the same, the same object is returned
>>> file_client1 = FileClient(backend='petrel')
>>> file_client1 is file_client
True
- client¶
The backend object.
- Type
BaseStorageBackend
- exists(filepath: Union[str, pathlib.Path]) → bool[source]¶
Check whether a file path exists.
- Parameters
filepath (str or Path) – Path to be checked whether exists.
- Returns
Return True if filepath exists, False otherwise.
- Return type
bool
- get(filepath: Union[str, pathlib.Path]) → Union[bytes, memoryview][source]¶
Read data from a given filepath with ‘rb’ mode.
Note
There are two types of return values for get: one is bytes and the other is memoryview. The advantage of using memoryview is that you can avoid copying, and if you want to convert it to bytes, you can use .tobytes().
- Parameters
filepath (str or Path) – Path to read data.
- Returns
Expected bytes object or a memory view of the bytes object.
- Return type
bytes | memoryview
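For example (a usage sketch assuming a local file at the given path):

>>> file_client = FileClient(backend='disk')
>>> data = file_client.get('/path/of/your/file')
>>> if isinstance(data, memoryview):
...     data = data.tobytes()  # copy only when real bytes are needed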
- get_local_path(filepath: Union[str, pathlib.Path]) → Generator[Union[str, pathlib.Path], None, None][source]¶
Download data from filepath and write the data to a local path. get_local_path is decorated by contextlib.contextmanager(). It can be called with a with statement, and when exiting the with statement, the temporary path will be released.
Note
If filepath is a local path, it is returned as-is.
Warning
get_local_path is an experimental interface that may change in the future.
- Parameters
filepath (str or Path) – Path to read data from.
Examples
>>> file_client = FileClient(prefix='s3')
>>> with file_client.get_local_path('s3://bucket/abc.jpg') as path:
...     # do something here
- Yields
Iterable[str] – Only yield one path.
- get_text(filepath: Union[str, pathlib.Path], encoding='utf-8') → str[source]¶
Read data from a given filepath with ‘r’ mode.
- Parameters
filepath (str or Path) – Path to read data.
encoding (str) – The encoding format used to open the filepath. Default: ‘utf-8’.
- Returns
Expected text read from filepath.
- Return type
str
- classmethod infer_client(file_client_args: Optional[dict] = None, uri: Optional[Union[str, pathlib.Path]] = None) → mmcv.fileio.file_client.FileClient[source]¶
Infer a suitable file client based on the URI and arguments.
- Parameters
file_client_args (dict, optional) – Arguments to instantiate a FileClient. Default: None.
uri (str | Path, optional) – Uri to be parsed that contains the file prefix. Default: None.
Examples
>>> uri = 's3://path/of/your/file'
>>> file_client = FileClient.infer_client(uri=uri)
>>> file_client_args = {'backend': 'petrel'}
>>> file_client = FileClient.infer_client(file_client_args)
- Returns
Instantiated FileClient object.
- Return type
FileClient
- isdir(filepath: Union[str, pathlib.Path]) → bool[source]¶
Check whether a file path is a directory.
- Parameters
filepath (str or Path) – Path to be checked whether it is a directory.
- Returns
Return True if filepath points to a directory, False otherwise.
- Return type
bool
- isfile(filepath: Union[str, pathlib.Path]) → bool[source]¶
Check whether a file path is a file.
- Parameters
filepath (str or Path) – Path to be checked whether it is a file.
- Returns
Return True if filepath points to a file, False otherwise.
- Return type
bool
- join_path(filepath: Union[str, pathlib.Path], *filepaths: Union[str, pathlib.Path]) → str[source]¶
Concatenate all file paths.
Join one or more filepath components intelligently. The return value is the concatenation of filepath and any members of *filepaths.
- Parameters
filepath (str or Path) – Path to be concatenated.
- Returns
The result of concatenation.
- Return type
str
- list_dir_or_file(dir_path: Union[str, pathlib.Path], list_dir: bool = True, list_file: bool = True, suffix: Optional[Union[str, Tuple[str]]] = None, recursive: bool = False) → Iterator[str][source]¶
Scan a directory to find directories or files of interest, in arbitrary order.
Note
list_dir_or_file() returns the path relative to dir_path.
- Parameters
dir_path (str | Path) – Path of the directory.
list_dir (bool) – List the directories. Default: True.
list_file (bool) – List the path of files. Default: True.
suffix (str or tuple[str], optional) – File suffix that we are interested in. Default: None.
recursive (bool) – If set to True, recursively scan the directory. Default: False.
- Yields
Iterable[str] – A relative path to dir_path.
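A usage sketch (hypothetical directory layout):

>>> file_client = FileClient(backend='disk')
>>> # list all '.jpg' files under 'data/', recursively
>>> for path in file_client.list_dir_or_file(
...         'data/', list_dir=False, suffix='.jpg', recursive=True):
...     print(path)  # paths are relative to 'data/'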
- static parse_uri_prefix(uri: Union[str, pathlib.Path]) → Optional[str][source]¶
Parse the prefix of a uri.
- Parameters
uri (str | Path) – Uri to be parsed that contains the file prefix.
Examples
>>> FileClient.parse_uri_prefix('s3://path/of/your/file')
's3'
- Returns
Return the prefix of uri if the uri contains ‘://’, else None.
- Return type
str | None
- put(obj: bytes, filepath: Union[str, pathlib.Path]) → None[source]¶
Write data to a given filepath with ‘wb’ mode.
Note
put should create a directory if the directory of filepath does not exist.
- Parameters
obj (bytes) – Data to be written.
filepath (str or Path) – Path to write data.
- put_text(obj: str, filepath: Union[str, pathlib.Path]) → None[source]¶
Write data to a given filepath with ‘w’ mode.
Note
put_text should create a directory if the directory of filepath does not exist.
- Parameters
obj (str) – Data to be written.
filepath (str or Path) – Path to write data.
encoding (str, optional) – The encoding format used to open the filepath. Default: ‘utf-8’.
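For example (a sketch using the disk backend):

>>> file_client = FileClient(backend='disk')
>>> file_client.put(b'hello world', '/path/of/your/file')
>>> file_client.put_text('hello world', '/path/of/your/file.txt')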
- classmethod register_backend(name, backend=None, force=False, prefixes=None)[source]¶
Register a backend to FileClient.
This method can be used as a normal class method or a decorator.
class NewBackend(BaseStorageBackend):

    def get(self, filepath):
        return filepath

    def get_text(self, filepath):
        return filepath

FileClient.register_backend('new', NewBackend)
or
@FileClient.register_backend('new')
class NewBackend(BaseStorageBackend):

    def get(self, filepath):
        return filepath

    def get_text(self, filepath):
        return filepath
- Parameters
name (str) – The name of the registered backend.
backend (class, optional) – The backend class to be registered, which must be a subclass of BaseStorageBackend. When this method is used as a decorator, backend is None. Defaults to None.
force (bool, optional) – Whether to override the backend if the name has already been registered. Defaults to False.
prefixes (str or list[str] or tuple[str], optional) – The prefixes of the registered storage backend. Default: None. New in version 1.3.15.
- mmcv.fileio.dict_from_file(filename: Union[str, pathlib.Path], key_type: type = <class 'str'>, encoding: str = 'utf-8', file_client_args: Optional[Dict] = None) → Dict[source]¶
Load a text file and parse the content as a dict.
Each line of the text file will be two or more columns split by whitespace or tabs. The first column will be parsed as dict keys, and the following columns will be parsed as dict values.
Note
In v1.3.16 and later, dict_from_file supports loading a text file which can be stored in different backends and parsing the content as a dict.
- Parameters
filename (str) – Filename.
key_type (type) – Type of the dict keys. str is used by default and type conversion will be performed if specified.
encoding (str) – Encoding used to open the file. Default utf-8.
file_client_args (dict, optional) – Arguments to instantiate a FileClient. See mmcv.fileio.FileClient for details. Default: None.
Examples
>>> dict_from_file('/path/of/your/file')  # disk
{'key1': 'value1', 'key2': 'value2'}
>>> dict_from_file('s3://path/of/your/file')  # ceph or petrel
{'key1': 'value1', 'key2': 'value2'}
- Returns
The parsed contents.
- Return type
dict
- mmcv.fileio.dump(obj: Any, file: Optional[Union[str, pathlib.Path, TextIO, _io.StringIO, _io.BytesIO]] = None, file_format: Optional[str] = None, file_client_args: Optional[Dict] = None, **kwargs)[source]¶
Dump data to json/yaml/pickle strings or files.
This method provides a unified api for dumping data as strings or to files, and also supports custom arguments for each file format.
Note
In v1.3.16 and later, dump supports dumping data as strings or to files which can be saved to different backends.
- Parameters
obj (any) – The Python object to be dumped.
file (str or Path or file-like object, optional) – If not specified, then the object is dumped to a str, otherwise to a file specified by the filename or file-like object.
file_format (str, optional) – Same as load().
file_client_args (dict, optional) – Arguments to instantiate a FileClient. See mmcv.fileio.FileClient for details. Default: None.
Examples
>>> dump('hello world', '/path/of/your/file')  # disk
>>> dump('hello world', 's3://path/of/your/file')  # ceph or petrel
- Returns
True for success, False otherwise.
- Return type
bool
- mmcv.fileio.list_from_file(filename: Union[str, pathlib.Path], prefix: str = '', offset: int = 0, max_num: int = 0, encoding: str = 'utf-8', file_client_args: Optional[Dict] = None) → List[source]¶
Load a text file and parse the content as a list of strings.
Note
In v1.3.16 and later, list_from_file supports loading a text file which can be stored in different backends and parsing the content as a list of strings.
- Parameters
filename (str) – Filename.
prefix (str) – The prefix to be inserted to the beginning of each item.
offset (int) – The offset of lines.
max_num (int) – The maximum number of lines to be read; zero and negative values mean no limitation.
encoding (str) – Encoding used to open the file. Default utf-8.
file_client_args (dict, optional) – Arguments to instantiate a FileClient. See mmcv.fileio.FileClient for details. Default: None.
Examples
>>> list_from_file('/path/of/your/file')  # disk
['hello', 'world']
>>> list_from_file('s3://path/of/your/file')  # ceph or petrel
['hello', 'world']
- Returns
A list of strings.
- Return type
list[str]
- mmcv.fileio.load(file: Union[str, pathlib.Path, TextIO, _io.StringIO, _io.BytesIO], file_format: Optional[str] = None, file_client_args: Optional[Dict] = None, **kwargs)[source]¶
Load data from json/yaml/pickle files.
This method provides a unified api for loading data from serialized files.
Note
In v1.3.16 and later, load supports loading data from serialized files which can be stored in different backends.
- Parameters
file (str or Path or file-like object) – Filename or a file-like object.
file_format (str, optional) – If not specified, the file format will be inferred from the file extension, otherwise use the specified one. Currently supported formats include “json”, “yaml/yml” and “pickle/pkl”.
file_client_args (dict, optional) – Arguments to instantiate a FileClient. See mmcv.fileio.FileClient for details. Default: None.
Examples
>>> load('/path/of/your/file')  # file is stored on disk
>>> load('https://path/of/your/file')  # file is stored on the Internet
>>> load('s3://path/of/your/file')  # file is stored in petrel
- Returns
The content from the file.
image¶
- mmcv.image.adjust_brightness(img, factor=1.0, backend=None)[source]¶
Adjust image brightness.
This function controls the brightness of an image. An enhancement factor of 0.0 gives a black image. A factor of 1.0 gives the original image. This function blends the source image and the degenerated black image:
\[output = img * factor + degenerated * (1 - factor)\]
- Parameters
img (ndarray) – Image to be brightened.
factor (float) – A value that controls the enhancement. Factor 1.0 returns the original image, lower factors mean less color (brightness, contrast, etc.), and higher values more. Default 1.
backend (str | None) – The image processing backend type. Options are cv2, pillow, None. If backend is None, the global imread_backend specified by mmcv.use_backend() will be used. Defaults to None.
- Returns
The brightened image.
- Return type
ndarray
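As a worked illustration of the blend formula (plain numpy, not the library implementation): for brightness the degenerated image is all black, so the blend reduces to scaling pixel values by factor.

>>> import numpy as np
>>> img = np.array([[100., 200.]])
>>> degenerated = np.zeros_like(img)  # black image for brightness
>>> factor = 0.5
>>> img * factor + degenerated * (1 - factor)
array([[ 50., 100.]])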
- mmcv.image.adjust_color(img, alpha=1, beta=None, gamma=0, backend=None)[source]¶
It blends the source image and its gray image:
\[output = img * alpha + gray\_img * beta + gamma\]
- Parameters
img (ndarray) – The input source image.
alpha (int | float) – Weight for the source image. Default 1.
beta (int | float) – Weight for the converted gray image. If None, it’s assigned the value (1 - alpha).
gamma (int | float) – Scalar added to each sum. Same as cv2.addWeighted(). Default 0.
backend (str | None) – The image processing backend type. Options are cv2, pillow, None. If backend is None, the global imread_backend specified by mmcv.use_backend() will be used. Defaults to None.
- Returns
Colored image which has the same size and dtype as input.
- Return type
ndarray
- mmcv.image.adjust_contrast(img, factor=1.0, backend=None)[source]¶
Adjust image contrast.
This function controls the contrast of an image. An enhancement factor of 0.0 gives a solid grey image. A factor of 1.0 gives the original image. It blends the source image and the degenerated mean image:
\[output = img * factor + degenerated * (1 - factor)\]
- Parameters
img (ndarray) – Image to be contrasted. BGR order.
factor (float) – Same as mmcv.adjust_brightness().
backend (str | None) – The image processing backend type. Options are cv2, pillow, None. If backend is None, the global imread_backend specified by mmcv.use_backend() will be used. Defaults to None.
- Returns
The contrasted image.
- Return type
ndarray
- mmcv.image.adjust_hue(img: numpy.ndarray, hue_factor: float, backend: Optional[str] = None) → numpy.ndarray[source]¶
Adjust hue of an image.
The image hue is adjusted by converting the image to HSV and cyclically shifting the intensities in the hue channel (H). The image is then converted back to the original image mode.
hue_factor is the amount of shift in the H channel and must be in the interval [-0.5, 0.5].
Modified from https://github.com/pytorch/vision/blob/main/torchvision/transforms/functional.py
- Parameters
img (ndarray) – Image to be adjusted.
hue_factor (float) – How much to shift the hue channel. Should be in [-0.5, 0.5]. 0.5 and -0.5 give complete reversal of hue channel in HSV space in positive and negative direction respectively. 0 means no shift. Therefore, both -0.5 and 0.5 will give an image with complementary colors while 0 gives the original image.
backend (str | None) – The image processing backend type. Options are cv2, pillow, None. If backend is None, the global imread_backend specified by mmcv.use_backend() will be used. Defaults to None.
- Returns
Hue adjusted image.
- Return type
ndarray
- mmcv.image.adjust_lighting(img, eigval, eigvec, alphastd=0.1, to_rgb=True)[source]¶
AlexNet-style PCA jitter.
This data augmentation is proposed in ImageNet Classification with Deep Convolutional Neural Networks.
- Parameters
img (ndarray) – Image whose lighting is to be adjusted. BGR order.
eigval (ndarray) – The eigenvalues of the covariance matrix of pixel values.
eigvec (ndarray) – The eigenvectors of the covariance matrix of pixel values.
alphastd (float) – The standard deviation for the distribution of alpha. Defaults to 0.1.
to_rgb (bool) – Whether to convert img to rgb.
- Returns
The adjusted image.
- Return type
ndarray
- mmcv.image.adjust_sharpness(img, factor=1.0, kernel=None)[source]¶
Adjust image sharpness.
This function controls the sharpness of an image. An enhancement factor of 0.0 gives a blurred image. A factor of 1.0 gives the original image. And a factor of 2.0 gives a sharpened image. It blends the source image and the degenerated mean image:
\[output = img * factor + degenerated * (1 - factor)\]
- Parameters
img (ndarray) – Image to be sharpened. BGR order.
factor (float) – Same as mmcv.adjust_brightness().
kernel (np.ndarray, optional) – Filter kernel to be applied on the img to obtain the degenerated img. Defaults to None.
Note
No value sanity check is enforced on the kernel set by users. So with an inappropriate kernel, adjust_sharpness may fail to perform the function its name indicates and instead end up performing whatever transform the kernel determines.
- Returns
The sharpened image.
- Return type
ndarray
- mmcv.image.auto_contrast(img, cutoff=0)[source]¶
Auto adjust image contrast.
This function maximizes (normalizes) image contrast by first removing the cutoff percent of the lightest and darkest pixels from the histogram and remapping the image so that the darkest pixel becomes black (0) and the lightest becomes white (255).
- Parameters
img (ndarray) – Image to be contrasted. BGR order.
cutoff (int | float | tuple) – The cutoff percent of the lightest and darkest pixels to be removed. If given as a tuple, it shall be (low, high). Otherwise, the single value will be used for both. Defaults to 0.
- Returns
The contrasted image.
- Return type
ndarray
- mmcv.image.bgr2gray(img: numpy.ndarray, keepdim: bool = False) → numpy.ndarray[source]¶
Convert a BGR image to grayscale image.
- Parameters
img (ndarray) – The input image.
keepdim (bool) – If False (by default), then return the grayscale image with 2 dims, otherwise 3 dims.
- Returns
The converted grayscale image.
- Return type
ndarray
- mmcv.image.bgr2hls(img: numpy.ndarray) → numpy.ndarray¶
Convert a BGR image to an HLS image.
- Parameters
img (ndarray or str) – The input image.
- Returns
The converted HLS image.
- Return type
ndarray
- mmcv.image.bgr2hsv(img: numpy.ndarray) → numpy.ndarray¶
Convert a BGR image to an HSV image.
- Parameters
img (ndarray or str) – The input image.
- Returns
The converted HSV image.
- Return type
ndarray
- mmcv.image.bgr2rgb(img: numpy.ndarray) → numpy.ndarray¶
Convert a BGR image to an RGB image.
- Parameters
img (ndarray or str) – The input image.
- Returns
The converted RGB image.
- Return type
ndarray
- mmcv.image.bgr2ycbcr(img: numpy.ndarray, y_only: bool = False) → numpy.ndarray[source]¶
Convert a BGR image to YCbCr image.
The bgr version of rgb2ycbcr. It implements the ITU-R BT.601 conversion for standard-definition television. See more details in https://en.wikipedia.org/wiki/YCbCr#ITU-R_BT.601_conversion.
It differs from a similar function in cv2.cvtColor: BGR <-> YCrCb. In OpenCV, it implements a JPEG conversion. See more details in https://en.wikipedia.org/wiki/YCbCr#JPEG_conversion.
- Parameters
img (ndarray) – The input image. It accepts: 1. np.uint8 type with range [0, 255]; 2. np.float32 type with range [0, 1].
y_only (bool) – Whether to only return Y channel. Default: False.
- Returns
The converted YCbCr image. The output image has the same type and range as input image.
- Return type
ndarray
- mmcv.image.clahe(img, clip_limit=40.0, tile_grid_size=(8, 8))[source]¶
Use CLAHE method to process the image.
See Zuiderveld, K., “Contrast Limited Adaptive Histogram Equalization”, Graphics Gems, 1994: 474-485, for more information.
- Parameters
img (ndarray) – Image to be processed.
clip_limit (float) – Threshold for contrast limiting. Default: 40.0.
tile_grid_size (tuple[int]) – Size of grid for histogram equalization. Input image will be divided into equally sized rectangular tiles. It defines the number of tiles in row and column. Default: (8, 8).
- Returns
The processed image.
- Return type
ndarray
- mmcv.image.cutout(img: numpy.ndarray, shape: Union[int, Tuple[int, int]], pad_val: Union[int, float, tuple] = 0) → numpy.ndarray[source]¶
Randomly cut out a rectangle from the original img.
- Parameters
img (ndarray) – Image to be cutout.
shape (int | tuple[int]) – Expected cutout shape (h, w). If given as an int, the value will be used for both h and w.
pad_val (int | float | tuple[int | float]) – Values to be filled in the cut area. Defaults to 0.
- Returns
The cutout image.
- Return type
ndarray
- mmcv.image.gray2bgr(img: numpy.ndarray) → numpy.ndarray[source]¶
Convert a grayscale image to BGR image.
- Parameters
img (ndarray) – The input image.
- Returns
The converted BGR image.
- Return type
ndarray
- mmcv.image.gray2rgb(img: numpy.ndarray) → numpy.ndarray[source]¶
Convert a grayscale image to RGB image.
- Parameters
img (ndarray) – The input image.
- Returns
The converted RGB image.
- Return type
ndarray
- mmcv.image.hls2bgr(img: numpy.ndarray) → numpy.ndarray¶
Convert an HLS image to a BGR image.
- Parameters
img (ndarray or str) – The input image.
- Returns
The converted BGR image.
- Return type
ndarray
- mmcv.image.hsv2bgr(img: numpy.ndarray) → numpy.ndarray¶
Convert an HSV image to a BGR image.
- Parameters
img (ndarray or str) – The input image.
- Returns
The converted BGR image.
- Return type
ndarray
- mmcv.image.imconvert(img: numpy.ndarray, src: str, dst: str) → numpy.ndarray[source]¶
Convert an image from the src colorspace to dst colorspace.
- Parameters
img (ndarray) – The input image.
src (str) – The source colorspace, e.g., ‘rgb’, ‘hsv’.
dst (str) – The destination colorspace, e.g., ‘rgb’, ‘hsv’.
- Returns
The converted image.
- Return type
ndarray
- mmcv.image.imcrop(img: numpy.ndarray, bboxes: numpy.ndarray, scale: float = 1.0, pad_fill: Optional[Union[float, list]] = None) → Union[numpy.ndarray, List[numpy.ndarray]][source]¶
Crop image patches.
3 steps: scale the bboxes -> clip bboxes -> crop and pad.
- Parameters
img (ndarray) – Image to be cropped.
bboxes (ndarray) – Shape (k, 4) or (4, ), location of cropped bboxes.
scale (float, optional) – Scale ratio of bboxes, the default value 1.0 means no scaling.
pad_fill (Number | list[Number]) – Value to be filled for padding. Default: None, which means no padding.
- Returns
The cropped image patches.
- Return type
list[ndarray] | ndarray
- mmcv.image.imequalize(img)[source]¶
Equalize the image histogram.
This function applies a non-linear mapping to the input image, in order to create a uniform distribution of grayscale values in the output image.
- Parameters
img (ndarray) – Image to be equalized.
- Returns
The equalized image.
- Return type
ndarray
- mmcv.image.imflip(img: numpy.ndarray, direction: str = 'horizontal') → numpy.ndarray[source]¶
Flip an image horizontally or vertically.
- Parameters
img (ndarray) – Image to be flipped.
direction (str) – The flip direction, either “horizontal” or “vertical” or “diagonal”.
- Returns
The flipped image.
- Return type
ndarray
- mmcv.image.imflip_(img: numpy.ndarray, direction: str = 'horizontal') → numpy.ndarray[source]¶
Inplace flip an image horizontally or vertically.
- Parameters
img (ndarray) – Image to be flipped.
direction (str) – The flip direction, either “horizontal” or “vertical” or “diagonal”.
- Returns
The flipped image (inplace).
- Return type
ndarray
- mmcv.image.imfrombytes(content: bytes, flag: str = 'color', channel_order: str = 'bgr', backend: Optional[str] = None) → numpy.ndarray[source]¶
Read an image from bytes.
- Parameters
content (bytes) – Image bytes got from files or other streams.
flag (str) – Same as imread().
channel_order (str) – The channel order of the output, candidates are ‘bgr’ and ‘rgb’. Default to ‘bgr’.
backend (str | None) – The image decoding backend type. Options are cv2, pillow, turbojpeg, tifffile, None. If backend is None, the global imread_backend specified by mmcv.use_backend() will be used. Default: None.
- Returns
Loaded image array.
- Return type
ndarray
Examples
>>> img_path = '/path/to/img.jpg'
>>> with open(img_path, 'rb') as f:
...     img_buff = f.read()
>>> img = mmcv.imfrombytes(img_buff)
>>> img = mmcv.imfrombytes(img_buff, flag='color', channel_order='rgb')
>>> img = mmcv.imfrombytes(img_buff, backend='pillow')
>>> img = mmcv.imfrombytes(img_buff, backend='cv2')
- mmcv.image.iminvert(img)[source]¶
Invert (negate) an image.
- Parameters
img (ndarray) – Image to be inverted.
- Returns
The inverted image.
- Return type
ndarray
- mmcv.image.imnormalize(img, mean, std, to_rgb=True)[source]¶
Normalize an image with mean and std.
- Parameters
img (ndarray) – Image to be normalized.
mean (ndarray) – The mean to be used for normalization.
std (ndarray) – The std to be used for normalization.
to_rgb (bool) – Whether to convert to rgb.
- Returns
The normalized image.
- Return type
ndarray
- mmcv.image.imnormalize_(img, mean, std, to_rgb=True)[source]¶
Inplace normalize an image with mean and std.
- Parameters
img (ndarray) – Image to be normalized.
mean (ndarray) – The mean to be used for normalization.
std (ndarray) – The std to be used for normalization.
to_rgb (bool) – Whether to convert to rgb.
- Returns
The normalized image.
- Return type
ndarray
- mmcv.image.impad(img: numpy.ndarray, *, shape: Optional[Tuple[int, int]] = None, padding: Optional[Union[int, tuple]] = None, pad_val: Union[float, List] = 0, padding_mode: str = 'constant') → numpy.ndarray[source]¶
Pad the given image to a certain shape or pad on all sides with specified padding mode and padding value.
- Parameters
img (ndarray) – Image to be padded.
shape (tuple[int]) – Expected padding shape (h, w). Default: None.
padding (int or tuple[int]) – Padding on each border. If a single int is provided this is used to pad all borders. If tuple of length 2 is provided this is the padding on left/right and top/bottom respectively. If a tuple of length 4 is provided this is the padding for the left, top, right and bottom borders respectively. Default: None. Note that shape and padding can not be both set.
pad_val (Number | Sequence[Number]) – Values to be filled in padding areas when padding_mode is ‘constant’. Default: 0.
padding_mode (str) – Type of padding. Should be: constant, edge, reflect or symmetric. Default: constant.
- constant: pads with a constant value, this value is specified with pad_val.
- edge: pads with the last value at the edge of the image.
- reflect: pads with reflection of image without repeating the last value on the edge. For example, padding [1, 2, 3, 4] with 2 elements on both sides in reflect mode will result in [3, 2, 1, 2, 3, 4, 3, 2].
- symmetric: pads with reflection of image repeating the last value on the edge. For example, padding [1, 2, 3, 4] with 2 elements on both sides in symmetric mode will result in [2, 1, 1, 2, 3, 4, 4, 3].
- Returns
The padded image.
- Return type
ndarray
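The reflect and symmetric examples above can be reproduced with numpy’s analogous padding modes (an illustration only, not the impad implementation):

>>> import numpy as np
>>> np.pad([1, 2, 3, 4], 2, mode='reflect')
array([3, 2, 1, 2, 3, 4, 3, 2])
>>> np.pad([1, 2, 3, 4], 2, mode='symmetric')
array([2, 1, 1, 2, 3, 4, 4, 3])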
- mmcv.image.impad_to_multiple(img: numpy.ndarray, divisor: int, pad_val: Union[float, List] = 0) → numpy.ndarray[source]¶
Pad an image to ensure each edge is a multiple of some number.
- Parameters
img (ndarray) – Image to be padded.
divisor (int) – Padded image edges will be a multiple of divisor.
pad_val (Number | Sequence[Number]) – Same as impad().
- Returns
The padded image.
- Return type
ndarray
- mmcv.image.imread(img_or_path: Union[numpy.ndarray, str, pathlib.Path], flag: str = 'color', channel_order: str = 'bgr', backend: Optional[str] = None, file_client_args: Optional[dict] = None) → numpy.ndarray[source]¶
Read an image.
Note
In v1.4.1 and later, the file_client_args parameter was added.
- Parameters
img_or_path (ndarray or str or Path) – Either a numpy array or str or pathlib.Path. If it is a numpy array (loaded image), then it will be returned as is.
flag (str) – Flags specifying the color type of a loaded image, candidates are color, grayscale, unchanged, color_ignore_orientation and grayscale_ignore_orientation. By default, cv2 and pillow backend would rotate the image according to its EXIF info unless called with unchanged or *_ignore_orientation flags. turbojpeg and tifffile backend always ignore image’s EXIF info regardless of the flag. The turbojpeg backend only supports color and grayscale.
channel_order (str) – Order of channel, candidates are bgr and rgb.
backend (str | None) – The image decoding backend type. Options are cv2, pillow, turbojpeg, tifffile, None. If backend is None, the global imread_backend specified by mmcv.use_backend() will be used. Default: None.
file_client_args (dict | None) – Arguments to instantiate a FileClient. See mmcv.fileio.FileClient for details. Default: None.
- Returns
Loaded image array.
- Return type
ndarray
Examples
>>> import mmcv
>>> img_path = '/path/to/img.jpg'
>>> img = mmcv.imread(img_path)
>>> img = mmcv.imread(img_path, flag='color', channel_order='rgb',
...     backend='cv2')
>>> img = mmcv.imread(img_path, flag='color', channel_order='bgr',
...     backend='pillow')
>>> s3_img_path = 's3://bucket/img.jpg'
>>> # infer the file backend by the prefix s3
>>> img = mmcv.imread(s3_img_path)
>>> # manually set the file backend petrel
>>> img = mmcv.imread(s3_img_path, file_client_args={
...     'backend': 'petrel'})
>>> http_img_path = 'http://path/to/img.jpg'
>>> img = mmcv.imread(http_img_path)
>>> img = mmcv.imread(http_img_path, file_client_args={
...     'backend': 'http'})
- mmcv.image.imrescale(img: numpy.ndarray, scale: Union[float, Tuple[int, int]], return_scale: bool = False, interpolation: str = 'bilinear', backend: Optional[str] = None) → Union[numpy.ndarray, Tuple[numpy.ndarray, float]][source]¶
Resize image while keeping the aspect ratio.
- Parameters
img (ndarray) – The input image.
scale (float | tuple[int]) – The scaling factor or maximum size. If it is a float number, then the image will be rescaled by this factor, else if it is a tuple of 2 integers, then the image will be rescaled as large as possible within the scale.
return_scale (bool) – Whether to return the scaling factor besides the rescaled image.
interpolation (str) – Same as resize().
backend (str | None) – Same as resize().
- Returns
The rescaled image.
- Return type
ndarray
- mmcv.image.imresize(img: numpy.ndarray, size: Tuple[int, int], return_scale: bool = False, interpolation: str = 'bilinear', out: Optional[numpy.ndarray] = None, backend: Optional[str] = None) → Union[Tuple[numpy.ndarray, float, float], numpy.ndarray][source]¶
Resize image to a given size.
- Parameters
img (ndarray) – The input image.
size (tuple[int]) – Target size (w, h).
return_scale (bool) – Whether to return w_scale and h_scale.
interpolation (str) – Interpolation method, accepted values are “nearest”, “bilinear”, “bicubic”, “area”, “lanczos” for ‘cv2’ backend, “nearest”, “bilinear” for ‘pillow’ backend.
out (ndarray) – The output destination.
backend (str | None) – The image resize backend type. Options are cv2, pillow, None. If backend is None, the global imread_backend specified by
mmcv.use_backend()
will be used. Default: None.
- Returns
(resized_img, w_scale, h_scale) or resized_img.
- Return type
tuple | ndarray
- mmcv.image.imresize_like(img: numpy.ndarray, dst_img: numpy.ndarray, return_scale: bool = False, interpolation: str = 'bilinear', backend: Optional[str] = None) → Union[Tuple[numpy.ndarray, float, float], numpy.ndarray][source]¶
Resize image to the same size of a given image.
- Parameters
img (ndarray) – The input image.
dst_img (ndarray) – The target image.
return_scale (bool) – Whether to return w_scale and h_scale.
interpolation (str) – Same as resize().
backend (str | None) – Same as resize().
- Returns
(resized_img, w_scale, h_scale) or resized_img.
- Return type
tuple or ndarray
- mmcv.image.imresize_to_multiple(img: numpy.ndarray, divisor: Union[int, Tuple[int, int]], size: Optional[Union[int, Tuple[int, int]]] = None, scale_factor: Optional[Union[float, Tuple[float, float]]] = None, keep_ratio: bool = False, return_scale: bool = False, interpolation: str = 'bilinear', out: Optional[numpy.ndarray] = None, backend: Optional[str] = None) → Union[Tuple[numpy.ndarray, float, float], numpy.ndarray][source]¶
Resize image according to a given size or scale factor and then round up the resized or rescaled image size to the nearest value that can be divided by the divisor.
- Parameters
img (ndarray) – The input image.
divisor (int | tuple) – Resized image size will be a multiple of divisor. If divisor is a tuple, divisor should be (w_divisor, h_divisor).
size (None | int | tuple[int]) – Target size (w, h). Default: None.
scale_factor (None | float | tuple[float]) – Multiplier for spatial size. Should match input size if it is a tuple and the 2D style is (w_scale_factor, h_scale_factor). Default: None.
keep_ratio (bool) – Whether to keep the aspect ratio when resizing the image. Default: False.
return_scale (bool) – Whether to return w_scale and h_scale.
interpolation (str) – Interpolation method, accepted values are “nearest”, “bilinear”, “bicubic”, “area”, “lanczos” for ‘cv2’ backend, “nearest”, “bilinear” for ‘pillow’ backend.
out (ndarray) – The output destination.
backend (str | None) – The image resize backend type. Options are cv2, pillow, None. If backend is None, the global imread_backend specified by
mmcv.use_backend()
will be used. Default: None.
- Returns
(resized_img, w_scale, h_scale) or resized_img.
- Return type
tuple | ndarray
- mmcv.image.imrotate(img: numpy.ndarray, angle: float, center: Optional[Tuple[float, float]] = None, scale: float = 1.0, border_value: int = 0, interpolation: str = 'bilinear', auto_bound: bool = False, border_mode: str = 'constant') → numpy.ndarray[source]¶
Rotate an image.
- Parameters
img (np.ndarray) – Image to be rotated.
angle (float) – Rotation angle in degrees, positive values mean clockwise rotation.
center (tuple[float], optional) – Center point (w, h) of the rotation in the source image. If not specified, the center of the image will be used.
scale (float) – Isotropic scale factor.
border_value (int) – Border value used in case of a constant border. Defaults to 0.
interpolation (str) – Same as resize().
auto_bound (bool) – Whether to adjust the image size to cover the whole rotated image.
border_mode (str) – Pixel extrapolation method. Defaults to ‘constant’.
- Returns
The rotated image.
- Return type
np.ndarray
- mmcv.image.imshear(img: numpy.ndarray, magnitude: Union[int, float], direction: str = 'horizontal', border_value: Union[int, Tuple[int, int]] = 0, interpolation: str = 'bilinear') → numpy.ndarray[source]¶
Shear an image.
- Parameters
img (ndarray) – Image to be sheared with format (h, w) or (h, w, c).
magnitude (int | float) – The magnitude used for shear.
direction (str) – The flip direction, either “horizontal” or “vertical”.
border_value (int | tuple[int]) – Value used in case of a constant border.
interpolation (str) – Same as resize().
- Returns
The sheared image.
- Return type
ndarray
- mmcv.image.imtranslate(img: numpy.ndarray, offset: Union[int, float], direction: str = 'horizontal', border_value: Union[int, tuple] = 0, interpolation: str = 'bilinear') → numpy.ndarray[source]¶
Translate an image.
- Parameters
img (ndarray) – Image to be translated with format (h, w) or (h, w, c).
offset (int | float) – The offset used for translate.
direction (str) – The translate direction, either “horizontal” or “vertical”.
border_value (int | tuple[int]) – Value used in case of a constant border.
interpolation (str) – Same as resize().
- Returns
The translated image.
- Return type
ndarray
- mmcv.image.imwrite(img: numpy.ndarray, file_path: str, params: Optional[list] = None, auto_mkdir: Optional[bool] = None, file_client_args: Optional[dict] = None) → bool[source]¶
Write image to file.
Note
In v1.4.1 and later, the file_client_args parameter was added.
Warning
The parameter auto_mkdir will be deprecated in the future, and every file client will make directories automatically.
- Parameters
img (ndarray) – Image array to be written.
file_path (str) – Image file path.
params (None or list) – Same as the OpenCV imwrite() interface.
auto_mkdir (bool) – If the parent folder of file_path does not exist, whether to create it automatically. It will be deprecated.
file_client_args (dict | None) – Arguments to instantiate a FileClient. See mmcv.fileio.FileClient for details. Default: None.
- Returns
Successful or not.
- Return type
bool
Examples
>>> # write to hard disk client
>>> ret = mmcv.imwrite(img, '/path/to/img.jpg')
>>> # infer the file backend by the prefix s3
>>> ret = mmcv.imwrite(img, 's3://bucket/img.jpg')
>>> # manually set the file backend petrel
>>> ret = mmcv.imwrite(img, 's3://bucket/img.jpg', file_client_args={
...     'backend': 'petrel'})
- mmcv.image.lut_transform(img, lut_table)[source]¶
Transform array by look-up table.
The function lut_transform fills the output array with values from the look-up table. Indices of the entries are taken from the input array.
- Parameters
img (ndarray) – Image to be transformed.
lut_table (ndarray) – Look-up table of 256 elements; in case of a multi-channel input array, the table should either have a single channel (in which case the same table is used for all channels) or the same number of channels as the input array.
- Returns
The transformed image.
- Return type
ndarray
- mmcv.image.posterize(img, bits)[source]¶
Posterize an image (reduce the number of bits for each color channel).
- Parameters
img (ndarray) – Image to be posterized.
bits (int) – Number of bits (1 to 8) to use for posterizing.
- Returns
The posterized image.
- Return type
ndarray
- mmcv.image.rescale_size(old_size: tuple, scale: Union[float, int, tuple], return_scale: bool = False) → tuple[source]¶
Calculate the new size to be rescaled to.
- Parameters
old_size (tuple[int]) – The old size (w, h) of image.
scale (float | tuple[int]) – The scaling factor or maximum size. If it is a float number, then the image will be rescaled by this factor, else if it is a tuple of 2 integers, then the image will be rescaled as large as possible within the scale.
return_scale (bool) – Whether to return the scaling factor besides the rescaled image size.
- Returns
The new rescaled image size.
- Return type
tuple[int]
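For example (a sketch; exact rounding may differ slightly across versions):

>>> import mmcv
>>> mmcv.rescale_size((1000, 800), 0.5)
(500, 400)
>>> # rescale as large as possible within (640, 480), keeping aspect ratio
>>> mmcv.rescale_size((1000, 800), (640, 480))
(600, 480)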
- mmcv.image.rgb2bgr(img: numpy.ndarray) → numpy.ndarray¶
Convert an RGB image to a BGR image.
- Parameters
img (ndarray or str) – The input image.
- Returns
The converted BGR image.
- Return type
ndarray
- mmcv.image.rgb2gray(img: numpy.ndarray, keepdim: bool = False) → numpy.ndarray[source]¶
Convert an RGB image to a grayscale image.
- Parameters
img (ndarray) – The input image.
keepdim (bool) – If False (by default), then return the grayscale image with 2 dims, otherwise 3 dims.
- Returns
The converted grayscale image.
- Return type
ndarray
- mmcv.image.rgb2ycbcr(img: numpy.ndarray, y_only: bool = False) → numpy.ndarray[source]¶
Convert an RGB image to a YCbCr image.
This function produces the same results as Matlab’s rgb2ycbcr function. It implements the ITU-R BT.601 conversion for standard-definition television. See more details in https://en.wikipedia.org/wiki/YCbCr#ITU-R_BT.601_conversion.
It differs from a similar function in cv2.cvtColor: RGB <-> YCrCb. In OpenCV, it implements a JPEG conversion. See more details in https://en.wikipedia.org/wiki/YCbCr#JPEG_conversion.
- Parameters
img (ndarray) – The input image. It accepts: 1. np.uint8 type with range [0, 255]; 2. np.float32 type with range [0, 1].
y_only (bool) – Whether to only return Y channel. Default: False.
- Returns
The converted YCbCr image. The output image has the same type and range as input image.
- Return type
ndarray
- mmcv.image.solarize(img, thr=128)[source]¶
Solarize an image (invert all pixel values above a threshold).
- Parameters
img (ndarray) – Image to be solarized.
thr (int) – Threshold for solarizing (0 - 255).
- Returns
The solarized image.
- Return type
ndarray
- mmcv.image.tensor2imgs(tensor, mean: Optional[tuple] = None, std: Optional[tuple] = None, to_rgb: bool = True) → list[source]¶
Convert tensor to 3-channel images or 1-channel gray images.
- Parameters
tensor (torch.Tensor) – Tensor that contains multiple images, shape (N, C, H, W). C can be either 3 or 1.
mean (tuple[float], optional) – Mean of images. If None, (0, 0, 0) will be used for 3-channel tensors and (0,) for 1-channel tensors. Defaults to None.
std (tuple[float], optional) – Standard deviation of images. If None, (1, 1, 1) will be used for 3-channel tensors and (1,) for 1-channel tensors. Defaults to None.
to_rgb (bool, optional) – Whether the tensor was converted to RGB format in the first place. If so, convert it back to BGR. For 1-channel tensors, it must be False. Defaults to True.
- Returns
A list that contains multiple images.
- Return type
list[np.ndarray]
- mmcv.image.use_backend(backend: str) → None[source]¶
Select a backend for image decoding.
- Parameters
backend (str) – The image decoding backend type. Options are cv2, pillow, turbojpeg (see https://github.com/lilohuang/PyTurboJPEG) and tifffile. turbojpeg is faster but it only supports the .jpeg file format.
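A usage sketch:

>>> import mmcv
>>> mmcv.use_backend('pillow')  # decode images with Pillow from now on
>>> img = mmcv.imread('/path/to/img.jpg')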
- mmcv.image.ycbcr2bgr(img: numpy.ndarray) → numpy.ndarray[source]¶
Convert a YCbCr image to BGR image.
The bgr version of ycbcr2rgb. It implements the ITU-R BT.601 conversion for standard-definition television. See more details in https://en.wikipedia.org/wiki/YCbCr#ITU-R_BT.601_conversion.
It differs from a similar function in cv2.cvtColor: YCrCb <-> BGR. In OpenCV, it implements a JPEG conversion. See more details in https://en.wikipedia.org/wiki/YCbCr#JPEG_conversion.
- Parameters
img (ndarray) – The input image. It accepts: 1. np.uint8 type with range [0, 255]; 2. np.float32 type with range [0, 1].
- Returns
The converted BGR image. The output image has the same type and range as input image.
- Return type
ndarray
- mmcv.image.ycbcr2rgb(img: numpy.ndarray) → numpy.ndarray[source]¶
Convert a YCbCr image to RGB image.
This function produces the same results as Matlab’s ycbcr2rgb function. It implements the ITU-R BT.601 conversion for standard-definition television. See more details in https://en.wikipedia.org/wiki/YCbCr#ITU-R_BT.601_conversion.
It differs from a similar function in cv2.cvtColor: YCrCb <-> RGB. In OpenCV, it implements a JPEG conversion. See more details in https://en.wikipedia.org/wiki/YCbCr#JPEG_conversion.
- Parameters
img (ndarray) – The input image. It accepts: 1. np.uint8 type with range [0, 255]; 2. np.float32 type with range [0, 1].
- Returns
The converted RGB image. The output image has the same type and range as input image.
- Return type
ndarray
video¶
- class mmcv.video.VideoReader(filename, cache_capacity=10)[source]¶
Video class with similar usage to a list object.
This video wrapper class provides convenient apis to access frames. There is a known issue with OpenCV’s VideoCapture class: jumping to a certain frame may be inaccurate. It is fixed in this class by checking the position after each jump. A cache is used when decoding videos, so if the same frame is visited a second time, it does not need to be decoded again if it is stored in the cache.
Examples
>>> import mmcv
>>> v = mmcv.VideoReader('sample.mp4')
>>> len(v)  # get the total frame number with `len()`
120
>>> for img in v:  # v is iterable
>>>     mmcv.imshow(img)
>>> v[5]  # get the 6th frame
- current_frame()[source]¶
Get the current frame (frame that is just visited).
- Returns
If the video is fresh, return None, otherwise return the frame.
- Return type
ndarray or None
- cvt2frames(frame_dir, file_start=0, filename_tmpl='{:06d}.jpg', start=0, max_num=0, show_progress=True)[source]¶
Convert a video to frame images.
- Parameters
frame_dir (str) – Output directory to store all the frame images.
file_start (int) – Filenames will start from the specified number.
filename_tmpl (str) – Filename template with the index as the placeholder.
start (int) – The starting frame index.
max_num (int) – Maximum number of frames to be written.
show_progress (bool) – Whether to show a progress bar.
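A usage sketch (hypothetical paths):

>>> v = mmcv.VideoReader('sample.mp4')
>>> # dump all frames to 'out_dir' as 000000.jpg, 000001.jpg, ...
>>> v.cvt2frames('out_dir')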
- property fourcc¶
“Four character code” of the video.
- Type
str
- property fps¶
FPS of the video.
- Type
float
- property frame_cnt¶
Total frames of the video.
- Type
int
- get_frame(frame_id)[source]¶
Get frame by index.
- Parameters
frame_id (int) – Index of the expected frame, 0-based.
- Returns
Return the frame if successful, otherwise None.
- Return type
ndarray or None
- property height¶
Height of video frames.
- Type
int
- property opened¶
Indicate whether the video is opened.
- Type
bool
- property position¶
Current cursor position, indicating which frame has been decoded.
- Type
int
- read()[source]¶
Read the next frame.
If the next frame has been decoded before and is in the cache, return it directly; otherwise decode it, cache it and return it.
- Returns
Return the frame if successful, otherwise None.
- Return type
ndarray or None
- property resolution¶
Video resolution (width, height).
- Type
tuple
- property vcap¶
The raw VideoCapture object.
- Type
cv2.VideoCapture
- property width¶
Width of video frames.
- Type
int
- mmcv.video.concat_video(video_list: List, out_file: str, vcodec: Optional[str] = None, acodec: Optional[str] = None, log_level: str = 'info', print_cmd: bool = False) → None[source]¶
Concatenate multiple videos into a single one.
- Parameters
video_list (list) – A list of video filenames.
out_file (str) – Output video filename.
vcodec (None or str) – Output video codec, None for unchanged.
acodec (None or str) – Output audio codec, None for unchanged.
log_level (str) – Logging level of ffmpeg.
print_cmd (bool) – Whether to print the final ffmpeg command.
- mmcv.video.convert_video(in_file: str, out_file: str, print_cmd: bool = False, pre_options: str = '', **kwargs) → None[source]¶
Convert a video with ffmpeg.
This provides a general api to ffmpeg, the executed command is:
`ffmpeg -y <pre_options> -i <in_file> <options> <out_file>`
Options (kwargs) are mapped to ffmpeg commands with the following rules:
key=val: “-key val”
key=True: “-key”
key=False: “”
- Parameters
in_file (str) – Input video filename.
out_file (str) – Output video filename.
pre_options (str) – Options appearing before “-i <in_file>”.
print_cmd (bool) – Whether to print the final ffmpeg command.
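For example, under the mapping rules above (a sketch; the codec choice is arbitrary):

>>> # roughly runs: ffmpeg -y -i in.mp4 -vcodec h264 -an out.mp4
>>> mmcv.convert_video('in.mp4', 'out.mp4', vcodec='h264', an=True)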
- mmcv.video.cut_video(in_file: str, out_file: str, start: Optional[float] = None, end: Optional[float] = None, vcodec: Optional[str] = None, acodec: Optional[str] = None, log_level: str = 'info', print_cmd: bool = False) → None[source]¶
Cut a clip from a video.
- Parameters
in_file (str) – Input video filename.
out_file (str) – Output video filename.
start (None or float) – Start time (in seconds).
end (None or float) – End time (in seconds).
vcodec (None or str) – Output video codec, None for unchanged.
acodec (None or str) – Output audio codec, None for unchanged.
log_level (str) – Logging level of ffmpeg.
print_cmd (bool) – Whether to print the final ffmpeg command.
- mmcv.video.dequantize_flow(dx: numpy.ndarray, dy: numpy.ndarray, max_val: float = 0.02, denorm: bool = True) → numpy.ndarray[source]¶
Recover from quantized flow.
- Parameters
dx (ndarray) – Quantized dx.
dy (ndarray) – Quantized dy.
max_val (float) – Maximum value used when quantizing.
denorm (bool) – Whether to multiply flow values with width/height.
- Returns
Dequantized flow.
- Return type
ndarray
- mmcv.video.flow_from_bytes(content: bytes) → numpy.ndarray[source]¶
Read dense optical flow from bytes.
Note
This function for loading optical flow works for the FlyingChairs, FlyingThings3D, Sintel and FlyingChairsOcc datasets, but cannot load data from ChairsSDHom.
- Parameters
content (bytes) – Optical flow bytes got from files or other streams.
- Returns
Loaded optical flow with the shape (H, W, 2).
- Return type
ndarray
- mmcv.video.flow_warp(img: numpy.ndarray, flow: numpy.ndarray, filling_value: int = 0, interpolate_mode: str = 'nearest') → numpy.ndarray[source]¶
Use flow to warp img.
- Parameters
img (ndarray) – Image to be warped.
flow (ndarray) – Optical Flow.
filling_value (int) – The missing pixels will be set with filling_value.
interpolate_mode (str) – bilinear -> Bilinear Interpolation; nearest -> Nearest Neighbor.
- Returns
Warped image with the same shape as img.
- Return type
ndarray
- mmcv.video.flowread(flow_or_path: Union[numpy.ndarray, str], quantize: bool = False, concat_axis: int = 0, *args, **kwargs) → numpy.ndarray[source]¶
Read an optical flow map.
- Parameters
flow_or_path (ndarray or str) – A flow map or filepath.
quantize (bool) – Whether to read a quantized pair; if set to True, remaining args will be passed to dequantize_flow().
concat_axis (int) – The axis that dx and dy are concatenated, can be either 0 or 1. Ignored if quantize is False.
- Returns
Optical flow represented as a (h, w, 2) numpy array
- Return type
ndarray
- mmcv.video.flowwrite(flow: numpy.ndarray, filename: str, quantize: bool = False, concat_axis: int = 0, *args, **kwargs) → None[source]¶
Write optical flow to file.
If the flow is not quantized, it will be saved as a .flo file losslessly, otherwise a jpeg image which is lossy but of much smaller size. (dx and dy will be concatenated horizontally into a single image if quantize is True.)
- Parameters
flow (ndarray) – (h, w, 2) array of optical flow.
filename (str) – Output filepath.
quantize (bool) – Whether to quantize the flow and save it to 2 jpeg images. If set to True, remaining args will be passed to quantize_flow().
concat_axis (int) – The axis that dx and dy are concatenated, can be either 0 or 1. Ignored if quantize is False.
- mmcv.video.frames2video(frame_dir: str, video_file: str, fps: float = 30, fourcc: str = 'XVID', filename_tmpl: str = '{:06d}.jpg', start: int = 0, end: int = 0, show_progress: bool = True) → None[source]¶
Read the frame images from a directory and join them as a video.
- Parameters
frame_dir (str) – The directory containing video frames.
video_file (str) – Output filename.
fps (float) – FPS of the output video.
fourcc (str) – Fourcc of the output video, this should be compatible with the output file type.
filename_tmpl (str) – Filename template with the index as the variable.
start (int) – Starting frame index.
end (int) – Ending frame index.
show_progress (bool) – Whether to show a progress bar.
- mmcv.video.quantize_flow(flow: numpy.ndarray, max_val: float = 0.02, norm: bool = True) → tuple[source]¶
Quantize flow to [0, 255].
After this step, the size of flow will be much smaller, and can be dumped as jpeg images.
- Parameters
flow (ndarray) – (h, w, 2) array of optical flow.
max_val (float) – Maximum value of flow, values beyond [-max_val, max_val] will be truncated.
norm (bool) – Whether to divide flow values by image width/height.
- Returns
Quantized dx and dy.
- Return type
tuple[ndarray]
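A roundtrip sketch (assuming a flow array in memory; norm/denorm disabled so no width/height scaling is involved):

>>> import numpy as np
>>> from mmcv.video import quantize_flow, dequantize_flow
>>> flow = (np.random.rand(4, 4, 2).astype(np.float32) - 0.5) * 0.02
>>> dx, dy = quantize_flow(flow, max_val=0.02, norm=False)
>>> restored = dequantize_flow(dx, dy, max_val=0.02, denorm=False)
>>> restored.shape
(4, 4, 2)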
- mmcv.video.resize_video(in_file: str, out_file: str, size: Optional[tuple] = None, ratio: Optional[Union[tuple, float]] = None, keep_ar: bool = False, log_level: str = 'info', print_cmd: bool = False) → None[source]¶
Resize a video.
- Parameters
in_file (str) – Input video filename.
out_file (str) – Output video filename.
size (tuple) – Expected size (w, h), e.g., (320, 240) or (320, -1).
ratio (tuple or float) – Expected resize ratio, (2, 0.5) means (w*2, h*0.5).
keep_ar (bool) – Whether to keep original aspect ratio.
log_level (str) – Logging level of ffmpeg.
print_cmd (bool) – Whether to print the final ffmpeg command.
- mmcv.video.sparse_flow_from_bytes(content: bytes) → Tuple[numpy.ndarray, numpy.ndarray][source]¶
Read the optical flow in KITTI datasets from bytes.
This function is modified from the way RAFT loads the KITTI datasets.
- Parameters
content (bytes) – Optical flow bytes got from files or other streams.
- Returns
Loaded optical flow with the shape (H, W, 2) and flow valid mask with the shape (H, W).
- Return type
Tuple(ndarray, ndarray)
arraymisc¶
- mmcv.arraymisc.dequantize(arr: numpy.ndarray, min_val: Union[int, float], max_val: Union[int, float], levels: int, dtype=<class 'numpy.float64'>) → tuple[source]¶
Dequantize an array.
- Parameters
arr (ndarray) – Input array.
min_val (int or float) – Minimum value to be clipped.
max_val (int or float) – Maximum value to be clipped.
levels (int) – Quantization levels.
dtype (np.type) – The type of the dequantized array.
- Returns
Dequantized array.
- Return type
tuple
- mmcv.arraymisc.quantize(arr: numpy.ndarray, min_val: Union[int, float], max_val: Union[int, float], levels: int, dtype=<class 'numpy.int64'>) → tuple[source]¶
Quantize an array of (-inf, inf) to [0, levels-1].
- Parameters
arr (ndarray) – Input array.
min_val (int or float) – Minimum value to be clipped.
max_val (int or float) – Maximum value to be clipped.
levels (int) – Quantization levels.
dtype (np.type) – The type of the quantized array.
- Returns
Quantized array.
- Return type
tuple
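A roundtrip sketch:

>>> import numpy as np
>>> from mmcv.arraymisc import quantize, dequantize
>>> arr = np.array([-0.5, 0.0, 0.5, 1.5])
>>> q = quantize(arr, min_val=0, max_val=1, levels=10)    # ints in [0, 9]
>>> deq = dequantize(q, min_val=0, max_val=1, levels=10)  # back to floats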
visualization¶
- class mmcv.visualization.Color(value)[source]¶
An enum that defines common colors.
Contains red, green, blue, cyan, yellow, magenta, white and black.
- mmcv.visualization.color_val(color: Union[mmcv.visualization.color.Color, str, tuple, int, numpy.ndarray]) → tuple[source]¶
Convert various input to color tuples.
- Parameters
color (Color/str/tuple/int/ndarray) – Color inputs.
- Returns
A tuple of 3 integers indicating BGR channels.
- Return type
tuple[int]
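For example (BGR order, per the return description above):

>>> import mmcv
>>> mmcv.color_val('green')
(0, 255, 0)
>>> mmcv.color_val((255, 0, 0))
(255, 0, 0)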
- mmcv.visualization.flow2rgb(flow: numpy.ndarray, color_wheel: Optional[numpy.ndarray] = None, unknown_thr: float = 1000000.0) → numpy.ndarray[source]¶
Convert flow map to RGB image.
- Parameters
flow (ndarray) – Array of optical flow.
color_wheel (ndarray or None) – Color wheel used to map flow field to RGB colorspace. Default color wheel will be used if not specified.
unknown_thr (float) – Values above this threshold will be marked as unknown and thus ignored.
- Returns
RGB image that can be visualized.
- Return type
ndarray
- mmcv.visualization.flowshow(flow: Union[numpy.ndarray, str], win_name: str = '', wait_time: int = 0) → None[source]¶
Show optical flow.
- Parameters
flow (ndarray or str) – The optical flow to be displayed.
win_name (str) – The window name.
wait_time (int) – Value of waitKey param.
- mmcv.visualization.imshow(img: Union[str, numpy.ndarray], win_name: str = '', wait_time: int = 0)[source]¶
Show an image.
- Parameters
img (str or ndarray) – The image to be displayed.
win_name (str) – The window name.
wait_time (int) – Value of waitKey param.
- mmcv.visualization.imshow_bboxes(img: Union[str, numpy.ndarray], bboxes: Union[list, numpy.ndarray], colors: Union[mmcv.visualization.color.Color, str, tuple, int, numpy.ndarray] = 'green', top_k: int = - 1, thickness: int = 1, show: bool = True, win_name: str = '', wait_time: int = 0, out_file: Optional[str] = None)[source]¶
Draw bboxes on an image.
- Parameters
img (str or ndarray) – The image to be displayed.
bboxes (list or ndarray) – A list of ndarray of shape (k, 4).
colors (Color or str or tuple or int or ndarray) – A list of colors.
top_k (int) – Plot the first k bboxes only if set positive.
thickness (int) – Thickness of lines.
show (bool) – Whether to show the image.
win_name (str) – The window name.
wait_time (int) – Value of waitKey param.
out_file (str, optional) – The filename to write the image.
- Returns
The image with bboxes drawn on it.
- Return type
ndarray
- mmcv.visualization.imshow_det_bboxes(img: Union[str, numpy.ndarray], bboxes: numpy.ndarray, labels: numpy.ndarray, class_names: Optional[List[str]] = None, score_thr: float = 0, bbox_color: Union[mmcv.visualization.color.Color, str, tuple, int, numpy.ndarray] = 'green', text_color: Union[mmcv.visualization.color.Color, str, tuple, int, numpy.ndarray] = 'green', thickness: int = 1, font_scale: float = 0.5, show: bool = True, win_name: str = '', wait_time: int = 0, out_file: Optional[str] = None)[source]¶
Draw bboxes and class labels (with scores) on an image.
- Parameters
img (str or ndarray) – The image to be displayed.
bboxes (ndarray) – Bounding boxes (with scores), shaped (n, 4) or (n, 5).
labels (ndarray) – Labels of bboxes.
class_names (list[str]) – Names of each class.
score_thr (float) – Minimum score of bboxes to be shown.
bbox_color (Color or str or tuple or int or ndarray) – Color of bbox lines.
text_color (Color or str or tuple or int or ndarray) – Color of texts.
thickness (int) – Thickness of lines.
font_scale (float) – Font scale of texts.
show (bool) – Whether to show the image.
win_name (str) – The window name.
wait_time (int) – Value of waitKey param.
out_file (str or None) – The filename to write the image.
- Returns
The image with bboxes drawn on it.
- Return type
ndarray
- mmcv.visualization.make_color_wheel(bins: Optional[Union[list, tuple]] = None) → numpy.ndarray[source]¶
Build a color wheel.
- Parameters
bins (list or tuple, optional) – Specify the number of bins for each color range, corresponding to six ranges: red -> yellow, yellow -> green, green -> cyan, cyan -> blue, blue -> magenta, magenta -> red. [15, 6, 4, 11, 13, 6] is used for default (see Middlebury).
- Returns
Color wheel of shape (total_bins, 3).
- Return type
ndarray
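Sketch: with the default bins listed above, the wheel has 15 + 6 + 4 + 11 + 13 + 6 = 55 rows:
>>> from mmcv.visualization import make_color_wheel
>>> make_color_wheel().shape
(55, 3)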
utils¶
- class mmcv.utils.BuildExtension(*args, **kwargs)[source]¶
A custom setuptools build extension.
This setuptools.build_ext subclass takes care of passing the minimum required compiler flags (e.g. -std=c++14) as well as mixed C++/CUDA compilation (and support for CUDA files in general).
When using BuildExtension, it is allowed to supply a dictionary for extra_compile_args (rather than the usual list) that maps from languages (cxx or nvcc) to a list of additional compiler flags to supply to the compiler. This makes it possible to supply different flags to the C++ and CUDA compiler during mixed compilation.
use_ninja (bool): If use_ninja is True (default), then we attempt to build using the Ninja backend. Ninja greatly speeds up compilation compared to the standard setuptools.build_ext. Falls back to the standard distutils backend if Ninja is not available.
Note
By default, the Ninja backend uses #CPUS + 2 workers to build the extension. This may use up too many resources on some systems. One can control the number of workers by setting the MAX_JOBS environment variable to a non-negative number.
- finalize_options() → None[source]¶
Set final values for all the options that this command supports. This is always called as late as possible, i.e. after any option assignments from the command-line or from other commands have been done. Thus, this is the place to code option dependencies: if ‘foo’ depends on ‘bar’, then it is safe to set ‘foo’ from ‘bar’ as long as ‘foo’ still has the same value it was assigned in ‘initialize_options()’.
This method must be implemented by all command classes.
- mmcv.utils.CUDAExtension(name, sources, *args, **kwargs)[source]¶
Creates a setuptools.Extension for CUDA/C++.
Convenience method that creates a setuptools.Extension with the bare minimum (but often sufficient) arguments to build a CUDA/C++ extension. This includes the CUDA include path, library path and runtime library.
All arguments are forwarded to the setuptools.Extension constructor.
Example
>>> # xdoctest: +SKIP >>> from setuptools import setup >>> from torch.utils.cpp_extension import BuildExtension, CUDAExtension >>> setup( ... name='cuda_extension', ... ext_modules=[ ... CUDAExtension( ... name='cuda_extension', ... sources=['extension.cpp', 'extension_kernel.cu'], ... extra_compile_args={'cxx': ['-g'], ... 'nvcc': ['-O2']}) ... ], ... cmdclass={ ... 'build_ext': BuildExtension ... })
Compute capabilities:
By default the extension will be compiled to run on all archs of the cards visible during the building process of the extension, plus PTX. If down the road a new card is installed the extension may need to be recompiled. If a visible card has a compute capability (CC) that’s newer than the newest version for which your nvcc can build fully-compiled binaries, PyTorch will make nvcc fall back to building kernels with the newest version of PTX your nvcc does support (see below for details on PTX).
You can override the default behavior using TORCH_CUDA_ARCH_LIST to explicitly specify which CCs you want the extension to support:
TORCH_CUDA_ARCH_LIST="6.1 8.6" python build_my_extension.py
TORCH_CUDA_ARCH_LIST="5.2 6.0 6.1 7.0 7.5 8.0 8.6+PTX" python build_my_extension.py
The +PTX option causes extension kernel binaries to include PTX instructions for the specified CC. PTX is an intermediate representation that allows kernels to runtime-compile for any CC >= the specified CC (for example, 8.6+PTX generates PTX that can runtime-compile for any GPU with CC >= 8.6). This improves your binary’s forward compatibility. However, relying on older PTX to provide forward compat by runtime-compiling for newer CCs can modestly reduce performance on those newer CCs. If you know exact CC(s) of the GPUs you want to target, you’re always better off specifying them individually. For example, if you want your extension to run on 8.0 and 8.6, “8.0+PTX” would work functionally because it includes PTX that can runtime-compile for 8.6, but “8.0 8.6” would be better.
Note that while it’s possible to include all supported archs, the more archs get included the slower the building process will be, as it will build a separate kernel image for each arch.
Note that CUDA-11.5 nvcc will hit an internal compiler error while parsing torch/extension.h on Windows. To work around the issue, move the python binding logic to a pure C++ file.
- Example use:
>>> # xdoctest: +SKIP >>> #include <ATen/ATen.h> >>> at::Tensor SigmoidAlphaBlendForwardCuda(....)
- Instead of:
>>> # xdoctest: +SKIP >>> #include <torch/extension.h> >>> torch::Tensor SigmoidAlphaBlendForwardCuda(...)
Currently open issue for nvcc bug: https://github.com/pytorch/pytorch/issues/69460 Complete workaround code example: https://github.com/facebookresearch/pytorch3d/commit/cb170ac024a949f1f9614ffe6af1c38d972f7d48
Relocatable device code linking:
If you want to reference device symbols across compilation units (across object files), the object files need to be built with relocatable device code (-rdc=true or -dc). An exception to this rule is “dynamic parallelism” (nested kernel launches) which is not used a lot anymore. Relocatable device code is less optimized so it needs to be used only on object files that need it. Using -dlto (Device Link Time Optimization) at the device code compilation step and dlink step helps reduce the potential perf degradation of -rdc. Note that it needs to be used at both steps to be useful.
If you have rdc objects you need to have an extra -dlink (device linking) step before the CPU symbol linking step. There is also a case where -dlink is used without -rdc: when an extension is linked against a static lib containing rdc-compiled objects like the [NVSHMEM library](https://developer.nvidia.com/nvshmem).
Note: Ninja is required to build a CUDA Extension with RDC linking.
Example
>>> # xdoctest: +SKIP >>> CUDAExtension( ... name='cuda_extension', ... sources=['extension.cpp', 'extension_kernel.cu'], ... dlink=True, ... dlink_libraries=["dlink_lib"], ... extra_compile_args={'cxx': ['-g'], ... 'nvcc': ['-O2', '-rdc=true']})
- class mmcv.utils.Config(cfg_dict=None, cfg_text=None, filename=None)[source]¶
A facility for config and config files.
It supports common file formats as configs: python/json/yaml. The interface is the same as a dict object and also allows access config values as attributes.
Example
>>> cfg = Config(dict(a=1, b=dict(b1=[0, 1]))) >>> cfg.a 1 >>> cfg.b {'b1': [0, 1]} >>> cfg.b.b1 [0, 1] >>> cfg = Config.fromfile('tests/data/config/a.py') >>> cfg.filename "/home/kchen/projects/mmcv/tests/data/config/a.py" >>> cfg.item4 'test' >>> cfg "Config [path: /home/kchen/projects/mmcv/tests/data/config/a.py]: " "{'item1': [1, 2], 'item2': {'a': 0}, 'item3': True, 'item4': 'test'}"
- static auto_argparser(description=None)[source]¶
Generate argparser from config file automatically (experimental)
- dump(file=None)[source]¶
Dumps config into a file or returns a string representation of the config.
If a file argument is given, saves the config to that file using the format defined by the file argument extension.
Otherwise, returns a string representing the config. The formatting of this returned string is defined by the extension of self.filename. If self.filename is not defined, returns a string representation of a dict (lowercased and using ' for strings).
Examples
>>> cfg_dict = dict(item1=[1, 2], item2=dict(a=0), ... item3=True, item4='test') >>> cfg = Config(cfg_dict=cfg_dict) >>> dump_file = "a.py" >>> cfg.dump(dump_file)
- Parameters
file (str, optional) – Path of the output file where the config will be dumped. Defaults to None.
- static fromstring(cfg_str, file_format)[source]¶
Generate config from config str.
- Parameters
cfg_str (str) – Config str.
file_format (str) – Config file format corresponding to the config str. Only py/yml/yaml/json types are supported now.
- Returns
Config obj.
- Return type
Config
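For illustration, a minimal sketch (the config content is illustrative):
>>> cfg = Config.fromstring('item1 = [1, 2]\nitem2 = dict(a=0)', '.py')
>>> cfg.item1
[1, 2]
>>> cfg.item2.a
0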
- merge_from_dict(options, allow_list_keys=True)[source]¶
Merge list into cfg_dict.
Merge the dict parsed by MultipleKVAction into this cfg.
Examples
>>> options = {'model.backbone.depth': 50, ... 'model.backbone.with_cp':True} >>> cfg = Config(dict(model=dict(backbone=dict(type='ResNet')))) >>> cfg.merge_from_dict(options) >>> cfg_dict = super(Config, self).__getattribute__('_cfg_dict') >>> assert cfg_dict == dict( ... model=dict(backbone=dict(depth=50, with_cp=True)))
>>> # Merge list element >>> cfg = Config(dict(pipeline=[ ... dict(type='LoadImage'), dict(type='LoadAnnotations')])) >>> options = dict(pipeline={'0': dict(type='SelfLoadImage')}) >>> cfg.merge_from_dict(options, allow_list_keys=True) >>> cfg_dict = super(Config, self).__getattribute__('_cfg_dict') >>> assert cfg_dict == dict(pipeline=[ ... dict(type='SelfLoadImage'), dict(type='LoadAnnotations')])
- Parameters
options (dict) – dict of configs to merge from.
allow_list_keys (bool) – If True, int string keys (e.g. ‘0’, ‘1’) are allowed in options and will replace the element of the corresponding index in the config if the config is a list. Default: True.
- mmcv.utils.CppExtension(name, sources, *args, **kwargs)[source]¶
Creates a setuptools.Extension for C++.
Convenience method that creates a setuptools.Extension with the bare minimum (but often sufficient) arguments to build a C++ extension.
All arguments are forwarded to the setuptools.Extension constructor.
Example
>>> # xdoctest: +SKIP >>> from setuptools import setup >>> from torch.utils.cpp_extension import BuildExtension, CppExtension >>> setup( ... name='extension', ... ext_modules=[ ... CppExtension( ... name='extension', ... sources=['extension.cpp'], ... extra_compile_args=['-g']), ... ], ... cmdclass={ ... 'build_ext': BuildExtension ... })
- class mmcv.utils.DataLoader(dataset: torch.utils.data.dataset.Dataset[torch.utils.data.dataloader.T_co], batch_size: Optional[int] = 1, shuffle: Optional[bool] = None, sampler: Optional[Union[torch.utils.data.sampler.Sampler, Iterable]] = None, batch_sampler: Optional[Union[torch.utils.data.sampler.Sampler[Sequence], Iterable[Sequence]]] = None, num_workers: int = 0, collate_fn: Optional[Callable[[List[torch.utils.data.dataloader.T]], Any]] = None, pin_memory: bool = False, drop_last: bool = False, timeout: float = 0, worker_init_fn: Optional[Callable[[int], None]] = None, multiprocessing_context=None, generator=None, *, prefetch_factor: int = 2, persistent_workers: bool = False, pin_memory_device: str = '')[source]¶
Data loader. Combines a dataset and a sampler, and provides an iterable over the given dataset.
The DataLoader supports both map-style and iterable-style datasets with single- or multi-process loading, customizing loading order and optional automatic batching (collation) and memory pinning.
See the torch.utils.data documentation page for more details.
- Parameters
dataset (Dataset) – dataset from which to load the data.
batch_size (int, optional) – how many samples per batch to load (default: 1).
shuffle (bool, optional) – set to True to have the data reshuffled at every epoch (default: False).
sampler (Sampler or Iterable, optional) – defines the strategy to draw samples from the dataset. Can be any Iterable with __len__ implemented. If specified, shuffle must not be specified.
batch_sampler (Sampler or Iterable, optional) – like sampler, but returns a batch of indices at a time. Mutually exclusive with batch_size, shuffle, sampler, and drop_last.
num_workers (int, optional) – how many subprocesses to use for data loading. 0 means that the data will be loaded in the main process. (default: 0)
collate_fn (Callable, optional) – merges a list of samples to form a mini-batch of Tensor(s). Used when using batched loading from a map-style dataset.
pin_memory (bool, optional) – If True, the data loader will copy Tensors into device/CUDA pinned memory before returning them. If your data elements are a custom type, or your collate_fn returns a batch that is a custom type, see the example below.
drop_last (bool, optional) – set to True to drop the last incomplete batch, if the dataset size is not divisible by the batch size. If False and the size of dataset is not divisible by the batch size, then the last batch will be smaller. (default: False)
timeout (numeric, optional) – if positive, the timeout value for collecting a batch from workers. Should always be non-negative. (default: 0)
worker_init_fn (Callable, optional) – If not None, this will be called on each worker subprocess with the worker id (an int in [0, num_workers - 1]) as input, after seeding and before data loading. (default: None)
generator (torch.Generator, optional) – If not None, this RNG will be used by RandomSampler to generate random indexes and by multiprocessing to generate base_seed for workers. (default: None)
prefetch_factor (int, optional, keyword-only arg) – Number of batches loaded in advance by each worker. 2 means there will be a total of 2 * num_workers batches prefetched across all workers. (default: 2)
persistent_workers (bool, optional) – If True, the data loader will not shut down the worker processes after a dataset has been consumed once. This allows the workers' Dataset instances to stay alive. (default: False)
pin_memory_device (str, optional) – the device onto which the data loader will copy Tensors in pinned memory before returning them, if pin_memory is set to True.
Warning
If the spawn start method is used, worker_init_fn cannot be an unpicklable object, e.g., a lambda function. See multiprocessing-best-practices for more details related to multiprocessing in PyTorch.
Warning
The len(dataloader) heuristic is based on the length of the sampler used. When dataset is an IterableDataset, it instead returns an estimate based on len(dataset) / batch_size, with proper rounding depending on drop_last, regardless of multi-process loading configurations. This represents the best guess PyTorch can make because PyTorch trusts user dataset code to correctly handle multi-process loading and avoid duplicate data.
However, if sharding results in multiple workers having incomplete last batches, this estimate can still be inaccurate, because (1) an otherwise complete batch can be broken into multiple ones and (2) more than one batch worth of samples can be dropped when drop_last is set. Unfortunately, PyTorch cannot detect such cases in general.
See `Dataset Types`_ for more details on these two types of datasets and how IterableDataset interacts with `Multi-process data loading`_.
Warning
See the reproducibility, dataloader-workers-random-seed, and data-loading-randomness notes for random seed related questions.
- class mmcv.utils.DictAction(option_strings, dest, nargs=None, const=None, default=None, type=None, choices=None, required=False, help=None, metavar=None)[source]¶
argparse action to split an argument into KEY=VALUE form on the first ‘=’ and append to a dictionary. List options can be passed as comma separated values, i.e. ‘KEY=V1,V2,V3’, or with explicit brackets, i.e. ‘KEY=[V1,V2,V3]’. It also supports nested brackets to build list/tuple values, e.g. ‘KEY=[(V1,V2),(V3,V4)]’.
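For illustration, a minimal argparse sketch (the option name and values are illustrative):
>>> import argparse
>>> from mmcv.utils import DictAction
>>> parser = argparse.ArgumentParser()
>>> _ = parser.add_argument('--cfg-options', nargs='+', action=DictAction)
>>> args = parser.parse_args(['--cfg-options', 'lr=0.01', 'layers=[2,2,2]'])
>>> args.cfg_options
{'lr': 0.01, 'layers': [2, 2, 2]}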
- mmcv.utils.PoolDataLoader¶
- class mmcv.utils.ProgressBar(task_num=0, bar_width=50, start=True, file=<_io.TextIOWrapper name='<stdout>' mode='w' encoding='UTF-8'>)[source]¶
A progress bar which can print the progress.
- class mmcv.utils.Registry(name, build_func=None, parent=None, scope=None)[source]¶
A registry to map strings to classes or functions.
Registered object could be built from registry. Meanwhile, registered functions could be called from registry.
Example
>>> MODELS = Registry('models') >>> @MODELS.register_module() >>> class ResNet: >>> pass >>> resnet = MODELS.build(dict(type='ResNet')) >>> @MODELS.register_module() >>> def resnet50(): >>> pass >>> resnet = MODELS.build(dict(type='resnet50'))
Please refer to https://mmcv.readthedocs.io/en/latest/understand_mmcv/registry.html for advanced usage.
- Parameters
name (str) – Registry name.
build_func (func, optional) – Build function to construct an instance from the Registry. build_from_cfg is used if neither parent nor build_func is specified. If parent is specified and build_func is not given, build_func will be inherited from parent. Default: None.
parent (Registry, optional) – Parent registry. The class registered in a child registry can be built from the parent. Default: None.
scope (str, optional) – The scope of registry. It is the key to search for children registry. If not specified, scope will be the name of the package where class is defined, e.g. mmdet, mmcls, mmseg. Default: None.
- get(key)[source]¶
Get the registry record.
- Parameters
key (str) – The class name in string format.
- Returns
The corresponding class.
- Return type
class
- static infer_scope()[source]¶
Infer the scope of registry.
The name of the package where registry is defined will be returned.
Example
>>> # in mmdet/models/backbone/resnet.py >>> MODELS = Registry('models') >>> @MODELS.register_module() >>> class ResNet: >>> pass The scope of ``ResNet`` will be ``mmdet``.
- Returns
The inferred scope name.
- Return type
str
- register_module(name=None, force=False, module=None)[source]¶
Register a module.
A record will be added to self._module_dict, whose key is the class name or the specified name, and value is the class itself. It can be used as a decorator or a normal function.
Example
>>> backbones = Registry('backbone') >>> @backbones.register_module() >>> class ResNet: >>> pass
>>> backbones = Registry('backbone') >>> @backbones.register_module(name='mnet') >>> class MobileNet: >>> pass
>>> backbones = Registry('backbone') >>> class ResNet: >>> pass >>> backbones.register_module(ResNet)
- Parameters
name (str | None) – The module name to be registered. If not specified, the class name will be used.
force (bool, optional) – Whether to override an existing class with the same name. Default: False.
module (type) – Module class or function to be registered.
- static split_scope_key(key)[source]¶
Split scope and key.
The first scope will be split from key.
Examples
>>> Registry.split_scope_key('mmdet.ResNet') 'mmdet', 'ResNet' >>> Registry.split_scope_key('ResNet') None, 'ResNet'
- Returns
The former element is the first scope of the key, which can be None. The latter is the remaining key.
- Return type
tuple[str | None, str]
- class mmcv.utils.SyncBatchNorm(num_features: int, eps: float = 1e-05, momentum: float = 0.1, affine: bool = True, track_running_stats: bool = True, process_group: Optional[Any] = None, device=None, dtype=None)[source]¶
- class mmcv.utils.Timer(start=True, print_tmpl=None)[source]¶
A flexible Timer class.
Examples
>>> import time >>> import mmcv >>> with mmcv.Timer(): >>> # simulate a code block that will run for 1s >>> time.sleep(1) 1.000 >>> with mmcv.Timer(print_tmpl='it takes {:.1f} seconds'): >>> # simulate a code block that will run for 1s >>> time.sleep(1) it takes 1.0 seconds >>> timer = mmcv.Timer() >>> time.sleep(0.5) >>> print(timer.since_start()) 0.500 >>> time.sleep(0.5) >>> print(timer.since_last_check()) 0.500 >>> print(timer.since_start()) 1.000
- property is_running¶
indicate whether the timer is running
- Type
bool
- since_last_check()[source]¶
Time since the last checking.
Either since_start() or since_last_check() is a checking operation.
- Returns
Time in seconds.
- Return type
float
- mmcv.utils.assert_attrs_equal(obj: Any, expected_attrs: Dict[str, Any]) → bool[source]¶
Check if the attributes of a class object are correct.
- Parameters
obj (object) – Class object to be checked.
expected_attrs (Dict[str, Any]) – Dict of the expected attrs.
- Returns
Whether the attributes of the class object are correct.
- Return type
bool
- mmcv.utils.assert_dict_contains_subset(dict_obj: Dict[Any, Any], expected_subset: Dict[Any, Any]) → bool[source]¶
Check if the dict_obj contains the expected_subset.
- Parameters
dict_obj (Dict[Any, Any]) – Dict object to be checked.
expected_subset (Dict[Any, Any]) – Subset expected to be contained in dict_obj.
- Returns
Whether the dict_obj contains the expected_subset.
- Return type
bool
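Sketch (the dicts are illustrative):
>>> from mmcv.utils import assert_dict_contains_subset
>>> assert_dict_contains_subset({'a': 1, 'b': 2}, {'a': 1})
True
>>> assert_dict_contains_subset({'a': 1}, {'a': 2})
False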
- mmcv.utils.assert_dict_has_keys(obj: Dict[str, Any], expected_keys: List[str]) → bool[source]¶
Check if the obj has all the expected_keys.
- Parameters
obj (Dict[str, Any]) – Object to be checked.
expected_keys (List[str]) – Keys expected to be contained in the keys of the obj.
- Returns
Whether the obj has the expected keys.
- Return type
bool
- mmcv.utils.assert_is_norm_layer(module) → bool[source]¶
Check if the module is a norm layer.
- Parameters
module (nn.Module) – The module to be checked.
- Returns
Whether the module is a norm layer.
- Return type
bool
- mmcv.utils.assert_keys_equal(result_keys: List[str], target_keys: List[str]) → bool[source]¶
Check if target_keys is equal to result_keys.
- Parameters
result_keys (List[str]) – Result keys to be checked.
target_keys (List[str]) – Target keys to be checked.
- Returns
Whether target_keys is equal to result_keys.
- Return type
bool
- mmcv.utils.assert_params_all_zeros(module) → bool[source]¶
Check if the parameters of the module are all zeros.
- Parameters
module (nn.Module) – The module to be checked.
- Returns
Whether the parameters of the module are all zeros.
- Return type
bool
- mmcv.utils.build_from_cfg(cfg: Dict, registry: mmcv.utils.registry.Registry, default_args: Optional[Dict] = None) → Any[source]¶
Build a module from config dict when it is a class configuration, or call a function from config dict when it is a function configuration.
Example
>>> MODELS = Registry('models') >>> @MODELS.register_module() >>> class ResNet: >>> pass >>> resnet = build_from_cfg(dict(type='Resnet'), MODELS) >>> # Returns an instantiated object >>> @MODELS.register_module() >>> def resnet50(): >>> pass >>> resnet = build_from_cfg(dict(type='resnet50'), MODELS) >>> # Return a result of the calling function
- Parameters
cfg (dict) – Config dict. It should at least contain the key “type”.
registry (Registry) – The registry to search the type from.
default_args (dict, optional) – Default initialization arguments.
- Returns
The constructed object.
- Return type
object
- mmcv.utils.check_prerequisites(prerequisites, checker, msg_tmpl='Prerequisites "{}" are required in method "{}" but not found, please install them first.')[source]¶
A decorator factory to check if prerequisites are satisfied.
- Parameters
prerequisites (str or list[str]) – Prerequisites to be checked.
checker (callable) – The checker method that returns True if a prerequisite is met, False otherwise.
msg_tmpl (str) – The message template with two variables.
- Returns
A specific decorator.
- Return type
decorator
- mmcv.utils.check_python_script(cmd)[source]¶
Run the python cmd script with __main__. The difference from os.system is that this function executes code in the current process, so that it can be tracked by coverage tools. Currently it supports two forms:
./tests/data/scripts/hello.py zz
python tests/data/scripts/hello.py zz
- mmcv.utils.check_time(timer_id)[source]¶
Add check points in a single line.
This method is suitable for running a task on a list of items. A timer will be registered when the method is called for the first time.
Examples
>>> import time >>> import mmcv >>> for i in range(1, 6): >>> # simulate a code block >>> time.sleep(i) >>> mmcv.check_time('task1') 2.000 3.000 4.000 5.000
- Parameters
timer_id (str) – Timer identifier.
- mmcv.utils.collect_env()[source]¶
Collect the information of the running environments.
- Returns
The environment information. The following fields are contained.
sys.platform: The variable of sys.platform.
Python: Python version.
CUDA available: Bool, indicating if CUDA is available.
GPU devices: Device type of each GPU.
CUDA_HOME (optional): The env var CUDA_HOME.
NVCC (optional): NVCC version.
GCC: GCC version, “n/a” if GCC is not installed.
MSVC: Microsoft Visual C++ Compiler version, Windows only.
PyTorch: PyTorch version.
PyTorch compiling details: The output of torch.__config__.show().
TorchVision (optional): TorchVision version.
OpenCV: OpenCV version.
MMCV: MMCV version.
MMCV Compiler: The GCC version for compiling MMCV ops.
MMCV CUDA Compiler: The CUDA version for compiling MMCV ops.
- Return type
dict
- mmcv.utils.concat_list(in_list)[source]¶
Concatenate a list of lists into a single list.
- Parameters
in_list (list) – The list of lists to be merged.
- Returns
The concatenated flat list.
- Return type
list
- mmcv.utils.deprecated_api_warning(name_dict, cls_name=None)[source]¶
A decorator to check if some arguments are deprecated and try to replace the deprecated src_arg_name with dst_arg_name.
- Parameters
name_dict (dict) – key (str): Deprecated argument names. val (str): Expected argument names.
- Returns
New function.
- Return type
func
- mmcv.utils.digit_version(version_str: str, length: int = 4)[source]¶
Convert a version string into a tuple of integers.
This method is usually used for comparing two versions. For pre-release versions: alpha < beta < rc.
- Parameters
version_str (str) – The version string.
length (int) – The maximum number of version levels. Default: 4.
- Returns
The version info in digits (integers).
- Return type
tuple[int]
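Sketch of the intended use, comparing two version strings (the exact tuple layout may differ between mmcv versions, so only the comparisons are shown):
>>> from mmcv.utils import digit_version
>>> digit_version('1.3.16') < digit_version('1.3.17')
True
>>> digit_version('1.0rc1') < digit_version('1.0')  # pre-release sorts first
True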
- mmcv.utils.get_git_hash(fallback='unknown', digits=None)[source]¶
Get the git hash of the current repo.
- Parameters
fallback (str, optional) – The fallback string when git hash is unavailable. Defaults to ‘unknown’.
digits (int, optional) – kept digits of the hash. Defaults to None, meaning all digits are kept.
- Returns
Git commit hash.
- Return type
str
- mmcv.utils.get_logger(name, log_file=None, log_level=20, file_mode='w')[source]¶
Initialize and get a logger by name.
If the logger has not been initialized, this method will initialize the logger by adding one or two handlers, otherwise the initialized logger will be directly returned. During initialization, a StreamHandler will always be added. If log_file is specified and the process rank is 0, a FileHandler will also be added.
- Parameters
name (str) – Logger name.
log_file (str | None) – The log filename. If specified, a FileHandler will be added to the logger.
log_level (int) – The logger level. Note that only the process of rank 0 is affected, and other processes will set the level to “Error” and thus be silent most of the time.
file_mode (str) – The file mode used in opening log file. Defaults to ‘w’.
- Returns
The expected logger.
- Return type
logging.Logger
- mmcv.utils.has_method(obj: object, method: str) → bool[source]¶
Check whether the object has a method.
- Parameters
method (str) – The method name to check.
obj (object) – The object to check.
- Returns
True if the object has the method else False.
- Return type
bool
- mmcv.utils.import_modules_from_strings(imports, allow_failed_imports=False)[source]¶
Import modules from the given list of strings.
- Parameters
imports (list | str | None) – The given module names to be imported.
allow_failed_imports (bool) – If True, the failed imports will return None. Otherwise, an ImportError is raised. Default: False.
- Returns
The imported modules.
- Return type
list[module] | module | None
Examples
>>> osp, sys = import_modules_from_strings( ... ['os.path', 'sys']) >>> import os.path as osp_ >>> import sys as sys_ >>> assert osp == osp_ >>> assert sys == sys_
- mmcv.utils.is_list_of(seq, expected_type)[source]¶
Check whether it is a list of some type.
A partial method of is_seq_of().
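Sketch:
>>> from mmcv.utils import is_list_of
>>> is_list_of([1, 2, 3], int)
True
>>> is_list_of((1, 2, 3), int)  # a tuple is not a list
False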
- mmcv.utils.is_method_overridden(method, base_class, derived_class)[source]¶
Check if a method of base class is overridden in derived class.
- Parameters
method (str) – the method name to check.
base_class (type) – the class of the base class.
derived_class (type | Any) – the class or instance of the derived class.
- mmcv.utils.is_seq_of(seq, expected_type, seq_type=None)[source]¶
Check whether it is a sequence of some type.
- Parameters
seq (Sequence) – The sequence to be checked.
expected_type (type) – Expected type of sequence items.
seq_type (type, optional) – Expected sequence type.
- Returns
Whether the sequence is valid.
- Return type
bool
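Sketch:
>>> from mmcv.utils import is_seq_of
>>> is_seq_of([1, 2, 3], int)
True
>>> is_seq_of((1, 2, 3), int, seq_type=list)  # wrong container type
False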
- mmcv.utils.is_str(x)[source]¶
Whether the input is a string instance.
Note: This method is deprecated since python 2 is no longer supported.
- mmcv.utils.is_tuple_of(seq, expected_type)[source]¶
Check whether it is a tuple of some type.
A partial method of is_seq_of().
- mmcv.utils.iter_cast(inputs, dst_type, return_type=None)[source]¶
Cast elements of an iterable object into some type.
- Parameters
inputs (Iterable) – The input object.
dst_type (type) – Destination type.
return_type (type, optional) – If specified, the output object will be converted to this type, otherwise an iterator.
- Returns
The converted object.
- Return type
iterator or specified type
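Sketch:
>>> from mmcv.utils import iter_cast
>>> list(iter_cast(['1', '2', '3'], int))  # return_type=None yields an iterator
[1, 2, 3]
>>> iter_cast(['1', '2'], float, return_type=tuple)
(1.0, 2.0)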
- mmcv.utils.list_cast(inputs, dst_type)[source]¶
Cast elements of an iterable object into a list of some type.
A partial method of iter_cast().
- mmcv.utils.load_url(url: str, model_dir: Optional[str] = None, map_location: Optional[Union[Callable[[torch.Tensor, str], torch.Tensor], torch.device, str, Dict[str, str]]] = None, progress: bool = True, check_hash: bool = False, file_name: Optional[str] = None) → Dict[str, Any]¶
Loads the Torch serialized object at the given URL.
If the downloaded file is a zip file, it will be automatically decompressed.
If the object is already present in model_dir, it's deserialized and returned. The default value of model_dir is <hub_dir>/checkpoints where hub_dir is the directory returned by get_dir().
- Parameters
url (str) – URL of the object to download
model_dir (str, optional) – directory in which to save the object
map_location (optional) – a function or a dict specifying how to remap storage locations (see torch.load)
progress (bool, optional) – whether or not to display a progress bar to stderr. Default: True
check_hash (bool, optional) – If True, the filename part of the URL should follow the naming convention filename-<sha256>.ext where <sha256> is the first eight or more digits of the SHA256 hash of the contents of the file. The hash is used to ensure unique names and to verify the contents of the file. Default: False
file_name (str, optional) – name for the downloaded file. Filename from url will be used if not set.
Example
>>> state_dict = torch.hub.load_state_dict_from_url('https://s3.amazonaws.com/pytorch/models/resnet18-5c106cde.pth')
- mmcv.utils.print_log(msg, logger=None, level=20)[source]¶
Print a log message.
- Parameters
msg (str) – The message to be logged.
logger (logging.Logger | str | None) –
The logger to be used. Some special loggers are:
“silent”: no message will be printed.
other str: the logger obtained with get_root_logger(logger).
None: The print() method will be used to print log messages.
level (int) – Logging level. Only available when logger is a Logger object or “root”.
- mmcv.utils.requires_executable(prerequisites)[source]¶
A decorator to check if some executable files are installed.
Example
>>> @requires_executable('ffmpeg') >>> def func(arg1, args): >>> print(1) 1
- mmcv.utils.requires_package(prerequisites)[source]¶
A decorator to check if some python packages are installed.
Example
>>> @requires_package('numpy') >>> def func(arg1, args): >>> return numpy.zeros(1) array([0.]) >>> @requires_package(['numpy', 'non_package']) >>> def func(arg1, args): >>> return numpy.zeros(1) ImportError
- mmcv.utils.scandir(dir_path, suffix=None, recursive=False, case_sensitive=True)[source]¶
Scan a directory to find the interested files.
- Parameters
dir_path (str | Path) – Path of the directory.
suffix (str | tuple(str), optional) – File suffix that we are interested in. Default: None.
recursive (bool, optional) – If set to True, recursively scan the directory. Default: False.
case_sensitive (bool, optional) – If set to False, ignore the case of suffix. Default: True.
- Returns
A generator for all the interested files with relative paths.
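A minimal sketch (the directory and suffix are illustrative):
>>> from mmcv.utils import scandir
>>> for name in scandir('configs', suffix='.py', recursive=True):
...     print(name)  # paths relative to 'configs'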
- mmcv.utils.slice_list(in_list, lens)[source]¶
Slice a list into several sub-lists by a list of given lengths.
- Parameters
in_list (list) – The list to be sliced.
lens (int or list) – The expected length of each output list.
- Returns
A list of sliced lists.
- Return type
list
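Sketch:
>>> from mmcv.utils import slice_list
>>> slice_list([1, 2, 3, 4, 5, 6], [2, 4])  # lens must sum to len(in_list)
[[1, 2], [3, 4, 5, 6]]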
- mmcv.utils.torch_meshgrid(*tensors)[source]¶
A wrapper of torch.meshgrid to be compatible with different PyTorch versions.
Since PyTorch 1.10.0a0, torch.meshgrid supports the argument indexing. We implement a wrapper here to avoid warnings when using a newer PyTorch and to avoid compatibility issues when using older versions of PyTorch.
- Parameters
tensors (List[Tensor]) – List of scalars or 1 dimensional tensors.
- Returns
Sequence of meshgrid tensors.
- Return type
Sequence[Tensor]
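Sketch (assuming the ‘ij’ indexing behavior of torch.meshgrid):
>>> import torch
>>> from mmcv.utils import torch_meshgrid
>>> xs, ys = torch_meshgrid(torch.arange(2), torch.arange(3))
>>> xs.shape, ys.shape
(torch.Size([2, 3]), torch.Size([2, 3]))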
- mmcv.utils.track_iter_progress(tasks, bar_width=50, file=<_io.TextIOWrapper name='<stdout>' mode='w' encoding='UTF-8'>)[source]¶
Track the progress of tasks iteration or enumeration with a progress bar.
Tasks are yielded with a simple for-loop.
- Parameters
tasks (list or tuple[Iterable, int]) – A list of tasks or (tasks, total num).
bar_width (int) – Width of progress bar.
- Yields
list – The task results.
- mmcv.utils.track_parallel_progress(func, tasks, nproc, initializer=None, initargs=None, bar_width=50, chunksize=1, skip_first=False, keep_order=True, file=<_io.TextIOWrapper name='<stdout>' mode='w' encoding='UTF-8'>)[source]¶
Track the progress of parallel task execution with a progress bar.
The built-in multiprocessing module is used for process pools and tasks are done with Pool.map() or Pool.imap_unordered().
- Parameters
func (callable) – The function to be applied to each task.
tasks (list or tuple[Iterable, int]) – A list of tasks or (tasks, total num).
nproc (int) – Process (worker) number.
initializer (None or callable) – Refer to multiprocessing.Pool for details.
initargs (None or tuple) – Refer to multiprocessing.Pool for details.
chunksize (int) – Refer to multiprocessing.Pool for details.
bar_width (int) – Width of progress bar.
skip_first (bool) – Whether to skip the first sample for each worker when estimating fps, since the initialization step may take longer.
keep_order (bool) – If True, Pool.imap() is used, otherwise Pool.imap_unordered() is used.
- Returns
The task results.
- Return type
list
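A minimal sketch; square is a placeholder task function (with the spawn start method it must be importable from a module, not defined interactively):
>>> import mmcv
>>> def square(x):  # placeholder task function
...     return x * x
>>> results = mmcv.track_parallel_progress(square, list(range(100)), nproc=4)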
- mmcv.utils.track_progress(func, tasks, bar_width=50, file=<_io.TextIOWrapper name='<stdout>' mode='w' encoding='UTF-8'>, **kwargs)[source]¶
Track the progress of tasks execution with a progress bar.
Tasks are done with a simple for-loop.
- Parameters
func (callable) – The function to be applied to each task.
tasks (list or tuple[Iterable, int]) – A list of tasks or (tasks, total num).
bar_width (int) – Width of progress bar.
- Returns
The task results.
- Return type
list
- mmcv.utils.tuple_cast(inputs, dst_type)[source]¶
Cast elements of an iterable object into a tuple of some type.
A partial method of
iter_cast()
.
- mmcv.utils.worker_init_fn(worker_id: int, num_workers: int, rank: int, seed: int)[source]¶
Function to initialize each worker.
The seed of each worker equals num_workers * rank + worker_id + user_seed.
- Parameters
worker_id (int) – Id for each worker.
num_workers (int) – Number of workers.
rank (int) – Rank in distributed training.
seed (int) – Random seed.
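Sketch wiring it into a DataLoader via functools.partial (the rank and seed values are illustrative):
>>> from functools import partial
>>> from mmcv.utils import worker_init_fn
>>> init_fn = partial(worker_init_fn, num_workers=4, rank=0, seed=42)
>>> # worker 1 would then be seeded with 4 * 0 + 1 + 42 = 47
>>> # loader = DataLoader(dataset, num_workers=4, worker_init_fn=init_fn)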
cnn¶
- class mmcv.cnn.AlexNet(num_classes: int = -1)[source]¶
AlexNet backbone.
- Parameters
num_classes (int) – number of classes for classification.
- forward(x: torch.Tensor) → torch.Tensor[source]¶
Defines the computation performed at every call.
Should be overridden by all subclasses.
Note
Although the recipe for forward pass needs to be defined within this function, one should call the Module instance afterwards instead of this since the former takes care of running the registered hooks while the latter silently ignores them.
- class mmcv.cnn.ConstantInit(val: Union[int, float], **kwargs)[source]¶
Initialize module parameters with constant values.
- Parameters
val (int | float) – the value to fill the weights in the module with
bias (int | float) – the value to fill the bias. Defaults to 0.
bias_prob (float, optional) – the probability for bias initialization. Defaults to None.
layer (str | list[str], optional) – the layer will be initialized. Defaults to None.
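A minimal sketch (the layer value is illustrative):
>>> import torch.nn as nn
>>> from mmcv.cnn import ConstantInit
>>> model = nn.Conv2d(3, 8, 1)
>>> init = ConstantInit(val=1., bias=2., layer='Conv2d')
>>> init(model)  # weights become 1, biases become 2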
- class mmcv.cnn.ContextBlock(in_channels: int, ratio: float, pooling_type: str = 'att', fusion_types: tuple = ('channel_add'))[source]¶
ContextBlock module in GCNet.
See ‘GCNet: Non-local Networks Meet Squeeze-Excitation Networks and Beyond’ (https://arxiv.org/abs/1904.11492) for details.
- Parameters
in_channels (int) – Channels of the input feature map.
ratio (float) – Ratio of channels of transform bottleneck
pooling_type (str) – Pooling method for context modeling. Options are ‘att’ and ‘avg’, stand for attention pooling and average pooling respectively. Default: ‘att’.
fusion_types (Sequence[str]) – Fusion method for feature fusion. Options are ‘channel_add’ and ‘channel_mul’, standing for channel-wise addition and multiplication respectively. Default: (‘channel_add’,)
- forward(x: torch.Tensor) → torch.Tensor[source]¶
Defines the computation performed at every call.
Should be overridden by all subclasses.
Note
Although the recipe for forward pass needs to be defined within this function, one should call the Module instance afterwards instead of this since the former takes care of running the registered hooks while the latter silently ignores them.
- class mmcv.cnn.Conv2d(in_channels: int, out_channels: int, kernel_size: Union[int, Tuple[int, int]], stride: Union[int, Tuple[int, int]] = 1, padding: Union[str, int, Tuple[int, int]] = 0, dilation: Union[int, Tuple[int, int]] = 1, groups: int = 1, bias: bool = True, padding_mode: str = 'zeros', device=None, dtype=None)[source]¶
- forward(x: torch.Tensor) → torch.Tensor[source]¶
Defines the computation performed at every call.
Should be overridden by all subclasses.
Note
Although the recipe for forward pass needs to be defined within this function, one should call the Module instance afterwards instead of this since the former takes care of running the registered hooks while the latter silently ignores them.
- class mmcv.cnn.Conv2dRFSearchOp(op_layer: torch.nn.modules.module.Module, global_config: dict, verbose: bool = True)[source]¶
Enable Conv2d with receptive field searching ability.
- Parameters
op_layer (nn.Module) – pytorch module, e.g., Conv2d
global_config (dict) –
config dict. Defaults to None. By default this must include:
”init_alphas”: The value for initializing weights of each branch.
”num_branches”: The controller of the size of search space (the number of branches).
”exp_rate”: The controller of the sparsity of search space.
”mmin”: The minimum dilation rate.
”mmax”: The maximum dilation rate.
Extra keys may exist, but are used by RFSearchHook, e.g., “step”, “max_step”, “search_interval”, and “skip_layer”.
verbose (bool) – Determines whether to print rf-next related logging messages. Defaults to True.
- forward(input: torch.Tensor) → torch.Tensor[source]¶
Defines the computation performed at every call.
Should be overridden by all subclasses.
Note
Although the recipe for forward pass needs to be defined within this function, one should call the Module instance afterwards instead of this since the former takes care of running the registered hooks while the latter silently ignores them.
- class mmcv.cnn.Conv3d(in_channels: int, out_channels: int, kernel_size: Union[int, Tuple[int, int, int]], stride: Union[int, Tuple[int, int, int]] = 1, padding: Union[str, int, Tuple[int, int, int]] = 0, dilation: Union[int, Tuple[int, int, int]] = 1, groups: int = 1, bias: bool = True, padding_mode: str = 'zeros', device=None, dtype=None)[source]¶
- forward(x: torch.Tensor) → torch.Tensor[source]¶
Defines the computation performed at every call.
Should be overridden by all subclasses.
Note
Although the recipe for forward pass needs to be defined within this function, one should call the Module instance afterwards instead of this since the former takes care of running the registered hooks while the latter silently ignores them.
- class mmcv.cnn.ConvAWS2d(in_channels: int, out_channels: int, kernel_size: Union[int, Tuple[int, int]], stride: Union[int, Tuple[int, int]] = 1, padding: Union[int, Tuple[int, int]] = 0, dilation: Union[int, Tuple[int, int]] = 1, groups: int = 1, bias: bool = True)[source]¶
AWS (Adaptive Weight Standardization)
This is a variant of Weight Standardization (https://arxiv.org/pdf/1903.10520.pdf). It is used in DetectoRS to avoid NaN (https://arxiv.org/pdf/2006.02334.pdf).
- Parameters
in_channels (int) – Number of channels in the input image
out_channels (int) – Number of channels produced by the convolution
kernel_size (int or tuple) – Size of the conv kernel
stride (int or tuple, optional) – Stride of the convolution. Default: 1
padding (int or tuple, optional) – Zero-padding added to both sides of the input. Default: 0
dilation (int or tuple, optional) – Spacing between kernel elements. Default: 1
groups (int, optional) – Number of blocked connections from input channels to output channels. Default: 1
bias (bool, optional) – If set True, adds a learnable bias to the output. Default: True
- forward(x: torch.Tensor) → torch.Tensor[source]¶
Defines the computation performed at every call.
Should be overridden by all subclasses.
Note
Although the recipe for forward pass needs to be defined within this function, one should call the Module instance afterwards instead of this since the former takes care of running the registered hooks while the latter silently ignores them.
- class mmcv.cnn.ConvModule(in_channels: int, out_channels: int, kernel_size: Union[int, Tuple[int, int]], stride: Union[int, Tuple[int, int]] = 1, padding: Union[int, Tuple[int, int]] = 0, dilation: Union[int, Tuple[int, int]] = 1, groups: int = 1, bias: Union[bool, str] = 'auto', conv_cfg: Optional[Dict] = None, norm_cfg: Optional[Dict] = None, act_cfg: Optional[Dict] = {'type': 'ReLU'}, inplace: bool = True, with_spectral_norm: bool = False, padding_mode: str = 'zeros', order: tuple = ('conv', 'norm', 'act'))[source]¶
A conv block that bundles conv/norm/activation layers.
This block simplifies the usage of convolution layers, which are commonly used with a norm layer (e.g., BatchNorm) and activation layer (e.g., ReLU). It is based upon three build methods: build_conv_layer(), build_norm_layer() and build_activation_layer().
Besides, we add some additional features in this module:
1. Automatically set bias of the conv layer.
2. Spectral norm is supported.
3. More padding modes are supported. Before PyTorch 1.5, nn.Conv2d only supports zero and circular padding, and we add “reflect” padding mode.
- Parameters
in_channels (int) – Number of channels in the input feature map. Same as that in nn._ConvNd.
out_channels (int) – Number of channels produced by the convolution. Same as that in nn._ConvNd.
kernel_size (int | tuple[int]) – Size of the convolving kernel. Same as that in nn._ConvNd.
stride (int | tuple[int]) – Stride of the convolution. Same as that in nn._ConvNd.
padding (int | tuple[int]) – Zero-padding added to both sides of the input. Same as that in nn._ConvNd.
dilation (int | tuple[int]) – Spacing between kernel elements. Same as that in nn._ConvNd.
groups (int) – Number of blocked connections from input channels to output channels. Same as that in nn._ConvNd.
bias (bool | str) – If specified as “auto”, it will be decided by the norm_cfg. Bias will be set as True if norm_cfg is None, otherwise False. Default: “auto”.
conv_cfg (dict) – Config dict for convolution layer. Default: None, which means using conv2d.
norm_cfg (dict) – Config dict for normalization layer. Default: None.
act_cfg (dict) – Config dict for activation layer. Default: dict(type=’ReLU’).
inplace (bool) – Whether to use inplace mode for activation. Default: True.
with_spectral_norm (bool) – Whether to use spectral norm in conv module. Default: False.
padding_mode (str) – If the padding_mode has not been supported by current Conv2d in PyTorch, we will use our own padding layer instead. Currently, we support [‘zeros’, ‘circular’] with official implementation and [‘reflect’] with our own implementation. Default: ‘zeros’.
order (tuple[str]) – The order of conv/norm/activation layers. It is a sequence of “conv”, “norm” and “act”. Common examples are (“conv”, “norm”, “act”) and (“act”, “conv”, “norm”). Default: (‘conv’, ‘norm’, ‘act’).
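For illustration, a minimal conv-norm-act block (the channel sizes are illustrative):
>>> import torch
>>> from mmcv.cnn import ConvModule
>>> block = ConvModule(3, 8, 3, padding=1, norm_cfg=dict(type='BN'))
>>> x = torch.rand(1, 3, 32, 32)
>>> block(x).shape  # conv -> BN -> ReLU
torch.Size([1, 8, 32, 32])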
- forward(x: torch.Tensor, activate: bool = True, norm: bool = True) → torch.Tensor[source]¶
Defines the computation performed at every call.
Should be overridden by all subclasses.
Note
Although the recipe for forward pass needs to be defined within this function, one should call the Module instance afterwards instead of this since the former takes care of running the registered hooks while the latter silently ignores them.
- class mmcv.cnn.ConvTranspose2d(in_channels: int, out_channels: int, kernel_size: Union[int, Tuple[int, int]], stride: Union[int, Tuple[int, int]] = 1, padding: Union[int, Tuple[int, int]] = 0, output_padding: Union[int, Tuple[int, int]] = 0, groups: int = 1, bias: bool = True, dilation: Union[int, Tuple[int, int]] = 1, padding_mode: str = 'zeros', device=None, dtype=None)[source]¶
- forward(x: torch.Tensor) → torch.Tensor[source]¶
Defines the computation performed at every call.
Should be overridden by all subclasses.
Note
Although the recipe for forward pass needs to be defined within this function, one should call the Module instance afterwards instead of this since the former takes care of running the registered hooks while the latter silently ignores them.
- class mmcv.cnn.ConvTranspose3d(in_channels: int, out_channels: int, kernel_size: Union[int, Tuple[int, int, int]], stride: Union[int, Tuple[int, int, int]] = 1, padding: Union[int, Tuple[int, int, int]] = 0, output_padding: Union[int, Tuple[int, int, int]] = 0, groups: int = 1, bias: bool = True, dilation: Union[int, Tuple[int, int, int]] = 1, padding_mode: str = 'zeros', device=None, dtype=None)[source]¶
- forward(x: torch.Tensor) → torch.Tensor[source]¶
Defines the computation performed at every call.
Should be overridden by all subclasses.
Note
Although the recipe for forward pass needs to be defined within this function, one should call the Module instance afterwards instead of this since the former takes care of running the registered hooks while the latter silently ignores them.
- class mmcv.cnn.ConvWS2d(in_channels: int, out_channels: int, kernel_size: Union[int, Tuple[int, int]], stride: Union[int, Tuple[int, int]] = 1, padding: Union[int, Tuple[int, int]] = 0, dilation: Union[int, Tuple[int, int]] = 1, groups: int = 1, bias: bool = True, eps: float = 1e-05)[source]¶
- forward(x: torch.Tensor) → torch.Tensor[source]¶
Defines the computation performed at every call.
Should be overridden by all subclasses.
Note
Although the recipe for forward pass needs to be defined within this function, one should call the Module instance afterwards instead of this since the former takes care of running the registered hooks while the latter silently ignores them.
- class mmcv.cnn.DepthwiseSeparableConvModule(in_channels: int, out_channels: int, kernel_size: Union[int, Tuple[int, int]], stride: Union[int, Tuple[int, int]] = 1, padding: Union[int, Tuple[int, int]] = 0, dilation: Union[int, Tuple[int, int]] = 1, norm_cfg: Optional[Dict] = None, act_cfg: Dict = {'type': 'ReLU'}, dw_norm_cfg: Union[Dict, str] = 'default', dw_act_cfg: Union[Dict, str] = 'default', pw_norm_cfg: Union[Dict, str] = 'default', pw_act_cfg: Union[Dict, str] = 'default', **kwargs)[source]¶
Depthwise separable convolution module.
See https://arxiv.org/pdf/1704.04861.pdf for details.
This module can replace a ConvModule with the conv block replaced by two conv blocks: a depthwise conv block and a pointwise conv block. The depthwise conv block contains depthwise-conv/norm/activation layers. The pointwise conv block contains pointwise-conv/norm/activation layers. It should be noted that there will be norm/activation layers in the depthwise conv block if norm_cfg and act_cfg are specified.
- Parameters
in_channels (int) – Number of channels in the input feature map. Same as that in nn._ConvNd.
out_channels (int) – Number of channels produced by the convolution. Same as that in nn._ConvNd.
kernel_size (int | tuple[int]) – Size of the convolving kernel. Same as that in nn._ConvNd.
stride (int | tuple[int]) – Stride of the convolution. Same as that in nn._ConvNd. Default: 1.
padding (int | tuple[int]) – Zero-padding added to both sides of the input. Same as that in nn._ConvNd. Default: 0.
dilation (int | tuple[int]) – Spacing between kernel elements. Same as that in nn._ConvNd. Default: 1.
norm_cfg (dict) – Default norm config for both depthwise ConvModule and pointwise ConvModule. Default: None.
act_cfg (dict) – Default activation config for both depthwise ConvModule and pointwise ConvModule. Default: dict(type=’ReLU’).
dw_norm_cfg (dict) – Norm config of depthwise ConvModule. If it is ‘default’, it will be the same as norm_cfg. Default: ‘default’.
dw_act_cfg (dict) – Activation config of depthwise ConvModule. If it is ‘default’, it will be the same as act_cfg. Default: ‘default’.
pw_norm_cfg (dict) – Norm config of pointwise ConvModule. If it is ‘default’, it will be the same as norm_cfg. Default: ‘default’.
pw_act_cfg (dict) – Activation config of pointwise ConvModule. If it is ‘default’, it will be the same as act_cfg. Default: ‘default’.
kwargs (optional) – Other shared arguments for depthwise and pointwise ConvModule. See ConvModule for ref.
- forward(x: torch.Tensor) → torch.Tensor[source]¶
Defines the computation performed at every call.
Should be overridden by all subclasses.
Note
Although the recipe for forward pass needs to be defined within this function, one should call the Module instance afterwards instead of this since the former takes care of running the registered hooks while the latter silently ignores them.
- class mmcv.cnn.GeneralizedAttention(in_channels: int, spatial_range: int = -1, num_heads: int = 9, position_embedding_dim: int = -1, position_magnitude: int = 1, kv_stride: int = 2, q_stride: int = 1, attention_type: str = '1111')[source]¶
GeneralizedAttention module.
See ‘An Empirical Study of Spatial Attention Mechanisms in Deep Networks’ (https://arxiv.org/abs/1904.05873) for details.
- Parameters
in_channels (int) – Channels of the input feature map.
spatial_range (int) – The spatial range. -1 indicates no spatial range constraint. Default: -1.
num_heads (int) – The head number of empirical_attention module. Default: 9.
position_embedding_dim (int) – The position embedding dimension. Default: -1.
position_magnitude (int) – A multiplier acting on coord difference. Default: 1.
kv_stride (int) – The feature stride acting on key/value feature map. Default: 2.
q_stride (int) – The feature stride acting on query feature map. Default: 1.
attention_type (str) –
A binary indicator string for indicating which items in generalized empirical_attention module are used. Default: ‘1111’.
’1000’ indicates ‘query and key content’ (appr - appr) item,
’0100’ indicates ‘query content and relative position’ (appr - position) item,
’0010’ indicates ‘key content only’ (bias - appr) item,
’0001’ indicates ‘relative position only’ (bias - position) item.
- forward(x_input: torch.Tensor) → torch.Tensor[source]¶
Defines the computation performed at every call.
Should be overridden by all subclasses.
Note
Although the recipe for forward pass needs to be defined within this function, one should call the Module instance afterwards instead of this since the former takes care of running the registered hooks while the latter silently ignores them.
- class mmcv.cnn.HSigmoid(bias: float = 3.0, divisor: float = 6.0, min_value: float = 0.0, max_value: float = 1.0)[source]¶
Hard Sigmoid Module. Apply the hard sigmoid function:
Hsigmoid(x) = min(max((x + bias) / divisor, min_value), max_value)
Default: Hsigmoid(x) = min(max((x + 3) / 6, 0), 1)
Note
In MMCV v1.4.4, we modified the default value of args to align with PyTorch official.
- Parameters
bias (float) – Bias of the input feature map. Default: 3.0.
divisor (float) – Divisor of the input feature map. Default: 6.0.
min_value (float) – Lower bound value. Default: 0.0.
max_value (float) – Upper bound value. Default: 1.0.
- Returns
The output tensor.
- Return type
Tensor
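Sketch checking the default form min(max((x + 3) / 6, 0), 1):
>>> import torch
>>> from mmcv.cnn import HSigmoid
>>> m = HSigmoid()
>>> out = m(torch.tensor([-3., 0., 3.]))  # -> [0.0, 0.5, 1.0]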
- forward(x: torch.Tensor) → torch.Tensor[source]¶
Defines the computation performed at every call.
Should be overridden by all subclasses.
Note
Although the recipe for forward pass needs to be defined within this function, one should call the Module instance afterwards instead of this since the former takes care of running the registered hooks while the latter silently ignores them.
- class mmcv.cnn.HSwish(inplace: bool = False)[source]¶
Hard Swish Module.
This module applies the hard swish function:
\[Hswish(x) = x * ReLU6(x + 3) / 6\]
- Parameters
inplace (bool) – can optionally do the operation in-place. Default: False.
- Returns
The output tensor.
- Return type
Tensor
- forward(x: torch.Tensor) → torch.Tensor[source]¶
Defines the computation performed at every call.
Should be overridden by all subclasses.
Note
Although the recipe for forward pass needs to be defined within this function, one should call the Module instance afterwards instead of this since the former takes care of running the registered hooks while the latter silently ignores them.
- class mmcv.cnn.KaimingInit(a: float = 0, mode: str = 'fan_out', nonlinearity: str = 'relu', distribution: str = 'normal', **kwargs)[source]¶
Initialize module parameters with the values according to the method described in Delving deep into rectifiers: Surpassing human-level performance on ImageNet classification - He, K. et al. (2015) (https://www.cv-foundation.org/openaccess/content_iccv_2015/papers/He_Delving_Deep_into_ICCV_2015_paper.pdf).
- Parameters
a (int | float) – the negative slope of the rectifier used after this layer (only used with 'leaky_relu'). Defaults to 0.
mode (str) – either 'fan_in' or 'fan_out'. Choosing 'fan_in' preserves the magnitude of the variance of the weights in the forward pass. Choosing 'fan_out' preserves the magnitudes in the backwards pass. Defaults to 'fan_out'.
nonlinearity (str) – the non-linear function (nn.functional name), recommended to use only with 'relu' or 'leaky_relu'. Defaults to 'relu'.
bias (int | float) – the value to fill the bias. Defaults to 0.
bias_prob (float, optional) – the probability for bias initialization. Defaults to None.
distribution (str) – the distribution, either 'normal' or 'uniform'. Defaults to 'normal'.
layer (str | list[str], optional) – the layer will be initialized. Defaults to None.
- class mmcv.cnn.Linear(in_features: int, out_features: int, bias: bool = True, device=None, dtype=None)[source]¶
- forward(x: torch.Tensor) → torch.Tensor[source]¶
Defines the computation performed at every call.
Should be overridden by all subclasses.
Note
Although the recipe for forward pass needs to be defined within this function, one should call the Module instance afterwards instead of this since the former takes care of running the registered hooks while the latter silently ignores them.
- class mmcv.cnn.MaxPool2d(kernel_size: Union[int, Tuple[int, ...]], stride: Optional[Union[int, Tuple[int, ...]]] = None, padding: Union[int, Tuple[int, ...]] = 0, dilation: Union[int, Tuple[int, ...]] = 1, return_indices: bool = False, ceil_mode: bool = False)[source]¶
- forward(x: torch.Tensor) → torch.Tensor[source]¶
Defines the computation performed at every call.
Should be overridden by all subclasses.
Note
Although the recipe for forward pass needs to be defined within this function, one should call the Module instance afterwards instead of this since the former takes care of running the registered hooks while the latter silently ignores them.
- class mmcv.cnn.MaxPool3d(kernel_size: Union[int, Tuple[int, ...]], stride: Optional[Union[int, Tuple[int, ...]]] = None, padding: Union[int, Tuple[int, ...]] = 0, dilation: Union[int, Tuple[int, ...]] = 1, return_indices: bool = False, ceil_mode: bool = False)[source]¶
- forward(x: torch.Tensor) → torch.Tensor[source]¶
Defines the computation performed at every call.
Should be overridden by all subclasses.
Note
Although the recipe for forward pass needs to be defined within this function, one should call the Module instance afterwards instead of this since the former takes care of running the registered hooks while the latter silently ignores them.
- class mmcv.cnn.NonLocal1d(in_channels: int, sub_sample: bool = False, conv_cfg: Dict = {'type': 'Conv1d'}, **kwargs)[source]¶
1D Non-local module.
- Parameters
in_channels (int) – Same as NonLocalND.
sub_sample (bool) – Whether to apply max pooling after pairwise function (Note that the sub_sample is applied on spatial only). Default: False.
conv_cfg (None | dict) – Same as NonLocalND. Default: dict(type=’Conv1d’).
- class mmcv.cnn.NonLocal2d(in_channels: int, sub_sample: bool = False, conv_cfg: Dict = {'type': 'Conv2d'}, **kwargs)[source]¶
2D Non-local module.
- Parameters
in_channels (int) – Same as NonLocalND.
sub_sample (bool) – Whether to apply max pooling after pairwise function (Note that the sub_sample is applied on spatial only). Default: False.
conv_cfg (None | dict) – Same as NonLocalND. Default: dict(type=’Conv2d’).
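Example
A minimal forward sketch; the shapes are illustrative and the block preserves the input shape:
>>> import torch
>>> from mmcv.cnn import NonLocal2d
>>> block = NonLocal2d(in_channels=16)
>>> x = torch.randn(2, 16, 8, 8)
>>> out = block(x)  # same shape as x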
- class mmcv.cnn.NonLocal3d(in_channels: int, sub_sample: bool = False, conv_cfg: Dict = {'type': 'Conv3d'}, **kwargs)[source]¶
3D Non-local module.
- Parameters
in_channels (int) – Same as NonLocalND.
sub_sample (bool) – Whether to apply max pooling after pairwise function (Note that the sub_sample is applied on spatial only). Default: False.
conv_cfg (None | dict) – Same as NonLocalND. Default: dict(type=’Conv3d’).
- class mmcv.cnn.NormalInit(mean: float = 0, std: float = 1, **kwargs)[source]¶
Initialize module parameters with the values drawn from the normal distribution \(\mathcal{N}(\text{mean}, \text{std}^2)\).
- Parameters
mean (int | float) – the mean of the normal distribution. Defaults to 0.
std (int | float) – the standard deviation of the normal distribution. Defaults to 1.
bias (int | float) – the value to fill the bias. Defaults to 0.
bias_prob (float, optional) – the probability for bias initialization. Defaults to None.
layer (str | list[str], optional) – the layer will be initialized. Defaults to None.
- class mmcv.cnn.PretrainedInit(checkpoint: str, prefix: Optional[str] = None, map_location: Optional[str] = None)[source]¶
Initialize module by loading a pretrained model.
- Parameters
checkpoint (str) – the checkpoint file of the pretrained model to be loaded.
prefix (str, optional) – the prefix of a sub-module in the pretrained model. It is for loading a part of the pretrained model to initialize. For example, if we would like to only load the backbone of a detector model, we can set
prefix='backbone.'
. Defaults to None.map_location (str) – map tensors into proper locations.
- class mmcv.cnn.RFSearchHook(mode: str = 'search', config: Dict = {}, rfstructure_file: Optional[str] = None, by_epoch: bool = True, verbose: bool = True)[source]¶
Receptive field search via dilation rates.
Please refer to RF-Next: Efficient Receptive Field Search for Convolutional Neural Networks for more details.
- Parameters
mode (str, optional) – It can be set to the following types: ‘search’, ‘fixed_single_branch’, or ‘fixed_multi_branch’. Defaults to ‘search’.
config (Dict, optional) –
config dict of search. By default this config contains “search”, and config[“search”] must include:
”step”: recording the current searching step.
”max_step”: The maximum number of searching steps to update the structures.
”search_interval”: The interval (epoch/iteration) between two updates.
”exp_rate”: The controller of the sparsity of search space.
”init_alphas”: The value for initializing weights of each branch.
”mmin”: The minimum dilation rate.
”mmax”: The maximum dilation rate.
”num_branches”: The controller of the size of search space (the number of branches).
”skip_layer”: The modules in skip_layer will be ignored during the receptive field search.
rfstructure_file (str, optional) – Path to load searched receptive fields of the model. Defaults to None.
by_epoch (bool, optional) – Determines whether to perform the step by epoch or by iteration. If set to True, it will step by epoch. Otherwise, by iteration. Defaults to True.
verbose (bool) – Determines whether to print rf-next related logging messages. Defaults to True.
- estimate_and_expand(model: torch.nn.modules.module.Module)[source]¶
Estimate and search for RFConvOp.
- Parameters
model (nn.Module) – pytorch model
- init_model(model: torch.nn.modules.module.Module)[source]¶
Initialize the model with search ability.
- Parameters
model (nn.Module) – pytorch model
- Raises
NotImplementedError – only support three modes: search/fixed_single_branch/fixed_multi_branch
- set_model(model: torch.nn.modules.module.Module, search_op: str = 'Conv2d', init_rates: Optional[int] = None, prefix: str = '')[source]¶
Set up the model based on the config.
- Parameters
model (nn.Module) – pytorch model
search_op (str) – The module that uses RF search. Defaults to ‘Conv2d’.
init_rates (int, optional) – Set to other initial dilation rates. Defaults to None.
prefix (str) – Prefix for function recursion. Defaults to ‘’.
- step(model: torch.nn.modules.module.Module, work_dir: str)[source]¶
Performs a dilation searching step.
- Parameters
model (nn.Module) – pytorch model
work_dir (str) – Directory to save the searching results.
- wrap_model(model: torch.nn.modules.module.Module, search_op: str = 'Conv2d', prefix: str = '')[source]¶
Wrap the model to support a searchable conv op.
- Parameters
model (nn.Module) – pytorch model
search_op (str) – The module that uses RF search. Defaults to ‘Conv2d’.
prefix (str) – Prefix for function recursion. Defaults to ‘’.
- class mmcv.cnn.ResNet(depth: int, num_stages: int = 4, strides: Sequence[int] = (1, 2, 2, 2), dilations: Sequence[int] = (1, 1, 1, 1), out_indices: Sequence[int] = (0, 1, 2, 3), style: str = 'pytorch', frozen_stages: int = - 1, bn_eval: bool = True, bn_frozen: bool = False, with_cp: bool = False)[source]¶
ResNet backbone.
- Parameters
depth (int) – Depth of resnet, from {18, 34, 50, 101, 152}.
num_stages (int) – Resnet stages, normally 4.
strides (Sequence[int]) – Strides of the first block of each stage.
dilations (Sequence[int]) – Dilation of each stage.
out_indices (Sequence[int]) – Output from which stages.
style (str) – pytorch or caffe. If set to “pytorch”, the stride-two layer is the 3x3 conv layer, otherwise the stride-two layer is the first 1x1 conv layer.
frozen_stages (int) – Stages to be frozen (all param fixed). -1 means not freezing any parameters.
bn_eval (bool) – Whether to set BN layers as eval mode, namely, freeze running stats (mean and var).
bn_frozen (bool) – Whether to freeze weight and bias of BN layers.
with_cp (bool) – Use checkpoint or not. Using checkpoint will save some memory while slowing down the training speed.
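Example
A minimal forward sketch with randomly initialized weights; the input size is illustrative:
>>> import torch
>>> from mmcv.cnn import ResNet
>>> model = ResNet(depth=18)
>>> feats = model(torch.randn(1, 3, 64, 64))  # tuple with one feature map per stage in out_indices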
- forward(x: torch.Tensor) → Union[torch.Tensor, Tuple[torch.Tensor]][source]¶
Defines the computation performed at every call.
Should be overridden by all subclasses.
Note
Although the recipe for forward pass needs to be defined within this function, one should call the
Module
instance afterwards instead of this since the former takes care of running the registered hooks while the latter silently ignores them.
- train(mode: bool = True) → None[source]¶
Sets the module in training mode.
This has an effect only on certain modules. See the documentation of particular modules for details of their behavior in training/evaluation mode, if they are affected, e.g.
Dropout
,BatchNorm
, etc.- Parameters
mode (bool) – whether to set training mode (
True
) or evaluation mode (False
). Default:True
.- Returns
self
- Return type
Module
- class mmcv.cnn.Scale(scale: float = 1.0)[source]¶
A learnable scale parameter.
This layer scales the input by a learnable factor. It multiplies a learnable scale parameter of shape (1,) with input of any shape.
- Parameters
scale (float) – Initial value of scale factor. Default: 1.0
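Example
A minimal sketch; the scale parameter is learnable and broadcast over any input shape:
>>> import torch
>>> from mmcv.cnn import Scale
>>> scale = Scale(scale=1.0)
>>> out = scale(torch.randn(2, 3))  # input multiplied by a learnable scalar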
- forward(x: torch.Tensor) → torch.Tensor[source]¶
Defines the computation performed at every call.
Should be overridden by all subclasses.
Note
Although the recipe for forward pass needs to be defined within this function, one should call the
Module
instance afterwards instead of this since the former takes care of running the registered hooks while the latter silently ignores them.
- class mmcv.cnn.Swish[source]¶
Swish Module.
This module applies the swish function:
\[Swish(x) = x * Sigmoid(x)\]- Returns
The output tensor.
- Return type
Tensor
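Example
A minimal sketch:
>>> import torch
>>> from mmcv.cnn import Swish
>>> act = Swish()
>>> y = act(torch.randn(4))  # equivalent to x * torch.sigmoid(x)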
- forward(x: torch.Tensor) → torch.Tensor[source]¶
Defines the computation performed at every call.
Should be overridden by all subclasses.
Note
Although the recipe for forward pass needs to be defined within this function, one should call the
Module
instance afterwards instead of this since the former takes care of running the registered hooks while the latter silently ignores them.
- class mmcv.cnn.TruncNormalInit(mean: float = 0, std: float = 1, a: float = - 2, b: float = 2, **kwargs)[source]¶
Initialize module parameters with the values drawn from the normal distribution \(\mathcal{N}(\text{mean}, \text{std}^2)\), with values outside \([a, b]\) redrawn until they are within the bounds.
- Parameters
mean (float) – the mean of the normal distribution. Defaults to 0.
std (float) – the standard deviation of the normal distribution. Defaults to 1.
a (float) – The minimum cutoff value.
b (float) – The maximum cutoff value.
bias (float) – the value to fill the bias. Defaults to 0.
bias_prob (float, optional) – the probability for bias initialization. Defaults to None.
layer (str | list[str], optional) – the layer will be initialized. Defaults to None.
- class mmcv.cnn.UniformInit(a: float = 0.0, b: float = 1.0, **kwargs)[source]¶
Initialize module parameters with values drawn from the uniform distribution \(\mathcal{U}(a, b)\).
- Parameters
a (int | float) – the lower bound of the uniform distribution. Defaults to 0.
b (int | float) – the upper bound of the uniform distribution. Defaults to 1.
bias (int | float) – the value to fill the bias. Defaults to 0.
bias_prob (float, optional) – the probability for bias initialization. Defaults to None.
layer (str | list[str], optional) – the layer will be initialized. Defaults to None.
- class mmcv.cnn.VGG(depth: int, with_bn: bool = False, num_classes: int = - 1, num_stages: int = 5, dilations: Sequence[int] = (1, 1, 1, 1, 1), out_indices: Sequence[int] = (0, 1, 2, 3, 4), frozen_stages: int = - 1, bn_eval: bool = True, bn_frozen: bool = False, ceil_mode: bool = False, with_last_pool: bool = True)[source]¶
VGG backbone.
- Parameters
depth (int) – Depth of vgg, from {11, 13, 16, 19}.
with_bn (bool) – Use BatchNorm or not.
num_classes (int) – number of classes for classification.
num_stages (int) – VGG stages, normally 5.
dilations (Sequence[int]) – Dilation of each stage.
out_indices (Sequence[int]) – Output from which stages.
frozen_stages (int) – Stages to be frozen (all param fixed). -1 means not freezing any parameters.
bn_eval (bool) – Whether to set BN layers as eval mode, namely, freeze running stats (mean and var).
bn_frozen (bool) – Whether to freeze weight and bias of BN layers.
- forward(x: torch.Tensor) → Union[torch.Tensor, Tuple[torch.Tensor, ...]][source]¶
Defines the computation performed at every call.
Should be overridden by all subclasses.
Note
Although the recipe for forward pass needs to be defined within this function, one should call the
Module
instance afterwards instead of this since the former takes care of running the registered hooks while the latter silently ignores them.
- train(mode: bool = True) → None[source]¶
Sets the module in training mode.
This has an effect only on certain modules. See the documentation of particular modules for details of their behavior in training/evaluation mode, if they are affected, e.g.
Dropout
,BatchNorm
, etc.- Parameters
mode (bool) – whether to set training mode (
True
) or evaluation mode (False
). Default:True
.- Returns
self
- Return type
Module
- class mmcv.cnn.XavierInit(gain: float = 1, distribution: str = 'normal', **kwargs)[source]¶
Initialize module parameters with values according to the method described in `Understanding the difficulty of training deep feedforward neural networks - Glorot, X. & Bengio, Y. (2010). <http://proceedings.mlr.press/v9/glorot10a/glorot10a.pdf>`_
- Parameters
gain (int | float) – an optional scaling factor. Defaults to 1.
bias (int | float) – the value to fill the bias. Defaults to 0.
bias_prob (float, optional) – the probability for bias initialization. Defaults to None.
distribution (str) – distribution either be
'normal'
or'uniform'
. Defaults to'normal'
.layer (str | list[str], optional) – the layer will be initialized. Defaults to None.
- mmcv.cnn.bias_init_with_prob(prior_prob: float) → float[source]¶
Initialize conv/fc bias value according to a given probability value.
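The returned bias b satisfies sigmoid(b) = prior_prob, i.e. b = -log((1 - prior_prob) / prior_prob); this is commonly used for focal-loss-style classification heads. For example:
>>> from mmcv.cnn import bias_init_with_prob
>>> bias = bias_init_with_prob(0.01)  # ≈ -4.595, so that sigmoid(bias) == 0.01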
- mmcv.cnn.build_activation_layer(cfg: Dict) → torch.nn.modules.module.Module[source]¶
Build activation layer.
- Parameters
cfg (dict) –
The activation layer config, which should contain:
type (str): Layer type.
layer args: Args needed to instantiate an activation layer.
- Returns
Created activation layer.
- Return type
nn.Module
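Example
For instance, the following builds an nn.ReLU:
>>> from mmcv.cnn import build_activation_layer
>>> act = build_activation_layer(dict(type='ReLU', inplace=True))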
- mmcv.cnn.build_conv_layer(cfg: Optional[Dict], *args, **kwargs) → torch.nn.modules.module.Module[source]¶
Build convolution layer.
- Parameters
cfg (None or dict) – The conv layer config, which should contain: - type (str): Layer type. - layer args: Args needed to instantiate a conv layer.
args (argument list) – Arguments passed to the __init__ method of the corresponding conv layer.
kwargs (keyword arguments) – Keyword arguments passed to the __init__ method of the corresponding conv layer.
- Returns
Created conv layer.
- Return type
nn.Module
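Example
A minimal sketch; passing None as cfg falls back to a plain nn.Conv2d:
>>> from mmcv.cnn import build_conv_layer
>>> conv = build_conv_layer(dict(type='Conv2d'), 3, 8, kernel_size=3)
>>> conv = build_conv_layer(None, 3, 8, kernel_size=3)  # defaults to nn.Conv2d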
- mmcv.cnn.build_model_from_cfg(cfg, registry, default_args=None)[source]¶
Build a PyTorch model from config dict(s). Different from
build_from_cfg
, if cfg is a list, ann.Sequential
will be built.- Parameters
cfg (dict, list[dict]) – The config of modules; it is either a config dict or a list of config dicts. If cfg is a list, the built modules will be wrapped with
nn.Sequential
.registry (
Registry
) – A registry the module belongs to.default_args (dict, optional) – Default arguments to build the module. Defaults to None.
- Returns
A built nn module.
- Return type
nn.Module
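Example
A sketch with a toy registry; ToyNet and MODELS are hypothetical names defined only for illustration:
>>> import torch.nn as nn
>>> from mmcv.cnn import build_model_from_cfg
>>> from mmcv.utils import Registry
>>> MODELS = Registry('model', build_func=build_model_from_cfg)
>>> @MODELS.register_module()
... class ToyNet(nn.Module):
...     def __init__(self, depth):
...         super().__init__()
...         self.depth = depth
>>> net = MODELS.build(dict(type='ToyNet', depth=2))
>>> seq = MODELS.build([dict(type='ToyNet', depth=2),
...                     dict(type='ToyNet', depth=4)])  # wrapped in nn.Sequential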
- mmcv.cnn.build_norm_layer(cfg: Dict, num_features: int, postfix: Union[int, str] = '') → Tuple[str, torch.nn.modules.module.Module][source]¶
Build normalization layer.
- Parameters
cfg (dict) –
The norm layer config, which should contain:
type (str): Layer type.
layer args: Args needed to instantiate a norm layer.
requires_grad (bool, optional): Whether to stop gradient updates.
num_features (int) – Number of input channels.
postfix (int | str) – The postfix to be appended into norm abbreviation to create named layer.
- Returns
The first element is the layer name consisting of abbreviation and postfix, e.g., bn1, gn. The second element is the created norm layer.
- Return type
tuple[str, nn.Module]
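Example
For instance:
>>> from mmcv.cnn import build_norm_layer
>>> name, bn = build_norm_layer(dict(type='BN'), 64)  # name == 'bn'
>>> name, gn = build_norm_layer(dict(type='GN', num_groups=8), 64, postfix=1)  # name == 'gn1'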
- mmcv.cnn.build_padding_layer(cfg: Dict, *args, **kwargs) → torch.nn.modules.module.Module[source]¶
Build padding layer.
- Parameters
cfg (dict) – The padding layer config, which should contain: - type (str): Layer type. - layer args: Args needed to instantiate a padding layer.
- Returns
Created padding layer.
- Return type
nn.Module
- mmcv.cnn.build_plugin_layer(cfg: Dict, postfix: Union[int, str] = '', **kwargs) → Tuple[str, torch.nn.modules.module.Module][source]¶
Build plugin layer.
- Parameters
cfg (dict) –
cfg should contain:
type (str): identify plugin layer type.
layer args: args needed to instantiate a plugin layer.
postfix (int, str) – appended into norm abbreviation to create named layer. Default: ‘’.
- Returns
The first one is the concatenation of abbreviation and postfix. The second is the created plugin layer.
- Return type
tuple[str, nn.Module]
- mmcv.cnn.build_upsample_layer(cfg: Dict, *args, **kwargs) → torch.nn.modules.module.Module[source]¶
Build upsample layer.
- Parameters
cfg (dict) –
The upsample layer config, which should contain:
type (str): Layer type.
scale_factor (int): Upsample ratio, which is not applicable to deconv.
layer args: Args needed to instantiate an upsample layer.
args (argument list) – Arguments passed to the
__init__
method of the corresponding upsample layer.kwargs (keyword arguments) – Keyword arguments passed to the
__init__
method of the corresponding upsample layer.
- Returns
Created upsample layer.
- Return type
nn.Module
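Example
A minimal sketch; 'nearest' maps to nn.Upsample and 'deconv' to nn.ConvTranspose2d:
>>> from mmcv.cnn import build_upsample_layer
>>> up = build_upsample_layer(dict(type='nearest', scale_factor=2))
>>> deconv = build_upsample_layer(
...     dict(type='deconv', in_channels=8, out_channels=8, kernel_size=2, stride=2))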
- mmcv.cnn.fuse_conv_bn(module: torch.nn.modules.module.Module) → torch.nn.modules.module.Module[source]¶
Recursively fuse conv and bn in a module.
During inference, the functionality of batch norm layers is turned off; only the per-channel mean and var are used, which exposes the chance to fuse it with the preceding conv layers to save computation and simplify network structures.
- Parameters
module (nn.Module) – Module to be fused.
- Returns
Fused module.
- Return type
nn.Module
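Example
A minimal sketch; fusion is typically applied to a model in eval mode before deployment:
>>> import torch.nn as nn
>>> from mmcv.cnn import fuse_conv_bn
>>> model = nn.Sequential(nn.Conv2d(3, 8, 3), nn.BatchNorm2d(8), nn.ReLU())
>>> fused = fuse_conv_bn(model.eval())  # BN statistics folded into the conv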
- mmcv.cnn.get_model_complexity_info(model: torch.nn.modules.module.Module, input_shape: tuple, print_per_layer_stat: bool = True, as_strings: bool = True, input_constructor: Optional[Callable] = None, flush: bool = False, ost: TextIO = <_io.TextIOWrapper name='<stdout>' mode='w' encoding='UTF-8'>) → tuple[source]¶
Get complexity information of a model.
This method can calculate FLOPs and parameter counts of a model with corresponding input shape. It can also print complexity information for each layer in a model.
- Supported layers are listed as below:
Convolutions:
nn.Conv1d
,nn.Conv2d
,nn.Conv3d
.Activations:
nn.ReLU
,nn.PReLU
,nn.ELU
,nn.LeakyReLU
,nn.ReLU6
.Poolings:
nn.MaxPool1d
,nn.MaxPool2d
,nn.MaxPool3d
,nn.AvgPool1d
,nn.AvgPool2d
,nn.AvgPool3d
,nn.AdaptiveMaxPool1d
,nn.AdaptiveMaxPool2d
,nn.AdaptiveMaxPool3d
,nn.AdaptiveAvgPool1d
,nn.AdaptiveAvgPool2d
,nn.AdaptiveAvgPool3d
.BatchNorms:
nn.BatchNorm1d
,nn.BatchNorm2d
,nn.BatchNorm3d
,nn.GroupNorm
,nn.InstanceNorm1d
,InstanceNorm2d
,InstanceNorm3d
,nn.LayerNorm
.Linear:
nn.Linear
.Deconvolution:
nn.ConvTranspose2d
.Upsample:
nn.Upsample
.
- Parameters
model (nn.Module) – The model for complexity calculation.
input_shape (tuple) – Input shape used for calculation.
print_per_layer_stat (bool) – Whether to print complexity information for each layer in a model. Default: True.
as_strings (bool) – Output FLOPs and params counts in a string form. Default: True.
input_constructor (None | callable) – If specified, it takes a callable method that generates input. Otherwise, it will generate a random tensor with input shape to calculate FLOPs. Default: None.
flush (bool) – same as that in
print()
. Default: False.ost (stream) – same as
file
param inprint()
. Default: sys.stdout.
- Returns
If
as_strings
is set to True, it will return FLOPs and parameter counts in a string format. Otherwise, it will return those in a float number format.- Return type
tuple[float | str]
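Example
A minimal sketch; with as_strings=True (the default) both values are returned as human-readable strings:
>>> import torch.nn as nn
>>> from mmcv.cnn import get_model_complexity_info
>>> model = nn.Sequential(nn.Conv2d(3, 8, 3, padding=1), nn.ReLU())
>>> flops, params = get_model_complexity_info(
...     model, (3, 32, 32), print_per_layer_stat=False)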
- mmcv.cnn.initialize(module: torch.nn.modules.module.Module, init_cfg: Union[Dict, List[dict]]) → None[source]¶
Initialize a module.
- Parameters
module (
torch.nn.Module
) – the module will be initialized.init_cfg (dict | list[dict]) – initialization configuration dict to define initializer. OpenMMLab has implemented 6 initializers including
Constant
,Xavier
,Normal
,Uniform
,Kaiming
, andPretrained
.
Example
>>> module = nn.Linear(2, 3, bias=True) >>> init_cfg = dict(type='Constant', layer='Linear', val =1 , bias =2) >>> initialize(module, init_cfg)
>>> module = nn.Sequential(nn.Conv1d(3, 1, 3), nn.Linear(1,2)) >>> # define key ``'layer'`` for initializing layer with different >>> # configuration >>> init_cfg = [dict(type='Constant', layer='Conv1d', val=1), dict(type='Constant', layer='Linear', val=2)] >>> initialize(module, init_cfg)
>>> # define key``'override'`` to initialize some specific part in >>> # module >>> class FooNet(nn.Module): >>> def __init__(self): >>> super().__init__() >>> self.feat = nn.Conv2d(3, 16, 3) >>> self.reg = nn.Conv2d(16, 10, 3) >>> self.cls = nn.Conv2d(16, 5, 3) >>> model = FooNet() >>> init_cfg = dict(type='Constant', val=1, bias=2, layer='Conv2d', >>> override=dict(type='Constant', name='reg', val=3, bias=4)) >>> initialize(model, init_cfg)
>>> model = ResNet(depth=50) >>> # Initialize weights with the pretrained model. >>> init_cfg = dict(type='Pretrained', checkpoint='torchvision://resnet50') >>> initialize(model, init_cfg)
>>> # Initialize weights of a sub-module with the specific part of >>> # a pretrained model by using "prefix". >>> url = 'http://download.openmmlab.com/mmdetection/v2.0/retinanet/'\ >>> 'retinanet_r50_fpn_1x_coco/'\ >>> 'retinanet_r50_fpn_1x_coco_20200130-c2398f9e.pth' >>> init_cfg = dict(type='Pretrained', checkpoint=url, prefix='backbone.')
- mmcv.cnn.is_norm(layer: torch.nn.modules.module.Module, exclude: Optional[Union[type, tuple]] = None) → bool[source]¶
Check if a layer is a normalization layer.
- Parameters
layer (nn.Module) – The layer to be checked.
exclude (type | tuple[type]) – Types to be excluded.
- Returns
Whether the layer is a norm layer.
- Return type
bool
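Example
For instance:
>>> import torch.nn as nn
>>> from mmcv.cnn import is_norm
>>> is_norm(nn.BatchNorm2d(4))
True
>>> is_norm(nn.BatchNorm2d(4), exclude=(nn.BatchNorm2d,))
False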
runner¶
- class mmcv.runner.BaseModule(init_cfg: Optional[dict] = None)[source]¶
Base module for all modules in OpenMMLab.
BaseModule
is a wrapper oftorch.nn.Module
with additional functionality of parameter initialization. Compared withtorch.nn.Module
,BaseModule
mainly adds three attributes.init_cfg
: the config to control the initialization.init_weights
: The function of parameter initialization and recording initialization information._params_init_info
: Used to track the parameter initialization information. This attribute only exists during executing theinit_weights
.
- Parameters
init_cfg (dict, optional) – Initialization config dict.
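Example
A minimal subclassing sketch; ToyNet is a hypothetical name used only for illustration:
>>> import torch.nn as nn
>>> from mmcv.runner import BaseModule
>>> class ToyNet(BaseModule):
...     def __init__(self, init_cfg=dict(type='Constant', layer='Linear', val=1)):
...         super().__init__(init_cfg)
...         self.fc = nn.Linear(2, 2)
>>> model = ToyNet()
>>> model.init_weights()  # applies init_cfg to the Linear layer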
- class mmcv.runner.BaseRunner(model: torch.nn.modules.module.Module, batch_processor: Optional[Callable] = None, optimizer: Optional[Union[Dict, torch.optim.optimizer.Optimizer]] = None, work_dir: Optional[str] = None, logger: Optional[logging.Logger] = None, meta: Optional[Dict] = None, max_iters: Optional[int] = None, max_epochs: Optional[int] = None)[source]¶
The base class of Runner, a training helper for PyTorch.
All subclasses should implement the following APIs:
run()
train()
val()
save_checkpoint()
- Parameters
model (
torch.nn.Module
) – The model to be run.batch_processor (callable) – A callable method that process a data batch. The interface of this method should be batch_processor(model, data, train_mode) -> dict
optimizer (dict or
torch.optim.Optimizer
) – It can be either an optimizer (in most cases) or a dict of optimizers (in models that require more than one optimizer, e.g., GAN).work_dir (str, optional) – The working directory to save checkpoints and logs. Defaults to None.
logger (
logging.Logger
) – Logger used during training. Defaults to None. (The default value is just for backward compatibility)meta (dict | None) – A dict records some import information such as environment info and seed, which will be logged in logger hook. Defaults to None.
max_epochs (int, optional) – Total training epochs.
max_iters (int, optional) – Total training iterations.
- call_hook(fn_name: str) → None[source]¶
Call all hooks.
- Parameters
fn_name (str) – The function name in each hook to be called, such as “before_train_epoch”.
- current_lr() → Union[List[float], Dict[str, List[float]]][source]¶
Get current learning rates.
- Returns
Current learning rates of all param groups. If the runner has a dict of optimizers, this method will return a dict.
- Return type
list[float] | dict[str, list[float]]
- current_momentum() → Union[List[float], Dict[str, List[float]]][source]¶
Get current momentums.
- Returns
Current momentums of all param groups. If the runner has a dict of optimizers, this method will return a dict.
- Return type
list[float] | dict[str, list[float]]
- property epoch: int¶
Current epoch.
- Type
int
- property hooks: List[mmcv.runner.hooks.hook.Hook]¶
A list of registered hooks.
- Type
list[
Hook
]
- property inner_iter: int¶
Iteration in an epoch.
- Type
int
- property iter: int¶
Current iteration.
- Type
int
- property max_epochs¶
Maximum training epochs.
- Type
int
- property max_iters¶
Maximum training iterations.
- Type
int
- property model_name: str¶
Name of the model, usually the module class name.
- Type
str
- property rank: int¶
Rank of current process. (distributed training)
- Type
int
- register_hook(hook: mmcv.runner.hooks.hook.Hook, priority: Union[int, str, mmcv.runner.priority.Priority] = 'NORMAL') → None[source]¶
Register a hook into the hook list.
The hook will be inserted into a priority queue, with the specified priority (See
Priority
for details of priorities). For hooks with the same priority, they will be triggered in the same order as they are registered.- Parameters
hook (
Hook
) – The hook to be registered.priority (int or str or
Priority
) – Hook priority. Lower value means higher priority.
- register_hook_from_cfg(hook_cfg: Dict) → None[source]¶
Register a hook from its cfg.
- Parameters
hook_cfg (dict) – Hook config. It should have at least keys ‘type’ and ‘priority’ indicating its type and priority.
Note
The specific hook class to register should not use ‘type’ and ‘priority’ arguments during initialization.
- register_training_hooks(lr_config: Optional[Union[Dict, mmcv.runner.hooks.hook.Hook]], optimizer_config: Optional[Union[Dict, mmcv.runner.hooks.hook.Hook]] = None, checkpoint_config: Optional[Union[Dict, mmcv.runner.hooks.hook.Hook]] = None, log_config: Optional[Dict] = None, momentum_config: Optional[Union[Dict, mmcv.runner.hooks.hook.Hook]] = None, timer_config: Union[Dict, mmcv.runner.hooks.hook.Hook] = {'type': 'IterTimerHook'}, custom_hooks_config: Optional[Union[List, Dict, mmcv.runner.hooks.hook.Hook]] = None) → None[source]¶
Register default and custom hooks for training.
Default and custom hooks include:
LrUpdaterHook – VERY_HIGH (10)
MomentumUpdaterHook – HIGH (30)
OptimizerStepperHook – ABOVE_NORMAL (40)
CheckpointSaverHook – NORMAL (50)
IterTimerHook – LOW (70)
LoggerHook(s) – VERY_LOW (90)
CustomHook(s) – defaults to NORMAL (50)
If custom hooks have the same priority as default hooks, custom hooks will be triggered after default hooks.
- property world_size: int¶
Number of processes participating in the job. (distributed training)
- Type
int
- class mmcv.runner.CheckpointHook(interval: int = - 1, by_epoch: bool = True, save_optimizer: bool = True, out_dir: Optional[str] = None, max_keep_ckpts: int = - 1, save_last: bool = True, sync_buffer: bool = False, file_client_args: Optional[dict] = None, **kwargs)[source]¶
Save checkpoints periodically.
- Parameters
interval (int) – The saving period. If
by_epoch=True
, interval indicates epochs, otherwise it indicates iterations. Default: -1, which means “never”.by_epoch (bool) – Saving checkpoints by epoch or by iteration. Default: True.
save_optimizer (bool) – Whether to save optimizer state_dict in the checkpoint. It is usually used for resuming experiments. Default: True.
out_dir (str, optional) – The root directory to save checkpoints. If not specified,
runner.work_dir
will be used by default. If specified, theout_dir
will be the concatenation ofout_dir
and the last level directory ofrunner.work_dir
. Changed in version 1.3.16.max_keep_ckpts (int, optional) – The maximum checkpoints to keep. In some cases we want only the latest few checkpoints and would like to delete old ones to save the disk space. Default: -1, which means unlimited.
save_last (bool, optional) – Whether to force the last checkpoint to be saved regardless of interval. Default: True.
sync_buffer (bool, optional) – Whether to synchronize buffers in different gpus. Default: False.
file_client_args (dict, optional) – Arguments to instantiate a FileClient. See
mmcv.fileio.FileClient
for details. Default: None. New in version 1.3.16.
Warning
Before v1.3.16, the
out_dir
argument indicates the path where the checkpoint is stored. However, since v1.3.16,out_dir
indicates the root directory and the final path to save checkpoint is the concatenation ofout_dir
and the last level directory ofrunner.work_dir
. Suppose the value ofout_dir
is “/path/of/A” and the value ofrunner.work_dir
is “/path/of/B”, then the final path will be “/path/of/A/B”.
- class mmcv.runner.CheckpointLoader[source]¶
A general checkpoint loader to manage all schemes.
- classmethod load_checkpoint(filename: str, map_location: Optional[Union[str, Callable]] = None, logger: Optional[logging.Logger] = None) → Union[dict, collections.OrderedDict][source]¶
Load checkpoint through URL scheme path.
- Parameters
filename (str) – checkpoint file name with given prefix
map_location (str, optional) – Same as
torch.load()
. Default: Nonelogger (
logging.Logger
, optional) – The logger for message. Default: None
- Returns
The loaded checkpoint.
- Return type
dict or OrderedDict
- classmethod register_scheme(prefixes: Union[str, List[str], Tuple[str, ...]], loader: Optional[Callable] = None, force: bool = False) → Callable[source]¶
Register a loader to CheckpointLoader.
This method can be used as a normal class method or a decorator.
- Parameters
prefixes (str or Sequence[str]) – The prefix of the registered loader.
loader (function, optional) – The loader function to be registered. When this method is used as a decorator, loader is None. Defaults to None.
force (bool, optional) – Whether to override the loader if the prefix has already been registered. Defaults to False.
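Example
A decorator-style sketch; the 'toy://' prefix and loader are hypothetical:
>>> from mmcv.runner import CheckpointLoader
>>> @CheckpointLoader.register_scheme(prefixes='toy://')
... def load_from_toy(filename, map_location=None):
...     ...  # return a state dict for paths like 'toy://model.pth'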
- class mmcv.runner.ClearMLLoggerHook(init_kwargs: Optional[Dict] = None, interval: int = 10, ignore_last: bool = True, reset_flag: bool = False, by_epoch: bool = True)[source]¶
Class to log metrics with clearml.
It requires clearml to be installed.
- Parameters
init_kwargs (dict) – A dict containing the clearml.Task.init initialization keys. See taskinit for more details.
interval (int) – Logging interval (every k iterations). Default 10.
ignore_last (bool) – Ignore the log of last iterations in each epoch if less than interval. Default: True.
reset_flag (bool) – Whether to clear the output buffer after logging. Default: False.
by_epoch (bool) – Whether EpochBasedRunner is used. Default: True.
- class mmcv.runner.CosineAnnealingLrUpdaterHook(min_lr: Optional[float] = None, min_lr_ratio: Optional[float] = None, **kwargs)[source]¶
CosineAnnealing LR scheduler.
- Parameters
min_lr (float, optional) – The minimum lr. Default: None.
min_lr_ratio (float, optional) – The ratio of minimum lr to the base lr. Either min_lr or min_lr_ratio should be specified. Default: None.
- class mmcv.runner.CosineAnnealingMomentumUpdaterHook(min_momentum: Optional[float] = None, min_momentum_ratio: Optional[float] = None, **kwargs)[source]¶
Cosine annealing momentum scheduler that decays the momentum of each parameter group following a cosine annealing schedule.
- Parameters
min_momentum (float, optional) – The minimum momentum. Default: None.
min_momentum_ratio (float, optional) – The ratio of minimum momentum to the base momentum. Either min_momentum or min_momentum_ratio should be specified. Default: None.
- class mmcv.runner.CosineRestartLrUpdaterHook(periods: List[int], restart_weights: List[float] = [1], min_lr: Optional[float] = None, min_lr_ratio: Optional[float] = None, **kwargs)[source]¶
Cosine annealing with restarts learning rate scheme.
- Parameters
periods (list[int]) – Periods for each cosine annealing cycle.
restart_weights (list[float]) – Restart weights at each restart iteration. Defaults to [1].
min_lr (float, optional) – The minimum lr. Default: None.
min_lr_ratio (float, optional) – The ratio of minimum lr to the base lr. Either min_lr or min_lr_ratio should be specified. Default: None.
- class mmcv.runner.CyclicLrUpdaterHook(by_epoch: bool = False, target_ratio: Union[float, tuple] = (10, 0.0001), cyclic_times: int = 1, step_ratio_up: float = 0.4, anneal_strategy: str = 'cos', gamma: float = 1, **kwargs)[source]¶
Cyclic LR Scheduler.
Implement the cyclical learning rate policy (CLR) described in https://arxiv.org/pdf/1506.01186.pdf
Different from the original paper, we use cosine annealing rather than triangular policy inside a cycle. This improves the performance in the 3D detection area.
- Parameters
by_epoch (bool, optional) – Whether to update LR by epoch.
target_ratio (tuple[float], optional) – Relative ratio of the highest LR and the lowest LR to the initial LR.
cyclic_times (int, optional) – Number of cycles during training
step_ratio_up (float, optional) – The ratio of the increasing process of LR in the total cycle.
anneal_strategy (str, optional) – {‘cos’, ‘linear’} Specifies the annealing strategy: ‘cos’ for cosine annealing, ‘linear’ for linear annealing. Default: ‘cos’.
gamma (float, optional) – Cycle decay ratio. Default: 1. It takes values in the range (0, 1]. The difference between the maximum learning rate and the minimum learning rate decreases periodically when it is less than 1. New in version 1.4.4.
- class mmcv.runner.CyclicMomentumUpdaterHook(by_epoch: bool = False, target_ratio: Tuple[float, float] = (0.8947368421052632, 1.0), cyclic_times: int = 1, step_ratio_up: float = 0.4, anneal_strategy: str = 'cos', gamma: float = 1.0, **kwargs)[source]¶
Cyclic momentum Scheduler.
Implement the cyclical momentum scheduler policy described in https://arxiv.org/pdf/1708.07120.pdf
This momentum scheduler is usually used together with the CyclicLrUpdaterHook to improve performance in the 3D detection area.
- Parameters
target_ratio (tuple[float]) – Relative ratio of the lowest momentum and the highest momentum to the initial momentum.
cyclic_times (int) – Number of cycles during training
step_ratio_up (float) – The ratio of the increasing process of momentum in the total cycle.
by_epoch (bool) – Whether to update momentum by epoch.
anneal_strategy (str, optional) – {‘cos’, ‘linear’} Specifies the annealing strategy: ‘cos’ for cosine annealing, ‘linear’ for linear annealing. Default: ‘cos’.
gamma (float, optional) – Cycle decay ratio. Default: 1. It takes values in the range (0, 1]. The difference between the maximum learning rate and the minimum learning rate decreases periodically when it is less than 1. New in version 1.4.4.
- class mmcv.runner.DefaultOptimizerConstructor(optimizer_cfg: Dict, paramwise_cfg: Optional[Dict] = None)[source]¶
Default constructor for optimizers.
By default each parameter shares the same optimizer settings, and we provide an argument
paramwise_cfg
to specify parameter-wise settings. It is a dict and may contain the following fields:custom_keys
(dict): Specifies parameter-wise settings by keys. If one of the keys in custom_keys
is a substring of the name of one parameter, then the setting of the parameter will be specified bycustom_keys[key]
and other settings like bias_lr_mult
etc. will be ignored. It should be noted that the aforementionedkey
is the longest key that is a substring of the name of the parameter. If there are multiple matched keys with the same length, then the key with the lower alphabetical order will be chosen.custom_keys[key]
should be a dict and may contain fieldslr_mult
anddecay_mult
. See Example 2 below.bias_lr_mult
(float): It will be multiplied to the learning rate for all bias parameters (except for those in normalization layers and offset layers of DCN).bias_decay_mult
(float): It will be multiplied to the weight decay for all bias parameters (except for those in normalization layers, depthwise conv layers, offset layers of DCN).norm_decay_mult
(float): It will be multiplied to the weight decay for all weight and bias parameters of normalization layers.dwconv_decay_mult
(float): It will be multiplied to the weight decay for all weight and bias parameters of depthwise conv layers.dcn_offset_lr_mult
(float): It will be multiplied to the learning rate for parameters of offset layer in the deformable convs of a model.bypass_duplicate
(bool): If true, the duplicate parameters would not be added into optimizer. Default: False.
Note
1. If the option
dcn_offset_lr_mult
is used, the constructor will override the effect ofbias_lr_mult
in the bias of offset layer. So be careful when using bothbias_lr_mult
anddcn_offset_lr_mult
. If you wish to apply both of them to the offset layer in deformable convs, setdcn_offset_lr_mult
to the originaldcn_offset_lr_mult
*bias_lr_mult
.2. If the option
dcn_offset_lr_mult
is used, the constructor will apply it to all the DCN layers in the model. So be careful when the model contains multiple DCN layers in places other than backbone.- Parameters
model (
nn.Module
) – The model with parameters to be optimized.optimizer_cfg (dict) –
The config dict of the optimizer. Positional fields are
type: class name of the optimizer.
Optional fields are
any arguments of the corresponding optimizer type, e.g., lr, weight_decay, momentum, etc.
paramwise_cfg (dict, optional) – Parameter-wise options.
- Example 1:
>>> model = torch.nn.modules.Conv1d(1, 1, 1) >>> optimizer_cfg = dict(type='SGD', lr=0.01, momentum=0.9, >>> weight_decay=0.0001) >>> paramwise_cfg = dict(norm_decay_mult=0.) >>> optim_builder = DefaultOptimizerConstructor( >>> optimizer_cfg, paramwise_cfg) >>> optimizer = optim_builder(model)
- Example 2:
>>> # assume model have attribute model.backbone and model.cls_head >>> optimizer_cfg = dict(type='SGD', lr=0.01, weight_decay=0.95) >>> paramwise_cfg = dict(custom_keys={ 'backbone': dict(lr_mult=0.1, decay_mult=0.9)}) >>> optim_builder = DefaultOptimizerConstructor( >>> optimizer_cfg, paramwise_cfg) >>> optimizer = optim_builder(model) >>> # Then the `lr` and `weight_decay` for model.backbone is >>> # (0.01 * 0.1, 0.95 * 0.9). `lr` and `weight_decay` for >>> # model.cls_head is (0.01, 0.95).
- add_params(params: List[Dict], module: torch.nn.modules.module.Module, prefix: str = '', is_dcn_module: Optional[Union[int, float]] = None) → None[source]¶
Add all parameters of module to the params list.
The parameters of the given module will be added to the list of param groups, with specific rules defined by paramwise_cfg.
- Parameters
params (list[dict]) – A list of param groups, it will be modified in place.
module (nn.Module) – The module to be added.
prefix (str) – The prefix of the module
is_dcn_module (int|float|None) – If the current module is a submodule of DCN, is_dcn_module will be passed to control conv_offset layer’s learning rate. Defaults to None.
- class mmcv.runner.DefaultRunnerConstructor(runner_cfg: dict, default_args: Optional[dict] = None)[source]¶
Default constructor for runners.
Customize an existing Runner, like EpochBasedRunner, through a RunnerConstructor. For example, we can inject new properties and functions into the Runner.
Example
>>> from mmcv.runner import RUNNER_BUILDERS, build_runner >>> # Define a new RunnerReconstructor >>> @RUNNER_BUILDERS.register_module() >>> class MyRunnerConstructor: ... def __init__(self, runner_cfg, default_args=None): ... if not isinstance(runner_cfg, dict): ... raise TypeError('runner_cfg should be a dict', ... f'but got {type(runner_cfg)}') ... self.runner_cfg = runner_cfg ... self.default_args = default_args ... ... def __call__(self): ... runner = RUNNERS.build(self.runner_cfg, ... default_args=self.default_args) ... # Add new properties for existing runner ... runner.my_name = 'my_runner' ... runner.my_function = lambda self: print(self.my_name) ... ... >>> # build your runner >>> runner_cfg = dict(type='EpochBasedRunner', max_epochs=40, ... constructor='MyRunnerConstructor') >>> runner = build_runner(runner_cfg)
- class mmcv.runner.DistEvalHook(dataloader: torch.utils.data.dataloader.DataLoader, start: Optional[int] = None, interval: int = 1, by_epoch: bool = True, save_best: Optional[str] = None, rule: Optional[str] = None, test_fn: Optional[Callable] = None, greater_keys: Optional[List[str]] = None, less_keys: Optional[List[str]] = None, broadcast_bn_buffer: bool = True, tmpdir: Optional[str] = None, gpu_collect: bool = False, out_dir: Optional[str] = None, file_client_args: Optional[dict] = None, **eval_kwargs)[source]¶
Distributed evaluation hook.
This hook will regularly perform evaluation in a given interval when performing in distributed environment.
- Parameters
dataloader (DataLoader) – A PyTorch dataloader, whose dataset has implemented
evaluate
function.start (int | None, optional) – Evaluation starting epoch. It enables evaluation before the training starts if
start
<= the resuming epoch. If None, whether to evaluate is merely decided byinterval
. Default: None.interval (int) – Evaluation interval. Default: 1.
by_epoch (bool) – Determines whether to perform evaluation by epoch or by iteration. If set to True, it will perform by epoch. Otherwise, by iteration. Default: True.
save_best (str, optional) – If a metric is specified, it would measure the best checkpoint during evaluation. The information about best checkpoint would be saved in
runner.meta['hook_msgs']
to keep best score value and best checkpoint path, which will be also loaded when resume checkpoint. Options are the evaluation metrics on the test dataset. e.g.,bbox_mAP
,segm_mAP
for bbox detection and instance segmentation.AR@100
for proposal recall. Ifsave_best
isauto
, the first key of the returnedOrderedDict
result will be used. Default: None.rule (str | None, optional) – Comparison rule for best score. If set to None, it will infer a reasonable rule. Keys such as ‘acc’, ‘top’ .etc will be inferred by ‘greater’ rule. Keys contain ‘loss’ will be inferred by ‘less’ rule. Options are ‘greater’, ‘less’, None. Default: None.
test_fn (callable, optional) – test a model with samples from a dataloader in a multi-gpu manner, and return the test results. If
None
, the default test functionmmcv.engine.multi_gpu_test
will be used. (default:None
)tmpdir (str | None) – Temporary directory to save the results of all processes. Default: None.
gpu_collect (bool) – Whether to use gpu or cpu to collect results. Default: False.
broadcast_bn_buffer (bool) – Whether to broadcast the buffer(running_mean and running_var) of rank 0 to other rank before evaluation. Default: True.
out_dir (str, optional) – The root directory to save checkpoints. If not specified, runner.work_dir will be used by default. If specified, the out_dir will be the concatenation of out_dir and the last level directory of runner.work_dir.
file_client_args (dict) – Arguments to instantiate a FileClient. See
mmcv.fileio.FileClient
for details. Default: None.**eval_kwargs – Evaluation arguments fed into the evaluate function of the dataset.
- class mmcv.runner.DistSamplerSeedHook[source]¶
Data-loading sampler for distributed training.
In distributed training, it is only useful in conjunction with
EpochBasedRunner
, whileIterBasedRunner
achieves the same purpose withIterLoader
.
- class mmcv.runner.DvcliveLoggerHook(model_file: Optional[str] = None, interval: int = 10, ignore_last: bool = True, reset_flag: bool = False, by_epoch: bool = True, dvclive=None, **kwargs)[source]¶
Class to log metrics with dvclive.
It requires dvclive to be installed.
- Parameters
model_file (str) – Default None. If not None, after each epoch the model will be saved to {model_file}.
interval (int) – Logging interval (every k iterations). Default 10.
ignore_last (bool) – Ignore the log of last iterations in each epoch if less than interval. Default: True.
reset_flag (bool) – Whether to clear the output buffer after logging. Default: False.
by_epoch (bool) – Whether EpochBasedRunner is used. Determines whether log is called after_train_iter or after_train_epoch. Default: True.
dvclive (Live, optional) – An instance of the Live logger to use instead of initializing a new one internally. Defaults to None.
kwargs – Arguments for instantiating Live (ignored if dvclive is provided).
- class mmcv.runner.EMAHook(momentum: float = 0.0002, interval: int = 1, warm_up: int = 100, resume_from: Optional[str] = None)[source]¶
Exponential Moving Average Hook.
Use Exponential Moving Average on all parameters of the model during training. All parameters have an EMA backup, which is updated by the formula below. EMAHook takes priority over EvalHook and CheckpointSaverHook.
\[X_{\text{ema},t+1} = (1 - \text{momentum}) \times X_{\text{ema},t} + \text{momentum} \times X_t\]- Parameters
momentum (float) – The momentum used for updating ema parameter. Defaults to 0.0002.
interval (int) – Update ema parameter every interval iteration. Defaults to 1.
warm_up (int) – During first warm_up steps, we may use smaller momentum to update ema parameters more slowly. Defaults to 100.
resume_from (str, optional) – The checkpoint path. Defaults to None.
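Example
A registration sketch; the runner object is assumed to already exist:
>>> from mmcv.runner import EMAHook
>>> ema_hook = EMAHook(momentum=0.0002, interval=1, warm_up=100)
>>> runner.register_hook(ema_hook)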
- after_train_epoch(runner)[source]¶
Load parameter values from the EMA backup into the model before the EvalHook runs.
- class mmcv.runner.EpochBasedRunner(model: torch.nn.modules.module.Module, batch_processor: Optional[Callable] = None, optimizer: Optional[Union[Dict, torch.optim.optimizer.Optimizer]] = None, work_dir: Optional[str] = None, logger: Optional[logging.Logger] = None, meta: Optional[Dict] = None, max_iters: Optional[int] = None, max_epochs: Optional[int] = None)[source]¶
Epoch-based Runner.
This runner trains models epoch by epoch.
- run(data_loaders: List[torch.utils.data.dataloader.DataLoader], workflow: List[Tuple[str, int]], max_epochs: Optional[int] = None, **kwargs) → None[source]¶
Start running.
- Parameters
data_loaders (list[
DataLoader
]) – Dataloaders for training and validation.workflow (list[tuple]) – A list of (phase, epochs) to specify the running order and epochs. E.g., [(‘train’, 2), (‘val’, 1)] means running 2 epochs for training and 1 epoch for validation, iteratively.
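Example
A minimal sketch; train_loader and val_loader are hypothetical existing dataloaders:
>>> # run 2 training epochs followed by 1 validation epoch, repeatedly
>>> runner.run([train_loader, val_loader], [('train', 2), ('val', 1)])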
- save_checkpoint(out_dir: str, filename_tmpl: str = 'epoch_{}.pth', save_optimizer: bool = True, meta: Optional[Dict] = None, create_symlink: bool = True) → None[source]¶
Save the checkpoint.
- Parameters
out_dir (str) – The directory that checkpoints are saved.
filename_tmpl (str, optional) – The checkpoint filename template, which contains a placeholder for the epoch number. Defaults to ‘epoch_{}.pth’.
save_optimizer (bool, optional) – Whether to save the optimizer to the checkpoint. Defaults to True.
meta (dict, optional) – The meta information to be saved in the checkpoint. Defaults to None.
create_symlink (bool, optional) – Whether to create a symlink “latest.pth” to point to the latest checkpoint. Defaults to True.
- class mmcv.runner.EvalHook(dataloader: torch.utils.data.dataloader.DataLoader, start: Optional[int] = None, interval: int = 1, by_epoch: bool = True, save_best: Optional[str] = None, rule: Optional[str] = None, test_fn: Optional[Callable] = None, greater_keys: Optional[List[str]] = None, less_keys: Optional[List[str]] = None, out_dir: Optional[str] = None, file_client_args: Optional[dict] = None, **eval_kwargs)[source]¶
Non-Distributed evaluation hook.
This hook will regularly perform evaluation in a given interval when performing in non-distributed environment.
- Parameters
dataloader (DataLoader) – A PyTorch dataloader, whose dataset has implemented
evaluate
function.start (int | None, optional) – Evaluation starting epoch or iteration. It enables evaluation before the training starts if
start
<= the resuming epoch or iteration. If None, whether to evaluate is merely decided byinterval
. Default: None.interval (int) – Evaluation interval. Default: 1.
by_epoch (bool) – Determines whether to perform evaluation by epoch or by iteration. If set to True, it will perform by epoch. Otherwise, by iteration. Default: True.
save_best (str, optional) – If a metric is specified, it would measure the best checkpoint during evaluation. The information about best checkpoint would be saved in
runner.meta['hook_msgs']
to keep best score value and best checkpoint path, which will be also loaded when resume checkpoint. Options are the evaluation metrics on the test dataset. e.g.,bbox_mAP
,segm_mAP
for bbox detection and instance segmentation.AR@100
for proposal recall. Ifsave_best
isauto
, the first key of the returnedOrderedDict
result will be used. Default: None.rule (str | None, optional) – Comparison rule for best score. If set to None, it will infer a reasonable rule. Keys such as ‘acc’, ‘top’ .etc will be inferred by ‘greater’ rule. Keys contain ‘loss’ will be inferred by ‘less’ rule. Options are ‘greater’, ‘less’, None. Default: None.
test_fn (callable, optional) – test a model with samples from a dataloader, and return the test results. If
None
, the default test functionmmcv.engine.single_gpu_test
will be used. (default:None
)greater_keys (List[str] | None, optional) – Metric keys that will be inferred by ‘greater’ comparison rule. If
None
, _default_greater_keys will be used. (default:None
)less_keys (List[str] | None, optional) – Metric keys that will be inferred by ‘less’ comparison rule. If
None
, _default_less_keys will be used. (default:None
)out_dir (str, optional) – The root directory to save checkpoints. If not specified, runner.work_dir will be used by default. If specified, the out_dir will be the concatenation of out_dir and the last level directory of runner.work_dir. New in version 1.3.16.
file_client_args (dict) – Arguments to instantiate a FileClient. See
mmcv.fileio.FileClient
for details. Default: None. New in version 1.3.16.**eval_kwargs – Evaluation arguments fed into the evaluate function of the dataset.
Note
If new arguments are added for EvalHook, tools/test.py, tools/eval_metric.py may be affected.
- class mmcv.runner.FlatCosineAnnealingLrUpdaterHook(start_percent: float = 0.75, min_lr: Optional[float] = None, min_lr_ratio: Optional[float] = None, **kwargs)[source]¶
Flat + Cosine lr schedule.
Modified from https://github.com/fastai/fastai/blob/master/fastai/callback/schedule.py#L128
- Parameters
start_percent (float) – The percentage of the total training steps after which to start annealing the learning rate. The value should be in the range [0, 1). Default: 0.75
min_lr (float, optional) – The minimum lr. Default: None.
min_lr_ratio (float, optional) – The ratio of minimum lr to the base lr. Either min_lr or min_lr_ratio should be specified. Default: None.
- class mmcv.runner.Fp16OptimizerHook(grad_clip: Optional[dict] = None, coalesce: bool = True, bucket_size_mb: int = - 1, loss_scale: Union[float, str, dict] = 512.0, distributed: bool = True)[source]¶
FP16 optimizer hook (using PyTorch’s implementation).
If you are using PyTorch >= 1.6, torch.cuda.amp is used as the backend, to take care of the optimization procedure.
- Parameters
loss_scale (float | str | dict) – Scale factor configuration. If loss_scale is a float, static loss scaling will be used with the specified scale. If loss_scale is a string, it must be ‘dynamic’, then dynamic loss scaling will be used. It can also be a dict containing arguments of GradScalar. Defaults to 512. For Pytorch >= 1.6, mmcv uses official implementation of GradScaler. If you use a dict version of loss_scale to create GradScaler, please refer to: https://pytorch.org/docs/stable/amp.html#torch.cuda.amp.GradScaler for the parameters.
Examples
>>> loss_scale = dict( ... init_scale=65536.0, ... growth_factor=2.0, ... backoff_factor=0.5, ... growth_interval=2000 ... ) >>> optimizer_hook = Fp16OptimizerHook(loss_scale=loss_scale)
- after_train_iter(runner) → None[source]¶
Backward optimization steps for Mixed Precision Training. For dynamic loss scaling, please refer to https://pytorch.org/docs/stable/amp.html#torch.cuda.amp.GradScaler.
Scale the loss by a scale factor.
Backward the loss to obtain the gradients.
Unscale the optimizer’s gradient tensors.
Call optimizer.step() and update scale factor.
Save loss_scaler state_dict for resume purpose.
- class mmcv.runner.GradientCumulativeFp16OptimizerHook(*args, **kwargs)[source]¶
Fp16 optimizer Hook (using PyTorch’s implementation) implements multi-iters gradient cumulating.
If you are using PyTorch >= 1.6, torch.cuda.amp is used as the backend, to take care of the optimization procedure.
- after_train_iter(runner) → None[source]¶
Backward optimization steps for Mixed Precision Training. For dynamic loss scaling, please refer to https://pytorch.org/docs/stable/amp.html#torch.cuda.amp.GradScaler.
Scale the loss by a scale factor.
Backward the loss to obtain the gradients.
Unscale the optimizer’s gradient tensors.
Call optimizer.step() and update scale factor.
Save loss_scaler state_dict for resume purpose.
- class mmcv.runner.GradientCumulativeOptimizerHook(cumulative_iters: int = 1, **kwargs)[source]¶
Optimizer Hook implements multi-iters gradient cumulating.
- Parameters
cumulative_iters (int, optional) – Num of gradient cumulative iters. The optimizer will step every cumulative_iters iters. Defaults to 1.
Examples
>>> # Use cumulative_iters to simulate a large batch size >>> # It is helpful when the hardware cannot handle a large batch size. >>> loader = DataLoader(data, batch_size=64) >>> optim_hook = GradientCumulativeOptimizerHook(cumulative_iters=4) >>> # almost equals to >>> loader = DataLoader(data, batch_size=256) >>> optim_hook = OptimizerHook()
- class mmcv.runner.IterBasedRunner(model: torch.nn.modules.module.Module, batch_processor: Optional[Callable] = None, optimizer: Optional[Union[Dict, torch.optim.optimizer.Optimizer]] = None, work_dir: Optional[str] = None, logger: Optional[logging.Logger] = None, meta: Optional[Dict] = None, max_iters: Optional[int] = None, max_epochs: Optional[int] = None)[source]¶
Iteration-based Runner.
This runner trains models iteration by iteration.
- register_training_hooks(lr_config, optimizer_config=None, checkpoint_config=None, log_config=None, momentum_config=None, custom_hooks_config=None)[source]¶
Register default hooks for iter-based training.
Checkpoint hook, optimizer stepper hook and logger hooks will be set to by_epoch=False by default.
Default hooks include:
LrUpdaterHook – VERY_HIGH (10)
MomentumUpdaterHook – HIGH (30)
OptimizerStepperHook – ABOVE_NORMAL (40)
CheckpointSaverHook – NORMAL (50)
IterTimerHook – LOW (70)
LoggerHook(s) – VERY_LOW (90)
CustomHook(s) – defaults to NORMAL (50)
If custom hooks have the same priority as default hooks, custom hooks will be triggered after default hooks.
- resume(checkpoint: str, resume_optimizer: bool = True, map_location: Union[str, Callable] = 'default') → None[source]¶
Resume model from checkpoint.
- Parameters
checkpoint (str) – Checkpoint to resume from.
resume_optimizer (bool, optional) – Whether to resume the optimizer(s) if the checkpoint file includes optimizer(s). Defaults to True.
map_location (str, optional) – Same as
torch.load()
. Defaults to ‘default’.
- run(data_loaders: List[torch.utils.data.dataloader.DataLoader], workflow: List[Tuple[str, int]], max_iters: Optional[int] = None, **kwargs) → None[source]¶
Start running.
- Parameters
data_loaders (list[
DataLoader
]) – Dataloaders for training and validation.workflow (list[tuple]) – A list of (phase, iters) to specify the running order and iterations. E.g., [(‘train’, 10000), (‘val’, 1000)] means running 10000 iterations for training and 1000 iterations for validation, iteratively.
- save_checkpoint(out_dir: str, filename_tmpl: str = 'iter_{}.pth', meta: Optional[Dict] = None, save_optimizer: bool = True, create_symlink: bool = True) → None[source]¶
Save checkpoint to file.
- Parameters
out_dir (str) – Directory to save checkpoint files.
filename_tmpl (str, optional) – Checkpoint file template. Defaults to ‘iter_{}.pth’.
meta (dict, optional) – Metadata to be saved in checkpoint. Defaults to None.
save_optimizer (bool, optional) – Whether to save the optimizer. Defaults to True.
create_symlink (bool, optional) – Whether to create a symlink to the latest checkpoint file. Defaults to True.
- class mmcv.runner.LinearAnnealingLrUpdaterHook(min_lr: Optional[float] = None, min_lr_ratio: Optional[float] = None, **kwargs)[source]¶
Linear annealing LR Scheduler decays the learning rate of each parameter group linearly.
- Parameters
min_lr (float, optional) – The minimum lr. Default: None.
min_lr_ratio (float, optional) – The ratio of minimum lr to the base lr. Either min_lr or min_lr_ratio should be specified. Default: None.
- class mmcv.runner.LinearAnnealingMomentumUpdaterHook(min_momentum: Optional[float] = None, min_momentum_ratio: Optional[float] = None, **kwargs)[source]¶
Linear annealing LR Momentum decays the Momentum of each parameter group linearly.
- Parameters
min_momentum (float, optional) – The minimum momentum. Default: None.
min_momentum_ratio (float, optional) – The ratio of minimum momentum to the base momentum. Either min_momentum or min_momentum_ratio should be specified. Default: None.
- class mmcv.runner.LoggerHook(interval: int = 10, ignore_last: bool = True, reset_flag: bool = False, by_epoch: bool = True)[source]¶
Base class for logger hooks.
- Parameters
interval (int) – Logging interval (every k iterations). Default 10.
ignore_last (bool) – Ignore the log of last iterations in each epoch if less than interval. Default True.
reset_flag (bool) – Whether to clear the output buffer after logging. Default False.
by_epoch (bool) – Whether EpochBasedRunner is used. Default True.
- static is_scalar(val, include_np: bool = True, include_torch: bool = True) → bool[source]¶
Tell whether the input variable is a scalar or not.
- Parameters
val – Input variable.
include_np (bool) – Whether to include 0-d np.ndarray as a scalar.
include_torch (bool) – Whether to include 0-d torch.Tensor as a scalar.
- Returns
True or False.
- Return type
bool
- class mmcv.runner.LossScaler(init_scale: float = 4294967296, mode: str = 'dynamic', scale_factor: float = 2.0, scale_window: int = 1000)[source]¶
Class that manages loss scaling in mixed precision training which supports both dynamic or static mode.
The implementation refers to https://github.com/NVIDIA/apex/blob/master/apex/fp16_utils/loss_scaler.py. Supplying mode='dynamic' enables dynamic loss scaling. It’s important to understand how LossScaler operates. Loss scaling is designed to combat the problem of underflowing gradients encountered at long times when training fp16 networks. Dynamic loss scaling begins by attempting a very high loss scale. Ironically, this may result in OVERflowing gradients. If overflowing gradients are encountered, FP16_Optimizer then skips the update step for this particular iteration/minibatch, and LossScaler adjusts the loss scale to a lower value. If a certain number of iterations occur without overflowing gradients detected, LossScaler increases the loss scale once more. In this way LossScaler attempts to “ride the edge” of always using the highest loss scale possible without incurring overflow.
- Parameters
init_scale (float) – Initial loss scale value, default: 2**32.
scale_factor (float) – Factor used when adjusting the loss scale. Default: 2.
mode (str) – Loss scaling mode, ‘dynamic’ or ‘static’.
scale_window (int) – Number of consecutive iterations without an overflow to wait before increasing the loss scale. Default: 1000.
- has_overflow(params: List[torch.nn.parameter.Parameter]) → bool[source]¶
Check if params contain overflow.
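A minimal sketch of the dynamic loss-scaling loop this class is designed for; it assumes the scaler exposes a loss_scale property and an update_scale() method as in the referenced apex-style implementation, and that `loader`, `model` and `optimizer` already exist:
>>> scaler = LossScaler(mode='dynamic')
>>> for data in loader:
...     loss = model(data)
...     (loss * scaler.loss_scale).backward()
...     overflow = scaler.has_overflow(list(model.parameters()))
...     if not overflow:
...         for p in model.parameters():     # unscale gradients before stepping
...             if p.grad is not None:
...                 p.grad.div_(scaler.loss_scale)
...         optimizer.step()
...     optimizer.zero_grad()
...     scaler.update_scale(overflow)        # grow or shrink the loss scale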
- class mmcv.runner.LrUpdaterHook(by_epoch: bool = True, warmup: Optional[str] = None, warmup_iters: int = 0, warmup_ratio: float = 0.1, warmup_by_epoch: bool = False)[source]¶
LR Scheduler in MMCV.
- Parameters
by_epoch (bool) – LR changes epoch by epoch.
warmup (string) – Type of warmup used. It can be None (use no warmup), ‘constant’, ‘linear’ or ‘exp’.
warmup_iters (int) – The number of iterations or epochs that warmup lasts.
warmup_ratio (float) – LR used at the beginning of warmup equals to warmup_ratio * initial_lr.
warmup_by_epoch (bool) – When warmup_by_epoch == True, warmup_iters means the number of epochs that warmup lasts, otherwise means the number of iterations that warmup lasts.
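These options are usually supplied through a config dict when registering the LR hook; a sketch (the policy and values are illustrative, and `runner` is assumed to exist):
>>> lr_config = dict(
...     policy='step',        # selects StepLrUpdaterHook
...     warmup='linear',      # linear warmup over the first 500 iterations
...     warmup_iters=500,
...     warmup_ratio=0.001,   # start at 0.001 * initial_lr
...     step=[8, 11])
>>> runner.register_lr_hook(lr_config)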
- class mmcv.runner.MlflowLoggerHook(exp_name: Optional[str] = None, tags: Optional[Dict] = None, params: Optional[Dict] = None, log_model: bool = True, interval: int = 10, ignore_last: bool = True, reset_flag: bool = False, by_epoch: bool = True)[source]¶
Class to log metrics and (optionally) a trained model to MLflow.
It requires MLflow to be installed.
- Parameters
exp_name (str, optional) – Name of the experiment to be used. Default None. If not None, set the active experiment. If experiment does not exist, an experiment with provided name will be created.
tags (Dict[str], optional) – Tags for the current run. Default None. If not None, set tags for the current run.
params (Dict[str], optional) – Params for the current run. Default None. If not None, set params for the current run.
log_model (bool, optional) – Whether to log an MLflow artifact. Default True. If True, log runner.model as an MLflow artifact for the current run.
interval (int) – Logging interval (every k iterations). Default: 10.
ignore_last (bool) – Ignore the log of last iterations in each epoch if less than interval. Default: True.
reset_flag (bool) – Whether to clear the output buffer after logging. Default: False.
by_epoch (bool) – Whether EpochBasedRunner is used. Default: True.
- class mmcv.runner.ModuleDict(modules: Optional[dict] = None, init_cfg: Optional[dict] = None)[source]¶
ModuleDict in openmmlab.
- Parameters
modules (dict, optional) – a mapping (dictionary) of (string: module) or an iterable of key-value pairs of type (string, module).
init_cfg (dict, optional) – Initialization config dict.
- class mmcv.runner.ModuleList(modules: Optional[Iterable] = None, init_cfg: Optional[dict] = None)[source]¶
ModuleList in openmmlab.
- Parameters
modules (iterable, optional) – an iterable of modules to add.
init_cfg (dict, optional) – Initialization config dict.
- class mmcv.runner.NeptuneLoggerHook(init_kwargs: Optional[Dict] = None, interval: int = 10, ignore_last: bool = True, reset_flag: bool = True, with_step: bool = True, by_epoch: bool = True)[source]¶
Class to log metrics to NeptuneAI.
It requires Neptune to be installed.
- Parameters
init_kwargs (dict) –
A dict containing the initialization keys as below:
project (str): Name of a project in a form of namespace/project_name. If None, the value of NEPTUNE_PROJECT environment variable will be taken.
api_token (str): User’s API token. If None, the value of NEPTUNE_API_TOKEN environment variable will be taken. Note: It is strongly recommended to use NEPTUNE_API_TOKEN environment variable rather than placing your API token in plain text in your source code.
name (str, optional, default is ‘Untitled’): Editable name of the run. Name is displayed in the run’s Details and in Runs table as a column.
Check https://docs.neptune.ai/api-reference/neptune#init for more init arguments.
interval (int) – Logging interval (every k iterations). Default: 10.
ignore_last (bool) – Ignore the log of last iterations in each epoch if less than interval. Default: True.
reset_flag (bool) – Whether to clear the output buffer after logging. Default: True.
with_step (bool) – If True, the step will be logged from self.get_iters. Otherwise, step will not be logged. Default: True.
by_epoch (bool) – Whether EpochBasedRunner is used. Default: True.
- class mmcv.runner.OneCycleLrUpdaterHook(max_lr: Union[float, List], total_steps: Optional[int] = None, pct_start: float = 0.3, anneal_strategy: str = 'cos', div_factor: float = 25, final_div_factor: float = 10000.0, three_phase: bool = False, **kwargs)[source]¶
One Cycle LR Scheduler.
The 1cycle learning rate policy changes the learning rate after every batch. The one cycle learning rate policy is described in https://arxiv.org/pdf/1708.07120.pdf
- Parameters
max_lr (float or list) – Upper learning rate boundaries in the cycle for each parameter group.
total_steps (int, optional) – The total number of steps in the cycle. Note that if a value is not provided here, it will be the max_iter of runner. Default: None.
pct_start (float) – The percentage of the cycle (in number of steps) spent increasing the learning rate. Default: 0.3
anneal_strategy (str) – {‘cos’, ‘linear’} Specifies the annealing strategy: ‘cos’ for cosine annealing, ‘linear’ for linear annealing. Default: ‘cos’
div_factor (float) – Determines the initial learning rate via initial_lr = max_lr/div_factor Default: 25
final_div_factor (float) – Determines the minimum learning rate via min_lr = initial_lr/final_div_factor Default: 1e4
three_phase (bool) – If three_phase is True, use a third phase of the schedule to annihilate the learning rate according to final_div_factor instead of modifying the second phase (the first two phases will be symmetrical about the step indicated by pct_start). Default: False
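The two div factors fix the schedule’s endpoints; for instance, with the defaults and an illustrative max_lr:
>>> max_lr = 0.01
>>> initial_lr = max_lr / 25          # div_factor = 25  -> 0.0004
>>> min_lr = initial_lr / 10000.0     # final_div_factor = 1e4 -> 4e-08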
- class mmcv.runner.OneCycleMomentumUpdaterHook(base_momentum: Union[float, list, dict] = 0.85, max_momentum: Union[float, list, dict] = 0.95, pct_start: float = 0.3, anneal_strategy: str = 'cos', three_phase: bool = False, **kwargs)[source]¶
OneCycle momentum Scheduler.
This momentum scheduler is usually used together with OneCycleLrUpdaterHook to improve performance.
- Parameters
base_momentum (float or list) – Lower momentum boundaries in the cycle for each parameter group. Note that momentum is cycled inversely to learning rate; at the peak of a cycle, momentum is ‘base_momentum’ and learning rate is ‘max_lr’. Default: 0.85
max_momentum (float or list) – Upper momentum boundaries in the cycle for each parameter group. Functionally, it defines the cycle amplitude (max_momentum - base_momentum). Note that momentum is cycled inversely to learning rate; at the start of a cycle, momentum is ‘max_momentum’ and learning rate is ‘base_lr’ Default: 0.95
pct_start (float) – The percentage of the cycle (in number of steps) spent increasing the learning rate. Default: 0.3
anneal_strategy (str) – {‘cos’, ‘linear’} Specifies the annealing strategy: ‘cos’ for cosine annealing, ‘linear’ for linear annealing. Default: ‘cos’
three_phase (bool) – If three_phase is True, use a third phase of the schedule to annihilate the learning rate according to final_div_factor instead of modifying the second phase (the first two phases will be symmetrical about the step indicated by pct_start). Default: False
- class mmcv.runner.OptimizerHook(grad_clip: Optional[dict] = None, detect_anomalous_params: bool = False)[source]¶
A hook contains custom operations for the optimizer.
- Parameters
grad_clip (dict, optional) – A config dict to control the clip_grad. Default: None.
detect_anomalous_params (bool) –
This option is only used for debugging and will slow down the training speed. Detect anomalous parameters that are not included in the computational graph with loss as the root. There are two cases:
Parameters were not used during forward pass.
Parameters were not used to produce loss.
Default: False.
- class mmcv.runner.PaviLoggerHook(init_kwargs: Optional[Dict] = None, add_graph: Optional[bool] = None, img_key: Optional[str] = None, add_last_ckpt: bool = False, interval: int = 10, ignore_last: bool = True, reset_flag: bool = False, by_epoch: bool = True, add_graph_kwargs: Optional[Dict] = None, add_ckpt_kwargs: Optional[Dict] = None)[source]¶
Class to visualize the model and log metrics (for internal use).
- Parameters
init_kwargs (dict) –
A dict containing the initialization keys as below:
name (str, optional): Custom training name. Defaults to None, which means current work_dir.
project (str, optional): Project name. Defaults to “default”.
model (str, optional): Training model name. Defaults to current model.
session_text (str, optional): Session string in YAML format. Defaults to current config.
training_id (int, optional): Training ID in PAVI, if you want to use an existing training. Defaults to None.
compare_id (int, optional): Compare ID in PAVI, if you want to add the task to an existing compare. Defaults to None.
overwrite_last_training (bool, optional): Whether to upload data to the training with the same name in the same project, rather than creating a new one. Defaults to False.
add_graph (bool, optional) – Deprecated. Whether to visualize the model. Default: False.
img_key (str, optional) – Deprecated. Image key. Defaults to None.
add_last_ckpt (bool) – Whether to save checkpoint after run. Default: False.
interval (int) – Logging interval (every k iterations). Default: 10.
ignore_last (bool) – Ignore the log of last iterations in each epoch if less than interval. Default: True.
reset_flag (bool) – Whether to clear the output buffer after logging. Default: False.
by_epoch (bool) – Whether EpochBasedRunner is used. Default: True.
add_graph_kwargs (dict, optional) – A dict containing the params for adding graph, with keys as below:
- active (bool): Whether to use add_graph. Default: False.
- start (int): The epoch or iteration to start. Default: 0.
- interval (int): Interval of add_graph. Default: 1.
- img_key (str): Get image data from Dataset. Default: ‘img’.
- opset_version (int): opset_version of exporting onnx. Default: 11.
- dummy_forward_kwargs (dict, optional): Set default parameters for the model forward function except the image. For example, you can set {‘return_loss’: False} for mmcls. Default: None.
add_ckpt_kwargs (dict, optional) – A dict containing the params for adding checkpoint, with keys as below:
- active (bool): Whether to upload checkpoint. Default: False.
- start (int): The epoch or iteration to start. Default: 0.
- interval (int): Interval of uploading checkpoint. Default: 1.
- class mmcv.runner.Priority(value)[source]¶
Hook priority levels.
Level          Value
HIGHEST        0
VERY_HIGH      10
HIGH           30
ABOVE_NORMAL   40
NORMAL         50
BELOW_NORMAL   60
LOW            70
VERY_LOW       90
LOWEST         100
- class mmcv.runner.SegmindLoggerHook(interval: int = 10, ignore_last: bool = True, reset_flag: bool = False, by_epoch=True)[source]¶
Class to log metrics to Segmind.
It requires Segmind to be installed.
- Parameters
interval (int) – Logging interval (every k iterations). Default: 10.
ignore_last (bool) – Ignore the log of last iterations in each epoch if less than interval. Default True.
reset_flag (bool) – Whether to clear the output buffer after logging. Default False.
by_epoch (bool) – Whether EpochBasedRunner is used. Default True.
- class mmcv.runner.Sequential(*args, init_cfg: Optional[dict] = None)[source]¶
Sequential module in openmmlab.
- Parameters
init_cfg (dict, optional) – Initialization config dict.
- class mmcv.runner.StepLrUpdaterHook(step: Union[int, List[int]], gamma: float = 0.1, min_lr: Optional[float] = None, **kwargs)[source]¶
Step LR scheduler with min_lr clipping.
- Parameters
step (int | list[int]) – Step to decay the LR. If an int value is given, regard it as the decay interval. If a list is given, decay LR at these steps.
gamma (float) – Decay LR ratio. Defaults to 0.1.
min_lr (float, optional) – Minimum LR value to keep. If LR after decay is lower than min_lr, it will be clipped to this value. If None is given, we don’t perform lr clipping. Default: None.
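A small worked sketch of the decay rule (the helper below is illustrative, not part of mmcv):
>>> import bisect
>>> base_lr, steps, gamma, min_lr = 0.01, [8, 11], 0.1, 1e-05
>>> def lr_at(epoch):
...     lr = base_lr * gamma ** bisect.bisect_right(steps, epoch)
...     return max(lr, min_lr)   # min_lr clipping
>>> [round(lr_at(e), 6) for e in (0, 8, 11)]
[0.01, 0.001, 0.0001]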
- class mmcv.runner.StepMomentumUpdaterHook(step: Union[int, List[int]], gamma: float = 0.5, min_momentum: Optional[float] = None, **kwargs)[source]¶
Step momentum scheduler with min value clipping.
- Parameters
step (int | list[int]) – Step to decay the momentum. If an int value is given, regard it as the decay interval. If a list is given, decay momentum at these steps.
gamma (float, optional) – Decay momentum ratio. Default: 0.5.
min_momentum (float, optional) – Minimum momentum value to keep. If momentum after decay is lower than this value, it will be clipped accordingly. If None is given, we don’t perform lr clipping. Default: None.
- class mmcv.runner.SyncBuffersHook(distributed: bool = True)[source]¶
Synchronize model buffers such as running_mean and running_var in BN at the end of each epoch.
- Parameters
distributed (bool) – Whether distributed training is used. It is effective only for distributed training. Defaults to True.
- class mmcv.runner.TensorboardLoggerHook(log_dir: Optional[str] = None, interval: int = 10, ignore_last: bool = True, reset_flag: bool = False, by_epoch: bool = True)[source]¶
Class to log metrics to Tensorboard.
- Parameters
log_dir (string) – Save directory location. Default: None. If default values are used, directory location is runner.work_dir/tf_logs.
interval (int) – Logging interval (every k iterations). Default: 10.
ignore_last (bool) – Ignore the log of last iterations in each epoch if less than interval. Default: True.
reset_flag (bool) – Whether to clear the output buffer after logging. Default: False.
by_epoch (bool) – Whether EpochBasedRunner is used. Default: True.
- class mmcv.runner.TextLoggerHook(by_epoch: bool = True, interval: int = 10, ignore_last: bool = True, reset_flag: bool = False, interval_exp_name: int = 1000, out_dir: Optional[str] = None, out_suffix: Union[str, tuple] = ('.log.json', '.log', '.py'), keep_local: bool = True, file_client_args: Optional[Dict] = None)[source]¶
Logger hook in text.
In this logger hook, the information will be printed on the terminal and saved in a json file.
- Parameters
by_epoch (bool, optional) – Whether EpochBasedRunner is used. Default: True.
interval (int, optional) – Logging interval (every k iterations). Default: 10.
ignore_last (bool, optional) – Ignore the log of last iterations in each epoch if less than interval. Default: True.
reset_flag (bool, optional) – Whether to clear the output buffer after logging. Default: False.
interval_exp_name (int, optional) – Logging interval for experiment name. This feature is to help users conveniently get the experiment information from screen or log file. Default: 1000.
out_dir (str, optional) – Logs are saved in runner.work_dir by default. If out_dir is specified, logs will be copied to a new directory which is the concatenation of out_dir and the last level directory of runner.work_dir. Default: None. New in version 1.3.16.
out_suffix (str or tuple[str], optional) – Those filenames ending with out_suffix will be copied to out_dir. Default: (‘.log.json’, ‘.log’, ‘.py’). New in version 1.3.16.
keep_local (bool, optional) – Whether to keep local log when out_dir is specified. If False, the local log will be removed. Default: True. New in version 1.3.16.
file_client_args (dict, optional) – Arguments to instantiate a FileClient. See mmcv.fileio.FileClient for details. Default: None. New in version 1.3.16.
- class mmcv.runner.WandbLoggerHook(init_kwargs: Optional[Dict] = None, interval: int = 10, ignore_last: bool = True, reset_flag: bool = False, commit: bool = True, by_epoch: bool = True, with_step: bool = True, log_artifact: bool = True, out_suffix: Union[str, tuple] = ('.log.json', '.log', '.py'), define_metric_cfg: Optional[Dict] = None)[source]¶
Class to log metrics with wandb.
It requires wandb to be installed.
- Parameters
init_kwargs (dict) – A dict containing the initialization keys. Check https://docs.wandb.ai/ref/python/init for more init arguments.
interval (int) – Logging interval (every k iterations). Default 10.
ignore_last (bool) – Ignore the log of last iterations in each epoch if less than interval. Default: True.
reset_flag (bool) – Whether to clear the output buffer after logging. Default: False.
commit (bool) – Save the metrics dict to the wandb server and increment the step. If false, wandb.log just updates the current metrics dict with the row argument and metrics won’t be saved until wandb.log is called with commit=True. Default: True.
by_epoch (bool) – Whether EpochBasedRunner is used. Default: True.
with_step (bool) – If True, the step will be logged from self.get_iters. Otherwise, step will not be logged. Default: True.
log_artifact (bool) – If True, artifacts in {work_dir} will be uploaded to wandb after training ends. Default: True. New in version 1.4.3.
out_suffix (str or tuple[str], optional) – Those filenames ending with out_suffix will be uploaded to wandb. Default: (‘.log.json’, ‘.log’, ‘.py’). New in version 1.4.3.
define_metric_cfg (dict, optional) – A dict of metrics and summaries for wandb.define_metric. The key is the metric and the value is the summary. The summary should be in [“min”, “max”, “mean”, “best”, “last”, “none”]. For example, if setting define_metric_cfg={'coco/bbox_mAP': 'max'}, the maximum value of coco/bbox_mAP will be logged on the wandb UI. See wandb docs for details. Defaults to None. New in version 1.6.3.
- mmcv.runner.allreduce_grads(params: List[torch.nn.parameter.Parameter], coalesce: bool = True, bucket_size_mb: int = - 1) → None[source]¶
Allreduce gradients.
- Parameters
params (list[torch.nn.Parameter]) – List of parameters of a model.
coalesce (bool, optional) – Whether allreduce parameters as a whole. Defaults to True.
bucket_size_mb (int, optional) – Size of bucket, the unit is MB. Defaults to -1.
- mmcv.runner.allreduce_params(params: List[torch.nn.parameter.Parameter], coalesce: bool = True, bucket_size_mb: int = - 1) → None[source]¶
Allreduce parameters.
- Parameters
params (list[torch.nn.Parameter]) – List of parameters or buffers of a model.
coalesce (bool, optional) – Whether allreduce parameters as a whole. Defaults to True.
bucket_size_mb (int, optional) – Size of bucket, the unit is MB. Defaults to -1.
- mmcv.runner.auto_fp16(apply_to: Optional[Iterable] = None, out_fp32: bool = False, supported_types: tuple = (<class 'torch.nn.modules.module.Module'>, )) → Callable[source]¶
Decorator to enable fp16 training automatically.
This decorator is useful when you write custom modules and want to support mixed precision training. If input arguments are fp32 tensors, they will be converted to fp16 automatically. Arguments other than fp32 tensors are ignored. If you are using PyTorch >= 1.6, torch.cuda.amp is used as the backend; otherwise, the original mmcv implementation will be adopted.
- Parameters
apply_to (Iterable, optional) – The argument names to be converted. None indicates all arguments.
out_fp32 (bool) – Whether to convert the output back to fp32.
supported_types (tuple) – Classes that can be decorated by auto_fp16. New in version 1.5.0.
Example
>>> import torch.nn as nn
>>> class MyModule1(nn.Module):
>>>
>>>     # Convert x and y to fp16
>>>     @auto_fp16()
>>>     def forward(self, x, y):
>>>         pass

>>> import torch.nn as nn
>>> class MyModule2(nn.Module):
>>>
>>>     # convert pred to fp16
>>>     @auto_fp16(apply_to=('pred', ))
>>>     def do_something(self, pred, others):
>>>         pass
- mmcv.runner.force_fp32(apply_to: Optional[Iterable] = None, out_fp16: bool = False) → Callable[source]¶
Decorator to convert input arguments to fp32 in force.
This decorator is useful when you write custom modules and want to support mixed precision training. If there are some inputs that must be processed in fp32 mode, then this decorator can handle it. If input arguments are fp16 tensors, they will be converted to fp32 automatically. Arguments other than fp16 tensors are ignored. If you are using PyTorch >= 1.6, torch.cuda.amp is used as the backend; otherwise, the original mmcv implementation will be adopted.
- Parameters
apply_to (Iterable, optional) – The argument names to be converted. None indicates all arguments.
out_fp16 (bool) – Whether to convert the output back to fp16.
Example
>>> import torch.nn as nn
>>> class MyModule1(nn.Module):
>>>
>>>     # Convert x and y to fp32
>>>     @force_fp32()
>>>     def loss(self, x, y):
>>>         pass

>>> import torch.nn as nn
>>> class MyModule2(nn.Module):
>>>
>>>     # convert pred to fp32
>>>     @force_fp32(apply_to=('pred', ))
>>>     def post_process(self, pred, others):
>>>         pass
- mmcv.runner.get_host_info() → str[source]¶
Get hostname and username.
Return an empty string if an exception is raised, e.g. getpass.getuser() will lead to an error in a docker container.
- mmcv.runner.get_priority(priority: Union[int, str, mmcv.runner.priority.Priority]) → int[source]¶
Get priority value.
- Parameters
priority (int or str or Priority) – Priority.
- Returns
The priority value.
- Return type
int
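For example (a sketch of the accepted input types):
>>> from mmcv.runner import Priority, get_priority
>>> get_priority('NORMAL')
50
>>> get_priority(Priority.VERY_HIGH)
10
>>> get_priority(30)
30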
- mmcv.runner.load_checkpoint(model: torch.nn.modules.module.Module, filename: str, map_location: Optional[Union[str, Callable]] = None, strict: bool = False, logger: Optional[logging.Logger] = None, revise_keys: list = [('^module\\.', '')]) → Union[dict, collections.OrderedDict][source]¶
Load checkpoint from a file or URI.
- Parameters
model (Module) – Module to load checkpoint.
filename (str) – Accept local filepath, URL, torchvision://xxx, open-mmlab://xxx. Please refer to docs/model_zoo.md for details.
map_location (str) – Same as torch.load().
strict (bool) – Whether to allow different params for the model and checkpoint.
logger (logging.Logger or None) – The logger for error message.
revise_keys (list) – A list of customized keywords to modify the state_dict in checkpoint. Each item is a (pattern, replacement) pair of the regular expression operations. Default: strip the prefix ‘module.’ by [(r'^module\.', '')].
- Returns
The loaded checkpoint.
- Return type
dict or OrderedDict
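A typical call might look like the following (the checkpoint path is illustrative, and `model` is assumed to exist):
>>> from mmcv.runner import load_checkpoint
>>> checkpoint = load_checkpoint(model, 'work_dir/latest.pth', map_location='cpu')
>>> meta = checkpoint.get('meta', {})   # present for checkpoints saved by mmcv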
- mmcv.runner.load_state_dict(module: torch.nn.modules.module.Module, state_dict: Union[dict, collections.OrderedDict], strict: bool = False, logger: Optional[logging.Logger] = None) → None[source]¶
Load state_dict to a module.
This method is modified from torch.nn.Module.load_state_dict(). Default value for strict is set to False and the message for param mismatch will be shown even if strict is False.
- Parameters
module (Module) – Module that receives the state_dict.
state_dict (dict or OrderedDict) – Weights.
strict (bool) – Whether to strictly enforce that the keys in state_dict match the keys returned by this module’s state_dict() function. Default: False.
logger (logging.Logger, optional) – Logger to log the error message. If not specified, the print function will be used.
- mmcv.runner.obj_from_dict(info: dict, parent: Optional[module] = None, default_args: Optional[dict] = None)[source]¶
Initialize an object from dict.
The dict must contain the key “type”, which indicates the object type; it can be either a string or a type, such as “list” or list. Remaining fields are treated as the arguments for constructing the object.
- Parameters
info (dict) – Object types and arguments.
parent (module) – Module which may contain the expected object classes.
default_args (dict, optional) – Default arguments for initializing the object.
- Returns
Object built from the dict.
- Return type
any type
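For example, building an optimizer from a config dict (a sketch; the hyperparameters are illustrative and `model` is assumed to exist):
>>> import torch.optim
>>> from mmcv.runner import obj_from_dict
>>> info = dict(type='SGD', lr=0.01, momentum=0.9)
>>> optimizer = obj_from_dict(
...     info, torch.optim, default_args=dict(params=model.parameters()))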
- mmcv.runner.save_checkpoint(model: torch.nn.modules.module.Module, filename: str, optimizer: Optional[torch.optim.optimizer.Optimizer] = None, meta: Optional[dict] = None, file_client_args: Optional[dict] = None) → None[source]¶
Save checkpoint to file.
The checkpoint will have 3 fields: meta, state_dict and optimizer. By default meta will contain version and time info.
- Parameters
model (Module) – Module whose params are to be saved.
filename (str) – Checkpoint filename.
optimizer (Optimizer, optional) – Optimizer to be saved.
meta (dict, optional) – Metadata to be saved in checkpoint.
file_client_args (dict, optional) – Arguments to instantiate a FileClient. See mmcv.fileio.FileClient for details. Default: None. New in version 1.3.16.
- mmcv.runner.set_random_seed(seed: int, deterministic: bool = False, use_rank_shift: bool = False) → None[source]¶
Set random seed.
- Parameters
seed (int) – Seed to be used.
deterministic (bool) – Whether to set the deterministic option for CUDNN backend, i.e., set torch.backends.cudnn.deterministic to True and torch.backends.cudnn.benchmark to False. Default: False.
use_rank_shift (bool) – Whether to add the rank number to the random seed to have different random seeds in different ranks. Default: False.
- mmcv.runner.weights_to_cpu(state_dict: collections.OrderedDict) → collections.OrderedDict[source]¶
Copy a model state_dict to cpu.
- Parameters
state_dict (OrderedDict) – Model weights on GPU.
- Returns
Model weights on CPU.
- Return type
OrderedDict
- mmcv.runner.wrap_fp16_model(model: torch.nn.modules.module.Module) → None[source]¶
Wrap the FP32 model to FP16.
If you are using PyTorch >= 1.6, torch.cuda.amp is used as the backend; otherwise, the original mmcv implementation will be adopted.
For PyTorch >= 1.6, this function will:
1. Set the fp16 flag inside the model to True.
Otherwise, it will:
1. Convert the FP32 model to FP16.
2. Keep some necessary layers in FP32, e.g., normalization layers.
3. Set the fp16_enabled flag inside the model to True.
- Parameters
model (nn.Module) – Model in FP32.
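A minimal usage sketch; for PyTorch >= 1.6 this only flips the model’s internal fp16 flag so that auto_fp16-decorated methods run under torch.cuda.amp:
>>> import torch.nn as nn
>>> from mmcv.runner import wrap_fp16_model
>>> model = nn.Sequential(nn.Conv2d(3, 8, 3), nn.BatchNorm2d(8))
>>> wrap_fp16_model(model)   # call once before training starts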
engine¶
- mmcv.engine.collect_results_cpu(result_part: list, size: int, tmpdir: Optional[str] = None) → Optional[list][source]¶
Collect results under cpu mode.
On cpu mode, this function will save the results on different gpus to tmpdir and collect them by the rank 0 worker.
- Parameters
result_part (list) – Result list containing result parts to be collected.
size (int) – Size of the results, commonly equal to length of the results.
tmpdir (str | None) – Temporary directory to store the collected results. If set to None, a random temporary directory will be created.
- Returns
The collected results.
- Return type
list
- mmcv.engine.collect_results_gpu(result_part: list, size: int) → Optional[list][source]¶
Collect results under gpu mode.
On gpu mode, this function will encode results to gpu tensors and use gpu communication for results collection.
- Parameters
result_part (list) – Result list containing result parts to be collected.
size (int) – Size of the results, commonly equal to length of the results.
- Returns
The collected results.
- Return type
list
- mmcv.engine.multi_gpu_test(model: torch.nn.modules.module.Module, data_loader: torch.utils.data.dataloader.DataLoader, tmpdir: Optional[str] = None, gpu_collect: bool = False) → Optional[list][source]¶
Test model with multiple gpus.
This method tests the model with multiple gpus and collects the results under two different modes: gpu and cpu modes. By setting gpu_collect=True, it encodes results to gpu tensors and uses gpu communication for results collection. On cpu mode it saves the results on different gpus to tmpdir and collects them by the rank 0 worker.
- Parameters
model (nn.Module) – Model to be tested.
data_loader (nn.Dataloader) – Pytorch data loader.
tmpdir (str) – Path of directory to save the temporary results from different gpus under cpu mode.
gpu_collect (bool) – Option to use either gpu or cpu to collect results.
- Returns
The prediction results.
- Return type
list
- mmcv.engine.single_gpu_test(model: torch.nn.modules.module.Module, data_loader: torch.utils.data.dataloader.DataLoader) → list[source]¶
Test model with a single gpu.
This method tests the model with a single gpu and displays a test progress bar.
- Parameters
model (nn.Module) – Model to be tested.
data_loader (nn.Dataloader) – Pytorch data loader.
- Returns
The prediction results.
- Return type
list
ops¶
- class mmcv.ops.BorderAlign(pool_size: int)[source]¶
Border align pooling layer.
Applies border_align over the input feature based on predicted bboxes. The details were described in the paper BorderDet: Border Feature for Dense Object Detection.
For each border line (e.g. top, left, bottom or right) of each box, border_align does the following:
1. uniformly samples pool_size+1 positions on this line, involving the start and end points;
2. the corresponding features at these points are computed by bilinear interpolation;
3. max pooling over all the pool_size+1 positions is used to compute the pooled feature.
- Parameters
pool_size (int) – number of positions sampled over the boxes’ borders (e.g. top, bottom, left, right).
- forward(input: torch.Tensor, boxes: torch.Tensor) → torch.Tensor[source]¶
- Parameters
input – Features with shape [N,4C,H,W]. Channels in [0,C), [C,2C), [2C,3C), [3C,4C) represent the top, left, bottom, right features respectively.
boxes – Boxes with shape [N,H*W,4]. Coordinate format (x1,y1,x2,y2).
- Returns
Pooled features with shape [N,C,H*W,4]. The order is (top,left,bottom,right) for the last dimension.
- Return type
torch.Tensor
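A shape-level sketch (assuming a CUDA build of the op; the box coordinates are constructed so that x1 < x2 and y1 < y2):
>>> import torch
>>> from mmcv.ops import BorderAlign
>>> N, C, H, W = 2, 8, 10, 10
>>> feat = torch.rand(N, 4 * C, H, W, device='cuda')
>>> xy1 = torch.rand(N, H * W, 2, device='cuda') * 4
>>> boxes = torch.cat([xy1, xy1 + 4], dim=-1)        # (x1, y1, x2, y2)
>>> pooled = BorderAlign(pool_size=3)(feat, boxes)   # (N, C, H*W, 4)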
- class mmcv.ops.CARAFE(kernel_size: int, group_size: int, scale_factor: int)[source]¶
CARAFE: Content-Aware ReAssembly of FEatures
Please refer to CARAFE: Content-Aware ReAssembly of FEatures for more details.
- Parameters
kernel_size (int) – reassemble kernel size
group_size (int) – reassemble group size
scale_factor (int) – upsample ratio
- Returns
upsampled feature map
- forward(features: torch.Tensor, masks: torch.Tensor) → torch.Tensor[source]¶
Defines the computation performed at every call.
Should be overridden by all subclasses.
Note
Although the recipe for forward pass needs to be defined within this function, one should call the
Module
instance afterwards instead of this since the former takes care of running the registered hooks while the latter silently ignores them.
- class mmcv.ops.CARAFENaive(kernel_size: int, group_size: int, scale_factor: int)[source]¶
- forward(features: torch.Tensor, masks: torch.Tensor) → torch.Tensor[source]¶
Defines the computation performed at every call.
Should be overridden by all subclasses.
Note
Although the recipe for forward pass needs to be defined within this function, one should call the
Module
instance afterwards instead of this since the former takes care of running the registered hooks while the latter silently ignores them.
- class mmcv.ops.CARAFEPack(channels: int, scale_factor: int, up_kernel: int = 5, up_group: int = 1, encoder_kernel: int = 3, encoder_dilation: int = 1, compressed_channels: int = 64)[source]¶
A unified package of CARAFE upsampler that contains: 1) channel compressor 2) content encoder 3) CARAFE op.
Official implementation of ICCV 2019 paper CARAFE: Content-Aware ReAssembly of FEatures.
- Parameters
channels (int) – input feature channels
scale_factor (int) – upsample ratio
up_kernel (int) – kernel size of CARAFE op
up_group (int) – group size of CARAFE op
encoder_kernel (int) – kernel size of content encoder
encoder_dilation (int) – dilation of content encoder
compressed_channels (int) – output channels of the channel compressor
- Returns
upsampled feature map
- forward(x: torch.Tensor) → torch.Tensor[source]¶
Defines the computation performed at every call.
Should be overridden by all subclasses.
Note
Although the recipe for forward pass needs to be defined within this function, one should call the
Module
instance afterwards instead of this since the former takes care of running the registered hooks while the latter silently ignores them.
- mmcv.ops.Conv2d¶
alias of mmcv.ops.deprecated_wrappers.Conv2d_deprecated
- mmcv.ops.ConvTranspose2d¶
alias of mmcv.ops.deprecated_wrappers.ConvTranspose2d_deprecated
- class mmcv.ops.CornerPool(mode: str)[source]¶
Corner Pooling.
Corner Pooling is a new type of pooling layer that helps a convolutional network better localize corners of bounding boxes.
Please refer to CornerNet: Detecting Objects as Paired Keypoints for more details.
Code is modified from https://github.com/princeton-vl/CornerNet-Lite.
- Parameters
mode (str) –
Pooling orientation for the pooling layer
’bottom’: Bottom Pooling
’left’: Left Pooling
’right’: Right Pooling
’top’: Top Pooling
- Returns
Feature map after pooling.
- forward(x: torch.Tensor) → torch.Tensor[source]¶
Defines the computation performed at every call.
Should be overridden by all subclasses.
Note
Although the recipe for forward pass needs to be defined within this function, one should call the
Module
instance afterwards instead of this since the former takes care of running the registered hooks while the latter silently ignores them.
- class mmcv.ops.Correlation(kernel_size: int = 1, max_displacement: int = 1, stride: int = 1, padding: int = 0, dilation: int = 1, dilation_patch: int = 1)[source]¶
Correlation operator.
This correlation operator works for optical flow correlation computation.
There are two batched tensors with shape \((N, C, H, W)\), and the correlation output’s shape is \((N, max\_displacement \times 2 + 1, max\_displacement \times 2 + 1, H_{out}, W_{out})\)
where
\[H_{out} = \left\lfloor\frac{H_{in} + 2 \times padding - dilation \times (kernel\_size - 1) - 1}{stride} + 1\right\rfloor\]
\[W_{out} = \left\lfloor\frac{W_{in} + 2 \times padding - dilation \times (kernel\_size - 1) - 1}{stride} + 1\right\rfloor\]
the correlation item \((N_i, dy, dx)\) is formed by taking the sliding window convolution between input1 and shifted input2,
\[Corr(N_i, dx, dy) = \sum_{c=0}^{C-1} input1(N_i, c) \star \mathcal{S}(input2(N_i, c), dy, dx)\]
where \(\star\) is the valid 2d sliding window convolution operator, and \(\mathcal{S}\) means shifting the input features (auto-complete zero marginal), and \(dx, dy\) are shifting distances, \(dx, dy \in [-max\_displacement \times dilation\_patch, max\_displacement \times dilation\_patch]\).
- Parameters
kernel_size (int) – The size of sliding window i.e. local neighborhood representing the center points and involved in correlation computation. Defaults to 1.
max_displacement (int) – The radius for computing correlation volume, but the actual working space can be dilated by dilation_patch. Defaults to 1.
stride (int) – The stride of the sliding blocks in the input spatial dimensions. Defaults to 1.
padding (int) – Zero padding added to all four sides of the input1. Defaults to 0.
dilation (int) – The spacing of local neighborhood that will involved in correlation. Defaults to 1.
dilation_patch (int) – The spacing between position need to compute correlation. Defaults to 1.
- forward(input1: torch.Tensor, input2: torch.Tensor) → torch.Tensor[source]¶
Defines the computation performed at every call.
Should be overridden by all subclasses.
Note
Although the recipe for forward pass needs to be defined within this function, one should call the
Module
instance afterwards instead of this since the former takes care of running the registered hooks while the latter silently ignores them.
- class mmcv.ops.CrissCrossAttention(in_channels: int)[source]¶
Criss-Cross Attention Module.
Note
Before v1.3.13, we use a CUDA op. Since v1.3.13, we switch to a pure PyTorch and equivalent implementation. For more details, please refer to https://github.com/open-mmlab/mmcv/pull/1201.
Speed comparison for one forward pass (input size: [2,512,97,97]; device: 1 NVIDIA GeForce RTX 2080 Ti)

                             PyTorch version   CUDA version   Relative speed
with torch.no_grad()         0.00554402 s      0.0299619 s    5.4x
without torch.no_grad()      0.00562803 s      0.0301349 s    5.4x
- Parameters
in_channels (int) – Channels of the input feature map.
- forward(x: torch.Tensor) → torch.Tensor[source]¶
forward function of Criss-Cross Attention.
- Parameters
x (torch.Tensor) – Input feature with the shape of (batch_size, in_channels, height, width).
- Returns
Output of the layer, with the shape of (batch_size, in_channels, height, width)
- Return type
torch.Tensor
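Since the implementation is pure PyTorch, a quick shape check runs on CPU:
>>> import torch
>>> from mmcv.ops import CrissCrossAttention
>>> cca = CrissCrossAttention(in_channels=512)
>>> x = torch.randn(2, 512, 97, 97)
>>> out = cca(x)   # same shape as the input: (2, 512, 97, 97)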
- class mmcv.ops.DeformConv2d(in_channels: int, out_channels: int, kernel_size: Union[int, Tuple[int, ...]], stride: Union[int, Tuple[int, ...]] = 1, padding: Union[int, Tuple[int, ...]] = 0, dilation: Union[int, Tuple[int, ...]] = 1, groups: int = 1, deform_groups: int = 1, bias: bool = False, im2col_step: int = 32)[source]¶
Deformable 2D convolution.
Applies a deformable 2D convolution over an input signal composed of several input planes. DeformConv2d was described in the paper Deformable Convolutional Networks
Note
The argument im2col_step was added in version 1.3.17, which means the number of samples processed by the im2col_cuda_kernel per call. It enables users to define batch_size and im2col_step more flexibly and solved issue mmcv#1440.
- Parameters
in_channels (int) – Number of channels in the input image.
out_channels (int) – Number of channels produced by the convolution.
kernel_size (int, tuple) – Size of the convolving kernel.
stride (int, tuple) – Stride of the convolution. Default: 1.
padding (int or tuple) – Zero-padding added to both sides of the input. Default: 0.
dilation (int or tuple) – Spacing between kernel elements. Default: 1.
groups (int) – Number of blocked connections from input channels to output channels. Default: 1.
deform_groups (int) – Number of deformable group partitions.
bias (bool) – If True, adds a learnable bias to the output. Default: False.
im2col_step (int) – Number of samples processed by im2col_cuda_kernel per call. It will work when batch_size > im2col_step, but batch_size must be divisible by im2col_step. Default: 32. New in version 1.3.17.
- forward(x: torch.Tensor, offset: torch.Tensor) → torch.Tensor[source]¶
Deformable Convolutional forward function.
- Parameters
x (Tensor) – Input feature, shape (B, C_in, H_in, W_in)
offset (Tensor) –
Offset for deformable convolution, shape (B, deform_groups*kernel_size[0]*kernel_size[1]*2, H_out, W_out), H_out, W_out are equal to the output’s.
An offset is like [y0, x0, y1, x1, y2, x2, …, y8, x8]. The spatial arrangement is like:
(x0, y0) (x1, y1) (x2, y2)
(x3, y3) (x4, y4) (x5, y5)
(x6, y6) (x7, y7) (x8, y8)
- Returns
Output of the layer.
- Return type
Tensor
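A shape-level sketch (assuming a CUDA build; the offset is random here purely to illustrate its channel layout):
>>> import torch
>>> from mmcv.ops import DeformConv2d
>>> conv = DeformConv2d(16, 32, kernel_size=3, padding=1).cuda()
>>> x = torch.randn(1, 16, 8, 8, device='cuda')
>>> # offset channels: deform_groups * 2 * kh * kw = 1 * 2 * 3 * 3 = 18
>>> offset = torch.randn(1, 18, 8, 8, device='cuda')
>>> out = conv(x, offset)   # (1, 32, 8, 8)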
- class mmcv.ops.DeformConv2dPack(*args, **kwargs)[source]¶
A Deformable Conv Encapsulation that acts as normal Conv layers.
The offset tensor is like [y0, x0, y1, x1, y2, x2, …, y8, x8]. The spatial arrangement is like:
(x0, y0) (x1, y1) (x2, y2)
(x3, y3) (x4, y4) (x5, y5)
(x6, y6) (x7, y7) (x8, y8)
- Parameters
in_channels (int) – Same as nn.Conv2d.
out_channels (int) – Same as nn.Conv2d.
kernel_size (int or tuple[int]) – Same as nn.Conv2d.
stride (int or tuple[int]) – Same as nn.Conv2d.
padding (int or tuple[int]) – Same as nn.Conv2d.
dilation (int or tuple[int]) – Same as nn.Conv2d.
groups (int) – Same as nn.Conv2d.
bias (bool or str) – If specified as auto, it will be decided by the norm_cfg. Bias will be set as True if norm_cfg is None, otherwise False.
- forward(x: torch.Tensor) → torch.Tensor[source]¶
Deformable Convolutional forward function.
- Parameters
x (Tensor) – Input feature, shape (B, C_in, H_in, W_in)
offset (Tensor) –
Offset for deformable convolution, shape (B, deform_groups*kernel_size[0]*kernel_size[1]*2, H_out, W_out), H_out, W_out are equal to the output’s.
An offset is like [y0, x0, y1, x1, y2, x2, …, y8, x8]. The spatial arrangement is like:
(x0, y0) (x1, y1) (x2, y2)
(x3, y3) (x4, y4) (x5, y5)
(x6, y6) (x7, y7) (x8, y8)
- Returns
Output of the layer.
- Return type
Tensor
- class mmcv.ops.DeformRoIPool(output_size: Tuple[int, ...], spatial_scale: float = 1.0, sampling_ratio: int = 0, gamma: float = 0.1)[source]¶
- forward(input: torch.Tensor, rois: torch.Tensor, offset: Optional[torch.Tensor] = None) → torch.Tensor[source]¶
Defines the computation performed at every call.
Should be overridden by all subclasses.
Note
Although the recipe for forward pass needs to be defined within this function, one should call the
Module
instance afterwards instead of this since the former takes care of running the registered hooks while the latter silently ignores them.
- class mmcv.ops.DeformRoIPoolPack(output_size: Tuple[int, ...], output_channels: int, deform_fc_channels: int = 1024, spatial_scale: float = 1.0, sampling_ratio: int = 0, gamma: float = 0.1)[source]¶
- forward(input: torch.Tensor, rois: torch.Tensor) → torch.Tensor[source]¶
Defines the computation performed at every call.
Should be overridden by all subclasses.
Note
Although the recipe for forward pass needs to be defined within this function, one should call the
Module
instance afterwards instead of this since the former takes care of running the registered hooks while the latter silently ignores them.
- class mmcv.ops.DynamicScatter(voxel_size: List, point_cloud_range: List, average_points: bool)[source]¶
Scatters points into voxels, used in the voxel encoder with dynamic voxelization.
Note
The CPU and GPU implementations produce the same output, but may have a small numerical difference after summation and division (e.g., 5e-7).
- Parameters
voxel_size (list) – list [x, y, z] size of three dimension.
point_cloud_range (list) – The coordinate range of points, [x_min, y_min, z_min, x_max, y_max, z_max].
average_points (bool) – whether to use avg pooling to scatter points into voxel.
- forward(points: torch.Tensor, coors: torch.Tensor) → Tuple[torch.Tensor, torch.Tensor][source]¶
Scatters points/features into voxels.
- Parameters
points (torch.Tensor) – Points to be reduced into voxels.
coors (torch.Tensor) – Corresponding voxel coordinates (specifically multi-dim voxel index) of each point.
- Returns
A tuple containing two elements. The first one is the voxel features with shape [M, C] which are respectively reduced from input features that share the same voxel coordinates. The second is voxel coordinates with shape [M, ndim].
- Return type
tuple[torch.Tensor]
- forward_single(points: torch.Tensor, coors: torch.Tensor) → Tuple[torch.Tensor, torch.Tensor][source]¶
Scatters points into voxels.
- Parameters
points (torch.Tensor) – Points to be reduced into voxels.
coors (torch.Tensor) – Corresponding voxel coordinates (specifically multi-dim voxel index) of each point.
- Returns
A tuple containing two elements. The first one is the voxel features with shape [M, C] which are respectively reduced from input features that share the same voxel coordinates. The second is voxel coordinates with shape [M, ndim].
- Return type
tuple[torch.Tensor]
- class mmcv.ops.FusedBiasLeakyReLU(num_channels: int, negative_slope: float = 0.2, scale: float = 1.4142135623730951)[source]¶
Fused bias leaky ReLU.
This function is introduced in the StyleGAN2: Analyzing and Improving the Image Quality of StyleGAN
The bias term comes from the convolution operation. In addition, to keep the variance of the feature map or gradients unchanged, they also adopt a scale similar to Kaiming initialization. However, since \(1+\alpha^2\) is very close to 1, we can just ignore it. Therefore, the final scale is just \(\sqrt{2}\). Of course, you may change it with your own scale.
TODO: Implement the CPU version.
- Parameters
num_channels (int) – The channel number of the feature map.
negative_slope (float, optional) – Same as nn.LeakyRelu. Defaults to 0.2.
scale (float, optional) – A scalar to adjust the variance of the feature map. Defaults to 2**0.5.
- forward(input: torch.Tensor) → torch.Tensor[source]¶
Defines the computation performed at every call.
Should be overridden by all subclasses.
Note
Although the recipe for forward pass needs to be defined within this function, one should call the
Module
instance afterwards instead of this since the former takes care of running the registered hooks while the latter silently ignores them.
- class mmcv.ops.GroupAll(use_xyz: bool = True)[source]¶
Group xyz with feature.
- Parameters
use_xyz (bool) – Whether to use xyz.
- forward(xyz: torch.Tensor, new_xyz: torch.Tensor, features: Optional[torch.Tensor] = None) → torch.Tensor[source]¶
- Parameters
xyz (Tensor) – (B, N, 3) xyz coordinates of the features.
new_xyz (Tensor) – new xyz coordinates of the features.
features (Tensor) – (B, C, N) features to group.
- Returns
(B, C + 3, 1, N) Grouped feature.
- Return type
Tensor
- mmcv.ops.Linear¶
alias of mmcv.ops.deprecated_wrappers.Linear_deprecated
- class mmcv.ops.MaskedConv2d(in_channels: int, out_channels: int, kernel_size: Union[int, Tuple[int, ...]], stride: int = 1, padding: int = 0, dilation: int = 1, groups: int = 1, bias: bool = True)[source]¶
A MaskedConv2d which inherits the official Conv2d.
The masked forward doesn’t implement the backward function and only supports the stride parameter to be 1 currently.
- forward(input: torch.Tensor, mask: Optional[torch.Tensor] = None) → torch.Tensor[source]¶
Defines the computation performed at every call.
Should be overridden by all subclasses.
Note
Although the recipe for forward pass needs to be defined within this function, one should call the
Module
instance afterwards instead of this since the former takes care of running the registered hooks while the latter silently ignores them.
- mmcv.ops.MaxPool2d¶
alias of mmcv.ops.deprecated_wrappers.MaxPool2d_deprecated
- class mmcv.ops.ModulatedDeformConv2d(in_channels: int, out_channels: int, kernel_size: Union[int, Tuple[int]], stride: int = 1, padding: int = 0, dilation: int = 1, groups: int = 1, deform_groups: int = 1, bias: Union[bool, str] = True)[source]¶
- forward(x: torch.Tensor, offset: torch.Tensor, mask: torch.Tensor) → torch.Tensor[source]¶
Defines the computation performed at every call.
Should be overridden by all subclasses.
Note
Although the recipe for forward pass needs to be defined within this function, one should call the
Module
instance afterwards instead of this since the former takes care of running the registered hooks while the latter silently ignores them.
- class mmcv.ops.ModulatedDeformConv2dPack(*args, **kwargs)[source]¶
A ModulatedDeformable Conv Encapsulation that acts as normal Conv layers.
- Parameters
in_channels (int) – Same as nn.Conv2d.
out_channels (int) – Same as nn.Conv2d.
kernel_size (int or tuple[int]) – Same as nn.Conv2d.
stride (int) – Same as nn.Conv2d, while tuple is not supported.
padding (int) – Same as nn.Conv2d, while tuple is not supported.
dilation (int) – Same as nn.Conv2d, while tuple is not supported.
groups (int) – Same as nn.Conv2d.
bias (bool or str) – If specified as auto, it will be decided by the norm_cfg. Bias will be set as True if norm_cfg is None, otherwise False.
- forward(x: torch.Tensor) → torch.Tensor[source]¶
Defines the computation performed at every call.
Should be overridden by all subclasses.
Note
Although the recipe for forward pass needs to be defined within this function, one should call the
Module
instance afterwards instead of this since the former takes care of running the registered hooks while the latter silently ignores them.
- class mmcv.ops.ModulatedDeformRoIPoolPack(output_size: Tuple[int, ...], output_channels: int, deform_fc_channels: int = 1024, spatial_scale: float = 1.0, sampling_ratio: int = 0, gamma: float = 0.1)[source]¶
- forward(input: torch.Tensor, rois: torch.Tensor) → torch.Tensor[source]¶
Defines the computation performed at every call.
Should be overridden by all subclasses.
Note
Although the recipe for forward pass needs to be defined within this function, one should call the
Module
instance afterwards instead of this since the former takes care of running the registered hooks while the latter silently ignores them.
- class mmcv.ops.MultiScaleDeformableAttention(embed_dims: int = 256, num_heads: int = 8, num_levels: int = 4, num_points: int = 4, im2col_step: int = 64, dropout: float = 0.1, batch_first: bool = False, norm_cfg: Optional[dict] = None, init_cfg: Optional[mmcv.utils.config.ConfigDict] = None)[source]¶
An attention module used in Deformable-Detr.
Deformable DETR: Deformable Transformers for End-to-End Object Detection.
- Parameters
embed_dims (int) – The embedding dimension of Attention. Default: 256.
num_heads (int) – Parallel attention heads. Default: 8.
num_levels (int) – The number of feature map used in Attention. Default: 4.
num_points (int) – The number of sampling points for each query in each head. Default: 4.
im2col_step (int) – The step used in image_to_column. Default: 64.
dropout (float) – A Dropout layer on inp_identity. Default: 0.1.
batch_first (bool) – Whether Key, Query and Value are of shape (batch, n, embed_dim) rather than (n, batch, embed_dim). Default: False.
norm_cfg (dict) – Config dict for normalization layer. Default: None.
init_cfg (obj:mmcv.ConfigDict, optional) – The Config for initialization. Default: None.
- forward(query: torch.Tensor, key: Optional[torch.Tensor] = None, value: Optional[torch.Tensor] = None, identity: Optional[torch.Tensor] = None, query_pos: Optional[torch.Tensor] = None, key_padding_mask: Optional[torch.Tensor] = None, reference_points: Optional[torch.Tensor] = None, spatial_shapes: Optional[torch.Tensor] = None, level_start_index: Optional[torch.Tensor] = None, **kwargs) → torch.Tensor[source]¶
Forward Function of MultiScaleDeformAttention.
- Parameters
query (torch.Tensor) – Query of Transformer with shape (num_query, bs, embed_dims).
key (torch.Tensor) – The key tensor with shape (num_key, bs, embed_dims).
value (torch.Tensor) – The value tensor with shape (num_key, bs, embed_dims).
identity (torch.Tensor) – The tensor used for addition, with the same shape as query. Default None. If None, query will be used.
query_pos (torch.Tensor) – The positional encoding for query. Default: None.
key_padding_mask (torch.Tensor) – ByteTensor for query, with shape [bs, num_key].
reference_points (torch.Tensor) – The normalized reference points with shape (bs, num_query, num_levels, 2), all elements in the range [0, 1], top-left (0,0), bottom-right (1,1), including padding area; or (N, Length_{query}, num_levels, 4), where the additional two dimensions (w, h) form reference boxes.
spatial_shapes (torch.Tensor) – Spatial shape of features in different levels. With shape (num_levels, 2), last dimension represents (h, w).
level_start_index (torch.Tensor) – The start index of each level. A tensor with shape (num_levels, ), which can be represented as [0, h_0*w_0, h_0*w_0+h_1*w_1, …].
- Returns
forwarded results with shape [num_query, bs, embed_dims].
- Return type
torch.Tensor
- class mmcv.ops.PSAMask(psa_type: str, mask_size: Optional[tuple] = None)[source]¶
- forward(input: torch.Tensor) → torch.Tensor[source]¶
Defines the computation performed at every call.
Should be overridden by all subclasses.
Note
Although the recipe for forward pass needs to be defined within this function, one should call the
Module
instance afterwards instead of this since the former takes care of running the registered hooks while the latter silently ignores them.
- class mmcv.ops.PointsSampler(num_point: List[int], fps_mod_list: List[str] = ['D-FPS'], fps_sample_range_list: List[int] = [- 1])[source]¶
Points sampling.
- Parameters
num_point (list[int]) – Number of sample points.
fps_mod_list (list[str], optional) – Type of FPS method, valid modes [‘F-FPS’, ‘D-FPS’, ‘FS’]. Default: [‘D-FPS’]. F-FPS: using feature distances for FPS. D-FPS: using Euclidean distances of points for FPS. FS: using F-FPS and D-FPS simultaneously.
fps_sample_range_list (list[int], optional) – Range of points to apply FPS. Default: [-1].
- forward(points_xyz: torch.Tensor, features: torch.Tensor) → torch.Tensor[source]¶
- Parameters
points_xyz (torch.Tensor) – (B, N, 3) xyz coordinates of the points.
features (torch.Tensor) – (B, C, N) features of the points.
- Returns
(B, npoint, sample_num) Indices of sampled points.
- Return type
torch.Tensor
- class mmcv.ops.PrRoIPool(output_size: Union[int, tuple], spatial_scale: float = 1.0)[source]¶
The operation of precision RoI pooling. The implementation of PrRoIPool is modified from https://github.com/vacancy/PreciseRoIPooling/
Precise RoI Pooling (PrRoIPool) is an integration-based (bilinear interpolation) average pooling method for RoI Pooling. It avoids any quantization and has a continuous gradient on bounding box coordinates. It is:
1. It is different from the original RoI Pooling proposed in Fast R-CNN. PrRoI Pooling uses average pooling instead of max pooling for each bin and has a continuous gradient on bounding box coordinates. That is, one can take the derivatives of some loss function w.r.t the coordinates of each RoI and optimize the RoI coordinates.
2. It is different from the RoI Align proposed in Mask R-CNN. PrRoI Pooling uses a full integration-based average pooling instead of sampling a constant number of points. This makes the gradient w.r.t. the coordinates continuous.
- Parameters
output_size (Union[int, tuple]) – h, w.
spatial_scale (float, optional) – scale the input boxes by this number. Defaults to 1.0.
- class mmcv.ops.QueryAndGroup(max_radius: float, sample_num: int, min_radius: float = 0.0, use_xyz: bool = True, return_grouped_xyz: bool = False, normalize_xyz: bool = False, uniform_sample: bool = False, return_unique_cnt: bool = False