fileio

class mmcv.fileio.BaseStorageBackend[source]

Abstract class of storage backends.

All backends need to implement two APIs: get() and get_text(). get() reads the file as a byte stream, and get_text() reads the file as text.
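As a rough illustration of this contract, the sketch below defines a stand-in abstract base class and a toy in-memory backend. `StorageBackendSketch` and `InMemoryBackend` are hypothetical names used only for this example; they are not part of mmcv.

```python
from abc import ABC, abstractmethod


class StorageBackendSketch(ABC):
    """Stand-in for the backend contract: subclasses provide get() and get_text()."""

    @abstractmethod
    def get(self, filepath):
        """Return the file content as bytes."""

    @abstractmethod
    def get_text(self, filepath, encoding='utf-8'):
        """Return the file content as str."""


class InMemoryBackend(StorageBackendSketch):
    """Toy backend (hypothetical) that serves content from a dict."""

    def __init__(self, files):
        self._files = files  # maps path -> bytes

    def get(self, filepath):
        return self._files[str(filepath)]

    def get_text(self, filepath, encoding='utf-8'):
        return self.get(filepath).decode(encoding)


backend = InMemoryBackend({'a.txt': b'hello'})
print(backend.get('a.txt'))       # raw bytes
print(backend.get_text('a.txt'))  # decoded text
```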

class mmcv.fileio.FileClient(backend=None, prefix=None, **kwargs)[source]

A general file client to access files in different backends.

The client loads a file or text from its path in a specified backend and returns it as binary or text data. There are two ways to choose a backend: by the name of the backend or by the prefix of the path. Both can be used to choose a storage backend, but backend has the higher priority: if both are set, the storage backend is chosen by the backend argument. If both are None, the disk backend is chosen. Other backend accessors can also be registered with a given name, prefixes, and backend class. In addition, the singleton pattern is used to avoid repeated object creation: if the arguments are the same, the same object is returned.

Parameters
  • backend (str, optional) – The storage backend type. Options are “disk”, “ceph”, “memcached”, “lmdb”, “http” and “petrel”. Default: None.

  • prefix (str, optional) – The prefix of the registered storage backend. Options are “s3”, “http”, “https”. Default: None.

Examples

>>> # only set backend
>>> file_client = FileClient(backend='petrel')
>>> # only set prefix
>>> file_client = FileClient(prefix='s3')
>>> # set both backend and prefix but use backend to choose client
>>> file_client = FileClient(backend='petrel', prefix='s3')
>>> # if the arguments are the same, the same object is returned
>>> file_client1 = FileClient(backend='petrel')
>>> file_client1 is file_client
True
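The selection rule and the instance caching described above can be sketched without mmcv. Everything in this block (`ClientSketch` and its two lookup tables, including the assumed mapping from the `s3` prefix to `petrel`) is hypothetical and only mirrors the documented behaviour: `backend` wins over `prefix`, `disk` is the fallback, and equal arguments return the same object.

```python
class ClientSketch:
    """Hypothetical stand-in that mirrors the documented selection rules."""

    _backends = {'disk', 'petrel', 'http'}  # assumed subset of backend names
    _prefix_to_backend = {'s3': 'petrel', 'http': 'http', 'https': 'http'}
    _instances = {}                         # cache keyed by the arguments

    def __new__(cls, backend=None, prefix=None):
        key = (backend, prefix)
        if key in cls._instances:
            return cls._instances[key]      # same arguments -> same object
        if backend is not None:
            chosen = backend                # backend has priority over prefix
        elif prefix is not None:
            chosen = cls._prefix_to_backend[prefix]
        else:
            chosen = 'disk'                 # both None -> disk backend
        if chosen not in cls._backends:
            raise ValueError(f'unsupported backend: {chosen}')
        instance = super().__new__(cls)
        instance.backend_name = chosen
        cls._instances[key] = instance
        return instance


default_client = ClientSketch()
by_prefix = ClientSketch(prefix='s3')
both = ClientSketch(backend='disk', prefix='s3')
print(default_client.backend_name, by_prefix.backend_name, both.backend_name)
```

Keying the cache on the argument tuple is one simple way to get singleton-like behaviour; the real class may key its cache differently.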
client

The backend object.

Type

BaseStorageBackend

exists(filepath: Union[str, pathlib.Path])bool[source]

Check whether a file path exists.

Parameters

filepath (str or Path) – Path whose existence is to be checked.

Returns

Return True if filepath exists, False otherwise.

Return type

bool

get(filepath: Union[str, pathlib.Path])Union[bytes, memoryview][source]

Read data from a given filepath with ‘rb’ mode.

Note

get has two possible return types: bytes and memoryview. The advantage of memoryview is that it avoids copying; to convert it to bytes, call .tobytes().

Parameters

filepath (str or Path) – Path to read data.

Returns

Expected bytes object or a memory view of the bytes object.

Return type

bytes | memoryview
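The difference between the two return types can be seen with plain Python objects. In this sketch a `bytearray` stands in for a buffer owned by a backend; no mmcv call is involved.

```python
data = bytearray(b'abcdef')   # stands in for a buffer owned by a backend
view = memoryview(data)       # no copy: the view shares the buffer

chunk = view[1:4]             # slicing a memoryview does not copy either
copied = chunk.tobytes()      # explicit copy into an immutable bytes object

data[1] = ord('X')            # mutate the underlying buffer
print(bytes(chunk))           # the view sees the change
print(copied)                 # the earlier copy does not
```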

get_local_path(filepath: Union[str, pathlib.Path])Iterable[str][source]

Download data from filepath and write the data to local path.

get_local_path is decorated by contextlib.contextmanager(). It can be called with a with statement, and the temporary path is released on exit from the with statement.

Note

If the filepath is a local path, just return itself.

Warning

get_local_path is an experimental interface that may change in the future.

Parameters

filepath (str or Path) – Path to read data from.

Examples

>>> file_client = FileClient(prefix='s3')
>>> with file_client.get_local_path('s3://bucket/abc.jpg') as path:
...     # do something here
Yields

Iterable[str] – Only yield one path.
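The context-manager pattern behind get_local_path can be sketched with the standard library. `local_path_sketch` is a hypothetical helper, and its `content` argument stands in for data that a remote backend would download; the cleanup strategy shown here is an assumption.

```python
import contextlib
import os
import tempfile


@contextlib.contextmanager
def local_path_sketch(content, suffix=''):
    """Write `content` (bytes) to a temporary file and yield its path."""
    fd, path = tempfile.mkstemp(suffix=suffix)
    try:
        with os.fdopen(fd, 'wb') as f:
            f.write(content)
        yield path            # exactly one path is yielded
    finally:
        os.remove(path)       # released when the with block exits


with local_path_sketch(b'fake image bytes', suffix='.jpg') as path:
    existed_inside = os.path.exists(path)
    with open(path, 'rb') as f:
        read_back = f.read()
existed_after = os.path.exists(path)
print(existed_inside, existed_after)
```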

get_text(filepath: Union[str, pathlib.Path], encoding='utf-8')str[source]

Read data from a given filepath with ‘r’ mode.

Parameters
  • filepath (str or Path) – Path to read data.

  • encoding (str) – The encoding format used to open the filepath. Default: ‘utf-8’.

Returns

Expected text reading from filepath.

Return type

str

classmethod infer_client(file_client_args: Optional[dict] = None, uri: Optional[Union[str, pathlib.Path]] = None)mmcv.fileio.file_client.FileClient[source]

Infer a suitable file client based on the URI and arguments.

Parameters
  • file_client_args (dict, optional) – Arguments to instantiate a FileClient. Default: None.

  • uri (str | Path, optional) – Uri to be parsed that contains the file prefix. Default: None.

Examples

>>> uri = 's3://path/of/your/file'
>>> file_client = FileClient.infer_client(uri=uri)
>>> file_client_args = {'backend': 'petrel'}
>>> file_client = FileClient.infer_client(file_client_args)
Returns

Instantiated FileClient object.

Return type

FileClient

isdir(filepath: Union[str, pathlib.Path])bool[source]

Check whether a file path is a directory.

Parameters

filepath (str or Path) – Path to be checked whether it is a directory.

Returns

Return True if filepath points to a directory, False otherwise.

Return type

bool

isfile(filepath: Union[str, pathlib.Path])bool[source]

Check whether a file path is a file.

Parameters

filepath (str or Path) – Path to be checked whether it is a file.

Returns

Return True if filepath points to a file, False otherwise.

Return type

bool

join_path(filepath: Union[str, pathlib.Path], *filepaths: Union[str, pathlib.Path])str[source]

Concatenate all file paths.

Join one or more filepath components intelligently. The return value is the concatenation of filepath and any members of *filepaths.

Parameters

filepath (str or Path) – Path to be concatenated.

Returns

The result of concatenation.

Return type

str

list_dir_or_file(dir_path: Union[str, pathlib.Path], list_dir: bool = True, list_file: bool = True, suffix: Optional[Union[str, Tuple[str]]] = None, recursive: bool = False)Iterator[str][source]

Scan a directory to find the directories or files of interest, in arbitrary order.

Note

list_dir_or_file() returns the path relative to dir_path.

Parameters
  • dir_path (str | Path) – Path of the directory.

  • list_dir (bool) – List the directories. Default: True.

  • list_file (bool) – List the path of files. Default: True.

  • suffix (str or tuple[str], optional) – File suffix that we are interested in. Default: None.

  • recursive (bool) – If set to True, recursively scan the directory. Default: False.

Yields

Iterable[str] – A relative path to dir_path.
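For a local disk, the documented behaviour (relative paths, optional suffix filter, optional recursion) could look roughly like the sketch below. `list_dir_or_file_sketch` is a hypothetical function built on `os.scandir`; it is not mmcv's implementation and skips any argument validation the real method may perform.

```python
import os
import tempfile


def list_dir_or_file_sketch(dir_path, list_dir=True, list_file=True,
                            suffix=None, recursive=False):
    """Yield paths relative to dir_path (local-disk sketch only)."""
    root = str(dir_path)

    def _scan(current):
        for entry in os.scandir(current):
            rel = os.path.relpath(entry.path, root)
            if entry.is_file():
                # str.endswith accepts a str or a tuple of str
                if list_file and (suffix is None or rel.endswith(suffix)):
                    yield rel
            elif entry.is_dir():
                if list_dir:
                    yield rel
                if recursive:
                    yield from _scan(entry.path)

    return _scan(root)


with tempfile.TemporaryDirectory() as tmp:
    os.makedirs(os.path.join(tmp, 'sub'))
    for name in ('a.jpg', 'b.txt', os.path.join('sub', 'c.jpg')):
        open(os.path.join(tmp, name), 'w').close()
    # Results are sorted here only because the scan order is arbitrary.
    top_level = sorted(list_dir_or_file_sketch(tmp))
    jpgs = sorted(list_dir_or_file_sketch(
        tmp, list_dir=False, suffix='.jpg', recursive=True))
print(top_level, jpgs)
```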

static parse_uri_prefix(uri: Union[str, pathlib.Path])Optional[str][source]

Parse the prefix of a uri.

Parameters

uri (str | Path) – Uri to be parsed that contains the file prefix.

Examples

>>> FileClient.parse_uri_prefix('s3://path/of/your/file')
's3'
Returns

Return the prefix of uri if the uri contains ‘://’ else None.

Return type

str | None
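The documented rule (return the part before ‘://’, or None when ‘://’ is absent) fits in a few lines. `parse_uri_prefix_sketch` is a hypothetical re-implementation for illustration and may not cover every URI form the real method accepts.

```python
from pathlib import Path


def parse_uri_prefix_sketch(uri):
    """Return the text before '://' in `uri`, or None if there is no '://'."""
    uri = str(uri)
    if '://' not in uri:
        return None
    prefix, _ = uri.split('://', 1)
    return prefix


print(parse_uri_prefix_sketch('s3://path/of/your/file'))
print(parse_uri_prefix_sketch(Path('/path/of/your/file')))
```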

put(obj: bytes, filepath: Union[str, pathlib.Path])None[source]

Write data to a given filepath with ‘wb’ mode.

Note

put should create a directory if the directory of filepath does not exist.

Parameters
  • obj (bytes) – Data to be written.

  • filepath (str or Path) – Path to write data.

put_text(obj: str, filepath: Union[str, pathlib.Path])None[source]

Write data to a given filepath with ‘w’ mode.

Note

put_text should create a directory if the directory of filepath does not exist.

Parameters
  • obj (str) – Data to be written.

  • filepath (str or Path) – Path to write data.

  • encoding (str, optional) – The encoding format used to open the filepath. Default: ‘utf-8’.

classmethod register_backend(name, backend=None, force=False, prefixes=None)[source]

Register a backend to FileClient.

This method can be used as a normal class method or a decorator.

class NewBackend(BaseStorageBackend):

    def get(self, filepath):
        return filepath

    def get_text(self, filepath):
        return filepath

FileClient.register_backend('new', NewBackend)

or

@FileClient.register_backend('new')
class NewBackend(BaseStorageBackend):

    def get(self, filepath):
        return filepath

    def get_text(self, filepath):
        return filepath
Parameters
  • name (str) – The name of the registered backend.

  • backend (class, optional) – The backend class to be registered, which must be a subclass of BaseStorageBackend. When this method is used as a decorator, backend is None. Defaults to None.

  • force (bool, optional) – Whether to override the backend if the name has already been registered. Defaults to False.

  • prefixes (str or list[str] or tuple[str], optional) – The prefixes of the registered storage backend. Default: None. New in version 1.3.15.

remove(filepath: Union[str, pathlib.Path])None[source]

Remove a file.

Parameters

filepath (str or Path) – Path to be removed.

mmcv.fileio.dict_from_file(filename, key_type=<class 'str'>, encoding='utf-8', file_client_args=None)[source]

Load a text file and parse the content as a dict.

Each line of the text file must contain two or more columns separated by whitespace or tabs. The first column is parsed as the dict key, and the following columns are parsed as the dict value.

Note

In v1.3.16 and later, dict_from_file supports loading a text file stored in different backends and parsing the content as a dict.

Parameters
  • filename (str) – Filename.

  • key_type (type) – Type of the dict keys. str is used by default, and type conversion is performed if another type is specified.

  • encoding (str) – Encoding used to open the file. Default utf-8.

  • file_client_args (dict, optional) – Arguments to instantiate a FileClient. See mmcv.fileio.FileClient for details. Default: None.

Examples

>>> dict_from_file('/path/of/your/file')  # disk
{'key1': 'value1', 'key2': 'value2'}
>>> dict_from_file('s3://path/of/your/file')  # ceph or petrel
{'key1': 'value1', 'key2': 'value2'}
Returns

The parsed contents.

Return type

dict
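The column-to-dict mapping can be illustrated on an in-memory text stream. `dict_from_lines_sketch` is a hypothetical helper; in particular, storing a single value column as a str and several value columns as a list is an assumption of this sketch.

```python
import io


def dict_from_lines_sketch(lines, key_type=str):
    """Parse whitespace-separated lines into a dict (illustrative only)."""
    mapping = {}
    for line in lines:
        items = line.split()          # splits on spaces and tabs
        if len(items) < 2:
            raise ValueError(f'expected at least two columns: {line!r}')
        key = key_type(items[0])
        # Assumption: one value column -> str, several -> list of str.
        mapping[key] = items[1] if len(items) == 2 else items[1:]
    return mapping


text = io.StringIO('1 cat\n2 dog animal\n')
parsed = dict_from_lines_sketch(text, key_type=int)
print(parsed)
```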

mmcv.fileio.dump(obj, file=None, file_format=None, file_client_args=None, **kwargs)[source]

Dump data to json/yaml/pickle strings or files.

This method provides a unified api for dumping data as strings or to files, and also supports custom arguments for each file format.

Note

In v1.3.16 and later, dump supports dumping data as strings or to files saved in different backends.

Parameters
  • obj (any) – The python object to be dumped.

  • file (str or Path or file-like object, optional) – If not specified, then the object is dumped to a str, otherwise to a file specified by the filename or file-like object.

  • file_format (str, optional) – Same as load().

  • file_client_args (dict, optional) – Arguments to instantiate a FileClient. See mmcv.fileio.FileClient for details. Default: None.

Examples

>>> dump('hello world', '/path/of/your/file')  # disk
>>> dump('hello world', 's3://path/of/your/file')  # ceph or petrel
Returns

True for success, False otherwise.

Return type

bool

mmcv.fileio.list_from_file(filename, prefix='', offset=0, max_num=0, encoding='utf-8', file_client_args=None)[source]

Load a text file and parse the content as a list of strings.

Note

In v1.3.16 and later, list_from_file supports loading a text file stored in different backends and parsing the content as a list of strings.

Parameters
  • filename (str) – Filename.

  • prefix (str) – The prefix to be inserted to the beginning of each item.

  • offset (int) – The offset of lines.

  • max_num (int) – The maximum number of lines to be read; zero and negative values mean no limit.

  • encoding (str) – Encoding used to open the file. Default utf-8.

  • file_client_args (dict, optional) – Arguments to instantiate a FileClient. See mmcv.fileio.FileClient for details. Default: None.

Examples

>>> list_from_file('/path/of/your/file')  # disk
['hello', 'world']
>>> list_from_file('s3://path/of/your/file')  # ceph or petrel
['hello', 'world']
Returns

A list of strings.

Return type

list[str]
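How prefix, offset and max_num interact is easiest to see on a small input. `list_from_lines_sketch` below is a hypothetical helper that follows the parameter descriptions above and reads from an in-memory stream rather than a storage backend.

```python
import io


def list_from_lines_sketch(lines, prefix='', offset=0, max_num=0):
    """Skip `offset` lines, then collect up to `max_num` prefixed items."""
    items = []
    for index, line in enumerate(lines):
        if index < offset:
            continue
        if 0 < max_num <= len(items):   # max_num <= 0 means no limit
            break
        items.append(prefix + line.rstrip('\n\r'))
    return items


text = 'a.jpg\nb.jpg\nc.jpg\nd.jpg\n'
everything = list_from_lines_sketch(io.StringIO(text))
window = list_from_lines_sketch(
    io.StringIO(text), prefix='img/', offset=1, max_num=2)
print(everything, window)
```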

mmcv.fileio.load(file, file_format=None, file_client_args=None, **kwargs)[source]

Load data from json/yaml/pickle files.

This method provides a unified api for loading data from serialized files.

Note

In v1.3.16 and later, load supports loading data from serialized files stored in different backends.

Parameters
  • file (str or Path or file-like object) – Filename or a file-like object.

  • file_format (str, optional) – If not specified, the file format will be inferred from the file extension, otherwise use the specified one. Currently supported formats include “json”, “yaml/yml” and “pickle/pkl”.

  • file_client_args (dict, optional) – Arguments to instantiate a FileClient. See mmcv.fileio.FileClient for details. Default: None.

Examples

>>> load('/path/of/your/file')  # file is stored on disk
>>> load('https://path/of/your/file')  # file is stored on the Internet
>>> load('s3://path/of/your/file')  # file is stored in petrel
Returns

The content from the file.
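Inferring the format from the file extension, with an explicit file_format taking precedence, might be sketched as follows. `load_sketch` and `_LOADERS` are hypothetical; the sketch covers only local json and pickle files, since yaml needs a third-party package.

```python
import json
import os
import pickle
import tempfile

# Assumed mapping from format name to (loader, file mode).
_LOADERS = {
    'json': (json.load, 'r'),
    'pickle': (pickle.load, 'rb'),
    'pkl': (pickle.load, 'rb'),
}


def load_sketch(path, file_format=None):
    """Load a local file, inferring the format from its extension."""
    if file_format is None:
        file_format = os.path.splitext(str(path))[1].lstrip('.')
    if file_format not in _LOADERS:
        raise TypeError(f'unsupported format: {file_format}')
    loader, mode = _LOADERS[file_format]
    with open(path, mode) as f:
        return loader(f)


with tempfile.TemporaryDirectory() as tmp:
    json_path = os.path.join(tmp, 'cfg.json')
    with open(json_path, 'w') as f:
        json.dump({'lr': 0.1}, f)
    pkl_path = os.path.join(tmp, 'cfg.data')
    with open(pkl_path, 'wb') as f:
        pickle.dump([1, 2, 3], f)
    from_json = load_sketch(json_path)                      # format inferred
    from_pickle = load_sketch(pkl_path, file_format='pkl')  # format explicit
print(from_json, from_pickle)
```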

image

mmcv.image.adjust_brightness(img, factor=1.0)[source]

Adjust image brightness.

This function controls the brightness of an image. An enhancement factor of 0.0 gives a black image. A factor of 1.0 gives the original image. This function blends the source image and the degenerated black image:

\[output = img * factor + degenerated * (1 - factor)\]
Parameters
  • img (ndarray) – Image to be brightened.

  • factor (float) – A value that controls the enhancement. A factor of 1.0 returns the original image; lower factors mean less color (brightness, contrast, etc.), and higher values mean more. Default: 1.

Returns

The brightened image.

Return type

ndarray
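The blending formula can be checked by hand on a few pixel values. In this sketch the degenerated image is black (0), as stated above; rounding and clipping to the uint8 range [0, 255] are assumptions, and `blend_sketch` is a hypothetical helper working on plain integers rather than arrays.

```python
def blend_sketch(pixel, degenerated, factor):
    """output = img * factor + degenerated * (1 - factor), clipped to [0, 255]."""
    value = pixel * factor + degenerated * (1 - factor)
    # Rounding and clipping to the uint8 range are assumptions of this sketch.
    return max(0, min(255, int(round(value))))


row = [0, 100, 200]
black = 0  # the degenerated image for brightness is all black

darker = [blend_sketch(p, black, 0.5) for p in row]
same = [blend_sketch(p, black, 1.0) for p in row]
brighter = [blend_sketch(p, black, 1.5) for p in row]
print(darker, same, brighter)
```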

mmcv.image.adjust_color(img, alpha=1, beta=None, gamma=0)[source]

It blends the source image and its gray image:

\[output = img * alpha + gray\_img * beta + gamma\]
Parameters
  • img (ndarray) – The input source image.

  • alpha (int | float) – Weight for the source image. Default 1.

  • beta (int | float) – Weight for the converted gray image. If None, it’s assigned the value (1 - alpha).

  • gamma (int | float) – Scalar added to each sum. Same as cv2.addWeighted(). Default 0.

Returns

Colored image which has the same size and dtype as input.

Return type

ndarray

mmcv.image.adjust_contrast(img, factor=1.0)[source]

Adjust image contrast.

This function controls the contrast of an image. An enhancement factor of 0.0 gives a solid grey image. A factor of 1.0 gives the original image. It blends the source image and the degenerated mean image:

\[output = img * factor + degenerated * (1 - factor)\]
Parameters
  • img (ndarray) – Image to be contrasted. BGR order.

  • factor (float) – Same as mmcv.adjust_brightness().

Returns

The contrasted image.

Return type

ndarray

mmcv.image.adjust_lighting(img, eigval, eigvec, alphastd=0.1, to_rgb=True)[source]

AlexNet-style PCA jitter.

This data augmentation is proposed in ImageNet Classification with Deep Convolutional Neural Networks.

Parameters
  • img (ndarray) – Image to be adjusted lighting. BGR order.

  • eigval (ndarray) – The eigenvalues of the covariance matrix of pixel values.

  • eigvec (ndarray) – The eigenvectors of the covariance matrix of pixel values.

  • alphastd (float) – The standard deviation for distribution of alpha. Defaults to 0.1.

  • to_rgb (bool) – Whether to convert img to rgb.

Returns

The adjusted image.

Return type

ndarray

mmcv.image.adjust_sharpness(img, factor=1.0, kernel=None)[source]

Adjust image sharpness.

This function controls the sharpness of an image. An enhancement factor of 0.0 gives a blurred image. A factor of 1.0 gives the original image. And a factor of 2.0 gives a sharpened image. It blends the source image and the degenerated mean image:

\[output = img * factor + degenerated * (1 - factor)\]
Parameters
  • img (ndarray) – Image to be sharpened. BGR order.

  • factor (float) – Same as mmcv.adjust_brightness().

  • kernel (np.ndarray, optional) – Filter kernel to be applied on the img to obtain the degenerated img. Defaults to None.

Note

No sanity check is enforced on a user-supplied kernel. With an inappropriate kernel, adjust_sharpness may not sharpen the image at all and will instead perform whatever transform the kernel determines.

Returns

The sharpened image.

Return type

ndarray

mmcv.image.auto_contrast(img, cutoff=0)[source]

Auto adjust image contrast.

This function maximizes (normalizes) image contrast by first removing the cutoff percent of the lightest and darkest pixels from the histogram and then remapping the image so that the darkest pixel becomes black (0) and the lightest becomes white (255).

Parameters
  • img (ndarray) – Image to be contrasted. BGR order.

  • cutoff (int | float | tuple) – The cutoff percent of the lightest and darkest pixels to be removed. If given as tuple, it shall be (low, high). Otherwise, the single value will be used for both. Defaults to 0.

Returns

The contrasted image.

Return type

ndarray
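For the simplest case, cutoff = 0, the remapping reduces to a linear stretch of the value range. The sketch below shows only that case on a flat list of pixel values; `stretch_sketch` is a hypothetical helper, and histogram clipping for non-zero cutoffs is not modelled.

```python
def stretch_sketch(pixels):
    """Linearly remap values so that min -> 0 and max -> 255 (cutoff = 0)."""
    low, high = min(pixels), max(pixels)
    if high == low:
        return list(pixels)           # flat image: nothing to stretch
    return [int(round((p - low) * 255 / (high - low))) for p in pixels]


stretched = stretch_sketch([50, 100, 250])
print(stretched)
```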

mmcv.image.bgr2gray(img, keepdim=False)[source]

Convert a BGR image to grayscale image.

Parameters
  • img (ndarray) – The input image.

  • keepdim (bool) – If False (by default), then return the grayscale image with 2 dims, otherwise 3 dims.

Returns

The converted grayscale image.

Return type

ndarray

mmcv.image.bgr2hls(img)

Convert a BGR image to HLS image.

Parameters

img (ndarray or str) – The input image.

Returns

The converted HLS image.

Return type

ndarray

mmcv.image.bgr2hsv(img)

Convert a BGR image to HSV image.

Parameters

img (ndarray or str) – The input image.

Returns

The converted HSV image.

Return type

ndarray

mmcv.image.bgr2rgb(img)

Convert a BGR image to RGB image.

Parameters

img (ndarray or str) – The input image.

Returns

The converted RGB image.

Return type

ndarray

mmcv.image.bgr2ycbcr(img, y_only=False)[source]

Convert a BGR image to YCbCr image.

The bgr version of rgb2ycbcr. It implements the ITU-R BT.601 conversion for standard-definition television. See more details in https://en.wikipedia.org/wiki/YCbCr#ITU-R_BT.601_conversion.

It differs from a similar function in cv2.cvtColor: BGR <-> YCrCb. In OpenCV, it implements a JPEG conversion. See more details in https://en.wikipedia.org/wiki/YCbCr#JPEG_conversion.

Parameters
  • img (ndarray) – The input image. It accepts: 1. np.uint8 type with range [0, 255]; 2. np.float32 type with range [0, 1].

  • y_only (bool) – Whether to only return Y channel. Default: False.

Returns

The converted YCbCr image. The output image has the same type and range as input image.

Return type

ndarray

mmcv.image.clahe(img, clip_limit=40.0, tile_grid_size=(8, 8))[source]

Use CLAHE method to process the image.

See ZUIDERVELD,K. Contrast Limited Adaptive Histogram Equalization[J]. Graphics Gems, 1994:474-485. for more information.

Parameters
  • img (ndarray) – Image to be processed.

  • clip_limit (float) – Threshold for contrast limiting. Default: 40.0.

  • tile_grid_size (tuple[int]) – Size of grid for histogram equalization. Input image will be divided into equally sized rectangular tiles. It defines the number of tiles in row and column. Default: (8, 8).

Returns

The processed image.

Return type

ndarray

mmcv.image.cutout(img, shape, pad_val=0)[source]

Randomly cut out a rectangle from the original img.

Parameters
  • img (ndarray) – Image to be cutout.

  • shape (int | tuple[int]) – Expected cutout shape (h, w). If given as an int, the value will be used for both h and w.

  • pad_val (int | float | tuple[int | float]) – Values to be filled in the cut area. Defaults to 0.

Returns

The cutout image.

Return type

ndarray

mmcv.image.gray2bgr(img)[source]

Convert a grayscale image to BGR image.

Parameters

img (ndarray) – The input image.

Returns

The converted BGR image.

Return type

ndarray

mmcv.image.gray2rgb(img)[source]

Convert a grayscale image to RGB image.

Parameters

img (ndarray) – The input image.

Returns

The converted RGB image.

Return type

ndarray

mmcv.image.hls2bgr(img)

Convert a HLS image to BGR image.

Parameters

img (ndarray or str) – The input image.

Returns

The converted BGR image.

Return type

ndarray

mmcv.image.hsv2bgr(img)

Convert a HSV image to BGR image.

Parameters

img (ndarray or str) – The input image.

Returns

The converted BGR image.

Return type

ndarray

mmcv.image.imconvert(img, src, dst)[source]

Convert an image from the src colorspace to dst colorspace.

Parameters
  • img (ndarray) – The input image.

  • src (str) – The source colorspace, e.g., ‘rgb’, ‘hsv’.

  • dst (str) – The destination colorspace, e.g., ‘rgb’, ‘hsv’.

Returns

The converted image.

Return type

ndarray

mmcv.image.imcrop(img, bboxes, scale=1.0, pad_fill=None)[source]

Crop image patches.

3 steps: scale the bboxes -> clip bboxes -> crop and pad.

Parameters
  • img (ndarray) – Image to be cropped.

  • bboxes (ndarray) – Shape (k, 4) or (4, ), location of cropped bboxes.

  • scale (float, optional) – Scale ratio of bboxes, the default value 1.0 means no padding.

  • pad_fill (Number | list[Number]) – Value to be filled for padding. Default: None, which means no padding.

Returns

The cropped image patches.

Return type

list[ndarray] | ndarray

mmcv.image.imequalize(img)[source]

Equalize the image histogram.

This function applies a non-linear mapping to the input image, in order to create a uniform distribution of grayscale values in the output image.

Parameters

img (ndarray) – Image to be equalized.

Returns

The equalized image.

Return type

ndarray

mmcv.image.imflip(img, direction='horizontal')[source]

Flip an image horizontally, vertically, or diagonally.

Parameters
  • img (ndarray) – Image to be flipped.

  • direction (str) – The flip direction, either “horizontal” or “vertical” or “diagonal”.

Returns

The flipped image.

Return type

ndarray

mmcv.image.imflip_(img, direction='horizontal')[source]

Flip an image horizontally, vertically, or diagonally in place.

Parameters
  • img (ndarray) – Image to be flipped.

  • direction (str) – The flip direction, either “horizontal” or “vertical” or “diagonal”.

Returns

The flipped image (inplace).

Return type

ndarray

mmcv.image.imfrombytes(content, flag='color', channel_order='bgr', backend=None)[source]

Read an image from bytes.

Parameters
  • content (bytes) – Image bytes got from files or other streams.

  • flag (str) – Same as imread().

  • backend (str | None) – The image decoding backend type. Options are cv2, pillow, turbojpeg, tifffile, None. If backend is None, the global imread_backend specified by mmcv.use_backend() will be used. Default: None.

Returns

Loaded image array.

Return type

ndarray

Examples

>>> img_path = '/path/to/img.jpg'
>>> with open(img_path, 'rb') as f:
...     img_buff = f.read()
>>> img = mmcv.imfrombytes(img_buff)
>>> img = mmcv.imfrombytes(img_buff, flag='color', channel_order='rgb')
>>> img = mmcv.imfrombytes(img_buff, backend='pillow')
>>> img = mmcv.imfrombytes(img_buff, backend='cv2')
mmcv.image.iminvert(img)[source]

Invert (negate) an image.

Parameters

img (ndarray) – Image to be inverted.

Returns

The inverted image.

Return type

ndarray

mmcv.image.imnormalize(img, mean, std, to_rgb=True)[source]

Normalize an image with mean and std.

Parameters
  • img (ndarray) – Image to be normalized.

  • mean (ndarray) – The mean to be used for normalize.

  • std (ndarray) – The std to be used for normalize.

  • to_rgb (bool) – Whether to convert to rgb.

Returns

The normalized image.

Return type

ndarray

mmcv.image.imnormalize_(img, mean, std, to_rgb=True)[source]

Inplace normalize an image with mean and std.

Parameters
  • img (ndarray) – Image to be normalized.

  • mean (ndarray) – The mean to be used for normalize.

  • std (ndarray) – The std to be used for normalize.

  • to_rgb (bool) – Whether to convert to rgb.

Returns

The normalized image.

Return type

ndarray

mmcv.image.impad(img, *, shape=None, padding=None, pad_val=0, padding_mode='constant')[source]

Pad the given image to a certain shape or pad on all sides with specified padding mode and padding value.

Parameters
  • img (ndarray) – Image to be padded.

  • shape (tuple[int]) – Expected padding shape (h, w). Default: None.

  • padding (int or tuple[int]) – Padding on each border. If a single int is provided this is used to pad all borders. If tuple of length 2 is provided this is the padding on left/right and top/bottom respectively. If a tuple of length 4 is provided this is the padding for the left, top, right and bottom borders respectively. Default: None. Note that shape and padding can not be both set.

  • pad_val (Number | Sequence[Number]) – Values to be filled in padding areas when padding_mode is ‘constant’. Default: 0.

  • padding_mode (str) –

    Type of padding. Should be: constant, edge, reflect or symmetric. Default: constant.

    • constant: pads with a constant value, this value is specified with pad_val.

    • edge: pads with the last value at the edge of the image.

    • reflect: pads with reflection of image without repeating the last value on the edge. For example, padding [1, 2, 3, 4] with 2 elements on both sides in reflect mode will result in [3, 2, 1, 2, 3, 4, 3, 2].

    • symmetric: pads with reflection of image repeating the last value on the edge. For example, padding [1, 2, 3, 4] with 2 elements on both sides in symmetric mode will result in [2, 1, 1, 2, 3, 4, 4, 3]

Returns

The padded image.

Return type

ndarray
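The reflect and symmetric examples given above can be reproduced on a 1-D list. `pad_1d_sketch` is a hypothetical pure-Python helper written for this illustration; it pads both sides by the same amount and is not how impad is implemented.

```python
def pad_1d_sketch(seq, pad, mode):
    """Pad a 1-D list on both sides; requires 0 <= pad < len(seq)."""
    n = len(seq)
    if not 0 <= pad < n:
        raise ValueError('pad must satisfy 0 <= pad < len(seq)')
    if mode == 'reflect':        # mirror without repeating the edge value
        left = seq[1:pad + 1][::-1]
        right = seq[n - pad - 1:n - 1][::-1]
    elif mode == 'symmetric':    # mirror and repeat the edge value
        left = seq[:pad][::-1]
        right = seq[n - pad:][::-1]
    elif mode == 'edge':         # repeat the edge value
        left = [seq[0]] * pad
        right = [seq[-1]] * pad
    else:
        raise ValueError(f'unknown mode: {mode}')
    return left + list(seq) + right


values = [1, 2, 3, 4]
print(pad_1d_sketch(values, 2, 'reflect'))
print(pad_1d_sketch(values, 2, 'symmetric'))
print(pad_1d_sketch(values, 2, 'edge'))
```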

mmcv.image.impad_to_multiple(img, divisor, pad_val=0)[source]

Pad an image so that each edge is a multiple of a given number.

Parameters
  • img (ndarray) – Image to be padded.

  • divisor (int) – Padded image edges will be a multiple of divisor.

  • pad_val (Number | Sequence[Number]) – Same as impad().

Returns

The padded image.

Return type

ndarray
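The target shape follows from rounding each edge up to the next multiple of the divisor. The arithmetic is sketched below with a hypothetical helper, `padded_size_sketch`; it computes only the shape and does not pad any pixels.

```python
import math


def padded_size_sketch(height, width, divisor):
    """Smallest (h, w) >= (height, width) with both divisible by divisor."""
    padded_h = math.ceil(height / divisor) * divisor
    padded_w = math.ceil(width / divisor) * divisor
    return padded_h, padded_w


print(padded_size_sketch(375, 500, 32))
print(padded_size_sketch(384, 512, 32))  # already a multiple: unchanged
```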

mmcv.image.imread(img_or_path, flag='color', channel_order='bgr', backend=None, file_client_args=None)[source]

Read an image.

Note

In v1.4.1 and later, the file_client_args parameter is supported.

Parameters
  • img_or_path (ndarray or str or Path) – Either a numpy array or str or pathlib.Path. If it is a numpy array (loaded image), then it will be returned as is.

  • flag (str) – Flags specifying the color type of a loaded image, candidates are color, grayscale, unchanged, color_ignore_orientation and grayscale_ignore_orientation. By default, cv2 and pillow backend would rotate the image according to its EXIF info unless called with unchanged or *_ignore_orientation flags. turbojpeg and tifffile backend always ignore image’s EXIF info regardless of the flag. The turbojpeg backend only supports color and grayscale.

  • channel_order (str) – Order of channel, candidates are bgr and rgb.

  • backend (str | None) – The image decoding backend type. Options are cv2, pillow, turbojpeg, tifffile, None. If backend is None, the global imread_backend specified by mmcv.use_backend() will be used. Default: None.

  • file_client_args (dict | None) – Arguments to instantiate a FileClient. See mmcv.fileio.FileClient for details. Default: None.

Returns

Loaded image array.

Return type

ndarray

Examples

>>> import mmcv
>>> img_path = '/path/to/img.jpg'
>>> img = mmcv.imread(img_path)
>>> img = mmcv.imread(img_path, flag='color', channel_order='rgb',
...     backend='cv2')
>>> img = mmcv.imread(img_path, flag='color', channel_order='bgr',
...     backend='pillow')
>>> s3_img_path = 's3://bucket/img.jpg'
>>> # infer the file backend by the prefix s3
>>> img = mmcv.imread(s3_img_path)
>>> # manually set the file backend petrel
>>> img = mmcv.imread(s3_img_path, file_client_args={
...     'backend': 'petrel'})
>>> http_img_path = 'http://path/to/img.jpg'
>>> img = mmcv.imread(http_img_path)
>>> img = mmcv.imread(http_img_path, file_client_args={
...     'backend': 'http'})
mmcv.image.imrescale(img, scale, return_scale=False, interpolation='bilinear', backend=None)[source]

Resize image while keeping the aspect ratio.

Parameters
  • img (ndarray) – The input image.

  • scale (float | tuple[int]) – The scaling factor or maximum size. If it is a float number, then the image will be rescaled by this factor, else if it is a tuple of 2 integers, then the image will be rescaled as large as possible within the scale.

  • return_scale (bool) – Whether to return the scaling factor besides the rescaled image.

  • interpolation (str) – Same as resize().

  • backend (str | None) – Same as resize().

Returns

The rescaled image.

Return type

ndarray
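The two forms of scale lead to a target size as sketched below. `rescale_size_sketch` is a hypothetical helper; reading the tuple as bounds on the long and the short edge, and rounding to the nearest pixel, are assumptions made for this illustration.

```python
def rescale_size_sketch(width, height, scale):
    """Return the (new_w, new_h) implied by `scale`, keeping the aspect ratio."""
    if isinstance(scale, (int, float)):
        factor = float(scale)
    else:
        # Assumption: the tuple bounds the long and the short edge.
        max_long, max_short = max(scale), min(scale)
        factor = min(max_long / max(width, height),
                     max_short / min(width, height))
    # Round to the nearest integer pixel size.
    return int(width * factor + 0.5), int(height * factor + 0.5)


print(rescale_size_sketch(640, 480, 0.5))
print(rescale_size_sketch(640, 480, (1333, 800)))
```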

mmcv.image.imresize(img, size, return_scale=False, interpolation='bilinear', out=None, backend=None)[source]

Resize image to a given size.

Parameters
  • img (ndarray) – The input image.

  • size (tuple[int]) – Target size (w, h).

  • return_scale (bool) – Whether to return w_scale and h_scale.

  • interpolation (str) – Interpolation method, accepted values are “nearest”, “bilinear”, “bicubic”, “area”, “lanczos” for ‘cv2’ backend, “nearest”, “bilinear” for ‘pillow’ backend.

  • out (ndarray) – The output destination.

  • backend (str | None) – The image resize backend type. Options are cv2, pillow, None. If backend is None, the global imread_backend specified by mmcv.use_backend() will be used. Default: None.

Returns

(resized_img, w_scale, h_scale) or resized_img.

Return type

tuple | ndarray

mmcv.image.imresize_like(img, dst_img, return_scale=False, interpolation='bilinear', backend=None)[source]

Resize image to the same size of a given image.

Parameters
  • img (ndarray) – The input image.

  • dst_img (ndarray) – The target image.

  • return_scale (bool) – Whether to return w_scale and h_scale.

  • interpolation (str) – Same as resize().

  • backend (str | None) – Same as resize().

Returns

(resized_img, w_scale, h_scale) or resized_img.

Return type

tuple or ndarray

mmcv.image.imresize_to_multiple(img, divisor, size=None, scale_factor=None, keep_ratio=False, return_scale=False, interpolation='bilinear', out=None, backend=None)[source]

Resize image according to a given size or scale factor and then round up the resized or rescaled image size to the nearest value that can be divided by the divisor.

Parameters
  • img (ndarray) – The input image.

  • divisor (int | tuple) – Resized image size will be a multiple of divisor. If divisor is a tuple, divisor should be (w_divisor, h_divisor).

  • size (None | int | tuple[int]) – Target size (w, h). Default: None.

  • scale_factor (None | float | tuple[float]) – Multiplier for spatial size. Should match input size if it is a tuple and the 2D style is (w_scale_factor, h_scale_factor). Default: None.

  • keep_ratio (bool) – Whether to keep the aspect ratio when resizing the image. Default: False.

  • return_scale (bool) – Whether to return w_scale and h_scale.

  • interpolation (str) – Interpolation method, accepted values are “nearest”, “bilinear”, “bicubic”, “area”, “lanczos” for ‘cv2’ backend, “nearest”, “bilinear” for ‘pillow’ backend.

  • out (ndarray) – The output destination.

  • backend (str | None) – The image resize backend type. Options are cv2, pillow, None. If backend is None, the global imread_backend specified by mmcv.use_backend() will be used. Default: None.

Returns

(resized_img, w_scale, h_scale) or resized_img.

Return type

tuple | ndarray

mmcv.image.imrotate(img, angle, center=None, scale=1.0, border_value=0, interpolation='bilinear', auto_bound=False)[source]

Rotate an image.

Parameters
  • img (ndarray) – Image to be rotated.

  • angle (float) – Rotation angle in degrees, positive values mean clockwise rotation.

  • center (tuple[float], optional) – Center point (w, h) of the rotation in the source image. If not specified, the center of the image will be used.

  • scale (float) – Isotropic scale factor.

  • border_value (int) – Border value.

  • interpolation (str) – Same as resize().

  • auto_bound (bool) – Whether to adjust the image size to cover the whole rotated image.

Returns

The rotated image.

Return type

ndarray

mmcv.image.imshear(img, magnitude, direction='horizontal', border_value=0, interpolation='bilinear')[source]

Shear an image.

Parameters
  • img (ndarray) – Image to be sheared with format (h, w) or (h, w, c).

  • magnitude (int | float) – The magnitude used for shear.

  • direction (str) – The shear direction, either “horizontal” or “vertical”.

  • border_value (int | tuple[int]) – Value used in case of a constant border.

  • interpolation (str) – Same as resize().

Returns

The sheared image.

Return type

ndarray

mmcv.image.imtranslate(img, offset, direction='horizontal', border_value=0, interpolation='bilinear')[source]

Translate an image.

Parameters
  • img (ndarray) – Image to be translated with format (h, w) or (h, w, c).

  • offset (int | float) – The offset used for translation.

  • direction (str) – The translate direction, either “horizontal” or “vertical”.

  • border_value (int | tuple[int]) – Value used in case of a constant border.

  • interpolation (str) – Same as resize().

Returns

The translated image.

Return type

ndarray

mmcv.image.imwrite(img, file_path, params=None, auto_mkdir=None, file_client_args=None)[source]

Write image to file.

Note

The file_client_args parameter was added in v1.4.1.

Warning

The parameter auto_mkdir will be deprecated in the future, and every file client will create directories automatically.

Parameters
  • img (ndarray) – Image array to be written.

  • file_path (str) – Image file path.

  • params (None or list) – Same as opencv imwrite() interface.

  • auto_mkdir (bool) – If the parent folder of file_path does not exist, whether to create it automatically. It will be deprecated.

  • file_client_args (dict | None) – Arguments to instantiate a FileClient. See mmcv.fileio.FileClient for details. Default: None.

Returns

Successful or not.

Return type

bool

Examples

>>> # write to hard disk client
>>> ret = mmcv.imwrite(img, '/path/to/img.jpg')
>>> # infer the file backend by the prefix s3
>>> ret = mmcv.imwrite(img, 's3://bucket/img.jpg')
>>> # manually set the file backend petrel
>>> ret = mmcv.imwrite(img, 's3://bucket/img.jpg', file_client_args={
...     'backend': 'petrel'})
mmcv.image.lut_transform(img, lut_table)[source]

Transform array by look-up table.

The function lut_transform fills the output array with values from the look-up table. Indices of the entries are taken from the input array.

Parameters
  • img (ndarray) – Image to be transformed.

  • lut_table (ndarray) – look-up table of 256 elements; in case of multi-channel input array, the table should either have a single channel (in this case the same table is used for all channels) or the same number of channels as in the input array.

Returns

The transformed image.

Return type

ndarray

mmcv.image.posterize(img, bits)[source]

Posterize an image (reduce the number of bits for each color channel)

Parameters
  • img (ndarray) – Image to be posterized.

  • bits (int) – Number of bits (1 to 8) to use for posterizing.

Returns

The posterized image.

Return type

ndarray
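A minimal sketch of bit reduction on 8-bit values, assuming posterizing keeps the `bits` most significant bits of each value and zeroes the rest. `posterize_pixel` and `posterize_rows` are hypothetical helpers working on plain Python integers, not the ndarray-based function documented above.

```python
def posterize_pixel(value, bits):
    """Keep the `bits` most significant bits of an 8-bit value."""
    if not 1 <= bits <= 8:
        raise ValueError("bits must be in [1, 8]")
    shift = 8 - bits
    # Shifting right drops the low bits; shifting back restores the scale.
    return (value >> shift) << shift


def posterize_rows(rows, bits):
    """Apply posterize_pixel to every value of a nested list 'image'."""
    return [[posterize_pixel(v, bits) for v in row] for row in rows]


print(posterize_rows([[0, 15, 16, 255]], bits=4))
```

With bits=8 every value is left unchanged, and with bits=1 only two output levels remain.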

mmcv.image.rescale_size(old_size, scale, return_scale=False)[source]

Calculate the new size to be rescaled to.

Parameters
  • old_size (tuple[int]) – The old size (w, h) of image.

  • scale (float | tuple[int]) – The scaling factor or maximum size. If it is a float number, then the image will be rescaled by this factor, else if it is a tuple of 2 integers, then the image will be rescaled as large as possible within the scale.

  • return_scale (bool) – Whether to return the scaling factor besides the rescaled image size.

Returns

The new rescaled image size.

Return type

tuple[int]
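The two interpretations of scale can be sketched as follows. This is an illustration under stated assumptions, not the library code: `rescale_size_sketch` is a hypothetical name, a tuple is read as (long edge limit, short edge limit) in either order, and the result is rounded to the nearest integer.

```python
def rescale_size_sketch(old_size, scale):
    """Return ((new_w, new_h), factor) for a (w, h) size and a scale."""
    w, h = old_size
    if isinstance(scale, (int, float)):
        if scale <= 0:
            raise ValueError(f"invalid scale {scale}, must be positive")
        factor = float(scale)
    elif isinstance(scale, tuple):
        max_long_edge, max_short_edge = max(scale), min(scale)
        # Largest factor that keeps both edges within their limits.
        factor = min(max_long_edge / max(h, w), max_short_edge / min(h, w))
    else:
        raise TypeError("scale must be a number or a tuple of two ints")
    new_size = (int(w * factor + 0.5), int(h * factor + 0.5))
    return new_size, factor


print(rescale_size_sketch((640, 480), 0.5))
print(rescale_size_sketch((640, 480), (1000, 600)))
```

In the second call the short edge is the binding constraint (600 / 480 = 1.25), so the image grows to 800 x 600 rather than filling the 1000-pixel long-edge limit.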

mmcv.image.rgb2bgr(img)

Convert an RGB image to a BGR image.

Parameters

img (ndarray or str) – The input image.

Returns

The converted BGR image.

Return type

ndarray

mmcv.image.rgb2gray(img, keepdim=False)[source]

Convert an RGB image to a grayscale image.

Parameters
  • img (ndarray) – The input image.

  • keepdim (bool) – If False (by default), then return the grayscale image with 2 dims, otherwise 3 dims.

Returns

The converted grayscale image.

Return type

ndarray

mmcv.image.rgb2ycbcr(img, y_only=False)[source]

Convert an RGB image to a YCbCr image.

This function produces the same results as Matlab’s rgb2ycbcr function. It implements the ITU-R BT.601 conversion for standard-definition television. See more details in https://en.wikipedia.org/wiki/YCbCr#ITU-R_BT.601_conversion.

It differs from a similar function in cv2.cvtColor: RGB <-> YCrCb. In OpenCV, it implements a JPEG conversion. See more details in https://en.wikipedia.org/wiki/YCbCr#JPEG_conversion.

Parameters
  • img (ndarray) – The input image. It accepts: 1. np.uint8 type with range [0, 255]; 2. np.float32 type with range [0, 1].

  • y_only (bool) – Whether to only return Y channel. Default: False.

Returns

The converted YCbCr image. The output image has the same type and range as input image.

Return type

ndarray
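For reference, the BT.601 conversion mentioned above can be written out for a single pixel. The sketch assumes RGB inputs in [0, 1] and produces studio-range values on a 0 to 255 scale, using the coefficients from the linked Wikipedia section. `rgb_to_ycbcr_bt601` is a hypothetical helper, and the sketch says nothing about how the documented function rounds or rescales its uint8 and float32 outputs.

```python
def rgb_to_ycbcr_bt601(r, g, b):
    """Convert one RGB pixel in [0, 1] to BT.601 YCbCr on a 0-255 scale."""
    y = 16.0 + 65.481 * r + 128.553 * g + 24.966 * b
    cb = 128.0 - 37.797 * r - 74.203 * g + 112.0 * b
    cr = 128.0 + 112.0 * r - 93.786 * g - 18.214 * b
    return y, cb, cr


# Black and white map to the ends of the studio luma range (16 and 235),
# with both chroma channels at their neutral value of 128.
print(rgb_to_ycbcr_bt601(0.0, 0.0, 0.0))
print(rgb_to_ycbcr_bt601(1.0, 1.0, 1.0))
```

The limited luma range of 16 to 235 is the main visible difference from the full-range JPEG conversion.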

mmcv.image.solarize(img, thr=128)[source]

Solarize an image (invert all pixel values above a threshold)

Parameters
  • img (ndarray) – Image to be solarized.

  • thr (int) – Threshold for solarizing (0 - 255).

Returns

The solarized image.

Return type

ndarray
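A minimal sketch of solarizing 8-bit values. It assumes that values at or above the threshold are inverted as 255 - value; whether the threshold value itself is inverted is an assumption of this sketch. `solarize_pixel` is a hypothetical helper.

```python
def solarize_pixel(value, thr=128):
    """Invert an 8-bit value if it is at or above the threshold."""
    return value if value < thr else 255 - value


row = [0, 100, 128, 200, 255]
print([solarize_pixel(v) for v in row])
```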

mmcv.image.tensor2imgs(tensor, mean=None, std=None, to_rgb=True)[source]

Convert tensor to 3-channel images or 1-channel gray images.

Parameters
  • tensor (torch.Tensor) – Tensor that contains multiple images, shape (N, C, H, W). C can be either 3 or 1.

  • mean (tuple[float], optional) – Mean of images. If None, (0, 0, 0) will be used for tensor with 3-channel, while (0, ) for tensor with 1-channel. Defaults to None.

  • std (tuple[float], optional) – Standard deviation of images. If None, (1, 1, 1) will be used for tensor with 3-channel, while (1, ) for tensor with 1-channel. Defaults to None.

  • to_rgb (bool, optional) – Whether the tensor was converted to RGB format in the first place. If so, convert it back to BGR. For the tensor with 1 channel, it must be False. Defaults to True.

Returns

A list that contains multiple images.

Return type

list[np.ndarray]

mmcv.image.use_backend(backend)[source]

Select a backend for image decoding.

Parameters

backend (str) – The image decoding backend type. Options are cv2, pillow, turbojpeg (see https://github.com/lilohuang/PyTurboJPEG) and tifffile. turbojpeg is faster but it only supports the .jpeg file format.

mmcv.image.ycbcr2bgr(img)[source]

Convert a YCbCr image to BGR image.

The bgr version of ycbcr2rgb. It implements the ITU-R BT.601 conversion for standard-definition television. See more details in https://en.wikipedia.org/wiki/YCbCr#ITU-R_BT.601_conversion.

It differs from a similar function in cv2.cvtColor: YCrCb <-> BGR. In OpenCV, it implements a JPEG conversion. See more details in https://en.wikipedia.org/wiki/YCbCr#JPEG_conversion.

Parameters

img (ndarray) – The input image. It accepts: 1. np.uint8 type with range [0, 255]; 2. np.float32 type with range [0, 1].

Returns

The converted BGR image. The output image has the same type and range as input image.

Return type

ndarray

mmcv.image.ycbcr2rgb(img)[source]

Convert a YCbCr image to RGB image.

This function produces the same results as Matlab’s ycbcr2rgb function. It implements the ITU-R BT.601 conversion for standard-definition television. See more details in https://en.wikipedia.org/wiki/YCbCr#ITU-R_BT.601_conversion.

It differs from a similar function in cv2.cvtColor: YCrCb <-> RGB. In OpenCV, it implements a JPEG conversion. See more details in https://en.wikipedia.org/wiki/YCbCr#JPEG_conversion.

Parameters

img (ndarray) – The input image. It accepts: 1. np.uint8 type with range [0, 255]; 2. np.float32 type with range [0, 1].

Returns

The converted RGB image. The output image has the same type and range as input image.

Return type

ndarray

video

class mmcv.video.VideoReader(filename, cache_capacity=10)[source]

Video class with similar usage to a list object.

This video wrapper class provides convenient APIs to access frames. OpenCV’s VideoCapture class has an issue where jumping to a certain frame may be inaccurate. This class fixes it by checking the position after each jump. A cache is used when decoding videos, so if the same frame is visited a second time, it does not need to be decoded again as long as it is still stored in the cache.

Examples

>>> import mmcv
>>> v = mmcv.VideoReader('sample.mp4')
>>> len(v)  # get the total frame number with `len()`
120
>>> for img in v:  # v is iterable
...     mmcv.imshow(img)
>>> v[5]  # get the 6th frame
current_frame()[source]

Get the current frame (frame that is just visited).

Returns

If the video is fresh, return None, otherwise return the frame.

Return type

ndarray or None

cvt2frames(frame_dir, file_start=0, filename_tmpl='{:06d}.jpg', start=0, max_num=0, show_progress=True)[source]

Convert a video to frame images.

Parameters
  • frame_dir (str) – Output directory to store all the frame images.

  • file_start (int) – Filenames will start from the specified number.

  • filename_tmpl (str) – Filename template with the index as the placeholder.

  • start (int) – The starting frame index.

  • max_num (int) – Maximum number of frames to be written.

  • show_progress (bool) – Whether to show a progress bar.
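The interaction between file_start and filename_tmpl can be sketched as below. It assumes that the n-th written frame is named by formatting the template with n + file_start; `frame_filenames` is a hypothetical helper that only builds the paths and writes nothing.

```python
import os.path as osp


def frame_filenames(frame_dir, num_frames, file_start=0,
                    filename_tmpl="{:06d}.jpg"):
    """List the output paths for `num_frames` consecutive frames."""
    return [
        osp.join(frame_dir, filename_tmpl.format(i + file_start))
        for i in range(num_frames)
    ]


for path in frame_filenames("frames", 3, file_start=5):
    print(path)
```

Any template accepted by str.format with a single positional field works, for example 'img_{:04d}.png'.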

property fourcc

“Four character code” of the video.

Type

str

property fps

FPS of the video.

Type

float

property frame_cnt

Total frames of the video.

Type

int

get_frame(frame_id)[source]

Get frame by index.

Parameters

frame_id (int) – Index of the expected frame, 0-based.

Returns

Return the frame if successful, otherwise None.

Return type

ndarray or None

property height

Height of video frames.

Type

int

property opened

Indicate whether the video is opened.

Type

bool

property position

Current cursor position, indicating frame decoded.

Type

int

read()[source]

Read the next frame.

If the next frame has been decoded before and is in the cache, then return it directly, otherwise decode, cache and return it.

Returns

Return the frame if successful, otherwise None.

Return type

ndarray or None

property resolution

Video resolution (width, height).

Type

tuple

property vcap

The raw VideoCapture object.

Type

cv2.VideoCapture

property width

Width of video frames.

Type

int

mmcv.video.concat_video(video_list, out_file, vcodec=None, acodec=None, log_level='info', print_cmd=False)[source]

Concatenate multiple videos into a single one.

Parameters
  • video_list (list) – A list of video filenames

  • out_file (str) – Output video filename

  • vcodec (None or str) – Output video codec, None for unchanged

  • acodec (None or str) – Output audio codec, None for unchanged

  • log_level (str) – Logging level of ffmpeg.

  • print_cmd (bool) – Whether to print the final ffmpeg command.

mmcv.video.convert_video(in_file, out_file, print_cmd=False, pre_options='', **kwargs)[source]

Convert a video with ffmpeg.

This provides a general API to ffmpeg. The executed command is:

`ffmpeg -y <pre_options> -i <in_file> <options> <out_file>`

Options (kwargs) are mapped to ffmpeg commands with the following rules:

  • key=val: “-key val”

  • key=True: “-key”

  • key=False: “”

Parameters
  • in_file (str) – Input video filename.

  • out_file (str) – Output video filename.

  • pre_options (str) – Options that appear before “-i <in_file>”.

  • print_cmd (bool) – Whether to print the final ffmpeg command.
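The three mapping rules can be sketched as a small command builder. `build_ffmpeg_command` is a hypothetical helper that only assembles the command string following the template above. It does not run ffmpeg, and any special handling of particular keys is out of scope here.

```python
def build_ffmpeg_command(in_file, out_file, pre_options="", **kwargs):
    """Assemble an ffmpeg command line from keyword options."""
    options = []
    for key, val in kwargs.items():
        if isinstance(val, bool):
            if val:
                options.append(f"-{key}")  # key=True  -> "-key"
            # key=False contributes nothing
        else:
            options.append(f"-{key} {val}")  # key=val -> "-key val"
    parts = ["ffmpeg", "-y"]
    if pre_options:
        parts.append(pre_options)
    parts += ["-i", in_file, *options, out_file]
    return " ".join(parts)


print(build_ffmpeg_command("in.mp4", "out.mp4", vcodec="h264", an=True, sn=False))
```

Boolean values are checked before the generic case so that True and False are treated as flags rather than being formatted as option values.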

mmcv.video.cut_video(in_file, out_file, start=None, end=None, vcodec=None, acodec=None, log_level='info', print_cmd=False)[source]

Cut a clip from a video.

Parameters
  • in_file (str) – Input video filename.

  • out_file (str) – Output video filename.

  • start (None or float) – Start time (in seconds).

  • end (None or float) – End time (in seconds).

  • vcodec (None or str) – Output video codec, None for unchanged.

  • acodec (None or str) – Output audio codec, None for unchanged.

  • log_level (str) – Logging level of ffmpeg.

  • print_cmd (bool) – Whether to print the final ffmpeg command.

mmcv.video.dequantize_flow(dx, dy, max_val=0.02, denorm=True)[source]

Recover from quantized flow.

Parameters
  • dx (ndarray) – Quantized dx.

  • dy (ndarray) – Quantized dy.

  • max_val (float) – Maximum value used when quantizing.

  • denorm (bool) – Whether to multiply flow values with width/height.

Returns

Dequantized flow.

Return type

ndarray

mmcv.video.flow_from_bytes(content)[source]

Read dense optical flow from bytes.

Note

This function can load optical flow from the FlyingChairs, FlyingThings3D, Sintel and FlyingChairsOcc datasets, but it cannot load the data from ChairsSDHom.

Parameters

content (bytes) – Optical flow bytes got from files or other streams.

Returns

Loaded optical flow with the shape (H, W, 2).

Return type

ndarray

mmcv.video.flow_warp(img, flow, filling_value=0, interpolate_mode='nearest')[source]

Use flow to warp img.

Parameters
  • img (ndarray, float or uint8) – Image to be warped.

  • flow (ndarray, float) – Optical Flow.

  • filling_value (int) – The missing pixels will be set with filling_value.

  • interpolate_mode (str) – bilinear -> Bilinear Interpolation; nearest -> Nearest Neighbor.

Returns

Warped image with the same shape as img.

Return type

ndarray

mmcv.video.flowread(flow_or_path, quantize=False, concat_axis=0, *args, **kwargs)[source]

Read an optical flow map.

Parameters
  • flow_or_path (ndarray or str) – A flow map or filepath.

  • quantize (bool) – Whether to read a quantized pair. If set to True, the remaining args will be passed to dequantize_flow().

  • concat_axis (int) – The axis that dx and dy are concatenated, can be either 0 or 1. Ignored if quantize is False.

Returns

Optical flow represented as a (h, w, 2) numpy array

Return type

ndarray

mmcv.video.flowwrite(flow, filename, quantize=False, concat_axis=0, *args, **kwargs)[source]

Write optical flow to file.

If the flow is not quantized, it will be saved as a .flo file losslessly, otherwise a jpeg image which is lossy but of much smaller size. (dx and dy will be concatenated horizontally into a single image if quantize is True.)

Parameters
  • flow (ndarray) – (h, w, 2) array of optical flow.

  • filename (str) – Output filepath.

  • quantize (bool) – Whether to quantize the flow and save it to 2 jpeg images. If set to True, remaining args will be passed to quantize_flow().

  • concat_axis (int) – The axis that dx and dy are concatenated, can be either 0 or 1. Ignored if quantize is False.

mmcv.video.frames2video(frame_dir, video_file, fps=30, fourcc='XVID', filename_tmpl='{:06d}.jpg', start=0, end=0, show_progress=True)[source]

Read the frame images from a directory and join them as a video.

Parameters
  • frame_dir (str) – The directory containing video frames.

  • video_file (str) – Output filename.

  • fps (float) – FPS of the output video.

  • fourcc (str) – Fourcc of the output video, this should be compatible with the output file type.

  • filename_tmpl (str) – Filename template with the index as the variable.

  • start (int) – Starting frame index.

  • end (int) – Ending frame index.

  • show_progress (bool) – Whether to show a progress bar.

mmcv.video.quantize_flow(flow, max_val=0.02, norm=True)[source]

Quantize flow to [0, 255].

After this step, the size of flow will be much smaller, and can be dumped as jpeg images.

Parameters
  • flow (ndarray) – (h, w, 2) array of optical flow.

  • max_val (float) – Maximum value of flow, values beyond [-max_val, max_val] will be truncated.

  • norm (bool) – Whether to divide flow values by image width/height.

Returns

Quantized dx and dy.

Return type

tuple[ndarray]

mmcv.video.resize_video(in_file, out_file, size=None, ratio=None, keep_ar=False, log_level='info', print_cmd=False)[source]

Resize a video.

Parameters
  • in_file (str) – Input video filename.

  • out_file (str) – Output video filename.

  • size (tuple) – Expected size (w, h), eg, (320, 240) or (320, -1).

  • ratio (tuple or float) – Expected resize ratio, (2, 0.5) means (w*2, h*0.5).

  • keep_ar (bool) – Whether to keep original aspect ratio.

  • log_level (str) – Logging level of ffmpeg.

  • print_cmd (bool) – Whether to print the final ffmpeg command.

mmcv.video.sparse_flow_from_bytes(content)[source]

Read the optical flow in KITTI datasets from bytes.

This function is adapted from the RAFT code that loads the KITTI datasets.

Parameters

content (bytes) – Optical flow bytes got from files or other streams.

Returns

Loaded optical flow with the shape (H, W, 2) and flow valid mask with the shape (H, W).

Return type

Tuple(ndarray, ndarray)

arraymisc

mmcv.arraymisc.dequantize(arr, min_val, max_val, levels, dtype=<class 'numpy.float64'>)[source]

Dequantize an array.

Parameters
  • arr (ndarray) – Input array.

  • min_val (scalar) – Minimum value to be clipped.

  • max_val (scalar) – Maximum value to be clipped.

  • levels (int) – Quantization levels.

  • dtype (np.type) – The type of the dequantized array.

Returns

Dequantized array.

Return type

ndarray

mmcv.arraymisc.quantize(arr, min_val, max_val, levels, dtype=<class 'numpy.int64'>)[source]

Quantize an array of (-inf, inf) to [0, levels-1].

Parameters
  • arr (ndarray) – Input array.

  • min_val (scalar) – Minimum value to be clipped.

  • max_val (scalar) – Maximum value to be clipped.

  • levels (int) – Quantization levels.

  • dtype (np.type) – The type of the quantized array.

Returns

Quantized array.

Return type

ndarray
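A scalar sketch of the quantize and dequantize round trip may help clarify the parameters. It assumes uniform bins over [min_val, max_val], clipping of out-of-range inputs, and reconstruction at the bin centre. `quantize_value` and `dequantize_value` are hypothetical helpers operating on plain numbers rather than arrays.

```python
import math


def quantize_value(x, min_val, max_val, levels):
    """Map a real number to an integer bin index in [0, levels - 1]."""
    if levels <= 1 or min_val >= max_val:
        raise ValueError("need levels > 1 and min_val < max_val")
    clipped = min(max(x, min_val), max_val) - min_val
    index = int(math.floor(levels * clipped / (max_val - min_val)))
    return min(index, levels - 1)  # x == max_val falls into the last bin


def dequantize_value(q, min_val, max_val, levels):
    """Map a bin index back to the centre of its bin."""
    return (q + 0.5) * (max_val - min_val) / levels + min_val


q = quantize_value(0.3, 0.0, 1.0, levels=4)
print(q, dequantize_value(q, 0.0, 1.0, levels=4))
```

Under these assumptions the round-trip error for in-range inputs is at most half a bin width, (max_val - min_val) / (2 * levels).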

visualization

class mmcv.visualization.Color(value)[source]

An enum that defines common colors.

Contains red, green, blue, cyan, yellow, magenta, white and black.

mmcv.visualization.color_val(color)[source]

Convert various input to color tuples.

Parameters

color (Color/str/tuple/int/ndarray) – Color inputs

Returns

A tuple of 3 integers indicating BGR channels.

Return type

tuple[int]

mmcv.visualization.flow2rgb(flow, color_wheel=None, unknown_thr=1000000.0)[source]

Convert flow map to RGB image.

Parameters
  • flow (ndarray) – Array of optical flow.

  • color_wheel (ndarray or None) – Color wheel used to map flow field to RGB colorspace. Default color wheel will be used if not specified.

  • unknown_thr (float) – Values above this threshold will be marked as unknown and thus ignored.

Returns

RGB image that can be visualized.

Return type

ndarray

mmcv.visualization.flowshow(flow, win_name='', wait_time=0)[source]

Show optical flow.

Parameters
  • flow (ndarray or str) – The optical flow to be displayed.

  • win_name (str) – The window name.

  • wait_time (int) – Value of waitKey param.

mmcv.visualization.imshow(img, win_name='', wait_time=0)[source]

Show an image.

Parameters
  • img (str or ndarray) – The image to be displayed.

  • win_name (str) – The window name.

  • wait_time (int) – Value of waitKey param.

mmcv.visualization.imshow_bboxes(img, bboxes, colors='green', top_k=-1, thickness=1, show=True, win_name='', wait_time=0, out_file=None)[source]

Draw bboxes on an image.

Parameters
  • img (str or ndarray) – The image to be displayed.

  • bboxes (list or ndarray) – A list of ndarray of shape (k, 4).

  • colors (list[str or tuple or Color]) – A list of colors.

  • top_k (int) – Plot the first k bboxes only if set positive.

  • thickness (int) – Thickness of lines.

  • show (bool) – Whether to show the image.

  • win_name (str) – The window name.

  • wait_time (int) – Value of waitKey param.

  • out_file (str, optional) – The filename to write the image.

Returns

The image with bboxes drawn on it.

Return type

ndarray

mmcv.visualization.imshow_det_bboxes(img, bboxes, labels, class_names=None, score_thr=0, bbox_color='green', text_color='green', thickness=1, font_scale=0.5, show=True, win_name='', wait_time=0, out_file=None)[source]

Draw bboxes and class labels (with scores) on an image.

Parameters
  • img (str or ndarray) – The image to be displayed.

  • bboxes (ndarray) – Bounding boxes (with scores), shaped (n, 4) or (n, 5).

  • labels (ndarray) – Labels of bboxes.

  • class_names (list[str]) – Names of each class.

  • score_thr (float) – Minimum score of bboxes to be shown.

  • bbox_color (str or tuple or Color) – Color of bbox lines.

  • text_color (str or tuple or Color) – Color of texts.

  • thickness (int) – Thickness of lines.

  • font_scale (float) – Font scales of texts.

  • show (bool) – Whether to show the image.

  • win_name (str) – The window name.

  • wait_time (int) – Value of waitKey param.

  • out_file (str or None) – The filename to write the image.

Returns

The image with bboxes drawn on it.

Return type

ndarray

mmcv.visualization.make_color_wheel(bins=None)[source]

Build a color wheel.

Parameters

bins (list or tuple, optional) – Specify the number of bins for each color range, corresponding to six ranges: red -> yellow, yellow -> green, green -> cyan, cyan -> blue, blue -> magenta, magenta -> red. [15, 6, 4, 11, 13, 6] is used for default (see Middlebury).

Returns

Color wheel of shape (total_bins, 3).

Return type

ndarray

utils

class mmcv.utils.BuildExtension(*args, **kwargs)[source]

A custom setuptools build extension.

This setuptools.build_ext subclass takes care of passing the minimum required compiler flags (e.g. -std=c++14) as well as mixed C++/CUDA compilation (and support for CUDA files in general).

When using BuildExtension, it is allowed to supply a dictionary for extra_compile_args (rather than the usual list) that maps from languages (cxx or nvcc) to a list of additional compiler flags to supply to the compiler. This makes it possible to supply different flags to the C++ and CUDA compiler during mixed compilation.

use_ninja (bool): If use_ninja is True (default), then we attempt to build using the Ninja backend. Ninja greatly speeds up compilation compared to the standard setuptools.build_ext. Falls back to the standard distutils backend if Ninja is not available.

Note

By default, the Ninja backend uses #CPUS + 2 workers to build the extension. This may use up too many resources on some systems. One can control the number of workers by setting the MAX_JOBS environment variable to a non-negative number.

finalize_options() → None[source]

Set final values for all the options that this command supports. This is always called as late as possible, ie. after any option assignments from the command-line or from other commands have been done. Thus, this is the place to code option dependencies: if ‘foo’ depends on ‘bar’, then it is safe to set ‘foo’ from ‘bar’ as long as ‘foo’ still has the same value it was assigned in ‘initialize_options()’.

This method must be implemented by all command classes.

get_ext_filename(ext_name)[source]

Convert the name of an extension (eg. “foo.bar”) into the name of the file from which it will be loaded (eg. “foo/bar.so”, or “foobar.pyd”).

classmethod with_options(**options)[source]

Returns a subclass with alternative constructor that extends any original keyword arguments to the original constructor with the given options.

mmcv.utils.CUDAExtension(name, sources, *args, **kwargs)[source]

Creates a setuptools.Extension for CUDA/C++.

Convenience method that creates a setuptools.Extension with the bare minimum (but often sufficient) arguments to build a CUDA/C++ extension. This includes the CUDA include path, library path and runtime library.

All arguments are forwarded to the setuptools.Extension constructor.

Example

>>> from setuptools import setup
>>> from torch.utils.cpp_extension import BuildExtension, CUDAExtension
>>> setup(
...     name='cuda_extension',
...     ext_modules=[
...         CUDAExtension(
...             name='cuda_extension',
...             sources=['extension.cpp', 'extension_kernel.cu'],
...             extra_compile_args={'cxx': ['-g'],
...                                 'nvcc': ['-O2']})
...     ],
...     cmdclass={
...         'build_ext': BuildExtension
...     })

Compute capabilities:

By default the extension will be compiled to run on all archs of the cards visible during the building process of the extension, plus PTX. If down the road a new card is installed the extension may need to be recompiled. If a visible card has a compute capability (CC) that’s newer than the newest version for which your nvcc can build fully-compiled binaries, PyTorch will make nvcc fall back to building kernels with the newest version of PTX your nvcc does support (see below for details on PTX).

You can override the default behavior using TORCH_CUDA_ARCH_LIST to explicitly specify which CCs you want the extension to support:

TORCH_CUDA_ARCH_LIST="6.1 8.6" python build_my_extension.py
TORCH_CUDA_ARCH_LIST="5.2 6.0 6.1 7.0 7.5 8.0 8.6+PTX" python build_my_extension.py

The +PTX option causes extension kernel binaries to include PTX instructions for the specified CC. PTX is an intermediate representation that allows kernels to runtime-compile for any CC >= the specified CC (for example, 8.6+PTX generates PTX that can runtime-compile for any GPU with CC >= 8.6). This improves your binary’s forward compatibility. However, relying on older PTX to provide forward compat by runtime-compiling for newer CCs can modestly reduce performance on those newer CCs. If you know exact CC(s) of the GPUs you want to target, you’re always better off specifying them individually. For example, if you want your extension to run on 8.0 and 8.6, “8.0+PTX” would work functionally because it includes PTX that can runtime-compile for 8.6, but “8.0 8.6” would be better.

Note that while it’s possible to include all supported archs, the more archs get included the slower the building process will be, as it will build a separate kernel image for each arch.

Note that CUDA-11.5 nvcc will hit an internal compiler error while parsing torch/extension.h on Windows. To work around the issue, move the Python binding logic to a pure C++ file.

Example use:
>>> #include <ATen/ATen.h>
>>> at::Tensor SigmoidAlphaBlendForwardCuda(....)
Instead of:
>>> #include <torch/extension.h>
>>> torch::Tensor SigmoidAlphaBlendForwardCuda(...)

Currently open issue for nvcc bug: https://github.com/pytorch/pytorch/issues/69460

Complete workaround code example: https://github.com/facebookresearch/pytorch3d/commit/cb170ac024a949f1f9614ffe6af1c38d972f7d48

class mmcv.utils.Config(cfg_dict=None, cfg_text=None, filename=None)[source]

A facility for config and config files.

It supports common file formats as configs: python/json/yaml. The interface is the same as a dict object and also allows accessing config values as attributes.

Example

>>> cfg = Config(dict(a=1, b=dict(b1=[0, 1])))
>>> cfg.a
1
>>> cfg.b
{'b1': [0, 1]}
>>> cfg.b.b1
[0, 1]
>>> cfg = Config.fromfile('tests/data/config/a.py')
>>> cfg.filename
"/home/kchen/projects/mmcv/tests/data/config/a.py"
>>> cfg.item4
'test'
>>> cfg
"Config [path: /home/kchen/projects/mmcv/tests/data/config/a.py]: "
"{'item1': [1, 2], 'item2': {'a': 0}, 'item3': True, 'item4': 'test'}"
static auto_argparser(description=None)[source]

Generate argparser from config file automatically (experimental)

static fromstring(cfg_str, file_format)[source]

Generate config from config str.

Parameters
  • cfg_str (str) – Config str.

  • file_format (str) – Config file format corresponding to the config str. Only the py, yml, yaml and json formats are currently supported.

Returns

Config obj.

Return type

Config

merge_from_dict(options, allow_list_keys=True)[source]

Merge a dict of options into cfg_dict.

Merge the dict parsed by MultipleKVAction into this cfg.

Examples

>>> options = {'model.backbone.depth': 50,
...            'model.backbone.with_cp': True}
>>> cfg = Config(dict(model=dict(backbone=dict(type='ResNet'))))
>>> cfg.merge_from_dict(options)
>>> cfg_dict = super(Config, cfg).__getattribute__('_cfg_dict')
>>> assert cfg_dict == dict(
...     model=dict(backbone=dict(type='ResNet', depth=50, with_cp=True)))
>>> # Merge list element
>>> cfg = Config(dict(pipeline=[
...     dict(type='LoadImage'), dict(type='LoadAnnotations')]))
>>> options = dict(pipeline={'0': dict(type='SelfLoadImage')})
>>> cfg.merge_from_dict(options, allow_list_keys=True)
>>> cfg_dict = super(Config, cfg).__getattribute__('_cfg_dict')
>>> assert cfg_dict == dict(pipeline=[
...     dict(type='SelfLoadImage'), dict(type='LoadAnnotations')])
Parameters
  • options (dict) – dict of configs to merge from.

  • allow_list_keys (bool) – If True, int string keys (e.g. ‘0’, ‘1’) are allowed in options and will replace the element of the corresponding index in the config if the config is a list. Default: True.
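How dotted keys such as 'model.backbone.depth' address nested entries can be sketched with plain dicts. `expand_dotted_keys` and `merge_nested` are hypothetical helpers that illustrate the idea only: they work on ordinary dict objects and do not handle the list-index keys enabled by allow_list_keys.

```python
def expand_dotted_keys(options):
    """Turn {'a.b.c': 1} into {'a': {'b': {'c': 1}}}."""
    nested = {}
    for full_key, value in options.items():
        *parents, leaf = full_key.split(".")
        node = nested
        for key in parents:
            node = node.setdefault(key, {})
        node[leaf] = value
    return nested


def merge_nested(base, update):
    """Recursively merge `update` into `base` in place and return `base`."""
    for key, value in update.items():
        if isinstance(value, dict) and isinstance(base.get(key), dict):
            merge_nested(base[key], value)
        else:
            base[key] = value
    return base


cfg = {"model": {"backbone": {"type": "ResNet"}}}
options = {"model.backbone.depth": 50, "model.backbone.with_cp": True}
print(merge_nested(cfg, expand_dotted_keys(options)))
```

Existing keys that the options do not mention, such as type here, survive the merge because only leaves named in the options are overwritten.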

class mmcv.utils.ConfigDict(*args, **kwargs)[source]
mmcv.utils.CppExtension(name, sources, *args, **kwargs)[source]

Creates a setuptools.Extension for C++.

Convenience method that creates a setuptools.Extension with the bare minimum (but often sufficient) arguments to build a C++ extension.

All arguments are forwarded to the setuptools.Extension constructor.

Example

>>> from setuptools import setup
>>> from torch.utils.cpp_extension import BuildExtension, CppExtension
>>> setup(
...     name='extension',
...     ext_modules=[
...         CppExtension(
...             name='extension',
...             sources=['extension.cpp'],
...             extra_compile_args=['-g']),
...     ],
...     cmdclass={
...         'build_ext': BuildExtension
...     })
class mmcv.utils.DataLoader(dataset: torch.utils.data.dataset.Dataset[torch.utils.data.dataloader.T_co], batch_size: Optional[int] = 1, shuffle: bool = False, sampler: Optional[Union[torch.utils.data.sampler.Sampler, Iterable]] = None, batch_sampler: Optional[Union[torch.utils.data.sampler.Sampler[Sequence], Iterable[Sequence]]] = None, num_workers: int = 0, collate_fn: Optional[Callable[[List[torch.utils.data.dataloader.T]], Any]] = None, pin_memory: bool = False, drop_last: bool = False, timeout: float = 0, worker_init_fn: Optional[Callable[[int], None]] = None, multiprocessing_context=None, generator=None, *, prefetch_factor: int = 2, persistent_workers: bool = False)[source]

Data loader. Combines a dataset and a sampler, and provides an iterable over the given dataset.

The DataLoader supports both map-style and iterable-style datasets with single- or multi-process loading, customizing loading order and optional automatic batching (collation) and memory pinning.

See torch.utils.data documentation page for more details.

Parameters
  • dataset (Dataset) – dataset from which to load the data.

  • batch_size (int, optional) – how many samples per batch to load (default: 1).

  • shuffle (bool, optional) – set to True to have the data reshuffled at every epoch (default: False).

  • sampler (Sampler or Iterable, optional) – defines the strategy to draw samples from the dataset. Can be any Iterable with __len__ implemented. If specified, shuffle must not be specified.

  • batch_sampler (Sampler or Iterable, optional) – like sampler, but returns a batch of indices at a time. Mutually exclusive with batch_size, shuffle, sampler, and drop_last.

  • num_workers (int, optional) – how many subprocesses to use for data loading. 0 means that the data will be loaded in the main process. (default: 0)

  • collate_fn (callable, optional) – merges a list of samples to form a mini-batch of Tensor(s). Used when using batched loading from a map-style dataset.

  • pin_memory (bool, optional) – If True, the data loader will copy Tensors into CUDA pinned memory before returning them. If your data elements are a custom type, or your collate_fn returns a batch that is a custom type, see the example below.

  • drop_last (bool, optional) – set to True to drop the last incomplete batch, if the dataset size is not divisible by the batch size. If False and the size of dataset is not divisible by the batch size, then the last batch will be smaller. (default: False)

  • timeout (numeric, optional) – if positive, the timeout value for collecting a batch from workers. Should always be non-negative. (default: 0)

  • worker_init_fn (callable, optional) – If not None, this will be called on each worker subprocess with the worker id (an int in [0, num_workers - 1]) as input, after seeding and before data loading. (default: None)

  • generator (torch.Generator, optional) – If not None, this RNG will be used by RandomSampler to generate random indexes and multiprocessing to generate base_seed for workers. (default: None)

  • prefetch_factor (int, optional, keyword-only arg) – Number of samples loaded in advance by each worker. 2 means there will be a total of 2 * num_workers samples prefetched across all workers. (default: 2)

  • persistent_workers (bool, optional) – If True, the data loader will not shut down the worker processes after a dataset has been consumed once. This keeps the workers' Dataset instances alive. (default: False)

Warning

If the spawn start method is used, worker_init_fn cannot be an unpicklable object, e.g., a lambda function. See multiprocessing-best-practices for more details related to multiprocessing in PyTorch.

Warning

len(dataloader) heuristic is based on the length of the sampler used. When dataset is an IterableDataset, it instead returns an estimate based on len(dataset) / batch_size, with proper rounding depending on drop_last, regardless of multi-process loading configurations. This represents the best guess PyTorch can make because PyTorch trusts user dataset code in correctly handling multi-process loading to avoid duplicate data.

However, if sharding results in multiple workers having incomplete last batches, this estimate can still be inaccurate, because (1) an otherwise complete batch can be broken into multiple ones and (2) more than one batch worth of samples can be dropped when drop_last is set. Unfortunately, PyTorch can not detect such cases in general.

See Dataset Types for more details on these two types of datasets and how IterableDataset interacts with Multi-process data loading.
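
The rounding rule described in this warning comes down to simple arithmetic. The following is a minimal sketch in plain Python, not PyTorch's implementation; the helper name estimated_num_batches is hypothetical.

```python
import math


def estimated_num_batches(dataset_len: int, batch_size: int, drop_last: bool) -> int:
    """Estimate the batch count from the dataset length (hypothetical helper)."""
    if drop_last:
        # The incomplete final batch is discarded, so round down.
        return dataset_len // batch_size
    # The incomplete final batch is kept, so round up.
    return math.ceil(dataset_len / batch_size)


print(estimated_num_batches(10, 4, drop_last=False))  # 3
print(estimated_num_batches(10, 4, drop_last=True))   # 2
```

With sharding across workers, the true number of batches can differ from this estimate, as the warning explains.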

Warning

See the reproducibility, dataloader-workers-random-seed, and data-loading-randomness notes for questions related to random seeds.

class mmcv.utils.DictAction(option_strings, dest, nargs=None, const=None, default=None, type=None, choices=None, required=False, help=None, metavar=None)[source]

argparse action to split an argument into KEY=VALUE form on the first = and append it to a dictionary. List options can be passed as comma-separated values, e.g. ‘KEY=V1,V2,V3’, or with explicit brackets, e.g. ‘KEY=[V1,V2,V3]’. It also supports nested brackets to build list/tuple values, e.g. ‘KEY=[(V1,V2),(V3,V4)]’.
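
To illustrate the KEY=VALUE splitting, here is a simplified stand-in built on the standard argparse module. It is a sketch, not the mmcv implementation: the class name SimpleDictAction is hypothetical, values are kept as strings, and nested brackets are not handled.

```python
import argparse


class SimpleDictAction(argparse.Action):
    """Simplified stand-in for DictAction (hypothetical; flat lists only)."""

    def __call__(self, parser, namespace, values, option_string=None):
        options = dict(getattr(namespace, self.dest, None) or {})
        for kv in values:
            # Split on the first '=' only, so values may contain '='.
            key, val = kv.split('=', maxsplit=1)
            # Drop optional enclosing brackets: 'KEY=[V1,V2]' -> 'V1,V2'.
            if val.startswith('[') and val.endswith(']'):
                val = val[1:-1]
            items = val.split(',')
            # A single item stays a scalar; several items become a list.
            options[key] = items[0] if len(items) == 1 else items
        setattr(namespace, self.dest, options)


parser = argparse.ArgumentParser()
parser.add_argument('--options', nargs='+', action=SimpleDictAction)
args = parser.parse_args(['--options', 'lr=0.01', 'steps=[8,11]'])
print(args.options)  # {'lr': '0.01', 'steps': ['8', '11']}
```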

mmcv.utils.PoolDataLoader

alias of torch.utils.data.dataloader.DataLoader

class mmcv.utils.ProgressBar(task_num=0, bar_width=50, start=True, file=<_io.TextIOWrapper name='<stdout>' mode='w' encoding='UTF-8'>)[source]

A progress bar which can print the progress.

class mmcv.utils.Registry(name, build_func=None, parent=None, scope=None)[source]

A registry to map strings to classes.

Registered object could be built from registry.

Example

>>> MODELS = Registry('models')
>>> @MODELS.register_module()
... class ResNet:
...     pass
>>> resnet = MODELS.build(dict(type='ResNet'))

Please refer to https://mmcv.readthedocs.io/en/latest/understand_mmcv/registry.html for advanced usage.

Parameters
  • name (str) – Registry name.

  • build_func (func, optional) – Build function to construct an instance from the Registry. build_from_cfg() is used if neither parent nor build_func is specified. If parent is specified and build_func is not given, build_func will be inherited from parent. Default: None.

  • parent (Registry, optional) – Parent registry. A class registered in a child registry can be built from its parent. Default: None.

  • scope (str, optional) – The scope of registry. It is the key to search for children registry. If not specified, scope will be the name of the package where class is defined, e.g. mmdet, mmcls, mmseg. Default: None.

get(key)[source]

Get the registry record.

Parameters

key (str) – The class name in string format.

Returns

The corresponding class.

Return type

class

static infer_scope()[source]

Infer the scope of registry.

The name of the package where registry is defined will be returned.

Example

>>> # in mmdet/models/backbone/resnet.py
>>> MODELS = Registry('models')
>>> @MODELS.register_module()
... class ResNet:
...     pass

The scope of ResNet will be mmdet.

Returns

The inferred scope name.

Return type

str

register_module(name=None, force=False, module=None)[source]

Register a module.

A record will be added to self._module_dict, whose key is the class name or the specified name, and value is the class itself. It can be used as a decorator or a normal function.

Example

>>> backbones = Registry('backbone')
>>> @backbones.register_module()
... class ResNet:
...     pass

>>> backbones = Registry('backbone')
>>> @backbones.register_module(name='mnet')
... class MobileNet:
...     pass

>>> backbones = Registry('backbone')
>>> class ResNet:
...     pass
>>> backbones.register_module(module=ResNet)

Parameters
  • name (str | None) – The module name to be registered. If not specified, the class name will be used.

  • force (bool, optional) – Whether to override an existing class with the same name. Default: False.

  • module (type) – Module class to be registered.

static split_scope_key(key)[source]

Split scope and key.

The first scope will be split from key.

Examples

>>> Registry.split_scope_key('mmdet.ResNet')
('mmdet', 'ResNet')
>>> Registry.split_scope_key('ResNet')
(None, 'ResNet')

Returns

The former element is the first scope of the key, which can be None. The latter is the remaining key.

Return type

tuple[str | None, str]
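
The splitting rule above can be sketched in a few lines of plain Python. This is an illustration of the documented behavior, not the library source; the function name split_scope_key_sketch is hypothetical.

```python
from typing import Optional, Tuple


def split_scope_key_sketch(key: str) -> Tuple[Optional[str], str]:
    """Split 'scope.name' at the first dot (sketch of the documented behavior)."""
    scope, sep, rest = key.partition('.')
    if sep:
        return scope, rest
    # No dot present: there is no scope, and the whole string is the key.
    return None, key


print(split_scope_key_sketch('mmdet.ResNet'))         # ('mmdet', 'ResNet')
print(split_scope_key_sketch('mmdet.models.ResNet'))  # ('mmdet', 'models.ResNet')
print(split_scope_key_sketch('ResNet'))               # (None, 'ResNet')
```

Only the first scope is split off, so any remaining dots stay in the key.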

class mmcv.utils.SyncBatchNorm(num_features: int, eps: float = 1e-05, momentum: float = 0.1, affine: bool = True, track_running_stats: bool = True, process_group: Optional[Any] = None, device=None, dtype=None)[source]
class mmcv.utils.Timer(start=True, print_tmpl=None)[source]

A flexible Timer class.

Examples

>>> import time
>>> import mmcv
>>> with mmcv.Timer():
...     # simulate a code block that will run for 1s
...     time.sleep(1)
1.000
>>> with mmcv.Timer(print_tmpl='it takes {:.1f} seconds'):
...     # simulate a code block that will run for 1s
...     time.sleep(1)
it takes 1.0 seconds
>>> timer = mmcv.Timer()
>>> time.sleep(0.5)
>>> print(timer.since_start())
0.500
>>> time.sleep(0.5)
>>> print(timer.since_last_check())
0.500
>>> print(timer.since_start())
1.000

property is_running

Indicates whether the timer is running.

Type

bool

since_last_check()[source]

Time since the last check.

Both since_start() and since_last_check() count as checking operations.

Returns

Time in seconds.

Return type

float

since_start()[source]

Total time since the timer was started.

Returns

Time in seconds.

Return type

float

start()[source]

Start the timer.
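
The difference between the two checking operations can be seen in a small stand-in class. This is a sketch under assumptions, not the mmcv Timer: the class name SketchTimer is hypothetical, and the choice of time.monotonic() as the clock is an assumption of the sketch.

```python
import time


class SketchTimer:
    """Hypothetical stand-in illustrating the two checking operations."""

    def __init__(self):
        self._t_start = time.monotonic()
        self._t_last = self._t_start

    def since_start(self) -> float:
        # Checking operation: records "now" as the last check.
        self._t_last = time.monotonic()
        return self._t_last - self._t_start

    def since_last_check(self) -> float:
        # Time elapsed since the previous since_start()/since_last_check() call.
        now = time.monotonic()
        elapsed = now - self._t_last
        self._t_last = now
        return elapsed


timer = SketchTimer()
time.sleep(0.05)
first = timer.since_start()        # time since construction
time.sleep(0.05)
second = timer.since_last_check()  # time since the since_start() call above
total = timer.since_start()        # time since construction again
print(f'{first:.2f} {second:.2f} {total:.2f}')
```

Because since_start() also counts as a check, the second reading covers only the second sleep rather than the whole run.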

exception mmcv.utils.TimerError(message)[source]
mmcv.utils.assert_attrs_equal(obj: Any, expected_attrs: Dict[str, Any]) → bool[source]

Check if the attributes of a class object are correct.

Parameters
  • obj (object) – Class object to be checked.

  • expected_attrs (Dict[str, Any]) – Dict of the expected attrs.

Returns

Whether the attributes of the class object are correct.

Return type

bool

mmcv.utils.assert_dict_contains_subset(dict_obj: Dict[Any, Any], expected_subset: Dict[Any, Any]) → bool[source]

Check if the dict_obj contains the expected_subset.

Parameters
  • dict_obj (Dict[Any, Any]) – Dict object to be checked.

  • expected_subset (Dict[Any, Any]) – Subset expected to be contained in dict_obj.

Returns

Whether the dict_obj contains the expected_subset.

Return type

bool

mmcv.utils.assert_dict_has_keys(obj: Dict[str, Any], expected_keys: List[str]) → bool[source]

Check if the obj has all the expected_keys.

Parameters
  • obj (Dict[str, Any]) – Object to be checked.

  • expected_keys (List[str]) – Keys expected to be contained in the keys of the obj.

Returns

Whether the obj has the expected keys.

Return type

bool

mmcv.utils.assert_is_norm_layer(module) → bool[source]

Check if the module is a norm layer.

Parameters

module (nn.Module) – The module to be checked.

Returns

Whether the module is a norm layer.

Return type

bool

mmcv.utils.assert_keys_equal(result_keys: List[str], target_keys: List[str]) → bool[source]

Check if target_keys is equal to result_keys.

Parameters
  • result_keys (List[str]) – Result keys to be checked.

  • target_keys (List[str]) – Target keys to be checked.

Returns

Whether target_keys is equal to result_keys.

Return type

bool

mmcv.utils.assert_params_all_zeros(module) → bool[source]

Check if the parameters of the module are all zeros.

Parameters

module (nn.Module) – The module to be checked.

Returns

Whether the parameters of the module are all zeros.

Return type

bool

mmcv.utils.build_from_cfg(cfg, registry, default_args=None)[source]

Build a module from config dict.

Parameters
  • cfg (dict) – Config dict. It should at least contain the key “type”.

  • registry (Registry) – The registry to search the type from.

  • default_args (dict, optional) – Default initialization arguments.

Returns

The constructed object.

Return type

object
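
The build step can be sketched as follows. This is a simplified illustration, not the mmcv source: the function name build_from_cfg_sketch is hypothetical, a plain dict stands in for the Registry, and the rule that cfg values take precedence over default_args is an assumption of the sketch.

```python
def build_from_cfg_sketch(cfg, registry, default_args=None):
    """Build an object from a config dict (simplified, hypothetical sketch).

    Here ``registry`` is a plain dict mapping type names to classes,
    standing in for a Registry.
    """
    if 'type' not in cfg:
        raise KeyError('cfg must contain the key "type"')
    args = dict(cfg)  # copy so the caller's dict is not modified
    if default_args is not None:
        for name, value in default_args.items():
            args.setdefault(name, value)  # assumed: cfg values take precedence
    obj_cls = registry[args.pop('type')]
    return obj_cls(**args)


class ResNet:
    def __init__(self, depth, num_stages=4):
        self.depth = depth
        self.num_stages = num_stages


registry = {'ResNet': ResNet}
model = build_from_cfg_sketch(dict(type='ResNet', depth=50), registry,
                              default_args=dict(num_stages=3))
print(model.depth, model.num_stages)  # 50 3
```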

mmcv.utils.check_prerequisites(prerequisites, checker, msg_tmpl='Prerequisites "{}" are required in method "{}" but not found, please install them first.')[source]

A decorator factory to check if prerequisites are satisfied.

Parameters
  • prerequisites (str or list[str]) – Prerequisites to be checked.

  • checker (callable) – The checker method that returns True if a prerequisite is met, False otherwise.

  • msg_tmpl (str) – The message template with two variables.

Returns

A specific decorator.

Return type

decorator
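
A decorator factory of this shape might look like the sketch below. It is written against the standard library only; the names check_prerequisites_sketch and package_installed are hypothetical, and the exception type raised for missing prerequisites is an assumption of the sketch.

```python
import functools
import importlib.util


def check_prerequisites_sketch(prerequisites, checker):
    """Decorator factory: refuse to run a function if a prerequisite is missing."""
    if isinstance(prerequisites, str):
        prerequisites = [prerequisites]

    def decorator(func):
        @functools.wraps(func)
        def wrapped(*args, **kwargs):
            missing = [item for item in prerequisites if not checker(item)]
            if missing:
                names = ', '.join(missing)
                # The error type is an assumption of this sketch.
                raise RuntimeError(
                    f'Prerequisites "{names}" are required in method '
                    f'"{func.__name__}" but not found.')
            return func(*args, **kwargs)
        return wrapped
    return decorator


def package_installed(name):
    # True if the import system can locate a top-level module by this name.
    return importlib.util.find_spec(name) is not None


@check_prerequisites_sketch('json', package_installed)
def dumps_one():
    import json
    return json.dumps(1)


print(dumps_one())  # 1
```

The check runs each time the decorated function is called, not when it is decorated.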

mmcv.utils.check_python_script(cmd)[source]

Run the python cmd script with __main__. Unlike os.system, this function executes code in the current process, so that it can be tracked by coverage tools. Currently it supports two forms:

  • ./tests/data/scripts/hello.py zz

  • python tests/data/scripts/hello.py zz

mmcv.utils.check_time(timer_id)[source]

Add check points in a single line.

This method is suitable for running a task on a list of items. A timer will be registered when the method is called for the first time.

Examples

>>> import time
>>> import mmcv
>>> for i in range(1, 6):
...     # simulate a code block
...     time.sleep(i)
...     mmcv.check_time('task1')
2.000
3.000
4.000
5.000

Parameters

timer_id (str) – Timer identifier.

mmcv.utils.collect_env()[source]

Collect the information of the running environments.

Returns

The environment information. The following fields are contained.

  • sys.platform: The variable of sys.platform.

  • Python: Python version.

  • CUDA available: Bool, indicating if CUDA is available.

  • GPU devices: Device type of each GPU.

  • CUDA_HOME (optional): The env var CUDA_HOME.

  • NVCC (optional): NVCC version.

  • GCC: GCC version, “n/a” if GCC is not installed.

  • PyTorch: PyTorch version.

  • PyTorch compiling details: The output of torch.__config__.show().

  • TorchVision (optional): TorchVision version.

  • OpenCV: OpenCV version.

  • MMCV: MMCV version.

  • MMCV Compiler: The GCC version for compiling MMCV ops.

  • MMCV CUDA Compiler: The CUDA version for compiling MMCV ops.

Return type

dict

mmcv.utils.concat_list(in_list)[source]

Concatenate a list of lists into a single list.

Parameters

in_list (list) – The list of lists to be merged.

Returns

The concatenated flat list.

Return type

list
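
One way to flatten a single level of nesting with the standard library is shown below. It is a sketch of the described behavior, and the name concat_list_sketch is hypothetical.

```python
import itertools


def concat_list_sketch(in_list):
    """Flatten one level of nesting, e.g. [[1, 2], [3]] -> [1, 2, 3]."""
    return list(itertools.chain.from_iterable(in_list))


print(concat_list_sketch([[1, 2], [3], []]))  # [1, 2, 3]
```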

mmcv.utils.deprecated_api_warning(name_dict, cls_name=None)[source]

A decorator to check whether some arguments are deprecated and, if so, replace the deprecated src_arg_name with dst_arg_name.

Parameters

name_dict (dict) – key (str): Deprecated argument names. val (str): Expected argument names.

Returns

New function.

Return type

func
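
The renaming idea can be sketched as a small decorator. This simplified version handles keyword arguments only and is not the mmcv implementation; the names rename_deprecated_kwargs, resize, img_scale and scale are hypothetical.

```python
import functools
import warnings


def rename_deprecated_kwargs(name_dict):
    """Decorator sketch: map deprecated keyword names to their replacements."""

    def decorator(func):
        @functools.wraps(func)
        def wrapped(*args, **kwargs):
            for src_arg_name, dst_arg_name in name_dict.items():
                if src_arg_name in kwargs:
                    warnings.warn(
                        f'"{src_arg_name}" is deprecated in '
                        f'`{func.__name__}`, please use "{dst_arg_name}" instead',
                        DeprecationWarning)
                    # Forward the value under the new name.
                    kwargs[dst_arg_name] = kwargs.pop(src_arg_name)
            return func(*args, **kwargs)
        return wrapped
    return decorator


@rename_deprecated_kwargs({'img_scale': 'scale'})
def resize(scale):
    return scale


print(resize(img_scale=2))  # 2
```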

mmcv.utils.digit_version(version_str: str, length: int = 4)[source]

Convert a version string into a tuple of integers.

This method is usually used for comparing two versions. For pre-release versions: alpha < beta < rc.

Parameters
  • version_str (str) – The version string.

  • length (int) – The maximum number of version levels. Default: 4.

Returns

The version info in digits (integers).

Return type

tuple[int]

mmcv.utils.get_git_hash(fallback='unknown', digits=None)[source]

Get the git hash of the current repo.

Parameters
  • fallback (str, optional) – The fallback string when git hash is unavailable. Defaults to ‘unknown’.

  • digits (int, optional) – Number of hash digits to keep. Defaults to None, meaning all digits are kept.

Returns

Git commit hash.

Return type

str

mmcv.utils.get_logger(name, log_file=None, log_level=20, file_mode='w')[source]

Initialize and get a logger by name.

If the logger has not been initialized, this method will initialize the logger by adding one or two handlers, otherwise the initialized logger will be directly returned. During initialization, a StreamHandler will always be added. If log_file is specified and the process rank is 0, a FileHandler will also be added.

Parameters
  • name (str) – Logger name.

  • log_file (str | None) – The log filename. If specified, a FileHandler will be added to the logger.

  • log_level (int) – The logger level. Note that only the process of rank 0 is affected; other processes will set the level to “Error” and thus be silent most of the time.

  • file_mode (str) – The file mode used in opening log file. Defaults to ‘w’.

Returns

The expected logger.

Return type

logging.Logger

mmcv.utils.has_method(obj: object, method: str) → bool[source]

Check whether the object has a method.

Parameters
  • method (str) – The method name to check.

  • obj (object) – The object to check.

Returns

True if the object has the method else False.

Return type

bool

mmcv.utils.import_modules_from_strings(imports, allow_failed_imports=False)[source]

Import modules from the given list of strings.

Parameters
  • imports (list | str | None) – The given module names to be imported.

  • allow_failed_imports (bool) – If True, the failed imports will return None. Otherwise, an ImportError is raised. Default: False.

Returns

The imported modules.

Return type

list[module] | module | None

Examples

>>> osp, sys = import_modules_from_strings(
...     ['os.path', 'sys'])
>>> import os.path as osp_
>>> import sys as sys_
>>> assert osp == osp_
>>> assert sys == sys_
mmcv.utils.is_list_of(seq, expected_type)[source]

Check whether it is a list of some type.

A partial method of is_seq_of().

mmcv.utils.is_method_overridden(method, base_class, derived_class)[source]

Check if a method of base class is overridden in derived class.

Parameters
  • method (str) – the method name to check.

  • base_class (type) – the class of the base class.

  • derived_class (type | Any) – the class or instance of the derived class.
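
A check of this kind can be sketched by comparing the attribute found on each class. This is an illustration, not the library source; the function name is_method_overridden_sketch and the Base and Child classes are hypothetical.

```python
def is_method_overridden_sketch(method, base_class, derived_class):
    """Return True if derived_class defines its own version of `method`."""
    if not isinstance(derived_class, type):
        # An instance was passed; inspect its class instead.
        derived_class = type(derived_class)
    return getattr(derived_class, method) is not getattr(base_class, method)


class Base:
    def run(self):
        return 'base'

    def stop(self):
        return 'base'


class Child(Base):
    def run(self):
        return 'child'


print(is_method_overridden_sketch('run', Base, Child))     # True
print(is_method_overridden_sketch('stop', Base, Child()))  # False
```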

mmcv.utils.is_seq_of(seq, expected_type, seq_type=None)[source]

Check whether it is a sequence of some type.

Parameters
  • seq (Sequence) – The sequence to be checked.

  • expected_type (type) – Expected type of sequence items.

  • seq_type (type, optional) – Expected sequence type.

Returns

Whether the sequence is valid.

Return type

bool
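
The check, and the way a list-only variant can be derived from it, can be sketched as follows. The names is_seq_of_sketch and is_list_of_sketch are hypothetical, and using functools.partial for the list variant is an assumption made for illustration.

```python
import functools
from collections.abc import Sequence


def is_seq_of_sketch(seq, expected_type, seq_type=None):
    """Check the container type first, then the type of every item."""
    exp_seq_type = Sequence if seq_type is None else seq_type
    if not isinstance(seq, exp_seq_type):
        return False
    return all(isinstance(item, expected_type) for item in seq)


# A list-only variant obtained by fixing seq_type.
is_list_of_sketch = functools.partial(is_seq_of_sketch, seq_type=list)

print(is_seq_of_sketch([1, 2, 3], int))   # True
print(is_seq_of_sketch([1, 'a'], int))    # False
print(is_list_of_sketch((1, 2), int))     # False
```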

mmcv.utils.is_str(x)[source]

Whether the input is a string instance.

Note: This method is deprecated since python 2 is no longer supported.

mmcv.utils.is_tuple_of(seq, expected_type)[source]

Check whether it is a tuple of some type.

A partial method of is_seq_of().

mmcv.utils.iter_cast(inputs, dst_type, return_type=None)[source]

Cast elements of an iterable object into some type.

Parameters
  • inputs (Iterable) – The input object.

  • dst_type (type) – Destination type.

  • return_type (type, optional) – If specified, the output object will be converted to this type, otherwise an iterator.

Returns

The converted object.

Return type

iterator or specified type
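
The casting behavior, including the lazy iterator returned when return_type is omitted, can be sketched with map(). The name iter_cast_sketch is hypothetical and this is not the library source.

```python
def iter_cast_sketch(inputs, dst_type, return_type=None):
    """Cast every element to dst_type; optionally collect into return_type."""
    out_iterable = map(dst_type, inputs)
    if return_type is None:
        # No container requested: hand back the lazy iterator.
        return out_iterable
    return return_type(out_iterable)


print(iter_cast_sketch(['1', '2'], int, return_type=list))    # [1, 2]
print(iter_cast_sketch([1.5, 2.5], int, return_type=tuple))   # (1, 2)
lazy = iter_cast_sketch(['3'], float)
print(next(lazy))                                             # 3.0
```

Fixing return_type to list or tuple gives the behavior described for list_cast() and tuple_cast().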

mmcv.utils.list_cast(inputs, dst_type)[source]

Cast elements of an iterable object into a list of some type.

A partial method of iter_cast().

mmcv.utils.load_url(url, model_dir=None, map_location=None, progress=True, check_hash=False, file_name=None)

Loads the Torch serialized object at the given URL.

If downloaded file is a zip file, it will be automatically decompressed.

If the object is already present in model_dir, it’s deserialized and returned. The default value of model_dir is <hub_dir>/checkpoints where hub_dir is the directory returned by get_dir().

Parameters
  • url (string) – URL of the object to download

  • model_dir (string, optional) – directory in which to save the object

  • map_location (optional) – a function or a dict specifying how to remap storage locations (see torch.load)

  • progress (bool, optional) – whether or not to display a progress bar to stderr. Default: True

  • check_hash (bool, optional) – If True, the filename part of the URL should follow the naming convention filename-<sha256>.ext where <sha256> is the first eight or more digits of the SHA256 hash of the contents of the file. The hash is used to ensure unique names and to verify the contents of the file. Default: False

  • file_name (string, optional) – name for the downloaded file. Filename from url will be used if not set.

Example

>>> state_dict = torch.hub.load_state_dict_from_url('https://s3.amazonaws.com/pytorch/models/resnet18-5c106cde.pth')
mmcv.utils.print_log(msg, logger=None, level=20)[source]

Print a log message.

Parameters
  • msg (str) – The message to be logged.

  • logger (logging.Logger | str | None) – The logger to be used. Some special loggers are:

      • “silent”: no message will be printed.

      • other str: the logger obtained with get_root_logger(logger).

      • None: the print() method will be used to print log messages.

  • level (int) – Logging level. Only available when logger is a Logger object or “root”.

mmcv.utils.requires_executable(prerequisites)[source]

A decorator to check if some executable files are installed.

Example

>>> @requires_executable('ffmpeg')
... def func(arg1, args):
...     print(1)
>>> func(None, None)
1
mmcv.utils.requires_package(prerequisites)[source]

A decorator to check if some python packages are installed.

Example

>>> import numpy
>>> @requires_package('numpy')
... def func(arg1, args):
...     return numpy.zeros(1)
>>> func(None, None)
array([0.])
>>> @requires_package(['numpy', 'non_package'])
... def func(arg1, args):
...     return numpy.zeros(1)
>>> func(None, None)
ImportError
mmcv.utils.scandir(dir_path, suffix=None, recursive=False, case_sensitive=True)[source]

Scan a directory to find the files of interest.

Parameters
  • dir_path (str | Path) – Path of the directory.

  • suffix (str | tuple(str), optional) – File suffix that we are interested in. Default: None.

  • recursive (bool, optional) – If set to True, recursively scan the directory. Default: False.

  • case_sensitive (bool, optional) – If set to False, ignore the case of suffix. Default: True.

Returns

A generator that yields the relative paths of all matching files.
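
A generator with the same parameters can be sketched on top of os.scandir. This is a simplified illustration, not the mmcv implementation; the name scandir_sketch is hypothetical, and any extra filtering the library may apply (for example, of hidden files) is not reproduced here.

```python
import os
import tempfile


def scandir_sketch(dir_path, suffix=None, recursive=False, case_sensitive=True):
    """Yield paths (relative to dir_path) of files matching the suffix."""
    if isinstance(suffix, str):
        suffix = (suffix,)
    if suffix is not None and not case_sensitive:
        suffix = tuple(s.lower() for s in suffix)
    root = str(dir_path)

    def _scan(current):
        for entry in os.scandir(current):
            if entry.is_file():
                rel_path = os.path.relpath(entry.path, root)
                name = rel_path if case_sensitive else rel_path.lower()
                if suffix is None or name.endswith(suffix):
                    yield rel_path
            elif recursive and entry.is_dir():
                # Descend into sub-directories only when asked to.
                yield from _scan(entry.path)

    return _scan(root)


# Build a small throwaway directory tree and scan it.
with tempfile.TemporaryDirectory() as tmp:
    os.makedirs(os.path.join(tmp, 'sub'))
    for rel in ('a.jpg', 'b.PNG', os.path.join('sub', 'c.jpg')):
        open(os.path.join(tmp, rel), 'w').close()
    found = sorted(scandir_sketch(tmp, suffix='.jpg', recursive=True))
    flat = sorted(scandir_sketch(tmp, suffix='.png', case_sensitive=False))

print(found)  # the two .jpg files, one of them under 'sub'
print(flat)   # ['b.PNG']
```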

mmcv.utils.slice_list(in_list, lens)[source]

Slice a list into several sub-lists whose lengths are given by lens.

Parameters
  • in_list (list) – The list to be sliced.

  • lens (int or list) – The expected length of each out list.

Returns

A list of sliced lists.

Return type

list
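
The slicing can be sketched as follows. The name slice_list_sketch is hypothetical, and two details are assumptions of this sketch rather than documented behavior: an int value for lens is treated as a fixed chunk size, and mismatched lengths raise ValueError.

```python
def slice_list_sketch(in_list, lens):
    """Cut in_list into consecutive chunks whose sizes are given by lens."""
    if isinstance(lens, int):
        # Assumed: an int means equal-sized chunks of this length.
        if len(in_list) % lens != 0:
            raise ValueError('list length must be divisible by lens')
        lens = [lens] * (len(in_list) // lens)
    if sum(lens) != len(in_list):
        raise ValueError('sum of lens must equal the list length')
    out_list, idx = [], 0
    for n in lens:
        out_list.append(in_list[idx:idx + n])
        idx += n
    return out_list


print(slice_list_sketch([1, 2, 3, 4, 5, 6], [1, 2, 3]))  # [[1], [2, 3], [4, 5, 6]]
print(slice_list_sketch([1, 2, 3, 4, 5, 6], 2))          # [[1, 2], [3, 4], [5, 6]]
```

Slicing is the inverse of concatenation: flattening the result one level gives back the original list.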

mmcv.utils.track_iter_progress(tasks, bar_width=50, file=<_io.TextIOWrapper name='<stdout>' mode='w' encoding='UTF-8'>)[source]

Track the progress of tasks iteration or enumeration with a progress bar.

Tasks are yielded with a simple for-loop.

Parameters
  • tasks (list or tuple[Iterable, int]) – A list of tasks or (tasks, total num).

  • bar_width (int) – Width of progress bar.

Yields

list – The task results.

mmcv.utils.track_parallel_progress(func, tasks, nproc, initializer=None, initargs=None, bar_width=50, chunksize=1, skip_first=False, keep_order=True, file=<_io.TextIOWrapper name='<stdout>' mode='w' encoding='UTF-8'>)[source]

Track the progress of parallel task execution with a progress bar.

The built-in multiprocessing module is used for process pools and tasks are done with Pool.imap() or Pool.imap_unordered().

Parameters
  • func (callable) – The function to be applied to each task.

  • tasks (list or tuple[Iterable, int]) – A list of tasks or (tasks, total num).

  • nproc (int) – Process (worker) number.

  • initializer (None or callable) – Refer to multiprocessing.Pool for details.

  • initargs (None or tuple) – Refer to multiprocessing.Pool for details.

  • chunksize (int) – Refer to multiprocessing.Pool for details.

  • bar_width (int) – Width of progress bar.

  • skip_first (bool) – Whether to skip the first sample for each worker when estimating fps, since the initialization step may take longer.

  • keep_order (bool) – If True, Pool.imap() is used, otherwise Pool.imap_unordered() is used.

Returns

The task results.

Return type

list

mmcv.utils.track_progress(func, tasks, bar_width=50, file=<_io.TextIOWrapper name='<stdout>' mode='w' encoding='UTF-8'>, **kwargs)[source]

Track the progress of tasks execution with a progress bar.

Tasks are done with a simple for-loop.

Parameters
  • func (callable) – The function to be applied to each task.

  • tasks (list or tuple[Iterable, int]) – A list of tasks or (tasks, total num).

  • bar_width (int) – Width of progress bar.

Returns

The task results.

Return type

list

mmcv.utils.tuple_cast(inputs, dst_type)[source]

Cast elements of an iterable object into a tuple of some type.

A partial method of iter_cast().

cnn

class mmcv.cnn.AlexNet(num_classes=-1)[source]

AlexNet backbone.

Parameters

num_classes (int) – number of classes for classification.

forward(x)[source]

Defines the computation performed at every call.

Should be overridden by all subclasses.

Note

Although the recipe for forward pass needs to be defined within this function, one should call the Module instance afterwards instead of this since the former takes care of running the registered hooks while the latter silently ignores them.

class mmcv.cnn.Caffe2XavierInit(**kwargs)[source]
class mmcv.cnn.ConstantInit(val, **kwargs)[source]

Initialize module parameters with constant values.

Parameters
  • val (int | float) – the value to fill the weights in the module with

  • bias (int | float) – the value to fill the bias. Defaults to 0.

  • bias_prob (float, optional) – the probability for bias initialization. Defaults to None.

  • layer (str | list[str], optional) – the layer will be initialized. Defaults to None.

class mmcv.cnn.ContextBlock(in_channels, ratio, pooling_type='att', fusion_types=('channel_add',))[source]

ContextBlock module in GCNet.

See ‘GCNet: Non-local Networks Meet Squeeze-Excitation Networks and Beyond’ (https://arxiv.org/abs/1904.11492) for details.

Parameters
  • in_channels (int) – Channels of the input feature map.

  • ratio (float) – Ratio of channels of transform bottleneck

  • pooling_type (str) – Pooling method for context modeling. Options are ‘att’ and ‘avg’, which stand for attention pooling and average pooling respectively. Default: ‘att’.

  • fusion_types (Sequence[str]) – Fusion methods for feature fusion. Options are ‘channel_add’ and ‘channel_mul’, which stand for channel-wise addition and multiplication respectively. Default: (‘channel_add’,).

forward(x)[source]

Defines the computation performed at every call.

Should be overridden by all subclasses.

Note

Although the recipe for forward pass needs to be defined within this function, one should call the Module instance afterwards instead of this since the former takes care of running the registered hooks while the latter silently ignores them.

class mmcv.cnn.Conv2d(in_channels: int, out_channels: int, kernel_size: Union[int, Tuple[int, int]], stride: Union[int, Tuple[int, int]] = 1, padding: Union[str, int, Tuple[int, int]] = 0, dilation: Union[int, Tuple[int, int]] = 1, groups: int = 1, bias: bool = True, padding_mode: str = 'zeros', device=None, dtype=None)[source]
forward(x)[source]

Defines the computation performed at every call.

Should be overridden by all subclasses.

Note

Although the recipe for forward pass needs to be defined within this function, one should call the Module instance afterwards instead of this since the former takes care of running the registered hooks while the latter silently ignores them.

class mmcv.cnn.Conv3d(in_channels: int, out_channels: int, kernel_size: Union[int, Tuple[int, int, int]], stride: Union[int, Tuple[int, int, int]] = 1, padding: Union[str, int, Tuple[int, int, int]] = 0, dilation: Union[int, Tuple[int, int, int]] = 1, groups: int = 1, bias: bool = True, padding_mode: str = 'zeros', device=None, dtype=None)[source]
forward(x)[source]

Defines the computation performed at every call.

Should be overridden by all subclasses.

Note

Although the recipe for forward pass needs to be defined within this function, one should call the Module instance afterwards instead of this since the former takes care of running the registered hooks while the latter silently ignores them.

class mmcv.cnn.ConvAWS2d(in_channels, out_channels, kernel_size, stride=1, padding=0, dilation=1, groups=1, bias=True)[source]

AWS (Adaptive Weight Standardization)

This is a variant of Weight Standardization (https://arxiv.org/pdf/1903.10520.pdf). It is used in DetectoRS to avoid NaN (https://arxiv.org/pdf/2006.02334.pdf).

Parameters
  • in_channels (int) – Number of channels in the input image

  • out_channels (int) – Number of channels produced by the convolution

  • kernel_size (int or tuple) – Size of the conv kernel

  • stride (int or tuple, optional) – Stride of the convolution. Default: 1

  • padding (int or tuple, optional) – Zero-padding added to both sides of the input. Default: 0

  • dilation (int or tuple, optional) – Spacing between kernel elements. Default: 1

  • groups (int, optional) – Number of blocked connections from input channels to output channels. Default: 1

  • bias (bool, optional) – If set True, adds a learnable bias to the output. Default: True

forward(x)[source]

Defines the computation performed at every call.

Should be overridden by all subclasses.

Note

Although the recipe for forward pass needs to be defined within this function, one should call the Module instance afterwards instead of this since the former takes care of running the registered hooks while the latter silently ignores them.

class mmcv.cnn.ConvModule(in_channels, out_channels, kernel_size, stride=1, padding=0, dilation=1, groups=1, bias='auto', conv_cfg=None, norm_cfg=None, act_cfg={'type': 'ReLU'}, inplace=True, with_spectral_norm=False, padding_mode='zeros', order=('conv', 'norm', 'act'))[source]

A conv block that bundles conv/norm/activation layers.

This block simplifies the usage of convolution layers, which are commonly used with a norm layer (e.g., BatchNorm) and activation layer (e.g., ReLU). It is based upon three build methods: build_conv_layer(), build_norm_layer() and build_activation_layer().

Besides, we add some additional features in this module:

  1. Automatically set bias of the conv layer.

  2. Spectral norm is supported.

  3. More padding modes are supported. Before PyTorch 1.5, nn.Conv2d only supported zero and circular padding, and we add “reflect” padding mode.

Parameters
  • in_channels (int) – Number of channels in the input feature map. Same as that in nn._ConvNd.

  • out_channels (int) – Number of channels produced by the convolution. Same as that in nn._ConvNd.

  • kernel_size (int | tuple[int]) – Size of the convolving kernel. Same as that in nn._ConvNd.

  • stride (int | tuple[int]) – Stride of the convolution. Same as that in nn._ConvNd.

  • padding (int | tuple[int]) – Zero-padding added to both sides of the input. Same as that in nn._ConvNd.

  • dilation (int | tuple[int]) – Spacing between kernel elements. Same as that in nn._ConvNd.

  • groups (int) – Number of blocked connections from input channels to output channels. Same as that in nn._ConvNd.

  • bias (bool | str) – If specified as auto, it will be decided by the norm_cfg. Bias will be set as True if norm_cfg is None, otherwise False. Default: “auto”.

  • conv_cfg (dict) – Config dict for convolution layer. Default: None, which means using conv2d.

  • norm_cfg (dict) – Config dict for normalization layer. Default: None.

  • act_cfg (dict) – Config dict for activation layer. Default: dict(type=’ReLU’).

  • inplace (bool) – Whether to use inplace mode for activation. Default: True.

  • with_spectral_norm (bool) – Whether to use spectral norm in the conv module. Default: False.

  • padding_mode (str) – If the padding_mode is not supported by the current Conv2d in PyTorch, we will use our own padding layer instead. Currently, we support [‘zeros’, ‘circular’] with the official implementation and [‘reflect’] with our own implementation. Default: ‘zeros’.

  • order (tuple[str]) – The order of conv/norm/activation layers. It is a sequence of “conv”, “norm” and “act”. Common examples are (“conv”, “norm”, “act”) and (“act”, “conv”, “norm”). Default: (‘conv’, ‘norm’, ‘act’).
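
The rule for bias='auto' and the constraint on order can be written down in plain Python. This is a sketch of the documented rules only and builds no layers; the helper names resolve_conv_bias and check_order are hypothetical, and raising ValueError for an invalid order is an assumption of the sketch.

```python
def resolve_conv_bias(bias, norm_cfg):
    """Resolve bias='auto': use a conv bias only when no norm layer follows."""
    if bias == 'auto':
        return norm_cfg is None
    return bias


def check_order(order):
    """Require a permutation of the three layer names 'conv', 'norm', 'act'."""
    if not (isinstance(order, tuple) and len(order) == 3
            and set(order) == {'conv', 'norm', 'act'}):
        raise ValueError(f'invalid layer order: {order!r}')
    return order


print(resolve_conv_bias('auto', None))             # True
print(resolve_conv_bias('auto', dict(type='BN')))  # False
print(resolve_conv_bias(True, dict(type='BN')))    # True
print(check_order(('act', 'conv', 'norm')))        # ('act', 'conv', 'norm')
```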

forward(x, activate=True, norm=True)[source]

Defines the computation performed at every call.

Should be overridden by all subclasses.

Note

Although the recipe for forward pass needs to be defined within this function, one should call the Module instance afterwards instead of this since the former takes care of running the registered hooks while the latter silently ignores them.

class mmcv.cnn.ConvTranspose2d(in_channels: int, out_channels: int, kernel_size: Union[int, Tuple[int, int]], stride: Union[int, Tuple[int, int]] = 1, padding: Union[int, Tuple[int, int]] = 0, output_padding: Union[int, Tuple[int, int]] = 0, groups: int = 1, bias: bool = True, dilation: int = 1, padding_mode: str = 'zeros', device=None, dtype=None)[source]
forward(x)[source]

Defines the computation performed at every call.

Should be overridden by all subclasses.

Note

Although the recipe for forward pass needs to be defined within this function, one should call the Module instance afterwards instead of this since the former takes care of running the registered hooks while the latter silently ignores them.

class mmcv.cnn.ConvTranspose3d(in_channels: int, out_channels: int, kernel_size: Union[int, Tuple[int, int, int]], stride: Union[int, Tuple[int, int, int]] = 1, padding: Union[int, Tuple[int, int, int]] = 0, output_padding: Union[int, Tuple[int, int, int]] = 0, groups: int = 1, bias: bool = True, dilation: Union[int, Tuple[int, int, int]] = 1, padding_mode: str = 'zeros', device=None, dtype=None)[source]
forward(x)[source]

Defines the computation performed at every call.

Should be overridden by all subclasses.

Note

Although the recipe for forward pass needs to be defined within this function, one should call the Module instance afterwards instead of this since the former takes care of running the registered hooks while the latter silently ignores them.

class mmcv.cnn.ConvWS2d(in_channels, out_channels, kernel_size, stride=1, padding=0, dilation=1, groups=1, bias=True, eps=1e-05)[source]
forward(x)[source]

Defines the computation performed at every call.

Should be overridden by all subclasses.

Note

Although the recipe for forward pass needs to be defined within this function, one should call the Module instance afterwards instead of this since the former takes care of running the registered hooks while the latter silently ignores them.

class mmcv.cnn.DepthwiseSeparableConvModule(in_channels, out_channels, kernel_size, stride=1, padding=0, dilation=1, norm_cfg=None, act_cfg={'type': 'ReLU'}, dw_norm_cfg='default', dw_act_cfg='default', pw_norm_cfg='default', pw_act_cfg='default', **kwargs)[source]

Depthwise separable convolution module.

See https://arxiv.org/pdf/1704.04861.pdf for details.

This module can replace a ConvModule, with the conv block replaced by two conv blocks: a depthwise conv block and a pointwise conv block. The depthwise conv block contains depthwise-conv/norm/activation layers. The pointwise conv block contains pointwise-conv/norm/activation layers. Note that the depthwise conv block will contain norm/activation layers if norm_cfg and act_cfg are specified.

Parameters
  • in_channels (int) – Number of channels in the input feature map. Same as that in nn._ConvNd.

  • out_channels (int) – Number of channels produced by the convolution. Same as that in nn._ConvNd.

  • kernel_size (int | tuple[int]) – Size of the convolving kernel. Same as that in nn._ConvNd.

  • stride (int | tuple[int]) – Stride of the convolution. Same as that in nn._ConvNd. Default: 1.

  • padding (int | tuple[int]) – Zero-padding added to both sides of the input. Same as that in nn._ConvNd. Default: 0.

  • dilation (int | tuple[int]) – Spacing between kernel elements. Same as that in nn._ConvNd. Default: 1.

  • norm_cfg (dict) – Default norm config for both depthwise ConvModule and pointwise ConvModule. Default: None.

  • act_cfg (dict) – Default activation config for both depthwise ConvModule and pointwise ConvModule. Default: dict(type=’ReLU’).

  • dw_norm_cfg (dict) – Norm config of depthwise ConvModule. If it is ‘default’, it will be the same as norm_cfg. Default: ‘default’.

  • dw_act_cfg (dict) – Activation config of depthwise ConvModule. If it is ‘default’, it will be the same as act_cfg. Default: ‘default’.

  • pw_norm_cfg (dict) – Norm config of pointwise ConvModule. If it is ‘default’, it will be the same as norm_cfg. Default: ‘default’.

  • pw_act_cfg (dict) – Activation config of pointwise ConvModule. If it is ‘default’, it will be the same as act_cfg. Default: ‘default’.

  • kwargs (optional) – Other shared arguments for depthwise and pointwise ConvModule. See ConvModule for ref.
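
The 'default' handling of the dw_*/pw_* arguments described above can be sketched in plain Python. This is a minimal illustration of the documented rule, not mmcv's implementation; `resolve_cfg` is a hypothetical helper name, and treating None as "no such layer" is an assumption carried over from how ConvModule configs are commonly used.

```python
def resolve_cfg(specific_cfg, shared_cfg):
    """Return the config a sub-block should use.

    Hypothetical helper: the string 'default' means "inherit the shared
    config"; any other value (a dict, or None) is used as given.
    """
    return shared_cfg if specific_cfg == 'default' else specific_cfg


norm_cfg = dict(type='BN')
act_cfg = dict(type='ReLU')

# The depthwise block inherits both shared configs.
dw_norm_cfg = resolve_cfg('default', norm_cfg)
dw_act_cfg = resolve_cfg('default', act_cfg)

# The pointwise block inherits the norm config but overrides the activation.
pw_norm_cfg = resolve_cfg('default', norm_cfg)
pw_act_cfg = resolve_cfg(None, act_cfg)
```

With these values, the depthwise block would use BN and ReLU, while the pointwise block would use BN with no activation config.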

forward(x)[source]

Defines the computation performed at every call.

Should be overridden by all subclasses.

Note

Although the recipe for forward pass needs to be defined within this function, one should call the Module instance afterwards instead of this since the former takes care of running the registered hooks while the latter silently ignores them.

class mmcv.cnn.GeneralizedAttention(in_channels, spatial_range=- 1, num_heads=9, position_embedding_dim=- 1, position_magnitude=1, kv_stride=2, q_stride=1, attention_type='1111')[source]

GeneralizedAttention module.

See ‘An Empirical Study of Spatial Attention Mechanisms in Deep Networks’ (https://arxiv.org/abs/1904.05873) for details.

Parameters
  • in_channels (int) – Channels of the input feature map.

  • spatial_range (int) – The spatial range. -1 indicates no spatial range constraint. Default: -1.

  • num_heads (int) – The head number of empirical_attention module. Default: 9.

  • position_embedding_dim (int) – The position embedding dimension. Default: -1.

  • position_magnitude (int) – A multiplier acting on coord difference. Default: 1.

  • kv_stride (int) – The feature stride acting on key/value feature map. Default: 2.

  • q_stride (int) – The feature stride acting on query feature map. Default: 1.

  • attention_type (str) –

    A binary indicator string for indicating which items in generalized empirical_attention module are used. Default: ‘1111’.

    • ’1000’ indicates ‘query and key content’ (appr - appr) item,

    • ’0100’ indicates ‘query content and relative position’ (appr - position) item,

    • ’0010’ indicates ‘key content only’ (bias - appr) item,

    • ’0001’ indicates ‘relative position only’ (bias - position) item.
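
To make the indicator string concrete, the following sketch decodes an attention_type value into the names of the enabled terms. `TERM_NAMES` and `enabled_terms` are hypothetical names used only for illustration; the module itself may parse the string differently.

```python
# Term names in the order of the four indicator positions listed above.
TERM_NAMES = (
    'query and key content',                # appr - appr
    'query content and relative position',  # appr - position
    'key content only',                     # bias - appr
    'relative position only',               # bias - position
)


def enabled_terms(attention_type):
    """Map a 4-character binary indicator string to the enabled terms."""
    if len(attention_type) != 4 or set(attention_type) - {'0', '1'}:
        raise ValueError('attention_type must be 4 characters of 0/1')
    return [name for flag, name in zip(attention_type, TERM_NAMES)
            if flag == '1']
```

For example, `enabled_terms('0010')` yields only the 'key content only' term, while the default '1111' enables all four.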

forward(x_input)[source]

Defines the computation performed at every call.

Should be overridden by all subclasses.

Note

Although the recipe for forward pass needs to be defined within this function, one should call the Module instance afterwards instead of this since the former takes care of running the registered hooks while the latter silently ignores them.

class mmcv.cnn.HSigmoid(bias=3.0, divisor=6.0, min_value=0.0, max_value=1.0)[source]

Hard Sigmoid Module. Applies the hard sigmoid function:

Hsigmoid(x) = min(max((x + bias) / divisor, min_value), max_value)

Default: Hsigmoid(x) = min(max((x + 3) / 6, 0), 1)

Note

In MMCV v1.4.4, the default argument values were changed to align with the official PyTorch implementation.

Parameters
  • bias (float) – Bias of the input feature map. Default: 3.0.

  • divisor (float) – Divisor of the input feature map. Default: 6.0.

  • min_value (float) – Lower bound value. Default: 0.0.

  • max_value (float) – Upper bound value. Default: 1.0.

Returns

The output tensor.

Return type

Tensor
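
A scalar version of the formula above, written in plain Python, may help when checking values by hand. It is only a sketch of the arithmetic (the module operates on tensors); `hsigmoid` is an illustrative function name.

```python
def hsigmoid(x, bias=3.0, divisor=6.0, min_value=0.0, max_value=1.0):
    """Scalar hard sigmoid: clamp (x + bias) / divisor to [min_value, max_value]."""
    return min(max((x + bias) / divisor, min_value), max_value)
```

With the default arguments, inputs at or below -3 map to 0, inputs at or above 3 map to 1, and the function is linear in between.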

forward(x)[source]

Defines the computation performed at every call.

Should be overridden by all subclasses.

Note

Although the recipe for forward pass needs to be defined within this function, one should call the Module instance afterwards instead of this since the former takes care of running the registered hooks while the latter silently ignores them.

class mmcv.cnn.HSwish(inplace=False)[source]

Hard Swish Module.

This module applies the hard swish function:

\[Hswish(x) = x * ReLU6(x + 3) / 6\]

Parameters

inplace (bool) – can optionally do the operation in-place. Default: False.

Returns

The output tensor.

Return type

Tensor
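
The hard swish formula can likewise be written out for scalars. This is a sketch of the arithmetic only, with illustrative function names, and is not the tensor implementation used by the module.

```python
def relu6(x):
    """ReLU clipped at 6: min(max(x, 0), 6)."""
    return min(max(x, 0.0), 6.0)


def hswish(x):
    """Scalar hard swish: x * ReLU6(x + 3) / 6."""
    return x * relu6(x + 3.0) / 6.0
```

For inputs at or below -3 the result is zero, and for inputs at or above 3 the function reduces to the identity.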

forward(x)[source]

Defines the computation performed at every call.

Should be overridden by all subclasses.

Note

Although the recipe for forward pass needs to be defined within this function, one should call the Module instance afterwards instead of this since the former takes care of running the registered hooks while the latter silently ignores them.

class mmcv.cnn.KaimingInit(a=0, mode='fan_out', nonlinearity='relu', distribution='normal', **kwargs)[source]

Initialize module parameters with the values according to the method described in Delving deep into rectifiers: Surpassing human-level performance on ImageNet classification - He, K. et al. (2015).

Parameters
  • a (int | float) – the negative slope of the rectifier used after this layer (only used with 'leaky_relu'). Defaults to 0.

  • mode (str) – either 'fan_in' or 'fan_out'. Choosing 'fan_in' preserves the magnitude of the variance of the weights in the forward pass. Choosing 'fan_out' preserves the magnitudes in the backwards pass. Defaults to 'fan_out'.

  • nonlinearity (str) – the non-linear function (nn.functional name), recommended to use only with 'relu' or 'leaky_relu'. Defaults to 'relu'.

  • bias (int | float) – the value to fill the bias. Defaults to 0.

  • bias_prob (float, optional) – the probability for bias initialization. Defaults to None.

  • distribution (str) – the distribution to draw from, either 'normal' or 'uniform'. Defaults to 'normal'.

  • layer (str | list[str], optional) – the layer(s) to be initialized. Defaults to None.
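
As a worked example of how a, mode and nonlinearity interact, the sketch below computes the standard deviation used by Kaiming normal initialization, following the formulas documented for torch.nn.init (std = gain / sqrt(fan)). The helper names and the example layer shape are assumptions for illustration; only 'relu' and 'leaky_relu' are handled.

```python
import math


def kaiming_gain(nonlinearity='relu', a=0.0):
    """Gain for the two recommended nonlinearities."""
    if nonlinearity == 'relu':
        return math.sqrt(2.0)
    if nonlinearity == 'leaky_relu':
        return math.sqrt(2.0 / (1.0 + a ** 2))
    raise ValueError(f'unsupported nonlinearity: {nonlinearity}')


def kaiming_normal_std(fan_in, fan_out, mode='fan_out',
                       nonlinearity='relu', a=0.0):
    """Standard deviation of the normal distribution to sample from."""
    if mode not in ('fan_in', 'fan_out'):
        raise ValueError(f'unsupported mode: {mode}')
    fan = fan_in if mode == 'fan_in' else fan_out
    return kaiming_gain(nonlinearity, a) / math.sqrt(fan)


# Example: a 3x3 convolution with 16 input and 32 output channels.
fan_in = 16 * 3 * 3    # 144
fan_out = 32 * 3 * 3   # 288
std = kaiming_normal_std(fan_in, fan_out)  # sqrt(2 / 288) = 1 / 12
```

For the 'uniform' distribution, the corresponding bound would be sqrt(3) times this standard deviation.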

class mmcv.cnn.Linear(in_features: int, out_features: int, bias: bool = True, device=None, dtype=None)[source]
forward(x)[source]

Defines the computation performed at every call.

Should be overridden by all subclasses.

Note

Although the recipe for forward pass needs to be defined within this function, one should call the Module instance afterwards instead of this since the former takes care of running the registered hooks while the latter silently ignores them.

class mmcv.cnn.MaxPool2d(kernel_size: Union[int, Tuple[int, ...]], stride: Optional[Union[int, Tuple[int, ...]]] = None, padding: Union[int, Tuple[int, ...]] = 0, dilation: Union[int, Tuple[int, ...]] = 1, return_indices: bool = False, ceil_mode: bool = False)[source]
forward(x)[source]

Defines the computation performed at every call.

Should be overridden by all subclasses.

Note

Although the recipe for forward pass needs to be defined within this function, one should call the Module instance afterwards instead of this since the former takes care of running the registered hooks while the latter silently ignores them.

class mmcv.cnn.MaxPool3d(kernel_size: Union[int, Tuple[int, ...]], stride: Optional[Union[int, Tuple[int, ...]]] = None, padding: Union[int, Tuple[int, ...]] = 0, dilation: Union[int, Tuple[int, ...]] = 1, return_indices: bool = False, ceil_mode: bool = False)[source]
forward(x)[source]

Defines the computation performed at every call.

Should be overridden by all subclasses.

Note

Although the recipe for forward pass needs to be defined within this function, one should call the Module instance afterwards instead of this since the former takes care of running the registered hooks while the latter silently ignores them.

class mmcv.cnn.NonLocal1d(in_channels, sub_sample=False, conv_cfg={'type': 'Conv1d'}, **kwargs)[source]

1D Non-local module.

Parameters
  • in_channels (int) – Same as NonLocalND.

  • sub_sample (bool) – Whether to apply max pooling after pairwise function (Note that the sub_sample is applied on spatial only). Default: False.

  • conv_cfg (None | dict) – Same as NonLocalND. Default: dict(type=’Conv1d’).

class mmcv.cnn.NonLocal2d(in_channels, sub_sample=False, conv_cfg={'type': 'Conv2d'}, **kwargs)[source]

2D Non-local module.

Parameters
  • in_channels (int) – Same as NonLocalND.

  • sub_sample (bool) – Whether to apply max pooling after pairwise function (Note that the sub_sample is applied on spatial only). Default: False.

  • conv_cfg (None | dict) – Same as NonLocalND. Default: dict(type=’Conv2d’).

class mmcv.cnn.NonLocal3d(in_channels, sub_sample=False, conv_cfg={'type': 'Conv3d'}, **kwargs)[source]

3D Non-local module.

Parameters
  • in_channels (int) – Same as NonLocalND.

  • sub_sample (bool) – Whether to apply max pooling after pairwise function (Note that the sub_sample is applied on spatial only). Default: False.

  • conv_cfg (None | dict) – Same as NonLocalND. Default: dict(type=’Conv3d’).

class mmcv.cnn.NormalInit(mean=0, std=1, **kwargs)[source]

Initialize module parameters with the values drawn from the normal distribution \(\mathcal{N}(\text{mean}, \text{std}^2)\).

Parameters
  • mean (int | float) – the mean of the normal distribution. Defaults to 0.

  • std (int | float) – the standard deviation of the normal distribution. Defaults to 1.

  • bias (int | float) – the value to fill the bias. Defaults to 0.

  • bias_prob (float, optional) – the probability for bias initialization. Defaults to None.

  • layer (str | list[str], optional) – the layer(s) to be initialized. Defaults to None.

class mmcv.cnn.PretrainedInit(checkpoint, prefix=None, map_location=None)[source]

Initialize module by loading a pretrained model.

Parameters
  • checkpoint (str) – the checkpoint file of the pretrained model to be loaded.

  • prefix (str, optional) – the prefix of a sub-module in the pretrained model. It is used to load only a part of the pretrained model for initialization. For example, if we would like to only load the backbone of a detector model, we can set prefix='backbone.'. Defaults to None.

  • map_location (str) – map tensors into proper locations.

class mmcv.cnn.ResNet(depth, num_stages=4, strides=(1, 2, 2, 2), dilations=(1, 1, 1, 1), out_indices=(0, 1, 2, 3), style='pytorch', frozen_stages=- 1, bn_eval=True, bn_frozen=False, with_cp=False)[source]

ResNet backbone.

Parameters
  • depth (int) – Depth of resnet, from {18, 34, 50, 101, 152}.

  • num_stages (int) – Resnet stages, normally 4.

  • strides (Sequence[int]) – Strides of the first block of each stage.

  • dilations (Sequence[int]) – Dilation of each stage.

  • out_indices (Sequence[int]) – Output from which stages.

  • style (str) – pytorch or caffe. If set to “pytorch”, the stride-two layer is the 3x3 conv layer, otherwise the stride-two layer is the first 1x1 conv layer.

  • frozen_stages (int) – Stages to be frozen (all param fixed). -1 means not freezing any parameters.

  • bn_eval (bool) – Whether to set BN layers as eval mode, namely, freeze running stats (mean and var).

  • bn_frozen (bool) – Whether to freeze weight and bias of BN layers.

  • with_cp (bool) – Use checkpoint or not. Using checkpoint will save some memory while slowing down the training speed.

forward(x)[source]

Defines the computation performed at every call.

Should be overridden by all subclasses.

Note

Although the recipe for forward pass needs to be defined within this function, one should call the Module instance afterwards instead of this since the former takes care of running the registered hooks while the latter silently ignores them.

train(mode=True)[source]

Sets the module in training mode.

This has an effect only on certain modules. See the documentation of particular modules for details of their behavior in training/evaluation mode, if they are affected, e.g. Dropout, BatchNorm, etc.

Parameters

mode (bool) – whether to set training mode (True) or evaluation mode (False). Default: True.

Returns

self

Return type

Module

class mmcv.cnn.Scale(scale=1.0)[source]

A learnable scale parameter.

This layer scales the input by a learnable factor. It multiplies a learnable scale parameter of shape (1,) with input of any shape.

Parameters

scale (float) – Initial value of scale factor. Default: 1.0

forward(x)[source]

Defines the computation performed at every call.

Should be overridden by all subclasses.

Note

Although the recipe for forward pass needs to be defined within this function, one should call the Module instance afterwards instead of this since the former takes care of running the registered hooks while the latter silently ignores them.

class mmcv.cnn.Swish[source]

Swish Module.

This module applies the swish function:

\[Swish(x) = x * Sigmoid(x)\]

Returns

The output tensor.

Return type

Tensor

forward(x)[source]

Defines the computation performed at every call.

Should be overridden by all subclasses.

Note

Although the recipe for forward pass needs to be defined within this function, one should call the Module instance afterwards instead of this since the former takes care of running the registered hooks while the latter silently ignores them.

class mmcv.cnn.TruncNormalInit(mean: float = 0, std: float = 1, a: float = - 2, b: float = 2, **kwargs)[source]

Initialize module parameters with values drawn from the normal distribution \(\mathcal{N}(\text{mean}, \text{std}^2)\), with values outside \([a, b]\) redrawn until they fall within the bounds.

Parameters
  • mean (float) – the mean of the normal distribution. Defaults to 0.

  • std (float) – the standard deviation of the normal distribution. Defaults to 1.

  • a (float) – The minimum cutoff value.

  • b (float) – The maximum cutoff value.

  • bias (float) – the value to fill the bias. Defaults to 0.

  • bias_prob (float, optional) – the probability for bias initialization. Defaults to None.

  • layer (str | list[str], optional) – the layer(s) to be initialized. Defaults to None.
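
The effect of the cutoffs a and b can be illustrated with simple rejection sampling using the standard library. This is a sketch of the resulting distribution only; the actual initializer may use a different sampling method, and `sample_trunc_normal` is a hypothetical name.

```python
import random


def sample_trunc_normal(mean=0.0, std=1.0, a=-2.0, b=2.0, rng=random):
    """Draw one value from N(mean, std^2), redrawing until it lies in [a, b]."""
    while True:
        value = rng.gauss(mean, std)
        if a <= value <= b:
            return value


# Use a seeded generator so the draw is reproducible.
rng = random.Random(0)
samples = [sample_trunc_normal(rng=rng) for _ in range(1000)]
```

Every element of `samples` lies inside the interval [-2, 2].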

class mmcv.cnn.UniformInit(a=0, b=1, **kwargs)[source]

Initialize module parameters with values drawn from the uniform distribution \(\mathcal{U}(a, b)\).

Parameters
  • a (int | float) – the lower bound of the uniform distribution. Defaults to 0.

  • b (int | float) – the upper bound of the uniform distribution. Defaults to 1.

  • bias (int | float) – the value to fill the bias. Defaults to 0.

  • bias_prob (float, optional) – the probability for bias initialization. Defaults to None.

  • layer (str | list[str], optional) – the layer(s) to be initialized. Defaults to None.

class mmcv.cnn.VGG(depth, with_bn=False, num_classes=- 1, num_stages=5, dilations=(1, 1, 1, 1, 1), out_indices=(0, 1, 2, 3, 4), frozen_stages=- 1, bn_eval=True, bn_frozen=False, ceil_mode=False, with_last_pool=True)[source]

VGG backbone.

Parameters
  • depth (int) – Depth of vgg, from {11, 13, 16, 19}.

  • with_bn (bool) – Use BatchNorm or not.

  • num_classes (int) – number of classes for classification.

  • num_stages (int) – VGG stages, normally 5.

  • dilations (Sequence[int]) – Dilation of each stage.

  • out_indices (Sequence[int]) – Output from which stages.

  • frozen_stages (int) – Stages to be frozen (all param fixed). -1 means not freezing any parameters.

  • bn_eval (bool) – Whether to set BN layers as eval mode, namely, freeze running stats (mean and var).

  • bn_frozen (bool) – Whether to freeze weight and bias of BN layers.

forward(x)[source]

Defines the computation performed at every call.

Should be overridden by all subclasses.

Note

Although the recipe for forward pass needs to be defined within this function, one should call the Module instance afterwards instead of this since the former takes care of running the registered hooks while the latter silently ignores them.

train(mode=True)[source]

Sets the module in training mode.

This has an effect only on certain modules. See the documentation of particular modules for details of their behavior in training/evaluation mode, if they are affected, e.g. Dropout, BatchNorm, etc.

Parameters

mode (bool) – whether to set training mode (True) or evaluation mode (False). Default: True.

Returns

self

Return type

Module

class mmcv.cnn.XavierInit(gain=1, distribution='normal', **kwargs)[source]

Initialize module parameters with values according to the method described in Understanding the difficulty of training deep feedforward neural networks - Glorot, X. & Bengio, Y. (2010).

Parameters
  • gain (int | float) – an optional scaling factor. Defaults to 1.

  • bias (int | float) – the value to fill the bias. Defaults to 0.

  • bias_prob (float, optional) – the probability for bias initialization. Defaults to None.

  • distribution (str) – the distribution to draw from, either 'normal' or 'uniform'. Defaults to 'normal'.

  • layer (str | list[str], optional) – the layer(s) to be initialized. Defaults to None.

mmcv.cnn.bias_init_with_prob(prior_prob)[source]

Initialize conv/fc bias value according to a given probability value.
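
One common way to turn a prior probability into a bias value is to choose the bias whose sigmoid equals that probability. The sketch below shows this arithmetic in plain Python; the function names are illustrative, and treating the output as a sigmoid logit is an assumption about how the value is used.

```python
import math


def bias_from_prior(prior_prob):
    """Return the bias b such that sigmoid(b) == prior_prob."""
    return -math.log((1.0 - prior_prob) / prior_prob)


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


# With a prior of 0.01 the bias is -log(99), so a freshly initialized
# classifier initially predicts roughly 1% foreground.
bias = bias_from_prior(0.01)
```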

mmcv.cnn.build_activation_layer(cfg)[source]

Build activation layer.

Parameters

cfg (dict) –

The activation layer config, which should contain:

  • type (str): Layer type.

  • layer args: Args needed to instantiate an activation layer.

Returns

Created activation layer.

Return type

nn.Module
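
The "type plus layer args" convention used by the build_* functions can be illustrated with a small, self-contained registry. Everything in this sketch (`ACTIVATIONS`, `register`, `build_from_type`, and the toy `LeakyReLU` class) is hypothetical and only mimics the pattern; it is not mmcv's registry.

```python
ACTIVATIONS = {}  # hypothetical registry: type name -> class


def register(cls):
    """Class decorator that records a class under its own name."""
    ACTIVATIONS[cls.__name__] = cls
    return cls


@register
class LeakyReLU:
    """Toy scalar activation, used only to demonstrate the pattern."""

    def __init__(self, negative_slope=0.01):
        self.negative_slope = negative_slope

    def __call__(self, x):
        return x if x >= 0 else self.negative_slope * x


def build_from_type(cfg):
    """Instantiate the class named by cfg['type'] with the remaining keys."""
    args = dict(cfg)  # copy so the caller's dict is left untouched
    layer_type = args.pop('type')
    return ACTIVATIONS[layer_type](**args)


cfg = dict(type='LeakyReLU', negative_slope=0.1)
layer = build_from_type(cfg)
```

The 'type' key selects the class and all other keys are forwarded as constructor arguments, which is why a config dict must not reuse 'type' for anything else.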

mmcv.cnn.build_conv_layer(cfg, *args, **kwargs)[source]

Build convolution layer.

Parameters
  • cfg (None or dict) –

    The conv layer config, which should contain:

    • type (str): Layer type.

    • layer args: Args needed to instantiate a conv layer.

  • args (argument list) – Arguments passed to the __init__ method of the corresponding conv layer.

  • kwargs (keyword arguments) – Keyword arguments passed to the __init__ method of the corresponding conv layer.

Returns

Created conv layer.

Return type

nn.Module

mmcv.cnn.build_model_from_cfg(cfg, registry, default_args=None)[source]

Build a PyTorch model from config dict(s). Different from build_from_cfg, if cfg is a list, a nn.Sequential will be built.

Parameters
  • cfg (dict, list[dict]) – The config of modules, which is either a config dict or a list of config dicts. If cfg is a list, the built modules will be wrapped with nn.Sequential.

  • registry (Registry) – A registry the module belongs to.

  • default_args (dict, optional) – Default arguments to build the module. Defaults to None.

Returns

A built nn module.

Return type

nn.Module

mmcv.cnn.build_norm_layer(cfg, num_features, postfix='')[source]

Build normalization layer.

Parameters
  • cfg (dict) –

    The norm layer config, which should contain:

    • type (str): Layer type.

    • layer args: Args needed to instantiate a norm layer.

    • requires_grad (bool, optional): Whether the layer's parameters receive gradient updates; set to False to stop them.

  • num_features (int) – Number of input channels.

  • postfix (int | str) – The postfix to be appended into norm abbreviation to create named layer.

Returns

The first element is the layer name consisting of abbreviation and postfix, e.g., bn1, gn. The second element is the created norm layer.

Return type

tuple[str, nn.Module]

mmcv.cnn.build_padding_layer(cfg, *args, **kwargs)[source]

Build padding layer.

Parameters

cfg (None or dict) –

The padding layer config, which should contain:

  • type (str): Layer type.

  • layer args: Args needed to instantiate a padding layer.

Returns

Created padding layer.

Return type

nn.Module

mmcv.cnn.build_plugin_layer(cfg, postfix='', **kwargs)[source]

Build plugin layer.

Parameters
  • cfg (None or dict) –

    cfg should contain:

    • type (str): identify plugin layer type.

    • layer args: args needed to instantiate a plugin layer.

  • postfix (int, str) – appended into norm abbreviation to create named layer. Default: ‘’.

Returns

The first one is the concatenation of abbreviation and postfix. The second is the created plugin layer.

Return type

tuple[str, nn.Module]

mmcv.cnn.build_upsample_layer(cfg, *args, **kwargs)[source]

Build upsample layer.

Parameters
  • cfg (dict) –

    The upsample layer config, which should contain:

    • type (str): Layer type.

    • scale_factor (int): Upsample ratio, which is not applicable to deconv.

    • layer args: Args needed to instantiate an upsample layer.

  • args (argument list) – Arguments passed to the __init__ method of the corresponding upsample layer.

  • kwargs (keyword arguments) – Keyword arguments passed to the __init__ method of the corresponding upsample layer.

Returns

Created upsample layer.

Return type

nn.Module

mmcv.cnn.fuse_conv_bn(module)[source]

Recursively fuse conv and bn in a module.

During inference, batch norm layers no longer update their statistics and only use the per-channel running mean and variance, which makes it possible to fuse them with the preceding conv layers to save computation and simplify the network structure.

Parameters

module (nn.Module) – Module to be fused.

Returns

Fused module.

Return type

nn.Module
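
The arithmetic behind the fusion can be checked on a single output channel in plain Python. The sketch below folds the batch norm statistics into the weight and bias and compares the result with applying the two steps separately. It illustrates the standard identity only, with hypothetical helper names and made-up numbers; it is not the function's implementation.

```python
import math


def fuse_channel(weight, bias, bn_mean, bn_var, bn_gamma, bn_beta, eps=1e-5):
    """Fold inference-mode batch norm into one channel's weight and bias."""
    scale = bn_gamma / math.sqrt(bn_var + eps)
    fused_weight = [w * scale for w in weight]
    fused_bias = (bias - bn_mean) * scale + bn_beta
    return fused_weight, fused_bias


def dot(weight, x):
    return sum(w * v for w, v in zip(weight, x))


# Made-up values for one output channel and one input patch.
weight, bias = [0.5, -1.0, 2.0], 0.1
mean, var, gamma, beta = 0.3, 4.0, 1.5, -0.2
x = [1.0, 2.0, 3.0]

# Reference: convolution followed by inference-mode batch norm.
y = dot(weight, x) + bias
reference = gamma * (y - mean) / math.sqrt(var + 1e-5) + beta

# Fused: a single multiply-add with the folded parameters.
fused_weight, fused_bias = fuse_channel(weight, bias, mean, var, gamma, beta)
fused = dot(fused_weight, x) + fused_bias
```

Both paths produce the same value up to floating-point rounding, which is why the batch norm layer can be dropped after fusion.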

mmcv.cnn.get_model_complexity_info(model, input_shape, print_per_layer_stat=True, as_strings=True, input_constructor=None, flush=False, ost=sys.stdout)[source]

Get complexity information of a model.

This method can calculate FLOPs and parameter counts of a model with corresponding input shape. It can also print complexity information for each layer in a model.

Supported layers are listed as below:
  • Convolutions: nn.Conv1d, nn.Conv2d, nn.Conv3d.

  • Activations: nn.ReLU, nn.PReLU, nn.ELU, nn.LeakyReLU, nn.ReLU6.

  • Poolings: nn.MaxPool1d, nn.MaxPool2d, nn.MaxPool3d, nn.AvgPool1d, nn.AvgPool2d, nn.AvgPool3d, nn.AdaptiveMaxPool1d, nn.AdaptiveMaxPool2d, nn.AdaptiveMaxPool3d, nn.AdaptiveAvgPool1d, nn.AdaptiveAvgPool2d, nn.AdaptiveAvgPool3d.

  • BatchNorms: nn.BatchNorm1d, nn.BatchNorm2d, nn.BatchNorm3d, nn.GroupNorm, nn.InstanceNorm1d, nn.InstanceNorm2d, nn.InstanceNorm3d, nn.LayerNorm.

  • Linear: nn.Linear.

  • Deconvolution: nn.ConvTranspose2d.

  • Upsample: nn.Upsample.

Parameters
  • model (nn.Module) – The model for complexity calculation.

  • input_shape (tuple) – Input shape used for calculation.

  • print_per_layer_stat (bool) – Whether to print complexity information for each layer in a model. Default: True.

  • as_strings (bool) – Output FLOPs and params counts in a string form. Default: True.

  • input_constructor (None | callable) – If specified, it takes a callable method that generates input. Otherwise, a random tensor with the given input shape is generated to calculate FLOPs. Default: None.

  • flush (bool) – same as that in print(). Default: False.

  • ost (stream) – same as file param in print(). Default: sys.stdout.

Returns

If as_strings is set to True, it will return FLOPs and parameter counts in a string format. Otherwise, it will return them as floating-point numbers.

Return type

tuple[float | str]
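
As a rough illustration of what such a count involves for a single convolution, the sketch below counts one multiply-accumulate as one operation. That convention, the helper name `conv2d_complexity`, and the example layer are assumptions for illustration; the numbers reported by get_model_complexity_info may be computed differently.

```python
def conv2d_complexity(in_channels, out_channels, kernel_size, output_size,
                      groups=1, bias=True):
    """Return (operations, parameters) for one 2D convolution.

    Counts one multiply-accumulate per weight per output position, plus one
    addition per output element when a bias is present.
    """
    k_h, k_w = kernel_size
    out_h, out_w = output_size
    weights = k_h * k_w * (in_channels // groups) * out_channels
    params = weights + (out_channels if bias else 0)
    ops = weights * out_h * out_w
    if bias:
        ops += out_channels * out_h * out_w
    return ops, params


# Example: a 7x7 convolution from 3 to 64 channels producing a 112x112 map.
ops, params = conv2d_complexity(3, 64, (7, 7), (112, 112), bias=False)
```

Here `params` is 7 * 7 * 3 * 64 = 9408 and `ops` is 9408 * 112 * 112 = 118013952.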

mmcv.cnn.initialize(module, init_cfg)[source]

Initialize a module.

Parameters
  • module (torch.nn.Module) – the module to be initialized.

  • init_cfg (dict | list[dict]) – initialization configuration dict to define initializer. OpenMMLab has implemented 6 initializers including Constant, Xavier, Normal, Uniform, Kaiming, and Pretrained.

Example

>>> import torch.nn as nn
>>> from mmcv.cnn import ResNet, initialize
>>> module = nn.Linear(2, 3, bias=True)
>>> init_cfg = dict(type='Constant', layer='Linear', val=1, bias=2)
>>> initialize(module, init_cfg)
>>> module = nn.Sequential(nn.Conv1d(3, 1, 3), nn.Linear(1, 2))
>>> # define key ``'layer'`` for initializing layer with different
>>> # configuration
>>> init_cfg = [dict(type='Constant', layer='Conv1d', val=1),
...             dict(type='Constant', layer='Linear', val=2)]
>>> initialize(module, init_cfg)
>>> # define key ``'override'`` to initialize some specific part in
>>> # module
>>> class FooNet(nn.Module):
...     def __init__(self):
...         super().__init__()
...         self.feat = nn.Conv2d(3, 16, 3)
...         self.reg = nn.Conv2d(16, 10, 3)
...         self.cls = nn.Conv2d(16, 5, 3)
...
>>> model = FooNet()
>>> init_cfg = dict(type='Constant', val=1, bias=2, layer='Conv2d',
...     override=dict(type='Constant', name='reg', val=3, bias=4))
>>> initialize(model, init_cfg)
>>> model = ResNet(depth=50)
>>> # Initialize weights with the pretrained model.
>>> init_cfg = dict(type='Pretrained',
...                 checkpoint='torchvision://resnet50')
>>> initialize(model, init_cfg)
>>> # Initialize weights of a sub-module with the specific part of
>>> # a pretrained model by using "prefix".
>>> url = 'http://download.openmmlab.com/mmdetection/v2.0/retinanet/'\
...     'retinanet_r50_fpn_1x_coco/'\
...     'retinanet_r50_fpn_1x_coco_20200130-c2398f9e.pth'
>>> init_cfg = dict(type='Pretrained',
...                 checkpoint=url, prefix='backbone.')

mmcv.cnn.is_norm(layer, exclude=None)[source]

Check if a layer is a normalization layer.

Parameters
  • layer (nn.Module) – The layer to be checked.

  • exclude (type | tuple[type]) – Types to be excluded.

Returns

Whether the layer is a norm layer.

Return type

bool

runner

class mmcv.runner.BaseModule(init_cfg=None)[source]

Base module for all modules in openmmlab.

BaseModule is a wrapper of torch.nn.Module with additional functionality of parameter initialization. Compared with torch.nn.Module, BaseModule mainly adds three attributes.

  • init_cfg: the config to control the initialization.

  • init_weights: The function of parameter initialization and recording initialization information.

  • _params_init_info: Used to track the parameter initialization information. This attribute only exists while init_weights is executing.

Parameters

init_cfg (dict, optional) – Initialization config dict.

init_weights()[source]

Initialize the weights.

class mmcv.runner.BaseRunner(model, batch_processor=None, optimizer=None, work_dir=None, logger=None, meta=None, max_iters=None, max_epochs=None)[source]

The base class of Runner, a training helper for PyTorch.

All subclasses should implement the following APIs:

  • run()

  • train()

  • val()

  • save_checkpoint()

Parameters
  • model (torch.nn.Module) – The model to be run.

  • batch_processor (callable) – A callable method that processes a data batch. The interface of this method should be batch_processor(model, data, train_mode) -> dict.

  • optimizer (dict or torch.optim.Optimizer) – It can be either an optimizer (in most cases) or a dict of optimizers (in models that require more than one optimizer, e.g., GAN).

  • work_dir (str, optional) – The working directory to save checkpoints and logs. Defaults to None.

  • logger (logging.Logger) – Logger used during training. Defaults to None. (The default value is just for backward compatibility)

  • meta (dict | None) – A dict that records some important information such as environment info and seed, which will be logged in the logger hook. Defaults to None.

  • max_epochs (int, optional) – Total training epochs.

  • max_iters (int, optional) – Total training iterations.

call_hook(fn_name)[source]

Call all hooks.

Parameters

fn_name (str) – The function name in each hook to be called, such as “before_train_epoch”.

current_lr()[source]

Get current learning rates.

Returns

Current learning rates of all param groups. If the runner has a dict of optimizers, this method will return a dict.

Return type

list[float] | dict[str, list[float]]

current_momentum()[source]

Get current momentums.

Returns

Current momentums of all param groups. If the runner has a dict of optimizers, this method will return a dict.

Return type

list[float] | dict[str, list[float]]

property epoch

Current epoch.

Type

int

property hooks

A list of registered hooks.

Type

list[Hook]

property inner_iter

Iteration in an epoch.

Type

int

property iter

Current iteration.

Type

int

property max_epochs

Maximum training epochs.

Type

int

property max_iters

Maximum training iterations.

Type

int

property model_name

Name of the model, usually the module class name.

Type

str

property rank

Rank of current process. (distributed training)

Type

int

register_hook(hook, priority='NORMAL')[source]

Register a hook into the hook list.

The hook will be inserted into a priority queue, with the specified priority (See Priority for details of priorities). For hooks with the same priority, they will be triggered in the same order as they are registered.

Parameters
  • hook (Hook) – The hook to be registered.

  • priority (int or str or Priority) – Hook priority. Lower value means higher priority.
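
The ordering rule described above (lower value first, registration order preserved among equal priorities) can be sketched with a plain list. `SimpleHook` and this free-standing `register_hook` are hypothetical stand-ins that use integer priorities only; they are not the runner's implementation.

```python
class SimpleHook:
    """Hypothetical hook that only carries a name and a priority."""

    def __init__(self, name):
        self.name = name
        self.priority = None


def register_hook(hooks, hook, priority=50):
    """Insert hook so the list stays sorted by ascending priority value."""
    hook.priority = priority
    # Walk from the back so that hooks with equal priority keep the order
    # in which they were registered.
    for i in range(len(hooks) - 1, -1, -1):
        if priority >= hooks[i].priority:
            hooks.insert(i + 1, hook)
            return
    hooks.insert(0, hook)


hooks = []
register_hook(hooks, SimpleHook('logger'), priority=90)
register_hook(hooks, SimpleHook('lr_updater'), priority=10)
register_hook(hooks, SimpleHook('checkpoint'), priority=50)
register_hook(hooks, SimpleHook('custom'), priority=50)
order = [h.name for h in hooks]
```

The resulting order is lr_updater, checkpoint, custom, logger: the custom hook shares priority 50 with the checkpoint hook but was registered later, so it is triggered after it.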

register_hook_from_cfg(hook_cfg)[source]

Register a hook from its cfg.

Parameters

hook_cfg (dict) – Hook config. It should have at least keys ‘type’ and ‘priority’ indicating its type and priority.

Note

The specific hook class to register should not use ‘type’ and ‘priority’ arguments during initialization.

register_training_hooks(lr_config, optimizer_config=None, checkpoint_config=None, log_config=None, momentum_config=None, timer_config={'type': 'IterTimerHook'}, custom_hooks_config=None)[source]

Register default and custom hooks for training.

Default and custom hooks include:

Hooks                  Priority
LrUpdaterHook          VERY_HIGH (10)
MomentumUpdaterHook    HIGH (30)
OptimizerStepperHook   ABOVE_NORMAL (40)
CheckpointSaverHook    NORMAL (50)
IterTimerHook          LOW (70)
LoggerHook(s)          VERY_LOW (90)
CustomHook(s)          defaults to NORMAL (50)

If custom hooks have the same priority as default hooks, custom hooks will be triggered after default hooks.

property world_size

Number of processes participating in the job. (distributed training)

Type

int

class mmcv.runner.CheckpointHook(interval=- 1, by_epoch=True, save_optimizer=True, out_dir=None, max_keep_ckpts=- 1, save_last=True, sync_buffer=False, file_client_args=None, **kwargs)[source]

Save checkpoints periodically.

Parameters
  • interval (int) – The saving period. If by_epoch=True, interval indicates epochs, otherwise it indicates iterations. Default: -1, which means “never”.

  • by_epoch (bool) – Saving checkpoints by epoch or by iteration. Default: True.

  • save_optimizer (bool) – Whether to save optimizer state_dict in the checkpoint. It is usually used for resuming experiments. Default: True.

  • out_dir (str, optional) – The root directory to save checkpoints. If not specified, runner.work_dir will be used by default. If specified, the out_dir will be the concatenation of out_dir and the last level directory of runner.work_dir. Changed in version 1.3.16.

  • max_keep_ckpts (int, optional) – The maximum checkpoints to keep. In some cases we want only the latest few checkpoints and would like to delete old ones to save the disk space. Default: -1, which means unlimited.

  • save_last (bool, optional) – Whether to force the last checkpoint to be saved regardless of interval. Default: True.

  • sync_buffer (bool, optional) – Whether to synchronize buffers in different gpus. Default: False.

  • file_client_args (dict, optional) – Arguments to instantiate a FileClient. See mmcv.fileio.FileClient for details. Default: None. New in version 1.3.16.

Warning

Before v1.3.16, the out_dir argument indicates the path where the checkpoint is stored. However, since v1.3.16, out_dir indicates the root directory and the final path to save checkpoint is the concatenation of out_dir and the last level directory of runner.work_dir. Suppose the value of out_dir is “/path/of/A” and the value of runner.work_dir is “/path/of/B”, then the final path will be “/path/of/A/B”.
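
The path rule in the warning above can be reproduced with the standard library. This sketch assumes POSIX-style paths and uses a hypothetical helper name; it only mirrors the documented behavior since v1.3.16.

```python
import posixpath


def resolve_checkpoint_dir(out_dir, work_dir):
    """Return the directory where checkpoints would be saved."""
    if out_dir is None:
        # No out_dir given: fall back to the runner's work_dir.
        return work_dir
    # Append the last level directory of work_dir to out_dir.
    last_level = posixpath.basename(work_dir.rstrip('/'))
    return posixpath.join(out_dir, last_level)


final_dir = resolve_checkpoint_dir('/path/of/A', '/path/of/B')
```

For the values used in the warning, `final_dir` is '/path/of/A/B'.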

class mmcv.runner.CheckpointLoader[source]

A general checkpoint loader to manage all schemes.

classmethod load_checkpoint(filename, map_location=None, logger=None)[source]

Load checkpoint through URL scheme path.

Parameters
  • filename (str) – checkpoint file name with given prefix

  • map_location (str, optional) – Same as torch.load(). Default: None

  • logger (logging.Logger, optional) – The logger for message. Default: None

Returns

The loaded checkpoint.

Return type

dict or OrderedDict

classmethod register_scheme(prefixes, loader=None, force=False)[source]

Register a loader to CheckpointLoader.

This method can be used as a normal class method or a decorator.

Parameters
  • prefixes (str or list[str] or tuple[str]) – The prefix of the registered loader.

  • loader (function, optional) – The loader function to be registered. When this method is used as a decorator, loader is None. Defaults to None.

  • force (bool, optional) – Whether to override the loader if the prefix has already been registered. Defaults to False.
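
The two calling styles (normal class method and decorator) can be illustrated with a small stand-in registry. This is a sketch of the pattern only: `ToyCheckpointLoader`, the `toy://` and `mem://` prefixes, and the longest-prefix lookup are assumptions made for the example, not MMCV code.

```python
class ToyCheckpointLoader:
    """Hypothetical stand-in that mimics prefix-based loader registration."""

    _schemes = {}

    @classmethod
    def register_scheme(cls, prefixes, loader=None, force=False):
        if isinstance(prefixes, str):
            prefixes = [prefixes]

        def _register(func):
            for prefix in prefixes:
                if prefix in cls._schemes and not force:
                    raise KeyError(f'{prefix} is already registered')
                cls._schemes[prefix] = func
            return func

        # Used as a normal class method: register immediately.
        if loader is not None:
            return _register(loader)
        # Used as a decorator: return the function that does the registering.
        return _register

    @classmethod
    def load_checkpoint(cls, filename):
        # Assumption: the longest matching prefix wins.
        matches = [p for p in cls._schemes if filename.startswith(p)]
        if not matches:
            raise KeyError(f'no loader registered for {filename}')
        return cls._schemes[max(matches, key=len)](filename)


# Decorator style: ``loader`` stays None.
@ToyCheckpointLoader.register_scheme(prefixes='toy://')
def load_from_toy(filename):
    return {'source': 'toy', 'name': filename[len('toy://'):]}


# Method style: the loader is passed explicitly.
ToyCheckpointLoader.register_scheme('mem://', loader=lambda f: {'source': 'mem'})

print(ToyCheckpointLoader.load_checkpoint('toy://model.pth'))
```

Registering an existing prefix again raises an error in this sketch unless force=True is passed, which mirrors the description of the force argument.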

class mmcv.runner.CosineAnnealingLrUpdaterHook(min_lr=None, min_lr_ratio=None, **kwargs)[source]
class mmcv.runner.CosineRestartLrUpdaterHook(periods, restart_weights=[1], min_lr=None, min_lr_ratio=None, **kwargs)[source]

Cosine annealing with restarts learning rate scheme.

Parameters
  • periods (list[int]) – Periods for each cosine annealing cycle.

  • restart_weights (list[float], optional) – Restart weights at each restart iteration. Default: [1].

  • min_lr (float, optional) – The minimum lr. Default: None.

  • min_lr_ratio (float, optional) – The ratio of minimum lr to the base lr. Either min_lr or min_lr_ratio should be specified. Default: None.
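
A minimal sketch of how periods and restart_weights could interact is given below. It assumes that each cycle starts from the base lr scaled by its restart weight and anneals toward min_lr along a half cosine; the function names are hypothetical and the code is not taken from MMCV.

```python
import math


def annealing_cos(start, end, factor, weight=1.0):
    """Half-cosine interpolation toward ``end`` as ``factor`` goes 0 -> 1."""
    cos_out = math.cos(math.pi * factor) + 1
    return end + 0.5 * weight * (start - end) * cos_out


def cosine_restart_lr(progress, base_lr, periods, restart_weights, min_lr):
    """Return the lr at ``progress`` (an epoch or iteration index)."""
    cumulative = 0
    for period, weight in zip(periods, restart_weights):
        if progress < cumulative + period:
            # Fraction of the current cycle that has elapsed.
            alpha = (progress - cumulative) / period
            return annealing_cos(base_lr, min_lr, alpha, weight)
        cumulative += period
    return min_lr  # assumption: stay at min_lr after the last cycle


for step in (0, 5, 10, 20, 29):
    print(step, cosine_restart_lr(step, 0.1, [10, 20], [1, 0.5], 0.0))
```

In this sketch the second cycle restarts at half the base lr because its restart weight is 0.5.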

class mmcv.runner.CyclicLrUpdaterHook(by_epoch=False, target_ratio=(10, 0.0001), cyclic_times=1, step_ratio_up=0.4, anneal_strategy='cos', gamma=1, **kwargs)[source]

Cyclic LR Scheduler.

Implement the cyclical learning rate policy (CLR) described in https://arxiv.org/pdf/1506.01186.pdf

Different from the original paper, we use cosine annealing rather than triangular policy inside a cycle. This improves the performance in the 3D detection area.

Parameters
  • by_epoch (bool, optional) – Whether to update LR by epoch.

  • target_ratio (tuple[float], optional) – Relative ratio of the highest LR and the lowest LR to the initial LR.

  • cyclic_times (int, optional) – Number of cycles during training

  • step_ratio_up (float, optional) – The ratio of the increasing process of LR in the total cycle.

  • anneal_strategy (str, optional) – {‘cos’, ‘linear’} Specifies the annealing strategy: ‘cos’ for cosine annealing, ‘linear’ for linear annealing. Default: ‘cos’.

  • gamma (float, optional) – Cycle decay ratio. Default: 1. It takes values in the range (0, 1]. The difference between the maximum learning rate and the minimum learning rate decreases periodically when it is less than 1. New in version 1.4.4.
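
The shape of one cycle can be sketched as follows. The sketch assumes gamma=1 (no decay between cycles) and cosine annealing in both phases; `cyclic_lr` is a hypothetical helper, not the hook's actual code.

```python
import math


def cyclic_lr(cur_iter, max_iters, base_lr, target_ratio=(10, 1e-4),
              cyclic_times=1, step_ratio_up=0.4):
    """Return the lr at ``cur_iter`` for a cosine-annealed cyclic schedule."""
    iters_per_cycle = max_iters // cyclic_times
    iters_up = int(step_ratio_up * iters_per_cycle)
    pos = cur_iter % iters_per_cycle  # position inside the current cycle
    if pos < iters_up:
        # Increasing phase: from the initial lr to the highest lr.
        start, end = 1.0, target_ratio[0]
        factor = pos / iters_up
    else:
        # Decreasing phase: from the highest lr to the lowest lr.
        start, end = target_ratio[0], target_ratio[1]
        factor = (pos - iters_up) / (iters_per_cycle - iters_up)
    ratio = end + 0.5 * (start - end) * (math.cos(math.pi * factor) + 1)
    return base_lr * ratio


for it in (0, 20, 40, 70, 99):
    print(it, cyclic_lr(it, max_iters=100, base_lr=0.01))
```

With the default target_ratio, the lr rises from the initial value to 10 times that value during the first 40% of the cycle and then decays toward 0.0001 times the initial value.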

class mmcv.runner.CyclicMomentumUpdaterHook(by_epoch=False, target_ratio=(0.8947368421052632, 1), cyclic_times=1, step_ratio_up=0.4, anneal_strategy='cos', gamma=1, **kwargs)[source]

Cyclic momentum Scheduler.

Implement the cyclical momentum scheduler policy described in https://arxiv.org/pdf/1708.07120.pdf

This momentum scheduler is usually used together with the CyclicLrUpdaterHook to improve the performance in the 3D detection area.

Parameters
  • target_ratio (tuple[float]) – Relative ratio of the lowest momentum and the highest momentum to the initial momentum.

  • cyclic_times (int) – Number of cycles during training

  • step_ratio_up (float) – The ratio of the increasing process of momentum in the total cycle.

  • by_epoch (bool) – Whether to update momentum by epoch.

  • anneal_strategy (str, optional) – {‘cos’, ‘linear’} Specifies the annealing strategy: ‘cos’ for cosine annealing, ‘linear’ for linear annealing. Default: ‘cos’.

  • gamma (float, optional) – Cycle decay ratio. Default: 1. It takes values in the range (0, 1]. The difference between the maximum learning rate and the minimum learning rate decreases periodically when it is less than 1. New in version 1.4.4.

class mmcv.runner.DefaultOptimizerConstructor(optimizer_cfg, paramwise_cfg=None)[source]

Default constructor for optimizers.

By default each parameter shares the same optimizer settings, and we provide an argument paramwise_cfg to specify parameter-wise settings. It is a dict and may contain the following fields:

  • custom_keys (dict): Specifies parameter-wise settings by keys. If one of the keys in custom_keys is a substring of the name of a parameter, then the setting of that parameter will be specified by custom_keys[key], and other settings such as bias_lr_mult will be ignored. Note that the key used is the longest key that is a substring of the name of the parameter. If there are multiple matched keys with the same length, the key that comes first in alphabetical order will be chosen. custom_keys[key] should be a dict and may contain the fields lr_mult and decay_mult. See Example 2 below.

  • bias_lr_mult (float): It will be multiplied to the learning rate for all bias parameters (except for those in normalization layers and offset layers of DCN).

  • bias_decay_mult (float): It will be multiplied to the weight decay for all bias parameters (except for those in normalization layers, depthwise conv layers, offset layers of DCN).

  • norm_decay_mult (float): It will be multiplied to the weight decay for all weight and bias parameters of normalization layers.

  • dwconv_decay_mult (float): It will be multiplied to the weight decay for all weight and bias parameters of depthwise conv layers.

  • dcn_offset_lr_mult (float): It will be multiplied to the learning rate for parameters of offset layer in the deformable convs of a model.

  • bypass_duplicate (bool): If true, the duplicate parameters would not be added into optimizer. Default: False.
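
The key-selection rule for custom_keys (longest matching key first, ties broken alphabetically) can be sketched in a few lines. `match_custom_key` and the example keys are hypothetical and only illustrate the rule as documented.

```python
def match_custom_key(param_name, custom_keys):
    """Return the custom key that applies to ``param_name``, or None."""
    # Sort alphabetically first, then by length (longest first). Python's
    # sort is stable, so equal-length keys stay in alphabetical order.
    sorted_keys = sorted(sorted(custom_keys), key=len, reverse=True)
    for key in sorted_keys:
        if key in param_name:
            return key
    return None


custom_keys = {
    'backbone': dict(lr_mult=0.1),
    'backbone.layer1': dict(lr_mult=0.01),
    'head': dict(lr_mult=1.0, decay_mult=0.0),
}

for name in ('backbone.layer1.conv.weight',
             'backbone.layer2.conv.weight',
             'neck.conv.weight'):
    print(name, '->', match_custom_key(name, custom_keys))
```

A parameter in backbone.layer1 picks the more specific key, other backbone parameters fall back to the shorter key, and a parameter that matches no key keeps the default settings.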

Note

1. If the option dcn_offset_lr_mult is used, the constructor will override the effect of bias_lr_mult in the bias of offset layer. So be careful when using both bias_lr_mult and dcn_offset_lr_mult. If you wish to apply both of them to the offset layer in deformable convs, set dcn_offset_lr_mult to the original dcn_offset_lr_mult * bias_lr_mult.

2. If the option dcn_offset_lr_mult is used, the constructor will apply it to all the DCN layers in the model. So be careful when the model contains multiple DCN layers in places other than backbone.

Parameters
  • model (nn.Module) – The model with parameters to be optimized.

  • optimizer_cfg (dict) –

    The config dict of the optimizer. Positional fields are

    • type: class name of the optimizer.

    Optional fields are

    • any arguments of the corresponding optimizer type, e.g., lr, weight_decay, momentum, etc.

  • paramwise_cfg (dict, optional) – Parameter-wise options.

Example 1:
>>> import torch
>>> from mmcv.runner import DefaultOptimizerConstructor
>>> model = torch.nn.modules.Conv1d(1, 1, 1)
>>> optimizer_cfg = dict(type='SGD', lr=0.01, momentum=0.9,
...                      weight_decay=0.0001)
>>> paramwise_cfg = dict(norm_decay_mult=0.)
>>> optim_builder = DefaultOptimizerConstructor(
...     optimizer_cfg, paramwise_cfg)
>>> optimizer = optim_builder(model)
Example 2:
>>> # assume the model has attributes model.backbone and model.cls_head
>>> optimizer_cfg = dict(type='SGD', lr=0.01, weight_decay=0.95)
>>> paramwise_cfg = dict(custom_keys={
...     '.backbone': dict(lr_mult=0.1, decay_mult=0.9)})
>>> optim_builder = DefaultOptimizerConstructor(
...     optimizer_cfg, paramwise_cfg)
>>> optimizer = optim_builder(model)
>>> # Then the `lr` and `weight_decay` for model.backbone are
>>> # (0.01 * 0.1, 0.95 * 0.9). `lr` and `weight_decay` for
>>> # model.cls_head are (0.01, 0.95).
add_params(params, module, prefix='', is_dcn_module=None)[source]

Add all parameters of module to the params list.

The parameters of the given module will be added to the list of param groups, with specific rules defined by paramwise_cfg.

Parameters
  • params (list[dict]) – A list of param groups, it will be modified in place.

  • module (nn.Module) – The module to be added.

  • prefix (str) – The prefix of the module

  • is_dcn_module (int|float|None) – If the current module is a submodule of DCN, is_dcn_module will be passed to control conv_offset layer’s learning rate. Defaults to None.

class mmcv.runner.DefaultRunnerConstructor(runner_cfg, default_args=None)[source]

Default constructor for runners.

Customize an existing Runner such as EpochBasedRunner through a RunnerConstructor. For example, we can inject some new properties and functions into the Runner.

Example

>>> from mmcv.runner import RUNNER_BUILDERS, RUNNERS, build_runner
>>> # Define a new RunnerConstructor
>>> @RUNNER_BUILDERS.register_module()
... class MyRunnerConstructor:
...     def __init__(self, runner_cfg, default_args=None):
...         if not isinstance(runner_cfg, dict):
...             raise TypeError('runner_cfg should be a dict, '
...                             f'but got {type(runner_cfg)}')
...         self.runner_cfg = runner_cfg
...         self.default_args = default_args
...
...     def __call__(self):
...         runner = RUNNERS.build(self.runner_cfg,
...                                default_args=self.default_args)
...         # Add new properties for existing runner
...         runner.my_name = 'my_runner'
...         runner.my_function = lambda self: print(self.my_name)
...         ...
...         return runner
>>> # build your runner
>>> runner_cfg = dict(type='EpochBasedRunner', max_epochs=40,
...                   constructor='MyRunnerConstructor')
>>> runner = build_runner(runner_cfg)
class mmcv.runner.DistEvalHook(dataloader, start=None, interval=1, by_epoch=True, save_best=None, rule=None, test_fn=None, greater_keys=None, less_keys=None, broadcast_bn_buffer=True, tmpdir=None, gpu_collect=False, out_dir=None, file_client_args=None, **eval_kwargs)[source]

Distributed evaluation hook.

This hook performs evaluation at a given interval when running in a distributed environment.

Parameters
  • dataloader (DataLoader) – A PyTorch dataloader, whose dataset has implemented evaluate function.

  • start (int | None, optional) – Evaluation starting epoch. It enables evaluation before the training starts if start <= the resuming epoch. If None, whether to evaluate is merely decided by interval. Default: None.

  • interval (int) – Evaluation interval. Default: 1.

  • by_epoch (bool) – Whether to perform evaluation by epoch or by iteration. If set to True, it will perform by epoch. Otherwise, by iteration. Default: True.

  • save_best (str, optional) – If a metric is specified, the hook tracks the best checkpoint according to that metric during evaluation. The best score value and the best checkpoint path are saved in runner.meta['hook_msgs'] and are loaded again when resuming from a checkpoint. Options are the evaluation metrics on the test dataset, e.g., bbox_mAP and segm_mAP for bbox detection and instance segmentation, or AR@100 for proposal recall. If save_best is auto, the first key of the returned OrderedDict result will be used. Default: None.

  • rule (str | None, optional) – Comparison rule for best score. If set to None, it will infer a reasonable rule. Keys such as ‘acc’, ‘top’, etc. will be inferred by the ‘greater’ rule. Keys containing ‘loss’ will be inferred by the ‘less’ rule. Options are ‘greater’, ‘less’, None. Default: None.

  • test_fn (callable, optional) – Test a model with samples from a dataloader in a multi-gpu manner, and return the test results. If None, the default test function mmcv.engine.multi_gpu_test will be used. Default: None.

  • tmpdir (str | None) – Temporary directory to save the results of all processes. Default: None.

  • gpu_collect (bool) – Whether to use gpu or cpu to collect results. Default: False.

  • broadcast_bn_buffer (bool) – Whether to broadcast the buffers (running_mean and running_var) of rank 0 to the other ranks before evaluation. Default: True.

  • out_dir (str, optional) – The root directory to save checkpoints. If not specified, runner.work_dir will be used by default. If specified, the out_dir will be the concatenation of out_dir and the last level directory of runner.work_dir.

  • file_client_args (dict) – Arguments to instantiate a FileClient. See mmcv.fileio.FileClient for details. Default: None.

  • **eval_kwargs – Evaluation arguments fed into the evaluate function of the dataset.

class mmcv.runner.DistSamplerSeedHook[source]

Data-loading sampler for distributed training.

In distributed training, it is only useful in conjunction with EpochBasedRunner, while IterBasedRunner achieves the same purpose with IterLoader.

class mmcv.runner.DvcliveLoggerHook(model_file=None, interval=10, ignore_last=True, reset_flag=False, by_epoch=True, **kwargs)[source]

Class to log metrics with dvclive.

It requires dvclive to be installed.

Parameters
  • model_file (str) – Default None. If not None, after each epoch the model will be saved to {model_file}.

  • interval (int) – Logging interval (every k iterations). Default 10.

  • ignore_last (bool) – Ignore the log of last iterations in each epoch if less than interval. Default: True.

  • reset_flag (bool) – Whether to clear the output buffer after logging. Default: False.

  • by_epoch (bool) – Whether EpochBasedRunner is used. Default: True.

  • kwargs – Arguments for instantiating Live.

class mmcv.runner.EMAHook(momentum=0.0002, interval=1, warm_up=100, resume_from=None)[source]

Exponential Moving Average Hook.

Use an exponential moving average on all parameters of the model during training. Every parameter has an EMA backup, which is updated by the formula below. EMAHook takes priority over EvalHook and CheckpointSaverHook.

\[\text{Xema}_{t+1} = (1 - \text{momentum}) \times \text{Xema}_{t} + \text{momentum} \times X_t\]
Parameters
  • momentum (float) – The momentum used for updating ema parameter. Defaults to 0.0002.

  • interval (int) – Update ema parameter every interval iteration. Defaults to 1.

  • warm_up (int) – During first warm_up steps, we may use smaller momentum to update ema parameters more slowly. Defaults to 100.

  • resume_from (str) – The checkpoint path. Defaults to None.
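
The update formula can be checked with plain numbers. The sketch below applies only the formula itself; it ignores the warm_up and interval handling, and `ema_update` is a hypothetical helper name.

```python
def ema_update(ema_value, value, momentum=0.0002):
    """One EMA step: Xema_{t+1} = (1 - momentum) * Xema_t + momentum * X_t."""
    return (1 - momentum) * ema_value + momentum * value


# A large momentum is used here only to make the effect visible.
ema = 0.0
for _ in range(3):
    ema = ema_update(ema, 1.0, momentum=0.5)
print(ema)  # 0.0 -> 0.5 -> 0.75 -> 0.875
```

With the default momentum of 0.0002 the backup moves much more slowly toward the current parameter value.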

after_train_epoch(runner)[source]

We load parameter values from ema backup to model before the EvalHook.

after_train_iter(runner)[source]

Update ema parameter every self.interval iterations.

before_run(runner)[source]

Make it easier to resume a model together with its EMA parameters.

Register the EMA parameters as named buffers of the model.

before_train_epoch(runner)[source]

We recover the model’s parameters from the EMA backup after the last epoch’s EvalHook.

class mmcv.runner.EpochBasedRunner(model, batch_processor=None, optimizer=None, work_dir=None, logger=None, meta=None, max_iters=None, max_epochs=None)[source]

Epoch-based Runner.

This runner trains models epoch by epoch.

run(data_loaders, workflow, max_epochs=None, **kwargs)[source]

Start running.

Parameters
  • data_loaders (list[DataLoader]) – Dataloaders for training and validation.

  • workflow (list[tuple]) – A list of (phase, epochs) to specify the running order and epochs. E.g., [(‘train’, 2), (‘val’, 1)] means running 2 epochs for training and 1 epoch for validation, iteratively.

save_checkpoint(out_dir, filename_tmpl='epoch_{}.pth', save_optimizer=True, meta=None, create_symlink=True)[source]

Save the checkpoint.

Parameters
  • out_dir (str) – The directory in which checkpoints are saved.

  • filename_tmpl (str, optional) – The checkpoint filename template, which contains a placeholder for the epoch number. Defaults to ‘epoch_{}.pth’.

  • save_optimizer (bool, optional) – Whether to save the optimizer to the checkpoint. Defaults to True.

  • meta (dict, optional) – The meta information to be saved in the checkpoint. Defaults to None.

  • create_symlink (bool, optional) – Whether to create a symlink “latest.pth” to point to the latest checkpoint. Defaults to True.

class mmcv.runner.EvalHook(dataloader, start=None, interval=1, by_epoch=True, save_best=None, rule=None, test_fn=None, greater_keys=None, less_keys=None, out_dir=None, file_client_args=None, **eval_kwargs)[source]

Non-Distributed evaluation hook.

This hook performs evaluation at a given interval when running in a non-distributed environment.

Parameters
  • dataloader (DataLoader) – A PyTorch dataloader, whose dataset has implemented evaluate function.

  • start (int | None, optional) – Evaluation starting epoch. It enables evaluation before the training starts if start <= the resuming epoch. If None, whether to evaluate is merely decided by interval. Default: None.

  • interval (int) – Evaluation interval. Default: 1.

  • by_epoch (bool) – Whether to perform evaluation by epoch or by iteration. If set to True, it will perform by epoch. Otherwise, by iteration. Default: True.

  • save_best (str, optional) – If a metric is specified, the hook tracks the best checkpoint according to that metric during evaluation. The best score value and the best checkpoint path are saved in runner.meta['hook_msgs'] and are loaded again when resuming from a checkpoint. Options are the evaluation metrics on the test dataset, e.g., bbox_mAP and segm_mAP for bbox detection and instance segmentation, or AR@100 for proposal recall. If save_best is auto, the first key of the returned OrderedDict result will be used. Default: None.

  • rule (str | None, optional) – Comparison rule for best score. If set to None, it will infer a reasonable rule. Keys such as ‘acc’, ‘top’, etc. will be inferred by the ‘greater’ rule. Keys containing ‘loss’ will be inferred by the ‘less’ rule. Options are ‘greater’, ‘less’, None. Default: None.

  • test_fn (callable, optional) – Test a model with samples from a dataloader, and return the test results. If None, the default test function mmcv.engine.single_gpu_test will be used. Default: None.

  • greater_keys (List[str] | None, optional) – Metric keys that will be inferred by ‘greater’ comparison rule. If None, _default_greater_keys will be used. (default: None)

  • less_keys (List[str] | None, optional) – Metric keys that will be inferred by ‘less’ comparison rule. If None, _default_less_keys will be used. (default: None)

  • out_dir (str, optional) – The root directory to save checkpoints. If not specified, runner.work_dir will be used by default. If specified, the out_dir will be the concatenation of out_dir and the last level directory of runner.work_dir. New in version 1.3.16.

  • file_client_args (dict) – Arguments to instantiate a FileClient. See mmcv.fileio.FileClient for details. Default: None. New in version 1.3.16.

  • **eval_kwargs – Evaluation arguments fed into the evaluate function of the dataset.
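
How rule=None might be resolved can be sketched as below. The key lists contain only the examples named in the rule description (‘acc’, ‘top’, ‘loss’); the real defaults may contain more entries, and both function names are hypothetical.

```python
def infer_rule(key_indicator, greater_keys=('acc', 'top'), less_keys=('loss',)):
    """Guess the comparison rule from the name of the tracked metric."""
    if any(key in key_indicator for key in greater_keys):
        return 'greater'
    if any(key in key_indicator for key in less_keys):
        return 'less'
    raise ValueError(f'Cannot infer the rule for key {key_indicator}')


def is_better(rule, new_score, best_score):
    """Compare a new score against the best score seen so far."""
    if rule == 'greater':
        return new_score > best_score
    return new_score < best_score


print(infer_rule('top1_acc'))                    # greater
print(infer_rule('val_loss'))                    # less
print(is_better(infer_rule('val_loss'), 0.3, 0.5))
```

Passing rule explicitly avoids the guess for metrics whose names match neither list.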

Note

If new arguments are added for EvalHook, tools/test.py, tools/eval_metric.py may be affected.

after_train_epoch(runner)[source]

Called after every training epoch to evaluate the results.

after_train_iter(runner)[source]

Called after every training iter to evaluate the results.

before_train_epoch(runner)[source]

Evaluate the model only at the start of training by epoch.

before_train_iter(runner)[source]

Evaluate the model only at the start of training by iteration.

evaluate(runner, results)[source]

Evaluate the results.

Parameters
  • runner (mmcv.Runner) – The underlying training runner.

  • results (list) – Output results.

class mmcv.runner.ExpLrUpdaterHook(gamma, **kwargs)[source]
class mmcv.runner.FixedLrUpdaterHook(**kwargs)[source]
class mmcv.runner.FlatCosineAnnealingLrUpdaterHook(start_percent=0.75, min_lr=None, min_lr_ratio=None, **kwargs)[source]

Flat + Cosine lr schedule.

Modified from https://github.com/fastai/fastai/blob/master/fastai/callback/schedule.py#L128

Parameters
  • start_percent (float) – When to start annealing the learning rate after the percentage of the total training steps. The value should be in range [0, 1). Default: 0.75

  • min_lr (float, optional) – The minimum lr. Default: None.

  • min_lr_ratio (float, optional) – The ratio of minimum lr to the base lr. Either min_lr or min_lr_ratio should be specified. Default: None.
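
The schedule can be sketched as a flat phase followed by a half-cosine decay. This is an illustration under the assumption that annealing starts at round(max_steps * start_percent) and that max_steps is larger than that start step; `flat_cosine_lr` is a hypothetical helper.

```python
import math


def flat_cosine_lr(step, max_steps, base_lr, min_lr, start_percent=0.75):
    """Keep ``base_lr`` until the start step, then anneal to ``min_lr``."""
    start = round(max_steps * start_percent)
    if step < start:
        return base_lr  # flat phase
    # Fraction of the annealing phase that has elapsed.
    factor = (step - start) / (max_steps - start)
    return min_lr + 0.5 * (base_lr - min_lr) * (math.cos(math.pi * factor) + 1)


for step in (0, 74, 75, 90, 100):
    print(step, flat_cosine_lr(step, max_steps=100, base_lr=0.1, min_lr=0.0))
```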

class mmcv.runner.Fp16OptimizerHook(grad_clip=None, coalesce=True, bucket_size_mb=-1, loss_scale=512.0, distributed=True)[source]

FP16 optimizer hook (using PyTorch’s implementation).

If you are using PyTorch >= 1.6, torch.cuda.amp is used as the backend, to take care of the optimization procedure.

Parameters

loss_scale (float | str | dict) – Scale factor configuration. If loss_scale is a float, static loss scaling will be used with the specified scale. If loss_scale is a string, it must be ‘dynamic’, and dynamic loss scaling will be used. It can also be a dict containing arguments of GradScaler. Defaults to 512. For PyTorch >= 1.6, mmcv uses the official implementation of GradScaler. If you use a dict version of loss_scale to create GradScaler, please refer to https://pytorch.org/docs/stable/amp.html#torch.cuda.amp.GradScaler for the parameters.

Examples

>>> loss_scale = dict(
...     init_scale=65536.0,
...     growth_factor=2.0,
...     backoff_factor=0.5,
...     growth_interval=2000
... )
>>> optimizer_hook = Fp16OptimizerHook(loss_scale=loss_scale)
after_train_iter(runner)[source]

Backward optimization steps for Mixed Precision Training. For dynamic loss scaling, please refer to https://pytorch.org/docs/stable/amp.html#torch.cuda.amp.GradScaler.

  1. Scale the loss by a scale factor.

  2. Backward the loss to obtain the gradients.

  3. Unscale the optimizer’s gradient tensors.

  4. Call optimizer.step() and update scale factor.

  5. Save loss_scaler state_dict for resume purpose.

before_run(runner)[source]

Preparing steps before Mixed Precision Training.

copy_grads_to_fp32(fp16_net, fp32_weights)[source]

Copy gradients from fp16 model to fp32 weight copy.

copy_params_to_fp16(fp16_net, fp32_weights)[source]

Copy updated params from fp32 weight copy to fp16 model.

class mmcv.runner.GradientCumulativeFp16OptimizerHook(*args, **kwargs)[source]

FP16 optimizer hook (using PyTorch’s implementation) that implements multi-iteration gradient accumulation.

If you are using PyTorch >= 1.6, torch.cuda.amp is used as the backend, to take care of the optimization procedure.

after_train_iter(runner)[source]

Backward optimization steps for Mixed Precision Training. For dynamic loss scaling, please refer to https://pytorch.org/docs/stable/amp.html#torch.cuda.amp.GradScaler.

  1. Scale the loss by a scale factor.

  2. Backward the loss to obtain the gradients.

  3. Unscale the optimizer’s gradient tensors.

  4. Call optimizer.step() and update scale factor.

  5. Save loss_scaler state_dict for resume purpose.

class mmcv.runner.GradientCumulativeOptimizerHook(cumulative_iters=1, **kwargs)[source]

Optimizer hook that implements multi-iteration gradient accumulation.

Parameters

cumulative_iters (int, optional) – Number of iterations over which gradients are accumulated. The optimizer will step every cumulative_iters iters. Defaults to 1.

Examples

>>> # Use cumulative_iters to simulate a large batch size
>>> # It is helpful when the hardware cannot handle a large batch size.
>>> loader = DataLoader(data, batch_size=64)
>>> optim_hook = GradientCumulativeOptimizerHook(cumulative_iters=4)
>>> # is almost equivalent to
>>> loader = DataLoader(data, batch_size=256)
>>> optim_hook = OptimizerHook()
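
The equivalence claimed in the example can be checked on a toy problem in plain Python. The sketch assumes a mean-reduced loss and that each micro-batch loss is divided by cumulative_iters before its gradient is accumulated; the one-parameter model and the helper name are hypothetical.

```python
def mean_sq_grad(w, xs, ys):
    """Gradient of mean((w * x - y) ** 2) with respect to ``w``."""
    return sum(2 * (w * x - y) * x for x, y in zip(xs, ys)) / len(xs)


xs = [float(i) for i in range(8)]
ys = [2.0 * x + 1.0 for x in xs]
w = 0.5

# One step on the full batch of 8 samples.
full_grad = mean_sq_grad(w, xs, ys)

# Four micro-batches of 2 samples; each contribution is divided by
# cumulative_iters before it is added to the running total.
cumulative_iters = 4
accumulated = 0.0
for i in range(cumulative_iters):
    chunk = slice(2 * i, 2 * i + 2)
    accumulated += mean_sq_grad(w, xs[chunk], ys[chunk]) / cumulative_iters

print(full_grad, accumulated)
```

The two gradients agree up to floating-point error. In a real model the match is only approximate, because layers that depend on batch statistics, such as batch normalization, see the smaller micro-batches.
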
class mmcv.runner.InvLrUpdaterHook(gamma, power=1.0, **kwargs)[source]
class mmcv.runner.IterBasedRunner(model, batch_processor=None, optimizer=None, work_dir=None, logger=None, meta=None, max_iters=None, max_epochs=None)[source]

Iteration-based Runner.

This runner trains models iteration by iteration.

register_training_hooks(lr_config, optimizer_config=None, checkpoint_config=None, log_config=None, momentum_config=None, custom_hooks_config=None)[source]

Register default hooks for iter-based training.

Checkpoint hook, optimizer stepper hook and logger hooks will be set to by_epoch=False by default.

Default hooks include:

Hooks                   Priority
LrUpdaterHook           VERY_HIGH (10)
MomentumUpdaterHook     HIGH (30)
OptimizerStepperHook    ABOVE_NORMAL (40)
CheckpointSaverHook     NORMAL (50)
IterTimerHook           LOW (70)
LoggerHook(s)           VERY_LOW (90)
CustomHook(s)           defaults to NORMAL (50)

If custom hooks have the same priority as default hooks, custom hooks will be triggered after default hooks.
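
The ordering rule can be sketched with a small priority-ordered insertion. The priority values are the ones listed above; `register_hook` here is a simplified stand-in written for this example, not the runner's method.

```python
PRIORITY = {'VERY_HIGH': 10, 'HIGH': 30, 'ABOVE_NORMAL': 40,
            'NORMAL': 50, 'LOW': 70, 'VERY_LOW': 90}


def register_hook(hooks, name, priority='NORMAL'):
    """Insert (name, value) so that lower priority values run first."""
    value = PRIORITY[priority]
    # Walk from the end: the new hook goes right after the last hook whose
    # priority value is less than or equal to its own, so hooks registered
    # later with an equal priority are triggered later.
    for i in range(len(hooks) - 1, -1, -1):
        if value >= hooks[i][1]:
            hooks.insert(i + 1, (name, value))
            return
    hooks.insert(0, (name, value))


hooks = []
for name, prio in [('LrUpdaterHook', 'VERY_HIGH'),
                   ('CheckpointSaverHook', 'NORMAL'),
                   ('LoggerHook', 'VERY_LOW'),
                   ('MyCustomHook', 'NORMAL')]:
    register_hook(hooks, name, prio)

print([name for name, _ in hooks])
```

The hypothetical MyCustomHook ends up directly after CheckpointSaverHook, since both use NORMAL and the custom hook was registered later.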

resume(checkpoint, resume_optimizer=True, map_location='default')[source]

Resume model from checkpoint.

Parameters
  • checkpoint (str) – Checkpoint to resume from.

  • resume_optimizer (bool, optional) – Whether to resume the optimizer(s) if the checkpoint file includes optimizer(s). Defaults to True.

  • map_location (str, optional) – Same as torch.load(). Defaults to ‘default’.

run(data_loaders, workflow, max_iters=None, **kwargs)[source]

Start running.

Parameters
  • data_loaders (list[DataLoader]) – Dataloaders for training and validation.

  • workflow (list[tuple]) – A list of (phase, iters) to specify the running order and iterations. E.g., [(‘train’, 10000), (‘val’, 1000)] means running 10000 iterations for training and 1000 iterations for validation, iteratively.

save_checkpoint(out_dir, filename_tmpl='iter_{}.pth', meta=None, save_optimizer=True, create_symlink=True)[source]

Save checkpoint to file.

Parameters
  • out_dir (str) – Directory to save checkpoint files.

  • filename_tmpl (str, optional) – Checkpoint file template. Defaults to ‘iter_{}.pth’.

  • meta (dict, optional) – Metadata to be saved in checkpoint. Defaults to None.

  • save_optimizer (bool, optional) – Whether to save the optimizer. Defaults to True.

  • create_symlink (bool, optional) – Whether to create a symlink to the latest checkpoint file. Defaults to True.

class mmcv.runner.LoggerHook(interval=10, ignore_last=True, reset_flag=False, by_epoch=True)[source]

Base class for logger hooks.

Parameters
  • interval (int) – Logging interval (every k iterations). Default 10.

  • ignore_last (bool) – Ignore the log of last iterations in each epoch if less than interval. Default True.

  • reset_flag (bool) – Whether to clear the output buffer after logging. Default False.

  • by_epoch (bool) – Whether EpochBasedRunner is used. Default True.

get_iter(runner, inner_iter=False)[source]

Get the current training iteration step.

static is_scalar(val, include_np=True, include_torch=True)[source]

Tell whether the input variable is a scalar.

Parameters
  • val – Input variable.

  • include_np (bool) – Whether to include 0-d np.ndarray as a scalar.

  • include_torch (bool) – Whether to include 0-d torch.Tensor as a scalar.

Returns

True or False.

Return type

bool

class mmcv.runner.LossScaler(init_scale=4294967296, mode='dynamic', scale_factor=2.0, scale_window=1000)[source]

Class that manages loss scaling in mixed precision training and supports both dynamic and static modes.

The implementation refers to https://github.com/NVIDIA/apex/blob/master/apex/fp16_utils/loss_scaler.py. Dynamic loss scaling is selected by supplying mode='dynamic'. It’s important to understand how LossScaler operates. Loss scaling is designed to combat the problem of underflowing gradients encountered at long times when training fp16 networks. Dynamic loss scaling begins by attempting a very high loss scale. Ironically, this may result in overflowing gradients. If overflowing gradients are encountered, FP16_Optimizer skips the update step for this particular iteration/minibatch, and LossScaler adjusts the loss scale to a lower value. If a certain number of iterations occur without overflowing gradients being detected, LossScaler increases the loss scale once more. In this way LossScaler attempts to “ride the edge” of always using the highest loss scale possible without incurring overflow.

Parameters
  • init_scale (float) – Initial loss scale value, default: 2**32.

  • scale_factor (float) – Factor used when adjusting the loss scale. Default: 2.

  • mode (str) – Loss scaling mode. ‘dynamic’ or ‘static’

  • scale_window (int) – Number of consecutive iterations without an overflow to wait before increasing the loss scale. Default: 1000.
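
The dynamic behaviour described above can be sketched with a toy class. `ToyLossScaler` is a hypothetical stand-in written for illustration; the counter-based bookkeeping and the lower bound of 1 on the scale are assumptions, not a copy of LossScaler.

```python
class ToyLossScaler:
    """Simplified dynamic/static loss scaler for illustration only."""

    def __init__(self, init_scale=2**32, mode='dynamic',
                 scale_factor=2.0, scale_window=1000):
        self.cur_scale = init_scale
        self.mode = mode
        self.scale_factor = scale_factor
        self.scale_window = scale_window
        self.clean_iters = 0  # iterations since the last overflow

    def update_scale(self, overflow):
        if self.mode != 'dynamic':
            return  # static mode keeps the initial scale
        if overflow:
            # Back off and restart the overflow-free counter.
            self.cur_scale = max(self.cur_scale / self.scale_factor, 1)
            self.clean_iters = 0
        else:
            self.clean_iters += 1
            if self.clean_iters % self.scale_window == 0:
                self.cur_scale *= self.scale_factor


scaler = ToyLossScaler(init_scale=1024, scale_window=3)
scaler.update_scale(overflow=True)       # overflow: 1024 -> 512
for _ in range(3):
    scaler.update_scale(overflow=False)  # 3 clean iterations: 512 -> 1024
print(scaler.cur_scale)
```

A small scale_window is used here only so that the growth step is visible after a few iterations.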

has_overflow(params)[source]

Check if params contain overflow.

load_state_dict(state_dict)[source]

Loads the loss_scaler state dict.

Parameters

state_dict (dict) – scaler state.

state_dict()[source]

Returns the state of the scaler as a dict.

update_scale(overflow)[source]

Update the current loss scale value when overflow happens.

class mmcv.runner.LrUpdaterHook(by_epoch=True, warmup=None, warmup_iters=0, warmup_ratio=0.1, warmup_by_epoch=False)[source]

LR Scheduler in MMCV.

Parameters
  • by_epoch (bool) – Whether the LR changes epoch by epoch.

  • warmup (str) – Type of warmup used. It can be None (no warmup), ‘constant’, ‘linear’ or ‘exp’.

  • warmup_iters (int) – The number of iterations or epochs that warmup lasts.

  • warmup_ratio (float) – The LR used at the beginning of warmup equals warmup_ratio * initial_lr.

  • warmup_by_epoch (bool) – When warmup_by_epoch == True, warmup_iters means the number of epochs that warmup lasts; otherwise it means the number of iterations that warmup lasts.
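
The three warmup types can be sketched as follows. The interpolation formulas are assumptions chosen to be consistent with the parameter descriptions (the lr starts at warmup_ratio * lr and reaches the regular lr when warmup ends); `warmup_lr` is a hypothetical helper.

```python
def warmup_lr(regular_lr, cur_iter, warmup, warmup_iters, warmup_ratio=0.1):
    """Return the lr used at ``cur_iter`` while warmup is active."""
    if warmup == 'constant':
        # Hold the lr at a fixed fraction of the regular lr.
        return regular_lr * warmup_ratio
    if warmup == 'linear':
        # Grow linearly from warmup_ratio * lr to lr.
        k = (1 - cur_iter / warmup_iters) * (1 - warmup_ratio)
        return regular_lr * (1 - k)
    if warmup == 'exp':
        # Grow geometrically from warmup_ratio * lr to lr.
        k = warmup_ratio ** (1 - cur_iter / warmup_iters)
        return regular_lr * k
    raise ValueError(f'unknown warmup type: {warmup}')


for kind in ('constant', 'linear', 'exp'):
    print(kind, [round(warmup_lr(0.1, it, kind, 100), 5) for it in (0, 50, 100)])
```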

class mmcv.runner.MlflowLoggerHook(exp_name=None, tags=None, log_model=True, interval=10, ignore_last=True, reset_flag=False, by_epoch=True)[source]

Class to log metrics and (optionally) a trained model to MLflow.

It requires MLflow to be installed.

Parameters
  • exp_name (str, optional) – Name of the experiment to be used. Default None. If not None, set the active experiment. If experiment does not exist, an experiment with provided name will be created.

  • tags (Dict[str], optional) – Tags for the current run. Default None. If not None, set tags for the current run.

  • log_model (bool, optional) – Whether to log an MLflow artifact. Default True. If True, log runner.model as an MLflow artifact for the current run.

  • interval (int) – Logging interval (every k iterations). Default: 10.

  • ignore_last (bool) – Ignore the log of last iterations in each epoch if less than interval. Default: True.

  • reset_flag (bool) – Whether to clear the output buffer after logging. Default: False.

  • by_epoch (bool) – Whether EpochBasedRunner is used. Default: True.

class mmcv.runner.ModuleDict(modules=None, init_cfg=None)[source]

ModuleDict in openmmlab.

Parameters
  • modules (dict, optional) – a mapping (dictionary) of (string: module) or an iterable of key-value pairs of type (string, module).

  • init_cfg (dict, optional) – Initialization config dict.

class mmcv.runner.ModuleList(modules=None, init_cfg=None)[source]

ModuleList in openmmlab.

Parameters
  • modules (iterable, optional) – an iterable of modules to add.

  • init_cfg (dict, optional) – Initialization config dict.

class mmcv.runner.NeptuneLoggerHook(init_kwargs=None, interval=10, ignore_last=True, reset_flag=True, with_step=True, by_epoch=True)[source]

Class to log metrics to NeptuneAI.

It requires Neptune to be installed.

Parameters
  • init_kwargs (dict) –

    A dict containing the initialization keys below:

    • project (str): Name of a project in a form of namespace/project_name. If None, the value of NEPTUNE_PROJECT environment variable will be taken.

    • api_token (str): User’s API token. If None, the value of NEPTUNE_API_TOKEN environment variable will be taken. Note: It is strongly recommended to use NEPTUNE_API_TOKEN environment variable rather than placing your API token in plain text in your source code.

    • name (str, optional, default is ‘Untitled’): Editable name of the run. Name is displayed in the run’s Details and in Runs table as a column.

    Check https://docs.neptune.ai/api-reference/neptune#init for more init arguments.

  • interval (int) – Logging interval (every k iterations). Default: 10.

  • ignore_last (bool) – Ignore the log of last iterations in each epoch if less than interval. Default: True.

  • reset_flag (bool) – Whether to clear the output buffer after logging. Default: True.

  • with_step (bool) – If True, the step will be logged from self.get_iter. Otherwise, step will not be logged. Default: True.

  • by_epoch (bool) – Whether EpochBasedRunner is used. Default: True.

class mmcv.runner.OneCycleLrUpdaterHook(max_lr, total_steps=None, pct_start=0.3, anneal_strategy='cos', div_factor=25, final_div_factor=10000.0, three_phase=False, **kwargs)[source]

One Cycle LR Scheduler.

The 1cycle learning rate policy changes the learning rate after every batch. The one cycle learning rate policy is described in https://arxiv.org/pdf/1708.07120.pdf

Parameters
  • max_lr (float or list) – Upper learning rate boundaries in the cycle for each parameter group.

  • total_steps (int, optional) – The total number of steps in the cycle. Note that if a value is not provided here, it will be the max_iter of runner. Default: None.

  • pct_start (float) – The percentage of the cycle (in number of steps) spent increasing the learning rate. Default: 0.3

  • anneal_strategy (str) – {‘cos’, ‘linear’} Specifies the annealing strategy: ‘cos’ for cosine annealing, ‘linear’ for linear annealing. Default: ‘cos’

  • div_factor (float) – Determines the initial learning rate via initial_lr = max_lr/div_factor Default: 25

  • final_div_factor (float) – Determines the minimum learning rate via min_lr = initial_lr/final_div_factor Default: 1e4

  • three_phase (bool) – If three_phase is True, use a third phase of the schedule to annihilate the learning rate according to final_div_factor instead of modifying the second phase (the first two phases will be symmetrical about the step indicated by pct_start). Default: False
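The way the parameters above interact can be sketched in plain Python. This is a minimal illustration of the default two-phase schedule only (three_phase=False), not the hook's actual implementation. The function names and the phase boundaries (the warm-up ends at step pct_start * total_steps - 1) are assumptions made for this example.

```python
import math


def annealing_cos(start, end, pct):
    """Cosine-anneal from start to end as pct goes from 0.0 to 1.0."""
    return end + (start - end) / 2.0 * (math.cos(math.pi * pct) + 1)


def annealing_linear(start, end, pct):
    """Linearly anneal from start to end as pct goes from 0.0 to 1.0."""
    return start + (end - start) * pct


def one_cycle_lr(step, total_steps, max_lr, pct_start=0.3,
                 anneal_strategy='cos', div_factor=25.0,
                 final_div_factor=1e4):
    """Learning rate at `step` for a two-phase one-cycle schedule (sketch)."""
    initial_lr = max_lr / div_factor          # starting learning rate
    min_lr = initial_lr / final_div_factor    # learning rate at the very end
    anneal = annealing_cos if anneal_strategy == 'cos' else annealing_linear
    up_end = pct_start * total_steps - 1      # assumed last warm-up step
    down_end = total_steps - 1                # last step of the cycle
    if step <= up_end:
        # Phase 1: increase from initial_lr to max_lr.
        return anneal(initial_lr, max_lr, step / up_end)
    # Phase 2: decrease from max_lr to min_lr.
    return anneal(max_lr, min_lr, (step - up_end) / (down_end - up_end))
```

With max_lr=0.1 and total_steps=100, this sketch starts at 0.1 / 25, peaks at 0.1 around step 29, and ends at 0.1 / 25 / 1e4.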

class mmcv.runner.OneCycleMomentumUpdaterHook(base_momentum=0.85, max_momentum=0.95, pct_start=0.3, anneal_strategy='cos', three_phase=False, **kwargs)[source]

OneCycle momentum Scheduler.

This momentum scheduler is usually used together with OneCycleLrUpdaterHook to improve performance.

Parameters
  • base_momentum (float or list) – Lower momentum boundaries in the cycle for each parameter group. Note that momentum is cycled inversely to learning rate; at the peak of a cycle, momentum is ‘base_momentum’ and learning rate is ‘max_lr’. Default: 0.85

  • max_momentum (float or list) – Upper momentum boundaries in the cycle for each parameter group. Functionally, it defines the cycle amplitude (max_momentum - base_momentum). Note that momentum is cycled inversely to learning rate; at the start of a cycle, momentum is ‘max_momentum’ and learning rate is ‘base_lr’ Default: 0.95

  • pct_start (float) – The percentage of the cycle (in number of steps) spent increasing the learning rate. Default: 0.3

  • anneal_strategy (str) – {‘cos’, ‘linear’} Specifies the annealing strategy: ‘cos’ for cosine annealing, ‘linear’ for linear annealing. Default: ‘cos’

  • three_phase (bool) – If three_phase is True, use a third phase of the schedule to annihilate the learning rate according to final_div_factor instead of modifying the second phase (the first two phases will be symmetrical about the step indicated by pct_start). Default: False
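The inverse relationship between momentum and learning rate can be illustrated with a short sketch. It assumes cosine annealing, the default two-phase schedule, and a warm-up that ends at step pct_start * total_steps - 1; the function name is hypothetical and the code is not the hook's implementation.

```python
import math


def one_cycle_momentum(step, total_steps, base_momentum=0.85,
                       max_momentum=0.95, pct_start=0.3):
    """Momentum at `step`: high -> low -> high (two-phase cosine sketch)."""

    def cos_anneal(start, end, pct):
        return end + (start - end) / 2.0 * (math.cos(math.pi * pct) + 1)

    up_end = pct_start * total_steps - 1   # assumed end of the LR warm-up
    down_end = total_steps - 1
    if step <= up_end:
        # While the learning rate rises, momentum falls to base_momentum.
        return cos_anneal(max_momentum, base_momentum, step / up_end)
    # While the learning rate decays, momentum returns to max_momentum.
    return cos_anneal(base_momentum, max_momentum,
                      (step - up_end) / (down_end - up_end))
```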

class mmcv.runner.OptimizerHook(grad_clip=None, detect_anomalous_params=False)[source]

A hook that contains custom operations for the optimizer.

Parameters
  • grad_clip (dict, optional) – A config dict to control the clip_grad. Default: None.

  • detect_anomalous_params (bool) –

    This option is only used for debugging and will slow down training. It detects anomalous parameters that are not included in the computational graph with loss as the root. There are two cases:

    • Parameters were not used during forward pass.

    • Parameters were not used to produce loss.

    Default: False.

class mmcv.runner.PaviLoggerHook(init_kwargs=None, add_graph=False, add_last_ckpt=False, interval=10, ignore_last=True, reset_flag=False, by_epoch=True, img_key='img_info')[source]

Class to visualize the model and log metrics (for internal use).

Parameters
  • init_kwargs (dict) – A dict that contains the initialization keys.

  • add_graph (bool) – Whether to visualize the model. Default: False.

  • add_last_ckpt (bool) – Whether to save checkpoint after run. Default: False.

  • interval (int) – Logging interval (every k iterations). Default: 10.

  • ignore_last (bool) – Ignore the log of last iterations in each epoch if less than interval. Default: True.

  • reset_flag (bool) – Whether to clear the output buffer after logging. Default: False.

  • by_epoch (bool) – Whether EpochBasedRunner is used. Default: True.

  • img_key (string) – Get image data from Dataset. Default: ‘img_info’.

get_step(runner)[source]

Get the total training step/epoch.

class mmcv.runner.PolyLrUpdaterHook(power=1.0, min_lr=0.0, **kwargs)[source]

class mmcv.runner.Priority(value)[source]

Hook priority levels.

Level           Value

HIGHEST         0
VERY_HIGH       10
HIGH            30
ABOVE_NORMAL    40
NORMAL          50
BELOW_NORMAL    60
LOW             70
VERY_LOW        90
LOWEST          100

class mmcv.runner.Runner(*args, **kwargs)[source]

Deprecated name of EpochBasedRunner.

class mmcv.runner.Sequential(*args, init_cfg=None)[source]

Sequential module in openmmlab.

Parameters

init_cfg (dict, optional) – Initialization config dict.

class mmcv.runner.StepLrUpdaterHook(step, gamma=0.1, min_lr=None, **kwargs)[source]

Step LR scheduler with min_lr clipping.

Parameters
  • step (int | list[int]) – Step to decay the LR. If an int value is given, regard it as the decay interval. If a list is given, decay LR at these steps.

  • gamma (float, optional) – Decay LR ratio. Default: 0.1.

  • min_lr (float, optional) – Minimum LR value to keep. If LR after decay is lower than min_lr, it will be clipped to this value. If None is given, we don’t perform lr clipping. Default: None.
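The decay rule described by step, gamma and min_lr can be sketched as follows. The function name and the `progress` argument (the current epoch or iteration index) are assumptions for illustration; this is not the hook's code.

```python
def step_lr(base_lr, progress, step, gamma=0.1, min_lr=None):
    """Step-decayed learning rate with optional min_lr clipping (sketch)."""
    if isinstance(step, int):
        # An int is a decay interval: decay once every `step` units.
        num_decays = progress // step
    else:
        # A list holds milestones: decay once per milestone already reached.
        num_decays = sum(1 for milestone in step if progress >= milestone)
    lr = base_lr * gamma ** num_decays
    if min_lr is not None:
        lr = max(lr, min_lr)  # clip to the minimum value
    return lr
```

For example, with base_lr=0.1 and step=[30, 60, 90], the value at progress 95 is 0.1 * 0.1 ** 3, unless min_lr is larger, in which case min_lr is returned.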

class mmcv.runner.StepMomentumUpdaterHook(step, gamma=0.5, min_momentum=None, **kwargs)[source]

Step momentum scheduler with min value clipping.

Parameters
  • step (int | list[int]) – Step to decay the momentum. If an int value is given, regard it as the decay interval. If a list is given, decay momentum at these steps.

  • gamma (float, optional) – Decay momentum ratio. Default: 0.5.

  • min_momentum (float, optional) – Minimum momentum value to keep. If momentum after decay is lower than this value, it will be clipped accordingly. If None is given, we don’t perform momentum clipping. Default: None.

class mmcv.runner.SyncBuffersHook(distributed=True)[source]

Synchronize model buffers such as running_mean and running_var in BN at the end of each epoch.

Parameters

distributed (bool) – Whether distributed training is used. It is effective only for distributed training. Defaults to True.

after_epoch(runner)[source]

All-reduce model buffers at the end of each epoch.

class mmcv.runner.TensorboardLoggerHook(log_dir=None, interval=10, ignore_last=True, reset_flag=False, by_epoch=True)[source]

Class to log metrics to Tensorboard.

Parameters
  • log_dir (string) – Save directory location. Default: None. If default values are used, directory location is runner.work_dir/tf_logs.

  • interval (int) – Logging interval (every k iterations). Default: 10.

  • ignore_last (bool) – Ignore the log of last iterations in each epoch if less than interval. Default: True.

  • reset_flag (bool) – Whether to clear the output buffer after logging. Default: False.

  • by_epoch (bool) – Whether EpochBasedRunner is used. Default: True.

class mmcv.runner.TextLoggerHook(by_epoch=True, interval=10, ignore_last=True, reset_flag=False, interval_exp_name=1000, out_dir=None, out_suffix=('.log.json', '.log', '.py'), keep_local=True, file_client_args=None)[source]

Logger hook in text.

In this logger hook, the information will be printed on terminal and saved in json file.

Parameters
  • by_epoch (bool, optional) – Whether EpochBasedRunner is used. Default: True.

  • interval (int, optional) – Logging interval (every k iterations). Default: 10.

  • ignore_last (bool, optional) – Ignore the log of last iterations in each epoch if less than interval. Default: True.

  • reset_flag (bool, optional) – Whether to clear the output buffer after logging. Default: False.

  • interval_exp_name (int, optional) – Logging interval for experiment name. This feature is to help users conveniently get the experiment information from screen or log file. Default: 1000.

  • out_dir (str, optional) – Logs are saved in runner.work_dir by default. If out_dir is specified, logs will be copied to a new directory which is the concatenation of out_dir and the last level directory of runner.work_dir. Default: None. New in version 1.3.16.

  • out_suffix (str or tuple[str], optional) – Those filenames ending with out_suffix will be copied to out_dir. Default: (‘.log.json’, ‘.log’, ‘.py’). New in version 1.3.16.

  • keep_local (bool, optional) – Whether to keep local log when out_dir is specified. If False, the local log will be removed. Default: True. New in version 1.3.16.

  • file_client_args (dict, optional) – Arguments to instantiate a FileClient. See mmcv.fileio.FileClient for details. Default: None. New in version 1.3.16.
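How out_dir and out_suffix combine can be sketched with the standard library. The helper names and the example paths in the usage note are hypothetical; the real hook performs the copy through a FileClient, which this sketch does not model.

```python
import posixpath


def resolve_out_dir(out_dir, work_dir):
    """Join out_dir with the last-level directory name of work_dir."""
    basename = posixpath.basename(work_dir.rstrip('/'))
    return posixpath.join(out_dir, basename)


def select_files(filenames, out_suffix=('.log.json', '.log', '.py')):
    """Keep the filenames that end with one of the given suffixes."""
    if isinstance(out_suffix, str):
        out_suffix = (out_suffix, )
    return [name for name in filenames if name.endswith(tuple(out_suffix))]
```

Under these assumptions, resolve_out_dir('s3://bucket/logs', './work_dirs/exp1/') gives 's3://bucket/logs/exp1', and a checkpoint file such as 'ckpt.pth' is filtered out by select_files.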

class mmcv.runner.WandbLoggerHook(init_kwargs=None, interval=10, ignore_last=True, reset_flag=False, commit=True, by_epoch=True, with_step=True, log_artifact=True, out_suffix=('.log.json', '.log', '.py'))[source]

Class to log metrics with wandb.

It requires wandb to be installed.

Parameters
  • init_kwargs (dict) – A dict that contains the initialization keys. Check https://docs.wandb.ai/ref/python/init for more init arguments.

  • interval (int) – Logging interval (every k iterations). Default: 10.

  • ignore_last (bool) – Ignore the log of last iterations in each epoch if less than interval. Default: True.

  • reset_flag (bool) – Whether to clear the output buffer after logging. Default: False.

  • commit (bool) – Save the metrics dict to the wandb server and increment the step. If false wandb.log just updates the current metrics dict with the row argument and metrics won’t be saved until wandb.log is called with commit=True. Default: True.

  • by_epoch (bool) – Whether EpochBasedRunner is used. Default: True.

  • with_step (bool) – If True, the step will be logged from self.get_iters. Otherwise, step will not be logged. Default: True.

  • log_artifact (bool) – If True, artifacts in {work_dir} will be uploaded to wandb after training ends. Default: True. New in version 1.4.3.

  • out_suffix (str or tuple[str], optional) – Those filenames ending with out_suffix will be uploaded to wandb. Default: (‘.log.json’, ‘.log’, ‘.py’). New in version 1.4.3.

mmcv.runner.allreduce_grads(params, coalesce=True, bucket_size_mb=-1)[source]

Allreduce gradients.

Parameters
  • params (list[torch.Parameters]) – List of parameters of a model

  • coalesce (bool, optional) – Whether allreduce parameters as a whole. Defaults to True.

  • bucket_size_mb (int, optional) – Size of bucket, the unit is MB. Defaults to -1.

mmcv.runner.allreduce_params(params, coalesce=True, bucket_size_mb=-1)[source]

Allreduce parameters.

Parameters
  • params (list[torch.Parameters]) – List of parameters or buffers of a model.

  • coalesce (bool, optional) – Whether allreduce parameters as a whole. Defaults to True.

  • bucket_size_mb (int, optional) – Size of bucket, the unit is MB. Defaults to -1.

mmcv.runner.auto_fp16(apply_to=None, out_fp32=False)[source]

Decorator to enable fp16 training automatically.

This decorator is useful when you write custom modules and want to support mixed precision training. If input arguments are fp32 tensors, they will be converted to fp16 automatically. Arguments other than fp32 tensors are ignored. If you are using PyTorch >= 1.6, torch.cuda.amp is used as the backend; otherwise, the original mmcv implementation will be adopted.

Parameters
  • apply_to (Iterable, optional) – The argument names to be converted. None indicates all arguments.

  • out_fp32 (bool) – Whether to convert the output back to fp32.

Example

>>> import torch.nn as nn
>>> from mmcv.runner import auto_fp16
>>> class MyModule1(nn.Module):
...
...     # Convert x and y to fp16
...     @auto_fp16()
...     def forward(self, x, y):
...         pass
>>> class MyModule2(nn.Module):
...
...     # Convert only pred to fp16
...     @auto_fp16(apply_to=('pred', ))
...     def do_something(self, pred, others):
...         pass
mmcv.runner.force_fp32(apply_to=None, out_fp16=False)[source]

Decorator to convert input arguments to fp32 in force.

This decorator is useful when you write custom modules and want to support mixed precision training. If there are some inputs that must be processed in fp32 mode, then this decorator can handle it. If input arguments are fp16 tensors, they will be converted to fp32 automatically. Arguments other than fp16 tensors are ignored. If you are using PyTorch >= 1.6, torch.cuda.amp is used as the backend; otherwise, the original mmcv implementation will be adopted.

Parameters
  • apply_to (Iterable, optional) – The argument names to be converted. None indicates all arguments.

  • out_fp16 (bool) – Whether to convert the output back to fp16.

Example

>>> import torch.nn as nn
>>> from mmcv.runner import force_fp32
>>> class MyModule1(nn.Module):
...
...     # Convert x and y to fp32
...     @force_fp32()
...     def loss(self, x, y):
...         pass
>>> class MyModule2(nn.Module):
...
...     # Convert only pred to fp32
...     @force_fp32(apply_to=('pred', ))
...     def post_process(self, pred, others):
...         pass
mmcv.runner.get_host_info()[source]

Get hostname and username.

Returns an empty string if an exception is raised, e.g. getpass.getuser() raises an error in a Docker container.

mmcv.runner.get_priority(priority)[source]

Get priority value.

Parameters

priority (int or str or Priority) – Priority.

Returns

The priority value.

Return type

int
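The conversion from the three accepted input types to an integer can be sketched as below. The enum mirrors the level table of Priority; the range check on integers and the case-insensitive name lookup are assumptions of this sketch rather than documented behavior.

```python
from enum import Enum


class Priority(Enum):
    """Local copy of the priority levels, for illustration only."""
    HIGHEST = 0
    VERY_HIGH = 10
    HIGH = 30
    ABOVE_NORMAL = 40
    NORMAL = 50
    BELOW_NORMAL = 60
    LOW = 70
    VERY_LOW = 90
    LOWEST = 100


def get_priority(priority):
    """Return the integer value for an int, a level name or a Priority."""
    if isinstance(priority, Priority):
        return priority.value
    if isinstance(priority, int):
        # Assumed valid range, taken from the smallest and largest levels.
        if not 0 <= priority <= 100:
            raise ValueError('priority must be between 0 and 100')
        return priority
    if isinstance(priority, str):
        # Assumed case-insensitive lookup by level name.
        return Priority[priority.upper()].value
    raise TypeError('priority must be an int, a str or a Priority')
```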

mmcv.runner.load_checkpoint(model, filename, map_location=None, strict=False, logger=None, revise_keys=[('^module\\.', '')])[source]

Load checkpoint from a file or URI.

Parameters
  • model (Module) – Module to load checkpoint.

  • filename (str) – Accept local filepath, URL, torchvision://xxx, open-mmlab://xxx. Please refer to docs/model_zoo.md for details.

  • map_location (str) – Same as torch.load().

  • strict (bool) – Whether to strictly enforce that the params of the checkpoint match those of the model. If False, different params for the model and checkpoint are allowed.

  • logger (logging.Logger or None) – The logger for error message.

  • revise_keys (list) – A list of customized keywords to modify the state_dict in checkpoint. Each item is a (pattern, replacement) pair of the regular expression operations. Default: strip the prefix ‘module.’ by [(r'^module\.', '')].

Returns

The loaded checkpoint.

Return type

dict or OrderedDict
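The effect of revise_keys can be illustrated with the standard re module. This is a sketch of the key rewriting only, under the assumption that each (pattern, replacement) pair is applied to every key with re.sub; the helper name and the toy state dict are hypothetical.

```python
import re
from collections import OrderedDict


def revise_state_dict_keys(state_dict, revise_keys=((r'^module\.', ''), )):
    """Apply each (pattern, replacement) pair to every key, in order."""
    revised = OrderedDict(state_dict)
    for pattern, replacement in revise_keys:
        revised = OrderedDict(
            (re.sub(pattern, replacement, key), value)
            for key, value in revised.items())
    return revised


# Toy state dict with placeholder values instead of tensors.
state_dict = OrderedDict([('module.conv.weight', 1), ('module.fc.bias', 2)])
revised = revise_state_dict_keys(state_dict)
```

With the default pair, the keys become 'conv.weight' and 'fc.bias'; the values and their order are unchanged.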

mmcv.runner.load_state_dict(module, state_dict, strict=False, logger=None)[source]

Load state_dict to a module.

This method is modified from torch.nn.Module.load_state_dict(). Default value for strict is set to False and the message for param mismatch will be shown even if strict is False.

Parameters
  • module (Module) – Module that receives the state_dict.

  • state_dict (OrderedDict) – Weights.

  • strict (bool) – whether to strictly enforce that the keys in state_dict match the keys returned by this module’s state_dict() function. Default: False.

  • logger (logging.Logger, optional) – Logger to log the error message. If not specified, print function will be used.

mmcv.runner.obj_from_dict(info, parent=None, default_args=None)[source]

Initialize an object from dict.

The dict must contain the key “type”, which indicates the object type, it can be either a string or type, such as “list” or list. Remaining fields are treated as the arguments for constructing the object.

Parameters
  • info (dict) – Object types and arguments.

  • parent (module) – Module that may contain the expected object classes.

  • default_args (dict, optional) – Default arguments for initializing the object.

Returns

Object built from the dict.

Return type

any type
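The construction pattern described above can be sketched as follows. The helper name build_from_dict is hypothetical, and the sketch is simplified: it only resolves string types through a given parent module, which may differ from how the real function behaves when parent is None.

```python
import fractions


def build_from_dict(info, parent=None, default_args=None):
    """Build an object from a dict that has a "type" key (sketch)."""
    args = dict(info)              # copy so the caller's dict is untouched
    obj_type = args.pop('type')
    if isinstance(obj_type, str):
        if parent is None:
            raise ValueError('a parent module is needed for a string type')
        obj_type = getattr(parent, obj_type)   # look the class up by name
    elif not isinstance(obj_type, type):
        raise TypeError('"type" must be a str or a class')
    # Defaults only fill in arguments that info does not set itself.
    for name, value in (default_args or {}).items():
        args.setdefault(name, value)
    return obj_type(**args)


half = build_from_dict(
    dict(type='Fraction', numerator=1),
    parent=fractions,
    default_args=dict(denominator=2))
```

Here the string 'Fraction' is resolved inside the fractions module, and the remaining fields plus the defaults are passed as keyword arguments.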

mmcv.runner.save_checkpoint(model, filename, optimizer=None, meta=None, file_client_args=None)[source]

Save checkpoint to file.

The checkpoint will have 3 fields: meta, state_dict and optimizer. By default meta will contain version and time info.

Parameters
  • model (Module) – Module whose params are to be saved.

  • filename (str) – Checkpoint filename.

  • optimizer (Optimizer, optional) – Optimizer to be saved.

  • meta (dict, optional) – Metadata to be saved in checkpoint.

  • file_client_args (dict, optional) – Arguments to instantiate a FileClient. See mmcv.fileio.FileClient for details. Default: None. New in version 1.3.16.

mmcv.runner.set_random_seed(seed, deterministic=False, use_rank_shift=False)[source]

Set random seed.

Parameters
  • seed (int) – Seed to be used.

  • deterministic (bool) – Whether to set the deterministic option for CUDNN backend, i.e., set torch.backends.cudnn.deterministic to True and torch.backends.cudnn.benchmark to False. Default: False.

  • use_rank_shift (bool) – Whether to add rank number to the random seed to have different random seed in different threads. Default: False.

mmcv.runner.weights_to_cpu(state_dict)[source]

Copy a model state_dict to cpu.

Parameters

state_dict (OrderedDict) – Model weights on GPU.

Returns

Model weights on CPU.

Return type

OrderedDict

mmcv.runner.wrap_fp16_model(model)[source]

Wrap the FP32 model to FP16.

If you are using PyTorch >= 1.6, torch.cuda.amp is used as the backend, otherwise, original mmcv implementation will be adopted.

For PyTorch >= 1.6, this function will:

  1. Set fp16 flag inside the model to True.

Otherwise:

  1. Convert FP32 model to FP16.

  2. Keep some necessary layers in FP32, e.g., normalization layers.

  3. Set fp16_enabled flag inside the model to True.

Parameters

model (nn.Module) – Model in FP32.

engine

mmcv.engine.collect_results_cpu(result_part, size, tmpdir=None)[source]

Collect results under cpu mode.

On cpu mode, this function will save the results on different gpus to tmpdir and collect them by the rank 0 worker.

Parameters
  • result_part (list) – Result list containing result parts to be collected.

  • size (int) – Size of the results, commonly equal to length of the results.

  • tmpdir (str | None) – Temporary directory in which the collected results are stored. If set to None, a random temporary directory will be created for it.

Returns

The collected results.

Return type

list
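The role of size when parts are merged can be sketched in plain Python. The sketch assumes that samples were distributed round-robin across workers and that shorter parts were padded to equal length, so the merged list is interleaved and then truncated to size; the helper name is hypothetical and file handling through tmpdir is left out.

```python
def merge_result_parts(part_list, size):
    """Interleave per-worker result lists and drop padded entries."""
    ordered = []
    for items in zip(*part_list):
        # items holds one result from each worker, in rank order.
        ordered.extend(items)
    # Padding added to equalize the parts is removed here.
    return ordered[:size]
```

For two workers holding [0, 2, 4] and [1, 3, 0] (the trailing 0 being padding) and size=5, the merged list is [0, 1, 2, 3, 4].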

mmcv.engine.collect_results_gpu(result_part, size)[source]

Collect results under gpu mode.

On gpu mode, this function will encode results to gpu tensors and use gpu communication for results collection.

Parameters
  • result_part (list) – Result list containing result parts to be collected.

  • size (int) – Size of the results, commonly equal to length of the results.

Returns

The collected results.

Return type

list

mmcv.engine.multi_gpu_test(model, data_loader, tmpdir=None, gpu_collect=False)[source]

Test model with multiple gpus.

This method tests the model with multiple gpus and collects the results under two different modes: gpu and cpu modes. By setting gpu_collect=True, it encodes results to gpu tensors and uses gpu communication for results collection. On cpu mode it saves the results on different gpus to tmpdir and collects them by the rank 0 worker.

Parameters
  • model (nn.Module) – Model to be tested.

  • data_loader (torch.utils.data.DataLoader) – PyTorch data loader.

  • tmpdir (str) – Path of directory to save the temporary results from different gpus under cpu mode.

  • gpu_collect (bool) – Option to use either gpu or cpu to collect results.

Returns

The prediction results.

Return type

list

mmcv.engine.single_gpu_test(model, data_loader)[source]

Test model with a single gpu.

This method tests model with a single gpu and displays test progress bar.

Parameters
  • model (nn.Module) – Model to be tested.

  • data_loader (torch.utils.data.DataLoader) – PyTorch data loader.

Returns

The prediction results.

Return type

list

ops

class mmcv.ops.BorderAlign(pool_size)[source]

Border align pooling layer.

Applies border_align over the input feature based on predicted bboxes. The details were described in the paper BorderDet: Border Feature for Dense Object Detection.

For each border line (e.g. top, left, bottom or right) of each box, border_align does the following:

  1. uniformly samples pool_size + 1 positions on this line, including the start and end points.

  2. the corresponding features on these points are computed by bilinear interpolation.

  3. max pooling over all the pool_size + 1 positions is used for computing the pooled feature.
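The three steps above can be sketched for a single border line on a single-channel feature map stored as a nested list. This illustrates the sampling, interpolation and pooling only; the helper names are hypothetical, and the real op works on batched tensors with a separate channel group per border.

```python
def bilinear(feature, x, y):
    """Bilinearly interpolate feature[row][col] at the point (x, y)."""
    h, w = len(feature), len(feature[0])
    x = min(max(x, 0.0), w - 1.0)   # clamp to the feature map
    y = min(max(y, 0.0), h - 1.0)
    x0, y0 = int(x), int(y)
    x1, y1 = min(x0 + 1, w - 1), min(y0 + 1, h - 1)
    dx, dy = x - x0, y - y0
    top = feature[y0][x0] * (1 - dx) + feature[y0][x1] * dx
    bottom = feature[y1][x0] * (1 - dx) + feature[y1][x1] * dx
    return top * (1 - dy) + bottom * dy


def pool_border(feature, start, end, pool_size):
    """Max over pool_size + 1 uniformly spaced points from start to end."""
    (sx, sy), (ex, ey) = start, end
    samples = []
    for i in range(pool_size + 1):
        t = i / pool_size           # 0.0 at the start point, 1.0 at the end
        samples.append(
            bilinear(feature, sx + t * (ex - sx), sy + t * (ey - sy)))
    return max(samples)


feature = [[0.0, 1.0, 2.0],
           [3.0, 4.0, 5.0],
           [6.0, 7.0, 8.0]]
x1, y1, x2, y2 = 0.0, 0.0, 2.0, 2.0          # box in (x1, y1, x2, y2) format
top = pool_border(feature, (x1, y1), (x2, y1), pool_size=4)
left = pool_border(feature, (x1, y1), (x1, y2), pool_size=4)
```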

Parameters

pool_size (int) – number of positions sampled over the boxes’ borders (e.g. top, bottom, left, right).

forward(input, boxes)[source]
Parameters
  • input – Features with shape [N,4C,H,W]. Channels ranged in [0,C), [C,2C), [2C,3C), [3C,4C) represent the top, left, bottom, right features respectively.

  • boxes – Boxes with shape [N,H*W,4]. Coordinate format (x1,y1,x2,y2).

Returns

Pooled features with shape [N,C,H*W,4]. The order is (top,left,bottom,right) for the last dimension.

Return type

torch.Tensor

class mmcv.ops.CARAFE(kernel_size, group_size, scale_factor)[source]

CARAFE: Content-Aware ReAssembly of FEatures

Please refer to CARAFE: Content-Aware ReAssembly of FEatures for more details.

Parameters
  • kernel_size (int) – reassemble kernel size

  • group_size (int) – reassemble group size

  • scale_factor (int) – upsample ratio

Returns

upsampled feature map

forward(features, masks)[source]

Defines the computation performed at every call.

Should be overridden by all subclasses.

Note

Although the recipe for forward pass needs to be defined within this function, one should call the Module instance afterwards instead of this since the former takes care of running the registered hooks while the latter silently ignores them.

class mmcv.ops.CARAFENaive(kernel_size, group_size, scale_factor)[source]
forward(features, masks)[source]

Defines the computation performed at every call.

Should be overridden by all subclasses.

Note

Although the recipe for forward pass needs to be defined within this function, one should call the Module instance afterwards instead of this since the former takes care of running the registered hooks while the latter silently ignores them.

class mmcv.ops.CARAFEPack(channels, scale_factor, up_kernel=5, up_group=1, encoder_kernel=3, encoder_dilation=1, compressed_channels=64)[source]

A unified package of CARAFE upsampler that contains: 1) channel compressor 2) content encoder 3) CARAFE op.

Official implementation of ICCV 2019 paper CARAFE: Content-Aware ReAssembly of FEatures.

Parameters
  • channels (int) – input feature channels

  • scale_factor (int) – upsample ratio

  • up_kernel (int) – kernel size of CARAFE op

  • up_group (int) – group size of CARAFE op

  • encoder_kernel (int) – kernel size of content encoder

  • encoder_dilation (int) – dilation of content encoder

  • compressed_channels (int) – output channels of the channel compressor

Returns

upsampled feature map

forward(x)[source]

Defines the computation performed at every call.

Should be overridden by all subclasses.

Note

Although the recipe for forward pass needs to be defined within this function, one should call the Module instance afterwards instead of this since the former takes care of running the registered hooks while the latter silently ignores them.

mmcv.ops.Conv2d

alias of mmcv.ops.deprecated_wrappers.Conv2d_deprecated

mmcv.ops.ConvTranspose2d

alias of mmcv.ops.deprecated_wrappers.ConvTranspose2d_deprecated

class mmcv.ops.CornerPool(mode)[source]

Corner Pooling.

Corner Pooling is a new type of pooling layer that helps a convolutional network better localize corners of bounding boxes.

Please refer to CornerNet: Detecting Objects as Paired Keypoints for more details.

Code is modified from https://github.com/princeton-vl/CornerNet-Lite.

Parameters

mode (str) –

Pooling orientation for the pooling layer

  • ’bottom’: Bottom Pooling

  • ’left’: Left Pooling

  • ’right’: Right Pooling

  • ’top’: Top Pooling

Returns

Feature map after pooling.

forward(x)[source]

Defines the computation performed at every call.

Should be overridden by all subclasses.

Note

Although the recipe for forward pass needs to be defined within this function, one should call the Module instance afterwards instead of this since the former takes care of running the registered hooks while the latter silently ignores them.

class mmcv.ops.Correlation(kernel_size: int = 1, max_displacement: int = 1, stride: int = 1, padding: int = 0, dilation: int = 1, dilation_patch: int = 1)[source]

Correlation operator

This correlation operator works for optical flow correlation computation.

There are two batched tensors with shape \((N, C, H, W)\), and the correlation output’s shape is \((N, max\_displacement \times 2 + 1, max\_displacement \times 2 + 1, H_{out}, W_{out})\)

where

\[H_{out} = \left\lfloor\frac{H_{in} + 2 \times padding - dilation \times (kernel\_size - 1) - 1} {stride} + 1\right\rfloor\]
\[W_{out} = \left\lfloor\frac{W_{in} + 2 \times padding - dilation \times (kernel\_size - 1) - 1} {stride} + 1\right\rfloor\]

the correlation item \((N_i, dy, dx)\) is formed by taking the sliding window convolution between input1 and shifted input2,

\[Corr(N_i, dy, dx) = \sum_{c=0}^{C-1} input1(N_i, c) \star \mathcal{S}(input2(N_i, c), dy, dx)\]

where \(\star\) is the valid 2d sliding window convolution operator, \(\mathcal{S}\) means shifting the input features (with zeros filled in at the margins), and \(dx, dy\) are the shifting distances, \(dx, dy \in [-max\_displacement \times dilation\_patch, max\_displacement \times dilation\_patch]\).
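The output shape given by the formulas above can be computed with integer arithmetic. This is a small helper written for illustration (its name is hypothetical); it evaluates the shape only and does not compute any correlation values.

```python
def correlation_output_shape(n, h_in, w_in, kernel_size=1,
                             max_displacement=1, stride=1, padding=0,
                             dilation=1):
    """Shape (N, 2d + 1, 2d + 1, H_out, W_out) of the correlation output."""

    def out_size(size_in):
        # floor((size_in + 2p - dilation * (k - 1) - 1) / stride + 1)
        numerator = size_in + 2 * padding - dilation * (kernel_size - 1) - 1
        return numerator // stride + 1

    patch = 2 * max_displacement + 1
    return (n, patch, patch, out_size(h_in), out_size(w_in))
```

With the default arguments the spatial size is preserved: an input of shape (2, C, 32, 48) gives an output of shape (2, 3, 3, 32, 48).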

Parameters
  • kernel_size (int) – The size of sliding window i.e. local neighborhood representing the center points and involved in correlation computation. Defaults to 1.

  • max_displacement (int) – The radius for computing correlation volume, but the actual working space can be dilated by dilation_patch. Defaults to 1.

  • stride (int) – The stride of the sliding blocks in the input spatial dimensions. Defaults to 1.

  • padding (int) – Zero padding added to all four sides of the input1. Defaults to 0.

  • dilation (int) – The spacing of the local neighborhood that will be involved in correlation. Defaults to 1.

  • dilation_patch (int) – The spacing between the positions at which correlation is computed. Defaults to 1.

forward(input1: torch.Tensor, input2: torch.Tensor) → torch.Tensor[source]

Defines the computation performed at every call.

Should be overridden by all subclasses.

Note

Although the recipe for forward pass needs to be defined within this function, one should call the Module instance afterwards instead of this since the former takes care of running the registered hooks while the latter silently ignores them.

class mmcv.ops.CrissCrossAttention(in_channels)[source]

Criss-Cross Attention Module.

Note

Before v1.3.13, we used a CUDA op. Since v1.3.13, we have switched to an equivalent pure PyTorch implementation. For more details, please refer to https://github.com/open-mmlab/mmcv/pull/1201.

Speed comparison for one forward pass

  • Input size: [2,512,97,97]

  • Device: 1 NVIDIA GeForce RTX 2080 Ti

                          PyTorch version   CUDA version   Relative speed

with torch.no_grad()      0.00554402 s      0.0299619 s    5.4x
no with torch.no_grad()   0.00562803 s      0.0301349 s    5.4x

Parameters

in_channels (int) – Channels of the input feature map.

forward(x)[source]

Forward function of Criss-Cross Attention.

Parameters

x (torch.Tensor) – Input feature with the shape of (batch_size, in_channels, height, width).

Returns

Output of the layer, with the shape of (batch_size, in_channels, height, width)

Return type

torch.Tensor

class mmcv.ops.DeformConv2d(in_channels: int, out_channels: int, kernel_size: Union[int, Tuple[int, ...]], stride: Union[int, Tuple[int, ...]] = 1, padding: Union[int, Tuple[int, ...]] = 0, dilation: Union[int, Tuple[int, ...]] = 1, groups: int = 1, deform_groups: int = 1, bias: bool = False, im2col_step: int = 32)[source]

Deformable 2D convolution.

Applies a deformable 2D convolution over an input signal composed of several input planes. DeformConv2d was described in the paper Deformable Convolutional Networks

Note

The argument im2col_step was added in version 1.3.17. It specifies the number of samples processed by the im2col_cuda_kernel per call, enables users to define batch_size and im2col_step more flexibly, and solves issue mmcv#1440.

Parameters
  • in_channels (int) – Number of channels in the input image.

  • out_channels (int) – Number of channels produced by the convolution.

  • kernel_size (int, tuple) – Size of the convolving kernel.

  • stride (int, tuple) – Stride of the convolution. Default: 1.

  • padding (int or tuple) – Zero-padding added to both sides of the input. Default: 0.

  • dilation (int or tuple) – Spacing between kernel elements. Default: 1.

  • groups (int) – Number of blocked connections from input channels to output channels. Default: 1.

  • deform_groups (int) – Number of deformable group partitions.

  • bias (bool) – If True, adds a learnable bias to the output. Default: False.

  • im2col_step (int) – Number of samples processed by im2col_cuda_kernel per call. It will work when batch_size > im2col_step, but batch_size must be divisible by im2col_step. Default: 32. New in version 1.3.17.

forward(x: torch.Tensor, offset: torch.Tensor) → torch.Tensor[source]

Deformable Convolutional forward function.

Parameters
  • x (Tensor) – Input feature, shape (B, C_in, H_in, W_in)

  • offset (Tensor) –

    Offset for deformable convolution, shape (B, deform_groups*kernel_size[0]*kernel_size[1]*2, H_out, W_out), H_out, W_out are equal to the output’s.

    An offset is like [y0, x0, y1, x1, y2, x2, …, y8, x8]. The spatial arrangement is like:

    (x0, y0) (x1, y1) (x2, y2)
    (x3, y3) (x4, y4) (x5, y5)
    (x6, y6) (x7, y7) (x8, y8)
    

Returns

Output of the layer.

Return type

Tensor
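The expected shape of the offset tensor can be worked out from the layer arguments. The helpers below are a sketch with hypothetical names; they assume that H_out and W_out follow the same output-size formula as a regular 2D convolution, and they use the [y0, x0, y1, x1, …] channel layout described above.

```python
def _pair(value):
    """Expand an int to a (height, width) pair."""
    return (value, value) if isinstance(value, int) else tuple(value)


def deform_offset_shape(batch, h_in, w_in, kernel_size, stride=1,
                        padding=0, dilation=1, deform_groups=1):
    """Shape (B, deform_groups * kh * kw * 2, H_out, W_out) of the offset."""
    kh, kw = _pair(kernel_size)
    sh, sw = _pair(stride)
    ph, pw = _pair(padding)
    dh, dw = _pair(dilation)
    # Assumed: same output-size formula as a standard convolution.
    h_out = (h_in + 2 * ph - dh * (kh - 1) - 1) // sh + 1
    w_out = (w_in + 2 * pw - dw * (kw - 1) - 1) // sw + 1
    return (batch, deform_groups * kh * kw * 2, h_out, w_out)


def offset_channels(k):
    """Channel indices (y, x) of the k-th kernel point in one deform group."""
    return 2 * k, 2 * k + 1
```

For a 3x3 kernel with padding 1 on a 32x32 input, the offset has 18 channels per deform group, and the center point (x4, y4) uses channels 8 (y) and 9 (x).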

class mmcv.ops.DeformConv2dPack(*args, **kwargs)[source]

A Deformable Conv Encapsulation that acts as normal Conv layers.

The offset tensor is like [y0, x0, y1, x1, y2, x2, …, y8, x8]. The spatial arrangement is like:

(x0, y0) (x1, y1) (x2, y2)
(x3, y3) (x4, y4) (x5, y5)
(x6, y6) (x7, y7) (x8, y8)
Parameters
  • in_channels (int) – Same as nn.Conv2d.

  • out_channels (int) – Same as nn.Conv2d.

  • kernel_size (int or tuple[int]) – Same as nn.Conv2d.

  • stride (int or tuple[int]) – Same as nn.Conv2d.

  • padding (int or tuple[int]) – Same as nn.Conv2d.

  • dilation (int or tuple[int]) – Same as nn.Conv2d.

  • groups (int) – Same as nn.Conv2d.

  • bias (bool or str) – If specified as auto, it will be decided by the norm_cfg. Bias will be set as True if norm_cfg is None, otherwise False.

forward(x)[source]

Deformable Convolutional forward function.

Parameters
  • x (Tensor) – Input feature, shape (B, C_in, H_in, W_in)

  • offset (Tensor) –

    Offset for deformable convolution, shape (B, deform_groups*kernel_size[0]*kernel_size[1]*2, H_out, W_out), H_out, W_out are equal to the output’s.

    An offset is like [y0, x0, y1, x1, y2, x2, …, y8, x8]. The spatial arrangement is like:

    (x0, y0) (x1, y1) (x2, y2)
    (x3, y3) (x4, y4) (x5, y5)
    (x6, y6) (x7, y7) (x8, y8)
    

Returns

Output of the layer.

Return type

Tensor

class mmcv.ops.DeformRoIPool(output_size, spatial_scale=1.0, sampling_ratio=0, gamma=0.1)[source]
forward(input, rois, offset=None)[source]

Defines the computation performed at every call.

Should be overridden by all subclasses.

Note

Although the recipe for forward pass needs to be defined within this function, one should call the Module instance afterwards instead of this since the former takes care of running the registered hooks while the latter silently ignores them.

class mmcv.ops.DeformRoIPoolPack(output_size, output_channels, deform_fc_channels=1024, spatial_scale=1.0, sampling_ratio=0, gamma=0.1)[source]
forward(input, rois)[source]

Defines the computation performed at every call.

Should be overridden by all subclasses.

Note

Although the recipe for forward pass needs to be defined within this function, one should call the Module instance afterwards instead of this since the former takes care of running the registered hooks while the latter silently ignores them.

class mmcv.ops.DynamicScatter(voxel_size, point_cloud_range, average_points: bool)[source]

Scatters points into voxels, used in the voxel encoder with dynamic voxelization.

Note

The CPU and GPU implementations produce the same output up to small numerical differences after summation and division (e.g., 5e-7).

Parameters
  • voxel_size (list) – Voxel size along the three dimensions, [x, y, z].

  • point_cloud_range (list) – The coordinate range of points, [x_min, y_min, z_min, x_max, y_max, z_max].

  • average_points (bool) – Whether to use average pooling when scattering points into voxels.

forward(points, coors)[source]

Scatters points/features into voxels.

Parameters
  • points (torch.Tensor) – Points to be reduced into voxels.

  • coors (torch.Tensor) – Corresponding voxel coordinates (specifically the multi-dim voxel index) of each point.

Returns

A tuple containing two elements. The first is the voxel features with shape [M, C], each reduced from the input features that share the same voxel coordinates. The second is the voxel coordinates with shape [M, ndim].

Return type

tuple[torch.Tensor]
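
The reduction described above can be sketched in plain Python. This is a minimal model on lists, not the mmcv kernel: the function name scatter_points is hypothetical, the use of a max reduction when average_points is False is an assumption, and the output order here (first occurrence of each voxel) may differ from the real operator.

```python
from collections import OrderedDict


def scatter_points(points, coors, average_points=True):
    """Reduce point features that share a voxel coordinate.

    points: list of feature lists, one per point (length C each).
    coors:  list of voxel-index tuples, one per point.
    Returns (voxel_features, voxel_coors), each with M entries.
    """
    groups = OrderedDict()
    for feat, coor in zip(points, coors):
        groups.setdefault(tuple(coor), []).append(feat)

    voxel_feats, voxel_coors = [], []
    for coor, feats in groups.items():
        columns = list(zip(*feats))  # one tuple per feature channel
        if average_points:
            reduced = [sum(col) / len(col) for col in columns]
        else:
            # Assumption: non-averaging mode takes the channel-wise maximum.
            reduced = [max(col) for col in columns]
        voxel_feats.append(reduced)
        voxel_coors.append(coor)
    return voxel_feats, voxel_coors


feats, voxels = scatter_points(
    [[1.0, 2.0], [3.0, 4.0], [5.0, 6.0]],
    [(0, 0, 0), (0, 0, 0), (1, 0, 0)],
)
print(feats)   # [[2.0, 3.0], [5.0, 6.0]]
print(voxels)  # [(0, 0, 0), (1, 0, 0)]
```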

forward_single(points, coors)[source]

Scatters points into voxels.

Parameters
  • points (torch.Tensor) – Points to be reduced into voxels.

  • coors (torch.Tensor) – Corresponding voxel coordinates (specifically the multi-dim voxel index) of each point.

Returns

A tuple containing two elements. The first is the voxel features with shape [M, C], each reduced from the input features that share the same voxel coordinates. The second is the voxel coordinates with shape [M, ndim].

Return type

tuple[torch.Tensor]

class mmcv.ops.FusedBiasLeakyReLU(num_channels, negative_slope=0.2, scale=1.4142135623730951)[source]

Fused bias leaky ReLU.

This function is introduced in StyleGAN2: Analyzing and Improving the Image Quality of StyleGAN.

The bias term comes from the convolution operation. In addition, to keep the variance of the feature map or gradients unchanged, the authors also adopt a scale similar to that of Kaiming initialization. However, since \(\alpha^2\) is small, the \(1+\alpha^2\) term is close to 1 and can be ignored. Therefore, the final scale is just \(\sqrt{2}\). You may of course substitute your own scale.

TODO: Implement the CPU version.

Parameters
  • num_channels (int) – The channel number of the feature map.

  • negative_slope (float, optional) – Same as nn.LeakyReLU. Defaults to 0.2.

  • scale (float, optional) – A scalar to adjust the variance of the feature map. Defaults to 2**0.5.
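
The description above can be made concrete with a small scalar sketch. It is not the fused CUDA kernel: the function names are hypothetical, and the order of operations (add bias, apply leaky ReLU, then multiply by scale) is the assumed reading of the text. The kaiming_gain helper shows why dropping the \(1+\alpha^2\) term changes the scale only slightly.

```python
import math


def fused_bias_leaky_relu(values, bias, negative_slope=0.2, scale=math.sqrt(2)):
    """Add a bias, apply leaky ReLU, then rescale, for a single channel."""
    out = []
    for v in values:
        z = v + bias
        z = z if z >= 0 else negative_slope * z
        out.append(scale * z)
    return out


def kaiming_gain(negative_slope):
    """Kaiming gain for leaky ReLU: sqrt(2 / (1 + alpha^2))."""
    return math.sqrt(2.0 / (1.0 + negative_slope ** 2))


print(kaiming_gain(0.2))   # about 1.387, close to sqrt(2) ~ 1.414
print(fused_bias_leaky_relu([1.0, -1.0], bias=0.0, scale=2.0))
```

For negative_slope=0.2 the exact gain differs from \(\sqrt{2}\) by roughly 2%, which is the approximation the text refers to.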

forward(input)[source]

Defines the computation performed at every call.

Should be overridden by all subclasses.

Note

Although the recipe for forward pass needs to be defined within this function, one should call the Module instance afterwards instead of this since the former takes care of running the registered hooks while the latter silently ignores them.

class mmcv.ops.GroupAll(use_xyz: bool = True)[source]

Group xyz with feature.

Parameters

use_xyz (bool) – Whether to use xyz.

forward(xyz: torch.Tensor, new_xyz: torch.Tensor, features: Optional[torch.Tensor] = None)[source]
Parameters
  • xyz (Tensor) – (B, N, 3) xyz coordinates of the features.

  • new_xyz (Tensor) – new xyz coordinates of the features.

  • features (Tensor) – (B, C, N) features to group.

Returns

(B, C + 3, 1, N) Grouped feature.

Return type

Tensor

mmcv.ops.Linear

alias of mmcv.ops.deprecated_wrappers.Linear_deprecated

class mmcv.ops.MaskedConv2d(in_channels, out_channels, kernel_size, stride=1, padding=0, dilation=1, groups=1, bias=True)[source]

A MaskedConv2d which inherits the official Conv2d.

The masked forward doesn’t implement the backward function and only supports the stride parameter to be 1 currently.

forward(input, mask=None)[source]

Defines the computation performed at every call.

Should be overridden by all subclasses.

Note

Although the recipe for forward pass needs to be defined within this function, one should call the Module instance afterwards instead of this since the former takes care of running the registered hooks while the latter silently ignores them.

mmcv.ops.MaxPool2d

alias of mmcv.ops.deprecated_wrappers.MaxPool2d_deprecated

class mmcv.ops.ModulatedDeformConv2d(in_channels, out_channels, kernel_size, stride=1, padding=0, dilation=1, groups=1, deform_groups=1, bias=True)[source]
forward(x, offset, mask)[source]

Defines the computation performed at every call.

Should be overridden by all subclasses.

Note

Although the recipe for forward pass needs to be defined within this function, one should call the Module instance afterwards instead of this since the former takes care of running the registered hooks while the latter silently ignores them.

class mmcv.ops.ModulatedDeformConv2dPack(*args, **kwargs)[source]

A modulated deformable conv encapsulation that acts as a normal conv layer.

Parameters
  • in_channels (int) – Same as nn.Conv2d.

  • out_channels (int) – Same as nn.Conv2d.

  • kernel_size (int or tuple[int]) – Same as nn.Conv2d.

  • stride (int) – Same as nn.Conv2d, while tuple is not supported.

  • padding (int) – Same as nn.Conv2d, while tuple is not supported.

  • dilation (int) – Same as nn.Conv2d, while tuple is not supported.

  • groups (int) – Same as nn.Conv2d.

  • bias (bool or str) – If specified as auto, it will be decided by the norm_cfg. Bias will be set as True if norm_cfg is None, otherwise False.

forward(x)[source]

Defines the computation performed at every call.

Should be overridden by all subclasses.

Note

Although the recipe for forward pass needs to be defined within this function, one should call the Module instance afterwards instead of this since the former takes care of running the registered hooks while the latter silently ignores them.

class mmcv.ops.ModulatedDeformRoIPoolPack(output_size, output_channels, deform_fc_channels=1024, spatial_scale=1.0, sampling_ratio=0, gamma=0.1)[source]
forward(input, rois)[source]

Defines the computation performed at every call.

Should be overridden by all subclasses.

Note

Although the recipe for forward pass needs to be defined within this function, one should call the Module instance afterwards instead of this since the former takes care of running the registered hooks while the latter silently ignores them.

class mmcv.ops.MultiScaleDeformableAttention(embed_dims=256, num_heads=8, num_levels=4, num_points=4, im2col_step=64, dropout=0.1, batch_first=False, norm_cfg=None, init_cfg=None)[source]

An attention module used in Deformable DETR.

See Deformable DETR: Deformable Transformers for End-to-End Object Detection.

Parameters
  • embed_dims (int) – The embedding dimension of Attention. Default: 256.

  • num_heads (int) – Parallel attention heads. Default: 8.

  • num_levels (int) – The number of feature map used in Attention. Default: 4.

  • num_points (int) – The number of sampling points for each query in each head. Default: 4.

  • im2col_step (int) – The step used in image_to_column. Default: 64.

  • dropout (float) – A Dropout layer on inp_identity. Default: 0.1.

  • batch_first (bool) – If True, Key, Query and Value have shape (batch, n, embed_dim); otherwise (n, batch, embed_dim). Defaults to False.

  • norm_cfg (dict) – Config dict for normalization layer. Default: None.

  • init_cfg (obj: mmcv.ConfigDict) – The Config for initialization. Default: None.

forward(query, key=None, value=None, identity=None, query_pos=None, key_padding_mask=None, reference_points=None, spatial_shapes=None, level_start_index=None, **kwargs)[source]

Forward Function of MultiScaleDeformAttention.

Parameters
  • query (torch.Tensor) – Query of Transformer with shape (num_query, bs, embed_dims).

  • key (torch.Tensor) – The key tensor with shape (num_key, bs, embed_dims).

  • value (torch.Tensor) – The value tensor with shape (num_key, bs, embed_dims).

  • identity (torch.Tensor) – The tensor used for addition, with the same shape as query. Default None. If None, query will be used.

  • query_pos (torch.Tensor) – The positional encoding for query. Default: None.

  • key_pos (torch.Tensor) – The positional encoding for key. Default None.

  • reference_points (torch.Tensor) – The normalized reference points with shape (bs, num_query, num_levels, 2). All elements are in the range [0, 1], with top-left at (0, 0) and bottom-right at (1, 1), including the padding area. Alternatively, shape (N, Length_{query}, num_levels, 4), where the two additional dimensions (w, h) form reference boxes.

  • key_padding_mask (torch.Tensor) – ByteTensor for query, with shape [bs, num_key].

  • spatial_shapes (torch.Tensor) – Spatial shape of features in different levels. With shape (num_levels, 2), last dimension represents (h, w).

  • level_start_index (torch.Tensor) – The start index of each level. A tensor has shape (num_levels, ) and can be represented as [0, h_0*w_0, h_0*w_0+h_1*w_1, …].

Returns

forwarded results with shape [num_query, bs, embed_dims].

Return type

torch.Tensor
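
To illustrate how level_start_index relates to spatial_shapes, the sketch below builds the running offsets [0, h_0*w_0, h_0*w_0+h_1*w_1, …] in plain Python. The helper name and the example shapes are made up for illustration; treating num_key as the sum of h*w over all levels is an assumption about how the flattened feature maps are concatenated.

```python
def level_start_index(spatial_shapes):
    """Start offset of each level in the flattened key/value sequence."""
    starts, total = [], 0
    for h, w in spatial_shapes:
        starts.append(total)
        total += h * w
    return starts, total


# Hypothetical four-level feature pyramid, given as (h, w) per level.
shapes = [(64, 64), (32, 32), (16, 16), (8, 8)]
starts, num_key = level_start_index(shapes)
print(starts)    # [0, 4096, 5120, 5376]
print(num_key)   # 5440
```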

init_weights()[source]

Default initialization for Parameters of Module.

class mmcv.ops.PSAMask(psa_type, mask_size=None)[source]
forward(input)[source]

Defines the computation performed at every call.

Should be overridden by all subclasses.

Note

Although the recipe for forward pass needs to be defined within this function, one should call the Module instance afterwards instead of this since the former takes care of running the registered hooks while the latter silently ignores them.

class mmcv.ops.PointsSampler(num_point: List[int], fps_mod_list: List[str] = ['D-FPS'], fps_sample_range_list: List[int] = [- 1])[source]

Points sampling.

Parameters
  • num_point (list[int]) – Number of sample points.

  • fps_mod_list (list[str], optional) – Type of FPS method, valid mod [‘F-FPS’, ‘D-FPS’, ‘FS’], Default: [‘D-FPS’]. F-FPS: using feature distances for FPS. D-FPS: using Euclidean distances of points for FPS. FS: using F-FPS and D-FPS simultaneously.

  • fps_sample_range_list (list[int], optional) – Range of points to apply FPS. Default: [-1].

forward(points_xyz, features)[source]
Parameters
  • points_xyz (torch.Tensor) – (B, N, 3) xyz coordinates of the points.

  • features (torch.Tensor) – (B, C, N) features of the points.

Returns

(B, npoint, sample_num) Indices of sampled points.

Return type

torch.Tensor
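
For readers unfamiliar with farthest point sampling, the D-FPS variant (Euclidean distance between points) can be sketched as a greedy loop. This is a single-batch, pure-Python illustration with a hypothetical function name; starting from index 0 is an assumption, and the real operator works on batched tensors.

```python
def farthest_point_sample(points, num_point, start=0):
    """Greedy farthest point sampling using squared Euclidean distance."""
    if not 0 < num_point <= len(points):
        raise ValueError("num_point must be in [1, len(points)]")
    selected = [start]
    # Distance from every point to its nearest already-selected point.
    min_dist = [float("inf")] * len(points)
    while len(selected) < num_point:
        last = points[selected[-1]]
        for i, p in enumerate(points):
            d = sum((a - b) ** 2 for a, b in zip(p, last))
            if d < min_dist[i]:
                min_dist[i] = d
        # Pick the point that is farthest from the selected set.
        selected.append(max(range(len(points)), key=min_dist.__getitem__))
    return selected


pts = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0), (10.0, 0.0, 0.0), (5.0, 0.0, 0.0)]
print(farthest_point_sample(pts, 3))   # [0, 2, 3]
```

F-FPS follows the same loop but measures distance in feature space instead of on the xyz coordinates.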

class mmcv.ops.QueryAndGroup(max_radius, sample_num, min_radius=0, use_xyz=True, return_grouped_xyz=False, normalize_xyz=False, uniform_sample=False, return_unique_cnt=False, return_grouped_idx=False)[source]

Groups points with a ball query of radius.

Parameters
  • max_radius (float) – The maximum radius of the balls. If None is given, we will use kNN sampling instead of ball query.

  • sample_num (int) – Maximum number of features to gather in the ball.

  • min_radius (float, optional) – The minimum radius of the balls. Default: 0.

  • use_xyz (bool, optional) – Whether to use xyz. Default: True.

  • return_grouped_xyz (bool, optional) – Whether to return grouped xyz. Default: False.

  • normalize_xyz (bool, optional) – Whether to normalize xyz. Default: False.

  • uniform_sample (bool, optional) – Whether to sample uniformly. Default: False

  • return_unique_cnt (bool, optional) – Whether to return the count of unique samples. Default: False.

  • return_grouped_idx (bool, optional) – Whether to return grouped idx. Default: False.

forward(points_xyz, center_xyz, features=None)[source]
Parameters
  • points_xyz (torch.Tensor) – (B, N, 3) xyz coordinates of the points.

  • center_xyz (torch.Tensor) – (B, npoint, 3) coordinates of the centroids.

  • features (torch.Tensor) – (B, C, N) The features of grouped points.

Returns

(B, 3 + C, npoint, sample_num) Grouped concatenated coordinates and features of points.

Return type

torch.Tensor

class mmcv.ops.RiRoIAlignRotated(out_size, spatial_scale, num_samples=0, num_orientations=8, clockwise=False)[source]

Rotation-invariant RoI align pooling layer for rotated proposals.

It accepts a feature map of shape (N, C, H, W) and rois with shape (n, 6) with each roi decoded as (batch_index, center_x, center_y, w, h, angle). The angle is in radians.

The details are described in the paper ReDet: A Rotation-equivariant Detector for Aerial Object Detection.

Parameters
  • out_size (tuple) – fixed dimensional RoI output with shape (h, w).

  • spatial_scale (float) – scale the input boxes by this number

  • num_samples (int) – number of input samples to take for each output sample. 0 to take samples densely for current models.

  • num_orientations (int) – number of oriented channels.

  • clockwise (bool) – If True, the angle in each proposal follows a clockwise fashion in image space, otherwise, the angle is counterclockwise. Default: False.

forward(features, rois)[source]

Defines the computation performed at every call.

Should be overridden by all subclasses.

Note

Although the recipe for forward pass needs to be defined within this function, one should call the Module instance afterwards instead of this since the former takes care of running the registered hooks while the latter silently ignores them.

class mmcv.ops.RoIAlign(output_size, spatial_scale=1.0, sampling_ratio=0, pool_mode='avg', aligned=True, use_torchvision=False)[source]

RoI align pooling layer.

Parameters
  • output_size (tuple) – h, w

  • spatial_scale (float) – scale the input boxes by this number

  • sampling_ratio (int) – number of input samples to take for each output sample. 0 to take samples densely for current models.

  • pool_mode (str, 'avg' or 'max') – pooling mode in each bin.

  • aligned (bool) – if False, use the legacy implementation in MMDetection. If True, align the results more accurately.

  • use_torchvision (bool) – whether to use roi_align from torchvision.

Note

The implementation of RoIAlign when aligned=True is modified from https://github.com/facebookresearch/detectron2/

The meaning of aligned=True:

Given a continuous coordinate c, its two neighboring pixel indices (in our pixel model) are computed by floor(c - 0.5) and ceil(c - 0.5). For example, c=1.3 has pixel neighbors with discrete indices [0] and [1] (which are sampled from the underlying signal at continuous coordinates 0.5 and 1.5). But the original roi_align (aligned=False) does not subtract the 0.5 when computing neighboring pixel indices and therefore it uses pixels with a slightly incorrect alignment (relative to our pixel model) when performing bilinear interpolation.

With aligned=True, we first appropriately scale the ROI and then shift it by -0.5 prior to calling roi_align. This produces the correct neighbors.

The difference has no effect on the model’s performance if RoIAlign is used together with conv layers.
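
The neighbor computation in the note can be checked with a few lines of Python. The helper names are hypothetical; the sketch only reproduces the floor/ceil arithmetic and the scale-then-shift step from the text, not the pooling itself.

```python
import math


def neighbor_pixels(c, aligned=True):
    """Discrete pixel indices on either side of continuous coordinate c."""
    shift = 0.5 if aligned else 0.0
    return math.floor(c - shift), math.ceil(c - shift)


def scaled_roi_coord(x, spatial_scale, aligned=True):
    """Scale an RoI coordinate, then shift by -0.5 when aligned=True."""
    return x * spatial_scale - (0.5 if aligned else 0.0)


print(neighbor_pixels(1.3, aligned=True))    # (0, 1)
print(neighbor_pixels(1.3, aligned=False))   # (1, 2)
print(scaled_roi_coord(16.0, 0.25))          # 3.5
```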

forward(input, rois)[source]
Parameters
  • input – NCHW images

  • rois – Bx5 boxes. First column is the index into N. The other 4 columns are xyxy.

class mmcv.ops.RoIAlignRotated(out_size, spatial_scale, sample_num=0, aligned=True, clockwise=False)[source]

RoI align pooling layer for rotated proposals.

It accepts a feature map of shape (N, C, H, W) and rois with shape (n, 6) with each roi decoded as (batch_index, center_x, center_y, w, h, angle). The angle is in radians.

Parameters
  • out_size (tuple) – h, w

  • spatial_scale (float) – scale the input boxes by this number

  • sample_num (int) – number of input samples to take for each output sample. 0 to take samples densely for current models.

  • aligned (bool) – if False, use the legacy implementation in MMDetection. If True, align the results more accurately. Default: True.

  • clockwise (bool) – If True, the angle in each proposal follows a clockwise fashion in image space, otherwise, the angle is counterclockwise. Default: False.

Note

The implementation of RoIAlign when aligned=True is modified from https://github.com/facebookresearch/detectron2/

The meaning of aligned=True:

Given a continuous coordinate c, its two neighboring pixel indices (in our pixel model) are computed by floor(c - 0.5) and ceil(c - 0.5). For example, c=1.3 has pixel neighbors with discrete indices [0] and [1] (which are sampled from the underlying signal at continuous coordinates 0.5 and 1.5). But the original roi_align (aligned=False) does not subtract the 0.5 when computing neighboring pixel indices and therefore it uses pixels with a slightly incorrect alignment (relative to our pixel model) when performing bilinear interpolation.

With aligned=True, we first appropriately scale the ROI and then shift it by -0.5 prior to calling roi_align. This produces the correct neighbors.

The difference has no effect on the model’s performance if RoIAlign is used together with conv layers.

forward(features, rois)[source]

Defines the computation performed at every call.

Should be overridden by all subclasses.

Note

Although the recipe for forward pass needs to be defined within this function, one should call the Module instance afterwards instead of this since the former takes care of running the registered hooks while the latter silently ignores them.

class mmcv.ops.RoIAwarePool3d(out_size, max_pts_per_voxel=128, mode='max')[source]

Encode the geometry-specific features of each 3D proposal.

Please refer to PartA2 for more details.

Parameters
  • out_size (int or tuple) – The size of output features. n or [n1, n2, n3].

  • max_pts_per_voxel (int, optional) – The maximum number of points per voxel. Default: 128.

  • mode (str, optional) – Pooling method of RoIAware, ‘max’ or ‘avg’. Default: ‘max’.

forward(rois, pts, pts_feature)[source]
Parameters
  • rois (torch.Tensor) – [N, 7], in LiDAR coordinate, (x, y, z) is the bottom center of rois.

  • pts (torch.Tensor) – [npoints, 3], coordinates of input points.

  • pts_feature (torch.Tensor) – [npoints, C], features of input points.

Returns

Pooled features whose shape is [N, out_x, out_y, out_z, C].

Return type

torch.Tensor

class mmcv.ops.RoIPointPool3d(num_sampled_points=512)[source]

Encode the geometry-specific features of each 3D proposal.

Please refer to Paper of PartA2 for more details.

Parameters

num_sampled_points (int, optional) – Number of samples in each roi. Default: 512.

forward(points, point_features, boxes3d)[source]
Parameters
  • points (torch.Tensor) – Input points whose shape is (B, N, C).

  • point_features (torch.Tensor) – Features of input points whose shape is (B, N, C).

  • boxes3d (torch.Tensor) – Input bounding boxes whose shape is (B, M, 7).

Returns

A tuple containing two elements. The first is the pooled features whose shape is (B, M, 512, 3 + C). The second is an empty flag whose shape is (B, M).

Return type

tuple[torch.Tensor]

class mmcv.ops.RoIPool(output_size, spatial_scale=1.0)[source]
forward(input, rois)[source]

Defines the computation performed at every call.

Should be overridden by all subclasses.

Note

Although the recipe for forward pass needs to be defined within this function, one should call the Module instance afterwards instead of this since the former takes care of running the registered hooks while the latter silently ignores them.

class mmcv.ops.SAConv2d(in_channels, out_channels, kernel_size, stride=1, padding=0, dilation=1, groups=1, bias=True, use_deform=False)[source]

SAC (Switchable Atrous Convolution)

This is an implementation of DetectoRS: Detecting Objects with Recursive Feature Pyramid and Switchable Atrous Convolution.

Parameters
  • in_channels (int) – Number of channels in the input image

  • out_channels (int) – Number of channels produced by the convolution

  • kernel_size (int or tuple) – Size of the convolving kernel

  • stride (int or tuple, optional) – Stride of the convolution. Default: 1

  • padding (int or tuple, optional) – Zero-padding added to both sides of the input. Default: 0

  • padding_mode (string, optional) – 'zeros', 'reflect', 'replicate' or 'circular'. Default: 'zeros'

  • dilation (int or tuple, optional) – Spacing between kernel elements. Default: 1

  • groups (int, optional) – Number of blocked connections from input channels to output channels. Default: 1

  • bias (bool, optional) – If True, adds a learnable bias to the output. Default: True

  • use_deform – If True, replace convolution with deformable convolution. Default: False.

forward(x)[source]

Defines the computation performed at every call.

Should be overridden by all subclasses.

Note

Although the recipe for forward pass needs to be defined within this function, one should call the Module instance afterwards instead of this since the former takes care of running the registered hooks while the latter silently ignores them.

class mmcv.ops.SigmoidFocalLoss(gamma, alpha, weight=None, reduction='mean')[source]
forward(input, target)[source]

Defines the computation performed at every call.

Should be overridden by all subclasses.

Note

Although the recipe for forward pass needs to be defined within this function, one should call the Module instance afterwards instead of this since the former takes care of running the registered hooks while the latter silently ignores them.

class mmcv.ops.SimpleRoIAlign(output_size, spatial_scale, aligned=True)[source]
forward(features, rois)[source]

Defines the computation performed at every call.

Should be overridden by all subclasses.

Note

Although the recipe for forward pass needs to be defined within this function, one should call the Module instance afterwards instead of this since the former takes care of running the registered hooks while the latter silently ignores them.

class mmcv.ops.SoftmaxFocalLoss(gamma, alpha, weight=None, reduction='mean')[source]
forward(input, target)[source]

Defines the computation performed at every call.

Should be overridden by all subclasses.

Note

Although the recipe for forward pass needs to be defined within this function, one should call the Module instance afterwards instead of this since the former takes care of running the registered hooks while the latter silently ignores them.

class mmcv.ops.SyncBatchNorm(num_features, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True, group=None, stats_mode='default')[source]

Synchronized Batch Normalization.

Parameters
  • num_features (int) – number of features/channels in input tensor

  • eps (float, optional) – a value added to the denominator for numerical stability. Defaults to 1e-5.

  • momentum (float, optional) – the value used for the running_mean and running_var computation. Defaults to 0.1.

  • affine (bool, optional) – whether to use learnable affine parameters. Defaults to True.

  • track_running_stats (bool, optional) – whether to track the running mean and variance during training. When set to False, this module does not track such statistics, and initializes statistics buffers running_mean and running_var as None. When these buffers are None, this module always uses batch statistics in both training and eval modes. Defaults to True.

  • group (int, optional) – synchronization of stats happen within each process group individually. By default it is synchronization across the whole world. Defaults to None.

  • stats_mode (str, optional) – The statistical mode. Available options include 'default' and 'N'. Defaults to ‘default’. When stats_mode=='default', it computes the overall statistics using those from each worker with equal weight, i.e., the statistics are synchronized and simply divided by group. This mode will produce inaccurate statistics when empty tensors occur. When stats_mode=='N', it computes the overall statistics using the total number of batches in each worker, ignoring the number of groups, i.e., the statistics are synchronized and then divided by the total batch N. This mode is beneficial when empty tensors occur during training, as it averages the total mean by the real number of batches.
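
The difference between the two stats_mode options can be seen in a toy model of the mean. This sketch is a simplification with a hypothetical function name: it works on scalar activations, ignores the variance, and assumes that a worker with an empty batch contributes a mean of 0 in 'default' mode.

```python
def synced_mean(worker_batches, stats_mode="default"):
    """Combine per-worker activations into one mean.

    worker_batches: one list of scalar activations per worker.
    """
    if stats_mode == "default":
        # Every worker's mean gets equal weight, even if its batch is empty.
        means = [sum(b) / len(b) if b else 0.0 for b in worker_batches]
        return sum(means) / len(worker_batches)
    if stats_mode == "N":
        # Weight by the real number of samples seen across all workers.
        total = sum(len(b) for b in worker_batches)
        if total == 0:
            return 0.0
        return sum(sum(b) for b in worker_batches) / total
    raise ValueError("stats_mode must be 'default' or 'N'")


batches = [[1.0, 2.0, 3.0], []]          # the second worker got an empty tensor
print(synced_mean(batches, "default"))   # 1.0, skewed by the empty worker
print(synced_mean(batches, "N"))         # 2.0, the true mean of the samples
```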

forward(input)[source]

Defines the computation performed at every call.

Should be overridden by all subclasses.

Note

Although the recipe for forward pass needs to be defined within this function, one should call the Module instance afterwards instead of this since the former takes care of running the registered hooks while the latter silently ignores them.

class mmcv.ops.TINShift[source]

Temporal Interlace Shift.

Temporal Interlace shift is a differentiable temporal-wise frame shifting which is proposed in “Temporal Interlacing Network”

Please refer to Temporal Interlacing Network for more details.

Code is modified from https://github.com/mit-han-lab/temporal-shift-module

forward(input, shift)[source]

Perform temporal interlace shift.

Parameters
  • input (torch.Tensor) – Feature map with shape [N, num_segments, C, H * W].

  • shift (torch.Tensor) – Shift tensor with shape [N, num_segments].

Returns

Feature map after temporal interlace shift.

class mmcv.ops.Voxelization(voxel_size, point_cloud_range, max_num_points, max_voxels=20000)[source]

Convert KITTI points (N, >=3) to voxels.

Please refer to Point-Voxel CNN for Efficient 3D Deep Learning for more details.

Parameters
  • voxel_size (tuple or float) – The size of voxel with the shape of [3].

  • point_cloud_range (tuple or float) – The coordinate range of voxel with the shape of [6].

  • max_num_points (int) – Maximum number of points contained in a voxel. If max_num_points=-1, dynamic_voxelize is used.

  • max_voxels (int, optional) – Maximum number of voxels this function creates. For SECOND, 20000 is a good choice. Users should shuffle points before calling this function because max_voxels may drop points. Default: 20000.
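
As a rough picture of how voxel_size and point_cloud_range interact, the sketch below assigns a single point to a voxel index. The function names and the example range are hypothetical, the index is returned in (x, y, z) order for readability (the operator's own coordinate order may differ), and the max_num_points and max_voxels limits are not modelled.

```python
import math


def grid_size(voxel_size, point_cloud_range):
    """Number of voxels along x, y and z."""
    return [
        round((point_cloud_range[i + 3] - point_cloud_range[i]) / voxel_size[i])
        for i in range(3)
    ]


def voxel_index(point, voxel_size, point_cloud_range):
    """Voxel index of a point, or None if it lies outside the range."""
    size = grid_size(voxel_size, point_cloud_range)
    idx = [
        math.floor((point[i] - point_cloud_range[i]) / voxel_size[i])
        for i in range(3)
    ]
    if any(not 0 <= idx[i] < size[i] for i in range(3)):
        return None
    return tuple(idx)


voxel_size = [0.5, 0.5, 4.0]
pc_range = [0.0, -40.0, -3.0, 70.0, 40.0, 1.0]
print(grid_size(voxel_size, pc_range))                        # [140, 160, 1]
print(voxel_index((1.2, -39.9, 0.0), voxel_size, pc_range))   # (2, 0, 0)
print(voxel_index((80.0, 0.0, 0.0), voxel_size, pc_range))    # None
```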

forward(input)[source]

Defines the computation performed at every call.

Should be overridden by all subclasses.

Note

Although the recipe for forward pass needs to be defined within this function, one should call the Module instance afterwards instead of this since the former takes care of running the registered hooks while the latter silently ignores them.

mmcv.ops.batched_nms(boxes, scores, idxs, nms_cfg, class_agnostic=False)[source]

Performs non-maximum suppression in a batched fashion.

Modified from torchvision/ops/boxes.py#L39. In order to perform NMS independently per class, we add an offset to all the boxes. The offset is dependent only on the class idx, and is large enough so that boxes from different classes do not overlap.

Note

In v1.4.1 and later, batched_nms supports skipping the NMS and returns sorted raw results when nms_cfg is None.

Parameters
  • boxes (torch.Tensor) – boxes in shape (N, 4).

  • scores (torch.Tensor) – scores in shape (N, ).

  • idxs (torch.Tensor) – each index value corresponds to a bbox cluster, and NMS will not be applied between elements of different idxs, shape (N, ).

  • nms_cfg (dict | None) –

    Supports skipping the nms when nms_cfg is None, otherwise it should specify nms type and other parameters like iou_thr. Possible keys include the following.

    • iou_thr (float): IoU threshold used for NMS.

    • split_thr (float): threshold number of boxes. In some cases the number of boxes is large (e.g., 200k). To avoid OOM during training, the users could set split_thr to a small value. If the number of boxes is greater than the threshold, it will perform NMS on each group of boxes separately and sequentially. Defaults to 10000.

  • class_agnostic (bool) – if true, nms is class agnostic, i.e. IoU thresholding happens over all boxes, regardless of the predicted class.

Returns

Kept dets and indices.

  • boxes (Tensor): Bboxes with score after nms, has shape (num_bboxes, 5). last dimension 5 arrange as (x1, y1, x2, y2, score)

  • keep (Tensor): The indices of remaining boxes in input boxes.

Return type

tuple
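
The per-class offset trick described above can be sketched without any tensor library. The helper names are hypothetical, and the offset formula idx * (max_coordinate + 1) is an assumption modelled on the torchvision code the text cites; only the shifting step is shown, not the suppression itself.

```python
def offset_boxes(boxes, idxs):
    """Shift each box by a class-dependent offset so classes never overlap."""
    if not boxes:
        return []
    max_coordinate = max(max(b) for b in boxes)
    shifted = []
    for (x1, y1, x2, y2), idx in zip(boxes, idxs):
        off = idx * (max_coordinate + 1)
        shifted.append((x1 + off, y1 + off, x2 + off, y2 + off))
    return shifted


def iou(a, b):
    """IoU of two (x1, y1, x2, y2) boxes."""
    iw = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
    ih = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = iw * ih
    union = (a[2] - a[0]) * (a[3] - a[1]) + (b[2] - b[0]) * (b[3] - b[1]) - inter
    return inter / union if union > 0 else 0.0


boxes = [(0, 0, 10, 10), (1, 1, 11, 11)]
print(iou(*boxes))                          # heavy overlap before shifting
shifted = offset_boxes(boxes, idxs=[0, 1])
print(shifted[1])                           # (13, 13, 23, 23)
print(iou(*shifted))                        # 0.0, so class 1 cannot suppress class 0
```

Running ordinary NMS on the shifted boxes then behaves like independent NMS per class, which is what class_agnostic=False requires.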

mmcv.ops.bbox_overlaps(bboxes1, bboxes2, mode='iou', aligned=False, offset=0)[source]

Calculate overlap between two sets of bboxes.

If aligned is False, then calculate the ious between each bbox of bboxes1 and bboxes2, otherwise the ious between each aligned pair of bboxes1 and bboxes2.

Parameters
  • bboxes1 (torch.Tensor) – shape (m, 4) in <x1, y1, x2, y2> format or empty.

  • bboxes2 (torch.Tensor) – shape (n, 4) in <x1, y1, x2, y2> format or empty. If aligned is True, then m and n must be equal.

  • mode (str) – “iou” (intersection over union) or “iof” (intersection over foreground).

Returns

Return the ious between boxes. If aligned is False, the shape of ious is (m, n) else (m, 1).

Return type

torch.Tensor

Example

>>> import torch
>>> from mmcv.ops import bbox_overlaps
>>> bboxes1 = torch.FloatTensor([
...     [0, 0, 10, 10],
...     [10, 10, 20, 20],
...     [32, 32, 38, 42],
... ])
>>> bboxes2 = torch.FloatTensor([
...     [0, 0, 10, 20],
...     [0, 10, 10, 19],
...     [10, 10, 20, 20],
... ])
>>> bbox_overlaps(bboxes1, bboxes2)
tensor([[0.5000, 0.0000, 0.0000],
        [0.0000, 0.0000, 1.0000],
        [0.0000, 0.0000, 0.0000]])

Example

>>> empty = torch.FloatTensor([])
>>> nonempty = torch.FloatTensor([
...     [0, 0, 10, 9],
... ])
>>> assert tuple(bbox_overlaps(empty, nonempty).shape) == (0, 1)
>>> assert tuple(bbox_overlaps(nonempty, empty).shape) == (1, 0)
>>> assert tuple(bbox_overlaps(empty, empty).shape) == (0, 0)
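
To make the mode and aligned arguments concrete, here is a pure-Python sketch that reproduces the first example's numbers. It is not the mmcv kernel: the function names are hypothetical, "foreground" in "iof" is assumed to mean the box taken from bboxes1, the offset argument is not modelled, and the aligned result is returned as a flat list.

```python
def box_overlap(box1, box2, mode="iou"):
    """Overlap of two (x1, y1, x2, y2) boxes."""
    iw = max(0.0, min(box1[2], box2[2]) - max(box1[0], box2[0]))
    ih = max(0.0, min(box1[3], box2[3]) - max(box1[1], box2[1]))
    inter = iw * ih
    area1 = (box1[2] - box1[0]) * (box1[3] - box1[1])
    area2 = (box2[2] - box2[0]) * (box2[3] - box2[1])
    if mode == "iou":
        denom = area1 + area2 - inter
    elif mode == "iof":
        denom = area1  # assumption: foreground is the box from bboxes1
    else:
        raise ValueError("mode must be 'iou' or 'iof'")
    return inter / denom if denom > 0 else 0.0


def overlaps(bboxes1, bboxes2, mode="iou", aligned=False):
    """Pairwise (m x n) overlaps, or element-wise overlaps when aligned."""
    if aligned:
        if len(bboxes1) != len(bboxes2):
            raise ValueError("aligned=True requires m == n")
        return [box_overlap(a, b, mode) for a, b in zip(bboxes1, bboxes2)]
    return [[box_overlap(a, b, mode) for b in bboxes2] for a in bboxes1]


b1 = [(0, 0, 10, 10), (10, 10, 20, 20), (32, 32, 38, 42)]
b2 = [(0, 0, 10, 20), (0, 10, 10, 19), (10, 10, 20, 20)]
print(overlaps(b1, b2))                 # first row starts with 0.5
print(overlaps(b1, b2, aligned=True))   # [0.5, 0.0, 0.0]
print(box_overlap(b1[0], b2[0], "iof")) # 1.0
```
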
mmcv.ops.box_iou_rotated(bboxes1, bboxes2, mode='iou', aligned=False, clockwise=True)[source]

Return intersection-over-union (Jaccard index) of boxes.

Both sets of boxes are expected to be in (x_center, y_center, width, height, angle) format.

If aligned is False, then calculate the ious between each bbox of bboxes1 and bboxes2, otherwise the ious between each aligned pair of bboxes1 and bboxes2.

Note

The operator assumes:

  1. The positive direction along x axis is left -> right.

  2. The positive direction along y axis is top -> down.

  3. The w border is in parallel with x axis when angle = 0.

However, there are 2 opposite definitions of the positive angular direction, clockwise (CW) and counter-clockwise (CCW). MMCV supports both definitions and uses CW by default.

Please set clockwise=False if you are using the CCW definition.

The coordinate system when clockwise is True (default)

0-------------------> x (0 rad)
|  A-------------B
|  |             |
|  |     box     h
|  |   angle=0   |
|  D------w------C
v
y (pi/2 rad)

In such a coordinate system, the rotation matrix is

\[\begin{split}\begin{pmatrix} \cos\alpha & -\sin\alpha \\ \sin\alpha & \cos\alpha \end{pmatrix}\end{split}\]

The coordinates of the corner point A can be calculated as:

\[\begin{split}P_A= \begin{pmatrix} x_A \\ y_A\end{pmatrix} = \begin{pmatrix} x_{center} \\ y_{center}\end{pmatrix} + \begin{pmatrix}\cos\alpha & -\sin\alpha \\ \sin\alpha & \cos\alpha\end{pmatrix} \begin{pmatrix} -0.5w \\ -0.5h\end{pmatrix} \\ = \begin{pmatrix} x_{center}-0.5w\cos\alpha+0.5h\sin\alpha \\ y_{center}-0.5w\sin\alpha-0.5h\cos\alpha\end{pmatrix}\end{split}\]

The coordinate system when clockwise is False

0-------------------> x (0 rad)
|  A-------------B
|  |             |
|  |     box     h
|  |   angle=0   |
|  D------w------C
v
y (-pi/2 rad)

In such a coordinate system, the rotation matrix is

\[\begin{split}\begin{pmatrix} \cos\alpha & \sin\alpha \\ -\sin\alpha & \cos\alpha \end{pmatrix}\end{split}\]

The coordinates of the corner point A can be calculated as:

\[\begin{split}P_A= \begin{pmatrix} x_A \\ y_A\end{pmatrix} = \begin{pmatrix} x_{center} \\ y_{center}\end{pmatrix} + \begin{pmatrix}\cos\alpha & \sin\alpha \\ -\sin\alpha & \cos\alpha\end{pmatrix} \begin{pmatrix} -0.5w \\ -0.5h\end{pmatrix} \\ = \begin{pmatrix} x_{center}-0.5w\cos\alpha-0.5h\sin\alpha \\ y_{center}+0.5w\sin\alpha-0.5h\cos\alpha\end{pmatrix}\end{split}\]
Parameters
  • bboxes1 (torch.Tensor) – rotated bboxes 1. It has shape (N, 5), indicating (x, y, w, h, theta) for each row. Note that theta is in radians.

  • bboxes2 (torch.Tensor) – rotated bboxes 2. It has shape (M, 5), indicating (x, y, w, h, theta) for each row. Note that theta is in radians.

  • mode (str) – “iou” (intersection over union) or “iof” (intersection over foreground).

  • clockwise (bool) – flag indicating whether the positive angular orientation is clockwise. Defaults to True. New in version 1.4.3.

Returns

Return the ious between boxes. If aligned is False, the shape of ious is (N, M) else (N,).

Return type

torch.Tensor
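
The two corner formulas in the note above can be checked numerically. The sketch below uses a hypothetical helper, corner_a, that applies the stated rotation matrix to the offset (-0.5w, -0.5h); the counter-clockwise matrix is obtained from the clockwise one by negating the sine terms.

```python
import math


def corner_a(x_center, y_center, w, h, angle, clockwise=True):
    """Coordinates of corner A for a rotated box, under either convention."""
    cos_a, sin_a = math.cos(angle), math.sin(angle)
    if not clockwise:
        # The CCW matrix is the CW matrix with the sine terms negated.
        sin_a = -sin_a
    dx, dy = -0.5 * w, -0.5 * h
    return (
        x_center + cos_a * dx - sin_a * dy,
        y_center + sin_a * dx + cos_a * dy,
    )


print(corner_a(0, 0, 4, 2, 0.0))                           # (-2.0, -1.0)
print(corner_a(0, 0, 4, 2, math.pi / 2))                   # approximately (1, -2)
print(corner_a(0, 0, 4, 2, math.pi / 2, clockwise=False))  # approximately (-1, 2)
```

At angle 0 both conventions agree; for any other angle the same theta places A at mirrored positions, which is why clockwise must match the convention used to produce the boxes.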

mmcv.ops.boxes_iou_bev(boxes_a, boxes_b)[source]

Calculate boxes IoU in the Bird’s Eye View.

Parameters
  • boxes_a (torch.Tensor) – Input boxes a with shape (M, 5).

  • boxes_b (torch.Tensor) – Input boxes b with shape (N, 5).

Returns

IoU result with shape (M, N).

Return type

torch.Tensor

mmcv.ops.contour_expand(kernel_mask, internal_kernel_label, min_kernel_area, kernel_num)[source]

Expand kernel contours so that foreground pixels are assigned into instances.

Parameters
  • kernel_mask (np.array or torch.Tensor) – The instance kernel mask with size hxw.

  • internal_kernel_label (np.array or torch.Tensor) – The instance internal kernel label with size hxw.

  • min_kernel_area (int) – The minimum kernel area.

  • kernel_num (int) – The instance kernel number.

Returns

The instance index map with size hxw.

Return type

list

mmcv.ops.convex_giou(pointsets, polygons)[source]

Return generalized intersection-over-union (Jaccard index) between point sets and polygons.

Parameters
  • pointsets (torch.Tensor) – It has shape (N, 18), indicating (x1, y1, x2, y2, …, x9, y9) for each row.

  • polygons (torch.Tensor) – It has shape (N, 8), indicating (x1, y1, x2, y2, x3, y3, x4, y4) for each row.

Returns

The first element is the gious between point sets and polygons with the shape (N,). The second element is the gradient of point sets with the shape (N, 18).

Return type

tuple[torch.Tensor, torch.Tensor]

mmcv.ops.convex_iou(pointsets, polygons)[source]

Return intersection-over-union (Jaccard index) between point sets and polygons.

Parameters
  • pointsets (torch.Tensor) – It has shape (N, 18), indicating (x1, y1, x2, y2, …, x9, y9) for each row.

  • polygons (torch.Tensor) – It has shape (K, 8), indicating (x1, y1, x2, y2, x3, y3, x4, y4) for each row.

Returns

Return the ious between point sets and polygons with the shape (N, K).

Return type

torch.Tensor

mmcv.ops.fused_bias_leakyrelu(input, bias, negative_slope=0.2, scale=1.4142135623730951)[source]

Fused bias leaky ReLU function.

This function was introduced in StyleGAN2: Analyzing and Improving the Image Quality of StyleGAN.

The bias term comes from the convolution operation. In addition, to keep the variance of the feature map or gradients unchanged, the authors also adopt a scale similar to that of Kaiming initialization. However, since \(\alpha^2\) is small, the \(1+\alpha^2\) term can be ignored. Therefore, the final scale is just \(\sqrt{2}\). You may of course replace it with your own scale.

Parameters
  • input (torch.Tensor) – Input feature map.

  • bias (nn.Parameter) – The bias from convolution operation.

  • negative_slope (float, optional) – Same as nn.LeakyReLU. Defaults to 0.2.

  • scale (float, optional) – A scalar to adjust the variance of the feature map. Defaults to 2**0.5.

Returns

Feature map after non-linear activation.

Return type

torch.Tensor
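
The computation described for fused_bias_leakyrelu can be sketched in plain Python for a single channel. This is only an illustrative reference under the assumption that the fused op is equivalent to adding the bias, applying leaky ReLU, and multiplying by the scale; the function name `fused_bias_leakyrelu_ref` is hypothetical and the real op works on tensors.

```python
import math


def fused_bias_leakyrelu_ref(values, bias, negative_slope=0.2,
                             scale=math.sqrt(2)):
    """Single-channel reference: scale * leaky_relu(x + bias)."""
    out = []
    for x in values:
        y = x + bias
        if y < 0:
            y *= negative_slope  # leaky ReLU on the negative side
        out.append(y * scale)
    return out


# Kaiming gain for leaky ReLU is sqrt(2 / (1 + alpha^2)); with
# alpha = 0.2 it is close to sqrt(2), which is why the simpler
# constant is used as the default scale.
kaiming_gain = math.sqrt(2.0 / (1.0 + 0.2 ** 2))
print(kaiming_gain, math.sqrt(2))
print(fused_bias_leakyrelu_ref([1.0, -2.0], bias=0.5))
```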

mmcv.ops.min_area_polygons(pointsets)[source]

Find the smallest polygons that surround all points in the point sets.

Parameters

pointsets (Tensor) – point sets with shape (N, 18).

Returns

Return the smallest polygons with shape (N, 8).

Return type

torch.Tensor

mmcv.ops.nms(boxes, scores, iou_threshold, offset=0, score_threshold=0, max_num=-1)[source]

Dispatch to either CPU or GPU NMS implementations.

The input can be either a torch tensor or a numpy array. GPU NMS will be used if the input is a GPU tensor; otherwise CPU NMS will be used. The returned type will always be the same as that of the inputs.

Parameters
  • boxes (torch.Tensor or np.ndarray) – boxes in shape (N, 4).

  • scores (torch.Tensor or np.ndarray) – scores in shape (N, ).

  • iou_threshold (float) – IoU threshold for NMS.

  • offset (int, 0 or 1) – boxes’ width or height is (x2 - x1 + offset).

  • score_threshold (float) – score threshold for NMS.

  • max_num (int) – maximum number of boxes after NMS.

Returns

Kept dets (boxes and scores) and indices, which always have the same data type as the input.

Return type

tuple

Example

>>> import numpy as np
>>> from mmcv.ops import nms
>>> boxes = np.array([[49.1, 32.4, 51.0, 35.9],
...                   [49.3, 32.9, 51.0, 35.3],
...                   [49.2, 31.8, 51.0, 35.4],
...                   [35.1, 11.5, 39.1, 15.7],
...                   [35.6, 11.8, 39.3, 14.2],
...                   [35.3, 11.5, 39.9, 14.5],
...                   [35.2, 11.7, 39.7, 15.7]], dtype=np.float32)
>>> scores = np.array([0.9, 0.9, 0.5, 0.5, 0.5, 0.4, 0.3],
...                   dtype=np.float32)
>>> iou_threshold = 0.6
>>> dets, inds = nms(boxes, scores, iou_threshold)
>>> assert len(inds) == len(dets) == 3
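
To see why the example above keeps three boxes, the greedy suppression rule can be sketched in pure Python. This is an illustrative sketch, not the mmcv CPU or GPU kernel; the names `box_iou` and `greedy_nms` are hypothetical, and it assumes a box is suppressed when its IoU with an already kept box is strictly greater than the threshold.

```python
def box_iou(a, b, offset=0):
    """IoU of two (x1, y1, x2, y2) boxes; width is x2 - x1 + offset."""
    iw = min(a[2], b[2]) - max(a[0], b[0]) + offset
    ih = min(a[3], b[3]) - max(a[1], b[1]) + offset
    if iw <= 0 or ih <= 0:
        return 0.0
    inter = iw * ih
    area_a = (a[2] - a[0] + offset) * (a[3] - a[1] + offset)
    area_b = (b[2] - b[0] + offset) * (b[3] - b[1] + offset)
    return inter / (area_a + area_b - inter)


def greedy_nms(boxes, scores, iou_threshold, offset=0):
    """Return indices of kept boxes, highest score first."""
    order = sorted(range(len(boxes)), key=lambda i: scores[i], reverse=True)
    keep = []
    for i in order:
        # Keep box i only if it does not overlap too much with a kept box.
        if all(box_iou(boxes[i], boxes[k], offset) <= iou_threshold
               for k in keep):
            keep.append(i)
    return keep


boxes = [[49.1, 32.4, 51.0, 35.9],
         [49.3, 32.9, 51.0, 35.3],
         [49.2, 31.8, 51.0, 35.4],
         [35.1, 11.5, 39.1, 15.7],
         [35.6, 11.8, 39.3, 14.2],
         [35.3, 11.5, 39.9, 14.5],
         [35.2, 11.7, 39.7, 15.7]]
scores = [0.9, 0.9, 0.5, 0.5, 0.5, 0.4, 0.3]
print(greedy_nms(boxes, scores, iou_threshold=0.6))
```

Boxes 1 and 2 overlap box 0 with IoU above 0.6, box 6 overlaps box 3, and box 5 overlaps box 4, so one box survives from the first cluster and two from the second.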
mmcv.ops.nms_bev(boxes, scores, thresh, pre_max_size=None, post_max_size=None)[source]

NMS function GPU implementation (for BEV boxes). The overlap of two boxes for IoU calculation is defined as the exact overlapping area of the two boxes. In this function, one can also set pre_max_size and post_max_size.

Parameters
  • boxes (torch.Tensor) – Input boxes with the shape of [N, 5] ([x1, y1, x2, y2, ry]).

  • scores (torch.Tensor) – Scores of boxes with the shape of [N].

  • thresh (float) – Overlap threshold of NMS.

  • pre_max_size (int, optional) – Max size of boxes before NMS. Default: None.

  • post_max_size (int, optional) – Max size of boxes after NMS. Default: None.

Returns

Indexes after NMS.

Return type

torch.Tensor

mmcv.ops.nms_match(dets, iou_threshold)[source]

Match dets into different groups by NMS.

NMS match is similar to NMS, but when a bbox is suppressed, NMS match records the index of the suppressed bbox and forms a group with the index of the kept bbox. In each group, indices are sorted in score order.

Parameters
  • dets (torch.Tensor | np.ndarray) – Det boxes with scores, shape (N, 5).

  • iou_threshold (float) – IoU threshold for NMS.

Returns

The outer list corresponds to the different matched groups; each inner Tensor holds the indices of one group in score order.

Return type

list[torch.Tensor | np.ndarray]

mmcv.ops.nms_normal_bev(boxes, scores, thresh)[source]

Normal NMS function GPU implementation (for BEV boxes). The overlap of two boxes for IoU calculation is defined as the exact overlapping area of the two boxes WITH their yaw angle set to 0.

Parameters
  • boxes (torch.Tensor) – Input boxes with shape (N, 5).

  • scores (torch.Tensor) – Scores of predicted boxes with shape (N).

  • thresh (float) – Overlap threshold of NMS.

Returns

Remaining indices with scores in descending order.

Return type

torch.Tensor

mmcv.ops.nms_rotated(dets, scores, iou_threshold, labels=None, clockwise=True)[source]

Performs non-maximum suppression (NMS) on the rotated boxes according to their intersection-over-union (IoU).

Rotated NMS iteratively removes lower scoring rotated boxes which have an IoU greater than iou_threshold with another (higher scoring) rotated box.

Parameters
  • dets (Tensor) – Rotated boxes in shape (N, 5). They are expected to be in (x_ctr, y_ctr, width, height, angle_radian) format.

  • scores (Tensor) – scores in shape (N, ).

  • iou_threshold (float) – IoU thresh for NMS.

  • labels (Tensor) – boxes’ label in shape (N,).

  • clockwise (bool) – Flag indicating whether the positive angular orientation is clockwise. Defaults to True. New in version 1.4.3.

Returns

Kept dets (boxes and scores) and indices, which always have the same data type as the input.

Return type

tuple

mmcv.ops.pixel_group(score, mask, embedding, kernel_label, kernel_contour, kernel_region_num, distance_threshold)[source]

Group pixels into text instances, which is widely used in text detection methods.

Parameters
  • score (np.array or torch.Tensor) – The foreground score with size hxw.

  • mask (np.array or Tensor) – The foreground mask with size hxw.

  • embedding (np.array or torch.Tensor) – The embedding with size hxwxc to distinguish instances.

  • kernel_label (np.array or torch.Tensor) – The instance kernel index with size hxw.

  • kernel_contour (np.array or torch.Tensor) – The kernel contour with size hxw.

  • kernel_region_num (int) – The instance kernel region number.

  • distance_threshold (float) – The embedding distance threshold between kernel and pixel in one instance.

Returns

The instance coordinates and attributes list. Each element consists of averaged confidence, pixel number, and coordinates (x_i, y_i for all pixels) in order.

Return type

list[list[float]]

mmcv.ops.point_sample(input, points, align_corners=False, **kwargs)[source]

A wrapper around grid_sample() to support 3D point_coords tensors. Unlike torch.nn.functional.grid_sample(), it assumes point_coords to lie inside the [0, 1] x [0, 1] square.

Parameters
  • input (torch.Tensor) – Feature map, shape (N, C, H, W).

  • points (torch.Tensor) – Image based absolute point coordinates (normalized), range [0, 1] x [0, 1], shape (N, P, 2) or (N, Hgrid, Wgrid, 2).

  • align_corners (bool, optional) – Whether align_corners. Default: False

Returns

Features of point on input, shape (N, C, P) or (N, C, Hgrid, Wgrid).

Return type

torch.Tensor
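
The coordinate convention of point_sample can be illustrated with a small pure-Python sketch for a single-channel feature map. It assumes the wrapper maps points from [0, 1] to the [-1, 1] range expected by grid_sample() and that sampling is bilinear with zero padding and align_corners=False; the helper name `sample_point` is hypothetical.

```python
import math


def sample_point(feature, x, y):
    """Bilinearly sample a 2D list `feature` at normalized (x, y) in [0, 1]."""
    h, w = len(feature), len(feature[0])
    # Map [0, 1] to the [-1, 1] range used by grid_sample.
    gx, gy = 2.0 * x - 1.0, 2.0 * y - 1.0
    # align_corners=False: -1 and 1 refer to the outer edges of the map.
    px = ((gx + 1.0) * w - 1.0) / 2.0
    py = ((gy + 1.0) * h - 1.0) / 2.0
    x0, y0 = math.floor(px), math.floor(py)
    fx, fy = px - x0, py - y0
    value = 0.0
    for yi, wy in ((y0, 1.0 - fy), (y0 + 1, fy)):
        for xi, wx in ((x0, 1.0 - fx), (x0 + 1, fx)):
            # Neighbors outside the map contribute zero (zero padding).
            if 0 <= xi < w and 0 <= yi < h:
                value += wx * wy * feature[yi][xi]
    return value


feature = [[0.0, 1.0],
           [2.0, 3.0]]
print(sample_point(feature, 0.5, 0.5))    # center of the map
print(sample_point(feature, 0.25, 0.25))  # center of the top-left pixel
```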

mmcv.ops.points_in_boxes_all(points, boxes)[source]

Find all boxes in which each point lies (CUDA).

Parameters
  • points (torch.Tensor) – [B, M, 3], [x, y, z] in LiDAR/DEPTH coordinate

  • boxes (torch.Tensor) – [B, T, 7], num_valid_boxes <= T, [x, y, z, x_size, y_size, z_size, rz], (x, y, z) is the bottom center.

Returns

Return the box indices of points with the shape of (B, M, T). Default background = 0.

Return type

torch.Tensor

mmcv.ops.points_in_boxes_cpu(points, boxes)[source]

Find all boxes in which each point lies (CPU). The CPU version of points_in_boxes_all().

Parameters
  • points (torch.Tensor) – [B, M, 3], [x, y, z] in LiDAR/DEPTH coordinate

  • boxes (torch.Tensor) – [B, T, 7], num_valid_boxes <= T, [x, y, z, x_size, y_size, z_size, rz], (x, y, z) is the bottom center.

Returns

Return the box indices of points with the shape of (B, M, T). Default background = 0.

Return type

torch.Tensor

mmcv.ops.points_in_boxes_part(points, boxes)[source]

Find the box in which each point lies (CUDA).

Parameters
  • points (torch.Tensor) – [B, M, 3], [x, y, z] in LiDAR/DEPTH coordinate.

  • boxes (torch.Tensor) – [B, T, 7], num_valid_boxes <= T, [x, y, z, x_size, y_size, z_size, rz] in LiDAR/DEPTH coordinate, (x, y, z) is the bottom center.

Returns

Return the box indices of points with the shape of (B, M). Default background = -1.

Return type

torch.Tensor
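
The box layout used by the points_in_boxes functions can be made concrete with a pure-Python sketch of the containment test for one batch element. This is not the CUDA kernel: the helper names are hypothetical, boundary points are treated as inside, and returning the first matching box is an assumption of the sketch.

```python
import math


def point_in_box(point, box):
    """Check a point against [x, y, z, x_size, y_size, z_size, rz]."""
    x, y, z = point
    cx, cy, z_bottom, x_size, y_size, z_size, rz = box
    # (x, y, z) of the box is its bottom center, so z spans upward.
    if not (z_bottom <= z <= z_bottom + z_size):
        return False
    # Rotate the offset by -rz to express it in the box's local frame.
    dx, dy = x - cx, y - cy
    cos_r, sin_r = math.cos(-rz), math.sin(-rz)
    local_x = dx * cos_r - dy * sin_r
    local_y = dx * sin_r + dy * cos_r
    return abs(local_x) <= x_size / 2 and abs(local_y) <= y_size / 2


def points_in_boxes_part_ref(points, boxes):
    """Return one box index per point, or -1 for background."""
    result = []
    for point in points:
        index = -1
        for k, box in enumerate(boxes):
            if point_in_box(point, box):
                index = k  # assumed: the first matching box wins
                break
        result.append(index)
    return result


boxes = [(0.0, 0.0, 0.0, 4.0, 2.0, 2.0, 0.0),
         (10.0, 0.0, 0.0, 2.0, 2.0, 2.0, 0.0)]
points = [(1.0, 0.5, 1.0), (10.0, 0.0, 1.0), (5.0, 5.0, 5.0)]
print(points_in_boxes_part_ref(points, boxes))
```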

mmcv.ops.points_in_polygons(points, polygons)[source]

Judging whether points are inside polygons, which is used in the ATSS assignment for the rotated boxes.

It should be noted that when the point is just at the polygon boundary, the judgment will be inaccurate, but the effect on assignment is limited.

Parameters
  • points (torch.Tensor) – It has shape (B, 2), indicating (x, y). B means the number of predicted points.

  • polygons (torch.Tensor) – It has shape (M, 8), indicating (x1, y1, x2, y2, x3, y3, x4, y4). M means the number of ground truth polygons.

Returns

Return the result with the shape of (B, M), 1 indicates that the point is inside the polygon, 0 indicates that the point is outside the polygon.

Return type

torch.Tensor
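
The shape of the result can be illustrated with a pure-Python containment test. The sketch below uses the even-odd ray casting rule, which is a swapped-in technique chosen for illustration and not necessarily what the mmcv kernel does; the function names are hypothetical. As noted above, points exactly on the boundary are not handled reliably.

```python
def point_in_polygon(point, polygon):
    """Even-odd test for a polygon given as (x1, y1, ..., x4, y4)."""
    px, py = point
    xs, ys = polygon[0::2], polygon[1::2]
    n = len(xs)
    inside = False
    j = n - 1
    for i in range(n):
        # Does the edge (j -> i) cross the horizontal line through the point?
        if (ys[i] > py) != (ys[j] > py):
            x_cross = xs[i] + (py - ys[i]) * (xs[j] - xs[i]) / (ys[j] - ys[i])
            if px < x_cross:
                inside = not inside
        j = i
    return inside


def points_in_polygons_ref(points, polygons):
    """Return a (B, M) nested list with 1 for inside and 0 for outside."""
    return [[1 if point_in_polygon(p, poly) else 0 for poly in polygons]
            for p in points]


square = [0.0, 0.0, 2.0, 0.0, 2.0, 2.0, 0.0, 2.0]
shifted = [5.0, 5.0, 7.0, 5.0, 7.0, 7.0, 5.0, 7.0]
print(points_in_polygons_ref([(1.0, 1.0), (6.0, 6.0), (3.0, 1.0)],
                             [square, shifted]))
```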

mmcv.ops.rel_roi_point_to_rel_img_point(rois, rel_roi_points, img, spatial_scale=1.0)[source]

Convert RoI-based relative point coordinates to image-based relative point coordinates.

Parameters
  • rois (torch.Tensor) – RoIs or BBoxes, shape (N, 4) or (N, 5)

  • rel_roi_points (torch.Tensor) – Point coordinates inside the RoI, relative to the RoI location, range (0, 1), shape (N, P, 2)

  • img (tuple or torch.Tensor) – (height, width) of image or feature map.

  • spatial_scale (float, optional) – Scale points by this factor. Default: 1.

Returns

Image based relative point coordinates for sampling, shape (N, P, 2).

Return type

torch.Tensor
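
The conversion can be sketched for a single RoI and a single point in pure Python. This is an assumption-labeled illustration rather than the library code: the helper name is hypothetical, a 5-value RoI is assumed to carry a leading batch index followed by (x1, y1, x2, y2), and spatial_scale is assumed to multiply the normalized result.

```python
def rel_roi_point_to_rel_img_point_ref(roi, rel_point, img_hw,
                                       spatial_scale=1.0):
    """Map a point in RoI-relative (0, 1) coords to image-relative coords."""
    # Use the last four values so a leading batch index is ignored.
    x1, y1, x2, y2 = roi[-4:]
    # Step 1: RoI-relative -> absolute image coordinates.
    abs_x = x1 + rel_point[0] * (x2 - x1)
    abs_y = y1 + rel_point[1] * (y2 - y1)
    # Step 2: absolute -> relative to the image (or feature map) size.
    height, width = img_hw
    return abs_x / width * spatial_scale, abs_y / height * spatial_scale


# The center of an RoI spanning x in [10, 30] and y in [20, 60]
# inside a 100 x 200 (height x width) image.
print(rel_roi_point_to_rel_img_point_ref((10.0, 20.0, 30.0, 60.0),
                                         (0.5, 0.5), (100, 200)))
```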

mmcv.ops.soft_nms(boxes, scores, iou_threshold=0.3, sigma=0.5, min_score=0.001, method='linear', offset=0)[source]

Dispatch to the Soft NMS implementation, which is CPU-only.

The input can be either a torch tensor or numpy array. The returned type will always be the same as inputs.

Parameters
  • boxes (torch.Tensor or np.ndarray) – boxes in shape (N, 4).

  • scores (torch.Tensor or np.ndarray) – scores in shape (N, ).

  • iou_threshold (float) – IoU threshold for NMS.

  • sigma (float) – hyperparameter for gaussian method

  • min_score (float) – score filter threshold

  • method (str) – either ‘linear’ or ‘gaussian’

  • offset (int, 0 or 1) – boxes’ width or height is (x2 - x1 + offset).

Returns

Kept dets (boxes and scores) and indices, which always have the same data type as the input.

Return type

tuple

Example

>>> import numpy as np
>>> from mmcv.ops import soft_nms
>>> boxes = np.array([[4., 3., 5., 3.],
...                   [4., 3., 5., 4.],
...                   [3., 1., 3., 1.],
...                   [3., 1., 3., 1.],
...                   [3., 1., 3., 1.],
...                   [3., 1., 3., 1.]], dtype=np.float32)
>>> scores = np.array([0.9, 0.9, 0.5, 0.5, 0.4, 0.0], dtype=np.float32)
>>> iou_threshold = 0.6
>>> dets, inds = soft_nms(boxes, scores, iou_threshold, sigma=0.5)
>>> assert len(inds) == len(dets) == 5
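
Instead of discarding overlapping boxes, Soft NMS rescales their scores and then drops boxes whose score falls below min_score. The two decay rules named by the method parameter can be sketched as follows; the weights follow the standard Soft-NMS formulation, the function name `soft_nms_weight` is hypothetical, and the exact behavior of the mmcv kernel is assumed rather than confirmed here.

```python
import math


def soft_nms_weight(iou, iou_threshold=0.3, sigma=0.5, method='linear'):
    """Return the factor applied to a box score given its IoU with a kept box."""
    if method == 'linear':
        # Scores decay linearly, but only above the IoU threshold.
        return 1.0 - iou if iou > iou_threshold else 1.0
    if method == 'gaussian':
        # Scores decay smoothly for every overlap; sigma sets the width.
        return math.exp(-(iou * iou) / sigma)
    raise ValueError(f'unknown method: {method}')


for overlap in (0.2, 0.5, 0.9):
    print(overlap,
          soft_nms_weight(overlap, method='linear'),
          soft_nms_weight(overlap, method='gaussian'))
```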
mmcv.ops.upfirdn2d(input, kernel, up=1, down=1, pad=(0, 0))[source]

UpFIRDn for 2D features.

UpFIRDn is short for upsample, apply FIR filter and downsample. More details can be found in: https://www.mathworks.com/help/signal/ref/upfirdn.html

Parameters
  • input (torch.Tensor) – Tensor with shape of (n, c, h, w).

  • kernel (torch.Tensor) – Filter kernel.

  • up (int | tuple[int], optional) – Upsampling factor. If given a number, this factor is used for both the height and the width. Defaults to 1.

  • down (int | tuple[int], optional) – Downsampling factor. If given a number, this factor is used for both the height and the width. Defaults to 1.

  • pad (tuple[int], optional) – Padding for tensors, (x_pad, y_pad) or (x_pad_0, x_pad_1, y_pad_0, y_pad_1). Defaults to (0, 0).

Returns

Tensor after UpFIRDn.

Return type

torch.Tensor
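
The three stages of UpFIRDn (upsample, apply an FIR filter, downsample) can be sketched in one dimension with pure Python. This is a simplified illustration, not the mmcv op: the function name `upfirdn1d` is hypothetical, and it assumes upsampling inserts up - 1 zeros after every sample, padding is zero padding, and filtering is a true convolution restricted to fully overlapping positions.

```python
def upfirdn1d(signal, kernel, up=1, down=1, pad=(0, 0)):
    """1D upsample -> FIR filter -> downsample on plain lists."""
    # 1. Upsample by inserting (up - 1) zeros after every sample.
    upsampled = []
    for value in signal:
        upsampled.append(value)
        upsampled.extend([0.0] * (up - 1))
    # 2. Zero-pad on both sides.
    padded = [0.0] * pad[0] + upsampled + [0.0] * pad[1]
    # 3. Convolve with the kernel (flip it, then slide a dot product).
    flipped = kernel[::-1]
    n_out = len(padded) - len(kernel) + 1
    filtered = [sum(padded[i + j] * flipped[j] for j in range(len(kernel)))
                for i in range(n_out)]
    # 4. Downsample by keeping every `down`-th sample.
    return filtered[::down]


# Nearest-neighbor style 2x upsampling with a box filter.
print(upfirdn1d([1.0, 2.0], [1.0, 1.0], up=2, pad=(1, 0)))
# 2x downsampling with an averaging filter.
print(upfirdn1d([1.0, 2.0, 3.0, 4.0], [0.5, 0.5], down=2))
```

The 2D op applies the same idea along both the height and the width.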
