fileio¶
- class mmcv.fileio.BaseStorageBackend[source]¶
Abstract class of storage backends.
All backends need to implement two apis: get() and get_text(). get() reads the file as a byte stream and get_text() reads the file as text.
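A minimal sketch of a custom disk-like backend (hypothetical class, shown only to illustrate the two required apis; registration is covered under register_backend below):

class PlainDiskBackend(BaseStorageBackend):

    def get(self, filepath):
        # read the file as a byte stream
        with open(filepath, 'rb') as f:
            return f.read()

    def get_text(self, filepath):
        # read the file as text
        with open(filepath, 'r', encoding='utf-8') as f:
            return f.read()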
- class mmcv.fileio.FileClient(backend=None, prefix=None, **kwargs)[source]¶
A general file client to access files in different backends.
The client loads a file or text in a specified backend from its path and returns it as a binary or text file. There are two ways to choose a backend: the name of the backend and the prefix of the path. Although both can be used to choose a storage backend, backend has a higher priority: if both are set, the storage backend will be chosen by the backend argument. If both are None, the disk backend will be chosen. Note that other backend accessors can also be registered with a given name, prefixes, and backend class. In addition, the singleton pattern is used to avoid repeated object creation: if the arguments are the same, the same object will be returned.
- Parameters
backend (str, optional) – The storage backend type. Options are “disk”, “ceph”, “memcached”, “lmdb”, “http” and “petrel”. Default: None.
prefix (str, optional) – The prefix of the registered storage backend. Options are “s3”, “http”, “https”. Default: None.
Examples
>>> # only set backend
>>> file_client = FileClient(backend='petrel')
>>> # only set prefix
>>> file_client = FileClient(prefix='s3')
>>> # set both backend and prefix but use backend to choose client
>>> file_client = FileClient(backend='petrel', prefix='s3')
>>> # if the arguments are the same, the same object is returned
>>> file_client1 = FileClient(backend='petrel')
>>> file_client1 is file_client
True
- client¶
The backend object.
- Type
BaseStorageBackend
- exists(filepath: Union[str, pathlib.Path]) → bool[source]¶
Check whether a file path exists.
- Parameters
filepath (str or Path) – Path to be checked whether exists.
- Returns
Return True if filepath exists, False otherwise.
- Return type
bool
- get(filepath: Union[str, pathlib.Path]) → Union[bytes, memoryview][source]¶
Read data from a given filepath with ‘rb’ mode.
Note
There are two types of return values for get: one is bytes and the other is memoryview. The advantage of using memoryview is that you can avoid copying, and if you want to convert it to bytes, you can use .tobytes().
- Parameters
filepath (str or Path) – Path to read data.
- Returns
Expected bytes object or a memory view of the bytes object.
- Return type
bytes | memoryview
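For example (a usage sketch assuming a local file at the given path):

>>> file_client = FileClient(backend='disk')
>>> data = file_client.get('/path/of/your/file')
>>> if isinstance(data, memoryview):
...     data = data.tobytes()  # copy only when real bytes are needed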
- get_local_path(filepath: Union[str, pathlib.Path]) → Generator[Union[str, pathlib.Path], None, None][source]¶
Download data from filepath and write the data to a local path. get_local_path is decorated by contextlib.contextmanager(). It can be called with a with statement, and when exiting the with statement, the temporary path will be released.
Note
If filepath is a local path, it is returned as-is.
Warning
get_local_path is an experimental interface that may change in the future.
- Parameters
filepath (str or Path) – Path to read data from.
Examples
>>> file_client = FileClient(prefix='s3')
>>> with file_client.get_local_path('s3://bucket/abc.jpg') as path:
...     # do something here
- Yields
Iterable[str] – Only yield one path.
- get_text(filepath: Union[str, pathlib.Path], encoding='utf-8') → str[source]¶
Read data from a given filepath with ‘r’ mode.
- Parameters
filepath (str or Path) – Path to read data.
encoding (str) – The encoding format used to open the filepath. Default: ‘utf-8’.
- Returns
Expected text read from filepath.
- Return type
str
- classmethod infer_client(file_client_args: Optional[dict] = None, uri: Optional[Union[str, pathlib.Path]] = None) → mmcv.fileio.file_client.FileClient[source]¶
Infer a suitable file client based on the URI and arguments.
- Parameters
file_client_args (dict, optional) – Arguments to instantiate a FileClient. Default: None.
uri (str | Path, optional) – Uri to be parsed that contains the file prefix. Default: None.
Examples
>>> uri = 's3://path/of/your/file'
>>> file_client = FileClient.infer_client(uri=uri)
>>> file_client_args = {'backend': 'petrel'}
>>> file_client = FileClient.infer_client(file_client_args)
- Returns
Instantiated FileClient object.
- Return type
FileClient
- isdir(filepath: Union[str, pathlib.Path]) → bool[source]¶
Check whether a file path is a directory.
- Parameters
filepath (str or Path) – Path to be checked whether it is a directory.
- Returns
Return True if filepath points to a directory, False otherwise.
- Return type
bool
- isfile(filepath: Union[str, pathlib.Path]) → bool[source]¶
Check whether a file path is a file.
- Parameters
filepath (str or Path) – Path to be checked whether it is a file.
- Returns
Return True if filepath points to a file, False otherwise.
- Return type
bool
- join_path(filepath: Union[str, pathlib.Path], *filepaths: Union[str, pathlib.Path]) → str[source]¶
Concatenate all file paths.
Join one or more filepath components intelligently. The return value is the concatenation of filepath and any members of *filepaths.
- Parameters
filepath (str or Path) – Path to be concatenated.
- Returns
The result of concatenation.
- Return type
str
- list_dir_or_file(dir_path: Union[str, pathlib.Path], list_dir: bool = True, list_file: bool = True, suffix: Optional[Union[str, Tuple[str]]] = None, recursive: bool = False) → Iterator[str][source]¶
Scan a directory to find directories or files of interest, in arbitrary order.
Note
list_dir_or_file() returns the path relative to dir_path.
- Parameters
dir_path (str | Path) – Path of the directory.
list_dir (bool) – List the directories. Default: True.
list_file (bool) – List the path of files. Default: True.
suffix (str or tuple[str], optional) – File suffix that we are interested in. Default: None.
recursive (bool) – If set to True, recursively scan the directory. Default: False.
- Yields
Iterable[str] – A relative path to dir_path.
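A usage sketch (hypothetical directory layout):

>>> file_client = FileClient(backend='disk')
>>> # list all '.jpg' files under 'data/', recursively
>>> for path in file_client.list_dir_or_file(
...         'data/', list_dir=False, suffix='.jpg', recursive=True):
...     print(path)  # paths are relative to 'data/'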
- static parse_uri_prefix(uri: Union[str, pathlib.Path]) → Optional[str][source]¶
Parse the prefix of a uri.
- Parameters
uri (str | Path) – Uri to be parsed that contains the file prefix.
Examples
>>> FileClient.parse_uri_prefix('s3://path/of/your/file')
's3'
- Returns
Return the prefix of uri if the uri contains ‘://’, else None.
- Return type
str | None
- put(obj: bytes, filepath: Union[str, pathlib.Path]) → None[source]¶
Write data to a given filepath with ‘wb’ mode.
Note
put should create a directory if the directory of filepath does not exist.
- Parameters
obj (bytes) – Data to be written.
filepath (str or Path) – Path to write data.
- put_text(obj: str, filepath: Union[str, pathlib.Path]) → None[source]¶
Write data to a given filepath with ‘w’ mode.
Note
put_text should create a directory if the directory of filepath does not exist.
- Parameters
obj (str) – Data to be written.
filepath (str or Path) – Path to write data.
encoding (str, optional) – The encoding format used to open the filepath. Default: ‘utf-8’.
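For example (a sketch using the disk backend):

>>> file_client = FileClient(backend='disk')
>>> file_client.put(b'hello world', '/path/of/your/file')
>>> file_client.put_text('hello world', '/path/of/your/file.txt')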
- classmethod register_backend(name, backend=None, force=False, prefixes=None)[source]¶
Register a backend to FileClient.
This method can be used as a normal class method or a decorator.
class NewBackend(BaseStorageBackend):

    def get(self, filepath):
        return filepath

    def get_text(self, filepath):
        return filepath

FileClient.register_backend('new', NewBackend)
or
@FileClient.register_backend('new')
class NewBackend(BaseStorageBackend):

    def get(self, filepath):
        return filepath

    def get_text(self, filepath):
        return filepath
- Parameters
name (str) – The name of the registered backend.
backend (class, optional) – The backend class to be registered, which must be a subclass of BaseStorageBackend. When this method is used as a decorator, backend is None. Defaults to None.
force (bool, optional) – Whether to override the backend if the name has already been registered. Defaults to False.
prefixes (str or list[str] or tuple[str], optional) – The prefixes of the registered storage backend. Default: None. New in version 1.3.15.
- mmcv.fileio.dict_from_file(filename: Union[str, pathlib.Path], key_type: type = <class 'str'>, encoding: str = 'utf-8', file_client_args: Optional[Dict] = None) → Dict[source]¶
Load a text file and parse the content as a dict.
Each line of the text file will be two or more columns split by whitespace or tabs. The first column will be parsed as dict keys, and the following columns will be parsed as dict values.
Note
In v1.3.16 and later, dict_from_file supports loading a text file which can be stored in different backends and parsing the content as a dict.
- Parameters
filename (str) – Filename.
key_type (type) – Type of the dict keys. str is used by default and type conversion will be performed if specified.
encoding (str) – Encoding used to open the file. Default utf-8.
file_client_args (dict, optional) – Arguments to instantiate a FileClient. See mmcv.fileio.FileClient for details. Default: None.
Examples
>>> dict_from_file('/path/of/your/file')  # disk
{'key1': 'value1', 'key2': 'value2'}
>>> dict_from_file('s3://path/of/your/file')  # ceph or petrel
{'key1': 'value1', 'key2': 'value2'}
- Returns
The parsed contents.
- Return type
dict
- mmcv.fileio.dump(obj: Any, file: Optional[Union[str, pathlib.Path, TextIO, _io.StringIO, _io.BytesIO]] = None, file_format: Optional[str] = None, file_client_args: Optional[Dict] = None, **kwargs)[source]¶
Dump data to json/yaml/pickle strings or files.
This method provides a unified api for dumping data as strings or to files, and also supports custom arguments for each file format.
Note
In v1.3.16 and later, dump supports dumping data as strings or to files which can be saved to different backends.
- Parameters
obj (any) – The Python object to be dumped.
file (str or Path or file-like object, optional) – If not specified, then the object is dumped to a str, otherwise to a file specified by the filename or file-like object.
file_format (str, optional) – Same as load().
file_client_args (dict, optional) – Arguments to instantiate a FileClient. See mmcv.fileio.FileClient for details. Default: None.
Examples
>>> dump('hello world', '/path/of/your/file')  # disk
>>> dump('hello world', 's3://path/of/your/file')  # ceph or petrel
- Returns
True for success, False otherwise.
- Return type
bool
- mmcv.fileio.list_from_file(filename: Union[str, pathlib.Path], prefix: str = '', offset: int = 0, max_num: int = 0, encoding: str = 'utf-8', file_client_args: Optional[Dict] = None) → List[source]¶
Load a text file and parse the content as a list of strings.
Note
In v1.3.16 and later, list_from_file supports loading a text file which can be stored in different backends and parsing the content as a list of strings.
- Parameters
filename (str) – Filename.
prefix (str) – The prefix to be inserted to the beginning of each item.
offset (int) – The offset of lines.
max_num (int) – The maximum number of lines to be read; zero and negative values mean no limitation.
encoding (str) – Encoding used to open the file. Default utf-8.
file_client_args (dict, optional) – Arguments to instantiate a FileClient. See mmcv.fileio.FileClient for details. Default: None.
Examples
>>> list_from_file('/path/of/your/file')  # disk
['hello', 'world']
>>> list_from_file('s3://path/of/your/file')  # ceph or petrel
['hello', 'world']
- Returns
A list of strings.
- Return type
list[str]
- mmcv.fileio.load(file: Union[str, pathlib.Path, TextIO, _io.StringIO, _io.BytesIO], file_format: Optional[str] = None, file_client_args: Optional[Dict] = None, **kwargs)[source]¶
Load data from json/yaml/pickle files.
This method provides a unified api for loading data from serialized files.
Note
In v1.3.16 and later, load supports loading data from serialized files which can be stored in different backends.
- Parameters
file (str or Path or file-like object) – Filename or a file-like object.
file_format (str, optional) – If not specified, the file format will be inferred from the file extension, otherwise use the specified one. Currently supported formats include “json”, “yaml/yml” and “pickle/pkl”.
file_client_args (dict, optional) – Arguments to instantiate a FileClient. See mmcv.fileio.FileClient for details. Default: None.
Examples
>>> load('/path/of/your/file')  # file is stored on disk
>>> load('https://path/of/your/file')  # file is stored on the Internet
>>> load('s3://path/of/your/file')  # file is stored in petrel
- Returns
The content from the file.
image¶
- mmcv.image.adjust_brightness(img, factor=1.0, backend=None)[source]¶
Adjust image brightness.
This function controls the brightness of an image. An enhancement factor of 0.0 gives a black image. A factor of 1.0 gives the original image. This function blends the source image and the degenerated black image:
\[output = img * factor + degenerated * (1 - factor)\]
- Parameters
img (ndarray) – Image to be brightened.
factor (float) – A value that controls the enhancement. Factor 1.0 returns the original image, lower factors mean less color (brightness, contrast, etc.), and higher values more. Default 1.
backend (str | None) – The image processing backend type. Options are cv2, pillow, None. If backend is None, the global imread_backend specified by mmcv.use_backend() will be used. Defaults to None.
- Returns
The brightened image.
- Return type
ndarray
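As a worked illustration of the blend formula (plain numpy, not the library implementation): for brightness the degenerated image is all black, so the blend reduces to scaling pixel values by factor.

>>> import numpy as np
>>> img = np.array([[100., 200.]])
>>> degenerated = np.zeros_like(img)  # black image for brightness
>>> factor = 0.5
>>> img * factor + degenerated * (1 - factor)
array([[ 50., 100.]])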
- mmcv.image.adjust_color(img, alpha=1, beta=None, gamma=0, backend=None)[source]¶
It blends the source image and its gray image:
\[output = img * alpha + gray\_img * beta + gamma\]
- Parameters
img (ndarray) – The input source image.
alpha (int | float) – Weight for the source image. Default 1.
beta (int | float) – Weight for the converted gray image. If None, it’s assigned the value (1 - alpha).
gamma (int | float) – Scalar added to each sum. Same as cv2.addWeighted(). Default 0.
backend (str | None) – The image processing backend type. Options are cv2, pillow, None. If backend is None, the global imread_backend specified by mmcv.use_backend() will be used. Defaults to None.
- Returns
Colored image which has the same size and dtype as input.
- Return type
ndarray
- mmcv.image.adjust_contrast(img, factor=1.0, backend=None)[source]¶
Adjust image contrast.
This function controls the contrast of an image. An enhancement factor of 0.0 gives a solid grey image. A factor of 1.0 gives the original image. It blends the source image and the degenerated mean image:
\[output = img * factor + degenerated * (1 - factor)\]
- Parameters
img (ndarray) – Image to be contrasted. BGR order.
factor (float) – Same as mmcv.adjust_brightness().
backend (str | None) – The image processing backend type. Options are cv2, pillow, None. If backend is None, the global imread_backend specified by mmcv.use_backend() will be used. Defaults to None.
- Returns
The contrasted image.
- Return type
ndarray
- mmcv.image.adjust_hue(img: numpy.ndarray, hue_factor: float, backend: Optional[str] = None) → numpy.ndarray[source]¶
Adjust hue of an image.
The image hue is adjusted by converting the image to HSV and cyclically shifting the intensities in the hue channel (H). The image is then converted back to the original image mode.
hue_factor is the amount of shift in the H channel and must be in the interval [-0.5, 0.5].
Modified from https://github.com/pytorch/vision/blob/main/torchvision/transforms/functional.py
- Parameters
img (ndarray) – Image to be adjusted.
hue_factor (float) – How much to shift the hue channel. Should be in [-0.5, 0.5]. 0.5 and -0.5 give complete reversal of hue channel in HSV space in positive and negative direction respectively. 0 means no shift. Therefore, both -0.5 and 0.5 will give an image with complementary colors while 0 gives the original image.
backend (str | None) – The image processing backend type. Options are cv2, pillow, None. If backend is None, the global imread_backend specified by mmcv.use_backend() will be used. Defaults to None.
- Returns
Hue adjusted image.
- Return type
ndarray
- mmcv.image.adjust_lighting(img, eigval, eigvec, alphastd=0.1, to_rgb=True)[source]¶
AlexNet-style PCA jitter.
This data augmentation is proposed in ImageNet Classification with Deep Convolutional Neural Networks.
- Parameters
img (ndarray) – Image whose lighting is to be adjusted. BGR order.
eigval (ndarray) – The eigenvalues of the covariance matrix of pixel values.
eigvec (ndarray) – The eigenvectors of the covariance matrix of pixel values.
alphastd (float) – The standard deviation for the distribution of alpha. Defaults to 0.1.
to_rgb (bool) – Whether to convert img to rgb.
- Returns
The adjusted image.
- Return type
ndarray
- mmcv.image.adjust_sharpness(img, factor=1.0, kernel=None)[source]¶
Adjust image sharpness.
This function controls the sharpness of an image. An enhancement factor of 0.0 gives a blurred image. A factor of 1.0 gives the original image. And a factor of 2.0 gives a sharpened image. It blends the source image and the degenerated mean image:
\[output = img * factor + degenerated * (1 - factor)\]
- Parameters
img (ndarray) – Image to be sharpened. BGR order.
factor (float) – Same as mmcv.adjust_brightness().
kernel (np.ndarray, optional) – Filter kernel to be applied on the img to obtain the degenerated img. Defaults to None.
Note
No value sanity check is enforced on the kernel set by users. So with an inappropriate kernel, adjust_sharpness may fail to perform the function its name indicates and instead end up performing whatever transform the kernel determines.
- Returns
The sharpened image.
- Return type
ndarray
- mmcv.image.auto_contrast(img, cutoff=0)[source]¶
Auto adjust image contrast.
This function maximizes (normalizes) image contrast by first removing the cutoff percent of the lightest and darkest pixels from the histogram and remapping the image so that the darkest pixel becomes black (0) and the lightest becomes white (255).
- Parameters
img (ndarray) – Image to be contrasted. BGR order.
cutoff (int | float | tuple) – The cutoff percent of the lightest and darkest pixels to be removed. If given as a tuple, it shall be (low, high). Otherwise, the single value will be used for both. Defaults to 0.
- Returns
The contrasted image.
- Return type
ndarray
- mmcv.image.bgr2gray(img: numpy.ndarray, keepdim: bool = False) → numpy.ndarray[source]¶
Convert a BGR image to grayscale image.
- Parameters
img (ndarray) – The input image.
keepdim (bool) – If False (by default), then return the grayscale image with 2 dims, otherwise 3 dims.
- Returns
The converted grayscale image.
- Return type
ndarray
- mmcv.image.bgr2hls(img: numpy.ndarray) → numpy.ndarray¶
Convert a BGR image to an HLS image.
- Parameters
img (ndarray or str) – The input image.
- Returns
The converted HLS image.
- Return type
ndarray
- mmcv.image.bgr2hsv(img: numpy.ndarray) → numpy.ndarray¶
Convert a BGR image to an HSV image.
- Parameters
img (ndarray or str) – The input image.
- Returns
The converted HSV image.
- Return type
ndarray
- mmcv.image.bgr2rgb(img: numpy.ndarray) → numpy.ndarray¶
Convert a BGR image to an RGB image.
- Parameters
img (ndarray or str) – The input image.
- Returns
The converted RGB image.
- Return type
ndarray
- mmcv.image.bgr2ycbcr(img: numpy.ndarray, y_only: bool = False) → numpy.ndarray[source]¶
Convert a BGR image to YCbCr image.
The bgr version of rgb2ycbcr. It implements the ITU-R BT.601 conversion for standard-definition television. See more details in https://en.wikipedia.org/wiki/YCbCr#ITU-R_BT.601_conversion.
It differs from a similar function in cv2.cvtColor: BGR <-> YCrCb. In OpenCV, it implements a JPEG conversion. See more details in https://en.wikipedia.org/wiki/YCbCr#JPEG_conversion.
- Parameters
img (ndarray) – The input image. It accepts: 1. np.uint8 type with range [0, 255]; 2. np.float32 type with range [0, 1].
y_only (bool) – Whether to only return Y channel. Default: False.
- Returns
The converted YCbCr image. The output image has the same type and range as input image.
- Return type
ndarray
- mmcv.image.clahe(img, clip_limit=40.0, tile_grid_size=(8, 8))[source]¶
Use CLAHE method to process the image.
See Zuiderveld, K., “Contrast Limited Adaptive Histogram Equalization”, Graphics Gems, 1994: 474-485, for more information.
- Parameters
img (ndarray) – Image to be processed.
clip_limit (float) – Threshold for contrast limiting. Default: 40.0.
tile_grid_size (tuple[int]) – Size of grid for histogram equalization. Input image will be divided into equally sized rectangular tiles. It defines the number of tiles in row and column. Default: (8, 8).
- Returns
The processed image.
- Return type
ndarray
- mmcv.image.cutout(img: numpy.ndarray, shape: Union[int, Tuple[int, int]], pad_val: Union[int, float, tuple] = 0) → numpy.ndarray[source]¶
Randomly cut out a rectangle from the original img.
- Parameters
img (ndarray) – Image to be cutout.
shape (int | tuple[int]) – Expected cutout shape (h, w). If given as an int, the value will be used for both h and w.
pad_val (int | float | tuple[int | float]) – Values to be filled in the cut area. Defaults to 0.
- Returns
The cutout image.
- Return type
ndarray
- mmcv.image.gray2bgr(img: numpy.ndarray) → numpy.ndarray[source]¶
Convert a grayscale image to BGR image.
- Parameters
img (ndarray) – The input image.
- Returns
The converted BGR image.
- Return type
ndarray
- mmcv.image.gray2rgb(img: numpy.ndarray) → numpy.ndarray[source]¶
Convert a grayscale image to RGB image.
- Parameters
img (ndarray) – The input image.
- Returns
The converted RGB image.
- Return type
ndarray
- mmcv.image.hls2bgr(img: numpy.ndarray) → numpy.ndarray¶
Convert an HLS image to a BGR image.
- Parameters
img (ndarray or str) – The input image.
- Returns
The converted BGR image.
- Return type
ndarray
- mmcv.image.hsv2bgr(img: numpy.ndarray) → numpy.ndarray¶
Convert an HSV image to a BGR image.
- Parameters
img (ndarray or str) – The input image.
- Returns
The converted BGR image.
- Return type
ndarray
- mmcv.image.imconvert(img: numpy.ndarray, src: str, dst: str) → numpy.ndarray[source]¶
Convert an image from the src colorspace to dst colorspace.
- Parameters
img (ndarray) – The input image.
src (str) – The source colorspace, e.g., ‘rgb’, ‘hsv’.
dst (str) – The destination colorspace, e.g., ‘rgb’, ‘hsv’.
- Returns
The converted image.
- Return type
ndarray
- mmcv.image.imcrop(img: numpy.ndarray, bboxes: numpy.ndarray, scale: float = 1.0, pad_fill: Optional[Union[float, list]] = None) → Union[numpy.ndarray, List[numpy.ndarray]][source]¶
Crop image patches.
3 steps: scale the bboxes -> clip bboxes -> crop and pad.
- Parameters
img (ndarray) – Image to be cropped.
bboxes (ndarray) – Shape (k, 4) or (4, ), location of cropped bboxes.
scale (float, optional) – Scale ratio of bboxes, the default value 1.0 means no scaling.
pad_fill (Number | list[Number]) – Value to be filled for padding. Default: None, which means no padding.
- Returns
The cropped image patches.
- Return type
list[ndarray] | ndarray
- mmcv.image.imequalize(img)[source]¶
Equalize the image histogram.
This function applies a non-linear mapping to the input image, in order to create a uniform distribution of grayscale values in the output image.
- Parameters
img (ndarray) – Image to be equalized.
- Returns
The equalized image.
- Return type
ndarray
- mmcv.image.imflip(img: numpy.ndarray, direction: str = 'horizontal') → numpy.ndarray[source]¶
Flip an image horizontally or vertically.
- Parameters
img (ndarray) – Image to be flipped.
direction (str) – The flip direction, either “horizontal” or “vertical” or “diagonal”.
- Returns
The flipped image.
- Return type
ndarray
- mmcv.image.imflip_(img: numpy.ndarray, direction: str = 'horizontal') → numpy.ndarray[source]¶
Inplace flip an image horizontally or vertically.
- Parameters
img (ndarray) – Image to be flipped.
direction (str) – The flip direction, either “horizontal” or “vertical” or “diagonal”.
- Returns
The flipped image (inplace).
- Return type
ndarray
- mmcv.image.imfrombytes(content: bytes, flag: str = 'color', channel_order: str = 'bgr', backend: Optional[str] = None) → numpy.ndarray[source]¶
Read an image from bytes.
- Parameters
content (bytes) – Image bytes got from files or other streams.
flag (str) – Same as imread().
channel_order (str) – The channel order of the output, candidates are ‘bgr’ and ‘rgb’. Default to ‘bgr’.
backend (str | None) – The image decoding backend type. Options are cv2, pillow, turbojpeg, tifffile, None. If backend is None, the global imread_backend specified by mmcv.use_backend() will be used. Default: None.
- Returns
Loaded image array.
- Return type
ndarray
Examples
>>> img_path = '/path/to/img.jpg'
>>> with open(img_path, 'rb') as f:
...     img_buff = f.read()
>>> img = mmcv.imfrombytes(img_buff)
>>> img = mmcv.imfrombytes(img_buff, flag='color', channel_order='rgb')
>>> img = mmcv.imfrombytes(img_buff, backend='pillow')
>>> img = mmcv.imfrombytes(img_buff, backend='cv2')
- mmcv.image.iminvert(img)[source]¶
Invert (negate) an image.
- Parameters
img (ndarray) – Image to be inverted.
- Returns
The inverted image.
- Return type
ndarray
- mmcv.image.imnormalize(img, mean, std, to_rgb=True)[source]¶
Normalize an image with mean and std.
- Parameters
img (ndarray) – Image to be normalized.
mean (ndarray) – The mean to be used for normalization.
std (ndarray) – The std to be used for normalization.
to_rgb (bool) – Whether to convert to rgb.
- Returns
The normalized image.
- Return type
ndarray
- mmcv.image.imnormalize_(img, mean, std, to_rgb=True)[source]¶
Inplace normalize an image with mean and std.
- Parameters
img (ndarray) – Image to be normalized.
mean (ndarray) – The mean to be used for normalization.
std (ndarray) – The std to be used for normalization.
to_rgb (bool) – Whether to convert to rgb.
- Returns
The normalized image.
- Return type
ndarray
- mmcv.image.impad(img: numpy.ndarray, *, shape: Optional[Tuple[int, int]] = None, padding: Optional[Union[int, tuple]] = None, pad_val: Union[float, List] = 0, padding_mode: str = 'constant') → numpy.ndarray[source]¶
Pad the given image to a certain shape or pad on all sides with specified padding mode and padding value.
- Parameters
img (ndarray) – Image to be padded.
shape (tuple[int]) – Expected padding shape (h, w). Default: None.
padding (int or tuple[int]) – Padding on each border. If a single int is provided this is used to pad all borders. If tuple of length 2 is provided this is the padding on left/right and top/bottom respectively. If a tuple of length 4 is provided this is the padding for the left, top, right and bottom borders respectively. Default: None. Note that shape and padding can not be both set.
pad_val (Number | Sequence[Number]) – Values to be filled in padding areas when padding_mode is ‘constant’. Default: 0.
padding_mode (str) – Type of padding. Should be: constant, edge, reflect or symmetric. Default: constant.
- constant: pads with a constant value, this value is specified with pad_val.
- edge: pads with the last value at the edge of the image.
- reflect: pads with reflection of image without repeating the last value on the edge. For example, padding [1, 2, 3, 4] with 2 elements on both sides in reflect mode will result in [3, 2, 1, 2, 3, 4, 3, 2].
- symmetric: pads with reflection of image repeating the last value on the edge. For example, padding [1, 2, 3, 4] with 2 elements on both sides in symmetric mode will result in [2, 1, 1, 2, 3, 4, 4, 3].
- Returns
The padded image.
- Return type
ndarray
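The reflect and symmetric examples above can be reproduced with numpy’s analogous padding modes (an illustration only, not the impad implementation):

>>> import numpy as np
>>> np.pad([1, 2, 3, 4], 2, mode='reflect')
array([3, 2, 1, 2, 3, 4, 3, 2])
>>> np.pad([1, 2, 3, 4], 2, mode='symmetric')
array([2, 1, 1, 2, 3, 4, 4, 3])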
- mmcv.image.impad_to_multiple(img: numpy.ndarray, divisor: int, pad_val: Union[float, List] = 0) → numpy.ndarray[source]¶
Pad an image to ensure each edge is a multiple of some number.
- Parameters
img (ndarray) – Image to be padded.
divisor (int) – Padded image edges will be a multiple of divisor.
pad_val (Number | Sequence[Number]) – Same as impad().
- Returns
The padded image.
- Return type
ndarray
- mmcv.image.imread(img_or_path: Union[numpy.ndarray, str, pathlib.Path], flag: str = 'color', channel_order: str = 'bgr', backend: Optional[str] = None, file_client_args: Optional[dict] = None) → numpy.ndarray[source]¶
Read an image.
Note
In v1.4.1 and later, the file_client_args parameter was added.
- Parameters
img_or_path (ndarray or str or Path) – Either a numpy array or str or pathlib.Path. If it is a numpy array (loaded image), then it will be returned as is.
flag (str) – Flags specifying the color type of a loaded image, candidates are color, grayscale, unchanged, color_ignore_orientation and grayscale_ignore_orientation. By default, cv2 and pillow backend would rotate the image according to its EXIF info unless called with unchanged or *_ignore_orientation flags. turbojpeg and tifffile backend always ignore image’s EXIF info regardless of the flag. The turbojpeg backend only supports color and grayscale.
channel_order (str) – Order of channel, candidates are bgr and rgb.
backend (str | None) – The image decoding backend type. Options are cv2, pillow, turbojpeg, tifffile, None. If backend is None, the global imread_backend specified by mmcv.use_backend() will be used. Default: None.
file_client_args (dict | None) – Arguments to instantiate a FileClient. See mmcv.fileio.FileClient for details. Default: None.
- Returns
Loaded image array.
- Return type
ndarray
Examples
>>> import mmcv
>>> img_path = '/path/to/img.jpg'
>>> img = mmcv.imread(img_path)
>>> img = mmcv.imread(img_path, flag='color', channel_order='rgb',
...     backend='cv2')
>>> img = mmcv.imread(img_path, flag='color', channel_order='bgr',
...     backend='pillow')
>>> s3_img_path = 's3://bucket/img.jpg'
>>> # infer the file backend by the prefix s3
>>> img = mmcv.imread(s3_img_path)
>>> # manually set the file backend petrel
>>> img = mmcv.imread(s3_img_path, file_client_args={
...     'backend': 'petrel'})
>>> http_img_path = 'http://path/to/img.jpg'
>>> img = mmcv.imread(http_img_path)
>>> img = mmcv.imread(http_img_path, file_client_args={
...     'backend': 'http'})
- mmcv.image.imrescale(img: numpy.ndarray, scale: Union[float, Tuple[int, int]], return_scale: bool = False, interpolation: str = 'bilinear', backend: Optional[str] = None) → Union[numpy.ndarray, Tuple[numpy.ndarray, float]][source]¶
Resize image while keeping the aspect ratio.
- Parameters
img (ndarray) – The input image.
scale (float | tuple[int]) – The scaling factor or maximum size. If it is a float number, then the image will be rescaled by this factor, else if it is a tuple of 2 integers, then the image will be rescaled as large as possible within the scale.
return_scale (bool) – Whether to return the scaling factor besides the rescaled image.
interpolation (str) – Same as resize().
backend (str | None) – Same as resize().
- Returns
The rescaled image.
- Return type
ndarray
- mmcv.image.imresize(img: numpy.ndarray, size: Tuple[int, int], return_scale: bool = False, interpolation: str = 'bilinear', out: Optional[numpy.ndarray] = None, backend: Optional[str] = None) → Union[Tuple[numpy.ndarray, float, float], numpy.ndarray][source]¶
Resize image to a given size.
- Parameters
img (ndarray) – The input image.
size (tuple[int]) – Target size (w, h).
return_scale (bool) – Whether to return w_scale and h_scale.
interpolation (str) – Interpolation method, accepted values are “nearest”, “bilinear”, “bicubic”, “area”, “lanczos” for ‘cv2’ backend, “nearest”, “bilinear” for ‘pillow’ backend.
out (ndarray) – The output destination.
backend (str | None) – The image resize backend type. Options are cv2, pillow, None. If backend is None, the global imread_backend specified by
mmcv.use_backend()
will be used. Default: None.
- Returns
(resized_img, w_scale, h_scale) or resized_img.
- Return type
tuple | ndarray
- mmcv.image.imresize_like(img: numpy.ndarray, dst_img: numpy.ndarray, return_scale: bool = False, interpolation: str = 'bilinear', backend: Optional[str] = None) → Union[Tuple[numpy.ndarray, float, float], numpy.ndarray][source]¶
Resize image to the same size of a given image.
- Parameters
img (ndarray) – The input image.
dst_img (ndarray) – The target image.
return_scale (bool) – Whether to return w_scale and h_scale.
interpolation (str) – Same as resize().
backend (str | None) – Same as resize().
- Returns
(resized_img, w_scale, h_scale) or resized_img.
- Return type
tuple or ndarray
- mmcv.image.imresize_to_multiple(img: numpy.ndarray, divisor: Union[int, Tuple[int, int]], size: Optional[Union[int, Tuple[int, int]]] = None, scale_factor: Optional[Union[float, Tuple[float, float]]] = None, keep_ratio: bool = False, return_scale: bool = False, interpolation: str = 'bilinear', out: Optional[numpy.ndarray] = None, backend: Optional[str] = None) → Union[Tuple[numpy.ndarray, float, float], numpy.ndarray][source]¶
Resize image according to a given size or scale factor and then round up the resized or rescaled image size to the nearest value that can be divided by the divisor.
- Parameters
img (ndarray) – The input image.
divisor (int | tuple) – Resized image size will be a multiple of divisor. If divisor is a tuple, divisor should be (w_divisor, h_divisor).
size (None | int | tuple[int]) – Target size (w, h). Default: None.
scale_factor (None | float | tuple[float]) – Multiplier for spatial size. Should match input size if it is a tuple and the 2D style is (w_scale_factor, h_scale_factor). Default: None.
keep_ratio (bool) – Whether to keep the aspect ratio when resizing the image. Default: False.
return_scale (bool) – Whether to return w_scale and h_scale.
interpolation (str) – Interpolation method, accepted values are “nearest”, “bilinear”, “bicubic”, “area”, “lanczos” for ‘cv2’ backend, “nearest”, “bilinear” for ‘pillow’ backend.
out (ndarray) – The output destination.
backend (str | None) – The image resize backend type. Options are cv2, pillow, None. If backend is None, the global imread_backend specified by
mmcv.use_backend()
will be used. Default: None.
- Returns
(resized_img, w_scale, h_scale) or resized_img.
- Return type
tuple | ndarray
- mmcv.image.imrotate(img: numpy.ndarray, angle: float, center: Optional[Tuple[float, float]] = None, scale: float = 1.0, border_value: int = 0, interpolation: str = 'bilinear', auto_bound: bool = False, border_mode: str = 'constant') → numpy.ndarray[source]¶
Rotate an image.
- Parameters
img (np.ndarray) – Image to be rotated.
angle (float) – Rotation angle in degrees, positive values mean clockwise rotation.
center (tuple[float], optional) – Center point (w, h) of the rotation in the source image. If not specified, the center of the image will be used.
scale (float) – Isotropic scale factor.
border_value (int) – Border value used in case of a constant border. Defaults to 0.
interpolation (str) – Same as resize().
auto_bound (bool) – Whether to adjust the image size to cover the whole rotated image.
border_mode (str) – Pixel extrapolation method. Defaults to ‘constant’.
- Returns
The rotated image.
- Return type
np.ndarray
- mmcv.image.imshear(img: numpy.ndarray, magnitude: Union[int, float], direction: str = 'horizontal', border_value: Union[int, Tuple[int, int]] = 0, interpolation: str = 'bilinear') → numpy.ndarray[source]¶
Shear an image.
- Parameters
img (ndarray) – Image to be sheared with format (h, w) or (h, w, c).
magnitude (int | float) – The magnitude used for shear.
direction (str) – The flip direction, either “horizontal” or “vertical”.
border_value (int | tuple[int]) – Value used in case of a constant border.
interpolation (str) – Same as resize().
- Returns
The sheared image.
- Return type
ndarray
- mmcv.image.imtranslate(img: numpy.ndarray, offset: Union[int, float], direction: str = 'horizontal', border_value: Union[int, tuple] = 0, interpolation: str = 'bilinear') → numpy.ndarray[source]¶
Translate an image.
- Parameters
img (ndarray) – Image to be translated with format (h, w) or (h, w, c).
offset (int | float) – The offset used for translate.
direction (str) – The translate direction, either “horizontal” or “vertical”.
border_value (int | tuple[int]) – Value used in case of a constant border.
interpolation (str) – Same as resize().
- Returns
The translated image.
- Return type
ndarray
- mmcv.image.imwrite(img: numpy.ndarray, file_path: str, params: Optional[list] = None, auto_mkdir: Optional[bool] = None, file_client_args: Optional[dict] = None) → bool[source]¶
Write image to file.
Note
In v1.4.1 and later, the file_client_args parameter was added.
Warning
The parameter auto_mkdir will be deprecated in the future, and every file client will make directories automatically.
- Parameters
img (ndarray) – Image array to be written.
file_path (str) – Image file path.
params (None or list) – Same as the OpenCV imwrite() interface.
auto_mkdir (bool) – If the parent folder of file_path does not exist, whether to create it automatically. It will be deprecated.
file_client_args (dict | None) – Arguments to instantiate a FileClient. See mmcv.fileio.FileClient for details. Default: None.
- Returns
Successful or not.
- Return type
bool
Examples
>>> # write to hard disk client
>>> ret = mmcv.imwrite(img, '/path/to/img.jpg')
>>> # infer the file backend by the prefix s3
>>> ret = mmcv.imwrite(img, 's3://bucket/img.jpg')
>>> # manually set the file backend petrel
>>> ret = mmcv.imwrite(img, 's3://bucket/img.jpg', file_client_args={
...     'backend': 'petrel'})
- mmcv.image.lut_transform(img, lut_table)[source]¶
Transform array by look-up table.
The function lut_transform fills the output array with values from the look-up table. Indices of the entries are taken from the input array.
- Parameters
img (ndarray) – Image to be transformed.
lut_table (ndarray) – Look-up table of 256 elements; in case of a multi-channel input array, the table should either have a single channel (in which case the same table is used for all channels) or the same number of channels as the input array.
- Returns
The transformed image.
- Return type
ndarray
- mmcv.image.posterize(img, bits)[source]¶
Posterize an image (reduce the number of bits for each color channel).
- Parameters
img (ndarray) – Image to be posterized.
bits (int) – Number of bits (1 to 8) to use for posterizing.
- Returns
The posterized image.
- Return type
ndarray
- mmcv.image.rescale_size(old_size: tuple, scale: Union[float, int, tuple], return_scale: bool = False) → tuple[source]¶
Calculate the new size to be rescaled to.
- Parameters
old_size (tuple[int]) – The old size (w, h) of image.
scale (float | tuple[int]) – The scaling factor or maximum size. If it is a float number, then the image will be rescaled by this factor, else if it is a tuple of 2 integers, then the image will be rescaled as large as possible within the scale.
return_scale (bool) – Whether to return the scaling factor besides the rescaled image size.
- Returns
The new rescaled image size.
- Return type
tuple[int]
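For example (a sketch; exact rounding may differ slightly across versions):

>>> import mmcv
>>> mmcv.rescale_size((1000, 800), 0.5)
(500, 400)
>>> # rescale as large as possible within (640, 480), keeping aspect ratio
>>> mmcv.rescale_size((1000, 800), (640, 480))
(600, 480)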
- mmcv.image.rgb2bgr(img: numpy.ndarray) → numpy.ndarray¶
Convert an RGB image to a BGR image.
- Parameters
img (ndarray or str) – The input image.
- Returns
The converted BGR image.
- Return type
ndarray
- mmcv.image.rgb2gray(img: numpy.ndarray, keepdim: bool = False) → numpy.ndarray[source]¶
Convert an RGB image to a grayscale image.
- Parameters
img (ndarray) – The input image.
keepdim (bool) – If False (by default), then return the grayscale image with 2 dims, otherwise 3 dims.
- Returns
The converted grayscale image.
- Return type
ndarray
- mmcv.image.rgb2ycbcr(img: numpy.ndarray, y_only: bool = False) → numpy.ndarray[source]¶
Convert an RGB image to a YCbCr image.
This function produces the same results as Matlab’s rgb2ycbcr function. It implements the ITU-R BT.601 conversion for standard-definition television. See more details in https://en.wikipedia.org/wiki/YCbCr#ITU-R_BT.601_conversion.
It differs from a similar function in cv2.cvtColor: RGB <-> YCrCb. In OpenCV, it implements a JPEG conversion. See more details in https://en.wikipedia.org/wiki/YCbCr#JPEG_conversion.
- Parameters
img (ndarray) – The input image. It accepts: 1. np.uint8 type with range [0, 255]; 2. np.float32 type with range [0, 1].
y_only (bool) – Whether to only return Y channel. Default: False.
- Returns
The converted YCbCr image. The output image has the same type and range as input image.
- Return type
ndarray
- mmcv.image.solarize(img, thr=128)[source]¶
Solarize an image (invert all pixel values above a threshold).
- Parameters
img (ndarray) – Image to be solarized.
thr (int) – Threshold for solarizing (0 - 255).
- Returns
The solarized image.
- Return type
ndarray
- mmcv.image.tensor2imgs(tensor, mean: Optional[tuple] = None, std: Optional[tuple] = None, to_rgb: bool = True) → list[source]¶
Convert tensor to 3-channel images or 1-channel gray images.
- Parameters
tensor (torch.Tensor) – Tensor that contains multiple images, shape (N, C, H, W). C can be either 3 or 1.
mean (tuple[float], optional) – Mean of images. If None, (0, 0, 0) will be used for 3-channel tensors and (0,) for 1-channel tensors. Defaults to None.
std (tuple[float], optional) – Standard deviation of images. If None, (1, 1, 1) will be used for 3-channel tensors and (1,) for 1-channel tensors. Defaults to None.
to_rgb (bool, optional) – Whether the tensor was converted to RGB format in the first place. If so, convert it back to BGR. For 1-channel tensors, it must be False. Defaults to True.
- Returns
A list that contains multiple images.
- Return type
list[np.ndarray]
- mmcv.image.use_backend(backend: str) → None[source]¶
Select a backend for image decoding.
- Parameters
backend (str) – The image decoding backend type. Options are cv2, pillow, turbojpeg (see https://github.com/lilohuang/PyTurboJPEG) and tifffile. turbojpeg is faster but it only supports the .jpeg file format.
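A usage sketch:

>>> import mmcv
>>> mmcv.use_backend('pillow')  # decode images with Pillow from now on
>>> img = mmcv.imread('/path/to/img.jpg')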
- mmcv.image.ycbcr2bgr(img: numpy.ndarray) → numpy.ndarray[source]¶
Convert a YCbCr image to BGR image.
The bgr version of ycbcr2rgb. It implements the ITU-R BT.601 conversion for standard-definition television. See more details in https://en.wikipedia.org/wiki/YCbCr#ITU-R_BT.601_conversion.
It differs from a similar function in cv2.cvtColor: YCrCb <-> BGR. In OpenCV, it implements a JPEG conversion. See more details in https://en.wikipedia.org/wiki/YCbCr#JPEG_conversion.
- Parameters
img (ndarray) – The input image. It accepts: 1. np.uint8 type with range [0, 255]; 2. np.float32 type with range [0, 1].
- Returns
The converted BGR image. The output image has the same type and range as input image.
- Return type
ndarray
- mmcv.image.ycbcr2rgb(img: numpy.ndarray) → numpy.ndarray[source]¶
Convert a YCbCr image to RGB image.
This function produces the same results as Matlab’s ycbcr2rgb function. It implements the ITU-R BT.601 conversion for standard-definition television. See more details in https://en.wikipedia.org/wiki/YCbCr#ITU-R_BT.601_conversion.
It differs from a similar function in cv2.cvtColor: YCrCb <-> RGB. In OpenCV, it implements a JPEG conversion. See more details in https://en.wikipedia.org/wiki/YCbCr#JPEG_conversion.
- Parameters
img (ndarray) – The input image. It accepts: 1. np.uint8 type with range [0, 255]; 2. np.float32 type with range [0, 1].
- Returns
The converted RGB image. The output image has the same type and range as input image.
- Return type
ndarray
video¶
- class mmcv.video.VideoReader(filename, cache_capacity=10)[source]¶
Video class with similar usage to a list object.
This video wrapper class provides convenient apis to access frames. There is a known issue with OpenCV’s VideoCapture class: jumping to a certain frame may be inaccurate. It is fixed in this class by checking the position after each jump. A cache is used when decoding videos, so if the same frame is visited a second time, it does not need to be decoded again if it is stored in the cache.
Examples
>>> import mmcv
>>> v = mmcv.VideoReader('sample.mp4')
>>> len(v)  # get the total frame number with `len()`
120
>>> for img in v:  # v is iterable
>>>     mmcv.imshow(img)
>>> v[5]  # get the 6th frame
- current_frame()[source]¶
Get the current frame (frame that is just visited).
- Returns
If the video is fresh, return None, otherwise return the frame.
- Return type
ndarray or None
- cvt2frames(frame_dir, file_start=0, filename_tmpl='{:06d}.jpg', start=0, max_num=0, show_progress=True)[source]¶
Convert a video to frame images.
- Parameters
frame_dir (str) – Output directory to store all the frame images.
file_start (int) – Filenames will start from the specified number.
filename_tmpl (str) – Filename template with the index as the placeholder.
start (int) – The starting frame index.
max_num (int) – Maximum number of frames to be written.
show_progress (bool) – Whether to show a progress bar.
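A usage sketch (hypothetical paths):

>>> v = mmcv.VideoReader('sample.mp4')
>>> # dump all frames to 'out_dir' as 000000.jpg, 000001.jpg, ...
>>> v.cvt2frames('out_dir')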
- property fourcc¶
“Four character code” of the video.
- Type
str
- property fps¶
FPS of the video.
- Type
float
- property frame_cnt¶
Total frames of the video.
- Type
int
- get_frame(frame_id)[source]¶
Get frame by index.
- Parameters
frame_id (int) – Index of the expected frame, 0-based.
- Returns
Return the frame if successful, otherwise None.
- Return type
ndarray or None
- property height¶
Height of video frames.
- Type
int
- property opened¶
Indicate whether the video is opened.
- Type
bool
- property position¶
Current cursor position, indicating which frame has been decoded.
- Type
int
- read()[source]¶
Read the next frame.
If the next frame has been decoded before and is in the cache, return it directly; otherwise decode it, cache it and return it.
- Returns
Return the frame if successful, otherwise None.
- Return type
ndarray or None
- property resolution¶
Video resolution (width, height).
- Type
tuple
- property vcap¶
The raw VideoCapture object.
- Type
cv2.VideoCapture
- property width¶
Width of video frames.
- Type
int
- mmcv.video.concat_video(video_list: List, out_file: str, vcodec: Optional[str] = None, acodec: Optional[str] = None, log_level: str = 'info', print_cmd: bool = False) → None[source]¶
Concatenate multiple videos into a single one.
- Parameters
video_list (list) – A list of video filenames.
out_file (str) – Output video filename.
vcodec (None or str) – Output video codec, None for unchanged.
acodec (None or str) – Output audio codec, None for unchanged.
log_level (str) – Logging level of ffmpeg.
print_cmd (bool) – Whether to print the final ffmpeg command.
- mmcv.video.convert_video(in_file: str, out_file: str, print_cmd: bool = False, pre_options: str = '', **kwargs) → None[source]¶
Convert a video with ffmpeg.
This provides a general api to ffmpeg, the executed command is:
`ffmpeg -y <pre_options> -i <in_file> <options> <out_file>`
Options (kwargs) are mapped to ffmpeg commands with the following rules:
key=val: “-key val”
key=True: “-key”
key=False: “”
- Parameters
in_file (str) – Input video filename.
out_file (str) – Output video filename.
pre_options (str) – Options appearing before “-i <in_file>”.
print_cmd (bool) – Whether to print the final ffmpeg command.
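For example, under the mapping rules above (a sketch; the codec choice is arbitrary):

>>> # roughly runs: ffmpeg -y -i in.mp4 -vcodec h264 -an out.mp4
>>> mmcv.convert_video('in.mp4', 'out.mp4', vcodec='h264', an=True)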
- mmcv.video.cut_video(in_file: str, out_file: str, start: Optional[float] = None, end: Optional[float] = None, vcodec: Optional[str] = None, acodec: Optional[str] = None, log_level: str = 'info', print_cmd: bool = False) → None[source]¶
Cut a clip from a video.
- Parameters
in_file (str) – Input video filename.
out_file (str) – Output video filename.
start (None or float) – Start time (in seconds).
end (None or float) – End time (in seconds).
vcodec (None or str) – Output video codec, None for unchanged.
acodec (None or str) – Output audio codec, None for unchanged.
log_level (str) – Logging level of ffmpeg.
print_cmd (bool) – Whether to print the final ffmpeg command.
- mmcv.video.dequantize_flow(dx: numpy.ndarray, dy: numpy.ndarray, max_val: float = 0.02, denorm: bool = True) → numpy.ndarray[source]¶
Recover from quantized flow.
- Parameters
dx (ndarray) – Quantized dx.
dy (ndarray) – Quantized dy.
max_val (float) – Maximum value used when quantizing.
denorm (bool) – Whether to multiply flow values with width/height.
- Returns
Dequantized flow.
- Return type
ndarray
- mmcv.video.flow_from_bytes(content: bytes) → numpy.ndarray[source]¶
Read dense optical flow from bytes.
Note
This function for loading optical flow works for the FlyingChairs, FlyingThings3D, Sintel and FlyingChairsOcc datasets, but cannot load data from ChairsSDHom.
- Parameters
content (bytes) – Optical flow bytes got from files or other streams.
- Returns
Loaded optical flow with the shape (H, W, 2).
- Return type
ndarray
- mmcv.video.flow_warp(img: numpy.ndarray, flow: numpy.ndarray, filling_value: int = 0, interpolate_mode: str = 'nearest') → numpy.ndarray[source]¶
Use flow to warp img.
- Parameters
img (ndarray) – Image to be warped.
flow (ndarray) – Optical Flow.
filling_value (int) – The missing pixels will be set with filling_value.
interpolate_mode (str) – bilinear -> Bilinear Interpolation; nearest -> Nearest Neighbor.
- Returns
Warped image with the same shape as img.
- Return type
ndarray
- mmcv.video.flowread(flow_or_path: Union[numpy.ndarray, str], quantize: bool = False, concat_axis: int = 0, *args, **kwargs) → numpy.ndarray[source]¶
Read an optical flow map.
- Parameters
flow_or_path (ndarray or str) – A flow map or filepath.
quantize (bool) – Whether to read a quantized pair; if set to True, remaining args will be passed to dequantize_flow().
concat_axis (int) – The axis that dx and dy are concatenated, can be either 0 or 1. Ignored if quantize is False.
- Returns
Optical flow represented as a (h, w, 2) numpy array
- Return type
ndarray
- mmcv.video.flowwrite(flow: numpy.ndarray, filename: str, quantize: bool = False, concat_axis: int = 0, *args, **kwargs) → None[source]¶
Write optical flow to file.
If the flow is not quantized, it will be saved as a .flo file losslessly, otherwise a jpeg image which is lossy but of much smaller size. (dx and dy will be concatenated horizontally into a single image if quantize is True.)
- Parameters
flow (ndarray) – (h, w, 2) array of optical flow.
filename (str) – Output filepath.
quantize (bool) – Whether to quantize the flow and save it to 2 jpeg images. If set to True, remaining args will be passed to quantize_flow().
concat_axis (int) – The axis that dx and dy are concatenated, can be either 0 or 1. Ignored if quantize is False.
- mmcv.video.frames2video(frame_dir: str, video_file: str, fps: float = 30, fourcc: str = 'XVID', filename_tmpl: str = '{:06d}.jpg', start: int = 0, end: int = 0, show_progress: bool = True) → None[source]¶
Read the frame images from a directory and join them as a video.
- Parameters
frame_dir (str) – The directory containing video frames.
video_file (str) – Output filename.
fps (float) – FPS of the output video.
fourcc (str) – Fourcc of the output video, this should be compatible with the output file type.
filename_tmpl (str) – Filename template with the index as the variable.
start (int) – Starting frame index.
end (int) – Ending frame index.
show_progress (bool) – Whether to show a progress bar.
- mmcv.video.quantize_flow(flow: numpy.ndarray, max_val: float = 0.02, norm: bool = True) → tuple[source]¶
Quantize flow to [0, 255].
After this step, the size of flow will be much smaller, and can be dumped as jpeg images.
- Parameters
flow (ndarray) – (h, w, 2) array of optical flow.
max_val (float) – Maximum value of flow, values beyond [-max_val, max_val] will be truncated.
norm (bool) – Whether to divide flow values by image width/height.
- Returns
Quantized dx and dy.
- Return type
tuple[ndarray]
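A roundtrip sketch (assuming a flow array in memory; norm/denorm disabled so no width/height scaling is involved):

>>> import numpy as np
>>> from mmcv.video import quantize_flow, dequantize_flow
>>> flow = (np.random.rand(4, 4, 2).astype(np.float32) - 0.5) * 0.02
>>> dx, dy = quantize_flow(flow, max_val=0.02, norm=False)
>>> restored = dequantize_flow(dx, dy, max_val=0.02, denorm=False)
>>> restored.shape
(4, 4, 2)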
- mmcv.video.resize_video(in_file: str, out_file: str, size: Optional[tuple] = None, ratio: Optional[Union[tuple, float]] = None, keep_ar: bool = False, log_level: str = 'info', print_cmd: bool = False) → None[source]¶
Resize a video.
- Parameters
in_file (str) – Input video filename.
out_file (str) – Output video filename.
size (tuple) – Expected size (w, h), e.g., (320, 240) or (320, -1).
ratio (tuple or float) – Expected resize ratio, (2, 0.5) means (w*2, h*0.5).
keep_ar (bool) – Whether to keep original aspect ratio.
log_level (str) – Logging level of ffmpeg.
print_cmd (bool) – Whether to print the final ffmpeg command.
- mmcv.video.sparse_flow_from_bytes(content: bytes) → Tuple[numpy.ndarray, numpy.ndarray][source]¶
Read the optical flow in KITTI datasets from bytes.
This function is modified from the way RAFT loads the KITTI datasets.
- Parameters
content (bytes) – Optical flow bytes got from files or other streams.
- Returns
Loaded optical flow with the shape (H, W, 2) and flow valid mask with the shape (H, W).
- Return type
Tuple(ndarray, ndarray)
arraymisc¶
- mmcv.arraymisc.dequantize(arr: numpy.ndarray, min_val: Union[int, float], max_val: Union[int, float], levels: int, dtype=<class 'numpy.float64'>) → tuple[source]¶
Dequantize an array.
- Parameters
arr (ndarray) – Input array.
min_val (int or float) – Minimum value to be clipped.
max_val (int or float) – Maximum value to be clipped.
levels (int) – Quantization levels.
dtype (np.type) – The type of the dequantized array.
- Returns
Dequantized array.
- Return type
tuple
- mmcv.arraymisc.quantize(arr: numpy.ndarray, min_val: Union[int, float], max_val: Union[int, float], levels: int, dtype=<class 'numpy.int64'>) → tuple[source]¶
Quantize an array of (-inf, inf) to [0, levels-1].
- Parameters
arr (ndarray) – Input array.
min_val (int or float) – Minimum value to be clipped.
max_val (int or float) – Maximum value to be clipped.
levels (int) – Quantization levels.
dtype (np.type) – The type of the quantized array.
- Returns
Quantized array.
- Return type
tuple
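A roundtrip sketch:

>>> import numpy as np
>>> from mmcv.arraymisc import quantize, dequantize
>>> arr = np.array([-0.5, 0.0, 0.5, 1.5])
>>> q = quantize(arr, min_val=0, max_val=1, levels=10)    # ints in [0, 9]
>>> deq = dequantize(q, min_val=0, max_val=1, levels=10)  # back to floats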
visualization¶
- class mmcv.visualization.Color(value)[source]¶
An enum that defines common colors.
Contains red, green, blue, cyan, yellow, magenta, white and black.
- mmcv.visualization.color_val(color: Union[mmcv.visualization.color.Color, str, tuple, int, numpy.ndarray]) → tuple[source]¶
Convert various input to color tuples.
- Parameters
color (Color/str/tuple/int/ndarray) – Color inputs.
- Returns
A tuple of 3 integers indicating BGR channels.
- Return type
tuple[int]
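For example (BGR order, per the return description above):

>>> import mmcv
>>> mmcv.color_val('green')
(0, 255, 0)
>>> mmcv.color_val((255, 0, 0))
(255, 0, 0)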
- mmcv.visualization.flow2rgb(flow: numpy.ndarray, color_wheel: Optional[numpy.ndarray] = None, unknown_thr: float = 1000000.0) → numpy.ndarray[source]¶
Convert flow map to RGB image.
- Parameters
flow (ndarray) – Array of optical flow.
color_wheel (ndarray or None) – Color wheel used to map flow field to RGB colorspace. Default color wheel will be used if not specified.
unknown_thr (float) – Values above this threshold will be marked as unknown and thus ignored.
- Returns
RGB image that can be visualized.
- Return type
ndarray
- mmcv.visualization.flowshow(flow: Union[numpy.ndarray, str], win_name: str = '', wait_time: int = 0) → None[source]¶
Show optical flow.
- Parameters
flow (ndarray or str) – The optical flow to be displayed.
win_name (str) – The window name.
wait_time (int) – Value of waitKey param.
- mmcv.visualization.imshow(img: Union[str, numpy.ndarray], win_name: str = '', wait_time: int = 0)[source]¶
Show an image.
- Parameters
img (str or ndarray) – The image to be displayed.
win_name (str) – The window name.
wait_time (int) – Value of waitKey param.
- mmcv.visualization.imshow_bboxes(img: Union[str, numpy.ndarray], bboxes: Union[list, numpy.ndarray], colors: Union[mmcv.visualization.color.Color, str, tuple, int, numpy.ndarray] = 'green', top_k: int = - 1, thickness: int = 1, show: bool = True, win_name: str = '', wait_time: int = 0, out_file: Optional[str] = None)[source]¶
Draw bboxes on an image.
- Parameters
img (str or ndarray) – The image to be displayed.
bboxes (list or ndarray) – A list of ndarray of shape (k, 4).
colors (Color or str or tuple or int or ndarray) – A list of colors.
top_k (int) – Plot the first k bboxes only if set positive.
thickness (int) – Thickness of lines.
show (bool) – Whether to show the image.
win_name (str) – The window name.
wait_time (int) – Value of waitKey param.
out_file (str, optional) – The filename to write the image.
- Returns
The image with bboxes drawn on it.
- Return type
ndarray
- mmcv.visualization.imshow_det_bboxes(img: Union[str, numpy.ndarray], bboxes: numpy.ndarray, labels: numpy.ndarray, class_names: Optional[List[str]] = None, score_thr: float = 0, bbox_color: Union[mmcv.visualization.color.Color, str, tuple, int, numpy.ndarray] = 'green', text_color: Union[mmcv.visualization.color.Color, str, tuple, int, numpy.ndarray] = 'green', thickness: int = 1, font_scale: float = 0.5, show: bool = True, win_name: str = '', wait_time: int = 0, out_file: Optional[str] = None)[source]¶
Draw bboxes and class labels (with scores) on an image.
- Parameters
img (str or ndarray) – The image to be displayed.
bboxes (ndarray) – Bounding boxes (with scores), shaped (n, 4) or (n, 5).
labels (ndarray) – Labels of bboxes.
class_names (list[str]) – Names of each class.
score_thr (float) – Minimum score of bboxes to be shown.
bbox_color (Color or str or tuple or int or ndarray) – Color of bbox lines.
text_color (Color or str or tuple or int or ndarray) – Color of texts.
thickness (int) – Thickness of lines.
font_scale (float) – Font scale of texts.
show (bool) – Whether to show the image.
win_name (str) – The window name.
wait_time (int) – Value of waitKey param.
out_file (str or None) – The filename to write the image.
- Returns
The image with bboxes drawn on it.
- Return type
ndarray
- mmcv.visualization.make_color_wheel(bins: Optional[Union[list, tuple]] = None) → numpy.ndarray[source]¶
Build a color wheel.
- Parameters
bins (list or tuple, optional) – Specify the number of bins for each color range, corresponding to six ranges: red -> yellow, yellow -> green, green -> cyan, cyan -> blue, blue -> magenta, magenta -> red. [15, 6, 4, 11, 13, 6] is used for default (see Middlebury).
- Returns
Color wheel of shape (total_bins, 3).
- Return type
ndarray
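Sketch: with the default bins listed above, the wheel has 15 + 6 + 4 + 11 + 13 + 6 = 55 rows:
>>> from mmcv.visualization import make_color_wheel
>>> make_color_wheel().shape
(55, 3)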
utils¶
- class mmcv.utils.BuildExtension(*args, **kwargs)[source]¶
A custom setuptools build extension.
This setuptools.build_ext subclass takes care of passing the minimum required compiler flags (e.g. -std=c++14) as well as mixed C++/CUDA compilation (and support for CUDA files in general).
When using BuildExtension, it is allowed to supply a dictionary for extra_compile_args (rather than the usual list) that maps from languages (cxx or nvcc) to a list of additional compiler flags to supply to the compiler. This makes it possible to supply different flags to the C++ and CUDA compiler during mixed compilation.
use_ninja (bool): If use_ninja is True (default), then we attempt to build using the Ninja backend. Ninja greatly speeds up compilation compared to the standard setuptools.build_ext. Falls back to the standard distutils backend if Ninja is not available.
Note
By default, the Ninja backend uses #CPUS + 2 workers to build the extension. This may use up too many resources on some systems. One can control the number of workers by setting the MAX_JOBS environment variable to a non-negative number.
- finalize_options() → None[source]¶
Set final values for all the options that this command supports. This is always called as late as possible, i.e. after any option assignments from the command-line or from other commands have been done. Thus, this is the place to code option dependencies: if ‘foo’ depends on ‘bar’, then it is safe to set ‘foo’ from ‘bar’ as long as ‘foo’ still has the same value it was assigned in ‘initialize_options()’.
This method must be implemented by all command classes.
- mmcv.utils.CUDAExtension(name, sources, *args, **kwargs)[source]¶
Creates a setuptools.Extension for CUDA/C++.
Convenience method that creates a setuptools.Extension with the bare minimum (but often sufficient) arguments to build a CUDA/C++ extension. This includes the CUDA include path, library path and runtime library.
All arguments are forwarded to the setuptools.Extension constructor.
Example
>>> # xdoctest: +SKIP >>> from setuptools import setup >>> from torch.utils.cpp_extension import BuildExtension, CUDAExtension >>> setup( ... name='cuda_extension', ... ext_modules=[ ... CUDAExtension( ... name='cuda_extension', ... sources=['extension.cpp', 'extension_kernel.cu'], ... extra_compile_args={'cxx': ['-g'], ... 'nvcc': ['-O2']}) ... ], ... cmdclass={ ... 'build_ext': BuildExtension ... })
Compute capabilities:
By default the extension will be compiled to run on all archs of the cards visible during the building process of the extension, plus PTX. If down the road a new card is installed the extension may need to be recompiled. If a visible card has a compute capability (CC) that’s newer than the newest version for which your nvcc can build fully-compiled binaries, PyTorch will make nvcc fall back to building kernels with the newest version of PTX your nvcc does support (see below for details on PTX).
You can override the default behavior using TORCH_CUDA_ARCH_LIST to explicitly specify which CCs you want the extension to support:
TORCH_CUDA_ARCH_LIST="6.1 8.6" python build_my_extension.py
TORCH_CUDA_ARCH_LIST="5.2 6.0 6.1 7.0 7.5 8.0 8.6+PTX" python build_my_extension.py
The +PTX option causes extension kernel binaries to include PTX instructions for the specified CC. PTX is an intermediate representation that allows kernels to runtime-compile for any CC >= the specified CC (for example, 8.6+PTX generates PTX that can runtime-compile for any GPU with CC >= 8.6). This improves your binary’s forward compatibility. However, relying on older PTX to provide forward compat by runtime-compiling for newer CCs can modestly reduce performance on those newer CCs. If you know exact CC(s) of the GPUs you want to target, you’re always better off specifying them individually. For example, if you want your extension to run on 8.0 and 8.6, “8.0+PTX” would work functionally because it includes PTX that can runtime-compile for 8.6, but “8.0 8.6” would be better.
Note that while it’s possible to include all supported archs, the more archs get included the slower the building process will be, as it will build a separate kernel image for each arch.
Note that CUDA-11.5 nvcc will hit an internal compiler error while parsing torch/extension.h on Windows. To work around the issue, move the python binding logic to a pure C++ file.
- Example use:
>>> # xdoctest: +SKIP >>> #include <ATen/ATen.h> >>> at::Tensor SigmoidAlphaBlendForwardCuda(....)
- Instead of:
>>> # xdoctest: +SKIP >>> #include <torch/extension.h> >>> torch::Tensor SigmoidAlphaBlendForwardCuda(...)
Currently open issue for nvcc bug: https://github.com/pytorch/pytorch/issues/69460 Complete workaround code example: https://github.com/facebookresearch/pytorch3d/commit/cb170ac024a949f1f9614ffe6af1c38d972f7d48
Relocatable device code linking:
If you want to reference device symbols across compilation units (across object files), the object files need to be built with relocatable device code (-rdc=true or -dc). An exception to this rule is “dynamic parallelism” (nested kernel launches) which is not used a lot anymore. Relocatable device code is less optimized so it needs to be used only on object files that need it. Using -dlto (Device Link Time Optimization) at the device code compilation step and dlink step helps reduce the potential perf degradation of -rdc. Note that it needs to be used at both steps to be useful.
If you have rdc objects you need to have an extra -dlink (device linking) step before the CPU symbol linking step. There is also a case where -dlink is used without -rdc: when an extension is linked against a static lib containing rdc-compiled objects like the [NVSHMEM library](https://developer.nvidia.com/nvshmem).
Note: Ninja is required to build a CUDA Extension with RDC linking.
Example
>>> # xdoctest: +SKIP >>> CUDAExtension( ... name='cuda_extension', ... sources=['extension.cpp', 'extension_kernel.cu'], ... dlink=True, ... dlink_libraries=["dlink_lib"], ... extra_compile_args={'cxx': ['-g'], ... 'nvcc': ['-O2', '-rdc=true']})
- class mmcv.utils.Config(cfg_dict=None, cfg_text=None, filename=None)[source]¶
A facility for config and config files.
It supports common file formats as configs: python/json/yaml. The interface is the same as a dict object and also allows access config values as attributes.
Example
>>> cfg = Config(dict(a=1, b=dict(b1=[0, 1]))) >>> cfg.a 1 >>> cfg.b {'b1': [0, 1]} >>> cfg.b.b1 [0, 1] >>> cfg = Config.fromfile('tests/data/config/a.py') >>> cfg.filename "/home/kchen/projects/mmcv/tests/data/config/a.py" >>> cfg.item4 'test' >>> cfg "Config [path: /home/kchen/projects/mmcv/tests/data/config/a.py]: " "{'item1': [1, 2], 'item2': {'a': 0}, 'item3': True, 'item4': 'test'}"
- static auto_argparser(description=None)[source]¶
Generate argparser from config file automatically (experimental)
- dump(file=None)[source]¶
Dumps config into a file or returns a string representation of the config.
If a file argument is given, saves the config to that file using the format defined by the file argument extension.
Otherwise, returns a string representing the config. The formatting of this returned string is defined by the extension of self.filename. If self.filename is not defined, returns a string representation of a dict (lowercased and using ' for strings).
Examples
>>> cfg_dict = dict(item1=[1, 2], item2=dict(a=0), ... item3=True, item4='test') >>> cfg = Config(cfg_dict=cfg_dict) >>> dump_file = "a.py" >>> cfg.dump(dump_file)
- Parameters
file (str, optional) – Path of the output file where the config will be dumped. Defaults to None.
- static fromstring(cfg_str, file_format)[source]¶
Generate config from config str.
- Parameters
cfg_str (str) – Config str.
file_format (str) – Config file format corresponding to the config str. Only py/yml/yaml/json types are supported now.
- Returns
Config obj.
- Return type
Config
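For illustration, a minimal sketch (the config content is illustrative):
>>> cfg = Config.fromstring('item1 = [1, 2]\nitem2 = dict(a=0)', '.py')
>>> cfg.item1
[1, 2]
>>> cfg.item2.a
0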
- merge_from_dict(options, allow_list_keys=True)[source]¶
Merge list into cfg_dict.
Merge the dict parsed by MultipleKVAction into this cfg.
Examples
>>> options = {'model.backbone.depth': 50, ... 'model.backbone.with_cp':True} >>> cfg = Config(dict(model=dict(backbone=dict(type='ResNet')))) >>> cfg.merge_from_dict(options) >>> cfg_dict = super(Config, self).__getattribute__('_cfg_dict') >>> assert cfg_dict == dict( ... model=dict(backbone=dict(depth=50, with_cp=True)))
>>> # Merge list element >>> cfg = Config(dict(pipeline=[ ... dict(type='LoadImage'), dict(type='LoadAnnotations')])) >>> options = dict(pipeline={'0': dict(type='SelfLoadImage')}) >>> cfg.merge_from_dict(options, allow_list_keys=True) >>> cfg_dict = super(Config, self).__getattribute__('_cfg_dict') >>> assert cfg_dict == dict(pipeline=[ ... dict(type='SelfLoadImage'), dict(type='LoadAnnotations')])
- Parameters
options (dict) – dict of configs to merge from.
allow_list_keys (bool) – If True, int string keys (e.g. ‘0’, ‘1’) are allowed in options and will replace the element of the corresponding index in the config if the config is a list. Default: True.
- mmcv.utils.CppExtension(name, sources, *args, **kwargs)[source]¶
Creates a setuptools.Extension for C++.
Convenience method that creates a setuptools.Extension with the bare minimum (but often sufficient) arguments to build a C++ extension.
All arguments are forwarded to the setuptools.Extension constructor.
Example
>>> # xdoctest: +SKIP >>> from setuptools import setup >>> from torch.utils.cpp_extension import BuildExtension, CppExtension >>> setup( ... name='extension', ... ext_modules=[ ... CppExtension( ... name='extension', ... sources=['extension.cpp'], ... extra_compile_args=['-g']), ... ], ... cmdclass={ ... 'build_ext': BuildExtension ... })
- class mmcv.utils.DataLoader(dataset: torch.utils.data.dataset.Dataset[torch.utils.data.dataloader.T_co], batch_size: Optional[int] = 1, shuffle: Optional[bool] = None, sampler: Optional[Union[torch.utils.data.sampler.Sampler, Iterable]] = None, batch_sampler: Optional[Union[torch.utils.data.sampler.Sampler[Sequence], Iterable[Sequence]]] = None, num_workers: int = 0, collate_fn: Optional[Callable[[List[torch.utils.data.dataloader.T]], Any]] = None, pin_memory: bool = False, drop_last: bool = False, timeout: float = 0, worker_init_fn: Optional[Callable[[int], None]] = None, multiprocessing_context=None, generator=None, *, prefetch_factor: int = 2, persistent_workers: bool = False, pin_memory_device: str = '')[source]¶
Data loader. Combines a dataset and a sampler, and provides an iterable over the given dataset.
The DataLoader supports both map-style and iterable-style datasets with single- or multi-process loading, customizing loading order and optional automatic batching (collation) and memory pinning.
See the torch.utils.data documentation page for more details.
- Parameters
dataset (Dataset) – dataset from which to load the data.
batch_size (int, optional) – how many samples per batch to load (default: 1).
shuffle (bool, optional) – set to True to have the data reshuffled at every epoch (default: False).
sampler (Sampler or Iterable, optional) – defines the strategy to draw samples from the dataset. Can be any Iterable with __len__ implemented. If specified, shuffle must not be specified.
batch_sampler (Sampler or Iterable, optional) – like sampler, but returns a batch of indices at a time. Mutually exclusive with batch_size, shuffle, sampler, and drop_last.
num_workers (int, optional) – how many subprocesses to use for data loading. 0 means that the data will be loaded in the main process. (default: 0)
collate_fn (Callable, optional) – merges a list of samples to form a mini-batch of Tensor(s). Used when using batched loading from a map-style dataset.
pin_memory (bool, optional) – If True, the data loader will copy Tensors into device/CUDA pinned memory before returning them. If your data elements are a custom type, or your collate_fn returns a batch that is a custom type, see the example below.
drop_last (bool, optional) – set to True to drop the last incomplete batch, if the dataset size is not divisible by the batch size. If False and the size of dataset is not divisible by the batch size, then the last batch will be smaller. (default: False)
timeout (numeric, optional) – if positive, the timeout value for collecting a batch from workers. Should always be non-negative. (default: 0)
worker_init_fn (Callable, optional) – If not None, this will be called on each worker subprocess with the worker id (an int in [0, num_workers - 1]) as input, after seeding and before data loading. (default: None)
generator (torch.Generator, optional) – If not None, this RNG will be used by RandomSampler to generate random indexes and by multiprocessing to generate base_seed for workers. (default: None)
prefetch_factor (int, optional, keyword-only arg) – Number of batches loaded in advance by each worker. 2 means there will be a total of 2 * num_workers batches prefetched across all workers. (default: 2)
persistent_workers (bool, optional) – If True, the data loader will not shut down the worker processes after a dataset has been consumed once. This allows the workers' Dataset instances to stay alive. (default: False)
pin_memory_device (str, optional) – the device onto which the data loader will copy Tensors in pinned memory before returning them, if pin_memory is set to True.
Warning
If the spawn start method is used, worker_init_fn cannot be an unpicklable object, e.g., a lambda function. See multiprocessing-best-practices for more details related to multiprocessing in PyTorch.
Warning
The len(dataloader) heuristic is based on the length of the sampler used. When dataset is an IterableDataset, it instead returns an estimate based on len(dataset) / batch_size, with proper rounding depending on drop_last, regardless of multi-process loading configurations. This represents the best guess PyTorch can make because PyTorch trusts user dataset code to correctly handle multi-process loading and avoid duplicate data.
However, if sharding results in multiple workers having incomplete last batches, this estimate can still be inaccurate, because (1) an otherwise complete batch can be broken into multiple ones and (2) more than one batch worth of samples can be dropped when drop_last is set. Unfortunately, PyTorch cannot detect such cases in general.
See `Dataset Types`_ for more details on these two types of datasets and how IterableDataset interacts with `Multi-process data loading`_.
Warning
See the reproducibility, dataloader-workers-random-seed, and data-loading-randomness notes for random seed related questions.
- class mmcv.utils.DictAction(option_strings, dest, nargs=None, const=None, default=None, type=None, choices=None, required=False, help=None, metavar=None)[source]¶
argparse action to split an argument into KEY=VALUE form on the first ‘=’ and append to a dictionary. List options can be passed as comma separated values, i.e. ‘KEY=V1,V2,V3’, or with explicit brackets, i.e. ‘KEY=[V1,V2,V3]’. It also supports nested brackets to build list/tuple values, e.g. ‘KEY=[(V1,V2),(V3,V4)]’.
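For illustration, a minimal argparse sketch (the option name and values are illustrative):
>>> import argparse
>>> from mmcv.utils import DictAction
>>> parser = argparse.ArgumentParser()
>>> _ = parser.add_argument('--cfg-options', nargs='+', action=DictAction)
>>> args = parser.parse_args(['--cfg-options', 'lr=0.01', 'layers=[2,2,2]'])
>>> args.cfg_options
{'lr': 0.01, 'layers': [2, 2, 2]}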
- mmcv.utils.PoolDataLoader¶
- class mmcv.utils.ProgressBar(task_num=0, bar_width=50, start=True, file=<_io.TextIOWrapper name='<stdout>' mode='w' encoding='UTF-8'>)[source]¶
A progress bar which can print the progress.
- class mmcv.utils.Registry(name, build_func=None, parent=None, scope=None)[source]¶
A registry to map strings to classes or functions.
Registered object could be built from registry. Meanwhile, registered functions could be called from registry.
Example
>>> MODELS = Registry('models') >>> @MODELS.register_module() >>> class ResNet: >>> pass >>> resnet = MODELS.build(dict(type='ResNet')) >>> @MODELS.register_module() >>> def resnet50(): >>> pass >>> resnet = MODELS.build(dict(type='resnet50'))
Please refer to https://mmcv.readthedocs.io/en/latest/understand_mmcv/registry.html for advanced usage.
- Parameters
name (str) – Registry name.
build_func (func, optional) – Build function to construct an instance from the Registry. build_from_cfg is used if neither parent nor build_func is specified. If parent is specified and build_func is not given, build_func will be inherited from parent. Default: None.
parent (Registry, optional) – Parent registry. The class registered in a child registry can be built from the parent. Default: None.
scope (str, optional) – The scope of registry. It is the key to search for children registry. If not specified, scope will be the name of the package where class is defined, e.g. mmdet, mmcls, mmseg. Default: None.
- get(key)[source]¶
Get the registry record.
- Parameters
key (str) – The class name in string format.
- Returns
The corresponding class.
- Return type
class
- static infer_scope()[source]¶
Infer the scope of registry.
The name of the package where registry is defined will be returned.
Example
>>> # in mmdet/models/backbone/resnet.py >>> MODELS = Registry('models') >>> @MODELS.register_module() >>> class ResNet: >>> pass The scope of ``ResNet`` will be ``mmdet``.
- Returns
The inferred scope name.
- Return type
str
- register_module(name=None, force=False, module=None)[source]¶
Register a module.
A record will be added to self._module_dict, whose key is the class name or the specified name, and value is the class itself. It can be used as a decorator or a normal function.
Example
>>> backbones = Registry('backbone') >>> @backbones.register_module() >>> class ResNet: >>> pass
>>> backbones = Registry('backbone') >>> @backbones.register_module(name='mnet') >>> class MobileNet: >>> pass
>>> backbones = Registry('backbone') >>> class ResNet: >>> pass >>> backbones.register_module(ResNet)
- Parameters
name (str | None) – The module name to be registered. If not specified, the class name will be used.
force (bool, optional) – Whether to override an existing class with the same name. Default: False.
module (type) – Module class or function to be registered.
- static split_scope_key(key)[source]¶
Split scope and key.
The first scope will be split from key.
Examples
>>> Registry.split_scope_key('mmdet.ResNet') 'mmdet', 'ResNet' >>> Registry.split_scope_key('ResNet') None, 'ResNet'
- Returns
The former element is the first scope of the key, which can be None. The latter is the remaining key.
- Return type
tuple[str | None, str]
- class mmcv.utils.SyncBatchNorm(num_features: int, eps: float = 1e-05, momentum: float = 0.1, affine: bool = True, track_running_stats: bool = True, process_group: Optional[Any] = None, device=None, dtype=None)[source]¶
- class mmcv.utils.Timer(start=True, print_tmpl=None)[source]¶
A flexible Timer class.
Examples
>>> import time >>> import mmcv >>> with mmcv.Timer(): >>> # simulate a code block that will run for 1s >>> time.sleep(1) 1.000 >>> with mmcv.Timer(print_tmpl='it takes {:.1f} seconds'): >>> # simulate a code block that will run for 1s >>> time.sleep(1) it takes 1.0 seconds >>> timer = mmcv.Timer() >>> time.sleep(0.5) >>> print(timer.since_start()) 0.500 >>> time.sleep(0.5) >>> print(timer.since_last_check()) 0.500 >>> print(timer.since_start()) 1.000
- property is_running¶
indicate whether the timer is running
- Type
bool
- since_last_check()[source]¶
Time since the last checking.
Either since_start() or since_last_check() is a checking operation.
- Returns
Time in seconds.
- Return type
float
- mmcv.utils.assert_attrs_equal(obj: Any, expected_attrs: Dict[str, Any]) → bool[source]¶
Check if the attributes of a class object are correct.
- Parameters
obj (object) – Class object to be checked.
expected_attrs (Dict[str, Any]) – Dict of the expected attrs.
- Returns
Whether the attributes of the class object are correct.
- Return type
bool
- mmcv.utils.assert_dict_contains_subset(dict_obj: Dict[Any, Any], expected_subset: Dict[Any, Any]) → bool[source]¶
Check if the dict_obj contains the expected_subset.
- Parameters
dict_obj (Dict[Any, Any]) – Dict object to be checked.
expected_subset (Dict[Any, Any]) – Subset expected to be contained in dict_obj.
- Returns
Whether the dict_obj contains the expected_subset.
- Return type
bool
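Sketch (the dicts are illustrative):
>>> from mmcv.utils import assert_dict_contains_subset
>>> assert_dict_contains_subset({'a': 1, 'b': 2}, {'a': 1})
True
>>> assert_dict_contains_subset({'a': 1}, {'a': 2})
False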
- mmcv.utils.assert_dict_has_keys(obj: Dict[str, Any], expected_keys: List[str]) → bool[source]¶
Check if the obj has all the expected_keys.
- Parameters
obj (Dict[str, Any]) – Object to be checked.
expected_keys (List[str]) – Keys expected to be contained in the keys of the obj.
- Returns
Whether the obj has the expected keys.
- Return type
bool
- mmcv.utils.assert_is_norm_layer(module) → bool[source]¶
Check if the module is a norm layer.
- Parameters
module (nn.Module) – The module to be checked.
- Returns
Whether the module is a norm layer.
- Return type
bool
- mmcv.utils.assert_keys_equal(result_keys: List[str], target_keys: List[str]) → bool[source]¶
Check if target_keys is equal to result_keys.
- Parameters
result_keys (List[str]) – Result keys to be checked.
target_keys (List[str]) – Target keys to be checked.
- Returns
Whether target_keys is equal to result_keys.
- Return type
bool
- mmcv.utils.assert_params_all_zeros(module) → bool[source]¶
Check if the parameters of the module are all zeros.
- Parameters
module (nn.Module) – The module to be checked.
- Returns
Whether the parameters of the module are all zeros.
- Return type
bool
- mmcv.utils.build_from_cfg(cfg: Dict, registry: mmcv.utils.registry.Registry, default_args: Optional[Dict] = None) → Any[source]¶
Build a module from config dict when it is a class configuration, or call a function from config dict when it is a function configuration.
Example
>>> MODELS = Registry('models') >>> @MODELS.register_module() >>> class ResNet: >>> pass >>> resnet = build_from_cfg(dict(type='Resnet'), MODELS) >>> # Returns an instantiated object >>> @MODELS.register_module() >>> def resnet50(): >>> pass >>> resnet = build_from_cfg(dict(type='resnet50'), MODELS) >>> # Return a result of the calling function
- Parameters
cfg (dict) – Config dict. It should at least contain the key “type”.
registry (Registry) – The registry to search the type from.
default_args (dict, optional) – Default initialization arguments.
- Returns
The constructed object.
- Return type
object
- mmcv.utils.check_prerequisites(prerequisites, checker, msg_tmpl='Prerequisites "{}" are required in method "{}" but not found, please install them first.')[source]¶
A decorator factory to check if prerequisites are satisfied.
- Parameters
prerequisites (str or list[str]) – Prerequisites to be checked.
checker (callable) – The checker method that returns True if a prerequisite is met, False otherwise.
msg_tmpl (str) – The message template with two variables.
- Returns
A specific decorator.
- Return type
decorator
- mmcv.utils.check_python_script(cmd)[source]¶
Run the python cmd script with __main__. The difference from os.system is that this function executes code in the current process, so that it can be tracked by coverage tools. Currently it supports two forms:
./tests/data/scripts/hello.py zz
python tests/data/scripts/hello.py zz
- mmcv.utils.check_time(timer_id)[source]¶
Add check points in a single line.
This method is suitable for running a task on a list of items. A timer will be registered when the method is called for the first time.
Examples
>>> import time >>> import mmcv >>> for i in range(1, 6): >>> # simulate a code block >>> time.sleep(i) >>> mmcv.check_time('task1') 2.000 3.000 4.000 5.000
- Parameters
timer_id (str) – Timer identifier.
- mmcv.utils.collect_env()[source]¶
Collect the information of the running environments.
- Returns
The environment information. The following fields are contained.
sys.platform: The variable of sys.platform.
Python: Python version.
CUDA available: Bool, indicating if CUDA is available.
GPU devices: Device type of each GPU.
CUDA_HOME (optional): The env var CUDA_HOME.
NVCC (optional): NVCC version.
GCC: GCC version, “n/a” if GCC is not installed.
MSVC: Microsoft Visual C++ Compiler version, Windows only.
PyTorch: PyTorch version.
PyTorch compiling details: The output of torch.__config__.show().
TorchVision (optional): TorchVision version.
OpenCV: OpenCV version.
MMCV: MMCV version.
MMCV Compiler: The GCC version for compiling MMCV ops.
MMCV CUDA Compiler: The CUDA version for compiling MMCV ops.
- Return type
dict
- mmcv.utils.concat_list(in_list)[source]¶
Concatenate a list of lists into a single list.
- Parameters
in_list (list) – The list of lists to be merged.
- Returns
The concatenated flat list.
- Return type
list
- mmcv.utils.deprecated_api_warning(name_dict, cls_name=None)[source]¶
A decorator to check if some arguments are deprecated and try to replace the deprecated src_arg_name with dst_arg_name.
- Parameters
name_dict (dict) – key (str): Deprecated argument names. val (str): Expected argument names.
- Returns
New function.
- Return type
func
- mmcv.utils.digit_version(version_str: str, length: int = 4)[source]¶
Convert a version string into a tuple of integers.
This method is usually used for comparing two versions. For pre-release versions: alpha < beta < rc.
- Parameters
version_str (str) – The version string.
length (int) – The maximum number of version levels. Default: 4.
- Returns
The version info in digits (integers).
- Return type
tuple[int]
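Sketch of the intended use, comparing two version strings (the exact tuple layout may differ between mmcv versions, so only the comparisons are shown):
>>> from mmcv.utils import digit_version
>>> digit_version('1.3.16') < digit_version('1.3.17')
True
>>> digit_version('1.0rc1') < digit_version('1.0')  # pre-release sorts first
True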
- mmcv.utils.get_git_hash(fallback='unknown', digits=None)[source]¶
Get the git hash of the current repo.
- Parameters
fallback (str, optional) – The fallback string when git hash is unavailable. Defaults to ‘unknown’.
digits (int, optional) – kept digits of the hash. Defaults to None, meaning all digits are kept.
- Returns
Git commit hash.
- Return type
str
- mmcv.utils.get_logger(name, log_file=None, log_level=20, file_mode='w')[source]¶
Initialize and get a logger by name.
If the logger has not been initialized, this method will initialize the logger by adding one or two handlers, otherwise the initialized logger will be directly returned. During initialization, a StreamHandler will always be added. If log_file is specified and the process rank is 0, a FileHandler will also be added.
- Parameters
name (str) – Logger name.
log_file (str | None) – The log filename. If specified, a FileHandler will be added to the logger.
log_level (int) – The logger level. Note that only the process of rank 0 is affected, and other processes will set the level to “Error” and thus be silent most of the time.
file_mode (str) – The file mode used in opening log file. Defaults to ‘w’.
- Returns
The expected logger.
- Return type
logging.Logger
- mmcv.utils.has_method(obj: object, method: str) → bool[source]¶
Check whether the object has a method.
- Parameters
method (str) – The method name to check.
obj (object) – The object to check.
- Returns
True if the object has the method else False.
- Return type
bool
- mmcv.utils.import_modules_from_strings(imports, allow_failed_imports=False)[source]¶
Import modules from the given list of strings.
- Parameters
imports (list | str | None) – The given module names to be imported.
allow_failed_imports (bool) – If True, the failed imports will return None. Otherwise, an ImportError is raised. Default: False.
- Returns
The imported modules.
- Return type
list[module] | module | None
Examples
>>> osp, sys = import_modules_from_strings( ... ['os.path', 'sys']) >>> import os.path as osp_ >>> import sys as sys_ >>> assert osp == osp_ >>> assert sys == sys_
- mmcv.utils.is_list_of(seq, expected_type)[source]¶
Check whether it is a list of some type.
A partial method of is_seq_of().
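Sketch:
>>> from mmcv.utils import is_list_of
>>> is_list_of([1, 2, 3], int)
True
>>> is_list_of((1, 2, 3), int)  # a tuple is not a list
False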
- mmcv.utils.is_method_overridden(method, base_class, derived_class)[source]¶
Check if a method of base class is overridden in derived class.
- Parameters
method (str) – the method name to check.
base_class (type) – the class of the base class.
derived_class (type | Any) – the class or instance of the derived class.
- mmcv.utils.is_seq_of(seq, expected_type, seq_type=None)[source]¶
Check whether it is a sequence of some type.
- Parameters
seq (Sequence) – The sequence to be checked.
expected_type (type) – Expected type of sequence items.
seq_type (type, optional) – Expected sequence type.
- Returns
Whether the sequence is valid.
- Return type
bool
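Sketch:
>>> from mmcv.utils import is_seq_of
>>> is_seq_of([1, 2, 3], int)
True
>>> is_seq_of((1, 2, 3), int, seq_type=list)  # wrong container type
False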
- mmcv.utils.is_str(x)[source]¶
Whether the input is a string instance.
Note: This method is deprecated since python 2 is no longer supported.
- mmcv.utils.is_tuple_of(seq, expected_type)[source]¶
Check whether it is a tuple of some type.
A partial method of is_seq_of().
- mmcv.utils.iter_cast(inputs, dst_type, return_type=None)[source]¶
Cast elements of an iterable object into some type.
- Parameters
inputs (Iterable) – The input object.
dst_type (type) – Destination type.
return_type (type, optional) – If specified, the output object will be converted to this type, otherwise an iterator.
- Returns
The converted object.
- Return type
iterator or specified type
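Sketch:
>>> from mmcv.utils import iter_cast
>>> list(iter_cast(['1', '2', '3'], int))  # return_type=None yields an iterator
[1, 2, 3]
>>> iter_cast(['1', '2'], float, return_type=tuple)
(1.0, 2.0)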
- mmcv.utils.list_cast(inputs, dst_type)[source]¶
Cast elements of an iterable object into a list of some type.
A partial method of iter_cast().
- mmcv.utils.load_url(url: str, model_dir: Optional[str] = None, map_location: Optional[Union[Callable[[torch.Tensor, str], torch.Tensor], torch.device, str, Dict[str, str]]] = None, progress: bool = True, check_hash: bool = False, file_name: Optional[str] = None) → Dict[str, Any]¶
Loads the Torch serialized object at the given URL.
If the downloaded file is a zip file, it will be automatically decompressed.
If the object is already present in model_dir, it's deserialized and returned. The default value of model_dir is <hub_dir>/checkpoints where hub_dir is the directory returned by get_dir().
- Parameters
url (str) – URL of the object to download
model_dir (str, optional) – directory in which to save the object
map_location (optional) – a function or a dict specifying how to remap storage locations (see torch.load)
progress (bool, optional) – whether or not to display a progress bar to stderr. Default: True
check_hash (bool, optional) – If True, the filename part of the URL should follow the naming convention filename-<sha256>.ext where <sha256> is the first eight or more digits of the SHA256 hash of the contents of the file. The hash is used to ensure unique names and to verify the contents of the file. Default: False
file_name (str, optional) – name for the downloaded file. Filename from url will be used if not set.
Example
>>> state_dict = torch.hub.load_state_dict_from_url('https://s3.amazonaws.com/pytorch/models/resnet18-5c106cde.pth')
- mmcv.utils.print_log(msg, logger=None, level=20)[source]¶
Print a log message.
- Parameters
msg (str) – The message to be logged.
logger (logging.Logger | str | None) –
The logger to be used. Some special loggers are:
“silent”: no message will be printed.
other str: the logger obtained with get_root_logger(logger).
None: The print() method will be used to print log messages.
level (int) – Logging level. Only available when logger is a Logger object or “root”.
- mmcv.utils.requires_executable(prerequisites)[source]¶
A decorator to check if some executable files are installed.
Example
>>> @requires_executable('ffmpeg') >>> def func(arg1, args): >>> print(1) 1
- mmcv.utils.requires_package(prerequisites)[source]¶
A decorator to check if some python packages are installed.
Example
>>> @requires_package('numpy') >>> def func(arg1, args): >>> return numpy.zeros(1) array([0.]) >>> @requires_package(['numpy', 'non_package']) >>> def func(arg1, args): >>> return numpy.zeros(1) ImportError
- mmcv.utils.scandir(dir_path, suffix=None, recursive=False, case_sensitive=True)[source]¶
Scan a directory to find the interested files.
- Parameters
dir_path (str | Path) – Path of the directory.
suffix (str | tuple(str), optional) – File suffix that we are interested in. Default: None.
recursive (bool, optional) – If set to True, recursively scan the directory. Default: False.
case_sensitive (bool, optional) – If set to False, ignore the case of suffix. Default: True.
- Returns
A generator for all the interested files with relative paths.
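A minimal sketch (the directory and suffix are illustrative):
>>> from mmcv.utils import scandir
>>> for name in scandir('configs', suffix='.py', recursive=True):
...     print(name)  # paths relative to 'configs'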
- mmcv.utils.slice_list(in_list, lens)[source]¶
Slice a list into several sub-lists by a list of given lengths.
- Parameters
in_list (list) – The list to be sliced.
lens (int or list) – The expected length of each output list.
- Returns
A list of sliced lists.
- Return type
list
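Sketch:
>>> from mmcv.utils import slice_list
>>> slice_list([1, 2, 3, 4, 5, 6], [2, 4])  # lens must sum to len(in_list)
[[1, 2], [3, 4, 5, 6]]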
- mmcv.utils.torch_meshgrid(*tensors)[source]¶
A wrapper of torch.meshgrid to be compatible with different PyTorch versions.
Since PyTorch 1.10.0a0, torch.meshgrid supports the argument indexing. We implement a wrapper here to avoid warnings when using a newer PyTorch and to avoid compatibility issues when using older versions of PyTorch.
- Parameters
tensors (List[Tensor]) – List of scalars or 1 dimensional tensors.
- Returns
Sequence of meshgrid tensors.
- Return type
Sequence[Tensor]
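Sketch (assuming the ‘ij’ indexing behavior of torch.meshgrid):
>>> import torch
>>> from mmcv.utils import torch_meshgrid
>>> xs, ys = torch_meshgrid(torch.arange(2), torch.arange(3))
>>> xs.shape, ys.shape
(torch.Size([2, 3]), torch.Size([2, 3]))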
- mmcv.utils.track_iter_progress(tasks, bar_width=50, file=<_io.TextIOWrapper name='<stdout>' mode='w' encoding='UTF-8'>)[source]¶
Track the progress of tasks iteration or enumeration with a progress bar.
Tasks are yielded with a simple for-loop.
- Parameters
tasks (list or tuple[Iterable, int]) – A list of tasks or (tasks, total num).
bar_width (int) – Width of progress bar.
- Yields
list – The task results.
- mmcv.utils.track_parallel_progress(func, tasks, nproc, initializer=None, initargs=None, bar_width=50, chunksize=1, skip_first=False, keep_order=True, file=<_io.TextIOWrapper name='<stdout>' mode='w' encoding='UTF-8'>)[source]¶
Track the progress of parallel task execution with a progress bar.
The built-in multiprocessing module is used for process pools and tasks are done with Pool.map() or Pool.imap_unordered().
- Parameters
func (callable) – The function to be applied to each task.
tasks (list or tuple[Iterable, int]) – A list of tasks or (tasks, total num).
nproc (int) – Process (worker) number.
initializer (None or callable) – Refer to multiprocessing.Pool for details.
initargs (None or tuple) – Refer to multiprocessing.Pool for details.
chunksize (int) – Refer to multiprocessing.Pool for details.
bar_width (int) – Width of progress bar.
skip_first (bool) – Whether to skip the first sample for each worker when estimating fps, since the initialization step may take longer.
keep_order (bool) – If True, Pool.imap() is used, otherwise Pool.imap_unordered() is used.
- Returns
The task results.
- Return type
list
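A minimal sketch; square is a placeholder task function (with the spawn start method it must be importable from a module, not defined interactively):
>>> import mmcv
>>> def square(x):  # placeholder task function
...     return x * x
>>> results = mmcv.track_parallel_progress(square, list(range(100)), nproc=4)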
- mmcv.utils.track_progress(func, tasks, bar_width=50, file=<_io.TextIOWrapper name='<stdout>' mode='w' encoding='UTF-8'>, **kwargs)[source]¶
Track the progress of tasks execution with a progress bar.
Tasks are done with a simple for-loop.
- Parameters
func (callable) – The function to be applied to each task.
tasks (list or tuple[Iterable, int]) – A list of tasks or (tasks, total num).
bar_width (int) – Width of progress bar.
- Returns
The task results.
- Return type
list
- mmcv.utils.tuple_cast(inputs, dst_type)[source]¶
Cast elements of an iterable object into a tuple of some type.
A partial method of
iter_cast()
.
- mmcv.utils.worker_init_fn(worker_id: int, num_workers: int, rank: int, seed: int)[source]¶
Function to initialize each worker.
The seed of each worker equals num_workers * rank + worker_id + user_seed.
- Parameters
worker_id (int) – Id for each worker.
num_workers (int) – Number of workers.
rank (int) – Rank in distributed training.
seed (int) – Random seed.
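Sketch wiring it into a DataLoader via functools.partial (the rank and seed values are illustrative):
>>> from functools import partial
>>> from mmcv.utils import worker_init_fn
>>> init_fn = partial(worker_init_fn, num_workers=4, rank=0, seed=42)
>>> # worker 1 would then be seeded with 4 * 0 + 1 + 42 = 47
>>> # loader = DataLoader(dataset, num_workers=4, worker_init_fn=init_fn)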
cnn¶
- class mmcv.cnn.AlexNet(num_classes: int = -1)[source]¶
AlexNet backbone.
- Parameters
num_classes (int) – number of classes for classification.
- forward(x: torch.Tensor) → torch.Tensor[source]¶
Defines the computation performed at every call.
Should be overridden by all subclasses.
Note
Although the recipe for forward pass needs to be defined within this function, one should call the Module instance afterwards instead of this since the former takes care of running the registered hooks while the latter silently ignores them.
- class mmcv.cnn.ConstantInit(val: Union[int, float], **kwargs)[source]¶
Initialize module parameters with constant values.
- Parameters
val (int | float) – the value to fill the weights in the module with
bias (int | float) – the value to fill the bias. Defaults to 0.
bias_prob (float, optional) – the probability for bias initialization. Defaults to None.
layer (str | list[str], optional) – the layer will be initialized. Defaults to None.
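A minimal sketch (the layer value is illustrative):
>>> import torch.nn as nn
>>> from mmcv.cnn import ConstantInit
>>> model = nn.Conv2d(3, 8, 1)
>>> init = ConstantInit(val=1., bias=2., layer='Conv2d')
>>> init(model)  # weights become 1, biases become 2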
- class mmcv.cnn.ContextBlock(in_channels: int, ratio: float, pooling_type: str = 'att', fusion_types: tuple = ('channel_add'))[source]¶
ContextBlock module in GCNet.
See ‘GCNet: Non-local Networks Meet Squeeze-Excitation Networks and Beyond’ (https://arxiv.org/abs/1904.11492) for details.
- Parameters
in_channels (int) – Channels of the input feature map.
ratio (float) – Ratio of channels of transform bottleneck
pooling_type (str) – Pooling method for context modeling. Options are ‘att’ and ‘avg’, stand for attention pooling and average pooling respectively. Default: ‘att’.
fusion_types (Sequence[str]) – Fusion method for feature fusion. Options are ‘channel_add’ and ‘channel_mul’, standing for channel-wise addition and multiplication respectively. Default: (‘channel_add’,)
- forward(x: torch.Tensor) → torch.Tensor[source]¶
Defines the computation performed at every call.
Should be overridden by all subclasses.
Note
Although the recipe for forward pass needs to be defined within this function, one should call the Module instance afterwards instead of this since the former takes care of running the registered hooks while the latter silently ignores them.
- class mmcv.cnn.Conv2d(in_channels: int, out_channels: int, kernel_size: Union[int, Tuple[int, int]], stride: Union[int, Tuple[int, int]] = 1, padding: Union[str, int, Tuple[int, int]] = 0, dilation: Union[int, Tuple[int, int]] = 1, groups: int = 1, bias: bool = True, padding_mode: str = 'zeros', device=None, dtype=None)[source]¶
- forward(x: torch.Tensor) → torch.Tensor[source]¶
Defines the computation performed at every call.
Should be overridden by all subclasses.
Note
Although the recipe for forward pass needs to be defined within this function, one should call the Module instance afterwards instead of this since the former takes care of running the registered hooks while the latter silently ignores them.
- class mmcv.cnn.Conv2dRFSearchOp(op_layer: torch.nn.modules.module.Module, global_config: dict, verbose: bool = True)[source]¶
Enable Conv2d with receptive field searching ability.
- Parameters
op_layer (nn.Module) – pytorch module, e.g., Conv2d
global_config (dict) –
config dict. Defaults to None. By default this must include:
”init_alphas”: The value for initializing weights of each branch.
”num_branches”: The controller of the size of search space (the number of branches).
”exp_rate”: The controller of the sparsity of search space.
”mmin”: The minimum dilation rate.
”mmax”: The maximum dilation rate.
Extra keys may exist, but are used by RFSearchHook, e.g., “step”, “max_step”, “search_interval”, and “skip_layer”.
verbose (bool) – Determines whether to print rf-next related logging messages. Defaults to True.
- forward(input: torch.Tensor) → torch.Tensor[source]¶
Defines the computation performed at every call.
Should be overridden by all subclasses.
Note
Although the recipe for forward pass needs to be defined within this function, one should call the Module instance afterwards instead of this since the former takes care of running the registered hooks while the latter silently ignores them.
- class mmcv.cnn.Conv3d(in_channels: int, out_channels: int, kernel_size: Union[int, Tuple[int, int, int]], stride: Union[int, Tuple[int, int, int]] = 1, padding: Union[str, int, Tuple[int, int, int]] = 0, dilation: Union[int, Tuple[int, int, int]] = 1, groups: int = 1, bias: bool = True, padding_mode: str = 'zeros', device=None, dtype=None)[source]¶
- forward(x: torch.Tensor) → torch.Tensor[source]¶
Defines the computation performed at every call.
Should be overridden by all subclasses.
Note
Although the recipe for forward pass needs to be defined within this function, one should call the Module instance afterwards instead of this since the former takes care of running the registered hooks while the latter silently ignores them.
- class mmcv.cnn.ConvAWS2d(in_channels: int, out_channels: int, kernel_size: Union[int, Tuple[int, int]], stride: Union[int, Tuple[int, int]] = 1, padding: Union[int, Tuple[int, int]] = 0, dilation: Union[int, Tuple[int, int]] = 1, groups: int = 1, bias: bool = True)[source]¶
AWS (Adaptive Weight Standardization)
This is a variant of Weight Standardization (https://arxiv.org/pdf/1903.10520.pdf). It is used in DetectoRS to avoid NaN (https://arxiv.org/pdf/2006.02334.pdf).
- Parameters
in_channels (int) – Number of channels in the input image
out_channels (int) – Number of channels produced by the convolution
kernel_size (int or tuple) – Size of the conv kernel
stride (int or tuple, optional) – Stride of the convolution. Default: 1
padding (int or tuple, optional) – Zero-padding added to both sides of the input. Default: 0
dilation (int or tuple, optional) – Spacing between kernel elements. Default: 1
groups (int, optional) – Number of blocked connections from input channels to output channels. Default: 1
bias (bool, optional) – If set True, adds a learnable bias to the output. Default: True
- forward(x: torch.Tensor) → torch.Tensor[source]¶
Defines the computation performed at every call.
Should be overridden by all subclasses.
Note
Although the recipe for forward pass needs to be defined within this function, one should call the Module instance afterwards instead of this since the former takes care of running the registered hooks while the latter silently ignores them.
- class mmcv.cnn.ConvModule(in_channels: int, out_channels: int, kernel_size: Union[int, Tuple[int, int]], stride: Union[int, Tuple[int, int]] = 1, padding: Union[int, Tuple[int, int]] = 0, dilation: Union[int, Tuple[int, int]] = 1, groups: int = 1, bias: Union[bool, str] = 'auto', conv_cfg: Optional[Dict] = None, norm_cfg: Optional[Dict] = None, act_cfg: Optional[Dict] = {'type': 'ReLU'}, inplace: bool = True, with_spectral_norm: bool = False, padding_mode: str = 'zeros', order: tuple = ('conv', 'norm', 'act'))[source]¶
A conv block that bundles conv/norm/activation layers.
This block simplifies the usage of convolution layers, which are commonly used with a norm layer (e.g., BatchNorm) and activation layer (e.g., ReLU). It is based upon three build methods: build_conv_layer(), build_norm_layer() and build_activation_layer().
Besides, we add some additional features in this module:
1. Automatically set bias of the conv layer.
2. Spectral norm is supported.
3. More padding modes are supported. Before PyTorch 1.5, nn.Conv2d only supports zero and circular padding, and we add “reflect” padding mode.
- Parameters
in_channels (int) – Number of channels in the input feature map. Same as that in nn._ConvNd.
out_channels (int) – Number of channels produced by the convolution. Same as that in nn._ConvNd.
kernel_size (int | tuple[int]) – Size of the convolving kernel. Same as that in nn._ConvNd.
stride (int | tuple[int]) – Stride of the convolution. Same as that in nn._ConvNd.
padding (int | tuple[int]) – Zero-padding added to both sides of the input. Same as that in nn._ConvNd.
dilation (int | tuple[int]) – Spacing between kernel elements. Same as that in nn._ConvNd.
groups (int) – Number of blocked connections from input channels to output channels. Same as that in nn._ConvNd.
bias (bool | str) – If specified as “auto”, it will be decided by the norm_cfg. Bias will be set as True if norm_cfg is None, otherwise False. Default: “auto”.
conv_cfg (dict) – Config dict for convolution layer. Default: None, which means using conv2d.
norm_cfg (dict) – Config dict for normalization layer. Default: None.
act_cfg (dict) – Config dict for activation layer. Default: dict(type=’ReLU’).
inplace (bool) – Whether to use inplace mode for activation. Default: True.
with_spectral_norm (bool) – Whether to use spectral norm in conv module. Default: False.
padding_mode (str) – If the padding_mode has not been supported by current Conv2d in PyTorch, we will use our own padding layer instead. Currently, we support [‘zeros’, ‘circular’] with official implementation and [‘reflect’] with our own implementation. Default: ‘zeros’.
order (tuple[str]) – The order of conv/norm/activation layers. It is a sequence of “conv”, “norm” and “act”. Common examples are (“conv”, “norm”, “act”) and (“act”, “conv”, “norm”). Default: (‘conv’, ‘norm’, ‘act’).
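For illustration, a minimal conv-norm-act block (the channel sizes are illustrative):
>>> import torch
>>> from mmcv.cnn import ConvModule
>>> block = ConvModule(3, 8, 3, padding=1, norm_cfg=dict(type='BN'))
>>> x = torch.rand(1, 3, 32, 32)
>>> block(x).shape  # conv -> BN -> ReLU
torch.Size([1, 8, 32, 32])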
- forward(x: torch.Tensor, activate: bool = True, norm: bool = True) → torch.Tensor[source]¶
Defines the computation performed at every call.
Should be overridden by all subclasses.
Note
Although the recipe for forward pass needs to be defined within this function, one should call the Module instance afterwards instead of this since the former takes care of running the registered hooks while the latter silently ignores them.
- class mmcv.cnn.ConvTranspose2d(in_channels: int, out_channels: int, kernel_size: Union[int, Tuple[int, int]], stride: Union[int, Tuple[int, int]] = 1, padding: Union[int, Tuple[int, int]] = 0, output_padding: Union[int, Tuple[int, int]] = 0, groups: int = 1, bias: bool = True, dilation: Union[int, Tuple[int, int]] = 1, padding_mode: str = 'zeros', device=None, dtype=None)[source]¶
- forward(x: torch.Tensor) → torch.Tensor[source]¶
Defines the computation performed at every call.
Should be overridden by all subclasses.
Note
Although the recipe for forward pass needs to be defined within this function, one should call the Module instance afterwards instead of this since the former takes care of running the registered hooks while the latter silently ignores them.
- class mmcv.cnn.ConvTranspose3d(in_channels: int, out_channels: int, kernel_size: Union[int, Tuple[int, int, int]], stride: Union[int, Tuple[int, int, int]] = 1, padding: Union[int, Tuple[int, int, int]] = 0, output_padding: Union[int, Tuple[int, int, int]] = 0, groups: int = 1, bias: bool = True, dilation: Union[int, Tuple[int, int, int]] = 1, padding_mode: str = 'zeros', device=None, dtype=None)[source]¶
- forward(x: torch.Tensor) → torch.Tensor[source]¶
Defines the computation performed at every call.
Should be overridden by all subclasses.
Note
Although the recipe for forward pass needs to be defined within this function, one should call the Module instance afterwards instead of this since the former takes care of running the registered hooks while the latter silently ignores them.
- class mmcv.cnn.ConvWS2d(in_channels: int, out_channels: int, kernel_size: Union[int, Tuple[int, int]], stride: Union[int, Tuple[int, int]] = 1, padding: Union[int, Tuple[int, int]] = 0, dilation: Union[int, Tuple[int, int]] = 1, groups: int = 1, bias: bool = True, eps: float = 1e-05)[source]¶
- forward(x: torch.Tensor) → torch.Tensor[source]¶
Defines the computation performed at every call.
Should be overridden by all subclasses.
Note
Although the recipe for forward pass needs to be defined within this function, one should call the Module instance afterwards instead of this since the former takes care of running the registered hooks while the latter silently ignores them.
- class mmcv.cnn.DepthwiseSeparableConvModule(in_channels: int, out_channels: int, kernel_size: Union[int, Tuple[int, int]], stride: Union[int, Tuple[int, int]] = 1, padding: Union[int, Tuple[int, int]] = 0, dilation: Union[int, Tuple[int, int]] = 1, norm_cfg: Optional[Dict] = None, act_cfg: Dict = {'type': 'ReLU'}, dw_norm_cfg: Union[Dict, str] = 'default', dw_act_cfg: Union[Dict, str] = 'default', pw_norm_cfg: Union[Dict, str] = 'default', pw_act_cfg: Union[Dict, str] = 'default', **kwargs)[source]¶
Depthwise separable convolution module.
See https://arxiv.org/pdf/1704.04861.pdf for details.
This module can replace a ConvModule with the conv block replaced by two conv blocks: a depthwise conv block and a pointwise conv block. The depthwise conv block contains depthwise-conv/norm/activation layers. The pointwise conv block contains pointwise-conv/norm/activation layers. It should be noted that there will be norm/activation layers in the depthwise conv block if norm_cfg and act_cfg are specified.
- Parameters
in_channels (int) – Number of channels in the input feature map. Same as that in nn._ConvNd.
out_channels (int) – Number of channels produced by the convolution. Same as that in nn._ConvNd.
kernel_size (int | tuple[int]) – Size of the convolving kernel. Same as that in nn._ConvNd.
stride (int | tuple[int]) – Stride of the convolution. Same as that in nn._ConvNd. Default: 1.
padding (int | tuple[int]) – Zero-padding added to both sides of the input. Same as that in nn._ConvNd. Default: 0.
dilation (int | tuple[int]) – Spacing between kernel elements. Same as that in nn._ConvNd. Default: 1.
norm_cfg (dict) – Default norm config for both depthwise ConvModule and pointwise ConvModule. Default: None.
act_cfg (dict) – Default activation config for both depthwise ConvModule and pointwise ConvModule. Default: dict(type=’ReLU’).
dw_norm_cfg (dict) – Norm config of depthwise ConvModule. If it is ‘default’, it will be the same as norm_cfg. Default: ‘default’.
dw_act_cfg (dict) – Activation config of depthwise ConvModule. If it is ‘default’, it will be the same as act_cfg. Default: ‘default’.
pw_norm_cfg (dict) – Norm config of pointwise ConvModule. If it is ‘default’, it will be the same as norm_cfg. Default: ‘default’.
pw_act_cfg (dict) – Activation config of pointwise ConvModule. If it is ‘default’, it will be the same as act_cfg. Default: ‘default’.
kwargs (optional) – Other shared arguments for depthwise and pointwise ConvModule. See ConvModule for ref.
- forward(x: torch.Tensor) → torch.Tensor[source]¶
Defines the computation performed at every call.
Should be overridden by all subclasses.
Note
Although the recipe for forward pass needs to be defined within this function, one should call the Module instance afterwards instead of this since the former takes care of running the registered hooks while the latter silently ignores them.
- class mmcv.cnn.GeneralizedAttention(in_channels: int, spatial_range: int = -1, num_heads: int = 9, position_embedding_dim: int = -1, position_magnitude: int = 1, kv_stride: int = 2, q_stride: int = 1, attention_type: str = '1111')[source]¶
GeneralizedAttention module.
See ‘An Empirical Study of Spatial Attention Mechanisms in Deep Networks’ (https://arxiv.org/abs/1904.05873) for details.
- Parameters
in_channels (int) – Channels of the input feature map.
spatial_range (int) – The spatial range. -1 indicates no spatial range constraint. Default: -1.
num_heads (int) – The head number of empirical_attention module. Default: 9.
position_embedding_dim (int) – The position embedding dimension. Default: -1.
position_magnitude (int) – A multiplier acting on coord difference. Default: 1.
kv_stride (int) – The feature stride acting on key/value feature map. Default: 2.
q_stride (int) – The feature stride acting on query feature map. Default: 1.
attention_type (str) –
A binary indicator string for indicating which items in generalized empirical_attention module are used. Default: ‘1111’.
’1000’ indicates ‘query and key content’ (appr - appr) item,
’0100’ indicates ‘query content and relative position’ (appr - position) item,
’0010’ indicates ‘key content only’ (bias - appr) item,
’0001’ indicates ‘relative position only’ (bias - position) item.
- forward(x_input: torch.Tensor) → torch.Tensor[source]¶
Defines the computation performed at every call.
Should be overridden by all subclasses.
Note
Although the recipe for forward pass needs to be defined within this function, one should call the Module instance afterwards instead of this since the former takes care of running the registered hooks while the latter silently ignores them.
- class mmcv.cnn.HSigmoid(bias: float = 3.0, divisor: float = 6.0, min_value: float = 0.0, max_value: float = 1.0)[source]¶
Hard Sigmoid Module. Apply the hard sigmoid function:
Hsigmoid(x) = min(max((x + bias) / divisor, min_value), max_value)
Default: Hsigmoid(x) = min(max((x + 3) / 6, 0), 1)
Note
In MMCV v1.4.4, we modified the default value of args to align with PyTorch official.
- Parameters
bias (float) – Bias of the input feature map. Default: 3.0.
divisor (float) – Divisor of the input feature map. Default: 6.0.
min_value (float) – Lower bound value. Default: 0.0.
max_value (float) – Upper bound value. Default: 1.0.
- Returns
The output tensor.
- Return type
Tensor
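Sketch checking the default form min(max((x + 3) / 6, 0), 1):
>>> import torch
>>> from mmcv.cnn import HSigmoid
>>> m = HSigmoid()
>>> out = m(torch.tensor([-3., 0., 3.]))  # -> [0.0, 0.5, 1.0]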
- forward(x: torch.Tensor) → torch.Tensor[source]¶
Defines the computation performed at every call.
Should be overridden by all subclasses.
Note
Although the recipe for forward pass needs to be defined within this function, one should call the Module instance afterwards instead of this since the former takes care of running the registered hooks while the latter silently ignores them.
- class mmcv.cnn.HSwish(inplace: bool = False)[source]¶
Hard Swish Module.
This module applies the hard swish function:
\[Hswish(x) = x * ReLU6(x + 3) / 6\]
- Parameters
inplace (bool) – can optionally do the operation in-place. Default: False.
- Returns
The output tensor.
- Return type
Tensor
- forward(x: torch.Tensor) → torch.Tensor[source]¶
Defines the computation performed at every call.
Should be overridden by all subclasses.
Note
Although the recipe for forward pass needs to be defined within this function, one should call the Module instance afterwards instead of this since the former takes care of running the registered hooks while the latter silently ignores them.
- class mmcv.cnn.KaimingInit(a: float = 0, mode: str = 'fan_out', nonlinearity: str = 'relu', distribution: str = 'normal', **kwargs)[source]¶
Initialize module parameters with the values according to the method described in Delving deep into rectifiers: Surpassing human-level performance on ImageNet classification - He, K. et al. (2015) (https://www.cv-foundation.org/openaccess/content_iccv_2015/papers/He_Delving_Deep_into_ICCV_2015_paper.pdf).
- Parameters
a (int | float) – the negative slope of the rectifier used after this layer (only used with 'leaky_relu'). Defaults to 0.
mode (str) – either 'fan_in' or 'fan_out'. Choosing 'fan_in' preserves the magnitude of the variance of the weights in the forward pass. Choosing 'fan_out' preserves the magnitudes in the backwards pass. Defaults to 'fan_out'.
nonlinearity (str) – the non-linear function (nn.functional name), recommended to use only with 'relu' or 'leaky_relu'. Defaults to 'relu'.
bias (int | float) – the value to fill the bias. Defaults to 0.
bias_prob (float, optional) – the probability for bias initialization. Defaults to None.
distribution (str) – the distribution, either 'normal' or 'uniform'. Defaults to 'normal'.
layer (str | list[str], optional) – the layer will be initialized. Defaults to None.
- class mmcv.cnn.Linear(in_features: int, out_features: int, bias: bool = True, device=None, dtype=None)[source]¶
- forward(x: torch.Tensor) → torch.Tensor[source]¶
Defines the computation performed at every call.
Should be overridden by all subclasses.
Note
Although the recipe for forward pass needs to be defined within this function, one should call the Module instance afterwards instead of this since the former takes care of running the registered hooks while the latter silently ignores them.
- class mmcv.cnn.MaxPool2d(kernel_size: Union[int, Tuple[int, ...]], stride: Optional[Union[int, Tuple[int, ...]]] = None, padding: Union[int, Tuple[int, ...]] = 0, dilation: Union[int, Tuple[int, ...]] = 1, return_indices: bool = False, ceil_mode: bool = False)[source]¶
- forward(x: torch.Tensor) → torch.Tensor[source]¶
Defines the computation performed at every call.
Should be overridden by all subclasses.
Note
Although the recipe for forward pass needs to be defined within this function, one should call the Module instance afterwards instead of this since the former takes care of running the registered hooks while the latter silently ignores them.
- class mmcv.cnn.MaxPool3d(kernel_size: Union[int, Tuple[int, ...]], stride: Optional[Union[int, Tuple[int, ...]]] = None, padding: Union[int, Tuple[int, ...]] = 0, dilation: Union[int, Tuple[int, ...]] = 1, return_indices: bool = False, ceil_mode: bool = False)[source]¶
- forward(x: torch.Tensor) → torch.Tensor[source]¶
Defines the computation performed at every call.
Should be overridden by all subclasses.
Note
Although the recipe for forward pass needs to be defined within this function, one should call the Module instance afterwards instead of this since the former takes care of running the registered hooks while the latter silently ignores them.
- class mmcv.cnn.NonLocal1d(in_channels: int, sub_sample: bool = False, conv_cfg: Dict = {'type': 'Conv1d'}, **kwargs)[source]¶
1D Non-local module.
- Parameters
in_channels (int) – Same as NonLocalND.
sub_sample (bool) – Whether to apply max pooling after pairwise function (Note that the sub_sample is applied on spatial only). Default: False.
conv_cfg (None | dict) – Same as NonLocalND. Default: dict(type=’Conv1d’).
- class mmcv.cnn.NonLocal2d(in_channels: int, sub_sample: bool = False, conv_cfg: Dict = {'type': 'Conv2d'}, **kwargs)[source]¶
2D Non-local module.
- Parameters
in_channels (int) – Same as NonLocalND.
sub_sample (bool) – Whether to apply max pooling after pairwise function (Note that the sub_sample is applied on spatial only). Default: False.
conv_cfg (None | dict) – Same as NonLocalND. Default: dict(type=’Conv2d’).
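Example
A minimal forward sketch; the shapes are illustrative and the block preserves the input shape:
>>> import torch
>>> from mmcv.cnn import NonLocal2d
>>> block = NonLocal2d(in_channels=16)
>>> x = torch.randn(2, 16, 8, 8)
>>> out = block(x)  # same shape as x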
- class mmcv.cnn.NonLocal3d(in_channels: int, sub_sample: bool = False, conv_cfg: Dict = {'type': 'Conv3d'}, **kwargs)[source]¶
3D Non-local module.
- Parameters
in_channels (int) – Same as NonLocalND.
sub_sample (bool) – Whether to apply max pooling after pairwise function (Note that the sub_sample is applied on spatial only). Default: False.
conv_cfg (None | dict) – Same as NonLocalND. Default: dict(type=’Conv3d’).
- class mmcv.cnn.NormalInit(mean: float = 0, std: float = 1, **kwargs)[source]¶
Initialize module parameters with the values drawn from the normal distribution \(\mathcal{N}(\text{mean}, \text{std}^2)\).
- Parameters
mean (int | float) – the mean of the normal distribution. Defaults to 0.
std (int | float) – the standard deviation of the normal distribution. Defaults to 1.
bias (int | float) – the value to fill the bias. Defaults to 0.
bias_prob (float, optional) – the probability for bias initialization. Defaults to None.
layer (str | list[str], optional) – the layer will be initialized. Defaults to None.
- class mmcv.cnn.PretrainedInit(checkpoint: str, prefix: Optional[str] = None, map_location: Optional[str] = None)[source]¶
Initialize module by loading a pretrained model.
- Parameters
checkpoint (str) – the checkpoint file of the pretrained model to be loaded.
prefix (str, optional) – the prefix of a sub-module in the pretrained model. It is for loading a part of the pretrained model to initialize. For example, if we would like to only load the backbone of a detector model, we can set
prefix='backbone.'
. Defaults to None.map_location (str) – map tensors into proper locations.
- class mmcv.cnn.RFSearchHook(mode: str = 'search', config: Dict = {}, rfstructure_file: Optional[str] = None, by_epoch: bool = True, verbose: bool = True)[source]¶
Receptive field search via dilation rates.
Please refer to RF-Next: Efficient Receptive Field Search for Convolutional Neural Networks for more details.
- Parameters
mode (str, optional) – It can be set to the following types: ‘search’, ‘fixed_single_branch’, or ‘fixed_multi_branch’. Defaults to ‘search’.
config (Dict, optional) –
config dict of search. By default this config contains “search”, and config[“search”] must include:
”step”: recording the current searching step.
”max_step”: The maximum number of searching steps to update the structures.
”search_interval”: The interval (epoch/iteration) between two updates.
”exp_rate”: The controller of the sparsity of search space.
”init_alphas”: The value for initializing weights of each branch.
”mmin”: The minimum dilation rate.
”mmax”: The maximum dilation rate.
”num_branches”: The controller of the size of search space (the number of branches).
”skip_layer”: The modules in skip_layer will be ignored during the receptive field search.
rfstructure_file (str, optional) – Path to load searched receptive fields of the model. Defaults to None.
by_epoch (bool, optional) – Determines whether to perform the step by epoch or by iteration. If set to True, it will step by epoch. Otherwise, by iteration. Defaults to True.
verbose (bool) – Determines whether to print rf-next related logging messages. Defaults to True.
- estimate_and_expand(model: torch.nn.modules.module.Module)[source]¶
Estimate and search for RFConvOp.
- Parameters
model (nn.Module) – pytorch model
- init_model(model: torch.nn.modules.module.Module)[source]¶
Initialize the model with search ability.
- Parameters
model (nn.Module) – pytorch model
- Raises
NotImplementedError – only support three modes: search/fixed_single_branch/fixed_multi_branch
- set_model(model: torch.nn.modules.module.Module, search_op: str = 'Conv2d', init_rates: Optional[int] = None, prefix: str = '')[source]¶
Set up the model based on the config.
- Parameters
model (nn.Module) – pytorch model
search_op (str) – The module that uses RF search. Defaults to ‘Conv2d’.
init_rates (int, optional) – Set to other initial dilation rates. Defaults to None.
prefix (str) – Prefix for function recursion. Defaults to ‘’.
- step(model: torch.nn.modules.module.Module, work_dir: str)[source]¶
Performs a dilation searching step.
- Parameters
model (nn.Module) – pytorch model
work_dir (str) – Directory to save the searching results.
- wrap_model(model: torch.nn.modules.module.Module, search_op: str = 'Conv2d', prefix: str = '')[source]¶
Wrap the model to support a searchable conv op.
- Parameters
model (nn.Module) – pytorch model
search_op (str) – The module that uses RF search. Defaults to ‘Conv2d’.
prefix (str) – Prefix for function recursion. Defaults to ‘’.
- class mmcv.cnn.ResNet(depth: int, num_stages: int = 4, strides: Sequence[int] = (1, 2, 2, 2), dilations: Sequence[int] = (1, 1, 1, 1), out_indices: Sequence[int] = (0, 1, 2, 3), style: str = 'pytorch', frozen_stages: int = - 1, bn_eval: bool = True, bn_frozen: bool = False, with_cp: bool = False)[source]¶
ResNet backbone.
- Parameters
depth (int) – Depth of resnet, from {18, 34, 50, 101, 152}.
num_stages (int) – Resnet stages, normally 4.
strides (Sequence[int]) – Strides of the first block of each stage.
dilations (Sequence[int]) – Dilation of each stage.
out_indices (Sequence[int]) – Output from which stages.
style (str) – pytorch or caffe. If set to “pytorch”, the stride-two layer is the 3x3 conv layer, otherwise the stride-two layer is the first 1x1 conv layer.
frozen_stages (int) – Stages to be frozen (all param fixed). -1 means not freezing any parameters.
bn_eval (bool) – Whether to set BN layers as eval mode, namely, freeze running stats (mean and var).
bn_frozen (bool) – Whether to freeze weight and bias of BN layers.
with_cp (bool) – Use checkpoint or not. Using checkpoint will save some memory while slowing down the training speed.
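Example
A minimal forward sketch with randomly initialized weights; the input size is illustrative:
>>> import torch
>>> from mmcv.cnn import ResNet
>>> model = ResNet(depth=18)
>>> feats = model(torch.randn(1, 3, 64, 64))  # tuple with one feature map per stage in out_indices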
- forward(x: torch.Tensor) → Union[torch.Tensor, Tuple[torch.Tensor]][source]¶
Defines the computation performed at every call.
Should be overridden by all subclasses.
Note
Although the recipe for forward pass needs to be defined within this function, one should call the
Module
instance afterwards instead of this since the former takes care of running the registered hooks while the latter silently ignores them.
- train(mode: bool = True) → None[source]¶
Sets the module in training mode.
This has an effect only on certain modules. See the documentation of particular modules for details of their behavior in training/evaluation mode, if they are affected, e.g.
Dropout
,BatchNorm
, etc.- Parameters
mode (bool) – whether to set training mode (
True
) or evaluation mode (False
). Default:True
.- Returns
self
- Return type
Module
- class mmcv.cnn.Scale(scale: float = 1.0)[source]¶
A learnable scale parameter.
This layer scales the input by a learnable factor. It multiplies a learnable scale parameter of shape (1,) with input of any shape.
- Parameters
scale (float) – Initial value of scale factor. Default: 1.0
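Example
A minimal sketch; the scale parameter is learnable and broadcast over any input shape:
>>> import torch
>>> from mmcv.cnn import Scale
>>> scale = Scale(scale=1.0)
>>> out = scale(torch.randn(2, 3))  # input multiplied by a learnable scalar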
- forward(x: torch.Tensor) → torch.Tensor[source]¶
Defines the computation performed at every call.
Should be overridden by all subclasses.
Note
Although the recipe for forward pass needs to be defined within this function, one should call the
Module
instance afterwards instead of this since the former takes care of running the registered hooks while the latter silently ignores them.
- class mmcv.cnn.Swish[source]¶
Swish Module.
This module applies the swish function:
\[Swish(x) = x * Sigmoid(x)\]- Returns
The output tensor.
- Return type
Tensor
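Example
A minimal sketch:
>>> import torch
>>> from mmcv.cnn import Swish
>>> act = Swish()
>>> y = act(torch.randn(4))  # equivalent to x * torch.sigmoid(x)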
- forward(x: torch.Tensor) → torch.Tensor[source]¶
Defines the computation performed at every call.
Should be overridden by all subclasses.
Note
Although the recipe for forward pass needs to be defined within this function, one should call the
Module
instance afterwards instead of this since the former takes care of running the registered hooks while the latter silently ignores them.
- class mmcv.cnn.TruncNormalInit(mean: float = 0, std: float = 1, a: float = - 2, b: float = 2, **kwargs)[source]¶
Initialize module parameters with the values drawn from the normal distribution \(\mathcal{N}(\text{mean}, \text{std}^2)\), with values outside \([a, b]\) redrawn until they are within the bounds.
- Parameters
mean (float) – the mean of the normal distribution. Defaults to 0.
std (float) – the standard deviation of the normal distribution. Defaults to 1.
a (float) – The minimum cutoff value.
b (float) – The maximum cutoff value.
bias (float) – the value to fill the bias. Defaults to 0.
bias_prob (float, optional) – the probability for bias initialization. Defaults to None.
layer (str | list[str], optional) – the layer will be initialized. Defaults to None.
- class mmcv.cnn.UniformInit(a: float = 0.0, b: float = 1.0, **kwargs)[source]¶
Initialize module parameters with values drawn from the uniform distribution \(\mathcal{U}(a, b)\).
- Parameters
a (int | float) – the lower bound of the uniform distribution. Defaults to 0.
b (int | float) – the upper bound of the uniform distribution. Defaults to 1.
bias (int | float) – the value to fill the bias. Defaults to 0.
bias_prob (float, optional) – the probability for bias initialization. Defaults to None.
layer (str | list[str], optional) – the layer will be initialized. Defaults to None.
- class mmcv.cnn.VGG(depth: int, with_bn: bool = False, num_classes: int = - 1, num_stages: int = 5, dilations: Sequence[int] = (1, 1, 1, 1, 1), out_indices: Sequence[int] = (0, 1, 2, 3, 4), frozen_stages: int = - 1, bn_eval: bool = True, bn_frozen: bool = False, ceil_mode: bool = False, with_last_pool: bool = True)[source]¶
VGG backbone.
- Parameters
depth (int) – Depth of vgg, from {11, 13, 16, 19}.
with_bn (bool) – Use BatchNorm or not.
num_classes (int) – number of classes for classification.
num_stages (int) – VGG stages, normally 5.
dilations (Sequence[int]) – Dilation of each stage.
out_indices (Sequence[int]) – Output from which stages.
frozen_stages (int) – Stages to be frozen (all param fixed). -1 means not freezing any parameters.
bn_eval (bool) – Whether to set BN layers as eval mode, namely, freeze running stats (mean and var).
bn_frozen (bool) – Whether to freeze weight and bias of BN layers.
- forward(x: torch.Tensor) → Union[torch.Tensor, Tuple[torch.Tensor, ...]][source]¶
Defines the computation performed at every call.
Should be overridden by all subclasses.
Note
Although the recipe for forward pass needs to be defined within this function, one should call the
Module
instance afterwards instead of this since the former takes care of running the registered hooks while the latter silently ignores them.
- train(mode: bool = True) → None[source]¶
Sets the module in training mode.
This has an effect only on certain modules. See the documentation of particular modules for details of their behavior in training/evaluation mode, if they are affected, e.g.
Dropout
,BatchNorm
, etc.- Parameters
mode (bool) – whether to set training mode (
True
) or evaluation mode (False
). Default:True
.- Returns
self
- Return type
Module
- class mmcv.cnn.XavierInit(gain: float = 1, distribution: str = 'normal', **kwargs)[source]¶
Initialize module parameters with values according to the method described in `Understanding the difficulty of training deep feedforward neural networks - Glorot, X. & Bengio, Y. (2010). <http://proceedings.mlr.press/v9/glorot10a/glorot10a.pdf>`_
- Parameters
gain (int | float) – an optional scaling factor. Defaults to 1.
bias (int | float) – the value to fill the bias. Defaults to 0.
bias_prob (float, optional) – the probability for bias initialization. Defaults to None.
distribution (str) – distribution either be
'normal'
or'uniform'
. Defaults to'normal'
.layer (str | list[str], optional) – the layer will be initialized. Defaults to None.
- mmcv.cnn.bias_init_with_prob(prior_prob: float) → float[source]¶
Initialize conv/fc bias value according to a given probability value.
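The returned bias b satisfies sigmoid(b) = prior_prob, i.e. b = -log((1 - prior_prob) / prior_prob); this is commonly used for focal-loss-style classification heads. For example:
>>> from mmcv.cnn import bias_init_with_prob
>>> bias = bias_init_with_prob(0.01)  # ≈ -4.595, so that sigmoid(bias) == 0.01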
- mmcv.cnn.build_activation_layer(cfg: Dict) → torch.nn.modules.module.Module[source]¶
Build activation layer.
- Parameters
cfg (dict) –
The activation layer config, which should contain:
type (str): Layer type.
layer args: Args needed to instantiate an activation layer.
- Returns
Created activation layer.
- Return type
nn.Module
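Example
For instance, the following builds an nn.ReLU:
>>> from mmcv.cnn import build_activation_layer
>>> act = build_activation_layer(dict(type='ReLU', inplace=True))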
- mmcv.cnn.build_conv_layer(cfg: Optional[Dict], *args, **kwargs) → torch.nn.modules.module.Module[source]¶
Build convolution layer.
- Parameters
cfg (None or dict) – The conv layer config, which should contain: - type (str): Layer type. - layer args: Args needed to instantiate a conv layer.
args (argument list) – Arguments passed to the __init__ method of the corresponding conv layer.
kwargs (keyword arguments) – Keyword arguments passed to the __init__ method of the corresponding conv layer.
- Returns
Created conv layer.
- Return type
nn.Module
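Example
A minimal sketch; passing None as cfg falls back to a plain nn.Conv2d:
>>> from mmcv.cnn import build_conv_layer
>>> conv = build_conv_layer(dict(type='Conv2d'), 3, 8, kernel_size=3)
>>> conv = build_conv_layer(None, 3, 8, kernel_size=3)  # defaults to nn.Conv2d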
- mmcv.cnn.build_model_from_cfg(cfg, registry, default_args=None)[source]¶
Build a PyTorch model from config dict(s). Different from
build_from_cfg
, if cfg is a list, ann.Sequential
will be built.- Parameters
cfg (dict, list[dict]) – The config of modules; it is either a config dict or a list of config dicts. If cfg is a list, the built modules will be wrapped with
nn.Sequential
.registry (
Registry
) – A registry the module belongs to.default_args (dict, optional) – Default arguments to build the module. Defaults to None.
- Returns
A built nn module.
- Return type
nn.Module
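Example
A sketch with a toy registry; ToyNet and MODELS are hypothetical names defined only for illustration:
>>> import torch.nn as nn
>>> from mmcv.cnn import build_model_from_cfg
>>> from mmcv.utils import Registry
>>> MODELS = Registry('model', build_func=build_model_from_cfg)
>>> @MODELS.register_module()
... class ToyNet(nn.Module):
...     def __init__(self, depth):
...         super().__init__()
...         self.depth = depth
>>> net = MODELS.build(dict(type='ToyNet', depth=2))
>>> seq = MODELS.build([dict(type='ToyNet', depth=2),
...                     dict(type='ToyNet', depth=4)])  # wrapped in nn.Sequential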
- mmcv.cnn.build_norm_layer(cfg: Dict, num_features: int, postfix: Union[int, str] = '') → Tuple[str, torch.nn.modules.module.Module][source]¶
Build normalization layer.
- Parameters
cfg (dict) –
The norm layer config, which should contain:
type (str): Layer type.
layer args: Args needed to instantiate a norm layer.
requires_grad (bool, optional): Whether to stop gradient updates.
num_features (int) – Number of input channels.
postfix (int | str) – The postfix to be appended into norm abbreviation to create named layer.
- Returns
The first element is the layer name consisting of abbreviation and postfix, e.g., bn1, gn. The second element is the created norm layer.
- Return type
tuple[str, nn.Module]
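Example
For instance:
>>> from mmcv.cnn import build_norm_layer
>>> name, bn = build_norm_layer(dict(type='BN'), 64)  # name == 'bn'
>>> name, gn = build_norm_layer(dict(type='GN', num_groups=8), 64, postfix=1)  # name == 'gn1'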
- mmcv.cnn.build_padding_layer(cfg: Dict, *args, **kwargs) → torch.nn.modules.module.Module[source]¶
Build padding layer.
- Parameters
cfg (dict) – The padding layer config, which should contain: - type (str): Layer type. - layer args: Args needed to instantiate a padding layer.
- Returns
Created padding layer.
- Return type
nn.Module
- mmcv.cnn.build_plugin_layer(cfg: Dict, postfix: Union[int, str] = '', **kwargs) → Tuple[str, torch.nn.modules.module.Module][source]¶
Build plugin layer.
- Parameters
cfg (dict) –
cfg should contain:
type (str): identify plugin layer type.
layer args: args needed to instantiate a plugin layer.
postfix (int, str) – appended into norm abbreviation to create named layer. Default: ‘’.
- Returns
The first one is the concatenation of abbreviation and postfix. The second is the created plugin layer.
- Return type
tuple[str, nn.Module]
- mmcv.cnn.build_upsample_layer(cfg: Dict, *args, **kwargs) → torch.nn.modules.module.Module[source]¶
Build upsample layer.
- Parameters
cfg (dict) –
The upsample layer config, which should contain:
type (str): Layer type.
scale_factor (int): Upsample ratio, which is not applicable to deconv.
layer args: Args needed to instantiate an upsample layer.
args (argument list) – Arguments passed to the
__init__
method of the corresponding upsample layer.kwargs (keyword arguments) – Keyword arguments passed to the
__init__
method of the corresponding upsample layer.
- Returns
Created upsample layer.
- Return type
nn.Module
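Example
A minimal sketch; 'nearest' maps to nn.Upsample and 'deconv' to nn.ConvTranspose2d:
>>> from mmcv.cnn import build_upsample_layer
>>> up = build_upsample_layer(dict(type='nearest', scale_factor=2))
>>> deconv = build_upsample_layer(
...     dict(type='deconv', in_channels=8, out_channels=8, kernel_size=2, stride=2))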
- mmcv.cnn.fuse_conv_bn(module: torch.nn.modules.module.Module) → torch.nn.modules.module.Module[source]¶
Recursively fuse conv and bn in a module.
During inference, the functionality of batch norm layers is turned off; only the per-channel mean and var are used, which exposes the chance to fuse it with the preceding conv layers to save computation and simplify network structures.
- Parameters
module (nn.Module) – Module to be fused.
- Returns
Fused module.
- Return type
nn.Module
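Example
A minimal sketch; fusion is typically applied to a model in eval mode before deployment:
>>> import torch.nn as nn
>>> from mmcv.cnn import fuse_conv_bn
>>> model = nn.Sequential(nn.Conv2d(3, 8, 3), nn.BatchNorm2d(8), nn.ReLU())
>>> fused = fuse_conv_bn(model.eval())  # BN statistics folded into the conv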
- mmcv.cnn.get_model_complexity_info(model: torch.nn.modules.module.Module, input_shape: tuple, print_per_layer_stat: bool = True, as_strings: bool = True, input_constructor: Optional[Callable] = None, flush: bool = False, ost: TextIO = <_io.TextIOWrapper name='<stdout>' mode='w' encoding='UTF-8'>) → tuple[source]¶
Get complexity information of a model.
This method can calculate FLOPs and parameter counts of a model with corresponding input shape. It can also print complexity information for each layer in a model.
- Supported layers are listed as below:
Convolutions:
nn.Conv1d
,nn.Conv2d
,nn.Conv3d
.Activations:
nn.ReLU
,nn.PReLU
,nn.ELU
,nn.LeakyReLU
,nn.ReLU6
.Poolings:
nn.MaxPool1d
,nn.MaxPool2d
,nn.MaxPool3d
,nn.AvgPool1d
,nn.AvgPool2d
,nn.AvgPool3d
,nn.AdaptiveMaxPool1d
,nn.AdaptiveMaxPool2d
,nn.AdaptiveMaxPool3d
,nn.AdaptiveAvgPool1d
,nn.AdaptiveAvgPool2d
,nn.AdaptiveAvgPool3d
.BatchNorms:
nn.BatchNorm1d
,nn.BatchNorm2d
,nn.BatchNorm3d
,nn.GroupNorm
,nn.InstanceNorm1d
,InstanceNorm2d
,InstanceNorm3d
,nn.LayerNorm
.Linear:
nn.Linear
.Deconvolution:
nn.ConvTranspose2d
.Upsample:
nn.Upsample
.
- Parameters
model (nn.Module) – The model for complexity calculation.
input_shape (tuple) – Input shape used for calculation.
print_per_layer_stat (bool) – Whether to print complexity information for each layer in a model. Default: True.
as_strings (bool) – Output FLOPs and params counts in a string form. Default: True.
input_constructor (None | callable) – If specified, it takes a callable method that generates input. Otherwise, it will generate a random tensor with input shape to calculate FLOPs. Default: None.
flush (bool) – same as that in
print()
. Default: False.ost (stream) – same as
file
param inprint()
. Default: sys.stdout.
- Returns
If
as_strings
is set to True, it will return FLOPs and parameter counts in a string format. Otherwise, it will return those in a float number format.- Return type
tuple[float | str]
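Example
A minimal sketch; with as_strings=True (the default) both values are returned as human-readable strings:
>>> import torch.nn as nn
>>> from mmcv.cnn import get_model_complexity_info
>>> model = nn.Sequential(nn.Conv2d(3, 8, 3, padding=1), nn.ReLU())
>>> flops, params = get_model_complexity_info(
...     model, (3, 32, 32), print_per_layer_stat=False)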
- mmcv.cnn.initialize(module: torch.nn.modules.module.Module, init_cfg: Union[Dict, List[dict]]) → None[source]¶
Initialize a module.
- Parameters
module (
torch.nn.Module
) – the module will be initialized.init_cfg (dict | list[dict]) – initialization configuration dict to define initializer. OpenMMLab has implemented 6 initializers including
Constant
,Xavier
,Normal
,Uniform
,Kaiming
, andPretrained
.
Example
>>> module = nn.Linear(2, 3, bias=True) >>> init_cfg = dict(type='Constant', layer='Linear', val =1 , bias =2) >>> initialize(module, init_cfg)
>>> module = nn.Sequential(nn.Conv1d(3, 1, 3), nn.Linear(1,2)) >>> # define key ``'layer'`` for initializing layer with different >>> # configuration >>> init_cfg = [dict(type='Constant', layer='Conv1d', val=1), dict(type='Constant', layer='Linear', val=2)] >>> initialize(module, init_cfg)
>>> # define key``'override'`` to initialize some specific part in >>> # module >>> class FooNet(nn.Module): >>> def __init__(self): >>> super().__init__() >>> self.feat = nn.Conv2d(3, 16, 3) >>> self.reg = nn.Conv2d(16, 10, 3) >>> self.cls = nn.Conv2d(16, 5, 3) >>> model = FooNet() >>> init_cfg = dict(type='Constant', val=1, bias=2, layer='Conv2d', >>> override=dict(type='Constant', name='reg', val=3, bias=4)) >>> initialize(model, init_cfg)
>>> model = ResNet(depth=50) >>> # Initialize weights with the pretrained model. >>> init_cfg = dict(type='Pretrained', checkpoint='torchvision://resnet50') >>> initialize(model, init_cfg)
>>> # Initialize weights of a sub-module with the specific part of >>> # a pretrained model by using "prefix". >>> url = 'http://download.openmmlab.com/mmdetection/v2.0/retinanet/'\ >>> 'retinanet_r50_fpn_1x_coco/'\ >>> 'retinanet_r50_fpn_1x_coco_20200130-c2398f9e.pth' >>> init_cfg = dict(type='Pretrained', checkpoint=url, prefix='backbone.')
- mmcv.cnn.is_norm(layer: torch.nn.modules.module.Module, exclude: Optional[Union[type, tuple]] = None) → bool[source]¶
Check if a layer is a normalization layer.
- Parameters
layer (nn.Module) – The layer to be checked.
exclude (type | tuple[type]) – Types to be excluded.
- Returns
Whether the layer is a norm layer.
- Return type
bool
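Example
For instance:
>>> import torch.nn as nn
>>> from mmcv.cnn import is_norm
>>> is_norm(nn.BatchNorm2d(4))
True
>>> is_norm(nn.BatchNorm2d(4), exclude=(nn.BatchNorm2d,))
False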
runner¶
- class mmcv.runner.BaseModule(init_cfg: Optional[dict] = None)[source]¶
Base module for all modules in OpenMMLab.
BaseModule
is a wrapper oftorch.nn.Module
with additional functionality of parameter initialization. Compared withtorch.nn.Module
,BaseModule
mainly adds three attributes.init_cfg
: the config to control the initialization.init_weights
: The function of parameter initialization and recording initialization information._params_init_info
: Used to track the parameter initialization information. This attribute only exists during executing theinit_weights
.
- Parameters
init_cfg (dict, optional) – Initialization config dict.
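Example
A minimal subclassing sketch; ToyNet is a hypothetical name used only for illustration:
>>> import torch.nn as nn
>>> from mmcv.runner import BaseModule
>>> class ToyNet(BaseModule):
...     def __init__(self, init_cfg=dict(type='Constant', layer='Linear', val=1)):
...         super().__init__(init_cfg)
...         self.fc = nn.Linear(2, 2)
>>> model = ToyNet()
>>> model.init_weights()  # applies init_cfg to the Linear layer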
- class mmcv.runner.BaseRunner(model: torch.nn.modules.module.Module, batch_processor: Optional[Callable] = None, optimizer: Optional[Union[Dict, torch.optim.optimizer.Optimizer]] = None, work_dir: Optional[str] = None, logger: Optional[logging.Logger] = None, meta: Optional[Dict] = None, max_iters: Optional[int] = None, max_epochs: Optional[int] = None)[source]¶
The base class of Runner, a training helper for PyTorch.
All subclasses should implement the following APIs:
run()
train()
val()
save_checkpoint()
- Parameters
model (
torch.nn.Module
) – The model to be run.batch_processor (callable) – A callable method that process a data batch. The interface of this method should be batch_processor(model, data, train_mode) -> dict
optimizer (dict or
torch.optim.Optimizer
) – It can be either an optimizer (in most cases) or a dict of optimizers (in models that require more than one optimizer, e.g., GAN).work_dir (str, optional) – The working directory to save checkpoints and logs. Defaults to None.
logger (
logging.Logger
) – Logger used during training. Defaults to None. (The default value is just for backward compatibility)meta (dict | None) – A dict records some import information such as environment info and seed, which will be logged in logger hook. Defaults to None.
max_epochs (int, optional) – Total training epochs.
max_iters (int, optional) – Total training iterations.
- call_hook(fn_name: str) → None[source]¶
Call all hooks.
- Parameters
fn_name (str) – The function name in each hook to be called, such as “before_train_epoch”.
- current_lr() → Union[List[float], Dict[str, List[float]]][source]¶
Get current learning rates.
- Returns
Current learning rates of all param groups. If the runner has a dict of optimizers, this method will return a dict.
- Return type
list[float] | dict[str, list[float]]
- current_momentum() → Union[List[float], Dict[str, List[float]]][source]¶
Get current momentums.
- Returns
Current momentums of all param groups. If the runner has a dict of optimizers, this method will return a dict.
- Return type
list[float] | dict[str, list[float]]
- property epoch: int¶
Current epoch.
- Type
int
- property hooks: List[mmcv.runner.hooks.hook.Hook]¶
A list of registered hooks.
- Type
list[
Hook
]
- property inner_iter: int¶
Iteration in an epoch.
- Type
int
- property iter: int¶
Current iteration.
- Type
int
- property max_epochs¶
Maximum training epochs.
- Type
int
- property max_iters¶
Maximum training iterations.
- Type
int
- property model_name: str¶
Name of the model, usually the module class name.
- Type
str
- property rank: int¶
Rank of current process. (distributed training)
- Type
int
- register_hook(hook: mmcv.runner.hooks.hook.Hook, priority: Union[int, str, mmcv.runner.priority.Priority] = 'NORMAL') → None[source]¶
Register a hook into the hook list.
The hook will be inserted into a priority queue, with the specified priority (See
Priority
for details of priorities). For hooks with the same priority, they will be triggered in the same order as they are registered.- Parameters
hook (
Hook
) – The hook to be registered.priority (int or str or
Priority
) – Hook priority. Lower value means higher priority.
- register_hook_from_cfg(hook_cfg: Dict) → None[source]¶
Register a hook from its cfg.
- Parameters
hook_cfg (dict) – Hook config. It should have at least keys ‘type’ and ‘priority’ indicating its type and priority.
Note
The specific hook class to register should not use ‘type’ and ‘priority’ arguments during initialization.
- register_training_hooks(lr_config: Optional[Union[Dict, mmcv.runner.hooks.hook.Hook]], optimizer_config: Optional[Union[Dict, mmcv.runner.hooks.hook.Hook]] = None, checkpoint_config: Optional[Union[Dict, mmcv.runner.hooks.hook.Hook]] = None, log_config: Optional[Dict] = None, momentum_config: Optional[Union[Dict, mmcv.runner.hooks.hook.Hook]] = None, timer_config: Union[Dict, mmcv.runner.hooks.hook.Hook] = {'type': 'IterTimerHook'}, custom_hooks_config: Optional[Union[List, Dict, mmcv.runner.hooks.hook.Hook]] = None) → None[source]¶
Register default and custom hooks for training.
Default and custom hooks include:
LrUpdaterHook – VERY_HIGH (10)
MomentumUpdaterHook – HIGH (30)
OptimizerStepperHook – ABOVE_NORMAL (40)
CheckpointSaverHook – NORMAL (50)
IterTimerHook – LOW (70)
LoggerHook(s) – VERY_LOW (90)
CustomHook(s) – defaults to NORMAL (50)
If custom hooks have the same priority as default hooks, custom hooks will be triggered after default hooks.
- property world_size: int¶
Number of processes participating in the job. (distributed training)
- Type
int
- class mmcv.runner.CheckpointHook(interval: int = - 1, by_epoch: bool = True, save_optimizer: bool = True, out_dir: Optional[str] = None, max_keep_ckpts: int = - 1, save_last: bool = True, sync_buffer: bool = False, file_client_args: Optional[dict] = None, **kwargs)[source]¶
Save checkpoints periodically.
- Parameters
interval (int) – The saving period. If
by_epoch=True
, interval indicates epochs, otherwise it indicates iterations. Default: -1, which means “never”.by_epoch (bool) – Saving checkpoints by epoch or by iteration. Default: True.
save_optimizer (bool) – Whether to save optimizer state_dict in the checkpoint. It is usually used for resuming experiments. Default: True.
out_dir (str, optional) – The root directory to save checkpoints. If not specified,
runner.work_dir
will be used by default. If specified, theout_dir
will be the concatenation ofout_dir
and the last level directory ofrunner.work_dir
. Changed in version 1.3.16.max_keep_ckpts (int, optional) – The maximum checkpoints to keep. In some cases we want only the latest few checkpoints and would like to delete old ones to save the disk space. Default: -1, which means unlimited.
save_last (bool, optional) – Whether to force the last checkpoint to be saved regardless of interval. Default: True.
sync_buffer (bool, optional) – Whether to synchronize buffers in different gpus. Default: False.
file_client_args (dict, optional) – Arguments to instantiate a FileClient. See
mmcv.fileio.FileClient
for details. Default: None. New in version 1.3.16.
Warning
Before v1.3.16, the
out_dir
argument indicates the path where the checkpoint is stored. However, since v1.3.16,out_dir
indicates the root directory and the final path to save checkpoint is the concatenation ofout_dir
and the last level directory ofrunner.work_dir
. Suppose the value ofout_dir
is “/path/of/A” and the value ofrunner.work_dir
is “/path/of/B”, then the final path will be “/path/of/A/B”.
- class mmcv.runner.CheckpointLoader[source]¶
A general checkpoint loader to manage all schemes.
- classmethod load_checkpoint(filename: str, map_location: Optional[Union[str, Callable]] = None, logger: Optional[logging.Logger] = None) → Union[dict, collections.OrderedDict][source]¶
Load checkpoint through URL scheme path.
- Parameters
filename (str) – checkpoint file name with given prefix
map_location (str, optional) – Same as
torch.load()
. Default: Nonelogger (
logging.Logger
, optional) – The logger for message. Default: None
- Returns
The loaded checkpoint.
- Return type
dict or OrderedDict
- classmethod register_scheme(prefixes: Union[str, List[str], Tuple[str, ...]], loader: Optional[Callable] = None, force: bool = False) → Callable[source]¶
Register a loader to CheckpointLoader.
This method can be used as a normal class method or a decorator.
- Parameters
prefixes (str or Sequence[str]) – The prefix of the registered loader.
loader (function, optional) – The loader function to be registered. When this method is used as a decorator, loader is None. Defaults to None.
force (bool, optional) – Whether to override the loader if the prefix has already been registered. Defaults to False.
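Example
A decorator-style sketch; the 'toy://' prefix and loader are hypothetical:
>>> from mmcv.runner import CheckpointLoader
>>> @CheckpointLoader.register_scheme(prefixes='toy://')
... def load_from_toy(filename, map_location=None):
...     ...  # return a state dict for paths like 'toy://model.pth'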
- class mmcv.runner.ClearMLLoggerHook(init_kwargs: Optional[Dict] = None, interval: int = 10, ignore_last: bool = True, reset_flag: bool = False, by_epoch: bool = True)[source]¶
Class to log metrics with clearml.
It requires clearml to be installed.
- Parameters
init_kwargs (dict) – A dict containing the clearml.Task.init initialization keys. See taskinit for more details.
interval (int) – Logging interval (every k iterations). Default 10.
ignore_last (bool) – Ignore the log of last iterations in each epoch if less than interval. Default: True.
reset_flag (bool) – Whether to clear the output buffer after logging. Default: False.
by_epoch (bool) – Whether EpochBasedRunner is used. Default: True.
- class mmcv.runner.CosineAnnealingLrUpdaterHook(min_lr: Optional[float] = None, min_lr_ratio: Optional[float] = None, **kwargs)[source]¶
CosineAnnealing LR scheduler.
- Parameters
min_lr (float, optional) – The minimum lr. Default: None.
min_lr_ratio (float, optional) – The ratio of minimum lr to the base lr. Either min_lr or min_lr_ratio should be specified. Default: None.
- class mmcv.runner.CosineAnnealingMomentumUpdaterHook(min_momentum: Optional[float] = None, min_momentum_ratio: Optional[float] = None, **kwargs)[source]¶
Cosine annealing momentum scheduler that decays the momentum of each parameter group following a cosine annealing schedule.
- Parameters
min_momentum (float, optional) – The minimum momentum. Default: None.
min_momentum_ratio (float, optional) – The ratio of minimum momentum to the base momentum. Either min_momentum or min_momentum_ratio should be specified. Default: None.
- class mmcv.runner.CosineRestartLrUpdaterHook(periods: List[int], restart_weights: List[float] = [1], min_lr: Optional[float] = None, min_lr_ratio: Optional[float] = None, **kwargs)[source]¶
Cosine annealing with restarts learning rate scheme.
- Parameters
periods (list[int]) – Periods for each cosine annealing cycle.
restart_weights (list[float]) – Restart weights at each restart iteration. Defaults to [1].
min_lr (float, optional) – The minimum lr. Default: None.
min_lr_ratio (float, optional) – The ratio of minimum lr to the base lr. Either min_lr or min_lr_ratio should be specified. Default: None.
- class mmcv.runner.CyclicLrUpdaterHook(by_epoch: bool = False, target_ratio: Union[float, tuple] = (10, 0.0001), cyclic_times: int = 1, step_ratio_up: float = 0.4, anneal_strategy: str = 'cos', gamma: float = 1, **kwargs)[source]¶
Cyclic LR Scheduler.
Implement the cyclical learning rate policy (CLR) described in https://arxiv.org/pdf/1506.01186.pdf
Different from the original paper, we use cosine annealing rather than triangular policy inside a cycle. This improves the performance in the 3D detection area.
- Parameters
by_epoch (bool, optional) – Whether to update LR by epoch.
target_ratio (tuple[float], optional) – Relative ratio of the highest LR and the lowest LR to the initial LR.
cyclic_times (int, optional) – Number of cycles during training
step_ratio_up (float, optional) – The ratio of the increasing process of LR in the total cycle.
anneal_strategy (str, optional) – {‘cos’, ‘linear’} Specifies the annealing strategy: ‘cos’ for cosine annealing, ‘linear’ for linear annealing. Default: ‘cos’.
gamma (float, optional) – Cycle decay ratio. Default: 1. It takes values in the range (0, 1]. The difference between the maximum learning rate and the minimum learning rate decreases periodically when it is less than 1. New in version 1.4.4.
- class mmcv.runner.CyclicMomentumUpdaterHook(by_epoch: bool = False, target_ratio: Tuple[float, float] = (0.8947368421052632, 1.0), cyclic_times: int = 1, step_ratio_up: float = 0.4, anneal_strategy: str = 'cos', gamma: float = 1.0, **kwargs)[source]¶
Cyclic momentum Scheduler.
Implement the cyclical momentum scheduler policy described in https://arxiv.org/pdf/1708.07120.pdf
This momentum scheduler is usually used together with the CyclicLrUpdaterHook to improve performance in the 3D detection area.
- Parameters
target_ratio (tuple[float]) – Relative ratio of the lowest momentum and the highest momentum to the initial momentum.
cyclic_times (int) – Number of cycles during training
step_ratio_up (float) – The ratio of the increasing process of momentum in the total cycle.
by_epoch (bool) – Whether to update momentum by epoch.
anneal_strategy (str, optional) – {‘cos’, ‘linear’} Specifies the annealing strategy: ‘cos’ for cosine annealing, ‘linear’ for linear annealing. Default: ‘cos’.
gamma (float, optional) – Cycle decay ratio. Default: 1. It takes values in the range (0, 1]. The difference between the maximum learning rate and the minimum learning rate decreases periodically when it is less than 1. New in version 1.4.4.
- class mmcv.runner.DefaultOptimizerConstructor(optimizer_cfg: Dict, paramwise_cfg: Optional[Dict] = None)[source]¶
Default constructor for optimizers.
By default each parameter shares the same optimizer settings, and we provide an argument
paramwise_cfg
to specify parameter-wise settings. It is a dict and may contain the following fields:custom_keys
(dict): Specifies parameter-wise settings by keys. If one of the keys in custom_keys
is a substring of the name of one parameter, then the setting of the parameter will be specified bycustom_keys[key]
and other settings like bias_lr_mult
etc. will be ignored. It should be noted that the aforementionedkey
is the longest key that is a substring of the name of the parameter. If there are multiple matched keys with the same length, then the key with the lower alphabetical order will be chosen.custom_keys[key]
should be a dict and may contain fieldslr_mult
anddecay_mult
. See Example 2 below.bias_lr_mult
(float): It will be multiplied to the learning rate for all bias parameters (except for those in normalization layers and offset layers of DCN).bias_decay_mult
(float): It will be multiplied to the weight decay for all bias parameters (except for those in normalization layers, depthwise conv layers, offset layers of DCN).norm_decay_mult
(float): It will be multiplied to the weight decay for all weight and bias parameters of normalization layers.dwconv_decay_mult
(float): It will be multiplied to the weight decay for all weight and bias parameters of depthwise conv layers.dcn_offset_lr_mult
(float): It will be multiplied to the learning rate for parameters of offset layer in the deformable convs of a model.bypass_duplicate
(bool): If true, the duplicate parameters would not be added into optimizer. Default: False.
Note
1. If the option
dcn_offset_lr_mult
is used, the constructor will override the effect ofbias_lr_mult
in the bias of offset layer. So be careful when using bothbias_lr_mult
anddcn_offset_lr_mult
. If you wish to apply both of them to the offset layer in deformable convs, setdcn_offset_lr_mult
to the originaldcn_offset_lr_mult
*bias_lr_mult
.2. If the option
dcn_offset_lr_mult
is used, the constructor will apply it to all the DCN layers in the model. So be careful when the model contains multiple DCN layers in places other than backbone.- Parameters
model (
nn.Module
) – The model with parameters to be optimized.optimizer_cfg (dict) –
The config dict of the optimizer. Positional fields are
type: class name of the optimizer.
Optional fields are
any arguments of the corresponding optimizer type, e.g., lr, weight_decay, momentum, etc.
paramwise_cfg (dict, optional) – Parameter-wise options.
- Example 1:
>>> model = torch.nn.modules.Conv1d(1, 1, 1) >>> optimizer_cfg = dict(type='SGD', lr=0.01, momentum=0.9, >>> weight_decay=0.0001) >>> paramwise_cfg = dict(norm_decay_mult=0.) >>> optim_builder = DefaultOptimizerConstructor( >>> optimizer_cfg, paramwise_cfg) >>> optimizer = optim_builder(model)
- Example 2:
>>> # assume model have attribute model.backbone and model.cls_head >>> optimizer_cfg = dict(type='SGD', lr=0.01, weight_decay=0.95) >>> paramwise_cfg = dict(custom_keys={ 'backbone': dict(lr_mult=0.1, decay_mult=0.9)}) >>> optim_builder = DefaultOptimizerConstructor( >>> optimizer_cfg, paramwise_cfg) >>> optimizer = optim_builder(model) >>> # Then the `lr` and `weight_decay` for model.backbone is >>> # (0.01 * 0.1, 0.95 * 0.9). `lr` and `weight_decay` for >>> # model.cls_head is (0.01, 0.95).
- add_params(params: List[Dict], module: torch.nn.modules.module.Module, prefix: str = '', is_dcn_module: Optional[Union[int, float]] = None) → None[source]¶
Add all parameters of module to the params list.
The parameters of the given module will be added to the list of param groups, with specific rules defined by paramwise_cfg.
- Parameters
params (list[dict]) – A list of param groups, it will be modified in place.
module (nn.Module) – The module to be added.
prefix (str) – The prefix of the module
is_dcn_module (int|float|None) – If the current module is a submodule of DCN, is_dcn_module will be passed to control conv_offset layer’s learning rate. Defaults to None.
- class mmcv.runner.DefaultRunnerConstructor(runner_cfg: dict, default_args: Optional[dict] = None)[source]¶
Default constructor for runners.
Customize an existing Runner, like EpochBasedRunner, through a RunnerConstructor. For example, we can inject new properties and functions into the Runner.
Example
>>> from mmcv.runner import RUNNER_BUILDERS, build_runner >>> # Define a new RunnerReconstructor >>> @RUNNER_BUILDERS.register_module() >>> class MyRunnerConstructor: ... def __init__(self, runner_cfg, default_args=None): ... if not isinstance(runner_cfg, dict): ... raise TypeError('runner_cfg should be a dict', ... f'but got {type(runner_cfg)}') ... self.runner_cfg = runner_cfg ... self.default_args = default_args ... ... def __call__(self): ... runner = RUNNERS.build(self.runner_cfg, ... default_args=self.default_args) ... # Add new properties for existing runner ... runner.my_name = 'my_runner' ... runner.my_function = lambda self: print(self.my_name) ... ... >>> # build your runner >>> runner_cfg = dict(type='EpochBasedRunner', max_epochs=40, ... constructor='MyRunnerConstructor') >>> runner = build_runner(runner_cfg)
- class mmcv.runner.DistEvalHook(dataloader: torch.utils.data.dataloader.DataLoader, start: Optional[int] = None, interval: int = 1, by_epoch: bool = True, save_best: Optional[str] = None, rule: Optional[str] = None, test_fn: Optional[Callable] = None, greater_keys: Optional[List[str]] = None, less_keys: Optional[List[str]] = None, broadcast_bn_buffer: bool = True, tmpdir: Optional[str] = None, gpu_collect: bool = False, out_dir: Optional[str] = None, file_client_args: Optional[dict] = None, **eval_kwargs)[source]¶
Distributed evaluation hook.
This hook will regularly perform evaluation in a given interval when performing in distributed environment.
- Parameters
dataloader (DataLoader) – A PyTorch dataloader, whose dataset has implemented
evaluate
function.start (int | None, optional) – Evaluation starting epoch. It enables evaluation before the training starts if
start
<= the resuming epoch. If None, whether to evaluate is merely decided byinterval
. Default: None.interval (int) – Evaluation interval. Default: 1.
by_epoch (bool) – Determines whether to perform evaluation by epoch or by iteration. If set to True, it will perform by epoch. Otherwise, by iteration. Default: True.
save_best (str, optional) – If a metric is specified, it would measure the best checkpoint during evaluation. The information about best checkpoint would be saved in
runner.meta['hook_msgs']
to keep best score value and best checkpoint path, which will be also loaded when resume checkpoint. Options are the evaluation metrics on the test dataset. e.g.,bbox_mAP
,segm_mAP
for bbox detection and instance segmentation.AR@100
for proposal recall. Ifsave_best
isauto
, the first key of the returnedOrderedDict
result will be used. Default: None.rule (str | None, optional) – Comparison rule for best score. If set to None, it will infer a reasonable rule. Keys such as ‘acc’, ‘top’ .etc will be inferred by ‘greater’ rule. Keys contain ‘loss’ will be inferred by ‘less’ rule. Options are ‘greater’, ‘less’, None. Default: None.
test_fn (callable, optional) – test a model with samples from a dataloader in a multi-gpu manner, and return the test results. If
None
, the default test functionmmcv.engine.multi_gpu_test
will be used. (default:None
)tmpdir (str | None) – Temporary directory to save the results of all processes. Default: None.
gpu_collect (bool) – Whether to use gpu or cpu to collect results. Default: False.
broadcast_bn_buffer (bool) – Whether to broadcast the buffer(running_mean and running_var) of rank 0 to other rank before evaluation. Default: True.
out_dir (str, optional) – The root directory to save checkpoints. If not specified, runner.work_dir will be used by default. If specified, the out_dir will be the concatenation of out_dir and the last level directory of runner.work_dir.
file_client_args (dict) – Arguments to instantiate a FileClient. See
mmcv.fileio.FileClient
for details. Default: None.**eval_kwargs – Evaluation arguments fed into the evaluate function of the dataset.
- class mmcv.runner.DistSamplerSeedHook[source]¶
Data-loading sampler for distributed training.
In distributed training, it is only useful in conjunction with
EpochBasedRunner
, whileIterBasedRunner
achieves the same purpose withIterLoader
.
- class mmcv.runner.DvcliveLoggerHook(model_file: Optional[str] = None, interval: int = 10, ignore_last: bool = True, reset_flag: bool = False, by_epoch: bool = True, dvclive=None, **kwargs)[source]¶
Class to log metrics with dvclive.
It requires dvclive to be installed.
- Parameters
model_file (str) – Default None. If not None, after each epoch the model will be saved to {model_file}.
interval (int) – Logging interval (every k iterations). Default 10.
ignore_last (bool) – Ignore the log of last iterations in each epoch if less than interval. Default: True.
reset_flag (bool) – Whether to clear the output buffer after logging. Default: False.
by_epoch (bool) – Whether EpochBasedRunner is used. Determines whether log is called after_train_iter or after_train_epoch. Default: True.
dvclive (Live, optional) – An instance of the Live logger to use instead of initializing a new one internally. Defaults to None.
kwargs – Arguments for instantiating Live (ignored if dvclive is provided).
- class mmcv.runner.EMAHook(momentum: float = 0.0002, interval: int = 1, warm_up: int = 100, resume_from: Optional[str] = None)[source]¶
Exponential Moving Average Hook.
Use Exponential Moving Average on all parameters of the model during training. All parameters have an EMA backup, which is updated by the formula below. EMAHook takes priority over EvalHook and CheckpointSaverHook.
\[X_{\text{ema},t+1} = (1 - \text{momentum}) \times X_{\text{ema},t} + \text{momentum} \times X_t\]- Parameters
momentum (float) – The momentum used for updating ema parameter. Defaults to 0.0002.
interval (int) – Update ema parameter every interval iteration. Defaults to 1.
warm_up (int) – During first warm_up steps, we may use smaller momentum to update ema parameters more slowly. Defaults to 100.
resume_from (str, optional) – The checkpoint path. Defaults to None.
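Example
A registration sketch; the runner object is assumed to already exist:
>>> from mmcv.runner import EMAHook
>>> ema_hook = EMAHook(momentum=0.0002, interval=1, warm_up=100)
>>> runner.register_hook(ema_hook)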
- after_train_epoch(runner)[source]¶
Load parameter values from the EMA backup into the model before the EvalHook runs.
- class mmcv.runner.EpochBasedRunner(model: torch.nn.modules.module.Module, batch_processor: Optional[Callable] = None, optimizer: Optional[Union[Dict, torch.optim.optimizer.Optimizer]] = None, work_dir: Optional[str] = None, logger: Optional[logging.Logger] = None, meta: Optional[Dict] = None, max_iters: Optional[int] = None, max_epochs: Optional[int] = None)[source]¶
Epoch-based Runner.
This runner trains models epoch by epoch.
- run(data_loaders: List[torch.utils.data.dataloader.DataLoader], workflow: List[Tuple[str, int]], max_epochs: Optional[int] = None, **kwargs) → None[source]¶
Start running.
- Parameters
data_loaders (list[
DataLoader
]) – Dataloaders for training and validation.workflow (list[tuple]) – A list of (phase, epochs) to specify the running order and epochs. E.g., [(‘train’, 2), (‘val’, 1)] means running 2 epochs for training and 1 epoch for validation, iteratively.
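Example
A minimal sketch; train_loader and val_loader are hypothetical existing dataloaders:
>>> # run 2 training epochs followed by 1 validation epoch, repeatedly
>>> runner.run([train_loader, val_loader], [('train', 2), ('val', 1)])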
- save_checkpoint(out_dir: str, filename_tmpl: str = 'epoch_{}.pth', save_optimizer: bool = True, meta: Optional[Dict] = None, create_symlink: bool = True) → None[source]¶
Save the checkpoint.
- Parameters
out_dir (str) – The directory that checkpoints are saved.
filename_tmpl (str, optional) – The checkpoint filename template, which contains a placeholder for the epoch number. Defaults to ‘epoch_{}.pth’.
save_optimizer (bool, optional) – Whether to save the optimizer to the checkpoint. Defaults to True.
meta (dict, optional) – The meta information to be saved in the checkpoint. Defaults to None.
create_symlink (bool, optional) – Whether to create a symlink “latest.pth” to point to the latest checkpoint. Defaults to True.
- class mmcv.runner.EvalHook(dataloader: torch.utils.data.dataloader.DataLoader, start: Optional[int] = None, interval: int = 1, by_epoch: bool = True, save_best: Optional[str] = None, rule: Optional[str] = None, test_fn: Optional[Callable] = None, greater_keys: Optional[List[str]] = None, less_keys: Optional[List[str]] = None, out_dir: Optional[str] = None, file_client_args: Optional[dict] = None, **eval_kwargs)[source]¶
Non-Distributed evaluation hook.
This hook will regularly perform evaluation in a given interval when performing in non-distributed environment.
- Parameters
dataloader (DataLoader) – A PyTorch dataloader, whose dataset has implemented
evaluate
function.start (int | None, optional) – Evaluation starting epoch or iteration. It enables evaluation before the training starts if
start
<= the resuming epoch or iteration. If None, whether to evaluate is merely decided byinterval
. Default: None.interval (int) – Evaluation interval. Default: 1.
by_epoch (bool) – Determines whether to perform evaluation by epoch or by iteration. If set to True, it will perform by epoch. Otherwise, by iteration. Default: True.
save_best (str, optional) – If a metric is specified, it would measure the best checkpoint during evaluation. The information about best checkpoint would be saved in
runner.meta['hook_msgs']
to keep best score value and best checkpoint path, which will be also loaded when resume checkpoint. Options are the evaluation metrics on the test dataset. e.g.,bbox_mAP
,segm_mAP
for bbox detection and instance segmentation.AR@100
for proposal recall. Ifsave_best
isauto
, the first key of the returnedOrderedDict
result will be used. Default: None.rule (str | None, optional) – Comparison rule for best score. If set to None, it will infer a reasonable rule. Keys such as ‘acc’, ‘top’ .etc will be inferred by ‘greater’ rule. Keys contain ‘loss’ will be inferred by ‘less’ rule. Options are ‘greater’, ‘less’, None. Default: None.
test_fn (callable, optional) – test a model with samples from a dataloader, and return the test results. If
None
, the default test functionmmcv.engine.single_gpu_test
will be used. (default:None
)greater_keys (List[str] | None, optional) – Metric keys that will be inferred by ‘greater’ comparison rule. If
None
, _default_greater_keys will be used. (default:None
)less_keys (List[str] | None, optional) – Metric keys that will be inferred by ‘less’ comparison rule. If
None
, _default_less_keys will be used. (default:None
)out_dir (str, optional) – The root directory to save checkpoints. If not specified, runner.work_dir will be used by default. If specified, the out_dir will be the concatenation of out_dir and the last level directory of runner.work_dir. New in version 1.3.16.
file_client_args (dict) – Arguments to instantiate a FileClient. See
mmcv.fileio.FileClient
for details. Default: None. New in version 1.3.16.**eval_kwargs – Evaluation arguments fed into the evaluate function of the dataset.
Note
If new arguments are added for EvalHook, tools/test.py, tools/eval_metric.py may be affected.
- class mmcv.runner.FlatCosineAnnealingLrUpdaterHook(start_percent: float = 0.75, min_lr: Optional[float] = None, min_lr_ratio: Optional[float] = None, **kwargs)[source]¶
Flat + Cosine lr schedule.
Modified from https://github.com/fastai/fastai/blob/master/fastai/callback/schedule.py#L128
- Parameters
start_percent (float) – The percentage of the total training steps after which to start annealing the learning rate. The value should be in the range [0, 1). Default: 0.75
min_lr (float, optional) – The minimum lr. Default: None.
min_lr_ratio (float, optional) – The ratio of minimum lr to the base lr. Either min_lr or min_lr_ratio should be specified. Default: None.
- class mmcv.runner.Fp16OptimizerHook(grad_clip: Optional[dict] = None, coalesce: bool = True, bucket_size_mb: int = - 1, loss_scale: Union[float, str, dict] = 512.0, distributed: bool = True)[source]¶
FP16 optimizer hook (using PyTorch’s implementation).
If you are using PyTorch >= 1.6, torch.cuda.amp is used as the backend, to take care of the optimization procedure.
- Parameters
loss_scale (float | str | dict) – Scale factor configuration. If loss_scale is a float, static loss scaling will be used with the specified scale. If loss_scale is a string, it must be ‘dynamic’, then dynamic loss scaling will be used. It can also be a dict containing arguments of GradScalar. Defaults to 512. For Pytorch >= 1.6, mmcv uses official implementation of GradScaler. If you use a dict version of loss_scale to create GradScaler, please refer to: https://pytorch.org/docs/stable/amp.html#torch.cuda.amp.GradScaler for the parameters.
Examples
>>> loss_scale = dict( ... init_scale=65536.0, ... growth_factor=2.0, ... backoff_factor=0.5, ... growth_interval=2000 ... ) >>> optimizer_hook = Fp16OptimizerHook(loss_scale=loss_scale)
- after_train_iter(runner) → None[source]¶
Backward optimization steps for Mixed Precision Training. For dynamic loss scaling, please refer to https://pytorch.org/docs/stable/amp.html#torch.cuda.amp.GradScaler.
Scale the loss by a scale factor.
Backward the loss to obtain the gradients.
Unscale the optimizer’s gradient tensors.
Call optimizer.step() and update scale factor.
Save loss_scaler state_dict for resume purpose.
- class mmcv.runner.GradientCumulativeFp16OptimizerHook(*args, **kwargs)[source]¶
Fp16 optimizer Hook (using PyTorch’s implementation) implements multi-iters gradient cumulating.
If you are using PyTorch >= 1.6, torch.cuda.amp is used as the backend, to take care of the optimization procedure.
- after_train_iter(runner) → None[source]¶
Backward optimization steps for Mixed Precision Training. For dynamic loss scaling, please refer to https://pytorch.org/docs/stable/amp.html#torch.cuda.amp.GradScaler.
Scale the loss by a scale factor.
Backward the loss to obtain the gradients.
Unscale the optimizer’s gradient tensors.
Call optimizer.step() and update scale factor.
Save loss_scaler state_dict for resume purpose.
- class mmcv.runner.GradientCumulativeOptimizerHook(cumulative_iters: int = 1, **kwargs)[source]¶
Optimizer Hook implements multi-iters gradient cumulating.
- Parameters
cumulative_iters (int, optional) – Num of gradient cumulative iters. The optimizer will step every cumulative_iters iters. Defaults to 1.
Examples
>>> # Use cumulative_iters to simulate a large batch size >>> # It is helpful when the hardware cannot handle a large batch size. >>> loader = DataLoader(data, batch_size=64) >>> optim_hook = GradientCumulativeOptimizerHook(cumulative_iters=4) >>> # almost equals to >>> loader = DataLoader(data, batch_size=256) >>> optim_hook = OptimizerHook()
- class mmcv.runner.IterBasedRunner(model: torch.nn.modules.module.Module, batch_processor: Optional[Callable] = None, optimizer: Optional[Union[Dict, torch.optim.optimizer.Optimizer]] = None, work_dir: Optional[str] = None, logger: Optional[logging.Logger] = None, meta: Optional[Dict] = None, max_iters: Optional[int] = None, max_epochs: Optional[int] = None)[source]¶
Iteration-based Runner.
This runner trains models iteration by iteration.
- register_training_hooks(lr_config, optimizer_config=None, checkpoint_config=None, log_config=None, momentum_config=None, custom_hooks_config=None)[source]¶
Register default hooks for iter-based training.
Checkpoint hook, optimizer stepper hook and logger hooks will be set to by_epoch=False by default.
Default hooks include:
LrUpdaterHook – VERY_HIGH (10)
MomentumUpdaterHook – HIGH (30)
OptimizerStepperHook – ABOVE_NORMAL (40)
CheckpointSaverHook – NORMAL (50)
IterTimerHook – LOW (70)
LoggerHook(s) – VERY_LOW (90)
CustomHook(s) – defaults to NORMAL (50)
If custom hooks have the same priority as default hooks, custom hooks will be triggered after default hooks.
- resume(checkpoint: str, resume_optimizer: bool = True, map_location: Union[str, Callable] = 'default') → None[source]¶
Resume model from checkpoint.
- Parameters
checkpoint (str) – Checkpoint to resume from.
resume_optimizer (bool, optional) – Whether to resume the optimizer(s) if the checkpoint file includes optimizer(s). Defaults to True.
map_location (str, optional) – Same as
torch.load()
. Defaults to ‘default’.
- run(data_loaders: List[torch.utils.data.dataloader.DataLoader], workflow: List[Tuple[str, int]], max_iters: Optional[int] = None, **kwargs) → None[source]¶
Start running.
- Parameters
data_loaders (list[
DataLoader
]) – Dataloaders for training and validation.workflow (list[tuple]) – A list of (phase, iters) to specify the running order and iterations. E.g., [(‘train’, 10000), (‘val’, 1000)] means running 10000 iterations for training and 1000 iterations for validation, iteratively.
- save_checkpoint(out_dir: str, filename_tmpl: str = 'iter_{}.pth', meta: Optional[Dict] = None, save_optimizer: bool = True, create_symlink: bool = True) → None[source]¶
Save checkpoint to file.
- Parameters
out_dir (str) – Directory to save checkpoint files.
filename_tmpl (str, optional) – Checkpoint file template. Defaults to ‘iter_{}.pth’.
meta (dict, optional) – Metadata to be saved in checkpoint. Defaults to None.
save_optimizer (bool, optional) – Whether to save the optimizer. Defaults to True.
create_symlink (bool, optional) – Whether to create a symlink to the latest checkpoint file. Defaults to True.
- class mmcv.runner.LinearAnnealingLrUpdaterHook(min_lr: Optional[float] = None, min_lr_ratio: Optional[float] = None, **kwargs)[source]¶
Linear annealing LR Scheduler decays the learning rate of each parameter group linearly.
- Parameters
min_lr (float, optional) – The minimum lr. Default: None.
min_lr_ratio (float, optional) – The ratio of minimum lr to the base lr. Either min_lr or min_lr_ratio should be specified. Default: None.
- class mmcv.runner.LinearAnnealingMomentumUpdaterHook(min_momentum: Optional[float] = None, min_momentum_ratio: Optional[float] = None, **kwargs)[source]¶
Linear annealing LR Momentum decays the Momentum of each parameter group linearly.
- Parameters
min_momentum (float, optional) – The minimum momentum. Default: None.
min_momentum_ratio (float, optional) – The ratio of minimum momentum to the base momentum. Either min_momentum or min_momentum_ratio should be specified. Default: None.
- class mmcv.runner.LoggerHook(interval: int = 10, ignore_last: bool = True, reset_flag: bool = False, by_epoch: bool = True)[source]¶
Base class for logger hooks.
- Parameters
interval (int) – Logging interval (every k iterations). Default 10.
ignore_last (bool) – Ignore the log of last iterations in each epoch if less than interval. Default True.
reset_flag (bool) – Whether to clear the output buffer after logging. Default False.
by_epoch (bool) – Whether EpochBasedRunner is used. Default True.
- static is_scalar(val, include_np: bool = True, include_torch: bool = True) → bool[source]¶
Tell whether the input variable is a scalar or not.
- Parameters
val – Input variable.
include_np (bool) – Whether to include 0-d np.ndarray as a scalar.
include_torch (bool) – Whether to include 0-d torch.Tensor as a scalar.
- Returns
True or False.
- Return type
bool
- class mmcv.runner.LossScaler(init_scale: float = 4294967296, mode: str = 'dynamic', scale_factor: float = 2.0, scale_window: int = 1000)[source]¶
Class that manages loss scaling in mixed precision training which supports both dynamic or static mode.
The implementation refers to https://github.com/NVIDIA/apex/blob/master/apex/fp16_utils/loss_scaler.py. Supplying mode='dynamic' enables dynamic loss scaling. It’s important to understand how LossScaler operates. Loss scaling is designed to combat the problem of underflowing gradients encountered at long times when training fp16 networks. Dynamic loss scaling begins by attempting a very high loss scale. Ironically, this may result in OVERflowing gradients. If overflowing gradients are encountered, FP16_Optimizer then skips the update step for this particular iteration/minibatch, and LossScaler adjusts the loss scale to a lower value. If a certain number of iterations occur without overflowing gradients detected, LossScaler increases the loss scale once more. In this way LossScaler attempts to “ride the edge” of always using the highest loss scale possible without incurring overflow.
- Parameters
init_scale (float) – Initial loss scale value, default: 2**32.
scale_factor (float) – Factor used when adjusting the loss scale. Default: 2.
mode (str) – Loss scaling mode, ‘dynamic’ or ‘static’.
scale_window (int) – Number of consecutive iterations without an overflow to wait before increasing the loss scale. Default: 1000.
- has_overflow(params: List[torch.nn.parameter.Parameter]) → bool[source]¶
Check if params contain overflow.
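A minimal sketch of the dynamic loss-scaling loop this class is designed for; it assumes the scaler exposes a loss_scale property and an update_scale() method as in the referenced apex-style implementation, and that `loader`, `model` and `optimizer` already exist:
>>> scaler = LossScaler(mode='dynamic')
>>> for data in loader:
...     loss = model(data)
...     (loss * scaler.loss_scale).backward()
...     overflow = scaler.has_overflow(list(model.parameters()))
...     if not overflow:
...         for p in model.parameters():     # unscale gradients before stepping
...             if p.grad is not None:
...                 p.grad.div_(scaler.loss_scale)
...         optimizer.step()
...     optimizer.zero_grad()
...     scaler.update_scale(overflow)        # grow or shrink the loss scale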
- class mmcv.runner.LrUpdaterHook(by_epoch: bool = True, warmup: Optional[str] = None, warmup_iters: int = 0, warmup_ratio: float = 0.1, warmup_by_epoch: bool = False)[source]¶
LR Scheduler in MMCV.
- Parameters
by_epoch (bool) – LR changes epoch by epoch.
warmup (string) – Type of warmup used. It can be None (use no warmup), ‘constant’, ‘linear’ or ‘exp’.
warmup_iters (int) – The number of iterations or epochs that warmup lasts.
warmup_ratio (float) – LR used at the beginning of warmup equals to warmup_ratio * initial_lr.
warmup_by_epoch (bool) – When warmup_by_epoch == True, warmup_iters means the number of epochs that warmup lasts, otherwise means the number of iterations that warmup lasts.
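These options are usually supplied through a config dict when registering the LR hook; a sketch (the policy and values are illustrative, and `runner` is assumed to exist):
>>> lr_config = dict(
...     policy='step',        # selects StepLrUpdaterHook
...     warmup='linear',      # linear warmup over the first 500 iterations
...     warmup_iters=500,
...     warmup_ratio=0.001,   # start at 0.001 * initial_lr
...     step=[8, 11])
>>> runner.register_lr_hook(lr_config)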
- class mmcv.runner.MlflowLoggerHook(exp_name: Optional[str] = None, tags: Optional[Dict] = None, params: Optional[Dict] = None, log_model: bool = True, interval: int = 10, ignore_last: bool = True, reset_flag: bool = False, by_epoch: bool = True)[source]¶
Class to log metrics and (optionally) a trained model to MLflow.
It requires MLflow to be installed.
- Parameters
exp_name (str, optional) – Name of the experiment to be used. Default None. If not None, set the active experiment. If experiment does not exist, an experiment with provided name will be created.
tags (Dict[str], optional) – Tags for the current run. Default None. If not None, set tags for the current run.
params (Dict[str], optional) – Params for the current run. Default None. If not None, set params for the current run.
log_model (bool, optional) – Whether to log an MLflow artifact. Default True. If True, log runner.model as an MLflow artifact for the current run.
interval (int) – Logging interval (every k iterations). Default: 10.
ignore_last (bool) – Ignore the log of last iterations in each epoch if less than interval. Default: True.
reset_flag (bool) – Whether to clear the output buffer after logging. Default: False.
by_epoch (bool) – Whether EpochBasedRunner is used. Default: True.
- class mmcv.runner.ModuleDict(modules: Optional[dict] = None, init_cfg: Optional[dict] = None)[source]¶
ModuleDict in openmmlab.
- Parameters
modules (dict, optional) – a mapping (dictionary) of (string: module) or an iterable of key-value pairs of type (string, module).
init_cfg (dict, optional) – Initialization config dict.
- class mmcv.runner.ModuleList(modules: Optional[Iterable] = None, init_cfg: Optional[dict] = None)[source]¶
ModuleList in openmmlab.
- Parameters
modules (iterable, optional) – an iterable of modules to add.
init_cfg (dict, optional) – Initialization config dict.
- class mmcv.runner.NeptuneLoggerHook(init_kwargs: Optional[Dict] = None, interval: int = 10, ignore_last: bool = True, reset_flag: bool = True, with_step: bool = True, by_epoch: bool = True)[source]¶
Class to log metrics to NeptuneAI.
It requires Neptune to be installed.
- Parameters
init_kwargs (dict) –
A dict containing the initialization keys as below:
project (str): Name of a project in a form of namespace/project_name. If None, the value of NEPTUNE_PROJECT environment variable will be taken.
api_token (str): User’s API token. If None, the value of NEPTUNE_API_TOKEN environment variable will be taken. Note: It is strongly recommended to use NEPTUNE_API_TOKEN environment variable rather than placing your API token in plain text in your source code.
name (str, optional, default is ‘Untitled’): Editable name of the run. Name is displayed in the run’s Details and in Runs table as a column.
Check https://docs.neptune.ai/api-reference/neptune#init for more init arguments.
interval (int) – Logging interval (every k iterations). Default: 10.
ignore_last (bool) – Ignore the log of last iterations in each epoch if less than interval. Default: True.
reset_flag (bool) – Whether to clear the output buffer after logging. Default: True.
with_step (bool) – If True, the step will be logged from self.get_iters. Otherwise, step will not be logged. Default: True.
by_epoch (bool) – Whether EpochBasedRunner is used. Default: True.
- class mmcv.runner.OneCycleLrUpdaterHook(max_lr: Union[float, List], total_steps: Optional[int] = None, pct_start: float = 0.3, anneal_strategy: str = 'cos', div_factor: float = 25, final_div_factor: float = 10000.0, three_phase: bool = False, **kwargs)[source]¶
One Cycle LR Scheduler.
The 1cycle learning rate policy changes the learning rate after every batch. The one cycle learning rate policy is described in https://arxiv.org/pdf/1708.07120.pdf
- Parameters
max_lr (float or list) – Upper learning rate boundaries in the cycle for each parameter group.
total_steps (int, optional) – The total number of steps in the cycle. Note that if a value is not provided here, it will be the max_iter of runner. Default: None.
pct_start (float) – The percentage of the cycle (in number of steps) spent increasing the learning rate. Default: 0.3
anneal_strategy (str) – {‘cos’, ‘linear’} Specifies the annealing strategy: ‘cos’ for cosine annealing, ‘linear’ for linear annealing. Default: ‘cos’
div_factor (float) – Determines the initial learning rate via initial_lr = max_lr/div_factor Default: 25
final_div_factor (float) – Determines the minimum learning rate via min_lr = initial_lr/final_div_factor Default: 1e4
three_phase (bool) – If three_phase is True, use a third phase of the schedule to annihilate the learning rate according to final_div_factor instead of modifying the second phase (the first two phases will be symmetrical about the step indicated by pct_start). Default: False
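The two div factors fix the schedule’s endpoints; for instance, with the defaults and an illustrative max_lr:
>>> max_lr = 0.01
>>> initial_lr = max_lr / 25          # div_factor = 25  -> 0.0004
>>> min_lr = initial_lr / 10000.0     # final_div_factor = 1e4 -> 4e-08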
- class mmcv.runner.OneCycleMomentumUpdaterHook(base_momentum: Union[float, list, dict] = 0.85, max_momentum: Union[float, list, dict] = 0.95, pct_start: float = 0.3, anneal_strategy: str = 'cos', three_phase: bool = False, **kwargs)[source]¶
OneCycle momentum Scheduler.
This momentum scheduler is usually used together with OneCycleLrUpdaterHook to improve performance.
- Parameters
base_momentum (float or list) – Lower momentum boundaries in the cycle for each parameter group. Note that momentum is cycled inversely to learning rate; at the peak of a cycle, momentum is ‘base_momentum’ and learning rate is ‘max_lr’. Default: 0.85
max_momentum (float or list) – Upper momentum boundaries in the cycle for each parameter group. Functionally, it defines the cycle amplitude (max_momentum - base_momentum). Note that momentum is cycled inversely to learning rate; at the start of a cycle, momentum is ‘max_momentum’ and learning rate is ‘base_lr’ Default: 0.95
pct_start (float) – The percentage of the cycle (in number of steps) spent increasing the learning rate. Default: 0.3
anneal_strategy (str) – {‘cos’, ‘linear’} Specifies the annealing strategy: ‘cos’ for cosine annealing, ‘linear’ for linear annealing. Default: ‘cos’
three_phase (bool) – If three_phase is True, use a third phase of the schedule to annihilate the learning rate according to final_div_factor instead of modifying the second phase (the first two phases will be symmetrical about the step indicated by pct_start). Default: False
- class mmcv.runner.OptimizerHook(grad_clip: Optional[dict] = None, detect_anomalous_params: bool = False)[source]¶
A hook contains custom operations for the optimizer.
- Parameters
grad_clip (dict, optional) – A config dict to control the clip_grad. Default: None.
detect_anomalous_params (bool) –
This option is only used for debugging and will slow down the training speed. Detect anomalous parameters that are not included in the computational graph with loss as the root. There are two cases:
Parameters were not used during forward pass.
Parameters were not used to produce loss.
Default: False.
- class mmcv.runner.PaviLoggerHook(init_kwargs: Optional[Dict] = None, add_graph: Optional[bool] = None, img_key: Optional[str] = None, add_last_ckpt: bool = False, interval: int = 10, ignore_last: bool = True, reset_flag: bool = False, by_epoch: bool = True, add_graph_kwargs: Optional[Dict] = None, add_ckpt_kwargs: Optional[Dict] = None)[source]¶
Class to visualize the model and log metrics (for internal use).
- Parameters
init_kwargs (dict) –
A dict containing the initialization keys as below:
name (str, optional): Custom training name. Defaults to None, which means current work_dir.
project (str, optional): Project name. Defaults to “default”.
model (str, optional): Training model name. Defaults to current model.
session_text (str, optional): Session string in YAML format. Defaults to current config.
training_id (int, optional): Training ID in PAVI, if you want to use an existing training. Defaults to None.
compare_id (int, optional): Compare ID in PAVI, if you want to add the task to an existing compare. Defaults to None.
overwrite_last_training (bool, optional): Whether to upload data to the training with the same name in the same project, rather than creating a new one. Defaults to False.
add_graph (bool, optional) – Deprecated. Whether to visualize the model. Default: False.
img_key (str, optional) – Deprecated. Image key. Defaults to None.
add_last_ckpt (bool) – Whether to save checkpoint after run. Default: False.
interval (int) – Logging interval (every k iterations). Default: 10.
ignore_last (bool) – Ignore the log of last iterations in each epoch if less than interval. Default: True.
reset_flag (bool) – Whether to clear the output buffer after logging. Default: False.
by_epoch (bool) – Whether EpochBasedRunner is used. Default: True.
add_graph_kwargs (dict, optional) – A dict containing the params for adding graph, with keys as below:
- active (bool): Whether to use add_graph. Default: False.
- start (int): The epoch or iteration to start. Default: 0.
- interval (int): Interval of add_graph. Default: 1.
- img_key (str): Get image data from Dataset. Default: ‘img’.
- opset_version (int): opset_version of exporting onnx. Default: 11.
- dummy_forward_kwargs (dict, optional): Set default parameters for the model forward function except the image. For example, you can set {‘return_loss’: False} for mmcls. Default: None.
add_ckpt_kwargs (dict, optional) – A dict containing the params for adding checkpoint, with keys as below:
- active (bool): Whether to upload checkpoint. Default: False.
- start (int): The epoch or iteration to start. Default: 0.
- interval (int): Interval of uploading checkpoint. Default: 1.
- class mmcv.runner.Priority(value)[source]¶
Hook priority levels.
Level          Value
HIGHEST        0
VERY_HIGH      10
HIGH           30
ABOVE_NORMAL   40
NORMAL         50
BELOW_NORMAL   60
LOW            70
VERY_LOW       90
LOWEST         100
- class mmcv.runner.SegmindLoggerHook(interval: int = 10, ignore_last: bool = True, reset_flag: bool = False, by_epoch=True)[source]¶
Class to log metrics to Segmind.
It requires Segmind to be installed.
- Parameters
interval (int) – Logging interval (every k iterations). Default: 10.
ignore_last (bool) – Ignore the log of last iterations in each epoch if less than interval. Default True.
reset_flag (bool) – Whether to clear the output buffer after logging. Default False.
by_epoch (bool) – Whether EpochBasedRunner is used. Default True.
- class mmcv.runner.Sequential(*args, init_cfg: Optional[dict] = None)[source]¶
Sequential module in openmmlab.
- Parameters
init_cfg (dict, optional) – Initialization config dict.
- class mmcv.runner.StepLrUpdaterHook(step: Union[int, List[int]], gamma: float = 0.1, min_lr: Optional[float] = None, **kwargs)[source]¶
Step LR scheduler with min_lr clipping.
- Parameters
step (int | list[int]) – Step to decay the LR. If an int value is given, regard it as the decay interval. If a list is given, decay LR at these steps.
gamma (float) – Decay LR ratio. Defaults to 0.1.
min_lr (float, optional) – Minimum LR value to keep. If LR after decay is lower than min_lr, it will be clipped to this value. If None is given, we don’t perform lr clipping. Default: None.
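A small worked sketch of the decay rule (the helper below is illustrative, not part of mmcv):
>>> import bisect
>>> base_lr, steps, gamma, min_lr = 0.01, [8, 11], 0.1, 1e-05
>>> def lr_at(epoch):
...     lr = base_lr * gamma ** bisect.bisect_right(steps, epoch)
...     return max(lr, min_lr)   # min_lr clipping
>>> [round(lr_at(e), 6) for e in (0, 8, 11)]
[0.01, 0.001, 0.0001]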
- class mmcv.runner.StepMomentumUpdaterHook(step: Union[int, List[int]], gamma: float = 0.5, min_momentum: Optional[float] = None, **kwargs)[source]¶
Step momentum scheduler with min value clipping.
- Parameters
step (int | list[int]) – Step to decay the momentum. If an int value is given, regard it as the decay interval. If a list is given, decay momentum at these steps.
gamma (float, optional) – Decay momentum ratio. Default: 0.5.
min_momentum (float, optional) – Minimum momentum value to keep. If momentum after decay is lower than this value, it will be clipped accordingly. If None is given, we don’t perform lr clipping. Default: None.
- class mmcv.runner.SyncBuffersHook(distributed: bool = True)[source]¶
Synchronize model buffers such as running_mean and running_var in BN at the end of each epoch.
- Parameters
distributed (bool) – Whether distributed training is used. It is effective only for distributed training. Defaults to True.
- class mmcv.runner.TensorboardLoggerHook(log_dir: Optional[str] = None, interval: int = 10, ignore_last: bool = True, reset_flag: bool = False, by_epoch: bool = True)[source]¶
Class to log metrics to Tensorboard.
- Parameters
log_dir (string) – Save directory location. Default: None. If default values are used, directory location is runner.work_dir/tf_logs.
interval (int) – Logging interval (every k iterations). Default: 10.
ignore_last (bool) – Ignore the log of last iterations in each epoch if less than interval. Default: True.
reset_flag (bool) – Whether to clear the output buffer after logging. Default: False.
by_epoch (bool) – Whether EpochBasedRunner is used. Default: True.
- class mmcv.runner.TextLoggerHook(by_epoch: bool = True, interval: int = 10, ignore_last: bool = True, reset_flag: bool = False, interval_exp_name: int = 1000, out_dir: Optional[str] = None, out_suffix: Union[str, tuple] = ('.log.json', '.log', '.py'), keep_local: bool = True, file_client_args: Optional[Dict] = None)[source]¶
Logger hook in text.
In this logger hook, the information will be printed on the terminal and saved in a json file.
- Parameters
by_epoch (bool, optional) – Whether EpochBasedRunner is used. Default: True.
interval (int, optional) – Logging interval (every k iterations). Default: 10.
ignore_last (bool, optional) – Ignore the log of last iterations in each epoch if less than interval. Default: True.
reset_flag (bool, optional) – Whether to clear the output buffer after logging. Default: False.
interval_exp_name (int, optional) – Logging interval for experiment name. This feature is to help users conveniently get the experiment information from screen or log file. Default: 1000.
out_dir (str, optional) – Logs are saved in runner.work_dir by default. If out_dir is specified, logs will be copied to a new directory which is the concatenation of out_dir and the last level directory of runner.work_dir. Default: None. New in version 1.3.16.
out_suffix (str or tuple[str], optional) – Those filenames ending with out_suffix will be copied to out_dir. Default: (‘.log.json’, ‘.log’, ‘.py’). New in version 1.3.16.
keep_local (bool, optional) – Whether to keep local log when out_dir is specified. If False, the local log will be removed. Default: True. New in version 1.3.16.
file_client_args (dict, optional) – Arguments to instantiate a FileClient. See mmcv.fileio.FileClient for details. Default: None. New in version 1.3.16.
- class mmcv.runner.WandbLoggerHook(init_kwargs: Optional[Dict] = None, interval: int = 10, ignore_last: bool = True, reset_flag: bool = False, commit: bool = True, by_epoch: bool = True, with_step: bool = True, log_artifact: bool = True, out_suffix: Union[str, tuple] = ('.log.json', '.log', '.py'), define_metric_cfg: Optional[Dict] = None)[source]¶
Class to log metrics with wandb.
It requires wandb to be installed.
- Parameters
init_kwargs (dict) – A dict containing the initialization keys. Check https://docs.wandb.ai/ref/python/init for more init arguments.
interval (int) – Logging interval (every k iterations). Default 10.
ignore_last (bool) – Ignore the log of last iterations in each epoch if less than interval. Default: True.
reset_flag (bool) – Whether to clear the output buffer after logging. Default: False.
commit (bool) – Save the metrics dict to the wandb server and increment the step. If false, wandb.log just updates the current metrics dict with the row argument and metrics won’t be saved until wandb.log is called with commit=True. Default: True.
by_epoch (bool) – Whether EpochBasedRunner is used. Default: True.
with_step (bool) – If True, the step will be logged from self.get_iters. Otherwise, step will not be logged. Default: True.
log_artifact (bool) – If True, artifacts in {work_dir} will be uploaded to wandb after training ends. Default: True. New in version 1.4.3.
out_suffix (str or tuple[str], optional) – Those filenames ending with out_suffix will be uploaded to wandb. Default: (‘.log.json’, ‘.log’, ‘.py’). New in version 1.4.3.
define_metric_cfg (dict, optional) – A dict of metrics and summaries for wandb.define_metric. The key is the metric and the value is the summary. The summary should be in [“min”, “max”, “mean”, “best”, “last”, “none”]. For example, if setting define_metric_cfg={'coco/bbox_mAP': 'max'}, the maximum value of coco/bbox_mAP will be logged on the wandb UI. See wandb docs for details. Defaults to None. New in version 1.6.3.
- mmcv.runner.allreduce_grads(params: List[torch.nn.parameter.Parameter], coalesce: bool = True, bucket_size_mb: int = - 1) → None[source]¶
Allreduce gradients.
- Parameters
params (list[torch.nn.Parameter]) – List of parameters of a model.
coalesce (bool, optional) – Whether allreduce parameters as a whole. Defaults to True.
bucket_size_mb (int, optional) – Size of bucket, the unit is MB. Defaults to -1.
- mmcv.runner.allreduce_params(params: List[torch.nn.parameter.Parameter], coalesce: bool = True, bucket_size_mb: int = - 1) → None[source]¶
Allreduce parameters.
- Parameters
params (list[torch.nn.Parameter]) – List of parameters or buffers of a model.
coalesce (bool, optional) – Whether allreduce parameters as a whole. Defaults to True.
bucket_size_mb (int, optional) – Size of bucket, the unit is MB. Defaults to -1.
- mmcv.runner.auto_fp16(apply_to: Optional[Iterable] = None, out_fp32: bool = False, supported_types: tuple = (<class 'torch.nn.modules.module.Module'>, )) → Callable[source]¶
Decorator to enable fp16 training automatically.
This decorator is useful when you write custom modules and want to support mixed precision training. If input arguments are fp32 tensors, they will be converted to fp16 automatically. Arguments other than fp32 tensors are ignored. If you are using PyTorch >= 1.6, torch.cuda.amp is used as the backend; otherwise, the original mmcv implementation will be adopted.
- Parameters
apply_to (Iterable, optional) – The argument names to be converted. None indicates all arguments.
out_fp32 (bool) – Whether to convert the output back to fp32.
supported_types (tuple) – Classes that can be decorated by auto_fp16. New in version 1.5.0.
Example
>>> import torch.nn as nn
>>> class MyModule1(nn.Module):
>>>
>>>     # Convert x and y to fp16
>>>     @auto_fp16()
>>>     def forward(self, x, y):
>>>         pass

>>> import torch.nn as nn
>>> class MyModule2(nn.Module):
>>>
>>>     # convert pred to fp16
>>>     @auto_fp16(apply_to=('pred', ))
>>>     def do_something(self, pred, others):
>>>         pass
- mmcv.runner.force_fp32(apply_to: Optional[Iterable] = None, out_fp16: bool = False) → Callable[source]¶
Decorator to convert input arguments to fp32 in force.
This decorator is useful when you write custom modules and want to support mixed precision training. If there are some inputs that must be processed in fp32 mode, then this decorator can handle it. If input arguments are fp16 tensors, they will be converted to fp32 automatically. Arguments other than fp16 tensors are ignored. If you are using PyTorch >= 1.6, torch.cuda.amp is used as the backend; otherwise, the original mmcv implementation will be adopted.
- Parameters
apply_to (Iterable, optional) – The argument names to be converted. None indicates all arguments.
out_fp16 (bool) – Whether to convert the output back to fp16.
Example
>>> import torch.nn as nn
>>> class MyModule1(nn.Module):
>>>
>>>     # Convert x and y to fp32
>>>     @force_fp32()
>>>     def loss(self, x, y):
>>>         pass

>>> import torch.nn as nn
>>> class MyModule2(nn.Module):
>>>
>>>     # convert pred to fp32
>>>     @force_fp32(apply_to=('pred', ))
>>>     def post_process(self, pred, others):
>>>         pass
- mmcv.runner.get_host_info() → str[source]¶
Get hostname and username.
Return an empty string if an exception is raised, e.g. getpass.getuser() will lead to an error in a docker container.
- mmcv.runner.get_priority(priority: Union[int, str, mmcv.runner.priority.Priority]) → int[source]¶
Get priority value.
- Parameters
priority (int or str or Priority) – Priority.
- Returns
The priority value.
- Return type
int
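For example (a sketch of the accepted input types):
>>> from mmcv.runner import Priority, get_priority
>>> get_priority('NORMAL')
50
>>> get_priority(Priority.VERY_HIGH)
10
>>> get_priority(30)
30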
- mmcv.runner.load_checkpoint(model: torch.nn.modules.module.Module, filename: str, map_location: Optional[Union[str, Callable]] = None, strict: bool = False, logger: Optional[logging.Logger] = None, revise_keys: list = [('^module\\.', '')]) → Union[dict, collections.OrderedDict][source]¶
Load checkpoint from a file or URI.
- Parameters
model (Module) – Module to load checkpoint.
filename (str) – Accept local filepath, URL, torchvision://xxx, open-mmlab://xxx. Please refer to docs/model_zoo.md for details.
map_location (str) – Same as torch.load().
strict (bool) – Whether to allow different params for the model and checkpoint.
logger (logging.Logger or None) – The logger for error message.
revise_keys (list) – A list of customized keywords to modify the state_dict in checkpoint. Each item is a (pattern, replacement) pair of the regular expression operations. Default: strip the prefix ‘module.’ by [(r'^module\.', '')].
- Returns
The loaded checkpoint.
- Return type
dict or OrderedDict
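A typical call might look like the following (the checkpoint path is illustrative, and `model` is assumed to exist):
>>> from mmcv.runner import load_checkpoint
>>> checkpoint = load_checkpoint(model, 'work_dir/latest.pth', map_location='cpu')
>>> meta = checkpoint.get('meta', {})   # present for checkpoints saved by mmcv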
- mmcv.runner.load_state_dict(module: torch.nn.modules.module.Module, state_dict: Union[dict, collections.OrderedDict], strict: bool = False, logger: Optional[logging.Logger] = None) → None[source]¶
Load state_dict to a module.
This method is modified from torch.nn.Module.load_state_dict(). Default value for strict is set to False and the message for param mismatch will be shown even if strict is False.
- Parameters
module (Module) – Module that receives the state_dict.
state_dict (dict or OrderedDict) – Weights.
strict (bool) – Whether to strictly enforce that the keys in state_dict match the keys returned by this module’s state_dict() function. Default: False.
logger (logging.Logger, optional) – Logger to log the error message. If not specified, the print function will be used.
- mmcv.runner.obj_from_dict(info: dict, parent: Optional[module] = None, default_args: Optional[dict] = None)[source]¶
Initialize an object from dict.
The dict must contain the key “type”, which indicates the object type; it can be either a string or a type, such as “list” or list. Remaining fields are treated as the arguments for constructing the object.
- Parameters
info (dict) – Object types and arguments.
parent (module) – Module which may contain the expected object classes.
default_args (dict, optional) – Default arguments for initializing the object.
- Returns
Object built from the dict.
- Return type
any type
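For example, building an optimizer from a config dict (a sketch; the hyperparameters are illustrative and `model` is assumed to exist):
>>> import torch.optim
>>> from mmcv.runner import obj_from_dict
>>> info = dict(type='SGD', lr=0.01, momentum=0.9)
>>> optimizer = obj_from_dict(
...     info, torch.optim, default_args=dict(params=model.parameters()))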
- mmcv.runner.save_checkpoint(model: torch.nn.modules.module.Module, filename: str, optimizer: Optional[torch.optim.optimizer.Optimizer] = None, meta: Optional[dict] = None, file_client_args: Optional[dict] = None) → None[source]¶
Save checkpoint to file.
The checkpoint will have 3 fields: meta, state_dict and optimizer. By default meta will contain version and time info.
- Parameters
model (Module) – Module whose params are to be saved.
filename (str) – Checkpoint filename.
optimizer (Optimizer, optional) – Optimizer to be saved.
meta (dict, optional) – Metadata to be saved in checkpoint.
file_client_args (dict, optional) – Arguments to instantiate a FileClient. See mmcv.fileio.FileClient for details. Default: None. New in version 1.3.16.
- mmcv.runner.set_random_seed(seed: int, deterministic: bool = False, use_rank_shift: bool = False) → None[source]¶
Set random seed.
- Parameters
seed (int) – Seed to be used.
deterministic (bool) – Whether to set the deterministic option for CUDNN backend, i.e., set torch.backends.cudnn.deterministic to True and torch.backends.cudnn.benchmark to False. Default: False.
use_rank_shift (bool) – Whether to add the rank number to the random seed to have different random seeds in different ranks. Default: False.
- mmcv.runner.weights_to_cpu(state_dict: collections.OrderedDict) → collections.OrderedDict[source]¶
Copy a model state_dict to cpu.
- Parameters
state_dict (OrderedDict) – Model weights on GPU.
- Returns
Model weights on CPU.
- Return type
OrderedDict
- mmcv.runner.wrap_fp16_model(model: torch.nn.modules.module.Module) → None[source]¶
Wrap the FP32 model to FP16.
If you are using PyTorch >= 1.6, torch.cuda.amp is used as the backend; otherwise, the original mmcv implementation will be adopted.
For PyTorch >= 1.6, this function will:
1. Set the fp16 flag inside the model to True.
Otherwise, it will:
1. Convert the FP32 model to FP16.
2. Keep some necessary layers in FP32, e.g., normalization layers.
3. Set the fp16_enabled flag inside the model to True.
- Parameters
model (nn.Module) – Model in FP32.
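A minimal usage sketch; for PyTorch >= 1.6 this only flips the model’s internal fp16 flag so that auto_fp16-decorated methods run under torch.cuda.amp:
>>> import torch.nn as nn
>>> from mmcv.runner import wrap_fp16_model
>>> model = nn.Sequential(nn.Conv2d(3, 8, 3), nn.BatchNorm2d(8))
>>> wrap_fp16_model(model)   # call once before training starts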
engine¶
- mmcv.engine.collect_results_cpu(result_part: list, size: int, tmpdir: Optional[str] = None) → Optional[list][source]¶
Collect results under cpu mode.
On cpu mode, this function will save the results on different gpus to tmpdir and collect them by the rank 0 worker.
- Parameters
result_part (list) – Result list containing result parts to be collected.
size (int) – Size of the results, commonly equal to length of the results.
tmpdir (str | None) – Temporary directory to store the collected results. If set to None, a random temporary directory will be created.
- Returns
The collected results.
- Return type
list
- mmcv.engine.collect_results_gpu(result_part: list, size: int) → Optional[list][source]¶
Collect results under gpu mode.
On gpu mode, this function will encode results to gpu tensors and use gpu communication for results collection.
- Parameters
result_part (list) – Result list containing result parts to be collected.
size (int) – Size of the results, commonly equal to length of the results.
- Returns
The collected results.
- Return type
list
- mmcv.engine.multi_gpu_test(model: torch.nn.modules.module.Module, data_loader: torch.utils.data.dataloader.DataLoader, tmpdir: Optional[str] = None, gpu_collect: bool = False) → Optional[list][source]¶
Test model with multiple gpus.
This method tests the model with multiple gpus and collects the results under two different modes: gpu and cpu modes. By setting gpu_collect=True, it encodes results to gpu tensors and uses gpu communication for results collection. On cpu mode it saves the results on different gpus to tmpdir and collects them by the rank 0 worker.
- Parameters
model (nn.Module) – Model to be tested.
data_loader (nn.Dataloader) – Pytorch data loader.
tmpdir (str) – Path of directory to save the temporary results from different gpus under cpu mode.
gpu_collect (bool) – Option to use either gpu or cpu to collect results.
- Returns
The prediction results.
- Return type
list
- mmcv.engine.single_gpu_test(model: torch.nn.modules.module.Module, data_loader: torch.utils.data.dataloader.DataLoader) → list[source]¶
Test model with a single gpu.
This method tests the model with a single gpu and displays a test progress bar.
- Parameters
model (nn.Module) – Model to be tested.
data_loader (nn.Dataloader) – Pytorch data loader.
- Returns
The prediction results.
- Return type
list
ops¶
- class mmcv.ops.BorderAlign(pool_size: int)[source]¶
Border align pooling layer.
Applies border_align over the input feature based on predicted bboxes. The details were described in the paper BorderDet: Border Feature for Dense Object Detection.
For each border line (e.g. top, left, bottom or right) of each box, border_align does the following:
1. uniformly samples pool_size+1 positions on this line, involving the start and end points;
2. the corresponding features at these points are computed by bilinear interpolation;
3. max pooling over all the pool_size+1 positions is used to compute the pooled feature.
- Parameters
pool_size (int) – number of positions sampled over the boxes’ borders (e.g. top, bottom, left, right).
- forward(input: torch.Tensor, boxes: torch.Tensor) → torch.Tensor[source]¶
- Parameters
input – Features with shape [N,4C,H,W]. Channels in [0,C), [C,2C), [2C,3C), [3C,4C) represent the top, left, bottom, right features respectively.
boxes – Boxes with shape [N,H*W,4]. Coordinate format (x1,y1,x2,y2).
- Returns
Pooled features with shape [N,C,H*W,4]. The order is (top,left,bottom,right) for the last dimension.
- Return type
torch.Tensor
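A shape-level sketch (assuming a CUDA build of the op; the box coordinates are constructed so that x1 < x2 and y1 < y2):
>>> import torch
>>> from mmcv.ops import BorderAlign
>>> N, C, H, W = 2, 8, 10, 10
>>> feat = torch.rand(N, 4 * C, H, W, device='cuda')
>>> xy1 = torch.rand(N, H * W, 2, device='cuda') * 4
>>> boxes = torch.cat([xy1, xy1 + 4], dim=-1)        # (x1, y1, x2, y2)
>>> pooled = BorderAlign(pool_size=3)(feat, boxes)   # (N, C, H*W, 4)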
- class mmcv.ops.CARAFE(kernel_size: int, group_size: int, scale_factor: int)[source]¶
CARAFE: Content-Aware ReAssembly of FEatures
Please refer to CARAFE: Content-Aware ReAssembly of FEatures for more details.
- Parameters
kernel_size (int) – reassemble kernel size
group_size (int) – reassemble group size
scale_factor (int) – upsample ratio
- Returns
upsampled feature map
- forward(features: torch.Tensor, masks: torch.Tensor) → torch.Tensor[source]¶
Defines the computation performed at every call.
Should be overridden by all subclasses.
Note
Although the recipe for forward pass needs to be defined within this function, one should call the
Module
instance afterwards instead of this since the former takes care of running the registered hooks while the latter silently ignores them.
- class mmcv.ops.CARAFENaive(kernel_size: int, group_size: int, scale_factor: int)[source]¶
- forward(features: torch.Tensor, masks: torch.Tensor) → torch.Tensor[source]¶
Defines the computation performed at every call.
Should be overridden by all subclasses.
Note
Although the recipe for forward pass needs to be defined within this function, one should call the
Module
instance afterwards instead of this since the former takes care of running the registered hooks while the latter silently ignores them.
- class mmcv.ops.CARAFEPack(channels: int, scale_factor: int, up_kernel: int = 5, up_group: int = 1, encoder_kernel: int = 3, encoder_dilation: int = 1, compressed_channels: int = 64)[source]¶
A unified package of CARAFE upsampler that contains: 1) channel compressor 2) content encoder 3) CARAFE op.
Official implementation of ICCV 2019 paper CARAFE: Content-Aware ReAssembly of FEatures.
- Parameters
channels (int) – input feature channels
scale_factor (int) – upsample ratio
up_kernel (int) – kernel size of CARAFE op
up_group (int) – group size of CARAFE op
encoder_kernel (int) – kernel size of content encoder
encoder_dilation (int) – dilation of content encoder
compressed_channels (int) – output channels of the channel compressor
- Returns
upsampled feature map
- forward(x: torch.Tensor) → torch.Tensor[source]¶
Defines the computation performed at every call.
Should be overridden by all subclasses.
Note
Although the recipe for forward pass needs to be defined within this function, one should call the
Module
instance afterwards instead of this since the former takes care of running the registered hooks while the latter silently ignores them.
- mmcv.ops.Conv2d¶
alias of mmcv.ops.deprecated_wrappers.Conv2d_deprecated
- mmcv.ops.ConvTranspose2d¶
alias of mmcv.ops.deprecated_wrappers.ConvTranspose2d_deprecated
- class mmcv.ops.CornerPool(mode: str)[source]¶
Corner Pooling.
Corner Pooling is a new type of pooling layer that helps a convolutional network better localize corners of bounding boxes.
Please refer to CornerNet: Detecting Objects as Paired Keypoints for more details.
Code is modified from https://github.com/princeton-vl/CornerNet-Lite.
- Parameters
mode (str) –
Pooling orientation for the pooling layer
’bottom’: Bottom Pooling
’left’: Left Pooling
’right’: Right Pooling
’top’: Top Pooling
- Returns
Feature map after pooling.
- forward(x: torch.Tensor) → torch.Tensor[source]¶
Defines the computation performed at every call.
Should be overridden by all subclasses.
Note
Although the recipe for forward pass needs to be defined within this function, one should call the
Module
instance afterwards instead of this since the former takes care of running the registered hooks while the latter silently ignores them.
- class mmcv.ops.Correlation(kernel_size: int = 1, max_displacement: int = 1, stride: int = 1, padding: int = 0, dilation: int = 1, dilation_patch: int = 1)[source]¶
Correlation operator.
This correlation operator works for optical flow correlation computation.
There are two batched tensors with shape \((N, C, H, W)\), and the correlation output’s shape is \((N, max\_displacement \times 2 + 1, max\_displacement \times 2 + 1, H_{out}, W_{out})\)
where
\[H_{out} = \left\lfloor\frac{H_{in} + 2 \times padding - dilation \times (kernel\_size - 1) - 1}{stride} + 1\right\rfloor\]
\[W_{out} = \left\lfloor\frac{W_{in} + 2 \times padding - dilation \times (kernel\_size - 1) - 1}{stride} + 1\right\rfloor\]
the correlation item \((N_i, dy, dx)\) is formed by taking the sliding window convolution between input1 and shifted input2,
\[Corr(N_i, dx, dy) = \sum_{c=0}^{C-1} input1(N_i, c) \star \mathcal{S}(input2(N_i, c), dy, dx)\]
where \(\star\) is the valid 2d sliding window convolution operator, and \(\mathcal{S}\) means shifting the input features (auto-complete zero marginal), and \(dx, dy\) are shifting distances, \(dx, dy \in [-max\_displacement \times dilation\_patch, max\_displacement \times dilation\_patch]\).
- Parameters
kernel_size (int) – The size of sliding window i.e. local neighborhood representing the center points and involved in correlation computation. Defaults to 1.
max_displacement (int) – The radius for computing correlation volume, but the actual working space can be dilated by dilation_patch. Defaults to 1.
stride (int) – The stride of the sliding blocks in the input spatial dimensions. Defaults to 1.
padding (int) – Zero padding added to all four sides of the input1. Defaults to 0.
dilation (int) – The spacing of local neighborhood that will involved in correlation. Defaults to 1.
dilation_patch (int) – The spacing between position need to compute correlation. Defaults to 1.
- forward(input1: torch.Tensor, input2: torch.Tensor) → torch.Tensor[source]¶
Defines the computation performed at every call.
Should be overridden by all subclasses.
Note
Although the recipe for forward pass needs to be defined within this function, one should call the
Module
instance afterwards instead of this since the former takes care of running the registered hooks while the latter silently ignores them.
- class mmcv.ops.CrissCrossAttention(in_channels: int)[source]¶
Criss-Cross Attention Module.
Note
Before v1.3.13, we use a CUDA op. Since v1.3.13, we switch to a pure PyTorch and equivalent implementation. For more details, please refer to https://github.com/open-mmlab/mmcv/pull/1201.
Speed comparison for one forward pass (input size: [2,512,97,97]; device: 1 NVIDIA GeForce RTX 2080 Ti)

                             PyTorch version   CUDA version   Relative speed
with torch.no_grad()         0.00554402 s      0.0299619 s    5.4x
without torch.no_grad()      0.00562803 s      0.0301349 s    5.4x
- Parameters
in_channels (int) – Channels of the input feature map.
- forward(x: torch.Tensor) → torch.Tensor[source]¶
forward function of Criss-Cross Attention.
- Parameters
x (torch.Tensor) – Input feature with the shape of (batch_size, in_channels, height, width).
- Returns
Output of the layer, with the shape of (batch_size, in_channels, height, width)
- Return type
torch.Tensor
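Since the implementation is pure PyTorch, a quick shape check runs on CPU:
>>> import torch
>>> from mmcv.ops import CrissCrossAttention
>>> cca = CrissCrossAttention(in_channels=512)
>>> x = torch.randn(2, 512, 97, 97)
>>> out = cca(x)   # same shape as the input: (2, 512, 97, 97)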
- class mmcv.ops.DeformConv2d(in_channels: int, out_channels: int, kernel_size: Union[int, Tuple[int, ...]], stride: Union[int, Tuple[int, ...]] = 1, padding: Union[int, Tuple[int, ...]] = 0, dilation: Union[int, Tuple[int, ...]] = 1, groups: int = 1, deform_groups: int = 1, bias: bool = False, im2col_step: int = 32)[source]¶
Deformable 2D convolution.
Applies a deformable 2D convolution over an input signal composed of several input planes. DeformConv2d was described in the paper Deformable Convolutional Networks
Note
The argument im2col_step was added in version 1.3.17, which means the number of samples processed by the im2col_cuda_kernel per call. It enables users to define batch_size and im2col_step more flexibly and solved issue mmcv#1440.
- Parameters
in_channels (int) – Number of channels in the input image.
out_channels (int) – Number of channels produced by the convolution.
kernel_size (int, tuple) – Size of the convolving kernel.
stride (int, tuple) – Stride of the convolution. Default: 1.
padding (int or tuple) – Zero-padding added to both sides of the input. Default: 0.
dilation (int or tuple) – Spacing between kernel elements. Default: 1.
groups (int) – Number of blocked connections from input channels to output channels. Default: 1.
deform_groups (int) – Number of deformable group partitions.
bias (bool) – If True, adds a learnable bias to the output. Default: False.
im2col_step (int) – Number of samples processed by im2col_cuda_kernel per call. It will work when batch_size > im2col_step, but batch_size must be divisible by im2col_step. Default: 32. New in version 1.3.17.
- forward(x: torch.Tensor, offset: torch.Tensor) → torch.Tensor[source]¶
Deformable Convolutional forward function.
- Parameters
x (Tensor) – Input feature, shape (B, C_in, H_in, W_in)
offset (Tensor) –
Offset for deformable convolution, shape (B, deform_groups*kernel_size[0]*kernel_size[1]*2, H_out, W_out), H_out, W_out are equal to the output’s.
An offset is like [y0, x0, y1, x1, y2, x2, …, y8, x8]. The spatial arrangement is like:
(x0, y0) (x1, y1) (x2, y2)
(x3, y3) (x4, y4) (x5, y5)
(x6, y6) (x7, y7) (x8, y8)
- Returns
Output of the layer.
- Return type
Tensor
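A shape-level sketch (assuming a CUDA build; the offset is random here purely to illustrate its channel layout):
>>> import torch
>>> from mmcv.ops import DeformConv2d
>>> conv = DeformConv2d(16, 32, kernel_size=3, padding=1).cuda()
>>> x = torch.randn(1, 16, 8, 8, device='cuda')
>>> # offset channels: deform_groups * 2 * kh * kw = 1 * 2 * 3 * 3 = 18
>>> offset = torch.randn(1, 18, 8, 8, device='cuda')
>>> out = conv(x, offset)   # (1, 32, 8, 8)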
- class mmcv.ops.DeformConv2dPack(*args, **kwargs)[source]¶
A Deformable Conv Encapsulation that acts as normal Conv layers.
The offset tensor is like [y0, x0, y1, x1, y2, x2, …, y8, x8]. The spatial arrangement is like:
(x0, y0) (x1, y1) (x2, y2)
(x3, y3) (x4, y4) (x5, y5)
(x6, y6) (x7, y7) (x8, y8)
- Parameters
in_channels (int) – Same as nn.Conv2d.
out_channels (int) – Same as nn.Conv2d.
kernel_size (int or tuple[int]) – Same as nn.Conv2d.
stride (int or tuple[int]) – Same as nn.Conv2d.
padding (int or tuple[int]) – Same as nn.Conv2d.
dilation (int or tuple[int]) – Same as nn.Conv2d.
groups (int) – Same as nn.Conv2d.
bias (bool or str) – If specified as auto, it will be decided by the norm_cfg. Bias will be set as True if norm_cfg is None, otherwise False.
- forward(x: torch.Tensor) → torch.Tensor[source]¶
Deformable Convolutional forward function.
- Parameters
x (Tensor) – Input feature, shape (B, C_in, H_in, W_in)
offset (Tensor) –
Offset for deformable convolution, shape (B, deform_groups*kernel_size[0]*kernel_size[1]*2, H_out, W_out), H_out, W_out are equal to the output’s.
An offset is like [y0, x0, y1, x1, y2, x2, …, y8, x8]. The spatial arrangement is like:
(x0, y0) (x1, y1) (x2, y2)
(x3, y3) (x4, y4) (x5, y5)
(x6, y6) (x7, y7) (x8, y8)
- Returns
Output of the layer.
- Return type
Tensor
- class mmcv.ops.DeformRoIPool(output_size: Tuple[int, ...], spatial_scale: float = 1.0, sampling_ratio: int = 0, gamma: float = 0.1)[source]¶
- forward(input: torch.Tensor, rois: torch.Tensor, offset: Optional[torch.Tensor] = None) → torch.Tensor[source]¶
Defines the computation performed at every call.
Should be overridden by all subclasses.
Note
Although the recipe for forward pass needs to be defined within this function, one should call the
Module
instance afterwards instead of this since the former takes care of running the registered hooks while the latter silently ignores them.
- class mmcv.ops.DeformRoIPoolPack(output_size: Tuple[int, ...], output_channels: int, deform_fc_channels: int = 1024, spatial_scale: float = 1.0, sampling_ratio: int = 0, gamma: float = 0.1)[source]¶
- forward(input: torch.Tensor, rois: torch.Tensor) → torch.Tensor[source]¶
Defines the computation performed at every call.
Should be overridden by all subclasses.
Note
Although the recipe for forward pass needs to be defined within this function, one should call the
Module
instance afterwards instead of this since the former takes care of running the registered hooks while the latter silently ignores them.
- class mmcv.ops.DynamicScatter(voxel_size: List, point_cloud_range: List, average_points: bool)[source]¶
Scatters points into voxels, used in the voxel encoder with dynamic voxelization.
Note
The CPU and GPU implementations produce the same output, but may have a small numerical difference after summation and division (e.g., 5e-7).
- Parameters
voxel_size (list) – list [x, y, z] size of three dimension.
point_cloud_range (list) – The coordinate range of points, [x_min, y_min, z_min, x_max, y_max, z_max].
average_points (bool) – whether to use avg pooling to scatter points into voxel.
- forward(points: torch.Tensor, coors: torch.Tensor) → Tuple[torch.Tensor, torch.Tensor][source]¶
Scatters points/features into voxels.
- Parameters
points (torch.Tensor) – Points to be reduced into voxels.
coors (torch.Tensor) – Corresponding voxel coordinates (specifically multi-dim voxel index) of each point.
- Returns
A tuple containing two elements. The first one is the voxel features with shape [M, C] which are respectively reduced from input features that share the same voxel coordinates. The second is voxel coordinates with shape [M, ndim].
- Return type
tuple[torch.Tensor]
- forward_single(points: torch.Tensor, coors: torch.Tensor) → Tuple[torch.Tensor, torch.Tensor][source]¶
Scatters points into voxels.
- Parameters
points (torch.Tensor) – Points to be reduced into voxels.
coors (torch.Tensor) – Corresponding voxel coordinates (specifically multi-dim voxel index) of each point.
- Returns
A tuple containing two elements. The first one is the voxel features with shape [M, C] which are respectively reduced from input features that share the same voxel coordinates. The second is voxel coordinates with shape [M, ndim].
- Return type
tuple[torch.Tensor]
- class mmcv.ops.FusedBiasLeakyReLU(num_channels: int, negative_slope: float = 0.2, scale: float = 1.4142135623730951)[source]¶
Fused bias leaky ReLU.
This function is introduced in the StyleGAN2: Analyzing and Improving the Image Quality of StyleGAN
The bias term comes from the convolution operation. In addition, to keep the variance of the feature map or gradients unchanged, they also adopt a scale similar to Kaiming initialization. However, since \(1+\alpha^2\) is very close to 1, we can just ignore it. Therefore, the final scale is just \(\sqrt{2}\). Of course, you may change it with your own scale.
TODO: Implement the CPU version.
- Parameters
num_channels (int) – The channel number of the feature map.
negative_slope (float, optional) – Same as nn.LeakyRelu. Defaults to 0.2.
scale (float, optional) – A scalar to adjust the variance of the feature map. Defaults to 2**0.5.
- forward(input: torch.Tensor) → torch.Tensor[source]¶
Defines the computation performed at every call.
Should be overridden by all subclasses.
Note
Although the recipe for forward pass needs to be defined within this function, one should call the
Module
instance afterwards instead of this since the former takes care of running the registered hooks while the latter silently ignores them.
- class mmcv.ops.GroupAll(use_xyz: bool = True)[source]¶
Group xyz with feature.
- Parameters
use_xyz (bool) – Whether to use xyz.
- forward(xyz: torch.Tensor, new_xyz: torch.Tensor, features: Optional[torch.Tensor] = None) → torch.Tensor[source]¶
- Parameters
xyz (Tensor) – (B, N, 3) xyz coordinates of the features.
new_xyz (Tensor) – new xyz coordinates of the features.
features (Tensor) – (B, C, N) features to group.
- Returns
(B, C + 3, 1, N) Grouped feature.
- Return type
Tensor
- mmcv.ops.Linear¶
alias of mmcv.ops.deprecated_wrappers.Linear_deprecated
- class mmcv.ops.MaskedConv2d(in_channels: int, out_channels: int, kernel_size: Union[int, Tuple[int, ...]], stride: int = 1, padding: int = 0, dilation: int = 1, groups: int = 1, bias: bool = True)[source]¶
A MaskedConv2d which inherits the official Conv2d.
The masked forward doesn’t implement the backward function and only supports the stride parameter to be 1 currently.
- forward(input: torch.Tensor, mask: Optional[torch.Tensor] = None) → torch.Tensor[source]¶
Defines the computation performed at every call.
Should be overridden by all subclasses.
Note
Although the recipe for forward pass needs to be defined within this function, one should call the
Module
instance afterwards instead of this since the former takes care of running the registered hooks while the latter silently ignores them.
- mmcv.ops.MaxPool2d¶
alias of mmcv.ops.deprecated_wrappers.MaxPool2d_deprecated
- class mmcv.ops.ModulatedDeformConv2d(in_channels: int, out_channels: int, kernel_size: Union[int, Tuple[int]], stride: int = 1, padding: int = 0, dilation: int = 1, groups: int = 1, deform_groups: int = 1, bias: Union[bool, str] = True)[source]¶
- forward(x: torch.Tensor, offset: torch.Tensor, mask: torch.Tensor) → torch.Tensor[source]¶
Defines the computation performed at every call.
Should be overridden by all subclasses.
Note
Although the recipe for forward pass needs to be defined within this function, one should call the
Module
instance afterwards instead of this since the former takes care of running the registered hooks while the latter silently ignores them.
- class mmcv.ops.ModulatedDeformConv2dPack(*args, **kwargs)[source]¶
A ModulatedDeformable Conv Encapsulation that acts as normal Conv layers.
- Parameters
in_channels (int) – Same as nn.Conv2d.
out_channels (int) – Same as nn.Conv2d.
kernel_size (int or tuple[int]) – Same as nn.Conv2d.
stride (int) – Same as nn.Conv2d, while tuple is not supported.
padding (int) – Same as nn.Conv2d, while tuple is not supported.
dilation (int) – Same as nn.Conv2d, while tuple is not supported.
groups (int) – Same as nn.Conv2d.
bias (bool or str) – If specified as auto, it will be decided by the norm_cfg. Bias will be set as True if norm_cfg is None, otherwise False.
- forward(x: torch.Tensor) → torch.Tensor[source]¶
Defines the computation performed at every call.
Should be overridden by all subclasses.
Note
Although the recipe for forward pass needs to be defined within this function, one should call the
Module
instance afterwards instead of this since the former takes care of running the registered hooks while the latter silently ignores them.
- class mmcv.ops.ModulatedDeformRoIPoolPack(output_size: Tuple[int, ...], output_channels: int, deform_fc_channels: int = 1024, spatial_scale: float = 1.0, sampling_ratio: int = 0, gamma: float = 0.1)[source]¶
- forward(input: torch.Tensor, rois: torch.Tensor) → torch.Tensor[source]¶
Defines the computation performed at every call.
Should be overridden by all subclasses.
Note
Although the recipe for forward pass needs to be defined within this function, one should call the
Module
instance afterwards instead of this since the former takes care of running the registered hooks while the latter silently ignores them.
- class mmcv.ops.MultiScaleDeformableAttention(embed_dims: int = 256, num_heads: int = 8, num_levels: int = 4, num_points: int = 4, im2col_step: int = 64, dropout: float = 0.1, batch_first: bool = False, norm_cfg: Optional[dict] = None, init_cfg: Optional[mmcv.utils.config.ConfigDict] = None)[source]¶
An attention module used in Deformable-Detr.
Deformable DETR: Deformable Transformers for End-to-End Object Detection.
- Parameters
embed_dims (int) – The embedding dimension of Attention. Default: 256.
num_heads (int) – Parallel attention heads. Default: 8.
num_levels (int) – The number of feature map used in Attention. Default: 4.
num_points (int) – The number of sampling points for each query in each head. Default: 4.
im2col_step (int) – The step used in image_to_column. Default: 64.
dropout (float) – A Dropout layer on inp_identity. Default: 0.1.
batch_first (bool) – Whether Key, Query and Value are of shape (batch, n, embed_dim) rather than (n, batch, embed_dim). Default: False.
norm_cfg (dict) – Config dict for normalization layer. Default: None.
init_cfg (obj:mmcv.ConfigDict, optional) – The Config for initialization. Default: None.
- forward(query: torch.Tensor, key: Optional[torch.Tensor] = None, value: Optional[torch.Tensor] = None, identity: Optional[torch.Tensor] = None, query_pos: Optional[torch.Tensor] = None, key_padding_mask: Optional[torch.Tensor] = None, reference_points: Optional[torch.Tensor] = None, spatial_shapes: Optional[torch.Tensor] = None, level_start_index: Optional[torch.Tensor] = None, **kwargs) → torch.Tensor[source]¶
Forward Function of MultiScaleDeformAttention.
- Parameters
query (torch.Tensor) – Query of Transformer with shape (num_query, bs, embed_dims).
key (torch.Tensor) – The key tensor with shape (num_key, bs, embed_dims).
value (torch.Tensor) – The value tensor with shape (num_key, bs, embed_dims).
identity (torch.Tensor) – The tensor used for addition, with the same shape as query. Default None. If None, query will be used.
query_pos (torch.Tensor) – The positional encoding for query. Default: None.
key_padding_mask (torch.Tensor) – ByteTensor for query, with shape [bs, num_key].
reference_points (torch.Tensor) – The normalized reference points with shape (bs, num_query, num_levels, 2), all elements in the range [0, 1], top-left (0,0), bottom-right (1,1), including padding area; or (N, Length_{query}, num_levels, 4), where the additional two dimensions (w, h) form reference boxes.
spatial_shapes (torch.Tensor) – Spatial shape of features in different levels. With shape (num_levels, 2), last dimension represents (h, w).
level_start_index (torch.Tensor) – The start index of each level. A tensor with shape (num_levels, ), which can be represented as [0, h_0*w_0, h_0*w_0+h_1*w_1, …].
- Returns
forwarded results with shape [num_query, bs, embed_dims].
- Return type
torch.Tensor
- class mmcv.ops.PSAMask(psa_type: str, mask_size: Optional[tuple] = None)[source]¶
- forward(input: torch.Tensor) → torch.Tensor[source]¶
Defines the computation performed at every call.
Should be overridden by all subclasses.
Note
Although the recipe for forward pass needs to be defined within this function, one should call the
Module
instance afterwards instead of this since the former takes care of running the registered hooks while the latter silently ignores them.
- class mmcv.ops.PointsSampler(num_point: List[int], fps_mod_list: List[str] = ['D-FPS'], fps_sample_range_list: List[int] = [- 1])[source]¶
Points sampling.
- Parameters
num_point (list[int]) – Number of sample points.
fps_mod_list (list[str], optional) – Type of FPS method, valid modes [‘F-FPS’, ‘D-FPS’, ‘FS’]. Default: [‘D-FPS’]. F-FPS: using feature distances for FPS. D-FPS: using Euclidean distances of points for FPS. FS: using F-FPS and D-FPS simultaneously.
fps_sample_range_list (list[int], optional) – Range of points to apply FPS. Default: [-1].
- forward(points_xyz: torch.Tensor, features: torch.Tensor) → torch.Tensor[source]¶
- Parameters
points_xyz (torch.Tensor) – (B, N, 3) xyz coordinates of the points.
features (torch.Tensor) – (B, C, N) features of the points.
- Returns
(B, npoint, sample_num) Indices of sampled points.
- Return type
torch.Tensor
- class mmcv.ops.PrRoIPool(output_size: Union[int, tuple], spatial_scale: float = 1.0)[source]¶
The operation of precision RoI pooling. The implementation of PrRoIPool is modified from https://github.com/vacancy/PreciseRoIPooling/
Precise RoI Pooling (PrRoIPool) is an integration-based (bilinear interpolation) average pooling method for RoI Pooling. It avoids any quantization and has a continuous gradient on bounding box coordinates. It is:
1. It is different from the original RoI Pooling proposed in Fast R-CNN. PrRoI Pooling uses average pooling instead of max pooling for each bin and has a continuous gradient on bounding box coordinates. That is, one can take the derivatives of some loss function w.r.t the coordinates of each RoI and optimize the RoI coordinates.
2. It is different from the RoI Align proposed in Mask R-CNN. PrRoI Pooling uses a full integration-based average pooling instead of sampling a constant number of points. This makes the gradient w.r.t. the coordinates continuous.
- Parameters
output_size (Union[int, tuple]) – h, w.
spatial_scale (float, optional) – scale the input boxes by this number. Defaults to 1.0.
- class mmcv.ops.QueryAndGroup(max_radius: float, sample_num: int, min_radius: float = 0.0, use_xyz: bool = True, return_grouped_xyz: bool = False, normalize_xyz: bool = False, uniform_sample: bool = False, return_unique_cnt: bool = False