
Cameras

Camera Initialization

We follow Pytorch3D cameras. The camera extrinsic matrix is defined as the world-to-view transformation and uses right matrix multiplication, whereas the intrinsic matrix uses left matrix multiplication. Nevertheless, our interface also provides an opencv convention that defines the camera the same way as an OpenCV camera, which may be helpful if you are more familiar with that.
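
As a plain-torch illustration of the two multiplication conventions above (a minimal sketch, not the mmhuman3d API itself; points are treated as row vectors):

    import torch

    # Extrinsics use right multiplication: X_view = X_world @ R + T.
    # The intrinsic matrix K is left-multiplied onto homogeneous column vectors.
    R = torch.eye(3)
    T = torch.zeros(3)
    K = torch.eye(4)

    points_world = torch.rand(10, 3)
    points_view = points_world @ R + T
    points_view_h = torch.cat([points_view, torch.ones(10, 1)], dim=-1)
    points_proj_h = (K @ points_view_h.T).T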

  • Slice cameras:

    In mmhuman3d, the recommended way to initialize a camera is by passing the K, R, T matrices directly. You can slice cameras by index, and you can also concatenate cameras along the batch dimension (a concatenation sketch follows the snippet below).

    from mmhuman3d.core.cameras import PerspectiveCameras
    import torch
    K = torch.eye(4, 4)[None]
    R = torch.eye(3, 3)[None]
    T = torch.zeros(100, 3)
    # The batch sizes of K, R, T should either all be the same or be 1; the final batch size will be the largest one.
    cam = PerspectiveCameras(K=K, R=R, T=T)
    assert cam.R.shape == (100, 3, 3)
    assert cam.K.shape == (100, 4, 4)
    assert cam.T.shape == (100, 3)
    assert (cam[:10].K == cam.K[:10]).all()
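
    One way to concatenate cameras along the batch dimension is to stack the underlying matrices and rebuild a camera. A minimal sketch continuing from the snippet above (not a dedicated mmhuman3d helper):

    # Concatenate two cameras by stacking K, R, T and rebuilding a new camera.
    # This reuses K, R and the imports from the snippet above.
    cam_a = PerspectiveCameras(K=K, R=R, T=torch.zeros(10, 3))
    cam_b = PerspectiveCameras(K=K, R=R, T=torch.ones(20, 3))
    cam_cat = PerspectiveCameras(
        K=torch.cat([cam_a.K, cam_b.K], dim=0),
        R=torch.cat([cam_a.R, cam_b.R], dim=0),
        T=torch.cat([cam_a.T, cam_b.T], dim=0))
    assert cam_cat.T.shape == (30, 3)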
    
  • Build cameras:

    Cameras are wrapped by mmcv.Registry. In mmhuman3d, the recommended way to initialize a camera is by passing the K, R, T matrices directly, but you also have the option to pass focal_length and principal_point as input.

    Take the commonly used PerspectiveCameras as an example. If K, R, T are not specified, K will be the default matrix from compute_default_projection_matrix with the default focal_length and principal_point, R will be the identity matrix, and T will be zeros. You can also overwrite the parameters of compute_default_projection_matrix.

    import torch

    from mmhuman3d.core.cameras import PerspectiveCameras, build_cameras

    # Initialize a perspective camera with given K, R, T matrices.
    # It is recommended that the batch sizes of K, R, T either be the same or be 1.
    K = torch.eye(4, 4)[None]
    R = torch.eye(3, 3)[None]
    T = torch.zeros(10, 3)

    height, width = 1000, 1000
    cam1 = build_cameras(
        dict(
            type='PerspectiveCameras',
            K=K,
            R=R,
            T=T,
            in_ndc=True,
            image_size=(height, width),
            convention='opencv',
            ))
    
    # This is the same as:
    cam2 = PerspectiveCameras(
            K=K,
            R=R,
            T=T,
            in_ndc=True,
            image_size=1000, # single number represents square images.
            convention='opencv',
            )
    assert cam1.K.shape == cam2.K.shape == (10, 4, 4)
    assert cam1.R.shape == cam2.R.shape == (10, 3, 3)
    assert cam1.T.shape == cam2.T.shape == (10, 3)
    
    # Initialize a perspective camera with specific `image_size`, `principal_points` and `focal_length`.
    # `in_ndc=False` means the intrinsic matrix `K` is defined in screen space: the `focal_length` and `principal_point` in `K` are defined in pixels. Here `principal_points` is (500, 500) pixels and `focal_length` is 1000 pixels.
    cam = build_cameras(
        dict(
            type='PerspectiveCameras',
            in_ndc=False,
            image_size=(1000, 1000),
            principal_points=(500, 500),
            focal_length=1000,
            convention='opencv',
            ))
    
    assert (cam.K[0] == torch.Tensor([[1000., 0.,  500.,    0.],
                                      [0., 1000.,  500.,    0.],
                                      [0.,    0.,    0.,    1.],
                                      [0.,    0.,    1.,    0.]]).view(4, 4)).all()
    
    # Initialize a weak-perspective camera with given K, R, T. WeakPerspectiveCameras supports `in_ndc=True` only.
    cam = build_cameras(
        dict(
            type='WeakPerspectiveCameras',
            K=K,
            R=R,
            T=T,
            image_size=(1000, 1000)
            ))
    
    # If no `K`, `R`, `T` information is provided,
    # initialize an `in_ndc` perspective camera with the default matrices.
    cam = build_cameras(
        dict(
            type='PerspectiveCameras',
            in_ndc=True,
            image_size=(1000, 1000),
            ))
    # Then convert it to screen space. This operation requires `image_size`.
    cam.to_screen_()
    

Camera Projection Matrices

  • Perspective:

    Format of the intrinsic matrix, where fx, fy are the focal length and px, py the principal point:

    K = [
            [fx,   0,   px,   0],
            [0,   fy,   py,   0],
            [0,    0,    0,   1],
            [0,    0,    1,   0],
        ]
    

    For detailed information, refer to Pytorch3D.

  • WeakPerspective:

    Format of the intrinsic matrix:

    K = [
            [sx*r,   0,    0,   tx*sx*r],
            [0,     sy,    0,   ty*sy],
            [0,     0,     1,       0],
            [0,     0,     0,       1],
        ]
    

    WeakPerspectiveCameras is actually orthographic; it is used mainly for SMPL(-X) projection. For detailed information, refer to mmhuman3d cameras. It can be converted from the SMPL-predicted camera parameters by:

    from mmhuman3d.core.cameras import WeakPerspectiveCameras
    K = WeakPerspectiveCameras.convert_orig_cam_to_matrix(orig_cam)
    

    The orig_cam is an array/tensor of shape (frame, 4), consisting of [scale_x, scale_y, transl_x, transl_y]. See VIBE.
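
    To make the intrinsic format above concrete, here is a hand-built sketch; r is assumed to be the image aspect ratio (w / h), so check the WeakPerspectiveCameras docstring for the exact definition:

    import torch

    # Hand-build the 4x4 weak-perspective K from [scale_x, scale_y, transl_x, transl_y]
    # following the format above; r is assumed to be the aspect ratio (w / h).
    sx, sy, tx, ty = 1.0, 1.0, 0.5, -0.2
    r = 1920 / 1080
    K = torch.Tensor([
        [sx * r, 0., 0., tx * sx * r],
        [0., sy, 0., ty * sy],
        [0., 0., 1., 0.],
        [0., 0., 0., 1.],
    ])[None]  # shape (1, 4, 4)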

  • FoVPerspective:

    Format of the intrinsic matrix:

    K = [
            [s1,   0,   w1,   0],
            [0,   s2,   h1,   0],
            [0,    0,   f1,  f2],
            [0,    0,    1,   0],
        ]
    

    s1, s2, w1, h1, f1, f2 are defined by the FoV parameters (fov, znear, zfar, etc.); for detailed information, refer to Pytorch3D.

  • Orthographics:

    Format of the intrinsic matrix:

    K = [
            [fx,   0,    0,  px],
            [0,   fy,    0,  py],
            [0,    0,    1,   0],
            [0,    0,    0,   1],
    ]
    

    For detailed information, refer to Pytorch3D.

  • FoVOrthographics:

    Format of the intrinsic matrix:

    K = [
            [scale_x,        0,         0,  -mid_x],
            [0,        scale_y,         0,  -mid_y],
            [0,              0,  -scale_z,  -mid_z],
            [0,              0,         0,       1],
    ]
    

    scale_x, scale_y, scale_z, mid_x, mid_y, mid_z are defined by the FoV parameters (min_x, min_y, max_x, max_y, znear, zfar, etc.); for related information, refer to Pytorch3D.
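
Since the FoV camera types compute K from their FoV parameters rather than from an explicit matrix, a minimal construction sketch is shown below. The parameter names are assumed to follow Pytorch3D's FoVPerspectiveCameras / FoVOrthographicCameras, the type strings are assumed to match the registered class names, and the values are purely illustrative.

    from mmhuman3d.core.cameras import build_cameras

    # Assumed registry names and Pytorch3D-style FoV parameters; values are illustrative.
    fov_cam = build_cameras(
        dict(
            type='FoVPerspectiveCameras',
            fov=60,
            znear=0.01,
            zfar=100.0))

    fov_ortho_cam = build_cameras(
        dict(
            type='FoVOrthographicCameras',
            znear=0.01,
            zfar=100.0))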

Camera Conventions

  • Convert between different cameras:

    We name the intrinsic matrix K, the rotation matrix R and the translation T. Different camera conventions use different axis directions, and some use left matrix multiplication while others use right matrix multiplication. The intrinsic and extrinsic matrices should follow the same multiplication convention, but some conventions, such as Pytorch3D, use right matrix multiplication in the computation procedure yet accept a left-multiplication K when initializing the cameras (mainly for better understanding). Conversion between NDC (normalized device coordinates) and screen space also affects the intrinsic matrix; this is independent of the camera convention but is covered as well. If you want to use an existing convention, choose from ['opengl', 'opencv', 'pytorch3d', 'pyrender', 'open3d']. E.g., if you want to convert your OpenCV-calibrated camera to a Pytorch3D NDC camera for rendering, you can do:

    from mmhuman3d.core.conventions.cameras import convert_cameras
    import torch
    
    K = torch.eye(4, 4)[None]
    R = torch.eye(3, 3)[None]
    T = torch.zeros(10, 3)
    height, width = 1080, 1920
    K, R, T = convert_cameras(
        K=K,
        R=R,
        T=T,
        in_ndc_src=False,
        in_ndc_dst=True,
        resolution_src=(height, width),
        convention_src='opencv',
        convention_dst='pytorch3d')
    

    The input K can be None, or an array/tensor of shape (batch_size, 3, 3) or (batch_size, 4, 4). The input R can be None, or an array/tensor of shape (batch_size, 3, 3). The input T can be None, or an array/tensor of shape (batch_size, 3). If the original K is None, it will remain None. If the original R is None, it will be set to the identity matrix. If the original T is None, it will be set to zeros. Please refer to Pytorch3D for more information about cameras in NDC and in screen space.
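
    For example, a small sketch of the None-handling described above (resolutions are omitted since both sides are in NDC here):

    # K stays None, R defaults to the identity and T to zeros, as described above.
    K, R, T = convert_cameras(
        K=None,
        R=None,
        T=None,
        in_ndc_src=True,
        in_ndc_dst=True,
        convention_src='opencv',
        convention_dst='pytorch3d')
    assert K is None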

  • Define your new camera convention:

    If you want to use a new convention, define it in CAMERA_CONVENTION_FACTORY by the order of the axes pointing right, up and off the screen. E.g., the first diagram below is pyrender and its convention is '+x+y+z' ('+' can be omitted). The second is opencv and its convention is '+x-y-z'. The third is Pytorch3D and its convention is '-xyz'. An illustrative registration sketch follows the axis diagram below.

    OpenGL(PyRender)       OpenCV             Pytorch3D
        y                   z                     y
        |                  /                      |
        |                 /                       |
        |_______x        /________x     x________ |
        /                |                        /
       /                 |                       /
    z /                y |                    z /
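
    For illustration only, a registration sketch assuming CAMERA_CONVENTION_FACTORY is a plain mapping from convention names to axis strings; the exact import path and data structure are assumptions, so check the mmhuman3d source:

    # Hypothetical registration sketch; the import location and dict-like
    # structure of CAMERA_CONVENTION_FACTORY are assumptions, not confirmed API.
    from mmhuman3d.core.conventions.cameras import CAMERA_CONVENTION_FACTORY

    # Axes ordered as (right, up, off-screen): right = +x, up = -y, off-screen = +z.
    CAMERA_CONVENTION_FACTORY['my_convention'] = '+x-y+z'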
    

Some Conversion Functions

Conversion functions are also defined in conventions.cameras.

  • NDC & screen:

    from mmhuman3d.core.conventions.cameras import (convert_ndc_to_screen,
                                                    convert_screen_to_ndc)
    
    K = convert_ndc_to_screen(K, resolution=(1080, 1920), is_perspective=True)
    K = convert_screen_to_ndc(K, resolution=(1080, 1920), is_perspective=True)
    
  • 3x3 & 4x4 intrinsic matrix:

    from mmhuman3d.core.conventions.cameras import (convert_K_3x3_to_4x4,
                                                    convert_K_4x4_to_3x3)
    
    K = convert_K_3x3_to_4x4(K, is_perspective=True)
    K = convert_K_4x4_to_3x3(K, is_perspective=True)
    
  • world & view:

    Convert between world & view coordinates.

    from mmhuman3d.core.conventions.cameras import convert_world_view
    R, T = convert_world_view(R, T)
    
  • weakperspective & perspective:

    Convert between weak-perspective and perspective cameras; zmean is needed. WeakPerspectiveCameras is defined in NDC, so you should pass the resolution if the perspective camera is not in NDC.

    from mmhuman3d.core.conventions.cameras import (
        convert_perspective_to_weakperspective,
        convert_weakperspective_to_perspective)
    
    K = convert_perspective_to_weakperspective(
        K, zmean, in_ndc=False, resolution=resolution, convention='opencv')
    K = convert_weakperspective_to_perspective(
        K, zmean, in_ndc=False, resolution=resolution, convention='pytorch3d')
    

Some Compute Functions

  • Project 3D coordinates to screen:

    points_xydepth = cameras.transform_points_screen(points)
    points_xy = points_xydepth[..., :2]
    
  • Compute depth of points:

    You can simply transform points to view coordinates and take the z value as the depth. An example can be found in DepthRenderer.

    points_depth = cameras.compute_depth_of_points(points)
    
  • Compute normal of meshes:

    Use Pytorch3D to compute the normals of meshes. An example can be found in NormalRenderer.

    normals = cameras.compute_normal_of_meshes(meshes)
    
  • Get camera plane normal:

    Get the normalized normal tensor that points out of the camera plane from the camera center.

    normals = cameras.get_camera_plane_normals()
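
Putting a few of these together, a minimal end-to-end sketch (the camera parameters and points are purely illustrative):

    import torch

    from mmhuman3d.core.cameras import build_cameras

    # Build a screen-space opencv-convention camera and apply the compute
    # functions listed above; the values are illustrative only.
    cameras = build_cameras(
        dict(
            type='PerspectiveCameras',
            in_ndc=False,
            image_size=(1080, 1920),
            principal_points=(960, 540),
            focal_length=1000,
            convention='opencv'))

    points = torch.rand(100, 3)
    points_xydepth = cameras.transform_points_screen(points)
    points_xy = points_xydepth[..., :2]
    points_depth = cameras.compute_depth_of_points(points)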
    