The cameras in mmhuman3d follow Pytorch3D cameras. The camera extrinsic matrix is defined as the camera-to-world transformation and uses right matrix multiplication, whereas the intrinsic matrix uses left matrix multiplication. Nevertheless, our interface also provides an `opencv` convention that defines the camera the same way as an OpenCV camera, which is helpful if you are more familiar with that.
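The difference between left and right matrix multiplication can be illustrated with a small pure-Python sketch. The helpers `transpose`, `matvec`, and `vecmat` are hypothetical names used only for this illustration and are not part of mmhuman3d or Pytorch3D. The sketch shows that left-multiplying a column vector by a rotation gives the same result as right-multiplying the row vector by the transposed rotation, which is why a matrix has to be transposed when moving between the two conventions.

```python
def transpose(m):
    """Return the transpose of a matrix stored as a list of rows."""
    return [list(col) for col in zip(*m)]


def matvec(m, v):
    """Left multiplication: treat v as a column vector and compute m @ v."""
    return [sum(a * b for a, b in zip(row, v)) for row in m]


def vecmat(v, m):
    """Right multiplication: treat v as a row vector and compute v @ m."""
    return [sum(v[i] * m[i][j] for i in range(len(v)))
            for j in range(len(m[0]))]


# A 90-degree rotation about the z axis, written for column vectors.
R_left = [[0.0, -1.0, 0.0],
          [1.0, 0.0, 0.0],
          [0.0, 0.0, 1.0]]
point = [1.0, 2.0, 3.0]

left_result = matvec(R_left, point)              # R @ p
right_result = vecmat(point, transpose(R_left))  # p @ R^T
```

Both calls rotate the point in the same way; only the storage layout of the matrix differs.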
In mmhuman3d, the recommended way to initialize a camera is by passing the `K`, `R`, `T` matrices directly. You can slice the cameras by index. You can also concatenate cameras along the batch dimension.
    from mmhuman3d.core.cameras import PerspectiveCameras
    import torch

    K = torch.eye(4, 4)[None]
    R = torch.eye(3, 3)[None]
    T = torch.zeros(100, 3)

    # Batch of K, R, T should all be the same or some of them could be 1.
    # The final batch size will be the biggest one.
    cam = PerspectiveCameras(K=K, R=R, T=T)
    assert cam.R.shape == (100, 3, 3)
    assert cam.K.shape == (100, 4, 4)
    assert cam.T.shape == (100, 3)
    assert (cam[:10].K == cam.K[:10]).all()
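The batch rule stated in the comment above (all batch sizes equal, or some of them equal to 1, with the largest one used as the final size) can be sketched in plain Python. The function `broadcast_batch_size` is a hypothetical helper written for this illustration; it is not how mmhuman3d implements the check.

```python
def broadcast_batch_size(*batch_sizes):
    """Return the shared batch size, letting entries of size 1 expand."""
    target = max(batch_sizes)
    for size in batch_sizes:
        if size not in (1, target):
            raise ValueError(
                f"batch size {size} is neither 1 nor {target}")
    return target


# K and R have batch size 1 and T has batch size 100, as in the example.
final_batch = broadcast_batch_size(1, 1, 100)
```

Under this rule a combination such as batch sizes 2 and 100 is rejected, because neither of them can be expanded to match the other.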
Cameras are wrapped by mmcv.Registry. In mmhuman3d, the recommended way to initialize a camera is by passing the `K`, `R`, `T` matrices directly, but you also have the option to pass `focal_length` and `principal_points` as the input.
Take the commonly used `PerspectiveCameras` as an example. If `K`, `R`, `T` are not specified, `K` will use a default intrinsic matrix, `R` will be the identity matrix, and `T` will be zeros. You can also override these defaults by passing the corresponding parameters explicitly.
    import torch

    from mmhuman3d.core.cameras import PerspectiveCameras, build_cameras

    # Initialize a perspective camera with given K, R, T matrix.
    # It is recommended that the batches of K, R, T either be the same or be 1.
    K = torch.eye(4, 4)[None]
    R = torch.eye(3, 3)[None]
    T = torch.zeros(10, 3)
    height, width = 1000, 1000
    cam1 = build_cameras(
        dict(
            type='PerspectiveCameras',
            K=K,
            R=R,
            T=T,
            in_ndc=True,
            image_size=(height, width),
            convention='opencv',
        ))

    # This is the same as:
    cam2 = PerspectiveCameras(
        K=K,
        R=R,
        T=T,
        in_ndc=True,
        image_size=1000,  # single number represents square images.
        convention='opencv',
    )
    assert cam1.K.shape == cam2.K.shape == (10, 4, 4)
    assert cam1.R.shape == cam2.R.shape == (10, 3, 3)
    assert cam1.T.shape == cam2.T.shape == (10, 3)

    # Initialize a perspective camera with specific `image_size`,
    # `principal_points`, `focal_length`.
    # `in_ndc = False` means the intrinsic matrix `K` is defined in screen
    # space. The `focal_length` and `principal_points` in `K` are defined in
    # pixels. Here `principal_points` is (500, 500) pixels and `focal_length`
    # is 1000 pixels.
    cam = build_cameras(
        dict(
            type='PerspectiveCameras',
            in_ndc=False,
            image_size=(1000, 1000),
            principal_points=(500, 500),
            focal_length=1000,
            convention='opencv',
        ))
    assert (cam.K == torch.Tensor([[1000., 0., 500., 0.],
                                   [0., 1000., 500., 0.],
                                   [0., 0., 0., 1.],
                                   [0., 0., 1., 0.]]).view(4, 4)).all()

    # Initialize a weakperspective camera with given K, R, T.
    # Weakperspective cameras support `in_ndc = True` only.
    cam = build_cameras(
        dict(
            type='WeakPerspectiveCameras',
            K=K,
            R=R,
            T=T,
            image_size=(1000, 1000)
        ))

    # If no `K`, `R`, `T` information is provided,
    # initialize an `in_ndc` perspective camera with default matrices.
    cam = build_cameras(
        dict(
            type='PerspectiveCameras',
            in_ndc=True,
            image_size=(1000, 1000),
        ))
    # Then convert it to screen. This operation requires `image_size`.
    cam.to_screen_()
Camera Projection Matrices
Perspective cameras use the following intrinsic matrix format, where fx, fy are the focal lengths and px, py are the coordinates of the principal point.

    K = [
        [fx, 0,  px, 0],
        [0,  fy, py, 0],
        [0,  0,  0,  1],
        [0,  0,  1,  0],
    ]

For detailed information, refer to Pytorch3D.
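To see what the intrinsic matrix format above does to a point, here is a minimal pure-Python sketch. It assumes a column-vector (left multiplication) reading of `K` and a point already expressed in view coordinates, and it ignores the axis-direction differences between conventions. The function `project_perspective` is a hypothetical helper for this illustration, not a library API.

```python
def project_perspective(K, point):
    """Apply a 4x4 intrinsic matrix to a 3D point in view coordinates."""
    x, y, z = point
    homo = [x, y, z, 1.0]
    out = [sum(k * h for k, h in zip(row, homo)) for row in K]
    w = out[3]  # the last row [0, 0, 1, 0] copies z into w
    return [out[0] / w, out[1] / w, out[2] / w]


fx, fy, px, py = 1000.0, 1000.0, 500.0, 500.0
K = [[fx, 0.0, px, 0.0],
     [0.0, fy, py, 0.0],
     [0.0, 0.0, 0.0, 1.0],
     [0.0, 0.0, 1.0, 0.0]]

# u = fx * x / z + px, v = fy * y / z + py, third value = 1 / z
u, v, inv_depth = project_perspective(K, (0.1, -0.2, 2.0))
```

The two swapped bottom rows are what turn the homogeneous division into a division by depth: the fourth output component becomes z, and the third becomes 1 before the division.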
Weak-perspective cameras use the following intrinsic matrix format:

    K = [
        [sx*r, 0,  0, tx*sx*r],
        [0,    sy, 0, ty*sy],
        [0,    0,  1, 0],
        [0,    0,  0, 1],
    ]

`WeakPerspectiveCameras` is in fact orthographic and is mainly used for SMPL(x) projection. For detailed information, refer to mmhuman3d cameras. The matrix can be converted from the camera parameters predicted for SMPL by:

    from mmhuman3d.core.cameras import WeakPerspectiveCameras

    K = WeakPerspectiveCameras.convert_orig_cam_to_matrix(orig_cam)

The predicted camera `orig_cam` is an array/tensor of shape (frame, 4) consisting of [scale_x, scale_y, transl_x, transl_y]. See VIBE.
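As a rough illustration of the weak-perspective format above, the following pure-Python sketch fills the matrix from a single [scale_x, scale_y, transl_x, transl_y] entry and applies it to two points at different depths. The factor `r` is not defined in the text; treating it as an aspect-ratio factor that defaults to 1 is an assumption. `weakperspective_matrix` and `apply_matrix` are hypothetical helpers and do not reproduce the implementation of `convert_orig_cam_to_matrix`.

```python
def weakperspective_matrix(orig_cam, r=1.0):
    """Build the 4x4 matrix from [scale_x, scale_y, transl_x, transl_y].

    `r` is assumed to be an aspect-ratio factor (not defined in the text).
    """
    sx, sy, tx, ty = orig_cam
    return [[sx * r, 0.0, 0.0, tx * sx * r],
            [0.0, sy, 0.0, ty * sy],
            [0.0, 0.0, 1.0, 0.0],
            [0.0, 0.0, 0.0, 1.0]]


def apply_matrix(K, point):
    """Left-multiply the homogeneous point (x, y, z, 1) by K."""
    homo = [point[0], point[1], point[2], 1.0]
    return [sum(k * h for k, h in zip(row, homo)) for row in K][:3]


K = weakperspective_matrix([2.0, 2.0, 0.5, -0.25])
near = apply_matrix(K, (1.0, 1.0, 5.0))
far = apply_matrix(K, (1.0, 1.0, 50.0))
```

The projected x and y values are identical for `near` and `far`, because no row divides by depth; only the z value passes through unchanged.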
FoV perspective cameras use the following intrinsic matrix format:

    K = [
        [s1, 0,  w1, 0],
        [0,  s2, h1, 0],
        [0,  0,  f1, f2],
        [0,  0,  1,  0],
    ]

s1, s2, w1, h1, f1, f2 are defined by the FoV parameters (`zfar`, etc.). For detailed information, refer to Pytorch3D.
Orthographic cameras use the following intrinsic matrix format:

    K = [
        [fx, 0,  0, px],
        [0,  fy, 0, py],
        [0,  0,  1, 0],
        [0,  0,  0, 1],
    ]

For detailed information, refer to Pytorch3D.
FoV orthographic cameras use the following intrinsic matrix format:

    K = [
        [scale_x, 0,       0,        -mid_x],
        [0,       scale_y, 0,        -mid_y],
        [0,       0,       -scale_z, -mid_z],
        [0,       0,       0,        1],
    ]

scale_x, scale_y, scale_z, mid_x, mid_y, mid_z are defined by the FoV parameters (`zfar`, etc.). For related information, refer to Pytorch3D.
Convert between different cameras:
We denote the intrinsic matrix as `K`, the rotation matrix as `R`, and the translation matrix as `T`. Different camera conventions have different axis directions, and some use left matrix multiplication while others use right matrix multiplication. The intrinsic and extrinsic matrices should follow the same multiplication convention, but some conventions, like `Pytorch3D`, use right matrix multiplication in the computation procedure while accepting a left-multiplication `K` when initializing the cameras (mainly for better understanding). Conversion between `NDC` (normalized device coordinates) and `screen` also affects the intrinsic matrix; this is independent of camera conventions but should also be taken into account. If you want to use an existing convention, choose from `['opengl', 'opencv', 'pytorch3d', 'pyrender', 'open3d']`. For example, to convert your OpenCV-calibrated camera to a Pytorch3D NDC-defined camera for rendering, you can do:
    from mmhuman3d.core.conventions.cameras import convert_cameras
    import torch

    K = torch.eye(4, 4)[None]
    R = torch.eye(3, 3)[None]
    T = torch.zeros(10, 3)
    height, width = 1080, 1920
    K, R, T = convert_cameras(
        K=K,
        R=R,
        T=T,
        in_ndc_src=False,
        in_ndc_dst=True,
        resolution_src=(height, width),
        convention_src='opencv',
        convention_dst='pytorch3d')
Input K could be None, or a tensor of shape (batch_size, 3, 3) or (batch_size, 4, 4). Input R could be None, or a tensor of shape (batch_size, 3, 3). Input T could be None, or a tensor of shape (batch_size, 3). If the original `K` is None, it will remain None. If the original `R` is None, it will be set to the identity matrix. If the original `T` is None, it will be set to a zero matrix. Please refer to Pytorch3D for more information about cameras.
Define your new camera convention:
If you want to use a new convention, define it in CAMERA_CONVENTION_FACTORY by giving the axes in the order of the right, up, and off-screen directions. For example, in the diagram that follows, the first one is pyrender and its convention should be '+x+y+z'; the '+' may be omitted. The second one is opencv and its convention should be '+x-y-z'. The third one is Pytorch3D and its convention should be '-xyz'.
    OpenGL(PyRender)       OpenCV             Pytorch3D
        y                   z                     y
        |                  /                      |
        |                 /                       |
        |_______x        /________x     x________ |
        /                |                        /
       /                 |                       /
    z /                y |                    z /
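The convention strings can be read as one sign per axis. The sketch below parses such a string and derives the per-axis sign flips between two conventions. It only illustrates the notation; `parse_convention` and `axis_flips` are hypothetical helpers, and the real conversion in mmhuman3d also has to account for the multiplication convention, which this sketch ignores.

```python
def parse_convention(axes):
    """Parse a string such as '+x-y-z' or '-xyz' into signs for x, y, z."""
    signs = {}
    sign = 1
    for ch in axes:
        if ch == '+':
            sign = 1
        elif ch == '-':
            sign = -1
        elif ch in 'xyz':
            signs[ch] = sign
            sign = 1  # an omitted sign before the next axis means '+'
        else:
            raise ValueError(f"unexpected character {ch!r} in {axes!r}")
    if sorted(signs) != ['x', 'y', 'z']:
        raise ValueError(f"{axes!r} must name each of x, y, z once")
    return [signs['x'], signs['y'], signs['z']]


def axis_flips(src, dst):
    """Per-axis factors (+1 or -1) for moving from src to dst."""
    return [s * d for s, d in
            zip(parse_convention(src), parse_convention(dst))]


# pyrender ('xyz') to opencv ('+x-y-z'): y and z change sign.
pyrender_to_opencv = axis_flips('xyz', '+x-y-z')
```

A factor of -1 for an axis means that coordinates along that axis are negated when moving from the source convention to the destination convention.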
Some Conversion Functions
Conversion functions are also defined in conventions.cameras.
NDC & screen:
    from mmhuman3d.core.conventions.cameras import (convert_ndc_to_screen,
                                                    convert_screen_to_ndc)

    K = convert_ndc_to_screen(K, resolution=(1080, 1920), is_perspective=True)
    K = convert_screen_to_ndc(K, resolution=(1080, 1920), is_perspective=True)
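For intuition about what changes in the intrinsic parameters, the sketch below applies the screen/NDC relations given in the Pytorch3D camera notes, where the scale `s` is the shorter image side. Whether the mmhuman3d functions above use exactly these formulas is an assumption, and `screen_to_ndc` and `ndc_to_screen` are hypothetical helper names for this illustration.

```python
def screen_to_ndc(focal, principal, resolution):
    """Convert screen-space focal length and principal point to NDC."""
    height, width = resolution
    s = min(height, width)
    fx, fy = focal
    px, py = principal
    focal_ndc = (fx * 2.0 / s, fy * 2.0 / s)
    principal_ndc = (-(px - width / 2.0) * 2.0 / s,
                     -(py - height / 2.0) * 2.0 / s)
    return focal_ndc, principal_ndc


def ndc_to_screen(focal_ndc, principal_ndc, resolution):
    """Inverse of screen_to_ndc."""
    height, width = resolution
    s = min(height, width)
    fx, fy = focal_ndc
    px, py = principal_ndc
    focal = (fx * s / 2.0, fy * s / 2.0)
    principal = (-px * s / 2.0 + width / 2.0,
                 -py * s / 2.0 + height / 2.0)
    return focal, principal


resolution = (1080, 1920)  # (height, width)
focal_ndc, principal_ndc = screen_to_ndc(
    (1080.0, 1080.0), (1000.0, 500.0), resolution)
focal_back, principal_back = ndc_to_screen(
    focal_ndc, principal_ndc, resolution)
```

Converting to NDC and back recovers the original pixel values, which is a quick sanity check when mixing cameras defined in the two spaces.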
3x3 & 4x4 intrinsic matrix:

    from mmhuman3d.core.conventions.cameras import (convert_K_3x3_to_4x4,
                                                    convert_K_4x4_to_3x3)

    K = convert_K_3x3_to_4x4(K, is_perspective=True)
    K = convert_K_4x4_to_3x3(K, is_perspective=True)
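The relation between the two sizes can be sketched for the perspective case, using the 4x4 layout described in this document (rows [fx, 0, px, 0], [0, fy, py, 0], [0, 0, 0, 1], [0, 0, 1, 0]). This is an illustration that ignores skew and batching; `k_3x3_to_4x4_perspective` and `k_4x4_to_3x3_perspective` are hypothetical helpers, not the library functions.

```python
def k_3x3_to_4x4_perspective(K3):
    """Embed a 3x3 pinhole matrix into the 4x4 perspective layout."""
    fx, px = K3[0][0], K3[0][2]
    fy, py = K3[1][1], K3[1][2]
    return [[fx, 0.0, px, 0.0],
            [0.0, fy, py, 0.0],
            [0.0, 0.0, 0.0, 1.0],
            [0.0, 0.0, 1.0, 0.0]]


def k_4x4_to_3x3_perspective(K4):
    """Extract the 3x3 pinhole matrix from the 4x4 perspective layout."""
    fx, px = K4[0][0], K4[0][2]
    fy, py = K4[1][1], K4[1][2]
    return [[fx, 0.0, px],
            [0.0, fy, py],
            [0.0, 0.0, 1.0]]


K3 = [[1000.0, 0.0, 500.0],
      [0.0, 1200.0, 400.0],
      [0.0, 0.0, 1.0]]
K4 = k_3x3_to_4x4_perspective(K3)
K3_back = k_4x4_to_3x3_perspective(K4)
```

The `is_perspective` flag in the library calls matters because an orthographic 4x4 matrix places the principal point in the last column rather than the third, so the embedding differs between the two camera types.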
world & view:
Convert between world & view coordinates.
    from mmhuman3d.core.conventions.cameras import convert_world_view

    R, T = convert_world_view(R, T)
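Switching between world and view coordinates amounts to inverting a rigid transform. The sketch below shows the inversion under a column-vector convention (x_out = R @ x_in + T); this convention is an assumption made for readability and may differ from the one `convert_world_view` uses internally. All helper names are hypothetical.

```python
def transpose(m):
    """Return the transpose of a matrix stored as a list of rows."""
    return [list(col) for col in zip(*m)]


def matvec(m, v):
    """Compute m @ v for a column vector v."""
    return [sum(a * b for a, b in zip(row, v)) for row in m]


def invert_rigid(R, T):
    """Invert x_out = R @ x_in + T, returning (R_inv, T_inv)."""
    R_inv = transpose(R)  # the inverse of a rotation is its transpose
    T_inv = [-t for t in matvec(R_inv, T)]
    return R_inv, T_inv


def apply_rigid(R, T, point):
    """Apply x_out = R @ point + T."""
    return [r + t for r, t in zip(matvec(R, point), T)]


R = [[0.0, -1.0, 0.0], [1.0, 0.0, 0.0], [0.0, 0.0, 1.0]]
T = [1.0, 2.0, 3.0]
R_inv, T_inv = invert_rigid(R, T)

point_world = [4.0, 5.0, 6.0]
point_view = apply_rigid(R, T, point_world)
point_back = apply_rigid(R_inv, T_inv, point_view)
```

Applying the transform and then its inverse returns the original point, which is a simple check that a pair of world and view extrinsics is consistent.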
weakperspective & perspective:
Convert between weak-perspective and perspective cameras. `zmean` is required. `WeakPerspectiveCameras` is defined in NDC, so you should pass `resolution` if the perspective camera is not in NDC.

    from mmhuman3d.core.conventions.cameras import (
        convert_perspective_to_weakperspective,
        convert_weakperspective_to_perspective)

    # K, zmean and resolution are assumed to be defined beforehand.
    K = convert_perspective_to_weakperspective(
        K, zmean, in_ndc=False, resolution=resolution, convention='opencv')
    K = convert_weakperspective_to_perspective(
        K, zmean, in_ndc=False, resolution=resolution, convention='pytorch3d')
Some Compute Functions
Project 3D coordinates to screen:
    points_xydepth = cameras.transform_points_screen(points)
    points_xy = points_xydepth[..., :2]
Compute depth of points:
You can simply convert the points to view coordinates and take the z value as the depth. An example can be found in DepthRenderer.

    points_depth = cameras.compute_depth_of_points(points)
Compute normal of meshes:
This uses `Pytorch3D` to compute the normals of meshes. An example can be found in NormalRenderer.

    normals = cameras.compute_normal_of_meshes(meshes)
Get camera plane normal:
Get the normalized normal tensor which points out of the camera plane from camera center.
    normals = cameras.get_camera_plane_normals()