Arcturus Volumetric Video Editing – fxguide
HoloEdit is a non-linear editor for volumetric video from Arcturus and part of their HoloSuite of tools. It enables interactive edits, touch-ups, refinements, sequencing, and more. The company has produced a tool for editing and compression that reduces project time and complexity while supporting many common 3D volumetric file formats. HoloEdit can be used to author, edit, and stream volumetrically captured performers for VFX as well as immersive virtual and augmented reality projects.
The number of volumetric video capture studios grew 45% from January 2021 to December 2021. Capturing the data has become easier and more accessible as more studio and portable rigs have come online, but editing the captured data remains a real concern for many VFX producers and editors.
The company has over 20 employees and last year raised over $5M to expand (Bitkraft Ventures led the round). The company was founded by a management team with product, art, and science backgrounds from Netflix, Autodesk, Pixar, Dreamworks, Google, YouTube, and Uber. Arcturus CEO Kamal Mistry said in a statement that the company aims to create digital human holograms that are captured from reality and customized so that they can interact with a viewer in real-time: for example, as digital customer service agents and human avatars, in virtual 3D concerts and fashion runways, or to visualize a professional athlete’s perspective in broadcast sports.
The company has been operating since 2019 and updates its software every three to six months. Many of its customers are artists, with users in music and 3D concerts as well as VFX, but the company also has a strong user base in industrial and training applications. There is now also a lot of movement in sports.
To do this, the company needs to move beyond producing just point-sampled or frame-independent photogrammetry meshes. Many people have seen one-off images produced from someone standing in a capture volume made up of multiple DSLRs. The problem with this approach is that each frame is a unique solution, so continuous motion is nearly impossible to edit. Furthermore, textures or UV space can be vastly inconsistent, so relighting or modifying even a clothing item is extremely difficult beyond simply keying the color of an item in a traditional 2D pipeline approach.
Devin Horsman, co-founder, explains that HoloEdit aims to take advantage of all the benefits of volumetric capture, such as realism and fast turnaround, while providing the editing and control of a traditional 3D pipeline. “We want to be able to offer all the CG capabilities that we are used to, like animation, the ability to do relighting, the capacity to change the material properties of something, and also do mesh touch-ups easily – which up until this point have been very difficult in volumetric video.”
HoloSuite is made to handle most of the different data sets produced currently by the various volumetric capture core technologies. It typically works with open formats such as Alembic, .OBJ, .PLY as well as many proprietary formats.
Temporal topology consistency
In the simplest capture volume outputs, the artist receives a completely different mesh for frame 1 than they do for frame 2 and frame 3, and so on. This makes editing the mesh, and having that edit persist past that one frame, challenging. HoloEdit solves this with its topology mesh stabilization phase. The software does this in a variety of ways, from fully automated solutions to artist-directed solutions.
The state of the art today is to get a clip segment of somewhere between 10 and 100 frames with a single consistent topology. After that, an additional edit would be needed for the next segment. The topology is solved so that there is still a full (full-size) topology per frame, based on a master frame for that segment. Between segments, across segment boundaries, there can be an issue with discontinuities. If surface normals, for example, were completely recalculated, then any relighting would pop on segment changes, “so work is going into global directional texture surface maps instead of just simple mesh normals (which would pop on crossing segment boundaries),” outlines Horsman.
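The segment-based approach can be sketched in code. This is a hypothetical data structure, not the HoloEdit format: each segment holds one shared triangle index buffer (the master topology), and only vertex positions vary per frame, so a mesh edit made once applies across every frame of the segment.

```python
# Hypothetical sketch of segment-based topology stabilization: one index
# buffer per segment, per-frame vertex positions. Not the HoloEdit API.
from dataclasses import dataclass, field

@dataclass
class Segment:
    # One triangle index buffer shared by every frame in the segment.
    faces: list[tuple[int, int, int]]
    # Per-frame vertex positions; only these change frame to frame.
    frames: list[list[tuple[float, float, float]]] = field(default_factory=list)

    def add_frame(self, positions):
        # Every frame must cover the vertex count the master topology expects.
        expected = 1 + max(i for f in self.faces for i in f)
        assert len(positions) >= expected, "frame does not cover the topology"
        self.frames.append(positions)

# An edit to `faces` (e.g. deleting a triangle) now holds for all frames,
# instead of needing to be redone per frame as with frame-independent meshes.
seg = Segment(faces=[(0, 1, 2)])
seg.add_frame([(0.0, 0.0, 0.0), (1.0, 0.0, 0.0), (0.0, 1.0, 0.0)])
seg.add_frame([(0.0, 0.1, 0.0), (1.0, 0.1, 0.0), (0.0, 1.1, 0.0)])
print(len(seg.frames))  # 2
```

The trade-off the article describes follows directly: within a segment, edits propagate for free; at a segment boundary a new `Segment` with a new topology begins, which is where discontinuities can appear.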
For painterly surface edits, HoloEdit provides special tools that allow artists to carry the same projected texture consistency across segment boundaries, even though the UV mapping is completely different between adjacent segments. At an operational level, if an artist paints out a t-shirt logo at the start of a volumetric video, then at each segment divide they can reproject that paint-out edit into the new segment, adjusting the edit if the new UV space causes any issues.
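A simplified illustration of carrying a paint edit across a segment boundary (hypothetical code, not the HoloEdit tool): anchor each painted texel to a 3D surface position, then in the next segment find the nearest vertex and look up its (different) UV coordinate to re-apply the same edit.

```python
# Hedged sketch: reproject 3D-anchored paint edits into a new segment's
# UV space. `reproject_edits` is an illustrative helper, not a real API.
import math

def reproject_edits(edits, verts, uvs):
    """edits: list of (pos3d, color); verts: the next segment's vertex
    positions; uvs: that segment's per-vertex UVs.
    Returns (uv, color) pairs to paint in the new segment's texture."""
    def dist2(a, b):
        return sum((a[k] - b[k]) ** 2 for k in range(3))
    out = []
    for pos, color in edits:
        # Nearest-vertex lookup stands in for a proper surface projection.
        nearest = min(range(len(verts)), key=lambda i: dist2(verts[i], pos))
        out.append((uvs[nearest], color))  # same edit, new UV location
    return out

edits = [((0.0, 1.5, 0.1), (0.2, 0.2, 0.2))]   # painted-out logo anchor
verts = [(0.0, 1.49, 0.1), (0.5, 0.0, 0.0)]
uvs = [(0.62, 0.4), (0.1, 0.9)]                # remapped in the new segment
print(reproject_edits(edits, verts, uvs))  # [((0.62, 0.4), (0.2, 0.2, 0.2))]
```

The key point the article makes survives the simplification: the edit lives on the surface in 3D, so it can be rebaked into whatever UV layout the next segment happens to use.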
The data is also compressed later in the pipeline so that HoloEdit can interface with the company’s streaming solution, HoloStream, but during editing the data is preserved, not compressed or compromised. For offline VFX editing, for example, compression is neither required nor mandated by HoloEdit.
One valuable aspect of HoloEdit is its ability to relight a volumetric capture. Most volumetric solutions capture an albedo color texture along with a mesh with per-vertex normals. Typically, the mesh density is fairly high (50,000 to 250,000 triangles per frame), but even so the mesh solution is rarely detailed enough, in geometry or surface normals, for complex relighting. Sharper edges become smoothed, and faces feel averaged and too simplified to be subtly relit. Also, the whole mesh is one ‘object’: there is no sense of the person being built of separate objects with separate BRDFs. Nor is there any consistency over time.
HoloEdit starts by trying to identify or segment the scan into materials of the same type. For points on the surface, the program performs ‘material segmentation’. It then tries to approximate the material properties of each; this stage is ‘material estimation’. Combined, material segmentation and estimation allow the user to meaningfully edit and adjust the scanned data without having to rely on simplistic 2D filters or keying approaches. This is an active area of research for the company.
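The idea behind material segmentation can be illustrated with a toy version. This is not Arcturus’ algorithm, just a hedged sketch: group surface points whose albedo is similar, so each group can later receive its own estimated material properties (the estimation step).

```python
# Illustrative greedy clustering of surface points by albedo similarity.
# Hypothetical helper, standing in for real material segmentation.
def segment_by_albedo(albedos, tolerance=0.1):
    """Assign each point to the first cluster whose running-mean albedo is
    within `tolerance` per RGB channel, else start a new cluster.
    Returns a list of [mean_albedo, member_indices] clusters."""
    clusters = []
    for i, a in enumerate(albedos):
        for c in clusters:
            mean, members = c
            if all(abs(a[k] - mean[k]) <= tolerance for k in range(3)):
                members.append(i)
                n = len(members)
                # Update the cluster's running mean with the new point.
                c[0] = tuple((mean[k] * (n - 1) + a[k]) / n for k in range(3))
                break
        else:
            clusters.append([a, [i]])
    return clusters

# Skin-like vs shirt-like albedos fall into two separate material groups.
points = [(0.8, 0.6, 0.5), (0.82, 0.61, 0.52), (0.1, 0.1, 0.6), (0.12, 0.09, 0.58)]
groups = segment_by_albedo(points)
print(len(groups))  # 2
```

In a real pipeline the segmentation would use geometry and learned features as well as color, but the output shape is the same: per-material groups that a user (or an estimator) can assign distinct BRDF properties to.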
Using machine learning, the company is also exploring ML super-resolution to provide ‘super normals’. This inferred solution provides a path to more plausible surface-normal relighting.
Not all these tools produce perfect solutions all the time, but the company sees the 3D community as interconnected and collaborative thus some of their work is produced with an eye to sitting in a pipeline. Sometimes, providing 90% of a solution is enough to allow other tools to polish or address some of the most challenging problems in this space. Arcturus wants to be a good partner and contributor to complex and innovative pipelines, especially given the complexity of the problems they are researching. Arcturus publishes HoloSuite to Maya and HoloSuite to Mari, for example, as plugins.
To allow for animation and manipulation, the HoloSuite tools build a rig for scanned actors. The process uses multiple camera angles to infer the correct rig pose. There has been significant work done in this area in inferring a rig from just a 2D image, but the tools in HoloSuite’s HoloEdit can take advantage of multiple cameras photographing simultaneously. Their solution looks for the correlation of the rigged bones from all those camera positions but also correlation over time. Without this, a ‘perfect’ solution solved for just one moment might actually lead to bones changing length over time, which would be highly undesirable. HoloEdit provides a spatial and temporal solution that seeks to do the best fit of all the data that provides a stable consistent platform for later animation.
The 16-bone rig can easily be exported to common formats. The process involves both advanced deep learning AI and more traditional statistical optimization algorithms. The rig does not currently have bones for the hands, which is in part due to the resolution of an actor’s fingers when filmed full body height in a normal capture volume rig. The visual capture resolution of a full-height camera enclosure or stage is normally not enough to provide high-quality and accurate vision for articulated fingers and hands. Arcturus is working on this problem and moving to a higher density 32-bone rig. “We have seen extremely high-quality/high-resolution capture setups, that are not yet on the open market, but right now we are trying to provide value to the majority of our customers,” comments Horsman. As resolution increases the company will match this with more complex rigging.
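The temporal-consistency problem described above, bones changing length when each frame is solved in isolation, can be sketched as follows. This is hypothetical code, not HoloEdit internals: fix each bone to a single length (here the median across the clip) and rescale every frame’s child-joint position along the bone direction.

```python
# Hedged sketch of temporally stabilizing one bone of a per-frame pose solve.
import math
import statistics

def stabilize_bone(parents, children):
    """parents/children: per-frame 3D positions of one bone's two joints.
    Returns adjusted per-frame child positions with a constant bone length."""
    def sub(a, b):
        return tuple(a[k] - b[k] for k in range(3))
    def norm(v):
        return math.sqrt(sum(x * x for x in v))

    lengths = [norm(sub(c, p)) for p, c in zip(parents, children)]
    target = statistics.median(lengths)   # one length shared by all frames
    out = []
    for p, c, l in zip(parents, children, lengths):
        d = sub(c, p)
        s = target / l                    # rescale along the bone direction
        out.append(tuple(p[k] + d[k] * s for k in range(3)))
    return out

parents = [(0.0, 0.0, 0.0)] * 3
children = [(1.0, 0.0, 0.0), (1.1, 0.0, 0.0), (0.9, 0.0, 0.0)]  # jittery length
fixed = stabilize_bone(parents, children)
print([round(c[0], 3) for c in fixed])  # [1.0, 1.0, 1.0]
```

The production solve is of course a joint spatial-and-temporal optimization over all cameras and all bones, but the constraint it enforces is the one shown here: bone lengths must not drift over time.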
With most characters, there is a primary motion that would be addressed by the rig, and then there is a secondary motion, such as flowing hair or clothes. The secondary motion is captured as part of the underlying technology, and while this is currently not visible to users, “this will be worked into future, more complex skeletons that we will release,” adds Horsman. “And they will provide a basis for simulation and allow users to have physically plausible motion when animating characters.”
Any props, such as handbags, can be removed from a capture solution. Users can define a 3D region, or go further and remove any data of a certain color range inside that region. “It is like a 3D green screen approach,” he explains, pointing to the example of a standing prop such as a fake car door. It is unlikely that a capture volume could capture a person and a car, but it could be key to have a stand-in for the car door so the actor has something to reach for on set. This is also an excellent example of where the post-capture rig would be invaluable: it could be used to realign the actor’s hand to exactly match the real car door when the capture is combined with the virtual or digital 3D car.
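The “3D green screen” idea reduces to a simple filter. A hedged sketch with a hypothetical helper, not the HoloEdit API: drop any captured point that lies inside a user-defined axis-aligned box and whose color falls in a chosen range, leaving everything outside the region untouched.

```python
# Illustrative prop removal: cull points that are both inside a 3D region
# and within a keyable color range. Hypothetical helper for this article.
def cull_prop(points, box_min, box_max, color_min, color_max):
    """points: list of (position, color) 3-tuples. Returns the kept points."""
    def inside(v, lo, hi):
        return all(lo[k] <= v[k] <= hi[k] for k in range(3))
    return [
        (pos, col) for pos, col in points
        if not (inside(pos, box_min, box_max) and inside(col, color_min, color_max))
    ]

# A green stand-in door near x = 2 is removed; the actor's points survive
# because they fall outside the box (and outside the color key).
pts = [((0.0, 1.0, 0.0), (0.8, 0.6, 0.5)),   # actor
       ((2.0, 1.0, 0.0), (0.1, 0.9, 0.1))]   # green prop
kept = cull_prop(pts,
                 (1.5, 0.0, -1.0), (2.5, 2.0, 1.0),   # region around the prop
                 (0.0, 0.7, 0.0), (0.3, 1.0, 0.3))    # green color range
print(len(kept))  # 1
```

Combining the region test with the color test is what keeps the actor safe: a hand reaching into the box keeps its skin-toned points, while the green stand-in inside the same box is culled.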
Note: SkeletonPose2.1(image of male dancer) was provided by Volumetric Camera Systems (VCS) in Vancouver, BC. All other images were provided by Crescent, Inc. in Tokyo, Japan.