AND NOW… DIGITAL HUMANS FOR THE REST OF US
By IAN FAILES
Photoreal digital humans were once – and perhaps still are – one of the hardest things to achieve in visual effects and animation. Today, however, a swathe of options allows both novice and experienced artists to ‘jump right in’ and craft digital humans for projects, ranging from incredibly photorealistic models to more stylized but still believable versions.
The popular and accessible methods for crafting digital ‘avatars’ range from somewhat traditional 3D digital content creation tools to new real-time approaches, as well as ones that take advantage of machine learning and artificial intelligence techniques.
THE CREATION OF METAHUMAN CREATOR
The cloud-based MetaHuman Creator app from Epic Games has quickly become a go-to for building digital humans. That’s partly due to the explosion of interest in, and capabilities of, real-time tools (in this case, Epic’s Unreal Engine) and partly due to the speed at which users can generate a realistic human with the app. It’s also free.
“We focused on democratizing the creation of high-fidelity digital humans, and the goal was to take a task that only a handful of domain experts in the world can do in months and simplify it so that anyone could create a believable human in minutes,” outlines Chris Evans, Technical Animation Director, Advanced Character Group at Epic Games. “Key to this was the vision of Vladimir Mastilovic and the unique tools and expertise of his company, 3Lateral. We had worked with Vlad for many years and decided to invite him and his team to join the Epic family with this goal in mind.
“It was a real challenge to develop a system that feels simple and intuitive, yet drives a back end that’s rebuilding a very complex rig and has all the details required to use the character in a game or interactive experience,” adds Evans. “It was especially challenging to create a user interface for the generation of believable human skin. Our skin shader is one of the most complex shaders we have made at Epic, and distilling it down to a few intuitive sliders was a challenge. We had to research a lot of techniques involving machine learning and pragmatic uses of principal component analysis (PCA) and frequency separation to get us there, but we’re happy with the results.”
To make MetaHuman as accessible as possible, Evans said Epic initially concentrated on the sculpting tools. However, he relates, “even with very intuitive sculpting tools, you still needed to know facial anatomy and construction to use them successfully. For instance, if someone wanted to create a character of Asian descent, they would need to know enough to pull down the eye cover fold, creating a monolid as seen in some Asian facial morphologies. So we added the ability to blend in specific traits of faces from a pool of existing characters. This allows you to browse existing presets and then toss them into a blendspace, and just use the nose from one, to the eye from another – it’s a lot more user friendly.”
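The trait-blending idea Evans describes can be pictured as a weighted mix of per-region shape parameters taken from several presets. What follows is only a minimal sketch under stated assumptions, not Epic’s implementation: the preset names, region names, parameter values and both functions are invented for illustration.

```python
from typing import Dict, List

# Hypothetical data model: each preset stores a few shape parameters per facial region.
Preset = Dict[str, List[float]]


def blend_region(presets: Dict[str, Preset], region: str,
                 weights: Dict[str, float]) -> List[float]:
    """Return the weighted average of one facial region across the chosen presets."""
    total = sum(weights.values())
    if total <= 0:
        raise ValueError("weights must sum to a positive value")
    # Use the first chosen preset to find how many parameters this region has.
    length = len(presets[next(iter(weights))][region])
    blended = [0.0] * length
    for name, weight in weights.items():
        for i, value in enumerate(presets[name][region]):
            blended[i] += (weight / total) * value
    return blended


def build_face(presets: Dict[str, Preset],
               recipe: Dict[str, Dict[str, float]]) -> Preset:
    """Assemble a face region by region from a recipe of per-region preset weights."""
    return {region: blend_region(presets, region, weights)
            for region, weights in recipe.items()}


# Invented example values: two presets, two regions.
PRESETS = {
    "preset_a": {"nose": [0.2, 0.8], "eyes": [0.5, 0.1]},
    "preset_b": {"nose": [0.6, 0.4], "eyes": [0.9, 0.3]},
}
# Take the nose entirely from preset_a, and lean the eyes mostly toward preset_b.
RECIPE = {
    "nose": {"preset_a": 1.0},
    "eyes": {"preset_a": 1.0, "preset_b": 3.0},
}
face = build_face(PRESETS, RECIPE)
print(face)
```

Normalizing the weights per region keeps every blended value inside the range spanned by the source presets, which is one plausible reason such a mix tends to stay believable.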
A DIGITAL HUMAN WORKFLOW
Another method artists have been adopting to quickly create believable digital humans is Reallusion’s Character Creator (build) and iClone (animation) pipeline. The software, which also incorporates a number of real-time features, contains base models that are highly morphable and fully rigged.
“Digital humans made with Character Creator are able to take on the look of any person or image from the Reallusion Headshot tool feature that allows users to import a photo and generate a head model based on the person in the photo,” advises Reallusion Vice President of Marketing John Martin. “James Gunn’s The Suicide Squad used Character Creator and iClone extensively to develop and animate every hero featured in the film for 16 scenes of previs, for example.”
There are tools for crafting specific facial details, too, such as SkinGen, Hair and Beard apps, with SkinGen enabling aspects such as micro-details for pores, blemishes and wrinkles. Once a character is designed, it can be sent to iClone for animation or exported as FBX or USD to many platforms including NVIDIA’s Omniverse and Unreal Engine. A raft of tools, including real-time ones, that capitalize on new approaches to lipsync, facial puppeteering, motion capture and keyframing can give the digital human movement.
Generating Humans with Human Generator
Crafting digital humans can be a laborious process, so it helps if there’s a way to kick-start the build. That’s the aim of Human Generator, an add-on for the open-source 3D software Blender, made by Alexander Lashko and Oliver J. Post.
Lashko observes that Human Generator was made based on “the growing need for human character creation. Whether it’s for film, games, architectural visualization, video content creation or simply digital artwork, human characters play a major role in the design process. Likewise with the growing potential of Blender software and its community, we thought having a native tool would help creators with their projects.”
The fully rigged digital human characters available in the add-on can be customized to different genders, ethnicities, body shapes, hair type and age. Clothing presented an interesting challenge for Lashko and Post, owing to interpenetrations. “One solution we came up with,” details Lashko, “was to hide the parts of the body that are covered by the clothes. This also saves on performance, since the rendering engine does not require to calculate the hidden parts.”
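The clothing trick Lashko mentions, hiding whatever the garments cover, can be sketched in a few lines. This is an illustration only and not the add-on’s code: the face-to-region mapping, garment names and coverage table are all hypothetical.

```python
from typing import Dict, List, Set


def visible_faces(face_regions: Dict[int, str], worn: List[str],
                  coverage: Dict[str, Set[str]]) -> List[int]:
    """Return the body faces that still need rendering once garments are applied."""
    covered: Set[str] = set()
    for garment in worn:
        # Unknown garments are assumed to cover nothing.
        covered |= coverage.get(garment, set())
    return sorted(face for face, region in face_regions.items()
                  if region not in covered)


# Invented example: six body faces tagged with the body region they belong to.
FACE_REGIONS = {0: "head", 1: "torso", 2: "torso", 3: "arms", 4: "legs", 5: "feet"}
# Invented coverage table: which body regions each garment hides.
COVERAGE = {"t_shirt": {"torso"}, "jeans": {"legs"}, "boots": {"feet"}}

shown = visible_faces(FACE_REGIONS, ["t_shirt", "jeans"], COVERAGE)
print(shown)  # [0, 3, 5]
```

Because hidden faces are never drawn, they cannot poke through the cloth, and the renderer has less geometry to process.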
Like many of the apps discussed here, Reallusion’s tools aim to let users create complex digital humans without necessarily having to understand all that complexity. But that doesn’t mean it’s not complex under the hood.
“Every element of development for digital humans has a hurdle to cross for appearance and movement,” says Martin. “The skin appearance involved an extensive effort to provide users with the balance of ease of application and deeply detailed capabilities. We developed a full, multi-layered system for addressing the skin appearance and how users could edit and customize their looks.
“Starting from basic skin details and micro normals,” continues Martin, “we worked with TexturingXYZ, an industry leader in skin assets to help overcome some of the compelling challenges of micro details for our layered design. SkinGen became the layered appearance editor developed for users to have access to every element of the skin, resulting in a streamlined tool to control the digital human’s skin from head to toe and from human skin to creature skin, glamorous makeup looks to ghastly wounds.”
PERSONALIZING DIGITAL HUMANS
Another major development in the ‘democratization’ of digital humans is the ability to almost instantly create lifelike 3D likenesses of real people from single photographs. This is an area in which Pinscreen operates, having recently developed several new technologies and solutions for making the creation of photorealistic avatars accessible to consumers, as well as applications that make full use of deployable and personalized virtual humans.
“Our latest photo-based avatar digitization technology consists of a highly robust algorithm to generate normalized 3D avatars from highly unconstrained input photos,” explains Pinscreen CEO and Co-founder Hao Li. “This means that you can take input photos of yourself in very challenging lighting conditions. You can be smiling or side facing and the algorithm can still produce a consistent lighting normalized 3D avatar head with neutral expressions. This is extremely important for rendering fully parametric CG characters in any virtual environments and for performance-driven animation. Our method is based on a variant of StyleGAN2 which can produce 3D textured meshes of a head and a differentiable rendering framework that uses perceptual loss for refinement.”
To demonstrate their solution, Pinscreen has been building an immersive chat application called PinScape. It was demonstrated at SIGGRAPH 2021’s Real-Time Live! “The idea,” says Li, “is to build a VR-based communication system that can go beyond traditional 2D video conferencing. At the beginning, a user would take a selfie using a webcam, and the system automatically creates a complete full-body avatar from that input photo. The avatar consists of a fully rigged face and body, and also digitizes hair using an upgraded technology of what we presented a few years ago.”
Pinscreen’s approach to generating a believable human avatar from a single input image isn’t without its challenges. They have to overcome what can be sub-optimal lighting and shadows, although of course the idea is to avoid the need for any kind of controlled studio lighting scenario to produce the avatar. “Another challenge,” mentions Li, “is that people often smile in photographs or may not be front facing the camera. To this end, we developed an algorithm that allows us to extract a consistent likeness of a person regardless of the lighting, facial expression, and pose from a photo, and then generate a 3D model of a normalized avatar.”
MAKING YOUR DIGITAL HUMAN EMOTE
Animating your digital human creation is of course just as important as building it. There are myriad options out there for facial capture and motion capture to do this, while another method has been to take an audio track and, using A.I., generate matching expressive facial animation from just that audio source. NVIDIA’s Omniverse Audio2Face app enables this, and even has a pre-loaded 3D character called ‘Digital Mark’ for users to get started with.
The intention behind the app was all about accessibility, explains Simon Yuen, Director, Graphics AI at NVIDIA. “Right now, if you want to create a 3D character, you need to be a domain expert to do it. Audio2Face is designed with a very specific purpose in mind, to help simplify voice-based facial animation while improving the quality of automated solutions of today. The method leverages deep learning and runs in real-time. It supports realistic or stylized characters and neither rigging nor AI training is required to use it. It soon will support a broader range of motion and emotion for the full face. And it’s designed to complement and work with other existing tools and workflows.”
“Wherever we see challenges in the workflow it’s our mission to solve them,” adds Richard Kerris, Vice President, Omniverse Platform Development at NVIDIA, “especially when you get to the really difficult things like animation. The more we can democratize that, the more we’re going to give capability to storytellers, designers and developers to extend their vision with more powerful, yet accessible tools.”
Audio2Face continues to be updated and now includes blendshape support, for instance. Users are able to retarget to any 3D human or human-looking face, and even to more alien or animal-esque faces. “We’ve tested from rhinos to aliens to other things,” notes Yuen, “and we do plan to have more pre-trained data available with Audio2Face and better support of a larger variety of voices and languages.”
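For readers new to the term, a blendshape rig poses a face by adding weighted per-vertex offsets to a neutral mesh. The sketch below shows that general technique only; it is not Audio2Face’s API, and the two-vertex mesh, shape names and weights are invented for illustration.

```python
from typing import Dict, List, Tuple

Vec3 = Tuple[float, float, float]


def apply_blendshapes(neutral: List[Vec3], deltas: Dict[str, List[Vec3]],
                      weights: Dict[str, float]) -> List[Vec3]:
    """Pose a mesh as: neutral + sum over shapes of (weight * per-vertex delta)."""
    result = [list(vertex) for vertex in neutral]
    for name, weight in weights.items():
        for i, delta in enumerate(deltas[name]):
            for axis in range(3):
                result[i][axis] += weight * delta[axis]
    return [(v[0], v[1], v[2]) for v in result]


# Invented two-vertex "mesh" and two hypothetical shapes.
NEUTRAL = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0)]
DELTAS = {
    "jaw_open": [(0.0, -1.0, 0.0), (0.0, -1.0, 0.0)],
    "smile": [(0.0, 0.0, 0.0), (0.5, 0.5, 0.0)],
}

# A per-frame set of weights, such as an audio-driven solver might output.
posed = apply_blendshapes(NEUTRAL, DELTAS, {"jaw_open": 0.5, "smile": 1.0})
print(posed)
```

Since a pose is just a list of weights per frame, the same animation can in principle drive any character that exposes matching shapes, which helps explain why blendshape output is useful for retargeting.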
WHEN A VERY DIFFERENT KIND OF DIGITAL HUMAN IS REQUIRED
The face replacements in David France’s Welcome to Chechnya (2020) documentary caught many people’s attention, not only for the delicate way they protected the identities of those featured in the film who risked persecution in the Russian republic of Chechnya, but also for the way they were achieved with A.I. and machine learning techniques. In the end, the result was a set of ‘new’ digital humans.
Visual Effects Supervisor Ryan Laney oversaw that work, taking new performers as face doubles and masking them over the original subjects, aided by machine learning and traditional compositing. The approach fit the film’s limited budget and meant that expensive, time-consuming, fully CG photoreal digital avatars were not necessary. Furthermore, it wasn’t quite the same process as the current deepfake phenomenon – itself now also a widely accessible toolset – although it certainly shares some technology.
An interesting aspect of the final shots was some noticeable softness or blur in the faces, which in fact was deliberately retained in the digital humans. “We leveraged the visual language of that softness to help us maintain journalistic integrity,” notes Laney. “There was a moment halfway through the production where I wasn’t sure if what was on the screen was the double or the original. So, it really did help us to have those visual indicators, but journalistic integrity was why keeping the soft look stayed in.”
Laney has come to refer to his team’s digital human creation technique, which he is utilizing in other projects, as ‘digital veils.’ “We’re also calling the process ‘automated face replacement,’” he says. “I feel like it fits in with the analogy of automated dialogue replacement or ADR. We capture the data and we do this training session to marry the faces, and then there’s still work involved to lay it in. That automated face replacement is the process that produces digital veils.”