Volumetrically Capturing Authentic Digital Actors, with Metastage’s Christina Heller
Alan: I have a really special guest today: Christina Heller, the CEO of Metastage. Metastage is an XR studio that puts real performances into AR and VR through volumetric capture. Metastage is the first US partner for the Microsoft Mixed Reality Capture Software, and their soundstage is located in Culver City, California. Prior to Metastage, Christina co-founded and led VR Playhouse. Between Metastage and VR Playhouse, she’s helped produce over 80 immersive experiences. To learn more about Christina Heller and Metastage, you can visit metastage.com.
Welcome to the show, Christina.
Christina: Thank you so much for having me.
Alan: It’s my absolute pleasure. We met, maybe three years ago? At VRTO?
Christina: Yes, that’s correct.
Alan: Yeah, we got to try your incredible experiences, mostly in the field of 360 video. And you’ve kind of taken the leap to the next level of this stuff. So, talk to us about Metastage.
Christina: Sure. As you said, it’s a company that specializes in volumetric capture. I think, in the future, you’ll see other things, but at the moment, we specialize in volumetric capture. Specifically, using the Microsoft Mixed Reality Capture system, which is an incredibly sophisticated way of taking real people and authentic performances, and then bringing them into full AR and VR experiences, where you can move around these characters, and it’s as if they are doing that action right in front of you.
Alan: Let’s just go back a little bit. What is volumetric capture, for those who have no idea what volumetric capture is?
Christina: Sure. For a long time, if you wanted to put real people into AR/VR experiences, you had basically two ways of doing it. You could either animate it; so, you would try to create — using mo-cap and animation — the most lifelike creation of a human character possible. Think, like, video games; when you go play a video game and they’ve got a character playing a scene out with you. If you wanted to put real people into these XR experiences, that was the most common way to do it.
Then there was also volumetric capture, which, for a long time, just wasn’t quite, I would say, at the level of technological sophistication that people wanted in order to integrate it into projects. Volumetric capture, thanks to the Microsoft system, I think, is finally really ready to be used in a major way in all these projects. And basically what it does is, we use 106 video cameras, and we film a performance from every possible angle. So, we’re getting a ton of data. We use 53 RGB cameras and 53 infrared cameras. The infrared is what we use to calculate the depth and the volume of the person that’s performing at the center of the stage. The RGB cameras are what’s capturing all the texture and visual data.
Then, we put that through the Microsoft software, and on the other end of it you get a fully-3D asset that really maintains the integrity and fidelity of the performance that was captured on the stage. Unlike some of the animated assets — because this was kind of the challenge — the animated assets, they might get kind of there, but they had that uncanny valley thing going.
Alan: Yeah, those are creepy.
Christina: Yeah. And so if you’re not familiar with the term “uncanny valley,” basically with people and animals (or, like, dynamic, organic, moving objects), if you get it kind of close, but not fully there in terms of it looking lifelike, you have this inherent rejection of it. You just… there’s this distaste, like “ew.” It’s called “the uncanny valley.” It’s close, but it’s not really there.
So, volumetric capture, and specifically the captures we’re doing at Metastage, doesn’t have that uncanny valley going on. When you look at them, they look like real people. They maintain all of the nuances and the micro-expressions and the subtleties of the person that performed on the stage. So you can bring these fully authentic, fully real captures into your AR and VR experiences. It just kind of brings the whole thing to life. So, that’s how I would describe it. It’s volumetric video. It’s a video asset, but it’s fully-3D, and you can easily integrate it with six degrees of freedom into AR and VR experiences.
Alan: I’m going to break it down even simpler: this means you can now step into the movie, and participate in the movie as it’s going on around you.
Christina: Yeah, correct.
Alan: So exciting.
Christina: One of the things that I’ve been saying is — part of the reason I’m really passionate about volumetric capture as a tool inside of the greater medium is — it’s the real person’s seat at the table. As we move more into these virtual worlds, it’s important that real people are represented in them. And when you watch something that was captured volumetrically, you know that it was captured live. It wasn’t reanimated. It wasn’t puppeted. This is something that you can watch with the same awe that you would a live performance happening right in front of you. And I think that that’s really important, that authenticity.
Alan: I really think we’ve come a long way with computers and CGI and being able to animate things, but there really is no substitute for a real actor or actress.
Christina: Absolutely. There is something about humanity — and it’s part of the reason that animators are struggling to get it as lifelike as possible — there’s just something about the way people move, the way that they speak. There’s just these little nuances that are impossible to fake. It’s part of what makes watching something that was captured volumetrically — or something in a film or a TV show — part of what makes it so satisfying, is to capture all those little quirks, and those little things that make people, people.
Alan: I think in a world where AIs and robotics are going to replace a lot of our jobs, it’s nice to know that we’ll still have some. [laughs]
Christina: Yeah. And I like connecting with real people, and I like seeing real performers. And when we’re talking about celebrities or public figures, or the CEO at your company; I don’t want to watch an animated CEO give a board presentation. I want to see the real guy. When it comes to anybody in our society that’s “the real deal,” volumetric really is the only real way to capture them for these experiences.
Alan: You touched on something that is really, I think, interesting. You mentioned CEO presentations, or investor presentations. We could talk about the entertainment aspects of this — and we met up at the New York Volumetric Filmmakers event and you spoke at that event, and I was blown away by the stuff you guys are doing — but a lot of it is creative and arts and entertainment, which are businesses as well. But are there companies that are leveraging this technology now, to broadcast their CEO or whatever? How are companies using this technology now?
Christina: We’ve done a number of B2B captures in Metastage, and I’m excited to do more of them. It’s exciting to see other industries outside of the entertainment field getting involved with XR, and starting to see how they can use this tool to not only improve workflow and make money, but also just to dazzle. That is an exciting opportunity right now, to just be really ahead of the curve and do something that nobody’s done. You still have that opportunity right now with augmented and virtual reality.
Alan: You just mentioned something: how to “dazzle” using this technology. I think we always hammer down on “what’s the ROI? What’s the ROI?” There is an intangible ROI, in the dazzle.
Christina: Absolutely, yeah. And you got to make sure you partner with the right people, because you’re not going to dazzle unless you’ve got the right production team to do it. But if done right, I mean, that’s one of the most fun parts of my job, is getting to watch people’s eyes light up when they first see a person appear right in front of them — almost like a human hologram — using volumetric capture. So, yes, we have done a few different business applications, and one of them was specifically for that use case that I just described.
We had an executive come in and do a board presentation for his CEO on the HoloLens. Basically, he came in and he captured the beginning and end of his presentation, volumetrically. He went out on the stage, we used 106 cameras, we had a teleprompter. And basically, the presentation was about the company’s plans to use technology to build their future, and how technology was going to affect the future of their company.
By the way, I’m going to use the term hologram to describe it, because I think that’s an easy way to wrap your head around what you’re looking at. There’s questions of whether that’s the correct term or not, but we’ll just call it a human hologram, when it’s integrated into an augmented reality experience. So he was using the Microsoft HoloLens, which is a headset that you wear like glasses, and it allows you to place digital objects into the real world.
So we captured him holographically, and using the teleprompter, he gave this intro to the CEO. Then it went into a data visualization sequence. Hologram disappears. Now he’s showing — using data in a three dimensional space — some of the ways the technology is going to transform and affect their business. And this, by the way, is a huge, huge, huge company. I’m not sure whether I’m allowed to use it as a case study publicly, so I’m being a little discreet, but a huge company. Anyway, he gives the intro. There’s awesome data visualization showing how technology’s going to transform their business. And then it ends with him coming back and kind of wrapping it up. He said that, in the 35 years he’s been working at the company, that this was the first time he’s ever seen the CEO smile. So that was kind of a nice thing.
Alan: Wow. That’s incredible.
Christina: And it’s also now preserved forever. He only had to do the presentation once. He can now show it to anybody, anywhere. It’s this evergreen piece of content that will live on. Incidentally, the executive that we captured — who had been at the company for 35 years — is leaving this fall. And so, in some ways, this presentation was his legacy; talking about his dreams for how he wants the company to progress when he’s gone. That was really cool.
And then also one thing we’ll do at Metastage — which I think is always cool — is while he’s in there — you know, this is a guy that has a family and some kids — I said, “you know, well, while you’re here, why don’t we record something for your family, too?”
Alan: Awww, I love that.
Christina: Yeah. You always feel the energy shift when that moment happens. He got up there and said, “it’s February 12th, 2019. And I want to tell my family this, this, this, and this, and tell them how much I love them and how proud I am.” And it was just this moment where he realized — now, he’s not going anywhere; he’s not like a super old guy — but he realized that this little piece of content might actually live on, and be something that his family could cherish later. And so that was also a nice moment. And that’s just one example. I’ve got more.
Alan: The preservation of people, places, and things using volumetric is beautiful. I know a mutual friend of ours — Simon Che de Boer — he’s running around the world capturing places; he does photogrammetry of places. If you take the photogrammetry that he’s creating of these real places around the world, and you take the videogrammetry that you guys are doing and put people into those places? The possibilities are literally endless.
Christina: Definitely. And as a former documentary filmmaker and journalist, making sure that real life is preserved and a part of this new virtual landscape — I think — is an important mission.
Alan: I agree. There’s places in the world where we still have unrest, and cultural landmarks are being destroyed. We have to — have to, have to — at least get them as a digital [simulacrum] — you know, obviously, if we can’t protect them physically — if we can get a digital version of them… the fire in Paris is a great example of that. Notre Dame Cathedral. They have rudimentary LiDAR scans of the building; they’re not perfect, but they can recreate the building digitally, and then use those three-dimensional drawings to recreate the actual building.
Alan: It will never be the same, but at least they can get close.
Christina: Yeah. And with volumetric capture, you don’t have to be as concerned that… first of all, it’s a really, really easy process. You don’t have to put on a mo-cap suit or points, and go out and make a bunch of different facial expressions. Super high-res face scanning, with the purpose of being animated later, is a really, really intense process.
Alan: Oh, my goodness. People don’t realize. It’s a full day, just to be able to say “hello.”
Christina: Yeah, it’s a full day, and it’s really intense on the performers. Volumetric capture is super easy on the talent. You just go out on the stage, action, cut, and you’re done. Off you go.
On our end, we like to take a little more care than that. We’ll do some tests to make sure your hair looks right, or your clothes look right, and all of that. We’ve done celebrities at Metastage that we’ve had a very, very limited amount of time with. But as long as we can give them a once-over to make sure they’re volumetrically friendly, they can go out and be off in no time at all. And you have the added comfort of knowing that this isn’t going to be something that is puppeted and rigged to say things that you didn’t want to say. This is really you. It’s capturing you and making sure that you are coming across in the way that you actually want to in your real life. It’s a preservation technique. It’s a performance tool, but it’s nothing to be nervous about. That’s one of the things that I want to make sure gets across, because I can imagine an actor, for instance, getting nervous about the increased digitization of actors.
Alan: “What are you going to do with my avatar?”
Alan: It’s interesting, because some friends of mine, they own a company and they do photorealistic avatar creation. And they have a side business producing adult content avatars.
Christina: Right. You don’t even have to go much further than that to understand how that could cause some pause for an actor that takes pride in the work they do and how they do it. Volumetric capture really is video, but it’s a fully three-dimensional video. That’s a key differentiator.
Alan: So what are some other use cases that you’re seeing pop up for this type of technology?
Christina: Well, one of the great projects we did this past fall — and I can talk about it — is we did something with the CEO and president of the Royal Caribbean Cruise Line. They’re giving you a virtual tour of the ship. And so, this has some great and broad applications for a lot of businesses that, maybe, do a lot of onsite tours. One of the great things that VR and AR can give you is access. Access to places you can’t normally go. Access to people you couldn’t normally engage with.
So, there’s two ways of doing the virtual tours. But the way that the Royal Caribbean did it was the CEO and president give you a tour of the ship using the Royal Caribbean Celebrity Cruise app. You can basically make their holograms appear in different rooms of the ship, and they tell you about the design, the features, how the ship was built, and why it was built the way it was built. That is an app that anybody can access. If you type in “Celebrity Cruise app,” you can access the CEO’s intro when you’re not on the ship, and the rest of the holograms can only be accessed on the ship. So, it’s kind of this cool site-specific augmented reality experience.
Alan: That’s really cool. So, you have to be on the ship to actually experience the full thing?
Christina: Exactly. You can see the intro — which is really cool, and you can get an idea for it. Richard pops up, and he’s holding a model of the Celebrity Cruise ship in his hands. And he says, “we built this ship using 3D technology — the most advanced 3D technology. And so we thought it was only appropriate to use the most advanced 3D capture technology to explain to you why we built the ship and some of the features.”
Christina: At that point, he puts the ship down, and you can explore a little bit of it on the app. But the rest of it, you have to be on the ship to experience.
Alan: Is this an AR app for your phone?
Christina: Yes, it’s an AR app for your phone. Another fun selling point of the Microsoft system and Volucap system that we use at Metastage is that, the assets are really, really beautiful, and also super small file sizes, so you can actually activate them using a mobile device. So yes, using your phone, you open the app and like magic, he appears right in front of you, and you can walk around him as if he’s standing right there. He integrates fully into the scene.
Christina: Yeah, totally. So, that’s great because it gives people access to Richard and Lisa, who would never get access to them normally. And from the CEO side of things, it allows them to reach their customers in this really intimate and friendly way, without actually having to go out and shake everybody’s hand. For businesses that maybe do a lot of tours on site, but would like to give access to more people without actually having to take the time, energy, and resources to give them the physical tour, you can do a capture of the facility or the warehouse or wherever it is, and then integrate your CEO or star employee or whatever into that environment. And you can give somebody a realistic and authentic virtual tour of that place without having to — like I said — dedicate the time and resources of actually showing them in person.
Alan: You know, this comes back to something that pops up on every single episode of the show; training. Immediately, when you said that you could give people a tour. Imagine: for new employees, working on a cruise ship must be a daunting experience. The training just to train people where things are on the ship, it’s got to be incredible to do that. And one thing that I think Metastage and volumetric capture will really drive home is the fact that some people are really, really great at training, and some people are not so great. Maybe they’re great at creating the content, but not presenting it. Now you can have the best person train every single employee.
Christina: Exactly. You can get your star employee to walk them through the process and show them physically how to do it. Depending on what field you’re in, being able to show somebody with one’s body how to do something can be crucially important. And you’ve immortalized them. That employee may move on to another position, but you’ll always have that spirit and that knowledge captured for years to come.
Alan: Employee on-boarding, training; but I also love this idea of being able to download the app, see it, it says, “well, the rest of it is on the ship.” I think there’s so much that can be done with this type of capture. We’re only really just scratching the surface.
Now, Metastage is not the only volumetric capture system out there, correct?
Christina: Correct, yeah.
Alan: There’s 8i. There’s what, Intel Studios? Maybe some other ones that we don’t know about. But I think there’s more companies realizing the potential of this. What sets Metastage apart — in my opinion, it’s obvious, because I’ve seen the results — your partnership with Microsoft really sets it apart. I’ll let you speak to that.
Christina: Sure. A lot of volumetric capture stages, from just the outside appearance, will look similar. You’ll go in and you’ll see a bunch of cameras, all facing inward at a stage, so it can look on the surface like they’re all the same. But the truth is, the real magic is in the software. What the Microsoft Mixed Reality software does better than any other volumetric system on the market is create really clean, high-fidelity captures that look great in the body and in the face, don’t have a lot of artifacting, and look good from every angle that you happen to be viewing the captured asset from. We are also able to compress those captures to super tiny file sizes, and if you’ve ever tried to make a project for virtual or augmented reality, you know how important that is.
If you’re making a training app for your employees, or you’re doing something like the Royal Caribbean cruise line, you can’t have a 10-gigabyte application when this whole thing is done. It needs to be something small that isn’t going to take up a ton of room on their phone and that’s easy to distribute. There’s other things, but that alone is the key. Ask any other volumetric capture stage how big their final file size is. Microsoft has just really got it down to some really workable file sizes. If we know that you’re doing it for, let’s say, the HTC Vive or the Oculus Rift, or you’re trying to do it for a mobile phone device, we can export at different settings to optimize for mobile AR. Or if we can push it a little more for VR, then we can add a little bit more quality. But regardless, like the Royal Caribbean cruise app, when you download it and you see Richard, he looks really fantastic. So we can get our file sizes down to… for a minute of volumetric capture? 50 megabytes. That’s 5-0 megabytes.
Alan: Holy crap.
Alan: 50 megabytes.
Christina: For a full minute of capture. And then it scales up, depending on our export settings and what your final platform is. That’s pretty incredible.
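As a rough illustration of the figure quoted here, the arithmetic below works out what 50 megabytes per minute implies for a short capture, for the average data rate, and for the 10-gigabyte app size mentioned earlier. It is only a sketch: the linear scaling with duration and the function names are assumptions made for illustration, and real sizes depend on the export settings and target platform.

```python
# Back-of-the-envelope sizing based on the "50 MB per minute" figure
# quoted for the smallest export setting. Linear scaling with duration
# is an assumption; actual sizes vary with export settings and content.

MB_PER_MINUTE = 50.0


def capture_size_mb(duration_seconds, mb_per_minute=MB_PER_MINUTE):
    """Estimated size in megabytes of a capture of the given length."""
    return duration_seconds / 60.0 * mb_per_minute


def average_bitrate_mbps(mb_per_minute=MB_PER_MINUTE):
    """Average data rate in megabits per second (1 MB = 8 megabits)."""
    return mb_per_minute * 8.0 / 60.0


def minutes_within_budget(budget_mb, mb_per_minute=MB_PER_MINUTE):
    """Minutes of capture that fit into a given size budget in megabytes."""
    return budget_mb / mb_per_minute


if __name__ == "__main__":
    print(capture_size_mb(120))            # a two-minute capture
    print(round(average_bitrate_mbps(), 2))
    print(minutes_within_budget(10_000))   # a 10 GB budget, taking 1 GB = 1000 MB
```

Under these assumptions, a two-minute presentation comes to roughly 100 megabytes, which is the kind of size that can plausibly ship inside a mobile app.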
Alan: It is really amazing. I think what people don’t realize is that when you’re pushing out this type of 3D content for a phone, for example, it doesn’t actually have to be as high-res as you would think, because you’re already looking through a high-res screen at another image in 3D space. It has to look clear, but people are like, “oh yeah, I need 4K video in AR.” Whoa, wait a second. You’re looking at a screen within a screen. The maximum of it is really only going to be 720p. So there’s some little tricks that people don’t realize.
Christina: Right. And I show people the Royal Caribbean app all the time, and they think that looks better than any volumetric capture they’ve ever seen. And that’s at our smallest export setting.
Long story short: quality at low file sizes is the key difference between Metastage and other capture facilities. But there’s also some added benefits. For instance, there’s the Microsoft toolset that we deliver along with our captures, which includes gaze retargeting. One of the things is, when you’re watching volumetric capture, because it’s an authentic performance of what happened, if I’m watching it and I step to the right or the left, it might look like the person is looking past me. So if you’re trying to make it look like the capture is looking at the person watching it, gaze retargeting will allow the head to subtly follow the viewer. That’s a standard tool that we deliver with our captures to the client, along with some relighting tools for game engines. Almost like… I want to call them “Instagram filters” for the capture, that allow you to give a little bit of a dramatic look, or sunset lighting. Those all go along with it as well.
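The idea of a head that subtly follows the viewer can be sketched in a few lines. This is not the Microsoft toolset’s actual method, which is not described in the interview; it is a minimal illustration that assumes a flat floor plane, a captured subject who originally faces the +z direction, and a hypothetical 20-degree limit so the head turn stays subtle.

```python
import math

# Illustrative only: a clamped head-yaw calculation, not the real
# gaze-retargeting implementation. Positions are (x, z) pairs on the
# floor plane, and the captured subject is assumed to face +z.


def retarget_yaw_degrees(subject_xz, viewer_xz, max_yaw_degrees=20.0):
    """Return how far to turn the head toward the viewer, in degrees.

    Positive values turn toward +x. The result is clamped to
    +/- max_yaw_degrees (a hypothetical limit) so that the original
    performance is only nudged, never replaced.
    """
    dx = viewer_xz[0] - subject_xz[0]
    dz = viewer_xz[1] - subject_xz[1]
    # Angle between the subject's forward direction (+z) and the viewer.
    yaw = math.degrees(math.atan2(dx, dz))
    return max(-max_yaw_degrees, min(max_yaw_degrees, yaw))


if __name__ == "__main__":
    subject = (0.0, 0.0)
    print(retarget_yaw_degrees(subject, (0.0, 2.0)))   # viewer straight ahead
    print(retarget_yaw_degrees(subject, (2.0, 2.0)))   # viewer far to one side
```

A viewer standing straight ahead needs no correction, while a viewer well off to one side only gets the head turned up to the limit, which keeps the capture looking like the original performance.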
Beyond that, we offer full end-to-end project integration. When we first opened, we were offering just holo capture. But it’s become clear that for some of our clients, they don’t necessarily have access to game engine developers or environment creation. So if you come to Metastage and you’re interested in doing this kind of project, we can produce the project from conception to completion for you. Or, if you’re a production company or agency that already has access to those professionals, then we can simply just do the holo capture and deliver that to you.
Alan: That’s incredible. So, I know there’s a lot of people listening that are probably thinking, “oh man, I want to use this. I want to jump in.” Let’s talk about price. I don’t know if this is something that’s really expensive. What does a minute of footage cost to develop? What does the process involve? If somebody wants to dive right in and say, “yeah, I want to host a two-minute video in AR for my shareholders,” what would something like that [cost]?
Christina: Well, the prices start around $15,000, and then it goes up from there, obviously. It is — I would say — very comparable to what commercial video production rates are, if that’s something you’re familiar with. And it’s much, much cheaper than mo-cap and animation.
Alan: To put things in perspective, to create a photorealistic digital avatar — rig and everything — you’re looking at, what, $10,000 a second?
Christina: Yeah. I mean, it depends on the team you work with. It depends on how detailed you want the face and head to look. But I would say that that’s the price it costs to get through the door and get something going. And then obviously, the more content you’re trying to capture, the prices go up from there.
Alan: I’m really excited. I know you invited me for a tour of the stage. I haven’t been to LA yet since we talked about it, but I definitely want to come down and check it out.
The other thing that I noticed about Metastage — and I watched the video that you showed — is that it’s designed like a studio. It’s designed to be in congruence with what actors and actresses and people doing professional video are used to.
Christina: Absolutely. That is important, and our production-savvy client-facing staff want you to have a seamless client experience, and a fun production day; that at the end of it, you say, “oh my gosh, that was really fun and easy. I’d like to do more of that.” And so, “easy and fun” has been kind of a mantra at Metastage since the beginning, and I think we’ve been successful with that. All of our clients have had a really good experience, and then on top of it, were surprised when integrating the assets that it was as plug-and-play as it is.
So yeah, if you’re interested at all, please don’t hesitate to drop us a line. You can contact us on our website and we’ll do a consultation with you. We’ll hold your hand. We’ll work to get your goals achieved using this, and hopefully create something that will be great for your business for years to come.
Alan: Well, is there anything else that you want to share before we wrap this up? It’s been a great interview and I can’t wait to record a message. You know, you mentioned about the family, and I’ve been thinking of it ever since. I just want to record not only myself, but my children at their age now. Because as they grow up, you’re never going to get them at this age again. And it’s like capturing them in a time capsule.
Christina: Totally. And that’s one thing we have been talking about, having a different pricing model for something like that. I think that’s really important. I totally understand the desire to want to capture your family; almost like a family portrait in this really interesting, three-dimensional way, that you could then stand next to later and marvel at the changes. Long story short, I do think that at some point we will have a way for average people to come and capture their families, as well. We’re just figuring out exactly how that works with our professional soundstage.
The power of human presence, in virtual 3D space.