How Spatial Computing and Spatial Audio will revolutionize our consumption of digital media

Spatial Computing and Spatial Audio – the dream team


Are the promises of “immersive technology” too tantalizing to ignore? Join two industry pioneers, Ben Chon from Gaudio Lab and Martin Rieger, as they discuss how spatial audio is revolutionizing our approach to media. Read part one: “Get ahead of the curve with Apple's Vision Pro and spatial audio.”

The recent arrival of Apple’s Vision Pro brings us closer than ever to unlocking the true potential of Virtual Reality (VR) and Augmented Reality (AR) experiences built on 360° immersive sound, which will change your perception of what’s possible when it comes to consuming digital content.

Discover all that 3D Audio has in store – buckle up because this journey could be a game-changer.

The Significance of the Plausibility

Ben: I absolutely agree with Martin’s idea in the previous post that true immersion involves careful spatial technology integration and the alignment of sound and visuals. Recent research on immersive audio emphasizes how crucial plausibility is for how listeners perceive the quality of immersive sound. Essentially, this means that the same immersive audio mix can give listeners a totally different experience depending on visual cues or other factors. For instance, imagine a string quartet mix in a 360 video – it sounds amazing when you’re using a headset and watching the video, but it doesn’t have the same impact without the visuals.

DEMO(Orchestra session live)

The non-diegetic track is another important tool for sound engineers to make the audio experience of the digital world more plausible. The most popular use case is adding reverberation as a non-diegetic signal, which lets sound engineers get creative with how they design the sound scene and makes the sound in the virtual environment more believable. As another use case, I’ve also seen artists intentionally use non-diegetic vocals to create a sound scene where the non-vocal instruments seem to spin around the listener while the vocal stays as a voice inside the listener’s head.
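
The difference between the two kinds of track can be sketched in a few lines. This is a minimal illustration, assuming a renderer that tracks only head yaw: diegetic sources are counter-rotated against the head so they stay fixed in the scene, while a non-diegetic track bypasses head tracking and stays locked to the head. The function and parameter names are hypothetical and are not taken from any Gaudio or Apple API.

```python
def wrap_degrees(angle):
    """Wrap an angle in degrees to the range [-180, 180)."""
    return (angle + 180.0) % 360.0 - 180.0


def rendered_azimuth(source_azimuth_deg, head_yaw_deg, diegetic=True):
    """Azimuth, relative to the listener's nose, at which a source is rendered.

    Both angles use the same sign convention (an assumption of this sketch).
    Diegetic sources belong to the scene, so head rotation is subtracted.
    Non-diegetic sources ignore head rotation and stay head-locked.
    """
    if not diegetic:
        return wrap_degrees(source_azimuth_deg)
    return wrap_degrees(source_azimuth_deg - head_yaw_deg)


# A cello placed at 30 degrees ends up straight ahead once the listener
# turns 30 degrees towards it; a head-locked vocal does not move.
cello = rendered_azimuth(30.0, 30.0, diegetic=True)
vocal = rendered_azimuth(0.0, 30.0, diegetic=False)
```

In the vocal example described above, the instruments would be rendered with diegetic=True and the vocal with diegetic=False.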

Immersive Audio: Transitioning from 360 Videos to 2D Screens

When the excitement around virtual reality (VR) started to decline in 2018, we looked for other ways to use immersive audio beyond just 360 videos, especially for regular 2D screens such as desktop computers. Around that same time, there were experiments with playing binaural audio to create a more immersive sound experience. However, it didn’t work well because the binaural surround sound didn’t match what was happening in the accompanying video. This mismatch made the experience less believable.

So, we decided to create a completely different binaural sound system for streaming services. Drawing on what we learned from working with 360 videos, we conceived BTRS (Being There Recreation System), an innovative system that synchronizes spatial audio with the camera’s perspective. The system also adds a reverb controller to the directional audio filters to make sure the audio matches what is seen on the screen, making it feel more believable. BTRS has been used in over 200 shows, including live streams, by platforms like Naver and Dingo. (Dingo, a Korean content creation studio, boasts a subscriber base exceeding 44 million.)

Spatial Technology in Earbuds for Augmented Reality

Starting in 2019, Gaudio began exploring the possibility of immersive audio in earbuds for augmented reality (AR). We believed that adding immersive audio could enhance the plausibility of the AR experience. The initial prototype, utilizing Bose Frames, was showcased to audio professionals at the AES (Audio Engineering Society) Convention in New York in 2019. It garnered interest from tech giants such as ByteDance and Tencent, and drew admiration from entities like Blackbird Studio and Belmont University in Tennessee.

This was even before Apple’s AirPods Pro supported spatial audio – a term that began to gain more traction than “immersive audio” around that time. The response was unanimous; everyone I met agreed that this was the next big thing!

Enter Gaudio’s Spatial Audio for True Wireless Stereo (TWS)

When it comes to Gaudio’s spatial audio for True Wireless Stereo (TWS) devices, our primary focus is on achieving superior sound quality while optimizing implementation. This consideration is crucial because the processing power of TWS chips is significantly more constrained compared to systems like smartphones or head-mounted displays. Another critical aspect we prioritize is minimizing the “motion-to-sound” latency, a factor intimately linked with overall plausibility and naturalness of the AR experience.

As a result of our efforts, I can confidently state that Gaudio’s Spatial Audio implementation on TWS offers remarkable sound quality with an extremely low motion-to-sound latency of approximately 60 ms, a substantial improvement compared to existing solutions worldwide. I encourage you to explore our blogs for in-depth analyses of latency and sound quality.
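
To see why motion-to-sound latency matters for plausibility, it helps to convert it into an angle: while the head is turning, the rendered sound scene lags behind by roughly the head's angular speed multiplied by the latency. The sketch below is a back-of-the-envelope calculation, not a measurement. The 60 ms figure comes from the text above, while the head speed of 100 degrees per second is an illustrative assumption.

```python
def angular_lag_degrees(head_speed_deg_per_s, latency_ms):
    """Angle by which the rendered scene trails the head during a turn.

    Assumes a constant rotation speed over the latency window.
    """
    return head_speed_deg_per_s * latency_ms / 1000.0


# Assumed head speed of 100 deg/s combined with the 60 ms latency quoted above.
lag = angular_lag_degrees(100.0, 60.0)
```

Under these assumptions the scene trails the head by about 6 degrees, and the lag grows in proportion to the latency.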

On a related note, Gaudio’s Spatial Audio was honored in the CES Innovation Awards 2023, and multiple manufacturers have selected it for their new TWS product lineups.

Spatial Audio in Vision Pro

Martin: I already wrote a detailed article about the spatial computing device. Here are the key takeaways:

Apple’s Vision Pro camera captures spatial photos and videos and reproduces your memories in 3D with spatial audio.
The audio pods provide an immersive sound experience, with flexible straps that conform to the unique shape of the user’s head.
With the collaboration between Unity and Apple, developers can create 3D apps for Vision Pro that are natively run on Apple hardware with integrated spatial audio.
Vision Pro’s sensor array is taking personalized sound experiences to the next level by analyzing the features and materials of your physical environment using audio raytracing technology.


Killer Content for the Apple Vision Pro Spatial Computer

The Apple Vision Pro is a fascinating device because Apple itself isn’t quite sure how to fully utilize its capabilities. While using avatars for FaceTime calls and watching movies on Disney+ is fun, it doesn’t necessarily showcase the device’s full potential. However, this uncertainty presents a unique opportunity for innovation in the world of immersive audio and video experiences.

I think there is potential for a renaissance of 360 or 180-degree videos, as well as for the creation of RealityKit applications that we can’t even imagine yet. With so much up in the air about what’s possible with the Vision Pro, one thing is certain: it’s an exciting time to support spatial audio, and an opportunity to do something truly innovative and groundbreaking in the realm of immersive content creation.

For me, the real thrill of using Apple’s spatial audio technology lies in creating something that is not limited by traditional boundaries, such as those of stereo. Whether it’s a virtual reality experience or a captivating augmented or mixed reality (in short, extended reality) use case, spatial audio allows me to create something truly extraordinary. This is how to turn spatial audio into something valuable rather than hype. It’s time to jump in and explore this exciting new frontier of sound.

The future of the spatial computing revolution

The new HMD, Apple Vision Pro, is a major step in Apple’s commitment to investing in spatial audio and spatial computing. Although it’s not explicitly called virtual or augmented reality, to me that is what it is: an immersive device combining AR and VR. Currently, though, I feel the target audience is developers, who are expected to come up with use cases that make use of the spatial computing capabilities and Apple spatial audio.

I think it will take a few more years for this kind of technology to become a consumer product. It may be a product called Apple Vision Air or something like that. In the future, once a mass-market spatial computer is launched, there will already be many user interfaces and a wealth of applications available to use, from professional work applications to engaging entertainment experiences.

Apple has already laid the foundation for this kind of infrastructure, with a seamless integration of software and hardware that can be used across different platforms and mobile devices. It’s exciting to think about the possibilities that a spatial computer could bring, and how it could change the way we interact with the virtual environment around us. So I’m happy to have a head start of several years of experience in taking advantage of this iPhone moment.

Apple’s Vision Pro and the Realm of 360-Degree Videos

Ben: For me, Apple Vision Pro seems to align more closely with the concept of an AR device, as it places a significant emphasis on seamlessly integrating virtual objects with the user’s physical surroundings, encompassing the user’s environment, body movements, and interactions with individuals nearby. However, in all honesty, the extent to which the spatial computing revolution will impact our lives remains uncertain.

One compelling application that could thrive is 360/180 videos, as Martin mentioned earlier. Looking back to the VR boom in 2017, the major challenge was the cost of content creation. Providing a plausible experience in virtual reality required every object within a medium to be defined spatially in both audio and visual formats. These objects needed to be rendered in a way that interacted with the user’s position and orientation. This process of defining the virtual scene, packaging the sources and metadata, and rendering objects interactively incurred significant costs.

That’s why 360/180 videos gained immense popularity in VR applications. These formats provided a more cost-effective solution for capturing spatial surroundings using 360-degree cameras and ambisonic microphones. Moreover, sound engineers were able to employ spatial audio plugins like Gaudio Works instead of delving into Unity programming for post-production of 360 videos. This made the entire process more accessible and feasible for content creators back then, and it will be very similar in the upcoming Vision Pro era.

Audio Technologies in Vision Pro

In terms of Vision Pro’s audio tech, several key points stand out:

Spatial Audio Capture:

Capturing Spatial Audio becomes more manageable when a device is equipped with a sufficient microphone array. This field has several technology companies with expertise, such as Schoeps and Zylia from microphone sound capturing backgrounds, as well as Gaudio and Nokia from spatial audio signal processing backgrounds.

Open Speaker Array Design:

Vision Pro has opted for an open speaker array as opposed to closed AirPod-like earbuds. Leaving the ears open enhances the naturalness of spatial audio because it preserves the physical characteristics of the ear canal and removes the occlusion effect. Another benefit is that personalization, which simulates user-specific ear shape differences, becomes unnecessary since the outer ear remains in the sound propagation path. This decision enhances the naturalness of spatial audio in terms of spaciousness and timbre, resulting in a smoother blending of sound from the real world and the spatial audio delivered by Vision Pro.

Unity Collaboration and Spatial Audio Engine Integration:

Working together with Unity seems quite natural because Unity is the most popular platform for augmented reality applications. The decision to include Apple’s own spatial audio engine instead of a third-party solution in Vision Pro ensures that Apple has control over the spatial audio experience. I think the majority of post-production tools dedicated to 360 videos will implement Apple’s spatial audio engine for consistent sound reproduction on Vision Pro.

Estimation of Acoustic Room Characteristics:

In augmented reality scenarios, accurately determining the acoustic properties of the physical room the user is in, typically summarized as the Room Impulse Response (RIR), and recreating them during the binaural rendering process is crucial for plausibility. Apple opted for ray-tracing technology, leveraging existing visual information. As ray tracing has been considered a very complex method for real-time applications, I am very curious how Apple approximates and optimizes the technology. At the same time, there are many other technologies for estimating the Room Impulse Response. One example is Gaudio’s recent AI-based research, presented under the title “Room Impulse Response Estimation in a Multiple Source Environment” at the AES 2023 International Conference on Spatial and Immersive Audio. With the arrival of Vision Pro, I’m hopeful that the industry will readily embrace and implement the valuable research from the audio community to elevate the overall user experience.
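
However the room response is obtained, applying it at render time comes down to convolution. The sketch below assumes that an impulse response for each ear has already been estimated. It does not reproduce Apple's ray tracing or Gaudio's AI-based estimation, and the impulse responses are toy values made up for illustration. The function names are hypothetical.

```python
def convolve(signal, impulse_response):
    """Direct-form linear convolution of two sample lists."""
    if not signal or not impulse_response:
        return []
    out = [0.0] * (len(signal) + len(impulse_response) - 1)
    for i, s in enumerate(signal):
        for j, h in enumerate(impulse_response):
            # Each input sample excites a scaled, delayed copy of the response.
            out[i + j] += s * h
    return out


def render_binaural(dry, rir_left, rir_right):
    """Apply a per-ear room impulse response to a dry mono signal."""
    return convolve(dry, rir_left), convolve(dry, rir_right)


# Toy data: a two-sample dry signal and very short per-ear responses.
dry = [1.0, 0.5]
left, right = render_binaural(dry, [1.0, 0.0, 0.25], [0.5, 0.5])
```

A real-time renderer would typically use FFT-based or partitioned convolution rather than this quadratic loop, since measured room responses can run to tens of thousands of samples.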

Outro: Spatial Computing meets Spatial Audio

In conclusion, with the introduction of Apple’s Vision Pro, the threshold of spatial computing beckons ever closer. It is worth noting that this type of technology could revolutionize the way people interact with their real life and the virtual world. While the realization of a mass-market product may necessitate some patience, it’s evident that when Apple takes a stride, the industry follows suit.

We are excited for the journey ahead and look forward to seeing what kind of products and use cases come out of this next era of consumer technology. If you’re looking to learn more or have any questions about this topic, don’t hesitate to contact us!
