A text by Martin Rieger with the kind support of Ana Monte and the German Association for Film Sound Professionals (BVFT).
Immersive audio isn’t just a buzzword; it’s a transformative technology that offers an entirely new level of engagement. To comprehend its significance, it’s crucial to delve into what immersive audio truly entails, how it functions, and the myriad possibilities it unlocks for creative minds in the realm of virtual reality.
Virtual reality sound has ushered in a paradigm shift in auditory experiences, and it calls for a fresh approach from both technical and creative standpoints. It demands immersive audio solutions that place the audience in a three-dimensional soundscape, where the auditory environment responds dynamically to the viewer’s movements, with or without head tracking. Here is what immersive audio is, how it works and what you can do with it.
Extended Reality (XR) is the umbrella term for Virtual Reality (VR), Augmented Reality (AR) and Mixed Reality (MR). Spatial audio plays a very different role in each of these. But let’s take things one step at a time, starting with a big caveat: most sound engineers talk about VR but actually mean 360° videos.
But such productions are only a very small niche within virtual reality as a whole. There are two big limitations: first, 360° videos are linear in time, like conventional movies; second, they only allow the viewing direction to rotate around the three axes X, Y and Z. This is known as three degrees of freedom (3DoF). The hype around such productions has already died down, and they will probably only play a minor role in the future.
So much for XR. Now let’s move on to the hyped term “immersive”. There is a great deal of euphoria among film sound engineers, and it is a hot topic at the relevant conferences. “Immersive audio” is often used as a synonym for “3D audio”.
But where and when can 3D audio actually offer added value? It is important to find out where the new technology works best and what could be promising areas of application.
The possibilities for working with 3D audio are almost unlimited. Most people first think of Dolby Atmos for immersive film or music productions. But this is only a small part of the potential, and we want to focus on the areas most relevant to the BVFT. A huge jungle of new formats, devices, plugins and distribution channels is emerging right now, some of which we may not even know about yet. So it may well be worth becoming an adventurer yourself and plunging into the wilderness.
For recording on location and in the studio, sound engineers need a good overview of the existing multi-channel microphone arrays. These are usually suitable as so-called beds. However, channel-based formats such as 5.1, 7.1 and 7.1.2 are of secondary importance. Although they are well established in film sound, they do not reproduce the sound spherically, that is, evenly from all sides, as immersive audio usually requires. This matters because the sound field is often rotated around the spatial axes during later playback.
This is where you inevitably stumble over the term Ambisonics. The format has existed for a few decades, but it only really came into its own in combination with 360° videos. Quite affordable microphones with four tetrahedrally arranged capsules are already available from various manufacturers. Their raw recordings are referred to as A-format and are later converted by software to B-format, which is what you actually mix with in Ambisonics.
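As a rough illustration of the A-format to B-format step, the following Python sketch applies the textbook sum-and-difference matrix for a tetrahedral microphone. The capsule order (front-left-up, front-right-down, back-left-down, back-right-up), the 0.5 scaling and the function name `a_to_b_format` are assumptions made for this example. Real converters additionally apply manufacturer-specific filters to compensate for the capsule spacing, which are left out here.

```python
from typing import List, Sequence, Tuple


def a_to_b_format(
    flu: Sequence[float],
    frd: Sequence[float],
    bld: Sequence[float],
    bru: Sequence[float],
) -> Tuple[List[float], List[float], List[float], List[float]]:
    """Convert four A-format capsule signals to first-order B-format.

    Assumed capsule layout (hypothetical, check your microphone's manual):
    flu = front-left-up, frd = front-right-down,
    bld = back-left-down, bru = back-right-up.
    Returns the channels W (omni), X (front), Y (left), Z (up).
    """
    w, x, y, z = [], [], [], []
    for a, b, c, d in zip(flu, frd, bld, bru):
        w.append(0.5 * (a + b + c + d))  # sum of all capsules: omnidirectional part
        x.append(0.5 * (a + b - c - d))  # front capsules minus back capsules
        y.append(0.5 * (a - b + c - d))  # left capsules minus right capsules
        z.append(0.5 * (a - b - c + d))  # upper capsules minus lower capsules
    return w, x, y, z


# One sample that only reaches the two front capsules:
# the energy should end up in W and X, not in Y or Z.
w, x, y, z = a_to_b_format([1.0], [1.0], [0.0], [0.0])
```

The resulting four channels are what Ambisonics plugins expect as input, although channel order and normalization differ between conventions and have to be matched to the tool in use.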
The advantages and disadvantages of the format will not be discussed here, but you can learn more here. The main point is that during recording it is sometimes not even clear which device or platform the production will end up on. So an ORTF-3D or an omni-binaural microphone can be just as much the right choice.
The most important difference from classic film sound is that the boom microphone is more or less completely omitted in 360° videos. Otherwise, the boom operator and the boom would be visible in the VR image. A solid wireless system is therefore necessary here as well. On the one hand, it is important to record a 3D bed that captures the sound as well as possible from all directions; on the other hand, the audio objects must be recorded as isolated as possible. Completely different rules apply here than in classic film, since it suddenly matters who speaks from which direction and how the scene is resolved spatially.
Many tools are now available for popular DAWs such as Pro Tools and Nuendo, but not all of them. You quickly run into limits in the form of bus sizes or speaker configurations. The detour via Reaper can therefore be worthwhile: there you are virtually unrestricted and can write your own scripts if need be. Speaking of scripts, there is almost no way around them once you work in Unity or Unreal, but more about that later.
In most cases, music is produced in stereo. In the immersive area, there is a dedicated head-locked stereo track for this purpose. This is an audio track that does not change, no matter if the viewing direction is changed. The problem with this is that now this static soundtrack works against the immersive soundtrack and breaks its localization.
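To make the difference between the head-tracked sound field and the head-locked track concrete, here is a minimal Python sketch. It assumes a first-order B-format bed with the common axis convention (X to the front, Y to the left, Z up) and a yaw angle delivered by some head tracker. The function names are made up for this example. The bed is counter-rotated against the head movement, while the head-locked stereo pair is passed through untouched.

```python
import math
from typing import Tuple

BFormatSample = Tuple[float, float, float, float]  # (W, X, Y, Z)
StereoSample = Tuple[float, float]                 # (left, right)


def rotate_bed_for_head_yaw(sample: BFormatSample, head_yaw_deg: float) -> BFormatSample:
    """Counter-rotate one B-format sample around the vertical axis.

    head_yaw_deg is assumed to be positive when the listener turns left.
    The sound field is rotated by the opposite angle so that sources stay
    fixed in the scene instead of following the head.
    """
    w, x, y, z = sample
    yaw = math.radians(head_yaw_deg)
    x_rot = x * math.cos(yaw) + y * math.sin(yaw)
    y_rot = -x * math.sin(yaw) + y * math.cos(yaw)
    return (w, x_rot, y_rot, z)  # W and Z are not affected by yaw


def process_frame(
    bed: BFormatSample, head_locked: StereoSample, head_yaw_deg: float
) -> Tuple[BFormatSample, StereoSample]:
    """Rotate the bed; the head-locked stereo track bypasses the rotation."""
    return rotate_bed_for_head_yaw(bed, head_yaw_deg), head_locked


# A source straight ahead, plus head-locked music. The listener turns
# 90 degrees to the left, so the source should now sit on the right
# (negative Y) while the music stays exactly as it was.
front_source = (1.0, 1.0, 0.0, 0.0)
music = (0.3, 0.3)
bed_out, music_out = process_frame(front_source, music, 90.0)
```

Because the music never passes through the rotation, it stays glued to the head while everything else moves, which is exactly the conflict described above.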
Therefore, it can make sense to have effects and music delivered in object-based formats. Unfortunately, the production side often chooses a classic approach with music and voice-over. This leaves almost no room for the original sound to have an effect, since three sound layers have to harmonize with each other somehow while each sits somewhere between diegetic and non-diegetic.
Therefore, it is better to keep things simple and not overload the sound from the beginning. Most of the time the visual elements already challenge the user enough, so there is no need for epic music and an advertising-style voice-over.
Make no mistake: not only are there no format standards and a motley assortment of microphone options, there are no standards for measuring loudness either. Here AudioEase and the FB360 workstation have LUT approaches that attempt to measure the sound field signal, at least for 360° videos. The most important thing is that the mix is not too loud, so that it does not cause problems later in the decoder when binauralizing or distributing to speakers. That is easy to say, but in many cases it is difficult to anticipate, because you absolutely cannot rely on levels.
Also, you have to say goodbye to the illusion that with 3D immersive audio, the sound will always sound exactly as you imagine it. For example, Facebook and YouTube use different HRTF models for binauralization, which means that one and the same mix sounds significantly different on different platforms. This affects not only the timbre but also the localization and mixing ratios with the head-locked portion.
Speaking of head-locked: this optional stereo track, already mentioned above, is mostly used for music but can also be used for voice-overs. In-head localization makes it clear to the user that the speaker is not part of the scene. Nevertheless, it can be irritating to hear a voice and not be able to attribute it to a person.
In a classic film, no normal viewer would be irritated by this. In VR, however, other laws apply because you are part of the scene yourself. So here, too, you have to question mixing decisions that have been made for ages in moving images. Channel-based workflows in stereo or surround are taught by default, but cannot be adopted for spherical audio mixes.
Listening habits also play a big role here. By now, the presence of a clip-on microphone is almost preferred to the naturalness of a boom microphone. In the same way, listeners first have to get used to externalization (the opposite of in-head localization), i.e. the feeling of really hearing a person from outside the head when wearing headphones. Speech then sounds much more spatial, just as one knows it from reality, but not as one knows it from movies.
So here, on the contrary, stereo sound is actually the false sound, and it can break the immersion that you tried to build with the image. There are very few good reasons to resort to static sound in a virtual reality experience, and doing so shows very clearly that the potential of the medium is far from exhausted. More thoughts on this can be found here.
Let’s get to the point that actually makes XR so powerful, and it is something that fans of the linear moving image will probably not like at all: interactivity. Here we quickly find ourselves in the world of 3D models and game engines.
But here comes the good news: there is a job profile that comes very close to these requirements, the game audio designer. This may not have much to do with classic film sound, but the boundaries are becoming increasingly blurred. Just because you’ve learned game audio doesn’t mean that you can only work in sound for games. And just because you learned film sound doesn’t mean you have to do film. Sound for XR is somewhere in that gray area.
The biggest difference from classic film sound is that instead of one longer audio file that runs from start to finish, so-called audio assets are delivered. These are attached individually to game objects in the game engine, which can be 3D models in space or characters, or they are linked to events. In movie audio, we know exactly when a person walks through the door, for example, and we can add the appropriate door sound and reverb.
Unfortunately, in interactive applications, we rarely know exactly when the event will occur. Therefore, the door sound asset is linked to the door game object. As soon as a character steps through the door, the stored sound is played. That sounds simple, but a few steps are still missing for it to be believable, and not only the character’s footsteps: reverberation algorithms for the two rooms separated by the door also have to be defined in advance. Besides, you don’t want to hear the same door sound every time, so you usually store a small palette of sounds right away, or write a script that pitches the door sound higher or lower depending on how hard the door is slammed.
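The door example can be sketched in a few lines of Python. This is an engine-agnostic illustration and not Unity, Unreal or middleware code: the class `DoorSoundAsset`, the file names and the pitch range are hypothetical. It picks a variation from a small palette without repeating the previous one and maps the slam force to a playback pitch.

```python
import random
from dataclasses import dataclass, field
from typing import List, Optional, Tuple


@dataclass
class DoorSoundAsset:
    """Hypothetical audio asset attached to a 'door' game object."""

    variations: List[str]    # small palette of recorded door sounds
    min_pitch: float = 0.9   # playback rate for a gently closed door (assumed)
    max_pitch: float = 1.2   # playback rate for a hard slam (assumed)
    last_played: Optional[str] = field(default=None, init=False)

    def trigger(self, slam_force: float, rng: random.Random) -> Tuple[str, float]:
        """Called by the game whenever the door event fires.

        slam_force is expected in the range 0.0 (gentle) to 1.0 (hard);
        values outside that range are clamped.
        Returns the chosen file name and the pitch factor to play it at.
        """
        # Avoid playing the same variation twice in a row.
        candidates = [v for v in self.variations if v != self.last_played]
        if not candidates:  # palette contains only one file
            candidates = list(self.variations)
        chosen = rng.choice(candidates)
        self.last_played = chosen

        force = min(max(slam_force, 0.0), 1.0)
        pitch = self.min_pitch + (self.max_pitch - self.min_pitch) * force
        return chosen, pitch


# The game decides when the door is used; we only react to the event.
door = DoorSoundAsset(["door_01.wav", "door_02.wav", "door_03.wav"])
rng = random.Random(42)  # seeded only to make the example repeatable
events = [door.trigger(force, rng) for force in (0.0, 0.5, 1.0, 2.0)]
```

In a real project, the returned file and pitch would be handed to the engine’s or the middleware’s playback call, and the reverb of the room the listener is in would be applied on top.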
As you can see, the result is not one perfect mix, as in linear audio productions, but a process of trial and error. Since most game engines offer only fairly rudimentary audio functions, middleware comes to the rescue. It extends the range of functions and provides an interface to the project without requiring changes to the project itself, which pleases the programmers, who usually have to work on the same project in parallel.
When developing an XR story with spatial audio, you should think about how to use sound to drive the story. To get started, you have to rethink audio in general. Many workflows that you are used to from mono or stereo sound don’t work in XR with spatial audio, because in XR you’re not just looking in one direction, and the sound can direct your eyes.
There is only a manageable amount of advanced training available. However, more and more universities are recognizing the need: they are working on new concentrations in immersive audio and expanding their labs. Related associations such as the AES and VDT offer workshops and webinars along similar lines.
It’s worth taking a look at what content is already available and which XR experiences with spatial audio work well. From there, you can start developing your own stories or ideas. Chances are very good that you have a vision for sound that no one has had before, and that lowers the barrier to entry.
As developments continue at a rapid pace, it is very difficult to predict the future. However, as a sound person it is definitely worthwhile to think beyond your own department. This makes communication with other creatives or technicians easier and also helps to broaden your own horizon in terms of sound. So an education in media technology with a focus on sound can be a good foundation here. It’s no use having all the specialized knowledge about immersive audio if you can’t communicate to other people why it’s important to them.
Many audio colleagues are currently considering installing speakers on the ceiling and upgrading to Dolby Atmos in the hope of attracting new clients. But in reality, the advertised promise, or any “return on investment” at all, will rarely materialize. Why would a customer suddenly spend significantly more money on an audio product that probably would have worked just as well in stereo? And if everyone suddenly offers 3D, there will be a supply on the market, and it is not at all clear whether the demand from music and film can absorb it.
That’s the crux of it: most sound colleagues just want to stay in their studio, keep using their favorite DAW and hope that the jobs will somehow come. But that doesn’t work in the XR and immersive audio world. Often, the additional work involved in a 3D audio production is out of proportion compared to stereo. And here, too, people prefer to spend their budget on the visual part rather than on the sound. After all, the fancy smart glasses already keep everyone busy enough. That’s why it’s time to turn the tables: get out of the sound comfort zone and think for yourself about what could be an exciting application in XR. It’s worth it.
Above all, this requires a different mindset, as it is called nowadays.
Moviegoers are willing to pay a good price for a surround sound experience. But whether stereo or surround, at the end of the day it’s a nice-to-have feature. Don’t expect to be able to charge more just because you’re now mixing in 3D, least of all for a movie or music that works well enough in stereo.
Does that mean 3D audio won’t be a big priority in the future? Of course not! I would therefore like to mention another, extreme example that we like to laugh at but that still gives us hope: 8D Audio. A detailed article on it can be found here. The short version: someone came up with the idea of running a song through a spatializer and letting it circle endlessly around your head. It sounds absurd, and to our trained ears it is.
And yet millions of people have been reached with 3D audio content this way, and the view counts are in the nine figures. This shows how good it can be to think unconventionally for a change and do things that you might be reluctant to do. It’s not enough to buy a 3D spatializer and think you’re doing immersive audio now. That’s why the industry is still short of people who really have years of experience.
You have to remember who we are mixing for and why. Do we want a 3D mix that can be shown to colleagues with a clear conscience (Dolby Atmos would be the choice here), or do we want to reach the consumer (8D Audio shows that it can be done)? The truth is probably somewhere in between, so it’s time to harness the potential of both worlds and find the added value of immersive 3D audio.
The typical approach is to use 3D audio where surround has already worked well. It’s probably no longer a secret that over a thousand feature films have already been mixed immersively in Dolby Atmos. This fact alone is remarkable for the sound. While the listening experience can currently be enjoyed almost exclusively in cinemas or friendly studios, the three-dimensional mix will soon also increasingly find its way into the living room at home via soundbars. Clever algorithms with virtual speakers make it possible.
Or even simpler: virtually every household has headphones, which make three-dimensional audio playback in the form of binaural stereo accessible to consumers. As a listener, you increasingly get the feeling that you’re not just watching a movie but are part of the action. Mixers can also enjoy the improved transparency, since distributing the different sound layers across the space means that fewer compromises have to be made than with stereo.
Artificial head stereophony has been a popular recording method for playback on headphones for decades. The vision behind it is to reproduce the acoustic environment as realistically as possible for humans.
Put simply, it can be used to trigger emotions, feelings of presence and perceptions that are deeply connected to one’s own experiences. It does so more immediately than mono could, for example, because a level of abstraction is omitted, which makes the sound easier for our brain to process. That’s why the keyword immersion, i.e. being immersed in a virtual environment, describes it quite well.
Currently, however, there is a lot of momentum in the topic again, since sound is now also becoming more accessible as a three-dimensional event for a larger audience with loudspeakers. Furthermore, additional technologies such as head trackers, data glasses and real-time renderings are popping up in a wide variety of areas, giving us an increasingly realistic impression of hearing as we are used to in our natural environment.
In this age of rapid technological advancement, topics like artificial intelligence, voice assistants, blockchain, smart speakers etc. might seem far removed from the realm of immersive audio. However, these buzzwords are not to be dismissed, as they are becoming pivotal in the ever-evolving field of immersive sound, especially for sound engineers. The fusion of these technologies is shaping the future of audio experiences.
While it might initially appear abstract, it’s important to recognize that sound is a versatile, cross-cutting technology that transcends the boundaries of immersive audio. It’s a tool that can seamlessly integrate into numerous domains beyond the auditory realm. As sound engineers, it’s our prerogative to embrace these advancements, step outside our comfort zones, and explore uncharted territories in audio innovation. The future of music is not confined to three dimensions; it’s an expansive soundscape limited only by our imagination and creativity.