This article was written for the Filmton Guide of the Berufsvereinigung Filmton (BVFT). Of course, it should also be available to non-members. My aim was to highlight the topic of immersive 3D audio in the context of film sound for cinema and TV.
But beyond that, there are many possibilities for the soundtrack of the moving image. What these are and how good spatial sound can be used to its full advantage are described below. Have fun reading!
Admittedly, hardly any other word is as hyped in the audio scene as "immersive". Yet "immersive audio" is often used as a synonym for "3D audio". This overlooks the fact that even mono playback can be immersive when the story is stirring. And music in good old stereo also invites you to close your eyes and immerse yourself in a world of sound.
Dummy head stereophony has been a popular recording method for playback on headphones for decades. Currently, however, the subject is gaining a lot of momentum again, as sound is now also becoming accessible as a three-dimensional event to a larger audience over loudspeakers.
Furthermore, additional technologies such as head trackers, head-mounted displays, and real-time rendering are popping up in a wide variety of areas, giving us an increasingly realistic listening impression, like the one we know from our natural environment.
Put simply, these technologies can trigger emotions, feelings of presence, and perceptions that are deeply connected to our own experiences, and they can do so more directly than mono could, for example because a level of abstraction is omitted and the result is easier for our brain to process. The keyword immersion, i.e. diving into a virtual world, is therefore quite apt.
Of course, this changes everything – doesn’t it?! The euphoria among film sound engineers is great, and the subject is a recurring theme at the relevant conferences. But where and when can 3D audio actually offer added value? In recent years, I have had the opportunity to deal with immersive audio in a variety of contexts more intensively than almost anyone else. But precisely because of my enthusiasm, I like to question whether 3D audio is always the first choice.
I would like to warn against filing "3D audio" in the mental drawer labelled "better than stereo". Instead, it’s a matter of working out where the new technology works best and which areas of application are promising. It is difficult to do justice to the topic in its entirety, but we will at least scratch the surface as best we can.
The possibilities for working with 3D audio are almost unlimited. Most people first think of Dolby Atmos for immersive film sound or music productions. But this is actually only a small part of the conceivable applications and, daring spoiler alert, I think the real potential lies hidden elsewhere. But more on that later.
Fortunately, there are now a number of manufacturers who have jumped on the immersive bandwagon: microphones, plug-ins, distribution platforms. Nevertheless, there is still a lot of research to be done in the field of Next Generation Audio, personalized listening, binauralisation, etc. And it seems that every discussion at relevant conferences ends with the keyword HRTF (Head-Related Transfer Function). But I would argue that what we currently lack is not so much the tools and theories as the knowledge of what we can do with them in creative practice.
I know at least a handful of colleagues who snapped up the Sennheiser Ambeo VR mic in combination with a Zoom F8 recorder as soon as the product was released, wisely anticipating that all the 360° requests would soon come. A few years later, it is clear: that never happened. But how can it be that in other areas more is being produced with 3D audio than ever before? Let’s start with the best-known example.
The typical approach is to use 3D audio where surround has already worked well. It is no longer a secret that more than a thousand films have already been mixed immersively in Dolby Atmos. This fact alone is remarkable for sound, even though the listening experience can currently be enjoyed almost exclusively in cinemas or friendly studios. In the future, the three-dimensional mix will increasingly find its way into the living room via soundbars. Clever algorithms with virtual loudspeakers make this possible.
Or even simpler: in almost every household there are headphones that make three-dimensional audio reproduction accessible to consumers in the form of binaural stereo. As a listener, you increasingly get the feeling that you are not just watching a film, but that you are part of the action. Mixing engineers can also be happy about the better transparency because, with the different sound layers distributed spatially in the room, fewer compromises have to be made than with stereo.
The technology behind this is called object-based audio. Put simply, the so-called "bed" serves as the basis for diffuse sounds such as atmospheres and reverbs. This is supplemented by mono objects, which give the whole thing its name. These objects are not, as in the classic workflow, mixed inseparably into one sum, but are still present in the master file as separate files that are additionally described with metadata.
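To make the idea more tangible, here is a minimal sketch in Python of how a bed and objects with metadata might be kept apart until playback. The class name, the metadata fields (`azimuth_deg`, `gain`) and the simple constant-power stereo panner are assumptions made purely for illustration; they do not reflect the actual Dolby Atmos master format or renderer.

```python
import math
from dataclasses import dataclass


@dataclass
class AudioObject:
    """A mono signal plus the metadata that describes it (hypothetical fields)."""
    name: str
    samples: list            # mono audio samples
    azimuth_deg: float       # -90 = hard left, 0 = centre, +90 = hard right
    gain: float = 1.0


def render_stereo(bed, objects):
    """Mix a stereo bed and mono objects down to stereo at playback time.

    bed is a list of (left, right) sample pairs. Each object is panned with a
    constant-power law derived from its azimuth metadata.
    """
    out = [list(pair) for pair in bed]
    for obj in objects:
        # Map azimuth -90..+90 degrees onto a pan angle of 0..90 degrees.
        pan = math.radians((obj.azimuth_deg + 90.0) / 2.0)
        left_gain = math.cos(pan) * obj.gain
        right_gain = math.sin(pan) * obj.gain
        for i, sample in enumerate(obj.samples[:len(out)]):
            out[i][0] += sample * left_gain
            out[i][1] += sample * right_gain
    return [tuple(pair) for pair in out]


# A quiet, diffuse bed and one object placed hard left.
bed = [(0.1, 0.1)] * 4
door = AudioObject("door", [1.0, 0.5, 0.25, 0.0], azimuth_deg=-90.0)
mix = render_stereo(bed, [door])
```

The point of the sketch is that the position lives in the metadata, not in the audio: the same `door` object could be handed to a different renderer, for a different loudspeaker layout or for headphones, without touching the mix itself.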
But object-based audio offers not only the possibility of three-dimensionality; personalized interactivity is also becoming more and more exciting. The fact that the files are also available separately in the codec for end users opens up unimagined possibilities.
How would it be, for example, to be able to mute the audio object "commentator" during a football broadcast, in order to enjoy the enveloping stadium atmosphere and to make your own comments with your mates on the couch? Immersive sound thus offers entertainment formats a possibility that the cinema hall cannot provide.
This means that the re-recording mixer’s actually sacred audio master can be manipulated by the viewer, at least as far as this is defined in the mix. An idea that the makers will have to get used to.
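The phrase "as far as it is defined in the mix" can be sketched as follows: the mixing engineer authors a permitted range per object, and the viewer's wishes are clamped to it. The `ObjectGainRule` structure, its field names and the example values are hypothetical and only illustrate the principle; real Next Generation Audio codecs define their own metadata for this.

```python
from dataclasses import dataclass


@dataclass
class ObjectGainRule:
    """What the mixing engineer allows the viewer to change (hypothetical metadata)."""
    default_gain: float
    min_gain: float
    max_gain: float


def personalised_gains(rules, user_prefs):
    """Return the playback gain per object, clamping user wishes to the authored range."""
    gains = {}
    for name, rule in rules.items():
        wanted = user_prefs.get(name, rule.default_gain)
        gains[name] = min(max(wanted, rule.min_gain), rule.max_gain)
    return gains


rules = {
    "commentator": ObjectGainRule(1.0, 0.0, 1.5),  # may be muted entirely
    "stadium": ObjectGainRule(1.0, 0.5, 1.5),      # may be lowered, but never muted
}

# The viewer tries to mute both objects; only the commentator disappears.
gains = personalised_gains(rules, {"commentator": 0.0, "stadium": 0.0})
```

In this sketch the master stays authoritative: the viewer can only move within the limits that were set in the mix.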
But the exciting question is: how could interactive TV formats in which the sound can be personalized work in the future? This is a field in which interactivity could perhaps be even more exciting than three-dimensionality, and it is already being implemented, for example at BR as "Dialog+" within the framework of a pilot project.
Speaking of personalized listening – let’s now look at an area that literally enables immersion in the film and its sound.
First of all, a big "Attention": most sound engineers talk about VR, but actually mean 360° videos. However, such productions are only a very small niche of virtual reality as a whole and should rather be considered a special case. There are two major limitations: first, 360° videos are linear in time, just like conventional films; second, they only allow the viewing direction to be rotated around three axes: X, Y and Z. This is known as three degrees of freedom (3DoF).
The hype around such productions has already died down, and they will probably only play a minor role in the future, even if the flood of tools and plug-ins suggests a need that hardly exists on the market.
VR applications in the true sense should be considered interactive experiences and have more in common with computer games than with film productions. Here you can’t get around game engines like Unity or Unreal and middleware like FMOD or Wwise. This requires programming skills and a certain rethinking of audio production.
However, the true potential of VR lies in this interactivity, which does not have to be linear in time. Furthermore, 3D audio gains another component, because in addition to rotation around the axes, translation along the three spatial axes is added. This means that users can not only rotate from a fixed position, i.e. look around, but also move freely in space. This adds three more degrees of freedom, which is why such experiences are referred to as 6DoF (six degrees of freedom).
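The difference between 3DoF and 6DoF can be illustrated with a small Python sketch that computes where a source is heard from the listener's point of view. It is deliberately reduced to the horizontal plane and to yaw rotation only; the function name and the coordinate convention (x forward, y left, angles counterclockwise) are assumptions chosen for this example.

```python
import math


def relative_direction(source_xy, listener_xy, listener_yaw_deg):
    """Return (azimuth in degrees, distance in metres) of a source as heard by the listener.

    Coordinates lie on a horizontal plane, x pointing forward and y pointing
    left. Azimuth is counterclockwise, so a positive value means "to the left".
    """
    dx = source_xy[0] - listener_xy[0]
    dy = source_xy[1] - listener_xy[1]
    distance = math.hypot(dx, dy)
    world_azimuth = math.degrees(math.atan2(dy, dx))
    # Turning the head shifts the perceived direction the opposite way.
    azimuth = (world_azimuth - listener_yaw_deg + 180.0) % 360.0 - 180.0
    return azimuth, distance


source = (2.0, 0.0)  # two metres straight ahead of the starting position

# 3DoF: the position stays fixed, only the head turns 90 degrees to the left.
az_3dof, dist_3dof = relative_direction(source, (0.0, 0.0), 90.0)

# 6DoF: the listener additionally walks one metre towards the source.
az_6dof, dist_6dof = relative_direction(source, (1.0, 0.0), 90.0)
```

With 3DoF only the direction changes (the source moves to the right of the head), while with 6DoF the distance changes as well, so level, reverberation and other distance cues have to be recalculated continuously.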
With 6DoF, there is no longer a single optimal listening position of the kind that audio mixes usually assume. It is much more important to take into account that the playback location can be anywhere in the virtual space. This opens up an incredible number of possible applications for rethinking sound. Since this article cannot do justice to this, I will now concentrate on the area of 360° video in terms of immersive film sound.
Back to the world of spherical videos. How could it be otherwise: not only the image is round here, but also the sound. The Ambisonics format has established itself as a quasi-standard for 360° videos and has been officially supported by YouTube and Facebook, among others, since 2016. In addition, there is a head-locked stereo track that, in contrast to the Ambisonics sound field, does not change when the head is moved.
This additional, optional audio stream is preferably used for voice-overs or music, that is, for audio content that is not located in the virtual space. This blurs the boundaries between diegetic and non-diegetic content. There is thus still a lot to discuss about how to avoid confusing users with speech and music that are not visually reflected in a 360° scene, and instead to support the storytelling with a targeted soundtrack.
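The difference between the Ambisonics sound field and the head-locked stereo track described above can be sketched in a few lines of Python. The example encodes a mono sample into first-order Ambisonics and counter-rotates the field when the viewer turns their head, while the head-locked track is simply passed through. The function names are hypothetical, and the channel order (W, X, Y, Z) and normalisation are assumptions chosen for readability; delivery formats define their own order and normalisation.

```python
import math


def encode_foa(sample, azimuth_deg, elevation_deg=0.0):
    """Encode a mono sample into first-order Ambisonics, returned as (W, X, Y, Z).

    Azimuth is counterclockwise from the front (positive = left), X points
    forward, Y points left and Z points up.
    """
    az = math.radians(azimuth_deg)
    el = math.radians(elevation_deg)
    w = sample
    x = sample * math.cos(az) * math.cos(el)
    y = sample * math.sin(az) * math.cos(el)
    z = sample * math.sin(el)
    return w, x, y, z


def rotate_for_head_yaw(wxyz, head_yaw_deg):
    """Counter-rotate the sound field so that sources stay fixed in the scene."""
    w, x, y, z = wxyz
    t = math.radians(head_yaw_deg)
    x_rot = x * math.cos(t) + y * math.sin(t)
    y_rot = -x * math.sin(t) + y * math.cos(t)
    return w, x_rot, y_rot, z


front = encode_foa(1.0, azimuth_deg=0.0)               # source straight ahead
turned = rotate_for_head_yaw(front, head_yaw_deg=90.0)  # viewer looks to the left

# The head-locked stereo track (e.g. a voice-over) ignores head rotation.
head_locked = (0.3, 0.3)
```

After the viewer has turned 90° to the left, the energy of the source sits on the negative Y axis, i.e. to the right of the head, exactly where a real source would be. The head-locked pair is unchanged, which is why it suits content that is not anchored in the scene.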
Ambisonics is often badmouthed in the sound scene because it certainly has disadvantages, such as imprecise localisation, which even higher orders can only solve to a limited extent. Nevertheless, due to its practicability and in combination with object-based audio in the form of mono sources, it absolutely has its raison d’être here. But since this is mainly about the content and less about the constantly developing technology behind it, let’s move on:
For sound engineers, mixing 360° videos means saying goodbye to the "centre channel". There is no longer a fixed viewing direction, and the user now decides where the interest falls. This requires a big change for filmmakers in almost all departments. That’s why this area is currently mostly staffed with people who understand cross-media work and also have experience with video game productions. And yet the soundtrack is not only used here to accompany the picture. I would go so far as to say that immersive sound accounts for even more than 50% of the experience in VR, because in VR we can overlook things, but we cannot fail to hear them.
For example: if someone comes through the door from behind on the left, we can add footstep sounds a few seconds earlier. This subtly directs the viewer’s gaze so that they don’t miss the moment when the new protagonist enters the room. If you hear footsteps at the back left in the cinema, it influences the narrative, but only in extreme cases does it motivate the audience to turn their heads away from the screen, which is usually not desired. In VR, however, it is precisely this physical turning of the head that we want to achieve in order to influence the gaze, and thus the image detail, by intuitive means.
Most applications with 3D audio aim to make the reproduction as realistic as possible. This may be the maxim for some applications, but we as film sound designers know: reality usually sounds pretty disappointing. The term "larger than life" also applies in 3D space: the aim is not to reproduce reality, but to create a credible soundscape that above all supports the storytelling.
Now for the more complicated keyword: added value. It is not at all easy to make a 3D mix really sound better than a stereo mix in direct comparison, as the following example shows: Dolby Atmos Music and Sony 360 Reality Audio. Here, over 1000 well-known songs were remixed in 3D and offered on streaming providers such as Amazon Music or Tidal. Produced in major studios, this is a really big deal and a huge opportunity to get consumers excited about 3D audio.
Several listening tests and hours of 5.1.4 listening sessions, however, left a handful of sound engineers and me with a rather sobering conclusion: most 3D mixes do not necessarily sound better than the respective original stereo mix.
I don’t want to generalize here, as there are also very beautifully made mixes, but most of the time I had the feeling that all the punch had been lost in favor of spatiality, which is not necessarily good for most productions. For anyone who wants to try the experiment themselves: all the streaming providers and formats that currently allow 3D audio playback are listed here.
The possible causes are beyond the scope of this article, but to name them briefly: time pressure, unfamiliar new tools, stereo stems instead of individual tracks, and object-based mastering.
Does this mean that 3D music will not be of great importance in the future? Of course not! Therefore, I would like to mention another extreme example that we like to ridicule, but that still gives us hope: 8D Audio. For more details, it’s best to ask the search engine you trust, but the short version is: someone came up with the idea of running a song through a spatialiser and letting it circle endlessly around your head. Sounds absurd, and to our trained ears it is.
And yet: millions of people have been reached here with 3D audio content, and the view counts are in the nine-digit range. This shows how worthwhile it can be to think unconventionally and to do things that one might instinctively resist.
One must not forget for whom and why we are actually mixing something. Do we want a 3D mix that can be shown to colleagues with a clear conscience (Dolby Atmos would probably be the choice here), or do we want to reach the consumer (8D Audio shows that it can be done)? The truth is probably somewhere in between, so it’s time to harness the potential of both worlds and find the added value of 3D audio.
By the way, our ears have unexpected allies here: our eyes. As humans, it is much easier for us to interpret what we hear if we also see it. In the 3D audio context, too, I advocate not subordinating the sound to the image or, conversely, the image to the sound, but creating a symbiosis that must be approached differently for 3D audio than we are used to from classic film sound.
It’s nice to have an immersive mix for a feature film. But if it were stereo only, the narrative would probably work just as well and only the physical immersion experience would suffer.
This is not a bad thing, but it makes it difficult to get viewers excited about 3D audio. This is where the next hype word comes into play: immersive storytelling. It’s a complex topic, but the principle is simple: find an application where spatial sound has an added value that goes beyond the mere soundtrack.
This means that we as sound engineers are now in demand, and this is the case long before we go on set or into post-production. Sound production therefore has to extend beyond mere recording and mixing. In a way, it also needs to act as a directing department and as project management in order to be able to influence the production at all.
Perhaps linearly narrated films are also not the best medium for experimenting with 3D sound design. It is worth turning the tables and looking beyond the film sector to see in which industries and applications there is undreamed-of potential, true to the motto "Sound First".
Amidst all the immersive hype, however, one must not lose sight of the big topic of immersive film sound. The next hype topics are already arriving: artificial intelligence, blockchain, voice assistants, smart speakers, etc. What at first feel like buzzwords far removed from our film sound world are actually very hot topics that sound engineers should take a close look at.
Sounds a bit abstract, but the wonderful thing about sound is that it is a cross-sectional technology with which you can virtually dock onto any other topic. This goes far beyond immersive audio. So now it’s time to become a doer yourself, to leave your comfort zone and venture into unimagined audio territory. For it is in our hands to shape the music of the future and this has more than just three dimensions.
Thanks to the editors just before Christmas: Philipp Eibl, Regina Bäck, Alexander Rubin, Felix Andriessens, Jörg Elsner, Mathis Nitschke