What is Spatial audio and how does 3D sound work?

Spatial audio? Heard of it? Chances are good that you’ve stumbled across it before. Some know it from gaming, some from movies or VR. In fact, a reference to the topic is popping up from all directions right now – rightly so, in my opinion!

I am Martin Rieger, also known as VRTonung (vrtonung.de), and probably one of the biggest enthusiasts in this field. Years ago I said to myself, “this topic is so amazing, I don’t want to do anything else”. Still, the topic is new, and many myths are buzzing around this technology.

Fantastic-sounding marketing terms come up quickly: “immersive audio”, “Dolby Atmos”, “3D”, “360°”, “8D sound”, “3D audio”, “3D sound” or “the music revolution”. After all, someone wants to sell new software or hardware. Today we’re going to do a deep dive, look at sound from all angles and learn the most important basics, so that in the next article we can look and listen to where this is just a gimmick and where it really makes sense. But first, let’s take three steps back.

What does spatial audio mean?

The principle is simple. Spatial audio describes a playback method that makes it possible to hear sound not only from the front left and right (as with stereo) or from behind (as with surround sound), but also from above or below. This adds a new, third dimension to the space you hear.

Why should that be a game changer now? Well, we humans always hear in three dimensions. We know what it sounds like when someone is standing behind us and talking to us. We don’t have to see the person for that. And now it is technically possible to reproduce this natural sound impression artificially.

Finding the right words…

In this context, people also like to talk about “immersive audio”. The term describes an enveloping sound that is so natural that we feel really comfortable in this digital reality. We virtually immerse ourselves in the artificial world and forget everything around us (a bit like the immersion with VR glasses, only related to sound). Ideally, all senses are included in the immersion, but as we know, good image quality and sound already get you very far in most media.

But if you will, even a stereo mix or even mono can be “immersive” if the content is well done. At the disco, people still get sucked into the music, even if it doesn’t have more than two audio channels. That’s why I prefer to say “3D audio”, because it implies that a technology is involved that lets you hear sound all around you.

Apple Spatial Audio is more than Apple Music

Apple does it the same way and calls it “Spatial Audio”, which is often translated as “spatial sound”. You can see that there is not yet complete agreement on the terms, so let’s see what people say in a few years. Apple’s spatial audio feature also doesn’t work with just any device. You’ll need compatible hardware and software, such as a current iOS device, supported headphones and a supported app. In general, it works best on Apple devices with spatial audio support.

You can experience spatial audio on iOS devices if you turn on the spatial audio icon in the volume control of your Control Center. This also works with the volume control on a MacBook Pro or on newer iPad models. It basically applies directional audio filters, and it makes spatial audio worth a try.

How to listen to 3D sound? How does spatial audio work?

One of the reasons why this audio technology is only slowly becoming mainstream is that it can sometimes be quite difficult to enjoy the 3D audio sound at all. There are two possibilities. One has never really caught on (loudspeakers). The other is something everyone has: compatible headphones.

What was 5.1 surround sound again?

First of all, the speakers. Honestly, not even 5.1 surround has really made its way into the living room. As the name suggests, you need a total of five speakers, placed at the front left, center and right as well as at the back left and right. In addition, there is a sixth speaker in the form of a subwoofer, which plays the LFE (low-frequency effects) channel, so that you not only hear explosions in the movie theater, but also feel them.

Sounds kind of impractical? It is. The speakers not only need space, you also have to run cables across the room, and the room should be at least somewhat acoustically treated. For 3D audio, you need even more speakers. Here, 7.1.4 is becoming the quasi-standard for recording studios. That’s right, at least 12 speakers are installed on the studio’s walls and ceiling.

Soundbars to the rescue

For consumers, all this is hardly reasonable, but there is already a remedy in the form of soundbars. These elongated speakers are usually placed under or in front of the TV, where there is usually enough space. This space is used as efficiently as possible, because several speakers are actually built into one soundbar. A few face forward, towards the couch in the average living room. But some of them fire to the side, or even upwards.

A home tv setup with spatial audio soundbars

Huh, so past us? Right, but the target is actually your walls or ceiling. The sound is reflected there, and to the people watching a movie on the couch it sounds as if speakers were actually hanging on the ceiling or standing to the side. This is also referred to as beamforming or virtual speakers. The device measures itself and your room at startup. This way, it knows approximately where your walls are and compensates for the so-called room modes, i.e. frequencies at which your room resonates and sounds boomy.

What doesn’t work so well yet is sound “from behind”, because the sound would have to bounce off the walls twice. That is why such packages are often sold with two additional rear speakers, plus a subwoofer for bass lovers. Now you just have to pay attention to how many speakers are built into the system, for example 5.1.2 or even 9.1.6. Here you can actually say: the more, the better. If it also says Dolby Atmos, chances are good that you can connect the device directly to your smart TV via HDMI and get that cinema feeling at home via streaming services. Just don’t use the TV speakers as support; let the soundbar do all the work.

Netflix as an exceptional case

Netflix recently announced a partnership with Sennheiser’s Ambeo to make a spatial audio experience accessible to all customers without special hardware. With Netflix Spatial Audio, various titles can now be enjoyed in an even more immersive way, and some also have Dolby Atmos support.

A custom signal processing algorithm maintains dialogue integrity and adds a sense of spaciousness to the surround sound, without artificial reverberation or room sounds. Netflix tested this option for over two years and had it approved by sound engineers for the streaming catalog. It’s really exciting to see what the future of the streaming audio experience looks like – and it’s all thanks to the Netflix app.

If you want to watch the currently available titles with your headphones on, just search for “Spatial Audio” on Netflix and dive into a new era of streaming! You don’t need any extra settings, just pick the right series in the browser and enjoy it at your favorite volume.

Smart speakers

Smart speakers such as the HomePod often use a similar principle. They understand where they are in the room and try to make the best of the situation. Nevertheless, they are usually used quite differently. While soundbars mostly sit under the TV for movies, smart speakers are placed somewhere in the room and are best suited for music or podcast playback.

The interaction with the devices is usually different, too. In most cases a voice assistant is integrated, which you cheerfully tell your wishes in the hope that it understands what you want. In reality, this doesn’t work perfectly, but it can be easier than manually navigating to the desired audio content on your smartphone. We all know the hype about ChatGPT, so we can only hope that smart speakers will soon really deserve to be called “smart”.

But back to the topic of surround sound: I had the chance to supervise a bachelor’s thesis in which my student wanted to find out why there are actually several speakers in a smart speaker. Simply put, the device wants to sound bigger than it is. And indeed, with the right 3D audio content, the speaker no longer sounds the way it looks. Sure, the sound still tends to come from one direction, but with the next generation you can pair multiple devices and be enveloped in sound even better.

Headphones with 3D Audio

I’m more of a headphone guy, for the simple reason that everyone has a pair of stereo headphones, probably more than one. Many think that you need special headphones for spatial audio, but that’s not true, as the listening examples later on and the way our spatial hearing works will show.

This means that a huge infrastructure is already in place to play back 3D audio content well. The only prerequisite is that the content is available binaurally (we’ll get to how that works in a moment), or that the playback device converts it to two channels in real time (as with Dolby Atmos). The problem is that where it says spatial audio, there often isn’t really spatial audio inside. The Dolby Atmos label is not a seal of quality and is often used for marketing, and Dolby Atmos playback alone does not mean “that sounds great”. Some content simply works better in 3D than other content. It’s a bit like 3D movies: fun for action, but not always sensible for quieter genres. In the same way, Dolby Atmos doesn’t make sense for every type of music. It makes less sense for hip-hop and more sense for movie soundtracks.

I recommend over-ear headphones for listening pleasure, simply because the sound is then generated as far away from the eardrum as possible. Experience has shown that this works somewhat better than earbuds, where the sound is generated right in front of the eardrum and passes through very little of our hearing apparatus. As a sound engineer, I am a fan of headphones with a linear frequency response, because they color the audio the least. But as I said, you don’t really need a particular model.

Well, unless you want that certain something, which in this case is dynamic head tracking. Motion sensors built into the headphones detect where you are looking and rotate the sound field in real time: if you turn your head, the sound sources stay where they are in the room. Dynamic head tracking is already built into several of Apple’s headphones, and the competition has followed suit. But what do you actually listen to with it? That question will be answered in detail in the next article. In fact, the manufacturers themselves usually don’t know yet. That is not so bad, because they have already built a hardware infrastructure that can later be filled on the software side with exciting applications and content.
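To make the head tracking idea a bit more tangible, here is a minimal sketch of the core calculation, reduced to left/right rotation (yaw) only. The function name and the angle convention are my own assumptions for illustration, not how any particular headphone implements it.

```python
def compensate_head_yaw(source_azimuth_deg, head_yaw_deg):
    """Azimuth of a source relative to the listener's nose, in (-180, 180].

    Assumption: both angles are measured clockwise from the initial
    forward direction, so +90 means "to the right".
    """
    # Counter-rotate the source by the head rotation, then wrap the angle.
    relative = (source_azimuth_deg - head_yaw_deg) % 360
    return relative - 360 if relative > 180 else relative


# A voice straight ahead, while the listener turns the head 30 degrees right:
print(compensate_head_yaw(0, 30))
```

The renderer would then place the sound at the returned angle, so that the voice seems to stay put in the room while your head moves.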

Spatial hearing

Spatial hearing is fascinating and lets us experience the world around us in 3D. But how can we do that with just two ears? It’s all a matter of timing and intensity. Sound waves reach our two ears differently depending on where they come from. The brain processes these differences to create a spatial map that tells us where a sound is coming from. It’s an incredibly complex process, and yet we manage it effortlessly and without even realizing it. Three factors play the biggest role when our brain compares the left and right ears:

ITD: interaural time difference (time delay)
ILD: interaural level difference (level difference)
HRTF: head-related transfer function (more on this later)

Example

It all sounds a bit abstract, so let’s take a quick example. Imagine we are standing on a street and hear a car honking from the right. The sound reaches the right ear before it reaches the left ear (ITD). After all, the right ear is closer to the car, even if only by 17 to 20 centimeters. In addition, the sound is louder at the right ear than at the left (ILD), because the head shadows the sound like a small wall. And last but not least, the horn has a different frequency response at each ear, which is due to the shape of our head and pinna (HRTF). While on the right the sound can enter the ear canal quite easily, on the left it has to bend around our head before it is captured by the pinna. In the process, the frequencies change.
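If you want to put a number on the time difference in the car example, a rough sketch looks like this. It assumes a simple path-difference model (ear distance times the sine of the angle, divided by the speed of sound), an ear distance of 18 cm and a speed of sound of 343 m/s; a real head bends the sound around it, so actual values differ somewhat.

```python
import math

SPEED_OF_SOUND_M_S = 343.0  # assumption: air at roughly room temperature


def interaural_time_difference(azimuth_deg, ear_distance_m=0.18):
    """Approximate ITD in seconds for a distant source.

    azimuth_deg: 0 = straight ahead, 90 = directly to the right.
    A positive result means the right ear hears the sound first.
    """
    path_difference_m = ear_distance_m * math.sin(math.radians(azimuth_deg))
    return path_difference_m / SPEED_OF_SOUND_M_S


for azimuth in (0, 30, 90):
    itd_ms = interaural_time_difference(azimuth) * 1000
    print(f"{azimuth:>3} deg -> {itd_ms:.2f} ms")
```

For the honking car directly to the right, this comes out at roughly half a millisecond. That is all the brain needs.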

What listening to spatial audio may look like in the future

Room for improvement

What our brains do in real time all our lives, software algorithms are now trying to recreate. That’s how spatial audio works. The tools keep asking themselves: I have so-and-so many 3D objects in my virtual space, how would that sound to two human ears right now? This process is called binauralization. It makes it possible to hear 3D audio with just two ears, even on standard headphones.
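As a very crude illustration of binauralization, the following sketch turns a mono signal into a left and a right channel using only a time delay and a level difference. Real renderers filter the signal with HRTFs instead. All constants here, especially the 6 dB maximum level difference, are assumptions chosen for the example.

```python
import math

SAMPLE_RATE_HZ = 48_000
SPEED_OF_SOUND_M_S = 343.0  # assumed value for air
EAR_DISTANCE_M = 0.18       # assumed average head
MAX_ILD_DB = 6.0            # hypothetical level difference at 90 degrees


def binauralize(mono, azimuth_deg):
    """Return (left, right) sample lists for a mono signal at azimuth_deg.

    0 = front, +90 = right, -90 = left. Only ITD and ILD are modelled;
    the frequency-dependent HRTF filtering is left out on purpose.
    """
    side = math.sin(math.radians(azimuth_deg))  # -1 (left) .. +1 (right)
    delay = round(abs(side) * EAR_DISTANCE_M / SPEED_OF_SOUND_M_S * SAMPLE_RATE_HZ)
    far_gain = 10 ** (-abs(side) * MAX_ILD_DB / 20)  # far ear is quieter

    near = list(mono) + [0.0] * delay                   # arrives first
    far = [0.0] * delay + [s * far_gain for s in mono]  # arrives later

    return (far, near) if side >= 0 else (near, far)


click = [1.0] + [0.0] * 9
left, right = binauralize(click, 90)
print(len(left), len(right))
```

At an angle of 0 the two channels come out identical, which is exactly the situation you get with plain mono on headphones.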

That is the simple principle, but as you can imagine, these algorithms are not only CPU-hungry, they are also only approximately correct. Every person has a different ear shape and head size, which is why the calculation usually works with average values. But if the renderer knows what your own ears look like, it can be adjusted. This is called personalized HRTF, and Apple was in fact one of the first to make the setup process of taking pictures of your own ears socially acceptable. From these pictures, a 3D model is computed that shapes the sound the way each individual’s head and ears do.

In-head localization

Let’s move a bit away from technology and towards perception. What spatial audio wants to create via headphones is precisely the impression that the sound is happening around us and that we are more or less at the center of the action. But isn’t that always the case?

Not quite, because I’m going to tell you about a problem that you didn’t know was a problem. When you listen to a podcast on headphones, for example, you are usually listening to a mono signal. The signal is output on the left and right channels at the same time, so the sound arrives at both eardrums simultaneously, and moving it around doesn’t change anything as long as it stays mono. The differences in level, time and frequency are all zero, which leads our brain to the logical conclusion: the sound must be inside our head. This is called in-head localization.

In-head what?! Sounds abstract, so just put on headphones and watch the following video: 3D Audio Demonstration. Mono is always perceived in the middle of our head via headphones. Even with stereo, you can only move the sound source to the left or right, and our brain still knows that the sound comes from the headphones. There are microphone techniques like ORTF which, like an artificial head, make use of the distance between two microphones. With these, a certain spatiality can already be achieved in stereo. You kind of spatialize stereo.

But only when all three parameters of spatial hearing are fulfilled does the feeling really arise that the sound is coming “from outside”. I also like to say that it feels like you’re not wearing headphones at all. It has happened to me many times that I listened to 3D content through headphones, thought the sound was coming from my speakers and took the headphones off, only to find myself sitting in a silent recording studio and realizing: the speakers weren’t even on, the headphones had tricked me. That’s the magic of 3D audio.

How is 3D sound created?

There are two ways to do this: you can record the sound three-dimensionally from the start with special microphones, i.e. the right hardware. Or you can take existing mono recordings and artificially add spatiality to them.

Hardware: Microphones

Recording 3D sound is nothing new, by the way. Some of you may be familiar with artificial head recordings. These are created with a microphone that is modeled on the human hearing apparatus and actually has ears, like a mannequin with a microphone in each ear. The prime example of how creatively such a microphone can be used is the Virtual Barbershop: is.gd/virtual_barbershop. ASMR creators also like to use similar microphones and get millions of views with them.

A microphone with which you can record spatial audio tracks

Unfortunately, such a dummy head has a big disadvantage: what works great on headphones does not work at all when played back on speakers. Feel free to make the comparison yourself. To record sounds three-dimensionally in such a way that they also work well on loudspeakers, other methods are more suitable, for example Ambisonics, ORTF-3D from Schoeps or the Sennheiser Ambeo VR Mic. Here 4, 8 or more microphones point in all directions, a bit like a 360° camera where several lenses form a sphere. If you want an overview of current devices, I have compiled one here, neatly arranged: vrtonung.de/360-mikrofone-3d-audio-recording-overview

Once this sound field has been captured, it can usually be reproduced quite flexibly afterwards on 1, 2, 4 or 8 loudspeakers placed around you. But you can already guess that this is a bit unwieldy. And what if I want to create scenes to which I can’t just drag a microphone array? Or locations that don’t even exist on Earth?

Software: Spatializer

This is where the second variant for creating spatial audio content comes in: the appropriate software, often called a “spatializer”. There are various plugins that take a mono signal as input and place it in an artificial space, e.g. right behind us or coming from above. These directional audio filters give the sound three-dimensional information. The software then only has to convert the signal in real time so that, over headphones, it creates the illusion of the sound being located exactly at this position. The free IEM Plug-in Suite from the Institute of Electronic Music and Acoustics (IEM) in Graz can be used for this.
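One common way to attach a direction to a mono signal is first-order Ambisonics encoding. The sketch below is not taken from any specific plugin. It assumes the AmbiX convention (channel order W, Y, Z, X with SN3D normalization), where the azimuth runs counterclockwise, so positive angles are to the left.

```python
import math


def encode_first_order(sample, azimuth_deg, elevation_deg):
    """Encode one mono sample into first-order Ambisonics (W, Y, Z, X).

    Assumed convention: AmbiX (ACN order, SN3D normalization),
    azimuth positive = left, elevation positive = up.
    """
    az = math.radians(azimuth_deg)
    el = math.radians(elevation_deg)
    w = sample                               # omnidirectional part
    y = sample * math.sin(az) * math.cos(el)  # left/right
    z = sample * math.sin(el)                 # up/down
    x = sample * math.cos(az) * math.cos(el)  # front/back
    return w, y, z, x


# "Right behind us, coming from above": 180 degrees azimuth, 45 degrees up.
print(encode_first_order(1.0, 180, 45))
```

The four resulting channels describe the direction independently of the playback system. A decoder then turns them into headphone or loudspeaker signals.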

If you want to use multiple speakers, the software needs to know what kind of configuration you are using, for example 5.1, 7.1 or 7.1.4 (seven speakers at ear level, one subwoofer, four speakers hanging from the ceiling). Again, this calculation happens in real time, and you can have a mono object flying around your head in three dimensions. Probably the best-known software is Dolby Atmos. At the end of the day it is a panner (the knob that you turn left or right in stereo), except that now you also have two additional controls for front/back and up/down.
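The layout labels are easy to read once you know the pattern. Here is a small sketch that splits a label into its parts; the function name and the dictionary keys are my own invention for this example.

```python
def parse_speaker_layout(layout):
    """Split a layout label such as "7.1.4" into speaker counts.

    The first number is the ear-level speakers, the second the
    subwoofers, the optional third the height speakers.
    """
    parts = [int(p) for p in layout.split(".")]
    if len(parts) not in (2, 3):
        raise ValueError(f"unexpected layout label: {layout!r}")
    ear_level, subwoofers = parts[0], parts[1]
    height = parts[2] if len(parts) == 3 else 0
    return {
        "ear_level": ear_level,
        "subwoofers": subwoofers,
        "height": height,
        "total": ear_level + subwoofers + height,
    }


for label in ("5.1", "7.1.4", "9.1.6"):
    print(label, parse_speaker_layout(label)["total"])
```

For 7.1.4 the total is 12, which matches the studio setup mentioned earlier.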

Someone using spatial audio algorithms to mix a good-sounding track

That gives us everything we need, doesn’t it? Not quite. So far, we’ve simply moved a sound around in a virtual space. But which space? That’s the sticking point, because for simplicity’s sake we have been assuming an anechoic chamber. With that assumption, headphone playback only works moderately well: you get the feeling that the sound sources are very close to your head, but not spatial. That is why Dolby artificially adds a studio/cinema reverb.
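To give an idea of what "adding a room" means in code, here is one of the simplest building blocks of artificial reverb, a feedback comb filter that adds decaying echoes to a dry signal. This is only a sketch of the general idea and not Dolby's method. Real reverbs combine many such delays or use measured room responses.

```python
def feedback_comb(dry, delay_samples, feedback=0.5, tail_samples=0):
    """Feedback comb filter: y[n] = x[n] + feedback * y[n - delay_samples].

    tail_samples appends silence so the echoes have room to ring out.
    """
    if delay_samples < 1:
        raise ValueError("delay_samples must be at least 1")
    out = list(dry) + [0.0] * tail_samples
    # Walk forward so each output sample reuses already finished samples.
    for n in range(delay_samples, len(out)):
        out[n] += feedback * out[n - delay_samples]
    return out


# A single click turns into a series of echoes, each half as loud.
print(feedback_comb([1.0], delay_samples=2, feedback=0.5, tail_samples=4))
```

With a feedback value below 1, every repetition is quieter than the last, so the tail fades out like a sound in a room.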

Conclusion: spatial audio

In summary, 3D audio is an innovative and exciting technology that allows us to create complex, immersive and realistic soundscapes. I hope this article has clarified the most important questions, because we will build on these basics in the next article when it comes to applications.

You still have questions? No problem! I’m really excited to hear what people have always wanted to know about the topic. Are you ready to start your own journey with 3D audio technology? Stay tuned for the next article where we dive deeper into this fascinating technology.
