Dynamic Head Tracking – Spatial Audio for 3D Surround Headphones

    Recently, I have made a few predictions that came true shortly afterwards, such as the implementation of 3D audio for video calls like Zoom: Clubhouse has just come forward with it.

    There, I already touched on the term head tracking in the audio context, because this technology is a real booster for localization and spatial hearing. Most people know the feeling of a sound field that dynamically adapts to the movement of the head, if at all, from VR glasses. So my prediction for today is:

    "Head-tracking implementation will be virtually standard for headphones in the future for 3D audio. Just like ANC (active noise canceling) is usually built into hi-fi products today."

    But everything that is needed to make head tracking happen in the audio sector can also be realized without an HMD (head-mounted display). In fact, there are already millions of headphones on the market that can do just that! Crazy, right?!

    Only very few consumers know about this, because there is still very little 3D audio content that uses the potential. Yet you could already fill these augmented audio reality devices with great content – which is one of the things I’m working on right now.

    What is Head-Tracking for Audio?

    You probably know what head tracking is, right? It’s a technology, or rather a group of technologies, with the same goal: to track head movement in one way or another. For audio, this means capturing head movement dynamically so that spatially distributed content can react to it. Most of the time we see it linked to visual media like VR or 360° videos. So if it helps you to understand the technology: think of a virtual reality headset and remove the visuals. You end up with sound via headphones that adapts to wherever you are looking.

    Apple calls this technology something like dynamic head tracking, or describes it as sound that adjusts based on your head movement and comes from all around you, for AirPods and Beats. Bose called its technology Bose AR. The term Augmented Audio Reality also came up in this context. So you can see that the terms are still vague, but the content for head tracking is coming.

    But here is the thing: good story-driven visual experiences usually need audio as well. So there you already have part of the answer to the headline above. Immersive digital visual experiences also rely on immersive audio. That’s why head tracking should target not only the visual aspect, but also the listening experience.

    How does Head-Tracking with audio work?

    Without going into technical details (we’ll get to that later, don’t worry), it works similarly to visual head tracking. When we move our heads, the field of view usually behaves like the real world, meaning that objects that are meant to be still stay where they are even if we turn our head. And the same goes for audio. That’s one of the advantages of object-based and sound-field-based 3D audio formats: they are really versatile. This allows for lots of new possibilities, of which head tracking is just one, as the dedicated reader of this blog knows!
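
    To make the “objects stay where they are” idea concrete, here is a minimal Python sketch. It assumes angles in degrees with counter-clockwise as positive (so +90 is to the left); the function names are made up for this illustration and don’t come from any particular renderer.

```python
def wrap_degrees(angle: float) -> float:
    """Wrap an angle to the range [-180, 180)."""
    return (angle + 180.0) % 360.0 - 180.0


def relative_azimuth(source_azimuth: float, head_yaw: float) -> float:
    """Direction of a world-fixed source as heard from the rotated head.

    Both angles are in degrees, counter-clockwise positive (left = +90).
    The source stays put in the room, so the head rotation is subtracted.
    """
    return wrap_degrees(source_azimuth - head_yaw)


# A source straight ahead (0 degrees) while the head turns 30 degrees to the left:
# the renderer has to place the sound 30 degrees to the right of the nose.
print(relative_azimuth(0.0, 30.0))  # -30.0
```

    A renderer would recompute this relative angle on every tracker update and hand it to its binaural panner.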

    What about examples?

    I’ve already mentioned VR and 360° videos and how proper experiences in this field should be: with good audio. This means that the audio should behave like the image in order to merge into an immersive experience. I had the pleasure of traveling to Kenya to create a 360° video with my team. You can watch it and read about it here.

    This is “just” a pretty simple demo using Ambisonics in combination with an immersive VR video on Facebook360. Guess what, it has been technically possible to distribute this type of sound experience since 2016. YouTube has supported first-order Ambisonics (FOA) since then. Facebook uses a so-called hybrid higher-order Ambisonics format (tbe). You can find the details here.
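
    For readers who wonder what head tracking does to an Ambisonics signal: with first-order Ambisonics, the whole sound field can be rotated with a small 2×2 rotation. The sketch below is a simplified illustration, not the code of any platform. It only handles the horizontal components W, X and Y (the height component Z is unchanged by a pure yaw rotation), it assumes X points forward and Y to the left, and it ignores the channel ordering and normalization conventions of real file formats.

```python
import math


def encode_foa(sample: float, azimuth_deg: float):
    """Encode a mono sample into horizontal first-order components (W, X, Y)."""
    az = math.radians(azimuth_deg)
    return sample, sample * math.cos(az), sample * math.sin(az)


def rotate_foa_yaw(w: float, x: float, y: float, head_yaw_deg: float):
    """Counter-rotate the sound field by the head yaw (degrees, left = positive)."""
    a = math.radians(-head_yaw_deg)
    x_rot = x * math.cos(a) - y * math.sin(a)
    y_rot = x * math.sin(a) + y * math.cos(a)
    return w, x_rot, y_rot  # W is omnidirectional and does not change


# Source in front, head turned 30 degrees to the left.
w, x, y = rotate_foa_yaw(*encode_foa(1.0, 0.0), head_yaw_deg=30.0)
print(math.degrees(math.atan2(y, x)))  # about -30: the source now sits to the right
```

    The nice part is that the rotation costs the same no matter how many sources are mixed into the sound field.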

    So audio head tracking has already been in use for years. Now this technology is being built into headphones. I like to consider it the new ANC (active noise canceling), which was also quickly integrated into newer products. But with headphones you don’t need your eyes, right? So let’s have a closer listen to what our ears can do:

    Is Audio Head-Tracking only used together with visual Media?

    The vast majority of audio head tracking applications work in combination with visual media. An example of an audio-only head tracking application would be AR soundwalks, but there is currently not really a market for this; it’s interesting but very niche. However, Apple just recently announced dynamic head tracking support for its music streaming service Apple Music.

    Head tracking support for Apple Spatial Audio is actually not that new: the VOD (video on demand) sector has offered head tracking since the inception of Apple’s Spatial Audio, which is, to be fair, still quite new. You can learn more about that here.

    Something to point out here is that in the context of VOD the head tracking is audio-only, which means that only the audio reacts to head movements; the video doesn’t. This underlines the relevance of audio head tracking.

    To come back to the announced head tracking support for Apple Music, it’ll be exciting to see how this will impact the audio consumer market. In my opinion there is room for improvement regarding spatial audio with Apple Music, which you can read about here. But the introduction of head tracking could, and probably will, be quite an improvement!

    Do I need multichannel audio for head-tracking?

    The short answer is no, but we need to clarify that a bit. There are currently two possibilities for audio head tracking available:

    Some devices, like the Galaxy Buds Pro earphones, use an upmix algorithm where mono or stereo sources are put in the virtual space. This reminds me of the infamous 8D Audio where mono or stereo content gets panned around the head with a binaural panner. The difference with the Galaxy Buds Pro, for example, is that the panning depends on the head movement. So it’s not an actual surrounding 360° experience. This is not very beneficial for media like music or film, but works very well for communication, like video calls.

    The other possibility is the more surrounding one with either Ambisonics, object-based formats, or multichannel formats. For 360° videos, Ambisonics is the go-to format. In the VOD and music streaming sector, it depends: without dissecting which format is used where, it can be said that Dolby is a big player here, and it’s not always clear which format is actually used. However, when it comes to object-based audio formats, Dolby Atmos and MPEG-H are prominent examples. Also, 5.1 surround is still a good option. In the end, everything has to be converted to binaural audio anyway, so a lot of different audio formats can be used for head tracking.

    What do I need Head-Tracking for?

    Most of the time head tracking is used for navigation and/or control. When we look at VR applications or 360° videos, we control the field of view and the sound field by moving the head in order to navigate through the virtual world. Here we can distinguish between so-called “Three Degrees of Freedom” (3DoF) and “Six Degrees of Freedom” (6DoF). 3DoF means that you can turn your head in all directions but cannot change your position in the virtual world. This is usually the case in 360° videos. With 6DoF you can also move around in the virtual realm, like in many VR experiences.
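
    The difference between 3DoF and 6DoF can be shown with a few lines of Python. This is a deliberately flattened sketch: it works in 2D (x forward, y to the left, in meters), uses only yaw out of the three possible rotations, and the class and function names are invented for this example.

```python
import math
from dataclasses import dataclass


@dataclass
class ListenerPose:
    yaw_deg: float = 0.0  # rotation: tracked with 3DoF and 6DoF
    x: float = 0.0        # position: only tracked with 6DoF
    y: float = 0.0


def source_relative_to_head(pose: ListenerPose, source_x: float, source_y: float):
    """Return (azimuth in degrees, distance in meters) of a source for this pose."""
    dx = source_x - pose.x
    dy = source_y - pose.y
    distance = math.hypot(dx, dy)
    world_azimuth = math.degrees(math.atan2(dy, dx))
    azimuth = (world_azimuth - pose.yaw_deg + 180.0) % 360.0 - 180.0
    return azimuth, distance


# 3DoF: the head turns 90 degrees left, the distance to the source never changes.
print(source_relative_to_head(ListenerPose(yaw_deg=90.0), 2.0, 0.0))  # (-90.0, 2.0)

# 6DoF: the listener also walks one meter toward the source, so it gets closer.
print(source_relative_to_head(ListenerPose(x=1.0), 2.0, 0.0))  # (0.0, 1.0)
```

    With 3DoF only the angle changes; with 6DoF the distance changes too, which a renderer could translate into level and reverb changes.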

    Apart from controlling virtual surroundings, head tracking can be used in security, for example by using certain movement patterns as passwords. For now, we’d consider audio head tracking as 3DoF, since it captures rotation but not translation in space. I say for now because positional tracking may well become possible in the future, similar to the inside-out tracking of VR headsets.

    What are the benefits of Head-Tracking?

    Head tracking allows us to have realistic virtual experiences. With regard to spatial audio, the tracking of head movements enhances the sense of immersion further. In the real world, we use small head movements to localize sounds, and head tracking lets us do that in the virtual realm. That’s also why the launch of head tracking on Apple Music could be a hit, but we’ll see.

    Also, head tracking allows us to navigate or control freely without using our hands. As humans, we are used to experiencing sound in three dimensions. And we know how the sound field around us changes when we move our head. Now, this feeling can finally be rendered in real time. Hence the new discussions on that.

    Another big benefit of head tracking and 3D audio is the reduction of Zoom fatigue. When you hear a room full of voices all coming from the same source (speakers or headphones), it takes the brain a lot of effort to decode them, since with mono the brain localizes the sound “inside our head”, also called in-head localization. This cognitive overload is a key component of Zoom fatigue, as explained in this paper.

    Head tracking can tackle that problem by adding direction to the voices, so they appear to come from multiple sources, which matches what your eyes are telling you. Since this seems so obvious yet is underrepresented in applications, I dedicated an in-depth article to this topic.
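
    As a small illustration of “adding direction to the voices”, the following Python sketch spreads call participants evenly over an arc in front of the listener. The 120° arc, the names and the function itself are assumptions for this example and not how any specific conferencing app does it.

```python
def spread_voices(names, arc_deg: float = 120.0):
    """Assign each participant an azimuth (degrees, left = positive) on a frontal arc."""
    if len(names) == 1:
        return {names[0]: 0.0}
    step = arc_deg / (len(names) - 1)
    # Start at the left edge of the arc and move to the right, one step per voice.
    return {name: arc_deg / 2 - index * step for index, name in enumerate(names)}


print(spread_voices(["Ana", "Ben", "Cleo"]))
# {'Ana': 60.0, 'Ben': 0.0, 'Cleo': -60.0}
```

    Head tracking then keeps each voice at its assigned spot in the room while the listener turns toward whoever is speaking.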

    Where else is Head-Tracking used?

    Head tracking is also used in different military technologies. Pilots, for example, use it to gain further control over their aircraft: they can operate their cockpit faster, or hear where an enemy is located, as stated here.

    While music and films seem to be the most relevant options, I’d say they make up just a small portion of the potential, even if we only talk about entertainment. Games have clearly been using 3D audio for a while, for instance to hear an enemy approaching from behind. That is literally what pilots will do with spatialized sound in the future, just as stated above, yet less harmful.

    Also, conferences will benefit a lot from a realistic feeling of conversation in the virtual world. Since we have more degrees of freedom than in a Zoom call, audio also has more dimensions. This is where people like to drop the term “metaverse”. If you want to know more about how to communicate in digital realities, check out this post.

    Is head-tracking the game-changer for 3D spatial audio?

    It depends on the content. There are use cases that I’m really excited about, like 360° sound experiences, or even watching a film with head tracking, as described above for several apps.

    But there are scenarios where even I, a huge enthusiast, am skeptical, namely spatial 3D music. It’s a big topic with Dolby Atmos and Apple Music at the moment, so find out the real talk behind this.

    Some people say a personalized HRTF will be the holy grail of spatial audio for headphones. Others claim it’s head tracking. But there is way more to it, like the frequency curves of headphones, amplifiers, loudness. And guess what: the content. I just can’t stress enough that this is where we can catch people’s attention. In the end, I really like head tracking because it helps fight front-back confusion in localization. With it, you get good help in hearing all the sounds of a scene. But finding out where this is useful and better than stereo is where it gets interesting – and I’ll gladly show you how.
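
    Why does head tracking help with front-back confusion? A rough numerical illustration, using a very simplified sine model of the interaural time difference (ITD). The ear distance of 0.18 m is an assumed round number, and real HRTFs are far more complex than this formula.

```python
import math

EAR_DISTANCE_M = 0.18       # assumed distance between the ears
SPEED_OF_SOUND_MPS = 343.0  # speed of sound in air at room temperature


def itd_seconds(azimuth_deg: float) -> float:
    """Simplified ITD for a source at the given azimuth.

    0 = front, 90 = left, 180 = behind. Positive means the left ear hears it first.
    """
    return EAR_DISTANCE_M / SPEED_OF_SOUND_MPS * math.sin(math.radians(azimuth_deg))


def itd_after_head_turn(source_azimuth_deg: float, head_yaw_deg: float) -> float:
    """ITD of a world-fixed source after the head has turned (left = positive)."""
    return itd_seconds(source_azimuth_deg - head_yaw_deg)


front_left, back_left = 30.0, 150.0

# With a still head, both sources produce practically the same time difference.
print(itd_seconds(front_left), itd_seconds(back_left))

# After turning the head 10 degrees to the left, the two values move apart:
# the front source's ITD shrinks, the back source's ITD grows.
print(itd_after_head_turn(front_left, 10.0), itd_after_head_turn(back_left, 10.0))
```

    Without tracking, the signal at the ears never changes when you turn your head, so this cue is simply missing.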

    What hardware is already out there?

    Regarding head tracking hardware we can distinguish between built-in, or internal, and external technology. Internal head tracking is common in head-mounted displays but also in more and more headphones. Gyroscopes and accelerometers inside the device measure the head movements.
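
    To give a feeling for what “measuring head movements” means, here is a minimal Python sketch that turns gyroscope readings into a yaw angle by summing them up over time. The sample values are invented, and real devices combine several sensors (sensor fusion) because pure integration drifts over time.

```python
def integrate_yaw(gyro_rates_dps, sample_interval_s: float, start_yaw_deg: float = 0.0):
    """Integrate yaw rates (degrees per second) into a list of yaw angles (degrees)."""
    yaw = start_yaw_deg
    trace = []
    for rate in gyro_rates_dps:
        yaw += rate * sample_interval_s  # angle change during one sample interval
        trace.append(yaw)
    return trace


# Ten samples at 100 Hz while the head turns at 90 degrees per second.
trace = integrate_yaw([90.0] * 10, sample_interval_s=0.01)
print(round(trace[-1], 3))  # roughly 9 degrees after 0.1 seconds
```

    Any small error in the measured rate is summed up as well, which is why the yaw of a tracker without an absolute reference slowly wanders off.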

    External head-tracking devices are usually attached to the headphones or the headset. There are of course different technological approaches for those devices. Some work with infrared cameras, others work with the mentioned gyroscopes, and so on. Also, software plays a big part in head tracking since the localization data has to be interpreted. With the right software, even a standard webcam can become a head tracker.

    What external head tracking devices are out there?

    Head-Tracker | Description
    Waves NX Head-Tracker | This head tracker works with the mentioned NX Virtual Mix Room software, see Audeze Mobius.
    AudioEase | AudioEase’s head tracking works with the 360pan Suite software, which is compatible with an inexpensive accelerometer and gyroscope, easily available from Amazon.
    3D Sound Labs | The company produces head-tracking modules and related software. They also launched head-tracking-enabled headphones, the 3D Sound ONE, back in 2017.
    MrHeadTracker | This is a Do-It-Yourself head tracker based on the Arduino platform. The cost of the device is just 25€.
    Supperware | The elongated head tracker from Supperware can be easily attached to the headphones and comes with accompanying software.
    MMR – METAMOTIONR | There is already a newer version of it called MMRL – METAMOTIONRL.
    NVSonic Headtracker NYU | It can work with OSC data, similar to NXOSC by audioo.com.
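
    Since OSC comes up here: an OSC message is just an address, a list of type tags and the values, packed into bytes. The Python sketch below builds such a message by hand with the standard library. The address "/head/ypr" and the yaw/pitch/roll layout are hypothetical and not the format of any tracker listed above; check the documentation of your device or plugin for the real address.

```python
import struct


def osc_pad(data: bytes) -> bytes:
    """Null-terminate an OSC string and pad it to a multiple of four bytes."""
    data += b"\x00"
    return data + b"\x00" * (-len(data) % 4)


def osc_message(address: str, *values: float) -> bytes:
    """Build an OSC message that carries only 32-bit float arguments."""
    type_tags = "," + "f" * len(values)
    payload = struct.pack(f">{len(values)}f", *values)  # big-endian floats
    return osc_pad(address.encode("ascii")) + osc_pad(type_tags.encode("ascii")) + payload


# Hypothetical address and argument order: yaw, pitch, roll in degrees.
message = osc_message("/head/ypr", 30.0, 0.0, 0.0)
print(len(message))  # 32
```

    The resulting bytes could be sent over UDP to a plugin that listens for head tracking data.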

    If you want to know which headphones and earbuds are already using head tracking for spatial audio, click here. Spoiler: Apple, Yamaha, Samsung, and Bose are already in – so that’s huge.

    Millions of headphones with head tracking have already been sold on the market. But hardly any consumer is even aware of owning this technology. And manufacturers struggle to find decent content and use-cases to benefit from the infrastructure. Let’s change that!

    What about the audio content?

    It’s a common topic in spatial audio: the hardware and technology are already available to a great extent, but the content not so much. It has to be understood that it’s a duality: good content profits from good technology and vice versa. Therefore it’s time to get creative and explore what’s even possible with the already established hardware. Sure, there is already pretty cool stuff and good content out there, but oftentimes the availability or compatibility on different devices and/or platforms is a problem. And it’s often not clear what actually makes content ‘good’. So I think there is still a lot to explore! Further, we shouldn’t forget that these technologies are still in their infancy. Stereo, for example, came up in the middle of the last century, and it took quite some time to figure out how to deal with it in a ‘proper’ way.

    How can head-tracking be done right?

    But how do we know if head tracking is done right? In an ideal world, consumers will use the technology without even thinking about it. But in reality, there are still a lot of constraints to take care of. That’s why I’m glad to have been working with CEVA here to shine more light on this complex topic.

    They offer a complete 3D audio reference design. It is used for developing headsets and True Wireless Stereo (TWS) earbuds that support spatial audio. The resulting product can then serve the use cases listed above, such as gaming, multimedia, and conferencing.

    To get everything working, you just need to plug in any pair of stereo headphones. Additionally, you connect a Bluetooth device like a smartphone as a playback device. Now you can play some music from your phone and experience it with head tracking, meaning you hear it as if it were played back on speakers in your room, just like you are used to from reality when not wearing headphones.

    What do you hear when using spatialized stereo?

    So in this case, we use a stereo input that is upmixed. Apple now refers to this as “spatialize stereo”. When you watch a movie, for instance, the soundtrack can be just a 2.0 stereo mix. This means you don’t need a fancy multichannel file in 5.1 or Dolby Atmos. Honestly, I was skeptical about this since the Samsung Galaxy Buds Pro didn’t really do a good job with it.

    But I learned something new, as I felt like the music didn’t lose its original punch. In my opinion, losing that punch is the main problem with 3D music, as I state regarding Apple Music. The goal of externalization works against in-head localization: it is supposed to give you the feeling of hearing music outside of your headphones.

    With most algorithms, the bass suffers from artifacts and sounds worse than in normal stereo. With CEVA’s reference design, I had for the first time ever the feeling that the music didn’t lose its timbre. It was close to what I like from non-spatial audio, but still didn’t feel like it was “in my head”.

    The reason for it can be found in its software and hardware. The reference design leverages Beken’s BK3288X Bluetooth Audio SoC series, featuring the CEVA-X2 Audio DSP running VisiSonics’ RealSpace® 3D audio software together with CEVA’s MotionEngine™ head tracking algorithms. This is where the sweet spot is hit: practically latency-free hardware combined with software that not only works in a technical manner for the sake of it, but also affects the overall sound experience in a good way. Here is a little demo video:

    Latency

    As mentioned, another key component for a smooth head-tracking experience is latency. So far, most of the devices I tested have a noticeable delay when you move your head. Which means the sound moves slightly after you move your head. Still, I’d say that most consumers wouldn’t even hear this.

    The problem lies in the sensors, which literally need some time before even noticing the movement, since they sample at fixed time intervals. If a sensor checks for movement every 10 milliseconds, the sampling alone can add up to 10 milliseconds of latency. But here, as much as I tried to rotate my head, fast and slow, I felt like the tracking was always on point.
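
    A quick back-of-the-envelope calculation shows what such a delay means in practice. The sketch below only looks at the delay caused by the polling interval and ignores everything else in the chain (Bluetooth, rendering, buffering); the head speed of 100° per second is an assumed example value.

```python
def sampling_delay_ms(poll_interval_ms: float):
    """Delay added by polling alone: the movement can start right after a check."""
    return {"worst_case": poll_interval_ms, "average": poll_interval_ms / 2}


def angular_lag_deg(head_speed_dps: float, latency_ms: float) -> float:
    """How far the sound field trails behind the head at a given rotation speed."""
    return head_speed_dps * latency_ms / 1000.0


print(sampling_delay_ms(10.0))       # {'worst_case': 10.0, 'average': 5.0}
print(angular_lag_deg(100.0, 10.0))  # 1.0 degree behind the head
```

    One degree is tiny, which fits the impression that many listeners won’t notice it; the lag grows in proportion to the total latency and to how fast the head moves.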

    This highly optimized hardware plus software solution offers OEMs and ODMs (original equipment/design manufacturers) a cost-effective, ready-to-deploy SoC (System-on-a-Chip). It works with any audio file format, for VR, AR, and the new generation of motion-aware earbuds where 3D audio enhances the overall user experience.

    Conclusion on a reference design

    When I saw the reference design kits for the first time I had a little flashback to my informatics studies when working with circuit boards. I usually just know market-ready products that somehow output sound magically. So it was super interesting for me to see the step before even making it into a product.

    Regarding features, it had everything I needed to test it. I could – sort of – plug and play my sound files, and there is even a button to turn off the 3D audio upmixing. This allows instant comparisons. But of course, you can dig a little deeper and ask for different HRTFs to tweak the output as you wish.

    The 3D Audio Reference Design is available directly from CEVA. The associated software package, combining VisiSonics’ RS3D with CEVA’s MotionEngine™, is likewise available now for licensing from CEVA.

    What else is possible in addition to the sound rotation?

    We learned that head tracking has lots of potential in audio and beyond. But what else could be done with it? The new T5 II True Wireless ANC earphones by Klipsch, for example, offer gesture control. This means you can take phone calls just by nodding your head or skip songs with different movement patterns. And the feature is expected to gain further functionality in the future.
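
    How could a nod be recognized from head tracking data? Below is a minimal Python sketch with two thresholds. It is my own simplified illustration and not how Klipsch or any other manufacturer implements it; the threshold values and the sign convention (looking down is negative pitch) are assumptions.

```python
def count_nods(pitch_deg, down_threshold: float = -15.0, release_threshold: float = -5.0):
    """Count nods in a sequence of pitch angles (degrees, looking down is negative).

    A nod is counted when the head dips below down_threshold and then
    comes back up above release_threshold.
    """
    nods = 0
    head_is_down = False
    for pitch in pitch_deg:
        if not head_is_down and pitch <= down_threshold:
            head_is_down = True
        elif head_is_down and pitch >= release_threshold:
            head_is_down = False
            nods += 1
    return nods


print(count_nods([0, -8, -18, -20, -6, 0, -17, -3]))  # 2
print(count_nods([0, -8, -10, 0]))                    # 0, the head never dipped far enough
```

    Using two different thresholds keeps small wobbles around a single limit from being counted as several nods.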

    Sometimes it just needs creativity in how to use already available technology in order to find new use cases. But don’t be fooled by marketing terms. Forbes magazine thinks this gesture control is artificial intelligence (AI). Now you know better, it’s head-tracking 😉

    What does the future sound like?

    A big topic in this “tracking area” could be face and eye tracking. It can be used to further refine head tracking, but also as a standalone technology. Face tracking allows the analysis of facial expressions and can be used in VR experiences to translate the emotional state of the player, for example. Also, eye tracking can be used as another tool for navigation or control. Most of the time the main focus is on the visual aspect first when it comes to new VR technology, but I’m sure there are also exciting audio use cases to come in the future!

    As mentioned, the audio head tracking we are talking about here offers just three degrees of freedom. But it’s likely that we’ll have more freedom of movement in the future. We know what it sounds like when we move our body toward a sound source. This will be the next step in making augmented audio hyper-realistic.

    Conclusion on headtracking spatial audio

    As mentioned already at the beginning of this article, head tracking is used in a vast number of different areas and applications. Although it’s always about tracking head movements, the way it’s used is what makes it versatile and diversified. And I think that’s always an exciting part of innovative technologies: you don’t know what they can and will be used for in the future.

    We also saw that there is already a good amount of hardware and head tracking technology in general out there. This means it’s time for us content creators to explore those technologies and find out what’s possible with them. Needless to say, I’m this kind of explorer and if you need someone like me, don’t hesitate to ask!
