After years of steady growth, podcasts have become part of everyday life for many people in 2021. And the concept of an auditory information broadcast is anything but new: think of radio. There is now a podcast on just about every topic, so there is no shortage of content. What is new is the momentum behind 3D spatial audio technology.
iHeartRadio recently generated some buzz with coverage on engadget.com, gearnews.de and theverge.com: the company has expanded its production capabilities and has dozens of 3D audio projects planned for this year.
Will 3D audio now reach podcasts, radio, and audio drama? Right now, the topic is generating buzz again thanks to the Sony PlayStation 5, Apple AirPods Pro, and Samsung Galaxy Buds Pro. So let's take a look at what binaural 3D sound means for platforms like Spotify, SoundCloud and others:
It’s almost obvious that spatial audio is an asset to podcasts: immersive audio content allows listeners to find themselves in the middle of the action, whether it’s an interview, a story, or a radio play. As in many other areas, 3D audio offers more possibilities in terms of storytelling, immersion, and conception.
So much for the typical buzzwords of similar articles. But in this blog, of course, we take it a step further. Time to analyze the true added value for 3D podcasts from a technical and creative perspective.
In classical radio plays, the listener is usually a silent observer: you passively follow the action, and a narrator supplies additional information. With 3D audio, by contrast, you feel much more like part of the scene. The brain constructs a three-dimensional image of the scenery and understands where the protagonists are positioned around you.
Head tracking strengthens this effect even further, but more on that at the end of this post.
Arranging sounds spatially makes it easier for the brain to process this information quickly, because one layer of abstraction disappears. In a podcast, which is classically produced in mono or stereo, headphones make us localize the speaker in one place: inside our heads. But this contradicts our natural listening habits. When we meet our friends (after Corona), we don't hear their voices inside our heads either; we localize them in the room.
Put simply, this makes immersive audio productions more pleasant to listen to over long periods, because the brain is not constantly busy working out where the voices should be placed in space. A related phenomenon is increasingly known as Zoom fatigue: having other people's voices inside our heads all day is tiring. If those voices are externalized, i.e. perceived outside the head, the problem appears to be solved.
The use of a narrator's voice as a voice-over can still add value at certain points. My heat-map case study has already shown that combining spatial sound for the protagonists with a mono voice-over prevents confusion: listeners can immediately tell whether a person is part of the scene (and thus spatially locatable) or a narrator's voice with deliberate in-head localization.
In fact, iHeartMedia, one of the largest media companies in the U.S., has recognized that spatial audio is the future and recently announced that it is investing in 3D audio. Other content producers are increasingly jumping on this bandwagon as well, including yours truly, of course:
The following list is not exhaustive and will probably grow in the future. But together with my colleagues from HearOn, I keep an ear on it:
Those podcasts that already work with 3D audio mostly do so on the basis of binaural audio. This works well and is definitely better than plain stereo or even mono. But this is exactly where most newcomers fall into the first trap.
Playback platforms such as Podigee, Chimpify and Stitcher usually do not support more than two audio channels. A binaural mix fits into those two channels, but you must not make the mistake of treating it like ordinary stereo: it only plays back properly on headphones and is not suitable for loudspeakers. That unnecessarily restricts the target group and ignores upcoming playback scenarios such as smart speakers, soundbars, and car hi-fi systems.
In addition, there is no possibility of later personalization. The algorithms that generate three-dimensional sound for the human ear (keyword: HRTF, head-related transfer function) are generic; the calculation is based on an average head size and ear shape. In the future, playback will take our individual parameters into account, as the Sony WH-1000XM4 already does today.
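To make the HRTF idea more concrete, here is a minimal, illustrative sketch of how a mono voice can be placed at an angle around the listener. Real HRTF processing convolves the signal with measured head-related impulse responses; this toy version (all names and numbers are my own assumptions, not from any production tool) only models the two main cues such filters encode, the interaural time and level differences:

```python
import numpy as np

SPEED_OF_SOUND = 343.0   # m/s
HEAD_RADIUS = 0.0875     # m, roughly the average head radius generic HRTFs assume
SAMPLE_RATE = 48000

def binaural_pan(mono, azimuth_deg, sr=SAMPLE_RATE):
    """Crude binaural placement via interaural time and level differences.

    Positive azimuth = source to the listener's right. A real renderer
    would convolve with measured HRIRs instead of these two simple cues.
    """
    az = np.radians(azimuth_deg)
    # Woodworth's formula for the interaural time difference (seconds)
    itd = (HEAD_RADIUS / SPEED_OF_SOUND) * (abs(az) + np.sin(abs(az)))
    delay = int(round(itd * sr))
    # Simple level difference: the far ear is attenuated by up to 6 dB
    gain_near = 1.0
    gain_far = 10 ** (-6 * abs(np.sin(az)) / 20)
    delayed = np.concatenate([np.zeros(delay), mono])  # far ear: later, quieter
    direct = np.concatenate([mono, np.zeros(delay)])   # near ear: on time
    if azimuth_deg >= 0:   # source on the right, so the LEFT ear is the far ear
        left, right = gain_far * delayed, gain_near * direct
    else:
        left, right = gain_near * direct, gain_far * delayed
    return np.stack([left, right], axis=0)

# A 1 kHz test tone placed 45 degrees to the listener's right
t = np.arange(SAMPLE_RATE) / SAMPLE_RATE
tone = np.sin(2 * np.pi * 1000 * t)
stereo = binaural_pan(tone, 45)
```

Because the brain decodes exactly these delay and level cues, even this crude sketch pulls the voice out of the head and to one side; personalization would mean replacing the average head radius and gains with the listener's own measurements.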
The challenges mentioned above can be overcome with the right partner. And Next Generation Audio already offers another possibility for the next wow effect: Apple and Samsung, for example, now ship the head-tracking-enabled headphones mentioned earlier.
But what is the added value? Combined with audio content that is produced on an object basis, listeners can even move around within the sound scene: the whole scene adapts to their head movements, just like in real life. This takes the sound experience to the next level and is an absolute unique selling point.
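The core of this head-tracking trick is simple: in an object-based production, each sound object keeps a fixed position in the room, and the renderer subtracts the listener's head rotation before spatializing. The following sketch (function name and angles are illustrative assumptions, not a real renderer API) shows that compensation step for horizontal head turns:

```python
import numpy as np

def world_to_head(object_azimuths_deg, head_yaw_deg):
    """Compensate head rotation for an object-based audio scene.

    Objects are stored with fixed azimuths in the room (world coordinates).
    When the listener turns their head by head_yaw_deg, subtracting that yaw
    keeps the scene anchored in place instead of rotating with the head.
    """
    relative = np.asarray(object_azimuths_deg, dtype=float) - head_yaw_deg
    # wrap the result into the range (-180, 180]
    return (relative + 180) % 360 - 180

# A narrator at 0 degrees and two guests at +/-60; the listener turns 30 degrees right.
# The narrator now appears 30 degrees to the left, exactly as in real life.
rendered = world_to_head([0, 60, -60], 30)
```

In a binaural-only production this step is impossible, because the head-related cues are already baked into the two channels; only object-based (or scene-based, e.g. Ambisonics) material can be re-rendered per head pose.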
And since we are throwing around buzzwords here anyway: this approach is sustainable. At VRTonung, we can ensure that productions are future-proof and ready to serve other media as soundbars and smart speakers become widespread.
We can find out how to implement this for your next project in a free consultation. I'm looking forward to it.