By Martin Rieger, Immersive Audio Producer, VRTonung, and Dominik Zingler, Senior Audio Designer, Crytek
There can be little doubt that 3D audio can greatly enhance the gaming experience when designed well. Being able to hear sounds from all directions can make for a considerably more immersive and exciting game.
This article discusses the state of the art and future potential. However, there are also some pitfalls to consider when using 3D audio in games, so let’s have a look behind the marketing hype of this new thing sometimes called “Spatial Audio”.
The related scientific paper from the ACM Digital Library (Association for Computing Machinery) can be found here.
Three-dimensional audio is used to help a gamer experience sound as in real life: happening all around you, immersing you in the experience, making you forget you're even wearing headphones. This phenomenon is also called externalization.
It’s great to make a user understand which sounds are part of the scene (like footsteps, the ambiance, etc.). But don’t use 3D audio for the sake of it, thinking it’s always the best choice.
If not used carefully, 3D audio can even distract a gamer. Mono is still favored for narrators and stereo for music.
When wearing headphones, you localize such sounds not around you but within your head, where the sound waves travelling from the left and right earpieces meet. This is called inside-head locatedness (IHL).
We are used to this, but with the introduction of 3D audio it becomes clear that this listening habit can become a problem.
As a rule of thumb, everything that is diegetic (part of the gaming scene) should be 3D, while the rest of the audio content is non-diegetic. The key to making spatial audio work is to find an audio guideline where the user instantly understands which sounds have which contexts.
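The rule of thumb above can be sketched as a simple routing function. This is a hypothetical sketch; the bus names and `Sound` type are illustrative, not taken from any particular engine or middleware.

```python
from dataclasses import dataclass

@dataclass
class Sound:
    name: str
    diegetic: bool   # is the sound part of the game scene?
    kind: str        # "sfx", "music", "narration", ...

def pick_bus(sound: Sound) -> str:
    """Route a sound following the rule of thumb: diegetic content is
    spatialized in 3D, narration stays mono, and music stays stereo."""
    if sound.diegetic:
        return "binaural_3d"
    return "mono_voice" if sound.kind == "narration" else "stereo_music"

print(pick_bus(Sound("footsteps", diegetic=True, kind="sfx")))        # binaural_3d
print(pick_bus(Sound("narrator", diegetic=False, kind="narration")))  # mono_voice
```

In a real project the same decision usually lives in the mixer hierarchy of middleware such as Wwise or FMOD rather than in game code, but the principle is identical: the routing, not the asset, decides what gets spatialized.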
This is where companies such as Dolby, Microsoft and DTS currently get it wrong. They try to make everything 3D, but I say: the combination of formats (mono, stereo and surround) is key.
Whether 3D audio improves the gaming experience really depends on the scenario. For first-person (POV) or third-person games, the benefits are obvious.
You are moving through a 3D environment, so the sounds should also be three-dimensional, right? In a shooter game, it would allow you to pinpoint where exactly the enemy is firing from.
But imagine a strategy game where you are basically a God, watching your civilization grow from a top view. This is where stereo audio is perfectly fine and sounds from behind could be confusing.
So, what’s next? Gamers are used to staring at a screen in front of them.
But in our daily lives, our head makes continuous micro-movements that help us orient ourselves in our environment, even while watching a screen. This is where head-tracking introduces the next evolution of binaural 3D audio.
Slight movements help to distinguish sounds coming from the front from sounds coming from the back. Currently, this so-called front-back confusion (FBC) is a very real problem in binaural audio.
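To see why a small head turn resolves front-back confusion, consider a crude proxy for the interaural time difference (roughly proportional to the sine of the source azimuth). A front source and its back mirror produce the same cue while the head is still, but a head turn shifts their cues in opposite directions. The geometry below is a simplified sketch, not production HRTF code.

```python
import math

def relative_azimuth(source_az_deg: float, head_yaw_deg: float) -> float:
    """Azimuth of the source relative to the listener's nose, in (-180, 180]."""
    return (source_az_deg - head_yaw_deg + 180.0) % 360.0 - 180.0

def itd_proxy(az_deg: float) -> float:
    """Crude stand-in for the interaural time difference: ~sin(azimuth)."""
    return math.sin(math.radians(az_deg))

# A source at 30 deg (front-right) and its front-back mirror at 150 deg
# (back-right) are indistinguishable with a static head:
assert abs(itd_proxy(30.0) - itd_proxy(150.0)) < 1e-9

# Turn the head 10 deg to the right: the front source's cue shrinks,
# the back source's cue grows, and the ambiguity is resolved.
print(itd_proxy(relative_azimuth(30.0, 10.0)))   # ~0.342 (was 0.5)
print(itd_proxy(relative_azimuth(150.0, 10.0)))  # ~0.643 (was 0.5)
```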
Apple is already building spatial audio technology and head-tracking into their AirPods! So, there are millions of devices on the market already. The numbers are rising constantly. Gaming headsets are also starting to implement gyroscopes or webcam trackers as add-ons.
Not only does the audio experience with dynamic head-tracking feel more natural, it can even be used for gestures. Nodding or shaking your head are common forms of communication in our daily lives, but on larger screens they can also communicate where our attention is directed.
Combined with speech recognition, we'll have more interactive forms of gameplay. Artificial intelligence is getting smarter, and with more sensory inputs it can react even more naturally to user input, making the gaming experience more fluid and life-like.
As artificial intelligence and neural networks continue to evolve, so will the possibilities for game audio. Currently, we work with prerendered audio assets. But as tools like Midjourney and DALL·E show, audio could become more adaptive and personalized, meaning the generation of sound effects and music in real time. Every user has a slightly different perception and taste; with personalization, the game can sound more like the user wants it to, making it more realistic and believable for that user.
With this new hardware infrastructure, there will be a new market for games that put sound first. They could be audio-only, or use visual cues where your ears help your eyes unlock the next achievement. This will create a new way to immerse yourself in a game.
With most games, you simply face forward. Now, with head-tracking, you are free to move your head on multiple axes. Virtual reality (VR) experiences already take advantage of physical body movement. But what many of these virtual worlds are missing is true realism.
The urge for increased realism and – connected to this – higher fidelity drove a lot of technological advancements in the computer graphics industry, and especially in games.
This focus on graphic fidelity left few resources for research and development of more realistic in-game audio.
With diminishing returns on investments in graphic fidelity, more and more studios and developers are turning back to game audio, seeing its enormous potential to improve the user experience.
Topics like 3D audio rendering, realistic room simulation and sound propagation are getting more attention, with the aim of finally creating fully realistic audio in games.
Whilst some of these technologies can be "one-size-fits-all" solutions, like 3D audio rendering techniques, others, like room simulation and sound propagation, need flexibility built into their core so designers can tweak them further.
The major challenge, even with all of this impressive upcoming technology, will be adapting it to a game's creative vision. One of the major pitfalls in the drive for higher realism is that true acoustic reality is impossible, and often undesirable, to achieve.
The true goal should be an authentic, natural-feeling, high-fidelity auditory world grounded in psychoacoustics.
Audio designers will always have to creatively adapt a simulated environment to a game's needs. Sound attenuation, for example, is rarely based on a physically correct simulation of the decrease in sound pressure level and perceived loudness; it is more often an adapted curve tailored to the game's needs.
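As an illustration, compare a physically based inverse-distance law with the kind of designer-tailored curve games actually ship. All parameter values here (minimum and maximum distance, attenuation floor) are invented for the sketch.

```python
import math

def physical_attenuation_db(distance_m: float, ref_m: float = 1.0) -> float:
    """Inverse-distance law: level drops about 6 dB per doubling of distance."""
    return -20.0 * math.log10(max(distance_m, ref_m) / ref_m)

def game_attenuation_db(distance_m: float, min_d: float = 2.0,
                        max_d: float = 40.0, floor_db: float = -30.0) -> float:
    """A tailored curve: full level inside min_d, a silent floor beyond
    max_d, and a simple linear-in-dB rolloff in between."""
    if distance_m <= min_d:
        return 0.0
    if distance_m >= max_d:
        return floor_db
    t = (distance_m - min_d) / (max_d - min_d)
    return floor_db * t

# At 10 m the physical law already costs 20 dB; the game curve keeps the
# source clearly audible so distant gameplay cues still read.
print(physical_attenuation_db(10.0))  # -20.0
print(game_attenuation_db(10.0))      # ~ -6.3
```

Middleware such as Wwise and FMOD exposes exactly this kind of hand-drawn attenuation curve per sound, which is why "physically correct" is usually the starting point rather than the shipped result.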
Next Generation Audio features need to be carefully integrated into a game’s content with an extensive toolset that enables designers to fully achieve their creative visions.
While simulating sound obstruction in video games with simple ray-casting methods has been common for years, more sophisticated sound occlusion, absorption, transmission, and propagation methods built on raytracing are still uncommon.
This may stem both from a lack of available software solutions and, probably even more, from hardware limitations on the consumers' side. These systems tend to be cost-intensive to implement properly, both from a technological and a design perspective.
Implementing them offered little advantage over conventional approaches, as most players simply lacked the processing power to use them to full advantage, or at all.
Now that more people have access to high-performance hardware in the form of current-gen consoles or cloud computing via streaming, these technologies will start to seem more and more viable.
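The "simple ray-casting" baseline can be sketched in a few lines: cast one ray from listener to source and apply a fixed loss per intersected wall. The 2D geometry and the loss value are invented for illustration; a real engine would ray-cast against its physics scene instead.

```python
def _segments_intersect(p1, p2, p3, p4) -> bool:
    """True if 2D segments p1-p2 and p3-p4 properly cross (orientation test)."""
    def cross(o, a, b):
        return (a[0] - o[0]) * (b[1] - o[1]) - (a[1] - o[1]) * (b[0] - o[0])
    d1, d2 = cross(p3, p4, p1), cross(p3, p4, p2)
    d3, d4 = cross(p1, p2, p3), cross(p1, p2, p4)
    return d1 * d2 < 0 and d3 * d4 < 0

def occlusion_db(listener, source, walls, loss_per_wall_db=-9.0) -> float:
    """Single ray from listener to source; every wall it crosses adds a
    fixed attenuation. Crude, but cheap, which is why it has been the
    common approach for years."""
    hits = sum(1 for a, b in walls if _segments_intersect(listener, source, a, b))
    return hits * loss_per_wall_db

wall = ((5.0, -1.0), (5.0, 1.0))  # a wall crossing the line of sight
print(occlusion_db((0.0, 0.0), (10.0, 0.0), [wall]))  # -9.0
print(occlusion_db((0.0, 0.0), (4.0, 0.0), [wall]))   # 0.0
```

The raytracing-based systems the paragraph contrasts this with (e.g. Steam Audio's propagation features) trace many rays, account for material absorption and transmission, and find diffraction paths around obstacles, which is where the cost comes from.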
But even in its near-future state, the most advanced room simulation technology will still have scope for improvement, such as more sophisticated and optimized early-reflection systems that use all the advantages of 3D audio.
Exactly positioning the first-order early reflections, each not only spatialized but also given its own calculated occlusion value, will provide a more natural-sounding propagation of sound waves in a simulated environment.
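The standard way to position first-order early reflections is the image-source method: mirror the real source across each reflecting plane and treat the mirror image as a virtual source, whose path to the listener can then get its own occlusion check. A minimal sketch:

```python
def image_source(source, wall_point, wall_normal):
    """Mirror a 3D source position across a wall plane, given a point on
    the plane and its unit normal. The result is the virtual source whose
    direct path models the first-order reflection off that wall."""
    # signed distance from the source to the wall plane
    d = sum((s - p) * n for s, p, n in zip(source, wall_point, wall_normal))
    return tuple(s - 2.0 * d * n for s, n in zip(source, wall_normal))

# Source 3 m above a floor at y = 0 (normal pointing up): the floor
# reflection is heard as if from a mirrored source 3 m below the floor.
print(image_source((2.0, 3.0, 5.0), (0.0, 0.0, 0.0), (0.0, 1.0, 0.0)))
# (2.0, -3.0, 5.0)
```

Each virtual source can then be spatialized like any other 3D emitter and occlusion-tested along its own reflection path, which is exactly the per-reflection treatment described above.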
But what about headphones? They are fine for some players, but others don't want to wear them, perhaps because of how they feel after hours of play.
Companies like Audioscenic are already developing a solution. Imagine a soundbar sitting beneath your monitor. Normally, you'd hear that it is literally a few feet away from your ears. This is where beamforming comes in. Sounds fancy, and it is!
A webcam tracks where your ears are. With smart audio signal processing, the sound is "beamed" directly at your ear canals. Even though the sound comes from the front, your brain perceives it as also being localized to the left, or even all around you. The concept is called "invisible headphones" and sounds promising.
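Under the hood, steering sound from a soundbar toward a tracked ear comes down to per-driver delays, chosen so that every driver's wavefront arrives at the ear at the same instant (delay-and-sum beamforming). The 2D geometry below is a simplified sketch; Audioscenic's actual processing is of course far more sophisticated.

```python
import math

SPEED_OF_SOUND = 343.0  # metres per second at roughly room temperature

def steering_delays(driver_xs, ear_pos):
    """Delays (seconds) for a linear array of drivers at x-positions
    driver_xs (metres, along y = 0) so that their wavefronts sum
    coherently at the tracked ear position (x, y)."""
    dists = [math.hypot(ear_pos[0] - x, ear_pos[1]) for x in driver_xs]
    farthest = max(dists)
    # hold back the closer drivers so all paths line up at the ear
    return [(farthest - d) / SPEED_OF_SOUND for d in dists]

# Three drivers 20 cm apart, ear tracked 1 m away and slightly to the right:
delays = steering_delays([-0.2, 0.0, 0.2], (0.2, 1.0))
print(delays)  # farthest (leftmost) driver gets zero delay
```

As the webcam reports a new ear position, the delays are recomputed and the beam follows the listener, which is what makes the "invisible headphones" effect work without a fixed sweet spot.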
The games industry is always on the lookout for new ways to improve the gaming experience, and 3D audio is one way that seems to hold a lot of promise.
However, with the right combination of technologies, 3D audio can not only improve the gaming experience but create new ways to play using head-tracking. The computer graphics industry has been striving for realism for years.
Now it is up to game audio to lead the way to even higher fidelity, and 3D audio will only be the first step in achieving this goal.