Virtual Reality Audio Formats – Pros & Cons

Pretty much all technologies for playing back virtual reality audio on headphones fall into three categories: channel-based, object-based, and soundfield-based. Some examples before we go into detail:

channel-based: e.g. stereo, 5.1 Surround
object-based: e.g. Dolby Atmos, G’Audio Lab
soundfield-based: e.g. Ambisonics

Channel-based

Every channel of an audio file is routed to a fixed loudspeaker position for playback. Stereo has two channels, "left" and "right". 5.1 Surround has six channels: besides left and right, there is a center channel between them, two additional loudspeakers behind the listener, and an LFE channel for the subwoofer.
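To make the idea of fixed channel positions concrete, here is a minimal Python sketch. It assumes the common SMPTE/WAV channel order (L, R, C, LFE, Ls, Rs) and the nominal ITU-R BS.775 angles; other tools may order the channels differently. The stereo downmix uses the widely used -3 dB coefficients for center and surrounds and drops the LFE, which is one convention among several.

```python
import math

# Assumed channel order (SMPTE / WAV convention): index -> (name, azimuth in degrees).
# Nominal ITU-R BS.775 angles; in this sketch positive azimuth = to the left.
LAYOUT_5_1 = [
    ("L", 30.0),
    ("R", -30.0),
    ("C", 0.0),
    ("LFE", None),   # no fixed direction, feeds the subwoofer
    ("Ls", 110.0),
    ("Rs", -110.0),
]

MINUS_3DB = 1.0 / math.sqrt(2.0)  # about 0.707

def downmix_5_1_to_stereo(frame):
    """Fold one 6-sample 5.1 frame down to (left, right).

    Uses the common Lo/Ro coefficients: center and surrounds at -3 dB,
    LFE discarded. Real encoders may use other coefficients.
    """
    if len(frame) != len(LAYOUT_5_1):
        raise ValueError("expected 6 samples per 5.1 frame")
    l, r, c, _lfe, ls, rs = frame
    left = l + MINUS_3DB * c + MINUS_3DB * ls
    right = r + MINUS_3DB * c + MINUS_3DB * rs
    return left, right

# A sample that only exists in the center channel ends up equally in both sides.
print(downmix_5_1_to_stereo([0.0, 0.0, 1.0, 0.0, 0.0, 0.0]))
```

Each channel is only meaningful because of the position it is assigned to; move a loudspeaker and the mix changes.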

Binaural Stereo

Binaural means that you get a surround experience over headphones alone, so the format only needs two channels. It reproduces how humans hear, either by recording with special microphones (a dummy head with two artificial ears) or by calculating a downmix from a surround format using an HRTF (Head-Related Transfer Function).
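The HRTF downmix can be sketched as convolution: each loudspeaker signal is filtered with the left-ear and right-ear head-related impulse responses (HRIRs) measured for that loudspeaker's direction, and the results are summed per ear. This is a minimal sketch that assumes you already have HRIRs (for example from a public HRTF database); the two-tap responses in the example are made-up placeholders, not real HRTF data.

```python
def convolve(signal, kernel):
    """Plain full-length discrete convolution (length = n + m - 1)."""
    out = [0.0] * (len(signal) + len(kernel) - 1)
    for i, s in enumerate(signal):
        for j, k in enumerate(kernel):
            out[i + j] += s * k
    return out

def binaural_downmix(channels, hrirs):
    """Render loudspeaker channels to two ears.

    channels: list of per-speaker sample lists
    hrirs:    list of (hrir_left, hrir_right) pairs, one per speaker,
              for that speaker's direction (assumed to be given)
    returns:  (left_ear, right_ear) sample lists
    """
    if len(channels) != len(hrirs):
        raise ValueError("need one HRIR pair per channel")
    length = max(len(ch) + max(len(hl), len(hr)) - 1
                 for ch, (hl, hr) in zip(channels, hrirs))
    left = [0.0] * length
    right = [0.0] * length
    for ch, (hl, hr) in zip(channels, hrirs):
        for ear_out, hrir in ((left, hl), (right, hr)):
            for n, v in enumerate(convolve(ch, hrir)):
                ear_out[n] += v
    return left, right

# Placeholder HRIRs, NOT measured data: a speaker on the left reaches the left
# ear directly and the right ear one sample later and quieter (and vice versa).
toy_hrirs = [([1.0, 0.0], [0.0, 0.5]),   # speaker on the left
             ([0.0, 0.5], [1.0, 0.0])]   # speaker on the right
left_ear, right_ear = binaural_downmix([[1.0, 0.0], [0.0, 0.0]], toy_hrirs)
print(left_ear, right_ear)
```

The time and level difference between the ears is what the brain uses to localize the source, which is why the result only works on headphones and only for one fixed head orientation.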

Pro	Contra
Can be played back on any platform, just like a normal stereo file	Not suited for loudspeaker playback: technically possible, but it sounds odd
Quick way to check the sound of a spatial audio mix	No head-tracking: it only delivers the sound for one fixed viewing direction

5.1 Surround

Pro	Contra
Can be played back not only on headphones for VR but also on a conventional loudspeaker setup	Positions between the loudspeakers are only rendered as phantom sound sources
Simple setup, used and supported for decades	Only two-dimensional: no height information from above or below
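The "phantom sound source" in the contra column comes from amplitude panning: a sound between two loudspeakers is created by feeding both with different gains. A minimal sketch using the textbook tangent law with constant-power normalization, assuming a symmetric speaker pair at ±base_deg (for example ±30° for the front left/right pair); mixing tools may use other panning laws.

```python
import math

def tangent_law_gains(source_deg, base_deg=30.0):
    """Gains (g_left, g_right) for a phantom source between two speakers.

    Speakers sit at +base_deg (left) and -base_deg (right). The tangent law
    tan(phi) / tan(phi0) = (gL - gR) / (gL + gR) fixes the gain ratio; the
    gains are then scaled so that gL**2 + gR**2 == 1 (constant power).
    """
    if not -base_deg <= source_deg <= base_deg:
        raise ValueError("source must lie between the two loudspeakers")
    t = math.tan(math.radians(source_deg)) / math.tan(math.radians(base_deg))
    g_left, g_right = 1.0 + t, 1.0 - t
    norm = math.hypot(g_left, g_right)
    return g_left / norm, g_right / norm

print(tangent_law_gains(0.0))    # centered: both gains equal
print(tangent_law_gains(15.0))   # halfway to the left speaker
```

Because the listener only perceives a direction from two real sources, the phantom image tends to blur or pull toward the nearer speaker when you sit off-center.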

Object-based

Here, sounds are placed in 3D space as so-called audio objects, without being tied to loudspeaker layouts or channels. At playback time, each object's position is rendered onto whatever loudspeakers are available, so the sound can be reproduced over an almost unlimited number of loudspeakers, arranged roughly as a hemisphere around the listener.
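The core idea, an object as audio plus position metadata that is only mapped to speakers at playback, can be sketched like this. The renderer is a deliberately simple stand-in (gain proportional to a clipped cosine of the angle between object and speaker, normalized to constant power); Dolby Atmos and other real renderers use their own, more elaborate algorithms. The `AudioObject` class and the speaker lists are hypothetical.

```python
import math
from dataclasses import dataclass

def unit_vector(azimuth_deg, elevation_deg):
    """Direction as (x, y, z): x = front, y = left, z = up."""
    az, el = math.radians(azimuth_deg), math.radians(elevation_deg)
    return (math.cos(az) * math.cos(el), math.sin(az) * math.cos(el), math.sin(el))

@dataclass
class AudioObject:
    """Hypothetical audio object: mono samples plus position metadata."""
    samples: list
    azimuth_deg: float
    elevation_deg: float

def object_gains(obj, speakers, sharpness=4):
    """Per-speaker gains for one object on an arbitrary layout.

    speakers: list of (azimuth_deg, elevation_deg), whatever is available.
    Simplified heuristic: gain ~ max(0, cos(angle)) ** sharpness,
    then normalized so that the squared gains sum to 1.
    """
    src = unit_vector(obj.azimuth_deg, obj.elevation_deg)
    raw = []
    for az, el in speakers:
        spk = unit_vector(az, el)
        cos_angle = sum(a * b for a, b in zip(src, spk))
        raw.append(max(0.0, cos_angle) ** sharpness)
    norm = math.sqrt(sum(g * g for g in raw))
    if norm == 0.0:
        raise ValueError("no speaker faces the object's direction")
    return [g / norm for g in raw]

def render(objects, speakers):
    """Mix all objects into one sample list per speaker."""
    length = max(len(o.samples) for o in objects)
    feeds = [[0.0] * length for _ in speakers]
    for obj in objects:
        for feed, g in zip(feeds, object_gains(obj, speakers)):
            for n, s in enumerate(obj.samples):
                feed[n] += g * s
    return feeds

# The same object, rendered to two different layouts at playback time.
voice = AudioObject(samples=[1.0, 0.5], azimuth_deg=0.0, elevation_deg=45.0)
stereo = [(30.0, 0.0), (-30.0, 0.0)]
with_height = stereo + [(0.0, 90.0)]      # hypothetical ceiling speaker
print(object_gains(voice, stereo))        # equal gains on L and R
print(object_gains(voice, with_height))   # ceiling speaker gets the largest gain
```

The object itself never changes; only the rendering step knows about the loudspeakers, which is what makes the approach independent of the playback setup.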

Dolby Atmos VR

The Dolby Atmos tools were adapted for virtual reality. The Atmos master file is transcoded to an .ec3 file (Dolby Digital Plus), which supports playback with head-tracking.

Pro	Contra
The file can be played back without extra decoding, since it detects the playback situation and converts itself from surround to stereo	Baked-in format with few distribution options; even previewing your own video can be complicated
Object-based approach to VR; during mixing, playback on surround loudspeaker setups is easy	Its VR Transcoder can output Ambisonics, but only first order, which sounds worse than Dolby's own .ec3

G’Audio Lab

The team behind it started working on MPEG-H as early as 2005, where they were involved in its binaural rendering. But they knew that MPEG-H is not ideal for VR, since it does not allow channel-, object-, and soundfield-based audio to be used at the same time.

Pro	Contra
Combines the benefits of channel-, object-, and soundfield-based audio	Not supported on platforms, but a workaround by encoding to Ambisonics is possible
Can also be used for interactive VR, where the listener moves away from the camera position (6 instead of 3 degrees of freedom)	Mac-only at the moment
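The difference between 3 and 6 degrees of freedom shows up in the geometry the renderer has to recompute. With 3DoF only head rotation changes a source's direction relative to the listener; with 6DoF the listener's position changes too, so both direction and distance change. This sketch is 2D (horizontal plane, yaw only) and uses a simple 1/r distance gain; both are assumptions for illustration, not G'Audio's actual method.

```python
import math

def relative_source(source_xy, listener_xy=(0.0, 0.0), head_yaw_deg=0.0,
                    min_distance=0.1):
    """Direction, distance and gain of a source as heard by the listener.

    2D: x = front, y = left; azimuth counterclockwise (positive = left).
    3DoF: listener_xy stays at the camera position, only head_yaw_deg changes.
    6DoF: listener_xy moves as well.
    Returns (azimuth_deg in [-180, 180), distance, gain) with gain = 1/r,
    clamped at min_distance so it does not blow up next to the source.
    """
    dx = source_xy[0] - listener_xy[0]
    dy = source_xy[1] - listener_xy[1]
    distance = math.hypot(dx, dy)
    world_az = math.degrees(math.atan2(dy, dx))
    # Direction relative to the nose: subtract head yaw, wrap to [-180, 180).
    rel_az = (world_az - head_yaw_deg + 180.0) % 360.0 - 180.0
    gain = 1.0 / max(distance, min_distance)
    return rel_az, distance, gain

source = (2.0, 0.0)  # 2 m in front of the camera position
print(relative_source(source))                          # straight ahead
print(relative_source(source, head_yaw_deg=90.0))       # 3DoF: head turned left, source now on the right
print(relative_source(source, listener_xy=(1.0, 0.0)))  # 6DoF: walked 1 m closer
```

With 3DoF the distance column never changes, which is why a fixed binaural or Ambisonics mix can be enough; 6DoF needs the positions of the sources at playback time.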

Soundfield-based

Ambisonics

I have already written a more detailed article about this format. Mixing Ambisonics for virtual reality audio feels similar to working with object-based audio, but the underlying technologies are very different.
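To make the contrast with object-based audio concrete: in Ambisonics a source's direction is not stored as metadata but baked into the channel gains. A minimal first-order encoder, assuming the AmbiX convention (ACN channel order W, Y, Z, X with SN3D normalization) and azimuth measured counterclockwise from the front:

```python
import math

def encode_foa_ambix(sample, azimuth_deg, elevation_deg):
    """Encode one mono sample into first-order AmbiX (ACN order, SN3D).

    Returns [W, Y, Z, X]. Azimuth: 0 = front, +90 = left.
    Elevation: 0 = horizon, +90 = straight up.
    """
    az = math.radians(azimuth_deg)
    el = math.radians(elevation_deg)
    w = sample                                 # omnidirectional component
    y = sample * math.sin(az) * math.cos(el)   # left/right figure-of-eight
    z = sample * math.sin(el)                  # up/down figure-of-eight
    x = sample * math.cos(az) * math.cos(el)   # front/back figure-of-eight
    return [w, y, z, x]

def encode_block(samples, azimuth_deg, elevation_deg):
    """Encode a list of mono samples; returns one 4-channel frame per sample."""
    return [encode_foa_ambix(s, azimuth_deg, elevation_deg) for s in samples]

print(encode_foa_ambix(1.0, 90.0, 0.0))  # source hard left: W and Y carry it
```

Several sources are mixed by adding their encoded frames channel by channel. After that, the individual positions can no longer be edited separately, which is the key difference from an object-based mix.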

AmbiX / FuMa (Furse-Malham)

These two Ambisonics formats are pretty similar and compatible with each other, so I will not distinguish between them.
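"Compatible" here means that, at least for first order, converting between the two is a reordering plus one gain change: FuMa stores the channels as W, X, Y, Z with W attenuated by 3 dB (a factor of 1/√2), while AmbiX uses W, Y, Z, X with W at unity gain. A minimal sketch of the first-order conversion; higher orders need additional per-channel factors that this sketch does not cover.

```python
import math

SQRT2 = math.sqrt(2.0)

def fuma_to_ambix_foa(frame):
    """Convert one first-order FuMa frame [W, X, Y, Z] to AmbiX [W, Y, Z, X]."""
    w, x, y, z = frame
    return [w * SQRT2, y, z, x]   # undo FuMa's -3 dB on W, reorder to ACN

def ambix_to_fuma_foa(frame):
    """Inverse conversion: AmbiX [W, Y, Z, X] -> FuMa [W, X, Y, Z]."""
    w, y, z, x = frame
    return [w / SQRT2, x, y, z]

print(fuma_to_ambix_foa([1.0 / SQRT2, 0.1, 0.2, 0.3]))
```

It is still worth checking which convention a file uses: a decoder that expects AmbiX but receives FuMa reads X as Y, Y as Z, and Z as X, so sources end up in the wrong place.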

Pro	Contra
Highly compatible with channel-based formats via a decoder	Cannot be played back without a decoder, but one is implemented on most platforms
Scalable with more channels; for 360° videos, four channels already deliver good results	Music is usually static and should not rotate with the scene, but head-locked stereo is only supported through a workaround that is not lossless
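Both rows can be sketched in a few lines. Full-sphere Ambisonics of order N needs (N + 1)² channels, so first order uses the four channels mentioned above, second order nine, and third order sixteen. Head-tracking works by rotating the whole sound field against the head movement, which is exactly why music mixed into the Ambisonics stream turns along with everything else. The rotation below is a minimal sketch covering yaw only for first-order AmbiX ([W, Y, Z, X]); pitch, roll, and higher orders need full rotation matrices.

```python
import math

def ambisonic_channels(order):
    """Number of channels for full-sphere Ambisonics of the given order."""
    return (order + 1) ** 2

def rotate_yaw_foa(frame, yaw_deg):
    """Rotate a first-order AmbiX frame [W, Y, Z, X] about the vertical axis.

    Every source moves by +yaw_deg in azimuth (counterclockwise). For
    head-tracking, pass the negative of the listener's head yaw.
    W and Z are unchanged by a yaw rotation.
    """
    w, y, z, x = frame
    c = math.cos(math.radians(yaw_deg))
    s = math.sin(math.radians(yaw_deg))
    x_rot = c * x - s * y
    y_rot = s * x + c * y
    return [w, y_rot, z, x_rot]

print([ambisonic_channels(n) for n in (1, 2, 3)])
# Listener turns the head 30 degrees to the left -> rotate the field by -30.
print(rotate_yaw_foa([1.0, 0.0, 0.0, 1.0], -30.0))
```

Since the rotation acts on all channels at once, a static music bed has to live outside the Ambisonics stream if it should stay head-locked.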

Two Big Ears (TBE)

Two Big Ears developed its own Ambisonics format; the company was bought by Facebook, which introduced the format as its standard. It is a hybrid higher-order Ambisonics format that uses eight channels: a well-thought-out concept, but with some flaws. More on that perhaps in a future post on my blog.

Pro	Contra
Good compromise between channel count and spatial resolution; it supports an additional static (head-locked) stereo track, which solves the Ambisonics problem with static music	Baked-in format that is difficult to convert to other formats, although this has recently improved
Complete pipeline from DAW to SDK, so you won't hear any big surprises along the way	Not transparent: what do the channels stand for, which HRTF is used, etc.? Free to use, but not open source

Quad-Binaural

Quad-binaural is a format that sits somewhere between channel-based and soundfield-based. It relies on four stereo files representing four viewing directions at 0°, 90°, 180°, and 270°. During playback, the audio is interpolated for angles between these fixed directions. Although it will probably be used less in the future, it still has its place.
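The interpolation can be sketched as a crossfade between the two stereo pairs closest to the current head yaw. This minimal sketch assumes a linear crossfade and a yaw measured in degrees in the same rotation direction as the four anchor directions; actual players may use different fade curves.

```python
ANCHORS_DEG = (0.0, 90.0, 180.0, 270.0)

def quad_binaural_weights(head_yaw_deg):
    """Crossfade weights for the four stereo pairs at 0/90/180/270 degrees.

    Linear crossfade between the two neighbouring anchors; the weights
    sum to 1 and at most two of them are non-zero.
    """
    yaw = head_yaw_deg % 360.0
    if yaw >= 360.0:          # tiny negative inputs can round up to 360.0
        yaw = 0.0
    lower = int(yaw // 90.0) % 4          # anchor at or below the yaw
    upper = (lower + 1) % 4               # next anchor, 270 wraps to 0
    frac = (yaw - ANCHORS_DEG[lower]) / 90.0
    weights = [0.0, 0.0, 0.0, 0.0]
    weights[lower] = 1.0 - frac
    weights[upper] = frac
    return weights

def quad_binaural_mix(pairs, head_yaw_deg):
    """Blend one (left, right) sample from each of the four pairs."""
    weights = quad_binaural_weights(head_yaw_deg)
    left = sum(w * l for w, (l, _r) in zip(weights, pairs))
    right = sum(w * r for w, (_l, r) in zip(weights, pairs))
    return left, right

print(quad_binaural_weights(45.0))   # halfway between the 0° and 90° pairs
```

Because it only blends pre-rendered binaural mixes, directions between the four anchors are approximated rather than rendered, which is the accuracy limitation in the contra column.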

Pro	Contra
When programming apps, there is no need to implement an HRTF-based decoder, which saves resources	Playback is mostly just a crossfade of the stereo files and therefore not very accurate
The stereo track at 0° is a binaural downmix (see Binaural Stereo), which is useful as a preview without head-tracking	You can convert e.g. from Ambisonics to quad-binaural, but not the other way round, so it is a dead end for post-production

So, that was my little overview of cinematic virtual reality audio. If you have any questions, comments, or feedback, feel free to send me an email.
