
Virtual Reality Audio Formats – Pros & Cons


    Pretty much all technologies for playing back virtual reality audio on headphones fall into three categories: channel-based, object-based, and soundfield-based. Some examples, before we go into detail:

    • channel-based: e.g. stereo, 5.1 Surround
    • object-based: e.g. Dolby Atmos, G’Audio Lab
    • soundfield-based: e.g. Ambisonics

    Channel-based

    Every channel of an audio file is routed to a fixed position for playback. Stereo has two channels, "left" and "right". 5.1 Surround has six channels: besides left and right, there is a center channel between them, two additional loudspeakers behind the listener, and an LFE (low-frequency effects) channel for the subwoofer.
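    As a small illustration of what "fixed positions" means in practice, here is a minimal sketch of a conventional 5.1-to-stereo fold-down. It assumes the channel order L, R, C, LFE, Ls, Rs (common, but not the only order in use) and the widely used -3 dB (about 0.707) coefficients for the center and surround channels, as in the ITU-R BS.775 downmix. The LFE channel is simply dropped here, which is a common but not universal choice, and the function name is hypothetical.

```python
import math
from typing import List, Sequence, Tuple

# Assumed channel order (one common convention, not universal): L, R, C, LFE, Ls, Rs
MINUS_3DB = 1.0 / math.sqrt(2.0)  # about 0.707


def downmix_5_1_to_stereo(frames: Sequence[Sequence[float]]) -> List[Tuple[float, float]]:
    """Fold each 6-channel frame down to a (left, right) pair with fixed weights."""
    out: List[Tuple[float, float]] = []
    for frame in frames:
        if len(frame) != 6:
            raise ValueError("expected 6 samples per frame")
        l, r, c, _lfe, ls, rs = frame  # LFE is discarded in this sketch
        left = l + MINUS_3DB * c + MINUS_3DB * ls
        right = r + MINUS_3DB * c + MINUS_3DB * rs
        out.append((left, right))
    return out


# A center-only frame ends up equally in both stereo channels.
print(downmix_5_1_to_stereo([[0.0, 0.0, 1.0, 0.0, 0.0, 0.0]]))
```

    Because every channel has a fixed meaning, the downmix is just a fixed weighted sum per output channel. Real encoders often also scale the result down to avoid clipping.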

    Binaural Stereo

    Binaural means that you get a surround experience over headphones alone, so it only needs two channels. It simulates how humans hear, either by recording with special microphones (a dummy head with two artificial ears) or by calculating a downmix from a surround format using an HRTF (Head-Related Transfer Function).
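    The HRTF calculation can be sketched as follows: a mono signal is convolved with a pair of head-related impulse responses (HRIRs, the time-domain form of an HRTF) for one direction, which gives one signal per ear. A binaural downmix of a surround mix repeats this for every loudspeaker channel, using the HRIR pair for that loudspeaker's position, and sums the results per ear. The HRIR values below are made-up placeholders, not real measurements, and all function names are hypothetical; in practice the HRIRs would come from a measured data set.

```python
from typing import List, Optional, Sequence, Tuple


def convolve(signal: Sequence[float], ir: Sequence[float]) -> List[float]:
    """Direct-form linear convolution (full length: len(signal) + len(ir) - 1)."""
    out = [0.0] * (len(signal) + len(ir) - 1)
    for i, s in enumerate(signal):
        for j, h in enumerate(ir):
            out[i + j] += s * h
    return out


def render_binaural(mono: Sequence[float], hrir_left: Sequence[float],
                    hrir_right: Sequence[float]) -> Tuple[List[float], List[float]]:
    """Filter one mono source through the HRIR pair measured for its direction."""
    return convolve(mono, hrir_left), convolve(mono, hrir_right)


def binaural_downmix(channels: Sequence[Sequence[float]],
                     hrir_pairs: Sequence[Tuple[Sequence[float], Sequence[float]]]
                     ) -> Tuple[List[float], List[float]]:
    """Render each surround channel at its loudspeaker direction and sum per ear.
    Assumes all channels have equal length and all HRIRs have equal length."""
    left_sum: Optional[List[float]] = None
    right_sum: Optional[List[float]] = None
    for signal, (hrir_l, hrir_r) in zip(channels, hrir_pairs):
        ear_l, ear_r = render_binaural(signal, hrir_l, hrir_r)
        if left_sum is None or right_sum is None:
            left_sum, right_sum = ear_l, ear_r
        else:
            left_sum = [a + b for a, b in zip(left_sum, ear_l)]
            right_sum = [a + b for a, b in zip(right_sum, ear_r)]
    return left_sum or [], right_sum or []


# Hypothetical placeholder HRIRs for a source on the listener's left:
# the left ear receives the sound earlier and louder than the right ear.
HRIR_L = [0.9, 0.3, 0.0]
HRIR_R = [0.0, 0.4, 0.1]
left, right = render_binaural([1.0, 0.0], HRIR_L, HRIR_R)
```

    This also explains the head-tracking limitation listed below: the direction is baked into the chosen HRIRs at render time, so the result cannot follow the listener's head afterwards.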

    Pro
    • Can be played back on every platform, just like a normal stereo file
    • A quick way to hear how a spatial audio mix sounds

    Contra
    • No playback on loudspeakers: technically possible, but it sounds strange
    • No head tracking possible; it can only deliver the sound for one fixed viewing direction

    5.1 Surround

    Pro
    • Can be played back not only on headphones for VR, but also on a conventional loudspeaker setup
    • Easy setup, used and supported for ages

    Contra
    • Positions between the loudspeakers are only realized as phantom sound sources
    • Only two-dimensional; doesn't support height information from above or below

    Object-based

    Here, sounds are placed as so-called audio objects in 3D space without being tied to a loudspeaker arrangement or to channels. Only at playback is each object's position rendered onto the loudspeakers that are actually available. This makes playback on an almost unlimited number of loudspeakers possible, typically arranged roughly as a hemisphere around the listener.
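    One well-known way to do this rendering step is vector base amplitude panning (VBAP). The sketch below is the simplified 2D, horizontal-only variant: it finds the pair of adjacent loudspeakers that encloses the object's direction, solves a 2x2 linear system for their gains, and normalizes them to constant power. It is not Dolby's renderer, which is proprietary; the function name and layouts are illustrative assumptions, and rendering onto a full hemisphere would use loudspeaker triplets instead of pairs.

```python
import math
from typing import List, Sequence


def vbap_2d(source_az_deg: float, speaker_az_deg: Sequence[float]) -> List[float]:
    """Pairwise 2D VBAP gains for a horizontal loudspeaker ring.

    Speakers must be listed in increasing azimuth order (counter-clockwise)
    and cover the full circle with no gap of 180 degrees or more.
    """
    n = len(speaker_az_deg)
    gains = [0.0] * n
    src = math.radians(source_az_deg)
    px, py = math.cos(src), math.sin(src)  # unit vector towards the object
    for i in range(n):
        j = (i + 1) % n  # adjacent speaker, wrapping around the ring
        a1, a2 = math.radians(speaker_az_deg[i]), math.radians(speaker_az_deg[j])
        l1x, l1y = math.cos(a1), math.sin(a1)
        l2x, l2y = math.cos(a2), math.sin(a2)
        # Solve p = g1 * l1 + g2 * l2 with Cramer's rule.
        det = l1x * l2y - l2x * l1y
        if abs(det) < 1e-12:
            continue
        g1 = (px * l2y - l2x * py) / det
        g2 = (l1x * py - px * l1y) / det
        if g1 >= -1e-9 and g2 >= -1e-9:  # both non-negative: object lies between i and j
            norm = math.hypot(g1, g2)    # constant-power normalization
            gains[i] = max(g1, 0.0) / norm
            gains[j] = max(g2, 0.0) / norm
            return gains
    raise ValueError("no loudspeaker pair encloses the source direction")


# An object at 10 degrees on a four-speaker ring: only the speakers at -30 and +30 play.
print(vbap_2d(10.0, [-110.0, -30.0, 30.0, 110.0]))
```

    The object's position stays pure metadata until this step, which is why the same mix can be rendered to a different loudspeaker layout just by passing a different list of speaker angles.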

    Dolby Atmos VR

    The Dolby Atmos tools were adapted for virtual reality. The Atmos master file is transcoded to an .ec3 file, which then supports playback with head tracking.

    Pro
    • The file can be played back without further decoding, since it detects the playback situation and converts itself from surround to stereo
    • Object-based approach to VR; during mixing, playback on surround loudspeaker setups is easy

    Contra
    • Baked-in format with few distribution options; even previewing your own video can be complicated
    • The VR Transcoder can output Ambisonics, but only first order, which sounds worse than Dolby's own .ec3

    G’Audio Lab

    The team behind it started working on MPEG-H back in 2005 and was involved in its binaural rendering. But they knew that MPEG-H is not ideal for VR, since it does not allow channel-, object-, and soundfield-based audio to be used at the same time.

    Pro
    • Combines the benefits of channel-, object-, and soundfield-based audio
    • Can also be used for interactive VR, where the listener moves away from the camera position (six instead of three degrees of freedom)

    Contra
    • Not supported on platforms yet, but a workaround via encoding to Ambisonics is possible
    • Mac-only at the moment

    Soundfield-based

    Ambisonics

    I have already written a more detailed article on this format, right here. Mixing Ambisonics for virtual reality audio feels similar to working with object-based audio, but the underlying technologies are very different.
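    The difference shows up in how a direction is stored. An audio object keeps its position as metadata, while Ambisonics bakes the direction into fixed channel gains derived from spherical harmonics. Below is a minimal sketch of first-order encoding in the AmbiX convention (ACN channel order W, Y, Z, X with SN3D normalization), assuming azimuth measured counter-clockwise from the front (so +90 degrees is left) and elevation measured upwards; the function name is hypothetical.

```python
import math
from typing import Tuple


def encode_foa_ambix(sample: float, azimuth_deg: float,
                     elevation_deg: float) -> Tuple[float, float, float, float]:
    """Encode one mono sample into first-order AmbiX, returned as (W, Y, Z, X)."""
    az = math.radians(azimuth_deg)
    el = math.radians(elevation_deg)
    w = sample                                # omnidirectional component
    y = sample * math.sin(az) * math.cos(el)  # left/right figure-of-eight
    z = sample * math.sin(el)                 # up/down figure-of-eight
    x = sample * math.cos(az) * math.cos(el)  # front/back figure-of-eight
    return (w, y, z, x)


# A source hard left on the horizon: W and Y carry the signal, Z and X are (almost) zero.
print(encode_foa_ambix(1.0, 90.0, 0.0))
```

    Once encoded, the individual source is no longer separable; the four channels describe the whole soundfield, and only a decoder turns them into loudspeaker or binaural signals.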

    Ambix / FuMa (Furse-Malham)

    These two Ambisonics formats are pretty similar and compatible with each other, so I will not distinguish between them.
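    At first order, the compatibility is easy to see: FuMa stores the channels as W, X, Y, Z with W attenuated by 3 dB, while AmbiX uses the order W, Y, Z, X with SN3D normalization, and the first-order X, Y, and Z components are identical in both. A minimal conversion sketch, valid for first order only (the function names are hypothetical; higher orders need additional per-channel weights, and FuMa is only defined up to third order):

```python
import math
from typing import Tuple

SQRT2 = math.sqrt(2.0)


def fuma_to_ambix(w: float, x: float, y: float, z: float) -> Tuple[float, float, float, float]:
    """First-order FuMa (W, X, Y, Z; W at -3 dB) -> AmbiX (W, Y, Z, X; SN3D)."""
    return (w * SQRT2, y, z, x)


def ambix_to_fuma(w: float, y: float, z: float, x: float) -> Tuple[float, float, float, float]:
    """Inverse of fuma_to_ambix: reorder and attenuate W by 3 dB again."""
    return (w / SQRT2, x, y, z)


print(fuma_to_ambix(1.0 / SQRT2, 0.1, 0.2, 0.3))  # W back to about 1.0, then Y, Z, X
```

    Since the conversion is only a reordering plus one gain, it is lossless in both directions, which is why treating the two formats as one is reasonable for 360° video work.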

    Pro
    • Highly compatible with channel-based formats via a decoder
    • Scalable with more channels; for 360° videos, even four channels already work well

    Contra
    • No playback possible without a decoder, but one is implemented on most platforms
    • Music is usually static and should not rotate with the scene, but a static stereo track is only supported through a workaround, which is not lossless

    Two Big Ears (TBE)

    TBE is an Ambisonics format of its own; the company behind it was bought by Facebook, which introduced the format as its standard. It is a hybrid higher-order Ambisonics format that uses eight channels: a well-thought-out concept, but with some flaws. More on that perhaps on my blog.
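    For context on the channel count: full-sphere Ambisonics of order N needs (N + 1)² channels, so first order needs 4 and second order needs 9. TBE's eight channels therefore sit between first and second order, which is presumably why it is called a hybrid; how TBE actually assigns its eight channels is not stated here and is not assumed below. The function name is hypothetical.

```python
def full_sphere_channels(order: int) -> int:
    """Channel count of full-sphere (periphonic) Ambisonics of the given order."""
    if order < 0:
        raise ValueError("order must be non-negative")
    return (order + 1) ** 2


print([full_sphere_channels(n) for n in range(4)])  # [1, 4, 9, 16]
```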

    Pro
    • Good compromise between channel count and achievable resolution; it supports an additional static stereo track, which solves the Ambisonics music problem
    • Complete pipeline from DAW to SDK, so you won't hit big surprises along the way

    Contra
    • Baked-in format; it is difficult to convert to other formats, although this was recently improved
    • Nontransparent: what do the channels stand for, what kind of HRTF is used, etc. Free to use, but not open source

    Quad-Binaural

    A format that sits somewhere between channel-based and soundfield-based. It relies on four stereo files, which represent four viewing directions at 0°, 90°, 180°, and 270°. During playback, the audio is interpolated for angles between these fixed directions. Although it will probably be used less in the future, it still has its place.
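    The interpolation can be sketched as a crossfade between the two stereo pairs whose viewing directions are closest to the current head yaw. This sketch assumes a simple linear crossfade applied per stereo sample; actual players may use a different curve (for example equal-power) and process whole blocks. The function name is hypothetical. Note that no HRTF appears anywhere, because the binaural filtering is already baked into the four pairs.

```python
import math
from typing import Sequence, Tuple

Stereo = Tuple[float, float]


def quad_binaural_frame(yaw_deg: float, frames: Sequence[Stereo]) -> Stereo:
    """Mix one stereo sample for a given head yaw.

    frames[k] is the stereo sample rendered for viewing direction k * 90 degrees
    (0, 90, 180, 270). Linear crossfade between the two nearest directions.
    """
    if len(frames) != 4:
        raise ValueError("expected four stereo pairs")
    pos = (yaw_deg % 360.0) / 90.0          # position on the 0..4 grid of directions
    lower = int(math.floor(pos)) % 4        # nearest direction below (wraps 360 -> 0)
    upper = (lower + 1) % 4                 # next direction, wrapping 270 -> 0
    frac = pos - math.floor(pos)            # 0.0 at `lower`, approaching 1.0 at `upper`
    (l0, r0), (l1, r1) = frames[lower], frames[upper]
    return ((1.0 - frac) * l0 + frac * l1, (1.0 - frac) * r0 + frac * r1)


pairs = [(1.0, 0.0), (0.0, 1.0), (0.0, 0.0), (0.5, 0.5)]
print(quad_binaural_frame(45.0, pairs))  # halfway between the 0° and 90° pairs
```

    Because the output is only a weighted mix of pre-rendered signals, directions between the four views are approximated rather than rendered, which is the accuracy trade-off listed below.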

    Pro
    • When programming apps, there is no need to implement an HRTF-based decoder, which saves resources
    • The stereo track at 0° represents a downmix (see Binaural Stereo), which can be useful as a preview without head tracking

    Contra
    • Playback is mostly just a mix of the audio and therefore not very accurate
    • It is possible to go, e.g., from Ambisonics to quad-binaural, but not vice versa, so it is a dead end for post-production

    So, that was my little overview of cinematic virtual reality audio. If you have any questions, comments, or feedback, feel free to send me an email.
