
Formats for cinematic Virtual Reality Audio

Pretty much all technologies for playback of virtual reality audio on headphones can be put into three categories: channel-based, object-based and soundfield-based. Some examples, before we go into detail:

  • channel-based: e.g. stereo, 5.1 Surround
  • object-based: e.g. Dolby Atmos, G’Audio Lab
  • soundfield-based: e.g. Ambisonics

Channel-based

Every channel of an audio file is routed to a fixed playback position. Stereo has two channels, "left" and "right". 5.1 surround has six channels: left and right, a center between them, two additional loudspeakers behind the listener, and an LFE channel for the subwoofer.
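To make the idea concrete, a channel-based file is really just a fixed routing table from channel index to speaker. A minimal sketch, assuming the common SMPTE/ITU channel order for 5.1 WAV files (other orders exist, so this mapping is illustrative):

```python
# Channel-based audio: each channel index is hard-wired to one speaker.
# Assumed SMPTE/ITU channel order for a 5.1 WAV file (for illustration).
CHANNEL_LAYOUT_5_1 = [
    "Left", "Right", "Center", "LFE", "Left Surround", "Right Surround",
]

def route(samples_per_channel):
    """Pair each channel's samples with its fixed speaker position."""
    return dict(zip(CHANNEL_LAYOUT_5_1, samples_per_channel))

# A stereo file works exactly the same way with just ["Left", "Right"].
```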

Binaural Stereo

Binaural basically means that you get a surround experience over ordinary headphones, using only two channels. It simulates human hearing either by recording with special microphones (a dummy head with two artificial ears) or by calculating a downmix from a surround format using an HRTF (Head-Related Transfer Function).
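Under the hood, an HRTF downmix boils down to convolving each mono source with a pair of head-related impulse responses (HRIRs), one per ear. A minimal sketch with made-up two-tap impulse responses; real HRIRs are measured per direction, not invented like these:

```python
def convolve(signal, ir):
    """Plain discrete convolution; this is what an HRTF renderer does
    per ear, just with much longer impulse responses."""
    out = [0.0] * (len(signal) + len(ir) - 1)
    for i, s in enumerate(signal):
        for j, h in enumerate(ir):
            out[i + j] += s * h
    return out

def binauralize(mono, hrir_left, hrir_right):
    """Render one mono source to two ears with a direction-dependent
    HRIR pair (the toy values below are assumptions for illustration)."""
    return convolve(mono, hrir_left), convolve(mono, hrir_right)

# Toy HRIRs: a source on the left arrives earlier and louder at the
# left ear than at the right ear.
left_ear, right_ear = binauralize([1.0, 0.5], [0.9, 0.0], [0.0, 0.6])
```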

Pro:
  • Can be played back on every platform, just like a normal stereo file
  • A quick way to check the tonality of a spatial audio mix

Contra:
  • No playback on loudspeakers – technically possible, but it sounds weird
  • No head-tracking possible; it can only deliver the sound of one fixed viewing direction

5.1 Surround

Pro:
  • Can be played back not only on headphones for VR but also on a conventional loudspeaker setup
  • Easy setup; it has been used and supported for ages

Contra:
  • Positions between the loudspeakers are only realized as phantom sound sources
  • Only two-dimensional; it carries no height information from above or below

Object-based

Here, sounds are placed as so-called audio objects in 3D space, without being bound to a loudspeaker layout or to channels. On playback, the object's position in space is rendered to whatever loudspeakers are available, so setups with a virtually unlimited number of loudspeakers are possible; the arrangement roughly forms a hemisphere. This means, however, that it is not readily possible to place objects below the listener, as virtual reality audio would require.
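At its core, an object renderer turns an object's position into per-speaker gains at playback time. A heavily simplified sketch for just two speakers using constant-power panning; the speaker angles and the two-speaker layout are illustrative assumptions, not any vendor's actual algorithm:

```python
import math

def pan_object(azimuth_deg):
    """Constant-power pan of one audio object between a left speaker
    at -45 degrees and a right speaker at +45 degrees (both positions
    are assumptions for this sketch). Returns (gain_left, gain_right).
    Real object renderers distribute gains over many speakers."""
    # Map [-45, +45] degrees to [0, 1]; clamp objects outside the pair.
    t = min(max((azimuth_deg + 45.0) / 90.0, 0.0), 1.0)
    angle = t * math.pi / 2.0
    return math.cos(angle), math.sin(angle)
```

An object dead center gets equal gains of about 0.707 on both speakers, so the total power stays constant as it moves.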

Dolby Atmos VR

The Dolby Atmos tools were adapted for use in virtual reality. The Atmos master file is transcoded to an ec3 file, which later supports playback with head-tracking.

Pro:
  • The file can be played back without further decoding, since it recognizes the current playback situation and converts itself from surround to stereo
  • Object-based approach to VR; playback during mixing is easily possible on surround loudspeaker setups

Contra:
  • Baked-in format with few possibilities for distribution; even previewing your own video can be complicated
  • Its VR transcoder can output Ambisonics, but only first order, which sounds worse than Dolby's ec3

G’Audio Lab

The team behind it had already started working on MPEG-H in 2005, being involved in its binaural rendering. But they knew that MPEG-H is not ideal for VR, since it cannot use channel-, object-, and soundfield-based audio at the same time.

Pro:
  • Combines the benefits of channel-, object-, and soundfield-based audio
  • Can also be used for interactive VR, i.e. moving away from the camera (six instead of three degrees of freedom)

Contra:
  • Not supported on the platforms, though an encoding workaround to Ambisonics is possible
  • Mac-only at the moment

Soundfield-based

Ambisonics

For this format I have already written a more detailed article, right here: https://www.vrtonung.de/ambisonics/ Mixing with Ambisonics for virtual reality audio is comparable to working with object-based audio, but the technologies are very different.

Ambix / FuMa (Furse-Malham)

These two Ambisonics formats are very similar and can be converted into each other, so I will not distinguish between them here.
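"Can be converted into each other" concretely means the two differ only in channel order and normalization. For first order, FuMa (channel order W X Y Z, with W attenuated by 3 dB) converts to ambiX (ACN order W Y Z X, SN3D normalization) like this:

```python
import math

def fuma_to_ambix_first_order(w, x, y, z):
    """Convert one first-order sample frame from FuMa to ambiX.
    FuMa stores W at -3 dB (a factor of 1/sqrt(2)), so scale it back
    up, and reorder the channels from W X Y Z to the ACN order W Y Z X.
    The directional channels need no rescaling at first order."""
    return (w * math.sqrt(2.0), y, z, x)
```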

Pro:
  • High compatibility with other channel-based formats via a decoder
  • Scalable with more channels; for 360° videos, four channels are already fun to work with

Contra:
  • No playback possible without a decoder, but one is implemented on most platforms
  • Music is usually static and not supposed to rotate with the scene, but stereo is only supported via a workaround that is not lossless
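A big reason Ambisonics works so well with head-tracking is that rotating the entire soundfield is just a small matrix applied per sample frame. A first-order yaw (head-turn) rotation sketch; the sign convention depends on the coordinate system, so treat the direction of rotation here as an assumption:

```python
import math

def rotate_yaw_first_order(w, x, y, z, yaw_rad):
    """Rotate a first-order Ambisonics frame (given here in W X Y Z
    order) around the vertical axis, e.g. to compensate a head turn.
    W (omnidirectional) and Z (height) are unaffected by pure yaw."""
    c, s = math.cos(yaw_rad), math.sin(yaw_rad)
    return (w, c * x + s * y, -s * x + c * y, z)
```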

Two Big Ears (TBE)

This is its own Ambisonics format; the company behind it was bought by Facebook, and the format was introduced as Facebook's standard. It is a hybrid higher-order Ambisonics that uses eight channels; a well-thought-out concept, but one with some flaws. More on that perhaps on my blog.

Pro:
  • Good compromise between channel count and possible resolution; it supports an additional static stereo track, which solves the Ambisonics music problem
  • Complete pipeline from DAW to SDK; you won't hear big surprises throughout the process

Contra:
  • Baked-in format; it is difficult to bring it into other formats, though this was recently improved
  • Nontransparent: what the channels stand for, which HRTF is being used, etc. Free to use, but not open source

Quad-Binaural

This format can be classified somewhere between channel-based and soundfield-based. It relies on four stereo files representing four lines of sight at 0°, 90°, 180° and 270°. During playback, the audio is interpolated for angles between these fixed directions. Although it will probably be used less in the future, it still has its right to exist.
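The interpolation for in-between angles typically amounts to a crossfade between the two nearest of the four pre-rendered stereo pairs. A simplified sketch assuming a plain linear crossfade; actual players may weight the two renders differently:

```python
def quad_binaural_mix(yaw_deg, renders):
    """Blend two of the four pre-rendered stereo pairs (at 0, 90, 180
    and 270 degrees) for an arbitrary head yaw. `renders` maps each of
    the four angles to one (left, right) sample pair. The linear
    crossfade is an illustrative assumption, not a player's spec."""
    yaw = yaw_deg % 360.0
    lower = int(yaw // 90) * 90          # nearest render at or below yaw
    upper = (lower + 90) % 360           # next render, wrapping at 360
    t = (yaw - lower) / 90.0             # crossfade position, 0..1
    (l0, r0), (l1, r1) = renders[lower], renders[upper]
    return ((1 - t) * l0 + t * l1, (1 - t) * r0 + t * r1)
```

This also shows why the format is not very accurate: between the fixed directions you only ever hear a mix of two renders, not a true re-rendering of the scene.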

Pro:
  • When programming apps, there is no need to implement an HRTF decoder, which saves resources
  • The stereo track at 0° is a downmix (see binaural stereo) that can be useful as a preview without head-tracking

Contra:
  • Playback is mostly just a mix of the pre-rendered audio files and therefore not very accurate
  • It is possible to go e.g. from Ambisonics to quad-binaural, but not vice versa, so it is a dead end for post-production

So, that was my little overview of cinematic virtual reality audio. If you have any questions, comments or feedback, feel free to write me a mail.
