Formats for cinematic Virtual Reality Audio
Pretty much all technologies for playback of virtual reality audio on headphones can be put into three categories: channel-based, object-based and soundfield-based. Some examples, before we go into detail:
- channel-based: e.g. stereo, 5.1 Surround
- object-based: e.g. Dolby Atmos, G’Audio Lab
- soundfield-based: e.g. Ambisonics
Channel-based audio
In channel-based audio, every channel of an audio file is routed to a fixed playback position. Stereo has two channels, "left" and "right". 5.1 surround has six channels: in addition to left and right there is a center between them, two loudspeakers behind the listener and an LFE channel for the subwoofer.

| Pros | Cons |
| --- | --- |
| Can be played back not only on headphones for VR but on a conventional loudspeaker setup | Positions in between the loudspeakers are only realized as phantom sound sources |
| Easy setup, used and supported for ages | Only two-dimensional; doesn't support height information from above or below |

Binaural stereo
Binaural basically means that you get a surround experience on just two channels over headphones. It simulates human hearing either by recording with special microphones (a dummy head with two artificial ears) or by calculating a downmix from a surround format with an HRTF (Head-Related Transfer Function).

| Pros | Cons |
| --- | --- |
| Can be played back on every platform, just like a normal stereo file | No playback on loudspeakers – technically possible, but it sounds weird |
| A quick way to listen to the tone of a spatial audio mix | No head tracking possible; it can only deliver the sound of one fixed viewing direction |
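The HRTF downmix mentioned above boils down to a convolution: each source is filtered with a left-ear and a right-ear impulse response. Here is a minimal sketch, assuming toy impulse responses as stand-ins; a real renderer would use measured HRIRs (e.g. from a dummy-head dataset) and one pair per source direction.

```python
import numpy as np

def binauralize(mono, hrir_left, hrir_right):
    """Render a mono signal to binaural stereo by convolving it
    with a left/right head-related impulse response (HRIR) pair."""
    left = np.convolve(mono, hrir_left)
    right = np.convolve(mono, hrir_right)
    return np.stack([left, right])

# Toy HRIRs: the right ear hears the sound later and quieter,
# as if the source sat to the listener's left.
fs = 48000
mono = np.random.randn(fs)               # one second of noise
hrir_l = np.zeros(64); hrir_l[0] = 1.0   # direct, full level
hrir_r = np.zeros(64); hrir_r[8] = 0.6   # ~0.17 ms interaural delay, attenuated
stereo = binauralize(mono, hrir_l, hrir_r)  # shape (2, 48063)
```

The interaural time and level differences produced by the two filters are exactly the cues that let the brain localize the source, which is why this works on plain headphones.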
Object-based audio
Here, sounds are placed as so-called audio objects in 3D space, without being bound to a loudspeaker layout or to channels. At playback time, the position of each object is rendered to whatever loudspeakers are available, so setups with a virtually unlimited number of loudspeakers are possible; the arrangement roughly forms a hemisphere. This also means it is not easily possible to place objects below the listener, as would be necessary for virtual reality audio.
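The rendering step described above maps an object's position to per-speaker gains. As a much-simplified sketch (a stand-in for what a real VBAP-style renderer does across many speakers), here is a constant-power pan between one pair of speakers; the 60° pair width is an assumption for illustration:

```python
import math

def constant_power_pan(azimuth_deg, width_deg=60.0):
    """Left/right gains for a stereo speaker pair spanning width_deg,
    using the constant-power (sin/cos) panning law."""
    pos = (azimuth_deg + width_deg / 2) / width_deg  # 0 = left speaker, 1 = right
    pos = min(max(pos, 0.0), 1.0)
    theta = pos * math.pi / 2
    return math.cos(theta), math.sin(theta)          # (gain_left, gain_right)

gl, gr = constant_power_pan(0.0)  # object dead center: equal gains
```

A full object renderer repeats this idea pairwise (or triplet-wise, for height) over the actual speaker layout, which is why the same mix can play on very different setups.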
Dolby Atmos VR
The Dolby Atmos tools were adapted for use in virtual reality. The Atmos master file is transcoded to an EC3 file, which then supports playback with head tracking.
| Pros | Cons |
| --- | --- |
| The file can be played back without further decoding, since it recognizes the current situation and converts itself from surround to stereo | Baked-in format with few distribution options; even previewing your own video can be complicated |
| Object-based approach to VR; playback during mixing is easily possible on surround loudspeaker setups | Its VR transcoder can output ambisonics, but only first order, which sounds worse than Dolby's EC3 |
The team behind it had already started working on MPEG-H in 2005 and was involved in its binaural rendering. But they knew that MPEG-H is not perfect for VR, since it does not allow channel-, object- and soundfield-based audio to be used at the same time.
| Pros | Cons |
| --- | --- |
| Combines the benefits of channel-, object- and soundfield-based audio | Not supported on platforms, but an encoding workaround to ambisonics is possible |
| Can also be used for interactive VR, i.e. moving away from the camera (6 instead of 3 degrees of freedom) | Mac-only at the moment |
Soundfield-based audio
For this format I have already written a more detailed article, right here: https://www.vrtonung.de/ambisonics/ Mixing with ambisonics for virtual reality audio is comparable to working with object-based audio, but the underlying technologies are very different.
Ambix / FuMa (Furse-Malham)
These two ambisonics formats are pretty similar and compatible with each other, so I will not distinguish between them.
| Pros | Cons |
| --- | --- |
| High compatibility with other channel-based formats via a decoder | No playback possible without a decoder, but one is implemented on most platforms |
| Scalable with more channels; for 360° videos, four channels are already fun to work with | Music is usually static and not supposed to rotate with the scene, but static stereo is only supported with a workaround, which is not lossless |
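To make the four channels concrete: encoding a mono source into first-order ambisonics is just a set of direction-dependent gains. The sketch below uses the standard AmbiX conventions (ACN channel order W, Y, Z, X with SN3D normalization) and shows the first-order conversion to FuMa, which reorders the channels to W, X, Y, Z and rescales W:

```python
import math

def encode_ambix_foa(azimuth_deg, elevation_deg):
    """First-order AmbiX gains (ACN order: W, Y, Z, X; SN3D) for a mono
    source. Azimuth counter-clockwise from front, elevation positive up."""
    az = math.radians(azimuth_deg)
    el = math.radians(elevation_deg)
    w = 1.0                            # ACN 0: omnidirectional
    y = math.sin(az) * math.cos(el)    # ACN 1: left/right
    z = math.sin(el)                   # ACN 2: up/down
    x = math.cos(az) * math.cos(el)    # ACN 3: front/back
    return [w, y, z, x]

def ambix_to_fuma(wyzx):
    """Reorder ACN (W, Y, Z, X) to FuMa (W, X, Y, Z) and rescale W."""
    w, y, z, x = wyzx
    return [w / math.sqrt(2.0), x, y, z]

front = encode_ambix_foa(0.0, 0.0)     # [1.0, 0.0, 0.0, 1.0]
front_fuma = ambix_to_fuma(front)
```

Multiplying a mono signal by these four gains yields the B-format channels; rotating the scene for head tracking is then a simple matrix operation on those channels, which is what makes ambisonics so convenient for VR.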
Two Big Ears (TBE)
TBE is its own ambisonics format; the company behind it was bought by Facebook, and the format was introduced as Facebook's standard. It's a hybrid higher-order ambisonics that uses eight channels: a well-thought-through concept, but one with some flaws. More on that perhaps on my blog.
| Pros | Cons |
| --- | --- |
| Good compromise between channel count and possible resolution; it supports an additional static stereo track, which solves the ambisonics problem | Baked-in format; it is difficult to convert it to other formats, although this was recently improved |
| Complete pipeline from DAW to SDK, so you won't hear big surprises throughout the process | Nontransparent: what do the channels stand for, which HRTF is used, etc.; free to use, but not open source |
Quad-binaural
This is a format that can be classified somewhere between channel-based and soundfield-based. It relies on four stereo files that represent four lines of sight at 0°, 90°, 180° and 270°. During playback, the audio is interpolated for angles in between these fixed directions. Although it will probably be used less in the future, it still has its right to exist.
| Pros | Cons |
| --- | --- |
| When programming apps, there is no need to implement an HRTF with a decoder, which saves resources | Playback is mostly just a blend of the four recordings and therefore not very accurate |
| The stereo track at 0° represents a downmix (see binaural stereo), which can be useful as a preview without head tracking | It is possible to go e.g. from ambisonics to quad-binaural, but not vice versa, so it's a dead end for post-production |
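The interpolation step can be sketched as a crossfade between the two pre-rendered directions nearest to the head yaw. This is a minimal sketch, assuming an equal-power crossfade; actual players may blend differently:

```python
import math

# One pre-rendered binaural stereo track exists per viewing direction.
ANGLES = [0, 90, 180, 270]

def quad_binaural_weights(yaw_deg):
    """Crossfade weights for the two directions nearest the head yaw."""
    yaw = yaw_deg % 360
    idx = int(yaw // 90)              # index of the lower neighbour (0..3)
    frac = (yaw % 90) / 90.0          # position between the two directions
    lower, upper = ANGLES[idx], ANGLES[(idx + 1) % 4]
    # Equal-power crossfade keeps perceived loudness roughly constant.
    return {lower: math.cos(frac * math.pi / 2),
            upper: math.sin(frac * math.pi / 2)}

weights = quad_binaural_weights(45.0)  # halfway between 0° and 90°
```

The player then sums the stereo tracks scaled by these weights, which explains both the low CPU cost and the limited accuracy between the four captured directions.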
So, that was my little overview of cinematic virtual reality audio formats. If you have any questions, comments or feedback, feel free to write me a mail.