for Virtual Reality Videos
This format is actually about 40 years old, yet it has hardly caught on. The problem is that the sound usually has to be encoded into the Ambisonics sound field and then decoded again for playback. In addition, the sweet spot is very small for loudspeaker playback. However, since 360° videos are played back almost exclusively via headphones with VR headsets, the necessary decoders can be built directly into the video player software. The player computes a binaural stereo signal from the Ambisonics signal in real time using HRTFs (head-related transfer functions), rotating the sound field to match the viewer's viewing direction inside the sphere.
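Head tracking can be illustrated on a first-order sound field: before the binaural (HRTF) rendering, the player rotates the B-format signal opposite to the viewer's head movement, so that sources stay fixed in the scene. The following Python sketch shows only this rotation for head yaw (turning left or right). The function name `rotate_yaw` and the sign convention (azimuth and yaw counted counter-clockwise, i.e. positive to the left) are assumptions, and the HRTF convolution itself is omitted.

```python
import math


def rotate_yaw(w, x, y, z, head_yaw):
    """Rotate one first-order B-format sample frame (W, X, Y, Z) to compensate
    for a head rotation of `head_yaw` radians (assumed: positive = turning left).

    The sound field is rotated by -head_yaw so that sources stay fixed in the
    scene while the listener turns. W (omnidirectional) and Z (up/down) are
    unaffected by a rotation about the vertical axis.
    """
    theta = -head_yaw
    cos_t, sin_t = math.cos(theta), math.sin(theta)
    x_rot = x * cos_t - y * sin_t
    y_rot = x * sin_t + y * cos_t
    return w, x_rot, y_rot, z


# A source directly to the left (azimuth 90 degrees) should end up
# (approximately) straight ahead once the viewer turns 90 degrees to the left.
frame = (1.0, 0.0, 1.0, 0.0)
print(rotate_yaw(*frame, math.radians(90)))
```

A complete renderer would also handle pitch and roll, apply the rotation per audio block, and then decode the rotated field binaurally.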
For an Ambisonics microphone, a compact tetrahedral arrangement of four cardioid capsules, each pointing outward in a different direction so that together they cover the full sphere (see picture above), has proven its worth. The signal is recorded on a recorder with at least four channels, which should have as little self-noise as possible and digitally controlled gain so that all capsules are preamplified identically. Recorder and microphone are placed as close as possible to the 360° camera, since the camera tripod is retouched out of the image in post-production, and the sound equipment disappears along with it. This raw Ambisonics recording is referred to as A-format and must later be converted in post-production to B-format by an encoder plugin before it can be processed further.
The audio signal keeps its four channels when converted from A-format to B-format, but instead of the signals of microphone capsules 1, 2, 3 and 4, B-format contains the channels W, X, Y and Z. X, Y and Z correspond to figure-of-eight patterns along the three spatial axes (front/back, left/right and up/down), while W is a mono-compatible signal that contains all signal components and is therefore omnidirectional. The following image illustrates the arrangement of the described channels.
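At its core, the A-to-B conversion is a sum-and-difference matrix over the four capsule signals. The Python sketch below assumes an idealized tetrahedral microphone with capsules labelled LFU (left-front-up), RFD (right-front-down), LBD (left-back-down) and RBU (right-back-up); how these labels map to capsules 1 to 4 differs between microphones and is hypothetical here. It also assumes perfectly coincident cardioid capsules, so it omits the correction filters a real encoder plugin applies. The gains are chosen so that a unit plane wave yields W = 1 and X, Y, Z equal to the direction cosines. The helper `ideal_capsules` is a hypothetical simulation of such a microphone, used only to try out the matrix.

```python
import math

# Gain mapping the capsule differences to unit-amplitude X, Y, Z
# for ideal, perfectly coincident cardioid capsules (assumption).
_AXIS_GAIN = math.sqrt(3) / 2


def a_to_b(lfu, rfd, lbd, rbu):
    """Convert one A-format sample frame (hypothetical capsule order
    LFU, RFD, LBD, RBU) to B-format (W, X, Y, Z)."""
    w = 0.5 * (lfu + rfd + lbd + rbu)          # omnidirectional sum
    x = _AXIS_GAIN * (lfu + rfd - lbd - rbu)   # front minus back
    y = _AXIS_GAIN * (lfu - rfd + lbd - rbu)   # left minus right
    z = _AXIS_GAIN * (lfu - rfd - lbd + rbu)   # up minus down
    return w, x, y, z


# Look directions of the four capsules on a regular tetrahedron (unit vectors).
_S = 1 / math.sqrt(3)
CAPSULE_DIRECTIONS = {
    "lfu": (_S, _S, _S),
    "rfd": (_S, -_S, -_S),
    "lbd": (-_S, _S, -_S),
    "rbu": (-_S, -_S, _S),
}


def ideal_capsules(azimuth, elevation):
    """Simulate ideal cardioid capsule outputs for a unit plane wave
    arriving from (azimuth, elevation) in radians."""
    d = (math.cos(azimuth) * math.cos(elevation),
         math.sin(azimuth) * math.cos(elevation),
         math.sin(elevation))
    # Cardioid pattern: 0.5 + 0.5 * cos(angle between source and capsule axis)
    return tuple(0.5 + 0.5 * sum(a * b for a, b in zip(d, u))
                 for u in CAPSULE_DIRECTIONS.values())


# A source to the left should give roughly W = 1, X = 0, Y = 1, Z = 0.
print(a_to_b(*ideal_capsules(math.radians(90), 0.0)))
```

The capsules' cardioid offsets cancel in the difference terms because the four tetrahedron directions sum to zero, which is why W comes out omnidirectional and X, Y, Z come out as pure figure-of-eight components.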
The spatial resolution of these four channels is relatively diffuse, but it is particularly suitable for ambiences and original location sound, which would otherwise have to be recreated laboriously in post-production by Foley artists and placed in the 3D audio space. Since the recording can be taken over from the set as it is and head tracking already works with it, it offers a good starting point for spatial sound mixing.
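When Foley or other mono sounds are placed in the 3D audio space, they are encoded into the same four channels from their intended direction and summed with the recorded ambience. The Python sketch below assumes the AmbiX convention mentioned under the advantages (ACN channel order W, Y, Z, X with SN3D normalization), azimuth counted counter-clockwise from the front and elevation upward from the horizontal. The function name and the sample values are hypothetical.

```python
import math


def encode_ambix_first_order(sample, azimuth, elevation):
    """Encode one mono sample at (azimuth, elevation) in radians into
    first-order AmbiX: ACN order [W, Y, Z, X], SN3D normalization."""
    cos_el = math.cos(elevation)
    w = sample                                   # ACN 0: omnidirectional
    y = sample * math.sin(azimuth) * cos_el      # ACN 1: left/right
    z = sample * math.sin(elevation)             # ACN 2: up/down
    x = sample * math.cos(azimuth) * cos_el      # ACN 3: front/back
    return [w, y, z, x]


# Mix a short mono signal into an (here silent) ambience track,
# 45 degrees to the left and slightly raised; values are illustrative.
foley = [0.0, 0.5, -0.25]
ambience = [[0.0, 0.0, 0.0, 0.0] for _ in foley]
for frame, s in zip(ambience, foley):
    encoded = encode_ambix_first_order(s, math.radians(45), math.radians(10))
    for ch in range(4):
        frame[ch] += encoded[ch]
```

Summing only works if the recorded ambience has already been converted to the same channel order and normalization; mixing conventions would misplace the sources.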
Advantages / Disadvantages
- Ambisonics covers the full sphere and, unlike most surround or immersive formats, which cover a hemisphere at most, can also reproduce height information "from below".
- The patents have expired, so the technology is essentially freely available.
- Ambisonics can be extended to arbitrarily high orders with more channels. Conversely, no separate downmix is needed to go from, e.g., 16 channels (third order) to 4 (first order): the higher-order channels are simply dropped, and only the spatial resolution decreases.
- For first-order Ambisonics, AmbiX (ACN channel order, SN3D normalization) is becoming established as a standard (e.g. on YouTube).
- Cannot be played back correctly without a decoder, and the same audio may sound different on different platforms.
- Poor compatibility with static stereo sounds that should not rotate with the head (e.g. music); a workaround is required.
- Ambisonics is scene-based: the sound field is captured at a single point, so there is little scope for moving away from the camera position (the listener can turn the head but not walk around). This makes it less suited to interactive VR experiences.
- Higher-order Ambisonics improves localization, but it requires a higher bitrate and does not really solve the problem of being limited to a fixed number of audio tracks.
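The channel and bitrate arguments in the list above follow from the fact that order N needs (N+1)² channels. The Python sketch below shows this relationship, the reduction of an ACN-ordered higher-order signal to a lower order by simply dropping channels (no downmix), and the resulting uncompressed PCM data rate. The function names and the 48 kHz / 24-bit defaults are assumptions; compressed delivery formats have lower bitrates.

```python
def channels_for_order(order):
    """Number of Ambisonics channels for a given order N: (N + 1)^2."""
    if order < 0:
        raise ValueError("order must be non-negative")
    return (order + 1) ** 2


def truncate_to_order(acn_channels, order):
    """Reduce an ACN-ordered signal to a lower order by dropping channels.

    ACN sorts channels by order, so the first (N + 1)^2 channels already
    form the complete order-N signal; they are kept unchanged.
    """
    n = channels_for_order(order)
    if n > len(acn_channels):
        raise ValueError("signal does not contain the requested order")
    return acn_channels[:n]


def pcm_bitrate(order, sample_rate=48_000, bit_depth=24):
    """Uncompressed PCM data rate in bit/s (assumed 48 kHz / 24 bit)."""
    return channels_for_order(order) * sample_rate * bit_depth


for order in (1, 2, 3):
    print(order, channels_for_order(order), pcm_bitrate(order) / 1e6, "Mbit/s")
```

Going from third to first order in this way quarters the raw data rate (16 versus 4 channels), at the cost of spatial resolution.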