Actually, this audio format is about 40 years old, yet it has barely established itself. The problem is that the sound usually has to be encoded into the Ambisonics domain and then decoded again for playback. In addition, the sweet spot is very small when playing back over speaker arrays.
However, since 360° videos are played back almost exclusively via headphones with VR headsets, the necessary decoders can be built directly into the software of the video players. A binaural stereo audio signal is calculated in real time from the Ambisonics signal using an HRTF (Head-Related Transfer Function), and it changes depending on where the viewer is looking inside the sphere.
Before delving into Ambisonics, it’s essential to understand its precursor and distant “cousin”, the Mid-Side technique. Mid-Side is a widely used stereo recording method in which a cardioid microphone captures sound from the front (Mid) and a figure-of-eight microphone captures sound from the sides (Side). This technique is the foundation for understanding spatial audio and the encoding of sound information, both of which are fundamental to Ambisonics.
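To make the Mid-Side idea concrete, here is a minimal sketch of the usual sum-and-difference decoding. The function names and the `width` parameter are made up for this illustration, and the sketch assumes that the positive lobe of the figure-of-eight microphone points to the left.

```python
def ms_to_lr(mid, side, width=1.0):
    """Decode Mid-Side sample lists to left/right stereo.

    Assumes the positive lobe of the Side microphone faces left.
    `width` is a hypothetical scaling of the Side signal (1.0 = unchanged).
    """
    left = [m + width * s for m, s in zip(mid, side)]
    right = [m - width * s for m, s in zip(mid, side)]
    return left, right


def lr_to_ms(left, right):
    """Inverse operation: derive Mid and Side from left/right stereo."""
    mid = [(l + r) / 2 for l, r in zip(left, right)]
    side = [(l - r) / 2 for l, r in zip(left, right)]
    return mid, side


if __name__ == "__main__":
    left, right = ms_to_lr([1.0, 0.5], [0.25, -0.5])
    print(left, right)
```

The point to take away is that the stereo image is not stored as "left" and "right" but is computed from a direction-independent part and a directional part. Ambisonics extends exactly this principle to three directional signals.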
For an Ambisonics microphone, a compact tetrahedral arrangement of four cardioid capsules that together cover all directions of the sphere (see picture above) has proven its worth. The signal is recorded by a recorder with at least four channels, which should have as little self-noise as possible and digital gain control to ensure identical preamplification of all capsules.
Because it uses only four audio channels, its spatial resolution is not very high. The recorder and microphone are placed as close as possible to the 360° camera, since the camera tripod is retouched out of the image anyway and the sound equipment disappears with it.
The raw recordings from such a microphone are referred to as A-format and must later be converted to B-format in post-production by an encoder plugin before they can be processed further.
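The core of the A-format to B-format conversion can be sketched as a simple sum-and-difference matrix. This is only an idealized illustration: it assumes the classic capsule order front-left-up, front-right-down, back-left-down, back-right-up, the function name `a_to_b` and the 0.5 scaling are chosen for this example, and real converter plugins additionally apply correction filters that are specific to each microphone.

```python
def a_to_b(flu, frd, bld, bru):
    """Convert one A-format sample frame to B-format (W, X, Y, Z).

    Assumed capsule order (illustrative, check your microphone's manual):
      flu = front-left-up,  frd = front-right-down,
      bld = back-left-down, bru = back-right-up
    The 0.5 scaling is a hypothetical choice; no capsule-spacing
    correction filters are applied here.
    """
    w = 0.5 * (flu + frd + bld + bru)   # omnidirectional sum
    x = 0.5 * (flu + frd - bld - bru)   # front minus back
    y = 0.5 * (flu - frd + bld - bru)   # left minus right
    z = 0.5 * (flu - frd - bld + bru)   # up minus down
    return w, x, y, z


if __name__ == "__main__":
    # A sound that reaches only the two front capsules
    print(a_to_b(1.0, 1.0, 0.0, 0.0))
```

In practice you would leave this step to the software supplied by the microphone manufacturer, because it knows the exact capsule layout and frequency response.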
In the following, we will present some relevant Ambisonics microphones that can be used for VR productions. Spoiler: there is no perfect Ambisonic microphone.
Soundfield offers three high-quality Ambisonic microphones in the higher price segment. For VR productions, these microphones may be too big in some cases, as they are usually hidden under the camera. A handier alternative is the newer Røde NT-SF1, which was developed by Røde in collaboration with Soundfield.
The Sennheiser Ambeo® VR microphone also offers a good balance between size and sound quality. The Ambeo produces what is known as A-format, a raw four-channel file that must be converted to Ambisonics B-format. Sennheiser provides the appropriate software for this purpose. Together with Dear Reality, Sennheiser offers a plug-in that converts the audio signals from A-format to B-format and rotates the sound field in the same step.
For smaller productions, the Zoom H3-VR recorder is a relatively inexpensive and practical solution. Unlike the other Ambisonic microphones, this one already has a recorder built in, which can speed up the workflow considerably. In addition, Zoom offers standalone software for audio editing. It allows the Ambisonic files to be converted to stereo, binaural audio or 5.1, for example, without having to use a DAW.
An interesting system with a different approach is the MH Acoustics Eigenmike. With this microphone it is possible to record both first-order Ambisonics (FOA) and Higher Order Ambisonics. The microphone contains a large number of individual microphone capsules. However, it not only comes at a very high price, it is also a challenge to achieve good sound quality with such a complex system.
The audio signal keeps its four channels when converting from A-format to B-format, but instead of the signals of capsules 1, 2, 3 and 4, B-format carries the channels W, X, Y and Z. X, Y and Z are figure-of-eight polar patterns along the three spatial axes, while W is a mono-compatible signal that contains all signal components and is therefore omnidirectional. The following image illustrates the arrangement of the described channels. Mathematicians call this kind of arrangement spherical harmonics. You can think of it as mid-side stereophony, only with three side signals instead of one.
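How a direction ends up in W, X, Y and Z can be illustrated by encoding a single mono sample into first-order B-format. The sketch below assumes the traditional B-format convention, in which W is attenuated by 1/√2, azimuth is measured counter-clockwise from the front and elevation upwards from the horizon. The function name `encode_mono` is made up for this example.

```python
import math


def encode_mono(sample, azimuth_deg, elevation_deg=0.0):
    """Encode one mono sample into first-order B-format (W, X, Y, Z).

    Assumptions (traditional B-format convention):
      - azimuth 0 = front, positive angles turn to the left
      - elevation 0 = horizon, positive angles point upwards
      - W carries the signal attenuated by 1/sqrt(2)
    """
    az = math.radians(azimuth_deg)
    el = math.radians(elevation_deg)
    w = sample / math.sqrt(2.0)                 # omnidirectional part
    x = sample * math.cos(az) * math.cos(el)    # front-back axis
    y = sample * math.sin(az) * math.cos(el)    # left-right axis
    z = sample * math.sin(el)                   # up-down axis
    return w, x, y, z


if __name__ == "__main__":
    # A source 90 degrees to the left lands almost entirely in Y
    print(encode_mono(1.0, 90.0))
```

This is the mid-side analogy in code: W plays the role of the Mid signal, while X, Y and Z are three Side signals whose levels and signs describe where the sound comes from.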
The four-channel resolution is relatively diffuse, but it is particularly well suited to ambiences and location sound, which would otherwise have to be recreated laboriously in post-production with Foley and placed in the 3D audio space. Since the Ambisonic recording can be used as it comes from the set and head-tracking already works, it gives sound designers a good starting point for the spatial sound mix.
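Head-tracking is so easy with B-format because turning the whole sound field is just a small calculation on X and Y. The following sketch shows a rotation around the vertical axis (yaw). It assumes that X points to the front and Y to the left, the function names are invented for this example, and an actual player would also handle pitch and roll.

```python
import math


def rotate_yaw(w, x, y, z, angle_deg):
    """Rotate a first-order B-format frame around the vertical axis.

    Positive angles move sources counter-clockwise (towards the left).
    W and Z are unaffected by a pure yaw rotation.
    """
    a = math.radians(angle_deg)
    x_rot = x * math.cos(a) - y * math.sin(a)
    y_rot = x * math.sin(a) + y * math.cos(a)
    return w, x_rot, y_rot, z


def compensate_head_yaw(w, x, y, z, head_yaw_deg):
    """Counter-rotate the sound field so sources stay fixed in the scene.

    If the viewer turns the head to the left by `head_yaw_deg`, the sound
    field has to be rotated by the same angle in the opposite direction.
    """
    return rotate_yaw(w, x, y, z, -head_yaw_deg)


if __name__ == "__main__":
    # Source straight ahead, viewer turns 90 degrees to the left:
    # the source should now be heard on the right (negative Y).
    print(compensate_head_yaw(0.707, 1.0, 0.0, 0.0, 90.0))
```

The rotated four-channel signal is then passed on to the binaural decoder, so the recording itself never has to be touched again.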
There are two major spatial audio formats used for VR. The first is the open AmbiX format, which is compatible with YouTube, Facebook, Samsung, Oculus, and many other VR players. The second is the Two Big Ears format (.tbe). It was developed by the company Two Big Ears, which was later bought by Facebook.
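One practical detail when delivering AmbiX: it uses a different channel order and normalization than traditional B-format. AmbiX orders the first-order channels as W, Y, Z, X (ACN order) with SN3D normalization, whereas traditional B-format (often called FuMa) uses W, X, Y, Z with W attenuated by 1/√2. The following sketch shows the first-order conversion. The function name is chosen for this example, and it assumes your source really is in the traditional convention.

```python
import math


def fuma_to_ambix(w, x, y, z):
    """Convert one first-order frame from traditional B-format to AmbiX.

    Assumes the input follows the traditional (FuMa) convention:
      channel order W, X, Y, Z with W attenuated by 1/sqrt(2).
    Output follows AmbiX: ACN channel order (W, Y, Z, X), SN3D gains.
    Only valid for first order; higher orders need further scaling.
    """
    w_ambix = w * math.sqrt(2.0)   # undo the W attenuation
    return w_ambix, y, z, x        # reorder to ACN


if __name__ == "__main__":
    print(fuma_to_ambix(1.0 / math.sqrt(2.0), 0.1, 0.2, 0.3))
```

If the channels are mixed up at this point, sources will appear in the wrong place in the VR player, so it is worth checking which convention a plugin or microphone converter actually outputs.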
When designing sound for VR movies in digital audio workstations (DAW), the B-format is often used. Here it is important that the DAW supports tracks with at least four channels. Standard DAWs such as Pro Tools, Logic or Audition support mixing sound for videos, but they are not compatible with VR videos. Additional plugins are needed for this, for example the free Facebook 360 Spatial Workstation (FB360) plugin. With this plugin, VR videos can be played back in the DAW and spatial audio for VR can be generated.
Surround sound enthusiasts will find a compelling advantage in Ambisonics. Unlike most immersive audio or traditional surround sound formats, which are hemispherical and primarily focus on left-right sound projection, Ambisonics offers spherical audio projection and, in contrast to technologies like Dolby Atmos, even includes height information “from below”. This is an interesting benefit for sound reproduction, but more on that later.
As the fan base continues to grow, so does the number of platforms on which VR content is made available. However, many of these platforms have demanding content requirements and are not always easily accessible to VR creators.
The two leading platforms to which immersive video content such as VR movies can be uploaded are YouTube 360 and Facebook 360, both of which host tens of thousands of VR videos. YouTube 360 supports first-order Ambisonics in B-format, while Facebook 360 also accepts .tbe content with an optional head-locked stereo track. The corresponding audio formats can be exported with the FB360 encoder to ensure correct playback on the respective platform.
First-order Ambisonics is an extremely useful way to create spatial sound for VR movies. However, it also has some limitations. The spatial resolution of the B-format is limited and cannot always be reproduced perfectly, especially when recording with Ambisonics microphones in reverberant environments or when the microphone is far away from the sound source.
In such cases, the sound may appear diffuse. Computer-generated Ambisonics audio that relies on a simulated HRTF (Head-Related Transfer Function) can also lead to localization errors. One possible solution is to place the Ambisonics microphone closer to the sound source, as is common in conventional movies. However, this has the disadvantage that the microphone is visible in a 360° shot. On the other hand, spatial perception can be distorted if the microphone is far away from the 360° camera, since the sound sources then no longer correspond to the visual positions.