
3D Spatial Audio Wiki: A call to the community for glossary


One thing in advance: this is not an ordinary blog entry like the ones you know from this site. This time, readers are invited to leave their passive role!

The idea and motivation

The goal is to create a comprehensive glossary / dictionary / wiki on the topic of 3D audio. Ideally, it can serve as a basis for future work and discussions. The deadline is the end of the year, so that a useful result is available next year.

Update: My colleagues at Mach1 are doing something similar, so let’s work together and help them out here: https://research.mach1.tech/glossary/

This is where the action is

The glossary should contain definitions that are as precise as possible and be freely accessible. Via this link you can access a Google document. It already contains a glossary on the topic that is waiting to be used, extended, and checked by other people.

After following the link and reaching the wiki, you can comment on each of the terms. The comment section can also be used for discussions! But please make sure to write only in the comment section, so that the administrators can keep a better overview.

Most terms in this field rest on complex fundamentals, so it is often difficult to formulate the most important points concisely. Furthermore, with only one author there is a risk that noteworthy information is forgotten or not recognized as such. An extensive glossary should ideally draw on the knowledge of many contributors.

What is the motivation?

I have noticed that storytellers, for example, use completely different terms for sound than programmers do. Even among sound people, the details can differ depending on their professional background. This project should therefore serve as a basis for future discussions and also for final theses, which usually have to contain a glossary anyway.

You can’t please everybody – but many

I am fully aware that it is probably impossible to find a definition for every term that will make everyone happy without exception.

As new terms and technologies are constantly being introduced, this wiki does not claim to be complete and is meant as an agile, ongoing process. Strictly speaking, one would probably have to survey experts from all over the world and analyze the results scientifically. But I’m a fan of getting into action, so here goes!


This is how it could look (excerpt)

It is still unclear whether this blog is even the right platform for a dictionary, but that is exactly what should be determined in the course of the process. The following section shows how it might look:

General Spatial Audio Terms

Binaural, Headphone Surround

Binaural is the term used to describe audio content that produces a three-dimensional sound image over headphones. To achieve this effect, the recording process usually uses a human-like artificial head (dummy head) with a microphone in each ear canal. The differences in arrival time and level between the two microphones, together with the resonances of the artificial head’s outer ears (described by head-related transfer functions, see HRTF), create realistic 3D audio recordings. Nowadays, binaural audio can also be calculated with the help of HRTFs, which is done, for example, when listening to Ambisonics over headphones.
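Calculating binaural audio from HRTFs boils down to convolving a mono source with the left-ear and right-ear impulse responses for the desired direction. The sketch below shows only that principle: the two "HRIRs" are hypothetical toy placeholders (the right ear simply hears the sound later and quieter), not measured data, and real renderers use long measured responses and fast FFT-based convolution.

```python
def convolve(signal, impulse_response):
    """Direct-form linear convolution of two sample lists."""
    out = [0.0] * (len(signal) + len(impulse_response) - 1)
    for i, s in enumerate(signal):
        for j, h in enumerate(impulse_response):
            out[i + j] += s * h
    return out


def render_binaural(mono, hrir_left, hrir_right):
    """Return (left, right) ear signals for one mono source."""
    return convolve(mono, hrir_left), convolve(mono, hrir_right)


# Hypothetical toy HRIRs for a source on the listener's left at 44.1 kHz:
# the right ear is delayed by 29 samples (about 0.66 ms) and about 6 dB quieter.
hrir_left = [1.0]
hrir_right = [0.0] * 29 + [0.5]

click = [1.0] + [0.0] * 99
left, right = render_binaural(click, hrir_left, hrir_right)
```

With measured HRIRs in place of the placeholders, the same two convolutions place the source at the direction for which the HRIRs were recorded.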

HRTF

HRTF stands for head-related transfer function. HRTFs describe how a sound is filtered on its way to each ear and thus capture the cues on which human directional localization relies. These cues are interaural time differences (depending on the direction, the sound arrives earlier at one ear than at the other), interaural level differences (depending on the direction, the sound is louder at one ear), and resonances due to the shape of the outer ear and, to some extent, even the shoulders. HRTFs are usually derived from measured head-related impulse responses (HRIRs) and are used especially when 3D audio content is played back over headphones, for example in many VR applications.
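To give a feeling for the size of the interaural time difference, here is a minimal sketch using the classic Woodworth spherical-head approximation, ITD ≈ (r / c) · (θ + sin θ), for a distant source at azimuth θ in the horizontal plane (0° straight ahead, 90° fully to one side). The head radius of 8.75 cm and the speed of sound of 343 m/s are assumed typical values; real HRTFs are measured rather than computed from this formula, and it says nothing about level differences or outer-ear resonances.

```python
import math

# Assumed typical values, not taken from any specific HRTF dataset.
HEAD_RADIUS_M = 0.0875       # average adult head radius in metres
SPEED_OF_SOUND_M_S = 343.0   # speed of sound in air at about 20 °C


def woodworth_itd(azimuth_deg: float,
                  head_radius: float = HEAD_RADIUS_M,
                  speed_of_sound: float = SPEED_OF_SOUND_M_S) -> float:
    """Interaural time difference in seconds (Woodworth approximation).

    Valid for a distant source in the horizontal plane with an azimuth
    between 0 degrees (front) and 90 degrees (fully lateral).
    """
    if not 0.0 <= azimuth_deg <= 90.0:
        raise ValueError("azimuth must be between 0 and 90 degrees")
    theta = math.radians(azimuth_deg)
    return (head_radius / speed_of_sound) * (theta + math.sin(theta))


for az in (0, 30, 60, 90):
    print(f"{az:>2} deg -> {woodworth_itd(az) * 1e6:.0f} microseconds")
```

For a fully lateral source this gives roughly 0.66 ms, which is the order of magnitude of the largest time difference our hearing has to work with.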

Quad Binaural/Omni Binaural

For quad-binaural recordings, four artificial head microphones (ear pairs) are used, each offset by 90°, thus covering all directions around the listener. During monitoring, the pairs are switched by crossfading according to the viewing direction, so you always hear at most two of the four stereo pairs while the other two are muted. Such microphones are often called omni-binaural, although an omni-binaural setup could also use a number of ear pairs other than four.
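The crossfade between the four ear pairs can be sketched as a function of the listener's head yaw. This is a hypothetical illustration, not any specific player's implementation: it assumes the pairs face 0°, 90°, 180°, and 270°, that yaw is measured in the same direction as those angles, and that an equal-power (sine/cosine) crossfade is used between the two pairs adjacent to the viewing direction.

```python
import math

# Hypothetical layout: four binaural ear pairs facing these yaw angles.
PAIR_ANGLES_DEG = (0.0, 90.0, 180.0, 270.0)


def quad_binaural_gains(head_yaw_deg: float) -> list[float]:
    """Return one gain per ear pair for the given head yaw in degrees.

    At most two adjacent pairs are active; they are crossfaded with
    equal-power curves, and the remaining two pairs get gain 0.
    """
    step = 360.0 / len(PAIR_ANGLES_DEG)            # 90 degrees between pairs
    position = (head_yaw_deg % 360.0) / step       # 0.0 up to 4.0
    index = math.floor(position)
    frac = position - index                        # progress towards the next pair
    lower = index % len(PAIR_ANGLES_DEG)
    upper = (index + 1) % len(PAIR_ANGLES_DEG)

    gains = [0.0] * len(PAIR_ANGLES_DEG)
    gains[lower] = math.cos(frac * math.pi / 2)
    gains[upper] = math.sin(frac * math.pi / 2)
    return gains


print(quad_binaural_gains(45.0))   # halfway between the first two pairs
```

The equal-power curves keep the summed energy constant during a head turn, which avoids an audible dip halfway between two pairs; a linear crossfade would be the simpler alternative.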

Ambisonics Related Terms

There is already a blog post on Ambisonics in the context of 360° videos; click here to read more about Ambisonics. In short, Ambisonics is a channel-independent recording and playback method for reproducing spherical 360° sound fields. It is largely the audio standard for 360° video content and is supported by YouTube and Facebook.

A-Format

The raw, unencoded capsule signals of a (first-order) Ambisonics microphone are referred to as A-format. A-format can vary from microphone to microphone and must be converted into the largely standardized B-format before you can work with it.
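For a tetrahedral first-order microphone, the core of the A-to-B conversion is a simple sum/difference matrix of the four capsule signals. The sketch below assumes the common capsule naming FLU (front-left-up), FRD (front-right-down), BLD (back-left-down), and BRU (back-right-up) and deliberately leaves out the capsule-specific correction filters and the overall scaling. Those parts differ between manufacturers and are exactly why A-format is microphone-dependent in practice.

```python
from typing import List, Sequence, Tuple


def a_to_b_format(flu: Sequence[float], frd: Sequence[float],
                  bld: Sequence[float], bru: Sequence[float]
                  ) -> Tuple[List[float], List[float], List[float], List[float]]:
    """Convert tetrahedral A-format capsule signals into raw W, X, Y, Z.

    Only the basic sum/difference matrix is applied; no capsule
    equalization and no normalization (both are microphone-specific).
    """
    if not (len(flu) == len(frd) == len(bld) == len(bru)):
        raise ValueError("all capsule signals must have the same length")
    w, x, y, z = [], [], [], []
    for s_flu, s_frd, s_bld, s_bru in zip(flu, frd, bld, bru):
        w.append(s_flu + s_frd + s_bld + s_bru)   # omnidirectional sum
        x.append(s_flu + s_frd - s_bld - s_bru)   # front minus back
        y.append(s_flu - s_frd + s_bld - s_bru)   # left minus right
        z.append(s_flu - s_frd - s_bld + s_bru)   # up minus down
    return w, x, y, z
```

A sound that reaches all four capsules equally only shows up in W, while the directional components X, Y, and Z cancel to zero.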

B-Format

B-format is the name for the multichannel audio formats used in practical work with Ambisonics. It contains the direction and level information of the audio signals without assigning them to specific loudspeaker channels, so it can (theoretically) be decoded for any playback situation. The directions are expressed along the axes of the Cartesian coordinate system X, Y, and Z; at first order, these three directional components are complemented by an omnidirectional component W. The ambiX B-format (ACN channel ordering, SN3D normalization), which is supported by YouTube and Facebook, has established itself as the standard.
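To illustrate how B-format stores direction without any loudspeaker assignment, here is a minimal sketch that encodes a mono sample into first-order ambiX, i.e. ACN channel order (W, Y, Z, X) with SN3D normalization. Azimuth is assumed to be measured counterclockwise from the front (so +90° is left) and elevation upwards from the horizontal plane, following the usual Ambisonics convention; the function name is hypothetical.

```python
import math


def encode_foa_ambix(sample: float, azimuth_deg: float,
                     elevation_deg: float) -> tuple[float, float, float, float]:
    """Encode one mono sample into first-order ambiX (ACN order, SN3D)."""
    az = math.radians(azimuth_deg)
    el = math.radians(elevation_deg)
    w = sample                                  # ACN 0: omnidirectional
    y = sample * math.sin(az) * math.cos(el)    # ACN 1: left/right
    z = sample * math.sin(el)                   # ACN 2: up/down
    x = sample * math.cos(az) * math.cos(el)    # ACN 3: front/back
    return (w, y, z, x)
```

A source straight ahead at 0° azimuth and 0° elevation yields (1, 0, 0, 1). Only at the decoding stage are these four channels turned into whatever loudspeaker or headphone signals the playback situation requires.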

Virtual Reality (VR) and 360° Video related Terms

Head Tracking

Head tracking is the technique of capturing head movements. It is integral to the VR field. For 3D audio, head tracking makes it possible to render sound scenes that stay in place regardless of head movements, since the perspective is updated in real time as the head moves. As a result, one can move more or less freely within a virtual soundscape. In the programming world, this is also referred to as scene-locked audio: the sound objects are bound to the scene, i.e., they are independent of head movement (through head tracking).
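For Ambisonics content, scene-locked playback can be achieved by counter-rotating the sound field by the tracked head yaw before decoding. The sketch below is a simplified illustration that handles yaw only (no pitch or roll). It assumes first-order ambiX channels (W, Y, Z, X) and a yaw angle that is positive when the head turns left (counterclockwise). W and Z do not change under a rotation about the vertical axis.

```python
import math


def counter_rotate_yaw(w: float, y: float, z: float, x: float,
                       head_yaw_deg: float) -> tuple[float, float, float, float]:
    """Rotate a first-order ambiX frame by the negative head yaw.

    A source at azimuth theta in the scene ends up at theta - yaw
    relative to the listener's nose, so it stays fixed in the world.
    """
    psi = math.radians(head_yaw_deg)
    x_rot = x * math.cos(psi) + y * math.sin(psi)
    y_rot = y * math.cos(psi) - x * math.sin(psi)
    return (w, y_rot, z, x_rot)


# A source straight ahead; the listener turns the head 90 degrees to the left,
# so the source should now be heard on the right (y close to -1).
print(counter_rotate_yaw(1.0, 0.0, 0.0, 1.0, 90.0))
```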

Head-Locked Audio

Head-Locked Audio can be understood as the opposite of Head-Tracked Audio. In the context of VR media, head-locked audio means sounds that follow every head movement, basically as if you were listening to music through normal headphones. Head-locked audio is mostly used for non-diegetic sounds, such as film music.
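The difference between scene-locked and head-locked sources comes down to whether the tracked head rotation is applied at all. A minimal sketch, assuming a hypothetical object-based renderer that positions each source by its azimuth relative to the listener's nose (in degrees, counterclockwise, wrapped to the range -180 to 180):

```python
def rendered_azimuth(source_azimuth_deg: float, head_yaw_deg: float,
                     head_locked: bool) -> float:
    """Azimuth at which a source is rendered, relative to the listener."""
    if head_locked:
        # Head-locked (e.g. non-diegetic film music): head movement is ignored.
        relative = source_azimuth_deg
    else:
        # Scene-locked: compensate the head rotation so the source stays
        # at its position in the virtual world.
        relative = source_azimuth_deg - head_yaw_deg
    # Wrap into the range [-180, 180).
    return (relative + 180.0) % 360.0 - 180.0
```

If the listener turns 30° towards a source placed at 30°, a scene-locked source ends up straight ahead (0°), whereas a head-locked source remains at 30°, just as it would with normal headphones.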

HMD, VR glasses, VR headset

There are numerous devices for diving into VR. While early models such as the Oculus Rift and HTC Vive are tethered (they need to be connected to a computer), newer devices like the Oculus Go, Oculus Quest, or Pico are standalone. They usually differ in degrees of freedom, field of view, and whether they have integrated headphones; the latter is especially relevant for audio engineers.

VR-Sound

All sounds and formats that are needed for a VR experience can be described as VR sound. This can be spatial audio, but also ordinary stereo for head-locked audio, e.g., film music in VR.

Three Degrees of Freedom (3DoF)

3DoF is mostly used in the context of 360° media and describes the possibilities for interaction or movement within the medium. Three degrees of freedom in this context means that rotations about all three axes are tracked (turning left and right, tilting sideways, and nodding up and down). For VR experiences, this means that head movements are tracked, but changes of position are not. 360° videos usually offer 3DoF.

Six Degrees of Freedom (6DoF)

6DoF adds translational movements to the three rotations of 3DoF, i.e., changes of position sideways, forwards and backwards, and up and down. 6DoF is possible in many VR games.
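For audio, the practical difference between 3DoF and 6DoF shows up when computing where a source lies relative to the listener. This sketch is limited to the horizontal plane and to yaw. It assumes a hypothetical world coordinate system with x pointing forward and y to the left at yaw 0, and azimuths counterclockwise. With 3DoF the listener is pinned to the origin and only the head rotation counts; with 6DoF the tracked position is used too, so both distance and direction change as the listener walks around.

```python
import math
from dataclasses import dataclass


@dataclass
class Pose:
    yaw_deg: float      # head rotation, tracked in 3DoF and 6DoF
    x: float = 0.0      # position in metres, only used with 6DoF
    y: float = 0.0


def source_relative_to_listener(source_x: float, source_y: float,
                                pose: Pose, six_dof: bool) -> tuple[float, float]:
    """Return (azimuth_deg, distance_m) of a source as heard by the listener."""
    listener_x, listener_y = (pose.x, pose.y) if six_dof else (0.0, 0.0)
    dx, dy = source_x - listener_x, source_y - listener_y
    distance = math.hypot(dx, dy)
    world_azimuth = math.degrees(math.atan2(dy, dx))
    relative = world_azimuth - pose.yaw_deg
    return (relative + 180.0) % 360.0 - 180.0, distance


pose = Pose(yaw_deg=0.0, x=1.0)   # listener has stepped 1 m forward
print(source_relative_to_listener(2.0, 0.0, pose, six_dof=False))  # position ignored
print(source_relative_to_listener(2.0, 0.0, pose, six_dof=True))   # source now closer
```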

Terms worth knowing that are sometimes considered 3D audio

Binaural Beats

Binaural beats have nothing to do with 3D audio or dummy head recordings. They are a psychoacoustic effect that occurs when each ear is played a tone of slightly different frequency over headphones. This results in a pulsating sound; unlike an acoustic beat, however, this pulsation is not caused by the superposition of sound waves, because the two tones never mix physically, but arises in the auditory system.
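A binaural-beat stimulus is easy to generate: each ear receives its own pure tone, and the perceived pulsation rate corresponds to the difference between the two frequencies. The sketch below uses only the standard library; the sample rate of 44.1 kHz and the example frequencies of 200 Hz and 210 Hz (giving a 10 Hz beat) are arbitrary assumptions.

```python
import math

SAMPLE_RATE = 44_100  # Hz, assumed


def binaural_beat(left_hz: float, right_hz: float, seconds: float,
                  amplitude: float = 0.5) -> tuple[list[float], list[float]]:
    """Return (left, right) sample lists containing one pure tone per ear."""
    n = int(seconds * SAMPLE_RATE)
    left = [amplitude * math.sin(2 * math.pi * left_hz * i / SAMPLE_RATE)
            for i in range(n)]
    right = [amplitude * math.sin(2 * math.pi * right_hz * i / SAMPLE_RATE)
             for i in range(n)]
    return left, right


def beat_rate(left_hz: float, right_hz: float) -> float:
    """Perceived binaural beat rate in Hz."""
    return abs(left_hz - right_hz)


left, right = binaural_beat(200.0, 210.0, seconds=1.0)
```

Since the two channels are never summed, neither signal contains any amplitude fluctuation on its own; the 10 Hz pulsation only appears once the brain combines the two ears.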

ASMR

ASMR stands for Autonomous Sensory Meridian Response, a phenomenon of physical sensations such as tingling or tickling in certain parts of the body that are triggered by certain sounds. These sensations are perceived as pleasant, and many people use them for relaxation. The sounds are mostly “intimate” noises such as whispers or soft voices. An artificial head microphone is often used for these recordings.
