The coronavirus currently affects everyone in some way. What is striking is how differently it plays out from field to field: it has brought more movement to the VR scene, and some applications, such as virtual events, are actually benefiting from the Covid-19 lockdown.
Even though I have already taken a closer look at some VR platforms as a consultant, the providers themselves are not today’s topic. I am more interested in showing what hidden potential immersive audio holds for virtual conferences, networking sessions, and similar events, and where solutions are still needed.
Ever noticed? A Skype or Zoom meeting rarely starts with a "Hello". Instead, people greet each other with a "Can you hear me?!".
The same phenomenon occurs in VR, only in a much more complex form: because you are in a virtual space, you also have to think about sound spatially.
At real events, a few meters of distance are enough to make sure that nobody is listening in unintentionally. Reverberation is another major factor, and it helps to convey a sense of presence. Imagine having a conversation in a stairwell. In reality, your own voice and that of the person you are talking to echo through every floor. As a result, our brain immediately knows how loudly we need to speak and who is likely to hear us.
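How quickly a voice fades with distance can be sketched with the inverse distance law for a point source in open space, where the level drops by about 6 dB each time the distance doubles. The function names and the 1 m reference distance below are assumptions chosen for illustration, not the attenuation model of any particular VR platform.

```python
import math


def distance_gain(distance_m: float, ref_distance_m: float = 1.0) -> float:
    """Linear gain of a point source under the inverse distance law.

    ref_distance_m is an assumed distance at which the voice plays at full level.
    """
    # Closer than the reference distance: clamp to full level so the gain
    # does not grow without bound as the distance approaches zero.
    return ref_distance_m / max(distance_m, ref_distance_m)


def gain_to_db(gain: float) -> float:
    """Convert a linear amplitude gain to decibels."""
    return 20.0 * math.log10(gain)
```

Calling `gain_to_db(distance_gain(d))` for 1, 2, 4, and 8 meters shows the level falling in equal steps of roughly 6 dB, which is why a conversation a few meters away quickly becomes hard to follow.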
So in reality, we hear our voice twice. One part is radiated into the room, reflected, and picked up by our ears. The other part travels directly from the vocal tract to the inner ear as bone-conducted sound. This is also why our own voice sounds so unfamiliar to us on recordings: we simply hear ourselves differently.
In VR, however, the first part is missing by default because the voice is not reflected in the virtual space. It is possible to implement this, but doing so is quite computationally intensive, and games, for example, use fairly complex implementations.
So far, the topics "localization", "distance", and "presence" have been addressed. But the last one in particular still lacks a decisive factor: fine-tuning through sound design.
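To give a feel for what "adding reflections" means, here is a drastically simplified sketch: a single feedback comb filter, one of the building blocks of classic Schroeder-style reverbs. It is not what game engines actually do (they combine many such elements or simulate the room geometry), and the function name and parameters are made up for this illustration.

```python
def comb_reverb(dry, delay_samples, feedback=0.5, tail_samples=0):
    """Feedback comb filter: y[n] = x[n] + feedback * y[n - delay_samples].

    dry           -- list of input samples (floats)
    delay_samples -- delay of the simulated reflection, in samples
    feedback      -- how much of each echo is fed back (0 <= feedback < 1)
    tail_samples  -- extra output samples so the echo tail can ring out
    """
    if delay_samples <= 0:
        raise ValueError("delay_samples must be positive")
    if not 0.0 <= feedback < 1.0:
        raise ValueError("feedback must be in [0, 1) so the tail decays")

    total = len(dry) + tail_samples
    out = [0.0] * total
    for n in range(total):
        x = dry[n] if n < len(dry) else 0.0
        # Output from delay_samples ago, i.e. the previous reflection.
        delayed = out[n - delay_samples] if n >= delay_samples else 0.0
        out[n] = x + feedback * delayed
    return out
```

Feeding in a single impulse produces a train of echoes that each come back at half the previous level. Even this toy version has to touch every sample, which hints at why a convincing room simulation for many voices at once becomes expensive.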
Even if our voice – as described above – can realistically propagate in virtual space through reverb parameters, our brain still thinks we are in an empty, abstract space. If we hear reverb on our voice, we still don’t really know whether we are in a train station, a concert hall, or a restaurant. Only when we hear a basic atmosphere, such as clinking cutlery, can our brain correctly classify the place as a restaurant.
The challenge is therefore the following: good sound must not be missing, but it must not be conspicuous either. Some platforms for virtual events use audio objects to create sounds such as birds (outside) or room tone (inside). However, this is not the best approach, as such sounds are easily localized and draw unnecessary attention. Multi-channel audio formats offer better ways to breathe life into such spaces.
Let’s talk briefly about "hard sound effects", i.e. sounds that are triggered by interactions. Footsteps are a good example because they are almost the only sounds we make when we move. If they are not translated acoustically into VR, a person can suddenly and unexpectedly be standing next to you because you could not hear them approach.
In reality, we usually perceive other people unconsciously. When they stand behind us, we cannot see them, but we can hear them. We are only startled when someone sneaks up on us, because our ears could not warn us.
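One simple way to make movement audible is to play a footstep each time an avatar has covered one stride length. The sketch below is a hypothetical helper, not taken from any VR platform: the class name, the 0.7 m default stride, and the idea of returning a step count that the audio engine then plays are all assumptions.

```python
import math


class FootstepTrigger:
    """Reports one footstep for every full stride the avatar has walked."""

    def __init__(self, stride_m: float = 0.7):
        self.stride_m = stride_m   # assumed stride length in meters
        self._travelled = 0.0      # distance walked since the last footstep
        self._last = None          # previous horizontal position (x, z)

    def update(self, x: float, z: float) -> int:
        """Feed the current horizontal position; returns footsteps to play."""
        if self._last is None:
            # First call only records the starting position.
            self._last = (x, z)
            return 0
        self._travelled += math.hypot(x - self._last[0], z - self._last[1])
        self._last = (x, z)
        steps = int(self._travelled // self.stride_m)
        self._travelled -= steps * self.stride_m
        return steps
```

Called once per frame with the avatar’s position, `update` returns 0 most of the time and 1 whenever a stride is completed. A real implementation would also need to handle teleporting, which would otherwise register as a long walk.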
Speech is an important medium for communication on virtual event platforms. Nevertheless, the voices of the other participants usually sound a bit like they are coming "over the good old telephone". The reason for this is data reduction. Hearing tests and psychoacoustic analyses have shown that we do not need to hear all frequencies in order to understand speech. In social VR, this means that voices tend to sound rather electronic and tinny. A sound engineer might speak of a "bit crusher effect", and in the end that is roughly what happens: the bit rate is crushed.
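For readers who have never used one: a bit crusher is an audio effect that reduces the resolution of each sample (and often the sample rate as well). The sketch below shows only the bit-depth part, with a hypothetical function name. Real speech codecs reduce data in more sophisticated ways, so this illustrates the analogy, not how any platform actually encodes voices.

```python
def crush(sample: float, bits: int) -> float:
    """Quantize a sample in [-1.0, 1.0] to a grid with step size 2**-(bits - 1)."""
    if bits < 2:
        raise ValueError("bits must be at least 2")
    levels = 2 ** (bits - 1)                # e.g. 8 bits -> steps of 1/128
    clipped = max(-1.0, min(1.0, sample))   # keep the input inside the valid range
    return round(clipped * levels) / levels
```

At 16 bits the quantized value is practically indistinguishable from the input, while at 2 or 3 bits every sample snaps to one of a handful of values, which produces the harsh, electronic character the effect is known for.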
The built-in microphones of VR headsets like the "Oculus Quest" or "Oculus Go" actually don’t sound too bad. But the priority is real-time audio transmission, so the data packets must be kept as small as possible for speech to arrive intelligibly and without dropouts.
The first audio codecs are currently being developed specifically for virtual event software. On the platform of the provider "High Fidelity", for example, many people in a room can find their friends just by listening. "TivoliCloud-VR" has already licensed the codec. So there is a lot happening in the backend of these tools at the moment.
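A quick back-of-the-envelope calculation shows why compression matters here. All figures are illustrative assumptions (48 kHz, 16-bit mono audio, 20 ms per packet, a 24 kbit/s speech stream) and are not measured from any platform; the function names are likewise made up.

```python
def pcm_bitrate_bps(sample_rate_hz: int, bit_depth: int, channels: int = 1) -> int:
    """Bit rate of uncompressed PCM audio in bits per second."""
    return sample_rate_hz * bit_depth * channels


def payload_bytes(bitrate_bps: int, frame_ms: float) -> float:
    """Audio payload of one packet carrying frame_ms milliseconds of sound."""
    # bits per second * seconds / 8 bits per byte, with frame_ms in milliseconds
    return bitrate_bps * frame_ms / 8000.0
```

With these assumptions, uncompressed audio runs at `pcm_bitrate_bps(48000, 16)` = 768,000 bit/s, so a 20 ms packet carries 1,920 bytes of audio. At 24 kbit/s the same 20 ms fit into 60 bytes, a factor of 32 smaller, and that saving applies to every participant in the room.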
However, please observe some audio netiquette: everyone involved should wear headphones to avoid crosstalk between the audio playback and their own microphone! If the sound is played through loudspeakers, it goes straight back into the microphone. This means that conversation partners hear themselves with a slight time lag, and rather unpleasant feedback builds up. You know it from calls on Skype or Zoom. Technically, this can be addressed with echo cancellation, but it’s best not to let this obstacle arise in the first place 😉