HD, 4K, 8K, 16K? The visual side of entertainment media seems to be evolving rapidly, but what about audio? Next Generation Audio is the catchy name given to the latest audio development in the broadcasting and streaming sector.
Next Generation Audio should not only provide consumers with immersive listening experiences but should also offer interaction possibilities. That sounds very promising, so what’s behind it?
The DVB consortium decided to support two formats, Dolby AC-4 and MPEG-H. Both support multi-channel and object-based content and get by with comparatively low bandwidth. The big advantage of object-based audio is its independence from any fixed channel layout, since rendering happens only in the end user's playback system. Systems that support Next Generation Audio must therefore have an appropriate decoder installed. In return, playback can always be optimized for the setup at hand.
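To make the idea of channel independence a little more concrete, here is a minimal sketch of how a playback system might place one audio object on two different speaker setups. It is not how AC-4 or MPEG-H decoders actually render; the layouts, the `pan_gains` function, and the simple constant-power panning between the two neighboring speakers are all assumptions chosen for illustration.

```python
import math

# Hypothetical speaker layouts: name -> azimuth in degrees
# (0 = straight ahead, positive = to the left). At least two speakers each.
LAYOUTS = {
    "stereo": {"L": 30.0, "R": -30.0},
    "5.0": {"L": 30.0, "R": -30.0, "C": 0.0, "Ls": 110.0, "Rs": -110.0},
}

def pan_gains(object_azimuth, layout):
    """Constant-power panning between the two speakers that enclose the object."""
    speakers = sorted(layout.items(), key=lambda item: item[1])
    gains = {name: 0.0 for name in layout}
    count = len(speakers)
    for i in range(count):
        name_a, az_a = speakers[i]
        name_b, az_b = speakers[(i + 1) % count]  # wraps around behind the listener
        span = (az_b - az_a) % 360.0
        offset = (object_azimuth - az_a) % 360.0
        if offset <= span:
            frac = offset / span
            gains[name_a] = math.cos(frac * math.pi / 2)
            gains[name_b] = math.sin(frac * math.pi / 2)
            return gains
    return gains

# The same object (dead center) rendered for two different playback systems.
stereo_gains = pan_gains(0.0, LAYOUTS["stereo"])
surround_gains = pan_gains(0.0, LAYOUTS["5.0"])
```

On the stereo layout the object is shared equally between the left and right speaker, while on the 5.0 layout it ends up in the center speaker. The content is the same in both cases; only the rendering differs.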
So what about the interaction? The magic word here is metadata. As already mentioned, rendering takes place in the playback system. So how does the decoder know how the mix should sound? Right, through metadata. This data is transmitted as a separate track and contains information about volume ratios, positions of audio objects, and interaction parameters.
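As a rough illustration of what such metadata could look like, the following sketch describes a scene with two audio objects and applies the volume ratios when mixing. The class and field names (`AudioObject`, `SceneMetadata`, `gain_db`, `azimuth_deg`, `interactive`) are made up for this example and do not reflect the actual AC-4 or MPEG-H metadata syntax.

```python
from dataclasses import dataclass, field

@dataclass
class AudioObject:
    name: str
    gain_db: float             # volume relative to the other objects
    azimuth_deg: float         # position in the horizontal plane
    interactive: bool = False  # may the listener change this object?

@dataclass
class SceneMetadata:
    objects: list = field(default_factory=list)

def db_to_linear(gain_db):
    """Convert a gain in decibels to a linear amplitude factor."""
    return 10 ** (gain_db / 20)

def mix_mono(samples_by_object, metadata):
    """Sum one block of samples per object, weighted by the metadata gains."""
    length = min(len(samples_by_object[obj.name]) for obj in metadata.objects)
    out = [0.0] * length
    for obj in metadata.objects:
        gain = db_to_linear(obj.gain_db)
        for i in range(length):
            out[i] += gain * samples_by_object[obj.name][i]
    return out

scene = SceneMetadata(objects=[
    AudioObject("dialogue", gain_db=0.0, azimuth_deg=0.0, interactive=True),
    AudioObject("ambience", gain_db=-6.0, azimuth_deg=90.0),
])
block = {"dialogue": [1.0, 0.5], "ambience": [1.0, 1.0]}
mixed = mix_mono(block, scene)
```

The audio itself stays untouched per object; only the metadata tells the decoder how loud each object should be and where it belongs. Positions are ignored in this mono mix to keep the example short.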
Besides mixing, the producer can set parameters that determine how much interaction is possible later during playback. In practice, these are usually volume ratios that can be adjusted, or a selection of different presets, for example different languages or dialogue enhancement for people with hearing loss. Theoretically, however, a producer could push the possibilities to their limits and let the listener intervene in every conceivable parameter during playback. An interesting idea, which in turn raises interesting questions.
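A small sketch of how producer-defined limits and presets could work: the listener asks for a change, and the playback system only follows as far as the producer allows. The names (`GainLimit`, `LIMITS`, `PRESETS`, `resolve_gains`) and all numbers are hypothetical and only meant to illustrate the principle.

```python
from dataclasses import dataclass

@dataclass
class GainLimit:
    min_db: float
    max_db: float

    def clamp(self, requested_db):
        """Keep a requested gain inside the range the producer allows."""
        return max(self.min_db, min(self.max_db, requested_db))

# Hypothetical producer settings: dialogue may be changed, ambience only slightly.
LIMITS = {
    "dialogue": GainLimit(min_db=-3.0, max_db=9.0),
    "ambience": GainLimit(min_db=-12.0, max_db=0.0),
}

# Hypothetical presets offered to the listener (gains in dB per object).
PRESETS = {
    "default": {"dialogue": 0.0, "ambience": 0.0},
    "dialogue_enhanced": {"dialogue": 6.0, "ambience": -6.0},
}

def resolve_gains(preset_name, user_offsets_db):
    """Start from a preset, add the listener's offsets, then apply the limits."""
    gains = {}
    for name, preset_db in PRESETS[preset_name].items():
        requested = preset_db + user_offsets_db.get(name, 0.0)
        gains[name] = LIMITS[name].clamp(requested)
    return gains

# The listener tries to mute the dialogue and boost the ambience.
result = resolve_gains("default", {"dialogue": -60.0, "ambience": 10.0})
```

In this example the attempt to mute the dialogue stops at -3 dB and the ambience boost is capped at 0 dB, because the producer did not allow more.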
If you think briefly about the theoretical possibilities of interaction, it quickly becomes clear that giving the consumer that much freedom is probably not always sensible. Think of the news or political broadcasts in general. Here, too much individualization would probably be counterproductive, or even manipulative, if certain people could simply be muted. In other words, how Next Generation Audio is handled, now and in the future, depends on the content. The question also arises as to whether and how television broadcasts, for example, will develop in response. The potential for new broadcasting concepts is definitely there!
Establishing these new freedoms could prove difficult, as users are not necessarily used to being able to intervene in the mix of a broadcast. Moreover, these possibilities are likely to appeal mainly to a technology-minded audience and will probably go unnoticed by most viewers. Who knows, maybe broadcasts designed specifically for Next Generation Audio could help ensure that the interaction possibilities are not only noticed but also actively used.
The effects of these possibilities on the production side are also exciting, as mixers have to say goodbye to delivering a mix set in stone. Seen pessimistically, they release a mix that probably won't match their personal optimum, because its completion is left to the consumer. Seen optimistically, however, it is a new challenge that invites them to explore and try out new things, and that is exciting.
In May 2017, South Korea introduced MPEG-H as the first Next Generation Audio codec for a 4K UHD TV service. Major events are often important "springboards" for the advancement of new technologies. The 2018 Olympic Games in Pyeongchang, for example, were an important stepping stone for the use of Next Generation Audio. Furthermore, the "Rock in Rio" festival and the Eurovision Song Contest have already been broadcast in MPEG-H format (see video). Meanwhile, the format is also officially used in China and Brazil. It would be exciting to know what consumers who already use these formats regularly think of them.
The audio codecs AC-4 and MPEG-H are also already used purely for music, on various music streaming services. So far, however, only the immersive aspect of Next Generation Audio (NGA) is relevant here. The still fairly small catalog of "3D music" is usually only available for a surcharge on top of a streaming service's regular subscription. Also, apart from headphones, there are still very few products on the market that can play NGA. It will probably take some time before the format becomes established, unless the audio content is made more easily accessible.
Next Generation Audio definitely has the potential to bring a breath of fresh air into audio consumption and production. It remains to be seen to what extent this potential will be used and, above all, whether it will reach people. I think a step in the right direction would be to make access more attractive for consumers on the one hand and to make production tools more accessible for freelance producers on the other. The latter concerns the music sector in particular.
But the technology is already very advanced and the creative possibilities are endless. I really want to bring immersive and interactive audio to the people via NGA – it can’t take much longer.