HD, 4K, 8K, 16K? The visual side of entertainment media seems to evolve rapidly, but what about audio? Next Generation Audio (NGA) is the catchy name for the latest evolution of sound in broadcasting and streaming. Formats such as MPEG-H Audio, one of the NGA codecs, aim to change the way we perceive audio content by enabling immersive 3D sound and higher audio quality.
Next Generation Audio is meant not only to provide consumers with immersive listening experiences but also to let them interact with the sound. That sounds very promising, so what’s behind it?
The DVB consortium decided to support two formats: Dolby AC-4 and MPEG-H Audio. Both support channel-based (multichannel) and object-based content while requiring comparatively little bandwidth. With the advent of immersive audio production technologies, content creators gain more creative freedom to shape how sound elements are woven together, much like a composer writing a symphony. The big advantage of object-based audio is its independence from a fixed channel layout: the sound is only rendered in the end user’s playback system. Systems that support Next Generation Audio must therefore have an appropriate decoder installed. In return, they promise playback that is always optimized for the setup at hand.
What about interaction? The magic word here is metadata. As already mentioned, rendering takes place in the playback system. So how does the decoder know what the mix should sound like? That’s right: through metadata. It is transmitted as a separate track and contains information about volume ratios, the positions of audio objects, and interaction parameters. With this information, audio systems can offer personalized listening experiences in which viewers create their own sound mix. The Audio Definition Model (ADM) serves as a standardized framework for describing such next generation audio content and its metadata during immersive audio production.
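To make the idea of rendering from metadata a little more concrete, here is a minimal Python sketch. It assumes a heavily simplified scene: each audio object carries only a level in dB and a stereo pan position between -1.0 (left) and +1.0 (right), and the “decoder” renders the objects to two loudspeakers with a constant-power pan law. The class `AudioObject`, its fields, and the function `render_stereo` are hypothetical illustrations; real MPEG-H and AC-4 renderers use their own standardized metadata and support many more loudspeaker layouts.

```python
import math
from dataclasses import dataclass
from typing import Dict, List, Tuple


@dataclass
class AudioObject:
    # Hypothetical, heavily simplified object metadata (not the MPEG-H or ADM format)
    name: str
    gain_db: float  # level set by the producer, in dB
    pan: float      # stereo position: -1.0 = hard left, 0.0 = center, +1.0 = hard right


def db_to_linear(db: float) -> float:
    """Convert a level in dB to a linear amplitude factor."""
    return 10 ** (db / 20)


def render_stereo(objects: List[AudioObject],
                  signals: Dict[str, List[float]]) -> Tuple[List[float], List[float]]:
    """Render mono object signals to a stereo pair using a constant-power pan law."""
    length = max((len(signals[obj.name]) for obj in objects), default=0)
    left = [0.0] * length
    right = [0.0] * length
    for obj in objects:
        # Map the pan position from [-1, 1] to an angle in [0, pi/2]
        p = min(1.0, max(-1.0, obj.pan))
        angle = (p + 1.0) * math.pi / 4
        g = db_to_linear(obj.gain_db)
        # cos/sin gains keep the total power constant wherever the object sits
        gain_l, gain_r = g * math.cos(angle), g * math.sin(angle)
        for i, sample in enumerate(signals[obj.name]):
            left[i] += gain_l * sample
            right[i] += gain_r * sample
    return left, right


if __name__ == "__main__":
    scene = [
        AudioObject("commentary", gain_db=0.0, pan=0.0),  # centered voice
        AudioObject("crowd", gain_db=-6.0, pan=-0.5),     # crowd noise, left of center
    ]
    signals = {"commentary": [0.5, -0.5, 0.25], "crowd": [0.1, 0.1, 0.1]}
    left, right = render_stereo(scene, signals)
    print(left, right)
```

Because the mix only comes into being inside `render_stereo`, the same object list could in principle be rendered for a 5.1 or 7.1.4 setup by swapping in a different render function; this is the channel independence of object-based audio in miniature.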
Besides mixing, the producer can set parameters that determine to what extent interaction is possible later during playback. In practice, this usually means adjustable volume ratios or a choice between different presets, for example different languages or dialogue enhancement for people with hearing loss. In theory, however, a producer could explore the full range of possibilities and let the audience intervene in every conceivable parameter during playback. An interesting idea, which in turn raises interesting questions.
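The producer-side limits can be sketched in the same simplified style. In this hypothetical model (`InteractiveObject`, `Preset`, and `apply_interaction` are illustrative names, not part of any standard or library), the producer sets a permitted gain range per object, decides whether the listener may switch it off, and defines presets such as a language choice or dialogue enhancement. Whatever the listener requests is clamped to those limits.

```python
from dataclasses import dataclass, field
from typing import Dict, Iterable, Optional, Set


@dataclass
class InteractiveObject:
    # Hypothetical interaction metadata, set by the producer
    name: str
    default_gain_db: float
    min_gain_db: float      # lowest level the listener may reach
    max_gain_db: float      # highest level the listener may reach
    can_mute: bool = False  # may the listener switch this object off?


@dataclass
class Preset:
    # Hypothetical producer-defined preset
    name: str
    gain_offsets_db: Dict[str, float] = field(default_factory=dict)
    active: Optional[Set[str]] = None  # if set, only these objects play (e.g. one language)


def apply_interaction(objects: Iterable[InteractiveObject],
                      preset: Optional[Preset] = None,
                      user_gains_db: Optional[Dict[str, float]] = None,
                      user_mutes: Iterable[str] = ()) -> Dict[str, Optional[float]]:
    """Return the effective gain in dB per object, or None if the object is silent."""
    user_gains_db = user_gains_db or {}
    user_mutes = set(user_mutes)
    result: Dict[str, Optional[float]] = {}
    for obj in objects:
        # Presets may deactivate objects, e.g. the dialogue track of another language
        if preset is not None and preset.active is not None and obj.name not in preset.active:
            result[obj.name] = None
            continue
        # A listener's mute request is only honored if the producer allows it
        if obj.name in user_mutes and obj.can_mute:
            result[obj.name] = None
            continue
        gain = obj.default_gain_db
        if preset is not None:
            gain += preset.gain_offsets_db.get(obj.name, 0.0)
        gain += user_gains_db.get(obj.name, 0.0)
        # Clamp the combined result to the range the producer permitted
        result[obj.name] = min(obj.max_gain_db, max(obj.min_gain_db, gain))
    return result


if __name__ == "__main__":
    objects = [
        InteractiveObject("dialogue_en", 0.0, -6.0, 9.0),
        InteractiveObject("dialogue_de", 0.0, -6.0, 9.0),
        InteractiveObject("ambience", -3.0, -20.0, 0.0, can_mute=True),
    ]
    dialogue_plus = Preset("dialogue_enhancement_en",
                           {"dialogue_en": 6.0, "ambience": -9.0},
                           active={"dialogue_en", "ambience"})
    print(apply_interaction(objects, dialogue_plus, user_gains_db={"dialogue_en": 6.0}))
    # → {'dialogue_en': 9.0, 'dialogue_de': None, 'ambience': -12.0}
```

Leaving `can_mute` at `False` for an object, such as a news presenter’s voice, keeps the listener from switching it off entirely, while a generous gain range still allows fine-tuning.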
If you think about the theoretical possibilities of interaction for a moment, it quickly becomes clear that giving consumers this much freedom is probably not always sensible. Think of the news, or political broadcasts in general: here, too much individualization would likely be counterproductive, even manipulative, if certain people could simply be muted. In other words, how Next Generation Audio is and will be handled depends on the content. This also raises the question of whether and how formats such as television broadcasts will evolve. The potential for new broadcasting concepts is definitely there!
Establishing these new freedoms could prove difficult, as users are not used to intervening in the mix of a broadcast. Moreover, these options are likely to appeal mainly to a tech-savvy audience and will probably go unnoticed by most people. Who knows, perhaps broadcasts designed specifically for Next Generation Audio could help ensure that the interaction options are not only noticed but also actively embraced.
The effects on the production side are just as exciting, as the mixing engineer has to say goodbye to delivering a mix set in stone. Seen pessimistically, you release a mix that will probably not be your personally preferred optimum, because you leave its completion to the consumer. Seen optimistically, however, it is a new challenge to explore and try out new things, and that is exciting.
In May 2017, South Korea introduced MPEG-H as the first Next Generation Audio codec used in a 4K UHD TV service. Major events are often important springboards for new technologies, and the 2018 Winter Olympics in Pyeongchang were an important stepping stone for Next Generation Audio. The “Rock in Rio” festival and the Eurovision Song Contest have also already been broadcast in MPEG-H. The format has since been officially adopted in China and Brazil as well. It would be interesting to know what consumers who already use these formats regularly think of them.
The audio codecs AC-4 and MPEG-H are also already used in the purely musical field, by various music streaming services. So far, however, only the immersive aspect of NGA plays a role here. The still rather limited catalogue of “3D music” is usually available only for a surcharge on top of a streaming service’s regular subscription. In addition, apart from headphones, there are still very few products on the market that can play NGA. As immersive audio spreads, it opens up new possibilities for sound design in live production and broadcasting as well. Still, it will probably take some time for the format to become established unless the content is made more easily accessible.
Strict rules apply to participation in the Eurovision Song Contest. For example, songs may not have been published before September 1 of the previous year and may not exceed three minutes in length. Most entries use almost the full 180 seconds, but shorter songs are allowed too.
Cover versions are not allowed, but the language of the song is not prescribed, so entries in invented languages are also possible. A maximum of six people per country may be on stage, and animals are forbidden.
The songs are performed live, but the instrumental backing comes from a tape. Since 2021, backing vocals may also be pre-recorded, which was not allowed before. The lead vocals, however, are still sung live, so I can’t subscribe to the prejudice that “none of them can sing.” As the audio industry evolves, broadcasters are also harnessing Next Generation Audio to offer viewers an immersive experience and to take programme production beyond the boundaries of the past.
Next Generation Audio definitely has the potential to bring a breath of fresh air to both audio consumption and production. Modern audio production tools let professionals craft tailor-made listening experiences, whether for streaming services, smart speakers, or mobile devices. It remains to be seen to what extent this potential will be exploited and, above all, whether it will reach people. In my view, a step in the right direction would be to make access more attractive for consumers on the one hand, and to make the corresponding production tools more accessible to freelance producers on the other.
The latter applies to the music sector in particular. Still, the technology is already very advanced and the creative possibilities are enormous. I’m really keen to bring immersive and interactive audio to people via NGA, and it surely can’t take much longer.