Livestream Sound: Cabaret in 3D at the Münchner Lach- & Schießgesellschaft

Why cabaret in a livestream needs more than just good sound

Until now, many livestreams have above all been one thing: practical.

They worked. You could see something. You could hear something. That was important. But often the experience remained flat. It did not feel like a stage. Not like a room. Not like a real evening with an audience.

That is especially a problem with cabaret and comedy.

Because here it is not just about speech. It is about timing. About pauses. About reaction.

About the laughter in the room. About the energy that arises between stage and audience. It is exactly this part that is often lost in classic streams.

You can still hear what is being said. But you no longer experience how it is being said. And above all, not how it works in the room.

This is where binaural recordings make the key difference: their acoustic and spatial properties allow sound sources to be localized precisely in space, which creates a much more immersive sound image.

That is exactly where this project came in.

For the Münchner Lach- & Schießgesellschaft, a 3D audio demo was created to show what is actually technically and creatively possible in streaming.

The idea was not simply to deliver better sound. The idea was to bring the experience itself back.

Whoever puts on headphones should not just hear a livestream. They should have the feeling of actually sitting in the legendary venue.

This project was so exciting for me because it showed how powerful immersive audio can be in the cultural sector as well. It was not about effects. It was about presence.

When a stream manages to make the room audible again, everything changes. Then the stage is no longer just an image on the screen. Then it becomes a place again.

Then the audience no longer sits in the background as a flat reaction track. Then it becomes part of the action again.

And that is exactly what matters in cabaret.

Because a good evening does not thrive on the artists alone. It also thrives on the atmosphere in the room.

On the collective laughter. On the tension. On the moment in which a room reacts together.

This raises the question of how precisely sound sources can be localized and interpreted in space, because that decisively shapes the immersive listening experience and how much of the atmosphere comes across.

Transferring this quality into digital form was the actual purpose of this project.

In the video you can see and hear, among others, Bruno Jonas, Irmgard Knef, Faltsch Wagoni, and Alberto Lovison. For anyone wondering what new formats emerged during this period, the demo offers a pretty good insight.

Unfortunately, although the project was very promising from my point of view, it could not establish itself in the long term. Behind the scenes, it ultimately failed because of ego conflicts among the operators.

Anyone interested in this in more detail can find further background under the keyword “Frank Klötgen: Die Münchner Lach und Schieß Gesellschaft 2020–2023”.

Nevertheless, for me the work itself remains a strong example of what livestream audio can do when it is taken seriously.

Performance by two actors at the cabaret

How a stream became an audible 3D scene with binaural sound

The decisive difference lay in the approach.

The sound was not simply taken from the mixing desk and output as the livestream signal. That desk feed is what you hear in the video as “TV sound.” The 3D version, on the other hand, carries my binaural mix, and with it a completely different approach.

In this project, I was responsible for the microphone setup and created the binaural mix for the livestream.

The binaural recording was set up to enable precise spatial reproduction. With the right recording technique, binaural audio puts you right in the middle of the performance.

Microphone placement, room acoustics, and playback technology all had to be taken into account to achieve a natural listening impression. Binaural audio recreates the way we naturally hear and is best experienced over headphones.

At the center was a binaural main system placed exactly where the listener’s perception was supposed to be anchored: at a good seat in the audience, slightly elevated and facing the stage.

This main system was the basis of the entire scene. It captured the room. The direction. The natural depth gradation. So it was not only about collecting as much sound as possible, but about preserving exactly the information our hearing needs for spatial perception.

That information covers the horizontal plane, the median plane, and the height dimension, which together make it possible to localize sound sources in space.

The choice of microphones and recording technique matters here. Binaural recordings usually rely on dummy-head microphones or similar systems designed to replicate the acoustic properties of the human head and pinnae. With dummy-head or disc microphones in particular, the placement and the spacing between the capsules play a decisive role in how authentic the spatial image sounds.

In addition, there were spot microphones for the performers, for example headsets or shotgun microphones. But these spots were deliberately not intended as dry main sources. They were meant only to support intelligibility, nothing more.

Because as soon as spot microphones become too direct and too dominant, perception quickly tips into that typical “in-head” sound that many headphone-optimized productions unfortunately create.

Then the voice is no longer at the front on the stage, but suddenly in the middle of the listener’s head. That was exactly what had to be avoided.

That is why the spots were used with great restraint.

Another central building block was separate audience microphones at the sides and in the rear of the room. They were crucial in order not to depict laughter and reactions as a flat sum, but as spatially distributed events.

Keeping these channels cleanly separated is essential for authentic spatial imaging. Only then can the individual areas of the room (front, sides, rear) be clearly distinguished from one another.

In this way, as a listener, you could really perceive it: the laughter comes from the side. The reaction arises at the back of the room. The stage remains in front.

This precise localization of sound sources at a particular side or position is based on differences between the two ears: interaural time differences (ITD) and interaural level differences (ILD).
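To give a sense of the magnitudes involved, here is a minimal Python sketch of the ITD side, assuming the classic Woodworth rigid-sphere head model. The head radius and speed of sound are textbook assumptions, not values measured in this project, and real heads deviate from a sphere.

```python
import math

# Assumed constants for a simplified rigid-sphere head model (not project measurements).
SPEED_OF_SOUND = 343.0  # m/s, air at roughly 20 °C
HEAD_RADIUS = 0.0875    # m, a commonly used average head radius


def woodworth_itd(azimuth_deg: float,
                  radius: float = HEAD_RADIUS,
                  c: float = SPEED_OF_SOUND) -> float:
    """Interaural time difference in seconds for a distant source.

    Azimuth 0° is straight ahead, 90° is fully to one side. The formula
    ITD = (r / c) * (theta + sin(theta)) covers 0°..90°; a source behind the
    listener at 180° - theta gives the same value as one in front at theta.
    """
    theta = math.radians(azimuth_deg)
    return (radius / c) * (theta + math.sin(theta))


for az in (0, 30, 60, 90):
    print(f"{az:3d} deg -> {woodworth_itd(az) * 1e6:6.1f} us")
```

For a source fully to the side, this model gives roughly 0.66 ms. Because it is symmetric between front and back, ITD alone cannot tell the stage in front from laughter behind; that is where level differences and the filtering of the outer ear come in.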

That changes the effect enormously. Because it creates orientation. The room becomes readable.

The show is no longer perceived only as an audio track, but as a situation.

Spatial perception also depends on frequency. The outer ear has its own frequency response: depending on the wavelength, the shape of the pinna creates characteristic filtering effects that shape directional sensitivity and the sound image.
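A quick way to see why wavelength matters is to compare it with the size of the head. The sketch below assumes a head diameter of 17.5 cm and a speed of sound of 343 m/s (both illustrative values) and applies a crude version of the classic duplex-theory rule of thumb: long wavelengths bend around the head, so timing cues dominate; short ones are shadowed by it, so level differences dominate.

```python
SPEED_OF_SOUND = 343.0  # m/s, assumed
HEAD_DIAMETER = 0.175   # m, assumed average; not measured in this project


def wavelength(freq_hz: float, c: float = SPEED_OF_SOUND) -> float:
    """Wavelength in metres: lambda = c / f."""
    return c / freq_hz


def dominant_cue(freq_hz: float) -> str:
    """Crude duplex-theory rule: wavelengths longer than the head bend around
    it (little level difference, so timing dominates); shorter wavelengths
    are shadowed by the head (level differences dominate)."""
    return "ITD (timing)" if wavelength(freq_hz) > HEAD_DIAMETER else "ILD (level)"


for f in (250, 500, 1000, 2000, 4000, 8000):
    print(f"{f:5d} Hz  lambda = {wavelength(f) * 100:5.1f} cm  -> {dominant_cue(f)}")
```

In this crude model the crossover lands near 2 kHz. The commonly cited transition region is somewhat lower, around 1.5 kHz, and the change is gradual rather than a hard switch. At 8 kHz the wavelength is only about 4 cm, comparable in size to the pinna, which is why pinna filtering mainly shapes the upper frequency range.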

Camera and audio were not conceived separately either. The focus was not on a classic wide shot, but on a clear POV idea. So not just: “Here is the stage.” But: “Here is where you are sitting.”

The signal routing also followed this principle.

There was no classic FOH summed mix. Instead, a real 3D scene was built:

binaural main system as the spatial basis
spot microphones only to support speech
audience consciously distributed in the room
perspective fixed on a seat in the room
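The four principles above can be written down as a small scene description. The sketch below is purely hypothetical: the source names, azimuths, gain offsets, and the 60° limits are illustrative assumptions, not the settings used in this project. It only checks that a scene follows the structure of the list.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass(frozen=True)
class Source:
    name: str
    role: str           # "main", "spot" or "audience"
    azimuth_deg: float  # seen from the seat: 0 = stage ahead, 90 = right, 180 = behind
    gain_db: float      # level relative to the binaural main system


# Hypothetical values for illustration only; the real project settings are not published.
SCENE = [
    Source("binaural main (seat)", "main", 0.0, 0.0),
    Source("headset performer A", "spot", -15.0, -12.0),
    Source("headset performer B", "spot", 15.0, -12.0),
    Source("audience left", "audience", -90.0, -6.0),
    Source("audience right", "audience", 90.0, -6.0),
    Source("audience rear", "audience", 180.0, -8.0),
]


def check_scene(scene: list[Source]) -> list[str]:
    """Return a list of violations of the hypothetical scene rules."""
    problems = []
    if sum(1 for s in scene if s.role == "main") != 1:
        problems.append("exactly one binaural main system expected")
    for s in scene:
        if s.role == "spot" and s.gain_db >= 0.0:
            problems.append(f"{s.name}: spot is not below the main system")
        if s.role == "spot" and abs(s.azimuth_deg) > 60.0:
            problems.append(f"{s.name}: spot placed outside the frontal stage area")
        if s.role == "audience" and abs(s.azimuth_deg) < 60.0:
            problems.append(f"{s.name}: audience mic collapses onto the stage direction")
    return problems


print(check_scene(SCENE) or "scene follows the rules")
```

The point of writing it this way is that the spots are defined relative to the main system rather than as independent sources, which mirrors the idea that they only support speech.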

Particularly important here was preserving the time and level differences, that is, exactly the acoustic information our hearing uses to perceive direction and position.

In technical terms, this means keeping ITD and ILD intact, which is why clean channel separation mattered so much. For the listening experience, it simply means: the room feels believable.
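The following sketch makes the point about summed mixes concrete using synthetic noise (a hypothetical test signal, not project audio). It builds a two-channel signal with a known time and level offset, recovers the ITD by cross-correlation and the ILD from the RMS ratio, and then shows that a mono sum leaves both cues at zero.

```python
import math
import random

RATE = 48_000  # samples per second (assumed)


def rms(x):
    return math.sqrt(sum(v * v for v in x) / len(x))


def ild_db(left, right):
    """Interaural level difference in dB; positive means left is louder."""
    return 20 * math.log10(rms(left) / rms(right))


def estimate_itd(left, right, max_lag):
    """Lag in samples that maximises the cross-correlation.
    Positive means the right channel arrives later than the left."""
    n = len(left)
    best_lag, best_val = 0, -math.inf
    for k in range(-max_lag, max_lag + 1):
        val = sum(left[i] * right[i + k]
                  for i in range(max(0, -k), min(n, n - k)))
        if val > best_val:
            best_lag, best_val = k, val
    return best_lag


def make_pair(n, delay, level_db, seed=1):
    """Hypothetical source on the listener's left: the right ear receives the
    same noise `delay` samples later and `level_db` quieter."""
    rng = random.Random(seed)
    src = [rng.uniform(-1.0, 1.0) for _ in range(n + delay)]
    gain = 10 ** (-level_db / 20)
    left = src[delay:delay + n]
    right = [gain * v for v in src[:n]]
    return left, right


left, right = make_pair(n=4800, delay=20, level_db=6.0)
itd = estimate_itd(left, right, max_lag=48)
print(f"stereo: ITD = {itd} samples ({itd / RATE * 1e6:.0f} us), "
      f"ILD = {ild_db(left, right):.1f} dB")

mono = [(lv + rv) / 2 for lv, rv in zip(left, right)]  # summed, FOH-style
print(f"mono:   ITD = {estimate_itd(mono, mono, 48)} samples, "
      f"ILD = {ild_db(mono, mono):.1f} dB")
```

Once both ears receive the same summed signal, there is simply no time or level difference left for the hearing to evaluate.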

Restraint was also crucial from a creative point of view.

The reverb was not created artificially. It came from the real room. The dynamics were preserved. The audience was allowed to be quieter or louder than the speech. Not everything was flattened by broadcast compression.
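A deliberately simplified, memoryless compressor can illustrate what broadcast-style compression does to these differences. The tone levels, the -20 dB threshold, and the 4:1 ratio are all hypothetical. Real broadcast processors use attack and release envelopes, so this only shows the direction of the effect.

```python
import math

RATE = 48_000  # samples per second (assumed)


def tone(amp, freq, seconds):
    """Sine test tone standing in for programme material (hypothetical)."""
    return [amp * math.sin(2 * math.pi * freq * n / RATE)
            for n in range(int(seconds * RATE))]


def rms(x):
    return math.sqrt(sum(v * v for v in x) / len(x))


def static_compress(x, threshold_db=-20.0, ratio=4.0):
    """Memoryless compressor: the part of each sample's level above the
    threshold is divided by `ratio`; quieter samples pass unchanged."""
    out = []
    for v in x:
        if v == 0.0:
            out.append(0.0)
            continue
        level_db = 20 * math.log10(abs(v))
        if level_db > threshold_db:
            level_db = threshold_db + (level_db - threshold_db) / ratio
        out.append(math.copysign(10 ** (level_db / 20), v))
    return out


def gap_db(speech, laughter):
    """How far the laughter sits above the speech, in dB (RMS-based)."""
    return 20 * math.log10(rms(laughter) / rms(speech))


speech = tone(0.05, 440, 1.0)    # quiet "speech"
laughter = tone(0.9, 300, 0.25)  # much louder "laughter"

print(f"uncompressed gap: {gap_db(speech, laughter):.1f} dB")
print(f"compressed gap:   "
      f"{gap_db(static_compress(speech), static_compress(laughter)):.1f} dB")
```

Uncompressed, the laughter sits about 25 dB above the speech. After compression, the gap shrinks to well under 15 dB, and that lost difference is exactly what makes a live room believable.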

Because it is exactly these differences that make live experiences believable.

What results emerged in the end

In the end, what emerged was not an ordinary stream.

What emerged was a format in which remote viewers could perceive the show as a spatial event, not just as a video with sound.

With headphones, you actually had the feeling of sitting in the room. The stage was clearly positioned in front of the listener. Not in the head, but in front of you.

The audience could be felt all around the listener. Laughter came from the side or from behind. The hall opened up toward the back. The performers stayed in front.

The analysis and evaluation of spatial perception and implementation quality showed that the localization of sound sources and the immersive effect of the livestream audio format were particularly convincing.

This created something that classic stereo streams often fail to achieve: presence.

You could hear who was speaking at that moment. You could hear where a reaction was coming from. You could grasp the entire evening as a spatial situation instead of just being served individual signals.

The result was not only more immersive, but also more pleasant to listen to.

Because when a stream is spatially organized, the brain has less sorting to do. That reduces fatigue.

You stay concentrated longer. You listen in a more relaxed way. Especially with word-heavy formats such as cabaret, that is a major advantage.

For me, that was exactly the actual strength of this project.

It showed that livestreaming does not necessarily have to sound like an emergency solution. On the contrary: if you build it correctly, a stream can develop a quality of its own. One that comes close to really being there.

And in my view, that is more than just a technical gimmick.

It is an indication of how cultural formats can also preserve their spatial power digitally.

The Münchner Lach- & Schießgesellschaft was an ideal example of this. A venue rich in tradition. A strong program. An audience that is part of the action.

Through 3D audio, this exact combination could not only be transmitted but experienced in a new way.

That is why this project remains important proof for me.

Proof that with good microphone setup, binaural mixing, and a clear perspective, you can turn a stream back into a real evening.

Not simply television. But stage.

Feel free to contact me about projects of this or a similar kind.