Music & Sound in association with Jungle Studios

Sonic Boom: Jack Goodman Digs Deep into Immersive Sound

Post Production
New York, USA
Forager's commercial and narrative sound designer on the future of audio through immersive sound

Jack Goodman is an award-winning commercial and narrative sound designer with expertise in technical audio engineering, loudness perception and psychoacoustics. He mixes and designs broadcast content for major brands including Ford, Pepsi, Oakley, the NFL, H&M, Red Bull and McDonald's, and his work has aired on networks including CBS, NBC, BBC, Disney, HBO, Hulu and NatGeo. His original work on the 2021 Ford Bronco national broadcast spot, directed by Jimmy Chin, won gold in the General Sports and General Documentary categories of the 42nd Annual Telly Awards.

Jack is also a recognised music producer and composer, with syncs on shows including Arrow on The CW, Broad City on Comedy Central, Treadstone on USA Network, and Emily in Paris on Netflix. Recently, Jack's original song Beach Break (produced for artist Julietta) won the official sync for Starbucks' summer 2021 broadcast campaign, which aired on national TV for four months.

Jack splits his commercial sound design work with narrative and long-form content, and has been the sound designer and mixer for three feature broadcast documentaries as well as three television series. He has also spent the last seven years in the Hollywood film production industry, where he has become an acclaimed location sound recordist and mixer. Recently, Jack was featured in Headliner Magazine Issue #2 ('Jack Goodman, The Art Of Flight'), where his experience recording rappers and singers while skydiving and aboard hot air balloons broke new ground in wireless transmission techniques.

Sound is experienced all around us. We hear things in every direction, and sound reproduction has always at least tried to account for that. You have probably heard the term immersive audio by now, but I would like to tell you a little more about it: how we got here, what it does, how it relates to immersive audio over headphones (including Apple's new Spatial Audio), and what it means for the future of audio as we know it!

Let us start with the advent of stereo, where the use of two speakers played on the idea that we have two ears. For the first time, sound could be placed in a frontal panorama that enhanced the perception of realism by allowing a signal to be positioned across more than one speaker. A left side and a right side were now possible.

Later developments in surround sound formats (such as 5.1 and 7.1) brought enhanced localisation of sound sources by allowing mixers to move signals around the listener. But neither technology could account for how we perceive sound in real life as truly immersive, with directional signals enveloping us through 360 degrees… and neither format would lead to the possibility of hearing immersive sound over headphones.

Enter Dolby Atmos! An exciting new immersive mixing format that allows for detailed sound placement in front of, behind, and above the listener. More than just adding height speakers to a surround format, Dolby Atmos uses metadata to place object-based audio in pinpoint locations throughout the space. This is different from conventional surround sound, where the mixer positions a sound in a general zone or channel. It's essentially far more precise, and it's completely scalable… meaning that all the end user has to do is tell the Dolby Atmos Renderer how many speakers they have and where they are. The renderer then plays back the full mix faithfully to the capabilities of the room, without the need for an additional downmix or re-mix by another sound engineer. (Very different from current practice, where a 5.1 surround mix, for example, must additionally be mixed in stereo for any user who doesn't have surround sound. More time-consuming… more expensive.) What's more fascinating, however, is that if you first mix something in full-scale Dolby Atmos, the stereo version coming out of the Dolby Atmos Renderer actually sounds better and more detailed than the mix would have if you had started out in stereo!
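To make the idea of object-based audio concrete, here is a minimal Python sketch of the scalability described above. This is not Dolby's actual renderer: the 2D positions, the simple inverse-distance panning, and all the names are illustrative assumptions. The point is only that one set of audio objects carrying position metadata can be rendered to whatever speaker layout the end user declares, with no separate downmix.

```python
import math

def render_objects(objects, speakers):
    """Mix object-based audio to an arbitrary speaker layout.

    objects:  list of (sample_value, (x, y)) pairs, where (x, y) is the
              position metadata carried alongside the audio.
    speakers: dict mapping speaker name -> (x, y) position in the room.

    Returns one output sample per speaker.
    """
    out = {name: 0.0 for name in speakers}
    for sample, (ox, oy) in objects:
        # Weight each speaker by inverse distance to the object,
        # then normalise so the total gain across speakers is 1.
        weights = {}
        for name, (sx, sy) in speakers.items():
            dist = math.hypot(ox - sx, oy - sy)
            weights[name] = 1.0 / (dist + 1e-6)
        total = sum(weights.values())
        for name in speakers:
            out[name] += sample * weights[name] / total
    return out

# The same object mix renders to whatever layout the listener reports:
stereo = {"L": (-1.0, 1.0), "R": (1.0, 1.0)}
quad = {"FL": (-1.0, 1.0), "FR": (1.0, 1.0),
        "RL": (-1.0, -1.0), "RR": (1.0, -1.0)}

objs = [(0.5, (1.0, 1.0))]           # one object, hard front-right
print(render_objects(objs, stereo))  # nearly all signal goes to "R"
print(render_objects(objs, quad))    # nearly all signal goes to "FR"
```

The mixer only ever authors the object and its position; the layout-specific gains are computed at playback time, which is what makes the format scalable.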

Why does this matter? Dolby Atmos mixes (done in rooms with a minimum requirement of 12 physical speakers) can now be scaled down to a pair of headphones, so that everyday listeners can enjoy the immersive surround experience without the complicated mess of hardware. It's called binaural audio, and it's played over two channels. But unlike its conventional stereo counterpart, binaural audio contains a wealth of information that takes the listener into a fully immersive environment, with the ability to hear sounds from all directions. So how does this work? How do we turn a Dolby Atmos mix with physical speakers all around the listener into two channels of audio over a pair of headphones that isn't just plain stereo?

The process is complicated, but in essence: measurements of a listener's head are taken in extreme detail within an anechoic chamber. With a subject in the listening position, tiny microphones are placed within the listener's ears, facing outwards towards the physical speakers on the walls around them. These speakers (in front, to the sides, behind, and above) play back test signals in turn (sine-wave sweeps), and each recorded response is known as a Head Related Impulse Response (HRIR). Together, these microphone measurements form a Head Related Transfer Function (HRTF), a profile, if you will, of the listener's head and ears: the listener's physiology. The HRTF is then used alongside wave synthesis technology to filter immersive audio mixes in real time into a binaural headphone re-render. In other words, it's as if a true Dolby Atmos immersive mix were played back in that anechoic chamber and recorded by the two ear microphones on the subject, so that it can be experienced by future listeners as binaural audio, wherever they are. Because the measurements taken include the size of the head, the shape of the ears, and the torso and neck, the resulting binaural audio (two-channel audio played back over headphones) actually sounds completely immersive (above, in front of, and behind you) even though it's played over headphones. It's as if you were in the anechoic chamber listening in real life.
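The filtering step at the heart of that re-render is, at its core, convolution: each channel of the binaural output is the source signal convolved with the measured impulse response for the corresponding ear. The sketch below uses tiny made-up HRIRs (real ones are measured, hundreds of taps long, and specific to each direction); it only illustrates how the interaural delay and level differences get imprinted onto a plain mono source.

```python
import numpy as np

def binaural_render(mono, hrir_left, hrir_right):
    """Filter a mono source through a pair of measured HRIRs to produce
    a two-channel binaural signal. Convolving with the left-ear and
    right-ear impulse responses imprints the direction-dependent delays
    and spectral cues that the HRTF captured in the anechoic chamber."""
    left = np.convolve(mono, hrir_left)
    right = np.convolve(mono, hrir_right)
    return np.stack([left, right])

# Toy HRIRs for a source to the listener's right: the left ear hears
# the sound a couple of samples later and quieter than the right ear.
hrir_right = np.array([1.0, 0.0, 0.0, 0.0])
hrir_left = np.array([0.0, 0.0, 0.6, 0.0])

click = np.zeros(8)
click[0] = 1.0  # a unit impulse as the mono source

out = binaural_render(click, hrir_left, hrir_right)
# The right channel now leads the left in time and level, which the
# brain reads as "the sound is on my right".
```

A real renderer does this per object and per direction, typically as fast FFT-based convolution so it can run in real time.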

One of the challenges with this technology, though, is that the concept of the head related transfer function is just a bit ahead of its time. Unfortunately (or fortunately, really), not all humans share the same physiology (head shape, pinna shape (the outer ear), etc.), and these factors are what really allow a person to perceive the direction of audio, especially on the height plane, where the head and ears help the brain localise audio above us. So while the handful of generic HRTFs in use translate very well for many listeners, a lot of us won't actually perceive sound from the above and behind planes as well as others… yet. Of course, if on an individual level we all went to a facility with an anechoic immersive playback chamber, put microphones in our ears, sat in a chair, and let Head Related Impulse Responses from each speaker play back into our heads… we could all have excellent custom HRTF profiles that filter immersive mixes into binaural audio over headphones in real time, with unbelievable results! The problem, though, is that it is an expensive and time-consuming approach.

A number of fascinating devices have been developed to overcome this problem of scalability. One early variation involved a physical mesh: the idea was that you could mail someone this metal chicken-wire-like mesh, they would shape it around their head and torso, and then send it back for a personalised HRTF… which never came about, for obvious reasons. More recently, however, advances in other technologies (such as facial recognition) have allowed for newer innovations. One company, Genelec, has developed a system it calls Aural ID, whereby users take photos of their ears and head and send that physiology data back to Genelec. The company then creates and returns a highly compatible custom HRTF profile that can be applied as a filter when listening to immersive audio over headphones (it's around $650).

The most important recent advancement and integration of immersive audio, though, is Apple's launch of Spatial Audio, its own version of binaural re-rendering. Here, Apple takes our immersive Dolby Atmos mixes (done with physical speakers surrounding the engineer in the mix environment) and re-renders them through its own proprietary software (which uses Apple's own collection of Head Related Transfer Function data), filtering the audio in real time to produce an extremely convincing binaural experience for nearly all listeners using AirPods Pro or AirPods Max. As this technology continues to develop and we eliminate the obstacles facing listeners who don't fit an anatomical and physiological stereotype, we will see immersive audio adopted and implemented in all walks of life. As we pair this with other new technologies, such as those that track the listener's physical head position, we can continue to enhance the possibilities of experiencing lifelike audio of a room or place we aren't actually in.

Imagine a world where we wear VR glasses that add visual objects to our immediate real-life surroundings. Now imagine if those virtual objects made sound, and that sound was produced and perceived exactly as faithfully as the neighbouring real sounds in the listener's environment. In this scenario, the person wearing the VR glasses is experiencing a virtual world synthesised over the real one, with no apparent difference in quality of perception! That's already happening, and it will happen at mass scale in the coming years.

Of course, many of us have heard the results of Dolby Atmos mixing in theatrical exhibition, where physical speakers around and above the audience allow for a new dimension of realistic sound perception. But most of us have not yet heard what it's like when those physical surround speakers are scaled down to a personalised two-channel binaural signal carrying the same immersive experience! That's already happening, and you can hear it for yourself with AirPods Max and an iPhone when watching Dolby Atmos mixes on Netflix, for example.

While this technology is no doubt taking over every facet of the audio industry, perhaps most exciting for me personally is the podcast realm. Some may argue, and I may agree, that listening is a collective experience; that you lose something in community by isolating the viewer with headphones to experience everything alone. I love watching movies and listening to music with friends. But that said, there is a time and place for everything, and I think that place, more than anywhere else, is podcasts. There's nothing more exciting, in my opinion, than an immersive podcast. For the first time ever, we can truly place the listener in the sonic world we are trying to sell, and I think the potential here is absolutely groundbreaking.

And in terms of employment: there is no doubt in my mind that the advent of immersive audio, the dedication required to learn and mix in the format, and the understanding needed to present it to listeners so they gain a more valuable experience from such content, will keep sound designers and audio post professionals in business for another great ten years. I'm looking forward to all the trends and possibilities surely to come… and we haven't even talked about music. Have you thought about a live concert where a light flying through the audience is accompanied by a pinpoint sound that travels exactly with it? Because that's already here. Wait till you hear it for yourself!

One final thought: all of this technology rests on the ability to save and recall information as metadata carried along with the content in the audio file. That means users will (very, very soon) be able to control aspects of the mix that used to be reserved for the mix engineer. Is the dialogue too quiet? Turn it up. Are the SFX too loud? Turn them down. Are you in a loud environment? Your phone will already know and apply Dynamic Range Control (DRC) to bring the loudest and quietest portions of your mix closer together, letting you hear the mix as a whole, suited to where YOU are.
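As a rough illustration of what DRC does to a signal, here is a toy Python compressor. The threshold and ratio values are arbitrary assumptions, and real delivery-format DRC is driven by metadata in the stream and applies smoothed gain over time, not this per-sample maths; the sketch only shows the basic idea of pulling peaks back towards the quiet material.

```python
def drc(samples, threshold=0.5, ratio=4.0):
    """Very simple dynamic range control: any sample whose magnitude
    exceeds the threshold has the excess reduced by the given ratio,
    pulling the loudest parts of the mix closer to the quietest."""
    out = []
    for s in samples:
        mag = abs(s)
        if mag > threshold:
            mag = threshold + (mag - threshold) / ratio
        out.append(mag if s >= 0 else -mag)
    return out

loud_mix = [0.1, 0.9, -1.0, 0.3]
print(drc(loud_mix))
# The 0.9 and -1.0 peaks are reduced towards the threshold;
# the quiet 0.1 and 0.3 samples pass through unchanged.
```

With a knob on the ratio, the listener (or their phone, sensing a noisy environment) controls how much squeeze is applied, which is exactly the kind of end-user control the metadata makes possible.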

Work from Forager
Think of a Woman