George Lucas likes to say, “Sound is fifty percent of the movie-going experience” and not just because it helps sell home theater components branded with his proprietary THX Sound logo. Sound is equally important for video producers. YouTube today is a long way from the platform we remember from the days of yore. The resolutions and sound fidelity on hand are truly impressive and make poor audio even more noticeable and harder to forgive. In this article, we cover everything you need to know about audio for video.
When it comes to recording audio for video, great recordings begin in good environments. Poor location choices can haunt you for the rest of a project. They result in time lost on noise cancellation, editing around sounds and adding overdubs. It’s better and easier to have the best recording environment possible.
Oftentimes, you don’t have a choice with location sound, but capture as much as you can. This includes capturing environmental sound, room tone and dialogue. Scout ahead when possible, use directional mics and take lots of notes. Detailed notes make everything easier during post-production. Take down information like time codes, take numbers, performance notes and edit points. Basically, write down any information that helps you remember and differentiate between takes, and that helps create a checklist of work for later.
Staying in sync
How do you sync up the sound? For decades, Hollywood recorded audio to tape and then synced it to the picture in the editing room. To simplify the mating of sound and picture, Hollywood relies on clapperboards, which simultaneously generate a loud audio “clack!” and the filmed image of the two boards coming together as a visual cue to sync up the audio and video. Today, we scrub by frame and align the audio and video at the sample level. Now that’s progress.
That basic method, or the more primitive version of a handclap on camera, has worked since the earliest days of the first talkies in the late 1920s. Possibly the best sync option to consider is recording audio to an external digital recorder. Another is to use the audio from the camera’s built-in microphone as a guide: in your DAW, slide the waveforms of the onboard and dedicated audio tracks until they align. This works provided both audio tracks use the same settings; more on that below.
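The waveform-sliding step can also be estimated numerically. The sketch below is a minimal illustration, not how any particular DAW does it. It assumes both tracks are already decoded into lists of samples at the same sample rate; the function name `find_offset` and the toy "clap" signals are hypothetical.

```python
def find_offset(guide, external, max_lag):
    """Return the lag (in samples) at which `external` best matches `guide`.

    A positive result means the matching audio occurs that many samples
    later in `external` than in `guide`.
    """
    best_lag, best_score = 0, float("-inf")
    for lag in range(-max_lag, max_lag + 1):
        # Brute-force cross-correlation score for this candidate lag.
        score = 0.0
        for i, g in enumerate(guide):
            j = i + lag
            if 0 <= j < len(external):
                score += g * external[j]
        if score > best_score:
            best_lag, best_score = lag, score
    return best_lag


# Toy example: a short "clap" at sample 10 in the camera track
# and at sample 25 in the external recorder track.
clap = [1.0, -0.8, 0.6, -0.4, 0.2]
camera = [0.0] * 10 + clap + [0.0] * 35
recorder = [0.0] * 25 + clap + [0.0] * 20

offset = find_offset(camera, recorder, max_lag=30)
print(offset)  # 15: slide the recorder track 15 samples earlier
```

The brute-force loop is only practical for short excerpts, such as a second or two around the clap. For full-length takes, rely on your editor's own sync tools.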
The last pieces of the synchronization puzzle are your audio project’s sample rate and frame rate (fps) settings. Audio for video uses a sample rate of 48 kHz at a minimum, with the option of 96 kHz. The frame rate is ultimately decided by your target medium. Frequently used rates are 24 fps for film, 25 fps for PAL and 29.97 fps (drop-frame) for NTSC, while 30 fps is common for YouTube videos. Videos shot or played at 60 fps are popular for video games and action cam footage. The most important part is that your audio settings match those of the camera, and vice versa, at the start of the project.
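To see why matching settings matters, it helps to work out how many audio samples sit under one video frame, and how far sound and picture drift apart if the frame rate is set wrong. The following is a rough sketch under stated assumptions: the helper names `samples_per_frame` and `drift_seconds` are made up for illustration, and 29.97 fps is modeled as the exact NTSC ratio 30000/1001.

```python
from fractions import Fraction

SAMPLE_RATE = 48_000  # Hz, the minimum sample rate mentioned above

# 29.97 fps is exactly 30000/1001; the other rates are whole numbers.
FRAME_RATES = {
    "24 (film)": Fraction(24),
    "25 (PAL)": Fraction(25),
    "29.97 (NTSC)": Fraction(30000, 1001),
    "30": Fraction(30),
    "60": Fraction(60),
}


def samples_per_frame(sample_rate, fps):
    """Number of audio samples that span one video frame."""
    return Fraction(sample_rate) / fps


def drift_seconds(duration_s, assumed_fps, actual_fps):
    """Sync error after `duration_s` seconds if footage running at
    `actual_fps` is treated as if it ran at `assumed_fps`."""
    frames = Fraction(duration_s) * assumed_fps
    return frames / actual_fps - duration_s


for label, fps in FRAME_RATES.items():
    spf = samples_per_frame(SAMPLE_RATE, fps)
    print(f"{label:>13} fps: {float(spf):.1f} samples per frame")

# One hour of 29.97 fps footage treated as 30 fps.
hour_drift = drift_seconds(3600, Fraction(30), Fraction(30000, 1001))
print(f"Drift after one hour: {float(hour_drift):.1f} s")
```

Even a mismatch as small as 30 versus 29.97 fps adds up to a few seconds over an hour, which is far more than the frame or two at which lip-sync errors become visible.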
Another obvious but important factor when it comes to recording audio for video is microphones. Microphones are categorized by their transducer design, polar pattern and form factor. The two most common designs are dynamic and condenser microphones. Other types and variants also exist, such as ribbon microphones, along with large versus small diaphragms and matched stereo pairs. Dynamic microphones require no supplemental power, whereas condensers typically need it, often in the form of 48 V phantom power. Condenser microphones are more sensitive because they use lower-mass diaphragms that move more easily and pick up more sound.
A microphone’s pickup pattern is its most important feature, as it decides whether the microphone is the right tool for the job. Polar patterns determine the microphone’s directionality or lack thereof. The shotgun microphone is a staple and darling of the video industry, and for good reason. It has an exaggerated hypercardioid pattern that provides excellent front pickup and good rejection at the sides and back. Pair that pattern with a condenser capsule and you end up with a sensitive, directional microphone that is a must-have.
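Directionality can be put into numbers with the textbook first-order polar pattern model, where sensitivity at angle θ is a + (1 − a)·cos θ. The sketch below uses that idealized model with commonly quoted values of `a`; it is an illustration only. Real microphones vary with frequency, and shotgun mics use an interference tube that this simple formula does not capture. The function name `relative_level_db` is hypothetical.

```python
import math

# Idealized first-order patterns: sensitivity = a + (1 - a) * cos(angle).
PATTERNS = {
    "omnidirectional": 1.0,
    "cardioid": 0.5,
    "supercardioid": 0.37,
    "hypercardioid": 0.25,
}


def relative_level_db(a, angle_deg):
    """Pickup level in dB relative to on-axis (0 degrees) sound."""
    gain = abs(a + (1.0 - a) * math.cos(math.radians(angle_deg)))
    if gain < 1e-9:
        return float("-inf")  # a perfect null in the ideal model
    return 20.0 * math.log10(gain)


for name, a in PATTERNS.items():
    side = relative_level_db(a, 90)
    rear = relative_level_db(a, 180)
    print(f"{name:>16}: side {side:6.1f} dB, rear {rear:6.1f} dB")
```

In this model the cardioid rejects sound from directly behind almost completely, while the tighter patterns trade some rear rejection for better rejection at the sides.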
That brings us to the different microphone types: handheld, mounted, suspended, boundary, lavalier and wireless. Most handheld microphones can also be mounted using standard clips and mounts that attach to stands, booms and camera attachments. It is normal to see shotgun mics mounted to cameras and booms. Stage productions use a blend of suspended overheads, dynamic boundary mics along the front of the stage, and flesh-colored wireless headsets.
Lavaliers come in two flavors: omnidirectional and cardioid. When directionality is not an issue and you need good pickup regardless of placement, the easy choice is to go with the omni. Cardioids offer more directionality and good rejection of ambient sounds, but are also more susceptible to wind and mechanical noise.
Microphone placement is also important when recording audio for video. The microphone placement mantra in video production is always: the less visible, the better. We admire a well-placed lavalier with a tidy wire loop. You can refer to our “How to use a lavalier” article for more details on placement and tips. Additional hiding may be required; visibility does sometimes present challenges and will require compromises. Boom-mounted mics make that much more sense because of the value of capturing direct sound where possible.
Forget the camera’s onboard mic
When recording audio for video, it’s important that your camera has a good microphone. Camera buying patterns have moved on from consumer camcorders, and a lot of video is now shot on DSLR cameras. Onboard microphones still leave room for improvement because of the limitations of mounting a small-diaphragm mic capsule in a camera body. We recommend pursuing some of the alternatives below.
If the camera has a built-in mic jack, by all means use it! Moving the mic closer to the subject and away from the noise of the camera is a good thing. The Rode VideoMic NTG is a camera-mountable shotgun mic and a great candidate for the job.
Some firms like BeachTek make adaptors that function as a base for the camera and can connect microphones using XLR. These adaptors have an additional advantage of stabilizing the camera for handheld shots because of their additional weight.
If your camera lacks an external mic input, we will look at a workaround later in the article. In the meantime, let’s consider several options for successfully miking on-air talent, including shotgun, handheld, and lavalier mics. Experiment to find out what mics work best in your situation. In general, the closer to the source the mic is, the better the sound will be.
Rode’s Lavalier is discreet, portable and compatible with multiple systems. A shotgun mic helps to facilitate interviews where it’s impossible or awkward to attach a lavalier to the guest. The SM58 microphone works as both a handheld interview mic and a tabletop mic for voice-over duties. Its price-to-performance ratio is still phenomenal, and given that the SM57 and SM58 share capsules, it’s common to have a dozen lurking about.
p-p-Plosives and room tone
Making use of windscreens and filters helps reduce wind noise and plosives (popped Ps). Covers make lavaliers more visible, so use them with caution. This is fine for news and documentaries, but their use in film does require some effort around concealment.
One simple hint that is easy to forget in the heat of a field recording is to capture room tone. In between recordings, capture about a minute of room tone: the sound of the room when no one is speaking or moving. This will become extremely useful in the editing process.
Dual system audio – no mic jacks? No problem!
If you have a favorite camera, but it’s missing a jack for an external microphone, here are some alternatives.
Dual system audio uses a separate recorder, for example the Zoom H4n, to capture sound that is synchronized later. This leaves you free to use microphones of your choosing or, in a pinch, the recorder’s built-in X-Y stereo mics.
You can alternatively record MOS, as they said in the early days of talkies (pidgin German for “mit out sound,” a.k.a. without sound), and then add music, sound effects and narration in the editing bay.
Head for the DAW house
Digital audio workstations (DAWs) are the standard production tool in the recording industry. They are used for everything from recording and editing to plugin processing, mastering and final output. The benefits of using a DAW are clear: it makes editing easier and more precise. Time can be measured in bars, beats, ticks and seconds, with zoom levels all the way down to the sample level.
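The different time units a DAW displays are all views of the same position. As a rough sketch of how a musical position maps to seconds and samples, the example below assumes a fixed tempo, a 4/4 meter and 480 ticks per beat. Those values, and the function name `position_to_seconds`, are assumptions for illustration; tick resolution and tempo handling vary between DAWs.

```python
SAMPLE_RATE = 48_000  # Hz, typical for audio-for-video projects


def position_to_seconds(bar, beat, tick, bpm=120.0, beats_per_bar=4, ppq=480):
    """Convert a bars/beats/ticks position to seconds.

    Bars and beats are 1-based (the project starts at bar 1, beat 1);
    ticks are 0-based subdivisions of a beat, `ppq` ticks per beat.
    """
    total_beats = (bar - 1) * beats_per_bar + (beat - 1) + tick / ppq
    return total_beats * 60.0 / bpm


seconds = position_to_seconds(bar=3, beat=2, tick=240)
samples = round(seconds * SAMPLE_RATE)
print(seconds, samples)  # 4.75 228000
```

For dialogue and effects work you will mostly think in seconds, frames and samples, but music cues delivered by a composer are often referenced in bars and beats.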
This is the same approach used by the recording industry. Artists perform multiple takes where the best versions are edited together. There are still acts out there that record in real time and mostly use overdubs to layer in additional sounds. Performing in a recording session is a skill that develops over time and one that some artists are renowned for.
Plugins originate from physical outboard equipment such as compressors/limiters, EQs, noise gates, reverbs and delays. The digitization that produced DAWs also created plugins and saw a lot of well-known equipment receive digital emulations. Software companies created swaths of plugins designed to enhance vocal and instrumental tracks. These same plugins are incredibly helpful for processing video dialogue.
You can perform some pretty extreme dialogue edits with pitch-shifting software like Antares Auto-Tune and Melodyne. More subtle free options exist, for example, Logic’s Time and Pitch Machine. They can create Mickey Mouse and Darth Vader effects by pitching an entire vocal up or down. You can also use them to tweak the pitch at the end of a sentence, taking a sentence that concluded on an upturned phrase end and giving it a more definitive ending. In other words, you can make an “uptalker” sound a bit less like a Valley Girl. Judicious use of pitch and formant correction can help blend the editing of multiple dialogue readings into a cohesive whole.
Audio Damage makes a variety of unusual and offbeat computer music plugins and sells a very affordable plugin called Discord. It recreates Eventide’s original H910 harmonizer from the mid-1970s. While lots of digital delays provide echo…echo…echo…, Discord pitch-shifts that echo so that it descends or rises in pitch as it fades away. If you’ve heard David Bowie’s hit 1975 single Fame, you know the sound of the Eventide Harmonizer’s impact on vocals: the song’s title repeats from a high Mickey Mouse voice down to a pitch a couple of octaves lower. Combine this same effect with Discord’s digital delay to create a transition between scenes.
iZotope’s RX 8 suite, which works as both a standalone program and a plugin, can provide surprisingly transparent-sounding noise reduction. That makes it great for muting computer fans and other background noise from a recording. It is also effective at reducing digital distortion and can rescue recordings that came in too hot.
I was happy to find out that the SoundSoap application survived BIAS Inc’s closure in 2012. The SoundSoap 5 plugin is now owned by Antares Tech, a.k.a. Auto-Tune, and still works the way I remember. It colors the audio more than iZotope’s RX, but does wonders to salvage audio tracks impacted by noise. I originally used BIAS Peak for editing and summing stereo mixes and prefer its color over bouncing from the DAW.
Finally, Ozone, iZotope’s mastering plugin, has a variety of applications for processing vocals. The compression function can do a lot to level a vocal while automatically staying just under the distortion level. The exciter function uses clever psychoacoustics to lift a lead vocal or instrument above competing sounds. The mastering setting adds the final processed, compressed polish when applied to the master stereo file.
The audio spectrum
It is useful to understand the frequency range that humans can hear, known as the audio spectrum. This range spans from 20 Hz to 20,000 Hz. Understanding frequencies is important when it comes to your mixing workflow; hearing fatigue is a thing.
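Two quick calculations put that range in context. The sketch below is a simple illustration with hypothetical helper names (`octave_span`, `nyquist_hz`, `covers_audible_range`). It counts how many octaves the audible range covers, and uses the standard rule that digital audio can represent frequencies up to half its sample rate to check which sample rates cover the full range.

```python
import math

LOW_HZ, HIGH_HZ = 20.0, 20_000.0  # nominal limits of human hearing


def octave_span(low_hz, high_hz):
    """Number of octaves (doublings of frequency) between two frequencies."""
    return math.log2(high_hz / low_hz)


def nyquist_hz(sample_rate_hz):
    """Highest frequency a given sample rate can represent."""
    return sample_rate_hz / 2.0


def covers_audible_range(sample_rate_hz):
    """True if the sample rate can represent frequencies up to HIGH_HZ."""
    return nyquist_hz(sample_rate_hz) >= HIGH_HZ


print(f"Audible range: about {octave_span(LOW_HZ, HIGH_HZ):.1f} octaves")
for rate in (32_000, 44_100, 48_000, 96_000):
    print(rate, nyquist_hz(rate), covers_audible_range(rate))
```

Because hearing is roughly logarithmic, the bottom few octaves occupy only a few hundred hertz, which is exactly where small speakers tend to fall short.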
Audio accuracy is critical when it comes to audio for video, in the same way that color accuracy is critical for our cameras and display monitors. A suboptimal monitoring setup can contribute to uneven levels, frequencies and stereo images. The most common issue is poor frequency reproduction when using smaller speakers with a poor low-frequency response. This can manifest as too much or too little in low, mid, and high ranges. The problem moves the other way when using retail or portable speakers whose designs tend to be bass-heavy. You need to know how to monitor audio to ensure the best product.
Our last pearl of wisdom is to recommend a pair of headphones to complement your speakers. Switching between headphones and speakers is common and you will likely be using the same pair out in the field.
One of the early steps in recording audio for video is sound design. The sound design phase is where you decide what your video’s world will sound like. Start pulling together ideas during the preproduction phase and decide what your project’s sound will need. Depending on the size of the project, you may be flying solo or working with a team.
Look to divide the work between the logical audio production centers: recording, mixing, Foley, ADR and music. It’s very possible that the same person will fill several positions, and one-person shows are not unheard of.
Make considerations for each step of the project. What does the location sound offer? How much of that dialogue will be used and how much will be recorded during ADR? Has anything inspired any sound design ideas? Do you have access to Foley, or are you planning to use it? Will you be relying on sound effect libraries instead, or using both?
Is anyone composing original music? Will you be using paid libraries and resources, or going the royalty-free route? We would recommend against going without music unless music is not appropriate for your project. The above are just some of the questions you will be asking yourself as part of your resource planning.
You are now at the point where everything comes together and you’re ready for sound mixing. An aspect of design to consider is your reference material. What are your external influences in terms of style and quality? Consider elements like dynamics, levels, and panning. Use them during your mixing as reference points to help guide you along the way.
We include editing in this step because it is an absolute must before committing to your mix. Editing means lining up your tracks, creating your timeline, stripping away silence, removing unwanted sounds, and checking for correctness. This is a good time to export a proxy video file to your DAW and save some processing power.
Start with the dialogue and then move on to any material that has been provided. You can then start introducing new sounds once you have a flowing narrative. Lock in your dialogue takes and note any ADR that needs to be recorded.
The next step is the first pass of levels, panning and basic EQ. Give parts their space in the mix in terms of levels and stereo placement. Try to leave at least 6 dB of headroom in your mix (peaks no higher than -6 dBFS) so that mastering can set the final output levels.
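Headroom is simply the distance between your loudest peak and digital full scale. As a minimal sketch, assuming floating-point samples where full scale is 1.0, the example below measures the peak level in dBFS and works out the gain needed to bring the mix down to a chosen headroom. The function names and the sample values are hypothetical; in practice your DAW's meters do this for you.

```python
import math


def peak_dbfs(samples):
    """Peak level in dBFS for float samples where full scale is 1.0."""
    peak = max(abs(s) for s in samples)
    if peak == 0:
        return float("-inf")  # digital silence
    return 20.0 * math.log10(peak)


def gain_for_headroom(samples, target_headroom_db=6.0):
    """Linear gain that places the peak `target_headroom_db` below full scale."""
    peak = max(abs(s) for s in samples)
    if peak == 0:
        return 1.0  # nothing to scale
    target_peak = 10 ** (-target_headroom_db / 20.0)
    return target_peak / peak


mix = [0.10, -0.90, 0.50, 0.72, -0.33]  # made-up samples from a hot mix
print(f"Peak before: {peak_dbfs(mix):.2f} dBFS")

gain = gain_for_headroom(mix, target_headroom_db=6.0)
trimmed = [s * gain for s in mix]
print(f"Peak after:  {peak_dbfs(trimmed):.2f} dBFS")
```

A static trim like this only sets the overall level. It does not change the balance between tracks, which is still decided in the mix.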
Test your mix on other systems throughout this stage. The goal is to hear consistency across multiple systems.
Once the levels and processing are finalized, the last step is automating those changes. You can automate anything from levels, mutes and pans to EQ and individual plugin settings. The very last step is to export a flat file for mastering.
Sounds from the studio
When recording audio for video, there are a few instances in which audio is recorded in the studio, away from all the on-set filming. These include voice-overs, Foley and ADR.
In video, voice-over is often used to narrate what is happening on the screen. The voice-over is read from a script and typically by a specialist voice actor. Synchronous dialogue occurs when the narration mirrors the action happening on screen. Asynchronous dialogue is usually pre-recorded and placed over the film and is common in documentary or news reporting.
Foley is the art of reproducing everyday sounds, which are added to films to enhance the audio. A Foley artist uses a number of odd tools, such as shoes, rope, metal and so much more, all of which help mimic the sounds of the film.
ADR, or automated dialogue replacement, is used to fix any dialogue that might not have been picked up well enough during filming. The original actors re-record those bits of dialogue, which are then added to the footage.
Recording audio for video is a tricky process at times. Audio by nature is a rinse-and-repeat process until you get the result you are searching for. There is a lot of theory around acoustics, signal flow, microphone design, mixing practices and standards. Audio is not all esoteric wizardry, and there is a lot of knowledge that guides us. The more you use these skills and concepts in anger, the easier it gets.