Edit Suite: Editing Picture to Sound

A picture’s worth a thousand words. We all know this conventional wisdom, and since video at its best is primarily visual, it’s usually good advice. But not always.

For instance, Retirement Plans for Physician Partners was the pulse-pounding topic of a corporate video I made for a big HMO, and it was the hardest gig I ever landed. Why? Because the content included almost nothing intrinsically visual. After a sprightly montage of geezers spending the rewards of fiscal prudence on golf courses, ski slopes and cruise ships, what other pictures were there? IRS forms? Investment brochure covers? Physicians romping in bank vaults like Scrooge McDoc? I don’t think so. How was I to visualize abstractions like defined benefit annuities, voluntary payroll deductions and retirement age/service length trade-offs?

To solve the problem, I admitted that in retirement planning, a word was worth a thousand pictures. Then I built my program accordingly, from scripting through post-production. I created an illustrated lecture with music and effects. The sound and narration drove that program completely, with visuals selected opportunistically and timed to support the audio.

This approach is not uncommon in professional production, especially for commercials and corporate projects where the program subject is not inherently visual (like investments, pharmaceuticals and communications services).

In personal videos as well, the visuals may not have enough natural structure to organize the program. The footage of the garden club’s annual show may be pictorial as all get-out; but when you try to edit it, the result is just one darn flower after another.

To turn a formless mess of footage into a coherent presentation you can often reverse the normal editing procedure and use sound as an organizing blueprint, laying audio first and then matching the video to it.

The two best audio organizers are narration and music, so let’s take a look at each one.

You Gotta Talk the Talk

Half the job of creating and laying narration is making a good match with picture. If you do it right, the audience will feel that the images prompt the words, even when the truth is just the opposite.

Audio/video matching is important because our brains process visual and verbal input separately before combining them into a single body of information. If the two input streams are not closely related, they’re too hard to merge and the result is confusion or at least extra effort that diverts the viewer’s attention.

At the very least, picture and track should be complementary–that is, they should have some obvious connection. While the flower show video displays shots of prize roses, the audio shouldn’t be talking about daffodils.

It’s better yet if the visual and verbal information are presented simultaneously. In Figure 1 notice that Ms. McQuipple is identified by the narrator in the first shot but not shown to viewers until the second one. Though the audience is smart enough to combine the sound from shot A with the image from shot B, the effort required is briefly distracting.

Figure 2 improves the presentation by cutting to the winner as her name is announced. Here on the printed page, the improvement may seem trivial; but in a video the result will feel noticeably smoother.

Where titles are paired with narration, the match should be even closer: the printed and spoken texts should be identical. Figure 3 shows two ways of pairing narration and titles. In 3a, the narration was intended to supplement the information in the titles by providing an example of each category.

The problem is that this forces the audience to absorb and collate two different sets of information in real time and the result is confusion. Version 3b is better because the identical audio/video language is mutually reinforcing. Also, some viewers have a more visual learning style, while others are more verbal. Identical title and narration address both styles.

In short, though it may seem obvious and heavy-handed, viewers process information better if you show and tell the same thing in the same way at the same time.

Timing is Everything

Timing, as we’ve just seen, is a crucial part of audio/video pairing. If your first goal is to match picture and track, your second is to give the illusion that the visuals are driving the narration, rather than vice-versa.

The trick is to give each shot the screen time it needs. Suppose, for example, the narration says, "Bonsai trees are created by pruning plants and training them along armatures." A narrator can read each of those two phrases in about three seconds, but the visual illustrations may last several times that long.

Canny timing starts when you record the narration. If the narrator reads the sentence continuously, you will be unable to cut it in half and time it to the visuals of the two topics (because "plants and" will record as "plantsand"). A good narrator will end "plants" with a little upswing in pitch followed by a clean pause before continuing. That pause will let you sever the two phrases and space them out to fit your visuals.
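To make the arithmetic concrete, here is a minimal Python sketch of spacing the two severed phrases across their shots. The function name place_phrases and the shot lengths of 10 and 12 seconds are hypothetical; only the three-second phrase length comes from the example above. It also assumes each phrase starts at the head of its own shot, which is one reasonable choice among several.

```python
def place_phrases(shot_lengths, phrase_lengths):
    """Start each narration phrase at the head of its shot.

    Returns one (phrase_start, silence_after_phrase) pair per shot,
    with all values in seconds.
    """
    placements = []
    shot_start = 0.0
    for shot_len, phrase_len in zip(shot_lengths, phrase_lengths):
        if phrase_len > shot_len:
            raise ValueError("phrase is longer than the shot it must fit")
        # Whatever the phrase does not fill is left for music or effects.
        placements.append((shot_start, shot_len - phrase_len))
        shot_start += shot_len
    return placements


# Hypothetical shot lengths: 10 s of pruning, 12 s of armature training.
# Each phrase takes about 3 s to read, as in the example above.
placements = place_phrases([10.0, 12.0], [3.0, 3.0])
print(placements)
```

With these assumed numbers, the second phrase cannot start until 10 seconds in, which is why the clean pause after "plants" matters: without it there is nowhere to make the cut.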

When timing shot by shot, we’ve said that picture and narration should usually keep a one-to-one match; but when you move to a brand new topic, you can often improve the transition with a purposeful mismatch called a split edit. In a split edit, the sound shifts to the new topic ahead of the picture (or, occasionally, it will shift behind it instead).

Suppose, for instance, our bonsai section follows an exciting survey of flowering shrubs. Okay, but why should shrubs come before bonsai or vice-versa? Fact is, there isn’t any logical reason for either order; but you can make one by the way you write and time the narration that connects the two.

Figure 4 shows a straight transition. Notice that there’s no sense of progression, but merely one thing after another. By contrast, Figure 5 uses a split edit to imply a sequential connection between the two sections by laying sound across the transition between them.

True, the narration in Figure 5 has been re-written to strengthen the split edit. The revising of narration is a shameful little secret of professional production. When you can’t make the picture fit the sound, adjust the sound to fit the picture.

A Little Music, Professor!

After narration, the other great audio organizer for your video is music. In fact, it’s obvious that music can determine the overall feeling that your program delivers.

In choosing music, style and mood are always the first considerations. You don’t want a Bach fugue for a skateboard montage (Hmm: or do you?) and the pounding energy of rap doesn’t seem to fit a picnic by the river. Once you’ve chosen the score for a sequence, its character will influence the style and pace of the editing.

Music is especially useful for setting the length of shots that lack internal timing. If you need to show someone walking from X to Y, then the shot length is built in: it lasts from X to Y. By contrast, a flower closeup just sits there, providing no clue as to how long you should leave it on screen.

That’s where music can help you make good edits. By timing the images to musical phrases, you can turn a random bunch of flower shots into a pleasing passage. (Since we can’t play music on a magazine page, imagine that the little verse in Figure 6 is a melody instead, with each musical beat in bold letters.)

In Figure 6, each image lasts for exactly one phrase of music. This is a useful technique when you want a relentless, driving feel. At other times, it helps to vary the shot length by cutting on beats within phrases, like this (this time just imagine the flower closeups):

  1. See the endless Summer flowers
  2. Nodding on our TV screens.
  3. Do their
  4. Shots go
  5. On for hours
  6. Or is that just the way it seems?

Timed by musical beats, the shot lengths are now 4-4-1-1-2-4. By breaking the strict symmetry of phrases, you can achieve a more organic and natural effect.
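If you edit digitally, the beat counts above translate directly into cut points on the timeline. The following Python sketch shows the conversion; the function name cut_times and the tempo of 120 beats per minute are assumptions for illustration, since the text does not specify a tempo.

```python
def cut_times(shot_beats, bpm):
    """Convert shot lengths counted in beats into cut times in seconds.

    The returned list holds the start time of every shot, followed by
    the end time of the last shot.
    """
    seconds_per_beat = 60.0 / bpm
    times = [0.0]
    for beats in shot_beats:
        times.append(times[-1] + beats * seconds_per_beat)
    return times


# The 4-4-1-1-2-4 pattern from the verse, at an assumed 120 beats per minute.
shot_beats = [4, 4, 1, 1, 2, 4]
times = cut_times(shot_beats, bpm=120)
print(times)
```

At that assumed tempo each beat lasts half a second, so the two one-beat shots flash by in half a second apiece while the four-beat shots hold for two seconds.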

In these examples, the shots are cut on the musical beat, so that each one starts with an accent. Alternatively, you can achieve an even more complex rhythm by cutting "off the beat," like this:

  1. See the endless Sum
  2. mer flowers Nod
  3. ding on our T
  4. V screens.

Be careful, though: cutting off the beat means timing exactly half-way between accents. Making edits anywhere else will destroy the rhythm. Incidentally, if you edit digitally, you can display the pattern of your music as a visual waveform, with spikes indicating the accent points. That makes it much easier to drag your shots around the editing timeline to fit the rhythm of the track.
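The half-way rule is easy to check with a little arithmetic. This Python sketch takes a list of accent times (such as you might read off the waveform spikes), computes the off-beat points between them, and snaps a rough cut to the nearest one. The function names and the sample times, which assume a tempo of 120 beats per minute, are hypothetical.

```python
def off_beat_times(beat_times):
    """Return the points exactly half-way between consecutive accents."""
    return [(a + b) / 2 for a, b in zip(beat_times, beat_times[1:])]


def snap_to_off_beat(rough_cut, beat_times):
    """Move a rough cut time to the nearest off-beat point."""
    candidates = off_beat_times(beat_times)
    return min(candidates, key=lambda t: abs(t - rough_cut))


# Assumed accent times in seconds, one every half second (120 bpm).
beats = [0.0, 0.5, 1.0, 1.5, 2.0]
off_beats = off_beat_times(beats)
snapped = snap_to_off_beat(1.3, beats)
print(off_beats, snapped)
```

Here a cut roughed in at 1.3 seconds moves to 1.25 seconds, the midpoint between the accents at 1.0 and 1.5.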

To sum up quickly: narration’s a great way to organize and sequence visual material and music provides easy patterns for timing your edits. By selecting words and music creatively and then laying picture to track, you can make an engrossing video on almost any topic–even the Salmonella County Garden Show.

The Videomaker Editors are dedicated to bringing you the information you need to produce and share better video.
