The big buzz–from the highest-end professional studios to the latest prosumer camcorders–is
“digital.” And not just digital video, but digital audio as well.
Most consumers are familiar with digital audio, thanks to the popular audio compact disc.
Although some audiophiles will argue the finer points, CDs have resulted in an increase in the
quality of sound people can now hear in their homes. Does this mean that all digital audio is
superior? Not necessarily, though digital audio certainly has the potential to be an improvement
over what we now hear from consumer videotape formats.
Let’s go over exactly what digital audio is, and how it’s going to change what we actually hear
when we play back our video masterpieces.
How We Hear
Before we can fully understand the implications of digital audio, we need to understand
analog audio so we have something to compare it to. Videomaker has covered this subject before
in numerous places, so I’ll go over it quickly.
Sounds are nothing more than vibrations in the air. These vibrations tickle our eardrums, and our
brain translates the result into what we know as “sound.” The larger these vibrations, the louder the
sound; the faster the vibrations, the higher the pitch (or “frequency”) of the sound. Each sound
vibrates the air in its own unique pattern; our brain is able to decode these different
vibrations and match them up with our memories of what different things sound like.
Our ears can hear a pretty wide range of frequencies: usually from 20 to 20,000 “cycles per
second.” These numbers reflect how fast something–speaker cones, vocal cords, etc.–is
vibrating the air. Each cycle per second (also commonly represented as “Hz” or Hertz) equals one
complete back-and-forth movement. The higher the number, the higher the apparent frequency or
pitch of the sound. We often refer to the spread between these numbers as the “bandwidth” of the
The unique timbres (or sound quality) of individual sounds are actually the result of combining
numerous different frequencies of vibrations together in a particular way. This means that even if
a sound appears to have a relatively low pitch, it may have higher-pitched components present
which help us distinguish it from another sound at the same pitch. Therefore, it is important that our
ears–and any recording system, analog or digital–be able to accurately translate a wide range of
frequencies (i.e. have a large bandwidth).
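To see why a low-pitched sound can still demand a wide bandwidth, here is a minimal Python sketch. It assumes, as a simplification, that a sound is built from a fundamental plus overtones at whole-number multiples of it; the function names and the "recipes" are invented for illustration.

```python
import math

def tone_sample(fundamental_hz, harmonic_levels, t):
    """Level of a tone at time t (seconds): fundamental plus overtones.

    harmonic_levels[0] scales the fundamental, harmonic_levels[1] scales
    the component at twice that frequency, and so on (an assumed model).
    """
    return sum(
        level * math.sin(2 * math.pi * fundamental_hz * (n + 1) * t)
        for n, level in enumerate(harmonic_levels)
    )

def highest_component_hz(fundamental_hz, harmonic_levels):
    """Frequency of the highest overtone a system would have to reproduce."""
    return fundamental_hz * len(harmonic_levels)

# Two made-up sounds at the same 220 Hz pitch, with different recipes.
pure = [1.0]
bright = [1.0, 0.5, 0.3, 0.2, 0.1]
print(highest_component_hz(220, pure))    # 220
print(highest_component_hz(220, bright))  # 1100
```

Both sounds have the same pitch, but the second one only keeps its character if the system passes frequencies five times higher than the fundamental.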
The higher the level of a sound (or audio “signal”), the louder it appears. The dynamic range of
a sound is how much it can vary from its quietest to its loudest. Related to these numbers is the
“signal-to-noise ratio”–how much louder a sound is than the background noise. The larger this
difference, the easier it is to hear the sound.
For example, as I write this, I am sitting in front of a computer with a noisy fan and several
whirring disk drives. The level of this whir and hum is the background noise level in my room. If I
wanted to listen to some music over it while typing, I would have to increase the level of that
“signal” until it was sufficiently louder than the noise, so my brain could make it out.
This is the same reason you place a lavalier microphone on the person speaking, rather than
relying on the microphone on your camcorder. You’re hoping to improve your signal (speech) to
noise (surrounding environment) ratio.
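Engineers usually express this ratio in decibels, a unit the text above doesn't use. A rough Python sketch of the common convention for signal levels follows; the function name and the example levels are hypothetical.

```python
import math

def snr_db(signal_level, noise_level):
    """Signal-to-noise ratio in decibels for two levels in the same units."""
    return 20 * math.log10(signal_level / noise_level)

# Hypothetical levels: speech ten times stronger than the room noise.
print(snr_db(1.0, 0.1))  # 20.0
```

Moving the microphone closer to the person speaking raises the first number without raising the second, so the ratio improves.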
To record sounds and listen to them later, we need to capture those vibrations in the air in
some way. Since it’s not very practical to freeze them in mid-air, we use devices such as
microphones to convert them to electrical currents that fluctuate in proportion to the air vibrations.
To store these currents on tape, the recording head converts them into a magnetic field that mimics
the change in current. This changing magnetic field is then trapped in tiny metal particles on the
tape. So the end result is a magnetic pattern on tape that actually follows the original air
vibrations–an analogy, if you will, of the original sound. Hence the term analog.
To play it back, we reverse the process. We drag the metal particles past a playback head that
responds to the magnetic field and generates an electrical current. We amplify this current and run
it through speakers or headphones to re-vibrate the air in exactly the same way the air originally
vibrated when the sound was first made. Whew!
At least, that’s how it’s supposed to work. There are many ways that these vibrations can lose
their accuracy during their travels. The electronic circuitry may not perfectly reproduce them, or
the metal particles may not retain the exact details of the magnetic fields impressed upon them.
Perhaps the user tried to push the electrical or magnetic levels higher than the equipment could
handle–this results in extreme distortion. Or perhaps the signal levels were too low, making the
desired recording nearly indistinguishable from the random noise present on magnetic tape.
As you can see, analog audio has its share of imperfections in the way it stores sound, many
of which digital fixes. Instead of trying to pass along a continuously varying current or magnetic
field, digital audio converts these vibrations into a string of numbers that is easy to store and replay at a later time.
The conversion from analog to digital takes place by “sampling” (measuring) the air pressure or
current level at regular intervals. We call the frequency of these intervals the “sample rate.” The
sample rate relates almost directly to the bandwidth of the sound recorded, bandwidth being the
difference between the lowest and highest frequencies a system can reproduce. The more
frequently a digital audio device samples the moment-to-moment level of the sound, the more
accurately it can capture and describe higher frequencies and the finer nuances of sounds.
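As a rough illustration of sampling, the Python sketch below measures a pure sine tone at regular intervals. The function names are made up for this example, and a real converter measures a voltage rather than evaluating a formula.

```python
import math

def sample_sine(freq_hz, sample_rate_hz, num_samples):
    """Measure a sine tone of freq_hz at regular intervals."""
    return [
        math.sin(2 * math.pi * freq_hz * n / sample_rate_hz)
        for n in range(num_samples)
    ]

def samples_per_cycle(freq_hz, sample_rate_hz):
    """How many measurements land on each back-and-forth movement."""
    return sample_rate_hz / freq_hz

print(samples_per_cycle(1_000, 44_100))   # 44.1
print(samples_per_cycle(20_000, 44_100))  # 2.205
```

At the same sample rate, a high-pitched tone gets far fewer measurements per cycle than a low one, which is why the sample rate limits the bandwidth.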
The highest theoretical frequency a digital system can capture is half of the sample rate, since
you need to measure both the positive and negative positions of a fluctuation to know one
occurred. In reality, the highest practical frequency captured is a bit less than that limit. Audio CDs
have a sample rate of 44,100 Hz; divide this by two and you get 22,050 Hz, which nicely contains the
human hearing range.
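The divide-by-two rule is simple enough to check in a couple of lines of Python. The function name and the constant are my own labels; the 20,000-cycle figure is the upper limit of hearing quoted earlier.

```python
def highest_frequency_hz(sample_rate_hz):
    """Theoretical upper limit: half the sample rate."""
    return sample_rate_hz / 2

HEARING_TOP_HZ = 20_000  # upper limit of hearing quoted earlier

cd_limit = highest_frequency_hz(44_100)
print(cd_limit)                    # 22050.0
print(cd_limit >= HEARING_TOP_HZ)  # True
```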
As mentioned, each of these samples (or measurements) becomes a number. The more
resolution–the wider the range of values these numbers can have–the more accurately the system
can store the sound. It would be nice if a recorder could just count from zero to infinity, using
whatever value it needed. This isn’t practical–digital audio systems have limits like everything
else. In the case of digital, this limit is in the number of bits available (bits being the little on/off
switches that store all digital information). The more digital bits devoted to each sample, the more
accurately the system can store each measurement. For example, using 16 bits for each sample
gives you 65,536 different possible values. CDs use this 16-bit audio system, and the results sound
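The number of values is just two multiplied by itself once per bit. A quick Python check, with hypothetical function names, and assuming the usual arrangement where half the values are negative:

```python
def possible_values(bits):
    """How many different numbers a sample of this size can hold."""
    return 2 ** bits

def signed_range(bits):
    """Lowest and highest value, assuming half the range is negative."""
    return -(2 ** (bits - 1)), 2 ** (bits - 1) - 1

for bits in (8, 12, 16):
    print(bits, possible_values(bits), signed_range(bits))
```

Each extra bit doubles the number of available values.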
There’s a disadvantage attached to using fixed numbers instead of continuously-variable
voltages or magnetic fields. Because the recording system has a limited amount of resolution, it
has to approximate levels by picking the nearest number that fits. For example, an 8-bit digital
audio system says that the loudest possible signal can fluctuate between the extremes of -128 and
+127. If the fluctuations swing past that, then they get clipped off and distort, just like with analog
audio. And if a given sample measures +124.379, for example, the recorder has to pick +124–it
can’t record the actual value. It rounds down in this case, causing something called “quantization”
error. The result is distortion in the sound that quite often sounds like noise.
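Here is a minimal Python sketch of what the 8-bit recorder in this example does to each measurement: clip it to the allowed range, then pick the nearest whole number. The function name is hypothetical, and real converters differ in how they handle exact ties.

```python
def quantize(value, bits=8):
    """Clip a measurement to the signed range, then round to a whole number."""
    lowest = -(2 ** (bits - 1))
    highest = 2 ** (bits - 1) - 1
    clipped = max(lowest, min(highest, value))
    return round(clipped)

print(quantize(124.379))  # 124
print(quantize(300.0))    # 127  (clipped)
print(quantize(-500.0))   # -128 (clipped)

# The part of the measurement that was thrown away:
error = 124.379 - quantize(124.379)
```

The leftover `error` (about 0.379 here) is the quantization error; it is lost for good once the sample is stored.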
Quantization distortion is particularly devilish, since it grows in proportion to the signal as the
sound gets quieter. Compare the percentage of error in rounding 1.379 down to the nearest whole
value (1) with the above example. This is why the apparent noise level seems to rise as the sound
itself gets quieter.
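To put numbers on that comparison, the Python sketch below works out the rounding error as a fraction of each measurement. The function name is my own.

```python
def relative_error(value):
    """Rounding error as a fraction of the original measurement."""
    return abs(value - round(value)) / abs(value)

loud = relative_error(124.379)   # roughly 0.3 percent
quiet = relative_error(1.379)    # roughly 27 percent
print(f"{loud:.2%} versus {quiet:.2%}")
```

The amount thrown away is the same 0.379 in both cases, but it is a far larger share of the quiet sample.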
The advantage of using numbers to represent audio is that once you have captured them, they
usually don’t degrade any further. The methods used to record digital numbers onto tape are much
more reliable than those used to record analog fluctuations, because random tape noise has no
effect on the system’s ability to read the numbers. The result is far fewer worries about audio
quality being reduced as it passes through parts of the signal chain, or to and from tape.
You can also pass multiple channels of audio down one wire (i.e. “the first number is the left
channel, the next number is the right channel”) rather than the usual requirement of one wire per
channel of analog audio. It’s also possible to combine digital video and digital audio down the
same cable simply by knowing which number represents which signal.
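The "first number is left, next number is right" idea can be sketched in a few lines of Python. This illustrates the principle only; it is not the layout of any particular cable format, and the function names are invented.

```python
def interleave(left, right):
    """Merge two channels into one stream: L, R, L, R, ..."""
    stream = []
    for l_sample, r_sample in zip(left, right):
        stream.extend((l_sample, r_sample))
    return stream

def deinterleave(stream):
    """Split the stream back into its two channels by position."""
    return stream[0::2], stream[1::2]

stream = interleave([1, 2, 3], [10, 20, 30])
print(stream)                # [1, 10, 2, 20, 3, 30]
print(deinterleave(stream))  # ([1, 2, 3], [10, 20, 30])
```

The receiving end gets both channels back intact simply by knowing which position belongs to which channel.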
Audio in the DV Format
The new DV Format (digital video format) is interesting in that it has two different options
for digital audio: two channels of 16-bit 44.1 or 48kHz audio, or four channels of 12-bit 32kHz
audio. The first one is similar to that used by audio CDs, and is great for commercial video
releases and professional applications. It’s the second format that offers some interesting options
and tradeoffs available with digital audio.
Four-channel audio allows you to record the location sound that occurs while you’re shooting
the video, and then later add music, sound effects, or narration on top. The problem is how to
shoe-horn four channels into the same space reserved on tape for two. If you drag out a calculator
and run the numbers mentioned above, you see how the math works out: two channels x 16 bits x
48,000 samples per second = 1,536,000 bits per second. Four channels x 12 bits x 32,000
samples per second = 1,536,000 as well. The DV format can store either in the same amount of
space.
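If you would rather not drag out the calculator, the same arithmetic in Python looks like this (the function name is hypothetical):

```python
def audio_bits_per_second(channels, bits_per_sample, sample_rate_hz):
    """Raw audio data rate, ignoring any overhead the tape format adds."""
    return channels * bits_per_sample * sample_rate_hz

two_channel = audio_bits_per_second(2, 16, 48_000)
four_channel = audio_bits_per_second(4, 12, 32_000)
print(two_channel, four_channel)    # 1536000 1536000
print(two_channel == four_channel)  # True
```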
So what’s the tradeoff? Bandwidth and resolution. Remember our method for dividing the
sampling rate by two to get the highest frequency the system can record? Notice that 32,000 Hz/2
= 16,000 Hz, which even under the most optimistic conditions is starting to crimp in on the upper
limit of our hearing. This is not great fidelity, but in practice is still better than most other
consumer videotape formats–let alone TVs–can comfortably handle. And what about 4096 values
(12 bits per sample) versus 65,536? Again, not perfect. But in this case the DV system distributes
the 4096 numbers in a special way to use more of them for lower sound levels. This reduces
apparent quantization distortion and noise. All in all, not that bad a set of compromises.
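To get a feel for how spending more numbers on quiet sounds helps, here is a Python sketch that uses a mu-law-style curve as a stand-in. This is an assumption for illustration only: it is not the DV format's actual mapping, and the constant and function names are my own.

```python
import math

MU = 255.0  # curve steepness; an assumed value, not taken from DV

def linear_code(x, bits=12):
    """Evenly spaced codes for a level x between -1.0 and 1.0."""
    return round(x * (2 ** (bits - 1) - 1))

def companded_code(x, bits=12):
    """Codes bunched toward low levels using a mu-law-style curve."""
    max_code = 2 ** (bits - 1) - 1
    curved = math.copysign(math.log1p(MU * abs(x)) / math.log1p(MU), x)
    return round(curved * max_code)

quiet = 0.01  # a sound at 1 percent of full scale
print(linear_code(quiet))     # 20
print(companded_code(quiet))  # several hundred
```

With evenly spaced values, the quiet sound has only about 20 steps to work with; the curved scheme gives it hundreds, at the cost of coarser steps for loud sounds, where the error is harder to hear.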
It’s inevitable that we’ll be seeing more and more media–pictures, words, video, and sound–in
digital form. Dealing with digital audio and video may seem new to many of us, but it certainly
is the future–one that offers us a lot of creative options.