Audio Advice: Digital Audio Basics
The big buzz--from the highest end professional studios to the latest prosumer camcorders--is "digital." And not just digital video, but digital audio as well.
Most consumers are familiar with digital audio, thanks to the popular audio compact disc. Although some audiophiles will argue the finer points, CDs have resulted in an increase in the quality of sound people can now hear in their homes. Does this mean that all digital audio is superior? Not necessarily, though digital audio certainly has the potential to be an improvement over what we now hear from consumer videotape formats.
Let's go over exactly what digital audio is, and how it's going to change what we actually hear when we play back our video masterpieces.
Before we can fully understand the implications of digital audio, we need to understand analog audio so we have something to compare it to. Videomaker has covered this subject before in numerous places, so I'll go over it quickly.
Sounds are nothing more than vibrations in the air. These vibrations tickle our eardrums, which our brain then translates into what we know as "sound." The larger these vibrations, the louder the sound; the faster the vibrations, the higher the pitch (or "frequency") of the sound. Each unique sound has a unique pattern that it vibrates the air with; our brain is able to decode these different vibrations and match them up with our memories of what different things sound like.
Our ears can hear a pretty wide range of frequencies: usually from 20 to 20,000 "cycles per second." These numbers reflect how fast something--a speaker cone, vocal cords, etc.--is vibrating the air. Each cycle per second (also commonly represented as "Hz" or hertz) equals one complete back-and-forth movement. The higher the number, the higher the apparent frequency or pitch of the sound. We often refer to the spread between these numbers as the "bandwidth" of the sound.
The unique timbres (or sound quality) of individual sounds are actually the result of combining numerous different frequencies of vibrations together in a particular way. This means that even if a sound appears to have a relatively low pitch, it may have higher-pitched components present which help us identify it from another sound at the same pitch. Therefore, it is important that our ears--and any recording system, analog or digital--be able to accurately translate a wide range of frequencies (i.e. have a large bandwidth).
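To make the idea of combined frequencies concrete, here is a minimal Python sketch. The function names and the relative levels of the components are my own assumptions for illustration, not measurements of any real instrument. It builds one moment of a sound by adding a low-pitched vibration to quieter vibrations at two and three times that pitch.

```python
import math


def tone(freq_hz, t):
    """Level of a pure vibration at freq_hz, t seconds after it starts."""
    return math.sin(2 * math.pi * freq_hz * t)


def rich_tone(base_hz, t, levels=(1.0, 0.5, 0.25)):
    """Add the base pitch to higher-pitched components.

    levels[0] scales the base pitch, levels[1] scales twice that pitch,
    levels[2] scales three times that pitch, and so on. These levels are
    made-up values; changing them changes the timbre, not the pitch.
    """
    return sum(
        level * tone(base_hz * (i + 1), t)
        for i, level in enumerate(levels)
    )


if __name__ == "__main__":
    # Same 100 Hz pitch, two different "recipes" of components.
    moment = 1 / 400  # a quarter of one 100 Hz cycle
    print(rich_tone(100, moment, levels=(1.0,)))
    print(rich_tone(100, moment, levels=(1.0, 0.5, 0.25)))
```

Both calls describe a sound at the same 100Hz pitch, but the second one contains components at 200Hz and 300Hz, which is why a recording system needs enough bandwidth to keep them.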
The higher the level of a sound (or audio "signal"), the louder it appears. The dynamic range of a sound is how much it can vary from its quietest to its loudest. Related to these numbers is the "signal-to-noise ratio"--how much louder a sound is than the background noise. The larger this difference, the easier it is to hear the sound.
For example, as I write this, I am sitting in front of a computer with a noisy fan and several whirring disk drives. The level of this whir and hum is the background noise level in my room. If I wanted to listen to some music over it while typing, I would have to increase the level of that "signal" until it was sufficiently louder than the noise, so my brain could make it out.
This is the same reason you place a lavalier microphone on the person speaking, rather than relying on the microphone on your camcorder. You're hoping to improve your signal (speech) to noise (surrounding environment) ratio.
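If you want to put a number on that ratio, audio people usually express it in decibels (dB), a unit this article doesn't otherwise go into. The sketch below assumes you have measured the signal and the noise in the same units (volts, for example); the function name is just one I picked for illustration.

```python
import math


def snr_db(signal_level, noise_level):
    """Signal-to-noise ratio in decibels for two levels in the same units.

    Uses the standard 20 * log10 formula for amplitude-type levels
    such as voltage.
    """
    return 20 * math.log10(signal_level / noise_level)


if __name__ == "__main__":
    # A signal ten times larger than the noise, then one hundred times.
    print(snr_db(10.0, 1.0))
    print(snr_db(100.0, 1.0))
```

The larger the result, the more the signal stands out from the background, which is what moving the microphone closer to the speaker buys you.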
To record sounds and listen to them later, we need to capture those vibrations in the air in some way. Since it's not very practical to freeze them in mid-air, we use devices such as microphones to convert them to electrical currents that fluctuate in proportion to the air vibrations. To store these currents on tape, the recording head converts them into a magnetic field that mimics the change in current. This changing magnetic field is then trapped in tiny metal particles on the tape. So the end result is a magnetic pattern on tape that actually follows the original air vibrations--an analogy, if you will, of the original sound. Hence the term analog audio.
To play it back, we reverse the process. We drag the metal particles past a playback head that responds to the magnetic field and generates an electrical current. We amplify this current and run it through speakers or headphones to re-vibrate the air in exactly the same way the air originally vibrated when the sound was first made. Whew!
At least, that's how it's supposed to work. There are many ways that these vibrations can lose their accuracy during their travels. The electronic circuitry may not perfectly reproduce them, or the metal particles may not retain the exact details of the magnetic fields impressed upon them. Perhaps the user tried to push the electrical or magnetic levels higher than the equipment could handle--this results in extreme distortion. Or perhaps the signal levels were too low, making the desired recording nearly indistinguishable from the random noise present on magnetic tape.
As you can see, analog audio has its share of imperfections in the way it stores sound, many of which digital fixes. Instead of trying to pass along a continuously varying current or magnetic field, digital audio converts these vibrations into a string of numbers that is easy to store and replay later.
The conversion from analog to digital takes place by "sampling" (measuring) the air pressure or current level at regular intervals. We call the frequency of these intervals the "sample rate." The sample rate relates almost directly to the bandwidth of the sound recorded, bandwidth being the difference between the lowest and highest frequencies a system can reproduce. The more frequently a digital audio device samples the moment-to-moment level of the sound, the more accurately it can capture and describe higher frequencies and the finer nuances of sounds.
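Here is a rough sketch of what sampling looks like in Python. It assumes the incoming sound is a single pure tone, which real audio never is, and the function name and the 8,000Hz rate are only illustrative choices.

```python
import math


def sample_sine(freq_hz, sample_rate_hz, num_samples):
    """Measure a pure tone at regular intervals.

    Sample number n is taken n / sample_rate_hz seconds after the start.
    """
    return [
        math.sin(2 * math.pi * freq_hz * n / sample_rate_hz)
        for n in range(num_samples)
    ]


if __name__ == "__main__":
    # A 1,000 Hz tone measured 8,000 times per second: 8 samples per cycle.
    for n, level in enumerate(sample_sine(1000, 8000, 8)):
        print(n, round(level, 3))
```

Raise the sample rate and you get more measurements per cycle, so each wiggle of the sound is described in finer detail.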
The highest theoretical frequency a digital system can capture is half of the sample rate, since you need to measure both the positive and negative positions of a fluctuation to know one occurred. In reality, the highest practical frequency captured is a bit less than that. Audio CDs have a sample rate of 44,100Hz; divide this by two and you get 22,050Hz, which nicely contains the human hearing range.
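The half-the-sample-rate rule is simple enough to check in a couple of lines of Python. The function names here are hypothetical, and the check uses the theoretical limit, not the slightly lower practical one.

```python
def max_capturable_hz(sample_rate_hz):
    """Highest theoretical frequency a given sample rate can capture."""
    return sample_rate_hz / 2


def can_capture(freq_hz, sample_rate_hz):
    """True if freq_hz sits below the theoretical limit for this rate."""
    return freq_hz < max_capturable_hz(sample_rate_hz)


if __name__ == "__main__":
    print(max_capturable_hz(44100))   # 22050.0
    print(can_capture(20000, 44100))  # True: top of human hearing fits
    print(can_capture(20000, 32000))  # False: limit is only 16,000 Hz
```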
As mentioned, each of these samples (or measurements) becomes a number. The more resolution--the wider the range of values these numbers can have--the more accurately the system can store the sound. It would be nice if a recorder could just count from zero to infinity, using whatever value it needed. This isn't practical--digital audio systems have limits like everything else. In the case of digital, this limit is in the number of bits available (bits being the little on/off switches that store all digital information). The more digital bits devoted to each sample, the more accurately the system can store each measurement. For example, using 16 bits for each sample gives you 65,536 different possible values. CDs use this 16-bit audio system, and the results sound pretty good.
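The number of possible values doubles with every bit you add, which a short Python sketch makes easy to see. The function name is my own, and the bit depths other than 16 are included only for comparison.

```python
def possible_values(bits):
    """How many different numbers a sample of this many bits can hold."""
    return 2 ** bits


if __name__ == "__main__":
    for bits in (8, 12, 16):
        print(bits, "bits:", possible_values(bits), "values")
    # 16 bits gives 65536 values, the figure quoted for CDs.
```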
There's a disadvantage attached to using fixed numbers instead of continuously-variable voltages or magnetic fields. Because the recording system has a limited amount of resolution, it has to approximate levels by picking the nearest number that fits. For example, an 8-bit digital audio system says that the loudest possible signal can fluctuate between the extremes of -128 and +127. If the fluctuations swing past that, then they get clipped off and distort, just like with analog audio. And if a given sample measures +124.379, for example, the recorder has to pick +124--it can't record the actual value. It rounds down in this case, causing something called "quantization" error. The result is distortion in the sound that quite often sounds like noise.
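As a sketch of what the recorder is doing, the Python below rounds a measurement to the nearest whole number and clips anything outside the available range. It assumes the measurement has already been scaled so that the 8-bit limits are -128 and +127, and the function name is hypothetical; real converters do this in hardware.

```python
def quantize(value, bits=8):
    """Store a measurement as the nearest whole number that fits.

    Values beyond the available range are clipped to the nearest limit.
    Note that Python's round() sends exact halves to the even neighbor.
    """
    lowest = -(2 ** (bits - 1))       # -128 for 8 bits
    highest = 2 ** (bits - 1) - 1     # +127 for 8 bits
    nearest = round(value)
    return max(lowest, min(highest, nearest))


if __name__ == "__main__":
    print(quantize(124.379))   # 124: the .379 is lost (quantization error)
    print(quantize(300.0))     # 127: too loud, so it gets clipped
    print(quantize(-200.2))    # -128: clipped on the negative side
```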
Quantization distortion is particularly devilish, since it grows in proportion to the signal as the sound gets quieter. Compare the percentage of error in rounding 1.379 down to the nearest whole value (1) with the above example, where 124.379 became 124. The error is 0.379 in both cases, but it is a far bigger share of the smaller number. This is why the apparent noise level seems to rise as the sound itself gets quieter!
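You can work out that comparison yourself with a few lines of Python. This is only a sketch of the arithmetic, with a function name I made up; it treats the rounding error as a percentage of the original measurement.

```python
def rounding_error_percent(value):
    """Rounding error as a percentage of the original measurement."""
    stored = round(value)
    return abs(value - stored) / abs(value) * 100


if __name__ == "__main__":
    # Same 0.379 error, very different share of the signal.
    print(round(rounding_error_percent(124.379), 2), "percent")
    print(round(rounding_error_percent(1.379), 2), "percent")
```

The loud sample loses roughly a third of one percent, while the quiet one loses more than a quarter of its value.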
The advantage of using numbers to represent audio is that once you have captured them, they usually don't degrade any further. The methods used to record digital numbers onto tape are much more reliable than those used to record analog fluctuations, because random tape noise has no effect on the system's ability to read the numbers. The result is far fewer worries about audio quality being reduced as it passes through parts of the signal chain, or to and from tape.
You can also pass multiple channels of audio down one wire (i.e. "the first number is the left channel, the next number is the right channel") rather than the usual requirement of one wire per channel of analog audio. It's also possible to combine digital video and digital audio down the same cable simply by knowing which number represents which signal.
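The "first number is left, next number is right" idea can be sketched in Python as follows. The function names are hypothetical and real digital audio connections add extra framing information, so treat this as an illustration of the principle only.

```python
def interleave(left, right):
    """Merge two channels into one stream: left, right, left, right..."""
    if len(left) != len(right):
        raise ValueError("both channels need the same number of samples")
    stream = []
    for left_sample, right_sample in zip(left, right):
        stream.extend((left_sample, right_sample))
    return stream


def deinterleave(stream):
    """Split a left/right stream back into its two channels."""
    return stream[0::2], stream[1::2]


if __name__ == "__main__":
    wire = interleave([1, 2, 3], [10, 20, 30])
    print(wire)                # [1, 10, 2, 20, 3, 30]
    print(deinterleave(wire))  # ([1, 2, 3], [10, 20, 30])
```

As long as both ends agree on the order, one stream of numbers carries both channels and nothing gets mixed up.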