Moog Monday - On Synthesizers: Vocal Sounds, Part II

July 5, 2016

A vocoder is a complete analyzer-synthesizer system that breaks down (analyzes) a vocal or other audio signal into a series of adjacent frequency bands, and then uses the amplitudes of those bands to build up (synthesize) a signal that is similar to the original in certain respects. Vocoders were originally developed in the 1930s as a potentially efficient means of transmitting voice signals over telephone lines. Today, musicians are becoming increasingly aware of vocoders because of their ability to impart "speech" to musical sounds, and because they give easy access to a wealth of timbral resources.

Like locomotives, vocoders come with a variety of bells and whistles, but their basic mechanisms are the same. Fig. 1 shows the basic vocoder function in block diagram form. The left side of the diagram is the analyzer portion of the device. An audio signal, usually called the speech signal, is fed through a series of bandpass filters. The center frequencies F1, F2, ... Fn of the filters are spaced one-quarter to one-half octave apart; together, the filter bands cover most of the audio spectrum. Thus, the filter bank slices up the spectrum of the speech signal. Each slice then goes to an envelope follower, whose output is a control voltage proportional to the strength of that slice. These control signals tell us how strong each slice of the speech spectrum is at any moment. In other words, the analyzer output is a set of slowly varying control voltages that constitute a code, or analysis, of the spectrum of the speech signal.
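To make the analyzer half of Fig. 1 concrete, here is a minimal digital sketch. It is not a model of any particular hardware: the bandpass filters are swapped in as second-order biquads from the widely used RBJ "Audio EQ Cookbook", and the envelope follower is assumed to be a full-wave rectifier followed by a one-pole lowpass at 30 Hz. The function names, the 100 Hz lowest band, and the half-octave spacing are illustrative assumptions; the Q is chosen so each filter's bandwidth equals the band spacing.

```python
import math


def band_centers(f_low, n_bands, step_octaves=0.5):
    """Center frequencies F1..Fn, spaced step_octaves apart (0.25 to 0.5 in the text)."""
    return [f_low * 2.0 ** (step_octaves * k) for k in range(n_bands)]


def bandpass(signal, fc, sample_rate, q):
    """Second-order bandpass (RBJ cookbook form, 0 dB peak gain), direct form I."""
    w0 = 2.0 * math.pi * fc / sample_rate
    alpha = math.sin(w0) / (2.0 * q)
    a0 = 1.0 + alpha
    b0, b2 = alpha / a0, -alpha / a0
    a1, a2 = -2.0 * math.cos(w0) / a0, (1.0 - alpha) / a0
    x1 = x2 = y1 = y2 = 0.0
    out = []
    for x in signal:
        y = b0 * x + b2 * x2 - a1 * y1 - a2 * y2
        x2, x1 = x1, x
        y2, y1 = y1, y
        out.append(y)
    return out


def envelope_follower(signal, sample_rate, cutoff_hz=30.0):
    """Full-wave rectify, then smooth with a one-pole lowpass (a slow 'control voltage')."""
    coeff = math.exp(-2.0 * math.pi * cutoff_hz / sample_rate)
    env, out = 0.0, []
    for x in signal:
        env = coeff * env + (1.0 - coeff) * abs(x)
        out.append(env)
    return out


def analyze(speech, sample_rate, centers, step_octaves=0.5):
    """Return one slowly varying control signal per band of the speech spectrum."""
    r = 2.0 ** step_octaves
    q = math.sqrt(r) / (r - 1.0)  # bandwidth of each filter equals the band spacing
    return [envelope_follower(bandpass(speech, fc, sample_rate, q), sample_rate)
            for fc in centers]


sr = 8000
centers = band_centers(100.0, 11)  # 100 Hz to 3200 Hz in half-octave steps
tone = [math.sin(2 * math.pi * 1000 * n / sr) for n in range(sr // 4)]
controls = analyze(tone, sr, centers)
loudest = max(range(len(centers)), key=lambda i: controls[i][-1])
print(round(centers[loudest]))  # 1131, the band nearest 1 kHz
```

A 1 kHz test tone lights up mainly the band centered near it, which is exactly the "code" the analyzer passes on. The top band must stay below half the sample rate for these filters to behave.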


The synthesizer portion of the vocoder is shown on the right side of Fig. 1. A set of bandpass filters, identical to those of the analyzer section, is fed by a second audio signal, called the replacement, carrier, or excitation signal. These filters slice up the carrier spectrum into bands in the same way the analyzer filters slice up the speech signal spectrum. Each slice is then fed through a voltage-controlled amplifier (VCA). The VCA outputs are mixed together, and this mix is the audio output of the basic vocoder.
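The synthesizer half can be sketched the same way, under the same assumptions (RBJ-style biquad bandpass filters standing in for the filter bank, half-octave spacing, hypothetical function names). Each "VCA" is simply a per-sample multiplication of a carrier slice by its control signal, and the mixer is a sum.

```python
import math


def bandpass(signal, fc, sample_rate, q):
    """Second-order bandpass (RBJ cookbook form, 0 dB peak gain), direct form I."""
    w0 = 2.0 * math.pi * fc / sample_rate
    alpha = math.sin(w0) / (2.0 * q)
    a0 = 1.0 + alpha
    b0, b2 = alpha / a0, -alpha / a0
    a1, a2 = -2.0 * math.cos(w0) / a0, (1.0 - alpha) / a0
    x1 = x2 = y1 = y2 = 0.0
    out = []
    for x in signal:
        y = b0 * x + b2 * x2 - a1 * y1 - a2 * y2
        x2, x1 = x1, x
        y2, y1 = y1, y
        out.append(y)
    return out


def synthesize(carrier, controls, sample_rate, centers, step_octaves=0.5):
    """Slice the carrier into the analyzer's bands, scale each slice, and mix."""
    r = 2.0 ** step_octaves
    q = math.sqrt(r) / (r - 1.0)
    mix = [0.0] * len(carrier)
    for fc, control in zip(centers, controls):
        band = bandpass(carrier, fc, sample_rate, q)  # one slice of the carrier spectrum
        for n, (s, gain) in enumerate(zip(band, control)):
            mix[n] += gain * s  # VCA: the control signal sets this slice's level
    return mix
```

With every control at zero the output is silent; opening a single band lets through only the part of the carrier that falls in that band.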

If all of the VCA control signals were of the same voltage, the vocoder output would (in principle) be the same as the carrier input. If, on the other hand, the VCA control inputs are connected to the analyzer envelope follower outputs, as shown with dotted lines in Fig. 1, the spectral variations of the speech signal are impressed on the carrier signal. Suppose, for instance, that the speech input signal is a person speaking or singing, and the carrier input is a steady tone that is rich in harmonics, such as a sawtooth wave. The vocoder output then has the pitch of the carrier and the timbral variations of the speech. This is the basic principle of the vocoder. This much function alone is not quite enough to reconstruct convincing speech, but it is adequate to impart musically useful vocal inflections to steady tones. For instance, when Walter Carlos produced his version of the Beethoven chorale for the score of A Clockwork Orange, he used standard modular envelope followers and voltage-controlled amplifiers, plus two standard half-octave fixed filter banks that had been modified to provide separate inputs and outputs for each filter section. He patched these modules together exactly as shown in Fig. 1. He achieved the effect of a "synthesized chorus" by feeding many oscillators of a keyboard-controlled synthesizer into the carrier input of his setup, while a singer's voice signal was fed into the speech input. The vocoder output signal needed no further filtering or articulation.
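Putting both halves together gives a minimal sketch of the whole Fig. 1 patch, with a chord of sawtooth oscillators as the carrier, loosely in the spirit of the "synthesized chorus" described above. Everything here is an assumption for illustration: the RBJ-style biquad filter bank, the one-pole envelope followers, the naive (aliasing) sawtooths, the A major triad, and the 1 kHz test tone standing in for a singer. The names `vocode`, `sawtooth`, and `chord` are hypothetical.

```python
import math


def bandpass(signal, fc, sample_rate, q):
    """Second-order bandpass (RBJ cookbook form, 0 dB peak gain), direct form I."""
    w0 = 2.0 * math.pi * fc / sample_rate
    alpha = math.sin(w0) / (2.0 * q)
    a0 = 1.0 + alpha
    b0, b2 = alpha / a0, -alpha / a0
    a1, a2 = -2.0 * math.cos(w0) / a0, (1.0 - alpha) / a0
    x1 = x2 = y1 = y2 = 0.0
    out = []
    for x in signal:
        y = b0 * x + b2 * x2 - a1 * y1 - a2 * y2
        x2, x1 = x1, x
        y2, y1 = y1, y
        out.append(y)
    return out


def envelope_follower(signal, sample_rate, cutoff_hz=30.0):
    """Full-wave rectify, then smooth with a one-pole lowpass."""
    coeff = math.exp(-2.0 * math.pi * cutoff_hz / sample_rate)
    env, out = 0.0, []
    for x in signal:
        env = coeff * env + (1.0 - coeff) * abs(x)
        out.append(env)
    return out


def vocode(speech, carrier, sample_rate, f_low=100.0, n_bands=11, step_octaves=0.5):
    """Analyzer outputs patched band for band to the synthesizer VCAs, then mixed."""
    r = 2.0 ** step_octaves
    q = math.sqrt(r) / (r - 1.0)
    out = [0.0] * min(len(speech), len(carrier))
    for k in range(n_bands):
        fc = f_low * r ** k
        control = envelope_follower(bandpass(speech, fc, sample_rate, q), sample_rate)
        band = bandpass(carrier, fc, sample_rate, q)
        for n in range(len(out)):
            out[n] += control[n] * band[n]
    return out


def sawtooth(freq, n_samples, sample_rate):
    """Naive sawtooth in [-1, 1): rich in harmonics (and aliasing, ignored here)."""
    return [2.0 * ((freq * n / sample_rate) % 1.0) - 1.0 for n in range(n_samples)]


def chord(freqs, n_samples, sample_rate):
    """Average of several sawtooth oscillators, standing in for a polyphonic carrier."""
    voices = [sawtooth(f, n_samples, sample_rate) for f in freqs]
    return [sum(samples) / len(voices) for samples in zip(*voices)]


sr = 8000
carrier = chord([220.0, 277.18, 329.63], sr // 4, sr)  # A major triad
voice = [math.sin(2 * math.pi * 1000 * n / sr) for n in range(sr // 4)]  # singer stand-in
output = vocode(voice, carrier, sr)
```

The pitch comes entirely from the carrier, while the speech input only sets how loud each band is: a silent speech input gives a silent output, and doubling the speech level doubles the output.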


Several manufacturers now offer vocoders, some of which are shown below. The EMS Model 2000 contains its own oscillator and noise generator for use as carrier generators, plus a circuit that chooses between the two, depending on whether the speech signal is voiced or unvoiced. It also has provisions for slowing down or stopping the control voltage changes, thus enabling the musician to hold vocal sounds indefinitely. The Bode Model 7702 features direct access to all analyzer control outputs and synthesizer control inputs, so that speech patterns may be "scrambled." For instance, each analyzer output may be patched to the synthesizer input one octave higher, to impart a strained, "Donald Duck" quality to the vocoder output. The Bode unit also has provision for holding control voltages, and for selecting between voiced and unvoiced sounds. The Sennheiser Model VSM201 provides individual emphasis of its 20 channels, plus controls that enable it to be used as a multifilter. It also permits detailed control over "silence bridging," a feature that fills in silences between spoken words. Finally, the original EMS Vocoder allows complete patching flexibility between analyzer outputs and synthesizer inputs, and provides frequency tracking so that pitch inflections as well as timbral changes of the speech signal may be imparted to the carrier.
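Several of the features above act only on the analyzer's control signals, so they can be sketched without any audio filtering. The following assumes the controls are a list of per-band sample lists at half-octave spacing; the function names are hypothetical, and the zero-crossing-rate test for voiced versus unvoiced sound is a common textbook heuristic, not necessarily the circuit EMS or Bode actually used. The routed or held controls would then drive the synthesizer VCAs in place of the originals.

```python
def patch(controls, mapping):
    """Route analyzer output src to synthesizer input dst for each src -> dst entry.

    Outputs patched past the top band are dropped; unpatched inputs stay at 0."""
    n_bands, length = len(controls), len(controls[0])
    routed = [[0.0] * length for _ in range(n_bands)]
    for src, dst in mapping.items():
        if 0 <= dst < n_bands:
            routed[dst] = [a + b for a, b in zip(routed[dst], controls[src])]
    return routed


def octave_up(n_bands, step_octaves=0.5):
    """Send each analyzer band to the synthesizer band one octave higher."""
    shift = round(1.0 / step_octaves)  # 2 bands at half-octave spacing
    return {k: k + shift for k in range(n_bands)}


def hold(controls, at_sample):
    """Freeze every control signal at its value at at_sample, sustaining the sound."""
    return [env[:at_sample] + [env[at_sample]] * (len(env) - at_sample)
            for env in controls]


def is_unvoiced(frame, threshold=0.3):
    """Noisy (unvoiced) sounds cross zero far more often than voiced ones."""
    crossings = sum(1 for a, b in zip(frame, frame[1:]) if (a >= 0.0) != (b >= 0.0))
    return crossings / max(len(frame) - 1, 1) > threshold
```

A frame judged unvoiced would switch the carrier to a noise source, and a voiced frame back to the oscillator.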


If you're interested in more information on commercial vocoders, you may want to write to the manufacturers. Their addresses in the U.S. are: Bode Sound Company, 1344 Abington Pl., N. Tonawanda, NY 14120; EMSA, 269 Locust St., Northampton, MA 01060; Sennheiser Electronic Corp., 10 W. 37th St., New York, NY 10018.


The vocoder imparts speech patterns to steady tones by individually shaping a multiplicity of adjacent frequency bands. Next month's column will discuss ways of simulating speech by setting up simple analogs of the vocal tract.


For more articles by Bob Moog, please visit 
