How It Works: TALKBOX 101


The Rocktron Banshee 2 is the gold standard of talkboxes.


Love ’em or hate ’em, vocals that are processed or Auto-Tuned until they sound like a machine is doing the singing are a staple of funk, hip-hop, and electronic dance music. Before the vocoder was in widespread use, and long before Auto-Tune existed, artists like Roger Troutman and Peter Frampton were getting this sound using a synth or guitar with a talkbox. Like the vocoder, the talkbox is often mistaken for an effect that processes your voice. On the contrary, it lets your mouth be the processor.

In Theory

Any sound boils down to two things: the frequency (pitch) and amplitude (volume) of its fundamental tone and of each harmonic. Then there’s the way each of these elements changes over time—their envelopes, by a familiar name.
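To make that concrete, here is a minimal Python sketch that builds a tone from a fundamental and two harmonics, each with its own amplitude and its own decay envelope. The function name, the harmonic amplitudes, and the decay rates are made-up values for illustration, not measurements of any real instrument.

```python
import math

def additive_tone(fundamental_hz, partials, seconds, sample_rate=8000):
    """Sum sine-wave harmonics, each shaped by its own decaying envelope.

    partials is a list of (harmonic_number, amplitude, decay_per_second)
    tuples; all three values are illustrative assumptions.
    """
    samples = []
    for n in range(int(seconds * sample_rate)):
        t = n / sample_rate
        value = 0.0
        for harmonic, amplitude, decay in partials:
            envelope = amplitude * math.exp(-decay * t)  # amplitude over time
            value += envelope * math.sin(2 * math.pi * fundamental_hz * harmonic * t)
        samples.append(value)
    return samples

# A 220 Hz tone whose upper harmonics fade faster than the fundamental,
# so the sound starts bright and mellows as it rings out.
tone = additive_tone(220.0, [(1, 1.0, 2.0), (2, 0.5, 6.0), (3, 0.25, 12.0)], seconds=0.5)
```

Change only the decay numbers and the pitch stays put while the character of the sound shifts. Speech does the same thing, but with far more components moving at once.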

That so much changes at once is what makes human speech so tricky to imitate. Vocoders tackle this by using a number of envelope-following bandpass filters, which your voice, the “modulator” signal, controls. The “carrier” signal (e.g., your synth) gets filtered and comes out with a harmonics-over-time profile that resembles your voice. The more filters a vocoder has to divide up the frequency spectrum, the closer the resemblance.
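The vocoder signal path described above can be sketched in a few lines of Python. This is a simplified illustration, not the design of any particular product: the function names, the filter type (a standard two-pole bandpass), the Q value, and the 30 Hz envelope smoothing are all assumptions chosen to keep the example short.

```python
import math

def bandpass(signal, center_hz, sample_rate, q=5.0):
    """Two-pole bandpass filter (standard biquad form, unity gain at center_hz)."""
    w0 = 2 * math.pi * center_hz / sample_rate
    alpha = math.sin(w0) / (2 * q)
    a0, a1, a2 = 1 + alpha, -2 * math.cos(w0), 1 - alpha
    x1 = x2 = y1 = y2 = 0.0
    out = []
    for x in signal:
        y = (alpha * x - alpha * x2 - a1 * y1 - a2 * y2) / a0
        x2, x1 = x1, x
        y2, y1 = y1, y
        out.append(y)
    return out

def envelope_follower(signal, sample_rate, smoothing_hz=30.0):
    """Track loudness: rectify the signal, then smooth it with a one-pole lowpass."""
    coeff = math.exp(-2 * math.pi * smoothing_hz / sample_rate)
    level = 0.0
    out = []
    for x in signal:
        level = coeff * level + (1 - coeff) * abs(x)
        out.append(level)
    return out

def vocode(modulator, carrier, sample_rate, band_centers_hz):
    """For each band, let the modulator's loudness set the carrier's level."""
    length = min(len(modulator), len(carrier))
    output = [0.0] * length
    for center in band_centers_hz:
        control = envelope_follower(bandpass(modulator, center, sample_rate), sample_rate)
        band = bandpass(carrier, center, sample_rate)
        for i in range(length):
            output[i] += control[i] * band[i]
    return output
```

Passing a longer list of band centers is the code equivalent of a vocoder with more filters: each band tracks a narrower slice of the voice, so the carrier follows it more closely.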

A talkbox achieves a similar result using far less electronic plumbing. See, you are a synthesizer. Your vocal cords are the oscillators, and your mouth is an incredibly flexible filter. The talkbox lets you use that filter on external audio. In fact, it’s just a tiny powered speaker that amplifies your synth. The Rocktron Banshee, for example, uses a compression driver similar to many tweeters, but with fuller range. A vinyl tube fits snugly into the speaker recess. Put the other end in your mouth, play your synth, mouth some words, and you get syllables imposed on the sound. All the action happens in the acoustic domain between your choppers.
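A talkbox does its filtering in air rather than in code, but the idea can be imitated with a crude Python model: a sawtooth stands in for the synth, and a pair of resonant filters stands in for the mouth. Everything here is an assumption for illustration. The formant frequencies are rough textbook averages, the Q value is arbitrary, and a real mouth has more resonances that shift continuously.

```python
import math

# Rough textbook formant frequencies (Hz) for two vowels; real mouths vary.
VOWEL_FORMANTS = {"ee": (270, 2290), "ah": (730, 1090)}

def sawtooth(freq_hz, seconds, sample_rate):
    """A bright, harmonically rich stand-in for the synth feeding the tube."""
    return [2 * ((freq_hz * n / sample_rate) % 1.0) - 1
            for n in range(int(seconds * sample_rate))]

def resonate(signal, center_hz, sample_rate, q=8.0):
    """Two-pole bandpass resonance (standard biquad form)."""
    w0 = 2 * math.pi * center_hz / sample_rate
    alpha = math.sin(w0) / (2 * q)
    a0, a1, a2 = 1 + alpha, -2 * math.cos(w0), 1 - alpha
    x1 = x2 = y1 = y2 = 0.0
    out = []
    for x in signal:
        y = (alpha * x - alpha * x2 - a1 * y1 - a2 * y2) / a0
        x2, x1 = x1, x
        y2, y1 = y1, y
        out.append(y)
    return out

def mouth(signal, vowel, sample_rate):
    """Hypothetical mouth model: one resonance per formant, summed."""
    bands = [resonate(signal, f, sample_rate) for f in VOWEL_FORMANTS[vowel]]
    return [sum(values) for values in zip(*bands)]

SAMPLE_RATE = 16000
synth = sawtooth(110, 0.25, SAMPLE_RATE)   # the note you play
ee = mouth(synth, "ee", SAMPLE_RATE)       # same note, mouth shaped for "ee"
ah = mouth(synth, "ah", SAMPLE_RATE)       # same note, mouth shaped for "ah"
```

The pitch comes entirely from the sawtooth. Switching vowels changes only which harmonics get through, which is why mouthing words is enough and singing is unnecessary.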

In Practice

Master talkboxer P-Thugg of Chromeo once told me, “The talkbox is very physical and easier to personalize [than a vocoder]. . . . If you don’t know how to use it, it’ll sound like a wah-wah at best, but when you practice pronunciation, you create another voice out of your own.” Here are some ways to find that voice.

Use a mic. Unlike a vocoder or an audio effect, a talkbox doesn’t take your voice as an audio input. For practice, you can hear the result without a mic, but for performance or recording, you’ll need one to capture the results.

Given how you shape the sound of the vibrating bar with your mouth, the venerable jaw harp is really the world’s first talkbox.


Simple synth sounds are best. Start with patches that are fairly bright and have an immediate attack. A monophonic lead with no portamento gets that Auto-Tuned “T-Pain” sound.

Resist the urge to sing or speak. Playing your synth is what provides the musical pitch. With a vocoder, you say words into the mic to give the filters a control signal to work with. With a talkbox, the shape of your mouth does all the work. Speaking only adds your dry voice to the mix.

Exaggerate vowels. Pretend you’re mouthing at someone who just learned to read lips but who also has bad eyesight. Long “o” and “u” sounds are easy, but long “e” sounds stretched the corners of my mouth into a Jack Skellington grin.

Consonants help. An exception to the “don’t speak” rule is that saying an initial consonant into the mic often helps the synth’s “voice” sound like the desired word. That’s because the human ear identifies sounds mainly by their attacks.

*Stevie Wonder sings "Close to You" on the talkbox.

*Zapp and Roger Troutman's "More Bounce to the Ounce."