Don’t Get Fooled Again: On MIDI 2.0 and the Reality of What We Hear
When I was in 4th grade, we had an assembly in the Julia R. Ewan Elementary School gym. As we filed in and made our way to the bleachers, there in the middle of the polished oak basketball court were an electronic keyboard and a handful of other instruments, some covered in a sheet. Next to them stood a tall pony-tailed man who could only be, to my 10-year-old eye, a Musician. This pony-tailed and silken-shirted man proceeded to demonstrate his instruments and talk to us about making music. In hindsight, I’m not really sure there was much of a point to the assembly. Maybe Mr. Ponytail was Principal Tipton’s prodigal brother in need of something to do with his time, I don’t know, but anyway I did learn something that I have never forgotten. MIDI is an acronym that means “Musical Instrument Digital Interface.” I learned this as Mr. Ponytail removed the white sheet to reveal two translucent plastic cymbals, one neon orange and the other neon green1. As he told us about MIDI and how it helped make one instrument sound like another, he would hit the cymbals and elicit from them not typical crashes and splashes but rather grumpy shouts of “Ow!” and “Hey, cut that out.” This to uproarious laughter from the gathered 4th and 5th graders. The dark portent of such magic was lost on us all, but we were only 10.
Like any technological wonder, MIDI—and now, in the first-ever update after 37 years, MIDI 2.0—is about power. Which is only to say it’s about effectiveness, about capability. Like every technology, it will ask us once again what we’re capable of. It might be nice, on the threshold of yet another distant horizon, to stop and think about that for a moment.
A Day in the Analog Life
What MIDI does is language. It may seem quaint in the seamless performance of our 21st century of Bluetooth, AirDrop, and cloud computing, but back in the day, synthesizers and other electronic instruments had trouble talking to one another. By the early 80s, many of them operated on the same basic principle, with microprocessors turning a musician’s playing into 1s and 0s that the instrument’s circuitry then rendered as psychedelic sound2. But the instruments themselves spoke different languages. Their strings of code were functionally illiterate to one another. MIDI arrived on the scene in 1983 like a reverse Tower of Babel, offering a common language for the exchange of all that information (not the sound itself, but the instructions for making it: which key, how hard, how long), and suddenly you could hook a Moog up to a Korg and go nuts. This made a lot of geeks very excited, but MIDI arguably made its biggest waves when recording studios began to go computerized.
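For the curious, here is roughly what that common language looks like on the wire. A MIDI 1.0 message carries no sound at all, only instructions. Pressing a key becomes three bytes that say which key, on which channel, and how hard. The sketch below is in Python for readability, and the function names are my own rather than any library’s. The byte layout does follow the MIDI 1.0 specification: 0x90 for Note On, 0x80 for Note Off, the channel in the low four bits, and two 7-bit data bytes.

```python
def note_on(channel: int, note: int, velocity: int) -> bytes:
    """Build a MIDI 1.0 Note On message: status byte, note number, velocity."""
    if not 0 <= channel <= 15:
        raise ValueError("MIDI 1.0 has 16 channels, numbered 0-15 on the wire")
    if not (0 <= note <= 127 and 0 <= velocity <= 127):
        raise ValueError("data bytes carry only 7 bits (0-127)")
    # 0x90 means "Note On"; the low four bits select the channel.
    return bytes([0x90 | channel, note, velocity])


def note_off(channel: int, note: int, velocity: int = 0) -> bytes:
    """Build a MIDI 1.0 Note Off message (0x80 plus the channel)."""
    return bytes([0x80 | channel, note, velocity])


# Middle C (note 60), played fairly hard, on the first channel.
print(note_on(0, 60, 100).hex(" "))  # 90 3c 64
```

The receiving instrument decides what note 60 at velocity 100 actually sounds like, which is exactly why a Moog and a Korg could suddenly share a performance.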
In the earliest days of recording, you got one track. You’d set up a mic and hit record and the entire group would gather round and play a single take of the whole song. You got what you got, and if June sang the alto part too loud, you played the whole thing again. It wasn’t until the 60s that multi-track recording became widely available. Multiple mics fed separate tracks on the same tape, with a mixing board serving as a sort of driver’s seat to play it all back at once and tweak the volume until the whole thing sounded balanced.
But, because of the quirks of analog recording, early multi-track recording still had its limits. Keeping tapes playing at the same time and at the same speed takes some know-how, and if you get it wrong the pitches get all warbly and generally unlistenable. So the earliest equipment might have just four tracks. In a typical four-piece band (guitar, bass, drums, singer), four tracks is just enough. But what if you want to add another guitar player so one can play rhythm and the other lead? You have to start making choices. Maybe record the bass and the drums onto the same track. OK, fine, but then what if you know the perfect finishing touch would be to add a train whistle and some howling dogs3 right at the end? Well, engineers learned to do what’s called a bounce down4. And what happens when everything is perfect except for the middle section of one track? Get out the scissors and the splice tape. Point being, multi-track analog recording was hard and required rooms filled with big, expensive, hard-to-use machines.
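The logic of a bounce down fits in a few lines, and writing it out makes the catch plain. This is a toy model, not how tape physically works. Each track is a list of made-up sample values, the function name is hypothetical, and mixing is simply adding the tracks together sample by sample.

```python
def bounce(tape, sources, dest):
    """Mix the source tracks into the empty dest track, then free the sources."""
    if tape[dest] is not None:
        raise ValueError("destination track must be empty")
    length = len(tape[sources[0]])
    # Sum the sources sample by sample: this is the irreversible step.
    tape[dest] = [sum(tape[s][i] for s in sources) for i in range(length)]
    for s in sources:
        tape[s] = None  # wipe the originals so those tracks can be re-recorded
    return tape


tape = [None] * 4                            # a four-track machine, all empty
tape[0], tape[1] = [3, 5, -2], [1, -1, 4]    # guitar, bass
bounce(tape, [0, 1], 2)                      # track 2 now holds both parts
tape[0], tape[1] = [2, 0, 1], [0, 2, -1]     # drums, vocal
bounce(tape, [0, 1], 3)
bounce(tape, [2, 3], 0)                      # four parts on one track
print(tape)                                  # [[6, 6, 2], None, None, None]
```

Once 6, 6, 2 sits on track 0, nothing in the numbers says whether that first 6 came from 4 + 2 or 3 + 3. The mix is what you’re stuck with.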
Bands like The Beatles and Pink Floyd pushed analog recording from the bare-bones essence of Johnny Cash’s “Ring of Fire” to Sgt. Pepper and The Wall, but it was still a long way to something like the swoony undulation of a song like Taylor Swift’s “Delicate.”5 MIDI was the unseen bedrock that modern recording built its towers upon.
Blinded by Science
MIDI became the backbone of a proliferation of digital recording equipment, and alongside it came gear that could digitize sound itself, turning vibration into code. The ease with which code can be manipulated opened up a whole new door. Remember those cymbals complaining about being hit over the head? A neon green cymbal can say, “Ow!” instead of making whatever sound a plastic cymbal would make (probably a lackluster “thwack”) because the cymbal has a sensor that reads the vibration of each strike and translates how hard it was hit into a MIDI message, which a computer reads and uses to trigger the playback of a recorded grouchy teamster.
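Here is a rough sketch of what the cymbal is doing. It assumes a piezo sensor read by a 10-bit converter, so readings run from 0 to 1023. The threshold, the scaling, and the note number assigned to the “Ow!” clip are all invented for illustration. Only the message format comes from the MIDI standard: 0x99 is Note On for channel 10, the channel General MIDI reserves for percussion. Notice that the cymbal never sends a sound, only “something was hit, this hard.” Whatever is listening on the other end picks the sound.

```python
THRESHOLD = 40     # hypothetical: readings below this are treated as noise
SENSOR_MAX = 1023  # assumed 10-bit sensor converter
OW_NOTE = 38       # hypothetical note number the sampler maps to the "Ow!" clip


def strike_to_message(peak: int):
    """Turn one strike's peak sensor reading into a MIDI Note On, or None."""
    if peak < THRESHOLD:
        return None
    peak = min(peak, SENSOR_MAX)
    # Scale linearly into velocity 1-127 (a Note On with velocity 0 means "note off").
    velocity = 1 + (peak - THRESHOLD) * 126 // (SENSOR_MAX - THRESHOLD)
    return bytes([0x99, OW_NOTE, velocity])  # Note On, channel 10


print(strike_to_message(1023).hex(" "))  # 99 26 7f
```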
More practically, though, by turning a performance into code and dropping it into an ever more powerful computing environment, MIDI allows musicians to, for example, play a digital keyboard and have it come out as anything from drums to kazoos to a full-blown orchestra. But this raises the epistemic question of our age, a question perhaps unique to our time. Is any given thing the sum of all of the apparent characteristics I can perceive, or is it first and foremost what I’m told it is? If I play a piano but it sounds like a trumpet when you listen, what have I played? What, really, is truth?
Now, to my knowledge, there have not been any cases of musicians holed up at a home studio playing MIDI keyboards, swapping in horn sounds, and promoting themselves as the next Herb Alpert or Wynton Marsalis. But we could probably all list a ready handful of people who made a career as a singer when perhaps they only played one on TV, with plenty of help from autotune. With pitch correction, one of MIDI’s digital cousins, a weak, off-key voice can be dialed in rich and sweet. Throw in a good dance number, and you can cover a multitude of pitchiness in a singer. Which, pardon a quick aside, is another contrivance: the seemingly superhuman ability to execute 90 minutes of energetic choreography and still be singing on key and not, you know, panting and gasping for air. Physiologically, you simply can’t be in good enough shape not to need more oxygen when your heart rate goes up. When the headliner of a traveling song-and-dance extravaganza appears to keep step with the dancers and carry the show vocally, it’s an image of a person acting more machine than human. It simply denies the limits of the body in order to project a literally impossible image6 under the pretense that it’s really happening. What I’m saying is, by translating music into code, MIDI is a pretty significant tributary to a larger stream of innovations that make it more difficult to distinguish between what’s true, what’s embellished, and what’s outright false.
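The core arithmetic of pitch correction can be sketched as snapping a sung frequency to the nearest note. The sketch assumes A4 = 440 Hz and equal temperament. It borrows MIDI’s own note numbering, in which A4 is note 69 and there are twelve notes per doubling of frequency. Commercial tools such as Auto-Tune typically also detect the pitch in the first place, smooth the transition between notes, and try to preserve the character of the voice. None of that is shown here, and the function name is hypothetical.

```python
import math

A4_HZ = 440.0  # assumed reference tuning


def snap_to_semitone(freq_hz: float) -> float:
    """Return the frequency of the equal-tempered note nearest to freq_hz."""
    if freq_hz <= 0:
        raise ValueError("frequency must be positive")
    # Fractional MIDI note number: A4 = 69, +12 per octave (doubling).
    note = 69 + 12 * math.log2(freq_hz / A4_HZ)
    # Round to the nearest whole note and convert back to hertz.
    return A4_HZ * 2 ** ((round(note) - 69) / 12)


print(round(snap_to_semitone(452.0), 2))  # a sharp A comes back as 440.0
```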
But for all that, MIDI 1.0 was always limited, for most of what it carries (how hard a key is struck, how far a knob or pedal is pushed), to a measly 7 bits of data. Just 128 discrete levels of detail. For all the wonders they can produce, musicians and audio engineers have been swimming in a relatively shallow pool. Until now. MIDI 2.0 shoves off into deep water.
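To see what 128 levels means in practice, here is a minimal sketch that pushes a perfectly smooth swell through a 7-bit value, which is the size of a MIDI 1.0 velocity or controller value. The function name is hypothetical and the swell is synthetic.

```python
def to_7bit(level: float) -> int:
    """Quantize a continuous level in [0.0, 1.0] to a MIDI 1.0 data value (0-127)."""
    level = min(max(level, 0.0), 1.0)
    return round(level * 127)


# A slow, perfectly smooth swell sampled 10,000 times...
swell = [i / 9999 for i in range(10000)]
steps = sorted(set(to_7bit(x) for x in swell))
print(len(steps))  # 128: every nuance lands on one of 128 rungs
```

However gradual the gesture, it arrives as a staircase of 128 steps, each roughly 0.8 percent of the full range.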
A Sailor’s Guide to Digital Earth
It’s rather astonishing that MIDI 1.0 has held sway for nearly 40 years without serious competition. Just one digital language to rule them all. Think of how much TVs have sharpened their resolution since 1983 as companies like Sony, Samsung, LG, and others vied for market share. Next to the living room set of the early 80s, back when a TV was literal furniture, a 2020 4K TV is a quantum leap in visual intricacy. To still be making music with 80s code is practically anti-capitalist. To get a sense of the depth and suddenness of the change MIDI 2.0 will bring, imagine an 80s kid falling asleep playing 8-bit Nintendo and waking up with an HDTV and an Xbox 3607.
So the short story is that where MIDI 1.0 breaks most of a performance down into 7-bit values, MIDI 2.0 offers 16 bits for how hard a note is struck and 32 bits for controllers like pitch bend and modulation, which translates to billions of shades of nuance8. The impact of this update on music (and any digital sound in general) is almost incalculable. If MIDI 1.0 was a revolution, MIDI 2.0 might go so far as to be annihilation—at least it might if we handle it in ways we’re capable of.
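A quick count shows why bit depth matters so much: each added bit doubles the number of levels. By this arithmetic, 24 bits comes to about 16.8 million levels. It takes 32 bits, the size MIDI 2.0 uses for controller values (note velocity gets 16), to reach the billions.

```python
def levels(bits: int) -> int:
    """Number of distinct values an unsigned field of `bits` bits can hold."""
    return 2 ** bits


for bits in (7, 16, 24, 32):
    print(f"{bits:>2} bits: {levels(bits):>13,} levels")
```

The four lines come out to 128; 65,536; 16,777,216; and 4,294,967,296 levels.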
The music lover that lives between my ears is intrigued by the possibility of what all of this new detail might sound like. There’s always been a difference between sitting next to someone playing a violin and listening to even a nice recording of one. Will MIDI 2.0 erase the difference and trick my brain into thinking all the molecules in the room really are vibrating? Will the sound be so detailed I can feel it on my skin? For me, that would beat the socks off of any VR headset you could send my way. And this is the way of innovation. It is always tantalizing, at least.
MIDI 1.0 works best when you feed it input from a piano (something you may have known without noticing if you’ve ever played even a cheap digital keyboard and made it sound like everything from a pipe organ to twinkling stardust with the push of a button). A piano, because of its own limitations, works really well with MIDI’s limitations. With a piano, each key is a discrete note. You can’t mess around with the pitch by bending it slightly higher or lower. You can only affect the volume (how hard you play) and sustain (how long you hold down the keys). Stringed instruments are much freer. Because you can bend the strings (which increases the tension and makes the note higher), stringed instruments make sound with more variables than MIDI could really do justice to. Many musicians expect that MIDI 2.0, by slicing each gesture so much more finely, will carry enough nuance and detail to receive input from any musical instrument and make it convertible to any other.
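A string bend shows the gap well. In MIDI 1.0 a bend isn’t part of the note at all. It travels as a separate Pitch Bend message (status 0xE0 plus the channel), one of the few values that gets 14 bits: 0 to 16383, with 8192 meaning no bend, split across two 7-bit bytes with the low byte first. The sketch assumes a bend range of plus or minus two semitones. That is a common default, but the receiving instrument actually decides the range. The function name is hypothetical. A MIDI 1.0 pitch bend also applies to the whole channel, so bending one string bends every note on it. MIDI 2.0 adds a per-note pitch bend.

```python
BEND_CENTER = 8192  # 14-bit pitch bend: 0-16383, 8192 means "no bend"
BEND_RANGE = 2.0    # assumed: semitones each way (set by the receiver in practice)


def pitch_bend(channel: int, semitones: float) -> bytes:
    """Build a MIDI 1.0 Pitch Bend message for a bend of `semitones`."""
    if not 0 <= channel <= 15:
        raise ValueError("channel must be 0-15")
    semitones = max(-BEND_RANGE, min(BEND_RANGE, semitones))
    value = round(BEND_CENTER + semitones / BEND_RANGE * 8192)
    value = min(value, 16383)            # a full upward bend tops out one step short
    lsb, msb = value & 0x7F, value >> 7  # split into two 7-bit data bytes
    return bytes([0xE0 | channel, lsb, msb])


# A guitarist nudging a string up a quarter tone (half a semitone).
print(pitch_bend(0, 0.5).hex(" "))  # e0 00 50
```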
Imagine being at a rock concert. Electric guitars roaring, bass thrumming, drums thudding. A lightshow strobing near seizure levels. People standing on the seats, throwing the flags of their home nations, their own shirts, and various other mentionables and unmentionables onto the stage. A general sonic bacchanal. Now imagine someone could flip a switch and instantly you’re looking at bedlam but hearing a buttoned up symphony pumping through the loudspeakers. And so, here comes the doomsaying. Now imagine the singer could flip a switch and go from sounding like themselves to sounding like Adele9. Or Bob Dylan (but maybe the less “I drink a cup of rusty nails each morning” younger Dylan). Or Prince, whether his estate likes it or not. The possibilities, verging on resurrection, get disorienting pretty quickly.
Disorienting is exactly the word. It literally means loss of the east, loss of the means to find your way. Of course, it’s one thing to imagine this sort of disorientation in entertainment. In that realm, the creative possibilities are at least intriguing, if legally a little sticky. Less exciting, though, is the thought of voice alteration in particular metastasizing into, say, politics. What a hellish double-edged sword that would be. On one hand, a politician could be caught on tape saying any atrocious thing and wouldn’t even need to dismiss it as “locker room talk.” They could just deny it outright, say it was faked. And who could say otherwise? On the other hand, fake recordings of famous voices saying all kinds of hideous things would surely abound. The technical capabilities of MIDI 2.0 could take us into deep water indeed, and it’s not hard to imagine sea monsters down in the deep.
God Moves on the Water
There’s kind of a standing question these days of whether it’s even possible to stop and think about anything anymore. While you’re standing still, the next five new things will carry everyone’s attention a few light years away. Even though MIDI 2.0 took 37 years to get here, it is landing in an era defined by tumultuous change. When it comes to seismic digital shifts, honestly, Pandora’s opened the box, and it’s hard to know what use thinking would be anyway. The people who make technology always proclaim its march inevitable, and admittedly not much has stood in the way thus far. At the same time, over and over, people prove that if you want something different it’s always just a choice away (hello there, flip-phone re-adopters). So, in hope, I see three things with MIDI 2.0 to think about.
When we talk about MIDI 2.0 or any other expansion of our capability, if we talk about them at all before we just dive into the next brave new world, we are talking around the difference between the given world and the engineered world—that is, the one we make in an attempt to cope with our frustration with given things. Or sometimes just from boredom. Back in the glowing, hypnotically spinning, and quietly hissing days of analog recording, the ethos was about capturing sound to celebrate it and share it (well, sell it anyway). It was about discovering how electricity and magnetism could work together in quasi-miraculous alchemy10 to capture the fleeting euphoria of sound waves made by people bringing their instruments to life in a room. We were given a world teeming with possibility. Music is one of our mountain peaks as creatures and creators.
Digital music is a world at a remove. Rather than being rooted in our given world, it is rooted in one of our own inventions. With the right things lying about, anyone can make some kind of guitar or some kind of fiddle or flute with just a few rudimentary tools. And if it all goes to hell, we can always still sing. Nobody can make a computer from scratch, first of all. Even if you could, take away the electricity or the processing chips or just scramble the code a little and it all goes away. All that’s left is the monotonous clicking of dead keys. Our engineered world, marvelous as it is, is cast in a fragile image. It’s just not as durable as the world we were given. Turn out the lights and analog recording might pass away, but the instruments will still be in hand, and the singing will go on. In that sense, there is something close to eternal in it. Something un-take-away-able. So the first thing we might keep in mind as another wave of technical wonder washes over us is just how volatile it is, how quick to vapor it can go, and we might remember that there’s good, given ground under the clouds.
Another thing about analog recording is just how much it relied on musicians. If you wanted to add a piano or a sitar or a barking dog, you had to find someone who could actually play or a dog willing to bark on command (otherwise you’d be wasting a lot of tape waiting). If you couldn’t find those people or that dog, you had to do without. MIDI totally upended that, paving the way for the individual alone in the basement putting together an entire album on a laptop. And while that’s one way to make music, and it can sound amazing, still you have to wonder if that’s the best way to make music. One beautiful lesson of creativity—and it used to be inevitable in making music—is that individual limits are the very foundation of community. They are the means by which we are allowed to feel our need for one another. So then the second thing we can keep in mind: You can blur individual limits with technology (and MIDI 2.0 offers quite a deluge), and you might get where you want to go, but you’ll get there alone.
So that’s the lovely, human stuff we left on the shore. On to the dangers out at sea, then. We have yet to devise a technology that builds an unsinkable ship. Because of the degree of detail MIDI 2.0 allows, I fully expect not just recorded music but all recorded sound to take on an easily manipulated yet wholly convincing realism that has implications like those above but hardly limited to them. If there’s anything we’re capable of, it’s getting creative with power. And the power over what appears to be truth is a terrible power indeed. In Ephesians 4, Paul urges the church to, among other things, put off falsehood and speak truthfully to one another. Immediately following, he gives warnings about letting anger linger, and he includes the odd exhortation to “not give the devil a foothold.” These things all seem bound together—falsehood and anger as an opportunity for the devil to climb into our midst.
Anger is a strong emotion, usually flowing out of hurt or at least danger. When we’re angry (or just fearful, which is curiously similar to anger in a lot of ways), we are most likely to reach for power to defend and avenge ourselves. And what could be more powerful than a lie? A lie has the ability to create an entire reality, albeit a false one. One of my making in which you cannot know what’s true, only what I tell you. I control the rules. A well-told lie is the ultimate power play. No wonder, then, that anger and falsehood are the devil’s own foothold. The father of lies can make a lot of hay with false reality.
Even if we don’t agree on the particulars of good and evil or heaven and hell and devils, we surely can agree that using false premises as a form of manipulation is seriously bad. Exerting control over a person’s experience and littering it with so much uncertainty and falsehood that they doubt the truth of their own senses and reasoning. That’s gaslighting. That’s the logical outcome of a lie.
Well, but we seem to be an awfully long way from anthropomorphic electric cymbals in the 4th grade gym. Will MIDI 2.0 be capable of gaslighting? Of course not. Only people are capable of gaslighting. Ah, but alas, people are exactly who will be using MIDI 2.0. And so our struggle can’t be against “technology,” because it barely exists in any metaphysical sense. It’s an inanimate tool. Like any tool, it is built with an intended result in mind, but like any tool, it also has unintended results. So our struggle will always be with the human heart behind the technology.
In the world of AI, there’s the Turing Test, which, such as it is, measures whether a machine can fool somebody into thinking it’s human. Whether the technology is effective enough, in other words. That measure of effectiveness, of brute capability, is all too often where we stop when we think about technology and where I fear we’ll stop again with MIDI 2.0. But the Turing Test leaves out the only questions that really matter to ask of any technology, the ones whose answers will always be more telling:
Why did the person using/building the technology want to fool us in the first place? What’s the downside of the new effectiveness? What are we now capable of?
1. It was the 90s.
2. “On the Run,” Pink Floyd.
3. “Caroline No,” The Beach Boys.
4. “Bouncing down is what you do when you need to keep tracks free. You mix together two tracks and transfer the mixed sound to a third track. Then you can re-use the first two tracks to record two more parts of the music and you bounce these down to the free fourth track. Now you’ve got two tracks containing mixes of two tracks each. If you’ve still got a lot of parts to record, you can bounce these two tracks down into one, giving you a single track with four parts on it and three free tracks.
“And so on.
“The problem is that once you’ve done this you can never separate the tracks again. The mix you make is what you’re stuck with…You’re making final, irrevocable decisions as you go. It’s a recipe for disaster, unless the person doing it is a genius.”
—The Ground Beneath Her Feet, Salman Rushdie, p. 300
5. “Delicate,” Taylor Swift. Really the magic here is the digital simplicity of adding overdubs. You want a tea kettle on your track (listen closely at the beginning)? Piece of cake. Just record one and drop it into Pro Tools.
6. This also happens as a singer ages (especially a female singer). Years of touring and recording simply wear down someone’s vocal cords. Instead of changing their approach or singing in a lower key, singers often resort to autotune to patch up the weathered quilt, so to speak, and keep the sound of agelessness, which, again, is impossible and cultivates, I think, the expectation of an impossibility that can’t lead anywhere healthy.
7. It’s interesting that while people were gobbling up higher and higher visual resolution, nobody worried about their ears. Back when Jay Z launched his Tidal streaming service and Neil Young was backing the Pono music player, both of which offered high-resolution audio for a rather pretty penny, NPR did a little informal polling to see if people really cared about the sound quality of their music. Not only did most listeners not seem to care about sound quality, most couldn’t even tell the difference in the first place. Some speculated that the human eye is simply designed to pick up more detail with all its rods and cones compared to our auditory combination of a drum and three small bones. I was too afraid to take NPR’s quiz lest I too prove unequal to the challenge.
8. “How Will MIDI 2.0 Change Music?”
9. What would this take? Not terribly much, I don’t think. It would take an abundance of recordings of someone’s voice plus someone with the right expertise (and there’s always someone with the right expertise) to come up with a set of algorithms to match the pitch and sonority (perhaps the most embodied part of the voice, because it has so intimately to do with the shape of the mouth, throat, and nasal cavity). From there it’s a matter of taking one singer’s voice and dialing in the right adjustments to make it sound like another. It’s not simple, but neither was doctoring photographs in the daguerreotype days, and now Photoshop is child’s play.
10. Really a cascade of alchemies. So take the electric guitar. Guitar strings vibrate and that makes the music. In an electric guitar, the strings vibrate right above a magnet. The ferrous metal in the string stirs up the magnetic field, and since the magnet is wrapped in copper wire, it generates a weak-ish electric current. This current rides down a cable to the amp. The amp, plugged into the wall, uses a series of vacuum tubes and capacitors to add some serious juice to the fairly weak signal coming from the guitar. This amplified electric current heads to a wire coiled near another magnet, and as the electric signal pulses through, the wire is attracted to and repelled from that magnet, and the motion vibrates a speaker cone, et voilà, we’re back to sound waves. These go into a microphone and the whole process (magnets, wires, electricity) happens again, only this time the electric signal travels to the record head of a tape deck. The head is really a tiny electromagnet: a ring of magnetic metal wrapped in wire, with a hairline gap where the magnetic field leaks out in step with what it’s “hearing” from the microphone. This wavering magnetic field arranges tiny magnetic particles into patterns on a ribbon of brownish plastic that’s spinning reel to reel. And now you have an imprint of sound you can put in a suitcase and carry down the street. Vibration to electricity to magnetic energy and back and forth and over and over until it hits your ear, where your eardrum and inner ear turn it into one last electric pulse that hits your brain and becomes experience and memory.