Author: Seth Horowitz
Science has been trying to tackle music for centuries. In the last thirty years, I’ve probably read forty books and several hundred papers on the relationships between science and music; the biological underpinning of music in the ear, the brain stem, and the cortex; psychological bases of music perception; and musical cognition and its relationship to intelligence and the mind.
These range from Christian Schubart’s early nineteenth-century silliness that gave jargon-laden descriptions of the emotional bases of different keys through Helmholtz’s classical work on the perception of tone.
Studies have been carried out with instrumentation as simple as glass bowls that would resonate at specific frequencies and with technology as recent as neural imaging, a favorite of contemporary theorists. Even archaeologists have gotten into the arena with the discovery of 35,000-year-old flutes made of bird bones, pushing the human relationship with music back toward the Cro-Magnon era.
But to tackle music scientifically, we need a good operational definition of what it is. I’ve seen several dozen attempts at it, ranging from the dictionary-like “a series of tones arranged in a precise temporal structure” (which would probably leave out all ambient music, hard-core jazz, and a huge amount of non-Western music) to the cognitive “an acoustico-emotive communication form,” which winds up including things such as birdsong and gorilla chest thumping.
Music seems like it should be amenable to scientific analysis. It’s composed of tones in an orderly (or deliberately disordered and hence non-random) temporal arrangement. It exercises control over frequency, amplitude, and time, three elements we’ve been throwing analytical tools at for centuries. But once you’ve actually picked up an instrument and gotten it enough under control to elicit a response from an audience, even if it’s just yourself, you realize that music isn’t the notes, the timing, the buildup and relaxation of psychological tension between all of those elements, or even how the player or listener feels before, during, and after.
Still, music requires all of these things, so we’re kind of stuck. Music is a global and subjective subject. Just ask your parents how they feel about the “noise” that you like listening to or your kids why they listen to “that crap.” And science has trouble with subjectivity. Without precision and testable definitions, you end up with statements like “D minor is the saddest key,”
with hundreds of classical musicians nodding in agreement as they think about Bach’s Toccata and Fugue or Beethoven’s Ninth, and an equal number of neuroscientists and psychologists wondering why they can’t see that in their fMRI data. This is what I find fascinating about trying to figure out the relationship between music and the mind. Their mutual complexities are almost mirror images of each other, and in understanding one we may discover the underlying basis for the other. But none of the hundreds of books or thousands of studies carried out has truly grasped the interface between the two. Perhaps the best we can do is to treat it like a complex jigsaw puzzle, looking at small pieces along the edges and seeing how they fit to create a basic framework, or looking at what the final picture is like and taking a wide and fuzzy view of the effect of one on the other.
To start with edge pieces, one of my favorite examples of the clash of music and science that addressed a very basic and important part of music was a classic psychological experiment carried out by C. F. Malmberg in 1918. He was interested in
trying to quantify and really nail down the idea of consonance and dissonance as a psychophysical phenomenon. Consonance and dissonance are psychological percepts underlying which combinations of notes in the Western twelve-tone system sound “good” or “smooth” versus which ones sound “tense” or “grating.” At a core musical level, it’s hard to get more basic than combining two tones, and yet the sonic and musical result is one of the most complex aspects of psychoacoustics that we have a relatively solid handle on.

In Western intonation, the played musical scale is based on a mechanically linear separation of notes divided into twelve half steps until reaching the octave. The octave represents the same pitch class, but doubled in frequency, and the musical range is composed of a series of octaves spanning from the infrasonic lowest C of a large pipe organ (at about 8 Hz) up to the piccolo’s upper limit of about C8 (around 4,400 Hz). However, the notes on this scale are not what you might think of as evenly spaced. Intervals are defined along a logarithmic scale, and the psychological qualities that emerge from combinations of sounds, such as whether they are major or minor, consonant or dissonant, emerge from the mathematical ratios of their base frequencies. Certain intervals have always been described as consonant—sounding smooth and lacking tension—such as unison (a note played with itself, yielding, of course, a base frequency ratio of 1:1), an octave (with two notes of the same pitch but one double the frequency of the other, yielding a 2:1 ratio), or a perfect fifth (with a ratio of 3:2). Intervals such as diminished seconds (two adjacent semitones played together) or even more musically common ones such as minor thirds (with a ratio of 65,536:59,049) are described as dissonant or tense.
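To make the ratio arithmetic concrete, here is a minimal sketch in Python (my own illustration, not drawn from any of the studies discussed here) that computes equal-tempered frequencies from concert A at 440 Hz and compares a few intervals with the small-whole-number ratios traditionally called consonant.

```python
# Minimal, illustrative sketch: twelve-tone equal temperament built from A4 = 440 Hz,
# compared with the small-whole-number "just" ratios traditionally labeled consonant.

A4 = 440.0  # concert A, in Hz

def equal_tempered(semitones_from_a4):
    """Frequency of the note a given number of semitones above (or below) A4."""
    return A4 * 2 ** (semitones_from_a4 / 12)

# Interval name -> (semitones above A4, traditional just ratio)
intervals = {
    "unison": (0, 1 / 1),
    "perfect fourth": (5, 4 / 3),
    "perfect fifth": (7, 3 / 2),
    "octave": (12, 2 / 1),
}

for name, (semitones, just_ratio) in intervals.items():
    freq = equal_tempered(semitones)
    print(f"{name:14s} {freq:8.2f} Hz   tempered ratio {freq / A4:.4f}   just ratio {just_ratio:.4f}")
```

The tempered ratios land close to, but not exactly on, the whole-number ratios, which is the logarithmic spacing described above.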
Consonance and dissonance would seem like a relatively
simple musical feature to try to characterize scientifically. We all have experience with intervals (both musically and in other daily sonic experiences ranging from birdsong to our own voices) and generally have standardized reactions to musical intervals, at least within our own experience. The problem is that even this relatively simple musical feature is not a stable one. Historically, certain intervals were defined as consonant. Pythagoras defined consonant intervals as the ones with the simplest whole-number ratios, so most likely he was including only intervals from the pentatonic scale (unison, major third, perfect fourth, perfect fifth, and the octave). In the Renaissance, major and minor thirds and sixths were included as consonant, although minor thirds were supposed to be immediately resolved into a major chord afterward. In the nineteenth century, one of the first great psychoacousticians, Hermann von Helmholtz, declared that all intervals that share a harmonic were consonant, thus defining only seconds and sevenths (intervals adjacent to the octave notes) as dissonant. To add to the complexity, by the beginning of the twentieth century, musicians were experimenting with a much broader range of musical combinations and regularly using and sustaining intervals that would have made a Romantic-era composer pray for sudden-onset hearing loss.
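Helmholtz’s shared-harmonic rule is easy to play with numerically. The sketch below is my own rough reading of it, not his procedure: it simply asks whether any of the first several harmonics of two tones nearly coincide, with the harmonic count and tolerance chosen arbitrarily for the example.

```python
# Rough, illustrative reading of Helmholtz's rule: call two tones consonant if any of
# their first few harmonics (nearly) coincide. The harmonic count and tolerance are
# arbitrary assumptions made for this example, not values Helmholtz specified.

def shares_harmonic(f1_hz, f2_hz, n_harmonics=8, tolerance_hz=2.0):
    """Return True if any low harmonics of the two tones land within the tolerance."""
    harmonics1 = [f1_hz * n for n in range(1, n_harmonics + 1)]
    harmonics2 = [f2_hz * n for n in range(1, n_harmonics + 1)]
    return any(abs(h1 - h2) < tolerance_hz for h1 in harmonics1 for h2 in harmonics2)

base = 220.0  # A3, in Hz
print("perfect fifth (3:2):", shares_harmonic(base, base * 3 / 2))  # True: both reach 660 Hz
print("major second (9:8):", shares_harmonic(base, base * 9 / 8))   # False within 8 harmonics
```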
This was the background for Malmberg’s experiment. While the famous paper describes the basic aspect of getting musicians and psychologists to agree to a standardized scale of consonance and dissonance, the background story I’ve gotten from some older colleagues and Carl Seashore’s classic 1938 book Psychology of Music was a bit more interesting (albeit possibly apocryphal). Malmberg deliberately chose a jury of musicians with no scientific training and psychologists with no musical training and forced them to listen to intervals played on tuning forks, a pipe organ, and a piano under strict environmental conditions (which one of my older sources insisted meant “no access to food or bathrooms during the session”). These sessions would be continued for a whole year or until everyone in the group agreed on a single judgment of consonant or dissonant for each interval.
And thus he acquired one of the first group psychological assessments of musical qualities in a study that is cited to this very day.
Despite the social complexity required to get a single critical chart that related the psychology of the perception to the underlying mathematics of sound, the study didn’t dive too deeply into the underlying neural basis. It wasn’t until we had a greater understanding of some of the biology of the cochlea that some particularly interesting relationships between musical intervals and neuroscience popped up. In 1965, R. Plomp and W. J. M. Levelt reexamined consonance and dissonance, using a similar psychophysical technique (i.e., playing simple intervals and asking subjects to rate them) but with a different context. Rather than just looking for agreement between musically trained and untrained groups about the psychological percepts of consonance and dissonance, they plotted relative consonance and dissonance against the width of the ear’s critical bands.
Critical bands (the term was coined by Harvey Fletcher in the 1940s) are the psychophysical elements that almost made me ditch grad school and go back to being a dolphin trainer, largely because they were always explained by psychophysicists. Yet once you get past the jargon, they are pretty easy to understand.
If I played a 440 Hz tone for you, and you had some musical experience or perfect pitch, you could identify it as a single tone, a concert A in musical terms. If I played a 442 Hz tone, you could not tell the difference between that and the 440 Hz tone. Nor 452 Hz, nor 475 Hz. You probably couldn’t tell the difference until I had actually changed the tone by about 88 Hz. This estimate is based on a filtering function of the hair cells laid out along the cochlea, with hair cells near the basal end (i.e., near the opening to the inner ear) responding to high-frequency sounds and those at the apical end responding to lower-frequency sounds. Although a healthy human cochlea has about 20,000 hair cells spread over about 33 mm (about 1¼ inches), hair cells within about 1 mm of each other tend to be maximally sensitive to the same general frequency. These regions that hear about the same frequencies are called critical bands. They are referred to as bands because they are roughly linear in the way they’re distributed along the cochlea, but the critical bands have different frequency spans or bandwidths. Lower-frequency sounds, especially those in the vocal and musical ranges, tend to have much smaller critical bandwidths, yielding better frequency resolution, whereas pitches higher than 4 kHz or so have much broader critical bandwidths, meaning it is harder to resolve individual pitch changes across the same frequency span.
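A widely used laboratory approximation of auditory filter width is the equivalent rectangular bandwidth (ERB) formula of Glasberg and Moore. The sketch below uses it purely as a stand-in for the critical bands described above, with the caveat that ERB values run somewhat narrower than classical critical-band estimates; the trend is the same, though: narrow filters at low frequencies, broad ones at high frequencies.

```python
# Illustrative stand-in for critical bandwidth: the Glasberg & Moore equivalent
# rectangular bandwidth (ERB) of the auditory filter, as a function of frequency.

def erb_hz(frequency_hz):
    """Approximate auditory filter width (Hz) at the given center frequency."""
    return 24.7 * (4.37 * frequency_hz / 1000.0 + 1.0)

for f in (125, 250, 500, 1000, 2000, 4000, 8000):
    print(f"{f:5d} Hz  ->  filter width about {erb_hz(f):6.1f} Hz")
```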
This gave Plomp and Levelt a biological basis against which to compare consonance and dissonance. Once again they asked their subjects to rate different intervals for consonance and dissonance for five different base frequencies, but they plotted these ratings against the width of the critical bands for these frequencies. What they found was that there were specific relationships between the rating and the position within the critical band. Very small differences in frequency between the two tones,
ones in which the two tones were separated by well under a full critical bandwidth, yielded the greatest dissonance ratings, whereas pairs separated by about a full critical bandwidth or more were judged more consonant. In other words, listeners responded to different musical intervals based on the underlying organization of the hair cells in the cochlea.
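You can get a feel for the shape of that result with a small calculation. The curve Plomp and Levelt measured is often approximated by a pair of exponentials; the sketch below uses the parameterization later popularized by William Sethares, with his commonly quoted fitted constants, rather than anything taken directly from the 1965 paper.

```python
import math

# Illustrative sketch of a Plomp-Levelt-style roughness curve for two pure tones,
# using the fitted constants commonly quoted from Sethares's parameterization
# (an assumption for this example, not values from the 1965 study).

def roughness(f1_hz, f2_hz):
    """Approximate sensory dissonance of two pure tones."""
    f_low, f_high = min(f1_hz, f2_hz), max(f1_hz, f2_hz)
    s = 0.24 / (0.021 * f_low + 19.0)        # scales the curve with the local critical band
    x = s * (f_high - f_low)
    return math.exp(-3.5 * x) - math.exp(-5.75 * x)

base = 440.0  # Hz
for separation in (0, 10, 20, 40, 80, 160, 320):
    print(f"separation {separation:3d} Hz  roughness {roughness(base, base + separation):.3f}")
```

For a 440 Hz base tone, the predicted roughness peaks when the second tone is a few tens of hertz away and fades toward zero as the separation approaches and exceeds a critical bandwidth, which is the pattern the ratings showed.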
The idea of consonance and dissonance has been traced upward throughout the brain. Research has shown that auditory neurons respond more strongly to consonant intervals than dissonant ones, and that the relationship between the two actually matches that seen in the older behavioral tests. The same has been shown to be true for higher regions of the brain (not merely auditory) that process both sound and emotion, with multiple studies showing differences in activation of emotional processing regions based on whether the interval is major or minor. This confirms what we know from listening on our own—that major intervals sound not only consonant but “happy,” whereas minor ones sound “sad.”
A recent study by Tom Fritz and colleagues took this one step further, in a paper titled “Universal Recognition of Three Basic Emotions in Music,” in which the researchers examined the ability to recognize emotional aspects of music across cultures. They had Western (German) and non-Western (members of the Mafa ethnic group from northern Cameroon) listeners categorize music from the other’s culture as happy, sad, or scary based on underlying consonance or dissonance. Both sets of listeners were supposed to be naive about music from the other’s culture, to try to rule out familiarity and experience effects. The results showed that both groups identified the emotional content of the Western music, with similar directionality of “liking” the music in both the German and Mafa groups. It would
seem as if we have a relatively solid, long-term, cross-cultural lock on the idea that there is a biological basis to music.
Or do we?
As with any attempt to address something this big with precision, there are a lot of problems. For example, in the Plomp and Levelt study, while they were using very precise intervals, the base frequencies that they used were based not on musical notes but rather on easily generated pure tones at specific frequencies, specifically 125, 250, 500, 1,000, and 2,000 Hz. Yet in musical contexts, you never run into pure tones. Every musical instrument creates a complex timbre (often called sound color) that includes harmonic frequencies. Even the simplest of flutes doesn’t create pure sine waves, which are, in effect, what Plomp and Levelt were testing with. An excellent player on a world-class flute still creates not sine waves but triangle waves: in scientific terms, multiple individual sine waves add up as odd-numbered harmonics whose amplitudes drop off quickly toward the high-frequency end, creating a pure-sounding tone with just a hint of bite to it. These harmonics are not simply ignorable overtones that arise from the structure of the instrument and the player’s performance. The clashing and overlapping of harmonics are another important factor in the creation of perceived dissonance or consonance.
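To see what that means in practice, here is a short sketch (mine, not from the text) that builds a triangle-like wave by summing odd-numbered harmonics whose amplitudes fall off as one over the square of the harmonic number, the standard Fourier recipe for a triangle wave.

```python
import math

# Illustrative sketch: approximate a triangle wave by summing its odd harmonics.
# Amplitudes fall off as 1/n**2, so the tone sounds nearly pure with a slight edge.

def triangle_sample(t_seconds, fundamental_hz, n_harmonics=9):
    """One sample of a triangle wave built from its first few odd harmonics."""
    value = 0.0
    for k, n in enumerate(range(1, 2 * n_harmonics, 2)):   # n = 1, 3, 5, ...
        sign = (-1) ** k                                    # successive odd harmonics alternate in sign
        value += sign * math.sin(2 * math.pi * n * fundamental_hz * t_seconds) / (n * n)
    return (8 / math.pi ** 2) * value                       # normalize the peak to about 1

# A few samples across one cycle of a 440 Hz triangle wave.
for i in range(8):
    t = i / (8 * 440.0)
    print(f"t = {t:.6f} s   amplitude = {triangle_sample(t, 440.0):+.3f}")
```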
Next, the very idea of consonance not only has shifted across time within Western culture but has a great many problems when you start bringing in other cultures. For example, in the study using the Mafa listeners, the study couldn’t present Mafa songs that were categorized as happy, sad, or scary because the Mafa do not assign specific emotional descriptions to their music. The German listeners had to decide on the emotional tone of the music based on the relative “dissonant” shifting of the
key of the original piece. But if you actually listen to some of the traditional Mafa flute pieces used in the study, while the timbre of the flutes is somewhat coarse (and hence musically quite interesting), the actual intervals played are those found in Western twelve-tone scales. What would happen if you tried using music from non-Western cultures that employ different intonation and intervals? While all known human music includes the basic intervals that are commonly labeled as consonant, many traditions go far beyond them. Indian ragas often use intervals of less than a Western semitone, and some Arabic music uses quarter tones. One of the more extreme examples (for most Western listeners) is Indonesian gamelan music, which uses either a tuning with five equally spaced notes per octave or one with seven unequally spaced notes per octave. Gamelan music sometimes not only uses both tuning techniques in a single song but also will slightly detune two instruments that are technically playing the same notes. This detuning creates a beating or roughness between harmonics, which is one of the definitions of dissonance for complex timbres.
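Two quick calculations make those last points tangible: a pair of instruments detuned by a few hertz beat at the difference between their frequencies, and a five-note equal division of the octave (one common idealization of the equally spaced gamelan tuning mentioned above) has steps of about 240 cents, far from most Western interval sizes. The numbers below are illustrative choices, not measurements of any actual gamelan.

```python
import math

# Illustrative sketch: beating between detuned "unison" tones, and the step size of a
# five-note equal division of the octave (an idealization of one gamelan tuning).

def beat_frequency(f1_hz, f2_hz):
    """Rate of the slow amplitude beating heard when two close tones sound together."""
    return abs(f1_hz - f2_hz)

print("beats:", beat_frequency(300.0, 304.0), "Hz")  # a 4 Hz shimmer between the fundamentals

step_ratio = 2 ** (1 / 5)                   # frequency ratio of one step in five-tone equal tuning
step_cents = 1200 * math.log2(step_ratio)   # the same step expressed in cents
print(f"five-tone equal step: ratio {step_ratio:.4f}, about {step_cents:.0f} cents (a semitone is 100 cents)")
```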