In 1996 the prestigious journal Nature published an intriguing report by Marcel Zentner and Jerome Kagan entitled “Perception of Music by Infants”. The results suggested that infants prefer consonance (the relative attractiveness of different pitch combinations) over dissonance (the relative unattractiveness of different pitch combinations). The findings shed some much-needed light on the age-old debate regarding the role of nature vs. nurture in the perception of consonance and dissonance, and implied that consonance preference indeed has a firm biological foundation.
Fast forward 20 years to 2016 when the very same journal published an equally fascinating yet completely contrasting report by Josh McDermott and colleagues: consonance perception is actually a Western concept and is unlikely to be innate. The authors based their conclusion on their empirical findings that an indigenous people called the Tsimané, living in the Amazon rainforest in Bolivia, do not prefer consonance over dissonance. So, which one of the two sharply opposing empirical reports – both published in the same prestigious journal – got it right: is consonance perception innate or is it not? In this post I will dissect some of the ways the universality of consonance perception has been studied and what the results have been so far, allowing us to form an opinion about the possible innateness of consonance perception.
There are basically three ways to study the innateness of consonance and dissonance perception: 1) animal studies, 2) infant studies, and 3) cross-cultural studies. With animal studies the results have so far been inconsistent, with many studies showing a preference for consonance while others show indifference. The preference for consonance in such studies is assessed, for example, by how long animals spend near a consonant vs. a dissonant sound source. Unfortunately, animal studies in this domain are plagued by problems including small stimulus sets, small sample sizes, and a lack of replication. If future research addresses these issues, animal studies could make a relevant contribution to the broader question of the innateness of consonance perception.

Infant studies, in turn, are quite numerous and much more consistent (with one notable exception), suggesting that infants indeed prefer consonance over dissonance. This tendency also appears to be independent of any prenatal or early postnatal exposure, which is worth keeping in mind given that exposure to consonance and dissonance arguably starts as early as in utero. The preference for consonance in such studies is typically measured with a looking-time paradigm to see whether infants preferentially look at consonant or dissonant sources of sound.
Cross-cultural studies of consonance perception are extremely rare, arguably because such research is getting harder and harder to conduct with globalisation reaching even the most remote parts of the world. So far there have been only three studies dealing directly with this question.
1) Janet Butler and Paul Daston’s study from 1968 concluded that American and Japanese listeners perceive consonance similarly, with a clear preference for consonance over dissonance in both cultures.
2) Timothy Maher’s study from 1976 in turn concluded that Hindustani (North Indian) listeners do not perceive dyad intervals (consisting of two simultaneous pitches) that are extreme dissonances to Westerners as equally dissonant. While dissonances that are perceived as harsh by Western listeners are used more freely in Hindustani classical music, the concept of consonance and dissonance itself very much exists in Hindustani music, much the same way as it does in Western music.
3) The previously mentioned study by Josh McDermott and colleagues from 2016 concluded that the Tsimané people of the Amazonian rainforest are indifferent to consonance and dissonance in dyad intervals and triad chords (consisting of three simultaneous pitches). The premise and rationale of the study are fascinating and thoroughly thought out: McDermott and colleagues chose the Tsimané as participants because their musical culture does not include a concept of harmony or polyphony, providing a rare opportunity to investigate whether they exhibit aesthetic responses to consonance and dissonance despite having no prior exposure to them. McDermott et al. concluded on the basis of their findings that consonance preferences are unlikely to be innate and seem to depend on exposure to particular types of music, presumably those that feature consonant harmony.
Despite the strong rationale behind the experimental setup, the study by McDermott and colleagues seems to have quite a few methodological shortcomings. Daniel Bowling and colleagues raised many concerns about McDermott et al.’s paper in a thought-provoking response paper from 2017. The main arguments Bowling et al. raise are indeed crucial and call into question the results obtained by McDermott and colleagues. First, Bowling et al. point out that McDermott et al. failed to test the most consonant interval across cultures, namely the octave. Second, they question the validity of the sole dependent variable (subjective liking assessed on a 4-point scale) and suggest it is demonstrably affected by cultural differences in interpretation and use. Third, they call into question the way the data are presented: although the Tsimané did not prefer the consonant perfect 5th interval to the dissonant major 7th (in contrast with Americans), they did prefer the perfect 5th to the minor 2nd (which is the harshest dissonance among intervals in Western music). This crucial piece of information was not discussed in the paper by McDermott et al. but was evident from their extended data figures. Fourth, Bowling et al. draw attention to the fact that the Tsimané’s musical tradition makes use of pentatonic scales, which comprise highly consonant intervals and are among the most commonly used scales across cultures. They take this as evidence that the Tsimané are indeed sensitive to the same acoustic and aesthetic principles that govern tonality in other musical traditions.
My further addition to these excellent points raised by Bowling et al. concerns McDermott et al.’s selection of chord stimuli. They compared only the major triad and the augmented triad in terms of consonance and dissonance; using only the augmented triad as an example of dissonance is a most unrepresentative choice, as it is notorious for not being acoustically (objectively) as dissonant as it is perceptually (subjectively) for Western listeners. The perceived dissonance and high tension of the augmented triad have been linked with its intervallic equidistance (the chord is made up of two consonant major thirds, hence its dissonance is hard to pinpoint acoustically), but a more probable explanation is provided by the concept of tonal dissonance proposed by Johnson-Laird and colleagues in a paper from 2012. Based on empirical findings, they propose that tonal dissonance depends on the scales in which the chords can occur: chords occurring in a major scale are theoretically less dissonant than chords occurring only in a minor scale, which in turn are less dissonant than chords occurring in neither sort of scale. As the augmented triad is present in the (harmonic) minor scale but not in the major scale, this theory offers a neat explanation for its perceived dissonance and tension despite the fact that it is acoustically not very rough (roughness being an acoustic measure that objectively quantifies one important aspect of dissonance perception). This is why it is logical that, for example, the diminished triad (which is present in the major scale) is consistently rated as more consonant than the augmented triad by Western listeners, despite being acoustically rougher. Johnson-Laird et al. suggest that Western listeners, even without musical training, acquire a tacit knowledge of tonality, and so violations of these principles lead to the perception of dissonance in chords.
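The scale-membership idea described above can be checked with a few lines of code. The following is only a minimal sketch of that single principle, not Johnson-Laird et al.’s full model: the three-tier numbering, the function names and the choice of the harmonic minor as “the” minor scale are my own assumptions for illustration.

```python
# Pitch classes (0 = tonic) of the two scale types considered here.
MAJOR = {0, 2, 4, 5, 7, 9, 11}
HARMONIC_MINOR = {0, 2, 3, 5, 7, 8, 11}  # assumption: "minor" means harmonic minor


def occurs_in(chord, scale):
    """True if some transposition of the chord fits entirely inside the scale."""
    return any({(p + shift) % 12 for p in chord} <= scale for shift in range(12))


def tonal_tier(chord):
    """Hypothetical tier label: 1 = occurs in a major scale,
    2 = occurs only in a (harmonic) minor scale, 3 = occurs in neither."""
    if occurs_in(chord, MAJOR):
        return 1
    if occurs_in(chord, HARMONIC_MINOR):
        return 2
    return 3


# Chords written as semitones above their lowest note.
CHORDS = {
    "major triad": {0, 4, 7},
    "diminished triad": {0, 3, 6},
    "augmented triad": {0, 4, 8},
    "chromatic cluster": {0, 1, 2},
}

for name, chord in CHORDS.items():
    print(f"{name}: tier {tonal_tier(chord)}")
```

Under these assumptions the major and diminished triads land in tier 1, the augmented triad in tier 2 and the chromatic cluster in tier 3, which matches the ordering argued for in the text.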
In other words, the augmented triad is a flawed example of dissonance in a cross-cultural setting, as its perception is evidently influenced by cultural factors. Had McDermott and colleagues included truly rough chord sonorities (e.g., a very dissonant chromatic cluster) in their chord stimuli, the consonance perception of the Tsimané could have been different and would likely have been more consistent with that of the Western listeners. Somewhat ironically, McDermott and colleagues themselves concur that “…although an aversion to roughness is apparently present cross-culturally, it seems to be unrelated to consonance and dissonance, perhaps because musical sounds are in practice not very rough.” Again, the augmented triad is certainly not very rough, but including for example a chromatic cluster in the stimuli could have told a very different story. Using the augmented triad as the benchmark of a “dissonant” chord in a cross-cultural study seems ill-advised and does not warrant far-reaching conclusions or sensational headlines in mainstream media outlets about consonance being exclusively a cultural artefact.
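To give a feel for what “rough” means acoustically, here is a rough-and-ready sketch that sums a Plomp-Levelt style roughness curve over all pairs of partials in a chord. Everything in it is an assumption made for illustration: the curve constants are the ones commonly quoted from Sethares’ fit, the tones are idealised (six harmonics with a made-up amplitude decay, equal temperament, root at middle C), and none of this is the roughness measure used in any of the studies discussed above.

```python
import math
from itertools import combinations


def pair_roughness(f1, a1, f2, a2):
    """Roughness contribution of two partials (Plomp-Levelt style curve,
    constants as commonly quoted from Sethares' parametrisation)."""
    s = 0.24 / (0.0207 * min(f1, f2) + 18.96)  # scales the curve with register
    x = s * abs(f2 - f1)
    return a1 * a2 * (math.exp(-3.51 * x) - math.exp(-5.75 * x))


def partials(semitones, root_hz=261.63, n_harmonics=6, decay=0.88):
    """(frequency, amplitude) pairs for idealised harmonic tones.
    root_hz, n_harmonics and decay are illustrative assumptions."""
    out = []
    for st in semitones:
        f0 = root_hz * 2 ** (st / 12)  # equal-tempered fundamental
        for k in range(1, n_harmonics + 1):
            out.append((k * f0, decay ** (k - 1)))
    return out


def chord_roughness(semitones):
    """Sum of pairwise roughness over every pair of partials in the chord."""
    return sum(
        pair_roughness(f1, a1, f2, a2)
        for (f1, a1), (f2, a2) in combinations(partials(semitones), 2)
    )


for name, chord in [
    ("major triad", (0, 4, 7)),
    ("augmented triad", (0, 4, 8)),
    ("chromatic cluster", (0, 1, 2)),
]:
    print(f"{name}: {chord_roughness(chord):.3f}")
```

In a model of this kind the chromatic cluster comes out far rougher than either triad, because all of its partials sit within a semitone or two of each other. The exact numbers depend heavily on the assumed timbre and register, so they should not be read as measurements.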

So, is consonance perception innate or is it not? Evidently, the exact ratio of nature vs. nurture in consonance perception calls for further research, and the debate about its innateness is likely to continue. However, on the basis of the previous research presented above, it seems safe to assume that humans are to some extent biologically prepared to prefer consonance over (harsh) dissonance, with culture also shaping how consonance is perceived and appreciated. Neglecting either side of the coin might result in false conclusions about the perception of this elusive phenomenon that presumably has roots in both nature and nurture.
This review is missing the latest important breakthrough in the search for an explanation of consonance/dissonance. In https://doi.org/10.1016/j.bica.2017.09.001, Pankovski and Pankovska suggest that for the most consonant sounds culture is irrelevant (the physics prevails), but for the others (the dissonant ones) culture probably wins out. Here is a lay-person video explaining all that: https://doi.org/10.1016/j.bica.2017.09.001
It is worth mentioning that with this study the authors succeeded in perfectly reproducing the ordering of intervals by consonance (as widely accepted in Western music), and for the first time give a probable causal explanation of consonance/dissonance, following the process from the creation of the sound to the emergence of the emotions…
Hi, thanks for your comment! Have you seen the empirically tested consonance rank ordering of intervals in the paper “Vocal similarity predicts the relative attraction of musical chords” by Bowling et al. (2018)? https://www.pnas.org/content/115/1/216
It seems the ordering of interval consonances is not set in stone even within the framework of Western culture.
Hi Imre, thank you for pointing out that paper, and a good point: that is exactly what the results of the https://doi.org/10.1016/j.bica.2017.09.001 paper suggest. The highly consonant intervals are “set in stone”, whereas the less consonant ones may differ from culture to culture. The inner workings of the neural network used in the research clearly demonstrate that.
Thanks Toso! More cross-cultural research is needed to settle this question I think. Cultural familiarity might affect the preference for highly consonant sonorities as well; we are currently reporting a study conducted in remote Northwest Pakistan which will definitely be of interest in this regard. Stay tuned! 🙂
The video link in my previous comment is wrong; here is the correct one: https://youtu.be/99ODsIMq-ZM