What does cross-cultural research tell us about harmony perception?

When we think of music, we often bring to mind the music that is most familiar to us. However, it is important to remember that, across the globe, there is a wide range of musical styles, each with its own musical language made up of characteristics including rhythm, melody, instrumental texture and harmony. The field of music psychology benefits greatly from studying as wide a range of musical styles as possible in order to understand which aspects of music perception and cognition are universal and which are context-specific. One fascinating musical style is Sutartinės, prevalent in north-eastern Lithuania. Sutartinės are folk songs sung by two to four female singers without instrumental accompaniment. This style has a long history, starting as a living oral tradition of working songs. In the latter days of the Soviet Union, Sutartinės grew in popularity as an assertion of Lithuanian national identity. More recently, the Sutartinės style has been associated with the neo-Pagan movement. We are lucky at Durham to have a research relationship with Rytis Ambrazevičius, who happens to be one of the foremost directors of Sutartinės groups when he is not doing his day job as a professor of ethnomusicology and music cognition.

Map of Lithuania with the Sutartinės Region marked in green

One of the defining stylistic features of Sutartinės is the prevalence of parallel seconds. Though often notated as major or minor seconds in Western notation, the interval as sung by expert singers is typically a little narrower than the major second of the equal-tempered scale (we refer to it below as the "squeezed second"). This prevalence of seconds is not unique to the Sutartinės style – it is also found in Ganga (Bulgaria), mizwhiz (Middle East) and in vocal music in parts of Indonesia. In many other musical styles, however, seconds are considered dissonant. In Western diatonic harmony, we hear seconds reasonably frequently, but typically they quickly resolve to a consonant interval. The Sutartinės style, and some of the examples listed above, often contain long sequences of seconds in parallel.

An example of a Sutartinė song, Ko Siudi, Ko Liudi

This raises an interesting question – why are some intervals considered consonant in one style, but not in another? On one level, we can answer that consonance and dissonance are cultural constructs, and exposure to a particular musical culture will shape a listener’s perception of which sounds are consonant and which are dissonant. On the other hand, we know that the human auditory system is better equipped to deal with some sounds than others, and the minor second in particular is one that poses problems for an obscure part of the inner ear called the basilar membrane. (Narrow intervals fall within the same ‘critical band’, leading to a perceptual phenomenon known as roughness – there is a digestible introduction here). This line of thought suggests that certain aspects of consonance and dissonance perception should be universal. So, questions of consonance and dissonance are often framed in terms of the Nature vs Nurture debate.
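The critical-band idea can be made concrete with a rough calculation. The sketch below is illustrative only. It uses the Glasberg and Moore equivalent rectangular bandwidth (ERB) formula as a stand-in for the critical band, an assumed lower tone of 440 Hz, and an assumed size of 180 cents for a second that is a little narrower than the equal-tempered major second. None of these values comes from our study, and proper roughness models are considerably more elaborate than a single bandwidth comparison.

```python
def erb_hz(centre_hz: float) -> float:
    """Equivalent rectangular bandwidth (Glasberg & Moore, 1990), in Hz.

    Used here as one common approximation of the critical band.
    """
    return 24.7 * (4.37 * centre_hz / 1000.0 + 1.0)


def upper_tone(lower_hz: float, cents: float) -> float:
    """Frequency of the tone lying `cents` above `lower_hz`."""
    return lower_hz * 2.0 ** (cents / 1200.0)


def within_one_band(lower_hz: float, cents: float) -> bool:
    """True if the two tones are closer together than one ERB."""
    upper_hz = upper_tone(lower_hz, cents)
    centre_hz = (lower_hz + upper_hz) / 2.0
    return (upper_hz - lower_hz) < erb_hz(centre_hz)


LOWER_HZ = 440.0  # assumption: an arbitrary reference pitch (A4)

# Interval sizes in cents. The 180-cent value is a hypothetical
# stand-in for a slightly narrow major second, not a measured value.
INTERVALS = {
    "minor second": 100.0,
    "narrow second (assumed 180 cents)": 180.0,
    "major second": 200.0,
    "perfect fifth": 700.0,
}

for name, cents in INTERVALS.items():
    verdict = "same band" if within_one_band(LOWER_HZ, cents) else "separate bands"
    print(f"{name}: {verdict}")
```

Under these assumptions, all three seconds fall inside a single band while the perfect fifth does not, which is the sense in which narrow intervals are "tough" for the inner ear.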

The Nature vs Nurture debate is prevalent in almost every area of psychology. In many areas, such as mental health research, psycholinguistics or child development, there has been a fruitful dialogue between the two standpoints, and researchers attempt to determine the relative contribution of each. However, researchers in consonance-dissonance typically remain polarised into two camps. One school of thought argues that harmony is a cultural construct; another argues that the auditory system is attuned to particular ratios of sounds (i.e., their harmonicity). Relatively few scholars take a position that integrates the contributions of both culture and biology into a unified approach to consonance-dissonance studies.

Although the broader question of whether consonance and dissonance are biologically innate or culturally learned is still a matter of debate, musical styles such as Sutartinės suggest that the perception of dissonance is shaped by culture. This leaves a conundrum: why do people enjoy listening to and performing music that contains sounds the inner ear struggles to resolve properly? To try to resolve it, and to attempt to unravel the Nature vs Nurture debate in consonance-dissonance, we devised an experiment to test two different types of response to the squeezed second interval: automatic responses and conscious ratings.

Before we go further, it is necessary to take a brief detour into the world of affective priming, which has been used in social psychology for almost forty years as an implicit measure of attitudes. A detailed description of the affective priming method in music is given here by my colleague, Imre Lahdelma. The TL;DR version is this: participants hear a chord just before they see a word on a computer screen. They are asked to categorise the word as positive or negative as quickly as possible. If the chord and the word are congruent (a consonant chord paired with a positive word or a dissonant chord paired with a negative word), the classification will be faster and more accurate compared to when the chord and the word are incongruent (a dissonant chord paired with a positive word or a consonant chord paired with a negative word). In effect, the gap between the sound and the word is so short that the listener’s response to the sound interferes with their response to the word. (The experiment is simpler than it sounds – try it here!)
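For readers who like to see the logic spelled out, the congruency effect described above can be sketched in a few lines. This is a minimal illustration, not our analysis pipeline: the trial format, the function names, the choice to analyse only correctly classified trials, and the reaction times themselves are all assumptions made up for the example.

```python
from statistics import mean

CONSONANT, DISSONANT = "consonant", "dissonant"
POSITIVE, NEGATIVE = "positive", "negative"


def is_congruent(chord: str, word: str) -> bool:
    """Consonant + positive or dissonant + negative counts as congruent."""
    return (chord == CONSONANT and word == POSITIVE) or (
        chord == DISSONANT and word == NEGATIVE
    )


def congruency_effect(trials) -> float:
    """Mean incongruent RT minus mean congruent RT, in milliseconds.

    `trials` is an iterable of (chord, word_valence, rt_ms, correct).
    Assumption: only correctly classified trials are analysed.
    A positive result means congruent trials were answered faster.
    """
    congruent, incongruent = [], []
    for chord, word, rt_ms, correct in trials:
        if not correct:
            continue
        target = congruent if is_congruent(chord, word) else incongruent
        target.append(rt_ms)
    return mean(incongruent) - mean(congruent)


# Invented reaction times, for illustration only.
TRIALS = [
    (CONSONANT, POSITIVE, 600, True),   # congruent
    (DISSONANT, NEGATIVE, 620, True),   # congruent
    (DISSONANT, POSITIVE, 680, True),   # incongruent
    (CONSONANT, NEGATIVE, 660, True),   # incongruent
    (CONSONANT, NEGATIVE, 900, False),  # wrong answer, excluded
]

print(congruency_effect(TRIALS))  # 60
```

A positive value means the congruent pairings were classified faster, which is the signature of the chord's valence leaking into the response to the word. Accuracy can be compared across the two trial types in the same way.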

The first task was an affective priming experiment of the sort described above. Participants were presented with a sequence of 256 words and were asked to categorise each word as positive or negative. However, just before each word, they heard a musical sound – either a squeezed second or a perfect fifth in a vocal timbre reminiscent of the Sutartinės style.

In contrast to the complexity and quickfire response of the affective priming task, the second task was a more familiar rating task where participants were asked simply to rate a squeezed second and a perfect fifth on a scale of 1 – 7, where 1 is the most negative and 7 is the most positive.

Both of these tasks were completed by four groups of participants: Sutartinės singers, Lithuanian non-musicians, musicians trained (only) in Western styles, and Western non-musicians. This allowed us to work out whether any differences between Sutartinės singers and controls were because of something specific to the Sutartinės style or whether they were a result of musical training in general.

The ‘Nature’ argument – i.e., that our sense of consonance or dissonance is dictated by the human auditory system – predicted that there would be no difference between the groups. The ‘Nurture’ argument – i.e., that our sense of consonance and dissonance is learned through exposure to a particular musical culture – predicted that Sutartinės singers would show no congruency effect because, for them, the squeezed second would not carry a negative connotation, whereas all other groups would show congruency effects.

In short, we had two competing hypotheses for the automatic task:

1. There would be no difference between groups for the automatic task, i.e., all participants would show the same congruency effects (the “Nature” argument).

2. Congruency effects would be present for the Western groups and the Lithuanian control group, but not for the Sutartinės singers (the “Nurture” argument).

For the rating task, we predicted that Sutartinės singers would rate the squeezed second more positively than the other groups, but that all groups would rate the perfect fifth equally positively.

What happened? In the automatic task, hypothesis 1 was the winner. We found that all groups responded in the same way – we saw congruency effects in every group. Critically, this means that Sutartinės singers perceived the squeezed second as dissonant at a purely reflexive level. However, in the conscious rating task, Sutartinės singers rated the squeezed second more positively than the other groups did.

Surprisingly, we found that Sutartinės singers also rated the perfect fifth more positively than all the other groups did. We had predicted the higher rating for the squeezed second, but not this one. So – how did we account for this unpredicted result? We suspect that the familiarity argument we had used for the ratings extends also to timbre. The synthesized voice timbre we used had been designed to sound like the Sutartinės style. We surmise that the Sutartinės singers’ familiarity with this timbre was responsible for their liking this interval more than the other groups did.

What does this mean for the Nature vs Nurture debate in consonance and dissonance? At first sight, the apparently conflicting results may appear to confuse the issue further. However, on closer inspection, the results are consistent. It makes sense that our instinct is to perceive the ‘rough’ squeezed second as negative – yet the rating results suggest that familiarity with a musical culture where seconds are used frequently results in a subjective view of these intervals as positive. All in all, is our perception of consonance and dissonance down to nature or nurture? It seems that the answer depends on how you ask. Our reflexive response seems to be governed by how an interval interacts with our auditory system and in particular the basilar membrane, whereas our slower, more considered response seems to be governed by which musical styles we are accustomed to hearing. Perhaps music psychology scholars should look outward to other disciplines in psychological science to find a way of working that allows us to combine the strengths of both the Nature and the Nurture arguments.

We would like to express our thanks to the Institute of Acoustics for funding this research, to Akvilė Jadzgevičiūtė for assistance with English-Lithuanian translation, and to the participants who took part in the study.

The full text of our recent article is available:

Armitage J, Lahdelma I, Eerola T, Ambrazevičius R (2023). Culture influences conscious appraisal of, but not automatic aversion to, acoustically rough musical intervals. PLOS ONE 18(12): e0294645.
