Clayton, M., Jakubowski, K. & Eerola, T. (2019). Interpersonal entrainment in Indian instrumental music performance: Synchronization and movement coordination relate to tempo, dynamics, metrical and cadential structureMusicae Scientiae.

Summary: This paper represents the most extensive quantitative investigation of interpersonal entrainment based on natural performance data to date. We analysed note-to-note synchronisation using audio data and longer-term shared periodic movement using computer vision techniques from four North Indian classical duos playing 10 pieces in over two hours of recorded music. Synchronisation varied significantly in relation to tempo, event density, peak levels and leadership, while movement coordination was greater at metrical boundaries than elsewhere, and was most strikingly greater at cadential than at other metrical downbeats. These findings complement previous ethnographic investigations of this genre, and the methods developed here can be easily modified and extended to investigate interpersonal entrainment in other musical styles.

Extract from piece 3a_Gtr_Dham, showing CWT Energy of both performers over time, with cadences and non-cadences (both occurring on matra 1) indicated with vertical lines.

Eerola, T., Jakubowski, K., Moran, N., Keller, P.E., & Clayton, M. (2018). Shared periodic performer movements coordinate interactions in duo improvisations. Royal Society Open Science, 5, 171520.

Summary: We made use of two improvising duo datasets—(i) performances of a jazz standard with a regular pulse and (ii) non-pulsed, free improvisations—to investigate whether human judgements of moments of interaction between co-performers are influenced by body movement coordination at multiple timescales. Bouts of interaction in the performances were manually annotated by experts and the performers’ movements were quantified using computer vision techniques. The annotated interaction bouts were then predicted using several quantitative movement and audio features. Over 80% of the interaction bouts were successfully predicted by a broadband measure of the energy of the cross-wavelet transform of the co-performers’ movements in non-pulsed duos. A more complex model, with multiple predictors that captured more specific, interacting features of the movements, was needed to explain a significant amount of variance in the pulsed duos. The methods developed here have key implications for future work on measuring visual coordination in musical ensemble performances, and can be easily adapted to other musical contexts, ensemble types and traditions.


Jakubowski, K., Eerola, T., Alborno, P., Volpe, G., Camurri, A., & Clayton, M. (2017). Extracting coarse body movements from video in music performance: A comparison of automated computer vision techniques with motion capture data. Frontiers in Digital Humanities, 4, 9.

Summary: Three computer vision techniques were implemented to measure musicians’ movements from videos and validated against motion capture (MoCap) data from the same performances. Overall, all three computer vision techniques exhibited high correlations with MoCap data (median Pearson’s values of 0.75–0.94); the best performance was achieved using two-dimensional tracking techniques and when the to-be-tracked region of interest was more narrowly defined (e.g. tracking the head as opposed to the full upper body).


Research Tools

We have developed a set of user-friendly tools for quantifying and tracking movement from videos using EyesWeb (as reported in Jakubowski et al., 2017 above). These have been documented and made freely available for use by other researchers on GitHub at:

Screen Shot 2017-04-11 at 16.23.20