We have all heard about the replication crisis in psychology. It is not that the research ideas or methods themselves are flaky; rather, selective analysis and reporting, together with insufficient detail, make past findings difficult to replicate (see also the discussion at the end of one of the rare replication studies in music psychology). Several proposals aim to mitigate this problem, such as pre-registering studies and altering the statistical principles, but perhaps the simplest solution is to share data. To what degree does data sharing happen in music and science at the moment? Are there incentives to do it other than just producing better quality scholarship?
Sharing data basically means that the data, which include not only the raw observations but also all explanations of the procedures relating to them (pre-processing, metadata, etc.), are freely available to anybody with no restrictions beyond those of a permissive licence such as CC BY. Such data are typically called Open Data, or Open Access data in some parlance. Open Data is now actively encouraged and even required by many funders, such as the Research Councils in Britain. There are ample opportunities to share data in robust repositories such as Harvard Dataverse, UK Data ReShare, Dryad, Figshare, or the Open Science Framework (OSF). These repositories provide excellent guidance on how best to share the data. But the question of motivation is perhaps the one that needs some reflection.
Why should I release my data?
Why should I go the extra mile and prepare an Open Data release of my data? The drivers for sharing data range from standards imposed on us from above to a recognition of what is good for the field in general:
- Because it is the right thing to do and funders want it.
- Because others will benefit from my data when testing their ideas and building models.
- Because it might disarm reviewers.
- Because it reduces the effort others need to replicate my work.
- Because I’ve benefitted from the data shared by other scholars.
Why might I not bother?
- It is extra work and is not recognised as particularly valuable.
- If you get the details wrong, it will not only undermine the study but your reputation.
- You might lose the opportunity to write the follow-ups to your study.
- You are not needed as a co-author when other people want to use your data or stimuli.
- Sharing has implications for the way you collect and process the data. For instance, it might influence how you word the ethics approval statements and informed consent forms. Some aspects of sharing might also conflict with copyright, for example when the music stimuli have been taken from commercial recordings.
I think I have heard all these reasons and I have been influenced by many of them as well. Some boil down to laziness (points 1-2), and the fear of others running away with your data before you have had a chance to capitalise on it is quite strong (points 3-4). A recent study among psychologists showed that it is particularly the perceived effort that prevents scholars from committing to Open Data, but that social norms and attitudes can overcome this (Harper and Kim, 2018). For instance, when journals advocate Open Data, the social pressure becomes stronger and more effective (Nuijten et al., 2017). In 2015, guidelines on Transparency and Openness Promotion (Nosek et al., 2015) were proposed for scholarly journals, and many journals now subscribe to them.
Often we can decide ourselves whether we want to release our data. But is the moral high ground gained from making your work transparent beneficial in other ways too?
The benefits of Open Data?
Are studies with Open Data cited more? Yes, at least in the sciences there is a 10% increase in citation rates for studies with Open Data (Piwowar et al., 2013a; Piwowar et al., 2013b). For Open Access articles, the citation boost is higher, about 20% (Piwowar et al., 2017). It is too early to determine whether the boost reflects novelty or is a genuine mark of quality and interest, but if the future brings a more detailed assessment of your scholarly impact, then Open Data would actually work to your advantage if your data gets attention from other scholars.
Does Open Data lead to better quality research? I don’t know, but analyses have shown that non-shared data leads to a higher number of statistical errors and to more positive results (Wicherts, Bakker, & Molenaar, 2011), both of which are of course detrimental to research.
What about Open Data in music and science?
In my opinion this topic is coming into focus in our field. Many of the journals in our field (Psychology of Music, Music Perception, Musicae Scientiae, and Psychomusicology) do not yet explicitly mention or require Open Data. Only the new arrival, Music and Science, explicitly endorses Open Data, although I suspect this will change over the next few years as the integration of repositories with manuscript submission procedures at journals becomes more common (Royal Society Open Science is a good example, requiring the full data to be deposited in an approved repository before the manuscript is accepted for review).
What about social pressure and training in music and science? I recognise that some colleagues have shared their data for years and actively promote these standards. I’d like to count myself as one of them, since I have been releasing tools and stimuli since 2004 and datasets since 2012. There are some colleagues who are digital natives in this landscape and use the latest repositories and transparent principles to share their data in ways that surpass my efforts. If I pick just three examples, they would have to be:
- Finn Upham: just take a look at their figshare repository for some really interesting materials on emotional responses and psychophysiology, or check the activity analysis code on GitHub.
- Marcus Pearce has a wonderful repository of stimuli and data. He also promotes the sharing of data and software at soundssoftware.
- Alexander Refsum Jensenius shares his models on GitHub and talks passionately about how papers are not complete without the data.
We at the Music and Science Lab at Durham also strive for transparent research, and we have shared the majority of our datasets. You can find data about chills, emotion cues, performer movements, emotions expressed by chords, and other datasets at our collection page.
Currently we are trying to tackle the educational aspects of Open Data by advocating replication studies in our undergraduate teaching (Music and Science module), and by sharing good practices in the analysis process with postgraduate students and other colleagues. The experiences from these lab activities might be something that one of our postgraduates wishes to write a blog post about?