The State of Open Data 2021
Data Science - Figshare - Springer Nature

How open data can help validate research and combat scientific misinformation

Prof Ginny Barbour
Co-lead, Office for Scholarly Communication,
Queensland University of Technology
and Director
Open Access Australasia

The 2021 State of Open Data survey provides valuable insights into data sharing globally. Though it can’t capture what researchers everywhere think of data sharing, this survey of nearly 4,500 researchers offers helpful perspectives, some reasons to be hopeful, and some key takeaways that can support discussions on how open data can help validate research and combat scientific misinformation.

The decision to share data and the mechanisms necessary to support sharing don’t exist in a vacuum. In many ways, the problems of how to share data reflect both the culture of science and the current logistical challenges playing out across research globally. How can we move to a more open world? How can we ensure that the research done and published is of the highest quality? How do we increase trust in research? How do we shape an incentive system that addresses these challenges? The survey has insights to offer on each of these key questions.

It is worth noting up front that anyone answering a survey on data sharing is likely to have an interest in the topic as well as time and sufficient access to the technology to respond. Not surprisingly, therefore, the country with the largest number of responses was the US; the sample was tilted towards researchers from large institutions, and only a handful of countries had more than 100 respondents. The respondents were largely supportive of open access to all research outputs (over three-quarters), a higher proportion than would be found in a random sampling of researchers. The responses should therefore be viewed as being slanted to the technically well-supported, relatively well-funded end of the researcher spectrum, who have been exposed to discussions and information on open science.

What do we learn from the survey that can inform the debate on how we can promote trust in research? First, at an individual level, many respondents are putting in the hard work of data sharing. They are making data management plans (74%) and doing work to curate their data for sharing (76%), and 66% of respondents are familiar with the FAIR principles that underpin data sharing. These are hopeful indicators, though digging further into the results reveals gaps in the support for data sharing. At the most basic level, 30% of respondents are unclear on who will pay to make data open, and more than half the researchers need support in understanding copyright and licensing of data. The results show evidence of a policy vacuum around data sharing, with respondents looking to both institutions and national funders for leadership: 52% of researchers believe that funders should make data sharing a requirement and 48% feel that if such a mandate is in place, funders should hold researchers to it.

When we turn to researchers’ efforts to reuse data, difficulties become apparent, and the importance of key infrastructure is highlighted. For example, respondents who sought to reuse others’ data were more likely to be able to get the full dataset from an institutional repository than from a journal. But even when datasets were accessible, more than 50% lacked clear licensing information, and the quality of other descriptors was variable, pointing to gaps in key metadata and contributing to researchers’ perceptions of the quality of the datasets. When considering challenges to making their own data open, researchers expressed a series of concerns, ranging from being scooped to having sensitive data misused and getting insufficient credit for making data open.

“Open data has two important, overlapping roles to play in increasing the credibility of research: validating research, so that researchers can trust it, and combating scientific misinformation, so that wider society can trust it.”

Underpinning all of research has to be the concept of reproducibility. For too long, we have had a publishing system that rewards the publishing of new and exciting findings in specific journals more than the publishing of confirmatory (let alone so-called “negative”) findings. Largely, these findings are still published in a prescribed format that allocates more time and resources to typesetting and branding than it does to providing access to the underlying data, code, and other materials that allow research to be verified. The survey shows that researchers are only too aware of the limitations of the system that they are required to work within, but that they understand the need for change and want this change. 80% of researchers thought that a research article that had data openly available was more credible. When we wonder why this practice is not more widespread, the finding that only 18% of respondents believe that researchers currently get sufficient credit for sharing data offers an explanation. Researchers know how they want to be credited: 61% want to get credit for the data they share through citations of papers with data. Interestingly, in the absence of such credit, researchers are cooperating among themselves to share credit by including the generators of datasets as authors on papers that reuse these data. This practice indicates, yet again, how critical it is for researchers’ careers to get sufficient credit for their work. In the absence of other mechanisms (such as specific support for open science practices, as championed by DORA and through the Hong Kong Principles), researchers will attempt to get credit through the current system of journal publications.

What about wider public trust in research? The COVID-19 pandemic has shown us, yet again, how critical it is that research is trustworthy. Making data open is not in itself a panacea for public support, but it can certainly help. High-profile retractions of papers during the pandemic because of concerns over underlying data show how far we have to go. In less highly scrutinized research, it’s unlikely that the problems with underlying data would have come out so quickly, if at all. Contrast this state of affairs with what the public expects for other products that they consume: there would be outrage at a similar lack of proper control in the production of a novel food item. As we face the complex challenge of climate change, trust in research will become even more critical, especially as climate policy is so politically charged and climate research itself is often the subject of public debate. A recent paper from the International Science Council makes the case for ensuring data behind research is available so as to strengthen the trustworthiness of research.

So, the underlying message of this State of Open Data report should be one of cautious optimism, but with some pointers for change. Researchers largely want to share their data, but the current system fails to adequately support or reward them for doing so, and we are still a long way from a world where it is the norm to share fully curated data. Until then, researchers are left to navigate a system that makes it harder to share than not to share and where, most alarmingly, the public may only fully understand the importance of data sharing when it’s shown to have gone dramatically wrong. There’s no time to lose. We need to strengthen confidence in research as we seek to address the looming global challenge of climate change.

Over the course of the six years we’ve been running the State of Open Data survey, we’ve had over 21,000 responses from researchers from 192 countries, providing detailed and sustained insight into their motivations, challenges, perceptions, and behaviors toward open data.

This year, the survey set out to continue monitoring the levels of data sharing and usage, as it has done since the outset in 2016, and also to focus on a few key topics, including what motivates researchers to share data and the perceived discoverability and credibility of data shared openly.

