The State of Open Data 2021
Three key findings from this year’s State of Open Data survey

Dr Greg Goodey
Research Analyst
Springer Nature

Megan Hardeman
Product Marketing Manager

Over the course of the six years we’ve been running the State of Open Data survey, we’ve had over 21,000 responses from researchers from 192 countries, providing detailed and prolonged insight into their motivations, challenges, perceptions, and behaviors toward open data.

This year, the survey continued to monitor levels of data sharing and usage, as it has since the outset in 2016, and also focused on a few key topics, including what motivates researchers to share data and the perceived discoverability and credibility of data shared openly.

There is more concern about sharing datasets than ever before

In this year’s survey, the proportion of respondents indicating that they have concerns about misuse of data, that they don’t receive enough credit or acknowledgement for sharing data, or that they are unsure about copyright and licensing has gone up compared to previous years. Given that 65% of respondents have never received credit or acknowledgement for sharing data, it comes as no surprise that this is an area of concern.

Respondents indicated that their primary motivations for sharing their data are: citation of their research papers (19%), co-authorship on papers (14%), increased impact and visibility of their research (11%), and public benefit (11%). These motivations are tied to more traditional institutional measurements of impact and credit. There are calls for credit systems to be put in place for data sharing like the Credit for Data Sharing initiative developed by the Association of American Medical Colleges, the Multi-Regional Clinical Trials Center of Brigham and Women’s Hospital and Harvard, and the New England Journal of Medicine. Initiatives such as this, however, have yet to be widely implemented.

Concerns over misuse of data and licensing are closely tied to ensuring data are as FAIR as possible; the more thoroughly documented the data are, the less likely they are to be misinterpreted or misused.

“About a third of respondents indicated that they have reused their own or someone else’s openly accessible data more during the pandemic than before.”

There is more familiarity and compliance with the FAIR data principles than ever before

It has now been five years since the FAIR (findable, accessible, interoperable, and reusable) data principles were established. Despite concerns over misuse of data and licensing, 66% of respondents had heard of the FAIR data principles. Of that, 28% were familiar with them, the highest number since this question was first asked in 2018. In addition, 54% of respondents thought their data were very much or somewhat compliant with the FAIR data principles; this was also the highest number since this question was first asked in 2018. These numbers are hugely positive and indicate that there could be a lessening of concern over sharing data in the long run if data are as accessible and reusable as possible.

A snapshot from the survey: To what extent do you think you make your data open in compliance with FAIR?

There’s also a correlation between respondents who are familiar with the FAIR data principles and respondents who reuse their own or others’ data. Of those who were familiar with the FAIR data principles, 58% had reused their own data (compared to only 45% of those unfamiliar) and 44% had reused openly accessible data shared by other research groups (compared to 26% of those unfamiliar). This suggests that data that meet the FAIR data principles are more likely to be reused.

Concerns about sharing data

Repositories, publishers, and institutional libraries have a key role to play in helping make data openly available

When respondents required help in making research data openly available, 35% relied upon repositories, 34% upon publishers, and 30% upon institutional libraries. Therefore, it’s imperative that these organizations are able to provide the required support and resources for making data open and FAIR. Respondents needed the most help with copyright and licensing (55%), finding appropriate repositories (46%), and data management policies (43%). Copyright and licensing has been the area requiring the most help since the question was first asked in 2018. Institutions can also provide more guidance on how to comply with their policies on open data, with 58% of respondents indicating they would like more direction from institutions.

Problems/concerns with sharing data over the last 4 years

