The State of Open Data 2021
Natasha Simons
Associate Director, Data & Services
Open data saves lives. The global pandemic has highlighted, more than anything before it, the importance of data sharing in solving the big challenges of our time. COVID-19 data may be the most visualized data in history and it was made publicly available on a daily basis to people all over the world. The urgent need to better understand and treat the virus in 2020 brought unprecedented collective and collaborative action from all research stakeholders on an international scale to bring down barriers to research and speed up analysis and testing. These efforts, combined with support from governments and industry, resulted in not one but many vaccines made available by the end of the year. This gives us a glimpse of what incredible research outcomes are possible when we start with collaboration to address a common threat. Imagine how much more we could do, how many more lives we could save, if research data were routinely made open and shared. So, why isn’t data sharing the norm? The answers lie in the harmony needed between policies, infrastructure, and practices.

Despite the increasing number and strength of data sharing policies from publishers, funders, and institutions, along with significant improvements in the technical infrastructure required to support data sharing, why is “data available on request” still the most common data availability statement in journals today? Why do researchers hesitate to share data and make it FAIR (findable, accessible, interoperable and reusable)? The reasons are complex, and in this sixth year of the State of Open Data report we have the data to reflect on them. The data underpinning this report comes from the largest longitudinal survey of researcher motivations, challenges, perceptions and behaviors toward open data, with over 21,000 responses from researchers in 192 different countries over the six-year period. The State of Open Data report from Figshare, Digital Science, Springer Nature and other leading industry and academic representatives is a critical piece of information that enables us to identify the barriers to open data from a researcher perspective, laying the foundation for future action in addressing these barriers.

Enormous strides have been made in policy over the past decade, as highlighted in the 2021 UNESCO Recommendation on Open Science. This landmark document defines shared values and principles for open science and identifies concrete measures for enabling open access and open data for adoption by the 193 member states. The recommendation calls on member states to make an effort to contribute at least 1% of their national GDP to research and development, to set up regional and international funding mechanisms for open science, and to ensure that all publicly funded research is in line with the core values and principles of open science.

What is most striking about this year’s State of Open Data report is that while researchers’ familiarity with and compliance with the FAIR data principles are greater than ever before, there is also more concern about sharing datasets than ever before. In their article on the three key findings of this year’s State of Open Data report, Dr. Greg Goodey and Megan Hardeman stress that concern has risen in several key areas, one of which is not receiving enough credit or acknowledgement for data sharing. This points to the uncomfortable tension between the increasing ubiquity of data management and data availability policies and the rarity of rewards and recognition for data sharing. Clearly, the reward and recognition structures of academia are misaligned with the increasing demands for openness and transparency of research from publishers, funders, and institutions.

Professor Ginny Barbour reflects the sentiments expressed by many of this year’s survey participants in calling for a change in the rewards system. In her article examining how open data can help validate research and combat scientific misinformation, Barbour asks: how can we ensure that the research done and published is of the highest quality and invokes trust? Open data, she argues, has overlapping roles to play in increasing the credibility of research and combating scientific misinformation so that wider society can trust it. Barbour challenges us to strengthen confidence in research as we seek to address the looming global challenge of climate change. 

The principles of open science and open data are globally applicable across all research disciplines and this year’s report contains perspectives from contributors in Africa, Asia, North America, Europe, and Australia. Daniel Kipnis draws out the State of Open Data trends in researchers’ attitudes, behaviors, and practices in the life sciences. Almost half of the life sciences researchers responding to this year’s survey share their research with the public using institutional repositories while almost 40% use external repositories such as Figshare or Zenodo. This is a significant finding as repository choices vary between disciplines and this is evidence that institutional and general repositories are the preferred option for many researchers. 

This year’s report found that repositories, publishers, and institutional libraries in almost equal measure have a key role to play in helping make data openly available. There is a shared responsibility between those who provide assistance to researchers that is not widely acknowledged and a corresponding lack of coordination between them. Regardless of the data sharing platform selected, researchers need help in making data open yet support for the effort required is rarely factored into the funding for research projects. Researchers must carry out this activity themselves and they seek help from those who may be able to offer it. What kind of help do researchers need to make data open and how is it offered?

Dr. Connie Clare introduces us to a day in the life of Jan van der Heul, a curator for 4TU.ResearchData in the Netherlands. He describes scenarios in which researchers need assistance to improve the quality and FAIRness of their data. Aside from assessing data files, he helps researchers raise the quality and richness of their metadata to improve the discoverability, reusability, and reproducibility of their research.

Veliswa Tshetsha, Rosina Ramokgola, and Pfano Makhera from the University of Pretoria provide tips for engaging researchers in open data practices. They suggest that while research data management is still new at the university, the institutional library will continue to grow support for data sharing, particularly in key areas such as copyright and licensing which, according to this year’s report, continue to be the area in which researchers require the most help.

While the report shows that researchers are seeking help from institutional libraries, institutional support for data sharing is not the sole responsibility of the library. Data sharing at the institutional level is a cross-cutting activity because it is a significant undertaking that involves support across the whole research lifecycle. To streamline the process, over half of Australia’s universities are collaborating to develop and trial a national research data management framework through the Australian Research Data Commons’ Institutional Underpinnings program. While still in progress, it is a promising model for institutional support. 

This year’s State of Open Data report contains a surprising insight about researchers’ attitudes to policy mandates. Of the survey participants based in Asia, 42% believe funders should withhold funding or penalize researchers for not sharing their data if the funder has mandated that they do so at the grant application stage. This sentiment puts the onus on funders to check compliance, yet the STM Association’s 2021 research on funders with data policies found that fewer than one quarter actually checked compliance. The large variation in the content and strength of data policies continues to be a challenge to researchers’ understanding and compliance. While solid progress has been made in the area of publisher policies, we need to standardize and harmonize data sharing policies within and between publishers and funders. The funder-publisher alignment project currently underway through the Research Data Alliance offers promising progress in this area.

Efforts to overcome hurdles to data sharing through policy and cultural change will fall short if we do not have the underpinning research infrastructure and the experts needed to run it. We need world class data repositories, virtual research environments, facilities, supercomputers and the like to support open and FAIR data in all disciplines. We need information infrastructure on a global scale that enables interoperable human and machine readability of metadata, standards, and persistent identifiers to support data sharing, and these need to be well established in research communities and embedded into research workflows. Nobuko Miyairi’s interview with Keisuke Iida from the Japan Science and Technology Agency shares insights into the development of J-STAGE Data, an evidence data platform for Japan’s learned society publishing. Iida outlines the challenges of building a data platform needed to match rapid changes in the scholarly publishing and technology landscape.

There have been vast improvements in data infrastructure with the development of national and regional cloud services such as the European Open Science Cloud and the China Science and Technology Cloud. Alignment, cooperation, and interoperability between open science clouds are important as research is global, and initiatives such as CODATA’s Global Open Science Cloud aim to make progress in this area. Indeed, international collaboration through forums such as the Research Data Alliance, CODATA, GO FAIR and FORCE11 plays a key role in identifying the challenges of policy, infrastructure, and culture change in open data and open science and in putting forward solutions to them.

The State of Open Data report provides insights and commentary on the progress and challenges in researchers’ attitudes and behaviors in open data. I hope you are as excited as I am to read the report and to reflect on how far we have come in open data and where we need to go if we are to address the big challenges of our time and save lives.

