Open data and the life sciences: the turning point
It may have finally happened. During the catastrophic COVID-19 pandemic, everyday conversations began to include questions such as: “But what does the data say?”, “Did you read the Israeli study analyzing real-world safety data from the Pfizer vaccine?”, or “Was the sample size large enough?” Scientific research may never be the same, and COVID-19 could be the historic inflection point where the use of open data transformed how researchers collaborate. Life sciences research is global, and COVID-19 has proven this, with researchers from around the world coming together to seek solutions.
The Open Data movement has grown slowly, with 2,700+ repositories now listed in Re3data. Approximately 1,500 are in the life sciences, and 68 are COVID-related entries. So, where do we stand on sharing data since the COVID-19 pandemic arrived in 2020? This is the sixth State of Open Data report that Digital Science has published, and this year’s survey results and analysis for 2021 reveal shifts in how life science researchers view open data.
Key findings and guidance on next steps
- Respondents from within the life sciences (20%; n=820), from North America (26%; n=743) and from larger institutions (27%; n=246) were significantly more likely than average (16%; n=4,491) to indicate that their funder was encouraging them to develop a data management plan.
Seek out a librarian on your campus to help with your data management plan and your data’s metadata. Many universities have data librarians on staff to help researchers manage their data. Creating a README file is helpful for future researchers, yet according to the survey results, only 20% of 271 life science researchers were able to access a README file. Investing effort at the start of data collection will pay off later, when it comes time to share, discover, and reuse the data. A good place to start is DMP Tool, where you can access ready-to-use data management templates.
- Almost half (46%) of 779 life sciences researchers responded that they share their research with the public using institutional repositories, followed by external repositories (e.g. Figshare, Zenodo) at 39%, cloud file sharing (e.g. Dropbox, Google Drive) at 20%, funder repositories at 19%, blogs/websites at 14% and other at 13%.
Consider archiving data in a discipline-specific or general repository. This helps consolidate subject-specific data and addresses the differing, siloed approaches that publishers take to handling data. Archiving data is also a way to increase citation rates. If research visibility and impact factors continue to be criteria for promotion in the academy, then archiving data and making it readily available should help raise researchers' profiles.
- According to the 820 life science researchers who responded, 42% had concerns about misuse of shared data, 41% had concerns about not receiving appropriate credit or acknowledgement, and 34% were unsure about copyright and data licensing.
Data can be as relevant as an article citation. One could even argue that an article citation only happens with data. Researchers should advocate for tenure committees to see the value of open data and rethink what “counts” in the academy. Many prizes are given for scholarly papers, so why not prizes for data or other vital research content? Some awards already recognize the importance of open data, including the University of Bristol Open Research Prize and the University of Groningen Library Open Research Award.
In addition, this is another opportunity for librarians to help researchers understand copyright and licensing issues. Teaching the FAIR (Findable, Accessible, Interoperable, Reusable) data principles also continues to be an opportunity for librarians and data curators. Of the 820 life science researchers who responded, 29% had never heard of the FAIR data principles before taking the survey, 41% had previously heard of them but were not familiar with them, and 30% indicated familiarity.
Many complex issues involving open data continue to exist, including interoperability between datasets, discoverability of datasets, misuse of pre-published research, long-term storage, and data management strategies. The findings in the survey show how researchers are working with open data and the work that needs to continue to support research innovations that save money and address global problems such as climate change and food security.
Isaac Newton is credited with the expression “standing on the shoulders of giants” to exemplify that truths can be discovered by building on previous discoveries. For this to happen, a transparent process of sharing data is imperative to help with reproducing studies and creating new shoulders to stand on.