How publishers can uphold research quality through embedded data support
Scholarly publishers play a fundamental role in upholding research quality, from providing editorial expertise to managing the peer review process. Research data is a growing part of Springer Nature’s policies, systems and workflows, and a key component of the ambition that research outputs should be openly available and reproducible. To uphold the quality of data alongside that of the related literature, we are building on the specialist support developed for data articles and adapting those processes for wider use across our journals.
Previous State of Open Data reports have highlighted the key role that publishers play in helping researchers share their data. The COVID-19 pandemic put a particular spotlight on data quality and, moreover, research quality. The scientific community’s initial response focused on making research outputs rapidly and openly available; funders, journals and researchers combined their efforts to ensure this happened. Preprints saw considerable growth from these initiatives and peer review times dropped. Up-front release of data was specifically included in these measures and some publishers, including Springer Nature, provided additional support for data curation and sharing.
However, doubts have been raised over the quality of such “rapidly published” research. The Surgisphere scandal was a notable example of extremely rapid data release with major question marks over the quality, provenance, and veracity of the data, despite the fact that it swiftly formed a basis for public health decision-making.
The role of research data and specialist support
So, what do we learn from such scandals? Research data has a clear part to play in ensuring there is evidence behind the claims in peer-reviewed literature. We at Springer Nature have championed FAIR data since its inception, supporting transparency and reinforcing community expectations through the rollout of standardized data policies. Simply making data available — the findable and accessible aspects of FAIR — is a first step in improving the quality of published research, allowing greater scrutiny of reported findings. In the same spirit of transparency, much of the backbone of FAIR data is good metadata: the detail provided (or not) enables an assessment of how far a dataset can be trusted. As the Surgisphere example demonstrates, however, data quality does not end with making data available and potentially reusable.
Specialist data support (also known as data curation or stewardship) is a growing field enabling FAIR compliance and checks on the robustness and reliability of data. This support is ideally provided as early as possible in a research project, for example when producing a data management plan. Some research institutions and repositories provide this service, but as the 2020 State of Open Data report outlines, researchers usually look to publishers for help sharing data related to their papers. While a researcher is the expert in their own data, a general data specialist supplements this expertise with support in areas the researcher may be less familiar with, such as selecting the right repository, adding useful metadata, long-term preservation, data rights, and linking. Working alongside editors, who often bridge the gap in disciplinary and data-specific expertise, these specialist roles provide researchers with assurances about their data, minimise risk, and promote data quality.
Springer Nature supports data sharing both by improving data availability across our research journals and by publishing data-specific journals and articles. Two prime examples of this data publishing are Scientific Data, Springer Nature’s flagship data journal, and the briefer data notes article type at BMC Research Notes and BMC Genomic Data. All have embedded support from research data specialists to safeguard data quality, working alongside peer review of both the data and the manuscript itself.
Like the FAIR data principles, the checking process considers three areas:
• the data themselves
• metadata describing these data
• infrastructure, e.g. hosting, linking, and preservation
To expand this support to a wide range of journals and disciplines, standardized checklists can form the basis of data quality assessment. Some resulting actions are immediate fixes a specialist can apply; others require a closer look with the author and/or editors. The issues checked include:
• Are the data shared in the right repository? Is there a more suitable discipline-specific venue available? Have the right standards been used?
• Are the data provided complete, consistent, and accurate alongside the reported manuscript or metadata?
• Do the data contain sensitive elements that should be removed or anonymized?
• Are the data licensed appropriately to maximise reuse?
• Do the metadata provide sufficient context for another researcher? Are the files organized in a way that supports access and reuse?
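To make the checklist idea concrete, here is a minimal, hypothetical sketch in Python of how such standardized checks might be encoded. The repository names, licence list, and required metadata fields are illustrative assumptions for the sketch, not Springer Nature’s actual criteria or workflow.

```python
from dataclasses import dataclass, field

@dataclass
class DataRecord:
    """Minimal description of a shared dataset (illustrative fields only)."""
    repository: str
    licence: str
    metadata: dict
    contains_sensitive_data: bool = False

# Hypothetical policy tables — placeholders, not an official list.
DISCIPLINE_REPOSITORIES = {
    "genomics": {"GenBank", "ENA"},
    "general": {"figshare", "Dryad", "Zenodo"},
}
PERMISSIVE_LICENCES = {"CC0", "CC BY"}
REQUIRED_METADATA = {"title", "description", "creator", "date"}

def run_checklist(record: DataRecord, discipline: str = "general") -> list:
    """Return issues flagged for specialist or editor follow-up."""
    issues = []
    # Is there a more suitable, discipline-specific venue?
    suitable = DISCIPLINE_REPOSITORIES.get(discipline, DISCIPLINE_REPOSITORIES["general"])
    if record.repository not in suitable:
        issues.append(f"Consider a discipline-specific repository: {sorted(suitable)}")
    # Are the data licensed appropriately to maximise reuse?
    if record.licence not in PERMISSIVE_LICENCES:
        issues.append("Licence may restrict reuse; a permissive licence is preferred")
    # Do the metadata provide sufficient context?
    missing = REQUIRED_METADATA - record.metadata.keys()
    if missing:
        issues.append(f"Metadata missing fields: {sorted(missing)}")
    # Are there sensitive elements to remove or anonymize?
    if record.contains_sensitive_data:
        issues.append("Sensitive elements must be removed or anonymized before sharing")
    return issues
```

In practice, an empty result would let the specialist sign off immediately, while any flagged issue would prompt the “closer look with the author and/or editors” described above.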
These checks may supplement or even overlap with peer review, which incorporates considerations such as the methodology used to produce the data. In this context, a further challenge for publishers is effectively getting data in front of reviewers.
What’s next?
The development and implementation of standardized data policies across Springer Nature’s journals have provided a strong foundation for promoting and improving data quality. Data journals like Scientific Data have this at their core, as reflected in their embedded support and the expectation of data peer review. The reality, however, is that for much of research publishing, reviewers might not look at the underlying data at all. With the growth of wider data policies, data sharing mandates, and community expectations, there is increasing awareness of data in the publishing process, which can and should be supplemented by standardized checks, workflows, and supporting tools.
The checks and balances outlined above lead the way in particular journals and article types; our next steps are to apply a suitable level of data expertise more widely throughout our publications and processes. It is encouraging to see in this year’s State of Open Data that researchers highly rate quality factors such as clear descriptors, classification and coding of data, and links to peer-reviewed literature. These are all areas that embedded editorial support can improve.
We also acknowledge that researchers come to us relatively late in the research lifecycle, and that this is a community effort, with other actors such as funders and institutions playing a growing role in data management, support, and sharing. Putting researchers’ needs front and center, whether as data producers or users, is paramount, and we all have a role to play.