Reviewing metadata of datasets before they get published and guiding researchers through the data sharing process.
File and metadata checks to enhance the discoverability and reusability of data in line with the FAIR principles and support repository-specific standards.
Over the past 10 years, we have observed a remarkable growth in data sharing coupled with institutional, publisher, and funder mandates for public access to research data. Along with this has come the recognition that making data open is not sufficient to make it reusable. The development of the FAIR principles in 2014 has helped unite global initiatives with a broad, common goal to make open data “Findable, Accessible, Interoperable, and Reusable” according to community standards.
At the same time we have seen researchers' knowledge of FAIR and interest in these data best practices grow year over year. FAIR data has great potential to advance science by supporting discovery, aggregation, and reuse of datasets by both human researchers and artificial intelligence models.
One thing that we have found through our work with institutions, including a repository pilot with the National Institutes of Health (NIH), is that the involvement of human curators makes a significant difference in metadata completeness, which can affect the discoverability and reusability of open data. We found it was useful for data curation experts to review datasets before they were published and also to guide researchers through the data sharing process, a data management skill that is not often an explicit part of scientific training.
Both the notion of data curation and the FAIR principles have many layers, from checking metadata for completeness to open file formats, licenses, specific metadata or documentation schemas, and beyond. As the need for FAIR data review increases, these checks, whether performed by humans or machines, must be designed to scale.
The Figshare repository infrastructure has encouraged these checks for several years through our review workflow and curation module for Figshare for Institutions portals, which allows data curators and librarians at the institution to review and approve items before they are published and to work directly with researchers submitting the work.
However, we recognize that not all organizations that are interested in supporting a Figshare portal to host open research have the staffing bandwidth or expertise to conduct this curation themselves. Thus, in 2020, we launched the Figshare Curation Service (FCS) as a service subscription that can be added to a Figshare for Institutions platform subscription.
Leveraging Figshare’s review workflow and in-house data curation expertise, FCS offers publishers, institutions, governments and funders, and commercial clients alike the option to have all submitted content moderated, reviewed, and refined prior to publication. The goal of FCS is to promote best practices for metadata assignment and naming mechanisms, with the aim of increasing discoverability of the research published to your branded Figshare portal and promoting interoperability and reusability standards.
When research is submitted to a Figshare instance that has FCS in place, our review team of Figshare staff with expertise in data curation and scientific research will conduct a review of the deposit before the item is made public. This review includes a spot check on a sample of the data files and a review of the metadata for quality and completeness. The review team may contact the submitting author by email and work with them to make edits with the aim of enhancing the discoverability and reusability of the published data. As part of this process, the FCS team will:
● Conduct a spot check of a sample of files to see whether they appear to match the general description of the research, can be opened, and include documentation.
● Check that a descriptive title is included for the dataset that provides context to the work.
● Check that an item is an output type (e.g. dataset, poster, article) that is accepted by the specific repository.
● Recommend file organization as single or multiple ‘items’ or ‘collections’.
● Check that a submitter has affirmed that no personally identifiable information (PII) is contained within the files or metadata.
● Check that the metadata description provides context for the data or links to resources that further describe it.
● Check that a license has been applied.
● Check that funding information is specified and linked if appropriate.
● Check that supporting papers or preprints are linked if applicable.
Note that FCS will not review or moderate the content of data files.
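For readers who want to picture how some of the checks listed above could be pre-screened by a machine before a curator looks at a deposit, here is a minimal sketch in Python. It is an illustration only and does not describe how FCS or the Figshare platform actually implements its review. The record layout, the field names (`title`, `item_type`, `pii_affirmed`, `description`, `license`, `files`), the accepted-type list, and the three-word title threshold are all hypothetical assumptions. Checks that depend on judgment, such as whether funding or a supporting paper should be linked, are deliberately left to human reviewers.

```python
# Hypothetical pre-screening of a deposit record before human review.
# Field names and thresholds are assumptions, not the Figshare API.

ACCEPTED_TYPES = {"dataset", "poster", "article"}  # assumed repository policy


def check_record(record: dict, accepted_types=ACCEPTED_TYPES) -> list[str]:
    """Return human-readable issues for a curator to follow up on."""
    issues = []

    # Descriptive title: an arbitrary minimum of three words (assumption).
    title = (record.get("title") or "").strip()
    if len(title.split()) < 3:
        issues.append("title: add a descriptive title that gives context")

    # Output type must be one the repository accepts.
    if record.get("item_type") not in accepted_types:
        issues.append("item_type: not an accepted output type")

    # Submitter must have affirmed that no PII is included.
    if not record.get("pii_affirmed", False):
        issues.append("pii: submitter has not affirmed absence of PII")

    # Description must not be empty.
    if not (record.get("description") or "").strip():
        issues.append("description: add context or links to resources")

    # A license must be applied.
    if not record.get("license"):
        issues.append("license: no license applied")

    # At least one file must be attached for a spot check.
    if not record.get("files"):
        issues.append("files: no files attached")

    return issues


example = {
    "title": "Data",
    "item_type": "dataset",
    "pii_affirmed": True,
    "description": "",
    "license": "CC BY 4.0",
    "files": ["readme.txt", "results.csv"],
}
for issue in check_record(example):
    print(issue)
```

In this example the record is flagged for its one-word title and its empty description. A list of issues like this could be handed to a curator as a starting point for the email conversation with the submitting author, rather than used to reject a deposit automatically.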
These checks are also customizable to your portal and the needs of your researchers or organization. This customization might include custom metadata fields for your organization; metadata, file, or documentation standards specific to a discipline or methodology; confirmation of links to publications, funding, or other resources; or other repository standards. Note that customization may not include reviewing or moderating the content of data files for suitability for publication.
With Figshare Curation Service, our Figshare data experts are available to your researchers throughout the deposit process and can provide one-on-one guidance on using Figshare and recommendations for data sharing best practices. Custom training is also included with the service subscription, including a custom guide for data deposit and up to two remote training sessions per year. A quarterly update from FCS staff will give you details about the items that have been curated and the services provided.