Consolidating research data management infrastructure: a vital piece of the FAIR jigsaw & (meta)data quality improvements
“Putting all of your eggs in one basket” is an idiom with negative connotations, for example when applied to personal finance or data storage practice. Yet for the University of Oxford’s Sustainable Digital Scholarship (SDS) service, this is exactly what we are trying to do for digital research: offering digital research projects guidance, support, and a long-term home for their digital outputs. The opportunities to converge and consolidate research data management infrastructure onto managed, shared services (e.g., Figshare) are vast, but they are not without their challenges.
We have some exceptional, world-leading “eggs” at Oxford, and it is only right that we have “baskets” fit to store and showcase them. However, many researchers are wedded to their current, often aging, “baskets” (or until now have had little choice but to use them), which they may have had for many years; it is only when a “basket” finally gives out and the “eggs” fall and break that they are forced to consider an alternative. Let’s dispense with this metaphor and discuss the Sustainable Digital Scholarship service’s approach to rationalizing research data storage.
The Sustainable Digital Scholarship service: how do we ensure the content is accurate and as FAIR as possible?
The Sustainable Digital Scholarship service was launched at Oxford in February 2021 to offer support and guidance to researchers, to provide access to a managed repository for storing research outputs, and to showcase digital research projects. Projects are predominantly connected with the field of Digital Humanities; however, our support is by no means limited to one discipline. The primary aim of the service, as the name suggests, is to ensure research data is sustainable, and what we mean by that aligns closely with the FAIR principles.
Findable – At its simplest, this principle could be met just by hosting research data on a platform like Figshare, whose native features make it more findable (and more accessible, interoperable and reusable) than some current hosting arrangements. However, the SDS team also offer support and guidance to researchers on metadata mapping and field creation for their projects, to ensure items are well described and that custom metadata is used (where relevant) to make research more discoverable.
Accessible – It is often the case that not all the data in a research output can be made fully open, for reasons ranging from personal data to copyright concerns. It has been very useful to be able to gate certain data items behind Single Sign-On in our repository and to offer varying levels of restricted access or embargo.
Interoperable – Because many of the research projects the SDS service supports come from Humanities-leaning disciplines, the range of topics and required metadata categories has been extensive. Nevertheless, we continue to encourage and promote the use of common controlled vocabularies and to standardize wherever a standardized approach is applicable.
Reusable – As we are only nine months into our journey as a new service at the University, only time will tell. However, our hope is that as we work with and onboard more projects, we can reuse metadata standards and techniques, yielding not only efficiencies but also improved clarity and quality of (meta)data.
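To make the interoperability point concrete, here is a minimal sketch of what standardizing keywords against a controlled vocabulary can look like: free-text variants are mapped onto a single preferred label, and anything that cannot be matched is set aside for a curator to review. This is an illustration under stated assumptions, not the SDS team's actual tooling; the vocabulary, the `VOCABULARY`/`LOOKUP` names and the `normalize_keywords` function are all hypothetical.

```python
# Hypothetical controlled vocabulary: preferred label -> known variant spellings.
# A real project would draw these from an established vocabulary where one exists.
VOCABULARY = {
    "Anglo-Saxon": ["anglo saxon", "anglo-saxon", "anglosaxon"],
    "Grave goods": ["grave goods", "grave-goods", "burial goods"],
    "Brooch": ["brooch", "broach", "brooches"],
}

# Reverse lookup from a case-folded variant to its preferred label.
LOOKUP = {
    variant.casefold(): preferred
    for preferred, variants in VOCABULARY.items()
    for variant in variants
}


def normalize_keywords(raw_keywords):
    """Map free-text keywords to preferred labels.

    Returns (matched, unmatched): matched holds de-duplicated preferred
    labels in first-seen order; unmatched holds the original strings
    that could not be mapped and need a curator's attention.
    """
    matched, unmatched = [], []
    for raw in raw_keywords:
        # Collapse runs of whitespace and ignore case before looking up.
        key = " ".join(raw.split()).casefold()
        preferred = LOOKUP.get(key)
        if preferred is None:
            unmatched.append(raw)
        elif preferred not in matched:
            matched.append(preferred)
    return matched, unmatched


if __name__ == "__main__":
    print(normalize_keywords(["Anglo Saxon", "broach", "Brooches", "spearhead"]))
```

Keeping unmatched terms separate, rather than silently dropping them, reflects the idea that standardization should only be applied where it genuinely fits.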
Research data “resurrection” of legacy collections: can we make version 2.0 better?
We predict that over the coming years the SDS service will continue to work with researchers whose data collections or research projects have gone offline, or have gradually lost functionality because of their legacy technical arrangements. Although this is potentially a worrying time for the researcher, the uncertainty of hosting on failing (or failed!) infrastructure also creates opportunities to reinvigorate and refresh the research project in its next iteration.
One current example we have been working on is a project called the Novum Inventorium Sepulchrale - Kentish and Anglo Saxon Grave Goods in the Sonia Hawkes Archive. It’s a fascinating database that published records of c. 1,000 graves and the objects found within them, including images and diary entries. However, after the project went offline indefinitely in 2018, all that remained was access to two metadata spreadsheets on a single project webpage. With the support of the project’s Principal Investigator, Professor Helena Hamerow of the School of Archaeology, the SDS team has brought this project back to life.
At its simplest, the improvement for Novum Inventorium Sepulchrale is binary: the project database was offline, and now it is back online. But we have also had the opportunity to take a very hands-on, curatorial approach to cleaning the project’s metadata before rebuilding it in our repository. Adding the mandatory Figshare fields that allow a DOI to be created for each record is an excellent improvement in itself, and a necessary step for ingest. We were also able to rationalize some of the metadata fields by omitting, merging, or adding fields, and we hope this process will improve the quality of the metadata attached to the collection.
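The three kinds of rationalization mentioned above (omitting, merging, and adding fields) can be sketched in a few lines of Python. This is not the SDS team's actual workflow; it is a minimal illustration assuming a legacy spreadsheet row has already been read into a dictionary. Every column name, the `rationalize` function, and the default values (including the placeholder license) are hypothetical, and the fields a repository actually requires should be checked against its own documentation.

```python
# Hypothetical legacy columns that are dropped (e.g. internal housekeeping).
OMIT = {"db_row_id", "last_edited_by"}

# Hypothetical sparse columns merged into one field:
# new field name -> (source columns, separator).
MERGE = {
    "location": (["site", "parish", "county"], ", "),
}

# Hypothetical defaults for fields the legacy data never had.
# Placeholder values only; existing values are never overwritten.
DEFAULTS = {
    "item_type": "dataset",
    "license": "CC BY 4.0",
}


def rationalize(record):
    """Return a cleaned copy of one legacy metadata record (a dict)."""
    # Omit: copy everything except the columns we no longer need.
    out = {k: v for k, v in record.items() if k not in OMIT}

    # Merge: combine several columns into one, skipping blank or missing values.
    for target, (sources, sep) in MERGE.items():
        values = [out.pop(src, None) for src in sources]
        parts = [str(v).strip() for v in values if v is not None and str(v).strip()]
        if parts:
            out[target] = sep.join(parts)

    # Add: a human-readable title derived from an assumed "grave_no" column.
    out.setdefault("title", f"Grave {record['grave_no']}")

    # Add: fill new fields only where the record has no value yet.
    for field, value in DEFAULTS.items():
        out.setdefault(field, value)
    return out


if __name__ == "__main__":
    legacy_row = {
        "db_row_id": 17, "grave_no": "42", "site": "Site A",
        "parish": "", "county": "Kent", "last_edited_by": "jd",
    }
    print(rationalize(legacy_row))
```

Using `setdefault` for the added fields means any value already curated in the legacy data takes precedence over the placeholder defaults.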
Our hope is that the number of research projects falling into the “resurrection” category will diminish over time, thanks to the work we are doing through the SDS service and to encouraging researchers to build “sustainability by design” into new research grant applications. We are aiming to achieve this by working closely with the University’s Research Facilitation and IT Support teams.
Final Thoughts
If, at the University of Oxford, we can continue to gather digital research project “eggs” into our “SDS service basket,” this will hopefully improve data sustainability and make research as FAIR as possible. There will always be the odd “egg” that needs to be stored in a less-than-ideal “basket” requiring regular maintenance and updates, or in a custom-built, feature-rich “basket” with all the technical ‘bells and whistles’ deemed relevant for a particular research use case. In the pursuit of research innovation this is perhaps inevitable, but where we can standardize and consolidate, we must, as the benefits of doing so are significant.