Using Figshare to help meet the new NIH Data Management and Sharing Policy

November 9, 2022

Ana Van Gulick

The NIH’s new Policy for Data Management and Sharing takes effect on January 25, 2023. In this webinar, we go through the requirements set out in the policy and our tips for how Figshare as an established generalist repository can help you comply with the data management specifications for making your NIH-funded research data as open as possible and as closed as necessary. We cover how Figshare meets the NIH Desirable Characteristics for Data Repositories, how to include Figshare in your data management plan, how to plan for data sharing costs including for large datasets on Figshare+, and more.

Transcript

Please note that the transcript was generated with software and may not be entirely correct.

Hello, everyone.

I'm just going to give it a couple more seconds for people to get on, and then we'll kick off today's webinar.

I can see lots of people joining. If you can't hear me, or if something isn't working correctly, let us know now, and we'll try to sort out anything else before we start the presentation.
OK, great, the numbers are going up, so I will continue and say welcome to our webinar today, Using Figshare to Help Meet the New NIH Data Management and Sharing Policy. I'm just going to share the housekeeping bits and then I'll pass over to Ana for today's webinar. All attendees are in listen-only mode, but if you would like to ask a question or get some clarification, just use the Q&A function or, alternatively, the chat. I'll be monitoring both. In the event that we get loads and loads of questions during the Q&A and we don't have time for all of them, not to worry. We'll be sure to follow up on those that we didn't get to answer during the session, because we'll be able to see who asked what. We're also recording today's session, and we'll be sharing it with all registrants in the next day or so. So if you need to drop off at any point, not to worry, you will receive the recording through an automated e-mail.

So I think that's just about everything from me. Without further ado, I will pass over to Ana to kick things off. Thanks very much.
Thanks, Laura.

Hi everyone, I'm Ana Van Gulick. I'm the Government Funder Lead and Head of Data Review here at Figshare, and I'm excited to be talking to you today about how you can use Figshare to help meet the new NIH data management and sharing policy. We'll talk through a little bit about what's included in that policy for anyone who hasn't been tracking it super closely yet. The time is coming to do so.

And then we'll talk about the features of Figshare and some aspects of how you can use the repository to share data, write it into a data management and sharing plan, and plan ahead for those data management and data sharing costs.

So, a quick note to go back to the top level about Figshare for anyone who's new. Figshare is a trusted, cloud-based repository for storing, sharing, and discovering research outputs. We just celebrated our 10th birthday this year.

It hosts more than four million research outputs and is used by half a million users worldwide to share all of their different scholarly outputs, which include hundreds of terabytes of data, and these works have been cited more than 100,000 times. We also use our infrastructure to support data repositories and research repositories for more than 90 organizations.

So in addition to our figshare.com repository, these organizations, such as academic institutions, funders, government agencies, and publishers, use Figshare to power their own repositories.

And one thing we're really interested in now, looking at the next decade of open data, is what I call open data two, and something you may have heard of, the FAIR principles, which stand for findable, accessible, interoperable, and reusable. Making data open and publicly accessible is step one. Making it so that someone can find it, can reuse it, and can know what that data is, is step two. And that's something we're working on with the larger repository and research communities to help support.

This webinar is, of course, about the NIH data sharing policy. You'll probably be aware that there has been an increase in the past five years or so in the number of funders and publishers that are requiring data sharing. They may make this a condition of your award. They may require data management plans. A publisher may require that you have a DOI for the dataset that accompanies a publication, and you may need to put this in a data availability statement. All of this is really spurring a huge growth in open data, which is great to see. And we can look at how these funder policies have evolved over time in the US. So you go back to 2003.
There was an initial NIH data sharing policy for very large awards, and that one actually still holds today, until January. But there have been some updates along the way, such as a genomic data sharing policy in 2014 and one on clinical trial information in 2016.

Notably, in 2011, NSF started to mandate data management plans. So some of you who write proposals for both NSF and NIH may already have experience writing ... regularly for NSF proposals.

And we've had some White House Office of Science and Technology Policy influence here as well, with a couple of memos, one in 2013 and then, most recently, one this past August, stating the need for open access to federally funded research, both to research publications and to datasets.
And the most recent of these memos is going to require that all federal agencies make these publications and data available immediately upon publication. So that's, again, going to continue this shift.

That will go into effect in 2025 for all federal agencies.

However, the NIH data management and sharing policy we're talking about today will go into effect in January. This policy was actually released quite a while ago, about two years ago, in October of 2020.

And after getting community feedback, we're now finally approaching the implementation date of January 25, 2023, just a couple of months away from now.
So, NIH has since continued to build out a number of resources, and I would point to them as the experts. They're creating a lot of guidance. They have webinars you can watch. They have template materials.

They're now starting to release the forms that will be used for these data management and sharing plans. I just learned today that there's a new form for the budget request for data management and sharing costs that's just come out.

So, do go to this sharing.nih.gov site.

You know, I'm presenting my perspective on the policy today, and a few highlights, but this is where you can find all of those resources.
And if you do have questions specifically related to the policy, I would point you to asking them of NIH or your specific program officer; that's who will be reviewing these plans.

But when you go to this website, you can find the data management and sharing policy there, as well as guidance.
And I'll pull out a few of the highlights here.

So, first of all, what does this new data management and sharing policy require?
It applies to all NIH-funded research that generates scientific data, which is the recorded factual material commonly accepted in the scientific community as of sufficient quality to validate and replicate the research findings.

So, any research generating that type of data will require a data management and sharing plan, which will need to be submitted and evaluated on an ongoing basis.
And so, really, this covers all types of funding, so it includes all extramural grants and other award types and contracts.
And also intramural research projects, quite importantly.

But it does not include funding that doesn't generate data, so training, conferences, and awards like that.
So it's really about those who are generating data.

Importantly, this data should be shared regardless of whether it supports a publication or not.

The plans are meant to maximize and encourage data sharing. The policy doesn't technically say that all data must be shared, but it says to use your plan to maximize data sharing as much as you can.

So, that would include sharing datasets that support null results, and it states that these data should be shared by the time of a publication, or at least by the end of the award period, whichever comes first. So, perhaps for null results, they might be shared at the end of the award.
Not all data must be shared.
Broad data sharing is encouraged, so you just want to try to see what data you can share, as much as possible.
The policy also encourages researchers to leverage the data repository infrastructure that exists and has really grown over the past 10 years of open data.
So, it encourages researchers to make their data more FAIR, following the principles I just introduced; to share data in established and trusted repositories that already follow community standards for metadata and persistent identifiers, and have good plans in place for preserving the data and making it discoverable; to use discipline- and method-specific repositories if they exist for the data type, to help make the data as discoverable and reusable as possible; but also to use trusted generalist and institutional repositories when those are not available.
So, that gives you a sense of what the NIH research data repository ecosystem looks like. Some of you may be very familiar with this already. But I like to think of it as three broad classes of data repositories that exist these days. One would be domain-specific repositories.
Many of these are funded directly by NIH, or even run by NLM or other NIH organizations.
So these might be genetic sequencing databases like GenBank, or the Protein Data Bank, or something like that. These allow for very specific metadata and also have strict requirements for the way data must be formatted, which is great because it allows for recombining data and really making it very reusable. However, they're not very flexible, since the data must be of a specific type and must be documented a certain way. These exist outside of the NIH ecosystem as well, such as OpenNeuro for human neuroimaging data. So that's one side of things.
Then you have the more flexible options that might take more types of data, or perhaps even more types of research outputs beyond data, say software, code, videos, and images. That would include both institutional repositories and generalist repositories.
Of course, an institutional repository requires you to be affiliated with a specific institution. So this is where I would say check with your institution, with your library, or with your data services group, which might be in the library or in the Office of Research, to find out what resources they are providing. They can often provide a lot of guidance, and they may even have a repository resource that you can use for free and can write into data management plans.
If you don't have an institutional repository, that's where generalist repositories come into play. These are available to everyone, and they are the most flexible of the trusted repositories.
So, just an example of those Figshare institutional repositories that I mentioned. We are involved in both sides of those flexible parts of the space, because our Figshare infrastructure powers some institutional repositories.
So here are examples from the University of Arizona and from the ... Research Campus.
But today I'll be focusing on our public resources that are available to anyone, which would be figshare.com, our main free generalist repository, and then a little bit about ... Plus, which is a newer repository we launched about a year ago, specifically for larger datasets.
So, NIH acknowledges this landscape of data repositories and does not specify a data repository that you must use. Instead, they outline a number of characteristics that you might want to look for when selecting a data repository.
They also point to this landscape of domain-specific repositories, and they encourage the use of domain-specific repositories first, when they're available. They list some here that are NIH supported, so you can find that on their site. You can also find these listed on a community-maintained site, ... data, to search for specific repositories that may be available.
And then it lists generalist repositories as well. For disciplines or types of data that may not have a domain-specific repository, NIH acknowledges these can be a useful place to share data, because they offer that flexibility in data type, format, content, and discipline.
And you can see here that Figshare is listed together with seven other generalist repository resources in this NIH guidance.
And just to say that generalist repositories do play an important role in this landscape.
Here's a little bit of data from the past nine years or so, looking at citations of data that had been published in a few different generalist repositories, and you can see the uptake, especially since 2017. These curves have just gotten steeper. So this is data that's in a generalist repository.
And not only is it deposited there, but someone is also citing it in the scholarly literature. You can see Zenodo at the top, sort of running away with it, with Figshare, Dryad, and Dataverse also there. So I think we do see that researchers need that flexibility, in some cases, for data sharing.
So NIH outlines a number of characteristics of repositories, and these may seem a little overwhelming if data repositories aren't what you do day to day.
There is a long list of about 14 characteristics, and they say, you know, make sure the repository you pick has these; we won't tell you which one to use.
So we've tried to put together a help page for you about how Figshare meets most of these NIH desirable characteristics.
They're not required characteristics. They're really recommendations, but Figshare's infrastructure and functionality really do meet most of them, with the exception of some of the more restricted human subjects data use cases, for which you would want to use a repository that offers restricted access controls that are specific to sensitive data, which is not something that we do with our figshare.com repository. But we provide full open access. We use standard persistent identifiers and standard metadata for broad discoverability, and everything you publish with us without an embargo is free and fully open access.
There are metrics that are collected for each dataset.
We offer security, integrity and privacy controls, provenance, and restricted access in some ways for certain datasets. So I would encourage you to go to that help page, if you want, and see all of that.
We've tried to put it all together to make it a little bit easier to find all that information in one place.
But we do have a history of hosting NIH-funded data on figshare.com.
People actually aren't great about adding their funding to datasets on figshare.com, so searching for NIH-funded datasets right now gives us about 110 items and 600 datasets.
I'm sure there are actually many more, but it's important to say that even with just that subset, where people have added an NIH award number that we're able to find in their metadata, these items have more than 300,000 views and downloads and more than 350 citations. So it is impactful to share this data and make it publicly available; it is being reused.
We've also been working with the NIH directly on a couple of projects in the past few years, and that's actually why I joined Figshare in 2020. It was to help support these projects.
Prior to that, I was in cognitive neuroscience first, and then worked as an academic data librarian.
I came over to support this project initially, which was with the NIH Office of Data Science Strategy, to provide a Figshare pilot for NIH-funded researchers. This was to provide a flexible generalist repository for both intramural and extramural researchers.
This was just a one-year pilot, but we did have submissions from about 22 of the 27 or 28, depending on how you count them, different NIH institutes and centers that comprise the National Institutes of Health.
So the National Cancer Institute, Infectious Disease and Allergy, Environmental Health Sciences, the Heart, Lung, and Blood Institute, all of those different ones were represented. And part of this activity was to see whether there was a need for this flexible repository. We really saw that there was, and it was also used by intramural researchers at NIH.
Then our current work with the NIH Office of Data Science Strategy is something that we're doing collectively with the generalist repository community. This is a program called GREI, which stands for the Generalist Repository Ecosystem Initiative.
GREI launched this past February, and it now includes seven different generalist repositories:
Dryad, Dataverse, Figshare, Mendeley Data, Open Science Framework, Vivli, and Zenodo.
We're all working together to enhance our support for NIH-funded research data in generalist repositories.
How can we better support the use cases of these researchers? How can we have common metadata standards that allow searching for this data across repositories, making sure it's discoverable, and making sure that there are common metrics tracking its reuse?
How can we track research that's associated with certain funding or certain research organizations?
So, we're all coming together as part of this four-year project to support that collectively, and you'll probably see that we're doing quite a bit of training and outreach as well.
And it's really well timed with this policy, when there'll be so much more need for NIH data sharing.
So, that takes us now to talking about the figshare.com repository specifically.
This is a freely available generalist repository that can be used for any research discipline and to share any scholarly output.
So it provides that flexibility, as well as those standards I mentioned, like having persistent identifiers and metadata, and providing open access and tracking. I'll walk through these a little bit. You can share any research output type or file type.
We preview over 100 file types in the browser, so when you upload them, someone can view them without even downloading the file. But we realize that you may need to share other file types as well, or even zip files.
So you can actually upload any file type that you have, and someone else can download it.
You can upload single files up to 20 gigabytes and publish data sets up to 20 gigabytes at a time.
If you have a dataset that's larger, we'll talk about Figshare+, where we support larger datasets.
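As a rough illustration of planning around that 20 gigabyte figure before uploading, here is a minimal Python sketch that checks file sizes locally. It does not call any Figshare service. The constant `SIZE_LIMIT_BYTES`, the function names, and the example file names are all assumptions made for this illustration, and reading "20 gigabytes" as 20 × 10^9 bytes is also an assumption.

```python
from pathlib import Path

# Assumption: "20 gigabytes" is read as 20 * 10**9 bytes.
# The repository's exact limit may be defined differently.
SIZE_LIMIT_BYTES = 20 * 10**9


def folder_sizes(folder):
    """Map each file under `folder` (recursively) to its size in bytes."""
    root = Path(folder)
    return {
        str(p.relative_to(root)): p.stat().st_size
        for p in root.rglob("*")
        if p.is_file()
    }


def split_by_size(sizes, limit=SIZE_LIMIT_BYTES):
    """Partition {name: size_in_bytes} into (within_limit, too_large) name lists."""
    within = sorted(name for name, size in sizes.items() if size <= limit)
    too_large = sorted(name for name, size in sizes.items() if size > limit)
    return within, too_large


# Hypothetical file names and sizes, for illustration only.
example = {"scans.zip": 25 * 10**9, "behavior.csv": 3 * 10**6}
within, too_large = split_by_size(example)
total = sum(example.values())

print("within limit:", within)
print("too large:", too_large)
print("dataset total exceeds limit:", total > SIZE_LIMIT_BYTES)
```

A check like this can help decide early whether a dataset fits the free repository or should be budgeted for the larger-dataset option.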
And then you can also group items into collections to make sure that related materials are all found together.
We have an open API and FTP, as well as integrations such as those with GitHub, GitLab, and Bitbucket, so that we're matching researcher workflows.
We want to make the barrier to data sharing low, so that it's a simple and easy process, and for you to build it into your day-to-day data management and data sharing for your research and your research group. You also have collaborative spaces that you can share with others who you're working on the dataset with.
We offer persistent identifiers. Each item published on ... gets a unique DataCite DOI, a digital object identifier. You might be familiar with these from publications, but they're actually really important for every scholarly output. These give it a unique, persistent identifier that can be cited and tracked over time. And that can also be linked, so you can link the DOI for the dataset with the DOI for the paper and say, this dataset supports this paper. We know that these are two different outputs that the researchers should get credit for separately, but we also know that they're related to one another. Similarly, we have other persistent identifiers for authors, such as ORCID, and you can integrate your ORCID account with your Figshare account and push materials back and forth.
Then links to publications and links to the funding sources are embedded in the metadata. It's all about creating those linkages and making sure everyone gets credit, and that everything is tracked and linked.
When you publish content on figshare.com, it is fully open access. The metadata is licensed with CC0, and the files can be licensed as you choose with a CC0 or CC BY license, CC BY being the one that requires attribution and CC0 being the broadest reuse possible, really an open license, which may actually be the best choice for data in many cases. So you'll want to consider that.
We also offer some software-specific licenses.
And then with all of this openness, we want to make sure that it's discoverable.
So content is indexed across search engines and indices, like Google Dataset Search or Google Scholar, depending on the item type.
And that allows us to then track the impact. So you have a public author profile.
The views, downloads, citations, and Altmetric scores are tracked at the individual item level or the collection level. So you can see what sort of attention your work is getting, and even report on that attention to funders or to your institution. Importantly, the citations for datasets and materials are pulled from the full text of the scholarly literature.
People don't always cite datasets in the reference lists, so it's important that we're looking for those dataset DOIs throughout the full text. And people can find your work with faceted search.
So, that's a lot of the features.
How would you distill this down into what is being requested in the NIH data management and sharing plan? There are a few top-level things to consider here. Parts of a data management plan, such as how it will be overseen, go beyond the scope of the repository, but there are a lot of things you need to specify about the datasets.
You need to plan what data will be shared, where it will be shared, when it will be shared, and how it will be shared. Those are things that you can specify, and you can include Figshare for parts of them.
And I want to say specifically that it is perfectly fine to use figshare.com or other generalist repositories jointly with discipline-specific repositories as well. These may fit different outputs that you have from the research project.
You might even put different forms of the same dataset in a discipline-specific and a generalist repository.
Perhaps they're for different audiences who might reuse different formats of the data, or you want to make them accessible in different ways. You wouldn't want to duplicate the exact same dataset, but you may have a dataset, and then you may have other materials, like the code or the images that supported it, that you can't put in a discipline-specific repository. So you can put those in a generalist repository and then link those DOIs.
I should have pointed out that this is a page in our help pages: how to write a DMP and include Figshare. I've pulled out a few highlights from that here.
I won't go over them all, but these include some prompts about what to write in the plan, as well as some example language about Figshare.
So if you're including Figshare, you can include some of this language if you'd like and adapt it for your plan, but be clear about what data you plan to put in Figshare. Will it be raw data or processed data? It must be de-identified.
State what file formats you'll be using and what documentation you might include.
You can say that each dataset in Figshare will have that persistent identifier, that DOI, or a PID, as it's called in this realm.
That PID is version controlled, cited, and tracked. You'll have metadata associated with that DOI, and then maybe state what license you plan to use and that the data will be discoverable and preserved.
So, once you've included Figshare in a data management plan, we'll get to executing that plan.
Not only do you need to write the plan, it also has to be carried out. My understanding is that while program officers will review these plans initially at the time of funding submission, they'll also be the ones checking during the period of performance how these plans are carried out. You may need to report on them in annual reports or the funding reports, and so forth.
So it's important to know how to do that. You can easily get started with your figshare.com data sharing. Just go and create an account; signing up with an e-mail address is free and easy.
I'd encourage you to create a researcher profile and link your ORCID. This gives you some options about whether you want to push information back and forth between your ORCID profile and your Figshare profile. Feel free to toggle those on or off as works for you once you authenticate your ORCID account.
You can also add some other social media if you'd like, and then you'll end up with this public profile page where people can link out to your ORCID and your Twitter, see who you publish with, and also see your collective usage statistics on Figshare.
You can start uploading files up to 20 gigabytes, find our API documentation, or use our FTP. These are available in the Apps section, and our API documentation is at docs.figshare.com. But you would just go to create a new item, and then drag and drop your files, if you're using the browser.
One thing to point out is item types and collections. An item refers to that single page in the repository. It could have a single file or many files, up to several hundred files if you need.
And then the description, the metadata, and that single unique DOI all correspond to an item in Figshare.
And that item could be any item type: dataset, code, media, presentation, paper, document, workflow, however you'd like.
A collection, on the other hand, is a public grouping that also has its own metadata and its own DOI. You then put other public items into the collection. This can be useful if you have multiple datasets supporting a single paper, project, or research group, and you want to be able to point people to all of them with a single DOI that's tracked. Then I would encourage you to put those public items in the collection and use the collection DOI to do that.
It can get a little unwieldy if you publish different things all separately and you have, say, nine different DOIs for the datasets.
I would just encourage you to group things as you think people may use them or cite them, or as they need to be licensed. So, if you have things that need to be licensed separately, they'll need to go in different items, but I would put them together if you think someone will cite them together.
So you may have, you know, collected data from different sources. Great, I've got a slide that has all the steps.
Say you've got data you've collected from 20 different mice. People are unlikely to cite the data from mouse one separately from mouse two.
So you would probably want to put them all together in a single item.
If you want to preserve file hierarchy, you can upload zip files or other compressed files. That will help preserve the file structure, and people can preview the file names within the different levels of the zip file in the browser before downloading it.
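The zip approach can be sketched with Python's standard `zipfile` module. This is only a minimal local example of packaging a folder so that its subfolder structure is kept inside the archive; the function names and the folder layout used in the usage note are hypothetical, and nothing here is specific to Figshare.

```python
import zipfile
from pathlib import Path


def zip_folder(folder, archive_path):
    """Write every file under `folder` into a zip, keeping relative paths.

    Write the archive somewhere outside `folder`, otherwise the archive
    would try to include itself.
    """
    root = Path(folder)
    with zipfile.ZipFile(archive_path, "w", compression=zipfile.ZIP_DEFLATED) as zf:
        for path in sorted(root.rglob("*")):
            if path.is_file():
                # arcname stores the path relative to the folder root,
                # so subfolders such as raw/ or code/ survive in the archive.
                zf.write(path, arcname=path.relative_to(root).as_posix())
    return archive_path


def list_archive(archive_path):
    """Return the file names stored in the archive, including subfolders."""
    with zipfile.ZipFile(archive_path) as zf:
        return zf.namelist()
```

For example, calling `zip_folder("study", "study.zip")` on a hypothetical `study` folder and then `list_archive("study.zip")` shows the stored names with their subfolders, which is a quick way to confirm the hierarchy before uploading.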
However, there are other cases where it might make sense to separate things, say you have the dataset and the code, and they need to be licensed separately. Or maybe in a publication, you have several different experiments that are using different methodologies. So someone may cite the MRI dataset separately from the other dataset.
They show different things and they use different methods, so put them in different items, and then group them in a collection to point to all of the research materials supporting that publication.
And then I've listed here a few other considerations: choosing a file format, using consistent file naming conventions, and including documentation.
So, you'll see that there's a lot of metadata you can add to the Figshare record, but you can also upload documentation as files, such as a README file or text file, codebook, or data dictionary, and it's certainly appropriate to do both of these.
The documentation in the metadata will be discoverable and searchable.
The documentation uploaded as a file will be downloaded by the end user together with the dataset, which helps keep it together, and when they go back to reuse it, they'll have easy access to your instructions and descriptions.
Really, I would encourage you to do both.
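One way to keep file-level documentation consistent is to generate the README from a small data dictionary. The following Python sketch is only an illustration under stated assumptions: the function name, the layout of the README, and the example title and variable names are all hypothetical, not a format required by Figshare or NIH.

```python
def build_readme(title, description, variables):
    """Render a short plain-text README with a data dictionary section."""
    lines = [
        title,
        "=" * len(title),
        "",
        description,
        "",
        "Data dictionary",
        "---------------",
    ]
    for name, meaning in variables.items():
        lines.append(f"{name}: {meaning}")
    return "\n".join(lines) + "\n"


# Hypothetical dataset details, for illustration only.
readme_text = build_readme(
    "Example mouse weight dataset",
    "Weekly body weights for 20 mice; one CSV file per animal.",
    {"mouse_id": "animal identifier", "weight_g": "body weight in grams"},
)
print(readme_text)

# To ship it with the data, save it next to the data files, for example:
# with open("README.txt", "w", encoding="utf-8") as fh:
#     fh.write(readme_text)
```

The same description text can then be pasted into the item's metadata, so the documentation exists both in the searchable record and in the downloaded files.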
Here are some examples of datasets on Figshare. You can see many different item types can be shared and previewed. So you've got some media, image files, in this case 10 different files in an item. You've got a spreadsheet that can be previewed. Someone can click through the different tabs of a spreadsheet and preview them all.
On the far right, you've got a zip file being previewed with the files in it, and at the bottom, an item that has a few different file types in it.
And then importantly, like I said, you can share any type of scholarly output as well.
So this is broadly about meeting the NIH data sharing policy. However, feel free to use Figshare for other scholarly materials as well, so conference posters, slide decks, presentations, preprints, and workflows for how you did something. You can share a PDF or an image file, even teaching materials or workshop materials. As long as it is something that is your own scholarly output, and it is suitable to be publicly accessible and licensed, you can share it.
Here's an example of a collection. You can see it has its own title, description, and metadata. Sorry, not a license; the license would be at the item level. But it does have its own citation and DOI.
And then, underneath that, what I've put on the right would be all of these public datasets, these 13 datasets that would be within the collection when you go to that DOI.
And then each one of those has its item-level page and its item-level citation.
So, here's the describing your work part.
And this is really critical for bear data, and it's something that I would encourage you to think about as you manage your data. So data management, I like to say, is the gateway to data sharing. If you have good data management throughout the life cycle of a project and everyone on your research team is documenting the dataset well, it will make it easy to publish it later.
And here, we just have the metadata fields that are included in our Edit item page. So make sure you have a meaningful title, included authors, metadata to help it be discovered, and then links to related materials, and I'll walk through a few of those briefly. Here's a dataset that has quite a lot of documentation, so you can see, you can go pretty far with describing your methodology, even just here. And that will really help someone who comes across this dataset. Know what it is.
I think a lot of researchers feel that the dataset is fully described by the paper, which works well if someone's reading your paper and they have the DOI to the dataset or the link in the paper, and then they go find the dataset and download it.
However, in this new world of open data, we want to think about data as a first-class research product on its own, and something that someone may find by searching across data repositories. They're looking for data about x-ray scattering curves. They come across this: "Oh, what is this dataset?"
So while you can put a link to the paper here in the dataset and say, "Go read the paper that describes the dataset," it's even more effective for that search and discovery and reusability if you can be very clear about what the research method is, what the variables are, what the research question was that this dataset answered, conclusions, formatting.
If all of that can be included here in this metadata, and in the file documentation, that lets this dataset stand alone. So please don't title your datasets "dataset," "dataset dot Excel," or even the name of that paper. Right?
Because you want to differentiate the dataset as being its own research output that is worthy of credit.
You did a lot of work to collect that dataset and to document it, so you want credit for that separately. So, to describe your work: a meaningful title, one that provides context and is not identical to the paper, although you may reference the paper. You might say, "This is the dataset supporting this paper title."
Add the authors, so make sure to add all authors who are relevant, who contributed to this portion of the research project. You can add authors whether they have a Figshare account or not. I'd also encourage you to add their ORCID if you can find it; that will help them get credit for the work.
Select categories. This will help people find your work if you select the research fields.
Select one or more research categories that it fits into. This is a drop-down list using a controlled vocabulary, so you can select that.
Select your item type.
These are the item types on figshare.com. If you're using a Figshare for Institutions repository, you may have even more to choose from, but here's, you know, where you can decide: is this a dataset or is that media? That's really up to you. I think people can say that image files are now considered data in many fields. So that's your call there.
Here are the license types you can select on figshare.com from these options. So, here are the software-specific options I mentioned, as well.
If you're sharing software, you can consult with a data librarian or someone at your institution if you're not sure about the license that's appropriate for your work.
Importantly, and really importantly for NIH, they would love it if you listed your funding sources. So, with Figshare, we have an integration with our sister company, Dimensions, where you can add specific funding sources, NIH grants, and awards by searching by the award title or number. So this field, as you start typing, will pull a number of funding sources from the Dimensions database. So, here are some examples of searching by grant title or by grant ID.
Importantly, there are many ways to format an NIH funding code, and for this purpose, you want to start with the activity code, so that R01, T32, K99 code, followed by the institute code. That's just for the institute at NIH, so EY for National Eye Institute here in the example. And then that six-digit serial number. In the funding field, you don't need to add the supplement or the ..., that's before or after these.
So, if you type that in, usually with NIH funding it's all pretty well listed. And then once you've added that, it will create a hyperlink.
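As a rough illustration of the funding code format just described (activity code, then institute code, then six-digit serial number), here is a minimal Python sketch that trims a longer award number down to that core. The function name `core_award_number`, the regular expression, and the example award numbers are all made up for illustration; the assumed layout of the parts before and after the core is a general reading of NIH award numbers, not something taken from Figshare or Dimensions.

```python
import re

# Assumed layout of a full NIH award number (an assumption for this sketch):
# optional one-digit application type, three-character activity code,
# two-letter institute code, six-digit serial number, then an optional
# "-" suffix carrying things like the support year or a supplement.
_NIH_AWARD = re.compile(
    r"(?P<app_type>\d)?"
    r"(?P<activity>[A-Z][A-Z0-9]{2})"
    r"(?P<institute>[A-Z]{2})"
    r"(?P<serial>\d{6})"
    r"(?:-.*)?"
)


def core_award_number(raw: str) -> str:
    """Return activity code + institute code + serial number, e.g. R01EY012345."""
    # Drop all whitespace and uppercase so "r01 ey012345" is accepted too.
    compact = re.sub(r"\s+", "", raw).upper()
    match = _NIH_AWARD.fullmatch(compact)
    if match is None:
        raise ValueError(f"not recognised as an NIH award number: {raw!r}")
    return match["activity"] + match["institute"] + match["serial"]


if __name__ == "__main__":
    # Hypothetical award numbers, used only to show the trimming.
    for example in ("R01EY012345", "5 R01 EY012345-03S1", "k99ey654321"):
        print(example, "->", core_award_number(example))
```

The idea is only to show which part of a longer award number is worth typing into the funding field; the field itself does the lookup.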
So, then, the funding, rather than just being a list of funding numbers, as maybe, you know, you're accustomed to putting into funding acknowledgment in a publication. Instead, it will be listed here like this. So, you'll have the funding title, and then the NIH institute that funded it, or the funder name, in general.
And that will actually be a link out to more information. If this is not possible, you are also welcome to put free text in there. So you can do that for any other funding sources that aren't indexed in Dimensions, or if you have other funding to acknowledge.
But for those that you do list from Dimensions, that will be a clickable link out to a Dimensions page with more information about that funding and that award: who it was awarded to, which institution, which PI, the amount, the years of that.
And that will really help NIH to track the data. So this is part of that tracking ecosystem now of open data, to make sure that you get credit for complying with your data management plan.
Then, similarly, describing your work: in Figshare, you can add references to related materials, to other things that support the work. So this could be a GitHub repository, a clinical trial, a paper or a preprint, a code base, other documentation, data in another repository, data in Figshare, a collection in Figshare.
And that's where you can put these, in references.
We're actually in the process of redoing this Edit Item page.
So soon I will have to redo this presentation with all new screenshots, and it will be a more intuitive user interface for creating this metadata. I think it will actually be really great.
It will give you more prompts about what information to put there. It will let us customize the metadata by item type. So for data versus software versus paper, what's important to put there? And these references will also be changing.
So rather than the single field where you put a link right now, next year there will be a title, a link, and a relation type field, and that relation type is where you'll specify, oh, this is a paper that this dataset supports, or this is a paper that we're citing. So that's important for linking those, and you will see that forthcoming.
We do offer some restrictions. So we want data to be as open as possible, but recognize, there are cases where that's not possible and restrictions may be necessary. So you can apply a permanent or temporary embargo on the entire item or on just files.
So you can do that at the bottom of the Edit Item page: assign an embargo and put a reason for the embargo. A lot of times, this works well if you're publishing the paper, but don't want to release the dataset until the paper is published.
This lets you publish the metadata first and just restrict the files, which I think is a nice way to do it.
And it lets you get that metadata page live and the DOI live, which editors love to see. So, when you have your data availability statement, and you've got the DOIs there, they can check and see that they're live. And all you need to do is go in and edit the embargo when the paper comes out to make it all public and available.
That DOI is critical, so you'll get to that by going to the Cite button.
These are version controlled, so if you make certain changes to an item after it's published, it will append a version number to the dataset. This is good, so someone knows, you know, which version of the dataset they were looking at, and if it has changed. So, changes to the title, the authors, the files will trigger a new version.
However, you can also change lots of the metadata without creating a new version. So, you can add links to publications, you can change the description, you can add other funding links, And that won't change things.
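The versioning rule described here can be written down as a small check. This is only a sketch of the rule as stated in the talk, not Figshare's actual logic: the field names and the function `creates_new_version` are hypothetical, and the talk lists title, authors, and files as examples, so the real set of version-triggering changes may be longer.

```python
# Fields whose change is described as triggering a new version.
# Assumption: this list mirrors the talk's examples and may be incomplete.
VERSION_TRIGGERING_FIELDS = frozenset({"title", "authors", "files"})


def creates_new_version(changed_fields):
    """Return True if any changed field is one that appends a new version."""
    return any(field in VERSION_TRIGGERING_FIELDS for field in changed_fields)


if __name__ == "__main__":
    # Replacing a file changes the version; editing the description does not.
    print(creates_new_version({"files"}))
    print(creates_new_version({"description", "references", "funding"}))
```

In practice this means a link to the published paper or an extra funding entry can be added later without changing the version that other people have already cited.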
So, these DOIs are really important for that reporting. So, this is what you would include in the grant reports. I believe NIH allows you to put them in your biosketch now, to think of these as important research outputs that you should get credit for.
Here's an example of everything being open and indexed.
So you can see that these items can be found in Google Dataset Search and other search engines of your choosing, and they're indexed by DataCite, as well as Dimensions.
And here are the metrics that are tracked at the item level. So views, downloads, citations, and then this Altmetric score, which gives a glimpse of other types of attention.
So maybe before you get a lot of citations. This is an example of one with a lot of citations, a software item which has been used a lot. But you can see that sometimes, you know, news media or social media may pick up on something first, and that's the kind of attention that our sister company Altmetric will track for them.
Then when you click through, you can see where this attention is coming from, what regions of the world, what types of work.
I've mentioned Figshare+. This is a repository we launched a year ago for supporting sharing of larger datasets. So the need for sharing larger datasets is growing.
We saw it in the NIH pilot; we see it in our support tickets to ... dot com regularly, that people have datasets that are 40 or 50 gigabytes, or 300 or 400 gigabytes, or 5 or 9 terabytes, and they need to share them. They want to share them, they're important outputs of that research, and we try as much as we can to accommodate those requests. But from a sustainability perspective, for us as a business, there's a real cost to hosting these datasets.
We host them in the cloud, we host them redundantly. They're preserved, we have, you know, multiple copies.
And we also provide free access to them, so if you're familiar with cloud storage, you'll know that it's not just the storage that costs money, but it's the access and egress fees that cost money. So we don't want to penalize anyone for having a really successful and well re-used dataset that gets downloaded thousands and thousands of times.
However, the bigger the dataset, and the more times it's downloaded, the more money it does actually cost us on the backend of infrastructure to support that. So in order to have this be sustainable, and, you know, I think everyone is recognizing that data management and sharing has real costs.
There is labor involved in preparing the datasets, documenting them, curating them, and also truly in hosting them. And so NIH data management and sharing plans allow you to acknowledge these costs, and to build them into the direct costs that you're requesting in a budget. And I'm hearing that there's a new form for how to do that, that I need to look into, because I just learned that the hour before today.
But you can plan for these costs in advance and help build them into your awards, and NIH or other funders may often be quite willing to pay for these data sharing and data curation costs.
So we've run with that.
And in order to make this a sustainable practice for everyone, we launched Figshare+, where we can have a one-time data publishing charge for datasets that are over 20 gigabytes.
So we start with tier one, going up to 100 gigabytes for about $400, and then up to as many terabytes as you need in 250 gigabyte tiers.
We've had a nine terabyte dataset published. That was a large one; most people are sharing from about 50 gigabytes up to a terabyte, typically.
We also allow single files to be up to five terabytes on Figshare+. So if you have that single file restriction, or you need to have more files per dataset, we allow up to 5000 files instead of 500.
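For budgeting purposes, the tier structure just described can be sketched as a small calculation. This is a rough sketch under stated assumptions, not Figshare's pricing logic: the function name `figshare_plus_tier_gb` is hypothetical, sizes are in decimal gigabytes, and the idea that tiers above 100 gigabytes sit at multiples of 250 gigabytes is one reading of "250 gigabyte tiers." No prices are computed, because only the first tier's approximate price is given in the talk.

```python
import math

STANDARD_LIMIT_GB = 20   # datasets over this size carry the publishing charge
FIRST_TIER_GB = 100      # first tier goes up to 100 GB
TIER_STEP_GB = 250       # assumption: later tiers end at multiples of 250 GB


def figshare_plus_tier_gb(size_gb):
    """Return the assumed upper bound (in GB) of the tier a dataset falls in.

    Returns None when the dataset is at or under the 20 GB threshold.
    """
    if size_gb <= 0:
        raise ValueError("dataset size must be positive")
    if size_gb <= STANDARD_LIMIT_GB:
        return None
    if size_gb <= FIRST_TIER_GB:
        return FIRST_TIER_GB
    # Round up to the next multiple of the tier step.
    return math.ceil(size_gb / TIER_STEP_GB) * TIER_STEP_GB


if __name__ == "__main__":
    for size in (15, 50, 300, 9000):
        print(size, "GB ->", figshare_plus_tier_gb(size))
```

For a real budget, the tier bounds and prices should be taken from the published pricing page rather than from this sketch.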
You can use our API, and we offer additional Creative Commons license options. So on figshare.com, you'll just see CC0 and CC BY, but we also offer non-commercial, share-alike, a few of the more restrictive Creative Commons licenses, just for a little added flexibility.
But this new repository includes all of the standard Figshare capabilities. It meets all the same NIH desirable characteristics. So you can feel confident in putting Figshare+ into data management plans.
The data does not get siloed there.
It is still discoverable on figshare.com and Google Dataset Search, discoverable everywhere.
It's just that we're using a different repository portal for your user account to upload the dataset.
And part of that is so that we can also help with checking the data.
So, this is something we did for the NIH pilot.
We had repository experts, data experts, help during the data upload and deposit process.
And then we did some metadata enhancements. We would review the metadata, give suggested or required revisions to the authors, and say, "Could you make this a little more complete? Could you make sure you list the funding? Could you link out to this? Could you describe the methods a little more?"
And I think that makes a really huge difference, to have some guidance there, to make sure that the metadata is complete and high quality. So that's another service that we provide as part of the Figshare+ deposit: that check, which I have a diagram of here.
So you submit, it comes to our team for review. We review the metadata. You make revisions.
It gets published with that more complete metadata to make it discoverable and re-usable.
And so this, I think has a big impact and it's something we wanted to make sure that if someone was, you know, paying to deposit a dataset, that it was well described, and could be found and re-used.
So you can go to our site here, knowledge.figshare.com/plus.
It has all of our transparent pricing. You can get in touch with us at review@figshare.com and we can help you plan for that and help incorporate Figshare+ into your data sharing costs and budgets.
If you think that you have a research project for NIH that is going to have many, many datasets, such that you want to create really a database as part of your research program, and you want to control the metadata and, you know, review and manage the datasets, that's something that our Figshare for Institutions clients do, right? They do this review process themselves, if they elect to turn on the review module.
They may have data experts in the library who guide researchers through this process.
So if you think that you have a project like that, get in touch with us at review@ or info@figshare.com.
We can see about a custom Figshare repository that could be part of your NIH award.
So rather than building a repository from scratch for the award, you could implement this Figshare infrastructure, but then customize sort of a lightweight Figshare repository for all of your outputs.
So I'll mention that as just one other way we could support NIH-funded work. But here are a few examples of large datasets in Figshare+. Again, you can have collections and multiple datasets. You actually have up to 10 items per data deposit in Figshare+ and up to a year to publish them, or more by request.
So here are a few guides that I've pointed to during the program today. We've got our guide to sharing NIH research on figshare.com. This will point you to a lot of the best practices I covered today.
How to include Figshare in a DMP, and how to share data on Figshare+.
I will also point you to another webinar series that I am organizing and participating in. This is through that NIH GREI program, so jointly with other generalist repositories.
And we have two more upcoming webinars.
One is tomorrow, and it's on a very similar topic: how do you include generalist repositories in your NIH data management and sharing plans?
So you can register for it at this datascience.nih.gov site, which links you to the registration page for it. And that will be from a number of repositories, from Vivli, from Dryad, from Open Science Framework, all speaking about best practices for DMPs, and then best practices for data sharing, coming up in December, for using generalist repositories for NIH-funded data.
And we're also hosting a workshop in January through that program, January 24 and 25, if you want to hold the date. That will be announced very soon.
So that is it for me today. Thank you very much for your attention. I hope that was helpful. I would be happy to take any questions you may have now. Or you can also get in touch with me at ana@figshare.com.
Thanks very much.
Thanks very much, Ana. We haven't got any questions in at the moment, in the questions or the chat.
We'll just give it a moment to see if any come in. You have the contact details, though, if you think of anything afterwards, and we'll be sharing the recording, so you can get in touch with us, the whole review and Figshare team.
I know there's a lot to plan for with such a seismic shift in data sharing for such a large funder like NIH.
But everyone in the community is ready to facilitate this.
Nothing's coming in. Yes, so just follow up with us after, if anything comes up, but thank you everyone so much for joining, and hope to see you at another webinar.
Great. Thank you so much, everyone. Have a good day.

‍
