September 21, 2022
Ana Van Gulick
Figshare.com is Figshare’s free, generalist repository platform available for researchers to use to store, share, and cite their research outputs. In this webinar, Figshare’s Head of Data Review, Ana Van Gulick, will show how researchers can share their NIH-funded research on figshare.com including best practices for data quality and creating links between data and related funding information and papers for better discoverability.
Please note that the transcript was generated with software and may not be entirely correct.
Hello, everybody, those of you who have already joined. I'm just going to give it a couple of minutes for others to get logged on before we begin the presentation on the hour, so bear with us for a couple of moments.
OK, so we're just about on the hour, so hello, everyone, and welcome to the webinar today, which is Tips for Sharing NIH Funded Research on figshare.com. So we've got a brilliant presentation coming up from Ana. But to begin with, I'd like to just share some small housekeeping bits with you. You're all in listen-only mode as attendees. And I should just mention at this point, if you can't hear me now or at any point throughout, just pop a message in the chat or the Q&A box, and we'll try and sort out any technical difficulties.
If you would like to ask a question or get any clarification throughout, again, you can use the Q&A function or the chat function, and I'll be having a look at both. In the eventuality that we get lots and lots of questions and we don't get time to answer them within our session today, we'll be sure to follow up with you afterwards, because we'll have a record of who asked what. We're going to be recording today's session, and we'll share it with all registrants in the next couple of days. So if you have to drop off at any point, don't worry, we'll be sharing the full video with everyone.
So without further ado, we'll move on to what you came to see, and I'll pass over to Ana for today's presentation.
Great, thank you, Laura. Thanks, everyone, for joining us today. Welcome to our webinar. I'm the Government and Funder Lead and Head of Data Review here at Figshare. And today we're going to specifically be talking about sharing NIH funded research on figshare.com, our free generalist repository.
And so I'll go over some of what we're seeing in the NIH data sharing landscape. Obviously the new NIH policy that's forthcoming, I'll touch on that broadly and then go over some of the features of Figshare and how you would use it to upload your research data or other materials.
So first of all, just to start at square one for anyone who's new to Figshare: it's the trusted cloud-based repository for storing, sharing, and discovering research outputs.
And you can find us at figshare.com.
Figshare.com was founded 10 years ago this year. We're celebrating our 10th birthday all year.
And over that time, we've published more than four million research outputs, helped more than half a million users, and currently host hundreds of terabytes of data. This research has been cited more than 100,000 times in the scholarly literature, and we also support research repositories for more than 80 research organizations: academic institutions, publishers, funders, government agencies, and so forth.
And then another big focus for us now, as we're looking into the future, is supporting data that is as open as possible, that's free to access, and that also is re-usable. So data that adheres to the FAIR principles: findable, accessible, interoperable, and re-usable. And that's sort of what we think of as the next stage of open data:
data that's not just open, but can also be discovered and re-used, and that is licensed for re-use. You can find out more about these principles in this paper here, written by a number of people in the open data community, and how you can apply that to your research. We'll talk about that when we give examples in Figshare.
We've seen recently, in the past 10 years or so, a dramatic increase in data sharing requirements.
For a researcher, or someone supporting researchers, in the US in particular, there's been a shift in public and private research funders requiring more data sharing. It's actually true globally, not just in the US but really all over, that research funders are recognizing the importance of data management plans and data sharing,
so that the data that underlies the findings is publicly available, can be re-used, and we can move research further, faster.
Similarly, publishers of journals have had a big impact in requiring data availability, and that's really helped move things faster in this direction, towards more data being publicly available.
So, here's a brief timeline that highlights a lot of the NIH work in this area and NIH policies. There was an original NIH Data Sharing Policy in 2003.
That's actually the one that's still in effect today, which requires data management plans for awards over half a million dollars.
Since then, there have been a number of other changes that have come, and, oh, I should have updated this slide with the 2022 OSTP memo. Apologies, I borrowed this from a previous slide deck; I'll make that adjustment. But we had in 2011 an NSF data management plan requirement.
In 2013, the OSTP memo expanding public access to research results.
Then the NIH Genomic Data Sharing Policy and clinical trial information in 2014 and 2016.
And then, if I had been thorough enough to remember to add it, there's the 2022 memo from the White House Office of Science and Technology Policy (OSTP) expanding public access, without any embargo period, to published research results and to research data.
That's critical in moving this forward with all of these funding agencies.
And that certainly dovetails with what will go into effect in January of 2023, which is the new NIH Data Management and Sharing policy that I hope many of you attending today, are already aware of.
So, here it was announced about two years ago, just a little less, in October of 2020, that there would be a new policy for data management and sharing.
And its effective date is rapidly creeping up on us.
It goes into effect on January 25th of 2023, just about four months from today.
And I would encourage you, if you haven't yet, to see the guidance that NIH has put together on this policy and all of their data sharing requirements.
They've built a new website at sharing.NIH.gov.
And it has lots of details about which policies apply for your work and how to write a data management plan. They're adding more FAQs and templates. That will all be available on this site. And they also have a webinar, which we'll help them promote, that is tomorrow. And that is a deep dive into the policy. So if you have more specific questions about that policy, I would encourage you to go hear from NIH's
Office of Science Policy and Extramural Research about that, but you can find this guidance there on their site.
But I'll highlight a few things here about the policy, just so that we're all on the same page with what we're talking about for NIH data sharing in Figshare.
So, the new policy will require that all NIH funded research, generating scientific data, includes a data management and sharing plan that will be submitted and evaluated together with the proposal for the research.
And this will impact extramural grants as well as contracts or intramural research projects.
As long as those are projects generating scientific data, which is defined as the recorded, factual material commonly accepted in the scientific community as of sufficient quality to validate and replicate the research findings.
So this has a lot of impact in terms of the reproducibility of the work, and in understanding how someone arrived at their conclusions.
This data must be shared regardless of whether it supports a publication or not, and it should be shared as soon as possible. So, one thing to think about is not just what you're sharing, but when you are sharing it.
And it does need to be shared, at least by the end of the award period.
And the policy does not strictly say that all data must be shared. So it's important to think about what you'll share, but the policy does encourage broad data sharing.
So it encourages the researcher, at the time of grant submission, to think broadly about what data would support replications. Even data they might collect that supports null results might be included, as well as data underlying publications, to try to maximize appropriate data sharing. In one of the FAQs NIH has provided, they list reasons that are not good reasons for saying that you can't share your data. And they include things such as, I don't think the data would be of use to anyone else, or I think the dataset is too small, or things like that.
So really, there are reasons, some data might not be shared, but they want you to think broadly about sharing as much as possible.
So it encourages researchers to make the data more FAIR, the principles I just talked about, and to use established, trusted repositories that follow community standards for metadata and identifiers. We'll talk about how Figshare meets these requirements.
They do want you to use discipline or method specific repositories that exist for a data type if those are available, but also to use trusted generalist or institutional repositories for data that may not have a discipline specific repository that is a good fit for it. Really just using trusted repositories as much as possible, whichever ones are the right fit.
So, if we think about what that data repository ecosystem looks like, very broadly speaking, we can somewhat break it down into these types.
So you have domain specific repositories or discipline specific repositories.
Many of these are funded by NIH, or even run by NCI and NIH themselves. These are things like the Protein Data Bank, GenBank, [unclear], and things like that, which are really specific to particular types of data.
And these offer a lot to the research community, because they can make data very re-usable with very specific metadata requirements to describe the research methodology and the dataset, which makes it easy for researchers to find that dataset and recombine data across studies.
So this is something that NIH encourages researchers to use.
Institutional repositories, which Figshare provides the infrastructure for at some institutions,
offer more flexibility in many cases, such that you can publish many different types of data or types of files in them, but these are, of course, restricted
to researchers that have an affiliation at that institution. So this is certainly something to look into if you're an NIH funded researcher.
Ask at your institution, often your library or data services group, to find out if there is a data repository that could help you publish your NIH funded research.
But then in the most flexible category in the middle, would be the generalist repositories.
And so this would be like Figshare.com and these are repositories that are available to anyone.
And they accept any type of data, regardless of discipline, methodology, or file size. Different generalist repositories have different features, but, broadly speaking, they offer the most flexibility for anyone to use.
So NIH provides some guidance about selecting a data repository, which you can find on their site.
It's not a specific recommendation of one, but rather outlines characteristics of data repositories.
So they also do list some repositories, specifically for domain specific repositories. So if you're looking for those, do look at their guidance there to try to use these repositories first, if they exist for your data type, or your discipline. You can also find a registry of these at ...
re3data.org, which is another list of repositories.
Then they list generalist repositories as well on this NIH sharing site.
And specifically, they say that generalist repositories are a good place when you need to publish data regardless of data type, format, content, or disciplinary focus. So you'll see Figshare listed here together with several other generalist repositories,
with Dryad, with Open Science Framework, with Zenodo, and so forth.
And so that's the space that we sort of inhabit with figshare.com as our free generalist repository.
And so there is this need in the NIH funded space for that flexibility, I think. And here's just a graph showing citations of data in generalist repositories in the scholarly literature over the past 8 or 9 years. So you can see that, during this period, the citations to data in these repositories have really been growing quite quickly, especially over the last few years.
So researchers are finding these generalist repositories to be a useful resource for their data sharing.
So I mentioned that NIH outlines a number of desirable characteristics for data repositories, and they advise researchers to look for repositories that meet these characteristics.
If you don't work in the data repository space day-to-day, it may be a little hard to interpret what all of these characteristics are, and whether a particular repository meets them. So we have created a Help page to try to outline how we meet
the majority of these NIH characteristics, including persistent identifiers, standardized metadata, and free, full open access to the work that's published,
providing licenses for clear re-use, security and privacy policies, and some of the functionality for restricted access as needed. So you can find that guidance on our Help page, if you would like.
We already host lots of NIH funded data on Figshare.com, although I have to admit that, when doing this search, I'm sure there's actually quite a bit more.
So, doing a search for datasets or other work that lists NIH funding, we get about 1,100 items, and 600 or more datasets.
However, this kind of goes back to the quality of the metadata that researchers provide.
And so I'm going to encourage you in this presentation today to make sure that you always link to your funding,
specifically to NIH grant codes, and I'll show you how to link those in Figshare, because I'm sure there's even more biomedical data on Figshare and we're just not able to capture it because we don't know that it had NIH funding. But even just with these numbers of items, they have more than 300,000 views and downloads, and more than 350 citations.
So making that work, public is having a real impact.
People are looking at it, and people are citing it in their scholarly work, So that shows you that value of posting in a generalist repository.
We've had the privilege to work with NIH on some of this investigation of generalist repositories in the NIH data landscape.
So we conducted an NIH Figshare pilot from 2019 to 2020 for the NIH.
And this was specifically running a Figshare repository, a separate Figshare repository from figshare.com, and making it available to both intramural and extramural NIH funded researchers so that they could share their work in a flexible way in a generalist repository.
And just during that year, we had about two terabytes of data that were shared. And data that was from 22 out of the 28 different NIH institutes and centers. So a pretty broad scope and a broad need for this kind of resource that we uncovered.
And that brings us to our work today with NIH and the Office of Data Science Strategy, which is a program called GREI. And this is the Generalist Repository Ecosystem Initiative.
It is a multi-year project that six different generalist repositories are participating in.
And one of the goals of this is to better support the sharing of NIH funded research data in generalist repositories.
How can we support those NIH use cases? How can we make data discoverable across generalist repositories?
How can we improve standardized metadata, metrics, and discoverability, and make the data more re-usable? So we're excited about this work, and it's funding a lot of the new feature development that we're doing for Figshare that you'll be seeing in the coming years to really improve the platform for supporting NIH research.
OK, so with that said, let's talk specifically about Figshare and the figshare.com repository. What are the features of this repository? How can it support your work?
How does it support data sharing for all research disciplines and any scholarly output?
So I mentioned it's a generalist repository. So one of the biggest things is flexibility.
We're also meeting researchers' workflows where they are, and providing persistent metadata, so descriptions of the data that are discoverable across indexes, so someone can find this data. When they do find it, we're making sure that it's fully open access. And then lastly, making sure that we can measure the impact that work has, so that researchers can get credit not just for peer reviewed publications but for all of the work that they're doing for their NIH awards.
So we'll walk through each of these a little bit further.
Flexibility. So with figshare.com you can share any research output and any file type.
We preview about 1200 different file types in the browser, which means that someone can explore that file or multiple files without having to download the item. But we recognize that different use cases and different research disciplines have different file types that they need to use. So we don't put any restrictions on what kind of file can be uploaded.
You can also upload zip files, preserve a specific file hierarchy that you might need, and things like that.
You can upload files up to 20 gigabytes and publish datasets up to 20 gigabytes. We also have support for larger datasets with our Figshare+ repository, which I'll talk about at the end of the presentation.
And lastly, we have the flexibility to group files into items and collections.
So that you can have DOI links to individual files, or groups of files, or groups of items. So you can easily point people to your work at many different levels.
Mapping our repository to researcher workflows is important. We want to lessen the burden of data sharing, we want to make this easy. We want you to be able to build data sharing into your projects easily from the beginning of a research project.
So we offer an open API, which is available for both uploading and downloading content, both files and metadata.
We also offer an FTP for uploading files so that you can be more efficient than using the browser for larger files.
We offer integrations with GitHub, GitLab, and Bitbucket.
So if you use any of these version control systems, you can publish snapshots of those repositories and make them sustainable in Figshare. We offer collaborative spaces so you can work with researchers at your institution or another institution.
You can share projects, share datasets as drafts, and review them together before they're published. We're also offering restricted access: so embargo periods on datasets where maybe you're waiting for the paper to be published, or data that must be restricted access for some other reason.
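For anyone who wants to try the API route mentioned above, here is a minimal Python sketch of what preparing a "create draft item" call might look like. It only builds the request and does not send it. The base URL, the `/account/articles` path, the `token` authorization scheme, and the field names (`title`, `description`, `keywords`, `funding`) are assumptions to verify against Figshare's current API documentation, and the title, token, and grant number are placeholders, not real values.

```python
import json
import urllib.request

# Assumed base URL; confirm it in Figshare's API documentation before use.
API_BASE = "https://api.figshare.com/v2"


def build_create_item_request(token, title, description, keywords, funding):
    """Build (but do not send) a request that would create a private draft item.

    The endpoint path and payload field names are assumptions for illustration.
    """
    payload = {
        "title": title,
        "description": description,
        "keywords": keywords,
        "funding": funding,
    }
    return urllib.request.Request(
        url=f"{API_BASE}/account/articles",
        data=json.dumps(payload).encode("utf-8"),
        headers={
            "Authorization": f"token {token}",
            "Content-Type": "application/json",
        },
        method="POST",
    )


request = build_create_item_request(
    token="YOUR-PERSONAL-TOKEN",  # placeholder, not a real token
    title="Example recordings supporting a hypothetical study",
    description="Placeholder description of the files and how they were collected.",
    keywords=["example", "placeholder"],
    funding="NIH grant PLACEHOLDER-GRANT-NUMBER",  # placeholder grant code
)
# Sending is left out on purpose; urllib.request.urlopen(request) would make the call.
```

Keeping the request construction separate from sending makes it easy to inspect the metadata, including the funding link, before anything reaches the repository.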
Persistent metadata. So this is the documentation. The metadata is what describes the datasets.
Importantly, Figshare issues a unique DataCite DOI for each output that's published.
These are DOIs, or digital object identifiers. You may be familiar with them from publications.
And they're really important for tracking published datasets and published research, and linking different research materials together.
Persistent identifiers for authors, for institutions, for datasets, for papers, are all really important now that this is all becoming open access and public. And we want to make sure everything gets credit where credit is due. So these DOIs can be reserved in advance, and they're citable. So they are what should be included in a publication, and we track the citations of them across the scholarly literature.
We have ORCID integrations, so you can add your ORCID iD as an author, and those of your co-authors, again to make sure that you get credit for all of your work.
You can link to the peer reviewed publications or preprints associated with any of the data or other materials in Figshare, and then link your funding. And I'll show you some snapshots of how you do all of this in our interface.
Next, open access.
So we're providing open access to all the public files and the metadata. The metadata is licensed with a CC0 license so it can be re-used. Files are licensed according to the CC0 or CC BY licenses, which are the two primary license types available on figshare.com, as well as some software-specific licenses.
So clearly stating how people can re-use your data is very important for making it FAIR.
We make sure that research data that's published is available, discoverable across search engines and indices.
Again, the better the metadata, the better the discoverability, and we're making a commitment to FAIR data on that point.
Lastly, in terms of impact, we're offering a public author profile.
Listing views, downloads, citations, and Altmetric scores for the author, as well as for each individual item published in Figshare, allows you to see the impact of your work and report on that.
You can include those DOIs to datasets in annual reviews, in grant reports, in NIH biosketches, and things like that, to make sure that you're getting credit for all that work.
We're pulling citations from the full text of the scholarly literature, not just from the reference lists, where many people don't cite data; they just cite it in the text.
So looking across the full text of preprints and peer reviewed papers helps us better capture that citation information. And then we're providing faceted search for people to discover your data on Figshare platforms as well as in other search engines.
So the first step in thinking about sharing your NIH funded research in Figshare,
now with the new NIH policy, may be considering how to include Figshare in a data management and sharing plan that you would submit with your proposal.
So we have a few tips that we've compiled on our Help page for this, so a shout out to go look at that for some guidance when you are preparing this. Basically my suggestion, and you can find suggestions from NIH as well, is to plan what data will be shared, where, when, and how. And this is where it's important to point out that figshare.com and other generalist repositories
don't need to be used exclusively. They can be used jointly together with discipline specific repositories as well.
So when you think about where to share data for a specific grant project, you can consider using multiple repositories, say, for different versions or formats of the data. So maybe a certain format of the data belongs in a discipline specific repository.
But another version might be of interest to a different research community. You might put that in a generalist repository. Or maybe you have code or images or video that accompany that dataset that wouldn't fit in the discipline specific repository. You can say that you would put those in figshare.com.
And that would be a good solution to using multiple repositories for your data sharing.
But, really, I would encourage you to plan ahead for data sharing with your data management practices.
So, as you think about how data will be documented as it's collected and as it's analyzed, think prospectively about how to document it for sharing, for public use or re-use, so someone else knows what this data is. It's always easier to start that documentation from the beginning, rather than doing it all at the end. Then, at the time when a funder or a journal says, please deposit this dataset in the repository, you'll have documented it from the beginning.
That will help greatly.
And so here are just a few excerpts from that Help page. I won't read them all here, but there are a number of prompts there in italics to help guide you in how you might incorporate Figshare into a DMSP. Think about what type of data you would put in Figshare.
Where and when would you deposit it? What formats of files, and what types of documentation would accompany it? How would you license it so that it can be re-used, et cetera. And you can find other guidance from NIH in their DMSP template now as well.
So once you've got your plans underway you're ready to use Figshare and so I'll walk you through a little bit now of how that would work and how you can add high quality metadata to describe your research.
And make sure that it is discoverable and re-usable in the Figshare repository.
So first, account creation, if you don't have one, is quite simple. Just go into Figshare, account, register and sign up. You just need a name and an e-mail address.
You'll be able to create a free account.
Once you've created this account, I would definitely encourage you to set up a public researcher profile and to link your ORCID ID.
This will give you a landing page that will showcase all of the work you publish in Figshare.
And you can set it up with your ORCID, which is basically an individual identifier for a researcher.
If you don't have one, go get one right now. Our integration with ORCID allows you to push content both from ORCID to Figshare and vice versa, from Figshare to ORCID.
So then when someone is looking at your ORCID profile, they'll see all the content you've published in Figshare. You can decide to toggle on and off any of these configurations. If you don't want things migrating in one direction or the other, That's totally optional, but by adding your ORCID to your researcher profile, you'll ensure that your ORCID is included together with your name, in the metadata for all of your published data and research. And that's important to make sure you get credit for that work.
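Since ORCID iDs end up in the metadata of everything you publish, it can be worth checking them for typos before adding co-authors. The sketch below is not part of Figshare; it is a small standalone Python check, with function names of my own choosing, that uses the ISO 7064 MOD 11-2 check digit scheme ORCID documents for its identifiers.

```python
def orcid_check_digit(base_digits):
    """Compute the final character of an ORCID iD from its first 15 digits
    using the ISO 7064 MOD 11-2 scheme."""
    total = 0
    for ch in base_digits:
        total = (total + int(ch)) * 2
    result = (12 - total % 11) % 11
    return "X" if result == 10 else str(result)


def is_valid_orcid(orcid):
    """Return True if the string looks like a well-formed ORCID iD
    (16 characters once hyphens are removed, with a matching check digit)."""
    compact = orcid.replace("-", "").upper()
    if len(compact) != 16 or not compact[:15].isdigit():
        return False
    return orcid_check_digit(compact[:15]) == compact[15]


# 0000-0002-1825-0097 is the sample iD used in ORCID's own documentation.
print(is_valid_orcid("0000-0002-1825-0097"))
print(is_valid_orcid("0000-0002-1825-0098"))
```

A passing check only means the iD is well formed; it does not confirm that the iD belongs to the person you intend to credit.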
Here are a few examples of public author profiles.
So, you can upload an image if you would like, make this a showcase at your institution, and then it will show your cumulative metrics across all of your published work in the repository.
You can then upload files up to 20 gigabytes and publish datasets up to 20 gigabytes total, as well, each time. So, you'll see here, this is the total available storage you have. After you've published something publicly, that storage will be freed up, and you can upload more content again to your account.
To get started, you'll go to My Data and then Create New Item, or you can go to Upload.
This is where you may wish to use the API to automate these things, or use the FTP so that you can upload large files more efficiently, but our upload will also batch things through the browser so that you can do that over time.
Here's a graphic representation of what I was saying, about items and collections.
So, when you go to create a new item or upload files, that's going to create an item page. That will give you the metadata fields to fill in and the files, and you can choose what goes in an item, which is kind of that citable level. So, each item is going to have its own DOI that can be cited.
You can decide whether that item should have just a single file, or whether you want to have many files or even an archive of zip files that are part of this collection of files in the item. And I always say to think about this in terms of the way that someone would re-use the data and cite the data.
So if you have a collection of recordings from mice, and you have 20 different mice,
are people going to cite mouse one data and mouse two data and mouse three data separately?
Or are they more likely to cite the data recorded from all of the mice over time?
If so, then you probably want to put all of the data into a single item.
At the same time, you might have maybe multiple studies you did that are related. So you have experiment 1, 2, and 3, that are part of the same project.
You might want each experiment to be its own item, because maybe someone is going to replicate experiment two, but experiment three is not relevant to their replication. In that case, maybe each experiment is its own item. And then you're going to create a collection to group experiments 1, 2, and 3 together. And a collection is also going to have its own unique DOI and description.
But rather than having its own files, it will have other items within it that each have one or many files. So think through the organization of items and collections that way. That's helpful.
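To make the item and collection hierarchy concrete, here is a small Python sketch of the two layouts just described. The class and function names are hypothetical and only model the idea; they are not Figshare's data model or API.

```python
from dataclasses import dataclass, field


@dataclass
class Item:
    """One citable unit with its own DOI, holding one or many files."""
    title: str
    files: list


@dataclass
class Collection:
    """A citable grouping with its own DOI that holds items, not files."""
    title: str
    items: list = field(default_factory=list)

    def file_count(self):
        return sum(len(item.files) for item in self.items)


def citable_units(collection):
    """Count the DOIs this layout would produce: one per item plus the collection."""
    return 1 + len(collection.items)


# Layout 1: recordings from 20 mice that people would cite together go in one item.
mice = Item("Recordings from 20 mice", [f"mouse_{n:02d}.csv" for n in range(1, 21)])

# Layout 2: three related experiments, each its own item, grouped in a collection.
project = Collection(
    "Hypothetical project: experiments 1 to 3",
    [Item(f"Experiment {n}", [f"exp{n}_data.csv", f"exp{n}_readme.txt"]) for n in (1, 2, 3)],
)

print(len(mice.files), citable_units(project), project.file_count())
```

The question to ask of either layout is the one from the talk: which of these units would someone actually re-use and cite on its own?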
I guess I basically said this here, so when we share this slide deck, it will be useful for you to have this written down here, But you can upload one or many files to each of these.
Other considerations for files: try to opt for open and preservable file formats as much as possible, while also adhering to the standards that are used in your research community to maximize re-use.
So, there may be a research community custom. There may actually also be a requirement from your specific funding institute or center at NIH.
So, that's something else to bear in mind when writing your data management and sharing plan: while the broad NIH policy is open and flexible for researchers to consider what will work best,
specific institutes and programs at NIH may have their own requirements. And so do bear those in mind and incorporate them as necessary.
You may want to consider file formats, consider file naming, and also consider the documentation that you might want to upload as files as well as include in the metadata. So this might be a readme text file, a codebook, or a data dictionary.
You want someone who finds this dataset to know what it is, and how they can re-use it.
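As one way to start that documentation early, here is a minimal Python sketch that assembles a plain-text readme with a small data dictionary. The layout, function name, and example values are all hypothetical; neither NIH nor Figshare prescribes this format, so adapt it to your community's conventions.

```python
def build_readme(title, description, files, data_dictionary, license_name):
    """Assemble a plain-text readme: a title, a description, a file list,
    a data dictionary (column -> (unit, meaning)), and the license."""
    lines = [title, "=" * len(title), "", description, "", "Files", "-----"]
    for name, summary in files.items():
        lines.append(f"{name}: {summary}")
    lines += ["", "Data dictionary", "---------------"]
    for column, (unit, meaning) in data_dictionary.items():
        lines.append(f"{column} [{unit}]: {meaning}")
    lines += ["", f"License: {license_name}"]
    return "\n".join(lines) + "\n"


readme = build_readme(
    title="Example recordings supporting a hypothetical study",
    description="Placeholder summary of what was measured and how.",
    files={"recordings.csv": "one row per animal per session"},
    data_dictionary={
        "animal_id": ("none", "anonymous identifier for each animal"),
        "weight_g": ("grams", "body weight at the start of the session"),
    },
    license_name="CC BY 4.0",
)
print(readme)
```

The resulting text can be saved as a readme file and uploaded alongside the data, and the same content can be pasted into the item description.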
So, here's an example of what I mean in terms of many different files and file types, and the preview of files as well. As you can see, you can upload images and movies, and these are all previewable in the browser.
Similarly, spreadsheets and text files can be previewed here, or you can have multiple files, and then, sorry, I'm going backwards, you can look at each individual file separately. You can also preview the file names in a zip file.
It's worth pausing to mention here that Figshare is quite flexible in what types of research you share. So we're talking a lot about data today because that's largely the mandate of the new NIH policy is to share research data that underlies the findings.
However, Figshare.com can also be used to share many other item types: software, code, images, PDFs, multimedia files, posters, slide decks, presentations, anything that is a scholarly output associated with your work that you want to get credit for.
So, just to highlight that, you can upload many of these other file types as well, and link them together with your dataset, with your funding code, all of that.
Here's an example of a collection of items. So you will see the collection has its own title, its own description, its own metrics.
But then, within the collection are these 13 different items.
And then each of those items has its own description, DOI and multiple files within it. So, that's kind of how you would create this hierarchy of items and collections.
So, that brings us to best practices for metadata and for uploading and describing your work.
So, once you've uploaded the files, you'll be prompted to add information.
You want to have a meaningful title.
You want to provide enough metadata for discoverability to really provide context to the work and also to create those linkages with other related materials that you've published.
So here's an example of a well documented dataset, with quite a lot of content that gives you an overview of it. And this is taken from that NIH Figshare pilot from a few years ago.
But let's break down what this would look like when you are creating a draft item. So, here on the right, you'll see the draft metadata page that you'll get, the Edit Item page, we call it. Once you've uploaded some files, you'll be prompted to add this metadata.
So, first, the title. Do be sure to add a meaningful title that provides context for the dataset to stand on its own.
Please do not title it Dataset, or Dataset one, or dataset.xlsx. Also, importantly, this title should not be identical to the related paper title. This is largely for indexing purposes: you want to make sure that citations to the paper are counted separately from citations to the dataset.
And while they will have different DOIs, having a different title can also help to differentiate these in the scholarly literature.
However, that being said, you might want to reference the related paper or project title in the dataset title.
So, you may want to say "MRI dataset supporting the paper: bilateral, bilateral," whatever it is that your paper title is.
So do be pretty thorough with those titles.
For authors, again, be sure to add more than just yourself as an author, if appropriate. So with your Figshare account, it will automatically add your name first as the author. However, you can drag and drop the order of authors, and you can add other authors.
You can add people that have a Figshare.com or other Figshare repository account already, or you can also add them by name, e-mail address, or ORCID iD. Here I'm adding my cat, Theodore, Teddy.
He's done a lot of great scholarly work. And so I would encourage you to add ORCIDs for all of your co-authors if possible.
Again, it just helps make the metadata that much richer and helps everyone get credit for their work. The authorship list of a dataset or code or media file may be exactly the same as the associated paper or it may differ, and that's, you know, the judgement of your research group and your discipline, but do think about who should be credited with this work.
The category selection is next. We use the Australia and New Zealand Fields of Research categories as a controlled vocabulary for these.
You can see that there are nested categories and you can add multiple different categories.
Either just one or many.
I suggest adding as many as appropriate, usually one to three categories. And then you'll also want to add keywords, and I suggest adding at least five keywords for each item. These might reflect the research area, the methodology, the research question or conclusion, similar to the keywords you would add for a publication.
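The title, category, and keyword tips so far can be collected into a small self-check. What follows is only a rough Python sketch of that checklist: the function name `check_draft_metadata`, the list of generic titles, and the sample values are all made up for illustration and are not part of Figshare itself.

```python
def check_draft_metadata(title, paper_title, categories, keywords):
    """Return a list of warnings for a draft item's metadata (illustrative only)."""
    warnings = []
    normalized = title.strip().lower()
    # Generic titles give the dataset no context of its own.
    if normalized in {"dataset", "data", "dataset one", "dataset 1"}:
        warnings.append("title is too generic")
    # An identical title makes paper and dataset citations hard to tell apart.
    if paper_title and normalized == paper_title.strip().lower():
        warnings.append("title is identical to the related paper title")
    if len(categories) < 1:
        warnings.append("add at least one category")
    if len(keywords) < 5:
        warnings.append("add at least five keywords")
    return warnings


# Hypothetical draft with a generic title and too few keywords.
draft_warnings = check_draft_metadata(
    title="Dataset",
    paper_title="An example paper title",
    categories=["Neurosciences"],
    keywords=["MRI", "brain"],
)
print(draft_warnings)
```

An empty list means the draft passes these particular checks; it says nothing about whether the description or the author list is complete.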
The item type. Here's where, as I mentioned, we offer a lot of different item types, I believe about 16 on figshare.com and even more in some other Figshare repositories.
So, you can select whether this is a dataset, or a poster, or software, or a figure, or a media file.
And data is a broad term, and you might think that an image file is the dataset. And so, here, you know, do defer to what your research community standards are for that.
For the license, you can select a CC BY license, which requires attribution, or a CC0 license, which is basically the most open license possible and allows for the broadest re-use of work.
This is often recommended for research data, because it allows for the most re-use and recombination of the data.
So you may want to consider CC0 for your datasets and the CC BY license for maybe more text-based work or image files and things like that. There are also a number of different software-specific licenses if you're sharing code.
Funding, adding funding is obviously super important. If you're sharing NIH funded work, NIH would love to know what data is shared in generalist repositories, and to be able to track this and link it back to specific awards, grants, et cetera.
So luckily, we were able to work with our sister company Dimensions to add an integration where you can search for NIH funding specifically. So we have this funding field here in the metadata.
You can add free text to it and, you know, copy in your standard acknowledgement sentence, this work funded by National Institutes of Health grants numbers, and so forth.
But I would encourage you to search for the specific award and try to find it in this database as much as possible. I do find that, generally, NIH funding is pretty well indexed here.
And you can do this by searching for the grant title or the grant ID.
And if you're not sure of your grant title or grant ID, you can look it up in NIH RePORTER using your PI's name and so forth, and then find the right award.
But here, you can see that once you've started typing, you'll be given this auto-populating list. And so the tip for entering NIH funding is to begin with the activity code.
Sometimes there will be leading digits in front of this, a one or a zero, that you can omit. That's often the renewal code.
Start with the activity code, like R01, T32, K99, then the Institute code, which is going to be the two-letter code specific to your Institute, National Cancer Institute, National Heart, Lung, and Blood Institute, whatever that is, followed by the six-digit serial number code for the grant.
And if you enter it that way, you should find your grant. Then, once you've added it, it will look like this, and you can add multiple sets of funding. And then what will appear is the grant title, and then the funding body, which for NIH is going to be the specific Institute or Center within NIH that funded that work.
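The entry tip above (drop the leading digit, then activity code, Institute code, and six-digit serial number) can be sketched as a small Python normalizer. This is only an illustration: the regular expression, the function name `to_search_string`, and the sample award numbers are assumptions, not something taken from Figshare, Dimensions, or NIH, and real award numbers can carry other prefixes and suffixes.

```python
import re

# Assumed shape: optional one-digit leading code, a three-character activity
# code (R01, T32, K99), a two-letter Institute code, a six-digit serial
# number, then an optional suffix such as "-03" that is ignored.
GRANT_PATTERN = re.compile(
    r"^(?P<type>\d)?\s*"
    r"(?P<activity>[A-Z][A-Z0-9]\d)\s*"
    r"(?P<institute>[A-Z]{2})\s*"
    r"(?P<serial>\d{6})"
    r"(?:[-\s].*)?$"
)


def to_search_string(raw):
    """Return activity + Institute + serial, or None if the text does not fit."""
    match = GRANT_PATTERN.match(raw.strip().upper())
    if match is None:
        return None
    return match["activity"] + match["institute"] + match["serial"]


# Made-up award numbers, used only to show the shape of the input.
print(to_search_string("5R01MH123456-03"))  # R01MH123456
print(to_search_string("k99 hl654321"))     # K99HL654321
print(to_search_string("see acknowledgements"))  # None
```

Anything the pattern does not recognise comes back as `None`, which is a cue to look the award up in NIH RePORTER rather than guess.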
And you can see there is also some funding here that's been added as free text. So I've shown you examples of both.
However, if we look at one that was added with the Dimensions integration, the title of the grant will be clickable. If you click through on that hyperlink, it will open up information about that award in Dimensions, and someone can then read more about the funding in Dimensions, and see that this is an $11 million grant from the National Institute of Mental Health.
Describing how your work relates to related materials is also very important. So you want to make sure that you link datasets to preprints and peer-reviewed publications. You add that linkage in both directions, so the dataset DOI in the publication and the publication DOI in the dataset.
In figshare.com, you can do this with the references field. So you can link to really anything that is supporting material for the dataset.
So, it could be a GitHub repository, or a lab website or project website that describes the dataset and methods further. It could be a clinical trials registry or other pre-registration of the work, preprints in bioRxiv, or arXiv, or another preprint server,
a peer-reviewed publication, or related datasets in another repository. If you create a collection, I like to put the collection DOI as a reference in each of the items, so it's clear that there are other related materials available in the collection. You can also put that in the Description section, where you can put free text and hyperlinks and links to those DOIs.
And we've actually just redone our description section so that you have more text editing capabilities to add headings and bulleted lists and hyperlinks more easily now.
Last, one of the other options would be to restrict access, if necessary. So you can apply an embargo on the files, or on the entire item, if necessary. You can also put a reason for why you're doing this. So a common reason might be that you're publishing the dataset, but you're going to embargo the files, not make them publicly available until the associated publication is published, and then you'll release the dataset files at the same time as the paper.
And so this can be done by setting the embargo on the files. This is a good option, because it allows the DOI for the dataset to go live. The metadata would be live and public, just like you're seeing here on the right.
Someone can find that page, and a copy editor who's looking at your manuscript can make sure that the DOI pointing to the dataset works and is valid.
You'll know you're citing the right thing, but you can keep those files private until a little bit later, until you're ready to release them, when you can go back and edit the item and add the end date of the embargo.
The citable, trackable DOI that I mentioned is really critical. You can reserve this in advance at the bottom of the Edit Item page when it's a draft.
And we'll just add https://doi.org/ in front of the "10." identifier
that's there in order to make the URL, which will go live once you publish the item publicly.
These are permanent, unique DOIs that point to that public item. They are version controlled. So if you edit the item, certain changes will create a new version of the DOI that will be reflected publicly.
So, changes to the title, the files, or the author list will trigger a new version, but if you change, say, the description, or a link to a peer-reviewed publication, that will not create a new version, so you can keep making edits to the item over time. We would encourage you to keep that metadata as up to date as possible. And if you have new versions of the dataset, it's helpful for someone to be able to refer to the specific version. So if you do change the files because you find there are some new things to add, or a correction to make, someone can cite the specific version and go back and see previous ones as well.
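The DOI link and the versioning rule just described can be written down as a short Python sketch. The DOI used here is a made-up placeholder, the function names are hypothetical, and the set of version-triggering fields lists only the three examples named above (title, files, author list); the actual list may be longer.

```python
# Fields named in the talk as triggering a new version; assumed incomplete.
VERSIONED_FIELDS = {"title", "files", "authors"}


def doi_url(doi):
    """Turn a reserved DOI such as '10.1234/example.5678' into a resolvable link."""
    return "https://doi.org/" + doi.strip()


def creates_new_version(changed_fields):
    """True if any of the edited fields is one that triggers a new version."""
    return bool(VERSIONED_FIELDS & set(changed_fields))


print(doi_url("10.1234/example.5678"))                   # https://doi.org/10.1234/example.5678
print(creates_new_version(["description", "references"]))  # False
print(creates_new_version(["description", "files"]))        # True
```

The link only resolves once the item has been published publicly, as noted above.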
These are the DOIs that you want to include in publications,
link to on your webpages, and include in grant reports and biosketches, and things like that.
Once you've published something on Figshare, it is open. It's discoverable. It's indexed in Google and Google Dataset Search, in DataCite Commons, and in a number of other search engines.
So someone can find it across repositories, not just by looking on Figshare.com. And then we track the usage at the item level. So, here's an example of a fairly well-used item. It's actually a piece of software, which tends to get more citations than datasets, at least in the current day and age. And you'll see views, downloads, and citations for each item,
as well as this Altmetric score. Altmetric is another sister company of ours, and they'll show attention that is outside of scholarly citation: things like news media, Twitter, blog posts, Wikipedia pages, and things like that.
So, you can more fully capture the attention your work is getting, and you can click through on this Altmetric logo, this donut. You'll get the colored donut and a breakdown of where this attention is coming from, which can be really helpful for some of your work that's getting quite a lot of attention.
In the last few minutes, I want to mention our newest repository, Figshare+, for sharing big datasets. This is built on the same Figshare infrastructure, just as discoverable, just as open.
But it's created as a way for researchers to sustainably share larger datasets.
So, we often get requests from researchers to share bigger files and bigger datasets: 50 gigabytes, 100 gigabytes, a terabyte, terabytes. And we try to support the research community as much as we can, and we were trying to grant those requests.
But to be fully truthful, we have Cloud storage costs for storing that data redundantly, and permanently in the cloud.
And so, as has been recognized by research funders, data sharing does include costs. Data curation, data publishing, and data storage all have associated costs. So we created Figshare+ as a repository that has transparent data publishing charges, a one-time fee to publish larger datasets.
So, you can publish datasets over 100 gigabytes, up to many terabytes, the single file limit being up to five terabytes.
And that's to really support the use cases that we've been seeing, especially for NIH-funded work, of there being a need for flexible generalist repositories for larger datasets.
So we have that transparent data publishing charge listed on the web pages, starting at a few hundred dollars for 100 gigabytes and going from there, based on the storage size that you need to publish for that specific dataset.
And we also offer a few other things on Figshare+ to make sure that we're meeting the needs of these big dataset use cases as best we can.
We offer a few more license options, some of the Creative Commons licenses that have certain restrictions, like NonCommercial or ShareAlike.
And we also provide custom, individual support for data deposit from our Figshare experts, who will review your dataset and your metadata and make recommendations and suggest edits, if necessary, before the dataset is published. And so, we did this previously with some NIH work with our pilot. We found it was really impactful to have a human curator look at the dataset and metadata and make suggestions so that it could be as discoverable and re-usable as possible.
And so that's another service we offer as part of the Figshare+ deposit process, which looks like this: after you submit it,
it comes to our team for review, we make some suggestions, you can revise directly in your Figshare+ account, we review the revisions, and then publish the dataset, discoverable in the Figshare+ repository as well as on Figshare.com, as well as in all the search engines, just as open.
Here's an example of an NIH-funded dataset that's about five terabytes
that was published.
And here's another one that is a collection of seven different datasets in Figshare+ as well.
So, with that, I'm going to stop. I'll point you to a couple of guiding materials on our help page.
We have a guide to sharing NIH-funded research, including Figshare in a DMP or a DMSP for NIH, and a guide to sharing data on Figshare+,
giving you these tips about including metadata and so forth.
I would also make a plug for another webinar series that I am planning together with GREI. So together with these other generalist repositories and NIH, we're doing a collaborative webinar series. And you can find out more about all of these generalist repositories at the October one, as well as about including them in DMSPs
and best practices for data sharing at our November and December webinars.
And we'll have some other Figshare webinars coming up as well, specifically on Figshare+, and also on using our API on November first. So, do look for registrations for those if you would like more detailed guidance on any of those.
Thank you so much for joining us today, and if you have questions, please put them in the question box.
I'll be happy to stay back here a few minutes and answer any of your questions.
Thanks so much, thanks a lot. And, yeah, your presentation on Figshare+ reminded me that the registration for that webinar should be out in the next couple of days. That's going to be on October 13th, and that webinar is an Introduction to Sharing Big Data on Figshare+. So, yeah, keep an eye out for that. We do have a couple of questions in the chat already. So the first thing we have is: is there a quality control mechanism in case someone links to the incorrect funding source, accidentally or otherwise?
Great question. So in terms of linking to the funding, this is sort of where that curation comes in, and on figshare.com we don't have that. That would be something that each researcher would need to add themselves,
and there wouldn't be a check of it by someone else. I suppose if you found someone had linked to your grant, and it was not your work, you could get in touch with them and say, I think there's an error here. But that's a really good point. And it's something we should actually consider as part of the GREI project and all the generalist repositories together with NIH,
as reporting of open data becomes increasingly important to see if someone is meeting this policy requirement and complying with their data management policy or their data management plan. You know, how will program officers check up on that?
And what if there is an error there? I do believe that in ORCID you can report errors in linking some of the data. And so, at the moment, I would say contact our support team or contact the author directly if you notice something is an error. I will say that with Figshare+, where we have our team doing review of the metadata, I do try to look at the grants, try to see if the grants match some of the authors, the PIs,
or at least make sure that the awards that were added are real NIH grant codes in the standard format, and I do find typos
a not insignificant amount of the time.
And I think, you know, if you've been there, you've done that: you're in a research lab, and everyone copies the same sentence from their last poster into their next paper. And if there's a typo in that sentence, then everyone is copying the same error in the grant code over and over. So we try to intervene as we can, and I would suggest just checking NIH RePORTER to make sure you have the right one.
Thank you. The other question we have is whether the collection function within figshare.com collates citation and Altmetric data for all of the individual items.
I believe that the collection DOI has its own metrics.
So in that sense, it is not an aggregation of each of the items within the collection. Rather, the collection has its own metrics.
And, so, in that sense, you would want to add together the metrics of the individual items yourself, if that's what you wanted to report.
But it can be advantageous to have that collection DOI itself, so that it's clear what you want people to cite. And something you could include in a readme file or documentation would be a preferred citation, asking people to cite the collection DOI when they re-use any of those materials.
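Since the collection's own metrics are described here as separate from those of its items, adding the item-level numbers together is left to the researcher. A minimal Python sketch of that tally follows; the function name `total_metrics`, the dictionary keys, and the counts are placeholders invented for illustration, not real usage data or a Figshare data format.

```python
def total_metrics(items):
    """Sum views, downloads, and citations across the items of a collection."""
    totals = {"views": 0, "downloads": 0, "citations": 0}
    for item in items:
        for name in totals:
            # A metric missing from an item counts as zero.
            totals[name] += item.get(name, 0)
    return totals


# Placeholder numbers copied by hand from three hypothetical item pages.
collection_items = [
    {"views": 120, "downloads": 30, "citations": 2},
    {"views": 45, "downloads": 10},
    {"views": 300, "downloads": 95, "citations": 1},
]
print(total_metrics(collection_items))
```

The summed figures are best reported next to the collection's own metrics, not merged into them, so that the two are not mistaken for one another.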
Right, thank you. There are no other questions in the chat at the minute. The contact details are there on the slide on the screen. So if anyone thinks of anything after this session, then you can always reach out that way. And, as I said at the start, we'll be sharing the recording with everyone in the next couple of days. And, as we mentioned, we've got some other exciting webinars coming up as well. But I think that's about it for today. So, thanks for your presentation. And thank you, everyone, for joining.
Thank you, everyone. Have a good day.