Using the State of Open Data survey to put the NIH Policy on Data Management and Sharing into practice

May 10, 2022

Figshare

Ana Van Gulick

Join us for a webinar on how the State of Open Data survey — the annual survey on researchers’ attitudes toward open data and data sharing — can help your institution put the NIH Policy on Data Management and Sharing into practice. In this webinar, Figshare’s Government and Funder Lead, Ana Van Gulick, will go through a few of the key survey results and how you can use these to educate researchers about data sharing best practices and implement data sharing support in compliance with the NIH Policy on Data Management and Sharing. These survey results include:

- Researcher familiarity with and creation of data management plans

- How researchers share their data with the public

- At what point in the research cycle researchers make their data available, if at all

- and more!

Transcript

Please note that the transcript was generated with software and may not be entirely correct.

0:04

Hi everyone, thanks for joining us today for this webinar. My name is Megan Hardeman, and I'm the Product Marketing Manager at Figshare. Before I hand over to Ana to discuss using the State of Open Data survey to put the NIH Policy on Data Management and Sharing into practice, I have a few pieces of administration to go through. The first is that this webinar is being recorded, and we will send around the recording, and any relevant links that Ana discusses, in an email following the end of the webinar.

0:41

And if you have any questions, there is a Questions section in the control panel for GoToWebinar, and there's a Chat section, and you can put your question in either place. There will be some time at the end when we'll answer them for you.

0:59

I think that's everything. So with that, I'll hand over to you, Ana.

1:03

Great, thanks Megan.

1:04

Hi everyone. Good morning, good afternoon, and good evening. Thanks for joining us. I'm Ana Van Gulick, the Government & Funder Lead here at Figshare and Head of Data Curation, joining you this morning from a sunny day in Seattle. One note of apology: my next-door neighbor has decided to do some construction on the siding facing my office window this morning.

1:29

So if you hear a little construction noise, sorry about that. I'm hoping it's almost over. Today, we're going to be talking about our State of Open Data survey and some key results from it, and how those relate to the new NIH Policy on Data Management and Sharing. We'll try to pull together the current practices and trends that we're seeing, how those might be affected by the upcoming policy, and how you can support your researchers in their data management and sharing needs, for NIH or otherwise.

2:06

So please do put your questions in the chat. I will certainly try to save time to get to those at the end.

2:16

All right.

2:17

So I'll begin with the state of open data results. This is a survey that we've been conducting for six years now. And we've had over 21,000 respondents during that time from 192 different countries.

2:31

We've asked some of the same questions, as well as some variable questions, over the years. Asking some of the same questions every year of the survey has given us a nice, sustained look at the state of data over time and the trends that we can see, the shifts towards more open science.

2:54

But also the changes in what concerns people have about how to share their data in a way that you know, is rewarded and that they're comfortable with.

3:05

So I do encourage you to go find the full State of Open Data Report.

3:10

There are a lot of great essays in there, guest essays that we've compiled from different experts in the field, and you can dig into the reports, as well as all of the results, at the DOI link here or at our State of Open Data website on our knowledge portal.

3:28

So, please do go check that out to learn a little bit more. I'll just be giving a brief review today.

3:36

So in our 2021 survey, which is the set of results we're talking about today in 2022 because it is still our most recent survey, we had about 5,000 responses in total that we analyzed in the results. These came from all over the world, with the largest representation from Europe and Asia, as well as from North America.

4:00

The fields of interest that our respondents work in were largely in the sciences: the biomedical and biological sciences, applied sciences, physical sciences, and Earth sciences, but also the social sciences and a number of other fields. And when we look at career stage, inferred from the date of their first peer-reviewed publication, we found that more than half of people could be considered late-career researchers. But we also had about 30% of respondents who were very early-career researchers.

4:35

So, it's interesting to look at the results broken down by career stage as well, to see sort of the generational shift.

4:45

One big trend to note was, of course, the context this survey was done in, which was during the COVID-19 pandemic, which has, of course, huge implications for the research community.

4:58

We got huge benefits from scientific research being done at a rapid pace, through genomes being shared in vaccine development, and it also impacted how everyday researchers were working at their institutions. Maybe they weren't able to go in and collect data in the lab during the pandemic when universities were closed. So how did they shift their work? I think one thing that people did was shift to re-using datasets, their own or others'. How could you use that open data as a starting point to conduct novel analyses and new research?

5:35

So, in our 2021 survey, about a third of respondents indicated that they have re-used their own or someone else's openly accessible data more during the COVID-19 pandemic than they previously did. So we may be starting to see sort of that open data world of science.

5:55

It's not just individual labs collecting datasets; large datasets are actually being aggregated, and datasets are being re-used for discovery.

6:07

So, I'll focus today on three key takeaways from the results of the survey. The first one of these is actually a little bit surprising, perhaps, and that is that there's more concern about data sharing than ever.

6:20

So, one question we asked in the survey is, "What problems or concerns, if any, do you have with sharing datasets?" We saw that respondents gave misuse of data as a top concern. But they were also concerned about not receiving credit, about issues with copyright or data licensing, sensitive data, permission to share, costs, things like that.

6:46

It's perhaps curious that, since open data has been growing steadily over the past decade, people might be more concerned about sharing data. But again, you might actually think this could tie into the pandemic. Preprints became much more common in the last couple of years.

7:05

Even more than before, there has been very steep growth during COVID-19, especially in the eye of the public, with reporters reporting on the results of preprints in newspapers, and the public learning about this work that's available but not peer reviewed yet.

7:22

And it may be that kind of quick dissemination of research results that gives people pause, that their data could be taken out of context or could be misused.

7:35

And just a general hesitancy, or fear, about that. That's speculation on my part. But it does perhaps stem from the culture, from what we saw in research in 2021.

7:49

But we can also look at these concerns longitudinally. So here on the right, you'll see these concerns plotted from the survey results for 2018, 2019, 2020, and 2021.

8:00

And so you can see that concern about the misuse of data has actually grown steadily over time, growing from 36% to 47, 43% concern.

8:09

So have issues about not receiving appropriate credit or acknowledgment. Licensing seems to have always been a concern. Copyright and licensing are always a bit of a challenge for people who don't work in those fields every day.

8:24

Always a little hard to interpret what to do.

8:29

Cost of data sharing is something that went up quite a lot from 2018, and we'll talk about big datasets as well.

8:38

So, some of these are growing. Some are holding steady.

8:43

The second key takeaway that we saw in the 2021 survey results is that there is more familiarity than ever before with the FAIR data principles, and more compliance with them.

8:55

So in these results, 66% of respondents had heard of the FAIR data principles: that open data should be findable, accessible, interoperable, and re-usable.

9:07

And 54% thought that their data was very much or somewhat compliant with the FAIR principles.

9:18

So this is a great story for those of us who have been working in the field for a while, trying to use the FAIR principles to do outreach and training to researchers, to emphasize the importance of data not just being open but being FAIR: being well documented, discoverable, re-usable. I think a few years ago, only about 30% had heard of these principles. So the fact that this has grown over time is a success story for open data and data support.

9:51

Then the third key takeaway was about who researchers turn to for support. We asked researchers, "Who would you turn to to help make your data openly available? Where would you turn for support?"

10:06

We had about an equal split between repositories, publishers, and institutional libraries.

10:13

So this is probably great news for those of us who are doing outreach.

10:19

We don't need to compete over who can help researchers more. It shows that all of us are stakeholders in this community and have an important role to play in supporting data sharing, whether you're a publisher, a repository, an academic institution, or an academic library.

10:39

I think that's particularly good news for libraries who have been working really hard over the last decade to build up data management and data sharing expertise, as well as resources.

10:50

So to see that researchers are turning there is good news, and hopefully that will continue to grow.

10:59

So here's a few key takeaways for a couple of these segments.

11:03

So for institutions, 30% said they would rely upon their institutional library for help. Almost half said that they share their research data in an institutional repository. That may be a bias in our survey sample because, as someone who previously worked at an academic library, half of respondents seems high. But it's really great to see that people are recognizing institutional repositories as a valuable resource.

11:34

And 58% of respondents would like greater guidance from their institution on how to comply with data sharing policies. So what's key for the new NIH Data Management and Sharing Policy is that researchers are looking for guidance and looking for support.

11:50

They want institutional support in terms of writing data management and sharing plans, as well as in actually managing and sharing that data in the end. So having that guidance on hand, I know, will really help them out.

12:08

A few key takeaways for publishers. I don't know if we have publishers joining us today.

12:12

But they're certainly a key part of this broader research ecosystem. And that's reinforced by the fact that 47% of survey respondents said they would be motivated to share their data if there was a journal or publisher mandate to do so. Those mandates certainly hold great weight in the research community.

12:34

Peer-reviewed journal publications are still the gold standard for academic research and for hiring and promotion standards. So those mandates really push progress in the open data field.

12:52

It's also interesting to see that more than half of respondents said they had obtained research data collected by other research groups from within a published research article, which can sometimes be challenging to do, to get that open data and actually re-use it. So it's interesting to see that people are trying to do that, and maybe as they try to re-use other people's data, how they share their own data will improve to some more reasonable standards.

13:22

And then similarly, another half of respondents felt that it was extremely important that data are available from a publicly available repository.

13:32

So, this reinforces the point that publishers, and all of us in the community, should support researchers in sharing data in a trusted repository that meets repository and community standards for identifiers, discoverability, metadata, things like that. Researchers do see a difference between that and data shared as supplemental files or tables, or shared through links, or "available upon request", right? Those are not truly very accessible or re-usable. In fact, the dataset may not even be there when they inquire about it. So having that dataset in a repository where you know it is available does seem important to them.

14:18

So for publishers, I think we've seen, over the last 10, 15, or even 20 years,

14:27

A huge increase in publisher data sharing requirements. And now, I think, it's not just that those requirements exist, but that they're actually being checked more as well. It's not enough for a data availability statement to simply exist. There needs to be a DOI pointing to a data repository, and that DOI needs to be live.

14:49

And when a copy editor goes there, there needs to be a dataset at the end of that DOI. So that's really great news to help move the needle on data sharing adoption.

15:02

OK, and the last key takeaways are for funders and government agencies that are supporting the research work.

15:09

So here, interestingly, about half of our respondents said funders should make the sharing of research data part of the requirements for awarding grants.

15:18

And, almost as many said, that funding should actually be withheld or some other penalties should be incurred if researchers don't share this data.

15:26

So, they want this mandate to really be in effect, to have it carry water.

15:33

Not just on paper, but to actually be checked on, and to have people comply with it.

15:41

In the survey, even more people felt that a national mandate for making research data openly available would be a good idea.

15:50

So, really strong support for funders encouraging the sharing of research results, and research data from their funded work. This is, you know, in many cases, publicly funded work.

16:04

And I think there may be a strong feeling that the public needs to have access to it, and that the data is really valuable and needs to be re-used more than once to get that full value out of it.

16:17

And then, of course, lastly, coming back to the question: who can we get guidance from?

16:22

Researchers say they will turn to their funders for guidance on how to comply with their policies. So I know this is something NIH is now working on with their policy, in terms of rolling out additional FAQs and guides and examples, and that will be really important for researchers, and for all of us supporting NIH funded researchers, because researchers will turn to you for that help.

16:47

So, again, just to say that funder data sharing policies have really grown in the last decade, since the 2013 OSTP memo about data sharing for research and development from federal agencies. We've seen more and more federal agencies, as well as private research funders like the Gates Foundation, HHMI, and the Wellcome Trust, mandating that people have data sharing plans in place and that they report on where that data is shared.

17:24

And so this is of growing interest for all of us, and we can look at this timeline here. The previous NIH data sharing policy, for very large awards, goes back to 2003.

17:39

Then we have the NSF Data Management Plan requirements coming in 2011, requiring data management plans for all NSF proposals, and I do think that we'll probably see that requirement continuing to expand and grow in coming years.

17:57

The OSTP memo expanding Public Access to research results.

18:02

And then we saw some changes from NIH in the past 10 years that were specific to genomic data sharing and clinical trial information. And now, beginning in January, we'll have the new data management and sharing policy.

18:18

So here are the notices and the awards.

18:21

I'm sure many of you have reviewed these in depth already, and maybe were even involved in providing comments on the policy back in 2020 and 2019. But we are now reaching the end of that long-awaited rollout. This policy will go into effect in January of 2023, just about eight months from now. So soon, it will be top of mind for researchers submitting new NIH proposals as of January 25th.

18:53

And so, these are the notices.

18:57

And I wanted to point out, for those of you who may not have caught the news, a new NIH website that they launched a few weeks to a month ago: sharing.nih.gov.

19:11

And this is a really great resource they've put together for all of their scientific data sharing policies. You can find their new Data Management and Sharing Policy here, as well as those for more specific types of research, and guidance too. And I believe this website will be added to over time.

19:30

So, to briefly run through sort of the highlights of this policy.

19:36

I don't speak for NIH in this context, so please do make sure that you contact program officers at NIH early on about how you will really be impacted. But, from my perspective and my reading of this, here are some highlights of the policy.

19:53

So, the policy will require that, for all NIH-funded research that generates scientific data, a data management and sharing plan be submitted and be evaluated on an ongoing basis. How that will work is to be determined by the institutes.

20:12

And so this will apply to extramural grants, those going to academic institutions, as well as to contracts and to intramural research. So it's really quite broad: any NIH-funded research generating research data.

20:27

And this data is described as the recorded factual material commonly accepted in the scientific community as of sufficient quality to validate and replicate research findings.

20:38

It doesn't necessarily mean every output generated during the research process.

20:44

It's not electronic lab notebooks or notes. But it is the findings that you would need to replicate results, which is certainly very important for open science and for replication. And data must be shared regardless of whether it supports a publication or not.

21:03

So importantly, it should also cover null results.

21:07

Replications, too, and things that may not be published. That data can be very valuable to the scientific community, and should really be thought of as a valuable output on its own, independent of the peer-reviewed publication.

21:21

That's certainly something that's been top of mind on academic Twitter this week: the value of open data and open science practices on their own.

21:31

Certainly, from my perspective, we should value that as a scientific contribution.

21:37

So, this data should be shared as soon as possible.

21:39

But at least by the time of the publication, or by the end of the award period, whichever comes first.
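For anyone building a checklist or reminder tool around this timing rule, the "whichever comes first" logic can be sketched in a few lines of Python. This is only an illustrative sketch of the rule as described in the talk: the function name `latest_sharing_date` and the example dates are hypothetical, and the policy text and program officers, not this snippet, are the authority on actual deadlines.

```python
from datetime import date
from typing import Optional


def latest_sharing_date(award_end: date, publication: Optional[date] = None) -> date:
    """Return the latest date by which data should be shared.

    Assumption (taken from the talk): data should be shared by the time of
    publication or by the end of the award period, whichever comes first.
    """
    if publication is None:
        # No associated publication (for example, null results):
        # the end of the award period is the backstop.
        return award_end
    return min(publication, award_end)


# Hypothetical dates, for illustration only.
print(latest_sharing_date(date(2027, 6, 30), date(2026, 3, 1)))  # 2026-03-01
print(latest_sharing_date(date(2027, 6, 30)))                    # 2027-06-30
```

Because data must be shared whether or not it supports a publication, the sketch treats the publication date as optional and falls back to the award end date.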

21:46

So, something to bear in mind, when researchers are writing these data management plans, is the timeline of data sharing.

21:53

And not all data must be shared. That's an important thing. The mandate doesn't say every single piece of data.

22:02

But the data management and sharing plan should encourage broad data sharing, supporting these replications and null results.

22:12

And in trying to maximize the value of the open data that's generated, researchers are encouraged to make the data more FAIR, to adhere to those FAIR principles, and to share the data in established and trusted repositories that follow community standards for metadata and persistent identifiers and that have appropriate preservation plans in place, to make the work discoverable.

22:41

They're encouraged to use discipline- and method-specific repositories first, those that exist for the type of data, for the methodology, for the research field, for the file type.

22:52

These obviously maximize discoverability and re-use, they can have very specific metadata.

22:59

So that's important. But the policy also suggests trusted generalist and institutional repositories, as many types of data will not have a discipline-specific repository that is appropriate.

23:16

So, again, turning back to the sharing.nih.gov site, here's some additional guidance they provide about planning and budgeting for data management and sharing. These data management and sharing costs are allowable direct costs in the award.

23:33

So researchers should be encouraged to plan ahead for any costs associated with these processes, whether that's curation of the data, you know, staff time to, to build public databases.

23:46

Certainly, data management and sharing is not without a labor component, right?

23:51

One needs to dedicate time to managing this process, doing that documentation, and maybe even bringing in experts from across the institution to do that. And then data sharing may similarly have costs, for long-term data storage, especially for very large datasets, or for curation or review in specific repositories.

24:14

So those are allowable costs.

24:16

Program officers at specific institutes will be able to speak to how they'll be reviewing the data management and sharing plan. Sorry, it's a mouthful.

24:28

The DMSP will be, I understand, available to peer reviewers to review, especially the budgets, but will not be scored by them. My understanding is that review and scoring will be done by the program officer.

24:45

So, that will be the best bet for researchers, looking to have very specific questions answered for their field, because there may be institute specific differences and discipline specific differences in what they're looking to see in these plans.

25:01

That includes whether there are community standards that should be adhered to. An important note for supporting your researchers: if you go to the scientific data sharing site, they do have a page on selecting a data repository.

25:15

And this is where I want to talk about the data repository ecosystem, and specifically those domain-specific repositories, for which NIH has put together a nice list here. They also point out to the ...

25:29

data database to help researchers find appropriate repositories, as well as listing here those that are supported by NIH, which are many, including those for, say, genomic data sharing, which should still be prioritized as the discipline-specific repository.

25:48

But then they're also pointing to generalist repositories. They can't point to every institutional repository, but hopefully researchers are aware of them. They're also saying, we realize there are gaps in the discipline-specific repository space, and there are many other valuable research outputs to be shared. Generalist repositories are trusted repository sources for those. So those include Dryad, Figshare, Open Science Framework, and Zenodo.

26:18

And researchers can use those.

26:21

And so, thinking about the generalist repositories in this space, here's a graph of citations of data in generalist repositories.

26:33

So these are citations pulled from the Dimensions database, looking for the DOIs of those specific generalist repositories over the past 10 years. And you can see a really continuous growth of these. ... is kind of running away with it in the last three years, interestingly.

26:55

In terms of the number of citations, maybe they have more software, which tends to be cited more than other research outputs. But overall, for all of these repositories, with Figshare coming second there, in the GREI recently,

27:12

You can see that researchers are turning to these flexible repositories and that research in these repositories is being cited in the scholarly literature. Whether that's a primary citation or a citation of re-use is an interesting question for us to continue looking at down the road, but they're a valuable part of that landscape.

27:35

Another thing NIH has outlined in their DMS policy is some desirable repository characteristics.

27:48

These are, to my understanding, largely taken from the White House OSTP repository characteristics. But they also include some considerations specifically for human subjects data that could be important in certain situations.

28:04

They focus on open access, persistent identifiers, metadata, re-use that can be measured, security, recorded provenance, and the ability to restrict access when necessary. We do have a guide on our Help page about how Figshare meets these desirable characteristics, and it won't be for every type of data.

28:28

Say, for some of the very restricted data types, there may be a better option. But largely, we're in compliance with almost all of these characteristics. So researchers and those supporting them can find that on our Help page.

28:43

And we've been fortunate to work with NIH for a few years now on data sharing initiatives and on generalist repositories in this space of supporting NIH-funded data sharing. We conducted a pilot repository from 2019 to 2020. The repository is still available at nih.figshare.com. All of the data is still publicly available.

29:07

You can go find it there today.

29:10

And this was to look at the need for a generalist repository among NIH-funded researchers. And what we saw was that there really was a need for this.

29:21

Researchers had a lot of different datasets to share.

29:25

We also found an impact of having someone review the metadata, to try to make the metadata of these datasets high quality and complete: to make sure there was a meaningful title, that funding sources were really linked to, that associated publications were linked out to.

29:44

That people knew where to find more resources, and that context about the work was given.

29:49

So, something was not simply titled "dataset.excel",

29:53

"Dataset 1", or "mouse data", as people will commonly do if they're not aware of these best practices. So, with human support and human review of the datasets, we saw a big impact in this project.

30:12

And now we're pleased to be working with NIH once again, together with five other generalist repositories.

30:20

So this is a project that was announced at the end of January and kicked off this year.

30:25

So it's called the GREI, the Generalist Repository Ecosystem Initiative, and it's being run by the NIH Office of Data Science Strategy.

30:34

And it's bringing Figshare together with Dryad, Dataverse, Mendeley Data, Open Science Framework and ...

30:42

to work on growing the generalist repository landscape to support NIH-funded research data.

30:50

How can we have standards that are common and interoperable between these repositories for discoverability, for indexing, for metrics of re-use? How can we also have differences among the repositories that best meet the needs of different NIH funded use cases?

31:11

And how can we together support data sharing and, importantly, reporting on data sharing as well, which is very important for NIH?

31:23

So this is where it's really a community effort to get to open data. You know, we've seen that growth in open data over the last 10 years.

31:31

I think the next 10 years looks like a big growth in data that is not just open, but that is FAIR.

31:39

So that's where the academic research institutions are a really key part of this ecosystem, as well as research communities, and then the funders themselves, the infrastructure, and the publishers as well.

31:56

So we can collaborate on these infrastructure, policies, outreach and training, supporting researchers.

32:05

So that brings us to a little bit about Figshare that I'll share today, about how Figshare might support your NIH-funded researchers. We just celebrated 10 years of Figshare. We're actually celebrating that birthday all year. It feels like a big milestone. That whole trajectory of growth in generalist repositories, we've been there for it. On figshare.com, there are now more than four million research outputs.

32:34

Half a million users, hundreds of terabytes of data stored, and more than 100,000 citations in the scholarly literature to that work.

32:44

And we're also really proud to provide repository infrastructure to more than 80 institutions to run their own research repositories.

32:54

So when we think about this repository landscape, we have the discipline-specific repositories that are funded by NIH: GenBank, dbGaP. Very specific, really great for re-use of specific types of data. Certainly in genetics, it's had a huge impact. We've seen that.

33:12

We saw it in COVID-19. Then we have the generalist repositories, like those funded by the GREI Initiative, and that includes figshare.com.

33:20

And then we have the institutional repositories which you may have at your institution or you may be thinking about, expanding at your institution to make sure that they are data capable and really supportive of data sharing, use cases or for this type of work.

33:40

And so, about figshare.com: that's a freely available repository.

33:48

And we will always continue offering that freely to researchers. They can share datasets up to 20 gigabytes, and files up to 20 gigabytes here. It offers flexibility, meets researcher workflows, adheres to those persistent metadata standards.

34:06

And it is a great way for researchers to share their work in a way that's fully open access. Everything on figshare.com is fully accessible to humans and machines, and can be downloaded via our API as well.

34:24

And researchers can see the impact of their work. They can see openly tracked metrics.

34:29

They can create a researcher profile page and get started with data sharing through our free generalist repository option.

34:40

We also recently, last fall, launched a new Figshare repository called Figshare Plus.

34:48

This repository is at plus.figshare.com, and it is designed to support larger datasets, so those over the 20 gigabyte limit. And this is simply because we had a lot of requests from researchers coming into our support team.

35:03

They were saying, "I really want to use Figshare, but I have a really big dataset." Sometimes that's 80 gigabytes, sometimes it's 250 gigabytes, sometimes it's nine terabytes. Data is growing in volume. Computing power is growing. The need for large-scale data for machine learning and data science, in biomedical fields and really across all fields of research, is growing. So I think these large datasets are going to become more common, and we want to support researchers in all of these use cases. We don't want to turn them away. But it was simply an issue of sustainability for us to offer a free service.

35:43

You know, we need to be able to cover our costs.

35:45

And once you start hosting large datasets redundantly in the cloud for many years, there is a true cost associated with that cloud storage. So, we designed Figshare Plus so that we could do this in a sustainable way. And that was through a one-time data publishing charge.

36:05

And so researchers can see those charges transparently listed on our Figshare Plus website.

36:12

Actually, on our Knowledge portal, knowledge.figshare.com/plus.

36:17

And build those into their data management and sharing plan. So you can see those costs and plan for them ahead of time.

36:24

Write them into grants, whether it's 1 terabyte or 10 terabytes. Either is fine. The limit on files here is going to be five terabytes per individual file, which is just an AWS limit.
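The size limits mentioned above can be sketched as a small planning helper. This is only an illustration: the function name, the return labels, and the use of decimal units (1 GB = 10^9 bytes) are assumptions, and the thresholds are simply the 20 gigabyte and five terabyte figures quoted in the talk.

```python
GB = 10**9   # assumption: decimal gigabytes
TB = 10**12  # assumption: decimal terabytes

FIGSHARE_COM_LIMIT = 20 * GB       # dataset limit on Figshare dot com, as quoted in the talk
FIGSHARE_PLUS_FILE_LIMIT = 5 * TB  # per-file limit on Figshare Plus, as quoted in the talk


def suggest_repository(file_sizes_bytes):
    """Suggest where a dataset might fit, given its file sizes in bytes.

    Hypothetical helper for planning purposes only; it does not talk to Figshare.
    """
    total = sum(file_sizes_bytes)
    largest = max(file_sizes_bytes, default=0)
    if largest > FIGSHARE_PLUS_FILE_LIMIT:
        # A single file is over the five terabyte per-file limit.
        return "split files"
    if total <= FIGSHARE_COM_LIMIT:
        return "figshare.com"
    return "Figshare Plus"
```

A helper like this could be run over a planned dataset when drafting a data management and sharing plan, so that any publishing charge can be looked up and budgeted ahead of time.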

36:38

And the other thing we're doing with Figshare plus is offering a little bit more metadata and a little bit more customer support.

36:45

So we're taking the lessons we learned from the NIH Figshare pilot: that one-on-one support during data deposit really helps make the data more FAIR, and that reviewing the metadata enhances the discoverability of the work quite a lot.

37:00

So we are also assisting researchers during that data deposit phase.

37:06

And hopefully that will make a big impact as well.

37:10

It's something we're actually interested in doing with Figshare dot com as well, which is simply a scaling issue when you have millions of research outputs. So that's something we're working on with the GREI project: scaling metadata quality.

37:25

How can we do that without human curators? Are there ways to nudge best practices or to automate data curation?

37:35

And then the other way that we support data sharing is through Figshare for Institutions.

37:42

So Figshare for institutions is a customizable, standards compliant research repository.

37:49

Out of the box, ready to go, but very customizable to make your own to support your organization's research data management needs and to provide open access to any type of research outputs that your researchers have to share.

38:07

So here's a few examples.

38:09

From Virginia Tech, the University of Arizona, and from Carnegie Mellon University, where I was before joining Figshare. And so these are Figshare-powered institutional repositories that are data capable, and you can see that they're customized with their own URL.

38:29

For example, KiltHub dot CMU dot edu, with their own landing page and searchability, to really showcase the research outputs of their institution, and even to showcase the outputs of specific groups, or labs, or departments within the institution as well. That can be customized.

38:52

So, there's out-of-the-box repository infrastructure that meets all of those community best practices, but then lets you customize it for your needs.

39:04

So, in terms of research data management, these repositories allow you to control access to the research outputs, and they have a private storage and collaborations side.

39:17

So, researchers at your institution can log in using your single sign-on for user accounts, and then upload files, collaborate in what we call projects to share them before they are published, and can then publish them publicly through your review. So, you can showcase these public research outputs with a customized repository page. Here's a couple of other examples: one, this is from ...

39:47

Janelia, in the top left, from the Howard Hughes Medical Institute in Virginia. As well as that NIH portal, which is all built on the same infrastructure. Our Figshare Plus portal is also built on that. So we're using it ourselves, actually, now.

40:04

It's interesting to be a repository manager on the inside with our own infrastructure. But these provide open access, so data and code can be published openly, so that it's discoverable, citable, and most importantly, so that it meets those new mandates.

40:22

So, having a resource like a Figshare for Institutions repository will allow researchers to write that into NIH data management and sharing plans, or NSF data management plans, or plans for any other funder requiring public access to research data. They can know about the resources ahead of time, work with your expert staff, and put that into their plans.

40:47

And then share the data there along the way, during the project.

40:53

Report on the specific DOIs of each research output so that program officers can find those outputs, and so that they can be included in publications. The DOIs can be reserved in advance. So, researchers will never miss the opportunity to include the dataset DOI in the publication. I know that's a common chicken and egg problem. You're publishing the paper, you're sharing the data. How do I get the two DOIs into each? You can reserve the dataset DOI, put it in the paper, publish the paper, and then update the dataset metadata with the publication DOI.
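The order of steps described above can be sketched in code. This is a minimal in-memory model, not Figshare's or any publisher's actual system: the `Record` class, the `link_dataset_and_paper` function, and the DOI strings passed to it are all hypothetical and exist only to show the sequence.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class Record:
    """Hypothetical stand-in for a dataset item or a journal article."""
    title: str
    doi: Optional[str] = None
    related_dois: List[str] = field(default_factory=list)


def link_dataset_and_paper(dataset, paper, reserved_dataset_doi, paper_doi):
    """Cross-link a dataset and a paper in the order described in the talk."""
    # 1. Reserve the dataset DOI in advance.
    dataset.doi = reserved_dataset_doi
    # 2. Put the reserved dataset DOI in the paper.
    paper.related_dois.append(dataset.doi)
    # 3. Publish the paper, at which point it has its own DOI.
    paper.doi = paper_doi
    # 4. Update the dataset metadata with the publication DOI.
    dataset.related_dois.append(paper.doi)
    return dataset, paper
```

The point of the ordering is that neither DOI has to be guessed: the dataset DOI exists before the paper is submitted, and the publication DOI is added to the dataset record afterwards.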

41:31

Oh, ..., our outreach as a community, is to make that just the everyday work of research.

41:39

So here's a few other features of the Figshare for Institutions functionality and infrastructure.

41:49

And so here's a couple of examples. On the left, you can see you can share many files or a single file per item. So, an item being that landing page that has a description, that has a unique digital object identifier. If you'd like, we can use our DataCite DOIs, ...

42:12

dot com, and, I think, we're a DataCite node. So, we can get you DataCite DOIs that are unique to your institution. And those will be minted for each item. They can be reserved in advance, and they're also version controlled.

42:29

Any file type can be shared. So this offers the same flexibility as Figshare dot com, of course. You as the repository managers can put any restrictions in place that you would like, or encourage, you know, preservation-friendly file formats and things like that.

42:44

Which is often something we do when working with researchers, but we recognize that sometimes that flexibility is needed, so any file type can be uploaded, up to five terabytes, with in-browser preview.

42:56

So this could be data, code, images, video, workflows, papers, theses and dissertations. Anything that you want to host in your repository and your researchers need to share.

43:08

We offer an open Figshare API for upload or download of files, as well as an FTP server. So, there's a lot of different ways to get large datasets into the repository.
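As a rough illustration of the download side of the API mentioned above, here is a minimal sketch using only the Python standard library. The base URL, the `/articles/{id}` path, and the `files`, `name`, and `download_url` response fields are assumptions about the public API and should be checked against the current Figshare API documentation; the function names are hypothetical.

```python
import json
import urllib.request

API_BASE = "https://api.figshare.com/v2"  # assumed base URL of the public API


def article_url(article_id: int) -> str:
    """Build the (assumed) endpoint URL for one public item."""
    return f"{API_BASE}/articles/{article_id}"


def list_file_downloads(article_id: int):
    """Fetch public item metadata and return (name, download_url) per file.

    Makes a network request; the response field names are assumptions.
    """
    with urllib.request.urlopen(article_url(article_id)) as resp:
        item = json.load(resp)
    return [(f["name"], f["download_url"]) for f in item.get("files", [])]
```

Calling `list_file_downloads` with the numeric ID of a public item would return the file names and links, which a script could then fetch one by one. Uploads require an authenticated account and are not covered by this sketch.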

43:22

Custom metadata. So, importantly, we have our community standards metadata that goes out to DataCite, and Google Scholar, and things like that.

43:30

But with a Figshare for Institutions portal, you can add additional custom metadata. And you can do this by item type, or by group.

43:38

So, if you have a specific research community in the arts that wants to have their own metadata fields, you can customize it just for that department.

43:48

If you want to add something for clinical trials, for a medical research group, you can add that.

43:55

So really, a lot of ways to customize the metadata with this.

44:01

And then lastly, we have a few features that help with the FAIRness of the data. So this is our curation module, which allows for review. So you'll see at the top here, datasets come in. They're submitted, and then they aren't published right away if you turn on this review feature, which is optional.

44:21

But many, many organizations find this really helpful for enhancing the FAIRness of data.

44:27

So then you can have experts at your institution or even experts here at Figshare with our Figshare curation services team, review the datasets, and get in touch with the researchers to make any revisions.

44:40

Whether that's adding a bit more context, or making sure all the authors, funding IDs, and related resources are listed. Making sure that metadata is as complete and high quality as possible.

44:51

Then, once it's checked and FAIR, publishing it in the repository.
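The kind of completeness check a reviewer makes before publishing can be sketched as a simple checklist. This is not the curation module itself: the field names and the `missing_metadata` function are hypothetical, chosen to mirror the items mentioned above (context, authors, funding IDs, related resources).

```python
# Hypothetical field names mirroring the review points mentioned in the talk.
REQUIRED_FIELDS = ("title", "description", "authors", "funding", "related_resources")


def missing_metadata(record: dict) -> list:
    """Return the checklist fields that are absent or empty in a metadata record."""
    return [name for name in REQUIRED_FIELDS if not record.get(name)]
```

A reviewer could use the returned list as the basis for a note back to the researcher before the dataset is published. A check like this only catches missing fields; judging whether a description gives enough context still needs a person.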

44:56

Figshare for Institutions also provides more restricted access functionality. So on Figshare dot com, researchers can set embargoes.

45:04

But at a Figshare for organizations site, you could restrict access to just logged in users, or just to certain groups within the institution say, within their department, or within college.

45:17

Or you could also use one of our newer features, which is request access, for datasets that really can't be shared publicly.

45:25

But you want to have a landing page and a DOI for them, those who are interested in getting the data can request access. And that will send a request to the author who shared the data, as well as to a repository administrator.

45:40

So, if it's a dataset that has restricted access, maybe a data use agreement is required for it, or IRB approval to access it with human subjects.

45:50

This offers that option, which could be quite valuable for the NIH mandate, actually, for sharing some of those more restrictive datasets.

46:00

So you can easily manage this repository.

46:03

You can manage the user groups with a single sign on integration with your HR feed.

46:09

Manage storage for those groups, default storage allocations, administer storage requests for larger datasets.

46:17

The storage is yours to allocate how you like, up to many terabytes per researcher.

46:25

Then importantly, you can track the impact of this work.

46:28

So tracking impact of work is valuable for the individual researcher, for the funder, and of course, for the institution as well.

46:36

So each item on Figshare has publicly available views, downloads, and citation counts. And these citation counts are pulled from the full text of the scholarly literature, which is really important. We're not just searching reference fields here; it's going to capture data availability statements and in-text citations of datasets, which is often where researchers are citing them.

47:01

So they can see the impact of that. Also Altmetric scores, to capture other types of attention from Twitter, from bloggers, from the news media, and such, which may come before formal citation in the scholarly literature. So you can see that quicker impact for some types of work. And so the impact can be tracked in Figshare for Institutions at the item level, the researcher level,

47:27

the group level, say a department, or at the whole organization or institution level. So you can generate reports and, you know, see the usage of data sharing and the impact that that open data is having.

47:47

So that's it for me today. Thank you so much for joining us.

47:51

I hope this gave you a good glimpse into what the State of Open Data is, where funder policies, including at NIH, are going, and how Figshare might be a good way to support your researchers. You can please go to these URLs to learn more. You're also welcome to get in touch with me directly, or with Megan. So I'll stop there, and Megan, do we have any questions yet?

48:20

Thanks, Ana. Yes, we do have a few questions. Just a note to say, I'll put all of those URLs in the follow-up e-mail as well, in case you missed them.

48:33

So the first one is, so, in general, the State of open data results show that a substantial proportion of researchers care about sharing and using available research data, which is awesome. How much of a concern is there for sample bias, ie. that the respondents to the survey aren't representative of researchers writ large?

48:52

And I can actually answer this one, Ana, and then you can add any follow-ups, since I was involved in the State of Open Data survey and report last year.

49:03

And it is possible that there is sort of survey self-selection bias, and we did find that 72% of the respondents were open science advocates, so that might lead toward thinking that way.

49:21

We do try to spread news of the survey as far and wide as possible.

49:27

So we worked with Springer Nature, who actually organize and analyze the survey results for us. So it's a larger group of people who are trying to promote it. But there is the possibility of sampling bias. I don't know if you've got anything else you wanted to add, Ana.

49:49

Yeah, I would just say that it's certainly possible, if not likely, that a survey titled Open Data would attract those who are already inclined positively towards the practice.

50:02

I've conducted some other surveys in my own work on research data management and sharing practices among MRI researchers and psychology researchers. And I think you certainly see that bias when you look at the results.

50:20

We asked them to rate the maturity of their data management practices, and they would say that their practices were much better than the community as a whole.

50:30

So we asked for their perceptions of their research community, and then where they thought they stood in relation to that spectrum. And our respondents almost always thought they were further ahead of their community as a whole.

50:42

So they're reporting that as well. You know, you clearly see that they view other researchers in their community as not being as far into data management and sharing practices as they are.

50:57

We try our best.

50:59

And I think hopefully it's a shift that will kind of sweep everyone up as the mandates come into play.

51:09

Great, thanks, Ana. And in terms of the NIH policy, there's a question. So, data sharing is not mandated or incentivized by preference in awards, still only strongly encouraged?

51:28

Good question.

51:29

So, I'm not sure, entirely what we're talking about here. It could be, It could be a couple of different things, so.

51:41

I think it's going to vary a lot by institute and program officer.

51:47

If you're thinking about whether previous data sharing matters for getting awards, say, data sharing that you would report in a Biosketch, I think that's going to be quite variable in how that is seen. My hope is that it's starting to be seen more and more favorably. Again, you should probably ask this question to someone at NIH, not me, but I'll give you just my thoughts on it.

52:17

And so, that's growing. But it's similar to thinking about open data as a valuable contribution in promotion and tenure, too, right? That shift is taking a bit of time.

52:30

My sense is that data sharing will be pretty heavily required for new awards after this policy goes into effect.

52:41

Not every dataset will need to be shared. Not every piece of data that is collected. But I do think no one could get away with having an award and not sharing any data at the end of it. I do think that a program officer would say, no, you must have this plan, and this plan must include some data sharing. In the FAQ that NIH released,

53:07

They have a list of reasons that are not good enough not to share your data.

53:13

One of them was, you don't think your data will be useful to anyone else. They were like, no, that's not a reason. Please share your data. So, I think NIH will be enforcing

53:27

this requirement more. What exactly that enforcement looks like,

53:32

I think, may still be a work in progress, and it's worth you or your researchers reaching out to specific program officers to ask how they'll be doing that.

53:42

But researchers should probably be prepared that at least some level of data sharing, certainly not all, it's not a mandate of all data.

53:49

Some data sharing must start to be practiced.

53:54

Thank you. There's a question about whether the slides will be available alongside the webinar recording. Would you be happy for us to share those? We can post the slides as well.

54:09

I think there's a question here. Do you have any kind of numbers or testimony about the impact of human reviews on dataset metadata?

54:21

It's an interesting question and I think the data's still a little fuzzy at the moment. So there's not something public I can point you to.

54:29

I can say that one thing we tried to do during the NIH Figshare pilot was to attempt some sort of apples to apples comparison, which was looking at NIH funded data published in the NIH Figshare repository that had undergone metadata review and enhancements, compared to datasets that we believed to be NIH funded on Figshare dot com. "Believed to be NIH funded" meaning that someone had written into the funding fields the words NIH, or National Institutes of Health, or something like that.

55:07

So, but those datasets had not been checked.

55:11

So when we tried to compare those two groups of datasets, what did we see?

55:18

In terms of differences, the most notable

55:22

difference is simply that there was much more text in those in NIH Figshare. The titles were longer, with more characters, and the description field had many more words and characters in it as well.

55:39

Now, is longer better? It's a little bit of a leap to say that, but having almost no characters in your dataset description is surely not very helpful.

55:51

So, you can say, at the extreme, that correlation would hold true.

55:56

We then looked at the metrics of these datasets of downloads and views. And we repeated this analysis a couple of times. Although I think probably not since last fall now at this point, and we do see an increase in the views and the downloads of the NIH Figshare checked datasets.

56:18

compared to Figshare dot com.

56:19

So the hope is that citations continue to follow that. You know, there's a citation lag. Obviously, we won't see that in just eight months or a year. Citations will also often follow years later for open datasets, and there could be some confounds there, in that, you know, we also promoted the datasets in NIH Figshare more by writing case studies about them, or they were from higher impact journal articles, or something like that.

56:49

But I think overall, I would say there is a definite trend that datasets with better metadata have more views and downloads, and citations as well, and we're going to keep looking at that data

57:04

as part of our NIH work. And soon we'll be able to look at the datasets in Figshare Plus as well, which are also checked. Of course, the confound being that they're very large datasets, so how many people download a many-terabyte dataset also throws a wrench into looking at those statistics.

57:25

But I think it's something we're all interested to explore.

57:30

Thank you. I'll just add to that as well.

57:31

In the 2021 State of Open Data report, there's a contribution from the data curation team at 4TU in the Netherlands about the sort of processes they go through when they're enhancing their researchers' metadata in their repository, which might be of interest to you as well.

57:53

And the last question: is Figshare considering developing capability to host sensitive data as part of the GREI initiative?

58:03

Yeah, that's a great question.

58:05

And I think it is on our radar, but not in our plans yet.

58:11

So, it would be a bit of an infrastructure shift for us. So, we would have to, certainly, plan it out for that.

58:22

And the question is whether it makes sense for Figshare to do that, versus partnering with other repositories like ... that are designed to host clinical trials data. But I think, in the first year of that project, a lot of it is exploring the use cases that exist and the functionality that exists already in that generalist repository space. And then in the out years of that program, we'll be starting to fill in the gaps.

58:54

So I would say, stay tuned.

58:57

I know it's something, also, that our institutions that work with Figshare may be keen to have added. So, our roadmap is heavily informed by client feedback, both from end users on Figshare dot com and from Figshare for Institutions clients.

59:15

And so, what we hear from them as gaps and needs will also inform our development work in that area.

59:27

Eager to work on that. Yeah.

59:32

All right, thank you for all of those great answers. And thank you to everyone who asked a question. If you do have a follow-up question or something that pops into your head after the end of the webinar, please do get in touch, either with myself, just megan@figshare.com, or on figshare.com, and we'll be happy to answer that for you.

59:51

And thanks again for coming and have a great rest of your day. Thanks, everyone.

59:56

Thank you.

