State of Open Data 2021

January 18, 2022

Figshare

The State of Open Data is a series of surveys and reports analysing current trends in open research data, and it is the longest-running longitudinal survey and analysis of open data. It is the result of a collaboration between Figshare, Digital Science, Springer Nature and other leading industry and academic representatives. In this webinar, Figshare's CEO and Founder Mark Hahnel will present the key findings from the survey and highlight some of the contributing pieces in the 2021 report.

Transcript

Please note that the transcript was generated with software and may not be entirely correct.

0:04

Hi, everyone.

0:06

Thank you for joining us for the sixth State of Open Data report webinar.

0:13

We're very happy to have you today. I think this is the largest audience we've ever had for a State of Open Data webinar, which is really exciting.

0:22

Really glad you're here joining us. My name is Megan Hardeman. I'm the product marketing manager at Figshare, and I'm going to do a little bit of administration before I hand over to Mark and Greg. The first thing is that this webinar is being recorded, and we'll send the recording to all registrants shortly after the webinar ends.

0:46

If you have any questions, there will be some time at the end for Q&A. There should be a questions box in the webinar software. Please feel free to type your questions there at any point throughout the webinar, and we'll answer them at the end. I'll ask them anonymously. So, I'd like to introduce my panelists: Mark Hahnel, the founder and CEO of Figshare, and Gregory Goodey, a research analyst at Springer Nature. They are going to talk today about State of Open Data 2021, so I'll hand over to you, Mark.

1:20

Thank you, Megan! Hello, everybody. Hopefully you can hear me loud and clear, and I'm sure Megan will tell me if not. Hopefully you can also see my screen. So, it has been a big year for data in general.

1:35

For those who don't know, the State of Open Data has been running for six years.

1:40

Figshare itself, where Megan and I are from, has been going for 10 years. It was our 10th birthday on Monday.

1:49

It took us a few years to understand how people were using Figshare as a platform, how people started to make data available on different types of platforms, and what the roles were in moving this space forward.

2:04

Back then, 10 years ago, it was all about trying to provide incentives and keep the barrier to entry low.

2:13

I think, as we get into this today, we'll see that the incentives are still a big thing.

2:21

And the low barrier to entry is not so much a requirement anymore, as people get more familiar with it.

2:28

Over 20,000 respondents from over 192 countries shows that this is a global discussion going on in academia and research.

2:38

This has started to give us trends over time and a sustained look at what's happening in this space. So, with that, I'll hand over to Greg. Greg Goodey is a research analyst at Springer Nature, and he can talk us through some of the setup and thinking behind the survey.

3:02

Thanks, Mark. Hopefully everyone can hear me as well.

3:04

To start with, I'll give a little bit of information about why Springer Nature is involved.

3:11

We have been participating in the survey for all six years. I think it's understood that for any movement to succeed, the whole community, from funders and institutions to governments and researchers themselves, needs to take concrete steps. As a business, we are firmly committed to supporting the open science movement. We're not perfect, and there's still a long way for us to go to be wholly in support of it. One of the steps we are taking

3:46

is working with our partners at Digital Science to help build better solutions. But also, and most importantly for the State of Open Data surveys, we are trying to understand the perceptions and needs of the open data movement.

4:05

If you could move on to the next slide. Thanks, Mark.

4:08

So, how are we involved? I've personally been running the survey for the last four years, but as a team we've been doing it for all six.

4:18

That means we design and host it, and, as Mark said, we try to track some of the information over time.

4:28

There are questions that we keep the same year on year, and there are others that we have changed to investigate new topics as the move to open data matures.

4:40

This year, the survey was translated into three languages other than English: Chinese, Japanese and German. That's simply because we tend to see underwhelming response rates from those regions unless the survey is translated; there's no other real reason.

4:56

Then the survey ran for two months over the summer of 2021.

5:01

So, between May and August. We distribute the survey through a number of channels, which we've tried to keep consistent for at least the last two to three years, but most of our responses come through marketing email lists.

5:14

We do supplement this with social media; in China in particular, we saw good promotion through WeChat and blog posts. This year, we received about 4,500 usable responses. We do get larger samples, but these are the responses that are clean and that we feel are reliable to analyze.

5:37

Thanks, Mark.

5:40

To provide a little bit of context for the results that Mark will discuss in more detail,

5:45

I'll give a bit of an overview of who responded to the survey. You can see from the left-hand figure

5:53

in the presentation that the majority of responses come from the northern hemisphere. The largest share of responses was from the US, at 15%.

6:02

We got a good response from Europe and Asia, and although it is significantly smaller, I think we do have a sufficient response from the southern hemisphere to do some useful comparative analysis.

6:16

We also collected information about respondents' fields of interest, and we got a relatively good spread there. Because, as a business, we're slightly biased towards the biomedical and biological sciences, there is a slightly higher response from those fields.

6:30

But we've got a good response from the applied sciences, humanities and social sciences, physical sciences, and so on. Another measure we consider is the career stage of respondents. This is not by any means an accurate measure.

6:49

But we do try to infer career stage from publishing history. A late-career researcher is someone who published their first peer-reviewed article over 10 years ago, an early-career researcher did so within the last five years, and mid-career is anything in between.

7:02

You can see that the panel is somewhat biased towards more established researchers.
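The publishing-history rule Greg describes can be written down as a small classifier. This is a minimal sketch, not the survey team's code: the survey year of 2021, the boundary handling (exactly 10 years counts as mid-career, exactly 5 years as early-career) and the sample publication years are all assumptions for illustration.

```python
from collections import Counter

SURVEY_YEAR = 2021  # assumption: career stage measured relative to the 2021 survey


def career_stage(first_pub_year: int, survey_year: int = SURVEY_YEAR) -> str:
    """Infer career stage from the year of a respondent's first peer-reviewed article.

    Boundary handling is an assumption: "over 10 years ago" means strictly more
    than 10 years, and "within the last five years" includes exactly 5 years.
    """
    years_since_first = survey_year - first_pub_year
    if years_since_first > 10:
        return "late"
    if years_since_first <= 5:
        return "early"
    return "mid"


# Hypothetical first-publication years for a handful of respondents.
sample_years = [2003, 2019, 2012, 2017, 2008]
print(Counter(career_stage(y) for y in sample_years))
```

As Greg notes, this is a proxy: someone who moved into research late or published their first article long after starting would be misclassified.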

7:08

Thanks, Mark.

7:10

This year, we tried something we'd never done before: we tried to understand how receptive our panel was to open science in general.

7:21

We wanted to understand how representative our sample is,

7:26

or at least what our sample looks like. We did this by asking three questions on a five-point scale from agree to disagree.

7:34

The three statements, listed on the slide, were designed to understand respondents' openness to open access articles, their openness to sharing data openly, and their openness to sharing all research outputs openly.

7:48

We saw that the majority of people were supportive of open access articles, and fewer were in favour of sharing everything openly.

8:00

We then conducted a latent class analysis and found that we could probably classify 72% of the respondent pool as open science advocates. I don't know how representative this is; it's hard to say without running larger and larger surveys. But that's how our panel fell. And I think that's all I've got, so I'll hand back to you.

8:24

Thank you very much, Greg.

8:26

I think you said a great line there: we're not perfect, but we're working away and moving in the right direction. I think that's true of the open data space in general, and of the survey,

8:40

and of everything that everybody is trying to do. We are all trying to move things forward.

8:48

We're very grateful to be partnered with Springer Nature.

8:55

It's about playing to everybody's strengths. Springer Nature has such huge appeal and such a strong brand and name in the space that it's a great way to get more of a conversation going around this.

9:09

"Not perfect, but trying our best" is also how I'd describe my whirlwind approach to explaining a 40-page document and the full survey results in the next 20 to 30 minutes. So do dig a little deeper into the State of Open Data report itself: if you search for State of Open Data 2021, you will find all of this information.

9:37

I want to start with the fact that some people describe themselves as anti open science. It seems strange to me that we live in a world where there are people who are against open science, and I think that's something we have to be aware of, as well as playing to our strengths.

9:55

We also have to be aware of the game: there are perverse incentives in academia, and there are different incentives in different parts of the world. We've always seen the survey as a way to better understand them,

10:08

and then to provide our own incentives to help assuage concerns, or to motivate people to share data, because we think it's crucially important in moving academia and research forward.

10:25

So, starting with the three big takeaways, I think we can't ignore the elephant in the room.

10:37

Covid has had a profound effect on how people do their research and share their data and papers, and on how the general public interprets those findings, research and data. Concern about misuse of data is the number one concern, and it's higher than ever before.

11:00

You can see on this slide that 43% is the highest figure, but there are a few other concerns that are also high up there. We can also look at this longitudinally.

11:15

If you look over the last four years, as I was just highlighting, the urgent need to better understand and treat the virus in 2020 brought unprecedented collective and collaborative action, which is fantastic. You just wish we hadn't needed a situation like this to kick things along.

11:33

If we look at the concerns that have gone up over this four-year period, you have things like costs, which are gradually becoming more of a concern as data gets bigger and as people try to understand how to publish their data.

11:53

There's obviously this idea that if you need to keep 20 terabytes for the rest of time, there's going to be a cost associated with that. And where does that money come from?

12:04

Also going up is incentives: people still want more incentives, and concern about not receiving appropriate credit or acknowledgment has gone up year on year. That's a really interesting one, because it's something I was talking about 10 years ago, saying we need to provide better incentives. So I think the question now needs to be where those incentives are going to come from in terms of appropriate credit or acknowledgment. You can get more citations; we can keep harping on about that.

12:40

We can keep exposing people to that, but how do we really make change at the level of credit and acknowledgment? In terms of what's going down: people being confused about where to go, saying "I don't know what repository to use", is going down. It's gone from 23% to 20%, 17% and 16%. I think that's really great to see, because the tools are there.

13:03

There's this idea that the future is already here, it's just unevenly distributed. We have the technology, we have experts, we have researchers, but there are still some things causing concern.

13:17

One last thing about this: the unsolvable problem of licensing. "I have concerns about licensing my research; I'm unsure about copyright and data licensing." Nobody wants to learn about copyright and licensing, but we've got to keep including it in the way we educate researchers about how to share their data.

13:41

A third of respondents indicated that they reused their own or someone else's openly accessible data more during the pandemic than before, and I think this is because we were forced to, as I mentioned.

13:54

We were forced to work differently.

13:56

If you think about how this works in academia, re-using other people's work, or re-using your own work, happens when you've got no access to the lab.

14:14

It highlights that the pace is always up and we just keep pushing forward. We need to look back and think about the reproducibility of research. We really need to be thinking about the quality of the research.

14:27

As I say, Covid has highlighted this. We had Natasha ... from RDC in Australia, the Associate Director for Data and Services, and she said that addressing hurdles to data sharing in the areas of policy and cultural change will fall short if we do not have the underpinning research infrastructure, and the experts needed to run that infrastructure.

14:53

As we move towards FAIR data (findable, accessible, interoperable and reusable), there needs to be an even playing field.

15:06

It's got to be about everybody working together on this.

15:13

Another takeaway is that there's more familiarity with, and compliance with, the FAIR data principles than ever before. This is because it's a huge movement globally: everyone sees it as a good acronym to point people towards to push the space forward.

15:27

I think everybody understands that findable and accessible are easier, and interoperable and reusable are harder, because not everybody has the appropriate infrastructure or expertise to guide them in describing data well. Interoperability is about the technology, focusing on interoperability between systems and metadata schemas, and reusability ... is really about making sure the data is described well and discoverable. I think that is what Natasha is highlighting here: this needs to be happening globally.

16:06

The third key takeaway is that repositories, publishers and institutional libraries all have a key role to play in helping make data openly available.

16:18

When we look at where researchers go for advice and who they rely on, it's everyone: 35% rely on repositories, 34% on publishers, and 30% on their institutional libraries.

16:33

So this shouldn't be a competition to win market share.

16:40

We should be looking for consistency. We should be looking to share knowledge. We should be looking for common standards.

16:47

We should have this kind of data gateway idea, with everybody sending people to the appropriate place. As I said, this is not a perfect space.

16:57

It's moving in the right direction, but I think we need to keep competition out of it for now.

17:04

If we look at the takeaways for institutions, there are some fantastic universities and research organizations out there with big budgets doing fantastic things, but 58% of respondents would like greater guidance from their institution on how to comply with their data sharing policies.

17:26

I think the idea, though, is that we would like greater guidance at the point at which we publish our data, the point at which it becomes a problem.

17:36

So you can have courses, but we can't work in a world where it's all just-in-time responses: "I need to publish my data because my publisher tells me to." There needs to be more.

17:48

Researchers have a role to play here, as well.

17:52

So, from the survey itself, these are the three core things that we found, but obviously you can dig a lot deeper into its different parts. There are also some really interesting thought pieces from around the world and from different industries about how people are coping with this. If you're interested in the open data space, there may be certain pieces that are relevant to your job in particular, or to the problems you're trying to solve.

18:20

I'm just going to highlight some of the key takeaways from a few of them.

18:25

There's a section from 4TU, which is a huge research organization and a leader in Europe in terms of data curation, enhancing metadata quality, and research data services in general.

18:41

So if you're thinking, "How should I start? What should I do when it comes to offering services to researchers?", then Jan van der Heul,

18:51

a data curator at 4TU, has some really key tips on how to do that and how to encourage people to make good metadata available.

19:06

At the University of Oxford,

19:08

we have this idea of consolidating research data in one place.

19:15

They're making use of Figshare infrastructure in the humanities to try to normalize research practices, rather than having an individual use case for every group and every department.

19:31

And so I'm just going to let Damon talk for a second.

19:36

Yup.

19:43

Hi, I'm Damon Strange, project manager at the University of Oxford. I'm here to introduce a piece I pulled together for State of Open Data. We've recently launched a service which looks to support researchers in making their research data as sustainable as possible, by offering a single basket, or area, where research data can be managed in one place. This replaces the historical arrangements, where data is siloed across many departments with varying degrees of support: some localized faculty and departmental IT support, or external contractors who help maintain databases and datasets.

20:28

Issues of sustainability, accessibility, longevity, openness and the credentials of data are all discussed in the piece.

20:38

So hopefully you'll find it an interesting read, … though around 5% of the content is about eggs and baskets. Thanks.

20:54

What's great about that from Damon is the idea that no researcher is unique, even though researchers often feel that their research is unique.

21:06

And nowhere more so than at the University of Oxford,

21:12

where you'll find researchers who are very advanced in their fields and want a perfect solution, rather than a solution that just works for everybody.

21:22

But this is a journey. Damon is talking about a step-by-step process, where they have to start by using vocabularies in the humanities. It doesn't matter who you are: if you want your research to be impactful long-term, you have to try, and you have to encourage those researchers to do so.

21:42

In terms of tips for engaging with your researchers, the University of Pretoria has a long history in this. We've been running our survey as a longitudinal study for six years.

21:54

The University of Pretoria ran a survey back in October 2009 and, as a result, came up with a Research Data Management Readiness Toolkit. So if you are looking to do something similar yourself, there's a lot in there on Creative Commons and on persuading researchers. Non-commercial does not mean that someone is going to take your data and go and build a bazillion-dollar

22:27

product from it, or a drug from it, and you won't get credit for it. It's all about understanding, and the piece goes into a lot more detail.

22:34

Three of their colleagues talk through some of these tips, and they also talk through the incentives.

22:43

Because they've been thinking about this for the last 13 years now, they have university awards to incentivize researchers, and there are also the NSTS awards in South Africa, which are for research data management and research data sharing. So there are awards happening at the funder level, but what's really important is that the folks within the university nominate their researchers for those awards, because somebody has to win. So why not their folks?

23:18

We cover the whole gamut in terms of getting different viewpoints. Both Oxford and the University of Pretoria, which I've just mentioned, are using Figshare infrastructure, but we want to make sure that we're getting a balanced view.

23:34

So we're very lucky to have Sarah Gonzalez, a data librarian at Northwestern, talking about her project. We've got a little video from her that I'll play as well.

23:47

So, here's Sarah talking us through some of their work.

23:59

Hello, I'm Sarah Gonzalez, author of the contributing piece "Open Source and Open Data: Collaboration is Key".

24:05

My institution, the Health Sciences Library at Northwestern University's Feinberg School of Medicine,

24:12

and others like it around the world, are committed to enabling data sharing, open data and scientific reproducibility.

24:19

We've joined the open source repository development project hosted by CERN to upgrade our institutional repository to a next-generation tool, and to bolster and reinforce our commitment to open scientific data sharing.

24:36

In our contribution to this report, we emphasize the contributions of the talented, dedicated partners in the open source community, both in terms of code and expertise on issues related to the data model, and the day-to-day work of depositing, cataloging and storing data.

24:53

We emphasize standards and best practices from the fields of data and file management.

24:58

Enable and get them all of our reference.

25:01

This includes things like leveraging DOIs, the DataCite standard data model, and controlled vocabularies.

25:08

We emphasize the ability to leverage ...

25:11

as a data catalog capable of holding descriptive records about datasets without holding the datasets themselves, which may be needed to adequately share biomedical datasets that could contain patient-identifiable information.

25:27

Through the collaborative and supportive relationships established in the open source community, we enable all this work to happen, and we plan to continue working together to develop new modules and feature improvements.
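Sarah mentions the DataCite data model and catalog records that describe a dataset without holding it. As a minimal sketch (not Northwestern's implementation), the check below tests a record against the six mandatory properties of the DataCite Metadata Schema 4.x: Identifier, Creator, Title, Publisher, PublicationYear and ResourceType. The flat dictionary layout, the example DOI, and the `Files` and `AccessNote` fields are hypothetical and are not DataCite's official JSON or XML serialization.

```python
# Mandatory properties in the DataCite Metadata Schema (version 4.x).
MANDATORY_PROPERTIES = (
    "Identifier",
    "Creator",
    "Title",
    "Publisher",
    "PublicationYear",
    "ResourceType",
)


def missing_mandatory(record: dict) -> list:
    """Return the mandatory properties that are absent or empty in a record.

    The record is a hypothetical flat dict keyed by DataCite property names.
    """
    return [prop for prop in MANDATORY_PROPERTIES if not record.get(prop)]


# Hypothetical metadata-only catalog entry: it fully describes a dataset but
# stores no files, e.g. because the data may contain identifiable patient information.
metadata_only_record = {
    "Identifier": "10.1234/example-dataset",  # hypothetical DOI
    "Creator": ["Doe, Jane"],
    "Title": "Example cohort dataset",
    "Publisher": "Example University",
    "PublicationYear": 2021,
    "ResourceType": "Dataset",
    "Files": [],  # hypothetical field: no data held in the catalog
    "AccessNote": "Access managed by the data steward",  # hypothetical field
}

print(missing_mandatory(metadata_only_record))
print(missing_mandatory({"Title": "Untitled upload"}))
```

The design point is that a metadata-only record can still be complete and citable, so findability and accessibility of the description do not depend on the data itself being public.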

25:41

Thank you, Sarah. That section is a fascinating read as well, because it talks through opportunities for collaboration: they're working with folks from Nigeria, Turkey and Japan. It really is a global thing.

25:54

But, being based in North America, it also touches on the National Institutes of Health's new policy, which comes into effect in less than 12 months, in January 2023. It says, in effect: if we fund you, you have to make your data available when you publish your paper. And I think what's really important is: what are your strengths in your industry?

26:19

I've already mentioned that the prestige of publishers can help.

26:23

If you're trying to get published in our journal, we can make you jump through a few hoops. If you're trying to get funded by us, we can make you jump through a few hoops.

26:32

I think the flip side to that is the researchers' institutions. How do we talk to researchers to let them know that there is help within their organization,

26:45

help to give their research more impact? I think it's all about tailoring the message that way.

26:54

It's really interesting, too. We heard from Greg about the different areas, categories and domains that Springer Nature appeals to, which are obviously led by the life sciences. But if you normalize that data, it's really interesting to see the changes.

27:15

Again, Daniel Kipnis touches on this: the impact of Covid.

27:21

If you look at re3data, the catalog of repositories where you can go and publish your data, there are 2,700 of them, of which 1,500 are in the life sciences, so the majority are in the life sciences. And there are already 68 for Covid.

27:42

This idea that researchers are different in different domains is definitely true, and I think it really touches on this point:

27:57

what you research can have an impact on how you pitch to those researchers.

28:03

There is a higher percentage in the life sciences who believe that data sharing should be a requirement than in the rest of the survey.

28:13

And you can, you know, you can hypothesize on why that is.

28:17

But I think a lot of people understand this. I tweeted today that if we had "data available upon request", or "upon reasonable request", for the everyday Covid results, I don't think the general public would stand for it. So why do we support that when we're making healthcare decisions?

28:36

It's a lot more urgent when you think about it that way, and I don't think the "reasonable request" line should be allowed in any publication.

28:46

Speaking of publishers, among the key takeaways for publishers: 47% said they would be motivated to share their data if there was a journal or publisher requirement to do so.

28:59

I think we can frame that question another way: if you wanted to publish in a particular journal, would you really not publish there if you had to make your data available?

29:11

53% of those surveyed

29:15

obtain research data collected by other research groups from within a published research article. I think that's really the way to build on top of the research that's gone before.

29:24

Methods are useful and conclusions are useful, but the data is useful too. It's also very efficient, because you don't have to redo experiments that have been done before, over and over again, to get to the same starting point.

29:38

And 53% of respondents said it was extremely important that the data be available from a publicly available repository. So we're getting strong vibes across the whole of the publisher space.

29:49

We're lucky enough to have Graham Smith from Springer Nature, who works in research data management.

29:56

Springer Nature have been working in, and leading the way in, the data curation space for a long time from the publisher perspective. PLOS have also been pushing on this; they have had a mandate for a long time now, and I think they're ramping up their efforts again.

30:14

So it's really great to see the real leaders in the publication space taking this on. If you think about publishing research and this idea of curating data as a service (and this is my own personal opinion), a lot of people think of it as akin to peer review, but it's not peer review. You're not checking the novelty of the data. You're checking:

30:37

is the data there, and is it well described for the field? So you need some subject-specific expertise, but you don't necessarily need to do it in the same anonymized way as peer review,

30:49

separate from the core service of publishing the paper. So, here's Graham.

31:02

I'm Graham Smith, a research manager at Springer Nature, and my piece for the State of Open Data report looks at the role publishers can play in supporting data quality.

31:14

A lot of the context in this year's report is about the spotlight shone on data by the Covid-19 pandemic, something we've seen reflected in mainstream media calls to "show me the data underlying this particular claim".

31:29

But there has also been a spotlight shone on data quality.

31:31

How trustworthy are all these data that we're seeing? Publishers play a key role in the trustworthiness, reliability and quality of published research, and increasingly this includes data as one of its outputs, alongside things like research articles.

31:50

A lot of the work done in this area by publishers has been the development of policies and the rollout of particular tools and services to support data

32:01

publishing.

32:02

What this piece really looks at is what role specialists can play in supporting the publication of data that underlies articles, and specifically what the future of that type of support might look like, particularly what sorts of roles could be built into editorial operations.

32:23

Thank you, Graham. So if you're a publisher, and you're looking to do stuff in this space, I highly recommend talking to Graham.

32:28

He's been working on data curation for a long time, and I think this idea of publishing the paper and the data at the same time is critical for reaching researchers, for understanding where the public interest lies, and for recognizing that the public wants to ask: can you back up these claims? That way we don't have problems with fake news, false claims and other nonsense.

32:59

One interesting area is where all these different industries meet. So, just to round out the different sectors, here are the key takeaways for funders and government agencies.

33:13

You can read faster than I can talk, but 73% of survey respondents strongly or somewhat support the idea of a national mandate for making research data openly available. I think we know this is what drives a lot of change, so the mandates have to have teeth.

33:30

What is the recourse? What are you going to do if researchers don't comply? We've seen the effect of this with green open access paper repositories in the UK as a result of the REF. I think we'll see more of this coming from the American funding agencies, and we've seen it in South Africa, with the one big funding agency saying you have to make your data available.

33:56

If you represent one of these organizations, wherever you are in the world, please do take a read and think about what you should be doing to make this a reality.

34:07

We heard from Keisuke Edar from the Japan Science and Technology Agency, ..., also known as J-STAGE.

34:20

At JST, they have J-STAGE Data, which is an evidence-based data platform for Japan's learned society publishing. This is a great combination of the two, because they also work in the publication space: if you're a learned society, you can have your own publication process, and they provide the infrastructure for it. They've also had an Open Science expert panel for the cabinet since 2014, and they found that the next thing they needed to do was add data functionality to these journals so that they could support data sharing in this way.

34:58

And 42% of respondents to the State of Open Data survey in Asia believe funders should withhold funding from, or penalize, researchers for not sharing their data. So that's encouraging to see. It's encouraging that every angle, every domain, every stakeholder is doing something globally, so it's all moving in the right direction.

35:23

And I think this idea of how we can ensure that research that is done and published is of the highest quality is really important.

35:36

And this idea of having infrastructure where researchers can submit their data and experts will check it: those can be librarians looking for better data, or subject-specific metadata experts in subject-specific repositories, and then it's published.

35:54

That is how you can ensure that the highest-quality data is published, because that's what we need when there are such strong eyes on things. I think the public perception is that they only see data when it's all gone wrong.

36:12

Um, and we see this time and time again, you know; we've seen it with Eva ..., and this is a great example of open data helping with a real-world problem.

36:29

So when they started looking into the data, because it had to be made openly available, they found two things.

36:35

They found it was either falsified or duplicated, or it was just an accidental thing, right? So, percentages calculated incorrectly. This is the idea that many eyes make all bugs shallow.

36:49

If everybody looks at it, you can find the problems faster. But also, you know, they found that the selection of patients per test group was not random, and that the numbers were unlikely to occur naturally. That's a really nice thing about open data.

37:04

This idea that humans work, subconsciously, in patterns, so you can analyze data, do data forensics on it, and ask: has someone just fudged the numbers here, have they made it up? We have all these tools.

37:18

We just want it to be sorted before it gets to the stage where it makes the mainstream news; I'm sure we all heard about it happening this year. And so we were very lucky to have Professor ... bhabha over at QUT.

37:35

Talk about how open data can validate research and combat scientific misinformation.

37:39

And she really echoes this point: how can we ensure that the research being done and published, papers and data, is of the highest quality? That's really the take-home message, I think, for everybody; every organization should be asking themselves that.

37:55

Fundamentally at our organization, how can we ensure that research is being published at the highest quality?

38:03

And so open data has two important, overlapping roles to play in increasing the credibility of research.

38:08

It's validating it so that researchers can trust it.

38:12

And it's combating scientific misinformation so that wider society can trust it.

38:17

I think in terms of credibility, that is definitely one of the ways forward.

38:25

But I also like to think about this: we have a huge credibility issue that is drawing attention to this space, and we can do more with that.

38:34

But bringing it back to, you know, 10 years of being in this space with Figshare: a lot of the original starting point for Figshare was this idea of the fourth paradigm of research, data-intensive scientific discovery.

38:48

So not just the computer helping us with research, but the computer finding patterns. Implicit in Jim Gray's fourth paradigm is the ability and the need to share data, and we've started to see that come to fruition. So in 2021, 12 years after that was originally brought out, we see huge real-world changes in this space: AlphaFold from DeepMind, looking at openly available data from the PDB and building on top of it using AI, using machine learning.

39:24

We get a wholesale change in the way the protein structure is thought about.

39:32

The research that's going on takes a monumental leap forward, and this is the thing I think we need to be thinking about at the top level. And so when you're talking to people, you need to be thinking about what story you need to be telling them. Is it for the good of humanity? Is it for the good of credibility? Or is it for the good of their own research?

39:52

The good of humanity gets a lot easier when we're saying, hey, you'll all be allowed out of your houses a lot quicker if we have the data openly available. I think COVID has been a fantastic way to illustrate this. I wish it hadn't come to this, but it really highlights how people can build on top of research that has gone before.

40:10

To find treatments, and to achieve the amazing speed at which we got the vaccines out around the world.

40:19

And to finish off, it's this idea that the speed at which any given scientific discipline, any research discipline, advances will depend on how well its researchers collaborate with one another.

40:33

And if we really want to make academia more efficient, in terms of saving money,

40:39

in terms of moving further, faster, we need to give every researcher fair access to literature and the ability to publish literature, and access to data and the ability to publish data.

40:54

And so this involves people, tech, and culture: in funders, in government, in academic publishers, in institutions.

41:04

And at the middle of this is funding for repositories, libraries and archives, and for the librarians and the experts themselves. That includes subject-specific experts at publishers, and scholarly engagement folks at universities.

41:19

So it's really, uh, a hugely encouraging survey this year.

41:26

This just highlights how many different mainstream organizations are pushing the space forward.

41:34

We are aware of what the problems are, and we have tips and tricks to try to fix them, or to encourage people to act in the proper way. I think the thing we don't have yet is evenly distributed funding.

41:49

And I think we need to be having that conversation with the funders. We don't have evenly distributed mandates, and we need to be having that conversation with the funders too. But folks like the National Institutes of Health are moving forward with this and have working groups on it, as is the NSF, and then there are other funders like the Bill & Melinda Gates Foundation, Wellcome Trust, the European Open Science Cloud, UNESCO.

42:10

It's all moving in the right direction. So if you are working in the space, do have a read of it. If you're just interested in one part, I can share these slides, and you can check them out. But that's everything from me.

42:23

Thank you once more to Greg and everybody on his team. Megan, I'll hand back to you for any questions or comments.

42:32

Great. Thanks, Mark and Greg.

42:35

And, yes, please feel free to type any questions in the question box.

42:39

And there's one comment so far, not posted as a question, but still really interesting. They said: interesting that output sharing is not in favor, since this is the approach to make grants more competitive.

42:50

Jeanette, if you have any thoughts about that? Mark, in particular, on output sharing?

42:57

Sorry, I can't see it. Could you repeat it?

43:00

Yeah: interesting that output sharing is not in favor, since this is the approach to make grants more competitive.

43:07

Right. Yeah. I think grants are another...

43:13

So we're talking about incentives.

43:15

We have general movements, what we see as general statements: we think you should be doing this, and we may look favorably on you if you do. When it's quantified, so you will get an extra 10 points if you can demonstrate how you've made your data available, then that's really the incentive kicker it needs to be. And it's different globally.

43:36

We see it, you know: in Australia, they have that; in the UK, the REF has talked about it; in North America, they don't have anything.

43:46

Well, they do in the private sector, but not in the public sector.

43:52

So I think more of that definitely needs to be happening. I don't want to point fingers at the funders, because they do an amazing job, they're moving the space forward, and they have to deal with a lot of different areas.

44:05

So they are moving at a pace that is aggressive for funders.

44:11

But that doesn't mean we can't ask them to speed up anyway.

44:15

I think maybe just the other thing to add on that, in terms of the data aspect, is that although the level of support for all outputs being shared was lower than for just publications, still over three quarters of respondents in the survey are in support of it. So it's still a high proportion of people who are in support of those outputs being shared; it's just not at the scale of publications.

44:39

Thank you both.

44:41

Here's a question: do you think respondents are more interested in penalties for others not sharing than in referring to their own practice?

44:53

"Do as I say, not as I do." It's a classic. It also works the other way, though, because I think a lot of people feel that they can't share their data: why do I have to share my data if all of my peers don't? So I think that's why people want it to happen at a national level. Because if I have to share my data but my peers don't, they have an advantage over me in a system that encourages

45:17

competition.

45:18

Right, if you want to get the next grant, if you want to get that professorship, you have to be better than everybody,

45:23

or be perceived to be better than everybody else. So why would you want to give your peers, your competition, an advantage? That's why it has to come from the top.

45:33

But, yes, 100%.

45:37

Oh, there's a question for you, Mark. Great presentation and report. Does this leave you hopeful for the near future of open data, also regarding global crises beyond COVID?

45:54

Yeah.

45:55

I think so. Jenny, Barbara makes a great point about this in her comments as well: you know, we didn't need COVID to come along to highlight the big problems that need solving.

46:09

You know, we already have climate change. That's a big enough one that we could all be working on. And we've seen the problems with climate change papers.

46:18

We've seen the problems with, you know, the "vaccines cause autism" paper: a retracted, debunked, bad paper. And the effect that has had has probably added to where we are with anti-vaxxers and Novak Djokovic, right?

46:38

So I think there's a lot of things.

46:43

to be optimistic about, like what I said about the idea that 10 years is not a long time.

46:51

I've been thinking about 10 years because of Figshare. But 10 years is not a long time if you compare it to, you know, the 1980s, which was when we first got, you know, "let's put all the papers in the world on FTP servers."

47:04

And it's 40 years since then, so it's only been 10 years of concentrated "let's move the data space forward."

47:12

There are people who've been working in that space for 50 years. I'm just saying, in terms of everybody pulling together, funder policies and data mandates only came in in 2014, 2015.

47:23

So I'm really optimistic, because there's such a groundswell. If we can treat the data well, we see the other side now.

47:31

We have use cases from folks like DeepMind showing that if you can get the data, curate it well, and use standards, then you can achieve great things and change the face of research. The question then becomes, how do you do that for everything?

47:47

And that's a lot harder: with more heterogeneous data, how do you make homogeneous datasets out of geographically distributed, heterogeneous data?

48:00

I think that's why we go in, think sportspeople, enable Solve it.

48:11

There's a question: any thoughts on whether the survey respondents may be tilted towards those already seeking and using open data, and whether the attitudes of the larger community may not be as favorable towards open science?

48:21

I think this may have been touched on at the beginning of the webinar, but I don't know if you have anything else to add.

48:32

I mean, how do you feel about it, Mark? I think, as I said, you know, it's hard to really gauge what the broader population feels unless you do bigger and larger surveys.

48:40

I know that when we look at things by the different geographies,

48:49

there were some differences in the distribution of advocates, agnostics, and anti. That did show a slight variance in the distribution, so there is going to be a natural difference.

49:08

So yeah, I mean, "I don't know" is the honest answer. I can only present the data the way that we're given it, and the likelihood is there is a slight bias there.

49:15

Just because, you know, how many of us respond to surveys on things that we're not particularly interested in? It's one of those things with market research in general.

49:25

I still think, probably because of the scale of the positive response, that it's likely the majority are still in favor of that move to Open Science.

49:39

Yeah.

49:39

I also think there's a little bit there around, you know, if you ask people whether something society perceives as a good thing is a good thing, then you often get the answer, "Yeah." But will they act that way in reality? Maybe. You know, the good thing, though, is that we can see it.

49:58

If you look at the numbers of datasets that are being published, and you can go to Google Dataset Search, DataCite, places like that, you see that the actual volume of data that's coming out is huge. The number of citations from papers to repositories like Figshare, talking about the Figshare data itself, is an exponential curve, right? It's like this.

50:20

So I think you can say that people might be paying lip service on some things, but you can also look at the core data and see that it is growing exponentially.

50:32

And maybe just one more thing to add as well: some people may not be classed as advocates because they're actually held back from doing those things. It's not so much that they're not in support of it; they just don't have the infrastructure in place to say, yes, this is something that I can get behind. So actually, you know, some of the things that Mark was talking about could change those numbers towards the positive as well. So it might be that they're not necessarily advocates at the moment, but they could be if they were given the proper structure.

51:05

Thank you.

51:08

There's a question here about data citation practice, which continues to be poor or inconsistent. Poor data citation practice makes it difficult to track the use and impact of our data. Just wondering if you have thoughts on what we can all be doing to boost data citation practice: publishers, repositories, and beyond?

51:29

It's a difficult one.

51:35

One of the things I can say is that one of the big funders is funding generalist repositories to get on the same page, basically.

51:49

So they are funding grants.

51:51

It's not been announced yet, but there is a group of repositories that will be encouraged to work together on standardization. So they are putting their money where their mouth is. These things will not happen in silos, and all the

52:06

repositories had a chance to get involved. So I think there's positive stuff happening there.

52:12

I do think the open citations work that's being done is really good for unleashing information that was previously trapped.

52:24

Um, the problem is that it needs to be operating on a level that is consistent for all, but then, if somebody has an advantage, they can control it. The thing I'm talking about here is this.

52:47

If you have an open set of data that everybody can query for citation counts, that's fantastic. But if someone has a bigger set of data that gives you higher citation counts, then it's very hard to stop that group from using that data, because the researchers will see that it has the highest citation counts.

53:04

So, it is a, a hot topic.

53:09

But I think it's completely resolvable. And I think things like the collaborations between repositories can help inform other areas as well. It's not just the repositories; that work can inform publishers.

53:22

And I think a lot of robots can come in and help us, you know, if you're checking to see whether this is a valid DOI on a Springer Nature paper.

53:32

I'm sure somebody's pitched that idea to Springer Nature in the past.

53:38

If it doesn't exist already. I don't know.

53:43

Cool.

53:44

I haven't seen anything else, but thank you.

53:52

You've put your contact details at the end of the slides, Mark.

53:55

And, yep, please feel free to get in touch if you've got any questions, or anything that's popped into your head after the webinar. We'll send the recording around

54:10

in the next couple of days, and the slides as well, so if you want to read up on anything in the slides in further detail, you'll be able to. But just a big thank you for attending, and a big thank you to Mark and Greg for presenting the webinar this afternoon. Have a great rest of your day, everyone.

54:31

Thank you.

