Lean in! Why publishers should be taking care of research data

March 30, 2022

Figshare

Should publishers be the ones to take care of their authors' research data and supplementary information (SI) when there are so many places that offer support for them already? Come along to this webinar to find out why it's not only the right thing to do, but also strategic for publishers to lean in and offer these services to their authors. We'll discuss the top three things publishers should be doing to support their authors in storing and sharing their research data and SI, including how the new NIH Policy for Data Management and Sharing impacts you. We'll also reference the results of the State of Open Data, our annual survey on researcher practices of and attitudes toward open data, organized in conjunction with Springer Nature, to better understand the role publishers should play in storing and exposing research data.

Transcript

Please note that the transcript was generated with software and may not be entirely correct.

0:03

Hello everyone.

0:04

Welcome, and thanks for joining us for our webinar today on why publishers should be taking care of research data. My name is Megan Hardeman, and I'm the Product Marketing Manager at Figshare.

0:16

Before I hand over to Mark Hahnel, our founder and CEO, for the webinar today, I have a few pieces of admin to share with you. The first is that we are recording the webinar, and we'll share that around with everyone who's registered afterwards. Feel free to pass that on to any colleagues who might be interested.

0:36

And if you have any questions, there's a question box on the GoToWebinar control panel. Feel free to ask a question at any time during the webinar, or in the chat, and there'll be some time set aside at the end to answer them.

0:52

That's everything from me, I'll hand over to you, Mark.

0:56

Thanks, Megan. You can hear me loud and clear, and you can see my screen?

1:00

Yes, fantastic. Well, thank you, everybody, for joining today.

1:06

Something's just gone wrong straight away. It stopped sharing my screen.

1:16

All right on cue.

1:19

Oh, something's causing me trouble, sorry, one second.

1:28

Just.

1:32

Hmm, there we go.

1:36

OK, so, starting with the fact that we are 10, you might wonder why this is relevant to the conversation today.

1:45

We as Figshare turned 10 at the beginning of the year, and I bring this up because we'll be talking a lot today about results from the State of Open Data report, which has been going for six years, and about what we find researchers are saying around research data and the non-traditional research outputs, the things that go around the paper. But 10 years has also been a good time to reflect. There's been a lot of conversation about where data has moved to in the last 10 years, and what we're looking at in the next 10 years of data publishing.

2:21

One thing I will say is, I think that it's moving at a much faster rate than we're used to with traditional academic publishing. There's a lot of innovation in academic publishing all the time.

2:39

But if I think back to 10 years ago, when Figshare started, it started because I was an academic who had a lot of research outputs that wouldn't fit into the traditional publication method. I had lots of videos. And so this is an old flyer from when we had just got started with Figshare, saying: Is this what your research looks like? Do you have so much research, so many files, so much data that doesn't have a home?

3:11

We can build a place where people can make these other outputs available.

3:17

And one of the questions that has come along with that, as the dissemination of content has accelerated for various reasons that I'll go through, is who is responsible for this newfound explosion of information, and who is the gatekeeper of it, so to speak. And it's more a question of who is the gatekeeper to let it out there, right?

3:41

Because anybody can just make any of their research outputs available on a blog these days, or in, you know, a Medium post. There are lots of ways in which you could disseminate your research. So, why do we exist? And who should be thinking about getting all of this data, all of these files, out of the file drawer?

4:02

It's.

4:06

I talk about the acceleration of data in those last 10 years, and this is a classic hockey stick, year on year. It's been hockey-sticking for years. If you look at it from 2011 to 2015, that probably looks quite flat, but if you look at just those dates, it was hockey-sticking even then. What this is, is just one very quick look at the explosion of data publishing.

4:34

So this is researchers who have made the data available on three generalist repositories, Figshare.com, ... Dryad.

4:45

This is where researchers have made their data available and they link to it from a publication. So the publishers, within their articles, have a link to a DOI that is in one of these three places.

4:59

And what's really interesting about this is that this is kind of the basic level: if you've nowhere else to make your data available, put it here. That's where these three generalist repositories, as we call them, live.

5:16

And what's very interesting about that is, as we move towards data becoming more of a part of the traditional publication workflow, who should be taking hold of it, and how can we improve it so that it's more useful going forward?

5:32

And so, on the right-hand side here, this is a simple search using Dimensions.ai, and you can see, for all the publications that link to these repositories, who the publishers are, right? And so, as you'd expect, it's lots of big-name publishers. You may work for one of these publishers.

5:56

And if we dig a little bit deeper into this data, we can also look at who funds those papers, right?

6:04

And so on the left-hand side you see it is truly global: US, China, Belgium, Germany, Canada, US, UK, Switzerland, US, UK, Japan, Australia, right? It is a global phenomenon.

6:18

And if we look at where those papers that link to the datasets, that cite datasets, are from, it pretty much mirrors the publication density globally, right? It's where you'd expect it to come from.

6:33

So the broad-brushstroke conclusion we can draw from this is that it's a global thing.

6:42

Researchers are now expected to make their datasets available, the videos, the files, that come along with the publication, right, they're supposed to be making them available now.

6:53

And this is largely due to funding mandates that have happened globally, but we also now have perhaps the biggest news from a funder so far.

7:05

Nature came out with this "seismic mandate" quote, which has got a lot of attention.

7:10

It's the biggest medical research funder in the world saying: if we fund you, at the point at which you publish your paper, you should make your data openly available, and here are the guidelines for the types of places where you should make it available.

7:26

And the guidelines are basically indicating technical requirements.

7:34

It should have a DOI, it should be persistently available. It should have metadata in this format.

7:40

And so, for us at Figshare, who build infrastructure for these types of things, that's very easy to work with.

7:47

Because you, as a publisher, might be thinking: where should we be making this available?

7:51

Where do we already have to make our data, or these other files, available, and how do we comply with this policy when it comes out in January 2023?

8:02

So, as I mentioned, we do a State of Open Data report every year. If you just Google "State of Open Data", you can find it.

8:08

Our sixth year was last year, and so we track trends around what researchers are saying about open data.

8:17

And one of the things to think about is that, 10 years ago, I didn't have a good place to make my data available, my videos of stem cells moving from one side of the screen to the other. I went to publish them in a paper, and the publisher I went to work with couldn't support files of that size, even in the supporting information.

8:38

So, we've gone from "I want to make my files available"

8:43

to "You have to make your files available", and what does that mean for the researchers in the middle? There's a lot of confusion.

8:50

There's a lot of concern, and so that's one of the take-home messages we found this year,

8:56

or rather at the end of last year: there is more concern about sharing datasets than ever before.

9:01

So, in this year's survey, the proportion of respondents indicating that they have concerns about misuse of data, that they don't receive enough credit or acknowledgement for sharing data, or that they are unsure about copyright and licensing has gone up compared to previous years. So here's that in a bit more of a longitudinal way.

9:18

There are several things here.

9:23

Concerns about misuse of data, not receiving appropriate credit, right?

9:29

And when we think about credit in the academic system, I think one thing we can all agree on, and I don't think I'm being too controversial, is that people appreciate the citations, and publishing papers is where they get the credit, which is the currency for their career.

9:45

And then there's some more granular things, unsure about copyright.

9:51

I don't know about the sensitive information. I don't know if I have permission. I don't know if I've organized it in the right way. And then, obviously, the costs and the lack of time. There's never going to be enough time. It's another box to tick; it's another burden for researchers.

10:07

So if researchers are struggling with some aspects of open data, then why are we making them do it? When I say we, I mean the funders. Why is academia writ large saying you have to make all of these files available? And so, this is just my own opinion: reproducibility. I mean, if it's not reproducible, what good is it? What are we really doing here? It's the reason why there is peer review; it's the reason why people are checking research before it's made live.

10:35

Transparency to allow other researchers to query the findings. It's all well and good saying that you did this, and this is what you found.

10:43

But unless you can back it up with the actual data behind the story, it's very hard to qualify whether people are taking advantage of perverse incentives, basically. Efficiency: I think this is the idea that there's an ability to reduce unnecessary experiments. When I've been looking at the last 10 years and thinking about how fast things have moved in terms of open data and publishing non-traditional research outputs, I think one area where we haven't seen as much progress as I thought we'd have seen is around publishing of negative data, publishing of negative results.

11:25

And there are a lot of publishers that do accept those as publications, but some other publications are more interested in novel research.

11:32

And so, when I say efficiency here, I really mean that if I was reading a paper and I saw the data behind it, I could build on top of that data, as opposed to reproducing the whole experiment from scratch in order to build off the raw data.

11:48

And this general idea to move research further, faster: I think, you know, with COVID, we've all seen that, the need to collaborate, the need to share research information.

12:02

And sharing the data behind the findings means that if you have, say, three labs working separately on a vaccine for COVID, they are able to build on top of the research without having the restrictions of, you know, "data available upon request": well, are they ever going to respond to my e-mail? If it's just mandated that it has to be openly available, then we can make it available to both humans and machines to consume that information in a more interesting way and really move the needle. So, I think that's really important to think about when we're thinking about creating new tools and creating new workflows for the researchers. We need to remember that this is ultimately for massive amounts of good, in terms of efficiencies of research.

12:49

One other thing we found in the State of Open Data is this mix in where people are looking for help.

13:01

Right?

13:01

If there's a whole new field of dissemination of content, we find that repositories, publishers and institutional libraries have a key role to play in helping make data openly available, because when researchers were asked who they would go to if they required help:

13:19

A third relied on repositories, a third relied on publishers, and a third relied on institutional libraries.

13:25

So, I'm guessing, for those people on the call who work at publishers, there's this sudden influx: a third of all researchers are suddenly coming to you with this brand new problem. How can you help me with this problem?

13:41

If the NIH is saying I need to make the data available at the point of publication, then the obvious person to speak to is the publisher. OK, can you help me? Can you guide me here? And that's what we want to talk about today: where we can close the gap on that, where we can build smarter workflows in order to make this easier for everyone involved, and where the potential opportunities are.

14:06

So, a few other key facts from the State of Open Data.

14:11

21% of respondents report a lack of time. Or is this a last-minute "Oh, I need to do it, I haven't thought about it beforehand"?

14:20

Over half are confused about what licensing even means.

14:25

And what we found is, if you show someone how to do it once, or you just tell them, then it's fine, you know. You can make this openly available, you should be making it openly available under a good license, and this is the one we suggest. They're happy. They just don't know where to turn to begin with.

14:42

And then in the middle, basically half of researchers are telling us that they're motivated to share their data if there's a journal or publisher requirement to do so.

14:54

So not only will they do it, but half of them are motivated to do it if there's a requirement to do so from the publishers. So publishers really are sitting in the middle of this conversation.

15:05

And these are my own personal thoughts.

15:08

When we think about dissemination of new types of content, we've had a lot of people think: well, OK, are we just going to have data publishers in the same way that we have paper publishers? Are the publishers going to have a journal section, or are they going to have a dataset section? We don't know the answer to this yet.

15:31

We've seen lots of experimentation in the space, but one thing I think is clear at this point, to me anyway, is that datasets don't need peer review in the traditional sense.

15:45

So, when I looked around at that kind of guidance for researchers, and at what function we are looking for when we're looking to publish research, it kind of falls into the research integrity function at publishers. Right, this idea that you have instructions. On the top left, sorry, I haven't got the logo there, that's Taylor & Francis: guidance and instructions telling you what to do. Here's everything you need to do: making your submission, drafting your article, making revisions. It's talking you through the steps.

16:17

So when researchers don't know what they're doing, and if I go back here, they're unsure about certain things, they're unsure about organizing their data in a presentable and usable way, it's very much like the preparation for a journal that the research integrity team, or you may have a different name for it at your publisher,

16:42

has to think about. And again, down here is the Frontiers Research Integrity team.

16:49

So it's pre- and post-review quality screens, right? So, acceptance criteria. So it's less about how novel this research is and more about: is this well described? And does it tick the boxes that we're trying to tick, which are largely around metadata for datasets?

17:11

So when I talk about metadata, you may or may not be familiar with the FAIR data principles, and I don't mean to patronize if you are. It's been five years since the publication on the FAIR data principles came out.

17:22

And it's just become the global plan, loosely describing a strategy for making datasets and non-traditional research outputs available. Again, that's datasets or videos, 10,000 images, spreadsheet data, code; we're thinking about datasets as this common umbrella.

17:44

So there's more familiarity and more compliance with the FAIR data principles.

17:50

FAIR stands for findable, accessible, interoperable and reusable data.

17:57

So, 66% of respondents to our State of Open Data survey had heard of the FAIR data principles, and this is encouraging.

18:06

But it also, uh, has come from lots of different areas.

18:11

And I think when we talk about publishers guiding in the way that I showed with the Taylor & Francis and the Frontiers messaging on their websites, we've seen it from other places, too. We saw it from PLOS in 2014 with a research data mandate.

18:26

If you publish with us, you have to make your data available, and here's some guidance on it. And we've also seen it with Springer Nature introducing consistent policies across their 2,500 journals. And they've made the research data policy texts available for the research data community under a CC BY license.

18:49

So, there is this movement within the publisher world towards being consistent with policies for each of the journals, with policy types going all the way up to Type 4 here.

19:03

Data sharing, evidence of data sharing and peer review of data are required. Again, some of the nomenclature is different between different places, but it's really interesting to see that researchers are becoming more aware of the requirements.

19:17

Because of the publishers who are thinking about these workflows and encouraging people within their organizations and outside of their organizations to be thinking of this.

19:28

So, within that State of Open Data report, we have Greg Smith, Research Data Manager at Springer Nature, talking through the checking process for data at some of their journals. And so we'll share these slides afterwards, so you don't have to read everything.

19:45

But this is, this is what they're thinking about, is the data themselves.

19:50

The metadata describing the data, and this idea of the infrastructure, the hosting, the linking, and the preservation, And realistically, when we talk about the infrastructure, as I said, Figshare provides that infrastructure.

20:04

It doesn't matter.

20:07

Once you solve that side of things, it becomes a much more interesting problem in terms of: what are the policies that we're going to have as a publisher? And how do we enforce them using the infrastructure we have? The infrastructure is just a box-ticking exercise at that point.

20:21

So where are the opportunities? This is why I was saying it's more about curation of data. It's a research integrity and editorial service, in my opinion.

20:36

And so we as Figshare do offer Figshare curation services, where somebody submits some content, we check the metadata for FAIRness, and it's published on your journal portal.

20:51

So we do this for individual journals, for reviewing metadata of datasets before they get published and guiding researchers through the data.

20:58

And we do some file and metadata checks.
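
[Editor's note: As an illustration of what an automated metadata check of this kind could look like, here is a minimal Python sketch. It is not Figshare's curation process. The field names in `REQUIRED_FIELDS`, the `check_metadata` function and the DOI pattern are all assumptions made up for this example; a real checklist would follow the journal's own policy.]

```python
import re

# Hypothetical required fields; a real curation checklist will differ.
REQUIRED_FIELDS = ("title", "description", "license", "doi", "keywords")

# Loose shape check only: "10.", a 4-9 digit registrant code, "/", a suffix.
# It does not confirm that the DOI is registered or resolves.
DOI_PATTERN = re.compile(r"^10\.\d{4,9}/\S+$")


def check_metadata(record):
    """Return a list of human-readable problems found in a metadata record."""
    problems = []
    # Flag every required field that is absent or empty.
    for field in REQUIRED_FIELDS:
        if not record.get(field):
            problems.append(f"missing {field}")
    # If a DOI was supplied, check that it at least has the right shape.
    doi = record.get("doi")
    if doi and not DOI_PATTERN.match(doi):
        problems.append("doi does not look like a DOI")
    return problems


if __name__ == "__main__":
    record = {"title": "Stem cell migration videos", "doi": "not-a-doi"}
    for problem in check_metadata(record):
        print(problem)
```

[A check like this only covers the mechanical part. Whether a description is actually good enough for someone else to reuse the data still needs a person to read it.]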

21:00

So we do this because not everybody has a team to do this. But if you have a team to do this, then what's great about publishers is that, a lot of the time, the journals are field-specific, and so the field can develop

21:20

guidelines specifically related to that community. As I mentioned before, if there is a subject-specific portal or repository to make this data available, that's the best place for it. The more subject-specific the metadata, the further and faster we can move, because the more homogeneous the data is, the more the machines can find some of the new trends in this research data that we as humans just can't.

21:48

So I mentioned that we do offer that service. But I think where we have a better fit with publishers is around providing that infrastructure. And so when we talk about providing that infrastructure, it's really this.

22:01

Playing the game of what is best for your authors, and how can we make it as seamless an experience as possible?

22:10

So, as I mentioned before, the infrastructure we provide, we feel, ticks all of those boxes.

22:16

It's a fast-moving space, but we've just got some funding from the NIH to make sure that we're up to date with their policies as we move forward with the infrastructure we provide for everybody.

22:29

So if you think about our portal, on the left-hand side here, you can see a portal we provide for the Royal Society.

22:35

We work with lots of different publishers to showcase either journal data or book data, and most importantly, it's custom branded.

22:43

So every file is on its own page, and every journal can have its own portal for making content available in a compliant way. So you don't have to worry about any of that stuff; you just have to worry about the workflows to make sure that your researchers are

23:03

ticking that box within your system.

23:05

What's really good about it is that then we can focus on some of the stuff that you, as a journal or as a publisher, may be interested in. You know, so on the left-hand side we can see the branding is all here. It can live on your own domain, so it doesn't say Figshare anywhere. It is your portal. It is your infrastructure for disseminating all of this different content. We have all kinds of nice workflows for making it only available to certain groups of people, and complying with policies that way.

23:38

And we also have on the bottom right here, you can see some of the metrics that we track.

23:43

So the example for the citation here is a SAGE output. We work with SAGE, and you can see that it's a branded SAGE DOI. We can provide those all for you. It's a very out-of-the-box solution. We tick all of the boxes so that you can focus on the workflows and how you want to have the different policies at the different journals. We also track usage metrics. So here you can see we track views, we track downloads, we track citations, we track Altmetric scores.

24:13

We actually find, and it's published now, that publications that make data available have, on average, a 25% higher citation count for the paper itself.

24:26

So it's very good for improving citation counts of the actual papers, but we can also track the impact of the other outputs as well. And where this is really useful is if you wanted to use the faceted search for your portal. So here's some PLOS content where I've said, OK, just show me videos from this particular journal.

24:49

I want to sort by the highest Altmetric Attention Score. So you can see which of the different types of outputs, on top of the peer-reviewed article, are getting attention and are useful for social media to get more eyes on the content, and ultimately keep them all on your platform, because it's all on your domain.

25:10

Of course, with all of these things, we are a tech company, so we build things from a tech-minded perspective. Everything is powered via the API. So you might be wanting to say: OK, I want to have a feed where I can search for media files, video and audio, and filter it by Altmetric Attention Score or citation count, but I actually want to pull that into another part of our system. Everything you can pull out via the API, you can pull into other systems: the statistics, the files themselves, everything is taken care of on that level.
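
[Editor's note: To make the feed idea concrete, here is a minimal Python sketch of the filter-and-sort step, run on records that have already been fetched. It does not call the Figshare API. The `Output` class, its field names and the `top_media` function are hypothetical and only illustrate the logic described above.]

```python
from dataclasses import dataclass


@dataclass
class Output:
    """One research output, with hypothetical field names."""
    title: str
    item_type: str          # e.g. "video", "audio", "dataset"
    attention_score: float  # stand-in for an attention metric
    citations: int


def top_media(outputs, types=("video", "audio"), limit=10):
    """Keep only the given item types, highest attention score first."""
    media = [o for o in outputs if o.item_type in types]
    media.sort(key=lambda o: o.attention_score, reverse=True)
    return media[:limit]


if __name__ == "__main__":
    items = [
        Output("Cell migration, clip 1", "video", 5.0, 1),
        Output("Raw measurements", "dataset", 50.0, 3),
        Output("Interview recording", "audio", 12.5, 0),
        Output("Cell migration, clip 2", "video", 7.0, 2),
    ]
    for output in top_media(items):
        print(output.title, output.attention_score)
```

[Sorting by `citations` instead would only need a different `key` function. In a real integration the records would come from the API response rather than a hand-written list.]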

25:49

On the other side to it, we also have what we call the Figshare viewer.

25:52

So a lot of publishers work with us for improving the supporting information, in the sense that it is previewable within your article itself. So, here you can see, this is a PLOS article, where, if you scroll down to the supporting information, you have a popup, and we accept any file format. And we aim to preview it in the browser. We preview something like 3,000 different academic file formats, so your researchers don't have to worry about converting files into different file formats. You don't have to worry about it adding load to your page; we'll optimize the page-loading workflows.

26:32

It's very simple to integrate, so here you can see, if it's a video, you just click Play.

26:38

Every supplemental data object is assigned a unique DOI, and you can have the viewer pop up as many times within your system as you'd like it to.

26:48

So if you expand that viewer, here's what you see. There were 12 files in this particular article.

26:55

You can see that this is file 10 of 12. So you click this popup icon.

26:59

The purple color here is the PLOS journal's color. So it's all in line with their workflows and their branding.

27:06

And here you can see all of the different files. So there were some TIFF files, but there are a lot of video files, and I can just click through them. But if it was a spinning molecule file, a map file or a Jupyter notebook, it would all be previewable on your platform, in your site.

27:21

So you don't have researchers leaving the actual article to download a file and go and try to open it elsewhere. And so that's all plugged in.

27:30

So, what we are seeing in this space, more and more, as things are growing exponentially, is that we have different publishers at different levels of the workflow, right?

27:39

So we have people starting with improving what they have already, the supporting information.

27:46

And you find, if it's just content, there's so much great content within journals already that's all hidden away in supporting information, for a lot of people. And then there's coming up with a data strategy.

27:58

And then there's how new business models around different types of content can evolve as the communities, particularly society communities and society journals, develop these thematic workflows around what the metadata should be that we are accepting for our research outputs as a community. And we're also seeing day-by-day continued innovation. I say day by day.

28:23

We saw PLOS, yesterday, doing the next phase of the workflows and trying to encourage more dataset publication and more best practice.

28:39

And so this is an experiment they got funded for, putting these badges in their articles to flag that the data is accessible.

28:51

And so "Find out how research articles qualify for this feature" encourages researchers to make their data available. And you could have that too, if you wanted your own portal.

29:02

It would tick all of the best-practice boxes around accessible data, and tick all of the funder policy boxes going forward,

29:12

with a Figshare portal, branded with your own branding.

29:17

The other thing we've started looking at, again, is this gap around big datasets: there is a large number of researchers who have very large data.

29:28

And again, you might be thinking, well, we as a publisher can't handle,

29:33

you know, terabytes of data, or systems that support that. What's great about our workflows is that we can do that, and we are just infrastructure, so we can have it branded however you want it. So, if you wanted to support publication of big datasets, you, as a publisher, can do that right now.

29:52

The last thing I wanted to talk about, in this scope of where we're moving with non-traditional research outputs: researchers are saying that they want help from publishers with publishing data, and data can mean a lot of things, mainly because of the policies coming out on this. So there's a natural fit there.

30:15

But when we think about broader non-traditional research outputs, we're seeing this need for fast-but-good publishing, right?

30:25

You can publish data fast, because it doesn't need peer review, but it should have checks. Who is going to do those checks? How can the publishers guide their researchers through the process in a better way? And also you have another big topic of conversation

30:42

in non-traditional research outputs: the similar hockey stick in the last 10 years of preprint publication. And I think what's really interesting here is, you know, I just picked out Research Square of Springer Nature, Elsevier and Cold Spring Harbor Labs, and the exponential growth of their publication of preprints.

31:02

And what I wanted to mention this for is because with Figshare infrastructure, we were set up for this workflow, right?

31:12

Upload a file. Add some metadata. Somebody has a check of it and makes it publicly available without peer review. That is what we also have within our workflows. So it's the infrastructure that we power that ticks all of those boxes around the boring stuff, as I like to think about it.

31:31

It's the framework for publishing outputs in this fast-but-good model. We do have Figshare preprints for launching out-of-the-box branded preprint services. So if you're thinking about what your policies are going to be around content other than the core peer-reviewed journal article, and you'd like to have a conversation about what's available from workflows like Figshare's infrastructure, let us know.

32:01

What's really key to remember here is that, with all of these new types of content being made available, the key central point of dissemination of academic content, now and for all our lifetimes, is going to be the peer-reviewed article. That is the context, that is the description, that is the peers explaining why this is important.

32:27

What we're talking about now is this. How can we move things further faster, and how can we provide a fuller picture?

32:36

And if you, the publishers, are going to continue to be the center point for this conversation, then how can you be providing a fuller picture around the articles? That is all of the files that go with it.

32:49

Then preprints: this is, obviously, as the name suggests, before the article comes out. And then there's all of the content that needs to come with the publication, in a way that is compliant with publisher and funder policies. This is where we see ourselves as an infrastructure provider fitting in.

33:07

So when we're working with publishers, and you can see on the left a lot of the publishers we work with, it's really about this idea of compliance with new policy, and the pace of change, you know.

33:18

If you decide to come up with a data strategy that involves building a lot of stuff yourself, and then six months down the line the NIH brings out a change to their policy, that's very much something that we have within our wheelhouse, and we can just make sure that you and your branded portals are compliant with all of these policies.

33:38

The last two points here are author services and the development of new business models.

33:43

The other thing to think about here is, when I say the future is going to be the peer-reviewed journal article with all of the content that goes with it, the plethora of information that goes with it, and, in the case of preprints, catching researchers at the point of first submission.

34:03

Then that whole workflow has new, interesting business models around how you can support the researchers, who are crying out for help in this space, right? And it's not that every publisher has to reinvent the wheel and start from scratch.

34:18

But when we think about the skill sets of publishers and what they already have within their departments, it's this: we can check the files.

34:27

We can make sure that everything that is coming through is in line with the standards that we create. And then, from a technology point of view, there's this idea of: we just need something that works. We don't want to have to worry about that.

34:42

That is, the Russian government, the Chinese government, the South African government, the American government changing their policies when it comes to research data, and us having to, you know, come up with a new project to build all these new tools to deal with that. You can see a bit more of that on our Knowledge Portal, at the URL below.

35:00

And if you would like to hear more about the portals or the viewer, or any of the functionality that I've been talking about today, of course, we're going to be back in person at the London Book Fair next week.

35:12

Some of our sister organizations will be represented there as well: ReadCube, Altmetric and Dimensions. So please come and have a chat with us. And if you do have any questions about any of this, I'm happy to take some questions now. We can share these slides around afterwards, and the recording, as Megan mentioned.

35:29

But that's everything I was going to talk about today. So I'll leave it there.

35:35

Thanks, Mark.

35:37

Aye.

35:38

I'll give people a chance to put in their questions. And I just want to follow up with one thing, just to say there's going to be a webinar on the 17th of May. It will be like a fireside chat panel conversation with a few publishers on their data policies. So this may be something that's of interest to you.

35:58

And we'd love to see you there on May 17th, and we'll send around the sign-up details a bit closer to the day. It just seemed a good time to mention it.

36:10

A couple of questions have come through, Mark. One is: have you seen any changing trends in research data publication after the pandemic?

36:20

Yeah.

36:20

Well, what we saw during the pandemic was that, for a lot of researchers, it kind of came in twofold, right?

36:28

The general public knew that we needed to have more information to back up claims. Things, we need to have more people's.

36:36

It needs to have the data behind Eva, ..., and things like this, right.

36:41

This got into the general consciousness. But what we also saw was, and I'd have to query the actual statistics from the State of Open Data report, that more researchers reused either their own data or other people's data for the first time. Because if you imagine you're a wet lab researcher, you can't be creating more data, so you have the time to go back and look for more information in either your data or in other people's data. If data wasn't being made available, you couldn't do that, right? So there is this opportunity for building on top of the research that's come beforehand in a way that hasn't been done before. And so we saw that during the pandemic.

37:23

Obviously the pandemic's not over. It'll be interesting to see, with this year's State of Open Data, whether that trend continued and whether this is a sustained trend within the research community.

37:39

Mark, do you have any comments on the language used in the NIH policy? It says they expect data to be shared, but does not require or mandate it.

37:50

Yeah. This is the main pushback that the Nature article had, right?

37:55

Nature called it a seismic shift.

38:00

And people said, well, I'm not sure it is seismic, if they're not mandating it, whereas I would have liked to have seen it mandated.

38:09

I think the general direction of where we go with this is clear. As I mentioned, PLOS mandated in 2014, eight years ago, that you have to make data available when you publish with them, and they've seen great success with it, right?

38:24

We've also seen 57 publishers and funders globally mandate open data.

38:31

So the NIH is the big one, because it's the biggest funder of medical research in the world. And so it's seen as such a huge change to how much research will be published going forward.

38:46

And I think, from our conversations with the NIH, the idea is that this is a movement towards all people having to make their research data openly available, or in the future they won't get funding again from the NIH, right?

39:01

And I think it's enough to make the researchers know that they have to do something. We did a pilot study with the NIH two years ago, where we created a repository for them, a portal under their own branding, for them to make their own stuff available.

39:21

And we found out from a lot of the conversations that what the State of Open Data report tells us is true.

39:29

Researchers need help.

39:32

And so this is why I think it's very important in this conversation: if they need help, and the NIH says that at the point at which you publish your papers you should be making your research data available,

39:48

Then there is going to be.

39:50

If you run a medical research journal, or you publish research in the broad topics that the NIH funds, then those researchers are going to get to the point at which they're publishing the paper.

40:03

Go to publish, and have to make their data available.

40:06

And they're going to be asking your editorial staff, what am I supposed to be doing?

40:11

Right.

40:11

And so I think getting ahead of that is a problem. Well, it's a solvable problem, and a lot of publishers are already way ahead; they've been thinking about it for years, right?

40:22

So, in terms of the wording itself, could it have been stronger? Yes. Do I think we're past the point of no return? Also yes.

40:32

Thank you.

40:33

There's a question around data that's uploaded by authors versus data uploaded by publishers, and whether one workflow is better than the other, if you have any thoughts on it, because we support both of them.

40:50

Yeah, I think the ideal scenario is you get the authors to upload the data, and then you have somebody check it. Or, even better, I think some of the things we're working on are around this idea of having automated checks.

41:05

You know: no, you can't call your dataset "dataset", in the same way you can't call your paper "paper".

41:10

Right. It just doesn't make any sense.

41:13

And so I think there's a good combination. There was a Scholarly Kitchen post that went out yesterday that talks about some of this: the idea that the combination of humans and machines is often the best bet with these things.

41:25

And I think, as Megan mentioned, we have author upload workflows with publishers.

41:33

We have publisher upload workflows.

41:37

And I think it's really dependent on what size, what volume you're thinking about. If you have 10 million supplemental data files coming through each year and it's very hard to check all of them, you know, it's going to change the model.

41:50

But if you have different tiers of outputs, is this supporting data, or is it the data behind the article that's critical?

42:00

Then you can maybe do checks on the more important stuff, and allow people to make supporting information available in the way that they always have done, but you get the added bonus of previews of different files.

42:11

So they can stay on your site, and they don't have to jump around and download different files. But I think it's really dependent on the size of the publisher and the scope of the publisher policies today.

42:27

Do you see mandates coming from other funders, i.e., in the physical sciences?

42:33

Yeah, the EPSRC, the Engineering and Physical Sciences Research Council, was the first one in the UK, where I am, where it snowed today and it's now sunny.

42:43

And so that was in 2015.

42:47

We've also started working with, and providing portals for, organizations

42:53

like the National Institute for Materials Science in Japan. So it truly is a global thing, and it truly is global in terms of the types of content.

43:05

Because, on the one hand, you have the funders saying this is good for, you know, pushing the needle on medical research, when we think of COVID.

43:18

But if you're in the humanities, it's also very useful for all of the, you know, if you've just written a monograph on tracking groups of people over time.

43:31

And it's very hard to do that in a non-visual way.

43:34

So having lots of different types of files available that are just previewable in this beautiful thing called the Internet makes the transmission of information a lot, a lot freer. So, yeah, we see it from all types of funders: humanities, social sciences, physical sciences, natural sciences.

43:57

Great, thank you.

43:59

Those are all the questions, so I think we'll probably end it there. Thank you for coming, and thank you for asking questions. Thank you to Mark for presenting. We'll send the recording around, with all the links and slides that were mentioned, in a couple of days. If you have any questions, feel free to get in touch. We'll be happy to answer them for you.

44:20

Thanks, everyone, and have a great rest of your day.

‍
