Springer Nature & Figshare: driving open science through data sharing

September 29, 2022

Mark Hahnel

Graham Smith, Dr Maria Hodges and Dr Erika Pastrana

Researchers are increasingly encouraged to share data supporting their publications. The case for Findable, Accessible, Interoperable, and Reusable (FAIR) data is reflected in a growing number of policies and initiatives to make this achievable while supporting data integrity and openness.

In this panel discussion, members from across Springer Nature’s research data management, editorial, and community teams come together to discuss how Springer Nature is supporting authors in open science practices through data availability as well as reproducibility, quality and integrity.

Topics for discussion included:

Springer Nature’s successful history of data availability policies
Their publication process developed to support researchers making data available
The role of data in publisher approaches to open science

Moderator:

Dr Mark Hahnel, Figshare Founder and CEO

Panelists:

Graham Smith, Open Data Programme Manager, Springer Nature

Dr Maria Hodges, Executive Editor, BMC

Dr Erika Pastrana, Editorial Director, Nature Research and Open Science Planning

Transcript

Please note that the transcript was generated with software and may not be entirely correct.

We have a fantastic panel from Springer Nature.

3:09 You can see by their job titles that we have folks who have a good background in open science in general: open research, open access, and open data.

3:21 And today we're going to be talking about data mandates and the state of open data, and how that is driving understanding about the way that researchers are coming to terms with this new area of their job, which is publishing research data, and how they can do that going forward.

3:43 And I think for those who are online, a lot of you will be aware that this headline of NIH issuing a seismic mandate to share data publicly has really made people think about the rubber hitting the road. Obviously we've had a lot of mandates for a long time, but this has really started the conversation: the biggest funder in the world

4:09 saying, as of January 2023, you will have to make your data available when you publish your paper. We've seen the OSTP memo come out since then as well.

4:20 As I mentioned, they're not the first. Where I am in the UK, we had the EPSRC in 2015 mandate that you had to make your data openly available.

4:31 If you go to the Sherpa Juliet site from Jisc, you see 52 funders listed that say they require data archiving.

4:40 What we're thinking about really is the fact that researchers are going to be making their data available globally. It seems like a simple ask.

4:52 It's proving to be full of questions, full of different areas where we can get hung up. But one of the interesting things comes from the State of Open Data, which Figshare and Springer Nature do every year: we survey researchers to try to understand what they're thinking and what's motivating them to do things.

5:12 Interestingly, this last line here, a sneak peek of the report that's coming out in the next couple of weeks, is that more than two thirds of respondents to the State of Open Data 2022 are supportive, to some extent, of a national mandate for making research data openly available.

5:28 This number has been declining since 2019. So still, the majority say, let's do it. Yes, we understand the reasoning. We understand that a national mandate will mean that everybody has to do it.

5:39 So there's no favoritism. But the decline might be something to do with the idea that, actually, now we have to do it, it's a little more work than we anticipated. There's a lot of new stuff we need to learn, and who's going to help us with it?

5:53 Well, if you ask researchers who they would be willing to receive support from to help in reviewing, curating, and preparing their data for public release, we have 41% saying they would rely on the publishers,

6:07 the place where you disseminate content generally, so that makes a lot of sense, and 38% relying upon their institution. If you're lucky enough to be at an institution that has librarians thinking about this, that works well for you too, in answering your questions.

6:22 And finally, our sister organization Ripeta put out a report last week:

6:30 The State of Trust and Integrity in Research, with perspectives on data sharing policies and practices. And you can see that the majority of researchers are putting data in a repository. This is from five different funders, from five different countries.

6:45 The orange one is a German funder.

6:47 You can see they're the only one that has more data available upon request than in a repository, but otherwise, these things seem to be tracking in the right direction.

6:59 So, enough about me and my slide catastrophes. Let's get onto the main meat of the conversation today.

7:06 As mentioned, I am joined by three folks from Springer Nature.

7:14 Um, we have several different perspectives: Graham, the Open Data Programme Manager; Dr Maria Hodges, the Executive Editor of BMC and Research Data; and Dr Erika ..., who is the Editorial Director at ... Research and Open Science Planning. So, I'm going to start with you, Graham.

7:36 What do you think are the challenges around research data sharing for authors, publishers, and anybody else you might like to mention?

7:50 Yeah, sure. Thanks, Mark. We have quite a unique perspective on this because, as you say, we've been collaborating on the State of Open Data report that comes out every year, which gets great insight from researchers about what some of the key challenges are. So, we have this year-on-year insight, right?

8:07 But we also have insights from delivering data services, from publishing data journals, where we can actually see what some of those issues look like in practice, and from talking to editors about some of the challenges they see with their authors.

8:22 And really, I would say, from a researcher point of view, one of the key challenges that we see being mentioned is recognition and reward, or credit, however you want to put it. That's a big one. We've seen 75% of researchers in the State of Open Data report this year indicating that they feel they receive too little credit for data sharing, and there's a big conversation around what form this credit should actually take in terms of the existing research assessment methods. Do we really look at it through the lens of citations, for example, or is there some separate recognition of research outputs like data, like code,

9:06 like protocols, some way of assessing those more robustly?

9:12 We've also delivered these curation services that I mentioned at Springer Nature, so I've been quite heavily involved with these all the way through, both at the flagship data journal, Scientific Data, and separately in Research Data Support, something we previously ran, and data notes publication. One of the main things is just around complexity. It's not always easy to share data well.

9:37 Researchers have told us they have challenges in this area. Some of them relate to that wider credit-related discussion. Some of them are really just about how best to organize data, applying appropriate metadata, finding out if there are standards, finding out which repository to use and how to use it. I think one of the big takeaways there is that a lot of these are easily solvable issues, but the complexity arises in who provides the solution, and at what point.

10:04 And we've seen a lot of movement recently to actually improve access to the solutions that help data sharing for researchers in a better way.

10:14 Yeah, I mean, that makes a lot of sense, right?

10:16 I'm coming from a purely data repository view of the world.

10:23 I feel that we have been very lucky in comparison to traditional publication processes, in that, you know, Springer Nature is hundreds of years old. So it's got lots of legacy content, lots and lots of content it has to deal with.

10:42 Whereas we can solve the tech problem relatively easily, but it's more around the cultural side of it, the people, and understanding the authors having these changes.

10:53 And I wondered, Maria, on your end, are you seeing similar, um, considerations with authors at BMC? Are they having, do you think the same problems throughout research and publications these days?

11:10 Yeah, definitely the problems, the challenges that Graham outlines.

11:16 I definitely see them at BMC. It's mainly a biomedical portfolio.

11:22 So this brings up additional challenges about sensitive information, that we tend to have more patient information, more genomic information.

11:32 And the challenge is how to share that.

11:35 So, researchers obviously, have to comply with the legal requirements of their country, and the ethical review boards.

11:45 And so, it's really a question of how we can partner, how we can help, and how repositories can help to enable sharing data when it's confidential, or whenever it reveals patient identities.

11:59 So, there are various things, like ..., working with and recommending controlled-access repositories, and as alluded to, we also have a helpdesk to try to do that, but actually it is quite challenging to do this nonetheless.

12:16 This is a legitimate challenge, or it appears like a barrier, although in most cases there is a solution for robust data sharing.

12:29 I'd also like to echo the point about credit.

12:33 This comes up over and over again, and actually in different countries.

12:39 In every talk that we give, it comes up very, very highly: the lack of credit

12:51 for the amount of effort that goes into sharing that data. Hopefully this will change quite a lot with the funders as they mandate it as well, but it's also about us as publishers.

13:04 How do we encourage people to cite datasets? Well, we have that in our policies already.

13:11 But it's actually quite a cultural change, even, to get people to cite a dataset. We're used to citing novel research papers, but we want to change this. We monitor it.

13:23 We can see it, and it's increasing. The more that we start to do that, the more it starts to be normal, and I think the more we begin to tackle the challenge of giving credit and citations.

13:37 Yeah, you have that with this. You know, I make sweeping statements about, oh, we can solve the technology, no problem, and it already crumbles on the first interrogation, right? But there's this idea, and the European Commission has a great line: as open as possible, as closed as necessary. Which is, you know, not everything can be made available. And,

13:58 you know, with the push from folks like the NIH, obviously, it's more skewed towards the life sciences. You see it with medical research. And I imagine, Erika,

14:07 it's a similar problem across the different publishing sections of Springer Nature, that, you know, there is a life science theme to it and that brings in more complications, and so there is

14:24 a publisher set of issues that need to be worked through in order to try and support this. Some people could say, hey, you know, publishers don't need to do this, but, obviously, we've seen, writ large, that publishers have stepped up and gone above and beyond. But it's a lot of work, right?

14:43 Yeah. No, I fully agree with the points that have been made.

14:46 I think one of the key issues for publishers and, you know, getting around these policies, is that they resonate with us as much as they do with the research community or the funding community or, you know, the governments, really. In the sense that, I mean, we've been doing this for many years, as you said, Mark, right? We have a long tradition of peer reviewing manuscripts. And yet we are all aware that things, obviously, have to change, so we can do things much better than we did 150 years ago, or 100 years ago, because of technology, because of policy, and because of awareness.

15:22 So things that come out from, you know, mandates from funders such as the ones we've been talking about, the NIH and the OSTP memo recently, are, in a way, very aligned with our own thinking as publishers: that, you know, receiving a manuscript, which is a static view of a research story or research finding,

15:45 peer reviewing it, and then publishing it is only tapping the very top of the iceberg, right, of what that finding is really saying.

15:55 What is really the underlying, you know, science tools and resources that we're making available through this finding?

16:03 So, you know, publishers like us, and many others have been thinking for many years now as to, you know, how do we really make that publication more attuned to what research is now? And a lot of the research can be shared digitally in a different way than it was before.

16:21 So now, obviously, you're thinking about, you know, datasets: you can have a repository online that provides you

16:30 some of these datasets, or not all, you know, given certain limitations on sensitive data. The same goes for code or protocols.

16:41 You know, the idea that research generates all these different outputs, and at the end of the day the paper is just this story that is basically backed up by, or relying on, these outputs, is very much accepted now.

16:58 So, you know, for us, I think, and many others, it's about thinking about how we embrace those policies with new workflows and new systems, so that we can embed into the publication,

17:15 you know, cycle other outputs, and hopefully add value to them.

17:21 The way we feel we add value to the paper, the story, is through things like peer review, management of peer review, you know, editing and all this stuff. We feel that if the data is an essential part of that paper, we can help authors share that data, even make it available to the reviewers, and complement the review of the paper with actual access to the dataset.

17:48 And, you know, ensure that at the end of the day, when we're releasing all these things, and when researchers feel, OK, this story is ready for prime time and to be put out there as the final peer-reviewed version, it is accompanied by the actual outputs that will ensure,

18:11 you know, that the data is there for others to see, which then leads to all the benefits we will cover as well in this talk, around reproducibility, transparency, as well as something that Maria and Graham touched on, which is, you know, expanding the credit landscape in science. So you're not just being measured by this one thing that you produce, the article at the end of, you know, a five-year project.

18:39 You're also recognized for the dataset you created, or the code you shared, et cetera.

18:46 Yeah, and, I mean, my own personal view there is, you know?

18:52 Working in academic tech, you hear a lot of people saying, well, if publishing was invented today, it would look like this.

18:58 But it wasn't. And there's a whole legacy of incentive structures that are going on. And the paper is always going to be the context, the king or the queen. The understanding of the story is: here's what I did, and here's what I found. And the data and the code is just: and here's how you can help yourself add to it.

19:17 I think you mentioned the funders. When you speak to researchers about this, or in the State of Open Data, as we do, I kind of get the feeling that they feel it's just happening to them. You know, the funders are just being told, sorry, the researchers have been told, you'll need to make your data available.

19:37 And I think it's heavily commendable that the publishers, Springer Nature, come out with policies for reasons of improving research before the funders make it happen.

19:50 But I also get the feeling that the publishers have one of the carrots there already.

19:56 Which is, if we tell you that you should be making your data and code available, you also have kind of a carrot, which is: we won't publish you until you make your content available in the ways that we want.

20:10 Because of the way that the traditional academic, you know, how you advance your career, works,

20:19 I feel that the researchers probably don't push back on publishers as much as they push back on Twitter about funders. Would you agree with that?

20:32 Or would you say, that you get just as many questions, and just as many concerns from researchers about all of the extra stuff they have to do? And they're all very busy.

20:41 You don't have to I'm happy to take that.

20:44 No, I mean, honestly, we have a lot of tradition. Graham, you know, can probably talk about this much more than I can. But, you know, certain datasets we have been mandating for many, many years. I myself, when I was a PhD student, had to make my microarray data public. It was required. It was that experience, as you just described, Mark: you go into the mindset of, oh, now I'm ready to publish.

21:08 You know, you look at the guidelines of the journal, and, you know, this is a requirement, so you get that done. It actually won't let you submit until you've got that thing done. So it's a natural thing for us, as publishers, to be, I don't want to use the word lightly, but gatekeepers, or at least sort of supporters of those policies that we ourselves also believe in, right, that are necessary to make the paper more useful.

21:34 So it totally makes sense to me that, you know, we take responsibility, and also, you know, the opportunity, like I said before, to add value to that process and support authors. But it doesn't just have to be that way.

21:52 You know, personally, I also feel, when we look at other outputs like the preprint, for example, we see in some communities like the physics community that the publication and the preprint have run in parallel for many, many years without really colliding with each other, and flourishing perfectly in their own niches, right? You can publish the preprint and then you can also get published. And both things are, you know, enriching science in some way. And I feel for data, and certainly for code, that is also happening, and that is also going to grow, and that is also very positive. You know, some authors have a lot of experience with code in particular, which some people bundle with data.

22:33 So I think I'm OK mentioning it in this webinar. But, you know, there is, obviously, a lot of tradition of sharing code as it's being developed, and having it sort of reviewed and checked by the online community as it's being developed, as a work in progress. That is perfectly wonderful. It's a way to sort of open up your research early. It doesn't in any way hinder your capacity, when you're publishing a paper that uses that code, to think about how you're going to share that code. And there are some nuances that are important, right? You don't necessarily want something that's just living and changing, and can be removed, et cetera, attached to the paper, because we think of the paper as sort of a version of record. So there are going to be some things we, as publishers, care about, and have to work with our authors to say, great, you've already shared. Now, let me help you think about sharing in the context of your version of record, because it's a little bit different.

23:26 So I still think there's going to be some support that we need to lend, whether the researchers are deciding to share the data from the beginning, you know, here it is, as I go and do my experiments, I'm sharing them with the world, or whether they're waiting for the final paper, and then they're coming to us and saying, OK, I'm ready.

23:46 What do you guys need? I can put the data here, or I can, you know, publish the code there.

23:52 But it's certainly something we're all going to learn from, right, as mandates

23:58 really, you know, make this real for researchers.

24:03 Yeah. And I mean, the way I always think about it as well is: what are the files you're producing? When people say, well, it's data, you will speak to, you know, a humanist, and they say, I don't have any data. And you say, what are you producing? Well, I do have 14 hours of video footage. That's your data.

24:18 It's just nomenclature, right?

24:21 And the same is true of code.

24:23 It's also interesting that, of all the outputs shared across all of the Figshare infrastructure, I think eight out of the top 10 most cited things are software and code, because just naturally, people describe it well for re-use. They're very used to that practice.

24:40 And I think sharing, you know, taking bits from different communities can really help people get around this.

24:46 And, as I say, for one of the legacy rules around publications, the one great thing about it is, for me, again, in the data world, that referencing papers, talking about papers, makes it very easy to explain when someone says, I'm going to need to delete my data now. This is the published version of record; you can't just delete your papers, right?

25:09 Think of it like a paper. Think of it as the published part of the scholarly record.

25:15 And, as I say, I think publishers should be commended for taking on more work to add more value, with a view to pushing things along

25:27 when it comes to research and fast but good publishing and preprints and everything like that.

25:34 But when it comes to policies, as you referred to, there have been policies at Springer Nature for a while now.

25:45 Graham, perhaps you could talk about how they came about, what they are, and what the impact of them is. Have you seen them have an effect? Do people follow them? Do people listen to you?

25:56 Yeah, definitely. I mean, I would just build on the points that Erika ..., I think very useful in putting those mandates

26:03 in context, which is that what we're really looking at here is a process of value change. And the most effective way that behavior change, culture change, actually takes place is through communities. And, in fact, with the communities that we've been talking about, genetics and genomics, that was how it started. The publisher mandates followed the community norms, and that's a lot of the time what we've been trying to do as a publisher: to reflect these community standards, these community requirements. Whereas if you take a very top-down approach and try to apply everything across all scientific disciplines at the same time, saying this all has to happen in the same way, that can be problematic, I think, and that can go wrong. And really, that's been a lot of where the nuances come in.

26:46 And it's about how to actually make this relevant to specific researchers, but to actually have something that can be measured, that can be tracked, and that really supports your overall ambitions.

26:59 So, Springer Nature's research data policies right now.

27:04 So we have over 2000 journals that are now on a research data policy. And these were introduced in 2016. So this was a standardized, tiered set of data policies that really paved the way for a lot of other policy frameworks, such as those created by the Research Data Alliance. And we've really seen that growing. So, in the past five years, that's a doubling of the number of journals with a data policy. Really, what we've been focusing on is moving towards a policy focused on transparency.

27:38 And really that is to require data availability statements and, in certain cases, to enforce deposition of data where it's really expected by the community. So, again, we might be talking about nucleic acid sequences there. And what we've really seen as the impact from that is a lot of studies coming out recently, and a lot of initiatives, based on just the availability of this data about research data. Having all these data availability statements has allowed us to analyze what data sharing practices actually look like across journals, which we weren't really able to do before. Now, we can actually say, OK, well, this percentage are sharing in these repositories. You know, "data available on request" is still present in these places, and we can compare and contrast across these.

28:24 One of the great strengths, I think, of having a policy framework like this is that it allows you to then build on it. It's not the be-all and end-all. And, in fact, if you look at the policies we introduced initially, a lot of them simply encouraged data sharing, or encouraged data availability statements. If we move to the place of transparency here, that's really reinforcing the fact that what we're aiming at is openness, and data really supports open science, supports integrity, and supports transparency in that respect.

28:54 And it gives us a platform on which we can do things like, for example, adding repository integrations to our systems and saying, this doesn't just enforce; it helps provide a practical solution for how you can actually comply with those policies. And we're doing similar things with code and preprints.

29:12 Yeah, it's those little nudges. I think as well if you can.

29:17 If you can meet the researchers where they are, as I say, at the point at which they're publishing a paper, ideally they'll have thought about their data and their code before that, but at least you can have a snapshot there, and say, hey, this is the private-to-public-domain kind of point. I'm just going to check in, because I know we have got a lot of people on the call.

29:37 Laura, I didn't know if there were any questions that have come through so far? None as of yet, but I'll keep looking.

29:47 So if anybody does have any questions, we'll try and sneak them in.

29:51 Just write in the questions section on the webinar, and let us know a little bit about where you're coming from, if you're a publisher, and what have you.

30:01 I think this idea of following on from what you were saying, Graham about having policies and nudging people into better practice is, is fantastic for moving the needle. Right.

30:14 It gets people thinking about it, it gets people pushing on it.

30:18 I mentioned at the beginning this idea of, you know, "data available on request", which is one of my pet peeves.

30:24 And Maria, you mentioned before, obviously, biomedical.

30:32 It's taking on more work as a publisher by saying we're going to help you, support this work.

30:40 I was saying about integrations, with repositories and nudging people into best practice, that that's, you know, a lot of work. That's a lot of stuff to do.

30:50 And so I guess you have people in the medical fields who struggle with making their data available when it shouldn't simply be made open.

30:59 You know, it is sensitive data. But also you'll have people who say, my data is available upon request, because I can't be bothered, I'll never reply to your e-mail, kind of thing. Not that I'm suggesting that happens with everybody; everybody's well intended.

31:12 But do you see that as a point you've got to? Or is there any way in which you think we can nudge better behavior from researchers?

31:22 Or is it all just going to be culture as well?

31:26 BMC was one of the first to mandate a data availability statement, and that helps us. Once you've got a policy in place, that means that you can check compliance.

31:38 So we check every manuscript: is a statement there? And at least, you know, once you've got a statement, you can then begin to examine it. Is it plausible that the data is available on request? Is this legitimate in this circumstance? So it does allow us to question, and it also helps reviewers, which is really, really important.

32:04 Again, looking at the genomics community, you can see that reviewers will ask for the data.

32:10 And this is really important.

32:12 I think this is going to be a partnership between us, making sure that these statements are available, and the community asking for and expecting it.

32:22 As Graham says, we've got 2000 journals

32:25 now that have a policy, and ... to get every single journal at Springer Nature, it will take us a little while until every journal

32:35 will have a data availability statement. Once we've got that, we can then start to think, you know, what's the next step.

32:41 The first thing is making sure that everybody says where the data is, and then build from that.

32:48 Yeah, as I say, we've been chatting this year about Figshare being 10 years old. It's good to look back, and the thing I've been thinking about is that the last 10 years have been about putting files on the Internet.

33:03 How can we encourage researchers to put their files on the Internet. And then the next 10 years is how do we make those files useful?

33:09 Or how do we move the needle on from there? And, you know, what you were saying about genomics as well: some things have

33:18 got to move fast because it makes a lot more sense, you know.

33:21 The human genome was sequenced in 2001. It's relatively new science. It's all computational.

33:27 I heard from SPARC that when they were talking to the different governments, even when they were speaking to the Trump administration, they said, yeah, they were all for open data, because you just put a dollar value on making the genome openly available.

33:42 Yeah, jump, Don't take books. more money, OK. So it's, it's a multi-faceted thing in terms of how deep it goes.

33:50 Graham, you mentioned the policies that you have at the moment.

33:55 Do you think there's going to be development in the future across Springer Nature?

33:59 Obviously, it's a gargantuan task to get 2000 journals to sign up to some level of consistency. Is the aim to push it further in the future?

34:12 Yes. And I will tell you the position we're in. So, of those 2000 journals, around half of them already have a policy that requires a data availability statement and enforces certain data to be deposited. That's really the direction we're moving. And I think, as Maria alluded to, that's where we're going with this. We are getting to a point where we can see all of our journals moving towards that requirement for transparency around where the data are actually shared. And that's, to some extent, I think, what the future is for data policies, as now we've rolled out a framework which has really allowed different research areas and journals to take on a data policy where they weren't previously able to.

34:57 Now we can actually move to slightly ramping that up to a certain extent and saying, now we can actually simplify this, make this more straightforward. And I think that also ties into that

35:09 future. I think the future of data sharing in general is better integrated solutions, something that's a bit more user focused. We've really focused on best practice a lot as a community, I think, which is great. Now it's starting to be married up a bit more with the user experience, making it clearer for authors, and actually tying together a lot of these outputs.

35:27 They've almost worked in isolation to a certain extent.

35:32 But we've started to look at things like data and code in parallel, as well as in parallel with protocols and preprints, really looking at that whole package of what a researcher is creating beyond just the research article itself.

35:47 Yeah, I mean, I love this idea. I'm a big fan of persistent identifiers for everything, because of this idea of, you know, funders being able to have this level of equitable tracking.

36:01 The people who are applying for grants: you know, this person with this ORCID applied for this, got this grant, and used the funding to create all of these outputs, whatever these outputs are.

36:11 And they're at this organization, so it helps with, you know, different rules at different funder levels and different government levels, and things like this.

36:18 And, I mean, Erika, you have Open Science Signor ..., which I'm very jealous of, but it, it, data is one part, right? There is.

36:31 You mentioned preprints already. Open science covers a whole plethora of things. We have the OSTP thing saying that by 2026 there are going to be zero embargoes on papers.

36:42 How, how do you think research data sits amongst these other areas?

36:49 And how can you... there's only a finite amount of people who work for companies that can achieve these things, right?

36:58 So I think of it like the researchers: you know, data is on their radar, but so is getting an ORCID.

37:04 And if you're in the UK, submitting your papers for REF, and all of these things. So where does it fit in your world? And how do you balance those?

37:15 Yeah, It's a difficult question to answer, but you know, I'll try my best.

37:18 So, I mean, I think from our perspective, and this is a little bit of what I've been doing over the last few months, as a publisher we are looking at all these different outputs together, as Graham was saying.

37:32 We, we, and many others, I think, in the community, had been more focused on one or the other in the past.

37:41 So, you know, data has been probably the leading output in terms of policies, and we're taking this very progressive, stepwise approach: well, let's just make sure people are at least acknowledging where the data is in the paper. Then let's try out providing, you know, some support for authors to actually share, using repositories that we offer at submission, which is kind of a next level, where you've got the policy.

38:11 Now, let's give them some support and see if that sort of helps compliance and helps everybody follow best practices.

38:19 So, you know, you're giving them a ready-to-use sort of system that will give them the permanent identifier and will do all that for them, so they don't have to worry, because this is the problem.

38:29 If we're expecting researchers to now, you know, get a PhD in open science in order to fulfill all their funder mandates, then we're really screwed. So we need to make it, you know, easy and kind of agnostic to their expertise. We lose sight.

38:47 You know, this is the best practice here, here's how we can help you. So with data, I feel we're most advanced: data and code.

38:55 But the way we want to think about it is, you know, since we're all making an effort now to think about open research more widely, right, and that came out of the UNESCO recommendations, and a lot of different institutions have put together documentation recently about certain aspects, like open access: why limit yourself to open access, right, to just the paper? Let's think about the data. Let's think about the code. Let's think about the other elements of research.

39:23 So that's what we're doing at Springer Nature as well: taking a step back and thinking, you know, where do we want to be, as a publisher supporting our authors and funders and the community in the full open science sort of space, in the next few years? That doesn't mean that we're going to take the same approach with all the outputs, right, because, obviously, different outputs are different, have different requirements, and are at different stages within the community. So we are looking at, you know, as Graham said, primarily those four outputs, the data, the code, the preprint, and the protocol, as well as the peer review file, as important outputs that sort of live alongside the paper.

40:06 But, we will take, you know, specific approaches for the different outputs that are needed.

40:12 So, I see data in that picture very much as, as, as taking the lead, because it has so many advantages to also inform the community of the value of that effort. Right, With something.

40:28 I mean, I believe that exists also for a protocol. Don't get me wrong, it is absolutely the case that if you go through the trouble of creating a protocol, it will have immense re-use value for the community at the end of the game. But, you know, doing so with data seems more achievable because of the repercussions and reproducibility and transparency that it brings as well, Right.

40:49 It's obvious to everybody that if you have access to exactly what data went into that figure, you achieve a level of transparency for the paper that is very desirable. And yeah, and then, you know, we have partnerships, like with Figshare, right, where we have a long tradition of working together. And we're more advanced with that than probably anything else, to try and envision how that next step of support could really play out, and the work of Graham and Maria.

41:22 But, you know, we've now integrated some of this in a very, sort of, user-centric way, where we're trying to make it sort of easy for the authors, as they are coming in, to think about sharing, and supporting that sharing in a way that is, you know, more embedded into how we work with them.

41:44 Yeah.

41:45 I think one thing there, as well, is that you mentioned, like, five things, like peer review, and we're gonna talk about peer review. But, you know, there's the complexity of code sharing, and even what that means. You know, as I say, if you've got a little R script, or a little containerized thing that can be shared, it's relatively easy.

42:03 But I think there's also this kind of people see data journalism and executable files within you know, Guardian news articles and they think that's what academia should be like. And it's like, yeah.

42:17 But they have one consistent... this is one page. This isn't 10000 labs around the world, using 10000 different code dependencies. And so even that, as moving the needle, I think is going to be an awful lot of work going forward. And I know that probably takes up a lot of your time as well.

42:39 So as we think about where we're going and how things are moving forward, um, we have the state of open data that we are very happy to do with Springer Nature. It's got another, you know, more than 5000 respondents this year. It's a license queue but it's been great to track these changes over time.

43:04 I think one thing that I found interesting was this idea that researchers are positive around open research and open science, but the numbers around the positivity are starting to dwindle a little bit.

43:22 I don't want to ever put a negative spin on open science, but I think it should be talked about.

43:29 My hypothesis would be that this is down to the amount of work that is being put on researchers.

43:36 I mean, Maria, I know you've worked in open access and open publishing for a while now, and so you probably have similar feelings around it. It's all for the good, but are you concerned that the burden on researchers will become too much? Or that there'll be a pushback against it?

43:55 My feeling is that the funders are on board with this, and that we'll see a change over time. At the moment, the funders are encouraging, and strongly encouraging.

44:08 And I think as they start to absolutely mandate the sharing of the data, the reality is that there has to be some sort of solution that's easier and scalable, and multiple solutions and multiple partnerships that are available to do this.

44:29 But the size of the requirement to share, I think, will start to drive change in this area.

44:38 And one area that I think might well be a very important change is that the humanities and social sciences are actually named in that memo from the White House, and we are careful to say we are used to certain types of data that we see, you know, over and over again.

45:00 And it will be interesting, I think there's a whole new area, I think now, What do we expect to see in the humanities and social science as well?

45:08 Like, how do you share qualitative research? How do you share surveys? How do we respect confidentiality?

45:18 So I think that's an awful lot of work, a lot of thinking to do, and a lot of partnerships that will be needed.

45:27 Here, I think it will take us several years to achieve this.

45:33 But I do think that, with this amount of pressure for change, you usually suddenly see a whole change in the landscape when you have this sort of pressure applied.

45:46 Yeah, and I suppose the flip side to this is that, you know, 10 years ago, when I was first starting out in this space, my friends,

45:56 my colleagues were saying there was a bit of a negative sentiment about open science. You know, I'd say to my postdoc colleagues, you know, are you going to share your data? And it was, why would I share my data? Then I can get scooped by my competitors, right? So the fact that that has moved on and we're past that point now... you'll still get that, I'm sure.

46:17 But there's the growing sentiment around it, the drive from publishers, funders, institutions, everybody. UNESCO, as mentioned. Everybody is looking at this, combined with things like the Sustainable Development Goals and how we can, you know, solve the big problems. I think everybody looked at,

46:40 you know, COVID, which is a great example for open data, because you can now see why open data helps us move faster, so we can get to a vaccine, as if we don't have other monumental problems to solve as humanity, like climate and all the rest, right?

46:56 And there is, in the State of Open Data, something on the humanities.

47:01 One of the reports is that several publishers have done a survey on humanities data. Blacksmith background has evolved. And F1000, Taylor & Francis and Wiley, I think, are the three, so that is coming out as well, soon.

47:19 Graham, I know you've seen the results of the State of Open Data. Just to tease it to everybody: was there anything that surprised you in the results that we're seeing so far, before it's published in the next couple of weeks?

47:33 Yeah.

47:33 I mean, I would say, in terms of that drop that you mentioned in maybe some of the positivity around open science, there has been a corresponding increase in, you know,

47:44 the level of concern around misuse of data, which I think was something also very strong during the pandemic.

47:51 We saw this real focus on speed of publication and we saw some related rise.

47:57 And perhaps some of the concern is about those just being misused, misinterpreted, and, you know, some key cases in point relate to, you know, bogus treatments for COVID, or just research malpractice. But that said, I think the overall push and drive towards open science is still stronger than that, really. It reinforces the role, I think, of quality in research and integrity in research, which is something that we will have a role to play in. From the State of Open Data, I think some of the other interesting results which I've seen are that some of the key motivators picked out by researchers, for why sharing would be important, are all still related to articles: things like the visibility of the research paper itself. I think that gives us, as publishers, quite a unique position to be in, and really reinforces the fact that tying these two things together is really significant. Making it a part of our systems and part of our experience for users, there is a really relevant place to do that.

49:05 As well as answering some of the questions about what does data re-use actually look like?

49:10 And there's been a real focus on that. I think a lot of the initiatives that we as a data community have come up with were initially formed around the author, so formed around the data

49:22 producer. Now we're thinking a bit more about the data consumer, or the reader, as you would put it from the publisher's perspective. We are thinking more about, what does this actually help in? Like, how re-usable is this data? And is it actually being re-used? I would say some of those results that we're seeing really sort of back that up; there is that link there.

49:41 Yeah. On my, on my end.

49:45 You know, one of the great things about it is, we've been doing it for a while. It's not starting at zero in January with the NIH mandate, right?

49:54 It's been happening for a while, and so I think you can start to demonstrate to people, you know, that there was a study that came out that said, if you share your dataset in a repository associated with a paper, on over half a million papers, it was associated with a 25% increase of citations to the paper.

50:12 So you can speak to researchers in the language of their career, right, which is citations and impact.

50:21 And the really interesting stat I found was 75% of researchers who responded said, you know, I still feel we're not getting enough credit. Of those people who'd shared data, 66% of them had said, I have received some impact, whether it's, in the majority, in the form of a citation, or a collaboration, or something.

50:43 So we want more. I hope there's more coming. I'm sure there will be as the funders start tracking this more.

50:50 But at the same time, we're already seeing that people are benefiting, and it's not just stick, stick stick, The carrots are there as well.

50:58 I'm just going to check in with Laura again, and see if there's any questions that have come up from the audience.

51:04 We do have a couple. So the first one was for Maria: really interesting to hear that the reviewers are actually asking for the data. Do you get the impression that the reviewers are actually reviewing the data and analysis alongside the article?

51:21 Yes, definitely. It tends to be field specific.

51:25 But yes, I will say it is most striking amongst genomics that they will ask

51:32 for the dataset. And they will look at it. Not all of them.

51:39 There's a very high awareness of the importance of sharing, and people do, on the whole, have the accession numbers present in the manuscript.

51:46 And you can see lots of comments will come back: the data's

51:52 there in the repository, but it's not available, you need to change that.

51:56 But it is reasonably common that people will actually ask for access to it.

52:02 And actually, why wouldn't I'm actually insistent on, say in that data that's how I started encouraging. Is that everything aspects of community.

52:11 And I really appreciate that reviewers take their role seriously.

52:16 And this shift the NATO. But the next.

52:20 Great, thank you. Next one up is for the group, so it would be good to get all your thoughts. It's a question from a gold OA journals specialist: I truly believe that compulsory sharing of data at submission is the way. Can you think of the biggest cons of such a compulsory policy?

52:42 I mean, I'm so sorry. Yeah.

52:46 Um, I can tell you first. Initially, I agree that that is the endpoint, really, that we want to get to. I think where I would have a bit of a challenge on that is whether you can just implement that straight off the bat. Because I feel like this relates to the conversation we were having previously about 'data are available on request'. And I think I've said this previously in other forums that we've had around it: ultimately, getting to a point where authors are not stating that data are available on request is the way to go. But I feel like it's a symptom of a wider problem, which is that if you've got to that stage, and you're only sharing data on request, it's symptomatic of a lack of research data management up to that point.

53:27 And it's a similar thing if you were saying you must share your data at this point. The point of submission of a manuscript is probably a bit too late to be thinking about doing a whole process of research data management.

53:40 That said, I think there is a role for mandates to play, but they have to play a role within the wider context of community change, that culture change and behavior change within the community, really. So, yes, I would agree that getting to that point is what we would want to do.

53:57 I would disagree with actually putting it there straight away, because that is also going to be off-putting for researchers. It's going to potentially overload researchers with requirements when really they need to be going through a process to actually get to that point.

54:15 As far as the cons, really the biggest is that people will want to publish somewhere else, with less stringent data availability requirements, and there you lose potentially good manuscripts. But the data sharing is important.

54:36 So, you know, if you have a policy, you need to stick to it. And I guess for the author, to some degree, there is a risk of being scooped if it's a longer peer review process.

54:51 But there are ways, just like a halfway house, where you can make your data available to reviewers, and the data there can be tested and checked, and then really, it's that synchronous publication.

55:04 When you publish a paper, the data should be available on that very first day of publication, in an ideal world.

55:14 Yeah, I was gonna mention a similar thing as Maria, you know, as a con. In our experience, for example, with code, where we have mandated code sharing for peer review at submission

55:27 for papers that were, you know, primarily focused on a computational approach, at the Nature journals for many years now, since, I think it was 2007 or so, when we started doing some of that at Nature Methods.

55:44 A lot of the con that comes with that is that, you know, there are obviously people that don't want to publicly share at that point. They're not ready to put it out fully in the open yet, even though they perhaps will agree to doing that at the publication stage, or will agree to some level of controlled access, etcetera, depending on what kind of data or code we're talking about. You know, if we just navigate the world of at least making them comfortable sharing it for peer review, and have the right systems so that we can, you know, support them in sharing that in a private way with the peer reviewers, have the peer reviewers be able to check it and verify that, you know, the data is there and the claims are supported by that data, then, you know, the next step of making it public becomes easier for certain authors that are not ready to. So that's something we have experience on.

56:41 It's just, obviously, it requires resources, and it requires the right platforms to be able to have the data in this sort of more private setting until it's released. So, in general, I think the con is really around support, until we're ready to support everybody with this, right? You don't want to have a mandate in your journal that says everybody has to share, and, you know, then you're gonna favor certain individuals and some institutions that have a lot of facilities to do all these things and can just get it done. And you're gonna, you know, make it much harder for certain individuals that just don't have those resources, or don't have that support in their institution, or, you know, whose community just doesn't have those repositories obviously available.

57:28 So, from our perspective as a, you know, journal publisher, I would say, if you've got the support, by all means, you know, make it mandatory, right? If you're offering them a way to do it, then there's nothing stopping you from saying, well, it's a requirement, and, you know, I'm supporting you in doing this, and I'm actually confident that we can navigate your specific case if you have one. So we've done that with, you know. And so that's where policies can be 100%: when you have full support and experience to come to the author and say, you know, we will walk you through this, but you will have to share.

58:06 Thank you all for those answers. That's all the questions we've got at the moment. I don't know, Mark, if you have any closing thoughts for everyone. I think we sped through all the time we had available, so thank you very much to everybody who has come and joined us.

58:22 I really think it's fantastic to see

58:28 the amount of thought going into it, as an organization, as a publisher that is one of the biggest in the world, in terms of scale, in terms of prestige, in terms of everything else. The amount of effort that needs to go in, just to think through these different things, as you were saying, Erika, as I always say to you, Erika, on top of all the other things that are happening in the academic publishing landscape. So I want to thank Erika. I want to thank Maria. And I want to thank Graham for fantastic comments today, and conversation. And I'd like to just close by saying, I think it's, you know, it's not a step change.

59:03 We haven't started today; it's been going on for years. It's going to continue to go on. It's just, the snowball rolling down the hill is getting bigger and bigger, and the momentum is growing. So there's hard work ahead, but we have the right people in the right places to make that happen. And it takes a village. So thank you to everybody for joining. And thanks once again to all speakers.

59:23 Thank you. Thank you.

59:25 Thanks. Bye, everyone.

‍
