September 28, 2021
Connie Clare, Kees den Heijer
In this webinar, 4TU.ResearchData's Connie Clare and Kees den Heijer will share the processes for both researchers and data stewards for ensuring that data uploaded to 4TU.ResearchData is as Findable, Accessible, Interoperable, and Reusable (FAIR) as possible for optimum discoverability and reuse. For more information on 4TU.ResearchData — powered by Figshare — visit https://data.4tu.nl/.
Please note that the transcript was generated with software and may not be entirely correct.
Hello, good afternoon and welcome to the FAIR Data on 4TU.ResearchData webinar. My name is Megan Hardeman. I'm Head of Engagement at Figshare.
And before I introduce and hand over to our presenters this afternoon, I just wanted to go through a few pieces of housekeeping, and the first is just to say that this webinar is being recorded and it will be shared with all registrants after the webinar is complete.
And if you have any questions, please feel free to put them in either the chat functionality or the questions box. And we'll have some time at the end of each section for the presenter to answer those questions.
So, our presenters today are Connie Clare and Kees den Heijer from 4TU, here to present the processes 4TU uses for ensuring that data uploaded to the repository is as findable, accessible, interoperable, and reusable (FAIR) as possible for optimum discovery and reuse. So, with that, I'll hand over to Connie.
Thank you, Megan, thank you for the introduction.
I'm just gonna paste a couple of links in the chat for anybody that wants to be able to access the slides. They are available.
I'm hoping everybody can see the chat now, and I should be sharing my screen already. So, with that, I think we can get started.
So we're talking today about FAIR data on 4TU.ResearchData.
So a brief introduction to myself. My name is Connie Clare, and I have a background in developmental biology; I studied at the University of Nottingham in the UK.
Then last year in October, I joined the team at 4TU.ResearchData as a full-time community manager. So that's building the human infrastructure around the technical infrastructure. I'll be presenting during the first part of this webinar, and then following on from that, Kees den Heijer, who is our software developer at 4TU.ResearchData, will talk in more detail about the technical infrastructure to support researchers with FAIR data.
So the goal for today is that we'd like to share 4TU.ResearchData's aims, objectives, and perspectives on FAIR data.
So as I mentioned, I'll be talking about building a community for FAIR data. We'll talk a bit about the organization, why we need a community, the members, and what we do together.
Then Kees will talk about creating FAIR data and software, reproducibility, and some of the more technical aspects of the repository.
So, building a community for FAIR data. What is 4TU.ResearchData? To give you some background about the organization:
It's an international data repository for science, engineering, and design disciplines. And as you might have guessed, we use Figshare as our technical infrastructure for that.
And you can access the repository, using this link here, to the data portal, and the repositories available to researchers all around the world. So any researcher, in any institution, can actually deposit their data and software code in 4TU research data.
And we aim to support researchers with the curation, sharing, access, and long-term preservation of research data. We also have the CoreTrustSeal, which means that we're a trusted digital repository.
We were founded more than a decade ago now, back in 2010, as part of the 4TU.Federation. 4TU stands for four technical universities.
So there are four different universities within the Netherlands that actually drive the agenda of the repository.
These are Delft University of Technology, Eindhoven University of Technology, the University of Twente, and, more recently, a member institution, Wageningen University and Research.
We're really pleased that over the past decade or so, we've had more than 8,000 datasets deposited within 4TU.ResearchData.
And this just shows you a snapshot of the portal to just show the diversity of different disciplines that we can support.
So from mathematics through to agricultural and veterinary sciences, and from information and computing sciences through to business and management. But the largest portion of our datasets actually comes from the earth and atmospheric sciences, and geography and climate sciences.
So, what we're thinking is that we have this good technical infrastructure, but when it comes to really supporting researchers and providing them with tailored, discipline-specific solutions for FAIR data, we think that we need to work with human infrastructure. So, we need a community.
We realized that having good infrastructure on its own isn't enough if we want to really support researchers in making their data findable, accessible, interoperable, and reusable. So, we need to work with and invest in experts: data stewards, data managers, research software engineers, for example.
We also appreciate that research communities and disciplinary problems cross institutional borders, and we're all trying to tackle very similar things when it comes to engaging researchers with data management.
And that's why, as a consortium of technical institutions, we believe that we're stronger together and that we can collaborate in solving these problems.
I think it's important just to reflect on what we mean by community.
And I've adapted a definition from an organization called the CSCCE, which stands for the Center for Scientific Collaboration and Community Engagement.
So, when we think of community, it's not a group of disconnected disbanded people, but it's actually when people with a shared purpose come together to achieve goals and tasks that they wouldn't otherwise be able to do on their own.
It's also a space where information flows in multiple directions.
So it's not just coming from the same place all the time, but we also like to think that we're creating a Community of practice which provides opportunities for learning for members.
So, the idea is that members can come together, be part of the community, and learn skills, and, and various techniques that they can then put into practice in their daily working lives.
And community also means that members foster meaningful connections with one another, so they feel like they belong within the community. They feel included, and encouraged and empowered to contribute.
So this is one of my favorite slides; it shows some of the individuals who've come together in the first year of the community and helped to develop what we do.
And you can see some of these member profiles on our community website with the URL here.
So, who are our community members?
Just to show you a breakdown of the composition: at the moment, we have around 41% who are researchers. That ranges from master's and PhD students through to assistant and associate professors.
And just under a quarter of the community are full-time data stewards working in various different institutions.
Then, 12% fall under the broad umbrella of research support. And, as you can see from looking at the members' bios online, we have information specialists, an open science officer, librarians, and communications experts, as well as program and team leaders.
And, I think it's also important to mention that we have 9% of the community represented by our technical team, which includes Kees, who'll be speaking today.
And I think it's important that they're there to answer any queries and questions that the researchers and the data support staff might have about the repository.
Then, a smaller proportion, less than 4% in each case, is represented by research software engineers, community managers, data managers, and data trainers.
So, what's the goal as a community?
Broadly speaking, it's to provide a space for these multidisciplinary personnel, researchers, and data supporters to connect with one another and exchange knowledge about best practices for creating and reusing FAIR data: from the beginning of a project and the creation of a data management plan, all the way through to the publication of data and software within our repository.
So what do we do together at 4TU? I'm going to touch upon three main areas. The first one is engagement and education. We're quite focused on bringing members together; we have a few working groups, and also focus groups, to engage members.
And I'll also share some examples of some of the training workshops that we're involved in, as well as demonstrations of how to upload data, for example, using the repository.
And we're also focused on reward and recognition.
So I'll give some examples of how we showcased researchers and data support stories within our community channels.
And how we also have some celebrating data initiatives within our social media.
Then finally, I'll share a couple of examples of Fair Data Incentives.
So we have a fair data fund, and we also have an internship program that we're planning on launching early next year.
So to begin with engagement and education: as I mentioned, we have these working groups, and they run every month. They're primarily for the data stewards at the 4TU partner institutions, so that's Delft, Eindhoven, and Twente.
We established these working groups back in January this year.
The data stewards come together for an hour every month, and then we work on various different activities in between as necessary.
So, the idea is that the data stewards can share experiences, ask one another questions about their role, and learn from one another.
We have data stewards that have been around for four years or so and are quite established in their role.
We also have new data stewards who've started just recently, during the pandemic, so it's important that they can gain confidence in a role that's not always very well defined.
We also want to come together in these groups to contribute ideas and, ultimately, co-create resources that are of benefit to the research community.
So we asked the data stewards what they're interested in, and three working groups have grown and developed organically.
We have one on FAIR and reproducible code, one on privacy and GDPR, and one on engagement and education. I'll briefly talk a bit about each of those now.
So, for FAIR and reproducible code: the idea is that the data stewards can learn about different tools and environments for managing software code, and how to help researchers with writing, documenting, and testing code.
We've also talked about using Git for version control and collaboration, and Kees will talk a bit about Git in his part of the webinar.
The idea is that hopefully we can establish pipelines for researchers to work more seamlessly with programming languages, and to manage and write readable and reproducible code that can then be archived within our repository.
So one of the deliverables that we've recently achieved was a train the trainer workshop.
We ran this in June on writing FAIR and reproducible code. We do have some data stewards who are quite capable and have a background in software engineering, so they served as instructors during the sessions, while 12 other data stewards were learning.
And the idea is that those data stewards can become trainers, hopefully, in the future.
And we wrote a blog article about that, if you're interested in learning how that process evolved.
And we also have a working group on privacy and GDPR.
This is where the data stewards learn how to best advise researchers who handle personal or commercially sensitive data.
So in this group, we talk about collecting informed consent, where those informed consent forms should be stored, how to anonymize data, and any kind of legal ethics or privacy related concerns that we need to consider.
One of the deliverables we're currently working on is a published set of guidelines about various data agreements.
This includes data processing agreements, data transfer agreements, joint controllership agreements, and non-disclosure agreements.
The Final Working group is a more general one on engagement and education.
So this is where the data stewards share experiences about their role more generally, but also develop workflows and training to support researchers with their data management. We have a couple of deliverables here.
The first one that we started to create was an inventory of the various research data management trainings, generic and disciplinary, that are currently being provided by the 4TU partner institutions. The idea is that we can compare, cross-reference, and share training materials among the universities.
And the second deliverable that we're working on is that we'd like to publish a set of disciplinary use cases.
So these are projects that the data stewards have been working on with researchers that really demonstrate discipline-specific practices, as well as research data management challenges and solutions. Hopefully we'll be able to share those with a wider audience in due course.
Also focusing on engagement and education are training and workshops. We do have two data trainers as part of the community.
We also have a membership with the Software Carpentry since 2018, and one of the data stewards at TU Delft is heavily involved in coordinating the Carpentries. For those of you that don't know, these offer basic training on software and data management.
The workshops normally run over a number of half days, and we've been involved in conducting them at Delft in 2020 and also in Leiden in the Netherlands.
And also as part of training, we have a massive open online course on Open Science which is for researchers and support staff.
We also share news about the Research Data Netherlands Essentials for Data Support course, which is open for data support professionals.
We also share any opportunities or developments on disciplinary courses.
The example here is a course that was developed about Python for geospatial researchers.
This was developed by our data manager, Ashley, and a research software engineer, Jose, who are from the ... Digital Competency Center, and they've written a really nice article about the success stories and lessons learned from developing this course and delivering it online during the pandemic.
So moving on to reward and recognition, I just want to now share a researcher story. So this is an example.
This is a story about Nadya ..., who's one of our community members and a researcher from Amsterdam. This is her member profile from our platform online. Nadya researches tropical cyclone risk using computational models, and you can see that she's provided a really nice bio here, as well as a link to her datasets and an e-mail address.
Nadya won the Dutch Data Prize last year for a dataset that was published in 4TU.ResearchData, and the dataset underlies a peer-reviewed publication in Nature.
And you can see it has been viewed more than 2000 times, downloaded more than 4000 times. So when we hear stories of researchers that have these really high impact and high value datasets, we want to know more and really showcase what they do.
So we interviewed Nadya, and we published an article online, and that was also further publicized in our social media channels.
We also share team news on our platform as well. This could be recognition of service over time and retirement articles; in this case, it's a testimonial to our data trainer, Alan Berber Cole, who retired last year.
We also share new community initiatives. In this example here, one of our data stewards, Santos, has developed a data access committee that will align with 4TU.ResearchData and some of the datasets that are published under embargo.
We also share information about cross institutional collaborations.
With this example here, we're working with a researcher from Eindhoven, Mathias Funk, who has developed a database for design data.
Currently, this database, called Data Foundry, holds 200 student design projects.
The students work live within the database, but once a project is closed and the data is complete, we're looking to find some kind of semi-automated process by which they can push the data over to 4TU.ResearchData for long-term preservation and storage.
And we'll talk a bit more about some of our repository features, but we also share these on our platform as well as any news about vacancies and opportunities to join the team.
Also under reward and recognition, I'd like to talk about an initiative that we've devised, myself and our communications advisor, Deirdre Casella, which is called Your Daily Dose of Data.
So every Wednesday morning, we tweet about a most recently published dataset from each of the partner institutions.
So you can see, we've got a biomedical engineering dataset from Eindhoven, a maritime engineering dataset from Delft, and then a geology and geospatial dataset from Twente.
And just to show you the impact this has, here's an example of a dataset that was published by a PhD researcher from the University of Twente. He published this dataset on fatigue detection in runners at the end of March, and we tweeted about it a few days later. You can see the number of views peaked at 70, and the number of downloads at 50.
So I think this is a nice example of how showcasing the community's achievements and datasets on social media can really help to raise the profile and visibility for them.
Then every month at the end of the month we have a feature in our newsletter.
Did you see these datasets?
So this showcases the top downloaded data set from the previous month.
Here's one from the University of Twente, from ..., and one from Delft. And there's a similar kind of example to show you here.
Here's a dataset that was published from Delft; we added it to our newsletter at the end of March, and then tweeted about the newsletter a few days later. You can see the number of downloads of this dataset continuing to rise for up to a few days afterwards.
And you can see that this dataset has now been downloaded more than 2,000 times, almost 3,000 times.
So I'm now finally going to talk to you about the 4TU FAIR data incentives.
We have a FAIR data fund at 4TU.ResearchData, and this is open to researchers from the partner institutions. So Delft, ...
and the call opens twice a year: once in the spring, and the next call will open on the first of October.
This allows researchers from those institutions to apply for a budget of up to €3,500 to prepare and curate their data, or an existing dataset, in a way that makes it available online.
The kinds of activities that this fund can support include the implementation of metadata standards. For example, it might be cleaning up and documenting an existing dataset.
It could be the ... or aggregation of personal or sensitive data.
It might be moving from proprietary to open-source tools or software, or it could be visualizing the data in some way that makes it more reusable and more understandable.
And also, though this hasn't happened recently with the coronavirus situation, it could be attending a conference to talk about a FAIR dataset in our repository.
And in the spring call, we were really pleased that we could actually fund seven researchers from those institutions.
And you can see they come from a wide variety of disciplines from urban design through to biomedical and health, all the way through to engineering and cognitive robotics.
And coming soon, hopefully, next year, will be an internship program.
So, this will be a global initiative which will welcome, hopefully, interns from around the world to come and work with us on various FAIR data initiatives and projects. The idea is that we can advance the mission and vision of 4TU.ResearchData.
And, in turn, this will hopefully allow those interns to learn more about FAIR data and the open science movement more generally.
They'll also have an opportunity to drive new developments and to learn and practice new skills.
Hopefully, they can create publications, such as a white paper, and attend conferences, which can help to boost their CVs, and they can learn about career opportunities within the FAIR space, such as data stewardship or data management roles, for example.
So, I think on that note, we're at the end of my part of the webinar, and the slides are available.
So, you can check out any of these links and I will stop sharing so that then we can go to the Q and A.
I think I should stop sharing now.
Yep. Perfect. Thanks so much, Connie.
And yes, please feel free to put any questions in the question area, or the chat area.
And there's one around recruiting members for your community. You have such a wide variety of types of people, particularly with regard to researchers.
How did you recruit researchers to join your community?
That's a good question. At the moment, it's largely recruitment by myself through one-to-one engagement. It could be the fact that they've published a dataset, or they're asking questions about 4TU.ResearchData.
And I tend to approach them on a one-to-one basis, talk to them about the community, and show them what we do and how they can join.
And, essentially, what they do is they join on our website first and create a bio, and then I invite them over to our Slack channel and show them what kind of channels we have available to them.
In an ideal situation, going forward, it would be nice if some of our core members, particularly the data stewards from the institutions, could actually help to recruit members as well.
That would, sort of, open it up to more people.
So that's the ideal future, I think.
It doesn't look like there are any other questions in the chat currently.
Oh, one just came in: could you tell us how many followers your social media channels have? Your messages look to be effective.
That's a really good question, actually. So we have 95 community members, and we have 150 newsletter subscribers. And I'm just going to see if I can check on Twitter.
I think we have a few thousand on Twitter; let me see if I can actually give you an exact number.
So, we have 2,429 followers on Twitter.
And I'm not sure how many followers we have on LinkedIn.
It's quite nice to see. With the messages, especially with the datasets, we don't always get a lot of retweets and likes on Twitter when we post those.
But we do tend to see that the datasets are viewed and downloaded. So obviously, people silently go in and have a look at the dataset, which is mainly what we aim for, so I'm quite pleased.
There's a question here about saying very interesting to hear that you have an open science officer. Could you elaborate a bit more on the role of this person?
So, as far as I can tell you, they're from the University of Twente, and they coordinate the Open Science Community in Twente.
So they run their own community. I do have a bio for this person, so I could share more information about them, or maybe put you in contact. Otherwise, I'd just say that we have many different job titles, and I'm sure they do a variety of different roles in research data management at that university.
Any further questions about that particular person, I'm happy to answer.
Obviously, they can always contact me.
And there's one more question about the statistics: on one of the items that you showed, there were 2,000 downloads, but the citations were a bit lower, around five. Does this have any significance?
They've just asked you to elaborate a bit on the difference between those two figures.
Yes, that's a good question. I don't know if Kees has any comments on the statistics, but I would say that the datasets are often viewed and downloaded a large number of times.
But we don't always know, until they're cited in another publication or another resource, whether or not they have actually been reused. So I think that the downloads versus citations metric is quite difficult to really gauge reuse on. I don't know if you have any thoughts as well, Kees.
Yeah, indeed, it is not always clear.
Yeah, what a download means, and how that relates to actually using the data.
Also, in my presentation later on, I will talk about NetCDF and OPeNDAP, and there you can see that using the data does not necessarily require downloading the data; you can also query the data, like a database.
So that is another way of counting, if you like: downloads, or use, or citations.
So that is not always as clear as you would like it to be.
Yeah, thank you.
So, there's a question about whether you have any examples of researchers that have received FAIR data incentives, how they've used them, and whether the datasets that came out of them are openly shared.
Yes. So, do we have examples of researchers that have had the FAIR data incentives, like the FAIR data fund, for example? Yes. The fund has run for a few years, and we've revamped it a bit this year, so it now opens in two separate calls. But we have had it running in the past.
And one thing we always do is follow up with the researchers that get the grant, to find out exactly what they've done with it and to make sure that they do follow through and publish on the repository. We also usually showcase a written article.
But what we've done this time, with the community, is also to have regular meetings with them. Before, we would sort of give them the fund and then hear back from them maybe a year later.
What I'm trying to do at the moment is bring the grantees together on, like, a quarterly basis, to find out how they're getting on and whether they need some support from the data stewards.
And then we'll showcase what they've done, and their dataset, at the end of the process.
I'm also hoping that previous grantees will serve as mentors for future calls.
So if we have a researcher that applies from a similar discipline, I'll know, to put them in contact with, somebody previously, that's done something similar, and then we can help each other that way.
So I hope that answers the question, but we do have some examples.
And on the website, we have a page with FAIR data fund stories, so you can go and check out how researchers used the funds to either anonymize data or prepare data for sharing online.
Wonderful. Thank you. We've got two more that have come through, so I'll just ask those, and then if any more come through, we'll ask them at the end, as I guess it's time for Kees.
So there's a question on the rewards and recognition: who supports this in terms of funding? Is it the library or the research institutions?
Actually, it's under the umbrella of 4TU.ResearchData, so it's coming through my role and Deirdre's role as communications advisor and community manager.
So it's really interviewing them, spending the time with them, and then showcasing what they do. We class this as rewards and recognition, but there aren't any other real monetary funds; it's within our budget at 4TU, which is contributed by the partner institutions and member institutions.
Thank you. And then the last one is about the "Did you see these datasets?" features that you include in your newsletters.
Have you considered spotlighting data that are not top downloaded items to sort of increase their visibility?
Yes, that's a really good point. With the tweets, they are recent datasets, so they're not necessarily the top downloaded; with the newsletter, that's exactly right, they are the top downloaded datasets. One thing we're thinking of doing, maybe in the future, is having a more thematic newsletter. So rather than it being about the top downloaded datasets, it would be around a specific discipline.
So, yeah, January could be about materials science, for example, and then the next month could be about geospatial data.
That way we could focus not so much on the most downloaded datasets, but more on interests.
So that's something that we're thinking of working on in the future.
Thanks very much, Connie. And thanks everyone for asking questions.
Please feel free to keep asking them in the questions box and we'll have some more time at the end to answer them.
And with that, I'm going to hand over to Kees.
Thank you, Megan.
So I'll continue on the more technical side of FAIR data, as we are doing it at 4TU.ResearchData.
To start with a little bit more about myself: I'm Kees den Heijer.
I'm currently working as a senior software developer at 4TU.ResearchData, but I have a background in civil engineering, and I did several years of research in civil and coastal engineering.
I did my PhD at TU Delft, and after that I worked both in university and in consultancy as a data manager; I did data management for various projects. Over the last four years, I worked as a data steward at TU Delft.
So I have quite a wide-ranging background in research and support, and it helps me now as a developer to work on FAIR data services that fit the needs of the research community.
So the topic of this part of the presentation is reproducibility in the context of FAIR data.
I will be talking about the NetCDF file format, and the related OPeNDAP protocol to serve this kind of data, as well as our facility to connect Git repositories with software code to Figshare.
I will also show a bit about our server architecture for 4TU.ResearchData.
So when we talk about reproducibility, we can ask ourselves, if we see this picture where someone in this kind of environment is trying to make something:
if we translate that to research, then you typically see that this is not a way of working in which you could reproduce it and create another thing that looks exactly the same, so that you can rebuild it.
So this is typically what we don't like when we are working on FAIR data and reproducible data.
What I would rather like to stimulate is enabling reproducibility by design.
So, creating an environment and infrastructure that helps researchers to generate reproducible and FAIR data. And if we look at the roots of software development, then you will often hear the term pipelines.
Also, when you talk in database terms, you often see extract, transform, and load.
So you could use those practices and say: if you would like to publish a dataset, then you need to take raw data from somewhere, you need to collect the data, and you need to do some processing in a reproducible way.
And you want to publish the end results.
So basically, without manual interaction along the way.
And that's typically what helps to make data more reproducible and more FAIR.
So, also here, another visualization of a typical pipeline where you don't have the ability to interfere along the way. So you put raw data in at one end.
Also the code, let's say the pipeline, that does the work:
you can create a snapshot of that code and also deposit it in the repository.
And the final results of the data, so the processed data, you could also put in the repository. That makes a good collection of the different elements that underpin the research results.
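The extract-transform-load flow described above could be sketched, purely for illustration, as a small Python pipeline. All function names, field names, and the processing step below are hypothetical stand-ins, not the repository's actual workflow:

```python
# Illustrative sketch of a hands-off extract-transform-load pipeline.
# All names and the processing step are invented for this example.

def extract(raw_records):
    """Collect raw data; here, simple dicts stand in for raw measurements."""
    return [r for r in raw_records if r.get("value") is not None]

def transform(records):
    """Process the data in a reproducible way (here: convert cm to mm)."""
    return [{"id": r["id"], "value_mm": r["value"] * 10} for r in records]

def load(records):
    """'Publish' the end result; in practice this would deposit to a repository."""
    return {"n_records": len(records), "data": records}

def pipeline(raw_records):
    # No manual interaction along the way: each stage feeds the next,
    # so rerunning the pipeline on the same raw data gives the same result.
    return load(transform(extract(raw_records)))

result = pipeline([
    {"id": 1, "value": 250},
    {"id": 2, "value": None},   # incomplete record, dropped during extract
    {"id": 3, "value": 100},
])
print(result["n_records"])  # 2
```

The point is that the raw data, the snapshot of this code, and the output of `load` are all deposited together, so the processed data can always be regenerated.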
Well, one of the file formats that has quite a lot of users in our community at 4TU.ResearchData is NetCDF.
That is an abbreviation for Network Common Data Form.
NetCDF is typically designed to be structured and self-describing. That means
you can see that it has dimensions.
So it is a multidimensional array data format.
In this picture, you see a three-dimensional array, but it can also be four- or five- or more-dimensional.
So you typically can put large arrays in there.
You can have variables in there.
And you can add attributes at different levels, and that also contributes to the self-describing character of the files, in that you can put unlimited metadata
in the file. So you can add metadata at the file level, but also at the variable level. For each variable, you can give metadata like: what are the units of the data of that variable?
Or whether there are alternative names, or other important information,
to make sure that the data can be used in a proper way and understood well.
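The data model just described (named dimensions, variables, and attributes at both the file and the variable level) can be sketched schematically with plain Python dicts. This is not the real netCDF4 library, and all names and values are invented; with the netCDF4 or xarray packages the same structure would be created with their own APIs:

```python
# Schematic sketch of the NetCDF data model using plain dicts -- NOT the
# real netCDF4 library. All names and values here are illustrative only.

dataset = {
    # Global attributes: metadata at the file level.
    "attributes": {
        "title": "Example wave measurements (hypothetical)",
    },
    # Named dimensions of the multidimensional arrays.
    "dimensions": {"time": 3, "lat": 2, "lon": 2},
    # Variables: array data plus metadata at the variable level.
    "variables": {
        "wave_height": {
            "dims": ("time", "lat", "lon"),
            "attributes": {"units": "m", "long_name": "significant wave height"},
            # A 3 x 2 x 2 array, stored as nested lists for illustration.
            "data": [[[1.0, 1.2], [0.9, 1.1]],
                     [[1.3, 1.4], [1.0, 1.2]],
                     [[0.8, 0.9], [0.7, 1.0]]],
        },
    },
}

# Because the file is self-describing, a reader can discover the units of a
# variable from the file itself, without outside documentation:
units = dataset["variables"]["wave_height"]["attributes"]["units"]
print(units)  # m
```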
So, a very big advantage of NetCDF is that you can query it,
similar to a relational database, and it's also very suitable for archiving.
Its database-like characteristics can be used online, as we'll see later on,
but they will also always be available in an archived state.
So you don't need a server running to have these characteristics of this data format, to be able to query it and to take particular slices of the file.
So you don't need to read the complete file; you can read particular slices, which makes it efficient also for larger files.
So that brings us to the OPeNDAP protocol, which is a way to interact with NetCDF data over HTTP.
So you can browse the file online; you can see the metadata, as you can see in this screenshot.
Or you can see various global attributes, which are the metadata, and also variables, including some metadata on a variable level, so you can easily take a look at the file and see what's in there.
And then, very importantly, there is a data URL in there.
And with this data URL, you can query the file and read metadata, or read the data, either the contents of a full
variable, or a slice of a variable. And then this data URL works exactly the same as when you download the file and read it locally from your computer.
So this also allows you, if there is a need, maybe because of performance reasons, or because of a lack of internet, if you would like to work offline, to download it and do whatever you want.
Or you could say, well, I just would like to be able to take the latest version.
Then you can use this URL, which is always up to date.
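The slicing described above can be expressed directly in the data URL as an OPeNDAP constraint expression of the form `?variable[start:stride:stop]` per dimension. A small stdlib-only sketch, where the server URL and variable name are hypothetical:

```python
# Building an OPeNDAP constraint-expression URL by hand, to show how a
# slice of one variable can be requested over HTTP without downloading the
# whole file. The server URL and variable name below are hypothetical.

def opendap_slice_url(data_url, variable, *ranges):
    """Return a URL requesting variable[start:stride:stop] for each dimension."""
    index = "".join(f"[{start}:{stride}:{stop}]" for start, stride, stop in ranges)
    return f"{data_url}.ascii?{variable}{index}"

url = opendap_slice_url(
    "https://opendap.example.org/thredds/dodsC/waves.nc",  # hypothetical server
    "wave_height",
    (0, 1, 9),    # first 10 time steps
    (0, 1, 0),    # first latitude index only
    (0, 1, 0),    # first longitude index only
)
print(url)
# https://opendap.example.org/thredds/dodsC/waves.nc.ascii?wave_height[0:1:9][0:1:0][0:1:0]
```

In practice, a library such as xarray can open the plain data URL directly, and slicing in code is then translated into this kind of request behind the scenes.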
So when we look in our Figshare instance, we see here how it looks.
So we get a preview of the file, and there is a link to our server, which serves the NetCDF data via the OPeNDAP protocol. Also, in the metadata box, we see the format mentioned, and we see the data link to this very same server.
But for the preview, we actually did a trick, because originally Figshare shows it like this,
where it says the file is stored somewhere else. We'd rather say it can be accessed via the link, and especially the note below we didn't really like, because the data is actually stored in our 4TU.ResearchData repository, so it might confuse users.
So we created a workaround to show our own message and link to the data in this way.
So, that brings us to the next topic, about Git.
So, in Figshare, we currently have this integration, which makes a connection with Git, particularly GitHub.
Recently, there were developments to also make a similar connection with other Git providers, being GitLab and Bitbucket.
So, we are currently in the testing phase, and it allows researchers to publish code that is stored and maintained in a Git repository,
as is common practice for software development, by making a snapshot of that code in the Figshare repository.
And that is what we very much encourage, to make sure we have all the real versions,
so to say, that are used in the research, and to have those snapshots.
So here we have a nice example from our repository. This is a toolbox, an analysis toolbox.
Well, there are several elements in there.
We see in the metadata that, in the references, there is a link to the original GitHub repository where this code comes from.
So this is a snapshot from this particular Git repository.
There's also a link to the documentation of this toolbox.
So that also contributes to the FAIRness of this data.
Well, of course, there is a license, which is a copy of the license that was selected in the Git repository.
That's not shown in this slide here, but there are several related datasets.
There was research that was done using this toolbox,
and the resulting data, both in NetCDF and in visualizations, was also published in our repository.
So that makes it a nice combination of software and data coming together.
Then, a bit about our server architecture.
Basically, our front end is a combination of three elements, where, of course, our Figshare repository is
the core.
But in front of the Figshare repository, we have our own reverse proxy server.
And that allows us to better integrate our Figshare instance with our portal page, which is also commonly referred to,
and also our documentation and our info pages.
And it also allows us to do some styling on the Figshare instance: adding our own logo and colors, and also having our menu bar
with links to these different elements, like the portal, and the documentation, and the community page.
And also adjusting some content, like I just showed:
if there is a message on one of the datasets from Figshare directly that we don't quite like, we can adjust that to make it more suitable to our audience.
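The content-adjustment step described here could be sketched, very schematically, as a rewrite function applied by the reverse proxy to response bodies before they reach the browser. The message texts below are invented for illustration, and a real proxy would also handle headers, encodings, and styling:

```python
# Minimal sketch of the kind of HTML rewriting a reverse proxy can apply
# to upstream responses. The phrases below are invented for illustration;
# the real setup also injects the logo, colours, and menu bar.

REWRITES = {
    "This file is stored somewhere else":
        "This dataset can be accessed via the link below",
}

def rewrite_response(html: str) -> str:
    """Replace upstream phrasing with our own wording; leave the rest intact."""
    for original, replacement in REWRITES.items():
        html = html.replace(original, replacement)
    return html

page = "<p>This file is stored somewhere else.</p>"
print(rewrite_response(page))
# <p>This dataset can be accessed via the link below.</p>
```

Because the proxy sees all HTTP traffic to the site, this kind of targeted substitution lets the hosted Figshare instance present a custom look and feel without modifying the upstream application.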
Now, here is an example of the difference. This is the original Figshare page showing our datasets,
so with the Figshare logo and the colors and so on.
And this is what we made in our own domain.
So, we added the logo, added a bar with several drop-down menus, and we also adjusted the footer a little bit.
We adjusted the colors to our own house style, orange.
So that gives it the look and feel that we envision.
And that brings me to the end of my presentation. So, there is room for questions.
Wonderful, Thank you. OK.
Then, there was one question which has perhaps already been answered by your server architecture slide, but it was a question around the changing of the workflows, and how do you change that text? I don't know if you have anything else you wanted to add, or if the answer is yes.
You had answered it after they asked the question.
Yeah, so, what we do by offering this reverse proxy is that all the traffic, all the HTTP traffic that comes in to our site, we can see, and we can, for instance, adjust it, as we did.
You could consider it a bit hacky, but it works.
It allows us,
in this context, to create our own look and feel.
Thank you. And they sent through a follow-up question: is the underlying Figshare site blocked from public access, and if so, how?
I'm not sure I quite understand what the question is; could you repeat it?
And they just asked, is the underlying Figshare site blocked from public access, and if so, how?
Well, yes, so most of the datasets are public access.
Yeah, so you can access them straightaway via Figshare or via our own URL. That's exactly the same.
The only difference is that if you access Figshare directly, your search results might also include other datasets. When you access our own instance, there is a choice to also include non-4TU.ResearchData datasets in your search results. I hope that answers the question.
OK, thank you.
Do you have a recommendation for FAIR data generation and planning for qualitative research projects, or do the projects you work on involve only quantitative data?
Well, I have more experience with quantitative data, but for qualitative data, I think it is even more important, at an early stage, to plan
what to collect, how to collect it, how to store it, and how to handle it. I think, often, also with qualitative data, you can use some statistical methods, but in a different way
than with quantitative data. I think, yeah, trying to automate, where possible,
and documenting well what the manual work is,
those are important things to take into account here, and especially throughout
the research, rather than trying to reverse engineer at the end what you did.
Yeah, thank you very much.
There's a question that might refer back to your presentation, Connie: you mentioned in your slide set that about 8,000 datasets have been published.
Do you have an idea of the growth rate concerning active data publication by 4TU researchers, and what the yearly growth in volume and storage is?
We do. Certainly. We do. Yeah, we do certainly have the statistics, and we also have a strategy online that does actually indicate the breakdown of growth over the years.
And we can also provide statistics based on the institution, and the number of datasets, and the volumes. I don't believe we've got that available in the strategy, but we can make it available upon request as well, to our partner institutions.
But at the moment, I don't think we actually share that anywhere else. Kees, do we?
No, I don't think so.
And I think also, over the past years, the growth rate might have increased, and it might increase a bit more in the upcoming years.
Also, because of the increased attention for open science and requirements from the different partner institutions and funders, researchers are supposed to have a bit, or a lot, more data published and publicly available.
Yeah, I think, as Kees said, with the policies, and also even in the pandemic, I think in 2020 there was quite a significant rise in published datasets, because I think a lot of researchers were taking time for administration and actually taking the time to publish datasets during that period too. But it has been an incremental rise, I think, since the repository was established.
We also have, I think I've added them in the slides, so people should be able to access them.
We have a highlights document as well, which we said we'll be updating yearly, and which showcases the last year with regards to the number of datasets published.
So perhaps this is something we can think about making available online.
And this is a question for Kees: the use of OPeNDAP is interesting. I spent a great deal of time uploading NetCDF files to our instance of Figshare. Are you saying that offering the OPeNDAP link to users would be helpful because they would not need to download all those big files? I'm trying to better understand how to use OPeNDAP.
So for some datasets, we have also uploaded them normally to Figshare, so
people can download them. But what we usually do instead: sometimes, for single NetCDF files that are not too big,
we decide to put them in Figshare right away, while for bigger collections, or larger files, we rather recommend putting them on our OPeNDAP server.
Yeah, that makes it easier
to crawl them; you can use this OPeNDAP server to crawl all the files.
There are also possibilities to aggregate the files.
So then it virtually behaves as one big file, whereas it can be a large collection of smaller files.
So there are nice features in there to explore the files and to select particular slices from the files, without downloading.
Also, we have some datasets that are growing over time:
files are added in the same directory.
So if you have a script running that crawls these directories on the server, it can always take into account the latest available data, rather than having
downloads of particular files and then manual downloading to include newer data.
There's one follow-up question, which I can say will be the last, given where we're at for time.
A question related to that: would you work with the researcher to select the slices when using OPeNDAP?
Sorry, would I work with researchers
to select the slices when using OPeNDAP?
So, when the file is in there, then yeah, as a user, you can see how big the variables are,
and you can make selections based on whatever your needs are.
So, there is no involvement needed, other than, for instance, a training on the use of NetCDF, which involves creating NetCDF files, to get a better sense of the type of file, and reading NetCDF files, and then how to handle them.
But yeah, it's fully self-service, so to say; there's no need for involvement from us to predefine any slices or so.
Thanks for clarifying that.
And, yeah, so that's all the questions. Thank you very much to everyone who asked a question.
And I'll also send all the links that Connie shared in the chat when I follow up with the recording of the webinar, so you have access to those links as well. And it just leaves me to say a massive thank you to Connie and Kees for presenting this afternoon.
Really informative and really interesting. And thank you all for attending, and have a good rest of your day.
Thank you, everyone. Thanks, Megan; thanks, Kees.