Play the webinar
March 1, 2022
Dan Crane, Joao Neves
In this webinar, Dan Crane (Research Data Manager) and Joao Neves (Senior Application Analyst) from King's College London (KCL) will present their implementation of Figshare as a data repository and migration from their previous repository including mapping, processes, and scripts using Figshare's API.
Please note that the transcript was generated with software and may not be entirely correct.
0:04
Hi everyone, thanks for joining today. My name is Megan Hardeman, I'm a product marketing manager at Figshare.
0:11
Before I hand over to Dan and Joao from King's College London to talk about their Figshare implementation and migration, I just have a few pieces of housekeeping. The first is that this session is being recorded and we will send the recording around to all registrants following the end of the webinar.
0:29
If you have any questions, there is both a question box and a chat box. So feel free to put your questions in either place, and there's some time at the end where Dan and I will answer them.
0:43
Yeah. So, feel free to put those in anytime, either at the end or throughout and we will answer them at the end.
0:49
So, I think that's everything for me. I'll hand over to you now, Dan.
0:53
Thank you very much, Megan. And hello everybody. I can't see who's here, but hello to everyone who is. So, yeah, my name is Dan. We're really happy to be able to do this today. We were going to come and talk at Figshare Fest last year, but we had to cancel at short notice, so we're really glad to be able to do this now. You can see our names and contact details on the screen.
1:15
I am the Research Data Manager at King's College London, in Libraries & Collections, as part of the Research Support team. And I'll let Joao introduce himself when we come to his bit.
1:29
But yes, I'm going to cover the first half of this presentation, talking about the background to Figshare at King's and a bit about our implementation.
1:36
Then I'm going to hand over to Joao to talk about the really clever stuff, which is migrating our datasets into it to get us up and running. So OK, I'll get started, and you can see on the screen there the address of our site ... dot com if you want to have a look while we're talking.
1:54
But yeah, I'll start off with just a bit about King's College London, for anyone who doesn't know us. As it says there, we're a large, research-intensive university and we've been around for almost 200 years.
2:05
We have five campuses in London. I'm speaking to you today from our site at London Bridge, next to Guy's Hospital.
2:14
Those are our student and staff numbers, and it's the postgraduate research students and the academic research staff that we're focusing on with Figshare, so those are our users here. And as you can see, we've got nine faculties, schools, and institutes, and they cover a broad range of research. So we're really covering all bases in terms of research output and the things that we research here at King's.
2:39
I just want to briefly mention King's strategic vision. In the university's vision, research is obviously high up on the agenda; they're aiming to do the things that it says on the screen. And the library's strategy is closely tied to that, and openness is fundamental to that strategy; open research is really important.
3:00
So, again, that's where we're coming from in our support for open research.
3:06
And the last little bit of organisational information is just to say a little bit more about our team in Libraries & Collections. So, it's the Research Support team.
3:14
We support research data management through the research data repository, through data management plan support, and all kinds of queries and questions that come to us throughout the research lifecycle. The team also supports scholarly publications, so doing a lot of work with open access and the Pure publications repository, researcher profiles, e-theses and so on. So, lots of work there. And we also support copyright; copyright questions come to us too. And that's our website, where you can go and find out about the things that we do.
3:47
But, yes, we're here to talk about Figshare at Kings.
3:50
Figshare at King's is our research data repository; that's to say, we have a separate repository for publications, and Figshare is for data records. Because we like a nice acronym starting with a K for King's, this is KORDS, King's Open Research Data System, and it gives us a nice, snappy word to use when we're talking lots and lots about it. So yes, we thought we'd give it a short and memorable name.
4:14
So I just thought this is probably fairly obvious to many of us here today, but just wanted to say why we have a research data repository.
4:20
So, we want to provide long-term storage and preservation for datasets produced by research here, to share data, and to provide a showcase and a catalogue of our datasets.
4:33
And why?
4:34
Because we have a commitment to supporting open research. We want to make our research as open, transparent, and understandable as possible, and to meet the requirements and policies from funders, publishers, and the institution itself.
4:49
So we did have a research data repository already; we have had one since 2016, and that was developed in house. And Joao knows more about that than I do, because he was instrumental in it. It was a system with fully mediated deposit, so researchers would come to our team.
5:08
We would create a metadata record, we'd get the files from them, we'd upload them, and we'd create links to them to provide access via SharePoint.
5:18
The records would be published via our library catalogue discovery tool, so that was our previous library system and Primo at the time.
5:25
So it did a job, but a cross-King's group from the library, from IT, and from the Research Management & Innovation Directorate got together and agreed that there was a need for a new repository to improve our offer. And speaking to researchers, there was an all-round appetite for a new system that they could use too.
5:44
So we looked around at all the options and we chose Figshare for the reasons that you can see on the screen there.
5:53
So, you know, we know it's well established, well used, and well known; some researchers knew about it already. And we could provide that self-deposit route, so researchers can create their own dataset records and deposit their own data, in a way that was really focused on them and easy to use.
6:10
It provided that showcase for our research and the faculties, where we could drill down to those, and made our research data discoverable, accessible, and citable.
6:21
So yeah, we decided in 2019 that that's what we wanted and we began procurement. Procurement, I think, takes a long time wherever you are and it's a very thorough process, but we also had a pandemic start in the middle of that process.
6:34
So it took us a couple of years to get that done, but it was done in mid-2021. The contract was done, and then we could begin implementation.
6:46
But we then had a very hard deadline of September 2021 to get the site configured, to get our datasets migrated in, and to be launched ready for the new academic year. And that was because, in tandem with our rollout of Figshare, we had a new library management system being put into place.
7:07
So we were moving to Alma, and the way that our datasets were surfaced through the old system wouldn't be available anymore. So, to have that continuity of having the datasets we'd already published available, we needed to get them into the new system and available in Figshare. So we had a really hard deadline, which, with the team at Figshare led by Kay, worked really well, and we thank the team for making that as streamlined as possible.
7:34
In terms of our implementation, we are pretty straightforward, I think, in that configuration; we're not doing anything too different. We have one level of groups, with each of the nine faculties as a group in our setup.
7:46
The roles are quite simple too: our library team are the admins and reviewers.
7:52
We've got curation, or review, of datasets turned on, and we're using Figshare storage. We've got a custom licence, we've got some custom metadata, and we changed some of the footer information.
8:03
We're not using the more sophisticated restricted publishing options, other than embargoes. Our authentication is by single sign-on, and now datasets are discoverable through our new Primo interface. So that's the setup, and I think that's pretty straightforward, and that's been good for getting us up and running quickly.
8:28
Other than the technical aspects of getting Figshare in place, we also took the opportunity to review processes, governance, and support, as it says there. So we worked with our research governance office and information compliance in IT to make sure our processes were up to date, particularly in terms of handling data that's generated from research with human participants.
8:50
We do a lot of work with and without NHS partners on health and biomedical research, and generally a lot of research involving participants, so we wanted to make sure that we gave as much guidance as we could to researchers when they came to deposit about what needed to be done to do that properly. So yes, we reviewed all of those things and we wrote some guidance for researchers, and that's on our web pages, where you can go and have a look.
9:19
So the implementation, yeah, that was the setup, and the configuration was done. And at the same time, we looked at the dataset migration, and I'm going to hand over to Joao in a sec to talk more about that.
9:29
But just a quick summary of it there: we had about 90 existing datasets that we wanted to migrate, approximately a terabyte of data. They already had DOIs.
9:41
Some of them were restricted access, so we wanted to make sure that we mirrored all of those conditions when they were migrated across. We had some work to do to check and tidy up our records before they were copied across, and to think about how to map our metadata to Figshare's.
9:56
We also went through sort of a courtesy process, really, contacting the owners of the datasets to say we were moving, and we'd let them know when they were available in the new repository.
10:05
And we also had to agree a kind of cutoff date after which we couldn't accept anything into the old repository before we got the new one up and running, and to tell staff and researchers what they could do in the meantime.
10:18
So that's, I guess, preparing the ground for dataset migration, and I'll now hand over to Joao to tell you what he did in order to make it work.
10:34
Hello. Hello everybody, my name is Joao
10:38
I've been at King's for 10 years, working on the IT side.
10:47
I'm a Senior Application Analyst, and I have a lot of experience with data migration
10:58
to any system. And so basically, I'm going to show a few slides, but also try to show things hands-on, OK?
11:18
I'm not sure whether you're seeing my screen.
11:23
I can see a chat.
11:25
Alex says it's OK.
11:27
Very well. So.
11:32
We at King's have several IT teams.
11:35
The IT department has three main departments: governance, which looks after all aspects regarding maintaining the systems,
11:50
all the legal parts, et cetera.
11:53
Then we have the IT Services team, which looks after our systems. And then we have the IT Solutions team, which I belong to.
12:04
And we're responsible for the implementation of new systems, ..., et cetera.
12:12
That's a little bit about our IT. I'm going to talk about the project aims. I'm not sure whether this...
12:20
When you see my screen, the GoToWebinar panel is showing on top.
12:24
I'm not sure if it's showing for you guys, I hope not. But basically, my presentation is divided into five parts, and I'm going to spend most of the time showing a little bit of what was done to move data from one system to the other.
12:44
So, I'm going to start with the project aims. What we, at heart, wanted was to make sure that we didn't lose any data,
12:55
that there was no service disruption to users, and we had the big constraint that we had to have it completed before we went live with the new library management system. This was imposed on us and it was very complicated to have everything done.
13:18
Um, we made it. So our previous solution was based on SharePoint. We had around 90 datasets, which is not very many, as you will see in the demo.
13:33
But we had close to one terabyte of data and that took three days to move across from the existing system, SharePoint to Figshare.
13:47
And it took a little bit of work to get there. Especially the data streams.
13:55
Not complicated, but a little bit labour intensive. Our previous solution was, more or less, as Dan said, mediated.
14:06
So users did not have deposit access themselves in our previous solution.
14:14
Let's see. OK, basically, it was composed of two of what they call lists in SharePoint; they're tables where, basically, you can put data. And we basically had two major tables: one with the research datasets, and the other one that was used for controlling the publication of these datasets, because the solution that we implemented also did the minting of the DOIs for the datasets that were at King's. We also allowed for external datasets, of course.
14:50
But basically, this was what Dan's library team worked with, so they would receive the datasets from the researchers, they would describe them here, and then there was a process that automatically moved this data on to the platforms that would expose it externally.
15:17
Those were our library system at the time, which was the catalogue, and the landing pages of the DOIs, which were in Primo.
15:27
So, we have the management process, which is described here, where the left side is the mediated bit, and I've just showed you where the datasets were held. And then, when a dataset was ready to be made public, they would just check one box in the list for that dataset, and it would enter this automated process that minted the DOIs and published the datasets to the outside world.
16:03
So, this is the previous solution.
16:06
Um, now I'm going to talk a little bit about the migration. It had three big stages.
16:17
That way, I think it makes sense.
16:19
So, the first one was to export the metadata and the dataset files, the binary files, from SharePoint. So we had to use some tools to do that.
16:29
And then the second stage, which was to import that metadata and the binary dataset files into Figshare.
16:41
And then at the end, because we already had DOIs minted, we needed to change the URL of the landing page so that it would point to the Figshare page.
16:54
So these were the three processes that we had in place to be able to do the migration successfully.
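As an aside on that last step: the talk doesn't name the DOI registrar, but assuming the DOIs are registered with DataCite, repointing a landing page is a single metadata update per DOI. A rough sketch (credentials, DOI, and target URL are all placeholders):

```bash
# Hedged sketch: repoint an existing DataCite DOI at the new Figshare landing page.
# Assumes the DOIs live in DataCite; repository ID, password, DOI and URL are placeholders.
curl -s -X PUT "https://api.datacite.org/dois/10.xxxx/example-doi" \
  -u "REPOSITORY_ID:PASSWORD" \
  -H "Content-Type: application/vnd.api+json" \
  -d '{
    "data": {
      "type": "dois",
      "attributes": {
        "url": "https://example.figshare.com/articles/dataset/example_dataset/1234567"
      }
    }
  }'
```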
17:04
So, the tools we used: for Figshare we're using more or less open tools, whereas SharePoint is proprietary.
17:16
It's far less open.
17:19
So we used the available tools to do the migration. To export from SharePoint,
17:28
we used the toolkit from Microsoft called Integration Services, SSIS, with some add-on controls specialised to work with web services and also with SharePoint.
17:47
To import into Figshare, we used the Figshare API and the same tools.
17:53
But we split the import of the metadata from the import of the binary files, and we used the API through two different tools. To import the metadata, I created a script in SSIS that would generate the metadata in the format that we wanted.
18:16
And I must, I must.
18:20
make a note that we didn't want to lose any of the metadata that was added in SharePoint, because that takes time to create.
18:29
So we created new fields in Figshare to accommodate some descriptive metadata that we had on the datasets, and in order to tweak that, the SSIS toolkit is a bit more flexible. So we used the SSIS toolkit to generate the metadata for the Figshare articles.
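For a feel of what that metadata import looks like at the API level, here is a minimal sketch of creating one draft article with a custom field via curl. The field name and values are illustrative; the real mapping in the project was generated by the SSIS script described above.

```bash
# Minimal sketch: create one draft article with a custom field via the Figshare API.
# The custom field name and values here are made up for illustration.
curl -s -X POST "https://api.figshare.com/v2/account/articles" \
  -H "Authorization: token ${FIGSHARE_TOKEN}" \
  -H "Content-Type: application/json" \
  -d '{
    "title": "Example migrated dataset",
    "description": "Descriptive metadata carried over from the SharePoint list.",
    "defined_type": "dataset",
    "keywords": ["migration", "example"],
    "custom_fields": {"Original record ID": "SP-0042"}
  }'
```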
18:51
And then we used a shell script that is provided by
18:58
Digital Science, which uses the Figshare API, to upload the binary files, because I felt that was the easiest way to do it.
19:07
And actually, I'm going to provide an example of how to use a very simple Bash script to import something into Figshare; I'm going to try to demonstrate that during this session. We needed a lot of disk space, because we had almost one terabyte of data, so we had to request disk space internally for this work.
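For context, the upload flow that such a script drives against the Figshare API looks roughly like the sketch below. This is not the Digital Science script itself, just a minimal illustration assuming curl, jq, and a personal token in FIGSHARE_TOKEN, attaching one file to an existing draft article.

```bash
#!/usr/bin/env bash
# Rough sketch of the Figshare file-upload flow (register file, upload parts, complete).
set -euo pipefail

API="https://api.figshare.com/v2"
AUTH="Authorization: token ${FIGSHARE_TOKEN}"
ARTICLE_ID="$1"   # numeric ID of an existing draft article
FILE="$2"         # path to the binary file to attach

MD5=$(md5sum "$FILE" | cut -d' ' -f1)
SIZE=$(stat -c%s "$FILE")

# 1. Register the file against the article
FILE_INFO=$(curl -s -X POST "$API/account/articles/$ARTICLE_ID/files" \
  -H "$AUTH" -H "Content-Type: application/json" \
  -d "{\"name\": \"$(basename "$FILE")\", \"md5\": \"$MD5\", \"size\": $SIZE}")
FILE_URL=$(echo "$FILE_INFO" | jq -r .location)

# 2. Fetch the upload URL issued for this file
UPLOAD_URL=$(curl -s -H "$AUTH" "$FILE_URL" | jq -r .upload_url)

# 3. Ask the upload service how the file is split into parts
PARTS=$(curl -s -H "$AUTH" "$UPLOAD_URL" | jq -c '.parts[]')

# 4. PUT each part (byte range) to the upload service; bs=1 is slow but simple
while read -r part; do
  NO=$(echo "$part" | jq -r .partNo)
  START=$(echo "$part" | jq -r .startOffset)
  END=$(echo "$part" | jq -r .endOffset)
  dd if="$FILE" bs=1 skip="$START" count=$((END - START + 1)) 2>/dev/null |
    curl -s -X PUT "$UPLOAD_URL/$NO" -H "$AUTH" --data-binary @-
done <<< "$PARTS"

# 5. Mark the upload complete so Figshare assembles and verifies the file
curl -s -X POST -H "$AUTH" "$FILE_URL"
```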
19:39
And then, lastly.
19:42
And then we have that intermediate database table.
19:46
to store some results, to make sure that we didn't lose the links between the articles and the binary files.
19:56
So going forward, I'm just going to show you a little bit of the tools that I was able to create, because the API is very flexible, very easy to use, and it works very well; I didn't have any issues. I tried some tools that were available, but they didn't really do the job.
20:19
I actually didn't.
20:20
I tried one of them and I couldn't get it working.
20:24
So I decided that I'd just... I tried this Bash script.
20:29
I found it worked, so I went with that, and then I decided that I would create and adapt our SSIS scripts to use the API from Figshare. That's literally what I'm going to be showing now.
20:44
So, you know, when we're trying to develop something, you spend most of the time... I'm not sure about the audience here, but if you're working in IT, you know that you spend 90% of the time testing and 10% developing.
21:00
So basically, what I've done is several small tools to help with the import process of the migration. So the first tool I created was a mechanism to convert
21:19
the data that we had in this particular list into Figshare, so that we could import the descriptive metadata onto the system.
21:34
So these are the properties that we used to have, let me just check, so that you can see the description that was done on the datasets. OK, so we wanted to port this to Figshare.
21:47
So, basically, I'm going to show you my interface here.
21:53
Uh, this is the Figshare staging environment.
21:58
If you look at my data, I only have two datasets here.
22:03
Right now, these are the datasets, and I hit the publish button, so they cannot be undone; they will remain there.
22:11
But what I'm going to do, I'm just going to import that list into the system.
22:18
And so, if I run this procedure: this is an SSIS procedure that basically works with data; it looks at it, you can transform it, and then export it to an external system.
22:32
In this case, I'm using the API to directly add each one of the records, the metadata records, to Figshare. So it's running.
22:44
It takes a little bit to fetch the data, OK, bear with me a second.
22:52
Shouldn't take much longer.
22:56
Basically, while it's going: what it's doing now is fetching data from SharePoint and processing it. It's still starting, OK? It's just initialising. Sorry, that's the problem with live demos. So basically, it fetched the data, it converted it, and now it's adding it to Figshare.
23:16
So if we go back to Figshare, and if I click on 'My data', you'll see that I have a bunch of new records, which are the ones that were in our previous research data management system based in SharePoint. So here are all the datasets that were
23:39
in our previous system. I can show you the metadata and basically do a preview of the items as they would be shown to the public if I published them.
23:49
And this is basically the tool that did this, through the interface.
23:56
The other tool that I would like to show you: basically, I want to demonstrate how easy it is to create these toolkits. It's because my colleagues from the library, Dan and some colleagues, asked us to change some of the metadata that was not displaying correctly in Figshare. So we wanted a mechanism to repeat the conversion as many times as we needed, so I've created another tool.
24:30
Very simple. It just goes and deletes all the data sets in my area.
24:38
So, when I run the script, it basically goes to my area and deletes all the datasets that are there.
24:47
OK, so, if I go back here and click on my data.
24:59
All the datasets that I've just imported are gone so this would allow me to have a clean environment each time I wanted to do a test or an iteration of that migration process.
25:10
So these are two simple tools using the ... SSIS toolkit with the API, to allow the addition of metadata articles and the deletion of articles.
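For illustration, a very small equivalent of that clean-up step can be done directly with the API; this is just a sketch (assuming curl, jq, and a personal token), not the SSIS tool shown in the demo, and it only removes unpublished draft or private items.

```bash
#!/usr/bin/env bash
# Rough sketch of a "clean my area" helper: list the account's articles and delete them.
# Only unpublished (draft/private) items can be removed this way.
set -euo pipefail

API="https://api.figshare.com/v2"
AUTH="Authorization: token ${FIGSHARE_TOKEN}"

# First 100 articles owned by this account; loop over pages for more
IDS=$(curl -s -H "$AUTH" "$API/account/articles?page=1&page_size=100" | jq -r '.[].id')

for ID in $IDS; do
  echo "Deleting draft article $ID"
  curl -s -X DELETE -H "$AUTH" "$API/account/articles/$ID"
done
```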
25:25
Now, the binary streams: I used this shell Bash script, and this was code that was provided by Digital Science. I just used it, which is very good because it saved me a lot of time, and thank you guys at Digital Science for doing that work for me.
25:45
And so, I thought I couldn't resist sharing this with you guys. So I've created, actually,
25:56
Aye.
25:57
an area in GitHub where you can download it, and I will have this here in my presentation.
26:06
And so, what I'm gonna do is, I'm gonna download this code as a zip file.
26:14
I'm going to copy this zip file, OK, its contents.
26:21
OK, I'm going to copy this to a directory that I have here, that is shared with the virtual machine that I'm running.
26:30
Linux on my computer; I'm running a virtual Linux box here.
26:38
And this directory here is shared between my host computer and the virtual box.
26:46
And so, what I'm going to do is edit this file here, because one of the things that you need to do with this particular demo... it's all explained in the files.
27:03
There's no need to go add, Generate.
27:10
Sorry.
27:14
I've edited this file here.
27:23
OK, and you need to enter your token here.
27:28
So, what I'm gonna do, I'm gonna go to the Figshare
27:35
environment, to my area, to Applications.
27:43
And, I'm going to generate, create a personal token, OK?
27:48
demo, OK? I'm going to call it Demo.
27:53
Oh!
27:54
Let's call it Live Demo.
28:01
And it generates this bunch of numbers, which is the token.
28:04
I'm going to copy it and go back to that area and paste it there, right.
28:11
Save it.
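For anyone following along, the personal token is simply sent as an HTTP header on every API call. A quick, illustrative way to check it works (assuming curl and jq are available) is to list the articles in your own area:

```bash
# Illustrative check that a personal token works: list your own articles via the API.
curl -s -H "Authorization: token YOUR_PERSONAL_TOKEN" \
  "https://api.figshare.com/v2/account/articles" | jq '.[] | {id, title}'
```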
28:13
Now, the way the script runs is that, if you run it on the machine, you just run it from this small screen, and it picks up all the files that are here.
28:26
And basically, it imports the binary file and generates a record.
28:33
I'll just open it.
28:35
So, if you want to add any file, you just come here and tell it where the file is,
28:42
what the file description is, and this is the title that is going to be given to the record when it's imported into Figshare.
28:56
That's an example.
28:59
I can just go here to my Linux box, and if I run the demo.
29:15
OK.
29:18
That's the only thing I didn't test, OK, and it's taking a little bit too long.
29:25
OK, it's not running, Let me just check that I've saved.
29:30
No, I did save it.
29:46
It's not going to run; it's probably that the virtual machine is not accessing the internet.
29:53
OK, I'm gonna try it, check.
30:07
Yeah. It's not accessing.
30:08
I have a problem with computers... I didn't test that, I should have tested it, but anyway, trust me that this will run, and I can run it when I'm offline.
30:20
Um, this virtual machine just doesn't seem to have access to the internet. There you go: a name resolution issue.
30:30
But basically, I'm sorry about this guys. If you download and install it and run it locally, you'll see that it runs fine, I promise.
30:40
Um, and now going back to my demo.
30:47
Uh!
30:49
There are issues though.
30:50
We have one big issue with Figshare. Or rather, there were some issues when we were doing the implementation. I believe I should also talk about this; there are also benefits, of course, but I should talk about this. So there were problems with the DOIs when we were importing the metadata, but these were fixed, OK? Some of them were fixed by Digital Science.
31:19
There were... we solved the issues, and the authentication issues we had with
31:24
Azure, because our authentication is done through
31:31
Azure AD. And now, a serious issue that we have, that we need to mitigate somehow, is that the Figshare platform does not scan files that are over 50 megabytes.
31:45
I mean virus scan. This is a big problem for us because someone with malicious intent could upload a
31:56
virus to the platform. We're mitigating it by making sure that our researchers do a virus scan on the files before they are uploaded.
32:09
But I would strongly recommend that this is done by the platform, so that no file is allowed to be active before a virus scan. That's the recommendation.
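As an illustration of the kind of pre-upload check researchers are being asked to do, here is a sketch assuming ClamAV is installed locally; this is not a King's or Figshare tool, and the upload command it mentions is hypothetical.

```bash
#!/usr/bin/env bash
# Scan a file with ClamAV before handing it to any upload script; abort on any finding.
FILE="$1"
if clamscan --no-summary "$FILE"; then
  echo "Clean: $FILE, OK to upload"
  # ./upload_to_figshare.sh "$FILE"   # hypothetical next step in a local workflow
else
  echo "Scan failed or file infected: $FILE, do NOT upload" >&2
  exit 1
fi
```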
32:27
Thank you. I'm so sorry that I wasn't able to show you the Bash script running, guys.
32:33
You can download it to test yourself. I can guarantee you
32:37
that's the way that we actually uploaded all the binary files. Yep, and that's it. That's it for me.
32:48
Thank you both very much. And please do feel free to put any questions in the chat or the Question box.
32:59
Sure, we'll be happy to get to them.
33:00
I also wanted to mention that when we send around the recording, we'll send around links to everything that was mentioned today, so that you have access to the script or anything like that.
33:14
I'll start maybe with... oh, there's a question here: do you use two-factor authentication to mitigate security issues?
33:24
Well, you can use two-factor authentication, but basically it doesn't mitigate those issues. I don't know if I understand the question. We do have two-factor authentication, but it's external; it's on our authentication platform, which is Azure AD.
33:46
If that's the question: yes, we do. I don't know what is meant by mitigating issues.
33:55
I'm sure that will have answered the question.
33:58
Now: do you have long-term preservation strategies for your data in Figshare?
34:07
I'll leave that one to you, Dan.
34:12
So, that's our next big piece of work, really, in relation to Figshare and generally at King's: to get something in place that is a proper long-term strategy. So yeah, it's something we're working on.
34:27
Great.
34:28
….
34:30
Someone's asked if you could clarify the virus vulnerability. And I guess, just to say, there's a threshold on file size below which items will be scanned, and anything beyond that doesn't get scanned.
34:45
Um, so, I don't know if there's anything else you want to add, but hopefully... yeah, that's basically it. What happens is the platform only scans files up to 50 megabytes.
34:56
I'm not sure if it's 50 or 100, but it's to do with size. And the reasoning behind this is that it would take too long to scan larger files.
35:07
But still, I believe that the platform should do a scan regardless of the size of the file. If it takes two hours, it takes two hours. That's, that's, It is what it is.
35:17
And it would mitigate issues, because at King's, if the researcher is using a computer built by IT, the computers are built in such a way that it's very difficult to get a virus, because they have antivirus that you cannot turn off. So that part is fine, but the problem is that they can use other devices to upload stuff; especially with these APIs, they can use any computer, and a computer with a very old operating system can cause problems.
35:53
I strongly hope that Digital Science does that in the future. I know that the big platforms, for example SharePoint, didn't do virus scans before 2018 either.
36:11
So, it's, it's something we can see in the future, hopefully.
36:18
Yeah, definitely. So.
36:19
I mean, that's something that we would look at as part of a feature request. So for anyone who doesn't use ... currently, we have a feature request forum where institutions using Figshare can submit feature requests, and this will be one of them, so that we can look at implementing a larger threshold in the future, if that's something that we can accommodate.
36:38
Um, so, yeah, from our side, we'll follow up.
36:45
Yeah, there's a question about whether you would have to be running Linux to run this migration tool that you've developed.
36:52
Well, I didn't develop it, guys, OK? Don't give the credit to me; it was done by Digital Science. Now, you can run it. The API, actually, as I showed you, I used Windows SSIS, the Microsoft SSIS, as a tool to import the data. This particular script was created in Bash, which is Linux, but you can have it on Windows as well if you install one of the open-source Bash environments that are available. Or the script could also be adapted to use PowerShell.
37:30
OK, you can do anything, and that's the beauty of using APIs.
37:34
You just need... if you're knowledgeable about a particular technology, you can adapt your use of the API to that knowledge.
37:47
Thank you.
37:50
There's a question about how long your previous data repository was up and running beforehand. It was running from 2016 onwards until last year, OK?
38:06
Thank you. This is not a question but a statement: Kay has joined us, and she just wanted to give you a shout-out for being such a great team to be working with. Yeah, I think the pleasure was mutual.
38:28
A very uncomplicated project, I would say, compared to some other work that we have had in the past.
38:35
That's great to hear. You migrated one terabyte of data to KORDS. How much total data do you hold now in KORDS, and is there any limit per research dataset that can be uploaded?
38:47
I'll leave that one to Dan. Thank you very much. So, that's a very good question. In terms of how much we currently have, I'd say we've probably added,
38:57
I'm guessing off the top of my head, but maybe a few more gigs; not a huge amount so far in terms of volume of files. In terms of the limit, we're being a little bit suck-it-and-see in terms of how much we allow. So we have a 25 gig initial allowance that we give everyone, and we say come to us if you want more, and we're really going to see what people are going to ask for before we impose a limit. So let's see how people want to use it and what they request, and then we will look at putting in some other limits in future if we need to. But we're very much encouraging people to come to us now and have discussions. Remind me, Joao, what's our total that we have for King's right now?
39:47
Regarding our tenancy?
39:50
Yeah, the amount of capacity that we have. Yes. Yeah, for that, with the current contract, it's 100 terabytes.
40:01
Yeah.
40:01
We have, we have lots to use.
40:10
Are you able to elaborate on how searching of the datasets is conducted by Figshare, or is this additionally bespoke?
40:19
Well, I can elaborate on the bit that makes the datasets discoverable.
40:25
Regarding Figshare, I don't know exactly what is meant, but what we do currently is: Figshare has a ... interface, and we collect the data
40:39
with our current Primo implementation, which is our discovery platform, and we just pull the data into Primo without pulling it into the catalogue. We leave it that way; we just pull it in for the purposes of indexing into Primo, and the records are searchable through Primo.
40:58
The other thing that we noticed is that the DOIs, because they are minted for each dataset, also push the data out and make the dataset records discoverable
41:15
automatically. So if you search for a particular dataset, you'll find it on the web, because Figshare does that for us as well.
41:25
I don't know how Figshare works sorry.
41:31
Yeah, I can elaborate on that as well, in terms of just the standard search functionality.
41:38
So every institution that uses Figshare has a portal that has a search tool within it, so you can filter by certain facets, and then there's free-text search as well. So that comes as standard, and anything that is done bespoke, sort of on top of that, can be done using the API; and some institutions have done that already, in terms of indexing and things like that.
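For reference, the kind of building block a bespoke search tool might start from is Figshare's public search endpoint; the query term and options below are illustrative only.

```bash
# Illustrative query against the public Figshare search endpoint (no token required).
curl -s -X POST "https://api.figshare.com/v2/articles/search" \
  -H "Content-Type: application/json" \
  -d '{"search_for": "genomics", "page_size": 10}' | jq '.[] | {title, doi}'
```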
42:07
So, items uploaded onto Figshare are indexed in Google Scholar and Google Dataset Search.
42:13
So findability beyond and outside of the repository is there as well, sort of as standard.
42:23
Yeah. I hope that makes sense.
42:28
And as part of that as well, we're encouraging people to include links to the publications, and vice versa, so that we create links between the different research outputs from the same piece of work.
42:42
Are you promoting the service internally in any way?
42:47
We are, yeah. We had a launch in September when it first went live, so we emailed around all of the, you know, the newsletters and user groups to say that it was now available to people. And we did, and we continue, to go and talk to people; wherever we can get onto a research or faculty meeting or any other kind of meeting or presentation, we'll go and do 5-10 minutes to tell people about it.
43:12
Absolutely. It's a constant process, I think, of just reminding people that it's there and getting their attention at the time when they're ready to use it. So yes, we are, and we'll continue to; it's going to be an ongoing process.
43:27
Thank you. Last question. How does the script deal with your custom metadata?
43:36
Well, this script does not, really. So basically,
43:40
what happens is that I have an SSIS procedure that generates the metadata
43:51
According to how we decided to describe it, basically, and then that data is imported into an article.
44:00
And if you are using the API for the binary data, the script that imports the binaries is going to attach the binary files to the already existing article in Figshare. That's how it works.
44:14
Basically, you have to decide, when you're defining the metadata, how the records are to be added to Figshare. You need to know exactly how they are defined, and then after that, you just import them.
44:32
The way that you have to describe them is you have to use the proper descriptors.
44:40
which you find when you are changing
44:48
the metadata structure in Figshare, right? That's how it works. So you basically define the structure, then you accommodate your metadata in that structure, and that script just did that.
45:02
I think we have 4 or 5 additional fields beyond the standard metadata. Dan, I don't remember; I actually think you did it.
45:11
You guys did it.
45:15
Yeah, that's another thing that we can do ourselves within the team, setting up these custom fields.
45:23
All right.
45:24
Thank you very much, and yeah, just thank you for all your questions. And thanks very much to Dan and Joao for presenting this afternoon; really, really interesting. And like I said, the recording will be sent round shortly afterwards, and it'll include any of the links that were mentioned today, so you can follow up if you'd like to know a bit more. Thanks again, everyone, and have a great rest of your day.
45:48
Bye!
45:50
Bye. Bye.