May 4, 2022
The Figshare API provides access to all metadata in a repository and you do not need to have experience with programming languages to use it. This webinar will go over basic use cases using the API web interface and also demonstrate a few examples of using Python scripts to expand those use cases.
- View record metadata
- Search for records
- Create a new record or update one
- See all users
- See all groups
- Create a user and add roles
Scripting
- Get all records for a search
- Get all records from an institution
- Upload a file
- Get all metadata
- Create network graph of authors
Please note that the transcript was generated with software and may not be entirely correct.
Hello everyone. Thank you for joining the webinar today.
And if you're watching this later as a recorded version, thanks for clicking on the link.
My name is Andrew, I'm a product specialist with Figshare.
I'll be conducting the webinar today, and my colleague Megan is also here to help field any questions at the end and put some links in the chat that might be useful.
I think we're gonna get started, I am planning on keeping this within 30 minutes.
Might go a little bit over, but it should be a nice, quick webinar.
So we're talking about the Figshare API, specifically for librarians or those who are helping researchers manage and work with shared outputs.
I have a couple of slides here to start off with.
The first thing I want to do is just mention some upcoming webinars. So next week, we have a webinar on using the State of Open Data Survey to put the NIH Policy on Data Management and Sharing into practice. It's going to be a great one. And keep your eye on the chat. Hopefully, Megan, you can put the sign-up link in there, if you want to sign up for that and haven't already. And then later this summer, we're doing a webinar on reporting and statistics in Figshare. And then, you know, the rest of the year there'll be more webinars. For example, we'll be doing a webinar on the API for researchers, among other things.
So, keep your eye out for all of those.
The audience for this webinar, as you guessed, is librarians, but really, you know, anyone who's working with researchers and repositories, specifically at the ...
repository. So, if you're an administrator for a Figshare repository, or if you're a librarian or data librarian who's pointing researchers to figshare.com, you'll probably find some things useful here. And also, if you're just interested in the Figshare API, this will be a great webinar for you. No API experience or coding experience is needed.
I'm going to try to cover the basics, and also, you know, just a caveat: I am also not an expert, mostly self-taught. It's something I enjoy doing. I like using it in my own work to make my workflows more efficient and to just get more out of the Figshare platform. And I hope you see some examples that you can use as well.
So, an API is an Application Programming Interface. It basically allows two systems to talk to each other, and as humans, we can use that too, to do things that we might not be able to do otherwise in a general user interface.
So, specifically for the Figshare API, it enables functionality that you can't do within the admin interface or your, you know, user account interface.
It allows us to automate common reporting or maintenance tasks, and you can look up specific information for your repository, for example, information about the groups in your repository. And ultimately, by chaining together API calls on these API endpoints, you can work with many records. You can plot tons of information. You can create really sophisticated or simple workflows.
At this point, you might be asking yourself, what is an endpoint?
So here, this is an example of one. It's a URL, and it's basically programmed to provide a set of information when you visit that URL.
Sometimes you need to provide some information. So in this case, it accepts an article ID.
You can see the actual endpoint here.
And it sends back, as in this little screenshot here that I took from my browser, all of the metadata for this specific article, formatted as JSON. Then at that point, you can do whatever you want with that: read it, copy it, use it in another program.
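As a rough illustration of what "visiting an endpoint" looks like in code, here is a minimal Python sketch. The base URL and the `/articles/{id}` path are assumptions based on the public documentation rather than anything stated in the webinar, and the helper names are made up for this example.

```python
import json
from urllib.request import urlopen

# Assumed base URL of the public Figshare API (check docs.figshare.com).
BASE_URL = "https://api.figshare.com/v2"


def article_url(article_id):
    """Build the endpoint URL for one public article (record)."""
    return f"{BASE_URL}/articles/{int(article_id)}"


def fetch_json(url):
    """Visit a URL and decode the JSON that comes back."""
    with urlopen(url) as response:
        return json.loads(response.read().decode("utf-8"))
```

Calling `fetch_json(article_url(some_id))` with a real article ID should hand back the same metadata the browser screenshot shows, as a Python dictionary.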
I do want to make a note that the Figshare documentation for the API today refers to articles. And this is kind of a legacy term.
But an article is the same as a record, which is the same as a Figshare item.
I'll try to keep that clear. But if I mention any of these terms, it's all just referring to the basic, you know, metadata record with a DOI that has zero to many files.
Now you can get very sophisticated with the API, and we have clients who have done that. So the University of Arizona has created a really great curation workflow using the API.
It basically creates a readme file for the researcher as they're inputting their metadata, and we have a webinar on that.
So that's really cool. It's called Data Curation in the Cloud with the Figshare API.
The University of Sheffield has done an incredible job, just creating a fully customized, front end for their repository, and I took some screenshots here.
They also have really interesting stats pages that just pull information from the Figshare API and then present it in the way that they want. But we're not talking about those examples today. We're talking about examples for the individual.
So, some things that you might be doing, and these are all things that I've needed to do: harvesting metadata records in general, for example, data associated with theses and dissertations.
I did a little personal research project recently, and I was just kind of looking at metadata completion there.
You might want to create a table of views, and downloads by item slash record for a researcher.
You might want to change the file preview for your record, or, you know, if you were running a Figshare repository, if you're managing it, you can use the API to upload a controlled vocabulary from a CSV for custom fields. So you can get a really long list of values in there. I'm going to show you examples of these last three today.
We're going to be using primarily the documentation site for the API. And this is because you can actually interact with the API through this site. So, a quick tour. Well, first, the web address, and you can go to it now, is docs.figshare.com.
On the left side are all the various endpoints that provide various pieces of information or accept various information.
And the middle describes the endpoint. Some of them allow you to interact directly with the endpoint. And it provides code snippets to use if you'd like to use, you know, some scripting. The right side provides an example format of what's produced by that endpoint.
And it also, really importantly, provides an example if the endpoint requires information from you. It tells you what format you need to send that in.
If there's ever, like, an error, it's usually because I didn't format something right.
Quick note: if you're doing this on a laptop, you might need to zoom your browser out to show all the panes that are on this page.
So keep that in mind.
Uh, I want to mention that we do have a help page for the API. That is, it's a work in progress, so we're still developing it. It's not comprehensive, but it's mostly just there to say, like, here's how you can start with the API. Here's some things you can do in the Docs documentation page.
There are also links, though, to some scripts that you can use as a kind of a seed for your own scripting. And I'll show you some examples. This is a screenshot of one that I'm going to use today.
Actually, some of them you can just run directly in Google Colab.
So as long as you have a Google account, you don't need to worry about downloading, you know, Jupyter Notebook, and Python, and all this stuff.
You can just run it, using Google Colab, which is really terrific.
Some of them you do need to download, just because I'm not a super sophisticated, you know, Python person. So I didn't figure out how to upload files from Google Drive, for example.
OK, so with that, I'm going to do some live demos. And please, you know, bear with me. You never know when something's gonna go wrong. And if something does go wrong, it's most likely just my, you know, user error.
But I'm going to spend a little bit of time just kind of showing you around the documentation site and demonstrating some of those endpoints, and then at the end, I'll show you two examples of scripting that hopefully will inspire you to do more.
So, first of all, uh, I just clicked on a link, It's taking us to the documentation site, specifically to get article details. So once again, we have all the endpoints on the left.
Information in the middle, and then what comes out of that end point, or what you put in on the right side.
Say we want to see all the metadata for an item.
I just went to figshare.com. I'm going to click into search. We'll just grab an article ID, or a record ID, or an item ID, from a recently published item. Let's see here.
The list looks like we have a bunch of journal contributions.
So this one. The article ID is in the URL, it's the last number of the URL, and I'll copy that, and I can just paste it right in here, and hit Try, and it'll bring up all of the metadata for that record that we just visited on figshare.com.
And notice it has information about the custom fields, and I'm sorry for all the scrolling here, but I want to show a couple of things. It'll have author detail. It'll have embargo information.
Really it's the full metadata record.
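Copying the ID out of the address bar can also be done in code. The following is a small sketch that pulls the trailing number from an item URL; the function name and the example URL in the note below are hypothetical. One caveat that is an assumption on my part: if a URL ends with a version number after the ID, the last number would be the version, not the article ID.

```python
from urllib.parse import urlparse


def article_id_from_url(page_url):
    """Return the last path segment of an item URL as an integer ID."""
    path = urlparse(page_url).path.rstrip("/")
    last_segment = path.rsplit("/", 1)[-1]
    if not last_segment.isdigit():
        raise ValueError(f"no numeric ID at the end of {page_url!r}")
    return int(last_segment)
```

For a made-up address such as `https://figshare.com/articles/dataset/Example_title/1234567`, this returns `1234567`, which is the value you would paste into the article ID box.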
Alternatively, you can search for metadata records in the API, and these will return a different set of information. So, we have two other endpoints here.
Um, public article search, you can see what, how we can search for articles.
And if I click this panel over on the right, it will pre fill that search box for me. I can open this up a little bit.
And then, scrolling down here, this is what's returned, and you'll notice it returns a lot less metadata.
So, if you wanted to, say, search, I'll do a quick search here. This is where I always run into errors, you know, mistakes, because I leave, you know, a comma in there, something that shouldn't be there, so it needs to be formatted properly.
OK, yep, so I'm going to delete that last comma. So, it's going to be one page, 10 results, searching for...
I'm gonna change this to frog, and try that.
OK, so we've got 10 results here, and Frog is somewhere in the metadata for all of these. It's just a minimal amount of metadata.
If I wanted to retrieve the full set of metadata, then I would use this ID with the first endpoint I showed you, and I can iterate through all these records and pull out all the metadata. If I wanted to create a script, that's kind of how you can chain together these endpoints to retrieve the information you need.
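The chaining described here, a search followed by one full-record lookup per hit, can be sketched in Python. This is only an outline under assumptions: the `/articles/search` path, the `search_for`, `page` and `page_size` field names, and the base URL come from my reading of the documentation, and the function names are invented. Building the body with `json.dumps` also sidesteps the stray-comma problem from the demo, since it always emits valid JSON.

```python
import json
from urllib.request import Request

BASE_URL = "https://api.figshare.com/v2"  # assumed base URL


def build_search_request(term, page=1, page_size=10):
    """Prepare (but do not send) a POST to the assumed public search endpoint."""
    body = {"search_for": term, "page": page, "page_size": page_size}
    return Request(
        f"{BASE_URL}/articles/search",
        data=json.dumps(body).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def full_records(search_results, fetch):
    """For each brief search hit, fetch the full record by its ID.

    `fetch` is any callable that takes a URL and returns decoded JSON.
    """
    return [fetch(f"{BASE_URL}/articles/{hit['id']}") for hit in search_results]
```

Passing the fetcher in as an argument keeps the chaining logic separate from the network call, so it can be tried out with canned data first.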
OK, so that was some basic working with article metadata. I want to talk about creating a new record and updating one.
To do that, I'm gonna use a sandbox version of Figshare. So if you've seen other webinars I've done, I'm usually using this sandbox, Aber College.
I've already logged in as an administrator.
No matter what your privileges are, whether you're using figshare.com or you're at a university or institution that uses Figshare, your account will be able to create tokens to use with the API.
I should just mention that: I went to Applications from my little user menu here, scrolled all the way down, and I can create personal tokens for the API.
We have API documentation for the sandbox. It looks exactly the same. This is it. It's just a slightly different URL.
And you can see I've put in a token up here in the top left. That exists on the other documentation site too. Just put your token in there, and you can start manipulating the records that are part of your account, or that you have privileges to use.
So I want to point out that there are public article endpoints.
So these are for, obviously, anything that's publicly available on the repository. Then there are a lot of private endpoints. So I'm looking over on the left side of the screen here.
Every public article has a corresponding private version. You can make edits to the private version.
Nothing will show up publicly until you either publish it through the user interface or you publish it using this publish endpoint.
If I wanted to create an article, it's really easy. Here, I'm at the Create an article endpoint.
Once again, if I want to, you know, get the right format, I can just click over on the right side, and it'll pre-fill all this stuff.
And let's demonstrate that this works, so I'll delete this comma and, uh, what follows as well.
Great, it gave me a response code of 201, which means that it did create this article called “Test Article Title”, and we can see that in here. Let's go back to my account in Figshare.
Um, and there it is. So it doesn't have any metadata, it doesn't have any files with it, it's a draft. I can fill in the rest of the metadata here, or I can upload files if I wanted to do that.
I can also update articles through the API. Maybe it'd be useful for you, or easier for you, than going through a user interface.
But I wanted to point out some of these really simple tasks.
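For anyone who wants to try the create step from a script, here is a hedged sketch of the request. The `/account/articles` path, the `token` authorization scheme and the base URL are assumptions from the documentation, not from the demo, and the sandbox would use a different base URL. The token shown in the note below is a placeholder.

```python
import json
from urllib.request import Request

BASE_URL = "https://api.figshare.com/v2"  # assumed; the sandbox differs


def build_create_article_request(token, title):
    """Prepare (but do not send) a request that creates a draft record."""
    return Request(
        f"{BASE_URL}/account/articles",
        data=json.dumps({"title": title}).encode("utf-8"),
        headers={
            "Authorization": f"token {token}",  # personal token from Applications
            "Content-Type": "application/json",
        },
        method="POST",
    )


def was_created(status_code):
    """201 is the response code the demo reports for a successful create."""
    return status_code == 201
```

Sending `build_create_article_request("MY_TOKEN", "Test Article Title")` with `urllib.request.urlopen` would create a private draft, so it is worth pointing this at a sandbox first.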
Another thing that you might want to do with an article that you own, or that is in your repository and that you have privileges to manage if you're an administrator, is change the preview image for the item.
So, on my sandbox here, I do have this article, the ... Fjord and Ice Shelf Aerial Analysts.
Its preview is this picture of, you know, a rocket or something. I don't know what this is.
But, you know, the real thing is it has this nice aerial image in it, and I want the aerial image to be the preview.
To do this, well, I'll go to the endpoints first.
There is a private article version update endpoint.
And I did the little drop-down here, and it has two options.
I can version the article, and I can also update the thumbnail for that record, And we need three pieces of information.
We need the article ID, version, and the file ID, and note that the file ID actually needs to be in JSON format. This is where I usually mess up, as usual. But here's the format. So I just click there and it pre-fills that for me.
So I'm going to just quickly grab the article ID and put that in here, and to get the file ID, you just got to view the file that you want to be the thumbnail.
And in the URL, you see now that there is a file ID. Copy that.
Paste that here. I happen to know this is version two.
I'll put two in here.
OK, fingers crossed that I did everything properly.
Great. So 205 means that it worked, supposedly.
So let's see if that actually worked.
Great. So now it previews the correct image, the one that I want to preview. So, nice and easy.
Notice, I didn't have to do a publish API call or go through publishing to change the thumbnail, just does that for you.
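The three pieces of information from this demo (article ID, version, and a file ID sent as JSON) map onto a request like the one sketched below. The `update_thumb` path is my assumption about how the documentation names this endpoint, so please confirm it on the docs site before relying on it; the function name is hypothetical.

```python
import json
from urllib.request import Request

BASE_URL = "https://api.figshare.com/v2"  # assumed; the sandbox differs


def build_thumbnail_request(token, article_id, version, file_id):
    """Prepare (but do not send) the call that sets a version's preview image."""
    # Assumed path for the private article version thumbnail update.
    url = f"{BASE_URL}/account/articles/{article_id}/versions/{version}/update_thumb"
    return Request(
        url,
        data=json.dumps({"file_id": file_id}).encode("utf-8"),  # file ID goes in a JSON body
        headers={
            "Authorization": f"token {token}",
            "Content-Type": "application/json",
        },
        method="PUT",
    )
```

As in the demo, no separate publish call is built here, because the thumbnail change reportedly takes effect on its own.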
So, a couple of other things, and these are going to be specific to those who have access to a Figshare repository and are administrators, or have some privileges in the repository.
I'm going to just talk about how you can gather some information about your repository and add a controlled vocabulary list.
So there's a whole set of institution endpoints here, and you can get information about your institution. So I can see the ID for favorite college. I can use this ID with some other endpoints to pull out, you know, a bunch of records that have been published since a certain date.
And it's really easy just to, to confine those to my institution using this ID. So, that's useful.
I can also see all the accounts that, as an administrator, I have privileges over. So, I'm at this Private Account Institution Accounts endpoint.
We can see the format of what the information looks like when it comes out of here. And there are ways we can kind of narrow down what we're looking for. I actually have narrowed this down, and I'm getting users with an ID that is greater than or equal to this value, just so I don't have a huge list of users from this repository.
So if I try, we get the user accounts here.
This is useful to get the ID here, the account ID or the user ID, depending on what you need. So the ID is the one that you'd use to find records that are owned by this person.
The user ID is associated with their public records.
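A sketch of the same lookup in Python follows. The `/account/institution/accounts` path, the `id_gte` filter name, and the `id`, `user_id` and `email` response fields are assumptions drawn from the documentation as I understand it; the helper names are made up.

```python
from urllib.parse import urlencode

BASE_URL = "https://api.figshare.com/v2"  # assumed; the sandbox differs


def institution_accounts_url(id_gte=None, page=1, page_size=10):
    """Build the URL for listing accounts, optionally from a minimum ID upward."""
    params = {"page": page, "page_size": page_size}
    if id_gte is not None:
        params["id_gte"] = id_gte  # assumed name of the "greater than or equal" filter
    return f"{BASE_URL}/account/institution/accounts?{urlencode(params)}"


def ids_by_email(accounts):
    """Index the returned accounts by e-mail, keeping both kinds of ID."""
    return {
        account["email"]: {"account_id": account["id"], "user_id": account["user_id"]}
        for account in accounts
    }
```

The request itself would need the same `Authorization: token ...` header as any other private endpoint.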
You can also get this information from your Administrator panel in Figshare. So I'll just show that, too, just to see the corresponding user interface version.
So, I went into Administration, then Users. I can select all these users and download the user report.
And that looks like this.
So once again, we have the account ID, the user ID, lots of information about them, even more so than the API.
So, multiple ways to do that.
But it can be useful to see if the user's here and just grab the information quickly. Kind of along the same lines, you can see all the groups that you, as an administrator, have rights over. So, I just went to this Groups endpoint.
And, importantly, we can get the ID for a group.
This will be useful in a few minutes for something else.
Let me just check my list here.
So, see users, see groups. You can also create users.
You want to think about this carefully, because creating a user outside of the system that you usually create users with, like a single sign-on system or an HR feed, means that you have to manage that account outside of that system. You have to manage it manually.
But it is possible. Once again, you just fill in this information here, and it'll create the user for you, and when that user logs in with their e-mail, it will then prompt them to create a password.
One thing that can be very useful, I think, though, is if you have multiple groups in your repository, and you want a user account to be a reviewer, say, for multiple subgroups in here, but not everything. You could manually go in, configure the group, choose a user role, and, you know, add them as a reviewer.
You can do that for every group.
You can also just do it automatically all in one fell swoop through the API.
So, one way to kind of speed up, you know, workflows.
Um, so I've gone to this Add Institution Account Group Roles endpoint. It looks a little cryptic over here, just a bunch of numbers.
This is the information we're going to send to assign roles.
But what we're looking at here is that group ID that I pointed out in that other endpoint.
And these values are the role ID, and we can see that here. So private account institution roles.
And I can see all the roles.
Oops, sorry, I clicked out of it. I can see all the roles that I could assign someone to, depending on whether they're in the right place in the repository. And here's the ID.
So I can grab this information and I can, you know, all in one fell swoop, set up an account as a reviewer across multiple groups.
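The "bunch of numbers" is a mapping from group IDs to lists of role IDs. Below is a minimal sketch of building it and wrapping it in a request. The `/account/institution/roles/{account_id}` path is an assumption based on the endpoint's name in the documentation, and all IDs in the note afterwards are invented.

```python
import json
from urllib.request import Request

BASE_URL = "https://api.figshare.com/v2"  # assumed; the sandbox differs


def build_group_roles_payload(group_ids, role_ids):
    """Map every group ID to the same list of role IDs."""
    return {str(group_id): list(role_ids) for group_id in group_ids}


def build_add_roles_request(token, account_id, payload):
    """Prepare (but do not send) the role assignment for one account."""
    return Request(
        f"{BASE_URL}/account/institution/roles/{account_id}",  # assumed path
        data=json.dumps(payload).encode("utf-8"),
        headers={
            "Authorization": f"token {token}",
            "Content-Type": "application/json",
        },
        method="POST",
    )
```

For instance, `build_group_roles_payload([11, 12], [49])` gives `{"11": [49], "12": [49]}`, where 49 stands in for whatever ID the roles endpoint lists for the reviewer role in your repository.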
OK, so I know I didn't do a lot of demoing there. I was just kind of showing you around. I do want to show how you can upload a controlled vocabulary list for a custom metadata field.
So let's see.
I'm going to go back I'm going to create a custom metadata field in this theses and dissertations group.
So configure, scroll down, this is a relatively recent addition to Figshare.
I'm going to call this language and I'm going to just have it set for items.
Now we've added this drop-down large list option, and so I'll just put in language for this required field.
There's some notes here about what I need to do now to upload a list of languages for this field, When I click save, now, there won't actually be anything in the field, and I will allow multiple selections here so I can save this configuration.
I've come back to the API.
So, if I scroll up: Custom fields values files upload. OK, here's this endpoint.
It says that I need the custom field ID, OK, I don't have that. I don't know where to get it, but I can look at this endpoint that's just above that.
Returns the custom field in a group and that returns their ID.
You can see that example over here, but I need the group ID, OK. So now we gotta just go and check our list of groups so I can try it here.
There's theses and dissertations, triple 9, 5, so I'll copy that.
Go back up to the custom fields endpoint.
See, I've entered it before. And right now, we see this custom field that I've added. So I can grab this ID.
Go back to this file upload endpoint, put the ID in here.
I have created a CSV previously of languages, just a few languages, I've uploaded that.
A note: you can only do, I think, one upload every 60 minutes, so if, for some reason, it doesn't work,
I won't be able to try again. But it did work. It says the message is OK. It can take a while for the list to upload, but it will show up eventually.
So I doubt it'll show up this quickly, but I'm just gonna go back to my data.
I'm opening up a draft item.
I can select the thesis and dissertation group.
That means that the custom fields will now show up for me in this form.
Let's see, Language. I doubt it will have shown up yet.
Oh, great. It actually did, so I can put that in.
I think you probably need to type several letters, and it will show you multiple options if you have things that start with the same three letters. I can select the language, and now that's selected.
Great, I'm glad. Glad that worked.
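Two small pieces of that walkthrough lend themselves to code: finding the custom field's ID in the list the custom fields endpoint returns, and preparing the CSV of values. The sketch below assumes each returned field is a dictionary with `id` and `name` keys, and assumes the CSV is one value per row with no header. Neither detail is confirmed in the webinar, so check the notes shown next to the field in the admin interface.

```python
import csv
import io


def find_custom_field_id(fields, name):
    """Return the ID of the custom field whose name matches (case-insensitive)."""
    for field in fields:
        if field.get("name", "").lower() == name.lower():
            return field["id"]
    raise KeyError(name)


def vocabulary_csv(values):
    """Write one controlled-vocabulary value per row (assumed layout)."""
    buffer = io.StringIO()
    writer = csv.writer(buffer, lineterminator="\n")
    for value in values:
        writer.writerow([value])
    return buffer.getvalue()
```

The resulting text can be saved to a file and uploaded through the documentation page, as in the demo.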
OK, so, uh, the next thing I wanted to talk about was just some of the examples of scripting.
So how can you take a bunch of these endpoints, put them together to make your workflows easier, to get the information you need out of the repository?
And I've previously opened up our Help page, How to use the Figshare API.
As I scroll down, there's a lot of text here. So, sorry about that. But there's a lot of good information.
There are some examples of, like, things you can do where you don't need to use scripting. So there are some links there and some short descriptions. Some things we've already talked about.
Some things for possible repository managers. Then examples that use scripting.
I do want to mention that some of these are now basically obsolete, because Figshare just launched a brand-new batch metadata tool. I'll just show you where that is.
In the admin panel, if you're an administrator, you'll have a batch management tool here, and we'll do a webinar on it later this year, I believe, just to show how that can work. It's powerful: you can grab some or all of your metadata, edit it, add new stuff, and then re-upload it with files. So, really powerful.
That makes some of these scripts a little obsolete, but you never know when you might want to customize something. So hopefully these are useful. The one I'm going to use is all the way at the bottom.
For repository administrators, the very last one: get a table of an author's items and collections with views and downloads.
So, useful for researchers. Like, you know, if you want to encourage people to continue using your repository, you can send them this table and say, well, this is your most popular item, and it was downloaded this many times.
You can open this example, Jupyter Notebook.
And as I mentioned before, it opens in Google Colab. You need a Google account to run it.
I'm actually going to create a copy, just so I don't mess up. This is the one.
If you go to this link, and I encourage you to do it right now, you can just run it right in the browser. But I don't want to mess this one up, so I'll close that. So I'm going to my copy of it. It looks the same. So we import the libraries.
I actually am going to search for my own views and downloads, I have my ORCID ID here.
We set the base URL, to the API, base, API URL.
And we see here that we can search by username, or sorry, just author name, but it won't disambiguate people with the same name. It could be useful if you know that the person is in your repository, and you can put your institution ID in, and you can be pretty certain you'll only get their records. I'm gonna keep scrolling down to retrieve metadata by ORCID.
So I will go in here, put my ORCID in.
And for the institution, I have a figshare.com account,
so I'm just going to put a zero in here, like it says to do. Run that.
This next block visits the API, searches for that ORCID, and returns the records. So it says I've collected three metadata records. I'll blow this up a little bit. There we go.
Then I'm going to convert that. It comes out in JSON, so, Python's nice, it will just convert that to a data frame. It's like a spreadsheet.
Then I want to visit the stats endpoint. If I visit the docs page again and scroll down, there's a whole stats endpoint. We can get the totals for views and downloads from this endpoint.
So I will do that.
It takes that ID from the metadata returned earlier, visits the endpoint for both views and downloads, and pulls out the number.
So, great, it created a data frame with three rows.
I have two data frames.
Now, I'm just going to merge them into one, And it gives me, it actually shows some of the metadata.
So it's just the basic metadata, and then all the way over on the other side, we have the views and downloads in two columns.
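The notebook does this with data frames; the sketch below shows the same idea with plain dictionaries so it runs without extra libraries. It is not the notebook's code. The stats base URL, the `/total/{counter}/article/{id}` path and the `totals` key in the response are assumptions from my reading of the stats documentation.

```python
STATS_URL = "https://stats.figshare.com"  # assumed stats service base URL


def totals_url(counter, article_id):
    """Build the URL for an item's total views or downloads."""
    if counter not in ("views", "downloads"):
        raise ValueError(f"unsupported counter: {counter!r}")
    return f"{STATS_URL}/total/{counter}/article/{article_id}"


def add_totals(records, fetch):
    """Return copies of the records with 'views' and 'downloads' columns added.

    `fetch` is any callable that takes a URL and returns decoded JSON.
    """
    rows = []
    for record in records:
        row = dict(record)
        for counter in ("views", "downloads"):
            row[counter] = fetch(totals_url(counter, record["id"]))["totals"]
        rows.append(row)
    return rows
```

Each returned row is the basic metadata plus two extra columns, which mirrors the merged table shown in the demo.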
If I were to have collections, there are some blocks of code here that will gather the views and downloads for those collections, sorry, scrolling.
But let's keep going.
Just run those if you have collections.
Now, I should actually start even more scrolling.
The dates do come out as JSON and they, they're just formatted, you know, in this kind of funny way, within a cell.
Kind of hard to read, so um, this next block here will format the dates for you just to make them more readable.
We'll click that.
It says dates split out and merged.
And I can see a summary of my views and downloads. Really giving myself an ego boost there. And we can also see what those dates look like.
Scrolled down enough.
So it's added these other columns after the views and downloads, with the dates formatted more nicely. Then we can export that in Google Colab, or, if you've downloaded this, you can save it as a spreadsheet and send it off to somebody.
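Splitting the dates out of the nested JSON might look roughly like this. The sketch assumes each record carries a `timeline` dictionary of ISO 8601 timestamps, which is how I understand the API to return dates; the `date_` column prefix and the function name are made up, and this is not the notebook's actual code.

```python
from datetime import date


def split_timeline(record):
    """Flatten a record's 'timeline' dict into one readable date column per entry."""
    row = {key: value for key, value in record.items() if key != "timeline"}
    for name, stamp in record.get("timeline", {}).items():
        # Keep only the YYYY-MM-DD part and check that it is a real date.
        row[f"date_{name}"] = date.fromisoformat(stamp[:10]).isoformat()
    return row
```

Records without a `timeline` entry pass through unchanged.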
So, a quick example of how you can, you know, gather some statistics information by item, which you can't really gather from like a profile, it just shows aggregated views and downloads in the user interface.
You can also borrow this code, you know, and make it your own to gather whatever information you need. So, hopefully, that that would be helpful.
The last thing I want to mention is that you can do pretty sophisticated things.
And it's not that hard. Just to show you an example, I am now in my own Jupyter notebook that I have on my machine here.
And what this is, scrolling up, is a way to visualize an author network.
So if you don't have a CRIS system at your institution that can, like, show you how all the faculty are connected through research at your institution, if you do a good job of putting everything in your repository, the way Figshare’s set up, you can pretty easily create a network graph.
And I've already run a bunch of this code.
It's basically using Carnegie Mellon University's public articles, and specifically looking at the non-traditional research outputs, so like figures, media, datasets, posters, and things like that.
And, um, once again, all this scrolling. Basically, what it does is it pulls out the author information, creates pairs of authors, then counts up those pairs to create some networks. And this last block of code uses pyvis.
And I just did some Google searching here.
I'm not an expert here. And if it all works out, it's going to load up
what I think is a pretty impressive network graph. It does take a moment, though, to do that. So we'll come back to this in a moment.
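The pairing and counting step can be sketched with the standard library alone. This is an illustration of the idea, not the notebook's code, and it assumes each record has an `authors` list whose entries carry a `full_name`.

```python
from collections import Counter
from itertools import combinations


def count_author_pairs(records):
    """Count how often each pair of authors appears on the same record."""
    pairs = Counter()
    for record in records:
        # Sort and de-duplicate so (A, B) and (B, A) count as the same pair.
        names = sorted({author["full_name"] for author in record.get("authors", [])})
        pairs.update(combinations(names, 2))
    return pairs
```

Each key is a pair of names and each value is the number of shared records, which is the kind of weighted edge list a network library can draw.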
Um, as a kind of ending point though, I'm gonna go back to my slides while that loads. If you do have questions, please e-mail me or get in touch through info@figshare.com.
I hope that you've been, you know, inspired to at least explore the Figshare API and give it a try.
Maybe you've seen how you can retrieve some information for your institution or for your own figshare.com account. It's very powerful.
And I think there are a lot of really, really exciting ways to use it. And so I'm always excited to hear about how others are using the API.
So please be in touch with any questions or comments. I'll come back to here.
Of course, it's taking a long time. Probably my machine is a little overloaded right now.
You know what, I'll just show you the screenshots I took while that loads.
So, what this creates is an interactive network, and you can zoom in on various parts of this network and see the name of the node.
And so, you can see, kind of, you know, who's at the center of a lot of collaboration. In this case, I know it's hard to read,
it says Future Tennant gallery, like a really interesting art, um, initiative that supported, I guess, a lot of artists and made a lot of art available. But you can see there are other kind of nodes throughout this network as well.
So, it's fun to visualize, and I didn't know it was going to take this long to load.
Thanks, that's maybe maybe we could go through it.
Cool, thank you, even I learned something. That's good.
So, thank you.
And there's a question about, is it possible to edit published datasets?
So, we'll come back to this in a second, so, I'm back in the, the documentation.
So, you can edit published datasets just through your user interface, um, even if it's public.
So, I'm looking at all the datasets or records I own, this one is public, it's green here, but I can go in and make changes. Whoops, sorry.
Make changes here by clicking on the little pencil thing, and I can, you know, maybe change it out on a poster.
Scroll all the way down, and do publish changes.
It will then go into review, so now it's a green circle, and it will have to be re-reviewed, because I have review turned on for this platform.
If you want to use the API to do that,
you can use the private articles endpoints, because you can only make changes to a private article or record, and you can update that article.
So, you can send any of this information to the record, for it to be changed.
When you do that, it'll change it, but it won't publish it.
So, you'd have to do this, and you'd have to hold on to the item ID and go to Private article publish, put that ID in there, and hit the Try button.
And, of course, you can do that through scripting.
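Scripted, the answer comes down to two calls: update the private record, then publish it. The sketch below only prepares the requests. The `/account/articles/{id}` and `/account/articles/{id}/publish` paths are assumptions from the documentation, and the function names are hypothetical.

```python
import json
from urllib.request import Request

BASE_URL = "https://api.figshare.com/v2"  # assumed; the sandbox differs


def build_update_request(token, article_id, changes):
    """Prepare (but do not send) an edit to the private version of a record."""
    return Request(
        f"{BASE_URL}/account/articles/{article_id}",
        data=json.dumps(changes).encode("utf-8"),
        headers={
            "Authorization": f"token {token}",
            "Content-Type": "application/json",
        },
        method="PUT",
    )


def build_publish_request(token, article_id):
    """Prepare (but do not send) the publish call for the same record."""
    return Request(
        f"{BASE_URL}/account/articles/{article_id}/publish",
        headers={"Authorization": f"token {token}"},
        method="POST",
    )
```

Until the second request is sent, the edits stay on the private version, and if review is turned on, publishing sends the item back through review as described above.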
I hope that answered the question.
I think so, but if not, just kind of follow up.
Doesn't look like there are any other questions, but, yeah, this has loaded, so we can zoom in and, uh, you know, explore it.
Basically, we can see some people who seem to have a lot of connections here. Um, I don't know, there are probably much more interesting ways to use this and build this. But I just wanted to show an example of what's possible with how Figshare stores author information and record information.
So thanks for bearing with me as that loaded, and I think we'll end it there.
If you do have other questions, please be in touch.
Thank you, everyone.