Google Scholar discoverability

This article is intended as advice for the Figshare for Institutions product.

Discoverability of content is vital for success, and one of the most important scholarly discovery portals is Google Scholar.

As part of our quarterly meetings with the Partnerships team at Scholar, we have the opportunity to ask questions, follow their advice and answer FAQs from our customers from both the Figshare and Scholar perspectives. If you have a question that is not addressed in this article, please email info@figshare.com.

 

How do I get my repository content indexed by Scholar?

You don’t need to do anything. If your repository has content that is of interest to Scholar, it will be picked up in due course once publicly available items with the appropriate URL markers (described below) are published in your repository.

 

How long does Scholar take to index a new repository?

Once the first appropriate items are publicly published, you should start to see partial indexing within 2 months, with coverage increasing from then on. If you don’t see partial indexing within 2 months of those first items being published, please contact info@figshare.com.
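If you want to track this window for your own repository, the date arithmetic is easy to get slightly wrong around month ends. The following is a minimal sketch in Python, not a Figshare tool; the function names are hypothetical, and it assumes "2 months" means two calendar months after the first appropriate item was published, clamping to the last day of shorter months.

```python
import calendar
from datetime import date


def add_calendar_months(start: date, months: int) -> date:
    """Return the date `months` calendar months after `start`.

    If the target month has fewer days than start.day (e.g. 31 December
    plus 2 months), the day is clamped to the last day of that month.
    """
    month_index = start.month - 1 + months
    year = start.year + month_index // 12
    month = month_index % 12 + 1
    last_day = calendar.monthrange(year, month)[1]
    return date(year, month, min(start.day, last_day))


def should_contact_support(first_published: date, today: date,
                           partially_indexed: bool) -> bool:
    """True once the 2-month window has passed with no partial indexing seen."""
    follow_up = add_calendar_months(first_published, 2)
    return not partially_indexed and today > follow_up


# Hypothetical example: first items published on 31 December 2024.
print(add_calendar_months(date(2024, 12, 31), 2))  # 2025-02-28
```

Clamping keeps the follow-up date inside the target month rather than spilling into the next one.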

 

Do mixed-content repositories affect indexing?

Because Figshare uses URL markers, mixed-content repositories do not affect indexing at all, regardless of the ratio of content types within your repository.

 

What are URL markers?

URL markers are a feature added to Figshare in which the item type is included in the item's URL to enable more effective indexing. For example, in https://figshare.com/articles/journal_contribution/Long_noncoding_RNAs_in_B-cell_development_and_activation/12602240 the URL marker is journal_contribution.
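A URL marker can be read straight from an item URL. The minimal Python sketch below splits the example URL above into its marker, title slug and item ID; the function name is hypothetical, and it assumes every item URL follows the same four-segment /articles/<marker>/<title>/<id> layout as the example, which may not hold for every URL Figshare serves.

```python
from urllib.parse import urlparse


def parse_figshare_item_url(url: str) -> dict:
    """Split an item URL of the assumed form
    /articles/<url_marker>/<title_slug>/<item_id> into its parts."""
    parts = [p for p in urlparse(url).path.split("/") if p]
    if len(parts) != 4 or parts[0] != "articles":
        raise ValueError(f"Unexpected URL layout: {url}")
    return {
        "url_marker": parts[1],  # the item type, e.g. journal_contribution
        "title_slug": parts[2],
        "item_id": parts[3],
    }


info = parse_figshare_item_url(
    "https://figshare.com/articles/journal_contribution/"
    "Long_noncoding_RNAs_in_B-cell_development_and_activation/12602240"
)
print(info["url_marker"])  # journal_contribution
```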

 

What content is indexed by Scholar?

Scholar indexing is managed by item type. The following item types are of interest to Scholar:

    • Poster
    • Journal contribution
    • Conference contribution
    • Preprint
    • Presentation
    • Thesis
    • Book
    • Chapter
    • Report
    • Standard
    • Monograph
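To illustrate why URL markers keep mixed-content repositories from affecting indexing, the sketch below filters a list of item URLs down to those whose marker matches one of the item types above; other types are simply skipped, however many there are. This is a hypothetical illustration, not how Scholar's crawler actually works. It assumes each marker is the item type name in lowercase with spaces replaced by underscores, which the article only confirms for journal_contribution, and the example URLs are made up.

```python
from urllib.parse import urlparse

# Item types listed in this article as being of interest to Scholar.
INDEXED_ITEM_TYPES = {
    "Poster", "Journal contribution", "Conference contribution", "Preprint",
    "Presentation", "Thesis", "Book", "Chapter", "Report", "Standard",
    "Monograph",
}


def to_url_marker(item_type: str) -> str:
    """Assumed mapping from item type name to URL marker (lowercase,
    spaces replaced by underscores), e.g. 'journal_contribution'."""
    return item_type.lower().replace(" ", "_")


INDEXED_MARKERS = {to_url_marker(t) for t in INDEXED_ITEM_TYPES}


def indexable_urls(urls: list[str]) -> list[str]:
    """Keep only item URLs whose marker (the path segment after /articles/)
    belongs to an indexed item type; all other URLs are ignored."""
    kept = []
    for url in urls:
        parts = [p for p in urlparse(url).path.split("/") if p]
        if len(parts) >= 2 and parts[0] == "articles" and parts[1] in INDEXED_MARKERS:
            kept.append(url)
    return kept


# Hypothetical mixed-content repository: the dataset is skipped.
urls = [
    "https://figshare.com/articles/dataset/Example_data/111",
    "https://figshare.com/articles/thesis/Example_thesis/222",
    "https://figshare.com/articles/journal_contribution/Example_article/333",
]
print(indexable_urls(urls))
```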

 

Do cover sheets affect indexing?

Google Scholar does not recommend the use of cover pages.

"We don't recommend cover pages of any length. They can and do break identification of documents as scholarly articles."

Scholarly articles have common structures, and the indexing system depends on these structures to identify a document as a potential scholarly article. Cover pages obscure these structures, which is why they interfere with the identification of scholarly articles.

If you would still like to use cover sheets, Figshare handles them in the way that is best for indexing: the cover sheet is prepended only when an end user downloads the file, and it is not served when the file is accessed via the citation_pdf_url, which is the URL the crawler uses.
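The cover-sheet behaviour described above amounts to a single routing decision. The minimal sketch below models it in Python, with pages represented as strings rather than real PDF content; the function name and paths are hypothetical and this is not Figshare's implementation.

```python
def pages_to_serve(original_pages: list[str], cover_page: str,
                   requested_path: str, citation_pdf_path: str) -> list[str]:
    """Decide which pages to serve for a file request.

    Requests for the citation_pdf_url get the unmodified file, so the
    crawler sees the document's own structure. Ordinary user downloads
    get the cover sheet prepended at download time.
    """
    if requested_path == citation_pdf_path:
        return list(original_pages)
    return [cover_page] + list(original_pages)


original = ["Title and abstract", "Introduction"]
citation_path = "/hypothetical/citation.pdf"
print(pages_to_serve(original, "Cover sheet", citation_path, citation_path))
print(pages_to_serve(original, "Cover sheet", "/hypothetical/download", citation_path))
```

Returning new lists leaves the stored file untouched, mirroring the idea that the cover sheet exists only in the downloaded copy.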

 

Sitemaps and meta tags

Figshare's meta tags and sitemaps, including sitemap size, have been set up in cooperation with Scholar as outlined in Scholar's inclusion guidelines, and Scholar has approved this setup as fully compliant:

    1. Include specific meta tags in the HTML pages of the records. Currently, pages for the item types listed above include the following tags: citation_title, citation_publication_date, citation_author, DC.identifier and the repository-specific CitationDissertationInstitution.
    2. Provide a direct link to PDFs: this is provided in the citation_pdf_url meta tag, alongside the tags above.
    3. Provide a sitemap: Google advises using the sitemaps.org standard, and Figshare provides a sitemap for both its free offering (https://figshare.com/sitemap/siteindex.xml) and each customer portal. The sitemap is linked in the footer of each landing page.
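To spot-check a landing page yourself, you can look for the meta tags listed above in its HTML. The sketch below uses Python's standard-library html.parser on an HTML string you supply (fetching the page is left out). The required-tag set is taken from points 1 and 2 above, minus CitationDissertationInstitution, which is omitted because the article describes it as repository-specific; the class and function names and the sample page are hypothetical.

```python
from html.parser import HTMLParser

# Tags named in points 1 and 2 above (CitationDissertationInstitution omitted).
REQUIRED_TAGS = ("citation_title", "citation_publication_date",
                 "citation_author", "DC.identifier", "citation_pdf_url")


class MetaTagCollector(HTMLParser):
    """Collect <meta name="..." content="..."> pairs from an HTML page."""

    def __init__(self) -> None:
        super().__init__()
        self.tags: dict[str, list[str]] = {}

    def handle_starttag(self, tag, attrs):
        # Also called for self-closing <meta ... /> tags via handle_startendtag.
        if tag != "meta":
            return
        attr_map = dict(attrs)
        name, content = attr_map.get("name"), attr_map.get("content")
        if name and content is not None:
            self.tags.setdefault(name, []).append(content)


def missing_tags(html: str) -> list[str]:
    """Return the required tags that do not appear on the page."""
    collector = MetaTagCollector()
    collector.feed(html)
    return [t for t in REQUIRED_TAGS if t not in collector.tags]


sample_page = """<html><head>
<meta name="citation_title" content="Example title">
<meta name="citation_author" content="Doe, Jane">
<meta name="citation_publication_date" content="2020/7/10">
<meta name="DC.identifier" content="hypothetical-identifier">
</head><body></body></html>"""
print(missing_tags(sample_page))  # ['citation_pdf_url']
```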

 

If the item type is in the group name, will that help with discovery in Scholar?

No, this has no impact on discovery in Scholar, so there is no need to consider it.

 

Scholar is directing users to the PDF from the title link. What is going on?

By default, the Scholar indexing system uses the HTML landing page for the title link and the PDF for the access link.

It falls back to using the PDF for the title link if it finds too many errors in the page's metadata and therefore cannot "trust" it. These errors are overwhelmingly in the publication date.
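Because most of these errors involve publication dates, checking the citation_publication_date values on your landing pages can help narrow down the problem. The sketch below is a hypothetical Python helper; it assumes the formats described in Google Scholar's inclusion guidelines (a full date such as 2010/5/12, or the year alone) and only checks format and plausibility, not whether the date is the correct publication date for the item.

```python
import re
from datetime import date

# Assumed accepted formats: "YYYY/M/D" (month/day may be zero-padded) or "YYYY".
_FULL_DATE = re.compile(r"^(\d{4})/(\d{1,2})/(\d{1,2})$")
_YEAR_ONLY = re.compile(r"^(\d{4})$")


def publication_date_problems(value: str, today: date) -> list[str]:
    """Return problems found in a citation_publication_date value
    (an empty list means none were found)."""
    value = value.strip()
    if m := _FULL_DATE.match(value):
        try:
            parsed = date(int(m[1]), int(m[2]), int(m[3]))
        except ValueError:
            return [f"not a real calendar date: {value!r}"]
        return [f"date is in the future: {value!r}"] if parsed > today else []
    if m := _YEAR_ONLY.match(value):
        return [f"year is in the future: {value!r}"] if int(m[1]) > today.year else []
    return [f"unrecognised format (expected YYYY/M/D or YYYY): {value!r}"]


checked_on = date(2024, 6, 1)
for candidate in ["2020/7/10", "2020", "2020/02/30", "10-07-2020"]:
    print(candidate, publication_date_problems(candidate, checked_on))
```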

If you are experiencing this problem, please contact info@figshare.com and we can work together to resolve it.