Figtionary

 

Glossary of Figshare & Open Data terminology

 

 

Figshare data model and concepts

 

Bulk - Data is available in bulk if the entire dataset can be downloaded easily and efficiently to a user’s own system. Conversely, it is non-bulk if one is limited to getting small parts of the dataset, for example, if you are restricted to a few elements of the data at a time and therefore require thousands or millions of requests to get the entire dataset. The provision of bulk access is a requirement of open data.
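
To put rough numbers on the difference, the sketch below counts how many requests are needed to retrieve a dataset when only a few records can be fetched at a time. The record count and page size are made-up figures for illustration and are not limits of any particular service.

```python
import math


def requests_needed(total_records: int, records_per_request: int) -> int:
    """Return how many requests it takes to fetch every record page by page."""
    if records_per_request <= 0:
        raise ValueError("records_per_request must be positive")
    return math.ceil(total_records / records_per_request)


# Hypothetical figures: 5 million records, 100 records per request.
print(requests_needed(5_000_000, 100))  # 50000
```

With bulk access, the same dataset would be available as a single download.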

 

Category - One of the compulsory fields users need to complete in order to publish the item with a DOI. All 1,238 6-digit Fields of Research codes from the Australian and New Zealand Standard Research Classification are included as categories. Multiple categories can be included but at least one must be included in order to publish.

 

Collection - Allows users to group together relevant content from within Figshare into themed “Collections”. These Collections can be publicly shared with their own URL and view counts to gauge popularity. When a Collection is published it generates a DOI (digital object identifier) to provide more stable linking. These Collections can be updated and edited by the owner at any time to include new versions of research and new research outputs.

 

Item - A Figshare entity that has its own identifier such as a DOI and can contain both metadata and associated files. Figshare has extended the item to encompass public files, embargoed files, confidential files and linked files, and has also introduced the concept of metadata-only records.

 

Embargo - An item under embargo is an item that will be published at a later date. The item owner sets this date in advance in the item overlay. Authors commonly use this option when the item cannot be made accessible to the public until the publication day of the associated research, due to the journal’s own policies.

 

Confidential - A confidential item is a public item whose file(s) are not publicly available. This option is used when the research outcome contains sensitive data that cannot be released to the public. Confidentiality can be set only after files have been uploaded. Confidentiality can be revoked by the author at any time. If the item was already public, removing confidentiality generates a new item version.

 

Metadata-only - Similar to a confidential item, but the research is so sensitive that the author does not upload the file(s) at all and can choose to release only the metadata.

 

Linked files - A linked file is used when the research data is located elsewhere. The author is able to indicate the URL where the file(s) are located. However, Figshare is not responsible for verifying the authenticity of the URL or for maintaining it.

 

Fileset - A fileset represents an Item which has multiple files attached. A user can manage the files from the Manage screen that allows ordering, downloading and deleting files, as well as adding new files.
 
 

Figshare for Institutions - A cloud-based (see SaaS) repository designed to allow academic organizations to store, manage, and publicly share their research outputs.

 

Figshare for Publishers - A cloud-based (see SaaS) repository built for publishers to visualize and host large amounts of data in their articles.

 

Project - A collaborative space that allows you to share your research and collaborate with designated members. You can have multiple collaborators with different access restrictions. There is no concept of versioning within a project, and there is no DOI for the public space.

 

Private sharing link - A private link can be sent via email and the recipient can access the data without logging in or having a Figshare account. This feature was designed for blind peer review, so the page that a private sharing link leads to is anonymised - it does not include the Author field or any non-figshare branding. It is important to note that these links expire after one year and should not be cited in publications.

 

Users - Anyone can become a Figshare user by signing up at figshare.com. A user can create Items, Projects and Collections. A user does not have a public profile page until they have public content.



Other useful definitions

 

API - Application Programming Interface -  Our application programming interface (API) is a set of routines, protocols, and tools for building software applications, helping to automate researcher workflows. The Figshare API allows you to manage your figshare data (push data to figshare or pull data out), create collections out of public content or build applications on top of the functionality. Our API is fully documented.
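
As a rough illustration of pulling data out, the sketch below builds (but does not send) a request for a page of public items. The base URL, the `/articles` path and the `page` and `page_size` parameters are assumptions made for this example and should be checked against the API documentation. The function name is hypothetical.

```python
from urllib.parse import urlencode
from urllib.request import Request

# Assumed base URL for illustration; confirm it in the API documentation.
BASE_URL = "https://api.figshare.com/v2"


def build_public_articles_request(page: int = 1, page_size: int = 10) -> Request:
    """Build a GET request for one page of public items (hypothetical helper)."""
    query = urlencode({"page": page, "page_size": page_size})
    return Request(
        f"{BASE_URL}/articles?{query}",
        headers={"Accept": "application/json"},
        method="GET",
    )


req = build_public_articles_request(page=1, page_size=5)
print(req.full_url)
```

Passing `req` to `urllib.request.urlopen` would send the request. The response is expected to be JSON, which could then be decoded with the standard `json` module.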

 

App / Application - A computer program designed to perform a group of coordinated functions, tasks, or activities for the benefit of the user. The Figshare desktop uploader allows quick and easy upload of your research outputs, straight from your desktop. All files are uploaded into your private space on figshare, where you can choose whether to make them public or manage them privately. More details can be found here: https://figshare.com/tools

 

Big Data - A collection of data so large that it cannot be stored, transmitted or processed by traditional means. The increasing availability of and need to process such datasets (for example, huge collections of weather or other scientific data) has led to the development of specialised computer technologies, architectures and programming languages.

 

Cloud storage - Data stored ‘in the cloud’ is handled by a hosting company, relieving the data owner of the need to manage its physical storage. Instead of being stored on a single machine, it may be stored across or moved between multiple machines in different locations. The hosting company is responsible for keeping it available and accessible via the internet.

 

Copyright - A legal right over intellectual property (e.g. a book) belonging to the creator of the work. While individual data (facts) cannot be copyrighted, a database will in general be covered by copyright protecting the selection and arrangement of data within it. Within the European Union, separate ‘database rights’ protect a database where there was a substantial effort in ‘obtaining’ the data. A copyright holder may use a licence to grant other people rights in the protected material, perhaps subject to specified restrictions.

 

Creative Commons - A non-profit organisation founded in 2001 that promotes re-usable content by publishing a number of standard licences, some of them open (though others include a non-commercial clause), that can be used to release content for re-use, together with clear explanations of their meaning.

 

CSV - A comma-separated values (CSV) file stores tabular data (numbers and text) in plain text. Each line of the file is a data record. Each record consists of one or more fields, separated by commas. The use of the comma as a field separator is the source of the name for this file format.
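
The record and field structure described above can be sketched with Python's standard `csv` module. The column names and values are invented for illustration.

```python
import csv
import io

# Invented sample: a header line followed by two data records.
csv_text = "name,height_cm,weight_kg\nsample_a,12.5,0.8\nsample_b,9.1,0.5\n"


def parse_csv(text: str) -> list[dict[str, str]]:
    """Parse CSV text into one dictionary per record, keyed by the header row."""
    return list(csv.DictReader(io.StringIO(text)))


records = parse_csv(csv_text)
print(len(records))             # 2
print(records[0]["height_cm"])  # 12.5
```

Every field is read as text, so numeric columns need to be converted explicitly (for example with `float`).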

 

DataCite - A non-profit organisation that provides persistent identifiers for research data. Their aim is to help researchers locate, identify and cite research data.

 

Data management - The policies, procedures, and technical choices used to handle data through its entire lifecycle from data collection to storage, preservation and use. A data management policy should take account of the needs of data quality, availability, data protection, data preservation, etc.

 

Data Portal / Portal - A web platform for publishing data. The aim of a data portal is to provide a data catalogue, making data not only available but discoverable for data users, while offering a convenient publishing workflow for publishing organisations. Typical features are web interfaces for publishing and for searching and browsing the catalogue, machine interfaces (APIs) to enable automatic publishing from other systems, and data preview and visualisation.

 

DOI - A Digital Object Identifier is a persistent identifier or handle used to uniquely identify objects, standardized by the International Organization for Standardization (ISO). All figshare content has its own DOI.
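
A DOI consists of a prefix that starts with "10." and a suffix, separated by a slash, and it can be turned into a link by appending it to https://doi.org/. The sketch below shows that structure. The DOI used is a made-up placeholder and the function names are hypothetical.

```python
def split_doi(doi: str) -> tuple[str, str]:
    """Split a DOI into (prefix, suffix) at the first slash."""
    prefix, sep, suffix = doi.strip().partition("/")
    if not prefix.startswith("10.") or not sep or not suffix:
        raise ValueError(f"not a DOI: {doi!r}")
    return prefix, suffix


def doi_to_url(doi: str) -> str:
    """Return the resolver link for a DOI after a basic shape check."""
    split_doi(doi)  # raises ValueError if the shape is wrong
    return "https://doi.org/" + doi.strip()


example = "10.1234/example.5678"  # placeholder DOI, not a real record
print(split_doi(example))   # ('10.1234', 'example.5678')
print(doi_to_url(example))  # https://doi.org/10.1234/example.5678
```

The check is deliberately loose: it confirms only the overall shape, not that the DOI is registered.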

 

DOI reservation - DOIs can be reserved, meaning the DOI is created but remains inactive. This allows users to put the reserved DOI in a journal article while the data becomes accessible only once the item is published, which activates the DOI.

 

Database - A collection of information that is organized so that it can be easily accessed, managed and updated.

 

Dataset - A collection of data. Most commonly a data set corresponds to the contents of a single database table, or a single statistical data matrix, where every column of the table represents a particular variable, and each row corresponds to a given member of the data set in question. The data set lists values for each of the variables, such as height and weight of an object, for each member of the data set. Each value is known as a datum. The data set may comprise data for one or more members, corresponding to the number of rows.

 

Data Access Protocol - A system that allows outsiders to be granted access to databases without overloading either system. DAP is a protocol for access to data organized as name-datatype-value tuples. It is particularly suited to access by a client computer to data stored on remote (server) computers that are networked to the client computer.

 

Data Centre - The department in an enterprise that houses and maintains back-end information technology (IT) systems and data stores. By default, Figshare uses Amazon Web Services (AWS).

 

Data Collection - Datasets are created by collecting data in different ways: from manual or automatic measurements (e.g. weather data), surveys (census data), records of decisions (budget data) or ongoing transactions (spending data), aggregation of many records (crime data), mathematical modelling (population projections), etc.

 

Data preservation - The act of conserving and maintaining both the safety and integrity of data. It refers to the series of managed activities necessary to ensure continued access to digital materials for as long as necessary. Preservation is done through formal activities that are governed by policies, regulations and strategies directed towards protecting and prolonging the existence and authenticity of data and its metadata. The main goal of data preservation is to protect data from being lost or destroyed and to contribute to the reuse and progression of the data.

 

FAIR - Findable, Accessible, Interoperable, Reusable - A set of guiding principles for making data Findable, Accessible, Interoperable, and Reusable for both humans and machines. Figshare actively follows these guidelines; please see our blog post.

 

File format - Figshare allows any digital file to be uploaded. It also allows over 1,200 file types to be viewed in the browser without the need for additional plug-ins.

 

Harvest - Data harvesting is the gathering of data from multiple sources into a single source from where it can be re-published.

 

IR - Institutional Repository - An archive for collecting, preserving, and disseminating digital copies of the intellectual output of an institution, particularly a research institution.

 

Licence - A legal instrument by which a copyright holder may grant rights over the protected work. A range of standard open licences are available, such as the Creative Commons CC-BY licence. For more information please visit: https://knowledge.figshare.com/articles/item/copyright-and-licence-policy

 

MD5 checksum - A hash function used to validate data. The algorithm generates a fingerprint of the source data when it is uploaded and again at its final storage destination after a period of time. The two fingerprints should match, indicating that the data is intact and has not been altered.
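
The comparison described above can be sketched with Python's standard `hashlib` module. The byte strings stand in for a file at upload time and at a later check. They are illustrative only and say nothing about how Figshare implements its own checks.

```python
import hashlib


def md5_checksum(data: bytes) -> str:
    """Return the MD5 fingerprint of the data as a hexadecimal string."""
    return hashlib.md5(data).hexdigest()


uploaded = b"example research data"        # content at upload time
stored_intact = b"example research data"   # unchanged copy in storage
stored_altered = b"example research dat4"  # copy with one byte changed

checksum_at_upload = md5_checksum(uploaded)
print(md5_checksum(stored_intact) == checksum_at_upload)   # True
print(md5_checksum(stored_altered) == checksum_at_upload)  # False
```

For large files, the same result can be obtained by reading the file in chunks and passing each chunk to the hash object's `update` method.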

 

Metadata - This is essentially data that describes and gives information about data.

 

Open Access - The principle that published papers and other results of research, especially publicly funded research, should be freely accessible to all.

 

Open data - Data that is available under an open (data) licence that permits anyone to freely access, reuse and redistribute it.

 

ORCID ID - A 16-character alphanumeric code that uniquely identifies researchers, as a way of disambiguating similarly named authors.
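
The identifier is usually written as four groups of four characters. According to ORCID's public documentation, the last character is a check character (a digit or "X") calculated from the first 15 digits with the ISO 7064 MOD 11-2 algorithm. The sketch below checks that structure. The function names are hypothetical, and the example identifier is the sample one used in ORCID's documentation.

```python
def orcid_check_character(base_digits: str) -> str:
    """Compute the ISO 7064 MOD 11-2 check character for the first 15 digits."""
    total = 0
    for ch in base_digits:
        total = (total + int(ch)) * 2
    result = (12 - total % 11) % 11
    return "X" if result == 10 else str(result)


def is_valid_orcid(orcid: str) -> bool:
    """Check the length, the digits and the final check character."""
    compact = orcid.replace("-", "").upper()
    base = compact[:15]
    if len(compact) != 16 or not (base.isascii() and base.isdigit()):
        return False
    return orcid_check_character(base) == compact[15]


print(is_valid_orcid("0000-0002-1825-0097"))  # True
print(is_valid_orcid("0000-0002-1825-0098"))  # False
```

A passing check shows only that the code is well formed, not that it has been issued to a researcher.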

 

Preprint - A draft that has not yet been peer reviewed for formal publication. Preprint servers, such as ChemRxiv, host these drafts, which typically go through a basic screening and are assigned a DOI.

 

For more open data terms and definitions please visit: http://opendatahandbook.org/glossary/en/