How-To

Usage

Before using the software, you need to create the file that provides the metadata required by the data repository to create the data record.

Zenodo Metadata

The Zenodo data repository requires specific information to be included in every data record.

The Zenodo metadata keywords that should always be included are:

  • title
  • description
  • upload_type
  • creators
  • [access_right] (optional, default: open)
  • [license] (optional, default: cc-by-4.0)

Some additional keywords may be needed, depending on the values of upload_type and access_right.

If:

  • upload_type is equal to publication, the publication_type keyword must be present.
  • upload_type is equal to image, the image_type keyword must be present.
  • access_right is absent or equal to open (the default value), the license keyword should be present; if it is not, the license will be set to cc-by-4.0.
  • access_right is equal to embargoed, the embargo_date keyword must be present, as well as the license keyword (otherwise the license will be set to cc-by-4.0).
  • access_right is equal to restricted, the access_conditions keyword must be present.
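
The conditional rules above can be sketched as a small check. This is a minimal illustration, assuming the metadata has already been loaded into a Python dictionary; the function name `missing_conditional_keys` is hypothetical and is not part of datalight.

```python
def missing_conditional_keys(metadata):
    """Return the conditional keywords required by the rules above but absent."""
    missing = []
    upload_type = metadata.get("upload_type")
    # access_right defaults to "open" when it is not given.
    access_right = metadata.get("access_right", "open")

    if upload_type == "publication" and "publication_type" not in metadata:
        missing.append("publication_type")
    if upload_type == "image" and "image_type" not in metadata:
        missing.append("image_type")
    if access_right == "embargoed" and "embargo_date" not in metadata:
        missing.append("embargo_date")
    if access_right == "restricted" and "access_conditions" not in metadata:
        missing.append("access_conditions")
    # license is never reported here: it falls back to cc-by-4.0 when omitted.
    return missing


print(missing_conditional_keys({"upload_type": "image", "access_right": "embargoed"}))
# prints ['image_type', 'embargo_date']
```

The check only covers the conditional keywords; the always-required ones (title, description, upload_type, creators) would need a separate test.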

The Zenodo API documentation presents all the metadata keywords that it accepts. In case of doubt, it is the reference page.

Until Zenodo provides its own schema, there is no way to be sure that the schema is up to date. The Datalight developers will try to keep the YAML schema up to date so that any change in the Zenodo API is reflected in it.

The information associated with the record has to be provided in a text file following the YAML format.

We decided to use the YAML format because it is easier for humans to read and write than other metadata formats like XML or JSON.

This format is described in the following section.

YAML Format

Basics

Warning

Using a WYSIWYG editor (such as Microsoft Word) to create or edit a YAML file is not advised.

YAML files can optionally begin with --- and end with ... to indicate the start and end of a document.

Comments

It is possible to add comments in a YAML file. Everything after the # character is considered a comment (similar to Python):

# This is a comment in yaml
title: a title # <- The line is read up to this character.

Key-value pairs

YAML stores information as keys, each associated with a value (this is called a hash table or, in Python, a dictionary):

title : A title for our data

A string can be extended over multiple lines in three different ways:

title : "A title for our data
        which extends on multiple lines using quotes"
title : |
        A title for our data
        which extends on multiple lines. "Literal Block Scalar" |
        will include the newlines and any trailing spaces.
title : >
        A title for our data
        which extends on multiple lines. "Folded Block Scalar" >
        will fold newlines to spaces; it’s used to make what would
        otherwise be a very long line easier to read and edit.

In all three cases, the leading indentation is ignored.
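
To make the difference between the three forms concrete, the following sketch emulates, in plain Python and without a YAML parser, the strings that a parser would be expected to produce for a simple two-line value. It is an approximation that only holds for simple cases (no blank lines, no extra indentation inside the block); the variable names are illustrative.

```python
import textwrap

raw = """\
        A title for our data
        which extends on multiple lines
"""

# The common leading indentation is not part of the value.
lines = textwrap.dedent(raw).splitlines()

literal = "\n".join(lines) + "\n"  # `|` keeps one line per source line
folded = " ".join(lines) + "\n"    # `>` turns the inner newlines into spaces
quoted = " ".join(lines)           # double quotes also fold, with no final newline

print(repr(literal))
print(repr(folded))
print(repr(quoted))
```

For a title, the quoted or folded form is usually what is wanted, since the literal form keeps the line breaks inside the value.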

A value can be more complex than a string: it can be a list, or a combination of lists and key-value pairs. YAML considers lines prefixed with more spaces than the parent key to be contained inside it. Moreover, all lines must be prefixed with the same number of spaces to belong to the same map.

We are going to see some examples to understand what this means in the context of datalight metadata.

A list in YAML where the key has the name alist and the value is a list of three elements:

alist:
  - first element
  - second element.
  - third element

For datalight we can use it to list, for example, the creators of the data:

creators:
  - name: Jane Doe
  - name: Alan Smith

The value associated with a key can be a list, but also another hash table that adds more information. Here is an example where we provide the name of every creator and also the affiliation:

creators:
  - name: Jane Doe
    affiliation: University of Neverland

  - name: Alan Smith
    affiliation: University of Shire

Every element of the list has two pieces of information (name and affiliation).

This concludes the description of the YAML format needed to create a metadata file to upload to our favorite data repository.

In the following section, we are going to see different examples of valid metadata for the Zenodo repository.
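
For readers more familiar with Python than YAML, the creators example corresponds to a dictionary whose value is a list of dictionaries. The sketch below writes that structure by hand; a YAML parser (for example `yaml.safe_load` from PyYAML) would be expected to return the same shape, but how datalight itself loads the file is not described here.

```python
# Hand-written Python equivalent of the YAML "creators" example.
metadata = {
    "creators": [
        {"name": "Jane Doe", "affiliation": "University of Neverland"},
        {"name": "Alan Smith", "affiliation": "University of Shire"},
    ]
}

# Each list element is a small key-value map with two entries.
for creator in metadata["creators"]:
    print(f'{creator["name"]} ({creator["affiliation"]})')
```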

Zenodo metadata examples

Minimal

title: A small title describing our data

description: "Description of the dataset that
              is going to be uploaded"

upload_type: dataset

creators:
  - name: Jane Doe
    affiliation: University of Neverland

  - name: Alan Smith
    affiliation: University of Shire

This metadata is sufficient to successfully upload a dataset to Zenodo.

Zenodo will add the following keywords with their default values:

access_right: open
license: CC-BY-4.0
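
The effect of these defaults can be sketched as follows. This is only an illustration of the rule stated above, applied to a metadata dictionary; the function name `with_defaults` is hypothetical, and the defaults are in practice applied by Zenodo, not by this code.

```python
def with_defaults(metadata):
    """Return a copy of the metadata with the documented defaults filled in."""
    completed = dict(metadata)  # leave the caller's dictionary untouched
    completed.setdefault("access_right", "open")
    # The license default is documented for open and embargoed records only.
    if completed["access_right"] in ("open", "embargoed"):
        completed.setdefault("license", "CC-BY-4.0")
    return completed


minimal = {"title": "A small title describing our data", "upload_type": "dataset"}
print(with_defaults(minimal)["access_right"], with_defaults(minimal)["license"])
# prints: open CC-BY-4.0
```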

A more complete (realistic) one

title: "A very long
       title"

description: "Description of the data"

creators:
    - name: John Doe
      affiliation: The University of Neverland

    - name: Jane Doe
      affiliation: The University of Shire
      orcid: 0000-0000-0000-0007

upload_type: dataset

access_right: restricted

access_conditions: "Only available on request by contacting the MyProject project"

communities:
    - identifier: mycommunity

thesis_supervisors:
    - name: Jane Doe
      affiliation: The University of Shire
      orcid: 0000-0000-0000-0007

contributors:
    - name: Alan Smith
      affiliation: The University of Mars
      orcid: 0000-0000-0000-0005
      type: ContactPerson

license: CC-BY-4.0

keywords:
    - MyProject
    - another_keyword

notes: "If the grant number is not recognized by OpenAIRE, this is where you
       indicate the information related to the grant (as mentioned in
       the Zenodo documentation)."

# If the project is known to OpenAIRE
#grants:
#    id:

language: eng

subjects:
    - term: Fantasy and SF
      identifier: http://id.loc.gov/authorities/subjects/sh000000
      scheme: "url"

Or one where the data are embargoed until a certain date:

title: "A very long
       title"

description: "Description of the data"

creators:
    - name: John Doe
      affiliation: The University of Neverland

    - name: Jane Doe
      affiliation: The University of Shire
      orcid: 0000-0000-0000-0007

upload_type: dataset

access_right: embargoed

embargo_date: 2022-12-31

communities:
    - identifier: mycommunity

thesis_supervisors:
    - name: Jane Doe
      affiliation: The University of Shire
      orcid: 0000-0000-0000-0007

contributors:
    - name: Alan Smith
      affiliation: The University of Mars
      orcid: 0000-0000-0000-0005
      type: ContactPerson

license: CC-BY-4.0

keywords:
    - MyProject
    - another_keyword

notes: "If the grant number is not recognized by OpenAIRE, this is where you
       indicate the information related to the grant (as mentioned in
       the Zenodo documentation)."

# If the project is known to OpenAIRE
#grants:
#    id:

language: eng

subjects:
    - term: Fantasy and SF
      identifier: http://id.loc.gov/authorities/subjects/sh000000
      scheme: "url"

Datalight usage

When you have one or more files, or a directory, containing your data and a proper associated metadata file, you can upload your data to the Zenodo data repository.

$ datalight file1 -m <name of the file which contains zenodo metadata>
$ datalight file1 file2 -m <name of the file which contains zenodo metadata>
$ datalight directory -m <name of the file which contains zenodo metadata>

The --metadata argument should point to the file which contains the Zenodo metadata as described above.

Warning

The -m <metadata_file> or --metadata=<metadata_file> argument is not optional. It has to be provided.

You can add as many arguments as you want; each should be the name of a file or of a directory containing the data you want to upload to the data repository.

Example:

$ datalight metadata.yaml -m metadata.yaml

Where metadata.yaml contains the following values:

title: A small title describing our data

description: "Description of the dataset that
              is going to be uploaded"

upload_type: dataset

creators:

  - name: Alan Smith
    affiliation: University of Shire

In this example, the file containing the metadata will be uploaded as a data file and stored on Zenodo.

Publishing the data at the upload time

By default, the data will be uploaded to the data repository but will not be published. You can ask datalight to publish it using the option -p or --publish:

$ datalight metadata.yaml -m metadata.yaml -p

or:

$ datalight metadata.yaml -m metadata.yaml --publish

The file metadata.yaml will be uploaded with the information found in the file containing the metadata (here the same file) and will be published on the data repository.

Warning

Data which have been published cannot be removed. They will be present forever on the data repository.

The finalisation of the data and the publication can always be done through the Zenodo web interface.

Testing the upload

If you prefer to test the upload of your data first, Zenodo provides a sandbox, which you can use for tests with the option --sandbox:

$ datalight metadata.yaml -m metadata.yaml --publish --sandbox

This command will publish the data on the sandbox website.
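
When datalight is driven from a script, the command lines shown in this section can be assembled programmatically. The sketch below only builds the argument list from the options documented above (-m, --publish, --sandbox); the helper name `datalight_command` is hypothetical, and the resulting list could be passed to `subprocess.run` on a machine where datalight is installed.

```python
import shlex


def datalight_command(paths, metadata_file, publish=False, sandbox=False):
    """Build the datalight argument list for the given data paths."""
    if not paths:
        raise ValueError("at least one file or directory is required")
    args = ["datalight", *paths, "-m", metadata_file]  # -m is mandatory
    if publish:
        args.append("--publish")
    if sandbox:
        args.append("--sandbox")
    return args


command = datalight_command(["metadata.yaml"], "metadata.yaml", publish=True, sandbox=True)
print(shlex.join(command))
# prints: datalight metadata.yaml -m metadata.yaml --publish --sandbox
```

Keeping publish off by default mirrors the tool's own behaviour, which matters because published data cannot be removed.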

Warning

  1. To be able to use the sandbox, you need to create a token from: https://sandbox.zenodo.org.

  2. The token should be copied into the file .zenodo under the section [sandbox.zenodo.org], e.g.:

[sandbox.zenodo.org]
lightForm = <zenodo sandbox token>

[zenodo.org]
lightform = <zenodo token>

  3. The Zenodo sandbox is not very reliable and the tests can fail with an error 500. That does not necessarily mean that the upload did not work, but that at least one step was missing. This can create a record on the sandbox that you will have to clean up manually.
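
The .zenodo file shown above follows the INI layout, so its content can be inspected with Python's standard configparser module. The sketch below is an illustration only: it parses a sample string instead of a real file, the token values are placeholders, and the function name `read_token` is hypothetical. How datalight itself reads the file is not described here.

```python
import configparser

# Sample content mirroring the layout of the .zenodo file (placeholder tokens).
SAMPLE = """\
[sandbox.zenodo.org]
lightForm = sandbox-token-placeholder

[zenodo.org]
lightform = production-token-placeholder
"""


def read_token(config_text, sandbox=False, key="lightform"):
    """Return the token stored under the section matching the target site."""
    parser = configparser.ConfigParser()
    parser.read_string(config_text)
    section = "sandbox.zenodo.org" if sandbox else "zenodo.org"
    # configparser lowercases option names by default, so "lightForm"
    # and "lightform" refer to the same key.
    return parser[section][key]


print(read_token(SAMPLE, sandbox=True))
# prints: sandbox-token-placeholder
```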