Before using the software, you need to create the file that provides the metadata the data repository requires to create the data record.
The Zenodo data repository requires very specific information, which must be included in every data record.
The Zenodo metadata that must always be included are:
Some additional values may be needed depending on the values of upload_type and access_right.
If:
The Zenodo API documentation presents all the metadata keywords it accepts. If in doubt, this is the reference page.
Until Zenodo provides its own schema, there is no way to be sure that the schema is up to date. The datalight developers will try to keep it up to date and to make sure that any change in the Zenodo API is reflected in the YAML schema.
The information associated with a record has to be provided in a text file following the YAML format.
We decided to use the YAML format because it is easier for humans to read and write than other metadata formats like XML or JSON.
This format is described in the following paragraphs.
Warning
Using a WYSIWYG word processor (such as Microsoft Word) to create or edit a YAML file is not advised.
YAML files can optionally begin with --- and end with ... to indicate the start and end of a document.
It is possible to add comments in a YAML file. Everything after the character # is considered a comment (similar to Python):
# This is a comment in yaml
title: a title # <- The line is read up to this character.
YAML stores information as keys associated with values (this is called a hash table or, in Python, a dictionary):
title: A title for our data
A string can be extended over multiple lines in three different ways:
title: "A title for our data
  which extends over multiple lines using quotes"
title: |
  A title for our data
  which extends over multiple lines; the "Literal Block Scalar" |
  will include the newlines and any trailing spaces.
title: >
  A title for our data
  which extends over multiple lines; the "Folded Block Scalar" >
  will fold newlines to spaces; it’s used to make what would
  otherwise be a very long line easier to read and edit.
In either case the indentation will be ignored.
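To make the difference between | and > concrete, here is a small Python sketch of the string a YAML parser would hand back in each case. It is a simplified model, not a YAML parser: it assumes the default handling of the final newline and ignores blank lines and extra-indented lines, which YAML treats specially. The function names literal_block and folded_block are ours.

```python
def literal_block(lines):
    """Model of a `|` block: the newline between lines is kept."""
    return "\n".join(lines) + "\n"


def folded_block(lines):
    """Model of a `>` block: each single newline is folded into a space."""
    return " ".join(lines) + "\n"


# The indentation of the block is not part of the value, so the lines
# are given here without their leading spaces.
lines = ["A title for our data", "which extends over multiple lines"]

literal_value = literal_block(lines)
folded_value = folded_block(lines)

print(repr(literal_value))
print(repr(folded_value))
```

For a title, the folded form (or the quoted form) is usually the one you want, since the value ends up on a single line.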
A value can be more complex than a simple string: it can be a list, or a combination of lists and key-value pairs. YAML considers that lines prefixed with more spaces than the parent key are contained inside it. Moreover, all lines must be prefixed with the same number of spaces to belong to the same map.
We are going to look at some examples to understand what this means in the context of datalight metadata.
A list in YAML where the key is named alist and the value is a list of three elements:
alist:
  - first element
  - second element
  - third element
For datalight we can use it to list, for example, the creators of the data:
creators:
  - name: Jane Doe
  - name: Alan Smith
The value associated with a key can be a list, but each element of the list can itself be a hash table carrying more information. Here is an example where we provide not only the name of every creator but also the affiliation:
creators:
  - name: Jane Doe
    affiliation: University of Neverland
  - name: Alan Smith
    affiliation: University of Shire
Every element of the list carries two pieces of information (name and affiliation).
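For readers more familiar with Python, the sketch below shows the structure such a creators entry corresponds to once loaded: a dictionary whose value is a list of dictionaries. The variable names are ours, and the conversion to JSON is only there to show that the same structure maps directly onto the other formats mentioned earlier.

```python
import json

# The Python equivalent of the YAML above: one key ("creators") whose
# value is a list, each element being a small dictionary.
metadata = {
    "creators": [
        {"name": "Jane Doe", "affiliation": "University of Neverland"},
        {"name": "Alan Smith", "affiliation": "University of Shire"},
    ]
}

# Each list element is accessed like any dictionary.
creator_names = [creator["name"] for creator in metadata["creators"]]
print(creator_names)

# The same structure written as JSON, for comparison with the YAML form.
payload = json.dumps({"metadata": metadata}, indent=2)
print(payload)
```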
This completes the description of the YAML format needed to create a metadata file for upload to our favorite data repository.
In the following section, we are going to look at different examples of valid metadata for the Zenodo repository.
title: A small title describing our data
description: "Description of the dataset that
  is going to be uploaded"
upload_type: dataset
creators:
  - name: Jane Doe
    affiliation: University of Neverland
  - name: Alan Smith
    affiliation: University of Shire
This metadata is sufficient to upload a dataset to Zenodo successfully.
Zenodo will add the following fields, set to their default values:
access_right: open
license: CC-BY-4.0
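As a quick sanity check before uploading, the minimal example can be expressed as a small Python sketch. It assumes the metadata has already been loaded into a dictionary. The list of required keys is taken from the minimal example above rather than from the Zenodo schema, and the names REQUIRED_KEYS, DEFAULTS, missing_keys and with_defaults are hypothetical helpers, not part of datalight.

```python
# Keys present in the minimal example above (an assumption, not the
# authoritative Zenodo schema).
REQUIRED_KEYS = ("title", "description", "upload_type", "creators")

# Defaults that, according to the text above, Zenodo fills in itself.
DEFAULTS = {"access_right": "open", "license": "CC-BY-4.0"}


def missing_keys(metadata):
    """Return the required keys that are absent from the metadata."""
    return [key for key in REQUIRED_KEYS if key not in metadata]


def with_defaults(metadata):
    """Return a copy of the metadata with the default values added.

    Values already present in the metadata take precedence.
    """
    return {**DEFAULTS, **metadata}


minimal = {
    "title": "A small title describing our data",
    "description": "Description of the dataset that is going to be uploaded",
    "upload_type": "dataset",
    "creators": [
        {"name": "Jane Doe", "affiliation": "University of Neverland"},
        {"name": "Alan Smith", "affiliation": "University of Shire"},
    ],
}

print(missing_keys(minimal))
print(with_defaults(minimal)["license"])
```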
title: "A very long
  title"
description: "Description of the data"
creators:
  - name: John Doe
    affiliation: The University of Neverland
  - name: Jane Doe
    affiliation: The University of Shire
    orcid: 0000-0000-0000-0007
upload_type: dataset
access_right: restricted
access_conditions: "Only available by contacting the myproject project"
communities:
  - identifier: mycommunity
thesis_supervisors:
  - name: Jane Doe
    affiliation: The University of Shire
    orcid: 0000-0000-0000-0007
contributors:
  - name: Alan Smith
    affiliation: The University of Mars
    orcid: 0000-0000-0000-0005
    type: ContactPerson
license: CC-BY-4.0
keywords:
  - MyProject
  - another_keyword
notes: "If the grant number is not recognized by OpenAIRE, this is where you
  indicate the information related to the grant (as mentioned in
  the Zenodo documentation)."
# If the project is known to OpenAIRE
#grants:
# id:
language: eng
subjects:
  - term: Fantasy and SF
    identifier: http://id.loc.gov/authorities/subjects/sh000000
    scheme: "url"
or one where the data are embargoed until a certain date:
title: "A very long
  title"
description: "Description of the data"
creators:
  - name: John Doe
    affiliation: The University of Neverland
  - name: Jane Doe
    affiliation: The University of Shire
    orcid: 0000-0000-0000-0007
upload_type: dataset
access_right: embargoed
embargo_date: 2022-12-31
communities:
  - identifier: mycommunity
thesis_supervisors:
  - name: Jane Doe
    affiliation: The University of Shire
    orcid: 0000-0000-0000-0007
contributors:
  - name: Alan Smith
    affiliation: The University of Mars
    orcid: 0000-0000-0000-0005
    type: ContactPerson
license: CC-BY-4.0
keywords:
  - MyProject
  - another_keyword
notes: "If the grant number is not recognized by OpenAIRE, this is where you
  indicate the information related to the grant (as mentioned in
  the Zenodo documentation)."
# If the project is known to OpenAIRE
#grants:
# id:
language: eng
subjects:
  - term: Fantasy and SF
    identifier: http://id.loc.gov/authorities/subjects/sh000000
    scheme: "url"
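The two examples above pair access_right: restricted with access_conditions, and access_right: embargoed with embargo_date. The following Python sketch checks that pairing on metadata already loaded into a dictionary. It is an illustration, not the validation datalight performs, and check_access_right is a hypothetical name. It also assumes that the date may arrive either as a string or as a date object, since many YAML parsers convert an unquoted 2022-12-31 into a date.

```python
from datetime import date


def check_access_right(metadata):
    """Return a list of problems found in the access-related fields."""
    problems = []
    access_right = metadata.get("access_right", "open")

    if access_right == "restricted" and not metadata.get("access_conditions"):
        problems.append("access_conditions is needed when access_right is restricted")

    if access_right == "embargoed":
        embargo = metadata.get("embargo_date")
        if embargo is None:
            problems.append("embargo_date is needed when access_right is embargoed")
        elif not isinstance(embargo, date):
            # A quoted date stays a string: check it is written as YYYY-MM-DD.
            try:
                date.fromisoformat(str(embargo))
            except ValueError:
                problems.append("embargo_date must be written as YYYY-MM-DD")

    return problems


print(check_access_right({"access_right": "embargoed", "embargo_date": "2022-12-31"}))
print(check_access_right({"access_right": "restricted"}))
```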
Once you have one or more files, or a directory, containing your data and an associated metadata file, you can upload your data to the Zenodo data repository.
$ datalight file1 -m <name of the file which contains zenodo metadata>
$ datalight file1 file2 -m <name of the file which contains zenodo metadata>
$ datalight directory -m <name of the file which contains zenodo metadata>
The --metadata argument should point to the file that contains the Zenodo metadata as described above.
Warning
The -m <metadata_file> or --metadata=<metadata_file> argument is not optional; it has to be provided.
You can add as many arguments as you want; each should be the name of a file or of a directory containing the data you want to upload to the data repository.
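To illustrate what "file or directory" arguments amount to, here is a Python sketch that turns such a list of arguments into a flat list of files. It is not datalight's implementation: collect_files is a hypothetical helper, and descending into subdirectories is an assumption on our part.

```python
from pathlib import Path


def collect_files(paths):
    """Expand a list of file and directory names into a list of files."""
    files = []
    for entry in map(Path, paths):
        if entry.is_dir():
            # Assumption: every file below the directory is included.
            files.extend(sorted(p for p in entry.rglob("*") if p.is_file()))
        elif entry.is_file():
            files.append(entry)
        else:
            raise FileNotFoundError(f"No such file or directory: {entry}")
    return files
```

A call such as collect_files(["file1", "directory"]) would then return file1 followed by every file found under directory.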
Example:
$ datalight metadata.yaml -m metadata.yaml
Where metadata.yaml contains the following values:
title: A small title describing our data
description: "Description of the dataset that
  is going to be uploaded"
upload_type: dataset
creators:
  - name: Alan Smith
    affiliation: University of Shire
In this example, the file containing the metadata is itself uploaded as a data file and stored on Zenodo.
By default, the data are uploaded to the data repository but not published. You can ask datalight to publish them using the option -p or --publish:
$ datalight metadata.yaml -m metadata.yaml -p
or:
$ datalight metadata.yaml -m metadata.yaml --publish
The file metadata.yaml will be uploaded with the information found in the metadata file (here, the same file) and will be published on the data repository.
Warning
Data which have been published cannot be removed. They will remain on the data repository forever.
The finalisation and publication of the data can always be done through the web interface provided by Zenodo.
If you prefer to test the upload of your data first, Zenodo provides a sandbox, which you can use for tests with the option --sandbox:
$ datalight metadata.yaml -m metadata.yaml --publish --sandbox
This will publish the data on the sandbox website.
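The options described in this section can be summarised with a short argparse sketch. It only models the command line as documented here (positional files or directories, a mandatory -m/--metadata, and the optional -p/--publish and --sandbox flags). It is not datalight's own parser, and build_parser is a hypothetical name.

```python
import argparse


def build_parser():
    """Build a parser mirroring the options described in this section."""
    parser = argparse.ArgumentParser(prog="datalight")
    parser.add_argument("paths", nargs="+",
                        help="files or directories to upload")
    parser.add_argument("-m", "--metadata", required=True,
                        help="file which contains the Zenodo metadata")
    parser.add_argument("-p", "--publish", action="store_true",
                        help="publish the record after the upload")
    parser.add_argument("--sandbox", action="store_true",
                        help="use the Zenodo sandbox instead of Zenodo")
    return parser


# Equivalent to: datalight metadata.yaml -m metadata.yaml --publish --sandbox
args = build_parser().parse_args(
    ["metadata.yaml", "-m", "metadata.yaml", "--publish", "--sandbox"]
)
print(args.paths, args.metadata, args.publish, args.sandbox)
```

Because --metadata is declared with required=True, leaving it out makes the parser stop with an error, which matches the warning above.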
Warning
2. The token should be copied into the file .zenodo under the section [sandbox.zenodo.org], e.g.:
[sandbox.zenodo.org]
lightForm = <zenodo sandbox token>
[zenodo.org]
lightform = <zenodo token>
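The .zenodo file uses the INI layout, so it can be read with Python's standard configparser module. The sketch below shows one way to pick the token matching the --sandbox choice. The function read_token, its parameters, and the idea of passing the file location explicitly are assumptions for illustration; this section does not say where datalight looks for the file or how it reads it.

```python
import configparser


def read_token(config_path, sandbox=False, key="lightform"):
    """Return the token stored in a .zenodo style file.

    The section is chosen from the host name: [sandbox.zenodo.org]
    for the sandbox, [zenodo.org] otherwise.
    """
    section = "sandbox.zenodo.org" if sandbox else "zenodo.org"
    # interpolation=None keeps any '%' character in a token untouched.
    parser = configparser.ConfigParser(interpolation=None)
    if not parser.read(config_path):
        raise FileNotFoundError(f"Cannot read {config_path}")
    # configparser lowercases option names, so "lightForm" and
    # "lightform" in the file are both found under "lightform".
    return parser[section][key]
```

A KeyError is raised if the section or the key is missing, which is a convenient signal that the token for that site has not been configured yet.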