.. _howto:

######
How-To
######

Usage
=====

Before using the software, you need to create the file that supplies the metadata the data repository requires to create the data record.

Zenodo Metadata
---------------

The Zenodo data repository requires specific information that must be included in **every** data record.

The following metadata keywords should **always** be present; those in square brackets are optional and fall back to a default value:

- title
- description
- upload_type
- creators
- [access_right] (optional, default: open)
- [license] (optional, default: cc-by-4.0)

Some additional keywords may be needed depending on the values of *upload_type* and *access_right*:

- If *upload_type* is *publication*, the *publication_type* keyword is required.
- If *upload_type* is *image*, the *image_type* keyword is required.
- If *access_right* is absent or equal to *open* (the default value), the *license* keyword is expected; if it is missing, *license* is set to *cc-by-4.0*.
- If *access_right* is *embargoed*, the *license* keyword (set to *cc-by-4.0* if missing) and the *embargo_date* keyword are expected.
- If *access_right* is *restricted*, the *access_conditions* keyword is required.

The `Zenodo API `_ documentation presents all the metadata keywords it accepts. In case of doubt, this is **the** reference page.

Until Zenodo provides its own schema, there is no way to be sure that the schema is up to date. The datalight developers will try to keep the YAML schema up to date so that any change in the Zenodo API is reflected in it.

The information associated with a record has to be provided in a text file following the `YAML format `_. We chose YAML because it is easier for humans to read and write than other metadata formats such as XML or JSON. The format is described in the following section.

`YAML Format. `_
---------------------------------

Basics
^^^^^^

.. warning::

   Using a WYSIWYG editor (such as Microsoft Word) to create or edit a YAML file is not advised.

YAML files can *optionally* begin with ``---`` and end with ``...``, which mark the start and end of a document.

Comments
^^^^^^^^

It is possible to add comments to a YAML file. The **#** character starts a comment: everything after it on the line is ignored (as in Python):

.. code-block:: yaml

   # This is a comment in yaml
   title: a title # <- The line read up to this character.

Key-value pairs
^^^^^^^^^^^^^^^

YAML stores information as keys associated with values (a structure known as a hash table, or a dictionary in Python):

.. code-block:: yaml

   title: A title for our data

A string can be extended over multiple lines in three different ways:

.. code-block:: yaml

   title: "A title for our data
     which extends on multiple lines
     using quote"

.. code-block:: yaml

   title: |
     A title for our data
     which extends on multiple lines

The "Literal Block Scalar" ``|`` keeps the newlines and any trailing spaces.

.. code-block:: yaml

   title: >
     A title for our data
     which extends on multiple lines

The "Folded Block Scalar" ``>`` folds newlines into spaces; it is used to make what would otherwise be a very long line easier to read and edit. In either case the indentation will be ignored.

A value can be more complex than a single string: it can be a list, or a combination of lists and key-value pairs. YAML considers lines indented with more spaces than the parent key to be contained inside it; moreover, all lines must be indented by the same number of spaces to belong to the same map. The following examples show what this means in the context of datalight metadata.

A list in YAML where the key is named *alist* and the value is a list of three elements:

.. code-block:: yaml

   alist:
     - first element
     - second element.
     - third element

For datalight we can use it to list, for example, the *creators* of the data:

.. code-block:: yaml

   creators:
     - name: Jane Doe
     - name: Alan Smith

The value associated with a key can be a list, but each list element can also be another hash table that carries more information. Here is an example that provides the affiliation as well as the name of every creator:

.. code-block:: yaml

   creators:
     - name: Jane Doe
       affiliation: University of Neverland
     - name: Alan Smith
       affiliation: University of Shire

Every element of the list then carries two pieces of information (name and affiliation).

This covers the parts of the YAML format needed to create a metadata file for upload to a data repository. The following section shows several examples of valid metadata for the Zenodo repository.

Zenodo metadata examples
------------------------

Minimal
^^^^^^^

.. code-block:: yaml

   title: A small title describing our data
   description: "Description of the dataset that is going to be uploaded"
   upload_type: dataset
   creators:
     - name: Jane Doe
       affiliation: University of Neverland
     - name: Alan Smith
       affiliation: University of Shire

This metadata is sufficient to upload a dataset to Zenodo successfully. Zenodo will add the following keywords with their default values:

.. code-block:: yaml

   access_right: open
   license: CC-BY-4.0

A more complete (realistic) one
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

.. code-block:: yaml

   title: "A very long title"
   description: "Description of the data"
   creators:
     - name: John Doe
       affiliation: The University of Neverland
     - name: Jane Doe
       affiliation: The University of Shire
       orcid: 0000-0000-0000-0007
   upload_type: dataset
   access_right: restricted
   access_conditions: "Only available through contact with the myproject project"
   communities:
     - identifier: mycommunity
   thesis_supervisors:
     - name: Jane Doe
       affiliation: The University of Shire
       orcid: 0000-0000-0000-0007
   contributors:
     - name: Alan Smith
       affiliation: The University of Mars
       orcid: 0000-0000-0000-0005
       type: ContactPerson
   license: CC-BY-4.0
   keywords:
     - MyProject
     - another_keyword
   notes: "If the grant number is not recognized by OpenAIRE, this is where you indicate the information related to the grant (as mentioned in the Zenodo documentation)."
   # If project known in the OpenAIRE
   #grants:
   #  id:
   language: eng
   subjects:
     - term: Fantasy and SF
       identifier: http://id.loc.gov/authorities/subjects/sh000000
       scheme: "url"

or one where the data are embargoed until a certain date:

.. code-block:: yaml

   title: "A very long title"
   description: "Description of the data"
   creators:
     - name: John Doe
       affiliation: The University of Neverland
     - name: Jane Doe
       affiliation: The University of Shire
       orcid: 0000-0000-0000-0007
   upload_type: dataset
   access_right: embargoed
   embargo_date: 2022-12-31
   communities:
     - identifier: mycommunity
   thesis_supervisors:
     - name: Jane Doe
       affiliation: The University of Shire
       orcid: 0000-0000-0000-0007
   contributors:
     - name: Alan Smith
       affiliation: The University of Mars
       orcid: 0000-0000-0000-0005
       type: ContactPerson
   license: CC-BY-4.0
   keywords:
     - MyProject
     - another_keyword
   notes: "If the grant number is not recognized by OpenAIRE, this is where you indicate the information related to the grant (as mentioned in the Zenodo documentation)."
   # If project known in the OpenAIRE
   #grants:
   #  id:
   language: eng
   subjects:
     - term: Fantasy and SF
       identifier: http://id.loc.gov/authorities/subjects/sh000000
       scheme: "url"

Datalight usage
===============

Once you have one or more files, or a directory, containing your data and an associated metadata file, you can upload your data to the Zenodo data repository.

.. code-block:: bash

   $ datalight file1 -m
   $ datalight file1 file2 -m
   $ datalight directory -m

The **--metadata** argument should point to the file that contains the Zenodo metadata as described above.

.. warning::

   The **-m** or **--metadata=** argument is **not optional**. It has to be provided.

You can pass as many other arguments as you want; each should be the name of a file, or of a directory, containing the data you want to upload to the data repository.

Example::

   $ datalight metadata.yaml -m metadata.yaml

Where *metadata.yaml* contains the following values:

.. code-block:: yaml

   title: A small title describing our data
   description: "Description of the dataset that is going to be uploaded"
   upload_type: dataset
   creators:
     - name: Alan Smith
       affiliation: University of Shire

In this example, the file containing the metadata is itself uploaded as a data file and stored on Zenodo.

Publishing the data at the upload time
--------------------------------------

By default the data are uploaded to the data repository but not published. You can ask datalight to publish them using the option **-p** or **--publish**::

   $ datalight metadata.yaml -m metadata.yaml -p

or::

   $ datalight metadata.yaml -m metadata.yaml --publish

The file metadata.yaml will be uploaded with the information found in the file containing the metadata (here the same file) and will be published on the data repository.

.. warning::

   Data which have been published **cannot** be removed. They will be present forever on the data repository.

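Since a published record cannot be removed, it can help to check the metadata against the keyword rules from the *Zenodo Metadata* section before using **--publish**. The following is only a minimal sketch and is not part of datalight: ``missing_keywords`` is a hypothetical helper that takes the metadata as a Python dictionary and reports the keywords those rules require but the dictionary lacks. *license* is not checked because it falls back to a default value.

```python
# Hypothetical pre-upload check; the names below are illustrative only.
REQUIRED_KEYS = ("title", "description", "upload_type", "creators")


def missing_keywords(metadata):
    """Return the sorted list of required keywords absent from `metadata`."""
    missing = {key for key in REQUIRED_KEYS if key not in metadata}

    # Keywords implied by the value of upload_type.
    upload_type = metadata.get("upload_type")
    if upload_type == "publication" and "publication_type" not in metadata:
        missing.add("publication_type")
    if upload_type == "image" and "image_type" not in metadata:
        missing.add("image_type")

    # Keywords implied by the value of access_right (default: open).
    access_right = metadata.get("access_right", "open")
    if access_right == "embargoed" and "embargo_date" not in metadata:
        missing.add("embargo_date")
    if access_right == "restricted" and "access_conditions" not in metadata:
        missing.add("access_conditions")

    return sorted(missing)


example = {
    "title": "A small title describing our data",
    "description": "Description of the dataset",
    "upload_type": "dataset",
    "creators": [{"name": "Jane Doe"}],
    "access_right": "embargoed",
}
print(missing_keywords(example))  # → ['embargo_date']
```

In practice the dictionary would come from parsing the YAML file, for example with ``yaml.safe_load`` from the third-party PyYAML package.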
Finalising and publishing the data can always be done through the web interface provided by Zenodo: `Zenodo webinterface `_

Testing the upload
------------------

If you prefer to test the upload of your data first, Zenodo provides a sandbox, which you can use with the **--sandbox** option::

   $ datalight metadata.yaml -m metadata.yaml --publish --sandbox

This will publish the data on the `sandbox website `_.

.. warning::

   1. To be able to use the sandbox you need to create the token from: https://sandbox.zenodo.org.
   2. The token should be copied into the file **.zenodo** under the keyword *[sandbox.zenodo.org]*, e.g.::

         [sandbox.zenodo.org]
         lightForm =

         [zenodo.org]
         lightform =

   3. The Zenodo sandbox is not fully reliable and the tests can fail with an error 500. That does not necessarily mean that the upload did not work, but that at least one step was missing. This can leave a record on the sandbox that you will have to clean up manually.
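The **.zenodo** layout shown in the warning above resembles an INI file, so a quick local check that a token is present for the sandbox can be sketched with Python's standard ``configparser`` module. This assumes the file parses as INI; how datalight itself reads the file is not covered here. ``has_token_section`` is a hypothetical helper, not part of datalight, and the path to **.zenodo** is left to the caller.

```python
import configparser
from pathlib import Path


def has_token_section(config_path, host="sandbox.zenodo.org"):
    """Return True if `config_path` has a non-empty entry under the [host] section."""
    # interpolation=None keeps '%' characters in a token from being interpreted.
    parser = configparser.ConfigParser(interpolation=None)
    # read() silently skips files that do not exist, leaving the parser empty.
    parser.read(Path(config_path))
    if not parser.has_section(host):
        return False
    # At least one key in the section must carry a non-blank value.
    return any(value.strip() for value in parser[host].values())
```

Calling ``has_token_section(".zenodo")`` before running with **--sandbox** would then flag a missing or empty sandbox entry; passing ``host="zenodo.org"`` checks the entry for the main site instead.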