Subak Data Catalogue Dataset Metadata Specification

To index a new dataset in the Data Catalogue, Subak needs to collect information about that dataset, which we store as metadata.

Data Catalogue Metadata Schema

We provide a YAML schema file to help data contributors understand what metadata we collect and the format we expect it to take. For each field, the schema gives its type, a description of what it is for, whether it is strictly required, a restricted list of options (if applicable) and an example value.
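As a rough illustration of how the per-field rules (type, required, options) could be applied to a metadata record, here is a minimal Python sketch. The field names (`title`, `licence`, `tags`), the option values and the rule keys are hypothetical and are not taken from the actual schema file; the schema is written as a plain dictionary here, whereas in practice it would be loaded from the YAML file.

```python
# Hypothetical schema: field names, options and rule keys are illustrative only
# and do not reproduce the real Data Catalogue schema.
SCHEMA = {
    "title": {"type": str, "required": True},
    "licence": {"type": str, "required": True, "options": ["CC-BY-4.0", "CC0-1.0"]},
    "tags": {"type": list, "required": False},
}


def validate(metadata, schema):
    """Return a list of human-readable problems; an empty list means valid."""
    problems = []
    for name, rules in schema.items():
        if name not in metadata:
            # Only required fields are reported when absent.
            if rules.get("required", False):
                problems.append(f"{name}: required field is missing")
            continue
        value = metadata[name]
        if not isinstance(value, rules["type"]):
            problems.append(f"{name}: expected {rules['type'].__name__}")
            continue
        options = rules.get("options")
        if options is not None and value not in options:
            problems.append(f"{name}: {value!r} is not one of {options}")
    # Flag fields the schema does not know about.
    for name in metadata:
        if name not in schema:
            problems.append(f"{name}: not defined in the schema")
    return problems
```

Calling `validate({"title": "Example", "licence": "CC0-1.0"}, SCHEMA)` returns an empty list, while a record with a missing required field or a value outside the allowed options returns one message per problem.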

We also provide a template metadata.yml file, ready for you to populate according to the rules of the schema, and a pre-filled example, metadata.example.yml. The README.md and SOURCES.md files in the template provide a convenient way to populate the description and data_sources metadata fields, which are generally more verbose and better written in Markdown.
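If you assemble the metadata yourself, one possible way to fold the two Markdown files into the description and data_sources fields is sketched below. This assumes the template directory contains README.md and SOURCES.md next to metadata.yml and that the metadata has already been loaded into a dictionary; the helper name `fill_markdown_fields` is hypothetical, and this is not necessarily how the Data Catalogue itself reads these files.

```python
from pathlib import Path


def fill_markdown_fields(metadata, template_dir):
    """Return a copy of metadata with description and data_sources
    taken from README.md and SOURCES.md, when those files exist."""
    # Assumed mapping of metadata field -> Markdown file in the template.
    mapping = {"description": "README.md", "data_sources": "SOURCES.md"}
    filled = dict(metadata)  # leave the caller's dictionary untouched
    for field, filename in mapping.items():
        path = Path(template_dir) / filename
        if path.is_file():
            filled[field] = path.read_text(encoding="utf-8").strip()
    return filled
```

Fields whose Markdown file is missing keep whatever value the metadata already had, so the helper can be run safely on a partially filled template.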

If you run into problems when working with our schema, please start a discussion and we'll help resolve them so that your dataset can be indexed in the Data Catalogue.

Publishing your dataset in the Data Catalogue

Currently, you must be a member of an organisation set up in the Data Catalogue in order to create a new dataset. We will be working to improve this process in the coming months. For now, please create a user account on the Data Catalogue and file a request to set up your organisation. You can then either use the 'Add dataset' form to enter the dataset metadata from your metadata.yml file manually, or submit the file to us to upload on your behalf.

We will soon provide an interface for loading and updating your dataset automatically from your filled-in metadata.yml file.