For Subak to index a new dataset in the Data Catalogue, we need to collect information about that dataset, which we store as metadata.
We provide a YAML schema file to help data contributors understand what metadata we collect and the format we expect it to take. For each field, the schema gives its type, a description of its purpose, whether it is strictly required, a restricted list of allowed values (where applicable) and an example value.
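To make the per-field attributes concrete, here is a minimal sketch of how a schema entry (type, description, required flag, allowed options, example) could be used to check a draft metadata.yml before submission. It is not Subak's tooling: the FieldRule class, the validate function and the field names title and license are hypothetical, and the schema is written as plain Python dicts rather than loaded from the real YAML file (in practice you might load both files with PyYAML's yaml.safe_load).

```python
from dataclasses import dataclass
from typing import Any, Optional


@dataclass
class FieldRule:
    """One hypothetical schema entry, mirroring the attributes the schema documents."""
    expected_type: type
    description: str
    required: bool = False
    options: Optional[list] = None  # restricted list of allowed values, if any
    example: Any = None


# Hypothetical schema; the real field names and types come from the YAML schema file.
SCHEMA = {
    "title": FieldRule(str, "Human-readable dataset name", required=True,
                       example="Example dataset"),
    "description": FieldRule(str, "Long-form description", required=True),
    "data_sources": FieldRule(str, "Where the data comes from"),
    "license": FieldRule(str, "Dataset licence", required=True,
                         options=["CC-BY-4.0", "ODbL-1.0"]),
}


def validate(metadata: dict, schema: dict[str, FieldRule]) -> list[str]:
    """Return a list of problems; an empty list means the metadata passes."""
    errors = []
    for name, rule in schema.items():
        if name not in metadata:
            # Optional fields may be omitted; required ones may not.
            if rule.required:
                errors.append(f"missing required field: {name}")
            continue
        value = metadata[name]
        if not isinstance(value, rule.expected_type):
            errors.append(
                f"{name}: expected {rule.expected_type.__name__}, "
                f"got {type(value).__name__}"
            )
        elif rule.options is not None and value not in rule.options:
            errors.append(f"{name}: {value!r} not in {rule.options}")
    return errors


if __name__ == "__main__":
    draft = {"title": "Example dataset", "description": "...", "license": "MIT"}
    for problem in validate(draft, SCHEMA):
        print(problem)
```

Running the usage example reports that "MIT" is not one of the allowed licence options, which is the kind of mismatch the restricted options list in the schema is meant to catch early.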
We also provide a template metadata.yml file, ready for you to populate according to the rules of the schema, and a pre-filled example, metadata.example.yml. The README.md and SOURCES.md files in the template offer a convenient way to populate the description and data_sources metadata fields, which are generally more verbose and better written in Markdown.
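The split between metadata.yml and the two Markdown files can be sketched as follows. This assumes the full text of README.md becomes the description field and the full text of SOURCES.md becomes data_sources; the fill_markdown_fields helper and the MARKDOWN_FIELDS mapping are hypothetical illustrations, not part of Subak's upload process.

```python
from pathlib import Path

# Hypothetical mapping from template Markdown file to the metadata field it fills.
MARKDOWN_FIELDS = {"README.md": "description", "SOURCES.md": "data_sources"}


def fill_markdown_fields(metadata: dict, dataset_dir: Path) -> dict:
    """Return a copy of `metadata` with Markdown-backed fields filled from files.

    Files that are absent are skipped, leaving any existing value untouched.
    """
    filled = dict(metadata)
    for filename, field_name in MARKDOWN_FIELDS.items():
        path = Path(dataset_dir) / filename
        if path.is_file():
            # Store the Markdown text as-is, minus surrounding whitespace.
            filled[field_name] = path.read_text(encoding="utf-8").strip()
    return filled
```

Keeping the long-form text in separate Markdown files means it can be previewed and edited with ordinary Markdown tools, while metadata.yml stays short and easy to scan.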
If you run into problems when working with our schema, please start a discussion and we will help you resolve them so that your dataset can be indexed in the Data Catalogue.
Currently, you must be a member of an organisation set up in the Data Catalogue in order to create a new dataset. We will be working to improve this process in the coming months. Please create a user account on the Data Catalogue and file a request to set up your organisation. You can then either use the 'Add dataset' form to enter the metadata from your metadata.yml file manually, or submit the file to us and we will upload it on your behalf.
We will soon be providing an interface for loading and updating your dataset automatically from your completed metadata.yml file.