This repository contains data production pipelines for building Darwin Core datasets for publication in the Global Biodiversity Information Facility (GBIF), with permanent archiving in Zenodo.
Notice: These are pre-production URLs, for testing purposes only.
- 818 – https://www.gbif.org/tools/data-validator/1614975526032 https://sandbox.zenodo.org/api/files/6554259d-f42c-4904-b4bb-015cfd1bdb2e
- 1420 – https://www.gbif.org/tools/data-validator/1614975526031
- 2751 – https://www.gbif.org/tools/data-validator/1614975526029
- Export EcoTaxa data as TSV (using DOI export with images)
- Publish unprocessed TSV and images to Zenodo
- Create Darwin Core occurrences in NDJSON from EcoTaxa TSV, using ecotaxa-darwin-core
- Create unique Darwin Core sampling events in NDJSON by reducing the occurrences
- @todo Merge with other/authoritative event metadata (e.g. sampling volumes)
- Create lists of ignored (non-living) and rejected (non-Eukaryota) objects
- Create lists of rejected events (non-unique or invalid/inconsistent metadata)
- Finish local processing by executing the Darwin Core pipelines below
gbif-no-darwin-core$ ./bin/ecotaxa-pipeline 1420
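The step that creates unique sampling events by reducing the occurrences, together with the list of rejected events, could look roughly like the sketch below. It is not the repository's implementation: the function names and the `EVENT_FIELDS` selection are assumptions made for illustration, and the real pipeline may compare different fields.

```python
import json

# Assumed set of event-level Darwin Core terms; the actual pipeline may differ.
EVENT_FIELDS = (
    "eventID",
    "eventDate",
    "decimalLatitude",
    "decimalLongitude",
    "minimumDepthInMeters",
    "maximumDepthInMeters",
)


def read_ndjson(lines):
    """Parse an iterable of NDJSON lines into a list of dicts, skipping blanks."""
    return [json.loads(line) for line in lines if line.strip()]


def reduce_to_events(occurrences):
    """Reduce occurrences to one event per eventID.

    Returns (events, rejected_ids). An eventID is rejected when its
    occurrences disagree on any of the event-level fields.
    """
    events = {}
    rejected = set()
    for occ in occurrences:
        event = {field: occ.get(field) for field in EVENT_FIELDS}
        event_id = event["eventID"]
        if event_id is None:
            # Occurrences without an eventID cannot be assigned to an event.
            continue
        first_seen = events.setdefault(event_id, event)
        if first_seen != event:
            rejected.add(event_id)
    accepted = [e for eid, e in events.items() if eid not in rejected]
    return accepted, sorted(rejected)
```

Calling `reduce_to_events(read_ndjson(open_file))` on an occurrences file yields the accepted events and the rejected event IDs, each of which can then be written back out as one JSON object per line.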
- Create taxonomy NDJSON by extracting occurrence taxa and checking against GBIF Species API using WoRMS
- Create lists of possible taxonomy issues (not found or incertae sedis)
- Extract time coverage (start/end, years, months, days, dates)
- Extract space coverage (bounding box/depths)
- Extract sampling protocols
- @todo Create EML XML
- Create meta.xml with file metadata for event core (event.tsv) and extensions (occurrence.tsv, taxonomy.tsv)
- Set default fields for occurrenceStatus ("present"), basisOfRecord (MO?) and organismQuantityType ("individuals")
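As an illustration of the time and space coverage steps in the list above, here is a minimal sketch that derives both from a list of event records. The function names and the keys of the returned dicts are hypothetical, and the sketch assumes ISO 8601 `eventDate` values that start with `YYYY-MM-DD`.

```python
from datetime import date


def temporal_coverage(events):
    """Summarise start/end, years, months and distinct dates of the events."""
    dates = sorted(
        date.fromisoformat(e["eventDate"][:10])
        for e in events
        if e.get("eventDate")
    )
    if not dates:
        return None
    return {
        "start": dates[0].isoformat(),
        "end": dates[-1].isoformat(),
        "years": sorted({d.year for d in dates}),
        "months": sorted({d.month for d in dates}),
        "dates": sorted({d.isoformat() for d in dates}),
    }


def spatial_coverage(events):
    """Compute a bounding box and depth range from the events.

    A plain min/max box is used, so datasets that cross the
    antimeridian would need special handling.
    """
    points = [
        (e["decimalLatitude"], e["decimalLongitude"])
        for e in events
        if e.get("decimalLatitude") is not None
        and e.get("decimalLongitude") is not None
    ]
    if not points:
        return None
    lats = [lat for lat, _ in points]
    lons = [lon for _, lon in points]
    min_depths = [e["minimumDepthInMeters"] for e in events
                  if e.get("minimumDepthInMeters") is not None]
    max_depths = [e["maximumDepthInMeters"] for e in events
                  if e.get("maximumDepthInMeters") is not None]
    return {
        "west": min(lons),
        "east": max(lons),
        "south": min(lats),
        "north": max(lats),
        "minDepth": min(min_depths, default=None),
        "maxDepth": max(max_depths, default=None),
    }
```

Values like these are what the dataset metadata (for example the planned EML XML) would typically need for its coverage section.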
Event Core
- Reduce occurrences by rolling up to one line per taxon per sample and summing organismQuantity
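The roll-up described above (one line per taxon per sample, with `organismQuantity` summed) might be sketched as follows. The function name is hypothetical, and the sketch assumes that an occurrence without an explicit `organismQuantity` represents a single individual, which may not match how the real pipeline counts objects.

```python
from collections import defaultdict


def roll_up(occurrences):
    """Group occurrences by (eventID, scientificName) and sum organismQuantity."""
    totals = defaultdict(int)
    for occ in occurrences:
        key = (occ["eventID"], occ["scientificName"])
        # Assumption: a missing quantity means one individual.
        totals[key] += occ.get("organismQuantity", 1)
    return [
        {
            "eventID": event_id,
            "scientificName": name,
            "organismQuantity": quantity,
            "organismQuantityType": "individuals",
        }
        for (event_id, name), quantity in sorted(totals.items())
    ]
```

Sorting by the grouping key keeps the output stable between runs, which makes the resulting NDJSON easier to diff.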
Occurrences extension
- Update the resulting occurrences by appending authorship to the scientific name and merging in relevant fields from the taxonomy (in particular taxonID)
- Publish NDJSON distribution with zipped Darwin Core archive in Zenodo
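The occurrence update step above, which appends authorship to the scientific name and merges in taxonomy fields, could be sketched like this. The function name is hypothetical, and the sketch assumes that taxonomy records are keyed by the same `scientificName` used in the occurrences and carry `taxonID` and `scientificNameAuthorship`.

```python
def merge_taxonomy(occurrences, taxonomy):
    """Return copies of the occurrences enriched with taxonomy fields."""
    by_name = {taxon["scientificName"]: taxon for taxon in taxonomy}
    merged = []
    for occ in occurrences:
        out = dict(occ)  # do not mutate the input record
        taxon = by_name.get(occ["scientificName"])
        if taxon is not None:
            out["taxonID"] = taxon.get("taxonID")
            authorship = taxon.get("scientificNameAuthorship")
            if authorship:
                out["scientificName"] = f"{occ['scientificName']} {authorship}"
                out["scientificNameAuthorship"] = authorship
        merged.append(out)
    return merged
```

Occurrences whose name is missing from the taxonomy pass through unchanged, so they can still be picked up by the list of possible taxonomy issues.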
Taxonomy
@todo
This project was co-funded by GBIF Norway; see the data management plan for further details.