This is a lesson on the OpenRefine data cleaning tool, derived from Data Carpentry's Data Refine for Ecology lesson.
- The data set used in this lesson is derived from The Portal Project long-term desert ecology project data. The data file was downloaded and then modified specifically for use with OpenRefine.
- Taxon names were put back into the file.
- Globally Unique Identifiers (in the form of UUIDs) were added.
- These modifications were made to illustrate some features of OpenRefine.
- Errors were added to the taxon names (`scientificName` field) to demonstrate OpenRefine's ability to find likely mis-entered data.
- These errors can be found by running clustering algorithms on the `scientificName` column, which shows how quickly the algorithms surface discrepancies and how simple it is to fix all the issues found.
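
To give a feel for how clustering can surface inconsistent taxon names, here is a minimal Python sketch of a key-collision approach: each value is reduced to a normalized key, and values that share a key are grouped. This is only an approximation written for illustration, not OpenRefine's own implementation. The function names `fingerprint` and `cluster` and the sample names are assumptions and are not taken from the lesson data.

```python
import re
import unicodedata
from collections import defaultdict


def fingerprint(value: str) -> str:
    """Reduce a string to a simplified, order-insensitive key (illustrative only)."""
    # Decompose accented characters and drop anything that is not ASCII.
    ascii_text = (
        unicodedata.normalize("NFKD", value)
        .encode("ascii", "ignore")
        .decode("ascii")
    )
    # Lowercase and strip punctuation.
    cleaned = re.sub(r"[^\w\s]", "", ascii_text.lower())
    # Split on whitespace, then de-duplicate and sort the tokens.
    tokens = sorted(set(cleaned.split()))
    return " ".join(tokens)


def cluster(values):
    """Group values by key and keep only keys with more than one distinct spelling."""
    groups = defaultdict(set)
    for value in values:
        groups[fingerprint(value)].add(value)
    return {key: sorted(variants) for key, variants in groups.items() if len(variants) > 1}


# Hypothetical sample values, made up for this sketch.
names = [
    "Dipodomys merriami",
    "dipodomys merriami ",
    "Dipodomys  merriami.",
    "Chaetodipus penicillatus",
]

clusters = cluster(names)
for key, variants in clusters.items():
    print(key, "->", variants)
```

A key of this kind only catches differences in case, spacing, punctuation, and word order. Misspellings that change letters need a distance-based (nearest neighbor) method instead, which OpenRefine also offers.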
Current maintainers of this lesson are:
A list of contributors to the lesson can be found in AUTHORS.
To cite this lesson, please consult CITATION.