|
| 1 | +# Overview of changes in version 1.0.0 |
| 2 | + |
| 3 | +AUTOTYP version 1.0.0 is a completely new release that focuses on usability, |
| 4 | +documentation and completeness. It has been radically overhauled compared to |
| 5 | +the earlier 0.1.x version. The sheer number of differences makes it |
| 6 | +impossible to provide a comprehensive list of changes. What follows is a |
| 7 | +quick summary of the most important changes in the new release as well as notes on |
| 8 | +migrating from the old database releases. |
| 9 | + |
| 10 | +## Major new features in version 1.0.0: |
| 11 | + |
| 12 | +- New naming conventions for datasets and variables, focusing on usability |
| 13 | + and clarity. All names now consistently follow the CamelCase convention and |
| 14 | + are based on verbose descriptions that provide more context about the variable |
| 15 | + (e.g. `Position` -> `VerbInflectionMarkerPosition`). Hundreds of variables have |
| 16 | + been renamed to fit these criteria. |
| 17 | + |
| 18 | +- The datasets are now organized into thematic modules, rather than each dataset |
| 19 | + constituting a module on its own. |
| 20 | + |
| 21 | +- Published data now includes the raw exported database data, in addition to the |
| 22 | + previously published derived aggregated tables. All aggregation scripts used to |
| 23 | + compute derived data are published as well (see |
| 24 | + [`aggregation-scripts`](aggregation-scripts)). Please feel free to inspect the |
| 25 | + scripts and modify them to suit your own needs. |
| 26 | + |
| 27 | +- Many improvements to variable descriptions and metadata. The metadata YAML files |
| 28 | + are now simpler and more compact, which should make the documentation more |
| 29 | + accessible. |
| 30 | + |
| 31 | +- Overhauled the data architecture to allow nested and repeated table fields (see |
| 32 | + [Data Architecture](readme.md#data-architecture)). This allows many datasets to be |
| 33 | + expressed in a more natural, conceptually simpler fashion. |
| 34 | + |
| 35 | +- New R and JSON exports for users who want quick access to the data using their |
| 36 | + preferred data wrangling environment. |
| 37 | + |
| 38 | +- Language name and glottocode are exported for every dataset in addition to the |
| 39 | + internal language ID. |
| 40 | + |
| 41 | +## Major changes to individual datasets/modules: |
| 42 | + |
| 43 | +- `GrammaticalRelations` module now encompasses all data on grammatical relations |
| 44 | + and alignments. We now fully provide the underlying raw database data in addition |
| 45 | + to the aggregated alignment data and the scripts used to produce these aggregations. |
| 46 | + |
| 47 | +- `VerbSynthesis` has been overhauled to include a detailed list of inflectional |
| 48 | + categories expressed on verbs. |
| 49 | + |
| 50 | +- `LocusOfMarking` module now contains the raw database data in addition to the |
| 51 | + previously published aggregations. |
| 52 | + |
| 53 | +- `GrammaticalMarkers` dataset has been overhauled to include a detailed list |
| 54 | + of marker hosts and marked categories. |
| 55 | + |
| 56 | +- `MorphemeClasses` replaces the previous aggregated `Morpheme_types` dataset |
| 57 | + and exposes the information about individual language-specific morpheme classes. |
| 58 | + The information previously available in `Morpheme_types` is now integrated into |
| 59 | + the improved `MorphologyPerLanguage` aggregated dataset. |
| 60 | + |
| 61 | +- New module `Categories` groups together datasets that provide information about |
| 62 | + selected grammatical categories. |
| 63 | + |
| 64 | +- New module `Definitions` provides access to underlying definitions of categorical |
| 65 | + variables used across AUTOTYP. |
| 66 | + |
| 67 | +- New module `PerLanguageSummaries` groups together various per-language aggregated |
| 68 | + summaries (code to generate these summaries is available under |
| 69 | + [`aggregation-scripts`](aggregation-scripts)) |
| 70 | + |
| 71 | + |
| 72 | +## Notes on migration from older AUTOTYP releases |
| 73 | + |
| 74 | +If you have been using AUTOTYP version 0.1.x, you will notice that many datasets |
| 75 | +have been moved or renamed. The following list should help you to find the new |
| 76 | +location of the data: |
| 77 | + |
| 78 | +- **`Agreement`** is now exported as `Categories/Agreement` |
| 79 | +- **`Alienability`** is now exported as `Categories/Alienability` |
| 80 | +- **`Alignment`** is now exported as `GrammaticalRelations/Alignment` |
| 81 | +- **`Alignment_per_language`** is now `PerLanguageSummaries/AlignmentForDefaultPredicatesPerLanguage` |
| 82 | +- **`Clause_linkage`** is now `Sentence/ClauseLinkage` |
| 83 | +- **`Clause_word_order`** is now `Sentence/ClauseWordOrder` |
| 84 | +- **`Clusivity`** is now exported as `Categories/Clusivity` |
| 85 | +- **`Gender`** is now exported as `Categories/Gender` |
| 86 | +- **`Grammatical_markers`** is now exported as `Morphology/GrammaticalMarkers` |
| 87 | +- **`GR_per_language`** has been superseded by `GrammaticalRelations/GrammaticalRelationCoverage` |
| 88 | +- **`Locus_per_language`** is now `PerLanguageSummaries/LocusOfMarkingPerLanguage` |
| 89 | +- **`Locus_per_macrorelation`** has been superseded by `Morphology/DefaultLocusOfMarkingPerMacrorelation` |
| 90 | +- **`Locus_per_microrelation`** has been superseded by `Morphology/LocusOfMarkingPerMicrorelation` |
| 91 | +- **`Markers_per_language`** is now `PerLanguageSummaries/GrammaticalMarkersPerLanguage` |
| 92 | +- **`Morpheme_types`** has been superseded by `Morphology/MorphemeClasses` and |
| 93 | + `PerLanguageSummaries/MorphologyPerLanguage` |
| 94 | +- **`Morphology_per_language`** is now `PerLanguageSummaries/MorphologyPerLanguage` |
| 95 | +- **`NP_per_language`** is now `PerLanguageSummaries/NPStructurePerLanguage` |
| 96 | +- **`NP_structure`** is now `NP/NPStructure` |
| 97 | +- **`NP_structure_presence`** is now `PerLanguageSummaries/NPStructurePresence` |
| 98 | +- **`Numeral_classifiers`** is now exported as `Categories/NumeralClassifiers` |
| 99 | +- **`Register`** is still `Register` |
| 100 | +- **`Synthesis`** is now `Morphology/VerbSynthesis` |
| 101 | +- **`Valence_classes`** is now `GrammaticalRelations/PredicateClasses` |
| 102 | +- **`Valence_classes_per_language`** is now `PerLanguageSummaries/PredicateClassesSemanticsPerLanguage` |
| 103 | +- **`VInfl_counts_per_position`** is now `PerLanguageSummaries/VerbInflectionAndAgreementCountsByPosition` |
| 104 | +- **`VInfl_cat_*`** is now `PerLanguageSummaries/VerbInflectionCategoriesAggregatedBy*` |
| 105 | +- **`VInfl_macrocat_*`** is now `PerLanguageSummaries/VerbInflectionMacrocategories*` |
| 106 | +- **`VAgr_*`** is now `PerLanguageSummaries/VerbAgreementAggregatedBy*` |
| 107 | +- **`Word_domains`** is now `Word/WordDomains` |
| 108 | + |
| 109 | + |
| 110 | + |
| 111 | + |
| 112 | + |
| 113 | + |
| 114 | + |
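For scripts that still refer to 0.1.x dataset names, the migration list above can be turned into a small lookup table. The following is only a minimal sketch: the helper name `new_locations` is hypothetical, the table covers just a handful of the listed entries, and the wildcard entries (such as `VInfl_cat_*`) are left out because the list does not say how their suffixes map.

```python
# Illustrative subset of the old -> new dataset names from the migration list.
# Each old name maps to a list, because some datasets were split
# (e.g. Morpheme_types is superseded by two new datasets).
RENAMED_DATASETS = {
    "Agreement": ["Categories/Agreement"],
    "Alignment": ["GrammaticalRelations/Alignment"],
    "Clause_linkage": ["Sentence/ClauseLinkage"],
    "Morpheme_types": [
        "Morphology/MorphemeClasses",
        "PerLanguageSummaries/MorphologyPerLanguage",
    ],
    "NP_structure": ["NP/NPStructure"],
    "Register": ["Register"],
    "Synthesis": ["Morphology/VerbSynthesis"],
    "Word_domains": ["Word/WordDomains"],
}


def new_locations(old_name):
    """Return the 1.0.0 location(s) for a 0.1.x dataset name.

    Hypothetical helper: raises KeyError for names missing from the
    table above, including the wildcard entries it does not cover.
    """
    return RENAMED_DATASETS[old_name]


if __name__ == "__main__":
    for old in ("Synthesis", "Morpheme_types"):
        print(old, "->", ", ".join(new_locations(old)))
```

Returning a list for every entry keeps the one-to-many case (`Morpheme_types`) and the unchanged case (`Register`) on the same code path, so callers do not need to special-case either.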