Commit 88133f0

AUTOTYP v1.0.0
1 parent 74aaf40 commit 88133f0

329 files changed: +3352908 -354175 lines changed

Diff for: .gitignore

+1
@@ -0,0 +1 @@
+.DS_Store

Diff for: CHANGELOG.md

-40
This file was deleted.

Diff for: CHANGES-1.0.0.md

+114
@@ -0,0 +1,114 @@
+# Overview of changes in version 1.0.0
+
+AUTOTYP version 1.0.0 is a completely new release that focuses on usability,
+documentation and completeness. It has been radically overhauled compared to
+the earlier 0.1.x version. The sheer number of differences makes it
+impossible to provide a comprehensive list of changes. What follows is a
+quick summary of the most important changes in the new release as well as notes on
+migrating from the old database releases.
+
+## Major new features in version 1.0.0:
+
+- New naming conventions for datasets and variables, focusing on usability
+and clarity. All names now consistently follow the CamelCase convention and
+are based on verbose descriptions that provide more context about the variable
+(e.g. `Position` -> `VerbInflectionMarkerPosition`). Hundreds of variables have
+been renamed to fit these criteria.
+
+- The datasets are now organized into thematic modules, rather than each dataset
+constituting a module on its own.
+
+- Published data now includes the raw exported database data, in addition to the
+previously published derived aggregated tables. All aggregation scripts used to
+compute derived data are published as well (see
+[`aggregation-scripts`](aggregation-scripts)). Please feel free to inspect the
+scripts and modify them to suit your own needs.
+
+- Many improvements to variable descriptions and metadata. The metadata YAML files
+are now simpler and more compact, which should make the documentation more
+accessible.
+
+- Overhauled the data architecture to allow nested and repeated table fields (see
+[Data Architecture](readme.md#data-architecture)). This allows many datasets to be
+expressed in a more natural, conceptually simpler fashion.
+
+- New R and JSON exports for users who want quick access to the data using their
+preferred data wrangling environment.
+
+- Language name and glottocode are exported for every dataset in addition to the
+internal language ID.
+
+## Major changes to individual datasets/modules:
+
+- `GrammaticalRelations` module now encompasses all data on grammatical relations
+and alignments. We now fully provide the underlying raw database data in addition
+to the aggregated alignment data and the scripts used to produce these aggregations.
+
+- `VerbSynthesis` has been overhauled to include a detailed list of inflectional
+categories expressed on verbs.
+
+- `LocusOfMarking` module now contains the raw database data in addition to the
+previously published aggregations.
+
+- `GrammaticalMarkers` dataset has been overhauled to include a detailed list
+of marker hosts and marked categories.
+
+- `MorphemeClasses` replaces the previous aggregated `Morpheme_types` dataset
+and exposes the information about individual language-specific morpheme classes.
+The information previously available in `Morpheme_types` is now integrated into
+the improved `MorphologyPerLanguage` aggregated dataset.
+
+- New module `Categories` groups together datasets that provide information about
+selected grammatical categories.
+
+- New module `Definitions` provides access to underlying definitions of categorical
+variables used across AUTOTYP.
+
+- New module `PerLanguageSummaries` groups together various per-language aggregated
+summaries (code to generate these summaries is available under
+[`aggregation-scripts`](aggregation-scripts)).
+
+
+## Notes on migration from older AUTOTYP releases
+
+If you have been using AUTOTYP version 0.1.x, you will notice that many datasets
+have been moved or renamed. The following list should help you find the new
+location of the data:
+
+- **`Agreement`** is now exported as `Categories/Agreement`
+- **`Alienability`** is now exported as `Categories/Alienability`
+- **`Alignment`** is now exported as `GrammaticalRelations/Alignment`
+- **`Alignment_per_language`** is now `PerLanguageSummaries/AlignmentForDefaultPredicatesPerLanguage`
+- **`Clause_linkage`** is now `Sentence/ClauseLinkage`
+- **`Clause_word_order`** is now `Sentence/ClauseWordOrder`
+- **`Clusivity`** is now exported as `Categories/Clusivity`
+- **`Gender`** is now exported as `Categories/Gender`
+- **`Grammatical_markers`** is now exported as `Morphology/GrammaticalMarkers`
+- **`GR_per_language`** has been superseded by `GrammaticalRelations/GrammaticalRelationCoverage`
+- **`Locus_per_language`** is now `PerLanguageSummaries/LocusOfMarkingPerLanguage`
+- **`Locus_per_macrorelation`** has been superseded by `Morphology/DefaultLocusOfMarkingPerMacrorelation`
+- **`Locus_per_microrelation`** has been superseded by `Morphology/LocusOfMarkingPerMicrorelation`
+- **`Markers_per_language`** is now `PerLanguageSummaries/GrammaticalMarkersPerLanguage`
+- **`Morpheme_types`** has been superseded by `Morphology/MorphemeClasses` and
+`PerLanguageSummaries/MorphologyPerLanguage`
+- **`Morphology_per_language`** is now `PerLanguageSummaries/MorphologyPerLanguage`
+- **`NP_per_language`** is now `PerLanguageSummaries/NPStructurePerLanguage`
+- **`NP_structure`** is now `NP/NPStructure`
+- **`NP_structure_presence`** is now `PerLanguageSummaries/NPStructurePresence`
+- **`Numeral_classifiers`** is now exported as `Categories/NumeralClassifiers`
+- **`Register`** is still `Register`
+- **`Synthesis`** is now `Morphology/VerbSynthesis`
+- **`Valence_classes`** is now `GrammaticalRelations/PredicateClasses`
+- **`Valence_classes_per_language`** is now `PerLanguageSummaries/PredicateClassesSemanticsPerLanguage`
+- **`VInfl_counts_per_position`** is now `PerLanguageSummaries/VerbInflectionAndAgreementCountsByPosition`
+- **`VInfl_cat_*`** is now `PerLanguageSummaries/VerbInflectionCategoriesAggregatedBy*`
+- **`VInfl_macrocat_*`** is now `PerLanguageSummaries/VerbInflectionMacrocategories*`
+- **`VAgr_*`** is now `PerLanguageSummaries/VerbAgreementAggregatedBy*`
+- **`Word_domains`** is now `Word/WordDomains`
+
+
+
+
+
+
+
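
Reviewer note: for scripts that still refer to 0.1.x dataset names, the migration list in `CHANGES-1.0.0.md` could be turned into a lookup table. The following is only a minimal sketch, not part of this commit. The names `MIGRATION` and `new_location` are hypothetical, the paths are copied verbatim from the list, and the wildcard entries (`VInfl_cat_*`, `VInfl_macrocat_*`, `VAgr_*`) are left out because the list does not spell out their expansions. File extensions and directory layout of the exports are not assumed here.

```python
# Hypothetical lookup table built from the migration notes in CHANGES-1.0.0.md.
# Keys are 0.1.x dataset names; values are lists of 1.0.0 module/dataset paths
# (a list, because Morpheme_types is superseded by two datasets).
MIGRATION = {
    "Agreement": ["Categories/Agreement"],
    "Alienability": ["Categories/Alienability"],
    "Alignment": ["GrammaticalRelations/Alignment"],
    "Alignment_per_language": ["PerLanguageSummaries/AlignmentForDefaultPredicatesPerLanguage"],
    "Clause_linkage": ["Sentence/ClauseLinkage"],
    "Clause_word_order": ["Sentence/ClauseWordOrder"],
    "Clusivity": ["Categories/Clusivity"],
    "Gender": ["Categories/Gender"],
    "Grammatical_markers": ["Morphology/GrammaticalMarkers"],
    "GR_per_language": ["GrammaticalRelations/GrammaticalRelationCoverage"],
    "Locus_per_language": ["PerLanguageSummaries/LocusOfMarkingPerLanguage"],
    "Locus_per_macrorelation": ["Morphology/DefaultLocusOfMarkingPerMacrorelation"],
    "Locus_per_microrelation": ["Morphology/LocusOfMarkingPerMicrorelation"],
    "Markers_per_language": ["PerLanguageSummaries/GrammaticalMarkersPerLanguage"],
    "Morpheme_types": [
        "Morphology/MorphemeClasses",
        "PerLanguageSummaries/MorphologyPerLanguage",
    ],
    "Morphology_per_language": ["PerLanguageSummaries/MorphologyPerLanguage"],
    "NP_per_language": ["PerLanguageSummaries/NPStructurePerLanguage"],
    "NP_structure": ["NP/NPStructure"],
    "NP_structure_presence": ["PerLanguageSummaries/NPStructurePresence"],
    "Numeral_classifiers": ["Categories/NumeralClassifiers"],
    "Register": ["Register"],
    "Synthesis": ["Morphology/VerbSynthesis"],
    "Valence_classes": ["GrammaticalRelations/PredicateClasses"],
    "Valence_classes_per_language": ["PerLanguageSummaries/PredicateClassesSemanticsPerLanguage"],
    "VInfl_counts_per_position": ["PerLanguageSummaries/VerbInflectionAndAgreementCountsByPosition"],
    "Word_domains": ["Word/WordDomains"],
}


def new_location(old_name):
    """Return the 1.0.0 dataset path(s) for a 0.1.x dataset name.

    Raises KeyError with a readable message if the name is not in the table
    (for example, one of the wildcard families omitted above).
    """
    try:
        return MIGRATION[old_name]
    except KeyError:
        raise KeyError(f"No migration entry for {old_name!r}") from None
```

For example, `new_location("Morpheme_types")` returns both replacement datasets, which makes the one-to-many case visible to calling code instead of silently picking one.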

Diff for: LICENSE

File mode changed from 100755 to 100644.

Diff for: R/autotyp.utilities.R

-99
This file was deleted.

Diff for: VERSION

-1
This file was deleted.
