-
Notifications
You must be signed in to change notification settings - Fork 28
Climsoft Version 5 database model proposal
This proposal builds on the model changes first discussed at the Climsoft Developers Workshop, Kitale, Kenya 18 – 22 March 2019 and further discussed on GitHub issues including #475.
The database model in this proposal is built from three key ideas:
- The Climsoft long data format. The observation table follows this structure and we have also replicated it for other components of the database. This generalisation means there is the potential to store spatial data and the paper archive images in powerful ways as well as increase the software's flexibility through flexible storage of properties.
- Git versioning. We have conceptualised the version control of records in tables which will allow for a full audit trail of changes. The version control system proposed closely resembles the basis of the Git version control software (which is the basis for GitHub). This should be reassuring as it shows we are mirroring and building on established concepts. The proposed version control system is a simplified Git versioning system which can provide a robust full audit trail for all database actions.
- The Open Geospatial Consortium (OGC) Standards. It is important to be aware of and, where possible, align with relevant international standard to ensure Climsoft is considered as a serious software worldwide. The database model in this proposal closely follows the standard's concept of a "feature", which generalises the concept of a "station". This has proved useful in simplifying the database implementation.
The sections below describe the core ideas of the database design and describe each proposed table, its use, and examples. The example tables are intended to illustrative of a use of the table, but may not show the complete functionality, or all possible variations. However, the examples are intended to be consistent with each other, so data in tables which are linked should have links that are consistent. Tables are prefixed with c5_ to identify them as Climsoft version 5 tables and distinguish them from version 4 tables that they may sit along in the same database.
The proposed database design is intended to enable replicating the current functionality of Climsoft, as well as providing notable additional features, particularly around the use of metadata (properties, auditing, comments and events). This is intended to be a minimal design, and additional tables/fields may need to be added to ensure all current functionality can be achieved.
When implementing this database, fixed views can be used to improve efficiency and usability. For example, fixed views may be created for (all) tables showing only the most current version of each record. This would also restore many-to-one relations.
Figure 1: Proposed version 5 database model
- Overview
- Audit Trail
- Observation tables
- Feature tables
- Auditing tables
- Elements and Measurement tables
- Artifact and Measure group tables
A major addition in the proposed database design is the audit trail. The audit trail ensures that it is possible to record every change to the database. This is achieved through the audit table and related tables, as well as the addition of version_number
and current
(or current_best
) fields to all tables excluding the audit tables.
All tables, apart from c5_audit
and c5_observation_entry_action_type
tables, will have the fields: id
, version_number
and current
(or current_best
). id
and version_number
form a composite primary key for each table.
For observation data, current_best
indicates which version is the current best version for each id by a positive integer value. The value of the positive integer indicates the state of the observation. Rows which are not the current best will either be NULL
or -1
. NULL
indicates not the current best and -1
indicates a candidate current best which requires action (a different value from double data entry). For example, a correction to an observation will lead to a new row in the c5_observation
table for the corrected value. The current_best
field will indicate that this is the current best value for that observation. The record with the previous value will remain in the c5_observation
table but its current_best
field will be modified to indicate it is no longer the current best value for that observation. Retaining the previous values ensures a full audit of changes is recorded.
For metadata, the current
field is used in a similar way. However, the value of the positive integers in the current
field have a different meaning and multiple positive integer values per id is possible. The positive integers are defined by events. For example, if a station moved actual location then the new record with the new location would have current
equal to 2
and row with the previous location before the move event would have current equal to 1
. The positive integers therefore show the history of the station. Moving the location of a station would also be recorded in the c5_event
table, which records the date of the event. This allows for reconstructing the history.
For both observation data and metadata, each change to a value is also recorded in the c5_audit
table. When a change is made, the change and the previous and new value of the version_number
field is recorded in the c5_audit
table to allow for reconstructing the history of changes.
The Auditing tables section provides a detailed description of the tables used for auditing.
This table records all observations. This replaces the functionality of the data entry form tables, and the observation initial table, through the use of current_best
. The value of current_best
can be used to replicate the save and upload processes in Climsoft.
id |
feature_measurement_instrument_id |
obs_date |
version_number |
qc_log |
current_best |
value |
flag |
period |
---|---|---|---|---|---|---|---|---|
1 | 1 | 17/01/2020 | 1 | 1 | 1 | 20 | ||
2 | 2 | 17/01/2020 | 1 | 1 | 1 | 12.5 |
-
feature_measurement_instrument_id
– The feature_measurement_instrument_id that this observation is for.feature_measurement_instrument_id
includes the station, measurement (element) and instrument details. -
obs_date
– The date the observation was made on. -
qc_log
– The QC processes that have been run on the observation. This may be stored as numbers separated by comas. -
current_best
– Integer values can be codes to replicate the current Climsoft save and upload processes, as described here #475. Negative values for current best can be used to indicate proposed values which are waiting to be confirmed, probably requiring human action. -
value
– The actual value of the observation. -
flag
– The flag associated with the observation.
period – The period associated with the observation.
Properties of observations are separated out into property tables. This allows for greater flexibility of user defined properties. This table defines the properties that may be recorded for observations.
id |
version_number |
name |
type |
current |
---|---|---|---|---|
1 | 1 | acquisitiontype | Int | 1 |
2 | 1 | units | char | 1 |
-
name
– The name of the property. -
type
– The data type of the property. This could be stored as coded integers.
This table records property values of observations.
id |
version_number |
property_id |
observation_id |
val_int |
val_num |
val_char |
current |
---|---|---|---|---|---|---|---|
1 | 1 | 1 | 1 | 0 | NULL | NULL | 1 |
2 | 1 | 2 | 1 | NULL | NULL | mm | 1 |
3 | 1 | 2 | 2 | NULL | NULL | Celsius | 1 |
-
property_id
– The ID of the property the value is recorded for. -
observation_id
– The ID of the observation the property value is recorded on. -
val_int
– The property value if the property type is integer. -
val_num
– The property value if the property type is numeric. -
val_char
– The property value if the property type is character.
This table defines types of features. A feature is a geographical shape (point, polygon etc.) and/or a collection of other features. This is similar to use of feature and feature_collection in the Open Geospatial Standards documents. This generalises the concept of station to include other geographical features.
In Climsoft, features will be of different types. Examples of feature types are: station, country, county, authority, admin region, drainage basin, and qualifier.
id |
version_number |
name |
current |
---|---|---|---|
1 | 1 | station | 1 |
2 | 1 | country | 1 |
3 | 1 | admin region | 1 |
4 | 1 | provinces | 1 |
4 | 2 | county | 2 |
-
name
– This will store the different types of features.
This table defines the properties of feature types. For example the station feature have properties StationId, WMOId, ICAOid. The country feature type may have CountryId as a property.
id |
version_number |
feature_type_id |
name |
type |
unique |
required |
current |
---|---|---|---|---|---|---|---|
1 | 1 | 1 | StationId | String | true | true | 1 |
2 | 1 | 1 | WMOId | Int | true | false | 1 |
3 | 1 | 1 | ICAOId | Int | true | false | 1 |
4 | 1 | 2 | CountryId | String | true | true | 1 |
5 | 1 | 2 | Agricultural_practice | String | false | false | 1 |
-
feature_type_id
– The ID of the feature type the property is being defined for. -
name
– The name of the feature. -
type
– The data type of the property. This field may be coded by integers. -
unique
– Whether the property should be unique for each features of that type. -
required
- Whether the property is required for all features of that type.
This define the features. Features will belong to a feature type. All features will have a name that could be used to display them.
id |
version_number |
feature_type_id |
name |
current |
---|---|---|---|---|
1 | 1 | 1 | BINGA | NULL |
2 | 1 | 1 | KARIBA AIRPORT | NULL |
3 | 1 | 1 | KAROI | 1 |
4 | 1 | 2 | Kenya | 1 |
5 | 1 | 2 | Uganda | 1 |
6 | 1 | 2 | Rwanda | 1 |
7 | 1 | 1 | RUKOMICHE RES. STN. | 1 |
8 | 2 | 1 | BINGA AGROMET | 1 |
9 | 1 | 1 | Kisumu Station | 1 |
10 | 1 | 1 | Kisian Station | 1 |
11 | 1 | 1 | Ahero Station | 1 |
12 | 1 | 1 | Awasi Station | 1 |
13 | 1 | 4 | Kisumu County | 1 |
-
feature_type_id
– The ID of the feature type of this feature. -
name
– The name of the feature.
For the station feature type, it may be desirable to have either the station's name stored in the name field or the station ID in the name field. Either implementation is possible depending on the requirements of a individual Met Service. If the station id is used in the "Name" field then the station name will be a property of station, or vice versa.
Therefore, an alternative to the first seven rows of the table above is:
id |
version_number |
feature_type_id |
name |
active |
current |
---|---|---|---|---|---|
1 | 1 | 1 | 67755030 | true | 1 |
2 | 1 | 1 | 67761060 | true | 1 |
3 | 1 | 1 | 67765020 | true | 1 |
4 | 1 | 2 | Kenya | true | 1 |
5 | 1 | 2 | Uganda | true | 1 |
6 | 1 | 2 | Rwanda | true | 1 |
7 | 1 | 1 | 67766010 | true | 1 |
This table defines property values of features. Each row corresponds to a single property value of a single feature. The advantage of a property table over having properties as additional fields in the features table is that users can define their own features and properties can be defined differently for different feature types.
Only one of the val_*
fields will be non-NULL
since property values can be of different types. Other data type can be added if necessary.
id |
version_number |
feature_id |
feature_type_property_id |
val_int |
val_double |
val_char |
val_date |
current |
---|---|---|---|---|---|---|---|---|
1 | 1 | 1 | 2 | 14578 | NULL | NULL | NULL | 1 |
2 | 1 | 1 | 3 | 536 | NULL | NULL | NULL | 1 |
3 | 1 | 4 | 4 | NULL | NULL | KE | NULL | 1 |
4 | 1 | 5 | 4 | NULL | NULL | UG | NULL | 1 |
5 | 1 | 4 | 5 | NULL | NULL | Maize | NULL | 1 |
-
feature_id
– The ID of the feature this property value is for. -
feature_type_property_id
– The ID of the property this value is for. -
val_int
– If the property value is an integer, the value is stored here. -
val_double
- If the property value is a double, the value is stored here. -
val_char
- If the property value is character, the value is stored here. -
val_date
- If the property value is a date, the value is stored here.
This table defines a collection of features. This can be used to group features together. For example, Kisumu County
could be a feature collection containing the stations Kisumu
, Ahero
, Awasi
and Kisian
.
id |
version_number |
feature_id |
includes_feature_id |
current |
---|---|---|---|---|
1 | 1 | 13 | 9 | 1 |
2 | 1 | 13 | 10 | 1 |
3 | 1 | 13 | 11 | 1 |
4 | 1 | 13 | 12 | 1 |
-
feature_id
- this is the id of the feature that contains the collection of other features in this case –Kisumu County
. -
includes_feature_id
– these are the ids for the features in the collection in this case the ids ofKisumu
,Ahero
,Awasi
andKisian
stations.
The example above shows that the stations with id: 9
, 10
, 11
, 12
are contained in feature collection with id 4
(Kisumu County
).
This table defines the geometry of features. We expect features to be of different geospatial types: e.g. polygons, points, lines. For example, Kisumu County
may be defined as a polygon. The exact implementation of storing geospatial objects will depend on database implementation and may vary from the example below.
id |
version_number |
feature_id |
geometry_type |
shape |
current |
---|---|---|---|---|---|
1 | 1 | 13 | polygon | () | 1 |
-
geometry_type
- The type of geospatial object (geometry) of the feature. -
shape
– The geospatial object of the feature.
There are two tables that record changes to data and metadata. The c5_audit
table records changes to observation data and metadata. This includes making corrections to data.
The c5_event
table records events i.e. a change of state. For example, if a station moves location then its geographical information will change. The station moving on a specific date will be recorded as an event. Changes to the stations geographical information will be recorded in the c5_audit
table.
The c5_audit
table records changes from a previous version to a new version by making a correction. The event table records the date an event happened. The event dates provides the history.
id |
action_id |
table |
entry_id |
version_old |
version_new |
---|---|---|---|---|---|
1 | 1 | feature_type | 4 | NULL | 1 |
2 | 2 | feature_type | 4 | 1 | 2 |
3 | 1 | event | 1 | NULL | 1 |
4 | 1 | feature_type_property | 1 | NULL | 1 |
5 | 1 | feature | 1 | NULL | 1 |
6 | 2 | feature | 1 | 1 | 2 |
7 | 1 | feature | 2 | NULL | 1 |
8 | 3 | feature | 2 | 1 | NULL |
9 | 1 | feature_defination | 2 | NULL | 1 |
10 | 2 | feature_defination | 2 | 1 | 2 |
11 | 1 | event | 2 | NULL | 1 |
12 | 2 | feature_defination | 2 | 2 | 3 |
13 | 1 | event | 3 | NULL | 1 |
-
action_id
– Actions performed by users in the front end will often result in multiple changes to the database (almost) simultaneously. It may be useful to group these changes to enable viewing/undoing all changes resulting from a single action. The action ID allows for this. -
table
– The name of the table in which the change was done. -
entry_id
– The ID of the entry in table that was changed. -
version_old
– Theversion_number
of the value within entry_id that the change is being made on. -
version_new
– The newversion_number
of the changed value withinentry_id
.
id |
version_number |
description |
event_date |
current |
---|---|---|---|---|
1 | 1 | Provinces changed to counties in Kenya. | 15/01/2010 | 1 |
2 | 1 | Station BINGA was opened. | 15/01/2020 | 1 |
3 | 1 | Station KAROI moved to a new location. | 17/01/2021 | 1 |
-
description
- A human readable description of the event. -
event_date
– The date the event happened.
This table indicates which records in observation and metadata tables are affected by an event.
id |
version_number |
event_id |
table_id |
entry_id |
entry_current_before |
entry_current_after |
current |
---|---|---|---|---|---|---|---|
1 | 1 | 1 | c5_feature_type | 1 | Province | County | 1 |
-
event_id
– The event that affects the record. -
table_id
– The name of the table that contains the record affected by the event. -
entry_id
– The id intable_id
of the record affected by the event. -
entry_current_before
– The value ofcurrent
inentry_id
before the event happened. -
entry_current_after
– The value ofcurrent
inentry_id
after the event happened.
This table defines the types of actions. Actions performed by users in the front end will often result in multiple changes to the database (almost) simultaneously. An action corresponds to an action by the user in the front end. This table defines types of action, which may be useful when searching for actions of different types.
Action types may correspond to clicking a particular button on a form in the front end. For example, FrmDaily2:Save
is the event of a user clicking the save button in FrmDaily2
.
If the front end attempts to perform an action where the action type does not exist in the database, a new action type will be added to the database with the corresponding name and an empty description
.
id |
version_number |
name |
description |
current |
---|---|---|---|---|
1 | 1 | frmDaily:Save | Save executed on frmDaily. | 1 |
2 | 1 | frmDaily:Upload | Upload executed on frmDaily. | 1 |
3 | 1 | frmMetadataStation:Import | Import of stations executed on frmMetadata. | 1 |
4 | 1 | frmDaily2:Save | Save executed on frmDaily2. | 1 |
-
name
– The name of the action type. This is unique for eachid
. -
description
– A human readable description of the action.
This table records all performed actions.
id |
version_number |
action_type_id |
operator_id |
action_date |
current |
---|---|---|---|---|---|
1 | 1 | 1 | 11 | 17/01/2020 14:27:00 | 1 |
2 | 1 | 2 | 12 | 18/01/2020 17:43:00 | 1 |
3 | 1 | 5 | 06 | 18/01/2020 18:44:00 | 1 |
-
action_type_id
– The id of the action type of this action. -
operator_id
– The id of the operator who performed the action. -
action_date
– The datetime when the action was performed.
In the table above, the first row demonstrates that the operator with id
11
clicked the save button on frmDaily2. On the next day, operator with id
12
, (possibly an administrator), uploaded data from frmDaily2.
The last row demonstrates that operator with id
06
, clicked on Import for stations on the metadata form.
This table provides a more direct links between c5_observation
and c5_action_type
. This could be important, for example, to efficiently determine which data entry form was used to enter each observation. This information is already available through c5_audit
, although this could be inefficient, particularly if c5_audit
becomes large. This table could be replaced by a fixed view in implementation.
id |
observation_id |
version_number |
action_type_id |
---|---|---|---|
1 | 1 | 1 | 1 |
2 | 2 | 1 | 1 |
This table records comments on records in any other table. Comments may be recorded on individual observations or metadata for example, to note something unusual about a record.
id |
version_number |
table_id |
entry_id |
comment |
current |
---|---|---|---|---|---|
1 | 1 | event | 3 | The station was moved due to construction of a new road. | 1 |
2 | 1 | action_type | 4 | This record has been auto generated. | 1 |
-
table_id
– The name of the table in which the comment is associated. -
entry_id
– The id of the record in table_id that the comment is associated with. -
comment
– The comment.
This table defines elements. Elements are independent of periods. For example, there is only a single rainfall element. But there would be multiple rainfall measurements for different measurement periods.
The WMO's essential climate variables was a starting point for this concept, but it is likely too broad for use in this context. The exact classification of elements needs further discussion.
id |
version_number |
name |
current |
---|---|---|---|
1 | 1 | rainfall | 1 |
2 | 1 | mintemp | 1 |
3 | 1 | maxtemp | 1 |
4 | 1 | windspeed | 1 |
5 | 1 | wet bulb | 1 |
6 | 1 | dry bulb | 1 |
-
name
– The name of the element.
This table records measurements they are made. A measurement is something that is recorded for an element over a specific period at a specific height. For example, daily rainfall.
id |
version_number |
element_id |
period |
height |
units |
current |
---|---|---|---|---|---|---|
1 | 1 | 1 | Day | mm | 1 | |
2 | 1 | 1 | Hour | mm | 1 | |
2 | 1 | 2 | Day | Celsius | 1 | |
3 | 1 | 3 | Day | Celsius | 1 | |
4 | 1 | 4 | Hour | Celsius | 1 | |
5 | 1 | 5 | Day | Celsius | 1 |
-
period
– The time period of the measurement. -
height
- The height at which the measurement is recorded. -
units
– The units the measurement is recorded in.
This table defines types of instruments.
id |
version_number |
name |
current |
---|---|---|---|
1 | 1 | Rain gauge | 1 |
2 | 1 | Maximum temperature thermometer | 1 |
3 | 1 | Minimum temperature thermometer | 1 |
4 | 1 | Wind Sock | 1 |
-
name
– The name of the instrument type.
This tables records instances of instruments that exists as physical objects.
id |
version_number |
instrument_id |
location |
current |
---|---|---|---|---|
1 | 1 | 1 | 1 | |
2 | 1 | 2 | 1 | |
3 | 1 | 3 | 1 | |
4 | 1 | 4 | 1 |
-
instrument_id
– The ID of the type of instrument of this record. -
location
– The location of the instrument.
A record in this table defines a combinations of a feature, a measurement and an instrument instance. This links directly to c5_observation
as part of uniquely defining observations.
id |
version_number |
feature_id |
measurement_id |
instrument_instance_id |
current |
---|---|---|---|---|---|
1 | 1 | 1 | 1 | 1 | 1 |
2 | 1 | 1 | 2 | 2 | 1 |
3 | 1 | 2 | 1 | 3 | 1 |
-
feature_id
– The ID of the feature this record refers to. -
measurement_id
– The ID of the measurement this record refers to. -
instrument_instance_id
– The ID of the instrument instance this record refers to.
This table records artifacts, which is a generalisation of observations to include other data types. Examples of artifacts could be files received from an AWS, text messages, or a pixel from satellite data.
id |
version_number |
feature_measure_group_id |
obs_date |
qc_log |
current_best |
value |
---|---|---|---|---|---|---|
1 | 1 | 1 | 13/1/2020 | 1 | [satellite image] |
-
feature_measure_group_id
– The ID of thefeature_measure_group
this artifact refers to. -
obs_date
– The date the artifact was recorded on. -
qc_log
– The quality control that has been carried out on the artifact. -
value
– The artifact value. This may be need to be multiple fields depending on implementation.
These table defines groups of measurements. For example, tmin, tmax, wet bulb and dry bulb may form a temperature measure group. This is analogues to features and feature groups.