src/pages/gsoc_ideas.mdx

---
### 3. Database and Data Improvements{#db}
PEcAn relies on the BETYdb database to store trait and yield data as well as model provenance information. This project aims to separate trait data from provenance tracking, ensure that PEcAn can run without the PostgreSQL server that BETYdb currently requires, and enable flexible data sharing in place of a server-reliant sync mechanism. The goal is to make PEcAn workflows easier to test, deploy, and use while also making data more accessible.
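
To make that concrete, the sketch below contrasts the two modes of getting trait data into a workflow. It is purely illustrative: the table, column, ID, and file names are assumptions for demonstration, not the actual BETYdb schema or a planned interface.

```r
# Current state: trait and provenance data live in a PostgreSQL server
# (BETYdb), so even a small demo needs a running database.
# Table and column names below are illustrative, not the real schema.
con <- DBI::dbConnect(RPostgres::Postgres(), dbname = "bety", host = "localhost")
traits <- DBI::dbGetQuery(con, "SELECT * FROM traits WHERE specie_id = 938")
DBI::dbDisconnect(con)

# Target state: the same records come from a distributable local
# snapshot, with no server required (file name is hypothetical).
traits <- read.csv("bety_traits_snapshot.csv")
traits <- subset(traits, specie_id == 938)
```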
**Potential Directions**

- **Minimal BETYdb Database:** Create a simplified version of BETYdb for demonstrations and integration tests, which might include:
  - Review the provenance information we currently log, identify components that no longer need to be tracked or that should be temporary rather than permanent records, and build tools to clean unneeded or expired records from the database.
  - Design and create a freestanding version of the trait data, including choosing the format and distribution method, implementing whatever pipelines are needed to move the data over, and documenting how to use and update the result (see the sketch below).

- **Non-Database Setup:** Enable workflows that do not require PostgreSQL or a web front-end, potentially including:
  - Identify PEcAn modules that are still DB-dependent and refactor them to allow freestanding use.
  - Implement mechanisms for decoupling the DB from the model pipelines in time and space while still tracking provenance. Perhaps this could involve separate prep/execution/post-logging phases, but we encourage your creative suggestions.
  - Create tools that maximize interoperability with data from other sources, including external databases and the user's own observations.
  - Identify functionality from the "BETYdb network" sync system that is out of date and replace or remove it as needed.
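
As one concrete illustration of the freestanding trait data item above, a minimal export pipeline might pull the records out of BETYdb once and publish them as flat files. This is a sketch under assumed names: the connection details, SQL, and output paths are placeholders, and the real project would choose the format, the joins to metadata tables, and the distribution method.

```r
# Sketch: export BETYdb trait records to portable, server-free files.
# Connection details, SQL, and file names are illustrative placeholders.
con <- DBI::dbConnect(
  RPostgres::Postgres(),
  dbname = "bety", host = "localhost", user = "bety"
)

# A real export would join in species, site, variable, and citation
# metadata so the files can stand alone without the database.
traits <- DBI::dbGetQuery(con, "SELECT * FROM traits")
DBI::dbDisconnect(con)

# Publish in more than one widely readable format.
write.csv(traits, "bety_traits.csv", row.names = FALSE)
arrow::write_parquet(traits, "bety_traits.parquet")  # needs the arrow package
```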
**Expected outcomes**:

A successful project would complete a subset of the following tasks:

- A lightweight, distributable demo Postgres database.
- A distributable dataset of the existing trait and yield records in a maximally reusable format (i.e., maybe _not_ Postgres).
- A workflow that is independent of the Postgres database.
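
The database-independent workflow outcome is the one that touches PEcAn's own code most directly. One way to picture it, following the prep/execution/post-logging idea in the directions above, is a run that records its provenance locally and leaves synchronization with a shared database as a separate, optional step. All names below are hypothetical; none of these structures exist in PEcAn today.

```r
# Hypothetical sketch of a database-free run that still keeps provenance.
# The structures and file names only illustrate the idea of separate
# prep / execution / post-logging phases; they are not existing PEcAn code.
run_provenance <- list(
  workflow_id = "local-20240601-001",   # generated locally, not by the DB
  model       = "SIPNET",
  started_at  = Sys.time(),
  inputs      = list(traits = "bety_traits.parquet", met = "site_met.nc")
)

# ... prep inputs and run the model here, reading only local files ...

run_provenance$finished_at <- Sys.time()

# Post-logging: write the provenance record next to the model outputs.
# A later, optional step could push these records to a shared BETYdb.
jsonlite::write_json(run_provenance, "run_provenance.json", auto_unbox = TRUE)
```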
**Skills Required**:
- Familiarity with database concepts required
- Postgres experience helpful (and required if proposing DB cleanup tasks)
- R experience helpful (and required if proposing PEcAn code changes)

**Contact person:**

Chris Black (@infotroph)

**Duration:**

Suitable for a Medium (175 hr) or Large (350 hr) project.