Commit c971cd3

Merge pull request #73 from PecanProject/gsoc2025
[WIP] Update 2025 GSOC ideas

2 parents: 72c5d87 + 9fc6daa

File tree: 1 file changed, +63 -85 lines

src/pages/gsoc_ideas.mdx (63 additions, 85 deletions)

@@ -1,102 +1,127 @@
 ---
-title: 'GSoC 2024 - PEcAn Project Ideas'
+title: 'GSoC 2025 - PEcAn Project Ideas'
 ---

 # [GSoC - PEcAn Project Ideas](#background)

-Ecosystem science has many components, so does PEcAn! Some of those components where you can contribute. Below is a list of potential ideas. Feel free to contact any of the mentors in slack, or feel free to ask questions in our #gsoc-2024 channel in slack.
+Ecosystem science has many components, and so does PEcAn! There are many components where you can contribute. Below is a list of potential ideas. Feel free to contact any of the mentors on Slack, or ask questions in our #gsoc-2025 channel.

 ---

 ## [Project Ideas](#ideas)

-Following is a list of project ideas, use this list to contact the appropriate mentors on slack. Feel free to propose your own ideas as well, in this case contact @kooper in slack so he can put you in contact with the best mentors.
+Following is a list of project ideas; use it to contact the appropriate mentors on Slack. Feel free to propose your own ideas as well; in that case, contact @kooper on Slack so he can put you in contact with the best mentors.

 ---

-#### [Machine Learning downscaling of PEcAn outputs](#ml)
+#### [Global sensitivity analysis / uncertainty partitioning](#sa)

-This project would extend an existing prototype that takes ensemble-based outputs from the process-based PEcAn models (and the data assimilation code in particular) and use ML models to make predictions to new locations where the PEcAn models were not run (a.k.a. downscaling). Existing code downscales the low-frequency (monthly to annual) carbon pool outputs using a random forest model and a harmonized stack of gridded spatial data (climate, land use/land cover, soils, topography). The current system also preserves the covariance structure across variables, space, and time by downscaling each model ensemble member separately and then using the downscaled ensemble to calculate summary statistics. Also included are some basic assessments of (cross-)validation skill and variable importance.
+This project would extend PEcAn's existing uncertainty partitioning routines, which are primarily one-at-a-time and focused on model parameters, to also consider ensemble-based uncertainties in other model inputs (meteorology, soils, vegetation, phenology, etc.). This project would employ Sobol' methods; some uncommitted code exists that manually prototypes how this would be done in PEcAn. The goal would be to refactor and reimplement this prototype into a reliable, automated system and apply it to some key test cases in both natural and managed ecosystems.

-**Expected outcome:**
+
+**Expected outcomes:**

 A successful project would complete a subset of the following tasks:

-1. Extend the code to downscale higher-frequency (hourly to daily) carbon flux outputs
-2. Develop tools for aggregating downscaled outputs to user-specified spatial units (e.g., political boundaries, atmospheric model grid cells)
-3. Explore alternative ML models and multi-model ensembles.
-4. Extend the set of covariate data to make use of time-varying inputs (e.g. that year’s weather rather than the climatological mean), additional remotely sensed observations, and the previous ecosystem state.
-5. Improving the downscaling validation checks, potentially adding additional corrections to the computed uncertainties (current prototype tool tends to underpredict the ensemble spread).
+* Reliable, automated sensitivity analysis and uncertainty partitioning
+* Applications to test case(s) in natural and/or managed ecosystems.

 **Prerequisites:**

-- Required: R (existing prototype is in R); basic familiarity with ML techniques and packages
-- Helpful: familiarity with large spatial gridded data (e.g., GIS, R terra, remote sensing); more advanced statistics, ML, or data science; Python
+- Required: R (existing workflow and prototype is in R)
+- Helpful: familiarity with sensitivity analyses
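To illustrate the Sobol' variance decomposition this idea refers to, here is a minimal Monte Carlo estimator of first-order Sobol' indices using the Saltelli A/B-matrix scheme. This is an editor's sketch, not PEcAn code: it is written in Python for self-containment (PEcAn's own workflow and prototype are in R), and the toy model is a hypothetical stand-in for a process model output.

```python
import random

def sobol_first_order(model, n_vars, n_samples=20000, seed=1):
    """Estimate first-order Sobol' indices S_i = V_i / V(Y) for a model
    with independent Uniform(0,1) inputs (Saltelli A/B-matrix scheme)."""
    rng = random.Random(seed)
    A = [[rng.random() for _ in range(n_vars)] for _ in range(n_samples)]
    B = [[rng.random() for _ in range(n_vars)] for _ in range(n_samples)]
    fA = [model(x) for x in A]
    fB = [model(x) for x in B]
    mean = sum(fA) / n_samples
    var = sum((y - mean) ** 2 for y in fA) / n_samples
    S = []
    for i in range(n_vars):
        # AB_i: rows of A with column i swapped in from B
        fABi = [model(a[:i] + [b[i]] + a[i + 1:]) for a, b in zip(A, B)]
        # Saltelli (2010) estimator: V_i ~ mean of f(B) * (f(AB_i) - f(A))
        Vi = sum(fb * (fi - fa) for fb, fa, fi in zip(fB, fA, fABi)) / n_samples
        S.append(Vi / var)
    return S

# Hypothetical linear toy model: input 1 dominates the output variance
# (analytically S = [16, 1, 0.25] / 17.25, so S[0] is about 0.93).
S = sobol_first_order(lambda x: 4.0 * x[0] + x[1] + 0.5 * x[2], n_vars=3)
```

The same estimator applies unchanged when the "model" is a wrapper that perturbs one ensemble input stream (meteorology, soils, phenology, ...) per column; partitioning uncertainty then amounts to comparing the resulting indices.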

 **Contact person:**
+
 Mike @Dietze

 **Duration:**
-Size: 175 hours for 1-2 tasks, 350 hours for 3 or more tasks
+
+Flexible to work as either a Medium (175 hr) or Large (350 hr)

 **Difficulty:**
+
 Medium

 ---

-#### [Adopting data schema for field management events](#management)
+#### [Parallelization of runs](#hpc)

-This project aims to adapt a data schema for an R shiny application called fieldactivity. Fieldactivity is an application that allows field operators and researchers to enter field information about management activities through UI to aid bookkeeping of such events. The management activities and associated information are then stored in json files from which the information can be used for modelling.
+This project would extend PEcAn's existing run mechanisms to run on an HPC system using Apptainer. For uncertainty analysis, PEcAn performs thousands of runs of the same model with small perturbations; this is a perfect use case for an HPC. The goal is not to submit thousands of jobs, but to have a single job with multiple nodes that runs all of the ensemble members efficiently. Execution can be orchestrated using RabbitMQ, but other methods are encouraged as well. The end goal is for the PEcAn system to be launched and to run the full workflow on the HPC from start to finish, leveraging as many nodes as are given at submission.

-The fieldactivity application uses UI elements that are created with RShiny and therefore follows the R coding conventions. At the moment, to meet these R coding criteria, the data structure is read from a json file called ui_structure_json, which contains the necessary attributes to create the UI with R. As this json file is independent and does not communicate with any other data sources, it must be manually updated if the data requirements are to be kept up to date with other data sources. To overcome the potential differences between the data sources, we have created a json data schema ([management-event.schema.json](https://github.com/hamk-uas/fieldobservatory-data-schemas/blob/main/management-event.schema.json)) to act as a single source of truth for different data sources. The GSoC task is to incorporate this schema into the fieldactivity shiny app such that it can read the variable information from the schema and store the data in the correct structure. In addition, the app should be made flexible such that when a change is made to the json schema, it can deploy and change / create UI elements accordingly on the fly. To achieve this, the functionalities around how the applications store the data need to be reconstructed.
+**Expected outcomes:**

-**Expected outcome:**
-
-The project can be divided to following subtasks:
+A successful project would complete a subset of the following tasks:

-1. The fieldactivity application will be able to handle/read the data, which have been stored in the current way or structured according to the management data schema.
-2. The data storage convention will be changed for those management cases, where it is possible to store multiple incidents at once. Currently these cases are stored in a list in a format that the data schema doesn’t support.
-3. Include the data schema as part of the fieldactivity code:
-   - Variable names and metadata are read from the data schema. This also requires translation of the data schema information so that UI elements can be created in R Shiny.
-   - Stored data follows the structure and the names of the data schema.
+* Show different ways to launch the jobs (RabbitMQ, lock files, simple round robin, etc.)
+* Report on the different options and how they can be enabled

 **Prerequisites:**

-- Required: R and RShiny, json
+- Required: R (existing workflow and prototype is in R), Docker
+- Helpful: familiarity with HPC and Apptainer
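To illustrate the "simple round robin" option mentioned above, here is an editor's sketch (in Python; the function name is hypothetical, not from the PEcAn codebase) of how a single multi-node HPC job could deal ensemble members out to its nodes instead of submitting one job per member.

```python
def assign_round_robin(ensemble_ids, n_nodes):
    """Deal ensemble member IDs out to nodes: member i goes to node
    i % n_nodes, so every node gets an even share of the runs."""
    buckets = {node: [] for node in range(n_nodes)}
    for i, run_id in enumerate(ensemble_ids):
        buckets[i % n_nodes].append(run_id)
    return buckets

# Hypothetical example: 1000 ensemble members over 8 nodes. Inside one
# HPC job, each node would then launch its own share of model runs
# (e.g. inside an Apptainer container) instead of 1000 separate jobs.
buckets = assign_round_robin(list(range(1000)), 8)
```

A RabbitMQ-based alternative would replace the static buckets with a shared work queue that nodes pull from, which balances load better when individual runs have uneven runtimes.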

 **Contact person:**
-Henri Kajasilta
+
+Rob @Kooper

 **Duration:**
-Flexible to work as either a Small (175hr) or Large (350 hr)
+
+Flexible to work as either a Medium (175 hr) or Large (350 hr)

 **Difficulty:**
+
 Medium

 ---
+#### [Database Improvements](#db)
+
+**Chris TODO**
+- decouple traits from provenance
+- make betydb.org data available through R package
+
+
+**Contact person:**
+Chris Black (@infotroph)
+
+**Duration:**
+Flexible to work as either a Medium (175 hr) or Large (350 hr)
+
+**Difficulty:**
+Medium, Large
+
+---
+
+#### [Development of Notebook-based PEcAn Workflows](#notebook)

-#### [PEcAn Code Hardening by Integration Testing](#testing)
+The PEcAn workflow is currently run using either a web-based user interface, an API, or custom R scripts. The web-based user interface is easiest to use but has limited functionality, whereas the custom R scripts and API are more flexible but require more experience.

-The proposed project aims to enhance the reliability of PEcAn's integration tests by prioritizing packages associated with overall workflow bottlenecks. The focus will be on preparing contributors to gain an in-depth understanding of PEcAn's inner workings and the interactions between modules. It will commence with prioritizing basic runs to establish a robust foundation that include single site, single model runs to cover the major models. Subsequently, attention will shift towards ensemble runs, diversifying testing scenarios to ensure comprehensive coverage. A specific emphasis will be placed on Data Simulation models for single site, single model runs, with a focus on prominent models. This initiative aims to provide contributors with a holistic perspective on PEcAn's functionality, fostering a deeper understanding of how individual modules contribute to the overall workflow. By combining these elements, the GSoC project seeks to create a structured and immersive learning experience that equips participants to contribute effectively to PEcAn's development while addressing critical workflow bottlenecks.
+This project will focus on building Quarto workflows aimed at providing an interface to PEcAn that is both welcoming to new users and flexible enough to be a starting point for more advanced users. It will build on existing [Pull Request 1733](https://github.com/PecanProject/pecan/pull/1733).

 **Expected outcome:**

-- Increased module and model coverage in PEcAn’s automated integration tests; contributors can understand which components are and are not covered by existing tests.
+- Two or more template workflows for running the PEcAn workflow, plus a written vignette and video tutorial introducing their use.

 **Prerequisites:**

-- R
+- Familiarity with R. Familiarity with RStudio and Quarto or R Markdown is a plus.

 **Contact person:**
-Chris Black (@infotroph), Shashank Singh (@moki1202)
+David LeBauer @dlebauer, Nihar Sanda @koolgax99

 **Duration:**
-Flexible to work as either a Small (175hr) or Large (350 hr)
+Medium (175 hr)

 **Difficulty:**
-Medium, Large
+Medium

----
+
+
+<!--
+
+
+# This comment section for ideas that may be potentially viable in future (with revision)

 #### [Optimize PEcAn for freestanding use of single packages [R package development]](#freestanding)

@@ -124,12 +149,11 @@ Flexible to work as either a Small (175hr) or Large (350 hr)

 **Difficulty:**
 Medium, Large
-
 ---

 #### [PEcAn model coupling and development [Data Science]](#coupling)

-PEcAn has the capability to interface multiple ecological models. The goal of this project is to improve the coupling of existing models to PEcAn (specifically FATES) and add new models (specifically a simple vegetation model that is under development). It is also possible to contribute to the development of the simple vegetation model which is written in fortran.
+PEcAn has the capability to interface with multiple ecological models. The goal of this project is to improve the coupling of existing models to PEcAn (specifically FATES) and to add new models (specifically a simple vegetation model that is under development). It is also possible to contribute to the development of the simple vegetation model, which is written in Fortran.

 **Expected outcome:**

@@ -149,51 +173,5 @@ Flexible to work as either a Small (175hr) or Large (350 hr)
 Medium

 ---
+-->

-#### [Development of Notebook-based PEcAn Workflows](#notebook)
-
-The PEcAn workflow is currently run using either a web based user interface, an API, or custom R scripts. The web based user interface is easiest to use, but has limited functionality whereas the custom R scripts and API are more flexible, but require more experience.
-
-This project will focus on building Quarto workflows aimed at providing an interface to PEcAn that is both welcoming to new users and flexible enough to be a starting point for more advanced users. It will build on existing [Pull Request 1733](https://github.com/PecanProject/pecan/pull/1733).
-
-**Expected outcome:**
-
-- Two or more template workflows for running the PEcAn workflow. Written vignette and video tutorial introducing their use.
-
-**Prerequisites:**
-
-- Familiarity with R. Familiarity with R studio and Quarto or Rmarkdown is a plus.
-
-**Contact person:**
-David LeBauer @dlebauer, Nihar Sanda @koolgax99
-
-**Duration:**
-Small (175hr)
-
-**Difficulty:**
-Medium
-
----
-
-#### [PEcAn in the cloud](#cloud)
-
-The PEcAn system is a complex system with many microservices such as the database system, frontend, models, job management etc. These microservices lend themselves to be deployed in the cloud. We have an existing helm chart that should get you most of the way there and should allow you to deploy pecan on kubernetes. Additionally there is a docker-compose file that should allow you to deploy PEcAn on a single server using docker.
-
-This project will take the helm chart and docker-compose files and harden them and upgrade them to use the latest versions of containers. The current system uses the shared folder not only to deploy data in all services, but also uses it to let the central system know when executions are finished. We would like to move away from this shared system and use the message system to indicate executions are done, and use a file service to pull and push data (for example from/to S3).
-
-**Expected outcome:**
-
-- Updates to docker-compose and helm chart, as well as code submissions to mark executions as finished using RabbitMQ and file push/pull functionality when executing jobs.
-
-**Prerequisites:**
-
-- Familiarity with Kubernetes, Docker, Helm and R. Familiarity with RabbitMQ and postgreSQL is a plus
-
-**Contact person:**
-Rob Kooper @kooper, Samu Varjonen @samu, Istem Fer @istfer
-
-**Duration:**
-Large (350 hr)
-
-**Difficulty:**
-Medium
