Added Outfilling dialog #9381

Vitalis95 · 2025-01-22T07:15:04Z

Fixes #9363
@rdstern @lilyclements @jkmusyoka , please check the progress.

Omit Months is not working yet, and I have raised that in the issue.

rdstern · 2025-01-23T07:30:40Z

@Vitalis95 many thanks. I show the current dialog here to help @jkmusyoka and @lilyclements to contribute to the review:

And very well done. This looks really nice and neat. I suggest it is important also for our audience. I note you already have the automatic filling of the controls, when the data are defined as climatic. I suggest having the dialog - rather than just the script - is important in our strategy of involving the ZMD staff centrally in this stage of the research on outfilling. I suggest many staff could contribute well via the dialog route and it would be much harder for them otherwise.

Also:

a) @jkmusyoka I assume you have a version of the data from Eastern Province that we used for the outfilling. Can you add a copy here, so we can try it with the dialog?
b) @lilyclements I assume we do want to use "our" version of the Omit Months, (see the Rainfall QC dialog)?
This feature is important to get the testing working ok for Zambia. I am happy if we decide instead to have a simpler control - used in the current function, which uses the month numbers. But I am nervous that we may want to use this facility with the shifted data - James what did we do in Zambia in December? We could alternatively tell users this currently is only for non-shifted data, so easy to give month numbers as Lily does currently?
c) Lily, what did Emily say about the Station to Exclude control. Before asking Vitalis to improve it, I wonder on the reply as to whether it is needed at all - given we can use filters in R-Instat. Can we omit that?
d) Vitalis, one omission you could sort out now is that we need a Store Result control at the bottom- the same as in the Calculator and other dialogs. Lily, you are going to make the corresponding change in the function? (It currently makes a new data frame, despite just producing a single column of the same length. It will be so much easier to use when it just adds to the existing dataframe. What should the default name be? Maybe outfilled!
e) While you are doing that could you add a blank at the top of the Omit station control, - currently we have to omit a station.
Could you also add a numeric version of the Omit Months checkbox, on the left instead of "ours". If checked this opens a drop-down (same control as for your current bins) with 5,6,7,8,9 as the default and one can type numbers into it. Have 1, 2, 3, 11, 12 as a second and 1,2,12 as a third.
And in the drop down for bins, change 5 to 2 in the second option there.
f) Lily do we need a More button towards a sub-dialog that facilitates changing the other arguments, or can that wait?

Vitalis95 · 2025-01-23T10:19:55Z

@rdstern , sorted out e). I will add Store Result control once the function is changed.

rdstern · 2025-01-23T11:56:40Z

zambia_data_for outfilling2.zip

@Vitalis95 @lilyclements and @jkmusyoka this is wonderful. The dialog is working already!

a) I attach the data file for Eastern Province (first data frame) together with the results from 3 runs of the dialog (Tamsat, chirps and era5) :

Amazing it is working immediately. So full marks to Lily, for the outfillingR package. I installed it directly from the menu. (I'm still not clear whether the importing dialog needs the James adjustment - I just use the dialog.)

Then it ran first time!

Lily I still would far prefer it to produce a new column in the same data frame, rather than a whole new data frame.
And I still think it is important to have an option of a specified random number seed. The whole thing is comparisons, and this includes comparisons of the different methods - which should use the same sequences.

But should we merge this now, while we wait?

Vitalis95 · 2025-01-24T11:12:56Z

@rdstern , have a look at it

rdstern

@Vitalis95 and @lilyclements that's wonderful - what I have tested. It seems to work.

Vitalis, the only trivial point is that the label Stations to Exclude is too long, and is therefore only incomplete. Could you please change it to Omit Stations.

@N-thony this is a major new facility. Can you please check so it can be merged?

@lilyclements I tried with the 3 methods and all seem to work on these data. (tamsat, chirps and era5.) I also tried era5 twice with the same initial random number seed and got the same answers. Brilliantly done therefore.

The only result I don't understand is that tamsat outfilling gave missing values when tamsta was missing. But the rainfall was not missing then. I thought that it would give the station rainfall except when that was missing. So now I'm not sure what the results are?

lilyclements · 2025-01-24T11:54:20Z

@Vitalis95 nice. I ran this for my data with custom_bins=c(1, 3, 5, 10, 15, 20) and got an error in the R code. I did not get this error with custom_bins=c(1, 5, 10, 15, 20). Since this is a problem with the R code, I will document it in the outfillingR repo here.

I do not understand the system well enough to know a suitable fix. But essentially what is happening is an NA is generated for the SD of rainfall in one month due to small bin sizes, and so it throws an error. Something to discuss with Emily, but @rdstern if you could take a look at the issue in case it is something you can understand then that would be very useful!

In terms of the dialog, this looks all good - except that one of the checkboxes is too small and cuts off the word "Exclude".

Roger Addition: Agreed with the last sentence - Could it be changed to the shorter Omit Stations

rdstern · 2025-01-27T16:21:50Z

@lilyclements did you see Emily's answer to my query above? Essentially that it might not yet be running the second part of the function? If that's the case, we should also check whether the current results are useful as part of the checking process. of how to adapt the different methods. We then leave the data for the main stations intact, which is what we want, when applying them to more stations.

lilyclements · 2025-01-27T18:37:25Z

@rdstern yes, it sounds like something we can go through next week when we meet? I'm not sure if I will be able to come in person due to my ankle but I can certainly join on teams for the afternoon.

Vitalis95 · 2025-02-10T13:18:43Z

@rdstern , have a look at it
@N-thony , is it possible to set the prefix for the two columns produced at the same time, eg tamsat_est, and tamsat_out. ?

N-thony · 2025-02-10T15:07:11Z

@rdstern , have a look at it @N-thony , is it possible to set the prefix for the two columns produced at the same time, eg tamsat_est, and tamsat_out. ?

@Vitalis95 that should be easy in R. Can you check with @lilyclements?

lilyclements · 2025-02-10T15:51:44Z

@Vitalis95 how about:

infilled_data <- outfillingR::do_infilling(...)  # run this as you currently do (I'm not sure what you assign it to, but just put `infilled_data` as a hypothetical name for now

new_colnames <- colnames(infilled_data) 
colnames(infilled_data) <- paste0("tamsat", "_", new_colnames)

I believe the output data frame is just two columns - our outfilled_rain and generated_rain. I think this should work!

Vitalis95 · 2025-02-11T08:29:59Z

@lilyclements , Roger suggested that the two new columns should have the suffixes _est and _out. @rdstern, is that correct? Do we still need the save control?

lilyclements · 2025-02-11T12:00:15Z

@Vitalis95 can you try this:

   infilled_data <- outfillingR::do_infilling(...)  # run this as you currently do (I'm not sure what you assign it to, but just put `infilled_data` as a hypothetical name for now
    colnames(infilled_data) <- c(paste0(rainfall_estimate_column, "_gen"), paste0(rainfall_estimate_column, "_out"))

Where rainfall_estimate_column is the variable name for the variable in the receiver for the rainfall estimate
And generated_weather is the name of the do_outfilling script.

Can the save control still do a prefix check to it - so we add a prefix to it if that column already exists?

rdstern · 2025-02-12T08:27:13Z

@lilyclements it still says the package is corrupt! Back to you!
@Vitalis95 Great that you have the button for the stations. Could the label be just Stations. The button could be smaller and just say Include. And a small tweak, is that if you tick the checkbox and then include none, then Ok is disabled. If easy, maybe the default could be that everything is ticked when you first use it?

lilyclements · 2025-02-12T14:29:59Z

@rdstern I am not getting a corrupt error when downloading it. Can you try removing it completely by running this:

remove.packages("outfillingR")

Then try downloading it again?

rdstern · 2025-02-12T16:38:34Z

@Vitalis95 and @lilyclements I followed Lily's instruction above and it is now working!
I don't quite understand why - but that's not particularly important.
On this topic, if that is something that we may want to do in the future, then I wonder if:
a) We add a third button to the Tools > Install R-Package. Maybe called Remove?
b) The command remove.packages("outfillingR"). Gave zero output. It does give an error if I mis-type the name, but nicer if it said if it works?

Ok , but to the results. Here is a snapshot:

It looks sensible. I did 2 runs, tamsat is the tamsat original data. tamsat_ and tamsat_2 are the 2 estimates we had last time, while tamsat_1 and tamsat_21 are the outfilled variables.

Lily and Vitalis I thinking that early on we probably just want tamsat_ and tamsat_2 when we are checking which method to use. Then we don't need to outfill. Later, once it is working well we need the outfilled data, but maybe not the checking variables?
If you agree we should have two individual storing variables, each with a checkbox, and default checked. The checkbox labels could be Estimated and Outfilled and the default names maybe tam_est and tam_out. (what's the better word than estimated?) So with the first 3 letters from the Estimates receiver.

Vitalis95 · 2025-02-14T09:20:10Z

@rdstern , have a look at it

rdstern · 2025-02-14T10:29:00Z

@Vitalis95 it is looking very good.
a) Could you swap the filled names round please. So chirps.out goes with outfilled, and gen with generated!
b) The Stations part looks good, but there is a bit for you, and also maybe something for Lily to consider:

The control still says Filter From Factor. Could it say Stations to Include here.
The setting has no memory yet. I ran selecting 2 stations. When I returned the ncontrol was empty again.
Only if easy. When you first go into the dialog could it start by including the first factor by default. Also if possible, by default, having all selected? And I'm very happy if that also applies to the Filter from Factor sub-dialog.

Now the @lilyclements question. There is currently no output to record what we have done. This is more general than just the Stations report. Could we have a summary of the settings, etc in the output window? It could also mention which variable names, in the data were produced. We quicly get a lot of columns to compare and it would be good to have a record of how each one was produced.

For @lilyclements I also hit the error she reported.

It is a nice message, but reminded me that you and Emily wqere going to tweak the code so the bin limits and somethoing else, were unneceassary?

When you do, then I wonder about bin sizes? I guess the more data you have the more bins you could allow and I assume (up to a limit) the more the merrier? And if the contents will be roughly equal then could we choose what the value will be, say between roughly 10% so 10 bins and maybe 20%? And maybe the top bin has half the others, because we need data there, but it is the most important?
Trying to tempt your discussion with Emily!

lilyclements · 2025-02-14T10:40:05Z

@rdstern thanks for the reminder. I have just messaged Emily about it and we will hopefully meet next week

rdstern · 2025-02-14T10:57:19Z

@lilyclements I have also added to the comment above suggesting another task for you! That's for the function to add a report of the settings for each run into the output window! I hope that's easy.

rdstern · 2025-02-17T11:01:36Z

@Vitalis95 nice that it fills by default. And it remembers now too.
a) Oh no it doesn't remember. Maybe this is still work in progress? If I just select 2 stations and return, then all 5 stations are ticked again.
b) And the name of the sub-dialog is still Filters from Factors.
c) And it doesn't choose the first factor by default, when you first go in.

d) And just to mention another @lilyclements task. I remember Emily said it was very quick to execute in Python and you were going to see if there was anything obvious in the running of the R code?

update

Vitalis95 · 2025-02-18T13:06:05Z

@rdstern , if you want the first factor by default, then we make the first label by default be selected, not all selected as it is now.

rdstern

@Vitalis95 looks great. I am approving.
@N-thony can you please check and merge.
@lilyclements you were going to examine why it takes so much longer than in python. I assume that can wait for other changes after you have discussed with Emily.

Vitalis95 added 4 commits January 17, 2025 15:43

changes to the code

e0c8735

changes

4c522e5

changes

a886cd6

changes

86f2eb6

rdstern added the Climatic label Jan 23, 2025

changes to the code

7b33222

Vitalis95 added 2 commits January 24, 2025 14:10

changes to the code

68dff72

changes

c32edf6

rdstern reviewed Jan 24, 2025

View reviewed changes

Vitalis95 added 3 commits January 30, 2025 10:56

changes to the code

dde7391

changes

1674e85

changes

384c4dd

changes

fa6e141

changes to the code

b75ed71

changes to the code

85a4026

Vitalis95 and others added 3 commits February 18, 2025 15:53

changes

b2e5bd5

Merge pull request #319 from IDEMSInternational/master

f9f2769

update

changes

74c438e

rdstern approved these changes Feb 18, 2025

View reviewed changes

N-thony approved these changes Feb 24, 2025

View reviewed changes

N-thony merged commit b993868 into IDEMSInternational:master Feb 24, 2025
2 checks passed

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Added Outfilling dialog #9381

Added Outfilling dialog #9381

Vitalis95 commented Jan 22, 2025

rdstern commented Jan 23, 2025

Vitalis95 commented Jan 23, 2025

rdstern commented Jan 23, 2025

Vitalis95 commented Jan 24, 2025

rdstern left a comment

lilyclements commented Jan 24, 2025 •

edited by rdstern

Loading

rdstern commented Jan 27, 2025

lilyclements commented Jan 27, 2025

Vitalis95 commented Feb 10, 2025

N-thony commented Feb 10, 2025

lilyclements commented Feb 10, 2025 •

edited

Loading

Vitalis95 commented Feb 11, 2025

lilyclements commented Feb 11, 2025

rdstern commented Feb 12, 2025 •

edited

Loading

lilyclements commented Feb 12, 2025

rdstern commented Feb 12, 2025

Vitalis95 commented Feb 14, 2025

rdstern commented Feb 14, 2025 •

edited

Loading

lilyclements commented Feb 14, 2025

rdstern commented Feb 14, 2025

rdstern commented Feb 17, 2025

Vitalis95 commented Feb 18, 2025

rdstern left a comment

Added Outfilling dialog #9381

Added Outfilling dialog #9381

Conversation

Vitalis95 commented Jan 22, 2025

rdstern commented Jan 23, 2025

Vitalis95 commented Jan 23, 2025

rdstern commented Jan 23, 2025

Vitalis95 commented Jan 24, 2025

rdstern left a comment

Choose a reason for hiding this comment

lilyclements commented Jan 24, 2025 • edited by rdstern Loading

rdstern commented Jan 27, 2025

lilyclements commented Jan 27, 2025

Vitalis95 commented Feb 10, 2025

N-thony commented Feb 10, 2025

lilyclements commented Feb 10, 2025 • edited Loading

Vitalis95 commented Feb 11, 2025

lilyclements commented Feb 11, 2025

rdstern commented Feb 12, 2025 • edited Loading

lilyclements commented Feb 12, 2025

rdstern commented Feb 12, 2025

Vitalis95 commented Feb 14, 2025

rdstern commented Feb 14, 2025 • edited Loading

lilyclements commented Feb 14, 2025

rdstern commented Feb 14, 2025

rdstern commented Feb 17, 2025

Vitalis95 commented Feb 18, 2025

rdstern left a comment

Choose a reason for hiding this comment

lilyclements commented Jan 24, 2025 •

edited by rdstern

Loading

lilyclements commented Feb 10, 2025 •

edited

Loading

rdstern commented Feb 12, 2025 •

edited

Loading

rdstern commented Feb 14, 2025 •

edited

Loading