Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Added Outfilling dialog #9381

Merged
merged 16 commits into from
Feb 24, 2025
Merged

Conversation

Vitalis95
Copy link
Contributor

Fixes #9363
@rdstern @lilyclements @jkmusyoka , please check the progress.

Omit Months is not working yet, and I have raised that in the issue.

@rdstern
Copy link
Collaborator

rdstern commented Jan 23, 2025

@Vitalis95 many thanks. I show the current dialog here to help @jkmusyoka and @lilyclements to contribute to the review:

image

And very well done. This looks really nice and neat. I suggest it is important also for our audience. I note you already have the automatic filling of the controls, when the data are defined as climatic. I suggest having the dialog - rather than just the script - is important in our strategy of involving the ZMD staff centrally in this stage of the research on outfilling. I suggest many staff could contribute well via the dialog route and it would be much harder for them otherwise.

Also:

a) @jkmusyoka I assume you have a version of the data from Eastern Province that we used for the outfilling. Can you add a copy here, so we can try it with the dialog?
b) @lilyclements I assume we do want to use "our" version of the Omit Months, (see the Rainfall QC dialog)?
This feature is important to get the testing working ok for Zambia. I am happy if we decide instead to have a simpler control - used in the current function, which uses the month numbers. But I am nervous that we may want to use this facility with the shifted data - James what did we do in Zambia in December? We could alternatively tell users this currently is only for non-shifted data, so easy to give month numbers as Lily does currently?
c) Lily, what did Emily say about the Station to Exclude control. Before asking Vitalis to improve it, I wonder on the reply as to whether it is needed at all - given we can use filters in R-Instat. Can we omit that?
d) Vitalis, one omission you could sort out now is that we need a Store Result control at the bottom- the same as in the Calculator and other dialogs. Lily, you are going to make the corresponding change in the function? (It currently makes a new data frame, despite just producing a single column of the same length. It will be so much easier to use when it just adds to the existing dataframe. What should the default name be? Maybe outfilled!
e) While you are doing that could you add a blank at the top of the Omit station control, - currently we have to omit a station.
Could you also add a numeric version of the Omit Months checkbox, on the left instead of "ours". If checked this opens a drop-down (same control as for your current bins) with 5,6,7,8,9 as the default and one can type numbers into it. Have 1, 2, 3, 11, 12 as a second and 1,2,12 as a third.
And in the drop down for bins, change 5 to 2 in the second option there.
f) Lily do we need a More button towards a sub-dialog that facilitates changing the other arguments, or can that wait?

@Vitalis95
Copy link
Contributor Author

@rdstern , sorted out e). I will add Store Result control once the function is changed.

@rdstern
Copy link
Collaborator

rdstern commented Jan 23, 2025

zambia_data_for outfilling2.zip

@Vitalis95 @lilyclements and @jkmusyoka this is wonderful. The dialog is working already!

a) I attach the data file for Eastern Province (first data frame) together with the results from 3 runs of the dialog (Tamsat, chirps and era5) :

image

Amazing it is working immediately. So full marks to Lily, for the outfillingR package. I installed it directly from the menu. (I'm still not clear whether the importing dialog needs the James adjustment - I just use the dialog.)

Then it ran first time!

Lily I still would far prefer it to produce a new column in the same data frame, rather than a whole new data frame.
And I still think it is important to have an option of a specified random number seed. The whole thing is comparisons, and this includes comparisons of the different methods - which should use the same sequences.

But should we merge this now, while we wait?

@Vitalis95
Copy link
Contributor Author

@rdstern , have a look at it

Copy link
Collaborator

@rdstern rdstern left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@Vitalis95 and @lilyclements that's wonderful - what I have tested. It seems to work.

Vitalis, the only trivial point is that the label Stations to Exclude is too long, and is therefore only incomplete. Could you please change it to Omit Stations.

@N-thony this is a major new facility. Can you please check so it can be merged?

@lilyclements I tried with the 3 methods and all seem to work on these data. (tamsat, chirps and era5.) I also tried era5 twice with the same initial random number seed and got the same answers. Brilliantly done therefore.

The only result I don't understand is that tamsat outfilling gave missing values when tamsta was missing. But the rainfall was not missing then. I thought that it would give the station rainfall except when that was missing. So now I'm not sure what the results are?

@lilyclements
Copy link
Contributor

lilyclements commented Jan 24, 2025

@Vitalis95 nice. I ran this for my data with custom_bins=c(1, 3, 5, 10, 15, 20) and got an error in the R code. I did not get this error with custom_bins=c(1, 5, 10, 15, 20). Since this is a problem with the R code, I will document it in the outfillingR repo here.

I do not understand the system well enough to know a suitable fix. But essentially what is happening is an NA is generated for the SD of rainfall in one month due to small bin sizes, and so it throws an error. Something to discuss with Emily, but @rdstern if you could take a look at the issue in case it is something you can understand then that would be very useful!

In terms of the dialog, this looks all good - except that one of the checkboxes is too small and cuts off the word "Exclude".

Roger Addition: Agreed with the last sentence - Could it be changed to the shorter Omit Stations

@rdstern
Copy link
Collaborator

rdstern commented Jan 27, 2025

@lilyclements did you see Emily's answer to my query above? Essentially that it might not yet be running the second part of the function? If that's the case, we should also check whether the current results are useful as part of the checking process. of how to adapt the different methods. We then leave the data for the main stations intact, which is what we want, when applying them to more stations.

@lilyclements
Copy link
Contributor

@rdstern yes, it sounds like something we can go through next week when we meet? I'm not sure if I will be able to come in person due to my ankle but I can certainly join on teams for the afternoon.

@Vitalis95
Copy link
Contributor Author

@rdstern , have a look at it
@N-thony , is it possible to set the prefix for the two columns produced at the same time, eg tamsat_est, and tamsat_out. ?

@N-thony
Copy link
Collaborator

N-thony commented Feb 10, 2025

@rdstern , have a look at it @N-thony , is it possible to set the prefix for the two columns produced at the same time, eg tamsat_est, and tamsat_out. ?

@Vitalis95 that should be easy in R. Can you check with @lilyclements?

@lilyclements
Copy link
Contributor

lilyclements commented Feb 10, 2025

@Vitalis95 how about:

infilled_data <- outfillingR::do_infilling(...)  # run this as you currently do (I'm not sure what you assign it to, but just put `infilled_data` as a hypothetical name for now

new_colnames <- colnames(infilled_data) 
colnames(infilled_data) <- paste0("tamsat", "_", new_colnames)

I believe the output data frame is just two columns - our outfilled_rain and generated_rain. I think this should work!

@Vitalis95
Copy link
Contributor Author

@lilyclements , Roger suggested that the two new columns should have the suffixes _est and _out. @rdstern, is that correct? Do we still need the save control?

@lilyclements
Copy link
Contributor

@Vitalis95 can you try this:

   infilled_data <- outfillingR::do_infilling(...)  # run this as you currently do (I'm not sure what you assign it to, but just put `infilled_data` as a hypothetical name for now
    colnames(infilled_data) <- c(paste0(rainfall_estimate_column, "_gen"), paste0(rainfall_estimate_column, "_out"))

Where rainfall_estimate_column is the variable name for the variable in the receiver for the rainfall estimate
And generated_weather is the name of the do_outfilling script.

Can the save control still do a prefix check to it - so we add a prefix to it if that column already exists?

@rdstern
Copy link
Collaborator

rdstern commented Feb 12, 2025

@lilyclements it still says the package is corrupt! Back to you!
@Vitalis95 Great that you have the button for the stations. Could the label be just Stations. The button could be smaller and just say Include. And a small tweak, is that if you tick the checkbox and then include none, then Ok is disabled. If easy, maybe the default could be that everything is ticked when you first use it?

@lilyclements
Copy link
Contributor

@rdstern I am not getting a corrupt error when downloading it. Can you try removing it completely by running this:

remove.packages("outfillingR")

Then try downloading it again?

@rdstern
Copy link
Collaborator

rdstern commented Feb 12, 2025

@Vitalis95 and @lilyclements I followed Lily's instruction above and it is now working!
I don't quite understand why - but that's not particularly important.
On this topic, if that is something that we may want to do in the future, then I wonder if:
a) We add a third button to the Tools > Install R-Package. Maybe called Remove?
b) The command remove.packages("outfillingR"). Gave zero output. It does give an error if I mis-type the name, but nicer if it said if it works?

Ok , but to the results. Here is a snapshot:

image

It looks sensible. I did 2 runs, tamsat is the tamsat original data. tamsat_ and tamsat_2 are the 2 estimates we had last time, while tamsat_1 and tamsat_21 are the outfilled variables.

Lily and Vitalis I thinking that early on we probably just want tamsat_ and tamsat_2 when we are checking which method to use. Then we don't need to outfill. Later, once it is working well we need the outfilled data, but maybe not the checking variables?
If you agree we should have two individual storing variables, each with a checkbox, and default checked. The checkbox labels could be Estimated and Outfilled and the default names maybe tam_est and tam_out. (what's the better word than estimated?) So with the first 3 letters from the Estimates receiver.

@Vitalis95
Copy link
Contributor Author

@rdstern , have a look at it

@rdstern
Copy link
Collaborator

rdstern commented Feb 14, 2025

@Vitalis95 it is looking very good.
a) Could you swap the filled names round please. So chirps.out goes with outfilled, and gen with generated!
b) The Stations part looks good, but there is a bit for you, and also maybe something for Lily to consider:

image

  1. The control still says Filter From Factor. Could it say Stations to Include here.
  2. The setting has no memory yet. I ran selecting 2 stations. When I returned the ncontrol was empty again.
  3. Only if easy. When you first go into the dialog could it start by including the first factor by default. Also if possible, by default, having all selected? And I'm very happy if that also applies to the Filter from Factor sub-dialog.

Now the @lilyclements question. There is currently no output to record what we have done. This is more general than just the Stations report. Could we have a summary of the settings, etc in the output window? It could also mention which variable names, in the data were produced. We quicly get a lot of columns to compare and it would be good to have a record of how each one was produced.

For @lilyclements I also hit the error she reported.

image

It is a nice message, but reminded me that you and Emily wqere going to tweak the code so the bin limits and somethoing else, were unneceassary?

When you do, then I wonder about bin sizes? I guess the more data you have the more bins you could allow and I assume (up to a limit) the more the merrier? And if the contents will be roughly equal then could we choose what the value will be, say between roughly 10% so 10 bins and maybe 20%? And maybe the top bin has half the others, because we need data there, but it is the most important?
Trying to tempt your discussion with Emily!

@lilyclements
Copy link
Contributor

@rdstern thanks for the reminder. I have just messaged Emily about it and we will hopefully meet next week

@rdstern
Copy link
Collaborator

rdstern commented Feb 14, 2025

@lilyclements I have also added to the comment above suggesting another task for you! That's for the function to add a report of the settings for each run into the output window! I hope that's easy.

@rdstern
Copy link
Collaborator

rdstern commented Feb 17, 2025

@Vitalis95 nice that it fills by default. And it remembers now too.
a) Oh no it doesn't remember. Maybe this is still work in progress? If I just select 2 stations and return, then all 5 stations are ticked again.
b) And the name of the sub-dialog is still Filters from Factors.
c) And it doesn't choose the first factor by default, when you first go in.

d) And just to mention another @lilyclements task. I remember Emily said it was very quick to execute in Python and you were going to see if there was anything obvious in the running of the R code?

@Vitalis95
Copy link
Contributor Author

@rdstern , if you want the first factor by default, then we make the first label by default be selected, not all selected as it is now.

Copy link
Collaborator

@rdstern rdstern left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@Vitalis95 looks great. I am approving.
@N-thony can you please check and merge.
@lilyclements you were going to examine why it takes so much longer than in python. I assume that can wait for other changes after you have discussed with Emily.

@N-thony N-thony merged commit b993868 into IDEMSInternational:master Feb 24, 2025
2 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Outfilling Dialog
4 participants