Added Outfilling dialog #9381
@Vitalis95 many thanks. I show the current dialog here to help @jkmusyoka and @lilyclements contribute to the review. Very well done: this looks really nice and neat, which I suggest is also important for our audience. I note you already have the automatic filling of the controls when the data are defined as climatic. I suggest that having the dialog, rather than just the script, is important in our strategy of involving the ZMD staff centrally in this stage of the research on outfilling. Many staff could contribute well via the dialog route, and it would be much harder for them otherwise. Also: a) @jkmusyoka I assume you have a version of the data from Eastern Province that we used for the outfilling. Can you add a copy here, so we can try it with the dialog?
@rdstern, sorted out
zambia_data_for outfilling2.zip @Vitalis95 @lilyclements and @jkmusyoka this is wonderful. The dialog is working already! a) I attach the data file for Eastern Province (first data frame) together with the results from 3 runs of the dialog (tamsat, chirps and era5). Amazingly, it worked immediately. So full marks to Lily for the outfillingR package. I installed it directly from the menu. (I'm still not clear whether the importing dialog needs the James adjustment; I just use the dialog.) Then it ran first time! Lily, I would still far prefer it to produce a new column in the same data frame, rather than a whole new data frame. But should we merge this now, while we wait?
@rdstern, have a look at it
@Vitalis95 and @lilyclements that's wonderful. What I have tested seems to work.
Vitalis, the only trivial point is that the label Stations to Exclude is too long and is therefore cut off. Could you please change it to Omit Stations?
@N-thony this is a major new facility. Can you please check so it can be merged?
@lilyclements I tried with the 3 methods and all seem to work on these data. (tamsat, chirps and era5.) I also tried era5 twice with the same initial random number seed and got the same answers. Brilliantly done therefore.
The only result I don't understand is that the tamsat outfilling gave missing values when tamsat was missing, even though the rainfall was not missing then. I thought that it would give the station rainfall except when that was missing. So now I'm not sure what the results are.
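To make the expected behaviour concrete, here is a minimal sketch of the rule described above: keep the station rainfall wherever it is present and fall back to the estimate only where the station value is missing. This is an illustration in Python, not the outfillingR code; the function name `outfill` and the sample values are hypothetical.

```python
import math


def is_missing(x):
    """Treat None and NaN as missing values."""
    return x is None or (isinstance(x, float) and math.isnan(x))


def outfill(station, estimate):
    """Keep the station value where present, otherwise use the estimate.

    Hypothetical helper: it only illustrates the expected rule and is
    not how outfillingR is implemented.
    """
    return [e if is_missing(s) else s for s, e in zip(station, estimate)]


# Illustrative values only: station rainfall and a satellite-based estimate.
station = [0.0, None, 12.4, None]
estimate = [0.3, 5.1, None, None]

filled = outfill(station, estimate)
print(filled)
```

Under this rule the third value stays at the station rainfall even though the estimate is missing, which is the case queried above. Only the last value, where both are missing, remains missing.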
@Vitalis95 nice. I ran this for my data with I do not understand the system well enough to know a suitable fix. But essentially what is happening is an In terms of the dialog, this looks all good - except that one of the checkboxes is too small and cuts off the word "Exclude". Roger Addition: Agreed with the last sentence - Could it be changed to the shorter
@lilyclements did you see Emily's answer to my query above? Essentially that it might not yet be running the second part of the function? If that's the case, we should also check whether the current results are useful as part of the checking process of how to adapt the different methods. We then leave the data for the main stations intact, which is what we want, when applying them to more stations.
@rdstern yes, it sounds like something we can go through next week when we meet? I'm not sure if I will be able to come in person due to my ankle but I can certainly join on Teams for the afternoon.
@Vitalis95 that should be easy in R. Can you check with @lilyclements?
@Vitalis95 how about:
I believe the output data frame is just two columns - our
@lilyclements, Roger suggested that the two new columns should have the suffixes _est and _out. @rdstern, is that correct? Do we still need the save control?
@Vitalis95 can you try this:
Where Can the save control still do a prefix check to it - so we add a prefix to it if that column already exists?
@lilyclements it still says the package is corrupt! Back to you!
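As a rough illustration of the name check asked about here, the sketch below returns the requested column name if it is free and otherwise appends a number until the name is unused. Appending a numeric suffix is an assumption, since the thread does not specify the scheme. The function name `unique_column_name` and the column names are hypothetical, and this is Python rather than the dialog's own code.

```python
def unique_column_name(name, existing):
    """Return `name` if unused, otherwise `name` plus the first free number.

    Hypothetical helper illustrating a "does this column already exist?"
    check; the numeric-suffix scheme is an assumption.
    """
    if name not in existing:
        return name
    i = 1
    while f"{name}{i}" in existing:
        i += 1
    return f"{name}{i}"


# Illustrative column names only.
columns = ["station", "date", "rain", "rain_out"]

first = unique_column_name("rain_est", columns)   # free, so kept as is
second = unique_column_name("rain_out", columns)  # taken, so a number is added
print(first, second)
```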
@rdstern I am not getting a corrupt error when downloading it. Can you try removing it completely by running this:
Then try downloading it again?
@Vitalis95 and @lilyclements I followed Lily's instruction above and it is now working! OK, but on to the results. Here is a snapshot: It looks sensible. I did 2 runs. tamsat is the original tamsat data. tamsat_ and tamsat_2 are the 2 estimates we had last time, while tamsat_1 and tamsat_21 are the outfilled variables. Lily and Vitalis, I am thinking that early on we probably just want tamsat_ and tamsat_2 when we are checking which method to use. Then we don't need to outfill. Later, once it is working well, we need the outfilled data, but maybe not the checking variables?
@rdstern, have a look at it
@Vitalis95 it is looking very good.
Now the @lilyclements question. There is currently no output to record what we have done. This is more general than just the Stations report. Could we have a summary of the settings, etc. in the output window? It could also mention which variable names in the data were produced. We quickly get a lot of columns to compare, and it would be good to have a record of how each one was produced. For @lilyclements I also hit the error she reported. It is a nice message, but it reminded me that you and Emily were going to tweak the code so that the bin limits, and something else, were unnecessary? When you do, I wonder about bin sizes. I guess the more data you have, the more bins you could allow, and I assume (up to a limit) the more the merrier? And if the contents will be roughly equal, could we choose what the value will be, say between roughly 10% (so 10 bins) and maybe 20%? And maybe the top bin has half the others, because we need data there, but it is the most important?
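The bin idea above can be sketched as follows: choose bin edges so that every bin holds roughly the same number of values, except the top bin, which holds about half as many. This is a Python illustration under stated assumptions, not the outfillingR binning. The function names `bin_edges` and `bin_counts` are hypothetical, and the nearest-rank rule for choosing an edge is one choice among several.

```python
from bisect import bisect_left


def bin_edges(values, n_bins):
    """Upper edges of the lower n_bins - 1 bins.

    Each regular bin has weight 1 and the top bin weight 0.5, so the
    share of the data at or below edge k is k / (n_bins - 0.5).
    """
    ordered = sorted(values)
    n = len(ordered)
    edges = []
    for k in range(1, n_bins):
        # Nearest-rank position of the k-th edge, clamped to 1..n.
        rank = round(2 * k * n / (2 * n_bins - 1))
        rank = min(max(rank, 1), n)
        edges.append(ordered[rank - 1])
    return edges


def bin_counts(values, edges):
    """Count values per bin; a value equal to an edge goes in the lower bin."""
    counts = [0] * (len(edges) + 1)
    for v in values:
        counts[bisect_left(edges, v)] += 1
    return counts


# Illustrative data only: 95 distinct values, split into 10 bins.
rain = list(range(1, 96))
edges = bin_edges(rain, 10)
counts = bin_counts(rain, edges)
print(edges)
print(counts)
```

With 10 bins, each regular bin then holds about 10.5% of the data and the top bin about 5%. Real daily rainfall has many tied values (zeros in particular), so some edges could coincide and would need merging; the sketch does not handle that.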
@rdstern thanks for the reminder. I have just messaged Emily about it and we will hopefully meet next week.
@lilyclements I have also added to the comment above suggesting another task for you! That's for the function to add a report of the settings for each run into the output window! I hope that's easy.
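For what such a report might contain, here is a minimal sketch that formats the settings of one run, plus the names of the columns it produced, as plain text for an output window. It is a Python illustration only. The function name `format_run_summary`, the setting names and the column names are hypothetical and are not the outfillingR arguments.

```python
def format_run_summary(settings, new_columns):
    """Build a short text report of one run's settings and output columns."""
    lines = ["Outfilling run summary"]
    for key in sorted(settings):
        lines.append(f"  {key}: {settings[key]}")
    lines.append("  new columns: " + ", ".join(new_columns))
    return "\n".join(lines)


# Hypothetical settings for one run; the names are illustrative only.
settings = {"method": "tamsat", "seed": 123, "omit_stations": "none"}
summary = format_run_summary(settings, ["rain_est", "rain_out"])
print(summary)
```

Printing one such block per run would give a record of how each new column was produced.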
@Vitalis95 nice that it fills by default. And it remembers now too. d) And just to mention another @lilyclements task. I remember Emily said it was very quick to execute in Python and you were going to see if there was anything obvious in the running of the R code?
@rdstern, if you want the first factor by default, then we make the first label selected by default, rather than all selected as it is now.
@Vitalis95 looks great. I am approving.
@N-thony can you please check and merge.
@lilyclements you were going to examine why it takes so much longer than in Python. I assume that can wait for other changes after you have discussed with Emily.
Fixes #9363
@rdstern @lilyclements @jkmusyoka, please check the progress.
Omit Months is not working yet, and I have raised that in the issue.