ML pipeline to make samples #62
Comments
Example of config, not tested
Thanks @peterdudfield, this clears up a lot of things! I now have an idea of how things would look, and a lot of the code I wrote for the previous issue could be reused here. I will start working on it.
That's great, please do reach out if anything else is confusing. Happy to help clarify things.
Hi @peterdudfield, I'm really interested in this issue and would love to contribute. Is there any part of the task still open that I could help with?
Hi @peterdudfield, I'm interested in contributing to this project and would love to get involved. Could you guide me on where to start? I'm particularly keen on understanding how the data pipeline is set up and how I can help in making the data samples more manageable for ML training. Looking forward to your guidance.
Might have to ask @siddharth7113 for an update, and whether it's working or not?
Thank you for your interest in the issue. Right now a PR (openclimatefix/ocf-data-sampler#199) is already open for the GFS functionality implementation; once that is implemented, I think it will be easier to deal with this issue here.
Hi @siddharth7113, thank you for the update! Feel free to ping me if anything comes up!
Detailed Description
Following on from #1 I wanted to write this issue.
We currently have lots of NWP data and PVLive data. It's too much to fit into memory, so we have to cut it down ready for ML experiments. The way we've done this in the past is to create samples of data. These are smaller chunks of the data that contain specific data for a certain time (and space). These then get batched up in a dataloader and the ML model can then train from them.
So we want to build a pipeline for making these samples (most of the work is done in ocf-data-sampler).
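To make the idea of a sample concrete, here is a minimal sketch in plain Python. It is not ocf-data-sampler's API: the function names `make_samples` and `batch`, the window lengths, and the toy series are all assumptions made up for illustration. It only shows the general pattern of cutting a long time series into small (history, target) windows and grouping them into batches.

```python
# Illustrative only: hypothetical helpers, not part of ocf-data-sampler.

def make_samples(series, history_len, forecast_len, stride=1):
    """Cut a long series into (history, target) windows."""
    window = history_len + forecast_len
    samples = []
    # Slide a fixed-size window over the series, one `stride` at a time.
    for start in range(0, len(series) - window + 1, stride):
        history = series[start : start + history_len]
        target = series[start + history_len : start + window]
        samples.append((history, target))
    return samples


def batch(samples, batch_size):
    """Yield consecutive groups of samples, as a dataloader would."""
    for i in range(0, len(samples), batch_size):
        yield samples[i : i + batch_size]


# Toy stand-in for a PV generation time series (assumed values).
pv = [float(i) for i in range(10)]

samples = make_samples(pv, history_len=4, forecast_len=2, stride=2)
batches = list(batch(samples, batch_size=2))
```

In a real pipeline each sample would also carry the matching NWP slice for the same time (and location), and the samples would likely be saved to disk so that training does not need the full dataset in memory.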
Context
Possible Implementation
There are lots of ways to do this, but one suggestion is to use ocf-data-sampler.
You start with a data configuration (see below) that tells ocf-data-sampler what to load and other specific bits.
It would be really great to
The ocf-data-sampler class we recommend using is PVNetUKRegionalDataset, but there are a few things that might need adding like