script to make samples #8

Open
Tracked by #5
peterdudfield opened this issue Dec 4, 2024 · 7 comments

@peterdudfield
Contributor

peterdudfield commented Dec 4, 2024

We need a clear script to make batches. Do we need extra compute for this? We might have this already.

Something in https://github.com/openclimatefix/ocf-data-sampler might already be there

@peterdudfield peterdudfield changed the title clear script to make batches, do we need extra compute for this? We might have this already. clear script to make batches Dec 4, 2024
@peterdudfield peterdudfield transferred this issue from openclimatefix/PVNet Dec 4, 2024
@peterdudfield peterdudfield changed the title clear script to make batches script to make samples Dec 18, 2024
@alirashidAR
Contributor

Hi @peterdudfield,
I would love to contribute to this project and help with this issue. Could you please guide me on how I can get started? Let me know if there are specific tasks or areas where assistance is needed.

@peterdudfield
Contributor Author

This would be great. What's needed first is openclimatefix/PVNet#2 though, so you might want to start there.

This was referenced Jan 13, 2025
@siddharth7113 siddharth7113 mentioned this issue Jan 29, 2025
4 tasks
@siddharth7113
Contributor

@peterdudfield What exactly needs to be done here? Can you provide any references so I can get started?
Also, does this need to be preceded by #60?

@peterdudfield
Contributor Author

We have done something similar in PVNet, for example https://github.com/openclimatefix/PVNet/blob/dev-data-sampler/scripts/save_samples.py
but it won't be exactly the same.

@peterdudfield
Contributor Author

peterdudfield commented Feb 11, 2025

Notes from our meeting:
@Sukh-P @siddharth7113 @jcamier

Flow diagram showing how ocf-data-sampler works: https://github.com/openclimatefix/ocf-data-sampler/tree/main/ocf_data_sampler/torch_datasets

The torch dataset in ocf-data-sampler that can be used for sample creation:
https://github.com/openclimatefix/ocf-data-sampler/blob/main/ocf_data_sampler/torch_datasets/datasets/pvnet_uk.py#L171

The PVNet branch that supports ocf-data-sampler:
https://github.com/openclimatefix/PVNet/tree/dev-data-sampler

An example configuration file, which will need to be set to the correct values for GFS and other NWPs:
https://github.com/openclimatefix/PVNet/blob/dev-data-sampler/configs.example/datamodule/configuration/example_configuration.yaml

The script to run to make samples: https://github.com/openclimatefix/PVNet/blob/dev-data-sampler/scripts/save_samples.py

It would be good to commit the data configuration to the open-data-pvnet repo, so everyone can use the same one. Later on we can add the PVNet ML configuration as well.

Agreed that ocf-data-sampler would be good to use.
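For anyone new to the idea, here is a minimal sketch of what "making samples" means in general: iterate over a dataset and write each sample to its own file. This is not the linked save_samples.py and does not use ocf-data-sampler; the function name `save_samples`, the pickle format, and the file naming are all assumptions made for illustration, and `dataset` stands in for any indexable dataset object.

```python
import pickle
from pathlib import Path
from typing import Any, Sequence


def save_samples(dataset: Sequence[Any], output_dir: str, num_samples: int) -> list[Path]:
    """Write up to `num_samples` items of `dataset` to `output_dir`, one file each.

    Hypothetical helper: `dataset` is any object supporting len() and
    integer indexing, standing in for a real torch dataset.
    """
    out = Path(output_dir)
    out.mkdir(parents=True, exist_ok=True)

    # Never request more samples than the dataset holds.
    count = min(num_samples, len(dataset))

    paths = []
    for i in range(count):
        # Zero-padded names keep the files in order when listed.
        path = out / f"{i:08d}.pkl"
        with path.open("wb") as f:
            pickle.dump(dataset[i], f)
        paths.append(path)
    return paths
```

The real script will differ, for example in how the dataset is built from the configuration file and in the on-disk format.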

@jcamier
Collaborator

jcamier commented Feb 13, 2025

@alirashidAR I believe you have worked on ocf-data-sampler. I am getting AttributeError: 'Dataset' object has no attribute 'installedcapacity_mwp' when using PVNetUKRegionalDataset with a config that points to the PVLive data on our S3 bucket: s3://ocf-open-data-pvnet/data/uk/pvlive/v0/target_data.nc
I only see the following variables in target_data.nc:

    <xarray.Dataset> Size: 4MB
    Dimensions:        (index: 87699)
    Coordinates:
      * index          (index) int64 702kB 0 1 2 3 4 ... 87695 87696 87697 87698
    Data variables:
        gsp_id         (index) int64 702kB ...
        datetime_gmt   (index) datetime64[ns] 702kB ...
        generation_mw  (index) float64 702kB ...
        capacity_mwp   (index) float64 702kB ...

It appears we have capacity_mwp and not installedcapacity_mwp. Do you have any thoughts on this, as you have worked on the PVLive data? I am trying to determine whether this is a bug on the PVNetUKRegionalDataset side or whether installedcapacity_mwp is missing from the PVLive target_data.
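A quick way to catch this kind of mismatch before constructing the dataset is to compare the variable names in the file against the names the loader expects. The sketch below is only an illustration: `missing_variables` is a hypothetical helper, and the `required` list is an assumption. Only installedcapacity_mwp is confirmed by the error message.

```python
from typing import Iterable


def missing_variables(available: Iterable[str], required: Iterable[str]) -> list[str]:
    """Return the names in `required` that are absent from `available`, in order."""
    present = set(available)
    return [name for name in required if name not in present]


# Variable names as listed in the target_data.nc summary.
available = ["gsp_id", "datetime_gmt", "generation_mw", "capacity_mwp"]

# Assumed requirement: only installedcapacity_mwp is known from the error;
# the other two entries are illustrative.
required = ["generation_mw", "capacity_mwp", "installedcapacity_mwp"]

print(missing_variables(available, required))  # prints ['installedcapacity_mwp']
```

With an open xarray dataset `ds`, the available names can be obtained with `list(ds.data_vars)`.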

@alirashidAR
Contributor

Hello @jcamier, yes, I believe installedcapacity_mwp is missing from target_data.

data = pv.get_data_between(start=start, end=end, extra_fields="capacity_mwp")

This is the call that was used to collect the data.
