Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Config and mapping files for ADPUPA prepBUFR #17

Open
wants to merge 9 commits into
base: develop
Choose a base branch
from

Conversation

PraveenKumar-NOAA
Copy link

@PraveenKumar-NOAA PraveenKumar-NOAA commented Dec 17, 2024

This PR adds following config/ and mapping/ files for the ADPUPA prepBUFR:

  • config/bufr_bufr_backend_adpupa_prepbufr.yaml
  • config/bufr_script_backend_adpupa_prepbufr.yaml
  • mapping/bufr_adpupa_prepbufr.py
  • mapping/bufr_adpupa_prepbufr_mapping.yaml

All of the four jobs (bufr2netcdf, script2netcdf, bufr_backend, script_backend) were run and tested using serial and parallel execution with nproc=2.

Copy link
Contributor

@nicholasesposito nicholasesposito left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Some things need to be addressed.

#!/usr/bin/env python3
import os
import sys
sys.path.append('/scratch1/NCEPDEV/da/Praveen.Singh/HERA/bufr-query/build/lib')
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Get rid of these sys.path.append lines.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@nicholasesposito thanks, removed these lines.

log_method(message)


#def Compute_dateTime(cycleTimeSinceEpoch, hrdr):
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

dateTime should stay in. For the regular mapping yaml, @emilyhcliu and I discussed and said that DHR should determine dateTime if the script is not being used, BUT the script is being used, dateTime should be computed using the script.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@nicholasesposito added a Python subroutine to compute dateTime in the Python script.

dump/mapping/bufr_adpupa_prepbufr.py Outdated Show resolved Hide resolved
logging(comm, 'DEBUG', f'Do ps ObsError calculations')
poe = container.get('variables/pressureError')
psoe = np.full(poe.shape[0], poe.fill_value)
psoe = np.where(cat == 0, poe, psoe)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is it possible to define these (and the corresponding container.replace) as a method? There so many repeat codes here.

@PraveenKumar-NOAA
Copy link
Author

@nicholasesposito @xincjin-NOAA @BrettHoover-NOAA @emilyhcliu @rmclaren: please review this PR, thanks!

log_method(message)


def Compute_dateTime(cycleTimeSinceEpoch, hrdr):
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

pep 8 naming def _compute_datetime(cycleTimeSinceEpoch, hrdr)

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@PraveenKumar-NOAA could you update the function name to comply with the prep8 naming convention?

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@rmclaren done, thanks!

return dateTime


def Mask_typ_for_var(typ, var):
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

pep 8

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Same here

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@rmclaren done, thanks!

def _make_description(mapping_path, cycle_time, update=False):
description = bufr.encoders.Description(mapping_path)

ReferenceTime = np.int64(calendar.timegm(time.strptime(str(int(cycle_time)), '%Y%m%d%H')))
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

pep 8- reference_time

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

str(int(cycle_time)) ??

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pease address the reviewer's comments. Thanks.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@rmclaren done, thanks!

longName=var['longName']
)

description.add_global(name='datetimeReference', value=str(ReferenceTime))
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should you format the reference time in some way or is it just seconds since January 1970?

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Basd on IODA convention table:
https://docs.google.com/spreadsheets/d/1fHb2QIHOwgp4d6plyn9Dj9HZFZsDRxcPaUV6hdTb5BU/edit?gid=46200857#gid=46200857

Look for the global attribute tab.
The dateTimeReference format is string with ISO form.
ISO-8601 string like 2022-02-15T12:00:00Z

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@rmclaren done, thanks!


logging(comm, 'DEBUG', f'Do DateTime calculation')
hrdr = container.get('variables/obsTimeMinusCycleTime')
hrdr_paths = container.get_paths('variables/obsTimeMinusCycleTime')
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

you never use this result... same for other paths

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@rmclaren done, thanks!


logging(comm, 'DEBUG', f'Change longitude range from [0,360] to [-180,180]')
lon = container.get('variables/longitude')
lon_paths = container.get_paths('variables/longitude')
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

same

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@rmclaren done, thanks!


logging(comm, 'DEBUG', f'observationType')
typ = container.get('variables/observationType')
typ_paths = container.get_paths('variables/observationType')
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

dittoo

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@rmclaren done, thanks!


logging(comm, 'DEBUG', f'Do ps calculation')
pob = container.get('variables/pressure')
pob_paths = container.get_paths('variables/pressure')
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

same

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@rmclaren done, thanks!

logging(comm, 'DEBUG', f'Do tsen and tv calculation')
tpc = container.get('variables/temperatureEventProgramCode')
tob = container.get('variables/airTemperature')
tob_paths = container.get_paths('variables/airTemperature')
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

same

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@rmclaren done, thanks!

# ObsError: specificHumidity
- name: "ObsError/specificHumidity"
source: variables/specificHumidityError
longName: "Specific Humidity Error"
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Does this have a unit?

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@nicholasesposito thanks, added unit for specificHumidity.

Copy link
Contributor

@nicholasesposito nicholasesposito left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Changes requested for global attributes.
One other question

type: netcdf

globals:
- name: "data_format"
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Standard is dataOriginalFormatSpec

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@nicholasesposito, thanks. Fixed.

type: string
value: "prepbufr"

- name: "data_type"
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Standard is "platforms"

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@nicholasesposito, thanks. Fixed.

type: string
value: "ADPUPA"

- name: "subsets"
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We don't need this.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@nicholasesposito, thanks. Fixed.

type: string
value: "ADPUPA"

- name: "data_provider"
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Standard is "dataProviderOrigin"

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@nicholasesposito, thanks. Fixed.


- name: "data_provider"
type: string
value: "U.S. NOAA"
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Change to "U.S. NOAA NCEP"

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@nicholasesposito, thanks. Fixed.

type: string
value: "U.S. NOAA"

- name: "data_description"
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Standard is "description"

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@nicholasesposito thanks, done.

@emilyhcliu
Copy link
Collaborator

Tested bufr2netcdf conversion and found a problem with MPI - The total number of obs from the MPI runs are not consistent with the run without MPI. Issue #46 opened for this in bufr-query.

The number of obs in output:

MPI = 0  ----> 270600
MPI = 1  ----> 270600
MPI = 2 ----> 261786
MPI = 4 ----> 250079
MPI = 8 ---->  239868

We are losing more data in the output as we increase the number of MPIs

@nicholasesposito
Copy link
Contributor

nicholasesposito commented Jan 22, 2025

Tested bufr2netcdf conversion and found a problem with MPI - The total number of obs from the MPI runs are not consistent with the run without MPI. Issue #46 opened for this in bufr-query.

The number of obs in output:

MPI = 0  ----> 270600
MPI = 1  ----> 270600
MPI = 2 ----> 261786
MPI = 4 ----> 250079
MPI = 8 ---->  239868

We are losing more data in the output as we increase the number of MPIs

@emilyhcliu this may be the same issue as NOAA-EMC/bufr-query#40

@PraveenKumar-NOAA
Copy link
Author

@nicholasesposito yes, this is a similar issue and associated with the group_by_field variable. Looking into the details.

@emilyhcliu
Copy link
Collaborator

@PraveenKumar-NOAA Please address the reviews' comments, get approved, and then we can merge this to SPOC.

@@ -163,30 +163,30 @@ def _make_obs(comm, input_path, mapping_path, cycle_time):

logging(comm, 'DEBUG', f'Do DateTime calculation')
hrdr = container.get('variables/obsTimeMinusCycleTime')
hrdr_paths = container.get_paths('variables/obsTimeMinusCycleTime')
#hrdr_paths = container.get_paths('variables/obsTimeMinusCycleTime')
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Delete commented out code, this example and the following ones..

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@rmclaren thanks, deleted all the commented lines.

cycleTimeSinceEpoch = np.int64(calendar.timegm(time.strptime(str(int(cycle_time)), '%Y%m%d%H')))
dateTime = Compute_dateTime(cycleTimeSinceEpoch, hrdr)
dateTime = _compute_datetime(cycleTimeSinceEpoch, hrdr)
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

result of dateTime doesn't seem to be used

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@rmclaren thanks, reading dateTime now.

logging(comm, 'DEBUG', f'dateTime min/max = {dateTime.min()} {dateTime.max()}')

logging(comm, 'DEBUG', f'Change longitude range from [0,360] to [-180,180]')
lon = container.get('variables/longitude')
lon_paths = container.get_paths('variables/longitude')
#lon_paths = container.get_paths('variables/longitude')
lon[lon>180] -= 360

logging(comm, 'DEBUG', f'observationType')
typ = container.get('variables/observationType')
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

typ doesn't seem to be used

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@rmclaren thanks, fixed.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants