Potential incomplete logic in extract_file_name Method for Handling File Names #29

AissaGeek · 2024-10-29T13:45:45Z

Description

The extract_file_name method of the DownloadWorker class in downloader/__init__.py is responsible for extracting the file name and extension from a download URL. However, it is unclear whether it correctly handles file names that follow a specific pattern.

Example

Here is an example payload that illustrates the issue:

{
   "topic":"cache/a/wis2/us-noaa-synoptic/data/core/weather/surface-based-observations/synop",
   "payload":{
      "id":"585436b8-95fb-11ef-845c-e43d1a213544",
      "type":"Feature",
      "version":"v04",
      "geometry":{
         "type":"Point",
         "coordinates":[
            -101.04662,
            39.42746
         ]
      },
      "properties":{
         "data_id":"us-noaa-synoptic/data/core/weather/surface-based-observations/synop/WIGOS_0-840-0-KCBK_20241029T133500",
         "datetime":"2024-10-29T13:35:00Z",
         "pubtime":"2024-10-29T13:40:21Z",
         "integrity":{
            "method":"sha512",
            "value":"Rt3ZDweAXB6Kl6xYpLLf/DXZJU0X1SkWw+wdlh6Shb2orW96IO9I/kq09dKMc9zqsgQRS91iQC9rWTvdIIPFiQ=="
         },
         "content":{
            "encoding":"base64",
            "value":"QlVGUgAA9wQAABYAAAAAAAAAAAZuHgAH6AodDSMAAAALAAABgMGWx1AAAMoAADSAAAS0NCSwAAAAAAAAAAAAAAAP//paGhJYAAAAAAAAAAAAAAAAAAAAAP0U62Nivs0PDyVDWVGtTFuTnf///////////7Y/tWDJ////////////////////gAB//////////////////////////////////////8Aln////////////////////////A+j/v/PABr///////////////////////////////////////////////////////////////+ANzc3Nw==",
            "size":247
         },
         "wigos_station_identifier":"0-840-0-KCBK"
      },
      "links":[
         {
            "rel":"canonical",
            "type":"application/x-bufr",
            "href":"https://wis2.dwd.de/gc/24h/us-noaa-synoptic/3c0adaec-9e84-4986-a17a-429293eec998__WIGOS_0-840-0-KCBK_20241029T133500.bufr4",
            "length":247
         },
         {
            "rel":"via",
            "type":"text/html",
            "href":"https://oscar.wmo.int/surface/#/search/station/stationReportDetails/0-840-0-KCBK"
         }
      ]
   },
   "target":"surface-obs"
}

The method extracts the filename from the canonical link, "href":"https://wis2.dwd.de/gc/24h/us-noaa-synoptic/3c0adaec-9e84-4986-a17a-429293eec998__WIGOS_0-840-0-KCBK_20241029T133500.bufr4". This yields the filename 3c0adaec-9e84-4986-a17a-429293eec998__WIGOS_0-840-0-KCBK_20241029T133500, which I doubt is the expected filename.
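To make the behaviour concrete, here is a minimal sketch of URL-based extraction. The function name filename_from_href is hypothetical and this is not the actual extract_file_name implementation; it only assumes that the name and extension are taken from the last path segment of the href.

```python
from pathlib import PurePosixPath
from urllib.parse import unquote, urlsplit


def filename_from_href(href):
    """Return (name, extension) taken from the last path segment of a URL.

    Hypothetical helper for illustration only; not the downloader's code.
    """
    # Keep only the path component, dropping scheme, host, query and fragment.
    path = unquote(urlsplit(href).path)
    last_segment = PurePosixPath(path)
    # .stem is the name without its final suffix; .suffix includes the dot.
    return last_segment.stem, last_segment.suffix.lstrip(".")


href = (
    "https://wis2.dwd.de/gc/24h/us-noaa-synoptic/"
    "3c0adaec-9e84-4986-a17a-429293eec998__WIGOS_0-840-0-KCBK_20241029T133500.bufr4"
)
name, extension = filename_from_href(href)
print(name)       # 3c0adaec-9e84-4986-a17a-429293eec998__WIGOS_0-840-0-KCBK_20241029T133500
print(extension)  # bufr4
```

The UUID-like prefix before the double underscore comes from the URL itself, so it ends up in the extracted name.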

Suggested solution

In almost all cases, the filename can instead be extracted from the last segment of "data_id":"us-noaa-synoptic/data/core/weather/surface-based-observations/synop/WIGOS_0-840-0-KCBK_20241029T133500".
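A rough sketch of what I have in mind, assuming the filename is simply the last "/"-separated segment of data_id. The helper name filename_from_data_id is hypothetical and does not exist in the downloader.

```python
def filename_from_data_id(data_id):
    """Return the last '/'-separated segment of a data_id.

    Hypothetical helper for illustration only; it assumes the final
    segment of the data_id is usable as a file name.
    """
    # rstrip guards against a trailing slash leaving an empty last segment.
    return data_id.rstrip("/").rsplit("/", 1)[-1]


data_id = (
    "us-noaa-synoptic/data/core/weather/surface-based-observations/synop/"
    "WIGOS_0-840-0-KCBK_20241029T133500"
)
candidate = filename_from_data_id(data_id)
print(candidate)  # WIGOS_0-840-0-KCBK_20241029T133500
```

The data_id carries no extension, so the extension would still have to come from the link's href or its media type.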


david-i-berry · 2024-12-06T12:55:09Z

There is no standardisation of the data ID, and in some cases it can contain characters such as a comma. Using the filename extracted from the URL, as above, was found to be more reliable. In the example given above, the extracted filename is as expected and intended.
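As a small illustration of the concern, the sketch below checks candidate names against a conservative character set. The helper is_safe_file_name, the allowed character set, and the comma-containing data ID are all invented for this example; none of them come from the downloader or from a real notification.

```python
import re

# Conservative allow-list: letters, digits, dot, underscore and hyphen.
# This character set is an assumption made for the example.
_SAFE_NAME = re.compile(r"[A-Za-z0-9._-]+")


def is_safe_file_name(name):
    """Return True if every character of name is in the allow-list."""
    return _SAFE_NAME.fullmatch(name) is not None


# Name derived from the URL in the example payload above.
url_name = "3c0adaec-9e84-4986-a17a-429293eec998__WIGOS_0-840-0-KCBK_20241029T133500"

# Made-up data ID containing commas, to mimic the non-standardised case.
made_up_data_id = "example-centre/data/core/weather/synop/obs,station,20241029"
data_id_name = made_up_data_id.rsplit("/", 1)[-1]

print(is_safe_file_name(url_name))      # True
print(is_safe_file_name(data_id_name))  # False
```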

Closing due to related issue #19, further discussion can continue there.

david-i-berry closed this as completed Dec 6, 2024
