Add tropical cyclones basin bounds #1031

NicolasColombi · 2025-03-23T18:27:24Z

Changes proposed in this PR:

This PR adds tropical cyclone basin bounds as a dictionary named BASINS_BOUNDS in 0–360 coordinates, following STORM dataset bounds definitions.

This addition can be used to determine whether a track originated in that basin simply by using shapely functionalities:

point = Point(lat, lon)
True_or_False = polygon.contains(point)

Particularly useful to decrease the memory overload during global wind fields computations by computing fields basin-wide.

Preview of the bounds:

PR Author Checklist

PR Reviewer Checklist

chahank · 2025-03-24T10:16:12Z

Simple yet useful addition!

I would suggest two things:
1- make it a data class instead
2- make it clear that these are the STORM basin definitions. These can vary substantially, and hence cannot be defined as a definitive default

Also, I am not sure to understand the point Particularly useful to decrease the memory overload during global wind fields computations by computing fields basin-wide.?

In what way does this work beyond what is already possible?

NicolasColombi · 2025-03-24T10:25:37Z

Simple yet useful addition!

I would suggest two things: 1- make it a data class instead 2- make it clear that these are the STORM basin definitions. These can vary substantially, and hence cannot be defined as a definitive default

Also, I am not sure to understand the point Particularly useful to decrease the memory overload during global wind fields computations by computing fields basin-wide.?

In what way does this work beyond what is already possible?

Thanks Chahan, I will do so. Tracks from Emmanuel simulations, for example, often come without basin information. Hence, these bounds would allow you to classify tracks based on the origin basin. This would decrease the amount of memory needed because if you compute wind fields globally, all centroids near all tracks will be loaded into memory. So if you compute wind fields for a track in the NA, centroids in the NI will also be loaded. Whereas if you perform the computation basin-wide, the centroid overlap is much bigger between all tracks. This is my understanding of the discussion I had with @spjuhel, I might be misinterpreting though.

chahank · 2025-03-24T10:36:55Z

Thanks for the explanation! I am a bit confused though, because for each track only the centroids near the track are considered (and they are loaded one by one into memory).

Could you maybe post an example of a code that shows this increase in memory efficiency? That would certainly be useful to see.

NicolasColombi · 2025-03-24T10:43:37Z

I see, then I might have misinterpreted Sam's words. I haven't tested yet if this indeed increases efficiency (it was my hypothesis given the above), but I will try soon. Also, if you know already areas in the Winfield computation that could be optimized memory-wise, let me know. As my memory is currently exploding using global exposure at 450 arcsec with only 200 tracks (>50GB on euler) and >150GB at 150 arcsec resolution.

NicolasColombi · 2025-03-24T14:02:32Z

@chahank I converted it to a dataclass (unsure whether this is how you intended it) and added a method split_by_basin() which creates 6 TCTtracks instances (1 per basin) from 1 TCTtrack instance.

chahank · 2025-03-24T14:53:17Z

Thanks! Although the idea of the data class is precisely to not have a macro variable like BASIN. The idea is to have the data in that class directly. Also, the idea is that you can then clearly explain that this is an arbitrary boarder for basins based on one particular publication.

I am not sure to understand the purpose of the method split_by_basin(). Can you please provide an example?

NicolasColombi · 2025-03-24T15:12:12Z

I am not sure to understand exactly... is your goal to have a basin class whose instances attributes are the basin boundaries ?

The idea of split_by_basin() is that you can load world-wide tracks (e.g. Emmanuel's, which you can not select by basin) and then split this one global TCTrack instance into 6 new TCTracks instances, each containing tracks of only one basin. This is currently not possible if you have a global dataset that does not provide the basin information (e.g Emmanuel's, FAST) and it very handy even with those datasets that do provide that information.

chahank · 2025-03-24T15:14:10Z

Exactly. The data class is there to contain the data. A global variable like BASIN is hard to manage and document, and is usually quite obscure to the user as it does not appear in the documentation.

emanuel-schmid · 2025-03-24T15:20:12Z

@chahank - do you really mean dataclass and not enum? Because if it's about managing some structured kind of constants, an enum seems to be the better match and the benefits of a dataclass (free __init__ and __eq__) don't help a lot. 🤔

ValentinGebhart · 2025-03-24T16:15:41Z

Thanks for the addition! I will take a closer look, but just by a quick read I noticed that there might be problems with the coordinate range (and the antimeridian)? There is no convention in CLIMADA about whether longitude is in -180 to 180 or 0 to 360 or others. But if you use the wrong convention, the polygon.contains does not work anymore. See this code

from climada.hazard.tc_tracks import BASINS
from shapely import Point
SP = BASINS["SP"].polygon
EP = BASINS["EP"].polygon
print(SP.contains(Point(-170, -30))) # False, longitude in -180 to 180 
print(SP.contains(Point(-170 + 360, -30))) # True, longitude in 0 to 360
print(EP.contains(Point(-170, 30))) # True, longitude in -180 to 180 
print(EP.contains(Point(-170 + 360, 30))) # False, longitude in 0 to 360

Maybe this is not a problem when connected to the STORM algorithm, just to note if we wanted to use these definitions in more generality

chahank · 2025-03-24T17:45:12Z

@emanuel-schmid aaa no of course, sorry for the confusion! I meant an Enum class.

NicolasColombi · 2025-03-25T09:01:02Z

@emanuel-schmid makes more sense now! Let me know if it is implemented as intended. For consistency, we may also consider adding the basin pressure to that class ?

Thanks @ValentinGebhart. I agree, I switched to -180,+180 as it is consistent with the tracks coordinate range. I had to split the "SP" basin into two polygons to avoid issues with the antimeridian (original reason why I chose longitude between 0-360).

Before I write the test for subset_by_basin(), let me know if you have comments on the function.

emanuel-schmid · 2025-03-27T09:05:49Z

@NicolasColombi many thanks. I've got two questions about the subset_by_basin method

wouldn't it be preferable to return an array with the basins instead of a dictionary with lists?
supposedly there are much more performant ways to do this than to loop over the data - is speed irrelevant here?

NicolasColombi · 2025-03-27T10:18:52Z

@NicolasColombi many thanks. I've got two questions about the subset_by_basin method

wouldn't it be preferable to return an array with the basins instead of a dictionary with lists?

supposedly there are much more performant ways to do this than to loop over the data - is speed irrelevant here?

@emanuel-schmid
For 1. don't you lose the information on which basin the TCTtracks object belongs to with an array ? I just picked a format where you can access the data by calling the basin name. But mybe there is a better option 🙃

For 2. Speed is not so much of an issue here, but if you know a more efficient way I am absolutely keen to modify the function. Can you advise on how to optimize it?

NicolasColombi · 2025-04-08T13:54:00Z

@NicolasColombi many thanks. I've got two questions about the subset_by_basin method

wouldn't it be preferable to return an array with the basins instead of a dictionary with lists?

supposedly there are much more performant ways to do this than to loop over the data - is speed irrelevant here?

@emanuel-schmid For 1. don't you lose the information on which basin the TCTtracks object belongs to with an array ? I just picked a format where you can access the data by calling the basin name. But mybe there is a better option 🙃

For 2. Speed is not so much of an issue here, but if you know a more efficient way I am absolutely keen to modify the function. Can you advise on how to optimize it?

@emanuel-schmid Any updates ?

chahank · 2025-04-08T15:01:43Z

Would it make sense to also add the bounds for the impact basins as defined in https://www.nature.com/articles/s41467-022-33918-1 ?

NicolasColombi · 2025-04-08T15:19:03Z

Would it make sense to also add the bounds for the impact basins as defined in https://www.nature.com/articles/s41467-022-33918-1 ?

Maybe, what do you have in mind ? But at the moment I personally do not see a direct application. So, we can either add it now without a clear idea, or we add it later if needed. I would personally opt for the latter, unless you had something in mind.

chahank · 2025-04-08T15:40:12Z

Maybe, what do you have in mind ? But at the moment I personally do not see a direct application. So, we can either add it now without a clear idea, or we add it later if needed. I would personally opt for the latter, unless you had something in mind.

I am a bit confused now. I thought your goal was to be able to load tracks for specific basins. You will also use the basins defined in https://www.nature.com/articles/s41467-022-33918-1 . Hence, why would you not implement these?

NicolasColombi · 2025-04-08T16:25:42Z

Because I do not need them, nor I see a use case aside from defining centroids, which I already have. But of course feel free to add them; I am sure someone will find it useful.

chahank · 2025-04-08T16:49:07Z

Still confused, but ok.

Then please just choose a name that makes obvious that this is only the definition used in STORM (is this really defined by STORM, or did they use the definition from somewhere else?). Currently, it looks like the Basin is a universally defined set of boundaries.

NicolasColombi · 2025-04-08T17:19:04Z

Still confused, but ok.

Then please just choose a name that makes obvious that this is only the definition used in STORM (is this really defined by STORM, or did they use the definition from somewhere else?). Currently, it looks like the Basin is a universally defined set of boundaries.

Not really, STORM defined boundaries based on IBTracks, but did not use exactly the same. I wrote it clearly at the beginning of the class description that these boundaries were defined according to STORM. I think it is up to the user to check this information, especially since it is easy to find and basin boundaries are very similar across different publications. Adding "STORM" to the class name would be confusing in my opinion.

chahank · 2025-04-08T19:47:11Z

Ok. It is not particularly important here. But in general, it is not a good idea to have code that is named to imply something which the code does not deliver. Good python code is as unambiguous as possible without having to read the docstring. It would here be better to either name the class Basin_storm or to make the attributes something like NA_STORM. Otherwise a user can easily expect that the class Basin represents the basin from IBTrACS.

emanuel-schmid · 2025-04-09T07:35:01Z

climada/hazard/tc_tracks.py

+
+        Example:
+        --------
+        >>> tc = TCTracks.from_ibtracks("")


as an example this is suboptimal:

>>> tc = TCTracks.from_ibtracks("") Traceback (most recent call last): File "<stdin>", line 1, in <module> AttributeError: type object 'TCTracks' has no attribute 'from_ibtracks'

emanuel-schmid

updates 😄

emanuel-schmid · 2025-04-09T08:21:12Z

climada/hazard/tc_tracks.py

+            lat, lon = track.lat.values[0], track.lon.values[0]
+            origin_point = Point(lon, lat)
+            point_in_basin = False
+
+            # Find the basin that contains the point
+            for basin in Basin:
+                if basin.value.contains(origin_point):
+                    basins_dict[basin.name].append(track)
+                    point_in_basin = True
+                    break
+
+            if not point_in_basin:
+                tracks_outside_basin.append(track.id_no)


@NicolasColombi as to 1. and 2.
I would first define a method that assigns basins to a track:

BASINS_GDF = gpd.GeoDataFrame( {'basin': b, 'geometry': b.value} for b in Basin ) def get_basins(track): # this is the method I had in mind for 1. and I'd guess it could be a performance boost track_coordinates = GeoDataFrame( geometry=gpd.points_from_xy(track.lon, track.lat) ) return track_coordinates.sjoin(BASINS_GDF , how='left', predicate='within').basin

Having that I'd write the track loop like this:

for track in self.data: touched = get_basins(track).dropna().drop_duplicates() if touched.size: for basin in touched: basins_dict[basin].append(track) else: tracks_outside_basin.append(track)

add bounds

9c9150b

NicolasColombi requested a review from ValentinGebhart March 23, 2025 18:28

NicolasColombi changed the title ~~add bounds~~ Add tropical cyclones basin bounds Mar 23, 2025

NicolasColombi marked this pull request as ready for review March 23, 2025 18:30

NicolasColombi added 3 commits March 23, 2025 19:34

update CHANGELOG

a111937

modify NA

f37212f

modify NI

7e82b8f

NicolasColombi added 2 commits March 24, 2025 14:58

make basin a data class, add function to split by basin

030bcb8

Merge branch 'develop' into feature/basins-bounds

73889b9

convert data cls to enum cls

cadf062

NicolasColombi added 2 commits April 8, 2025 16:01

Merge branch 'develop' into feature/basins-bounds

5a5c128

add test and fix pylits

6056e6b

NicolasColombi requested review from emanuel-schmid, chahank and peanutfun as code owners April 8, 2025 14:44

emanuel-schmid reviewed Apr 9, 2025

View reviewed changes

Add tropical cyclones basin bounds #1031

Are you sure you want to change the base?

Add tropical cyclones basin bounds #1031

Uh oh!

Conversation

NicolasColombi commented Mar 23, 2025 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

PR Author Checklist

PR Reviewer Checklist

Uh oh!

chahank commented Mar 24, 2025

Uh oh!

NicolasColombi commented Mar 24, 2025

Uh oh!

chahank commented Mar 24, 2025

Uh oh!

NicolasColombi commented Mar 24, 2025

Uh oh!

NicolasColombi commented Mar 24, 2025

Uh oh!

chahank commented Mar 24, 2025

Uh oh!

NicolasColombi commented Mar 24, 2025

Uh oh!

chahank commented Mar 24, 2025

Uh oh!

emanuel-schmid commented Mar 24, 2025 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

ValentinGebhart commented Mar 24, 2025

Uh oh!

chahank commented Mar 24, 2025

Uh oh!

NicolasColombi commented Mar 25, 2025

Uh oh!

emanuel-schmid commented Mar 27, 2025 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

NicolasColombi commented Mar 27, 2025

Uh oh!

NicolasColombi commented Apr 8, 2025

Uh oh!

chahank commented Apr 8, 2025

Uh oh!

NicolasColombi commented Apr 8, 2025

Uh oh!

chahank commented Apr 8, 2025

Uh oh!

NicolasColombi commented Apr 8, 2025

Uh oh!

chahank commented Apr 8, 2025

Uh oh!

NicolasColombi commented Apr 8, 2025

Uh oh!

chahank commented Apr 8, 2025 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

emanuel-schmid Apr 9, 2025

Choose a reason for hiding this comment

Uh oh!

emanuel-schmid left a comment

Choose a reason for hiding this comment

Uh oh!

emanuel-schmid Apr 9, 2025

Choose a reason for hiding this comment

Uh oh!

emanuel-schmid Apr 9, 2025

Choose a reason for hiding this comment

Uh oh!

Uh oh!

NicolasColombi commented Mar 23, 2025 •

edited

Loading

emanuel-schmid commented Mar 24, 2025 •

edited

Loading

emanuel-schmid commented Mar 27, 2025 •

edited

Loading

chahank commented Apr 8, 2025 •

edited

Loading