3. Comparing two datasets#

In this part, we compare the set of 1996 tracks (used in the previous example) to IBTrACS which we use as reference. To start with, note that for all that was shown in the previous examples, you can superimpose several sets and therefore compare several sources/models/trackers/etc. Below we show specific functions for matching tracks and computing detection scores.

[1]:
import huracanpy
import matplotlib.pyplot as plt

3.1. Load tracks#

3.1.1. Load IBTrACS and subset the 1996 tracks with xarray’s where method#

[2]:
ib = huracanpy.load(source="ibtracs")
ib_1996 = ib.where(ib.time.dt.year == 1996, drop=True)
ib_1996
/home/docs/checkouts/readthedocs.org/user_builds/huracanpy/envs/stable/lib/python3.12/site-packages/huracanpy/_data/ibtracs.py:119: UserWarning: This offline function loads a light version of IBTrACS which is embedded within the package, based on a file produced manually by the developers.
 It was last updated on the 15th Nov 2024, based on the IBTrACS file at that date.
 It contains only data from 1980 up to the last year with no provisional tracks. All spur tracks were removed. Only 6-hourly time steps were kept.
  warnings.warn(
/home/docs/checkouts/readthedocs.org/user_builds/huracanpy/envs/stable/lib/python3.12/site-packages/huracanpy/_data/ibtracs.py:128: UserWarning: You are loading the IBTrACS-WMO subset. This dataset contains the positions and intensity reported by the WMO agency responsible for each basin
 Be aware of the fact that wind and pressure data is provided as they are in IBTrACS, which means in particular that wind speeds are in knots and averaged over different time periods.
 For more information, see the IBTrACS column documentation at https://www.ncei.noaa.gov/sites/default/files/2021-07/IBTrACS_v04_column_documentation.pdf
  warnings.warn(
[2]:
<xarray.Dataset> Size: 276kB
Dimensions:   (record: 4313)
Dimensions without coordinates: record
Data variables:
    track_id  (record) object 35kB '1995357N07139' ... '1996365S15137'
    season    (record) float64 35kB 1.995e+03 1.995e+03 ... 1.997e+03 1.997e+03
    basin     (record) object 35kB 'WP' 'WP' 'WP' 'WP' ... 'SP' 'SI' 'SI' 'SI'
    time      (record) datetime64[s] 35kB 1996-01-01 ... 1996-12-31T18:00:00
    lon       (record) float64 35kB 153.5 157.0 159.0 ... 134.6 134.1 133.8
    lat       (record) float64 35kB 26.0 27.0 27.5 28.0 ... -13.6 -13.2 -12.8
    wind      (record) float64 35kB nan nan nan nan nan ... nan nan nan nan nan
    slp       (record) float64 35kB 1.008e+03 1.006e+03 1.006e+03 ... nan nan

3.1.2. Load ERA5 year of tracks#

[3]:
era5 = huracanpy.load(huracanpy.example_year_file)

3.2. Superimposing several sets on one plot#

To start with, note that for all that was shown above, you can superimpose several sets and therefore compare several sources/models/trackers/etc. Here we only show one example.

3.2.1. Compute lifetime maximum intensity (LMI) for both sets#

[4]:
lmi_wind_ib = ib_1996.wind.groupby(ib_1996.track_id).max()
# Convert kn to m/s
lmi_wind_ib = lmi_wind_ib / 1.94
lmi_wind_era5 = era5.wind10.groupby(era5.track_id).max()

3.2.2. Plot both histograms#

[5]:
bins = range(10, 65 + 1, 5)
lmi_wind_ib.plot.hist(bins=bins, color="k", label="IBTrACS", alpha=0.8)
lmi_wind_era5.plot.hist(bins=bins, label="ERA5", alpha=0.8)
plt.legend()
plt.xlabel("Lifetime maximum wind speed / m/s")
plt.ylabel("Number of tracks")
[5]:
Text(0, 0.5, 'Number of tracks')
../../_images/user_guide_demos_tracks_vs_obs_9_1.png

3.3. Matching tracks#

Use huracanpy.assess.match to find matching tracks. The results is a pandas.DataFrame where each row is a pair of tracks that matched, with both ids, the number of time steps and the mean distance between the tracks over their matching period.

[6]:
matches = huracanpy.assess.match([era5, ib_1996], names=["ERA5", "IBTrACS"])
matches
[6]:
id_ERA5 id_IBTrACS temp dist
0 1207.0 1996002S15133 27 48.531963
1 1208.0 1996001S08075 39 39.697687
2 1209.0 1996007S10100 13 80.979701
3 1210.0 1996015S18182 13 104.460676
4 1213.0 1996021S16152 11 70.118845
... ... ... ... ...
72 1291.0 1996353N05151 26 64.419466
73 1292.0 1996357S10136 30 96.003139
74 1293.0 1996356N08110 13 98.513440
75 1294.0 1996354S05170 32 86.234183
76 1295.0 1996365S15137 1 134.400469

77 rows × 4 columns

3.4. Computing scores#

3.4.1. Probability of detection (POD)#

Proportion of observed tracks that are found in ERA5.

[7]:
huracanpy.assess.pod(matches, ref=ib_1996, ref_name="IBTrACS")
[7]:
0.635593220338983

3.4.2. False alarm rate (FAR)#

Proportion of detected tracks that were not observed

[8]:
huracanpy.assess.far(matches, detected=era5, detected_name="ERA5")
[8]:
0.1460674157303371

3.5. Venn diagrams#

Venn diagrams are a convenient way to show the overlap between two datasets.

[9]:
huracanpy.plot.venn([era5, ib_1996], matches, labels=["ERA5", "IBTrACS"])
../../_images/user_guide_demos_tracks_vs_obs_17_0.png