3. Comparing two datasets#

In this part, we compare the set of 1996 tracks (used in the previous example) to IBTrACS, which we use as the reference. We first superimpose the two sets on one plot, then show specific functions for matching tracks and computing detection scores.

[1]:

import huracanpy
import matplotlib.pyplot as plt

3.2. Superimposing several sets on one plot#

Everything shown in the previous examples can be applied to several sets of tracks and superimposed on one plot, which lets you compare several sources, models, trackers, etc. Here we show only one example: the distribution of lifetime maximum intensity.

3.2.1. Compute lifetime maximum intensity (LMI) for both sets#

[4]:

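# Lifetime maximum wind speed of each IBTrACS track
lmi_wind_ib = ib_1996.wind.groupby(ib_1996.track_id).max()
# Convert from knots to m/s (1 m/s is about 1.94 kn)
lmi_wind_ib = lmi_wind_ib / 1.94
# Lifetime maximum wind speed of each ERA5 track (wind10 is used as is)
lmi_wind_era5 = era5.wind10.groupby(era5.track_id).max()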
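
To make the group-by step concrete, here is a minimal sketch in plain Python on made-up data. The track ids and wind values are hypothetical, and the function name `lifetime_max` is ours, not part of huracanpy. It takes one wind value per track point, keeps the maximum per track, and converts from knots to m/s using the definition of the knot (1852 m per hour), of which the division by 1.94 above is a rounded form.

```python
from collections import defaultdict

KNOT_IN_MS = 1852 / 3600  # one knot is one nautical mile (1852 m) per hour


def lifetime_max(track_ids, values):
    """Return {track_id: maximum value over all points of that track}."""
    lmi = defaultdict(lambda: float("-inf"))
    for tid, value in zip(track_ids, values):
        lmi[tid] = max(lmi[tid], value)
    return dict(lmi)


# Hypothetical data: one entry per track point, wind in knots
track_ids = ["A", "A", "A", "B", "B"]
wind_kn = [30.0, 55.0, 40.0, 25.0, 35.0]

lmi_kn = lifetime_max(track_ids, wind_kn)
lmi_ms = {tid: v * KNOT_IN_MS for tid, v in lmi_kn.items()}
print(lmi_ms)
```

Each track contributes exactly one value to the histogram, whatever its number of points.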

3.2.2. Plot both histograms#

[5]:

bins = range(10, 65 + 1, 5)
lmi_wind_ib.plot.hist(bins=bins, color="k", label="IBTrACS", alpha=0.8)
lmi_wind_era5.plot.hist(bins=bins, label="ERA5", alpha=0.8)
plt.legend()
plt.xlabel("Lifetime maximum wind speed / m/s")
plt.ylabel("Number of tracks")


[Figure: histograms of lifetime maximum wind speed (m/s) per track, for IBTrACS and ERA5]

3.3. Matching tracks#

Use huracanpy.assess.match to find matching tracks. The result is a pandas.DataFrame in which each row is a pair of matching tracks, with the id of each track, the number of time steps over which they match (temp), and the mean distance between the tracks over their matching period (dist).

[6]:

matches = huracanpy.assess.match([era5, ib_1996], names=["ERA5", "IBTrACS"])
matches

[6]:

	id_ERA5	id_IBTrACS	temp	dist
0	1207.0	1996002S15133	27	48.531963
1	1208.0	1996001S08075	39	39.697687
2	1209.0	1996007S10100	13	80.979701
3	1210.0	1996015S18182	13	104.460676
4	1213.0	1996021S16152	11	70.118845
...	...	...	...	...
72	1291.0	1996353N05151	26	64.419466
73	1292.0	1996357S10136	30	96.003139
74	1293.0	1996356N08110	13	98.513440
75	1294.0	1996354S05170	32	86.234183
76	1295.0	1996365S15137	1	134.400469

77 rows × 4 columns
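
As an illustration of what such a matching involves, the sketch below pairs two tracks when they share time steps at which they lie within a maximum distance of each other, and reports the number of such steps and the mean distance over them. This is a simplified stand-in, not huracanpy's implementation: the `max_dist_km` threshold, the track representation (a dict from time step to (lat, lon)), the toy tracks, and all function names are assumptions made for the example.

```python
import math

EARTH_RADIUS_KM = 6371.0


def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in km between two points given in degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlmb = math.radians(lon2 - lon1)
    a = (
        math.sin(dphi / 2) ** 2
        + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2
    )
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(a))


def match_tracks(tracks_a, tracks_b, max_dist_km=300.0):
    """Return one (id_a, id_b, n_steps, mean_dist_km) tuple per matching pair.

    Each argument maps a track id to {time: (lat, lon)}.
    """
    matches = []
    for id_a, pts_a in tracks_a.items():
        for id_b, pts_b in tracks_b.items():
            # Distances at the time steps that both tracks have in common
            dists = [
                haversine_km(*pts_a[t], *pts_b[t])
                for t in sorted(pts_a.keys() & pts_b.keys())
            ]
            # Keep only the steps where the two tracks are close enough
            close = [d for d in dists if d <= max_dist_km]
            if close:
                matches.append((id_a, id_b, len(close), sum(close) / len(close)))
    return matches


# Hypothetical tracks: time step -> (lat, lon)
detected = {
    1: {0: (10.0, 140.0), 1: (11.0, 139.0), 2: (12.0, 138.0)},
    2: {0: (-15.0, 60.0), 1: (-16.0, 59.0)},
}
observed = {
    "X": {1: (11.2, 139.1), 2: (12.1, 138.2), 3: (13.0, 137.0)},
    "Y": {5: (20.0, -60.0)},
}

pairs = match_tracks(detected, observed)
for pair in pairs:
    print(pair)
```

In this toy case only detected track 1 and observed track "X" are paired, over the two time steps they share. Nothing in such a scheme prevents one track from appearing in several pairs, which is why the scores below are best thought of in terms of unique track ids.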

3.4. Computing scores#

3.4.1. Probability of detection (POD)#

Proportion of observed tracks that are found in ERA5.

[7]:

huracanpy.assess.pod(matches, ref=ib_1996, ref_name="IBTrACS")

[7]:

0.635593220338983

3.4.2. False alarm rate (FAR)#

Proportion of tracks detected in ERA5 that do not match any observed track.

[8]:

huracanpy.assess.far(matches, detected=era5, detected_name="ERA5")

[8]:

0.1460674157303371
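
Both scores reduce to counting unique track ids. The sketch below shows the arithmetic on a hypothetical set of matches; the function names `pod_from_ids` and `far_from_ids` and the toy ids are illustrative and are not huracanpy functions. Sets are used so that a track appearing in several matched pairs is counted once.

```python
def pod_from_ids(matched_ref_ids, all_ref_ids):
    """Fraction of reference tracks that appear in at least one match."""
    reference = set(all_ref_ids)
    return len(set(matched_ref_ids) & reference) / len(reference)


def far_from_ids(matched_detected_ids, all_detected_ids):
    """Fraction of detected tracks that appear in no match."""
    detected = set(all_detected_ids)
    return len(detected - set(matched_detected_ids)) / len(detected)


# Hypothetical ids: 4 detected tracks, 5 reference tracks, 3 matched pairs
detected_ids = [1, 2, 3, 4]
reference_ids = ["a", "b", "c", "d", "e"]
matched_pairs = [(1, "a"), (2, "b"), (2, "c")]  # track 2 matches two references

pod = pod_from_ids([ref for _, ref in matched_pairs], reference_ids)
far = far_from_ids([det for det, _ in matched_pairs], detected_ids)
print(f"POD = {pod:.2f}, FAR = {far:.2f}")
```

Here three of the five reference tracks are found (POD = 0.60) and two of the four detected tracks have no counterpart (FAR = 0.50).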

3.5. Venn diagrams#

Venn diagrams are a convenient way to show the overlap between two datasets.

[9]:

huracanpy.plot.venn([era5, ib_1996], matches, labels=["ERA5", "IBTrACS"])

[Figure: Venn diagram of the overlap between ERA5 and IBTrACS tracks]
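
The regions of such a diagram can be counted directly from the sets of ids and the matched pairs. The following sketch uses hypothetical ids and a helper named `venn_counts`, neither of which comes from huracanpy. When one track matches several tracks in the other set, the size of the overlap depends on which set it is counted in, so the helper returns the matched count for each set separately.

```python
def venn_counts(ids_a, ids_b, matched_pairs):
    """Count tracks found only in A, only in B, and matched in each set."""
    matched_a = {a for a, _ in matched_pairs}
    matched_b = {b for _, b in matched_pairs}
    return {
        "only_a": len(set(ids_a) - matched_a),
        "only_b": len(set(ids_b) - matched_b),
        "matched_a": len(matched_a),
        "matched_b": len(matched_b),
    }


# Hypothetical ids: set A is the detected tracks, set B is the reference
counts = venn_counts(
    [1, 2, 3, 4],
    ["a", "b", "c", "d", "e"],
    [(1, "a"), (2, "b"), (2, "c")],
)
print(counts)
```

Taking B as the reference, the scores from the previous section follow from these counts as matched_b / (matched_b + only_b) for the POD and only_a / (only_a + matched_a) for the FAR.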