{ "cells": [ { "cell_type": "markdown", "id": "aef8b2fb-d9b0-47e8-a70e-1024c7e6e2b1", "metadata": {}, "source": [ "# Comparing two datasets\n", "In this part, we compare the set of 1996 tracks (used in the [previous example](set_of_tracks.ipynb)) to IBTrACS, which we use as the reference.\n", "Everything shown in the previous examples also works with several sets superimposed, so you can compare several sources, models, trackers, etc. Below, we also show specific functions for matching tracks and computing detection scores." ] }, { "cell_type": "code", "execution_count": null, "id": "98f00647-d338-4bea-b551-511758fa39c2", "metadata": {}, "outputs": [], "source": [ "import huracanpy\n", "import matplotlib.pyplot as plt" ] }, { "cell_type": "markdown", "id": "53a94bdb-6d94-47c9-9296-5826986ea0cb", "metadata": {}, "source": [ "## Load tracks\n", "### Load IBTrACS and subset the 1996 tracks with xarray's `where` method" ] }, { "cell_type": "code", "execution_count": null, "id": "efcb9ebc-bf0e-4432-b803-e53925b4fcdd", "metadata": {}, "outputs": [], "source": [ "ib = huracanpy.load(source=\"ibtracs\")\n", "ib_1996 = ib.where(ib.time.dt.year == 1996, drop=True)\n", "ib_1996" ] }, { "cell_type": "markdown", "id": "6eb6bcbe-28a1-4d5c-a296-df8d9ed26e08", "metadata": {}, "source": [ "### Load one year of ERA5 tracks" ] }, { "cell_type": "code", "execution_count": null, "id": "fd786725-709d-4d66-9fc2-3f964117fb3b", "metadata": {}, "outputs": [], "source": [ "era5 = huracanpy.load(huracanpy.example_year_file)" ] }, { "cell_type": "markdown", "id": "f9c764fc-9f16-49b1-abcb-08faed69e8e3", "metadata": {}, "source": [ "## Superimposing several sets on one plot\n", "As in the previous examples, several sets can be superimposed on one plot to compare several sources, models, trackers, etc. 
Here we only show one example.\n", "\n", "### Compute lifetime maximum intensity (LMI) for both sets" ] }, { "cell_type": "code", "execution_count": null, "id": "a8a683b6-fea5-4de3-bf85-a04b5568ab2d", "metadata": {}, "outputs": [], "source": [ "lmi_wind_ib = ib_1996.wind.groupby(ib_1996.track_id).max()\n", "# Convert knots to m/s (1 m/s is about 1.94 kn)\n", "lmi_wind_ib = lmi_wind_ib / 1.94\n", "lmi_wind_era5 = era5.wind10.groupby(era5.track_id).max()" ] }, { "cell_type": "markdown", "id": "a0db6b16-6c07-4b47-9e86-3e82b1bbc0b1", "metadata": {}, "source": [ "### Plot both histograms" ] }, { "cell_type": "code", "execution_count": null, "id": "a226725f-86df-4a6b-8e9f-ea40acc5cefa", "metadata": {}, "outputs": [], "source": [ "bins = range(10, 65 + 1, 5)\n", "lmi_wind_ib.plot.hist(bins=bins, color=\"k\", label=\"IBTrACS\", alpha=0.8)\n", "lmi_wind_era5.plot.hist(bins=bins, label=\"ERA5\", alpha=0.8)\n", "plt.legend()\n", "plt.xlabel(\"Lifetime maximum wind speed / m/s\")\n", "plt.ylabel(\"Number of tracks\")" ] }, { "cell_type": "markdown", "id": "da63e862-5675-4277-bc20-e7e9d9dd5168", "metadata": {}, "source": [ "## Matching tracks\n", "Use `huracanpy.assess.match` to find matching tracks.\n", "The result is a `pandas.DataFrame` in which each row is a pair of tracks that matched, with both track ids, the number of matching time steps, and the mean distance between the tracks over their matching period." ] }, { "cell_type": "code", "execution_count": null, "id": "c66913f0-cbe9-41b2-a740-645e74942788", "metadata": {}, "outputs": [], "source": [ "matches = huracanpy.assess.match([era5, ib_1996], names=[\"ERA5\", \"IBTrACS\"])\n", "matches" ] }, { "cell_type": "markdown", "id": "80c90c22-a6ad-4219-9efa-9ba062d7f57d", "metadata": {}, "source": [ "## Computing scores\n", "### Probability of detection (POD)\n", "Proportion of observed tracks that are found in ERA5." 
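, "\n", "\n", "As a sketch of what this score means (assuming, for illustration only, that an IBTrACS track counts as found when it appears in at least one row of `matches`), write $N_\\mathrm{ref}$ for the number of IBTrACS tracks and $N_\\mathrm{found}$ for the number of those tracks that are matched. The symbols are introduced here and are not names from the library:\n", "\n", "$$\\mathrm{POD} = \\frac{N_\\mathrm{found}}{N_\\mathrm{ref}}$$"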
] }, { "cell_type": "code", "execution_count": null, "id": "73d70abf-7ba0-472d-bf25-f5c47d1775fa", "metadata": {}, "outputs": [], "source": [ "huracanpy.assess.pod(matches, ref=ib_1996, ref_name=\"IBTrACS\")" ] }, { "cell_type": "markdown", "id": "a79fa3d3-5365-488f-a967-58d0c53633ea", "metadata": {}, "source": [ "### False alarm rate (FAR)\n", "Proportion of detected (ERA5) tracks that were not observed, i.e. that have no match in IBTrACS." ] }, { "cell_type": "code", "execution_count": null, "id": "31c17f9d-e830-4dd3-8084-f09430f58b2d", "metadata": {}, "outputs": [], "source": [ "huracanpy.assess.far(matches, detected=era5, detected_name=\"ERA5\")" ] }, { "cell_type": "markdown", "id": "01fc6748-fef2-4cb4-9b77-b9d91b122cbe", "metadata": {}, "source": [ "## Venn diagrams\n", "Venn diagrams are a convenient way to show the overlap between two datasets." ] }, { "cell_type": "code", "execution_count": null, "id": "d719ad71-830c-472d-906e-bfe044b390c9", "metadata": {}, "outputs": [], "source": [ "huracanpy.plot.venn([era5, ib_1996], matches, labels=[\"ERA5\", \"IBTrACS\"])" ] } ], "metadata": { "kernelspec": { "display_name": "Python 3 (ipykernel)", "language": "python", "name": "python3" }, "language_info": { "codemirror_mode": { "name": "ipython", "version": 3 }, "file_extension": ".py", "mimetype": "text/x-python", "name": "python", "nbconvert_exporter": "python", "pygments_lexer": "ipython3", "version": "3.12.4" } }, "nbformat": 4, "nbformat_minor": 5 }