{
 "cells": [
  {
   "cell_type": "markdown",
   "id": "aef8b2fb-d9b0-47e8-a70e-1024c7e6e2b1",
   "metadata": {},
   "source": [
    "# Comparing two datasets\n",
    "In this part, we compare the set of 1996 tracks (used in the [previous example](set_of_tracks.ipynb)) to IBTrACS which we use as reference.\n",
    "To start with, note that for all that was shown in the previous examples, you can superimpose several sets and therefore compare several sources/models/trackers/etc. Below we show specific functions for matching tracks and computing detection scores."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "98f00647-d338-4bea-b551-511758fa39c2",
   "metadata": {},
   "outputs": [],
   "source": [
    "import huracanpy\n",
    "import matplotlib.pyplot as plt"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "53a94bdb-6d94-47c9-9296-5826986ea0cb",
   "metadata": {},
   "source": [
    "## Load tracks\n",
    "### Load IBTrACS and subset the 1996 tracks with xarray's where method"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "efcb9ebc-bf0e-4432-b803-e53925b4fcdd",
   "metadata": {},
   "outputs": [],
   "source": [
    "ib = huracanpy.load(source=\"ibtracs\")\n",
    "ib_1996 = ib.where(ib.time.dt.year == 1996, drop=True)\n",
    "ib_1996"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "6eb6bcbe-28a1-4d5c-a296-df8d9ed26e08",
   "metadata": {},
   "source": [
    "### Load ERA5 year of tracks"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "fd786725-709d-4d66-9fc2-3f964117fb3b",
   "metadata": {},
   "outputs": [],
   "source": [
    "era5 = huracanpy.load(huracanpy.example_year_file)"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "f9c764fc-9f16-49b1-abcb-08faed69e8e3",
   "metadata": {},
   "source": [
    "## Superimposing several sets on one plot\n",
    "To start with, note that for all that was shown above, you can superimpose several sets and therefore compare several sources/models/trackers/etc. Here we only show one example.\n",
    "\n",
    "### Compute lifetime maximum intensity (LMI) for both sets"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "a8a683b6-fea5-4de3-bf85-a04b5568ab2d",
   "metadata": {},
   "outputs": [],
   "source": [
    "lmi_wind_ib = ib_1996.wind.groupby(ib_1996.track_id).max()\n",
    "# Convert kn to m/s\n",
    "lmi_wind_ib = lmi_wind_ib / 1.94\n",
    "lmi_wind_era5 = era5.wind10.groupby(era5.track_id).max()"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "a0db6b16-6c07-4b47-9e86-3e82b1bbc0b1",
   "metadata": {},
   "source": [
    "### Plot both histograms"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "a226725f-86df-4a6b-8e9f-ea40acc5cefa",
   "metadata": {},
   "outputs": [],
   "source": [
    "bins = range(10, 65 + 1, 5)\n",
    "lmi_wind_ib.plot.hist(bins=bins, color=\"k\", label=\"IBTrACS\", alpha=0.8)\n",
    "lmi_wind_era5.plot.hist(bins=bins, label=\"ERA5\", alpha=0.8)\n",
    "plt.legend()\n",
    "plt.xlabel(\"Lifetime maximum wind speed / m/s\")\n",
    "plt.ylabel(\"Number of tracks\")"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "da63e862-5675-4277-bc20-e7e9d9dd5168",
   "metadata": {},
   "source": [
    "## Matching tracks\n",
    "Use `huracanpy.assess.match` to find matching tracks.\n",
    "The results is a `pandas.DataFrame` where each row is a pair of tracks that matched, with both ids, the number of time steps and the mean distance between the tracks over their matching period."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "c66913f0-cbe9-41b2-a740-645e74942788",
   "metadata": {},
   "outputs": [],
   "source": [
    "matches = huracanpy.assess.match([era5, ib_1996], names=[\"ERA5\", \"IBTrACS\"])\n",
    "matches"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "80c90c22-a6ad-4219-9efa-9ba062d7f57d",
   "metadata": {},
   "source": [
    "## Computing scores\n",
    "### Probability of detection (POD)\n",
    "Proportion of observed tracks that are found in ERA5."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "73d70abf-7ba0-472d-bf25-f5c47d1775fa",
   "metadata": {},
   "outputs": [],
   "source": [
    "huracanpy.assess.pod(matches, ref=ib_1996, ref_name=\"IBTrACS\")"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "a79fa3d3-5365-488f-a967-58d0c53633ea",
   "metadata": {},
   "source": [
    "### False alarm rate (FAR)\n",
    "Proportion of detected tracks that were not observed"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "31c17f9d-e830-4dd3-8084-f09430f58b2d",
   "metadata": {},
   "outputs": [],
   "source": [
    "huracanpy.assess.far(matches, detected=era5, detected_name=\"ERA5\")"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "01fc6748-fef2-4cb4-9b77-b9d91b122cbe",
   "metadata": {},
   "source": [
    "## Venn diagrams\n",
    "Venn diagrams are a convenient way to show the overlap between two datasets."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "d719ad71-830c-472d-906e-bfe044b390c9",
   "metadata": {},
   "outputs": [],
   "source": [
    "huracanpy.plot.venn([era5, ib_1996], matches, labels=[\"ERA5\", \"IBTrACS\"])"
   ]
  }
 ],
 "metadata": {
  "kernelspec": {
   "display_name": "Python 3 (ipykernel)",
   "language": "python",
   "name": "python3"
  },
  "language_info": {
   "codemirror_mode": {
    "name": "ipython",
    "version": 3
   },
   "file_extension": ".py",
   "mimetype": "text/x-python",
   "name": "python",
   "nbconvert_exporter": "python",
   "pygments_lexer": "ipython3",
   "version": "3.12.4"
  }
 },
 "nbformat": 4,
 "nbformat_minor": 5
}