{
 "cells": [
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "6d4b7b6b-1126-49ec-b8b4-337321000fe8",
   "metadata": {
    "editable": true,
    "nbsphinx": "hidden",
    "slideshow": {
     "slide_type": ""
    },
    "tags": []
   },
   "outputs": [],
   "source": [
    "%xmode Minimal\n",
    "import numpy as np\n",
    "\n",
    "np.set_printoptions(threshold=5)"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "e32f671b-8fb2-40e8-aa4d-fbd17e2eae90",
   "metadata": {
    "editable": true,
    "slideshow": {
     "slide_type": ""
    },
    "tags": []
   },
   "source": [
    "# Subsetting Data\n",
    "\n",
    "## Using existing xarray functions\n",
    "Tracks are loaded as an `xarray.Dataset`, which has many built-in methods for subsetting data.\n",
    "For indexing, see [xarray indexing](https://docs.xarray.dev/en/stable/user-guide/indexing.html).\n",
    "\n",
    "For more specific selections, use\n",
    "[xarray.Dataset.where](https://docs.xarray.dev/en/stable/generated/xarray.Dataset.where.html)\n",
    "with the argument `drop=True`, which drops the points that don't satisfy the condition\n",
    "instead of masking them with NaN. For example:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "5e345c67-31c2-443b-af6f-36024ff3f5eb",
   "metadata": {
    "editable": true,
    "slideshow": {
     "slide_type": ""
    },
    "tags": []
   },
   "outputs": [],
   "source": [
    "import huracanpy\n",
    "\n",
    "tracks = huracanpy.load(huracanpy.example_csv_file)\n",
    "\n",
    "# Select all points with longitude > 60\n",
    "print(tracks.lon, \"\\n\")\n",
    "tracks_subset = tracks.where(tracks.lon > 60, drop=True)\n",
    "print(tracks_subset.lon)"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "25225038-3c80-4fdd-a9fd-0e23e5f8fd93",
   "metadata": {
    "editable": true,
    "slideshow": {
     "slide_type": ""
    },
    "tags": []
   },
   "source": [
    "## Selecting times\n",
    "Generally, the `time` array is loaded as an\n",
    "[np.datetime64](https://numpy.org/doc/stable/reference/arrays.datetime.html)\n",
    "array. This means that comparing it with a standard `datetime.datetime` object does not work:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "0c593b94-c1dc-4dde-bc1e-4271ec06b1ba",
   "metadata": {
    "editable": true,
    "slideshow": {
     "slide_type": ""
    },
    "tags": [
     "raises-exception"
    ]
   },
   "outputs": [],
   "source": [
    "import datetime\n",
    "\n",
    "# Try to select a subset of times based on datetime\n",
    "print(tracks.time)\n",
    "tracks_subset = tracks.where(tracks.time > datetime.datetime(1980, 1, 10), drop=True)"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "2401c2a8-7e7e-4c60-95d3-d8b56451f3f2",
   "metadata": {},
   "source": [
    "However, the same comparison works with `np.datetime64`; the syntax is just slightly different:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "df00739b-a637-44cc-8eac-5855d0f87d21",
   "metadata": {
    "editable": true,
    "slideshow": {
     "slide_type": ""
    },
    "tags": []
   },
   "outputs": [],
   "source": [
    "import numpy as np\n",
    "\n",
    "tracks_subset = tracks.where(tracks.time > np.datetime64(\"1980-01-10\"), drop=True)\n",
    "print(tracks_subset.time)"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "6b275199-ae33-474d-95b9-1fdf6c7dfb1d",
   "metadata": {
    "editable": true,
    "slideshow": {
     "slide_type": ""
    },
    "tags": []
   },
   "source": [
    "Note that this isn't always the case. If the tracks are loaded with a different\n",
    "calendar, the times are stored as [cftime](https://unidata.github.io/cftime/)\n",
    "objects, which xarray does not convert to `datetime64`."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "a8ac434d-a9b8-48d4-925e-c6d510d68ea1",
   "metadata": {
    "editable": true,
    "slideshow": {
     "slide_type": ""
    },
    "tags": []
   },
   "outputs": [],
   "source": [
    "# The example tracks don't actually use a 360_day calendar.\n",
    "# The calendar argument is passed here only to show what loading with cftime times looks like\n",
    "tracks = huracanpy.load(\n",
    "    huracanpy.example_TRACK_file, source=\"track\", track_calendar=\"360_day\"\n",
    ")\n",
    "print(tracks.time)"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "574bcf5b-496b-492a-a39a-e69aff8aa08c",
   "metadata": {
    "editable": true,
    "slideshow": {
     "slide_type": ""
    },
    "tags": []
   },
   "source": [
    "In this case, neither the `datetime` nor the `datetime64` comparison works. You\n",
    "have to compare with a `cftime.datetime` object that uses the same calendar:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "39842f6a-f071-4f00-9dad-8afc45224120",
   "metadata": {
    "editable": true,
    "slideshow": {
     "slide_type": ""
    },
    "tags": [
     "raises-exception"
    ]
   },
   "outputs": [],
   "source": [
    "tracks_subset = tracks.where(tracks.time > datetime.datetime(1980, 1, 10), drop=True)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "87bf2027-8b9c-42ab-8498-d64f5023182c",
   "metadata": {
    "editable": true,
    "slideshow": {
     "slide_type": ""
    },
    "tags": [
     "raises-exception"
    ]
   },
   "outputs": [],
   "source": [
    "tracks_subset = tracks.where(tracks.time > np.datetime64(\"1980-01-10\"), drop=True)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "1e1eab5c-5dac-4ab2-80c5-673e8a0940fd",
   "metadata": {
    "editable": true,
    "slideshow": {
     "slide_type": ""
    },
    "tags": []
   },
   "outputs": [],
   "source": [
    "import cftime\n",
    "\n",
    "tracks_subset = tracks.where(\n",
    "    tracks.time > cftime.datetime(1980, 1, 10, calendar=\"360_day\"), drop=True\n",
    ")\n",
    "print(tracks_subset.time)"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "66405828-51e8-423b-816f-9b23ed575594",
   "metadata": {
    "editable": true,
    "slideshow": {
     "slide_type": ""
    },
    "tags": []
   },
   "source": [
    "## Selecting an individual track\n",
    "\n",
    "An individual track can be selected with `groupby`:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "5a674435-8975-46a2-8045-88658a4877e2",
   "metadata": {
    "editable": true,
    "slideshow": {
     "slide_type": ""
    },
    "tags": []
   },
   "outputs": [],
   "source": [
    "print(np.unique(tracks.track_id))\n",
    "track = tracks.groupby(\"track_id\")[840]\n",
    "print(track)"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "01542bc2-76cf-49ce-b5ba-9fbd49555a37",
   "metadata": {
    "editable": true,
    "slideshow": {
     "slide_type": ""
    },
    "tags": []
   },
   "source": [
    "However, this can be fairly slow if you have a large number of tracks and are not using\n",
    "`groupby` for anything else. Instead, you can use `sel_id` to quickly get an individual\n",
    "track:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "702467d8-f21b-4ea1-8b32-af3348c0618f",
   "metadata": {
    "editable": true,
    "slideshow": {
     "slide_type": ""
    },
    "tags": []
   },
   "outputs": [],
   "source": [
    "track = huracanpy.sel_id(tracks, tracks.track_id, 840)\n",
    "print(track)"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "84113888-e2b6-4314-9355-f43437f6b5a4",
   "metadata": {
    "editable": true,
    "slideshow": {
     "slide_type": ""
    },
    "tags": []
   },
   "source": [
    "## Subsetting by track\n",
    "To apply a criterion to each track in the dataset, use\n",
    "[huracanpy.trackswhere](../api/_autosummary/huracanpy.trackswhere.rst)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "22bfae72-04e4-4327-8489-14160f4df557",
   "metadata": {
    "editable": true,
    "slideshow": {
     "slide_type": ""
    },
    "tags": []
   },
   "outputs": [],
   "source": [
    "# Add a storm category by pressure to each track and filter out the tracks that don't\n",
    "# reach category 2\n",
    "tracks = huracanpy.load(huracanpy.example_csv_file)\n",
    "tracks = tracks.hrcn.add_pressure_category(slp_units=\"Pa\")\n",
    "\n",
    "# Show the categories for each storm\n",
    "# Storms 0 and 2 reach category 2, and storm 1 only reaches category 1\n",
    "for track_id, track in tracks.groupby(\"track_id\"):\n",
    "    print(\"track\", track_id, \"category\", int(track.pressure_category.max()))\n",
    "\n",
    "# Subset the tracks by category threshold which will remove track 1\n",
    "track_subset = huracanpy.trackswhere(\n",
    "    tracks, tracks.track_id, lambda track: track.pressure_category.max() >= 2\n",
    ")\n",
    "\n",
    "# Confirm that track 1 has been filtered out\n",
    "print(\"\\n\", \"tracks remaining -\", set(track_subset.track_id.data))"
   ]
  }
 ],
 "metadata": {
  "kernelspec": {
   "display_name": "Python 3 (ipykernel)",
   "language": "python",
   "name": "python3"
  },
  "language_info": {
   "codemirror_mode": {
    "name": "ipython",
    "version": 3
   },
   "file_extension": ".py",
   "mimetype": "text/x-python",
   "name": "python",
   "nbconvert_exporter": "python",
   "pygments_lexer": "ipython3",
   "version": "3.12.4"
  }
 },
 "nbformat": 4,
 "nbformat_minor": 5
}