{ "cells": [ { "cell_type": "code", "execution_count": null, "id": "6d4b7b6b-1126-49ec-b8b4-337321000fe8", "metadata": { "editable": true, "nbsphinx": "hidden", "slideshow": { "slide_type": "" }, "tags": [] }, "outputs": [], "source": [ "%xmode Minimal\n", "import numpy as np\n", "\n", "np.set_printoptions(threshold=5)" ] },
{ "cell_type": "markdown", "id": "e32f671b-8fb2-40e8-aa4d-fbd17e2eae90", "metadata": { "editable": true, "slideshow": { "slide_type": "" }, "tags": [] }, "source": [ "# Subsetting Data\n", "\n", "## Using existing xarray functions\n", "Tracks are loaded as an `xarray.Dataset`, which has many built-in methods for subsetting data.\n", "For indexing, see [xarray indexing](https://docs.xarray.dev/en/stable/user-guide/indexing.html).\n", "\n", "For more specific selections of data, the best method is to use\n", "[xarray.Dataset.where](https://docs.xarray.dev/en/stable/generated/xarray.Dataset.where.html)\n", "with the argument `drop=True`. For example:" ] },
{ "cell_type": "code", "execution_count": null, "id": "5e345c67-31c2-443b-af6f-36024ff3f5eb", "metadata": { "editable": true, "slideshow": { "slide_type": "" }, "tags": [] }, "outputs": [], "source": [ "import huracanpy\n", "\n", "tracks = huracanpy.load(huracanpy.example_csv_file)\n", "\n", "# Select all points with longitude > 60\n", "print(tracks.lon, \"\\n\")\n", "tracks_subset = tracks.where(tracks.lon > 60, drop=True)\n", "print(tracks_subset.lon)" ] },
{ "cell_type": "markdown", "id": "25225038-3c80-4fdd-a9fd-0e23e5f8fd93", "metadata": { "editable": true, "slideshow": { "slide_type": "" }, "tags": [] }, "source": [ "## Selecting times\n", "Generally, the `time` array will be loaded as an\n", "[np.datetime64](https://numpy.org/doc/stable/reference/arrays.datetime.html)\n", "array. This means it can't be compared directly with a standard `datetime.datetime` object:" ] },
{ "cell_type": "code", "execution_count": null, "id": "0c593b94-c1dc-4dde-bc1e-4271ec06b1ba", "metadata": { "editable": true, "slideshow": { "slide_type": "" }, "tags": [ "raises-exception" ] }, "outputs": [], "source": [ "import datetime\n", "\n", "# Try to select a subset of times based on datetime\n", "print(tracks.time)\n", "tracks_subset = tracks.where(tracks.time > datetime.datetime(1980, 1, 10), drop=True)" ] },
{ "cell_type": "markdown", "id": "2401c2a8-7e7e-4c60-95d3-d8b56451f3f2", "metadata": {}, "source": [ "However, the same comparison works with `np.datetime64`; the syntax is just slightly different:" ] },
{ "cell_type": "code", "execution_count": null, "id": "df00739b-a637-44cc-8eac-5855d0f87d21", "metadata": { "editable": true, "slideshow": { "slide_type": "" }, "tags": [] }, "outputs": [], "source": [ "import numpy as np\n", "\n", "tracks_subset = tracks.where(tracks.time > np.datetime64(\"1980-01-10\"), drop=True)\n", "print(tracks_subset.time)" ] },
{ "cell_type": "markdown", "id": "6b275199-ae33-474d-95b9-1fdf6c7dfb1d", "metadata": { "editable": true, "slideshow": { "slide_type": "" }, "tags": [] }, "source": [ "Note that this isn't always the case. If the tracks are loaded with a different\n", "calendar, the times will use [cftime](https://unidata.github.io/cftime/) objects,\n", "which xarray does not convert to `datetime64`."
] },
{ "cell_type": "code", "execution_count": null, "id": "a8ac434d-a9b8-48d4-925e-c6d510d68ea1", "metadata": { "editable": true, "slideshow": { "slide_type": "" }, "tags": [] }, "outputs": [], "source": [ "# The tracks don't actually use a 360_day calendar.\n", "# I'm just passing this as an argument to show an example of it loading this way\n", "tracks = huracanpy.load(\n", "    huracanpy.example_TRACK_file, source=\"track\", track_calendar=\"360_day\"\n", ")\n", "print(tracks.time)" ] },
{ "cell_type": "markdown", "id": "574bcf5b-496b-492a-a39a-e69aff8aa08c", "metadata": { "editable": true, "slideshow": { "slide_type": "" }, "tags": [] }, "source": [ "In this case, neither the `datetime` nor the `datetime64` comparison will work, and you\n", "have to compare against a `cftime.datetime` object with the same calendar:" ] },
{ "cell_type": "code", "execution_count": null, "id": "39842f6a-f071-4f00-9dad-8afc45224120", "metadata": { "editable": true, "slideshow": { "slide_type": "" }, "tags": [ "raises-exception" ] }, "outputs": [], "source": [ "tracks_subset = tracks.where(tracks.time > datetime.datetime(1980, 1, 10), drop=True)" ] },
{ "cell_type": "code", "execution_count": null, "id": "87bf2027-8b9c-42ab-8498-d64f5023182c", "metadata": { "editable": true, "slideshow": { "slide_type": "" }, "tags": [ "raises-exception" ] }, "outputs": [], "source": [ "tracks_subset = tracks.where(tracks.time > np.datetime64(\"1980-01-10\"), drop=True)" ] },
{ "cell_type": "code", "execution_count": null, "id": "1e1eab5c-5dac-4ab2-80c5-673e8a0940fd", "metadata": { "editable": true, "slideshow": { "slide_type": "" }, "tags": [] }, "outputs": [], "source": [ "import cftime\n", "\n", "tracks_subset = tracks.where(\n", "    tracks.time > cftime.datetime(1980, 1, 10, calendar=\"360_day\"), drop=True\n", ")\n", "print(tracks_subset.time)" ] },
{ "cell_type": "markdown", "id": "66405828-51e8-423b-816f-9b23ed575594", "metadata": { "editable": true, "slideshow": { "slide_type": "" }, "tags": [] }, 
"source": [ "## Selecting an individual track\n", "\n", "This can be done fairly easily with `groupby`:" ] },
{ "cell_type": "code", "execution_count": null, "id": "5a674435-8975-46a2-8045-88658a4877e2", "metadata": { "editable": true, "slideshow": { "slide_type": "" }, "tags": [] }, "outputs": [], "source": [ "print(np.unique(tracks.track_id))\n", "track = tracks.groupby(\"track_id\")[840]\n", "print(track)" ] },
{ "cell_type": "markdown", "id": "01542bc2-76cf-49ce-b5ba-9fbd49555a37", "metadata": { "editable": true, "slideshow": { "slide_type": "" }, "tags": [] }, "source": [ "However, this can be slow if you have a large number of tracks and are not using\n", "`groupby` for anything else. Instead, you can use `sel_id` to quickly get an individual\n", "track:" ] },
{ "cell_type": "code", "execution_count": null, "id": "702467d8-f21b-4ea1-8b32-af3348c0618f", "metadata": { "editable": true, "slideshow": { "slide_type": "" }, "tags": [] }, "outputs": [], "source": [ "track = huracanpy.sel_id(tracks, tracks.track_id, 840)\n", "print(track)" ] },
{ "cell_type": "markdown", "id": "84113888-e2b6-4314-9355-f43437f6b5a4", "metadata": { "editable": true, "slideshow": { "slide_type": "" }, "tags": [] }, "source": [ "## Subsetting by track\n", "To apply a criterion to each track in the dataset, use\n", "[huracanpy.trackswhere](../api/_autosummary/huracanpy.trackswhere.rst)" ] },
{ "cell_type": "code", "execution_count": null, "id": "22bfae72-04e4-4327-8489-14160f4df557", "metadata": { "editable": true, "slideshow": { "slide_type": "" }, "tags": [] }, "outputs": [], "source": [ "# Add storm category by pressure to each track and filter out those that don't reach\n", "# category 2\n", "tracks = huracanpy.load(huracanpy.example_csv_file)\n", "tracks = tracks.hrcn.add_pressure_category(slp_units=\"Pa\")\n", "\n", "# Show the categories for each storm\n", "# Storms 0 and 2 reach category 2, and storm 1 only reaches category 1\n", "for track_id, track in tracks.groupby(\"track_id\"):\n", "    print(\"track\", track_id, \"category\", int(track.pressure_category.max()))\n", "\n", "# Subset the tracks by category threshold, which removes track 1\n", "tracks_subset = huracanpy.trackswhere(\n", "    tracks, tracks.track_id, lambda track: track.pressure_category.max() >= 2\n", ")\n", "\n", "# Confirm that track 1 has been filtered out\n", "print(\"\\n\", \"tracks remaining -\", set(tracks_subset.track_id.data))" ] } ],
"metadata": { "kernelspec": { "display_name": "Python 3 (ipykernel)", "language": "python", "name": "python3" }, "language_info": { "codemirror_mode": { "name": "ipython", "version": 3 }, "file_extension": ".py", "mimetype": "text/x-python", "name": "python", "nbconvert_exporter": "python", "pygments_lexer": "ipython3", "version": "3.12.4" } }, "nbformat": 4, "nbformat_minor": 5 }