{ "cells": [ { "cell_type": "markdown", "id": "654a7e7d-0d37-48dd-9738-4c20c510ad8e", "metadata": {}, "source": [ "# Loading tracks from a file or IBTrACS\n", "\n", "One of the main motivations for HuracanPy was to provide a common tool to load the tracks that come from different sources with various incompatible formats.\n", "\n", "HuracanPy provides the `load` function, which can load tracks either from a file on your computer or from a database (currently, only IBTrACS). \n", "Additionally, HuracanPy embeds small data samples in various formats for examples and testing. " ] }, { "cell_type": "code", "execution_count": null, "id": "9a96b6bf-fce0-4afd-a9b4-346c91ae782b", "metadata": { "editable": true, "slideshow": { "slide_type": "" }, "tags": [] }, "outputs": [], "source": [ "import huracanpy" ] }, { "cell_type": "markdown", "id": "cedac95c-dd85-4fca-bcf0-8dd2c407f161", "metadata": {}, "source": [ "## Loading tracks from files" ] }, { "cell_type": "markdown", "id": "3b95771e-9aab-4203-bda5-865465b16d06", "metadata": {}, "source": [ "To load tracks from a file, the basic syntax is `huracanpy.load(filepath, source = \"type-of-file\")`. Below we describe the supported formats and the additional options associated with each." ] }, { "cell_type": "markdown", "id": "ef691b1f-9fb4-405c-a650-6538f21c75b8", "metadata": {}, "source": [ "### CSV\n", "\n", "A CSV is a compact and simple way of storing track data. Each row corresponds to a point, identified by its position in space and time. \n", "If your tracks are stored in CSV (including\n", "if they were output by TempestExtremes' StitchNodes), you can specify the\n", "`source=\"csv\"` argument, or, if your filename ends with *csv*, it will be detected\n", "automatically.\n", "\n", "`huracanpy.load` will read most of the CSV file as it is and return it as an\n", "`xarray.Dataset`. 
There can be a few extra modifications\n", "to make sure the output has the variables `track_id`, `time`, `lon`, and `lat`.\n", "For example, in the file used here, the time variable is constructed from\n", "`year`, `month`, `day`, and `hour`." ] }, { "cell_type": "code", "execution_count": null, "id": "dc604929-c0fa-479e-aa5e-871b71399f4c", "metadata": { "editable": true, "slideshow": { "slide_type": "" }, "tags": [] }, "outputs": [], "source": [ "# HuracanPy embeds an example csv file. Here is the content of the file.\n", "!head {huracanpy.example_csv_file}" ] }, { "cell_type": "code", "execution_count": null, "id": "55bf511e-dc63-4ca1-95fb-e2c1e5116d7b", "metadata": {}, "outputs": [], "source": [ "file = (\n", " huracanpy.example_csv_file\n", ") # Replace with your file name if necessary (including the .csv extension)\n", "huracanpy.load(file, source=\"csv\") # Load the file" ] }, { "cell_type": "markdown", "id": "6613dbc6-02d2-4db4-a38a-cf8a723bd9ac", "metadata": {}, "source": [ "Advanced: You can pass keyword arguments to `pd.read_csv` through `load`." ] }, { "cell_type": "markdown", "id": "1c4ed4e3-7983-42e1-a345-60bdea8418db", "metadata": {}, "source": [ "### NetCDF\n", "\n", "Similar to CSV, NetCDF data can largely be loaded as is. NetCDF has the disadvantage of\n", "not being human-readable like a CSV, but the advantage that it can better store metadata about\n", "variables.\n", "\n", "`huracanpy.load` only recognizes NetCDF files if their name ends with `.nc`. \n", "\n", "HuracanPy assumes that NetCDF files follow the [CF convention](http://cfconventions.org/Data/cf-conventions/cf-conventions-1.11/cf-conventions.html#_contiguous_ragged_array_representation_of_trajectories).\n", "This allows the load function to identify the TRACK_ID and extend it along the data\n", "dimension. \n", "\n", "As with CSV data, some variables are renamed. 
In the example, the positions\n", "are `longitude` and `latitude` in the NetCDF file, but are renamed to `lon` and `lat`.\n", "\n", "NB: This supports loading NetCDF files from TRACK, CHAZ or MIT-Open." ] }, { "cell_type": "code", "execution_count": null, "id": "b1a21795-bbd7-4890-a39e-75d636e36ea5", "metadata": { "editable": true, "scrolled": true, "slideshow": { "slide_type": "" }, "tags": [] }, "outputs": [], "source": [ "# HuracanPy embeds an example netcdf file. Here is the content of the file.\n", "!ncdump -h {huracanpy.example_TRACK_netcdf_file} | head -n 30" ] }, { "cell_type": "code", "execution_count": null, "id": "9ba97a62-dc2f-4d2c-838b-555c846890cd", "metadata": { "scrolled": true }, "outputs": [], "source": [ "file = (\n", " huracanpy.example_TRACK_netcdf_file\n", ") # Replace with your file name if necessary (including the .nc extension)\n", "huracanpy.load(\n", " file,\n", ") # Load the file" ] }, { "cell_type": "markdown", "id": "41aeaf8d-3653-49ee-99bb-8b9834106fe8", "metadata": { "editable": true, "slideshow": { "slide_type": "" }, "tags": [] }, "source": [ "### TRACK\n", "\n", "TRACK is a cyclone tracker that outputs text files with track data. Note that TRACK\n", "files don't contain the variable names; instead they are usually described in the\n", "filename. Currently `huracanpy.load` doesn't try to infer the variable names from the\n", "filename. Instead, any extra variables will be named `feature_n`, where n runs from 0 to the\n", "number of variables minus 1. TRACK also associates extra coordinates with some of these\n", "features; these will be loaded as `feature_n_longitude` and `feature_n_latitude`." ] }, { "cell_type": "code", "execution_count": null, "id": "5767fffe-3ae9-489e-832e-0a93986554f1", "metadata": { "editable": true, "slideshow": { "slide_type": "" }, "tags": [] }, "outputs": [], "source": [ "# HuracanPy embeds an example file. 
Here is the content of the file.\n", "!head {huracanpy.example_TRACK_file}" ] }, { "cell_type": "code", "execution_count": null, "id": "1ebad373-64d9-43cb-ad5a-ed758d71ba61", "metadata": { "editable": true, "slideshow": { "slide_type": "" }, "tags": [] }, "outputs": [], "source": [ "file = huracanpy.example_TRACK_file # Replace with your file name if necessary\n", "huracanpy.load(file, source=\"TRACK\")" ] }, { "cell_type": "markdown", "id": "10edab87-a4a6-4201-896d-99a5ffacec7e", "metadata": { "editable": true, "slideshow": { "slide_type": "" }, "tags": [] }, "source": [ "If you want to load the variables by name, pass a list of variable names to\n", "`huracanpy.load` with `variable_names`. The associated longitudes/latitudes are then named after the respective\n", "variables, with `_lon` and `_lat` suffixes." ] }, { "cell_type": "code", "execution_count": null, "id": "df9770ec-6617-46ca-b6b7-3ec7953d0403", "metadata": { "editable": true, "slideshow": { "slide_type": "" }, "tags": [] }, "outputs": [], "source": [ "file = huracanpy.example_TRACK_file\n", "variable_names = [\n", " *[f\"vorticity_{n}hpa\" for n in [850, 700, 600, 500, 400, 300, 200]],\n", " \"mslp\",\n", " \"vmax_925hpa\",\n", " \"vmax_10m\",\n", "]\n", "tracks = huracanpy.load(file, source=\"TRACK\", variable_names=variable_names)\n", "tracks" ] }, { "cell_type": "markdown", "id": "55bcf637-8114-4076-9b3c-8f10c0b21df4", "metadata": { "editable": true, "slideshow": { "slide_type": "" }, "tags": [] }, "source": [ "The TRACK file contains vorticity on multiple levels. While xarray allows these\n", "vorticity profiles to be stored as a single multidimensional variable, each level is currently loaded\n", "as a separate variable. To group the variables into one variable you can follow the\n", "example below. 
In the future we will support this grouping in the load function." ] }, { "cell_type": "code", "execution_count": null, "id": "660e1e17-55e4-4b5e-92a1-efa9cabad8a1", "metadata": { "editable": true, "slideshow": { "slide_type": "" }, "tags": [] }, "outputs": [], "source": [ "import numpy as np\n", "\n", "# Add pressure as a coordinate (add as a variable and then promote)\n", "tracks[\"pressure\"] = (\"pressure\", [850, 700, 600, 500, 400, 300, 200])\n", "tracks = tracks.set_coords(\"pressure\")\n", "\n", "# Group the various vorticity levels into a single variable\n", "# Do the same for the associated lon/lat\n", "vorticity = np.zeros([tracks.sizes[\"record\"], tracks.sizes[\"pressure\"]])\n", "vorticity_lon = np.zeros_like(vorticity)\n", "vorticity_lat = np.zeros_like(vorticity)\n", "for n, plev in enumerate(tracks.pressure):\n", " # Use the naming specified when loading the variables\n", " # Use int(plev) otherwise the whole xarray DataArray details are included in the\n", " # string\n", " name = f\"vorticity_{int(plev)}hpa\"\n", " vorticity[:, n] = tracks[name]\n", " vorticity_lon[:, n] = tracks[name + \"_lon\"]\n", " vorticity_lat[:, n] = tracks[name + \"_lat\"]\n", "\n", " # Remove the old variables\n", " tracks = tracks.drop_vars([name, name + \"_lon\", name + \"_lat\"])\n", "\n", "# Store the grouped arrays under the name relative_vorticity\n", "tracks = tracks.assign(\n", " relative_vorticity=([\"record\", \"pressure\"], vorticity),\n", " relative_vorticity_lon=([\"record\", \"pressure\"], vorticity_lon),\n", " relative_vorticity_lat=([\"record\", \"pressure\"], vorticity_lat),\n", ")\n", "tracks" ] }, { "cell_type": "markdown", "id": "102169c0-05a1-403e-8082-748fcd2cb773", "metadata": { "editable": true, "slideshow": { "slide_type": "" }, "tags": [] }, "source": [ "### TRACK tilt files\n", "TRACK has a program to calculate the tilt of the vortex at each point. The output of\n", "this program differs from normal TRACK files, so you need to specify\n", "`source=\"track.tilt\"` to load it. 
The pressure levels are included in the file so,\n", "unlike other TRACK files, the multiple levels of tilt are combined into a single variable\n", "with an extra level coordinate by the load function." ] }, { "cell_type": "code", "execution_count": null, "id": "dfcd179a-e014-47b5-a26a-4b0100fa0405", "metadata": { "editable": true, "slideshow": { "slide_type": "" }, "tags": [] }, "outputs": [], "source": [ "!head {huracanpy.example_TRACK_tilt_file}" ] }, { "cell_type": "code", "execution_count": null, "id": "c3b07843-dd9a-4d75-b358-9908e1e9a084", "metadata": { "editable": true, "slideshow": { "slide_type": "" }, "tags": [] }, "outputs": [], "source": [ "huracanpy.load(huracanpy.example_TRACK_tilt_file, source=\"track.tilt\")" ] }, { "cell_type": "markdown", "id": "cfe83901-a0b0-4fbb-82ae-0933539f347e", "metadata": {}, "source": [ "### TempestExtremes/GFDL textual format\n", "\n", "TempestExtremes & GFDL also have their own textual format. Note however that TempestExtremes' `StitchNodes` in particular can output csv and we recommend that option. \n", "\n", "*Variable names:* These files can be read with HuracanPy by specifying `source=\"te\"`. Because the files themselves do not embed variable names, you may pass them with `variable_names`. \n", "\n", "*Tracks from unstructured grid:* By default, HuracanPy assumes that your file comes from tracking structured data, and hence has two grid indices `i` and `j`. If this is not the case (i.e. the file comes from tracking unstructured data), then you need to specify `tempest_extremes_unstructured=True` so that only one index `i` is read. \n", "\n", "*Line starting keyword:* Finally, if the line starting keyword is not \"`start`\", you can specify it with `tempest_extremes_header_str`." ] }, { "cell_type": "code", "execution_count": null, "id": "cb4e1444-b5d7-4a85-80a9-8031213bc949", "metadata": { "editable": true, "slideshow": { "slide_type": "" }, "tags": [] }, "outputs": [], "source": [ "# HuracanPy embeds an example GFDL format file. 
Here is the content of the file.\n", "!head {huracanpy.example_TE_file}" ] }, { "cell_type": "code", "execution_count": null, "id": "ccd44e39-cc87-4cb4-92f5-d10f0fb16efc", "metadata": {}, "outputs": [], "source": [ "file = huracanpy.example_TE_file # Replace with your file name if necessary\n", "huracanpy.load(file, source=\"te\")" ] }, { "cell_type": "code", "execution_count": null, "id": "23a30230-ddad-45b2-b344-b12966b88135", "metadata": {}, "outputs": [], "source": [ "# Providing names\n", "file = huracanpy.example_TE_file\n", "variable_names = [\"slp\", \"wind10\"]\n", "huracanpy.load(file, source=\"te\", variable_names=variable_names)" ] }, { "cell_type": "markdown", "id": "d0932274-2aaf-421d-8efa-f0ead1e6790f", "metadata": {}, "source": [ "### \"Old HURDAT\"/ECMWF\n", "ECMWF uses track files whose format is based on the \"old HURDAT\" format. " ] }, { "cell_type": "code", "execution_count": null, "id": "9599a4ff-c241-447b-9adf-05a1906dbeed", "metadata": {}, "outputs": [], "source": [ "file = huracanpy.example_old_HURDAT_file" ] }, { "cell_type": "code", "execution_count": null, "id": "1a2d42b8-5684-42fb-84bb-6f9aeec280a7", "metadata": {}, "outputs": [], "source": [ "huracanpy.load(\n", " file,\n", " source=\"ecmwf\",\n", ")" ] }, { "cell_type": "markdown", "id": "cf2034c3-6692-4067-a8b2-d16e327ede3a", "metadata": {}, "source": [ "### STORM (No explicit track ID)\n", "Files output from\n", "[STORM (Synthetic Tropical cyclOne geneRation Model)](https://github.com/MRibberink/STORM2.0)\n", "are in a CSV format but don't have a track_id. Instead they have a year and a storm number that\n", "starts from zero for each year. From the combination of these two variables we can\n", "infer a track_id for each unique track." ] }, { "cell_type": "code", "execution_count": null, "id": "6a301987-9255-4215-976d-8d200060f5b6", "metadata": {}, "outputs": [], "source": [ "# HuracanPy embeds an example STORM file. 
Here is the content of the file.\n", "!head {huracanpy.example_STORM_file}" ] }, { "cell_type": "code", "execution_count": null, "id": "c295d127-9824-4a16-a205-3eac3301018e", "metadata": {}, "outputs": [], "source": [ "# The CSV does not have a header so we need to specify the names of the columns\n", "# Note that these are passed to huracanpy.load as names=, not variable_names=, because\n", "# we are passing them to pandas.read_csv\n", "names = [\n", " \"year\",\n", " \"month\",\n", " \"time\",\n", " \"TC_num\",\n", " \"timeStep\",\n", " \"basinID\",\n", " \"lat\",\n", " \"lon\",\n", " \"minP\",\n", " \"Vmax\",\n", " \"Rmax\",\n", " \"cat\",\n", " \"landfall\",\n", " \"dist_land\",\n", "]\n", "\n", "# The filename does not end with CSV, so we specify the source as CSV\n", "# The variable names are converted to lower case when loading from a CSV, so specify\n", "# them as lower case to `infer_track_id`\n", "huracanpy.load(\n", " huracanpy.example_STORM_file,\n", " names=names,\n", " source=\"csv\",\n", " infer_track_id=[\"year\", \"tc_num\"],\n", ")" ] }, { "cell_type": "markdown", "id": "bbf77611-7b6d-4928-8ba1-b9c7a1602b97", "metadata": {}, "source": [ "## Loading IBTrACS\n", "The [International Best Track Archive for Climate Stewardship (IBTrACS)](https://www.ncei.noaa.gov/products/international-best-track-archive) is a reference observational dataset.\n", "\n", "HuracanPy embeds two subsets of IBTrACS for offline use, and can also retrieve the latest online version. \n", "They can be loaded with `huracanpy.load(source=\"ibtracs\")` without specifying a filename.\n", "\n", "NB: A warning will be raised when you load the data to remind you of the main caveats." ] }, { "cell_type": "markdown", "id": "9685d2c9-cdb0-46b0-84b7-4b6645cae681", "metadata": {}, "source": [ "### Offline subsets\n", "By default, HuracanPy will use the offline option. Two subsets of IBTrACS are embedded for offline use: \n", "* \"WMO\": Data with the wmo_* variables. 
The data as reported by the WMO agency responsible for each basin. (Default)\n", "* \"JTWC\": Data with the usa_* variables. The data as recorded by the USA/Joint Typhoon Warning Center.\n", "\n", "NB: These offline files are updated manually by the developers. As such, they may not correspond to the latest versions. If you want the latest version and/or more columns, use the online option below." ] }, { "cell_type": "code", "execution_count": null, "id": "dcf8f6a3-f6e5-4b88-8a54-26ab6af38c9d", "metadata": {}, "outputs": [], "source": [ "huracanpy.load(source=\"ibtracs\", ibtracs_subset=\"wmo\") # WMO subset" ] }, { "cell_type": "code", "execution_count": null, "id": "f49b9ec9-0dfa-41cb-b860-aba203466e31", "metadata": {}, "outputs": [], "source": [ "huracanpy.load(source=\"ibtracs\", ibtracs_subset=\"jtwc\") # JTWC subset" ] }, { "cell_type": "markdown", "id": "3d0afeef-3143-4530-ae74-96ef8cdc2707", "metadata": { "editable": true, "slideshow": { "slide_type": "" }, "tags": [] }, "source": [ "### Online subsets\n", "You can download the latest IBTrACS data from NOAA's storage by selecting a specific subset. In this case `ibtracs_subset` refers to the official IBTrACS subsets:\n", "- **ACTIVE**: TCs currently active\n", "- **ALL**: Entire IBTrACS database\n", "- Specific basins: **EP**, **NA**, **NI**, **SA**, **SI**, **SP**, **WP**\n", "- **last3years**: self-explanatory\n", "- **since1980**: Entire IBTrACS database since 1980 (advent of satellite era,\n", " considered reliable from then on)\n", "\n", "Example: `huracanpy.load(source=\"IBTrACS\", ibtracs_subset=\"ALL\")`\n", "\n", "Note that this will fail if you are using a machine that is not currently connected to the internet. HuracanPy developers decline all responsibility for any breach in security resulting from using this online option.\n", "\n", "`huracanpy` won't load locally saved copies of IBTrACS. 
We would recommend downloading once, subsetting, then saving a copy as CSV or NetCDF with `huracanpy.save`. Also note that the NetCDF files provided by IBTrACS are not (currently) compatible with `huracanpy` because the format is different." ] } ], "metadata": { "kernelspec": { "display_name": "Python 3 (ipykernel)", "language": "python", "name": "python3" }, "language_info": { "codemirror_mode": { "name": "ipython", "version": 3 }, "file_extension": ".py", "mimetype": "text/x-python", "name": "python", "nbconvert_exporter": "python", "pygments_lexer": "ipython3", "version": "3.12.4" } }, "nbformat": 4, "nbformat_minor": 5 }