{
 "cells": [
  {
   "cell_type": "markdown",
   "id": "654a7e7d-0d37-48dd-9738-4c20c510ad8e",
   "metadata": {},
   "source": [
    "# Loading tracks from a file or IBTrACS\n",
    "\n",
    "One of the main motivations for HuracanPy was to provide a common tool to load the tracks that come from different sources with various incompatible formats.\n",
    "\n",
    "HuracanPy provides the `load` function, which loads tracks either from a file on your computer or from a database (currently, only IBTrACS).\n",
    "Additionally, HuracanPy embeds small data samples from various formats for examples and testing. "
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "9a96b6bf-fce0-4afd-a9b4-346c91ae782b",
   "metadata": {
    "editable": true,
    "slideshow": {
     "slide_type": ""
    },
    "tags": []
   },
   "outputs": [],
   "source": [
    "import huracanpy"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "cedac95c-dd85-4fca-bcf0-8dd2c407f161",
   "metadata": {},
   "source": [
    "## Loading tracks from files"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "3b95771e-9aab-4203-bda5-865465b16d06",
   "metadata": {},
   "source": [
    "To load tracks from a file, the basic syntax is `huracanpy.load(filepath, source=\"type-of-file\")`. Below we describe the supported formats and the additional options associated with each."
   ]
  },
  {
   "cell_type": "markdown",
   "id": "ef691b1f-9fb4-405c-a650-6538f21c75b8",
   "metadata": {},
   "source": [
    "### CSV\n",
    "\n",
    "A CSV is a compact and simple way of storing track data. Each row corresponds to a point, identified by its position in space and time. \n",
    "If your tracks are stored in CSV (including\n",
    "if they were output by TempestExtremes' StitchNodes), you can specify the\n",
    "`source=\"csv\"` argument, or, if your filename ends with *csv*, it will be detected\n",
    "automatically.\n",
    "\n",
    "`huracanpy.load` reads most of the CSV file as is and returns it as an\n",
    "`xarray.Dataset`. A few extra modifications may be applied\n",
    "to make sure the output has the variables `track_id`, `time`, `lon`, and `lat`.\n",
    "For example, in the file used here, the time variable is constructed from\n",
    "`year`, `month`, `day`, and `hour`."
   ]
  },
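  {
   "cell_type": "markdown",
   "id": "7c1d2e3f-4a5b-4c6d-8e7f-0a1b2c3d4e5f",
   "metadata": {},
   "source": [
    "As a rough illustration of how a time variable can be built from separate columns, the sketch below combines hypothetical `year`, `month`, `day`, and `hour` values into timestamps using only the standard library. The rows are made up for the example, and this is not HuracanPy's actual implementation."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "2b9e4f60-1c3d-4e5a-9f7b-6d8c0a1e2f34",
   "metadata": {},
   "outputs": [],
   "source": [
    "from datetime import datetime\n",
    "\n",
    "# Hypothetical rows mimicking the year/month/day/hour columns of a track CSV\n",
    "rows = [\n",
    "    {\"year\": 1996, \"month\": 1, \"day\": 6, \"hour\": 6},\n",
    "    {\"year\": 1996, \"month\": 1, \"day\": 6, \"hour\": 12},\n",
    "]\n",
    "\n",
    "# Combine the four columns of each row into a single timestamp\n",
    "times = [datetime(r[\"year\"], r[\"month\"], r[\"day\"], r[\"hour\"]) for r in rows]\n",
    "times"
   ]
  },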
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "dc604929-c0fa-479e-aa5e-871b71399f4c",
   "metadata": {
    "editable": true,
    "slideshow": {
     "slide_type": ""
    },
    "tags": []
   },
   "outputs": [],
   "source": [
    "# HuracanPy embeds an example csv file. Here is the content of the file.\n",
    "!head {huracanpy.example_csv_file}"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "55bf511e-dc63-4ca1-95fb-e2c1e5116d7b",
   "metadata": {},
   "outputs": [],
   "source": [
    "file = (\n",
    "    huracanpy.example_csv_file\n",
    ")  # Replace with your file name if necessary (including the .csv extension)\n",
    "huracanpy.load(file, source=\"csv\")  # Load the file"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "6613dbc6-02d2-4db4-a38a-cf8a723bd9ac",
   "metadata": {},
   "source": [
    "Advanced: Extra keyword arguments given to `huracanpy.load` are passed on to `pandas.read_csv`."
   ]
  },
  {
   "cell_type": "markdown",
   "id": "1c4ed4e3-7983-42e1-a345-60bdea8418db",
   "metadata": {},
   "source": [
    "### NetCDF\n",
    "\n",
    "Similar to CSV, NetCDF data can largely be loaded as is. NetCDF has the disadvantage of\n",
    "not being readable like a CSV, but the advantage that it can better store metadata about\n",
    "variables.\n",
    "\n",
    "`huracanpy.load` only recognizes NetCDF files if their name ends with `.nc`. \n",
    "\n",
    "HuracanPy assumes that NetCDF files follow the [CF convention](http://cfconventions.org/Data/cf-conventions/cf-conventions-1.11/cf-conventions.html#_contiguous_ragged_array_representation_of_trajectories).\n",
    "This allows the load function to identify the TRACK_ID and extend it along the data\n",
    "dimension. \n",
    "\n",
    "As with CSV data, some variables are renamed. In the example, the positions\n",
    "are `longitude` and `latitude` in the NetCDF file, but are renamed to `lon` and `lat`.\n",
    "\n",
    "NB: This supports loading NetCDF files from TRACK, CHAZ or MIT-Open."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "b1a21795-bbd7-4890-a39e-75d636e36ea5",
   "metadata": {
    "editable": true,
    "scrolled": true,
    "slideshow": {
     "slide_type": ""
    },
    "tags": []
   },
   "outputs": [],
   "source": [
    "# HuracanPy embeds an example netcdf file. Here is the content of the file.\n",
    "!ncdump -h {huracanpy.example_TRACK_netcdf_file} | head -n 30"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "9ba97a62-dc2f-4d2c-838b-555c846890cd",
   "metadata": {
    "scrolled": true
   },
   "outputs": [],
   "source": [
    "file = (\n",
    "    huracanpy.example_TRACK_netcdf_file\n",
    ")  # Replace with your file name if necessary (including the .nc extension)\n",
    "huracanpy.load(\n",
    "    file,\n",
    ")  # Load the file"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "41aeaf8d-3653-49ee-99bb-8b9834106fe8",
   "metadata": {
    "editable": true,
    "slideshow": {
     "slide_type": ""
    },
    "tags": []
   },
   "source": [
    "### TRACK\n",
    "\n",
    "TRACK is a cyclone tracker that outputs text files containing track data. Note that TRACK\n",
    "files don't contain the variable names; instead, they are usually described in the\n",
    "filename. Currently `huracanpy.load` doesn't try to infer the variable names from the\n",
    "filename. Instead, any extra variables are named `feature_n`, where `n` runs from 0 to the\n",
    "number of variables minus 1. TRACK also associates extra coordinates with some of these\n",
    "features; these are loaded as `feature_n_longitude` and `feature_n_latitude`."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "5767fffe-3ae9-489e-832e-0a93986554f1",
   "metadata": {
    "editable": true,
    "slideshow": {
     "slide_type": ""
    },
    "tags": []
   },
   "outputs": [],
   "source": [
    "# HuracanPy embeds an example file. Here is the content of the file.\n",
    "!head {huracanpy.example_TRACK_file}"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "1ebad373-64d9-43cb-ad5a-ed758d71ba61",
   "metadata": {
    "editable": true,
    "slideshow": {
     "slide_type": ""
    },
    "tags": []
   },
   "outputs": [],
   "source": [
    "file = huracanpy.example_TRACK_file  # Replace with your file name if necessary\n",
    "huracanpy.load(file, source=\"TRACK\")"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "10edab87-a4a6-4201-896d-99a5ffacec7e",
   "metadata": {
    "editable": true,
    "slideshow": {
     "slide_type": ""
    },
    "tags": []
   },
   "source": [
    "If you want to load the variables by name, pass a list of variable names to\n",
    "`huracanpy.load` with the `variable_names` argument. The longitudes and latitudes associated\n",
    "with a feature are then named after it, with the suffixes `_lon` and `_lat`."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "df9770ec-6617-46ca-b6b7-3ec7953d0403",
   "metadata": {
    "editable": true,
    "slideshow": {
     "slide_type": ""
    },
    "tags": []
   },
   "outputs": [],
   "source": [
    "file = huracanpy.example_TRACK_file\n",
    "variable_names = [\n",
    "    *[f\"vorticity_{n}hpa\" for n in [850, 700, 600, 500, 400, 300, 200]],\n",
    "    \"mslp\",\n",
    "    \"vmax_925hpa\",\n",
    "    \"vmax_10m\",\n",
    "]\n",
    "tracks = huracanpy.load(file, source=\"TRACK\", variable_names=variable_names)\n",
    "tracks"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "55bcf637-8114-4076-9b3c-8f10c0b21df4",
   "metadata": {
    "editable": true,
    "slideshow": {
     "slide_type": ""
    },
    "tags": []
   },
   "source": [
    "The TRACK file contains vorticity on multiple levels. Although xarray can represent these\n",
    "vorticity profiles as a single multidimensional variable, each level is currently loaded\n",
    "as a separate variable. To group them into one variable, you can follow the\n",
    "example below. In the future we will support this grouping in the load function."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "660e1e17-55e4-4b5e-92a1-efa9cabad8a1",
   "metadata": {
    "editable": true,
    "slideshow": {
     "slide_type": ""
    },
    "tags": []
   },
   "outputs": [],
   "source": [
    "import numpy as np\n",
    "\n",
    "# Add pressure as a coordinate (add as a variable and then promote)\n",
    "tracks[\"pressure\"] = (\"pressure\", [850, 700, 600, 500, 400, 300, 200])\n",
    "tracks = tracks.set_coords(\"pressure\")\n",
    "\n",
    "# Group the various vorticity levels into a single variable\n",
    "# Do the same for the associated lon/lat\n",
    "vorticity = np.zeros([tracks.sizes[\"record\"], tracks.sizes[\"pressure\"]])\n",
    "vorticity_lon = np.zeros_like(vorticity)\n",
    "vorticity_lat = np.zeros_like(vorticity)\n",
    "for n, plev in enumerate(tracks.pressure):\n",
    "    # Use the naming specified when loading the variables\n",
    "    # Use int(plev) otherwise the whole xarray DataArray details are included in the\n",
    "    # string\n",
    "    name = f\"vorticity_{int(plev)}hpa\"\n",
    "    vorticity[:, n] = tracks[name]\n",
    "    vorticity_lon[:, n] = tracks[name + \"_lon\"]\n",
    "    vorticity_lat[:, n] = tracks[name + \"_lat\"]\n",
    "\n",
    "    # Remove the old variables\n",
    "    tracks = tracks.drop_vars([name, name + \"_lon\", name + \"_lat\"])\n",
    "\n",
    "# Store the grouped arrays under the name relative_vorticity\n",
    "tracks = tracks.assign(\n",
    "    relative_vorticity=([\"record\", \"pressure\"], vorticity),\n",
    "    relative_vorticity_lon=([\"record\", \"pressure\"], vorticity_lon),\n",
    "    relative_vorticity_lat=([\"record\", \"pressure\"], vorticity_lat),\n",
    ")\n",
    "tracks"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "102169c0-05a1-403e-8082-748fcd2cb773",
   "metadata": {
    "editable": true,
    "slideshow": {
     "slide_type": ""
    },
    "tags": []
   },
   "source": [
    "### TRACK tilt files\n",
    "TRACK has a program to calculate the tilt of the vortex at each point. The output of\n",
    "this program differs from normal TRACK files, so you need to specify\n",
    "`source=\"track.tilt\"` to load it. The pressure levels are included in the file so,\n",
    "unlike other TRACK files, the multiple levels of tilt are combined into a single variable\n",
    "with an extra level coordinate in the load function."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "dfcd179a-e014-47b5-a26a-4b0100fa0405",
   "metadata": {
    "editable": true,
    "slideshow": {
     "slide_type": ""
    },
    "tags": []
   },
   "outputs": [],
   "source": [
    "!head {huracanpy.example_TRACK_tilt_file}"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "c3b07843-dd9a-4d75-b358-9908e1e9a084",
   "metadata": {
    "editable": true,
    "slideshow": {
     "slide_type": ""
    },
    "tags": []
   },
   "outputs": [],
   "source": [
    "huracanpy.load(huracanpy.example_TRACK_tilt_file, source=\"track.tilt\")"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "cfe83901-a0b0-4fbb-82ae-0933539f347e",
   "metadata": {},
   "source": [
    "### TempestExtremes/GFDL textual format\n",
    "\n",
    "TempestExtremes and GFDL also have their own textual format. Note, however, that TempestExtremes' `StitchNodes` can output CSV, and we recommend that option.\n",
    "\n",
    "*Variable names:* These files can be read with HuracanPy by specifying `source=\"te\"`. Because the files themselves do not embed variable names, you may pass them with `variable_names`.\n",
    "\n",
    "*Tracks from unstructured grid:* By default, HuracanPy assumes that your file comes from tracking structured data and therefore has two grid indices, `i` and `j`. If this is not the case (i.e. the file comes from tracking unstructured data), you need to specify `tempest_extremes_unstructured=True` so that only one index `i` is read.\n",
    "\n",
    "*Line starting keyword:* Finally, if the line starting keyword is not \"`start`\", you can specify it with `tempest_extremes_header_str`."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "cb4e1444-b5d7-4a85-80a9-8031213bc949",
   "metadata": {
    "editable": true,
    "slideshow": {
     "slide_type": ""
    },
    "tags": []
   },
   "outputs": [],
   "source": [
    "# HuracanPy embeds an example GFDL format file. Here is the content of the file.\n",
    "!head {huracanpy.example_TE_file}"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "ccd44e39-cc87-4cb4-92f5-d10f0fb16efc",
   "metadata": {},
   "outputs": [],
   "source": [
    "file = huracanpy.example_TE_file  # Replace with your file name if necessary\n",
    "huracanpy.load(file, source=\"te\")"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "23a30230-ddad-45b2-b344-b12966b88135",
   "metadata": {},
   "outputs": [],
   "source": [
    "# Providing names\n",
    "file = huracanpy.example_TE_file\n",
    "variable_names = [\"slp\", \"wind10\"]\n",
    "huracanpy.load(file, source=\"te\", variable_names=variable_names)"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "d0932274-2aaf-421d-8efa-f0ead1e6790f",
   "metadata": {},
   "source": [
    "### \"Old HURDAT\"/ECMWF\n",
    "ECMWF uses track files whose format is based on the \"old HURDAT\" format. These files can be loaded by specifying `source=\"ecmwf\"`."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "9599a4ff-c241-447b-9adf-05a1906dbeed",
   "metadata": {},
   "outputs": [],
   "source": [
    "file = huracanpy.example_old_HURDAT_file"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "1a2d42b8-5684-42fb-84bb-6f9aeec280a7",
   "metadata": {},
   "outputs": [],
   "source": [
    "huracanpy.load(\n",
    "    file,\n",
    "    source=\"ecmwf\",\n",
    ")"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "cf2034c3-6692-4067-a8b2-d16e327ede3a",
   "metadata": {},
   "source": [
    "### STORM (No explicit track ID)\n",
    "Files output from\n",
    "[STORM (Synthetic Tropical cyclOne geneRation Model)](https://github.com/MRibberink/STORM2.0)\n",
    "are in a CSV format but don't have a `track_id`. Instead, they have a storm number that\n",
    "starts from zero for each year. From the combination of the year and the storm number, we can\n",
    "infer a `track_id` for each unique track."
   ]
  },
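  {
   "cell_type": "markdown",
   "id": "e4a8b1c2-3d5f-4a6b-8c7d-9e0f1a2b3c4d",
   "metadata": {},
   "source": [
    "To illustrate the idea, the sketch below assigns one ID per unique `(year, tc_num)` pair using plain Python. The values are made up for the example, and HuracanPy's own implementation may differ."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "5f6a7b8c-9d0e-4f1a-8b2c-3d4e5f6a7b8c",
   "metadata": {},
   "outputs": [],
   "source": [
    "# Hypothetical (year, tc_num) pairs, one per track point\n",
    "points = [(2000, 0), (2000, 0), (2000, 1), (2001, 0), (2001, 0)]\n",
    "\n",
    "# Assign a new integer ID each time an unseen (year, tc_num) pair appears\n",
    "ids = {}\n",
    "track_id = [ids.setdefault(pair, len(ids)) for pair in points]\n",
    "track_id"
   ]
  },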
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "6a301987-9255-4215-976d-8d200060f5b6",
   "metadata": {},
   "outputs": [],
   "source": [
    "# HuracanPy embeds an example STORM file. Here is the content of the file.\n",
    "!head {huracanpy.example_STORM_file}"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "c295d127-9824-4a16-a205-3eac3301018e",
   "metadata": {},
   "outputs": [],
   "source": [
    "# The CSV does not have a header, so we need to specify the names of the columns.\n",
    "# Note that these are passed to huracanpy.load as names=, not variable_names=, because\n",
    "# they are passed on to pandas.read_csv\n",
    "names = [\n",
    "    \"year\",\n",
    "    \"month\",\n",
    "    \"time\",\n",
    "    \"TC_num\",\n",
    "    \"timeStep\",\n",
    "    \"basinID\",\n",
    "    \"lat\",\n",
    "    \"lon\",\n",
    "    \"minP\",\n",
    "    \"Vmax\",\n",
    "    \"Rmax\",\n",
    "    \"cat\",\n",
    "    \"landfall\",\n",
    "    \"dist_land\",\n",
    "]\n",
    "\n",
    "# The filename does not end with csv, so we specify the source as csv.\n",
    "# The variable names are converted to lower case when loading from a CSV, so specify\n",
    "# them as lower case to `infer_track_id`\n",
    "huracanpy.load(\n",
    "    huracanpy.example_STORM_file,\n",
    "    names=names,\n",
    "    source=\"csv\",\n",
    "    infer_track_id=[\"year\", \"tc_num\"],\n",
    ")"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "bbf77611-7b6d-4928-8ba1-b9c7a1602b97",
   "metadata": {},
   "source": [
    "## Loading IBTrACS\n",
    "The [International Best Track Archive for Climate Stewardship (IBTrACS)](https://www.ncei.noaa.gov/products/international-best-track-archive) is a reference observational dataset.\n",
    "\n",
    "HuracanPy embeds two subsets of IBTrACS for offline use, and can also retrieve the latest online version. \n",
    "They can be loaded with `huracanpy.load(source=\"ibtracs\")` without specifying a filename.\n",
    "\n",
    "NB: A warning will be raised when you load the data to remind you of the main caveats."
   ]
  },
  {
   "cell_type": "markdown",
   "id": "9685d2c9-cdb0-46b0-84b7-4b6645cae681",
   "metadata": {},
   "source": [
    "### Offline subsets\n",
    "By default, HuracanPy uses the offline option. Two subsets of IBTrACS are embedded for offline use:\n",
    "* \"WMO\": Data with the wmo_* variables. The data as reported by the WMO agency responsible for each basin. (Default)\n",
    "* \"JTWC\": Data with the usa_* variables. The data as recorded by the USA/Joint Typhoon Warning Center.\n",
    "\n",
    "NB: These offline files are updated manually by the developers. As such, they may not correspond to the latest versions. If you want the latest version and/or more columns, use the online option below."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "dcf8f6a3-f6e5-4b88-8a54-26ab6af38c9d",
   "metadata": {},
   "outputs": [],
   "source": [
    "huracanpy.load(source=\"ibtracs\", ibtracs_subset=\"wmo\")  # WMO subset"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "f49b9ec9-0dfa-41cb-b860-aba203466e31",
   "metadata": {},
   "outputs": [],
   "source": [
    "huracanpy.load(source=\"ibtracs\", ibtracs_subset=\"jtwc\")  # JTWC subset"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "3d0afeef-3143-4530-ae74-96ef8cdc2707",
   "metadata": {
    "editable": true,
    "slideshow": {
     "slide_type": ""
    },
    "tags": []
   },
   "source": [
    "### Online subsets\n",
    "You can download the latest IBTrACS subsets from NOAA's storage by selecting specific subsets. In this case the `ibtracs_subset` refers to the official IBTrACS subsets:\n",
    "- **ACTIVE**: TCs currently active\n",
    "- **ALL**: Entire IBTrACS database\n",
    "- Specific basins: **EP**, **NA**, **NI**, **SA**, **SI**, **SP**, **WP**\n",
    "- **last3years**: Data from the last three years\n",
    "- **since1980**: Entire IBTrACS database since 1980 (advent of satellite era,\n",
    "  considered reliable from then on)\n",
    "\n",
    "Example: `huracanpy.load(source=\"IBTrACS\", ibtracs_subset=\"ALL\")`\n",
    "\n",
    "Note that this will fail if your machine is not connected to the internet. The HuracanPy developers decline all responsibility for any breach in security resulting from using this online option.\n",
    "\n",
    "`huracanpy` won't load locally saved copies of IBTrACS. We would recommend downloading once, subsetting, then saving a copy as CSV or NetCDF with `huracanpy.save`. Also note that the NetCDF files provided by IBTrACS are not (currently) compatible with `huracanpy` because the format is different."
   ]
  }
 ],
 "metadata": {
  "kernelspec": {
   "display_name": "Python 3 (ipykernel)",
   "language": "python",
   "name": "python3"
  },
  "language_info": {
   "codemirror_mode": {
    "name": "ipython",
    "version": 3
   },
   "file_extension": ".py",
   "mimetype": "text/x-python",
   "name": "python",
   "nbconvert_exporter": "python",
   "pygments_lexer": "ipython3",
   "version": "3.12.4"
  }
 },
 "nbformat": 4,
 "nbformat_minor": 5
}