Loading tracks from a file or IBTrACS#

One of the main motivations for HuracanPy was to provide a common tool to load the tracks that come from different sources with various incompatible formats.

HuracanPy provides the load function which can be used for loading either tracks from a file on your computer, or from databases (currently, only IBTrACS). Additionally, HuracanPy embeds small data samples from various formats for examples and testing.

[1]:
import huracanpy

Loading tracks from files#

To load tracks from file, the basic syntax is huracanpy.load(filepath, source = "type-of-file"). Below we describe supported format, and potential associated additional options.

CSV#

A CSV is a compact and simple way of storing track data. Each row corresponds to a point, identified by its position in space and time. If you tracks are stored in csv (including if they were outputed from TempestExtremes’ StitchNodes), you can specify the source="csv" argument, or, if your filename ends with csv, it will be detected automatically.

huracanpy.load will read most of the CSV file as it is to output as an xarray.Dataset. There can be a few extra modifications to make sure the output has the variables track_id, time, lon, and lat. For example, in the file used here, the time variable is constructed from year, month, day, and hour.

[2]:
# HuracanPy embeds an example csv file. Here is the content of the file.
!head {huracanpy.example_csv_file}
track_id, year, month, day, hour, i, j, lon, lat, slp, zs, wind10
0, 1980, 1, 6, 6, 482, 417, 120.500000, -14.250000, 9.987638e+04, -1.071180e+01, 1.464815e+01
0, 1980, 1, 6, 12, 476, 419, 119.000000, -14.750000, 9.981100e+04, -1.610522e+01, 1.398848e+01
0, 1980, 1, 6, 18, 476, 420, 119.000000, -15.000000, 9.953694e+04, -4.020874e+01, 1.369575e+01
0, 1980, 1, 7, 0, 477, 420, 119.250000, -15.000000, 9.941456e+04, -5.043206e+01, 1.797812e+01
0, 1980, 1, 7, 6, 478, 422, 119.500000, -15.500000, 9.917319e+04, -7.229797e+01, 1.722728e+01
0, 1980, 1, 7, 12, 475, 424, 118.750000, -16.000000, 9.911612e+04, -7.817557e+01, 1.637649e+01
0, 1980, 1, 7, 18, 474, 426, 118.500000, -16.500000, 9.888225e+04, -9.791051e+01, 1.840417e+01
0, 1980, 1, 8, 0, 473, 427, 118.250000, -16.750000, 9.813756e+04, -1.645126e+02, 2.211769e+01
0, 1980, 1, 8, 6, 473, 430, 118.250000, -17.500000, 9.780431e+04, -1.938985e+02, 2.554702e+01
[3]:
file = (
    huracanpy.example_csv_file
)  # Replace with your file name if necessary (including the .csv extension)
huracanpy.load(file, source="csv")  # Load the file
[3]:
<xarray.Dataset> Size: 7kB
Dimensions:   (record: 99)
Dimensions without coordinates: record
Data variables:
    track_id  (record) int64 792B 0 0 0 0 0 0 0 0 0 0 0 ... 2 2 2 2 2 2 2 2 2 2
    i         (record) int64 792B 482 476 476 477 478 ... 229 230 234 241 249
    j         (record) int64 792B 417 419 420 420 422 ... 501 509 517 528 542
    lon       (record) float64 792B 120.5 119.0 119.0 119.2 ... 58.5 60.25 62.25
    lat       (record) float64 792B -14.25 -14.75 -15.0 ... -39.25 -42.0 -45.5
    slp       (record) float64 792B 9.988e+04 9.981e+04 ... 9.747e+04 9.754e+04
    zs        (record) float64 792B -10.71 -16.11 -40.21 ... -218.5 -211.5
    wind10    (record) float64 792B 14.65 13.99 13.7 17.98 ... 23.69 23.96 23.4
    time      (record) datetime64[ns] 792B 1980-01-06T06:00:00 ... 1980-01-30...

Advanced: You can pass arguments to pd.read_csv through load.

NetCDF#

Similar to CSV, NetCDF data can largely be loaded as is. NetCDF has the disadvantage of not being readable like a CSV, but the advantage that it can better store metadata about variables.

huracanpy.load only recognizes NetCDF files if their name ends with .nc.

HuracanPy assumes that NetCDF files follow the CF convention This allows the load function to identify the TRACK_ID and extend it along the data dimension.

Like loading CSV data, some variables are renamed. In the example the positions are longitude and latitude in the netCDF file, but are renamed to lon and lat.

NB: This supports loading NetCDF files from TRACK, CHAZ or MIt-Open.

[4]:
# HuracanPy embeds an example netcdf file. Here is the content of the file.
!ncdump -h {huracanpy.example_TRACK_netcdf_file} | head -n 30
netcdf tr_trs_pos.2day_addT63vor_addmslp_add925wind_add10mwind.tcident.new {
dimensions:
        tracks = 86 ;
        record = UNLIMITED ; // (4580 currently)
variables:
        int TRACK_ID(tracks) ;
                TRACK_ID:add_fld_num = 10 ;
                TRACK_ID:tot_add_fld_num = 30 ;
                TRACK_ID:loc_flags = 1, 1, 1, 1, 1, 1, 1, 1, 1, 1 ;
                TRACK_ID:cf_role = "trajectory_id" ;
        int FIRST_PT(tracks) ;
        int NUM_PTS(tracks) ;
                NUM_PTS:long_name = "number of obs for this trajectory" ;
                NUM_PTS:sample_dimension = "record" ;
        int index(record) ;
        double time(record) ;
                time:standard_name = "time" ;
                time:long_name = "Time" ;
                time:units = "hours since 1979-01-01 00" ;
                time:time_calendar = " gregorian" ;
                time:start = "1979010100" ;
                time:step = "6" ;
        float longitude(record) ;
                longitude:standard_name = "longitude" ;
                longitude:long_name = "Longitude" ;
                longitude:units = "degrees_east" ;
        float latitude(record) ;
                latitude:standard_name = "latitude" ;
                latitude:long_name = "Latitude" ;
                latitude:units = "degrees_north" ;
[5]:
file = (
    huracanpy.example_TRACK_netcdf_file
)  # Replace with your file name if necessary (including the .nc extension)
huracanpy.load(
    file,
)  # Load the file
/home/docs/checkouts/readthedocs.org/user_builds/huracanpy/envs/stable/lib/python3.12/site-packages/tqdm/auto.py:21: TqdmWarning: IProgress not found. Please update jupyter and ipywidgets. See https://ipywidgets.readthedocs.io/en/stable/user_install.html
  from .autonotebook import tqdm as notebook_tqdm
[5]:
<xarray.Dataset> Size: 843kB
Dimensions:                 (tracks: 86, record: 4580)
Coordinates: (12/17)
    time                    (record) datetime64[ns] 37kB ...
    lon                     (record) float32 18kB ...
    lat                     (record) float32 18kB ...
    longitude_1             (record) float32 18kB ...
    latitude_1              (record) float32 18kB ...
    longitude_2             (record) float32 18kB ...
    ...                      ...
    longitude_5             (record) float32 18kB ...
    latitude_5              (record) float32 18kB ...
    longitude_6             (record) float32 18kB ...
    latitude_6              (record) float32 18kB ...
    longitude_7             (record) float32 18kB ...
    latitude_7              (record) float32 18kB ...
Dimensions without coordinates: tracks, record
Data variables: (12/20)
    FIRST_PT                (tracks) int32 344B ...
    index                   (record) int32 18kB ...
    relative_vorticity      (record) float64 37kB ...
    relative_vorticity_850  (record) float64 37kB ...
    relative_vorticity_700  (record) float64 37kB ...
    relative_vorticity_600  (record) float64 37kB ...
    ...                      ...
    latitude_9              (record) float32 18kB ...
    wind_speed_925          (record) float32 18kB ...
    longitude_10            (record) float32 18kB ...
    latitude_10             (record) float32 18kB ...
    wind_speed_10m          (record) float32 18kB ...
    track_id                (record) int32 18kB 840 840 840 ... 24685 24685
Attributes:
    realm:        atmos
    history:      testing
    Conventions:  CF-1.7
    featureType:  trajectory

TRACK#

TRACK is a cyclone tracker, which output text files with tracks data. Note that TRACK files don’t contain the variable names, instead they are usually described in the filename. Currently huracanpy.load doesn’t try to infer the variable names from the filename. Instead, any extra variables will be named feature_n, where n is between 0 and number of variables minus 1. TRACK also associates extra coordinates with some of these features, these will be loaded as feature_n_longitude and feature_n_latitude.

[6]:
# HuracanPy embeds an example file. Here is the content of the file.
!head {huracanpy.example_TRACK_file}
0
0 0
TRACK_NUM        2 ADD_FLD   10  30 &1111111111
TRACK_ID  840 START_TIME 2022011318
POINT_NUM  17
2022011318 284.087189 29.495327 2.272951e+00 & 2.853501e+02 & 2.937949e+01 & 3.937706e+00 & 2.847127e+02 & 3.015302e+01 & 3.712262e+00 & 2.832617e+02 & 3.014817e+01 & 5.775572e+00 & 2.823937e+02 & 2.967681e+01 & 6.245322e+00 & 2.836030e+02 & 2.968051e+01 & 7.118392e+00 & 2.849838e+02 & 3.022345e+01 & 1.004886e+01 & 2.808263e+02 & 2.867818e+01 & 1.116219e+01 & 2.854711e+02 & 2.825417e+01 & 1.012099e+03 & 2.894062e+02 & 2.964870e+01 & 1.610264e+01 & 2.896875e+02 & 2.964870e+01 & 1.122322e+01 &
2022011400 287.751404 29.994690 4.124446e+00 & 2.877488e+02 & 3.058120e+01 & 6.482213e+00 & 2.873137e+02 & 3.021548e+01 & 6.117030e+00 & 2.866389e+02 & 3.108244e+01 & 4.401947e+00 & 2.845144e+02 & 3.062476e+01 & 6.982445e+00 & 2.840860e+02 & 2.945677e+01 & 1.062076e+01 & 2.844715e+02 & 2.926452e+01 & 1.758659e+01 & 2.847920e+02 & 2.966851e+01 & 1.589604e+01 & 2.893238e+02 & 3.123128e+01 & 1.006213e+03 & 2.894062e+02 & 3.105385e+01 & 2.450230e+01 & 2.896875e+02 & 3.105385e+01 & 1.569316e+01 &
2022011406 292.468933 30.882786 5.977634e+00 & 2.926974e+02 & 3.013615e+01 & 8.458924e+00 & 2.926713e+02 & 3.173274e+01 & 7.631178e+00 & 2.935300e+02 & 3.107848e+01 & 6.291720e+00 & 2.957971e+02 & 3.023536e+01 & 4.058350e+00 & 2.962265e+02 & 3.212355e+01 & 1.384215e+00 & 2.968449e+02 & 3.463542e+01 & 4.254645e-01 & 1.000000e+25 & 1.000000e+25 & 6.851044e+00 & 2.899479e+02 & 3.291520e+01 & 9.969943e+02 & 2.936250e+02 & 2.908664e+01 & 2.588479e+01 & 2.905312e+02 & 3.302106e+01 & 1.604651e+01 &
2022011412 296.248199 33.198776 8.092249e+00 & 2.964735e+02 & 3.323745e+01 & 1.313651e+01 & 2.968956e+02 & 3.321787e+01 & 1.153349e+01 & 2.965216e+02 & 3.360655e+01 & 1.086066e+01 & 2.956291e+02 & 3.347411e+01 & 1.021852e+01 & 2.921989e+02 & 3.376152e+01 & 1.152619e+01 & 2.904768e+02 & 3.346690e+01 & 1.624540e+01 & 2.926208e+02 & 3.140471e+01 & 1.498581e+01 & 2.959172e+02 & 3.310349e+01 & 9.952847e+02 & 2.970000e+02 & 3.302106e+01 & 3.225636e+01 & 3.015000e+02 & 3.498828e+01 & 1.922561e+01 &
2022011418 293.284149 38.289780 1.074499e+01 & 2.928845e+02 & 3.866169e+01 & 1.533360e+01 & 2.929006e+02 & 3.895190e+01 & 1.420384e+01 & 2.963961e+02 & 3.833500e+01 & 1.280716e+01 & 2.972124e+02 & 3.799892e+01 & 1.138840e+01 & 2.970977e+02 & 3.683564e+01 & 1.150583e+01 & 2.929781e+02 & 3.593739e+01 & 1.660423e+01 & 2.910842e+02 & 3.770803e+01 & 8.740053e+00 & 2.933201e+02 & 3.717303e+01 & 9.837216e+02 & 2.995312e+02 & 3.779858e+01 & 4.481586e+01 & 2.998125e+02 & 3.779858e+01 & 2.399883e+01 &
[7]:
file = huracanpy.example_TRACK_file  # Replace with your file name if necessary
huracanpy.load(file, source="TRACK")
[7]:
<xarray.Dataset> Size: 13kB
Dimensions:        (record: 46)
Dimensions without coordinates: record
Data variables: (12/35)
    track_id       (record) int64 368B 840 840 840 840 ... 1601 1601 1601 1601
    time           (record) datetime64[s] 368B 2022-01-13T18:00:00 ... 2022-0...
    lon            (record) float64 368B 284.1 287.8 292.5 ... 114.4 115.8 116.4
    lat            (record) float64 368B 29.5 29.99 30.88 ... 16.3 17.69 17.81
    vorticity      (record) float64 368B 2.273 4.124 5.978 ... 3.062 2.445 1.589
    feature_0_lon  (record) float64 368B 285.4 287.7 292.7 ... 114.9 116.0 116.2
    ...             ...
    feature_8_lon  (record) float64 368B 289.4 289.4 293.6 ... 115.0 118.4 117.3
    feature_8_lat  (record) float64 368B 29.65 31.05 29.09 ... 17.0 20.09 19.53
    feature_8      (record) float64 368B 16.1 24.5 25.88 ... 16.93 18.84 17.18
    feature_9_lon  (record) float64 368B 289.7 289.7 290.5 ... 114.8 115.9 121.2
    feature_9_lat  (record) float64 368B 29.65 31.05 33.02 ... 17.56 18.13 21.78
    feature_9      (record) float64 368B 11.22 15.69 16.05 ... 13.9 14.0 13.73

If you want to load the variables by name, then pass a list of variable names to huracanpy.load. The associated longitudes/latitudes are associated to the respective feature names.

[8]:
file = huracanpy.example_TRACK_file
variable_names = [
    *[f"vorticity_{n}hpa" for n in [850, 700, 600, 500, 400, 300, 200]],
    "mslp",
    "vmax_925hpa",
    "vmax_10m",
]
tracks = huracanpy.load(file, source="TRACK", variable_names=variable_names)
tracks
[8]:
<xarray.Dataset> Size: 13kB
Dimensions:               (record: 46)
Dimensions without coordinates: record
Data variables: (12/35)
    track_id              (record) int64 368B 840 840 840 840 ... 1601 1601 1601
    time                  (record) datetime64[s] 368B 2022-01-13T18:00:00 ......
    lon                   (record) float64 368B 284.1 287.8 ... 115.8 116.4
    lat                   (record) float64 368B 29.5 29.99 30.88 ... 17.69 17.81
    vorticity             (record) float64 368B 2.273 4.124 ... 2.445 1.589
    vorticity_850hpa_lon  (record) float64 368B 285.4 287.7 ... 116.0 116.2
    ...                    ...
    vmax_925hpa_lon       (record) float64 368B 289.4 289.4 ... 118.4 117.3
    vmax_925hpa_lat       (record) float64 368B 29.65 31.05 ... 20.09 19.53
    vmax_925hpa           (record) float64 368B 16.1 24.5 25.88 ... 18.84 17.18
    vmax_10m_lon          (record) float64 368B 289.7 289.7 ... 115.9 121.2
    vmax_10m_lat          (record) float64 368B 29.65 31.05 ... 18.13 21.78
    vmax_10m              (record) float64 368B 11.22 15.69 16.05 ... 14.0 13.73

The TRACK file contains vorticity on multiple levels. While xarray can allow these vorticity profiles to be a multidimensional variable, currently each level is loaded in as a separate variable. To group the variables into one variable you can follow the example below. In the future we will support this grouping in the load function

[9]:
import numpy as np

# Add a pressure as a coordinate (add as a variable and then promote)
tracks["pressure"] = ("pressure", [850, 700, 600, 500, 400, 300, 200])
tracks = tracks.set_coords("pressure")

# Group the various vorticity levels into a single variable
# Do the same for the associated lon/lat
vorticity = np.zeros([tracks.sizes["record"], tracks.sizes["pressure"]])
vorticity_lon = np.zeros_like(vorticity)
vorticity_lat = np.zeros_like(vorticity)
for n, plev in enumerate(tracks.pressure):
    # Use the naming specified when loading the variables
    # Use int(plev) otherwise the whole xarray DataArray details are included in the
    # string
    name = f"vorticity_{int(plev)}hpa"
    vorticity[:, n] = tracks[name]
    vorticity_lon[:, n] = tracks[name + "_lon"]
    vorticity_lat[:, n] = tracks[name + "_lat"]

    # Remove the old variables
    tracks = tracks.drop_vars([name, name + "_lon", name + "_lat"])

# Use the name vorticity
tracks = tracks.assign(
    relative_vorticity=(["record", "pressure"], vorticity),
    relative_vorticity_lon=(["record", "pressure"], vorticity_lon),
    relative_vorticity_lat=(["record", "pressure"], vorticity_lat),
)
tracks
[9]:
<xarray.Dataset> Size: 13kB
Dimensions:                 (record: 46, pressure: 7)
Coordinates:
  * pressure                (pressure) int64 56B 850 700 600 500 400 300 200
Dimensions without coordinates: record
Data variables: (12/17)
    track_id                (record) int64 368B 840 840 840 ... 1601 1601 1601
    time                    (record) datetime64[s] 368B 2022-01-13T18:00:00 ....
    lon                     (record) float64 368B 284.1 287.8 ... 115.8 116.4
    lat                     (record) float64 368B 29.5 29.99 ... 17.69 17.81
    vorticity               (record) float64 368B 2.273 4.124 ... 2.445 1.589
    mslp_lon                (record) float64 368B 285.5 289.3 ... 116.2 117.9
    ...                      ...
    vmax_10m_lon            (record) float64 368B 289.7 289.7 ... 115.9 121.2
    vmax_10m_lat            (record) float64 368B 29.65 31.05 ... 18.13 21.78
    vmax_10m                (record) float64 368B 11.22 15.69 ... 14.0 13.73
    relative_vorticity      (record, pressure) float64 3kB 3.938 ... -0.3459
    relative_vorticity_lon  (record, pressure) float64 3kB 285.4 284.7 ... 123.3
    relative_vorticity_lat  (record, pressure) float64 3kB 29.38 30.15 ... 18.58

TRACK tilt files#

TRACK has a program to calculate the tilt of the vortex at each point. The output of this program is different to normal track files so you need to specify source="track.tilt" to load it. The pressure levels are included in the file so, unlike other TRACK files, the multiple levels of tilt are combined to a single variable with an extra level coordinate in the load function.

[10]:
!head {huracanpy.example_TRACK_tilt_file}
NTRACK 2 NFIELD 7
LEVELS 850.000000 700.000000 600.000000 500.000000 400.000000 300.000000 200.000000
TRACK_NO 1 NUMPT 17
2022011318 1.065571e+00 6.454832e-01 -5.915490e-01 -1.416936e+00 -3.827309e-01 8.881921e-01 -2.951341e+00
2022011400 1.348074e-01 -3.160709e-01 -6.714730e-01 -2.552225e+00 -3.216630e+00 -2.942914e+00 -2.568547e+00
2022011406 -2.896553e-01 6.476268e-01 8.471107e-01 1.938625e+00 3.322993e+00 5.177341e+00 1.000000e+25
2022011412 -4.257355e-02 -2.051671e-01 2.773108e-01 4.657161e-01 1.965898e+00 2.360386e+00 -3.048034e-01
2022011418 1.784784e-01 4.400532e-01 1.241196e+00 1.280085e+00 2.336322e-01 -2.185304e+00 -1.323916e+00
2022011500 2.369373e-01 -8.590921e-02 1.656362e-01 5.434687e-02 -1.076908e-01 -8.658077e-01 7.362208e-01
2022011506 3.424833e-02 -8.671419e-02 5.183957e-02 -7.561980e-02 -1.818331e-01 -1.769198e-01 1.000000e+25
[11]:
huracanpy.load(huracanpy.example_TRACK_tilt_file, source="track.tilt")
[11]:
<xarray.Dataset> Size: 3kB
Dimensions:   (record: 46, levels: 7)
Coordinates:
    pressure  (levels) float64 56B 850.0 700.0 600.0 500.0 400.0 300.0 200.0
Dimensions without coordinates: record, levels
Data variables:
    tilt      (record, levels) float64 3kB 1.066 0.6455 -0.5915 ... 9.242 6.501
    track_id  (record) int64 368B 1 1 1 1 1 1 1 1 1 1 1 ... 2 2 2 2 2 2 2 2 2 2
    time      (record) datetime64[s] 368B 2022-01-13T18:00:00 ... 2022-01-29T...

TempestExtremes/GFDL textual format#

TempestExtremes & GFDL also has their own textual format. Note however that TempestExtremes’ StitchNodes in particular can output csv and we recommend that option.

Variable names: These files can be read with HuracanPy specifying source="te". Because the file themselves do not embed variable names, you may pass them with variable_names.

Tracks from unstructured grid: By default, HuracanPy assumes that your file comes from tracking structured data, hence has two grid indices i and j. If this is not the case (i.e. file comes from tracking unstructured data), then you need to specify tempest_extremes_unstructured=True so that only one index i is read.

Line starting keyword: Finally, if the line starting keyword is not “start”, you can specify it with tempest_extremes_header_str

[12]:
# HuracanPy embeds an example GFDL format file. Here is the content of the file.
!head {huracanpy.example_TE_file}
start   33      1999    9       10      0
        605     36      302.500000      18.000000       1.003906e+05    1.918985e+01    1999    9       10      0
        604     37      302.000000      18.500000       1.002089e+05    2.398492e+01    1999    9       10      6
        602     39      301.000000      19.500000       1.002611e+05    2.235224e+01    1999    9       10      12
        601     40      300.500000      20.000000       1.000482e+05    2.690826e+01    1999    9       10      18
        599     41      299.500000      20.500000       9.994830e+04    2.750533e+01    1999    9       11      0
        598     42      299.000000      21.000000       9.961060e+04    3.118364e+01    1999    9       11      6
        597     43      298.500000      21.500000       9.947160e+04    3.219960e+01    1999    9       11      12
        594     45      297.000000      22.500000       9.921680e+04    3.460081e+01    1999    9       11      18
        592     45      296.000000      22.500000       9.909650e+04    3.649150e+01    1999    9       12      0
[13]:
file = huracanpy.example_TE_file  # Replace with your file name if necessary
huracanpy.load(file, source="te")
[13]:
<xarray.Dataset> Size: 13kB
Dimensions:    (record: 210)
Dimensions without coordinates: record
Data variables:
    track_id   (record) int64 2kB 0 0 0 0 0 0 0 0 0 0 0 ... 7 7 7 7 7 7 7 7 7 7
    i          (record) int64 2kB 605 604 602 601 599 ... 237 237 237 236 235
    j          (record) int64 2kB 36 37 39 40 41 42 43 ... 42 44 45 45 47 48 48
    lon        (record) float64 2kB 302.5 302.0 301.0 ... 118.5 118.0 117.5
    lat        (record) float64 2kB 18.0 18.5 19.5 20.0 ... 22.5 23.5 24.0 24.0
    feature_0  (record) float64 2kB 1.004e+05 1.002e+05 ... 1.002e+05 1.004e+05
    feature_1  (record) float64 2kB 19.19 23.98 22.35 26.91 ... 28.46 29.49 24.0
    time       (record) datetime64[ns] 2kB 1999-09-10 ... 1999-10-09T06:00:00
[14]:
# Providing names
file = huracanpy.example_TE_file
variable_names = ["slp", "wind10"]
huracanpy.load(file, source="te", variable_names=variable_names)
[14]:
<xarray.Dataset> Size: 13kB
Dimensions:   (record: 210)
Dimensions without coordinates: record
Data variables:
    track_id  (record) int64 2kB 0 0 0 0 0 0 0 0 0 0 0 ... 7 7 7 7 7 7 7 7 7 7 7
    i         (record) int64 2kB 605 604 602 601 599 598 ... 237 237 237 236 235
    j         (record) int64 2kB 36 37 39 40 41 42 43 ... 42 44 45 45 47 48 48
    lon       (record) float64 2kB 302.5 302.0 301.0 300.5 ... 118.5 118.0 117.5
    lat       (record) float64 2kB 18.0 18.5 19.5 20.0 ... 22.5 23.5 24.0 24.0
    slp       (record) float64 2kB 1.004e+05 1.002e+05 ... 1.002e+05 1.004e+05
    wind10    (record) float64 2kB 19.19 23.98 22.35 26.91 ... 28.46 29.49 24.0
    time      (record) datetime64[ns] 2kB 1999-09-10 ... 1999-10-09T06:00:00

“Old HURDAT”/ECMWF#

The ECMWF uses track files which format is based on the “old HURDAT” format.

[15]:
file = huracanpy.example_old_HURDAT_file
[16]:
huracanpy.load(
    file,
    source="ecmwf",
)
[16]:
<xarray.Dataset> Size: 12kB
Dimensions:   (record: 183)
Dimensions without coordinates: record
Data variables:
    track_id  (record) int64 1kB 1 1 1 1 1 2 2 2 2 ... 29 29 29 29 29 29 29 29
    time      (record) datetime64[ns] 1kB 2024-05-02T12:00:00 ... 2024-11-16T...
    lat       (record) float64 1kB 20.9 21.7 22.3 25.9 ... 21.2 21.4 21.4 22.0
    lon       (record) float64 1kB 290.2 291.0 292.7 301.5 ... 277.8 280.1 281.8
    wind      (record) int64 1kB 33 33 35 46 46 50 49 ... 36 39 30 37 28 25 29
    pres      (record) int64 1kB 1009 1006 1004 1004 ... 1000 1002 1004 1008
    lat_wind  (record) float64 1kB 22.0 22.3 22.3 25.9 ... 20.9 20.6 20.9 22.9
    lon_wind  (record) float64 1kB 289.4 290.2 293.3 301.7 ... 277.3 280.9 281.2

STORM (No explicit track ID)#

Files output from STORM (Synthetic Tropical cyclOne geneRation Model) are in a CSV format but don’t have a track_id. Instead they have a storm number that starts from zero for each year. From the combination of these two variables we can infer a track_id for each unique track.

[17]:
# HuracanPy embeds an example STORM file. Here is the content of the file.
!head {huracanpy.example_STORM_file}
    0,    9,2001-09-24 03:00:00,    0,    0,    1,8.299999999999997,325.8,998.7605732778942,15.844888,46.29999923706055,    0,  0.0,1258.2188075284528
    0,    9,2001-09-24 06:00:00,    0,    1,    1,  8.3,324.8,998.1, 16.4,45.755293397342456,    0,  0.0,1347.9805312815295
    0,    9,2001-09-24 09:00:00,    0,    2,    1,  8.3,323.8,996.0, 18.0,45.210587557624365,    0,  0.0,1308.4863632066067
    0,    9,2001-09-24 12:00:00,    0,    3,    1,  8.4,322.8,993.7, 19.8,44.66588171790628,    0,  0.0,1288.1054940182134
    0,    9,2001-09-24 15:00:00,    0,    4,    1,  8.6,321.7,990.7, 21.9,44.12117587818819,    0,  0.0,1286.0036278442199
    0,    9,2001-09-24 18:00:00,    0,    5,    1,  8.9,320.7,988.1, 23.8,43.5764700384701,    0,  0.0,1290.444281941886
    0,    9,2001-09-24 21:00:00,    0,    6,    1,  9.2,319.5,985.6, 25.5,43.03176419875201,    0,  0.0,1262.7522879141518
    0,    9,2001-09-25 00:00:00,    0,    7,    1,  9.8,318.4,984.2, 26.4,42.487058359033924,    0,  0.0,1233.5237417749183
    0,    9,2001-09-25 03:00:00,    0,    8,    1, 10.4,317.3,983.4, 26.9,41.94235251931583,    0,  0.0,1168.4006828093866
    0,    9,2001-09-25 06:00:00,    0,    9,    1, 11.0,316.2,981.0, 28.5,41.39764667959774,    0,  0.0,1116.2004200095419
[18]:
# The CSV does not have a header so we need to specify the names of the columns
# Note that these are passed to huracanpy.load as names=, not variable_names=, because
# we are passing them to pandas.load_csv
names = [
    "year",
    "month",
    "time",
    "TC_num",
    "timeStep",
    "basinID",
    "lat",
    "lon",
    "minP",
    "Vmax",
    "Rmax",
    "cat",
    "landfall",
    "dist_land",
]

# The filename does not end with CSV, so we specify the source as CSV
# The variable names are converted to lower case wehn loading from a CSV, so specify
# them as lower case to `infer_track_id`
huracanpy.load(
    huracanpy.example_STORM_file,
    names=names,
    source="csv",
    infer_track_id=["year", "tc_num"],
)
[18]:
<xarray.Dataset> Size: 469kB
Dimensions:    (record: 3909)
Dimensions without coordinates: record
Data variables: (12/15)
    year       (record) int64 31kB 0 0 0 0 0 0 0 0 0 0 0 ... 9 9 9 9 9 9 9 9 9 9
    month      (record) int64 31kB 9 9 9 9 9 9 9 9 9 9 ... 10 10 9 9 9 9 9 9 9
    time       (record) datetime64[s] 31kB 2001-09-24T03:00:00 ... 2001-09-30...
    tc_num     (record) int64 31kB 0 0 0 0 0 0 0 0 0 ... 18 19 19 19 19 19 19 19
    timestep   (record) int64 31kB 0 1 2 3 4 5 6 7 8 9 ... 8 9 10 0 1 2 3 4 5 6
    basinid    (record) int64 31kB 1 1 1 1 1 1 1 1 1 1 1 ... 1 1 1 1 1 1 1 1 1 1
    ...         ...
    vmax       (record) float64 31kB 15.84 16.4 18.0 19.8 ... 15.9 15.9 12.8
    rmax       (record) float64 31kB 46.3 45.76 45.21 ... 233.4 338.9 444.5
    cat        (record) int64 31kB 0 0 0 0 0 0 0 0 0 0 ... 0 0 -1 0 0 0 0 0 0 -1
    landfall   (record) float64 31kB 0.0 0.0 0.0 0.0 0.0 ... 0.0 0.0 0.0 0.0 0.0
    dist_land  (record) float64 31kB 1.258e+03 1.348e+03 ... 838.1 889.0
    track_id   (record) int64 31kB 0 0 0 0 0 0 0 ... 125 125 125 125 125 125 125

Loading IBTrACS#

The International Best Track Archive for Climate Stewardship (IBTrACS) is a reference observational dataset.

HuracanPy embeds two subsets of IBTrACS for offline use, and can also retrieve the latest online version. They can be loaded with huracanpy.load(source="ibtracs") without specifying a filename.

NB: A warning will be raised when you load the data to remind you of the main caveats.

Offline subsets#

By default, HuracanPy will use the offline option. Two subsets of IBTrACS for offline use:

  • “WMO”: Data with the wmo_* variables. The data as reported by the WMO agency responsible for each basin. (Default)

  • “JTWC”: Data with the usa_* variables. The data as recorded by the USA/Joint Typhoon Warning Centre.

NB: These offline files are updated manually by the developers. As such, they may not correspond to the latest versions. If you want the latest version and/or more columns, use the online option below.

[19]:
huracanpy.load(source="ibtracs", ibtracs_subset="wmo")  # WMO subset
/home/docs/checkouts/readthedocs.org/user_builds/huracanpy/envs/stable/lib/python3.12/site-packages/huracanpy/_data/ibtracs.py:119: UserWarning: This offline function loads a light version of IBTrACS which is embedded within the package, based on a file produced manually by the developers.
 It was last updated on the 15th Nov 2024, based on the IBTrACS file at that date.
 It contains only data from 1980 up to the last year with no provisional tracks. All spur tracks were removed. Only 6-hourly time steps were kept.
  warnings.warn(
/home/docs/checkouts/readthedocs.org/user_builds/huracanpy/envs/stable/lib/python3.12/site-packages/huracanpy/_data/ibtracs.py:128: UserWarning: You are loading the IBTrACS-WMO subset. This dataset contains the positions and intensity reported by the WMO agency responsible for each basin
 Be aware of the fact that wind and pressure data is provided as they are in IBTrACS, which means in particular that wind speeds are in knots and averaged over different time periods.
 For more information, see the IBTrACS column documentation at https://www.ncei.noaa.gov/sites/default/files/2021-07/IBTrACS_v04_column_documentation.pdf
  warnings.warn(
[19]:
<xarray.Dataset> Size: 15MB
Dimensions:   (record: 143287)
Dimensions without coordinates: record
Data variables:
    track_id  (record) <U13 7MB '1980001S13173' ... '2022356N09085'
    season    (record) int64 1MB 1980 1980 1980 1980 ... 2022 2022 2022 2022
    basin     (record) <U2 1MB 'SP' 'SP' 'SP' 'SP' 'SP' ... 'NI' 'NI' 'NI' 'NI'
    time      (record) datetime64[s] 1MB 1980-01-01 ... 2022-12-25T06:00:00
    lon       (record) float64 1MB 172.5 172.4 172.5 172.8 ... 82.5 82.2 81.6
    lat       (record) float64 1MB -12.5 -11.9 -11.5 -11.2 ... 9.7 9.3 9.0 8.5
    wind      (record) float64 1MB nan nan nan nan 30.0 ... 25.0 25.0 25.0 25.0
    slp       (record) float64 1MB nan nan nan ... 1.004e+03 1.004e+03 1.004e+03
[20]:
huracanpy.load(source="ibtracs", ibtracs_subset="jtwc")  # JTWC subset
/home/docs/checkouts/readthedocs.org/user_builds/huracanpy/envs/stable/lib/python3.12/site-packages/huracanpy/_data/ibtracs.py:119: UserWarning: This offline function loads a light version of IBTrACS which is embedded within the package, based on a file produced manually by the developers.
 It was last updated on the 15th Nov 2024, based on the IBTrACS file at that date.
 It contains only data from 1980 up to the last year with no provisional tracks. All spur tracks were removed. Only 6-hourly time steps were kept.
  warnings.warn(
[20]:
<xarray.Dataset> Size: 15MB
Dimensions:   (record: 121806)
Dimensions without coordinates: record
Data variables:
    track_id  (record) <U13 6MB '1980001S13173' ... '2022347N12074'
    season    (record) int64 974kB 1980 1980 1980 1980 ... 2022 2022 2022 2022
    basin     (record) <U2 974kB 'SP' 'SP' 'SP' 'SP' ... 'NI' 'NI' 'NI' 'NI'
    time      (record) datetime64[s] 974kB 1980-01-01 ... 2022-12-19
    lat       (record) float64 974kB -12.5 -11.9 -11.5 -11.2 ... 13.1 12.0 11.6
    lon       (record) float64 974kB 172.5 172.4 172.5 172.8 ... 56.4 55.8 55.4
    status    (record) object 974kB nan nan nan nan nan ... 'TD' 'TD' 'TD' 'DB'
    wind      (record) float64 974kB 25.0 25.0 25.0 30.0 ... 25.0 25.0 25.0 20.0
    slp       (record) float64 974kB nan nan nan ... 1.01e+03 1.01e+03 1.012e+03
    sshs_cat  (record) int64 974kB -1 -1 -1 -1 -1 -1 0 ... -1 -1 -1 -1 -1 -1 -3

Online subsets#

You can download the latest IBTrACS subsets from NOAA’s storage by selecting specific subsets. In this case the ibtracs_subset refers to the official IBTrACS subsets:

  • ACTIVE: TCs currently active

  • ALL: Entire IBTrACS database

  • Specific basins: EP, NA, NI, SA, SI, SP, WP

  • last3years: self-explanatory

  • since1980: Entire IBTrACS database since 1980 (advent of satellite era, considered reliable from then on)

Example: huracanpy.load(source="IBTrACS", ibtracs_subset="ALL")

Note that this will fail if you are using a machine that is not currently connected to the internet. HuracanPy developers’ decline all responsibility for any breach in security resulting from using this online option.

huracanpy won’t load locally saved copies of IBTrACS. We would recommend downloading once, subsetting, then saving a copy as CSV or NetCDF with huracanpy.save. Also note that the NetCDF files provided by IBTrACS are not (currently) compatible with huracanpy because the format is different.