6. Reformat MET Stat and TCST Data

6.1. Description

The METreformat module provides support for reformatting/rearranging MET .stat/.tcst output into METplotpy-readable input formats. The MET .stat/.tcst ASCII output generated by MET tools such as the point-stat, grid-stat, ensemble-stat, and tc-pairs tools may contain ASCII columnar data from multiple line types (determined by settings in the MET configuration file).

Currently, there are MET linetypes that do not have reformatter support. When an unsupported linetype is requested, an appropriate error message will be generated.

In the MET .stat and .tcst files, rows of data can have differing numbers of columns. All MET .stat line types have common columns (refer to table 11.1 in the MET User’s Guide) that are labelled (i.e. the columns have headers/names). The remaining columns are unlabelled. When the .stat output from the MET point-stat, grid-stat, ensemble-stat, or tc-pairs tools is reformatted, these unlabelled columns are rearranged into a format appropriate for the METplotpy plot of interest (based on the line_type setting in the YAML configuration file).

The format of input data used by the METplotpy plots is influenced by METviewer. When METviewer generates plots, the MET .stat input data is first loaded into the METviewer database. A database query is then performed to filter the data. The query is based on:

  • the selected values for the dependent variables (i.e. forecast variable) and the associated statistics of interest

  • variables (i.e. model and specific model values)

  • fixed values

Any requested aggregation statistics (e.g. mean, median) are calculated by invoking the METcalcpy agg_stat.py code. The format of the query result may vary, based on the line type and whether aggregation statistics were calculated. METviewer invokes the appropriate METplotpy plot script using the database query results as input, and it is this format that is expected by the METplotpy scripts.

Users have the option to generate plots from the command line, bypassing METviewer and its database. However, the MET stat-analysis tool will need to be used to perform any necessary filtering of data (e.g. by any combination of times, models, regions) prior to reformatting. The METcalcpy agg_stat.py module is used after reformatting to calculate aggregation statistics (e.g. total, mean, median, confidence intervals for a specific statistic). Not all data requires the calculation of aggregation statistics (e.g. histogram plots).

Plots that are generated from the command line require a YAML configuration file. The .stat output from the point-stat, grid-stat, ensemble-stat, or tc-pairs tool must be reformatted before invoking the METplotpy scripts from the command line.

6.1.1. Description of Formats:

The reformatted data contains additional columns, based on the type of plot and line type.

Reformatting for line, bar, box, contour, performance diagram, revision box, revision series, and taylor diagram

After reformatting, the .stat data contains the following columns (replacing the unlabelled columns in the original .stat file and converting from wide to long format):

Two columns for statistics

  • stat_name

  • stat_value

Four columns for confidence levels (normal and bootstrap)

  • stat_bcl

    • lower level bootstrap confidence limit

  • stat_bcu

    • upper level bootstrap confidence limit

  • stat_ncl

    • lower level normal confidence limit

  • stat_ncu

    • upper level normal confidence limit
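
The wide-to-long conversion described above can be illustrated with a minimal sketch. This is not the METreformat implementation: the function name, the reduced set of common columns, and the sample values are hypothetical, and the sketch assumes that confidence limits, where present, are stored in columns named after the statistic with an _NCL, _NCU, _BCL, or _BCU suffix.

```python
# Hypothetical illustration only: not the METreformat code.
# A reduced, illustrative subset of the labelled (common) columns.
COMMON_COLS = ["model", "fcst_lead", "fcst_var"]


def wide_to_long(row, stat_names):
    """Turn one wide record into one long record per statistic.

    Assumes confidence limits, when present, are stored under
    <STAT>_NCL, <STAT>_NCU, <STAT>_BCL and <STAT>_BCU.
    """
    long_rows = []
    for stat in stat_names:
        # Carry the common columns over unchanged.
        record = {col: row[col] for col in COMMON_COLS}
        record["stat_name"] = stat
        record["stat_value"] = row[stat]
        for suffix in ("ncl", "ncu", "bcl", "bcu"):
            # Limits that are missing from the wide record stay None here.
            record[f"stat_{suffix}"] = row.get(f"{stat}_{suffix.upper()}")
        long_rows.append(record)
    return long_rows


# Made-up sample values for demonstration.
wide_row = {
    "model": "MODEL_A", "fcst_lead": "120000", "fcst_var": "TMP",
    "ME": 0.52, "ME_NCL": 0.31, "ME_NCU": 0.73, "ME_BCL": 0.29, "ME_BCU": 0.75,
    "RMSE": 1.84,
}
for rec in wide_to_long(wide_row, ["ME", "RMSE"]):
    print(rec)
```

Each statistic in the wide record becomes its own row, so a plot script can select rows by stat_name instead of by column position.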

The corresponding line types for the above plots are as follows:

  • FHO
    • from point-stat or grid-stat

  • CNT
    • from point-stat or grid-stat

  • CTC
    • from point-stat or grid-stat

  • CTS
    • from point-stat or grid-stat

  • SL1L2
    • from point-stat or grid-stat

  • VL1L2
    • from point-stat or grid-stat

  • ECNT
    • from point-stat or ensemble-stat

  • MCTS
    • from point-stat

  • VCNT
    • from point-stat or grid-stat

Reformatting for ROC diagrams

The ROC diagram can be created from PCT line type data (from the MET point-stat, grid-stat, or ensemble-stat tools). The reformatted .stat file contains these columns, replacing the unlabelled columns and converting from wide to long format:

thresh_i
  • the ith probability threshold value (repeated)

oy_i
  • number of observations yes when forecast is between the ith and i+1th probability threshold

on_i
  • number of observations no when forecast is between the ith and i+1th probability threshold

i_value
  • the threshold number

The PCT line type consists of a variable number of unlabelled columns corresponding to THRESH_i, OY_i, and ON_i, as described in the MET User’s Guide: https://met.readthedocs.io/en/latest/Users_Guide/. The columns corresponding to OY_1, OY_2, …, OY_m (where m is the index of the last threshold) are unlabelled when generated by the MET tool. During reformatting, these unlabelled columns are labelled OY_1, …, OY_m, ON_1, …, ON_m, and THRESH_1, …, THRESH_m. The labelled columns are then rearranged into the thresh_i, oy_i, on_i, and i_value columns. The i_value column is derived from the index i of OY, ON, and THRESH. The thresh_i column contains the threshold value for the threshold number given in the i_value column. The oy_i and on_i columns contain the OY_i and ON_i values from the .stat data.
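
The rearrangement of the PCT columns can be sketched as follows. This is a simplified illustration, not the METreformat code. The function name and sample values are hypothetical, and the sketch assumes the unlabelled columns arrive in the order TOTAL, N_THRESH, then repeating (THRESH_i, OY_i, ON_i) triplets, followed by one final threshold. Check the PCT table in the MET User’s Guide for the authoritative layout.

```python
def reformat_pct(values):
    """Convert the unlabelled PCT columns of one line into long-format rows.

    ASSUMED column order (verify against the MET User's Guide):
    TOTAL, N_THRESH, (THRESH_i, OY_i, ON_i) repeated N_THRESH - 1 times,
    then the final threshold value.
    """
    n_thresh = int(values[1])
    # Everything between N_THRESH and the final threshold is triplets.
    triplets = values[2:2 + 3 * (n_thresh - 1)]
    rows = []
    for i in range(n_thresh - 1):
        thresh, oy, on = triplets[3 * i:3 * i + 3]
        rows.append({
            "i_value": i + 1,           # threshold number
            "thresh_i": float(thresh),  # ith probability threshold
            "oy_i": int(oy),            # observation "yes" count
            "on_i": int(on),            # observation "no" count
        })
    return rows


# Made-up values: TOTAL=100, N_THRESH=3, two triplets, final threshold 1.0
sample = ["100", "3", "0.0", "5", "60", "0.5", "30", "5", "1.0"]
for row in reformat_pct(sample):
    print(row)
```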

Reformatting for rank histogram

The rank histogram plot can be created from the RHIST line type data (from the MET ensemble-stat tool). The reformatted .stat file contains these columns, which replace the variable number of unlabelled columns (rank_1, …, rank_n) and convert the data from wide to long format:

rank_i
  • count of observations with the ith rank (repeated)

i_value
  • the rank number
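
A minimal sketch of the RHIST conversion is shown below. It is an illustration rather than the METreformat implementation: the function name and sample values are hypothetical, and the sketch assumes the unlabelled columns are ordered TOTAL, N_RANK, RANK_1, …, RANK_n.

```python
def reformat_rhist(values):
    """Convert the unlabelled RHIST columns of one line into long-format rows.

    ASSUMED column order (verify against the MET User's Guide):
    TOTAL, N_RANK, RANK_1, ..., RANK_n.
    """
    n_rank = int(values[1])
    ranks = values[2:2 + n_rank]
    # One output row per rank: the rank number and its observation count.
    return [
        {"i_value": i, "rank_i": int(count)}
        for i, count in enumerate(ranks, start=1)
    ]


# Made-up values: TOTAL=20, N_RANK=3, then three rank counts
for row in reformat_rhist(["20", "3", "4", "10", "6"]):
    print(row)
```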

Other plot types require additional special columns. Support for reformatting the remaining line types for the remaining point-stat, grid-stat, and ensemble-stat MET output will be added.

Reformatting for computing aggregation statistics with METcalcpy agg_stat

The ECNT linetype is currently the only linetype that can be reformatted into the format used by METcalcpy agg_stat for calculating aggregation statistics. Support for other linetypes will be added in the future.

The reformatted .stat file now replaces the unlabelled columns with the corresponding ECNT statistic values specified in Table 13.2 of the MET User’s Guide.

Reformatting for generating TCMPR plots

The TCDIAG linetype, which also has corresponding TCMPR linetype data, is reformatted by consolidating the TCMPR columns with the TCDIAG columns. The reformatted data does not need to have aggregate statistics computed and can be used by the TCMPR plots (e.g. point, boxplot) to generate time series plots and box plots. Refer to the METplotpy User’s Guide for information on using the TCMPR plotter.

Reformatting for scatter plots

The METplotpy scatter plots support plotting of any column of data in the input file against any other column (useful in visualizing relationships between variables). The MPR output is reformatted to consist of the labelled headers of the original MET output when the keep_all_mpr_cols setting is set to True. This is the format needed to generate scatter plots.

Refer to MET User’s Guide table 11.20 for more information on the MPR linetype.

NOTE: When keep_all_mpr_cols is set to False, the MPR output is reformatted to collect the MPR specific columns to the stat_name and stat_value columns, with additional confidence limit columns (stat_ncl/ncu, stat_bcl/bcu). This format is consistent with the format required for generating a line plot using METplotpy from the command line.

6.2. Required Components

Use the MET stat-analysis tool to filter data (by criteria such as model, valid times, etc.). The output from the stat-analysis tool can then be used as input to the METdataio METreformat reformatter. If filtering of data is not needed, the .stat files from the MET point-stat, grid-stat, and ensemble-stat tools can be used as input to the reformatter. If aggregation statistics are needed, then the METcalcpy agg_stat.py module can be used following the reformatting step. Reformatting to accommodate METcalcpy agg_stat is currently only available for the ECNT linetype. The input_stats_aggregated setting is used to indicate whether the reformatter needs to reformat the output for the METcalcpy agg_stat module.

METdbLoad modules are used to find and collect data from the individual .stat files into one data structure. The input .stat files must all reside under one directory. The path to this input data is specified in a YAML configuration file.
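
When the MET output is spread across several directories, a small helper along the following lines could gather it into a single input directory. This is a hedged sketch, not part of METdataio: the function name and the example paths are hypothetical, and it copies files (rather than moving them) and refuses to overwrite files with the same name.

```python
import shutil
from pathlib import Path


def consolidate_stat_files(source_dirs, target_dir, suffixes=(".stat", ".tcst")):
    """Copy .stat/.tcst files found under source_dirs into one directory.

    The target directory should not be inside any of the source directories.
    """
    target = Path(target_dir)
    target.mkdir(parents=True, exist_ok=True)
    copied = []
    for src in source_dirs:
        # Search each source directory recursively.
        for path in sorted(Path(src).rglob("*")):
            if path.is_file() and path.suffix in suffixes:
                dest = target / path.name
                if dest.exists():
                    # Avoid silently clobbering a file with the same name.
                    raise FileExistsError(f"{dest} already exists; rename before copying")
                shutil.copy2(path, dest)
                copied.append(dest)
    return copied


# Example call (hypothetical paths):
# consolidate_stat_files(["/path/to/run1", "/path/to/run2"], "/path/to/all_stat")
```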

The YAML configuration file is also used to indicate the name and location of the output file, logging information (filename, log level), and the line type to read in and reformat:

Copy this custom config file from the directory where the source code was saved to the working directory.

Modify the YAML configuration file

Edit the reformat_stat.yaml config file (found in the METreformat directory of the METdataio source code).

Refer to the following details for each of the mandatory settings in the configuration file.

Definition of Mandatory Config Settings
input_stats_aggregated
  • By default, this is set to True to indicate either of the following:

    • the input data has been processed by the MET stat-analysis tool to calculate aggregation statistics

    • the data of interest does not require calculation of aggregation statistics

    This reformatted data can be used as input to the appropriate METplotpy plotting script.

  • Set this to False if aggregation statistics need to be calculated by the METcalcpy agg_stat module.

input_data_dir
  • The full path (no environment variables) to the directory that contains all the input .stat files from the MET point-stat, grid-stat, or ensemble-stat tool

  • If data is distributed among numerous directories, they will need to be consolidated into one directory

output_dir
  • The full path (no environment variables) to the directory where the reformatted file will be saved

output_filename
  • The name of the output file

  • NOTE: save with .data extension if this is to be used for plotting using METplotpy

  • If reformatting is run successively without removing an existing output file of the same name, the existing file will be overwritten.

log_filename
  • The name of the log file

  • Set to STDOUT or stdout (case insensitive) if no log file is to be saved

log_dir
  • The full path to the directory (no environment variables) where the log file is to be saved

log_level
  • The verbosity of the logging: DEBUG, INFO, WARNING, ERROR

  • DEBUG is the most verbose, ERROR is the least verbose

line_type
  • The line type to be reformatted

  • Currently supported line types are:

    • FHO

    • CNT

    • CTC

    • CTS

    • SL1L2

    • VL1L2

    • ECNT

    • MCTS

    • VCNT

    • RHIST

    • PCT

    • TCDIAG

    • MPR

##################### FOR MPR LINETYPE ONLY #####################

keep_all_mpr_cols
  • For reformatting of MPR only

  • True if reformatting for scatter plot, False otherwise
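
Before running the reformatter, the settings described above could be sanity-checked with a short script such as the one below. This is an illustrative sketch and not part of METdataio: the function name, the example values, and the individual checks are assumptions drawn from the descriptions in this section. The settings are shown as a Python dictionary; loading them from the YAML file is left out.

```python
import os

MANDATORY = (
    "input_stats_aggregated", "input_data_dir", "output_dir", "output_filename",
    "log_filename", "log_dir", "log_level", "line_type",
)
SUPPORTED_LINE_TYPES = {
    "FHO", "CNT", "CTC", "CTS", "SL1L2", "VL1L2", "ECNT",
    "MCTS", "VCNT", "RHIST", "PCT", "TCDIAG", "MPR",
}
LOG_LEVELS = {"DEBUG", "INFO", "WARNING", "ERROR"}


def check_config(cfg):
    """Return a list of problems found in a settings dictionary (empty if none)."""
    problems = [f"missing setting: {key}" for key in MANDATORY if key not in cfg]
    for key in ("input_data_dir", "output_dir", "log_dir"):
        value = cfg.get(key)
        # Paths must be full paths and must not rely on environment variables.
        if value is not None and ("$" in str(value) or not os.path.isabs(str(value))):
            problems.append(f"{key} must be a full path without environment variables")
    # Upper-casing before comparison is a convenience of this sketch.
    if "log_level" in cfg and str(cfg["log_level"]).upper() not in LOG_LEVELS:
        problems.append(f"unknown log_level: {cfg['log_level']}")
    if "line_type" in cfg and str(cfg["line_type"]).upper() not in SUPPORTED_LINE_TYPES:
        problems.append(f"unsupported line_type: {cfg['line_type']}")
    return problems


# Example settings (placeholder paths and file names)
example = {
    "input_stats_aggregated": True,
    "input_data_dir": "/path/to/input_data",
    "output_dir": "/path/to/output",
    "output_filename": "reformatted_cnt.data",
    "log_filename": "STDOUT",
    "log_dir": "/path/to/logs",
    "log_level": "INFO",
    "line_type": "CNT",
}
print(check_config(example))
```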

6.3. Example

  • set up a base directory, where the METdataio source code is located

bash:
export BASE_DIR=/path/to/METdataio

csh:
setenv BASE_DIR /path/to/METdataio
  • replace /path/to with an actual path

  • set up a working directory, where the YAML config file will be located

bash:
export WORKING_DIR=/path/to/working_dir

csh:
setenv WORKING_DIR /path/to/working_dir
  • NOTE: Do NOT use environment variables for /path/to, specify the actual path.

  • set the PYTHONPATH:

bash
export PYTHONPATH=$BASE_DIR:$BASE_DIR/METdbLoad:$BASE_DIR/METdbLoad/ush:$BASE_DIR/METreformat

csh
setenv PYTHONPATH $BASE_DIR:$BASE_DIR/METdbLoad:$BASE_DIR/METdbLoad/ush:$BASE_DIR/METreformat

6.3.1. Generate the reformatted file:

  • place the .stat/.tcst data of interest (output from the MET tool) into a single directory. NOTE: This may require reorganization of data that is distributed over numerous directories into a single directory.

  • modify the reformat_stat.yaml file, indicating the input directory, output directory, output file name, line type to reformat, and logging settings (log level, log filename, log directory). Refer to the Definition of Mandatory Config Settings above for a description of each setting.

python $BASE_DIR/METreformat/write_stat_ascii.py $WORKING_DIR/*line_type*_stat.yaml
  • A text file will be created in the output directory with the file name that was specified in the yaml file.
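
The same steps can also be driven from a Python script, which may be convenient when reformatting several line types in a row. The sketch below is an assumption-laden illustration, not part of METdataio: the function names are hypothetical, and it only assembles the PYTHONPATH and command line described above before handing them to subprocess.

```python
import os
import subprocess
import sys


def build_reformat_call(base_dir, config_file):
    """Return the command list and environment for running the reformatter."""
    # Same directories as in the PYTHONPATH settings shown above.
    pythonpath = os.pathsep.join([
        base_dir,
        os.path.join(base_dir, "METdbLoad"),
        os.path.join(base_dir, "METdbLoad", "ush"),
        os.path.join(base_dir, "METreformat"),
    ])
    env = dict(os.environ, PYTHONPATH=pythonpath)
    script = os.path.join(base_dir, "METreformat", "write_stat_ascii.py")
    return [sys.executable, script, config_file], env


def run_reformatter(base_dir, config_file):
    """Run write_stat_ascii.py and raise an error if it exits with a failure."""
    cmd, env = build_reformat_call(base_dir, config_file)
    subprocess.run(cmd, env=env, check=True)


# Example call (replace the placeholder paths with actual paths):
# run_reformatter("/path/to/METdataio", "/path/to/working_dir/reformat_stat.yaml")
```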