7. Reformat MET Stat Data

7.1. Description

The METreformat module provides support for reformatting/rearranging MET .stat output into METplotpy-readable input formats. The MET .stat ASCII output generated by MET tools such as the point-stat, grid-stat, and ensemble-stat tools can contain ASCII columnar data from multiple line types (determined by settings in the MET configuration file). In addition, MET .tcst ASCII output generated by the MET tc-pairs tool can be reformatted.

As a result, the rows of data do not all have the same number of columns. All MET .stat line types have some columns in common, all of which are labelled (i.e. the columns have headers/names). The remaining columns are unlabelled. When the .stat output from the MET point-stat, grid-stat, or ensemble-stat tools is aggregated using the MET stat-analysis tool, these unlabelled columns need to be rearranged from a wide format (numerous columns) to a long format (fewer columns but more rows).

Initially, METviewer was used to generate plots using MET output data. This was accomplished by loading the MET .stat files into the METviewer database, then performing a query to subset the data of interest, followed by calculating aggregation statistics. The format of the data from the query result may vary, based on line type and the requested plot type. METviewer invokes the appropriate METplotpy plot script using the results from the database query. Each METplotpy plotting script requires input data in a specific format.

Users now have the option to generate plots without the use of METviewer and its database. However, the MET stat-analysis tool will need to be used for calculating aggregation statistics prior to reformatting, if aggregation statistics are to be plotted. Plots can be generated by METplotpy from the command line with a YAML configuration file. However, the format of the input must match what is expected by the plot type, based on the line type of the input data. Any .stat output from the point-stat, grid-stat, or ensemble-stat tool will need to be reformatted before using the METplotpy scripts from the command line.

7.1.1. Description of Formats:

The reformatted data contains additional columns, based on the type of plot and line type.

Reformatting for line, bar, box, contour, performance diagram, revision box, revision series, and Taylor diagram

After reformatting, the .stat data contains the following columns (replacing the unlabelled columns in the original .stat file and converting from wide to long format):

Two columns for statistics

  • stat_name

  • stat_value

Four columns for confidence levels (normal and bootstrap)

  • stat_bcl

    • lower level bootstrap confidence limit

  • stat_bcu

    • upper level bootstrap confidence limit

  • stat_ncl

    • lower level normal confidence limit

  • stat_ncu

    • upper level normal confidence limit

The corresponding line types for the above plots are as follows:

  • FHO
    • from point-stat or grid-stat

  • CNT
    • from point-stat or grid-stat

  • CTC
    • from point-stat or grid-stat

  • CTS
    • from point-stat or grid-stat

  • SL1L2
    • from point-stat or grid-stat

  • VL1L2
    • from point-stat or grid-stat

  • ECNT
    • from point-stat or ensemble-stat

  • MCTS
    • from point-stat

  • VCNT
    • from point-stat or grid-stat
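The wide-to-long conversion described above can be sketched in plain Python. This is only an illustration under stated assumptions, not the METreformat implementation: the function name wide_to_long, the sample CNT row (with made-up values), and the use of "NA" for a missing confidence limit are all hypothetical.

```python
# Suffixes that follow a statistic name for its confidence limits,
# mapped to the long-format column that holds each value.
CL_COLUMNS = {
    "_BCL": "stat_bcl",
    "_BCU": "stat_bcu",
    "_NCL": "stat_ncl",
    "_NCU": "stat_ncu",
}


def wide_to_long(row, common_columns, stat_names):
    """Turn one wide row (a dict) into one long row per statistic."""
    long_rows = []
    for name in stat_names:
        # Carry over the columns that every line type has in common.
        new_row = {col: row[col] for col in common_columns}
        new_row["stat_name"] = name
        new_row["stat_value"] = row[name]
        for suffix, target in CL_COLUMNS.items():
            # Assumption: a missing confidence limit is written as "NA".
            new_row[target] = row.get(name + suffix, "NA")
        long_rows.append(new_row)
    return long_rows


# Hypothetical, already-labelled CNT row (values are made up).
wide_row = {
    "MODEL": "FCST", "FCST_LEAD": "120000", "LINE_TYPE": "CNT",
    "FBAR": 281.5, "FBAR_NCL": 280.9, "FBAR_NCU": 282.1,
    "FBAR_BCL": 280.8, "FBAR_BCU": 282.2,
    "ME": 0.42, "ME_NCL": 0.31, "ME_NCU": 0.53,
}

long_rows = wide_to_long(
    wide_row, ["MODEL", "FCST_LEAD", "LINE_TYPE"], ["FBAR", "ME"]
)
for long_row in long_rows:
    print(long_row)
```

Each statistic in the wide row becomes its own row, so one input row with two statistics yields two output rows that share the common columns.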

Reformatting for ROC diagrams

The ROC diagram can be created from PCT line type data (from the MET point-stat, grid-stat, or ensemble-stat tools). The reformatted .stat file contains these columns, replacing the unlabelled columns and converting from wide to long format:

thresh_i
  • the ith probability threshold value (repeated)

oy_i
  • number of observations yes when forecast is between the ith and i+1th probability threshold

on_i
  • number of observations no when forecast is between the ith and i+1th probability threshold

i_value
  • the threshold number

The PCT line type consists of a variable number of unlabelled columns corresponding to THRESH_i, OY_i, and ON_i, as described in the MET User’s Guide: https://met.readthedocs.io/en/latest/Users_Guide/. The columns corresponding to OY_1, OY_2, …, OY_m (where m is the last threshold number) are unlabelled when generated by the MET tool. The reformatter first labels these columns as OY_1, …, OY_m, ON_1, …, ON_m, and THRESH_1, …, THRESH_m. The labelled columns are then rearranged into the thresh_i, oy_i, on_i, and i_value columns. The i_value column holds the index i taken from the OY_i, ON_i, and THRESH_i labels. The thresh_i column contains the threshold value for the threshold number given in the i_value column. The oy_i and on_i columns contain the OY_i and ON_i values from the .stat data.
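As a rough sketch of the rearrangement step, the following Python takes columns that have already been labelled and builds the long-format rows. The function name pct_to_long and the sample values are hypothetical, and the labelling step itself is not shown.

```python
def pct_to_long(labelled):
    """Build long-format rows from labelled PCT columns.

    `labelled` maps names such as "THRESH_1", "OY_1", "ON_1" to values.
    """
    # Collect the threshold numbers that have an OY_i entry.
    indices = sorted(
        int(key.split("_")[1]) for key in labelled if key.startswith("OY_")
    )
    rows = []
    for i in indices:
        rows.append({
            "thresh_i": labelled[f"THRESH_{i}"],
            "oy_i": labelled[f"OY_{i}"],
            "on_i": labelled[f"ON_{i}"],
            "i_value": i,  # the threshold number taken from the label
        })
    return rows


# Hypothetical labelled values for two thresholds.
labelled_pct = {
    "THRESH_1": 0.0, "OY_1": 5, "ON_1": 95,
    "THRESH_2": 0.5, "OY_2": 30, "ON_2": 20,
}
pct_rows = pct_to_long(labelled_pct)
for pct_row in pct_rows:
    print(pct_row)
```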

Reformatting for rank histogram

The rank histogram plot can be created from the RHIST line type data (from the MET ensemble-stat tool). The reformatted .stat file contains these columns, replacing the unlabelled columns (the variable number of columns corresponding to rank_1, …, rank_n) and converting from wide to long format:

rank_i
  • count of observations with the ith rank (repeated)

i_value
  • the rank number
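The same idea applies to the rank columns. A minimal sketch, assuming the rank counts have already been extracted from the RHIST line into a list (the function name rhist_to_long and the counts are hypothetical):

```python
def rhist_to_long(rank_counts):
    """Pair each rank count with its 1-based rank number."""
    return [
        {"rank_i": count, "i_value": i}
        for i, count in enumerate(rank_counts, start=1)
    ]


# Hypothetical counts for rank_1, rank_2, rank_3.
rhist_rows = rhist_to_long([12, 7, 9])
for rhist_row in rhist_rows:
    print(rhist_row)
```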

Other plot types require additional special columns. Support for reformatting the remaining line types for the remaining point-stat, grid-stat, and ensemble-stat MET output will be added.

Reformatting for computing aggregation statistics with METcalcpy agg_stat

The ECNT linetype is currently the only linetype with support for reformatting for METcalcpy aggregation statistics input. Support for other linetypes is forthcoming.

The reformatted .stat file now replaces the unlabelled columns with the corresponding ECNT statistic values specified in Table 13.2 of the MET User’s Guide.

Reformatting for generating TCMPR plots

The TCDIAG linetype, which also has corresponding TCMPR linetype data, is reformatted by consolidating the TCMPR columns with the TCDIAG columns. The reformatted data does not need to have aggregate statistics computed and can be used by the TCMPR plots (e.g. point, boxplot) to generate time series plots and box plots. Refer to the METplotpy User’s Guide for information on using the TCMPR plotter.

NOTE: An appropriate error message will appear when attempting to reformat an unsupported linetype for aggregation input.

7.2. Required Components

Use the MET stat-analysis tool to calculate the aggregation statistics as input to the reformatter (this is only needed if aggregation statistics are to be plotted). Otherwise, the .stat files from the MET point-stat, grid-stat, and ensemble-stat tools can be used as input to the reformatter. For the ECNT linetype, the reformatter can reformat the .stat input to be used by the METcalcpy agg_stat module to calculate aggregation statistics.

METdbLoad modules are used to find and collect data from the individual .stat files into one data structure. The input .stat files must all reside under one directory. The path to this input data is specified in a YAML configuration file.

The YAML configuration file is also used to indicate the name and location of the output file, logging information (filename, log level), and the line type to read in and reformat.

Copy this custom config file from the directory where the source code was saved to the working directory.

Modify the YAML configuration file

Edit the reformat_stat.yaml config file, located in the METreformat directory of the METdataio source code.

Refer to the following details for each of the mandatory settings in the configuration file.

Definition of Mandatory Config Settings
input_stats_aggregated
  • By default, this is set to True to indicate that the input data has been processed by the MET stat-analysis tool to calculate aggregation statistics.

  • Leave this set to True if the data of interest does not require calculation of aggregation statistics. This reformatted data can be used as input to the appropriate METplotpy plotting script.

  • Set this to False if aggregation statistics will be calculated by the METcalcpy agg_stat module.

input_data_dir
  • The full path (no environment variables) to the directory that contains all the input .stat files from the MET point-stat, grid-stat, or ensemble-stat tool

  • If data is distributed among numerous directories, they will need to be consolidated into one directory

output_dir
  • The full path (no environment variables) to the directory where the reformatted file will be saved

output_filename
  • The name of the output file

  • NOTE: save with .data extension if this is to be used for plotting using METplotpy

  • If reformatting is run successively without removing an existing output file of the same name, the existing file will be overwritten.

log_filename
  • The name of the log file

  • Set to STDOUT or stdout (case insensitive) if no log file is to be saved

log_dir
  • The full path to the directory (no environment variables) where the log file is to be saved

log_level
  • The verbosity of the logging: DEBUG, INFO, WARNING, ERROR

  • DEBUG is the most verbose, ERROR is least verbose

line_type
  • The line type to be reformatted

  • Currently supported line types are:

    • FHO

    • CNT

    • CTC

    • CTS

    • SL1L2

    • VL1L2

    • ECNT

    • MCTS

    • VCNT

    • RHIST

    • PCT

    • TCDIAG
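Before running the reformatter, it can help to check the settings against the rules listed above. The sketch below works on a plain dict of settings (for example, one obtained from a YAML parser) and is not part of METreformat. The function name check_settings, the exact-match comparison of line_type and log_level, and the example paths are assumptions made for illustration.

```python
MANDATORY_SETTINGS = (
    "input_stats_aggregated", "input_data_dir", "output_dir",
    "output_filename", "log_filename", "log_dir", "log_level", "line_type",
)
SUPPORTED_LINE_TYPES = {
    "FHO", "CNT", "CTC", "CTS", "SL1L2", "VL1L2",
    "ECNT", "MCTS", "VCNT", "RHIST", "PCT", "TCDIAG",
}
LOG_LEVELS = {"INFO", "DEBUG", "WARNING", "ERROR"}


def check_settings(settings):
    """Return a list of problems found in a dict of config settings."""
    problems = [
        f"missing setting: {key}"
        for key in MANDATORY_SETTINGS
        if key not in settings
    ]
    if "line_type" in settings and settings["line_type"] not in SUPPORTED_LINE_TYPES:
        problems.append(f"unsupported line_type: {settings['line_type']}")
    if "log_level" in settings and settings["log_level"] not in LOG_LEVELS:
        problems.append(f"unknown log_level: {settings['log_level']}")
    # Directory settings must be full paths with no environment variables.
    for key in ("input_data_dir", "output_dir", "log_dir"):
        if "$" in str(settings.get(key, "")):
            problems.append(f"{key} must be a full path without environment variables")
    return problems


# Hypothetical settings; the paths are placeholders.
example_settings = {
    "input_stats_aggregated": True,
    "input_data_dir": "/path/to/input",
    "output_dir": "/path/to/output",
    "output_filename": "reformatted.data",
    "log_filename": "stdout",
    "log_dir": "/path/to/logs",
    "log_level": "INFO",
    "line_type": "CNT",
}
print(check_settings(example_settings))
```

An empty list means that none of the checks in this sketch failed. It does not guarantee that the reformatter will accept the file.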

7.3. Example

  • set up a base directory, where the METdataio source code is located

bash:
export BASE_DIR=/path/to/METdataio

csh:
setenv BASE_DIR /path/to/METdataio
  • replace /path/to with an actual path

  • set up a working directory, where the YAML config file will be located

bash:
export WORKING_DIR=/path/to/working_dir

csh:
setenv WORKING_DIR /path/to/working_dir
  • NOTE: Do NOT use environment variables for /path/to, specify the actual path.

  • set the PYTHONPATH:

bash:
export PYTHONPATH=$BASE_DIR:$BASE_DIR/METdbLoad:$BASE_DIR/METdbLoad/ush:$BASE_DIR/METreformat

csh:
setenv PYTHONPATH $BASE_DIR:$BASE_DIR/METdbLoad:$BASE_DIR/METdbLoad/ush:$BASE_DIR/METreformat

7.3.1. Generate the reformatted file:

  • place the .stat data of interest (output from MET tool) into a single directory

  • NOTE: This may require reorganization of data that is distributed over numerous directories into a single directory.

  • modify the reformat_stat.yaml file, indicating the input directory, output directory, output file name, line type to reformat, and logging settings (log level, log filename, log directory). Refer to the Definition of Mandatory Config Settings above for a description of each setting.

python $BASE_DIR/METreformat/write_stat_ascii.py $WORKING_DIR/*line_type*_stat.yaml
  • A text file will be created in the output directory with the file name that was specified in the YAML file.
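The same command can also be launched from a Python script. The sketch below only wraps the command line shown above. The helper names build_reformat_command and run_reformatter are hypothetical, and PYTHONPATH is assumed to be set already, as described in the previous steps.

```python
import os
import subprocess
import sys


def build_reformat_command(base_dir, config_file):
    """Return the argument list for running write_stat_ascii.py."""
    script = os.path.join(base_dir, "METreformat", "write_stat_ascii.py")
    # sys.executable is the Python interpreter running this script.
    return [sys.executable, script, config_file]


def run_reformatter(base_dir, config_file):
    """Run the reformatter; raises CalledProcessError on a non-zero exit."""
    return subprocess.run(
        build_reformat_command(base_dir, config_file), check=True
    )


# Placeholder paths; replace /path/to with actual paths before running.
command = build_reformat_command(
    "/path/to/METdataio", "/path/to/working_dir/reformat_stat.yaml"
)
print(command)
```

Calling run_reformatter with real paths executes the script. Building the argument list separately makes it easy to inspect the command before running it.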