6. Reformat MET Stat and TCST Data
6.1. Description
The METreformat module provides support for reformatting/rearranging MET .stat/.tcst output into METplotpy-readable input formats. The MET .stat/.tcst ASCII output generated by MET tools such as the point-stat, grid-stat, ensemble-stat, and tc-pairs tools may contain ASCII columnar data from multiple line types (determined by settings in the MET configuration file).
Not all MET linetypes currently have reformatter support. When an unsupported linetype is requested, an error message is generated.
In the MET .stat and .tcst files, rows of data can have different numbers of columns. All MET .stat line types share a set of common columns (refer to table 11.1 in the MET User’s Guide) that are labelled (i.e. the columns have headers/names). The remaining columns are unlabelled. When the .stat output from the MET point-stat, grid-stat, ensemble-stat, or tc-pairs tools is reformatted, these unlabelled columns are rearranged into a format appropriate for the METplotpy plot of interest (based on the line_type setting in the YAML configuration file).
The format of input data used by the METplotpy plots is influenced by METviewer. When METviewer generates plots, the MET .stat input data is first loaded into the METviewer database. A database query is then performed to filter the data. The query is based on:
the selected values for the dependent variables (i.e. forecast variable) and the associated statistics of interest
variables (i.e. model and specific model values)
fixed values
Any requested aggregation statistics (e.g. mean, median) are calculated by invoking the METcalcpy agg_stat.py code. The format of the data from the query result may vary, based on the line type and whether aggregation statistics were calculated. METviewer invokes the appropriate METplotpy plot script using the database query results as input, and it is this format that is expected by the METplotpy scripts.
Users have the option to generate plots from the command line, bypassing METviewer and its database. In that case, the MET stat-analysis tool must be used to perform any necessary filtering of data (e.g. by any combination of times, models, regions, etc.) prior to reformatting. The METcalcpy agg_stat.py module is used after reformatting to calculate aggregation statistics (e.g. total, mean, median, confidence intervals for a specific statistic). Not all data requires the calculation of aggregation statistics (e.g. histogram plots).
Plots that are generated from the command line require a YAML configuration file. The .stat output from the point-stat, grid-stat, ensemble-stat, or tc-pairs tool must be reformatted before invoking the METplotpy scripts from the command line.
6.1.1. Description of Formats:
The reformatted data contains additional columns, based on the type of plot and line type.
Reformatting for line, bar, box, contour, performance diagram, revision box, revision series, and taylor diagram
After reformatting, the .stat data contains the following columns (replacing the unlabelled columns in the original .stat file and converting from wide to long format):
Two columns for statistics
stat_name
stat_value
Four columns for confidence levels (normal and bootstrap)
stat_bcl
lower level bootstrap confidence limit
stat_bcu
upper level bootstrap confidence limit
stat_ncl
lower level normal confidence limit
stat_ncu
upper level normal confidence limit
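The wide-to-long conversion described above can be sketched roughly as follows. This is only an illustration using the Python standard library, not the METreformat implementation. The function name wide_to_long, the sample row, and all of its values are hypothetical, and filling a missing confidence limit with "NA" is an assumption made for this sketch.

```python
# Hypothetical wide-format row: a few common header columns plus
# CNT-style statistics. All values are invented for illustration.
wide_row = {
    "model": "GFS",
    "fcst_var": "TMP",
    "line_type": "CNT",
    "ME": 0.52, "ME_NCL": 0.31, "ME_NCU": 0.73, "ME_BCL": 0.30, "ME_BCU": 0.75,
    "MAE": 1.10,
}

COMMON = ("model", "fcst_var", "line_type")
LIMIT_SUFFIXES = ("_NCL", "_NCU", "_BCL", "_BCU")


def wide_to_long(row, common=COMMON):
    """Return one long-format record per statistic found in a wide row."""
    # A statistic is any non-common column that is not itself a confidence limit.
    stat_names = [key for key in row
                  if key not in common and not key.endswith(LIMIT_SUFFIXES)]
    records = []
    for name in stat_names:
        record = {key: row[key] for key in common}
        record["stat_name"] = name
        record["stat_value"] = row[name]
        # Missing limits are filled with "NA"; that placeholder is an assumption.
        for suffix in LIMIT_SUFFIXES:
            record["stat" + suffix.lower()] = row.get(name + suffix, "NA")
        records.append(record)
    return records


long_rows = wide_to_long(wide_row)
for rec in long_rows:
    print(rec)
```

Each statistic in the wide row becomes its own record, so a row with two statistics yields two records that repeat the common columns.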
The corresponding line types for the above plots are as follows:
- FHO
from point-stat or grid-stat
- CNT
from point-stat or grid-stat
- CTC
from point-stat or grid-stat
- CTS
from point-stat or grid-stat
- SL1L2
from point-stat or grid-stat
- VL1L2
from point-stat or grid-stat
- ECNT
from point-stat or ensemble-stat
- MCTS
from point-stat
- VCNT
from point-stat or grid-stat
Reformatting for ROC diagrams
The ROC diagram can be created from PCT line type data (from the MET point-stat, grid-stat, or ensemble-stat tools). The reformatted .stat file contains these columns, replacing the unlabelled columns and converting from wide to long format:
- thresh_i
the ith probability threshold value (repeated)
- oy_i
number of observations yes when forecast is between the ith and i+1th probability threshold
- on_i
number of observations no when forecast is between the ith and i+1th probability threshold
- i_value
the threshold number
The PCT line type consists of a variable number of unlabelled columns corresponding to THRESH_i, OY_i, and ON_i, as described in the MET User’s Guide: https://met.readthedocs.io/en/latest/Users_Guide/. The columns corresponding to OY_1, OY_2, …, OY_m (where m is the last threshold index) are unlabelled when generated by the MET tool. The reformatter labels these columns THRESH_1, …, THRESH_m, OY_1, …, OY_m, and ON_1, …, ON_m, and then arranges them into the thresh_i, oy_i, on_i, and i_value columns. The i_value column holds the index i of each OY, ON, and THRESH value. The thresh_i column holds the threshold value for the threshold number given in the i_value column. The oy_i and on_i columns contain the OY_i and ON_i values from the .stat data.
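A minimal sketch of the labelling and reordering step is shown here. It assumes the unlabelled values arrive as a flat list of repeated (THRESH_i, OY_i, ON_i) triplets; that layout, the function name label_pct_triplets, and the sample values are assumptions for illustration and do not reflect the actual METreformat code.

```python
def label_pct_triplets(values):
    """Turn a flat list of repeated (THRESH_i, OY_i, ON_i) values into long-format rows."""
    if len(values) % 3 != 0:
        raise ValueError("expected a whole number of (thresh, oy, on) triplets")
    rows = []
    for i in range(len(values) // 3):
        thresh, oy, on = values[3 * i: 3 * i + 3]
        # i_value is the 1-based threshold number.
        rows.append({"i_value": i + 1, "thresh_i": thresh, "oy_i": oy, "on_i": on})
    return rows


# Invented values: two thresholds with their observation counts.
unlabelled = [0.0, 5, 120, 0.5, 30, 40]
pct_rows = label_pct_triplets(unlabelled)
for row in pct_rows:
    print(row)
```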
Reformatting for rank histogram
The rank histogram plot can be created from the RHIST line type data (from the MET ensemble-stat tool). The reformatted .stat file contains these columns, replacing the variable number of unlabelled columns (corresponding to rank_1, …, rank_n) and converting from wide to long format:
- rank_i
count of observations with the ith rank (repeated)
- i_value
the rank number
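For illustration, the same idea applied to rank counts might look like the sketch below. The function name ranks_to_long and the counts are hypothetical; the real reformatter also carries the common header columns along, which are omitted here for brevity.

```python
def ranks_to_long(rank_counts):
    """Pair each rank count with its 1-based rank number."""
    return [{"i_value": i, "rank_i": count}
            for i, count in enumerate(rank_counts, start=1)]


# Invented counts for rank_1 through rank_4.
rank_rows = ranks_to_long([12, 7, 9, 15])
for row in rank_rows:
    print(row)
```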
Other plot types require additional special columns. Support for reformatting the remaining line types for the remaining point-stat, grid-stat, and ensemble-stat MET output will be added.
Reformatting for computing aggregation statistics with METcalcpy agg_stat
The ECNT linetype is currently the only linetype that can be reformatted for use with METcalcpy to calculate aggregation statistics. Support for other linetypes will be added in the future.
The reformatted .stat file now replaces the unlabelled columns with the corresponding ECNT statistic values specified in Table 13.2 of the MET User’s Guide.
Reformatting for generating TCMPR plots
The TCDIAG linetype, which also has corresponding TCMPR linetype data, is reformatted by consolidating the TCMPR columns with the TCDIAG columns. The reformatted data does not need to have aggregate statistics computed and can be used by the TCMPR plots (e.g. point, boxplot) to generate time series plots and box plots. Refer to the METplotpy User’s Guide for information on using the TCMPR plotter.
Reformatting for scatter plots
The METplotpy scatter plots support plotting of any column of data in the input file against any other column (useful in visualizing relationships between variables). The MPR output is reformatted to consist of the labelled headers of the original MET output when the keep_all_mpr_cols setting is set to True. This is the format needed to generate scatter plots.
Refer to MET User’s Guide table 11.20 for more information on the MPR linetype.
NOTE: When keep_all_mpr_cols is set to False, the MPR output is reformatted to collect the MPR-specific columns into the stat_name and stat_value columns, with additional confidence limit columns (stat_ncl/ncu, stat_bcl/bcu). This format is consistent with the format required for generating a line plot using METplotpy from the command line.
6.2. Required Components
Use the MET stat-analysis tool to filter data (by criteria such as model, valid times, etc.). The output from the stat-analysis tool can then be used as input to the METdataio METreformat reformatter. If filtering of data is not needed, the .stat files from the MET point-stat, grid-stat, and ensemble-stat tools can be used as input to the reformatter. If aggregation statistics are needed, then the METcalcpy agg_stat.py module can be used following the reformatting step. Reformatting to accommodate METcalcpy agg_stat is currently only available for the ECNT linetype. The input_stats_aggregated setting is used to indicate whether the reformatter needs to reformat the output for the METcalcpy agg_stat module.
METdbLoad modules are used to find and collect data from the individual .stat files into one data structure. The input .stat files must all reside under one directory. The path to this input data is specified in a YAML configuration file.
The YAML configuration file is also used to indicate the name and location of the output file, logging information (filename, log level), and the line type to read in and reformat:
Copy this custom config file from the directory where the source code was saved to the working directory.
Modify the YAML configuration file
Edit the reformat_stat.yaml config file (found under METdataio/METreformat/ in the source code).
Refer to the following details for each of the mandatory settings in the configuration file.
Definition of Mandatory Config Settings
- input_stats_aggregated
By default, this is set to True, which indicates either that the input data has been processed by the MET stat-analysis tool to calculate aggregation statistics, or that the data of interest does not require calculation of aggregation statistics. This reformatted data can be used as input to the appropriate METplotpy plotting script.
Set this to False if aggregation statistics need to be calculated by the METcalcpy agg_stat module.
- input_data_dir
The full path (no environment variables) to the directory that contains all the input .stat files from the MET point-stat, grid-stat, or ensemble-stat tool
If data is distributed among numerous directories, they will need to be consolidated into one directory
- output_dir
The full path (no environment variables) to the directory where the reformatted file will be saved
- output_filename
The name of the output file
NOTE: save with .data extension if this is to be used for plotting using METplotpy
If reformatting is run successively without removing an existing output file of the same name, the existing file will be overwritten.
- log_filename
The name of the log file
Set to STDOUT or stdout (case insensitive) if no log file is to be saved
- log_dir
The full path to the directory (no environment variables) where the log file is to be saved
- log_level
The verbosity of the logging: DEBUG, INFO, WARNING, ERROR
DEBUG is the most verbose, ERROR is the least verbose
- line_type
The line type to be reformatted
Currently supported line types are:
FHO
CNT
CTC
CTS
SL1L2
VL1L2
ECNT
MCTS
VCNT
RHIST
PCT
TCDIAG
MPR
##################### FOR MPR LINETYPE ONLY #####################
- keep_all_mpr_cols
For reformatting of MPR only
True if reformatting for scatter plot, False otherwise
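Before running the reformatter, it can help to sanity-check the settings listed above. The sketch below is a hypothetical helper, not part of METdataio: it works on a plain dict that mirrors the YAML keys, and the function name check_config, the example paths, and the output file name are invented. Treating keep_all_mpr_cols as required whenever line_type is MPR is an assumption based on the list above.

```python
MANDATORY_SETTINGS = (
    "input_stats_aggregated", "input_data_dir", "output_dir", "output_filename",
    "log_filename", "log_dir", "log_level", "line_type",
)
PATH_SETTINGS = ("input_data_dir", "output_dir", "log_dir")
LOG_LEVELS = {"DEBUG", "INFO", "WARNING", "ERROR"}
SUPPORTED_LINE_TYPES = {
    "FHO", "CNT", "CTC", "CTS", "SL1L2", "VL1L2", "ECNT",
    "MCTS", "VCNT", "RHIST", "PCT", "TCDIAG", "MPR",
}


def check_config(config):
    """Return a list of problems found in a settings dict (empty if none)."""
    problems = [f"missing setting: {key}"
                for key in MANDATORY_SETTINGS if key not in config]
    # Paths must be written out in full, without environment variables.
    for key in PATH_SETTINGS:
        if "$" in str(config.get(key, "")):
            problems.append(f"{key} must not contain environment variables")
    if "log_level" in config and str(config["log_level"]).upper() not in LOG_LEVELS:
        problems.append(f"unknown log_level: {config['log_level']}")
    line_type = config.get("line_type")
    if line_type is not None and line_type not in SUPPORTED_LINE_TYPES:
        problems.append(f"unsupported line_type: {line_type}")
    # Assumption: keep_all_mpr_cols is needed only for the MPR line type.
    if line_type == "MPR" and "keep_all_mpr_cols" not in config:
        problems.append("keep_all_mpr_cols must be set when line_type is MPR")
    return problems


# Hypothetical settings; replace the paths with real ones.
example = {
    "input_stats_aggregated": True,
    "input_data_dir": "/path/to/input_stat_files",
    "output_dir": "/path/to/output",
    "output_filename": "reformatted.data",
    "log_filename": "STDOUT",
    "log_dir": "/path/to/logs",
    "log_level": "INFO",
    "line_type": "CNT",
}
print(check_config(example))
```

An empty list means none of these checks failed; it does not guarantee that the reformatter will accept the file.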
6.3. Example
set up a base directory, where the METdataio source code is located
bash:
export BASE_DIR=/path/to/METdataio
csh:
setenv BASE_DIR /path/to/METdataio
replace /path/to with an actual path
set up a working directory, where the YAML config file will be located
bash:
export WORKING_DIR=/path/to/working_dir
csh:
setenv WORKING_DIR /path/to/working_dir
NOTE: Do NOT use environment variables for /path/to; specify the actual path.
set the PYTHONPATH:
bash
export PYTHONPATH=$BASE_DIR:$BASE_DIR/METdbLoad:$BASE_DIR/METdbLoad/ush:$BASE_DIR/METreformat
csh
setenv PYTHONPATH $BASE_DIR:$BASE_DIR/METdbLoad:$BASE_DIR/METdbLoad/ush:$BASE_DIR/METreformat
6.3.1. Generate the reformatted file:
place the .stat/.tcst data of interest (output from MET tool) into a single directory. NOTE: This may require reorganization of data that is distributed over numerous directories into a single directory.
modify the reformat_stat.yaml file, indicating the input directory, output directory, output file name, line type to reformat, and logging settings (log level, log filename, log directory). Refer to the Definition of Mandatory Config Settings above for a description of each setting.
python $BASE_DIR/METreformat/write_stat_ascii.py $WORKING_DIR/*line_type*_stat.yaml
A text file will be created in the output directory with the file name that was specified in the yaml file.
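If the .stat/.tcst files are spread over several directories, the first step above can be done by hand or with a small script. The sketch below is one possible approach using only the Python standard library; the function name consolidate and the example paths in the comment are hypothetical, and refusing to overwrite files that share a name is a design choice of this sketch, not a METdataio requirement.

```python
import shutil
from pathlib import Path


def consolidate(source_dirs, target_dir, suffixes=(".stat", ".tcst")):
    """Copy every .stat/.tcst file found under source_dirs into target_dir."""
    target = Path(target_dir)
    target.mkdir(parents=True, exist_ok=True)
    copied = []
    for source in source_dirs:
        # rglob("*") walks the source directory and all of its subdirectories.
        for path in sorted(Path(source).rglob("*")):
            if path.is_file() and path.suffix in suffixes:
                destination = target / path.name
                if destination.exists():
                    # Two inputs with the same file name would clobber each other.
                    raise FileExistsError(f"{destination} already exists")
                shutil.copy2(path, destination)
                copied.append(destination)
    return copied


# Example call (paths are placeholders):
# consolidate(["/path/to/run1", "/path/to/run2"], "/path/to/input_stat_files")
```

The target directory should sit outside the source directories so that already copied files are not picked up again.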