Pecos can be used to collect quality control test results and performance metrics, and generate HTML reports and dashboards.
Quality control test results
When a quality control test fails, information is stored in the test results DataFrame (pm.test_results).
This DataFrame is updated each time a new quality control test is run. Test results include the following information:
Variable Name: Column name in the DataFrame
Start Time: Start time of the failure
End Time: End time of the failure
Timesteps: The number of consecutive time steps involved in the failure
Error Flag: Error messages include:
Missing data (used for missing data and missing timestamp)
Data < lower bound OR Data > upper bound
Increment < lower bound OR Increment > upper bound
Delta < lower bound OR Delta > upper bound
Outlier < lower bound OR Outlier > upper bound
A subset of quality control test results from the Simple example is shown below.
>>> print(pm.test_results)
   Variable Name      Start Time        End Time  Timesteps                   Error Flag
1            NaN   1/1/2015 5:00   1/1/2015 5:00          1            Missing timestamp
2            NaN  1/1/2015 17:00  1/1/2015 17:00          1          Duplicate timestamp
3            NaN  1/1/2015 19:30  1/1/2015 19:30          1       Nonmonotonic timestamp
4              A  1/1/2015 12:00  1/1/2015 14:30         11  Delta < lower bound, 0.0001
5              B   1/1/2015 6:30   1/1/2015 6:30          1        Data < lower bound, 0
6              B  1/1/2015 15:30  1/1/2015 15:30          1        Data > upper bound, 1
7              C   1/1/2015 7:30   1/1/2015 9:30          9                 Corrupt data
Note that variable names are not recorded for timestamp test failures (Test results 1, 2, and 3).
The write_test_results method is used to write quality control test results to a CSV file.
This method can be customized to write quality control test results to a database or to other file formats.
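As a rough illustration of writing results somewhere other than a CSV file, the sketch below stores a small test results table in an SQLite database using pandas directly. It does not call or modify write_test_results. The rows are copied from the sample output above, while the table name `test_results` and the in-memory database are assumptions made for this example.

```python
import sqlite3

import pandas as pd

# Sample rows copied from the test results table shown above.
test_results = pd.DataFrame(
    {
        "Variable Name": ["A", "B", "B"],
        "Start Time": ["1/1/2015 12:00", "1/1/2015 6:30", "1/1/2015 15:30"],
        "End Time": ["1/1/2015 14:30", "1/1/2015 6:30", "1/1/2015 15:30"],
        "Timesteps": [11, 1, 1],
        "Error Flag": [
            "Delta < lower bound, 0.0001",
            "Data < lower bound, 0",
            "Data > upper bound, 1",
        ],
    }
)

# An in-memory database keeps the example self-contained (assumption);
# a file path would normally be used instead.
conn = sqlite3.connect(":memory:")
test_results.to_sql("test_results", conn, index=False)

# Read back only the failures recorded for variable B.
rows_b = pd.read_sql_query(
    'SELECT * FROM test_results WHERE "Variable Name" = ?',
    conn,
    params=("B",),
)
conn.close()
```

The same DataFrame could be passed to other pandas writers (for example `to_json`) if a different file format is preferred.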
Quality control mask
A Boolean mask indicating data that failed a quality control test is stored in a DataFrame.
This DataFrame is updated each time a new quality control test is run. True indicates that data passed all tests; False indicates that data did not pass at least one test (or that data is NaN).
The cleaned data set is stored in a DataFrame.
This DataFrame is updated each time a new quality control test is run. Data that failed a quality control test are replaced by NaN.
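The relationship between the mask and the cleaned data can be sketched with pandas alone. The example below is illustrative and does not use Pecos: the data values and the range test on column B (between 0 and 1) are assumptions chosen so that two points fail.

```python
import pandas as pd

index = pd.date_range("2015-01-01 06:00", periods=4, freq="15min")
data = pd.DataFrame(
    {"A": [0.2, 0.5, 0.9, 0.4], "B": [0.1, -0.3, 0.6, 1.4]},
    index=index,
)

# Start with a mask in which every value passes (True).
mask = pd.DataFrame(True, index=data.index, columns=data.columns)

# Hypothetical range test: values of B must lie between 0 and 1.
mask["B"] = (data["B"] >= 0) & (data["B"] <= 1)

# Keep values where the mask is True; values that failed become NaN.
cleaned = data.where(mask)
```

Here `cleaned` has the same shape as `data`, with NaN wherever the mask is False.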
Note that Pandas includes several methods to replace NaN using different replacement strategies. Generally, the best data replacement strategy must be defined on a case-by-case basis. Possible strategies include:
Replacing missing data using linear interpolation or other polynomial approximations
Replacing missing data using a rolling mean of the data
Replacing missing data with data from a previous period (previous day, hour, etc.)
Replacing missing data with data from a redundant sensor
Replacing missing data with values from a model
These strategies can be accomplished using Pandas methods. See the Pandas documentation for more details.
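A minimal sketch of several of the strategies listed above, using standard pandas calls (`interpolate`, `rolling`, `shift`, and `fillna`). The series values, the 3-sample window, the one-hour lag, and the `redundant` sensor series are all assumptions chosen for illustration; the right choice depends on the data.

```python
import numpy as np
import pandas as pd

index = pd.date_range("2015-01-01", periods=6, freq="h")
cleaned = pd.Series([1.0, np.nan, 3.0, 4.0, np.nan, 6.0], index=index)

# Linear interpolation between neighboring valid points.
interpolated = cleaned.interpolate(method="linear")

# Rolling mean over a centered 3-sample window, applied only where data is missing.
rolling_mean = cleaned.rolling(window=3, center=True, min_periods=1).mean()
filled_rolling = cleaned.fillna(rolling_mean)

# Data from a previous period (here, the previous hour).
filled_previous = cleaned.fillna(cleaned.shift(1))

# Data from a redundant sensor (hypothetical second series on the same index).
redundant = pd.Series([1.1, 2.1, 3.1, 4.1, 5.1, 6.1], index=index)
filled_redundant = cleaned.fillna(redundant)
```

In each case `fillna` only touches the NaN entries, so values that passed quality control are left unchanged.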
The write_metrics method is used to write metrics that describe the quality control analysis (e.g. QCI) to a CSV file.
This method can be customized to write performance metrics to a database or to other file formats.
The method can be called multiple times to append metrics based on the timestamp of the DataFrame.
>>> print(metrics_day1)
              QCI   RMSE
2018-01-01  0.871  0.952
>>> print(metrics_day2)
              QCI   RMSE
2018-01-02  0.755  0.845
>>> pecos.io.write_metrics(metrics_day1, 'metrics_file.csv')
>>> pecos.io.write_metrics(metrics_day2, 'metrics_file.csv')
The metrics_file.csv file will contain:
              QCI   RMSE
2018-01-01  0.871  0.952
2018-01-02  0.755  0.845
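The append-by-timestamp behavior described above can be approximated with pandas alone. The sketch below is not the Pecos implementation of write_metrics; `append_metrics_csv` is a hypothetical helper, and the choice to keep values already in the file when timestamps overlap is an assumption.

```python
import os
import tempfile

import pandas as pd


def append_metrics_csv(metrics, filename):
    """Hypothetical helper: merge metrics into a CSV file keyed by timestamp."""
    if os.path.exists(filename):
        existing = pd.read_csv(filename, index_col=0, parse_dates=True)
        # Values already in the file are kept; new timestamps and columns are added.
        metrics = existing.combine_first(metrics)
    metrics.to_csv(filename)
    return metrics


metrics_day1 = pd.DataFrame(
    {"QCI": [0.871], "RMSE": [0.952]}, index=pd.to_datetime(["2018-01-01"])
)
metrics_day2 = pd.DataFrame(
    {"QCI": [0.755], "RMSE": [0.845]}, index=pd.to_datetime(["2018-01-02"])
)

# A temporary directory keeps the example from leaving files behind.
with tempfile.TemporaryDirectory() as tmpdir:
    path = os.path.join(tmpdir, "metrics_file.csv")
    append_metrics_csv(metrics_day1, path)
    combined = append_metrics_csv(metrics_day2, path)
```

After the second call, `combined` holds one row per day, matching the file contents shown above.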
The write_monitoring_report method is used to generate an HTML monitoring report.
The monitoring report includes the start and end time for analysis, custom graphics
and performance metrics, a table that includes test results, graphics associated
with the test results (highlighting data points that failed a quality control test),
notes on runtime errors and warnings, and the configuration options
used in the analysis.
Custom Graphics: Custom graphics can be created for specific applications. These graphics are included at the top of the report. Custom graphics can be generated using methods in the graphics module.
Performance Metrics: Performance metrics are displayed in a table.
Test Results: Test results contain information stored in pm.test_results. Graphics follow that display the data point(s) that caused the failure. Test results graphics are generated using methods in the graphics module.
Notes: Notes include Pecos runtime errors and warnings, including:
Formatting error in the translation dictionary
Insufficient data for a specific quality control test
Insufficient data or error when evaluating string
Configuration Options: Configuration options used in the analysis.
To compare quality control analysis across several systems, key graphics and metrics
can be gathered in a dashboard view.
For example, the dashboard can contain multiple rows (one for each system) and multiple columns (one for each location).
The dashboard can be linked to monitoring reports and interactive graphics for more detailed information.
The write_dashboard method is used to generate an HTML dashboard.
For each row and column in the dashboard, the following information can be specified:
Text (i.e. general information about the system/location)
Graphics (i.e. a list of custom graphics)
Table (i.e. a Pandas DataFrame with performance metrics)
Links (i.e. the path to a monitoring report or other file/site for additional information)
Text, graphics, tables, and links can be combined to create custom dashboards. Pecos includes dashboard examples in the examples/dashboard directory. Figure 6, Figure 7, and Figure 8 show example dashboards generated using Pecos.
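One way to organize the per-cell information described above is a dictionary keyed by (row, column). The sketch below uses plain Python only. The key names, file names, and system and location labels are assumptions for illustration and are not confirmed to match the argument format expected by the dashboard-writing method.

```python
# Hypothetical system (row) and location (column) names.
row_names = ["System 1", "System 2"]
column_names = ["Location A", "Location B"]

content = {}
for row in row_names:
    for col in column_names:
        # Build a file-name stem such as "system_1_location_a" (assumed naming scheme).
        slug = f"{row}_{col}".replace(" ", "_").lower()
        content[(row, col)] = {
            # General information about the system/location.
            "text": f"{row} at {col}",
            # List of custom graphics files.
            "graphics": [f"{slug}_custom.png"],
            # Performance metrics rendered as an HTML table.
            "table": "<table><tr><th>QCI</th><td>0.871</td></tr></table>",
            # Link to a monitoring report for more detail.
            "link": {"Monitoring report": f"{slug}_report.html"},
        }
```

Each cell carries the same four kinds of information, so a dashboard template can loop over rows and columns without special cases.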
The graphics module contains several methods to plot time series data, scatter plots, heatmaps,
and interactive graphics. These methods can be used to generate graphics that are included in
monitoring reports and dashboards, or to generate standalone graphics. The following figures
illustrate graphics created using the methods included in Pecos.
Note that many other graphing options are available using Python graphing packages directly.
Test results graphics include
time series data along with a shaded time filter and quality control test results.
The following figure shows inverter efficiency over the course of one day.
The gray region indicates times when sun elevation is < 20 degrees.
This region is excluded from quality control tests. Green marks identify data points
that were flagged as changing abruptly, and red marks identify data points that were outside the expected range.
These graphics can be included in Monitoring reports.
Day-of-year vs. time-of-day heatmaps
can help identify missing data and trends, and help define filters and quality control test thresholds when working with large data sets.
The following figure shows irradiance over a year with the time of sunrise and sunset for each day.
The white vertical line indicates one day of missing data.
plot_heatmap can be used to create simple heatmaps.
These plots can be included as custom graphics in Monitoring reports and Dashboards.
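The data layout behind a day-of-year vs. time-of-day heatmap can be sketched with pandas: the time series is pivoted into a grid with one column per day and one row per time of day. This example does not call the Pecos plotting methods. The synthetic hourly data and the dropped second day are assumptions used to mimic one day of missing data.

```python
import numpy as np
import pandas as pd

# Synthetic hourly series covering three days (assumption for illustration).
index = pd.date_range("2015-01-01", periods=72, freq="h")
values = pd.Series(np.arange(72, dtype=float), index=index)

# Drop the second day to mimic one day of missing data.
values = values[values.index.dayofyear != 2]

frame = pd.DataFrame(
    {
        "day_of_year": values.index.dayofyear,
        "hour_of_day": values.index.hour,
        "value": values.to_numpy(),
    }
)

# Rows are time of day, columns are day of year.
grid = frame.pivot(index="hour_of_day", columns="day_of_year", values="value")

# Reindex so the missing day appears as an all-NaN column.
grid = grid.reindex(columns=range(1, 4))
```

When such a grid is drawn as an image, the all-NaN column shows up as a blank vertical stripe, like the missing day described above.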
Interactive graphics
are HTML graphic files that the user can scale and hover over to visualize data.
The following figure shows an image of an interactive graphic. Many more options are available;
see https://plot.ly for more details.
Interactive graphics can be linked to Dashboards.