Performance Monitoring using Pecos¶
Advances in sensor technology have rapidly increased our ability to monitor natural and human-made physical systems. In many cases, it is critical to process the resulting large volumes of data on a regular schedule and alert system operators when the system has changed. Automated quality control and performance monitoring can allow system operators to quickly detect performance issues.
Pecos is an open source Python package designed to address this need. Pecos includes built-in functionality to monitor performance of time series data. The software can be used to automate a series of quality control tests and generate custom reports which include performance metrics, test results, and graphics. The software was developed specifically to monitor solar photovoltaic systems, but is designed to be used for a wide range of applications. Figure 1 shows example graphics and a dashboard created using Pecos.

Example graphics and dashboard created using Pecos.¶
Citing Pecos¶
To cite Pecos, use one of the following references:
K.A. Klise and J.S. Stein (2016), Performance Monitoring using Pecos, Technical Report SAND2016-3583, Sandia National Laboratories, Albuquerque, NM.
K.A. Klise and J.S. Stein (2016), Automated Performance Monitoring for PV Systems using Pecos, 43rd IEEE Photovoltaic Specialists Conference (PVSC), Portland, OR, June 5-10.
Contents¶
Overview¶
Pecos is an open-source Python package designed to monitor performance of time series data, subject to a series of quality control tests. The software includes methods to run quality control tests defined by the user and generate reports which include performance metrics, test results, and graphics. The software can be customized for specific applications. Some high-level features include:
Pecos uses Pandas DataFrames [Mcki13] to store and analyze time series data. This dependency facilitates a wide range of analysis options and date-time functionality.
Data column names can be easily reassigned to common names through the use of a translation dictionary. Translation dictionaries also allow data columns to be grouped for analysis.
Time filters can be used to eliminate data at specific times from quality control tests (e.g., early evening and late afternoon).
Predefined and custom quality control functions can be used to determine if data is anomalous.
Application specific models can be incorporated into quality control tests to compare measured to modeled data values.
General and custom performance metrics can be saved to keep a running history of system health.
Analysis can be set up to run on an automated schedule (e.g., Pecos can be run each day to analyze data collected on the previous day).
HTML formatted reports can be sent via email or hosted on a website. LaTeX formatted reports can also be generated.
Data acquisition methods can be used to transfer data from sensors to an SQL database.
Installation¶
Pecos requires Python (tested on 3.6, 3.7, 3.8, and 3.9) along with several Python package dependencies. Information on installing and using Python can be found at https://www.python.org/. Python distributions, such as Anaconda, are recommended to manage the Python interface. Anaconda Python distributions include the Python packages needed to run Pecos.
Users can install the latest release of Pecos from PyPI or Anaconda using one of the following commands:

pip install pecos

conda install -c conda-forge pecos
Developers can install the main branch of Pecos from the GitHub repository using the following commands:
git clone https://github.com/sandialabs/pecos
cd pecos
python setup.py install
To install Pecos using a downloaded zip file, go to https://github.com/sandialabs/pecos, select the “Clone or download” button and then select “Download ZIP”. This downloads a zip file called pecos-main.zip. To download a specific release, go to https://github.com/sandialabs/pecos/releases and select a zip file. The software can then be installed by unzipping the file and running setup.py:
unzip pecos-main.zip
cd pecos-main
python setup.py install
To use Pecos, import the package from a Python console:
import pecos
Dependencies¶
Required Python package dependencies include:
Pandas [Mcki13]: used to analyze and store time series data, http://pandas.pydata.org/
Numpy [VaCV11]: used to support large, multi-dimensional arrays and matrices, http://www.numpy.org/
Jinja [Rona08]: used to generate HTML templates, http://jinja.pocoo.org/
Matplotlib [Hunt07]: used to produce figures, http://matplotlib.org/
Optional Python package dependencies include:
minimalmodbus: used to collect data from a modbus device, https://minimalmodbus.readthedocs.io
sqlalchemy: used to insert data into a MySQL database, https://www.sqlalchemy.org/
pyyaml: used to store configuration options in human readable data format, http://pyyaml.org/
PVLIB [SHFH16]: used to simulate the performance of photovoltaic energy systems, http://pvlib-python.readthedocs.io/
Plotly [SPHC16]: used to produce interactive scalable figures, https://plot.ly/
All other dependencies are part of the Python Standard Library.
Framework¶
Pecos contains the following modules:

monitoring : Contains the PerformanceMonitoring class and individual quality control test functions that are used to run analysis
metrics : Contains metrics that describe the quality control analysis or compute quantities that might be of use in the analysis
io : Contains functions to load data, send email alerts, write results to files, and generate HTML and LaTeX reports
graphics : Contains functions to generate scatter, time series, and heatmap plots for reports
utils : Contains helper functions, including functions to convert time series indices from seconds to datetime

In addition to the modules listed above, Pecos also includes a pv module that contains metrics specific to photovoltaic analysis.
Object-oriented and functional approach¶
Pecos supports quality control tests that are called using both an object-oriented and functional approach.
Object-oriented approach¶
Pecos includes a PerformanceMonitoring class, which is the base class used to define the quality control analysis. This class stores:
Raw data
Translation dictionary (maps raw data column names to common names)
Time filter (excludes specific timestamps from analysis)
The class is used to call quality control tests, including:

check_timestamp : Check timestamps for missing, duplicate, and non-monotonic indexes
check_missing : Check for missing data
check_corrupt : Check for corrupt data
check_range : Check for data outside expected range
check_delta : Check for stagnant or abrupt changes in the data
check_outlier : Check for outliers
check_custom_static : Custom static quality control test
check_custom_streaming : Custom streaming quality control test
The class can return the following results:
Cleaned data (data that failed a test is replaced by NaN)
Boolean mask (indicates if data failed a test)
Summary of the quality control test results
The object-oriented approach is convenient when running a series of quality control tests and can make use of the translation dictionary and time filter across all tests. The cleaned data, boolean mask, and test results summary reflect results from all quality control tests.
When using the object-oriented approach, a PerformanceMonitoring object is created and methods are called using that object. The cleaned data, mask, and test results can then be extracted from the PerformanceMonitoring object. These properties are updated each time a quality control test is run.
>>> pm = pecos.monitoring.PerformanceMonitoring()
>>> pm.add_dataframe(data)
>>> pm.check_range([-3,3])
>>> cleaned_data = pm.cleaned_data
>>> mask = pm.mask
>>> test_results = pm.test_results
Functional approach¶
The same quality control tests can also be run using individual functions. These functions generate a PerformanceMonitoring object under the hood and return:
Cleaned data
Boolean mask
Summary of the quality control test results
The functional approach is a convenient way to quickly get results from a single quality control test.
When using the functional approach, data is passed to the quality control test function. All other arguments match the object-oriented approach. The cleaned data, mask, and test results can then be extracted from the resulting dictionary.
>>> results = pecos.monitoring.check_range(data, [-3,3])
>>> cleaned_data = results['cleaned_data']
>>> mask = results['mask']
>>> test_results = results['test_results']
Note, examples in the documentation use the object-oriented approach.
Static and streaming analysis¶
Pecos supports both static and streaming analysis.
Static analysis¶
Most quality control tests in Pecos use static analysis. Static analysis operates on the entire data set to determine if all data points are normal or anomalous. While this can include operations like moving window statistics, the quality control test operates on the entire data set at once. This means that results from the quality control test are not dependent on results from a previous time step. This approach is appropriate when data at different time steps can be analyzed independently, or when moving window statistics used to analyze the data do not need to be updated based on test results.
The following quality control tests use static analysis: check_missing, check_corrupt, check_range, check_delta, check_increment, check_outlier (1), and check_custom_static.
1 The outlier test can make use of both static and streaming analysis. See Outlier test for more details.
Streaming analysis¶
The streaming analysis loops through each data point using a quality control test that relies on information from “clean data” in a moving window. If a data point is determined to be anomalous, it is not included in the window for subsequent analysis. When using a streaming analysis, Pecos keeps track of the cleaned history that is used in the quality control test at each time step. This approach is important when the underlying methods in the quality control test could be corrupted by historical data points that were deemed anomalous. The streaming analysis also allows users to better analyze continuous datasets in a near real-time fashion. While Pecos could be used to analyze data at a single time step in a real-time fashion (creating a new instance of the PerformanceMonitoring class each time), the methods in Pecos are designed to analyze data over a time period. That time period can depend on several factors, including the size of the data and how often the test results and reports should be generated. Cleaned history can be appended to new datasets as they become available to create a seamless analysis for continuous data. See Continuous analysis for more details.
The streaming analysis includes an optional parameter which is used to rebase data in the history window if a certain fraction of that data has been deemed to be anomalous. The ability to rebase the history is useful if data changes to a new normal condition that would otherwise continue to be flagged as anomalous.
The following quality control tests use streaming analysis: check_timestamp (2), check_outlier (3), and check_custom_streaming.
2 The timestamp test does not loop through data using a moving window; rather, timestamp functionality in Pandas is used to determine anomalies in the time index.
3 The outlier test can make use of both static and streaming analysis. See Outlier test for more details.
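The streaming loop described above can be sketched in plain Python (an illustration of the concept only, not Pecos's implementation; here the window is a fixed number of points rather than a time duration, and the initial window is assumed clean):

```python
import numpy as np

def streaming_outlier_sketch(values, window=10, bound=3.0):
    # Seed the history with the first `window` points, assumed clean
    history = list(values[:window])
    mask = [True] * window  # True = data passed the test
    for x in values[window:]:
        mu, sigma = np.mean(history), np.std(history)
        # Flag the point if it is more than `bound` std devs from the mean
        if sigma > 0:
            ok = abs(x - mu) <= bound * sigma
        else:
            ok = x == mu
        mask.append(bool(ok))
        if ok:
            # Only clean data enters the history used for later points
            history.append(x)
            history = history[-window:]
    return mask
```

Because the anomalous point never enters the history, a later return to normal values is still judged against clean statistics. Pecos's built-in streaming tests additionally track cleaned history across datasets and support rebasing, as described above.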
Custom quality control tests¶
Pecos supports custom quality control tests that can be static or streaming in form. This feature allows the user to customize the analysis used to determine if data is anomalous and return custom metadata from the analysis.
The custom function is defined outside of Pecos and handed to the custom quality control method as an input argument. This allows the user to include analysis options that are not currently supported in Pecos or that are very specific to their application.
While there are no specifications on the information that metadata stores, the metadata commonly includes raw values that were used in the quality control test. For example, while the outlier test returns a boolean value that indicates if data is normal or anomalous, the metadata can include the normalized data value that was used to make that determination. See Custom tests for more details.
Simple example¶
A simple example is included in the examples/simple directory. This example uses data from a CSV file, simple.csv, which contains 4 columns of data (A through D).
A = elapsed time in days
B = uniform random number between 0 and 1
C = sin(10*A)
D = C+(B-0.5)/2
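For reference, data with this structure can be generated with NumPy and Pandas (a sketch; the 15-minute frequency is an assumption based on the check_timestamp(900) call in the script below, and this synthetic data omits the injected anomalies):

```python
import numpy as np
import pandas as pd

index = pd.date_range('2015-01-01', periods=96, freq='15min')  # one day, 15-minute steps
A = (index - index[0]).total_seconds() / 86400  # elapsed time in days
rng = np.random.default_rng(0)
B = rng.uniform(0, 1, len(index))               # uniform random number in [0, 1]
C = np.sin(10 * A)                              # sine model
D = C + (B - 0.5) / 2                           # sine model plus noise
df = pd.DataFrame({'A': A, 'B': B, 'C': C, 'D': D}, index=index)
```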
The data includes missing timestamps, duplicate timestamps, non-monotonic timestamps, corrupt data, data out of expected range, data that doesn’t change, and data that changes abruptly, as listed below.
Missing timestamp at 5:00
Duplicate timestamp at 17:00
Non-monotonic timestamp at 19:30
Column A has the same value (0.5) from 12:00 until 14:30
Column B is below the expected lower bound of 0 at 6:30 and above the expected upper bound of 1 at 15:30
Column C has corrupt data (-999) between 7:30 and 9:30
Column C does not follow the expected sine function from 13:00 until 16:15. The change is abrupt and gradually corrected.
Column D is missing data from 17:45 until 18:15
Column D is occasionally below the expected lower bound of -1 around midday (2 time steps) and above the expected upper bound of 1 in the early morning and late evening (10 time steps).
The script, simple_example.py (shown below), is used to run quality control analysis using Pecos. The script performs the following steps:
Load time series data from a CSV file
Run quality control tests
Save test results to a CSV file
Generate an HTML report
"""
In this example, simple time series data is used to demonstrate basic functions
in pecos.
* Data is loaded from a CSV file which contains four columns of values that
are expected to follow linear, random, and sine models.
* A translation dictionary is defined to map and group the raw data into
common names for analysis
* A time filter is established to screen out data between 3 AM and 9 PM
* The data is loaded into a pecos PerformanceMonitoring object and a series of
quality control tests are run, including range tests and increment tests
* The results are printed to CSV and HTML reports
"""
import pandas as pd
import matplotlib.pyplot as plt
import numpy as np
import pecos
# Initialize logger
pecos.logger.initialize()
# Create a Pecos PerformanceMonitoring data object
pm = pecos.monitoring.PerformanceMonitoring()
# Populate the object with a DataFrame and translation dictionary
data_file = 'simple.csv'
df = pd.read_csv(data_file, index_col=0, parse_dates=True)
pm.add_dataframe(df)
pm.add_translation_dictionary({'Wave': ['C','D']}) # group C and D
# Check the expected frequency of the timestamp
pm.check_timestamp(900)
# Generate a time filter to exclude data points early and late in the day
clock_time = pecos.utils.datetime_to_clocktime(pm.data.index)
time_filter = pd.Series((clock_time > 3*3600) & (clock_time < 21*3600),
index=pm.data.index)
pm.add_time_filter(time_filter)
# Check for missing data
pm.check_missing()
# Check for corrupt data values
pm.check_corrupt([-999])
# Add a composite signal which compares measurements to a model
wave_model = np.array(np.sin(10*clock_time/86400))
wave_measurements = pm.data[pm.trans['Wave']]
wave_error = np.abs(wave_measurements.subtract(wave_model,axis=0))
wave_error.columns=['Wave Error C', 'Wave Error D']
pm.add_dataframe(wave_error)
pm.add_translation_dictionary({'Wave Error': ['Wave Error C', 'Wave Error D']})
# Check data for expected ranges
pm.check_range([0, 1], 'B')
pm.check_range([-1, 1], 'Wave')
pm.check_range([None, 0.25], 'Wave Error')
# Check for stagnant data within a 1 hour moving window
pm.check_delta([0.0001, None], 3600, 'A')
pm.check_delta([0.0001, None], 3600, 'B')
pm.check_delta([0.0001, None], 3600, 'Wave')
# Check for abrupt changes between consecutive time steps
pm.check_increment([None, 0.6], 'Wave')
# Compute the quality control index for A, B, C, and D
mask = pm.mask[['A','B','C','D']]
QCI = pecos.metrics.qci(mask, pm.tfilter)
# Generate graphics
test_results_graphics = pecos.graphics.plot_test_results(pm.data, pm.test_results, pm.tfilter)
df.plot(ylim=[-1.5,1.5], figsize=(7.0,3.5))
plt.savefig('custom.png', format='png', dpi=500)
# Write test results and report files
pecos.io.write_test_results(pm.test_results)
pecos.io.write_monitoring_report(pm.data, pm.test_results, test_results_graphics,
['custom.png'], QCI)
Results include:
HTML monitoring report, monitoring_report.html (Figure 2), includes quality control index, summary table, and graphics
Test results CSV file, test_results.csv, includes information from the summary tables

Example monitoring report.¶
Time series data¶
Pecos uses Pandas DataFrames to store and analyze data indexed by time. Pandas DataFrames store 2D data with labeled columns. Pandas includes a wide range of time series analysis and date-time functionality. By using Pandas DataFrames, Pecos is able to take advantage of a wide range of timestamp string formats, including UTC offset.
Pandas includes many built-in functions to read data from CSV, Excel, SQL, etc. For example, data can be loaded from an Excel file using the following code.
>>> import pandas as pd
>>> data = pd.read_excel('data.xlsx')
Data can also be gathered from the web using the Python package requests, http://docs.python-requests.org.
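A sketch of that workflow (the URL and the use of requests are placeholders here; the HTTP call is commented out so the parsing step stands alone):

```python
import io
import pandas as pd

# import requests
# csv_text = requests.get('https://example.com/data.csv').text  # placeholder URL

csv_text = "Time,A,B\n2018-01-01 00:00,0.1,0.2\n2018-01-01 00:15,0.3,0.4\n"
data = pd.read_csv(io.StringIO(csv_text), index_col=0, parse_dates=True)
```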
To get started, create an instance of the PerformanceMonitoring class.
Note
Quality control tests can also be called using individual functions, see Framework for more details.
>>> import pecos
>>> pm = pecos.monitoring.PerformanceMonitoring()
Data, in the form of a Pandas DataFrame, can then be added to the PerformanceMonitoring object.
>>> pm.add_dataframe(data)
The data is accessed using
>>> pm.data
Multiple DataFrames can be added to the PerformanceMonitoring object. New data overrides existing data if DataFrames share indexes and columns. Missing indexes and columns are filled with NaN. An example is shown below.
>>> print(data1)
A B
2018-01-01 0.0 5.0
2018-01-02 1.0 6.0
2018-01-03 2.0 7.0
>>> print(data2)
B C
2018-01-02 0.0 5.0
2018-01-03 1.0 6.0
2018-01-04 2.0 7.0
>>> pm.add_dataframe(data1)
>>> pm.add_dataframe(data2)
>>> print(pm.data)
A B C
2018-01-01 0.0 5.0 NaN
2018-01-02 1.0 0.0 5.0
2018-01-03 2.0 1.0 6.0
2018-01-04 NaN 2.0 7.0
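This override behavior matches the following pure-pandas operation (a sketch of the merge semantics, not Pecos's internal implementation):

```python
import pandas as pd

data1 = pd.DataFrame({'A': [0.0, 1.0, 2.0], 'B': [5.0, 6.0, 7.0]},
                     index=pd.to_datetime(['2018-01-01', '2018-01-02', '2018-01-03']))
data2 = pd.DataFrame({'B': [0.0, 1.0, 2.0], 'C': [5.0, 6.0, 7.0]},
                     index=pd.to_datetime(['2018-01-02', '2018-01-03', '2018-01-04']))

# New data takes precedence where indexes and columns overlap;
# gaps are filled from the earlier frame, and missing cells become NaN.
combined = data2.combine_first(data1)
```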
Translation dictionary¶
A translation dictionary is an optional feature which allows the user to map original column names into common names that can be more useful for analysis and reporting. A translation dictionary can also be used to group columns with similar properties into a single variable. Using grouped variables, Pecos can run a single set of quality control tests on the group.
Each entry in a translation dictionary is a key:value pair where ‘key’ is the common name of the data and ‘value’ is a list of original column names in the DataFrame. For example, {temp: [temp1,temp2]} means that columns named ‘temp1’ and ‘temp2’ in the DataFrame are assigned to the common name ‘temp’ in Pecos. In the Simple example, the following translation dictionary is used to group columns ‘C’ and ‘D’ to ‘Wave’.
>>> trans = {'Wave': ['C','D']}
The translation dictionary can then be added to the PerformanceMonitoring object.
>>> pm.add_translation_dictionary(trans)
As with DataFrames, multiple translation dictionaries can be added to the PerformanceMonitoring object. New dictionaries override existing keys in the translation dictionary.
Keys defined in the translation dictionary can be used in quality control tests, for example,
>>> pm.check_range([-1,1], 'Wave')
runs a check range test on columns ‘C’ and ‘D’.
Inside Pecos, the translation dictionary is used to index into the DataFrame, for example,
>>> pm.data[pm.trans['Wave']]
returns columns ‘C’ and ‘D’ from the DataFrame.
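In pandas terms, the translation dictionary is an ordinary dict used for column selection (a minimal sketch of that indexing step):

```python
import pandas as pd

df = pd.DataFrame({'C': [0.0, 0.5], 'D': [0.1, 0.4]})
trans = {'Wave': ['C', 'D']}
wave = df[trans['Wave']]  # selects columns 'C' and 'D' as a group
```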
Time filter¶
A time filter is an optional feature which allows the user to exclude specific timestamps from all quality control tests. The time filter is a Boolean time series that can be defined using elapsed time, clock time, or other custom algorithms.
Pecos includes methods to get the elapsed and clock time of the DataFrame (in seconds). The following example defines a time filter between 3 AM and 9 PM,
>>> clocktime = pecos.utils.datetime_to_clocktime(pm.data.index)
>>> time_filter = pd.Series((clocktime > 3*3600) & (clocktime < 21*3600),
... index=pm.data.index)
The time filter can also be defined based on properties of the DataFrame, for example,
>>> time_filter = pm.data['A'] > 0.5
For some applications, it is useful to define the time filter based on sun position, as demonstrated in pv_example.py in the examples/pv directory.
The time filter can then be added to the PerformanceMonitoring object as follows,
>>> pm.add_time_filter(time_filter)
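Equivalently, the clock time (seconds past midnight, which is what pecos.utils.datetime_to_clocktime is assumed to return in the example above) can be computed directly from the index:

```python
import pandas as pd

index = pd.date_range('2018-01-01', periods=24, freq='h')  # hourly for one day
clocktime = index.hour * 3600 + index.minute * 60 + index.second
time_filter = pd.Series((clocktime > 3*3600) & (clocktime < 21*3600), index=index)
```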
Quality control tests¶
Pecos includes several built-in quality control tests. When a test fails, information is stored in a summary table. This information can be saved to a file, database, or included in reports. Quality control tests fall into eight categories:
Timestamp
Missing data
Corrupt data
Range
Delta
Increment
Outlier
Custom
Note
Quality control tests can also be called using individual functions, see Framework for more details.
Timestamp test¶
The check_timestamp method is used to check the time index for missing, duplicate, and non-monotonic indexes. If a duplicate timestamp is found, Pecos keeps the first occurrence. If timestamps are not monotonic, the timestamps are reordered. For this reason, the timestamp should be corrected before other quality control tests are run. The timestamp test is the only test that modifies the data stored in pm.data.
Input includes:
Expected frequency of the time series in seconds
Expected start time (default = None, which uses the first index of the time series)
Expected end time (default = None, which uses the last index of the time series)
Minimum number of consecutive failures for reporting (default = 1)
A flag indicating if exact timestamps are expected. When set to False, irregular timestamps can be used in the Pecos analysis (default = True).
For example,
>>> pm.check_timestamp(60)
checks for missing, duplicate, and non-monotonic indexes assuming an expected frequency of 60 seconds.
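These checks can be sketched using pandas alone (illustrative only; check_timestamp also reindexes the data and records results in the summary table):

```python
import pandas as pd

index = pd.to_datetime(['2018-01-01 00:00', '2018-01-01 00:01',
                        '2018-01-01 00:01',   # duplicate timestamp
                        '2018-01-01 00:04'])  # 00:02 and 00:03 are missing
duplicates = index[index.duplicated()]        # keeps the first occurrence
expected = pd.date_range(index.min(), index.max(), freq='60s')
missing = expected.difference(index)
monotonic = index.is_monotonic_increasing
```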
Missing data test¶
The check_missing method is used to check for missing values. Unlike missing timestamps, missing data only impacts a subset of data columns. NaN is included as missing.
Input includes:
Data column (default = None, which indicates that all columns are used)
Minimum number of consecutive failures for reporting (default = 1)
For example,
>>> pm.check_missing('A', min_failures=5)
checks for missing data in the columns associated with the column or group ‘A’. In this example, warnings are only reported if there are 5 consecutive failures.
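Conceptually, the test reduces to flagging NaN values and then filtering for runs of consecutive failures (a pandas sketch; the run-length step mirrors the min_failures option):

```python
import numpy as np
import pandas as pd

s = pd.Series([1.0, np.nan, np.nan, 4.0, np.nan, 6.0])
is_missing = s.isna()

# Keep only runs of at least `min_failures` consecutive missing values
min_failures = 2
runs = (is_missing != is_missing.shift()).cumsum()  # label consecutive runs
run_lengths = is_missing.groupby(runs).transform('sum')
flagged = is_missing & (run_lengths >= min_failures)
```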
Corrupt data test¶
The check_corrupt method is used to check for corrupt values.
Input includes:
List of corrupt values
Data column (default = None, which indicates that all columns are used)
Minimum number of consecutive failures for reporting (default = 1)
For example,
>>> pm.check_corrupt([-999, 999])
checks for data with values -999 or 999 in the entire dataset.
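Conceptually, the corrupt test marks the listed values and treats them as missing in the cleaned data (a pandas sketch):

```python
import pandas as pd

df = pd.DataFrame({'A': [0.1, -999.0, 0.3], 'B': [999.0, 0.2, 0.4]})
corrupt = df.isin([-999, 999])  # True where a corrupt value appears
cleaned = df.mask(corrupt)      # corrupt values are replaced by NaN
```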
Range test¶
The check_range method is used to check if data is within expected bounds. Range tests are very flexible. The test can be used to check for expected range on the raw data or on modified data. For example, composite signals can be added to the analysis to check for expected range on modeled vs. measured values (i.e., absolute error or relative error) or on expected relationships between data columns (i.e., column A divided by column B). An upper bound, lower bound, or both can be specified.
Input includes:
Upper and lower bound
Data column (default = None, which indicates that all columns are used)
Minimum number of consecutive failures for reporting (default = 1)
For example,
>>> pm.check_range([None, 1], 'A')
checks for values greater than 1 in the columns associated with the key ‘A’.
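The bound logic reduces to simple comparisons (a pandas sketch, using the same [None, 1] bounds as the example above):

```python
import pandas as pd

s = pd.Series([-0.5, 0.2, 1.5])
lower, upper = None, 1  # None means the bound is not checked
mask = pd.Series(True, index=s.index)  # True = data passed the test
if lower is not None:
    mask &= s >= lower
if upper is not None:
    mask &= s <= upper
```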
Delta test¶
The check_delta method is used to check for stagnant data and abrupt changes in data. The test checks if the difference between the minimum and maximum data value within a moving window is within expected bounds.
Input includes:
Upper and lower bound
Size of the moving window used to compute the difference between the minimum and maximum
Data column (default = None, which indicates that all columns are used)
Flag indicating if the test should only check for positive delta (the min occurs before the max) or negative delta (the max occurs before the min) (default = False)
Minimum number of consecutive failures for reporting (default = 1)
For example,
>>> pm.check_delta([0.0001, None], window=3600)
checks if data changes by less than 0.0001 in a 1 hour moving window.
>>> pm.check_delta([None, 800], window=1800, direction='negative')
checks if data decrease by more than 800 in a 30 minute moving window.
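The rolling computation behind the test can be sketched with pandas (illustrative; the real test also maps failures back to the offending points within the window):

```python
import pandas as pd

s = pd.Series([1.0, 1.0, 1.0, 1.0, 5.0, 5.0],
              index=pd.date_range('2018-01-01', periods=6, freq='15min'))

# Difference between the max and min within a 1 hour moving window
delta = s.rolling('3600s').max() - s.rolling('3600s').min()
stagnant = delta < 0.0001  # flags windows where data barely changes
```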
Increment test¶
Similar to the check_delta method above, the check_increment method can be used to check for stagnant data and abrupt changes in data. The test checks if the difference between consecutive data values (or other specified increment) is within expected bounds. While this method is faster than the check_delta method, it does not consider the timestamp index or changes within a moving window, making its ability to find stagnant data and abrupt changes less robust.
Input includes:
Upper and lower bound
Data column (default = None, which indicates that all columns are used)
Increment used for difference calculation (default = 1 timestamp)
Flag indicating if the absolute value of the increment is used in the test (default = True)
Minimum number of consecutive failures for reporting (default = 1)
For example,
>>> pm.check_increment([0.0001, None], min_failures=60)
checks if increments are less than 0.0001 for 60 consecutive time steps.
>>> pm.check_increment([-800, None], absolute_value=False)
checks if increments decrease by more than 800 in a single time step.
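A pandas sketch of the increment computation (using the [None, 0.6] upper bound from the first example):

```python
import pandas as pd

s = pd.Series([0.0, 0.1, 0.9, 1.0])
increment = s.diff().abs()  # |difference| between consecutive values
abrupt = increment > 0.6    # True where the change is too large
```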
Outlier test¶
The check_outlier method is used to check if normalized data falls outside expected bounds. Data is normalized using the mean and standard deviation, using either a moving window or the entire data set. If multiple columns of data are used, each column is normalized separately.
Input includes:
Upper and lower bound (in standard deviations)
Data column (default = None, which indicates that all columns are used)
Size of the moving window used to normalize the data (default = None). Note that when the window is set to None, the mean and standard deviation of the entire data set is used to normalize the data.
Flag indicating if the absolute value of the normalized data is used in the test (default = True)
Minimum number of consecutive failures for reporting (default = 1)
Flag indicating if the outlier test should use streaming analysis (default=False).
Note that using a streaming analysis is different than merely defining a moving window. Streaming analysis omits anomalous values from subsequent normalization calculations, whereas a static analysis with a moving window does not.
In a static analysis, the mean and standard deviation used to normalize the data are computed using a moving window (or using the entire data set if window=None) and upper and lower bounds are used to determine if data points are anomalous. The results do not impact the moving window statistics. In a streaming analysis, the mean and standard deviation are computed using a moving window after each data point is determined to be normal or anomalous. Data points that are determined to be anomalous are not used in the normalization.
For example,
>>> pm.check_outlier([None, 3], window=12*3600)
checks if the normalized data is more than 3 standard deviations above the mean within a 12 hour moving window.
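The static (whole-dataset) form of the normalization can be sketched as follows (illustrative only; the example data places one clear outlier among constant values):

```python
import pandas as pd

s = pd.Series([0.0]*11 + [10.0])
z = (s - s.mean()) / s.std()  # normalize using the entire data set
outlier = z.abs() > 3         # True where |z-score| exceeds 3
```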
Custom tests¶
The check_custom_static and check_custom_streaming methods allow the user to supply a custom function that is used to determine if data is normal or anomalous. See Static and streaming analysis for more details.
This feature allows the user to customize the analysis and return custom metadata from the analysis. The custom function is defined outside of Pecos and handed to the custom quality control method as an input argument. This allows the user to include analysis options that are not currently supported in Pecos or that are very specific to their application. While there are no specifications on what this metadata stores, the metadata commonly includes the raw values that were included in a quality control test. For example, while the outlier test returns a boolean value that indicates if data is normal or anomalous, the metadata can include the normalized data value that was used to make that determination.
The user can also create custom quality control tests by creating a class that inherits from the PerformanceMonitoring class.
Custom static analysis¶
Static analysis operates on the entire data set to determine if all data points are normal or anomalous. Input for custom static analysis includes:
Custom quality control function with the following general form:
def custom_static_function(data):
    """
    Parameters
    ----------
    data : pandas DataFrame
        Data to be analyzed.

    Returns
    -------
    mask : pandas DataFrame
        Mask contains boolean values and is the same size as data.
        True = data passed the quality control test,
        False = data failed the quality control test.

    metadata : pandas DataFrame
        Metadata stores additional information about the test and is
        returned by ``check_custom_static``. Metadata is generally the
        same size as data.
    """
    # User defined custom algorithm
    ...
    return mask, metadata
Data column (default = None, which indicates that all columns are used)
Minimum number of consecutive failures for reporting (default = 1)
Error message (default = None)
Custom static analysis can be run using the following example.
The custom function below, sine_function, determines if sin(data) is greater than 0.5 and returns the value of sin(data) as metadata.
>>> import numpy as np
>>> def sine_function(data):
... # Create metadata and mask using sin(data)
... metadata = np.sin(data)
... mask = metadata > 0.5
... return mask, metadata
>>> metadata = pm.check_custom_static(sine_function)
Custom streaming analysis¶
The streaming analysis loops through each data point using a quality control test that relies on information from “clean data” in a moving window. Input for custom streaming analysis includes:
Custom quality control function with the following general form:
def custom_streaming_function(data_pt, history):
    """
    Parameters
    ----------
    data_pt : pandas Series
        The current data point to be analyzed.

    history : pandas DataFrame
        Historical data used in the analysis. The streaming analysis
        omits data points that were previously flagged as anomalous
        in the history.

    Returns
    -------
    mask : pandas Series
        Mask contains boolean values (one value for each row in data_pt).
        True = data passed the quality control test,
        False = data failed the quality control test.

    metadata : pandas Series
        Metadata stores additional information about the test for the
        current data point. Metadata generally contains one value for
        each row in data_pt. Metadata is collected into a pandas
        DataFrame with one row per time index that was included in the
        quality control test (omits the history window) and is returned
        by ``check_custom_streaming``.
    """
    # User defined custom algorithm
    ...
    return mask, metadata
Size of the moving window used to define the cleaned history.
Indicator used to rebase the history window. If the user defined fraction of the history window has been deemed anomalous, then the history is reset using raw data. The ability to rebase the history is useful if data changes to a new normal condition that would otherwise continue to be flagged as anomalous. (default = None, which indicates that rebase is not used)
Data column (default = None, which indicates that all columns are used)
Error message (default = None)
Custom streaming analysis can be run using the following example.
The custom function below, nearest_neighbor, determines if the current data point is within 3 standard deviations of data in a 10 minute history window. In this case, metadata returns the distance from each column in the current data point to its nearest neighbor in the history. This is similar to the multivariate nearest neighbor algorithm used in CANARY [HMKC07].
>>> import numpy as np
>>> import pandas as pd
>>> from scipy.spatial.distance import cdist
>>> def nearest_neighbor(data_pt, history):
... # Normalize the current data point and history using the history window
... zt = (data_pt - history.mean())/history.std()
... z = (history - history.mean())/history.std()
... # Compute the distance from the current data point to data in the history window
... zt_reshape = zt.to_frame().T
... dist = cdist(zt_reshape, z)
... # Extract the minimum distance
... min_dist = np.nanmin(dist)
...     # Extract the positional index for the min distance and the distance components
...     idx = np.nanargmin(dist)
...     metadata = z.iloc[idx,:] - zt
... # Determine if the min distance is less than 3, assign value (T/F) to the mask
... mask = pd.Series(min_dist <= 3, index=data_pt.index)
... return mask, metadata
>>> metadata = pm.check_custom_streaming(nearest_neighbor, window=600)
Metrics¶
Pecos includes several metrics that describe the quality control analysis or compute quantities that might be of use in the analysis. Many of these metrics aggregate data over time and can be saved to track long term performance and system health.
While Pecos typically runs a series of quality control tests on raw data, quality control tests can also be run on metrics generated from these analyses to track long term performance and system health. For example, daily quality control analysis can generate summary metrics that can later be used to generate a yearly summary report. Pecos includes a performance metrics example (based on one year of PV metrics) in the examples/metrics directory.
Quality control index¶
The quality control index (QCI) is a general metric which indicates the
percent of data points that pass quality control tests.
Duplicate and non-monotonic indexes are not counted as failed tests
(duplicates are removed and non-monotonic indexes are reordered).
A value of 1 indicates that all data passed all tests.
QCI is computed for each column of data.
For example, if the data contains 720 entries and
700 pass all quality control tests, then the QCI is 700/720 = 0.972.
QCI is computed using the qci method. To compute QCI:
>>> QCI = pecos.metrics.qci(pm.mask)
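Conceptually, QCI for a column is the fraction of True values in the boolean mask for that column. A minimal sketch of that computation with hypothetical data (not the Pecos implementation):

```python
import pandas as pd

# Hypothetical mask: True = data point passed all tests
mask = pd.DataFrame({'A': [True] * 700 + [False] * 20,
                     'B': [True] * 720})

# QCI per column = fraction of data points that passed
qci = mask.mean()
# qci['A'] = 700/720 ≈ 0.972, qci['B'] = 1.0
```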
Root mean square error¶
The root mean squared error (RMSE) is used to quantify the difference between two variables.
RMSE is computed for each column of data (note, the column names in the two data sets must be equal).
This metric is often used to compare measured to modeled data.
RMSE is computed using the rmse
method.
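A sketch of a column-wise RMSE computation with hypothetical measured and modeled data (Pecos provides this through the rmse method; this is only the underlying arithmetic):

```python
import numpy as np
import pandas as pd

# Hypothetical measured and modeled data; column names must match
measured = pd.DataFrame({'A': [1.0, 2.0, 3.0]})
modeled = pd.DataFrame({'A': [1.0, 2.5, 2.0]})

# Column-wise RMSE: square root of the mean squared difference
rmse = np.sqrt(((measured - modeled) ** 2).mean())
# rmse['A'] = sqrt((0.0 + 0.25 + 1.0) / 3) ≈ 0.645
```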
Time integral¶
The integral is computed for each column of data using the trapezoidal rule, via the time_integral method.
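A sketch of the trapezoidal rule applied to a hypothetical 15-minute signal (Pecos provides this through the time_integral method; this only illustrates the rule itself):

```python
import numpy as np
import pandas as pd

# Hypothetical 15-minute data
index = pd.date_range('2018-01-01', periods=4, freq='15min')
data = pd.DataFrame({'A': [0.0, 1.0, 1.0, 0.0]}, index=index)

# Elapsed time in seconds for each timestamp
t = np.asarray((index - index[0]).total_seconds())

# Trapezoidal rule per column: sum of interval-average * interval-width
integral = {}
for col in data.columns:
    y = data[col].to_numpy()
    integral[col] = float(((y[1:] + y[:-1]) / 2 * np.diff(t)).sum())
# integral['A'] = 1800.0 (units of data * seconds)
```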
Time derivative¶
The derivative is computed for each column of data using central differences, via the time_derivative method.
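A sketch of a central-difference derivative using numpy.gradient on a hypothetical linear signal (not the Pecos implementation, but the same finite-difference idea):

```python
import numpy as np
import pandas as pd

# Hypothetical 1-minute data that increases by 2.0 each step
index = pd.date_range('2018-01-01', periods=5, freq='min')
data = pd.DataFrame({'A': [0.0, 2.0, 4.0, 6.0, 8.0]}, index=index)

# Elapsed time in seconds for each timestamp
t = np.asarray((index - index[0]).total_seconds())

# Central differences in the interior, one-sided at the endpoints
derivative = pd.DataFrame({col: np.gradient(data[col].to_numpy(), t)
                           for col in data.columns}, index=index)
# Slope is 2.0 per 60 seconds (≈ 0.0333) at every point
```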
Probability of detection and false alarm rate¶
The probability of detection (PD) and false alarm rate (FAR) are used to evaluate how well a quality control test (or set of quality control tests) distinguishes background from anomalous conditions. PD and FAR are related to the number of true negatives, false negatives, false positives, and true positives, as shown in Figure 3. The estimated condition can be computed using results from quality control tests in Pecos; the actual condition must be supplied by the user. If actual conditions are not known, anomalous conditions can be superimposed on the raw data to generate a testing data set. A “good” quality control test (or set of tests) results in a PD close to 1 and a FAR close to 0.
Receiver Operating Characteristic (ROC) curves are used to compare the
effectiveness of different quality control tests, as shown in Figure 4.
To generate a ROC curve, quality control test input parameters (e.g., the upper
bound for a range test) are systematically adjusted.
PD and FAR are computed using the probability_of_detection
and false_alarm_rate
methods.
These metrics are computed for each column of data.
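Given counts of true/false positives and negatives, PD and FAR reduce to simple ratios. The counts below are hypothetical:

```python
# Hypothetical confusion counts from comparing the estimated
# condition (quality control test results) to the actual condition
true_positives = 45    # anomalous data flagged as anomalous
false_negatives = 5    # anomalous data not flagged
false_positives = 10   # normal data flagged as anomalous
true_negatives = 940   # normal data not flagged

# PD = fraction of anomalous data correctly flagged
pd_metric = true_positives / (true_positives + false_negatives)   # 0.9
# FAR = fraction of normal data incorrectly flagged
far = false_positives / (false_positives + true_negatives)        # ~0.011
```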

Relationship between FAR and PD.¶

Example ROC curve.¶
Composite signals¶
Composite signals are defined as data generated from existing data or from models. Composite signals can be used to add modeled data values or relationships between data columns to quality control tests.
Python facilitates a wide range of analysis options that can be incorporated into Pecos using composite signals. For example, composite signals can be created using the following methods available in open source Python packages (e.g., numpy, scipy, pandas, scikit-learn, tensorflow):
Logic/comparison
Interpolation
Filtering
Rolling window statistics
Regression
Classification
Clustering
Machine learning
Pecos can also interface with analysis run outside Python using the Python package subprocess.
Once a composite signal is created, it can be used directly within a quality control test, or compared to existing data and the residual can be used in a quality control test.
In the Simple example, a very simple ‘Wave Model’ composite signal is added to the PerformanceMonitoring object.
>>> clocktime = pecos.utils.datetime_to_clocktime(pm.data.index)
>>> wave_model = pd.DataFrame(np.sin(10*(clocktime/86400)),
... index=pm.data.index, columns=['Wave Model'])
>>> pm.add_dataframe(wave_model)
Results¶
An analysis run using Pecos produces a collection of quality control test results, a quality control mask, cleaned data, and performance metrics. This information can be used to generate HTML/LaTeX reports and dashboards.
Quality control test results¶
When a quality control test fails, information is stored in:
pm.test_results
This DataFrame is updated each time a new quality control test is run. Test results include the following information:
Variable Name: Column name in the DataFrame
Start Time: Start time of the failure
End Time: End time of the failure
Timesteps: The number of consecutive time steps involved in the failure
Error Flag: Error messages include:
Duplicate timestamp
Nonmonotonic timestamp
Missing data (used for missing data and missing timestamp)
Corrupt data
Data < lower bound OR Data > upper bound
Increment < lower bound OR Increment > upper bound
Delta < lower bound OR Delta > upper bound
Outlier < lower bound OR Outlier > upper bound
A subset of quality control test results from the Simple example are shown below.
>>> print(pm.test_results)
Variable Name Start Time End Time Timesteps Error Flag
1 NaN 1/1/2015 5:00 1/1/2015 5:00 1 Missing timestamp
2 NaN 1/1/2015 17:00 1/1/2015 17:00 1 Duplicate timestamp
3 NaN 1/1/2015 19:30 1/1/2015 19:30 1 Nonmonotonic timestamp
4 A 1/1/2015 12:00 1/1/2015 14:30 11 Delta < lower bound, 0.0001
5 B 1/1/2015 6:30 1/1/2015 6:30 1 Data < lower bound, 0
6 B 1/1/2015 15:30 1/1/2015 15:30 1 Data > upper bound, 1
7 C 1/1/2015 7:30 1/1/2015 9:30 9 Corrupt data
Note that variable names are not recorded for timestamp test failures (Test results 1, 2, and 3).
The write_test_results
method is used to write quality control test results to a CSV file.
This method can be customized to write quality control test results to a database or to other file formats.
Quality control mask¶
A boolean mask indicating data that failed a quality control test is stored in:
pm.mask
This DataFrame is updated each time a new quality control test is run. True indicates that the data point passed all tests; False indicates that the data point failed at least one test (or the data is NaN).
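Conceptually, the overall mask is the logical AND of the individual test results; a data point is True only if it passed every test. A minimal sketch with hypothetical test results:

```python
import pandas as pd

# Hypothetical results from two quality control tests on one column
range_test = pd.Series([True, True, False, True])
increment_test = pd.Series([True, False, True, True])

# Overall mask: True only where every test passed
mask = range_test & increment_test
# mask = [True, False, False, True]
```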
Cleaned data¶
Cleaned data set is stored in:
pm.cleaned_data
This DataFrame is updated each time a new quality control test is run. Data that failed a quality control test are replaced by NaN.
Note that Pandas includes several methods to replace NaN using different replacement strategies. Generally, the best data replacement strategy must be defined on a case-by-case basis. Possible strategies include:
Replacing missing data using linear interpolation or other polynomial approximations
Replacing missing data using a rolling mean of the data
Replacing missing data with data from a previous period (previous day, hour, etc.)
Replacing missing data with data from a redundant sensor
Replacing missing data with values from a model
These strategies can be accomplished using the Pandas methods interpolate, replace, and fillna.
See Pandas documentation for more details.
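For example, linear interpolation followed by a forward fill for trailing gaps; the cleaned data below is hypothetical:

```python
import numpy as np
import pandas as pd

# Hypothetical cleaned data with NaN where tests failed
cleaned = pd.Series([1.0, np.nan, 3.0, np.nan, np.nan],
                    index=pd.date_range('2018-01-01', periods=5, freq='h'))

# Linear interpolation between valid points,
# then forward fill for any trailing gaps
filled = cleaned.interpolate(method='linear').ffill()
# filled values: [1.0, 2.0, 3.0, 3.0, 3.0]
```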
Metrics¶
The write_metrics
method is used to write metrics that describe the quality control analysis (i.e. QCI) to a CSV file.
This method can be customized to write performance metrics to a database or to other file formats.
The method can be called multiple times to append metrics based on the timestamp of the DataFrame.
>>> print(metrics_day1)
QCI RMSE
2018-01-01 0.871 0.952
>>> print(metrics_day2)
QCI RMSE
2018-01-02 0.755 0.845
>>> pecos.io.write_metrics(metrics_day1, 'metrics_file.csv')
>>> pecos.io.write_metrics(metrics_day2, 'metrics_file.csv')
The metrics_file.csv file will contain:
QCI RMSE
2018-01-01 0.871 0.952
2018-01-02 0.755 0.845
Monitoring reports¶
The write_monitoring_report
method is used to generate an HTML or LaTeX formatted monitoring report.
The monitoring report includes the start and end time for the data that was analyzed, custom graphics
and performance metrics, a table that includes test results, graphics associated
with the test results (highlighting data points that failed a quality control test),
notes on runtime errors and warnings, and the configuration options
used in the analysis.
Custom Graphics: Custom graphics are created by the user for their specific application. Custom graphics can also be generated using methods in the graphics module. These graphics are included at the top of the report.
Performance Metrics: Performance metrics are displayed in a table.
Test Results: Test results contain information stored in pm.test_results, followed by graphics that display the data point(s) that caused the failure. Test results graphics are generated using the plot_test_results method.
Notes: Notes include Pecos runtime errors and warnings, for example:
Empty/missing data
Formatting error in the translation dictionary
Insufficient data for a specific quality control test
Insufficient data or error when evaluating string
Configuration Options: Configuration options used in the analysis.
Figure 5 shows the monitoring report from the Simple example.

Example monitoring report.¶
Dashboards¶
To compare quality control analysis across several systems, key graphics and metrics
can be gathered in a dashboard view.
For example, the dashboard can contain multiple rows (one for each system) and multiple columns (one for each location).
The dashboard can be linked to monitoring reports and interactive graphics for more detailed information.
The write_dashboard method is used to generate an HTML dashboard.
For each row and column in the dashboard, the following information can be specified:
Text (i.e. general information about the system/location)
Graphics (i.e. a list of custom graphics)
Table (i.e. a Pandas DataFrame with performance metrics)
Links (i.e. the path to a monitoring report or other file/site for additional information)
The user defined text, graphics, tables, and links create custom dashboards. Pecos includes dashboard examples in the examples/dashboard directory. Figure 6, Figure 7, and Figure 8 show example dashboards generated using Pecos.

Example dashboard 1.¶

Example dashboard 2.¶

Example dashboard 3.¶
Graphics¶
The graphics
module contains several methods to plot time series data, scatter plots, heatmaps,
and interactive graphics. These methods can be used to generate graphics that are included in
monitoring reports and dashboards, or to generate stand alone graphics. The following figures
illustrate graphics created using the methods included in Pecos.
Note that many other graphing options are available using Python graphing packages directly.
Test results graphics, generated using plot_test_results
, include
time series data along with a shaded time filter and quality control test results.
The following figure shows inverter efficiency over the course of 1 day.
The gray region indicates times when sun elevation is < 20 degrees.
This region is eliminated from quality control tests. Green marks identify data points
that were flagged as changing abruptly; red marks identify data points that were outside the expected range.
These graphics can be included in Monitoring reports.

Example test results graphic.¶
Day-of-year vs. time-of-day heatmaps, generated using plot_doy_heatmap, help identify missing data and trends, and help define filters and quality control test thresholds when working with large data sets.
The following figure shows irradiance over a year with the time of sunrise and sunset for each day.
The white vertical line indicates one day of missing data.
The method plot_heatmap creates a simple heatmap.
These plots can be included as custom graphics in Monitoring reports and Dashboards.

Example day-of-year vs. time of day heatmap.¶
Interactive graphics, generated using plot_interactive_timeseries, are HTML graphic files which the user can scale and hover over to visualize data.
The following figure shows an image of an interactive graphic. Many more options are available,
see https://plot.ly for more details.
Interactive graphics can be linked to Dashboards.

Example interactive graphic using plotly.¶
Automation¶
Task scheduler¶
To run Pecos on an automated schedule, create a task using your operating system's task scheduler. On Windows, open the Control Panel and search for Schedule Tasks. On Linux and OSX, use the cron utility.
Tasks are defined by a trigger and an action. The trigger indicates when the task should be run (e.g., daily at 1:00 pm). The action can be set to run a batch file. A batch file (.bat or .cmd filename extension) can be easily written to start a Python script which runs Pecos. For example, the following batch file runs driver.py:
cd your_working_directory
C:\Users\username\Anaconda3\python.exe driver.py
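On Linux and OSX, an equivalent crontab entry might look like the following; the directory and interpreter paths are placeholders:

```shell
# Run driver.py daily at 1:00 pm; adjust paths for your system
0 13 * * * cd /path/to/your_working_directory && /usr/bin/env python driver.py
```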
Continuous analysis¶
The following example illustrates a framework that analyzes continuous streaming data and provides reports. For continuous data streams, it is often advantageous to provide quality control analysis and reports at a regular interval. While the analysis and reporting can occur every time new data is available, it is often more informative and more efficient to run analysis and create reports that cover a longer time interval. For example, data might be collected every minute and quality control analysis might be run every day.
The following example pulls data from an SQL database that includes a table of raw data (data), table of data that has completed quality control analysis (qc_data), and a table that stores a summary of quality control test failures (qc_summary). After the analysis, quality control results are appended to the database. This process could also include metrics that describe the quality control results. The following code could be used as a Python driver that runs using a task scheduler every day, pulling in yesterday’s data. In this example, 1 hour of cleaned data is used to initialize the moving window and a streaming outlier test is run.
>>> import pandas as pd
>>> from sqlalchemy import create_engine
>>> import datetime
>>> import pecos
>>> # Create the SQLite engine
>>> engine = create_engine('sqlite:///monitor.db', echo=False)
>>> # Define the date to extract yesterday's data
>>> date = datetime.date.today()-datetime.timedelta(days=1)
>>> # Load data and recent history from the database
>>> data = pd.read_sql("SELECT * FROM data WHERE timestamp BETWEEN '" + str(date) + \
... " 00:00:00' AND '" + str(date) + " 23:59:59';" , engine,
... parse_dates='timestamp', index_col='timestamp')
>>> history = pd.read_sql("SELECT * FROM qc_data WHERE timestamp BETWEEN '" + \
... str(date-datetime.timedelta(days=1)) + " 23:00:00' AND '" + \
... str(date-datetime.timedelta(days=1)) + " 23:59:59';" , engine,
... parse_dates='timestamp', index_col='timestamp')
>>> # Setup the PerformanceMonitoring with data and history and run a streaming outlier test
>>> pm = pecos.monitoring.PerformanceMonitoring()
>>> pm.add_dataframe(data)
>>> pm.add_dataframe(history)
>>> pm.check_outlier([-3, 3], window=3600, streaming=True)
>>> # Save the cleaned data and test results to the database
>>> pm.cleaned_data.to_sql('qc_data', engine, if_exists='append')
>>> pm.test_results.to_sql('qc_summary', engine, if_exists='append')
>>> # Create a monitoring report with test results and graphics
>>> test_results_graphics = pecos.graphics.plot_test_results(data, pm.test_results)
>>> filename = pecos.io.write_monitoring_report(pm.data, pm.test_results, test_results_graphics,
... filename='monitoring_report_'+str(date)+'.html')
Configuration file¶
A configuration file can be used to store information about the system, data, and quality control tests. The configuration file is not used directly within Pecos; therefore, there are no specific formatting requirements. Configuration files can be useful when using the same Python script to analyze several systems that have slightly different input requirements.
The examples/simple directory includes a configuration file, simple_config.yml, that defines system specifications, translation dictionary, composite signals, corrupt values, and bounds for range and increment tests. The script, simple_example_using_config.py uses this configuration file to run the simple example.
Specifications:
Frequency: 900
Multiplier: 10
Translation:
Wave: [C,D]
Composite Signals:
- Wave Model: "np.sin({Multiplier}*{ELAPSED_TIME}/86400)"
- Wave Error: "np.abs(np.subtract({Wave}, {Wave Model}))"
Time Filter: "({CLOCK_TIME} > 3*3600) & ({CLOCK_TIME} < 21*3600)"
Corrupt: [-999]
Range:
B: [0, 1]
Wave: [-1, 1]
Wave Error: [None, 0.25]
Delta:
A: [0.0001, None]
B: [0.0001, None]
Wave: [0.0001, None]
Increment:
Wave: [None, 0.6]
For some use cases, it is convenient to use strings of Python code in
a configuration file to define time filters,
quality control bounds, and composite signals.
These strings can be evaluated using the evaluate_string method.
WARNING: this function calls eval. Strings of Python code should be thoroughly tested by the user.
For each {keyword} in the string, {keyword} is expanded in the following order:
If keyword is ELAPSED_TIME, CLOCK_TIME or EPOCH_TIME, then data.index, converted to seconds (elapsed time, clock time, or epoch time), is used in the evaluation
If keyword is used to select a column (or columns) of data, then data[keyword] is used in the evaluation
If a translation dictionary is used to select a column (or columns) of data, then data[trans[keyword]] is used in the evaluation
If the keyword is a key in a dictionary of constants (specs), then specs[keyword] is used in the evaluation
For example, the time filter string is evaluated below.
>>> string_to_eval = "({CLOCK_TIME} > 3*3600) & ({CLOCK_TIME} < 21*3600)"
>>> time_filter = pecos.utils.evaluate_string(string_to_eval, df)
Data acquisition¶
Pecos includes basic data acquisition methods to transfer data from sensors to an SQL database. These methods require the Python packages sqlalchemy (https://www.sqlalchemy.org/) and minimalmodbus (https://minimalmodbus.readthedocs.io).
The device_to_client
method collects data from a modbus device and stores it in a local
MySQL database.
The method requires several configuration options, which are stored as a nested dictionary.
pyyaml can be used to store configuration options in a file.
The options are stored in a Client block and a Devices block.
The Devices block can define multiple devices and each device can have multiple data streams.
The configuration options are described below.
Client: A dictionary that contains information about the client. The dictionary has the following keys:
IP: IP address (string)
Database: name of database (string)
Table: name of table (string)
Username: name of user (string)
Password: password for user (string)
Interval: data collection frequency in seconds (integer)
Retries: number of retries for each channel (integer)
Devices: A list of dictionaries that contain information about each device (one dictionary per device). Each dictionary has the following keys:
Name: modbus device name (string)
USB: serial connection (string), e.g. /dev/ttyUSB0 on Linux
Address: modbus slave address (string)
Baud: data transfer rate in bits per second (integer)
Parity: parity of transmitted data for error checking (string). Possible values: N, E, O
Bytes: number of data bits (integer)
Stopbits: number of stop bits (integer)
Timeout: read timeout value in seconds (integer)
Data: A list of dictionaries that contain information about each data stream (one dictionary per data stream). Each dictionary has the following keys:
Name: data name (string)
Type: data type (string)
Scale: scaling factor (integer)
Conversion: conversion factor (float)
Channel: register number (integer)
Signed: define data as unsigned or signed (bool)
Fcode: modbus function code (integer). Possible values: 3,4
Example configuration options are shown below.
Client:
IP: 127.0.0.1
Database: db_name
Table: table_name
Username: username
Password: password
Interval: 1
Retries: 2
Devices:
- Name: Device1
USB: /dev/ttyUSB0
Address: 21
Baud: 9600
Parity: N
Bytes: 8
Stopbits: 1
Timeout: 0.05
Data:
- Name: AmbientTemp
Type: Temp
Scale: 1
Conversion: 1.0
Channel: 0
Signed: True
Fcode: 4
- Name: DC Power
Type: Power
Scale: 1
Conversion: 1.0
Channel: 1
Signed: True
Fcode: 4
Custom applications¶
While Pecos was initially developed to monitor photovoltaic systems, it is designed to be used for a wide range of applications. The ability to run the analysis within the Python environment enables the use of diverse analysis options that can be incorporated into Pecos, including application specific models. The software has been used to monitor energy systems in support of several Department of Energy projects, as described below.
Photovoltaic systems¶
Pecos was originally developed at Sandia National Laboratories in 2016 to monitor photovoltaic (PV) systems as part of the
Department of Energy Regional Test Centers.
Pecos is used to run daily analysis on data collected at several sites across the US.
For PV systems, the translation dictionary can be used to group data
according to the system architecture, which can include multiple strings and modules.
The time filter can be defined based on sun position and system location.
The data objects used in Pecos are compatible with PVLIB, which can be used to model PV
systems [SHFH16].
Pecos also includes functions to compute PV specific metrics (i.e. insolation,
performance ratio, clearness index) in the pv
module.
The International Electrotechnical Commission (IEC) has developed guidance to measure
and analyze energy production from PV systems.
Klise et al. [KlSC17] describe an application of IEC 61724-3, using
Pecos and PVLIB.
Pecos includes a PV system example in the examples/pv directory.
Marine renewable energy systems¶
In partnership with the National Renewable Energy Laboratory (NREL) and Pacific Northwest National Laboratory (PNNL), Pecos was integrated into the Marine and Hydrokinetic Toolkit (MHKiT) to support research funded by the Department of Energy’s Water Power Technologies Office. MHKiT provides the marine renewable energy (MRE) community with tools for data quality control, resource assessment, and device performance that adhere to standards developed by the International Electrotechnical Commission (IEC) Technical Committee 114 (IEC TC 114). Pecos provides quality control analysis of data collected from MRE systems, including wave, tidal, and river systems.
Fossil energy systems¶
In partnership with National Energy Technology Laboratory (NETL), Pecos was extended to demonstrate real-time monitoring of coal-fired power plants in support of the Department of Energy’s Institute for the Design of Advanced Energy Systems (IDAES). As part of this demonstration, streaming algorithms were added to Pecos to facilitate near real-time analysis using continuous data streams.
Copyright and license¶
Pecos is copyright through Sandia National Laboratories. The software is distributed under the Revised BSD License. Pecos also leverages a variety of third-party software packages, which have separate licensing policies.
Copyright¶
Copyright 2016 National Technology & Engineering Solutions of Sandia,
LLC (NTESS). Under the terms of Contract DE-NA0003525 with NTESS, the U.S.
Government retains certain rights in this software.
Revised BSD license¶
Redistribution and use in source and binary forms, with or without
modification, are permitted provided that the following conditions
are met:
* Redistributions of source code must retain the above copyright notice, this
list of conditions and the following disclaimer.
* Redistributions in binary form must reproduce the above copyright notice,
this list of conditions and the following disclaimer in the documentation
and/or other materials provided with the distribution.
* Neither the name of Sandia National Laboratories, nor the names of
its contributors may be used to endorse or promote products derived from
this software without specific prior written permission.
THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS
"AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT
LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR
A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT
OWNER OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL,
SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT LIMITED
TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, DATA, OR
PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY THEORY OF
LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT (INCLUDING
NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE OF THIS
SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
Release Notes¶
v0.2.1 (main)¶
Bug fix in custom static and streaming quality control tests to use a specific column
Minor updates for testing and documentation
Added GitHub Actions and Python 3.9 tests
v0.2.0 (March 5, 2021)¶
Replaced the use of Excel files in examples/tests with CSV files. The Excel files were causing test failures.
Added min_failures to the streaming outlier test
Replaced mutable default arguments with None
Removed pecos logo from monitoring reports
Added timestamp to logger
Updated documentation and tests
v0.1.9 (November 2, 2020)¶
Added the ability to use custom quality control test functions in static or streaming analysis. The methods check_custom_static and check_custom_streaming allow the user to supply a custom function that is used to determine if data is anomalous. The custom tests also allow the user to return metadata that contains information about the quality control test.
The streaming analysis loops through the data using a moving window to determine if each data point is normal or anomalous. If a data point is deemed anomalous, it is omitted from the history and not used to determine the status of subsequent data points.
The static analysis operates on the entire data set, and while it can include operations like moving windows, it does not update the history based on the test results.
The following input arguments were changed or added:
In check_outlier, the input argument window was changed to None (not used), absolute value was changed to False, and an input argument streaming was added to use streaming analysis (default value is False). Changed the order of key and window to be more consistent with other quality control tests.
In check_delta, the input argument window is no longer optional.
Added property data to the PerformanceMonitoring class. pm.data is equivalent to pm.df (pm.df was retained for backward compatibility).
Added the ability to create monitoring reports using a LaTeX template. Small changes in the HTML template.
Added the option to specify a date format string to timeseries plots.
Fixed a bug in the way masks are generated. Data points that have Null values were being assigned to False, indicating that a quality control test failed. Null values are now assumed to be True, unless a specific test fails (e.g. check_missing).
Updated the boolean mask used in the code to have a consistent definition (True = data point pass all tests, False = data point did not pass at least one test.)
Added an example in the docs to illustrate analysis of continuous data
Added Python 3.8 tests
Updated documentation and tests
v0.1.8 (January 9, 2020)¶
Added properties to the PerformanceMonitoring object to return the following:
Boolean mask, pm.mask. Indicates data that failed a quality control test. This replaces the method ``get_test_results_mask`` (API change).
Cleaned data, pm.cleaned_data. Data that failed a quality control test are replaced by NaN.
Added the ability to run quality control tests as individual functions. These functions allow the user to use Pecos without creating a PerformanceMonitoring object. Each function returns cleaned data, a boolean mask, and a summary of quality control test results.
io and graphics functions were updated to use specific components of the PerformanceMonitoring class (instead of requiring an instance of the class). This changes the API for write_monitoring_report, write_dashboard, and plot_test_results.
Filenames are now an optional parameter in io and graphics functions; this changes the API for write_metrics, write_test_results, and plot_test_results.
Updated metrics:
Added time_derivative, which returns a derivative time series for each column of data
qci, rmse, time_integral, probability_of_detection, and false_alarm_rate now return 1 value per column of data (API change)
pv metrics were also updated to return 1 value per column (API change)
Deprecated per_day option. Data can be grouped by custom time intervals before computing metrics (API change)
Efficiency improvements to check_delta. As part of these changes, the optional input argument absolute_value has been removed and direction has been added (API change). If direction is set to positive, then the test only identifies positive deltas (the min occurs before the max). If direction is set to negative, then the test only identifies negative deltas (the max occurs before the min).
Timestamp indexes down to millisecond resolution are supported.
Added additional helper functions in pecos.utils to convert to/from datetime indexes. Methods get_elapsed_time and get_clock_time were removed from the PerformanceMonitoring class (API change).
Moved functionality to evaluate strings from the PerformanceMonitoring class into a stand alone utility function (API change).
Removed option to smooth data using a rolling mean within the quality control tests (API change). Preprocessing steps should be done before the quality control test is run.
Added Python 3.7 tests, dropped Python 2.7 and 3.5 tests
Updated examples, tests, and documentation
v0.1.7 (June 2, 2018)¶
Added quality control test to identify outliers.
Bug fix to allow for sub-second data frequency.
Dropped ‘System Name’ from the analysis and test results; this field added assumptions about column names in the code.
Changed ‘Start Date’ and ‘End Date’ to ‘Start Time’ and ‘End Time’ in the test results.
New data added to a PerformanceMonitoring object using add_dataframe now overrides existing data if there are shared indexes and columns.
Removed add_signal method, use add_dataframe instead.
Adding a translation dictionary to the analysis is now optional. A 1:1 map of column names is generated when data is added to the PerformanceMonitoring object using add_dataframe.
Added Python 3.6 tests.
Removed Python 3.4 tests (Pandas dropped support for Python 3.4 with version 0.21.0).
Updates to check_range require Pandas 0.23.0.
Updated documentation, added doctests.
v0.1.6 (August 14, 2017)¶
Added readme and license file to manifest to fix pip install
v0.1.5 (June 23, 2017)¶
Added ability to check for regular or irregular timestamps in check_timestamp.
Added probability of detection and false alarm metrics.
Added check_delta method to check bounds on the difference between max and min data values within a rolling window.
Added graphics method to create interactive graphics using plotly.
Added graphics method to create day-of-year heatmaps.
Method named plot_colorblock changed to plot_heatmap (API change).
Added data acquisition method to transfer data from sensors to an SQL database.
Added dashboard example that uses Pandas Styling to color code tables based on values.
Added graphics tests.
Updated documentation.
v0.1.4 (December 15, 2016)¶
Some of the changes in this release are not backward compatible:
Added capability to allow multiple html links in dashboards (API change).
Updated send_email function to use smtplib (API change).
Added additional options in html reports to set figure size and image width.
Bug fix setting axis limits in figures.
Bug fix for reporting duplicate time steps.
Improved efficiency for get_clock_time function.
Added dashboard example that uses color blocks to indicate number of test failures.
Removed basic_pvlib_performance_model, the pv_example now uses pvlib directly to compute a basic model (API change).
v0.1.3 (August 2, 2016)¶
This is a minor release, changes include:
Bug fix for DataFrames using timezones. There was an issue retaining the timezone across the entire Pecos analysis chain: the timezone was not stored properly in the testing results. This is a known pandas bug. The fix in Pecos includes stronger tests for analyses that use timezones.
The use of Jinja for html report templates
Cleaned up examples
v0.1.2 (June 6, 2016)¶
This is a minor release, changes include:
Minor changes to the software to support Python 3.4 and 3.5
Default image format changed from jpg to png
Datatables format options added to dashboards
Additional testing
v0.1.1 (May 6, 2016)¶
This is a minor release, changes include:
Added a pv module, includes basic methods to compute energy, insolation, performance ratio, performance index, energy yield, clearness index, and a basic pv performance model.
Added method to compute time integral and RMSE to metrics module
Cleaned up examples, API, and documentation
Software test harness run through TravisCI and analyzed using Coveralls
Documentation hosted on readthedocs
v0.1.0 (March 31, 2016)¶
This is the first official release of Pecos. Features include:
PerformanceMonitoring class used to run quality control tests and store results. The class includes the ability to add time filters and translation dictionaries. Quality control tests include checks for timestamp, missing and corrupt data, data out of range, and increment data out of range.
Quality control index used to quantify test failures
HTML report templates for monitoring reports and dashboards
Graphics capabilities
Basic tutorials
Preliminary software test harness, run using nosetests
Basic user manual including API documentation
Developers¶
The following services are used for software quality assurance:
The software repository is hosted on GitHub at https://github.com/sandialabs/pecos.
Automated testing is run using GitHub Actions at https://github.com/sandialabs/pecos/actions.
Test coverage statistics are collected using Coveralls at https://coveralls.io/github/sandialabs/pecos.
The current release is hosted on PyPI at https://pypi.python.org/pypi/pecos.
Tests can be run locally using nosetests:
nosetests -v --with-coverage --cover-package=pecos pecos
Software developers are expected to follow standard practices to document and test new code. Pull requests will be reviewed by the core development team. See https://github.com/sandialabs/pecos/graphs/contributors for a list of contributors.
pecos package¶
Submodules¶
pecos.monitoring module¶
The monitoring module contains the PerformanceMonitoring class used to run quality control tests and store results. The module also contains individual functions that can be used to run quality control tests.
- class pecos.monitoring.PerformanceMonitoring[source]¶
Bases:
object
PerformanceMonitoring class
- property data¶
Data used in quality control analysis, added to the PerformanceMonitoring object using
add_dataframe
.
- property mask¶
Boolean mask indicating whether data points failed a quality control test. True = data point passed all tests, False = data point did not pass at least one test.
- property cleaned_data¶
Cleaned data set, data that failed a quality control test are replaced by NaN.
- add_dataframe(data)[source]¶
Add data to the PerformanceMonitoring object
- Parameters
data (pandas DataFrame) – Data to add to the PerformanceMonitoring object, indexed by datetime
- add_translation_dictionary(trans)[source]¶
Add translation dictionary to the PerformanceMonitoring object
- Parameters
trans (dictionary) – Translation dictionary
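As a minimal illustration of how a translation dictionary maps a common name to one or more data columns, the following plain-pandas sketch (with hypothetical column names and data, not the Pecos implementation) shows the grouping:

```python
import pandas as pd

# Hypothetical raw data with instrument-specific column names
data = pd.DataFrame({'WS_01': [3.2, 3.4], 'WS_02': [3.1, 3.5]},
                    index=pd.date_range('2024-01-01', periods=2, freq='D'))

# A translation dictionary maps a common name to a list of data columns,
# so quality control tests can refer to the group by its common name
trans = {'Wind Speed': ['WS_01', 'WS_02']}

# Selecting the grouped columns through the dictionary
wind = data[trans['Wind Speed']]
print(list(wind.columns))  # ['WS_01', 'WS_02']
```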
- add_time_filter(time_filter)[source]¶
Add a time filter to the PerformanceMonitoring object
- Parameters
time_filter (pandas DataFrame with a single column or pandas Series) – Time filter containing boolean values for each time index True = keep time index in the quality control results. False = remove time index from the quality control results.
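A time filter of this shape can be built directly from the data's index. The sketch below (hypothetical daylight-only filter, not part of Pecos) constructs a boolean Series where True keeps the timestamp in the quality control results:

```python
import pandas as pd

index = pd.date_range('2024-01-01', periods=24, freq='60min')
data = pd.DataFrame({'power': range(24)}, index=index)

# True = keep the timestamp in the QC results; here, keep 08:00-17:00
time_filter = pd.Series((index.hour >= 8) & (index.hour < 18), index=index)

print(int(time_filter.sum()))  # 10 timestamps kept
```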
- check_timestamp(frequency, expected_start_time=None, expected_end_time=None, min_failures=1, exact_times=True)[source]¶
Check time series for missing, non-monotonic and duplicate timestamps
- Parameters
frequency (int or float) – Expected time series frequency, in seconds
expected_start_time (Timestamp, optional) – Expected start time. If not specified, the minimum timestamp is used
expected_end_time (Timestamp, optional) – Expected end time. If not specified, the maximum timestamp is used
min_failures (int, optional) – Minimum number of consecutive failures required for reporting, default = 1
exact_times (bool, optional) – Controls how missing times are checked. If True, times are expected to occur at regular intervals (specified in frequency) and the DataFrame is reindexed to match the expected frequency. If False, times only need to occur once or more within each interval (specified in frequency) and the DataFrame is not reindexed.
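The missing, duplicate, and non-monotonic checks described above can be sketched in plain pandas (illustrative only, not the Pecos implementation):

```python
import pandas as pd

# Irregular index: 03:00 is missing and 01:00 is duplicated
index = pd.to_datetime(['2024-01-01 00:00', '2024-01-01 01:00',
                        '2024-01-01 01:00', '2024-01-01 02:00',
                        '2024-01-01 04:00'])
data = pd.DataFrame({'x': [1, 2, 2, 3, 5]}, index=index)

frequency = 3600  # expected spacing in seconds
expected = pd.date_range(data.index.min(), data.index.max(),
                         freq=pd.Timedelta(seconds=frequency))

missing = expected.difference(data.index)       # expected but absent
duplicates = data.index[data.index.duplicated()]  # repeated timestamps
monotonic = data.index.is_monotonic_increasing

print(list(missing.strftime('%H:%M')))     # ['03:00']
print(list(duplicates.strftime('%H:%M')))  # ['01:00']
```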
- check_range(bound, key=None, min_failures=1)[source]¶
Check for data that is outside expected range
- Parameters
bound (list of floats) – [lower bound, upper bound], None can be used in place of a lower or upper bound
key (string, optional) – Data column name or translation dictionary key. If not specified, all columns are used in the test.
min_failures (int, optional) – Minimum number of consecutive failures required for reporting, default = 1
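The bound convention used above (a two-element list where None disables one side) can be sketched in plain pandas (hypothetical data, not the Pecos implementation):

```python
import pandas as pd

data = pd.DataFrame({'temp': [21.0, 22.5, 95.0, 23.1]})
bound = [0, 50]  # [lower bound, upper bound]; None would disable one side

mask = pd.Series(True, index=data.index)
if bound[0] is not None:
    mask &= data['temp'] >= bound[0]
if bound[1] is not None:
    mask &= data['temp'] <= bound[1]

cleaned = data['temp'].where(mask)  # failed points replaced by NaN
print(mask.tolist())  # [True, True, False, True]
```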
- check_increment(bound, key=None, increment=1, absolute_value=True, min_failures=1)[source]¶
Check data increments using the difference between values
- Parameters
bound (list of floats) – [lower bound, upper bound], None can be used in place of a lower or upper bound
key (string, optional) – Data column name or translation dictionary key. If not specified, all columns are used in the test.
increment (int, optional) – Time step shift used to compute difference, default = 1
absolute_value (boolean, optional) – Use the absolute value of the increment data, default = True
min_failures (int, optional) – Minimum number of consecutive failures required for reporting, default = 1
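The increment test described above amounts to bounding the (optionally absolute) difference between shifted values; a plain-pandas sketch with hypothetical data, not the Pecos implementation:

```python
import pandas as pd

data = pd.DataFrame({'x': [10.0, 10.1, 10.2, 14.9, 15.0]})
bound = [None, 1.0]  # flag steps larger than 1.0
increment = 1        # time step shift used to compute the difference

diff = data['x'].diff(increment).abs()  # absolute_value=True behaviour
failures = diff > bound[1]
print(failures.tolist())  # [False, False, False, True, False]
```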
- check_delta(bound, window, key=None, direction=None, min_failures=1)[source]¶
Check for stagnant data and/or abrupt changes in the data using the difference between max and min values (delta) within a rolling window
- Parameters
bound (list of floats) – [lower bound, upper bound], None can be used in place of a lower or upper bound
window (int or float) – Size of the rolling window (in seconds) used to compute delta
key (string, optional) – Data column name or translation dictionary key. If not specified, all columns are used in the test.
direction (str, optional) –
Options = ‘positive’, ‘negative’, or None
If direction is positive, then only identify positive deltas (the min occurs before the max)
If direction is negative, then only identify negative deltas (the max occurs before the min)
If direction is None, then identify both positive and negative deltas
min_failures (int, optional) – Minimum number of consecutive failures required for reporting, default = 1
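The delta computation described above (max minus min within a time-based rolling window, with direction=None) can be sketched in plain pandas with hypothetical data; this is illustrative, not the Pecos implementation:

```python
import pandas as pd

index = pd.date_range('2024-01-01', periods=6, freq='60min')
data = pd.Series([5.0, 5.0, 5.0, 5.0, 9.0, 9.0], index=index)

window = 3 * 3600    # rolling window size in seconds
bound = [0.5, None]  # a lower bound on delta flags stagnant data

# delta = max - min within each time-based rolling window
rolling = data.rolling(pd.Timedelta(seconds=window))
delta = rolling.max() - rolling.min()

stagnant = delta < bound[0]
print(stagnant.tolist())  # flat windows flagged, windows spanning the jump not
```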
- check_outlier(bound, window=None, key=None, absolute_value=False, streaming=False, min_failures=1)[source]¶
Check for outliers using normalized data within a rolling window
The upper and lower bounds are specified in standard deviations. Data is normalized using (data-mean)/std.
- Parameters
bound (list of floats) – [lower bound, upper bound], None can be used in place of a lower or upper bound
window (int or float, optional) – Size of the rolling window (in seconds) used to normalize data. If window is set to None, data is normalized using the entire data set's mean and standard deviation (column by column), default = None.
key (string, optional) – Data column name or translation dictionary key. If not specified, all columns are used in the test.
absolute_value (boolean, optional) – Use the absolute value of the normalized data, default = False
streaming (boolean, optional) – Indicates if streaming analysis should be used, default = False
min_failures (int, optional) – Minimum number of consecutive failures required for reporting, default = 1
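With window=None, the normalization described above uses the full data set's mean and standard deviation; a plain-pandas sketch with hypothetical data (not the Pecos implementation):

```python
import pandas as pd

data = pd.Series([10.0, 10.2, 9.8, 10.1, 25.0, 9.9, 10.0, 10.3])
bound = [None, 2]  # upper bound, in standard deviations

# window=None behaviour: normalize using the entire series
normalized = (data - data.mean()) / data.std()
outliers = normalized.abs() > bound[1]  # absolute_value=True behaviour
print(outliers.tolist())
```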
- check_missing(key=None, min_failures=1)[source]¶
Check for missing data
- Parameters
key (string, optional) – Data column name or translation dictionary key. If not specified, all columns are used in the test.
min_failures (int, optional) – Minimum number of consecutive failures required for reporting, default = 1
- check_corrupt(corrupt_values, key=None, min_failures=1)[source]¶
Check for corrupt data
- Parameters
corrupt_values (list of int or floats) – List of corrupt data values
key (string, optional) – Data column name or translation dictionary key. If not specified, all columns are used in the test.
min_failures (int, optional) – Minimum number of consecutive failures required for reporting, default = 1
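The corrupt-data check described above amounts to flagging values that appear in a list of known bad values; a plain-pandas sketch with a hypothetical sentinel value, not the Pecos implementation:

```python
import pandas as pd

data = pd.DataFrame({'sensor': [4.1, -999, 4.3, -999, 4.2]})
corrupt_values = [-999]

mask = ~data['sensor'].isin(corrupt_values)  # False = corrupt
cleaned = data['sensor'].where(mask)         # corrupt values become NaN
print(mask.tolist())  # [True, False, True, False, True]
```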
- check_custom_static(quality_control_func, key=None, min_failures=1, error_message=None)[source]¶
Use custom functions that operate on the entire dataset at once to perform quality control analysis
- Parameters
quality_control_func (function) – Function that operates on self.df and returns a mask and metadata
key (string, optional) – Data column name or translation dictionary key. If not specified, all columns are used in the test.
min_failures (int, optional) – Minimum number of consecutive failures required for reporting, default = 1
error_message (str, optional) – Error message
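A custom quality control function of the shape described above returns a boolean mask and metadata; the sketch below (a hypothetical positivity test, not from Pecos) shows that shape:

```python
import pandas as pd

def qc_positive(df):
    """Hypothetical custom QC function: flag non-positive values.
    Returns (mask, metadata), matching the shape described above."""
    mask = df > 0        # True = data point passes the test
    metadata = df.min()  # e.g. intermediate values used by the test
    return mask, metadata

data = pd.DataFrame({'a': [1.0, -2.0, 3.0]})
mask, metadata = qc_positive(data)
print(mask['a'].tolist())  # [True, False, True]
```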
- check_custom_streaming(quality_control_func, window, key=None, rebase=None, min_failures=1, error_message=None)[source]¶
Check for anomalous data using a streaming framework which removes anomalous data from the history after each timestamp. A custom quality control function is supplied by the user to determine if the data is anomalous.
- Parameters
quality_control_func (function) – Function that determines if the last data point is normal or anomalous. Returns a mask and metadata for the last data point.
window (int or float) – Size of the rolling window (in seconds) used to define the history. If window is set to None, data is normalized using the entire data set's mean and standard deviation (column by column).
key (string, optional) – Data column name or translation dictionary key. If not specified, all columns are used in the test.
rebase (int, float, or None) – Value between 0 and 1 that indicates the fraction of default = None.
min_failures (int, optional) – Minimum number of consecutive failures required for reporting, default = 1
error_message (str, optional) – Error message
- pecos.monitoring.check_timestamp(data, frequency, expected_start_time=None, expected_end_time=None, min_failures=1, exact_times=True)[source]¶
Check time series for missing, non-monotonic and duplicate timestamps
- Parameters
data (pandas DataFrame) – Data used in the quality control test, indexed by datetime
frequency (int or float) – Expected time series frequency, in seconds
expected_start_time (Timestamp, optional) – Expected start time. If not specified, the minimum timestamp is used
expected_end_time (Timestamp, optional) – Expected end time. If not specified, the maximum timestamp is used
min_failures (int, optional) – Minimum number of consecutive failures required for reporting, default = 1
exact_times (bool, optional) – Controls how missing times are checked. If True, times are expected to occur at regular intervals (specified in frequency) and the DataFrame is reindexed to match the expected frequency. If False, times only need to occur once or more within each interval (specified in frequency) and the DataFrame is not reindexed.
- Returns
dictionary – Results include cleaned data, mask, and test results summary
- pecos.monitoring.check_range(data, bound, key=None, min_failures=1)[source]¶
Check for data that is outside expected range
- Parameters
data (pandas DataFrame) – Data used in the quality control test, indexed by datetime
bound (list of floats) – [lower bound, upper bound], None can be used in place of a lower or upper bound
key (string, optional) – Data column name or translation dictionary key. If not specified, all columns are used in the test.
min_failures (int, optional) – Minimum number of consecutive failures required for reporting, default = 1
- Returns
dictionary – Results include cleaned data, mask, and test results summary
- pecos.monitoring.check_increment(data, bound, key=None, increment=1, absolute_value=True, min_failures=1)[source]¶
Check data increments using the difference between values
- Parameters
data (pandas DataFrame) – Data used in the quality control test, indexed by datetime
bound (list of floats) – [lower bound, upper bound], None can be used in place of a lower or upper bound
key (string, optional) – Data column name or translation dictionary key. If not specified, all columns are used in the test.
increment (int, optional) – Time step shift used to compute difference, default = 1
absolute_value (boolean, optional) – Use the absolute value of the increment data, default = True
min_failures (int, optional) – Minimum number of consecutive failures required for reporting, default = 1
- Returns
dictionary – Results include cleaned data, mask, and test results summary
- pecos.monitoring.check_delta(data, bound, window, key=None, direction=None, min_failures=1)[source]¶
Check for stagnant data and/or abrupt changes in the data using the difference between max and min values (delta) within a rolling window
- Parameters
data (pandas DataFrame) – Data used in the quality control test, indexed by datetime
bound (list of floats) – [lower bound, upper bound], None can be used in place of a lower or upper bound
window (int or float) – Size of the rolling window (in seconds) used to compute delta
key (string, optional) – Data column name or translation dictionary key. If not specified, all columns are used in the test.
direction (str, optional) –
Options = ‘positive’, ‘negative’, or None
If direction is positive, then only identify positive deltas (the min occurs before the max)
If direction is negative, then only identify negative deltas (the max occurs before the min)
If direction is None, then identify both positive and negative deltas
min_failures (int, optional) – Minimum number of consecutive failures required for reporting, default = 1
- Returns
dictionary – Results include cleaned data, mask, and test results summary
- pecos.monitoring.check_outlier(data, bound, window=None, key=None, absolute_value=False, streaming=False, min_failures=1)[source]¶
Check for outliers using normalized data within a rolling window
The upper and lower bounds are specified in standard deviations. Data is normalized using (data-mean)/std.
- Parameters
data (pandas DataFrame) – Data used in the quality control test, indexed by datetime
bound (list of floats) – [lower bound, upper bound], None can be used in place of a lower or upper bound
window (int or float, optional) – Size of the rolling window (in seconds) used to normalize data. If window is set to None, data is normalized using the entire data set's mean and standard deviation (column by column), default = None.
key (string, optional) – Data column name or translation dictionary key. If not specified, all columns are used in the test.
absolute_value (boolean, optional) – Use the absolute value of the normalized data, default = False
streaming (boolean, optional) – Indicates if streaming analysis should be used, default = False
min_failures (int, optional) – Minimum number of consecutive failures required for reporting, default = 1
- Returns
dictionary – Results include cleaned data, mask, and test results summary
- pecos.monitoring.check_missing(data, key=None, min_failures=1)[source]¶
Check for missing data
- Parameters
data (pandas DataFrame) – Data used in the quality control test, indexed by datetime
key (string, optional) – Data column name or translation dictionary key. If not specified, all columns are used in the test.
min_failures (int, optional) – Minimum number of consecutive failures required for reporting, default = 1
- Returns
dictionary – Results include cleaned data, mask, and test results summary
- pecos.monitoring.check_corrupt(data, corrupt_values, key=None, min_failures=1)[source]¶
Check for corrupt data
- Parameters
data (pandas DataFrame) – Data used in the quality control test, indexed by datetime
corrupt_values (list of int or floats) – List of corrupt data values
key (string, optional) – Data column name or translation dictionary key. If not specified, all columns are used in the test.
min_failures (int, optional) – Minimum number of consecutive failures required for reporting, default = 1
- Returns
dictionary – Results include cleaned data, mask, and test results summary
- pecos.monitoring.check_custom_static(data, quality_control_func, key=None, min_failures=1, error_message=None)[source]¶
Use custom functions that operate on the entire dataset at once to perform quality control analysis
- Parameters
data (pandas DataFrame) – Data used in the quality control test, indexed by datetime
quality_control_func (function) – Function that operates on self.df and returns a mask and metadata
key (string, optional) – Data column name or translation dictionary key. If not specified, all columns are used in the test.
min_failures (int, optional) – Minimum number of consecutive failures required for reporting, default = 1
error_message (str, optional) – Error message
- Returns
dictionary – Results include cleaned data, mask, test results summary, and metadata
- pecos.monitoring.check_custom_streaming(data, quality_control_func, window, key=None, rebase=None, min_failures=1, error_message=None)[source]¶
Check for anomalous data using a streaming framework which removes anomalous data from the history after each timestamp. A custom quality control function is supplied by the user to determine if the data is anomalous.
- Parameters
data (pandas DataFrame) – Data used in the quality control test, indexed by datetime
quality_control_func (function) – Function that determines if the last data point is normal or anomalous. Returns a mask and metadata for the last data point.
window (int or float) – Size of the rolling window (in seconds) used to define the history. If window is set to None, data is normalized using the entire data set's mean and standard deviation (column by column).
key (string, optional) – Data column name or translation dictionary key. If not specified, all columns are used in the test.
rebase (int, float, or None) – Value between 0 and 1 that indicates the fraction of default = None.
min_failures (int, optional) – Minimum number of consecutive failures required for reporting, default = 1
error_message (str, optional) – Error message
- Returns
dictionary – Results include cleaned data, mask, test results summary, and metadata
pecos.metrics module¶
The metrics module contains metrics that describe the quality control analysis or compute quantities that might be of use in the analysis
- pecos.metrics.qci(mask, tfilter=None)[source]¶
Compute the quality control index (QCI) for each column, defined as:
\(QCI=\dfrac{\sum_{t\in T}X_{dt}}{|T|}\)
where \(T\) is the set of timestamps in the analysis, \(X_{dt}\) is a data point for column \(d\) at time \(t\) that passed all quality control tests, and \(|T|\) is the number of data points in the analysis.
- Parameters
mask (pandas DataFrame) – Test results mask, returned from pm.mask
tfilter (pandas Series, optional) – Time filter containing boolean values for each time index
- Returns
pandas Series – Quality control index
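The QCI formula above reduces to the per-column fraction of data points that passed all tests; a plain-pandas sketch with hypothetical mask values (not the Pecos implementation):

```python
import pandas as pd

# mask: True = data point passed all tests (as returned from pm.mask)
mask = pd.DataFrame({'a': [True, True, False, True],
                     'b': [True, True, True, True]})

# QCI per column = number of passing points / number of points
qci = mask.sum() / mask.shape[0]
print(float(qci['a']), float(qci['b']))  # 0.75 1.0
```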
- pecos.metrics.rmse(data1, data2, tfilter=None)[source]¶
Compute the root mean squared error (RMSE) for each column, defined as:
\(RMSE=\sqrt{\dfrac{\sum{(data_1-data_2)^2}}{n}}\)
where \(data_1\) and \(data_2\) are time series and \(n\) is the number of data points.
- Parameters
data1 (pandas DataFrame) – Data
data2 (pandas DataFrame) – Data. Note, the column names in data1 must equal the column names in data2
tfilter (pandas Series, optional) – Time filter containing boolean values for each time index
- Returns
pandas Series – Root mean squared error
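The RMSE formula above can be written directly in pandas/numpy; a sketch with hypothetical data (not the Pecos implementation):

```python
import numpy as np
import pandas as pd

data1 = pd.DataFrame({'x': [1.0, 2.0, 3.0]})
data2 = pd.DataFrame({'x': [1.0, 2.0, 5.0]})  # same column names required

# RMSE per column = sqrt(mean of squared differences)
rmse = np.sqrt(((data1 - data2) ** 2).mean())
print(float(rmse['x']))  # sqrt(4/3), about 1.1547
```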
- pecos.metrics.time_integral(data, tfilter=None)[source]¶
Compute the time integral (F) for each column, defined as:
\(F=\int{fdt}\)
where \(f\) is a column of data and \(dt\) is the time step between observations. The integral is computed using the trapezoidal rule from numpy.trapz. Results are given in [original data units]*seconds. NaN values are set to 0 for integration.
- Parameters
data (pandas DataFrame) – Data
tfilter (pandas Series, optional) – Time filter containing boolean values for each time index
- Returns
pandas Series – Integral
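The trapezoidal rule described above is written out explicitly in this sketch (the docs use numpy.trapz; the data here is hypothetical and this is not the Pecos implementation):

```python
import numpy as np
import pandas as pd

index = pd.date_range('2024-01-01', periods=4, freq='60min')
data = pd.Series([0.0, 1.0, 1.0, np.nan], index=index)

f = data.fillna(0)  # NaN values are set to 0 for integration
t = (index - index[0]).total_seconds().to_numpy()  # elapsed seconds

# Trapezoidal rule: sum of 0.5*(f[i] + f[i+1]) * dt over each interval
integral = float(np.sum(0.5 * (f.values[1:] + f.values[:-1]) * np.diff(t)))
print(integral)  # 1800 + 3600 + 1800 = 7200.0, in data units * seconds
```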
- pecos.metrics.time_derivative(data, tfilter=None)[source]¶
Compute the derivative (f’) of each column, defined as:
\(f'=\dfrac{df}{dt}\)
where \(f\) is a column of data and \(dt\) is the time step between observations. The derivative is computed using central differences from numpy.gradient. Results are given in [original data units]/seconds.
- Parameters
data (pandas DataFrame) – Data
tfilter (pandas Series, optional) – Filter containing boolean values for each time index
- Returns
pandas DataFrame – Derivative of the data
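The central-difference computation via numpy.gradient looks like the following sketch (hypothetical data that increases linearly at 1 unit/second; not the Pecos implementation):

```python
import numpy as np
import pandas as pd

index = pd.date_range('2024-01-01', periods=4, freq='60min')
data = pd.Series([0.0, 3600.0, 7200.0, 10800.0], index=index)

t = (index - index[0]).total_seconds().to_numpy()  # elapsed seconds
deriv = np.gradient(data.to_numpy(), t)  # central differences, units/second
print(deriv.tolist())  # [1.0, 1.0, 1.0, 1.0]
```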
- pecos.metrics.probability_of_detection(observed, actual, tfilter=None)[source]¶
Compute probability of detection (PD) for each column, defined as:
\(PD=\dfrac{TP}{TP+FN}\)
where \(TP\) is number of true positives and \(FN\) is the number of false negatives.
- Parameters
observed (pandas DataFrame) – Estimated conditions (True = background, False = anomalous), returned from pm.mask
actual (pandas DataFrame) – Actual conditions, (True = background, False = anomalous). Note, the column names in observed must equal the column names in actual
tfilter (pandas Series, optional) – Filter containing boolean values for each time index
- Returns
pandas Series – Probability of detection
- pecos.metrics.false_alarm_rate(observed, actual, tfilter=None)[source]¶
Compute false alarm rate (FAR) for each column, defined as:
\(FAR=1-\dfrac{TN}{TN+FP}\)
where \(TN\) is number of true negatives and \(FP\) is the number of false positives.
- Parameters
observed (pandas DataFrame) – Estimated conditions (True = background, False = anomalous), returned from pm.mask
actual (pandas DataFrame) – Actual conditions, (True = background, False = anomalous). Note, the column names in observed must equal the column names in actual.
tfilter (pandas Series, optional) – Filter containing boolean values for each time index
- Returns
pandas Series – False alarm rate
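Both detection metrics above count agreement between observed and actual masks, where False marks an anomalous point; a plain-pandas sketch with hypothetical masks (not the Pecos implementation):

```python
import pandas as pd

# True = background (normal), False = anomalous, as in pm.mask
observed = pd.DataFrame({'a': [True, False, True, False, True]})
actual = pd.DataFrame({'a': [True, False, False, False, True]})

# An anomaly is a False entry, so a "positive" is a flagged (False) point
tp = (~observed & ~actual).sum()  # flagged and truly anomalous
fn = (observed & ~actual).sum()   # missed anomalies
fp = (~observed & actual).sum()   # false alarms
tn = (observed & actual).sum()    # correctly left alone

pd_metric = tp / (tp + fn)     # probability of detection, PD = TP/(TP+FN)
far = 1 - tn / (tn + fp)       # false alarm rate, FAR = 1 - TN/(TN+FP)
print(float(pd_metric['a']), float(far['a']))  # 2/3 and 0.0
```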
pecos.io module¶
The io module contains functions to read/send data and write results to files/html reports.
- pecos.io.read_campbell_scientific(filename, index_col='TIMESTAMP', encoding=None)[source]¶
Read Campbell Scientific CSV file.
- Parameters
filename (string) – File name
index_col (string, optional) – Index column name, default = ‘TIMESTAMP’
encoding (string, optional) – Character encoding (i.e. utf-16)
- Returns
pandas DataFrame – Data
- pecos.io.send_email(subject, body, recipient, sender, attachment=None, host='localhost', username=None, password=None)[source]¶
Send email using Python smtplib and email packages.
- Parameters
subject (string) – Subject text
body (string) – Email body, in HTML or plain format
recipient (list of string) – Recipient email address or addresses
sender (string) – Sender email address
attachment (string, optional) – Name of file to attach
host (string, optional) – Name of email host (or host:port), default = ‘localhost’
username (string, optional) – Email username for authentication
password (string, optional) – Email password for authentication
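Composing the message that such a function sends can be sketched with the standard-library email package (the addresses and subject here are illustrative; actual delivery would use smtplib.SMTP(host).send_message(msg)):

```python
from email.message import EmailMessage

# Build a plain-text notification message; names are hypothetical
msg = EmailMessage()
msg['Subject'] = 'Pecos Monitoring Report'
msg['From'] = 'monitor@example.com'
msg['To'] = ', '.join(['operator@example.com'])  # recipient list to header
msg.set_content('Quality control tests completed.')

print(msg['Subject'])  # Pecos Monitoring Report
```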
- pecos.io.write_metrics(metrics, filename='metrics.csv')[source]¶
Write metrics file.
- Parameters
metrics (pandas DataFrame) – Data to add to the metrics file
filename (string, optional) – File name. If the full path is not provided, the file is saved into the current working directory. By default, the file is named ‘metrics.csv’
- Returns
string – filename
- pecos.io.write_test_results(test_results, filename='test_results.csv')[source]¶
Write test results file.
- Parameters
test_results (pandas DataFrame) – Summary of the quality control test results (pm.test_results)
filename (string, optional) – File name. If the full path is not provided, the file is saved into the current working directory. By default, the file is named ‘test_results.csv’
- Returns
string – filename
- pecos.io.write_monitoring_report(data, test_results, test_results_graphics=None, custom_graphics=None, metrics=None, title='Pecos Monitoring Report', config=None, logo=False, im_width_test_results=1, im_width_custom=1, im_width_logo=0.1, encode=False, file_format='html', filename='monitoring_report.html')[source]¶
Generate a monitoring report. The monitoring report is used to report quality control test results for a single system. The report includes custom graphics, performance metrics, and test results.
- Parameters
data (pandas DataFrame) – Data, indexed by time (pm.data)
test_results (pandas DataFrame) – Summary of the quality control test results (pm.test_results)
test_results_graphics (list of strings or None, optional) – Graphics files, with full path. These graphics highlight data points that failed a quality control test, created using pecos.graphics.plot_test_results(). If None, test results graphics are not included in the report.
custom_graphics (list of strings or None, optional) – Custom files, with full path. Created by the user. If None, custom graphics are not included in the report.
metrics (pandas Series or DataFrame, optional) – Performance metrics to add as a table to the monitoring report
title (string, optional) – Monitoring report title, default = ‘Pecos Monitoring Report’
config (dictionary or None, optional) – Configuration options, to be printed at the end of the report. If None, configuration options are not included in the report.
logo (string, optional) – Graphic to be added to the report header
im_width_test_results (float, optional) – Image width as a fraction of page size, for test results graphics, default = 1
im_width_custom (float, optional) – Image width as a fraction of page size, for custom graphics, default = 1
im_width_logo (float, optional) – Image width as a fraction of page size, for the logo, default = 0.1
encode (boolean, optional) – Encode graphics in the html, default = False
filename (string, optional) – File name. If the full path is not provided, the file is saved into the current working directory. By default, the file is named ‘monitoring_report.html’
- Returns
string – filename
- pecos.io.write_dashboard(column_names, row_names, content, title='Pecos Dashboard', footnote='', logo=False, im_width=250, datatables=False, encode=False, filename='dashboard.html')[source]¶
Generate a dashboard. The dashboard is used to compare results across multiple systems. Each cell in the dashboard includes custom system graphics and metrics.
- Parameters
column_names (list of strings) – Column names listed in the order they should appear in the dashboard, i.e. [‘location1’, ‘location2’]
row_names (list of strings) – Row names listed in the order they should appear in the dashboard, i.e. [‘system1’, ‘system2’]
content (dictionary) –
Dashboard content for each cell.
Dictionary keys are tuples indicating the row name and column name, i.e. (‘row name’, ‘column name’), where ‘row name’ is in the list row_names and ‘column name’ is in the list column_names.
For each key, another dictionary is defined that contains the content to be included in each cell of the dashboard. Each cell can contain text, graphics, a table, and an html link. These are defined using the following keys:
text (string) = text at the top of each cell
graphics (list of strings) = a list of graphics file names. Each file name includes the full path
table (string) = a table in html format, for example a table of performance metrics. DataFrames can be converted to an html string using df.to_html() or df.transpose().to_html(). Values in the table can be color coded using pandas Styler class.
link (dict) = a dictionary where keys define the name of the link and values define the html link (with full path)
For example:
content = {('row name', 'column name'): {'text': 'text at the top', 'graphics': ['C:\\pecos\\results\\custom_graphic.png'], 'table': df.to_html(), 'link': {'Link to monitoring report': 'C:\\pecos\\results\\monitoring_report.html'}}}
title (string, optional) – Dashboard title, default = ‘Pecos Dashboard’
footnote (string, optional) – Text to be added to the end of the report
logo (string, optional) – Graphic to be added to the report header
im_width (float, optional) – Image width in the HTML report, default = 250
datatables (boolean, optional) – Use datatables.net to format the dashboard, default = False. See https://datatables.net/ for more information.
encode (boolean, optional) – Encode graphics in the html, default = False
filename (string, optional) – File name. If the full path is not provided, the file is saved into the current working directory. By default, the file is named ‘dashboard.html’
- Returns
string – filename
- pecos.io.device_to_client(config)[source]¶
Read channels on modbus device, scale and calibrate the values, and store the data in a MySQL database. The inputs are provided by a configuration dictionary that describes general information for data acquisition and the devices.
- Parameters
config (dictionary) – Configuration options, see Data acquisition
pecos.graphics module¶
The graphics module contains functions to generate scatter, time series, and heatmap plots for reports.
- pecos.graphics.plot_scatter(x, y, xaxis_min=None, xaxis_max=None, yaxis_min=None, yaxis_max=None, title=None, figsize=(7.0, 3.0))[source]¶
Create a scatter plot. If x and y have the same number of columns, then the columns of x are plotted against the corresponding columns of y, in order. If x (or y) has 1 column, then that column of data is plotted against all the columns in y (or x).
- Parameters
x (pandas DataFrame) – X data
y (pandas DataFrame) – Y data
xaxis_min (float, optional) – X-axis minimum, default = None (autoscale)
xaxis_max (float, optional) – X-axis maximum, default = None (autoscale)
yaxis_min (float, optional) – Y-axis minimum, default = None (autoscale)
yaxis_max (float, optional) – Y-axis maximum, default = None (autoscale)
title (string, optional) – Title, default = None
figsize (tuple, optional) – Figure size, default = (7.0, 3.0)
- pecos.graphics.plot_timeseries(data, tfilter=None, test_results_group=None, xaxis_min=None, xaxis_max=None, yaxis_min=None, yaxis_max=None, title=None, figsize=(7.0, 3.0), date_formatter=None)[source]¶
Create a time series plot using each column in the DataFrame.
- Parameters
data (pandas DataFrame or Series) – Data, indexed by time
tfilter (pandas Series, optional) – Boolean values used to include time filter in the plot, default = None
test_results_group (pandas DataFrame, optional) – Test results for the data, default = None
xaxis_min (float, optional) – X-axis minimum, default = None (autoscale)
xaxis_max (float, optional) – X-axis maximum, default = None (autoscale)
yaxis_min (float, optional) – Y-axis minimum, default = None (autoscale)
yaxis_max (float, optional) – Y-axis maximum, default = None (autoscale)
title (string, optional) – Title, default = None
figsize (tuple, optional) – Figure size, default = (7.0, 3.0)
date_formatter (string, optional) – Date formatter used on the x axis, for example, “%m-%d”. Default = None
- pecos.graphics.plot_interactive_timeseries(data, xaxis_min=None, xaxis_max=None, yaxis_min=None, yaxis_max=None, title=None, filename=None, auto_open=True)[source]¶
Create a basic interactive time series graphic using plotly. Many more options are available, see https://plot.ly for more details.
- Parameters
data (pandas DataFrame) – Data, indexed by time
xaxis_min (float, optional) – X-axis minimum, default = None (autoscale)
xaxis_max (float, optional) – X-axis maximum, default = None (autoscale)
yaxis_min (float, optional) – Y-axis minimum, default = None (autoscale)
yaxis_max (float, optional) – Y-axis maximum, default = None (autoscale)
title (string, optional) – Title, default = None
filename (string, optional) – HTML file name, default = None (file will be named temp-plot.html)
auto_open (boolean, optional) – Flag indicating if HTML graphic is opened, default = True
- pecos.graphics.plot_heatmap(data, colors=None, nColors=12, cmap=None, vmin=None, vmax=None, show_axis=False, title=None, figsize=(5.0, 5.0))[source]¶
Create a heatmap. Default color scheme is red to yellow to green with 12 colors. This function can be used to generate dashboards with simple color indicators in each cell (to remove borders use bbox_inches=’tight’ and pad_inches=0 when saving the image).
- Parameters
data (pandas DataFrame, pandas Series, or numpy array) – Data
colors (list or None, optional) – List of colors, colors can be specified in any way understandable by matplotlib.colors.ColorConverter.to_rgb(). If None, colors transitions from red to yellow to green.
nColors (int, optional) – Number of colors in the colormap, default = 12
cmap (string, optional) – Colormap, default = None. Overrides the colors and nColors options listed above.
vmin (float, optional) – Colormap minimum, default = None (autoscale)
vmax (float, optional) – Colormap maximum, default = None (autoscale)
title (string, optional) – Title, default = None
figsize (tuple, optional) – Figure size, default = (5.0, 5.0)
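The borderless-image trick mentioned above can be sketched with plain matplotlib (this does not call pecos itself; it only illustrates the bbox_inches=’tight’ and pad_inches=0 settings that remove figure borders so the saved image can serve as a simple dashboard color cell):

```python
# Sketch of the borderless dashboard cell described above, using plain
# matplotlib. The data values and filename are illustrative only.
import matplotlib
matplotlib.use("Agg")  # non-interactive backend for scripted image export
import matplotlib.pyplot as plt
import numpy as np

data = np.array([[0.2, 0.8], [0.5, 1.0]])

fig, ax = plt.subplots(figsize=(5.0, 5.0))
ax.imshow(data, cmap="RdYlGn", vmin=0, vmax=1)
ax.set_axis_off()  # mirrors show_axis=False
# bbox_inches='tight' and pad_inches=0 strip the borders from the saved image
fig.savefig("cell.png", bbox_inches="tight", pad_inches=0)
plt.close(fig)
```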
- pecos.graphics.plot_doy_heatmap(data, cmap='nipy_spectral', vmin=None, vmax=None, overlay=None, title=None, figsize=(7.0, 3.0))[source]¶
Create a day-of-year (X-axis) vs. time-of-day (Y-axis) heatmap.
- Parameters
data (pandas DataFrame or pandas Series) – Data (single column), indexed by time
cmap (string, optional) – Colormap, default = nipy_spectral
vmin (float, optional) – Colormap minimum, default = None (autoscale)
vmax (float, optional) – Colormap maximum, default = None (autoscale)
overlay (pandas DataFrame, optional) – Data to overlay on the heatmap. The time index should be in day-of-year (X-axis); values should be time-of-day in minutes (Y-axis)
title (string, optional) – Title, default = None
figsize (tuple, optional) – Figure size, default = (7.0, 3.0)
- pecos.graphics.plot_test_results(data, test_results, tfilter=None, image_format='png', dpi=500, figsize=(7.0, 3.0), date_formatter=None, filename_root='test_results')[source]¶
Create test results graphics which highlight data points that failed a quality control test.
- Parameters
data (pandas DataFrame) – Data, indexed by time (pm.data)
test_results (pandas DataFrame) – Summary of the quality control test results (pm.test_results)
tfilter (pandas Series, optional) – Boolean values used to include time filter in the plot, default = None
image_format (string, optional) – Image format, default = ‘png’
dpi (int, optional) – DPI resolution, default = 500
figsize (tuple, optional) – Figure size, default = (7.0,3.0)
date_formatter (string, optional) – Date formatter used on the x axis, for example, “%m-%d”. Default = None
filename_root (string, optional) – File name root. If the full path is not provided, files are saved into the current working directory. Each graphic file name is appended with an integer. For example, filename_root = ‘test’ generates files named ‘test0.png’, ‘test1.png’, etc. By default, the file name root is ‘test_results’
- Returns
A list of file names
pecos.utils module¶
The utils module contains helper functions.
- pecos.utils.index_to_datetime(index, unit='s', origin='unix')[source]¶
Convert DataFrame index from int/float to datetime; the datetime is rounded to the nearest millisecond
- Parameters
index (pandas Index) – DataFrame index in int or float
unit (str, optional) – Units of the original index
origin (str) – Reference date used to define the starting time. If origin = ‘unix’, the start time is ‘1970-01-01 00:00:00’. The origin can also be defined using a datetime string in a similar format (e.g., ‘2019-05-17 16:05:45’)
- Returns
pandas Index – DataFrame index in datetime
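The conversion this utility performs can be illustrated with pandas.to_datetime directly, using the same unit and origin conventions described above (‘unix’ origin means ‘1970-01-01 00:00:00’):

```python
# Minimal illustration of the int/float-to-datetime conversion described
# above, using pandas directly. Values are elapsed seconds from the origin.
import pandas as pd

index = pd.Index([0.0, 60.0, 120.0])  # elapsed seconds
dt_index = pd.to_datetime(index, unit="s", origin="unix").round("ms")
print(dt_index[1])  # 1970-01-01 00:01:00
```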
- pecos.utils.datetime_to_elapsedtime(index, origin=0.0)[source]¶
Convert DataFrame index from datetime to elapsed time in seconds
- Parameters
index (pandas Index) – DataFrame index in datetime
origin (float) – Reference for elapsed time
- Returns
pandas Index – DataFrame index in elapsed seconds
- pecos.utils.datetime_to_clocktime(index)[source]¶
Convert DataFrame index from datetime to clocktime (seconds past midnight)
- Parameters
index (pandas Index) – DataFrame index in datetime
- Returns
pandas Index – DataFrame index in clocktime
- pecos.utils.datetime_to_epochtime(index)[source]¶
Convert DataFrame index from datetime to epoch time
- Parameters
index (pandas Index) – DataFrame index in datetime
- Returns
pandas Index – DataFrame index in epoch time
- pecos.utils.round_index(index, frequency, how='nearest')[source]¶
Round DataFrame index
- Parameters
index (pandas Index) – Datetime index
frequency (int) – Expected time series frequency, in seconds
how (string, optional) –
Method for rounding, default = ‘nearest’. Options include:
nearest = round the index to the nearest multiple of the frequency
floor = round the index down to the nearest multiple of the frequency
ceiling = round the index up to the nearest multiple of the frequency
- Returns
pandas Index – DataFrame index with rounded values
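The three rounding modes correspond to pandas’ own DatetimeIndex methods, with the frequency expressed in seconds (here ‘60s’):

```python
# The 'nearest', 'floor', and 'ceiling' rounding modes described above,
# reproduced with pandas DatetimeIndex methods at a 60-second frequency.
import pandas as pd

index = pd.DatetimeIndex(["2019-05-17 16:05:45", "2019-05-17 16:06:20"])

print(index.round("60s"))  # nearest: 16:06:00, 16:06:00
print(index.floor("60s"))  # floor:   16:05:00, 16:06:00
print(index.ceil("60s"))   # ceiling: 16:06:00, 16:07:00
```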
- pecos.utils.evaluate_string(string_to_eval, data=None, trans=None, specs=None, col_name='eval')[source]¶
Returns an evaluated Python string. WARNING: this function calls ‘eval’. Strings of Python code should be thoroughly tested by the user.
This function can be useful when defining quality control configuration options in a file, such as:
Time filters that depend on the data index
Quality control bounds that depend on system constants
Composite signals that are defined using existing data
For each {keyword} in string_to_eval, {keyword} is expanded in the following order:
If keyword is ELAPSED_TIME, CLOCK_TIME or EPOCH_TIME then data.index is converted to seconds (elapsed time, clock time, or epoch time) and used in the evaluation (requires data)
If keyword is used to select a column (or columns) of data, then data[keyword] is used in the evaluation (requires data)
If a translation dictionary is used to select a column (or columns) of data, then data[trans[keyword]] is used in the evaluation (requires data and trans)
If the keyword is a key in a dictionary of constants, specs, then specs[keyword] is used in the evaluation (requires specs)
- Parameters
string_to_eval (string) – String to evaluate; the string can include multiple keywords and numpy (np.*) and pandas (pd.*) functions
data (pandas DataFrame, optional) – Data, indexed by datetime
trans (dictionary, optional) – Translation dictionary
specs (dictionary, optional) – Keyword:value pairs used to define constants
col_name (string, optional) – Column name used in the returned DataFrame. If the DataFrame has more than one column, columns are named col_name 0, col_name 1, …
- Returns
pandas DataFrame or float – Evaluated string
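The keyword-expansion idea can be sketched as follows. This is a simplified illustration, not pecos’s actual implementation: data column names and constants in specs are bound as variables, then the braces are stripped and the expression is evaluated with numpy and pandas in scope. As the warning above notes, eval executes arbitrary code; only use trusted strings.

```python
# Simplified sketch (hypothetical, not the pecos implementation) of
# expanding {keyword} placeholders against data columns and constants.
import numpy as np
import pandas as pd

data = pd.DataFrame({"Wave1": [1.0, 2.0], "Wave2": [3.0, 4.0]})
specs = {"gain": 10.0}

string_to_eval = "{gain} * ({Wave1} + {Wave2})"

# Bind data columns first, then constants from specs
env = {"np": np, "pd": pd}
for key in data.columns:
    env[key] = data[key]
for key, value in specs.items():
    env[key] = value

expr = string_to_eval.replace("{", "").replace("}", "")
result = eval(expr, env)  # WARNING: eval runs arbitrary code
print(result.tolist())  # [40.0, 60.0]
```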
pecos.logger module¶
The logger module contains a function to initialize the logger. Logger warnings are printed to the monitoring report.
pecos.pv module¶
The pv module contains custom methods for PV applications.
- pecos.pv.insolation(G, tfilter=None)[source]¶
Compute insolation defined as:
\(H=\int{Gdt}\)
where \(G\) is irradiance and \(dt\) is the time step between observations. The time integral is computed using the trapezoidal rule. Results are given in [irradiance units]*seconds.
- Parameters
G (pandas DataFrame) – Irradiance time series
tfilter (pandas Series, optional) – Time filter containing boolean values for each time index
- Returns
pandas Series – Insolation
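The trapezoidal-rule integral described above can be reproduced with plain numpy, using the index converted to elapsed seconds so the result has units of [irradiance units]*seconds:

```python
# Trapezoidal integration of an irradiance series, as described above.
# The irradiance values and timestamps are illustrative only.
import numpy as np
import pandas as pd

index = pd.date_range("2019-05-17 12:00", periods=3, freq="60s")
G = pd.Series([800.0, 850.0, 900.0], index=index)  # W/m^2

t = (G.index - G.index[0]).total_seconds()  # elapsed seconds
dt = np.diff(t)
# Trapezoidal rule: average adjacent values, weight by the time step
H = np.sum((G.values[:-1] + G.values[1:]) / 2 * dt)
print(H)  # 102000.0 W*s/m^2 (i.e., J/m^2)
```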
- pecos.pv.energy(P, tfilter=None)[source]¶
Compute energy defined as:
\(E=\int{Pdt}\)
where \(P\) is power and \(dt\) is the time step between observations. The time integral is computed using the trapezoidal rule. Results are given in [power units]*seconds.
- Parameters
P (pandas DataFrame) – Power time series
tfilter (pandas Series, optional) – Time filter containing boolean values for each time index
- Returns
pandas Series – Energy
- pecos.pv.performance_ratio(E, H_poa, P_ref, G_ref=1000)[source]¶
Compute performance ratio defined as:
\(PR=\dfrac{Y_{f}}{Y_{r}} = \dfrac{\dfrac{E}{P_{ref}}}{\dfrac{H_{poa}}{G_{ref}}}\)
where \(Y_f\) is the observed energy (AC or DC) produced by the PV system (kWh) divided by the DC power rating at STC conditions. \(Y_r\) is the plane-of-array insolation (kWh/m2) divided by the reference irradiance (1000 W/m2).
- Parameters
E (pandas Series or float) – Energy (AC or DC)
H_poa (pandas Series or float) – Plane of array insolation
P_ref (float) – DC power rating at STC conditions
G_ref (float, optional) – Reference irradiance, default = 1000
- Returns
pandas Series or float – Performance ratio in a pandas Series (if E or H_poa are Series) or float (if E and H_poa are floats)
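The performance ratio reduces to simple arithmetic. A minimal worked example with illustrative (made-up) system values, keeping energy, power, and irradiance in consistent units:

```python
# Worked example of the performance ratio formula above.
# All values are illustrative; units are chosen to be mutually consistent.
E = 4500.0      # observed energy, Wh
H_poa = 5000.0  # plane-of-array insolation, Wh/m^2
P_ref = 1000.0  # DC power rating at STC conditions, W
G_ref = 1000.0  # reference irradiance, W/m^2

Y_f = E / P_ref      # final yield Y_f, hours
Y_r = H_poa / G_ref  # reference yield Y_r, hours
PR = Y_f / Y_r
print(PR)  # 0.9
```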
- pecos.pv.normalized_current(I, G_poa, I_sco, G_ref=1000)[source]¶
Compute normalized current defined as:
\(NI = \dfrac{\dfrac{I}{I_{sco}}}{\dfrac{G_{poa}}{G_{ref}}}\)
where \(I\) is current, \(I_{sco}\) is the short circuit current at STC conditions, \(G_{poa}\) is the plane-of-array irradiance, and \(G_{ref}\) is the reference irradiance.
- Parameters
I (pandas Series or float) – Current
G_poa (pandas Series or float) – Plane of array irradiance
I_sco (float) – Short circuit current at STC conditions
G_ref (float, optional) – Reference irradiance, default = 1000
- Returns
pandas Series or float – Normalized current in a pandas Series (if I or G_poa are Series) or float (if I and G_poa are floats)
- pecos.pv.normalized_efficiency(P, G_poa, P_ref, G_ref=1000)[source]¶
Compute normalized efficiency defined as:
\(NE = \dfrac{\dfrac{P}{P_{ref}}}{\dfrac{G_{poa}}{G_{ref}}}\)
where \(P\) is the observed power (AC or DC), \(P_{ref}\) is the DC power rating at STC conditions, \(G_{poa}\) is the plane-of-array irradiance, and \(G_{ref}\) is the reference irradiance.
- Parameters
P (pandas Series or float) – Power (AC or DC)
G_poa (pandas Series or float) – Plane of array irradiance
P_ref (float) – DC power rating at STC conditions
G_ref (float, optional) – Reference irradiance, default = 1000
- Returns
pandas Series or float – Normalized efficiency in a pandas Series (if P or G_poa are Series) or float (if P and G_poa are floats)
- pecos.pv.performance_index(E, E_predicted)[source]¶
Compute performance index defined as:
\(PI=\dfrac{E}{\hat{E}}\)
where \(E\) is the observed energy from a PV system and \(\hat{E}\) is the predicted energy over the same time frame. \(\hat{E}\) can be computed using methods in pvlib.pvsystem and then converting power to energy using pecos.pv.energy. Unlike the performance ratio, the performance index should be very close to 1 for a well functioning PV system and should not vary by season due to temperature variations.
- Parameters
E (pandas Series or float) – Observed energy
E_predicted (pandas Series or float) – Predicted energy
- Returns
pandas Series or float – Performance index in a pandas Series (if E or E_predicted are Series) or float (if E and E_predicted are floats)
- pecos.pv.energy_yield(E, P_ref)[source]¶
Compute energy yield defined as:
\(EY=\dfrac{E}{P_{ref}}\)
where \(E\) is the observed energy from a PV system and \(P_{ref}\) is the DC power rating of the system at STC conditions.
- Parameters
E (pandas Series or float) – Observed energy
P_ref (float) – DC power rating at STC conditions
- Returns
pandas Series or float – Energy yield
- pecos.pv.clearness_index(H_dn, H_ea)[source]¶
Compute clearness index defined as:
\(Kt=\dfrac{H_{dn}}{H_{ea}}\)
where \(H_{dn}\) is the direct-normal insolation (kWh/m2) and \(H_{ea}\) is the extraterrestrial insolation (kWh/m2) over the same time frame. Extraterrestrial irradiation can be computed using pvlib.irradiance.extraradiation. Irradiation can be converted to insolation using pecos.pv.insolation.
- Parameters
H_dn (pandas Series or float) – Direct normal insolation
H_ea (pandas Series or float) – Extraterrestrial insolation
- Returns
pandas Series or float – Clearness index in a pandas Series (if H_dn or H_ea are Series) or float (if H_dn and H_ea are floats)
References¶
- HMKC07
Hart, D., McKenna, S.A., Klise, K., Cruz, V., & Wilson, M. (2007) Water quality event detection systems for drinking water contamination warning systems: Development testing and application of CANARY, World Environmental and Water Resources Congress (EWRI), Tampa, FL, May 15-19.
- Hunt07
Hunter, J.D. (2007). Matplotlib: A 2D graphics environment. Computing in Science & Engineering, 9(3), 90-95.
- KlSt16a
Klise, K.A., Stein, J.S. (2016). Performance Monitoring using Pecos, Technical Report SAND2016-3583, Sandia National Laboratories.
- KlSt16b
Klise, K.A., Stein, J.S. (2016). Automated Performance Monitoring for PV Systems using Pecos, 43rd Photovoltaic Specialists Conference (PVSC), Portland, OR, June 5-10.
- KlSC17
Klise, K.A., Stein, J.S., Cunningham, J. (2017). Application of IEC 61724 Standards to Analyze PV System Performance in Different Climates, 44th Photovoltaic Specialists Conference (PVSC), Washington, DC, June 25-30.
- Mcki13
McKinney, W. (2013). Python for Data Analysis: Data Wrangling with Pandas, NumPy, and IPython. O’Reilly Media, 1st edition, 466 pp.
- Rona08
Ronacher, A. (2008). Template Designer Documentation, http://jinja.pocoo.org/docs/dev/templates/ accessed July 1, 2016.
- SHFH16
Stein, J.S., Holmgren, W.F., Forbess, J., & Hansen, C.W. (2016). PVLIB: Open Source Photovoltaic Performance Modeling Functions for Matlab and Python, 43rd Photovoltaic Specialists Conference (PVSC), Portland, OR, June 5-10.
- SPHC16
Sievert, C., Parmer, C., Hocking, T., Chamberlain, S., Ram, K., Corvellec, M., and Despouy, P. (2016). plotly: Create interactive web graphics via Plotly’s JavaScript graphing library [Software].
- VaCV11
van der Walt, S., Colbert, S.C., & Varoquaux, G. (2011). The NumPy array: A structure for efficient numerical computation. Computing in Science & Engineering, 13, 22-30.
Sandia National Laboratories is a multimission laboratory managed and operated by National Technology and Engineering Solutions of Sandia, LLC., a wholly owned subsidiary of Honeywell International, Inc., for the U.S. Department of Energy’s National Nuclear Security Administration under contract DE-NA-0003525.