
Performance Monitoring using Pecos

Advances in sensor technology have rapidly increased our ability to monitor natural and human-made physical systems. In many cases, it is critical to process the resulting large volumes of data on a regular schedule and alert system operators when the system has changed. Automated quality control and performance monitoring can allow system operators to quickly detect performance issues.

Pecos is an open source Python package designed to address this need. Pecos includes built-in functionality to monitor performance of time series data. The software can be used to automate a series of quality control tests and generate custom reports which include performance metrics, test results, and graphics. The software was developed specifically to monitor solar photovoltaic systems, but is designed to be used for a wide range of applications. Figure 1 shows example graphics and a dashboard created using Pecos.

Figure 1: Example graphics and dashboard created using Pecos.

Citing Pecos

To cite Pecos, use one of the following references:

  • K.A. Klise and J.S. Stein (2016), Performance Monitoring using Pecos, Technical Report SAND2016-3583, Sandia National Laboratories, Albuquerque, NM.

  • K.A. Klise and J.S. Stein (2016), Automated Performance Monitoring for PV Systems using Pecos, 43rd IEEE Photovoltaic Specialists Conference (PVSC), Portland, OR, June 5-10.


Overview

Pecos is an open-source Python package designed to monitor performance of time series data, subject to a series of quality control tests. The software includes methods to run quality control tests defined by the user and generate reports which include performance metrics, test results, and graphics. The software can be customized for specific applications. Some high-level features include:

  • Pecos uses Pandas DataFrames [Mcki13] to store and analyze time series data. This dependency facilitates a wide range of analysis options and date-time functionality.

  • Data column names can be easily reassigned to common names through the use of a translation dictionary. Translation dictionaries also allow data columns to be grouped for analysis.

  • Time filters can be used to exclude data at specific times from quality control tests (e.g., early morning and late evening).

  • Predefined and custom quality control functions can be used to determine if data is anomalous.

  • Application specific models can be incorporated into quality control tests to compare measured to modeled data values.

  • General and custom performance metrics can be saved to keep a running history of system health.

  • Analysis can be set up to run on an automated schedule (e.g., Pecos can be run each day to analyze data collected on the previous day).

  • HTML formatted reports can be sent via email or hosted on a website. LaTeX formatted reports can also be generated.

  • Data acquisition methods can be used to transfer data from sensors to an SQL database.

Installation

Pecos requires Python (tested on 3.6, 3.7, and 3.8) along with several Python package dependencies. Information on installing and using Python can be found at https://www.python.org/. Python distributions, such as Anaconda, are recommended to manage the Python interface. Anaconda Python distributions include the Python packages needed to run Pecos.

Pecos can be installed using pip, git, or a downloaded zip file.

pip: To install Pecos using pip:

pip install pecos

git: To install Pecos using git:

git clone https://github.com/sandialabs/pecos
cd pecos
python setup.py install

zip file: To install Pecos using a downloaded zip file, go to https://github.com/sandialabs/pecos, select the “Clone or download” button and then select “Download ZIP”. This downloads a zip file called pecos-master.zip. To download a specific release, go to https://github.com/sandialabs/pecos/releases and select a zip file. The software can then be installed by unzipping the file and running setup.py:

unzip pecos-master.zip
cd pecos-master
python setup.py install

Required Python package dependencies include:

Optional Python packages dependencies include:

All other dependencies are part of the Python Standard Library.

To use Pecos, import the package from a Python console:

import pecos

Framework

Pecos contains the following modules:

  • monitoring: Contains the PerformanceMonitoring class and individual quality control test functions that are used to run analysis

  • metrics: Contains metrics that describe the quality control analysis or compute quantities that might be of use in the analysis

  • io: Contains functions to load data, send email alerts, write results to files, and generate HTML and LaTeX reports

  • graphics: Contains functions to generate scatter, time series, and heatmap plots for reports

  • utils: Contains helper functions, including functions to convert time series indices from seconds to datetime

In addition to the modules listed above, Pecos also includes a pv module that contains metrics specific to photovoltaic analysis.

Object-oriented and functional approach

Pecos supports quality control tests that are called using both an object-oriented and functional approach.

Object-oriented approach

Pecos includes a PerformanceMonitoring class which is the base class used to define the quality control analysis. This class stores:

  • Raw data

  • Translation dictionary (maps raw data column names to common names)

  • Time filter (excludes specific timestamps from analysis)

The class is used to call quality control tests, including timestamp, missing data, corrupt data, range, delta, increment, outlier, and custom tests.

The class can return the following results:

  • Cleaned data (data that failed a test is replaced by NaN)

  • Boolean mask (indicates if data failed a test)

  • Summary of the quality control test results

The object-oriented approach is convenient when running a series of quality control tests and can make use of the translation dictionary and time filter across all tests. The cleaned data, boolean mask, and test results summary reflect results from all quality control tests.

When using the object-oriented approach, a PerformanceMonitoring object is created and methods are called using that object. The cleaned data, mask, and test results can then be extracted from the PerformanceMonitoring object. These properties are updated each time a quality control test is run.

>>> pm = pecos.monitoring.PerformanceMonitoring()
>>> pm.add_dataframe(data)
>>> pm.check_range([-3,3])
>>> cleaned_data = pm.cleaned_data
>>> mask = pm.mask
>>> test_results = pm.test_results
Functional approach

The same quality control tests can also be run using individual functions. These functions generate a PerformanceMonitoring object under the hood and return:

  • Cleaned data

  • Boolean mask

  • Summary of the quality control test results

The functional approach is a convenient way to quickly get results from a single quality control test.

When using the functional approach, data is passed to the quality control test function. All other arguments match the object-oriented approach. The cleaned data, mask, and test results can then be extracted from the resulting dictionary.

>>> results = pecos.monitoring.check_range(data, [-3,3])
>>> cleaned_data = results['cleaned_data']
>>> mask = results['mask']
>>> test_results = results['test_results']

Note that examples in the documentation use the object-oriented approach.

Static and streaming analysis

Pecos supports both static and streaming analysis.

Static analysis

Most quality control tests in Pecos use static analysis. Static analysis operates on the entire data set to determine if all data points are normal or anomalous. While this can include operations like moving window statistics, the quality control test operates on the entire data set at once. This means that results from the quality control test do not depend on results from a previous time step. This approach is appropriate when data at different time steps can be analyzed independently, or when moving window statistics used to analyze the data do not need to be updated based on test results.

The following quality control tests use static analysis: missing data, corrupt data, range, delta, increment, outlier 1, and custom static tests.

1 The outlier test can make use of both static and streaming analysis. See Outlier test for more details.
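As a concrete illustration, a static check can be written as a single vectorized pass over the whole data set, so the result at one time step never depends on the result at another. The sketch below (plain pandas with hypothetical data and bounds, not the Pecos API) shows a static range-style check:

```python
import pandas as pd

def static_range_check(data, lower, upper):
    """Static analysis sketch: evaluate the entire data set at once.

    Every point is compared against the bounds independently, so the
    result at one time step never depends on another time step.
    Returns a boolean mask (True = passed) and the cleaned data.
    """
    mask = (data >= lower) & (data <= upper)
    cleaned = data.where(mask)  # failed points are replaced by NaN
    return mask, cleaned

index = pd.date_range('2018-01-01', periods=5, freq='15min')
data = pd.DataFrame({'A': [0.1, 0.5, 1.7, -0.2, 0.9]}, index=index)
mask, cleaned = static_range_check(data, 0, 1)
```

Changing the order of the rows would change nothing about any individual result, which is the defining property of static analysis.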

Streaming analysis

The streaming analysis loops through each data point using a quality control test that relies on information from “clean data” in a moving window. If a data point is determined to be anomalous, it is not included in the window for subsequent analysis. When using a streaming analysis, Pecos keeps track of the cleaned history that is used in the quality control test at each time step. This approach is important when the underlying methods in the quality control test could be corrupted by historical data points that were deemed anomalous. The streaming analysis also allows users to analyze continuous datasets in a near real-time fashion. While Pecos could be used to analyze data at a single time step in real time (creating a new instance of the PerformanceMonitoring class each time), the methods in Pecos are designed to analyze data over a time period. That time period can depend on several factors, including the size of the data and how often the test results and reports should be generated. Cleaned history can be appended to new datasets as they become available to create a seamless analysis for continuous data. See Continuous analysis for more details.

The streaming analysis includes an optional parameter which is used to rebase data in the history window if a certain fraction of that data has been deemed to be anomalous. The ability to rebase the history is useful if data changes to a new normal condition that would otherwise continue to be flagged as anomalous.

The following quality control tests use streaming analysis: timestamp 2, outlier 3, and custom streaming tests.

2 The timestamp test does not loop through data using a moving window, rather timestamp functionality in Pandas is used to determine anomalies in the time index.

3 The outlier test can make use of both static and streaming analysis. See Outlier test for more details.
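The streaming logic can be sketched as a loop that normalizes each new point against a window of clean history and withholds anomalous points from that history. This is a conceptual sketch only (hypothetical window size and bound), not the Pecos implementation:

```python
import pandas as pd

def streaming_outlier_check(series, window, bound):
    """Streaming analysis sketch: loop through each data point,
    normalizing it against a moving window of *clean* history.
    Points flagged as anomalous never enter the history used at
    later time steps."""
    clean_history = list(series.iloc[:window])   # seed the history window
    mask = [True] * window                       # assume the seed is clean
    for value in series.iloc[window:]:
        hist = pd.Series(clean_history[-window:])
        z = (value - hist.mean()) / hist.std()
        ok = abs(z) <= bound
        mask.append(ok)
        if ok:
            clean_history.append(value)          # anomalous points excluded
    return pd.Series(mask, index=series.index)

s = pd.Series([1.0, 1.1, 0.9, 1.0, 1.1, 5.0, 1.0, 0.9])
mask = streaming_outlier_check(s, window=5, bound=3)
```

Because the spike (5.0) is excluded from the history, the window statistics are not corrupted and the points that follow it are still evaluated against clean data.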

Custom quality control tests

Pecos supports custom quality control tests that can be static or streaming in form. This feature allows the user to customize the analysis used to determine if data is anomalous and return custom metadata from the analysis.

The custom function is defined outside of Pecos and handed to the custom quality control method as an input argument. This allows the user to include analysis options that are not currently supported in Pecos or that are very specific to their application.

While there are no specifications on the information that metadata stores, the metadata commonly includes raw values that were used in the quality control test. For example, while the outlier test returns a boolean value that indicates if data is normal or anomalous, the metadata can include the normalized data value that was used to make that determination. See Custom tests for more details.

Simple example

A simple example is included in the examples/simple directory. This example uses data from a CSV file, simple.csv, which contains 4 columns of data (A through D).

  • A = elapsed time in days

  • B = uniform random number between 0 and 1

  • C = sin(10*A)

  • D = C+(B-0.5)/2

The data includes missing timestamps, duplicate timestamps, non-monotonic timestamps, corrupt data, data out of expected range, data that doesn’t change, and data that changes abruptly, as listed below.

  • Missing timestamp at 5:00

  • Duplicate timestamp 17:00

  • Non-monotonic timestamp 19:30

  • Column A has the same value (0.5) from 12:00 until 14:30

  • Column B is below the expected lower bound of 0 at 6:30 and above the expected upper bound of 1 at 15:30

  • Column C has corrupt data (-999) between 7:30 and 9:30

  • Column C does not follow the expected sine function from 13:00 until 16:15. The change is abrupt and gradually corrected.

  • Column D is missing data from 17:45 until 18:15

  • Column D is occasionally below the expected lower bound of -1 around midday (2 time steps) and above the expected upper bound of 1 in the early morning and late evening (10 time steps).

The script, simple_example.py (shown below), is used to run quality control analysis using Pecos. The script performs the following steps:

  • Load time series data from a CSV file

  • Run quality control tests

  • Save test results to a CSV file

  • Generate an HTML report

"""
In this example, simple time series data is used to demonstrate basic functions
in pecos.  
* Data is loaded from a CSV file which contains four columns of values that 
  are expected to follow linear, random, and sine models.
* A translation dictionary is defined to map and group the raw data into 
  common names for analysis
* A time filter is established to screen out data between 3 AM and 9 PM
* The data is loaded into a pecos PerformanceMonitoring object and a series of 
  quality control tests are run, including range tests and increment tests 
* The results are printed to CSV and HTML reports
"""
import pandas as pd
import matplotlib.pyplot as plt
import numpy as np
import pecos

# Initialize logger
pecos.logger.initialize()

# Create a Pecos PerformanceMonitoring data object
pm = pecos.monitoring.PerformanceMonitoring()

# Populate the object with a DataFrame and translation dictionary
data_file = 'simple.csv'
df = pd.read_csv(data_file, index_col=0, parse_dates=True)
pm.add_dataframe(df)
pm.add_translation_dictionary({'Wave': ['C','D']}) # group C and D

# Check the expected frequency of the timestamp
pm.check_timestamp(900)
 
# Generate a time filter to exclude data points early and late in the day
clock_time = pecos.utils.datetime_to_clocktime(pm.data.index)
time_filter = pd.Series((clock_time > 3*3600) & (clock_time < 21*3600), 
                        index=pm.data.index)
pm.add_time_filter(time_filter)

# Check for missing data
pm.check_missing()
        
# Check for corrupt data values
pm.check_corrupt([-999]) 

# Add a composite signal which compares measurements to a model
wave_model = np.array(np.sin(10*clock_time/86400))
wave_measurements = pm.data[pm.trans['Wave']]
wave_error = np.abs(wave_measurements.subtract(wave_model,axis=0))
wave_error.columns=['Wave Error C', 'Wave Error D']
pm.add_dataframe(wave_error)
pm.add_translation_dictionary({'Wave Error': ['Wave Error C', 'Wave Error D']})

# Check data for expected ranges
pm.check_range([0, 1], 'B')
pm.check_range([-1, 1], 'Wave')
pm.check_range([None, 0.25], 'Wave Error')

# Check for stagnant data within a 1 hour moving window
pm.check_delta([0.0001, None], 3600, 'A') 
pm.check_delta([0.0001, None], 3600, 'B') 
pm.check_delta([0.0001, None], 3600, 'Wave') 
    
# Check for abrupt changes between consecutive time steps
pm.check_increment([None, 0.6], 'Wave') 

# Compute the quality control index for A, B, C, and D
mask = pm.mask[['A','B','C','D']]
QCI = pecos.metrics.qci(mask, pm.tfilter)

# Generate graphics
test_results_graphics = pecos.graphics.plot_test_results(pm.data, pm.test_results, pm.tfilter)
df.plot(ylim=[-1.5,1.5], figsize=(7.0,3.5))
plt.savefig('custom.png', format='png', dpi=500)

# Write test results and report files
pecos.io.write_test_results(pm.test_results)
pecos.io.write_monitoring_report(pm.data, pm.test_results, test_results_graphics, 
                                 ['custom.png'], QCI)
                                 

Results include:

  • HTML monitoring report, monitoring_report.html (Figure 2), includes quality control index, summary table, and graphics

  • Test results CSV file, test_results.csv, includes information from the summary tables

Figure 2: Example monitoring report.

Time series data

Pecos uses Pandas DataFrames to store and analyze data indexed by time. Pandas DataFrames store 2D data with labeled columns. Pandas includes a wide range of time series analysis and date-time functionality. By using Pandas DataFrames, Pecos is able to take advantage of a wide range of timestamp string formats, including UTC offset.

Pandas includes many built-in functions to read data from CSV, Excel, SQL, etc. For example, data can be loaded from an Excel file using the following code.

>>> import pandas as pd
>>> data = pd.read_excel('data.xlsx') 

Data can also be gathered from the web using the Python package requests, http://docs.python-requests.org.

To get started, create an instance of the PerformanceMonitoring class.

Note

Quality control tests can also be called using individual functions, see Framework for more details.

>>> import pecos
>>> pm = pecos.monitoring.PerformanceMonitoring()

Data, in the form of a Pandas DataFrame, can then be added to the PerformanceMonitoring object.

>>> pm.add_dataframe(data)

The data is accessed using

>>> pm.data 

Multiple DataFrames can be added to the PerformanceMonitoring object. New data overrides existing data if DataFrames share indexes and columns. Missing indexes and columns are filled with NaN. An example is shown below.

>>> print(data1)
            A  B
2018-01-01  0  5
2018-01-02  1  6
2018-01-03  2  7

>>> print(data2)
            B  C
2018-01-02  0  5
2018-01-03  1  6
2018-01-04  2  7

>>> pm.add_dataframe(data1)
>>> pm.add_dataframe(data2)
>>> print(pm.data)
              A    B    C
2018-01-01  0.0  5.0  NaN
2018-01-02  1.0  0.0  5.0
2018-01-03  2.0  1.0  6.0
2018-01-04  NaN  2.0  7.0

Translation dictionary

A translation dictionary is an optional feature which allows the user to map original column names into common names that can be more useful for analysis and reporting. A translation dictionary can also be used to group columns with similar properties into a single variable. Using grouped variables, Pecos can run a single set of quality control tests on the group.

Each entry in a translation dictionary is a key:value pair where ‘key’ is the common name of the data and ‘value’ is a list of original column names in the DataFrame. For example, {temp: [temp1,temp2]} means that columns named ‘temp1’ and ‘temp2’ in the DataFrame are assigned to the common name ‘temp’ in Pecos. In the Simple example, the following translation dictionary is used to group columns ‘C’ and ‘D’ to ‘Wave’.

>>> trans = {'Wave': ['C','D']}

The translation dictionary can then be added to the PerformanceMonitoring object.

>>> pm.add_translation_dictionary(trans)

As with DataFrames, multiple translation dictionaries can be added to the PerformanceMonitoring object. New dictionaries override existing keys in the translation dictionary.

Keys defined in the translation dictionary can be used in quality control tests, for example,

>>> pm.check_range([-1,1], 'Wave')

runs a check range test on columns ‘C’ and ‘D’.

Inside Pecos, the translation dictionary is used to index into the DataFrame, for example,

>>> pm.data[pm.trans['Wave']] 

returns columns ‘C’ and ‘D’ from the DataFrame.
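The same indexing pattern can be reproduced with plain pandas to show why a single key selects multiple columns (hypothetical data, mirroring the simple example):

```python
import pandas as pd

# Hypothetical data with columns C and D, as in the simple example
data = pd.DataFrame({'C': [0.0, 0.5], 'D': [0.1, 0.4]})
trans = {'Wave': ['C', 'D']}

# Indexing the DataFrame with the translation dictionary entry
# selects both grouped columns at once
wave = data[trans['Wave']]
```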

Time filter

A time filter is an optional feature which allows the user to exclude specific timestamps from all quality control tests. The time filter is a Boolean time series that can be defined using elapsed time, clock time, or other custom algorithms.

Pecos includes methods to get the elapsed and clock time of the DataFrame (in seconds). The following example defines a time filter between 3 AM and 9 PM,

>>> clocktime = pecos.utils.datetime_to_clocktime(pm.data.index)
>>> time_filter = pd.Series((clocktime > 3*3600) & (clocktime < 21*3600),
...                         index=pm.data.index)

The time filter can also be defined based on properties of the DataFrame, for example,

>>> time_filter = pm.data['A'] > 0.5

For some applications, it is useful to define the time filter based on sun position, as demonstrated in pv_example.py in the examples/pv directory.

The time filter can then be added to the PerformanceMonitoring object as follows,

>>> pm.add_time_filter(time_filter)

Quality control tests

Pecos includes several built-in quality control tests. When a test fails, information is stored in a summary table. This information can be saved to a file, database, or included in reports. Quality control tests fall into eight categories:

  • Timestamp

  • Missing data

  • Corrupt data

  • Range

  • Delta

  • Increment

  • Outlier

  • Custom

Note

Quality control tests can also be called using individual functions, see Framework for more details.

Timestamp test

The check_timestamp method is used to check the time index for missing, duplicate, and non-monotonic indexes. If a duplicate timestamp is found, Pecos keeps the first occurrence. If timestamps are not monotonic, the timestamps are reordered. For this reason, the timestamp test should be run before other quality control tests. The timestamp test is the only test that modifies the data stored in pm.data. Input includes:

  • Expected frequency of the time series in seconds

  • Expected start time (default = None, which uses the first index of the time series)

  • Expected end time (default = None, which uses the last index of the time series)

  • Minimum number of consecutive failures for reporting (default = 1)

  • A flag indicating if exact timestamps are expected. When set to False, irregular timestamps can be used in the Pecos analysis (default = True).

For example,

>>> pm.check_timestamp(60)

checks for missing, duplicate, and non-monotonic indexes assuming an expected frequency of 60 seconds.
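The underlying checks can be illustrated with plain pandas on a small hypothetical index; this mirrors what the timestamp test looks for, not the check_timestamp implementation:

```python
import pandas as pd

# Hypothetical 60-second index with a duplicate, a gap, and
# out-of-order entries
index = pd.DatetimeIndex(['2018-01-01 00:00', '2018-01-01 00:02',
                          '2018-01-01 00:01', '2018-01-01 00:01',
                          '2018-01-01 00:04'])

duplicates = index[index.duplicated()]      # repeated timestamps
monotonic = index.is_monotonic_increasing   # False for this index

# After sorting and dropping duplicates, compare against the expected
# frequency to find missing timestamps
clean = index.drop_duplicates().sort_values()
expected = pd.date_range(clean[0], clean[-1], freq='60s')
missing = expected.difference(clean)
```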

Missing data test

The check_missing method is used to check for missing values. Unlike missing timestamps, missing data only impacts a subset of data columns. NaN is included as missing. Input includes:

  • Data column (default = None, which indicates that all columns are used)

  • Minimum number of consecutive failures for reporting (default = 1)

For example,

>>> pm.check_missing('A', min_failures=5)

checks for missing data in the columns associated with the column or group ‘A’. In this example, warnings are only reported if there are 5 consecutive failures.
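The consecutive-failure logic can be sketched in plain pandas: flag NaN values, then report only runs of at least min_failures consecutive misses. This is a conceptual sketch, not the check_missing implementation:

```python
import numpy as np
import pandas as pd

def missing_runs(series, min_failures):
    """Flag NaN values, reporting only runs of at least
    `min_failures` consecutive misses (conceptual sketch)."""
    failed = series.isna()
    # Label contiguous runs: the run id increments whenever the
    # failed/passed state changes
    run_id = (failed != failed.shift()).cumsum()
    run_len = failed.groupby(run_id).transform('size')
    return failed & (run_len >= min_failures)

s = pd.Series([1.0, np.nan, 2.0, np.nan, np.nan, np.nan, 3.0])
report = missing_runs(s, min_failures=2)
```

The isolated NaN is ignored, while the run of three consecutive NaN values is reported.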

Corrupt data test

The check_corrupt method is used to check for corrupt values. Input includes:

  • List of corrupt values

  • Data column (default = None, which indicates that all columns are used)

  • Minimum number of consecutive failures for reporting (default = 1)

For example,

>>> pm.check_corrupt([-999, 999])

checks for data with values -999 or 999 in the entire dataset.

Range test

The check_range method is used to check if data is within expected bounds. Range tests are very flexible. The test can be used to check the expected range on raw or modified data. For example, composite signals can be added to the analysis to check the expected range on modeled vs. measured values (e.g., absolute error or relative error) or an expected relationship between data columns (e.g., column A divided by column B). An upper bound, lower bound, or both can be specified. Input includes:

  • Upper and lower bound

  • Data column (default = None, which indicates that all columns are used)

  • Minimum number of consecutive failures for reporting (default = 1)

For example,

>>> pm.check_range([None, 1], 'A')

checks for values greater than 1 in the columns associated with the key ‘A’.

Delta test

The check_delta method is used to check for stagnant data and abrupt changes in data. The test checks if the difference between the minimum and maximum data value within a moving window is within expected bounds.

Input includes:

  • Upper and lower bound

  • Size of the moving window used to compute the difference between the minimum and maximum

  • Data column (default = None, which indicates that all columns are used)

  • Flag indicating if the test should only check for positive delta (the min occurs before the max) or negative delta (the max occurs before the min) (default = False)

  • Minimum number of consecutive failures for reporting (default = 1)

For example,

>>> pm.check_delta([0.0001, None], window=3600)

checks if data changes by less than 0.0001 in a 1 hour moving window.

>>> pm.check_delta([None, 800], window=1800, direction='negative')

checks if data decreases by more than 800 in a 30 minute moving window.
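The underlying computation can be sketched with pandas rolling windows. Note that this sketch uses a sample-count window rather than the time-based window (in seconds) used by check_delta, and it is a conceptual sketch, not the Pecos implementation:

```python
import pandas as pd

# Difference between the max and min within a 4-sample moving window;
# stagnant data shows a delta below the lower bound
s = pd.Series([1.0, 1.0, 1.0, 1.0, 1.0, 2.5, 2.5, 2.5])
delta = s.rolling(window=4).max() - s.rolling(window=4).min()
stagnant = delta < 0.0001   # NaN comparisons evaluate to False
```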

Increment test

Similar to the check_delta method above, the check_increment method can be used to check for stagnant data and abrupt changes in data. The test checks if the difference between consecutive data values (or other specified increment) is within expected bounds. While this method is faster than the check_delta method, it does not consider the timestamp index or changes within a moving window, making its ability to find stagnant data and abrupt changes less robust.

Input includes:

  • Upper and lower bound

  • Data column (default = None, which indicates that all columns are used)

  • Increment used for difference calculation (default = 1 timestamp)

  • Flag indicating if the absolute value of the increment is used in the test (default = True)

  • Minimum number of consecutive failures for reporting (default = 1)

For example,

>>> pm.check_increment([0.0001, None], min_failures=60)

checks if increments are less than 0.0001 for 60 consecutive time steps.

>>> pm.check_increment([-800, None], absolute_value=False)

checks if increments decrease by more than 800 in a single time step.
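The increment computation reduces to a shifted difference between consecutive values; a minimal pandas sketch (not the check_increment implementation) with a hypothetical bound:

```python
import pandas as pd

s = pd.Series([0.0, 0.1, 0.1, 0.9, 0.8])
increment = s.diff().abs()   # change between consecutive time steps
abrupt = increment > 0.6     # flags changes larger than 0.6
```

Because diff() ignores the timestamp index, an increment over a data gap is treated the same as one over a single expected time step, which is why check_delta is more robust.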

Outlier test

The check_outlier method is used to check if normalized data falls outside expected bounds. Data is normalized using the mean and standard deviation of either a moving window or the entire data set. If multiple columns of data are used, each column is normalized separately. Input includes:

  • Upper and lower bound (in standard deviations)

  • Data column (default = None, which indicates that all columns are used)

  • Size of the moving window used to normalize the data (default = None). Note that when the window is set to None, the mean and standard deviation of the entire data set is used to normalize the data.

  • Flag indicating if the absolute value of the normalized data is used in the test (default = True)

  • Minimum number of consecutive failures for reporting (default = 1)

  • Flag indicating if the outlier test should use streaming analysis (default=False).

Note that using a streaming analysis is different than merely defining a moving window. Streaming analysis omits anomalous values from subsequent normalization calculations, whereas a static analysis with a moving window does not.

In a static analysis, the mean and standard deviation used to normalize the data are computed using a moving window (or using the entire data set if window=None) and upper and lower bounds are used to determine if data points are anomalous. The results do not impact the moving window statistics. In a streaming analysis, the mean and standard deviation are computed using a moving window after each data point is determined to be normal or anomalous. Data points that are determined to be anomalous are not used in the normalization.

For example,

>>> pm.check_outlier([None, 3], window=12*3600)

checks if data falls more than 3 standard deviations from the mean within a 12 hour moving window.

Custom tests

The check_custom_static and check_custom_streaming methods allow the user to supply a custom function that is used to determine if data is normal or anomalous. See Static and streaming analysis for more details.

This feature allows the user to customize the analysis and return custom metadata from the analysis. The custom function is defined outside of Pecos and handed to the custom quality control method as an input argument. This allows the user to include analysis options that are not currently supported in Pecos or that are very specific to their application. While there are no specifications on what this metadata stores, the metadata commonly includes the raw values that were used in a quality control test. For example, while the outlier test returns a boolean value that indicates if data is normal or anomalous, the metadata can include the normalized data value that was used to make that determination.

The user can also create custom quality control tests by creating a class that inherits from the PerformanceMonitoring class.

Custom static analysis

Static analysis operates on the entire data set to determine if all data points are normal or anomalous. Input for custom static analysis includes:

  • Custom quality control function with the following general form:

    def custom_static_function(data):
        """
        Parameters
        ----------
        data : pandas DataFrame
            Data to be analyzed.
    
        Returns
        --------
        mask : pandas DataFrame
            Mask contains boolean values and is the same size as data.
            True = data passed the quality control test,
            False = data failed the quality control test.
    
        metadata : pandas DataFrame
            Metadata stores additional information about the test and is returned by
            ``check_custom_static``.  Metadata is generally the same size as data.
        """
    
        # User defined custom algorithm
        ...
    
        return mask, metadata
    
  • Data column (default = None, which indicates that all columns are used)

  • Minimum number of consecutive failures for reporting (default = 1)

  • Error message (default = None)

Custom static analysis can be run using the following example. The custom function below, sine_function, determines if sin(data) is greater than 0.5 and returns the value of sin(data) as metadata.

>>> import numpy as np

>>> def sine_function(data):
...     # Create metadata and mask using sin(data)
...     metadata = np.sin(data)
...     mask = metadata > 0.5
...     return mask, metadata

>>> metadata = pm.check_custom_static(sine_function)
Custom streaming analysis

The streaming analysis loops through each data point using a quality control tests that relies on information from “clean data” in a moving window. Input for custom streaming analysis includes:

  • Custom quality control function with the following general form:

    def custom_streaming_function(data_pt, history):
        """
        Parameters
        ----------
        data_pt : pandas Series
            The current data point to be analyzed.
    
        history : pandas DataFrame
            Historical data used in the analysis. The streaming analysis omits
            data points that were previously flagged as anomalous in the history.
    
        Returns
        --------
        mask : pandas Series
            Mask contains boolean values (one value for each row in data_pt).
            True = data passed the quality control test,
            False = data failed the quality control test.
    
        metadata : pandas Series
            Metadata stores additional information about the test for the current data point.
            Metadata generally contains one value per row in data_pt. Metadata is
            collected into a pandas DataFrame with one row per time index that was included
            in the quality control test (omits the history window) and is returned
            by ``check_custom_streaming``.
        """
    
        # User defined custom algorithm
        ...
    
        return mask, metadata
    
  • Size of the moving window used to define the cleaned history.

  • Indicator used to rebase the history window. If the user defined fraction of the history window has been deemed anomalous, then the history is reset using raw data. The ability to rebase the history is useful if data changes to a new normal condition that would otherwise continue to be flagged as anomalous. (default = None, which indicates that rebase is not used)

  • Data column (default = None, which indicates that all columns are used)

  • Error message (default = None)

Custom streaming analysis can be run using the following example. The custom function below, nearest_neighbor, determines if the current data point is within 3 standard deviations of data in a 10 minute history window. In this case, metadata returns the distance from each column in the current data point to its nearest neighbor in the history. This is similar to the multivariate nearest neighbor algorithm used in CANARY [HMKC07].

>>> import numpy as np
>>> import pandas as pd
>>> from scipy.spatial.distance import cdist

>>> def nearest_neighbor(data_pt, history):
...     # Normalize the current data point and history using the history window
...     zt = (data_pt - history.mean())/history.std()
...     z = (history - history.mean())/history.std()
...     # Compute the distance from the current data point to data in the history window
...     zt_reshape = zt.to_frame().T
...     dist = cdist(zt_reshape, z)
...     # Extract the minimum distance
...     min_dist = np.nanmin(dist)
...     # Extract the index of the min distance and compute the distance components
...     idx = np.nanargmin(dist)
...     metadata = z.iloc[idx,:] - zt
...     # Determine if the min distance is within 3 standard deviations, assign T/F to the mask
...     mask = pd.Series(min_dist <= 3, index=data_pt.index)
...     return mask, metadata

>>> metadata = pm.check_custom_streaming(nearest_neighbor, window=600)

Metrics

Pecos includes several metrics that describe the quality control analysis or compute quantities that might be of use in the analysis. Many of these metrics aggregate over time and can be saved to track long-term performance and system health.

While Pecos typically runs a series of quality control tests on raw data, quality control tests can also be run on metrics generated from these analyses to track long term performance and system health. For example, daily quality control analysis can generate summary metrics that can later be used to generate a yearly summary report. Pecos includes a performance metrics example (based on one year of PV metrics) in the examples/metrics directory.

Quality control index

The quality control index (QCI) is a general metric which indicates the percent of data points that pass quality control tests. Duplicate and non-monotonic indexes are not counted as failed tests (duplicates are removed and non-monotonic indexes are reordered). A value of 1 indicates that all data passed all tests. QCI is computed for each column of data. For example, if the data contains 720 entries and 700 pass all quality control tests, then the QCI is 700/720 = 0.972. QCI is computed using the qci method.

To compute QCI,

>>> QCI = pecos.metrics.qci(pm.mask)
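For a single column, QCI reduces to the fraction of True values in the Boolean mask. The following sketch reproduces the 700/720 example above using a hypothetical mask and plain pandas rather than the qci method:

```python
import pandas as pd

# Hypothetical mask: 700 of 720 data points passed all quality control tests
mask = pd.DataFrame({'A': [True]*700 + [False]*20})

# Fraction of passing points per column (what QCI reports in this simple case)
qci = mask.sum() / mask.shape[0]
print(round(qci['A'], 3))  # 0.972
```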

Root mean square error

The root mean square error (RMSE) is used to compare the difference between two variables. RMSE is computed for each column of data (note that the column names in the two data sets must match). This metric is often used to compare measured data to modeled data. RMSE is computed using the rmse method.
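As a sketch of the quantity the rmse method computes, using hypothetical measured and modeled DataFrames and plain pandas:

```python
import numpy as np
import pandas as pd

# Hypothetical measured and modeled data with matching column names
measured = pd.DataFrame({'A': [1.0, 2.0, 3.0, 4.0]})
modeled = pd.DataFrame({'A': [1.0, 2.0, 3.0, 6.0]})

# RMSE per column: square root of the mean squared difference
rmse = np.sqrt(((measured - modeled)**2).mean())
print(rmse['A'])  # 1.0
```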

Time integral

The time integral is computed for each column of data using the trapezoidal rule, via the time_integral method.
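The trapezoidal rule can be sketched with plain numpy on hypothetical data (time_integral integrates over elapsed time; seconds are assumed here):

```python
import numpy as np
import pandas as pd

# Hypothetical data: a constant value of 2 sampled every 15 minutes for 1 hour
index = pd.date_range('2018-01-01', periods=5, freq='15min')
data = pd.DataFrame({'A': [2.0]*5}, index=index)

# Elapsed time in seconds
t = (data.index - data.index[0]).total_seconds().to_numpy()
y = data['A'].to_numpy()

# Trapezoidal rule: interval width times the average of the endpoint values
integral = np.sum(np.diff(t) * (y[:-1] + y[1:]) / 2)
print(integral)  # 7200.0
```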

Time derivative

The time derivative is computed for each column of data using central differences, via the time_derivative method.
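Central differences can be sketched with numpy.gradient on hypothetical data (this illustrates the numerical scheme, not a call into Pecos itself):

```python
import numpy as np

# Hypothetical data: y = t^2 sampled once per second
t = np.array([0.0, 1.0, 2.0, 3.0, 4.0])
y = t**2

# numpy.gradient uses central differences in the interior and
# one-sided differences at the two edges
deriv = np.gradient(y, t)
print(deriv)  # [1. 2. 4. 6. 7.]
```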

Probability of detection and false alarm rate

The probability of detection (PD) and false alarm rate (FAR) are used to evaluate how well a quality control test (or set of quality control tests) distinguishes background from anomalous conditions. PD and FAR are related to the number of true negatives, false negatives, false positives, and true positives, as shown in Figure 3. The estimated condition can be computed using results from quality control tests in Pecos; the actual condition must be supplied by the user. If actual conditions are not known, anomalous conditions can be superimposed on the raw data to generate a testing data set. A “good” quality control test (or set of tests) results in a PD close to 1 and a FAR close to 0.

Receiver Operating Characteristic (ROC) curves are used to compare the effectiveness of different quality control tests, as shown in Figure 4. To generate a ROC curve, quality control test input parameters (e.g., the upper bound of a range test) are systematically adjusted. PD and FAR are computed using the probability_of_detection and false_alarm_rate methods. These metrics are computed for each column of data.

FAR and PD

Relationship between FAR and PD.

ROC

Example ROC curve.
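Given counts of true/false positives and negatives, PD and FAR reduce to simple ratios. The counts below are hypothetical; in practice the probability_of_detection and false_alarm_rate methods compute these from the estimated and actual conditions:

```python
# Hypothetical confusion-matrix counts from comparing quality control test
# results (estimated condition) against known anomalies (actual condition)
TP, FN = 45, 5     # anomalies detected / anomalies missed
FP, TN = 10, 940   # normal data flagged / normal data passed

PD = TP / (TP + FN)    # probability of detection
FAR = FP / (FP + TN)   # false alarm rate
print(PD)  # 0.9
```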

Composite signals

Composite signals are defined as data generated from existing data or from models. Composite signals can be used to add modeled data values or relationships between data columns to quality control tests.

Python facilitates a wide range of analysis options that can be incorporated into Pecos using composite signals. For example, composite signals can be created using the following methods available in open source Python packages (e.g., numpy, scipy, pandas, scikit-learn, tensorflow):

  • Logic/comparison

  • Interpolation

  • Filtering

  • Rolling window statistics

  • Regression

  • Classification

  • Clustering

  • Machine learning

Pecos can also interface with analysis run outside Python using the Python package subprocess.

Once a composite signal is created, it can be used directly within a quality control test, or compared to existing data and the residual can be used in a quality control test.

In the Simple example, a very simple ‘Wave Model’ composite signal is added to the PerformanceMonitoring object.

>>> clocktime = pecos.utils.datetime_to_clocktime(pm.data.index)
>>> wave_model = pd.DataFrame(np.sin(10*(clocktime/86400)),
...                           index=pm.data.index, columns=['Wave Model'])
>>> pm.add_dataframe(wave_model)

Results

Analysis run using Pecos results in a collection of quality control test results, a quality control mask, cleaned data, and performance metrics. This information can be used to generate HTML/LaTeX reports and dashboards.

Quality control test results

When a quality control test fails, information is stored in:

pm.test_results

This DataFrame is updated each time a new quality control test is run. Test results include the following information:

  • Variable Name: Column name in the DataFrame

  • Start Time: Start time of the failure

  • End Time: End time of the failure

  • Timesteps: The number of consecutive time steps involved in the failure

  • Error Flag: Error messages include:

    • Duplicate timestamp

    • Nonmonotonic timestamp

    • Missing data (used for missing data and missing timestamp)

    • Corrupt data

    • Data < lower bound OR Data > upper bound

    • Increment < lower bound OR Increment > upper bound

    • Delta < lower bound OR Delta > upper bound

    • Outlier < lower bound OR Outlier > upper bound

A subset of quality control test results from the Simple example are shown below.

>>> print(pm.test_results)
  Variable Name      Start Time        End Time  Timesteps                   Error Flag
1           NaN   1/1/2015 5:00   1/1/2015 5:00          1            Missing timestamp
2           NaN  1/1/2015 17:00  1/1/2015 17:00          1          Duplicate timestamp
3           NaN  1/1/2015 19:30  1/1/2015 19:30          1       Nonmonotonic timestamp
4             A  1/1/2015 12:00  1/1/2015 14:30         11  Delta < lower bound, 0.0001
5             B   1/1/2015 6:30   1/1/2015 6:30          1        Data < lower bound, 0
6             B  1/1/2015 15:30  1/1/2015 15:30          1        Data > upper bound, 1
7             C   1/1/2015 7:30   1/1/2015 9:30          9                 Corrupt data

Note that variable names are not recorded for timestamp test failures (Test results 1, 2, and 3).

The write_test_results method is used to write quality control test results to a CSV file. This method can be customized to write quality control test results to a database or to other file formats.

Quality control mask

A Boolean mask indicating data that failed a quality control test is stored in:

pm.mask

This DataFrame is updated each time a new quality control test is run. True indicates that data passed all tests; False indicates that data did not pass at least one test (or data is NaN).

Cleaned data

The cleaned data set is stored in:

pm.cleaned_data

This DataFrame is updated each time a new quality control test is run. Data that failed a quality control test are replaced by NaN.

Note that Pandas includes several methods to replace NaN using different replacement strategies. Generally, the best data replacement strategy must be defined on a case-by-case basis. Possible strategies include:

  • Replacing missing data using linear interpolation or other polynomial approximations

  • Replacing missing data using a rolling mean of the data

  • Replacing missing data with data from a previous period (previous day, hour, etc.)

  • Replacing missing data with data from a redundant sensor

  • Replacing missing data with values from a model

These strategies can be accomplished using the Pandas methods interpolate, replace, and fillna. See Pandas documentation for more details.
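Two of these strategies are sketched below on hypothetical cleaned data (NaN values stand in for points that failed quality control tests), using the Pandas methods mentioned above:

```python
import numpy as np
import pandas as pd

# Hypothetical cleaned data where two points failed quality control tests
cleaned = pd.DataFrame({'A': [1.0, np.nan, 3.0, np.nan, 5.0]})

# Strategy 1: linear interpolation between valid neighbors
by_interp = cleaned.interpolate(method='linear')

# Strategy 2: carry the previous valid value forward
by_ffill = cleaned.ffill()

print(by_interp['A'].tolist())  # [1.0, 2.0, 3.0, 4.0, 5.0]
print(by_ffill['A'].tolist())   # [1.0, 1.0, 3.0, 3.0, 5.0]
```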

Metrics

The write_metrics method is used to write metrics that describe the quality control analysis (i.e. QCI) to a CSV file. This method can be customized to write performance metrics to a database or to other file formats. The method can be called multiple times to append metrics based on the timestamp of the DataFrame.

>>> print(metrics_day1)
              QCI   RMSE
2018-01-01  0.871  0.952
>>> print(metrics_day2)
              QCI   RMSE
2018-01-02  0.755  0.845
>>> pecos.io.write_metrics(metrics_day1, 'metrics_file.csv') 
>>> pecos.io.write_metrics(metrics_day2, 'metrics_file.csv') 

The metrics_file.csv file will contain:

              QCI   RMSE
2018-01-01  0.871  0.952
2018-01-02  0.755  0.845

Monitoring reports

The write_monitoring_report method is used to generate an HTML or LaTeX formatted monitoring report. The monitoring report includes the start and end time for the data that was analyzed, custom graphics and performance metrics, a table of test results, graphics associated with the test results (highlighting data points that failed a quality control test), notes on runtime errors and warnings, and the configuration options used in the analysis.

  • Custom Graphics: Custom graphics are created by the user for their specific application. Custom graphics can also be generated using methods in the graphics module. These graphics are included at the top of the report.

  • Performance Metrics: Performance metrics are displayed in a table.

  • Test Results: Test results contain information stored in pm.test_results. Graphics follow that display the data point(s) that caused the failure. Test results graphics are generated using the plot_test_results method.

  • Notes: Notes include Pecos runtime errors and warnings, such as:

    • Empty/missing data

    • Formatting error in the translation dictionary

    • Insufficient data for a specific quality control test

    • Insufficient data or error when evaluating string

  • Configuration Options: Configuration options used in the analysis.

Figure 5 shows the monitoring report from the Simple example.

Monitoring report

Example monitoring report.

Dashboards

To compare quality control analysis across several systems, key graphics and metrics can be gathered in a dashboard view. For example, the dashboard can contain multiple rows (one for each system) and multiple columns (one for each location). The dashboard can be linked to monitoring reports and interactive graphics for more detailed information. The write_dashboard method is used to generate an HTML dashboard.

For each row and column in the dashboard, the following information can be specified:

  • Text (i.e. general information about the system/location)

  • Graphics (i.e. a list of custom graphics)

  • Table (i.e. a Pandas DataFrame with performance metrics)

  • Links (i.e. the path to a monitoring report or other file/site for additional information)

The user defined text, graphics, tables, and links create custom dashboards. Pecos includes dashboard examples in the examples/dashboard directory. Figure 6, Figure 7, and Figure 8 show example dashboards generated using Pecos.

Dashboard1

Example dashboard 1.

Dashboard

Example dashboard 2.

Dashboard

Example dashboard 3.

Graphics

The graphics module contains several methods to plot time series data, scatter plots, heatmaps, and interactive graphics. These methods can be used to generate graphics that are included in monitoring reports and dashboards, or to generate stand-alone graphics. The following figures illustrate graphics created using the methods included in Pecos. Note that many other graphing options are available by using Python graphing packages directly.

Test results graphics, generated using plot_test_results, include time series data along with a shaded time filter and quality control test results. The following figure shows inverter efficiency over the course of 1 day. The gray region indicates times when sun elevation is < 20 degrees; this region is excluded from quality control tests. Green marks identify data points that were flagged as changing abruptly; red marks identify data points that were outside the expected range. These graphics can be included in Monitoring reports.

test-results

Example test results graphic.

Day-of-year vs. time-of-day heatmaps, generated using plot_doy_heatmap, help identify missing data and trends, and help define filters and quality control test thresholds when working with large data sets. The following figure shows irradiance over a year, with the time of sunrise and sunset for each day. The white vertical line indicates one day of missing data. The method plot_heatmap creates a simpler heatmap. These plots can be included as custom graphics in Monitoring reports and Dashboards.

DOY heatmap

Example day-of-year vs. time of day heatmap.

Interactive graphics, generated using plot_interactive_timeseries, are HTML graphic files which the user can scale and hover over to visualize data. The following figure shows an image of an interactive graphic. Many more options are available; see https://plot.ly for more details. Interactive graphics can be linked to Dashboards.

Plotly

Example interactive graphic using plotly.

Automation

Task scheduler

To run Pecos on an automated schedule, create a task using your operating system's scheduling tools. On Windows, open the Control Panel and search for Schedule Tasks. On Linux and OSX, use the cron utility.

Tasks are defined by a trigger and an action. The trigger indicates when the task should be run (e.g., daily at 1:00 pm). The action can be set to run a batch file. A batch file (.bat or .cmd filename extension) can be easily written to start a Python script which runs Pecos. For example, the following batch file runs driver.py:

cd your_working_directory
C:\Users\username\Anaconda3\python.exe driver.py
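On Linux or OSX, the equivalent is a crontab entry (edit with crontab -e); the paths below are placeholders for your working directory and Python interpreter:

```shell
# Run the Pecos driver every day at 1:00 pm (fields: minute hour day month weekday)
0 13 * * * cd /path/to/working_directory && /usr/bin/python3 driver.py
```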

Continuous analysis

The following example illustrates a framework that analyzes continuous streaming data and provides reports. For continuous data streams, it is often advantageous to provide quality control analysis and reports at a regular interval. While the analysis and reporting can occur every time new data is available, it is often more informative and more efficient to run analysis and create reports that cover a longer time interval. For example, data might be collected every minute and quality control analysis might be run every day.

The following example pulls data from an SQL database that includes a table of raw data (data), a table of data that has completed quality control analysis (qc_data), and a table that stores a summary of quality control test failures (qc_summary). After the analysis, quality control results are appended to the database. This process could also include metrics that describe the quality control results. The following code could be used as a Python driver that runs using a task scheduler every day, pulling in yesterday’s data. In this example, 1 hour of cleaned data is used to initialize the moving window and a streaming outlier test is run.

>>> import pandas as pd
>>> from sqlalchemy import create_engine
>>> import datetime
>>> import pecos

>>> # Create the SQLite engine
>>> engine = create_engine('sqlite:///monitor.db', echo=False)

>>> # Define the date to extract yesterday's data
>>> date = datetime.date.today()-datetime.timedelta(days=1)

>>> # Load data and recent history from the database
>>> data = pd.read_sql("SELECT * FROM data WHERE timestamp BETWEEN '" + str(date) + \
...                    " 00:00:00' AND '" + str(date) + " 23:59:59';" , engine,
...                    parse_dates='timestamp', index_col='timestamp')

>>> history = pd.read_sql("SELECT * FROM qc_data WHERE timestamp BETWEEN '" + \
...                       str(date-datetime.timedelta(days=1)) + " 23:00:00' AND '" + \
...                       str(date-datetime.timedelta(days=1)) + " 23:59:59';" , engine,
...                       parse_dates='timestamp', index_col='timestamp')

>>> # Setup the PerformanceMonitoring with data and history and run a streaming outlier test
>>> pm = pecos.monitoring.PerformanceMonitoring()
>>> pm.add_dataframe(data)
>>> pm.add_dataframe(history)
>>> pm.check_outlier([-3, 3], window=3600, streaming=True)

>>> # Save the cleaned data and test results to the database
>>> pm.cleaned_data.to_sql('qc_data', engine, if_exists='append')
>>> pm.test_results.to_sql('qc_summary', engine, if_exists='append')

>>> # Create a monitoring report with test results and graphics
>>> test_results_graphics = pecos.graphics.plot_test_results(data, pm.test_results)
>>> filename = pecos.io.write_monitoring_report(pm.data, pm.test_results, test_results_graphics,
...             filename='monitoring_report_'+str(date)+'.html')

Configuration file

A configuration file can be used to store information about the system, data, and quality control tests. The configuration file is not used directly within Pecos; therefore, there are no specific formatting requirements. Configuration files can be useful when using the same Python script to analyze several systems that have slightly different input requirements.

The examples/simple directory includes a configuration file, simple_config.yml, that defines system specifications, translation dictionary, composite signals, corrupt values, and bounds for range and increment tests. The script, simple_example_using_config.py uses this configuration file to run the simple example.

Specifications: 
  Frequency: 900  
  Multiplier: 10
  
Translation: 
  Wave: [C,D]

Composite Signals: 
- Wave Model: "np.sin({Multiplier}*{ELAPSED_TIME}/86400)"
- Wave Error: "np.abs(np.subtract({Wave}, {Wave Model}))"

Time Filter: "({CLOCK_TIME} > 3*3600) & ({CLOCK_TIME} < 21*3600)"

Corrupt: [-999]

Range:
  B: [0, 1]
  Wave: [-1, 1]
  Wave Error: [None, 0.25]
 
Delta:
  A: [0.0001, None]
  B: [0.0001, None]
  Wave: [0.0001, None] 
  
Increment:
  Wave: [None, 0.6] 

For some use cases, it is convenient to use strings of Python code in a configuration file to define time filters, quality control bounds, and composite signals. These strings can be evaluated using evaluate_string. WARNING: this function calls eval. Strings of Python code should be thoroughly tested by the user.

For each {keyword} in the string, {keyword} is expanded in the following order:

  • If keyword is ELAPSED_TIME, CLOCK_TIME, or EPOCH_TIME, then data.index converted to seconds (elapsed time, clock time, or epoch time) is used in the evaluation

  • If keyword is used to select a column (or columns) of data, then data[keyword] is used in the evaluation

  • If a translation dictionary is used to select a column (or columns) of data, then data[trans[keyword]] is used in the evaluation

  • If the keyword is a key in a dictionary of constants (specs), then specs[keyword] is used in the evaluation

For example, the time filter string is evaluated below.

>>> string_to_eval = "({CLOCK_TIME} > 3*3600) & ({CLOCK_TIME} < 21*3600)"
>>> time_filter = pecos.utils.evaluate_string(string_to_eval, df)

Data acquisition

Pecos includes basic data acquisition methods to transfer data from sensors to an SQL database. These methods require the Python packages sqlalchemy (https://www.sqlalchemy.org/) and minimalmodbus (https://minimalmodbus.readthedocs.io).

The device_to_client method collects data from a modbus device and stores it in a local MySQL database. The method requires several configuration options, which are stored as a nested dictionary. pyyaml can be used to store configuration options in a file. The options are stored in a Client block and a Devices block. The Devices block can define multiple devices and each device can have multiple data streams. The configuration options are described below.

  • Client: A dictionary that contains information about the client. The dictionary has the following keys:

    • IP: IP address (string)

    • Database: name of database (string)

    • Table: name of table (string)

    • Username: name of user (string)

    • Password: password for user (string)

    • Interval: data collection frequency in seconds (integer)

    • Retries: number of retries for each channel (integer)

  • Devices: A list of dictionaries that contain information about each device (one dictionary per device). Each dictionary has the following keys:

    • Name: modbus device name (string)

    • USB: serial connection (string), e.g. /dev/ttyUSB0 for Linux

    • Address: modbus slave address (string)

    • Baud: data transfer rate in bits per second (integer)

    • Parity: parity of transmitted data for error checking (string). Possible values: N, E, O

    • Bytes: number of data bits (integer)

    • Stopbits: number of stop bits (integer)

    • Timeout: read timeout value in seconds (integer)

    • Data: A list of dictionaries that contain information about each data stream (one dictionary per data stream). Each dictionary has the following keys:

      • Name: data name (string)

      • Type: data type (string)

      • Scale: scaling factor (integer)

      • Conversion: conversion factor (float)

      • Channel: register number (integer)

      • Signed: define data as unsigned or signed (bool)

      • Fcode: modbus function code (integer). Possible values: 3, 4

Example configuration options are shown below.

Client: 
  IP: 127.0.0.1
  Database: db_name
  Table: table_name
  Username: username
  Password: password
  Interval: 1 
  Retries: 2
Devices: 
- Name: Device1
  USB: /dev/ttyUSB0       
  Address: 21        
  Baud: 9600
  Parity: N
  Bytes: 8
  Stopbits: 1
  Timeout: 0.05
  Data:
  - Name: AmbientTemp
    Type: Temp
    Scale: 1
    Conversion: 1.0
    Channel: 0
    Signed: True
    Fcode: 4
  - Name: DC Power
    Type: Power
    Scale: 1
    Conversion: 1.0
    Channel: 1
    Signed: True
    Fcode: 4

Custom applications

While Pecos was initially developed to monitor photovoltaic systems, it is designed to be used for a wide range of applications. The ability to run the analysis within the Python environment enables the use of diverse analysis options that can be incorporated into Pecos, including application specific models. The software has been used to monitor energy systems in support of several Department of Energy projects, as described below.

Photovoltaic systems

Pecos was originally developed at Sandia National Laboratories in 2016 to monitor photovoltaic (PV) systems as part of the Department of Energy Regional Test Centers. Pecos is used to run daily analysis on data collected at several sites across the US. For PV systems, the translation dictionary can be used to group data according to the system architecture, which can include multiple strings and modules. The time filter can be defined based on sun position and system location. The data objects used in Pecos are compatible with PVLIB, which can be used to model PV systems [SHFH16]. Pecos also includes functions to compute PV-specific metrics (e.g., insolation, performance ratio, clearness index) in the pv module. The International Electrotechnical Commission (IEC) has developed guidance to measure and analyze energy production from PV systems. Klise et al. [KlSC17] describe an application of IEC 61724-3, using Pecos and PVLIB. Pecos includes a PV system example in the examples/pv directory.

Marine renewable energy systems

In partnership with the National Renewable Energy Laboratory (NREL) and Pacific Northwest National Laboratory (PNNL), Pecos was integrated into the Marine and Hydrokinetic Toolkit (MHKiT) to support research funded by the Department of Energy’s Water Power Technologies Office. MHKiT provides the marine renewable energy (MRE) community with tools for data quality control, resource assessment, and device performance which adhere to standards from the International Electrotechnical Commission (IEC) Technical Committee 114 (IEC TC 114). Pecos provides quality control analysis on data collected from MRE systems including wave, tidal, and river systems.

Fossil energy systems

In partnership with National Energy Technology Laboratory (NETL), Pecos was extended to demonstrate real-time monitoring of coal-fired power plants in support of the Department of Energy’s Institute for the Design of Advanced Energy Systems (IDAES). As part of this demonstration, streaming algorithms were added to Pecos to facilitate near real-time analysis using continuous data streams.

Release Notes

v0.2.0 (master)

  • Replaced the use of Excel files in examples/tests with CSV files. The Excel files were causing test failures.

  • Added min_failures to the streaming outlier test

  • Replaced mutable default arguments with None

  • Removed pecos logo from monitoring reports

  • Added timestamp to logger

  • Updated documentation and tests

v0.1.9 (November 2, 2020)

  • Added the ability to use custom quality control test functions in static or streaming analysis. The methods, check_custom_static and check_custom_streaming, allow the user to supply a custom function that is used to determine if data is anomalous. The custom tests also allow the user to return metadata that contains information about the quality control test.

    • The streaming analysis loops through the data using a moving window to determine if a data point is normal or anomalous. If the data point is deemed anomalous, it is omitted from the history and not used to determine the status of subsequent data points.

    • The static analysis operates on the entire data set, and while it can include operations like moving windows, it does not update the history based on the test results.

  • The following input arguments were changed or added:

    • In check_outlier, the input argument window was changed to None (not used), absolute value was changed to False, and an input argument streaming was added to use streaming analysis (default value is False). Changed the order of key and window to be more consistent with other quality control tests.

    • In check_delta, the input argument window is no longer optional

  • Added property data to the PerformanceMonitoring class. pm.data is equivalent to pm.df (pm.df was retained for backward compatibility)

  • Added the ability to create monitoring reports using a LaTeX template. Small changes in the HTML template.

  • Added the option to specify a date format string to timeseries plots.

  • Fixed a bug in the way masks are generated. Data points that have Null values were being assigned to False, indicating that a quality control test failed. Null values are now assumed to be True, unless a specific test fails (e.g. check_missing).

  • Updated the boolean mask used in the code to have a consistent definition (True = data point pass all tests, False = data point did not pass at least one test.)

  • Added an example in the docs to illustrate analysis of continuous data

  • Added Python 3.8 tests

  • Updated documentation and tests

v0.1.8 (January 9, 2020)

  • Added properties to the PerformanceMonitoring object to return the following:

    • Boolean mask, pm.mask. Indicates data that failed a quality control test. This replaces the method ``get_test_results_mask`` (API change).

    • Cleaned data, pm.cleaned_data. Data that failed a quality control test are replaced by NaN.

  • Added the ability to run quality control tests as individual functions. These functions allow the user to use Pecos without creating a PerformanceMonitoring object. Each function returns cleaned data, a boolean mask, and a summary of quality control test results.

  • io and graphics functions were updated to use specific components of the PerformanceMonitoring class (instead of requiring an instance of the class). This changes the API for write_monitoring_report, write_dashboard, and plot_test_results.

  • Filenames are now an optional parameter in io and graphics functions, this changes the API for write_metrics, write_test_results, and plot_test_results.

  • Updated metrics:

    • Added time_derivative which returns a derivative time series for each column of data

    • qci, rmse, time_integral, probability_of_detection, and false_alarm_rate now return 1 value per column of data (API change)

    • pv metrics were also updated to return 1 value per column (API change)

    • Deprecated per_day option. Data can be grouped by custom time intervals before computing metrics (API change)

  • Efficiency improvements to check_delta. As part of these changes, the optional input argument absolute_value has been removed and direction has been added (API change). If direction is set to positive, then the test only identifies positive deltas (the min occurs before the max). If direction is set to negative, then the test only identifies negative deltas (the max occurs before the min).

  • Timestamp indexes down to millisecond resolution are supported

  • Added additional helper functions in pecos.utils to convert to/from datetime indexes. Methods get_elapsed_time and get_clock_time were removed from the PerformanceMonitoring class (API change).

  • Moved functionality to evaluate strings from the PerformanceMonitoring class into a stand alone utility function (API change).

  • Removed option to smooth data using a rolling mean within the quality control tests (API change). Preprocessing steps should be done before the quality control test is run.

  • Added Python 3.7 tests, dropped Python 2.7 and 3.5 tests

  • Updated examples, tests, and documentation

v0.1.7 (June 2, 2018)

  • Added quality control test to identify outliers.

  • Bug fix to allow for sub-second data frequency.

  • Dropped ‘System Name’ from the analysis and test results; it added assumptions about column names in the code.

  • Changed ‘Start Date’ and ‘End Date’ to ‘Start Time’ and ‘End Time’ in the test results.

  • New data added to a PerformanceMonitoring object using add_dataframe now overrides existing data if there are shared indexes and columns.

  • Removed add_signal method, use add_dataframe instead.

  • Adding a translation dictionary to the analysis is now optional. A 1:1 map of column names is generated when data is added to the PerformanceMonitoring object using add_dataframe.

  • Added Python 3.6 tests.

  • Removed Python 3.4 tests (Pandas dropped support for Python 3.4 with version 0.21.0).

  • Updates to check_range require Pandas 0.23.0.

  • Updated documentation, added doctests.

v0.1.6 (August 14, 2017)

  • Added readme and license file to manifest to fix pip install

v0.1.5 (June 23, 2017)

  • Added ability to check for regular or irregular timestamps in check_timestamp.

  • Added probability of detection and false alarm metrics.

  • Added check_delta method to check bounds on the difference between max and min data values within a rolling window.

  • Added graphics method to create interactive graphics using plotly.

  • Added graphics method to create day-of-year heatmaps.

  • Method named plot_colorblock changed to plot_heatmap (API change).

  • Added data acquisition method to transfer data from sensors to an SQL database.

  • Added dashboard example that uses Pandas Styling to color code tables based on values.

  • Added graphics tests.

  • Updated documentation.

v0.1.4 (December 15, 2016)

Some of the changes in this release are not backward compatible:

  • Added capability to allow multiple html links in dashboards (API change).

  • Updated send_email function to use smtplib (API change).

  • Added additional options in html reports to set figure size and image width.

  • Bug fix setting axis limits in figures.

  • Bug fix for reporting duplicate time steps.

  • Improved efficiency for get_clock_time function.

  • Added dashboard example that uses color blocks to indicate number of test failures.

  • Removed basic_pvlib_performance_model, the pv_example now uses pvlib directly to compute a basic model (API change).

v0.1.3 (August 2, 2016)

This is a minor release, changes include:

  • Bug fix for DataFrames using timezones. There was an issue retaining the timezone across the entire Pecos analysis chain; the timezone was not stored properly in the testing results. This is a known pandas bug. The fix in Pecos includes stronger tests for analyses that use timezones.

  • The use of Jinja for html report templates

  • Cleaned up examples

v0.1.2 (June 6, 2016)

This is a minor release, changes include:

  • Minor changes to the software to support Python 3.4 and 3.5

  • Default image format changed from jpg to png

  • Datatables format options added to dashboards

  • Additional testing

v0.1.1 (May 6, 2016)

This is a minor release, changes include:

  • Added a pv module, which includes basic methods to compute energy, insolation, performance ratio, performance index, energy yield, clearness index, and a basic pv performance model.

  • Added method to compute time integral and RMSE to metrics module

  • Cleaned up examples, API, and documentation

  • Software test harness run through TravisCI and analyzed using Coveralls

  • Documentation hosted on readthedocs

v0.1.0 (March 31, 2016)

This is the first official release of Pecos. Features include:

  • PerformanceMonitoring class used to run quality control tests and store results. The class includes the ability to add time filters and translation dictionaries. Quality control tests include checks for timestamp, missing and corrupt data, data out of range, and increment data out of range.

  • Quality control index used to quantify test failures

  • HTML report templates for monitoring reports and dashboards

  • Graphics capabilities

  • Basic tutorials

  • Preliminary software test harness, run using nosetests

  • Basic user manual including API documentation

Developers

The following services are used for software quality assurance:

Tests can be run locally using nosetests:

nosetests -v --with-coverage --cover-package=pecos pecos

Software developers are expected to follow standard practices to document and test new code. Pull requests will be reviewed by the core development team. See https://github.com/sandialabs/pecos/graphs/contributors for a list of contributors.

pecos package

Submodules

pecos.monitoring module

The monitoring module contains the PerformanceMonitoring class used to run quality control tests and store results. The module also contains individual functions that can be used to run quality control tests.

class pecos.monitoring.PerformanceMonitoring[source]

Bases: object

PerformanceMonitoring class

property data

Data used in quality control analysis, added to the PerformanceMonitoring object using add_dataframe.

property mask

Boolean mask indicating which data points failed a quality control test. True = data point passed all tests, False = data point failed at least one test.

property cleaned_data

Cleaned data set, data that failed a quality control test are replaced by NaN.
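The relationship between data, mask, and cleaned_data can be sketched with plain pandas (the values below are illustrative stand-ins, not Pecos output):

```python
import numpy as np
import pandas as pd

# Stand-ins for pm.data and pm.mask (illustrative values, not Pecos output)
data = pd.DataFrame({'A': [1.0, 2.0, 50.0, 4.0]})
mask = pd.DataFrame({'A': [True, True, False, True]})  # 50.0 failed a test

# cleaned_data replaces data points that failed a test with NaN
cleaned_data = data.where(mask)
print(cleaned_data['A'].tolist())
```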

add_dataframe(data)[source]

Add data to the PerformanceMonitoring object

Parameters

data (pandas DataFrame) – Data to add to the PerformanceMonitoring object, indexed by datetime

add_translation_dictionary(trans)[source]

Add translation dictionary to the PerformanceMonitoring object

Parameters

trans (dictionary) – Translation dictionary

add_time_filter(time_filter)[source]

Add a time filter to the PerformanceMonitoring object

Parameters

time_filter (pandas DataFrame with a single column or pandas Series) – Time filter containing boolean values for each time index. True = keep time index in the quality control results, False = remove time index from the quality control results.

check_timestamp(frequency, expected_start_time=None, expected_end_time=None, min_failures=1, exact_times=True)[source]

Check time series for missing, non-monotonic and duplicate timestamps

Parameters
  • frequency (int or float) – Expected time series frequency, in seconds

  • expected_start_time (Timestamp, optional) – Expected start time. If not specified, the minimum timestamp is used

  • expected_end_time (Timestamp, optional) – Expected end time. If not specified, the maximum timestamp is used

  • min_failures (int, optional) – Minimum number of consecutive failures required for reporting, default = 1

  • exact_times (bool, optional) – Controls how missing times are checked. If True, times are expected to occur at regular intervals (specified in frequency) and the DataFrame is reindexed to match the expected frequency. If False, times only need to occur once or more within each interval (specified in frequency) and the DataFrame is not reindexed.

check_range(bound, key=None, min_failures=1)[source]

Check for data that is outside expected range

Parameters
  • bound (list of floats) – [lower bound, upper bound], None can be used in place of a lower or upper bound

  • key (string, optional) – Data column name or translation dictionary key. If not specified, all columns are used in the test.

  • min_failures (int, optional) – Minimum number of consecutive failures required for reporting, default = 1

check_increment(bound, key=None, increment=1, absolute_value=True, min_failures=1)[source]

Check data increments using the difference between values

Parameters
  • bound (list of floats) – [lower bound, upper bound], None can be used in place of a lower or upper bound

  • key (string, optional) – Data column name or translation dictionary key. If not specified, all columns are used in the test.

  • increment (int, optional) – Time step shift used to compute difference, default = 1

  • absolute_value (boolean, optional) – Use the absolute value of the increment data, default = True

  • min_failures (int, optional) – Minimum number of consecutive failures required for reporting, default = 1

check_delta(bound, window, key=None, direction=None, min_failures=1)[source]

Check for stagnant data and/or abrupt changes in the data using the difference between max and min values (delta) within a rolling window

Parameters
  • bound (list of floats) – [lower bound, upper bound], None can be used in place of a lower or upper bound

  • window (int or float) – Size of the rolling window (in seconds) used to compute delta

  • key (string, optional) – Data column name or translation dictionary key. If not specified, all columns are used in the test.

  • direction (str, optional) –

    Options = ‘positive’, ‘negative’, or None

    • If direction is positive, then only identify positive deltas (the min occurs before the max)

    • If direction is negative, then only identify negative deltas (the max occurs before the min)

    • If direction is None, then identify both positive and negative deltas

  • min_failures (int, optional) – Minimum number of consecutive failures required for reporting, default = 1

check_outlier(bound, window=None, key=None, absolute_value=False, streaming=False, min_failures=1)[source]

Check for outliers using normalized data within a rolling window

The upper and lower bounds are specified in standard deviations. Data is normalized using (data - mean)/std.

Parameters
  • bound (list of floats) – [lower bound, upper bound], None can be used in place of a lower or upper bound

  • window (int or float, optional) – Size of the rolling window (in seconds) used to normalize data. If window is set to None, data is normalized using the entire data set’s mean and standard deviation (column by column), default = None.

  • key (string, optional) – Data column name or translation dictionary key. If not specified, all columns are used in the test.

  • absolute_value (boolean, optional) – Use the absolute value of the normalized data, default = False

  • streaming (boolean, optional) – Indicates if streaming analysis should be used, default = False

  • min_failures (int, optional) – Minimum number of consecutive failures required for reporting, default = 1

check_missing(key=None, min_failures=1)[source]

Check for missing data

Parameters
  • key (string, optional) – Data column name or translation dictionary key. If not specified, all columns are used in the test.

  • min_failures (int, optional) – Minimum number of consecutive failures required for reporting, default = 1

check_corrupt(corrupt_values, key=None, min_failures=1)[source]

Check for corrupt data

Parameters
  • corrupt_values (list of int or floats) – List of corrupt data values

  • key (string, optional) – Data column name or translation dictionary key. If not specified, all columns are used in the test.

  • min_failures (int, optional) – Minimum number of consecutive failures required for reporting, default = 1

check_custom_static(quality_control_func, key=None, min_failures=1, error_message=None)[source]

Use custom functions that operate on the entire dataset at once to perform quality control analysis

Parameters
  • quality_control_func (function) – Function that operates on the data and returns a mask and metadata

  • key (string, optional) – Data column name or translation dictionary key. If not specified, all columns are used in the test.

  • min_failures (int, optional) – Minimum number of consecutive failures required for reporting, default = 1

  • error_message (str, optional) – Error message

check_custom_streaming(quality_control_func, window, key=None, rebase=None, min_failures=1, error_message=None)[source]

Check for anomalous data using a streaming framework, which removes anomalous data from the history after each timestamp. A custom quality control function, supplied by the user, determines whether the data is anomalous.

Parameters
  • quality_control_func (function) – Function that determines if the last data point is normal or anomalous. Returns a mask and metadata for the last data point.

  • window (int or float) – Size of the rolling window (in seconds) used to define the history. If window is set to None, data is normalized using the entire data set’s mean and standard deviation (column by column).

  • key (string, optional) – Data column name or translation dictionary key. If not specified, all columns are used in the test.

  • rebase (int, float, or None) – Value between 0 and 1 that indicates the fraction of default = None.

  • min_failures (int, optional) – Minimum number of consecutive failures required for reporting, default = 1

  • error_message (str, optional) – Error message

pecos.monitoring.check_timestamp(data, frequency, expected_start_time=None, expected_end_time=None, min_failures=1, exact_times=True)[source]

Check time series for missing, non-monotonic and duplicate timestamps

Parameters
  • data (pandas DataFrame) – Data used in the quality control test, indexed by datetime

  • frequency (int or float) – Expected time series frequency, in seconds

  • expected_start_time (Timestamp, optional) – Expected start time. If not specified, the minimum timestamp is used

  • expected_end_time (Timestamp, optional) – Expected end time. If not specified, the maximum timestamp is used

  • min_failures (int, optional) – Minimum number of consecutive failures required for reporting, default = 1

  • exact_times (bool, optional) – Controls how missing times are checked. If True, times are expected to occur at regular intervals (specified in frequency) and the DataFrame is reindexed to match the expected frequency. If False, times only need to occur once or more within each interval (specified in frequency) and the DataFrame is not reindexed.

Returns

dictionary – Results include cleaned data, mask, and test results summary
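The core of the timestamp check (finding missing and duplicate timestamps against an expected frequency) can be emulated with plain pandas; this is an illustrative sketch, not the Pecos implementation:

```python
import pandas as pd

frequency = 60  # expected time series frequency, in seconds
index = pd.to_datetime(['2024-01-01 00:00:00', '2024-01-01 00:01:00',
                        '2024-01-01 00:01:00',   # duplicate timestamp
                        '2024-01-01 00:03:00'])  # 00:02:00 is missing
data = pd.DataFrame({'A': [1, 2, 2, 4]}, index=index)

# Build the expected regular index and compare it against the actual index
expected = pd.date_range(index.min(), index.max(),
                         freq=pd.Timedelta(seconds=frequency))
missing = expected.difference(data.index)
duplicates = data.index[data.index.duplicated()]
print(list(missing), list(duplicates))
```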

pecos.monitoring.check_range(data, bound, key=None, min_failures=1)[source]

Check for data that is outside expected range

Parameters
  • data (pandas DataFrame) – Data used in the quality control test, indexed by datetime

  • bound (list of floats) – [lower bound, upper bound], None can be used in place of a lower or upper bound

  • key (string, optional) – Data column name or translation dictionary key. If not specified, all columns are used in the test.

  • min_failures (int, optional) – Minimum number of consecutive failures required for reporting, default = 1

Returns

dictionary – Results include cleaned data, mask, and test results summary
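In essence, the range test builds a boolean mask by comparing each value against the bounds, with None skipping a bound. A pandas-only sketch of that logic (illustrative data, not the Pecos source):

```python
import pandas as pd

data = pd.DataFrame({'A': [0.2, 0.5, 1.5, -0.1]})
bound = [0, 1]  # [lower bound, upper bound]; either can be None

# True = value within bounds, mirroring the mask in the returned results
mask = pd.DataFrame(True, index=data.index, columns=data.columns)
if bound[0] is not None:
    mask &= data >= bound[0]
if bound[1] is not None:
    mask &= data <= bound[1]
print(mask['A'].tolist())
```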

pecos.monitoring.check_increment(data, bound, key=None, increment=1, absolute_value=True, min_failures=1)[source]

Check data increments using the difference between values

Parameters
  • data (pandas DataFrame) – Data used in the quality control test, indexed by datetime

  • bound (list of floats) – [lower bound, upper bound], None can be used in place of a lower or upper bound

  • key (string, optional) – Data column name or translation dictionary key. If not specified, all columns are used in the test.

  • increment (int, optional) – Time step shift used to compute difference, default = 1

  • absolute_value (boolean, optional) – Use the absolute value of the increment data, default = True

  • min_failures (int, optional) – Minimum number of consecutive failures required for reporting, default = 1

Returns

dictionary – Results include cleaned data, mask, and test results summary
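The increment test amounts to differencing the data by a time step shift and bounding the result. A minimal pandas sketch (illustrative values, not the Pecos implementation):

```python
import pandas as pd

data = pd.DataFrame({'A': [1.0, 1.1, 5.0, 5.1]})
bound = [None, 1.0]  # flag jumps larger than 1.0
increment = 1        # compare each value to the previous one

diff = data.diff(increment).abs()  # absolute_value=True
mask = ~(diff > bound[1])          # True = increment within bounds
print(mask['A'].tolist())
```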

pecos.monitoring.check_delta(data, bound, window, key=None, direction=None, min_failures=1)[source]

Check for stagnant data and/or abrupt changes in the data using the difference between max and min values (delta) within a rolling window

Parameters
  • data (pandas DataFrame) – Data used in the quality control test, indexed by datetime

  • bound (list of floats) – [lower bound, upper bound], None can be used in place of a lower or upper bound

  • window (int or float) – Size of the rolling window (in seconds) used to compute delta

  • key (string, optional) – Data column name or translation dictionary key. If not specified, all columns are used in the test.

  • direction (str, optional) –

    Options = ‘positive’, ‘negative’, or None

    • If direction is positive, then only identify positive deltas (the min occurs before the max)

    • If direction is negative, then only identify negative deltas (the max occurs before the min)

    • If direction is None, then identify both positive and negative deltas

  • min_failures (int, optional) – Minimum number of consecutive failures required for reporting, default = 1

Returns

dictionary – Results include cleaned data, mask, and test results summary
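The delta test reduces to the max-minus-min spread within a time-based rolling window: a small delta indicates stagnant data, a large one an abrupt change. A pandas-only sketch under those definitions (illustrative data, not the Pecos source):

```python
import pandas as pd

index = pd.date_range('2024-01-01', periods=6, freq='60s')
data = pd.DataFrame({'A': [1.0, 1.0, 1.0, 1.0, 4.0, 4.2]}, index=index)

window = 180  # rolling window size, in seconds
rolling = data['A'].rolling(pd.Timedelta(seconds=window))
delta = rolling.max() - rolling.min()

# Abrupt change: delta above an upper bound (2.0 here is illustrative)
abrupt = delta > 2.0
print(delta.tolist(), abrupt.tolist())
```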

pecos.monitoring.check_outlier(data, bound, window=None, key=None, absolute_value=False, streaming=False, min_failures=1)[source]

Check for outliers using normalized data within a rolling window

The upper and lower bounds are specified in standard deviations. Data is normalized using (data - mean)/std.

Parameters
  • data (pandas DataFrame) – Data used in the quality control test, indexed by datetime

  • bound (list of floats) – [lower bound, upper bound], None can be used in place of a lower or upper bound

  • window (int or float, optional) – Size of the rolling window (in seconds) used to normalize data. If window is set to None, data is normalized using the entire data set’s mean and standard deviation (column by column), default = None.

  • key (string, optional) – Data column name or translation dictionary key. If not specified, all columns are used in the test.

  • absolute_value (boolean, optional) – Use the absolute value of the normalized data, default = False

  • streaming (boolean, optional) – Indicates if streaming analysis should be used, default = False

  • min_failures (int, optional) – Minimum number of consecutive failures required for reporting, default = 1

Returns

dictionary – Results include cleaned data, mask, and test results summary
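With window=None, the outlier test is a z-score threshold: normalize each column by its mean and standard deviation and flag values whose normalized magnitude exceeds the bound. A sketch of that computation (illustrative data and bound, not the Pecos source):

```python
import pandas as pd

data = pd.DataFrame({'A': [10.0, 11.0, 9.0, 10.5, 30.0, 10.2]})
bound = [None, 1.5]  # flag points more than 1.5 std devs out (illustrative)

# window=None: normalize using the whole column's mean and std
zscore = (data - data.mean()) / data.std()
outlier = zscore.abs() > bound[1]
print(outlier['A'].tolist())
```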

pecos.monitoring.check_missing(data, key=None, min_failures=1)[source]

Check for missing data

Parameters
  • data (pandas DataFrame) – Data used in the quality control test, indexed by datetime

  • key (string, optional) – Data column name or translation dictionary key. If not specified, all columns are used in the test.

  • min_failures (int, optional) – Minimum number of consecutive failures required for reporting, default = 1

Returns

dictionary – Results include cleaned data, mask, and test results summary

pecos.monitoring.check_corrupt(data, corrupt_values, key=None, min_failures=1)[source]

Check for corrupt data

Parameters
  • data (pandas DataFrame) – Data used in the quality control test, indexed by datetime

  • corrupt_values (list of int or floats) – List of corrupt data values

  • key (string, optional) – Data column name or translation dictionary key. If not specified, all columns are used in the test.

  • min_failures (int, optional) – Minimum number of consecutive failures required for reporting, default = 1

Returns

dictionary – Results include cleaned data, mask, and test results summary
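The missing and corrupt checks reduce to simple pandas masks: NaN detection for missing data, and membership in a list of placeholder values for corrupt data. An illustrative sketch (not the Pecos implementation):

```python
import numpy as np
import pandas as pd

data = pd.DataFrame({'A': [1.0, np.nan, -999.0, 4.0]})
corrupt_values = [-999]

# check_corrupt: flag listed placeholder values; check_missing: flag NaN
corrupt = data.isin(corrupt_values)
missing = data.isna()
print(corrupt['A'].tolist(), missing['A'].tolist())
```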

pecos.monitoring.check_custom_static(data, quality_control_func, key=None, min_failures=1, error_message=None)[source]

Use custom functions that operate on the entire dataset at once to perform quality control analysis

Parameters
  • data (pandas DataFrame) – Data used in the quality control test, indexed by datetime

  • quality_control_func (function) – Function that operates on the data and returns a mask and metadata

  • key (string, optional) – Data column name or translation dictionary key. If not specified, all columns are used in the test.

  • min_failures (int, optional) – Minimum number of consecutive failures required for reporting, default = 1

  • error_message (str, optional) – Error message

Returns

dictionary – Results include cleaned data, mask, test results summary, and metadata
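The description above implies the custom function receives the data and returns a boolean mask plus metadata. The exact signature below is an assumption for illustration only, not taken from the Pecos source:

```python
import pandas as pd

# Hypothetical custom quality control function; the (data) -> (mask, metadata)
# shape is assumed from the description above
def qc_positive(data):
    mask = data > 0        # True = data point passes the test
    metadata = data.min()  # supporting values to report with the results
    return mask, metadata

data = pd.DataFrame({'A': [1.0, -2.0, 3.0]})
mask, metadata = qc_positive(data)
print(mask['A'].tolist())
```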

pecos.monitoring.check_custom_streaming(data, quality_control_func, window, key=None, rebase=None, min_failures=1, error_message=None)[source]

Check for anomalous data using a streaming framework, which removes anomalous data from the history after each timestamp. A custom quality control function, supplied by the user, determines whether the data is anomalous.

Parameters
  • data (pandas DataFrame) – Data used in the quality control test, indexed by datetime

  • quality_control_func (function) – Function that determines if the last data point is normal or anomalous. Returns a mask and metadata for the last data point.

  • window (int or float) – Size of the rolling window (in seconds) used to define the history. If window is set to None, data is normalized using the entire data set’s mean and standard deviation (column by column).

  • key (string, optional) – Data column name or translation dictionary key. If not specified, all columns are used in the test.

  • rebase (int, float, or None) – Value between 0 and 1 that indicates the fraction of default = None.

  • min_failures (int, optional) – Minimum number of consecutive failures required for reporting, default = 1

  • error_message (str, optional) – Error message

Returns

dictionary – Results include cleaned data, mask, test results summary, and metadata

pecos.metrics module

The metrics module contains metrics that describe the quality control analysis or compute quantities that might be of use in the analysis.

pecos.metrics.qci(mask, tfilter=None)[source]

Compute the quality control index (QCI) for each column, defined as:

\(QCI=\dfrac{\sum_{t\in T}X_{dt}}{|T|}\)

where \(T\) is the set of timestamps in the analysis, \(X_{dt}\) is a data point for column \(d\) at time \(t\) that passed all quality control tests, and \(|T|\) is the number of data points in the analysis.

Parameters
  • mask (pandas DataFrame) – Test results mask, returned from pm.mask

  • tfilter (pandas Series, optional) – Time filter containing boolean values for each time index

Returns

pandas Series – Quality control index
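Under the definition above, QCI is simply the fraction of points per column that passed all tests; a pandas sketch with illustrative mask values:

```python
import pandas as pd

# mask as returned from pm.mask: True = data point passed all tests
mask = pd.DataFrame({'A': [True, True, False, True],
                     'B': [True, False, False, True]})

# QCI per column: passing points divided by total points
qci = mask.sum() / mask.shape[0]
print(qci['A'], qci['B'])
```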

pecos.metrics.rmse(data1, data2, tfilter=None)[source]

Compute the root mean squared error (RMSE) for each column, defined as:

\(RMSE=\sqrt{\dfrac{\sum{(data_1-data_2)^2}}{n}}\)

where \(data_1\) is a time series, \(data_2\) is a time series, and \(n\) is the number of data points.

Parameters
  • data1 (pandas DataFrame) – Data

  • data2 (pandas DataFrame) – Data. Note, the column names in data1 must equal the column names in data2

  • tfilter (pandas Series, optional) – Time filter containing boolean values for each time index

Returns

pandas Series – Root mean squared error
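The formula above maps directly onto pandas column arithmetic (illustrative data; matching column names, as the parameter description requires):

```python
import numpy as np
import pandas as pd

data1 = pd.DataFrame({'A': [1.0, 2.0, 3.0]})
data2 = pd.DataFrame({'A': [1.0, 2.0, 5.0]})  # same column names as data1

# RMSE per column: sqrt of the mean squared difference
rmse = np.sqrt(((data1 - data2) ** 2).mean())
print(rmse['A'])
```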

pecos.metrics.time_integral(data, tfilter=None)[source]

Compute the time integral (F) for each column, defined as:

\(F=\int{fdt}\)

where \(f\) is a column of data and \(dt\) is the time step between observations. The integral is computed using the trapezoidal rule from numpy.trapz. Results are given in [original data units]*seconds. NaN values are set to 0 for integration.

Parameters
  • data (pandas DataFrame) – Data

  • tfilter (pandas Series, optional) – Time filter containing boolean values for each time index

Returns

pandas Series – Integral
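A minimal sketch of the trapezoidal rule the docstring describes, written out explicitly over elapsed seconds (numpy.trapz computes the same quantity); the data is illustrative:

```python
import numpy as np
import pandas as pd

index = pd.date_range('2024-01-01', periods=3, freq='60s')
data = pd.DataFrame({'A': [0.0, 1.0, 1.0]}, index=index)

# Elapsed time in seconds since the first timestamp
t = np.asarray((data.index - data.index[0]).total_seconds())
vals = data['A'].fillna(0).values  # NaN set to 0 for integration

# Trapezoidal rule: average of adjacent values times the time step
F = float(((vals[1:] + vals[:-1]) / 2 * np.diff(t)).sum())
print(F)  # units: [original data units] * seconds
```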

pecos.metrics.time_derivative(data, tfilter=None)[source]

Compute the derivative (f’) of each column, defined as:

\(f'=\dfrac{df}{dt}\)

where \(f\) is a column of data and \(dt\) is the time step between observations. The derivative is computed using central differences from numpy.gradient. Results are given in [original data units]/seconds.

Parameters
  • data (pandas DataFrame) – Data

  • tfilter (pandas Series, optional) – Filter containing boolean values for each time index

Returns

pandas DataFrame – Derivative of the data
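The central-difference derivative over elapsed seconds can be sketched directly with numpy.gradient (illustrative data, a linear ramp of 1 unit per second):

```python
import numpy as np
import pandas as pd

index = pd.date_range('2024-01-01', periods=3, freq='60s')
data = pd.DataFrame({'A': [0.0, 60.0, 120.0]}, index=index)

# Elapsed time in seconds since the first timestamp
t = np.asarray((data.index - data.index[0]).total_seconds())

# Central differences in the interior, one-sided at the edges
dfdt = np.gradient(data['A'].values, t)  # [original data units]/seconds
print(dfdt.tolist())
```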

pecos.metrics.probability_of_detection(observed, actual, tfilter=None)[source]

Compute probability of detection (PD) for each column, defined as:

\(PD=\dfrac{TP}{TP+FN}\)

where \(TP\) is number of true positives and \(FN\) is the number of false negatives.

Parameters
  • observed (pandas DataFrame) – Estimated conditions (True = background, False = anomalous), returned from pm.mask

  • actual (pandas DataFrame) – Actual conditions, (True = background, False = anomalous). Note, the column names in observed must equal the column names in actual

  • tfilter (pandas Series, optional) – Filter containing boolean values for each time index

Returns

pandas Series – Probability of detection
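With the True = background, False = anomalous convention above, true positives are anomalies flagged as anomalies and false negatives are anomalies missed. A small sketch of the PD computation (illustrative series, not the Pecos source):

```python
import pandas as pd

# True = background, False = anomalous
observed = pd.Series([True, False, True, False])
actual = pd.Series([True, False, False, False])

# TP: anomalies correctly flagged; FN: anomalies missed
TP = (~observed & ~actual).sum()
FN = (observed & ~actual).sum()
PD = TP / (TP + FN)
print(PD)
```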

pecos.metrics.false_alarm_rate(observed, actual, tfilter=None)[source]

Compute false alarm rate (FAR) for each column, defined as:

\(FAR=\dfrac{FP}{FP+TN}\)

where \(FP\) is the number of false positives and \(TN\) is the number of true negatives.

Parameters
  • observed (pandas DataFrame) – Estimated conditions (True = background, False = anomalous), returned from pm.mask

  • actual (pandas DataFrame) – Actual conditions, (True = background, False = anomalous). Note, the column names in observed must equal the column names in actual.

  • tfilter (pandas Series, optional) – Filter containing boolean values for each time index

Returns

pandas Series – False alarm rate

pecos.io module

The io module contains functions to read/send data and write results to files/html reports.

pecos.io.read_campbell_scientific(filename, index_col='TIMESTAMP', encoding=None)[source]

Read Campbell Scientific CSV file.

Parameters
  • filename (string) – File name

  • index_col (string, optional) – Index column name, default = ‘TIMESTAMP’

  • encoding (string, optional) – Character encoding (i.e. utf-16)

Returns

pandas DataFrame – Data

pecos.io.send_email(subject, body, recipient, sender, attachment=None, host='localhost', username=None, password=None)[source]

Send email using Python smtplib and email packages.

Parameters
  • subject (string) – Subject text

  • body (string) – Email body, in HTML or plain format

  • recipient (list of string) – Recipient email address or addresses

  • sender (string) – Sender email address

  • attachment (string, optional) – Name of file to attach

  • host (string, optional) – Name of email host (or host:port), default = ‘localhost’

  • username (string, optional) – Email username for authentication

  • password (string, optional) – Email password for authentication

pecos.io.write_metrics(metrics, filename='metrics.csv')[source]

Write metrics file.

Parameters
  • metrics (pandas DataFrame) – Data to add to the metrics file

  • filename (string, optional) – File name. If the full path is not provided, the file is saved into the current working directory. By default, the file is named ‘metrics.csv’

Returns

string – filename

pecos.io.write_test_results(test_results, filename='test_results.csv')[source]

Write test results file.

Parameters
  • test_results (pandas DataFrame) – Summary of the quality control test results (pm.test_results)

  • filename (string, optional) – File name. If the full path is not provided, the file is saved into the current working directory. By default, the file is named ‘test_results.csv’

Returns

string – filename

pecos.io.write_monitoring_report(data, test_results, test_results_graphics=None, custom_graphics=None, metrics=None, title='Pecos Monitoring Report', config=None, logo=False, im_width_test_results=1, im_width_custom=1, im_width_logo=0.1, encode=False, file_format='html', filename='monitoring_report.html')[source]

Generate a monitoring report. The monitoring report is used to report quality control test results for a single system. The report includes custom graphics, performance metrics, and test results.

Parameters
  • data (pandas DataFrame) – Data, indexed by time (pm.data)

  • test_results (pandas DataFrame) – Summary of the quality control test results (pm.test_results)

  • test_results_graphics (list of strings or None, optional) – Graphics files, with full path. These graphics highlight data points that failed a quality control test, created using pecos.graphics.plot_test_results(). If None, test results graphics are not included in the report.

  • custom_graphics (list of strings or None, optional) – Custom files, with full path. Created by the user. If None, custom graphics are not included in the report.

  • metrics (pandas Series or DataFrame, optional) – Performance metrics to add as a table to the monitoring report

  • title (string, optional) – Monitoring report title, default = ‘Pecos Monitoring Report’

  • config (dictionary or None, optional) – Configuration options, to be printed at the end of the report. If None, configuration options are not included in the report.

  • logo (string, optional) – Graphic to be added to the report header

  • im_width_test_results (float, optional) – Image width as a fraction of page size, for test results graphics, default = 1

  • im_width_custom (float, optional) – Image width as a fraction of page size, for custom graphics, default = 1

  • im_width_logo (float, optional) – Image width as a fraction of page size, for the logo, default = 0.1

  • encode (boolean, optional) – Encode graphics in the html, default = False

  • filename (string, optional) – File name. If the full path is not provided, the file is saved into the current working directory. By default, the file is named ‘monitoring_report.html’

Returns

string – filename

pecos.io.write_dashboard(column_names, row_names, content, title='Pecos Dashboard', footnote='', logo=False, im_width=250, datatables=False, encode=False, filename='dashboard.html')[source]

Generate a dashboard. The dashboard is used to compare results across multiple systems. Each cell in the dashboard includes custom system graphics and metrics.

Parameters
  • column_names (list of strings) – Column names listed in the order they should appear in the dashboard, i.e. [‘location1’, ‘location2’]

  • row_names (list of strings) – Row names listed in the order they should appear in the dashboard, i.e. [‘system1’, ‘system2’]

  • content (dictionary) –

    Dashboard content for each cell.

    Dictionary keys are tuples indicating the row name and column name, i.e. (‘row name’, ‘column name’), where ‘row name’ is in the list row_names and ‘column name’ is in the list column_names.

    For each key, another dictionary is defined that contains the content to be included in each cell of the dashboard. Each cell can contain text, graphics, a table, and an html link. These are defined using the following keys:

    • text (string) = text at the top of each cell

    • graphics (list of strings) = a list of graphics file names. Each file name includes the full path

    • table (string) = a table in html format, for example a table of performance metrics. DataFrames can be converted to an html string using df.to_html() or df.transpose().to_html(). Values in the table can be color coded using pandas Styler class.

    • link (dict) = a dictionary where keys define the name of the link and values define the html link (with full path)

    For example:

    content = {('row name', 'column name'): {
        'text': 'text at the top',
        'graphics': ['C:\\pecos\\results\\custom_graphic.png'],
        'table': df.to_html(),
        'link': {'Link to monitoring report': 'C:\\pecos\\results\\monitoring_report.html'}}}
    

  • title (string, optional) – Dashboard title, default = ‘Pecos Dashboard’

  • footnote (string, optional) – Text to be added to the end of the report

  • logo (string, optional) – Graphic to be added to the report header

  • im_width (float, optional) – Image width in the HTML report, default = 250

  • datatables (boolean, optional) – Use datatables.net to format the dashboard, default = False. See https://datatables.net/ for more information.

  • encode (boolean, optional) – Encode graphics in the html, default = False

  • filename (string, optional) – File name. If the full path is not provided, the file is saved into the current working directory. By default, the file is named ‘dashboard.html’

Returns

string – filename

pecos.io.device_to_client(config)[source]

Read channels on a modbus device, scale and calibrate the values, and store the data in a MySQL database. The inputs are provided by a configuration dictionary that describes general information for data acquisition and the devices.

Parameters

config (dictionary) – Configuration options, see Data acquisition

pecos.graphics module

The graphics module contains functions to generate scatter, time series, and heatmap plots for reports.

pecos.graphics.plot_scatter(x, y, xaxis_min=None, xaxis_max=None, yaxis_min=None, yaxis_max=None, title=None, figsize=(7.0, 3.0))[source]

Create a scatter plot. If x and y have the same number of columns, then the columns of x are plotted against the corresponding columns of y, in order. If x (or y) has 1 column, then that column of data is plotted against all the columns in y (or x).

Parameters
  • x (pandas DataFrame) – X data

  • y (pandas DataFrame) – Y data

  • xaxis_min (float, optional) – X-axis minimum, default = None (autoscale)

  • xaxis_max (float, optional) – X-axis maximum, default = None (autoscale)

  • yaxis_min (float, optional) – Y-axis minimum, default = None (autoscale)

  • yaxis_max (float, optional) – Y-axis maximum, default = None (autoscale)

  • title (string, optional) – Title, default = None

  • figsize (tuple, optional) – Figure size, default = (7.0, 3.0)

pecos.graphics.plot_timeseries(data, tfilter=None, test_results_group=None, xaxis_min=None, xaxis_max=None, yaxis_min=None, yaxis_max=None, title=None, figsize=(7.0, 3.0), date_formatter=None)[source]

Create a time series plot using each column in the DataFrame.

Parameters
  • data (pandas DataFrame or Series) – Data, indexed by time

  • tfilter (pandas Series, optional) – Boolean values used to include time filter in the plot, default = None

  • test_results_group (pandas DataFrame, optional) – Test results for the data, default = None

  • xaxis_min (float, optional) – X-axis minimum, default = None (autoscale)

  • xaxis_max (float, optional) – X-axis maximum, default = None (autoscale)

  • yaxis_min (float, optional) – Y-axis minimum, default = None (autoscale)

  • yaxis_max (float, optional) – Y-axis maximum, default = None (autoscale)

  • title (string, optional) – Title, default = None

  • figsize (tuple, optional) – Figure size, default = (7.0, 3.0)

  • date_formatter (string, optional) – Date formatter used on the x axis, for example, “%m-%d”. Default = None

pecos.graphics.plot_interactive_timeseries(data, xaxis_min=None, xaxis_max=None, yaxis_min=None, yaxis_max=None, title=None, filename=None, auto_open=True)[source]

Create a basic interactive time series graphic using plotly. Many more options are available, see https://plot.ly for more details.

Parameters
  • data (pandas DataFrame) – Data, indexed by time

  • xaxis_min (float, optional) – X-axis minimum, default = None (autoscale)

  • xaxis_max (float, optional) – X-axis maximum, default = None (autoscale)

  • yaxis_min (float, optional) – Y-axis minimum, default = None (autoscale)

  • yaxis_max (float, optional) – Y-axis maximum, default = None (autoscale)

  • title (string, optional) – Title, default = None

  • filename (string, optional) – HTML file name, default = None (file will be named temp-plot.html)

  • auto_open (boolean, optional) – Flag indicating if HTML graphic is opened, default = True

pecos.graphics.plot_heatmap(data, colors=None, nColors=12, cmap=None, vmin=None, vmax=None, show_axis=False, title=None, figsize=(5.0, 5.0))[source]

Create a heatmap. Default color scheme is red to yellow to green with 12 colors. This function can be used to generate dashboards with simple color indicators in each cell (to remove borders use bbox_inches=’tight’ and pad_inches=0 when saving the image).

Parameters
  • data (pandas DataFrame, pandas Series, or numpy array) – Data

  • colors (list or None, optional) – List of colors. Colors can be specified in any way understandable by matplotlib.colors.ColorConverter.to_rgb(). If None, the colormap transitions from red to yellow to green.

  • nColors (int, optional) – Number of colors in the colormap, default = 12

  • cmap (string, optional) – Colormap, default = None. Overrides colors and nColors listed above.

  • vmin (float, optional) – Colormap minimum, default = None (autoscale)

  • vmax (float, optional) – Colormap maximum, default = None (autoscale)

  • title (string, optional) – Title, default = None

  • figsize (tuple, optional) – Figure size, default = (5.0, 5.0)

pecos.graphics.plot_doy_heatmap(data, cmap='nipy_spectral', vmin=None, vmax=None, overlay=None, title=None, figsize=(7.0, 3.0))[source]

Create a day-of-year (X-axis) vs. time-of-day (Y-axis) heatmap.

Parameters
  • data (pandas DataFrame or pandas Series) – Data (single column), indexed by time

  • cmap (string, optional) – Colormap, default = nipy_spectral

  • vmin (float, optional) – Colormap minimum, default = None (autoscale)

  • vmax (float, optional) – Colormap maximum, default = None (autoscale)

  • overlay (pandas DataFrame, optional) – Data to overlay on the heatmap. The time index should be in day-of-year (X-axis) and values should be in time-of-day in minutes (Y-axis)

  • title (string, optional) – Title, default = None

  • figsize (tuple, optional) – Figure size, default = (7.0, 3.0)

pecos.graphics.plot_test_results(data, test_results, tfilter=None, image_format='png', dpi=500, figsize=(7.0, 3.0), date_formatter=None, filename_root='test_results')[source]

Create test results graphics which highlight data points that failed a quality control test.

Parameters
  • data (pandas DataFrame) – Data, indexed by time (pm.data)

  • test_results (pandas DataFrame) – Summary of the quality control test results (pm.test_results)

  • tfilter (pandas Series, optional) – Boolean values used to include time filter in the plot, default = None

  • image_format (string, optional) – Image format, default = ‘png’

  • dpi (int, optional) – DPI resolution, default = 500

  • figsize (tuple, optional) – Figure size, default = (7.0,3.0)

  • date_formatter (string, optional) – Date formatter used on the x axis, for example, “%m-%d”. Default = None

  • filename_root (string, optional) – File name root. If the full path is not provided, files are saved into the current working directory. Each graphic filename is appended with an integer. For example, filename_root = ‘test’ will generate files named ‘test0.png’, ‘test1.png’, etc. By default, the filename root is ‘test_results’

Returns

A list of file names

pecos.utils module

The utils module contains helper functions.

pecos.utils.index_to_datetime(index, unit='s', origin='unix')[source]

Convert a DataFrame index from int/float to datetime, rounded to the nearest millisecond

Parameters
  • index (pandas Index) – DataFrame index in int or float

  • unit (str, optional) – Units of the original index

  • origin (str) – Reference date used to define the starting time. If origin = ‘unix’, the start time is ‘1970-01-01 00:00:00’. The origin can also be defined using a datetime string in a similar format (e.g., ‘2019-05-17 16:05:45’)

Returns

pandas Index – DataFrame index in datetime
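For illustration, the equivalent conversion can be sketched with pandas alone; the index values below are hypothetical:

```python
import pandas as pd

# Elapsed seconds since the unix epoch (illustrative values).
index = pd.Index([0.0, 60.0, 120.5])

# pd.to_datetime performs the equivalent conversion; rounding to the
# nearest millisecond matches the behavior described above.
dt_index = pd.to_datetime(index, unit='s', origin='unix').round('ms')
```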

pecos.utils.datetime_to_elapsedtime(index, origin=0.0)[source]

Convert DataFrame index from datetime to elapsed time in seconds

Parameters
  • index (pandas Index) – DataFrame index in datetime

  • origin (float) – Reference for elapsed time

Returns

pandas Index – DataFrame index in elapsed seconds

pecos.utils.datetime_to_clocktime(index)[source]

Convert DataFrame index from datetime to clocktime (seconds past midnight)

Parameters

index (pandas Index) – DataFrame index in datetime

Returns

pandas Index – DataFrame index in clocktime
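The clocktime conversion can be sketched directly in pandas; the timestamps below are hypothetical:

```python
import pandas as pd

# Datetime index spanning one day (illustrative timestamps).
index = pd.DatetimeIndex(['2020-06-01 00:00:30', '2020-06-01 12:00:00'])

# Seconds past midnight: subtract each timestamp's midnight,
# then convert the resulting timedeltas to seconds.
clocktime = (index - index.normalize()).total_seconds()
```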

pecos.utils.datetime_to_epochtime(index)[source]

Convert DataFrame index from datetime to epoch time

Parameters

index (pandas Index) – DataFrame index in datetime

Returns

pandas Index – DataFrame index in epoch time

pecos.utils.round_index(index, frequency, how='nearest')[source]

Round DataFrame index

Parameters
  • index (pandas Index) – Datetime index

  • frequency (int) – Expected time series frequency, in seconds

  • how (string, optional) –

    Method for rounding, default = ‘nearest’. Options include:

    • nearest = round the index to the nearest frequency

    • floor = round the index down to the nearest frequency

    • ceiling = round the index up to the nearest frequency

Returns

pandas Index – DataFrame index with rounded values
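The three rounding options can be sketched with pandas directly, assuming a 60-second frequency and hypothetical timestamps a few seconds off the grid:

```python
import pandas as pd

# Timestamps a few seconds off a 60-second grid (illustrative).
index = pd.DatetimeIndex(['2020-01-01 00:00:02',
                          '2020-01-01 00:00:58',
                          '2020-01-01 00:02:05'])

# pandas rounds a DatetimeIndex directly; 'nearest', 'floor', and
# 'ceiling' correspond to round(), floor(), and ceil() respectively.
nearest = index.round('60s')
floor = index.floor('60s')
ceiling = index.ceil('60s')
```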

pecos.utils.evaluate_string(string_to_eval, data=None, trans=None, specs=None, col_name='eval')[source]

Returns an evaluated Python string. WARNING: this function calls ‘eval’. Strings of Python code should be thoroughly tested by the user.

This function can be useful when defining quality control configuration options in a file, such as:

  • Time filters that depend on the data index

  • Quality control bounds that depend on system constants

  • Composite signals that are defined using existing data

For each {keyword} in string_to_eval, {keyword} is expanded in the following order:

  • If keyword is ELAPSED_TIME, CLOCK_TIME or EPOCH_TIME then data.index is converted to seconds (elapsed time, clock time, or epoch time) and used in the evaluation (requires data)

  • If keyword is used to select a column (or columns) of data, then data[keyword] is used in the evaluation (requires data)

  • If a translation dictionary is used to select a column (or columns) of data, then data[trans[keyword]] is used in the evaluation (requires data and trans)

  • If the keyword is a key in a dictionary of constants, specs, then specs[keyword] is used in the evaluation (requires specs)

Parameters
  • string_to_eval (string) – String to evaluate; the string can include multiple keywords and numpy (np.*) and pandas (pd.*) functions

  • data (pandas DataFrame, optional) – Data, indexed by datetime

  • trans (dictionary, optional) – Translation dictionary

  • specs (dictionary, optional) – Keyword:value pairs used to define constants

  • col_name (string, optional) – Column name used in the returned DataFrame. If the DataFrame has more than one column, columns are named col_name 0, col_name 1, …

Returns

pandas DataFrame or float – Evaluated string
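A minimal sketch of the keyword-expansion idea (illustrative only, not the Pecos internals); the data, constant, and string below are hypothetical:

```python
import pandas as pd

# Hypothetical data and constant used in the expression.
data = pd.DataFrame({'Power': [100.0, 200.0]})
specs = {'Scale': 0.001}

string_to_eval = "{Power} * {Scale}"

# Evaluation namespace: each keyword resolves to a Series or constant.
namespace = {col: data[col] for col in data.columns}
namespace.update(specs)

# Strip the braces so each keyword resolves as a plain Python name.
# WARNING: eval runs arbitrary code; only evaluate trusted strings.
expr = string_to_eval.replace('{', '').replace('}', '')
result = eval(expr, {'pd': pd}, namespace)  # Series with Power scaled by Scale
```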

pecos.logger module

The logger module contains a function to initialize the logger. Logger warnings are printed to the monitoring report.

pecos.logger.initialize()[source]

Initialize the pecos logger. Warnings are printed to the monitoring report.

pecos.pv module

The pv module contains custom methods for PV applications.

pecos.pv.insolation(G, tfilter=None)[source]

Compute insolation defined as:

\(H=\int{Gdt}\)

where \(G\) is irradiance and \(dt\) is the time step between observations. The time integral is computed using the trapezoidal rule. Results are given in [irradiance units]*seconds.

Parameters
  • G (pandas DataFrame) – Irradiance time series

  • tfilter (pandas Series, optional) – Time filter containing boolean values for each time index

Returns

pandas Series – Insolation
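The trapezoidal integration described above can be sketched with pandas and numpy; the irradiance values below are hypothetical:

```python
import numpy as np
import pandas as pd

# Hypothetical 1-minute irradiance series in W/m^2 (values illustrative).
index = pd.date_range('2020-06-01 12:00', periods=4, freq='1min')
G = pd.Series([800.0, 820.0, 810.0, 805.0], index=index)

# Trapezoidal rule over elapsed seconds; the result is in
# [irradiance units]*seconds, matching the docstring.
t = (G.index - G.index[0]).total_seconds()
dt = np.diff(t)
H = float(np.sum(dt * (G.values[:-1] + G.values[1:]) / 2.0))
```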

pecos.pv.energy(P, tfilter=None)[source]

Compute energy defined as:

\(E=\int{Pdt}\)

where \(P\) is power and \(dt\) is the time step between observations. The time integral is computed using the trapezoidal rule. Results are given in [power units]*seconds.

Parameters
  • P (pandas DataFrame) – Power time series

  • tfilter (pandas Series, optional) – Time filter containing boolean values for each time index

Returns

pandas Series – Energy

pecos.pv.performance_ratio(E, H_poa, P_ref, G_ref=1000)[source]

Compute performance ratio defined as:

\(PR=\dfrac{Y_{f}}{Y_{r}} = \dfrac{\dfrac{E}{P_{ref}}}{\dfrac{H_{poa}}{G_{ref}}}\)

where \(Y_f\) is the observed energy (AC or DC) produced by the PV system (kWh) divided by the DC power rating at STC conditions. \(Y_r\) is the plane-of-array insolation (kWh/m2) divided by the reference irradiance (1000 W/m2).

Parameters
  • E (pandas Series or float) – Energy (AC or DC)

  • H_poa (pandas Series or float) – Plane of array insolation

  • P_ref (float) – DC power rating at STC conditions

  • G_ref (float, optional) – Reference irradiance, default = 1000

Returns

pandas Series or float – Performance ratio in a pandas Series (if E or H_poa are Series) or float (if E and H_poa are floats)
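The ratio-of-yields formula can be sketched directly in pandas; the energy, insolation, and 5 kW rating below are illustrative, not from the Pecos source:

```python
import pandas as pd

# Illustrative daily values: energy in kWh, plane-of-array insolation
# in kWh/m^2, and a hypothetical 5 kW DC rating at STC.
E = pd.Series([24.0, 18.0])      # observed energy, kWh
H_poa = pd.Series([6.0, 5.0])    # plane-of-array insolation, kWh/m^2
P_ref = 5.0                      # DC power rating at STC, kW
G_ref = 1.0                      # reference irradiance, kW/m^2 (1000 W/m^2)

Yf = E / P_ref      # final yield
Yr = H_poa / G_ref  # reference yield
PR = Yf / Yr        # performance ratio
```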

pecos.pv.normalized_current(I, G_poa, I_sco, G_ref=1000)[source]

Compute normalized current defined as:

\(NI = \dfrac{\dfrac{I}{I_{sco}}}{\dfrac{G_{poa}}{G_{ref}}}\)

where \(I\) is current, \(I_{sco}\) is the short circuit current at STC conditions, \(G_{poa}\) is the plane-of-array irradiance, and \(G_{ref}\) is the reference irradiance.

Parameters
  • I (pandas Series or float) – Current

  • G_poa (pandas Series or float) – Plane of array irradiance

  • I_sco (float) – Short circuit current at STC conditions

  • G_ref (float, optional) – Reference irradiance, default = 1000

Returns

pandas Series or float – Normalized current in a pandas Series (if I or G_poa are Series) or float (if I and G_poa are floats)

pecos.pv.normalized_efficiency(P, G_poa, P_ref, G_ref=1000)[source]

Compute normalized efficiency defined as:

\(NE = \dfrac{\dfrac{P}{P_{ref}}}{\dfrac{G_{poa}}{G_{ref}}}\)

where \(P\) is the observed power (AC or DC), \(P_{ref}\) is the DC power rating at STC conditions, \(G_{poa}\) is the plane-of-array irradiance, and \(G_{ref}\) is the reference irradiance.

Parameters
  • P (pandas Series or float) – Power (AC or DC)

  • G_poa (pandas Series or float) – Plane of array irradiance

  • P_ref (float) – DC power rating at STC conditions

  • G_ref (float, optional) – Reference irradiance, default = 1000

Returns

pandas Series or float – Normalized efficiency in a pandas Series (if P or G_poa are Series) or float (if P and G_poa are floats)

pecos.pv.performance_index(E, E_predicted)[source]

Compute performance index defined as:

\(PI=\dfrac{E}{\hat{E}}\)

where \(E\) is the observed energy from a PV system and \(\hat{E}\) is the predicted energy over the same time frame. \(\hat{E}\) can be computed using methods in pvlib.pvsystem and then converting power to energy using pecos.pv.energy.

Unlike the performance ratio, the performance index should be very close to 1 for a well-functioning PV system and should not vary by season due to temperature variations.

Parameters
  • E (pandas Series or float) – Observed energy

  • E_predicted (pandas Series or float) – Predicted energy

Returns

pandas Series or float – Performance index in a pandas Series (if E or E_predicted are Series) or float (if E and E_predicted are floats)

pecos.pv.energy_yield(E, P_ref)[source]

Compute energy yield defined as:

\(EY=\dfrac{E}{P_{ref}}\)

where \(E\) is the observed energy from a PV system and \(P_{ref}\) is the DC power rating of the system at STC conditions.

Parameters
  • E (pandas Series or float) – Observed energy

  • P_ref (float) – DC power rating at STC conditions

Returns

pandas Series or float – Energy yield

pecos.pv.clearness_index(H_dn, H_ea)[source]

Compute clearness index defined as:

\(Kt=\dfrac{H_{dn}}{H_{ea}}\)

where \(H_{dn}\) is the direct-normal insolation (kWh/m2) and \(H_{ea}\) is the extraterrestrial insolation (kWh/m2) over the same time frame. Extraterrestrial irradiation can be computed using pvlib.irradiance.extraradiation. Irradiation can be converted to insolation using pecos.pv.insolation.

Parameters
  • H_dn (pandas Series or float) – Direct normal insolation

  • H_ea (pandas Series or float) – Extraterrestrial insolation

Returns

pandas Series or float – Clearness index in a pandas Series (if H_dn or H_ea are Series) or float (if H_dn and H_ea are floats)

References

HMKC07

Hart, D., McKenna, S.A., Klise, K., Cruz, V., & Wilson, M. (2007) Water quality event detection systems for drinking water contamination warning systems: Development testing and application of CANARY, World Environmental and Water Resources Congress (EWRI), Tampa, FL, May 15-19.

Hunt07

Hunter, J.D. (2007). Matplotlib: A 2D graphics environment. Computing in Science & Engineering, 9(3), 90-95.

KlSt16a

Klise, K.A., Stein, J.S. (2016). Performance Monitoring using Pecos, Technical Report SAND2016-3583, Sandia National Laboratories.

KlSt16b

Klise, K.A., Stein, J.S. (2016). Automated Performance Monitoring for PV Systems using Pecos, 43rd Photovoltaic Specialists Conference (PVSC), Portland, OR, June 5-10.

KlSC17

Klise, K.A., Stein, J.S., Cunningham, J. (2017). Application of IEC 61724 Standards to Analyze PV System Performance in Different Climates, 44th Photovoltaic Specialists Conference (PVSC), Washington, DC, June 25-30.

Mcki13

McKinney W. (2013). Python for Data Analysis: Data Wrangling with Pandas, NumPy, and IPython. O’Reilly Media, 1 edition, 466P.

Rona08

Ronacher, A. (2008). Template Designer Documentation, http://jinja.pocoo.org/docs/dev/templates/ accessed July 1, 2016.

SHFH16

Stein, J.S., Holmgren, W.F., Forbess, J., & Hansen, C.W. (2016). PVLIB: Open Source Photovoltaic Performance Modeling Functions for Matlab and Python, 43rd Photovoltaic Specialists Conference (PVSC), Portland, OR, June 5-10.

SPHC16

Sievert, C., Parmer, C., Hocking, T., Chamberlain, S., Ram, K., Corvellec, M., and Despouy, P. (2016). plotly: Create interactive web graphics via Plotly’s JavaScript graphing library [Software].

VaCV11

van der Walt, S., Colbert, S.C., & Varoquaux, G. (2011). The NumPy array: A structure for efficient numerical computation. Computing in Science & Engineering, 13, 22-30.

Sandia National Laboratories is a multimission laboratory managed and operated by National Technology and Engineering Solutions of Sandia, LLC., a wholly owned subsidiary of Honeywell International, Inc., for the U.S. Department of Energy’s National Nuclear Security Administration under contract DE-NA-0003525.