pecos.monitoring module

The monitoring module contains the PerformanceMonitoring class used to run quality control tests and store results. The module also contains individual functions that can be used to run quality control tests.

class pecos.monitoring.PerformanceMonitoring

Bases: object

PerformanceMonitoring class

property data: Data used in quality control analysis, added to the PerformanceMonitoring object using add_dataframe.

property mask: Boolean mask indicating whether each data point passed the quality control tests. True = data point passed all tests, False = data point failed at least one test.

property cleaned_data: Cleaned data set, in which data that failed a quality control test are replaced by NaN.
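The relationship between mask and cleaned_data can be illustrated with a minimal sketch. It uses plain Python lists rather than pandas objects, and the helper name apply_mask is hypothetical; it is not part of pecos.

```python
import math

def apply_mask(values, mask):
    """Replace every value whose mask entry is False with NaN."""
    return [v if ok else math.nan for v, ok in zip(values, mask)]

values = [1.0, 250.0, 3.0]
mask = [True, False, True]  # False = failed at least one test
cleaned = apply_mask(values, mask)
```

Only the second value is replaced; values that passed every test are left untouched.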

add_dataframe(data)

Add data to the PerformanceMonitoring object

Parameters: data (pandas DataFrame) – Data to add to the PerformanceMonitoring object, indexed by datetime

add_translation_dictionary(trans)

Add translation dictionary to the PerformanceMonitoring object

Parameters: trans (dictionary) – Translation dictionary

add_time_filter(time_filter)

Add a time filter to the PerformanceMonitoring object

Parameters: time_filter (pandas DataFrame with a single column or pandas Series) – Time filter containing boolean values for each time index. True = keep time index in the quality control results, False = remove time index from the quality control results.
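As a rough illustration of what a time filter holds, the sketch below builds one boolean per time index and keeps only the indices marked True. It uses plain lists in place of a pandas Series, and the 08:00 to 20:00 rule is an arbitrary assumption chosen for the example.

```python
from datetime import datetime, timedelta

t0 = datetime(2024, 1, 1)
index = [t0 + timedelta(hours=h) for h in range(0, 24, 6)]  # 00:00, 06:00, 12:00, 18:00

# True = keep the time index, False = remove it from the results
time_filter = [8 <= t.hour < 20 for t in index]
kept = [t for t, keep in zip(index, time_filter) if keep]
```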

check_timestamp(frequency, expected_start_time=None, expected_end_time=None, min_failures=1, exact_times=True)

Check time series for missing, non-monotonic and duplicate timestamps

Parameters

frequency (int or float) – Expected time series frequency, in seconds
expected_start_time (Timestamp, optional) – Expected start time. If not specified, the minimum timestamp is used
expected_end_time (Timestamp, optional) – Expected end time. If not specified, the maximum timestamp is used
min_failures (int, optional) – Minimum number of consecutive failures required for reporting, default = 1
exact_times (bool, optional) – Controls how missing times are checked. If True, times are expected to occur at regular intervals (specified in frequency) and the DataFrame is reindexed to match the expected frequency. If False, times only need to occur once or more within each interval (specified in frequency) and the DataFrame is not reindexed.
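The three timestamp conditions can be sketched with the standard library alone. This is an illustration of the idea for the exact_times=True case with the default start and end times, not the pecos implementation; inspect_timestamps is a hypothetical name.

```python
from datetime import datetime, timedelta

def inspect_timestamps(times, frequency):
    """Return (missing, duplicates, non_monotonic) for a list of datetimes."""
    # Timestamps that appear more than once
    duplicates = sorted({t for t in times if times.count(t) > 1})
    # Timestamps that are earlier than their predecessor
    non_monotonic = [b for a, b in zip(times, times[1:]) if b < a]
    # Walk the expected regular grid from the minimum to the maximum timestamp
    step = timedelta(seconds=frequency)
    present = set(times)
    missing = []
    t = min(times)
    while t <= max(times):
        if t not in present:
            missing.append(t)
        t += step
    return missing, duplicates, non_monotonic

t0 = datetime(2024, 1, 1)
times = [t0 + timedelta(seconds=s) for s in (0, 60, 60, 180, 120, 300)]
missing, duplicates, non_monotonic = inspect_timestamps(times, frequency=60)
```

With a 60 second frequency, the sample index has one duplicate (60 s), one out-of-order entry (120 s after 180 s), and one gap (240 s).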

check_range(bound, key=None, min_failures=1)

Check for data that is outside the expected range

Parameters

bound (list of floats) – [lower bound, upper bound], None can be used in place of a lower or upper bound
key (string, optional) – Data column name or translation dictionary key. If not specified, all columns are used in the test.
min_failures (int, optional) – Minimum number of consecutive failures required for reporting, default = 1
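The sketch below shows one way to read the bound and min_failures parameters together. It assumes that values exactly equal to a bound pass and that min_failures filters out runs of consecutive failures shorter than the given length; both helper names are hypothetical and the code is not taken from pecos.

```python
def range_failures(values, bound):
    """Flag values outside [lower, upper]; None disables that side."""
    lower, upper = bound
    flags = []
    for v in values:
        too_low = lower is not None and v < lower
        too_high = upper is not None and v > upper
        flags.append(too_low or too_high)
    return flags

def keep_runs(flags, min_failures=1):
    """Keep only runs of at least min_failures consecutive True flags."""
    kept = [False] * len(flags)
    start = None
    for i, f in enumerate(flags + [False]):  # sentinel closes the last run
        if f and start is None:
            start = i
        elif not f and start is not None:
            if i - start >= min_failures:
                kept[start:i] = [True] * (i - start)
            start = None
    return kept

values = [0.2, 1.5, 0.4, -3.0, -2.0, 0.9]
flags = range_failures(values, [0, 1])
reported = keep_runs(flags, min_failures=2)
one_sided = range_failures(values, [None, 1])  # upper bound only
```

The isolated failure at 1.5 is dropped when min_failures=2, while the two consecutive negative values are reported.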

check_increment(bound, key=None, increment=1, absolute_value=True, min_failures=1)

Check data increments using the difference between values

Parameters

bound (list of floats) – [lower bound, upper bound], None can be used in place of a lower or upper bound
key (string, optional) – Data column name or translation dictionary key. If not specified, all columns are used in the test.
increment (int, optional) – Time step shift used to compute difference, default = 1
absolute_value (boolean, optional) – Use the absolute value of the increment data, default = True
min_failures (int, optional) – Minimum number of consecutive failures required for reporting, default = 1
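A minimal sketch of the increment calculation follows, assuming the difference is taken between each value and the value increment steps earlier. The function name and the example bounds are hypothetical.

```python
def increments(values, increment=1, absolute_value=True):
    """Difference between each value and the one `increment` steps earlier."""
    diffs = [None] * increment  # no earlier value to compare against
    for i in range(increment, len(values)):
        d = values[i] - values[i - increment]
        diffs.append(abs(d) if absolute_value else d)
    return diffs

values = [10, 10, 10, 14, 9]
diffs = increments(values)

# In this example the lower bound flags unchanged values
# and the upper bound flags large jumps.
lower, upper = 1, 4
failed = [d is not None and (d < lower or d > upper) for d in diffs]
```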

check_delta(bound, window, key=None, direction=None, min_failures=1)

Check for stagnant data and/or abrupt changes in the data using the difference between max and min values (delta) within a rolling window

Parameters

bound (list of floats) – [lower bound, upper bound], None can be used in place of a lower or upper bound
window (int or float) – Size of the rolling window (in seconds) used to compute delta
key (string, optional) – Data column name or translation dictionary key. If not specified, all columns are used in the test.
direction (str, optional) –
Options = ‘positive’, ‘negative’, or None
- If direction is positive, then only identify positive deltas (the min occurs before the max)
- If direction is negative, then only identify negative deltas (the max occurs before the min)
- If direction is None, then identify both positive and negative deltas
min_failures (int, optional) – Minimum number of consecutive failures required for reporting, default = 1
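The delta and direction definitions above can be sketched for a single window as follows. The sketch assumes samples are (seconds, value) pairs and that the window includes both end points; window_delta is a hypothetical helper, not the pecos implementation.

```python
def window_delta(samples, t_end, window):
    """Return (delta, direction) for samples with t_end - window <= t <= t_end."""
    inside = [(t, v) for t, v in samples if t_end - window <= t <= t_end]
    t_min, v_min = min(inside, key=lambda s: s[1])
    t_max, v_max = max(inside, key=lambda s: s[1])
    delta = v_max - v_min
    if delta == 0:
        return delta, None  # flat window, no direction
    # Positive delta: the min occurs before the max
    return delta, 'positive' if t_min < t_max else 'negative'

samples = [(0, 5.0), (60, 5.0), (120, 5.1), (180, 9.0), (240, 2.0)]
delta_a, dir_a = window_delta(samples, t_end=180, window=120)
delta_b, dir_b = window_delta(samples, t_end=240, window=120)
delta_c, dir_c = window_delta(samples, t_end=60, window=60)
```

A lower bound on delta would flag the flat window ending at 60 s as stagnant, and an upper bound would flag the drop in the window ending at 240 s as an abrupt change.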

check_outlier(bound, window=None, key=None, absolute_value=False, streaming=False, min_failures=1)

Check for outliers using normalized data within a rolling window

The upper and lower bounds are specified in standard deviations. Data is normalized using (data - mean) / std.

Parameters

bound (list of floats) – [lower bound, upper bound], None can be used in place of a lower or upper bound
window (int or float, optional) – Size of the rolling window (in seconds) used to normalize data, default = None. If window is set to None, data is normalized using the entire data set's mean and standard deviation (column by column).
key (string, optional) – Data column name or translation dictionary key. If not specified, all columns are used in the test.
absolute_value (boolean, optional) – Use the absolute value of the normalized data, default = False
streaming (boolean, optional) – Indicates if streaming analysis should be used, default = False
min_failures (int, optional) – Minimum number of consecutive failures required for reporting, default = 1
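For the window=None case, the normalization can be sketched with the statistics module. The sketch assumes the sample standard deviation is used and that the bound is [None, 2.5]; both are choices made for the example.

```python
from statistics import mean, stdev

def normalize(values, absolute_value=False):
    """Normalize using (data - mean) / std over the whole column."""
    m, s = mean(values), stdev(values)
    z = [(v - m) / s for v in values]
    return [abs(x) for x in z] if absolute_value else z

values = [10.0, 11.0, 9.0, 10.0, 10.0, 30.0, 10.0, 11.0, 9.0, 10.0]
z = normalize(values, absolute_value=True)

# bound = [None, 2.5]: flag points more than 2.5 standard deviations from the mean
outliers = [x > 2.5 for x in z]
```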

check_missing(key=None, min_failures=1)

Check for missing data

Parameters

key (string, optional) – Data column name or translation dictionary key. If not specified, all columns are used in the test.
min_failures (int, optional) – Minimum number of consecutive failures required for reporting, default = 1

check_corrupt(corrupt_values, key=None, min_failures=1)

Check for corrupt data

Parameters

corrupt_values (list of int or floats) – List of corrupt data values
key (string, optional) – Data column name or translation dictionary key. If not specified, all columns are used in the test.
min_failures (int, optional) – Minimum number of consecutive failures required for reporting, default = 1
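The difference between the missing and corrupt checks can be sketched as follows. The sketch treats None and NaN as missing, which is an assumption made for this plain-Python example, and uses -999 and 9999 as made-up sentinel values.

```python
import math

def is_missing(v):
    """Treat None and NaN as missing."""
    return v is None or (isinstance(v, float) and math.isnan(v))

values = [1.2, -999, math.nan, 3.4, None, 9999]
corrupt_values = [-999, 9999]

missing = [is_missing(v) for v in values]
corrupt = [v in corrupt_values for v in values]
```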

check_custom_static(quality_control_func, key=None, min_failures=1, error_message=None)

Use custom functions that operate on the entire dataset at once to perform quality control analysis

Parameters

quality_control_func (function) – Function that operates on self.df and returns a mask and metadata
key (string, optional) – Data column name or translation dictionary key. If not specified, all columns are used in the test.
min_failures (int, optional) – Minimum number of consecutive failures required for reporting, default = 1
error_message (str, optional) – Error message
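A custom static function receives the whole data set and returns a mask and metadata. The sketch below shows that shape using a dict of lists in place of a DataFrame; the column names, the rule itself, and the convention that True means the point passed (following the mask property) are assumptions for illustration.

```python
def output_below_input(data):
    """Hypothetical quality control function.

    data: dict mapping column name -> list of floats.
    Returns (mask, metadata); mask is True where a point passes.
    """
    diff = [i - o for i, o in zip(data['input'], data['output'])]
    mask = {'output': [d >= 0 for d in diff]}  # output must not exceed input
    metadata = {'output': diff}                # keep the margin for later review
    return mask, metadata

data = {'input': [5.0, 6.0, 7.0], 'output': [4.0, 6.5, 7.0]}
mask, metadata = output_below_input(data)
```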

check_custom_streaming(quality_control_func, window, key=None, rebase=None, min_failures=1, error_message=None)

Check for anomalous data using a streaming framework that removes anomalous data from the history after each timestamp. A custom quality control function is supplied by the user to determine whether the data is anomalous.

Parameters

quality_control_func (function) – Function that determines if the last data point is normal or anomalous. Returns a mask and metadata for the last data point.
window (int or float) – Size of the rolling window (in seconds) used to define history. If window is set to None, data is normalized using the entire data set's mean and standard deviation (column by column).
key (string, optional) – Data column name or translation dictionary key. If not specified, all columns are used in the test.
rebase (int, float, or None) – Value between 0 and 1 that indicates the fraction of default = None.
min_failures (int, optional) – Minimum number of consecutive failures required for reporting, default = 1
error_message (str, optional) – Error message
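The streaming idea, in which anomalous points are kept out of the history used to judge later points, can be sketched as below. This is a simplified illustration that ignores the rebase parameter; the function names, the three standard deviation rule, and the (seconds, value) sample format are all assumptions.

```python
from statistics import mean, stdev

def last_point_ok(history, value):
    """Hypothetical rule: pass if within 3 std of the history mean."""
    if len(history) < 3:
        return True, None  # not enough history to judge
    m, s = mean(history), stdev(history)
    z = abs(value - m) / s if s > 0 else 0.0
    return z <= 3, z

def stream(samples, window, qc_func):
    """samples: list of (seconds, value) in time order. Returns a pass mask."""
    history, mask = [], []
    for t, v in samples:
        # Drop history older than the window
        history = [(ht, hv) for ht, hv in history if t - ht <= window]
        ok, _ = qc_func([hv for _, hv in history], v)
        mask.append(ok)
        if ok:
            history.append((t, v))  # anomalous points never enter the history
    return mask

samples = [(0, 10.0), (60, 10.5), (120, 9.5), (180, 10.0), (240, 50.0), (300, 10.2)]
mask = stream(samples, window=600, qc_func=last_point_ok)
```

Because the spike at 240 s is excluded from the history, it does not inflate the standard deviation used to evaluate the point at 300 s.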

pecos.monitoring.check_timestamp(data, frequency, expected_start_time=None, expected_end_time=None, min_failures=1, exact_times=True)

Check time series for missing, non-monotonic and duplicate timestamps

Parameters

data (pandas DataFrame) – Data used in the quality control test, indexed by datetime
frequency (int or float) – Expected time series frequency, in seconds
expected_start_time (Timestamp, optional) – Expected start time. If not specified, the minimum timestamp is used
expected_end_time (Timestamp, optional) – Expected end time. If not specified, the maximum timestamp is used
min_failures (int, optional) – Minimum number of consecutive failures required for reporting, default = 1
exact_times (bool, optional) – Controls how missing times are checked. If True, times are expected to occur at regular intervals (specified in frequency) and the DataFrame is reindexed to match the expected frequency. If False, times only need to occur once or more within each interval (specified in frequency) and the DataFrame is not reindexed.

Returns

dictionary – Results include cleaned data, mask, and test results summary

pecos.monitoring.check_range(data, bound, key=None, min_failures=1)

Check for data that is outside the expected range

Parameters

data (pandas DataFrame) – Data used in the quality control test, indexed by datetime
bound (list of floats) – [lower bound, upper bound], None can be used in place of a lower or upper bound
key (string, optional) – Data column name or translation dictionary key. If not specified, all columns are used in the test.
min_failures (int, optional) – Minimum number of consecutive failures required for reporting, default = 1

Returns

dictionary – Results include cleaned data, mask, and test results summary

pecos.monitoring.check_increment(data, bound, key=None, increment=1, absolute_value=True, min_failures=1)

Check data increments using the difference between values

Parameters

data (pandas DataFrame) – Data used in the quality control test, indexed by datetime
bound (list of floats) – [lower bound, upper bound], None can be used in place of a lower or upper bound
key (string, optional) – Data column name or translation dictionary key. If not specified, all columns are used in the test.
increment (int, optional) – Time step shift used to compute difference, default = 1
absolute_value (boolean, optional) – Use the absolute value of the increment data, default = True
min_failures (int, optional) – Minimum number of consecutive failures required for reporting, default = 1

Returns

dictionary – Results include cleaned data, mask, and test results summary

pecos.monitoring.check_delta(data, bound, window, key=None, direction=None, min_failures=1)

Check for stagnant data and/or abrupt changes in the data using the difference between max and min values (delta) within a rolling window

Parameters

data (pandas DataFrame) – Data used in the quality control test, indexed by datetime
bound (list of floats) – [lower bound, upper bound], None can be used in place of a lower or upper bound
window (int or float) – Size of the rolling window (in seconds) used to compute delta
key (string, optional) – Data column name or translation dictionary key. If not specified, all columns are used in the test.
direction (str, optional) –
Options = ‘positive’, ‘negative’, or None
- If direction is positive, then only identify positive deltas (the min occurs before the max)
- If direction is negative, then only identify negative deltas (the max occurs before the min)
- If direction is None, then identify both positive and negative deltas
min_failures (int, optional) – Minimum number of consecutive failures required for reporting, default = 1

Returns

dictionary – Results include cleaned data, mask, and test results summary

pecos.monitoring.check_outlier(data, bound, window=None, key=None, absolute_value=False, streaming=False, min_failures=1)

Check for outliers using normalized data within a rolling window

The upper and lower bounds are specified in standard deviations. Data is normalized using (data - mean) / std.

Parameters

data (pandas DataFrame) – Data used in the quality control test, indexed by datetime
bound (list of floats) – [lower bound, upper bound], None can be used in place of a lower or upper bound
window (int or float, optional) – Size of the rolling window (in seconds) used to normalize data, default = None. If window is set to None, data is normalized using the entire data set's mean and standard deviation (column by column).
key (string, optional) – Data column name or translation dictionary key. If not specified, all columns are used in the test.
absolute_value (boolean, optional) – Use the absolute value of the normalized data, default = False
streaming (boolean, optional) – Indicates if streaming analysis should be used, default = False
min_failures (int, optional) – Minimum number of consecutive failures required for reporting, default = 1

Returns

dictionary – Results include cleaned data, mask, and test results summary

pecos.monitoring.check_missing(data, key=None, min_failures=1)

Check for missing data

Parameters

data (pandas DataFrame) – Data used in the quality control test, indexed by datetime
key (string, optional) – Data column name or translation dictionary key. If not specified, all columns are used in the test.
min_failures (int, optional) – Minimum number of consecutive failures required for reporting, default = 1

Returns

dictionary – Results include cleaned data, mask, and test results summary

pecos.monitoring.check_corrupt(data, corrupt_values, key=None, min_failures=1)

Check for corrupt data

Parameters

data (pandas DataFrame) – Data used in the quality control test, indexed by datetime
corrupt_values (list of int or floats) – List of corrupt data values
key (string, optional) – Data column name or translation dictionary key. If not specified, all columns are used in the test.
min_failures (int, optional) – Minimum number of consecutive failures required for reporting, default = 1

Returns

dictionary – Results include cleaned data, mask, and test results summary

pecos.monitoring.check_custom_static(data, quality_control_func, key=None, min_failures=1, error_message=None)

Use custom functions that operate on the entire dataset at once to perform quality control analysis

Parameters

data (pandas DataFrame) – Data used in the quality control test, indexed by datetime
quality_control_func (function) – Function that operates on data and returns a mask and metadata
key (string, optional) – Data column name or translation dictionary key. If not specified, all columns are used in the test.
min_failures (int, optional) – Minimum number of consecutive failures required for reporting, default = 1
error_message (str, optional) – Error message

Returns

dictionary – Results include cleaned data, mask, test results summary, and metadata

pecos.monitoring.check_custom_streaming(data, quality_control_func, window, key=None, rebase=None, min_failures=1, error_message=None)

Check for anomalous data using a streaming framework that removes anomalous data from the history after each timestamp. A custom quality control function is supplied by the user to determine whether the data is anomalous.

Parameters

data (pandas DataFrame) – Data used in the quality control test, indexed by datetime
quality_control_func (function) – Function that determines if the last data point is normal or anomalous. Returns a mask and metadata for the last data point.
window (int or float) – Size of the rolling window (in seconds) used to define history. If window is set to None, data is normalized using the entire data set's mean and standard deviation (column by column).
key (string, optional) – Data column name or translation dictionary key. If not specified, all columns are used in the test.
rebase (int, float, or None) – Value between 0 and 1 that indicates the fraction of default = None.
min_failures (int, optional) – Minimum number of consecutive failures required for reporting, default = 1
error_message (str, optional) – Error message

Returns

dictionary – Results include cleaned data, mask, test results summary, and metadata