pecos.metrics module

The metrics module contains metrics that describe the quality control analysis or compute quantities that might be of use in the analysis.

pecos.metrics.qci(mask, tfilter=None)[source]

Compute the quality control index (QCI) for each column, defined as:

\(QCI=\dfrac{\sum_{t\in T}X_{dt}}{|T|}\)

where \(T\) is the set of timestamps in the analysis, \(X_{dt}\) is a data point for column \(d\) at time \(t\) that passed all quality control tests, and \(|T|\) is the number of data points in the analysis.

Parameters
  • mask (pandas DataFrame) – Test results mask, returned from pm.mask

  • tfilter (pandas Series, optional) – Time filter containing boolean values for each time index

Returns

pandas Series – Quality control index
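As a minimal sketch of the formula above, assuming a boolean mask in which True marks a data point that passed all quality control tests, the QCI of each column is the fraction of True values in that column. The helper name `qci_sketch` is hypothetical and uses only pandas; it is an illustration, not the pecos implementation.

```python
import pandas as pd

def qci_sketch(mask, tfilter=None):
    """Fraction of data points per column that passed all tests (hypothetical helper)."""
    if tfilter is not None:
        mask = mask[tfilter]  # keep only the rows where the time filter is True
    # Sum of passing points divided by the number of timestamps |T|
    return mask.sum() / mask.shape[0]

index = pd.date_range("2024-01-01", periods=4, freq="min")
mask = pd.DataFrame(
    {"A": [True, True, False, True], "B": [True, True, True, True]},
    index=index,
)
qci = qci_sketch(mask)  # one value per column
```

In this example column A has three passing points out of four and column B passes everywhere.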

pecos.metrics.rmse(data1, data2, tfilter=None)[source]

Compute the root mean squared error (RMSE) for each column, defined as:

\(RMSE=\sqrt{\dfrac{\sum{(data_1-data_2)^2}}{n}}\)

where \(data_1\) and \(data_2\) are time series and \(n\) is the number of data points.

Parameters
  • data1 (pandas DataFrame) – Data

  • data2 (pandas DataFrame) – Data. Note, the column names in data1 must equal the column names in data2

  • tfilter (pandas Series, optional) – Time filter containing boolean values for each time index

Returns

pandas Series – Root mean squared error
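The formula above can be sketched with pandas and NumPy alone. The helper name `rmse_sketch` and the sample data are hypothetical, and this is not the pecos implementation. One assumption worth noting: `mean()` skips NaN values, so in this sketch \(n\) counts only the rows where both inputs are present.

```python
import numpy as np
import pandas as pd

def rmse_sketch(data1, data2, tfilter=None):
    """Root mean squared error per column (hypothetical helper)."""
    if tfilter is not None:
        data1, data2 = data1[tfilter], data2[tfilter]
    squared_error = (data1 - data2) ** 2  # aligned on index and column names
    return np.sqrt(squared_error.mean())  # mean() ignores NaN entries

index = pd.date_range("2024-01-01", periods=4, freq="min")
measured = pd.DataFrame({"A": [1.0, 2.0, 3.0, 4.0]}, index=index)
modeled = pd.DataFrame({"A": [1.0, 2.0, 3.0, 6.0]}, index=index)
rmse = rmse_sketch(measured, modeled)
```

Here the squared errors are 0, 0, 0 and 4, so the mean is 1 and the RMSE for column A is 1.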

pecos.metrics.time_integral(data, tfilter=None)[source]

Compute the time integral (F) for each column, defined as:

\(F=\int{fdt}\)

where \(f\) is a column of data and \(dt\) is the time step between observations. The integral is computed using the trapezoidal rule from numpy.trapz. Results are given in [original data units]*seconds. NaN values are set to 0 for integration.

Parameters
  • data (pandas DataFrame) – Data

  • tfilter (pandas Series, optional) – Time filter containing boolean values for each time index

Returns

pandas Series – Integral
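To illustrate the description above, the following sketch applies the trapezoidal rule by hand to a DataFrame with a DatetimeIndex, converting the index to elapsed seconds and replacing NaN with 0. The helper name `time_integral_sketch` is hypothetical, and the rule is written out with plain NumPy arithmetic rather than calling numpy.trapz, so it should not be read as the pecos implementation.

```python
import numpy as np
import pandas as pd

def time_integral_sketch(data):
    """Trapezoidal time integral per column, in [data units]*seconds (hypothetical helper)."""
    # Elapsed time in seconds since the first observation
    t = (data.index - data.index[0]).total_seconds().to_numpy()
    dt = np.diff(t)
    result = {}
    for col in data.columns:
        f = data[col].fillna(0).to_numpy(dtype=float)  # NaN treated as 0
        # Trapezoidal rule: interval width times the mean of its two endpoints
        result[col] = np.sum(dt * (f[1:] + f[:-1]) / 2.0)
    return pd.Series(result)

index = pd.date_range("2024-01-01", periods=3, freq="min")
data = pd.DataFrame({"power": [0.0, 2.0, 4.0]}, index=index)
integral = time_integral_sketch(data)
```

With a 60 second time step, the two trapezoids have areas 60 and 180, so the integral of the power column is 240 in [data units]*seconds.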

pecos.metrics.time_derivative(data, tfilter=None)[source]

Compute the derivative (f’) of each column, defined as:

\(f'=\dfrac{df}{dt}\)

where \(f\) is a column of data and \(dt\) is the time step between observations. The derivative is computed using central differences from numpy.gradient. Results are given in [original data units]/seconds.

Parameters
  • data (pandas DataFrame) – Data

  • tfilter (pandas Series, optional) – Filter containing boolean values for each time index

Returns

pandas DataFrame – Derivative of the data
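The sketch below shows one way the description above could be realized: the DatetimeIndex is converted to elapsed seconds and passed to numpy.gradient as the coordinate array, which gives central differences at interior points and one-sided differences at the two ends. The helper name `time_derivative_sketch` and the sample data are hypothetical; this is not the pecos implementation.

```python
import numpy as np
import pandas as pd

def time_derivative_sketch(data):
    """Time derivative per column, in [data units]/second (hypothetical helper)."""
    # Elapsed time in seconds since the first observation
    t = (data.index - data.index[0]).total_seconds().to_numpy()
    derivative = {
        col: np.gradient(data[col].to_numpy(dtype=float), t)
        for col in data.columns
    }
    return pd.DataFrame(derivative, index=data.index)

index = pd.date_range("2024-01-01", periods=4, freq="min")
data = pd.DataFrame({"level": [0.0, 60.0, 240.0, 540.0]}, index=index)
derivative = time_derivative_sketch(data)
```

Unlike the integral, the result keeps the original time index, which matches the DataFrame return type listed above.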

pecos.metrics.probability_of_detection(observed, actual, tfilter=None)[source]

Compute probability of detection (PD) for each column, defined as:

\(PD=\dfrac{TP}{TP+FN}\)

where \(TP\) is the number of true positives and \(FN\) is the number of false negatives.

Parameters
  • observed (pandas DataFrame) – Estimated conditions (True = background, False = anomalous), returned from pm.mask

  • actual (pandas DataFrame) – Actual conditions (True = background, False = anomalous). Note, the column names in observed must equal the column names in actual

  • tfilter (pandas Series, optional) – Filter containing boolean values for each time index

Returns

pandas Series – Probability of detection
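As a hedged illustration of the formula above, the sketch below counts true positives and false negatives directly from two boolean DataFrames. It assumes that an anomaly (False) is the positive class, which the text above does not state explicitly. The helper name `probability_of_detection_sketch` and the sample data are hypothetical, and this is not the pecos implementation.

```python
import pandas as pd

def probability_of_detection_sketch(observed, actual):
    """TP / (TP + FN) per column (hypothetical helper)."""
    # Assumption: False (anomalous) is treated as the positive class
    flagged = ~observed
    anomalous = ~actual
    tp = (flagged & anomalous).sum()   # anomalies that were flagged
    fn = (~flagged & anomalous).sum()  # anomalies that were missed
    return tp / (tp + fn)

index = pd.date_range("2024-01-01", periods=5, freq="min")
actual = pd.DataFrame({"A": [True, False, False, True, False]}, index=index)
observed = pd.DataFrame({"A": [True, False, True, True, False]}, index=index)
pod = probability_of_detection_sketch(observed, actual)
```

In this example two of the three actual anomalies are flagged, so the ratio for column A is 2/3.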

pecos.metrics.false_alarm_rate(observed, actual, tfilter=None)[source]

Compute false alarm rate (FAR) for each column, defined as:

\(FAR=\dfrac{TN}{TN+FP}\)

where \(TN\) is the number of true negatives and \(FP\) is the number of false positives.

Parameters
  • observed (pandas DataFrame) – Estimated conditions (True = background, False = anomalous), returned from pm.mask

  • actual (pandas DataFrame) – Actual conditions (True = background, False = anomalous). Note, the column names in observed must equal the column names in actual

  • tfilter (pandas Series, optional) – Filter containing boolean values for each time index

Returns

pandas Series – False alarm rate
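The sketch below evaluates the expression exactly as it is written above, counting true negatives and false positives from two boolean DataFrames. It assumes that an anomaly (False) is the positive class, so a true negative is a background point left unflagged and a false positive is a background point flagged as anomalous. The helper name `false_alarm_rate_sketch` and the sample data are hypothetical, and this is not the pecos implementation.

```python
import pandas as pd

def false_alarm_rate_sketch(observed, actual):
    """TN / (TN + FP) per column, following the documented expression (hypothetical helper)."""
    # Assumption: False (anomalous) is treated as the positive class
    flagged = ~observed
    anomalous = ~actual
    tn = (~flagged & ~anomalous).sum()  # background points left unflagged
    fp = (flagged & ~anomalous).sum()   # background points flagged as anomalous
    return tn / (tn + fp)

index = pd.date_range("2024-01-01", periods=5, freq="min")
actual = pd.DataFrame({"A": [True, True, True, True, False]}, index=index)
observed = pd.DataFrame({"A": [True, False, True, True, False]}, index=index)
far = false_alarm_rate_sketch(observed, actual)
```

Here three of the four background points are left unflagged and one is flagged, so the documented ratio for column A is 0.75.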