Software: Least Squares Anomaly Detection

Least Squares Anomaly Detection is a flexible, fast, probabilistic method for calculating outlier scores on test data, given training examples of inliers. The model is controlled by two parameters: sigma (a kernel length scale, controlling how 'smooth' the result should be) and rho (a regularisation parameter, which controls the sensitivity to outliers). The effect of altering these parameters is shown in one of the demos accompanying the Python implementation.
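To make the roles of sigma and rho concrete, here is a minimal NumPy sketch of a regularised kernel least-squares scorer. It is an illustration written for this page, not the package's own code: the function names, the kernel convention exp(-d^2 / (2 sigma^2)), and the choice of one kernel centre per training point are all assumptions.

```python
import numpy as np


def rbf_features(X, centres, sigma):
    """Gaussian kernel between each row of X and each centre (length scale sigma)."""
    sq_dists = ((X[:, None, :] - centres[None, :, :]) ** 2).sum(axis=2)
    return np.exp(-sq_dists / (2.0 * sigma ** 2))


def outlier_scores(X_train, X_test, sigma, rho):
    """Fit a ridge-regularised least-squares inlier model and score X_test.

    Hypothetical helper: regresses the target 1 (inlier) on kernel features,
    then reports 1 minus the clipped prediction as the outlier score.
    """
    phi = rbf_features(X_train, X_train, sigma)
    targets = np.ones(len(X_train))  # every training point is an inlier
    theta = np.linalg.solve(phi.T @ phi + rho * np.eye(phi.shape[1]), phi.T @ targets)
    inlier = np.clip(rbf_features(X_test, X_train, sigma) @ theta, 0.0, 1.0)
    return 1.0 - inlier


X_train = np.array([[1.1], [1.3], [1.2], [1.05], [0.8]])
X_test = np.array([[1.15], [3.6], [1.25]])

# Compare a baseline with a larger length scale and with stronger regularisation.
for sigma, rho in [(0.2, 0.1), (2.0, 0.1), (0.2, 10.0)]:
    scores = outlier_scores(X_train, X_test, sigma, rho)
    print(f"sigma={sigma}, rho={rho}:", np.round(scores, 3))
```

In this toy setting a larger sigma makes the far point at 3.6 look less extreme, while a larger rho shrinks the fitted inlier values, so scores near the training data rise.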

Where there are multiple inlier classes in training data, the method works as a robust classifier, i.e. it can assign to each test datapoint the probability of being in any of the inlier classes and the probability of being in an outlier class. An example provided with the code shows the method being used to classify handwritten digits 0 to 9 given only training examples of digits 0 to 8.
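The robust-classifier idea can be sketched in the same style: fit one least-squares model per inlier class, then assign whatever probability mass the inlier classes leave unclaimed to an extra outlier column. This is a simplified illustration under stated assumptions (hypothetical function names, one kernel centre per training point, a two-cluster toy dataset standing in for the digits example), not the package's implementation.

```python
import numpy as np


def rbf_features(X, centres, sigma):
    """Gaussian kernel between each row of X and each centre (length scale sigma)."""
    sq_dists = ((X[:, None, :] - centres[None, :, :]) ** 2).sum(axis=2)
    return np.exp(-sq_dists / (2.0 * sigma ** 2))


def fit_class_weights(X, y, sigma, rho):
    """Solve one ridge-regularised least-squares problem per inlier class."""
    classes = np.unique(y)
    phi = rbf_features(X, X, sigma)
    # One 0/1 target column per class (one-vs-rest encoding).
    targets = (y[:, None] == classes[None, :]).astype(float)
    theta = np.linalg.solve(phi.T @ phi + rho * np.eye(phi.shape[1]), phi.T @ targets)
    return classes, theta


def predict_proba(X_test, X_train, theta, sigma):
    """Return one column per inlier class plus a final outlier column."""
    inlier = np.clip(rbf_features(X_test, X_train, sigma) @ theta, 0.0, 1.0)
    outlier = np.clip(1.0 - inlier.sum(axis=1, keepdims=True), 0.0, 1.0)
    proba = np.hstack([inlier, outlier])
    return proba / proba.sum(axis=1, keepdims=True)  # rows sum to one


# Toy data: two inlier classes, clustered around (0, 0) and (3, 3).
X_train = np.array([[0.0, 0.0], [0.2, 0.1], [-0.1, 0.2], [0.1, -0.2],
                    [3.0, 3.0], [3.2, 2.9], [2.8, 3.1], [3.1, 3.2]])
y_train = np.array([0, 0, 0, 0, 1, 1, 1, 1])
X_test = np.array([[0.1, 0.0], [3.0, 3.1], [10.0, -10.0]])

classes, theta = fit_class_weights(X_train, y_train, sigma=0.5, rho=0.1)
proba = predict_proba(X_test, X_train, theta, sigma=0.5)
print(np.round(proba, 3))  # columns: class 0, class 1, outlier
```

A test point far from every training cluster receives almost no inlier mass, so it lands in the last column, much as an unseen digit would when only digits 0 to 8 are available for training.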

The method can also be applied to detecting anomalies in sequences, using a Hidden Markov Model based extension to the static method. An example is included showing inference of abnormalities in an electrocardiogram time series (data from PhysioNet).
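One way to picture a sequential extension is to smooth per-frame outlier probabilities with a two-state Hidden Markov Model (normal and anomalous) using the forward-backward algorithm. The sketch below is generic and makes several assumptions that are not taken from the package: the function name, the symmetric transition matrix, the stay_prob and prior_outlier parameters, and the use of static posteriors as emission evidence up to a scale factor.

```python
import numpy as np


def smooth_outlier_probs(static_outlier, stay_prob=0.95, prior_outlier=0.5):
    """Forward-backward smoothing of per-frame outlier probabilities.

    Hypothetical helper: states are [normal, anomalous]; stay_prob is the
    assumed probability of remaining in the current state between frames.
    """
    p = np.asarray(static_outlier, dtype=float)
    # Evidence for each state at each frame, floored to avoid exact zeros.
    evidence = np.clip(np.column_stack([1.0 - p, p]), 1e-12, None)
    trans = np.array([[stay_prob, 1.0 - stay_prob],
                      [1.0 - stay_prob, stay_prob]])
    n_frames = len(p)

    # Forward pass, normalised at each step for numerical stability.
    alpha = np.zeros((n_frames, 2))
    a = np.array([1.0 - prior_outlier, prior_outlier]) * evidence[0]
    alpha[0] = a / a.sum()
    for t in range(1, n_frames):
        a = (alpha[t - 1] @ trans) * evidence[t]
        alpha[t] = a / a.sum()

    # Backward pass, also normalised.
    beta = np.ones((n_frames, 2))
    for t in range(n_frames - 2, -1, -1):
        b = trans @ (evidence[t + 1] * beta[t + 1])
        beta[t] = b / b.sum()

    gamma = alpha * beta
    gamma /= gamma.sum(axis=1, keepdims=True)
    return gamma[:, 1]  # smoothed probability of the anomalous state


isolated = smooth_outlier_probs([0.1, 0.1, 0.9, 0.1, 0.1])
sustained = smooth_outlier_probs([0.1, 0.1, 0.9, 0.9, 0.9, 0.9, 0.1, 0.1])
print(np.round(isolated, 3))
print(np.round(sustained, 3))
```

With these settings a single-frame spike is largely discounted, while a run of consecutive high scores keeps a high smoothed probability, which is the behaviour one would want when looking for abnormal stretches in a time series.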

The Python software here provides training and inference methods, in a class which is compatible with the scikit-learn package. The class lsanomaly.LSAnomaly() can replace other methods such as svm.OneClassSVM() in any of the scikit-learn outlier detection examples.

Example usage

>>> import lsanomaly
>>> import numpy as np
>>> X_train = np.array([[1.1],[1.3],[1.2],[1.05],[0.8]])
>>> X_test = np.array([[1.15],[3.6],[1.25]])
>>> anomalymodel = lsanomaly.LSAnomaly()
>>> anomalymodel.fit(X_train)
>>> anomalymodel.predict(X_test)
[0.0, 'anomaly', 0.0]
>>> anomalymodel.predict_proba(X_test)
array([[ 1.00000000e+000, 0.00000000e+000],
       [ 5.15255628e-103, 1.00000000e+000],
       [ 1.00000000e+000, 0.00000000e+000]])
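The probability array can be post-processed with plain NumPy. The sketch below copies the values from the example output above and assumes the last column is the outlier probability (consistent with the 'anomaly' label on the second test point); the 0.5 threshold is a hypothetical choice, not a package default.

```python
import numpy as np

# Values copied from the predict_proba output in the example above.
# Assumed layout: one row per test point, last column = outlier probability.
proba = np.array([[1.0, 0.0],
                  [5.15255628e-103, 1.0],
                  [1.0, 0.0]])

threshold = 0.5  # hypothetical cut-off; tune it for the application
outlier_prob = proba[:, -1]
is_anomaly = outlier_prob > threshold
anomalous_rows = np.flatnonzero(is_anomaly)  # indices of flagged test points

print(is_anomaly)
print(anomalous_rows)
```

Working from the probabilities rather than the hard labels lets the cut-off be adjusted to trade false alarms against missed anomalies.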

Download and installation

Installation:

pip install lsanomaly

Source code is available at http://github.com/lsanomaly/lsanomaly. Thanks to David Westerhoff for packaging and making improvements to the code, and to Babak Farrokhzad for a bug fix in the example usage.

Reference

J.A. Quinn, M. Sugiyama. A least-squares approach to anomaly detection in static and sequential data. Pattern Recognition Letters 40:36-40, 2014.

pdf preprint