deepod.models.DeepIsolationForestTS

class deepod.models.DeepIsolationForestTS(epochs=100, batch_size=1000, lr=0.001, seq_len=100, stride=1, rep_dim=128, hidden_dims=100, bias=False, n_ensemble=50, n_estimators=6, max_samples=256, n_jobs=1, epoch_steps=-1, prt_steps=10, device='cuda', verbose=2, random_state=42)

Deep isolation forest for anomaly detection (TKDE’23)

Implementation of a Deep Isolation Forest (DIF) model for time-series anomaly detection, as described in TKDE’23. DIF maps the data into an ensemble of random representation spaces with randomly initialized neural networks and runs the classic Isolation Forest algorithm in each space; samples that are easy to isolate across the ensemble receive high anomaly scores.
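
To make the idea concrete, here is a minimal NumPy sketch of the DIF recipe on plain feature vectors (for time series, each vector would be a flattened sliding window). It is not deepod's implementation: a small tanh MLP with random weights stands in for the model's network, a hand-written isolation tree replaces a library Isolation Forest, the score is the classic iForest formula rather than DIF's own scoring function, and all function names are hypothetical.

```python
import numpy as np


def random_representation(X, rep_dim, rng, hidden=32):
    """Untrained two-layer tanh network with random weights (toy stand-in).

    The small weight scale keeps tanh out of saturation in this example.
    """
    W1 = rng.normal(scale=0.1 / np.sqrt(X.shape[1]), size=(X.shape[1], hidden))
    W2 = rng.normal(scale=1.0 / np.sqrt(hidden), size=(hidden, rep_dim))
    return np.tanh(np.tanh(X @ W1) @ W2)


def c(n):
    """Average path length of an unsuccessful BST search over n points."""
    if n <= 1:
        return 0.0
    if n == 2:
        return 1.0
    return 2.0 * (np.log(n - 1) + np.euler_gamma) - 2.0 * (n - 1) / n


def build_tree(Z, depth, max_depth, rng):
    """Grow an isolation tree with random axis-aligned splits."""
    n = len(Z)
    if depth >= max_depth or n <= 1:
        return ("leaf", n)
    dim = rng.integers(Z.shape[1])
    lo, hi = Z[:, dim].min(), Z[:, dim].max()
    if lo == hi:
        return ("leaf", n)
    split = rng.uniform(lo, hi)
    left = Z[:, dim] < split
    return ("node", dim, split,
            build_tree(Z[left], depth + 1, max_depth, rng),
            build_tree(Z[~left], depth + 1, max_depth, rng))


def path_length(z, tree, depth=0):
    """Depth at which z reaches a leaf, plus c(leaf size) for the unbuilt subtree."""
    if tree[0] == "leaf":
        return depth + c(tree[1])
    _, dim, split, left, right = tree
    return path_length(z, left if z[dim] < split else right, depth + 1)


def dif_scores(X, n_ensemble=5, n_estimators=6, max_samples=64, rep_dim=8, seed=42):
    """Average isolation depth over n_ensemble random views x n_estimators trees."""
    rng = np.random.default_rng(seed)
    n = len(X)
    psi = min(max_samples, n)
    max_depth = int(np.ceil(np.log2(psi)))
    total_depth = np.zeros(n)
    for _ in range(n_ensemble):
        Z = random_representation(X, rep_dim, rng)  # fresh random view per member
        for _ in range(n_estimators):
            idx = rng.choice(n, size=psi, replace=False)
            tree = build_tree(Z[idx], 0, max_depth, rng)
            total_depth += np.array([path_length(z, tree) for z in Z])
    mean_depth = total_depth / (n_ensemble * n_estimators)
    # Classic iForest score: closer to 1 for easily isolated points.
    return 2.0 ** (-mean_depth / c(psi))


rng = np.random.default_rng(0)
X = np.vstack([rng.normal(size=(200, 2)), [[8.0, 8.0]]])  # last row is an outlier
scores = dif_scores(X)
print(scores[-1], np.median(scores[:-1]))
```

Each ensemble member draws fresh random weights, so the forests see several different views of the data; here n_ensemble, n_estimators, max_samples and rep_dim only mirror the names of the constructor arguments.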

Parameters:
  • epochs (int, optional) – Number of training epochs. Default is 100.

  • batch_size (int, optional) – Batch size for training. Default is 1000.

  • lr (float, optional) – Learning rate for the optimizer. Default is 1e-3.

  • seq_len (int, optional) – Length of each sliding window (sub-sequence) extracted from the time series. Default is 100.

  • stride (int, optional) – Stride of the sliding window over the time series. Default is 1.

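  • rep_dim (int, optional) – Dimensionality of the representation produced by each network. Default is 128.

  • hidden_dims (str, optional) – Hidden layer dimensions of the network, given as a comma-separated string (e.g. ‘100,50’). Default is 100.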

  • bias (bool, optional) – If True, adds a bias term to the layers of the neural network. Default is False.

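  • n_ensemble (int, optional) – Number of ensemble members, i.e. randomly initialized representation networks, each paired with its own Isolation Forest. Default is 50.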

  • n_estimators (int, optional) – Number of isolation trees in the Isolation Forest built for each ensemble member. Default is 6.

  • max_samples (int, optional) – Maximum number of samples to draw to train each base estimator. Default is 256.

  • n_jobs (int, optional) – Number of jobs to run in parallel for Isolation Forest training. Default is 1.

  • epoch_steps (int, optional) – Number of steps per epoch. If -1, all batches are processed. Default is -1.

  • prt_steps (int, optional) – Interval, in epochs, at which progress updates are printed. Default is 10.

  • device (str, optional) – Device to use for training (‘cuda’ or ‘cpu’). Default is ‘cuda’.

  • verbose (int, optional) – Verbosity mode: 0 = silent, 1 = progress bar, 2 = one line per epoch. Default is 2.

  • random_state (int, optional) – Seed for random number generation for reproducibility. Default is 42.
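
seq_len and stride control how the series is cut into sub-sequences. The sketch below shows the usual sliding-window rule on an array of shape (n_timesteps, n_features), which yields (n_timesteps - seq_len) // stride + 1 windows. It assumes this convention rather than quoting deepod's internals, and the helper name sliding_windows is hypothetical.

```python
import numpy as np


def sliding_windows(X, seq_len, stride=1):
    """Cut a (n_timesteps, n_features) array into overlapping windows.

    Returns an array of shape (n_windows, seq_len, n_features), where
    n_windows = (n_timesteps - seq_len) // stride + 1.
    """
    n_timesteps = X.shape[0]
    if n_timesteps < seq_len:
        raise ValueError("time series is shorter than seq_len")
    starts = range(0, n_timesteps - seq_len + 1, stride)
    return np.stack([X[s:s + seq_len] for s in starts])


X = np.arange(20, dtype=float).reshape(10, 2)  # 10 time steps, 2 features
W = sliding_windows(X, seq_len=4, stride=3)
print(W.shape)  # (3, 4, 2)
```

With stride=1 (the default) consecutive windows overlap in seq_len - 1 time steps, so every time step is covered by up to seq_len windows.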

Methods

__init__([epochs, batch_size, lr, seq_len, ...])

Initializes the Deep Isolation Forest Time-Series model with the specified hyperparameters.

decision_function(X)

Predict raw anomaly scores of X using the fitted detector.

decision_function_update(z, scores)

Hook for any update operation performed after the decision function.

epoch_update()

Hook for any update operation performed after each training epoch.

fit(X[, y])

Fits the Deep Isolation Forest model on the provided time-series data.

fit_auto_hyper(X[, y, X_test, y_test, ...])

Fit the detector with automatic hyperparameter tuning.

inference_forward(batch_x, net, criterion)

Define the forward step used in inference.

inference_prepare(X)

Define the test data loader (test_loader).

load_model(path)

Load a saved model from path.

load_ray_checkpoint(best_config, best_checkpoint)

predict(X[, return_confidence])

Predict if a particular sample is an outlier or not.

save_model(path)

Save the model to path.

set_seed(seed)

Set the random seed for reproducibility.

set_tuned_net(config)

set_tuned_params()

training_forward(batch_x, net, criterion)

Define the forward step used in training.

training_prepare(X, y)

Define the training data loader (train_loader), the network (net), and the loss criterion (criterion).

decision_function(X)

Predict raw anomaly scores of X using the fitted detector.

The anomaly score of an input sample is computed by the fitted detector. For consistency, outliers are assigned higher anomaly scores.

Parameters:

X (np.ndarray) – The input samples of shape (n_samples, n_features). Sparse matrices are accepted only if they are supported by the base estimator.

Returns:

anomaly_scores – The anomaly scores of the input samples, of shape (n_samples,).

Return type:

np.ndarray
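
For background, the classic Isolation Forest score is shown below: E[h(x)] is the mean path length of sample x over the trees, ψ is the subsample size per tree (max_samples here), and γ ≈ 0.5772 is the Euler–Mascheroni constant. Shorter paths push s towards 1, consistent with outliers receiving higher scores. DIF aggregates isolation evidence over its ensemble of representations with its own scoring function, so treat this as the underlying iForest intuition rather than the exact formula decision_function evaluates.

```latex
s(x, \psi) = 2^{-\mathbb{E}[h(x)] / c(\psi)}, \qquad
c(\psi) = 2H(\psi - 1) - \frac{2(\psi - 1)}{\psi}, \qquad
H(i) \approx \ln i + \gamma
```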

decision_function_update(z, scores)

Hook for any update operation performed after the decision function.

epoch_update()

Hook for any update operation performed after each training epoch.

fit(X, y=None)

Fits the Deep Isolation Forest model on the provided time-series data.

Parameters:
  • X (np.ndarray) – The input samples of shape (n_samples, n_features).

  • y (np.ndarray, optional) – Target values of shape (n_samples,); ignored in unsupervised training.

Returns:

The fitted estimator.

Return type:

self

fit_auto_hyper(X, y=None, X_test=None, y_test=None, n_ray_samples=5, time_budget_s=None)

Fit the detector with automatic hyperparameter tuning. y is ignored in unsupervised methods.

Parameters:
  • X (numpy array of shape (n_samples, n_features)) – The input samples.

  • y (numpy array of shape (n_samples,)) – Not used in unsupervised methods; present for API consistency by convention. Used in (semi-/weakly-) supervised methods.

  • X_test (numpy array of shape (n_samples, n_features), default=None) – The input testing samples for hyper-parameter tuning.

  • y_test (numpy array of shape (n_samples, ), default=None) – Label of input testing samples for hyper-parameter tuning.

  • n_ray_samples (int, default=5) – Number of configurations to sample from the hyperparameter space.

  • time_budget_s (int, default=None) – Global time budget in seconds, after which all Ray trials are stopped.

Returns:

config – The tuned hyperparameters.

Return type:

dict
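
The n_ray_samples and time_budget_s arguments bound the search: at most n_ray_samples configurations are tried, and the search stops once the time budget is spent. The standard-library sketch below mimics that contract with plain random search. It does not use Ray; the function name random_search and the search space are hypothetical, and the toy objective stands in for fitting on X and scoring on (X_test, y_test).

```python
import random
import time


def random_search(objective, space, n_samples=5, time_budget_s=None, seed=42):
    """Draw up to n_samples configurations and return the best one.

    objective(config) -> float, higher is better (e.g. a validation AUC).
    The loop also stops early once time_budget_s seconds have elapsed.
    """
    rng = random.Random(seed)
    start = time.monotonic()
    best_config, best_score = None, float("-inf")
    for _ in range(n_samples):
        if time_budget_s is not None and time.monotonic() - start > time_budget_s:
            break
        # Sample one value per hyperparameter, independently.
        config = {name: rng.choice(values) for name, values in space.items()}
        score = objective(config)
        if score > best_score:
            best_config, best_score = config, score
    return best_config


# Hypothetical search space over two constructor arguments.
space = {"seq_len": [30, 50, 100], "n_ensemble": [25, 50]}
# Toy objective standing in for "fit, then evaluate on the test split".
best = random_search(lambda cfg: -abs(cfg["seq_len"] - 50), space, n_samples=20)
print(best)
```

Like the method above, the sketch returns the winning configuration as a dict; a real tuner would also keep the trained model or checkpoint for that configuration.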

inference_forward(batch_x, net, criterion)

Define the forward step used in inference.

inference_prepare(X)

Define the test data loader (test_loader).

predict(X, return_confidence=False)

Predict if a particular sample is an outlier or not.

Parameters:
  • X (numpy array of shape (n_samples, n_features)) – The input samples.

  • return_confidence (bool, optional (default=False)) – If True, also return the confidence of the prediction.

Returns:

  • outlier_labels (numpy array of shape (n_samples,)) – For each observation, tells whether it should be considered as an outlier according to the fitted model. 0 stands for inliers and 1 for outliers.

  • confidence (numpy array of shape (n_samples,)) – Returned only if return_confidence is True.
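
predict turns anomaly scores into 0/1 labels with a threshold. This page does not document how the threshold is chosen, so the sketch below assumes a PyOD-style rule in which the threshold is the (1 - contamination) percentile of the training scores; the contamination value and the helper name scores_to_labels are assumptions.

```python
import numpy as np


def scores_to_labels(train_scores, test_scores, contamination=0.1):
    """Label test samples whose score exceeds a training-score percentile.

    Assumption: threshold = percentile of train scores at 100 * (1 - contamination).
    Returns 1 for outliers and 0 for inliers, matching predict()'s convention.
    """
    threshold = np.percentile(train_scores, 100 * (1 - contamination))
    return (np.asarray(test_scores) > threshold).astype(int)


train = np.linspace(0.0, 1.0, 101)  # toy training scores
labels = scores_to_labels(train, [0.5, 0.95, 0.85])
print(labels)  # [0 1 0]
```

Under this rule, roughly a contamination fraction of the training data is labeled as outliers.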

training_forward(batch_x, net, criterion)

Define the forward step used in training.

training_prepare(X, y)

Define the training data loader (train_loader), the network (net), and the loss criterion (criterion).
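
The hook methods documented above (training_prepare, training_forward, epoch_update, inference_prepare, inference_forward, decision_function_update) let a subclass customize a fixed fit/score loop. The toy class below is a hypothetical, heavily simplified illustration of that template-method pattern in plain Python. It is not deepod's base class, and the call order it shows is inferred from the method descriptions.

```python
class ToyDetector:
    """Hypothetical template: fit() and decision_function() call overridable hooks."""

    def __init__(self, epochs=2):
        self.epochs = epochs
        self.calls = []  # records hook order for illustration

    def fit(self, X, y=None):
        train_loader, net, criterion = self.training_prepare(X, y)
        for _ in range(self.epochs):
            for batch_x in train_loader:
                self.training_forward(batch_x, net, criterion)
            self.epoch_update()
        self.net, self.criterion = net, criterion
        return self

    def decision_function(self, X):
        test_loader = self.inference_prepare(X)
        z, scores = [], []
        for batch_x in test_loader:
            batch_z, batch_s = self.inference_forward(batch_x, self.net, self.criterion)
            z.extend(batch_z)
            scores.extend(batch_s)
        z, scores = self.decision_function_update(z, scores)
        return scores

    # Default hooks: a real subclass would override these.
    def training_prepare(self, X, y):
        self.calls.append("training_prepare")
        return [X], None, None  # one batch, no network, no loss

    def training_forward(self, batch_x, net, criterion):
        self.calls.append("training_forward")

    def epoch_update(self):
        self.calls.append("epoch_update")

    def inference_prepare(self, X):
        self.calls.append("inference_prepare")
        return [X]

    def inference_forward(self, batch_x, net, criterion):
        self.calls.append("inference_forward")
        return list(batch_x), [abs(v) for v in batch_x]  # toy score: |x|

    def decision_function_update(self, z, scores):
        self.calls.append("decision_function_update")
        return z, scores


det = ToyDetector(epochs=2).fit([1.0, -3.0])
print(det.decision_function([1.0, -3.0]))  # [1.0, 3.0]
```

Keeping the loop in one place and pushing model-specific logic into hooks is what lets different detectors share the same fit and decision_function entry points.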