Algorithms

SPMF offers implementations of the following data mining algorithms.

Sequential Pattern Mining

These algorithms discover sequential patterns in a set of sequences. For a good overview of sequential pattern mining algorithms, please read this survey paper.
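To make the notion of a sequential pattern concrete, the following minimal sketch counts how many sequences of a toy database contain a given pattern as a subsequence. The data, the function names (`contains_pattern`, `support`) and the representation of a sequence as a list of itemsets are illustrative assumptions, not SPMF's API or file format.

```python
from typing import List, Set

# Assumption for this sketch: a sequence is an ordered list of itemsets.
Sequence = List[Set[str]]


def contains_pattern(sequence: Sequence, pattern: Sequence) -> bool:
    """Return True if `pattern` is a subsequence of `sequence`.

    Each itemset of the pattern must be included in a distinct itemset
    of the sequence, and the order of the itemsets must be preserved.
    """
    position = 0  # index of the next pattern itemset to match
    for itemset in sequence:
        if position < len(pattern) and pattern[position] <= itemset:
            position += 1
    return position == len(pattern)


def support(database: List[Sequence], pattern: Sequence) -> int:
    """Count the sequences of the database that contain the pattern."""
    return sum(contains_pattern(s, pattern) for s in database)


# Hypothetical toy database of three sequences.
database = [
    [{"a"}, {"b", "c"}, {"d"}],
    [{"a", "b"}, {"c"}],
    [{"b"}, {"a"}, {"d"}],
]
pattern = [{"a"}, {"d"}]
pattern_support = support(database, pattern)
```

A sequential pattern mining algorithm would report this pattern if its support is no less than a user-defined minimum support threshold.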

Sequential Rule Mining

These algorithms discover sequential rules in a set of sequences.

Sequence Prediction

These algorithms predict the next symbol(s) of a sequence based on a set of training sequences.

Itemset Mining

These algorithms discover interesting itemsets (sets of values) that appear in a transaction database (database records containing symbolic data). For a good overview of itemset mining, please read this survey paper.
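As an illustration of what "frequent itemset" means, here is a brute-force sketch that enumerates every itemset of a tiny transaction database and keeps those appearing in at least `minsup` transactions. It is only a sketch under stated assumptions: the function name `frequent_itemsets` and the data are hypothetical, and the exhaustive enumeration stands in for the far more efficient search strategies used by real algorithms.

```python
from itertools import combinations
from typing import Dict, FrozenSet, List, Set


def frequent_itemsets(
    transactions: List[Set[str]], minsup: int
) -> Dict[FrozenSet[str], int]:
    """Return every itemset whose support count is at least `minsup`."""
    items = sorted(set().union(*transactions))
    result: Dict[FrozenSet[str], int] = {}
    for size in range(1, len(items) + 1):
        for candidate in combinations(items, size):
            candidate_set = frozenset(candidate)
            # Support count: number of transactions containing the itemset.
            count = sum(candidate_set <= t for t in transactions)
            if count >= minsup:
                result[candidate_set] = count
    return result


# Hypothetical toy transaction database.
transactions = [
    {"bread", "milk"},
    {"bread", "butter"},
    {"bread", "milk", "butter"},
    {"milk"},
]
frequent = frequent_itemsets(transactions, minsup=2)
```

The enumeration grows exponentially with the number of distinct items, which is why practical algorithms prune the search space instead of testing every candidate.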

Episode Mining

These algorithms discover patterns (episodes) that appear in a single sequence of events.

Periodic Pattern Mining

These algorithms discover patterns that periodically appear in the data.

Graph Pattern Mining

These algorithms discover patterns in graphs.

Algorithms for mining patterns in a database of labelled graphs:

the TKG algorithm for mining the top-k frequent subgraphs in a graph database (Fournier-Viger, 2019, powerpoint)

the gSpan algorithm for mining the frequent subgraphs in a graph database (Yan et al., 2002)

Algorithms for mining patterns in a dynamic attributed graph:

the TSeqMiner algorithm (Fournier-Viger et al., 2019)



High-Utility Pattern Mining

These algorithms discover patterns having a high utility (importance) in different kinds of data. For a good overview of high utility itemset mining, you may read this survey paper and the high-utility pattern mining book.

Association Rule Mining

These algorithms discover interesting associations between symbols (values) in a transaction database (database records with binary attributes).
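Association rules are commonly evaluated with the support and confidence measures. The sketch below computes both for a single rule X -> Y over a toy transaction database; the function name `rule_measures` and the data are hypothetical, and this is not how SPMF exposes these measures.

```python
from typing import List, Set, Tuple


def rule_measures(
    transactions: List[Set[str]], antecedent: Set[str], consequent: Set[str]
) -> Tuple[float, float]:
    """Return (support, confidence) of the rule antecedent -> consequent.

    support    = fraction of transactions containing antecedent and consequent
    confidence = that count divided by the count of the antecedent alone
    """
    both = antecedent | consequent
    n_antecedent = sum(antecedent <= t for t in transactions)
    n_both = sum(both <= t for t in transactions)
    support = n_both / len(transactions)
    confidence = n_both / n_antecedent if n_antecedent else 0.0
    return support, confidence


# Hypothetical toy transaction database.
transactions = [
    {"bread", "milk"},
    {"bread", "butter"},
    {"bread", "milk", "butter"},
    {"milk"},
]
rule_support, rule_confidence = rule_measures(transactions, {"bread"}, {"milk"})
```

Here the rule {bread} -> {milk} holds in two of the four transactions, and in two of the three transactions that contain bread.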

Stream mining

These algorithms discover various kinds of patterns in a stream (an infinite sequence of database records, i.e. transactions).

the estDec algorithm for mining recent frequent itemsets in a data stream (Chang & Lee, 2003)

the estDec+ algorithm for mining recent frequent itemsets in a data stream (Shin et al., 2014)

the CloStream algorithm for mining frequent closed itemsets in a data stream (Yen et al., 2009)

algorithms for mining the top-k high utility itemsets from a data stream with a window: the FHMDS and FHMDS-Naive algorithms (Dawar et al., 2017)

Clustering

These algorithms automatically find clusters in different kinds of data.

the original K-Means algorithm (MacQueen, 1967)

the Bisecting K-Means algorithm (Steinbach et al., 2000)

algorithms for density-based clustering:

the DBScan algorithm (Ester et al., 1996)

the Optics algorithm to extract a cluster ordering of points, which can then be used to generate DBScan-style clusters and more (Ankerst et al., 1999)

a hierarchical clustering algorithm

a tool called Cluster Viewer for visualizing clusters

a tool called Instance Viewer for visualizing the input of clustering algorithms

Time series mining

These algorithms perform various tasks to analyze time series data.

an algorithm for converting a time series to a sequence of symbols using the SAX representation of time series. Note that converting a set of time series with SAX yields a sequence database, which makes it possible to apply traditional algorithms for sequential rule mining and sequential pattern mining to time series (SAX, 2007).

algorithms for calculating the prior moving average of a time series (to remove noise)

algorithms for calculating the cumulative moving average of a time series (to remove noise)

algorithms for calculating the central moving average of a time series (to remove noise)

an algorithm for calculating the median smoothing of a time series (to remove noise)

an algorithm for calculating the exponential smoothing of a time series (to remove noise)

an algorithm for calculating the min max normalization of a time series

an algorithm for calculating the autocorrelation function of a time series

an algorithm for calculating the standardization of a time series

an algorithm for calculating the first and second order differencing of a time series

an algorithm for calculating the piecewise aggregate approximation of a time series (to reduce the number of data points of a time series)

an algorithm for calculating the linear regression of a time series (using the least squares method)

an algorithm for splitting a time series into segments of a given length

an algorithm for splitting a time series into a given number of segments

algorithms to cluster time series (group time series according to their similarities). This can be done by applying the clustering algorithms offered in SPMF (K-Means, Bisecting K-Means, DBScan, OPTICS, Hierarchical clustering) on time series.

a tool called Time Series Viewer for visualizing time series
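Three of the time series transformations listed in this section can be sketched in a few lines. The code below is an illustration only, not SPMF's implementation: the function names are hypothetical, the prior moving average is assumed to be the mean of a point and the (up to) `window - 1` points that precede it, and the piecewise aggregate approximation assumes that the series length is a multiple of the number of segments.

```python
from typing import List


def prior_moving_average(series: List[float], window: int) -> List[float]:
    """Replace each point by the mean of itself and the preceding points
    within the window (assumed definition, shorter window at the start)."""
    smoothed = []
    for i in range(len(series)):
        start = max(0, i - window + 1)
        chunk = series[start:i + 1]
        smoothed.append(sum(chunk) / len(chunk))
    return smoothed


def min_max_normalize(series: List[float]) -> List[float]:
    """Rescale the values linearly to the [0, 1] interval."""
    low, high = min(series), max(series)
    if high == low:
        return [0.0 for _ in series]  # constant series: no spread to rescale
    return [(x - low) / (high - low) for x in series]


def piecewise_aggregate(series: List[float], segments: int) -> List[float]:
    """Reduce the series to `segments` points, each the mean of one segment."""
    if len(series) % segments != 0:
        raise ValueError("this sketch needs len(series) divisible by segments")
    size = len(series) // segments
    return [sum(series[i:i + size]) / size for i in range(0, len(series), size)]


# Hypothetical toy time series.
series = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
smoothed = prior_moving_average(series, window=3)
normalized = min_max_normalize(series)
reduced = piecewise_aggregate(series, segments=3)
```

The SAX conversion mentioned above goes one step further than the piecewise aggregate approximation by mapping each segment mean to a symbol; that step is not shown here.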

Classification

the ID3 algorithm for building decision trees (Quinlan, 1986)

Text mining

an algorithm for classifying text documents using a Naive Bayes classifier approach (S. Raghu, 2015)

an algorithm for clustering texts using the tf*idf measure (S. Raghu, 2015)

Data structures

red-black tree,

itemset-tree,

binary tree,

KD-tree,

triangular matrix.
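As an example of why such structures are useful in pattern mining, a triangular matrix can store one counter per unordered pair of items using roughly half the memory of a full square matrix. The sketch below is an assumption-laden illustration: the class and method names are hypothetical (not SPMF's implementation), and items are assumed to be integers from 0 to size - 1.

```python
from itertools import combinations
from typing import List, Tuple


class TriangularMatrix:
    """Counters for unordered pairs (i, j) with i != j."""

    def __init__(self, size: int) -> None:
        self.size = size
        # Row i only stores the cells for pairs (i, j) with j > i.
        self.rows: List[List[int]] = [[0] * (size - i - 1) for i in range(size)]

    def _index(self, i: int, j: int) -> Tuple[int, int]:
        if i == j:
            raise ValueError("a pair needs two distinct items")
        if i > j:
            i, j = j, i  # (i, j) and (j, i) share the same cell
        return i, j - i - 1

    def increment(self, i: int, j: int) -> None:
        row, col = self._index(i, j)
        self.rows[row][col] += 1

    def get(self, i: int, j: int) -> int:
        row, col = self._index(i, j)
        return self.rows[row][col]


# Hypothetical usage: count how often each pair of items co-occurs.
transactions = [[0, 1, 2], [0, 2], [1, 2, 3]]
matrix = TriangularMatrix(4)
for transaction in transactions:
    for a, b in combinations(sorted(transaction), 2):
        matrix.increment(a, b)
```

With four items the structure allocates six counters instead of sixteen.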

Tools

A tool for generating a synthetic transaction database

A tool for generating a synthetic sequence database

A tool for generating a synthetic sequence database with timestamps

A tool for calculating statistics about a transaction database

A tool for calculating statistics about a transaction database with utility information

A tool for calculating statistics about a sequence database

A tool for converting a sequence database to a transaction database

A tool for converting a transaction database to a sequence database

A tool for converting a text file to a sequence database (each sentence becomes a sequence)

A tool for converting a sequence database in various formats (CSV, KOSARAK, BMS, IBM...) to a sequence database in SPMF format

A tool for converting a transaction database in various formats (CSV...) to a transaction database in SPMF format

A tool for converting time-series to a sequence database

A tool to generate utility values for a transaction database

A tool to add timestamps to a sequence database

A tool for removing utility information from a database having utility information

A tool to resize a database in SPMF format (a text file) using a percentage of lines of data from an original database.

A tool for visualizing time-series

Visual map of algorithms

You can visualize the relationship between the various data mining algorithms offered in SPMF by clicking on this map (last updated : 2015/09/12 - SPMF 0.97):