Short Bytes: FAISS is an open-source library released by Facebook for similarity search and clustering of high-dimensional data. The library is aimed at large, complex datasets such as image and video collections, which may not fit in RAM all at once.

With the advent of highly successful Machine Learning methods, there has been a boom in big datasets across varied domains. With these huge datasets, hardware becomes a bottleneck. Processing these datasets requires high memory bandwidth and processor capabilities. Furthermore, indexing the data points, clustering and search become highly demanding.

Researchers at Facebook AI Research (FAIR) recently published a research paper describing an efficient design for clustering and similarity search. Their new algorithmic structure performs much faster than the previous state-of-the-art algorithms and uses GPUs for higher memory bandwidth and computational throughput.

Based on their research, they have created a library called FAISS and open-sourced it. Although the algorithms for clustering and similarity search are well-known, this library optimizes those algorithms to perform efficiently on GPUs. Some of the algorithms implemented in the library include –

Fast K-Nearest Neighbour

QuickSelect

WarpSelect

K-Means clustering
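
To make the first two items more concrete, here is a minimal CPU-only sketch of exact k-nearest-neighbour search that uses quickselect to pick the k smallest distances. It is an illustration of the general technique, not FAISS's code: the function names (squared_l2, quickselect_smallest, knn) are hypothetical, the distance is assumed to be squared Euclidean, and the GPU-specific WarpSelect is not shown.

```python
import random


def squared_l2(a, b):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def quickselect_smallest(values, k):
    """Return the k smallest entries of `values`, in no particular order.

    Classic quickselect: partition around a random pivot, then continue
    only in the side that contains position k - 1. Average cost is O(n),
    compared with O(n log n) for sorting the whole list.
    """
    items = list(values)  # work on a copy, leave the input untouched
    if k <= 0:
        return []
    if k >= len(items):
        return items
    target = k - 1
    lo, hi = 0, len(items) - 1
    while lo < hi:
        pivot = items[random.randint(lo, hi)]
        i, j = lo, hi
        while i <= j:
            while items[i] < pivot:
                i += 1
            while items[j] > pivot:
                j -= 1
            if i <= j:
                items[i], items[j] = items[j], items[i]
                i += 1
                j -= 1
        # Now items[lo..j] <= pivot and items[i..hi] >= pivot.
        if target <= j:
            hi = j
        elif target >= i:
            lo = i
        else:
            break  # the target position already holds its final value
    return items[:k]


def knn(database, query, k):
    """Exact k-NN by brute force: score every vector, then select k.

    Returns a list of (squared_distance, index) pairs, nearest first.
    """
    scored = [(squared_l2(vec, query), idx) for idx, vec in enumerate(database)]
    return sorted(quickselect_smallest(scored, k))
```

Calling knn(points, query, k) on a list of equal-length numeric vectors returns the k closest entries together with their positions in the list. The brute-force scoring step is the part that a library like FAISS would run on the GPU; this sketch only shows the selection logic on the CPU.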

To demonstrate how the library performs, the following figure shows a case where only the first and the last image are given, and the algorithm computes the intermediate transitional images from a collection of 95 million images.

Top Features of the FAISS Library –

Written in C++ with complete Python wrappers

Supports single/multiple GPUs

Highly scalable; designed for high-dimensional vectors, typically with 100 or more dimensions

Built on BLAS and CUDA libraries

Reported to be 8.5x faster than the previous state-of-the-art GPU implementations

The FAISS library is available on GitHub.