We have created a pioneering tool, powered by data science and machine learning, that analyzes ICO white papers and detects plagiarism in them. Uploading white papers for comparison also expands our database and benefits the community. The corpus currently comprises over 1,200 documents (2.9 GB) scraped from various ICO lists.

Titan thus empowers users, and Invictus's analysts, to evaluate the originality and legitimacy of early-stage investment opportunities within the ICO space and its specific sectors.

We will be creating additional machine-learning-powered tools, such as a code-quality algorithm that determines whether a token smart contract conforms to best practices. Code quality is often a strong predictor of a team's technical strength and long-term success.

How do I use it?

Using the Titan AI tool is a simple process. All you need to do is:

Upload a project white paper.

The tool will then:

Detect whether content has been plagiarized from other papers in our large corpus of white-paper data. If matched content is found, it returns the relevant paragraphs from the uploaded paper alongside the matching paragraphs and papers from our database.

Return a selection of the most similar white papers for comparison with your uploaded file. The most similar papers are returned even if there is no direct plagiarism match.

Let you visually explore the ‘topics’ shared between the documents.

How does it work?

The tool analyzes ICO white papers and identifies plagiarized content. Titan can even detect cases where plagiarized content has been restructured and synonyms have been substituted.
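To see why naive word matching fails against synonym substitution, consider the toy below: it normalizes a hand-picked synonym table to a canonical token before measuring overlap. This is purely illustrative — the synonym map is invented for the example, and Titan's robustness comes from learned document representations rather than a lookup table.

```python
# Toy synonym table (an assumption for this example only).
SYNONYMS = {
    "purchase": "buy", "acquire": "buy",
    "ledger": "blockchain", "chain": "blockchain",
    "coin": "token", "currency": "token",
}

def normalize(text):
    """Lowercase, split, and collapse known synonyms to one canonical form."""
    return {SYNONYMS.get(w, w) for w in text.lower().split()}

def jaccard(a, b):
    """Set overlap between two word sets."""
    return len(a & b) / len(a | b) if a | b else 0.0

original = "users purchase the coin on the chain"
reworded = "users acquire the currency on the ledger"

# Raw word overlap is low because the copied text was reworded...
raw_score = jaccard(set(original.lower().split()),
                    set(reworded.lower().split()))
# ...but after canonicalizing synonyms the overlap is restored.
norm_score = jaccard(normalize(original), normalize(reworded))
```

Embedding-based methods generalize this idea: words with similar meanings land near each other in vector space, so no explicit synonym list is needed.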

The second stage of the Titan tool has now been released. It returns a selection of the most similar projects so that you can thoroughly analyze the sector in which the ICO is competing.

Crowded sectors and strong competition can seriously affect a project’s ICO raise and long-term prospects. Topics shared between the most similar documents can be explored visually.

This is performed using artificial neural networks with doc2vec pre-processing, along with Latent Dirichlet Allocation, to create abstract representations of the white papers, which are then clustered and visualized using t-distributed stochastic neighbor embedding (t-SNE).
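The overall shape of such a pipeline can be sketched with scikit-learn: documents become bag-of-words counts, LDA compresses those into abstract topic-mixture vectors, and t-SNE projects the vectors to two dimensions for plotting. This is a minimal sketch under stated assumptions — the toy corpus, the topic count, and the use of scikit-learn's LDA in place of the doc2vec stage are all illustrative choices, not Titan's actual configuration.

```python
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.decomposition import LatentDirichletAllocation
from sklearn.manifold import TSNE

# Toy corpus standing in for white-paper text (invented for this example).
docs = [
    "token sale smart contract ethereum gas",
    "smart contract token ethereum wallet",
    "supply chain logistics shipping tracking",
    "logistics shipping supply chain freight",
    "payment network transaction fees merchant",
    "merchant payment transaction network card",
]

counts = CountVectorizer().fit_transform(docs)

# LDA turns each document into an abstract topic-mixture vector.
lda = LatentDirichletAllocation(n_components=3, random_state=0)
topic_mix = lda.fit_transform(counts)  # shape: (6, 3); each row sums to 1

# t-SNE projects those vectors to 2-D so similar papers can be
# plotted near each other and explored visually.
coords = TSNE(n_components=2, perplexity=2.0, init="random",
              random_state=0).fit_transform(topic_mix)  # shape: (6, 2)
```

In this layout, papers sharing a topic mixture end up close together, which is what makes the visual exploration of shared topics between documents possible.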

Users can also explore a graphical representation of our corpus of documents.