While I’m fairly new to ethtrader and I don’t know too much about every crypto project out there, I am an old dog when it comes to trading and using data and computers to help me make decisions.

I want to share with you a new way to think about trading and investing in crypto: how we can take the data that is already out there and transform it into information, into inputs and signals for our trading needs. I want to show you how to think like an experienced trader/investor and how to use data to your advantage.

News and Sentiment Data

Today data is everywhere; the trick is figuring out how to make use of it. New York hedge funds pay for satellite data so they can see how many cars are parked in Home Depot’s parking lot, or monitor the shadows of oil tanks to estimate weekly storage inventories. When there is money involved, you can bet your ass someone smart is going to find a way to use data to help them trade better.

In this post I am going to share a simple example of how you can use data from the ethtrader subreddit and transform it into useful information that hopefully is helpful to you as a crypto trader/investor.

Word cloud of ethtrader Daily Discussion comments, Aug 17th

Same word cloud without the ETH shape

The underlying values for the words (the font sizes) were generated with a TF-IDF algorithm, using the open source Python packages scikit-learn and spaCy. While you don’t need to understand the exact mechanics of how the algorithm works, let me just show you what a word cloud looks like if we skip this algorithm and use a simple word frequency count instead.

Which word cloud provides better information? If you didn’t know anything about ethtrader, maybe the second word cloud is better, in the sense that it gives you a really good idea of what the subreddit is all about. But assuming you are reading this in ethtrader, you already know that. The first algorithm, which attempts to display the words that are most different today versus previous days, gives you a good idea of what was on people’s minds today, and thus a proxy for today’s ‘news’. For a full list of the words and weights using the TF-IDF model, please see this document.
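To make the difference between the two word clouds concrete, here is a minimal sketch of both weightings. It is not the code behind the images above (those used scikit-learn and spaCy); it uses only the Python standard library, a naive whitespace tokenizer, and comment text I made up for illustration. The IDF formula is the smoothed form that scikit-learn uses by default, without its normalization step.

```python
import math
from collections import Counter

def tokenize(text):
    """Lowercase and split on whitespace. A real pipeline would also lemmatize."""
    return text.lower().split()

def raw_counts(today):
    """Plain word frequency: how often each word appears today."""
    return Counter(tokenize(today))

def tfidf_weights(today, previous_days):
    """Today's word frequency, boosted for words that are rare across all days."""
    docs = [set(tokenize(d)) for d in previous_days] + [set(tokenize(today))]
    n_docs = len(docs)
    weights = {}
    for word, tf in Counter(tokenize(today)).items():
        df = sum(1 for doc in docs if word in doc)   # number of days containing the word
        idf = math.log((1 + n_docs) / (1 + df)) + 1  # smoothed inverse document frequency
        weights[word] = tf * idf
    return weights

# Invented example text, standing in for daily discussion comments.
previous_days = [
    "eth price is going up buy eth",
    "eth price dump sell eth hodl",
    "eth gas fee high eth price flat",
]
today = "eth price flat turkish lira recession eth hedge lira eth"

counts = raw_counts(today)
weights = tfidf_weights(today, previous_days)
print(counts.most_common(3))
print(sorted(weights, key=weights.get, reverse=True)[:3])
```

With this toy data the frequency count puts "eth" on top, because it is said every day, while the TF-IDF weighting puts "lira" on top, because it only shows up today. That is the same effect you see when comparing the word clouds.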

Customized Models

The TF-IDF model I used to create the word cloud is ‘off the shelf’. But it’s not hard to change the settings or rework part of the algorithm’s calculation to come up with different results. At the bottom of this document I included another model, called bag of terms, which you can also use to summarize the document. The point is that there are many more refinements that can be made. But that is above my pay grade and more for my partner, who has a PhD in Machine Learning.

Sentiment Analysis

In the next few weeks I hope to show you guys how to make a simple sentiment analysis indicator. While the finished code may take a bit of time to run, let me explain to you how we can build a very simple ethtrader sentiment analysis indicator.

First we take in all the post titles for a given time frame, let’s say 1 day.

Second we analyze the title of each post, and we classify that text/sentence as either positive or negative.

Third we aggregate all of the posts over the given time period, let’s say yesterday, and compute summary scores like the average or median sentiment score.

Then we can compare yesterday’s sentiment score to, say, the score from 2 weeks ago and get an objective value for how much sentiment has changed. You could even use it as the input to a sentiment indicator and plot it in your charts: just like some people have price in one window and volume in a window below, you could have sentiment as well. Shit, you can go on places like ‘blockchain stats’ and already see stuff like that. Except it doesn’t integrate well with most traders’ preferred trading location, i.e. TradingView.
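The steps above can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the finished indicator: the word lists, the post titles and the dates are all invented, and the scoring function is the crudest one possible (a real version would use a proper lexicon or a trained classifier).

```python
from datetime import date, timedelta
from statistics import mean, median

# Hypothetical word lists, for illustration only.
POSITIVE = {"moon", "bullish", "gain", "rally", "up"}
NEGATIVE = {"dump", "bearish", "crash", "fear", "down"}

def score_title(title):
    """Step 2: classify a title as positive (1), negative (-1) or neutral (0)."""
    words = title.lower().split()
    raw = sum(w in POSITIVE for w in words) - sum(w in NEGATIVE for w in words)
    return (raw > 0) - (raw < 0)

def daily_sentiment(posts, day):
    """Steps 1 and 3: take one day's titles and aggregate their scores."""
    scores = [score_title(title) for posted, title in posts if posted == day]
    if not scores:
        return None
    return {"mean": mean(scores), "median": median(scores), "n": len(scores)}

def sentiment_change(posts, day, lookback_days=14):
    """Final step: compare a day's mean score with the day `lookback_days` earlier."""
    now = daily_sentiment(posts, day)
    then = daily_sentiment(posts, day - timedelta(days=lookback_days))
    if now is None or then is None:
        return None
    return now["mean"] - then["mean"]

# Invented (date, title) pairs standing in for scraped post titles.
yesterday = date(2018, 8, 17)
posts = [
    (yesterday - timedelta(days=14), "ETH rally looks bullish"),
    (yesterday - timedelta(days=14), "Big gain today"),
    (yesterday - timedelta(days=14), "Daily discussion"),
    (yesterday, "Market crash and fear everywhere"),
    (yesterday, "Is this the dump"),
    (yesterday, "ETH going up again"),
]

change = sentiment_change(posts, yesterday)
print(daily_sentiment(posts, yesterday))
print(change)
```

Running `sentiment_change` for every day in a range gives you a time series, which is what you would plot in a window under price and volume.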

What is a positive or negative post?

To create the sentiment score above we rely on a function that scores each post as positive or negative. But how does that really work? In the simple case, it adds up positive and negative words and does a calculation. But what about double negatives? What about irony? What about slang words? There are lots of things that computer models don’t pick up on, and they are always changing. The point here is that saying you have a sentiment model is one thing, but being able to play around with the settings is what is really key.
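Here is a small sketch of why the settings matter. It compares a plain word-count scorer with a variant that flips the sign of a sentiment word when the word right before it is a negator. The word lists and the one-word negation rule are my own assumptions for illustration, not how any particular library does it.

```python
# Hypothetical word lists, for illustration only.
POSITIVE = {"good", "great", "bullish"}
NEGATIVE = {"bad", "terrible", "bearish"}
NEGATORS = {"not", "never", "no"}

def count_score(text):
    """Simple case: positive word count minus negative word count."""
    words = text.lower().split()
    return sum(w in POSITIVE for w in words) - sum(w in NEGATIVE for w in words)

def negation_aware_score(text):
    """Same count, but a sentiment word directly after a negator has its sign flipped."""
    words = text.lower().split()
    score = 0
    for i, word in enumerate(words):
        value = (word in POSITIVE) - (word in NEGATIVE)
        if i > 0 and words[i - 1] in NEGATORS:
            value = -value
        score += value
    return score

for text in ["this is not bad", "not a good day", "oh great another dip"]:
    print(text, count_score(text), negation_aware_score(text))
```

The first sentence is fixed by the negation rule (it goes from -1 to 1). The second is not, because the negator sits two words before "good", so you would have to widen the window. The third is irony, and both scorers happily call it positive. Each of these is a setting someone has to choose and keep tuning.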

Nvest.ai

Congrats if you made it this far in the article, because I want to share with you my vision of the future of crypto data and analytics. I am a cofounder of nvest.ai, and we want to build a technology platform where you can access a wide range of trading/investing tools to help you do your job better. We want to empower you as a trader by giving you access to prebuilt machine learning models, like the sentiment analysis and word summaries discussed above, in addition to many other awesome tools like automatic technical indicators, portfolio statistics, or our pattern finder tool, which is like a ‘Shazam’ for charts! We have a working prototype for everything I just mentioned.

On Wall Street, professionals pay over $25,000 a year to use a Bloomberg (I have one at work and it’s pretty awesome), but it’s not so easy to use and it’s fairly limited in terms of next generation machine learning tools. With nvest we are going to use the power of crypto to build a decentralized community and make something even better, without the high fees. Do check out our website or email me if you have any questions.

I don’t like to bad-mouth anyone, but there are a few ‘trader’ focused projects out there already and IMO they are crap. A few have had funding for over 6 months and haven’t released anything but a glorified coinmarketcap leaderboard or a simple sentiment analysis like the one I described above. I mention this not to put them down but to tell you we can do so much better. Not because I am an awesome Zuckerberg hacker, but because my partner Saul and the rest of the team we have lined up are experienced software developers, and because we understand financial markets and we know machine learning.

At this point I only ask that you help us increase awareness of the project. Help me get this project off the ground and build our social media presence, and I promise to build you something awesome. Whether you are an addicted day trader, a crypto newbie, an experienced trader, a hard-core Linux programmer or someone who lives off their iPhone, nvest will be a place for all of us.

— —

Topic Model Results for the past 2 weeks of ethtrader comments

topic 0 : lubin rsi miner node transfer sign stock draw fact security

topic 1 : fee coffee cmc bakkt developer starbuck wine accept nyse payment

topic 2 : ema property buterin squeeze hedge gentleman recession lira turkish vechain

topic 3 : cdp delta reversal liquidate stock hand free ta intrinsic equity

topic 4 : panic cdp hand ln write miner delay liquidate free sad

topic 5 : delay cdp decision sec trader hand gas bag panic stock

topic 6 : ta analysis keto fork study pot miner drink trader etheroll

topic 7 : cdp liquidate collateral troll fundamental fact stock developer nbsp liquidity

topic 8 : ffg rsi ta fee tesla tweet confirmation bug reversal ema

topic 9 : war address player storm tweet mlb pro hash private gas

Bag of Terms of Eth Trader Comments Aug 18

{“weekend”: 5, “mid”: 4, “rip”: 4, “absolute”: 3, “write”: 4, “doom”: 3, “digit”: 3, “gut”: 4, “suck”: 3, “massively”: 3, “insane”: 4, “explain”: 3, “listen”: 6, “comfortable”: 3, “fear”: 6, “reminder”: 6, “mental”: 3, “process”: 4, “random”: 3, “cryptocurrencie”: 3, “peak”: 3, “cold”: 3, “tax”: 3, “optimistic”: 4, “catch”: 3, “loan”: 3, “peter”: 3, “brandt”: 3, “transfer”: 6, “fee”: 6, “movement”: 4, “bagholder”: 3, “house”: 6, “flat”: 5, “cuz”: 3, “recession”: 6, “turkish”: 4, “lira”: 5, “augur”: 6, “win”: 6, “ahead”: 4, “promise”: 3, “reflect”: 3, “remind”: 3, “lunatic”: 3, “push”: 6, “gwei”: 3, “gentleman”: 9, “triangle”: 3, “settle”: 3, “trader”: 5, “scenario”: 4, “king”: 4, “stand”: 3, “bois”: 3, “dapp”: 12, “launch”: 7, “correction”: 3, “ltc”: 6, “s”: 7, “silly”: 3, “reject”: 3, “squeeze”: 9, “target”: 6, “somewhat”: 3, “face”: 3, “wallet”: 5, “whatev”: 3, “liquidity”: 5, “compare”: 3, “growth”: 4, “grow”: 6, “sit”: 5, “pressure”: 5, “hedge”: 9, “book”: 3, “player”: 3, “fast”: 4, “family”: 5, “affect”: 8, “neo”: 4, “portfolio”: 3, “ema”: 87, “prediction”: 4, “st”: 3, “hodler”: 4, “investor”: 14, “buterin”: 14, “demand”: 6, “strange”: 3, “tough”: 5, “bore”: 3, “sick”: 3, “benefit”: 3, “reduce”: 4, “lead”: 3, “hand”: 4, “die”: 3, “list”: 3, “stable”: 5, “code”: 3, “service”: 3, “delete”: 4, “housing”: 3, “speculation”: 5, “economy”: 3, “venezuela”: 3, “economic”: 3, “personal”: 3, “sneaker”: 3, “claim”: 4, “effect”: 6, “user”: 8, “nah”: 3, “release”: 4, “speculative”: 3, “individual”: 3, “cent”: 3, “joe”: 5, “speak”: 3, “layer”: 4, “trust”: 4, “assume”: 4, “boom”: 3, “country”: 3, “infrastructure”: 3, “mover”: 3, “advantage”: 4, “sound”: 4, “surprised”: 3, “metric”: 3, “successfully”: 3, “frame”: 3, “rid”: 3, “classic”: 3, “generally”: 5, “nano”: 4, “tether”: 7, “remindmebot”: 5, “subject”: 5, “feedback”: 3, “drive”: 3, “security”: 4, “customer”: 3, “decline”: 3, “tank”: 3, “overall”: 3, “stock”: 5, “suggest”: 4, “vechain”: 5, “paper”: 3, “emerge”: 3, 
“common”: 3, “property”: 12, “wealth”: 7, “sync”: 3, “field”: 3, “valuation”: 4, “apply”: 4, “mr”: 3, “estate”: 3, “cdp”: 3, “margin”: 3, “convert”: 4, “institution”: 6, “public”: 5, “strategy”: 3, “leader”: 3, “mental process”: 6, “peter brandt”: 6, “ema ema”: 170, “mover advantage”: 6, “remindmebot subject”: 8, “grow wealth”: 6, “mr buterin”: 6, “property property”: 6, “ema ema ema”: 252}
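For readers wondering what a bag of terms is, here is a minimal sketch of the basic idea: count every one-, two- and three-word term and keep the ones that show up often enough. The `max_n` and `min_count` parameters and the sample tokens are my own assumptions. Note that the numbers in the output above are not plain counts (the three-word "ema ema ema" scores higher than the single word "ema"), so the model that produced them clearly weights terms differently from this sketch.

```python
from collections import Counter

def bag_of_terms(tokens, max_n=3, min_count=3):
    """Count all terms of 1..max_n consecutive tokens, keep those seen at least min_count times."""
    counts = Counter()
    for n in range(1, max_n + 1):
        for i in range(len(tokens) - n + 1):
            counts[" ".join(tokens[i:i + n])] += 1
    return {term: count for term, count in counts.items() if count >= min_count}

# Invented tokens standing in for cleaned, lemmatized comment text.
tokens = "ema ema ema ema hedge ema fear fear fear".split()
terms = bag_of_terms(tokens)
print(terms)
```

Unlike TF-IDF, nothing here compares today with previous days, so very common words will dominate unless you filter them out first.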