I have watched some of Sentdex’s YouTube videos about matplotlib and Pandas. The videos are aimed at beginners, but they give a good idea of the frameworks used. He doesn’t focus on style, which is good because it avoids the typical overload for a beginner. The result of the first four parts of his matplotlib course was this simple piece of code to load stock data and draw a graph.

    import matplotlib.pyplot as plt
    import matplotlib.dates as mdates
    import urllib.request
    import numpy as np

    def plotdata(stock):
        url = 'http://chartapi.finance.yahoo.com/instrument/1.0/'+stock+'/chartdata;type=quote;range=10y/csv'
        csvdata = urllib.request.urlopen(url).read()
        linedata = csvdata.split(b'\n')
        # count the header lines in front of the first data row
        skip = 0
        for line in linedata:
            if line and line[0:1].isdigit():
                break
            skip += 1
        # load the columns, converting the date column to matplotlib date numbers
        date, closep, highp, lowp, openp, volume = np.loadtxt(linedata, unpack=True, delimiter=',', skiprows=skip, converters={0: mdates.bytespdate2num('%Y%m%d')})
        fig, ax1 = plt.subplots(nrows=1, ncols=1)
        ax1.plot_date(date,closep,'-', label='Price')
        fig.autofmt_xdate()
        ax1.grid(True, color='g',linestyle='-')
        plt.subplots_adjust(bottom=0.13,hspace=0.2,wspace=0.2)
        plt.legend()
        plt.xlabel('Date')
        plt.ylabel('Price')
        plt.title('Graph')
        plt.show()

    plotdata('TSLA')

This piece of code is simple but not very useful. It loads the column data into several variables and plots one of the columns. The data ends up in local variables of a plot function, so any further calculation would have to be inserted into that plot function, which is very ugly. The readability of the code is already low.

So the code has to change. It’s a good idea to separate the different tasks, which are

loading the data

drawing a graph
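A minimal sketch of that split could look like the following. The names `parse_quotes` and `plot_close` are my own and hypothetical, and the CSV layout (date, close, high, low, open, volume, with header lines that do not start with a digit) is assumed from the `loadtxt` call in the original code. The loading part uses only the standard library, so the data can be inspected or used in calculations without opening a plot window.

```python
import csv
from datetime import datetime


def parse_quotes(csvdata: bytes):
    """Return (dates, closes) parsed from raw CSV bytes.

    Assumed layout: date,close,high,low,open,volume with dates as YYYYMMDD.
    Header lines (anything not starting with a digit) are skipped.
    """
    dates, closes = [], []
    lines = csvdata.decode('utf-8').splitlines()
    for row in csv.reader(lines):
        if not row or not row[0][:1].isdigit():
            continue
        dates.append(datetime.strptime(row[0], '%Y%m%d'))
        closes.append(float(row[1]))
    return dates, closes


def plot_close(dates, closes, title='Graph'):
    """Plot closing prices; knows nothing about where the data came from."""
    import matplotlib.pyplot as plt  # imported here so parsing works without it

    fig, ax = plt.subplots()
    ax.plot(dates, closes, '-', label='Price')
    fig.autofmt_xdate()
    ax.grid(True)
    ax.set_xlabel('Date')
    ax.set_ylabel('Price')
    ax.set_title(title)
    ax.legend()
    plt.show()
```

With this split, a calculation on `closes` can sit between the two calls instead of inside the plot function.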

The first idea was to use Pandas instead of pure numpy code:

    from pandas_datareader import data as pddata
    from datetime import datetime as dt
    from dateutil.relativedelta import relativedelta

    ticker = 'TSLA'
    years = 10
    end = dt.today()
    start = end - relativedelta(years=years)
    stock = pddata.get_data_yahoo(ticker, start=start, end=end )

The Pandas function is easy to handle. Pandas stores all the columns in one data frame. A Pandas data frame offers numerical calculations, which opens the door for future ambitions with stocks. Each column of a data frame can be plotted with matplotlib. But

Metadata such as the ticker isn’t stored together with the data frame.

Every time the program runs, the data has to be downloaded again, which isn’t good.

Even though loading the data is much cleaner, or at least looks much cleaner, there is still a need to handle more than one stock.

Storing all the data could be done in a dictionary like this:

    stocks = dict()
    stock = pddata.get_data_yahoo(ticker, start=start, end=end )
    stocks['TSLA'] = stock
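For several tickers the dictionary idea might be sketched like this. The helper `load_many` and its `fetch` parameter are hypothetical; `fetch` stands in for a download call such as `pddata.get_data_yahoo`, so the sketch runs without a network connection.

```python
def load_many(tickers, fetch):
    """Return {ticker: data} using fetch(ticker) for each download.

    fetch is a hypothetical stand-in for a call such as
    pddata.get_data_yahoo(ticker, start=start, end=end).
    """
    return {ticker: fetch(ticker) for ticker in tickers}


# Usage with a dummy fetch function instead of a real download:
stocks = load_many(['TSLA', 'EBAY'], fetch=lambda ticker: f'<data for {ticker}>')
```

The ticker lives only in the dictionary key here. As soon as a value is passed around on its own, that piece of metadata is lost again.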

A better solution is to link a stock with all its data to an object, and the definition of an object is a class. I’m starting with a first concept:

    class Stock(object):
        def __init__(self, symbol):
            self.ticker = symbol

        @classmethod
        def from_yahoo(cls, ticker : str, years : int):
            .....
            stock = Stock(ticker)
            stock.data = pddata.get_data_yahoo(ticker, start=start, end=end )
            return stock

    stock1 = Stock.from_yahoo('TSLA',1)
    stock2 = Stock.from_yahoo('EBAY',1)

At this point we are able to load the data. The next step is to save the data on disk. Using pickle is the easy method. For file handling I’m using the new pathlib in Python 3.

    import pickle
    from pathlib import Path

    class Stock(object):
        ...
        def to_file(self, path : str):
            # one file per ticker, e.g. <path>/TSLA.dat
            datafile = Path(path) / Path(self.ticker + '.dat')
            with datafile.open('wb') as f:
                pickle.dump(self,f,protocol=3)

        @classmethod
        def from_file(cls, path : str, ticker : str):
            datafile = Path(path) / Path(ticker + '.dat')
            with datafile.open('rb') as f:
                stock = pickle.load(f)
            return stock

So far we are able to store a Stock object on disk and read it back. But why should we decide how to load a stock? Let a piece of Python code do that:

    import pandas as pd

    class Stock(object):
        ...
        @classmethod
        def load(cls, path : str, symbol : str, start_date : dt = dt(2006, 1, 1)):
            datafile = Path(path) / Path(symbol + '.dat')
            if datafile.exists():
                stock = cls.from_file(path, symbol)
            else:
                stock = None
            # download when there is no file or the stored data starts too late
            if (not stock) or start_date < stock.data.index.min():
                stock = cls._from_yahoo(symbol, start=start_date, end=dt.today() )
                stock.to_file(path)
            return stock

        def update(self, path : str):
            if self.querydate < dt.today():
                try:
                    # fetch only the data since the last query
                    newdata = pddata.get_data_yahoo(self.ticker, start = self.querydate, end = dt.today() )
                    self.data = pd.concat([self.data, newdata]).drop_duplicates()
                    self.querydate = dt.today()
                except OSError:
                    self.is_healthy = False
                self.to_file(path)

The load() constructor loads the data from Yahoo when no data file is found on disk, or when the stored data starts later than the requested start date. The former methods are handy and helped us to extend the functionality of this class. The next idea was to add a sort of update, to avoid loading all the data every time we want to get the newest data.
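The decision that load() makes can be sketched in isolation. This is only an illustration under my own assumptions: `load_or_fetch` and its `fetch` parameter are hypothetical names, the quotes are a plain list of (date, close) tuples instead of a data frame, and the list itself is pickled rather than a Stock object.

```python
import pickle
from datetime import datetime
from pathlib import Path


def load_or_fetch(path, symbol, start_date, fetch):
    """Return quotes for symbol, preferring the file cache.

    fetch(symbol, start_date) is a hypothetical download function returning
    a list of (datetime, close) tuples, oldest first.
    """
    datafile = Path(path) / (symbol + '.dat')
    quotes = None
    if datafile.exists():
        with datafile.open('rb') as f:
            quotes = pickle.load(f)
    # Download when nothing is cached or the cache starts too late.
    if not quotes or start_date < quotes[0][0]:
        quotes = fetch(symbol, start_date)
        with datafile.open('wb') as f:
            pickle.dump(quotes, f)
    return quotes
```

Passing the download function in as a parameter keeps the caching logic testable without a network connection.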

The update() method of a loaded stock reads the date of the last query from an attribute and uses it as the start date for the new download. There was also the option to store the data in a SQL database, but this is good enough.
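One detail of the merge step may be worth a second look. `DataFrame.drop_duplicates()` compares the column values of the rows and ignores the index, so two different days with identical values would collapse into one row. A possible variation, not the code used above, is to deduplicate on the date index instead. The helper name `merge_quotes` is hypothetical.

```python
import pandas as pd


def merge_quotes(old: pd.DataFrame, new: pd.DataFrame) -> pd.DataFrame:
    """Append new rows to old ones, keeping one row per index entry (date)."""
    combined = pd.concat([old, new])
    # For a date present in both frames, keep the most recently downloaded row.
    combined = combined[~combined.index.duplicated(keep='last')]
    return combined.sort_index()
```

Keeping the last occurrence means a re-downloaded day overwrites the stored one, which seems reasonable when the last stored day was fetched before the market closed.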

With OOP, data and metadata are collected in one object, which is handy because the programmer knows where to find the data. Confusion of variable names is ruled out and a class can easily be extended. All the functionality is under one roof. In this case, I wouldn’t extend the class further. Calculations of indicators should be done in a different framework: a base class for calculations accesses the stock data, and classes derived from this base class do the calculations, including the buy and sell signals.
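Such a calculation framework might start out like the sketch below. Everything in it is an assumption of mine: the class names `Indicator` and `MovingAverage`, the use of a plain list of closing prices instead of the data frame of a Stock object, and the very naive signal rule.

```python
class Indicator:
    """Base class: holds the closing prices that the calculations work on."""

    def __init__(self, closes):
        self.closes = list(closes)

    def values(self):
        raise NotImplementedError

    def signals(self):
        raise NotImplementedError


class MovingAverage(Indicator):
    """Simple moving average with a naive price-crossing signal."""

    def __init__(self, closes, window):
        super().__init__(closes)
        self.window = window

    def values(self):
        out = []
        for i in range(len(self.closes)):
            if i + 1 < self.window:
                out.append(None)  # not enough history yet
            else:
                chunk = self.closes[i + 1 - self.window:i + 1]
                out.append(sum(chunk) / self.window)
        return out

    def signals(self):
        """'buy' when the close is above the average, 'sell' when below."""
        result = []
        for close, avg in zip(self.closes, self.values()):
            if avg is None or close == avg:
                result.append(None)
            else:
                result.append('buy' if close > avg else 'sell')
        return result
```

The Stock class stays untouched in this design. An indicator only reads the prices, so new calculations can be added as further subclasses.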

Plotting isn’t a task for this class. Plotting should be done by a dedicated class for stocks, for example one that shows up to 5 plots in one window, linked by time.
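A rough sketch of such a plotting class, under my own assumptions: the name `StockPlotter` and its methods are hypothetical, and the time link between the plots is done with the `sharex` option of matplotlib’s `subplots`.

```python
class StockPlotter:
    """Collects up to max_panels series and draws them on a shared time axis."""

    def __init__(self, max_panels=5):
        self.max_panels = max_panels
        self.panels = []  # list of (label, dates, values)

    def add_panel(self, label, dates, values):
        if len(self.panels) >= self.max_panels:
            raise ValueError('at most %d panels are supported' % self.max_panels)
        self.panels.append((label, list(dates), list(values)))

    def show(self):
        import matplotlib.pyplot as plt  # only needed when actually drawing

        if not self.panels:
            raise ValueError('nothing to plot')
        # sharex=True links zooming and panning of the time axis across panels;
        # squeeze=False always returns a 2D array, even for a single panel.
        fig, axes = plt.subplots(nrows=len(self.panels), ncols=1,
                                 sharex=True, squeeze=False)
        for ax, (label, dates, values) in zip(axes[:, 0], self.panels):
            ax.plot(dates, values, '-', label=label)
            ax.grid(True)
            ax.legend()
        fig.autofmt_xdate()
        plt.show()
```

A price panel and an indicator panel would then be added with two `add_panel` calls, followed by a single `show()`.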