I am a working “Data Scientist”, which I put in quotes because the term means pretty much whatever one wants it to mean. All I know is that I work with a lot of data and I build models and sometimes pretty charts (sometimes ugly ones). Oh, and I use SQL quite often. I also suffer from impostor syndrome, which I probably owe to my tremendous appetite for learning (and learning about) new things that tend to be way over my head mathematically, which I invariably internalize as an existential question:

Why is everyone but me doing such cutting-edge ML research?

Don’t get me wrong, I have a pretty good knowledge of applied math and statistics. That’s a given, since it would be difficult to be a “Data Scientist” without the ability to create regression models and perform basic statistical calculations. But when I read a paper on a new machine learning algorithm, I usually skip straight to the results and check if there is a “package to do it”, because I generally can’t grasp enough of the nitty-gritty details in the Methods section to bother with it, let alone come up with something like it on my own. I’m probably underselling my understanding, but the end result is the same: I often feel like I’m missing out on opportunities to be better at what I have chosen to do for a living.

The root of my problem, as I have diagnosed it, is that I don’t have a good enough “pure” math background. I have read books on “Probability” and “Statistics”, but not “Theory of Probability” and “Theory of Statistics”, which I now understand are quite different. One can apply the methods of the former without understanding the latter. But digging deeper and devising new methods tends to require at least some knowledge of the theory. Now, I don’t have time to get a second PhD in math, but my hope is that I can fill some of these holes in my mathematical education with diligent self-teaching.

Having identified the core problem, I decided I needed to build a curriculum of sorts for myself to keep this journey focused. To do this, I’ve basically worked my way backwards from where I want to be. The first thing I realized looking through “Theory of Probability” and “Theory of Statistics” books is that at some point I will want to tackle Measure Theory, which seems to be the core building block of these subjects:

In mathematical analysis, a measure on a set is a systematic way to assign a number to each suitable subset of that set, intuitively interpreted as its size. In this sense, a measure is a generalization of the concepts of length, area, and volume. A particularly important example is the Lebesgue measure on a Euclidean space, which assigns the conventional length, area, and volume of Euclidean geometry to suitable subsets of the n-dimensional Euclidean space R^n. For instance, the Lebesgue measure of the interval [0, 1] in the real numbers is its length in the everyday sense of the word, specifically, 1.
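To make the “length” intuition a bit more concrete, here is a minimal sketch in Python (my choice of language for illustration; the function name `total_length` is made up). It only handles finite unions of closed intervals, which is a tiny special case of what Lebesgue measure covers, so treat it as a toy rather than a definition.

```python
from fractions import Fraction


def total_length(intervals):
    """Length of a finite union of closed intervals (a, b), counting overlaps once.

    Toy stand-in for Lebesgue measure on the real line; it only makes sense
    for finitely many intervals with a <= b.
    """
    total = Fraction(0)
    current_start = current_end = None
    for start, end in sorted(intervals):
        if current_end is None or start > current_end:
            # Disjoint from the running interval: bank its length, start a new one.
            if current_end is not None:
                total += current_end - current_start
            current_start, current_end = start, end
        else:
            # Overlapping or touching: extend the running interval.
            current_end = max(current_end, end)
    if current_end is not None:
        total += current_end - current_start
    return total


unit = [(Fraction(0), Fraction(1))]
left = [(Fraction(0), Fraction(1, 2))]
right = [(Fraction(1, 2), Fraction(1))]
overlapping = [(Fraction(0), Fraction(3, 4)), (Fraction(1, 2), Fraction(1))]

print(total_length(unit))                               # length of [0, 1]
print(total_length(left) + total_length(right))         # pieces that share only a point
print(total_length(overlapping))                        # overlap is counted once
```

The two halves share only the single point 1/2, which has length zero, so their lengths add up to the length of the whole interval. The overlapping pair does not simply add, which is the kind of bookkeeping a measure has to get right.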

So, yeah, clearly I can’t tackle Measure Theory without knowing what “mathematical analysis” is. Believe me, I’ve tried reading up on measure theory, but it just wasn’t sinking in. So I need to learn some Real Analysis, which from my (meta) research seems to be a very basic core requirement of most undergrad math degrees:

Real analysis (traditionally, the theory of functions of a real variable) is a branch of mathematical analysis dealing with the real numbers and real-valued functions of a real variable. In particular, it deals with the analytic properties of real functions and sequences, including convergence and limits of sequences of real numbers, the calculus of the real numbers, and continuity, smoothness and related properties of real-valued functions.

Apparently real analysis is like the calculus I learned, but on steroids. Neat! There are other areas that seem potentially interesting and useful for a data scientist, especially the theory of vector spaces, which is the theoretical basis for linear algebra, and functional analysis, which roughly extends those ideas to infinite-dimensional spaces of functions. So I’m determined to learn linear algebra (which is ubiquitous in ML algorithms) from the ground up theory-wise.

As a “Data Scientist”, I also write lots of programs. Once again, because I am so curious about things, I am one of those earthlings who like to learn programming languages and paradigms just for the heck of it. The good part is that the knowledge I gain from this exercise tends to rub off in my day job and has helped make me a better programmer (well, at least, I like to tell myself it has). One of the areas of programming that I have been interested in for quite some time is functional programming, and in the past year or so I have tried (not for the first time!) to pick up Haskell, an almost monk-like pure functional programming language. The reason I mention Haskell is that you can hardly attempt to learn the language without the term Category Theory popping up along the way:

Category theory[1] formalizes mathematical structure and its concepts in terms of a collection of objects and of arrows (also called morphisms). A category has two basic properties: the ability to compose the arrows associatively and the existence of an identity arrow for each object. The language of category theory has been used to formalize concepts of other high-level abstractions such as sets, rings, and groups.
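The two properties in that definition can at least be poked at with ordinary functions. Below is a small sketch in Python (used here only for illustration), treating types as objects and functions as arrows. The names `identity`, `compose`, `f`, `g`, and `h` are my own, and checking a handful of sample inputs is evidence, not a proof.

```python
def identity(x):
    """The identity arrow: returns its input unchanged."""
    return x


def compose(g, f):
    """Return the arrow 'g after f', i.e. x -> g(f(x))."""
    return lambda x: g(f(x))


def f(n):
    return n + 1      # int -> int


def g(n):
    return str(n)     # int -> str


def h(s):
    return len(s)     # str -> int


samples = range(-5, 6)

# Associativity: (h . g) . f behaves the same as h . (g . f).
left_assoc = compose(compose(h, g), f)
right_assoc = compose(h, compose(g, f))
associative = all(left_assoc(x) == right_assoc(x) for x in samples)

# Identity: composing with the identity arrow on either side changes nothing.
identity_ok = all(
    compose(f, identity)(x) == f(x) == compose(identity, f)(x) for x in samples
)

print(associative, identity_ok)
```

In Haskell, `compose` and `identity` correspond to the built-in `(.)` and `id`, which is part of why the term keeps coming up there.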

If you’re thinking that sounds very mathematical, it’s because it *is* math! I tried reading some books on category theory and watching some YouTube lectures, but didn’t get very far. It’s really abstract stuff. But I recently had an epiphany: just as I should probably tackle Real Analysis before jumping into Measure Theory proper, why not learn Abstract Algebra before jumping into Category Theory?

In algebra, which is a broad division of mathematics, abstract algebra (occasionally called modern algebra) is the study of algebraic structures. Algebraic structures include groups, rings, fields, modules, vector spaces, lattices, and algebras. The term abstract algebra was coined in the early 20th century to distinguish this area of study from the other parts of algebra. Algebraic structures, with their associated homomorphisms, form mathematical categories. Category theory is a formalism that allows a unified way for expressing properties and constructions that are similar for various structures.

Hey look, “groups”, “rings”, yada, yada. Seems like I’m on the right track.
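For a taste of what “groups” and “homomorphisms” mean in practice, here is a brute-force sketch in Python (again just for illustration; `is_group`, `add_mod`, `mul_mod`, and `phi` are names I made up). It checks the group axioms for a small finite set by trying every combination, which obviously only works for tiny examples.

```python
from itertools import product


def is_group(elements, op, e):
    """Brute-force check of the group axioms for a finite set with operation op and identity e."""
    elements = list(elements)
    closed = all(op(a, b) in elements for a, b in product(elements, repeat=2))
    associative = all(
        op(op(a, b), c) == op(a, op(b, c))
        for a, b, c in product(elements, repeat=3)
    )
    has_identity = all(op(e, a) == a == op(a, e) for a in elements)
    has_inverses = all(
        any(op(a, b) == e == op(b, a) for b in elements) for a in elements
    )
    return closed and associative and has_identity and has_inverses


def add_mod(n):
    """Addition modulo n."""
    return lambda a, b: (a + b) % n


def mul_mod(n):
    """Multiplication modulo n."""
    return lambda a, b: (a * b) % n


z6 = range(6)
print(is_group(z6, add_mod(6), 0))   # integers mod 6 under addition
print(is_group(z6, mul_mod(6), 1))   # same set under multiplication: 0 has no inverse


def phi(x):
    """Candidate structure-preserving map from integers mod 6 to integers mod 3."""
    return x % 3


# Homomorphism property: phi(a + b) == phi(a) + phi(b), each sum taken in its own group.
phi_is_homomorphism = all(
    phi(add_mod(6)(a, b)) == add_mod(3)(phi(a), phi(b))
    for a, b in product(z6, repeat=2)
)
print(phi_is_homomorphism)
```

The integers mod 6 form a group under addition but not under multiplication, and `phi` is a map between two groups that respects the operation. As far as I understand it, maps like this are the “arrows” when groups are viewed as a category.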

So I have picked up a few books to start my journey (mostly based on good reviews on Amazon, Quora, and StackExchange), and my plan is to post updates on my progress from time to time (maybe even some challenging proofs I tackle!) to hold myself to sticking with this. Feel free to post book suggestions in the comments. Thanks for reading!