Before I started taking writing seriously, I had a loose grasp of many mathematical and technical concepts; and I was not sure how to tackle open-ended problems. Maintaining this research blog has clarified my thoughts and improved how I approach research problems. The goal of this post is to explain why I have found the process so valuable.

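As an overview, here are my reasons: working through confusion, calibrating confidence, learning with intention, flanking the problem, solving through understanding, writing slowly and recalling quickly, and contributing to the community.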

Working through confusion

This quote from Richard Feynman is at the top of my blog’s landing page:

I learned very early the difference between knowing the name of something and knowing something.

I would phrase this in my own words as, “Using the word for something does not mean you understand it.” While this is true in general, I hypothesize that jargon is especially susceptible to this kind of misuse because an expert listener might infer a mutual understanding that does not exist. This feeling of verbal common ground can even be gamed. Many of us have done this on exams, stitching together the outline of a proof or using the right words in an essay in the hope that the professor would connect the dots for us and award partial credit.

What does this have to do with blogging? Blogging is a public act. Anyone can read this. When I write a blog post, I imagine my supervisor, a respected colleague, or a future employer reading my explanation. These imagined readers force me to ask myself honestly if I understand what I am writing. How do I know when a post is done? I write until I stop having significant questions, until my imaginary audience stops raising its hands. The end result is that writing forces me to acknowledge and then work through my confusion.

In my mind, the writing style of scientific papers inadvertently contributes to the problem of jargon abuse because it is designed to highlight and convey novelty; any concept that is not a main contribution may be cited and then taken as a given. A novice might mistake this writing style for how a scientist should actually think or speak. Summarizing a paper in your own words restructures the content to focus on learning rather than novelty.

Calibrating confidence

It is difficult to know what you should know when you have a lot to learn and are in an intelligence-signaling environment. A side effect of having written detailed technical notes is that I calibrate my confidence on a topic. If I now understand something, I am sure of it and can explain myself clearly. If I don’t understand something, I have a sense of why it is difficult to understand or what prerequisite knowledge I am missing.

This idea reminds me of Mindy Kaling’s Guide to Killer Confidence, in which Kaling writes,

People talk about confidence without ever bringing up hard work. That’s a mistake… I don’t understand how you could have self-confidence if you don’t do the work.

In childhood, college, and even early graduate school, people have many structures that force them to do the work: homework, tests, admission essays, qualifying exams. But as you enter the research life in earnest, these structures mostly disappear. For me, writing things down is the best way I have found to ensure that I actually do the work.

Furthermore, writing has given me a template for how to feel confident whenever I need to: do the work. In my first year of graduate school, I botched presenting papers in my lab’s group meeting. This is because I passively read papers by just underlining key sentences or writing notes and questions in the margins. As a result, I wasn’t sure if I knew what I knew, and that was clear to others. Blogging has taught me how to read a paper because explaining something is a more active form of understanding. Now I summarize the main contribution in my own words, write out the notation and problem setup, define terms, and rederive the main equations or results. This process mimics the act of presenting and is great practice for it.

Learning with intention

In his essay, Principles of Effective Research, Michael Nielsen writes,

In my opinion the reason most people fail to do great research is that they are not willing to pay the price in self-development. Say some new field opens up that combines field $X$ and field $Y$. Researchers from each of these fields flock to the new field. My experience is that virtually none of the researchers in either field will systematically learn the other field in any sort of depth. The few who do put in this effort often achieve spectacular results.

I have thought a lot about this quote throughout my PhD, perhaps because of my experience as a self-taught programmer. When I first started teaching myself to program, I felt that I had no imagination. I couldn’t be creative because I was too focused on finding the syntax bug or reasoning about program structure. However, with proficiency came creativity. Programming became less important than what I was building and why. When I started my PhD, I hypothesized that the same rules would apply: I wouldn’t be able to think creatively about machine learning until I built up the requisite knowledge base. In programming, you can practice by writing programs; but how can you practice research? For me, writing detailed, expository technical notes is the equivalent of the programmer’s side project: it forces me to intentionally and systematically build my knowledge base by understanding ideas, working through proofs, and implementing models.

A watershed moment in my first machine-learning research project came when I decided to systematically work through the background material: I blogged about canonical correlation analysis, then factor analysis, and finally probabilistic canonical correlation analysis. My understanding and confidence in the material changed profoundly. I became intellectually committed in a way that was impossible without first understanding the problem.

The importance of self-development continues even after one has reached proficiency. There is a famous aphorism, “If all you have is a hammer, everything looks like a nail.” If you ask a person to do a task quickly, they will resort to the tools they know. Similarly, when I am up against a publication deadline, stressed about a lagging project, or trying hard to “just think,” I fall back on familiar thought patterns. There was an embarrassing period of my PhD in which all I did was tune a neural network’s hyperparameters because I didn’t know what else to try. For this reason, I typically find brainstorming useless. Under pressure, my mind, like a cart on a well-worn path, finds the same old ruts. Once again, writing breaks this cycle because it requires more active participation.

Flanking the problem

Hard problems are intimidating. I often do not know where to start, and I worry that I will waste my time. Writing blog posts about the larger context of a problem is my way of flanking it, of head-faking myself about what I am actually doing. This lowers the psychological stakes because, rather than directly attacking the problem, I am producing something that I know will be valuable either way.

Let me give an example. Currently, I am working on a latent variable model of neuron spiking data with a complex Bayesian inference procedure. When I started the project, I did not have the background in Bayesian methods to immediately start working. Furthermore, I started the project in March and had already committed to a summer internship. I knew that it would be hard but important to keep some momentum over the summer. So I decided to flank the project by writing a series of blog posts on topics that I knew were both important for the project and generally useful: the Laplace approximation, Gaussian process regression and its efficient implementation, Monte Carlo methods, Poisson–gamma mixture models, and Pólya-gamma augmentation. These posts, written in the spring and summer, allowed me to start thinking about and preparing for the problem indirectly.

As the project got underway in the fall, I wrote more blog posts as needed. For example, we tried Hamiltonian Monte Carlo (HMC) inference, and so I wrote about ergodicity and Metropolis–Hastings (MH). In this case, I blogged about MH rather than HMC because I knew that the former leads to the latter and is important foundational knowledge. I was mitigating the risk of HMC not working by teaching myself something I was confident I should know anyway. We are currently exploring using random Fourier features to scale up the method, and I approached this new material by writing about the kernel trick, random Fourier features, and kernel ridge regression.
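Since random Fourier features come up here, a minimal NumPy sketch of the idea may help. It assumes an RBF kernel $k(\mathbf{x}, \mathbf{y}) = \exp(-\gamma \lVert \mathbf{x} - \mathbf{y} \rVert^2)$; the function names and the bandwidth parameter `gamma` are illustrative choices, not code from the project described above.

```python
import numpy as np

def rbf_kernel(X, Y, gamma):
    # Exact RBF kernel matrix: k(x, y) = exp(-gamma * ||x - y||^2).
    sq_dists = (
        np.sum(X**2, axis=1)[:, None]
        + np.sum(Y**2, axis=1)[None, :]
        - 2.0 * X @ Y.T
    )
    return np.exp(-gamma * sq_dists)

def random_fourier_features(X, n_features, gamma, rng):
    # Sample frequencies from the RBF kernel's spectral density, a Gaussian
    # with variance 2 * gamma per dimension, plus uniform random phases.
    d = X.shape[1]
    W = rng.normal(scale=np.sqrt(2.0 * gamma), size=(d, n_features))
    b = rng.uniform(0.0, 2.0 * np.pi, size=n_features)
    # Inner products of these features are unbiased estimates of k(x, y).
    return np.sqrt(2.0 / n_features) * np.cos(X @ W + b)

rng = np.random.default_rng(0)
X = rng.normal(size=(50, 3))
Z = random_fourier_features(X, n_features=5000, gamma=0.5, rng=rng)
K_approx = Z @ Z.T                       # built from a 50 x 5000 feature matrix
K_exact = rbf_kernel(X, X, gamma=0.5)
print(np.max(np.abs(K_approx - K_exact)))  # small; shrinks roughly like 1/sqrt(n_features)
```

The appeal for scaling is that downstream linear algebra can work with the $N \times D$ feature matrix instead of the $N \times N$ kernel matrix.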

Research often amounts to long-term gambles. As a junior researcher, I try to mitigate my risk exposure by working on small, promising problems with guidance from my advisor and senior lab members. However, writing is my other way of mitigating risk. If my current project were to fail, the directed and intentional process of systematically attacking the background material will have prepared me well for the next problem.

Solving through understanding

The mathematician Arthur Ogus explained Alexandre Grothendieck’s approach to problem solving by saying,

If you don’t see that what you are working on is almost obvious, then you are not ready to work on that yet.

I find this quote comforting because it suggests that good ideas, at least for one famous mathematician, do not come into the mind ex nihilo. Rather, good ideas come from so deeply understanding a problem that the solution seems obvious.

In my own experience, writing has gotten me closer than anything else to having original research thoughts that feel obvious. So far, these thoughts are always a decade late or infeasible, but I am happy to be having them. Let me give an example. I have written about factor analysis, an efficient implementation of factor analysis using the Woodbury matrix identity, and randomized singular value decomposition (SVD). As I wrote about randomized SVD, I thought, “Why isn’t the matrix inversion in factor analysis implemented using randomized SVD? We’re just inverting the loadings, a probabilistic construct; a randomized algorithm seems fine.” This was not a brilliant thought. It’s a little “$A$ plus $B$,” but it was my thought, and I would not have had it without really understanding the methods. As it turns out, that is exactly how scikit-learn does it.
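To make the randomized SVD idea concrete, here is a minimal NumPy sketch of a randomized truncated SVD in the style of the Halko, Martinsson, and Tropp range finder. It is an illustration under simplifying assumptions (fixed oversampling of 10 and two power iterations), not scikit-learn’s implementation; scikit-learn ships its own version as `sklearn.utils.extmath.randomized_svd`, which `FactorAnalysis` can use via its `svd_method="randomized"` option.

```python
import numpy as np

def randomized_svd(A, k, n_oversamples=10, n_iter=2, rng=None):
    rng = np.random.default_rng() if rng is None else rng
    n = A.shape[1]
    # Multiply A by a random Gaussian test matrix; the result approximately
    # spans the range of A's top-k singular vectors.
    Omega = rng.normal(size=(n, k + n_oversamples))
    Y = A @ Omega
    # Power iterations sharpen the spectrum; QR keeps the basis well conditioned.
    for _ in range(n_iter):
        Q, _ = np.linalg.qr(Y)
        Y = A @ (A.T @ Q)
    Q, _ = np.linalg.qr(Y)
    # Project A onto the small subspace and take an exact SVD there.
    B = Q.T @ A
    U_hat, s, Vt = np.linalg.svd(B, full_matrices=False)
    U = Q @ U_hat
    return U[:, :k], s[:k], Vt[:k, :]

rng = np.random.default_rng(0)
A = rng.normal(size=(200, 5)) @ rng.normal(size=(5, 100))  # exactly rank 5
U, s, Vt = randomized_svd(A, k=5, rng=rng)
print(np.allclose((U * s) @ Vt, A))  # rank-5 reconstruction of A
```

The expensive exact SVD is only ever applied to the small matrix `B`, which is where the savings come from when $k$ is much smaller than the dimensions of $A$.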

My thinking on this topic has been shaped by how other researchers talk about research: the important thing is not having a big idea but a line of attack; and having a line of attack means understanding the problem. In his talk You and Your Research, Richard Hamming said,

The three outstanding problems in physics, in a certain sense, were never worked on while I was at Bell Labs… We didn’t work on (1) time travel, (2) teleportation, and (3) antigravity. They are not important problems because we do not have an attack. It’s not the consequence that makes a problem important, it is that you have a reasonable attack.

By understanding problems deeply, you increase the probability that you can work on an important, attackable problem.

Writing slowly, recalling quickly

I think of writing-as-learning as database indexing. In a database, an index is a data structure that keeps track of where rows in a table are located. Inserting a row into an indexed table is slower than simply appending the row to the bottom of the table because the database must also update the index. However, lookups through the index are far faster than scanning every row. A layperson’s example is organizing your books alphabetically: shelving a new book takes a moment of thought, but finding any book later is quick.
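The trade-off can be sketched with a toy, hypothetical `SortedIndex` class built on Python’s `bisect` module. Real databases typically use B-trees rather than a sorted list, so this only illustrates the shape of the trade-off: extra work on every write in exchange for fast reads.

```python
import bisect

class SortedIndex:
    """Toy index: keeps keys sorted so lookups can use binary search."""

    def __init__(self):
        self.keys = []

    def insert(self, key):
        # Slower write: binary search for the slot, then shift later keys over,
        # versus a constant-time append to an unsorted list.
        bisect.insort(self.keys, key)

    def contains(self, key):
        # Fast read: O(log n) binary search instead of scanning every row.
        i = bisect.bisect_left(self.keys, key)
        return i < len(self.keys) and self.keys[i] == key


def unindexed_contains(rows, key):
    # Without an index, every lookup must scan all rows.
    return any(row == key for row in rows)


shelf = SortedIndex()
for title in ["Moby-Dick", "Anna Karenina", "Emma"]:
    shelf.insert(title)
print(shelf.keys)
print(shelf.contains("Emma"), shelf.contains("Ulysses"))
```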

When I write, I make the same trade-off. I appreciate that learning through writing takes longer than learning without writing, often by an order of magnitude. I feel this acutely when I am implementing someone else’s method or creating a figure for my blog, at moments when it feels like I should be doing research more directly. However, I consistently feel that the trade-off is worth it. (I suspect this will change throughout my career.) For example, my post on the SVD took many, many hours to produce. However, the SVD is a foundational and ubiquitous mathematical idea, and now my mind grasps a powerful chunk of understanding rather than a vague symbol. I understand the determinant, matrix rank, principal component analysis, and many other ideas better because I understand the SVD. I am glad I took the time to write that post.
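A few of these connections can be checked numerically. The NumPy sketch below is illustrative; the helper names `svd_rank`, `svd_abs_det`, and `pca_variances` are hypothetical, and the rank tolerance mirrors the default used by `numpy.linalg.matrix_rank`.

```python
import numpy as np

def svd_rank(M, rtol=None):
    # Rank = number of singular values above a small relative tolerance.
    s = np.linalg.svd(M, compute_uv=False)
    if rtol is None:
        rtol = max(M.shape) * np.finfo(M.dtype).eps
    return int(np.sum(s > rtol * s[0]))

def svd_abs_det(M):
    # For square M = U S V^T with orthogonal U and V, det(U) and det(V^T)
    # are each +1 or -1, so |det(M)| is the product of the singular values.
    return float(np.prod(np.linalg.svd(M, compute_uv=False)))

def pca_variances(X):
    # PCA via the SVD of centered data: squared singular values divided by
    # (n - 1) are the variances along the principal axes.
    Xc = X - X.mean(axis=0)
    s = np.linalg.svd(Xc, compute_uv=False)
    return s**2 / (X.shape[0] - 1)

rng = np.random.default_rng(0)
A = rng.normal(size=(4, 4))
B = rng.normal(size=(6, 2)) @ rng.normal(size=(2, 5))  # rank 2 by construction
X = rng.normal(size=(100, 3))
print(svd_abs_det(A), abs(np.linalg.det(A)))
print(svd_rank(B))
print(pca_variances(X))
```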

Furthermore, I have a permanent, easily recoverable store of personalized knowledge. I probably consult my own blog at least once a day if not more. This is not vanity. Typically, the top of my mental stack is the first post listed on my blog’s landing page. Terry Tao has a good blog post on this, called “Write down what you’ve done.”

Here is an example of the benefit of recording things. My latest blog post is about kernel ridge regression. The equation for ridge regression contains the matrix inverse $(\mathbf{X}^{\top} \mathbf{X} + \lambda \mathbf{I})^{-1}$. When I learned that ridge regression is used to combat multicollinearity (linear dependencies between predictors, or columns of $\mathbf{X}$), I could see why: in classical linear regression, $\mathbf{X}^{\top} \mathbf{X}$ is singular when $\mathbf{X}$ does not have full column rank and ill-conditioned when it nearly does, so inverting it is impossible or numerically unstable without adding something to the diagonal. I had this intuition because I had already proven that $\text{rank}(\mathbf{X}) = \text{rank}(\mathbf{X}^{\top} \mathbf{X})$ in my proof of the SVD. Importantly, I had forgotten whether this relationship was true, but it felt correct, and I knew exactly where to look to confirm my guess.
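A short NumPy sketch of this reasoning, using a hypothetical design matrix whose third column is exactly the sum of the first two. The data and the helper `ridge_weights` are illustrative assumptions, not code from the kernel ridge regression post.

```python
import numpy as np

def ridge_weights(X, y, lam):
    # Solve (X^T X + lam * I) w = X^T y rather than forming the inverse explicitly.
    P = X.shape[1]
    return np.linalg.solve(X.T @ X + lam * np.eye(P), X.T @ y)

rng = np.random.default_rng(0)
N = 50
x1 = rng.normal(size=N)
x2 = rng.normal(size=N)
# Perfect multicollinearity: the third predictor is the sum of the first two.
X = np.column_stack([x1, x2, x1 + x2])
y = 2.0 * x1 - x2 + 0.1 * rng.normal(size=N)

gram = X.T @ X
eigs = np.linalg.eigvalsh(gram)                       # ascending order
print(np.linalg.matrix_rank(X))                       # 2, so rank(X^T X) is also 2 < 3
print(eigs[0] / eigs[-1])                             # ~0: X^T X is (numerically) singular
print(np.linalg.eigvalsh(gram + 1.0 * np.eye(3))[0])  # about 1.0: each eigenvalue shifts up by lambda
w = ridge_weights(X, y, lam=1.0)
```

Because $\mathbf{X}^{\top} \mathbf{X}$ is positive semidefinite, adding $\lambda \mathbf{I}$ with $\lambda > 0$ makes every eigenvalue at least $\lambda$, which is why the regularized system always has a unique solution.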

Contributing to the community

The writing process has made me realize how much of understanding and therefore research is simply walking well-worn paths. Richard Schwartz once talked about this in an interview:

Maybe one thing I appreciate more now is that the state of human knowledge is full of holes. When you’re young you have the impression that almost everything is known, but now I have this feeling that almost everything is unknown about mathematics. There are these very thin channels that people have gone along, like ants following each other along a trail. You find these long thin trails of things, and most things are undeveloped. I have more of a sense of the openness of it.

I appreciate that most of my writing is me, like an ant, simply following someone else’s trail. For example, when I wrote about the exponential family, I was treading a path in mathematical statistics that is roughly a hundred years old, and even the exponential family is knowledge we carved out of the infinite number of distributions. I may aspire to more detail than average in my explanations, but the end result is typically still reproduction, not production, of knowledge. I think this is okay. A MathOverflow user once asked how the average mathematician can contribute to mathematics. I love Bill Thurston’s reply and this particular paragraph:

In short, mathematics only exists in a living community of mathematicians that spreads understanding and breaths life into ideas both old and new. The real satisfaction from mathematics is in learning from others and sharing with others. All of us have clear understanding of a few things and murky concepts of many more. There is no way to run out of ideas in need of clarification. The question of who is the first person to ever set foot on some square meter of land is really secondary. Revolutionary change does matter, but revolutions are few, and they are not self-sustaining — they depend very heavily on the community of mathematicians.

Reforging existing connections, walking well-worn paths, is a contribution to the research community. This means that keeping a research blog is useful for more than just oneself. And perhaps with time and experience, I can occasionally blaze a new trail.