Confessions of an Unintentional CTO

Seven years of brutally pragmatic lessons in growing and maintaining a web app

Introduction

Both the HTML version of the book and the luxury Kindle/PDF versions are free forever.

In the summer preceding my final year of reading law at Oxford, I interned at an intellectual property law firm that serviced clients from the online poker industry. As I sat in my London chair and listened in on conference calls with wunderkind CEOs, I thought to myself, "Dash it! I’d rather be the guy stirring up trouble than the one cleaning up afterward!" Spurred on by this thought, I bought my first-ever IT book. This particular book was nothing special: just a bog-standard tutorial on creating a static HTML/CSS website (for an aquarium, of all things). As you might expect, the end result was saturated with the most monstrous shades of blue and green imaginable.

After my law internship ended, they offered me a job. But after being inspired by their clients’ misadventures, I too wanted to have a shot at becoming a software entrepreneur, so I turned down the position. I went back to college and completed my final year just for the sake of having a degree. (I live in Europe after all.) Soon after graduating I saw the Timothy Ferriss book, The 4-Hour Workweek, in a bookshop and, without ever reading it (waste of time!), I had a eureka moment and knew immediately that this was exactly the sort of business I wanted to have straight out of college. (I'd later grow an appetite for more demanding businesses, but that's a story for another occasion...)

Soon afterward I released a series of tiny little static (or near-static) websites. I kept it simple because I didn’t have any intention of becoming a professional programmer… I just wanted the lifestyle and the glory of being a software entrepreneur. As luck would have it, I struck gold within a few months: One of these primitive little sites (a place where I sold study guides my classmates wrote) started making just enough money to support me while I was still living with my parents. Seven years down the road, an evolved and expanded version of this same site (Oxbridge Notes) is still alive and kicking.

The initial version of Oxbridge Notes sold just twelve different products and as such, was relatively simple to manage as a static HTML and CSS affair. What’s more, my embryonic business was marked by a near-complete lack of programming and automation: I accepted payments via a drop-in PayPal button, and I would personally email out every digital download following each confirmed sale. And all was good—for a time. But as the weeks turned to months, my ambitions grew. I wanted to turn my site into something a little more lucrative (and a little less tedious to operate). Seeing as there wasn’t enough cash to spare to hire a programmer, I realised I’d have to do the coding myself. And that’s how—without setting out to be one—I became the CTO of my own software company. Of course, I was only a CTO in terms of job responsibility; if we were talking about technical ability, I couldn’t have been any less deserving of the title!

Having resolved to grow my business through code, I bought a stack of books on web programming. Among them were Ruby language books, beginner/intermediate Rails books, Linux books, JavaScript books, and even a few computer science classics like Code Complete. I read these texts from cover to cover, transforming their contents into a gargantuan volume of flashcards along the way (an idiosyncratic learning technique which, incidentally, became such a hit in the blogosphere that it’s now been translated into Japanese).

Within a year I had progressed to the point where I was able to transform my static site into a rickety Rails app. But despite my headway, I still felt hopelessly lost and in dire need of further guidance. Since I had, by then, exhausted the intermediate level books/videos/blog posts on web development, I proceeded to the advanced resources, only to be bitterly disappointed. It would seem that my idea of what "advanced" entailed diverged sharply from what the typical author had in mind. I was concerned with issues like ensuring data integrity, easing system maintenance, knowing what not to test, developing professional-grade accounting features for taxation reporting, sharpening my application’s ability to inform me of errors, and integrating SEO/analytics/online marketing right into the very foundations of my software. By contrast, the advanced web development books I encountered were focused on issues of value mostly to engineers in larger companies—that is, engineers who were insulated by their team’s sheer size from the sundry other aspects and tradeoffs implicit in running a software business.

I didn’t want a tour of a language’s arcana or expositions of little-known parts of some framework’s APIs. I didn’t want a cookbook for ever-rarer technical problems. What I wanted was generalised advice pertinent to people with real-live web applications that had real-life users and real-life problems. In other words, I wanted a book for the aspiring CTO, intentional or unintentional as he or she may turn out to be.

Perhaps it was too much for me to expect this to exist. In the grand scheme of things, web applications are a dazzlingly new technology. I surely must have been part of the very first generation who could start a web business as a one-(wo)man band and without any capital to speak of. So instead I learned through a combination of trial and error and conversations with other indie software company CTOs. And since I tend to think by working out ideas in textual form, I wound up producing this book as a side effect.

Now for a disclaimer: Please bear in mind that the ensuing text cannot help but be wildly personal. It is not a one-size-fits-all solution. Much may already be outdated because of technical progress. I would imagine that this book will be most useful to those of you who are close to me in terms of time, business situation, and overall philosophy. Put more concretely, I’d say this book will appeal to:

Solopreneurs (or tiny teams of entrepreneurs) who’ve coded a web application and now earn their living from it—or at least plan to in the near future.

Web developers who, due to resource constraints, are content to implement quick fixes that work "well enough". You’re probably in this group if you constantly feel guilty for not doing everything "by the book" (e.g. you don’t "test everything" as demanded in the Ruby on Rails community).

Programmers who lack the experience of maintaining software in use by real people over a scale of years. I believe this to be a rather large niche after having observed that many newbie programmers (including myself when I got started) only have experience building greenfield projects for themselves/small-scale clients that never get any traction with their software.

Hacker-y types (as opposed to engineers). I see this book as being more valuable to garage mechanic-type programmers than to the super abstract theoretical physicist-styled programmers.

Wrapping up now, there’s one more mystery in the title: Where does the "confessions" part come into play? Here’s where: This book is not an academic exercise in collating and synthesising airy theories… instead it is a bundle of learnings precipitated by sweat and regret. Throughout each chapter, I interweave the (often disgraceful) errors that inspired my thinking. It is my hope that these real-world nuggets of context will render the abstract parts easier to understand and the text as a whole easier to remember.

Table of Contents

All but two of the chapters in this book are available to read right here on jackkinsella.ie. The remaining pair (Adminimisation and CSS) are published externally. Both chapters are still free to read; I only want to advise you in advance that you’ll get redirected to another domain when you click on them.

Part A: Before You Start

Although I relied 100% on Google AdWords for traffic when I started my website, SEO eventually succeeded it as my primary traffic source. Overall, I was ecstatic, as AdWords had been my biggest monthly expense. But when I zoomed in on the details (via Google Analytics traffic reports), I noticed a hitch: A few of my URLs that had once boasted great SEO power had suddenly stopped existing. At first I thought this was because the products in question had been deleted from my platform by the users. But as it turned out, these products still existed—albeit under slightly tweaked names. All that had happened was my content suppliers had modified the product names of live items, and since my system generated URLs based on these user-given names, the URLs would change too—even if they had already gathered enviable SEO powers.
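One way to guard against this kind of URL drift is to compute the URL slug once, when the product is created, and leave it alone when the display name changes. The following is only a minimal sketch of that idea; the `Product` class, its `rename` method, and the `/products/` path are hypothetical names chosen for illustration, not the code that runs Oxbridge Notes.

```ruby
# Hypothetical model: the slug is frozen at creation time, so later renames
# by content suppliers cannot change a URL that has already earned SEO value.
class Product
  attr_reader :name, :slug

  def initialize(name)
    @name = name
    @slug = slugify(name) # computed once and never recomputed
  end

  # Renaming updates the display name only; the public URL stays put.
  def rename(new_name)
    @name = new_name
  end

  def url_path
    "/products/#{slug}"
  end

  private

  # Lowercase, collapse runs of non-alphanumerics into single hyphens,
  # then trim hyphens from both ends.
  def slugify(text)
    text.downcase.gsub(/[^a-z0-9]+/, "-").gsub(/\A-+|-+\z/, "")
  end
end

product = Product.new("Contract Law Notes")
product.rename("Contract Law Notes (2nd Edition)")
puts product.url_path
```

If a URL genuinely has to change, the usual companion to this approach is a permanent redirect from the old address to the new one.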

When I was deliberating between payment solutions (merchant card accounts vs. payment gateways, etc.), I encountered sales material on PayPal’s website which suggested I would pay a 1.9% transaction fee on each payment. This sounded reasonable, and on this basis I was swayed to build my platform atop their service. Unfortunately for me, when I started taking actual payments, PayPal’s various hidden fees (some of which are not even itemised as such) bumped their per transaction cut up to an average of 8.6%.

All considered, which of the following two domain templates is the better one for hosting a web business: "www.mywebsite.com" or "mywebsite.com"? I couldn’t decide, so I hedged my bets and programmed my Rails application to respond to both (i.e. my router responded directly to each variant without using HTTP redirects). The eventual effect of my indecision was to split my SEO for every page across two parallel, near-identical entities so that neither would ever reach a sufficient threshold to rank in the first page of Google results (whereas if their forces were combined they would have ranked there).
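The usual remedy for split SEO is to pick one host as canonical and answer requests for the other with a permanent (301) redirect. Here is a rough sketch of the decision logic, assuming the "www" variant was the one chosen; `CANONICAL_HOST` and `canonical_redirect_for` are illustrative names, and in a real Rails application this would more likely live in the router or web server configuration.

```ruby
require "uri"

# Assumption for this sketch: the "www" variant is the canonical one.
CANONICAL_HOST = "www.mywebsite.com"

# Returns the URL that the request should be permanently redirected to,
# or nil when the request already targets the canonical host.
def canonical_redirect_for(url)
  uri = URI.parse(url)
  return nil if uri.host == CANONICAL_HOST

  uri.host = CANONICAL_HOST # path and query string are preserved
  uri.to_s
end

puts canonical_redirect_for("http://mywebsite.com/notes?page=2")
```

Either variant works as the canonical host; what matters is that only one of them ever serves content directly.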

When you’re doing online advertising, it’s crucial to know which advertising platforms are generating conversions (and, conversely, which ones are needlessly burning through cash). Google Analytics has a bunch of features that help with this task, but the setup is surprisingly fraught with problems. I remember one instance where it appeared that the data recorded in Google Analytics failed to match that in Facebook’s ad platform and that in my website’s sales dashboard. Ultimately, we figured out that these issues were caused by time zones not being synced across the various layers of my stack (which I define broadly here to include plugin web services and online advertising platforms). You may think it silly for me to belabour the importance of synchronising time zones; in my defence, I estimate that even in a tiny application like my own, there are thirty places where time zones could have diverged. That is a lot of "t"s to cross…

Part B: Architecture

Throughout the earlier stages of my business, I handled the more esoteric financial transactions (such as accepting eCheques or settling chargebacks) within the PayPal website’s dashboard. My motivation was to spare myself the hassle of equipping my website with the capacity to deal with these entities. Separately, it eventually became necessary for me to generate financial reports for accounting purposes. I programmed my software to build these at the end of every quarter, drawing directly from the data in my database. This data was, of course, slightly incorrect because it didn’t include any trace of the oddball transactions I handled through PayPal’s website. The result was dodgy accounting records… a veritable nightmare.

In the aftermath of certain bugs, a serious cleanup operation is in order. Why? Because old data might be corrupt, or critically important web requests might need to be reprocessed by the patched code. The nightmare situation here is having to reconstruct these old requests by combing through the logs and gluing together the various bits and pieces. Unfortunately this is exactly what I had to do in the case of a bug striking my author-application code. At the time, I desperately needed more authors, so the labour was justified. But I could have avoided this rot altogether with an architecture that emphasises a property I call replayability.

Every gem (library) I included in my burgeoning Rails application was accompanied by a README that prescribed its own particular way of supplying required configuration details (e.g. API keys/secrets or mere preferences as to how many whatsits should appear on each page). In some libraries the preferred configuration solution was a YAML file. Other times it was database entries. And other times again it was by use of ENV variables. Ultimately I ended up succumbing to these various vendors’ demands and had three different configuration strategies running in parallel in my application. Yes, this was stupid, and, predictably, it became messy to find or edit any specific configuration options.
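One way out of this mess is to settle on a single configuration source and route every lookup through one access point, whatever each library's README suggests. The sketch below assumes ENV variables are the chosen source; the `AppConfig` module and the key names are hypothetical and serve only to illustrate the shape of the idea.

```ruby
# Hypothetical single access point for all configuration values.
module AppConfig
  # `source` defaults to ENV; a plain Hash can be passed in for tests.
  def self.fetch(key, source: ENV)
    source.fetch(key) do
      raise KeyError, "Missing configuration value: #{key}"
    end
  end

  # Convenience reader for numeric preferences stored as strings.
  def self.integer(key, source: ENV)
    Integer(fetch(key, source: source))
  end
end

settings = { "PAYMENT_API_KEY" => "abc123", "ITEMS_PER_PAGE" => "20" }
puts AppConfig.integer("ITEMS_PER_PAGE", source: settings)
```

Failing loudly on a missing key means a forgotten setting surfaces at boot or first use, rather than as a mysterious nil somewhere downstream.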

The frontend of most web applications can be subdivided into user-facing and admin-only. In my case, the majority of functionality was admin-only. I needed quite a lot of code for managing suppliers, inventory, orders, and taxonomies. Now seeing as this admin area was manned by my website administrators, there was a labour cost associated with how well my administrative code got the job done. Over the years I learned quite a few lessons: (1) that overly opaque functionality would scare administrators and cause them to shy away from certain actions (or always precede them with confirmatory questions); (2) that user-generated taxonomies (as well as automatically generated ones) eventually lead to the proliferation of ever-thinner categories that become a UX nightmare; (3) that certain administrator workflows are "non-resumable", in that they fail to preserve information necessary for other team members to pick up where a non-present administrator left off.

Modern web applications rely on a workforce of background scripts which do things like clean up large temporary files from the hard drive, generate backups, or send out drip-feed email marketing campaigns. This is all well and good, but what happens if one of these scripts simply stops working? Since there may be no positive "failure" to speak of, there may correspondingly be no exception to get reported. This can lead to situations (as happened to me) where some sub-system goes offline for months without anyone suspecting a thing.
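A common countermeasure for silent failures is a heartbeat (sometimes called a dead man's switch): each script checks in when it finishes, and a separate monitor complains about any script that has been quiet for too long. The following is a minimal in-memory sketch of that technique; `HeartbeatMonitor` and the job names are hypothetical, and a real setup would persist check-ins somewhere durable and run the monitor independently of the jobs it watches.

```ruby
# Hypothetical monitor: jobs check in on success, and `overdue` lists every
# job that has been silent for longer than its allowed window.
class HeartbeatMonitor
  def initialize
    @last_seen = {}
    @max_silence = {}
  end

  # Register a job and the longest silence (in seconds) we will tolerate.
  def expect(job_name, max_silence_seconds)
    @max_silence[job_name] = max_silence_seconds
  end

  # Called by a job at the end of each successful run.
  def check_in(job_name, at: Time.now)
    @last_seen[job_name] = at
  end

  # Jobs that never checked in, or whose last check-in is too old.
  def overdue(now: Time.now)
    @max_silence.select do |job_name, max_silence|
      last = @last_seen[job_name]
      last.nil? || (now - last) > max_silence
    end.keys
  end
end

monitor = HeartbeatMonitor.new
monitor.expect("nightly_backup", 24 * 60 * 60)
monitor.expect("drip_emails", 60 * 60)

start = Time.at(0)
monitor.check_in("nightly_backup", at: start)
p monitor.overdue(now: start + 2 * 60 * 60)
```

The key property is that the alert is triggered by the absence of a signal, so it fires even when the script never runs at all.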

I would wager that unanticipated reclicks are the most commonly occurring bug in web applications of every stripe. During a proto-version of Oxbridge Notes, I suffered this bug when impatient users would click "Place Order" twice in a row, leading to duplicate orders residing in my system.
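One widely used defence against reclicks is to make the action idempotent: embed a unique token in the form when it is rendered, and treat any second submission carrying the same token as a repeat of the first. This is a rough in-memory sketch of that idea, with `OrderDesk` and `place_order` as hypothetical names; in a real application the uniqueness would normally be enforced by the database (for example, with a unique index on the token column).

```ruby
require "securerandom"

# Hypothetical order intake that ignores repeat submissions of the same form.
class OrderDesk
  attr_reader :orders

  def initialize
    @orders = {}
  end

  # `token` is generated once per rendered form. A second click re-sends the
  # same token, so it returns the existing order instead of creating another.
  def place_order(token, item)
    @orders[token] ||= { item: item, placed_at: Time.now }
  end
end

desk = OrderDesk.new
form_token = SecureRandom.uuid

desk.place_order(form_token, "Contract Law Notes")
desk.place_order(form_token, "Contract Law Notes") # impatient second click
puts desk.orders.size
```

Disabling the button in the browser after the first click helps too, but only a server-side check like this one holds up when JavaScript fails or requests are retried.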

One awful, awful morning a VPS provider who I won’t name mass-emailed its customers to inform us lucky souls that a "disgruntled former employee" of theirs had deleted vast tranches of their servers—including my own. And as if that wasn’t bad enough, this bitter man went the extra mile and also deleted all the server backups (which this VPS provider had sold as a bonus, peace-of-mind service). And that, my dear reader, is why redundancy matters in the design of backup solutions.

Part C: Maintenance

Internet platforms and marketplaces are places of transience: Sometimes users come, and sometimes they go. Today they wish to sell a product in your marketplace, and tomorrow they wish to remove it. In line with these natural dynamics, I empowered my users to delete their own database records as they saw fit. With time though, I discovered this was a terribly shortsighted decision: Various slabs of functionality (e.g. end-of-year financial reports, refunds) depend on these records sticking around for much longer than I had anticipated. As a result, my data lost its integrity.
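A common alternative to outright deletion is the "soft delete": the record is stamped with a deletion time and hidden from the storefront, but it stays in the database for reports and refunds. The sketch below illustrates the general technique in plain Ruby; `Listing` and `Catalogue` are hypothetical classes, not the schema used by Oxbridge Notes.

```ruby
# Hypothetical record that is marked as deleted rather than destroyed.
class Listing
  attr_reader :title, :deleted_at

  def initialize(title)
    @title = title
    @deleted_at = nil
  end

  def soft_delete(at: Time.now)
    @deleted_at = at
  end

  def deleted?
    !@deleted_at.nil?
  end
end

# Hypothetical collection with one view for shoppers and one for accounting.
class Catalogue
  def initialize(listings)
    @listings = listings
  end

  # What shoppers see: soft-deleted listings are filtered out.
  def visible
    @listings.reject(&:deleted?)
  end

  # What financial reports and refunds use: everything, deleted or not.
  def all_for_reporting
    @listings
  end
end

contract = Listing.new("Contract Law Notes")
tort = Listing.new("Tort Law Notes")
catalogue = Catalogue.new([contract, tort])

tort.soft_delete
puts catalogue.visible.map(&:title).inspect
puts catalogue.all_for_reporting.size
```

From the seller's point of view the product is gone, while every historical order can still point at a record that exists.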

I was once a believer in staying on the bleeding edge of software versions. This moral is promoted heavily by powerful software vendors (and library maintainers), since it reduces their maintenance load and makes their lives easier. All of this is true. But throwing yourself on the front line is not without its costs: Many times, I found that a critical dependency would stop working after I upgraded Rails/macOS etc., and I’d be forced to take hours (or even days) out of my schedule to debug and patch. Sure, this was all in the name of "progress", but as a one-man team with customers waiting, I didn’t have the luxury to be conscripted into open source development. (To give you an idea of how fervently open source developers feel about this, I got banned from a certain developer community for expressing this exact opinion.)

I have a poor memory, and I feel the cognitive load of this limitation even in projects containing only a few thousand lines of code. Upon observation, I noticed that I felt worst when there was a low level of "intuitiveness" (or consistency) to the code. To give just one personal example of what I mean by low intuitiveness, I draw your attention to the various ways I denoted the concept of permission in my codebase: Sometimes with method names that began with can_, other times with ones starting with authorised_to_, and other times again with may_. Had I used the same sub-symbol every time (e.g. can_), I would have relieved the demands on my memory. In one instance this isn’t much, but imagine a codebase flush with this sort of grand design.

Perhaps half a dozen times so far, I encountered whitespace formatting issues with my data. For example, the email field of my users table is supposed to be unique, but due to my initial lack of a whitespace-stripping function, the database would consider "jack@example.com" and "jack@example.com " as separate entities and it would allow them to coexist. This led to a crappy user experience for those affected (e.g. users who inadvertently added a space to the end of their email when registering found they were unable to log in when they typed it normally thereafter). Obviously it was trivial for me to fix this particular bug, but I realised that analogous problems can occur anywhere I accept data from the user. This then calls for a more general and far-reaching solution…
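One general approach is to normalise every incoming string at a single choke point, before it reaches validation or storage, instead of patching fields one at a time. Here is a minimal sketch under that assumption; `InputNormaliser` is a hypothetical helper, and it only handles a flat hash of parameters.

```ruby
# Hypothetical choke point that strips surrounding whitespace from every
# string value in a flat hash of user-supplied parameters.
module InputNormaliser
  def self.clean(params)
    params.transform_values do |value|
      value.is_a?(String) ? value.strip : value # non-strings pass through
    end
  end
end

raw = { email: "jack@example.com ", name: "  Jack ", age: 30 }
p InputNormaliser.clean(raw)
```

With this in place, "jack@example.com" and "jack@example.com " collapse into the same value before the uniqueness check ever runs.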

I based an early version of my web application off an open source ecommerce library for selling t-shirts. As part of this package deal, I inherited a bunch of CSS style rules. For a while, I could keep my site’s visuals looking spiffy by making a few surgical changes here and there. But eventually the complexity caught up with me and I became so terrified of my growing CSS edifice that I only ever appended new CSS rules (using ever-higher selector specificities). Eventually I became gridlocked and was forced to rewrite the CSS from scratch, albeit this time on a foundation of solid first principles.

With good grounds, we programmers obsess over minimising software dependencies: modules, third-party libraries, etc. Unfortunately, this particular type of care and concern doesn't extend far enough... I remember modifying the body and URL of a product page so as to correspond with a more sophisticated SEO strategy in a certain geographic market. In the aftermath of the change, I discovered that the URLs in my training documentation for admin staff were out of sync, third-party web services (such as Google Analytics) were directing traffic to an abyss, my visual design was broken due to ever-so-slightly longer subtitle text, and that I may have been infringing certain local laws (for cookies, presentation of taxation information, etc.). This and much too much more conspire to form the weary world of non-code dependencies!

Part D: Testing

Morale is not uniquely a concern of militaries; its effects can be felt even in more mundane matters. In its absence, self-sabotaging behaviour ensues. Case in point: I lost faith in my testing suite at two separate points in the lifetime of my web application. My tests would sporadically fail for no apparent reason, and this unpredictability wore me down. I knew the code worked—so why didn’t the tests pass? My downtrodden response was to completely give up running my test suites and to stop writing new tests. (I would later be forced to play a long game of catch up.) Ultimately it turned out that the cause of my frustrations was test leakage, the prevention of which I now consider absolutely crucial for programmer morale.

For the best part of a year, I was unable to automatically test the most crucial aspect of my web application. The component in question was an awkward Flash-based multi-file upload system for the intake of new digital products my business sells. The uploader was in Flash (this was before JavaScript could do multi-file stuff), and I couldn’t spare the time to learn a dying language. By way of compromise, I told myself I would manually test this component in the browser whenever I changed any related code. As it turned out though, I was far too optimistic about my ability to infer when a piece of code was indeed "related". As a result, I introduced breaking changes on more occasions than I’d like to admit, and because I didn’t think to run any manual tests to catch these issues, they seeped merrily through to production.

In life there are always tradeoffs: Anyone who tells you otherwise is either a politician or a Rails developer. When I was initiated into the web development world I was told to "test everything". But because I wasn’t bankrolled by Unicorn Ventures, this wasn’t remotely realistic for me. Unable to meet these supposed non-negotiable community standards, I felt a constant pang of guilt. But I needn’t have suffered so—I should have instead learned to discriminate between what’s worth testing and what isn’t.

Part E: Debugging

If I were a religious man, I’d say exception reports are a godsend. Though I cannot deny their overall benefits, I must admit that I failed to configure them properly in two specific ways over the past few years: Firstly, my reporting was not always comprehensive, in the sense that the errors and exceptions occurring within my background jobs (or within my helper servers) did not get reported. Secondly, my reporting was sometimes far too vocal. There were days when I would open my inbox to find it flooded with so many inconsequential reports that I would lose the patience for reading any of the individual reports with the requisite care and attention. This became a problem when there happened to be a genuinely important and urgent report submerged in the pile of unimportant ones.

In my naive early days as a programmer, I would respond to a bug report by diving straight into the code, fixing it, deploying that fix, and then getting on with my day as planned. What I failed to realise is that bugs are pack animals. If, for example, the reason for the bug was that I had misunderstood some aspect of the Rails API, then it’s possible—even likely—that similar bugs exist anywhere else I interact with that same slab of the Rails API. Had I been more systematic, I would have inspected all of these locales for analogous problems before signing off on the debugging session.

There is an extraordinary cognitive cost to context switching when we program computers. I feel this most acutely when I am debugging, or, to be more precise, when I am resuming a debugging session. Before I adopted my current ritual, I typically lost an hour finding my feet again whenever I resumed a sufficiently thorny bug hunt, owing to the prodigious amount of context and care needed.

Once upon a time I experienced sporadic credit card payment failures but couldn’t even gather the basic information about what had gone wrong. You see, whenever a payment failed, PayPal would return an error code. I parsed this error code out of their response and logged it. But this wasn’t sufficient for me to debug since my log entry consisted solely of a string of numbers (e.g. "50010"), making it near impossible for me to connect that log entry with the particular failed payment. What I should have done instead was include contextual information for identifying each of these log entries (e.g. "paypal_error_code for customer:3401 is 50010").
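To make the difference concrete, here is a small sketch of a logging helper that always attaches the identifying context to the error code. It uses Ruby's standard `Logger`; the helper name `log_payment_failure` and its parameters are hypothetical, and the message format simply mirrors the example above.

```ruby
require "logger"

# Hypothetical helper: never logs a bare error code, always says whose
# payment failed so the entry can be matched to a specific transaction.
def log_payment_failure(logger, customer_id:, error_code:)
  logger.error("paypal_error_code for customer:#{customer_id} is #{error_code}")
end

logger = Logger.new($stdout)
log_payment_failure(logger, customer_id: 3401, error_code: "50010")
```

Funnelling every payment failure through one helper like this also makes it harder to forget the context in some rarely exercised code path.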

Part F: Documentation

I have been consistently shocked by the frequency with which configuration tasks I perceived to be "once-off"s ended up demanding repetition mere months down the line. Examples abound: Configuring development machines, replicating servers for staging, renewing SSL certs, configuring the DNS of a new domain with shamanic mystery, or even just integrating a comprehensive skeleton of Google Analytics conversion tracking. As a result of this unexpected repetition, I found myself devoting unhealthily large lumps of time to what can only be described as reinventing the wheel. All this waste and confusion could have been avoided had I had a sufficiently broad definition of software and documented accordingly.

I still remember how excited I felt the day I hired my first collaborating programmer. I imagined he would slot right in, start coding, and smoothly propel me to the singularity of multiplied productivity. What actually happened was decidedly underwhelming. This programmer (through no fault of his own!) stored new service objects in the "wrong" folders (at least according to my organisational tastes). He didn’t use Git the way I did. He kept asking me distracting questions, like what IP address the XYZ server resides at. He suggested replacing idiosyncratic dependencies with "standard" solutions, even though I had darned good reasons for going against the grain in most cases. Ultimately all of this friction stemmed from a single source: My not having app-level documentation.
