On August 1, 2012, Knight Capital Group LLC (“Knight”), a leading financial market maker, experienced a major failure in the operation of its automated routing system for US equity orders. Knight originally received 212 small orders from retail customers and then mistakenly streamed thousands of orders per second into the NYSE market over a 45-minute period; it executed over 4 million trades in 154 stocks totaling more than 397 million shares and assumed a net long position in 80 stocks of approximately $3.5 billion as well as a net short position in 74 stocks of approximately $3.15 billion. Knight lost over $460 million from these unwanted positions, and by the next day, its own stock price had dropped by 75%, as employees, customers, and competitors scrambled to figure out what to do next. A week later, Knight received a $400 million cash infusion from a group of investors, and by the next summer, it was acquired by a rival, Getco LLC. This essay will discuss the rise and fall of Knight and explain the IT matters that contributed to the system failure.

Founded in 1995 by Kenneth Pasternak and Walter Raquet, Knight Capital Group was a market maker and trade execution provider headquartered in Jersey City, New Jersey, across the Hudson River from Wall Street. The company’s bold insight was that the human-centered model of exchange trading was going to be fundamentally transformed by computers. Its primary customers were large broker-dealers, electronic discount brokers, hedge funds, and other institutional investors. Born in the crucible of the IT advances of the 1990s and uplifted by the related growth of the technology-weighted NASDAQ stock market, Knight grew rapidly to become the single largest market maker of stocks listed on the NASDAQ (17%) and NYSE (16%). In July 1998, Knight raised $145 million in capital through its own Initial Public Offering (IPO) with a share price of $14.50 and market capitalization of $725 million. By the end of 1999, Knight’s share price had soared above $150, and its market cap had surged to $8 billion. A number of factors contributed to the increase in trading volumes on both the NASDAQ and NYSE markets, including the flood of cash flows into equity-based mutual funds, historic high returns in US equity markets, the increasing number of companies going public, the emergence and market acceptance of electronic discount brokers, and multiple technological innovations, such as the Internet, World Wide Web, and personal computer, that reduced transaction costs.

Knight also had its share of setbacks. It was hit hard by the burst of the dot-com bubble, with NASDAQ trading volumes depressed for months. On April 9, 2001, the SEC announced Regulation National Market System (RNMS) and mandated that the stock market move to decimal pricing. Academic studies and industry forecasts suggested that investors would save money from narrower spreads at the expense of the market makers. Knight was among the hardest hit by the regulation change, and it struggled for the next year. On January 8, 2002, Knight agreed to pay $1.5 million to settle multiple NASD regulatory violation claims, including failure to honor posted quotes, the improper display of limit orders, and slow, sometimes inaccurate reporting of thousands of trades to the NASD. The regulatory fine was $700,000, and its clients were paid $800,000. The NASD investigation also highlighted the existence of, and executive knowledge of, front-running within Knight, a Wall Street practice in which firms trade for their own accounts after previewing customer order flow, executing their own trades ahead of a customer’s order.

Knight needed to make changes and replaced Pasternak with Thomas Joyce, an industry veteran, in May 2002; Joyce soon shifted the firm’s business to high-volume market making in other asset classes through acquisitions and organic growth. As it adjusted to the new regulatory environment, Knight recovered its footing and financial success. By 2011, the company was worth $1.5 billion, earned net income of $115 million, employed approximately 1,400 people including over 100 software developers, and had opened other offices in the USA as well as the UK, Switzerland, China, and Singapore. Knight now made markets in US options and European equities, and it also traded currencies and fixed income for its proprietary accounts. It was still dominant in US equity markets and managed an average daily US equity volume of more than 3.3 billion trades worth around $21 billion. As part of the business expansion and renewal strategy, Knight retired older IT systems and built new trading technology such as the Smart Market Access Routing System (SMARS). SMARS was able to execute thousands of orders per second and could compare prices between dozens of different trading venues within fractions of a second.

Some of Knight’s biggest customers were the discount brokers and online brokerages such as TD Ameritrade, E*Trade, Scottrade, and Vanguard. Knight also competed for business with financial services giants like Citigroup, UBS, and Citadel. However, these larger competitors could internalize increasingly large amounts of trading away from the public eye in their own exclusive markets or in shared private markets, so-called dark pools. Since 2008, the portion of all stock trades in the US taking place away from public markets has risen from 15% to more than 40%. As of 2018, there are about 40 dark pools and as many as 200 internalizers competing with a dozen public exchanges in the US alone.

In October 2011, the NYSE proposed a dark pool of its own, called the Retail Liquidity Program (RLP). The RLP would create a private market of traders within the NYSE that could anonymously transact shares for fractions of pennies more or less than the displayed bid and offer prices, respectively. The RLP was controversial even within NYSE Euronext, the parent company of the NYSE; its CEO, Duncan Niederauer, had written a public letter in the Financial Times criticizing dark pools for shifting “more and more information… outside the public view and excluded from the price discovery process”. During the months of debate, Joyce had not given the RLP much chance for approval, saying in one interview, “Frankly, I don’t see how the SEC can possibly OK it”. In early June 2012, the NYSE received SEC approval of its RLP, and it quickly announced the RLP would go live on August 1, 2012, giving market makers just over 30 days to prepare. The SEC decision benefited large institutional investors, who could now buy or sell large blocks of shares with relative anonymity and without moving the public markets; however, it came once again at the expense of market makers. Joyce insisted on participating in the RLP because giving up the order flow without a fight would have further dented profits in Knight’s best line of business.

With only a month between the RLP’s approval and its go-live, Knight’s software development team worked feverishly to make the necessary changes to its trade execution systems including SMARS, its algorithmic, high-speed order router. A core feature of SMARS receives orders from other upstream components in Knight’s trading platform (“parent” orders) and then, as needed based on the available liquidity and price, sends one or more representative (“child”) orders to downstream, external venues for execution. The new RLP code in SMARS replaced some unused code in the relevant portion of the order router; the old code previously had been used for an order algorithm called “Power Peg”, which Knight had stopped using in 2003. Power Peg was a test program that bought high and sold low; it was specifically designed to move stock prices higher and lower in order to verify the behavior of Knight’s other proprietary trading algorithms in a controlled environment. It was not to be used in the live, production environment. There were grave problems with Power Peg in the current context. First, the Power Peg code remained present and executable at the time of the RLP deployment despite its lack of use. Leaving such “dead code” in place is a bad practice, but it is common in large software systems maintained for years. Second, the new RLP code had repurposed a flag that was formerly used to activate the Power Peg code; the intent was that when the flag was set to “yes”, the new RLP component, not Power Peg, would be activated. Such repurposing often creates confusion; here it had no substantial benefit and was a major mistake, as we shall see shortly. Third, there had been substantial code refactorings in SMARS over the years without thorough regression testing; in 2005, Knight changed the cumulative quantity function that counted the number of shares of the parent order that had been executed and filled to decide whether to route another child order.
The cumulative quantity function was now invoked earlier in the SMARS workflow, which in theory was a good idea to prevent excess system activity; in practice, the function was now disconnected from Power Peg, which used to call it directly, and it could no longer throttle the algorithm when orders were filled. Knight never retested Power Peg after this change.
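The role of the cumulative quantity check can be illustrated with a minimal sketch. It is not Knight's code: `ParentOrder`, `route`, and the `max_children` cap are hypothetical names introduced only for illustration, and every child order is assumed to fill completely.

```python
from dataclasses import dataclass


@dataclass
class ParentOrder:
    """Hypothetical parent order; not Knight's actual data model."""
    symbol: str
    quantity: int      # shares the customer wants executed
    filled: int = 0    # cumulative quantity executed so far


def route(parent: ParentOrder, child_size: int, check_cumulative_qty: bool,
          max_children: int = 1_000) -> int:
    """Send child orders for one parent and return how many were sent.

    Assumption: each child fills completely. max_children only ends the
    simulation; it is not meant to model a real risk control.
    """
    sent = 0
    while sent < max_children:
        if check_cumulative_qty and parent.filled >= parent.quantity:
            break  # parent fully executed: stop routing children
        size = child_size
        if check_cumulative_qty:
            # never send more than the shares still outstanding
            size = min(child_size, parent.quantity - parent.filled)
        parent.filled += size
        sent += 1
    return sent


throttled = ParentOrder("XYZ", quantity=500)
runaway = ParentOrder("XYZ", quantity=500)
print(route(throttled, 100, check_cumulative_qty=True))   # 5
print(route(runaway, 100, check_cumulative_qty=False))    # 1000
```

With the check in place, routing stops once 500 shares are filled. Without it, the loop ends only at the artificial simulation cap, which mirrors the unthrottled behavior described above.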

In the week before go-live, a Knight engineer manually deployed the new RLP code in SMARS to its eight servers. However, the engineer made a mistake and did not copy the new code to one of the servers. Knight did not have a second engineer review the deployment, nor was there an automated system to alert anyone to the discrepancy. Knight also had no written procedures requiring a supervisory review, all facts we shall return to later. On August 1, at 8:01 AM ET, an internal system called BNET generated 97 email messages that referenced SMARS and identified an error described as “Power Peg disabled”. These obscure, internal messages were sent to Knight personnel, but their channel was not designated for high-priority alerts and the staff generally did not review them in real time; they were nonetheless the proverbial smoke before the fire, and a missed opportunity to identify and fix the deployment issue prior to market open. At 9:30 AM ET, Knight began receiving RLP orders from broker-dealers, and SMARS distributed the incoming work to its servers. The seven servers that had the new RLP code processed the orders correctly. However, on the eighth server, the repurposed flag activated the defective Power Peg code. This server began to continuously send child orders for each incoming parent order without regard to the number of confirmed executions Knight had already received from other trading venues. The results were immediately catastrophic. For the 212 incoming parent orders processed by the defective Power Peg code, SMARS sent thousands of child orders per second that would buy high and sell low, resulting in 4 million executions in 154 stocks for more than 397 million shares in approximately 45 minutes.
For 75 of these stocks, Knight’s executions jostled prices more than 5% and comprised more than 20% of trading volume; for 37 stocks, prices lurched more than 10% and Knight’s executions constituted more than 50% of trading volume.
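The missing automated check on the deployment could be as simple as comparing a hash of the release artifact with what each server actually holds. The sketch below shows one possible approach under stated assumptions; the server names, the byte strings standing in for builds, and the function names are all hypothetical, and nothing here describes Knight's actual tooling.

```python
import hashlib


def fingerprint(artifact: bytes) -> str:
    """SHA-256 digest of a deployed artifact."""
    return hashlib.sha256(artifact).hexdigest()


def stale_servers(expected: bytes, deployed: dict[str, bytes]) -> list[str]:
    """Return the servers whose artifact differs from the intended release."""
    want = fingerprint(expected)
    return sorted(
        name for name, blob in deployed.items() if fingerprint(blob) != want
    )


# Hypothetical fleet: seven servers received the new build, one did not.
new_build = b"smars-with-rlp"
fleet = {f"server-{i}": new_build for i in range(1, 8)}
fleet["server-8"] = b"smars-with-power-peg"

print(stale_servers(new_build, fleet))  # ['server-8']
```

A check of this kind, run after deployment and wired to a high-priority alert, addresses the two gaps the paragraph identifies: no second reviewer and no automated notice of the discrepancy.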

Figure: Nanex, LLC market data on US equity volumes from August 1, 2012

Following the Flash Crash of May 6, 2010, in which the DJIA lost over 1,000 points in minutes, the SEC announced several new rules to regulate securities trading. First, circuit breakers were required to stop trading if the market experienced what was labeled as “significant price fluctuations” of more than 10% during a 5-minute period. Second, the SEC required more specific conditions governing the cancellation of trades. For events involving between five and twenty stocks, trades could be cancelled if they were at least 10% away from the “reference price”, the last sale before pricing was disrupted; for events involving more than twenty stocks, trades could be cancelled if they deviated more than 30% from the reference price. Third, Securities Exchange Act Rule 15c3-5 (17 C.F.R. § 240.15c3-5, the “Rule”) went into effect, requiring the exchanges and broker-dealers to implement risk management controls to ensure integrity of their systems as well as executive review and certification of the controls. Because the Flash Crash rules were designed for price swings, not trading volume, the circuit breakers did not halt trading: few of the stocks traded by Knight on that fateful day exceeded the 10% price change threshold. By 9:34 AM, NYSE computer analysts noticed that market volumes were double the normal level and traced the volume spike back to Knight. Niederauer tried calling Joyce, but Joyce was still at home recovering from knee surgery. The NYSE then alerted Knight’s chief information officer, who gathered the firm’s top IT people; most trading shops would have flipped a kill switch in their algorithms or would have simply shut down systems. However, Knight had no documented procedures for incident response, another fact we shall return to later. So, it continued to fumble in the dark for another 20 minutes, deciding next that the problem was the new code.
Because the “old” version allegedly worked, Knight reverted to the old code still running on the eighth server and reinstalled it on the others. As it turned out, this was the worst possible decision because all eight servers now had the defective Power Peg code activated by the repurposed RLP flag and executing without a throttle. It was not until 9:58 AM that Knight engineers identified the root cause and shut down SMARS on all the servers; by then, the damage had been done. Knight had executed over 4 million trades in 154 stocks totaling more than 397 million shares; it assumed a net long position in 80 stocks of approximately $3.5 billion as well as a net short position in 74 stocks of approximately $3.15 billion. Under the post-Flash Crash rules enforced by the NYSE, most of the trades were within the 10% price band, so they would stand and could not be cancelled. Joyce called the then-SEC chairwoman, Mary Schapiro, for help reversing the trades, but to no avail; she demurred and deflected the matter back to the NYSE. Knight’s stock plunged by 33% that day, and the mark-to-market loss for its trades amounted to more than $460 million. News on Wall Street travels fast; other market participants could smell the blood in the water.
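The trade cancellation thresholds summarized earlier in this passage can be written as a small function. This is a sketch of the rule only as this essay states it, not of the exchange's actual procedure; the function name and the sample prices are hypothetical, and because the essay does not say what applies to events with fewer than five stocks, the sketch rejects that case.

```python
def is_cancellable(trade_price: float, reference_price: float,
                   stocks_in_event: int) -> bool:
    """Apply the cancellation thresholds as summarized in the essay.

    5 to 20 stocks: cancellable if at least 10% from the reference price.
    More than 20 stocks: cancellable if more than 30% from it.
    """
    if stocks_in_event < 5:
        raise ValueError("the essay gives no rule for fewer than five stocks")
    deviation_pct = abs(trade_price - reference_price) / reference_price * 100
    if stocks_in_event > 20:
        return deviation_pct > 30
    return deviation_pct >= 10


# Hypothetical prices with a $10.00 reference price.
print(is_cancellable(11.50, 10.00, stocks_in_event=12))   # True  (15% move)
print(is_cancellable(12.00, 10.00, stocks_in_event=154))  # False (20% move)
print(is_cancellable(13.50, 10.00, stocks_in_event=154))  # True  (35% move)
```

Because the rule depends on price deviation alone, a trade with a modest deviation stands no matter how many shares changed hands.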

Announcements from TD Ameritrade and other customers in the ensuing days that they would continue to do business with Knight did calm matters somewhat, but the company simply did not have enough cash to cover and settle its position liability. Over the weekend, on August 5, Knight raised around $400 million from several investors led by Jefferies investment bank. The financing terms were 267 million convertible preferred shares priced at $1.50 with a 2% dividend yield; if converted, these shares could give the new investors control of 70% of the company. Knight also agreed to three new board members, Martin Brand from Blackstone, Matthew Nimetz of General Atlantic, and Fred Tomczyk of TD Ameritrade. The deal was a severe blow to Knight’s shareholders, but better than the alternative of bankruptcy. The board met during the winter months of 2012 to assess takeover offers, and in December, it agreed to be acquired by its rival, Getco LLC, for $3.70 per share, a sizable premium to what the earlier investors had paid to keep the company afloat. Once the merger with Getco LLC was completed in the summer of 2013, the merged company was renamed KCG Holdings, and Joyce resigned.
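The financing figures above can be checked with simple arithmetic. The short calculation below uses only numbers stated in the text; the variable names are illustrative.

```python
preferred_shares = 267_000_000   # convertible preferred shares issued
issue_price = 1.50               # dollars per share paid by the rescue investors
takeover_price = 3.70            # dollars per share offered by Getco

raised = preferred_shares * issue_price
premium = takeover_price / issue_price - 1

print(f"Capital raised: ${raised / 1e6:.1f} million")   # Capital raised: $400.5 million
print(f"Premium over issue price: {premium:.0%}")        # Premium over issue price: 147%
```

The product matches the "around $400 million" figure, and the takeover price works out to roughly two and a half times what the rescue investors paid per share.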