
This question has been re-opened, after (rightly) being closed as too broad, for the purpose of clearing up some misconceptions in one of the answers here.

The main point to stress is this: when it comes to high-frequency trading, "biggest" or "as many as you can get" is rarely the right answer.

Not only is it wrong, it is often physically impossible, given the space requirements and other factors that limit what an HFT architecture can look like.

Let's assume you have unlimited resources, money included, and go over some of the inherent limitations on what co-located HFT machines look like.

Size: Regardless of how much money you have to spend, you don't get even a small room for your dream HFT machine. You get a rack (or multiple racks, depending on what hardware you plan on using and how much money you have).

CPU: We are all stuck on the same CPUs for the most part, and the same goes for field-programmable gate arrays (FPGAs). We are "stuck" on them in a hardware / architectural sense, i.e., no firm is going to be behind on the latest CPU technology, and, more importantly, we are limited in a software sense as well. The same applies to GPUs and ASICs. Granted, the software limitation stems from the hardware, so calling it a "software limitation" may seem arbitrary, but the problem lies more in software capability than in hardware, given that the latter is more or less normalized across the field. Here is an example that makes this simple to conceptualize even if you know nothing about concurrent, multi-threaded programming in C++...

Let's walk through a scenario investigating the basic structure of one possible hardware implementation.

For this example, say you have different tasks running on separate threads (speaking specifically about HFT here), each doing its job for the application: a thread dealing with incoming market data (a feed handler), one for processing that market data, and an exchange / order gateway thread. Let's discuss in more detail what these three processes entail for this specific hardware setup.

Note: In no way am I saying the following setup is industry standard, or even a smart one. It is just an example demonstrating some of what the strategy might be doing with respect to hardware.

Feed handler: This thread's job is to deal with the incoming market data and get it ready to be shipped off to the next thread, which is concerned with the strategy. This job and the exchange gateway are often the biggest sources of network latency in an HFT system. Why? When it comes to exchanges and network I/O, several factors pose challenges that cannot possibly be predicted or compensated for on the fly. Often these problems take the form of incoming market data arriving in weird / inconsistent increments, market data not showing up for a few fractions of a second, etc.

Signal discovery / generation: The job of this thread should be fairly self-explanatory: it takes in the market data and, based on what the strategy is, makes some decision about what is going on. Something important to note here is that this is often the part you do NOT see in assembly / written at the hardware level, given that computations of this nature are dangerous and taxing to iterate on at the hardware level.

Exchange gateway: Again, this is going to be one of the major contributors of latency, given the non-deterministic nature of the process. The job of this thread is to ensure that whatever decision the signal-generation thread made is accurately and efficiently passed along to the exchange. Often, depending on market conditions, larger orders are chopped into smaller ones and sent to various exchanges / dark pools, depending on how the market is behaving and on what kind of market-impact analysis has been calculated.

But something very specific has to happen for all of this to work in this specific system: at some point during the running of this application there must exist some form of a "critical processing path", where things must be dealt with consecutively / in some discrete order.

But why is this the case? Why do some parts of this have to be dealt with in a sequential manner? Mainly because the result of what each thread does must be passed along to the next thread. This is easy to visualize with our rudimentary example of what each thread is doing in our fake HFT application.

What might be bad about this?

Well, for one, you just don't need all these threads!

So, from this answer:

Q: How many Processors are required, and what clock speed? A: as many as you can get

doesn't make much sense at all!

For the most latency-sensitive aspects of an HFT system, you are going to see something that looks more like this:

Single threads isolated to single cores. Why? There are two main reasons. First, the OS does not get in your way: if everything is locked to one core and one thread, it is easier to keep the OS from scheduling other tasks onto that core, which saves you precious cache. Second, any implementation of lock-free programming is much easier in a setup like this. With a single thread operating on a single core, you can use single-producer / single-consumer hand-offs that take what happens on this core and ensure an efficient path over to the strategy core, which operates in a similar manner.

Now, onto memory:

How much memory required?

Not much, actually. Definitely not "as much as you can get".

Think about it this way: do you want to be reading from memory for doing HFT?

No, you don't and you won't. Given that this activity demands deterministic latency, it is imperative that your memory path be quickly accessible and not susceptible to a miss and a read from RAM, so keeping the working set low-level in CPU cache is by far the best option.

Now, storage!

How much internal storage would you need?

Again, similar to the deal with random-access memory: reading from memory is bad, but reading from storage would be catastrophic. The data from the exchange is being stored, but not on-site! Why would you spend money on co-location space and take it up storing tick data? You wouldn't.

How many network ports are required?

The other answer covers this for the most part: it depends on the exchange and how it operates, but generally speaking you would not see more than two or three. Theoretically, you only really need connections to the exchange's order handlers, and that is another example of an inherent limitation: you get whatever bandwidth the exchange supports. This also relates to the last question about fiber and bandwidth: given that you hit the bandwidth limit the exchange imposes, "as many as you can get" is silly and nonsensical, as stated before. You will reach that bandwidth limit, and more fiber does absolutely nothing for you.

The OS that runs on these machines depends on the hardware, and that aspect of the system is mostly less of a concern: just something light on resources that makes kernel networking easier. If you must know a specific operating system used on HFT machines, I know of a successful firm using CentOS, a derivative of the upstream Red Hat Enterprise Linux (RHEL). The latter traditionally runs on x86, x86-64, and Itanium servers.

If this had to be summed up into something extremely short, here it is: