Java is increasingly being used for low latency work where previously C and C++ were the de facto choice.

InfoQ brought together four experts in the field to discuss what is driving the trend, and some of the best practices when using Java in these situations.

The participants:

Peter Lawrey is a Java consultant interested in low latency and high throughput systems. He has worked for a number of hedge funds, trading firms and investment banks.

Martin Thompson is a high performance and low latency specialist, with over two decades working with large scale transactional and big-data systems, in the automotive, gaming, financial, mobile, and content management domains.

Todd L. Montgomery is Vice President of Architecture for Informatica Ultra Messaging and the chief designer and implementer of the 29West low latency messaging products.

Dr Andy Piper recently joined Push Technology as Chief Technology Officer, from Oracle.

The Questions:

1. What do we mean by low latency? Is it the same thing as real-time? How does it relate to high performance code in general?

2. Some of the often cited advantages of using Java in other situations include access to the rich collection of libraries, frameworks, application servers and so on, and also the large number of available programmers. Do these advantages apply when working on low latency code? If not, what advantages does Java have over C++?

3. How does the JVM support concurrent programs?

4. Ignoring Garbage Collection for a moment, what other Java specific techniques (things that wouldn't apply if you were using C++) are there for writing low latency code in Java? I'm thinking here about things like warming up the JVM, getting all your classes into permgen to avoid IO, Java specific techniques for avoiding cache misses, and so on.

5. How has managing GC behaviour affected the way people code for low latency in Java?

6. When analysing low latency applications are there any common causes or patterns you see behind "spikes" or outliers in performance?

7. Java 7 introduced support for Sockets Direct Protocol (SDP) over InfiniBand fabric. Is this something you've seen exploited in production systems yet? If it isn't being used, what other solutions are you seeing in the wild?

8. Perhaps a less Java specific question, but why do we need to try and avoid contention? In situations where you can't avoid it what are the best ways to manage it?

9. Has the way you approach low latency development in Java changed in the past couple of years?

10. Is Java suitable for other performance sensitive work? Would you use it in an HFT system, for example, or is C++ still a better choice here?

Q1. InfoQ: What do we mean by low latency? Is it the same thing as real-time? How does it relate to high performance code in general?

Lawrey: A system with a measured latency requirement which is too fast to see. This could be anywhere from 100 nanoseconds to 100 milliseconds.

Montgomery: Real-time and low latency can be quite different. The majority view on "real-time" would be determinism over pure speed with very closely controlled, or even bounded, outliers. However, "low latency" typically implies that pure speed is given much higher priority and some outliers may be, however slightly, more tolerable. This is certainly the case when thinking about hard real-time. One of the key pre-requisites for low latency is a keen eye for efficiency. From a system view, this efficiency must permeate the entire application stack, the OS, and the network. This means that low latency systems have to have a high degree of mechanical sympathy for all those components. In addition, many of the techniques that have emerged in low latency systems over the last several years have come from high performance techniques in OSs, languages, VMs, protocols, other system development areas, and even hardware design.

Thompson: Performance is about two things: throughput, e.g. units per second, and response time, otherwise known as latency. It is important to define the units and not just say something should be "fast". Real-time has a very specific definition and is often misused. Real-time is to do with systems that have a real time constraint from input event to response time regardless of system load. In a hard real-time system if this constraint is not honoured then a total system failure can occur. Good examples are heart pacemakers or missile control systems. With trading systems, real-time tends to have a different meaning in that the system must have high throughput and react as quickly as possible to an event, which can be considered "low latency". Missing a trading opportunity is typically not a total system failure so you cannot really call this real-time. A good trading system will have a high quality of execution, one aspect of which is a low latency response with little deviation in response time.

Piper: Latency is simply the delay between decision and action. In the context of high performance computing, low latency has typically meant that transmission delays across a network are low or that the overall delays from request to response are low. What defines "low" depends on the context – low latency over the Internet might be 200ms whereas low latency in a trading application might be 2μs. Technically low latency is not the same as real-time – low latency typically is measured as percentiles where the outliers (situations in which latency has not been low) are extremely important to know about. With real-time, guarantees are made about the behaviour of the system – so instead of measuring percentile delays you are enforcing a maximum delay. You can see how a real-time system is also likely to be a low latency system, whereas the converse is not necessarily true. Today however, the notion of enforcement is gradually being lost so that many people now use the terms interchangeably. If latency is the overall delay from request to response then it is obvious that many things contribute to this delay – CPU, network, OS, application, even the laws of physics! Thus low latency systems typically require high-performance code so that software elements of latency can be reduced.
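To make the percentile view of latency concrete, here is a minimal sketch in Java. The class name LatencyPercentiles, the nearest-rank method and the sample data are assumptions chosen for illustration; they are not something the panelists prescribe, and a production system would more likely record into a preallocated histogram than sort an array.

```java
import java.util.Arrays;

/** Hypothetical helper: summarises latency samples by percentile (nearest-rank method). */
final class LatencyPercentiles {

    /** Returns the p-th percentile (0 < p <= 100) of the samples, in the samples' own unit. */
    static long percentile(long[] samplesNanos, double p) {
        if (samplesNanos.length == 0 || p <= 0.0 || p > 100.0) {
            throw new IllegalArgumentException("need at least one sample and 0 < p <= 100");
        }
        long[] sorted = samplesNanos.clone();   // leave the caller's array untouched
        Arrays.sort(sorted);
        // 1-based nearest rank: the smallest rank covering at least p percent of samples.
        int rank = (int) Math.ceil(p * sorted.length / 100.0);
        return sorted[rank - 1];
    }

    public static void main(String[] args) {
        // Illustrative data only: 99 fast responses and a single slow outlier.
        long[] samples = new long[100];
        Arrays.fill(samples, 2_000L);       // 2 microseconds each
        samples[42] = 5_000_000L;           // one 5 millisecond spike

        System.out.println("median = " + percentile(samples, 50.0) + " ns");
        System.out.println("99th   = " + percentile(samples, 99.0) + " ns");
        System.out.println("max    = " + percentile(samples, 100.0) + " ns");
    }
}
```

With this particular data set the median and the 99th percentile do not reveal the spike and only the maximum does, which is one reason the outliers are reported alongside the percentiles rather than being averaged away.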

Q2. InfoQ: Some of the often cited advantages of using Java in other situations include access to the rich collection of libraries, frameworks, application servers and so on, and also the large number of available programmers. Do these advantages apply when working on low latency code? If not, what advantages does Java have over C++?

Lawrey: If your application spends 90% of the time in 10% of your code, Java makes optimising that 10% harder, but writing and maintaining the other 90% of your code easier; especially for teams of mixed ability.

Montgomery: In the capital markets, especially algorithmic trading, there are a number of factors that come into play. Often the faster an algorithm can be put into the market, the more advantage it has. Many algorithms have a shelf life and quicker time to market is key in taking advantage of that. With the community around Java and the options available, it can definitely be a competitive advantage, as opposed to C or C++ where the options may not be as broad for the use case. Sometimes, though, pure low latency can rule out other concerns. I think currently, the difference in performance between Java and C++ is so close that it's not a black and white decision based solely on speed. Improvements in GC techniques, JIT optimizations, and managed runtimes have made traditional Java weaknesses with respect to performance into some very compelling strengths that are not easy to ignore.

Thompson: Low latency systems written in Java tend to not use 3rd party or even standard libraries for two major reasons. Firstly, many libraries have not been written with performance in mind and often do not have sufficient throughput or response time. Secondly, they tend to use locks when concurrent, and they generate a lot of garbage. Both of these contribute to highly variable response times, due to lock-contention and garbage collection respectively. Java has some of the best tooling support of any language which results in significant productivity gains. Time to market is often a key requirement when building trading systems, and Java can often get you there sooner.

Piper: In many ways the reverse is true – writing good low latency code in Java is relatively hard since the developer is insulated from the guarantees of the hardware by the JVM itself. The good news is that this is changing: not only are JVMs constantly getting faster and more predictable but developers are now able to take advantage of hardware guarantees through a detailed understanding of the way that Java works – in particular the Java Memory Model – and how it maps to the underlying hardware (indeed Java was the first popular language to provide a comprehensive memory model that programmers could rely on – C++ only provided one later on). A good example is the lock-free, wait-free techniques that Martin Thompson has been promoting and that our company, Push, has adopted into its own development with great success. Furthermore, as these techniques become more mainstream we are starting to see their uptake in standard libraries (e.g. the Disruptor) so that developers can adopt the techniques without needing such a detailed understanding of the underlying behaviour. Even without these techniques the safety advantages of Java (memory management, thread management etc.) can often outweigh the perceived performance advantages of C++, and of course JVM vendors have claimed for some time that modern JVMs are often faster than custom C++ code because of the holistic optimizations that they can apply across an application.

Q3. InfoQ: How does the JVM support concurrent programs?

Lawrey: Java has had built-in multi-threading support from the start and high level concurrency support as standard for almost ten years.

Montgomery: The JVM is a great platform for concurrent programs. The memory model allows a consistent model for developers to utilize lock-free techniques across hardware, which is a great plus for getting the most out of the hardware by applications. Lock-free and wait-free techniques are great for creating efficient data structures, something we need very desperately in the development community. In addition, some of the standard library constructs for concurrency are quite handy and can make for more resilient applications. With C++11, certain specifics aside, Java is not the only one with access to a lot of these constructs. And the C++11 memory model is a great leap forward for developers.

Thompson: Java (1.5) was the first major language to have a specified memory model. A language level memory model allows programmers to reason about concurrent code at an abstraction above the hardware. This is critically important, as hardware and compilers will aggressively reorder our code to gain performance, which causes visibility issues across threads. With Java it is possible to write lock-free algorithms that when done well can provide some pretty amazing throughput at low and predictable latencies. Java also has rich support for locks. However, when locks are contended the operating system must get involved as an arbitrator with huge performance costs. The latency difference between a contended and uncontended lock is typically 3 orders of magnitude.

Piper: Support for concurrent programs in Java starts with the Java Language Specification itself – the JLS describes many Java primitives and constructs that support concurrency. At a basic level this is the java.lang.Thread class for the creation and management of threads and the synchronized keyword for the mediation of access to shared resources from different threads. On top of this, Java provides a whole package of data structures optimized for concurrent programs (java.util.concurrent) from concurrent hash tables to task schedulers to different lock types. One of the biggest areas of support, however, is the Java Memory Model (JMM) that was incorporated into the JLS as part of JDK 5. This provides guarantees around what developers can expect when dealing with multiple threads and their interactions. These guarantees have made it much easier to write high-performance, thread-safe code. In the development of Diffusion we rely very heavily on the Java Memory Model in order to achieve the best possible performance.
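As a small illustration of the lock-free style the panelists refer to, the sketch below keeps a running maximum with a compare-and-set retry loop from java.util.concurrent.atomic instead of a synchronized block. The class LockFreeMax and its method names are hypothetical, and the lambda syntax assumes Java 8 or later; it is meant only to show the shape of the technique, not any panelist's implementation.

```java
import java.util.concurrent.atomic.AtomicLong;

/** Hypothetical example: tracks the largest value seen without taking a lock. */
final class LockFreeMax {
    private final AtomicLong max = new AtomicLong(Long.MIN_VALUE);

    /** Retries a compare-and-set until the stored value is at least the candidate. */
    void offer(long candidate) {
        long current;
        do {
            current = max.get();
            if (candidate <= current) {
                return;                       // another thread already stored a larger value
            }
        } while (!max.compareAndSet(current, candidate));
    }

    long get() {
        return max.get();
    }

    public static void main(String[] args) throws InterruptedException {
        LockFreeMax tracker = new LockFreeMax();
        Thread[] threads = new Thread[4];
        for (int t = 0; t < threads.length; t++) {
            final int base = t * 1_000;
            threads[t] = new Thread(() -> {
                for (int i = 0; i < 1_000; i++) {
                    tracker.offer(base + i);  // contended updates, no lock held
                }
            });
            threads[t].start();
        }
        for (Thread thread : threads) {
            thread.join();
        }
        System.out.println("max = " + tracker.get());
    }
}
```

The memory model guarantees that a successful compareAndSet is visible to later reads of the same AtomicLong on other threads, which is what lets the loop be reasoned about without reference to a particular CPU.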

Q4. InfoQ: Ignoring Garbage Collection for a moment, what other Java specific techniques (things that wouldn't apply if you were using C++) are there for writing low latency code in Java? I'm thinking here about things like warming up the JVM, getting all your classes into permgen to avoid IO, Java specific techniques for avoiding cache misses, and so on.

Lawrey: Java allows you to write, test and profile your application with limited resources more effectively. This gives you more time to ensure you cover all of the "big picture". I have seen many C/C++ projects spend a lot of time drilling down to the low level and still end up with longer latencies end to end.

Montgomery: That is kind of tough. The only obvious one would be warm up for JVMs to do appropriate optimizations. However, some of the class and method call optimizations that can be done via class hierarchy analysis at runtime are not possible currently in C++. Most other techniques can also be done in C++ or, in some cases, don't need to be done. Low latency techniques in any language often involve what you _don't_ do that can have the biggest impact. In Java there are a handful of things to avoid that can have undesirable side effects for low latency applications. One is the use of specific APIs, such as the Reflection API. Thankfully, there are often better choices for how to achieve the same end result.

Thompson: You mention most of the issues in your question :-) Basically, Java must be warmed up to get the runtime to a steady state. Once in this steady state Java can be as fast as native languages and in some cases faster. One big Achilles heel for Java is lack of memory layout control. A cache miss is a lost opportunity to have executed ~500 instructions on a modern processor. To avoid cache misses we need control of memory layout, and then we must access that memory in a predictable fashion. To get this level of control, and reduce GC pressure, we often have to create data structures in DirectByteBuffers or go off heap and use Unsafe. Both of these allow for the precise layout of data structures. This need could be removed if Java introduced support for arrays of structures. This does not need to be a language change and could be introduced by some new intrinsics.

Piper: The question seems to be based on false premises. At the end of the day, writing a low latency program is very similar to writing other programs where performance is a concern - the input is code provided by a developer (whether C++ or Java) which executes on a hardware platform with some level of indirection in between (e.g. through the JVM or through libraries, compiler optimizers etc. in C++) – the fact that the specifics vary makes little difference. This is essentially an exercise in optimization and the rules of optimization are, as always:

1. Don't.

2. Don't yet (for experts only).

And if that does not get you where you need to be:

1. See if you actually need to speed it up.

2. Profile the code to see where it's actually spending its time.

3. Focus on the few high-payoff areas and leave the rest alone.

Now of course the tools you would use to achieve this and the potential hotspots might be different between Java and C++, but that's just because they are different. Granted, you might need to understand in a little more detail than your average Java programmer would what is going on – but the same is true for C++ also; and of course by using Java there are many things you don't need to understand so well because they are adequately catered for by the runtime. In terms of the types of things that might need optimizing – these are the usual suspects of code paths, data structures and locks. In Diffusion we have adopted a benchmark driven approach where we are constantly profiling our application and looking for optimization opportunities.
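The memory layout point can be sketched with a direct ByteBuffer that stores fixed-width records contiguously, so that a scan walks memory in address order and the records themselves create no per-object garbage. The class OffHeapTrades, its field offsets and the 16-byte record size are assumptions made up for this example; real systems that take this route usually add bounds, alignment and concurrency handling that is omitted here.

```java
import java.nio.ByteBuffer;
import java.nio.ByteOrder;

/** Hypothetical fixed-width record store laid out contiguously in a direct ByteBuffer. */
final class OffHeapTrades {
    private static final int PRICE_OFFSET = 0;     // long, 8 bytes
    private static final int QUANTITY_OFFSET = 8;  // int, 4 bytes
    private static final int RECORD_SIZE = 16;     // padded so records stay 8-byte aligned

    private final ByteBuffer buffer;
    private final int capacity;

    OffHeapTrades(int capacity) {
        this.capacity = capacity;
        // Direct buffers live outside the Java heap, so the records are not scanned by the GC.
        this.buffer = ByteBuffer.allocateDirect(capacity * RECORD_SIZE)
                                .order(ByteOrder.nativeOrder());
    }

    void set(int index, long price, int quantity) {
        int base = offset(index);
        buffer.putLong(base + PRICE_OFFSET, price);
        buffer.putInt(base + QUANTITY_OFFSET, quantity);
    }

    long price(int index) {
        return buffer.getLong(offset(index) + PRICE_OFFSET);
    }

    int quantity(int index) {
        return buffer.getInt(offset(index) + QUANTITY_OFFSET);
    }

    /** Sequential scan: reads records in address order, a pattern that prefetchers handle well. */
    long totalQuantity() {
        long total = 0;
        for (int i = 0; i < capacity; i++) {
            total += quantity(i);
        }
        return total;
    }

    private int offset(int index) {
        if (index < 0 || index >= capacity) {
            throw new IndexOutOfBoundsException("index " + index);
        }
        return index * RECORD_SIZE;
    }

    public static void main(String[] args) {
        OffHeapTrades trades = new OffHeapTrades(3);
        trades.set(0, 10_050L, 100);
        trades.set(1, 10_075L, 250);
        trades.set(2, 10_025L, 50);
        System.out.println("total quantity = " + trades.totalQuantity());
    }
}
```

The trade-off is the one raised in the answers above: the layout is now under the programmer's control, but the code reads more like C than idiomatic Java.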

Q5. InfoQ: How has managing GC behaviour affected the way people code for low latency in Java?

Lawrey: There are different solutions for different situations. My preferred solution is to produce so little garbage, it no longer matters. You can cut your GCs to less than once a day. IMHO, at this point the real reason to reduce garbage is to ensure you are not filling your CPU caches with garbage. Reducing the garbage you are producing can improve the performance of your code by 2-5x.

Montgomery: Most of the low latency systems I have seen in Java have gone to great lengths to minimize or even try to eliminate the generation of garbage. As an example, avoiding the use of Strings altogether is not uncommon. Informatica Ultra Messaging (UM) itself has provided specific Java methods to cater to the needs of many users with respect to object reuse and avoiding some usage patterns. If I had to guess, the most common implication has been the prevalent use of object reuse. This pattern has also influenced many other non-low latency libraries, such as Hadoop. It's a common technique now within the community to provide options or methods for users of an API or framework to utilize them in a low or zero garbage manner. In addition to the effect on coding practices, there is also an operational impact for low latency systems. Many systems will take some, shall we say, creative control of GC. It's not uncommon to only allow GC to occur at specific times of the day. The implications on application design and operational requirements are a major factor in controlling outliers and gaining more determinism.

Thompson: Object pools are employed or, as mentioned in the previous response, most data structures need to be managed in ByteBuffers or off heap. This results in a C style of programming in Java. If we had a truly concurrent garbage collector then this could be avoided.

Piper: How long is a piece of java.lang.String? Sorry, I'm being facetious – the truth is that some of the biggest changes to GC behaviour have come about through JVM improvements rather than through individual coding decisions by programmers. HotSpot, for instance, has come an incredibly long way from the early days when you could expect GC pauses measured in minutes. Many of these changes have been driven by competition – it used to be that BEA JRockit behaved far better than HotSpot from a latency perspective, creating much lower jitter. These days however, Oracle is merging the JRockit and HotSpot codebases precisely because the gap has narrowed so much. Similar improvements have been seen in other, more modern, JVMs such as Azul's Zing and in many cases developer attempts to "improve" GC behaviour have actually had no net benefit or made things worse. However, that's not to say that there aren't things that developers can do to manage GC – for instance by reducing object allocations through either pooling or using off heap storage to limit memory churn. It's still worth bearing in mind however that these are problems that JVM developers are also very focused on, so it still may well be either not necessary to do anything at all or easier to simply buy a commercial JVM. The worst thing you can do is prematurely optimize this area of your applications without knowing whether it is actually a problem or not, since these kinds of techniques increase application complexity through the bypass of a very useful Java feature (GC) and therefore can be hard to maintain.
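A minimal sketch of the object reuse pattern mentioned in these answers follows. The MessagePool and Message types are invented for illustration, the pool is deliberately single-threaded, and, as Piper cautions, whether pooling helps at all should be measured before it is adopted.

```java
import java.util.ArrayDeque;

/** Hypothetical single-threaded pool that recycles message objects instead of allocating new ones. */
final class MessagePool {

    /** Mutable, reusable message; its fields are overwritten on every use. */
    static final class Message {
        long sequence;
        final StringBuilder payload = new StringBuilder(64);  // reused buffer, avoids new Strings

        void reset() {
            sequence = 0;
            payload.setLength(0);   // clears the content but keeps the backing array
        }
    }

    private final ArrayDeque<Message> free;

    MessagePool(int size) {
        free = new ArrayDeque<>(size);
        for (int i = 0; i < size; i++) {
            free.push(new Message());   // allocate everything up front, before the hot path runs
        }
    }

    /** Hands out a pooled instance, allocating only if the pool has been exhausted. */
    Message acquire() {
        Message message = free.poll();
        return message != null ? message : new Message();
    }

    /** Returns an instance to the pool; the caller must not use it afterwards. */
    void release(Message message) {
        message.reset();
        free.push(message);
    }

    public static void main(String[] args) {
        MessagePool pool = new MessagePool(8);
        for (long seq = 1; seq <= 1_000; seq++) {
            Message message = pool.acquire();
            message.sequence = seq;
            message.payload.append("tick ").append(seq);
            // ... hand the message to downstream processing here ...
            pool.release(message);
        }
        System.out.println("processed 1000 messages with 8 preallocated instances");
    }
}
```

The cost of this style is that object lifetimes are now managed by hand: using a message after releasing it is a bug the garbage collector would otherwise have made impossible.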

Q6. InfoQ: When analysing low latency applications are there any common causes or patterns you see behind "spikes" or outliers in performance?

Lawrey: Waiting for IO of some type. CPU instruction or data cache disturbances. Context switches.

Montgomery: In Java, GC pauses are beginning to be well understood and, thankfully, we have better GCs that are available. System effects are common for all languages though. OS scheduling delay is one of the many causes behind spikes. Sometimes it is the direct delay and sometimes it is a knock-on effect caused by the delay that is the real killer. Some OSs are better than others when it comes to scheduling under heavy load. For many developers, the impact that poor application choices can have on scheduling often comes as a surprise and is hard to debug sufficiently. On a related note, there is the delay inherent in I/O and the contention that I/O can cause on some systems. A good assumption to make is that any I/O call may block and will block at some point. Thinking through the implications inherent in that is very often key. And remember, network calls are I/O. There are a number of network specific causes for poor performance to cover as well. Let me list the key items to consider.

Networks take time to traverse. In WAN environments, the time it takes to propagate data across the network is non-trivial.

Ethernet networks are not reliable; it is the protocols on them that provide reliability.

Loss in networks causes delay due to retransmission and recovery as well as second order effects such as TCP head-of-line blocking.

Loss in networks can occur on the receiver side due to resource starvation in various ways when UDP is in use.

Loss in networks can occur within switches and routers due to congestion. Routers and switches are natural contention points and when contended for, loss is the trade off.

Reliable network media, like InfiniBand, trade off loss for delay at the network level. The end result of loss causing delay is the same, though.

To a large degree, low latency applications that make heavy use of networks often have to look at a whole host of causes of delay and additional sources of jitter within the network. Beside network delay, loss is probably a high contender for the most common cause of jitter in many low latency applications.

Thompson: I see many causes of latency spikes. Garbage collection is the one most people are aware of but I also see a lot of lock contention, TCP related issues, and many Linux kernel related issues due to poor configuration. Many applications have poor algorithm design that does not amortize expensive operations like IO and cache-misses under bursty conditions, and thus they suffer queueing effects. Algorithm design is often the largest cause of performance issues and latency spikes in the applications I've seen. Time To Safepoint (TTS) is a major consideration when dealing with latency spikes. Many JVM operations require all user threads to be stopped by bringing them to a safepoint. Safepoint checks are typically performed on method returns. The need for safepoints can be anything from revoking biased locks, some JNI interactions, and de-optimizing code, through to many GC phases. Often the time taken to bring all threads to a safepoint is more significant than the work to be done. The work is then followed by the significant costs in waking all those threads again to run. Getting a thread to safepoint quickly and predictably is often not a considered or optimised part of many JVMs, e.g. during object cloning and array copying.

Piper: The most common cause of outliers is GC pauses; however, the most common cure for GC pauses is GC tuning rather than actual code changes. For instance, simply changing from the parallel collector that is used by default in JDK 6 and JDK 7 to the concurrent mark sweep collector can make a huge difference to the stop-the-world GC pauses that typically cause latency spikes. Beyond tuning, another thing to bear in mind is the overall heap size being used. Very large heaps typically put more pressure on the garbage collector and can cause longer pause times – often simply eliminating memory leaks and reducing memory usage can make a big difference to the overall behaviour of a low latency application. Apart from GC, lock contention is another major cause of latency spikes, but this can be rather harder to identify and resolve due to its often non-deterministic nature. It's worth remembering also that any time the application is unable to proceed it will yield a latency spike – this could be caused by many things, even things outside the JVM's control – e.g. access to kernel or OS resources. If these kinds of constraint can be identified then it is perfectly possible to change an application to avoid the use of these resources or to change the timing of when they are used.
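Before tuning anything, it helps to confirm that GC is actually behind a spike. One rough way to do that from inside the application is to sample the standard GarbageCollectorMXBean counters around a piece of work, as sketched below. The class name GcActivity and the allocation loop are assumptions for illustration, and the reported time is the collector's accumulated elapsed time, which for concurrent collectors is not the same thing as stop-the-world pause time; GC logs remain the more precise source.

```java
import java.lang.management.GarbageCollectorMXBean;
import java.lang.management.ManagementFactory;

/** Hypothetical helper: sums the JVM's own GC counters so they can be compared before and after work. */
final class GcActivity {

    /** Total collections reported by all collectors; undefined values (-1) are skipped. */
    static long totalCollectionCount() {
        long total = 0;
        for (GarbageCollectorMXBean gc : ManagementFactory.getGarbageCollectorMXBeans()) {
            long count = gc.getCollectionCount();
            if (count > 0) {
                total += count;
            }
        }
        return total;
    }

    /** Approximate accumulated collection time in milliseconds across all collectors. */
    static long totalCollectionTimeMillis() {
        long total = 0;
        for (GarbageCollectorMXBean gc : ManagementFactory.getGarbageCollectorMXBeans()) {
            long time = gc.getCollectionTime();
            if (time > 0) {
                total += time;
            }
        }
        return total;
    }

    public static void main(String[] args) {
        long countBefore = totalCollectionCount();
        long timeBefore = totalCollectionTimeMillis();

        // Illustrative workload only: create short-lived objects to provoke some collections.
        long checksum = 0;
        for (int i = 0; i < 2_000_000; i++) {
            byte[] garbage = new byte[1024];
            garbage[0] = (byte) i;
            checksum += garbage[0];
        }

        System.out.println("checksum     = " + checksum);
        System.out.println("collections  = " + (totalCollectionCount() - countBefore));
        System.out.println("gc time (ms) = " + (totalCollectionTimeMillis() - timeBefore));
    }
}
```

If the counters do not move across a latency spike, the cause is more likely to be one of the other items listed above, such as lock contention, I/O or scheduling.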

Q7. InfoQ: Java 7 introduced support for Sockets Direct Protocol (SDP) over InfiniBand fabric. Is this something you've seen exploited in production systems yet? If it isn't being used, what other solutions are you seeing in the wild?

Lawrey: I haven't used it for Ethernet because it creates quite a bit of garbage. In low latency systems, you want to minimise the number of network hops and usually it's the external connections that are the only ones you cannot remove. These are almost always Ethernet.

Montgomery: We have not seen this that much. It has been mentioned, but we have not seen it being seriously considered. Ultra Messaging is used as the interface between SDP and the developer using messaging. SDP fits much more into a (R)DMA access pattern than a push-based usage pattern. Turning a DMA pattern into a push pattern is possible, but SDP is not that well suited for it, unfortunately.

Thompson: I've not seen this used in the wild. Most people use a stack like OpenOnload and network adapters from the likes of Solarflare or Mellanox. At the extreme I've seen RDMA over InfiniBand with custom lock-free algorithms accessing shared memory directly from Java.

Piper: Oracle's Exalogic and Coherence products have used Java and SDP for some time so in that sense we've seen usage of this feature in production systems for some time also. In terms of developers actually using the Java SDP support directly rather than through some third-party product, not so much – but if it adds business benefit then we expect this to change. We ourselves have made use of latency optimized hardware (e.g. Solarflare 10GbE adapters) where the benefits are accrued from kernel driver installation rather than specific Java tuning.

Q8. InfoQ: Perhaps a less Java specific question, but why do we need to try and avoid contention? In situations where you can't avoid it what are the best ways to manage it?

Lawrey: For ultra low latency, this is an issue, but for multi-microsecond latencies I don't see it as an issue. In situations where you can't avoid it, be aware of, and minimise the impact of, any resource contention.

Montgomery: Contention is going to happen. Managing it is crucial. One of the best ways to deal with contention is architecturally. The "single writer principle" is an effective way to do that. In essence, just don't have the contention, assume a single writer and build around that base principle. Minimize the work on that single writer and you would be surprised what can be done. Asynchronous behavior is a great way to avoid contention. It all revolves around the principle of "always be doing useful work". This also normally turns into the single writer principle. I often like a lock-free queue in front of a single writer on a contended resource and use a thread to do all the writing. The thread does nothing but pull off a queue and do the writing operation in a loop. This works great for batching as well. A wait-free approach on the enqueue side pays off big here and that is where asynchronous behavior comes into play for me from the perspective of the caller.

Thompson: Once we have contention in an algorithm we have a fundamental scaling bottleneck. Queues form at the point of contention and Little's Law kicks in. We can also model the sequential constraint of the contention point with Amdahl's Law. Most algorithms can be reworked to avoid contention from multiple threads or execution contexts giving a parallel speed up, often via pipelining. If we really must manage contention on a given data resource then the atomic instructions provided by processors tend to be a better solution than locks because they operate in user space without ever involving the kernel. The next generation of Intel processors (Haswell) expands on these instructions to provide hardware transactional memory support for updating small amounts of data atomically. Unfortunately, Java is likely to take a long time to offer such support directly to programmers.

Piper: Lock contention can be one of the biggest performance impediments for low latency applications. Locks in themselves don't have to be expensive and in the uncontended case Java synchronized blocks perform extremely well. However with contended locks performance can fall off a cliff – not just because a thread holding a lock prevents another thread that wants the same lock from doing work, but also because simply the fact that more than one thread is accessing the lock makes the lock more expensive for the JVM to manage. Obviously avoidance is key, so don't synchronize stuff that doesn't need it – remove locks that are not protecting anything, reduce the scope of locks that are, reduce the time that locks are held for, don't mix the responsibilities of locks, etc. Another common technique is to remove multi-threaded access – instead of giving multiple threads access to a shared data structure, updates can be queued as commands with the queue being tended by a single thread. Lock contention then simply comes down to adding items to the queue – which itself can be managed through lock-free techniques.
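The queue-in-front-of-a-single-writer arrangement that Montgomery and Piper describe can be sketched as follows. The class SingleWriterCounter is hypothetical, and java.util.concurrent.ConcurrentLinkedQueue is used only because it is a non-blocking queue that ships with the JDK; it allocates a node per element and boxes each Long, so a latency-sensitive system would more likely use a preallocated ring buffer.

```java
import java.util.concurrent.ConcurrentLinkedQueue;

/** Hypothetical example: many producers enqueue commands, one thread applies them. */
final class SingleWriterCounter {
    private final ConcurrentLinkedQueue<Long> commands = new ConcurrentLinkedQueue<>();
    private long total;   // mutated only by the single writer, so it needs no lock

    /** Called from any thread: enqueues the update without blocking. */
    void submit(long delta) {
        commands.offer(delta);
    }

    /** Called only by the single writer thread: applies everything queued so far as one batch. */
    int drain() {
        int applied = 0;
        Long delta;
        while ((delta = commands.poll()) != null) {
            total += delta;
            applied++;
        }
        return applied;
    }

    /** Read by the writer thread only in this sketch. */
    long total() {
        return total;
    }

    public static void main(String[] args) throws InterruptedException {
        SingleWriterCounter counter = new SingleWriterCounter();
        Thread[] producers = new Thread[4];
        for (int t = 0; t < producers.length; t++) {
            producers[t] = new Thread(() -> {
                for (int i = 0; i < 1_000; i++) {
                    counter.submit(1L);   // producers never touch the shared total
                }
            });
            producers[t].start();
        }
        for (Thread producer : producers) {
            producer.join();
        }
        // The main thread plays the role of the single writer here.
        int applied = counter.drain();
        System.out.println("applied " + applied + " commands, total = " + counter.total());
    }
}
```

In a long-running service the drain call would sit in the writer thread's loop, which is also where the batching benefit Montgomery mentions comes from: one pass over the queue can apply many updates.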

Q9. InfoQ: Has the way you approach low latency development in Java changed in the past couple of years?

Lawrey: Build a simple system which does what you want. Profile it as end to end as possible. Optimise and/or rewrite where you measure the bottlenecks to be.

Montgomery: Entirely changed. Ultra Messaging started in 2004. At the time, the thought of using Java for low latency was just not a very obvious choice. But a few certainly did consider it. And more and more have ever since. Today I think the landscape is totally changed. Java is not only viable, it may be the predominant option for low latency systems. It's been the awesome work done by Martin Thompson and [Azul Systems'] Gil Tene that has really propelled this change in attitude within the community.

Thompson: The main change over the past few years has been the continued refinement of lock-free and cache friendly algorithms. I often have fun getting involved in language shoot outs which just keep proving that the algorithms are way more important than the language to performance. Clean code that displays mechanical sympathy tends to give amazing performance, regardless of language.

Piper: Java VMs and hardware are constantly changing, so low latency development is always an arms race to stay in the sweet spot of target infrastructure. JVMs have also got more robust and dependable in their implementation of the Java memory model and concurrent data structures that rely on underlying hardware support, so that techniques such as lock-free/wait-free have moved into the mainstream. Hardware also is now on a development track of increasing concurrency based on increasing execution cores, so that techniques that take advantage of these changes and minimize disruption (e.g. by giving more weight to avoiding lock contention) are becoming essential to development activities. In Diffusion we have now got down to single-digit microsecond latency all on stock Intel hardware using stock JVMs.

Q10. InfoQ: Is Java suitable for other performance sensitive work? Would you use it in a High Frequency Trading system, for example, or is C++ still a better choice here?

Lawrey: For time to market, maintainability and support from teams of mixed ability, I believe Java is the best. The space for C or C++ between where you would use Java and FPGAs or GPUs is getting narrower all the time.

Montgomery: Java is definitely an option for most high performance work. For HFT, Java already has most everything needed. There is more room for work, though: more intrinsics is an obvious one. In other domains, Java can work well, I think. Just like low latency, I think it will take developers willing to try to make it happen, though.

Thompson: With sufficient time, I can make a C/C++/ASM program perform better than Java, but there is not that much in it these days. Java is often the much quicker delivery route. If Java had a good concurrent garbage collector, control of memory layout, unsigned types, and some more intrinsics for access to SIMD and concurrent primitives then I'd be a very happy bunny.

Piper: I see C++ as an optimization choice. Java is by far the preferred development environment from a time-to-market, reliability, higher-quality perspective, so I would always choose Java first and then switch to something else only if bottlenecks are identified that Java cannot address – it's the optimization mantra all over again.

About the Panelists

Peter Lawrey is a Java consultant interested in low latency and high throughput systems. He has worked for a number of hedge funds, trading firms and investment banks. Peter is 3rd for Java on StackOverflow, his technical blog gets 120K page views per month, and he is the lead developer for the OpenHFT project on GitHub. The OpenHFT project includes Chronicle, which supports up to 100 million persisted messages per second. Peter offers a free hourly session on different low latency topics, twice a month, to the Performance Java User's Group.

Todd L. Montgomery is Vice President of Architecture for the Messaging Business Unit of 29West, now part of Informatica. As the chief architect of Informatica’s Messaging Business Unit, Todd is responsible for the design and implementation of the Ultra Messaging product family, which has over 170 production deployments within the financial services sector. In the past, Todd has held architecture positions at TIBCO and Talarian, as well as research and lecture positions at West Virginia University, contributed to the IETF, and performed research for NASA in various software fields. With a deep background in messaging systems, reliable multicast, network security, congestion control, and software assurance, Todd brings a unique perspective tempered by 20 years of practical development experience.

Martin Thompson is a high performance and low latency specialist, with experience gained over two decades working with large scale transactional and big-data domains, including automotive, gaming, financial, mobile, and content management. He believes Mechanical Sympathy - applying an understanding of the hardware to the creation of software - is fundamental to delivering elegant, high performance, solutions. Martin was the co-founder and CTO of LMAX, until he left to specialise in helping other people achieve great performance with their software. The Disruptor concurrent programming framework is just one example of what his mechanical sympathy has created.

Dr Andy Piper recently joined the Push Technology team as Chief Technology Officer. Previously a Technical Director at Oracle Corporation, Andy has over 18 years' experience working at the forefront of the technology industry. In his role at Oracle, Andy led development for Oracle Complex Event Processing (OCEP) as well as driving global product strategy and innovation. Prior to Oracle, Andy was an architect for the WebLogic Server Core at BEA Systems, a provider of middleware infrastructure technologies.