1. Trisha, could you introduce yourself to the InfoQ community? I’m Trisha Gee; I’m Senior Developer at LMAX. I've been working in financial markets for about five or six years, something like that, and I’ve been developing in Java for over 10 years now; been actually programming since I was about 10, but I probably shouldn't really go into too much detail of my whole life story.

2. What were you programming in when you were ten? Trisha: Basic



Charles: On which machine?



Trisha: BBC; on the BBC Micro.



Charles: Ah, the same generation as me. I was a Commodore 64 boy.



Trisha: So that’s where I started; doing a bit of that stuff; doing the goto line 10 and print out "Hello World", and all that stuff. But got into Java at university and that has taken me here; it’s amazing.





3. So is Java the main language you use at LMAX? Pretty much; we like to call ourselves a pure Java shop. We don’t use that many funky libraries or anything, because what we’re really interested in is speed and performance. So if we’re using any third-party libraries, it needs to be performant, it needs to not have any bugs in it; you’d be amazed how many weird concurrency bugs you can find in other people’s libraries. So we use Java; we use a number of libraries around Java like Spring and Guice and various other bits and pieces, but mostly it’s homegrown Java stuff. We do have a .NET client, so we are occasionally forced to program in C#. But we also have a UI, Web UI, so we are doing a bit of HTML, a bit of JavaScript, but also GWT; so again Java, but for the front end.

4. I guess some people are still surprised at Java being used in a low latency-type environment. Is it really that suitable? Yes. I mean, we are very passionate about making sure that we do all our tests. So our tests prove that our latency figures are more than acceptable. We want our latency to be below 10 milliseconds and it’s down to 1.2 milliseconds; and we know we can get it lower than that. This is end-to-end latency; this isn’t just for one particular small part of our system. So we've proved that it does perform the way we want it to.



In addition, we’ve done some looking at other things we could do; so for example, early prototypes could have been done in C++. But the amount of development effort put into C++ in order to get it fast is much higher; and we’re also a very Agile shop, so getting code out quickly, and to the right standard of quality, is really, really important to us, and Java gives us that as well as being able to be high performance.

5. Since you mentioned that you’re an Agile shop, which kind of standard techniques do you use? Well, my boss literally wrote the book on continuous delivery, so we do have a build pipeline. We’re using Jenkins for our continuous integration, but we don’t just do continuous integration from just unit tests and build; we have a whole build pipeline, where we do automated acceptance tests using Selenium, using drivers on our APIs, and drivers on our FIX clients. So, from that point of view, we are very quick from end of iteration build into production. So that’s one part of our 'Agility', if you like.



We also kind of run this - I call it a blend between Scrum and XP, but people don’t really like those sorts of labels. So, we run two-week iterations, so we are iterative, we do pair programming, we do test-driven development; and by test-driven, not just unit tests. These automated acceptance tests are also end-to-end; so you write the UI and you expect it to come all the way through; so we write those tests first too. We do, you know, the normal stuff you get with iteration heartbeat - retrospectives; and we have business owners who are quite closely tied in with each team, so that we know that what we’re delivering is exactly what the business wants.

6. And domain-driven design as well? Yes; so we’re pretty big on domain-driven design. Well, we’re doing lots of work to try and be as domain-driven as we can be, but it’s very easy when you are on the ground level, doing your code, to just get caught up in your lines of Java code, rather than thinking about the model. But it’s been quite interesting today at QCon London, where a lot of people have mentioned modeling your domain; and, by modeling your domain, not only do you get a better design, you get a simpler design, and you stop worrying about some of the technological things. Is there a technology which will solve this problem for us? Well, no; but if you model your domain correctly, and split it up correctly, then you don’t have to worry about a technology that’s going to solve these problems.



And that’s, kind of, one of the things that we’ve done with the Disruptor, is that it allows us to separate our design, so that different areas are taken care of. For a start, infrastructure; we don’t have to worry about the infrastructure, like messaging, message passing, when we’re doing our business logic.



When we’re doing our business logic, we only care about the particular business domain that we’re working on right now, bounded context, that sort of thing.



So, we do try and keep domain-driven design in mind when we’re designing our systems. We do lots of hand waving, and white boarding, and drawing out designs, and stuff like that.

7. Since you mentioned the Disruptor, and I know you've obviously been talking about that today as well, could you give us a quick overview of it, for people who aren't familiar with it yet? I’m yet to come up with a one-liner. It’s a very fast kind of event messaging; well, it’s a message passing framework; so it allows you to share data between threads. And we have at the heart of the Disruptor a kind of magic ring buffer, where we put messages into the ring buffer, or events into the ring buffer; and then you have a series of processes which can work on the events in the ring buffer.



And what we’ve done with the Disruptor is, we’ve come up with quite an elegant way to manage the dependencies between all the different processes which are going to work on each event as it comes in. So, for example, in the LMAX world, an event has to be taken off the wire, you know, off the message bus; is put into the ring buffer; then we need to replicate it off to a secondary server, or to DR, and we have to journal it to disc; and then, when those things are done, the next step in the pipeline is to do the business logic. And the Disruptor allows us to manage those dependencies, without contention, which gives us very fast execution of the messages through the system.
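[Editor's note: as a rough illustration of the dependency management described above, here is a toy sketch - this is not the real Disruptor API, and names like `journalSeq` are invented for the example - showing how a downstream consumer can gate on the sequence counters published by upstream consumers, rather than taking locks.]

```java
import java.util.concurrent.atomic.AtomicLong;

// Toy sketch of sequence-based gating: the business-logic handler may only
// process slot n once BOTH the journaller and the replicator have published
// a sequence >= n. No locks are involved; each counter has a single writer.
public class SequenceGating {
    static final int SIZE = 8;                 // ring size, a power of two
    static final long[] ring = new long[SIZE]; // slots hold event payloads

    static final AtomicLong journalSeq = new AtomicLong(-1); // journaller's progress
    static final AtomicLong replicaSeq = new AtomicLong(-1); // replicator's progress

    // Highest slot the business logic is allowed to read so far.
    static long gatingSequence() {
        return Math.min(journalSeq.get(), replicaSeq.get());
    }

    public static void main(String[] args) {
        ring[0] = 42;            // publisher writes slot 0
        journalSeq.set(0);       // journaller has finished slot 0
        System.out.println(gatingSequence()); // -1: replicator not done yet
        replicaSeq.set(0);       // replicator has finished slot 0
        System.out.println(gatingSequence()); // 0: business logic may proceed
        System.out.println(ring[(int) (gatingSequence() % SIZE)]); // 42
    }
}
```

The real library generalises this idea to arbitrary dependency graphs of event handlers.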

8. So how do you avoid contention when you’re reading and writing to the buffer slots in the ring buffer? The publisher owns the writing into the ring buffer, in terms of writing the events, or writing the raw values which are going to go into your bucket in the ring buffer. Each event processor, the way that we run it, generally just reads the events, and we might send them elsewhere.



So for example, for replication, we just send them elsewhere, and for journaling, all we do is write them. So we don’t do any writing, so there’s no contention there.



If you are doing some writing, you need to make sure that each field on the event is only ever written by one thread. So, you can’t have multiple event processors writing to the same piece of data. They can be writing into the same object, but they can’t be writing to the same field; otherwise you’re going to get contention.



So we have to be quite disciplined with the Disruptor; if you do want to write back into the event, you can, but you can’t have two threads writing to the same thing. But when you do that, you don’t have to lock your event; you don’t have to worry about synchronization; you don’t have to worry about, you know, is this thing going to get written first? Or, is this going to get written first, is something going to overwrite me?



If you know that only one thing is ever going to write to that one field, then you know that it’s safe when you’re reading it.
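[Editor's note: the single-writer rule above can be sketched in plain Java. The event type and field names here are invented for illustration; the point is that two processors annotate the same event object, but each field is owned by exactly one writer, so no locks are needed.]

```java
// Two processors write into the SAME event object, but each owns its own
// field, so there is no write-write contention and no synchronization.
public class SingleWriterDemo {
    static class TradeEvent {
        volatile long journalOffset;   // written ONLY by the journaller
        volatile boolean replicated;   // written ONLY by the replicator
    }

    static TradeEvent process() throws InterruptedException {
        TradeEvent event = new TradeEvent();
        Thread journaller = new Thread(() -> event.journalOffset = 1024L);
        Thread replicator = new Thread(() -> event.replicated = true);
        journaller.start();
        replicator.start();
        journaller.join();
        replicator.join();
        // Each field had exactly one writer, so both values are well defined
        // without any synchronized blocks or locks.
        return event;
    }

    public static void main(String[] args) throws InterruptedException {
        TradeEvent event = process();
        System.out.println(event.journalOffset + " " + event.replicated);
        // prints "1024 true"
    }
}
```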

9. So how does it compare to something like a message queueing system? We've measured it against queues; so just the standard Java ArrayBlockingQueue, and it's substantially faster than just using a simple queue. So I think if you ended up putting any sort of enterprise architecture around that queue, then it’s going to be even slower. The performance figures are on the website, but we’re talking orders of magnitude faster than using a queue. Queues are great if you’re doing one-to-one; but because you can have these interesting dependency graphs - like I was saying, you've got replication, and journaling, and then your business logic - you've got three things that are reading from the ring buffer. You can’t really do that very easily with a queue, because even if you’re reading from a queue, you've got contention: consuming from a queue means you’re actually writing to the queue. So, it’s much, much faster than queues.
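[Editor's note: a small sketch of the point above - taking from a queue is itself a write, and each element is delivered to only one consumer, so you cannot fan the same message out to a journaller, a replicator and the business logic the way readers of a ring buffer each see every event.]

```java
import java.util.concurrent.ArrayBlockingQueue;

// With a queue, two consumers SPLIT the stream rather than each seeing it
// all: take() removes the element, mutating the queue's shared state.
public class QueueFanOutProblem {
    public static void main(String[] args) throws InterruptedException {
        ArrayBlockingQueue<Integer> queue = new ArrayBlockingQueue<>(4);
        queue.put(1);
        queue.put(2);

        int seenByConsumerA = queue.take(); // removes the element...
        int seenByConsumerB = queue.take(); // ...so B gets a DIFFERENT one

        System.out.println(seenByConsumerA); // 1
        System.out.println(seenByConsumerB); // 2 - the handoff mutated the queue
    }
}
```

To fan out with queues you would need one queue per consumer, paying the handoff cost on each.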

10. Was it your first attempt at this? Did you try other techniques first? We took a scientific approach; so we had a number of hypotheses of the sorts of things which might be quick for us. So, for example, we were trying to solve a number of different problems, but the problem we were trying to solve, initially, was actually recoverability and reliability, more than speed.



So what we did is, we tried a SEDA architecture, staged event-driven architecture, and that gives you that nice linear pipeline that we thought we wanted, where firstly, you replicate it to your secondary; secondly, you journal it to disc - or whichever order you want to do those in - and then thirdly, you do your business logic. That seems to be very logical.



And that’s kind of the initial thing we did, but, between each of those stages, it’s fairly common to use a queue. And when we did our actual measuring to find out what the speed was like, we found that each of those stages was taking no time at all; it was actually the queues in between the stages which were absorbing most of the latency cost.



So we first started looking at how to do faster queues, and we came up with some faster queue implementations, and that was fine, but we thought, "Well, if we’re doing all these stages of stuff, in actual fact, we don’t need to do them serially like this; if we do them in parallel, not only do we get an increase in speed, because we can run two of these things in parallel, but also, if we're running them off the same data structure, we no longer have this latency of the queue in between each of those stages." So that’s one thing we tried.



Another thing we tried is, we looked at some grid computing stuff, and distributed stuff; but one of the things we've found when we've measured some of these problems is: if you have a problem, and you fan it out to a lot of things to try and solve it in pieces, the cost is not in that bit; that’s done really quickly. The cost is when you gather all the bits and pieces and try and smoosh them back together again; and that’s quite expensive. So things like distributed computing and grid computing stuff, for some of the things we were trying to do, when really we're talking about just kind of sums, didn't necessarily give us the speed increase that we wanted.

11. Do you have any kind of dynamic feed back mechanism; so if you've got several processes doing the same thing, do you have some way of increasing the number of those processes that you have, if you find you've got a bottleneck somewhere? We can, yes. So, mostly we measure the performance of the whole system, and like I said, we’ve got this SLA to be 10 milliseconds and we’re down to like 1.2 milliseconds; so we’re way off like butting up against that.



So firstly, we do loads of measurements to see where we are. So if things start getting a little bit tricky, then we don’t necessarily, dynamically, do anything with it at that point in time; we know we're several iterations off hitting a problem. So we’ll do whatever needs to be done to try and, you know, tackle garbage collection in that service; or to maybe do some profiling and see what’s causing the problems.



We have designed the system so that we can shard it, if necessary, as well. So, for example, our matching engine has been designed so that the most important thing for the matching engine is the instrument you’re placing the orders on, and everything's keyed by instrument.



So if we need the matching engine to run twice as fast - well, maybe not twice as fast, but faster - we can split it into two, so half the instruments go on one and half of the instruments go on the other, and everything gets routed into the two different matching engines.



Similarly with the risk engine; everything is done by accounts, so we can shard it based on the account that the user is on. So we've kind of designed it with that scalability in mind; and the nice thing about that, because it’s modular and because everything is done at service level, is: if I want to shard the matching engine on instrument, it doesn’t impact any of the rest of the system. I don’t have to think, "Oh my goodness, how is the risk engine going to cope with that?", because it doesn’t care about instruments; it does everything by account. It just sends something out on the messages which says, "This is my instrument," and then the correct matching engine will pick it up.
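[Editor's note: a minimal sketch of the sharding scheme described above. The class and key names are invented; the idea is simply that each service picks its own shard key - the matching engine by instrument, the risk engine by account - independently of the others.]

```java
// Deterministic routing of a shard key to one of N engine instances.
public class ShardRouter {
    // Maps a key to a shard index in [0, shards).
    static int shardFor(String key, int shards) {
        return Math.floorMod(key.hashCode(), shards);
    }

    public static void main(String[] args) {
        int matchingShards = 2;
        // All orders for one instrument always land on the same matching engine...
        System.out.println(shardFor("EUR/USD", matchingShards)
                == shardFor("EUR/USD", matchingShards)); // true
        // ...while the risk engine routes the same order by account instead,
        // so the two services can be sharded without affecting each other.
        int riskShards = 2;
        System.out.println(shardFor("account-42", riskShards) < riskShards); // true
    }
}
```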

12. Why did you release it as open source? There are a number of different reasons; but mostly because it’s given us a bunch of different benefits, but maybe I’ll come on to that later. Initially, we open sourced it because we thought we’d done something very interesting. We haven’t done anything unique; we've pulled together a lot of different ideas that have been around for a while and this is, again, a little bit around our scientific method; so, we do a little bit of research, try some stuff out, and see how it goes.



So we thought we’d done something which was a bit unique, and we thought we were kind of flying in the face of what people were teaching. People were sort of saying, "We’ll use some framework which will magically do your MapReduce; or will magically do some multi-threaded stuff."



And what we were finding is that’s not necessarily great. What you might want to do is take absolute control over what you want; over what you want to parallelize. And we wanted to sort of say, "We want you to think about your domain and model it." So we open sourced it as much for an education exercise as anything else, really. It’s kind of to say, "Look; you know, I’m not sure that we’re doing the right things as an industry; I think this is a kind of cooler way to do it". So that was one of the things that we wanted to do.



The other thing is that we are quite a young company, so it’s useful to get our name out there and say, "Look, we want to be the fastest retail exchange in the world". And that’s our goal; we've got no lesser goal than that.



And in order to do that, one of the things we need to do is say, "We are fast. This is why we’re fast. We’re fast because we've got some of ..." and this is just the tip of the iceberg; we've got so much more, obviously, going on underneath the covers. And if we can open source this and have the experts, who've had some real hardcore Java and concurrency experts pull it apart and say, "No, that’s pretty good actually; it’s pretty quick," then that’s great for us; it kind of does say we do know what we’re doing and we are doing some stuff which is very quick and very interesting.

13. So how many developers do you have on your team at LMAX? I’m going to have to remember now. I think we’re up to about 18; is that right? Fifteen or eighteen. We are growing quite rapidly, because we’ve been live for just over a year, and one of the questions I'm most commonly asked is, "Now you’ve gone live, surely all the work’s done, and you’ve done the Disruptor, and there’s nothing else interesting to do?" But of course, everyone knows it doesn’t work that way, right? Once you've got a product out there, users are going, "Can I have this?" and other people are coming along and going, "Oh, wouldn't it be great if it did that?" And we’re going, "Oh, we didn’t really like the way we did that".



So we’ve got even more work now we've got real feedback coming in from the users. So we’re aiming to probably nearly double the size of the team from where it was in the middle of last year; and we’d like to do that as quickly as possible, without impacting our actual velocity. That’s a challenge in its own right; trying to onboard people; obviously we are going to take a certain hit in the speed at which we can deliver stuff.



But trying to onboard people, get them up to speed, whilst still delivering value, is probably one of our big challenges at the moment.

15. Does that male dominance affect the team dynamic, do you think? I can’t really comment, because it’s been the same everywhere I've worked; so I’ve not really worked in a team where ... Oh no, that’s not true; I’ve worked in one team where there were two girl developers and one guy developer; and I don’t think that made any difference, because we were three geeks; we weren't really two girls and a guy. And the same, like, in my place, whatever there is, 15 of us or whatever, the main thing I’m accused of being is a big fat geek; not 'being a girl', right? "My God, you're just so geeky." "Well, yes, of course, I am."



So I don’t think it really impacts the team dynamic; however, there is a piece of software that we wrote which is called Auto Trish, which is based on me; which is based on my nagging ability; which I’m not sure whether I’m pleased or upset about that frankly. It's nice to be automated, because I don’t have to do it anymore; but I don’t think this has anything to do with the fact that I’m a nagging wife; I think it's got everything to do with the fact that I’m really OCD.



So this is my geeky thing: we have our interesting acceptance test suite, and because it’s testing end-to-end, there is a certain amount of intermittency in the passes, because some things time out, especially as the system gets overloaded.



So in the early days, it was quite difficult to tell whether tests were failing because they were really broken or just because they’re quite intermittent; and if they are quite intermittent, which are the most intermittent ones? So my self-appointed task was to go through the test failures every morning, look at the last three or four releases, and see if it’s intermittent or if it’s genuinely failing. And I used to send out an email saying, "Well, you know, these are definitely failing; it looks like it’s probably these revisions. These are intermittently failing, but they look like they’re getting worse," or whatever, and it was quite useful; people got used to it. So people developed a whole tool around that analysis of the last 10 builds; and it’s called Auto Trish; it does say, "A gentle nag" underneath it.
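[Editor's note: the core of that analysis can be sketched in a few lines. This is a toy version - the real Auto Trish tool is internal to LMAX, and all names here are invented - classifying each test from its pass/fail history over the last few builds.]

```java
import java.util.List;
import java.util.Map;

// Given pass/fail history across recent builds, classify each test as
// passing, intermittent (mixed results), or consistently failing.
public class BuildAnalyser {
    static String classify(List<Boolean> lastRuns) { // true = pass
        boolean anyPass = lastRuns.contains(true);
        boolean anyFail = lastRuns.contains(false);
        if (anyPass && anyFail) return "intermittent";
        return anyFail ? "failing" : "passing";
    }

    public static void main(String[] args) {
        Map<String, List<Boolean>> history = Map.of(
                "placeOrderTest", List.of(true, true, true),
                "timeoutTest", List.of(true, false, true),
                "settlementTest", List.of(false, false, false));
        history.forEach((test, runs) ->
                System.out.println(test + ": " + classify(runs)));
    }
}
```

A real tool would also track whether the intermittent failures are trending worse across releases, as described above.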



I don’t think it’s got anything to do with the fact that I’m a girl; I think it’s got everything to do with my OCD.

16. So there’s a lot of concern in the industry about the lack of women in, specifically I guess, programming and architecture-type roles. Do you think that’s a fair characterisation and, if so, do you think it's also a problem? It’s definitely true; there are not enough women in the industry. The industry does not have the even split of males to females that you would find when you’re walking down the street; so that’s definitely a problem, if you think it should be 50-50.



Is it a problem? I think so; I think diversity is important. I’m equally worried about the fact that I mostly see white faces; that I’m not sure that diversity of sexuality is fairly represented in IT - difficult to tell - but I’m more concerned about the lack of diversity. The lack of women, and the lack of people from various ethnic minorities, I don't know if we're even allowed to say that any more, but from various races or whatever, is more visible. And I think the fact that we are perceived as being white, middle class and male, I think that is a problem; and it’s difficult to sell to the world that we can solve your problems when we’re only a small subset of the world.



I mean, Martin Fowler talked this morning about some interesting things coming up in Africa and some of the interesting ways they're using Big Data, and some of the tools that they’re using, and that’s fascinating.



How do white, middle class men in London know what some African kid needs? I’m not sure if they do; maybe they do, maybe they don’t. And so I think diversity is really, really important.

18. So how do you address that? You have to start really young, which is difficult because employers are saying the right things and I think employers really mean this; they really do want diverse teams; they really do want women, ethnic minorities, people from poor and rich backgrounds - real diversity in their teams, because there are plenty of studies which show that diverse teams, generally speaking, produce better quality software.



So employers are saying, "Find me the girls; find me the black guys; find me the gays, whatever," but they just aren't out there, as you say. You’re coming out of university, and if only 10% of graduates are women, you can’t just offer those 10% of women jobs just because they’re women. I mean, I really hate the idea of that, because I hate the idea that people might think I got here because I’m a woman. I worked really hard to get here, and I really enjoy what I do, and I’m a real geek, and I don’t want people taking all the women and saying, "You're going to have jobs as geeks, because we need more women". You need to start way back at, like, 13 to 14, and say to girls, "You know, IT is cool; programming is cool; and this is why it’s cool."



And if they are being turned off at that age, they’re not necessarily wrong for being turned off at that age; because we’re doing something wrong if they think IT is not cool; and maybe it’s because we’re not cool; maybe it’s because we’re not doing the right stuff; or maybe it’s because they don’t really know what we do.



You know, you watch 'The IT Crowd', or whatever, and you're thinking, "System support for a big company;" you think, "God, I don’t want to do that." But I don’t want to do that either. We want to do mobile development; we want to invent Facebook; we want to invent Twitter; we want to go and talk at conferences, and that sort of thing. And there is a certain education aspect of reaching kids before they take their GCSEs, or whatever the equivalent is elsewhere, and saying, "Look," not just to the girls, but to the people who are not coming into the pipeline, "this is the kind of stuff; this is what IT means - iPhone applications; it means being on the internet eight hours a day," and whatever; whatever is cool.

19. Do you think things like women-only events are a good idea? Oh, I hate women-only events. I think I might be wrong for saying it; no, I think I might be shot down by other people for saying that, but I don’t know, I’m told that at 15 through 18 that girls are more comfortable learning in an all-girl environment and maybe they don’t want to be surrounded by competitive males, who are telling them that the guys are much better at IT than the girls are; and maybe the girls do want to be in a more comfortable position.