If your Java Lambdas are running slowly, finding the culprit can be a nightmare! This article covers a set of tried-and-true tips you can use to make your code run faster or use less memory.

There are many optimization techniques that you can use and some will work better than others depending on the situation. In this article we will concentrate on optimizations that can improve your Java AWS Lambda functions.

Finding which parts of the function are slow

The most important thing when optimizing is to know where to optimize. This is something a profiler can help you with; it can tell you, in extremely granular detail, all of the potentially problematic regions of your code in terms of performance.

Note that profiler data is not always precise, and that imprecision can sometimes misrepresent the situation, so treat the results as a pointer toward the code that needs optimizing rather than an exact measurement. Still, a profiler is a readily available tool and can be invaluable when you look at the performance of your own code.

Of course, you are not required to use a profiler; you could simply measure how long execution takes by calling System.nanoTime() at the start and end of a block of code. That is naturally a much cruder metric, but it has far less overhead than a full-fledged profiler.

Doing this lets you concentrate on a specific portion of code rather than wading through data that covers your entire program. It is primitive but, like debug messages printed with System.err.printf() , it is quick to add and gives you only the information you specifically ask for. You can even combine it with profiling to get a better picture of what your code is doing; with printf() you may discover that a slowdown only happens under certain circumstances, since profilers generally record nothing about a method's input and therefore lack that context.
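As a minimal sketch, timing a block with System.nanoTime() can look like this (the sumSquares workload is purely illustrative; substitute the code you actually want to measure):

```java
public class TimingExample {
    // Hypothetical workload; replace with the block you want to measure.
    static long sumSquares(int n) {
        long total = 0;
        for (int i = 1; i <= n; i++) {
            total += (long) i * i;
        }
        return total;
    }

    public static void main(String[] args) {
        long start = System.nanoTime();
        long result = sumSquares(1_000_000);
        long elapsed = System.nanoTime() - start;
        // Report only what we asked for: the elapsed time of this one block.
        System.err.printf("sumSquares took %d us (result=%d)%n",
                elapsed / 1_000, result);
    }
}
```

Because the timing only wraps the one call, the output stays focused on the region you suspect, unlike a profile of the whole program.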

The First Place to Look: The Algorithm

No matter which optimization you plan on performing, the first factor to check is whether the algorithms you are using in your code are optimal. Improvements in the algorithm used will in many cases outweigh the other optimizations that you perform on your code, saving time and money.

For example, if you are sorting numbers or other entries supplied by a user, you will need to choose a sorting algorithm. An algorithm such as selection sort will be slow (O(n²)), while merge sort will perform better (O(n log n)). If you spend your time optimizing selection sort, in most cases even the most badly written merge sort will probably still run faster than your lean and fast selection sort.
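A small sketch of the contrast, pitting a hand-rolled selection sort against the JDK's own tuned Arrays.sort (the input data is made up):

```java
import java.util.Arrays;

public class SortExample {
    // O(n^2) selection sort -- even a carefully tuned version of this
    // loses to an O(n log n) algorithm once the input grows.
    static void selectionSort(int[] a) {
        for (int i = 0; i < a.length - 1; i++) {
            int min = i;
            for (int j = i + 1; j < a.length; j++) {
                if (a[j] < a[min]) min = j;
            }
            int tmp = a[i];
            a[i] = a[min];
            a[min] = tmp;
        }
    }

    public static void main(String[] args) {
        int[] a = {5, 3, 8, 1, 9, 2};
        int[] b = a.clone();
        selectionSort(a);
        Arrays.sort(b);   // the JDK's tuned O(n log n) sort for primitives
        System.out.println(Arrays.equals(a, b));   // prints true
    }
}
```

The practical takeaway is usually to reach for the library sort first; only hand-roll an algorithm when you have profiled and have a reason.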

Reducing the complexity of your classes

I discussed this solution at length in my previous article, Java Libraries are Your Lambda Enemy. When a class is initialized, the virtual machine parses it and loads all of its information; this is not, in itself, a complex process. The cost generally comes from the JIT compiler needing to compile all of the code so it runs faster in the virtual machine. A large number of even simple classes can slow this down because the virtual machine has far more classes to handle, especially when there are many class dependencies. So keep both the complexity of your classes and the number of classes down.

This feels like an obvious solution — fewer classes means less for the virtual machine to do. But it’s easy to accidentally add what seem like small libraries that turn out to pull in many dependencies. One thing to be particularly aware of is ServiceLoader — any classes discovered by the service loader are instantiated as they are iterated over, which means services you may never use can still get initialized. If you need the service loader to discover services, keep those service implementations as simple factory classes.
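A sketch of that factory approach, with hypothetical HandlerFactory and RequestHandler interfaces standing in for your own services (in a real project the providers would be registered under META-INF/services):

```java
import java.util.ServiceLoader;

// The service interface that implementations are discovered through.
interface HandlerFactory {
    String name();            // cheap metadata, safe to call during iteration
    RequestHandler create();  // heavy construction deferred until needed
}

interface RequestHandler {
    String handle(String input);
}

public class ServiceLoaderExample {
    public static void main(String[] args) {
        // Iterating instantiates every provider class, so each factory must
        // be trivial to construct; the expensive RequestHandler is only
        // built for the one provider we actually select.
        for (HandlerFactory factory : ServiceLoader.load(HandlerFactory.class)) {
            if (factory.name().equals("json")) {
                RequestHandler handler = factory.create();
                System.out.println(handler.handle("{}"));
            }
        }
    }
}
```

The key design choice is that the factory's constructor and name() do no real work, so merely discovering the providers stays cheap during a cold start.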

Initializing variables and classes when needed and using the cache

This varies, but the usual practice on a normal server is to initialize everything you can when the program starts. Since those processes are long-lived, the cost is paid once at startup and never really matters in the long run. In a Lambda, however, with the heavy contention during cold starts, the time spent initializing every single variable is costly: it cuts into your cold start, which may time out or simply be too slow to keep up with incoming requests.

So, to put less pressure on the cold start, opt to initialize data only when it is needed; that way any complex data is constructed lazily, the first time it is actually used.
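One common way to do this in Java is the initialization-on-demand holder idiom; here is a minimal sketch, with a hypothetical ExpensiveResource standing in for whatever is costly to build (an SDK client, a parsed config, and so on):

```java
public class LazyInitExample {
    // Counts how many times the expensive setup actually ran (for illustration).
    static int initCount = 0;

    // Hypothetical expensive resource.
    static class ExpensiveResource {
        ExpensiveResource() { initCount++; }
        String describe() { return "ready"; }
    }

    // Initialization-on-demand holder: the JVM only initializes the Holder
    // class (and thus constructs INSTANCE) the first time get() is called,
    // not during the Lambda's cold start.
    private static class Holder {
        static final ExpensiveResource INSTANCE = new ExpensiveResource();
    }

    static ExpensiveResource get() {
        return Holder.INSTANCE;
    }

    public static void main(String[] args) {
        System.out.println("before use: " + initCount);              // 0
        ExpensiveResource r = get();
        System.out.println(r.describe() + ", after use: " + initCount); // 1
    }
}
```

The idiom is also thread-safe for free, because the JVM guarantees class initialization happens exactly once.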

If your class creates other objects that are used on an as-needed basis, and they are not required at all times, you can use a cache to keep them in memory for a while. If you can recycle the cached objects you will save on the initialization time for that class.

A good way to implement such a cache is with SoftReference<T> . A soft reference keeps pointing to its object until the garbage collector clears it, which only happens once there are no strong references left (fields and variables) and memory is running low. If you need to know when the reference has been cleared you can use a ReferenceQueue<T> ; its poll() method returns immediately (possibly with nothing), while remove() blocks until a reference becomes available, so if you use a queue it is usually best drained from its own thread.
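A minimal sketch of such a cache (the byte[] buffer is a stand-in for whatever you actually cache; the drainer thread is only needed if you care about clearance notifications):

```java
import java.lang.ref.Reference;
import java.lang.ref.ReferenceQueue;
import java.lang.ref.SoftReference;

public class SoftCacheExample {
    // Cleared references are enqueued here by the garbage collector.
    static final ReferenceQueue<byte[]> queue = new ReferenceQueue<>();

    // Caches one expensive-to-build value; rebuilt if the GC reclaimed it.
    static SoftReference<byte[]> cached = new SoftReference<>(null);

    static byte[] getBuffer() {
        byte[] value = cached.get();
        if (value == null) {                        // cleared, or never built
            value = new byte[1024 * 1024];           // rebuild the cached data
            cached = new SoftReference<>(value, queue);
        }
        return value;
    }

    public static void main(String[] args) {
        // remove() blocks, so drain the queue on its own daemon thread.
        Thread drainer = new Thread(() -> {
            try {
                while (true) {
                    Reference<? extends byte[]> cleared = queue.remove();
                    System.err.println("cache entry reclaimed: " + cleared);
                }
            } catch (InterruptedException ignored) { }
        });
        drainer.setDaemon(true);
        drainer.start();

        byte[] a = getBuffer();
        byte[] b = getBuffer();   // same object while memory is plentiful
        System.out.println(a == b);   // prints true
    }
}
```

Because soft references are only cleared under memory pressure, repeated calls normally return the same object, and the rebuild path only runs after a reclaim.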

Avoid blocking

One of the worst things you can do in code that needs low latency is block. Blocking happens when you wait on another thread ( synchronized , Object.wait() , and Lock.lock() ) or when you wait for the result of a remote request (such as HTTP). Any time your program sits around waiting for something to happen is time wasted. Since a Lambda handles only a single invocation at a time, there is no benefit to leaving the CPU idle: your cost is the same regardless of how much or how little CPU you use (a Thread.sleep() with a long duration still costs you, despite nothing happening).

If your code allows it, one way to reduce the time spent blocking is to start a thread that works in the background, and only block when you actually need the result. You can have a worker object that stores the result of the calculation in an atomic field (such as an AtomicInteger or AtomicReference<T> ); read that value to see whether a result is available, and if not, lock on a monitor and wait for the result to be calculated, reading in a loop and ignoring InterruptedException .

Alternatively, a more performant implementation can use a ReentrantLock (or a ReadWriteLock ) together with a Condition obtained from the lock to do the same job, since lower-level locks, despite requiring more implementation work, can perform better than volatile and synchronized .
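A sketch of that background-worker pattern using ReentrantLock and Condition (the summing loop is a stand-in for real work):

```java
import java.util.concurrent.locks.Condition;
import java.util.concurrent.locks.ReentrantLock;

public class BackgroundWorkerExample {
    private final ReentrantLock lock = new ReentrantLock();
    private final Condition ready = lock.newCondition();
    private Long result;   // guarded by lock; null until the worker finishes

    // Kick off the computation in the background at construction time.
    BackgroundWorkerExample() {
        Thread worker = new Thread(() -> {
            long sum = 0;
            for (int i = 1; i <= 100_000; i++) sum += i;   // stand-in workload
            lock.lock();
            try {
                result = sum;
                ready.signalAll();      // wake any thread blocked in await()
            } finally {
                lock.unlock();
            }
        });
        worker.start();
    }

    // Block only at the point where the result is actually needed.
    long get() throws InterruptedException {
        lock.lock();
        try {
            while (result == null) {   // loop guards against spurious wakeups
                ready.await();
            }
            return result;
        } finally {
            lock.unlock();
        }
    }

    public static void main(String[] args) throws InterruptedException {
        BackgroundWorkerExample w = new BackgroundWorkerExample();
        System.out.println(w.get());   // prints 5000050000
    }
}
```

The caller pays for blocking only if the worker has not finished by the time get() is called; any work done between construction and get() overlaps with the calculation.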

Of course, if the thread must run to completion but no result is needed, it can simply wait on a lock and, once it obtains it, unlock and exit. If you instead want to avoid locks and just repeatedly read an atomic variable until some value is set, your busy loop should call Thread.yield() to tell the operating system that you are giving up the rest of your CPU slice to another thread. Yielding may or may not have an effect, depending on how many threads can run at once in the container the Lambda is running in; but if there are not enough resources to run every thread at once, it frees up CPU slices so that a thread which is actually doing work can proceed.
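A minimal sketch of that lock-free busy loop on an AtomicReference, with Thread.yield() in the spin (the worker's string-building is a stand-in for real work):

```java
import java.util.concurrent.atomic.AtomicReference;

public class SpinWaitExample {
    static String waitForResult() {
        AtomicReference<String> result = new AtomicReference<>();

        Thread worker = new Thread(() -> {
            StringBuilder sb = new StringBuilder();   // stand-in for real work
            for (int i = 0; i < 5; i++) sb.append(i);
            result.set(sb.toString());                // publish the result
        });
        worker.start();

        // Busy-wait, but yield the rest of our CPU slice so other runnable
        // threads (including the worker) can make progress.
        while (result.get() == null) {
            Thread.yield();
        }
        return result.get();
    }

    public static void main(String[] args) {
        System.out.println(waitForResult());   // prints 01234
    }
}
```

Spinning like this only makes sense for very short waits; for anything longer, the lock-and-condition approach above the spin wastes far less CPU.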

***

Hopefully you find this information useful and can use it to optimize your Lambdas so that they perform better, lowering your costs and letting you serve requests faster.

If you’d like to learn more about IOpipe, try our 21-day free trial! You can also chat with us in the IOpipe community Slack.