What’s better? Using the JDK’s String.replace() or something like Apache Commons Lang’s StringUtils.replace()?

In this article, I’ll compare the two, first in a profiling session using Java Mission Control (JMC), then in a benchmark using JMH, and we’ll see that Java 9 heavily improved things in this area.

Profiling using JMC

In a recent profiling session where I checked for any “obvious” bottlenecks in jOOQ, I discovered this nasty regular expression pattern instantiation:

Tons of int[] instances were allocated by a regular expression pattern. That’s weird, because in general, inside of jOOQ’s internals, special care is always taken to pre-compile any regular expressions that are needed in static members, e.g.:

private static final Pattern TYPE_NAME_PATTERN = Pattern.compile("\\([^\\)]*\\)");

This allows for using the Pattern far more efficiently than, e.g., by using String.replaceAll():

// Much better, pattern is pre-compiled
TYPE_NAME_PATTERN.matcher(castTypeName).replaceAll("")

// Much worse, pattern is compiled *every time*
castTypeName.replaceAll("\\([^\\)]*\\)", "")

That should be clear to everyone. The price to pay for this is the fact that the pattern is stored “far away” in some static member, rather than being visible right where it is used, which is a bit less readable. At least in my opinion.
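To make the comparison concrete, here is a minimal, self-contained sketch of the two variants side by side. The class and method names are made up for illustration and are not part of jOOQ; only the pattern itself is taken from the snippet above. Both methods return the same result, but only the second one compiles the pattern on every call.

```java
import java.util.regex.Pattern;

// Hypothetical demo class, not jOOQ code
class PrecompiledPatternDemo {

    // Compiled once, when the class is initialised
    private static final Pattern TYPE_NAME_PATTERN = Pattern.compile("\\([^\\)]*\\)");

    // Reuses the pre-compiled pattern on every call
    static String stripWithStaticPattern(String castTypeName) {
        return TYPE_NAME_PATTERN.matcher(castTypeName).replaceAll("");
    }

    // Compiles the same pattern again on every call
    static String stripWithReplaceAll(String castTypeName) {
        return castTypeName.replaceAll("\\([^\\)]*\\)", "");
    }

    public static void main(String[] args) {
        // Both calls remove the parenthesised part of the type name
        System.out.println(stripWithStaticPattern("varchar(255)"));
        System.out.println(stripWithReplaceAll("decimal(10, 2)"));
    }
}
```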

SIDENOTE: People tend to get all angry about premature optimisation and such. Yes, these optimisations are micro optimisations and aren’t always worth the trouble. But this article is about jOOQ, a library that does a lot of expression tree transformations, and it is important for jOOQ to eliminate even 1% “bottlenecks”, as they make a difference. So, please read this article in this context. Consider also our previous post about this subject: Top 10 Easy Performance Optimisations in Java

What was the problem in jOOQ?

Now, what appears to be obvious when using regular expressions seems less obvious when using ordinary, constant string replacements, such as when calling String.replace(CharSequence), as was done in the linked jOOQ issue #6672. The relevant piece of code was escaping all inline strings that are sent to the SQL database, to prevent syntax errors and, of course, SQL injection:

static final String escape(Object val, Context<?> context) {
    String result = val.toString();

    if (needsBackslashEscaping(context.configuration()))
        result = result.replace("\\", "\\\\");

    return result.replace("'", "''");
}

We’re always escaping apostrophes by doubling them, and in some databases (e.g. MySQL), we often have to escape backslashes as well (unfortunately, not all ORMs seem to do this or even be aware of this MySQL “feature”).
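As a quick illustration of these escaping rules, here is a simplified, standalone variant of the method. It is only a sketch: the boolean parameter is an assumption that stands in for the needsBackslashEscaping(context.configuration()) check, and the class name is invented.

```java
// Hypothetical stand-in for the escape() method shown above, not jOOQ code
class EscapeDemo {

    // backslashEscaping replaces the needsBackslashEscaping(...) check (assumption)
    static String escape(Object val, boolean backslashEscaping) {
        String result = val.toString();

        // Backslashes are doubled first, and only for dialects that need it
        if (backslashEscaping)
            result = result.replace("\\", "\\\\");

        // Apostrophes are always doubled
        return result.replace("'", "''");
    }

    public static void main(String[] args) {
        System.out.println(escape("O'Brien", false));
        System.out.println(escape("C:\\temp", true));
    }
}
```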

Unfortunately as well, despite heavy use of Apache Commons Lang’s StringUtils.replace() in jOOQ’s internals, every now and then a String.replace(CharSequence) sneaks in, because it’s just so convenient to write.

Meh, does it matter?

Usually, in ordinary business logic, it shouldn’t (again: don’t optimise prematurely). But jOOQ is essentially a SQL string manipulation library, so it can get quite costly if a single replace call is made very frequently (for good reasons, of course) and is slower than it should be. And it is slower prior to Java 9, where this method was optimised. I did the profiling with Java 8, where String.replace() internally uses a literal regex pattern (i.e. a pattern compiled with the “literal” flag, which is faster than an ordinary pattern, but is a pattern nonetheless).

Not only does the method appear as a major offender in the GC allocation view, it also triggers quite some action in the “hot methods” view of JMC:

Those are quite a few Pattern methods. The percentages have to be understood in the context of a benchmark, running millions of queries against an H2 in-memory database, so the overhead is significant!
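For reference, this is roughly what happens behind a String.replace() call on Java 8. The sketch below is written from memory of the OpenJDK 8 sources rather than copied from them, and the class and method names are made up. It shows why Pattern and Matcher turn up in the profile even though no regular expression appears in the calling code.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical demo class approximating Java 8's String.replace()
class LiteralPatternReplace {

    static String replaceLikeJava8(String input, CharSequence target, CharSequence replacement) {
        // A new Pattern and a new Matcher are created on every single call
        return Pattern.compile(target.toString(), Pattern.LITERAL)
                      .matcher(input)
                      .replaceAll(Matcher.quoteReplacement(replacement.toString()));
    }

    public static void main(String[] args) {
        System.out.println(replaceLikeJava8("a'bc", "'", "''"));
    }
}
```

Pattern.LITERAL makes the target match character by character, and Matcher.quoteReplacement() keeps characters such as $ in the replacement from being read as group references.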

Using Apache Commons Lang’s StringUtils

A simple fix is to use Apache Commons Lang’s StringUtils instead:

static final String escape(Object val, Context<?> context) {
    String result = val.toString();

    if (needsBackslashEscaping(context.configuration()))
        result = StringUtils.replace(result, "\\", "\\\\");

    return StringUtils.replace(result, "'", "''");
}

Now, the pressure has changed significantly. The int[] allocation is barely noticeable in comparison:

And far fewer Pattern calls are made, overall.

Benchmarking using JMH

Profiling can be very useful to spot bottlenecks, but it needs to be read with care. It introduces some artefacts and slight overheads, and it is not 100% accurate when sampling call stacks, which might lead to the wrong conclusions at times. This is why it is sometimes important to back claims by running an actual benchmark. And when benchmarking, please, don’t just loop 1 million times in a main() method. That will be very inaccurate, except for very obvious, order-of-magnitude scale differences.

I’m using JMH here, running the following simple benchmark:

package org.jooq.test.benchmark;

import org.apache.commons.lang3.StringUtils;
import org.openjdk.jmh.annotations.Benchmark;
import org.openjdk.jmh.annotations.Fork;
import org.openjdk.jmh.annotations.Measurement;
import org.openjdk.jmh.annotations.Warmup;
import org.openjdk.jmh.infra.Blackhole;

@Fork(value = 3, jvmArgsAppend = "-Djmh.stack.lines=3")
@Warmup(iterations = 5)
@Measurement(iterations = 7)
public class StringReplaceBenchmark {

    private static final String SHORT_STRING_NO_MATCH = "abc";
    private static final String SHORT_STRING_ONE_MATCH = "a'bc";
    private static final String SHORT_STRING_SEVERAL_MATCHES = "'a'b'c'";

    private static final String LONG_STRING_NO_MATCH = "abcabcabcabcabcabcabcabcabcabcabcabcabcabcabcabcabcabcabcabcabcabcabcabc";
    private static final String LONG_STRING_ONE_MATCH = "abcabcabcabcabcabcabcabcabcabcabca'bcabcabcabcabcabcabcabcabcabcabcabcabc";
    private static final String LONG_STRING_SEVERAL_MATCHES = "abcabca'bcabcabcabcabcabc'abcabcabca'bcabcabcabcabcabca'bcabcabcabcabcabcabc";

    // JDK String.replace()

    @Benchmark
    public void testStringReplaceShortStringNoMatch(Blackhole blackhole) {
        blackhole.consume(SHORT_STRING_NO_MATCH.replace("'", "''"));
    }

    @Benchmark
    public void testStringReplaceLongStringNoMatch(Blackhole blackhole) {
        blackhole.consume(LONG_STRING_NO_MATCH.replace("'", "''"));
    }

    @Benchmark
    public void testStringReplaceShortStringOneMatch(Blackhole blackhole) {
        blackhole.consume(SHORT_STRING_ONE_MATCH.replace("'", "''"));
    }

    @Benchmark
    public void testStringReplaceLongStringOneMatch(Blackhole blackhole) {
        blackhole.consume(LONG_STRING_ONE_MATCH.replace("'", "''"));
    }

    @Benchmark
    public void testStringReplaceShortStringSeveralMatches(Blackhole blackhole) {
        blackhole.consume(SHORT_STRING_SEVERAL_MATCHES.replace("'", "''"));
    }

    @Benchmark
    public void testStringReplaceLongStringSeveralMatches(Blackhole blackhole) {
        blackhole.consume(LONG_STRING_SEVERAL_MATCHES.replace("'", "''"));
    }

    // Apache Commons Lang StringUtils.replace()

    @Benchmark
    public void testStringUtilsReplaceShortStringNoMatch(Blackhole blackhole) {
        blackhole.consume(StringUtils.replace(SHORT_STRING_NO_MATCH, "'", "''"));
    }

    @Benchmark
    public void testStringUtilsReplaceLongStringNoMatch(Blackhole blackhole) {
        blackhole.consume(StringUtils.replace(LONG_STRING_NO_MATCH, "'", "''"));
    }

    @Benchmark
    public void testStringUtilsReplaceShortStringOneMatch(Blackhole blackhole) {
        blackhole.consume(StringUtils.replace(SHORT_STRING_ONE_MATCH, "'", "''"));
    }

    @Benchmark
    public void testStringUtilsReplaceLongStringOneMatch(Blackhole blackhole) {
        blackhole.consume(StringUtils.replace(LONG_STRING_ONE_MATCH, "'", "''"));
    }

    @Benchmark
    public void testStringUtilsReplaceShortStringSeveralMatches(Blackhole blackhole) {
        blackhole.consume(StringUtils.replace(SHORT_STRING_SEVERAL_MATCHES, "'", "''"));
    }

    @Benchmark
    public void testStringUtilsReplaceLongStringSeveralMatches(Blackhole blackhole) {
        blackhole.consume(StringUtils.replace(LONG_STRING_SEVERAL_MATCHES, "'", "''"));
    }
}

Notice that I tried to run 2 x 3 different string replacement scenarios:

The string is “short”

The string is “long”

Cross joining (there, finally some SQL in this post!) the above with:

No match is found

One match is found

Several matches are found

That’s important because different optimisations can be implemented for those different cases, and in jOOQ’s case, most strings probably contain no match at all.

I ran this benchmark once on Java 8:

$ java -version
java version "1.8.0_141"
Java(TM) SE Runtime Environment (build 1.8.0_141-b15)
Java HotSpot(TM) 64-Bit Server VM (build 25.141-b15, mixed mode)

And on Java 9:

$ java -version
java version "9"
Java(TM) SE Runtime Environment (build 9+181)
Java HotSpot(TM) 64-Bit Server VM (build 9+181, mixed mode)

Tagir Valeev was kind enough to remind me that this issue was supposed to be fixed in Java 9:

They fixed this in Java 9. Just update, don't use outdated versions of Java! — Tagir Valeev (@tagir_valeev) October 10, 2017

The results are:

Java 8

Benchmark                                        Mode  Cnt          Score         Error  Units
testStringReplaceLongStringNoMatch               thrpt  21    4809343.940 ±   66443.628  ops/s
testStringUtilsReplaceLongStringNoMatch          thrpt  21   25063493.793 ±  660657.256  ops/s
testStringReplaceLongStringOneMatch              thrpt  21    1406989.855 ±   43051.008  ops/s
testStringUtilsReplaceLongStringOneMatch         thrpt  21    6961669.111 ±  141504.827  ops/s
testStringReplaceLongStringSeveralMatches        thrpt  21    1103323.491 ±   17047.449  ops/s
testStringUtilsReplaceLongStringSeveralMatches   thrpt  21    3899108.777 ±   41854.636  ops/s
testStringReplaceShortStringNoMatch              thrpt  21    5936992.874 ±   68115.030  ops/s
testStringUtilsReplaceShortStringNoMatch         thrpt  21  171660973.829 ±  377711.864  ops/s
testStringReplaceShortStringOneMatch             thrpt  21    3267435.957 ±  240198.763  ops/s
testStringUtilsReplaceShortStringOneMatch        thrpt  21    9943846.428 ±  270821.641  ops/s
testStringReplaceShortStringSeveralMatches       thrpt  21    2313713.015 ±   28806.738  ops/s
testStringUtilsReplaceShortStringSeveralMatches  thrpt  21    5447065.933 ±  139525.472  ops/s

As can be seen, the difference is “catastrophic”. Apache Commons Lang’s StringUtils drastically outperforms the JDK’s String.replace() in every discipline, especially when no match is found in a short string! That’s because the library optimises for this particular case:

...
int end = searchText.indexOf(searchString, start);
if (end == INDEX_NOT_FOUND) {
    return text;
}
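To show where that early return sits in a complete method, here is a minimal sketch of an indexOf()-based replacement. It is not the actual Apache Commons Lang source: the class name, the null and empty checks and the buffer size are my own assumptions. The point is only that the no-match path does a single indexOf() call and allocates nothing.

```java
// Hypothetical sketch of an indexOf()-based replace, not the Commons Lang source
class IndexOfReplace {

    static String replace(String text, String search, String replacement) {
        if (text == null || text.isEmpty()
                || search == null || search.isEmpty()
                || replacement == null)
            return text;

        int start = 0;
        int end = text.indexOf(search, start);

        // Fast path: nothing to replace, so the input is returned as is
        if (end == -1)
            return text;

        // Slow path: copy the unmatched parts and insert the replacement
        StringBuilder sb = new StringBuilder(text.length() + 16);
        while (end != -1) {
            sb.append(text, start, end).append(replacement);
            start = end + search.length();
            end = text.indexOf(search, start);
        }

        // Append whatever follows the last match
        sb.append(text, start, text.length());
        return sb.toString();
    }

    public static void main(String[] args) {
        System.out.println(replace("'a'b'c'", "'", "''"));
    }
}
```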

Java 9

Things look a bit different for Java 9:

Benchmark                                        Mode  Cnt          Score         Error  Units
testStringReplaceLongStringNoMatch               thrpt  21   55528132.674 ±  479721.812  ops/s
testStringUtilsReplaceLongStringNoMatch          thrpt  21   55767541.806 ±  754862.755  ops/s
testStringReplaceLongStringOneMatch              thrpt  21    4806322.839 ±  217538.714  ops/s
testStringUtilsReplaceLongStringOneMatch         thrpt  21    8366539.616 ±  142757.888  ops/s
testStringReplaceLongStringSeveralMatches        thrpt  21    2685134.029 ±   78108.171  ops/s
testStringUtilsReplaceLongStringSeveralMatches   thrpt  21    3923819.576 ±  351103.020  ops/s
testStringReplaceShortStringNoMatch              thrpt  21  122398496.629 ± 1350086.256  ops/s
testStringUtilsReplaceShortStringNoMatch         thrpt  21  121139633.453 ± 2756892.669  ops/s
testStringReplaceShortStringOneMatch             thrpt  21   18070522.151 ±  498663.835  ops/s
testStringUtilsReplaceShortStringOneMatch        thrpt  21   11367395.622 ±  153377.552  ops/s
testStringReplaceShortStringSeveralMatches       thrpt  21    7548407.681 ±  168950.209  ops/s
testStringUtilsReplaceShortStringSeveralMatches  thrpt  21    5045065.948 ±  175251.545  ops/s

Java 9’s implementation is now similar to that of Apache Commons, with the same optimisation for non-matches:

public String replace(CharSequence target, CharSequence replacement) {
    String tgtStr = target.toString();
    String replStr = replacement.toString();
    int j = indexOf(tgtStr);
    if (j < 0) {
        return this;
    }
    ...

It is still quite a bit slower for matches in long strings, but faster for matches in short strings. The tradeoff for jOOQ will be to still prefer Apache Commons because:

Most people are still on Java 8 or less, currently

Most replacements won’t match and both implementations fare equally well for that in Java 9, but Apache Commons is much faster for this category in Java 8

If there’s a match and thus a replacement, the speed depends on the string length, where the faster implementation is currently undecided
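To put numbers on these points, the throughput scores from the two result tables can be turned into simple ratios. The following is only a small arithmetic sketch with an invented class name. The scores are copied from the results above, and a ratio above 1 means StringUtils.replace() was faster.

```java
// Hypothetical helper for comparing the JMH scores listed above
class ThroughputRatios {

    // How many times faster StringUtils.replace() was than String.replace()
    static double ratio(double stringUtilsOpsPerSec, double stringReplaceOpsPerSec) {
        return stringUtilsOpsPerSec / stringReplaceOpsPerSec;
    }

    public static void main(String[] args) {
        // Scores in ops/s, copied from the benchmark results
        System.out.printf("Java 8, short string, no match:  %.2fx%n", ratio(171660973.829, 5936992.874));
        System.out.printf("Java 9, short string, no match:  %.2fx%n", ratio(121139633.453, 122398496.629));
        System.out.printf("Java 9, long string, one match:  %.2fx%n", ratio(8366539.616, 4806322.839));
        System.out.printf("Java 9, short string, one match: %.2fx%n", ratio(11367395.622, 18070522.151));
    }
}
```

On Java 8, the no-match case for short strings comes out at roughly 29 times faster with Apache Commons. On Java 9, the same case is close to 1, while the one-match cases land on either side of 1 depending on the string length.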

Conclusion

This micro optimisation stuff matters in jOOQ because jOOQ is a library that does a lot of SQL string manipulation. Every allocation and every CPU cycle that is wasted when manipulating SQL strings slows down the library, and thus impacts all of its users. In a situation like this, it is definitely worth considering not using these useful JDK String methods, and opting for the much faster Apache Commons implementations instead.

Things have improved a lot in Java 9, where this can mostly be ignored. But if you still need to support Java 8 (we still support Java 6 in our commercial distributions!), then this has to be considered.