I will show you another good game that I know

But that is not all, oh no, that is not all

It gets rid of your gambling debts, it quits smoking

It's a friend, and it's a companion,

And it's the only product you will ever need

Follow these easy assembly instructions it never needs ironing

Well it takes weights off hips, bust, thighs, chin, midriff,

Gives you dandruff, and it finds you a job, it is a job

...

'Cause it's effective, it's defective, it creates household odors,

It disinfects, it sanitizes for your protection

It gives you an erection, it wins the election

Why put up with painful corns any longer?

It's a redeemable coupon, no obligation, no salesman will visit your home

I'm particularly happy with the runnable jar as a means of packaging the benchmarks, as I can now take the same jar and try it out in different environments, which is important to my work process. My only grumble is the lack of support for parametrization, which leads me to use a system property to switch between the direct and heap buffer output tests. I'm assured this is also in the cards.

There's even more! Whenever I run any type of experiment, the first question is how to explain the results and what differences one implementation has over the other. For small bits of code the answer will usually be 'read the code, you lazy bugger', but when comparing 3rd party libraries, or when putting large compound bits of functionality to the test, profiling is often the answer, which is why JMH comes with a set of profilers:

gc: GC profiling via standard MBeans
comp: JIT compiler profiling via standard MBeans
cl: Classloader profiling via standard MBeans
hs_rt: HotSpot (tm) runtime profiling via implementation-specific MBeans
hs_cl: HotSpot (tm) classloader profiling via implementation-specific MBeans
hs_comp: HotSpot (tm) JIT compiler profiling via implementation-specific MBeans
hs_gc: HotSpot (tm) memory manager (GC) profiling via implementation-specific MBeans
hs_thr: HotSpot (tm) threading subsystem via implementation-specific MBeans
stack: Simple and naive Java stack profiler

Covering the lot exceeds the scope of this blog post, so let's focus on the obvious ones that might prove helpful for this experiment. Running with the gc and hs_gc profilers supports the theory that getBytes() is slower because it generates more garbage than the alternatives, and highlights the low garbage impact of the custom and charset encoders. Running with the stack profiler shows that getBytes() spends less time in encoding than the other two, due to the overheads involved in getting to the encoding phase. The custom encoder spends the most time on encoding, but what is significant is that, as it outperforms the charset encoder and the ratios are similar, we can deduce that the encoding algorithm itself is faster.

The free functionality does not stop here! To quote Tom Waits: 'What more?' you ask? Well, there's loads more functionality around multi-threading that I will not attempt to try in this post, and several more annotations to play with.
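As an aside, the system-property switch mentioned above can be sketched roughly as follows. This is my own illustration, not the original benchmark code; in particular the property name bench.direct is invented for the example:

```java
import java.nio.ByteBuffer;

// A minimal sketch of using a system property in place of real
// parametrization: pick the output buffer type from a -D flag.
// The property name "bench.direct" is illustrative only.
public class BufferSwitch {
    static ByteBuffer newOutputBuffer(int capacity) {
        // Run with -Dbench.direct=true to exercise the direct (off-heap)
        // buffer; otherwise a plain heap buffer is used.
        return Boolean.getBoolean("bench.direct")
                ? ByteBuffer.allocateDirect(capacity)
                : ByteBuffer.allocate(capacity);
    }

    public static void main(String[] args) {
        ByteBuffer out = newOutputBuffer(4096);
        // Prints "heap" unless -Dbench.direct=true was passed
        System.out.println(out.isDirect() ? "direct" : "heap");
    }
}
```

The same jar can then be run twice, once with and once without the flag, which is the workaround until @Param-style support arrives.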
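For readers unfamiliar with the encoding paths under comparison, here is a minimal sketch of the two JDK variants (the benchmark's hand-rolled custom encoder is not reproduced here). It illustrates why getBytes() is the garbage-heavy option: it must allocate a fresh byte[] on every call, whereas a reused CharsetEncoder can write into a pre-allocated ByteBuffer, which is consistent with the gc profiler observations above:

```java
import java.nio.ByteBuffer;
import java.nio.CharBuffer;
import java.nio.charset.CharsetEncoder;
import java.nio.charset.StandardCharsets;

// Sketch of the two JDK encoding paths; not the benchmark's actual code.
public class EncodeSketch {
    // CharsetEncoder is stateful and not thread-safe, so a real benchmark
    // would keep one per thread; a single static one suffices for the sketch.
    private static final CharsetEncoder ENCODER =
            StandardCharsets.UTF_8.newEncoder();

    // Allocates a new byte array (plus internal copies) on every call.
    static byte[] viaGetBytes(String s) {
        return s.getBytes(StandardCharsets.UTF_8);
    }

    // Reuses the encoder and the caller's buffer; returns bytes written.
    static int viaCharsetEncoder(String s, ByteBuffer out) {
        ENCODER.reset();
        out.clear();
        ENCODER.encode(CharBuffer.wrap(s), out, true);
        ENCODER.flush(out);
        return out.position();
    }

    public static void main(String[] args) {
        ByteBuffer out = ByteBuffer.allocate(64);
        System.out.println(viaGetBytes("hello").length);     // 5
        System.out.println(viaCharsetEncoder("hello", out)); // 5
    }
}
```

Both produce the same bytes; the difference the profilers pick up is purely in allocation behaviour and the work done before the encoding loop is reached.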
In a further post I'd like to go back and compare this awesome new tool with the previous two variations of this benchmark and see if, how and why the results differ...

Many thanks to the great dudes who built this framework, of whom I'm only familiar with Master Shipilev (who also took time to review, thanks again); they had me trial it a few months back and I've been struggling to shut up about it ever since :-)

Related JMH posts:

UPDATE (1/08/2013): If you are looking for more JMH related info, see Shipilev's slides on benchmarking.

UPDATE (1/07/2014): The samples repository has been updated to reflect JMH progress; sample code may have minor differences from the code presented above, and command line options may differ. I will follow up at some point with a revamp of JMH related posts, but for now you can always reproduce the above results by reviving the older code from history.