Confirmed: OpenJDK JIT compiler emits efficient conditional move machine codes

On popular programmers’ Q&A site stackoverflow.com the hit question of the last few years dealt with the effect of branching on program performance. I recommend that you read the brilliant explanation there, but if you don’t have much time, here’s the summary. Modern processors execute commands in a pipeline. While One command is finishing executing, the next one is already started, the next one is being prepared and so on. Pipelining can only work if processor knows which command will execute next. If the program is linear, determining next command is trivial. But if the program branches (think “if” statement), processor can only guess what comes next. If the processor guesses wrong, it would have to back out several steps of the wrong branch it took. Such backout would severely degrade performance.

Fortunately, modern processors have branch prediction mechanisms, which allow them to guess branching correctly when a particular branch exhibits consistent behavior. An example would be an if statement inside a loop. If code takes “then” branch consistently, processor will be able to make correct guess and performance will remain optimal.

So efficient code would avoid branching or at least make it branching predictable. One could assume that to write efficient code, programmer needs to be aware of branch prediction and code around it. An example in the linked Stack Overflow article shows how sorting data before processing it in a loop creates consistent branching and dramatically improves performance. Another trick would be to replace mini-branches with equivalent branchless code. For example, function min(a,b) is normally written as

if (a<b) return a; else return b; 

It could rewritten (for Java’s 32 bit signed ints, with an assumption on input range not causing overflow/underflow) as

return a-((b-a)>>31&1)*(a-b)

Should you write code in this obscure style to gain performance? I’m going to show that in Java this trick is unnecessary.

The key to the problem lies in the fact that in some cases, an if statement doesn’t produce branching! X86 processors (Intel and AMD) that power most PCs and great many servers have recognized Conditional Move instructions in Pentium Pro, released about 20 years ago in 1995. Conditional move instruction is a way to achieve branching effect without branching code. The meaning of this instruction is “move value from one place to another IF specified condition is met, do nothing otherwise”. This instruction can be used to determine the minimum of 2 values or to express a ternary operator (?:) in C or Java.

The Stack Overflow post cited benchmark results that suggest that C compilers are not consistent in emitting these efficient conditional move instructions. Java seemed to use conditional move, based on performance results. I ran a slightly different benchmark (computing minimum of 2 numbers using if-return and ternary operator), comparing run times between branch-friendly ordered input data and noisy random input data. There was no difference in execution times, suggesting that branching was not a factor.

I decided to check what instructions OpenJDK JIT compiler generates. For my tests I used Java 7 on my home Ubuntu Linux running on 64 bit AMD processor.


ivan2@bigbox:~$ java -version

java version "1.7.0_79"

OpenJDK Runtime Environment (IcedTea 2.5.5) (7u79-2.5.5-0ubuntu0.14.04.2)

OpenJDK 64-Bit Server VM (build 24.79-b02, mixed mode)

I wrote a primitive test case with simple min function:


public int ifmin(int a, int b) {

if (a<b) return a; else return b;

}

My goal was to confirm that this simple code will be compiled to native instructions as a conditional move (CMOVxx), not a conditiona jump (Jxx). My primitive function compiled to this Java bytecode:


0: iload_0

1: iload_1

2: if_icmpge 7

5: iload_0

6: ireturn

7: iload_1

8: ireturn

I caused JVM to compile the bytecode to native instructions and traced execution with gdb debugger. By the way, locating compiled code took quite a bit of effort. I probably missed an obvious approach, but in any case, I explain how I pinpointed JIT-generated machine code in a separate post. I verified that compilation happened in a log file:


Compilation events (4 events):

Event: 0.077 Thread 0x00007f03bc0a9000 1 ivan.Code::ifmin (9 bytes)

I was then able to trace the execution of the function and see the actual machine instructions. The relevant section of gdb session looked like this:


=> 0x7fffed060190: sub $0x18,%rsp

=> 0x7fffed060197: mov %rbp,0x10(%rsp)

The instructions above populated registers in preparation for comparison.


=> 0x7fffed06019c: cmp %ecx,%edx

The above instruction compared a with b


=> 0x7fffed06019e: mov %ecx,%eax

And the above moved b to result


=> 0x7fffed0601a0: <b>cmovl</b> %edx,%eax

Next: conditional move if a<b. The logic of this steps is like this: prior to this step, result=b (we just moved b to result). Now, if a<b,result will be overwritten with a. So if a<b, then result=a. Other wise, result remains unchanged, meaning result=b. Just what we wanted!

Next, finalize and return – nothing terribly interesting


=> 0x7fffed0601a3: add $0x10,%rsp

=> 0x7fffed0601a7: pop %rbp

=> 0x7fffed0601a8: test %eax,0xaf96e52(%rip) # 0x7ffff7ff7000

=> 0x7fffed0601ae: retq

Similarly, I confirmed that ternary operator is compiled into conditional move instruction. This Java code:


int a = aarr[1];

int b = barr[2];

res[3] = a<b?a:b;

produced similar machine code sequence, including conditional move:


=> 0x7fffed05fc9c: cmp %ebp,%r11d

The above compares a with b


=> 0x7fffed05fc9f: <b>cmovl</b> %r11d,%ebp

And this lines overwrites one with the other – conditionally!!

Conclusion: for simple conditionals, modern JIT compilers emit efficient branchless code. No additional programming tricks are required. Clever interview trick of computing minimum of 2 numbers through obscure bitwise code should remain confined to interview rooms – it does nothing to improve performance in real applications.

Advertisements

One Response to Confirmed: OpenJDK JIT compiler emits efficient conditional move machine codes

  1. Pingback: Locating JIT-generated machine code using gdb | Ivan Smirnov's Blog

%d bloggers like this: