Problem Description
Why does Java code generated to perform an operation run more slowly than an "interpreter loop"?
I have some Java code which performs bitwise operations on a BitSet. I have a list of operations and can "interpret" them by looping over them, but it's important to me that I can perform these operations as quickly as possible, so I've been trying to dynamically generate code to apply them. I generate Java source to perform the operations and compile a class implementing those operations using Javassist.
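For concreteness, here is a minimal sketch of the two approaches being compared. The interface name MyInterface comes from the code further down; its method signature, the Op record, and the particular BitSet operations are illustrative assumptions, not the original code.

// Hypothetical sketch of the "interpreter loop" approach and of the interface that
// the dynamically generated class implements.
import java.util.BitSet;
import java.util.List;

// Single-method interface implemented by the generated class (name from the question;
// the exact signature is an assumption).
interface MyInterface {
    void apply(BitSet bits);
}

// A hypothetical description of one bitwise operation.
class Op {
    int code;        // e.g. 0 = and, 1 = or, 2 = xor
    BitSet operand;
}

class Interpreter {
    // The interpreter loop: walk the operation list and apply each operation in turn.
    static void interpret(List<Op> ops, BitSet bits) {
        for (Op o : ops) {
            switch (o.code) {
                case 0: bits.and(o.operand); break;
                case 1: bits.or(o.operand);  break;
                case 2: bits.xor(o.operand); break;
            }
        }
    }
}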
Unfortunately, my dynamically-generated code runs slower than the interpreted code. It appears that this is because HotSpot is optimizing the interpreted code but isn't optimizing the compiled code: After I run it a few thousand times, my interpreted code runs twice as fast as it did initially, but my compiled code shows no speedup. Consistent with this hypothesis, my interpreted code is initially slower than the compiled code, but is eventually faster.
I'm not sure why this is happening. My guess is that maybe Javassist uses a class loader whose classes HotSpot doesn't touch. But I'm not an expert on class loading in Java, so I'm not sure if this is a reasonable guess or how to go about testing it. Here's how I'm creating and loading the class with Javassist:
import javassist.ClassPool;
import javassist.CtClass;
import javassist.CtMethod;
import javassist.CtNewMethod;

MyInterface createAndLoadClass() throws Exception {
    ClassPool pool = ClassPool.getDefault();
    CtClass tClass = pool.makeClass("foo");
    // foo implements MyInterface, with one method
    tClass.addInterface(pool.get(MyInterface.class.getName()));
    // Get the generated source for the method and add it to the class
    CtMethod tMethod = CtNewMethod.make(getSource(), tClass);
    tClass.addMethod(tMethod);
    // finally, compile and load the class, then instantiate it
    return (MyInterface) tClass.toClass().newInstance();
}
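The question doesn't show getSource(); as a purely hypothetical illustration, it might return the operation list unrolled into straight-line Java source for the single interface method, along these lines:

// Hypothetical sketch of getSource(): the generated method body is just the
// operation list written out as straight-line Java statements.
private String getSource() {
    return "public void apply(java.util.BitSet bits) {"
         + "  bits.set(3);"        // the real generator would emit one statement
         + "  bits.flip(0, 64);"   // per operation in the list; these three calls
         + "  bits.clear(7);"      // are only illustrative
         + "}";
}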
Does anyone have an idea as to what's going on here? I'd really appreciate whatever help you can give.
I'm using the Sun 1.6 server JVM on Windows XP 32-bit.
Reference Solutions
Method 1:
HotSpot doesn't care where the code comes from. For instance, it'll happily inline code called through a virtual method call with an implementation loaded by a different class loader.
I suggest you write out in source code the operations you are trying to perform for this benchmark, and then benchmark that. It's usually easier to write out an example of generated code rather than writing the generator anyway.
There are a number of reasons why HotSpot might not optimise code as aggressively as it could. For instance, very long methods tend not to be inlined, nor to have other methods inlined into them.
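A minimal sketch of that benchmarking suggestion, assuming the hypothetical MyInterface sketched earlier; HandWritten is a hand-written stand-in for one example of the generator's output:

// Hypothetical benchmark: hand-write an example of the code the generator would emit
// and time it directly, independent of Javassist and of the generator itself.
import java.util.BitSet;

class HandWritten implements MyInterface {
    public void apply(BitSet bits) {
        // the same operations the generator would produce, written out by hand
        bits.set(3);
        bits.flip(0, 64);
        bits.clear(7);
    }
}

class HandWrittenBenchmark {
    public static void main(String[] args) {
        MyInterface impl = new HandWritten();
        BitSet bits = new BitSet(1024);
        long start = System.nanoTime();
        // enough iterations for HotSpot to warm up and compile the hot path
        for (int i = 0; i < 1000000; i++) {
            impl.apply(bits);
        }
        System.out.println("hand-written: " + (System.nanoTime() - start) / 1e6 + " ms");
    }
}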
Method 2:
I think I understand what was going on here. My first mistake was generating methods which were too long. After I fixed that, I noticed that although my generated code was still slower, it eventually approached the speed of the interpreted code.
I think that the biggest speedup here comes from HotSpot optimizing my code. In the interpreted version, there's very little code to optimize, so HotSpot quickly takes care of it. In the generated version, there's a lot of code to optimize, so HotSpot takes longer to work its magic over all the code.
If I run my benchmarks for long enough, I now see my generated code performing just slightly better than the interpreted code.
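A rough sketch of the "methods which were too long" fix, with assumptions not stated in the answer (the chunk size, the helper-method names, and that the operations arrive as pre-rendered Java statements): split the generated operations across several short private methods and have the interface method call them in sequence, so each compiled method stays small enough for HotSpot to treat normally.

// Hypothetical sketch: emit the operations in chunks as separate private methods,
// then emit a top-level method that calls the chunks in order.
private void addChunkedMethods(CtClass tClass, List<String> statements) throws Exception {
    int chunkSize = 500;   // assumed chunk size; tune for the workload
    StringBuilder top = new StringBuilder("public void apply(java.util.BitSet bits) {");
    for (int chunk = 0, i = 0; i < statements.size(); chunk++, i += chunkSize) {
        StringBuilder body = new StringBuilder(
                "private void apply" + chunk + "(java.util.BitSet bits) {");
        for (String stmt : statements.subList(i, Math.min(i + chunkSize, statements.size()))) {
            body.append(stmt);   // each entry is one generated statement, e.g. "bits.set(3);"
        }
        body.append("}");
        tClass.addMethod(CtNewMethod.make(body.toString(), tClass));
        top.append("  apply").append(chunk).append("(bits);");
    }
    top.append("}");
    tClass.addMethod(CtNewMethod.make(top.toString(), tClass));
}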
Method 3:
There is a JVM setting that controls how soon code gets compiled: -XX:CompileThreshold=10000, documented as the "number of method invocations/branches before compiling" (the -client default is 1,500).
I do not know if this will help, because in your example the size seems to play a vital role.
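If you want to check which threshold the running VM is actually using, one option (an assumption, not part of the answer) on a Sun/Oracle HotSpot JVM is the HotSpotDiagnosticMXBean:

// Prints the effective value of CompileThreshold for the running HotSpot VM.
// Run with e.g.: -server -XX:CompileThreshold=1000
import java.lang.management.ManagementFactory;
import com.sun.management.HotSpotDiagnosticMXBean;

class ShowCompileThreshold {
    public static void main(String[] args) throws Exception {
        HotSpotDiagnosticMXBean hotspot = ManagementFactory.newPlatformMXBeanProxy(
                ManagementFactory.getPlatformMBeanServer(),
                "com.sun.management:type=HotSpotDiagnostic",
                HotSpotDiagnosticMXBean.class);
        System.out.println(hotspot.getVMOption("CompileThreshold"));
    }
}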
(by Justin L., Tom Hawtin - tackline, Justin L., ReneS)