Many recent JITs have started to optimize repeated accesses of final fields. This patch moves almost all core class and module initialization into the Ruby constructor, so that those fields (as well as some structural references) can be made final. This should help any core classes that look up their own type on each construction (which includes almost all of them) when those constructors are inlined together with other accesses of the same field.
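As a rough illustration of the pattern described above, the sketch below assigns the core class references once in a runtime constructor so the fields can be declared final. All names here (`SketchRuntime`, `SketchClass`, `SketchArray`) are hypothetical stand-ins and not JRuby's actual classes or fields.

```java
// Hypothetical stand-in for a Ruby class object; not JRuby's RubyClass.
final class SketchClass {
    final String name;

    SketchClass(String name) {
        this.name = name;
    }
}

// Hypothetical stand-in for the runtime.
final class SketchRuntime {
    // Assigned exactly once in the constructor, so both fields can be final.
    final SketchClass arrayClass;
    final SketchClass fixnumClass;

    SketchRuntime() {
        this.arrayClass = new SketchClass("Array");
        this.fixnumClass = new SketchClass("Integer");
    }
}

// Hypothetical core object that looks up its own type on each construction.
final class SketchArray {
    final SketchClass metaClass;
    final long[] values;

    SketchArray(SketchRuntime runtime, long... values) {
        // Reads the final field; when several constructors are inlined
        // together, a JIT may be able to reuse a single load of it.
        this.metaClass = runtime.arrayClass;
        this.values = values;
    }
}
```

Whether a given JIT actually folds the repeated loads depends on the JIT and on inlining decisions; the sketch only shows the shape of the code that makes it possible.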
This commit attempts to reduce the amount of field traversal needed to convert a primitive boolean value into a RubyBoolean. Typically this conversion was done by calling runtime.newBoolean(val) or RubyBoolean.newBoolean(runtime, val), but in many cases that call required retrieving the runtime from either the context or the current object's metaclass. In cases where the context is already available and the runtime is not needed, I have switched to RubyBoolean.newBoolean(context, val) to avoid the runtime field traversal. In addition, this commit changes runtime.getTrue() and runtime.getFalse() calls to context.tru and context.fals when the context is already present and the runtime is not otherwise needed. Calls to the runtime versions of these methods where the runtime was already available, or was needed for other purposes, have been left alone to reduce the size of the diff.
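The difference between the two lookup paths can be sketched as follows. The classes `DemoRuntime`, `DemoContext`, `DemoBoolean`, and `DemoBooleans` are hypothetical simplifications written for this example; only the field names `tru` and `fals` and the method names `getTrue`, `getFalse`, and `newBoolean` are taken from the description above, and their real signatures in JRuby may differ.

```java
// Hypothetical stand-in for RubyBoolean.
final class DemoBoolean {
    final boolean value;

    DemoBoolean(boolean value) {
        this.value = value;
    }
}

// Hypothetical stand-in for the runtime, which owns the two boolean singletons.
final class DemoRuntime {
    private final DemoBoolean trueObject = new DemoBoolean(true);
    private final DemoBoolean falseObject = new DemoBoolean(false);

    DemoBoolean getTrue() {
        return trueObject;
    }

    DemoBoolean getFalse() {
        return falseObject;
    }

    DemoBoolean newBoolean(boolean value) {
        return value ? trueObject : falseObject;
    }
}

// Hypothetical stand-in for the per-thread context.
final class DemoContext {
    final DemoRuntime runtime;
    // Cached once at construction so callers can skip the runtime hop.
    final DemoBoolean tru;
    final DemoBoolean fals;

    DemoContext(DemoRuntime runtime) {
        this.runtime = runtime;
        this.tru = runtime.getTrue();
        this.fals = runtime.getFalse();
    }
}

final class DemoBooleans {
    // Older path: context -> runtime -> boolean field (two field reads).
    static DemoBoolean viaRuntime(DemoContext context, boolean value) {
        return context.runtime.newBoolean(value);
    }

    // Newer path: a single final-field read off the context.
    static DemoBoolean viaContext(DemoContext context, boolean value) {
        return value ? context.tru : context.fals;
    }
}
```

Both paths return the same singleton objects; the second simply avoids dereferencing the runtime when the context is already in hand.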
This is largely complete. There are certainly other fields in JRuby that could be made final, but those are getting far removed from the Ruby class at this point. Based on some simple looping benchmarks, there appears to be roughly a 5-10% improvement (rather noisy on this machine) on a loop that constructs five arrays, each containing just the current iteration value. On a JIT like Graal that can eliminate the loop and the arrays, the PR has a larger impact, in the neighborhood of a 40% or greater performance improvement. This is largely due to the final Boolean values coming from ThreadContext, but the Fixnum class access also comes into play. In any case, this is difficult to show in a microbenchmark, since the fields in question get cached in L1 very quickly regardless of whether they're final. The difference becomes a matter of reading the value from L1 or from a register, assuming it doesn't spill to memory. But after looking at IGV output from Graal, it's clear that the values are read from memory only once when the fields are final.