Implement block_given? call as optimized instruction #8170
Merged
headius merged 6 commits into jruby:9.5-dev on Mar 27, 2024
In order to optimize calls to block_given? that end up in the expected core method, this patch alters compilation of fcall block_given? to overload the BlockGivenInstr used by the frameless defined?(yield) logic. When this instr is used for defined?, nothing changes. When used for a bare fcall to block_given?, the logic will first check if the target method is built-in, using the fast defined?(yield) logic in that case or falling back on a normal invocation otherwise. Moving this into an instruction avoids having to add special logic in CallInstr/CallBase and friends to capture block_given? calls and give them special behavior. Specifically, the frame flags specified in the block_given? definition do not have to be ignored; rather, they only apply in cases where we are calling the method in other ways, such as on a target object (e.g. Kernel.block_given?) or via metaprogramming calls like send. This simplifies the optimization, since BlockGivenInstr itself does not need a caller frame in order to handle both built-in and custom block_given?, and any non-direct calls to block_given? continue to deoptimize in the same way as before. Performance of a method containing block_given? is now equal to a method using the less-common defined?(yield) without introducing any incompatibility.
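The call paths described above can be illustrated from the Ruby side. The following is a minimal sketch, not code from the patch: the class and method names (`Plain`, `Custom`, `bare`, `via_kernel`, `via_send`) are hypothetical. It shows which calls are bare fcalls (the form this patch targets) and which go through other paths, and that a user-defined `block_given?` is still reached by normal dispatch. The observable results should be the same with or without the optimization.

```ruby
# Hypothetical example classes; not part of the JRuby patch.
class Plain
  # Bare fcall: the form this patch compiles to the optimized instruction.
  def bare
    block_given?
  end

  # Call on a target object: not a bare fcall, so it dispatches normally.
  def via_kernel
    Kernel.block_given?
  end

  # Metaprogramming call: also dispatched normally.
  def via_send
    send(:block_given?)
  end
end

class Custom
  # A user-defined block_given? that shadows the core method.
  def block_given?
    :custom
  end

  # A bare fcall here must reach the override above, not the core logic.
  def bare
    block_given?
  end
end

plain = Plain.new
results = {
  bare_with_block: plain.bare { },
  bare_without_block: plain.bare,
  kernel_with_block: plain.via_kernel { },
  send_with_block: plain.via_send { },
  custom_override: Custom.new.bare { }
}
results.each { |name, value| puts "#{name}: #{value.inspect}" }
```

The `Custom` case is the reason the instruction has to check whether the target is the built-in method before taking the fast path.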
The defined?(yield) form avoids any caller frame requirement since it just takes the block in hand and checks if it is given. The block_given? form has typically required a caller frame, since it performs a normal fcall to block_given? which then needs to be able to see the caller's received block.
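For readers less familiar with the two forms, here is a minimal Ruby sketch of them side by side. The method names `given_via_call` and `given_via_defined` are hypothetical and chosen only for illustration; the point is that both report the same thing about the block passed to the enclosing method.

```ruby
# Hypothetical helper methods for comparison; not taken from the patch.

# The common form: a bare call to block_given?.
def given_via_call
  block_given?
end

# The less common form: defined?(yield) evaluates to "yield" when a block
# was passed and to nil otherwise, so it is normalized to a boolean here.
def given_via_defined
  defined?(yield) ? true : false
end

with_block = [given_via_call { }, given_via_defined { }]
without_block = [given_via_call, given_via_defined]

puts "with block:    #{with_block.inspect}"
puts "without block: #{without_block.inspect}"
```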
The optimized block_given? still needs to be treated as a call, which would pollute the flags for defined?(yield) unnecessarily. This patch moves it to its own instr.
This PR reworks fcalls to `block_given?` as a custom instruction based on `BlockGivenInstr` used by the lighter-weight `defined?(yield)` form.

Instead of forcing a caller frame (so that the caller's received block can be accessed downstack), the new logic checks if the target method is the core version and uses `defined?(yield)` logic in that case. When the method is not from JRuby core, then we dispatch normally... but without the caller frame requirement, since only the core `block_given?` is allowed to have that access. All other call paths leading to `block_given?` continue to use the deoptimized logic, so we are not introducing any new incompatibility.

This makes the performance of `block_given?`-calling methods equivalent to those that use `defined?(yield)` and avoids the framing cost in all situations that fcall `block_given?`.

Performance is shown through the added benchmark: