Skip to content

Split opts#7629

Merged
enebo merged 16 commits intojruby:masterfrom
enebo:split_opts
Mar 8, 2023
Merged

Split opts#7629
enebo merged 16 commits intojruby:masterfrom
enebo:split_opts

Conversation

@enebo
Copy link
Member

@enebo enebo commented Feb 6, 2023

No description provided.

@enebo enebo added this to the JRuby 9.4.2.0 milestone Feb 6, 2023
enebo added 6 commits February 7, 2023 08:14
read all and read line were calculating CR but not setting
it on the resulting string.  This prevents resulting string
from having single byte optimizable code paths work.  I only
did one benchmark using aref and it appears to be a little
faster consistently but not a lot.  I imagine using really
long lines and a complicated backtracking regexp will make
this pop.
@enebo enebo requested review from headius and lopex March 1, 2023 15:03
@enebo
Copy link
Member Author

enebo commented Mar 1, 2023

@headius @lopex I am mostly looking for holes in RubyRegexp.isSimpleString. The rest I think is just having more hotpaths and reorganizing to make selecting those hot paths easier to select.

The actual simple string logic should be very conservative in that lots of valid strings containing those chars are being rejected. The two fundamental questions are:

  1. In single char strings is there something more than ., ^, or $ that should be rejected?
  2. in isExact am I missing a special character?

Copy link
Member

@headius headius left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The changes look fine, assuming discussions today about detecting simple regex have been considered or will be considered.

@enebo enebo merged commit af84a70 into jruby:master Mar 8, 2023
@eregon
Copy link
Member

eregon commented Mar 8, 2023

Is there code out there using String#split with a single-char Regexp? Why don't they pass the String instead?
I could see it for / / since " " is special (awk-style split), but for other values it doesn't seem of any utility vs just using the String. Or maybe I'm missing something?

@enebo
Copy link
Member Author

enebo commented Mar 8, 2023

@eregon You may be missing that when people do things like this I am not there to tell them they could do it better 😃

In the case I noticed this was someone submitted an issue where the regexp was not the specific issue but it still had:

      arr << s.split(/,/).join('+')

I have seen people do stuff like /word/ in other cases as well. It is uncommon and ill-advised but why not?

@enebo enebo deleted the split_opts branch March 8, 2023 20:48
@enebo
Copy link
Member Author

enebo commented Mar 8, 2023

@eregon Actually you just put me down a philosophical hole with this question. Why should a valid regular expression which is just a simple string be expected to be slower than just using the string? I admit I had the same reaction as you originally but part of me figured since it is correctable why not do it (note: I also did a hot path on ASCII string patterns at the same time so those also sped up)?

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants