Skip to content

add zlib each_char#8608

Merged
headius merged 1 commit intojruby:masterfrom
danini-the-panini:zlib-fixes
Feb 8, 2025
Merged

add zlib each_char#8608
headius merged 1 commit intojruby:masterfrom
danini-the-panini:zlib-fixes

Conversation

@danini-the-panini
Copy link
Contributor

Resolves #8351

Copy link
Member

@headius headius left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nice, new spec. 👍

@headius headius merged commit b81b8ea into jruby:master Feb 8, 2025
95 checks passed
Copy link
Member

@enebo enebo left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

shoot I forgot to submit this and as I said in comments this is landable as-is. Just adding these comments for posterity.

position++;
// TODO: must handle encoding. Move encoding handling methods to util class from RubyIO and use it.
// TODO: StringIO needs a love, too.
block.yield(context, runtime.newString(String.valueOf((char) (value & 0xFF))));
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I totally understand how this code got here looking at getc but this has to be wrong as it uses a byte-oriented API and returns it as a char. Perhaps no one does multiple byte chars using this API as I would think #getc would have had a report by now since this has that same logic in it.

Another amusing thing to notice is MRI unit tests have no tests for this method. No wonder we are missing it. You will be the first person to actually add a test for this!

I am ok landing this if you open an issue that both each_char and getc is not recreating chars but is using bytes. It appears initialize will process opts and set encoding so I think we just need to use that and do something like RubyString#enumerateChars where we use StringSupport to look for length and that many bytes as a new char.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I should add since there are no tests perhaps each_char really does only return bytes. I would be a pretty big misnaming if so.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In MRI it does return multi-byte chars, and should work the same as calling each_char on the inflated string

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@danini-the-panini so then perhaps you want to take a look at fixing that for both each_char and getc in zlib? It will largely just be looking at our String-related code and doing the same thing.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

(as a new issue)

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Issue created: #8621

@danini-the-panini danini-the-panini deleted the zlib-fixes branch February 11, 2025 09:00
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

Zlib::GzipReader missing each_char iterator

3 participants