Weird error using IO.copy_stream, IO duck types and enumerators #4903

@HoneyryderChuck

Environment

  • JRuby 9.1.15.0
  • Darwin Macintosh.local 16.7.0 Darwin Kernel Version 16.7.0: Mon Nov 13 21:56:25 PST 2017; root:xnu-3789.72.11~1/RELEASE_X86_64 x86_64
  • To reproduce: use the latest http-form_data and a JPG image (preferably around 46K in size)

Expected Behavior

My code is very similar to this sample:

require "http/form_data"

file = HTTP::FormData::File.new(File.join(__dir__, "..", "test", "support", "fixtures", "image.jpg")) 
$buffer = HTTP::FormData.create(image: file)


class ProcIO
  def initialize(&blk)
    @blk = blk
  end

  def write(data)
    @blk.call(data)
    data.bytesize
  end
end


def lazy_stream(&blk)
  return enum_for(__method__) unless block_given?
  IO.copy_stream($buffer, ProcIO.new(&blk))
end

def drain_stream
  @drain_stream ||= lazy_stream
  chunk = @drain_stream.next
  chunk = chunk.dup
  puts "1. drain size: #{chunk.bytesize}"
  # puts "2: drained: #{chunk[0..400].inspect}"
  chunk
rescue StopIteration
  nil
end

while chunk = drain_stream
  puts "1. yielded size: #{chunk.bytesize}"
  # puts "2. yield #{chunk[0..400].inspect}"
  puts
end

(The puts calls are there for debugging and to show the error.)

The purpose of this code is to turn the IO.copy_stream call into an enumerator, so that its chunks can be handled one at a time inside the while loop. This code works in MRI (tested with 2.4).
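The enumeration pattern described above can also be exercised without http-form_data. The following is only a sketch under stated assumptions: it swaps the form-data buffer for a StringIO holding random bytes, and `each_chunk` is a hypothetical helper name, not part of any library. As in the original script, each chunk is copied with dup before the next one is requested, and the reassembled chunks are then compared with the source.

```ruby
require "stringio"

# Minimal writable duck type: forwards every written chunk to a block.
class ProcIO
  def initialize(&blk)
    @blk = blk
  end

  def write(data)
    @blk.call(data)
    data.bytesize
  end
end

# Hypothetical helper: exposes the chunks written by IO.copy_stream
# as an enumerator when no block is given.
def each_chunk(source, &blk)
  return enum_for(__method__, source) unless block_given?
  IO.copy_stream(source, ProcIO.new(&blk))
end

# Assumption: 46 KiB of random bytes stands in for the JPG fixture.
payload = Random.new(42).bytes(46 * 1024)
enum = each_chunk(StringIO.new(payload))

drained = []
loop do
  chunk = enum.next      # raises StopIteration at the end, which loop rescues
  drained << chunk.dup   # copy before asking for the next chunk
end

result = drained.join
puts "chunks: #{drained.size}, bytes: #{result.bytesize}"
puts "content matches: #{result.b == payload.b}"
```

If the reassembled content does not match the source on a given Ruby implementation, that would reproduce the problem reported here without the image fixture.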

The main implementation difference is that IO.copy_stream yields 16384-byte chunks in MRI and 8192-byte chunks in JRuby. I traced this to this ticket, which leads me to believe that the bug can't be reproduced in older versions of JRuby (as they buffered the whole source in memory).

If you enable only the debug statements labelled 1., you'll see this output:

# MRI
1. drain size: 16384
1. yielded size: 16384

1. drain size: 16384
1. yielded size: 16384

1. drain size: 13720
1. yielded size: 13720

# JRuby
1. drain size: 8192
1. yielded size: 8192

1. drain size: 8192
1. yielded size: 8192

1. drain size: 8192
1. yielded size: 8192

1. drain size: 8192
1. yielded size: 8192

1. drain size: 8192
1. yielded size: 8192

1. drain size: 5528
1. yielded size: 5528

In the end, the total number of bytes yielded is the same in both implementations (46488). The point is that each drained chunk must be equal to the corresponding yielded chunk.

However, if you enable only the debug statements labelled 2., you'll see that this is not the case in JRuby:

# MRI
2: drained: "-----------..."
2. yield "-----------..."

2: drained: "\xAE}\xCC#\xFF\x00\x88\xAF\x86\xCD|..."
2. yield "\xAE}\xCC#\xFF\x00\x88\xAF\x86\xCD|..."

2: drained: "\x15\xC7\xAC\x19$\x89\x04\x12N\xE0\b..."
2. yield "\x15\xC7\xAC\x19$\x89\x04\x12N\xE0\b..."

# JRuby
2: drained: "\x86\xC1\xA8d\xB4,\xC16\xF6\x8B\x05..."
2. yield "\x86\xC1\xA8d\xB4,\xC16\xF6\x8B\x05..."

2: drained: "\x86\xC1\xA8d\xB4,\xC16\xF6\x8B\x05..."
2. yield "\x86\xC1\xA8d\xB4,\xC16\xF6\x8B\x05..."

2: drained: "\xAE}\xCC#\xFF\x00\x88\xAF\x86\xCD|..."
2. yield "v\x1A\xCF\xB6\tF\x9D\x82\xC8\xFD|..."

2: drained: "v\x1A\xCF\xB6\tF\x9D\x82\xC8\xFD|..."
2. yield "v\x1A\xCF\xB6$\x89\x04\x12N\xE0\b..."

2: drained: "\x15\xC7\xAC\x19$\x89\x04\x12N\xE0\b..."
2. yield "\x15\xC7\xAC\x19$\x89\x04\x12N\xE0\b..."

2: drained: "\xCA\xF6;\x18\x00\xA9 \xF5\x13\x89G..."
2. yield "\xCA\xF6;\x18\x00\xA9 \xF5\x13\x89G..."

(check the 3rd and 4th pairs, where the drained and yielded chunks differ)

Actual Behavior

As stated, I expect both chunks in each pair to always be identical. In JRuby they are not.

I couldn't pin down exactly where the problem lies (the File buffer, the IO.copy_stream call, the enumeration...), so I had to reproduce my full usage in this short script. But it's definitely a bug.
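One way to narrow this down is sketched below. It rests on an assumption this report does not confirm: that the copy loop passes the same buffer string to every write call and overwrites it between calls. `CopyingProcIO` is a hypothetical variant of ProcIO that copies the data inside write, before the block (and therefore the enumerator) ever sees it.

```ruby
# Hypothetical variant of ProcIO: hands the block a private copy of each
# chunk, so later mutation of the writer's buffer cannot affect it.
class CopyingProcIO
  def initialize(&blk)
    @blk = blk
  end

  def write(data)
    @blk.call(data.dup)
    data.bytesize
  end
end

received = []
io = CopyingProcIO.new { |chunk| received << chunk }

# Simulate a writer that reuses one buffer string across calls.
buffer = "first".dup
io.write(buffer)
buffer.replace("second")
io.write(buffer)

p received
```

If the mismatched pairs disappear when this class replaces ProcIO in the script above, that would point at buffer reuse rather than the File buffer. If they persist, the assumption is wrong and the cause lies elsewhere.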
