Skip to content

use same parameter for DNS.allocate_request_id/free_request_id to fix leak#5722

Merged
headius merged 1 commit intojruby:masterfrom
colinsurprenant:resolv_leak
Oct 28, 2019
Merged

use same parameter for DNS.allocate_request_id/free_request_id to fix leak#5722
headius merged 1 commit intojruby:masterfrom
colinsurprenant:resolv_leak

Conversation

@colinsurprenant
Copy link
Contributor

As described in logstash-plugins/logstash-filter-dns#52

resolv.rb was updated from upstream Ruby stdlib starting at JRuby 9.2.0.0 which essentially re-introduced issue logstash-plugins/logstash-filter-dns#40 where all DNS request will eventually systematically time out.

This problem will only be triggered if there are multiple nameserver hosts specified OR if there is no nameserver host but there are more than one host in /etc/resolv.conf resulting in the usage of the UnconnectedUDP class in resolv.rb.

Essentially the problem is that the DNS.allocate_request_id(host, port) call will cache the allocated id using the host and port parameters.

But the cache cleanup will be done in the close method by the DNS.free_request_id(service[0], service[1], id) call where the service[0] will be IPAddr.new(host) and not the original host string used in the sender method leading to not finding the original cache key and not freeing it from the cache which is 64k elements wide and once full, the allocate_request_id method will loop forever.

This fix just makes sure that the exact same parameters are used in both DNS.allocate_request_id and DNS.free_request_id using the IPAddr#to_s which will always yield the same host string.

I am planning on submitting this issue upstream in Ruby but I figured I'd also submit here to provide a potential quicker turnaround for this nasty bug.

@kares kares added this to the JRuby 9.2.8.0 milestone Apr 30, 2019
@colinsurprenant
Copy link
Contributor Author

colinsurprenant commented Apr 30, 2019

I just checked in upstream Ruby and it actually does not have this problem.
This bug in the JRuby version of resolv.rb was actually introduced by #4496

@headius
Copy link
Member

headius commented Apr 30, 2019

@colinsurprenant Ahh thanks for the footwork on this. We need to sort out what's the correct logic and do our best to sync it up between JRuby and CRuby.

@headius
Copy link
Member

headius commented Apr 30, 2019

@colinsurprenant Oh, so...if we pull CRuby's actual resolv.rb unmodified are we reintroducing that other issue?

@colinsurprenant
Copy link
Contributor Author

@headius yeah if you pull the latest CRuby resolv.rb it will revert #4496. I haven't really paid attention to what exactly #4496 does/fixes, I can certainly verify that and see if that fix is still needed and/or relevant in CRuby and to maybe submit it upstream?

@headius
Copy link
Member

headius commented Apr 30, 2019

@colinsurprenant Ah yes, @enebo refreshed my memory. So I believe #4496 was because our IPAddr and related structures were not quite matching CRuby...just another part of the complicated socket subsystem that we're not emulating exactly right. But things have changed in the meantime and a whole bunch of socket compat fixes got merged in within the last two years...we might be ok?

@colinsurprenant
Copy link
Contributor Author

@headius sounds good, I can certainly take a look!

@headius
Copy link
Member

headius commented Aug 2, 2019

@colinsurprenant Any status update on this? I don't have a problem merging this to fix JRuby but it would obviously be nicer if we could move toward MRI's version rather than further away.

@headius headius modified the milestones: JRuby 9.2.8.0, JRuby 9.2.9.0 Aug 12, 2019
@colinsurprenant
Copy link
Contributor Author

Dropped the ball on this :(
Let's try for 9.2.9.0.

@headius
Copy link
Member

headius commented Oct 27, 2019

@colinsurprenant Ping again. 😁

I'm still unclear on whether we should merge this or update to a different version of MRI's resolv.rb (or whether we might already have the best version due to the stdlib 2.7 update going into 9.2.9.)

@headius
Copy link
Member

headius commented Oct 28, 2019

Updating some details on my own...

The patch that @colinsurprenant proposes here does make our resolv.rb diverge further from the CRuby version. However the bug is caused by a separate divergence, and this just fixes that patch to work better.

If I'm understanding correctly, the change introduced in #4496 causes this to break because the service variable accessed in the free_request_id call is the version from IPAddr.new, which does not properly match the direct host version used by the call to allocate_request_id.

So I think the patch is is fine; the fix in #4496 was incomplete in that it only used the processed host along one path.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants