Never-ending getaddress() call when using compressed IPv6 nameservers in /etc/resolv.conf #3663

@nbarrientos

Hi,

I managed to reproduce the following bug on a dual-stack CentOS 7.2 box with only IPv6 nameservers, running JRuby 1.7.20.1 and java-1.7.0-openjdk-1.7.0.95-2.6.4.0.el7_2. The same code works fine on MRI 2.0.0. This may already be fixed in newer versions, since the bug looks too obvious to still be alive. Anyway, here we go:

# cat /etc/resolv.conf
# generated by /usr/sbin/dhclient-script
search cern.ch.
nameserver 2001:1458:201:1000::5
nameserver 2001:1458:201:1100::5

The following program, combined with the resolv.conf above, makes the interpreter hang for a long time and then fail:

require 'resolv'
puts Resolv.getaddress 'web.cern.ch'
# time java -cp /usr/share/puppetserver/puppet-server-release.jar clojure.main -m puppetlabs.puppetserver.cli.ruby --config /etc/puppetserver/conf.d -- -e "require 'resolv'; puts Resolv.getaddress 'web.cern.ch'"
Resolv::ResolvError: no address for web.cern.ch
  getaddress at /usr/share/puppetserver/puppet-server-release.jar!/META-INF/jruby.home/lib/ruby/1.9/resolv.rb:98
  getaddress at /usr/share/puppetserver/puppet-server-release.jar!/META-INF/jruby.home/lib/ruby/1.9/resolv.rb:48
      (root) at -e:1
      invoke at jruby_puppet_core.clj:232
      invoke at jruby_puppet_core.clj:226
      invoke at subcommand.clj:38
    doInvoke at ruby.clj:7
      invoke at core.clj:624
      invoke at main.clj:315
    doInvoke at main.clj:420

real    2m51.910s
user    0m21.020s
sys     0m0.654s

These are the last lines of a debugging session before the first exception is raised. The program blocks in the IO.select call until it times out and ResolvTimeout is raised.

#0:/usr/share/puppetserver/puppet-server-release.jar!/META-INF/jruby.home/lib/ruby/1.9/resolv.rb:676:Resolv::DNS::Requester:-:           if s = @senders[[from,msg.id]]
#0:/usr/share/puppetserver/puppet-server-release.jar!/META-INF/jruby.home/lib/ruby/1.9/resolv.rb:654:Resolv::DNS::Requester:-:           now = Time.now
#0:/usr/share/puppetserver/puppet-server-release.jar!/META-INF/jruby.home/lib/ruby/1.9/resolv.rb:655:Resolv::DNS::Requester:-:           timeout = timelimit - now
#0:/usr/share/puppetserver/puppet-server-release.jar!/META-INF/jruby.home/lib/ruby/1.9/resolv.rb:656:Resolv::DNS::Requester:-:           if timeout <= 0
#0:/usr/share/puppetserver/puppet-server-release.jar!/META-INF/jruby.home/lib/ruby/1.9/resolv.rb:659:Resolv::DNS::Requester:-:           select_result = IO.select(@socks, nil, nil, timeout)
#0:/usr/share/puppetserver/puppet-server-release.jar!/META-INF/jruby.home/lib/ruby/1.9/resolv.rb:660:Resolv::DNS::Requester:-:           if !select_result
#0:/usr/share/puppetserver/puppet-server-release.jar!/META-INF/jruby.home/lib/ruby/1.9/resolv.rb:661:Resolv::DNS::Requester:-:             raise ResolvTimeout
#0:/usr/share/puppetserver/puppet-server-release.jar!/META-INF/jruby.home/lib/ruby/1.9/resolv.rb:661:Resolv::DNS::Requester:^:             raise ResolvTimeout
/usr/share/puppetserver/puppet-server-release.jar!/META-INF/jruby.home/lib/ruby/1.9/resolv.rb:661: `Resolv::ResolvTimeout' (Resolv::ResolvTimeout)

I think this happens because, during the previous iteration, the condition at resolv.rb:676 evaluated to false, as

@senders[[from,msg.id]]

was nil. Why was it nil? Because the lookup keys didn't match: 'from' contains the uncompressed form of the DNS server's IPv6 address, whereas '@senders' is keyed by the compressed form (presumably taken from /etc/resolv.conf):

(rdb:1) p from
["2001:1458:201:1000:0:0:0:5", 53]
(rdb:1) p msg.id
12941
(rdb:1) p @senders
{[["2001:1458:201:1000::5", 53], 12941]=>#<Resolv::DNS::Requester::UnconnectedUDP::Sender:0x4605f6fa @host="2001:1458:201:1000::5", @msg="2\x8D\x01\x00\x00\x01\x00\x00\x00\x00\x00\x00\x03web\x04cern\x02ch\x00\x00\x01\x00\x01", @data=#<Resolv::DNS::Name: web.cern.ch.>, @sock=#<UDPSocket:fd 28>, @port=53>}
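The key mismatch shown in the debugger output can be reproduced in isolation with a plain Hash. The following is only a minimal sketch: `canonical_host` is a hypothetical helper, not something resolv.rb provides, and it assumes that normalizing both sides through the standard `ipaddr` library would be an acceptable way to make the keys comparable.

```ruby
require 'ipaddr'

# Hypothetical helper (not part of resolv.rb): map an address string to one
# canonical textual form so that equal addresses produce equal hash keys.
def canonical_host(host)
  IPAddr.new(host).to_s
rescue ArgumentError
  host # not an IP literal, leave it untouched
end

compressed = "2001:1458:201:1000::5"       # as written in /etc/resolv.conf
expanded   = "2001:1458:201:1000:0:0:0:5"  # as reported in 'from'

# Keys built from raw strings: the lookup misses, like @senders[[from, msg.id]].
senders = { [[compressed, 53], 12941] => :sender }
raw_lookup = senders[[[expanded, 53], 12941]]

# Keys built from normalized strings: both spellings hit the same entry.
normalized = { [[canonical_host(compressed), 53], 12941] => :sender }
normalized_lookup = normalized[[[canonical_host(expanded), 53], 12941]]

puts raw_lookup.inspect
puts normalized_lookup.inspect
```

The two strings name the same address, but Hash lookup compares them as text, so only the normalized variant finds the sender.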

As a result, the outer loop does not stop (see "unexpected DNS message ignored"), so the program calls IO.select again, but there is nothing left to read and ResolvTimeout is raised. This exception is probably caught by the caller further up, and at some point .request is called again, leading to another loop. This repeats many times, and after a few minutes the call returns a ResolvError all the way back up to the user.
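The control flow described above can be modelled without any sockets. This is a simplified sketch, not the actual resolv.rb code: `wait_for_reply` and `FakeTimeout` are hypothetical names, and an in-memory array stands in for the UDP socket and IO.select.

```ruby
# Simplified model of the receive loop; not the actual resolv.rb code.
class FakeTimeout < StandardError; end # stands in for Resolv::ResolvTimeout

# senders:  hash keyed by [[host, port], msg_id]
# incoming: array of [from, msg_id, payload] tuples, a stand-in for the socket
def wait_for_reply(senders, incoming)
  loop do
    # Stand-in for IO.select returning nil: nothing left to read.
    raise FakeTimeout if incoming.empty?

    from, msg_id, payload = incoming.shift
    return payload if senders[[from, msg_id]]
    # No matching sender: the message is ignored and the loop continues.
  end
end

senders = { [["2001:1458:201:1000::5", 53], 12941] => :sender }
reply   = [["2001:1458:201:1000:0:0:0:5", 53], 12941, "answer"]

outcome =
  begin
    wait_for_reply(senders, [reply])
  rescue FakeTimeout
    :timed_out
  end

puts outcome.inspect
```

In this model the only reply is discarded because its key does not match, so the next pass finds nothing to read and the wait ends in a timeout even though the server did answer.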

Handcrafting resolv.conf so that all addresses are expanded makes resolv.rb happy:

# generated by /usr/sbin/dhclient-script
search cern.ch.
nameserver 2001:1458:201:1000:0:0:0:5
nameserver 2001:1458:201:1100:0:0:0:5

This way the program quickly exits without hanging:

# time java -cp /usr/share/puppetserver/puppet-server-release.jar clojure.main -m puppetlabs.puppetserver.cli.ruby --config /etc/puppetserver/conf.d -- -e "require 'resolv'; puts Resolv.getaddress 'web.cern.ch'"
188.184.9.235

real    0m10.945s
user    0m19.857s
sys     0m0.553s

Configuring a Resolv::DNS object by hand with a compressed IPv6 address seems to work, whereas a Resolv::DNS object created with the default configuration still hangs:

# cat /etc/resolv.conf
# generated by /usr/sbin/dhclient-script
search cern.ch.
nameserver 2001:1458:201:1000::5
nameserver 2001:1458:201:1100::5
....
irb(main):008:0* require 'resolv'
=> true
irb(main):009:0> Resolv::DNS.new(:nameserver => '2001:1458:201:1000::5').getaddress "web.cern.ch"
=> #<Resolv::IPv4 188.184.9.235>
irb(main):010:0> Resolv::DNS.new().getaddress "web.cern.ch"
# Hangs...
