Optimize SipHash using sun.misc.Unsafe#1681

Merged
headius merged 3 commits into jruby:jruby-1_7 from grddev:unsafe-siphash-opt on Nov 2, 2014

@grddev grddev commented May 5, 2014

As SipHash is designed to consume its input 8 bytes at a time, a key part of the algorithm is reading eight bytes at once. Unfortunately, there is no way to directly read eight bytes at once from a byte array without resorting to Unsafe. This implements the unsafe operation with a fallback to the original slow implementation for cases where Unsafe cannot be loaded (or where the native byte ordering is not little endian, as needed by SipHash).
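To make the idea concrete, here is a minimal sketch of a reader that prefers `sun.misc.Unsafe` and falls back to byte-by-byte assembly. It is not the code from this patch: the names `LongReaderSketch`, `LongReader`, `SafeReader`, `UnsafeReader` and `create` are hypothetical, and obtaining the instance through the `theUnsafe` field is an assumption about how one would typically do it.

```java
import java.lang.reflect.Field;
import java.nio.ByteOrder;

public class LongReaderSketch {
    /** Reads eight bytes starting at offset as one little-endian long. */
    interface LongReader {
        long getLong(byte[] src, int offset);
    }

    /** Portable fallback: eight byte loads, each widened to long before shifting. */
    static final class SafeReader implements LongReader {
        public long getLong(byte[] src, int offset) {
            return (src[offset] & 0xFFL)
                 | (src[offset + 1] & 0xFFL) << 8
                 | (src[offset + 2] & 0xFFL) << 16
                 | (src[offset + 3] & 0xFFL) << 24
                 | (src[offset + 4] & 0xFFL) << 32
                 | (src[offset + 5] & 0xFFL) << 40
                 | (src[offset + 6] & 0xFFL) << 48
                 | (src[offset + 7] & 0xFFL) << 56;
        }
    }

    /** Single native-order load; performs no bounds check, so callers must. */
    static final class UnsafeReader implements LongReader {
        private final sun.misc.Unsafe unsafe;
        private final long base;

        UnsafeReader(sun.misc.Unsafe unsafe) {
            this.unsafe = unsafe;
            this.base = unsafe.arrayBaseOffset(byte[].class);
        }

        public long getLong(byte[] src, int offset) {
            return unsafe.getLong(src, base + offset);
        }
    }

    /** Picks the unsafe reader only on little-endian hosts where Unsafe is usable. */
    static LongReader create() {
        if (ByteOrder.nativeOrder() != ByteOrder.LITTLE_ENDIAN) {
            return new SafeReader();
        }
        try {
            Field field = sun.misc.Unsafe.class.getDeclaredField("theUnsafe");
            field.setAccessible(true);
            LongReader reader = new UnsafeReader((sun.misc.Unsafe) field.get(null));
            reader.getLong(new byte[8], 0); // probe once so failures surface here
            return reader;
        } catch (Throwable t) {
            return new SafeReader(); // Unsafe missing, blocked, or unsupported
        }
    }
}
```

Choosing the reader once, up front, keeps the endianness and availability checks out of the hashing loop.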

The following benchmark, which hashes strings of increasing length, shows that the unsafe implementation is roughly 25% faster for really long strings, and about the same speed for short strings.

require 'benchmark'

N = 30_000_000

puts RUBY_DESCRIPTION
Benchmark.bmbm do |x|
  [0,1,8,16,32,64,1024,65536].each do |len|
    string = 'x'*len
    n = N/(0.25*len+10) + N/20_000
    x.report(len.to_s) { n.to_i.times { string.hash } }
  end
end

jruby 1.7.12 (1.9.3p392) 2014-04-15 643e292 on Java HotSpot(TM) 64-Bit Server VM 1.8.0_05-b13 [darwin-x86_64]

            user     system      total        real
0       0.650000   0.000000   0.650000 (  0.653000)
1       0.630000   0.000000   0.630000 (  0.626000)
8       0.600000   0.010000   0.610000 (  0.599000)
16      0.590000   0.000000   0.590000 (  0.584000)
32      0.570000   0.000000   0.570000 (  0.573000)
64      0.550000   0.000000   0.550000 (  0.541000)
1024    0.470000   0.000000   0.470000 (  0.471000)
65536   0.780000   0.000000   0.780000 (  0.783000)

jruby 1.7.13-SNAPSHOT (1.9.3p392) 2014-05-01 c02fcd7 on Java HotSpot(TM) 64-Bit Server VM 1.8.0_05-b13 [darwin-x86_64]

            user     system      total        real
0       0.650000   0.000000   0.650000 (  0.643000)
1       0.660000   0.000000   0.660000 (  0.654000)
8       0.610000   0.010000   0.620000 (  0.608000)
16      0.570000   0.000000   0.570000 (  0.559000)
32      0.530000   0.000000   0.530000 (  0.527000)
64      0.440000   0.000000   0.440000 (  0.442000)
1024    0.350000   0.000000   0.350000 (  0.342000)
65536   0.580000   0.000000   0.580000 (  0.583000)

Running the same benchmark against PerlHash, we can see that the unsafe implementation of SipHash is about the same speed for strings around length 64, and for longer strings the unsafe implementation of SipHash is significantly faster than PerlHash:

            user     system      total        real
0       0.460000   0.000000   0.460000 (  0.452000)
1       0.420000   0.000000   0.420000 (  0.416000)
8       0.440000   0.000000   0.440000 (  0.444000)
16      0.430000   0.010000   0.440000 (  0.422000)
32      0.410000   0.000000   0.410000 (  0.414000)
64      0.470000   0.000000   0.470000 (  0.462000)
1024    0.420000   0.000000   0.420000 (  0.425000)
65536   0.750000   0.000000   0.750000 (  0.750000)

Even though this implementation of SipHash is faster than the previous one, I don't think it is fast enough to replace PerlHash entirely, as the majority of Hash keys are most probably short strings. Keep in mind, though, that PerlHash's roughly 20% advantage on short strings amounts to less than 100ns per invocation, whereas SipHash's roughly 20% advantage on long strings amounts to a much longer time per invocation.
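The per-invocation figures can be recovered from the tables, because the benchmark's iteration count depends on the string length. A small sketch of that arithmetic follows; the class and method names are hypothetical, and the timings passed in `main` are the `real` columns copied from the unsafe SipHash and PerlHash tables above.

```java
public class BenchmarkMathSketch {
    static final int N = 30_000_000;

    /** Mirrors the benchmark's n = N/(0.25*len+10) + N/20_000, truncated like to_i. */
    static long iterations(int len) {
        return (long) (N / (0.25 * len + 10) + N / 20_000);
    }

    /** Average wall-clock nanoseconds per #hash call for one reported row. */
    static double nanosPerCall(double realSeconds, int len) {
        return realSeconds * 1e9 / iterations(len);
    }

    public static void main(String[] args) {
        // Empty string: unsafe SipHash (0.643 s) versus PerlHash (0.452 s).
        System.out.printf("len 0: %.0f ns vs %.0f ns per call%n",
                nanosPerCall(0.643, 0), nanosPerCall(0.452, 0));
        // 65536 characters: unsafe SipHash (0.583 s) versus PerlHash (0.750 s).
        System.out.printf("len 65536: %.0f ns vs %.0f ns per call%n",
                nanosPerCall(0.583, 65536), nanosPerCall(0.750, 65536));
    }
}
```

Because the short-string rows run millions of iterations and the longest row only a few thousand, similar totals in the tables correspond to very different per-call costs.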

grddev added 3 commits May 5, 2014 09:00

As these loops will be executed only once for every #hash invocation,
it would make sense to defer the decision to unroll the loops to the
runtime.

In principle, this should allow the JIT compiler to remove all range
checks within the loop. I haven't had time to verify this though.

While one could wish that JIT compilation optimised the eight sequential
byte reads into a single long read, it in fact does not.

This implementation should fall back to the slow implementation in a
context where Unsafe fails to load, but I haven't figured out how to
test that properly.
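For readers unfamiliar with the algorithm these commit notes refer to, below is a minimal SipHash-2-4 sketch with the rounds written as plain counted loops, leaving any unrolling to the JIT. It is an illustration only, not the JRuby source: `SipHash24Sketch`, `rounds`, `word` and `hash` are hypothetical names, and the word read uses the portable byte-by-byte form.

```java
public class SipHash24Sketch {
    private long v0, v1, v2, v3;

    /** Runs the SipRound permutation count times as an ordinary loop. */
    private void rounds(int count) {
        for (int i = 0; i < count; i++) {
            v0 += v1; v1 = Long.rotateLeft(v1, 13); v1 ^= v0; v0 = Long.rotateLeft(v0, 32);
            v2 += v3; v3 = Long.rotateLeft(v3, 16); v3 ^= v2;
            v0 += v3; v3 = Long.rotateLeft(v3, 21); v3 ^= v0;
            v2 += v1; v1 = Long.rotateLeft(v1, 17); v1 ^= v2; v2 = Long.rotateLeft(v2, 32);
        }
    }

    /** Portable little-endian read of eight bytes. */
    private static long word(byte[] src, int offset) {
        long result = 0;
        for (int i = 0; i < 8; i++) {
            result |= (src[offset + i] & 0xFFL) << (8 * i);
        }
        return result;
    }

    /** SipHash-2-4 of data under the 128-bit key (k0, k1). */
    static long hash(long k0, long k1, byte[] data) {
        SipHash24Sketch s = new SipHash24Sketch();
        s.v0 = k0 ^ 0x736f6d6570736575L;
        s.v1 = k1 ^ 0x646f72616e646f6dL;
        s.v2 = k0 ^ 0x6c7967656e657261L;
        s.v3 = k1 ^ 0x7465646279746573L;

        // Compression: one 8-byte word per iteration, two rounds each.
        int end = data.length & ~7;
        for (int i = 0; i < end; i += 8) {
            long m = word(data, i);
            s.v3 ^= m;
            s.rounds(2);
            s.v0 ^= m;
        }

        // Final word: leftover bytes plus the input length in the top byte.
        long last = (long) data.length << 56;
        for (int i = end; i < data.length; i++) {
            last |= (data[i] & 0xFFL) << (8 * (i - end));
        }
        s.v3 ^= last;
        s.rounds(2);
        s.v0 ^= last;

        // Finalization: four rounds, executed once per hash.
        s.v2 ^= 0xFF;
        s.rounds(4);
        return s.v0 ^ s.v1 ^ s.v2 ^ s.v3;
    }
}
```

The compression loop is where an eight-byte read happens per iteration, which is why the cost of that read dominates for long inputs.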

You can just assign a byte value to a variable of type long:

        byte byteValue = 127;
        long longValue = byteValue;

Contributor Author

@dkarpenko, while I did not write this code (you can see the same code in the original), I think the reason for the long type conversion is that the shifts should be applied to longs, as they would otherwise produce ints if I'm not mistaken.

Member

@grddev I think you're right about that. Bit twiddling can be a little tweaky with Java's automatic type promotions.
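The promotion issue discussed in this thread can be shown with a short sketch (the class and method names are hypothetical). In Java, a `byte` operand of `<<` is promoted to `int`, and an `int` shift uses only the low five bits of the shift distance, so widening to `long` has to happen before the shift. Masking with `0xFFL` also drops the sign extension that a plain `byte`-to-`long` assignment would carry.

```java
public class ShiftPromotionSketch {
    /** The byte is promoted to int, so the shift distance 40 is reduced to 40 & 31 = 8. */
    static long shiftAsInt(byte b) {
        return b << 40;
    }

    /** Widening and masking first gives a 64-bit shift of the unsigned byte value. */
    static long shiftAsLong(byte b) {
        return (b & 0xFFL) << 40;
    }
}
```

With these definitions, `shiftAsInt((byte) 1)` yields 256 rather than `1L << 40`, which is the kind of silent error the explicit conversion avoids.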

headius commented Jun 8, 2014

Hey, great find. If I remember right, I did a quick experiment with Unsafe when we put SipHash into the codebase but never actually landed it. Your patches look good; we'll review and get it installed. I'm not sure we'll want to make SipHash the default, since as you say PerlHash is still faster for small strings (and I wouldn't expect people to be hashing really big strings), but it will be excellent to get SipHash closer to raw native performance. Thank you!

nahi commented Jun 8, 2014

Here's my attempt from 2012. I added some uncommitted sources (Murmur, Perl, etc.) so that Benchmark.java should run.
https://github.com/nahi/siphash-java-inline/tree/master/perf

I don't remember the details, but the Unsafe version was slower. Mine calls Long.reverseBytes(), so the optimization may not have been enough.
https://github.com/nahi/siphash-java-inline/blob/master/perf/SipHashInlineTry.java#L18-22

grddev commented Jun 9, 2014

@nahi, I tried the benchmark with an added Long.reverseBytes() in the reader, and here are the results:

jruby 1.7.13-SNAPSHOT (1.9.3p392) 2014-05-01 c02fcd7 on Java HotSpot(TM) 64-Bit Server VM 1.8.0_05-b13 [darwin-x86_64]

            user     system      total        real
0       0.680000   0.000000   0.680000 (  0.673000)
1       0.690000   0.000000   0.690000 (  0.687000)
8       0.660000   0.000000   0.660000 (  0.651000)
16      0.620000   0.010000   0.630000 (  0.625000)
32      0.590000   0.000000   0.590000 (  0.582000)
1024    0.470000   0.000000   0.470000 (  0.478000)
65536   0.810000   0.000000   0.810000 (  0.804000)

This is indeed not faster than the 'safe' version, so the unsafe version is only worthwhile when the native byte order already matches the little-endian order SipHash needs.
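For context on what the extra `Long.reverseBytes()` does, here is a small sketch under stated assumptions: a big-endian `ByteBuffer` read stands in for a native read on a big-endian host, and the class and method names are hypothetical. It is not how the benchmark above was modified.

```java
import java.nio.ByteBuffer;
import java.nio.ByteOrder;

public class ByteSwapSketch {
    /** Big-endian read followed by a byte swap, as a mismatched host would need. */
    static long littleEndianViaSwap(byte[] src, int offset) {
        long bigEndian = ByteBuffer.wrap(src).order(ByteOrder.BIG_ENDIAN).getLong(offset);
        return Long.reverseBytes(bigEndian);
    }

    /** Direct little-endian read, for comparison. */
    static long littleEndianDirect(byte[] src, int offset) {
        return ByteBuffer.wrap(src).order(ByteOrder.LITTLE_ENDIAN).getLong(offset);
    }
}
```

Both methods return the same value; the swap is purely additional work per word, which is consistent with the measurements above showing no gain over the safe reader.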

headius commented Nov 2, 2014

Going for it. Thanks!

headius added a commit that referenced this pull request Nov 2, 2014
Optimize SipHash using sun.misc.Unsafe
@headius headius merged commit cb4581c into jruby:jruby-1_7 Nov 2, 2014
@headius headius added this to the JRuby 1.7.17 milestone Nov 2, 2014
@headius headius self-assigned this Nov 2, 2014