In some cases sorting using String.casecmp results in Comparison method violates its general contract! #7946

@mrckzgl

This is a very strange issue that only happens in JRuby; it does not happen in regular Ruby.
Consider this minimal example:

We have some JSON files with text values that we want to sort:

require "json/ext"

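# `files` is the list of paths to the JSON files (not shown here)
values = files.map { |f|
  plain_text = File.read(f)
  json = JSON.parse(plain_text, :symbolize_names => true)
  json[:text][0..0] # keep only the first character, as a one-character substring
}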

values_copied = values.join("\n").split("\n")

In the real application we sort the whole text, but for this example single characters are enough.
The list contains 262 single-character values, many of them duplicates (I am not sure whether this matters). The characters are Latin, Arabic, and '['.

The problem is that this list cannot be sorted using casecmp:

values.sort!{|a,b|
  a.casecmp(b)
}

This results in:

Unhandled Java exception: java.lang.IllegalArgumentException: Comparison method violates its general contract!
java.lang.IllegalArgumentException: Comparison method violates its general contract!
              mergeHi at java/util/TimSort.java:903
              mergeAt at java/util/TimSort.java:520
        mergeCollapse at java/util/TimSort.java:448
                 sort at java/util/TimSort.java:245
                 sort at java/util/Arrays.java:1306
         sortInternal at org/jruby/RubyArray.java:4138
            sort_bang at org/jruby/RubyArray.java:3988
                 call at org/jruby/RubyArray$INVOKER$i$0$0$sort_bang.gen:-1
[...]
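The exception comes from TimSort detecting that the comparator is inconsistent. A minimal sketch of how one might check this directly, by brute force, is shown below. The helper names `sign` and `find_contract_violation` are hypothetical and not part of Ruby or JRuby; the check is cubic in the list size, so it is only practical for short lists such as the 262 values here.

```ruby
# Hypothetical helper: reduce a comparison result to -1, 0 or 1.
def sign(n)
  n <=> 0
end

# Hypothetical helper: returns a description of the first violation of the
# comparator contract found in `items`, or nil if none is found.
# The block is the comparator, used the same way as in Array#sort!.
def find_contract_violation(items, &cmp)
  # Antisymmetry: sign(cmp(a, b)) must equal -sign(cmp(b, a)).
  items.each do |a|
    items.each do |b|
      ab = cmp.call(a, b)
      ba = cmp.call(b, a)
      return [:nil_result, a, b] if ab.nil? || ba.nil?
      return [:antisymmetry, a, b] unless sign(ab) == -sign(ba)
    end
  end

  # Transitivity: a <= b and b <= c must imply a <= c.
  items.each do |a|
    items.each do |b|
      next unless cmp.call(a, b) <= 0
      items.each do |c|
        next unless cmp.call(b, c) <= 0
        return [:transitivity, a, b, c] if cmp.call(a, c) > 0
      end
    end
  end

  nil
end

# Usage with the list from this report (assumes `values` is already loaded):
#   p find_contract_violation(values) { |a, b| a.casecmp(b) }
```

If this returns a pair or triple on JRuby but nil on regular Ruby, those few strings would make a much smaller reproduction than the full list.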

Now for the strange parts.

  1. If we sort values_copied instead of values, sorting works without a problem.
  2. If we extract the first character via json[:text][0] instead of json[:text][0..0], it also works.
  3. Saving the terminal output of puts(values) to a file (like the attached one), loading that list via split, and sorting it also works.
  4. Case-sensitive comparison using <=> also works without a problem.

So I thought it might have something to do with the internal byte representation of the values.
However, when I compare the bytes of the elements of values with those of values_copied via unpack("C*"), they are identical.
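Identical bytes do not rule out differences in string metadata, since unpack("C*") does not show a string's encoding. The following is a rough sketch of a slightly wider comparison; `describe` and `first_difference` are hypothetical helper names, and whether the encoding actually differs in this case is only a guess.

```ruby
# Hypothetical helper: collect the bytes plus some encoding-related
# properties of a string.
def describe(str)
  {
    bytes: str.unpack("C*"),
    encoding: str.encoding,
    valid: str.valid_encoding?,
    ascii_only: str.ascii_only?
  }
end

# Hypothetical helper: returns [index, description_a, description_b] for the
# first position where the two lists differ in any described property,
# [:length, size_a, size_b] if the lists differ in length, or nil otherwise.
def first_difference(list_a, list_b)
  return [:length, list_a.size, list_b.size] unless list_a.size == list_b.size

  list_a.zip(list_b).each_with_index do |(a, b), i|
    da = describe(a)
    db = describe(b)
    return [i, da, db] unless da == db
  end
  nil
end

# Usage with the lists from this report (assumes both are already built):
#   p first_difference(values, values_copied)
```

If this also reports no difference, the cause is probably not visible from Ruby code at all and lies somewhere in how the strings are represented internally.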

Does anybody have an idea what could cause the exception?

Environment Information

jruby 9.4.1.0 (3.1.0) 2023-02-07 237d5fa5f4 Java HotSpot(TM) 64-Bit Server VM 15.0.2+7-27 on 15.0.2+7-27 +jit [x86_64-linux]
Ubuntu 22.04 / 6.2.0-33-generic #33~22.04.1-Ubuntu SMP PREEMPT_DYNAMIC Thu Sep 7 10:33:52 UTC 2 x86_64 x86_64 x86_64 GNU/Linux
