Minborg

Minborg
Minborg
Showing posts with label Java. Show all posts
Showing posts with label Java. Show all posts

Thursday, September 14, 2023

Java Records are "Trusted" and Consequently Faster

 

Java Records are "Trusted" and Consequently Faster

Did you know Java records are trusted by the Hotspot VM in a special way? This makes their speed superior in some aspects compared to regular Java classes. In this short article, we will take a look at constant folding of instance fields and how this can bolster the performance of your Java application.

Background

Suppose we want to model an immutable point:

public interface Point {
    int x();
    int y();
}

Before record classes were introduced in Java, data classes had to be "manually" coded using a regular Java class like this:

public final class RegularPoint implements Point {

    private final int x;
    private final int y;

    public RegularPoint(int x, int y) {
        this.x = x;
        this.y = y;
    }

    @Override
    public int x() {
        return x;
    }

    @Override
    public int y() {
        return y;
    }

    // Implementations of toString(), hashCode() and equals()
    // omitted for brevity

}

With records, it became much easier and, we also got reasonable default implementations of the methods toString()hashCode() and equals():

public record RecordPoint(int x, int y) implements Point {}

As an extra bonus, there is an emerging property for records that makes them eligible for constant folding optimizations if used in a static context. Read more about this in the following chapters.

Setup

Suppose we keep track of the unique origin point in a static variable like this:

public static final Point ORIGIN = new RecordPoint(0, 0);

Further, assume we have a method that determines if a given Point is at the origin point:

public static boolean isOrigin(Point point) {
        return point.x() == ORIGIN.x() &&
               point.y() == ORIGIN.y();
}

We could then write a small program that demonstrates the principles:

public class Demo {

    public static final Point ORIGIN = new RecordPoint(0, 0);

    public static void main(String[] args) {
        analyze(new RegularPoint(0, 0));
        analyze(new RegularPoint(1, 1));
    }

    public static void analyze(Point point) {
        System.out.format("The point %s is %s at the origin.%n",
                point, isOrigin(point) ? "" : "not");
    }

    public static boolean isOrigin(Point point) {
        return point.x() == ORIGIN.x() &&
               point.y() == ORIGIN.y();
    }

}

When run, the code above will produce the following output:

The point RegularPoint{x=0, y=0} is at the origin.
The point RegularPoint{x=1, y=1} is not at the origin.

We could easily replace the use of new RegularPoint(…​) with new RecordPoint(…​) in the code above, and we would get a similar output:

The point RecordPoint[x=0, y=0] is at the origin.
The point RecordPoint[x=1, y=1] is not at the origin.

It appears the two implementation variants of the interface Point work as expected. But how is the performance of code affected by switching from regular Java classes to records?

Benchmarks

Here is a benchmark that can be used to measure the effects of using records over regular classes:

@BenchmarkMode(Mode.AverageTime)
@OutputTimeUnit(TimeUnit.NANOSECONDS)
@State(Scope.Benchmark)
@Warmup(iterations = 5, time = 1)
@Measurement(iterations = 5, time = 1)
@Fork(value=3)
public class Bench {

    private static final RegularPoint REGULAR_ORIGIN = new RegularPoint(0, 0);
    private static final RecordPoint RECORD_ORIGIN = new RecordPoint(0, 0);

    private List<RegularPoint> regularPoints;
    private List<RecordPoint> recordPoints;

    @Setup
    public void setup() {
        regularPoints = IntStream.range(0, 16)
                .mapToObj(i -> new RegularPoint(i, i))
                .toList();

        recordPoints = IntStream.range(0, 16)
                .mapToObj(i -> new RecordPoint(i, i))
                .toList();
    }

    @Benchmark
    public void regular(Blackhole bh) {
        for (RegularPoint point: regularPoints) {
            if (point.x() == REGULAR_ORIGIN.x() && point.y() == REGULAR_ORIGIN.y()) {
                bh.consume(1);
            } else {
                bh.consume(0);
            }
        }
    }

    @Benchmark
    public void record(Blackhole bh) {
        for (RecordPoint point: recordPoints) {
            if (point.x() == RECORD_ORIGIN.x() && point.y() == RECORD_ORIGIN.y()) {
                bh.consume(1);
            } else {
                bh.consume(0);
            }
        }
    }

    public static void main(String[] args) throws Exception {
        org.openjdk.jmh.Main.main(args);
    }

}

When run on a Mac M1 laptop, the following results emerged (lower is better):

Benchmark      Mode  Cnt   Score   Error  Units
Bench.regular  avgt   15  10.424 ± 0.257  ns/op
Bench.record   avgt   15   9.412 ± 0.181  ns/op

As can be seen, records are about 10% faster than regular classes in this benchmark.

Here is what it looks like in a graph:

Graph 1, shows the performance of regular and record classes.

Under the Hood

Looking at why records can be faster in cases like the above, there is a clue in the class ciField.cpp which tells the Hotspot compiler which instance fields should be "trusted" when performing constant folding. The class also gives away some clues about other Java classes that benefit from the same optimizations. One example is the Foreign Function & Memory API that is slated to be finalized in Java 22 and where, for example, all classes implementing the various MemoryLayout variants all are eligible for constant folding optimizations.

The C++ class above is not available to regular Java programs but, by switching to records, we may directly reap the benefits of constant folding for instance fields.

As a final note, it should be said that modifying records fields (which are private and final) using, for example, Unsafe is …​ well …​ unsafe and would produce an undefined result. Don’t do that!

UPDATE:
Unsafe and reflection provide special protection against tampering with records making it very difficult to update record fields via backdoors. For example, trying to obtain the field offset for a record component via 
Unsafe will result in an UnsupportedOperationExcption being thrown.

Conclusion

Records offer a convenient way of expressing data carriers. As an added benefit, they also provide improved performance compared to regular Java classes in some applications.

Wednesday, August 2, 2023

Java: New Draft JEP: "Computed Constants"

Java: JEP Draft: "Computed Constants"

We finally made the draft JEP "Computed Constants" public and I can’t wait to tell you more about it! ComputedConstant objects are superfast immutable value holders that can be initialized independently of when they are created. As an added benefit, these objects may in the future be even more optimized via "condensers" that eventually might become available through project Leyden.

Background

Oftentimes, we use static fields to hold objects that are only initialized once:

// ordinary static initialization
private static final Logger LOGGER = Logger.getLogger("com.foo.Bar");
...
LOGGER.log(...);

The LOGGER variable will be unconditionally initialized as soon as the class where it is declared is loaded (loading occurs upon the class being first referenced).

One way to prevent all static fields in a class from being initialized at the same time is to use the class holder idiom allowing us to defer initialization until we actually need the variable:

// Initialization-on-demand holder idiom
Logger logger() {
    class Holder {
         static final Logger LOGGER = Logger.getLogger("com.foo.Bar");
    }
    return Holder.LOGGER;
}
...
logger().log(...);

While this works well in theory, there are significant drawbacks: 

  • Each constant that needs to be decoupled would need its own holding class (adding static footprint overhead) 
  • Only works if the decoupled constants are independent 
  • Does only work for static variables and not for instance variables and objects

Another way is to use the double-checked locking idiom that can also be used for deferring initialization. This works for both static variables, instance variables and objects:

// Double-checked locking idiom
class Foo {

    private volatile Logger logger;

    public Logger logger() {
        Logger v = logger;
        if (v == null) {
            synchronized (this) {
                v = logger;
                if (v == null) {
                    logger = v = Logger.getLogger("com.foo.Bar");
                }
            }
        }
        return v;
    }
}
...
foo.logger().log(...);

There is no way for the (current) JVM to determine that the logger is monotonic in the sense that it can only change from null to a value once and then will always remain. So, the JVM is unable to apply constant folding and other optimizations. Also, because logger needs to be declared volatile there is a small performance penalty paid for each access.

The ComputedConstant class comes to the rescue here and offers the best of two worlds: Flexible initialization and good performance!

Computed Constant

Here is how ComputedConstant can be used with the logger example:

class Bar {
    // 1. Declare a computed constant value
    private static final ComputedConstant<Logger> LOGGER =
            ComputedConstant.of( () -> Logger.getLogger("com.foo.Bar") );

    static Logger logger() {
        // 2. Access the computed value
        //    (evaluation made before the first access)
        return LOGGER.get();
    }
}

This is similar in spirit to the class-holder idiom, and offers the same performance, constant-folding, and thread-safety characteristics, but is simpler and incurs a lower static footprint since no additional class is required.

Benchmarks

I’ve run some benchmarks on my Mac Pro M1 ARM-based machine and preliminary results indicates excellent performance for static ComputedConstant fields:

Benchmark      Mode  Cnt  Score   Error  Units
staticHolder   avgt   15  0.561 ? 0.002  ns/op
doubleChecked  avgt   15  1.122 ? 0.003  ns/op
constant       avgt   15  0.563 ? 0.002  ns/op // static ComputedConstant

As can be seen, a ComputedConstant has the same performance as the static holder (but with no extra class footprint) and much better performance than a double-checked locking variable.

Collections of ComputedConstant

So far so good. However, the hidden gem in the JEP is the ability to obtain Collections of ComputedConstant elements. This is achieved using a factory method that provides not a single ComputedConstant (with its provider) but a whole List of ComputedConstant elements that is handled by a single providing mapper that can initialize all the elements in the list. This allows a large number of variables to be handled via a single list, thereby saving space compared to having many single constants and initialization lambdas (for example).

Like a ComputedConstant<V> variable, a List<ComputedConstant<V>> variable is created by providing an element mapper - typically in the form of a lambda expression, which is used to compute the value associated with the i-th element of the List when the element value is first accessed:

class Fibonacci {
    static final List<ComputedConstant<Integer>> FIBONACCI =
            ComputedConstant.of(1_000, Fibonacci::fib);

    static int fib(int n) {
        return (n < 2)
                ? n
                : FIBONACCI.get(n - 1) + FIBONACCI.get(n - 2);
    }

    int[] fibs = IntStream.range(0, 10)
            .map(Fibonacci::fib)
            .toArray(); // { 0, 1, 1, 2, 3, 5, 8, 13, 21, 34 }

}

Note how there’s only one field of type List<ComputedConstant<Integer>> to initialize - every other computation is performed on-demand when the corresponding element of the List FIBONACCI is accessed.

When a computation depends on more sub-computations, it induces a dependency graph, where each computation is a node in the graph, and has zero or more edges to each of the sub-computation nodes it depends on. For instance, the dependency graph associated with fib(5) is given below:

               ___________fib(5)___________
              /                            \
        ____fib(4)____                ____fib(3)____
       /              \              /              \
     fib(3)         fib(2)         fib(2)          fib(1)
    /      \       /      \       /      \
  fib(2)  fib(1) fib(1)  fib(0) fib(1)  fib(0)

The Computed Constant API allows modeling this cleanly, while still preserving good constant-folding guarantees and integrity of updates in the case of multi-threaded access.

Benchmarks Collections

These benchmarks were run on the same platform as above and show collections of ComputedConstant elements enjoy the same performance benefits as the single ones do:

Benchmark      Mode  Cnt  Score   Error  Units
staticHolder   avgt   15  0.570 ? 0.005  ns/op // int[] in a holder class
doubleChecked  avgt   15  1.124 ? 0.044  ns/op
constant       avgt   15  0.562 ? 0.005  ns/op // List<ComputedConstant>

Again, the ComputedConstant clocks in at native static array speed while providing much better flexibility as to when initialized.

Instance Performance

The performance for instance variables and objects is superior to holders using the double-checked idiom showed above as can be seen in the benchmarks below:

Benchmark      Mode  Cnt  Score   Error  Units
doubleChecked  avgt   15  1.259 ? 0.023  ns/op
constant       avgt   15  0.728 ? 0.022  ns/op // ComputedConstant

So, ComputedConstant is more than 40% faster than the double-checked holder class tested on my machine.

Note: Instance performance is subject to review.

Where is it?

At the time of writing this article, ComputedConstant is not yet available in the mainline JDK repository. Check out the next section for a link to the proposed source code.

Acknowledgments

Parts of the text in this article were written by Maurizio Cimadamore