Commit fa46c19

Switch to DoubleAdder for simpleclient, it's more performant.
Add benchmarks for both of our Java clients, and a comparison with Codahale/Yammer metrics. This is based on internal benchmarks by Will Fleury.

Add caching of the Child when there are no labels. This is the most common use case, and makes it 4-5x faster than the naive approach.

Performance comparison:

Codahale (a popular Java instrumentation library) takes 12ns per increment of a counter uncontended, rising to 18ns with 4 threads.
The original client takes 126ns, rising to 245ns with 4 threads. Incrementing the child takes 13ns, rising to 382ns with 4 threads (odd - it should be <245ns).

With these changes the Simpleclient's performance goes from:
- Uncontended labels() increment: 70ns -> 54ns
- 2-thread labels() increment: 146ns -> 103ns
- 4-thread labels() increment: 509ns -> 130ns
- Uncontended Child increment: 29ns -> 13ns
- 2-thread Child increment: 102ns -> 16ns
- 4-thread Child increment: 215ns -> 19ns
- Uncontended nolabels convenience increment: 50ns -> 13ns
- 2-thread nolabels convenience increment: 172ns -> 17ns
- 4-thread nolabels convenience increment: 434ns -> 20ns

The new numbers are much faster, and comparable with Codahale. ConcurrentHashMap doesn't seem great with lots of threads; avoiding it by caching the Child is advised where you have labels and a high rate of concurrent updates.

I tested on a 2-core MacBook Pro with a 2.5GHz i5 processor, Oracle Java 64 1.7.0_51.
1 parent c650cad commit fa46c19

File tree

11 files changed

+985
-36
lines changed

benchmark/README.md

Lines changed: 150 additions & 0 deletions
@@ -0,0 +1,150 @@
1+
# Client Benchmarks
2+
3+
This module contains microbenchmarks for client instrumentation operations.
4+
5+
## Result Overview
6+
7+
The main outcomes of the benchmarks:
8+
* Simpleclient Counters/Gauges have similar performance to Codahale Counters.
9+
* The original client is much slower than the Simpleclient or Codahale, especially when used concurrently.
10+
* Codahale Meters are slower than Codahale/Simpleclient Counters.
11+
* Codahale and original client Summaries are 10x slower than other metrics.
12+
* Simpleclient `Gauge.Child.set` is relatively slow, especially when done concurrently.
13+
* Label lookups in both Prometheus clients are relatively slow.
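
To illustrate why label lookups cost more than incrementing a child directly, here is a minimal sketch. `LabelledCounterSketch` and its `Child` are hypothetical stand-ins written for this example, not the API of either Prometheus client; the only assumption carried over is that `labels()` resolves the child through a `ConcurrentHashMap`.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ConcurrentMap;
import java.util.concurrent.atomic.AtomicLong;

/** Hypothetical stand-in for a labelled counter; not the Prometheus client API. */
final class LabelledCounterSketch {
    /** One time series: the value for a single combination of label values. */
    static final class Child {
        private final AtomicLong value = new AtomicLong();

        void inc() { value.incrementAndGet(); }

        long get() { return value.get(); }
    }

    private final ConcurrentMap<List<String>, Child> children =
            new ConcurrentHashMap<List<String>, Child>();

    /** Looks up (or creates) the child for these label values; this is the costly step. */
    Child labels(String... labelValues) {
        // Copy the varargs array so later changes by the caller cannot alter the key.
        List<String> key = new ArrayList<String>(Arrays.asList(labelValues));
        Child child = children.get(key);
        if (child == null) {
            Child created = new Child();
            Child raced = children.putIfAbsent(key, created);
            child = (raced == null) ? created : raced;
        }
        return child;
    }

    public static void main(String[] args) {
        LabelledCounterSketch requests = new LabelledCounterSketch();

        // Per-call lookup: one list allocation, hash and map lookup per increment.
        requests.labels("GET", "200").inc();

        // Cached child: look it up once, then increment it directly.
        Child getOk = requests.labels("GET", "200");
        for (int i = 0; i < 3; i++) {
            getOk.inc();
        }
        System.out.println(getOk.get());
    }
}
```

Holding on to the `Child` moves the lookup out of the hot path, which is the difference the `ChildInc` and `Inc` benchmark rows measure.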
14+
15+
Accordingly, in terms of client instrumentation performance I suggest the following:
16+
* It's cheap to extensively instrument your code with Simpleclient Counters/Gauges/Summaries without labels, or Codahale Counters.
17+
* Avoid Codahale Meters, in favour of Codahale/Simpleclient Counters and calculating the rate in your monitoring system (e.g. the `rate()` function in Prometheus).
18+
* Avoid original client Summaries and Codahale Histograms/Timers on high update rate code paths.
19+
* Avoid the original client.
20+
* For high update rate (>1000 per second) Prometheus metrics using labels, you should cache the Child. Java 8 may make this better due to an improved ConcurrentHashMap implementation.
21+
* If a use case appears for high update rate use of Simpleclient's `Gauge.Child.set`, we should alter `DoubleAdder` to more efficiently handle this use case.
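
As a rough illustration of the advice to use Counters and compute the rate downstream, the sketch below derives a per-second rate from two samples of a counter. `CounterRateSketch` and `perSecondRate` are hypothetical names, and the logic is a simplification: the Prometheus `rate()` function works over a range of samples and is not reproduced here.

```java
/** Hypothetical helper; a simplified stand-in for what a monitoring system computes. */
final class CounterRateSketch {
    /**
     * Per-second rate of increase between two samples of a monotonic counter.
     * If the later value is smaller, the counter is assumed to have restarted
     * from zero, so the later value itself is taken as the increase.
     */
    static double perSecondRate(double earlierValue, long earlierMillis,
                                double laterValue, long laterMillis) {
        if (laterMillis <= earlierMillis) {
            throw new IllegalArgumentException("samples must be in time order");
        }
        double increase = (laterValue >= earlierValue)
                ? laterValue - earlierValue
                : laterValue;
        double elapsedSeconds = (laterMillis - earlierMillis) / 1000.0;
        return increase / elapsedSeconds;
    }

    public static void main(String[] args) {
        // 60 increments over 60 seconds.
        System.out.println(perSecondRate(100, 0L, 160, 60000L));
    }
}
```

Because the rate is derived from raw counts after the fact, the instrumented code only pays for a counter increment rather than the bookkeeping a Meter does on every update.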
22+
23+
## Benchmark Results
24+
25+
These benchmarks were run using JMH on a 2-core MacBook Pro with a 2.5GHz i5 processor,
26+
with Oracle Java 64 1.7.0\_51.
27+
28+
### Counters
29+
java -jar target/benchmarks.jar CounterBenchmark -wi 5 -i 5 -f 1 -t 1
30+
i.p.b.CounterBenchmark.codahaleCounterIncBenchmark avgt 5 11.554 ± 0.251 ns/op
31+
i.p.b.CounterBenchmark.codahaleMeterMarkBenchmark avgt 5 75.305 ± 7.147 ns/op
32+
i.p.b.CounterBenchmark.prometheusCounterChildIncBenchmark avgt 5 13.249 ± 0.029 ns/op
33+
i.p.b.CounterBenchmark.prometheusCounterIncBenchmark avgt 5 127.397 ± 4.072 ns/op
34+
i.p.b.CounterBenchmark.prometheusSimpleCounterChildIncBenchmark avgt 5 12.989 ± 0.285 ns/op
35+
i.p.b.CounterBenchmark.prometheusSimpleCounterIncBenchmark avgt 5 54.822 ± 7.994 ns/op
36+
i.p.b.CounterBenchmark.prometheusSimpleCounterNoLabelsIncBenchmark avgt 5 13.131 ± 1.661 ns/op
37+
38+
java -jar target/benchmarks.jar CounterBenchmark -wi 5 -i 5 -f 1 -t 2
39+
i.p.b.CounterBenchmark.codahaleCounterIncBenchmark avgt 5 16.707 ± 2.116 ns/op
40+
i.p.b.CounterBenchmark.codahaleMeterMarkBenchmark avgt 5 107.346 ± 23.127 ns/op
41+
i.p.b.CounterBenchmark.prometheusCounterChildIncBenchmark avgt 5 41.912 ± 18.167 ns/op
42+
i.p.b.CounterBenchmark.prometheusCounterIncBenchmark avgt 5 170.860 ± 5.110 ns/op
43+
i.p.b.CounterBenchmark.prometheusSimpleCounterChildIncBenchmark avgt 5 17.782 ± 2.764 ns/op
44+
i.p.b.CounterBenchmark.prometheusSimpleCounterIncBenchmark avgt 5 89.656 ± 4.577 ns/op
45+
i.p.b.CounterBenchmark.prometheusSimpleCounterNoLabelsIncBenchmark avgt 5 16.109 ± 1.723 ns/op
46+
47+
java -jar target/benchmarks.jar CounterBenchmark -wi 5 -i 5 -f 1 -t 4
48+
i.p.b.CounterBenchmark.codahaleCounterIncBenchmark avgt 5 17.628 ± 0.501 ns/op
49+
i.p.b.CounterBenchmark.codahaleMeterMarkBenchmark avgt 5 121.836 ± 15.888 ns/op
50+
i.p.b.CounterBenchmark.prometheusCounterChildIncBenchmark avgt 5 377.916 ± 7.965 ns/op
51+
i.p.b.CounterBenchmark.prometheusCounterIncBenchmark avgt 5 250.919 ± 2.728 ns/op
52+
i.p.b.CounterBenchmark.prometheusSimpleCounterChildIncBenchmark avgt 5 18.055 ± 1.391 ns/op
53+
i.p.b.CounterBenchmark.prometheusSimpleCounterIncBenchmark avgt 5 120.543 ± 1.770 ns/op
54+
i.p.b.CounterBenchmark.prometheusSimpleCounterNoLabelsIncBenchmark avgt 5 19.334 ± 1.471 ns/op
55+
56+
### Gauges
57+
58+
Codahale lacks a metric with a `set` method, so we'll compare to `Counter` which has `inc` and `dec`.
59+
60+
java -jar target/benchmarks.jar GaugeBenchmark -wi 5 -i 5 -f 1 -t 1
61+
i.p.b.GaugeBenchmark.codahaleCounterDecBenchmark avgt 5 11.620 ± 0.288 ns/op
62+
i.p.b.GaugeBenchmark.codahaleCounterIncBenchmark avgt 5 11.718 ± 0.333 ns/op
63+
i.p.b.GaugeBenchmark.prometheusGaugeChildDecBenchmark avgt 5 13.358 ± 0.554 ns/op
64+
i.p.b.GaugeBenchmark.prometheusGaugeChildIncBenchmark avgt 5 13.268 ± 0.276 ns/op
65+
i.p.b.GaugeBenchmark.prometheusGaugeChildSetBenchmark avgt 5 11.624 ± 0.210 ns/op
66+
i.p.b.GaugeBenchmark.prometheusGaugeDecBenchmark avgt 5 125.058 ± 2.764 ns/op
67+
i.p.b.GaugeBenchmark.prometheusGaugeIncBenchmark avgt 5 127.814 ± 7.741 ns/op
68+
i.p.b.GaugeBenchmark.prometheusGaugeSetBenchmark avgt 5 127.899 ± 6.690 ns/op
69+
i.p.b.GaugeBenchmark.prometheusSimpleGaugeChildDecBenchmark avgt 5 12.961 ± 0.393 ns/op
70+
i.p.b.GaugeBenchmark.prometheusSimpleGaugeChildIncBenchmark avgt 5 12.932 ± 0.212 ns/op
71+
i.p.b.GaugeBenchmark.prometheusSimpleGaugeChildSetBenchmark avgt 5 36.672 ± 1.112 ns/op
72+
i.p.b.GaugeBenchmark.prometheusSimpleGaugeDecBenchmark avgt 5 54.677 ± 3.704 ns/op
73+
i.p.b.GaugeBenchmark.prometheusSimpleGaugeIncBenchmark avgt 5 53.278 ± 1.104 ns/op
74+
i.p.b.GaugeBenchmark.prometheusSimpleGaugeSetBenchmark avgt 5 79.724 ± 2.723 ns/op
75+
i.p.b.GaugeBenchmark.prometheusSimpleGaugeNoLabelsDecBenchmark avgt 5 12.957 ± 0.437 ns/op
76+
i.p.b.GaugeBenchmark.prometheusSimpleGaugeNoLabelsIncBenchmark avgt 5 12.932 ± 0.284 ns/op
77+
i.p.b.GaugeBenchmark.prometheusSimpleGaugeNoLabelsSetBenchmark avgt 5 40.235 ± 1.735 ns/op
78+
79+
java -jar target/benchmarks.jar GaugeBenchmark -wi 5 -i 5 -f 1 -t 2
80+
i.p.b.GaugeBenchmark.codahaleCounterDecBenchmark avgt 5 17.443 ± 4.819 ns/op
81+
i.p.b.GaugeBenchmark.codahaleCounterIncBenchmark avgt 5 14.882 ± 2.875 ns/op
82+
i.p.b.GaugeBenchmark.prometheusGaugeChildDecBenchmark avgt 5 45.206 ± 29.575 ns/op
83+
i.p.b.GaugeBenchmark.prometheusGaugeChildIncBenchmark avgt 5 46.657 ± 33.518 ns/op
84+
i.p.b.GaugeBenchmark.prometheusGaugeChildSetBenchmark avgt 5 21.810 ± 9.370 ns/op
85+
i.p.b.GaugeBenchmark.prometheusGaugeDecBenchmark avgt 5 177.370 ± 2.477 ns/op
86+
i.p.b.GaugeBenchmark.prometheusGaugeIncBenchmark avgt 5 172.136 ± 3.056 ns/op
87+
i.p.b.GaugeBenchmark.prometheusGaugeSetBenchmark avgt 5 186.791 ± 7.996 ns/op
88+
i.p.b.GaugeBenchmark.prometheusSimpleGaugeChildDecBenchmark avgt 5 15.978 ± 2.762 ns/op
89+
i.p.b.GaugeBenchmark.prometheusSimpleGaugeChildIncBenchmark avgt 5 15.457 ± 1.052 ns/op
90+
i.p.b.GaugeBenchmark.prometheusSimpleGaugeChildSetBenchmark avgt 5 156.604 ± 10.953 ns/op
91+
i.p.b.GaugeBenchmark.prometheusSimpleGaugeDecBenchmark avgt 5 107.134 ± 33.620 ns/op
92+
i.p.b.GaugeBenchmark.prometheusSimpleGaugeIncBenchmark avgt 5 89.362 ± 16.608 ns/op
93+
i.p.b.GaugeBenchmark.prometheusSimpleGaugeSetBenchmark avgt 5 163.823 ± 25.270 ns/op
94+
i.p.b.GaugeBenchmark.prometheusSimpleGaugeNoLabelsDecBenchmark avgt 5 16.380 ± 1.915 ns/op
95+
i.p.b.GaugeBenchmark.prometheusSimpleGaugeNoLabelsIncBenchmark avgt 5 17.042 ± 1.113 ns/op
96+
i.p.b.GaugeBenchmark.prometheusSimpleGaugeNoLabelsSetBenchmark avgt 5 164.930 ± 2.565 ns/op
97+
98+
java -jar target/benchmarks.jar GaugeBenchmark -wi 5 -i 5 -f 1 -t 4
99+
i.p.b.GaugeBenchmark.codahaleCounterDecBenchmark avgt 5 17.291 ± 1.769 ns/op
100+
i.p.b.GaugeBenchmark.codahaleCounterIncBenchmark avgt 5 17.445 ± 0.709 ns/op
101+
i.p.b.GaugeBenchmark.prometheusGaugeChildDecBenchmark avgt 5 389.411 ± 13.078 ns/op
102+
i.p.b.GaugeBenchmark.prometheusGaugeChildIncBenchmark avgt 5 399.549 ± 29.274 ns/op
103+
i.p.b.GaugeBenchmark.prometheusGaugeChildSetBenchmark avgt 5 123.700 ± 3.894 ns/op
104+
i.p.b.GaugeBenchmark.prometheusGaugeDecBenchmark avgt 5 244.741 ± 22.477 ns/op
105+
i.p.b.GaugeBenchmark.prometheusGaugeIncBenchmark avgt 5 243.525 ± 6.332 ns/op
106+
i.p.b.GaugeBenchmark.prometheusGaugeSetBenchmark avgt 5 252.363 ± 2.664 ns/op
107+
i.p.b.GaugeBenchmark.prometheusSimpleGaugeChildDecBenchmark avgt 5 18.330 ± 2.673 ns/op
108+
i.p.b.GaugeBenchmark.prometheusSimpleGaugeChildIncBenchmark avgt 5 20.633 ± 1.219 ns/op
109+
i.p.b.GaugeBenchmark.prometheusSimpleGaugeChildSetBenchmark avgt 5 335.455 ± 4.562 ns/op
110+
i.p.b.GaugeBenchmark.prometheusSimpleGaugeDecBenchmark avgt 5 116.432 ± 4.793 ns/op
111+
i.p.b.GaugeBenchmark.prometheusSimpleGaugeIncBenchmark avgt 5 129.390 ± 2.360 ns/op
112+
i.p.b.GaugeBenchmark.prometheusSimpleGaugeSetBenchmark avgt 5 613.186 ± 20.548 ns/op
113+
i.p.b.GaugeBenchmark.prometheusSimpleGaugeNoLabelsDecBenchmark avgt 5 19.765 ± 3.189 ns/op
114+
i.p.b.GaugeBenchmark.prometheusSimpleGaugeNoLabelsIncBenchmark avgt 5 19.589 ± 1.634 ns/op
115+
i.p.b.GaugeBenchmark.prometheusSimpleGaugeNoLabelsSetBenchmark avgt 5 307.238 ± 1.918 ns/op
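
One plausible reason the Simpleclient `Set` rows above degrade under contention while `Inc`/`Dec` do not: an adder is built for `add`, and `set` has to be layered on top of it. The sketch below is an assumption about the general shape of such a gauge, not the Simpleclient source. `AdderGaugeSketch` is a hypothetical class, and it uses `java.util.concurrent.atomic.DoubleAdder` from Java 8 rather than the `DoubleAdder` bundled with the client.

```java
import java.util.concurrent.atomic.DoubleAdder;

/** Hypothetical gauge built on a striped adder; not the Simpleclient implementation. */
final class AdderGaugeSketch {
    private final DoubleAdder value = new DoubleAdder();

    /** Cheap under contention: each thread usually updates its own cell. */
    void inc() { value.add(1); }

    void dec() { value.add(-1); }

    /**
     * Assumed approach: clear every cell, then add the new value.
     * The lock stops concurrent set() calls from interleaving their
     * reset and add steps, which also serialises them.
     */
    synchronized void set(double newValue) {
        value.reset();
        value.add(newValue);
    }

    double get() { return value.sum(); }

    public static void main(String[] args) {
        AdderGaugeSketch gauge = new AdderGaugeSketch();
        gauge.set(5);
        gauge.inc();
        gauge.dec();
        gauge.dec();
        System.out.println(gauge.get());
    }
}
```

Under this design every `set` touches all cells and takes a lock, so it loses the property that makes `inc` and `dec` scale.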
116+
117+
### Summaries
118+
119+
The simpleclient `Summary` doesn't have percentiles, so it's not an apples to
120+
apples comparison. The closest to the original client's `Summary` is Codahale's
121+
`Timer`, but that includes timing calls so we compare with `Histogram` instead.
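
For context on why the comparison is uneven, a summary without percentiles can be as small as two accumulators. The sketch below assumes such a summary tracks only a count and a sum of observations. `CountSumSummarySketch` is a hypothetical class, and it uses Java 8's `java.util.concurrent.atomic.DoubleAdder` for brevity.

```java
import java.util.concurrent.atomic.DoubleAdder;

/** Hypothetical percentile-free summary: just a count and a running sum. */
final class CountSumSummarySketch {
    private final DoubleAdder count = new DoubleAdder();
    private final DoubleAdder sum = new DoubleAdder();

    /** Records one observation; two adder updates and nothing else. */
    void observe(double amount) {
        count.add(1);
        sum.add(amount);
    }

    double count() { return count.sum(); }

    double sum() { return sum.sum(); }

    public static void main(String[] args) {
        CountSumSummarySketch latency = new CountSumSummarySketch();
        latency.observe(0.25);
        latency.observe(0.75);
        // The mean can be derived later as sum / count.
        System.out.println(latency.sum() / latency.count());
    }
}
```

A percentile-tracking summary or histogram has to maintain a sample or sketch of the distribution on every observation, which is extra work this structure never does.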
122+
123+
java -jar target/benchmarks.jar SummaryBenchmark -wi 5 -i 5 -f 1 -t 1
124+
i.p.b.SummaryBenchmark.codahaleHistogramBenchmark avgt 5 187.192 ± 12.304 ns/op
125+
i.p.b.SummaryBenchmark.prometheusSimpleSummaryBenchmark avgt 5 61.140 ± 1.298 ns/op
126+
i.p.b.SummaryBenchmark.prometheusSimpleSummaryChildBenchmark avgt 5 15.322 ± 0.266 ns/op
127+
i.p.b.SummaryBenchmark.prometheusSimpleSummaryNoLabelsBenchmark avgt 5 15.750 ± 0.476 ns/op
128+
i.p.b.SummaryBenchmark.prometheusSummaryBenchmark avgt 5 988.597 ± 343.221 ns/op
129+
i.p.b.SummaryBenchmark.prometheusSummaryChildBenchmark avgt 5 1165.270 ± 916.324 ns/op
130+
131+
java -jar target/benchmarks.jar SummaryBenchmark -wi 5 -i 5 -f 1 -t 2
132+
i.p.b.SummaryBenchmark.codahaleHistogramBenchmark avgt 5 243.749 ± 37.080 ns/op
133+
i.p.b.SummaryBenchmark.prometheusSimpleSummaryBenchmark avgt 5 119.753 ± 15.056 ns/op
134+
i.p.b.SummaryBenchmark.prometheusSimpleSummaryChildBenchmark avgt 5 32.614 ± 15.976 ns/op
135+
i.p.b.SummaryBenchmark.prometheusSimpleSummaryNoLabelsBenchmark avgt 5 32.627 ± 8.943 ns/op
136+
i.p.b.SummaryBenchmark.prometheusSummaryBenchmark avgt 5 2021.984 ± 561.545 ns/op
137+
i.p.b.SummaryBenchmark.prometheusSummaryChildBenchmark avgt 5 2338.371 ± 1515.886 ns/op
138+
139+
java -jar target/benchmarks.jar SummaryBenchmark -wi 5 -i 5 -f 1 -t 4
140+
i.p.b.SummaryBenchmark.codahaleHistogramBenchmark avgt 5 559.505 ± 5.169 ns/op
141+
i.p.b.SummaryBenchmark.prometheusSimpleSummaryBenchmark avgt 5 137.072 ± 12.044 ns/op
142+
i.p.b.SummaryBenchmark.prometheusSimpleSummaryChildBenchmark avgt 5 44.228 ± 0.697 ns/op
143+
i.p.b.SummaryBenchmark.prometheusSimpleSummaryNoLabelsBenchmark avgt 5 41.223 ± 1.978 ns/op
144+
i.p.b.SummaryBenchmark.prometheusSummaryBenchmark avgt 5 4023.354 ± 1122.317 ns/op
145+
i.p.b.SummaryBenchmark.prometheusSummaryChildBenchmark avgt 5 4606.571 ± 3392.565 ns/op
146+
147+
Note the high error bars for the original client: it got slower with each iteration,
148+
so I suspect a flaw in the test setup.
149+
150+

benchmark/pom.xml

Lines changed: 106 additions & 0 deletions
@@ -0,0 +1,106 @@
1+
<?xml version="1.0" encoding="UTF-8"?>
2+
<project xmlns="http://maven.apache.org/POM/4.0.0" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xsi:schemaLocation="http://maven.apache.org/POM/4.0.0 http://maven.apache.org/xsd/maven-4.0.0.xsd">
3+
<modelVersion>4.0.0</modelVersion>
4+
5+
<parent>
6+
<groupId>io.prometheus</groupId>
7+
<artifactId>parent</artifactId>
8+
<version>0.0.6-SNAPSHOT</version>
9+
</parent>
10+
11+
<groupId>io.prometheus</groupId>
12+
<artifactId>benchmarks</artifactId>
13+
14+
<name>Prometheus Java Client Benchmarks</name>
15+
<description>
16+
Benchmarks of client performance, and comparison to other systems.
17+
</description>
18+
19+
<licenses>
20+
<license>
21+
<name>The Apache Software License, Version 2.0</name>
22+
<url>http://www.apache.org/licenses/LICENSE-2.0.txt</url>
23+
<distribution>repo</distribution>
24+
</license>
25+
</licenses>
26+
27+
<properties>
28+
<project.build.sourceEncoding>UTF-8</project.build.sourceEncoding>
29+
</properties>
30+
31+
<dependencies>
32+
<dependency>
33+
<groupId>org.openjdk.jmh</groupId>
34+
<artifactId>jmh-core</artifactId>
35+
<version>1.3.2</version>
36+
</dependency>
37+
<dependency>
38+
<groupId>org.openjdk.jmh</groupId>
39+
<artifactId>jmh-generator-annprocess</artifactId>
40+
<version>1.3.2</version>
41+
</dependency>
42+
43+
<dependency>
44+
<groupId>io.prometheus</groupId>
45+
<artifactId>client</artifactId>
46+
<version>0.0.6-SNAPSHOT</version>
47+
</dependency>
48+
<dependency>
49+
<groupId>io.prometheus</groupId>
50+
<artifactId>simpleclient</artifactId>
51+
<version>0.0.6-SNAPSHOT</version>
52+
</dependency>
53+
<dependency>
54+
<groupId>com.codahale.metrics</groupId>
55+
<artifactId>metrics-core</artifactId>
56+
<version>3.0.2</version>
57+
</dependency>
58+
</dependencies>
59+
<build>
60+
<plugins>
61+
<plugin>
62+
<artifactId>maven-compiler-plugin</artifactId>
63+
<version>3.1</version>
64+
<configuration>
65+
<source>1.6</source>
66+
<target>1.6</target>
67+
</configuration>
68+
</plugin>
69+
<plugin>
70+
<groupId>org.apache.maven.plugins</groupId>
71+
<artifactId>maven-shade-plugin</artifactId>
72+
<version>2.2</version>
73+
<executions>
74+
<execution>
75+
<phase>package</phase>
76+
<goals>
77+
<goal>shade</goal>
78+
</goals>
79+
<configuration>
80+
<finalName>benchmarks</finalName>
81+
<transformers>
82+
<transformer implementation="org.apache.maven.plugins.shade.resource.ManifestResourceTransformer">
83+
<mainClass>org.openjdk.jmh.Main</mainClass>
84+
</transformer>
85+
</transformers>
86+
<filters>
87+
<filter>
88+
<!--
89+
Shading signed JARs will fail without this.
90+
http://stackoverflow.com/questions/999489/invalid-signature-file-when-attempting-to-run-a-jar
91+
-->
92+
<artifact>*:*</artifact>
93+
<excludes>
94+
<exclude>META-INF/*.SF</exclude>
95+
<exclude>META-INF/*.DSA</exclude>
96+
<exclude>META-INF/*.RSA</exclude>
97+
</excludes>
98+
</filter>
99+
</filters>
100+
</configuration>
101+
</execution>
102+
</executions>
103+
</plugin>
104+
</plugins>
105+
</build>
106+
</project>
