Conversation

@murfel murfel commented Oct 16, 2025

No description provided.

murfel commented Oct 16, 2025

It performs unexpectedly badly; could you all take a look for any silly mistakes?

The simplest benchmark, sending X elements only (not receiving anything), is already pretty sad:

ChannelBenchmark.sendUnlimited                1000  avgt   10  0.008 ±  0.001  ms/op
ChannelBenchmark.sendUnlimited               10000  avgt   10  0.080 ±  0.001  ms/op
ChannelBenchmark.sendUnlimited              100000  avgt   10  0.800 ±  0.005  ms/op

Full output for the first three counts (4 KB, 40 KB, 400 KB of Ints); just FYI, no need to look at this, since the snippet above is already bad enough:

Benchmark                                  (count)  Mode  Cnt  Score    Error  Units
ChannelBenchmark.manySendersManyReceivers     1000  avgt   10  0.119 ±  0.001  ms/op
ChannelBenchmark.manySendersManyReceivers    10000  avgt   10  0.900 ±  0.006  ms/op
ChannelBenchmark.manySendersManyReceivers   100000  avgt   10  9.244 ±  0.418  ms/op
ChannelBenchmark.manySendersOneReceiver       1000  avgt   10  0.108 ±  0.001  ms/op
ChannelBenchmark.manySendersOneReceiver      10000  avgt   10  0.713 ±  0.042  ms/op
ChannelBenchmark.manySendersOneReceiver     100000  avgt   10  7.010 ±  0.115  ms/op
ChannelBenchmark.oneSenderManyReceivers       1000  avgt   10  0.138 ±  0.001  ms/op
ChannelBenchmark.oneSenderManyReceivers      10000  avgt   10  0.923 ±  0.003  ms/op
ChannelBenchmark.oneSenderManyReceivers     100000  avgt   10  8.411 ±  0.035  ms/op
ChannelBenchmark.sendConflated                1000  avgt   10  0.020 ±  0.001  ms/op
ChannelBenchmark.sendConflated               10000  avgt   10  0.187 ±  0.007  ms/op
ChannelBenchmark.sendConflated              100000  avgt   10  1.834 ±  0.013  ms/op
ChannelBenchmark.sendReceiveConflated         1000  avgt   10  0.039 ±  0.001  ms/op
ChannelBenchmark.sendReceiveConflated        10000  avgt   10  0.236 ±  0.009  ms/op
ChannelBenchmark.sendReceiveConflated       100000  avgt   10  1.906 ±  0.019  ms/op
ChannelBenchmark.sendReceiveRendezvous        1000  avgt   10  0.103 ±  0.001  ms/op
ChannelBenchmark.sendReceiveRendezvous       10000  avgt   10  0.866 ±  0.021  ms/op
ChannelBenchmark.sendReceiveRendezvous      100000  avgt   10  8.270 ±  0.071  ms/op
ChannelBenchmark.sendReceiveUnlimited         1000  avgt   10  0.077 ±  0.002  ms/op
ChannelBenchmark.sendReceiveUnlimited        10000  avgt   10  0.419 ±  0.005  ms/op
ChannelBenchmark.sendReceiveUnlimited       100000  avgt   10  3.443 ±  0.061  ms/op
ChannelBenchmark.sendUnlimited                1000  avgt   10  0.008 ±  0.001  ms/op
ChannelBenchmark.sendUnlimited               10000  avgt   10  0.080 ±  0.001  ms/op
ChannelBenchmark.sendUnlimited              100000  avgt   10  0.800 ±  0.005  ms/op

@murfel murfel marked this pull request as draft October 16, 2025 15:59
}

private suspend fun send(count: Int, channel: Channel<Int>) = coroutineScope {
list.take(count).forEach { channel.send(it) }

Collaborator:

list.take(count) copies count elements to a new list, allocating a lot of new memory. I'd expect it to make a noticeable contribution to the runtime.
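To illustrate the point (a sketch, not the PR's code; `copying`/`indexed` are hypothetical names): `take(count)` materialises a fresh list on every call, while iterating by index reads the existing backing list in place with no per-call allocation.

```kotlin
// A shared source list, as in the benchmark fixture.
val list = List(1_000_000) { it }

// Allocates a new list of `count` elements on every call:
fun copying(count: Int): Int {
    var sum = 0
    list.take(count).forEach { sum += it }
    return sum
}

// Allocation-free alternative with the same traversal order:
fun indexed(count: Int): Int {
    var sum = 0
    for (i in 0 until count) sum += list[i]
    return sum
}
```

Both traverse the same elements, so they produce identical results; only the allocation behaviour differs.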

@fzhinkin (Contributor):

Just for the record, we discussed benchmarks with @murfel offline and she'll rework them.

murfel commented Nov 4, 2025

Ran the following (on a freshly restarted MacBook, with no apps open except the terminal and the system monitor):
java -jar benchmarks.jar ".ChannelBenchmark." -p count=1000,100000000 -p prefill=0,1000000,100000000

# Run complete. Total time: 00:37:21

Benchmark                                    (count)  (prefill)  Mode  Cnt     Score      Error  Units
ChannelBenchmark.manySendersManyReceivers       1000          0  avgt   10     0.113 ±    0.001  ms/op
ChannelBenchmark.manySendersManyReceivers       1000    1000000  avgt   10     0.118 ±    0.007  ms/op
ChannelBenchmark.manySendersManyReceivers       1000  100000000  avgt   10     0.357 ±    0.055  ms/op
ChannelBenchmark.manySendersManyReceivers  100000000          0  avgt   10  8796.152 ±  543.292  ms/op
ChannelBenchmark.manySendersManyReceivers  100000000    1000000  avgt   10  8683.527 ±  254.436  ms/op
ChannelBenchmark.manySendersManyReceivers  100000000  100000000  avgt   10  9434.746 ±  310.576  ms/op
ChannelBenchmark.manySendersOneReceiver         1000          0  avgt   10     0.084 ±    0.002  ms/op
ChannelBenchmark.manySendersOneReceiver         1000    1000000  avgt   10     0.068 ±    0.001  ms/op
ChannelBenchmark.manySendersOneReceiver         1000  100000000  avgt   10     0.327 ±    0.043  ms/op
ChannelBenchmark.manySendersOneReceiver    100000000          0  avgt   10  6759.587 ± 1126.828  ms/op
ChannelBenchmark.manySendersOneReceiver    100000000    1000000  avgt   10  6730.408 ±  112.128  ms/op
ChannelBenchmark.manySendersOneReceiver    100000000  100000000  avgt   10  6222.171 ±  256.355  ms/op
ChannelBenchmark.oneSenderManyReceivers         1000          0  avgt   10     0.119 ±    0.003  ms/op
ChannelBenchmark.oneSenderManyReceivers         1000    1000000  avgt   10     0.121 ±    0.003  ms/op
ChannelBenchmark.oneSenderManyReceivers         1000  100000000  avgt   10     0.353 ±    0.065  ms/op
ChannelBenchmark.oneSenderManyReceivers    100000000          0  avgt   10  8785.786 ±  567.569  ms/op
ChannelBenchmark.oneSenderManyReceivers    100000000    1000000  avgt   10  8698.243 ±  517.566  ms/op
ChannelBenchmark.oneSenderManyReceivers    100000000  100000000  avgt   10  8594.145 ±  416.015  ms/op
ChannelBenchmark.sendConflated                  1000        N/A  avgt   10     0.017 ±    0.001  ms/op
ChannelBenchmark.sendConflated             100000000        N/A  avgt   10  1504.701 ±   27.829  ms/op
ChannelBenchmark.sendReceiveConflated           1000        N/A  avgt   10     0.037 ±    0.001  ms/op
ChannelBenchmark.sendReceiveConflated      100000000        N/A  avgt   10  1722.869 ±   85.603  ms/op
ChannelBenchmark.sendReceiveRendezvous          1000        N/A  avgt   10     0.122 ±    0.018  ms/op
ChannelBenchmark.sendReceiveRendezvous     100000000        N/A  avgt   10  7300.491 ±  107.318  ms/op
ChannelBenchmark.sendReceiveUnlimited           1000          0  avgt   10     0.057 ±    0.002  ms/op
ChannelBenchmark.sendReceiveUnlimited           1000    1000000  avgt   10     0.056 ±    0.003  ms/op
ChannelBenchmark.sendReceiveUnlimited           1000  100000000  avgt   10     0.314 ±    0.038  ms/op
ChannelBenchmark.sendReceiveUnlimited      100000000          0  avgt   10  3645.250 ±  658.235  ms/op
ChannelBenchmark.sendReceiveUnlimited      100000000    1000000  avgt   10  3192.487 ±  372.223  ms/op
ChannelBenchmark.sendReceiveUnlimited      100000000  100000000  avgt   10  3965.029 ±  386.913  ms/op
ChannelBenchmark.sendUnlimited                  1000        N/A  avgt   10     0.006 ±    0.001  ms/op
ChannelBenchmark.sendUnlimited             100000000        N/A  avgt   10  1157.710 ±  248.811  ms/op

@murfel murfel marked this pull request as ready for review November 4, 2025 13:24
@murfel murfel requested a review from dkhalanskyjb November 4, 2025 13:24
@murfel murfel changed the title [Draft] Add channel benchmarks Add channel benchmarks Nov 4, 2025
murfel commented Nov 4, 2025

Quick normalisation with ChatGPT

Prompt: "Produce the same table, but divide the Score column [and the Error column] by the count column and change the units to ns/op/element" (https://chatgpt.com/share/e/690a0281-e828-800b-8895-144ecc4e07f3)

(I will do a proper notebook for JSON benchmark output after we agree on the benchmark correctness. I forgot to save this run as JSON, and it takes 40 min to re-run.)
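The normalisation itself is simple arithmetic; a minimal sketch (hypothetical helper name): JMH reports avgt scores in ms/op, where one "op" processes `count` elements, so the per-element cost is score * 10^6 / count nanoseconds.

```kotlin
// Convert a JMH avgt score (ms per op) to ns per element,
// where one op processes `count` elements.
fun msPerOpToNsPerElement(scoreMs: Double, count: Long): Double =
    scoreMs * 1_000_000 / count

fun main() {
    // e.g. sendUnlimited at count=100000000: 1157.710 ms/op
    // comes out to roughly 11.577 ns/op/element, matching the table below.
    println(msPerOpToNsPerElement(1157.710, 100_000_000))
}
```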

# Run complete. Total time: 00:37:21

Benchmark                                    (count)  (prefill)  Mode  Cnt     Score          Error        Units
ChannelBenchmark.manySendersManyReceivers       1000          0  avgt   10     113.000 ±     1.000  ns/op/element
ChannelBenchmark.manySendersManyReceivers       1000    1000000  avgt   10     118.000 ±     7.000  ns/op/element
ChannelBenchmark.manySendersManyReceivers       1000  100000000  avgt   10     357.000 ±    55.000  ns/op/element
ChannelBenchmark.manySendersManyReceivers  100000000          0  avgt   10      87.962 ±     5.433  ns/op/element
ChannelBenchmark.manySendersManyReceivers  100000000    1000000  avgt   10      86.835 ±     2.544  ns/op/element
ChannelBenchmark.manySendersManyReceivers  100000000  100000000  avgt   10      94.347 ±     3.106  ns/op/element
ChannelBenchmark.manySendersOneReceiver         1000          0  avgt   10      84.000 ±     2.000  ns/op/element
ChannelBenchmark.manySendersOneReceiver         1000    1000000  avgt   10      68.000 ±     1.000  ns/op/element
ChannelBenchmark.manySendersOneReceiver         1000  100000000  avgt   10     327.000 ±    43.000  ns/op/element
ChannelBenchmark.manySendersOneReceiver    100000000          0  avgt   10      67.596 ±    11.268  ns/op/element
ChannelBenchmark.manySendersOneReceiver    100000000    1000000  avgt   10      67.304 ±     1.121  ns/op/element
ChannelBenchmark.manySendersOneReceiver    100000000  100000000  avgt   10      62.222 ±     2.564  ns/op/element
ChannelBenchmark.oneSenderManyReceivers         1000          0  avgt   10     119.000 ±     3.000  ns/op/element
ChannelBenchmark.oneSenderManyReceivers         1000    1000000  avgt   10     121.000 ±     3.000  ns/op/element
ChannelBenchmark.oneSenderManyReceivers         1000  100000000  avgt   10     353.000 ±    65.000  ns/op/element
ChannelBenchmark.oneSenderManyReceivers    100000000          0  avgt   10      87.858 ±     5.676  ns/op/element
ChannelBenchmark.oneSenderManyReceivers    100000000    1000000  avgt   10      86.982 ±     5.176  ns/op/element
ChannelBenchmark.oneSenderManyReceivers    100000000  100000000  avgt   10      85.941 ±     4.160  ns/op/element
ChannelBenchmark.sendConflated                  1000        N/A  avgt   10      17.000 ±     1.000  ns/op/element
ChannelBenchmark.sendConflated             100000000        N/A  avgt   10      15.047 ±     0.278  ns/op/element
ChannelBenchmark.sendReceiveConflated           1000        N/A  avgt   10      37.000 ±     1.000  ns/op/element
ChannelBenchmark.sendReceiveConflated      100000000        N/A  avgt   10      17.229 ±     0.856  ns/op/element
ChannelBenchmark.sendReceiveRendezvous          1000        N/A  avgt   10     122.000 ±    18.000  ns/op/element
ChannelBenchmark.sendReceiveRendezvous     100000000        N/A  avgt   10      73.005 ±     1.073  ns/op/element
ChannelBenchmark.sendReceiveUnlimited           1000          0  avgt   10      57.000 ±     2.000  ns/op/element
ChannelBenchmark.sendReceiveUnlimited           1000    1000000  avgt   10      56.000 ±     3.000  ns/op/element
ChannelBenchmark.sendReceiveUnlimited           1000  100000000  avgt   10     314.000 ±    38.000  ns/op/element
ChannelBenchmark.sendReceiveUnlimited      100000000          0  avgt   10      36.453 ±     6.582  ns/op/element
ChannelBenchmark.sendReceiveUnlimited      100000000    1000000  avgt   10      31.925 ±     3.722  ns/op/element
ChannelBenchmark.sendReceiveUnlimited      100000000  100000000  avgt   10      39.651 ±     3.869  ns/op/element
ChannelBenchmark.sendUnlimited                  1000        N/A  avgt   10       6.000 ±     1.000  ns/op/element
ChannelBenchmark.sendUnlimited             100000000        N/A  avgt   10      11.577 ±     2.488  ns/op/element

}
}

suspend fun <E> Channel<E>.forEach(action: (E) -> Unit) {

Contributor:

Suggested change:
- suspend fun <E> Channel<E>.forEach(action: (E) -> Unit) {
+ suspend inline fun <E> Channel<E>.forEach(action: (E) -> Unit) {

repeat(maxCount) { add(it) }
}

@Setup(Level.Invocation)

Contributor:

Why does this have to be done before every benchmark function invocation, and not once per trial or iteration?

JFTR, https://github.com/openjdk/jmh/blob/2a316030b509aa9874dd6ab04e21962ac92cd634/jmh-core/src/main/java/org/openjdk/jmh/annotations/Level.java#L85
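For context, a sketch of the alternative (assumed refactoring, not the PR's code): JMH's Level.Trial runs a fixture once per fork, Level.Iteration before each measurement iteration, and Level.Invocation before every single benchmark method call, which the linked Javadoc warns adds per-call timestamping overhead.

```kotlin
@State(Scope.Benchmark)
open class ChannelState {
    lateinit var channel: Channel<Int>

    // Assumption: a fresh channel per measurement iteration is enough,
    // avoiding the per-call overhead of Level.Invocation.
    @Setup(Level.Iteration) // was Level.Invocation
    fun setUp() {
        channel = Channel(Channel.UNLIMITED)
    }
}
```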

Comment on lines +131 to +136
if (receiveAll) {
channel.forEach { }
} else {
repeat(countPerReceiverAtLeast) {
channel.receive()
}

Contributor:

It makes sense to send received values into a blackhole (i.e. consume them).

@OutputTimeUnit(TimeUnit.NANOSECONDS)
@State(Scope.Benchmark)
@Fork(1)
open class ChannelBenchmark {

Contributor:

Could you please elaborate what exactly you're trying to measure using these benchmarks?
Right now, it looks like "time required to create a new channel, send N messages into it (and, optionally, receive them), and then close the channel". However, I thought the initial idea was to measure the latency of sending (and receiving) a single message into the channel.

Comment on lines +118 to +121
require(senders > 0 && receivers > 0)
// Can be used with more than num cores but needs thinking it through,
// e.g., what would it measure?
require(senders + receivers <= cores)

Contributor:

Do we really need to include it into measurements? :)

// to allow for true parallelism
val cores = 4

// 4 KB, 40 KB, 400 KB, 4 MB, 40 MB, 400 MB

Collaborator:

The comment is meant to line up with the numbers, but doesn't.


@State(Scope.Benchmark)
open class UnlimitedChannelWrapper {
// 0, 4 MB, 40 MB, 400 MB

Collaborator:

Ditto.

Comment on lines +25 to +28
val maxCount = 100000000
val list = ArrayList<Int>(maxCount).apply {
repeat(maxCount) { add(it) }
}

Collaborator:

Suggested change:
- val maxCount = 100000000
- val list = ArrayList<Int>(maxCount).apply {
-     repeat(maxCount) { add(it) }
- }
+ val list = List<Int>(100000000) { it }
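A sketch of the equivalence (hypothetical helper names): the `List` initializer produces the same contents as the explicit ArrayList-plus-repeat dance, in one expression.

```kotlin
// Original form: pre-sized ArrayList filled via repeat.
fun viaApply(n: Int): List<Int> = ArrayList<Int>(n).apply {
    repeat(n) { add(it) }
}

// Suggested form: stdlib List initializer.
fun viaInitializer(n: Int): List<Int> = List(n) { it }
```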

open class ChannelBenchmark {
// max coroutines launched per benchmark
// to allow for true parallelism
val cores = 4

Collaborator:

Runtime.getRuntime().availableProcessors() instead of 4? Otherwise, cores is misleading.
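A minimal sketch of the suggested change (hypothetical function name): derive the parallelism cap from the machine instead of hard-coding 4.

```kotlin
// Query the JVM for the number of available processors
// rather than assuming a fixed core count.
fun availableCores(): Int = Runtime.getRuntime().availableProcessors()

fun main() {
    println("benchmark coroutines capped at ${availableCores()}")
}
```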

Comment on lines +103 to +105
Channel<Int>(capacity).also {
sendManyItems(count, it)
}

Collaborator:

Minor, style.
From https://kotlinlang.org/docs/scope-functions.html#also:

When you see also in code, you can read it as " and also do the following with the object. "

In my opinion, sending many items to a channel is the main idea, not just something that you "also" do here, so I'd opt for either a form with a variable, like

val channel = Channel<Int>(capacity)
repeat(count) {
    channel.send(list[it])
}

or use let:

Channel<Int>(capacity).let {
    sendManyItems(count, it)
}

Of the two, I prefer the first one.

Comment on lines +95 to +104
private suspend fun sendManyItems(count: Int, channel: Channel<Int>) {
repeat(count) {
// NB: it is `send`, not `trySend`, on purpose, since we are testing the `send` performance here.
channel.send(list[it])
}
}

private suspend fun runSend(count: Int, capacity: Int) {
Channel<Int>(capacity).also {
sendManyItems(count, it)

Collaborator:

Minor, style: these two functions don't pass the https://wiki.haskell.org/Fairbairn_threshold for me, so I'd just inline them. Then, even the NB wouldn't be necessary, as it would be clear from the benchmark name that we are testing send.

val receiveAll = channel.isEmpty
// send almost `count` items, up to `senders - 1` items will not be sent (negligible)
val countPerSender = count / senders
// for prefilled channel only: up to `receivers - 1` items of the sent items will not be received (negligible)

Collaborator:

I don't understand this.

In total, there will be countPerSender * senders elements sent while this function is running, so wrapper.prefill + countPerSender * senders elements will ultimately have been sent to the channel. Every receiver will receive floor(countPerSender * senders / receivers) elements; that is, in total, floor(countPerSender * senders / receivers) * receivers elements will leave the channel, which can leave up to wrapper.prefill + receivers - 1 elements inside it.

For big enough values of prefill and a small enough count, none of the items sent in runSendReceive will be received.
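The accounting above can be sketched directly (hypothetical helper name; Longs used so that counts up to 100000000 with prefill stay safe):

```kotlin
// Elements left in the channel after one run:
// total sent = prefill + countPerSender * senders;
// each receiver takes floor(sent-in-run / receivers) elements.
fun leftoverAfterRun(prefill: Long, count: Long, senders: Long, receivers: Long): Long {
    val countPerSender = count / senders
    val sentInRun = countPerSender * senders
    val receivedTotal = (sentInRun / receivers) * receivers
    return prefill + sentInRun - receivedTotal
}
```

For example, with no prefill, count = 1000, 4 senders, and 3 receivers, 1000 elements are sent but only 999 received, leaving one element behind; with a large prefill and a small count, the whole prefill stays in the channel.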

// Can be used with more than num cores but needs thinking it through,
// e.g., what would it measure?
require(senders + receivers <= cores)
// if the channel is prefilled, do not receive the prefilled items

Collaborator:

The prefilled items will be received, as the channel is FIFO, so the way I'd explain the logic here is that we only want to receive as many items as were sent, which, for a non-prefilled channel, means all the items.
