Newest 'benchmarking' Questions

0 votes

0 answers

54 views

How to change my slurm script to perform benchmark on Thin node

I'm trying to run the following script, but after a few seconds (approximately 4) the job ends without creating the output_file with the data and without creating the Slurm output and error files. #!/bin/...

andreg

11

asked Dec 2 at 13:51

Advice

0 votes

6 replies

131 views

Why is FileReader as efficient as BufferedReader in reading 1KB chunks of data?

I was trying to read data (chars) from a large text file (~250MB) in 1KB chunks and was very surprised that reading that file using either FileReader or BufferedReader takes exactly the same time, ...

sebkaminski16

19

asked Nov 22 at 7:29

1 vote

1 answer

201 views

How to benchmark atomic<int> vs atomic<size_t>?

I have a small bounded queue whose size definitely fits in an int, so I want to use atomic<int> instead of atomic<size_t> for the index/counter; since int is smaller, it should be faster. ...

Huy Le

1,999

asked Nov 11 at 3:12

1 vote

0 answers

113 views

Golang benchmarks involving goroutines show higher than expected allocations when controlling the timer manually

Using go version go1.25.3 darwin/arm64. The below implementation is a simplified version of the actual implementation. type WaitObject struct{ c chan struct{} } func StartNewTestObject(d time....

Ahmad Sameh

19

asked Oct 31 at 14:27

0 votes

1 answer

75 views

Why does MSVC AVX2 /FP:strict sometimes generate inferior (slower) code to SSE2?

I was testing various expressions of a sixth order polynomial to find the fastest possible throughput. I have stumbled upon a simple polynomial expression length 6 that provokes poor code generation ...

Martin Brown

3,666

asked Sep 12 at 16:51

1 vote

1 answer

91 views

Browser debugger shows less time taken to download a base64 over a multi-part file despite the larger file size [closed]

Today, I researched Base64 encoding versus other methods and whether to use it in a JSON API, considering the 33-37% size overhead that Base64 introduces and all sorts of related topics. To ...

tfn

87

asked Sep 10 at 9:07

2 votes

0 answers

128 views

Why is Horner's method for evaluation faster than expected when loop unrolled for Polynomial lengths N<90?

I have spent some time trying to speed up code that uses Horner's method for evaluating modest length polynomials (N < 32). I have a solution using loop unrolling that works very well at -O2 or ...

Martin Brown

3,666

asked Aug 29 at 16:28

3 votes

1 answer

152 views

BenchmarkDotNet: OutOfMemoryException when benchmarking parsing a JSON file

I'm trying to benchmark the performance of a library I've written that can parse large JSON files into both an object model and a JsonDocument. So far as I can tell I'm doing everything right, but I'...

Ari Roth

5,582

asked Aug 22 at 3:23

7 votes

1 answer

342 views

How to use plain RDTSC without using asm?

I want to use RDTSC in Rust to benchmark how many ticks my function takes. There's a built-in std::arch::x86_64::_rdtsc, alas it always translates into: rdtsc shl rdx, 32 or rax, rdx ...

Daniil Tutubalin

198

asked Aug 17 at 10:50

4 votes

2 answers

223 views

Why is the generic implementation of Vector.Log so much slower than the non-generic implementations for me?

I've run some benchmarks on Math.Log, System.Numerics.Vector.Log, System.Runtime.Intrinsics.Vector128.Log, Vector256.Log and Vector512.Log and the results were pretty surprising to me. I was expecting ...

user31260114

51

asked Aug 12 at 11:34

1 vote

1 answer

60 views

Looking for simple garbage collector load test

I'm looking for some code or a benchmark to roughly assess the pause times or CPU load caused by a GC, in order to get a rough estimate of how efficient it is. I just want to see whether some GC ...

OlliP

1,617

asked Aug 4 at 13:08

3 votes

1 answer

202 views

Custom hasher is faster for insert and remove, but when done together is slower, when comparing to std::collections::HashMap

I wish to benchmark various hashmaps for the <K,V> pair <u8, BoxedFnMut>, where BoxedFnMut is defined as: type BoxedFnMut = Box<dyn FnMut() + Send + 'static>; To do this, I am using divan (0.1.21) ...

Naitik Mundra

502

asked Jul 30 at 11:04

4 votes

1 answer

203 views

Why is a ConcurrentDictionary faster than a Dictionary in benchmark?

I have a really simple benchmark to measure and compare performance of Dictionary<string, int> and ConcurrentDictionary<string, int>: [MemoryDiagnoser] public class ...

Pupkin

1,223

asked Jul 11 at 9:26

0 votes

0 answers

165 views

cargo bench throws "MallocStackLogging: can't turn off malloc stack logging because it was not enabled..." on Apple M4

I'm trying to run cargo bench on my new MacBook (Apple Silicon, macOS [Sequoia version 15.5]), but I get this error: cargo(31826) MallocStackLogging: can't turn off malloc stack logging because it was ...

ajita asthana

1

asked Jul 4 at 11:54

2 votes

0 answers

59 views

Statistical assumptions for Criterion benchmarks

This question is somewhat specific to Rust's Criterion, but I have kept it general so that anybody with knowledge about benchmarking can help. In my Rust codebase, I have a struct Model that is very ...

aleferna

141

asked Jun 27 at 1:11

27 votes

5 answers

9k views

Why is this code 5 times slower in C# compared to Java?

First of all we create a random binary file with 100.000.000 bytes. I used Python for this: import random import os def Main(): length = 100000000 randomArray = random.randbytes(length) ...

Vasilis Kontopoulos

379

asked Jun 23 at 8:08

1 vote

1 answer

101 views

Minimize noise for benchmarking in docker

I am writing a benchmarking framework for compiler-like programs. For benchmarking, I use a Docker container (for reproducibility). However, I still measure quite a bit of noise (up to 5%!). My ...

Frobeniusnorm

93

asked Jun 20 at 10:13

2 votes

1 answer

197 views

What is the reason of this performance discrepancy between NumPy and Numba?

This Python 3.12.7 script with NumPy 2.2.4 and Numba 0.61.2: import numpy as np, timeit as ti, numba as nb def f0(a): p0 = a[:-2] p1 = a[1:-1] p2 = a[2:] return (p0 < p1) & (p1 > p2) ...

Paul Jurczak

8,650

asked Jun 19 at 2:19

0 votes

1 answer

93 views

How do I benchmark queries in PostgreSQL? [closed]

I'm learning PostgreSQL's clustering capabilities and I would like to compare the performance of the same query on a non-clustered table and on a clustered table. I tried to generate 25 million user events and ...

rela589n

1,224

asked Jun 18 at 19:19

2 votes

0 answers

125 views

Best way to trigger lazy evaluation in PySpark and Polars for benchmarking

I'm currently benchmarking PySpark vs the growing alternative Polars. Basically I'm writing various queries (aggregations, filtering, sorting etc.) and measure the execution time, RAM and CPU. I ...

Ernest P W

73

asked Jun 5 at 22:21

1 vote

2 answers

157 views

C# SoA vs AoS performance

user3821908

25

asked May 13 at 18:54

0 votes

0 answers

52 views

Why does a trailing slash significantly impact performance in benchmarking with wrk or ab?

In the following FastAPI app: from fastapi import FastAPI from sqlalchemy import create_engine, text from sqlalchemy.orm import Session engine = create_engine("postgresql+psycopg://postgres:...

Dante

880

asked May 11 at 17:37

0 votes

0 answers

63 views

Python equivalent to the R `bench::bench_memory()` function?

I'm benchmarking the memory usage of some functions in R and Python, and had a great time getting results with the bench package in R, which tracks all allocations made by each call, allowing me to get the total and ...

Qile0317

67

asked Apr 15 at 7:06

1 vote

0 answers

116 views

RISC-V instruction equivalent to ARM's DSB execution barrier instruction for benchmarking to time loads?

I am writing a RISC-V assembly program whose goal is to assess the performance of main memory, in read access only for now. I have thought about a simple benchmark code, that would load multiple ...

SFV

11

asked Apr 9 at 14:50

-1 votes

1 answer

78 views

Best practices for running high-granularity benchmark [closed]

I am trying to run a benchmark on some family of algorithms. I have multiple algorithms, each of them with one hyperparameter, and I want to test them with multiple data sizes. Each run takes ~60 ...

David Davó

812

asked Apr 2 at 15:43

0 votes

2 answers

75 views

How can I extract the data from profvis in R?

I'm using profvis to profile my functions in R, but I want to extract specific timings for subfunctions. For example if I run a = profvis({ dat <- data.frame( x = rnorm(5e4), y = ...

user19904

182

asked Mar 26 at 10:32

0 votes

1 answer

126 views

Why is my Python implementation of selection sort seemingly so fast? [closed]

I'm starting to study algorithms and their efficiency. I started with selection sort. Out of interest, I wanted to compare the running time of the same algorithm implemented in Python and C++...

Gwinkamp

51

asked Mar 23 at 9:04

0 votes

1 answer

177 views

DragonFly benchmark: slow on Cluster

I need help with Dragonfly DB, particularly with benchmarking. Here is the story: I tried benchmarking Dragonfly as a cache to replace Redis. I got the expected result when testing a single node; it ...

amzshow

58

asked Mar 13 at 6:24

1 vote

0 answers

81 views

How to implement a Swift analogue of `benchmark::DoNotOptimize`?

I would like to do some (micro)benchmarking in Swift. I have been using package-benchmark for this. It comes with a blackHole helper function that forces the compiler to assume that a variable is read ...

loonatick

1,197

asked Mar 7 at 16:22

7 votes

1 answer

177 views

Pointer chasing benchmark - unexpected lack of out of order execution?

I wanted a reliable benchmark which has a lot of cache misses, and the obvious go-to seemed to be a pointer chasing benchmark. I ended up trying google/multichase but the results don't seem to be what ...

Box Box Box Box

5,388

asked Mar 4 at 19:56

0 votes

0 answers

19 views

Tackling variability when recording execution times while benchmarking a library under Linux

We have designed benchmarks to test execution times of some algorithms coded within a library that we are working on. Those algorithms are mono-threaded. So the algorithms can use at most 100% of the ...

Arnaud

223

asked Feb 24 at 16:45

-2 votes

1 answer

101 views

Benchmarking Two C++ Implementations for Counting Pairs Divisible by 7

I have two C++ implementations that count pairs (x, y) satisfying (x + y) % 7 == 0. Method 1 skips unnecessary iterations using y += 6, while Method 2 checks every y. I performed benchmarks on ...

Chirag Jain

1

asked Feb 19 at 17:50

7 votes

1 answer

176 views

Why is it faster to do a branch than a lookup?

I have just been trying to benchmark some PRNG code for generating hex characters - I have an input of random values buf, which I want to turn into a series of hex characters, in-place. Code #define ...

Anon

381

asked Feb 3 at 5:12

6 votes

0 answers

97 views

Smaller Vec allocation in serialization results in slower code

There is a binary serialization called BorshSerialize. And in its Rust implementation, when alloc is enabled, it allocates a Vector with initial capacity of 1024 before each serialization. I thought ...

Ahmet Yazıcı

806

asked Jan 21 at 23:35

4 votes

0 answers

166 views

Why is Java Stream API filter faster than an imperative loop?

I have an important question. I performed a benchmark using the following tool versions: # JMH version: 1.37 # VM version: JDK 22, OpenJDK 64-Bit Server VM, 22+36-FR # VM invoker: /home/jack/.sdkman/...

Jack5000

49

asked Jan 18 at 19:25

1 vote

0 answers

126 views

How to increase the frequency of the CPU from C

I am writing C code for the Raspberry Pi 4 (ARM Cortex-A72), which relies on precise timing in periods of less than 1μs. To get precise timing, I use the following algorithm: clock_gettime(...

Pygmalion

921

asked Jan 18 at 16:59

0 votes

1 answer

76 views

Perform a benchmarking test on different cores on a VM Ubuntu system

I want to run benchmarking tests (BPFM, IOR, FIO & Sysbench) on an Ubuntu VM. The benchmark should use the available cores in powers of 2 (so 2, 4, 8, 16, ... up to the available ...

JulianW

1

asked Jan 18 at 16:10

0 votes

0 answers

37 views

WandB for benchmark results aggregation

I'm trying to use WandB to store and view benchmarking results. I've got a snippet of code that looks basically like this: for model in models_to_eval: wandb.init(project="benchmarking", ...

Sam Russell

281

asked Jan 11 at 20:59

0 votes

1 answer

98 views

Rust Conditional flag set by Cargo bench

A function which is only being used with debug assertions is deactivated by: #[cfg(debug_assertions)] fn some_debug_support_fn() {} But this makes cargo bench fail to compile, as it is missing the ...

Dávid Tóth

3,315

asked Jan 8 at 9:07

2 votes

1 answer

101 views

Getting a trace of memory address accesses using Valgrind

I have a microbenchmark which I'm using to generate memory traffic. I've profiled the application and it seems to constantly hit in L1 cache. I have a Core i5-7260U. I want to understand the actual ...

jkang

579

asked Jan 7 at 23:14

0 votes

1 answer

110 views

Why is this benchmark not measuring any branch prediction penalty?

I came across the question Wrapping real numbers which asks for help on implementing a "wrap around" functionality for double values within a given range, stating the following constraint: ...

null

5,825

asked Jan 2 at 21:34

-6 votes

2 answers

110 views

Two empty functions produce different runtime when measured in Rust with Instant::now calls

I wrote two empty functions in Rust, hoping to use them to compare retain on vectors with filter on iterators. After writing the empty function for each case, on running the ...

Brian Obot

428

asked Jan 1 at 0:27

1 vote

0 answers

33 views

Configure AirspeedVelocity for Python package with PyO3 and Maturin

I'm trying to set up Airspeed Velocity to use with my Python project. In other setups, such as GitHub workflows, the only build commands needed are: "python -m pip install --upgrade pip", ...

Attack68

4,821

asked Dec 22, 2024 at 7:39

0 votes

0 answers

92 views

How to accurately measure memory usage for iterative and recursive binary search algorithms in Java?

I am benchmarking the performance of iterative and recursive binary search algorithms in Java, specifically measuring both execution time and memory usage for different dataset sizes. However, I am ...

Ersin Karaduman

11

asked Dec 20, 2024 at 18:35

1 vote

1 answer

121 views

Cannot explain alloc/op in benchmark result

I have pretty basic benchmark comparing performance of mutex vs atomic: const ( numCalls = 1000 ) var ( wg sync.WaitGroup ) func BenchmarkCounter(b *testing.B) { var ...

vtm11

409

asked Dec 18, 2024 at 17:31

4 votes

0 answers

367 views

Facing issues with NATS JetStream performance with a large number of parallel consumers. Publisher throughput decreases by 50 times for 100 consumers

Overview: We have been using the nats bench CLI tool for benchmarking the performance of NATS JetStream over different variables. In our use-case, we require 100K ephemeral parallel consumers ...

Anish Gupta

66

asked Dec 17, 2024 at 17:06

16 votes

3 answers

1k views

C++ implementation of a simple map slower than equivalent implementation in Java: Code/Benchmark Issue?

The goal of this research is to explore the performance differences between JIT (just-in-time compilation) and AOT (ahead-of-time compilation) strategies and to understand their respective advantages ...

Joas Coder

228

asked Dec 10, 2024 at 11:42

0 votes

0 answers

37 views

What is the most "empty" Linux system call to benchmark against? [duplicate]

I want to benchmark some performance aspects of a Linux device driver (a loadable module). Specifically, how fast certain code paths are when they are invoked from userspace via system calls. In ...

Grigory Rechistov

2,418

asked Dec 9, 2024 at 14:41

5 votes

2 answers

211 views

Why are inner parallel streams faster with new pools than with the commonPool for this scenario?

So I recently ran a benchmark where I compared the performance of nested streams in 3 cases: Parallel outer stream and sequential inner stream Parallel outer and inner streams (using parallelStream) -...

Andorrax

123

asked Dec 4, 2024 at 5:10

1 vote

0 answers

130 views

Spec '06 benchmarks on Gem5: Increasing stack size by one page

I am trying to run the Spec '06 benchmarks on Gem5. All of the benchmarks I've tried seem to start up normally and then output the following and stall indefinitely: src/sim/mem_state.cc:448: info: ...

Martin Chapman

79

asked Dec 4, 2024 at 3:26
