2 votes
1 answer
56 views

The optimization stage of building my rust program takes a really long time. I determined this by adjusting the cargo profile settings opt-level and lto. I want to better understand what is happening ...
JamesThomasMoon's user avatar
Tooling
0 votes
5 replies
122 views

I'm building a custom programming language and want performance close to C. Currently I'm deciding between: Generating C code and compiling with GCC/Clang Generating LLVM IR directly Using a JIT My ...
Kavyansh Sharma's user avatar
3 votes
1 answer
161 views

I'm trying to get a deeper understanding about the implementation of std::launder. I'll use a classical example to illustrate: #include <cstddef> #include <memory> #include <new> ...
Oersted's user avatar
  • 4,816
0 votes
0 answers
113 views

I am writing my first interpreter and am interested in using tail calls to make branch prediction better. Consider I have something like template<void *Handler> void wrapper(Cpu& cpu) { ...
Kryptic Coconut's user avatar
Advice
1 vote
3 replies
138 views

I've been looking at the zlib1.dll that comes with Win 11 Pro and I was hoping for some assistance with the following passage: 56b:    b8 71 80 07 80          mov    eax,0x80078071 [570:    41 0f 42 ...
Gyst's user avatar
  • 1
5 votes
1 answer
187 views

As far as I know the fastest assembly code for testing that a floating point variable contains Not-a-Number (std::isnan) is comparing it to itself, where not-equal will signify NaN. And as I can see, ...
Fedor's user avatar
  • 25.7k
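The self-comparison idea described in this excerpt can be sketched as below. This is only an illustration under the assumption of IEEE 754 semantics; the helper names `is_nan_by_self_compare` and `check_is_nan_by_self_compare` are hypothetical and are not taken from the question.

```cpp
#include <cassert>
#include <cmath>
#include <limits>

// Hypothetical helper name, not taken from the question.
// Under IEEE 754, NaN is the only value for which x != x holds.
bool is_nan_by_self_compare(double x) {
    return x != x;
}

// Small self-check that compares the helper against std::isnan.
void check_is_nan_by_self_compare() {
    const double nan = std::numeric_limits<double>::quiet_NaN();
    assert(is_nan_by_self_compare(nan) == std::isnan(nan));
    assert(is_nan_by_self_compare(1.5) == std::isnan(1.5));
}
```

With options such as -ffast-math or -ffinite-math-only, a compiler may assume NaN never occurs and fold the comparison to false, so the sketch only holds under strict floating-point semantics.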
3 votes
2 answers
155 views

I'm using the www.godbolt.org website with the x86-64 gcc 15.2 compiler and the -O2 optimization option, but it still makes a call to strcpy in libc. I would like this compiler to ...
blogger13's user avatar
  • 363
5 votes
2 answers
155 views

Recently, when working with C atomics, I've noticed that there are two variants: a "normal" version, whose order is always seq_cst, and an "explicit" version, where the programmer ...
lzg's user avatar
  • 123
2 votes
3 answers
260 views

This question is about the singleton pattern in modern C++ and one of its limitations in particular. I can implement the singleton pattern like this: class Logger { public: static Logger& ...
Hendrik's user avatar
  • 806
5 votes
3 answers
242 views

I noticed there was a double precision divide after all log2 calls when using Cygwin 64-bit 3.6.4 and using compiler options -O3 and -D_FILE_OFFSET_BITS=64, so I wrote a small program to illustrate it....
George Rambus's user avatar
3 votes
0 answers
130 views

gcc (GCC) 11.2.1 20220127 (Red Hat 11.2.1-9) comparing 2 identical strings with strcmp() sometimes returns a negative value. The code is compiled with -O and strcmp() seems to be inlined. ...
UBA's user avatar
  • 109
13 votes
3 answers
3k views

Both amd64 and arm64 architecture processors have an overflow flag. However, in C, the most common method to detect whether an operation causes overflow/underflow is to make functions like these: int ...
Daniil Zuev's user avatar
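The code in this excerpt is cut off, so the following is not the asker's function. It is a minimal sketch of one common portable pattern for detecting signed addition overflow before it happens; the name `add_would_overflow` is an assumption made for illustration.

```cpp
#include <cassert>
#include <limits>

// Hypothetical helper: reports whether a + b would overflow int.
// It uses only subtractions and comparisons that cannot overflow themselves.
bool add_would_overflow(int a, int b) {
    if (b > 0) {
        // a + b > INT_MAX  <=>  a > INT_MAX - b
        return a > std::numeric_limits<int>::max() - b;
    }
    if (b < 0) {
        // a + b < INT_MIN  <=>  a < INT_MIN - b
        return a < std::numeric_limits<int>::min() - b;
    }
    return false;  // adding zero never overflows
}
```

GCC and Clang also offer `__builtin_add_overflow`, which in many cases lets the compiler use the hardware overflow flag instead of explicit range comparisons.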
6 votes
0 answers
248 views

Clang's optimizer is inserting a run-time test for instruction selection into a tight loop. How can I communicate to it that the condition it's checking is unnecessary? Details Here's a function to ...
Adrian McCarthy's user avatar
0 votes
0 answers
49 views

I am trying to implement this rewrite rule from the TASO paper with ONNX Script rewriter. However, I cannot figure out how to implement a pattern with multiple outputs X and Y. The ONNX Script does ...
Anita Hailey's user avatar
Best practices
0 votes
0 replies
34 views

Question I am working on an experimental project where I aim to have a large language model (LLM) automatically optimize CUDA kernels’ nested loops. The key requirement is to extract static loop and ...
yuxuan-z's user avatar
-2 votes
1 answer
221 views

So I have the following code: float param1 = SOME_VALUE; switch (State) { case A: { foo(param1); statement1; break; } case B: { bar(); ...
Kai Yin's user avatar
Advice
2 votes
7 replies
150 views

Does anyone know if there is, in c++, any way to determine at runtime the cpu characteristics of the machine that compiled the code? For example, in gcc (which I'm using) the preprocessor variable ...
user3195869's user avatar
3 votes
2 answers
312 views

I have two functions counting the occurrences of a target char in the given input buffer. The functions vary only in how they communicate the result back to the caller; one returns the result and the ...
Devashish's user avatar
  • 213
3 votes
1 answer
114 views

With avr-gcc 14, the compiler gives this warning: avrenv.c: In function ‘main’ avrenv.c:12:13: warning: ‘strncpy’ output may be truncated copying 255 bytes from a string of length 509 [-Wstringop-...
Torsten Römer's user avatar
Best practices
2 votes
10 replies
316 views

In C, when I pass a pointer to a function, the compiler always seems to assume that the data pointed to by that pointer might be continuously modified in another thread, even though in actual API ...
Moi5t's user avatar
  • 465
0 votes
1 answer
127 views

I have to use dl library for calling libapt-pkg functions. The method I need to call is pkgCacheFile::GetPkgCache() that is declared as inline. The problem is that dlsym returns error while trying to ...
nst1911's user avatar
  • 73
4 votes
5 answers
693 views

I'm talking from a language point of view in C or C++, where the compiler sees: return condition ? a : b; vs: if (condition) return a; else return b; I've tried in my code, and both of them ...
Zebrafish's user avatar
  • 16.9k
2 votes
2 answers
155 views

Using CMAKE_BUILD_TYPE="Debug" my MSVC 2022 [17.4.33403.182] produced one idiv call for the quotient and an identical idiv call for the remainder. The code was simply [see here for the ...
JohannesWilde's user avatar
26 votes
2 answers
4k views

The C++ standard [dcl.attr.likelihood] says: [Note 2: Excessive usage of either of these attributes is liable to result in performance degradation. — end note] I’m trying to understand what “...
Artyom Fedosov's user avatar
3 votes
0 answers
134 views

I have a performance-critical C++ code base, and I want to improve (or at least measure if it's worth improving) the likelihood that clang assigns to branches, and in general understand what it's ...
meisel's user avatar
  • 2,625
0 votes
0 answers
52 views

Nodes: I am building a gcc_tree_node for a custom programming language, compiled and based on C++26. Modules are available, the language uses a tab-block system, and every keyword starts with '/'. I want to ...
Adam Bekoudj's user avatar
1 vote
1 answer
154 views

Consider these two functions: int foo(std::array<int, 10> const& v) { auto const w = v; int s{}; for (int i = 0; i < v.size(); ++i) { s += (i % 2 == 0 ? v : w)[i]; ...
Enlico's user avatar
  • 30.8k
5 votes
3 answers
380 views

I thought that the noinline function attribute would force the compiler to treat a local function as a black box: __attribute__((noinline)) void touch_noinline(int&) {} void touch_external(int&...
sh1's user avatar
  • 5,020
0 votes
1 answer
94 views

I was testing various expressions of a sixth-order polynomial to find the fastest possible throughput. I have stumbled upon a simple polynomial expression of length 6 that provokes poor code generation ...
Martin Brown's user avatar
  • 3,945
2 votes
0 answers
64 views

Preparing to make Estrin's method vectorisable I changed from normal linear indexing of the coefficients to bitreversed and restricted it to strictly powers of 2. Neither MSVC nor ICX can see how to ...
Martin Brown's user avatar
  • 3,945
1 vote
1 answer
134 views

Looking at the codegen of a check inside a for-loop, I wanted to see if there is an optimization opportunity by outlining is_some_and, but both cases had the same codegen. struct V { len: Option<...
A. K.'s user avatar
  • 39.8k
5 votes
1 answer
201 views

In the classic talk An (In-)Complete Guide to C++ Object Lifetimes by Jonathan Müller, there is a useful guideline as follows: Q: When do I need to use std::launder? A: When you want to re-use the ...
xmllmx's user avatar
  • 44.8k
7 votes
1 answer
340 views

When compiling the following code using GCC 9.3.0 with O2 optimization enabled and running it on Ubuntu 20.04 LTS, x86_64 architecture, unexpected output occurs. #include <algorithm> #include &...
mzd6id99's user avatar
1 vote
2 answers
201 views

The switch statements in the following two functions int foo(int value) { switch (value) { case 0: return 0; case 1: return 0; case 2: return 1; } } int ...
notgapriel's user avatar
1 vote
0 answers
101 views

Say I have an array of integers, int array[NUM_ELEMENTS];, and access to it is encapsulated in setter and getter functions that are well protected by synchronization such as a semaphore, mutex, etc. Do I need to ...
PkDrew's user avatar
  • 2,311
4 votes
1 answer
160 views

I need (only) the real part of the product of two complex numbers. Naturally, I can code this as real(x)*real(y) - imag(x)*imag(y); or real(x*y); The latter, however, formally first computes the ...
Walter's user avatar
  • 45.9k
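The two formulations named in this excerpt can be put side by side as in the sketch below. The function names are hypothetical, and the sketch makes no claim about which form a given compiler optimizes better.

```cpp
#include <cassert>
#include <complex>

// Hypothetical helper names, for illustration only.

// Computes only the real part: Re(x) * Re(y) - Im(x) * Im(y).
double real_of_product_direct(const std::complex<double>& x,
                              const std::complex<double>& y) {
    return x.real() * y.real() - x.imag() * y.imag();
}

// Forms the full complex product first, then discards the imaginary part.
double real_of_product_full(const std::complex<double>& x,
                            const std::complex<double>& y) {
    return std::real(x * y);
}
```

For finite inputs both functions describe the same mathematical value. Whether the unused imaginary part is eliminated in the second form depends on the compiler and flags, so inspecting the generated assembly is the only reliable check.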
29 votes
1 answer
4k views

I noticed that modern C compilers typically use push instructions to save caller-saved registers, rather than explicit mov + sub sequences. However, based on llvm-mca simulations, the mov approach ...
Moi5t's user avatar
  • 465
5 votes
1 answer
292 views

I mainly use clang, but I have also explored other compilers during my experiments, such as MinGW GCC and MSVC, but they all have this problem. E:\code\test>clang -v clang version 20.1.7 Target: ...
Moi5t's user avatar
  • 465
10 votes
3 answers
1k views

I mainly use Clang, but I have also explored other compilers during my experiments, such as MinGW GCC and MSVC, but they all have this problem. cd C:\Users\Moi5t clang -v Output: clang version 20.1.7 ...
Moi5t's user avatar
  • 465
1 vote
2 answers
188 views

I'm writing a C function float foo(float x) which manipulates a floating-point value. It so happens that I can guarantee the function will only ever be called with finite values - neither NaNs, nor +...
einpoklum's user avatar
  • 139k
0 votes
1 answer
133 views

Consider the following CUDA code: enum { p = 5 }; __device__ float adjust_mul(float x) { return x * (1 << p); } __device__ float adjust_ldexpf(float x) { return ldexpf(x, p); } I would expect ...
einpoklum's user avatar
  • 139k
4 votes
0 answers
127 views

MSVC seems to be taking the values from my array of coefficients and scattering them around in its .rdata section, not keeping them contiguous even though they're all used together. And it takes the ...
Martin Brown's user avatar
  • 3,945
13 votes
1 answer
1k views

#include <iostream> #include <new> struct A { int const n; void f() { new (this) A{2}; } void g() { std::cout << this->n; } void h() { ...
xmllmx's user avatar
  • 44.8k
0 votes
0 answers
147 views

The tiered steps provided by Oracle are: It seems to me that... It'd be a reasonable assumption to think that optimizations should occur with methods in isolation (detached from their call-site context),...
Delark's user avatar
  • 1,395
31 votes
2 answers
5k views

In case of failure, malloc returns a null pointer. In the following code the latest GCC and clang assume malloc never fails and simply remove the branch #include <cstdlib> int main() { if (!...
Dmitry's user avatar
  • 1,779
0 votes
0 answers
81 views

I am not sure whether this is the correct category/group to ask this question. I see that there is a way to map an op to external function: https://mlir.llvm.org/docs/Dialects/Linalg/ Property 5: May ...
knightyangpku's user avatar
2 votes
0 answers
104 views

I am currently reading through the F# core library source code and stumbled upon a common pattern which made me wonder a little about the performance of it, and could not find anything about it by a ...
kam's user avatar
  • 687
0 votes
1 answer
120 views

I've recently switched from Apple clang 9.1.0 to 12.0.0 and I've noticed that the generated code is now somewhat bloated. Here's a little test project: const char *s = "Hello World"; ...
Andreas's user avatar
  • 10.8k
3 votes
2 answers
202 views

Say I have the following C function: i64 my_comparator1(i64 x, i64 y) { if (x > y) { return 1; } if (x < y) { return -1; } return 0; } If I happen to know something about arguments that ...
KAction's user avatar
  • 667
1 vote
0 answers
127 views

I'm trying to write a function analysis whose results are accessible by a few custom passes in LLVM that work on functions and loops in the new pass manager. Unfortunately documentation for the new ...
Otto von Bisquick's user avatar
