382 questions
Advice
0
votes
1
replies
71
views
Branch predictor training depends on call site? (Spectre experiment)
While analyzing the Spectre vulnerability, I ran into a question about how branch prediction training works.
My understanding is that the CPU accumulates prediction history for a specific conditional ...
Tooling
0
votes
1
replies
63
views
Simulating an AArch64 (64-bit ARM) branch predictor unit (BPU)
I am working on a microarchitectural tooling project, and as part of a heuristic I need the ability to observe and manipulate the internal state of a branch predictor. Specifically, I am looking for ...
2
votes
0
answers
290
views
Why is processing a sorted array faster? [closed]
So there is this original question that I assume most C++ developers are familiar with:
Why is processing a sorted array faster than processing an unsorted array?
Answer: branch prediction
Then I tried ...
25
votes
2
answers
4k
views
Does excessive use of [[likely]] and [[unlikely]] really degrade program performance in C++?
The C++ standard [dcl.attr.likelihood] says:
[Note 2: Excessive usage of either of these attributes is liable to result in performance degradation.
— end note]
I’m trying to understand what “...
3
votes
0
answers
119
views
How well can clang 20 infer the likelihood of branches without annotations?
I have a performance-critical C++ code base, and I want to improve (or at least measure if it's worth improving) the likelihood that clang assigns to branches, and in general understand what it's ...
3
votes
1
answer
140
views
What is the overhead of jumps and call-rets for CPU front-end decoder?
How do jumps and call-ret pairs affect the CPU front-end decoder in the best-case scenario, when there are few instructions, they are well cached, and branches are well predicted?
For example, I run a ...
4
votes
1
answer
120
views
Why prefer NOPs to unconditional jumps?
Sometimes we purposefully leave NOPs in a function for later runtime patching. Instead of:
.nops 16
Why not:
jmp 0f
.nops 14
0:
Or, if the amount that you need to patch in varies up to a maximum:
....
7
votes
1
answer
216
views
Why doesn't branch (mis)prediction impact performance (C++)?
While trying to measure the impact of branch misprediction, I've noticed that there is no penalty at all for a branch misprediction.
Based on the famous Stack Overflow question:
Why is processing a ...
2
votes
1
answer
104
views
Unconditional branch behaviour in not-taken branch prediction
I have the following code in nanoMips:
loop: lw $t1, A($t0)
lw $t2, B($t0)
sub $t3, $t1, $t2
beq $t3, $r0, else
sw $t2, A($t0)
b end
The exercise asks me to implement not-taken branch prediction ...
5
votes
0
answers
372
views
How can Windows update affect branch prediction on CPUs?
There were recently news reports and benchmarks showing that Ryzen processors (Zen 4 and Zen 5) benefit in games from the Windows update. AMD wrote in their blog that this is because of branch-prediction changes to ...
0
votes
1
answer
247
views
Can the number of wasted cycles per branch misprediction vary greatly? And why?
When learning about the basic 5-stage pipeline processor that does in-order execution, the number of wasted cycles per branch misprediction is a constant when the processor is flushed.
But what ...
3
votes
1
answer
387
views
Why don't x86/ARM CPUs just stop speculating on indirect branches when hardware prediction is not available?
As noted in the Intel optimization manual:
The default predicted target for indirect branches and calls is the
fall-through path. Fall-through prediction is overridden if and when a
hardware ...
0
votes
0
answers
34
views
Fastest way to check if a GUID of set size has been encountered before
There are many questions on finding a GUID in a list, etc., but I could not find any about just determining whether a message was seen before or not.
I have an API which receives requests with a ...
2
votes
1
answer
152
views
Why is branch prediction faster than a function call in an enum?
This is my code. I can totally understand that benchIfAndSwitch is faster than benchSiwtch because of “branch prediction”, but why is benchEnum not the fastest one? There is no if or switch statement.
...
0
votes
1
answer
244
views
Optimization of branch prediction inside hot loop in C++
I have performance-critical code which calculates an inter-atomic force field. It is controlled by variables like bPBC, shifts, doBonds, doPiSigma, doPiPiI, which can be switched on and off by the user, which ...
1
vote
0
answers
590
views
Understanding branch prediction and how predictors are selected
I don't think this is a duplicate, as this question is regarding how to write optimal code to cater to the branch predictor, as well as validating my personal understanding of how it works in general.
...
0
votes
0
answers
56
views
Wrap into Combine (Promise-Future) code with 2 completions/branches?
https://developers.google.com/admob/ios/privacy
class ViewController: UIViewController {
// Use a boolean to initialize the Google Mobile Ads SDK and load ads once.
private var ...
1
vote
0
answers
121
views
Problem while designing an enhanced pipeline datapath for branches (MIPS)
I'm studying the MIPS pipeline in the Patterson and Hennessy textbook.
The picture below shows the changes for the beq instruction:
The idea is to calculate the branch target and detect whether the branch is taken in the decode stage, ...
0
votes
1
answer
136
views
Use branch prediction with no else statement
I am currently implementing selection sort, using the code below and the driver file below to test it. I am trying micro-optimizations to see what speeds it up.
public static void ...
1
vote
1
answer
163
views
If I want to observe the prediction accuracy of different branches of O3CPU in gem5, should I modify the O3 code? If so, do I need to rebuild gem5?
In O3, only one algorithm, bpred_unit, is used, and gem5 also provides several other branch prediction algorithms. I want to compare the prediction accuracy of the different algorithms. What should I do? ...
1
vote
4
answers
401
views
Branch prediction and UB (undefined behavior)
I know a little something about branch prediction. This happens at the CPU and has nothing to do with compilation. Although you might be able to tell the compiler if one branch is more likely than the ...
1
vote
1
answer
97
views
I'd like to know why there's a two-fold difference in execution time between these two pieces of code
In computer architecture class, I learned that when the "if" statement is executed in assembly language, it involves the use of branch prediction strategies. Furthermore, it was emphasized ...
1
vote
3
answers
241
views
How to avoid branching when finding runs of the same value and storing as a range (like run-length encoding)?
I have the following logic:
struct Range {
int start;
int end;
};
bool prev = false;
Range range;
std::vector<Range> result;
for (int i = 0; i < n; i++) {
bool curr = ...; // this is ...
2
votes
1
answer
218
views
In C++, what is the importance, in terms of performance, of using 'else' in situations where it doesn't change the flow of the program? [duplicate]
There are cases where (logically at least) it makes no difference if I leave out the else keyword, for example:
int func(int num)
{
if(num == 10) return 99999;
**else** return -1;
}
Question ...
1
vote
1
answer
435
views
How to replace nested IF/ELSE branches with SIMD (SSE or AVX)?
EDIT x 2
Added a more comprehensive function which returns an abstract register class: the function outputs a register full of floats. I don't care about the actual length - SSE, AVX... - because Google ...
0
votes
0
answers
19
views
__builtin_expect_with_probability use in gcc [duplicate]
The GCC function __builtin_expect_with_probability is used for a condition check with a probability, as in the example below:
__builtin_expect_with_probability(!!(x),1,1.0)
Can someone tell me what is the ...
2
votes
1
answer
1k
views
How much does a mispredicted conditional branch cost?
On x86-64 (whatever the microarchitecture) and ARM64 devices, how many clock cycles does a mispredicted conditional branch cost? And I suppose I should also ask what the figure is for a successfully ...
1
vote
1
answer
337
views
How can I cause indirect (function pointer) call to be correctly jump/branch predicted?
Let's say I have a function that accepts a callback argument (examples given in Rust and C):
void foo(void (*bar)(int)) {
// lots of computation
bar(3);
}
fn foo(bar: fn(u32)) {
// lots of ...
1
vote
0
answers
186
views
Does branchless programming make sense on very old x86 CPUs? (before 80486)
Modern CPUs since at least the 486 ¹) have a tightly-pipelined design, so conditional branches can cause "stalls" in which the pipeline has to be flushed and the code restarted on a ...
0
votes
1
answer
548
views
How do I get the job ID of a batch prediction job on Vertex AI?
I need to get the prediction details of a batch prediction job, which are stored on Google Cloud Storage; however, to get them I need the job ID from BatchPredictionJob.
I tried to write the results to ...
1
vote
0
answers
284
views
Branch Prediction: What is the BTB eviction scheme used in modern CPUs (Intel skylake for example)?
For branch prediction, the BHT (branch history table) is indexed by the branch's virtual address. An aliasing problem occurs when two or more branches hash to the same entry in the BHT, ...
0
votes
2
answers
1k
views
Is 2-bit prediction always better than 1-bit?
Is 2-bit prediction always better than 1-bit? And how, per Wikipedia, is ‘a loop-closing conditional jump mispredicted once rather than twice’ with 2-bit prediction?
According to this answer, 2-bit ...
3
votes
0
answers
186
views
Why is there a connection between branch prediction failure and "rep ret" in the K8 processor?
I am currently looking for answers to why gcc generates strange instructions like "rep ret" in the generated assembly code. I came across a question on Stack Overflow where someone raised a ...
3
votes
1
answer
894
views
Why do we need stalls even if branches can be determined?
I am learning about pipelining and was reading about control hazards from the book Computer Organization and Design: The Hardware/Software Interface (MIPS Edition). There is a paragraph in the book (...
0
votes
0
answers
39
views
What is the depth of CPU branch prediction? [duplicate]
If the CPU is already speculatively executing the path of a branch A, will it continue to speculatively execute the next branch B, or wait until branch A retires?
if (A) {
/* body of branch A */
if(B) {
...
0
votes
0
answers
141
views
Optimizing a branch like a jump table? [duplicate]
I was wondering if I have a branch
bool condition = x > y; // just an example
if(condition)
{
// do the thing...
}
else
{
// do the other thing...
}
It can be optimized to something like this ...
3
votes
0
answers
160
views
Branch predictor friendly tree traversal
I have an AVL tree and I need to traverse it in ascending and descending order.
I implemented a simple algorithm, where knowing the tree size in advance, I allocate an array and assign 0 to a counter, ...
1
vote
0
answers
137
views
Influencing branchiness when branch behaviour is known
Before I begin, yes, I'm aware of the compiler built-ins __builtin_expect and __builtin_unpredictable (Clang). They do solve the issue to some extent, but my question is about something neither ...
3
votes
1
answer
237
views
How to view branch predictor tables of a process using a debugger (gdb)?
I know that most modern processors maintain a branch prediction table (BPT). I have read the gdb documentation, but I could not find any command that gives the desired results. Based on this, I have ...
7
votes
3
answers
2k
views
Is the if-branch faster than the else branch?
I came across this very nice infographic which gives a rough estimate of the CPU cycles used for certain operations. While studying it, I noticed an entry "Right branch of if" which I ...
-1
votes
1
answer
484
views
Is branch prediction purely CPU behavior, or will the compiler give some hints?
In the Go standard library file src/sync/once.go, a recent revision changed the snippet
if atomic.LoadUint32(&o.done) == 1 {
return
}
//otherwise
...
to:
//if atomic.LoadUint32(&o.done) == ...
7
votes
1
answer
735
views
How to handle branch mispredictions that seem to depend on machine code position?
While trying to benchmark implementations of a simple sparse unit lower triangular backward solve in CSC format, I observe strange behavior. The performance seems to vary drastically, depending on ...
0
votes
1
answer
546
views
How debuggers deal with out-of-order execution and branch prediction
I know that modern CPUs do OoO execution and have advanced branch predictors that may fail. How does the debugger deal with that? So, if the CPU fails in predicting a branch, how does the debugger know ...
5
votes
1
answer
2k
views
Rust generic parameters and compile time if
Using C++ templates and if constexpr, I found a trick that I like a lot: suppose you have a function with some tunable options that are known at compile time; I can write something like
template <bool ...
0
votes
3
answers
553
views
How good is the Visual Studio compiler at branch-prediction for simple if-statements?
Here is some C++ pseudo-code as an example:
bool importantFlag = false;
for (SomeObject obj : arr) {
if (obj.someBool) {
importantFlag = true;
}
obj.doSomethingUnrelated();
}
...
3
votes
0
answers
3k
views
Perf branch misses on non-conditional instructions
I want to understand branch prediction behavior of a program I work on. For this, I use the perf tool. I recorded with:
perf record -e branches,branch-misses
and visualized it with
perf report --...
5
votes
0
answers
150
views
If a function was entered via a near call, can it do a far tail call without breaking return address prediction?
Consider this code:
.globl _non_tail, _tail
.text
.code32
_non_tail:
lcall $0x33, $_non_tail.heavensgate
ret
.code64
_non_tail.heavensgate:
# do stuff. there's 12 bytes on the stack ...
6
votes
2
answers
969
views
Is there automatic L1i cache prefetching on x86?
I looked at the wiki article on branch target predictor; it's somewhat confusing:
I thought the branch target predictor comes into play when a CPU decides which instruction(s) to fetch next (into the ...
0
votes
0
answers
332
views
Can you produce the BEQL MIPS instruction with C code?
So I have this code snippet in C
int unit_test_case08(int a, int b)
{
int success = 1336;
if(a != b)
{
success = 1337;
}
else
{
success = -1;
}
return ...
0
votes
1
answer
198
views
How to profile branch prediction hit rate in Java
Is there a tool available to profile Java applications regarding branch (mis)prediction statistics for if statements?
I know VisualVM and JDK Mission Control but did not find such functionality.