1 vote
0 answers
24 views

I have IntelliJ Ultimate edition V2025.3, but "Profiler" does not exist in Settings/Preferences > Build, Execution, Deployment > Java Profiler. I have tried the below option as well, no ...
Hoda Alemi
Advice
1 vote
2 replies
139 views

What is the difference between an interrupt and a context switch? I understand the concept of an interrupt and how it occurs. However, I'm digging deeper into the topic. I studied Computer ...
Gabriele
3 votes
1 answer
154 views

I'm experimenting with measuring CPU instruction latency and throughput on P and E cores using RDPMC on Win 11, something like this: MOV ECX, 0x40000000 ; Instructions Counter RDPMC ; Read ...
Andrey Dmitriev
0 votes
1 answer
71 views

I am trying to test the impact of Cache Allocation Technology on my CPU. However, when I use either lscpu to check whether my CPU supports it, or cpuid -l 0x10, the output is false. How is this possible? How ...
Ali Hosseini
1 vote
1 answer
106 views

I've been digging into the idea of "true" randomness, and I've noticed that modern CPUs support instructions for generating randomness. x64 has the RDRAND instruction, while ARM has RNDR (I'm not ...
freakish
  • 57k
1 vote
1 answer
108 views

Building on this question here: the term thread divergence is used in CUDA; from my understanding it's a situation where different threads are assigned to do different tasks and this results in a big ...
bigcodeszzer
0 votes
1 answer
407 views

I’m running Silero VAD (via PyTorch + torchaudio) on a Linode cloud instance (2 dedicated CPUs, 4 GB RAM). When I process 10-minute audio chunks, I always get repeated warnings like this and it doesn'...
Uktamjon
7 votes
1 answer
228 views

I'm experimenting with the IMUL r64, r64 instruction on an Intel Xeon E5-1620 v3 (Haswell architecture, base clock 3.5 GHz, turbo boost up to 3.6 GHz, Hyper Threading is enabled). My test loop is ...
Andrey Dmitriev
2 votes
0 answers
74 views

I need to do CPU profiling for a JRuby application (JRuby version: 1.7.20.1-8), which uses Ruby version 1.9.3. I tried using the default profiler but am getting the below error due to a version compatibility issue ...
maulik trapasiya
0 votes
1 answer
58 views

Looking at the CPUUtilized CloudWatch metric for my Fargate service, it's showing max CPU units used as 1040 over the past 4 weeks, using a sampling period of 1 minute. I have 4 vCPUs provisioned to ...
Seanf123
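For the arithmetic in the question above, here is a minimal sketch. It assumes the usual ECS/Fargate convention that 1 vCPU corresponds to 1024 CPU units; the function name is made up for illustration and is not part of any AWS API.

```python
# Assumption: ECS/Fargate counts 1024 CPU units per vCPU.
CPU_UNITS_PER_VCPU = 1024


def cpu_utilization_percent(units_used: float, vcpus_provisioned: float) -> float:
    """Return used CPU units as a percentage of the provisioned capacity."""
    provisioned_units = vcpus_provisioned * CPU_UNITS_PER_VCPU
    return 100.0 * units_used / provisioned_units


# Figures from the question: a peak of 1040 units with 4 vCPUs provisioned.
peak = cpu_utilization_percent(1040, 4)
print(f"{peak:.1f}% of provisioned CPU")  # prints "25.4% of provisioned CPU"
```

Under that assumption, the 1040-unit peak is roughly a quarter of the 4096 units provisioned.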
0 votes
1 answer
178 views

I have a docker image and an EC2. When I run this image on my EC2, it takes x seconds to finish. When I run the app natively, it also takes x seconds. But if I deploy the exact image in a container in ...
wildcat
  • 81
2 votes
0 answers
220 views

I am measuring the latency of instructions. For 64-bit primitives, integer division takes about 25 cycles each, usually on my 2.3GHz Digital Ocean vCPU, while floating point division takes about 10 ...
Zack Light
0 votes
0 answers
75 views

Memory addresses must be aligned before they are used. I know that if they are not, there is a higher performance cost in CPU caching. I discovered that certain processors raise exceptions when unaligned memories ...
LEE LUNA
-3 votes
1 answer
114 views

I have a question regarding these two instructions: lw r2, 10(r1) lw r1, 10(r2) Is there a hazard here? Do I need stalls between the two of them? I want to know if any kind of hazard happens here. I ...
mer mer
  • 17
1 vote
0 answers
44 views

My code involves slicing large tensors on the CPU by index and asynchronously transmitting them back to the GPU. However, through the Profiler debugging tool, I found that this step would seriously ...
Ponytail
1 vote
0 answers
86 views

I think the title says it all: I have implemented one popcnt function that counts bits in a loop with shifts and one that uses inline asm with the actual CPU instruction. This is my C code: #define ...
newbee.a
2 votes
1 answer
135 views

We have some multimedia processing applications designed as a set of filters for processing data buffers. If temporal data in between filters is not very large and can fit in L1 or L2/L3 caches - the ...
DTL2020
  • 101
1 vote
0 answers
77 views

I'm doing an in-depth CPU microarchitectural resource analysis. I want to know the requirements of my program on processor microarchitectural resources and compare the requirements of different ...
Gerrie
  • 455
0 votes
0 answers
104 views

I have been writing my own x86 32-bit operating system for the past month or so. My system uses just one core. Anyway, I have been reading a lot about memory fences, CPU optimizations, and compiler ...
c.abate
  • 442
0 votes
0 answers
51 views

I'm currently working on a parallel and distributed computing project where I'm comparing the performance of XGBoost running on CPU vs GPU. The goal is to demonstrate how GPU acceleration can improve ...
Mxneeb
  • 19
1 vote
1 answer
283 views

I want to get the CPU temperature using Python code. I’m using Windows 11 24H2 and Python 3.10.6. I’ve already tried using WinTmp.CPU_Temp(): import WinTmp print(WinTmp.CPU_Temp()) >>> 0.0 ...
Tim Ryzikov
0 votes
1 answer
177 views

I have an Intel Arria 10 SoC FPGA system with 5.4.104-lts Linux built with Yocto 3.3.1 and Poky. The installed FPGA image does nothing more than generate interrupts to a UIO device 50 times a second. ...
yepp
  • 1
0 votes
0 answers
62 views

The Docker Compose project only returns this error in the logs and no more details, and even the twa process stops and stays on the first page, which is the splash-screen, and the process does not ...
Ali Ghorbani
2 votes
1 answer
117 views

It could operate identically on both 256-bit halves of a 512-bit AVX-512 register, like the identical operation on the 128-bit lanes of 256-bit registers in AVX/AVX2. Are there any technical reasons?
Akon
  • 481
1 vote
1 answer
289 views

I have a multithreaded Spring Boot microservice running in a Kubernetes pod with a CPU limit of 1 (1000m). Does this mean only one CPU core is used to run all my threads one by one, or can multiple ...
jashan khangura
0 votes
1 answer
115 views

In this article https://www.lighterra.com/papers/modernmicroprocessors it is stated that (under Multiple issue - Superscalar) the fetch and decode/dispatch stages must be enhanced so they can decode ...
Rishi
  • 41
1 vote
2 answers
429 views

I need to get the processor name using Delphi. Nothing fancy, I just need to retrieve what Windows System > About shows; in the example below, I want to retrieve the '13th Gen Intel(R) Core(TM) i9-...
delphirules
  • 7,780
-4 votes
1 answer
152 views

What exactly happens at the hardware level when a divergence occurs in SIMD and SIMT architectures, and how does each handle the execution of different instruction paths? I found this question, but ...
Rishi
  • 41
2 votes
1 answer
124 views

In Hyper-Threading (or SMT), when two threads of a CPU core get swapped in and out, does a context switch occur? Would it be called a context switch? If not, what is the terminology for it?
Rishi
  • 41
1 vote
2 answers
133 views

Okay, my question is probably dumb, but I can't find any answers that correct me. I learned that in DDR4 (let's say the stick has 8 chips) each chip contributes 8 bits in parallel to the 64-bit bus width. ...
Rishi
  • 41
6 votes
1 answer
206 views

I have recently been looking into the latency and throughput of CPU instructions and have even written some benchmarks to experiment. However, I am struggling to understand how to properly benchmark ...
mihai145
0 votes
2 answers
253 views

The following code is used for measuring CPU % usage. Public Sub Macro1() Dim strComputer As String Dim objWMIService As Object Dim colItems As Object Dim objItem As Object strComputer = ".&...
Kram Kramer
0 votes
1 answer
118 views

I'm configuring .bashrc on my Raspberry Pi 5 to automatically activate a virtual environment and limit the CPU cores from 4 to 1 when I navigate to a specific directory. When I move to a different ...
이정환
0 votes
1 answer
184 views

This is my code: (Get-Counter '\Processor(_Total)\% Processor Time').CounterSamples.CookedValue I am trying to get the average CPU utilization with Get-Counter, but every time I try I get this ...
mimi m
  • 71
0 votes
0 answers
49 views

Apologies for the very primitive question. I am using the online version of Jupyter notebook for some programming assignments because I have only an old and slow Chromebook - I did not want to download ...
Meep
  • 413
0 votes
1 answer
110 views

While benchmarking my Rocketcore CPU, I encountered a failed CoreMark run. After some debugging, I narrowed the issue down to unsuccessful global initialization of a 0 value. In CoreMark, it will ...
Jasminy
  • 119
1 vote
0 answers
151 views

I need to get % CPU currently used by my process on macOS. I expect it to be calculated this way, and it works on Windows: (currentAppTime - lastTrackedAppTime) * 100% / (currentSysTime - ...
Bibasmall
1 vote
1 answer
83 views

I have a simple Hello World program written in C, which I statically compiled using: gcc -static -fno-pie -o hello{1|2} hello.c. I expected that executing these two binaries would exhibit cache ...
Khrn
  • 354
1 vote
1 answer
128 views

What do the letters in the ports of the uops.info table mean? For example ADD (R64, R64) lists 1*p0156B at ports. The documentation says 1*p0156 means one microinstruction can be executed at ports 0, ...
asdfldsfdfjjfddjf
0 votes
0 answers
240 views

A simple Python script (Selenium + ChromeDriver): # import the By class, which allows you to choose how to search for an element from selenium.webdriver.common.by import By # initialize the browser ...
Sergey Saz
-1 votes
1 answer
143 views

As far as I know, L1 is VIPT for at least Intel chips. VIVT caches don't depend on address translation, so they can fully operate in parallel with TLB lookup. VIPT can also achieve some parallelism by ...
Devashish
  • 193
0 votes
0 answers
113 views

I was working on a CPU-only rendering project with SDL in C. I implemented very good error handling, and I got this error when I tried to resize the window: "ERROR: SDL Error in render thread: ...
Tejas Patil
1 vote
0 answers
126 views

I am writing C code for the Raspberry Pi 4 (ARM Cortex-A72), which relies on precise timing in periods of less than 1μs. To get precise timing, I use the following algorithm: clock_gettime(...
Pygmalion
  • 921
-1 votes
1 answer
98 views

There were 2 pods running in my microservice; both of them got restarted with the Kubernetes reason OOMKilled. (The above dashboard uses the following query->sum(0,...
Yash Arora
0 votes
1 answer
76 views

I want to perform a benchmarking test (BPFM, IOR, FIO & Sysbench) on an Ubuntu VM. The benchmark should use the available amount of cores in steps of 2^2 (so 2, 4, 8, 16, ... up to the available ...
JulianW
0 votes
1 answer
118 views

I'm currently training my simple prediction AI, but my GPU is training at 40 s per epoch while my CPU is training at 9 s per epoch. My CPU is an i7-4720HQ and my GPU is an Nvidia 950M. This is my code: `import ...
Vio Octavio
0 votes
1 answer
233 views

I’m building a SwiftUI app where I display the current time with seconds. I use the .numericText transition for the text to add a smooth animation whenever the seconds change. However, I’ve noticed a ...
user1569766
0 votes
2 answers
94 views

I am using ADB in a Java application to monitor Android device status every three seconds. Eight adb commands are used: adb shell settings get global airplane_mode_on adb shell settings get system ...
rejdrouin
  • 101
2 votes
0 answers
129 views

I timed a fairly naive BLAS-like matrix multiplication (DGEMM) function: void dgemm_naive(const int M, const int N, const int K, const double alpha, const double *A, const int lda, ...
ligro
  • 29
2 votes
1 answer
129 views

I need low-level information about the node, like the number of cores, core IDs, and other things that are part of the kubelet, from a pod running on the node. How do I get this?
imawful
  • 135
