5,501 questions
-6 votes, 0 answers, 43 views
Virtual Machine Clustering [closed]
I have a few old x86-64 computers of different makes lying around, plus a few active ones that are rarely used. I was thinking about setting up some type of cluster. I tried a few options like Docker ...
1 vote, 0 answers, 63 views
How to always launch a SLURM job from the same node, while running the job on multiple nodes
Due to a licensing agreement, I have a piece of software installed in our HPC that must always be launched from the same node in the system, referred to as the "head node" from now on. When ...
Advice: 2 votes, 7 replies, 141 views
Determine CPU after C++ compilation with GCC?
Does anyone know if there is any way in C++ to determine, at runtime, the CPU characteristics of the machine that compiled the code? For example, in GCC (which I'm using) the preprocessor variable ...
0 votes, 0 answers, 25 views
Integrate socket.io namespaces with Node Cluster
I am trying to integrate socket.io with Node's HTTP alongside Node's Cluster Module. Consider the reproducible example:
index.js:
let cluster = require('cluster')
let fs = require('fs')
let http = ...
0 votes, 0 answers, 60 views
Snakemake slurm cluster status in version 9.6.3
I am currently using Snakemake version 9.6.3 on a cluster managed by a SLURM scheduler. In previous workflows, I relied on version 6, which supported the --cluster, --cluster-status, and --parsable ...
0 votes, 0 answers, 51 views
Can I set a Slurm job array size using another job file?
I am currently running a slurm job file on an array that I manually set the size of, e.g.
sbatch --array=1-6 myjob.array
The size of the array is determined when setting up the code to run, e.g. if I ...
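A minimal sketch of one way to do this, assuming a small Python driver that works out the task count before submitting: options given to `sbatch` on the command line take precedence over the matching `#SBATCH` lines in the job file, so the array range can be computed at submit time. `build_array_command` and `submit_array` are hypothetical helper names, not part of Slurm.

```python
import subprocess
from typing import List


def build_array_command(job_file: str, n_tasks: int) -> List[str]:
    """Build an sbatch command whose array size is chosen at submit time.

    Hypothetical helper: n_tasks would come from whatever setup step
    decides how many pieces of work there are.
    """
    if n_tasks < 1:
        raise ValueError("need at least one array task")
    # Command-line options override the matching #SBATCH lines in job_file.
    return ["sbatch", f"--array=1-{n_tasks}", job_file]


def submit_array(job_file: str, n_tasks: int) -> str:
    """Submit the array job and return sbatch's stdout (the job ID line)."""
    cmd = build_array_command(job_file, n_tasks)
    result = subprocess.run(cmd, check=True, capture_output=True, text=True)
    return result.stdout.strip()
```

For example, `submit_array("myjob.array", 6)` runs the same command as `sbatch --array=1-6 myjob.array`; in practice the `6` would come from the setup step that decides the array size.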
0 votes, 0 answers, 215 views
CRC Status Shows 'OpenShift: Unreachable' Even After Multiple Restarts and Setup
I am trying to run Red Hat CodeReady Containers (CRC) with OpenShift 4.19.8 on an Ubuntu VM (running on VMware).
No matter what I do, crc status always shows:
crc status output (https://i.sstatic.net/...
0 votes, 1 answer, 65 views
ModuleNotFoundError in GCP after trying to submit a job
I'm new to GCP. I am trying to submit a job in Dataproc with a .py file and an attached pythonproject.zip file (it is a project), but I am getting the error below: ModuleNotFoundError: No module ...
0 votes, 0 answers, 117 views
Check if a server is part of an Azure cluster using C# and PowerShell
I am testing some code to check if a named server is part of an Azure cluster. I currently have a simple console application where the user enters the name of the server to check and the code then ...
0 votes, 1 answer, 81 views
How to concurrently remove lines from a file in Python?
I have a cluster of compute nodes, each node with many CPUs. I want them to execute commands located in a file, one command per line. The drive where the command file is located is mounted on all ...
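A minimal sketch of the locking idea, assuming a POSIX system and a shared filesystem whose mount actually honors `flock` (on NFS this depends on lock support being enabled): each worker takes an exclusive lock, removes the first line, and releases the lock, so no two workers can pop the same command. `pop_first_line` is a hypothetical name.

```python
import fcntl
import os
import tempfile
from typing import Optional


def pop_first_line(path: str) -> Optional[str]:
    """Atomically remove and return the first line of `path`.

    Assumes every worker uses this same function, because flock is
    advisory. Returns None once the file has no lines left.
    """
    with open(path, "r+") as f:
        fcntl.flock(f, fcntl.LOCK_EX)  # blocks until this worker owns the lock
        try:
            lines = f.readlines()
            if not lines:
                return None
            # Rewrite the file without its first line, then cut off the tail.
            f.seek(0)
            f.writelines(lines[1:])
            f.truncate()
            f.flush()
            os.fsync(f.fileno())
            return lines[0].rstrip("\n")
        finally:
            fcntl.flock(f, fcntl.LOCK_UN)
```

Each worker would call `pop_first_line` in a loop and stop when it returns `None`. Rewriting the whole file on every pop is fine for modest command lists; for very large ones, a per-worker offset or a proper job queue may scale better.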
2 votes, 1 answer, 72 views
How to identify price regimes / trends in Pandas
I have created the following pandas dataframe, which is an example of 26 stock prices (Open, High, Low, Close):
import pandas as pd
import numpy as np
ds = {
'Date' : ['15/06/2025','16/06/2025','17/...
1 vote, 0 answers, 112 views
Eigen + MKL on a cluster doesn't call GEMM for large matrix multiplications
I have C++ code that uses Eigen for linear algebra calculations. The code runs on a cluster that provides an Intel MKL module. I know very little about how to do this, but I wanted to try to ...
0 votes, 0 answers, 13 views
Proxmox migration and autoboot
I migrated my VM from one node to another in the cluster using the migrate function, without any downtime. These VMs were set to autoboot. Is this setting kept through the migration on a ...
1 vote, 0 answers, 47 views
Google Cloud Dataproc Cluster Creation Fails: "Failed to validate permissions for default service account"
Despite the Default Compute Engine Service Account having the necessary roles and being explicitly specified in my cluster creation command, I am still encountering the "Failed to validate ...
0 votes, 1 answer, 115 views
Issue while deploying container in GKE
I am getting this error while deploying in GKE :
Error from server: Get "https://10.x.x.x:10200/containerLogs/server-center/server-center-dev-86f67jkilo-rwrnm/server-center-dev": No agent ...
0 votes, 1 answer, 76 views
Snakemake access snakemake.config in profile config.yaml file
I want to run a pipeline on a cluster where the names of the jobs are of the form: smk-{config["simulation"]}-{rule}-{wildcards}. Can I just do:
snakemake --profile slurm --configfile ...
-1 votes, 1 answer, 78 views
Best practices for running high-granularity benchmark [closed]
I am trying to run a benchmark on some family of algorithms.
I have multiple algorithms, each of them with one hyperparameter, and I want to test them with multiple data sizes. Each run takes ~60 ...
1 vote, 1 answer, 104 views
Snakemake in cluster different ways
When running Snakemake on a cluster, if we don't have specific core/memory requirements for some rules, what is the difference between:
Using the classic way, i.e. calling ...
1 vote, 0 answers, 126 views
"invalid_grant" "Code not valid" in Keycloak with multiple containers using same client
Sorry if this matter was discussed before. I looked for something like that, but found nothing.
We have a scenario where we have a Keycloak, an NGINX proxy, four containers having a monolithic legacy ...
0 votes, 0 answers, 120 views
Undefined references when building gnina?
I'm trying to compile Gnina in my cluster session (CentOS 7, SLURM, no root/sudo access, NVIDIA GPU with CUDA 12.2).
I installed all dependencies and compiled them successfully, but when it comes to the final ...
2 votes, 1 answer, 37 views
Is --nodes 2 (without '=') accepted way of requesting nodes in slurm?
I just realized that I have always been using a Slurm script where, in the first line, I specify the number of nodes the wrong way. I see the two options are either #SBATCH -N 2 or #SBATCH --nodes=2. Instead I ...
0 votes, 0 answers, 27 views
Kafka duplicate message consumption from multiple servers under the same WebSphere cluster
We have Kafka consumers running on two WebSphere servers in the same cluster. The consumers are configured with auto offset commit, use the same consumer group, and poll every 5 seconds. The ...
0 votes, 0 answers, 47 views
AWS PCS cluster creation failed with cloud formation
I'm creating a complete HPC architecture on AWS using the AWS PCS service.
In my CloudFormation template, every resource is created successfully except AWS PCS.
Cluster:
  Type: AWS::PCS::Cluster
  ...
1 vote, 0 answers, 70 views
MaxRSS larger than ReqMem for Slurm job running Python. How come?
I am confused about how exactly allocated and used memory are defined for jobs on a cluster. I am limiting my jobs to use maximum of 150 GB, however they seem to be using ~650GB. This has not happened ...
0 votes, 1 answer, 158 views
How to determine buddy nodes dependencies in Vertica
How can I find in Vertica (Enterprise mode, K-safety 1) node dependencies so that I could build a node graph like this?
The following query:
select n.name, d.dependency_id
from v_internal....
0 votes, 1 answer, 36 views
Error in R: "one node produced an error: Argumento NA/NaN" when extracting raster values in package parallel version 4.2.2
I'm working on an R script to build a correlation matrix based on values extracted from WorldClim bioclimatic layers at specific geographic coordinates (WGS84).
I have a dataset (db) with 842 obs. of ...
0 votes, 0 answers, 69 views
Pacemaker dynamic location constraint with expression
I am experimenting with storage clusters using RHEL9.3 and GFS2 with DRBD replication.
So far I found a stable solution by using 3 nodes for main (one is DRBD Primary and mounts the DRBD disk, while ...
0 votes, 0 answers, 87 views
Node.js Cluster Not Distributing Load Evenly on Windows with Round-Robin Scheduling
I'm using the Node.js cluster module in a simple Express application on Windows, aiming to leverage multiple CPU cores for parallel request handling. My goal is to prevent blocking when CPU-intensive ...
0 votes, 1 answer, 172 views
How to utilize multiple CPUs for training of YOLO?
I have access to a large CPU cluster that does not have GPUs. Is it possible to speed up YOLO training by parallelizing between multiple CPU nodes?
The docs say that the device parameter specifies the ...
0 votes, 0 answers, 27 views
SLURM Job Array with HybPiper Restarting on Large Sequences
I am running HybPiper assembly on several samples using a SLURM cluster with job arrays. However, every 10 minutes, the SLURM job appears to restart from the beginning without any clear error messages ...
0 votes, 0 answers, 110 views
Problem when running custom flink jar application on cluster
I have a little problem when running a custom jar application on a cluster. First, I ran my custom jar application in a local flink installation:
/bin/flink run /home/osboxes/WordCount.jar --input ...
0 votes, 0 answers, 29 views
Clusters added to Mapbox GL using setPaintProperty for circle-color show the wrong color for some clusters when zooming in
We show cluster points on our map based on lat/long values according to the zoom level. We have 3 different colors, shown based on some condition: red, green ...
0 votes, 0 answers, 37 views
Galera Cluster (in the GMD GUI, I need a reference command to Recover Cluster)
In the cluster dropdown of the Galera Manager Daemon (gmd) GUI there is a Recover Cluster option, as shown in the image. This works fine, but requires me to press it manually:
Recover Cluster Button
What is ...
0 votes, 0 answers, 479 views
Can't properly configure GRES for Slurm
I'm configuring my GPU cluster and I have some problems. This is my slurm.conf file:
sudo cat /etc/slurm-llnl/slurm.conf
ClusterName=emotions
SlurmctldHost=oceano
#SlurmctldHost=
#
#DisableRootJobs=...
0 votes, 1 answer, 70 views
How to add shapefiles and raster files as covariates in spatstat kppm function
I encountered various errors while running the spatstat package. Here is a summary of my data: a shapefile of landslide point events, a watershed shapefile as the observation window, and several ...
0 votes, 0 answers, 91 views
flutter_map_marker_cluster opening of the cluster
While using flutter_map_marker_cluster is it possible to give padding to the markers that open when a cluster is clicked?
Since there is an appbar on top of the application screen and a bottom ...
0 votes, 0 answers, 35 views
Pre-staging large data files for parallel job execution
Apologies in advance if this is a mundane or unclear question.
I want to scale up a workflow on a cluster to run a program concurrently on several nodes. The program in question references a large, ...
1 vote, 0 answers, 99 views
Quartz scheduler standby mode in the whole cluster
I'm using Quartz Scheduler in cluster mode. For debugging purposes, I want to put all the server instances into standby mode.
Say we have three server instances:
node1
node2
node3
A separate ...
0 votes, 0 answers, 322 views
Error 'undefined symbol: _PyUnicode_Ready' after Ubuntu update
I'm using a computer cluster, where all machines are running Ubuntu 24.04 after an update, while the machine I use to access the cluster is running Ubuntu 22.04. The cluster runs with SLURM. In ...
1 vote, 0 answers, 313 views
Databricks workflow dbt tasks high startup time of 3 min
We are running DBT tasks within a workflow.
To run certain DBT models conditionally, we triggered them in separate tasks. Currently, every additional task we create adds 3 or more minutes to the total ...
0 votes, 1 answer, 219 views
How can I ensure that my Python logic runs exclusively on the Apache Ray Worker Nodes?
I am using Apache Ray to create a customized cluster for running my logic. However, when I submit my tasks with ray.remote, they are executing on the driver node rather than on the worker nodes I ...
0 votes, 2 answers, 349 views
Ridiculous VMEM usage when using Ray on a cluster
Initial Problem:
I am testing out a multiprocessing Python package called Ray to parallelise my code. The original code works fine on my laptop (Core i7-13800H, 32GB RAM). When running on a local cluster ...
1 vote, 0 answers, 114 views
Prioritize specific nodes in slurm job submission
Is there a way to prioritize certain nodes over others in a job submission without admin privileges?
I know about the --nodelist, --constraint or --exclude directives, but if set, the job runs only if ...
0 votes, 1 answer, 237 views
Launch one single SBATCH script that changes the number of nodes
I need to test the scalability of a distributed algorithm, and when running tests on my cluster of choice, I would love to dynamically set up the number of nodes in a single script.
What I want to ...
1 vote, 1 answer, 86 views
Snakemake remote rule stalling before executing script in PBS cluster
I have a Snakemake (7.22.0) workflow that stalls after its jobs start. I have rules that run on a cluster (through PBS) and execute an external Python script. I noticed that now some of the rules stall for ...
0 votes, 1 answer, 40 views
In Apache Storm, how do I change the supervisor ID?
I copied the VM that has a Storm supervisor on it. When I start both VMs with a master to set up a Storm cluster, the Storm UI shows only one supervisor.
In other words, both supervisors have the same ID, so ...
2 votes, 1 answer, 139 views
Spring boot Tomcat session replication with Traefik
I'm trying to set up session replication using Spring Boot with Traefik. I've found how it can be achieved with Tomcat and its server.xml file in the following link: Tomcat session replication in ...
0 votes, 2 answers, 1k views
what is the SLURM command to show usage of an account for all of its members?
I tried "sacct -A <allocation_name> --allusers --format=User,JobID,CPUTime,MaxRSS,Elapsed"
Yet, it shows the usage of resources for each job for each user of that account, while I need ...
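One way to get per-user totals is to aggregate `sacct` output yourself. This sketch assumes the `--parsable2` (`-P`) and `--noheader` (`-n`) output of `--format=User,CPUTimeRAW` with `-X` (allocations only, so one line per job) and sums CPU seconds per user; the function names are hypothetical.

```python
import subprocess
from collections import defaultdict
from typing import Dict


def cpu_seconds_by_user(sacct_output: str) -> Dict[str, int]:
    """Sum CPUTimeRAW (seconds) per user from `sacct -P -n` output.

    Expects lines of the form "User|CPUTimeRAW", as produced by
    --format=User,CPUTimeRAW together with --parsable2 and --noheader.
    """
    totals: Dict[str, int] = defaultdict(int)
    for line in sacct_output.splitlines():
        if not line.strip():
            continue
        user, cpu_raw = line.split("|", 1)
        if not user:  # job-step lines carry no user name; skip them
            continue
        totals[user] += int(cpu_raw or 0)
    return dict(totals)


def account_usage(account: str, start: str) -> Dict[str, int]:
    """Run sacct for all users of `account` since `start` (e.g. 2024-01-01)."""
    out = subprocess.run(
        ["sacct", "-A", account, "--allusers", "-X", "-P", "-n",
         "-S", start, "--format=User,CPUTimeRAW"],
        check=True, capture_output=True, text=True,
    ).stdout
    return cpu_seconds_by_user(out)
```

Slurm's `sreport cluster AccountUtilizationByUser` report may already give per-user totals for an account without any scripting, so it is worth checking that first.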
1 vote, 0 answers, 22 views
Why am I getting "TXN_REQUEST_IGNORED ERROR 10906" in GridDB due to an unknown event during cluster operations?
I’m using GridDB for a distributed database setup and recently encountered the following error while performing operations across nodes in the cluster:
from griddb_python import StoreFactory, ...
0 votes, 0 answers, 727 views
docker compose with Galera Cluster
I am setting up a Galera Cluster using Docker Compose on my CentOS AlmaLinux 8.
Here is the first attempt:
docker-compose.yml:
services:
  sv-isoluce-galeradb-1:
    build:
      context: ./buildimg/...