Newest 'cluster-computing' Questions

-6 votes

0 answers

43 views

Virtual Machine Clustering [closed]

I have a few old x86-64 computers of different makes lying around, plus a few active ones that are rarely used. I was thinking about setting up some type of cluster. I tried a few options like Docker ...

Wikus Groeninck

1

asked yesterday

1 vote

0 answers

63 views

How to always launch a SLURM job from the same node, while running the job on multiple nodes

Due to a licensing agreement, I have a piece of software installed in our HPC that must always be launched from the same node in the system, referred to as the "head node" from now on. When ...

cknott

11

asked Dec 10 at 4:36

Advice

2 votes

7 replies

141 views

Determine CPU after C++ compilation with GCC?

Does anyone know if there is any way in C++ to determine, at runtime, the CPU characteristics of the machine that compiled the code? For example, in GCC (which I'm using) the preprocessor variable ...

user3195869

105

asked Nov 22 at 2:17

0 votes

0 answers

25 views

Integrate socket.io namespaces with Node Cluster

I am trying to integrate socket.io with Node's HTTP alongside Node's Cluster Module. Consider the reproducible example: index.js: let cluster = require('cluster') let fs = require('fs') let http = ...

Issac Howard

351

asked Oct 24 at 14:05

0 votes

0 answers

60 views

Snakemake slurm cluster status in version 9.6.3

I am currently using Snakemake version 9.6.3 on a cluster managed by an SLURM scheduler. In previous workflows, I relied on version 6, which supported the --cluster, --cluster-status, and --parsable ...

jeje

11

asked Oct 20 at 13:39

0 votes

0 answers

51 views

Can I set a Slurm job array size using another job file?

I am currently running a Slurm job file on an array whose size I set manually, e.g. sbatch --array=1-6 myjob.array. The size of the array is determined when setting up the code to run. E.g. if I ...

Sam

1,522

asked Oct 6 at 14:02

0 votes

0 answers

215 views

CRC Status Shows 'OpenShift: Unreachable' Even After Multiple Restarts and Setup

I am trying to run Red Hat CodeReady Containers (CRC) with OpenShift 4.19.8 on an Ubuntu VM (running on VMware). No matter what I do, crc status always shows: crc status output (https://i.sstatic.net/...

cyrine maamer

1

asked Sep 22 at 21:48

0 votes

1 answer

65 views

ModuleNotFoundError in GCP after trying to submit a job

I'm new to GCP. I am trying to submit a job in Dataproc with a .py file and an attached pythonproject.zip file (it is a project), but I am getting the error below: ModuleNotFoundError: No module ...

SofiaNiki

1

asked Sep 3 at 11:16

0 votes

0 answers

117 views

Check if a server is part of an Azure cluster using C# and PowerShell

I am testing some code to check if a named server is part of an Azure cluster. I currently have a simple console application where the user enters the name of the server to check and the code then ...

Nigel Tunnicliffe

141

asked Sep 3 at 9:16

0 votes

1 answer

81 views

How to concurrently remove lines from a file in Python?

I have a cluster of compute nodes, each node with many CPUs. I want them to execute commands located in a file, one command per line. The drive where the command file is located is mounted on all ...

Botond

2,842

asked Jul 18 at 14:44

2 votes

1 answer

72 views

How to identify price regimes / trends in Pandas

I have created the following pandas dataframe, which is an example of 26 stock prices (Open, High, Low, Close): import pandas as pd import numpy as np ds = { 'Date' : ['15/06/2025','16/06/2025','17/...

Giampaolo Levorato

1,762

asked Jul 15 at 13:27

1 vote

0 answers

112 views

Eigen + MKL on a cluster doesn't call GEMM for large matrix multiplications

I have C++ code that uses Eigen for linear algebra calculations. The code runs on a cluster where there is a module with Intel MKL. I am completely ignorant of how to do it but I wanted to try to ...

Ratman

111

asked Jul 10 at 12:38

0 votes

0 answers

13 views

Proxmox migration and autoboot

I migrated my VM from one node to another in the cluster using the migrate function, without any downtime. These VMs were set to autoboot. Is this setting kept through the migration on a ...

btc4cash

325

asked Jun 12 at 23:39

1 vote

0 answers

47 views

Google Cloud Dataproc Cluster Creation Fails: "Failed to validate permissions for default service account"

Despite the Default Compute Engine Service Account having the necessary roles and being explicitly specified in my cluster creation command, I am still encountering the "Failed to validate ...

Lê Văn Đức

11

asked Jun 7 at 11:00

0 votes

1 answer

115 views

Issue while deploying container in GKE

I am getting this error while deploying in GKE : Error from server: Get "https://10.x.x.x:10200/containerLogs/server-center/server-center-dev-86f67jkilo-rwrnm/server-center-dev": No agent ...

SecureTech

259

asked May 21 at 14:33

0 votes

1 answer

76 views

Snakemake access snakemake.config in profile config.yaml file

I want to run a pipeline on a cluster where the job names take the form smk-{config["simulation"]}-{rule}-{wildcards}. Can I just do: snakemake --profile slurm --configfile ...

Kiffikiffe

153

asked Apr 2 at 16:02

-1 votes

1 answer

78 views

Best practices for running high-granularity benchmark [closed]

I am trying to run a benchmark on some family of algorithms. I have multiple algorithms, each of them with one hyperparameter, and I want to test them with multiple data sizes. Each run takes ~60 ...

David Davó

812

asked Apr 2 at 15:43

1 vote

1 answer

104 views

Different ways of running Snakemake on a cluster

When running Snakemake on a cluster, if we don't have specific core/memory requirements for some rules, what is the difference between: using the classic way, i.e. calling ...

Kiffikiffe

153

asked Apr 2 at 10:17

1 vote

0 answers

126 views

"invalid_grant" "Code not valid" in Keycloak with multiple containers using same client

Sorry if this has been discussed before; I searched for something similar but found nothing. We have a scenario where we have a Keycloak, an NGINX proxy, four containers having a monolithic legacy ...

Walter do Valle

11

asked Mar 25 at 8:45

0 votes

0 answers

120 views

Undefined references when building gnina?

I'm trying to compile Gnina in my cluster session (CentOS 7, Slurm, no root/sudo access, NVIDIA GPU with CUDA 12.2). I installed all dependencies and compiled them successfully, but when it comes to the final ...

Juju

49

asked Mar 19 at 14:21

2 votes

1 answer

37 views

Is --nodes 2 (without '=') an accepted way of requesting nodes in Slurm?

I just realized that I have always been using a Slurm script where, in the first line, I specify the number of nodes the wrong way. I see two options: either #SBATCH -N 2 or #SBATCH --nodes=2. Instead I ...

fahd

183

asked Mar 17 at 14:03

0 votes

0 answers

27 views

Kafka duplicate message consumption from multiple servers in the same WebSphere cluster

We have Kafka consumers running on two WebSphere servers in the same cluster. The consumers use auto offset commit and the same consumer group, and they poll every 5 seconds. The ...

Jmanroe

1

asked Mar 14 at 8:18

0 votes

0 answers

47 views

AWS PCS cluster creation fails with CloudFormation

I'm creating a complete HPC architecture on AWS using the AWS PCS service. In my CloudFormation template, every resource is created successfully except the AWS PCS cluster. Cluster: Type: AWS::PCS::Cluster ...

parthraj panchal

121

asked Mar 7 at 16:40

1 vote

0 answers

70 views

MaxRSS larger than ReqMem for a Slurm job running Python. How come?

I am confused about how exactly allocated and used memory are defined for jobs on a cluster. I am limiting my jobs to use maximum of 150 GB, however they seem to be using ~650GB. This has not happened ...

Newbie

11

asked Feb 26 at 23:29

0 votes

1 answer

158 views

How to determine buddy nodes dependencies in Vertica

How can I find in Vertica (Enterprise mode, K-safety 1) node dependencies so that I could build a node graph like this? The following query: select n.name, d.dependency_id from v_internal....

GriGrim

2,941

asked Feb 24 at 17:27

0 votes

1 answer

36 views

Error in R: "one node produced an error: Argumento NA/NaN" when extracting raster values in package parallel version 4.2.2

I'm working on an R script to build a correlation matrix based on values extracted from WorldClim bioclimatic layers at specific geographic coordinates (WGS84). I have a dataset (db) with 842 obs. of ...

Anderson David Ocampos Valarez

1

asked Feb 20 at 21:20

0 votes

0 answers

69 views

Pacemaker dynamic location constraint with expression

I am experimenting with storage clusters using RHEL9.3 and GFS2 with DRBD replication. So far I found a stable solution by using 3 nodes for main (one is DRBD Primary and mounts the DRBD disk, while ...

Fegendet

11

asked Feb 19 at 16:42

0 votes

0 answers

87 views

Node.js Cluster Not Distributing Load Evenly on Windows with Round-Robin Scheduling

I'm using the Node.js cluster module in a simple Express application on Windows, aiming to leverage multiple CPU cores for parallel request handling. My goal is to prevent blocking when CPU-intensive ...

Iqbal Mhd

1

asked Feb 6 at 10:46

0 votes

1 answer

172 views

How to utilize multiple CPUs for training of YOLO?

I have access to a large CPU cluster that does not have GPUs. Is it possible to speed up YOLO training by parallelizing across multiple CPU nodes? The docs say that the device parameter specifies the ...

Artem Lebedev

161

asked Jan 17 at 16:05

0 votes

0 answers

27 views

SLURM Job Array with HybPiper Restarting on Large Sequences

I am running HybPiper assembly on several samples using a SLURM cluster with job arrays. However, every 10 minutes, the SLURM job appears to restart from the beginning without any clear error messages ...

Siromenthe

1

asked Jan 16 at 21:04

0 votes

0 answers

110 views

Problem when running a custom Flink JAR application on a cluster

I have a little problem when running a custom JAR application on a cluster. First, I ran my custom JAR application in a local Flink installation: /bin/flink run /home/osboxes/WordCount.jar --input ...

ricksant

137

asked Jan 16 at 13:59

0 votes

0 answers

29 views

Clusters added to Mapbox GL using setPaintProperty for circle-color show the wrong color for some clusters when zooming in

We are showing cluster points on our map based on their lat/long values, according to the zoom level. We have 3 different colors, each shown based on some condition: red, green ...

Sandeep

1

asked Jan 16 at 11:17

0 votes

0 answers

37 views

Galera Cluster: need a reference command for Recover Cluster (GMD GUI)

In the Galera Manager Daemon (gmd) GUI, the cluster dropdown has a Recover Cluster option. This works fine, but it requires me to press the button manually. What is ...

user3600775

1

asked Jan 13 at 7:05

0 votes

0 answers

479 views

Can't properly configure GRES for Slurm

I'm configuring my GPU cluster and I have some problems. This is my slurm.conf file: sudo cat /etc/slurm-llnl/slurm.conf ClusterName=emotions SlurmctldHost=oceano #SlurmctldHost= # #DisableRootJobs=...

Ricardo Garcia

1

asked Jan 6 at 19:37

0 votes

1 answer

70 views

How to add shapefiles and raster files as covariates in spatstat kppm function

I encountered various errors while running the spatstat package. Here is a summary of my data: a shapefile of landslide point events, a watershed shapefile as the observation window, and several ...

alipin ng sahod

1

asked Dec 28, 2024 at 10:01

0 votes

0 answers

91 views

flutter_map_marker_cluster opening of the cluster

While using flutter_map_marker_cluster, is it possible to add padding to the markers that open when a cluster is clicked? Since there is an app bar at the top of the application screen and a bottom ...

Mustafa Samancı

29

asked Dec 19, 2024 at 7:41

0 votes

0 answers

35 views

Pre-staging large data files for parallel job execution

Apologies in advance if this is a mundane or unclear question. I want to scale up a workflow on a cluster to run a program concurrently on several nodes. The program in question references a large, ...

gladshire

81

asked Dec 17, 2024 at 16:36

1 vote

0 answers

99 views

Quartz scheduler standby mode in the whole cluster

I'm using Quartz Scheduler in cluster mode. For debugging purposes I want to put all the server instances into standby mode. Say we have three server instances: node1 node2 node3 A separate ...

z0nk0

313

asked Dec 6, 2024 at 10:37

0 votes

0 answers

322 views

Error 'undefined symbol: _PyUnicode_Ready' after Ubuntu update

I'm using a computer cluster, where all machines are running Ubuntu 24.04 after an update, while the machine I use to access the cluster is running Ubuntu 22.04. The cluster runs with SLURM. In ...

Caesar.tcl

103

asked Dec 3, 2024 at 16:34

1 vote

0 answers

313 views

Databricks workflow dbt tasks high startup time of 3 min

We are running DBT tasks within a workflow. To run certain DBT models conditionally, we triggered them in separate tasks. Currently, every additional task we create adds 3 or more minutes to the total ...

Siete

407

asked Nov 21, 2024 at 14:56

0 votes

1 answer

219 views

How can I ensure that my Python logic runs exclusively on the Apache Ray Worker Nodes?

I am using Apache Ray to create a customized cluster for running my logic. However, when I submit my tasks with ray.remote, they are executing on the driver node rather than on the worker nodes I ...

question.it

3,018

asked Nov 11, 2024 at 5:14

0 votes

2 answers

349 views

Ridiculous VMEM usage when using Ray on a cluster

Initial Problem: I am testing out a multiprocessing Python package called Ray to parallelise my code. The original code works fine on my laptop (Core i7-13800H, 32 GB RAM). When running on a local cluster ...

Bawb

33

asked Nov 4, 2024 at 11:51

1 vote

0 answers

114 views

Prioritize specific nodes in slurm job submission

Is there a way to prioritize certain nodes over others in a job submission without admin privileges? I know about the --nodelist, --constraint or --exclude directives, but if set, the job runs only if ...

Oskar

1,488

asked Oct 25, 2024 at 11:25

0 votes

1 answer

237 views

Launch one single SBATCH script that changes the number of nodes

I need to test the scalability of a distributed algorithm, and when running tests on my cluster of choice, I would love to dynamically set up the number of nodes in a single script. What I want to ...

Saverio Pasqualoni

33

asked Oct 16, 2024 at 16:46

1 vote

1 answer

86 views

Snakemake remote rule stalling before executing script in PBS cluster

I have a Snakemake (7.22.0) workflow whose jobs stall after they start. I have rules that run on a cluster (through PBS) and execute an external Python script. I noticed that now some of the rules stall for ...

Yotam Feldman

43

asked Oct 16, 2024 at 16:43

0 votes

1 answer

40 views

How to change the supervisor ID in Apache Storm

I copied the VM that has a Storm supervisor on it. When I start both VMs with a master to set up a Storm cluster, the Storm UI shows only one supervisor. In other words, both supervisors have the same ID, so ...

Simin Ghasemi

91

asked Oct 14, 2024 at 16:36

2 votes

1 answer

139 views

Spring boot Tomcat session replication with Traefik

I'm trying to set up session replication using Spring Boot with Traefik. I've found how it can be achieved with Tomcat and its server.xml file in the following link: Tomcat session replication in ...

Marian Smarik

53

asked Sep 26, 2024 at 8:46

0 votes

2 answers

1k views

What is the SLURM command to show usage of an account for all of its members?

I tried "sacct -A <allocation_name> --allusers --format=User,JobID,CPUTime,MaxRSS,Elapsed". However, it shows the resource usage of each job for each user of that account, while I need ...

Eslam Hussein

11

asked Sep 26, 2024 at 1:19

1 vote

0 answers

22 views

Why am I getting "TXN_REQUEST_IGNORED ERROR 10906" in GridDB due to an unknown event during cluster operations?

I’m using GridDB for a distributed database setup and recently encountered the following error while performing operations across nodes in the cluster: from griddb_python import StoreFactory, ...

Samar Mohamed

71

asked Sep 25, 2024 at 21:39

0 votes

0 answers

727 views

docker compose with Galera Cluster

I am setting up a Galera Cluster using Docker Compose on my AlmaLinux 8 machine. Here is the first attempt: docker-compose.yml : services: sv-isoluce-galeradb-1: build: context: ./buildimg/...

kiminox

9

asked Sep 23, 2024 at 8:57
