
I have a single compute node with 32 CPUs, and I have defined two partitions that both use this node. If I submit two jobs to partition A, one requesting 20 CPUs and the other 25 CPUs, the second job waits for resources. If I instead submit them to separate partitions, 20 CPUs on A and 25 CPUs on B, both start. When I then run scontrol show node, it reports only 25 CPUs as allocated.

It seems that the two partitions do not respect each other's resource allocations even though they share the node. I have set OverSubscribe=NO on both partitions, without success.
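To make the arithmetic explicit, here is a toy check of why the two jobs should not be able to run together. It is only a sketch of the expected accounting, not Slurm's actual scheduling logic, and the names `NODE_CPUS` and `fits` are made up for illustration.

```python
# Toy model of the expected CPU accounting on the shared node.
# NODE_CPUS and fits() are hypothetical names, not part of Slurm.
NODE_CPUS = 32  # Sockets=2 * CoresPerSocket=16 * ThreadsPerCore=1


def fits(requests, capacity=NODE_CPUS):
    """Return True if all CPU requests fit on the node at the same time."""
    return sum(requests) <= capacity


requests = [20, 25]  # job on partition A, job on partition B
print("total requested:", sum(requests))
print("fits without oversubscription:", fits(requests))
```

The two requests add up to 45 CPUs against a capacity of 32, so without oversubscription only one of the jobs should be running at any given time, whichever partition it was submitted to.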

Below is my slurm.conf

MpiDefault=none
ProctrackType=proctrack/linuxproc
TaskPlugin=task/affinity,task/cgroup
ReturnToService=1
#
##Debug
SlurmUser=slurm
RebootProgram="/usr/sbin/reboot"

SlurmctldPidFile=/var/run/slurmctld.pid
SlurmdPidFile=/var/run/slurmd.pid
SlurmdSpoolDir=/var/spool/slurmd
StateSaveLocation=/var/spool/slurmSave
# # SCHEDULING
SchedulerType=sched/builtin
SchedulerParameters=preempt_youngest_first,preempt_strict_order
PriorityType=priority/basic

SelectType=select/cons_res
SelectTypeParameters=CR_Core_Memory,CR_ONE_TASK_PER_CORE
PreemptType=preempt/partition_prio
PreemptMode=suspend,gang
SlurmctldParameters=preempt_send_user_signal

AccountingStorageType=accounting_storage/slurmdbd
AccountingStorageTRES=gres/license
ClusterName=simulation
#JobAcctGatherFrequency=30
JobAcctGatherType=jobacct_gather/cgroup
#SlurmctldDebug=debug5
#SlurmdDebug=debug5
SlurmctldLogFile=/var/log/slurmLog/slurmctld.log
SlurmSchedLogFile=/var/log/slurmLog/slurmsched.log
SlurmdLogFile=/var/log/slurmLog/slurmd.log
JobCompType=jobcomp/filetxt
# # Licenses as generic resources
GresTypes=license
# #
# # COMPUTE NODES
NodeName=cl014 Sockets=2 CoresPerSocket=16 ThreadsPerCore=1 State=UNKNOWN RealMemory=237852

PartitionName=A MaxTime=INFINITE PriorityTier=2 PreemptMode=suspend Default=NO Nodes=se-got-cl014 OverSubScribe=NO
PartitionName=B MaxTime=INFINITE PriorityTier=2 PreemptMode=suspend Default=NO Nodes=se-got-cl014 OverSubscribe=NO

What are we getting wrong?

  • Isn't it odd that you have Nodes=se-got-cl014 but NodeName=cl014? Commented May 19 at 14:30
