I have a single compute node with 32 CPUs, and I have defined two partitions that both use this node. If I submit two jobs to partition A, one requesting 20 CPUs and the other 25 CPUs, the second job waits for resources. If I instead submit them to the two separate partitions, 20 CPUs on A and 25 CPUs on B, both start. When I then run scontrol show node, it reports only 25 CPUs as allocated.
It seems that the two partitions are not respecting each other's resource allocations despite sharing the node. I have set OverSubscribe=NO on both partitions, with no success.
Below is my slurm.conf:
MpiDefault=none
ProctrackType=proctrack/linuxproc
TaskPlugin=task/affinity,task/cgroup
ReturnToService=1
#
##Debug
SlurmUser=slurm
RebootProgram="/usr/sbin/reboot"
SlurmctldPidFile=/var/run/slurmctld.pid
SlurmdPidFile=/var/run/slurmd.pid
SlurmdSpoolDir=/var/spool/slurmd
StateSaveLocation=/var/spool/slurmSave
# # SCHEDULING
SchedulerType=sched/builtin
SchedulerParameters=preempt_youngest_first,preempt_strict_order
PriorityType=priority/basic
SelectType=select/cons_res
SelectTypeParameters=CR_Core_Memory,CR_ONE_TASK_PER_CORE
PreemptType=preempt/partition_prio
PreemptMode=suspend,gang
SlurmctldParameters=preempt_send_user_signal
AccountingStorageType=accounting_storage/slurmdbd
AccountingStorageTRES=gres/license
ClusterName=simulation
#JobAcctGatherFrequency=30
JobAcctGatherType=jobacct_gather/cgroup
#SlurmctldDebug=debug5
#SlurmdDebug=debug5
SlurmctldLogFile=/var/log/slurmLog/slurmctld.log
SlurmSchedLogFile=/var/log/slurmLog/slurmsched.log
SlurmdLogFile=/var/log/slurmLog/slurmd.log
JobCompType=jobcomp/filetxt
# # Licenses as generic resources
GresTypes=license
# #
# # COMPUTE NODES
NodeName=cl014 Sockets=2 CoresPerSocket=16 ThreadsPerCore=1 State=UNKNOWN RealMemory=237852
PartitionName=A MaxTime=INFINITE PriorityTier=2 PreemptMode=suspend Default=NO Nodes=se-got-cl014 OverSubScribe=NO
PartitionName=B MaxTime=INFINITE PriorityTier=2 PreemptMode=suspend Default=NO Nodes=se-got-cl014 OverSubscribe=NO
What are we getting wrong?
Nodes=se-got-cl014 but NodeName=cl014?
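To make that kind of mismatch easy to spot, here is a minimal sketch of a checker that lists partition nodes with no matching NodeName line. It is a hypothetical helper, not part of Slurm: the function names are made up for illustration, and it assumes plain comma-separated hostnames, so it does not expand bracket ranges such as cl[01-04] or account for NodeHostname/NodeAddr aliases. Whether the mismatch is the actual cause of the behaviour in the question is not established here; it may simply be an artifact of editing the config before posting.

```python
def parse_slurm_conf(text):
    """Collect defined NodeName values and each partition's Nodes= list.

    Assumption: hostnames are plain, comma-separated names (no bracket ranges).
    """
    defined = set()
    partitions = {}
    for raw in text.splitlines():
        # Drop comments and surrounding whitespace
        line = raw.split("#", 1)[0].strip()
        if not line:
            continue
        # slurm.conf keywords are case-insensitive, so lowercase the keys
        fields = {}
        for token in line.split():
            if "=" in token:
                key, value = token.split("=", 1)
                fields[key.lower()] = value
        if "nodename" in fields:
            defined.update(fields["nodename"].split(","))
        elif "partitionname" in fields:
            nodes = fields.get("nodes", "")
            partitions[fields["partitionname"]] = [n for n in nodes.split(",") if n]
    return defined, partitions


def undefined_partition_nodes(text):
    """Return {partition: [nodes without a matching NodeName line]}."""
    defined, partitions = parse_slurm_conf(text)
    report = {}
    for name, nodes in partitions.items():
        # Nodes=ALL is a special value, not a hostname
        missing = [n for n in nodes if n not in defined and n.upper() != "ALL"]
        if missing:
            report[name] = missing
    return report


# The node and partition lines copied from the question
sample = """
NodeName=cl014 Sockets=2 CoresPerSocket=16 ThreadsPerCore=1 State=UNKNOWN RealMemory=237852
PartitionName=A MaxTime=INFINITE PriorityTier=2 PreemptMode=suspend Default=NO Nodes=se-got-cl014 OverSubScribe=NO
PartitionName=B MaxTime=INFINITE PriorityTier=2 PreemptMode=suspend Default=NO Nodes=se-got-cl014 OverSubscribe=NO
"""

print(undefined_partition_nodes(sample))
```

Run against the lines from the question, this reports se-got-cl014 as undefined for both A and B, while a config whose partition node names match the NodeName line yields an empty result.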