[slurm-users] Reservation with REPLACE_DOWN flag not replacing down nodes

I have a reservation with the REPLACE_DOWN flag set:

ReservationName=test StartTime=2020-12-14T09:00:00 EndTime=2023-12-14T09:00:00 Duration=1095-00:00:00
   Nodes=traverse-k05g4 NodeCnt=1 CoreCnt=32 Features=(null) PartitionName=all Flags=IGNORE_JOBS,WEEKDAY,SPEC_NODES,REPLACE_DOWN,NO_HOLD_JOBS_AFTER_END
   TRES=cpu=128
   Users=(null) Groups=(null) Accounts=pppl,csi,pu,tromp,cses Licenses=(null) State=ACTIVE BurstBuffer=(null) Watts=n/a
   MaxStartDelay=(null)

Unfortunately, the one node in that reservation is down, and the
reservation isn’t being moved to another node:

# sinfo -n traverse-k05g4
PARTITION AVAIL  TIMELIMIT  NODES  STATE NODELIST
all*         up 15-00:00:0      1  down* traverse-k05g4
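
In case it helps anyone reproduce the check, here is a minimal sketch of how the node list and flags could be pulled out of the reservation record and compared against sinfo. The helper names resv_field and has_flag are made up for illustration, and the parsing assumes the space-separated key=value layout shown in the reservation output above.

```shell
#!/bin/sh
# Hypothetical helpers (not part of Slurm) for reading reservation records.

# resv_field FIELD: read `scontrol show reservation` text on stdin and
# print the value of the first FIELD=value pair (e.g. Nodes, Flags).
resv_field() {
    # Put each key=value pair on its own line, then match on the key.
    tr -s ' \t' '\n' |
        awk -F= -v key="$1" '$1 == key { print substr($0, length(key) + 2); exit }'
}

# has_flag FLAG: read a comma-separated flag list on stdin and succeed
# only if FLAG appears in it as a whole entry.
has_flag() {
    tr ',' '\n' | grep -qxF "$1"
}

# Intended use on a cluster (commented out so the helpers can be sourced
# on a machine without Slurm installed):
#   nodes=$(scontrol show reservation test | resv_field Nodes)
#   scontrol show reservation test | resv_field Flags | has_flag REPLACE_DOWN \
#       && echo "REPLACE_DOWN is set"
#   sinfo -N -h -n "$nodes" -o '%N %T'
```

This only reports what the reservation currently holds and the state of those nodes; it does not change the reservation.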


I thought that if I removed the node from the reservation, the
reservation would just get assigned to a different node, or that
removing the SPEC_NODES flag would accomplish the same thing, but
scontrol rejected both attempts:

# scontrol update reservationname=test nodes-=traverse-k05g4
scontrol: error: Reservation can't be updated with Nodes option; it is incompatible with REPLACE[_DOWN]
Error updating the reservation: Requested operation not supported on this system
slurm_update error: Requested operation not supported on this system
# scontrol update reservationname=test flags-=spec_nodes
scontrol: error: Error parsing flags -spec_nodes.  No reservation update.
slurm_update error: No error

Any ideas about what I’m doing wrong here, or what I can do to get
this reservation assigned to nodes that are up? I’m trying to
avoid deleting the entire reservation and creating a new one, if
possible.

-- 
Prentice
