I have this reservation which has the REPLACE_DOWN flag set:
ReservationName=test StartTime=2020-12-14T09:00:00 EndTime=2023-12-14T09:00:00 Duration=1095-00:00:00
   Nodes=traverse-k05g4 NodeCnt=1 CoreCnt=32 Features=(null) PartitionName=all Flags=IGNORE_JOBS,WEEKDAY,SPEC_NODES,REPLACE_DOWN,NO_HOLD_JOBS_AFTER_END
   TRES=cpu=128
   Users=(null) Groups=(null) Accounts=pppl,csi,pu,tromp,cses Licenses=(null) State=ACTIVE BurstBuffer=(null) Watts=n/a
   MaxStartDelay=(null)
Unfortunately, the one node in that reservation is down, and the
reservation isn’t being moved to another node:
# sinfo -n traverse-k05g4
PARTITION AVAIL  TIMELIMIT  NODES  STATE NODELIST
all*         up 15-00:00:0      1  down* traverse-k05g4
I thought if I removed the node from the reservation, it would
just get assigned to a different node, or if I removed the
SPEC_NODES flag I could accomplish the same thing, but scontrol
rejected both attempts:
# scontrol update reservationname=test nodes-=traverse-k05g4
scontrol: error: Reservation can't be updated with Nodes option; it is incompatible with REPLACE[_DOWN]
Error updating the reservation: Requested operation not supported on this system
slurm_update error: Requested operation not supported on this system

# scontrol update reservationname=test flags-=spec_nodes
scontrol: error: Error parsing flags -spec_nodes. No reservation update.
slurm_update error: No error
Any ideas of what I’m doing wrong here, or what I can do to get
this reservation assigned to nodes that are up? I’m trying to
avoid deleting the entire reservation and creating a new one, if
possible.
-- Prentice