[Mauiusers] insufficient idle procs available ?
Jan Ploski
Jan.Ploski at offis.de
Tue Jan 29 11:13:38 MST 2008
Itay M wrote:
> We've tried the new configuration (unset resources_default.ncpus and
> unset resources_max.ncpus, at both the queue and server levels)
> over the last few days, and here are the results:
I suppose you did check with qstat -f that 'ncpus' is not mentioned
anywhere any longer?
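
A minimal sketch of that check, assuming the full queue and server listings have been saved as text. The function name and the sample listing are made up for illustration and are not taken from the real cluster:

```python
def find_ncpus_lines(qstat_text):
    """Return (line_number, line) pairs that still mention 'ncpus'."""
    hits = []
    for number, line in enumerate(qstat_text.splitlines(), start=1):
        if "ncpus" in line.lower():
            hits.append((number, line.strip()))
    return hits


# Hypothetical sample listing, invented for this example.
sample = """Queue: b_que
    queue_type = Execution
    resources_default.ncpus = 1
    resources_default.walltime = 00:05:00
    enabled = True
"""

leftover = find_ncpus_lines(sample)
for number, line in leftover:
    print(f"line {number}: {line}")
```

An empty result for both the queue listing (qstat -Q -f) and the server listing (qstat -B -f) would suggest that no 'ncpus' setting is left over.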
> * For the first time we were able to see that jobs are backfilled! It
> never happened before, and this is a major improvement. Though we saw it
> only in one of our queues (named 'b_que'), it might have happened in other
> queues as well (we couldn't verify it yet).
> * But - the 'insufficient idle procs available' problem is still there.
> For example, at the moment showq shows that there are plenty of non-busy
> processors ('65 of 84 Processors Active'), but checkjob says for
> queued jobs that:
>
> checking job 228665
> State: Idle
> Creds: user:b group:b class:b_que qos:hi
> WallTime: 00:00:00 of 00:05:00
> SubmitTime: Tue Jan 29 19:47:04
> (Time Queued Total: 00:07:49 Eligible: 00:07:16)
> Total Tasks: 1
> Req[0] TaskCount: 1 Partition: ALL
> Network: [NONE] Memory >= 0 Disk >= 0 Swap >= 0
> Opsys: [NONE] Arch: [NONE] Features: [NONE]
> Dedicated Resources Per Task: PROCS: 1 MEM: 512M
>
> IWD: [NONE] Executable: [NONE]
> Bypass: 0 StartCount: 0
> PartitionMask: [ALL]
> Flags: RESTARTABLE
> PE: 1.00 StartPriority: 1007
> job cannot run in partition DEFAULT (idle procs do not meet requirements
> : 0 of 1 procs found)
> idle procs: 12 feasible procs: 0
>
> :(
> What should I check next?
Maybe it has something to do with the MEM requirement (just a wild
guess... but try removing it). What does diagnose -n say for a node
which is incorrectly rejecting the job? Does it have enough free
"tokens" (not sure if this is what they are called officially) to run
the job in this b_que class?
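
To make the symptom easier to spot across many queued jobs, here is a rough sketch that pulls the two counters out of checkjob's last line. It assumes the "idle procs: N feasible procs: M" wording shown in the output above; the function names are hypothetical:

```python
import re


def parse_proc_counts(checkjob_text):
    """Return (idle, feasible) from checkjob output, or None if absent."""
    match = re.search(
        r"idle procs:\s*(\d+)\s+feasible procs:\s*(\d+)", checkjob_text
    )
    if match is None:
        return None
    return int(match.group(1)), int(match.group(2))


def idle_but_infeasible(checkjob_text):
    """True when processors are idle but none of them qualifies for the job."""
    counts = parse_proc_counts(checkjob_text)
    if counts is None:
        return False
    idle, feasible = counts
    return idle > 0 and feasible == 0


# The relevant lines from the checkjob output quoted above.
report = """job cannot run in partition DEFAULT (idle procs do not meet requirements
: 0 of 1 procs found)
idle procs: 12 feasible procs: 0
"""

print(parse_proc_counts(report))
print(idle_but_infeasible(report))
```

If idle processors exist but none is feasible, the job is more likely being rejected over a per-task requirement (such as the 512M MEM) or a class limit than over a real shortage of processors. That reading is a guess and needs confirming with diagnose -n.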
Regards,
Jan Ploski