[Mauiusers] policy confusion
Paul Van Allsburg
vanallsburg at hope.edu
Thu Nov 9 11:19:34 MST 2006
Lennart Karlsson wrote:
>Paul,
>
>My advice is that you try some of the informational commands of
>Maui, like
>
> showq
> mdiag -q
> checkjob 5846
> showstart 5846
>
>and perhaps also
>
> checkjob -v 5846
>
>These will possibly tell you quite a lot about what resources
>there are and why your job is not able to run on them.
>
>Best regards,
>-- Lennart Karlsson <Lennart.Karlsson at nsc.liu.se>
> National Supercomputer Centre in Linkoping, Sweden
> http://www.nsc.liu.se
>
>
>Paul Van Allsburg wrote:
>
>
>>I have what seems to be a simple policy but my job is stuck in the queue
>>and I don't know why. The cluster is 16 nodes/32 processors, I have 4
>>queues, 'normal' is the default. This is the cluster's current status:
>>
>>Job id           Name             User             Time Use S Queue
>>---------------- ---------------- ---------------- -------- - -----
>>5805.curie       ...o1-7imp-md123 hinkle           13:37:24 R normal
>>5836.curie       ...o1-2imp-md125 hinkle                  0 Q normal
>>5837.curie       ...o1-9imp-md136 hinkle                  0 Q normal
>>5846.curie       cpuburn          vanallp                 0 Q normal
>>
>>I have hinkle limited to 4 processors, and job 5805 is using all 4. I
>>submitted cpuburn to a single node but it's not running.
>>My maui.cfg is:
>>
>># maui.cfg 3.2p8
>>RMCFG[base] TYPE=PBS
>>RMPOLLINTERVAL 00:02:00
>>SERVERPORT 42559
>>SERVERMODE NORMAL
>>LOGFILE maui.log
>>LOGFILEMAXSIZE 10000000
>>LOGLEVEL 3
>>QUEUETIMEWEIGHT 1
>>BACKFILLPOLICY FIRSTFIT
>>RESERVATIONPOLICY CURRENTHIGHEST
>>NODEALLOCATIONPOLICY CPULOAD
>>
>>CREDWEIGHT 1
>>USERWEIGHT 1
>>GROUPWEIGHT 1
>>CLASSWEIGHT 1
>>
>>USERCFG[vanallp] MAXNODE=2
>>USERCFG[hinkle] MAXPROC=4
>>USERCFG[webmo] MAXNODE=4 PRIORITY=100000
>>USERCFG[DEFAULT] MAXNODE=9
>>GROUPCFG[DEFAULT] MAXNODE=11
>>
>># these are the 4 queues
>>CLASSCFG[webmoq] PRIORITY=1000000
>>CLASSCFG[normal] MAXNODE=14
>>CLASSCFG[debug] MAXNODE=15
>>CLASSCFG[admin] MAXNODE=16
>>
>>XFACTOR 1
>># this parm gives short wall clock jobs priority
>># limited to 1 day... see 5.1.2.5 in Maui admin guide:)
>># one day!
>>XFMINWCLIMIT 1440
>>
>>#<eof>
>>Am I missing the obvious?
>>Thanks!
>>Paul Van Allsburg
>>
>>
I think I'm a little more confused... I did a
 checkjob 5846
and it immediately returned with a State of "Running" on node 8.
I qsub'ed another, and it immediately started. I qsub'ed 14 more and
they all ran.
It seems MAXNODE= has no effect on the scheduler in my configuration.
When I set MAXPROC= the scheduler correctly holds jobs based on that
setting. Where did I go wrong?
Thanks
Paul
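
For reference, here is a rough Python sketch of the behaviour described above: hinkle's jobs are held by MAXPROC=4 while vanallp's jobs start. It is only a toy model, not how Maui itself evaluates throttling policies. The function names (parse_usercfg, may_start) are made up for illustration, the model checks MAXPROC only, it ignores USERCFG[DEFAULT] fallback, and it assumes each queued job asks for at least one processor. It does not explain why MAXNODE appears to be ignored.

```python
import re

# USERCFG lines copied from the maui.cfg quoted above.
MAUI_CFG = """\
USERCFG[vanallp] MAXNODE=2
USERCFG[hinkle] MAXPROC=4
USERCFG[webmo] MAXNODE=4 PRIORITY=100000
USERCFG[DEFAULT] MAXNODE=9
"""


def parse_usercfg(text):
    """Return {user: {attribute: int}} for every USERCFG[...] line."""
    limits = {}
    for line in text.splitlines():
        match = re.match(r"USERCFG\[(\w+)\]\s+(.*)", line.strip())
        if not match:
            continue  # skip anything that is not a USERCFG line
        user, attrs = match.groups()
        limits[user] = {
            key: int(value)
            for key, value in (attr.split("=", 1) for attr in attrs.split())
        }
    return limits


def may_start(limits, user, procs_in_use, procs_requested):
    """Toy check (hypothetical): would MAXPROC alone let this job start?"""
    max_proc = limits.get(user, {}).get("MAXPROC")
    if max_proc is None:
        return True  # no MAXPROC for this user, so this model never holds the job
    return procs_in_use + procs_requested <= max_proc


limits = parse_usercfg(MAUI_CFG)

# hinkle already uses 4 processors (job 5805), so one more does not fit.
print(may_start(limits, "hinkle", 4, 1))   # False
# vanallp has only MAXNODE set, which this model does not look at.
print(may_start(limits, "vanallp", 0, 1))  # True
```

In this model a user with only MAXNODE set is never held, which matches what I saw with the 16 cpuburn jobs, but that is a property of the sketch and says nothing certain about the scheduler.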