[Moabusers] problems between torque and moab
Justin Bronder
jsbronder at gmail.com
Fri Sep 8 13:23:49 MDT 2006
Run "mdiag -n". The load on that node seems quite high, so Moab
will not attempt to use all of its processors. This happens every so often
on our cluster when various processes wake up and run just as Torque is
checking the node.
For instance:
>mdiag -n
...
node110 Idle 1:2 2048:2048 darwin
WARNING: node 'node110' has more processors utilized than dedicated (1 > 0)
WARNING: processor mismatch on idle node node110 (1 available 2 configured)
WARNING: node 'node110' has been idle for 21:45:56 but load is HIGH. load: 0.980 (check for runaway processes?)
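To make the mechanism concrete, here is a minimal Python sketch of how a load-based check like this could turn a load average into an available-processor count. It is an assumption, not Moab's actual code: the hypothetical available_procs() treats the load, rounded half-up, as the number of processors in use, subtracts the larger of that and the dedicated count from the configured count, and clamps at zero. With node110's figures above (2 configured, 0 dedicated, load 0.98) it gives the same "1 available 2 configured" result that mdiag reports.

```python
import math


def available_procs(configured: int, dedicated: int, load: float) -> int:
    """Hypothetical approximation of load-based processor availability.

    Assumption (not taken from Moab's source): processors "utilized" is the
    load average rounded half-up, and whichever is larger, utilization or
    dedication, is subtracted from the configured processor count.
    """
    utilized = math.floor(load + 0.5)  # e.g. 0.98 -> 1, 8.5 -> 9
    busy = max(dedicated, utilized)
    return max(0, configured - busy)  # never report a negative count


if __name__ == "__main__":
    # node110 from the mdiag -n output: 2 configured, nothing dedicated, load 0.98
    print(available_procs(configured=2, dedicated=0, load=0.98))  # 1
    # nyxtest2 from checknode: 8 configured, nothing dedicated, load 8.5
    print(available_procs(configured=8, dedicated=0, load=8.5))  # 0
```

Under the same assumption, nyxtest2's load of 8.5 on 8 processors would leave none free; Moab's real rounding and thresholds may differ.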
If the load of 8.500 on nyxtest2 is correct, you'll probably want to find out
why it's so high before running any jobs there.
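One way to confirm what Torque is reporting, before digging into the processes themselves, is to scan the node status attributes that pbs_server keeps (the "status +=" lines in the qmgr "p n" output quoted below) and flag nodes marked free whose loadave is a large fraction of ncpus. This is a hypothetical helper, not part of Torque or Moab; parse_status(), busy_free_nodes() and the 0.5-load-per-CPU threshold are illustrative choices. Feed it raw qmgr output rather than the ">"-quoted copy in this message.

```python
import re

# Matches qmgr "print node" lines such as
#   set node nyxtest2 status += loadave=8.57
# The first status line uses "=" and later ones use "+=".
STATUS_RE = re.compile(r"^set node (\S+) status \+?= (\w+)=(.*)$")


def parse_status(lines):
    """Return {node: {attribute: value}} built from qmgr status lines."""
    nodes = {}
    for line in lines:
        match = STATUS_RE.match(line.strip())
        if match:
            node, key, value = match.groups()
            nodes.setdefault(node, {})[key] = value
    return nodes


def busy_free_nodes(nodes, per_cpu_threshold=0.5):
    """List (node, load, ncpus) for nodes Torque calls free but that are loaded.

    per_cpu_threshold is an arbitrary illustrative cut-off, not a Torque
    or Moab parameter.
    """
    flagged = []
    for node, attrs in nodes.items():
        if attrs.get("state") != "free":
            continue  # only idle-looking nodes are suspicious
        load = float(attrs.get("loadave", "0"))
        ncpus = max(1, int(attrs.get("ncpus", "1")))  # avoid dividing by zero
        if load / ncpus >= per_cpu_threshold:
            flagged.append((node, load, ncpus))
    return flagged


if __name__ == "__main__":
    sample = [
        "set node nyxtest2 status += ncpus=8",
        "set node nyxtest2 status += loadave=8.57",
        "set node nyxtest2 status += state=free",
    ]
    print(busy_free_nodes(parse_status(sample)))  # [('nyxtest2', 8.57, 8)]
```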
-Justin.
On 9/8/06, Brock Palen <brockp at umich.edu> wrote:
>
> We have a new machine, an 8-CPU SMP box. Torque sees that
> it has 8 CPUs (np=8),
> but when I try to submit a job asking for 8 CPUs, I get:
>
> [brockp at nyx ~]$ qsub -I -V -l host=nyxtest2:ppn=8 -q staff
> qsub: waiting for job 28174.nyx.engin.umich.edu to start
> qsub: PBS: MOAB_INFO: interactive job can never run - partition nyx has insufficient instances of requested class staff configured (5 < 8)
>
> The only clue I can see is the Classes: line from checknode (below).
> For the x4600 it shows only 1 available, while the other node
> in the staff queue (a 4-CPU box) shows all 4. But Torque sees
> all the CPUs, and Moab should pick all of this up. Am I missing
> something?
>
> Qmgr: p n nyxtest2
> #
> # Create nodes and set their properties.
> #
> #
> # Create and define node nyxtest2
> #
> create node nyxtest2
> set node nyxtest2 state = free
> set node nyxtest2 np = 8
> set node nyxtest2 ntype = cluster
> set node nyxtest2 status = opsys=linux
> set node nyxtest2 status += uname=Linux nyxtest2 2.6.9-34.0.2.ELsmp #1 SMP Fri Jun 30 10:32:04 EDT 2006 x86_64
> set node nyxtest2 status += sessions=4721 5091
> set node nyxtest2 status += nsessions=2
> set node nyxtest2 status += nusers=1
> set node nyxtest2 status += idletime=17007
> set node nyxtest2 status += totmem=69492456kb
> set node nyxtest2 status += availmem=68349228kb
> set node nyxtest2 status += physmem=65395924kb
> set node nyxtest2 status += ncpus=8
> set node nyxtest2 status += loadave=8.57
> set node nyxtest2 status += netload=89545934
> set node nyxtest2 status += state=free
> set node nyxtest2 status += jobs=? 0
> set node nyxtest2 status += rectime=1157741849
>
>
> [brockp at nyx ~]$ checknode nyxtest2
> node nyxtest2
>
> State: Idle (in current state for 00:46:57)
> Configured Resources: PROCS: 8 MEM: 62G SWAP: 65G DISK: 1M
> Utilized Resources: ---
> Dedicated Resources: ---
> Opsys: linux Arch: ---
> Speed: 1.00 CPULoad: 8.500
> Network Load: 0.03 kB/s
> Flags: rmdetected
> Network: DEFAULT
> Classes: [bio1 1:1][landau 1:1][csem 1:1][staff 1:1][atlas 1:1][ib 1:1][violi 1:1][short 1:1][long 1:1][route 1:1][avdv 1:1]
> RM[nyx]: TYPE=PBS
> NodeAccessPolicy: SHARED
>
> Total Time: 39:20:28:02 Up: 12:19:15:46 (32.12%) Active: 1:07:09 (0.12%)
>
> Reservations:
> staff.2290x1 User -9:23:25:41 -> INFINITY ( INFINITY)
> Blocked Resources at -2:16:11 Procs: 8/8 (100.00%) Mem: 0/63863 (0.00%) Swap: 0/66755 (0.00%) Disk: 0/1 (0.00%)
> ALERT: node is in state Idle but load is high (8.500)
>
>
>
>
> Brock Palen
> Center for Advanced Computing
> brockp at umich.edu
> (734)936-1985
>
>
> _______________________________________________
> moabusers mailing list
> moabusers at supercluster.org
> http://www.supercluster.org/mailman/listinfo/moabusers
>