[Moabusers] problems between torque and moab
wightman at clusterresources.com
Fri Sep 8 13:54:44 MDT 2006
For optimization purposes, some things are configured only once,
when Moab starts up. Is it possible that the node was first added to Torque with
only 1 processor, and that the processor count was later corrected inside Torque
but Moab was not restarted to pick up the change?
- Douglas
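
One way to check for this kind of stale configuration is to compare the processor count Moab reports for the node against the per-class slot counts on the same node. The following is only a minimal sketch, not part of Moab or Torque: the function name class_slot_mismatches is hypothetical, the sample text is trimmed from the checknode output quoted below, and it assumes each class entry reads as [name available:configured].

```python
import re


def class_slot_mismatches(checknode_text):
    """Return {class_name: configured_slots} for classes with fewer
    configured slots than the node's PROCS value.

    Assumption: class entries look like "[staff 1:1]" and the second
    number is the configured slot count.
    """
    procs_match = re.search(r"PROCS:\s*(\d+)", checknode_text)
    if procs_match is None:
        raise ValueError("no PROCS field found in checknode output")
    procs = int(procs_match.group(1))

    # \s+ also matches a line break, so a wrapped "[ib\n1:1]" still parses.
    entries = re.findall(r"\[(\w+)\s+(\d+):(\d+)\]", checknode_text)
    return {name: int(cfg) for name, _avail, cfg in entries if int(cfg) < procs}


# Sample trimmed from the checknode output quoted later in this thread.
SAMPLE = """\
Configured Resources: PROCS: 8 MEM: 62G SWAP: 65G DISK: 1M
Classes: [bio1 1:1][landau 1:1][csem 1:1][staff 1:1]
"""

if __name__ == "__main__":
    for name, slots in class_slot_mismatches(SAMPLE).items():
        print(f"class {name}: {slots} slot(s) configured, node has more procs")
```

If a node that Torque lists with np=8 shows up here with classes at 1 slot, that would be consistent with Moab still holding the values it read at startup.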
On Fri, 8 Sep 2006, Brock Palen wrote:
> That was because I was running on the node outside of Torque, but the problem
> occurs anyway.
>
> Here is mdiag -n nyxtest2 -v:
>
> [root at nyx ~]# mdiag -n nyxtest2 -v
> compute node summary
> Name State Procs Memory Disk Swap
> Speed Opsys Arch Par Load Rsv Classes Network
> Features
>
> nyxtest2 Idle 8:8 63863:63863 1:1
> 67572:67572 1.00 linux - nyx 0.00 1
> [bio1_1:1][landau_1:1][csem_1:1][staff_1:1][atlas_1:1][ib_1:1][violi_1:1][short_1:1][long_1:1][route_1:1][avdv_1:1]
> [DEFAULT] -
> ----- --- 8:8 63863:63863 1:1
> 67572:67572
>
> Total Nodes: 1 (Active: 0 Idle: 1 Down: 0)
>
>
> NODE[GLOBAL] Config Res stata: 10
> NODE[GLOBAL] Dedicated Res ---
> NODE[GLOBAL] Available Res stata: 10
>
> Notice again how the class says only a single slot is available.
>
> Brock Palen
> Center for Advanced Computing
> brockp at umich.edu
> (734)936-1985
>
>
> On Sep 8, 2006, at 3:23 PM, Justin Bronder wrote:
>
> >Run "mdiag -n". The load on that node seems quite high, and hence Moab
> >will not attempt to use all of the processors. This happens every so often
> >on our cluster as various processes wake up and run just as Torque is
> >checking
> >the node.
> >
> >For instance:
> > >mdiag -n
> >...
> >node110 Idle 1:2 2048:2048 darwin
> > WARNING: node 'node110' has more processors utilized than dedicated (1 > 0)
> > WARNING: processor mismatch on idle node node110 (1 available 2
> > WARNING: configured)
> > WARNING: node 'node110' has been idle for 21:45:56 but load is HIGH. load:
> > WARNING: 0.980 (check for runaway processes?)
> >
> >If the load of 8.500 is correct, you'll probably want to see why it's so high
> >before
> >running any jobs.
> >
> >-Justin.
> >
> >On 9/8/06, Brock Palen <brockp at umich.edu> wrote:
> >We have a new machine, an 8-CPU SMP machine. Torque sees that
> >it has 8 CPUs
> >(np=8),
> >but when I try to submit a job asking for 8 CPUs I get:
> >
> >[brockp at nyx ~]$ qsub -I -V -l host=nyxtest2:ppn=8 -q staff
> > qsub: waiting for job 28174.nyx.engin.umich.edu to start
> > qsub: PBS: MOAB_INFO: interactive job can never run - partition nyx
> >has insufficient instances of requested class staff configured (5 < 8)
> >
> >The only thing I can see is the Classes: line from checknode (below).
> >For the x4600 it only shows up with 1 available, and the other node
> >in the staff queue (a 4-CPU box) shows up with all 4. But Torque sees
> >all the CPUs, and Moab should slurp all this in. Am I missing
> >something?
> >
> >Qmgr: p n nyxtest2
> > #
> > # Create nodes and set their properties.
> > #
> > #
> > # Create and define node nyxtest2
> > #
> >create node nyxtest2
> >set node nyxtest2 state = free
> >set node nyxtest2 np = 8
> >set node nyxtest2 ntype = cluster
> >set node nyxtest2 status = opsys=linux
> >set node nyxtest2 status += uname=Linux nyxtest2 2.6.9-34.0.2.ELsmp
> >#1 SMP Fri Jun 30 10:32:04 EDT 2006 x86_64
> >set node nyxtest2 status += sessions=4721 5091
> >set node nyxtest2 status += nsessions=2
> >set node nyxtest2 status += nusers=1
> >set node nyxtest2 status += idletime=17007
> >set node nyxtest2 status += totmem=69492456kb
> >set node nyxtest2 status += availmem=68349228kb
> >set node nyxtest2 status += physmem=65395924kb
> >set node nyxtest2 status += ncpus=8
> >set node nyxtest2 status += loadave=8.57
> >set node nyxtest2 status += netload=89545934
> >set node nyxtest2 status += state=free
> >set node nyxtest2 status += jobs=? 0
> >set node nyxtest2 status += rectime=1157741849
> >
> >
> >[brockp at nyx ~]$ checknode nyxtest2
> >node nyxtest2
> >
> >State: Idle (in current state for 00:46:57)
> >Configured Resources: PROCS: 8 MEM: 62G SWAP: 65G DISK: 1M
> >Utilized Resources: ---
> >Dedicated Resources: ---
> >Opsys: linux Arch: ---
> >Speed: 1.00 CPULoad: 8.500
> >Network Load: 0.03 kB/s
> >Flags: rmdetected
> >Network: DEFAULT
> >Classes: [bio1 1:1][landau 1:1][csem 1:1][staff 1:1][atlas 1:1][ib
> >1:1][violi 1:1][short 1:1][long 1:1][route 1:1][avdv 1:1]
> >RM[nyx]: TYPE=PBS
> >NodeAccessPolicy: SHARED
> >
> >Total Time: 39:20:28:02 Up: 12:19:15:46 (32.12%) Active: 1:07:09
> >(0.12%)
> >
> >Reservations:
> > staff.2290x1 User -9:23:25:41 -> INFINITY ( INFINITY)
> > Blocked Resources at -2:16:11 Procs: 8/8 ( 100.00%) Mem: 0/63863
> >(0.00%) Swap: 0/66755 (0.00%) Disk: 0/1 (0.00%)
> >ALERT: node is in state Idle but load is high (8.500)
> >
> >
> >
> >
> >Brock Palen
> >Center for Advanced Computing
> >brockp at umich.edu
> >(734)936-1985
> >
> >
> >_______________________________________________
> >moabusers mailing list
> >moabusers at supercluster.org
> >http://www.supercluster.org/mailman/listinfo/moabusers
> >
>
>