[Moabusers] problems between torque and moab

wightman at clusterresources.com
Fri Sep 8 13:54:44 MDT 2006


For optimization purposes, some things are configured only once, when
Moab starts up.  Is it possible that the node was first added to Torque with
only 1 processor, and that the processor count was corrected in Torque later
but Moab was not restarted to pick up the change?
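
One way to spot this kind of stale count is to compare the PROCS value with the per-class slot counts in the checknode output quoted below. Here is a minimal Python sketch of that comparison. It assumes each Classes entry reads [name available:configured], which is a reading of the quoted output and not something taken from the Moab documentation, and the function names are made up for illustration.

```python
import re
from typing import Dict, List, Optional, Tuple

# Assumption: entries look like "[staff 1:1]" (checknode) or "[staff_1:1]"
# (mdiag -n -v), and the two numbers mean available:configured.
CLASS_ENTRY = re.compile(r"\[(\w+)[ _](\d+):(\d+)\]")


def parse_class_slots(text: str) -> Dict[str, Tuple[int, int]]:
    """Map each class name to its (available, configured) slot counts."""
    return {
        name: (int(avail), int(conf))
        for name, avail, conf in CLASS_ENTRY.findall(text)
    }


def configured_procs(text: str) -> Optional[int]:
    """Return the PROCS value from the 'Configured Resources' line, if present."""
    match = re.search(r"PROCS:\s*(\d+)", text)
    return int(match.group(1)) if match else None


def undersized_classes(text: str) -> List[str]:
    """List classes whose configured slots are below the node's processor count."""
    procs = configured_procs(text)
    if procs is None:
        return []
    slots = parse_class_slots(text)
    return sorted(name for name, (_avail, conf) in slots.items() if conf < procs)


if __name__ == "__main__":
    # Hypothetical, shortened copy of the checknode output from this thread.
    sample = (
        "Configured Resources: PROCS: 8  MEM: 62G  SWAP: 65G  DISK: 1M\n"
        "Classes:    [bio1 1:1][staff 1:1][short 1:1]\n"
    )
    print(undersized_classes(sample))
```

If the list is still non-empty after the np value has been corrected in Torque, restarting Moab so that it re-reads the node configuration would be the first thing to try.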

- Douglas

On Fri, 8 Sep 2006, Brock Palen wrote:

> That was because I was running on the node outside of Torque, but the problem
> occurs anyway.
> 
> Here is mdiag -n nyxtest2 -v:
> 
> [root at nyx ~]# mdiag -n nyxtest2 -v
> compute node summary
> Name                    State   Procs      Memory         Disk          Swap
> Speed   Opsys   Arch Par   Load Rsv Classes                        Network
> Features
> 
> nyxtest2                 Idle    8:8     63863:63863       1:1
> 67572:67572   1.00   linux      - nyx   0.00   1
> [bio1_1:1][landau_1:1][csem_1:1][staff_1:1][atlas_1:1][ib_1:1][violi_1:1][short_1:1][long_1:1][route_1:1][avdv_1:1]
> [DEFAULT]                      -
> -----                     ---    8:8     63863:63863       1:1
> 67572:67572
> 
> Total Nodes: 1  (Active: 0  Idle: 1  Down: 0)
> 
> 
> NODE[GLOBAL] Config Res     stata: 10
> NODE[GLOBAL] Dedicated Res  ---
> NODE[GLOBAL] Available Res  stata: 10
> 
> Notice again how the class says only a single slot is available.
> 
> Brock Palen
> Center for Advanced Computing
> brockp at umich.edu
> (734)936-1985
> 
> 
> On Sep 8, 2006, at 3:23 PM, Justin Bronder wrote:
> 
> >Run "mdiag -n".  The load on that node seems quite high, and hence Moab
> >will not attempt to use all of the processors.  This happens every so often
> >on our cluster as various processes wake up and run just as Torque is
> >checking the node.
> >
> >For instance:
> > >mdiag -n
> >...
> >node110                  Idle    1:2      2048:2048      darwin
> > WARNING:  node 'node110' has more processors utilized than dedicated (1 > 0)
> > WARNING:  processor mismatch on idle node node110 (1 available  2
> > WARNING:  configured)
> > WARNING:  node 'node110' has been idle for 21:45:56 but load is HIGH.  load:
> > WARNING:  0.980 (check for runaway processes?)
> >
> >If the load of 8.500 is correct, you'll probably want to see why it's so high
> >before running any jobs.
> >
> >-Justin.
> >
> >On 9/8/06, Brock Palen <brockp at umich.edu> wrote:
> >We have a new machine, an 8-CPU SMP machine.  Torque sees that it has
> >8 CPUs (np=8), but when I try to submit a job asking for 8 CPUs I get:
> >
> >[brockp at nyx ~]$ qsub -I -V -l host=nyxtest2:ppn=8 -q staff
> > qsub: waiting for job 28174.nyx.engin.umich.edu to start
> > qsub: PBS: MOAB_INFO:  interactive job can never run - partition nyx
> >has insufficient instances of requested class staff configured (5 < 8)
> >
> >The only thing I can see is the Classes: line from checknode (below).
> >For the x4600 it only shows up with 1 available, while the other node
> >in the staff queue (a 4-CPU box) shows up with all 4.  But Torque sees
> >all the CPUs, and Moab should slurp all this in.  Am I missing
> >something?
> >
> >Qmgr: p n nyxtest2
> > #
> > # Create nodes and set their properties.
> > #
> > #
> > # Create and define node nyxtest2
> > #
> >create node nyxtest2
> >set node nyxtest2 state = free
> >set node nyxtest2 np = 8
> >set node nyxtest2 ntype = cluster
> >set node nyxtest2 status = opsys=linux
> >set node nyxtest2 status += uname=Linux nyxtest2 2.6.9-34.0.2.ELsmp #1 SMP Fri Jun 30 10:32:04 EDT 2006 x86_64
> >set node nyxtest2 status += sessions=4721 5091
> >set node nyxtest2 status += nsessions=2
> >set node nyxtest2 status += nusers=1
> >set node nyxtest2 status += idletime=17007
> >set node nyxtest2 status += totmem=69492456kb
> >set node nyxtest2 status += availmem=68349228kb
> >set node nyxtest2 status += physmem=65395924kb
> >set node nyxtest2 status += ncpus=8
> >set node nyxtest2 status += loadave=8.57
> >set node nyxtest2 status += netload=89545934
> >set node nyxtest2 status += state=free
> >set node nyxtest2 status += jobs=? 0
> >set node nyxtest2 status += rectime=1157741849
> >
> >
> >[brockp at nyx ~]$ checknode nyxtest2
> >node nyxtest2
> >
> >State:      Idle  (in current state for 00:46:57)
> >Configured Resources: PROCS: 8  MEM: 62G  SWAP: 65G  DISK: 1M
> >Utilized   Resources: ---
> >Dedicated  Resources: ---
> >Opsys:      linux     Arch:      ---
> >Speed:      1.00      CPULoad:   8.500
> >Network Load: 0.03 kB/s
> >Flags:      rmdetected
> >Network:    DEFAULT
> >Classes:    [bio1 1:1][landau 1:1][csem 1:1][staff 1:1][atlas 1:1][ib 1:1][violi 1:1][short 1:1][long 1:1][route 1:1][avdv 1:1]
> >RM[nyx]:    TYPE=PBS
> >NodeAccessPolicy: SHARED
> >
> >Total Time: 39:20:28:02  Up: 12:19:15:46 (32.12%)  Active: 1:07:09 (0.12%)
> >
> >Reservations:
> >   staff.2290x1  User  -9:23:25:41 ->   INFINITY (  INFINITY)
> >     Blocked Resources at -2:16:11    Procs: 8/8 ( 100.00%)  Mem: 0/63863 (0.00%)  Swap: 0/66755 (0.00%)  Disk: 0/1 (0.00%)
> >ALERT:  node is in state Idle but load is high (8.500)
> >
> >
> >
> >
> >Brock Palen
> >Center for Advanced Computing
> >brockp at umich.edu
> >(734)936-1985
> >
> >
> >_______________________________________________
> >moabusers mailing list
> >moabusers at supercluster.org
> >http://www.supercluster.org/mailman/listinfo/moabusers
> >
> 
> 

