[Moabusers] problems between torque and moab

Brock Palen brockp at umich.edu
Fri Sep 8 13:31:24 MDT 2006


That was because I was running on the node outside of Torque, but
the problem occurs anyway.

Here is mdiag -n nyxtest2 -v:

[root at nyx ~]# mdiag -n nyxtest2 -v
compute node summary
Name                    State   Procs      Memory          Disk          Swap      Speed   Opsys   Arch Par   Load Rsv  Classes                        Network                        Features

nyxtest2                 Idle    8:8     63863:63863       1:1        67572:67572   1.00   linux      - nyx   0.00   1 [bio1_1:1][landau_1:1][csem_1:1][staff_1:1][atlas_1:1][ib_1:1][violi_1:1][short_1:1][long_1:1][route_1:1][avdv_1:1]  [DEFAULT]                      -
-----                     ---    8:8     63863:63863       1:1        67572:67572

Total Nodes: 1  (Active: 0  Idle: 1  Down: 0)


NODE[GLOBAL] Config Res     stata: 10
NODE[GLOBAL] Dedicated Res  ---
NODE[GLOBAL] Available Res  stata: 10

Notice again how each class shows only a single slot available
(e.g. [staff_1:1]), even though Procs is 8:8.
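
To make the mismatch easier to spot across nodes, the class slots can be pulled out of the mdiag row and compared with the processor count. This is only a rough Python sketch: it assumes the bracketed entries mean [<class>_<available>:<configured>], which is my reading of the output above, and parse_classes / undersized_classes are made-up helper names, not anything Moab provides.

```python
import re

# Class field copied from the "mdiag -n nyxtest2 -v" row above.
MDIAG_ROW = (
    "[bio1_1:1][landau_1:1][csem_1:1][staff_1:1][atlas_1:1][ib_1:1]"
    "[violi_1:1][short_1:1][long_1:1][route_1:1][avdv_1:1]  [DEFAULT]"
)

# Assumption: each entry is [<class>_<available>:<configured>].
CLASS_RE = re.compile(r"\[(\w+?)_(\d+):(\d+)\]")


def parse_classes(text):
    """Return {class_name: (available, configured)} for every class entry."""
    return {
        name: (int(avail), int(conf))
        for name, avail, conf in CLASS_RE.findall(text)
    }


def undersized_classes(classes, procs):
    """Names of classes whose configured slots are below the processor count."""
    return sorted(name for name, (_, conf) in classes.items() if conf < procs)


if __name__ == "__main__":
    classes = parse_classes(MDIAG_ROW)
    # Procs is 8:8 in the mdiag row, so compare against 8.
    for name in undersized_classes(classes, 8):
        avail, conf = classes[name]
        print(f"{name}: {avail}:{conf} slots on an 8-processor node")
```

If that reading is right, the "(5 < 8)" in the qsub error would presumably be 1 staff slot on nyxtest2 plus 4 on the other staff node.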

Brock Palen
Center for Advanced Computing
brockp at umich.edu
(734)936-1985


On Sep 8, 2006, at 3:23 PM, Justin Bronder wrote:

> Run "mdiag -n".  The load on that node seems quite high, and hence Moab
> will not attempt to use all of the processors.  This happens every so often
> on our cluster as various processes wake up and run just as Torque is
> checking the node.
>
> For instance:
> >mdiag -n
> ...
> node110                  Idle    1:2      2048:2048      darwin
>   WARNING:  node 'node110' has more processors utilized than dedicated (1 > 0)
>   WARNING:  processor mismatch on idle node node110 (1 available  2 configured)
>   WARNING:  node 'node110' has been idle for 21:45:56 but load is HIGH.  load:  0.980 (check for runaway processes?)
>
> If the load of 8.500 is correct, you'll probably want to see why it's so
> high before running any jobs.
>
> -Justin.
>
> On 9/8/06, Brock Palen <brockp at umich.edu> wrote:
> We have a new machine, an 8-CPU SMP machine.  Torque sees that it has
> 8 CPUs (np=8), but when I try to submit a job asking for 8 CPUs I get:
>
> [brockp at nyx ~]$ qsub -I -V -l host=nyxtest2:ppn=8 -q staff
> qsub: waiting for job 28174.nyx.engin.umich.edu to start
> qsub: PBS: MOAB_INFO:  interactive job can never run - partition nyx
> has insufficient instances of requested class staff configured (5 < 8)
>
> The only thing I can see is the Classes: line from checknode (below).
> For the x4600 it shows up with only 1 available, while the other node
> in the staff queue (a 4-CPU box) shows up with all 4.  But Torque sees
> all the CPUs, and Moab should slurp all this in.  Am I missing
> something?
>
> Qmgr: p n nyxtest2
> #
> # Create nodes and set their properties.
> #
> #
> # Create and define node nyxtest2
> #
> create node nyxtest2
> set node nyxtest2 state = free
> set node nyxtest2 np = 8
> set node nyxtest2 ntype = cluster
> set node nyxtest2 status = opsys=linux
> set node nyxtest2 status += uname=Linux nyxtest2 2.6.9-34.0.2.ELsmp #1 SMP Fri Jun 30 10:32:04 EDT 2006 x86_64
> set node nyxtest2 status += sessions=4721 5091
> set node nyxtest2 status += nsessions=2
> set node nyxtest2 status += nusers=1
> set node nyxtest2 status += idletime=17007
> set node nyxtest2 status += totmem=69492456kb
> set node nyxtest2 status += availmem=68349228kb
> set node nyxtest2 status += physmem=65395924kb
> set node nyxtest2 status += ncpus=8
> set node nyxtest2 status += loadave=8.57
> set node nyxtest2 status += netload=89545934
> set node nyxtest2 status += state=free
> set node nyxtest2 status += jobs=? 0
> set node nyxtest2 status += rectime=1157741849
>
>
> [brockp at nyx ~]$ checknode nyxtest2
> node nyxtest2
>
> State:      Idle  (in current state for 00:46:57)
> Configured Resources: PROCS: 8  MEM: 62G  SWAP: 65G  DISK: 1M
> Utilized   Resources: ---
> Dedicated  Resources: ---
> Opsys:      linux     Arch:      ---
> Speed:      1.00      CPULoad:   8.500
> Network Load: 0.03 kB/s
> Flags:      rmdetected
> Network:    DEFAULT
> Classes:    [bio1 1:1][landau 1:1][csem 1:1][staff 1:1][atlas 1:1][ib 1:1][violi 1:1][short 1:1][long 1:1][route 1:1][avdv 1:1]
> RM[nyx]:    TYPE=PBS
> NodeAccessPolicy: SHARED
>
> Total Time: 39:20:28:02  Up: 12:19:15:46 (32.12%)  Active: 1:07:09 (0.12%)
>
> Reservations:
>    staff.2290x1  User  -9:23:25:41 ->   INFINITY (  INFINITY)
>      Blocked Resources at -2:16:11    Procs: 8/8 ( 100.00%)  Mem: 0/63863 (0.00%)  Swap: 0/66755 (0.00%)  Disk: 0/1 (0.00%)
> ALERT:  node is in state Idle but load is high (8.500)
>
>
>
>
> Brock Palen
> Center for Advanced Computing
> brockp at umich.edu
> (734)936-1985
>
>
> _______________________________________________
> moabusers mailing list
> moabusers at supercluster.org
> http://www.supercluster.org/mailman/listinfo/moabusers
>
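
On the load question raised in the quoted reply: the values Torque reported in the Qmgr listing (ncpus=8, loadave=8.57) can be checked with a small script. The following is a minimal Python sketch, assuming the "set node <name> status = / += key=value" layout shown in the quoted Qmgr output; parse_status and load_per_cpu are hypothetical helper names, and what load counts as too high for Moab is not something this thread establishes.

```python
import re

# A few status lines copied from the quoted "Qmgr: p n nyxtest2" output.
QMGR_TEXT = """\
set node nyxtest2 state = free
set node nyxtest2 np = 8
set node nyxtest2 status = opsys=linux
set node nyxtest2 status += ncpus=8
set node nyxtest2 status += loadave=8.57
set node nyxtest2 status += state=free
"""

# Matches both "status = key=value" and "status += key=value".
STATUS_RE = re.compile(r"set node \S+ status \+?= (\w+)=(.*)")


def parse_status(text):
    """Collect the key=value pairs from the node's status lines."""
    status = {}
    for line in text.splitlines():
        match = STATUS_RE.search(line)
        if match:
            status[match.group(1)] = match.group(2).strip()
    return status


def load_per_cpu(status):
    """Reported load average divided by the reported CPU count."""
    return float(status["loadave"]) / int(status["ncpus"])


if __name__ == "__main__":
    status = parse_status(QMGR_TEXT)
    print(f"ncpus={status['ncpus']} loadave={status['loadave']} "
          f"load per cpu={load_per_cpu(status):.2f}")
```

Because the pattern uses search rather than match, it also works on lines that still carry the "> " quote prefix.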
