<BODY contentEditable=true onload=parent.init()><P>I have a question that may have a very simple answer, but I can't see it:</P>
<P> </P>
<P>qmgr -c 'p s'</P>
<P># Create queues and set their attributes.<BR>#<BR>#<BR># Create and define queue ram16<BR>#<BR>create queue ram16<BR>set queue ram16 queue_type = Execution<BR>set queue ram16 resources_max.mem = 16gb<BR>set queue ram16 resources_min.mem = 8gb<BR>set queue ram16 resources_default.mem = 8gb<BR>set queue ram16 enabled = True<BR>set queue ram16 started = True<BR>#<BR># Create and define queue ram8<BR>#<BR>create queue ram8<BR>set queue ram8 queue_type = Execution<BR>set queue ram8 resources_max.mem = 8gb<BR>set queue ram8 resources_min.mem = 4gb<BR>set queue ram8 resources_default.mem = 4gb<BR>set queue ram8 enabled = True<BR>set queue ram8 started = True<BR>#<BR># Create and define queue ram4<BR>#<BR>create queue ram4<BR>set queue ram4 queue_type = Execution<BR>set queue ram4 resources_max.mem = 4gb<BR>set queue ram4 resources_default.mem = 1gb<BR>set queue ram4 enabled = True<BR>set queue ram4 started = True<BR>#<BR># Set server attributes.<BR>#<BR>
set server scheduling = True<BR>set server default_queue = ram4<BR>set server log_events = 511<BR>set server mail_from = adm<BR>set server query_other_jobs = True<BR>set server resources_default.nodes = 1<BR>set server scheduler_iteration = 60<BR>set server node_ping_rate = 300<BR>set server node_check_rate = 600<BR>set server tcp_timeout = 6<BR>set server pbs_version = 2.1.8<BR></P>
<P> </P>
<P>qstat -s</P>
<P> Req'd Req'd Elap<BR>Job ID Username Queue Jobname SessID NDS TSK Memory Time S Time<BR>-------------------- -------- -------- ---------- ------ ----- --- ------ ----- - -----<BR>50.coupled-cluster.l user ram4 CuPCuP++.2 13037 1 -- 4gb -- R 05:16<BR> Job started on Tue Mar 20 at 16:51<BR>
57.coupled-cluster.l user ram8 CuPCuP.1d. 16589 1 -- 8gb -- R 04:17<BR> Job started on Tue Mar 20 at 17:50<BR>63.coupled-cluster.l user ram16 CuPCuP.2d. 22138 1 -- 16gb -- R 04:10<BR> Job started on Tue Mar 20 at 17:57<BR>79.coupled-cluster.l user ram16 test -- 1 -- 16gb -- Q -- <BR> Not Running: Not enough memory available<BR>98.coupled-cluster.l scoggins ram4 mpi-hello. -- 3 -- 2gb -- Q -- <BR> Not Running: Not enough memory available<BR>
101.coupled-cluster. scoggins ram8 mpi-hello. -- 3 -- 6gb -- Q -- <BR> Not Running: Not enough memory available<BR></P>
<P> </P>
<P>pbsnodes -a</P>
<P> </P>
<P>node0000<BR> state = free<BR> np = 2<BR> properties = mem16gb<BR> ntype = cluster<BR> status = opsys=linux,uname=Linux node0000 2.6.17.11-102.caos.smp #1 SMP Thu Aug 24 23:30:43 EDT 2006 x86_64,sessions=? 0,nsessions=? 0,nusers=<BR>0,idletime=53301,totmem=16355264kb,availmem=16307688kb,physmem=16355264kb,ncpus=2,loadave=0.00,netload=190750343,state=free,jobs=? 0,rectime=117445<BR>7374<BR> <BR>node0001<BR> state = free<BR> np = 2<BR> properties = mem16gb<BR> ntype = cluster<BR> status = opsys=linux,uname=Linux node0001 2.6.17.11-102.caos.smp #1 SMP Thu Aug 24 23:30:43 EDT 2006 x86_64,sessions=? 0,nsessions=? 0,nusers=<BR>
0,idletime=53264,totmem=18134864kb,availmem=18087688kb,physmem=16174976kb,ncpus=2,loadave=0.00,netload=15128314,state=free,jobs=? 0,rectime=1174457<BR>361<BR> <BR>node0002<BR> state = free<BR> np = 2<BR> properties = mem8gb<BR> ntype = cluster<BR> jobs = 0/57.coupled-cluster.lbl.gov<BR> status = opsys=linux,uname=Linux node0002 2.6.17.11-102.caos.smp #1 SMP Thu Aug 24 23:30:43 EDT 2006 x86_64,sessions=16589,nsessions=1,nusers=<BR>1,idletime=53315,totmem=10133908kb,availmem=9857084kb,physmem=8093664kb,ncpus=2,loadave=1.00,netload=104881749,state=free,jobs=57.coupled-cluster.l<BR>bl.gov,rectime=1174457362<BR> <BR>node0003<BR> state = free<BR> np = 2<BR> properties = mem8gb<BR> ntype = cluster<BR>
status = opsys=linux,uname=Linux node0003 2.6.17.11-102.caos.smp #1 SMP Thu Aug 24 23:30:43 EDT 2006 x86_64,sessions=? 0,nsessions=? 0,nusers=<BR>0,idletime=53264,totmem=10133904kb,availmem=10089216kb,physmem=8093660kb,ncpus=2,loadave=0.00,netload=14935077,state=free,jobs=? 0,rectime=11744573<BR>60<BR> <BR>node0004<BR> state = free<BR> np = 2<BR> properties = mem8gb<BR> ntype = cluster<BR> status = opsys=linux,uname=Linux node0004 2.6.17.11-102.caos.smp #1 SMP Thu Aug 24 23:30:43 EDT 2006 x86_64,sessions=? 0,nsessi</P>
<P>...</P>
<P> </P>
<P>There are many more free nodes beyond the ones shown above.</P>
<P> </P>
<P>No jobs are running on the free nodes.</P>
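<P>In case it helps anyone double-check that claim, here is a rough Python sketch that lists nodes reported as free with no jobs attribute, along with their availmem. It assumes the text layout of the pbsnodes -a output pasted above, with each status value on a single unwrapped line. The function names and the SAMPLE string are mine and purely illustrative; they are not part of Torque.</P>

```python
import re

# Illustrative sample, trimmed from the pbsnodes output in this post.
SAMPLE = """node0000
     state = free
     np = 2
     properties = mem16gb
     ntype = cluster
     status = opsys=linux,totmem=16355264kb,availmem=16307688kb,physmem=16355264kb,ncpus=2,state=free

node0002
     state = free
     np = 2
     properties = mem8gb
     ntype = cluster
     jobs = 0/57.coupled-cluster.lbl.gov
     status = opsys=linux,totmem=10133908kb,availmem=9857084kb,physmem=8093664kb,ncpus=2,state=free
"""


def parse_pbsnodes(text):
    """Parse pbsnodes-style text into {node_name: {attribute: value}}."""
    nodes = {}
    current = None
    for line in text.splitlines():
        if not line.strip():
            continue
        if not line[0].isspace():
            # An unindented line starts a new node block.
            current = nodes.setdefault(line.strip(), {})
        elif current is not None and " = " in line:
            key, value = line.strip().split(" = ", 1)
            current[key] = value
    return nodes


def status_fields(status):
    """Split a comma-separated status string into a {key: value} dict."""
    fields = {}
    for item in status.split(","):
        if "=" in item:
            key, value = item.split("=", 1)
            fields[key] = value
    return fields


def idle_nodes(text):
    """Return [(name, availmem_kb)] for nodes in state 'free' with no jobs attribute."""
    result = []
    for name, attrs in parse_pbsnodes(text).items():
        if attrs.get("state") != "free" or "jobs" in attrs:
            continue
        fields = status_fields(attrs.get("status", ""))
        match = re.match(r"(\d+)kb$", fields.get("availmem", ""))
        result.append((name, int(match.group(1)) if match else None))
    return result


print(idle_nodes(SAMPLE))
```

<P>To use it on a real cluster, the captured output of pbsnodes -a would be passed to idle_nodes in place of SAMPLE. It only reports what the nodes advertise; it does not explain how the scheduler decides that memory is insufficient.</P>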
<P> </P>
<P>cat /var/spool/torque/sched_priv/sched_config</P>
<P> </P>
<P>round_robin: False all<BR> <BR> <BR> <BR>by_queue: True prime<BR>by_queue: True non_prime<BR> <BR> <BR> <BR>strict_fifo: false ALL<BR> <BR>fair_share: false ALL<BR> <BR> <BR>help_starving_jobs false ALL<BR> <BR>sort_queues false ALL<BR> <BR>load_balancing: true ALL<BR> <BR> <BR> <BR>sort_by: shortest_job_first ALL<BR> <BR>log_filter: 256<BR> <BR>dedicated_prefix: ded<BR> <BR>max_starve: 24:00:00<BR> <BR> <BR>half_life: 24:00:00<BR> <BR>unknown_shares: 10<BR> <BR>sync_time: 1:00:00<BR></P>
<P> </P>
<P>Why are the queued jobs (79, 98 and 101) not starting on the free nodes, when the scheduler reports "Not enough memory available"?</P>
<P> </P>
<P>Thanks</P>
<P> </P>
<P>Jackie</P>
<P> </P>
<P> </P></BODY>