<br><br><div class="gmail_quote">On Thu, Feb 16, 2012 at 4:05 PM, Gustavo Correa <span dir="ltr">&lt;<a href="mailto:gus@ldeo.columbia.edu">gus@ldeo.columbia.edu</a>&gt;</span> wrote:<br><blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex">
PS - For some diagnostic, you could also try '$TORQUE/bin/pbsnodes' on the server,<br></blockquote><div>[root@wings ~]# pbsnodes</div><div>n001.default.domain</div><div> state = free</div><div> np = 1</div>
<div> ntype = cluster</div><div> status = rectime=1329430696,varattr=,jobs=,state=free,netload=42970654,gres=,loadave=0.03,ncpus=24,physmem=20463136kb,availmem=27788364kb,totmem=28655128kb,idletime=177266,nusers=1,nsessions=1,sessions=17382,uname=Linux n001 2.6.32-131.0.15.el6.x86_64 #1 SMP Tue May 10 15:42:40 EDT 2011 x86_64,opsys=linux</div>
<div> gpus = 0</div><div><br></div><div>n002.default.domain</div><div> state = free</div><div> np = 1</div><div> ntype = cluster</div><div> status = rectime=1329430653,varattr=,jobs=,state=free,netload=41152440,gres=,loadave=0.00,ncpus=24,physmem=24600084kb,availmem=31877036kb,totmem=32792076kb,idletime=177252,nusers=0,nsessions=? 0,sessions=? 0,uname=Linux n002 2.6.32-131.0.15.el6.x86_64 #1 SMP Tue May 10 15:42:40 EDT 2011 x86_64,opsys=linux</div>
<div> gpus = 0</div><div><br></div><div>These look good, right? </div><div><br></div><div> </div><blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex">
and '$TORQUE/sbin/momctl -d 3' on the compute nodes.<br></blockquote><div><br></div><div>[root@n001 sbin]# momctl -d 3</div><div><br></div><div>Host: n001/n001.default.domain Version: 2.5.9 PID: 3598</div><div>
Server[0]: admin.default.domain (10.0.10.1:1023)</div><div> Init Msgs Received: 2 hellos/2 cluster-addrs</div><div> Init Msgs Sent: 6 hellos</div><div> Last Msg From Server: 8595 seconds (DeleteJob)</div>
<div> Last Msg To Server: 32 seconds</div><div>HomeDirectory: /var/spool/torque/mom_priv</div><div>stdout/stderr spool directory: '/var/spool/torque/spool/' (23252610 blocks available)</div><div>NOTE: syslog enabled</div>
<div>MOM active: 176853 seconds</div><div>Check Poll Time: 45 seconds</div><div>Server Update Interval: 45 seconds</div><div>LogLevel: 0 (use SIGUSR1/SIGUSR2 to adjust)</div><div>Communication Model: RPP</div>
<div>MemLocked: TRUE (mlock)</div><div>TCP Timeout: 20 seconds</div><div>Prolog: /var/spool/torque/mom_priv/prologue (disabled)</div><div>Alarm Time: 0 of 10 seconds</div>
<div>Trusted Client List: 10.0.1.20,10.0.1.19,10.0.1.18,10.0.1.17,10.0.1.16,10.0.1.15,10.0.1.14,10.0.1.13,10.0.1.12,10.0.1.11,10.0.1.10,10.0.1.9,10.0.1.8,10.0.1.7,10.0.1.6,10.0.1.5,10.0.1.4,10.0.1.3,10.0.1.2,10.0.10.1,10.0.1.1,127.0.0.1</div>
<div>Copy Command: /usr/bin/scp -rpB</div><div>NOTE: no local jobs detected</div><div><br></div><div>diagnostics complete</div><div><br></div><div> </div><blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex">
Gus Correa<br></blockquote><div><br></div><div> </div></div>
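<div>One thing that is easy to check by script in the pbsnodes output above: each node is listed with np = 1, while its status line reports ncpus=24. Whether that is intended depends on how many job slots per node you want. Below is a minimal Python sketch of that comparison. The helper names parse_status and slots_vs_cores are hypothetical, and the status string is an abbreviated copy of the n001 line, so treat it as an illustration rather than a Torque tool.</div>

```python
def parse_status(status: str) -> dict:
    """Split a pbsnodes 'status = ...' value into a key/value dict.

    Assumes fields are comma-separated key=value pairs, as in the
    output pasted above. Empty values (e.g. 'jobs=') are kept as ''.
    """
    fields = {}
    for item in status.split(","):
        key, sep, value = item.partition("=")
        if sep:  # skip anything that is not key=value
            fields[key.strip()] = value.strip()
    return fields


def slots_vs_cores(np_slots: int, status: str) -> tuple:
    """Return (np, ncpus, more_cores_than_slots) for one node."""
    ncpus = int(parse_status(status)["ncpus"])
    return np_slots, ncpus, ncpus > np_slots


# Abbreviated status value for n001, copied from the pbsnodes output.
N001_STATUS = (
    "rectime=1329430696,varattr=,jobs=,state=free,"
    "loadave=0.03,ncpus=24,physmem=20463136kb,opsys=linux"
)

if __name__ == "__main__":
    np_slots, ncpus, mismatch = slots_vs_cores(1, N001_STATUS)
    print(f"np={np_slots} ncpus={ncpus} more cores than slots: {mismatch}")
```

<div>Run against the values above, the sketch reports that n001 has more cores than configured slots. That is only a hint to look at the np setting for each node, not proof of a misconfiguration.</div>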