<div class="gmail_extra"><br><br><div class="gmail_quote">On Mon, Apr 23, 2012 at 10:36 AM, Coyle, James J [ITACD] <span dir="ltr"><<a href="mailto:jjc@iastate.edu" target="_blank">jjc@iastate.edu</a>></span> wrote:<br>
<blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex">
<div link="blue" vlink="purple" lang="EN-US">
<div>
<p class="MsoNormal"><span style="color:#1f497d">Michael,<u></u><u></u></span></p>
<p class="MsoNormal"><span style="color:#1f497d"><u></u> <u></u></span></p>
<p class="MsoNormal"><span style="color:#1f497d"> Two possibilities worth exploring:<u></u><u></u></span></p>
<p class="MsoNormal"><span style="color:#1f497d"><u></u> <u></u></span></p>
<p><u></u><span style="color:#1f497d"><span>1)<span style="font:7.0pt "Times New Roman"">
</span></span></span><u></u><span style="color:#1f497d">It seems like you must be using different ports. I see references to pbs_mom actions on nodes 58, 40 and 2 related to ports 707, 726 and 746; normally I’d expect TORQUE port numbers in the 15000+ range.</span></p>
</div></div></blockquote><div><br>Ports 707, 726 and 746 are privileged ports. By default, a connection from the server to the mom goes to port 15002, and a connection from a client app to the server goes to port 15001.<br>
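To make the port distinction above concrete, here is a minimal Python sketch that pulls the port numbers out of pbs_mom error lines like the ones quoted below and labels each one. It assumes the usual convention that ports below 1024 are privileged, and it uses only the two default ports named in this reply. The names classify_port and ports_in_log are hypothetical helpers written for this illustration; they are not part of TORQUE.

```python
import re

# Assumption: ports 1-1023 are "privileged" (normally only root may bind them).
PRIVILEGED_PORTS = range(1, 1024)

# Default TORQUE listening ports as stated in this thread.
TORQUE_DEFAULTS = {15001: "pbs_server", 15002: "pbs_mom"}

# Matches the pbs_mom error format seen in the /var/log/messages excerpt.
PORT_RE = re.compile(r"(\S+) pbs_mom: .*cannot connect to port (\d+) in (\w+)")


def classify_port(port):
    """Label a port as a TORQUE default, privileged, or unprivileged."""
    if port in TORQUE_DEFAULTS:
        return "default " + TORQUE_DEFAULTS[port] + " port"
    if port in PRIVILEGED_PORTS:
        return "privileged port"
    return "unprivileged port"


def ports_in_log(lines):
    """Return (node, port, function, label) for each matching pbs_mom line."""
    found = []
    for line in lines:
        match = PORT_RE.search(line)
        if match:
            node = match.group(1)
            port = int(match.group(2))
            func = match.group(3)
            found.append((node, port, func, classify_port(port)))
    return found


if __name__ == "__main__":
    sample = [
        "Apr 19 08:54:09 node58 pbs_mom: LOG_ERROR::Operation now in progress (115) "
        "in scan_for_exiting, cannot connect to port 707 in client_to_svr - errno:115",
    ]
    for entry in ports_in_log(sample):
        print(entry)
```

Run against the three pbs_mom lines in the original post, this would label 707, 726 and 746 as privileged ports, which is consistent with the explanation above rather than with a port misconfiguration.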
<br>Ken <br></div><blockquote class="gmail_quote" style="margin:0pt 0pt 0pt 0.8ex;border-left:1px solid rgb(204,204,204);padding-left:1ex"><div link="blue" vlink="purple" lang="EN-US"><div><p><span style="color:rgb(31,73,125)"><u></u><u></u></span></p>
<p><span style="color:#1f497d">Is there an issue with a mismatch of ports between nodes?<u></u><u></u></span></p>
<p><span style="color:#1f497d"><u></u> <u></u></span></p>
<p><u></u><span style="color:#1f497d"><span>2)<span style="font:7.0pt "Times New Roman"">
</span></span></span><u></u><span style="color:#1f497d">The following two URLs relate to the resolution of an issue that seems similar to yours (recent upgrade, Torque having problems some of the time).
<u></u><u></u></span></p>
<p class="MsoNormal"><span style="color:#1f497d"><u></u> <u></u></span></p>
<p class="MsoNormal"><span style="color:#1f497d"><a href="http://www.clusterresources.com/pipermail/torqueusers/2011-March/012540.html" target="_blank">http://www.clusterresources.com/pipermail/torqueusers/2011-March/012540.html</a>
<u></u><u></u></span></p>
<p class="MsoNormal"><span style="color:#1f497d"><a href="http://serverfault.com/questions/253932/torque-works-half-of-the-time-fails-no-permission-the-other-half" target="_blank">http://serverfault.com/questions/253932/torque-works-half-of-the-time-fails-no-permission-the-other-half</a><u></u><u></u></span></p>
<p class="MsoNormal"><span style="color:#1f497d"><u></u> <u></u></span></p>
<p class="MsoNormal"><span style="color:#1f497d"><u></u> <u></u></span></p>
<p class="MsoNormal"><span style="color:#1f497d"><u></u> <u></u></span></p>
<p class="MsoNormal"><span style="font-size:16.0pt;font-family:Consolas;color:#1f497d">James Coyle, PhD<u></u><u></u></span></p>
<p class="MsoNormal"><span style="font-size:16.0pt;font-family:Consolas;color:#1f497d">High Performance Computing Group
<u></u><u></u></span></p>
<p class="MsoNormal"><span style="font-size:16.0pt;font-family:Consolas;color:#1f497d"> Iowa State Univ.
<u></u><u></u></span></p>
<p class="MsoNormal"><span style="font-size:16.0pt;font-family:Consolas;color:#1f497d">web:
<a href="http://www.public.iastate.edu/%7Ejjc" target="_blank"><span style="color:blue">http://jjc.public.iastate.edu/</span></a><u></u><u></u></span></p>
<p class="MsoNormal"><span style="color:#1f497d"><u></u> <u></u></span></p>
<div style="border:none;border-left:solid blue 1.5pt;padding:0in 0in 0in 4.0pt">
<div>
<div style="border:none;border-top:solid #b5c4df 1.0pt;padding:3.0pt 0in 0in 0in">
<p class="MsoNormal"><b><span style="font-size:10.0pt;font-family:"Tahoma","sans-serif"">From:</span></b><span style="font-size:10.0pt;font-family:"Tahoma","sans-serif""> <a href="mailto:torqueusers-bounces@supercluster.org" target="_blank">torqueusers-bounces@supercluster.org</a> [mailto:<a href="mailto:torqueusers-bounces@supercluster.org" target="_blank">torqueusers-bounces@supercluster.org</a>]
<b>On Behalf Of </b>Stevens, Michael<br>
<b>Sent:</b> Thursday, April 19, 2012 11:43 AM<br>
<b>To:</b> <a href="mailto:torqueusers@supercluster.org" target="_blank">torqueusers@supercluster.org</a><br>
<b>Subject:</b> [torqueusers] pbs_sched cores - repost<u></u><u></u></span></p>
</div>
</div><div><div class="h5">
<p class="MsoNormal"><u></u> <u></u></p>
<p class="MsoNormal"><u></u> <u></u></p>
<p class="MsoNormal">I had posted this question a few weeks ago, and received no response. Would it be more appropriate to post this to –dev? <u></u><u></u></p>
<p class="MsoNormal"><u></u> <u></u></p>
<p class="MsoNormal">I am running a 115-node cluster using TORQUE 2.5.7 under CentOS 6.2. This cluster is in turn running on a VMware ESX 4.0 cluster; the idea is that we can use the physical resources of the TORQUE cluster when no jobs are running.<br>
<br>
I am seeing crashes of pbs_sched when the cluster gets busy. Following is some data I’ve been able to assemble thus far:<u></u><u></u></p>
<p class="MsoNormal"><u></u> <u></u></p>
<p class="MsoNormal" style="text-autospace:none"><span style="font-family:"Courier New"">/var/log/messages<u></u><u></u></span></p>
<p class="MsoNormal" style="text-autospace:none"><span style="font-family:"Courier New""><u></u> <u></u></span></p>
<p class="MsoNormal" style="text-autospace:none"><span style="font-family:"Courier New"">Apr 19 08:53:31 node103 dhclient[1540]: DHCPREQUEST on eth0 to 10.80.101.10 port 67 (xid=0x6c91cbd5)<u></u><u></u></span></p>
<p class="MsoNormal" style="text-autospace:none"><span style="font-family:"Courier New"">Apr 19 08:53:31 cluster1 dhcpd: DHCPREQUEST for 10.80.101.123 from 00:50:56:b4:7b:a3 via eth0<u></u><u></u></span></p>
<p class="MsoNormal" style="text-autospace:none"><span style="font-family:"Courier New"">Apr 19 08:53:31 cluster1 dhcpd: DHCPACK on 10.80.101.123 to 00:50:56:b4:7b:a3 via eth0<u></u><u></u></span></p>
<p class="MsoNormal" style="text-autospace:none"><span style="font-family:"Courier New"">Apr 19 08:53:31 node103 dhclient[1540]: DHCPACK from 10.80.101.10 (xid=0x6c91cbd5)<u></u><u></u></span></p>
<p class="MsoNormal" style="text-autospace:none"><span style="font-family:"Courier New"">Apr 19 08:53:33 node103 ypbind: NIS domain: affymetrix.com, NIS server: nis2<u></u><u></u></span></p>
<p class="MsoNormal" style="text-autospace:none"><span style="font-family:"Courier New"">Apr 19 08:53:33 node103 dhclient[1540]: bound to 10.80.101.123 -- renewal in 16219 seconds.<u></u><u></u></span></p>
<p class="MsoNormal" style="text-autospace:none"><span style="font-family:"Courier New"">Apr 19 08:54:09 node58 pbs_mom: LOG_ERROR::Operation now in progress (115) in scan_for_exiting, cannot connect to port 707 in client_to_svr - errno:115 Operation now in
progress<u></u><u></u></span></p>
<p class="MsoNormal" style="text-autospace:none"><span style="font-family:"Courier New"">Apr 19 08:54:15 cluster1 abrt[3555]: saved core dump of pid 2911 (/usr/sbin/pbs_sched) to /var/spool/abrt/ccpp-2012-04-19-08:54:14-2911.new/coredump (3100672 bytes)<u></u><u></u></span></p>
<p class="MsoNormal" style="text-autospace:none"><span style="font-family:"Courier New"">Apr 19 08:54:15 cluster1 abrtd: Directory 'ccpp-2012-04-19-08:54:14-2911' creation detected<u></u><u></u></span></p>
<p class="MsoNormal" style="text-autospace:none"><span style="font-family:"Courier New"">Apr 19 08:54:21 cluster1 abrtd: Package 'torque-scheduler' isn't signed with proper key<u></u><u></u></span></p>
<p class="MsoNormal" style="text-autospace:none"><span style="font-family:"Courier New"">Apr 19 08:54:21 cluster1 abrtd: Corrupted or bad dump /var/spool/abrt/ccpp-2012-04-19-08:54:14-2911 (res:2), deleting<u></u><u></u></span></p>
<p class="MsoNormal" style="text-autospace:none"><span style="font-family:"Courier New"">Apr 19 08:55:46 node40 pbs_mom: LOG_ERROR::Operation now in progress (115) in post_epilogue, cannot connect to port 726 in client_to_svr - errno:115 Operation now in progress<u></u><u></u></span></p>
<p class="MsoNormal" style="text-autospace:none"><span style="font-family:"Courier New"">Apr 19 08:55:47 node2 pbs_mom: LOG_ERROR::Operation now in progress (115) in post_epilogue, cannot connect to port 746 in client_to_svr - errno:115 Operation now in progress<u></u><u></u></span></p>
<p class="MsoNormal" style="text-autospace:none"><span style="font-family:"Courier New""><u></u> <u></u></span></p>
<p class="MsoNormal" style="text-autospace:none"><span style="font-family:"Courier New""><u></u> <u></u></span></p>
<p class="MsoNormal" style="text-autospace:none"><span style="font-family:"Courier New""><u></u> <u></u></span></p>
<p class="MsoNormal" style="text-autospace:none"><span style="font-family:"Courier New"">scheduler log<u></u><u></u></span></p>
<p class="MsoNormal" style="text-autospace:none"><span style="font-family:"Courier New""><u></u> <u></u></span></p>
<p class="MsoNormal" style="text-autospace:none"><span style="font-family:"Courier New"">04/19/2012 08:52:55;0040; pbs_sched;Job;302539.cluster1.cluster.affymetrix.com;Job Run<u></u><u></u></span></p>
<p class="MsoNormal" style="text-autospace:none"><span style="font-family:"Courier New"">04/19/2012 08:52:55;0040; pbs_sched;Job;302540.cluster1.cluster.affymetrix.com;Job Run<u></u><u></u></span></p>
<p class="MsoNormal" style="text-autospace:none"><span style="font-family:"Courier New"">04/19/2012 08:52:55;0040; pbs_sched;Job;302541.cluster1.cluster.affymetrix.com;Job Run<u></u><u></u></span></p>
<p class="MsoNormal" style="text-autospace:none"><span style="font-family:"Courier New"">04/19/2012 08:52:55;0040; pbs_sched;Job;302542.cluster1.cluster.affymetrix.com;Job Run<u></u><u></u></span></p>
<p class="MsoNormal" style="text-autospace:none"><span style="font-family:"Courier New"">04/19/2012 08:52:55;0040; pbs_sched;Job;302543.cluster1.cluster.affymetrix.com;Job Run<u></u><u></u></span></p>
<p class="MsoNormal" style="text-autospace:none"><span style="font-family:"Courier New"">04/19/2012 08:52:55;0040; pbs_sched;Job;302544.cluster1.cluster.affymetrix.com;Job Run<u></u><u></u></span></p>
<p class="MsoNormal" style="text-autospace:none"><span style="font-family:"Courier New"">04/19/2012 08:52:55;0080; pbs_sched;Svr;main;brk point 38178816<u></u><u></u></span></p>
<p class="MsoNormal" style="text-autospace:none"><span style="font-family:"Courier New"">04/19/2012 08:52:58;0040; pbs_sched;Job;302545.cluster1.cluster.affymetrix.com;Job Run<u></u><u></u></span></p>
<p class="MsoNormal" style="text-autospace:none"><span style="font-family:"Courier New"">04/19/2012 08:53:08;0040; pbs_sched;Job;302546.cluster1.cluster.affymetrix.com;Job Run<u></u><u></u></span></p>
<p class="MsoNormal" style="text-autospace:none"><span style="font-family:"Courier New"">04/19/2012 08:53:10;0040; pbs_sched;Job;302547.cluster1.cluster.affymetrix.com;Job Run<u></u><u></u></span></p>
<p class="MsoNormal" style="text-autospace:none"><span style="font-family:"Courier New"">04/19/2012 08:53:14;0040; pbs_sched;Job;302548.cluster1.cluster.affymetrix.com;Job Run<u></u><u></u></span></p>
<p class="MsoNormal" style="text-autospace:none"><span style="font-family:"Courier New"">04/19/2012 08:53:18;0040; pbs_sched;Job;302549.cluster1.cluster.affymetrix.com;Job Run<u></u><u></u></span></p>
<p class="MsoNormal" style="text-autospace:none"><span style="font-family:"Courier New"">04/19/2012 08:53:24;0040; pbs_sched;Job;302550.cluster1.cluster.affymetrix.com;Job Run<u></u><u></u></span></p>
<p class="MsoNormal" style="text-autospace:none"><span style="font-family:"Courier New"">04/19/2012 09:14:01;0002; pbs_sched;Svr;Log;Log opened<u></u><u></u></span></p>
<p class="MsoNormal" style="text-autospace:none"><span style="font-family:"Courier New"">04/19/2012 09:14:01;0002; pbs_sched;Svr;TokenAct;Account file /var/lib/torque/sched_priv/accounting/20120419 opened<u></u><u></u></span></p>
<p class="MsoNormal" style="text-autospace:none"><span style="font-family:"Courier New"">04/19/2012 09:14:01;0002; pbs_sched;Svr;main;/usr/sbin/pbs_sched startup pid 4707<u></u><u></u></span></p>
<p class="MsoNormal" style="text-autospace:none"><span style="font-family:"Courier New"">04/19/2012 09:14:54;0040; pbs_sched;Job;302552.cluster1.cluster.affymetrix.com;Job Run<u></u><u></u></span></p>
<p class="MsoNormal" style="text-autospace:none"><span style="font-family:"Courier New""><u></u> <u></u></span></p>
<p class="MsoNormal" style="text-autospace:none"><span style="font-family:"Courier New""><u></u> <u></u></span></p>
<p class="MsoNormal" style="text-autospace:none"><span style="font-family:"Courier New"">gdb of the crash file<u></u><u></u></span></p>
<p class="MsoNormal" style="text-autospace:none"><span style="font-family:"Courier New""><u></u> <u></u></span></p>
<p class="MsoNormal" style="text-autospace:none"><span style="font-family:"Courier New"">[root@cluster1 sched_priv]# gdb -e /usr/sbin/pbs_sched -c core.2911<u></u><u></u></span></p>
<p class="MsoNormal" style="text-autospace:none"><span style="font-family:"Courier New"">GNU gdb (GDB) Red Hat Enterprise Linux (7.2-50.el6)<u></u><u></u></span></p>
<p class="MsoNormal" style="text-autospace:none"><span style="font-family:"Courier New"">Copyright (C) 2010 Free Software Foundation, Inc.<u></u><u></u></span></p>
<p class="MsoNormal" style="text-autospace:none"><span style="font-family:"Courier New"">License GPLv3+: GNU GPL version 3 or later <<a href="http://gnu.org/licenses/gpl.html" target="_blank">http://gnu.org/licenses/gpl.html</a>><u></u><u></u></span></p>
<p class="MsoNormal" style="text-autospace:none"><span style="font-family:"Courier New"">This is free software: you are free to change and redistribute it.<u></u><u></u></span></p>
<p class="MsoNormal" style="text-autospace:none"><span style="font-family:"Courier New"">There is NO WARRANTY, to the extent permitted by law. Type "show copying"<u></u><u></u></span></p>
<p class="MsoNormal" style="text-autospace:none"><span style="font-family:"Courier New"">and "show warranty" for details.<u></u><u></u></span></p>
<p class="MsoNormal" style="text-autospace:none"><span style="font-family:"Courier New"">This GDB was configured as "x86_64-redhat-linux-gnu".<u></u><u></u></span></p>
<p class="MsoNormal" style="text-autospace:none"><span style="font-family:"Courier New"">For bug reporting instructions, please see:<u></u><u></u></span></p>
<p class="MsoNormal" style="text-autospace:none"><span style="font-family:"Courier New""><<a href="http://www.gnu.org/software/gdb/bugs/" target="_blank">http://www.gnu.org/software/gdb/bugs/</a>>.<u></u><u></u></span></p>
<p class="MsoNormal" style="text-autospace:none"><span style="font-family:"Courier New"">[New Thread 2911]<u></u><u></u></span></p>
<p class="MsoNormal" style="text-autospace:none"><span style="font-family:"Courier New"">Missing separate debuginfo for
<u></u><u></u></span></p>
<p class="MsoNormal" style="text-autospace:none"><span style="font-family:"Courier New"">Try: yum --disablerepo='*' --enablerepo='*-debuginfo' install /usr/lib/debug/.build-id/23/1bd9599ad974226f19adfdc4dae3691396c81d<u></u><u></u></span></p>
<p class="MsoNormal" style="text-autospace:none"><span style="font-family:"Courier New"">Reading symbols from /usr/lib64/libtorque.so.2.0.0...Reading symbols from /usr/lib/debug/usr/lib64/libtorque.so.2.0.0.debug...done.<u></u><u></u></span></p>
<p class="MsoNormal" style="text-autospace:none"><span style="font-family:"Courier New"">done.<u></u><u></u></span></p>
<p class="MsoNormal" style="text-autospace:none"><span style="font-family:"Courier New"">Loaded symbols for /usr/lib64/libtorque.so.2.0.0<u></u><u></u></span></p>
<p class="MsoNormal" style="text-autospace:none"><span style="font-family:"Courier New"">Reading symbols from /lib64/libc.so.6...(no debugging symbols found)...done.<u></u><u></u></span></p>
<p class="MsoNormal" style="text-autospace:none"><span style="font-family:"Courier New"">Loaded symbols for /lib64/libc.so.6<u></u><u></u></span></p>
<p class="MsoNormal" style="text-autospace:none"><span style="font-family:"Courier New"">Reading symbols from /lib64/ld-linux-x86-64.so.2...(no debugging symbols found)...done.<u></u><u></u></span></p>
<p class="MsoNormal" style="text-autospace:none"><span style="font-family:"Courier New"">Loaded symbols for /lib64/ld-linux-x86-64.so.2<u></u><u></u></span></p>
<p class="MsoNormal" style="text-autospace:none"><span style="font-family:"Courier New"">Reading symbols from /lib64/libnss_files.so.2...(no debugging symbols found)...done.<u></u><u></u></span></p>
<p class="MsoNormal" style="text-autospace:none"><span style="font-family:"Courier New"">Loaded symbols for /lib64/libnss_files.so.2<u></u><u></u></span></p>
<p class="MsoNormal" style="text-autospace:none"><span style="font-family:"Courier New"">Reading symbols from /lib64/libnss_dns.so.2...(no debugging symbols found)...done.<u></u><u></u></span></p>
<p class="MsoNormal" style="text-autospace:none"><span style="font-family:"Courier New"">Loaded symbols for /lib64/libnss_dns.so.2<u></u><u></u></span></p>
<p class="MsoNormal" style="text-autospace:none"><span style="font-family:"Courier New"">Reading symbols from /lib64/libresolv.so.2...(no debugging symbols found)...done.<u></u><u></u></span></p>
<p class="MsoNormal" style="text-autospace:none"><span style="font-family:"Courier New"">Loaded symbols for /lib64/libresolv.so.2<u></u><u></u></span></p>
<p class="MsoNormal" style="text-autospace:none"><span style="font-family:"Courier New"">Core was generated by `/usr/sbin/pbs_sched -d /var/lib/torque -a 600'.<u></u><u></u></span></p>
<p class="MsoNormal" style="text-autospace:none"><span style="font-family:"Courier New"">Program terminated with signal 11, Segmentation fault.<u></u><u></u></span></p>
<p class="MsoNormal" style="text-autospace:none"><span style="font-family:"Courier New"">#0 0x0000003ff4a13c44 in pbs_rescquery (c=0, resclist=&lt;value optimized out&gt;, num_resc=&lt;value optimized out&gt;,
<u></u><u></u></span></p>
<p class="MsoNormal" style="text-autospace:none"><span style="font-family:"Courier New""> available=0x7fffba82910c, allocated=0x7fffba829108, reserved=0x7fffba829104, down=0x7fffba829100)<u></u><u></u></span></p>
<p class="MsoNormal" style="text-autospace:none"><span style="font-family:"Courier New""> at ../Libifl/pbsD_resc.c:215<u></u><u></u></span></p>
<p class="MsoNormal" style="text-autospace:none"><span style="font-family:"Courier New"">215 *(available + i) = *(reply->brp_un.brp_rescq.brq_avail + i);<u></u><u></u></span></p>
<p class="MsoNormal" style="text-autospace:none"><span style="font-family:"Courier New"">Missing separate debuginfos, use: debuginfo-install glibc-2.12-1.47.el6_2.5.x86_64<u></u><u></u></span></p>
<p class="MsoNormal" style="text-autospace:none"><span style="font-family:"Courier New"">(gdb) bt<u></u><u></u></span></p>
<p class="MsoNormal" style="text-autospace:none"><span style="font-family:"Courier New"">#0 0x0000003ff4a13c44 in pbs_rescquery (c=0, resclist=&lt;value optimized out&gt;, num_resc=&lt;value optimized out&gt;,
<u></u><u></u></span></p>
<p class="MsoNormal" style="text-autospace:none"><span style="font-family:"Courier New""> available=0x7fffba82910c, allocated=0x7fffba829108, reserved=0x7fffba829104, down=0x7fffba829100)<u></u><u></u></span></p>
<p class="MsoNormal" style="text-autospace:none"><span style="font-family:"Courier New""> at ../Libifl/pbsD_resc.c:215<u></u><u></u></span></p>
<p class="MsoNormal" style="text-autospace:none"><span style="font-family:"Courier New"">#1 0x000000000040c8d6 in ?? ()<u></u><u></u></span></p>
<p class="MsoNormal" style="text-autospace:none"><span style="font-family:"Courier New"">#2 0x00007fffba829100 in ?? ()<u></u><u></u></span></p>
<p class="MsoNormal" style="text-autospace:none"><span style="font-family:"Courier New"">#3 0x0000000000000000 in ?? ()<u></u><u></u></span></p>
<p class="MsoNormal" style="text-autospace:none"><span style="font-family:"Courier New"">(gdb)
<u></u><u></u></span></p>
<p class="MsoNormal" style="text-autospace:none"><span style="font-family:"Courier New""><u></u> <u></u></span></p>
<p class="MsoNormal" style="text-autospace:none"><span style="font-family:"Courier New""><u></u> <u></u></span></p>
<p class="MsoNormal" style="text-autospace:none"><span style="font-family:"Courier New"">The last few lines of strace<u></u><u></u></span></p>
<p class="MsoNormal" style="text-autospace:none"><span style="font-family:"Courier New""><u></u> <u></u></span></p>
<p class="MsoNormal" style="text-autospace:none"><span style="font-family:"Courier New"">read(8, "+2+1+0+0+6+0", 262144) = 12<u></u><u></u></span></p>
<p class="MsoNormal" style="text-autospace:none"><span style="font-family:"Courier New"">write(8, "+2+12+51+9Scheduler+12+182+11des"..., 53) = 53<u></u><u></u></span></p>
<p class="MsoNormal" style="text-autospace:none"><span style="font-family:"Courier New"">poll([{fd=8, events=POLLIN|POLLHUP}], 1, 20000) = 1 ([{fd=8, revents=POLLIN}])<u></u><u></u></span></p>
<p class="MsoNormal" style="text-autospace:none"><span style="font-family:"Courier New"">read(8, "+2+1+0+0+6+0", 262144) = 12<u></u><u></u></span></p>
<p class="MsoNormal" style="text-autospace:none"><span style="font-family:"Courier New"">write(8, "+2+12+51+9Scheduler+12+182+11des"..., 53) = 53<u></u><u></u></span></p>
<p class="MsoNormal" style="text-autospace:none"><span style="font-family:"Courier New"">poll([{fd=8, events=POLLIN|POLLHUP}], 1, 20000) = 1 ([{fd=8, revents=POLLIN}])<u></u><u></u></span></p>
<p class="MsoNormal" style="text-autospace:none"><span style="font-family:"Courier New"">read(8, "+2+1+0+0+6+0", 262144) = 12<u></u><u></u></span></p>
<p class="MsoNormal" style="text-autospace:none"><span style="font-family:"Courier New"">write(8, "+2+12+51+9Scheduler+12+252+11des"..., 62) = 62<u></u><u></u></span></p>
<p class="MsoNormal" style="text-autospace:none"><span style="font-family:"Courier New"">poll([{fd=8, events=POLLIN|POLLHUP}], 1, 20000) = 1 ([{fd=8, revents=POLLIN}])<u></u><u></u></span></p>
<p class="MsoNormal" style="text-autospace:none"><span style="font-family:"Courier New"">read(8, "+2+1+0+0+6+0", 262144) = 12<u></u><u></u></span></p>
<p class="MsoNormal" style="text-autospace:none"><span style="font-family:"Courier New"">write(8, "+2+12+51+9Scheduler+12+272+11des"..., 64) = 64<u></u><u></u></span></p>
<p class="MsoNormal" style="text-autospace:none"><span style="font-family:"Courier New"">poll([{fd=8, events=POLLIN|POLLHUP}], 1, 20000) = 1 ([{fd=8, revents=POLLIN}])<u></u><u></u></span></p>
<p class="MsoNormal" style="text-autospace:none"><span style="font-family:"Courier New"">read(8, "+2+1+0+0+6+0", 262144) = 12<u></u><u></u></span></p>
<p class="MsoNormal" style="text-autospace:none"><span style="font-family:"Courier New"">write(8, "+2+12+51+9Scheduler+12+192+11des"..., 54) = 54<u></u><u></u></span></p>
<p class="MsoNormal" style="text-autospace:none"><span style="font-family:"Courier New"">poll([{fd=8, events=POLLIN|POLLHUP}], 1, 20000) = 1 ([{fd=8, revents=POLLIN}])<u></u><u></u></span></p>
<p class="MsoNormal" style="text-autospace:none"><span style="font-family:"Courier New"">read(8, "+2+1+0+0+6+0", 262144) = 12<u></u><u></u></span></p>
<p class="MsoNormal" style="text-autospace:none"><span style="font-family:"Courier New"">write(8, "+2+12+51+9Scheduler+12+182+11des"..., 53) = 53<u></u><u></u></span></p>
<p class="MsoNormal" style="text-autospace:none"><span style="font-family:"Courier New"">poll([{fd=8, events=POLLIN|POLLHUP}], 1, 20000) = 1 ([{fd=8, revents=POLLIN}])<u></u><u></u></span></p>
<p class="MsoNormal" style="text-autospace:none"><span style="font-family:"Courier New"">read(8, "+2+1+0+0+6+0", 262144) = 12<u></u><u></u></span></p>
<p class="MsoNormal" style="text-autospace:none"><span style="font-family:"Courier New"">write(8, "+2+12+51+9Scheduler+12+182+11des"..., 53) = 53<u></u><u></u></span></p>
<p class="MsoNormal" style="text-autospace:none"><span style="font-family:"Courier New"">poll([{fd=8, events=POLLIN|POLLHUP}], 1, 20000) = 1 ([{fd=8, revents=POLLIN}])<u></u><u></u></span></p>
<p class="MsoNormal" style="text-autospace:none"><span style="font-family:"Courier New"">read(8, "+2+1+0+0+6+0", 262144) = 12<u></u><u></u></span></p>
<p class="MsoNormal" style="text-autospace:none"><span style="font-family:"Courier New"">write(8, "+2+12+51+9Scheduler+12+232+11des"..., 60) = 60<u></u><u></u></span></p>
<p class="MsoNormal" style="text-autospace:none"><span style="font-family:"Courier New"">poll([{fd=8, events=POLLIN|POLLHUP}], 1, 20000) = 1 ([{fd=8, revents=POLLIN}])<u></u><u></u></span></p>
<p class="MsoNormal" style="text-autospace:none"><span style="font-family:"Courier New"">read(8, "+2+1+0+0+6+0", 262144) = 12<u></u><u></u></span></p>
<p class="MsoNormal" style="text-autospace:none"><span style="font-family:"Courier New"">write(8, "+2+12+24+9Scheduler+0+12+13nodes"..., 42) = 42<u></u><u></u></span></p>
<p class="MsoNormal" style="text-autospace:none"><span style="font-family:"Courier New"">poll([{fd=8, events=POLLIN|POLLHUP}], 1, 20000) = 1 ([{fd=8, revents=POLLIN}])<u></u><u></u></span></p>
<p class="MsoNormal" style="text-autospace:none"><span style="font-family:"Courier New"">read(8, "+2+1+0+0+9+1+1+0+0+0", 262144) = 20<u></u><u></u></span></p>
<p class="MsoNormal" style="text-autospace:none"><span style="font-family:"Courier New"">stat("/etc/localtime", {st_mode=S_IFREG|0644, st_size=2819, ...}) = 0<u></u><u></u></span></p>
<p class="MsoNormal" style="text-autospace:none"><span style="font-family:"Courier New"">write(8, "+2+12+11+9Scheduler+2+22+3830255"..., 124) = 124<u></u><u></u></span></p>
<p class="MsoNormal" style="text-autospace:none"><span style="font-family:"Courier New"">poll([{fd=8, events=POLLIN|POLLHUP}], 1, 20000) = 1 ([{fd=8, revents=POLLIN}])<u></u><u></u></span></p>
<p class="MsoNormal" style="text-autospace:none"><span style="font-family:"Courier New"">read(8, "+2+1+0+0+1", 262144) = 10<u></u><u></u></span></p>
<p class="MsoNormal" style="text-autospace:none"><span style="font-family:"Courier New"">write(8, "+2+12+15+9Scheduler2+38302551.cl"..., 67) = 67<u></u><u></u></span></p>
<p class="MsoNormal" style="text-autospace:none"><span style="font-family:"Courier New"">poll([{fd=8, events=POLLIN|POLLHUP}], 1, 20000) = 0 (Timeout)<u></u><u></u></span></p>
<p class="MsoNormal" style="text-autospace:none"><span style="font-family:"Courier New"">write(8, "+2+12+11+9Scheduler+2+22+3830255"..., 139) = 139<u></u><u></u></span></p>
<p class="MsoNormal" style="text-autospace:none"><span style="font-family:"Courier New"">poll([{fd=8, events=POLLIN|POLLHUP}], 1, 20000) = 0 (Timeout)<u></u><u></u></span></p>
<p class="MsoNormal" style="text-autospace:none"><span style="font-family:"Courier New"">write(8, "+2+12+24+9Scheduler+0+12+13nodes"..., 42) = 42<u></u><u></u></span></p>
<p class="MsoNormal" style="text-autospace:none"><span style="font-family:"Courier New"">poll([{fd=8, events=POLLIN|POLLHUP}], 1, 20000) = 1 ([{fd=8, revents=POLLIN}])<u></u><u></u></span></p>
<p class="MsoNormal" style="text-autospace:none"><span style="font-family:"Courier New"">read(8, "+2+1+0+0+1", 262144) = 10<u></u><u></u></span></p>
<p class="MsoNormal" style="text-autospace:none"><span style="font-family:"Courier New"">--- SIGSEGV (Segmentation fault) @ 0 (0) ---<u></u><u></u></span></p>
<p class="MsoNormal" style="text-autospace:none"><span style="font-family:"Courier New""><u></u> <u></u></span></p>
<p class="MsoNormal" style="margin-bottom:12.0pt">If there is any other information I can provide, please let me know; the crash is reproducible.<u></u><u></u></p>
<p class="MsoNormal"><u></u> <u></u></p>
<p class="MsoNormal"><span style="font-size:10.5pt;font-family:Consolas">--<u></u><u></u></span></p>
<p class="MsoNormal">Mike Stevens <u></u><u></u></p>
<p class="MsoNormal">Senior UNIX Administrator <u></u><u></u></p>
<p class="MsoNormal">Affymetrix | 3420 Central Expressway | Santa Clara, CA 95051
<u></u><u></u></p>
<p class="MsoNormal">Tel: <a href="tel:408-731-5804" value="+14087315804" target="_blank">408-731-5804</a> | Cell: <a href="tel:408-507-5738" value="+14085075738" target="_blank">408-507-5738</a><u></u><u></u></p>
<p class="MsoNormal"><u></u> <u></u></p>
</div></div></div>
</div>
</div>
<br>_______________________________________________<br>
torqueusers mailing list<br>
<a href="mailto:torqueusers@supercluster.org">torqueusers@supercluster.org</a><br>
<a href="http://www.supercluster.org/mailman/listinfo/torqueusers" target="_blank">http://www.supercluster.org/mailman/listinfo/torqueusers</a><br>
<br></blockquote></div><br></div>