[torquedev] Fwd: tried it -- still getting errors.
Glen Beane
glen.beane at gmail.com
Mon Mar 10 11:45:13 MDT 2008
Does anyone have any ideas here? He's trying torque on OS X 10.5. Others have
had success with torque on Leopard.
---------- Forwarded message ----------
From: barman <barman at lowell.edu>
Date: Mon, Mar 10, 2008 at 1:37 PM
Subject: tried it -- still getting errors.
To: Glen Beane <glen.beane at gmail.com>
Hi Glen, I'm back to it now.
I tried the latest 2.3 snapshot (see my description of the steps I took
and the results -- pretty much the same as before).
I'm going to try now using a different C compiler (the Intel C compiler)
and then try the 2.2 and 2.4 snapshots.
OK here is what I did.
1. removed old /var/spool/torque directory (just to be sure)
2. downloaded torque-2.3.0-snap.200803071335
3. ./configure
   make
   sudo make install
4. sudo torque.setup travis
5. sudo vi /var/spool/torque/server_priv/nodes
(added exo2.lowell.edu)
6. sudo /usr/local/sbin/pbs_mom
7. sudo qterm
8. sudo /usr/local/sbin/pbs_server
9. echo "sleep 30" | qsub
10. qrun 0
11. qstat (after 30 seconds)
Job id                    Name             User            Time Use S Queue
------------------------- ---------------- --------------- -------- - -----
0.exo2                    STDIN            travis          00:00:00 E batch
After a long while, the status changes to 'C':
Job id                    Name             User            Time Use S Queue
------------------------- ---------------- --------------- -------- - -----
0.exo2                    STDIN            travis          00:00:00 C batch
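For anyone trying to reproduce this, the job state can be read out of the qstat output with a few lines of script rather than by eye. The following is only a rough sketch in Python: the function name `parse_qstat` and the `SAMPLE` constant are made up for illustration, and it assumes the default six-column qstat layout shown above with no spaces in the job name.

```python
# Hypothetical helper (not part of torque): map job id -> state letter
# from the default `qstat` listing shown in this message.
SAMPLE = """\
Job id Name User Time Use S Queue
------------------------- ---------------- --------------- -------- - -----
0.exo2 STDIN travis 00:00:00 E batch
"""


def parse_qstat(output):
    """Return {job_id: state} for each job row in default qstat output."""
    states = {}
    for line in output.splitlines():
        fields = line.split()
        # Job rows have exactly six columns: id, name, user, time, state,
        # queue. The header splits into more fields; the ruler line starts
        # with dashes. Both are skipped.
        if len(fields) != 6 or fields[0].startswith("-"):
            continue
        job_id, _name, _user, _time_used, state, _queue = fields
        states[job_id] = state
    return states


if __name__ == "__main__":
    print(parse_qstat(SAMPLE))
```

Run on the first listing above this gives `{'0.exo2': 'E'}`. In torque, `E` means the job is exiting and `C` means it has completed, so a job that sits in `E` for a long time is stuck in its exit phase.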
And the system log file is filled with those malloc errors:
Mar 10 09:47:19 exo2 pbs_mom[48067]: pbs_mom(48067) malloc: *** error for
object 0x101ba0: Non-aligned pointer being freed (2)\n*** set a breakpoint
in malloc_error_break to debug
Mar 10 09:47:19 exo2 pbs_mom[48067]: pbs_mom(48067) malloc: *** error for
object 0x101bd4: Non-aligned pointer being freed\n*** set a breakpoint in
malloc_error_break to debug
Mar 10 09:47:19 exo2 pbs_mom[48067]: pbs_mom(48067) malloc: *** error for
object 0x100a40: Non-aligned pointer being freed (2)\n*** set a breakpoint
in malloc_error_break to debug
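Since the log is flooded with these, it may help to tally them by address and message before attaching a debugger. A minimal sketch in Python follows; the names `parse_malloc_errors` and `SAMPLE` are hypothetical, and the pattern assumes the entries look exactly like the three above, including the literal `\n` that syslog leaves in the message.

```python
import re
from collections import Counter

# The three entries quoted above, kept as a raw string so the literal
# backslash-n written by syslog is preserved.
SAMPLE = r"""
Mar 10 09:47:19 exo2 pbs_mom[48067]: pbs_mom(48067) malloc: *** error for
object 0x101ba0: Non-aligned pointer being freed (2)\n*** set a breakpoint
in malloc_error_break to debug
Mar 10 09:47:19 exo2 pbs_mom[48067]: pbs_mom(48067) malloc: *** error for
object 0x101bd4: Non-aligned pointer being freed\n*** set a breakpoint in
malloc_error_break to debug
Mar 10 09:47:19 exo2 pbs_mom[48067]: pbs_mom(48067) malloc: *** error for
object 0x100a40: Non-aligned pointer being freed (2)\n*** set a breakpoint
in malloc_error_break to debug
"""

# Assumed entry shape: "<proc>[<pid>]: <tag> malloc: *** error for object
# <addr>: <message>\n..." where "\n" is two literal characters.
PATTERN = re.compile(
    r"(?P<proc>\w+)\[(?P<pid>\d+)\]: \S+ malloc: \*\*\* error for object "
    r"(?P<addr>0x[0-9a-fA-F]+): (?P<msg>.*?)\\n"
)


def parse_malloc_errors(text):
    """Return one dict per malloc error found in wrapped syslog text."""
    # Collapse the mail client's line wrapping into single spaces first.
    flat = " ".join(text.split())
    return [m.groupdict() for m in PATTERN.finditer(flat)]


if __name__ == "__main__":
    errors = parse_malloc_errors(SAMPLE)
    for err in errors:
        print(err["pid"], err["addr"], err["msg"])
    print(Counter(err["msg"] for err in errors))
```

On the excerpt above this finds three errors, all from pid 48067: two with the "(2)" variant of the message and one without.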
and then
Mar 10 09:47:19 exo2 ReportCrash[48073]: Formulating crash report for
process pbs_mom[48067]
Mar 10 09:47:19 exo2 ReportCrash[48073]: Saved crashreport to
/Library/Logs/CrashReporter/pbs_mom_2008-03-10-094719_exo2.crash using uid:
0 gid: 0, euid: 0 egid: 0
Mar 10 10:02:20 exo2 pbs_mom[48034]: wait_request, connection 10 to host
168453162 has timed out out after 900 seconds - closing stale connection
Mar 10 10:02:20 exo2 postfix/postdrop[48133]: warning: unable to look up
public/pickup: No such file or directory
Mar 10 10:02:21 exo2 deliver[48141]: connect(/var/imap/socket/lmtp) failed:
No such file or directory
Mar 10 10:07:21 exo2 PBS_Server[48040]: sync_node_jobs, stray job
0.exo2.lowell.edu found on exo2.lowell.edu