[torquedev] SMP system issues with pbs_mom, in mom_mach.c (15007)
Andrew Keen
keenandr at msu.edu
Thu May 29 10:56:36 MDT 2008
Hi,
I'm running into the error described at:
http://www.clusterresources.com/pipermail/torqueusers/2007-August/006046.html
on our 128-CPU SMP system.
However, we're not running CPUSets, so the provided patch won't work. Here's
the gprof output (timing was not reported correctly, so only the call
counts are meaningful):
gprof -b /usr/local/sbin/pbs_mom gmon.out
Flat profile:
Each sample counts as 0.000976562 seconds.
no time accumulated
  %   cumulative   self              self     total
 time   seconds   seconds    calls  Ts/call  Ts/call  name
  0.00      0.00     0.00  3590616     0.00     0.00  injob
  0.00      0.00     0.00    97512     0.00     0.00  str_nc_cmp
  0.00      0.00     0.00    31997     0.00     0.00  get_proc_stat
  0.00      0.00     0.00    12540     0.00     0.00  clear_attr
  0.00      0.00     0.00     5286     0.00     0.00  find_resc_def
  0.00      0.00     0.00     4890     0.00     0.00  find_attr
  0.00      0.00     0.00     4422     0.00     0.00  find_resc_entry
-snip-
Call graph
granularity: each sample hit covers 4 byte(s) no time propagated
index % time    self  children    called           name
                0.00    0.00 1107369/3590616     mem_sum [20]
                0.00    0.00 1107369/3590616     resi_sum [22]
                0.00    0.00 1375878/3590616     cput_sum [17]
[1]      0.0    0.00    0.00 3590616             injob [1]
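
Since the profile recorded no time, the call counts are the only usable signal. The following is a minimal sketch, not part of the original report, that reads the three caller lines from the call graph above and shows what share of the injob calls each caller accounts for. The helper name caller_shares and the regular expression are illustrative assumptions; only the numbers and function names come from the gprof output.

```python
import re

# Caller lines copied from the call-graph excerpt in this post.
CALL_GRAPH = """\
0.00 0.00 1107369/3590616 mem_sum [20]
0.00 0.00 1107369/3590616 resi_sum [22]
0.00 0.00 1375878/3590616 cput_sum [17]
"""

# Fields: self, children, called/total, caller name, [index].
CALLER_RE = re.compile(
    r"^\s*[\d.]+\s+[\d.]+\s+(\d+)/(\d+)\s+(\S+)\s+\[\d+\]\s*$"
)


def caller_shares(text):
    """Return {caller: (calls, share_of_total)} for gprof caller lines.

    Hypothetical helper for illustration; lines that do not look like
    caller entries are skipped.
    """
    shares = {}
    for line in text.splitlines():
        match = CALLER_RE.match(line)
        if match is None:
            continue
        calls = int(match.group(1))
        total = int(match.group(2))
        shares[match.group(3)] = (calls, calls / total)
    return shares


if __name__ == "__main__":
    result = caller_shares(CALL_GRAPH)
    # Print callers from most to fewest calls into injob.
    for name, (calls, share) in sorted(
        result.items(), key=lambda item: -item[1][0]
    ):
        print(f"{name:10s} {calls:9d} {share:6.1%}")
```

The three caller counts add up to the 3590616 total reported for injob, so these three functions account for every injob call in this run.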
Migrating the mom to 2.3 has reduced the impact on the server, but the
mom still spends a lot of time crawling the /proc tree.
-Andy