[torquedev] Profiling for pbs_mom
Steve Snelgrove
ssnelgrove at clusterresources.com
Wed May 7 15:41:05 MDT 2008
I just added the following section to the Torque Admin manual. If
anyone has much experience with profiling, I would appreciate their
comments and suggestions. Thanks.
http://www.clusterresources.com/torquedocs21/10.1troubleshooting.shtml
------------------------------------------------
Some hard problems in Torque come down to how much time is spent in
particular routines. For example, one currently open problem appears to
be caused by the design of the code in linux/mom_mach.c, where the
statistics for the node status are gathered. It appears that the */proc*
filesystem, which contains information about the kernel and the
processes, is being accessed so often on some machines that the
responses to other message traffic are affected. The machine where this
is happening has 128 processors.
To debug these kinds of problems, it can be useful to see where in the
code time is being spent. This is called profiling, and the Linux
utility *gprof* outputs a listing of routines and the amount of time
spent in each of them. This requires that the code be compiled with
special options that instrument it and make it produce a file,
gmon.out, which is written at the end of program execution.
The following listing shows how to build Torque with profiling enabled.
Notice that the output file for pbs_mom ends up in the mom_priv
directory, because its startup code changes the working directory to
this location.
# ./configure "CFLAGS=-pg -lgcov -fPIC"
# make -j5
# make install
# pbs_mom
... do some stuff for a while ...
# momctl -s
# cd /var/spool/torque/mom_priv
# gprof -b `which pbs_mom` gmon.out |less
#
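The full gprof report can be long, so it may help to look at the flat
profile first. The following is only a sketch: the function name
mom_flat_profile is made up for illustration, and the default directory
is assumed to be the spool location used in the listing above. It relies
on gmon.out being written when pbs_mom exits normally, so it reports an
error if the file is not there yet.

```shell
# Hypothetical helper (not part of Torque): show the top of the flat
# profile for a profiled pbs_mom run.
# $1 is the directory holding gmon.out; the default is an assumption
# taken from the listing above.
mom_flat_profile() {
    dir="${1:-/var/spool/torque/mom_priv}"
    if [ ! -f "$dir/gmon.out" ]; then
        # gmon.out only appears after the profiled program has exited.
        echo "no gmon.out in $dir (has pbs_mom been shut down?)" >&2
        return 1
    fi
    # -b omits the explanatory text, -p prints only the flat profile.
    gprof -b -p "$(command -v pbs_mom)" "$dir/gmon.out" | head -n 20
}
```

After shutting down pbs_mom with momctl -s, running mom_flat_profile
with no arguments should list the routines that took the most time.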