[torquedev] memory leaks in torque-server 2.5.11: question
Lukasz Flis
l.flis at cyf-kr.edu.pl
Tue Jul 3 10:21:32 MDT 2012
Hi,
We run a medium-sized computing site in Poland.
We process around 25k jobs per day: grid workloads and multi-node jobs
submitted locally.
We are facing a problem with the long-running pbs_server process, which
after one or two weeks consumes all the memory available on the machine.
As a result, pbs_server is unable to spawn a subprocess to unmunge credentials:
06/26/2012 15:58:20;0080;PBS_Server;Req;req_reject;Reject reply
code=15012(PBS_Server System error: Inappropriate ioctl for device
MSG=couldn't create pipe to unmunge), aux=0,
type=AlternateUserAuthentication, from qcg-comp at someserver
06/26/2012 15:59:20;0001;PBS_Server;Svr;PBS_Server;LOG_ERROR::Cannot
allocate memory (12) in pipe_and_read_unmunge, Unable to popen command
'unmunge
--input=/var/spool/torque/server_priv/credentials/munge-15-59-20-640705'
for reading
I took a core dump of the process when it was nearing 4GB of RSS and VIRT memory.
My question is: how can I determine from the core file which part of the
server is leaking memory?
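
For anyone who wants to track the same growth, the RSS and VIRT figures can
be sampled over time from /proc. The following is only a minimal sketch,
assuming a Linux host where /proc/<pid>/status exposes VmRSS and VmSize in
kB; the helper names parse_vm_kb and read_vm_kb are made up for this
illustration and are not part of TORQUE.

```python
import re
import sys
import time


def parse_vm_kb(status_text):
    """Extract VmRSS and VmSize (in kB) from the text of /proc/<pid>/status."""
    fields = {}
    for line in status_text.splitlines():
        match = re.match(r"^(VmRSS|VmSize):\s+(\d+)\s+kB", line)
        if match:
            fields[match.group(1)] = int(match.group(2))
    return fields


def read_vm_kb(pid):
    """Read the memory counters of a live process (assumes Linux /proc)."""
    with open(f"/proc/{pid}/status") as handle:
        return parse_vm_kb(handle.read())


if __name__ == "__main__":
    # Usage: python sample_mem.py <pid of pbs_server>
    # Prints one timestamped sample; run it from cron to build a history.
    sample = read_vm_kb(int(sys.argv[1]))
    stamp = time.strftime("%m/%d/%Y %H:%M:%S")
    print(stamp, sample.get("VmRSS"), sample.get("VmSize"))
```

Run periodically against the pbs_server PID, this gives one line per sample
that shows how quickly the process grows, which may help correlate the
growth with the job load.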
Cheers
--
Lukasz Flis