[torquedev] memory leaks in torque-server 2.5.11: question

Lukasz Flis l.flis at cyf-kr.edu.pl
Tue Jul 3 10:21:32 MDT 2012


Hi,

We run a medium-sized computing site in Poland.
We process around 25k jobs a day: grid workloads and multi-node jobs 
submitted locally.

We are facing a problem with the long-running pbs_server process, which 
after a week or two consumes all the memory available on the machine.
As a result, pbs_server is unable to spawn a subprocess to unmunge credentials:

06/26/2012 15:58:20;0080;PBS_Server;Req;req_reject;Reject reply code=15012(PBS_Server System error: Inappropriate ioctl for device MSG=couldn't create pipe to unmunge), aux=0, type=AlternateUserAuthentication, from qcg-comp at someserver
06/26/2012 15:59:20;0001;PBS_Server;Svr;PBS_Server;LOG_ERROR::Cannot allocate memory (12) in pipe_and_read_unmunge, Unable to popen command 'unmunge --input=/var/spool/torque/server_priv/credentials/munge-15-59-20-640705' for reading

I took a core dump of the process when it was approaching 4 GB of both 
RSS and VIRT memory.

My question is: how can I determine from the core file which part of the 
server is leaking memory?
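
A possible first pass, sketched under assumptions rather than offered as a 
known answer: leaked heap objects often contain recognisable text (job IDs, 
attribute names, host names), so counting the most frequently repeated 
printable strings in the core file may hint at which structure is piling up. 
The Python below is only an illustration. The function name top_strings, the 
minimum string length of 6 and the file name core.pbs_server are hypothetical, 
and nothing in it is specific to TORQUE.

```python
import mmap
import re
from collections import Counter


def top_strings(path, min_len=6, limit=20):
    """Return the most frequent printable ASCII runs found in a binary file.

    Heuristic only: a string that repeats a very large number of times in a
    core dump is a hint about what is accumulating, not proof of a leak.
    """
    # Runs of at least min_len printable ASCII bytes (space through tilde).
    pattern = re.compile(rb"[\x20-\x7e]{%d,}" % min_len)
    counts = Counter()
    with open(path, "rb") as fh:
        # Map the file read-only so a multi-gigabyte core is not read into
        # memory in one piece. The Counter itself can still grow large if
        # the file holds many distinct strings.
        with mmap.mmap(fh.fileno(), 0, access=mmap.ACCESS_READ) as mm:
            for match in pattern.finditer(mm):
                counts[match.group()] += 1
    return counts.most_common(limit)


if __name__ == "__main__":
    import sys

    # Hypothetical usage: python top_strings.py core.pbs_server
    for text, count in top_strings(sys.argv[1]):
        print(count, text.decode("ascii"))
```

If one string dominates the output, searching the server source for where 
that kind of data is allocated would be a reasonable next step. A run under 
a heap checker on a test server would give more direct evidence than the 
core file alone.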

Cheers
--
Lukasz Flis

