[torquedev] Mixing Torque 2.1.10 and 2.3.0 on a cluster?

Chris Samuel csamuel at vpac.org
Thu Mar 27 00:48:52 MDT 2008


----- "Glen Beane" <glen.beane at gmail.com> wrote:

> The job structure has changed slightly between 2.1 and 2.3. I need to
> verify this wouldn't cause any problems - especially with a 2.3 server
> and 2.1 moms!

OK, so we've been doing some playing this afternoon
on a test cluster and we have some good news and some
bad news.

First the bad news:

If you keep a 2.1 pbs_server (to avoid job file upgrades) and
submit a job whose mother superior is a 2.3 pbs_mom, pbs_server
dies with a SEGV in strlen() during the obituary stage of job
cleanup:

Program terminated with signal 11, Segmentation fault.
#0  0x00c1ce33 in strlen () from /lib/libc.so.6
#1  0x0805c43f in req_jobobit (preq=0x8f05cd8) at req_jobobit.c:1624
#2  0x080592fb in process_request (sfds=11) at process_request.c:494
#3  0x00f2998f in wait_request (waittime=10, SState=0x808c03c)
    at ../Libnet/net_server.c:320

Dunno if that's exploitable ?

Anyway, that rules out running a 2.1 server with 2.3 moms.
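
For reference, a SEGV inside strlen() usually means it was handed a
NULL or otherwise invalid pointer. Whether that is what happens at
req_jobobit.c:1624 is not confirmed here, so the following is only a
sketch of the usual defensive pattern. safe_strlen() is a made-up
name for illustration, not a Torque function:

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical helper, NOT part of Torque: treat a NULL pointer as an
 * empty string instead of passing it to strlen(), which would crash. */
static size_t safe_strlen(const char *s)
{
    return (s != NULL) ? strlen(s) : 0;
}
```

A caller that may receive a missing field from a peer running a
different version would use safe_strlen(field) where it currently
calls strlen(field) directly.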

Now the good news:

1) pbsdsh from 2.1 or 2.3 will talk to pbs_mom from either version.

2) Jobs from a 2.3 pbs_server seem to run fine on a 2.1 pbs_mom.

So it's sounding good..

cheers!
Chris
-- 
Christopher Samuel - (03) 9925 4751 - Systems Manager
 The Victorian Partnership for Advanced Computing
 P.O. Box 201, Carlton South, VIC 3053, Australia
VPAC is a not-for-profit Registered Research Agency

