[torquedev] Mixing Torque 2.1.10 and 2.3.0 on a cluster?
Chris Samuel
csamuel at vpac.org
Thu Mar 27 00:48:52 MDT 2008
----- "Glen Beane" <glen.beane at gmail.com> wrote:
> The job structure has changed slightly between 2.1 and 2.3. I need to
> verify this wouldn't cause any problems - especially with a 2.3 server
> and 2.1 moms!
OK, so we've been doing some playing this afternoon
on a test cluster and we have some good news and some
bad news.
First the bad news:
If you keep a 2.1 pbs_server (to avoid job file upgrades) and
you submit a job which has a 2.3 pbs_mom as the mother superior,
you will find pbs_server dies with a SEGV in strlen() during
the obituary stage of the job cleanup.
Program terminated with signal 11, Segmentation fault.
#0 0x00c1ce33 in strlen () from /lib/libc.so.6
#0 0x00c1ce33 in strlen () from /lib/libc.so.6
#1 0x0805c43f in req_jobobit (preq=0x8f05cd8) at req_jobobit.c:1624
#2 0x080592fb in process_request (sfds=11) at process_request.c:494
#3 0x00f2998f in wait_request (waittime=10, SState=0x808c03c)
    at ../Libnet/net_server.c:320
Dunno if that's exploitable?
Anyway, that rules out running a 2.1 server with 2.3 moms.
Now the good news:
1) pbsdsh from 2.[13] will talk to pbs_mom from 2.[31].
2) Jobs from a 2.3 pbs_server seem to run fine on a 2.1 pbs_mom
So it's sounding good..
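For anyone planning a staged upgrade, the results above can be written down as a small lookup table. This is only a sketch of what was observed on the test cluster: the names `TESTED`, `major_minor` and `lookup` are made up for illustration, the pbsdsh entries assume "2.[13] to 2.[31]" means the two cross-version pairings, and any combination not mentioned above is deliberately left as unknown rather than guessed.

```python
# Results reported in this message only. Anything not listed was not
# tested here and is returned as None (unknown), not as "ok".
TESTED = {
    # (client, client version, daemon, daemon version): observed result
    ("pbs_server", "2.1", "pbs_mom", "2.3"): "pbs_server SEGV in req_jobobit",
    ("pbs_server", "2.3", "pbs_mom", "2.1"): "ok",
    # Assumption: the pbsdsh note covers the two cross-version pairings.
    ("pbsdsh", "2.1", "pbs_mom", "2.3"): "ok",
    ("pbsdsh", "2.3", "pbs_mom", "2.1"): "ok",
}


def major_minor(version):
    """Reduce a version such as '2.1.10' to its series, '2.1'."""
    return ".".join(version.split(".")[:2])


def lookup(client, client_version, daemon, daemon_version):
    """Return the reported result for a pairing, or None if untested."""
    key = (client, major_minor(client_version),
           daemon, major_minor(daemon_version))
    return TESTED.get(key)


if __name__ == "__main__":
    print(lookup("pbs_server", "2.1.10", "pbs_mom", "2.3.0"))
    print(lookup("pbs_server", "2.3.0", "pbs_mom", "2.1.10"))
```

The practical reading is that the server has to move to 2.3 before any of the moms do; the SEGV entry applies when the 2.3 pbs_mom is the mother superior for the job.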
cheers!
Chris
--
Christopher Samuel - (03) 9925 4751 - Systems Manager
The Victorian Partnership for Advanced Computing
P.O. Box 201, Carlton South, VIC 3053, Australia
VPAC is a not-for-profit Registered Research Agency