[torquedev] torque 4.1.2 login shells problem and bash -l workaround
chenry at ittc.ku.edu
Thu Oct 18 12:34:57 MDT 2012
I have been following the torque 4 development, and I'm currently using torque 4.1.2 on RHEL6.2. I have found that I cannot get cluster jobs to run correctly without using "#!/bin/bash -l" in each script. A few sites (academic and government) are listing this workaround in their cluster FAQs.
Our site uses mpi-selector and needs to source /etc/profile for every cluster job (interactive or not). I'm going to get a million "why is mpiexec not found?" questions if I have to rely on the workaround instead of addressing the problem. I have looked for settings in the documentation and read the source code.
The relevant settings are defined globally inside src/resmom/mom_main.c:
... (line 205)
int src_login_batch = TRUE;
int src_login_interactive = TRUE;
and used in src/resmom/start_exec.c:
... (line 3736)
if (((TJE->is_interactive == TRUE) && (src_login_interactive == FALSE)) ||
((TJE->is_interactive != TRUE) && (src_login_batch == FALSE)))
In start_exec.c those variables are declared as "extern int", so the values from mom_main.c are accessible once the binaries are linked.
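To make the quoted condition easier to reason about, here is a minimal standalone sketch of it in C. It is not torque code: the two globals are local stand-ins initialized to the defaults shown above, and quoted_condition_holds is a hypothetical helper that takes the interactive flag as a plain int in place of TJE->is_interactive. What the real branch does when the condition is true is not shown in the excerpt, so the sketch only evaluates the test itself.

```c
#include <stdbool.h>

#define TRUE  1
#define FALSE 0

/* Local stand-ins for the globals in mom_main.c, using the defaults
 * quoted in this post. They are not linked against torque. */
int src_login_batch = TRUE;
int src_login_interactive = TRUE;

/* Hypothetical helper: evaluates the same boolean expression as the
 * "if" quoted from start_exec.c, with is_interactive passed in
 * directly instead of being read from TJE->is_interactive. */
bool quoted_condition_holds(int is_interactive)
{
  return ((is_interactive == TRUE) && (src_login_interactive == FALSE)) ||
         ((is_interactive != TRUE) && (src_login_batch == FALSE));
}
```

With both settings left at TRUE, the expression is false for interactive and batch jobs alike. So if the defaults are really in effect at run time, this test alone should not take its branch, which suggests checking whether something changes the two values before the test runs.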
There's no error message from the source_login_shells_or_not function, and the code looks very similar to the torque-3 code (except for being wrapped up into functions). Can anyone shed some light on the problem?