<br><font size=2><tt>Thanks for the reply. I went ahead and rebuilt on
2.0.0p5 (just to be sure which version I was using), and we tried
some of the things you suggested but struggled to get them to work. Below
is a transcript of what we tried, with example qsub commands, so that hopefully
you can point out where our mistake is!</tt></font>
<br>
<br><font size=2><tt>> If you use TORQUE 2.0.0p3 or later (last p5 snapshot
is best), you can<br>
> use all job variables in the stageout, like<br>
> "stageout=$HOME/out.txt@headnode:$HOME/out-$PBS_JOBID.txt"</tt></font>
<br>
<br><font size=2><tt>Using TORQUE 2.0.0p5, I tried the following submission
command to see if I could verify this behavior:</tt></font>
<br>
<br><font size=2><tt> $ echo 'echo $PBS_JOBID > $PBS_JOBID.txt' | qsub -W stageout='$PBS_JOBID.txt@headnode:/home/todd/$PBS_JOBID.txt'</tt></font>
<br>
<br><font size=2><tt>The job failed and the system delivered the following
e-mail notification:</tt></font>
<br>
<br><font size=2><tt> </tt></font><font size=2 face="Courier New">PBS
Job Id: 45480.headnode</font>
<br><font size=2 face="Courier New"> Job
Name: STDIN</font>
<br><font size=2 face="Courier New"> An
error has occurred processing your job, see below.</font>
<br><font size=2 face="Courier New"> Post
job file processing error; job 45480.headnode on host node1/0</font>
<br>
<br><font size=2 face="Courier New"> Unable
to copy file $PBS_JOBID.txt to </font><font size=2><tt>todd</tt></font><font size=2 face="Courier New">@headnode:</font><font size=2><tt>/home/todd/</tt></font><font size=2 face="Courier New">$PBS_JOBID.txt</font>
<br><font size=2 face="Courier New"> >>>
error from copy</font>
<br><font size=2 face="Courier New"> $PBS_JOBID.txt:
No such file or directory</font>
<br><font size=2 face="Courier New"> >>>
end error output</font>
<br>
<br><font size=2 face="Courier New">This certainly makes it look like
TORQUE didn't expand either instance of the $PBS_JOBID environment
variable in the stageout attribute value. What did I do wrong here?</font>
<br>
<br><font size=2><tt>> The transient TMPDIR patch went in at 2.0.0p3
(again, latest p5 snapshot<br>
> is best.)<br>
> <br>
> The job script will still need to cd to $TMPDIR.</tt></font>
<br>
<br><font size=2><tt>Again, using TORQUE 2.0.0p5, I couldn't verify the
expected behavior, i.e. that pbs_mom creates a transient temporary directory,
stores its path in the environment variable $TMPDIR, and then
exports $TMPDIR to the prologue script for use on the compute node.</tt></font>
<br>
<br><font size=2><tt>This is what I tried:</tt></font>
<br>
<br><font size=2><tt> $ echo 'echo $TMPDIR'
| qsub</tt></font>
<br><font size=2><tt> 45481.headnode</tt></font>
<br><font size=2><tt> $ ls -la STDIN.*</tt></font>
<br><font size=2><tt> </tt></font><font size=2 face="Courier New">-rw-------
1 nxw18916 gsk_rd 0 Jan
11 2006 STDIN.e45481</font>
<br><font size=2 face="Courier New"> -rw-------
1 nxw18916 gsk_rd 1 Jan
11 2006 STDIN.o45481</font>
<br><font size=2 face="Courier New"> $
perl -e 'open F,"<STDIN.o45481";my $s=<F>;chomp $s;print
qq(single byte is \\n\n) if $s eq ""'</font>
<br><font size=2 face="Courier New"> single
byte is \n</font>
<br>
<br><font size=2><tt>So the shell seems to have expanded $TMPDIR to
the empty string; the single byte in STDIN.o45481 is therefore the
newline emitted by echo. (Incidentally, 'echo -n' produced a zero-length
file, but I didn't think of trying this for greater clarity until I had
already copied the transcript. My apologies.)</tt></font>
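<br>
<br><font size=2><tt>For a less ambiguous check than a bare echo, something
like the following could be submitted instead. It is only a sketch: describe_tmpdir
is a name made up for illustration, and it relies solely on standard POSIX
parameter expansion to tell an unset $TMPDIR apart from one that is set but empty.</tt></font>
<br>

```shell
#!/bin/sh
# Sketch only: describe_tmpdir is a hypothetical helper, not part of TORQUE.
# It reports whether TMPDIR is unset, set but empty, or set to a value.
describe_tmpdir() {
    if [ -z "${TMPDIR+x}" ]; then
        # ${TMPDIR+x} expands to nothing only when TMPDIR is unset
        echo "TMPDIR is unset"
    elif [ -z "$TMPDIR" ]; then
        echo "TMPDIR is set but empty"
    else
        echo "TMPDIR=$TMPDIR"
    fi
}

describe_tmpdir
```

<br><font size=2><tt>Piping this script to qsub in place of 'echo $TMPDIR'
should leave exactly one of the three messages in the job's STDIN.o file.</tt></font>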
<br>
<br><font size=2><tt>I believe I understand how the end result is
supposed to work: in the prologue script, I 'cd' into the pbs_mom-created
temporary directory referenced by $TMPDIR to do my work, and pbs_mom
then removes this directory after the job completes. So why does $TMPDIR
always evaluate to the empty string? What's going on here?</tt></font>
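<br>
<br><font size=2><tt>To make the intent concrete, here is a minimal sketch of
what the start of the job script might look like. The helper names pick_workdir
and enter_workdir are hypothetical, and the fallback to a per-job directory under
$HOME named after $PBS_JOBID is my own assumption for the case where $TMPDIR is
empty; it is not something TORQUE provides.</tt></font>
<br>

```shell
#!/bin/sh
# Sketch only: pick_workdir and enter_workdir are hypothetical helpers.

# Print the directory the job should work in: the directory named by TMPDIR
# when it is non-empty and exists, otherwise (assumed fallback) a per-job
# directory under HOME named after PBS_JOBID.
pick_workdir() {
    if [ -n "${TMPDIR:-}" ] && [ -d "$TMPDIR" ]; then
        printf '%s\n' "$TMPDIR"
    else
        printf '%s\n' "${HOME}/${PBS_JOBID:-nojobid}"
    fi
}

# Create the chosen directory if needed and change into it.
enter_workdir() {
    workdir=$(pick_workdir) || return 1
    mkdir -p "$workdir" && cd "$workdir"
}
```

<br><font size=2><tt>A job script would call 'enter_workdir || exit 1' before
doing any work. Note that nothing in this sketch removes the fallback directory
afterwards, so cleanup would be left to the job script.</tt></font>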
<br>
<br><font size=2><tt>Thanks in advance for any assistance.</tt></font>
<br>
<br><font size=2><tt>Best,</tt></font>
<br><font size=2><tt>Nate</tt></font>
<br>
<br><font size=2><tt>On Fri, Jan 06, 2006 at 11:45:58AM -0500, nathaniel.x.woody@gsk.com
alleged:<br>
> I have a number of batch processes that I'm running with torque that
all <br>
> run the same exact process on different pieces of a large data file.
The <br>
> process creates a number of intermediate files and in the end produces
a <br>
> file to be staged-out. My problem is that as soon as more than
one job is <br>
> executing on a node, these files have the chance to stomp all over
each <br>
> other (i.e. Job 1 and Job 2 are running on a node, Job 1 completes, and
<br>
> out.txt is staged out and then deleted, which confuses Job 2) because
they <br>
> all run in the same directory (the user's home directory). <br>
<br>
If you use TORQUE 2.0.0p3 or later (last p5 snapshot is best), you can<br>
use all job variables in the stageout, like<br>
"stageout=$HOME/out.txt@headnode:$HOME/out-$PBS_JOBID.txt"<br>
<br>
<br>
> What I would like to do is to convince torque to run the job in a
clean <br>
> directory (for instance, ~/00001.somehose.com), so that I can keep
the <br>
> jobs separate without having to jump through file-renaming hoops or
making <br>
> the job start creating directories, etc. Torque essentially
does this for <br>
> the standard out and standard error files (by naming them by job id),
but <br>
> I can't seem to figure out how to get the desired behavior. Looking
<br>
> through the archives, I found a reference to something similar to
this <br>
> related to a patch that caused mom to create a temporary directory.
<br>
> However, this was a patch for torque 1.0.1 or so, and it doesn't appear
to <br>
> have been incorporated at any point. <br>
<br>
The transient TMPDIR patch went in at 2.0.0p3 (again, latest p5 snapshot<br>
is best.)<br>
<br>
The job script will still need to cd to $TMPDIR.<br>
<br>
<br>
> I've also noticed the rootdir and initdir parameters that I can set,
but I <br>
> don't think those create a directory if one doesn't already exist.<br>
<br>
Correct. qsub's -d is handy, but the directory must already exist.<br>
<br>
Some of my users create unique jobnames and do something like this:<br>
mkdir $jobname<br>
qsub -N $jobname -d $jobname<br>
<br>
<br>
> Is there a facility for doing what I describe here, or am I going
to have <br>
> do all of the work in the job script?<br>
<br>
It sounds like TORQUE does have some options for you. Let us know
if<br>
you need something else.<br>
<br>
-- <br>
Garrick Staples, Linux/HPCC Administrator<br>
University of Southern California<br>
_______________________________________________<br>
torqueusers mailing list<br>
torqueusers@supercluster.org<br>
http://www.supercluster.org/mailman/listinfo/torqueusers<br>
</tt></font>
<br>