<br><br><div class="gmail_quote">On Mon, Jul 14, 2008 at 1:44 PM, Scott Jackson <<a href="mailto:scott@clusterresources.com">scott@clusterresources.com</a>> wrote:<br><blockquote class="gmail_quote" style="border-left: 1px solid rgb(204, 204, 204); margin: 0pt 0pt 0pt 0.8ex; padding-left: 1ex;">
<div><div></div><div class="Wj3C7c">On Fri, 2008-07-11 at 16:28 -0400, Glen Beane wrote:<br>
> I've been working on some changes in trunk that transfer the .OU<br>
> and .ER spool files from pbs_mom back to pbs_server. This is one of<br>
> the steps we need to take so that a job in the COMPLETE state can be<br>
> restarted from a checkpoint file. (the files are only returned to the<br>
> server if keep_completed is positive and the job has a checkpoint<br>
> file)<br>
><br>
> There are problems when the spool directory is shared between pbs_server<br>
> and the mother superior pbs_mom. What happens is that when the files<br>
> are "returned", pbs_server takes ownership of the .ER and .OU files in<br>
> the spool dir and when pbs_mom forks to the user to copy the files<br>
> back to the user home directory they are unable to do so because of a<br>
> permission denied error. I feel that the cleanest solution is to just<br>
> separate the pbs_server and pbs_mom spool directories. In my current<br>
> working copy of trunk I have changed pbs_server to use<br>
> server_home/server_spool instead of server_home/spool. pbs_mom<br>
> continues to use server_home/spool. This solves my problems because<br>
> when the spool files are returned to pbs_server, pbs_mom retains its<br>
> copy in its own spool directory. It is then free to fork to the user<br>
> to copy the files and then delete them.<br>
><br>
> Are there any objections to this change in trunk? (the change will be<br>
> introduced with the release of TORQUE 2.4.0)<br>
><br>
<br>
<br>
</div></div>No objections from me. This seems like a good approach. Personally, if I<br>
were the architect, I would have a mom, server and sched dir and under<br>
these, I would have log, spool, priv and other such directories. I know it<br>
is a big change. For me, the price of progress is worth it. It would<br>
have to be done in a minor version change (such as 2.3 to 2.4) and would<br>
have to be announced ostentatiously in the release notes.</blockquote><div><br><br>What about the wasted disk space? If pbs_mom and pbs_server are on the same node, then pbs_server will get its own copy and moments later pbs_mom will delete its copy. For that short period of time we have two copies of the spool files, plus the wasted time of doing the pbs_mom to pbs_server file transfer...<br>
<br>I think we can get around that, but the code is going to be a bit of a hack, which is why I originally suggested separate spool directories and no differentiation between pbs_server and pbs_mom running on the same host or on different hosts.<br>
<br><br> </div></div><br>