Hello Lukasz,<br><br><div class="gmail_quote">On Fri, Feb 24, 2012 at 3:54 PM, Lukasz Flis <span dir="ltr"><<a href="mailto:l.flis@cyf-kr.edu.pl">l.flis@cyf-kr.edu.pl</a>></span> wrote:<br><blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex">
Hello Christopher, Hi *<br>
<div class="im"><br>
><br>
> We don't use Lustre (we have Panasas and GPFS), but just wondering<br>
> does this happen all the time, or only occasionally ?<br>
<br>
</div>It happens occasionally. But as I said, this seems to be a bug in the Lustre<br>
FS and has nothing to do with the Torque code. Torque uses an unlucky<br>
sequence of stat/mkdir calls which exposes the Lustre misbehaviour.<br>
<div class="im"><br>
> If occasionally, then if the job fails once, will it always fail, or<br>
> will it work if you try again?<br>
<br>
</div>Another call to the mkdirtree() function should succeed after a few<br>
seconds of sleep.<br>
<br>
I believe this behaviour in the Lustre client appeared in the 1.8.x line and<br>
remains in 2.1.X. HP SFS IIRC is based on 1.4 and 1.6 so it's not affected.<br></blockquote><div>HP SFS installed here is on Lustre 1.8.4 <br></div><blockquote class="gmail_quote" style="margin:0pt 0pt 0pt 0.8ex;border-left:1px solid rgb(204,204,204);padding-left:1ex">
<br>
We have observed the bug in Lustre 1.8.(4,5,6) infrastructure. Then we<br>
moved to the 2.1 line, replacing all the components (servers, arrays, fabric),<br>
and the bug remained.<br>
<br>
The problem with Lustre is that a mkdir() call on an EXISTING directory<br>
returns EPERM instead of EEXIST once in a while, usually when<br>
stat() is called before mkdir().<br></blockquote><div> <br>The job ID is appended to the $tmpdir variable, so it would be a new path every time,<br></div><div>unless the call behaves like the command "mkdir -p /mnt/lustre/scratch/jobs/&lt;job id&gt;".</div>
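<div><br>To make the failure mode described above concrete, here is a minimal Python sketch of a tolerant mkdir. It is only an illustration, not Torque's actual C code: the helper name mkdir_tolerant, the retry count and the delay are all hypothetical. It treats EPERM on a path that already is a directory as if it were EEXIST, and otherwise retries after a short sleep, as suggested in the quoted text.<br></div>
<pre>
```python
import errno
import os
import time


def mkdir_tolerant(path, mode=0o755, retries=3, delay=2.0):
    """Create a directory, tolerating a spurious EPERM on an existing one.

    Hypothetical helper: name, retry count and delay are illustrative only.
    Returns True if this call created the directory, False if it already
    existed.
    """
    for attempt in range(retries):
        try:
            os.mkdir(path, mode)
            return True
        except FileExistsError:
            # Normal EEXIST case: fine as long as it really is a directory.
            if os.path.isdir(path):
                return False
            raise
        except PermissionError as exc:
            # PermissionError covers EACCES too; only EPERM is the symptom
            # reported in this thread, so re-raise anything else.
            if exc.errno != errno.EPERM:
                raise
            # EPERM on an existing directory: treat it like EEXIST.
            if os.path.isdir(path):
                return False
            # Otherwise wait a little and try again, unless out of attempts.
            if attempt == retries - 1:
                raise
            time.sleep(delay)
    raise ValueError("retries must be at least 1")
```
</pre>
<div>A call such as mkdir_tolerant("/mnt/lustre/scratch/jobs/12345") (a made-up job directory) would then succeed whether or not the directory was already there. A genuine permission problem still surfaces as an exception once the retries are used up.<br></div>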
<div> </div><blockquote class="gmail_quote" style="margin:0pt 0pt 0pt 0.8ex;border-left:1px solid rgb(204,204,204);padding-left:1ex">
I believe doing mkdir on an existing path is not a very common practice, and<br>
that's the reason the bug went unnoticed for a long time.<br>
<br>
Cheers,<br>
--<br>
Lukasz Flis<br>
<div class="HOEnZb"><div class="h5"><br>
<br>
<br>
</div></div></blockquote></div><br><br clear="all"><br>-- <br>---<br>Rishi Pathak<br>National PARAM Supercomputing Facility<br>C-DAC, Pune, India<br><br><br>