A fix for this is in place and will be released with 4.1.4. I'm not sure exactly how jobs end up in this state, but we added functionality that makes the mom retry sending obits for jobs that are stuck in the exiting state on the mom.<div>
<br></div><div>David<br><br><div class="gmail_quote">On Mon, Dec 10, 2012 at 2:28 AM, Lech Nieroda <span dir="ltr">&lt;<a href="mailto:nieroda.lech@uni-koeln.de" target="_blank">nieroda.lech@uni-koeln.de</a>&gt;</span> wrote:<br>
<blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex">Dear list,<br>
<br>
we are currently running Torque 4.1.3 with Maui 3.3.1. The option<br>
"mom_job_sync" is on. However, we get "stray" jobs quite often - these<br>
are jobs that remain in an "EXITING" state for whatever reason and their<br>
&lt;jobid&gt;.JB files are often left lying around.<br>
<br>
Our workaround: at first we tried deleting the JB files and restarting<br>
the pbs_mom daemon, but it turns out that a simple "momctl -h &lt;host&gt; -c<br>
&lt;jobid&gt;" does the job as well. A script now runs daily via cron<br>
and removes such jobs.<br>
<br>
So, when the server discovers a "stray job", it has the means to send a<br>
"cleaning" command to the pbs_mom, but apparently it doesn't, and we<br>
have to do it manually.<br>
<br>
Any option to fix that? Is it a bug?<br>
<br>
Regards,<br>
Lech Nieroda<br>
<br>
--<br>
Dipl.-Wirt.-Inf. Lech Nieroda<br>
Regionales Rechenzentrum der Universität zu Köln (RRZK)<br>
Universität zu Köln<br>
Weyertal 121<br>
Raum 309 (3. Etage)<br>
D-50931 Köln<br>
Deutschland<br>
<br>
Tel.: <a href="tel:%2B49%20%28221%29%20470-89606" value="+4922147089606">+49 (221) 470-89606</a><br>
E-Mail: <a href="mailto:nieroda.lech@uni-koeln.de">nieroda.lech@uni-koeln.de</a><br>
_______________________________________________<br>
torqueusers mailing list<br>
<a href="mailto:torqueusers@supercluster.org">torqueusers@supercluster.org</a><br>
<a href="http://www.supercluster.org/mailman/listinfo/torqueusers" target="_blank">http://www.supercluster.org/mailman/listinfo/torqueusers</a><br>
</blockquote></div><br><br clear="all"><div><br></div>-- <br><div>David Beer | Senior Software Engineer</div><div>Adaptive Computing</div><br>
</div>
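Until 4.1.4 is available, the cron workaround described in the quoted message could be sketched roughly as follows. This is only an illustration: the "momctl -h host -c jobid" invocation is taken from the message above, while the function names, the dry-run default, and the assumption that the stray (host, jobid) pairs have already been identified by some other means are hypothetical and not part of Torque.

```python
import subprocess


def build_clear_command(host, jobid):
    """Return the argv for the "momctl -h <host> -c <jobid>" call quoted above."""
    return ["momctl", "-h", host, "-c", jobid]


def clear_stray_jobs(stray_jobs, dry_run=True):
    """Build (and optionally run) one clear command per stray job.

    stray_jobs is an iterable of (host, jobid) pairs. How those pairs are
    found is site-specific and deliberately left out of this sketch.
    """
    commands = [build_clear_command(host, jobid) for host, jobid in stray_jobs]
    if not dry_run:
        for argv in commands:
            # check=False: one failing node should not abort the whole run.
            subprocess.run(argv, check=False)
    return commands
```

With dry_run left at its default, the function only returns the commands it would run, which makes it easy to review the list before letting cron execute it with dry_run=False.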