I can reproduce the error message with the following steps:<br><b id="internal-source-marker_0.7296439143829048" style="color:rgb(0,0,0);font-family:&#39;Times New Roman&#39;;font-size:medium;font-style:normal;font-variant:normal;letter-spacing:normal;line-height:normal;text-align:start;text-indent:0px;text-transform:none;white-space:normal;word-spacing:0px;font-weight:normal"><ol style="margin-top:0pt;margin-bottom:0pt">

<li style="list-style-type:decimal;font-size:15px;font-family:Consolas;color:rgb(0,0,0);background-color:transparent;font-weight:normal;font-style:normal;font-variant:normal;text-decoration:none;vertical-align:baseline">
<span style="font-size:15px;font-family:Consolas;color:rgb(0,0,0);background-color:transparent;font-weight:normal;font-style:normal;font-variant:normal;text-decoration:none;vertical-align:baseline;white-space:pre-wrap">qsub a.sh → job id 30</span></li>

<li style="list-style-type:decimal;font-size:15px;font-family:Consolas;color:rgb(0,0,0);background-color:transparent;font-weight:normal;font-style:normal;font-variant:normal;text-decoration:none;vertical-align:baseline">
<span style="font-size:15px;font-family:Consolas;color:rgb(0,0,0);background-color:transparent;font-weight:normal;font-style:normal;font-variant:normal;text-decoration:none;vertical-align:baseline;white-space:pre-wrap">qsub -W depend=afterok:30 b.sh → job id 31</span></li>

<li style="list-style-type:decimal;font-size:15px;font-family:Consolas;color:rgb(0,0,0);background-color:transparent;font-weight:normal;font-style:normal;font-variant:normal;text-decoration:none;vertical-align:baseline">
<span style="font-size:15px;font-family:Consolas;color:rgb(0,0,0);background-color:transparent;font-weight:normal;font-style:normal;font-variant:normal;text-decoration:none;vertical-align:baseline;white-space:pre-wrap">/etc/init.d/pbs_mom stop</span></li>

<li style="list-style-type:decimal;font-size:15px;font-family:Consolas;color:rgb(0,0,0);background-color:transparent;font-weight:normal;font-style:normal;font-variant:normal;text-decoration:none;vertical-align:baseline">
<span style="font-size:15px;font-family:Consolas;color:rgb(0,0,0);background-color:transparent;font-weight:normal;font-style:normal;font-variant:normal;text-decoration:none;vertical-align:baseline;white-space:pre-wrap">rm /var/spool/torque/server_priv/jobs/30.mgmt.chess.*</span></li>

<li style="list-style-type:decimal;font-size:15px;font-family:Consolas;color:rgb(0,0,0);background-color:transparent;font-weight:normal;font-style:normal;font-variant:normal;text-decoration:none;vertical-align:baseline">
<span style="font-size:15px;font-family:Consolas;color:rgb(0,0,0);background-color:transparent;font-weight:normal;font-style:normal;font-variant:normal;text-decoration:none;vertical-align:baseline;white-space:pre-wrap">/etc/init.d/pbs_mom start</span></li>

<li style="list-style-type:decimal;font-size:15px;font-family:Consolas;color:rgb(0,0,0);background-color:transparent;font-weight:normal;font-style:normal;font-variant:normal;text-decoration:none;vertical-align:baseline">
<span style="font-size:15px;font-family:Consolas;color:rgb(0,0,0);background-color:transparent;font-weight:normal;font-style:normal;font-variant:normal;text-decoration:none;vertical-align:baseline;white-space:pre-wrap">Job 30 will stay queued forever, and job 31 will stay held.</span></li>

<li style="list-style-type:decimal;font-size:15px;font-family:Consolas;color:rgb(0,0,0);background-color:transparent;font-weight:normal;font-style:normal;font-variant:normal;text-decoration:none;vertical-align:baseline">
<span style="font-size:15px;font-family:Consolas;color:rgb(0,0,0);background-color:transparent;font-weight:normal;font-style:normal;font-variant:normal;text-decoration:none;vertical-align:baseline;white-space:pre-wrap">You will be able to find this error message in the log: &quot;PBS_Server;Req;req_reject;Reject reply code=15001(Unknown Job Id Error), aux=0, type=RegisterDependency, &quot;<br>

</span></li></ol></b><br class="Apple-interchange-newline">See the attachment for the detailed log.<br><br><br><div class="gmail_quote">On 1 November 2012 17:25, Clotho Tsang <span dir="ltr">&lt;<a href="mailto:wytsang@clustertech.com" target="_blank">wytsang@clustertech.com</a>&gt;</span> wrote:<br>

<blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex">I also see &quot;Unknown Job Id Error&quot; occasionally when I submit jobs with dependencies.<br>Every time, the case is related to a dependency, but I have not been able to figure out how<br>

to reproduce it.<div class="HOEnZb"><div class="h5"><br><br><div class="gmail_quote">
On 30 September 2012 18:36, Rhys Hill <span dir="ltr">&lt;<a href="mailto:rhys.hill@adelaide.edu.au" target="_blank">rhys.hill@adelaide.edu.au</a>&gt;</span> wrote:<br><blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex">


Hi everyone,<br>
<br>
I have a particular job that I run regularly as part of a development project. On<br>
the occasions where torque gets stuck, this particular job is always lost when the<br>
daemon is restarted, even though all the other jobs seem to return OK. I always get<br>
a message along these lines:<br>
<br>
Unable to requeue job, queue is not defined; job XXX queue batch<br>
<br>
where the qstat -q says:<br>
<br>
server: XXX<br>
<br>
Queue            Memory CPU Time Walltime Node  Run Que Lm  State<br>
---------------- ------ -------- -------- ----  --- --- --  -----<br>
large              --      --    24:00:00   --    0   0 --   E R<br>
long_running       --      --       --      --    0   0 --   E R<br>
image_search       --      --       --      --    0   0 --   E R<br>
batch              --      --    48:00:00   --  122   7 --   E R<br>
                                               ----- -----<br>
                                                 122     7<br>
<br>
so obviously the queue is actually there. I submit the jobs using a script like this:<br>
<br>
---<br>
<br>
#!/bin/sh<br>
DS_JOB=`qsub -l walltime=24:00:00 -l nodes=1:type2:ppn=16 -l vmem=18G ./data_statistics.sh`<br>
<br>
JOBS=`ls */job.sh`<br>
DEPS=;<br>
for j in ${JOBS}; do<br>
        JOB_ID=`qsub -l walltime=24:00:00 -l nodes=1:type2:ppn=16 -W depend=afterok:${DS_JOB} -l vmem=18G $j`<br>
        if [ &quot;${DEPS}x&quot; = &quot;x&quot; ]; then<br>
                DEPS=&quot;afterok:${JOB_ID}&quot;<br>
        else<br>
                DEPS=&quot;${DEPS},afterok:${JOB_ID}&quot;<br>
        fi<br>
done<br>
qsub -l walltime=24:00:00 -l nodes=1:type2:ppn=16 -l vmem=18G -W depend=${DEPS} ./run_report.sh<br>
<br>
---<br>
<br>
i.e., the data_statistics.sh job runs first, followed by several instances of job.sh, then run_report.sh.<br>
<br>
The server log looks like this in total:<br>
<br>
09/30/2012 19:02:55;0100;PBS_Server;Job;6614.XXX;enqueuing into batch, state 4 hop 1<br>
09/30/2012 19:02:55;0080;PBS_Server;Job;6617.XXX;Unknown Job Id Error<br>
09/30/2012 19:02:55;0080;PBS_Server;Req;req_reject;Reject reply code=15001(Unknown Job Id Error), aux=0, type=RegisterDependency, from @XXX<br>
09/30/2012 19:02:55;0001;PBS_Server;Svr;PBS_Server;LOG_ERROR::reply_send_svr, did not find work task for local request<br>
09/30/2012 19:02:55;0001;PBS_Server;Svr;PBS_Server;LOG_ERROR::pbsd_init_reque, Unable to requeue job, queue is not defined; job 6614.XXX queue batch<br>
09/30/2012 19:02:55;0001;PBS_Server;Req;;Server could not connect to MOM<br>
09/30/2012 19:02:55;0001;PBS_Server;Svr;PBS_Server;LOG_ERROR::job_abt, Unable to abort Job 6614.XXX which was in substate 42<br>
09/30/2012 19:02:55;0080;PBS_Server;Job;6617.XXX;Unknown Job Id Error<br>
09/30/2012 19:02:55;0080;PBS_Server;Req;req_reject;Reject reply code=15001(Unknown Job Id Error), aux=0, type=RegisterDependency, from @XXX<br>
09/30/2012 19:02:55;0001;PBS_Server;Svr;PBS_Server;LOG_ERROR::reply_send_svr, did not find work task for local request<br>
09/30/2012 19:02:55;0100;PBS_Server;Job;6614.XXX;dequeuing from batch, state RUNNING<br>
09/30/2012 19:02:55;0100;PBS_Server;Job;6615.XXX;enqueuing into batch, state 1 hop 1<br>
09/30/2012 19:02:55;0080;PBS_Server;Job;6617.XXX;Unknown Job Id Error<br>
09/30/2012 19:02:55;0080;PBS_Server;Req;req_reject;Reject reply code=15001(Unknown Job Id Error), aux=0, type=RegisterDependency, from @XXX<br>
09/30/2012 19:02:55;0001;PBS_Server;Svr;PBS_Server;LOG_ERROR::reply_send_svr, did not find work task for local request<br>
09/30/2012 19:02:55;0001;PBS_Server;Svr;PBS_Server;LOG_ERROR::pbsd_init_reque, Unable to requeue job, queue is not defined; job 6615.XXX queue batch<br>
09/30/2012 19:02:55;0080;PBS_Server;Job;6617.XXX;Unknown Job Id Error<br>
09/30/2012 19:02:55;0080;PBS_Server;Req;req_reject;Reject reply code=15001(Unknown Job Id Error), aux=0, type=RegisterDependency, from @XXX<br>
09/30/2012 19:02:55;0001;PBS_Server;Svr;PBS_Server;LOG_ERROR::reply_send_svr, did not find work task for local request<br>
09/30/2012 19:02:55;0100;PBS_Server;Job;6615.XXX;dequeuing from batch, state EXITING<br>
09/30/2012 19:02:55;0100;PBS_Server;Job;6616.XXX;enqueuing into batch, state 1 hop 1<br>
09/30/2012 19:02:55;0080;PBS_Server;Job;6617.XXX;Unknown Job Id Error<br>
09/30/2012 19:02:55;0080;PBS_Server;Req;req_reject;Reject reply code=15001(Unknown Job Id Error), aux=0, type=RegisterDependency, from @XXX<br>
09/30/2012 19:02:55;0001;PBS_Server;Svr;PBS_Server;LOG_ERROR::reply_send_svr, did not find work task for local request<br>
09/30/2012 19:02:55;0001;PBS_Server;Svr;PBS_Server;LOG_ERROR::pbsd_init_reque, Unable to requeue job, queue is not defined; job 6616.XXX queue batch<br>
09/30/2012 19:02:55;0080;PBS_Server;Job;6617.XXX;Unknown Job Id Error<br>
09/30/2012 19:02:55;0080;PBS_Server;Req;req_reject;Reject reply code=15001(Unknown Job Id Error), aux=0, type=RegisterDependency, from @XXX<br>
09/30/2012 19:02:55;0001;PBS_Server;Svr;PBS_Server;LOG_ERROR::reply_send_svr, did not find work task for local request<br>
09/30/2012 19:02:55;0100;PBS_Server;Job;6616.XXX;dequeuing from batch, state EXITING<br>
09/30/2012 19:02:55;0100;PBS_Server;Job;6617.XXX;enqueuing into batch, state 2 hop 1<br>
09/30/2012 19:02:55;0080;PBS_Server;Job;6614.XXX;Unknown Job Id Error<br>
09/30/2012 19:02:55;0080;PBS_Server;Req;req_reject;Reject reply code=15001(Unknown Job Id Error), aux=0, type=RegisterDependency, from @XXX<br>
09/30/2012 19:02:55;0001;PBS_Server;Svr;PBS_Server;LOG_ERROR::reply_send_svr, did not find work task for local request<br>
09/30/2012 19:02:55;0001;PBS_Server;Svr;PBS_Server;LOG_ERROR::pbsd_init_reque, Unable to requeue job, queue is not defined; job 6617.XXX queue batch<br>
09/30/2012 19:02:55;0100;PBS_Server;Job;6617.XXX;dequeuing from batch, state EXITING<br>
<br>
We&#39;re using Moab for scheduling, if that makes any difference.<br>
<br>
Any ideas?<br>
<br>
Cheers,<br>
<br>
--------------------------------------------------------------------------------<br>
Rhys Hill,                                             Senior Research Associate<br>
Australian Centre for Visual Technologies                 University of Adelaide<br>
<br>
Phone: <a href="tel:%2B61%208%208313%206197" value="+61883136197" target="_blank">+61 8 8313 6197</a>                           Mail:<br>
Fax:   <a href="tel:%2B61%208%208313%204366" value="+61883134366" target="_blank">+61 8 8313 4366</a>                           School of Computer Science<br>
                                                 University of Adelaide<br>
                                                 Adelaide, Australia<br>
<a href="http://www.cs.adelaide.edu.au/%7Erhys/" target="_blank">http://www.cs.adelaide.edu.au/~rhys/</a>             5005<br>
--------------------------------------------------------------------------------<br>
<br>
_______________________________________________<br>
torqueusers mailing list<br>
<a href="mailto:torqueusers@supercluster.org" target="_blank">torqueusers@supercluster.org</a><br>
<a href="http://www.supercluster.org/mailman/listinfo/torqueusers" target="_blank">http://www.supercluster.org/mailman/listinfo/torqueusers</a><br>
</blockquote></div><br><br clear="all"><br></div></div><span class="HOEnZb"><font color="#888888">-- <br>Clotho Tsang<br>Senior Software Engineer<br>Cluster Technology Limited<br>Email: <a href="mailto:clotho@clustertech.com" target="_blank">clotho@clustertech.com</a><br>

Tel: <a href="tel:%28852%29%202655-6129" value="+85226556129" target="_blank">(852) 2655-6129</a><br>
Fax: <a href="tel:%28852%29%202994-2101" value="+85229942101" target="_blank">(852) 2994-2101</a><br>Website: <a href="http://www.clustertech.com" target="_blank">www.clustertech.com</a><br><br>
</font></span></blockquote></div><br><br clear="all"><br>-- <br>Clotho Tsang<br>Senior Software Engineer<br>Cluster Technology Limited<br>Email: <a href="mailto:clotho@clustertech.com" target="_blank">clotho@clustertech.com</a><br>

Tel: (852) 2655-6129<br>Fax: (852) 2994-2101<br>Website: <a href="http://www.clustertech.com" target="_blank">www.clustertech.com</a><br><br>
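For convenience, the reproduction steps at the top of this message could be sketched as a small shell script. This is only a sketch under assumptions: the function names (run, depend_arg, reproduce) and the REPRO_RUN switch are made up for illustration, the paths and the server name mgmt.chess are copied from the steps and will differ on other sites, and the parent job is assumed to have been submitted already with "qsub a.sh". By default it only prints the commands; nothing is executed unless REPRO_RUN=1 is set.<br>

```shell
#!/bin/sh
# Dry-run sketch of the reproduction steps. Commands are printed, and only
# executed when REPRO_RUN=1 (destructive: that removes server job files).

# Assumed defaults, copied from the steps in this message; adjust per site.
JOBS_DIR="${JOBS_DIR:-/var/spool/torque/server_priv/jobs}"
MOM_INIT="${MOM_INIT:-/etc/init.d/pbs_mom}"

# Print a command; execute it only when REPRO_RUN=1.
run() {
    printf '+ %s\n' "$*"
    if [ "${REPRO_RUN:-0}" = "1" ]; then
        "$@"
    fi
}

# Build the -W value for a job that waits for job $1 to finish successfully.
depend_arg() {
    printf 'depend=afterok:%s\n' "$1"
}

# Steps 2-5 for an already submitted parent job.
#   $1 = numeric job id of the parent (e.g. 30)
#   $2 = server name used in the job file names (e.g. mgmt.chess)
reproduce() {
    parent_id="$1"
    server="$2"
    run qsub -W "$(depend_arg "$parent_id")" b.sh
    run "$MOM_INIT" stop
    # The glob stays literal in the printed command when no file matches.
    run rm "$JOBS_DIR/$parent_id.$server".*
    run "$MOM_INIT" start
}

reproduce 30 mgmt.chess
```

After the last step, qstat should show the parent job still queued and the dependent job held, and the server log should contain the RegisterDependency reject message quoted above.<br>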