<br><font size=2 face="sans-serif">I am running a job on two machines, etlpoc4
and etlpoc3.</font>
<br>
<br><font size=2 face="sans-serif">When I run the job as a user that exists
in LDAP, it first fails to execute and then gets stuck when it fails to clean
up. When run as a local user, the job runs fine.</font>
<br><font size=2 face="sans-serif">The state of the job switches between
running and queued. </font>
<br>
<br><font size=2 face="sans-serif"><b>This is the state of the job:</b></font>
<br>
<br><font size=2 face="sans-serif">Job Id: 66.etlpoc4</font>
<br><font size=2 face="sans-serif">&nbsp; &nbsp; Job_Name = dummy_sort.4035</font>
<br><font size=2 face="sans-serif">&nbsp; &nbsp; Job_Owner = jberlin@etlpoc4</font>
<br><font size=2 face="sans-serif">&nbsp; &nbsp; job_state = R</font>
<br><font size=2 face="sans-serif">&nbsp; &nbsp; queue = batch</font>
<br><font size=2 face="sans-serif">&nbsp; &nbsp; server = etlpoc4</font>
<br><font size=2 face="sans-serif">&nbsp; &nbsp; Checkpoint = u</font>
<br><font size=2 face="sans-serif">&nbsp; &nbsp; ctime = Fri Feb 10 16:36:00
2006</font>
<br><font size=2 face="sans-serif">&nbsp; &nbsp; Error_Path = etlpoc4:/sandbox/jberlin/scratch/run/dummy_sort.4035.e66</font>
<br><font size=2 face="sans-serif">&nbsp; &nbsp; exec_host = etlpoc3/0+etlpoc4/0</font>
<br><font size=2 face="sans-serif">&nbsp; &nbsp; Hold_Types = n</font>
<br><font size=2 face="sans-serif">&nbsp; &nbsp; Join_Path = n</font>
<br><font size=2 face="sans-serif">&nbsp; &nbsp; Keep_Files = n</font>
<br><font size=2 face="sans-serif">&nbsp; &nbsp; Mail_Points = a</font>
<br><font size=2 face="sans-serif">&nbsp; &nbsp; mtime = Mon Feb 13 09:44:37
2006</font>
<br><font size=2 face="sans-serif">&nbsp; &nbsp; Output_Path = etlpoc4:/sandbox/jberlin/scratch/run/dummy_sort.4035.o66</font>
<br><font size=2 face="sans-serif">&nbsp; &nbsp; Priority = 0</font>
<br><font size=2 face="sans-serif">&nbsp; &nbsp; qtime = Fri Feb 10 16:36:00
2006</font>
<br><font size=2 face="sans-serif">&nbsp; &nbsp; Rerunable = True</font>
<br><font size=2 face="sans-serif">&nbsp; &nbsp; Resource_List.neednodes
= 2</font>
<br><font size=2 face="sans-serif">&nbsp; &nbsp; Resource_List.nodect =
2</font>
<br><font size=2 face="sans-serif">&nbsp; &nbsp; Resource_List.nodes =
2</font>
<br><font size=2 face="sans-serif">&nbsp; &nbsp; Resource_List.walltime
= 01:00:00</font>
<br><font size=2 face="sans-serif">&nbsp; &nbsp; Shell_Path_List = /bin/ksh</font>
<br><font size=2 face="sans-serif">&nbsp; &nbsp; substate = 40</font>
<br><font size=2 face="sans-serif">&nbsp; &nbsp; Variable_List = PBS_O_HOME=/home/jberlin,PBS_O_LANG=en_US.UTF-8,</font>
<br><font size=2 face="sans-serif">&nbsp; &nbsp; &nbsp; &nbsp; PBS_O_LOGNAME=jberlin,</font>
<br><font size=2 face="sans-serif">&nbsp; &nbsp; &nbsp; &nbsp; PBS_O_PATH=/prod/software/bin:/usr/local/bin:/opt/syncsort/bin:/opt/SU</font>
<br><font size=2 face="sans-serif">&nbsp; &nbsp; &nbsp; &nbsp; NWspro/bin:/tools/bin:/bin:/usr/bin:/usr/ucb:/usr/ccs/bin:/etc:/usr/etc</font>
<br><font size=2 face="sans-serif">&nbsp; &nbsp; &nbsp; &nbsp; :/usr/bin/X11:/bin:.:/usr/kerberos/bin:/usr/local/bi</font>
<br><font size=2 face="sans-serif">&nbsp; &nbsp; &nbsp; &nbsp; n:/bin:/usr/bin:/usr/X11R6/bin:/u01/app/oracle/product/10.1.0.3:/u01/app/oracle/pr</font>
<br><font size=2 face="sans-serif">&nbsp; &nbsp; &nbsp; &nbsp; oduct/10.1.0.3/bin:/u01/app/oracle/product/10.1.0.3/lib,</font>
<br><font size=2 face="sans-serif">&nbsp; &nbsp; &nbsp; &nbsp; PBS_O_MAIL=/var/spool/mail/jberlin,PBS_O_SHELL=/bin/ksh,</font>
<br><font size=2 face="sans-serif">&nbsp; &nbsp; &nbsp; &nbsp; PBS_O_HOST=etlpoc4,PBS_O_WORKDIR=/sandbox/jberlin/scratch/run,</font>
<br><font size=2 face="sans-serif">&nbsp; &nbsp; &nbsp; &nbsp; PBS_O_QUEUE=batch</font>
<br><font size=2 face="sans-serif">&nbsp; &nbsp; euser = jberlin</font>
<br><font size=2 face="sans-serif">&nbsp; &nbsp; egroup = 107</font>
<br><font size=2 face="sans-serif">&nbsp; &nbsp; hashname = 66.etlpoc4</font>
<br><font size=2 face="sans-serif">&nbsp; &nbsp; queue_rank = 38</font>
<br><font size=2 face="sans-serif">&nbsp; &nbsp; queue_type = E</font>
<br><font size=2 face="sans-serif">&nbsp; &nbsp; comment = Job started
on Mon Feb 13 at 09:44</font>
<br><font size=2 face="sans-serif">&nbsp; &nbsp; etime = Fri Feb 10 16:36:00
2006</font>
<br><font size=2 face="sans-serif">&nbsp; &nbsp; exit_status = -3</font>
<br>
<br><font size=2 face="sans-serif"><b>The server is stuck at:</b></font>
<br>
<br><font size=2 face="sans-serif">02/13/2006 09:51:45;0080;PBS_Server;Req;req_reject;Reject
reply code=15016(Request invalid for state of job), aux=0, type=JobObituary,
from pbs_mom@etlpoc3</font>
<br><font size=2 face="sans-serif">02/13/2006 09:51:45;0100;PBS_Server;Req;;Type
StatusQueue request received from Scheduler@etlpoc4, sock=13</font>
<br><font size=2 face="sans-serif">02/13/2006 09:51:45;0100;PBS_Server;Req;;Type
SelStat request received from Scheduler@etlpoc4, sock=13</font>
<br><font size=2 face="sans-serif">02/13/2006 09:51:45;0100;PBS_Server;Req;;Type
ResourceQuery request received from Scheduler@etlpoc4, sock=13</font>
<br><font size=2 face="sans-serif">02/13/2006 09:51:45;0100;PBS_Server;Req;;Type
RunJob request received from Scheduler@etlpoc4, sock=13</font>
<br><font size=2 face="sans-serif">02/13/2006 09:51:45;0008;PBS_Server;Job;66.etlpoc4;Job
Run at request of Scheduler@etlpoc4</font>
<br><font size=2 face="sans-serif">02/13/2006 09:51:45;0040;PBS_Server;Svr;etlpoc4;Scheduler
sent command recyc</font>
<br><font size=2 face="sans-serif">02/13/2006 09:51:45;0100;PBS_Server;Req;;Type
JobObituary request received from pbs_mom@etlpoc3, sock=10</font>
<br><font size=2 face="sans-serif">02/13/2006 09:51:45;0040;PBS_Server;Svr;etlpoc4;Scheduler
sent command new</font>
<br><font size=2 face="sans-serif">02/13/2006 09:51:45;0100;PBS_Server;Req;;Type
StatusServer request received from Scheduler@etlpoc4, sock=13</font>
<br><font size=2 face="sans-serif">02/13/2006 09:51:45;0100;PBS_Server;Req;;Type
StatusNode request received from Scheduler@etlpoc4, sock=13</font>
<br><font size=2 face="sans-serif">02/13/2006 09:51:45;0100;PBS_Server;Req;;Type
JobObituary request received from pbs_mom@etlpoc3, sock=10</font>
<br><font size=2 face="sans-serif">02/13/2006 09:51:45;0009;PBS_Server;Job;66.etlpoc4;obit
received for job 66.etlpoc4 from host etlpoc3 with bad state (state: QUEUED)</font>
<br><font size=2 face="sans-serif">02/13/2006 09:51:45;0080;PBS_Server;Req;req_reject;Reject
reply code=15016(Request invalid for state of job), aux=0, type=JobObituary,
from pbs_mom@etlpoc3</font>
<br><font size=2 face="sans-serif">02/13/2006 09:51:45;0100;PBS_Server;Req;;Type
StatusQueue request received from Scheduler@etlpoc4, sock=13</font>
<br><font size=2 face="sans-serif">02/13/2006 09:51:45;0100;PBS_Server;Req;;Type
SelStat request received from Scheduler@etlpoc4, sock=13</font>
<br><font size=2 face="sans-serif">02/13/2006 09:51:45;0100;PBS_Server;Req;;Type
ResourceQuery request received from Scheduler@etlpoc4, sock=13</font>
<br><font size=2 face="sans-serif">02/13/2006 09:51:45;0100;PBS_Server;Req;;Type
RunJob request received from Scheduler@etlpoc4, sock=13</font>
<br><font size=2 face="sans-serif">02/13/2006 09:51:45;0008;PBS_Server;Job;66.etlpoc4;Job
Run at request of Scheduler@etlpoc4</font>
<br><font size=2 face="sans-serif">02/13/2006 09:51:45;0040;PBS_Server;Svr;etlpoc4;Scheduler
sent command recyc</font>
<br><font size=2 face="sans-serif">02/13/2006 09:51:45;0100;PBS_Server;Req;;Type
JobObituary request received from pbs_mom@etlpoc3, sock=10</font>
<br><font size=2 face="sans-serif">02/13/2006 09:51:45;0040;PBS_Server;Svr;etlpoc4;Scheduler
sent command new</font>
<br><font size=2 face="sans-serif">02/13/2006 09:51:45;0100;PBS_Server;Req;;Type
StatusServer request received from Scheduler@etlpoc4, sock=13</font>
<br><font size=2 face="sans-serif">02/13/2006 09:51:45;0100;PBS_Server;Req;;Type
StatusNode request received from Scheduler@etlpoc4, sock=13</font>
<br><font size=2 face="sans-serif">02/13/2006 09:51:45;0100;PBS_Server;Req;;Type
JobObituary request received from pbs_mom@etlpoc3, sock=10</font>
<br><font size=2 face="sans-serif">02/13/2006 09:51:45;0009;PBS_Server;Job;66.etlpoc4;obit
received for job 66.etlpoc4 from host etlpoc3 with bad state (state: QUEUED)</font>
<br><font size=2 face="sans-serif">02/13/2006 09:51:45;0080;PBS_Server;Req;req_reject;Reject
reply code=15016(Request invalid for state of job), aux=0, type=JobObituary,
from pbs_mom@etlpoc3</font>
<br>
<br><font size=2 face="sans-serif"><b>At the same time the mom_log on etlpoc3
keeps repeating:</b></font>
<br>
<br><font size=2 face="sans-serif">02/13/2006 09:48:44;0100; &nbsp; pbs_mom;Req;;Type
Commit request received from PBS_Server@etlpoc4, sock=10</font>
<br><font size=2 face="sans-serif">02/13/2006 09:48:44;0100; &nbsp; pbs_mom;Req;;Type
StatusJob request received from PBS_Server@etlpoc4, sock=10</font>
<br><font size=2 face="sans-serif">02/13/2006 09:48:44;0001; &nbsp; pbs_mom;Svr;pbs_mom;Bad
UID for job execution (15023) in 66.etlpoc4, job_start_error from node
172.21.148.216:15003 in job_start_error</font>
<br><font size=2 face="sans-serif">02/13/2006 09:48:44;0001; &nbsp; pbs_mom;Svr;pbs_mom;Bad
UID for job execution (15023) in 66.etlpoc4, abort attempted 16 times in
job_start_error. &nbsp;ignoring abort request from node 172.21.148.216:15003</font>
<br><font size=2 face="sans-serif">02/13/2006 09:48:44;0008; &nbsp; pbs_mom;Req;send_sisters;sending
ABORT to sisters</font>
<br><font size=2 face="sans-serif">02/13/2006 09:48:44;0001; &nbsp; pbs_mom;Req;obit
reply;Job not found for obit reply</font>
<br><font size=2 face="sans-serif">02/13/2006 09:48:44;0001; &nbsp; pbs_mom;Job;66.etlpoc4;server
rejected job obit - unexpected job state</font>
<br><font size=2 face="sans-serif">02/13/2006 09:48:44;0100; &nbsp; pbs_mom;Req;;Type
DeleteJob request received from PBS_Server@etlpoc4, sock=13</font>
<br><font size=2 face="sans-serif">02/13/2006 09:48:44;0080; &nbsp; pbs_mom;Req;req_reject;Reject
reply code=15001(Unknown Job Id REJHOST=etlpoc3 MSG=cannot locate job to
delete), aux=0, type=DeleteJob, from PBS_Server@etlpoc4</font>
<br><font size=2 face="sans-serif">02/13/2006 09:48:44;0100; &nbsp; pbs_mom;Req;;Type
QueueJob request received from PBS_Server@etlpoc4, sock=10</font>
<br><font size=2 face="sans-serif">02/13/2006 09:48:44;0100; &nbsp; pbs_mom;Req;;Type
JobScript request received from PBS_Server@etlpoc4, sock=10</font>
<br><font size=2 face="sans-serif">02/13/2006 09:48:44;0100; &nbsp; pbs_mom;Req;;Type
ReadyToCommit request received from PBS_Server@etlpoc4, sock=10</font>
<br><font size=2 face="sans-serif">02/13/2006 09:48:44;0100; &nbsp; pbs_mom;Req;;Type
Commit request received from PBS_Server@etlpoc4, sock=10</font>
<br><font size=2 face="sans-serif">02/13/2006 09:48:44;0100; &nbsp; pbs_mom;Req;;Type
StatusJob request received from PBS_Server@etlpoc4, sock=10</font>
<br><font size=2 face="sans-serif">02/13/2006 09:48:44;0001; &nbsp; pbs_mom;Svr;pbs_mom;Bad
UID for job execution (15023) in 66.etlpoc4, job_start_error from node
172.21.148.216:15003 in job_start_error</font>
<br><font size=2 face="sans-serif">02/13/2006 09:48:44;0001; &nbsp; pbs_mom;Svr;pbs_mom;Bad
UID for job execution (15023) in 66.etlpoc4, abort attempted 16 times in
job_start_error. &nbsp;ignoring abort request from node 172.21.148.216:15003</font>
<br><font size=2 face="sans-serif">02/13/2006 09:48:44;0008; &nbsp; pbs_mom;Req;send_sisters;sending
ABORT to sisters</font>
<br><font size=2 face="sans-serif">02/13/2006 09:48:44;0001; &nbsp; pbs_mom;Req;obit
reply;Job not found for obit reply</font>
<br><font size=2 face="sans-serif">02/13/2006 09:48:44;0001; &nbsp; pbs_mom;Job;66.etlpoc4;server
rejected job obit - unexpected job state</font>
<br><font size=2 face="sans-serif">02/13/2006 09:48:44;0100; &nbsp; pbs_mom;Req;;Type
DeleteJob request received from PBS_Server@etlpoc4, sock=13</font>
<br><font size=2 face="sans-serif">02/13/2006 09:48:44;0080; &nbsp; pbs_mom;Req;req_reject;Reject
reply code=15001(Unknown Job Id REJHOST=etlpoc3 MSG=cannot locate job to
delete), aux=0, type=DeleteJob, from PBS_Server@etlpoc4</font>
<br><font size=2 face="sans-serif">02/13/2006 09:48:44;0100; &nbsp; pbs_mom;Req;;Type
QueueJob request received from PBS_Server@etlpoc4, sock=10</font>
<br><font size=2 face="sans-serif">02/13/2006 09:48:44;0100; &nbsp; pbs_mom;Req;;Type
JobScript request received from PBS_Server@etlpoc4, sock=10</font>
<br><font size=2 face="sans-serif">02/13/2006 09:48:44;0100; &nbsp; pbs_mom;Req;;Type
ReadyToCommit request received from PBS_Server@etlpoc4, sock=10</font>
<br><font size=2 face="sans-serif">02/13/2006 09:48:44;0100; &nbsp; pbs_mom;Req;;Type
Commit request received from PBS_Server@etlpoc4, sock=10</font>
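<br>
<br><font size=2 face="sans-serif">Because both logs repeat the same cycle, a tally of distinct messages may be easier to read than the raw output. Below is a minimal sketch, assuming each record sits on a single line in the actual log file (the lines above are wrapped by the mail client) and that records have six semicolon-separated fields. The field names and the function names are my own labels, not anything defined by PBS.</font>

```python
from collections import Counter


def parse_pbs_log_line(line):
    """Split one log record into six semicolon-separated fields.

    The field names below are illustrative labels only.
    Returns None when the line does not have six fields.
    """
    # Split at most 5 times so semicolons inside the message are preserved.
    parts = line.strip().split(";", 5)
    if len(parts) != 6:
        return None
    keys = ("timestamp", "event_code", "daemon", "event_class", "name", "message")
    return dict(zip(keys, (p.strip() for p in parts)))


def tally_messages(lines):
    """Count how often each (daemon, message) pair occurs."""
    counts = Counter()
    for line in lines:
        record = parse_pbs_log_line(line)
        if record is not None:
            counts[(record["daemon"], record["message"])] += 1
    return counts


if __name__ == "__main__":
    import sys

    # Usage: python tally_log.py < mom_log_file
    for (daemon, message), n in tally_messages(sys.stdin).most_common():
        print(f"{n:6d}  {daemon}: {message}")
```

<font size=2 face="sans-serif">Sorting by count puts the messages that make up the repeating cycle at the top, so one-off events before the loop started stand out at the bottom.</font>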
<br>
<br>
<br><font size=2 face="sans-serif">Any ideas on how to diagnose this would be
appreciated.</font>
<br>
<br><font size=2 face="sans-serif">Thanks,</font>
<br>
<br><font size=2 face="sans-serif">Jonas</font>
<br>