Hello,<br><br>I have a problem with one node. pbs_mom crashes every time a job is run from that node (not physically, Torque just decides to run it there).<br>There is also a problem when a process has to run on that node. I found out that this machine causes the job to stay in the 'running' state.<br>
<br>I searched the mom_logs and I'm curious about this line:<br> invalid home directory '/bin/sh' specified, not a directory <br>What does it mean? <br>I have a working Torque/Maui environment with NFS enabled. I'm running the compiled program that you can find in my post to the list<br>
with the subject "Torque with Open MPI". Every node has been configured the same way, so I don't understand why this is happening.<br><br>Thank you for any reply.<br>Jozef<br><br>02/26/2008 16:27:38;0100; pbs_mom;Req;;Type QueueJob request received from <a href="mailto:PBS_Server@f135-3.informatika.fpv.umb.sk">PBS_Server@f135-3.informatika.fpv.umb.sk</a>, sock=10<br>
02/26/2008 16:27:38;0100; pbs_mom;Req;;Type JobScript request received from <a href="mailto:PBS_Server@f135-3.informatika.fpv.umb.sk">PBS_Server@f135-3.informatika.fpv.umb.sk</a>, sock=10<br>02/26/2008 16:27:38;0100; pbs_mom;Req;;Type ReadyToCommit request received from <a href="mailto:PBS_Server@f135-3.informatika.fpv.umb.sk">PBS_Server@f135-3.informatika.fpv.umb.sk</a>, sock=10<br>
02/26/2008 16:27:38;0100; pbs_mom;Req;;Type Commit request received from <a href="mailto:PBS_Server@f135-3.informatika.fpv.umb.sk">PBS_Server@f135-3.informatika.fpv.umb.sk</a>, sock=10<br>02/26/2008 16:27:38;0100; pbs_mom;Req;;Type StatusJob request received from <a href="mailto:PBS_Server@f135-3.informatika.fpv.umb.sk">PBS_Server@f135-3.informatika.fpv.umb.sk</a>, sock=10<br>
02/26/2008 16:27:38;0100; pbs_mom;Req;;Type ModifyJob request received from <a href="mailto:PBS_Server@f135-3.informatika.fpv.umb.sk">PBS_Server@f135-3.informatika.fpv.umb.sk</a>, sock=14<br>02/26/2008 16:27:38;0008; pbs_mom;Job;164.f135-3;Job Modified at request of <a href="mailto:PBS_Server@f135-3.informatika.fpv.umb.sk">PBS_Server@f135-3.informatika.fpv.umb.sk</a><br>
02/26/2008 16:27:38;0001; pbs_mom;Job;TMomFinalizeJob3;job not started, Failure job exec failure, after files staged, no retry (see syslog for more information)<br>02/26/2008 16:27:38;0001; pbs_mom;Job;164.f135-3;ALERT: job failed phase 3 start<br>
02/26/2008 16:27:38;0008; pbs_mom;Req;send_sisters;sending ABORT to sisters<br>02/26/2008 16:27:38;0080; pbs_mom;Svr;preobit_reply;top of preobit_reply<br>02/26/2008 16:27:38;0080; pbs_mom;Svr;preobit_reply;DIS_reply_read/decode_DIS_replySvr worked, top of while loop<br>
02/26/2008 16:27:38;0080; pbs_mom;Svr;preobit_reply;in while loop, no error from job stat<br>02/26/2008 16:27:38;0008; pbs_mom;Job;scan_for_terminated;checking job post-processing routine<br>02/26/2008 16:27:38;0080; pbs_mom;Job;164.f135-3;obit sent to server<br>
02/26/2008 16:27:38;0100; pbs_mom;Req;;Type CopyFiles request received from <a href="mailto:PBS_Server@f135-3.informatika.fpv.umb.sk">PBS_Server@f135-3.informatika.fpv.umb.sk</a>, sock=10<br>02/26/2008 16:27:38;0001; pbs_mom;Svr;pbs_mom;Unknown resource type (15035) in fork_to_user, invalid home directory '/bin/sh' specified, not a directory<br>
02/26/2008 16:27:38;0080; pbs_mom;Req;req_reject;Reject reply code=15035(Unknown resource type REJHOST=<a href="http://f135-13.informatika.fpv.umb.sk">f135-13.informatika.fpv.umb.sk</a> MSG=invalid home directory '/bin/sh' specified, not a directory), aux=0, type=CopyFiles, from <a href="mailto:PBS_Server@f135-3.informatika.fpv.umb.sk">PBS_Server@f135-3.informatika.fpv.umb.sk</a><br>
02/26/2008 16:27:38;0001; pbs_mom;Svr;pbs_mom;Inappropriate ioctl for device (25) in req_cpyfile, fork_to_user failed with rc=-15035 'invalid home directory '/bin/sh' specified, not a directory' - exiting<br>
<br>
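One possible reading of the fork_to_user message, which the log alone does not confirm: on f135-13 the home-directory field of the job owner's passwd entry resolves to '/bin/sh', for example because the entry is missing a field or differs from the other nodes. A minimal Python sketch for checking this on the suspect node follows. It assumes the account is resolved through the standard passwd lookup; `check_home` and `check_user` are hypothetical helper names for illustration, not part of Torque.

```python
import os
import pwd
import sys


def check_home(pw_dir):
    """Return None if pw_dir is an existing directory, else a short reason."""
    if not pw_dir:
        return "home directory field is empty"
    if not os.path.exists(pw_dir):
        return "home directory %r does not exist" % pw_dir
    if not os.path.isdir(pw_dir):
        return "home directory %r is not a directory" % pw_dir
    return None


def check_user(name):
    """Look up the passwd entry for name and check its home directory field."""
    try:
        entry = pwd.getpwnam(name)
    except KeyError:
        return "no passwd entry for %r" % name
    return check_home(entry.pw_dir)


if __name__ == "__main__":
    # Usage (hypothetical script name): python check_home.py USER
    # With no argument, the current user is checked.
    user = sys.argv[1] if len(sys.argv) > 1 else pwd.getpwuid(os.getuid()).pw_name
    problem = check_user(user)
    print("%s: %s" % (user, problem or "home directory looks fine"))
```

Running it for the job owner on f135-13 and on a node that works, then comparing the two results, would show whether the passwd data really differs between nodes.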