<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.01 Transitional//EN">
<html>
<head>
<meta content="text/html; charset=ISO-8859-1"
http-equiv="Content-Type">
</head>
<body text="#000000" bgcolor="#ffffff">
Just an FYI: the job ran once I used qrun. Does this point<br>
to the scheduler? (I'm just using the default scheduler that comes<br>
with Torque, i.e. not Maui.)<br>
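<br>
A minimal sketch of how the scheduler side could be checked, offered as an<br>
assumption rather than a confirmed diagnosis: if qrun starts the job, the<br>
server and mom are working, so the remaining question is whether pbs_sched<br>
is alive and being asked to schedule. The helper name check_sched is<br>
hypothetical; pgrep and "qmgr -c" are the only external commands used.<br>
<pre>
```shell
#!/bin/bash
# Sketch only: report whether the Torque FIFO scheduler looks alive.
# check_sched is a hypothetical helper name, not part of Torque.
check_sched() {
    local pids qmgr_path
    # pgrep -x matches the exact process name; empty output means no daemon.
    pids=$(pgrep -x pbs_sched) || pids=""
    if [ -n "$pids" ]; then
        echo "pbs_sched is running (pid $pids)"
    else
        echo "pbs_sched is NOT running"
    fi
    # Only query the server when qmgr is on the PATH.
    qmgr_path=$(command -v qmgr) || qmgr_path=""
    if [ -n "$qmgr_path" ]; then
        # Shows the server's current "scheduling" attribute.
        qmgr -c "list server scheduling"
    fi
}

check_sched
```
</pre>
If the daemon is running and scheduling is True, the scheduler's own log<br>
would be the next place to look for a scheduling cycle after the job was queued.<br>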
<br>
Thanks!<br>
<br>
Jeff<br>
<br>
<blockquote cite="mid:501ED091.8080208@att.net" type="cite">
Good afternoon,<br>
<br>
I apologize for the eternal question, "why isn't my job running"<br>
but I'm not sure where to look next. I'm running Torque 4.0.2<br>
that I built on a Scientific Linux 6.2 box. <br>
<br>
The job script is:<br>
<br>
#!/bin/bash<br>
#PBS -q batch<br>
#PBS -l walltime=00:10:00<br>
#PBS -l nodes=1:ppn=1<br>
<br>
date<br>
hostname<br>
sleep 20<br>
date<br>
<br>
<br>
I submit using qsub and then "qstat -a" looks like:<br>
<br>
<pre>[laytonjb@test1 TEST]$ qstat -a

test1:
                                                                               Req'd  Req'd   Elap
Job ID               Username    Queue    Jobname          SessID NDS   TSK    Memory Time  S Time
-------------------- ----------- -------- ---------------- ------ ----- ------ ------ ----- - -----
11.test1             laytonjb    batch    pbs_test2        --     1     1      --     00:10 Q --
</pre>
<br>
<br>
It stays like this forever. I looked in the logs and didn't see<br>
anything obvious. Here is some output that may help.<br>
<br>
<br>
Server logs:<br>
<br>
08/05/2012 16:15:35;0100;PBS_Server;Job;11.test1;enqueuing into
batch, state 1 hop 1<br>
08/05/2012 16:15:35;0008;PBS_Server;Job;11.test1;Job Queued at
request of laytonjb@test1, owner = laytonjb@test1, job name =
pbs_test2, queue = batch<br>
<br>
<br>
Scheduler logs (FIFO scheduler):<br>
<br>
08/05/2012 15:44:44;0002; pbs_sched;Svr;die;caught signal 15<br>
08/05/2012 15:44:44;0002; pbs_sched;Svr;Log;Log closed<br>
08/05/2012 15:44:44;0002; pbs_sched;Svr;Log;Log opened<br>
08/05/2012 15:44:44;0002; pbs_sched;Svr;TokenAct;Account file
/opt/torque/sched_priv/accounting/20120805 opened<br>
08/05/2012 15:44:44;0002;
pbs_sched;Svr;main;/opt/torque/sbin/pbs_sched startup pid 4782<br>
<br>
<br>
pbs_mom logs (I restarted the mom with "service pbs_mom
restart"; the output is below):<br>
<br>
08/05/2012 16:17:28;0002; pbs_mom;n/a;rm_request;shutdown<br>
08/05/2012 16:17:28;0002; pbs_mom;n/a;dep_cleanup;dependent
cleanup<br>
08/05/2012 16:17:28;0002; pbs_mom;Svr;Log;Log closed<br>
08/05/2012 16:17:31;0002; pbs_mom;Svr;Log;Log opened<br>
08/05/2012 16:17:31;0002; pbs_mom;Svr;pbs_mom;Torque Mom Version
= 4.0.2, loglevel = 0<br>
08/05/2012 16:17:31;0002; pbs_mom;Svr;setpbsserver;test1<br>
08/05/2012 16:17:31;0002; pbs_mom;Svr;mom_server_add;server
test1 added<br>
08/05/2012 16:17:31;0001; pbs_mom;Svr;pbs_mom;LOG_ERROR::No such
file or directory (2) in check_partition_confirm_script, Couldn't
stat the partition confirm command
'/opt/moab/default/tools/xt4/partition.create.xt4.pl'
- ignore this if you aren't running a cray<br>
08/05/2012 16:17:31;0002; pbs_mom;n/a;initialize;independent<br>
08/05/2012 16:17:31;0080; pbs_mom;Svr;pbs_mom;before
init_abort_jobs<br>
08/05/2012 16:17:31;0002; pbs_mom;Svr;pbs_mom;Is up<br>
08/05/2012 16:17:31;0002;
pbs_mom;Svr;setup_program_environment;MOM executable path and
mtime at launch: /usr/sbin/pbs_mom 1344179259<br>
08/05/2012 16:17:31;0002; pbs_mom;Svr;pbs_mom;Torque Mom Version
= 4.0.2, loglevel = 0<br>
<br>
<br>
pbsnodes -a:<br>
<br>
[root@test1 mom_logs]# pbsnodes -a<br>
n0001<br>
state = free<br>
np = 1<br>
ntype = cluster<br>
status =
rectime=1344197869,varattr=,jobs=,state=free,netload=120587595,gres=,loadave=0.02,ncpus=3,physmem=2956668kb,availmem=2836956kb,totmem=2956668kb,idletime=4196,nusers=1,nsessions=1,sessions=1560,uname=Linux
n0001 2.6.32-220.el6.x86_64 #1 SMP Sat Dec 10 17:04:11 CST 2011
x86_64,opsys=linux<br>
mom_service_port = 15002<br>
mom_manager_port = 15003<br>
gpus = 0<br>
<br>
<br>
<br>
qmgr -c "p s":<br>
[root@test1 mom_logs]# qmgr -c "p s"<br>
#<br>
# Create queues and set their attributes.<br>
#<br>
#<br>
# Create and define queue batch<br>
#<br>
create queue batch<br>
set queue batch queue_type = Execution<br>
set queue batch resources_default.nodes = 1<br>
set queue batch resources_default.walltime = 01:00:00<br>
set queue batch enabled = True<br>
set queue batch started = True<br>
#<br>
# Set server attributes.<br>
#<br>
set server scheduling = True<br>
set server acl_hosts = test1<br>
set server managers = laytonjb@test1<br>
set server operators = laytonjb@test1<br>
set server default_queue = batch<br>
set server log_events = 511<br>
set server mail_from = adm<br>
set server scheduler_iteration = 600<br>
set server node_check_rate = 150<br>
set server tcp_timeout = 300<br>
set server job_stat_rate = 45<br>
set server poll_jobs = True<br>
set server mom_job_sync = True<br>
set server next_job_number = 12<br>
set server moab_array_compatible = True<br>
<br>
<br>
Not sure where to start looking from here.<br>
<br>
TIA!<br>
<br>
Jeff<br>
<br>
</blockquote>
<br>
</body>
</html>