<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.0 Transitional//EN">
<HTML><HEAD><TITLE>PBS Scheduling Weirdness</TITLE>
<META http-equiv=Content-Type content="text/html; charset=us-ascii">
<META content="MSHTML 6.00.2900.3492" name=GENERATOR></HEAD>
<BODY text=#000000 bgColor=#ffffff>
<DIV dir=ltr align=left><SPAN class=030115617-20052009><FONT face=Verdana
size=2>Thank you for the well-thought-out responses; we've solved the
issue.</FONT></SPAN></DIV>
<DIV dir=ltr align=left><SPAN class=030115617-20052009><FONT face=Verdana
size=2></FONT></SPAN> </DIV>
<DIV dir=ltr align=left><SPAN class=030115617-20052009><FONT face=Verdana
size=2>Unfortunately, pbs_server was locked up and I did not know it. Restart
attempts appeared to work, but we ended up killing it with kill -9. After restarting
it, the jobs are working well.</FONT></SPAN></DIV>
<DIV dir=ltr align=left><SPAN class=030115617-20052009><FONT face=Verdana
size=2></FONT></SPAN> </DIV>
<DIV dir=ltr align=left><SPAN class=030115617-20052009><FONT face=Verdana
size=2>Thanks again.</FONT></SPAN></DIV><BR>
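<DIV dir=ltr align=left><FONT face=Verdana size=2>One way to notice a locked-up
pbs_server sooner is to probe it with a timeout. The following is a minimal
Python sketch, not something taken from this thread: it assumes that qstat -B
(server status) is on the PATH and that it hangs or fails while the server is
stuck. The helper name responds_within and the 15-second limit are made up for
illustration.</FONT></DIV>
<PRE>
```python
import subprocess
import sys


def responds_within(command, timeout_seconds):
    """Return True if `command` exits with status 0 before the timeout expires."""
    try:
        completed = subprocess.run(
            command,
            stdout=subprocess.DEVNULL,
            stderr=subprocess.DEVNULL,
            timeout=timeout_seconds,
        )
    except subprocess.TimeoutExpired:
        # The command hung; this is what a locked-up server may look like.
        return False
    except OSError:
        # The command could not be started at all (for example, not installed).
        return False
    return completed.returncode == 0


if __name__ == "__main__":
    # Assumption: `qstat -B` queries pbs_server and stalls when the server is stuck.
    ok = responds_within(["qstat", "-B"], timeout_seconds=15)
    print("pbs_server responded" if ok else "pbs_server did not respond in time",
          file=sys.stderr)
```
</PRE>
<DIV dir=ltr align=left><FONT face=Verdana size=2>A check like this only reports
that the server is unresponsive; it does not restart anything, so the decision
to kill and restart pbs_server stays with the administrator.</FONT></DIV>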
<BLOCKQUOTE dir=ltr style="MARGIN-RIGHT: 0px">
<DIV class=OutlookMessageHeader lang=en-us dir=ltr align=left>
<HR tabIndex=-1>
<FONT face=Tahoma size=2><B>From:</B> Jerry Smith [mailto:jdsmit@sandia.gov]
<BR><B>Sent:</B> Wednesday, May 20, 2009 1:29 PM<BR><B>To:</B> Edsall, William
(WJ)<BR><B>Cc:</B> torqueusers@supercluster.org<BR><B>Subject:</B> Re:
[torqueusers] PBS Scheduling Weirdness<BR></FONT><BR></DIV>
<DIV></DIV><TT>Try: <BR></TT><FONT face=Verdana><FONT size=2>echo "sleep 10" |
qsub -l nodes=node4:ppn=4<BR></FONT></FONT><FONT size=2><TT>or
<BR></TT></FONT><FONT face=Verdana><FONT size=2>echo "sleep 10" | qsub -l
nodes=1:ppn=4<BR></FONT></FONT><FONT size=2><TT><BR></TT></FONT><TT>Does this
change anything?</TT><BR><FONT
size=2><TT><BR>--Jerry<BR></TT></FONT><BR>Edsall, William (WJ) wrote:
<BLOCKQUOTE
cite=mid:52CD990A674498429E6A7B4FCAE3F7D3028F71EF@USMDLMDOWX025.dow.com
type="cite">
<META content="MSHTML 6.00.2900.3492" name=GENERATOR>
<DIV dir=ltr align=left><FONT face=Verdana><FONT size=2><SPAN
class=357351016-20052009>I usually test with a STDIN command such as this.
</SPAN></FONT></FONT></DIV>
<DIV dir=ltr align=left><FONT face=Verdana><FONT size=2><SPAN
class=357351016-20052009></SPAN></FONT></FONT> </DIV>
<DIV dir=ltr align=left><FONT face=Verdana><FONT size=2><SPAN
class=357351016-20052009>> </SPAN>echo "sleep 10" | qsub -l
nodes=1:node4:ppn=4</FONT></FONT></DIV>
<DIV> </DIV>
<DIV><FONT face=Verdana size=2><SPAN class=357351016-20052009>My job runs,
but as you can see, I only get one CPU, on the wrong resource. The same thing
happens when requesting multiple nodes. This was working, and still works on our other
clusters, but as of Monday this week it fails.</SPAN></FONT></DIV>
<DIV><FONT face=Verdana size=2><SPAN
class=357351016-20052009></SPAN></FONT> </DIV>
<DIV><FONT face=Verdana size=2><SPAN class=357351016-20052009>> qstat -f
1059<BR>Job Id: 1059<BR> Job_Name =
STDIN<BR> Job_Owner =
<deleted><BR> job_state = R<BR>
queue = batch<BR> server =
<deleted>com<BR> Checkpoint =
u<BR> ctime = Wed May 20 11:46:18
2009<BR> Error_Path = <deleted></SPAN></FONT></DIV>
<DIV><FONT face=Verdana size=2><SPAN
class=357351016-20052009><STRONG> exec_host =
node2/0</STRONG><BR> Hold_Types = n<BR>
Join_Path = n<BR> Keep_Files = n<BR>
Mail_Points = a<BR> mtime = Wed May 20 11:46:26
2009<BR> Output_Path =
<deleted>/STDIN.o1059<BR> Priority =
0<BR> qtime = Wed May 20 11:46:18
2009<BR> Rerunable = True<BR>
Resource_List.neednodes = 1<BR> Resource_List.nodect =
1<BR> Resource_List.nodes = 1<BR>
Resource_List.walltime = 01:00:00<BR> session_id =
12814<BR> substate = 42<BR>
Variable_List =
PBS_O_HOME=/home/<deleted>,PBS_O_LANG=POSIX,<BR>
PBS_O_LOGNAME=<deleted>,<BR>
PBS_O_PATH=/usr/local/torque/sbin:/usr/local/torque/bin:/usr/bin:/bin<BR>
:/usr/sbin:/sbin:/usr/local/bin:/usr/bin/X11:/usr/X11R6/bin:/usr/games<BR>
:/opt/kde3/bin:/usr/lib/mit/bin:/usr/lib/mit/sbin:/usr/lib/qt3/bin,<BR>
PBS_O_MAIL=/var/mail/<deleted>,PBS_O_SHELL=/bin/tcsh,<BR>
PBS_SERVER=txmerig.nam.dow.com,PBS_O_HOST=txmerig.nam.dow.com,<BR>
PBS_O_WORKDIR=/home/<deleted>,PBS_O_QUEUE=batch<BR>
euser = <deleted><BR> egroup =
users<BR> hashname =
1059.<deleted>.com<BR> queue_rank =
996<BR> queue_type = E<BR> comment = Job
started on Wed May 20 at 11:46<BR> etime = Wed May 20
11:46:18 2009<BR> submit_args = -l
nodes=1:node4:ppn=4<BR> start_time = Wed May 20 11:46:26
2009<BR> start_count = 1<BR></SPAN></FONT></DIV>
<DIV> </DIV>
<DIV><SPAN class=357351016-20052009></SPAN><FONT face=Verdana><FONT
size=2>O<SPAN class=357351016-20052009>n other known-working clusters,
requesting resources in the same fashion works fine, as seen
here:</SPAN></FONT></FONT></DIV>
<DIV><FONT face=Verdana><FONT size=2><SPAN
class=357351016-20052009> exec_host =
node14/3+node14/2+node14/1+node14/0+node13/3+node13/2+node13/1<BR>
+node13/0<BR></SPAN></FONT></FONT><BR></DIV>
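<DIV><FONT face=Verdana size=2>The two exec_host values above can be compared
mechanically. Here is a minimal Python sketch, assuming the value is a
"+"-separated list of node/slot entries as shown in the qstat -f output; the
helper name slots_per_node is hypothetical.</FONT></DIV>
<PRE>
```python
from collections import Counter


def slots_per_node(exec_host):
    """Count allocated slots per node in an exec_host string."""
    counts = Counter()
    # Remove whitespace first, because qstat -f wraps long values across lines.
    compact = "".join(exec_host.split())
    for entry in compact.split("+"):
        if not entry:
            continue
        # Each entry looks like "node14/3": node name, then slot index.
        node, _, _slot = entry.partition("/")
        counts[node] += 1
    return dict(counts)


working = "node14/3+node14/2+node14/1+node14/0+node13/3+node13/2+node13/1+node13/0"
print(slots_per_node(working))    # {'node14': 4, 'node13': 4}
print(slots_per_node("node2/0"))  # {'node2': 1}
```
</PRE>
<DIV><FONT face=Verdana size=2>For the failing job this gives a single slot on
node2, whereas the submit_args line asked for four processors on
node4.</FONT></DIV>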
<BLOCKQUOTE dir=ltr style="MARGIN-RIGHT: 0px">
<DIV class=OutlookMessageHeader lang=en-us dir=ltr align=left>
<HR tabIndex=-1>
<FONT face=Tahoma size=2><B>From:</B> Jerry Smith [<A
class=moz-txt-link-freetext
href="mailto:jdsmit@sandia.gov">mailto:jdsmit@sandia.gov</A>]
<BR><B>Sent:</B> Wednesday, May 20, 2009 12:02 PM<BR><B>To:</B> Edsall,
William (WJ)<BR><B>Cc:</B> <A class=moz-txt-link-abbreviated
href="mailto:torqueusers@supercluster.org">torqueusers@supercluster.org</A><BR><B>Subject:</B>
Re: [torqueusers] PBS Scheduling Weirdness<BR></FONT><BR></DIV><TT>Sorry, I
forgot to ask this as well: can we get a copy of the script you are
submitting and the qsub command you are
using?<BR><BR>Jerry<BR></TT><BR>Edsall, William (WJ) wrote:
<BLOCKQUOTE
cite=mid:52CD990A674498429E6A7B4FCAE3F7D3028F71A4@USMDLMDOWX025.dow.com
type="cite">
<META content="MSHTML 6.00.2900.3492" name=GENERATOR>
<DIV dir=ltr align=left><SPAN class=971522515-20052009><FONT
face=Verdana size=2>Hello,</FONT></SPAN></DIV>
<DIV dir=ltr align=left><SPAN class=971522515-20052009><FONT
face=Verdana size=2> Here is the output. I'm using the torque
scheduler; Maui is on the system but not running.</FONT></SPAN></DIV>
<DIV dir=ltr align=left><SPAN
class=971522515-20052009></SPAN> </DIV>
<DIV dir=ltr align=left><SPAN class=971522515-20052009><FONT
face=Verdana size=2># qmgr -c "p s"<BR>#<BR># Create queues and set
their attributes.<BR>#<BR>#<BR># Create and define queue
batch<BR>#<BR>create queue batch<BR>set queue batch queue_type =
Execution<BR>set queue batch resources_default.nodes = 1<BR>set queue
batch resources_default.walltime = 01:00:00<BR>set queue batch enabled =
True<BR>set queue batch started = True<BR>#<BR># Set server
attributes.<BR>#<BR>set server scheduling = True<BR>set server acl_hosts
= txmerig<BR><U><FONT color=#0000ff>//stripped out the list of managers
and operators</FONT></U><BR>set server default_queue = batch<BR>set
server log_events = 511<BR>set server mail_from = adm<BR>set server
scheduler_iteration = 600<BR>set server node_check_rate = 150<BR>set
server tcp_timeout = 6<BR>set server next_job_number =
1054<BR></FONT></SPAN></DIV><BR>
<BLOCKQUOTE dir=ltr style="MARGIN-RIGHT: 0px">
<DIV class=OutlookMessageHeader lang=en-us dir=ltr align=left>
<HR tabIndex=-1>
<FONT face=Tahoma size=2><B>From:</B> Jerry Smith [<A
class=moz-txt-link-freetext href="mailto:jdsmit@sandia.gov"
moz-do-not-send="true">mailto:jdsmit@sandia.gov</A>] <BR><B>Sent:</B>
Tuesday, May 19, 2009 4:05 PM<BR><B>To:</B> Edsall, William
(WJ)<BR><B>Cc:</B> <A class=moz-txt-link-abbreviated
href="mailto:torqueusers@supercluster.org"
moz-do-not-send="true">torqueusers@supercluster.org</A><BR><B>Subject:</B>
Re: [torqueusers] PBS Scheduling Weirdness<BR></FONT><BR></DIV><TT>Can
you give us the output from:<BR><BR>qmgr -c "p s" <BR><BR>and are you
using any external scheduler, Maui or Moab or the
like?<BR><BR>Thanks,<BR><BR>--Jerry<BR></TT><BR>Edsall, William (WJ)
wrote:
<BLOCKQUOTE
cite=mid:52CD990A674498429E6A7B4FCAE3F7D3028F6EB1@USMDLMDOWX025.dow.com
type="cite">
<META content="MS Exchange Server version 6.5.7654.12"
name=Generator><!-- Converted from text/rtf format -->
<P><FONT face=Verdana size=2>Hello list,</FONT> <BR><FONT
face=Verdana size=2> Having a strange problem with torque
version: 2.4.0b1.</FONT> </P>
<P><FONT face=Verdana size=2>It seems that no matter how many
resources I request, I only get one CPU on the first available
node.</FONT> </P>
<P><FONT face=Verdana size=2>Please help me brainstorm the possible
causes.</FONT> <BR><BR><B><FONT face="Courier New" color=#ff0000
size=2>_______________________________________</FONT></B><BR><FONT
face="Courier New" color=#808080 size=2>William J.
Edsall</FONT><FONT face="Times New Roman"
color=#808080><BR></FONT></P><BR></BLOCKQUOTE></BLOCKQUOTE></BLOCKQUOTE></BLOCKQUOTE></BLOCKQUOTE></BLOCKQUOTE></BODY></HTML>