<html xmlns:v="urn:schemas-microsoft-com:vml" xmlns:o="urn:schemas-microsoft-com:office:office" xmlns:w="urn:schemas-microsoft-com:office:word" xmlns="http://www.w3.org/TR/REC-html40">
<head>
<meta http-equiv=Content-Type content="text/html; charset=us-ascii">
<meta name=Generator content="Microsoft Word 11 (filtered medium)">
<style>
<!--
/* Style Definitions */
p.MsoNormal, li.MsoNormal, div.MsoNormal
        {margin:0cm;
        margin-bottom:.0001pt;
        font-size:12.0pt;
        font-family:"Times New Roman";
        color:black;}
a:link, span.MsoHyperlink
        {color:blue;
        text-decoration:underline;}
a:visited, span.MsoHyperlinkFollowed
        {color:blue;
        text-decoration:underline;}
pre
        {margin:0cm;
        margin-bottom:.0001pt;
        font-size:10.0pt;
        font-family:"Courier New";
        color:black;}
span.EmailStyle18
        {mso-style-type:personal-reply;
        font-family:Arial;
        color:navy;}
@page Section1
        {size:595.3pt 841.9pt;
        margin:72.0pt 90.0pt 72.0pt 90.0pt;}
div.Section1
        {page:Section1;}
-->
</style>
<!--[if gte mso 9]><xml>
<o:shapedefaults v:ext="edit" spidmax="1026" />
</xml><![endif]--><!--[if gte mso 9]><xml>
<o:shapelayout v:ext="edit">
<o:idmap v:ext="edit" data="1" />
</o:shapelayout></xml><![endif]-->
</head>
<body bgcolor=white lang=EN-AU link=blue vlink=blue>
<div class=Section1>
<div>
<p class=MsoNormal><font size=2 color=navy face=Arial><span style='font-size:
10.0pt;font-family:Arial;color:navy'>I am attempting to automate job pre-emption
using the maui preemptor/preemptee queue mechanism, with the pre-empted openmpi jobs being
checkpointed by blcr. Almost all of this works, and my thanks are due to Eric
Roman of lbl, who has written a procedure, cr_mpirun, to facilitate the openmpi-blcr
interaction and intends to release it soon. I believe that the remaining
problem is caused by a maui-torque interaction, on which I am seeking advice
here.<o:p></o:p></span></font></p>
<p class=MsoNormal><font size=2 color=navy face=Arial><span style='font-size:
10.0pt;font-family:Arial;color:navy'><o:p> </o:p></span></font></p>
<p class=MsoNormal><font size=2 color=navy face=Arial><span lang=EN-US
style='font-size:10.0pt;font-family:Arial;color:navy'>The problem is that for
some jobs, identified below, the job on which a hold has been placed restarts
instantly and is checkpointed again. This results in the time stamp on the ckpt
file not matching what was expected, so the pre-empted job goes into the W
state and the preemptor cannot start. (The other problem of incomplete .o files
is minor by comparison because the simple workaround identified will
suffice until the problem is fixed). I have experimented unsuccessfully with a
few modifications to maui, as indicated below. I was hoping for some advice on
what else I might try. I would be interested to know whether the
moab-torque-openmpi-blcr combination is working for anyone.<o:p></o:p></span></font></p>
<p class=MsoNormal><font size=2 color=navy face=Arial><span lang=EN-US
style='font-size:10.0pt;font-family:Arial;color:navy'>Thank you. Greg Doherty <o:p></o:p></span></font></p>
</div>
<blockquote style='margin-top:5.0pt;margin-bottom:5.0pt'><pre><font size=2
color=black face="Courier New"><span style='font-size:10.0pt'><o:p> </o:p></span></font></pre><pre><font
size=2 color=black face="Courier New"><span style='font-size:10.0pt'>A job asking for m nodes can pre-empt jobs with p<=m nodes, and restart<o:p></o:p></span></font></pre><pre><font
size=2 color=black face="Courier New"><span style='font-size:10.0pt'>the original jobs. However the .o files of the pre-empted jobs do not<o:p></o:p></span></font></pre><pre><font
size=2 color=black face="Courier New"><span style='font-size:10.0pt'>contain the output produced prior to them being checkpointed. For the<o:p></o:p></span></font></pre><pre><font
size=2 color=black face="Courier New"><span style='font-size:10.0pt'>time being, this can be circumvented by redirecting stdout to a file<o:p></o:p></span></font></pre><pre><font
size=2 color=black face="Courier New"><span style='font-size:10.0pt'>when executing the cr_mpirun command, which works OK.<o:p></o:p></span></font></pre><pre><font
size=2 color=black face="Courier New"><span style='font-size:10.0pt'><o:p> </o:p></span></font></pre><pre><font
size=2 color=black face="Courier New"><span style='font-size:10.0pt'>A job asking for m nodes cannot successfully pre-empt a job already<o:p></o:p></span></font></pre><pre><font
size=2 color=black face="Courier New"><span style='font-size:10.0pt'>running with p>m nodes. I believe that this is because the pre-empted<o:p></o:p></span></font></pre><pre><font
size=2 color=black face="Courier New"><span style='font-size:10.0pt'>job restarts immediately so the ckpt files have labels which don't<o:p></o:p></span></font></pre><pre><font
size=2 color=black face="Courier New"><span style='font-size:10.0pt'>match. I tried to modify MPBSI.c to stop the pre-empted job from<o:p></o:p></span></font></pre><pre><font
size=2 color=black face="Courier New"><span style='font-size:10.0pt'>restarting immediately, by adding a pbs_alterjob between the pbs_holdjob<o:p></o:p></span></font></pre><pre><font
size=2 color=black face="Courier New"><span style='font-size:10.0pt'>and the pbs_rlsjob to delay the execution of the pre-empted job by one<o:p></o:p></span></font></pre><pre><font
size=2 color=black face="Courier New"><span style='font-size:10.0pt'>minute, but that simply fails with a pbs_error of 15016:<o:p></o:p></span></font></pre><pre><font
size=2 color=black face="Courier New"><span style='font-size:10.0pt'><o:p> </o:p></span></font></pre><pre><font
size=2 color=black face="Courier New"><span style='font-size:10.0pt'>04/18 16:21:35 MRMJobCheckpoint(1272,1,SC)<o:p></o:p></span></font></pre><pre><font
size=2 color=black face="Courier New"><span style='font-size:10.0pt'>04/18 16:21:35 MPBSJobCkpt(1272,R,SC)<o:p></o:p></span></font></pre><pre><font
size=2 color=black face="Courier New"><span style='font-size:10.0pt'>04/18 16:21:37 MPBSJobCkpt(Execution_Time, 1622.37)<o:p></o:p></span></font></pre><pre><font
size=2 color=black face="Courier New"><span style='font-size:10.0pt'>04/18 16:21:37 MPBSJobCkpt(Illegal attribute or resource value for )<o:p></o:p></span></font></pre><pre><font
size=2 color=black face="Courier New"><span style='font-size:10.0pt'>04/18 16:21:37 ERROR: PBS job '1272.liberty.ansto.gov.au' attr 'Execution_Time:' to '1622.37' (rc: 15016 'Illegal attribute or resource value for ')<o:p></o:p></span></font></pre><pre><font
size=2 color=black face="Courier New"><span style='font-size:10.0pt'>04/18 16:21:37 INFO: attribute 'PREEMPTEE' set for job 1272 <o:p></o:p></span></font></pre><pre><font
size=2 color=black face="Courier New"><span style='font-size:10.0pt'><o:p> </o:p></span></font></pre><pre><font
size=2 color=black face="Courier New"><span style='font-size:10.0pt'><o:p> </o:p></span></font></pre><pre><font
size=2 color=black face="Courier New"><span style='font-size:10.0pt'>So, obviously I don't know what I am doing. I have fiddled with various<o:p></o:p></span></font></pre><pre><font
size=2 color=black face="Courier New"><span style='font-size:10.0pt'>strings to include the month and day when trying to reset the execution<o:p></o:p></span></font></pre><pre><font
size=2 color=black face="Courier New"><span style='font-size:10.0pt'>time, but to no avail. Probably pbs_alterjob does not want me to fiddle<o:p></o:p></span></font></pre><pre><font
size=2 color=black face="Courier New"><span style='font-size:10.0pt'>with execution time at all at this point in proceedings. I can't find<o:p></o:p></span></font></pre><pre><font
size=2 color=black face="Courier New"><span style='font-size:10.0pt'>very much detailed documentation on those attributes. I have<o:p></o:p></span></font></pre><pre><font
size=2 color=black face="Courier New"><span style='font-size:10.0pt'>experimented with short sleep()s between the pbs_holdjob and pbs_rlsjob,<o:p></o:p></span></font></pre><pre><font
size=2 color=black face="Courier New"><span style='font-size:10.0pt'>also to no avail.<o:p></o:p></span></font></pre><pre><font
size=2 color=black face="Courier New"><span style='font-size:10.0pt'><o:p> </o:p></span></font></pre><pre><font
size=2 color=black face="Courier New"><span style='font-size:10.0pt'>I enclose the following in case you can see immediately that I have done<o:p></o:p></span></font></pre><pre><font
size=2 color=black face="Courier New"><span style='font-size:10.0pt'>something stupid.<o:p></o:p></span></font></pre><pre><font
size=2 color=navy face="Courier New"><span style='font-size:10.0pt;color:navy'>-------------------------------------------------------------------</span></font><o:p></o:p></pre><pre><font
size=2 color=black face="Courier New"><span style='font-size:10.0pt'><o:p> </o:p></span></font></pre><pre><font
size=2 color=black face="Courier New"><span style='font-size:10.0pt'>int MPBSJobCkpt(<o:p></o:p></span></font></pre><pre><font
size=2 color=black face="Courier New"><span style='font-size:10.0pt'><o:p> </o:p></span></font></pre><pre><font
size=2 color=black face="Courier New"><span style='font-size:10.0pt'> mjob_t *J, /* I (modified) */<o:p></o:p></span></font></pre><pre><font
size=2 color=black face="Courier New"><span style='font-size:10.0pt'> mrm_t *R, /* I */<o:p></o:p></span></font></pre><pre><font
size=2 color=black face="Courier New"><span style='font-size:10.0pt'> mbool_t DoTerminateJob, /* I (boolean) */<o:p></o:p></span></font></pre><pre><font
size=2 color=black face="Courier New"><span style='font-size:10.0pt'> char *Msg, /* O (optional) */<o:p></o:p></span></font></pre><pre><font
size=2 color=black face="Courier New"><span style='font-size:10.0pt'> int *SC) /* O (optional) */<o:p></o:p></span></font></pre><pre><font
size=2 color=black face="Courier New"><span style='font-size:10.0pt'><o:p> </o:p></span></font></pre><pre><font
size=2 color=black face="Courier New"><span style='font-size:10.0pt'> {<o:p></o:p></span></font></pre><pre><font
size=2 color=black face="Courier New"><span style='font-size:10.0pt'> struct attrl Ckattrib;<o:p></o:p></span></font></pre><pre><font
size=2 color=black face="Courier New"><span style='font-size:10.0pt'><o:p> </o:p></span></font></pre><pre><font
size=2 color=black face="Courier New"><span style='font-size:10.0pt'> char *CkRptr;<o:p></o:p></span></font></pre><pre><font
size=2 color=black face="Courier New"><span style='font-size:10.0pt'> time_t Cktime;<o:p></o:p></span></font></pre><pre><font
size=2 color=black face="Courier New"><span style='font-size:10.0pt'> struct tm *Cktmp;<o:p></o:p></span></font></pre><pre><font
size=2 color=black face="Courier New"><span style='font-size:10.0pt'> char Cktmps[256];<o:p></o:p></span></font></pre><pre><font
size=2 color=black face="Courier New"><span style='font-size:10.0pt'> char Cktmpline[MAX_MLINE];<o:p></o:p></span></font></pre><pre><font
size=2 color=black face="Courier New"><span style='font-size:10.0pt'><o:p> </o:p></span></font></pre><pre><font
size=2 color=black face="Courier New"><span style='font-size:10.0pt'> Ckattrib.next = NULL;<o:p></o:p></span></font></pre><pre><font
size=2 color=black face="Courier New"><span style='font-size:10.0pt'> Ckattrib.name = ATTR_a;<o:p></o:p></span></font></pre><pre><font
size=2 color=black face="Courier New"><span style='font-size:10.0pt'> Ckattrib.op = SET;<o:p></o:p></span></font></pre><pre><font
size=2 color=black face="Courier New"><span style='font-size:10.0pt'><o:p> </o:p></span></font></pre><pre><font
size=2 color=black face="Courier New"><span style='font-size:10.0pt'> Cktmpline[0] = '\0';<o:p></o:p></span></font></pre><pre><font
size=2 color=black face="Courier New"><span style='font-size:10.0pt'> CkRptr = Cktmpline;<o:p></o:p></span></font></pre><pre><font
size=2 color=black face="Courier New"><span style='font-size:10.0pt'> Ckattrib.resource = CkRptr;<o:p></o:p></span></font></pre><pre><font
size=2 color=black face="Courier New"><span style='font-size:10.0pt'><o:p> </o:p></span></font></pre><pre><font
size=2 color=black face="Courier New"><span style='font-size:10.0pt'> int rc;<o:p></o:p></span></font></pre><pre><font
size=2 color=black face="Courier New"><span style='font-size:10.0pt'> int holdtimeout;<o:p></o:p></span></font></pre><pre><font
size=2 color=black face="Courier New"><span style='font-size:10.0pt'><o:p> </o:p></span></font></pre><pre><font
size=2 color=black face="Courier New"><span style='font-size:10.0pt'> char *ErrMsg;<o:p></o:p></span></font></pre><pre><font
size=2 color=black face="Courier New"><span style='font-size:10.0pt'><o:p> </o:p></span></font></pre><pre><font
size=2 color=black face="Courier New"><span style='font-size:10.0pt'> char tmpJobName[MAX_MNAME];<o:p></o:p></span></font></pre><pre><font
size=2 color=black face="Courier New"><span style='font-size:10.0pt'><o:p> </o:p></span></font></pre><pre><font
size=2 color=black face="Courier New"><span style='font-size:10.0pt'> const char *FName = "MPBSJobCkpt";<o:p></o:p></span></font></pre><pre><font
size=2 color=black face="Courier New"><span style='font-size:10.0pt'><o:p> </o:p></span></font></pre><pre><font
size=2 color=black face="Courier New"><span style='font-size:10.0pt'> DBG(2,fPBS) DPrint("%s(%s,R,SC)\n",<o:p></o:p></span></font></pre><pre><font
size=2 color=black face="Courier New"><span style='font-size:10.0pt'> FName,<o:p></o:p></span></font></pre><pre><font
size=2 color=black face="Courier New"><span style='font-size:10.0pt'> (J != NULL) ? J->Name : "NULL");<o:p></o:p></span></font></pre><pre><font
size=2 color=black face="Courier New"><span style='font-size:10.0pt'><o:p> </o:p></span></font></pre><pre><font
size=2 color=black face="Courier New"><span style='font-size:10.0pt'> if ((J == NULL) ||<o:p></o:p></span></font></pre><pre><font
size=2 color=black face="Courier New"><span style='font-size:10.0pt'> (R == NULL) ||<o:p></o:p></span></font></pre><pre><font
size=2 color=black face="Courier New"><span style='font-size:10.0pt'> ((J->State != mjsStarting) && (J->State != mjsRunning)))<o:p></o:p></span></font></pre><pre><font
size=2 color=black face="Courier New"><span style='font-size:10.0pt'> {<o:p></o:p></span></font></pre><pre><font
size=2 color=black face="Courier New"><span style='font-size:10.0pt'> return(FAILURE);<o:p></o:p></span></font></pre><pre><font
size=2 color=black face="Courier New"><span style='font-size:10.0pt'> }<o:p></o:p></span></font></pre><pre><font
size=2 color=black face="Courier New"><span style='font-size:10.0pt'><o:p> </o:p></span></font></pre><pre><font
size=2 color=black face="Courier New"><span style='font-size:10.0pt'> MJobGetName(J,NULL,R,tmpJobName,sizeof(tmpJobName),mjnRMName);<o:p></o:p></span></font></pre><pre><font
size=2 color=black face="Courier New"><span style='font-size:10.0pt'><o:p> </o:p></span></font></pre><pre><font
size=2 color=black face="Courier New"><span style='font-size:10.0pt'> rc = blocking_pbs_holdjob(R->U.PBS.ServerSD,tmpJobName,"s",NULL);<o:p></o:p></span></font></pre><pre><font
size=2 color=black face="Courier New"><span style='font-size:10.0pt'> /* still ok to release the job if the hold timed out, the request was<o:p></o:p></span></font></pre><pre><font
size=2 color=black face="Courier New"><span style='font-size:10.0pt'> * successful. */<o:p></o:p></span></font></pre><pre><font
size=2 color=black face="Courier New"><span style='font-size:10.0pt'> if (rc != -2) { holdtimeout = 0; } else { holdtimeout = 1; }<o:p></o:p></span></font></pre><pre><font
size=2 color=black face="Courier New"><span style='font-size:10.0pt'><o:p> </o:p></span></font></pre><pre><font
size=2 color=black face="Courier New"><span style='font-size:10.0pt'> if (rc != 0 && !holdtimeout)<o:p></o:p></span></font></pre><pre><font
size=2 color=black face="Courier New"><span style='font-size:10.0pt'> {<o:p></o:p></span></font></pre><pre><font
size=2 color=black face="Courier New"><span style='font-size:10.0pt'> ErrMsg = pbs_geterrmsg(R->U.PBS.ServerSD);<o:p></o:p></span></font></pre><pre><font
size=2 color=black face="Courier New"><span style='font-size:10.0pt'><o:p> </o:p></span></font></pre><pre><font
size=2 color=black face="Courier New"><span style='font-size:10.0pt'> DBG(0,fPBS) DPrint("ERROR: PBS job '%s' cannot be checkpointed (rc: %d '%s')\n",<o:p></o:p></span></font></pre><pre><font
size=2 color=black face="Courier New"><span style='font-size:10.0pt'> J->Name,<o:p></o:p></span></font></pre><pre><font
size=2 color=black face="Courier New"><span style='font-size:10.0pt'> rc,<o:p></o:p></span></font></pre><pre><font
size=2 color=black face="Courier New"><span style='font-size:10.0pt'> ErrMsg);<o:p></o:p></span></font></pre><pre><font
size=2 color=black face="Courier New"><span style='font-size:10.0pt'><o:p> </o:p></span></font></pre><pre><font
size=2 color=black face="Courier New"><span style='font-size:10.0pt'> if (R->FailIteration != MSched.Iteration)<o:p></o:p></span></font></pre><pre><font
size=2 color=black face="Courier New"><span style='font-size:10.0pt'> {<o:p></o:p></span></font></pre><pre><font
size=2 color=black face="Courier New"><span style='font-size:10.0pt'> R->FailIteration = MSched.Iteration;<o:p></o:p></span></font></pre><pre><font
size=2 color=black face="Courier New"><span style='font-size:10.0pt'> R->FailCount = 0;<o:p></o:p></span></font></pre><pre><font
size=2 color=black face="Courier New"><span style='font-size:10.0pt'> }<o:p></o:p></span></font></pre><pre><font
size=2 color=black face="Courier New"><span style='font-size:10.0pt'><o:p> </o:p></span></font></pre><pre><font
size=2 color=black face="Courier New"><span style='font-size:10.0pt'> R->FailCount++;<o:p></o:p></span></font></pre><pre><font
size=2 color=black face="Courier New"><span style='font-size:10.0pt'><o:p> </o:p></span></font></pre><pre><font
size=2 color=black face="Courier New"><span style='font-size:10.0pt'> return(FAILURE);<o:p></o:p></span></font></pre><pre><font
size=2 color=black face="Courier New"><span style='font-size:10.0pt'> }<o:p></o:p></span></font></pre><pre><font
size=2 color=black face="Courier New"><span style='font-size:10.0pt'><o:p> </o:p></span></font></pre><pre><font
size=2 color=black face="Courier New"><span style='font-size:10.0pt'> for (rc=0; rc<256; rc++) {<o:p></o:p></span></font></pre><pre><font
size=2 color=black face="Courier New"><span style='font-size:10.0pt'> Cktmps[rc] = '\0';<o:p></o:p></span></font></pre><pre><font
size=2 color=black face="Courier New"><span style='font-size:10.0pt'> }<o:p></o:p></span></font></pre><pre><font
size=2 color=black face="Courier New"><span style='font-size:10.0pt'><o:p> </o:p></span></font></pre><pre><font
size=2 color=black face="Courier New"><span style='font-size:10.0pt'> Cktime = time(NULL);<o:p></o:p></span></font></pre><pre><font
size=2 color=black face="Courier New"><span style='font-size:10.0pt'> Cktime += 60;<o:p></o:p></span></font></pre><pre><font
size=2 color=black face="Courier New"><span style='font-size:10.0pt'> Cktmp = localtime(&Cktime);<o:p></o:p></span></font></pre><pre><font
size=2 color=black face="Courier New"><span style='font-size:10.0pt'><o:p> </o:p></span></font></pre><pre><font
size=2 color=black face="Courier New"><span style='font-size:10.0pt'> if (strftime(Cktmps, sizeof(Cktmps), "%m%d%H%M.%S", Cktmp) == 0) {<o:p></o:p></span></font></pre><pre><font
size=2 color=black face="Courier New"><span style='font-size:10.0pt'> DBG(0,fPBS) DPrint("ERROR: Greg's checkpoint addition %ld \n",<o:p></o:p></span></font></pre><pre><font
size=2 color=black face="Courier New"><span style='font-size:10.0pt'> (long)Cktime);<o:p></o:p></span></font></pre><pre><font
size=2 color=black face="Courier New"><span style='font-size:10.0pt'> return(FAILURE);<o:p></o:p></span></font></pre><pre><font
size=2 color=black face="Courier New"><span style='font-size:10.0pt'> }<o:p></o:p></span></font></pre><pre><font
size=2 color=black face="Courier New"><span style='font-size:10.0pt'> Ckattrib.value = Cktmps;<o:p></o:p></span></font></pre><pre><font
size=2 color=black face="Courier New"><span style='font-size:10.0pt'> DBG(2,fPBS) DPrint("%s(%s, %s)\n",<o:p></o:p></span></font></pre><pre><font
size=2 color=black face="Courier New"><span style='font-size:10.0pt'> FName, Ckattrib.name, Ckattrib.value);<o:p></o:p></span></font></pre><pre><font
size=2 color=black face="Courier New"><span style='font-size:10.0pt'><o:p> </o:p></span></font></pre><pre><font
size=2 color=black face="Courier New"><span style='font-size:10.0pt'> rc = pbs_alterjob(R->U.PBS.ServerSD, tmpJobName, &Ckattrib, NULL);<o:p></o:p></span></font></pre><pre><font
size=2 color=black face="Courier New"><span style='font-size:10.0pt'> ErrMsg = pbs_geterrmsg(R->U.PBS.ServerSD);<o:p></o:p></span></font></pre><pre><font
size=2 color=black face="Courier New"><span style='font-size:10.0pt'> DBG(2,fPBS) DPrint("%s(%s)\n",<o:p></o:p></span></font></pre><pre><font
size=2 color=black face="Courier New"><span style='font-size:10.0pt'> FName, ErrMsg);<o:p></o:p></span></font></pre><pre><font
size=2 color=black face="Courier New"><span style='font-size:10.0pt'><o:p> </o:p></span></font></pre><pre><font
size=2 color=black face="Courier New"><span style='font-size:10.0pt'> if (rc != 0)<o:p></o:p></span></font></pre><pre><font
size=2 color=black face="Courier New"><span style='font-size:10.0pt'> {<o:p></o:p></span></font></pre><pre><font
size=2 color=black face="Courier New"><span style='font-size:10.0pt'> ErrMsg = pbs_geterrmsg(R->U.PBS.ServerSD);<o:p></o:p></span></font></pre><pre><font
size=2 color=black face="Courier New"><span style='font-size:10.0pt'><o:p> </o:p></span></font></pre><pre><font
size=2 color=black face="Courier New"><span style='font-size:10.0pt'> DBG(2,fPBS) DPrint("ERROR: PBS job '%s' attr '%s:%s' to '%s' (rc: %d '%s')\n",<o:p></o:p></span></font></pre><pre><font
size=2 color=black face="Courier New"><span style='font-size:10.0pt'> tmpJobName,<o:p></o:p></span></font></pre><pre><font
size=2 color=black face="Courier New"><span style='font-size:10.0pt'> Ckattrib.name,<o:p></o:p></span></font></pre><pre><font
size=2 color=black face="Courier New"><span style='font-size:10.0pt'> Ckattrib.resource,<o:p></o:p></span></font></pre><pre><font
size=2 color=black face="Courier New"><span style='font-size:10.0pt'> Ckattrib.value,<o:p></o:p></span></font></pre><pre><font
size=2 color=black face="Courier New"><span style='font-size:10.0pt'> rc,<o:p></o:p></span></font></pre><pre><font
size=2 color=black face="Courier New"><span style='font-size:10.0pt'> ErrMsg);<o:p></o:p></span></font></pre><pre><font
size=2 color=black face="Courier New"><span style='font-size:10.0pt'>/* If I do not comment this bit out, maui simply stops of course<o:p></o:p></span></font></pre><pre><font
size=2 color=black face="Courier New"><span style='font-size:10.0pt'> and I do not even get to see all the debug messages in the log file.<o:p></o:p></span></font></pre><pre><font
size=2 color=black face="Courier New"><span style='font-size:10.0pt'><o:p> </o:p></span></font></pre><pre><font
size=2 color=black face="Courier New"><span style='font-size:10.0pt'> if (R->FailIteration != MSched.Iteration)<o:p></o:p></span></font></pre><pre><font
size=2 color=black face="Courier New"><span style='font-size:10.0pt'> {<o:p></o:p></span></font></pre><pre><font
size=2 color=black face="Courier New"><span style='font-size:10.0pt'> R->FailIteration = MSched.Iteration;<o:p></o:p></span></font></pre><pre><font
size=2 color=black face="Courier New"><span style='font-size:10.0pt'> R->FailCount = 0;<o:p></o:p></span></font></pre><pre><font
size=2 color=black face="Courier New"><span style='font-size:10.0pt'> }<o:p></o:p></span></font></pre><pre><font
size=2 color=black face="Courier New"><span style='font-size:10.0pt'><o:p> </o:p></span></font></pre><pre><font
size=2 color=black face="Courier New"><span style='font-size:10.0pt'> R->FailCount++;<o:p></o:p></span></font></pre><pre><font
size=2 color=black face="Courier New"><span style='font-size:10.0pt'><o:p> </o:p></span></font></pre><pre><font
size=2 color=black face="Courier New"><span style='font-size:10.0pt'> return(FAILURE);<o:p></o:p></span></font></pre><pre><font
size=2 color=black face="Courier New"><span style='font-size:10.0pt'>*/<o:p></o:p></span></font></pre><pre><font
size=2 color=black face="Courier New"><span style='font-size:10.0pt'> }<o:p></o:p></span></font></pre><pre><font
size=2 color=black face="Courier New"><span style='font-size:10.0pt'><o:p> </o:p></span></font></pre><pre><font
size=2 color=black face="Courier New"><span style='font-size:10.0pt'> rc = pbs_rlsjob(R->U.PBS.ServerSD,tmpJobName,"s",NULL);<o:p></o:p></span></font></pre><pre><font
size=2 color=black face="Courier New"><span style='font-size:10.0pt'><o:p> </o:p></span></font></pre><pre><font
size=2 color=black face="Courier New"><span style='font-size:10.0pt'> if (rc != 0)<o:p></o:p></span></font></pre><pre><font
size=2 color=black face="Courier New"><span style='font-size:10.0pt'> {<o:p></o:p></span></font></pre><pre><font
size=2 color=black face="Courier New"><span style='font-size:10.0pt'> ErrMsg = pbs_geterrmsg(R->U.PBS.ServerSD);<o:p></o:p></span></font></pre><pre><font
size=2 color=black face="Courier New"><span style='font-size:10.0pt'><o:p> </o:p></span></font></pre><pre><font
size=2 color=black face="Courier New"><span style='font-size:10.0pt'> DBG(0,fPBS) DPrint("ERROR: PBS job '%s' cannot be released from hold (rc: %d '%s')\n",<o:p></o:p></span></font></pre><pre><font
size=2 color=black face="Courier New"><span style='font-size:10.0pt'> J->Name,<o:p></o:p></span></font></pre><pre><font
size=2 color=black face="Courier New"><span style='font-size:10.0pt'> rc,<o:p></o:p></span></font></pre><pre><font
size=2 color=black face="Courier New"><span style='font-size:10.0pt'> ErrMsg);<o:p></o:p></span></font></pre><pre><font
size=2 color=black face="Courier New"><span style='font-size:10.0pt'><o:p> </o:p></span></font></pre><pre><font
size=2 color=black face="Courier New"><span style='font-size:10.0pt'> if (R->FailIteration != MSched.Iteration)<o:p></o:p></span></font></pre><pre><font
size=2 color=black face="Courier New"><span style='font-size:10.0pt'> {<o:p></o:p></span></font></pre><pre><font
size=2 color=black face="Courier New"><span style='font-size:10.0pt'> R->FailIteration = MSched.Iteration;<o:p></o:p></span></font></pre><pre><font
size=2 color=black face="Courier New"><span style='font-size:10.0pt'> R->FailCount = 0;<o:p></o:p></span></font></pre><pre><font
size=2 color=black face="Courier New"><span style='font-size:10.0pt'> }<o:p></o:p></span></font></pre><pre><font
size=2 color=black face="Courier New"><span style='font-size:10.0pt'><o:p> </o:p></span></font></pre><pre><font
size=2 color=black face="Courier New"><span style='font-size:10.0pt'> R->FailCount++;<o:p></o:p></span></font></pre><pre><font
size=2 color=black face="Courier New"><span style='font-size:10.0pt'><o:p> </o:p></span></font></pre><pre><font
size=2 color=black face="Courier New"><span style='font-size:10.0pt'> return(FAILURE);<o:p></o:p></span></font></pre><pre><font
size=2 color=black face="Courier New"><span style='font-size:10.0pt'> }<o:p></o:p></span></font></pre><pre><font
size=2 color=black face="Courier New"><span style='font-size:10.0pt'><o:p> </o:p></span></font></pre><pre><font
size=2 color=black face="Courier New"><span style='font-size:10.0pt'> if (holdtimeout) { return(FAILURE); }<o:p></o:p></span></font></pre><pre><font
size=2 color=black face="Courier New"><span style='font-size:10.0pt'><o:p> </o:p></span></font></pre><pre><font
size=2 color=black face="Courier New"><span style='font-size:10.0pt'> /* NOTE: 'DoTerminateJob' flag not supported */<o:p></o:p></span></font></pre><pre><font
size=2 color=black face="Courier New"><span style='font-size:10.0pt'><o:p> </o:p></span></font></pre><pre><font
size=2 color=black face="Courier New"><span style='font-size:10.0pt'> DBG(7,fPBS) DPrint("INFO: job '%s' checkpointed\n",<o:p></o:p></span></font></pre><pre><font
size=2 color=black face="Courier New"><span style='font-size:10.0pt'> J->Name);<o:p></o:p></span></font></pre><pre><font
size=2 color=black face="Courier New"><span style='font-size:10.0pt'><o:p> </o:p></span></font></pre><pre><font
size=2 color=black face="Courier New"><span style='font-size:10.0pt'> return(SUCCESS);<o:p></o:p></span></font></pre><pre><font
size=2 color=black face="Courier New"><span style='font-size:10.0pt'> } /* END MPBSJobCkpt() */<o:p></o:p></span></font></pre></blockquote>
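<p class=MsoNormal><font size=2 color=navy face=Arial><span style='font-size:10.0pt;font-family:Arial;color:navy'>One possible explanation for the 15016 error, offered as an assumption rather than a confirmed diagnosis: the torque qalter command appears to convert its -a date argument to seconds since the epoch before it sends Execution_Time to the server, so the server may reject a formatted string such as '1622.37'. The sketch below shows a hypothetical helper, format_exec_time (not part of maui or torque), that renders a time_t in that form.<o:p></o:p></span></font></p>
<pre>
```c
#include <assert.h>
#include <stdio.h>
#include <time.h>

/* Hypothetical helper, not part of maui or torque.
 * Writes 'when' into buf as decimal seconds since the epoch, the form this
 * sketch ASSUMES the server accepts for Execution_Time.
 * Returns 0 on success, -1 if buf is too small. */
int format_exec_time(time_t when, char *buf, size_t len)
{
  int n;

  assert(buf != NULL && len > 0);

  n = snprintf(buf, len, "%ld", (long)when);

  return ((n > 0) && ((size_t)n < len)) ? 0 : -1;
}
```
</pre>
<p class=MsoNormal><font size=2 color=navy face=Arial><span style='font-size:10.0pt;font-family:Arial;color:navy'>In MPBSJobCkpt this could replace the strftime call, for example format_exec_time(time(NULL) + 60, Cktmps, sizeof(Cktmps)), with Ckattrib.value left pointing at Cktmps. Whether Ckattrib.resource should also be NULL rather than an empty string is untested here.<o:p></o:p></span></font></p>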
<div>
<p class=MsoNormal><font size=3 color=black face="Times New Roman"><span
style='font-size:12.0pt'><o:p> </o:p></span></font></p>
</div>
</div>
</body>
</html>