<html><head></head><body style="word-wrap: break-word; -webkit-nbsp-mode: space; -webkit-line-break: after-white-space; "><div>Thanks,</div><div><br></div><div>No recurring reservations at all. The reservation policy is already set that way:</div><div><br></div><div>RESERVATIONPOLICY &nbsp; &nbsp; CURRENTHIGHEST</div><div><br></div><div>I have been having a dickens of a time figuring out the best policy for our cluster: lots of long jobs, some small, some large.</div><div><br></div><div apple-content-edited="true">
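<div>For what it's worth, here is the arithmetic behind the proc counts in the thread below, as a quick Python sketch. It assumes all 314 nodes are identical (3768 / 314 = 12 cores each), that Moab's "procs" means cores, and that the "ded" QOS gives a job whole nodes; the constant and function names are only illustrative.</div><pre>
```python
import math

# Figures from the diagnose -t and checkjob output quoted in this thread.
CONFIGURED_PROCS = 3768  # PROC "Configured" column
CONFIGURED_NODES = 314   # NODE "Configured" column
REQUESTED_TASKS = 1501   # tasks requested by job 220559
SOCKET_PROCS = 314 * 2   # the "628 procs" figure (2 processors per node)


def cores_per_node(procs: int, nodes: int) -> int:
    """Cores per node, assuming every node is identical."""
    if procs % nodes != 0:
        raise ValueError("procs do not divide evenly across nodes")
    return procs // nodes


def dedicated_nodes_needed(tasks: int, cores: int) -> int:
    """Whole nodes a job occupies when no other job may share its nodes."""
    return math.ceil(tasks / cores)


cores = cores_per_node(CONFIGURED_PROCS, CONFIGURED_NODES)
nodes = dedicated_nodes_needed(REQUESTED_TASKS, cores)
print(f"{cores} cores/node, {nodes} nodes, {nodes * cores} cores held")
# -> 12 cores/node, 126 nodes, 1512 cores held

for idle in (1056, 744):  # idle-proc counts reported by checkjob
    # Is the count larger than the 628-processor total, and a multiple of 12?
    print(idle, idle > SOCKET_PROCS, idle % cores == 0)
# -> 1056 True True
#    744 True True
```
</pre><div>Under those assumptions, job 220559 needs 126 whole nodes (1512 cores) free at the same time. The 744 and 1056 idle-proc figures from checkjob are both above 628, so they cannot be counting two processors per node, and both are whole multiples of 12, which fits counting cores.</div><div><br></div>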
<span class="Apple-style-span" style="border-collapse: separate; color: rgb(0, 0, 0); font-family: Helvetica; font-style: normal; font-variant: normal; font-weight: normal; letter-spacing: normal; line-height: normal; orphans: 2; text-align: -webkit-auto; text-indent: 0px; text-transform: none; white-space: normal; widows: 2; word-spacing: 0px; -webkit-border-horizontal-spacing: 0px; -webkit-border-vertical-spacing: 0px; -webkit-text-decorations-in-effect: none; -webkit-text-size-adjust: auto; -webkit-text-stroke-width: 0px; font-size: medium; "><div>Naveed Near-Ansari</div><div><br></div><div><br></div></span><br class="Apple-interchange-newline">
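<div>Reading the diagnose -t output below, the percentage columns appear to be U/C = Up/Configured, D/U = Dedicated/Up and A/U = Active/Up. That reading is inferred from the numbers rather than taken from the Moab docs; this small Python check reproduces the PROC and NODE rows (pct is just an illustrative helper):</div><pre>
```python
# PROC row of the diagnose -t output quoted in this thread.
configured, up, dedicated, active = 3768, 3756, 3564, 3000


def pct(part: int, whole: int) -> str:
    """Ratio formatted like diagnose -t: two decimals and a percent sign."""
    return f"{100 * part / whole:.2f}%"


print("U/C", pct(up, configured))     # up / configured -> 99.68%
print("D/U", pct(dedicated, up))      # dedicated / up  -> 94.89%
print("A/U", pct(active, up))         # active / up     -> 79.87%
print("idle procs:", up - dedicated)  # -> 192

# Same ratios on the NODE row (313 up of 314, 297 dedicated).
print(pct(313, 314), pct(297, 313))   # -> 99.68% 94.89%
```
</pre><div>On that reading, only 3756 - 3564 = 192 procs (313 - 297 = 16 nodes) were not dedicated when that snapshot was taken.</div><div><br></div>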
</div>
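<div>On the question in the thread of whether the scheduler searches a fixed window or looks out indefinitely, here is a toy Python model of an earliest-start search. It is not Moab's algorithm: it assumes dedicated whole nodes, treats running jobs and standing reservations the same way, and the Block type, the earliest_start function, the horizon cutoff and the 250-node long job are all hypothetical. It only shows how a look-ahead cutoff, or a recurring reservation, can keep a 126-node, 4-day job from being placed even though enough nodes free up later.</div><pre>
```python
from dataclasses import dataclass
from typing import List, Optional

DAY = 24 * 3600  # seconds


@dataclass
class Block:
    """Nodes held from `start` to `end` (seconds from now). Hypothetical model."""
    start: int
    end: int
    nodes: int


def free_nodes(t: int, total: int, blocks: List[Block]) -> int:
    """Nodes not held by any block at time t."""
    return total - sum(b.nodes for b in blocks if b.start <= t < b.end)


def earliest_start(total: int, blocks: List[Block], need: int, walltime: int,
                   horizon: Optional[int] = None) -> Optional[int]:
    """Earliest t >= 0 at which `need` nodes stay free for the whole walltime.

    Free capacity only rises when a block ends, so the earliest fit is always
    at t = 0 or at some block's end time. Inside a window, capacity only drops
    where a block starts, so those are the only points that need checking.
    Returns None if nothing fits, or if the first fit lies beyond `horizon`.
    """
    for t in sorted({0} | {b.end for b in blocks}):
        if horizon is not None and t > horizon:
            return None
        window_end = t + walltime
        checkpoints = {t} | {b.start for b in blocks if t < b.start < window_end}
        if all(free_nodes(c, total, blocks) >= need for c in checkpoints):
            return t
    return None


TOTAL_UP_NODES = 313  # "Up" nodes in the diagnose -t output
NEED = 126            # 1501 tasks on 12-core dedicated nodes, rounded up
WALLTIME = 4 * DAY

# Hypothetical: long-running jobs hold 250 nodes for the next 32 days.
running = [Block(0, 32 * DAY, 250)]
print(earliest_start(TOTAL_UP_NODES, running, NEED, WALLTIME) / DAY)  # 32.0
print(earliest_start(TOTAL_UP_NODES, running, NEED, WALLTIME, horizon=14 * DAY))  # None

# Hypothetical 8-hour full-system maintenance reservation on day 35.
maint = [Block(35 * DAY, 35 * DAY + 8 * 3600, TOTAL_UP_NODES)]
start = earliest_start(TOTAL_UP_NODES, running + maint, NEED, WALLTIME)
print(f"{start / DAY:.2f} days")  # 35.33 days
```
</pre><div>In this toy model the job fits on day 32 with no cutoff, is reported as unplaceable with a 14-day cutoff, and gets pushed past the day-35 window when that reservation exists, because a 4-day run starting on day 32 would overlap it.</div><div><br></div>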
<br><div><div>On Apr 20, 2012, at 7:41 PM, Lyn Gerner wrote:</div><br class="Apple-interchange-newline"><blockquote type="cite"><div>Yes, the idle nodes do fluctuate. &nbsp;If you have any recurring<br>reservations (say, for a weekly maintenance window), then it may not<br>be able to find a big enough window to run a large 4-day job on<br>dedicated nodes.<br><br>You might also want to check whether RESERVATIONPOLICY is set to<br>HIGHEST, so that the job keeps its priority reservation if<br>it ever gets to the top of the queue.<br><br>Good luck,<br>Lyn<br><br>On 4/20/12, Naveed Near-Ansari &lt;<a href="mailto:naveed@caltech.edu">naveed@caltech.edu</a>&gt; wrote:<br><blockquote type="cite">Thanks.<br></blockquote><blockquote type="cite"><br></blockquote><blockquote type="cite">The idle procs actually fluctuate:<br></blockquote><blockquote type="cite"><br></blockquote><blockquote type="cite">job cannot run in partition DEFAULT (insufficient idle procs available: 744 &lt; 1501)<br></blockquote><blockquote type="cite"><br></blockquote><blockquote type="cite">I don't think it is mapping to procs, since there are 628 procs on the system<br></blockquote><blockquote type="cite">(314 nodes &nbsp;* 2 procs).<br></blockquote><blockquote type="cite"><br></blockquote><blockquote type="cite">The QOS does request dedicated nodes. I have seen no issue with this on any<br></blockquote><blockquote type="cite">other jobs. When someone requests 12 tasks, they get one 12-core machine.<br></blockquote><blockquote type="cite"><br></blockquote><blockquote type="cite">I think I may be misunderstanding how priority reservations work. 
&nbsp;Does it<br></blockquote><blockquote type="cite">try to find nodes to reserve within a fixed time frame (in which case no procs may<br></blockquote><blockquote type="cite">be available within that window), or is it supposed to look out indefinitely<br></blockquote><blockquote type="cite">until it finds the procs? &nbsp;We have a lot of long-running processes, so if<br></blockquote><blockquote type="cite">it is looking within a time frame (say a month), it may not be able to find<br></blockquote><blockquote type="cite">the resources. &nbsp;If this is the case, is it possible to change how far ahead<br></blockquote><blockquote type="cite">it looks?<br></blockquote><blockquote type="cite"><br></blockquote><blockquote type="cite">I couldn't find anything in the documentation that describes specifically<br></blockquote><blockquote type="cite">how it finds resources for priority-based reservations.<br></blockquote><blockquote type="cite"><br></blockquote><blockquote type="cite">Naveed Near-Ansari<br></blockquote><blockquote type="cite"><br></blockquote><blockquote type="cite"><br></blockquote><blockquote type="cite">On Apr 20, 2012, at 5:50 PM, Lyn Gerner wrote:<br></blockquote><blockquote type="cite"><br></blockquote><blockquote type="cite"><blockquote type="cite">So does the checkjob for 220559 still show the "insufficient idle<br></blockquote></blockquote><blockquote type="cite"><blockquote type="cite">procs available: 1056 &lt; 1501" msg?<br></blockquote></blockquote><blockquote type="cite"><blockquote type="cite"><br></blockquote></blockquote><blockquote type="cite"><blockquote type="cite">It seems like somehow the TASKS request is not mapping to cores (of which<br></blockquote></blockquote><blockquote type="cite"><blockquote type="cite">I surmise you have 3576) but rather to procs (of which, in the above, you have<br></blockquote></blockquote><blockquote type="cite"><blockquote type="cite">1056).<br></blockquote></blockquote><blockquote type="cite"><blockquote 
type="cite"><br></blockquote></blockquote><blockquote type="cite"><blockquote type="cite">I am really grasping at straws on this: is the "ded" QOS requesting<br></blockquote></blockquote><blockquote type="cite"><blockquote type="cite">dedicated nodes, and you don't have enough?<br></blockquote></blockquote><blockquote type="cite"><blockquote type="cite"><br></blockquote></blockquote><blockquote type="cite"><blockquote type="cite">Not sure where else to tell you to look.<br></blockquote></blockquote><blockquote type="cite"><blockquote type="cite"><br></blockquote></blockquote><blockquote type="cite"><blockquote type="cite">Best of luck,<br></blockquote></blockquote><blockquote type="cite"><blockquote type="cite">Lyn<br></blockquote></blockquote><blockquote type="cite"><blockquote type="cite"><br></blockquote></blockquote><blockquote type="cite"><blockquote type="cite">On 4/20/12, Naveed Near-Ansari &lt;<a href="mailto:naveed@caltech.edu">naveed@caltech.edu</a>&gt; wrote:<br></blockquote></blockquote><blockquote type="cite"><blockquote type="cite"><blockquote type="cite"><br></blockquote></blockquote></blockquote><blockquote type="cite"><blockquote type="cite"><blockquote type="cite"><br></blockquote></blockquote></blockquote><blockquote type="cite"><blockquote type="cite"><blockquote type="cite">On 04/20/2012 04:23 PM, Lyn Gerner wrote:<br></blockquote></blockquote></blockquote><blockquote type="cite"><blockquote type="cite"><blockquote type="cite"><blockquote type="cite">Naveed,<br></blockquote></blockquote></blockquote></blockquote><blockquote type="cite"><blockquote type="cite"><blockquote type="cite"><blockquote type="cite"><br></blockquote></blockquote></blockquote></blockquote><blockquote type="cite"><blockquote type="cite"><blockquote type="cite"><blockquote type="cite">It looks like your setup is only showing 1056 procs, not 3552:<br></blockquote></blockquote></blockquote></blockquote><blockquote type="cite"><blockquote type="cite"><blockquote 
type="cite"><blockquote type="cite"><blockquote type="cite">PE: &nbsp;1501.00 &nbsp;StartPriority: &nbsp;144235<br></blockquote></blockquote></blockquote></blockquote></blockquote><blockquote type="cite"><blockquote type="cite"><blockquote type="cite"><blockquote type="cite"><blockquote type="cite">job cannot run in partition DEFAULT (insufficient idle procs available:<br></blockquote></blockquote></blockquote></blockquote></blockquote><blockquote type="cite"><blockquote type="cite"><blockquote type="cite"><blockquote type="cite"><blockquote type="cite">1056 &lt; 1501)<br></blockquote></blockquote></blockquote></blockquote></blockquote><blockquote type="cite"><blockquote type="cite"><blockquote type="cite"><blockquote type="cite">You might play w/diagnose -t (partition) and diagnose -j (job) to see<br></blockquote></blockquote></blockquote></blockquote><blockquote type="cite"><blockquote type="cite"><blockquote type="cite"><blockquote type="cite">what they tell you. &nbsp;Also, you could try to explicitly make a<br></blockquote></blockquote></blockquote></blockquote><blockquote type="cite"><blockquote type="cite"><blockquote type="cite"><blockquote type="cite">reservation for the job, and maybe then you could get info from<br></blockquote></blockquote></blockquote></blockquote><blockquote type="cite"><blockquote type="cite"><blockquote type="cite"><blockquote type="cite">diagnose -r (though attempting the setres may give enough error info).<br></blockquote></blockquote></blockquote></blockquote><blockquote type="cite"><blockquote type="cite"><blockquote type="cite"><blockquote type="cite"><br></blockquote></blockquote></blockquote></blockquote><blockquote type="cite"><blockquote type="cite"><blockquote type="cite"><blockquote type="cite">Good luck,<br></blockquote></blockquote></blockquote></blockquote><blockquote type="cite"><blockquote type="cite"><blockquote type="cite"><blockquote 
type="cite">Lyn<br></blockquote></blockquote></blockquote></blockquote><blockquote type="cite"><blockquote type="cite"><blockquote type="cite"><br></blockquote></blockquote></blockquote><blockquote type="cite"><blockquote type="cite"><blockquote type="cite">Thanks for looking.<br></blockquote></blockquote></blockquote><blockquote type="cite"><blockquote type="cite"><blockquote type="cite"><br></blockquote></blockquote></blockquote><blockquote type="cite"><blockquote type="cite"><blockquote type="cite">I think it is configured for 3768 (I said 3552 because the queue it was<br></blockquote></blockquote></blockquote><blockquote type="cite"><blockquote type="cite"><blockquote type="cite">sent to has that many available to it). I didn't see anything clear in<br></blockquote></blockquote></blockquote><blockquote type="cite"><blockquote type="cite"><blockquote type="cite">either diagnose command. &nbsp;I attempted to create a reservation, but it<br></blockquote></blockquote></blockquote><blockquote type="cite"><blockquote type="cite"><blockquote type="cite">failed.<br></blockquote></blockquote></blockquote><blockquote type="cite"><blockquote type="cite"><blockquote type="cite"><br></blockquote></blockquote></blockquote><blockquote type="cite"><blockquote type="cite"><blockquote type="cite"><pre># setres -u ortega -d 4:00:00:00 TASKS==1501
ERROR:    'setres' failed
ERROR:    cannot select 1501 tasks for reservation for 3:13:33:56
ERROR:    cannot select requested tasks for 'TASKS==1501'</pre></blockquote></blockquote></blockquote><blockquote type="cite"><blockquote type="cite"><blockquote 
type="cite"><br></blockquote></blockquote></blockquote><blockquote type="cite"><blockquote type="cite"><blockquote type="cite"><pre># diagnose -t
Displaying Partition Status

System Partition Settings:  PList: DEFAULT PDef: DEFAULT

Name                    Procs

DEFAULT                  3768

Partition Configured         Up     U/C  Dedicated     D/U     Active     A/U

NODE----------------------------------------------------------------------------
DEFAULT          314        313  99.68%        297  94.89%        297  94.89%
PROC----------------------------------------------------------------------------
DEFAULT         3768       3756  99.68%       3564  94.89%       3000  79.87%
MEM----------------------------------------------------------------------------
DEFAULT     15156264   15107978  99.68%   14335282  94.89%          0   0.00%
SWAP----------------------------------------------------------------------------
DEFAULT     30227950   30131665  99.68%   28590985  94.89%    1400704   4.65%
DISK----------------------------------------------------------------------------
DEFAULT          314        313  99.68%        297  94.89%          0   0.00%

Class/Queue State

            [&lt;CLASS&gt; &lt;AVAIL&gt;:&lt;UP&gt;]...

    DEFAULT [shared 3756:3756][debug 3756:3756][default 477:3756][gpu 3756:3756]</pre></blockquote></blockquote></blockquote><blockquote type="cite"><blockquote type="cite"><blockquote type="cite"><br></blockquote></blockquote></blockquote><blockquote type="cite"><blockquote type="cite"><blockquote type="cite"><pre># diagnose -j 220559
Name    State  Par  Proc  QOS  WCLimit     R  Min   User    Group   Account  QueuedTime  Network  Opsys   Arch    Mem  Disk  Procs  Class        Features

220559  Idle   ALL  1501  ded  4:00:00:00  0  1501  ortega  simons  -        1:23:34:41  [NONE]   [NONE]  [NONE]  &gt;=0  &gt;=0   NC0    [default:1]  [default]</pre></blockquote></blockquote></blockquote><br></div></blockquote></div><br></body></html>