<!DOCTYPE html PUBLIC "-//W3C//DTD HTML 4.01 Transitional//EN">
<html>
<head>
<meta content="text/html;charset=ISO-8859-1" http-equiv="Content-Type">
</head>
<body bgcolor="#ffffff" text="#000000">
<font size="+1"><font face="Times New Roman, Times, serif">Hi Edsall.<br>
<br>
Thank you for responding. I have a few more nodes now, but the
configuration is the same. I am including the diagnose -p output along with other details:<br>
<br>
We have 13 64-core nodes. All nodes have the 'free' feature and a
queue named 'free' as PREEMPTEE so that we can harvest idle cycles when
the nodes are not in use by their owners.<br>
<br>
As user "juser", I load up the 'free' queue (PREEMPTEE) as follows:<br>
<br>
1.hpc.cluster. juser free test 24904 1 63
-- 72:00 R 00:01<br>
2.hpc.cluster. juser free test 29346 1 63
-- 72:00 R 00:01<br>
3.hpc.cluster. juser free test 42900 1 63
-- 72:00 R 00:01<br>
4.hpc.cluster. juser free test 30291 1 63
-- 72:00 R 00:01<br>
5.hpc.cluster. juser free test 26417 1 63
-- 72:00 R 00:01<br>
6.hpc.cluster. juser free test 40206 1 63
-- 72:00 R 00:01<br>
7.hpc.cluster. juser free test 1786 1 63
-- 72:00 R 00:01<br>
8.hpc.cluster. juser free test 62436 1 63
-- 72:00 R 00:01<br>
9.hpc.cluster. juser free test 49087 1 63
-- 72:00 R 00:01<br>
10.hpc.cluster juser free test 45691 1 63
-- 72:00 R 00:01<br>
11.hpc.cluster juser free test 41386 1 63
-- 72:00 R 00:01<br>
12.hpc.cluster juser free test 35204 1 63
-- 72:00 R 00:01<br>
13.hpc.cluster juser free test 51043 1 63
-- 72:00 R 00:01<br>
14.hpc.cluster juser free test 24948 1 1
-- 72:00 R 00:01<br>
15.hpc.cluster juser free test 29390 1 1
-- 72:00 R 00:01<br>
16.hpc.cluster juser free test 42944 1 1
-- 72:00 R 00:01<br>
17.hpc.cluster juser free test 30335 1 1
-- 72:00 R 00:01<br>
18.hpc.cluster juser free test 26461 1 1
-- 72:00 R 00:01<br>
19.hpc.cluster juser free test 40250 1 1
-- 72:00 R 00:01<br>
20.hpc.cluster juser free test 1830 1 1
-- 72:00 R 00:01<br>
21.hpc.cluster juser free test 62480 1 1
-- 72:00 R 00:01<br>
22.hpc.cluster juser free test 49131 1 1
-- 72:00 R 00:01<br>
23.hpc.cluster juser free test 45735 1 1
-- 72:00 R 00:01<br>
24.hpc.cluster juser free test 41430 1 1
-- 72:00 R 00:01<br>
25.hpc.cluster juser free test 35248 1 1
-- 72:00 R 00:01<br>
26.hpc.cluster juser free test 51087 1 1
-- 72:00 R 00:01<br>
27.hpc.cluster juser free test -- 1 1
-- 72:00 Q -- <br>
28.hpc.cluster juser free test -- 1 1
-- 72:00 Q -- <br>
29.hpc.cluster juser free test -- 1 1
-- 72:00 Q -- <br>
30.hpc.cluster juser free test -- 1 1
-- 72:00 Q -- <br>
31.hpc.cluster juser free test -- 1 1
-- 72:00 Q -- <br>
32.hpc.cluster juser free test -- 1 1
-- 72:00 Q -- <br>
33.hpc.cluster juser free test -- 1 1
-- 72:00 Q -- <br>
34.hpc.cluster juser free test -- 1 1
-- 72:00 Q -- <br>
35.hpc.cluster juser free test -- 1 1
-- 72:00 Q -- <br>
<br>
As user "tw", who owns the 'tw' nodes, I run:<br>
<br>
qsub -I -q tw -l nodes=6:ppn=64<br>
<br>
And preemption works as expected:<br>
<br>
1.hpc.cluster. juser free test 24904 1 63
-- 72:00 R 00:02<br>
2.hpc.cluster. juser free test 29346 1 63
-- 72:00 S 00:01<br>
3.hpc.cluster. juser free test 42900 1 63
-- 72:00 S 00:01<br>
4.hpc.cluster. juser free test 30291 1 63
-- 72:00 S 00:01<br>
5.hpc.cluster. juser free test 26417 1 63
-- 72:00 S 00:01<br>
6.hpc.cluster. juser free test 40206 1 63
-- 72:00 S 00:01<br>
7.hpc.cluster. juser free test 1786 1 63
-- 72:00 S 00:01<br>
8.hpc.cluster. juser free test 62436 1 63
-- 72:00 R 00:01<br>
9.hpc.cluster. juser free test 49087 1 63
-- 72:00 R 00:02<br>
10.hpc.cluster juser free test 45691 1 63
-- 72:00 R 00:02<br>
11.hpc.cluster juser free test 41386 1 63
-- 72:00 R 00:02<br>
12.hpc.cluster juser free test 35204 1 63
-- 72:00 R 00:02<br>
13.hpc.cluster juser free test 51043 1 63
-- 72:00 R 00:02<br>
14.hpc.cluster juser free test 24948 1 1
-- 72:00 R 00:02<br>
15.hpc.cluster juser free test 29390 1 1
-- 72:00 S 00:02<br>
16.hpc.cluster juser free test 42944 1 1
-- 72:00 S 00:01<br>
17.hpc.cluster juser free test 30335 1 1
-- 72:00 S 00:01<br>
18.hpc.cluster juser free test 26461 1 1
-- 72:00 S 00:01<br>
19.hpc.cluster juser free test 40250 1 1
-- 72:00 S 00:01<br>
20.hpc.cluster juser free test 1830 1 1
-- 72:00 S 00:01<br>
21.hpc.cluster juser free test 62480 1 1
-- 72:00 R 00:02<br>
22.hpc.cluster juser free test 49131 1 1
-- 72:00 R 00:02<br>
23.hpc.cluster juser free test 45735 1 1
-- 72:00 R 00:02<br>
24.hpc.cluster juser free test 41430 1 1
-- 72:00 R 00:02<br>
25.hpc.cluster juser free test 35248 1 1
-- 72:00 R 00:02<br>
26.hpc.cluster juser free test 51087 1 1
-- 72:00 R 00:02<br>
27.hpc.cluster juser free test -- 1 1
-- 72:00 Q -- <br>
28.hpc.cluster juser free test -- 1 1
-- 72:00 Q -- <br>
29.hpc.cluster juser free test -- 1 1
-- 72:00 Q -- <br>
30.hpc.cluster juser free test -- 1 1
-- 72:00 Q -- <br>
31.hpc.cluster juser free test -- 1 1
-- 72:00 Q -- <br>
32.hpc.cluster juser free test -- 1 1
-- 72:00 Q -- <br>
33.hpc.cluster juser free test -- 1 1
-- 72:00 Q -- <br>
34.hpc.cluster juser free test -- 1 1
-- 72:00 Q -- <br>
35.hpc.cluster juser free test -- 1 1
-- 72:00 Q -- <br>
36.hpc.cluster tw tw STDIN 30505 6
384 -- 99:00 R -- <br>
<br>
As user 'tw', I exit and run the command:<br>
<br>
qsub -I -q tw -l nodes=6:ppn=62<br>
<br>
Everything works again as expected, and Maui also starts 6 new 1-core
jobs (jobs 27 through 32):<br>
<br>
1.hpc.cluster. juser free test 24904 1 63
-- 72:00 R 00:03<br>
2.hpc.cluster. juser free test 29346 1 63
-- 72:00 S 00:01<br>
3.hpc.cluster. juser free test 42900 1 63
-- 72:00 S 00:01<br>
4.hpc.cluster. juser free test 30291 1 63
-- 72:00 S 00:01<br>
5.hpc.cluster. juser free test 26417 1 63
-- 72:00 S 00:01<br>
6.hpc.cluster. juser free test 40206 1 63
-- 72:00 S 00:02<br>
7.hpc.cluster. juser free test 1786 1 63
-- 72:00 S 00:02<br>
8.hpc.cluster. juser free test 62436 1 63
-- 72:00 R 00:03<br>
9.hpc.cluster. juser free test 49087 1 63
-- 72:00 R 00:03<br>
10.hpc.cluster juser free test 45691 1 63
-- 72:00 R 00:03<br>
11.hpc.cluster juser free test 41386 1 63
-- 72:00 R 00:03<br>
12.hpc.cluster juser free test 35204 1 63
-- 72:00 R 00:03<br>
13.hpc.cluster juser free test 51043 1 63
-- 72:00 R 00:03<br>
14.hpc.cluster juser free test 24948 1 1
-- 72:00 R 00:03<br>
15.hpc.cluster juser free test 29390 1 1
-- 72:00 R 00:02<br>
16.hpc.cluster juser free test 42944 1 1
-- 72:00 R 00:02<br>
17.hpc.cluster juser free test 30335 1 1
-- 72:00 R 00:02<br>
18.hpc.cluster juser free test 26461 1 1
-- 72:00 R 00:02<br>
19.hpc.cluster juser free test 40250 1 1
-- 72:00 R 00:02<br>
20.hpc.cluster juser free test 1830 1 1
-- 72:00 R 00:02<br>
21.hpc.cluster juser free test 62480 1 1
-- 72:00 R 00:03<br>
22.hpc.cluster juser free test 49131 1 1
-- 72:00 R 00:03<br>
23.hpc.cluster juser free test 45735 1 1
-- 72:00 R 00:03<br>
24.hpc.cluster juser free test 41430 1 1
-- 72:00 R 00:03<br>
25.hpc.cluster juser free test 35248 1 1
-- 72:00 R 00:03<br>
26.hpc.cluster juser free test 51087 1 1
-- 72:00 R 00:03<br>
27.hpc.cluster juser free test 30749 1 1
-- 72:00 R -- <br>
28.hpc.cluster juser free test 44220 1 1
-- 72:00 R -- <br>
29.hpc.cluster juser free test 31513 1 1
-- 72:00 R -- <br>
30.hpc.cluster juser free test 27736 1 1
-- 72:00 R -- <br>
31.hpc.cluster juser free test 41429 1 1
-- 72:00 R -- <br>
32.hpc.cluster juser free test 3130 1 1
-- 72:00 R -- <br>
33.hpc.cluster juser free test -- 1 1
-- 72:00 Q -- <br>
34.hpc.cluster juser free test -- 1 1
-- 72:00 Q -- <br>
35.hpc.cluster juser free test -- 1 1
-- 72:00 Q -- <br>
37.hpc.cluster tw tw STDIN 30708 6
372 -- 99:00 R -- <br>
<br>
However, if I now exit and go back and try to get 6 of the 64-core
nodes (which worked before) I cannot. Maui will not preempt the new
jobs it started.<br>
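<br>
To make the core counts explicit, here is a small Python sketch of the per-node arithmetic. It assumes the two 1-core jobs per node are what fills the gap left by ppn=62; that is an inference from the qstat listings above, and the names used are only illustrative:<br>
<pre>
```python
# Figures taken from this message: 6 'tw' nodes with 64 cores each.
CORES_PER_NODE = 64
TW_NODES = 6


def leftover_cores(ppn):
    """Cores per node that remain available to other jobs for a given ppn."""
    return CORES_PER_NODE - ppn


for ppn in (64, 62):
    total = TW_NODES * ppn
    spare = leftover_cores(ppn)
    print(f"ppn={ppn}: {total} tasks, {spare} spare core(s) per node")
```
</pre>
With ppn=62 there is room for two 1-core jobs per node, so a later ppn=64 request needs those jobs preempted as well.<br>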
<br>
My new job 38 below just sits in the queue:<br>
<br>
$ qsub -I -q tw -l nodes=6:ppn=64<br>
qsub: waiting for job 38.hpc.cluster to start<br>
<br>
# diagnose -p <br>
diagnosing job priority information (partition: ALL)<br>
<br>
Job PRIORITY* Cred( QOS) Serv(QTime)<br>
Weights -------- 100( 1000) 1( 1)<br>
<br>
38 100000109 100.0(1000.) 0.0(109.4)<br>
2 5 0.0( 0.0) 100.0( 5.3)<br>
3 5 0.0( 0.0) 100.0( 5.3)<br>
4 5 0.0( 0.0) 100.0( 5.3)<br>
5 5 0.0( 0.0) 100.0( 5.3)<br>
6 5 0.0( 0.0) 100.0( 5.3)<br>
7 5 0.0( 0.0) 100.0( 5.3)<br>
<br>
Percent Contribution -------- 100.0(100.0) 0.0( 0.0)<br>
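<br>
For reference, the priorities above are consistent with a plain weighted sum of the two components shown in the header. The Python sketch below reproduces the numbers for job 38 and for the suspended jobs. The formula and the name job_priority are assumptions inferred from this output only, not taken from the Maui source:<br>
<pre>
```python
# Weights as printed in the 'Weights' row of diagnose -p.
CRED_WEIGHT = 100    # Cred component
QOS_WEIGHT = 1000    # QOS subcomponent
SERV_WEIGHT = 1      # Serv component
QTIME_WEIGHT = 1     # QTime subcomponent


def job_priority(qos_priority, qtime):
    """Assumed formula: sum of component weight * subcomponent weight * value."""
    cred = CRED_WEIGHT * QOS_WEIGHT * qos_priority
    serv = SERV_WEIGHT * QTIME_WEIGHT * qtime
    return int(cred + serv)


# Values in parentheses from the diagnose -p rows above.
preemptor = job_priority(qos_priority=1000, qtime=109.4)  # job 38
preemptee = job_priority(qos_priority=0, qtime=5.3)       # jobs 2 through 7
print(preemptor, preemptee)
```
</pre>
So the tw job already outranks the 'free' jobs by a wide margin on priority alone.<br>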
<br>
[root@mpc-x maui]# checkjob -v 38<br>
<br>
<br>
checking job 38 (RM job '38.hpc.cluster')<br>
<br>
State: Idle<br>
Creds: user:tw group:tw class:tw qos:high<br>
WallTime: 00:00:00 of 4:03:00:00<br>
SubmitTime: Thu Feb 16 08:26:31<br>
(Time Queued Total: 00:01:37 Eligible: 00:01:37)<br>
<br>
Total Tasks: 384<br>
<br>
Req[0] TaskCount: 384 Partition: ALL<br>
Network: [NONE] Memory >= 0 Disk >= 0 Swap >= 0<br>
Opsys: [NONE] Arch: [NONE] Features: [tw]<br>
Exec: '' ExecSize: 0 ImageSize: 0<br>
Dedicated Resources Per Task: PROCS: 1<br>
NodeAccess: SHARED<br>
TasksPerNode: 64 NodeCount: 6<br>
<br>
<br>
IWD: [NONE] Executable: [NONE]<br>
Bypass: 0 StartCount: 0<br>
PartitionMask: [ALL]<br>
Flags: PREEMPTOR<br>
<br>
Reservation '38' (2:23:58:22 -> 7:02:58:22 Duration: 4:03:00:00)<br>
PE: 384.00 StartPriority: 100000163<br>
job cannot run in partition DEFAULT (idle procs do not meet
requirements : 0 of 384 procs found)<br>
idle procs: 384 feasible procs: 0<br>
<br>
Rejection Reasons: [Features : 7][CPU : 6]<br>
<br>
Detailed Node Availability Information:<br>
<br>
compute-1-1 rejected : Features<br>
compute-1-2 rejected : Features<br>
compute-1-3 rejected : Features<br>
compute-1-4 rejected : Features<br>
compute-1-5 rejected : Features<br>
compute-1-6 rejected : Features<br>
compute-1-7 rejected : CPU<br>
compute-1-8 rejected : CPU<br>
compute-1-9 rejected : CPU<br>
compute-1-10 rejected : CPU<br>
compute-1-11 rejected : CPU<br>
compute-1-12 rejected : CPU<br>
compute-1-13 rejected : Features<br>
<br>
-------------------------------------------------------<br>
Here is my PBS nodes file:<br>
<br>
# cat /opt/torque/server_priv/nodes <br>
compute-1-1 np=64 sf free<br>
compute-1-2 np=64 sf free<br>
compute-1-3 np=64 sf free<br>
compute-1-4 np=64 chem free<br>
compute-1-5 np=64 chem free<br>
compute-1-6 np=64 chem free<br>
compute-1-7 np=64 tw free<br>
compute-1-8 np=64 tw free<br>
compute-1-9 np=64 tw free<br>
compute-1-10 np=64 tw free<br>
compute-1-11 np=64 tw free<br>
compute-1-12 np=64 tw free<br>
compute-1-13 np=64 bio free<br>
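<br>
As a cross-check, the rejection counts reported by checkjob line up with the features in this nodes file. The Python sketch below assumes that a node lacking the 'tw' feature is rejected for Features and a 'tw' node with no idle cores is rejected for CPU; the helper names are hypothetical:<br>
<pre>
```python
# Contents of the nodes file as listed in this message.
NODES_FILE = """\
compute-1-1 np=64 sf free
compute-1-2 np=64 sf free
compute-1-3 np=64 sf free
compute-1-4 np=64 chem free
compute-1-5 np=64 chem free
compute-1-6 np=64 chem free
compute-1-7 np=64 tw free
compute-1-8 np=64 tw free
compute-1-9 np=64 tw free
compute-1-10 np=64 tw free
compute-1-11 np=64 tw free
compute-1-12 np=64 tw free
compute-1-13 np=64 bio free
"""


def parse_nodes(text):
    """Map node name to its core count and set of features."""
    nodes = {}
    for line in text.splitlines():
        fields = line.split()
        if not fields:
            continue
        name, np_field, *features = fields
        nodes[name] = {"np": int(np_field.split("=")[1]),
                       "features": set(features)}
    return nodes


def expected_rejection(node, feature):
    """Assumed rule: wrong feature gives 'Features', busy matching node gives 'CPU'."""
    return "CPU" if feature in node["features"] else "Features"


nodes = parse_nodes(NODES_FILE)
summary = {}
for name, node in nodes.items():
    reason = expected_rejection(node, "tw")
    summary[reason] = summary.get(reason, 0) + 1
print(summary)
```
</pre>
This gives 7 Features and 6 CPU rejections, the same totals as the "Rejection Reasons" line, so the six tw nodes are the only candidates and all of them are blocked on CPU.<br>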
<br>
------------------------------------<br>
<br>
</font></font><br>
Edsall, William (WJ) wrote:
<blockquote
cite="mid:52CD990A674498429E6A7B4FCAE3F7D308332EC3@USMDLMDOWX025.dow.com"
type="cite">
<div class="WordSection1">
<p class="MsoNormal"><span
style="font-size: 11pt; font-family: 'Calibri','sans-serif'; color: rgb(31, 73, 125);">Hi,<o:p></o:p></span></p>
<p class="MsoNormal"><span
style="font-size: 11pt; font-family: 'Calibri','sans-serif'; color: rgb(31, 73, 125);">
What does diagnose -p say about the priority of the jobs you expect to
be preempted? Priority may take precedence over preemptability. <o:p></o:p></span></p>
<p class="MsoNormal"><span
style="font-size: 11pt; font-family: 'Calibri','sans-serif'; color: rgb(31, 73, 125);"><o:p> </o:p></span></p>
<div>
<div
style="border-style: solid none none; border-color: rgb(181, 196, 223) -moz-use-text-color -moz-use-text-color; border-width: 1pt medium medium; padding: 3pt 0in 0in;">
<p class="MsoNormal"><b><span
style="font-size: 10pt; font-family: 'Tahoma','sans-serif'; color: windowtext;">From:</span></b><span
style="font-size: 10pt; font-family: 'Tahoma','sans-serif'; color: windowtext;">
<a class="moz-txt-link-abbreviated" href="mailto:mauiusers-bounces@supercluster.org">mauiusers-bounces@supercluster.org</a>
[<a class="moz-txt-link-freetext" href="mailto:mauiusers-bounces@supercluster.org">mailto:mauiusers-bounces@supercluster.org</a>] <b>On Behalf Of </b>Joseph
Farran<br>
<b>Sent:</b> Monday, February 13, 2012 3:19 PM<br>
<b>To:</b> <a class="moz-txt-link-abbreviated" href="mailto:mauiusers@supercluster.org">mauiusers@supercluster.org</a><br>
<b>Subject:</b> [Mauiusers] Backfill Jobs prevent PREEMPTOR jobs from
running?<o:p></o:p></span></p>
</div>
</div>
</div>
</blockquote>
</body>
</html>