<html>
<head>
<meta http-equiv="Content-Type" content="text/html;
charset=ISO-8859-1">
<meta name="Generator" content="Microsoft Word 12 (filtered medium)">
<!--[if !mso]><style>v\:* {behavior:url(#default#VML);}
o\:* {behavior:url(#default#VML);}
w\:* {behavior:url(#default#VML);}
.shape {behavior:url(#default#VML);}
</style><![endif]-->
<style><!--
/* Font Definitions */
@font-face
        {font-family:"Cambria Math";
        panose-1:2 4 5 3 5 4 6 3 2 4;}
@font-face
        {font-family:Calibri;
        panose-1:2 15 5 2 2 2 4 3 2 4;}
@font-face
        {font-family:Tahoma;
        panose-1:2 11 6 4 3 5 4 4 2 4;}
@font-face
        {font-family:Consolas;
        panose-1:2 11 6 9 2 2 4 3 2 4;}
/* Style Definitions */
p.MsoNormal, li.MsoNormal, div.MsoNormal
        {margin:0cm;
        margin-bottom:.0001pt;
        font-size:12.0pt;
        font-family:"Times New Roman","serif";
        color:black;}
a:link, span.MsoHyperlink
        {mso-style-priority:99;
        color:blue;
        text-decoration:underline;}
a:visited, span.MsoHyperlinkFollowed
        {mso-style-priority:99;
        color:purple;
        text-decoration:underline;}
pre
        {mso-style-priority:99;
        mso-style-link:"HTML Preformatted Char";
        margin:0cm;
        margin-bottom:.0001pt;
        font-size:10.0pt;
        font-family:"Courier New";
        color:black;}
span.HTMLPreformattedChar
        {mso-style-name:"HTML Preformatted Char";
        mso-style-priority:99;
        mso-style-link:"HTML Preformatted";
        font-family:Consolas;
        color:black;}
span.EmailStyle19
        {mso-style-type:personal-reply;
        font-family:"Calibri","sans-serif";
        color:#1F497D;}
.MsoChpDefault
        {mso-style-type:export-only;
        font-size:10.0pt;}
@page WordSection1
        {size:612.0pt 792.0pt;
        margin:72.0pt 72.0pt 72.0pt 72.0pt;}
div.WordSection1
        {page:WordSection1;}
--></style><!--[if gte mso 9]><xml>
<o:shapedefaults v:ext="edit" spidmax="1026" />
</xml><![endif]--><!--[if gte mso 9]><xml>
<o:shapelayout v:ext="edit">
<o:idmap v:ext="edit" data="1" />
</o:shapelayout></xml><![endif]-->
</head>
<body text="#000000" bgcolor="#FFFFFF">
Sean (my colleague) and I are still banging our heads against
the wall with this issue.<br>
<br>
We're running Torque 3.0.4 with GPU support enabled, and CUDA 4.0.<br>
<br>
After some testing on my local desktop and a 17-node GPU cluster
(a mixture of Tesla and GTX cards), we've found the following. If the
nodes file contains both gpus= settings and node attributes, and you
change the state of a node (pbsnodes -o / pbsnodes -r) that has both
gpus= and an attribute, then for some unknown reason the nodes file is
rewritten: every gpus= setting is removed, along with any # comments.<br>
Entries that have only a gpus= or only an attribute aren't affected,
only nodes that have both.<br>
<br>
Why is there even code in Torque (specifically pbs_server) that is
capable of writing to the nodes file?<br>
<br>
Some examples (BLAH is just an arbitrary node attribute):<br>
<br>
Test1<br>
-=-=-=-=-=-=-=<br>
nodes file contents:<br>
node1 np=1 gpus=2 BLAH<br>
node2 np=2<br>
<br>
start torque server and mom.<br>
#pbsnodes -r node2 (File doesn't change)<br>
#pbsnodes -r node1 (File changes after command is run, stat on
file confirms this)<br>
<br>
nodes file contents afterwards:<br>
node1 np=1 BLAH<br>
node2 np=2<br>
=-=-=-=-=-=-=<br>
<br>
Test2<br>
-=-=-=-=-=-=-=<br>
nodes file contents:<br>
node1 np=1 gpus=2 BLAH<br>
node2 np=2 gpus=2<br>
<br>
start torque server and mom.<br>
#pbsnodes -r node2 (File doesn't change)<br>
#pbsnodes -r node1 (File changes after command is run, stat on
file confirms this)<br>
<br>
nodes file contents afterwards:<br>
node1 np=1 BLAH<br>
node2 np=2<br>
=-=-=-=-=-=-=<br>
<br>
<div class="moz-forward-container">Test3<br>
-=-=-=-=-=-=-=<br>
nodes file contents:<br>
node1 np=1 BLAH<br>
node2 np=2 gpus=2<br>
<br>
start torque server and mom.<br>
#pbsnodes -r node2 (File changes after command is run, stat on
file confirms this)<br>
<br>
nodes file contents afterwards:<br>
node1 np=1 BLAH<br>
node2 np=2<br>
=-=-=-=-=-=-=<br>
<br>
Test4<br>
-=-=-=-=-=-=-=<br>
nodes file contents:<br>
node1 np=1 gpus=2 <br>
node2 np=2 gpus=2<br>
<br>
start torque server and mom.<br>
#pbsnodes -r node2 (File doesn't change)<br>
#pbsnodes -r node1 (File doesn't change)<br>
<br>
nodes file contents afterwards:<br>
node1 np=1 gpus=2<br>
node2 np=2 gpus=2<br>
=-=-=-=-=-=-=<br>
<br>
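For anyone who wants to check their own system, here is a minimal Python sketch (not part of Torque; the function names parse_nodes and lost_gpus are made up for illustration) that compares a saved copy of the nodes file with the current contents and lists the nodes that lost their gpus= setting. It assumes the simple line layout used in the tests above: a node name followed by space-separated key=value settings and bare attributes.<br>
<pre>
```python
def parse_nodes(text):
    """Parse nodes-file text into {node_name: gpus_value_or_None}.

    Assumes the simple layout used in the tests above: a node name
    followed by space-separated key=value settings and bare attributes.
    Blank lines and # comment lines are skipped.
    """
    nodes = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        name, *tokens = line.split()
        gpus = None
        for token in tokens:
            if token.startswith("gpus="):
                gpus = token.split("=", 1)[1]
        nodes[name] = gpus
    return nodes


def lost_gpus(before_text, after_text):
    """Return sorted names of nodes that had gpus= before but not after."""
    before = parse_nodes(before_text)
    after = parse_nodes(after_text)
    return sorted(
        name for name, gpus in before.items()
        if gpus is not None and after.get(name) is None
    )


if __name__ == "__main__":
    # Contents taken from Test2 above.
    before = "node1 np=1 gpus=2 BLAH\nnode2 np=2 gpus=2\n"
    after = "node1 np=1 BLAH\nnode2 np=2\n"
    print(lost_gpus(before, after))  # ['node1', 'node2']
```
</pre>
To use it, copy the nodes file somewhere safe before running pbsnodes -r, then pass the saved text and the current text to lost_gpus().<br>
<br>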
Regards<br>
Simon Brennan<br>
<br>
<br>
-------- Original Message --------
<table class="moz-email-headers-table" border="0" cellpadding="0"
cellspacing="0">
<tbody>
<tr>
<th nowrap="nowrap" valign="BASELINE" align="RIGHT">Subject:
</th>
<td>Re: [torqueusers] nodes file persistent gpus setting</td>
</tr>
<tr>
<th nowrap="nowrap" valign="BASELINE" align="RIGHT">Date: </th>
<td>Thu, 17 May 2012 15:50:09 +1000</td>
</tr>
<tr>
<th nowrap="nowrap" valign="BASELINE" align="RIGHT">From: </th>
<td><a class="moz-txt-link-rfc2396E" href="mailto:Gareth.Williams@csiro.au">&lt;Gareth.Williams@csiro.au&gt;</a></td>
</tr>
<tr>
<th nowrap="nowrap" valign="BASELINE" align="RIGHT">Reply-To:
</th>
<td>Torque Users Mailing List
<a class="moz-txt-link-rfc2396E" href="mailto:torqueusers@supercluster.org">&lt;torqueusers@supercluster.org&gt;</a></td>
</tr>
<tr>
<th nowrap="nowrap" valign="BASELINE" align="RIGHT">To: </th>
<td><a class="moz-txt-link-rfc2396E" href="mailto:torqueusers@supercluster.org">&lt;torqueusers@supercluster.org&gt;</a></td>
</tr>
</tbody>
</table>
<br>
<br>
<div class="WordSection1">
<p class="MsoNormal"><span
style="font-size:11.0pt;font-family:&quot;Calibri&quot;,&quot;sans-serif&quot;;color:#1F497D">Hi
Sean. Whoa, we are _<i>not</i>_ using the integrated nvidia
gpu support (so far anyway). Perhaps that wasn’t actually
the problem on your system. Are you really sure that solved
the problem and was not just a coincidence? We have nvidia
drivers (on that compute node) but no other nvidia software
on this system.<o:p></o:p></span></p>
<p class="MsoNormal"><span
style="font-size:11.0pt;font-family:&quot;Calibri&quot;,&quot;sans-serif&quot;;color:#1F497D"><o:p> </o:p></span></p>
<p class="MsoNormal"><span
style="font-size:11.0pt;font-family:&quot;Calibri&quot;,&quot;sans-serif&quot;;color:#1F497D">Gareth<o:p></o:p></span></p>
<p class="MsoNormal"><span
style="font-size:11.0pt;font-family:&quot;Calibri&quot;,&quot;sans-serif&quot;;color:#1F497D"><o:p> </o:p></span></p>
<div style="border:none;border-left:solid blue 1.5pt;padding:0cm
0cm 0cm 4.0pt">
<div>
<div style="border:none;border-top:solid #B5C4DF
1.0pt;padding:3.0pt 0cm 0cm 0cm">
<p class="MsoNormal"><b><span
style="font-size:10.0pt;font-family:&quot;Tahoma&quot;,&quot;sans-serif&quot;;color:windowtext"
lang="EN-US">From:</span></b><span
style="font-size:10.0pt;font-family:&quot;Tahoma&quot;,&quot;sans-serif&quot;;color:windowtext"
lang="EN-US"> Sean Reilly
[<a class="moz-txt-link-freetext" href="mailto:sean.reilly@ersa.edu.au">mailto:sean.reilly@ersa.edu.au</a>] <br>
<b>Sent:</b> Thursday, 17 May 2012 12:21 PM<br>
<b>To:</b> Torque Users Mailing List<br>
<b>Subject:</b> Re: [torqueusers] nodes file
persistent gpus setting<o:p></o:p></span></p>
</div>
</div>
<p class="MsoNormal"><o:p> </o:p></p>
<p class="MsoNormal">Hi Gareth<br>
<br>
We saw the same behaviour when we added the tdk-1.285
libraries to the ld.so.conf path on the GPU backend nodes.<br>
<br>
- It is needed on the CPU (non-GPU) nodes.<br>
- But when it is added to the path on the GPU nodes, the PBS_MOM
complains about something missing (*sorry, I can't remember
what it is, but it may have been some nvidia or nvc nvq
type library*).<br>
- Then the PBS_MOM rewrites the nodes file on the server
side,<br>
          *removing the gpus= or truncating the line from
where 'gpus=' is written*.<br>
<br>
This was fixed by commenting out these libs on the GPU
backend nodes.<br>
<br>
/etc/ld.so.conf.d/tdk.conf <br>
#This file was made by puppet, do not edit it directly!<br>
#/opt/shared/tdk/1.285/lib64<br>
#/opt/shared/tdk/1.285/lib<br>
<br>
<br>
Regards<br>
Sean<br>
<br>
<br>
<br>
On 17/05/12 05:56, Ken Nielson wrote: <o:p></o:p></p>
<div>
<p class="MsoNormal">On Sun, Apr 1, 2012 at 7:36 PM, &lt;<a
moz-do-not-send="true"
href="mailto:Gareth.Williams@csiro.au" target="_blank">Gareth.Williams@csiro.au</a>&gt;
wrote:<o:p></o:p></p>
<p class="MsoNormal">Hi,<br>
<br>
Can anyone confirm the following behavior (bug)?<br>
<br>
If you give a node gpus like so:<br>
qmgr -c 'set node gpunode01 gpus = 2'<br>
or in the nodes file<br>
gpunode01 np=12 gpus=2<br>
Then the node has (logical) gpus defined and they can be
scheduled as in:<br>
<a moz-do-not-send="true"
href="http://www.adaptivecomputing.com/resources/docs/torque/3-0-3/1.5nodeconfig.php"
target="_blank">http://www.adaptivecomputing.com/resources/docs/torque/3-0-3/1.5nodeconfig.php</a><br>
(though 1.5.3 doesn't mention specifying both np= and
gpus= which I suspect needs fixing).<br>
<br>
This setup works fine for us until we restart the
pbs_server at which time the gpus disappear (you can see
this in the output of pbsnodes). The nodes file gets
altered to remove the gpus= setting.<br>
<br>
Note that we are using version 3.0.3-snap.xxx and NOT the
integrated nvidia gpu support.<br>
<br>
Does anyone else see the behavior? You don't need
physical gpus to test, just a system you are prepared to
mess with a little including restarting the pbs_server.<br>
<br>
Regards,<br>
<br>
Gareth<o:p></o:p></p>
</div>
<p class="MsoNormal"><br>
Gareth,<br>
<br>
Have you entered a ticket in bugzilla for this?<br>
<br>
Ken<br>
<br>
<o:p></o:p></p>
<pre><o:p> </o:p></pre>
<pre><o:p> </o:p></pre>
<pre>_______________________________________________<o:p></o:p></pre>
<pre>torqueusers mailing list<o:p></o:p></pre>
<pre><a moz-do-not-send="true" href="mailto:torqueusers@supercluster.org">torqueusers@supercluster.org</a><o:p></o:p></pre>
<pre><a moz-do-not-send="true" href="http://www.supercluster.org/mailman/listinfo/torqueusers">http://www.supercluster.org/mailman/listinfo/torqueusers</a><o:p></o:p></pre>
<p class="MsoNormal" style="margin-bottom:12.0pt"><o:p> </o:p></p>
<div>
<p class="MsoNormal">-- <br>
<b><span style="font-size:10.0pt">Sean Reilly</span></b><span
style="font-size:10.0pt"><br>
<br>
Systems Administrator &amp; Applications Support Officer<br>
eResearchSA<br>
Phone : +61 8 8313 8352<br>
Mobile: +61 450 840 246<br>
<br>
<a moz-do-not-send="true"
href="http://www.ersa.edu.au/moving"><span
style="text-decoration:none"><img id="_x0000_i1025"
src="cid:part5.02000503.00040101@ersa.edu.au"
width="380" border="0" height="114"></span></a></span><o:p></o:p></p>
</div>
</div>
</div>
<br>
<br>
</div>
<br>
</body>
</html>