<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.01 Transitional//EN">
<html>
<head>
<meta content="text/html; charset=UTF-8" http-equiv="Content-Type">
</head>
<body text="#000000" bgcolor="#ffffff">
Hi folks,<br>
<br>
Here at eRSA we have just begun testing with this patch - great
work by Jonathan Michalon!<br>
So far it is doing everything we need.<br>
<br>
Roland,<br>
you still need to lock the assigned GPUs down
to a particular user and job ID on the backend nodes
(we do this with cuda_wrapper),<br>
so there is no real need for Maui to specify the particular GPU,
e.g. gpu/2 or gpu_2<br>
(apart from it being a bit cleaner).<br>
<br>
We use both the Torque and Maui directives:<br>
Torque: #PBS -l gpus=1<br>
Maui: #PBS -W x=GRES:gpu@1<br>
<br>
The Maui-side directive plus the patch makes sure the requested number
of GPUs is actually available.<br>
The Torque directive gives you the environment variable
PBS_RESOURCE_GRES=gpus=1<br>
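<br>
As a rough sketch of how a prologue helper could read that variable
(Python here; the function name is made up, and treating the value as a
comma-separated list of name=count pairs is my assumption beyond the
gpus=1 form shown above):<br>
<pre wrap="">
```python
import os


def requested_gpus(environ=None):
    """Return the GPU count from PBS_RESOURCE_GRES, or 0 if absent or malformed."""
    env = os.environ if environ is None else environ
    value = env.get("PBS_RESOURCE_GRES", "")
    # Assumption: entries look like "gpus=1"; anything else is ignored.
    for item in value.split(","):
        name, sep, count = item.partition("=")
        if name.strip() == "gpus" and sep and count.strip().isdigit():
            return int(count)
    return 0
```
</pre>
With PBS_RESOURCE_GRES=gpus=1 this returns 1; a missing or malformed
value gives 0.<br>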
<br>
The prologue script is responsible for assigning an available
GPU to this user and job ID, via wrapper_init.<br>
When the job finishes or is killed, the epilogue releases the GPU
back into the pool, via wrapper_terminate.<br>
<br>
<br>
These two scripts should know which GPUs are available and which are
in use at any time.<br>
As Maui has already checked availability, a free GPU should always be
there. *If not, the prologue and epilogue can send the admins an error
email so it can be checked.*<br>
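<br>
The bookkeeping the prologue and epilogue need could look roughly like
the sketch below. This is not how cuda_wrapper's wrapper_init and
wrapper_terminate work internally; the state file, the function names
and the JSON layout are all hypothetical, and file locking is left out
for brevity.<br>
<pre wrap="">
```python
import json
import os


def _load(path):
    """Read the device-to-owner map, or return an empty map if no state exists yet."""
    if not os.path.exists(path):
        return {}
    with open(path) as fh:
        return json.load(fh)


def _save(path, owners):
    """Write the device-to-owner map back to the state file."""
    with open(path, "w") as fh:
        json.dump(owners, fh)


def assign_gpu(path, total_gpus, user, job_id):
    """Prologue step: claim the lowest free device for (user, job_id)."""
    owners = _load(path)
    for device in range(total_gpus):
        if str(device) not in owners:
            owners[str(device)] = {"user": user, "job": job_id}
            _save(path, owners)
            return device
    # Maui should have prevented this; the caller can alert the admins.
    raise RuntimeError("no free GPU for job %s" % job_id)


def release_gpus(path, job_id):
    """Epilogue step: return every device held by job_id to the pool."""
    owners = _load(path)
    freed = sorted(int(d) for d, o in owners.items() if o["job"] == job_id)
    for device in freed:
        del owners[str(device)]
    _save(path, owners)
    return freed
```
</pre>
A prologue would call assign_gpu() and mail the admins if it raises;
the epilogue would call release_gpus() with the same job ID.<br>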
<br>
It's still early days for us, but so far so good.<br>
<br>
But yes, it would be nice if Maui could tell the backend nodes
about the number of GPUs assigned (and possibly the device number),
eliminating the need for the extra #PBS -l gpus=1 setting.<br>
But not a show stopper.<br>
<br>
<br>
Regards<br>
Sean<br>
<br>
<br>
On 06/03/12 04:36, <a class="moz-txt-link-abbreviated" href="mailto:rf@q-leap.de">rf@q-leap.de</a> wrote:
<blockquote cite="mid:20309.169.908792.976630@gargle.gargle.HOWL"
type="cite">
<blockquote type="cite">
<blockquote type="cite">
<blockquote type="cite">
<blockquote type="cite">
<blockquote type="cite">
<pre wrap="">"Jonathan" == Jonathan Michalon <a class="moz-txt-link-rfc2396E" href="mailto:jonathan.michalon@etu.unistra.fr"><jonathan.michalon@etu.unistra.fr></a> writes:
</pre>
</blockquote>
</blockquote>
</blockquote>
</blockquote>
</blockquote>
<pre wrap="">
Hi Jonathan,
while your patch adds some functionality to count allocated GPUs as
a GRES, it lacks the important functionality to tell the job which GPUs
are available for it. If latest torque 2.5.x is built with GPU support,
you have the option to specify a nodes spec like "-l nodes=1:gpus=1" and
within the running job you can check $GPUFILE what GPUs you're
allocated. Now the problem is that a job with a "-l nodes=1:gpus=1"
specification won't be started with maui even if it has your patch. On
the other hand, using your "-W x=GRES:gpu@1" spec (without a "-l
nodes=1:gpus=1" spec) makes the job run, but
it doesn't have an idea which GPU to use. Is there an easy way to extend
your patch, so that maui will make a job run with the "-l
nodes=1:gpus=1" spec?
Cheers,
Roland
Jonathan> Hi Maui folks, GPUs in Maui are a long standing
Jonathan> problem. Last year a patch was sent by Mariusz Mamoński
Jonathan> [1], which works based on GRES parameters. I've just made
Jonathan> GPUs kind of working, by enhancing that patch. Please find
Jonathan> attached the resulting patch, which works well for Maui
Jonathan> 3.3.1. It defines a special GRES named "gpu" which works
Jonathan> as expected on my test cases.
Jonathan> Note that GRES behaviour seems quite confused as sometimes
Jonathan> they are mentioned as consumable. This patch annihilates
Jonathan> this behaviour, for the needs of GPUs.
Jonathan> To use the patch: get the sources of maui-3.3.1 and patch
Jonathan> them: patch -p1 < ../Patch-for-gpu-GRES.patch then compile
Jonathan> as usual.
Jonathan> You have to configure the GPUs in maui.cfg:
Jonathan> NODECFG[nodename] GRES=gpu:2
Jonathan> Then when queuing jobs you can request GPUs with (Torque
Jonathan> syntax): qsub -W x=GRES:gpu@1
Jonathan> I hope this helps, please test this and enhance to your
Jonathan> needs!
Jonathan> [1]
Jonathan> <a class="moz-txt-link-freetext" href="http://www.supercluster.org/pipermail/mauiusers/2011-April/004622.html">http://www.supercluster.org/pipermail/mauiusers/2011-April/004622.html</a>
Jonathan> Regards,
Jonathan> PS. This is the second attempt to send the mail…
Jonathan> -- Jonathan Michalon IT student in Strasbourg
_______________________________________________
torqueusers mailing list
<a class="moz-txt-link-abbreviated" href="mailto:torqueusers@supercluster.org">torqueusers@supercluster.org</a>
<a class="moz-txt-link-freetext" href="http://www.supercluster.org/mailman/listinfo/torqueusers">http://www.supercluster.org/mailman/listinfo/torqueusers</a>
</pre>
</blockquote>
<br>
<br>
<div class="moz-signature">-- <br>
<font size="2">
<b>Sean Reilly</b><br>
<br>
Systems Administrator & Applications Support Officer<br>
eResearchSA<br>
Phone : +61 8 8313 8352<br>
Mobile: +61 450 840 246<br>
<br>
<a href="http://www.ersa.edu.au/moving"> <img
src="cid:part1.05010709.00090006@ersa.edu.au"></a> </font></div>
</body>
</html>