<div dir="ltr">For others who are interested, the guidance at <a href="http://docs.adaptivecomputing.com/torque/Content/topics/11-troubleshooting/faq.htm#qsubNotAllow">http://docs.adaptivecomputing.com/torque/Content/topics/11-troubleshooting/faq.htm#qsubNotAllow</a> resolves my particular issue, so thanks Michel!</div>
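<div dir="ltr">For anyone wanting a concrete starting point, here is a minimal sketch. It assumes the fix in that FAQ entry is to raise the server's resources_available.nodect attribute to at least the number of virtual processors; that attribute name and the qmgr command are my assumption, not something quoted in this thread. The helper count_virtual_procs is a made-up name for illustration. The script only prints the qmgr command rather than running it.</div>

```shell
#!/bin/sh
# Sum the np= values in a Torque nodes file to get the virtual processor count.
# count_virtual_procs is a hypothetical helper name used only for this sketch.
count_virtual_procs() {
    awk '{
        for (i = 2; i <= NF; i++)
            if ($i ~ /^np=[0-9]+$/) { sub(/^np=/, "", $i); total += $i }
    } END { print total + 0 }' "$1"
}

# Example nodes file matching the one quoted further down in this thread.
nodes_file=$(mktemp)
cat > "$nodes_file" <<'EOF'
cirrus np=1
cirrus1 np=8
cirrus2 np=8
cirrus3 np=8
cirrus4 np=8
EOF

total=$(count_virtual_procs "$nodes_file")
rm -f "$nodes_file"

# Print (do not run) the qmgr command; the nodect attribute is an assumption
# about what the FAQ recommends, so check it against your Torque version.
echo "qmgr -c 'set server resources_available.nodect = $total'"
```

<div dir="ltr">On a real system you would point count_virtual_procs at your own nodes file and review the printed command before running it as a Torque administrator.</div>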
<div class="gmail_extra"><br><br><div class="gmail_quote">On 7 February 2013 21:40, Gus Correa <span dir="ltr"><<a href="mailto:gus@ldeo.columbia.edu" target="_blank">gus@ldeo.columbia.edu</a>></span> wrote:<br><blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex">
Hi Andrew<br>
<br>
I never had much luck with procs=YZ,<br>
which is likely to be the syntax that matches what you want to do.<br>
Maui (the scheduler I use) seems not to understand that<br>
syntax very well.<br>
<br>
I wouldn't rely completely on the Torque documentation.<br>
It has good guidelines, but may have mistakes in the details.<br>
Trial and error may be the way to check what works for you.<br>
I wonder if the error message you see may come<br>
from different interpretations given to the word "node"<br>
by the torque server (pbs_server) and the scheduler (which<br>
may be Maui, pbs_sched or perhaps Moab).<br>
<br>
If you also want to control which nodes<br>
(and sockets and cores) each MPI *process* is sent to,<br>
I suggest that you build OpenMPI with Torque support.<br>
OpenMPI when built with Torque support<br>
will use the nodes and processors assigned<br>
by Torque to that job,<br>
but you can still decide how the sockets and<br>
cores are distributed among the various MPI processes,<br>
through switches to mpiexec such as --bynode, --bysocket,<br>
--bycore, or even finer control through their "rankfiles".<br>
<div class="im"><br>
I hope this helps,<br>
Gus Correa<br>
<br>
</div><div class="im">On 02/07/2013 03:54 PM, Andrew Dawson wrote:<br>
> Hi Gus,<br>
><br>
> Yes I can do that. What I would like to do is be able to have users<br>
> request the number of CPUs for an MPI job and not have to care how these<br>
> CPUs are distributed across physical nodes. If I do<br>
><br>
> #PBS -l nodes=1:ppn=8<br>
><br>
> then this will mean the job has to wait until there are 8 CPUs on one<br>
> physical node before starting, correct?<br>
><br>
> The torque documentation seems to say I can do:<br>
><br>
> #PBS -l nodes=8<br>
><br>
> and this will be interpreted as 8 CPUs rather than 8 physical nodes.<br>
> This is what I want. Unfortunately I get the error message at submission<br>
> time saying there are not enough resources to fulfill this request, even<br>
> though there are 33 CPUs in the system. If on my system I do<br>
><br>
> #PBS -l nodes=5<br>
><br>
> then my MPI job gets sent to 5 CPUs, not necessarily on the same<br>
> physical node, which is great and exactly what I want. I would therefore<br>
> expect this to work for larger numbers but it seems that at submission<br>
> time the request is checked against the number of physical nodes rather<br>
> than virtual processors, meaning I cannot do this! It is quite frustrating.<br>
><br>
> Please ask if there is further clarification I can make.<br>
><br>
> Andrew<br>
><br>
><br>
> On 7 February 2013 19:28, Gus Correa <<a href="mailto:gus@ldeo.columbia.edu">gus@ldeo.columbia.edu</a>> wrote:<br>
</div><div class="im">
><br>
> Hi Andrew<br>
><br>
> Not sure I understood what exactly you want to do,<br>
> but have you tried this?<br>
><br>
> #PBS -l nodes=1:ppn=8<br>
><br>
><br>
> It will request one node with 8 processors.<br>
><br>
> I hope this helps,<br>
> Gus Correa<br>
><br>
> On 02/07/2013 11:38 AM, Andrew Dawson wrote:<br>
> > Nodes file looks like this:<br>
> ><br>
> > cirrus np=1<br>
> > cirrus1 np=8<br>
> > cirrus2 np=8<br>
> > cirrus3 np=8<br>
> > cirrus4 np=8<br>
> ><br>
> > On 7 Feb 2013 16:25, "Ricardo Román Brenes" <<a href="mailto:roman.ricardo@gmail.com">roman.ricardo@gmail.com</a>> wrote:<br>
</div><div class="im">
> ><br>
> > hi!<br>
> ><br>
> > What does your node config file look like?<br>
> ><br>
> > On Thu, Feb 7, 2013 at 3:10 AM, Andrew Dawson <<a href="mailto:dawson@atm.ox.ac.uk">dawson@atm.ox.ac.uk</a>> wrote:<br>
</div><div><div class="h5">
> ><br>
> > Hi all,<br>
> ><br>
> > I'm configuring a recent torque/maui installation and I'm having<br>
> > trouble submitting MPI jobs. I would like MPI jobs to<br>
> > specify the number of processors they require and have those<br>
> > come from any available physical machine; users shouldn't<br>
> > need to specify processors per node etc.<br>
> ><br>
> > The torque manual says that the nodes option is mapped to<br>
> > virtual processors, so for example:<br>
> ><br>
> > #PBS -l nodes=8<br>
> ><br>
> > should request 8 virtual processors. The problem I'm having is<br>
> > that our cluster currently has only 5 physical machines (nodes),<br>
> > and setting nodes to anything greater than 5 gives the error:<br>
> ><br>
> > qsub: Job exceeds queue resource limits MSG=cannot locate<br>
> > feasible nodes (nodes file is empty or all systems are busy)<br>
> ><br>
> > I'm confused by this: we have 33 virtual processors available<br>
> > across the 5 nodes (four 8-core machines and one single-core machine), so my<br>
> > interpretation of the manual is that I should be able to request<br>
> > 8 nodes, since these should be understood as virtual processors.<br>
> > Am I doing something wrong?<br>
> ><br>
> > I tried setting<br>
> ><br>
> > #PBS -l procs=8<br>
> ><br>
> > but that doesn't seem to do anything: MPI stops because<br>
> > only 1 worker is available (a single core is allocated to the job).<br>
> ><br>
> > Thanks,<br>
> > Andrew<br>
> ><br>
> > p.s.<br>
> ><br>
> > The queue I'm submitting jobs to is defined as:<br>
> ><br>
> > create queue normal<br>
> > set queue normal queue_type = Execution<br>
> > set queue normal resources_min.cput = 12:00:00<br>
> > set queue normal resources_default.cput = 24:00:00<br>
> > set queue normal disallowed_types = interactive<br>
> > set queue normal enabled = True<br>
> > set queue normal started = True<br>
> ><br>
> > We are using torque version 2.5.12, with maui<br>
> > 3.3.1 for scheduling.<br>
> ><br>
> ><br>
> > _______________________________________________<br>
> > torqueusers mailing list<br>
> > <a href="mailto:torqueusers@supercluster.org">torqueusers@supercluster.org</a><br>
</div></div><div class="im">
> > <a href="http://www.supercluster.org/mailman/listinfo/torqueusers" target="_blank">http://www.supercluster.org/mailman/listinfo/torqueusers</a><br>
> ><br>
> ><br>
> ><br>
</div><div class="im">
> ><br>
> ><br>
> ><br>
><br>
</div><div class="im">
><br>
><br>
><br>
><br>
> --<br>
> Dr Andrew Dawson<br>
> Atmospheric, Oceanic & Planetary Physics<br>
> Clarendon Laboratory<br>
> Parks Road<br>
> Oxford OX1 3PU, UK<br>
> Tel: <a href="tel:%2B44%20%280%291865%20282438" value="+441865282438">+44 (0)1865 282438</a><br>
</div>> Email: <a href="mailto:dawson@atm.ox.ac.uk">dawson@atm.ox.ac.uk</a><br>
> Web Site: <a href="http://www2.physics.ox.ac.uk/contacts/people/dawson" target="_blank">http://www2.physics.ox.ac.uk/contacts/people/dawson</a><br>
><br>
><br>
<div class="HOEnZb"><div class="h5">
<br>
_______________________________________________<br>
torqueusers mailing list<br>
<a href="mailto:torqueusers@supercluster.org">torqueusers@supercluster.org</a><br>
<a href="http://www.supercluster.org/mailman/listinfo/torqueusers" target="_blank">http://www.supercluster.org/mailman/listinfo/torqueusers</a><br>
</div></div></blockquote></div><br><br clear="all"><div><br></div>-- <br>Dr Andrew Dawson<br>Atmospheric, Oceanic & Planetary Physics<br>Clarendon Laboratory<br>Parks Road<br>Oxford OX1 3PU, UK<br>Tel: +44 (0)1865 282438<br>
Email: <a href="mailto:dawson@atm.ox.ac.uk" target="_blank">dawson@atm.ox.ac.uk</a><div>Web Site: <a href="http://www2.physics.ox.ac.uk/contacts/people/dawson" target="_blank">http://www2.physics.ox.ac.uk/contacts/people/dawson</a></div>
</div>