<html>
<head>
<meta http-equiv="Content-Type" content="text/html; charset=us-ascii">
<meta name="Generator" content="Microsoft Exchange Server">
<!-- converted from text --><style><!-- .EmailQuote { margin-left: 1pt; padding-left: 4pt; border-left: #800000 2px solid; } --></style>
</head>
<body>
<style>
<!--
.x_EmailQuote
        {margin-left:1pt;
        padding-left:4pt;
        border-left:#800000 2px solid}
-->
</style>
<div><br>
<div id="x_htc_header" style="">----- Reply message -----<br>
From: "Gus Correa" &lt;gus@ldeo.columbia.edu&gt;<br>
To: "Torque Users Mailing List" &lt;torqueusers@supercluster.org&gt;<br>
Subject: [torqueusers] pbsnodes reports the same job running many times<br>
Date: Thu, Apr 19, 2012 3:47 pm<br>
<br>
</div>
<br>
<br>
</div>
<font size="2"><span style="font-size:10pt;">
<div class="PlainText">On 04/19/2012 06:25 PM, Leonardo Gregory Brunnet wrote:<br>
> Hi Gus,<br>
><br>
> Problem solved using simply "nodes=X".<br>
> Thanks for all suggestions!<br>
><br>
> Leonardo<br>
> P.S. We never had Moab here... ncpus appeared probably<br>
> from some foreign script ;) .<br>
><br>
<br>
We also use Torque+Maui here.<br>
<br>
I don't remember exactly, but ncpus may work under the barebones<br>
Torque/PBS scheduler pbs_sched, besides Moab.<br>
'ncpus' seems to be a bit troublesome with Maui, though.<br>
The easy solution is to ask the users to stick to the<br>
'nodes=X' syntax.<br>
A more elaborate solution is to write a qsub wrapper that<br>
replaces 'ncpus' with the 'nodes' and 'ppn' syntax.<br>
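<br>
A minimal sketch of the rewriting step such a wrapper might perform, in Python.<br>
The function names are hypothetical, and it assumes that 'ncpus=N' is meant as<br>
N cores on a single node, i.e. 'nodes=1:ppn=N'. It only handles '#PBS -l' lines<br>
inside a job script; a real wrapper would also have to handle '-l' given on the<br>
command line and then call the real qsub, which is not shown here.<br>

```python
import re

# Matches one resource item of the form ncpus=N (N an integer).
NCPUS_ITEM = re.compile(r"^ncpus=(\d+)$")


def rewrite_resource_list(resources):
    """Map ncpus=N to nodes=1:ppn=N in a comma-separated -l resource list.

    Assumption: ncpus=N means N cores on one node. A list that already
    names nodes is returned unchanged rather than guessed at.
    """
    items = resources.split(",")
    if any(item.startswith("nodes=") for item in items):
        return resources
    rewritten = []
    for item in items:
        match = NCPUS_ITEM.match(item.strip())
        if match:
            rewritten.append("nodes=1:ppn=" + match.group(1))
        else:
            rewritten.append(item)
    return ",".join(rewritten)


def rewrite_pbs_line(line):
    """Rewrite a '#PBS -l ...' directive; any other line passes through unchanged."""
    match = re.match(r"^(#PBS\s+-l\s+)(\S+)(.*)$", line, flags=re.DOTALL)
    if not match:
        return line
    prefix, resources, rest = match.groups()
    return prefix + rewrite_resource_list(resources) + rest


print(rewrite_pbs_line("#PBS -l ncpus=1"))
```

Leaving lists that already contain 'nodes=' untouched is a deliberate choice:<br>
the wrapper then never overrides a request the user spelled out explicitly.<br>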
<br>
Gus Correa<br>
<br>
> On 19-04-2012 18:08, Gus Correa wrote:<br>
>> Hi Leonardo<br>
>><br>
>> On 04/19/2012 04:18 PM, Leonardo Gregory Brunnet wrote:<br>
>><br>
>>> Hi Gus,<br>
>>><br>
>>> Thanks for the answer.<br>
>>><br>
>>> Yes, I am surprised that it is using four processors.<br>
>>> As previously replied to David the argument used in the qsub script was<br>
>>> ...<br>
>>> #PBS -l ncpus=1<br>
>>> ...<br>
>>><br>
>> Somebody may correct me, but I think ncpus is a Moab thing,<br>
>> which may or may not work right with Torque+Maui.<br>
>> If you search this mailing list you will find other postings<br>
>> about ncpus.<br>
>><br>
>> Here we don't use ncpus.<br>
>> We stick to the 'nodes=X:ppn=Y' syntax.<br>
>> It works for us.<br>
>><br>
>><br>
>>> and I suppose this is correct. But in fact I don't know the difference<br>
>>> between this one<br>
>>> above and<br>
>>> #PBS -l nodes=1<br>
>>><br>
>>> I have also checked that in maui.cfg there is no specification for<br>
>>><br>
>>> JOBNODEMATCHPOLICY<br>
>>><br>
>>> but in fact I don't know what the default is. If EXACTNODE is the default,<br>
>>> I should explicitly add a line to maui.cfg, correct?<br>
>>><br>
>>><br>
>> Check JOBNODEMATCHPOLICY in the Maui Admin guide, although it<br>
>> doesn't state the default.<br>
>><br>
>> <a href="http://www.adaptivecomputing.com/resources/docs/maui/a.fparameters.php">
http://www.adaptivecomputing.com/resources/docs/maui/a.fparameters.php</a><br>
>><br>
>> You can add the line with your option for JOBNODEMATCHPOLICY<br>
>> to maui.cfg and restart maui.<br>
>> We use EXACTNODE here.<br>
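<br>
To confirm whether maui.cfg sets the parameter at all, a small check like the<br>
one below may help. It is only a sketch: the function name is hypothetical, and<br>
it assumes the usual 'PARAMETER VALUE' line layout with '#' comments and<br>
compares names case-insensitively. An empty result means the parameter is not<br>
set explicitly, so Maui falls back to its built-in default.<br>

```python
def find_maui_parameter(cfg_text, name):
    """Return every value assigned to `name` on uncommented 'NAME VALUE' lines."""
    values = []
    for raw_line in cfg_text.splitlines():
        line = raw_line.split("#", 1)[0].strip()  # drop trailing comments
        fields = line.split(None, 1)              # parameter name, then the rest
        # Assumption: parameter names are compared case-insensitively.
        if len(fields) == 2 and fields[0].upper() == name.upper():
            values.append(fields[1].strip())
    return values


# Hypothetical excerpt; in practice read the text from your own maui.cfg.
sample = "RMCFG[base] TYPE=PBS\nJOBNODEMATCHPOLICY EXACTNODE  # site choice\n"
print(find_maui_parameter(sample, "JOBNODEMATCHPOLICY"))
```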
>><br>
>> Gus Correa<br>
>><br>
>><br>
>>> Leonardo<br>
>>><br>
>>> On 19-04-2012 12:44, Gus Correa wrote:<br>
>>><br>
>>>> Hi Leonardo<br>
>>>><br>
>>>> Not sure if I understood the problem right.<br>
>>>> I guess the job is legitimate and running,<br>
>>>> but it surprises you that it is using four processors,<br>
>>>> right?<br>
>>>><br>
>>>> Did the user request four processors, perhaps,<br>
>>>> even though he/she is running a serial job?<br>
>>>> #PBS -l nodes=1:ppn=4<br>
>>>> This may be reasonable, say, if his/her job needs a lot<br>
>>>> of RAM, but the job is serial<br>
>>>> [or if it is Matlab ... the king of memory-greediness ...]<br>
>>>><br>
>>>> Also, beware of JOBNODEMATCHPOLICY in Maui [maui.cfg]:<br>
>>>> <a href="http://www.adaptivecomputing.com/resources/docs/maui/a.fparameters.php">
http://www.adaptivecomputing.com/resources/docs/maui/a.fparameters.php</a><br>
>>>> If set to EXACTNODE full nodes will be allocated.<br>
>>>><br>
>>>> I hope this helps,<br>
>>>> Gus Correa<br>
>>>><br>
>>>> On 04/18/2012 06:26 PM, Leonardo Gregory Brunnet wrote:<br>
>>>><br>
>>>><br>
>>>>> Dear All,<br>
>>>>><br>
>>>>> In a freshly installed Torque/Maui cluster, the server reports<br>
>>>>> the same job running several times on a given node (and no job is<br>
>>>>> running MPI!).<br>
>>>>><br>
>>>>> The pbsnodes output for one such node is:<br>
>>>>><br>
>>>>> node131<br>
>>>>> state = job-exclusive<br>
>>>>> np = 4<br>
>>>>> properties = quadcore<br>
>>>>> ntype = cluster<br>
>>>>> jobs = 0/78898.master.cluster.XX.XX.XX,<br>
>>>>> 1/78898.master.cluster.XX.XX.XX, 2/78898.master.cluster.XX.XX.XX,<br>
>>>>> 3/78898.master.XX.XX.XX<br>
>>>>> status =<br>
>>>>> rectime=1334786811,varattr=,jobs=78898.master.cluster.if.ufrgs.br,state=free,netload=2914588064,gres=,loadave=1.00,ncpus=4,physmem=3985876kb,availmem=4649240kb,totmem=5062188kb,idletime=535832,nusers=2,nsessions=2,sessions=2804<br>
>>>>> 8224,uname=Linux node131 2.6.23-1-amd64 #1 SMP Fri Oct 12 23:45:48 UTC<br>
>>>>> 2007 x86_64,opsys=linux<br>
>>>>> gpus = 0<br>
>>>>><br>
>>>>> But if we log in to that node, we see what was expected: a single job.<br>
>>>>> Since the Torque server (or Maui) "believes" all CPUs of that node are<br>
>>>>> busy,<br>
>>>>> no other jobs are sent there. Any clues?<br>
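<br>
Each entry in the 'jobs =' line above has the form slot/jobid, so one job that<br>
holds several slots is listed once per slot. The sketch below groups the slots<br>
by job; the function name is hypothetical, and it keys on the numeric part of<br>
the job id only, because the host names in this post were redacted unevenly.<br>

```python
def slots_per_job(jobs_field):
    """Map each job number to the slot indices it holds.

    `jobs_field` is the text after 'jobs = ' in pbsnodes output,
    e.g. '0/78898.host, 1/78898.host'.
    """
    slots = {}
    for entry in jobs_field.split(","):
        entry = entry.strip()
        if not entry:
            continue
        slot, job_id = entry.split("/", 1)
        job_number = job_id.split(".", 1)[0]  # drop the server host suffix
        slots.setdefault(job_number, []).append(int(slot))
    return slots


field = ("0/78898.master.cluster.XX.XX.XX, 1/78898.master.cluster.XX.XX.XX, "
         "2/78898.master.cluster.XX.XX.XX, 3/78898.master.XX.XX.XX")
print(slots_per_job(field))
```

For the node shown, this gives a single job holding slots 0 to 3, i.e. all<br>
np = 4 slots, which matches the job-exclusive state.<br>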
>>>>><br>
>>>>> Thanks for the help!<br>
>>>>><br>
>>>>> Leonardo<br>
>>>>><br>
>>>>> Below is the output of<br>
>>>>> # qmgr -c "p s"<br>
>>>>><br>
>>>>> #<br>
>>>>> # Create queues and set their attributes.<br>
>>>>> #<br>
>>>>> #<br>
>>>>> # Create and define queue padrao<br>
>>>>> #<br>
>>>>> create queue padrao<br>
>>>>> set queue padrao queue_type = Execution<br>
>>>>> set queue padrao resources_default.nodes = 7<br>
>>>>> set queue padrao resources_default.walltime = 01:00:00<br>
>>>>> set queue padrao max_user_run = 5<br>
>>>>> set queue padrao enabled = True<br>
>>>>> set queue padrao started = True<br>
>>>>> #<br>
>>>>> # Create and define queue um_mes<br>
>>>>> #<br>
>>>>> create queue um_mes<br>
>>>>> set queue um_mes queue_type = Execution<br>
>>>>> set queue um_mes resources_max.nodes = 7<br>
>>>>> set queue um_mes resources_default.nodes = 7<br>
>>>>> set queue um_mes resources_default.walltime = 720:00:00<br>
>>>>> set queue um_mes max_user_run = 5<br>
>>>>> set queue um_mes enabled = True<br>
>>>>> set queue um_mes started = True<br>
>>>>> #<br>
>>>>> # Create and define queue batch<br>
>>>>> #<br>
>>>>> create queue batch<br>
>>>>> set queue batch queue_type = Execution<br>
>>>>> set queue batch resources_default.nodes = 1<br>
>>>>> set queue batch resources_default.walltime = 01:00:00<br>
>>>>> set queue batch enabled = True<br>
>>>>> set queue batch started = True<br>
>>>>> #<br>
>>>>> # Create and define queue um_dia<br>
>>>>> #<br>
>>>>> create queue um_dia<br>
>>>>> set queue um_dia queue_type = Execution<br>
>>>>> set queue um_dia resources_max.nodes = 7<br>
>>>>> set queue um_dia resources_default.nodes = 7<br>
>>>>> set queue um_dia resources_default.walltime = 24:00:00<br>
>>>>> set queue um_dia max_user_run = 7<br>
>>>>> set queue um_dia enabled = True<br>
>>>>> set queue um_dia started = True<br>
>>>>> #<br>
>>>>> # Create and define queue uma_semana<br>
>>>>> #<br>
>>>>> create queue uma_semana<br>
>>>>> set queue uma_semana queue_type = Execution<br>
>>>>> set queue uma_semana resources_max.nodes = 7<br>
>>>>> set queue uma_semana resources_default.nodes = 7<br>
>>>>> set queue uma_semana resources_default.walltime = 168:00:00<br>
>>>>> set queue uma_semana max_user_run = 5<br>
>>>>> set queue uma_semana enabled = True<br>
>>>>> set queue uma_semana started = True<br>
>>>>> #<br>
>>>>> # Create and define queue route<br>
>>>>> #<br>
>>>>> create queue route<br>
>>>>> set queue route queue_type = Route<br>
>>>>> set queue route route_destinations = padrao<br>
>>>>> set queue route route_destinations += padrao2<br>
>>>>> set queue route enabled = True<br>
>>>>> set queue route started = True<br>
>>>>> #<br>
>>>>> # Set server attributes.<br>
>>>>> #<br>
>>>>> set server scheduling = True<br>
>>>>> set server acl_hosts = master.cluster.XX.XX.XX<br>
>>>>> set server acl_hosts += clusterapg<br>
>>>>> set server managers = root@master.cluster.XX.XX.XX<br>
>>>>> set server operators = root@master.cluster.XX.XX.XX<br>
>>>>> set server default_queue = padrao<br>
>>>>> set server log_events = 511<br>
>>>>> set server mail_from = adm<br>
>>>>> set server scheduler_iteration = 600<br>
>>>>> set server node_check_rate = 150<br>
>>>>> set server tcp_timeout = 6<br>
>>>>> set server mom_job_sync = True<br>
>>>>> set server keep_completed = 300<br>
>>>>> set server next_job_number = 79033<br>
>>>>><br>
>>>>><br>
>>>>><br>
>>>> _______________________________________________<br>
>>>> torqueusers mailing list<br>
>>>> torqueusers@supercluster.org<br>
>>>> <a href="http://www.supercluster.org/mailman/listinfo/torqueusers">http://www.supercluster.org/mailman/listinfo/torqueusers</a><br>
>>>><br>
>>>><br>
>>>><br>
>>><br>
>><br>
>><br>
><br>
<br>
</div>
</span></font>
</body>
</html>