Hi,<br>I still have the PBS job submission and execution problem.<br>I have installed OSCAR 5.0 successfully,<br>but when I qsub a job, it always stays in the Q state, and after a few seconds qstat shows nothing. I never see the job in any intermediate state, and there are no output or error logs. The job script itself contains no mistakes.
<br><br>Maui is the default scheduler; the Maui log shows:<br>-----------------------------------------------------------------------------------<br>09/19 12:17:21 INFO: 2 PBS jobs detected on RM base<br>09/19 12:17:21 INFO: jobs detected: 2
<br>09/19 12:17:21 MStatClearUsage(node,Active)<br>09/19 12:17:21 MClusterUpdateNodeState()<br>09/19 12:17:21 MQueueSelectAllJobs(Q,HARD,ALL,JIList,DP,Msg)<br>09/19 12:17:21 INFO: job '308' Priority: 1<br>
00.0) Res: 0(00.0) Us: 0(00.0)<br>09/19 12:17:21 INFO: job '309' Priority: 1<br>00.0) Res: 0(00.0) Us: 0(00.0)<br><br>--------------------------------------------------------------------------------
<br>09/19 12:45:25 INFO: node 'oscarnode1.oscardomain' returned to idle pool<br>09/19 12:45:25 INFO: job ' 312' completed. QueueTime: 11 RunTime: 11 Accuracy: 0.61 XFactor: 0.01<br>09/19 12:45:25 INFO: overall statistics. Accuracy: 0.00 XFactor: 0.00<br>09/19 12:45:25 INFO: job '312' completed X: 0.012222 T: 11 PS: 11 A: 0.006111<br>09/19 12:45:25 MJobSendFB(312)<br>09/19 12:45:25 MSysLaunchAction(ASList,2)
<br>09/19 12:45:25 INFO: job usage sent for job '312'<br>-----------------------------------------------------------------------------------<br><br>Can anyone tell me what the problem is?<br>Is it a problem with the Maui configuration, or something else?
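As a rough aid for reading a log like the one above, the minimal Python sketch below extracts Maui's per-job completion lines. It assumes the line format shown in this excerpt; completed_jobs and COMPLETED are hypothetical names, not part of Maui. On this excerpt it reports job 312 with QueueTime 11 and RunTime 11 (presumably seconds), which suggests Maui did see the job start and finish rather than leaving it queued.

```python
import re
from typing import Dict

# Matches Maui's per-job completion summary, e.g.
#   INFO: job ' 312' completed. QueueTime: 11 RunTime: 11 Accuracy: ...
# The job ID may carry stray spaces inside the quotes, as in the log above.
COMPLETED = re.compile(r"job '\s*(\d+)\s*' completed\. QueueTime: (\d+) RunTime: (\d+)")


def completed_jobs(log_text: str) -> Dict[str, Dict[str, int]]:
    """Return {job_id: {"queue_time": q, "run_time": r}} for each completion line."""
    jobs: Dict[str, Dict[str, int]] = {}
    for line in log_text.splitlines():
        match = COMPLETED.search(line)
        if match:
            job_id, queue_time, run_time = match.groups()
            jobs[job_id] = {"queue_time": int(queue_time), "run_time": int(run_time)}
    return jobs


if __name__ == "__main__":
    excerpt = (
        "09/19 12:45:25 INFO: job ' 312' completed. QueueTime: 11 RunTime: 11 "
        "Accuracy: 0.61 XFactor: 0.01\n"
        "09/19 12:45:25 INFO: job '312' completed X: 0.012222 T: 11 PS: 11 A: 0.006111\n"
    )
    for job_id, times in completed_jobs(excerpt).items():
        print(f"job {job_id}: queued {times['queue_time']}, ran {times['run_time']}")
```

The second "completed" line (with X:, T:, PS:) is deliberately not matched, so each job is reported once.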
<br><br>Thanks for your help!<br><br><div><span class="gmail_quote">On 9/19/07, <b class="gmail_sendername"><a href="mailto:torqueusers-request@supercluster.org">torqueusers-request@supercluster.org</a></b> <<a href="mailto:torqueusers-request@supercluster.org">
torqueusers-request@supercluster.org</a>> wrote:</span><blockquote class="gmail_quote" style="border-left: 1px solid rgb(204, 204, 204); margin: 0pt 0pt 0pt 0.8ex; padding-left: 1ex;">Send torqueusers mailing list submissions to
<br> <a href="mailto:torqueusers@supercluster.org">torqueusers@supercluster.org</a><br><br>To subscribe or unsubscribe via the World Wide Web, visit<br> <a href="http://www.supercluster.org/mailman/listinfo/torqueusers">
http://www.supercluster.org/mailman/listinfo/torqueusers</a><br>or, via email, send a message with subject or body 'help' to<br> <a href="mailto:torqueusers-request@supercluster.org">torqueusers-request@supercluster.org
</a><br><br>You can reach the person managing the list at<br> <a href="mailto:torqueusers-owner@supercluster.org">torqueusers-owner@supercluster.org</a><br><br>When replying, please edit your Subject line so it is more specific
<br>than "Re: Contents of torqueusers digest..."<br><br><br>Today's Topics:<br><br> 1. problems running jobs: Error:Number of meshes not equal to<br> number of threads (Nilesh Mistry)<br> 2. Re: defining queues by user defined node features
<br> (P Spencer Davis)<br> 3. Re: defining queues by user defined node features<br> (Garrick Staples)<br> 4. about multiserver (vanilla)<br> 5. Re: about multiserver (Jacques Foury)<br><br><br>----------------------------------------------------------------------
<br><br>Message: 1<br>Date: Mon, 17 Sep 2007 14:35:43 -0400<br>From: Nilesh Mistry <<a href="mailto:Nilesh.Mistry@senecac.on.ca">Nilesh.Mistry@senecac.on.ca</a>><br>Subject: [torqueusers] problems running jobs: Error:Number of meshes
<br> not equal to number of threads<br>To: <a href="mailto:torqueusers@supercluster.org">torqueusers@supercluster.org</a>, <a href="mailto:oscar-users@lists.sourceforge.net">oscar-users@lists.sourceforge.net</a>,<br>
<a href="mailto:mauiusers@supercluster.org">mauiusers@supercluster.org</a><br>Message-ID: <<a href="mailto:46EEC8FF.5000102@senecac.on.ca">46EEC8FF.5000102@senecac.on.ca</a>><br>Content-Type: text/plain; charset=ISO-8859-1; format=flowed
<br><br>Hello<br><br>I am having problems submitting a job that requires 23 threads. I keep<br>getting the following error:<br><br>ERROR: Number of meshes not equal to number of thread<br><br>Hardware:<br>10 quad-core nodes (therefore 40 processors available)
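If the application needs one process per mesh, a request for 23 processes on these quad-core nodes can be spelled out as a nodes= specification. The minimal Python sketch below assumes one process per core and fills whole nodes before putting the remainder on one more node; nodes_spec and job.sh are hypothetical names, and the string simply follows TORQUE's N:ppn=P syntax with + between node groups.

```python
def nodes_spec(nprocs: int, cores_per_node: int) -> str:
    """Build a TORQUE-style nodes= value packing nprocs processes onto nodes.

    Hypothetical helper: fills whole nodes first (ppn=cores_per_node), then
    puts any remainder on one extra node.
    """
    if nprocs <= 0 or cores_per_node <= 0:
        raise ValueError("nprocs and cores_per_node must be positive")
    full_nodes, remainder = divmod(nprocs, cores_per_node)
    parts = []
    if full_nodes:
        parts.append(f"{full_nodes}:ppn={cores_per_node}")
    if remainder:
        parts.append(f"1:ppn={remainder}")
    return "+".join(parts)


if __name__ == "__main__":
    # 23 processes on the quad-core nodes described above (40 cores in total).
    print("qsub -l nodes=" + nodes_spec(23, 4) + " job.sh")
```

For 23 processes this gives 5:ppn=4+1:ppn=3, i.e. five full nodes plus three cores on a sixth. Whatever launches the program inside the job would presumably also need to start exactly 23 processes so that the count matches the number of meshes.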
<br><br>What do I need to ensure in my job queue (qmgr), Maui (maui.cfg) and<br>my submit script when using qsub?<br><br>Any and all help is greatly appreciated.<br><br>--<br>Thanks<br><br>Nilesh Mistry<br>Academic Computing Services
<br>Seneca@York & TEL Campus<br>Seneca College Of Applied Arts & Technology<br>70 The Pond Road<br>Toronto, Ontario<br>M3J 3M6 Canada<br>Phone 416 491 5050 ext 3788<br>Fax 416 661 4695<br><a href="http://acs.senecac.on.ca">
http://acs.senecac.on.ca</a><br><br><br><br><br>------------------------------<br><br>Message: 2<br>Date: Mon, 17 Sep 2007 15:12:57 -0400<br>From: P Spencer Davis <<a href="mailto:psdavis@bsu.edu">psdavis@bsu.edu</a>>
<br>Subject: Re: [torqueusers] defining queues by user defined node<br> features<br>To: <a href="mailto:torqueusers@supercluster.org">torqueusers@supercluster.org</a><br>Message-ID: <<a href="mailto:46EED1B9.5020203@bsu.edu">
46EED1B9.5020203@bsu.edu</a>><br>Content-Type: text/plain; charset=ISO-8859-1; format=flowed<br><br>One final problem: I had to change the queues so that they all have<br>resource_min.nodes=1:x86 or resource_min.nodes=1:em64 in order to have jobs
<br>that request more than one processor get queued. However, this means<br>that qsub -l nodes=em64 will no longer work, nor will qsub -l<br>nodes=n35:em64. Have I just made a mess of this, or do I need to add a<br>set of serial queues as well?
<br> Spencer<br><br>P Spencer Davis wrote:<br>> Ok, I figured out my problem. It boils down to renaming the x86-64<br>> variable in my nodes file. When it was changed to em64, with the<br>> available_resource.nodes=em64 set for the short-64 and long-64 queues,
<br>> the jobs were being sorted into the proper queues. Then I set<br>> acl_hosts=n(n)+...+n(n+1), set acl_host_enable=false, restarted Maui and<br>> TORQUE, and everything works.<br>> Hope this helps someone else,
<br>> and thanks to the group for listening to me think my way<br>> out of the problem<br>> Spencer Davis<br>><br>> P Spencer Davis wrote:<br>>> I tried shutting down Maui and running the default pbs_sched instead.
<br>>> No change in behavior. I've set the resource_available.nodes to x86<br>>> or x86-64 in the execution queues thinking that the routing queue<br>>> would then route the 32 bit requests to short or long and the 64 bit
<br>>> jobs to short-64 or long-64 depending on the wall time requested, but<br>>> that has no effect. At this point I have no idea what I am doing<br>>> wrong. Any ideas?<br>>> Thanks,
<br>>> Spencer<br><br><br>------------------------------<br><br>Message: 3<br>Date: Mon, 17 Sep 2007 14:27:48 -0700
<br>From: Garrick Staples <<a href="mailto:garrick@usc.edu">garrick@usc.edu</a>><br>Subject: Re: [torqueusers] defining queues by user defined node<br> features<br>To: <a href="mailto:torqueusers@supercluster.org">
torqueusers@supercluster.org</a><br>Message-ID: <<a href="mailto:20070917212747.GZ19043@polop.usc.edu">20070917212747.GZ19043@polop.usc.edu</a>><br>Content-Type: text/plain; charset="us-ascii"<br><br>On Fri, Sep 14, 2007 at 03:47:43PM -0400, P Spencer Davis alleged:
<br>> Hello,<br>> I'm running v 2.1.6 of PBS as a resource manager with v 3.2.6p19 of<br>> the Maui scheduler. All the compute nodes are running RHEL 4 with the<br>> 2.6.9-55 kernel. The cluster is heterogeneous: 32 of the nodes are 32 bit
<br>> dual processor, and the other 32 are 64 bit dual processor. The nodes<br>> file in server_priv is configured as follows (edited for brevity)<br>> ...<br>> n31 np=2 x86<br>> n32 np=2 x86-64<br>> ...
<br><br>My advice is to go in a completely different direction. Don't use the arch as a node property. There is already a node attribute called "arch" that you can use for this.<br><br>If you look at 'pbsnodes -a', you'll see arch=i686 and arch=x86_64 associated with
<br>the different nodes. Then just add that arch to your resource request.<br><br>In general, if you've compiled and installed software correctly, 32-bit binaries<br>run correctly on 64-bit hosts. This means that users of 32-bit binaries can
<br>simply omit the arch because their jobs run everywhere. Users of 64-bit<br>binaries add "arch=x86_64" to their request, and their jobs will only run on 64-bit<br>nodes.<br><br>
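As an illustration of the arch-based approach, the minimal Python sketch below groups nodes by the arch= value found in their status line from pbsnodes -a. The sample text is hypothetical and heavily abbreviated (real output has many more fields), and node_arches and nodes_for_arch are illustrative helpers. The idea is only that 64-bit jobs add arch=x86_64 to their -l resource list, while 32-bit jobs leave it out and can run anywhere.

```python
from typing import Dict, List, Optional


def node_arches(pbsnodes_text: str) -> Dict[str, str]:
    """Map each node name to the arch= value reported in its status line."""
    arches: Dict[str, str] = {}
    node: Optional[str] = None
    for raw in pbsnodes_text.splitlines():
        if not raw.strip():
            continue
        if not raw[0].isspace():
            # An unindented line starts a new node record.
            node = raw.strip()
            continue
        key, _, value = raw.strip().partition(" = ")
        if key == "status" and node is not None:
            # status is a comma-separated list of name=value pairs.
            for item in value.split(","):
                name, _, val = item.partition("=")
                if name == "arch":
                    arches[node] = val
    return arches


def nodes_for_arch(arches: Dict[str, str], arch: str) -> List[str]:
    """Names of nodes whose reported arch equals the given arch, sorted."""
    return sorted(name for name, value in arches.items() if value == arch)


if __name__ == "__main__":
    # Hypothetical, abbreviated pbsnodes -a output for one node of each kind.
    sample = """\
n31
     state = free
     np = 2
     properties = x86
     status = opsys=linux,arch=i686,ncpus=2
n32
     state = free
     np = 2
     properties = x86-64
     status = opsys=linux,arch=x86_64,ncpus=2
"""
    arches = node_arches(sample)
    print(arches)
    print("nodes eligible for -l arch=x86_64:", nodes_for_arch(arches, "x86_64"))
```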
------------------------------<br><br>Message: 4<br>Date: Tue, 18 Sep 2007 11:04:54 +0800<br>From: vanilla <
<a href="mailto:vanilla0111@gmail.com">vanilla0111@gmail.com</a>><br>Subject: [torqueusers] about multiserver<br>To: <a href="mailto:torqueusers@supercluster.org">torqueusers@supercluster.org</a><br>Message-ID:<br> <
<a href="mailto:81dd40cd0709172004t312f277cge596a3642299321c@mail.gmail.com">81dd40cd0709172004t312f277cge596a3642299321c@mail.gmail.com</a>><br>Content-Type: text/plain; charset="iso-8859-1"<br><br>I have some trouble with PBS job submission and execution. I know it is because of
<br>multiserver, but I can't fix it.<br>The cluster (OSCAR 5.0) has one head node and one compute node, as<br>follows:<br>cat /etc/hosts<br>----------------------<br># Do not remove the following line, or various programs
<br># that require network functionality will fail.<br>127.0.0.1 localhost.localdomain localhost<br>192.168.190.1 oscar_server.oscardomain oscar_server nfs_oscar pbs_oscar
<br>192.168.22.107 dchen-linux.localdomain dchen-linux<br><br># These entries are managed by SIS, please don't modify them.<br>192.168.190.2 oscarnode1.oscardomain oscarnode1<br>---------------------------<br>1. When I configure /var/spool/pbs/torque.cfg as follows:<br>-----------------------------<br>QSUBSLEEP 2<br>SERVERHOST dchen-linux
<br>ALLOWCOMPUTEHOSTSUMBIT true<br>------------------------------<br>qsub succeeds and I can see all the jobs in qstat, but they just stay in the<br>queue and never run.<br><br>2. When I configure /var/spool/pbs/torque.cfg another way:
<br>---------------------------------<br>QSUBSLEEP 2<br>SERVERHOST oscar_server<br>ALLOWCOMPUTEHOSTSUMBIT true<br>----------------------------------<br>qsub fails.<br><br>How do I configure this so that qsub works?
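The two configurations above differ only in SERVERHOST, so one quick sanity check is to see which address each name resolves to in the /etc/hosts shown, and whether each torque.cfg key is spelled the way TORQUE documents it. In particular, the documented parameter is ALLOWCOMPUTEHOSTSUBMIT, whereas both listings spell it ALLOWCOMPUTEHOSTSUMBIT, so that setting may not be taking effect. The minimal Python sketch below performs both checks; KNOWN_KEYS deliberately holds only the keys that appear in this thread and is not a complete list of valid torque.cfg parameters, and parse_hosts and check_torque_cfg are hypothetical helpers.

```python
import difflib
from typing import Dict, List

# Only the torque.cfg keys that come up in this thread (with the documented
# spelling ALLOWCOMPUTEHOSTSUBMIT); this is NOT a complete list of valid keys.
KNOWN_KEYS = ["ALLOWCOMPUTEHOSTSUBMIT", "QSUBSLEEP", "SERVERHOST"]


def parse_hosts(text: str) -> Dict[str, str]:
    """Map every hostname and alias in /etc/hosts-style text to its IP address."""
    names: Dict[str, str] = {}
    for line in text.splitlines():
        line = line.split("#", 1)[0].strip()  # drop comments, then skip blanks
        if not line:
            continue
        ip, *aliases = line.split()
        for alias in aliases:
            names[alias] = ip
    return names


def check_torque_cfg(cfg_text: str, hosts: Dict[str, str]) -> List[str]:
    """Report where SERVERHOST resolves and flag keys not in KNOWN_KEYS."""
    notes: List[str] = []
    for line in cfg_text.splitlines():
        fields = line.split()
        if not fields:
            continue
        key, values = fields[0], fields[1:]
        if key not in KNOWN_KEYS:
            close = difflib.get_close_matches(key, KNOWN_KEYS, n=1)
            hint = f" (did you mean {close[0]}?)" if close else ""
            notes.append(f"unrecognized key {key}{hint}")
        elif key == "SERVERHOST" and values:
            ip = hosts.get(values[0], "not in /etc/hosts")
            notes.append(f"SERVERHOST {values[0]} -> {ip}")
    return notes


if __name__ == "__main__":
    hosts = parse_hosts(
        "127.0.0.1 localhost.localdomain localhost\n"
        "192.168.190.1 oscar_server.oscardomain oscar_server nfs_oscar pbs_oscar\n"
        "192.168.22.107 dchen-linux.localdomain dchen-linux\n"
        "192.168.190.2 oscarnode1.oscardomain oscarnode1\n"
    )
    for server in ("dchen-linux", "oscar_server"):
        cfg = f"QSUBSLEEP 2\nSERVERHOST {server}\nALLOWCOMPUTEHOSTSUMBIT true\n"
        print(check_torque_cfg(cfg, hosts))
```

With SERVERHOST dchen-linux the check reports 192.168.22.107, and with oscar_server it reports 192.168.190.1, so the two settings point qsub at different addresses; both runs also flag the misspelled key.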
<br>Thanks for help.<br><br>------------------------------<br><br>Message: 5<br>Date: Tue, 18 Sep 2007 18:30:40 +0200<br>From: Jacques Foury <
<a href="mailto:Jacques.Foury@math.u-bordeaux1.fr">Jacques.Foury@math.u-bordeaux1.fr</a>><br>Subject: Re: [torqueusers] about multiserver<br>To: <a href="mailto:torqueusers@supercluster.org">torqueusers@supercluster.org
</a><br>Message-ID: <<a href="mailto:46EFFD30.6000607@math.u-bordeaux1.fr">46EFFD30.6000607@math.u-bordeaux1.fr</a>><br>Content-Type: text/plain; charset=ISO-8859-15; format=flowed<br><br>vanilla wrote:<br>> I have some trouble in pbs job submission and run. I know it is
<br>> because of multiserver, but I can't mend it.<br>What is a "multiserver"? Torque can only have a single server, as far<br>as I know...<br>> The cluster (oscar 5.0) has one head node and one compute node, as the
<br>> following:<br>> cat /etc/hosts<br>> ----------------------<br>> # Do not remove the following line, or various programs<br>> # that require network functionality will fail.<br>> 127.0.0.1 localhost.localdomain localhost<br>> 192.168.190.1 oscar_server.oscardomain oscar_server nfs_oscar pbs_oscar<br>> 192.168.22.107 dchen-linux.localdomain dchen-linux<br>><br>> # These entries are managed by SIS, please don't modify them.<br>> 192.168.190.2 oscarnode1.oscardomain oscarnode1<br>> ---------------------------<br>> 1. when I config /var/spool/pbs/torque.cfg file as the following:<br>> -----------------------------<br>> QSUBSLEEP 2<br>> SERVERHOST dchen-linux<br>> ALLOWCOMPUTEHOSTSUMBIT true<br>> ------------------------------<br>> qsub is successful and I can see all jobs in qstat , but all jobs<br>> just in queue, can't run.
<br><br>Do you have a scheduler? Is it running? It is the scheduler that<br>tells the jobs to start!<br>Anyway, I don't know that file; maybe it's OSCAR-specific... Can you run<br>qmgr -c "p s" and tell us which host is the Torque server?
<br><br>Which version of Torque are you using? Recent Torque installations preferably<br>live in /var/lib/torque... and the config file is only read when<br>the Torque database is created. After that first start, use qmgr to
<br>change parameters... and stop/start the services.<br>><br>> 2. when I config /var/spool/pbs/torque.cfg file in another way:<br>> ---------------------------------<br>> QSUBSLEEP 2<br>> SERVERHOST oscar_server
<br>> ALLOWCOMPUTEHOSTSUMBIT true<br>> ----------------------------------<br>> qsub failed.<br>><br>> How to config and run qsub successfully?<br>> Thanks for help.<br><br>Is what you want a submit host?
<br>If so, just add your submit host to the server's /etc/hosts.equiv and install the<br>Torque client package on the submit host.<br><br>--<br><br>Jacques Foury<br>Institut de Mathématiques de Bordeaux<br>Université Bordeaux 1 / CNRS
<br>Tel : 05 4000 69 56<br>Fax : 05 4000 21 23<br><a href="http://www.math.u-bordeaux.fr/maths/cellule">http://www.math.u-bordeaux.fr/maths/cellule</a><br><br><br><br>------------------------------<br><br><br>End of torqueusers Digest, Vol 38, Issue 24<br>*******************************************<br></blockquote></div><br>