<HTML dir=ltr><HEAD><TITLE>[Mauiusers] Why a rorque/maui job won't start?</TITLE>
<META http-equiv=Content-Type content="text/html; charset=unicode">
<META content="MSHTML 6.00.2900.3395" name=GENERATOR></HEAD>
<BODY>
<DIV id=idOWAReplyText35286 dir=ltr>
<DIV dir=ltr><FONT face=Arial color=#000000 size=2>With the line "#PBS -l mem=8gb,nodes=1:ppn=4,walltime=01:00:00", the user is saying, "I need one node with four processors and 8 GB of RAM for one hour." If no node in your cluster can provide that configuration (four cores and 8 GB of RAM on a single node), that's why the job is blocked. </FONT></DIV>
<DIV dir=ltr><FONT face=Arial color=#000000 size=2></FONT> </DIV>
<DIV dir=ltr><FONT face=Arial color=#000000 size=2>This job can never be scheduled the way you describe ("Worst case is that 3 processes run on one node and the 4th on another"), because the user requested only one node.</FONT></DIV>
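<DIV dir=ltr><FONT face=Arial size=2></FONT> </DIV>
<DIV dir=ltr><FONT face=Arial color=#000000 size=2>To make the arithmetic concrete, here is a rough Python sketch of how that request breaks down. It is not Torque or Maui code: the function names are made up for illustration, only the kb/mb/gb/tb suffixes are handled, and the 7900 MB figure is a hypothetical value for what a node with "8 GB" installed might actually report as usable. The sketch only shows that 8gb spread over 4 tasks gives the 2048M per task that checkjob prints, and that all four tasks must fit on a single node.</FONT></DIV>
<PRE>
```python
import re

# Multipliers to convert a PBS-style size suffix to megabytes.
UNITS_MB = {"kb": 1 / 1024, "mb": 1, "gb": 1024, "tb": 1024 * 1024}


def parse_resource_list(spec):
    """Parse a '-l' string such as 'mem=8gb,nodes=1:ppn=4,walltime=01:00:00'.

    Simplified: only mem and nodes/ppn are interpreted; walltime and any
    other resources are ignored.
    """
    req = {"nodes": 1, "ppn": 1, "mem_mb": 0}
    for item in spec.split(","):
        key, _, value = item.partition("=")
        if key == "mem":
            match = re.fullmatch(r"(\d+)([kmgt]b)", value.lower())
            if match is None:
                raise ValueError("unsupported mem value: " + value)
            req["mem_mb"] = int(match.group(1)) * UNITS_MB[match.group(2)]
        elif key == "nodes":
            parts = value.split(":")
            req["nodes"] = int(parts[0])
            for part in parts[1:]:
                if part.startswith("ppn="):
                    req["ppn"] = int(part[len("ppn="):])
    return req


def per_task_mem_mb(req):
    """Memory per task when the job total is split evenly across all tasks."""
    return req["mem_mb"] / (req["nodes"] * req["ppn"])


def fits_on_one_node(req, node_procs, node_mem_mb):
    """True if one node can hold every task that the request puts on a node."""
    mem_needed_on_node = per_task_mem_mb(req) * req["ppn"]
    return req["ppn"] <= node_procs and mem_needed_on_node <= node_mem_mb


if __name__ == "__main__":
    req = parse_resource_list("mem=8gb,nodes=1:ppn=4,walltime=01:00:00")
    print("per-task memory (MB):", per_task_mem_mb(req))
    # Hypothetical node: 8 cores, but slightly less than 8192 MB usable.
    print("fits on an 8-core node with 7900 MB:", fits_on_one_node(req, 8, 7900))
```
</PRE>
<DIV dir=ltr><FONT face=Arial color=#000000 size=2>If the nodes report even slightly less than 8192 MB to the scheduler, the check fails however many cores are idle, which would be consistent with the NoResources hold. Compare the memory your nodes actually report against the request before drawing conclusions.</FONT></DIV>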
<DIV dir=ltr><FONT face=Arial size=2></FONT> </DIV>
<DIV dir=ltr><FONT face=Arial size=2>--Joe</FONT></DIV></DIV>
<DIV dir=ltr><BR>
<HR tabIndex=-1>
<FONT face=Tahoma size=2><B>From:</B> mauiusers-bounces@supercluster.org on behalf of Jim Kusznir<BR><B>Sent:</B> Fri 10/3/2008 12:17 PM<BR><B>To:</B> Discussion of Rocks Clusters; mauiusers@supercluster.org<BR><B>Subject:</B> [Mauiusers] Why a rorque/maui job won't start?<BR></FONT><BR></DIV>
<DIV>
<P><FONT size=2>Hello:<BR><BR>As I looked through the job queue on my cluster, I'm finding myself<BR>mystified....I have one job that just won't start, and I can't figure<BR>out why:<BR><BR>[root@aeolus changhun]# qstat<BR>Job id Name User Time Use S Queue<BR>------------------- ---------------- --------------- -------- - -----<BR>4428.aeolus CMAQ.aug.benz ramos 21:00:00 R default<BR>4429.aeolus CMAQ.dec.benz ramos 32:31:14 R default<BR>4437.aeolus hsa_xml.sh changhun 0 Q default<BR>4442.aeolus for.chem.ga2 sledburg 2095:20: R default<BR>4483.aeolus mem2Rjob2 wdavis 258:09:4 R default<BR><BR><BR>Job 4437 caught my attention, as it appears it should have started<BR>before 4442 and 4483, both of which want way more resources than it<BR>does. In addition, at this moment I have 1 node available, and each of<BR>my nodes has 8 cores and 8GB ram. The user's job script reads:<BR><BR>[root@aeolus hsa_xml]# more hsa_xml.sh<BR>#PBS -l mem=8gb,nodes=1:ppn=4,walltime=01:00:00<BR>#PBS -m abe<BR>#PBS -M &lt;deleted&gt;<BR># copy qsub's env to the job<BR>#PBS -V<BR><BR>cd $PBS_O_WORKDIR<BR>mpirun mpi_subdue -limit 100 hsa_xml.g<BR><BR>I'm still not entirely sure what the mem= flag is supposed to set, but<BR>in any case, here's what checkjob says:<BR><BR>[root@aeolus hsa_xml]# checkjob 4437<BR><BR><BR>checking job 4437<BR><BR>State: Idle<BR>Creds: user:changhun group:changhun class:default qos:DEFAULT<BR>WallTime: 00:00:00 of 1:00:00<BR>SubmitTime: Wed Oct 1 10:21:03<BR> (Time Queued Total: 1:22:49:15 Eligible: 00:00:00)<BR><BR>Total Tasks: 4<BR><BR>Req[0] TaskCount: 4 Partition: ALL<BR>Network: [NONE] Memory >= 0 Disk >= 0 Swap >= 0<BR>Opsys: [NONE] Arch: [NONE] Features: [NONE]<BR>Dedicated Resources Per Task: PROCS: 1 MEM: 2048M<BR><BR><BR>IWD: [NONE] Executable: [NONE]<BR>Bypass: 0 StartCount: 0<BR>PartitionMask: [ALL]<BR>Flags: RESTARTABLE<BR><BR>Holds: Batch (hold reason: NoResources)<BR>Messages: cannot create reservation for job '4437' (intital<BR>reservation attempt)<BR><BR>PE: 
8.97 StartPriority: 540<BR>cannot select job 4437 for partition DEFAULT (job hold active)<BR><BR>From this, it appears it's trying to schedule 4 processes, with each<BR>process having 2 GB of RAM. Worst case is that 3 processes run on<BR>one node and the 4th on another....A node has been available several<BR>times since the job was queued. Why won't this job run?<BR><BR>I suspect if the user removes the mem= limit, it will run, but this<BR>still leaves the question as to "why."<BR><BR>--Jim<BR>_______________________________________________<BR>mauiusers mailing list<BR>mauiusers@supercluster.org<BR><A href="http://www.supercluster.org/mailman/listinfo/mauiusers">http://www.supercluster.org/mailman/listinfo/mauiusers</A><BR></FONT></P></DIV></BODY></HTML>