[Mauiusers] Maui gets stopped when submit jobs to PBS
Josh Butikofer
josh at clusterresources.com
Thu Dec 14 07:19:56 MST 2006
Berit,
Try running Maui under a debugger such as gdb to see why it is shutting down. From your
description, I would guess Maui is crashing, most likely with a segmentation fault. Instructions
on how to run Maui under gdb can be found in Section 14.6.2.1 of
http://www.clusterresources.com/products/maui/docs/14.6troubleshootingsystemerrors.shtml.
Please send the stack trace to the list so we can get this fixed.
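
Until a stack trace is available, the maui.log excerpt quoted below already gives a hint: the last
line written before the daemon stopped is MStatUpdateActiveJobUsage(10). That suggests, but does
not prove, that the failure happens in or shortly after that routine. A minimal Python sketch for
pulling the last few function-trace entries out of a log follows. The line layout is inferred from
the quoted excerpt only, and the helper name last_calls is hypothetical, not part of Maui.

```python
import re

# "MM/DD HH:MM:SS message" layout, inferred from the quoted maui.log excerpt
# (an assumption, not a documented format).
LINE_RE = re.compile(r"^(\d{2}/\d{2} \d{2}:\d{2}:\d{2})\s+(.*)$")
# A function-trace entry looks like "Name(args)" with nothing before the name.
CALL_RE = re.compile(r"^([A-Za-z_]\w*)\((.*)\)$")


def last_calls(lines, count=3):
    """Return the last `count` trace entries as (timestamp, name, args) tuples."""
    calls = []
    for raw in lines:
        text = raw.strip()
        # Drop mail quoting ("> ") so excerpts pasted from a message parse too.
        while text.startswith(">"):
            text = text[1:].lstrip()
        line = LINE_RE.match(text)
        if not line:
            continue  # wrapped continuation or non-log line: skipped
        stamp, message = line.groups()
        call = CALL_RE.match(message.strip())
        if call:  # INFO:/ALERT: messages do not match and are ignored
            calls.append((stamp, call.group(1), call.group(2)))
    return calls[-count:]


# Example with the final lines of the excerpt quoted in this message.
sample = [
    "> 12/13 16:24:06 INFO: default QOS for job 10 set to DEFAULT(0)",
    "> 12/13 16:24:06 MResJCreate(10,MNodeList,-00:00:10,ActiveJob,Res)",
    "> 12/13 16:24:06 MStatUpdateActiveJobUsage(10)",
]
for stamp, name, args in last_calls(sample, count=2):
    print(stamp, name, args)
```

Entries that the mailer wrapped onto a second line are skipped by this sketch, so it is best run
against the original maui.log rather than a pasted copy.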
Thanks,
--
Joshua Butikofer
Cluster Resources, Inc.
josh at clusterresources.com
Voice: (801) 717-3707
Fax: (801) 717-3738
--------------------------
Berit Hinnemann wrote:
> Hi all,
>
> I am new to installing Torque PBS and Maui. My system is a single dual-processor,
> dual-core server for testing purposes, where I try things out before getting the
> actual cluster. I have installed Torque PBS and this seems to work fine.
> Then I installed Maui and used the maui.cfg file below; aside from specifying
> that the queue system is PBS, I did not change anything.
>
> Now the behavior is that I can start the 'maui' daemon, issue 'showq' and see the
> queue, but when I submit a job, the maui daemon seems to stop by itself. Then,
> when I issue "showq" I get
>
> [behi at RHE4Server 1proc]$ showq
> ERROR: cannot send request to server localhost.localdomain:42559 (server may
> not be running)
> ERROR: cannot request service (status)
>
> I have appended the lines generated in maui.log below.
> The job runs fine and I can also submit several jobs, which are just done in the
> order submitted. I can also restart maui and repeat this procedure.
>
> Does anybody have an idea where I should be looking to figure out what is wrong?
> I would be grateful for any hints on how to get started.
> Best, Berit
>
> --------------------------------------
> Berit Hinnemann
> Research Scientist
> Haldor Topsøe A/S
> ---------------------------------------
> -------------------------------------------------------------------------------------------------------------------------------------
> output from maui.log upon submitting a job
> 12/13 16:23:35 INFO: scheduling complete. sleeping 30 seconds
> 12/13 16:24:06 ServerProcessRequests()
> 12/13 16:24:06 INFO: not rolling logs (585245 < 10000000)
> 12/13 16:24:06 MResAdjust(NULL,0,0)
> 12/13 16:24:06 MStatInitializeActiveSysUsage()
> 12/13 16:24:06 MStatClearUsage([NONE],Active)
> 12/13 16:24:06 ServerUpdate()
> 12/13 16:24:06 MSysUpdateTime()
> 12/13 16:24:06 INFO: starting iteration 7
> 12/13 16:24:06 MRMGetInfo()
> 12/13 16:24:06 MClusterClearUsage()
> 12/13 16:24:06 MRMClusterQuery()
> 12/13 16:24:06 MPBSClusterQuery(localhost.localdomain,RCount,SC)
> 12/13 16:24:06 __MPBSGetNodeState(Name,State,PNode)
> 12/13 16:24:06 INFO: PBS node localhost.localdomain set to state Busy
> (job-exclusive)
> 12/13 16:24:06 INFO: node 'localhost.localdomain' changed states from Idle
> to Busy
> 12/13 16:24:06 ALERT: unexpected node transition on node
> 'localhost.localdomain' Idle -> Busy
> 12/13 16:24:06
> MPBSNodeUpdate(localhost.localdomain,localhost.localdomain,Busy,localhost.localdomain)
> 12/13 16:24:06 INFO: node localhost.localdomain has joblist
> '0/10.localhost.localdomain, 1/10.localhost.localdomain,
> 2/10.localhost.localdomain, 3/10.localhost.localdomain'
> 12/13 16:24:06 ALERT: cannot locate PBS job '10.localhost.localdomain'
> (running on node localhost.localdomain)
> 12/13 16:24:06 ALERT: cannot locate PBS job '10.localhost.localdomain'
> (running on node localhost.localdomain)
> 12/13 16:24:06 ALERT: cannot locate PBS job '10.localhost.localdomain'
> (running on node localhost.localdomain)
> 12/13 16:24:06 ALERT: cannot locate PBS job '10.localhost.localdomain'
> (running on node localhost.localdomain)
> 12/13 16:24:06 MPBSLoadQueueInfo(localhost.localdomain,localhost.localdomain,SC)
> 12/13 16:24:06 INFO: queue 'batch' started state set to True
> 12/13 16:24:06 INFO: class to node not mapping enabled for queue 'batch'
> adding class to all nodes
> 12/13 16:24:06 INFO: 1 PBS resources detected on RM localhost.localdomain
> 12/13 16:24:06 INFO: resources detected: 1
> 12/13 16:24:06 MRMWorkloadQuery()
> 12/13 16:24:06 MPBSWorkloadQuery(localhost.localdomain,JCount,SC)
> 12/13 16:24:06 MPBSJobLoad(10,10.localhost.localdomain,J,TaskList,0)
> 12/13 16:24:06 MReqCreate(10,SrcRQ,DstRQ,DoCreate)
> 12/13 16:24:06 INFO: processing node request line '1:ppn=4'
> 12/13 16:24:06 MJobSetCreds(10,behi,behi,)
> 12/13 16:24:06 INFO: default QOS for job 10 set to DEFAULT(0)
> (P:DEFAULT,U:[NONE],G:[NONE],A:[NONE],C:[NONE])
> 12/13 16:24:06 INFO: default QOS for job 10 set to DEFAULT(0)
> (P:DEFAULT,U:[NONE],G:[NONE],A:[NONE],C:[NONE])
> 12/13 16:24:06 INFO: default QOS for job 10 set to DEFAULT(0)
> (P:DEFAULT,U:[NONE],G:[NONE],A:[NONE],C:[NONE])
> 12/13 16:24:06 MResJCreate(10,MNodeList,-00:00:10,ActiveJob,Res)
> 12/13 16:24:06 MStatUpdateActiveJobUsage(10)
> ---------------------------------------------------------------------------------------------------------------------------------------
> maui.cfg
> # maui.cfg 3.2.6p18
>
> SERVERHOST localhost.localdomain
> # primary admin must be first in list
> ADMIN1 root
>
> # Resource Manager Definition
>
> RMCFG[localhost.localdomain] TYPE=PBS
>
> # Allocation Manager Definition
>
> AMCFG[bank] TYPE=NONE
>
> # full parameter docs at http://supercluster.org/mauidocs/a.fparameters.html
> # use the 'schedctl -l' command to display current configuration
>
> RMPOLLINTERVAL 00:00:30
>
> SERVERPORT 42559
> SERVERMODE NORMAL
>
> # Admin: http://supercluster.org/mauidocs/a.esecurity.html
>
>
> LOGFILE maui.log
> LOGFILEMAXSIZE 10000000
> LOGLEVEL 3
>
> # Job Priority: http://supercluster.org/mauidocs/5.1jobprioritization.html
>
> QUEUETIMEWEIGHT 1
>
> # FairShare: http://supercluster.org/mauidocs/6.3fairshare.html
>
> #FSPOLICY PSDEDICATED
> #FSDEPTH 7
> #FSINTERVAL 86400
> #FSDECAY 0.80
>
> # Throttling Policies: http://supercluster.org/mauidocs/6.2throttlingpolicies.html
>
> # NONE SPECIFIED
>
> # Backfill: http://supercluster.org/mauidocs/8.2backfill.html
>
> BACKFILLPOLICY FIRSTFIT
> RESERVATIONPOLICY CURRENTHIGHEST
>
> # Node Allocation: http://supercluster.org/mauidocs/5.2nodeallocation.html
>
> NODEALLOCATIONPOLICY MINRESOURCE
>
> # QOS: http://supercluster.org/mauidocs/7.3qos.html
>
> # QOSCFG[hi] PRIORITY=100 XFTARGET=100 FLAGS=PREEMPTOR:IGNMAXJOB
> # QOSCFG[low] PRIORITY=-1000 FLAGS=PREEMPTEE
>
> # Standing Reservations: http://supercluster.org/mauidocs/7.1.3standingreservations.html
>
> # SRSTARTTIME[test] 8:00:00
> # SRENDTIME[test] 17:00:00
> # SRDAYS[test] MON TUE WED THU FRI
> # SRTASKCOUNT[test] 20
> # SRMAXTIME[test] 0:30:00
>
> # Creds: http://supercluster.org/mauidocs/6.1fairnessoverview.html
>
> # USERCFG[DEFAULT] FSTARGET=25.0
> # USERCFG[john] PRIORITY=100 FSTARGET=10.0-
> # GROUPCFG[staff] PRIORITY=1000 QLIST=hi:low QDEF=hi
> # CLASSCFG[batch] FLAGS=PREEMPTEE
> # CLASSCFG[interactive] FLAGS=PREEMPTOR
>
>
>