[Mauiusers] Maui vs pbs_sched
Andrus, Mr. Brian (Contractor)
brian.andrus at nrlmry.navy.mil
Mon Oct 29 12:29:27 MDT 2007
I am still having great trouble with this.
My maui.cfg:
---------------------
# MAUI configuration example
# @(#)maui.cfg David Groep 20031015.1
# for MAUI version 3.2.5
#
SERVERHOST cluster0
TYPE=PBS
PORT=15001
EPORT=15004
# Set PBS server polling interval. Since we have many short jobs
# and want fast turn-around, set this to 10 seconds (default: 2 minutes)
RMPOLLINTERVAL 00:00:10
# a max. 10 MByte log file in a logical location
LOGFILE /var/log/maui.log
LOGFILEMAXSIZE 10000000
LOGLEVEL 3
JOBNODEMATCHPOLICY EXACTNODE
NODEALLOCATIONPOLICY PRIORITY
NODECFG[DEFAULT] PRIORITY='- JOBCOUNT'
SCHEDCFG[base] SERVER=cluster0:42559
ADMINCFG[1] USERS=root
RMCFG[base] TYPE=PBS
RMCFG[base] SBINDIR=/opt/torque/sbin
---------------------
My script:
-------------------
#!/bin/bash
#PBS -j oe
#PBS -l nodes=8:ppn=1
#PBS -W x=NACCESSPOLICY:SINGLEJOB
#PBS -N TestJob
#PBS -q medium
#PBS -o output.txt
date
mpiexec --bynode /data/andrus/hello
sleep 10
---------------------
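A minimal sketch, not part of the script above, of how the job itself could report which hosts it was given. It assumes Torque sets PBS_NODEFILE inside the job with one line per allocated slot; the function name summarize_nodefile is hypothetical and used only for illustration.

```shell
#!/bin/bash
# Hypothetical helper (not in the original script): summarize the node file
# that Torque passes to a job via PBS_NODEFILE (one line per allocated slot).
summarize_nodefile() {
    # $1: path to a node file; prints "<slots> <host>" for each distinct host
    sort "$1" | uniq -c | awk '{print $1, $2}'
}

# Inside a job PBS_NODEFILE is set by Torque; outside a job this is skipped.
if [ -n "${PBS_NODEFILE:-}" ] && [ -r "$PBS_NODEFILE" ]; then
    summarize_nodefile "$PBS_NODEFILE"
fi
```

Dropped in before the mpiexec line, this would write the per-host slot counts into output.txt, so the allocation can be compared with the nodes=8:ppn=1 request.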
My qstat -f:
-------------------------
Job Id: 2623.cluster0.default.domain
Job_Name = TestJob
Job_Owner = andrus@cluster0.default.domain
job_state = R
queue = medium
server = cluster0.default.domain
Checkpoint = u
ctime = Mon Oct 29 11:27:45 2007
Error_Path =
cluster0.default.domain:/users/andrus/data/TestJob.e2623
exec_host = n1/1+n1/0
Hold_Types = n
Join_Path = oe
Keep_Files = n
Mail_Points = a
mtime = Mon Oct 29 11:27:46 2007
Output_Path = cluster0:/users/andrus/data/output.txt
Priority = 0
qtime = Mon Oct 29 11:27:45 2007
Rerunable = True
Resource_List.mem = 768mb
Resource_List.ncpus = 2
Resource_List.nodect = 8
Resource_List.nodes = 8:ppn=1
Resource_List.walltime = 04:00:00
session_id = 25489
Variable_List = PBS_O_HOME=/users/andrus,PBS_O_LANG=en_US.UTF-8,
PBS_O_LOGNAME=andrus,
PBS_O_PATH=/opt/cwx/bin:/usr/totalview/bin/:/opt/torque/bin:/opt/pgi/
linux86-64/7.0-7/bin:/usr/kerberos/bin:/usr/local/bin:/bin:/usr/bin:/u
sr/X11R6/bin:/opt/gm/bin:/usr/local/ncarg/bin:/usr/openv/netbackup/bin
:/users/andrus/bin,PBS_O_MAIL=/var/spool/mail/andrus,
PBS_O_SHELL=/bin/bash,PBS_O_HOST=cluster0.default.domain,
PBS_O_WORKDIR=/users/andrus/data,PBS_O_QUEUE=medium
etime = Mon Oct 29 11:27:45 2007
x = NACCESSPOLICY:SINGLEJOB
-------------------------
Notice that the job is executing only on n1/1+n1/0. I am requesting 8
nodes with one processor each, and Resource_List.nodect shows 8, but both
slots are on the single node n1. If I use the default Torque scheduler
(pbs_sched), it works as expected.
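The mismatch can be checked mechanically. The following is a minimal sketch, assuming exec_host is a "+"-separated list of host/slot entries as in the qstat output above; the function name count_exec_hosts is hypothetical.

```shell
#!/bin/bash
# Hypothetical helper: count the distinct hosts in a Torque exec_host string
# such as "n1/1+n1/0" (entries are host/slot, joined with "+").
count_exec_hosts() {
    printf '%s\n' "$1" | tr '+' '\n' | cut -d/ -f1 | sort -u | wc -l | tr -d ' '
}

# The exec_host value from the job above; prints 1
count_exec_hosts "n1/1+n1/0"
```

Comparing that result with Resource_List.nodect = 8 shows the gap between what was requested and what was allocated.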
Does anyone have any ideas?
Brian Andrus perotsystems
Site Manager | Sr. Computer Scientist
Naval Research Lab
7 Grace Hopper Ave, Monterey, CA 93943
Phone (831) 656-4839 | Fax (831) 656-4866