[Moabusers] moab eval: cannot migrate job to PBS
Marcelo Maia Garcia
marcelomgarcia at gmail.com
Wed Feb 14 09:41:43 MST 2007
Hi.
The OS is Red Hat EL AS 4 without updates. I downloaded the file
"moab-5.0.0-i386.tar.gz".
I installed Moab and now I am trying to submit a job, but I get the
following error:
=========================
[mgarcia at node1 ~]$ msub mysub
ERROR: cannot migrate job to PBS - cannot set grouplist, err=Operation not permitted
[mgarcia at node1 ~]$ more mysub
#!/bin/bash
/bin/hostname
=========================
This job works fine when I submit it using Torque-2.1.6.
My moab.cfg is:
=========================
[mgarcia at node1 ~]$ more /opt/moab/moab.cfg
# This is the master configuration file for moab.cfg 5.0.0
# Documentation can be found at:
#
# www.clusterresources.com/products/mwm/docs/moabadmin.shtml
#
# For a complete list of all parameters (including those below) please see:
#
# www.clusterresources.com/products/mwm/docs/a.fparameters.shtml
###############################################################################
#
#
# See: www.clusterresources.com/products/mwm/docs/2.2initialconfig.shtml
#
# for more information on the initial configuration.
#
#
###############################################################################
SCHEDCFG[Moab] SERVER=node1.ocf.co.uk:42559
#SCHEDCFG[Moab] MODE=MONITOR
ADMINCFG[1] USERS=mgarcia,root
###############################################################################
#
#
# See: www.clusterresources.com/products/mwm/docs/13.2rmconfiguration.shtml
#
# for more information on configuring a Resource Manager.
#
#
###############################################################################
RMCFG[base] TYPE=PBS
RMCFG[base] SBINDIR=/usr/local/torque-2.1.6/sbin
SRCFG[base] GROUPLIST=users
GROUPCFG[users] MAXJOB=50
[mgarcia at node1 ~]$
=========================
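Since most of the file above is comments, a small filter makes the settings that are actually in effect easier to see. This is only a sketch: the function name `active_directives` is made up for illustration and is not part of Moab, and it assumes that comment lines begin with `#` as they do in the listing.

```shell
# active_directives: hypothetical helper, not a Moab command.
# Reads a config file on stdin and prints only the lines that are
# neither blank nor comments (comments are assumed to start with '#').
active_directives() {
    grep -v -e '^[[:space:]]*#' -e '^[[:space:]]*$'
}

# Usage, with the path taken from the listing above:
#   active_directives < /opt/moab/moab.cfg
```

Run against the file shown, this should leave just the SCHEDCFG, ADMINCFG, RMCFG, SRCFG and GROUPCFG lines.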
In the moab.log I have the following messages:
=========================
(...)
02/14 10:23:56 INFO: queue is empty or cannot get PBS job info
10:24:05 1171448646 sched Moab SCHEDSTOP 15
02/14 10:24:11 ERROR: cannot update lockfile '*/opt/moab/*.moab.pid', errno: 13 (Permission denied)
02/14 10:24:11 INFO: OS stack limits increased from 10 MB to 4095 MB (use 'ulimit' to adjust)
02/14 10:24:11 WARNING: cannot open statfile '/opt/moab/stats/events.Wed_Feb_14_2007', errno: 13 (Permission denied)
02/14 10:24:12 WARNING: cannot bind to port 15004, errno: 98 (Address already in use)
02/14 10:24:12 WARNING: cannot create statfile '/opt/moab/stats/DAY.Tue_Feb_13_2007'
02/14 10:24:12 WARNING: cannot record MONTH stats
02/14 10:24:12 INFO: queue is empty or cannot get PBS job info
02/14 10:24:20 ALERT: no job ID detected
02/14 10:24:43 INFO: queue is empty or cannot get PBS job info
(...)
=========================
It seems that Moab is not interacting with my Torque installation.
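To pull the problem entries out of a longer moab.log, a filter along these lines may help. It is a rough sketch, not a Moab tool: the function names are hypothetical, and it assumes each entry in the real log file sits on one line in the `MM/DD HH:MM:SS LEVEL: message` form seen in the excerpt.

```shell
# Hypothetical helpers for reading moab.log; the names are illustrative.
# Assumption: one entry per line, "MM/DD HH:MM:SS LEVEL: message".

# Print only the ERROR, WARNING and ALERT entries from stdin.
log_problems() {
    grep -E '^[0-9]{2}/[0-9]{2} [0-9:]{8} (ERROR|WARNING|ALERT):'
}

# Print the single-quoted path from every entry that reports errno 13
# (Permission denied).
permission_denied_paths() {
    grep 'errno: 13' | sed -n "s/.*'\([^']*\)'.*/\1/p"
}

# Usage (the location of moab.log depends on the installation):
#   log_problems < moab.log
#   permission_denied_paths < moab.log
```

The second function lists the files the scheduler process could not write, which can then be compared with the ownership shown by `ls -l`.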
When I try to submit a job, the message in the log file is
=========================
(...)
02/14 10:50:33 INFO: queue is empty or cannot get PBS job info
02/14 10:51:03 ALERT: no job ID detected
02/14 10:51:03 WARNING: cannot set job 'Moab.1' attr 'comment:NULL' to 'cannot set grouplist, err=Operation not permitted' (rc: -1 'modification of specified attribute not supported')
02/14 10:51:04 INFO: queue is empty or cannot get PBS job info
(...)
=========================
I think the installation is ok:
=========================
[mgarcia at node1 ~]$ showq
active jobs------------------------
JOBID              USERNAME      STATE  PROC   REMAINING            STARTTIME

0 active jobs      0 of 1 processors in use by local jobs (0.00%)

eligible jobs----------------------
JOBID              USERNAME      STATE  PROC     WCLIMIT            QUEUETIME

0 eligible jobs

blocked jobs-----------------------
JOBID              USERNAME      STATE  PROC     WCLIMIT            QUEUETIME

0 blocked jobs

Total jobs: 0
[mgarcia at node1 ~]$
=========================
My Torque configuration is:
=========================
[mgarcia at node1 ~]$ qmgr -c "print server"
#
# Create queues and set their attributes.
#
#
# Create and define queue batch
#
create queue batch
set queue batch queue_type = Execution
set queue batch acl_host_enable = True
set queue batch acl_hosts = node1.ocf.co.uk+node2.ocf.co.uk
set queue batch acl_user_enable = True
set queue batch acl_users = mgarcia@node1.ocf.co.uk
set queue batch resources_default.nodes = 1
set queue batch resources_default.walltime = 01:00:00
set queue batch acl_group_enable = True
set queue batch acl_groups = users
set queue batch keep_completed = 20
set queue batch enabled = True
set queue batch started = True
#
# Set server attributes.
#
set server scheduling = True
set server managers = mgarcia@node1.ocf.co.uk
set server operators = mgarcia@node1.ocf.co.uk
set server default_queue = batch
set server log_events = 511
set server mail_from = adm
set server scheduler_iteration = 600
set server node_check_rate = 150
set server tcp_timeout = 6
set server pbs_version = 2.1.6
set server submit_hosts = node1
set server submit_hosts += node2
[mgarcia at node1 ~]$
=========================
What could be wrong?
Thanks for your attention
Marcelo M. Garcia