[Moabusers] moab eval: cannot migrate job to PBS

Marcelo Maia Garcia marcelomgarcia at gmail.com
Wed Feb 14 09:41:43 MST 2007


Hi.

 The OS is Red Hat EL AS 4 without updates. I downloaded the file
"moab-5.0.0-i386.tar.gz".

 I installed Moab and am now trying to submit a job, but I get the
following error:
=========================
[mgarcia@node1 ~]$ msub mysub
ERROR:    cannot migrate job to PBS - cannot set grouplist, err=Operation not permitted

[mgarcia@node1 ~]$ more mysub
#!/bin/bash

/bin/hostname
=========================
This job works fine when I submit it using Torque 2.1.6.

My moab.cfg is:
=========================
[mgarcia@node1 ~]$ more /opt/moab/moab.cfg
# This is the master configuration file for moab.cfg 5.0.0
# Documentation can be found at:
#
# www.clusterresources.com/products/mwm/docs/moabadmin.shtml
#
# For a complete list of all parameters (including those below) please see:
#
# www.clusterresources.com/products/mwm/docs/a.fparameters.shtml

###############################################################################
#
#  See: www.clusterresources.com/products/mwm/docs/2.2initialconfig.shtml
#
#  for more information on the initial configuration.
#
###############################################################################


SCHEDCFG[Moab]          SERVER=node1.ocf.co.uk:42559
#SCHEDCFG[Moab]         MODE=MONITOR
ADMINCFG[1]             USERS=mgarcia,root

###############################################################################
#
#  See: www.clusterresources.com/products/mwm/docs/13.2rmconfiguration.shtml
#
#  for more information on configuring a Resource Manager.
#
###############################################################################


RMCFG[base]             TYPE=PBS
RMCFG[base]             SBINDIR=/usr/local/torque-2.1.6/sbin

SRCFG[base]             GROUPLIST=users
GROUPCFG[users]         MAXJOB=50
[mgarcia@node1 ~]$
=========================

 In the moab.log I have the following messages:
=========================
(...)
02/14 10:23:56 INFO:     queue is empty or cannot get PBS job info
10:24:05 1171448646 sched    Moab         SCHEDSTOP    15
02/14 10:24:11 ERROR:    cannot update lockfile '/opt/moab/.moab.pid', errno: 13 (Permission denied)
02/14 10:24:11 INFO:     OS stack limits increased from 10 MB to 4095 MB (use 'ulimit' to adjust)
02/14 10:24:11 WARNING:  cannot open statfile '/opt/moab/stats/events.Wed_Feb_14_2007', errno: 13 (Permission denied)
02/14 10:24:12 WARNING:  cannot bind to port 15004, errno: 98 (Address already in use)
02/14 10:24:12 WARNING:  cannot create statfile '/opt/moab/stats/DAY.Tue_Feb_13_2007'
02/14 10:24:12 WARNING:  cannot record MONTH stats
02/14 10:24:12 INFO:     queue is empty or cannot get PBS job info
02/14 10:24:20 ALERT:    no job ID detected
02/14 10:24:43 INFO:     queue is empty or cannot get PBS job info
(...)
=========================
It seems that Moab is not interacting with my Torque installation.
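
The two errno 13 entries above point at files under /opt/moab that the
scheduler could not write, and errno 98 suggests something was already
listening on port 15004. A minimal sketch for pulling the affected paths out
of a log excerpt follows. The helper name denied_paths and the sample text are
only for illustration, and the sketch assumes each log entry sits on a single
line, as it would in the real moab.log.

```shell
# Print the quoted paths from log entries that failed with errno 13
# (Permission denied). Reads log text on stdin.
denied_paths() {
  grep 'errno: 13' | grep -o "'[^']*'" | tr -d "'" | sort -u
}

# Sample entries copied from the log excerpt above (joined onto one line each).
sample_log="02/14 10:24:11 WARNING:  cannot open statfile '/opt/moab/stats/events.Wed_Feb_14_2007', errno: 13 (Permission denied)
02/14 10:24:12 WARNING:  cannot bind to port 15004, errno: 98 (Address already in use)"

# Only the statfile entry carries errno 13, so only its path is printed.
printf '%s\n' "$sample_log" | denied_paths
```

Each path printed can then be passed to "ls -ld" and compared with the user
that started moab. Whether these permission errors are related to the msub
failure is not established by the log itself.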

When I try to submit a job, the message in the log file is:
=========================
(...)
02/14 10:50:33 INFO:     queue is empty or cannot get PBS job info
02/14 10:51:03 ALERT:    no job ID detected
02/14 10:51:03 WARNING:  cannot set job 'Moab.1' attr 'comment:NULL' to 'cannot set grouplist, err=Operation not permitted' (rc: -1 'modification of specified attribute not supported')
02/14 10:51:04 INFO:     queue is empty or cannot get PBS job info
(...)
=========================
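
"Operation not permitted" is the standard text for EPERM, which is what
setgroups(2) returns to a process that lacks the privilege to change its group
list. One possible reading, offered as an assumption rather than a confirmed
diagnosis, is that the moab daemon is not running as root and so cannot take
on the submitting user's groups when handing the job to Torque. The sketch
below shows one way to see which user owns the daemon. The helper moab_owner
is hypothetical, and the sample input is made up for illustration.

```shell
# Print the user(s) owning processes named "moab", given "user command"
# pairs on stdin, as produced by: ps -eo user=,comm=
moab_owner() {
  awk '$2 == "moab" { print $1 }' | sort -u
}

# Live check on the scheduler host:
#   ps -eo user=,comm= | moab_owner
# Illustration with made-up input (not taken from the system above):
printf '%s\n' 'root     pbs_server' 'someuser moab' | moab_owner
```

If the live check prints a non-root user, that would be consistent with both
the grouplist error and the "Permission denied" entries, but it does not rule
out other causes.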

 I think the installation is OK:
=========================
[mgarcia@node1 ~]$ showq

active jobs------------------------
JOBID              USERNAME      STATE  PROC   REMAINING            STARTTIME


0 active jobs               0 of 1 processors in use by local jobs (0.00%)

eligible jobs----------------------
JOBID              USERNAME      STATE  PROC     WCLIMIT            QUEUETIME


0 eligible jobs

blocked jobs-----------------------
JOBID              USERNAME      STATE  PROC     WCLIMIT            QUEUETIME


0 blocked jobs

Total jobs:  0

[mgarcia@node1 ~]$
=========================

 My Torque configuration is:
=========================
[mgarcia@node1 ~]$ qmgr -c "print server"
#
# Create queues and set their attributes.
#
#
# Create and define queue batch
#
create queue batch
set queue batch queue_type = Execution
set queue batch acl_host_enable = True
set queue batch acl_hosts = node1.ocf.co.uk+node2.ocf.co.uk
set queue batch acl_user_enable = True
set queue batch acl_users = mgarcia@node1.ocf.co.uk
set queue batch resources_default.nodes = 1
set queue batch resources_default.walltime = 01:00:00
set queue batch acl_group_enable = True
set queue batch acl_groups = users
set queue batch keep_completed = 20
set queue batch enabled = True
set queue batch started = True
#
# Set server attributes.
#
set server scheduling = True
set server managers = mgarcia@node1.ocf.co.uk
set server operators = mgarcia@node1.ocf.co.uk
set server default_queue = batch
set server log_events = 511
set server mail_from = adm
set server scheduler_iteration = 600
set server node_check_rate = 150
set server tcp_timeout = 6
set server pbs_version = 2.1.6
set server submit_hosts = node1
set server submit_hosts += node2
[mgarcia@node1 ~]$
=========================
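
Because the batch queue above sets acl_group_enable = True with
acl_groups = users, it may be worth confirming that the submitting user really
is in that group. A small sketch, assuming a POSIX shell: in_group_list is a
hypothetical helper that tests a space-separated group list such as the output
of "id -Gn".

```shell
# Succeed if group name $1 appears as a whole word in the
# space-separated group list $2.
in_group_list() {
  case " $2 " in
    *" $1 "*) return 0 ;;
    *)        return 1 ;;
  esac
}

# Check the current user's groups on the submit host.
if in_group_list users "$(id -Gn)"; then
  echo "member"
else
  echo "not a member"
fi
```

The padding with spaces keeps a name such as "superusers" from matching
"users" by accident.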

 What could be wrong?

 Thanks for your attention.

Marcelo M. Garcia