[Moabusers] moab eval: cannot migrate job to PBS
Douglas Wightman
wightman at clusterresources.com
Wed Feb 14 12:46:38 MST 2007
Change the following line in the moab.cfg file:
ADMINCFG[1] USERS=mgarcia,root
to:
ADMINCFG[1] USERS=root,mgarcia
Then start Moab as root.
Please let us know if this does not fix the problem.
- Douglas
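
The edit suggested above can be sketched in code. This is a minimal, hedged example: it assumes, as the reply implies, that the order of names in ADMINCFG[1] USERS matters and that root should come first. The function name make_primary_admin is hypothetical and not part of Moab; it works on the text of the config file only.

```python
import re

# Matches an uncommented "ADMINCFG[1] USERS=a,b,c" setting at the start of a line.
# Lines beginning with '#' are left alone because '#' precedes the keyword.
ADMIN_LINE = re.compile(r"^([ \t]*ADMINCFG\[1\][ \t]+USERS=)(\S+)", re.MULTILINE)


def make_primary_admin(cfg_text, primary="root"):
    """Return cfg_text with `primary` listed first in ADMINCFG[1] USERS.

    Assumption: user names are comma-separated, as in the quoted moab.cfg.
    If `primary` is not in the list yet, it is added at the front.
    """
    def reorder(match):
        users = [u for u in match.group(2).split(",") if u]
        # Keep the remaining users in their original order.
        rest = [u for u in users if u != primary]
        return match.group(1) + ",".join([primary] + rest)

    return ADMIN_LINE.sub(reorder, cfg_text)
```

To use it, read the file named in the original message (/opt/moab/moab.cfg), pass its contents through make_primary_admin, keep a backup copy, and write the result back before restarting Moab as root.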
On Wed, 2007-02-14 at 16:41 +0000, Marcelo Maia Garcia wrote:
> Hi.
>
> The OS is Red Hat EL AS 4 without updates. I downloaded the file
> "moab-5.0.0-i386.tar.gz".
>
> I installed Moab and am now trying to submit a job, but I get the
> following error:
> =========================
> [mgarcia@node1 ~]$ msub mysub
> ERROR: cannot migrate job to PBS - cannot set grouplist, err=Operation not permitted
>
> [mgarcia@node1 ~]$ more mysub
> #!/bin/bash
>
> /bin/hostname
> =========================
> This job works fine when I submit it using Torque-2.1.6.
>
> My moab.cfg is
> =========================
> [mgarcia@node1 ~]$ more /opt/moab/moab.cfg
> # This is the master configuration file for moab.cfg 5.0.0
> # Documentation can be found at:
> #
> # www.clusterresources.com/products/mwm/docs/moabadmin.shtml
> #
> # For a complete list of all parameters (including those below) please see:
> #
> # www.clusterresources.com/products/mwm/docs/a.fparameters.shtml
>
> ###############################################################################
> #
> #
> # See: www.clusterresources.com/products/mwm/docs/2.2initialconfig.shtml
> #
> # for more information on the initial configuration.
> #
> #
> #
> ###############################################################################
>
> SCHEDCFG[Moab] SERVER=node1.ocf.co.uk:42559
> #SCHEDCFG[Moab] MODE=MONITOR
> ADMINCFG[1] USERS=mgarcia,root
>
> ###############################################################################
> #
> #
> # See: www.clusterresources.com/products/mwm/docs/13.2rmconfiguration.shtml
> #
> # for more information on configuring a Resource Manager.
> #
> #
> #
> ###############################################################################
>
> RMCFG[base] TYPE=PBS
> RMCFG[base] SBINDIR=/usr/local/torque-2.1.6/sbin
>
> SRCFG[base] GROUPLIST=users
> GROUPCFG[users] MAXJOB=50
> [mgarcia@node1 ~]$
> =========================
>
> In the moab.log I have the following messages:
> =========================
> (...)
> 02/14 10:23:56 INFO: queue is empty or cannot get PBS job info
> 10:24:05 1171448646 sched Moab SCHEDSTOP 15
> 02/14 10:24:11 ERROR: cannot update lockfile '/opt/moab/.moab.pid', errno: 13 (Permission denied)
> 02/14 10:24:11 INFO: OS stack limits increased from 10 MB to 4095 MB (use 'ulimit' to adjust)
> 02/14 10:24:11 WARNING: cannot open statfile '/opt/moab/stats/events.Wed_Feb_14_2007', errno: 13 (Permission denied)
> 02/14 10:24:12 WARNING: cannot bind to port 15004, errno: 98 (Address already in use)
> 02/14 10:24:12 WARNING: cannot create statfile '/opt/moab/stats/DAY.Tue_Feb_13_2007'
> 02/14 10:24:12 WARNING: cannot record MONTH stats
> 02/14 10:24:12 INFO: queue is empty or cannot get PBS job info
> 02/14 10:24:20 ALERT: no job ID detected
> 02/14 10:24:43 INFO: queue is empty or cannot get PBS job info
> (...)
> =========================
> It seems that Moab is not interacting with my Torque installation.
>
> When I try to submit a job, the message in the log file is
> =========================
> (...)
> 02/14 10:50:33 INFO: queue is empty or cannot get PBS job info
> 02/14 10:51:03 ALERT: no job ID detected
> 02/14 10:51:03 WARNING: cannot set job 'Moab.1' attr 'comment:NULL' to 'cannot set grouplist, err=Operation not permitted' (rc: -1 'modification of specified attribute not supported')
> 02/14 10:51:04 INFO: queue is empty or cannot get PBS job info
> (...)
> =========================
>
> I think the installation is ok:
> =========================
> [mgarcia@node1 ~]$ showq
>
> active jobs------------------------
> JOBID    USERNAME    STATE    PROC    REMAINING    STARTTIME
>
>
> 0 active jobs    0 of 1 processors in use by local jobs (0.00%)
>
> eligible jobs----------------------
> JOBID    USERNAME    STATE    PROC    WCLIMIT    QUEUETIME
>
>
> 0 eligible jobs
>
> blocked jobs-----------------------
> JOBID    USERNAME    STATE    PROC    WCLIMIT    QUEUETIME
>
>
> 0 blocked jobs
>
> Total jobs: 0
>
> [mgarcia@node1 ~]$
> =========================
>
> My Torque configuration is:
> =========================
> [mgarcia@node1 ~]$ qmgr -c "print server"
> #
> # Create queues and set their attributes.
> #
> #
> # Create and define queue batch
> #
> create queue batch
> set queue batch queue_type = Execution
> set queue batch acl_host_enable = True
> set queue batch acl_hosts = node1.ocf.co.uk+node2.ocf.co.uk
> set queue batch acl_user_enable = True
> set queue batch acl_users = mgarcia@node1.ocf.co.uk
> set queue batch resources_default.nodes = 1
> set queue batch resources_default.walltime = 01:00:00
> set queue batch acl_group_enable = True
> set queue batch acl_groups = users
> set queue batch keep_completed = 20
> set queue batch enabled = True
> set queue batch started = True
> #
> # Set server attributes.
> #
> set server scheduling = True
> set server managers = mgarcia@node1.ocf.co.uk
> set server operators = mgarcia@node1.ocf.co.uk
> set server default_queue = batch
> set server log_events = 511
> set server mail_from = adm
> set server scheduler_iteration = 600
> set server node_check_rate = 150
> set server tcp_timeout = 6
> set server pbs_version = 2.1.6
> set server submit_hosts = node1
> set server submit_hosts += node2
> [mgarcia@node1 ~]$
> =========================
>
> What could be wrong?
>
> Thanks for your attention
>
> Marcelo M. Garcia
>
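
For triaging a moab.log excerpt like the one quoted in the original message, a small script can separate the entries and pick out those that report errno 13 (Permission denied). The sketch below is not a Moab tool: the entry pattern ("MM/DD HH:MM:SS LEVEL:") is inferred only from the lines quoted in this thread, and the function names are hypothetical. It expects the log text without the "> " quote prefixes.

```python
import re

# Start of an entry as it appears in the quoted excerpt, e.g. "02/14 10:24:11 ERROR:".
# Assumption: only these four severity labels occur.
ENTRY = re.compile(r"(\d{2}/\d{2} \d{2}:\d{2}:\d{2}) (INFO|WARNING|ERROR|ALERT):")


def split_entries(log_text):
    """Return (timestamp, level, message) tuples from log text.

    Whitespace is collapsed first, so entries that were wrapped or run
    together onto shared lines are still separated. Lines that do not
    start with the pattern above stay attached to the preceding entry.
    """
    flat = " ".join(log_text.split())
    matches = list(ENTRY.finditer(flat))
    entries = []
    for i, m in enumerate(matches):
        end = matches[i + 1].start() if i + 1 < len(matches) else len(flat)
        entries.append((m.group(1), m.group(2), flat[m.end():end].strip()))
    return entries


def errno_of(message):
    """Return the errno number mentioned in a message, or None."""
    found = re.search(r"errno: (\d+)", message)
    return int(found.group(1)) if found else None


def permission_errors(entries):
    """Keep only entries reporting errno 13 (Permission denied in the excerpt)."""
    return [e for e in entries if errno_of(e[2]) == 13]
```

Run against the excerpt in this thread, permission_errors(split_entries(text)) would return the lockfile and statfile entries, which are the ones worth checking against the user that Moab was started as.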