<div class="gmail_quote">Hi,</div><div class="gmail_quote"><br></div><div class="gmail_quote">I sent an email to <meta http-equiv="content-type" content="text/html; charset=utf-8"><a href="mailto:mauiusers@supercluster.org">mauiusers@supercluster.org</a> but I think my issue is more about Torque then Maui so I&#39;m forwarding it here. Also, I discovered something else while trying to diagnose my problem.<br>
<div><br></div><div>I&#39;ve been trying to setup a Torque+Maui without success.</div><div><br></div><div>The cluster here is composed of 70 nodes connected with InfiniBand (IB) and 5 nodes without IB but better processors. Each node have 2 quad-core processors with hyperthreading enabled for a total of 16 cores per node.</div>

<div><br></div><div>I want to setup a queue to run on the non-IB nodes which I call the supernodes. There is 80 cores on these. Here is Torque&#39;s queue config:</div><div><blockquote class="gmail_quote" style="margin-top:0px;margin-right:0px;margin-bottom:0px;margin-left:0.8ex;border-left-width:1px;border-left-color:rgb(204, 204, 204);border-left-style:solid;padding-left:1ex">

set queue supernodes queue_type = Execution</blockquote><blockquote class="gmail_quote" style="margin-top:0px;margin-right:0px;margin-bottom:0px;margin-left:0.8ex;border-left-width:1px;border-left-color:rgb(204, 204, 204);border-left-style:solid;padding-left:1ex">

set queue supernodes resources_available.nodes = 5</blockquote><blockquote class="gmail_quote" style="margin-top:0px;margin-right:0px;margin-bottom:0px;margin-left:0.8ex;border-left-width:1px;border-left-color:rgb(204, 204, 204);border-left-style:solid;padding-left:1ex">

set queue supernodes resources_default.neednodes = supernode</blockquote><blockquote class="gmail_quote" style="margin-top:0px;margin-right:0px;margin-bottom:0px;margin-left:0.8ex;border-left-width:1px;border-left-color:rgb(204, 204, 204);border-left-style:solid;padding-left:1ex">

set queue supernodes resources_default.walltime = 00:10:00</blockquote><blockquote class="gmail_quote" style="margin-top:0px;margin-right:0px;margin-bottom:0px;margin-left:0.8ex;border-left-width:1px;border-left-color:rgb(204, 204, 204);border-left-style:solid;padding-left:1ex">

set queue supernodes enabled = True</blockquote><blockquote class="gmail_quote" style="margin-top:0px;margin-right:0px;margin-bottom:0px;margin-left:0.8ex;border-left-width:1px;border-left-color:rgb(204, 204, 204);border-left-style:solid;padding-left:1ex">

set queue supernodes started = True</blockquote><blockquote class="gmail_quote" style="margin-top:0px;margin-right:0px;margin-bottom:0px;margin-left:0.8ex;border-left-width:1px;border-left-color:rgb(204, 204, 204);border-left-style:solid;padding-left:1ex">

set server scheduling = True</blockquote><blockquote class="gmail_quote" style="margin-top:0px;margin-right:0px;margin-bottom:0px;margin-left:0.8ex;border-left-width:1px;border-left-color:rgb(204, 204, 204);border-left-style:solid;padding-left:1ex">

set server acl_hosts = <a href="http://mydomain.com" target="_blank">mydomain.com</a></blockquote><blockquote class="gmail_quote" style="margin-top:0px;margin-right:0px;margin-bottom:0px;margin-left:0.8ex;border-left-width:1px;border-left-color:rgb(204, 204, 204);border-left-style:solid;padding-left:1ex">

set server operators  = <a href="mailto:root@mydomain.com" target="_blank">root@mydomain.com</a></blockquote><blockquote class="gmail_quote" style="margin-top:0px;margin-right:0px;margin-bottom:0px;margin-left:0.8ex;border-left-width:1px;border-left-color:rgb(204, 204, 204);border-left-style:solid;padding-left:1ex">

set server operators += <a href="mailto:me@mydomain.com" target="_blank">me@mydomain.com</a></blockquote><blockquote class="gmail_quote" style="margin-top:0px;margin-right:0px;margin-bottom:0px;margin-left:0.8ex;border-left-width:1px;border-left-color:rgb(204, 204, 204);border-left-style:solid;padding-left:1ex">

set server default_queue = supernodes</blockquote><blockquote class="gmail_quote" style="margin-top:0px;margin-right:0px;margin-bottom:0px;margin-left:0.8ex;border-left-width:1px;border-left-color:rgb(204, 204, 204);border-left-style:solid;padding-left:1ex">

set server log_events = 511</blockquote><blockquote class="gmail_quote" style="margin-top:0px;margin-right:0px;margin-bottom:0px;margin-left:0.8ex;border-left-width:1px;border-left-color:rgb(204, 204, 204);border-left-style:solid;padding-left:1ex">

set server mail_from = adm</blockquote><blockquote class="gmail_quote" style="margin-top:0px;margin-right:0px;margin-bottom:0px;margin-left:0.8ex;border-left-width:1px;border-left-color:rgb(204, 204, 204);border-left-style:solid;padding-left:1ex">

set server resources_available.ncpus = 1200</blockquote><blockquote class="gmail_quote" style="margin-top:0px;margin-right:0px;margin-bottom:0px;margin-left:0.8ex;border-left-width:1px;border-left-color:rgb(204, 204, 204);border-left-style:solid;padding-left:1ex">

set server resources_available.nodect = 75</blockquote><blockquote class="gmail_quote" style="margin-top:0px;margin-right:0px;margin-bottom:0px;margin-left:0.8ex;border-left-width:1px;border-left-color:rgb(204, 204, 204);border-left-style:solid;padding-left:1ex">

set server scheduler_iteration = 600</blockquote><blockquote class="gmail_quote" style="margin-top:0px;margin-right:0px;margin-bottom:0px;margin-left:0.8ex;border-left-width:1px;border-left-color:rgb(204, 204, 204);border-left-style:solid;padding-left:1ex">

set server node_check_rate = 150</blockquote><blockquote class="gmail_quote" style="margin-top:0px;margin-right:0px;margin-bottom:0px;margin-left:0.8ex;border-left-width:1px;border-left-color:rgb(204, 204, 204);border-left-style:solid;padding-left:1ex">

set server tcp_timeout = 6</blockquote><blockquote class="gmail_quote" style="margin-top:0px;margin-right:0px;margin-bottom:0px;margin-left:0.8ex;border-left-width:1px;border-left-color:rgb(204, 204, 204);border-left-style:solid;padding-left:1ex">

set server auto_node_np = False</blockquote><blockquote class="gmail_quote" style="margin-top:0px;margin-right:0px;margin-bottom:0px;margin-left:0.8ex;border-left-width:1px;border-left-color:rgb(204, 204, 204);border-left-style:solid;padding-left:1ex">

set server next_job_number = 90842</blockquote></div><div><br></div><div>The node file &quot;/var/spool/torque/server_priv/nodes&quot; contains (for the super nodes):</div><div><blockquote class="gmail_quote" style="margin-top:0px;margin-right:0px;margin-bottom:0px;margin-left:0.8ex;border-left-width:1px;border-left-color:rgb(204, 204, 204);border-left-style:solid;padding-left:1ex">

node101 np=16 supernode</blockquote><blockquote class="gmail_quote" style="margin-top:0px;margin-right:0px;margin-bottom:0px;margin-left:0.8ex;border-left-width:1px;border-left-color:rgb(204, 204, 204);border-left-style:solid;padding-left:1ex">

node102 np=16 supernode</blockquote><blockquote class="gmail_quote" style="margin-top:0px;margin-right:0px;margin-bottom:0px;margin-left:0.8ex;border-left-width:1px;border-left-color:rgb(204, 204, 204);border-left-style:solid;padding-left:1ex">

node103 np=16 supernode</blockquote><blockquote class="gmail_quote" style="margin-top:0px;margin-right:0px;margin-bottom:0px;margin-left:0.8ex;border-left-width:1px;border-left-color:rgb(204, 204, 204);border-left-style:solid;padding-left:1ex">

node104 np=16 supernode</blockquote><blockquote class="gmail_quote" style="margin-top:0px;margin-right:0px;margin-bottom:0px;margin-left:0.8ex;border-left-width:1px;border-left-color:rgb(204, 204, 204);border-left-style:solid;padding-left:1ex">

node105 np=16 supernode</blockquote></div><div><br></div><div>I can submit and run a job only if the number of cpu requested is less or equal then the number of cores of a single node: -l nodes=1:ppn=5. But if I request 5 cores on 2 different nodes (-l nodes=2:ppn=5) the job stays in a &quot;queued&quot; state indefinitely. Maui&#39;s log file does not seems to report anything suspicious, even at log level 7. Checkjob reports which nodes it is scheduled to run on (node104 and node105):</div>

<div><blockquote class="gmail_quote" style="margin-top:0px;margin-right:0px;margin-bottom:0px;margin-left:0.8ex;border-left-width:1px;border-left-color:rgb(204, 204, 204);border-left-style:solid;padding-left:1ex">
$ checkjob 90843</blockquote><blockquote class="gmail_quote" style="margin-top:0px;margin-right:0px;margin-bottom:0px;margin-left:0.8ex;border-left-width:1px;border-left-color:rgb(204, 204, 204);border-left-style:solid;padding-left:1ex">

checking job 90843</blockquote><blockquote class="gmail_quote" style="margin-top:0px;margin-right:0px;margin-bottom:0px;margin-left:0.8ex;border-left-width:1px;border-left-color:rgb(204, 204, 204);border-left-style:solid;padding-left:1ex">

<br></blockquote><blockquote class="gmail_quote" style="margin-top:0px;margin-right:0px;margin-bottom:0px;margin-left:0.8ex;border-left-width:1px;border-left-color:rgb(204, 204, 204);border-left-style:solid;padding-left:1ex">

State: Running</blockquote><blockquote class="gmail_quote" style="margin-top:0px;margin-right:0px;margin-bottom:0px;margin-left:0.8ex;border-left-width:1px;border-left-color:rgb(204, 204, 204);border-left-style:solid;padding-left:1ex">

Creds:  user:me  group:me  class:supernodes  qos:DEFAULT</blockquote><blockquote class="gmail_quote" style="margin-top:0px;margin-right:0px;margin-bottom:0px;margin-left:0.8ex;border-left-width:1px;border-left-color:rgb(204, 204, 204);border-left-style:solid;padding-left:1ex">

WallTime: 00:00:00 of 00:00:30</blockquote><blockquote class="gmail_quote" style="margin-top:0px;margin-right:0px;margin-bottom:0px;margin-left:0.8ex;border-left-width:1px;border-left-color:rgb(204, 204, 204);border-left-style:solid;padding-left:1ex">

SubmitTime: Tue Nov 30 18:04:11</blockquote><blockquote class="gmail_quote" style="margin-top:0px;margin-right:0px;margin-bottom:0px;margin-left:0.8ex;border-left-width:1px;border-left-color:rgb(204, 204, 204);border-left-style:solid;padding-left:1ex">

  (Time Queued  Total: 00:12:19  Eligible: 00:12:19)</blockquote><blockquote class="gmail_quote" style="margin-top:0px;margin-right:0px;margin-bottom:0px;margin-left:0.8ex;border-left-width:1px;border-left-color:rgb(204, 204, 204);border-left-style:solid;padding-left:1ex">

<br></blockquote><blockquote class="gmail_quote" style="margin-top:0px;margin-right:0px;margin-bottom:0px;margin-left:0.8ex;border-left-width:1px;border-left-color:rgb(204, 204, 204);border-left-style:solid;padding-left:1ex">

StartTime: Tue Nov 30 18:16:30</blockquote><blockquote class="gmail_quote" style="margin-top:0px;margin-right:0px;margin-bottom:0px;margin-left:0.8ex;border-left-width:1px;border-left-color:rgb(204, 204, 204);border-left-style:solid;padding-left:1ex">

Total Tasks: 10</blockquote><blockquote class="gmail_quote" style="margin-top:0px;margin-right:0px;margin-bottom:0px;margin-left:0.8ex;border-left-width:1px;border-left-color:rgb(204, 204, 204);border-left-style:solid;padding-left:1ex">

<br></blockquote><blockquote class="gmail_quote" style="margin-top:0px;margin-right:0px;margin-bottom:0px;margin-left:0.8ex;border-left-width:1px;border-left-color:rgb(204, 204, 204);border-left-style:solid;padding-left:1ex">

Req[0]  TaskCount: 10  Partition: DEFAULT</blockquote><blockquote class="gmail_quote" style="margin-top:0px;margin-right:0px;margin-bottom:0px;margin-left:0.8ex;border-left-width:1px;border-left-color:rgb(204, 204, 204);border-left-style:solid;padding-left:1ex">

Network: [NONE]  Memory &gt;= 0  Disk &gt;= 0  Swap &gt;= 0</blockquote><blockquote class="gmail_quote" style="margin-top:0px;margin-right:0px;margin-bottom:0px;margin-left:0.8ex;border-left-width:1px;border-left-color:rgb(204, 204, 204);border-left-style:solid;padding-left:1ex">

Opsys: [NONE]  Arch: [NONE]  Features: [supernode]</blockquote><blockquote class="gmail_quote" style="margin-top:0px;margin-right:0px;margin-bottom:0px;margin-left:0.8ex;border-left-width:1px;border-left-color:rgb(204, 204, 204);border-left-style:solid;padding-left:1ex">

Allocated Nodes:</blockquote><blockquote class="gmail_quote" style="margin-top:0px;margin-right:0px;margin-bottom:0px;margin-left:0.8ex;border-left-width:1px;border-left-color:rgb(204, 204, 204);border-left-style:solid;padding-left:1ex">

[node105:5][node104:5]</blockquote><blockquote class="gmail_quote" style="margin-top:0px;margin-right:0px;margin-bottom:0px;margin-left:0.8ex;border-left-width:1px;border-left-color:rgb(204, 204, 204);border-left-style:solid;padding-left:1ex">

<br></blockquote><blockquote class="gmail_quote" style="margin-top:0px;margin-right:0px;margin-bottom:0px;margin-left:0.8ex;border-left-width:1px;border-left-color:rgb(204, 204, 204);border-left-style:solid;padding-left:1ex">

<br></blockquote><blockquote class="gmail_quote" style="margin-top:0px;margin-right:0px;margin-bottom:0px;margin-left:0.8ex;border-left-width:1px;border-left-color:rgb(204, 204, 204);border-left-style:solid;padding-left:1ex">

IWD: [NONE]  Executable:  [NONE]</blockquote><blockquote class="gmail_quote" style="margin-top:0px;margin-right:0px;margin-bottom:0px;margin-left:0.8ex;border-left-width:1px;border-left-color:rgb(204, 204, 204);border-left-style:solid;padding-left:1ex">

Bypass: 0  StartCount: 739</blockquote><blockquote class="gmail_quote" style="margin-top:0px;margin-right:0px;margin-bottom:0px;margin-left:0.8ex;border-left-width:1px;border-left-color:rgb(204, 204, 204);border-left-style:solid;padding-left:1ex">

PartitionMask: [ALL]</blockquote><blockquote class="gmail_quote" style="margin-top:0px;margin-right:0px;margin-bottom:0px;margin-left:0.8ex;border-left-width:1px;border-left-color:rgb(204, 204, 204);border-left-style:solid;padding-left:1ex">

Reservation &#39;90843&#39; (00:00:00 -&gt; 00:00:30  Duration: 00:00:30)</blockquote><blockquote class="gmail_quote" style="margin-top:0px;margin-right:0px;margin-bottom:0px;margin-left:0.8ex;border-left-width:1px;border-left-color:rgb(204, 204, 204);border-left-style:solid;padding-left:1ex">

PE:  10.00  StartPriority:  12</blockquote></div><div><br></div><div>But then checking each node using checknode the node seems idle:</div><div><blockquote class="gmail_quote" style="margin-top:0px;margin-right:0px;margin-bottom:0px;margin-left:0.8ex;border-left-width:1px;border-left-color:rgb(204, 204, 204);border-left-style:solid;padding-left:1ex">

$ checknode -v node104</blockquote><blockquote class="gmail_quote" style="margin-top:0px;margin-right:0px;margin-bottom:0px;margin-left:0.8ex;border-left-width:1px;border-left-color:rgb(204, 204, 204);border-left-style:solid;padding-left:1ex">

checking node node104</blockquote><blockquote class="gmail_quote" style="margin-top:0px;margin-right:0px;margin-bottom:0px;margin-left:0.8ex;border-left-width:1px;border-left-color:rgb(204, 204, 204);border-left-style:solid;padding-left:1ex">

<br></blockquote><blockquote class="gmail_quote" style="margin-top:0px;margin-right:0px;margin-bottom:0px;margin-left:0.8ex;border-left-width:1px;border-left-color:rgb(204, 204, 204);border-left-style:solid;padding-left:1ex">

State:      Idle  (in current state for 00:00:00)</blockquote><blockquote class="gmail_quote" style="margin-top:0px;margin-right:0px;margin-bottom:0px;margin-left:0.8ex;border-left-width:1px;border-left-color:rgb(204, 204, 204);border-left-style:solid;padding-left:1ex">

Expected State:  Running   SyncDeadline: Tue Nov 30 18:27:34</blockquote><blockquote class="gmail_quote" style="margin-top:0px;margin-right:0px;margin-bottom:0px;margin-left:0.8ex;border-left-width:1px;border-left-color:rgb(204, 204, 204);border-left-style:solid;padding-left:1ex">

Configured Resources: PROCS: 16  MEM: 23G  SWAP: 23G  DISK: 1M</blockquote><blockquote class="gmail_quote" style="margin-top:0px;margin-right:0px;margin-bottom:0px;margin-left:0.8ex;border-left-width:1px;border-left-color:rgb(204, 204, 204);border-left-style:solid;padding-left:1ex">

Utilized   Resources: PROCS: 5</blockquote><blockquote class="gmail_quote" style="margin-top:0px;margin-right:0px;margin-bottom:0px;margin-left:0.8ex;border-left-width:1px;border-left-color:rgb(204, 204, 204);border-left-style:solid;padding-left:1ex">

Dedicated  Resources: PROCS: 5</blockquote><blockquote class="gmail_quote" style="margin-top:0px;margin-right:0px;margin-bottom:0px;margin-left:0.8ex;border-left-width:1px;border-left-color:rgb(204, 204, 204);border-left-style:solid;padding-left:1ex">

Opsys:       DEFAULT  Arch:      [NONE]</blockquote><blockquote class="gmail_quote" style="margin-top:0px;margin-right:0px;margin-bottom:0px;margin-left:0.8ex;border-left-width:1px;border-left-color:rgb(204, 204, 204);border-left-style:solid;padding-left:1ex">

Speed:      1.00  Load:       5.000</blockquote><blockquote class="gmail_quote" style="margin-top:0px;margin-right:0px;margin-bottom:0px;margin-left:0.8ex;border-left-width:1px;border-left-color:rgb(204, 204, 204);border-left-style:solid;padding-left:1ex">

Location:   Partition: DEFAULT  Frame/Slot:  1/1</blockquote><blockquote class="gmail_quote" style="margin-top:0px;margin-right:0px;margin-bottom:0px;margin-left:0.8ex;border-left-width:1px;border-left-color:rgb(204, 204, 204);border-left-style:solid;padding-left:1ex">

Network:    [DEFAULT]</blockquote><blockquote class="gmail_quote" style="margin-top:0px;margin-right:0px;margin-bottom:0px;margin-left:0.8ex;border-left-width:1px;border-left-color:rgb(204, 204, 204);border-left-style:solid;padding-left:1ex">

Features:   [supernode]</blockquote><blockquote class="gmail_quote" style="margin-top:0px;margin-right:0px;margin-bottom:0px;margin-left:0.8ex;border-left-width:1px;border-left-color:rgb(204, 204, 204);border-left-style:solid;padding-left:1ex">

Attributes: [Batch]</blockquote><blockquote class="gmail_quote" style="margin-top:0px;margin-right:0px;margin-bottom:0px;margin-left:0.8ex;border-left-width:1px;border-left-color:rgb(204, 204, 204);border-left-style:solid;padding-left:1ex">

Classes:    [supernodes 11:16]</blockquote><blockquote class="gmail_quote" style="margin-top:0px;margin-right:0px;margin-bottom:0px;margin-left:0.8ex;border-left-width:1px;border-left-color:rgb(204, 204, 204);border-left-style:solid;padding-left:1ex">

<br></blockquote><blockquote class="gmail_quote" style="margin-top:0px;margin-right:0px;margin-bottom:0px;margin-left:0.8ex;border-left-width:1px;border-left-color:rgb(204, 204, 204);border-left-style:solid;padding-left:1ex">

Total Time:   INFINITY  Up:   INFINITY (73.79%)  Active: 00:01:42 (0.00%)</blockquote><blockquote class="gmail_quote" style="margin-top:0px;margin-right:0px;margin-bottom:0px;margin-left:0.8ex;border-left-width:1px;border-left-color:rgb(204, 204, 204);border-left-style:solid;padding-left:1ex">

<br></blockquote><blockquote class="gmail_quote" style="margin-top:0px;margin-right:0px;margin-bottom:0px;margin-left:0.8ex;border-left-width:1px;border-left-color:rgb(204, 204, 204);border-left-style:solid;padding-left:1ex">

Reservations:</blockquote><blockquote class="gmail_quote" style="margin-top:0px;margin-right:0px;margin-bottom:0px;margin-left:0.8ex;border-left-width:1px;border-left-color:rgb(204, 204, 204);border-left-style:solid;padding-left:1ex">

  Job &#39;90843&#39;(x5)  00:00:00 -&gt; 00:00:30 (00:00:30)</blockquote><blockquote class="gmail_quote" style="margin-top:0px;margin-right:0px;margin-bottom:0px;margin-left:0.8ex;border-left-width:1px;border-left-color:rgb(204, 204, 204);border-left-style:solid;padding-left:1ex">

JobList:  90843</blockquote><blockquote class="gmail_quote" style="margin-top:0px;margin-right:0px;margin-bottom:0px;margin-left:0.8ex;border-left-width:1px;border-left-color:rgb(204, 204, 204);border-left-style:solid;padding-left:1ex">

<br></blockquote></div><div><br></div><div>Google does not say much about the &quot;ALERT:  jobs active on node but state is Idle&quot;. Looking at the pbs_mon log file on the two nodes, I get these:</div><div>node105:</div>
<div><blockquote class="gmail_quote" style="margin-top: 0px; margin-right: 0px; margin-bottom: 0px; margin-left: 0.8ex; border-left-width: 1px; border-left-color: rgb(204, 204, 204); border-left-style: solid; padding-left: 1ex; ">
11/30/2010 20:10:09;0008;   pbs_mom;Req;send_sisters;sending ABORT to sisters</blockquote><blockquote class="gmail_quote" style="margin-top: 0px; margin-right: 0px; margin-bottom: 0px; margin-left: 0.8ex; border-left-width: 1px; border-left-color: rgb(204, 204, 204); border-left-style: solid; padding-left: 1ex; ">
11/30/2010 20:10:09;0008;   pbs_mom;Job;<a href="http://90857.unicron.cl.uottawa.ca">90857.unicron.cl.uottawa.ca</a>;Job Modified at request of PBS_Server@mycluster</blockquote><blockquote class="gmail_quote" style="margin-top: 0px; margin-right: 0px; margin-bottom: 0px; margin-left: 0.8ex; border-left-width: 1px; border-left-color: rgb(204, 204, 204); border-left-style: solid; padding-left: 1ex; ">
11/30/2010 20:10:09;0080;   pbs_mom;Svr;preobit_reply;top of preobit_reply</blockquote><blockquote class="gmail_quote" style="margin-top: 0px; margin-right: 0px; margin-bottom: 0px; margin-left: 0.8ex; border-left-width: 1px; border-left-color: rgb(204, 204, 204); border-left-style: solid; padding-left: 1ex; ">
11/30/2010 20:10:09;0080;   pbs_mom;Svr;preobit_reply;DIS_reply_read/decode_DIS_replySvr worked, top of while loop</blockquote><blockquote class="gmail_quote" style="margin-top: 0px; margin-right: 0px; margin-bottom: 0px; margin-left: 0.8ex; border-left-width: 1px; border-left-color: rgb(204, 204, 204); border-left-style: solid; padding-left: 1ex; ">
11/30/2010 20:10:09;0080;   pbs_mom;Svr;preobit_reply;cannot locate job that triggered req</blockquote><blockquote class="gmail_quote" style="margin-top: 0px; margin-right: 0px; margin-bottom: 0px; margin-left: 0.8ex; border-left-width: 1px; border-left-color: rgb(204, 204, 204); border-left-style: solid; padding-left: 1ex; ">
11/30/2010 20:10:09;0080;   pbs_mom;Svr;preobit_reply;top of preobit_reply</blockquote><blockquote class="gmail_quote" style="margin-top: 0px; margin-right: 0px; margin-bottom: 0px; margin-left: 0.8ex; border-left-width: 1px; border-left-color: rgb(204, 204, 204); border-left-style: solid; padding-left: 1ex; ">
11/30/2010 20:10:09;0080;   pbs_mom;Svr;preobit_reply;DIS_reply_read/decode_DIS_replySvr worked, top of while loop</blockquote><blockquote class="gmail_quote" style="margin-top: 0px; margin-right: 0px; margin-bottom: 0px; margin-left: 0.8ex; border-left-width: 1px; border-left-color: rgb(204, 204, 204); border-left-style: solid; padding-left: 1ex; ">
11/30/2010 20:10:09;0080;   pbs_mom;Svr;preobit_reply;in while loop, no error from job stat</blockquote><blockquote class="gmail_quote" style="margin-top: 0px; margin-right: 0px; margin-bottom: 0px; margin-left: 0.8ex; border-left-width: 1px; border-left-color: rgb(204, 204, 204); border-left-style: solid; padding-left: 1ex; ">
11/30/2010 20:10:09;0008;   pbs_mom;Job;90857.<meta http-equiv="content-type" content="text/html; charset=utf-8">mycluster;checking job post-processing routine</blockquote><blockquote class="gmail_quote" style="margin-top: 0px; margin-right: 0px; margin-bottom: 0px; margin-left: 0.8ex; border-left-width: 1px; border-left-color: rgb(204, 204, 204); border-left-style: solid; padding-left: 1ex; ">
11/30/2010 20:10:09;0080;   pbs_mom;Job;90857.<meta http-equiv="content-type" content="text/html; charset=utf-8">mycluster;obit sent to server</blockquote><blockquote class="gmail_quote" style="margin-top: 0px; margin-right: 0px; margin-bottom: 0px; margin-left: 0.8ex; border-left-width: 1px; border-left-color: rgb(204, 204, 204); border-left-style: solid; padding-left: 1ex; ">
11/30/2010 20:10:10;0001;   pbs_mom;Svr;pbs_mom;Bad UID for job execution (15023) in 90857.<meta http-equiv="content-type" content="text/html; charset=utf-8">mycluster, job_start_error from node <a href="http://10.0.0.104:15003">10.0.0.104:15003</a> in job_start_error</blockquote>
<blockquote class="gmail_quote" style="margin-top: 0px; margin-right: 0px; margin-bottom: 0px; margin-left: 0.8ex; border-left-width: 1px; border-left-color: rgb(204, 204, 204); border-left-style: solid; padding-left: 1ex; ">
11/30/2010 20:10:10;0001;   pbs_mom;Svr;pbs_mom;Bad UID for job execution (15023) in 90857.<meta http-equiv="content-type" content="text/html; charset=utf-8">mycluster, abort attempted 16 times in job_start_error.  ignoring abort request from node <a href="http://10.0.0.104:15003">10.0.0.104:15003</a></blockquote>
<blockquote class="gmail_quote" style="margin-top: 0px; margin-right: 0px; margin-bottom: 0px; margin-left: 0.8ex; border-left-width: 1px; border-left-color: rgb(204, 204, 204); border-left-style: solid; padding-left: 1ex; ">
[repeated many times, until I cancel the job] </blockquote></div><div>node104:</div><div><blockquote class="gmail_quote" style="margin-top: 0px; margin-right: 0px; margin-bottom: 0px; margin-left: 0.8ex; border-left-width: 1px; border-left-color: rgb(204, 204, 204); border-left-style: solid; padding-left: 1ex; ">
11/30/2010 20:31:31;0008;   pbs_mom;Job;90858.mycluster;no group entry for group me, user=me, errno=0 (Success)</blockquote><blockquote class="gmail_quote" style="margin-top: 0px; margin-right: 0px; margin-bottom: 0px; margin-left: 0.8ex; border-left-width: 1px; border-left-color: rgb(204, 204, 204); border-left-style: solid; padding-left: 1ex; ">
11/30/2010 20:31:31;0008;   pbs_mom;Job;90858.mycluster;ERROR:    received request &#39;ABORT_JOB&#39; from <a href="http://10.0.0.105:1023">10.0.0.105:1023</a> for job &#39;90858.mycluster&#39; (job does not exist locally)</blockquote>
<blockquote class="gmail_quote" style="margin-top: 0px; margin-right: 0px; margin-bottom: 0px; margin-left: 0.8ex; border-left-width: 1px; border-left-color: rgb(204, 204, 204); border-left-style: solid; padding-left: 1ex; ">
[repeated many times, until I cancel the job]</blockquote></div><div><br></div><div><meta http-equiv="content-type" content="text/html; charset=utf-8">On the headnode, running &quot;groups&quot; as my username returns my username as the group, but on each nodes it returns 1009, the group ID. Could that be the problem? Using &quot;getent&quot; on the headnode I get:</div>
<meta http-equiv="content-type" content="text/html; charset=utf-8"><div><blockquote class="gmail_quote" style="margin-top: 0px; margin-right: 0px; margin-bottom: 0px; margin-left: 0.8ex; border-left-width: 1px; border-left-color: rgb(204, 204, 204); border-left-style: solid; padding-left: 1ex; ">
$ getent passwd me</blockquote><blockquote class="gmail_quote" style="margin-top: 0px; margin-right: 0px; margin-bottom: 0px; margin-left: 0.8ex; border-left-width: 1px; border-left-color: rgb(204, 204, 204); border-left-style: solid; padding-left: 1ex; ">
me:x:1001:1009:My Name:/home/me:/bin/bash </blockquote></div><div>while on a compute node I get:</div><blockquote class="gmail_quote" style="margin-top: 0px; margin-right: 0px; margin-bottom: 0px; margin-left: 0.8ex; border-left-width: 1px; border-left-color: rgb(204, 204, 204); border-left-style: solid; padding-left: 1ex; ">
$ getent passwd me</blockquote><blockquote class="gmail_quote" style="margin-top: 0px; margin-right: 0px; margin-bottom: 0px; margin-left: 0.8ex; border-left-width: 1px; border-left-color: rgb(204, 204, 204); border-left-style: solid; padding-left: 1ex; ">
me:Password_checksum:1001:1009:My Name:/home/me:/bin/bash</blockquote><div><br></div><div>On either the headnode or compute nodes, using &quot;ypcat group&quot; returns nothing. Shouldn&#39;t that return the same groups as &quot;groups&quot; on the headnode?</div>
<div><br></div><div>I really don&#39;t know what to check next.</div><div><br></div><div>Thank you very much for any suggestion.</div><div><br>
</div><div>Regards,</div><div><br></div><font color="#888888"><div>Nicolas</div>
</font></div><br>