[Mauiusers] OpenPBS + Maui compatibility

help@supercluster.org
Sat, 19 Jan 2002 13:59:19 -0700 (MST)


Adrian,

  It appears PBS does not trust the ID/host under which Maui is
running.  A few questions/things to check:

  - Have you followed all steps contained within the PBS-Maui Integration
Guide at http://supercluster.org/documentation/maui/pbsintegration.html?
This guide covers the configuration required to enable PBS MOMs to trust an
external scheduler.

  - Does 'diagnose -n' show all nodes marked down?  Does a node marked
down ever become available again?

  - Does the scheduler ever 'hang', or is it just unable to locate idle
resources because everything is marked down?  If it hangs, have you applied
the Sandia fault tolerance patches to PBS?

  - Do PBS commands experience or indicate any failures?

  - Did the problem always exist?  Has something changed?

  - Do nodes appear to get marked 'down' piecemeal or all at once?
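  One quick way to see whether the MOM refusals all come from the Maui host,
and from an unprivileged port, is to tally them from the MOM log.  The script
below is a rough sketch, not part of PBS or Maui; summarize_refusals() and
report() are hypothetical helpers, and it assumes the two-line "bad attempt to
connect" / "message refused from port P addr A" layout seen in the log excerpt
quoted below.  If every refusal comes from the Maui host on a port above 1023,
that would be consistent with MOM rejecting a non-root resource monitor
client.  As far as we recall, the pbs_mom '$restricted' directive in
mom_priv/config is what admits such hosts, but please verify that against the
pbs_mom man page for your PBS version.

```python
import re
from collections import Counter

# Assumed layout (taken from the quoted MOM log): the refused source appears on
# an indented continuation line "message refused from port P addr A".
REFUSED = re.compile(r"message refused from port (\d+) addr (\d+\.\d+\.\d+\.\d+)")

# On Unix systems, ports below 1024 form the reserved (privileged) range.
PRIVILEGED_LIMIT = 1024


def summarize_refusals(lines):
    """Count refused RM connections per (addr, port) in pbs_mom log lines.

    Hypothetical helper for this thread, not a PBS or Maui tool.
    Returns a Counter keyed by (addr, port).
    """
    counts = Counter()
    for line in lines:
        match = REFUSED.search(line)
        if match:
            port, addr = int(match.group(1)), match.group(2)
            counts[(addr, port)] += 1
    return counts


def report(counts):
    """Print one line per refused source, flagging unprivileged ports."""
    for (addr, port), n in counts.most_common():
        kind = "privileged" if port < PRIVILEGED_LIMIT else "UNPRIVILEGED"
        print(f"{addr}:{port} refused {n} time(s) ({kind} source port)")


if __name__ == "__main__":
    # Two entries copied from the quoted log, for illustration only.
    sample = [
        "11/22/2001 18:58:06;0001;   pbs_mom;Svr;pbs_mom;Error 0 (0) in rm_request, bad attempt to connect",
        "        message refused from port 63310 addr 130.238.196.254",
        "11/22/2001 18:58:09;0001;   pbs_mom;Svr;pbs_mom;Error 0 (0) in rm_request, bad attempt to connect",
        "        message refused from port 63310 addr 130.238.196.254",
    ]
    report(summarize_refusals(sample))
```

To run it against a real log, pass an open file handle for one of the MOM log
files instead of the sample list.  The exact path depends on your PBS_HOME
layout.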

Thanks


On Sat, 19 Jan 2002, Adrian Taga wrote:

>
> Hi,
>
> I have a working configuration of OpenPBS with the Maui scheduler.
> However, there seem to be some communication problems between Maui and pbs_mom.
> Shortly after I start up Maui, warnings like this appear in the output of diagnose -n:
> WARNING:  node 'sun1' is BUSY for 0:30:03 and CPU utilization is LOW.  load:  0.000 (check job 2621?)
> But the actual load is 2.00!
> The real problem is that after a while, Maui simply refuses to start more jobs!!
> I have to periodically restart Maui to make it work.
> What could be the problem?
>
> The PBS_MOM logfiles are full of messages like this:
>
> 11/22/2001 18:58:05;0100;   pbs_mom;Req;;Type 19 request received from PBS_Server@sun1, sock=11
> 11/22/2001 18:58:06;0001;   pbs_mom;Svr;pbs_mom;Error 0 (0) in rm_request, bad attempt to connect
>         message refused from port 63310 addr 130.238.196.254
> 11/22/2001 18:58:09;0001;   pbs_mom;Svr;pbs_mom;Error 0 (0) in rm_request, bad attempt to connect
>         message refused from port 63310 addr 130.238.196.254
> 11/22/2001 18:58:10;0001;   pbs_mom;Svr;pbs_mom;Error 0 (0) in rm_request, bad attempt to connect
>         message refused from port 63310 addr 130.238.196.254
> 11/22/2001 18:58:12;0001;   pbs_mom;Svr;pbs_mom;Error 0 (0) in rm_request, bad attempt to connect
>         message refused from port 63310 addr 130.238.196.254
> 11/22/2001 18:58:14;0001;   pbs_mom;Svr;pbs_mom;Error 0 (0) in rm_request, bad attempt to connect
>         message refused from port 63310 addr 130.238.196.254
>
> The MAUI log file says:
>
> 11/22 19:02:53 PBSQueryMOM(sun1,0)
> 11/22 19:02:53 ALERT:    cannot get req from MOM on host 'sun1' (errno: 0:5)
>
> I'm using PBS 2.3.12 and maui-3.0.7-p10 (final), both compiled with gcc 2.95.2 on Solaris 8.
> Could it be that these versions of Maui and PBS are not compatible?
>
> I'm attaching the config files for PBS and Maui, in case this helps.
>
> Thank you.
> /Adrian.
>