[Mauiusers] MAUI not responding - "lost connection to server"
Adrian Sevcenco
Adrian.Sevcenco at cern.ch
Tue Dec 16 07:09:07 MST 2008
Gianfranco Sciacca wrote:
> Adrian Sevcenco wrote:
>> Greenseid, Joseph M. wrote:
>>
>>> It says OK when it is starting up. Does it not actually start? Is
>>> there a maui process running after you do this?
>>>
>> Yes, there is a maui process, but when I run any Maui-related command I
>> get:
>> [root at grid01 log]# checkjob 2
>> ERROR: lost connection to server
>> ERROR: cannot request service (status)
>> I attached the log (level 9) from starting maui.
>> Can somebody see the problem there?
>> Thank you,
>> Adrian
>>
> Adrian, are you running nscd by any chance? We have noticed on many of our
> clients and servers that the nscd process tends to go haywire from time
> to time and cause all sorts of problems, including the one you mention.
> The tell-tale would be nscd using 100% CPU on your grid01 machine.
> Perhaps not your case, but worth checking.
Hi, and thanks for the tip, but we don't have nscd on this machine.
Adrian
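
A quick way to confirm whether nscd is present, and how much CPU it is using, might look like the sketch below. It assumes a ps that accepts -e and -o pcpu,comm (procps on Linux does); filter_proc and check_nscd are made-up helper names for illustration only.

```shell
#!/bin/sh
# Sketch only: filter_proc and check_nscd are illustrative names, not Maui tools.

# Read "pcpu comm" lines on stdin and print those whose command name matches $1.
# Exits non-zero when no line matches.
filter_proc() {
    awk -v name="$1" '$2 == name { found = 1; print } END { exit !found }'
}

# List %CPU and command name for every process and keep only nscd.
check_nscd() {
    ps -eo pcpu,comm | filter_proc nscd \
        || echo "no nscd process found"
}

check_nscd
```

A value close to 100 in the first column would match the symptom described above; no output line for nscd means the daemon is not running.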
> cheers,
> Gianfranco
>>
>>>
>>> --Joe
>>>
>>> ------------------------------------------------------------------------
>>> *From:* mauiusers-bounces at supercluster.org on behalf of Adrian Sevcenco
>>> *Sent:* Mon 12/15/2008 12:56 PM
>>> *To:* mauiusers at supercluster.org
>>> *Subject:* [Mauiusers] MAUI not responding - "lost connection to server"
>>>
>>> Hi,
>>> I have a strange situation:
>>> when I try to restart the maui server I get:
>>> [root at grid01 /]# service maui restart
>>> Shutting down MAUI Scheduler: ERROR: lost connection to server
>>> ERROR: cannot request service (status)
>>> [FAILED]
>>> Starting MAUI Scheduler: [ OK ]
>>>
>>> The same happens with the firewall down.
>>> My configuration is:
>>>
>>> [root at grid01 maui]# cat maui.cfg
>>> # MAUI configuration example
>>>
>>> SERVERHOST grid01.spacescience.ro
>>> ADMIN1 root
>>> ADMIN3 edginfo rgma edguser
>>> ADMINHOSTS grid01.spacescience.ro
>>> RMCFG[base] TYPE=PBS
>>> SERVERPORT 40559
>>> SERVERMODE NORMAL
>>>
>>> # Set PBS server polling interval. If you have short
>>> # queues or/and jobs it is worth to set a short interval. (10 seconds)
>>>
>>> RMPOLLINTERVAL 00:00:10
>>>
>>> # a max. 10 MByte log file in a logical location
>>>
>>> LOGFILE /var/log/maui.log
>>> LOGFILEMAXSIZE 10000000
>>> LOGLEVEL 1
>>>
>>> # Set the delay to 1 minute before Maui tries to run a job again,
>>> # in case it failed to run the first time.
>>> # The default value is 1 hour.
>>>
>>> DEFERTIME 00:01:00
>>>
>>> # Necessary for MPI grid jobs
>>> ENABLEMULTIREQJOBS TRUE
>>>
>>> Any ideas why it is not working? How can I debug this further?
>>> Is there a requirement for something to be in /etc/hosts?
>>> Thank you,
>>> Adrian
>>>
>>>
>>
>>
>>
>>
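
On the question of how to debug this further and whether /etc/hosts matters: one possible first check is that the SERVERHOST name from maui.cfg resolves on grid01 and that something is listening on SERVERPORT. The following is a rough sketch, not a documented Maui procedure. cfg_value and check_maui_endpoint are hypothetical helper names, the config path has to be supplied by the caller, and it assumes getent and netstat are installed.

```shell
#!/bin/sh
# Sketch only: helper names are illustrative and not part of Maui.

# Print the first value of a "KEY value" line in a maui.cfg-style file.
# Comment lines start with "#", so their first field never equals the key.
cfg_value() {
    awk -v key="$2" '$1 == key { print $2; exit }' "$1"
}

# Report whether SERVERHOST resolves and whether SERVERPORT has a TCP listener.
check_maui_endpoint() {
    cfg=$1
    host=$(cfg_value "$cfg" SERVERHOST)
    port=$(cfg_value "$cfg" SERVERPORT)
    if [ -z "$host" ] || [ -z "$port" ]; then
        echo "SERVERHOST or SERVERPORT not found in $cfg"
        return 1
    fi
    echo "SERVERHOST=$host SERVERPORT=$port"

    # getent consults /etc/hosts and DNS according to nsswitch.conf.
    getent hosts "$host" || echo "WARNING: $host does not resolve"

    # -t TCP, -l listening sockets, -n numeric addresses and ports.
    netstat -tln | grep ":$port " || echo "WARNING: nothing listening on port $port"
}

# Usage (replace the path with the real location of maui.cfg):
#   check_maui_endpoint /path/to/maui.cfg
```

If the name resolves to an address that is not configured on the machine, or nothing is listening on the port, that would help narrow down where the client connection is failing.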