[Moabusers] RE table is corrupt

Brock Palen brockp at umich.edu
Thu Jan 17 09:03:29 MST 2008


Dave,
Here is the output:

Moab Server 'Moab' running on nyx.engin.umich.edu:42559  (Mode: NORMAL)
   Build Info:  64,MCOMMTHREAD
   Process Info:  pid:14533 uid:0 euid:0 gid:0 egid:0
   RM MODULES: SSS,WIKI,NATIVE,PBS
   Load(5m)  Sched: 9.99%  RMAction: 8.39%  RMQuery: 25.17%  User: 0.00%  Idle: 56.45%
   Load(24h) Sched: 10.44%  RMAction: 9.14%  RMQuery: 26.64%  User: 0.00%  Idle: 53.78%
   Total Memory Size:  544 MB
WARNING:  excessive memory in use (544 MB) - restart Moab?
   PollInterval: 00:01:30  (Avg Sched Interval: 00:00:35  Iterations: 1020)
   JobStarts: 1769  (Avg Starts/Iteration: 1.73  Last Iteration: 0)
   Object Specs:        Class=50  GRes=512/512  Job=20480/1024  Node=5120  Par=31  Range=256  RM=16  Rsv=4096  UIBuffer=2MB  User=1792
   Message:  profiling enabled (22 of 50 samples/00:30:00 interval)

The problem has passed, though. The only thing that coincided with it
was a user bulk-submitting about 1800 jobs, during which Torque was
running very slowly. We had noticed qstat, qsub and qmgr being slow to
respond. We are now testing some of the advice from:

http://www.clusterresources.com/wiki/doku.php?id=torque:appendix:f_large_cluster_considerations

We are at 608 nodes installed on that cluster.
Also, 544MB isn't right: according to top, Moab is using 715MB (1326MB
allocated).
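
In case it is useful to anyone hitting the same alert, here is a rough
sketch of how the affected nodes could be tallied from a Moab log. The
pattern is modeled only on the single ALERT line quoted further down in
this thread, and it assumes the message sits on one line in the real log
(it is only wrapped here by the mail client). The function name and the
second sample line are made up for illustration.

```python
import re
from collections import Counter

# Pattern modeled on the one ALERT line quoted in this thread.
# Assumption: the real log keeps the whole message on a single line.
ALERT_RE = re.compile(
    r"ALERT:\s+node (?P<node>\S+) RE table is corrupt\.\s+"
    r"RE\[(?P<index>\d+)\]\s+'(?P<event>[^']+)' "
    r"at (?P<offset>-?\d+:\d+:\d+) is out of time order"
)


def count_corrupt_re_alerts(lines):
    """Return a Counter mapping node name -> number of matching alerts."""
    counts = Counter()
    for line in lines:
        match = ALERT_RE.search(line)
        if match:
            counts[match.group("node")] += 1
    return counts


sample = [
    # Taken from the message quoted below, joined onto one line.
    "01/17 00:31:55 ALERT:    node nyx539 RE table is corrupt.  "
    "RE[6] 'rmfailure.3586' at -00:00:57 is out of time order",
    # Hypothetical non-matching line, for illustration only.
    "01/17 00:31:56 some unrelated log line",
]

for node, hits in sorted(count_corrupt_re_alerts(sample).items()):
    print(node, hits)
```

To run it against a real log, pass an open file object in place of
`sample`; the log location depends on the local Moab configuration.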

Thanks

Brock Palen
Center for Advanced Computing
brockp at umich.edu
(734)936-1985


On Jan 17, 2008, at 9:03 AM, Dave Jackson wrote:

> Brock,
>
>   Do you get any warnings of interest from 'mdiag -S -v'?
>
> Dave
>
> On Thu, 2008-01-17 at 01:02 -0500, Brock Palen wrote:
>> A bunch of our nodes were just marked down and we are not sure why.
>> Torque (qmgr) still thinks they are up and the machines are up, but
>> Moab thinks they are down, and I see lots of:
>>
>> 01/17 00:31:55 ALERT:    node nyx539 RE table is corrupt.  RE[6]
>> 'rmfailure.3586' at -00:00:57 is out of time order
>>
>> messages. Is there a way to fix this?
>>
>>
>> Brock Palen
>> Center for Advanced Computing
>> brockp at umich.edu
>> (734)936-1985
>>
>>
>> _______________________________________________
>> moabusers mailing list
>> moabusers at supercluster.org
>> http://www.supercluster.org/mailman/listinfo/moabusers
>
>
>
