<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 3.2//EN">
<HTML>
<HEAD>
<META HTTP-EQUIV="Content-Type" CONTENT="text/html; charset=iso-8859-1">
<META NAME="Generator" CONTENT="MS Exchange Server version 6.0.6556.0">
<TITLE>RMFailure</TITLE>
</HEAD>
<BODY>
<!-- Converted from text/rtf format -->
<BR>
<P><SPAN LANG="en-us"><FONT SIZE=2 FACE="Arial">Hi,</FONT></SPAN>
</P>
<P><SPAN LANG="en-us"><FONT SIZE=2 FACE="Arial">I am running torque-1.0.1p6 and maui-3.2.6p6. When I stress Maui by submitting 25-50 jobs at a time, jobs frequently get stuck in the queue. Maui sees the queued jobs, as shown in the maui.log file, but never seems to execute them. When I run "checkjob -v job#", I get an RMFailure message as the reason the job cannot be started. Example checkjob output follows. Has anyone seen this problem? Also, how can the job be requeued? The rerun command fails to rerun the job, and the status remains the same.</FONT></SPAN></P>
<P><SPAN LANG="en-us"><FONT SIZE=2 FACE="Arial">-------------------------------------------------------------------------------------------</FONT></SPAN>
</P>
<P><SPAN LANG="en-us"><FONT SIZE=2 FACE="Arial Unicode MS">checking job 153 (RM job '153.resslnxc1-b.res.phar')</FONT></SPAN>
</P>
<P><SPAN LANG="en-us"><FONT SIZE=2 FACE="Arial Unicode MS">State: Idle EState: Deferred</FONT></SPAN>
<BR><SPAN LANG="en-us"><FONT SIZE=2 FACE="Arial Unicode MS">Creds: user:ssamuels group:ssamuels class:test qos:DEFAULT</FONT></SPAN>
<BR><SPAN LANG="en-us"><FONT SIZE=2 FACE="Arial Unicode MS">WallTime: 00:00:00 of 99:23:59:59</FONT></SPAN>
<BR><SPAN LANG="en-us"><FONT SIZE=2 FACE="Arial Unicode MS">SubmitTime: Wed Sep 1 15:27:25</FONT></SPAN>
<BR><SPAN LANG="en-us"><FONT SIZE=2 FACE="Arial Unicode MS"> (Time Queued Total: 00:50:14 Eligible: 00:00:05)</FONT></SPAN>
</P>
<P><SPAN LANG="en-us"><FONT SIZE=2 FACE="Arial Unicode MS">StartDate: -00:50:08 Wed Sep 1 15:27:31</FONT></SPAN>
<BR><SPAN LANG="en-us"><FONT SIZE=2 FACE="Arial Unicode MS">Total Tasks: 1</FONT></SPAN>
</P>
<P><SPAN LANG="en-us"><FONT SIZE=2 FACE="Arial Unicode MS">Req[0] TaskCount: 1 Partition: ALL</FONT></SPAN>
<BR><SPAN LANG="en-us"><FONT SIZE=2 FACE="Arial Unicode MS">Network: [NONE] Memory >= 0 Disk >= 0 Swap >= 0</FONT></SPAN>
<BR><SPAN LANG="en-us"><FONT SIZE=2 FACE="Arial Unicode MS">Opsys: [NONE] Arch: [NONE] Features: [NONE]</FONT></SPAN>
<BR><SPAN LANG="en-us"><FONT SIZE=2 FACE="Arial Unicode MS">Exec: '' ExecSize: 0 ImageSize: 0</FONT></SPAN>
<BR><SPAN LANG="en-us"><FONT SIZE=2 FACE="Arial Unicode MS">Dedicated Resources Per Task: PROCS: 1</FONT></SPAN>
<BR><SPAN LANG="en-us"><FONT SIZE=2 FACE="Arial Unicode MS">NodeAccess: SHARED</FONT></SPAN>
<BR><SPAN LANG="en-us"><FONT SIZE=2 FACE="Arial Unicode MS">NodeCount: 0</FONT></SPAN>
</P>
<BR>
<P><SPAN LANG="en-us"><FONT SIZE=2 FACE="Arial Unicode MS">IWD: [NONE] Executable: [NONE]</FONT></SPAN>
<BR><SPAN LANG="en-us"><FONT SIZE=2 FACE="Arial Unicode MS">Bypass: 0 StartCount: 1</FONT></SPAN>
<BR><SPAN LANG="en-us"><FONT SIZE=2 FACE="Arial Unicode MS">PartitionMask: [ALL]</FONT></SPAN>
<BR><SPAN LANG="en-us"><FONT SIZE=2 FACE="Arial Unicode MS">Flags: RESTARTABLE</FONT></SPAN>
</P>
<P><SPAN LANG="en-us"><FONT SIZE=2 FACE="Arial Unicode MS">job is deferred. Reason: RMFailure (job cannot be started - cannot set hostlist)</FONT></SPAN>
<BR><SPAN LANG="en-us"><FONT SIZE=2 FACE="Arial Unicode MS">Holds: Defer (hold reason: RMFailure)</FONT></SPAN>
<BR><SPAN LANG="en-us"><FONT SIZE=2 FACE="Arial Unicode MS">PE: 1.00 StartPriority: 1</FONT></SPAN>
<BR><SPAN LANG="en-us"><FONT SIZE=2 FACE="Arial Unicode MS">cannot select job 153 for partition DEFAULT (job hold active)</FONT></SPAN>
</P>
<P><SPAN LANG="en-us"><FONT SIZE=2 FACE="Arial Unicode MS">============================================================================</FONT></SPAN>
</P>
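<P><SPAN LANG="en-us"><FONT SIZE=2 FACE="Arial">For anyone who wants to spot jobs in this state from a script, here is a minimal sketch that pulls the defer reason out of text like the checkjob output above. It is only an illustration: the function name parse_defer_reason and the SAMPLE string are hypothetical, the line format is assumed from the output quoted here, and the code only parses text (it does not call checkjob or any Maui API).</FONT></SPAN></P>
<PRE>
```python
import re

# Sample lines copied from the checkjob output quoted above.
SAMPLE = """\
State: Idle EState: Deferred
job is deferred. Reason: RMFailure (job cannot be started - cannot set hostlist)
Holds: Defer (hold reason: RMFailure)
"""

# Assumed line format: "job is deferred. Reason: NAME (optional detail)"
DEFER_PATTERN = re.compile(r"job is deferred\.\s+Reason:\s+(\S+)\s*(?:\((.*)\))?")


def parse_defer_reason(checkjob_text):
    """Return (reason, detail) from the first 'job is deferred' line, or None."""
    for line in checkjob_text.splitlines():
        match = DEFER_PATTERN.search(line)
        if match:
            # detail is None when the line has no parenthesised part
            return match.group(1), match.group(2)
    return None


if __name__ == "__main__":
    print(parse_defer_reason(SAMPLE))
```
</PRE>
<P><SPAN LANG="en-us"><FONT SIZE=2 FACE="Arial">Saved checkjob output could be passed through this function to list which jobs were deferred with RMFailure and why.</FONT></SPAN></P>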
<P><SPAN LANG="en-us"><B><U><FONT COLOR="#0000FF" FACE="Script MT Bold">Stewart Samuels</FONT></U></B></SPAN>
<BR><SPAN LANG="en-us"><B><FONT FACE="Script MT Bold"> Technical Advisor</FONT></B></SPAN>
<BR><SPAN LANG="en-us"><B><FONT FACE="Script MT Bold"> Global Unix Engineering Services</FONT></B></SPAN>
<BR><SPAN LANG="en-us"><B> <FONT FACE="Script MT Bold"> 1041 Route 202-206 </FONT></B></SPAN>
<BR><SPAN LANG="en-us"><B><FONT FACE="Script MT Bold"> Bridgewater, NJ 08807</FONT></B></SPAN>
</P>
<P><SPAN LANG="en-us"><B><FONT FACE="Script MT Bold"> </FONT><FONT COLOR="#0000FF" FACE="Script MT Bold">(908) 231-4762</FONT></B></SPAN>
<BR><SPAN LANG="en-us"><B><FONT COLOR="#0000FF" FACE="Script MT Bold"> Stewart.Samuels@Aventis.com</FONT></B></SPAN>
</P>
</BODY>
</HTML>