<!DOCTYPE html PUBLIC "-//W3C//DTD HTML 4.01 Transitional//EN">
<html>
<head>
  <meta content="text/html;charset=ISO-8859-1" http-equiv="Content-Type">
</head>
<body bgcolor="#ffffff" text="#000000">
I think I have a good overview of the problem now. The scheduler
seems to die when I add hundreds of jobs at once. I had to restart the
scheduler and then the server to get things running again. <br>
<br>
I guess TORQUE isn't meant to receive hundreds of job requests at a
time. <br>
<br>
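A minimal sketch of how I might throttle submissions instead, assuming the stall really is triggered by queue size. The limit of 300 comes from my earlier observation quoted below; the helper names (count_queued_jobs, qsub_submit, submit_throttled) and the 30-second poll interval are made up for illustration, and the qstat parsing assumes the default column layout with the state letter second to last. <br>
<pre wrap="">
```python
import subprocess
import time
from typing import Callable, Iterable, List


def count_queued_jobs() -> int:
    """Count jobs in state Q by parsing plain `qstat` output.

    Assumes the state letter is the second-to-last column,
    as in the job listing quoted in this thread.
    """
    out = subprocess.run(
        ["qstat"], capture_output=True, text=True, check=True
    ).stdout
    # Slicing keeps short or empty lines from raising IndexError.
    return sum(1 for line in out.splitlines() if line.split()[-2:-1] == ["Q"])


def qsub_submit(script: str) -> str:
    """Submit one job script with `qsub` and return the job id it prints."""
    out = subprocess.run(
        ["qsub", script], capture_output=True, text=True, check=True
    ).stdout
    return out.strip()


def submit_throttled(
    scripts: Iterable[str],
    max_queued: int = 300,          # from the observation in this thread
    poll_seconds: float = 30.0,     # arbitrary, for illustration only
    count_queued: Callable[[], int] = count_queued_jobs,
    submit: Callable[[str], str] = qsub_submit,
    sleep: Callable[[float], None] = time.sleep,
) -> List[str]:
    """Submit scripts one by one, waiting while the queue is at the limit."""
    job_ids = []
    for script in scripts:
        # Hold back until the number of queued jobs drops below the limit.
        while count_queued() >= max_queued:
            sleep(poll_seconds)
        job_ids.append(submit(script))
    return job_ids
```
</pre>
With this, a call like submit_throttled(list_of_script_paths) would hold back new qsub calls while 300 or more jobs are in state Q. I have not verified that this avoids the stall. <br>
<br>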
Motin wrote:
<blockquote cite="mid45E96B75.5040206@demomusic.nu" type="cite">
  <pre wrap="">My queue of around 1400 jobs is totally stalled. In the full job listing,
all jobs are listed with status "Q"...

I have tried qrun, but it lacks an option to "force-run the next job
in the queue", which would make it easier to use.

Still, qrun only cures the symptoms, not the problem. The queue
runs fine with up to approximately 300 queued items. Beyond that, it
breaks down and refuses to run jobs most of the time. Occasionally it
does set off jobs, but very seldom.

Here are my logs, but they are rather strange. Can you make any sense
of them?

I added items up to around 03/02/2007 09:24, then paused adding, then
started adding again at 03/02/2007 09:59.

03/02/2007 09:23:37;0040; pbs_sched;Job;5431.tiger001;Not enough cpus available
03/02/2007 09:23:37;0080; pbs_sched;Svr;main;brk point 167481344
03/02/2007 09:23:38;0040; pbs_sched;Job;5432.tiger001;Not enough cpus available
03/02/2007 09:23:38;0040; pbs_sched;Job;5433.tiger001;Not enough cpus available
03/02/2007 09:23:38;0040; pbs_sched;Job;5434.tiger001;Not enough cpus available
03/02/2007 09:23:39;0040; pbs_sched;Job;5435.tiger001;Not enough cpus available
03/02/2007 09:23:39;0040; pbs_sched;Job;5436.tiger001;Not enough cpus available
03/02/2007 09:23:39;0040; pbs_sched;Job;5437.tiger001;Not enough cpus available
03/02/2007 09:23:39;0080; pbs_sched;Svr;main;brk point 167485440
03/02/2007 09:23:41;0040; pbs_sched;Job;5438.tiger001;Not enough cpus available
03/02/2007 09:23:41;0040; pbs_sched;Job;5439.tiger001;Not enough cpus available
03/02/2007 09:23:41;0040; pbs_sched;Job;5440.tiger001;Not enough cpus available
03/02/2007 09:23:41;0040; pbs_sched;Job;5441.tiger001;Not enough cpus available
03/02/2007 09:23:41;0040; pbs_sched;Job;5442.tiger001;Not enough cpus available
03/02/2007 09:23:41;0080; pbs_sched;Svr;main;brk point 167862272
03/02/2007 09:23:41;0040; pbs_sched;Job;5443.tiger001;Not enough cpus available
03/02/2007 09:23:42;0040; pbs_sched;Job;5444.tiger001;Not enough cpus available
03/02/2007 09:23:42;0040; pbs_sched;Job;5445.tiger001;Not enough cpus available
03/02/2007 09:23:42;0080; pbs_sched;Svr;main;brk point 167923712
03/02/2007 09:23:43;0040; pbs_sched;Job;5446.tiger001;Not enough cpus available
03/02/2007 09:23:43;0040; pbs_sched;Job;5447.tiger001;Not enough cpus available
03/02/2007 09:23:43;0040; pbs_sched;Job;5448.tiger001;Not enough cpus available
03/02/2007 09:23:46;0040; pbs_sched;Job;5449.tiger001;Not enough cpus available
03/02/2007 09:23:46;0040; pbs_sched;Job;5450.tiger001;Not enough cpus available
03/02/2007 09:23:46;0040; pbs_sched;Job;5451.tiger001;Not enough cpus available
03/02/2007 09:23:46;0040; pbs_sched;Job;5452.tiger001;Not enough cpus available
03/02/2007 09:23:46;0080; pbs_sched;Svr;main;brk point 167927808
03/02/2007 09:24:06;0040; pbs_sched;Job;3895.tiger001;Job Run
03/02/2007 09:24:06;0040; pbs_sched;Job;5453.tiger001;Not enough cpus available
03/02/2007 09:34:06;0040; pbs_sched;Job;3896.tiger001;Job Run
03/02/2007 09:34:07;0040; pbs_sched;Job;3897.tiger001;Job Run
03/02/2007 09:34:07;0080; pbs_sched;Svr;main;brk point 167931904
03/02/2007 09:44:07;0040; pbs_sched;Job;3898.tiger001;Job Run
03/02/2007 09:54:07;0040; pbs_sched;Job;3899.tiger001;Job Run
03/02/2007 09:54:08;0040; pbs_sched;Job;3900.tiger001;Job Run
03/02/2007 09:54:08;0080; pbs_sched;Svr;main;brk point 167936000
03/02/2007 09:54:38;0040; pbs_sched;Job;3901.tiger001;Job Run
03/02/2007 09:59:46;0040; pbs_sched;Job;3902.tiger001;Job Run
03/02/2007 09:59:46;0040; pbs_sched;Job;5454.tiger001;Not enough cpus available
03/02/2007 09:59:46;0040; pbs_sched;Job;3903.tiger001;Job Run
03/02/2007 09:59:46;0040; pbs_sched;Job;5455.tiger001;Not enough cpus available
03/02/2007 09:59:46;0040; pbs_sched;Job;5456.tiger001;Not enough cpus available
03/02/2007 09:59:46;0040; pbs_sched;Job;5457.tiger001;Not enough cpus available
03/02/2007 09:59:46;0040; pbs_sched;Job;5458.tiger001;Not enough cpus available
03/02/2007 09:59:46;0040; pbs_sched;Job;5459.tiger001;Not enough cpus available
03/02/2007 09:59:46;0080; pbs_sched;Svr;main;brk point 168206336
03/02/2007 09:59:47;0040; pbs_sched;Job;3904.tiger001;Job Run
03/02/2007 09:59:47;0040; pbs_sched;Job;5460.tiger001;Not enough cpus available
03/02/2007 09:59:47;0040; pbs_sched;Job;5461.tiger001;Not enough cpus available
03/02/2007 09:59:47;0040; pbs_sched;Job;5462.tiger001;Not enough cpus available
03/02/2007 09:59:47;0040; pbs_sched;Job;5463.tiger001;Not enough cpus available
03/02/2007 09:59:47;0040; pbs_sched;Job;5464.tiger001;Not enough cpus available
03/02/2007 09:59:47;0040; pbs_sched;Job;5465.tiger001;Not enough cpus available
03/02/2007 09:59:47;0080; pbs_sched;Svr;main;brk point 168222720
03/02/2007 09:59:47;0040; pbs_sched;Job;3905.tiger001;Job Run
03/02/2007 09:59:47;0040; pbs_sched;Job;5466.tiger001;Not enough cpus available
03/02/2007 09:59:47;0040; pbs_sched;Job;5467.tiger001;Not enough cpus available
03/02/2007 09:59:47;0040; pbs_sched;Job;5468.tiger001;Not enough cpus available
03/02/2007 09:59:47;0040; pbs_sched;Job;5469.tiger001;Not enough cpus available
03/02/2007 09:59:47;0040; pbs_sched;Job;5470.tiger001;Not enough cpus available
03/02/2007 09:59:47;0040; pbs_sched;Job;5471.tiger001;Not enough cpus available
03/02/2007 09:59:47;0080; pbs_sched;Svr;main;brk point 168312832
03/02/2007 09:59:48;0040; pbs_sched;Job;3906.tiger001;Job Run

First queue items are:
3899.tiger001     flv_info6360     www-data               0 Q batch
3900.tiger001     flv_info6357     www-data               0 Q batch
3901.tiger001     flv_info6357     www-data               0 Q batch

Last items are:
5442.tiger001     flv_info41       www-data               0 Q batch
5443.tiger001     flv_info41       www-data               0 Q batch
5444.tiger001     flv_info10       www-data               0 Q batch
5445.tiger001     flv_info10       www-data               0 Q batch
5446.tiger001     flv_info10       www-data               0 Q batch
5447.tiger001     flv_info10       www-data               0 Q batch
5448.tiger001     flv_info10       www-data               0 Q batch
5449.tiger001     hiflv_in7        www-data               0 Q batch
5450.tiger001     hiflv_in7        www-data               0 Q batch
5451.tiger001     hiflv_in7        www-data               0 Q batch
5452.tiger001     hiflv_in7        www-data               0 Q batch
5453.tiger001     hiflv_in7        www-data               0 Q batch

Tim Miller wrote:

  </pre>
  <blockquote type="cite">
    <blockquote type="cite">
      <pre wrap="">One can always do qrun &lt;jobid&gt; (at least with pbs_sched -- I've never
used Maui). Have you looked at the full job listing and scheduler logs
to determine why the jobs aren't running?

Best,
Tim

Motin wrote:
      </pre>
    </blockquote>
    <blockquote type="cite">
      <blockquote type="cite">
        <blockquote type="cite">
          <pre wrap="">Sometimes the queue just sits there, without running any jobs. The
machine is by no means overloaded, only sparsely used. How can one force
the machine to run the available jobs in the queue?
          </pre>
        </blockquote>
      </blockquote>
    </blockquote>
  </blockquote>
  <pre wrap=""><!---->
_______________________________________________
torqueusers mailing list
<a class="moz-txt-link-abbreviated" href="mailto:torqueusers@supercluster.org">torqueusers@supercluster.org</a>
<a class="moz-txt-link-freetext" href="http://www.supercluster.org/mailman/listinfo/torqueusers">http://www.supercluster.org/mailman/listinfo/torqueusers</a>


  </pre>
</blockquote>
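<br>
To get a quick overview of the scheduler log quoted above, something like the following sketch could tally the job messages by type. It assumes one entry per line with six semicolon-separated fields (timestamp, event code, daemon, object type, object id, message), as in the excerpt; the function name tally_sched_log is hypothetical. <br>
<pre wrap="">
```python
from collections import Counter
from typing import Iterable


def tally_sched_log(lines: Iterable[str]) -> Counter:
    """Count scheduler messages per message text for Job entries only."""
    counts = Counter()
    for line in lines:
        # Split into at most six fields so semicolons in the message survive.
        fields = line.strip().split(";", 5)
        if len(fields) != 6:
            continue  # skip blank lines and wrapped continuation fragments
        object_type, message = fields[3], fields[5].strip()
        if object_type == "Job":
            counts[message] += 1
    return counts
```
</pre>
Comparing the count of "Job Run" entries with the count of "Not enough cpus available" entries over a time window should show how rarely jobs are actually started. <br>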
<br>
</body>
</html>