[torqueusers] Torque on 1000 nodes ?
Garrick Staples
garrick at usc.edu
Thu Jun 30 14:20:52 MDT 2005
On Wed, Jun 29, 2005 at 12:33:34PM +0200, Ole Holm Nielsen alleged:
> We're considering whether to move our 900+ node Linux cluster to
> the Torque resource manager. However, we're unsure if Torque
> will work reliably on a cluster with this many nodes, since
> there may be all sorts of resource limits when the server
> has to communicate with ~1000 nodes. The Torque page says
> that it scales above 2500 nodes, but I'd be interested in
> real production experiences. My questions are:
>
> 1. Can anyone recommend for or against Torque on large clusters ?
These questions would have been a lot more interesting back in the OpenPBS
days :)
I can personally attest to Torque working just fine on 1700 nodes, whereas the
old OpenPBS code started having problems at 256 nodes.
Overall, a large number of jobs is the harder problem. Fortunately we've had
recent improvements in that area. I can now have 8 thousand queued jobs and a
few hundred running jobs without a problem.
> 2. What special tweaking must be done on large clusters ?
These aren't necessary, but they keep things running smoothly for me when
thousands of jobs are submitted.
These slow things down a wee bit.
set server node_ping_rate = 300
set server node_check_rate = 600
set server tcp_timeout = 6
These keep things responding well when thousands of jobs are submitted.
set server job_stat_rate = 45
set server poll_jobs = True
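One way to apply the five settings above is to pass each one to qmgr with
its -c option on the pbs_server host. The following is only a sketch: the
DRY_RUN switch and the apply_setting/apply_all helper names are made up for
illustration, and by default it just prints the commands instead of running
them.

```shell
#!/bin/sh
# Sketch only: DRY_RUN, apply_setting and apply_all are illustrative names,
# not part of Torque. With DRY_RUN=1 (the default) nothing is changed.
DRY_RUN="${DRY_RUN:-1}"

apply_setting() {
    # $1 is one complete qmgr directive, e.g. "set server tcp_timeout = 6"
    if [ "$DRY_RUN" = "1" ]; then
        printf 'qmgr -c "%s"\n' "$1"
    else
        qmgr -c "$1"
    fi
}

apply_all() {
    # Values are the ones quoted in this message.
    apply_setting "set server node_ping_rate = 300"
    apply_setting "set server node_check_rate = 600"
    apply_setting "set server tcp_timeout = 6"
    apply_setting "set server job_stat_rate = 45"
    apply_setting "set server poll_jobs = True"
}

apply_all
```

Run it once as-is to review the commands, then with DRY_RUN=0 as a Torque
manager to apply them.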
Both pbs_server and maui have the ability to trigger a scheduling iteration at
regular intervals. I think most people have maui "drive" the scheduling
iterations with an RMPOLLINTERVAL of 1 to 2 minutes. I find it better to have
pbs_server drive it because its iteration timeout resets when a job is
submitted (which triggers an iteration), and it runs better when thousands of
jobs are submitted.
set server scheduler_iteration = 60    (in qmgr; 1 minute)
RMPOLLINTERVAL 00:60:00                (in maui.cfg; 1 hour)
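The two values use different units: scheduler_iteration is given in seconds,
while the RMPOLLINTERVAL above is written as HH:MM:SS. A small sketch for
putting them on the same scale follows. The hms_to_seconds helper is a
made-up name and it only handles the three-field HH:MM:SS form used here.

```shell
#!/bin/sh
# Sketch only: hms_to_seconds is an illustrative helper, not a Maui tool.
hms_to_seconds() {
    # Convert "HH:MM:SS" to seconds; awk reads each field as a decimal number.
    echo "$1" | awk -F: '{ print $1 * 3600 + $2 * 60 + $3 }'
}

server_iter=60                               # scheduler_iteration, seconds
maui_poll="$(hms_to_seconds 00:60:00)"       # RMPOLLINTERVAL from maui.cfg

# pbs_server "drives" scheduling when its interval is the shorter one.
if [ "$server_iter" -lt "$maui_poll" ]; then
    echo "pbs_server drives: ${server_iter}s vs maui ${maui_poll}s"
else
    echo "maui drives: ${maui_poll}s vs pbs_server ${server_iter}s"
fi
```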
> 3. Does the Maui scheduler work reliably with Torque ?
Maui's limits are well understood and documented:
http://clusterresources.com/products/maui/docs/a.ddevelopment.shtml
I bump these up when building maui:
perl -pi -e 's/^#define MMAX_JOB .*/#define MMAX_JOB 8192/' include/msched.h
perl -pi -e 's/^#define MAX_MJOB .*/#define MAX_MJOB 8192/' include/msched.h
perl -pi -e 's/^#define MAX_MCLASS .*/#define MAX_MCLASS 32/' include/msched-common.h
(I think the docs are wrong regarding MMAX_JOB and MAX_MJOB)
> FYI, our cluster has fairly fast Pentium-4 nodes and
> Gigabit/100Mbit Ethernet (no Myrinet or other custom networks).
> The homepage is http://www.dcsc.dtu.dk/English/Niflheim.aspx
We have a mix of 32bit Xeons, 64bit Xeons, 32bit Opterons, 64bit Opterons, and
PIIIs, with and without Myrinet.
--
Garrick Staples, Linux/HPCC Administrator
University of Southern California