<!DOCTYPE html PUBLIC "-//W3C//DTD HTML 4.01 Transitional//EN">
<html>
<head>
<meta content="text/html;charset=Big5" http-equiv="Content-Type">
<title></title>
</head>
<body bgcolor="#ffffff" text="#000000">
<font face="PMingLiU">Hi,<br>
<br>
I just benchmarked VASP on our dual-core, dual-Opteron cluster with a
Gigabit Ethernet switch, and found something I cannot explain:<br>
<br>
</font>
<pre wrap="">Using 1 node
Total CPU time/Elapsed time
1 core 2 cores 4 cores
interactive mode 5399s/5435s 3318s/3442s 1747s/2071s
batch modeh TORQUE 4903s/4905s 3727s/3922s 1698s/1899s
Using 2 nodes
Total CPU time/Elapsed time
2 cores 4 cores 8 cores
interactive mode 3169s/3608s 1718s/2019s 904s/1384s
batch modeh TORQUE 2704s/4248s 1617s/2271s 906s/1370s
Using 4 nodes
Total CPU time/Elapsed time
4 cores 8 cores 16 cores
interactive mode 1672s/1860s 791s/1096s 518s/1484s
batch modeh TORQUE 1366s/2482s 820s/2975s 548s/3178s

The total CPU time scales well. However, the elapsed times of the test
cases run in batch mode behave strangely: the TORQUE overhead seems to
grow dramatically once two or more nodes are used, until it outweighs the
performance gained from the additional cores. This does not make sense to
me, and I have not seen this behavior on our single-core dual-Opteron
cluster.
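
To make the pattern easier to see, the wall-clock time not covered by the
CPU time can be tabulated for every run. The Python sketch below is only an
illustration (it is not part of VASP or TORQUE). It assumes that "total CPU
time" is the CPU time VASP reports for the run, so elapsed minus CPU time
roughly approximates time spent waiting on communication, I/O or startup.
The names RUNS, non_cpu_time and non_cpu_fraction are hypothetical.

```python
# Figures copied from the tables above:
# (nodes, cores, mode) maps to (total CPU time in s, elapsed time in s).
RUNS = {
    (1, 1, "interactive"): (5399, 5435),
    (1, 2, "interactive"): (3318, 3442),
    (1, 4, "interactive"): (1747, 2071),
    (1, 1, "batch"): (4903, 4905),
    (1, 2, "batch"): (3727, 3922),
    (1, 4, "batch"): (1698, 1899),
    (2, 2, "interactive"): (3169, 3608),
    (2, 4, "interactive"): (1718, 2019),
    (2, 8, "interactive"): (904, 1384),
    (2, 2, "batch"): (2704, 4248),
    (2, 4, "batch"): (1617, 2271),
    (2, 8, "batch"): (906, 1370),
    (4, 4, "interactive"): (1672, 1860),
    (4, 8, "interactive"): (791, 1096),
    (4, 16, "interactive"): (518, 1484),
    (4, 4, "batch"): (1366, 2482),
    (4, 8, "batch"): (820, 2975),
    (4, 16, "batch"): (548, 3178),
}


def non_cpu_time(cpu, elapsed):
    """Wall-clock seconds not covered by the reported CPU time (assumed to be waiting)."""
    return elapsed - cpu


def non_cpu_fraction(cpu, elapsed):
    """Share of the elapsed time that is not covered by the reported CPU time."""
    return non_cpu_time(cpu, elapsed) / elapsed


if __name__ == "__main__":
    print("nodes cores mode         non-CPU(s)  share(%)")
    for (nodes, cores, mode), (cpu, elapsed) in RUNS.items():
        print(f"{nodes:5d} {cores:5d} {mode:12s} {non_cpu_time(cpu, elapsed):10d} "
              f"{100 * non_cpu_fraction(cpu, elapsed):9.1f}")
```

For example, the 16-core runs on 4 nodes leave 1484 - 518 = 966 s not
covered by CPU time in interactive mode, but 3178 - 548 = 2630 s in batch
mode.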

Does anyone have a clue why this happens? On our cluster, each node has
two dual-core Opteron 275 CPUs and 4 GB of RAM, and TORQUE treats each
node as a 4-way SMP node.
Jyh-Shyong Ho, Ph.D.
Research Scientist
National Center for High Performance Computing
Hsinchu, Taiwan, ROC
</pre>
</body>
</html>