[Moabusers] Hierarchical Fairshare/Share Trees documentation?
Ángel de Vicente
angelv at iac.es
Mon May 19 05:52:29 MDT 2008
Hi,
Angel de Vicente wrote:
> I'm trying to better understand how hierarchical fairshare/share trees
> work with Moab, but I find the documentation at
> http://www.clusterresources.com/products/mwm/docs/6.3fairshare.shtml not
> detailed enough.
>
> Is there any other source of information for this feature?
A few months ago I sent this question to the list, and last week we
spent some time at our institution figuring out how this really works,
so I thought I would share our findings... Let's start by saying what we
wanted to accomplish and what our current configuration looks like.
Simplifying a bit, we have two types of jobs, LOC and PRO, and
we want to make sure that (assuming a heavy load of jobs of both types)
20% of the time is devoted to LOC jobs and 80% to PRO jobs. Each type is
further subdivided into branches. For example, the groups PRO_26 and
PRO_81 share a branch PRO_a, where 25% of the time should go to PRO_26
and 75% to PRO_81.
We assumed that when calculating the fairshare factors, the value of the
discrepancy between usage and target would always be considered as a
percentage and that the upper branches of the tree would always have a
bigger impact on the overall calculation of the FSFACTOR. Thus, for the
20%-80% policy for LOC and PRO jobs, we set the TARGET for LOC to 20 and
the TARGET for PRO to 80. For the 25%-75% policy for PRO_26 and PRO_81 we
set the TARGETs to 200 and 600 respectively, as can be seen in the output
of the command mdiag -f below.
# /opt/moab/bin/mdiag -f
Share Tree Overview for partition 'ALL'
Name Usage Target (FSFACTOR)
----- ----- ------ ------------
root 100.00 100.00 of 100.00 (node: 2267590792.30) (0.00)
- LOC 7.48 20.00 of 100.00 (node: 169597111.25) (0.00)
  - LOC_projects 100.00 100.00 of 100.00 (node: 169597111.25) (0.00)
    - LOC10 0.24 1.00 of 5.00 (group: 8051149.33) (154.59)
    - LOC11 0.18 1.00 of 5.00 (group: 6217766.80) (155.22)
    - LOC12 2.11 1.00 of 5.00 (group: 71686281.79) (132.75)
    - LOC13 0.51 1.00 of 5.00 (group: 17197930.64) (151.45)
    - LOC14 1.96 1.00 of 5.00 (group: 66443979.44) (134.55)
- PRO 92.52 80.00 of 100.00 (node: 2097993677.90) (0.00)
  - PRO_a 951.27 1000.00 of 1102.00 (node: 1811038343.02) (0.00)
    - PRO_26 170.59 200.00 of 800.00 (group: 386181609.41) (763.62)
    - PRO_81 479.55 600.00 of 800.00 (group: 1085595601.27) (1823.15)
  - PRO_b 150.73 100.00 of 1102.00 (node: 286955332.50) (0.00)
  - PRO_c 0.00 2.00 of 1102.00 (node: 0.00) (0.00)
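As a quick sanity check on the configuration, here is a small Python sketch (our own helper, not a Moab tool) that normalizes the sibling targets from the listing above. It assumes that the intended share of a branch is its target divided by the sum of its siblings' targets, which is what the "of 800.00" column suggests. The names siblings and as_percent are just for illustration.

```python
# Targets as configured; names and values are taken from the mdiag -f listing.
# "top" is only a label for the first level below root.
siblings = {
    "top": {"LOC": 20.0, "PRO": 80.0},
    "PRO_a": {"PRO_26": 200.0, "PRO_81": 600.0},
}

def as_percent(targets):
    """Express each sibling's target as a percentage of the siblings' total."""
    total = sum(targets.values())
    return {name: 100.0 * t / total for name, t in targets.items()}

for branch, targets in siblings.items():
    print(branch, as_percent(targets))
```

The ratios come out as the intended 20/80 and 25/75, even though the absolute target values at the two levels differ by an order of magnitude.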
But as it turns out, neither of our assumptions was correct, so the
fairshare factors don't reflect our intended policy. According to the
documentation at
http://www.clusterresources.com/products/mwm/docs/6.3fairshare.shtml#treeconfig
(section 6.3.4.2), we could accomplish more or less what we want with
the parameters described there, but in our version of Moab (5.1) they did
not have any noticeable effect (in fact, FSTREECAP was rejected as an
invalid parameter).
As can be seen from the output above, the FSFACTOR for PRO_81 is very
high compared with the LOC groups, even though the PRO branch has exceeded
its target of 80 and the LOC branch is well below its target of 20.
We found that this is because the FSFACTOR is calculated as:
d1*k1 + d2*k2 + ... + dn*kn
where di is the discrepancy (target minus usage) at the branch at level i,
and ki is the constant at that level. Perhaps with the parameters in
section 6.3.4.2 this can be changed, but in our case the constants at all
levels are the same, so the above formula is actually:
(d1 + d2 + ... + dn) * k , and we found that k~11.63
Thus, the FSFACTOR for PRO_81 is calculated as:
11.63 * ((80 - 92.52) + (1000 - 951.27) + (600 - 479.55)) ~ 1822
which is close to the 1823.15 reported by mdiag.
This is very unfortunate, because it means that the discrepancies at any
level have the same weight as at any other level, which is not at all
what we want.
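To check the formula against the first mdiag -f listing, here is a minimal Python sketch of the calculation as we understand it. The function name fsfactor and the constant K are ours, not Moab's; K = 11.63 is the empirical value mentioned above, and the (target, usage) pairs are copied from the listing.

```python
# Sketch of the FSFACTOR calculation as inferred from the mdiag -f output.
# K and fsfactor() are illustrative names, not part of Moab.
K = 11.63  # empirical constant, identical at every level in our setup

def fsfactor(path, k=K):
    """path: (target, usage) pairs from the top-level branch down to the group."""
    return k * sum(target - usage for target, usage in path)

# PRO -> PRO_a -> PRO_81
pro_81 = [(80.0, 92.52), (1000.0, 951.27), (600.0, 479.55)]
# LOC -> LOC_projects -> LOC10
loc10 = [(20.0, 7.48), (100.0, 100.0), (1.0, 0.24)]

print(round(fsfactor(pro_81), 2))  # close to the reported 1823.15
print(round(fsfactor(loc10), 2))   # close to the reported 154.59
```

The PRO_81 value is dominated by the (600 - 479.55) term at the lowest level, which is why it ends up far above the LOC groups even though PRO as a whole is over its target.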
Luckily this can be worked around easily (at least until the parameters
mentioned above work; perhaps they do in version 5.2?) by scaling the
targets at each level accordingly. For example, in our case, we can
simply make the targets at the LOC and PRO branches keep their ratio
but increase their values by several orders of magnitude, say to 2e7 and
8e7. This way, any discrepancy at the first level has a much bigger
impact than anything below it. This seems to work fine now for our
purposes, and all jobs in LOC now have a much higher FSFACTOR than any
job in PRO.
Share Tree Overview for partition 'ALL'
Name Usage Target (FSFACTOR)
----- ----- ------ ------------
root 100.00 100.00 of 100.00 (node: 2267771179.30) (0.00)
- LOC 7478581.30 20000000.00 of 100000000.00 (node: 169597111.25) (0.00)
  - LOC_projects 100.00 100.00 of 100.00 (node: 169597111.25) (0.00)
    - LOC10 0.24 1.00 of 5.00 (group: 8051149.33) (145719187.24)
    - LOC11 0.18 1.00 of 5.00 (group: 6217766.80) (145719187.87)
    - LOC12 2.11 1.00 of 5.00 (group: 71686281.79) (145719165.41)
    - LOC13 0.51 1.00 of 5.00 (group: 17197930.64) (145719184.10)
    - LOC14 1.96 1.00 of 5.00 (group: 66443979.44) (145719167.20)
- PRO 92521418.57 80000000.00 of 100000000.00 (node: 2098174064.90) (0.00)
  - PRO_a 951.29 1000.00 of 1102.00 (node: 1811218730.02) (0.00)
    - PRO_26 170.57 200.00 of 800.00 (group: 386181609.41) (-145718267.37)
    - PRO_81 479.58 600.00 of 800.00 (group: 1085775988.27) (-145717208.41)
  - PRO_b 150.71 100.00 of 1102.00 (node: 286955332.50) (0.00)
  - PRO_c 0.00 2.00 of 1102.00 (node: 0.00) (0.00)
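The effect of the scaled targets can be reproduced with the same kind of Python sketch, again assuming the sum-of-discrepancies formula with a single constant k~11.63. The function name fsfactor and the constant K are ours, and the (target, usage) pairs are copied from the second listing.

```python
# Sketch: FSFACTOR with the top-level LOC/PRO targets scaled up.
# K and fsfactor() are illustrative names, not part of Moab.
K = 11.63  # empirical constant, assumed identical at every level

def fsfactor(path, k=K):
    """path: (target, usage) pairs from the top-level branch down to the group."""
    return k * sum(target - usage for target, usage in path)

# LOC -> LOC_projects -> LOC10, with the LOC target scaled to 2e7
loc10 = [(20_000_000.0, 7_478_581.30), (100.0, 100.0), (1.0, 0.24)]
# PRO -> PRO_a -> PRO_81, with the PRO target scaled to 8e7
pro_81 = [(80_000_000.0, 92_521_418.57), (1000.0, 951.29), (600.0, 479.58)]

print(fsfactor(loc10))   # large and positive, near the reported 145719187.24
print(fsfactor(pro_81))  # large and negative, near the reported -145717208.41
```

With the first-level discrepancy now in the millions, the lower-level terms only affect the ordering of groups within the same top-level branch.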
Does anyone have experience with this in version 5.2, or know how to
make this a bit more elegant?
Cheers,
Ángel de Vicente
--
----------------------------------
http://www.iac.es/galeria/angelv/
PostDoc Software Support
Instituto de Astrofísica de Canarias