[Mauiusers] configuring disk space
Dave Jackson
jacksond at clusterresources.com
Mon Feb 5 15:33:04 MST 2007
Kevin,
I would guess you are requesting this correctly with qsub. I am not
certain why TORQUE is rejecting this, but it looks like Maui is doing the
right thing. (Also, the mom 'size' option was correct; my fingers went
on auto-pilot when writing the previous email.)
When I test this under TORQUE 2.2.0, my job fails with the following
in the job's stderr file:
--------
getsize() failed for file in mom_set_limits
--------
It looks like we need to move this discussion over to torqueusers. Do
you want to re-post this? I will get started looking at what we can do
to make getsize work with larger values.
Dave
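
A possible reading of the 4096mb threshold reported in the quoted message
below, offered as an assumption rather than a confirmed diagnosis: 4096 MB
is exactly 2^32 bytes, so a size converted to bytes and stored in a 32-bit
unsigned integer would wrap to 0 at that value. Nothing in this thread
confirms that this is what getsize() does. The bash sketch below only
illustrates the arithmetic; mb_to_bytes and wrap32 are hypothetical helper
names, not TORQUE functions.

```shell
#!/usr/bin/env bash
# Illustration only: these helpers are hypothetical and are not TORQUE code.

# Convert a size in MB to bytes using bash's 64-bit integer arithmetic.
mb_to_bytes() { echo $(( $1 * 1024 * 1024 )); }

# Same conversion, then keep only the low 32 bits to mimic the value
# a 32-bit unsigned variable would hold (an assumption about the failure).
wrap32() { echo $(( ($1 * 1024 * 1024) & 0xFFFFFFFF )); }

# Compare the full byte count with the truncated one for a few requests.
for mb in 2 4095 4096 10000; do
  printf '%6s MB -> %12s bytes, low 32 bits: %12s\n' \
    "$mb" "$(mb_to_bytes "$mb")" "$(wrap32 "$mb")"
done
```

If this assumption holds, requests below 4096mb survive the conversion
unchanged, which would match the behaviour Kevin reports, while 4096mb
itself truncates to zero.
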
On Mon, 2007-02-05 at 17:17 -0500, Kevin Van Workum wrote:
> I tried using:
> NODECFG[amd24] CFGDISK=10000
>
> checknode amd24 returned:
> Configured Resources: PROCS: 1 MEM: 503M SWAP: 1484M DISK: 10000M
> Utilized Resources: DISK: 9999M
> Dedicated Resources: [NONE]
>
> But my job doesn't start if I request more than 1mb, e.g. 'qsub -l
> file=2mb'. So it looks like maui thinks there is only 1mb of disk
> available.
>
> Also, I couldn't find any 'file' option for mom's config, but I did
> find the 'size' option. So I tried 'size[fs=/tmp]'.
>
> In this case checknode amd24 returned:
> Configured Resources: PROCS: 1 MEM: 503M SWAP: 1483M DISK: 12G
> Utilized Resources: DISK: 1855M
> Dedicated Resources: [NONE]
>
> Which is close to the correct disk space for /tmp. 'df -h /tmp' gives:
> [root@amd24 ~]# df /tmp -h
> Filesystem                       Size  Used Avail Use% Mounted on
> /dev/mapper/VolGroup00-LogVol00   13G  1.9G   10G  16% /
>
> But I found that if I request file=4096mb or more, the job fails.
> mom's log says:
> "pbs_mom;Job;TMomFinalizeJob3;job not started, Failure job exec
> failure, after files staged, no retry"
> and pbs_server says:
> "PBS_Server: stream_eof, connection to amd24 is bad, remote service
> may be down, message may be corrupt, or connection may have been
> dropped remotely (End of File). setting node state to down"
>
> Requesting less than file=4096mb works fine.
>
> Maybe I'm requesting the required disk space incorrectly on qsub's command line.
>
> Kevin
>
> On 2/5/07, Dave Jackson <jacksond at clusterresources.com> wrote:
> > Kevin,
> >
> > I think you can indicate the amount of disk space available using the
> > TORQUE 'file' option in mom config. Otherwise, you should be able to
> > populate this info directly via Maui using 'NODECFG[X] CFGDISK=<VAL>'
> > where <VAL> is specified in MB. If this does not work, let me know and we
> > will get it fixed.
> >
> > Dave
> >
> > On Mon, 2007-02-05 at 16:16 -0500, Kevin Van Workum wrote:
> > > My nodes have various amounts of local scratch disk space. How do I
> > > tell maui how much local scratch disk space each node has, and how do
> > > I request a certain amount of disk space on each node for a particular
> > > job?
> > >
> > > I'm testing with maui-3.2.6p19 and torque-2.1.6.
> > >
> >
> >
>
>