[torqueusers] requesting gpus
Gareth.Williams at csiro.au
Thu Feb 2 20:17:36 MST 2012
Hi All,
I added a basic GPU count to one of our compute nodes with:
qmgr -c 's n n121 gpus = 2'
and it seems fine:
> pbsnodes -a n121
n121
     state = free
     np = 12
     ntype = cluster
     status = rectime=1328238593,varattr=,jobs=,state=free,size=133709780kb:144492840kb,netload=156768229618,gres=,loadave=2.00,ncpus=24,physmem=99195396kb,availmem=95103784kb,totmem=101299868kb,idletime=173222,nusers=0,nsessions=0,uname=Linux n121 2.6.32.49-0.3-default #1 SMP 2011-12-02 11:28:04 +0100 x86_64,opsys=sles11,arch=x86_64
     mom_service_port = 15002
     mom_manager_port = 15003
     gpus = 2
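For anyone who wants to script the same check, here is a minimal sketch that pulls the gpus value out of pbsnodes output. The function name node_gpus is made up for illustration, and it assumes the "gpus = N" attribute format shown in the listing above; it has not been tried against other TORQUE versions.

```shell
# Hypothetical helper (name is illustrative, not part of TORQUE):
# read `pbsnodes` output on stdin and print the value of each "gpus = N" line.
# Assumes the "attribute = value" layout shown in the listing above.
node_gpus() {
    # Split on "=" with optional surrounding spaces; match only lines whose
    # attribute name is exactly "gpus" (leading blanks or tabs allowed).
    awk -F' *= *' '$1 ~ /^[ \t]*gpus$/ { print $2 }'
}

# Usage against a live server (not executed here):
#   pbsnodes -a n121 | node_gpus
```

The status line also contains "=" characters, but its attribute name is "status", so it is skipped.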
However, when I submit a job with the recommended syntax:
http://www.adaptivecomputing.com/resources/docs/torque/3-0-3/3.7schedulinggpus.php
I get:
> qsub -I -q viz -l nodes=1:ppn=1:gpus=1
qsub: Job exceeds queue resource limits MSG=cannot locate feasible nodes
The torque version is 3.0.3-snap.201108261653
Note that this is _not_ the --enable-nvidia-gpus functionality.
Also note that the server has not been restarted.
The scheduler is Moab, but I'm fairly sure the job gets rejected well before Moab comes into the picture.
Does anyone have such a setup working, or can anyone see what is wrong (or suggest where to look)?
Regards,
Gareth