must remove nodes=1 - WAS: [Mauiusers] Node idle but load is HIGH
Toni L. Harbaugh-Blackford [Contr]
harbaugh at ncifcrf.gov
Fri Sep 28 08:28:16 MDT 2007
The problem is "nodes=1". With "nodes=1", all CPUs requested by the "ncpus=100"
setting MUST be on the same node. Do you have nodes=1 in your qmgr setup?
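
One way to check is to dump the configuration with qmgr -c 'print server' and
look for a nodes setting. The sketch below is only an illustration: it assumes
the dump uses the usual "set queue NAME resources_default.nodes = N" line
format, and the function name find_nodes_settings and the sample text are
hypothetical, not taken from the cluster discussed in this thread.

```python
import re

# Matches qmgr dump lines such as:
#   set queue dgiseq resources_default.nodes = 1
#   set server resources_default.nodes = 1
_NODES_SETTING = re.compile(
    r"^set\s+(server|queue\s+(?P<queue>\S+))\s+"
    r"resources_(?P<kind>default|max|min)\.nodes\s*=\s*(?P<value>\S+)\s*$"
)


def find_nodes_settings(qmgr_dump):
    """Return a (scope, kind, value) tuple for each nodes setting found."""
    found = []
    for line in qmgr_dump.splitlines():
        match = _NODES_SETTING.match(line.strip())
        if match:
            # scope is the queue name, or "server" for a server-wide setting
            scope = match.group("queue") or "server"
            found.append((scope, match.group("kind"), match.group("value")))
    return found


# Hypothetical sample dump, made up for this example.
SAMPLE = """\
create queue dgiseq
set queue dgiseq queue_type = Execution
set queue dgiseq resources_default.nodes = 1
set queue dgiseq enabled = True
set server scheduling = True
set server resources_default.walltime = 12:00:00
"""

if __name__ == "__main__":
    for scope, kind, value in find_nodes_settings(SAMPLE):
        print(f"{scope}: resources_{kind}.nodes = {value}")
```

Any queue or server entry reported this way is a candidate for the nodes=1
default described above.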
On Fri, 28 Sep 2007, Jan Ploski wrote:
> mauiusers-bounces at supercluster.org wrote on 09/28/2007 03:09:22 PM:
>
> > On Fri, 28 Sep 2007, Jan Ploski wrote:
> >
> > > ...and according to pstree these jobs are child processes of
> > > pbs_mom, so definitely not "runaway".
> >
> > What does qstat -f say about those jobs?
>
> Here is an example. I see nothing strange in it:
>
> Job Id: 346597.srvgrid01.offis.uni-oldenburg.de
>     Job_Name = STDIN
>     Job_Owner = dgad0006 at srvgrid01.offis.uni-oldenburg.de
>     resources_used.cput = 04:25:03
>     resources_used.mem = 82576kb
>     resources_used.vmem = 164320kb
>     resources_used.walltime = 04:25:47
>     job_state = R
>     queue = dgiseq
>     server = srvgrid01.offis.uni-oldenburg.de
>     Checkpoint = u
>     ctime = Fri Sep 28 11:00:53 2007
>     Error_Path = srvgrid01:/home/d-grid-users/dgad0006/1705.err
>     exec_host = node43/0
>     Hold_Types = n
>     Join_Path = n
>     Keep_Files = n
>     Mail_Points = n
>     mtime = Fri Sep 28 11:00:54 2007
>     Output_Path = srvgrid01:/home/d-grid-users/dgad0006/1705.out
>     Priority = 0
>     qtime = Fri Sep 28 11:00:53 2007
>     Rerunable = True
>     Resource_List.ncpus = 0
>     Resource_List.neednodes = 1
>     Resource_List.nodect = 1
>     Resource_List.nodes = 1
>     Resource_List.walltime = 12:00:00
>     session_id = 8431
>     Shell_Path_List = /bin/sh
>     substate = 42
>     Variable_List = PBS_O_HOME=/home/d-grid-users/dgad0006,
>         PBS_O_LOGNAME=dgad0006,
>         PBS_O_PATH=/usr/sbin:/bin:/usr/bin:/sbin:/usr/X11R6/bin,
>         PBS_O_SHELL=/bin/bash,PBS_O_HOST=srvgrid01.offis.uni-oldenburg.de,
>         PBS_O_WORKDIR=/home/d-grid-users/dgad0006,PBS_O_QUEUE=dgiseq
>     euser = dgad0006
>     egroup = ad
>     hashname = 346597.srvg
>     queue_rank = 103775
>     queue_type = E
>     etime = Fri Sep 28 11:00:53 2007
>
>
> Best regards,
> Jan Ploski
-------------------------------------------------------------------
Toni Harbaugh-Blackford harbaugh at ncifcrf.gov
System Administrator
Advanced Biomedical Computing Center (ABCC)
National Cancer Institute
Contractor - SAIC/Frederick