[Mauiusers] insufficient idle procs available ?
Itay M
itaym.tau at gmail.com
Tue Jan 29 14:00:31 MST 2008
Here is the diagnose -j on these two jobs that are running on node28:
/==============================/
diagnose -j 228620
Name State Par Proc QOS WCLimit R Min User
Group Account QueuedTime Network Opsys Arch Mem Disk Procs
Class Features
228620 Running DEF 1 low 10:00:00:00 1 1 ad_user
pu_group - 2:49:41 [NONE] [NONE] [NONE] >=0 >=0 NC0
[heavy:1] [NONE]
WARNING: job '228620' utilizes more memory than dedicated (3432 > 512)
diagnose -j 228621
Name State Par Proc QOS WCLimit R Min User
Group Account QueuedTime Network Opsys Arch Mem Disk Procs
Class Features
228621 Running DEF 1 low 10:00:00:00 1 1 ad_user
pu_group - 2:49:41 [NONE] [NONE] [NONE] >=0 >=0 NC0
[heavy:1] [NONE]
WARNING: job '228621' utilizes more memory than dedicated (3595 > 512)
/==============================/
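The warnings above can be collected mechanically when many jobs are affected. The following is only a sketch: the regular expression is based solely on the wording of the two warning lines in this post, the helper name `memory_overruns` is made up for illustration, and the figures are presumed to be in MB (checkjob reports the dedicated amount as 512M).

```python
import re

# Pattern modeled on the warning lines quoted in this post; other Maui
# versions may word the message differently (assumption).
WARNING_RE = re.compile(
    r"WARNING:\s+job '(?P<job>[^']+)' utilizes more memory than dedicated "
    r"\((?P<used>\d+) > (?P<dedicated>\d+)\)"
)

def memory_overruns(text):
    """Return (job_id, used, dedicated) tuples for each warning in text."""
    return [
        (m.group("job"), int(m.group("used")), int(m.group("dedicated")))
        for m in WARNING_RE.finditer(text)
    ]

# The two warning lines from the diagnose -j output above.
sample = """\
WARNING: job '228620' utilizes more memory than dedicated (3432 > 512)
WARNING: job '228621' utilizes more memory than dedicated (3595 > 512)
"""

overruns = memory_overruns(sample)
for job, used, dedicated in overruns:
    # Units are presumed to be MB.
    print(f"job {job}: {used} used vs {dedicated} dedicated "
          f"({used / dedicated:.1f}x)")
```

In practice the text would come from saved `diagnose -j` output rather than an inline string.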
And here is the checkjob -v on these two jobs:
/==============================/
checking job 228620 (RM job '228620.cluster')
State: Running
Creds: user:ad_user group:pu_group class:heavy qos:low
WallTime: 6:31:31 of 10:00:00:00
SubmitTime: Tue Jan 29 16:14:14
(Time Queued Total: 00:00:01 Eligible: 00:00:01)
StartTime: Tue Jan 29 16:14:15
Total Tasks: 1
Req[0] TaskCount: 1 Partition: DEFAULT
Network: [NONE] Memory >= 0 Disk >= 0 Swap >= 0
Opsys: [NONE] Arch: [NONE] Features: [NONE]
Exec: '' ExecSize: 0 ImageSize: 0
Dedicated Resources Per Task: PROCS: 1 MEM: 512M
Utilized Resources Per Task: PROCS: 0.13 MEM: 34.32 SWAP: 35.44
Avg Util Resources Per Task: PROCS: 0.10
Max Util Resources Per Task: PROCS: 0.13 MEM: 34.32 SWAP: 35.44
Average Utilized Memory: 3408.54 MB
Average Utilized Procs: 0.61
NodeAccess: SHARED
NodeCount: 1
Allocated Nodes:
[node28:1]
Task Distribution: node28
IWD: [NONE] Executable: [NONE]
Bypass: 0 StartCount: 1
PartitionMask: [ALL]
SystemQueueTime: Tue Jan 29 19:53:18
Flags: RESTARTABLE
Reservation '228620' (-6:31:19 -> 9:17:28:41 Duration: 10:00:00:00)
PE: 1.00 StartPriority: 200
checking job 228621 (RM job '228621.cluster')
State: Running
Creds: user:ad_user group:pu_group class:heavy qos:low
WallTime: 6:24:00 of 10:00:00:00
SubmitTime: Tue Jan 29 16:22:46
(Time Queued Total: 00:00:01 Eligible: 00:00:01)
StartTime: Tue Jan 29 16:22:47
Total Tasks: 1
Req[0] TaskCount: 1 Partition: DEFAULT
Network: [NONE] Memory >= 0 Disk >= 0 Swap >= 0
Opsys: [NONE] Arch: [NONE] Features: [NONE]
Exec: '' ExecSize: 0 ImageSize: 0
Dedicated Resources Per Task: PROCS: 1 MEM: 512M
Utilized Resources Per Task: PROCS: 0.10 MEM: 35.95 SWAP: 39.56
Avg Util Resources Per Task: PROCS: 0.08
Max Util Resources Per Task: PROCS: 0.10 MEM: 35.95 SWAP: 39.56
Average Utilized Memory: 3561.67 MB
Average Utilized Procs: 0.58
NodeAccess: SHARED
NodeCount: 1
Allocated Nodes:
[node28:1]
Task Distribution: node28
IWD: [NONE] Executable: [NONE]
Bypass: 0 StartCount: 1
PartitionMask: [ALL]
SystemQueueTime: Tue Jan 29 19:53:18
Flags: RESTARTABLE
Reservation '228621' (-6:23:49 -> 9:17:36:11 Duration: 10:00:00:00)
PE: 1.00 StartPriority: 200
/==============================/
What does the 0:4 mean?
Could this be related to the way in which the user is running the job itself
(the one that qsub runs)?
Or should I check something on the nodes, such as the load average, or
something else?
BTW, almost all of our jobs show the warning 'WARNING: job '{job_id}' utilizes
more memory than dedicated (xxxx > 512)'. Should I change the default memory
assigned to jobs? Currently the default is 512MB.
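One possible way to approach the default-memory question is to size the default from observed usage. The sketch below is an illustration only: the function name, the 20% headroom and the 512 MB rounding step are arbitrary assumptions, not Maui or resource-manager settings, and the input values are simply the two figures from the warnings above (presumed to be MB).

```python
import math

def suggested_default_mb(observed_mb, headroom=1.2, step_mb=512):
    """Largest observed usage plus headroom, rounded up to a multiple of step_mb.

    headroom and step_mb are illustrative assumptions, not recommended values.
    """
    peak = max(observed_mb)
    # math.ceil rounds the number of steps up to the next whole step.
    return math.ceil(peak * headroom / step_mb) * step_mb

# Utilized memory reported in the two warnings above.
observed = [3432, 3595]
default_mb = suggested_default_mb(observed)
print(default_mb)
```

In a TORQUE/PBS setup, a default like this typically lives in the queue's or server's `resources_default.mem` attribute, though nothing in this post confirms which setting produces the 512MB value here.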
On Jan 29, 2008 10:36 PM, Jan Ploski <Jan.Ploski at offis.de> wrote:
>
>
>
> Can you also report the output of checkjob and diagnose -j on these 2
> jobs? Do they also have the MEM requirement?
>
> > About the MEM requirement: do you mean to unset it too? Other than that,
> > we don't use any MEM requirement in our qsub script.
>
> Well, it must be coming from somewhere, quite possibly from a default in
> the queue or server configuration. So I'd try unsetting it there.
> However, looking at the diagnose -n output above makes me think it is
> processor related - judging from the 0:4, for some unknown reason your
> jobs consume 2 processors each rather than 1.
>
> Regards,
> Jan Ploski
>
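The arithmetic behind the "2 processors each" remark can be sketched as follows. This assumes the 0:4 pair in the diagnose -n output reads as available:configured processors, and that the two jobs shown are the only ones on node28. The helper name `procs_per_job` is hypothetical.

```python
def procs_per_job(available, configured, running_jobs):
    """Average processors consumed per job on a node, or None with no jobs."""
    if running_jobs == 0:
        return None
    return (configured - available) / running_jobs

# "0:4" as quoted in the thread, read as available:configured (assumption).
available, configured = (int(x) for x in "0:4".split(":"))

# Two jobs (228620 and 228621) are reported as running on node28.
per_job = procs_per_job(available, configured, running_jobs=2)
print(per_job)
```

With one processor dedicated per job, the node would be expected to show two processors still available, which is why a value of 0 available stands out.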