Jörg,<div><br></div><div>I was able to reproduce and fix the first deadlock with the following check-ins:</div><div><br></div><div><div>8d51275..ef452c8 4.1-dev -> 4.1-dev</div><div>4b90e69..aa5f8d5 master -> master</div>
</div><div><br></div><div>Can you provide more details on how to reproduce the second one? </div><div><br></div><div>David<br><br><div class="gmail_quote">On Thu, Feb 7, 2013 at 2:40 PM, Joerg Blank <span dir="ltr"><<a href="mailto:j.blank@fz-juelich.de" target="_blank">j.blank@fz-juelich.de</a>></span> wrote:<br>
<blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex">Hello,<br>
<br>
I found another deadlock, this time when a job gets deleted. I was not<br>
able to pinpoint the offending lock.<br>
<div class="im"><br>
Regards,<br>
Jörg Blank<br>
<br>
<br>
(gdb) info threads<br>
</div> 19 Thread 13950 0x00007f3ef94cdc5d in nanosleep () at<br>
../sysdeps/unix/syscall-template.S:82<br>
18 Thread 14161 __lll_lock_wait () at<br>
../nptl/sysdeps/unix/sysv/linux/x86_64/lowlevellock.S:136<br>
17 Thread 14160 __lll_lock_wait () at<br>
../nptl/sysdeps/unix/sysv/linux/x86_64/lowlevellock.S:136<br>
16 Thread 14159 __lll_lock_wait () at<br>
../nptl/sysdeps/unix/sysv/linux/x86_64/lowlevellock.S:136<br>
15 Thread 14158 __lll_lock_wait () at<br>
../nptl/sysdeps/unix/sysv/linux/x86_64/lowlevellock.S:136<br>
14 Thread 14157 __lll_lock_wait () at<br>
../nptl/sysdeps/unix/sysv/linux/x86_64/lowlevellock.S:136<br>
13 Thread 14156 __lll_lock_wait () at<br>
../nptl/sysdeps/unix/sysv/linux/x86_64/lowlevellock.S:136<br>
12 Thread 14155 __lll_lock_wait () at<br>
../nptl/sysdeps/unix/sysv/linux/x86_64/lowlevellock.S:136<br>
11 Thread 14154 __lll_lock_wait () at<br>
../nptl/sysdeps/unix/sysv/linux/x86_64/lowlevellock.S:136<br>
10 Thread 14153 __lll_lock_wait () at<br>
../nptl/sysdeps/unix/sysv/linux/x86_64/lowlevellock.S:136<br>
9 Thread 14152 __lll_lock_wait () at<br>
../nptl/sysdeps/unix/sysv/linux/x86_64/lowlevellock.S:136<br>
8 Thread 14151 __lll_lock_wait () at<br>
../nptl/sysdeps/unix/sysv/linux/x86_64/lowlevellock.S:136<br>
7 Thread 14150 __lll_lock_wait () at<br>
../nptl/sysdeps/unix/sysv/linux/x86_64/lowlevellock.S:136<br>
6 Thread 14149 __lll_lock_wait () at<br>
../nptl/sysdeps/unix/sysv/linux/x86_64/lowlevellock.S:136<br>
5 Thread 14148 __lll_lock_wait () at<br>
../nptl/sysdeps/unix/sysv/linux/x86_64/lowlevellock.S:136<br>
4 Thread 14145 0x00007f3ef94cdc5d in nanosleep () at<br>
../sysdeps/unix/syscall-template.S:82<br>
3 Thread 14144 0x00007f3ef94cdc5d in nanosleep () at<br>
../sysdeps/unix/syscall-template.S:82<br>
2 Thread 14143 0x00007f3ef99a538d in accept () at<br>
../sysdeps/unix/syscall-template.S:82<br>
* 1 Thread 14142 __lll_lock_wait () at<br>
../nptl/sysdeps/unix/sysv/linux/x86_64/lowlevellock.S:136<br>
<br>
(gdb) thread 17<br>
[Switching to thread 17 (Thread 14160)]#0 __lll_lock_wait () at<br>
<div class="im">../nptl/sysdeps/unix/sysv/linux/x86_64/lowlevellock.S:136<br>
136 in ../nptl/sysdeps/unix/sysv/linux/x86_64/lowlevellock.S<br>
(gdb) bt<br>
#0 __lll_lock_wait () at<br>
../nptl/sysdeps/unix/sysv/linux/x86_64/lowlevellock.S:136<br>
</div>#1 0x00007f3ef99a0179 in _L_lock_953 () from /lib/libpthread.so.0<br>
#2 0x00007f3ef999ff9b in __pthread_mutex_lock (mutex=0x5729cb0) at<br>
pthread_mutex_lock.c:61<br>
#3 0x0000000000448760 in lock_ji_mutex (pjob=0x5731100, id=Unhandled<br>
dwarf expression opcode 0xf3<br>
) at svr_jobfunc.c:2863<br>
#4 0x0000000000411370 in remove_job (aj=0xa972c0, pjob=0x5731100) at<br>
job_func.c:2562<br>
#5 0x0000000000449aac in svr_dequejob (pjob=0x5731100,<br>
parent_queue_mutex_held=0) at svr_jobfunc.c:758<br>
#6 0x0000000000411e17 in svr_job_purge (pjob=0x5731100) at job_func.c:1776<br>
#7 0x000000000042c5c8 in handle_complete_second_time (ptask=Unhandled<br>
dwarf expression opcode 0xf3<br>
) at req_jobobit.c:1800<br>
#8 0x000000000045acd2 in work_thread (a=0x7fff14ff7180) at<br>
u_threadpool.c:307<br>
#9 0x00007f3ef999d8ca in start_thread (arg=<value optimized out>) at<br>
pthread_create.c:300<br>
#10 0x00007f3ef94fcb6d in clone () at<br>
../sysdeps/unix/sysv/linux/x86_64/clone.S:112<br>
#11 0x0000000000000000 in ?? ()<br>
(gdb) print *(pthread_mutex_t*)0x5729cb0<br>
$4 = {__data = {__lock = 2, __count = 0, __owner = 14154, __nusers = 0,<br>
<div class="im">__kind = 0, __spins = 0, __list = {__prev = 0x0, __next = 0x0}},<br>
</div> __size = "\002\000\000\000\000\000\000\000J7", '\000' <repeats 29<br>
<div class="im">times>, __align = 2}<br>
<br>
</div>(gdb) thread 11<br>
[Switching to thread 11 (Thread 14154)]#0 __lll_lock_wait () at<br>
<div class="im">../nptl/sysdeps/unix/sysv/linux/x86_64/lowlevellock.S:136<br>
136 in ../nptl/sysdeps/unix/sysv/linux/x86_64/lowlevellock.S<br>
(gdb) bt<br>
#0 __lll_lock_wait () at<br>
../nptl/sysdeps/unix/sysv/linux/x86_64/lowlevellock.S:136<br>
</div>#1 0x00007f3ef99a0179 in _L_lock_953 () from /lib/libpthread.so.0<br>
#2 0x00007f3ef999ff9b in __pthread_mutex_lock (mutex=0x51cde90) at<br>
pthread_mutex_lock.c:61<br>
#3 0x000000000044a864 in lock_alljobs_mutex (aj=0xa972c0, id=Unhandled<br>
dwarf expression opcode 0xf3<br>
) at svr_jobfunc.c:3017<br>
#4 0x0000000000410aee in find_job_by_array (aj=0xa972c0,<br>
job_id=0x7f3ee41f99a0 "30979[34].glorim-1.cluster", get_subjob=1) at<br>
job_func.c:2140<br>
#5 0x0000000000410e16 in svr_find_job (jobid=0x7f3ee41f99a0<br>
"30979[34].glorim-1.cluster", get_subjob=1) at job_func.c:2245<br>
#6 0x000000000042c53a in handle_complete_second_time<br>
(ptask=0x7f3ee40361d0) at req_jobobit.c:1765<br>
#7 0x000000000045acd2 in work_thread (a=0x7fff14ff7180) at<br>
u_threadpool.c:307<br>
#8 0x00007f3ef999d8ca in start_thread (arg=<value optimized out>) at<br>
pthread_create.c:300<br>
#9 0x00007f3ef94fcb6d in clone () at<br>
../sysdeps/unix/sysv/linux/x86_64/clone.S:112<br>
#10 0x0000000000000000 in ?? ()<br>
(gdb) print *(pthread_mutex_t*)0x51cde90<br>
$5 = {__data = {__lock = 2, __count = 0, __owner = 14160, __nusers = 1,<br>
<div class="im">__kind = 0, __spins = 0, __list = {__prev = 0x0, __next = 0x0}},<br>
</div> __size = "\002\000\000\000\000\000\000\000P7\000\000\001", '\000'<br>
<div class="im HOEnZb"><repeats 26 times>, __align = 2}<br>
<br>
<br>
</div><div class="HOEnZb"><div class="h5">
</div></div></blockquote></div><br><br clear="all"><div><br></div>-- <br><div>David Beer | Senior Software Engineer</div><div>Adaptive Computing</div>
</div>
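Reading the two backtraces together: thread 14160 owns the mutex at 0x51cde90 (the one `lock_alljobs_mutex` is waiting on) and blocks in `lock_ji_mutex`, while thread 14154 owns the mutex at 0x5729cb0 and blocks in `lock_alljobs_mutex`. That looks like a lock-order inversion between a job mutex and the all-jobs mutex. The following is only a minimal sketch of one common way to break such a cycle (try-lock with back-off on the path that takes the locks in the reverse order); it is not the actual TORQUE fix. The names `alljobs_mutex`, `job_mutex`, `purge_path`, `lookup_path` and `run_demo` are hypothetical stand-ins, not functions from the TORQUE source.

```c
#include <assert.h>
#include <pthread.h>
#include <sched.h>

#define ITERATIONS 10000

/* Hypothetical stand-ins for the two mutexes seen in the backtraces. */
static pthread_mutex_t alljobs_mutex = PTHREAD_MUTEX_INITIALIZER;
static pthread_mutex_t job_mutex     = PTHREAD_MUTEX_INITIALIZER;

/* Only modified while both mutexes are held. */
static int completed = 0;

/* Path A (like thread 14160): container lock first, then the job lock. */
static void *purge_path(void *unused)
{
    (void)unused;
    for (int i = 0; i < ITERATIONS; i++) {
        pthread_mutex_lock(&alljobs_mutex);
        pthread_mutex_lock(&job_mutex);
        completed++;
        pthread_mutex_unlock(&job_mutex);
        pthread_mutex_unlock(&alljobs_mutex);
    }
    return NULL;
}

/* Path B (like thread 14154): already holds the job lock and then needs
 * the container lock. Instead of blocking, it tries the lock and, on
 * failure, releases the job lock so that path A can make progress. */
static void *lookup_path(void *unused)
{
    (void)unused;
    for (int i = 0; i < ITERATIONS; i++) {
        pthread_mutex_lock(&job_mutex);
        while (pthread_mutex_trylock(&alljobs_mutex) != 0) {
            pthread_mutex_unlock(&job_mutex);  /* back off */
            sched_yield();
            pthread_mutex_lock(&job_mutex);    /* retry from the start */
        }
        completed++;
        pthread_mutex_unlock(&alljobs_mutex);
        pthread_mutex_unlock(&job_mutex);
    }
    return NULL;
}

/* Runs both paths concurrently and returns the number of completed
 * critical sections. With a blocking lock in lookup_path this could
 * hang in the same way as the trace above. */
int run_demo(void)
{
    pthread_t a, b;
    int rc;

    completed = 0;
    rc = pthread_create(&a, NULL, purge_path, NULL);
    assert(rc == 0);
    rc = pthread_create(&b, NULL, lookup_path, NULL);
    assert(rc == 0);
    (void)rc;

    pthread_join(a, NULL);
    pthread_join(b, NULL);
    return completed;
}
```

Calling `run_demo()` from a `main` and building with `-pthread` should return once both threads finish. In real code, any state read under the job lock before the back-off would have to be re-validated after re-acquiring it, since the job may have changed or been deleted in between.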