[torquedev] [Bug 212] New: server spins on select() with expired sockets

bugzilla-daemon at supercluster.org
Mon Aug 6 04:55:37 MDT 2012


http://www.clusterresources.com/bugzilla/show_bug.cgi?id=212

           Summary: server spins on select() with expired sockets
           Product: TORQUE
           Version: 4.0.*
          Platform: PC
        OS/Version: Linux
            Status: NEW
          Severity: major
          Priority: P5
         Component: pbs_server
        AssignedTo: dbeer at adaptivecomputing.com
        ReportedBy: viktor.stujber at stuba.sk
                CC: torquedev at supercluster.org
   Estimated Hours: 0.0


Our TORQUE 4.1.0 server often goes into a CPU-consuming loop. Here is the
information I have gathered so far.

> strace -p 28590
select(1024, [8 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31
33 34 35 36 37 38 39 41 43 44], NULL, NULL, {5, 0}) = 31 (in [8 12 13 14 15 16
17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 33 34 35 36 37 38 39 41 43 44],
left {4, 999992})
nanosleep({0, 100000}, NULL)            = 0
tgkill(28590, 28592, SIG_0)             = 0
tgkill(28590, 28593, SIG_0)             = 0
<repeat infinitely>

(gdb) p svr_conn[8]
$34 = {cn_addr = 2477722413, cn_handle = 0, cn_port = 15002, cn_authen = 1,
cn_socktype = 2, cn_active = ToServerDIS, cn_lasttime = 1344247871, cn_func =
0, cn_oncl = 0, cn_mutex = 0x2a12ce0, cn_stay_open = 0}

I shut down all the connected clients, and all of the socket descriptors listed
above are now in the CLOSE_WAIT state. The select() call signals activity on
all 31 sockets and returns immediately, but all of them have cn_active set to
'ToServerDIS' (not 'Idle') and none of them has a cn_func assigned, so
wait_request() in lib/Libnet/net_server.c handles none of them and just keeps
spinning over all 10240 blank connection slots with no sleep, causing
significant CPU usage.
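
For illustration only, here is a minimal sketch of one way a select() loop can
notice sockets in this state and stop watching them. It is not the TORQUE code
and not a proposed patch: peer_has_closed() and reap_closed_sockets() are
hypothetical names, and the sketch assumes Linux (MSG_DONTWAIT). A socket in
CLOSE_WAIT with no pending data is reported readable by select(), and a
recv() on it returns 0, so peeking one byte is enough to tell "peer closed"
apart from "data waiting".

```c
#include <errno.h>
#include <sys/select.h>
#include <sys/socket.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical helper: returns 1 if the peer has closed the connection
 * (EOF or a hard error), 0 if the socket is still usable. MSG_PEEK leaves
 * any pending data in the receive queue for the real reader. */
static int peer_has_closed(int fd)
{
    char probe;
    ssize_t n = recv(fd, &probe, 1, MSG_PEEK | MSG_DONTWAIT);

    if (n == 0)
        return 1;   /* EOF: peer sent FIN, local side sits in CLOSE_WAIT */
    if (n < 0 && errno != EAGAIN && errno != EWOULDBLOCK && errno != EINTR)
        return 1;   /* hard error, e.g. ECONNRESET */
    return 0;       /* data pending, or nothing to read yet */
}

/* Hypothetical helper: for every descriptor that select() marked ready,
 * close it and drop it from the watched set if the peer is gone.
 * Returns the number of descriptors closed. */
static int reap_closed_sockets(const fd_set *ready, fd_set *watched, int maxfd)
{
    int closed = 0;

    for (int fd = 0; fd <= maxfd; fd++) {
        if (!FD_ISSET(fd, ready))
            continue;
        if (peer_has_closed(fd)) {
            FD_CLR(fd, watched);   /* stop select() from reporting it again */
            close(fd);
            closed++;
        }
    }
    return closed;
}
```

In a loop like the one traced above, such a check would run on descriptors
that select() reports ready but that have no handler assigned, so they leave
the read set instead of waking the loop on every pass.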
