[torquedev] read_nonblocking_socket() wtf?
garrick at usc.edu
Fri Jul 16 14:06:38 MDT 2010
I've been looking into a problem regarding maui sometimes hanging in a read()
on its socket to pbs_server. The hangs happen in pbs_disconnect() after a
normal timeout. I thought this was weird because we define read() to be
read_nonblocking_socket(), which is a nice little 30-second loop around a
non-blocking read().
The define to read_nonblocking_socket() replaces a blocking read wrapped with
an ALRM of pbs_tcp_timeout seconds.
So why would maui hang on a non-blocking read()? Is there something broken in
my kernel? What a mystery!
It turns out that read_nonblocking_socket does the exact opposite of what it
says because the fcntl() call is commented out! WTF? A neat little ALRM-wrapped
read() call is replaced with a broken hard-wired implementation.
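
To make the complaint concrete, here is a rough sketch of what I would expect a function with that name to do. This is not the code in the tree: read_nonblocking_sketch() is a hypothetical name and the timeout is passed in rather than hard-wired. The key part is the fcntl() call that actually sets O_NONBLOCK; without it, the read() inside the loop is an ordinary blocking read and the loop never gets a chance to time out.

```c
#include <assert.h>
#include <errno.h>
#include <fcntl.h>
#include <poll.h>
#include <time.h>
#include <unistd.h>

/*
 * Hypothetical sketch (not TORQUE's implementation): read from fd without
 * ever blocking in read() itself, giving up after timeout_secs.
 * Returns the byte count (0 on EOF), or -1 with errno set; on timeout
 * errno is ETIMEDOUT.
 */
ssize_t read_nonblocking_sketch(int fd, void *buf, size_t len, int timeout_secs)
{
    int flags;
    time_t deadline;

    assert(buf != NULL && timeout_secs > 0);

    /* The step that must not be skipped: put the descriptor in
     * non-blocking mode so read() returns EAGAIN instead of hanging. */
    flags = fcntl(fd, F_GETFL, 0);
    if (flags < 0)
        return -1;
    if (fcntl(fd, F_SETFL, flags | O_NONBLOCK) < 0)
        return -1;

    /* Coarse wall-clock deadline with one-second granularity. */
    deadline = time(NULL) + timeout_secs;

    for (;;) {
        struct pollfd pfd;
        time_t now;
        ssize_t n = read(fd, buf, len);

        if (n >= 0)
            return n;                       /* data or EOF */
        if (errno != EAGAIN && errno != EWOULDBLOCK && errno != EINTR)
            return -1;                      /* real error */

        now = time(NULL);
        if (now >= deadline) {
            errno = ETIMEDOUT;
            return -1;
        }

        /* Sleep until the fd is readable or the remaining time runs out,
         * rather than spinning on read(). */
        pfd.fd = fd;
        pfd.events = POLLIN;
        pfd.revents = 0;
        if (poll(&pfd, 1, (int)((deadline - now) * 1000)) < 0 && errno != EINTR)
            return -1;
    }
}
```

Note that the sketch leaves O_NONBLOCK set on the descriptor when it returns; whether that is acceptable depends on what else uses the socket.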
I'm on 2.1.x. Is this all fixed up in later branches?
Garrick Staples, GNU/Linux HPCC SysAdmin
University of Southern California
Life is Good!