[Moabusers] Job error: interpretation needed

Caird, Andrew J acaird at umich.edu
Wed Jun 25 08:54:49 MDT 2008


It looks like the job was killed by SIGTERM (signal 15). Did it run out of walltime, or did Torque (or whatever resource manager you're using) kill it?
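
If it helps to decode the numbers by hand, here is a rough Python sketch that maps the "SIGx" and "errno" values from the output quoted below to their symbolic names. The sample lines are copied from that output; the `decode` helper and the regular expressions are my own, written for illustration, and the errno mapping assumes a Linux host.

```python
import errno
import re
import signal

# Sample lines copied from the quoted job output (wrapped lines rejoined).
LOG = """\
p1_2832:  p4_error: interrupt SIGx: 15
rm_l_2_2836: (134.453125) net_send: could not write to fd=7, errno = 32
p8_2846:  p4_error: interrupt SIGx: 13
"""

SIG_RE = re.compile(r"interrupt SIGx: (\d+)")
ERRNO_RE = re.compile(r"errno = (\d+)")


def decode(line):
    """Return the symbolic name of the signal or errno in a log line, or None."""
    match = SIG_RE.search(line)
    if match:
        return signal.Signals(int(match.group(1))).name
    match = ERRNO_RE.search(line)
    if match:
        # errno numbering is platform specific; this lookup uses the local table.
        return errno.errorcode.get(int(match.group(1)), "unknown")
    return None


lines = LOG.splitlines()
decoded = [decode(line) for line in lines]
for line, name in zip(lines, decoded):
    print(f"{name}  <-  {line}")
```

On a Linux box this should report SIGTERM for 15, SIGPIPE for 13, and EPIPE for errno 32.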

It also looks like one task (probably the rank 0 task) was killed and the other ranks died when they couldn't write back to rank 0.
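
To show what "couldn't write back" looks like at the system-call level, here is a small sketch that reproduces the same error number. It uses an ordinary pipe as a stand-in for the connection between ranks (the real p4 device talks over sockets, so treat this as an analogy, not a reproduction of the MPI code): once the reading end is gone, a write fails with EPIPE.

```python
import errno
import os
import signal

# Ignore SIGPIPE so the failed write is reported as an error code instead of
# terminating the process. CPython normally does this at startup already;
# it is repeated here to make the behaviour explicit.
signal.signal(signal.SIGPIPE, signal.SIG_IGN)

read_fd, write_fd = os.pipe()
os.close(read_fd)  # the receiving side (think "rank 0") goes away

failure = None
try:
    os.write(write_fd, b"result")  # the surviving side tries to report back
except BrokenPipeError as exc:
    failure = exc.errno
finally:
    os.close(write_fd)

print(failure, errno.errorcode[failure])
```

A process that does not ignore SIGPIPE receives signal 13 at the same point, which would be consistent with the "SIGx: 13" lines from p7 and p8.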

> -----Original Message-----
> From: moabusers-bounces at supercluster.org [mailto:moabusers-bounces at supercluster.org] On Behalf Of Gelonia L Dent
> Sent: Tuesday, June 24, 2008 4:44 PM
> To: moabusers at supercluster.org
> Subject: [Moabusers] Job error: interpretation needed
>
> Can someone tell me what these error messages mean? I have a user who
> submitted a job, and the following errors were produced with his output.
>
> > p1_2832:  p4_error: interrupt SIGx: 15
> > p2_2833:  p4_error: interrupt SIGx: 15
> > rm_l_2_2836:  p4_error: interrupt SIGx: 15
> > rm_l_2_2836: (134.453125) net_send: could not write to fd=7, errno = 32
> > rm_l_3_2838:  p4_error: interrupt SIGx: 15
> > rm_l_3_2838: (134.453125) net_send: could not write to fd=8, errno = 32
> > bm_list_2831:  p4_error: interrupt SIGx: 15
> > p3_2835:  p4_error: interrupt SIGx: 15
> > p0_2803:  p4_error: interrupt SIGx: 15
> > p1_2832: (134.511719) net_send: could not write to fd=6, errno = 32
> > p2_2833: (134.511719) net_send: could not write to fd=7, errno = 32
> > p8_2846:  p4_error: interrupt SIGx: 13
> > p7_2844:  p4_error: interrupt SIGx: 13
>
> Thanks
> --
> Gelonia Dent, PhD
> Manager of Scientific Computing
> Invertebrate Zoology
> The American Museum of Natural History
> (212) 313-7911
>
>
>
> _______________________________________________
> moabusers mailing list
> moabusers at supercluster.org
> http://www.supercluster.org/mailman/listinfo/moabusers

