Hi,<br>Thank you very much for your reply and for looking into my problem.<br>I tried to debug my code with idb and got the error below. I am not sure whether my debugging command is correct.<br>First I compiled the code with:<br>
<pre>mpif90 -g test.f -o test.exe</pre>Then I tried to debug it:<br>
<pre>/opt/mpich/intel/bin/mpirun -dbg=idb -np 6 test.exe</pre>This is the output. Can you see what I did wrong?<br>
<pre>[velan@galaxy debug]$ /opt/mpich/intel/bin/mpirun -dbg=idb -np 6 test.exe<br>out.e<br>Intel(R) Debugger for IA-32 -based Applications, Version 9.0-16, Build 20051121<br>Reading symbolic information from /home/velan/debug/test.exe...done<br>Evaluating '::MPIR_proctable[0]' failed!<br>The value (166516824) is not an array or pointer!<br>Fatal error: Can't find process information.<br>p5_21896: p4_error: net_recv read: probable EOF on socket: 1<br>p4_17944: p4_error: interrupt SIGx: 13<br>[velan@galaxy debug]$ p2_21089: p4_error: interrupt SIGx: 13<br>rm_l_5_21913: (0.055103) net_send: could not write to fd=5, errno = 32<br>p5_21896: (0.058996) net_send: could not write to fd=5, errno = 32<br>p4_17944: (10.280392) net_send: could not write to fd=5, errno = 32<br><br>p2_21089: (11.816449) net_send: could not write to fd=5, errno = 32</pre>I also tried to run it in parallel mode and got the following error:<br>
<pre>/opt/mpich/intel/bin/mpirun -parallel -dbg=idb -np 4 test.exe<br>Unrecognized argument -parallel ignored.<br>Intel(R) Debugger for IA-32 -based Applications, Version 9.0-16, Build 20051121<br>Reading symbolic information from /home/velan/debug/test.exe...done<br>Evaluating '::MPIR_proctable[0]' failed!<br>The value (166514304) is not an array or pointer!<br>Fatal error: Can't find process information.</pre>Can you help me fix this problem?<br>Thanks,<br>Velan<br><br><b><i>Rajeev Thakur <thakur@mcs.anl.gov></i></b> wrote:<blockquote class="replbq" style="border-left: 2px solid rgb(16, 16, 255); margin-left:
5px; padding-left: 5px;"> <meta http-equiv="Content-Type" content="text/html; charset=us-ascii"> <meta content="MSHTML 6.00.2900.2963" name="GENERATOR"> <div dir="ltr" align="left"><span class="060354616-11102006"><font color="#0000ff" face="Arial" size="2">The error message says that one of the requests passed to MPI_Waitall is invalid. It is hard to tell further what may be the problem, but it's likely a bug in your code.</font></span></div> <div dir="ltr" align="left"><span class="060354616-11102006"><font color="#0000ff" face="Arial" size="2"></font></span> </div> <div dir="ltr" align="left"><span class="060354616-11102006"><font color="#0000ff" face="Arial" size="2">Rajeev</font></span></div><br> <blockquote style="border-left: 2px solid rgb(0, 0, 255); padding-left: 5px; margin-left: 5px; margin-right: 0px;"> <div class="OutlookMessageHeader" dir="ltr" align="left" lang="en-us"> <hr tabindex="-1"> <font face="Tahoma" size="2"><b>From:</b> Vadivelan
Ranjith [mailto:achillesvelan@yahoo.co.in] <br><b>Sent:</b> Wednesday, October 11, 2006 8:09 AM<br><b>To:</b> Rajeev Thakur<br><b>Cc:</b> mpi-maint@mcs.anl.gov<br><b>Subject:</b> RE: [MPI #11007] p4_error: latest msg from perror: Bad file descriptor<br></font><br></div> <div></div>Hi,<br>Thanks for your reply. I get the following error when I use MPICH2 1.0.3.<br>This machine was installed manually (not with Rocks). The name of the compute node in the job.o file changes on every run.<br>For example:<br>..."rank 5 in job 1 node07.cluster2.iitb.ac.in_32814"<br>..."rank 5 in job 1 node08.cluster2.iitb.ac.in_32776"<br>..."rank 5 in job 1 node09.cluster2.iitb.ac.in_32817"<br>I don't know how to fix the problem. If anybody knows, please help me.<br><br>Velan<br>----------------------------------------------------------------------------------------------<br>job.e<br>----------------------------------------------------------------------------------------------<br>velan@galaxy:~/3DSIM$ cat job.e10340<br>[cli_5]: aborting job:<br>Fatal error in MPI_Waitall: Invalid MPI_Request, error stack:<br>MPI_Waitall(241): MPI_Waitall(count=250, req_array=0x9f7ede0, status_array=0x9f4e820) failed<br>MPI_Waitall(109): Invalid MPI_Request<br>----------------------------------------------------------------------------------------------<br>job.o<br>----------------------------------------------------------------------------------------------<br>velan@galaxy:~/3DSIM$ cat job.o10340<br># Allocating 5 nodes to block 1<br># Allocating 1 nodes to block 2<br># Require mxb >= 97<br># Require mxa >= 26 mya >= 97 and mza
>= 75<br># Maximum load imbalance = 71.69%<br>#<br># Navier-Stokes Simulation<br># Implicit Full Matrix DP-LUR<br><br>rank 5 in job 1 node09.cluster2.iitb.ac.in_32776 caused collective abort of all ranks exit status of rank 5: killed by signal 9<br>----------------------------------------------------------------------------------------------<br><br><br><b><i>Rajeev Thakur <thakur@mcs.anl.gov></i></b> wrote: <blockquote class="replbq" style="border-left: 2px solid rgb(16, 16, 255); padding-left: 5px; margin-left: 5px;"> <meta content="MSHTML 6.00.2900.2963" name="GENERATOR"> <div dir="ltr" align="left"><span class="983182416-10102006"><font color="#0000ff" face="Arial" size="2">It is hard to tell, but it is possible that there might be some bug in your code. You can try using MPICH2 and see if you get any better error message.</font></span></div> <div dir="ltr" align="left"><span
class="983182416-10102006"><font color="#0000ff" face="Arial" size="2"></font></span> </div> <div dir="ltr" align="left"><span class="983182416-10102006"><font color="#0000ff" face="Arial" size="2">Rajeev</font></span></div><br> <blockquote style="border-left: 2px solid rgb(0, 0, 255); padding-left: 5px; margin-left: 5px; margin-right: 0px;"> <div class="OutlookMessageHeader" dir="ltr" align="left" lang="en-us"> <hr tabindex="-1"> <font face="Tahoma" size="2"><b>From:</b> Vadivelan Ranjith [mailto:achillesvelan@yahoo.co.in] <br><b>Sent:</b> Tuesday, October 10, 2006 7:56 AM<br><b>To:</b> mpi-maint@mcs.anl.gov<br><b>Cc:</b> mpi-maint@mcs.anl.gov<br><b>Subject:</b> [MPI #11007] p4_error: latest msg from perror: Bad file descriptor<br></font><br></div> <div></div>
<pre><tt><tt><tt><tt>Hi,<br>Thank you all for your help.<br>Today I got an error message when submitting a job. First I ran the code using the<br>explicit method. The results were accurate, and no problem occurred when I submitted<br>the job. Then I changed my code to the implicit method, and now I get an error when I<br>submit the job.<br>I checked that it reads all the files and the iteration starts. After one iteration it<br>gives the following error. The same code runs on another machine and gives correct<br>results. Please help me fix it.<br><br>Thanks in advance,<br>Velan<br><br>----------------------------------------------------------------<br>job.e file:<br> <br> p4_error: latest msg from perror: Bad file descriptor<br> p4_error: latest msg from perror: Bad file descriptor<br> p4_error: latest msg from perror: Bad file descriptor<br> p4_error: latest msg from perror: Bad file descriptor<br>-----------------------------------------------------------------<br>job.o file:<br>3<br>node18.local<br>node19.local<br>node17.local<br># Allocating 5 nodes to block 1<br># Allocating 1 nodes to block 2<br># Require mxb >= 97<br># Require mxa >= 26 mya >= 97 and mza >= 75<br># Maximum load imbalance = 71.69%<br># Navier-Stokes Simulation<br># Implicit Full Matrix DP-LUR<br># Reading restart files...( 0.34 seconds)<br># Freestream Mach Number = 6.50<br><br> 1 0.3670E+01 0.7803E+05 16 15 7 2 0.1222E-08<br>p5_2609: p4_error: interrupt SIGx: 13 <br>bm_list_17559: (3.666982) wakeup_slave: unable to interrupt slave 0 pid 17542<br>rm_l_1_18696: (2.738297) net_send: could not write to fd=6, errno = 9<br>rm_l_1_18696: p4_error: net_send write: -1<br>rm_l_2_2605: (2.614927) net_send: could not write to fd=6, errno = 9<br>rm_l_4_18718: (2.373120) net_send: could not write to fd=6, errno = 9<br>rm_l_4_18718: p4_error: net_send write: -1<br>rm_l_2_2605: p4_error: net_send write: -1<br>rm_l_3_17584: (2.496277) net_send: could not write to fd=6, errno = 9<br>rm_l_3_17584: p4_error: net_send write: -1<br>rm_l_5_2626: (2.249144) net_send: could not write to fd=5, errno = 32<br>p5_2609: (2.251356) net_send: could not write to fd=5,errno = 32<br>-------------------------------------------------------------------<br>job file:<br>#!/bin/bash<br>#PBS -l nodes=3:ppn=1<br><br>cd $PBS_O_WORKDIR<br>n=`/usr/local/bin/pbs.py $PBS_NODEFILE hosts`<br>echo $n<br>cat hosts<br>/opt/mpich/intel/bin/mpirun -nolocal -machinefile hosts -np 6 pg3d.exe<br>-------------------------------------------------------------------<br>Machine configuration:<br> CPU: Intel(R) Dual Processor Xeon(R) CPU 3.2GHz<br>Installation using rocks4.1</tt></tt></tt></tt></pre> <div></div> <hr size="1"> Find out what India is talking about on - <a href="http://us.rd.yahoo.com/mail/in/yanswers/*http://in.answers.yahoo.com/">Yahoo! Answers India</a> <br>Send FREE SMS to your friend's mobile from Yahoo! Messenger Version 8. <a href="http://us.rd.yahoo.com/mail/in/messengertagline/*http://in.messenger.yahoo.com">Get it
NOW</a></blockquote></blockquote><br> <div> </div></blockquote></blockquote><br><p> 
        
        
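<p>The reply quoted above traces the MPICH2 failure to an invalid request passed to MPI_Waitall. The solver source is not part of this thread, so what follows is only a minimal sketch under an assumption: that the request array holds entries which no nonblocking call ever filled, or that the count passed is larger than the number of requests actually posted. All names in it (req, nreq, maxreq, the buffers, the 1-D neighbour layout) are hypothetical and are not taken from the original code. It is written in fixed-form Fortran because the code in the thread is compiled from test.f.</p>
<pre>
      program waitall_sketch
      implicit none
      include 'mpif.h'
      integer maxreq, n
      parameter (maxreq = 4, n = 8)
      integer req(maxreq), stats(MPI_STATUS_SIZE, maxreq)
      integer ierr, rank, nprocs, left, right, nreq, i
      double precision sl(n), sr(n), rl(n), rr(n)

      call MPI_INIT(ierr)
      call MPI_COMM_RANK(MPI_COMM_WORLD, rank, ierr)
      call MPI_COMM_SIZE(MPI_COMM_WORLD, nprocs, ierr)

c     Set every slot to MPI_REQUEST_NULL first, so an unused entry
c     can never reach MPI_WAITALL holding an undefined value.
      do i = 1, maxreq
         req(i) = MPI_REQUEST_NULL
      end do
      do i = 1, n
         sl(i) = dble(rank)
         sr(i) = dble(rank)
      end do

c     Hypothetical non-periodic 1-D layout: the end ranks have no
c     neighbour on one side, which MPI_PROC_NULL expresses safely.
      left = rank - 1
      right = rank + 1
      if (left .lt. 0) left = MPI_PROC_NULL
      if (right .ge. nprocs) right = MPI_PROC_NULL

c     Count requests as they are posted instead of hard-coding the
c     count that is later handed to MPI_WAITALL.
      nreq = 0
      nreq = nreq + 1
      call MPI_IRECV(rl, n, MPI_DOUBLE_PRECISION, left, 0,
     +               MPI_COMM_WORLD, req(nreq), ierr)
      nreq = nreq + 1
      call MPI_IRECV(rr, n, MPI_DOUBLE_PRECISION, right, 1,
     +               MPI_COMM_WORLD, req(nreq), ierr)
      nreq = nreq + 1
      call MPI_ISEND(sr, n, MPI_DOUBLE_PRECISION, right, 0,
     +               MPI_COMM_WORLD, req(nreq), ierr)
      nreq = nreq + 1
      call MPI_ISEND(sl, n, MPI_DOUBLE_PRECISION, left, 1,
     +               MPI_COMM_WORLD, req(nreq), ierr)

c     Wait only on the nreq requests that were really posted.
      call MPI_WAITALL(nreq, req, stats, ierr)

      if (rank .eq. 0) print *, 'all requests completed'
      call MPI_FINALIZE(ierr)
      end
</pre>
<p>If the real code passes a fixed count, such as the count=250 shown in the error stack, while fewer requests were posted in that iteration, the unfilled entries of the request array would be the first thing to check. The sketch can be built and run the same way as in the thread, for example as mpif90 -g sketch.f -o sketch.exe followed by mpirun -np 4 sketch.exe, where sketch.f is a hypothetical file name.</p>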