<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.0 Transitional//EN">
<HTML><HEAD><TITLE>Fluent Infiniband jobs fail, only in PBS</TITLE>
<META http-equiv=Content-Type content="text/html; charset=us-ascii">
<META content="MSHTML 6.00.2900.3243" name=GENERATOR></HEAD>
<BODY>
<DIV dir=ltr align=left><SPAN class=711154615-29022008><FONT face=Verdana
size=2>Hello again,</FONT></SPAN></DIV>
<DIV dir=ltr align=left><SPAN class=711154615-29022008><FONT face=Verdana
size=2> I've answered my own question but wanted to follow up with everyone
and share the result.</FONT></SPAN></DIV>
<DIV dir=ltr align=left><SPAN class=711154615-29022008><FONT face=Verdana
size=2></FONT></SPAN> </DIV>
<DIV dir=ltr align=left><SPAN class=711154615-29022008><FONT face=Verdana
size=2>The cause was that pbs_mom was running with a locked-memory limit
(ulimit -l) of 32 KB. Jobs inherit their limits from pbs_mom, which explains
why Fluent failed over Infiniband only when run through PBS.</FONT></SPAN></DIV>
<DIV dir=ltr align=left><SPAN class=711154615-29022008><FONT face=Verdana
size=2></FONT></SPAN> </DIV>
<DIV dir=ltr align=left><SPAN class=711154615-29022008><FONT face=Verdana
size=2>The fix was to add the following line to the script that starts
pbs_mom:</FONT></SPAN></DIV>
<DIV dir=ltr align=left><SPAN class=711154615-29022008><FONT face=Verdana
size=2>ulimit -l unlimited</FONT></SPAN></DIV>
<DIV dir=ltr align=left><SPAN class=711154615-29022008><FONT face=Verdana
size=2></FONT></SPAN> </DIV>
<DIV dir=ltr align=left><SPAN class=711154615-29022008><FONT face=Verdana
size=2>And, of course, restarting pbs_mom so that it picks up the new limit.</FONT></SPAN></DIV><BR>
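<DIV dir=ltr align=left><SPAN class=711154615-29022008><FONT face=Verdana
size=2>For anyone who wants to check a node for the same problem, here is a
minimal sketch, assuming bash. The function name memlock_ok is only
illustrative, and treating "unlimited" as the only good value is an assumption
based on what fixed it here; a large finite limit may be enough for other
setups. Run it from a login shell and again inside a PBS job (for example an
interactive one) and compare the two results.</FONT></SPAN></DIV>
<PRE>
```shell
#!/bin/bash
# memlock_ok LIMIT
# Succeeds only if LIMIT, as printed by `ulimit -l`, is "unlimited".
# (Hypothetical helper; the threshold is an assumption, not a hard rule.)
memlock_ok() {
    [ "$1" = "unlimited" ]
}

# Locked-memory limit of the current shell: a number in KB, or "unlimited".
limit=$(ulimit -l)

if memlock_ok "$limit"; then
    echo "max locked memory: $limit"
else
    echo "max locked memory: $limit KB (may be too low for Infiniband)"
fi
```
</PRE>
<DIV dir=ltr align=left><SPAN class=711154615-29022008><FONT face=Verdana
size=2>If the login shell reports unlimited but the job reports a small
number, the job is inheriting the low limit from pbs_mom.</FONT></SPAN></DIV>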
<BLOCKQUOTE dir=ltr style="MARGIN-RIGHT: 0px">
<DIV class=OutlookMessageHeader lang=en-us dir=ltr align=left>
<HR tabIndex=-1>
<FONT face=Tahoma size=2><B>From:</B> torqueusers-bounces@supercluster.org
[mailto:torqueusers-bounces@supercluster.org] <B>On Behalf Of </B>Edsall,
William (WJ)<BR><B>Sent:</B> Thursday, February 28, 2008 3:01 PM<BR><B>To:</B>
torqueusers@supercluster.org<BR><B>Subject:</B> [torqueusers] Fluent
Infiniband jobs fail, only in PBS<BR></FONT><BR></DIV>
<P><FONT face=Verdana size=2>Hello!</FONT> <BR><FONT face=Verdana
size=2> I'm experiencing a strange issue with PBS and the application
Fluent. We experienced a power outage, and have since had trouble running
Fluent through PBS over Infiniband.</FONT></P>
<P><FONT face=Verdana size=2>- Fluent runs fine through PBS on Ethernet</FONT>
<BR><FONT face=Verdana size=2>- Fluent runs fine outside of PBS on
Infiniband</FONT> <BR><FONT face=Verdana size=2>- Fluent only fails when run
through PBS, over Infiniband</FONT> </P>
<P><FONT face=Verdana size=2>Any suggestions? Even when I submit an interactive
job with the -I switch, I can't run Fluent successfully over Infiniband.
Something in the environment that PBS sets up is causing MPI to fail. My PBS
version is 2.1.7.</FONT></P>
<P><FONT face=Verdana size=2>Here is the output of a failed job on 2 nodes
run through PBS:</FONT> <BR><FONT face=Verdana size=2>Host spawning Node 0
on machine "node30" (unix).</FONT> <BR><FONT face=Verdana
size=2>/apps/fluent/Fluent.Inc/fluent6.3.35/bin/fluent -r6.3.35 3ddp -node
-t16 -pib -mpi=hp -cnf=/home/u396929/fluent_test/nodes2 -mport
192.168.0.30:192.168.0.30:46683:0</FONT></P>
<P><FONT face=Verdana size=2>Starting
/apps/fluent/Fluent.Inc/fluent6.3.35/multiport/mpi/lnamd64/hp/bin/mpirun -prot
-IBV -e MPI_HASIC_IBV=1 -f /tmp/fluent-appfile.8049</FONT></P>
<P><FONT face=Verdana size=2>fluent_mpi.6.3.35: Rank 0:4: MPI_Init:
ibv_create_qp() failed</FONT> <BR><FONT face=Verdana size=2>fluent_mpi.6.3.35:
Rank 0:4: MPI_Init: probably you need to increase pinnable memory in
/etc/security/limits.conf</FONT> <BR><FONT face=Verdana
size=2>fluent_mpi.6.3.35: Rank 0:4: MPI_Init: Can't initialize RDMA
device</FONT> <BR><FONT face=Verdana size=2>fluent_mpi.6.3.35: Rank 0:4:
MPI_Init: MPI BUG: Cannot initialize RDMA protocol</FONT> <BR><FONT
face=Verdana size=2>MPI Application rank 4 exited before MPI_Init() with
status 1</FONT> </P><BR>
<P><FONT face=Verdana size=2>Here is the output of a successful run from the
command line (same nodes):</FONT> <BR><FONT face=Verdana size=2>Host spawning Node 0 on
machine "node30" (unix).</FONT> <BR><FONT face=Verdana
size=2>/apps/fluent/Fluent.Inc/fluent6.3.35/bin/fluent -r6.3.35 3ddp -node
-t16 -pib -mpi=hp -cnf=/home/u396929/fluent_test/nodes2 -mport
192.168.0.30:192.168.0.30:43334:0</FONT></P>
<P><FONT face=Verdana size=2>Starting
/apps/fluent/Fluent.Inc/fluent6.3.35/multiport/mpi/lnamd64/hp/bin/mpirun -prot
-IBV -e MPI_HASIC_IBV=1 -f /tmp/fluent-appfile.8871</FONT></P>
<P><FONT face=Verdana size=2>HP-MPI licensed for Fluent.</FONT> <BR><FONT
face=Verdana size=2>Host 0 -- ip 192.168.0.30 -- ranks 0 - 7</FONT> <BR><FONT
face=Verdana size=2>Host 1 -- ip 192.168.0.31 -- ranks 8 - 15</FONT> </P>
<P><FONT face=Verdana size=2>Please help! Thanks much!</FONT> </P>
<P><FONT face=Verdana size=2>William</FONT> </P></BLOCKQUOTE></BODY></HTML>