Hi Jason,<div><br></div><div>Thank you very much! It works!</div><div><br></div><div>Best,</div><div><br></div><div>Junjun<br><br><div class="gmail_quote">On Mon, Nov 14, 2011 at 10:26 PM, Jason Bacon <span dir="ltr"><<a href="mailto:jwbacon@tds.net">jwbacon@tds.net</a>></span> wrote:<br>
<blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex;"><br>
I had a similar issue and got around it by simply setting up /etc/hosts<br>
on each node properly.<br>
<br>
On the multihomed head node, the hostname is bound to the external IP in<br>
/etc/hosts. On the compute nodes, the hostname of the head node is<br>
bound to its internal address. Also be sure that name resolution on<br>
the compute nodes is configured to check files before DNS.<br>
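<br>
For illustration, here is a minimal sketch of that layout. It assumes a head node whose hostname is "head", with a hypothetical external address 203.0.113.10 and internal address 192.168.1.1, plus one compute node "node01" at 192.168.1.101. All names and addresses are placeholders, not values from this thread:<br>
```
# /etc/hosts on the multihomed head node:
# the head node's own hostname is bound to its EXTERNAL address
203.0.113.10    head
192.168.1.101   node01

# /etc/hosts on each compute node:
# the head node's hostname is bound to its INTERNAL address
192.168.1.1     head
192.168.1.101   node01

# /etc/nsswitch.conf on each compute node:
# consult /etc/hosts ("files") before DNS
hosts: files dns
```
Running "getent hosts head" on each node is one way to confirm which address the head node's name resolves to, because getent follows the lookup order set in nsswitch.conf.<br>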
<br>
No special configuration was required within Torque.<br>
<br>
Regards,<br>
<br>
-J<br>
<div><div class="h5"><br>
On 11/13/11 09:48, liu junjun wrote:<br>
> Hi everyone,<br>
><br>
> I am trying to install torque-3.0.2 on a multihomed system (two NICs on<br>
> two networks) but am running into an authorization problem. Please see<br>
> my description of the problem below. Any help is highly appreciated!<br>
><br>
> ---- System information ----<br>
> OS: Ubuntu 10.10<br>
> eth0: external_host_name<br>
> eth1: internal_host_name<br>
> hostname: internal_host_name<br>
> --------------------------------------------<br>
><br>
> ---- Basic Torque information ----<br>
> Torque version: 3.0.2<br>
> content of /var/spool/torque/server_name: internal_host_name<br>
> content of /var/spool/torque/torque.cfg: SERVERHOST internal_host_name<br>
><br>
> The server and nodes can ping each other using internal_host_name<br>
> ----------------------------------------<br>
><br>
><br>
> ---- the problem -------------<br>
> 1. My first installation attempt:<br>
> By following the installation document at<br>
> <a href="http://www.adaptivecomputing.com/resources/docs/torque/1.1installation.php" target="_blank">http://www.adaptivecomputing.com/resources/docs/torque/1.1installation.php</a>,<br>
> I had a problem with the "torque.setup" script: it returned "unauthorized<br>
> request". I suspected the problem might be related to my two NICs, so I<br>
> double-checked the server_name file and also added "SERVERHOST<br>
> internal_host_name" to torque.cfg. Unfortunately, the problem still remains.<br>
><br>
> 2. My second installation attempt:<br>
> I removed the first installation, disabled eth0 (which is associated<br>
> with external_host_name), and recompiled Torque using exactly the same<br>
> steps as in my first attempt. Everything seemed fine: I could create a<br>
> batch queue and submit jobs, which ran and terminated normally. However,<br>
> once I re-enabled eth0 (external_host_name), every qmgr command returned<br>
> "unauthorized request". I noticed that the server recognizes me as<br>
> user@external_host_name, whereas the pbs server is set to<br>
> internal_host_name, which is also the hostname. I guessed this caused the<br>
> "unauthorized" error, so I temporarily disabled eth0 to regain<br>
> authorization and applied the following settings:<br>
> ====<br>
> qmgr -c 's s acl_hosts += external_host_name'<br>
> qmgr -c 's s managers += root@external_host_name'<br>
> qmgr -c 's s operators += root@external_host_name'<br>
> qmgr -c 's s submit_hosts += external_host_name'<br>
> ====<br>
><br>
> After running the above commands, I gained operational access to the<br>
> pbs_server even with eth0 enabled. However, all submitted jobs still<br>
> remain in the Q state. The following are excerpts from the 'qstat -f'<br>
> output and the server log files:<br>
> ==== part of 'qstat -f' output =====<br>
> Job Id: 51.internal_host_name<br>
> Job_Name = STDIN<br>
> Job_Owner = user@external_host_name<br>
> job_state = Q<br>
> queue = batch<br>
> server = internal_host_name<br>
> Checkpoint = u<br>
> ctime = Sun Nov 13 19:25:12 2011<br>
> Error_Path = internal_host_name:/home/liu/STDIN.e51<br>
> Hold_Types = n<br>
> Join_Path = n<br>
> Keep_Files = n<br>
> Mail_Points = a<br>
> mtime = Sun Nov 13 19:25:12 2011<br>
> Output_Path = internal_host_name:/home/liu/STDIN.o51<br>
> ===============================<br>
><br>
> ==== part of pbs_server log ======<br>
> 11/13/2011 19:25:05;0002;PBS_Server;Svr;PBS_Server;Torque Server Version = 3.0.2, loglevel = 0<br>
> 11/13/2011 19:25:12;0100;PBS_Server;Job;51.internal_host_name;enqueuing into batch, state 1 hop 1<br>
> 11/13/2011 19:25:12;0008;PBS_Server;Job;51.internal_host_name;Job Queued at request of user@external_host_name, owner = user@external_host_name, job name = STDIN, queue = batch<br>
> 11/13/2011 19:25:12;0040;PBS_Server;Svr;cddlogin;Scheduler was sent the command new<br>
> 11/13/2011 19:25:12;0080;PBS_Server;Req;dis_request_read;req header bad, dis error 7 (Premature end of message), type=Connect<br>
> 11/13/2011 19:25:12;0080;PBS_Server;Req;req_reject;Reject reply code=15058(Bad DIS based Request Protocol MSG=cannot decode message), aux=0, type=Connect, from @<br>
> 11/13/2011 19:25:12;0002;PBS_Server;Req;dis_reply_write;DIS reply failure, -1<br>
> =========================<br>
><br>
> ==== part of pbs_sched log ======<br>
> 11/13/2011 19:25:12;0001; pbs_sched;Svr;pbs_sched;LOG_ERROR::badconn, external_host_name on port 762 unauthorized host<br>
> ==========================<br>
><br>
> As you can see from the above information, although external_host_name<br>
> is set as a submit host, all jobs still remain in the 'Q' state because<br>
> the job owner is user@external_host_name. My question is either:<br>
> 1. How can I make the server accept jobs from user@external_host_name? or<br>
> 2. How can I make the server recognize every submitted job as belonging<br>
> to user@internal_host_name?<br>
><br>
> Thanks in advance!<br>
><br>
> Junjun<br>
><br>
><br>
</div></div>> _______________________________________________<br>
> torqueusers mailing list<br>
> <a href="mailto:torqueusers@supercluster.org">torqueusers@supercluster.org</a><br>
> <a href="http://www.supercluster.org/mailman/listinfo/torqueusers" target="_blank">http://www.supercluster.org/mailman/listinfo/torqueusers</a><br>
<span class="HOEnZb"><font color="#888888"><br>
<br>
--<br>
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~<br>
Jason W. Bacon<br>
<a href="mailto:jwbacon@tds.net">jwbacon@tds.net</a><br>
<a href="http://personalpages.tds.net/~jwbacon" target="_blank">http://personalpages.tds.net/~jwbacon</a><br>
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~<br>
</font></span></blockquote></div><br></div>