<html><head></head><body style="word-wrap: break-word; -webkit-nbsp-mode: space; -webkit-line-break: after-white-space; "><div>Hi all, </div><div><br></div><div><div>We are seeing a lot of "pbs_mom: scp" transfer errors in /var/log/messages on our worker nodes (WNs), but the files mentioned in these errors are there and accessible. </div><div><div><br></div><div>This is an example of the error:</div><div><br></div><div><font class="Apple-style-span" face="'Courier New'"><span class="Apple-style-span" style="font-size: 12px; ">wn113: Nov 9 14:54:57 wn113 pbs_mom: LOG_ERROR::sys_copy, command '/usr/bin/scp -rpB out_cre02_208919067_StandardOutput cms079@cream02.lcg.cscs.ch:/cream_localsandbox/data/cms/_DC_ch_DC_cern_OU_Organic_Units_OU_Users_CN_eaguiloc_CN_555092_CN_Ernest_Aguilo_Chivite_cms_Role_NULL_Capability_NULL_cms079/20/CREAM208919067/StandardOutput' failed with status=1, giving up after 4 attempts</span></font></div><div><br></div><div>These kinds of errors happen every day, with no apparent correlation to cron jobs or any other regularly scheduled cfengine tasks. 
Here is a summary of them across all the WNs in our cluster, per destination CE and day:</div><div><br></div><div><font class="Apple-style-span" face="'Courier New'"><span class="Apple-style-span" style="font-size: 12px; ">Total: 12 in cream02 on Nov 10</span></font></div><div><font class="Apple-style-span" face="'Courier New'"><span class="Apple-style-span" style="font-size: 12px; ">Total: 2 in cream01 on Nov 10</span></font></div><div><font class="Apple-style-span" face="'Courier New'"><span class="Apple-style-span" style="font-size: 12px; "><br></span></font></div><div><font class="Apple-style-span" face="'Courier New'"><span class="Apple-style-span" style="font-size: 12px; ">Total: 72 in cream02 on Nov 09</span></font></div><div><font class="Apple-style-span" face="'Courier New'"><span class="Apple-style-span" style="font-size: 12px; ">Total: 74 in cream01 on Nov 09</span></font></div><div><font class="Apple-style-span" face="'Courier New'"><span class="Apple-style-span" style="font-size: 12px; "><br></span></font></div><div><font class="Apple-style-span" face="'Courier New'"><span class="Apple-style-span" style="font-size: 12px; ">Total: 52 in cream02 on Nov 08</span></font></div><div><font class="Apple-style-span" face="'Courier New'"><span class="Apple-style-span" style="font-size: 12px; ">Total: 2 in cream01 on Nov 08</span></font></div><div><font class="Apple-style-span" face="'Courier New'"><span class="Apple-style-span" style="font-size: 12px; "><br></span></font></div><div><font class="Apple-style-span" face="'Courier New'"><span class="Apple-style-span" style="font-size: 12px; ">Total: 212 in cream02 on Nov 07</span></font></div><div><font class="Apple-style-span" face="'Courier New'"><span class="Apple-style-span" style="font-size: 12px; ">Total: 36 in cream01 on Nov 07</span></font></div><div><font class="Apple-style-span" face="'Courier New'"><span class="Apple-style-span" style="font-size: 12px; "><br></span></font></div><div><font class="Apple-style-span" face="'Courier New'"><span class="Apple-style-span" style="font-size: 12px; ">Total: 1240 in cream02 on Nov 06</span></font></div><div><font class="Apple-style-span" face="'Courier New'"><span class="Apple-style-span" style="font-size: 12px; ">Total: 465 in cream01 on Nov 06</span></font></div><div><br></div><div>At the moment there are two CREAM-CEs (the destination hosts of these scp transfers): one is a VM (cream01) and the other is a physical machine (cream02). Each has its own local cream_sandbox directory (the destination directory of the scp transfers) and enough computing power to handle the ssh connections as well as all the other CREAM services.</div><div><br></div><div>Initially the cream_sandbox was shared through a Lustre filesystem, but since it was unreliable and at times became very slow (jobs ran there too), we decided to move it to local disk. These errors did not occur before, because with the shared sandbox we used a regular $usecp directive, so pbs_mom copied the output files locally instead of using scp.</div><div><br></div><div>We are aware that the copy command can be tuned with the $rcpcmd directive in the pbs_mom config file, but since we are not sure what is causing the error, we don't know what to change. MaxStartups in /etc/ssh/sshd_config is set to 20000:</div><div><br></div><div><blockquote type="cite">MaxStartups 20000</blockquote></div><div><br></div><div>We've checked /var/log/secure for scp errors, but everything looks fine there.</div><div><br></div><div>Any idea what could be wrong?</div><div><br></div><div>Thanks in advance,</div><div>Miguel</div><div><br></div><div><br></div></div></div><br><div>
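<div>For reference, per-CE daily totals like the ones above can be tallied with a short script along these lines. It is only a minimal sketch: it assumes syslog lines in exactly the format of the example error (with or without a leading "wnNNN: " host prefix), and the names SCP_FAILURE, tally_scp_failures, print_summary and tally_scp.py are illustrative, not part of any existing tool.</div>

```python
import re
import sys
from collections import Counter

# Matches a pbs_mom sys_copy failure in the format of the example error,
# with or without a leading "wnNNN: " prefix. Groups: month, day, scp target host.
# ASSUMPTION: illustrative pattern, only checked against that one log format.
SCP_FAILURE = re.compile(
    r"([A-Z][a-z]{2})\s+(\d{1,2})\s+\d{2}:\d{2}:\d{2}\s+\S+\s+"
    r"pbs_mom:\s+LOG_ERROR::sys_copy,.*?\S+@([\w.-]+):\S*' failed"
)


def tally_scp_failures(lines):
    """Count sys_copy failures per (CE short name, "Mon DD") pair."""
    counts = Counter()
    for line in lines:
        match = SCP_FAILURE.search(line)
        if match is None:
            continue  # not a pbs_mom scp failure
        month, day, host = match.groups()
        ce = host.split(".")[0]  # e.g. cream02.lcg.cscs.ch -> cream02
        counts[(ce, "%s %02d" % (month, int(day)))] += 1
    return counts


def print_summary(counts):
    """Print one "Total: N in CE on Mon DD" line per (CE, day) pair."""
    for (ce, date), n in sorted(counts.items()):
        print("Total: %d in %s on %s" % (n, ce, date))


if __name__ == "__main__" and len(sys.argv) > 1:
    # Usage (hypothetical file name): python tally_scp.py /var/log/messages [more logs ...]
    total = Counter()
    for path in sys.argv[1:]:
        with open(path, errors="replace") as log:
            total.update(tally_scp_failures(log))
    print_summary(total)
```

<div>Run over the collected logs of all the WNs, it prints one "Total: N in CE on Mon DD" line per CE and day. Traditional syslog timestamps carry no year, so logs spanning a year boundary would be merged by day.</div><div><br></div>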
<div>--</div><div>Miguel Gila<br>CSCS Swiss National Supercomputing Centre<br>HPC Solutions<br>Via Cantonale, Galleria 2 | CH-6928 Manno | Switzerland<br><a href="mailto:miguel.gila@cscs.ch">miguel.gila@cscs.ch</a> | <a href="http://www.cscs.ch">www.cscs.ch</a> | Phone +41 91 610 82 22</div>
</div>
<br></body></html>