[From nobody Tue Jun 3 13:55:43 2008
Message-ID: <4845A16D.2080407@penguincomputing.com>
Date: Tue, 03 Jun 2008 12:54:21 -0700
From: Joshua Bernstein <jbernstein@penguincomputing.com>
User-Agent: Thunderbird 2.0.0.6 (X11/20071022)
MIME-Version: 1.0
To: Chris Samuel <csamuel@vpac.org>
Subject: Re: [torqueusers] Unknown Job Id Behavior
References: <1385182107.8691211802165659.JavaMail.root@zimbra.vpac.org>
In-Reply-To: <1385182107.8691211802165659.JavaMail.root@zimbra.vpac.org>
Content-Type: text/plain; charset=UTF-8; format=flowed
Content-Transfer-Encoding: 7bit

Chris Samuel wrote:
> ----- "Joshua Bernstein" <jbernstein@penguincomputing.com> wrote:
>
>>> But I've noticed in 2.3 that we seem to be hitting the
>>> same problem described by the OP. :-(
>> Interesting. Are you running TORQUE in a diskless configuration like
>> I'm doing?
>
> Nope, ours have 4 x 300GB drives and keep state.
>
> Does that help or hinder ?

Doesn't help. I still think there is a problem with some area of the
communication between pbs_mom and pbs_server. If pbs_mom responds to
pbs_server with a message saying that it doesn't know anything about the
job, shouldn't pbs_server just consider the job dead, and either
re-queue it or just notify the user?

-Josh
]
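
The behavior proposed above can be sketched roughly as follows. This is only an illustration in Python, not TORQUE's actual C code: the names MomReply, Action, Job, and handle_mom_reply are hypothetical, and picking between re-queue and notify based on a per-job rerunnable flag is an assumption made for the example, since the message does not say how that choice should be made.

```python
from dataclasses import dataclass
from enum import Enum


class MomReply(Enum):
    """Hypothetical replies pbs_mom might give when asked about a job."""
    RUNNING = "running"
    UNKNOWN_JOB = "unknown_job"


class Action(Enum):
    """Hypothetical actions the server could take for a job."""
    KEEP = "keep"
    REQUEUE = "requeue"
    NOTIFY_USER = "notify_user"


@dataclass
class Job:
    job_id: str
    rerunnable: bool  # assumption: decides between re-queue and notify


def handle_mom_reply(job: Job, reply: MomReply) -> Action:
    """Decide what the server should do with a job given the mom's reply."""
    if reply is not MomReply.UNKNOWN_JOB:
        # The mom still knows the job, so leave its state alone.
        return Action.KEEP
    # The mom has no record of the job: treat it as dead.
    if job.rerunnable:
        return Action.REQUEUE
    return Action.NOTIFY_USER
```

The point of the sketch is only that an "unknown job" reply is treated as a final answer about the job rather than as an error to retry indefinitely.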