<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.01 Transitional//EN">
<html>
<head>
<meta content="text/html; charset=ISO-8859-1"
http-equiv="Content-Type">
</head>
<body text="#000000" bgcolor="#ffffff">
Dear Hector, thank you for your prompt reply.<br>
The problem arose while several jobs were active on the cluster: a
user first launched a calculation in one job and, about 5 hours
later, launched another. Both jobs ended up on the same node and
the same cores.<br>
<br>
Part of the qstat output:<br>
<br>
[root@fe ~]# qstat -f 477<br>
Job Id: 477.fe<br>
Job_Name = job_gr_PBE<br>
Job_Owner = matias@fe<br>
job_state = Q<br>
queue = batch<br>
server = fe<br>
Checkpoint = u<br>
ctime = Tue Nov 8 11:58:16 2011<br>
Error_Path =
fe:/usr/home/matias/graf/graf-graf-PBE-VdW/job_gr_PBE.e477<br>
exec_host = n10/3+n10/2+n10/1+n10/0<br>
exec_port = 15003+15003+15003+15003<br>
<br>
[root@fe ~]# qstat -f 480<br>
Job Id: 480.fe<br>
Job_Name = job_gr_PBE<br>
Job_Owner = matias@fe<br>
job_state = Q<br>
queue = batch<br>
server = fe<br>
Checkpoint = u<br>
ctime = Tue Nov 8 17:26:09 2011<br>
Error_Path =
fe:/usr/home/matias/graf/graf-graf-PBE-VdW/job_gr_PBE.e480<br>
exec_host = n10/3+n10/2+n10/1+n10/0<br>
exec_port = 15003+15003+15003+15003<br>
<br>
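As a quick check of the overlap, here is a minimal Python sketch. The
function names are hypothetical, and it assumes exec_host is a
"+"-separated list of node/core slots, as in the qstat output above:<br>
<pre>
```python
def parse_exec_host(exec_host):
    """Turn 'n10/3+n10/2' into a set of (node, core) pairs."""
    slots = set()
    for slot in exec_host.split("+"):
        node, core = slot.split("/")
        slots.add((node, int(core)))
    return slots


def shared_slots(exec_host_a, exec_host_b):
    """Return the sorted (node, core) pairs present in both strings."""
    slots_a = parse_exec_host(exec_host_a)
    slots_b = parse_exec_host(exec_host_b)
    return sorted(slots_a.intersection(slots_b))


# exec_host values copied from the qstat -f output for jobs 477 and 480.
job_477 = "n10/3+n10/2+n10/1+n10/0"
job_480 = "n10/3+n10/2+n10/1+n10/0"
print(shared_slots(job_477, job_480))
```
</pre>
A non-empty result means the two jobs were given at least one identical
node/core slot; other exec_host layouts (for example core ranges) would
need a different parser.<br>
<br>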
This is what the tracejob command gives me for both jobs:<br>
[root@fe ~]# tracejob 480<br>
/var/spool/torque/mom_logs/20111108: No such file or directory<br>
/var/spool/torque/sched_logs/20111108: No such file or directory<br>
<br>
Job: 480.fe<br>
<br>
11/08/2011 17:26:09 S enqueuing into batch, state 1 hop 1<br>
11/08/2011 17:26:09 S Job Queued at request of matias@fe, owner
= matias@fe,<br>
job name = job_gr_PBE, queue = batch<br>
11/08/2011 17:26:09 A queue=batch<br>
11/08/2011 17:26:10 S Job Run at request of root@fe<br>
11/08/2011 17:26:12 S unable to run job, MOM rejected/rc=2<br>
11/08/2011 18:26:36 S Job Run at request of root@fe<br>
11/08/2011 18:26:38 S unable to run job, MOM rejected/rc=2<br>
11/08/2011 19:26:45 S Job Run at request of root@fe<br>
11/08/2011 19:26:45 S Not sending email: User does not want mail
of this<br>
type.<br>
11/08/2011 19:26:45 A user=matias group=matias
jobname=job_gr_PBE<br>
queue=batch ctime=1320783969
qtime=1320783969<br>
etime=1320783969 start=1320791205
owner=matias@fe<br>
exec_host=n11/7+n11/6+n11/5+n11/4<br>
Resource_List.neednodes=1:ppn=4
Resource_List.nodect=1<br>
Resource_List.nodes=1:ppn=4<br>
Resource_List.walltime=2400:00:00 <br>
11/08/2011 19:26:53 S Not sending email: User does not want mail
of this<br>
type.<br>
11/08/2011 19:26:53 S Exit_status=0 resources_used.cput=00:00:27<br>
resources_used.mem=0kb
resources_used.vmem=0kb<br>
resources_used.walltime=00:00:09<br>
11/08/2011 19:26:53 A user=matias group=matias
jobname=job_gr_PBE<br>
queue=batch ctime=1320783969
qtime=1320783969<br>
etime=1320783969 start=1320791205
owner=matias@fe<br>
exec_host=n11/7+n11/6+n11/5+n11/4<br>
Resource_List.neednodes=1:ppn=4
Resource_List.nodect=1<br>
Resource_List.nodes=1:ppn=4<br>
Resource_List.walltime=2400:00:00
session=8035<br>
end=1320791213 Exit_status=0<br>
resources_used.cput=00:00:27
resources_used.mem=0kb<br>
resources_used.vmem=0kb<br>
resources_used.walltime=00:00:09<br>
11/08/2011 19:31:53 S dequeuing from batch, state COMPLETE<br>
[root@fe ~]# <br>
[root@fe ~]# tracejob 477<br>
/var/spool/torque/mom_logs/20111108: No such file or directory<br>
/var/spool/torque/sched_logs/20111108: No such file or directory<br>
<br>
Job: 477.fe<br>
<br>
11/08/2011 11:58:16 S enqueuing into batch, state 1 hop 1<br>
11/08/2011 11:58:16 S Job Queued at request of matias@fe, owner
= matias@fe,<br>
job name = job_gr_PBE, queue = batch<br>
11/08/2011 11:58:16 A queue=batch<br>
11/08/2011 11:58:17 S Job Run at request of root@fe<br>
11/08/2011 11:58:19 S unable to run job, MOM rejected/rc=2<br>
11/08/2011 12:58:34 S Job Run at request of root@fe<br>
11/08/2011 12:58:36 S unable to run job, MOM rejected/rc=2<br>
11/08/2011 13:58:37 S Job Run at request of root@fe<br>
11/08/2011 13:58:39 S unable to run job, MOM rejected/rc=2<br>
11/08/2011 14:58:43 S Job Run at request of root@fe<br>
11/08/2011 14:58:45 S unable to run job, MOM rejected/rc=2<br>
11/08/2011 15:59:09 S Job Run at request of root@fe<br>
11/08/2011 15:59:11 S unable to run job, MOM rejected/rc=2<br>
11/08/2011 16:59:30 S Job Run at request of root@fe<br>
11/08/2011 16:59:32 S unable to run job, MOM rejected/rc=2<br>
11/08/2011 17:59:50 S Job Run at request of root@fe<br>
11/08/2011 17:59:52 S unable to run job, MOM rejected/rc=2<br>
11/08/2011 19:00:02 S Job Run at request of root@fe<br>
11/08/2011 19:00:02 S Not sending email: User does not want mail
of this<br>
type.<br>
11/08/2011 19:00:02 A user=matias group=matias
jobname=job_gr_PBE<br>
queue=batch ctime=1320764296
qtime=1320764296<br>
etime=1320764296 start=1320789602
owner=matias@fe<br>
exec_host=n11/7+n11/6+n11/5+n11/4<br>
Resource_List.neednodes=1:ppn=4
Resource_List.nodect=1<br>
Resource_List.nodes=1:ppn=4<br>
Resource_List.walltime=2400:00:00 <br>
11/08/2011 19:00:10 S Not sending email: User does not want mail
of this<br>
type.<br>
11/08/2011 19:00:10 S Exit_status=0 resources_used.cput=00:00:27<br>
resources_used.mem=0kb
resources_used.vmem=0kb<br>
resources_used.walltime=00:00:09<br>
11/08/2011 19:00:10 A user=matias group=matias
jobname=job_gr_PBE<br>
queue=batch ctime=1320764296
qtime=1320764296<br>
etime=1320764296 start=1320789602
owner=matias@fe<br>
exec_host=n11/7+n11/6+n11/5+n11/4<br>
Resource_List.neednodes=1:ppn=4
Resource_List.nodect=1<br>
Resource_List.nodes=1:ppn=4<br>
Resource_List.walltime=2400:00:00
session=7936<br>
end=1320789610 Exit_status=0<br>
resources_used.cput=00:00:27
resources_used.mem=0kb<br>
resources_used.vmem=0kb<br>
resources_used.walltime=00:00:09<br>
11/08/2011 19:05:11 S dequeuing from batch, state COMPLETE<br>
[root@fe ~]# <br>
<br>
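The ctime, start and end fields in the accounting records are Unix epoch
seconds. A minimal Python sketch for reading them (the summarize helper
is hypothetical; the numbers are copied from the tracejob output above)
could be:<br>
<pre>
```python
from datetime import timedelta

# ctime/start/end values copied from the tracejob accounting records.
records = {
    "477": {"ctime": 1320764296, "start": 1320789602, "end": 1320789610},
    "480": {"ctime": 1320783969, "start": 1320791205, "end": 1320791213},
}


def summarize(record):
    """Return (time waiting in the queue, time running) as timedeltas."""
    waited = timedelta(seconds=record["start"] - record["ctime"])
    ran = timedelta(seconds=record["end"] - record["start"])
    return waited, ran


for job_id, record in records.items():
    waited, ran = summarize(record)
    print(job_id, "waited", waited, "ran", ran)
```
</pre>
This only restates what the records already contain: how long each job
sat in the queue before it started and how long it ran.<br>
<br>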
I do not understand why the logs are missing from the
/var/spool/torque/mom_logs and /var/spool/torque/sched_logs
directories.<br>
<br>
Regards<br>
<br>
Fernando<br>
<br>
----------------------------------------------------<br>
<pre class="moz-signature" cols="72">Ing. Fernando Caba
Director General de Telecomunicaciones
Universidad Nacional del Sur
<a class="moz-txt-link-freetext" href="http://www.dgt.uns.edu.ar">http://www.dgt.uns.edu.ar</a>
Tel/Fax: (54)-291-4595166
Tel: (54)-291-4595101 int. 2050
Avda. Alem 1253, (B8000CPB) Bahía Blanca - Argentina
----------------------------------------------------
</pre>
<br>
On 08/11/2011 07:17 PM, Hector Oliver wrote:
<blockquote
cite="mid:CA+oaJgSUK8FKSW7vYBi9FkriBtBaeNQDLX_UAGCiGdb=dn2+xw@mail.gmail.com"
type="cite">What is the state of the jobs (tracejob #job)?
<div>Do both of them show up in qstat?</div>
<div>Does your configuration allow several jobs to run at the same time?<br>
<br>
<div class="gmail_quote">On Tue, Nov 8, 2011 at 3:58 PM,
Fernando Caba <span dir="ltr">&lt;<a moz-do-not-send="true"
href="mailto:fcaba@uns.edu.ar">fcaba@uns.edu.ar</a>&gt;</span>
wrote:<br>
<blockquote class="gmail_quote" style="margin: 0pt 0pt 0pt
0.8ex; border-left: 1px solid rgb(204, 204, 204);
padding-left: 1ex;">Hi mauiusers, I have a job that is
assigned to node10, cores 0<br>
to 3, and another job assigned to the same node and the
same<br>
cores (0 to 3).<br>
Does anybody have any idea what is happening? I have
torque-3.0.1 and<br>
maui-3.3.1.<br>
Thanks<br>
<br>
<br>
</blockquote>
</div>
<br>
</div>
</blockquote>
</body>
</html>