<!DOCTYPE html PUBLIC "-//W3C//DTD HTML 4.01 Transitional//EN">
<html>
<head>
<meta http-equiv="Content-Type" content="text/html;charset=ISO-8859-1">
<title></title>
</head>
<body>
Yes, this would be useful for pointing out problematic users. <br>
<br>
In the accounting record, an Exit_status different from 0 should indicate that something
went wrong.<br>
Exit status is not currently taken into account in pbsjobs and pbsacct, but it should be added,<br>
with 2 columns: the number of jobs and the number of jobs with a non-zero Exit_status.<br>
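<br>
A minimal sketch of how these two columns could be computed directly from the raw accounting records, assuming each job-end ("E") record sits on a single line and carries user= and Exit_status= fields. The function name exit_status_summary is made up for illustration and is not part of pbsjobs or pbsacct.<br>

```shell
#!/bin/sh
# Hypothetical helper: per user, print the number of jobs and the number
# of jobs with a non-zero Exit_status. Reads raw PBS accounting records
# on stdin; assumes one record per line, fields separated by ";".
exit_status_summary() {
    awk -F';' '
    $2 == "E" {                          # only job-end records carry Exit_status
        user = ""; status = 0
        n = split($4, kv, " ")           # key=value pairs of the message text
        for (i = 1; i <= n; i++) {
            if (kv[i] ~ /^user=/)        user = substr(kv[i], 6)
            if (kv[i] ~ /^Exit_status=/) status = substr(kv[i], 13) + 0
        }
        if (user == "") next
        jobs[user]++
        if (status != 0) failed[user]++
    }
    END {
        for (u in jobs) printf("%-10s %7d %7d\n", u, jobs[u], failed[u] + 0)
    }' | sort
}
```

It could be called as: cat accounting-files | exit_status_summary<br>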
<br>
Etienne Gondet<br>
<br>
PS: I will try to add that. <br>
<br>
<br>
etienne gondet wrote:<br>
<blockquote type="cite" cite="mid43BE8373.5050805@mercator-ocean.fr"> <br>
Hello, <br>
<br>
I just tried pbsacct. It is exactly the simple tool I was looking for. <br>
<br>
I tried to add the total cumulated CPU time, and I believe there is a mistake in the
CPU computation. <br>
<br>
In pbsjobs, cput is computed from the value of resources_used.cput,
<br>
which I take to be the total CPU time summed over all nodes and ppn. Can anybody confirm
this point? <br>
<br>
<pre>
                            Wallclock         Average Average      CPU
  Username    Group   #jobs     hours Percent  #nodes  q-days    hours
  --------    -----   ----- --------- ------- ------- -------    -----
     TOTAL        -    1876   8248.34  100.00    4.41    0.00  3017.38
     user1      red     745   3538.88   42.90    6.00    0.00  1229.49
     user2      red     285   2382.64   28.89    2.99    0.00  1103.90
</pre>
<br>
But pbsacct multiplies it again by the number of nodes: <br>
line 108 cpunodes[user] += nodect*cput <br>
line 116 cpunodesecs += nodect*cput <br>
<br>
So I guess the following would be more accurate: <br>
line 108 cpunodes[user] += cput <br>
line 116 cpunodesecs += cput <br>
<br>
<br>
If I look at an accounting record, resources_used.cput=01:41:36 is greater than resources_used.walltime=00:51:22.
<br>
That is why I think it is already the cumulated CPU time over all the processors
(nodes*ppn). <br>
<br>
01/05/2006 02:18:34;E;20020.baltic;user=mbenkiran group=mercator jobname=SAM1V2_UV
queue=long ctime=1136424430 qtime=1136424431 etime=1136424431 start=1136424432
exec_host=baltic-05/1+baltic-05/0+baltic-04/1+baltic-04/0+baltic-03/1+baltic-03/0
Resource_List.cput=12:30:00 Resource_List.neednodes=3:ppn=2 Resource_List.nodect=3
Resource_List.nodes=3:ppn=2 Resource_List.pcput=03:00:00 Resource_List.pmem=5888mb
Resource_List.pvmem=5888mb Resource_List.walltime=03:00:00 session=0 end=1136427514
Exit_status=0 resources_used.cput=01:41:36 resources_used.mem=4095000kb
resources_used.vmem=3003072kb resources_used.walltime=00:51:22 <br>
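<br>
As a rough check of this reasoning, here is a minimal sketch that converts the two HH:MM:SS values to seconds and prints their ratio. The function names hms_to_secs and cpu_wall_ratio are made up for illustration and do not exist in pbsjobs or pbsacct.<br>

```shell
#!/bin/sh
# Hypothetical helper: convert an HH:MM:SS duration to seconds.
hms_to_secs() {
    echo "$1" | awk -F: '{ print $1 * 3600 + $2 * 60 + $3 }'
}

# Hypothetical helper: ratio of CPU time to wallclock time,
# i.e. the average number of CPUs kept busy by the job.
cpu_wall_ratio() {
    cput=$(hms_to_secs "$1")
    wall=$(hms_to_secs "$2")
    awk -v c="$cput" -v w="$wall" 'BEGIN { printf("%.2f\n", c / w) }'
}
```

For the record above, cpu_wall_ratio 01:41:36 00:51:22 prints 1.98 (6096 s of CPU for 3082 s of wallclock). A ratio above 1 is only possible if cput is summed over several processors, so multiplying it again by the node count would overstate the CPU hours.<br>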
<br>
Happy New Year to all Torque users. <br>
<br>
Ole Holm Nielsen wrote: <br>
<br>
<blockquote type="cite">hpc.group at gmail.com wrote: <br>
<br>
<blockquote type="cite">Does anyone know how to generate an accurate
torque monthly usage report <br>
based on cpu number, not number of nodes for cluster and SMP machine? The
<br>
report will include userid, group, wall-clock (hours), cpu time (hours) <br>
and cpu number. Pls let me know, thanks. <br>
</blockquote>
<br>
<br>
I wrote some really simple PBS accounting scripts for PBS (Torque and PBSPro)
<br>
some years ago, and this is what we still use. You may download the pbsacct
<br>
package from <a class="moz-txt-link-freetext" href="ftp://ftp.fysik.dtu.dk/pub/PBS/">ftp://ftp.fysik.dtu.dk/pub/PBS/</a> <br>
<br>
Regards, <br>
Ole <br>
<br>
</blockquote>
<br>
<pre wrap="">
<hr width="90%" size="4">
#!/bin/sh
# Summarize USER accounting information from PBS accounting files
# located in $PBSHOME/server_priv/accounting/
# The accompanying script "pbsjobs" extracts simplified records
# of completed jobs.
# Usage: pbsacct &lt;accounting-files&gt;
# where &lt;accounting-files&gt; are daily PBS records (such as 20000705)
# Author:        <a class="moz-txt-link-abbreviated" href="mailto:Ole.H.Nielsen@fysik.dtu.dk">Ole.H.Nielsen@fysik.dtu.dk</a>
# Thanks to:        <a class="moz-txt-link-abbreviated" href="mailto:Miroslaw.Prywata@fuw.edu.pl">Miroslaw.Prywata@fuw.edu.pl</a>
#---------------------------------------------------------------
#BINDIR=/usr/local/bin
BINDIR=/home/mercator/64/bin
GROUPID=""
if [ -z "$1" ] ; then
        echo "Usage: $0 [-g groupid] accounting-files";
        exit 1
fi
#
case $1 in
        -g) GROUPID=$2
         shift; shift;
esac
# Accounting-files:
ACCT_FILES=$*
NUM_FILES=$#
# Sanity check
for f in ${ACCT_FILES}
do
        if [ ! -r $f ]
        then
                echo ERROR: File $f is unreadable:
                ls -la $f
                exit 1
        fi
done
# The pbsjobs accounting-information extractor script:
# May be set by an environment variable.
if [ -z "${PBSJOBS}" ] ; then
        PBSJOBS="${BINDIR}/pbsjobs";
fi
if [ ! -x "${PBSJOBS}" ] ; then
        echo No ${PBSJOBS} executable found
        exit 1
fi
# A working file
JOBTEMP=/tmp/pbsjobs.$$
# Trap error signals:
trap "rm -f ${JOBTEMP}; exit 2" 1 2 3 14 15 19
#---------------------------------------------------------------
# List the input files
echo
echo "Portable Batch System USER accounting statistics"
echo "------------------------------------------------"
echo
echo A total of $NUM_FILES accounting files will be processed.
rm -f ${JOBTEMP}
cat ${ACCT_FILES} | ${PBSJOBS} > ${JOBTEMP}
cat ${JOBTEMP} | awk '
{
        if (NR == 1) firstdate=$7
        lastdate=$7
} END {
        printf("The first record is dated %s, last record is dated %s.\n",
                firstdate, lastdate)
}'
#---------------------------------------------------------------
echo
echo " Wallclock Average Average CPU"
echo "Username Group #jobs hours Percent #nodes q-days hours"
echo "-------- ----- ----- --------- ------- ------- ------- -----"
cat ${JOBTEMP} | awk -vGROUPID=$GROUPID '
{
        # Parse input data
        user        = $2                # User name
        group        = $3                # Group name
        queue        = $4                # Queue name
        nodect        = $5                # Number of nodes used
        cput        = $6                # CPU time in seconds
        wall        = $9                # Wallclock time in seconds
        wait        = $11                # Waiting time in seconds
        total_ncpus = $12        # Total number of CPUs used (>=nodect)
        #
        # For accounting by number of CPUs instead of number of nodes:
        # Uncomment the following line:
#ETG modif for SBU = walltime*NCPUS
        # nodect = total_ncpus
        nodect = total_ncpus
        username[user] = user
        groupname[user] = group
        jobs[user]++
#ETG        cpunodes[user] += nodect*cput
        cpunodes[user] += cput
        wallnodes[user] += nodect*wall
        wallcpu[user] += wall
        if (nodect < minnodes[user]) minnodes[user] = nodect
        if (nodect > maxnodes[user]) maxnodes[user] = nodect
        waittime[user] += wait
        totaljobs++
        totalwait += wait
#ETG        cpunodesecs += nodect*cput
        cpunodesecs += cput
        wallnodesecs += nodect*wall
        wallsecs += wall
} END {
        cpunodedays = cpunodesecs / 86400
        wallnodedays = wallnodesecs / 86400
        walldays = wallsecs / 86400
        groupjobs = 0
        groupdays = 0
        for (user in username) {
                if (length(GROUPID) > 0 && groupname[user] != GROUPID) continue
                if (wallcpu[user] > 0)
                        printf("%10s %8s %7d %8.2f %6.2f %7.2f %7.2f %8.2f\n",
                        username[user], groupname[user], jobs[user],
                        wallnodes[user]/3600, wallnodes[user]/(864*wallnodedays),
                        # q-days: average waiting time per job, in days (86400 s)
                        wallnodes[user]/wallcpu[user], waittime[user]/jobs[user]/86400,
                        cpunodes[user]/3600)
                groupjobs += jobs[user]
                groupnodedays += wallnodes[user]/86400
                groupdays += wallcpu[user]/86400
                groupwait += waittime[user]
        }
        printf("%10s %8s %7d %8.2f %6.2f %7.2f %7.2f %8.2f\n",
                "TOTAL", "-", totaljobs, wallnodesecs/3600, 100,
                wallnodedays/walldays, totalwait/totaljobs/86400, cpunodesecs/3600)
        if (length(GROUPID) > 0 && groupjobs > 0)
                printf("%10s %8s %7d %8.2f %7.2f %7.2f %7.2f \n",
                        "GROUP", GROUPID, groupjobs, groupnodedays,
                        100*groupnodedays/wallnodedays,
                        groupnodedays/groupdays, groupwait/groupjobs/86400)
                
} ' | sort -r -n -k 4
rm -f ${JOBTEMP}
exit 0
</pre>
</blockquote>
<br>
</body>
</html>