Data Challenge 2 Output Size

Just did the calculation: our second data challenge wrote 19.7 TB of data from 609,000 jobs, which gives an average file size of 32.4 MB. The jobs that ran here at JLab produced files of 100 MB or so; we were not subject to preemption and so could afford to run longer, but the file count was dominated by the OSG jobs. I believe each job ran for 24 hours here (single-threaded), rather than the 8 hours Richard mentioned for the OSG. So we are within factors of your simple calculation #1.
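
As a sanity check on that average, a minimal sketch of the arithmetic (the byte total and file count are the numbers from the tally at the end of this entry):

#!/usr/bin/env perl
# Average output file size for data challenge 2, from the tally below.
use strict;
use warnings;

my $total_bytes = 19741269427344;   # summed size= values from the stub files
my $n_files     = 608759;           # number of REST files counted

printf "average file size = %.1f MB\n", $total_bytes / $n_files / 1.0e6;   # ~32.4 MB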

A stub file (rest/dana_rest_09001_2000065.hddm) looks like:

bitfileIndex=44323532
sourcePath=/v/volatile/halld/home/gluex/proj/dc_02/rest/dana_rest_09001_2000065.hddm
size=82962154
crc32=ff7f0b76
md5=e4b03d67226d1db60a69d00b0746d34a
owner=gluex
creationTime=2014-04-01 13:46:25
bfid=scdm14,job=78950966
volser=502756
filePosition=1262414
volumeSet=d-data-challenge
stubPath=/mss/halld/data_challenge/02/rest/dana_rest_09001_2000065.hddm

and I used this command to pull the size= lines out of the stubs:

find rest -name \*.hddm -exec grep size= {} \; > dc_02_rest_size.txt

and this script, reading those size= lines on standard input, to do the sum and count:

#!/usr/bin/env perl
# Sum the size= values extracted from the stub files and count them.
$total_size = 0;
$count = 0;
while ($line = <STDIN>) {
    print $line;                  # echo the raw size= line
    chomp $line;
    @t = split(/size=/, $line);   # the byte count follows "size="
    print "$t[1]\n";
    $count++;                     # one size= line per file
    $size = $t[1];
    $total_size += $size;         # running total in bytes
    print "count = $count size=$size total_size=$total_size\n";
}
print "$total_size $count\n";     # grand total (bytes) and file count
exit 0;

whose output ended like this:

size=25341077
25341077
count = 608758 size=25341077 total_size=19741252163453
size=17263891
17263891
count = 608759 size=17263891 total_size=19741269427344
19741269427344 608759
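
The same total and count could presumably also be obtained with a one-liner over the saved size list (assuming dc_02_rest_size.txt from the find command above):

perl -ne 'if (/size=(\d+)/) { $total += $1; $count++ } END { print "$total $count\n" }' dc_02_rest_size.txt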

GlueX Meeting Report, August 19, 2015

Offline

  • Notification from GitHub only on pull requests.
  • Work on Geant4 continues
    • particle gun, EM background, genr8, bggen implemented
    • work on hits next
  • build_scripts moved to GitHub

DC3

  • jobs
    • 10 GB files
    • 430 k events (limited by hdgeant filesize limit)
  • simulation
    • 5000 jobs total
    • 2732 jobs done
  • reconstruction
    • 2691 jobs done

Spring Commissioning Simulations

  • all 30,600 jobs run and the hddm files put on the silo and available on cache
  • 150,300 REST files created and available on cache (except 3).

Notes on DC3 simulation

Some notes on timing and output size from 1000-event test jobs:

  • 1000 events, CentOS 6.2
    • 13:18 (798 s) cpu time
    • output file: 26.4 MB
    • => 26.4 kB/ev
    • => 0.798 s/ev including reconstruction
    • 20 GB => 8.77 days
    • 5 GB => 2.19 days = 3150 minutes and 189,394 ev
  • CentOS 6.5
    • cput=00:07:47,mem=652424kb,vmem=1719588kb,walltime=00:07:47
    • cpu = 7.78 minutes = 467 s
    • events = 1000
    • output file: 26,370,224 bytes
    • extrapolate to 48 hour job = 2880 minutes (see the sketch after this list)
      • 370,000 events
      • output file = 9.77 GB
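
The extrapolation above appears to be a straight linear scaling of the 1000-event CentOS 6.5 test job; a minimal sketch of that arithmetic:

#!/usr/bin/env perl
# Linear scaling of the 1000-event CentOS 6.5 test job to a 48-hour job.
use strict;
use warnings;

my $cpu_seconds = 467;          # 00:07:47 of cpu time for 1000 events
my $test_events = 1000;
my $test_bytes  = 26_370_224;   # output file size for 1000 events
my $job_seconds = 48 * 3600;    # 48-hour job

my $events = $test_events * $job_seconds / $cpu_seconds;
my $bytes  = $test_bytes * $events / $test_events;

printf "events      = %.0f\n", $events;           # ~370,000
printf "output file = %.2f GB\n", $bytes / 1.0e9; # ~9.8 GB, close to the 9.77 GB quoted above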

Running another 10 jobs of 1000 events each.