notes on sim 1.2.1

cput = 35:56
number of events = 5000
=> 2.3 Hz

cput = 7:10
number of events = 1000
=> 2.3 Hz

for 24 hours => 200k events, 1600 minutes (10% margin)
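
The arithmetic behind these numbers can be checked with a short sketch. It assumes the cput values are in minutes:seconds (which is what reproduces the quoted 2.3 Hz); the function names are only illustrative.

```python
def cpu_seconds(cput: str) -> int:
    """Convert a 'MM:SS' (or 'HH:MM:SS') CPU time string to seconds."""
    seconds = 0
    for part in cput.split(":"):
        seconds = 60 * seconds + int(part)
    return seconds


def event_rate(cput: str, n_events: int) -> float:
    """Events per CPU second (Hz)."""
    return n_events / cpu_seconds(cput)


# The two test jobs from the notes (assumed MM:SS).
rate_5000 = event_rate("35:56", 5000)
rate_1000 = event_rate("7:10", 1000)

# Events that fit in 24 hours of CPU time at the measured rate.
events_per_day = rate_5000 * 24 * 3600

# Time request in minutes: 24 hours plus a 10% margin.
minutes_with_margin = 24 * 60 * 1.10

print(f"rate (5000 events): {rate_5000:.2f} Hz")
print(f"rate (1000 events): {rate_1000:.2f} Hz")
print(f"events in 24 h:     {events_per_day:.0f}")
print(f"minutes with margin: {minutes_with_margin:.0f}")
```

Both jobs give about 2.3 Hz, so roughly 200k events fit in 24 hours, and 24 hours plus 10% is 1584 minutes, rounded up to 1600 in the note above.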

populated at 3% in populate_3percent.sh
start with first 10 lines only as a test

 


Replacing sim1_1 file in tape library

Tried to put the new file:

ifarm1102:gxproj4:sim1_1> jproj.pl sim1_1 jput
jproj.pl jput: command = cd /volatile/halld/gluex_simulations/sim1_1/rest ; jput *011454_0026* /mss/halld/gluex_simulations/sim1_1/rest/
FATAL Bad Request - A stub file already exists at /mss/halld/gluex_simulations/sim1_1/rest/dana_rest_011454_0026.hddm

jput 1 files

ls on the old file:

> ls -l /cache/halld/gluex_simulations/sim1_1/rest/dana_rest_011454_0026.hddm
-r--r--r-- 1 gxproj4 halld-1 7340032 Nov 19 18:15 /cache/halld/gluex_simulations/sim1_1/rest/dana_rest_011454_0026.hddm

did a:

jcache tapeRemove /cache/halld/gluex_simulations/sim1_1/rest/dana_rest_011454_0026.hddm

deleted /cache/halld/gluex_simulations/sim1_1/rest/dana_rest_011454_0026.hddm

redid the jput on the new file in volatile:

jproj.pl sim1_1 jput
jproj.pl jput: command = cd /volatile/halld/gluex_simulations/sim1_1/rest ; jput *011454_0026* /mss/halld/gluex_simulations/sim1_1/rest/
66987735
154867173 Staging        /lustre/expphy/volatile/halld/gluex_simulations/sim1_1/rest/dana_rest_011454_0026.hddm   /mss/halld/gluex_simulations/sim1_1/rest/dana_rest_011454_0026.hddm
154867173 Staged         /lustre/expphy/volatile/halld/gluex_simulations/sim1_1/rest/dana_rest_011454_0026.hddm   /mss/halld/gluex_simulations/sim1_1/rest/dana_rest_011454_0026.hddm
154867173 Running        /lustre/expphy/volatile/halld/gluex_simulations/sim1_1/rest/dana_rest_011454_0026.hddm   /mss/halld/gluex_simulations/sim1_1/rest/dana_rest_011454_0026.hddm
154867173 Done           /lustre/expphy/volatile/halld/gluex_simulations/sim1_1/rest/dana_rest_011454_0026.hddm   /mss/halld/gluex_simulations/sim1_1/rest/dana_rest_011454_0026.hddm
jput 1 files

new stub file is:

> cat /mss/halld/gluex_simulations/sim1_1/rest/dana_rest_011454_0026.hddm
bitfileIndex=73060284
sourcePath=/lustre/expphy/volatile/halld/gluex_simulations/sim1_1/rest/dana_rest_011454_0026.hddm
size=106079719
crc32=5f1f3d98
md5=48c3b26a3f8e121e33354c9fb947adb5
owner=gxproj4
creationTime=2016-11-19 18:41:14
bfid=57543676
volser=601330
filePosition=577781
volumeSet=d-data-challenge
stubPath=/mss/halld/gluex_simulations/sim1_1/rest/dana_rest_011454_0026.hddm
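
A quick way to confirm that the stub now describes the new file is to parse its key=value lines and compare the size with the old cache copy (7340032 bytes in the ls output above). This is a minimal sketch; parse_stub is a hypothetical helper, and the stub text is pasted from the cat output rather than read from /mss.

```python
def parse_stub(text: str) -> dict:
    """Parse 'key=value' lines from a stub file into a dict of strings."""
    fields = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or "=" not in line:
            continue  # skip blank or malformed lines
        key, value = line.split("=", 1)
        fields[key] = value
    return fields


# Subset of the stub contents, copied from the cat output in these notes.
STUB_TEXT = """\
bitfileIndex=73060284
size=106079719
crc32=5f1f3d98
md5=48c3b26a3f8e121e33354c9fb947adb5
owner=gxproj4
creationTime=2016-11-19 18:41:14
"""

OLD_CACHE_SIZE = 7340032  # size of the old file, from the ls -l output

stub = parse_stub(STUB_TEXT)
new_size = int(stub["size"])

print("stub size:", new_size)
print("differs from old cache copy:", new_size != OLD_CACHE_SIZE)
print("stub created:", stub["creationTime"])
```

The stub size differs from the old cache copy and its creation time is after the old file's 18:15 timestamp, which is consistent with the stub pointing at the replacement file.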

jcache status gives:

Sun Nov 20 20:56:05 EST 2016
iteration = 89
get request: 4765609
user: gxproj4
status: pending
/cache/halld/gluex_simulations/sim1_1/rest/dana_rest_011454_0026.hddm -> failed

sim1.1 problem jobs

Questions

  • why no printout of fort.15?
  • why no printout of control.in?
  • why does random number seed not make a difference in the jobs?

Answers

  • all three jobs for run 10871, file 8 bomb with seg fault from bggen
  • same for all three for same run, file 7
  • for file 7 and 8, hdgeant reports “tracking abandoned” first on the same event, 2034
  • look at a successful run: 10391, files 1-34
    • file 1: tracking abandoned on event 11366
    • file 2: 4114
    • file 3: 20588
    • fort.15 gets typed out
    • control.in gets typed out
  • scan .out files for event where tracking abandoned
  • The 167 failed jobs are as follows:
    • run 10727, files 1-3
    • run 10777, files 1-57
    • run 10871, files 1-49
    • run 11140, files 1-58
    • these are all of the files in each run
  • Failure mode: hdgeant runs forever when it reads the output of a bggen job that seg faulted. Evidence: extremely large event numbers in the output.
  • Running test on run 10727, collimator comes back as “Unknown”
  • Collimator is Unknown in each of these runs.

From standard error for one failed job:

cp: cannot stat `run.ffr.Unknown_coll.template': No such file or directory
cp: cannot stat `run.ffr': No such file or directory
cp: cannot stat `control.in_Unknown_coll': No such file or directory
control.in: No such file or directory.
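
The failure could be caught before any files are copied by refusing to build a template name from a missing collimator value. The sketch below is an assumption about how the job script might guard this: the name pattern is inferred from the error messages above, ffr_template_name is a hypothetical helper, and whether real templates are named after values like 3.4mm is not confirmed by these notes.

```python
def ffr_template_name(collimator: str) -> str:
    """Build the run.ffr template name for a collimator setting.

    Raises ValueError for an empty or 'Unknown' value so the job fails
    immediately instead of running with missing input files.
    """
    value = collimator.strip()
    if not value or value == "Unknown":
        raise ValueError(f"no usable collimator value: {collimator!r}")
    # Pattern inferred from 'run.ffr.Unknown_coll.template' in the stderr output.
    return f"run.ffr.{value}_coll.template"


for candidate in ["3.4mm", "Unknown", ""]:
    try:
        print(repr(candidate), "->", ffr_template_name(candidate))
    except ValueError as err:
        print(repr(candidate), "-> rejected:", err)
```

With a check like this, the four runs with an Unknown collimator would be rejected up front rather than producing 167 jobs that run without a control.in.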

Sim1.1 Notes 2

  • hd_root was bombing at the beginning. Output files were stubs.
  • Sean pointed this out.
  • Error: calling terminate after throwing an exception of type long int (or something like that)
    • Cured by removing the TS_scaler plugin
  • Next error: seg fault
    • Cured by removing the L1_online plugin
  • Paul recommends removal of the occupancy_online and EPICS_dump plugins, the latter since there are no EPICS events in MC.
  • most jobs are crashing with seg faults
  • 010438_0001 crashed quickly, rest file has 355 events
    • gets past that part when run interactively
  • standard err mentions PSPair_online a lot. Drop that plugin.
  • drop the timeout option
  • drop the skim plugin
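
The effect of these removals on the plugin list from Sean's email (recorded in the Sim1.1 Notes below) can be sketched as follows. Two assumptions: that "the skim plugin" means track_skimmer, and that Paul's recommended removals were actually applied; neither is confirmed in these notes.

```python
# Plugin list from Sean's email of 6/14, as recorded in these notes.
ORIGINAL_PLUGINS = (
    "danarest,monitoring_hists,track_skimmer,occupancy_online,EPICS_dump,"
    "TS_scaler,TRIG_online,L1_online,PSPair_online,BCAL_inv_mass,FCAL_invmass,"
    "BCAL_Hadronic_Eff,CDC_Efficiency,FCAL_Hadronic_Eff,FDC_Efficiency,"
    "SC_Eff,TOF_Eff"
)

# Plugins dropped in the notes above.
DROPPED = {
    "TS_scaler",         # exception at startup
    "L1_online",         # seg fault
    "occupancy_online",  # Paul's recommendation
    "EPICS_dump",        # no EPICS events in MC
    "PSPair_online",     # noisy in standard error
    "track_skimmer",     # ASSUMPTION: this is "the skim plugin"
}


def drop_plugins(plugin_list: str, dropped: set) -> str:
    """Return the comma-separated list with the dropped plugins removed."""
    kept = [name for name in plugin_list.split(",") if name not in dropped]
    return ",".join(kept)


remaining = drop_plugins(ORIGINAL_PLUGINS, DROPPED)
print(remaining)
```

Under those assumptions, 11 of the original 17 plugins remain, in their original order.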

Sim1.1 Notes

  • there is a sim1.1 branch in the repository, Sean says to ignore it
  • example of awk to get collimator
    > rcnd 11555 collimator_diameter | awk '{print $1}'
    3.4mm
    > rcnd 1 collimator_diameter | awk '{print $1}'
    >
    
  • On the value of the random number seed for bggen, from http://goo.gl/YWLymG:

    No initialization is necessary if the user wants default values. Otherwise the following are available:
    CALL RLUXGO(LUX,INT,K1,K2)
    When K1=K2=0, this call initializes the RANLUX generator from one 32-bit integer INT and sets the Luxury Level.

    Assuming a signed 32-bit integer, the maximum positive value is 2147483647, so we need to keep the seed at or below this. A 5-digit run number concatenated with a 4-digit file number does this (at most 999999999).

  • from Sean, 6/29/15: By the way, here is a list of the “golden runs” and the number of triggers in each of them: /work/halld/home/gxproj3/rp2016-02-runs_events
  • from Sean, list of plugins, email of 6/14:
    danarest,monitoring_hists,track_skimmer,occupancy_online,EPICS_dump,TS_scaler,TRIG_online,L1_online,PSPair_online,BCAL_inv_mass,FCAL_invmass,BCAL_Hadronic_Eff,CDC_Efficiency,FCAL_Hadronic_Eff,FDC_Efficiency,SC_Eff,TOF_Eff
    
  • from Sean, email of 6/15, additional option to hd_root:
    -PTRKFIT:HYPOTHESES=2,3,8,9,11,12,14
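
The bggen seed scheme described in the list above (5-digit run concatenated with a 4-digit file number) can be sketched as below. This assumes "concatenated" means the file number is zero-padded to four digits; bggen_seed is a hypothetical name, not a function from the simulation scripts.

```python
INT32_MAX = 2**31 - 1  # 2147483647, largest signed 32-bit integer


def bggen_seed(run: int, file_number: int) -> int:
    """Seed = run number followed by the zero-padded 4-digit file number."""
    if not 0 <= run <= 99999:
        raise ValueError(f"run must fit in 5 digits: {run}")
    if not 0 <= file_number <= 9999:
        raise ValueError(f"file number must fit in 4 digits: {file_number}")
    seed = run * 10000 + file_number
    # Always true given the range checks; kept as a guard on the scheme.
    assert seed <= INT32_MAX
    return seed


print(bggen_seed(11454, 26))    # run 011454, file 0026
print(bggen_seed(99999, 9999))  # largest seed the scheme can produce
print(INT32_MAX)
```

The largest possible seed is 999999999, comfortably below the signed 32-bit limit, and each run/file pair gets a distinct seed.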
    

GlueX Meeting Report, August 19, 2015

Offline

  • Notification from GitHub only on pull requests.
  • Work on Geant4 continues
    • particle gun, EM background, genr8, bggen implemented
    • work on hits next
  • build_scripts moved to GitHub

DC3

  • jobs
    • 10 GB files
    • 430 k events (limited by hdgeant filesize limit)
  • simulation
    • 5000 jobs total
    • 2732 jobs done
  • reconstruction
    • 2691 jobs done

Spring Commissioning Simulations:

  • all 30600 jobs run and hddm put on the silo and available on cache
  • 150300 rest files created and available on cache (except 3).