Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

run-pandora Install and Run Script Extensions for NDLAr Production #69

Merged
merged 6 commits into from
Jan 25, 2025

Conversation

alexbooth92
Copy link
Member

@alexbooth92 alexbooth92 commented Jan 22, 2025

This PR extends install_pandora.sh and setup_pandora.sh so that run-pandora can handle both 2x2 and NDLAr jobs in a more automated way that's consistent with other steps in the production chain. It also addresses a couple of suggestions from @jback08. These changes have been tested with small scale interactive tests.

  • install_pandora.sh calls setup_pandora.sh which is detector dependent, the install is therefore detector dependent. Since setup_pandora.sh is also used in the run_* scripts, the flexibility has been added to pull geometry information from environment via, for example, ARCUBE_GEOM in a way that is consistent with how that variable is used in other stages of production. If you are installing and running 2x2, all changes are completely transparent except having to pass 2x2 to the install script. For NDLAr need to pass ndlar to the install script and specify ARCUBE_GEOM and ARCUBE_PANDORA_GEOM as explained in the install script.
  • Change what's passed to -r when running pandora from AllHitsNu to AllHitsSliceNu.
  • Bump LArRecoND to v01-01-03, already been used by 2x2 production but not changed here.

…, install is therefore detector dependent. setup_pandora.sh is also used int the run_* scripts so added flexibility to pull geometry from environment via, for example ARCUBE_GEOM, which is consistent with how that variable is used in other stages of production. If you are installing and running 2x2, all changes are completely transparent except having to pass '2x2' to the install script. For NDLAr need to pass ndlar to the install script and specify ARCUBE_GEOM and ARCUBE_PANDORA_GEOM as explained in the install script.
…ssue we had with a seg fault that could occur when the CaloHit IDs were being set).
@alexbooth92 alexbooth92 marked this pull request as ready for review January 22, 2025 16:12
@alexbooth92 alexbooth92 requested review from jback08 and noeroy January 22, 2025 16:13
Copy link
Contributor

@noeroy noeroy left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for putting that together, and sorry I haven't been more thorough to push my modifications as soon as I had to make them.

While we're looking at additions to run-pandora, you probably want to add export OMP_NUM_THREADS=1 in the setup_pandora.sh script, otherwise flow2root eats too much memory.

run-pandora/setup_pandora.sh Outdated Show resolved Hide resolved
@alexbooth92
Copy link
Member Author

Thanks for putting that together, and sorry I haven't been more thorough to push my modifications as soon as I had to make them.

While we're looking at additions to run-pandora, you probably want to add export OMP_NUM_THREADS=1 in the setup_pandora.sh script, otherwise flow2root eats too much memory.

No worries - things move fast! On the export OMP_NUM_THREADS=1 do we want this for the "actual" pandora too or just the file conversion step?

@alexbooth92
Copy link
Member Author

Thanks for putting that together, and sorry I haven't been more thorough to push my modifications as soon as I had to make them.
While we're looking at additions to run-pandora, you probably want to add export OMP_NUM_THREADS=1 in the setup_pandora.sh script, otherwise flow2root eats too much memory.

No worries - things move fast! On the export OMP_NUM_THREADS=1 do we want this for the "actual" pandora too or just the file conversion step?

I added it as a configurable export in 8093e67

@jback08
Copy link
Member

jback08 commented Jan 24, 2025

These changes look OK. I have a few other suggestions that we could try to add to this PR or leave for another future PR:

  1. In run_flow2root.sh, the 2nd from last integer parameter 0 for the command

python3 $ARCUBE_PANDORA_INSTALL/LArRecoND/ndlarflow/h5_to_root_ndlarflow.py $inFile 0 $tmpOutDir

needs to be changed to 1 whenever we have data and not MC. So I would like to propose that we have an isData boolean here equal to 0 or 1 depending if the ARCUBE_PANDORA_INPUT_FORMAT environment variable in setup_pandora.sh is set to either SPMC or SP (data), respectively.

  1. I understand from Bruce Howard that the h5_to_root_ndlarflow.py script we use in run_flow2root.sh needs to use different neutrino vertex variable names depending on the detector setup:

MiniRun6 2x2: nuVertexX.append(vtx["vertex"][0]
NDLAr version: nuVertexX.append(vtx["x_vert"])

and similarly for y and z. Should we handle this by using a 2x2 or ndlar parameter in the h5_to_root_ndlarflow.py script which then decides what these vertex variable names should be? If so, we would also need to pass this extra parameter when this script is called in run_flow2root.sh after the LArRecoND script has been updated. Will these vertex variables be unified to be the same for 2x2 and ND-LAr at some point in the future?

  1. In addition to the output file tmpAnaOut=${tmpRunDir}/LArRecoND.root that is moved at the end of the run_pandora.sh script, which is used to create the CAFs, it will also be useful to move the MCHierarchy.root and EventHierarchy.root output files that are made with LArRecoND version v01-01-04, which will allow us to find MC reconstruction efficiencies and other performance plots independent of the CAFs. But this will almost double the output file directory size. So this is a "nice to have" request that is not necessary for the CAF maker. The LArRecoND.root files store the reco PFOs along with any best matched MC (if any), while the other files store the MC particles along with any matched reco PFOs (best to worse), so we know the efficiency denominators.

@alexbooth92
Copy link
Member Author

These changes look OK. I have a few other suggestions that we could try to add to this PR or leave for another future PR:

  1. In run_flow2root.sh, the 2nd from last integer parameter 0 for the command

python3 $ARCUBE_PANDORA_INSTALL/LArRecoND/ndlarflow/h5_to_root_ndlarflow.py $inFile 0 $tmpOutDir

needs to be changed to 1 whenever we have data and not MC. So I would like to propose that we have an isData boolean here equal to 0 or 1 depending if the ARCUBE_PANDORA_INPUT_FORMAT environment variable in setup_pandora.sh is set to either SPMC or SP (data), respectively.

  1. I understand from Bruce Howard that the h5_to_root_ndlarflow.py script we use in run_flow2root.sh needs to use different neutrino vertex variable names depending on the detector setup:

MiniRun6 2x2: nuVertexX.append(vtx["vertex"][0] NDLAr version: nuVertexX.append(vtx["x_vert"])

and similarly for y and z. Should we handle this by using a 2x2 or ndlar parameter in the h5_to_root_ndlarflow.py script which then decides what these vertex variable names should be? If so, we would also need to pass this extra parameter when this script is called in run_flow2root.sh after the LArRecoND script has been updated. Will these vertex variables be unified to be the same for 2x2 and ND-LAr at some point in the future?

  1. In addition to the output file tmpAnaOut=${tmpRunDir}/LArRecoND.root that is moved at the end of the run_pandora.sh script, which is used to create the CAFs, it will also be useful to move the MCHierarchy.root and EventHierarchy.root output files that are made with LArRecoND version v01-01-04, which will allow us to find MC reconstruction efficiencies and other performance plots independent of the CAFs. But this will almost double the output file directory size. So this is a "nice to have" request that is not necessary for the CAF maker. The LArRecoND.root files store the reco PFOs along with any best matched MC (if any), while the other files store the MC particles along with any matched reco PFOs (best to worse), so we know the efficiency denominators.

Thank you the detailed look @jback08! In the interest of getting this particular PR merged, I suggest that we make an issue for your suggestions 1 and 3 - when I have 30 minutes next week, I'd be happy to tackle them. With regards to number 2 - I am ashamed to say that I haven't found out where this difference in naming for the vertex variables is coming from, however in my on-going work to roll forward the NDLAr version of the software stack from "Run 6" like to present day latest-and-greatest, the NDLAr naming is now consistent with 2x2. I strongly suspect that it's just a single commit somewhere that I missed during cherry-picking back in September. My vote would be to live with it for now since it will soon become a non-issue.

@alexbooth92 alexbooth92 merged commit a59d136 into develop Jan 25, 2025
@alexbooth92 alexbooth92 deleted the feature/abooth-pandora_for_nd_prod branch January 25, 2025 10:57
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants