Two step gap clustering #596

Merged
merged 15 commits into from
Aug 3, 2017

Conversation

feigaodm
Member

In order not to split discrete small S2s from deep TPC signals, we designed a clustering algorithm that merges any nearby hits. When S1s are close to S2 signals, as in events at the top of the TPC, or when S1s are accompanied by PMT afterpulses and/or gate photo-ionisation S2s, there is a high chance that pax will merge those signals into a larger peak and miss the S1s we are interested in.

This PR aims to solve this problem by applying the gap size clustering twice:
1, We first apply a tight gap size clustering and only merge hits with a gap size smaller than 100 ns (cut value tuned to separate S1s from gate photo-ionisation while keeping single-electron S2s unsplit). We try to identify the S1-like signals and mark them differently. In the same way, we isolate lone hits and small coincidence signals. All other signals are considered S2 candidates.
2, For the remaining S2 candidate peaks, we apply a large gap size clustering and merge all nearby S2s, so the treatment of deep S2 signals remains unchanged.
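
The two steps above can be sketched roughly as follows. This is a minimal illustration, not the code in this PR: the `Hit` tuple layout, the `short_duration` stand-in for the S1-like check, and the 2000 ns loose gap are all assumptions made for the example. Only the 100 ns tight gap and the "two hits or fewer" rule come from the description and diff.

```python
from typing import Callable, List, Tuple

# Hypothetical hit layout for this sketch: (left_ns, right_ns, channel)
Hit = Tuple[int, int, int]


def cluster_by_gap(hits: List[Hit], max_gap_ns: float) -> List[List[Hit]]:
    """Merge hits whose gap to the running right edge is smaller than max_gap_ns."""
    clusters: List[List[Hit]] = []
    right_edge = None
    for hit in sorted(hits):
        if right_edge is None or hit[0] - right_edge >= max_gap_ns:
            clusters.append([hit])          # gap too large: start a new cluster
            right_edge = hit[1]
        else:
            clusters[-1].append(hit)        # close enough: extend current cluster
            right_edge = max(right_edge, hit[1])
    return clusters


def short_duration(cluster: List[Hit], max_ns: int = 100) -> bool:
    """Toy stand-in for an S1-like check (assumption: S1-like means short)."""
    left = min(h[0] for h in cluster)
    right = max(h[1] for h in cluster)
    return right - left < max_ns


def two_step_gap_clustering(
    hits: List[Hit],
    looks_like_s1: Callable[[List[Hit]], bool] = short_duration,
    tight_gap_ns: float = 100,
    loose_gap_ns: float = 2000,
):
    """Step 1: tight clustering, set aside small and S1-like clusters.
    Step 2: loose clustering on the hits of the remaining S2 candidates."""
    isolated, s2_hits = [], []
    for cluster in cluster_by_gap(hits, tight_gap_ns):
        if len(cluster) <= 2 or looks_like_s1(cluster):
            isolated.append(cluster)
        else:
            s2_hits.extend(cluster)
    return isolated, cluster_by_gap(s2_hits, loose_gap_ns)


# Made-up example: an S1-like burst, two S2 fragments, and a lone hit.
example_hits = [
    (0, 40, 0), (10, 50, 1), (20, 60, 2),
    (1000, 1150, 0), (1100, 1250, 1), (1200, 1400, 2),
    (1700, 1850, 3), (1800, 1950, 4), (1900, 2100, 5),
    (5000, 5010, 7),
]
isolated, s2_peaks = two_step_gap_clustering(example_hits)
```

With a single loose pass, the S1-like burst in this toy example would be absorbed into the first S2 cluster; the tight first pass is what keeps it apart.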

Here is one example WF to validate this change.

1, S1 and photo-ionisation signals can be separated clearly.
[image: s1_photo-ionisation]
2, Deep S2s are unaffected.
[image: screen shot 2017-07-14 at 11 48 18 am]

We can probably tune the parameter to improve the performance further, so we'd like to merge it to pax_head for more checks.

Credit to (Joey, Jelle and Fei)

@feigaodm feigaodm requested a review from JelleAalbers July 14, 2017 15:49
s1_mask[l_i:r_i] = True
event.peaks.append(self.build_peak(hits=h, detector=detector))

elif len(h) <= 2:
Contributor

Good idea to add this. You can save a bit of space by just changing the previous if (i.e. treating lone hits as "s1 candidates"); then you don't need the lone_mask either.

@JelleAalbers
Contributor

JelleAalbers commented Jul 14, 2017

Nice, thanks for taking this up. Interesting that it finds an 'unknown' in the middle of an S2... I guess this looked sufficiently like an S1 to be split off. I suppose you'd want to run a few tests with peakfindertest before merging to quantify the performance (especially to show quantitatively that deep S2s are still ok). Maybe @jhowl01?

Also it might be worth checking it doesn't do weird things at very high energy, where S2s have very long 'single hits' in each channel.

@feigaodm
Member Author

Good point about the unknown peak in the middle of the S2s. I checked it in detail: the first clustering's gap size is a little too small, so two hits from single-electron S2s get split off from it. I guess I have to tune this parameter further.

screen shot 2017-07-14 at 12 16 01 pm

Not sure how to check high energy S2s yet, will simulate such WFs and see what happens:)

@JosephJHowlett
Contributor

@JelleAalbers @feigaodm I'm away until Tuesday and can run the pftest then, unless this is needed sooner

@feigaodm was this unknown called an "s2 candidate" at the classification stage? If so, shouldn't it already be merged on the second iteration? It doesn't look like it has a fast rise time to me, but maybe I'm not understanding.

@feigaodm
Member Author

In the update I also rejected lone hits and peaks with two-fold coincidence in the S2 clustering, so splitting a single-electron S2 leaves part of the signal unknown. I fixed this by setting s1_gap_size=150ns, so single electrons are not split in that WF.

@feigaodm feigaodm added this to the v6.7.0 milestone Jul 17, 2017
@feigaodm
Member Author

feigaodm commented Jul 17, 2017

This should fix Issue (#594, #540, #542, #543, #544, #545)

@feigaodm
Member Author

feigaodm commented Jul 17, 2017

One of the CU summer students (Malcolm Wells) here checked the effect on S1 peak finding efficiency using @JelleAalbers's framework. The new twostepgap + classification/clustering has no observable impact on S1 efficiencies. If anything, the efficiency improves slightly, because the tighter clustering lowers the probability of an S1 being mis-identified as a single-electron S2. Here are the plots:

current_pax_results
2step_pax_results
current_pax_results-3

Also, in the update I only isolated small peaks after the first clustering, so it shouldn't affect high energy S2 signals at all. Any suggestions to test it before merging it?

@JelleAalbers
Contributor

Nice! But the main test is to quantify what happens to low-energy deep S2s, things like SmallS2s, SmallDeepS2s or TenElectronS2s (or a similar test) in PeakfinderTest.

@JosephJHowlett
Contributor

I finally ran a couple of the tests @JelleAalbers suggested with PeakFinderTest. Here I'll report the numerical outcomes where the efficiency distributions were flat.

SmallS2s

I limited this test to simulating single electrons. The master branch had two categories - those single electron S2s found as a single peak and those split to multiple S2s. Twostepgap did better in this regard, but many cases occurred where one or two hits were stripped off the end of the S2, and called a lone_hit or unknown. This is occurring because the first GapSizeClustering iteration is chopping up the S2, and only S2-like segments are considered for the second iteration. Here I show this effect as a function of the first gap size parameter.
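
A toy illustration of the chopping effect described above, under stated assumptions: hits are reduced to single made-up timestamps (no widths or channels), and groups with fewer than three hits are treated as lone_hit/unknown, mirroring the `len(h) <= 2` branch in the diff. It is not the PeakFinderTest code, and the input is assumed to be non-empty.

```python
def split_by_gap(hit_times_ns, max_gap_ns):
    """Split sorted hit times into groups; a gap smaller than max_gap_ns merges."""
    groups = [[hit_times_ns[0]]]
    for prev, cur in zip(hit_times_ns, hit_times_ns[1:]):
        if cur - prev >= max_gap_ns:
            groups.append([cur])
        else:
            groups[-1].append(cur)
    return groups


def chopped_hits(hit_times_ns, first_gap_ns, min_hits=3):
    """Count hits that end up in groups too small to be S2 candidates."""
    groups = split_by_gap(sorted(hit_times_ns), first_gap_ns)
    return sum(len(g) for g in groups if len(g) < min_hits)


# Toy single-electron S2: a dense core plus two trailing hits (made-up times, ns).
toy_se = [0, 40, 90, 130, 200, 260, 330, 450, 680]

for gap in (100, 150, 200, 250):
    print(gap, chopped_hits(toy_se, gap))
```

In this toy case a larger first gap size strips fewer trailing hits, which is the qualitative trend reported here; the real fractions depend on the simulated hit-time distribution.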

fraction_merged_as_par_varied

SmallDeepS2s

Here I used the instructions as Jelle left them, simulating 1-100 electrons at a time, 95cm below the gate. Again, a lot of chopping unknowns and lone_hits off the end of signals occurred, but in terms of splitting the single S2 into multiple S2s, twostepgap did a bit better than pax. The question is whether we're OK with splitting off these 1-2 hit peaks.

I also found one event out of 2500 in which an S1 was split off from the signal. This was fixed by increasing the first max gap size to 250 ns from 200 ns.

| Outcome (% of events) | Master branch | twostepgap | twostepgap (250 ns) |
| --- | --- | --- | --- |
| found | 98.64 | 82.68 | 89.56 |
| split to s2s | 1.36 | 0.76 | 0.80 |
| split and misid as s1 | 0 | 0.04 | 0 |
| chopped to unknown/lone_hit | 0 | 16.52 | 9.64 |

S1S2Close

I also checked on merging of S1s and S2s at low drift times using PeakFinderTest. As expected, twostepgap improves on the current pax clustering at resolving these close-together peaks, although when the simulation is run with afterpulses on this improvement is less than expected. I'm looking into this - plots forthcoming.

@sanderbreur
Contributor

sanderbreur commented Jul 22, 2017 via email

@JelleAalbers
Contributor

Thanks for the tests Joey, looking forward to the plots. Is 'twostepgap' here without running the natural breaks & local minimum algorithms? I think we want them all on in the end, they have different specializations. For example local minimum is best for splitting high-energy S2s, and naturalbreaks should be good at close S1s.

@sanderbreur Perhaps it's a bit confusing, but 'chopped' means that the peak was found and correctly classified, but some part of it got chopped off and labeled unknown or lone hit. So it would mean introducing some more S2 nonlinearity. How serious that is depends on how large a part is split off.

@JosephJHowlett
Contributor

Hey guys,

@JelleAalbers yes, all plugins are run in the same way. "twostepgap" here just means switching between the master branch and the branch in this pull request, which only modifies the GapSizeClustering plugin and keeps all others.

@sanderbreur Thanks Jelle for clarifying, and I just want to add that in all cases Fei and I looked at, the "chopped" part was just 1-2 hits at the end of the peak. Still 10% is a big number (I think we're leaning toward 250ns over 200ns).

@feigaodm
Member Author

Hi @JelleAalbers and @sanderbreur ,

Thanks for the feedback! As @jhowl01 explained, the two step clustering is not as bad as the numbers suggest. In the worst scenario, we will have a few more S1s identified from single electron S2s. But please note that this is on the order of ~1 out of 2000, probably smaller than the mis-identification of single electron S2s, so I don't think it's a big problem at all.

The reason the new clustering performs better at merging deep S2s is that we can now make the gap size for S2-like signals larger (2.5 us instead of 2 us). All other procedures, such as the other clustering algorithms, are unchanged, so I don't expect any change in high energy calibrations.

If we can separate PMT afterpulses from the S1s using this method, I suspect that the energy resolution will improve a little (maybe the ER band width as well). But we have to check the effect on calibration data.

Once this PR is merged, we can process some calibration (Rn220) and background data to see whether it helps. And I hope we can see some more top populations in the det.

@JosephJHowlett
Contributor

Hi all,

Basically I re-ran the simulation of 4000 S1s and S2s down to 1cm below the gate, with areas according to the ER band mean, below 2000pe in cS1. I did this with and without PMT afterpulses and photo-ionization, for both the master and the twostepgap branches. Shown are the fractions of S1s having each outcome after processing. The relevant quantity to merging of S1s and S2s is the percentage of S1s merged with another peak and classified as S2, which is shown in the figures.

merging_fraction_ap_off

merging_fraction_ap_on

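| Outcome (% of S1s) | Master (APs off) | twostepgap (APs off) | Master (APs on) | twostepgap (APs on) |
| --- | --- | --- | --- | --- |
| Found | 98.80 | 98.58 | 71.25 | 77.05 |
| Merged to S2 | 0.70 | 0.93 | 18.05 | 12.80 |
| Mis-Ided as S2 | 0.50 | 0.42 | 9.53 | 9.03 |
| Un-Classified | 0.0 | 0.0 | 1.15 | 1.07 |
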
  • There is definitely some improvement in merging, although maybe less than we would expect. An example waveform where twostepgap still fails to resolve the peaks is shown below.

  • It also seems clear that this merging (as well as mis-identification) depends on the inclusion of afterpulses. The waveform below suggests this may be due to gate photo-ionization.

screen shot 2017-07-24 at 10 02 31 am

@sanderbreur
Contributor

sanderbreur commented Jul 25, 2017 via email

@JosephJHowlett
Contributor

Hi all,

Fei and I were looking deeper into why twostepgap wasn't vastly improving the S1-S2 resolution, and found that the initial clustering was resolving these S1s, but they were often mis-classified as S2 candidates and subsequently merged with the main S2. Fei fixed this by updating the classification step to be identical to pax's usual classification (see most recent commit). See results below:

Updated S1S2Close Comparison

corrected_branch_comparison

  • The S1s and S2s are now almost always resolved. This should give us some new top events, or at the least kill our best guess at where they went.

| Outcome (% of S1s) | Master (APs on) | twostepgap (APs on) |
| --- | --- | --- |
| Found | 71.25 | 88.85 |
| Merged to S2 | 18.05 | 0.15 |
| Mis-Ided as S2 | 9.53 | 9.93 |
| Un-Classified | 1.15 | 1.05 |

@sanderbreur
Contributor

sanderbreur commented Jul 25, 2017 via email

@JosephJHowlett
Contributor

@sanderbreur you asked the right question, I just reprocessed the SmallDeepS2s data and it looks good:

| Outcome (% of events) | Master branch | twostepgap |
| --- | --- | --- |
| found | 98.64 | 82.24 |
| split to s2s | 1.36 | 1.04 |
| split and misid as s1 | 0 | 0.04 |
| chopped to unknown/lone_hit | 0 | 16.68 |

I think this is expected, since most of the deep-s2 fragments after the tight clustering have a small rise time and are merged by the second, loose clustering. Those with a short rise are usually lone hits (and 1-2% are two-fold coincidences).

@JelleAalbers
Contributor

JelleAalbers commented Jul 25, 2017

That looks like a nice improvement in the merging, but you might have opened a can of worms here...

  • If BuildPeaks is after RejectNoiseHits, the latter doesn't do anything anymore (as it relies on a rough clustering being already applied), so you will have lost noisy channel mitigation. I think you also get into trouble if you would use the order [SumWaveform, BuildPeaks, RejectNoiseHits], I don't remember exactly why (maybe a rare crash case, or some properties end up inconsistent, but there's a comment in XENON1T.ini saying you can't do it :-), maybe there is a more detailed comment somewhere else).
  • There is now quite a bit of duplication of code between the classification / peak properties computation plugins and the clustering plugin -- though I see you've tried to import where you could -- which would make it harder to maintain and optimize. Also keep in mind pax isn't used only by XENON1T.

Are you sure you want this extra complication? As Sander mentioned, if you just want to see the top population once, we can process with different settings (e.g. the old natural breaks settings) first. Or perhaps there is a smaller modification of the mini-classification that would get you most of the way there.

@feigaodm
Member Author

@JelleAalbers The problem is that the raw classification doesn't work well, as shown in Joey's plot. This is because the rise time based on max_index and left (or left_central) fluctuates too much. That's why we spent the time and effort to tune this. From the results, I think it's worth the effort and should be implemented in Xe1T. As I expected, I think it can not only solve the missing top population issue, but can also help to improve the S1 resolution (because we can get rid of PMT afterpulses and gate PI).
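
To illustrate why a rise time measured from the left edge to the highest sample can fluctuate, here is a small sketch comparing it with an area-fraction-based estimate. Both functions, the 10 ns sample width, and the 10%/50% area fractions are assumptions for the example; neither is claimed to be the computation pax or this PR uses.

```python
def rise_time_from_max(waveform, dt_ns=10):
    """Time from the left edge to the highest sample (hypothetical definition)."""
    max_index = max(range(len(waveform)), key=waveform.__getitem__)
    return max_index * dt_ns


def rise_time_from_area(waveform, dt_ns=10, lo=0.1, hi=0.5):
    """Time between the samples where the cumulative area crosses the lo and hi
    fractions of the total (assumes a positive total area)."""
    total = sum(waveform)
    running, i_lo, i_hi = 0.0, None, None
    for i, sample in enumerate(waveform):
        running += sample
        if i_lo is None and running >= lo * total:
            i_lo = i
        if running >= hi * total:
            i_hi = i
            break
    return (i_hi - i_lo) * dt_ns


# Two made-up flat, S2-like waveforms that differ only in where a small
# upward fluctuation happens to land.
wf_early = [2, 3, 2, 2, 2, 2, 2, 2, 2, 2]
wf_late = [2, 2, 2, 2, 2, 2, 2, 2, 3, 2]

for wf in (wf_early, wf_late):
    print(rise_time_from_max(wf), rise_time_from_area(wf))
```

In this toy case the max-based value moves across most of the peak width when the fluctuation moves, while the area-based value changes by one sample.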

I understand there could be some potential issues, but I think we should proceed with it and solve the issues as they come. About the order [SumWaveform, BuildPeaks, RejectNoiseHits]: I did think about it and came up with the solution as is. I changed the order because I ran into a problem when calculating the peak rise time from the peak. Maybe we can modify that part so that we don't need to call sumWF to calculate the rise time; do you have any suggestions for how to do this? Or it would be good if you could modify it directly.

The simple approach @sanderbreur mentioned won't work well without a lot of effort. At the top of the TPC, S1s are not very different from S2s. We tried disabling some of the other clustering algorithms to test the effect, and it looks like the effects of the other two are limited.

@JosephJHowlett
Contributor

JosephJHowlett commented Jul 26, 2017

@JelleAalbers maybe this is silly, but is it possible to split this new algorithm down the middle, do the initial GapSizeClustering with our small (200ns) threshold, then continue down the line as usual, but after SumWaveform add a classify step, then replace the later classification with an S2 merging within our larger threshold? I say this because we know the second GapSizeClustering will only merge S2s into larger S2s, so in principle the second classification step adds no new information. Or is there at least some other way to incorporate one or more plugins that retain the essence of this algorithm without hurting the flow of pax?

For sure Fei's and my immediate goal was to replace the current GapSizeClustering with a two-step algorithm with minimal changes that could resolve these kinds of events. We're testing some processing on background data now to see if new events appear, maybe this result can add to our discussion.

@feigaodm
Member Author

feigaodm commented Jul 28, 2017

@JelleAalbers I modified the plugins and now use the order [SumWaveform, BuildPeaks, RejectNoiseHits] in the configuration files. The branch has been used to process ~17 hours of background data without bugs as far as I can tell, and the results of the processed data look promising for solving the missing top populations. Please let us know if you think there are other things we should test. Thanks.
btw, this update doesn't cost more time in processing data.

Contributor

@JelleAalbers JelleAalbers left a comment

Thanks for the extra tests, sorry it took a few days to get back to you. This clearly improves the processing outcome. I'm happy the noisy hit rejection has been reactivated now.

As for my previous comments on code duplication / complications: I guess this is some local optimum along the (effective vs. simple) curve. There may be solutions that work better but they would be more complicated and require bigger changes -- e.g. splitting up the algorithm like Joey suggests, so the hits-only sum waveform still excludes the noisy hits, or even changing pax more thoroughly to have a clustering/property/classification inner loop over peaks spanning several plugins. There may also be simpler solutions (e.g. the earlier ideas), but you found they didn't work so well.

So I agree it's time to merge this and see what happens; we can always fine-tune things. Nice work guys!

@feigaodm feigaodm merged commit 7b8d0ef into master Aug 3, 2017
@feigaodm feigaodm deleted the twostepgap branch August 3, 2017 02:20
@feigaodm
Member Author

This fixed Issues (#594, #540, #542, #543, #544, #545)
