Performance degradation reading files via root:// #1157
From a bit more debugging, it looks like it's reading way more data than it needs to; see the following:

$ XRD_LOGLEVEL=Debug python /tmp/uproot-perf.py $pfn default 2>&1 | rg '(read command|Took )' -A 1
[2024-03-03 14:56:23.136216 +0100][Debug ][File ] [0x25f98828@root://ccsrm.ihep.ac.cn:1094//dpm/ihep.ac.cn/home/lhcb/LHCb/Collision18/LEPTONIC.MDST/00092248/0000/00092248_00002347_1.leptonic.mdst?xrdcl.requuid=59BD3460-5F19-483C-ADC9-0CC973D9CB0C] Sending a read command for handle 0x0 to dpmlhcb01.ihep.ac.cn:1095
[2024-03-03 14:56:23.136351 +0100][Debug ][ExDbgMsg ] [dpmlhcb01.ihep.ac.cn:1095] MsgHandler created: 0x6e06a20 (message: kXR_read (handle: 0x00000000, offset: 0, size: 5243283) ).
--
[2024-03-03 14:56:29.381762 +0100][Debug ][File ] [0x25f98828@root://ccsrm.ihep.ac.cn:1094//dpm/ihep.ac.cn/home/lhcb/LHCb/Collision18/LEPTONIC.MDST/00092248/0000/00092248_00002347_1.leptonic.mdst?xrdcl.requuid=59BD3460-5F19-483C-ADC9-0CC973D9CB0C] Sending a read command for handle 0x0 to dpmlhcb01.ihep.ac.cn:1095
[2024-03-03 14:56:29.381919 +0100][Debug ][ExDbgMsg ] [dpmlhcb01.ihep.ac.cn:1095] MsgHandler created: 0x4e12b50 (message: kXR_read (handle: 0x00000000, offset: 3872048242, size: 413) ).
--
[2024-03-03 14:56:29.673917 +0100][Debug ][File ] [0x25f98828@root://ccsrm.ihep.ac.cn:1094//dpm/ihep.ac.cn/home/lhcb/LHCb/Collision18/LEPTONIC.MDST/00092248/0000/00092248_00002347_1.leptonic.mdst?xrdcl.requuid=59BD3460-5F19-483C-ADC9-0CC973D9CB0C] Sending a read command for handle 0x0 to dpmlhcb01.ihep.ac.cn:1095
[2024-03-03 14:56:29.674115 +0100][Debug ][ExDbgMsg ] [dpmlhcb01.ihep.ac.cn:1095] MsgHandler created: 0x6d04340 (message: kXR_read (handle: 0x00000000, offset: 3867679172, size: 4369483) ).
--
[2024-03-03 14:56:43.445820 +0100][Debug ][File ] [0x14d5d98@root://ccsrm.ihep.ac.cn:1094//dpm/ihep.ac.cn/home/lhcb/LHCb/Collision18/LEPTONIC.MDST/00092248/0000/00092248_00002347_1.leptonic.mdst?xrdcl.requuid=8DF1FEF1-9E06-419B-BE92-F5EF459EDA32] Sending a read command for handle 0x1 to dpmlhcb01.ihep.ac.cn:1095
[2024-03-03 14:56:43.445890 +0100][Debug ][ExDbgMsg ] [dpmlhcb01.ihep.ac.cn:1095] MsgHandler created: 0x6e06810 (message: kXR_read (handle: 0x01000000, offset: 3867678939, size: 168) ).
--
Took 14.426180082999053 seconds
[2024-03-03 14:56:44.102932 +0100][Debug ][File ] [0x25f98828@root://ccsrm.ihep.ac.cn:1094//dpm/ihep.ac.cn/home/lhcb/LHCb/Collision18/LEPTONIC.MDST/00092248/0000/00092248_00002347_1.leptonic.mdst?xrdcl.requuid=59BD3460-5F19-483C-ADC9-0CC973D9CB0C] Sending a close command for handle 0x0 to dpmlhcb01.ihep.ac.cn:1095
$ XRD_LOGLEVEL=Debug python /tmp/uproot-perf.py $pfn xrootd 2>&1 | rg '(read command|Took )' -A 1
[2024-03-03 14:56:59.208793 +0100][Debug ][File ] [0x6a2cb98@root://ccsrm.ihep.ac.cn:1094//dpm/ihep.ac.cn/home/lhcb/LHCb/Collision18/LEPTONIC.MDST/00092248/0000/00092248_00002347_1.leptonic.mdst?xrdcl.requuid=4BD91220-0CC2-473F-A6C9-838F342FC691] Sending a read command for handle 0x0 to dpmlhcb01.ihep.ac.cn:1095
[2024-03-03 14:56:59.208916 +0100][Debug ][ExDbgMsg ] [dpmlhcb01.ihep.ac.cn:1095] MsgHandler created: 0x75054d0 (message: kXR_read (handle: 0x00000000, offset: 0, size: 403) ).
--
[2024-03-03 14:56:59.499091 +0100][Debug ][File ] [0x6a2cb98@root://ccsrm.ihep.ac.cn:1094//dpm/ihep.ac.cn/home/lhcb/LHCb/Collision18/LEPTONIC.MDST/00092248/0000/00092248_00002347_1.leptonic.mdst?xrdcl.requuid=4BD91220-0CC2-473F-A6C9-838F342FC691] Sending a read command for handle 0x0 to dpmlhcb01.ihep.ac.cn:1095
[2024-03-03 14:56:59.499166 +0100][Debug ][ExDbgMsg ] [dpmlhcb01.ihep.ac.cn:1095] MsgHandler created: 0x75054d0 (message: kXR_read (handle: 0x00000000, offset: 3872048242, size: 313) ).
--
[2024-03-03 14:56:59.789551 +0100][Debug ][File ] [0x6a2cb98@root://ccsrm.ihep.ac.cn:1094//dpm/ihep.ac.cn/home/lhcb/LHCb/Collision18/LEPTONIC.MDST/00092248/0000/00092248_00002347_1.leptonic.mdst?xrdcl.requuid=4BD91220-0CC2-473F-A6C9-838F342FC691] Sending a read command for handle 0x0 to dpmlhcb01.ihep.ac.cn:1095
[2024-03-03 14:56:59.789725 +0100][Debug ][ExDbgMsg ] [dpmlhcb01.ihep.ac.cn:1095] MsgHandler created: 0x14806d00 (message: kXR_read (handle: 0x00000000, offset: 3867679172, size: 681) ).
--
[2024-03-03 14:57:00.085071 +0100][Debug ][File ] [0x6a2cb98@root://ccsrm.ihep.ac.cn:1094//dpm/ihep.ac.cn/home/lhcb/LHCb/Collision18/LEPTONIC.MDST/00092248/0000/00092248_00002347_1.leptonic.mdst?xrdcl.requuid=4BD91220-0CC2-473F-A6C9-838F342FC691] Sending a vector read command for handle 0x0 to dpmlhcb01.ihep.ac.cn:1095
[2024-03-03 14:57:00.085096 +0100][Debug ][ExDbgMsg ] [dpmlhcb01.ihep.ac.cn:1095] MsgHandler created: 0x4f066c0 (message: kXR_readv (handle: 0x00000000, chunks: [(offset: 3867678939, size: 168); ], total size: 168) ).
--
Took 0.6477262080006767 seconds
[2024-03-03 14:57:00.440007 +0100][Debug ][File ] [0x6a2cb98@root://ccsrm.ihep.ac.cn:1094//dpm/ihep.ac.cn/home/lhcb/LHCb/Collision18/LEPTONIC.MDST/00092248/0000/00092248_00002347_1.leptonic.mdst?xrdcl.requuid=4BD91220-0CC2-473F-A6C9-838F342FC691] Sending a close command for handle 0x0 to dpmlhcb01.ihep.ac.cn:1095
That's right, vector reads are no longer the default, as a result of a study that @lobis did in his fellowship last fall. The change in behavior went in at the boundary between 5.2.x and 5.3.0, when FSSpecSource became the new default backend. You can still get the other backends by explicitly setting the handler. In @lobis's tests, sending and receiving independent requests and responses for each TBasket and handling them with Python performed at least as well as vector reads.

0.69s and 2.7s are both rather short times and likely don't reveal the asymptotic behavior... Hey, wait a minute! I remember our conversation back in Australia: you were motivated to introduce vector reads because you were working interactively with students who had small files, for which latency was the important metric. I can see how a 2.7s wait is considerably more noticeable than a 0.7s wait.

How would it be to pass uproot.reading.open.defaults["handler"] = uproot.HTTPSource to set this globally, once at the top of a script? Would that work?
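As a concrete sketch of the two ways to select a backend (the file URL is a placeholder; XRootDSource is used here since this thread is about root://, but the same pattern applies to any handler):

```python
import uproot

# per-call: pass the handler explicitly to uproot.open
f = uproot.open(
    "root://example.org:1094//path/file.root",  # placeholder URL
    handler=uproot.source.xrootd.XRootDSource,
)

# global: set it once at the top of a script, as suggested above
uproot.reading.open.defaults["handler"] = uproot.source.xrootd.XRootDSource
f = uproot.open("root://example.org:1094//path/file.root")
```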
I've done some very crude testing with a high-latency connection (Australia <-> CERN) and it seems like the new fsspec implementation does well in comparison to the old XRootD source for larger amounts of data. It might even be faster, if not for my next question. Is it expected that the new fsspec-based source reads more data than the old XRootD source in some cases? In #1157 (comment) it's reading over 6000x more data (9.16 MiB vs 1.53 KiB).
No, I expect it to read exactly the same amount of data. 6000× sounds like reading all TBranches when only a few are desired. Does Uproot agree with those byte counts? Uproot keeps its own counters (see uproot5/src/uproot/source/fsspec.py, lines 137 to 139 at accc1ca); they can be accessed through uproot.Source.num_requested_chunks and uproot.Source.num_requested_bytes.
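A quick sketch of how one might check those counters (URL, tree, and branch names are placeholders; this assumes the directory's file object exposes the Source as .file.source):

```python
import uproot

with uproot.open("root://example.org:1094//path/file.root") as f:  # placeholder URL
    f["SomeTree"].arrays(["one_branch"])  # hypothetical tree and branch
    source = f.file.source
    print(source.num_requested_chunks, "chunks,", source.num_requested_bytes, "bytes")
```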
If I force …
The rest of the difference is coming from the …

The second one is caused by uproot5/src/uproot/source/fsspec.py, line 164 at e47934f, which opens a new file behind the scenes for every call.
When we were evaluating single-request, streamed reads versus many requests in flight in parallel, it was mostly with HTTP as a protocol (easier to set up instances for testing). The issues we were concerned about, overloading the server with too many requests versus funneling all of a multi-node server's output through a single serializer, are unrelated to this additional cost of the XRootD open. That's connected to the reason the file is opened twice, right? Does a stateless read fundamentally require a new open?
No, it looks like this is just a limitation of the implementation in fsspec-xrootd. Unlike POSIX-style file reading, there isn't a concept of the current position within a file with XRootD. Instead, every read request carries its own explicit offset and length (visible as the kXR_read messages in the logs above). I think the overhead of opening the file comes from the combination of: …
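For reference, a minimal sketch of what those position-less reads look like with the XRootD Python client (the URL is a placeholder; the offsets and sizes are taken from the logs above):

```python
from XRootD import client

f = client.File()
status, _ = f.open("root://example.org:1094//path/file.root")  # placeholder URL
assert status.ok

# no seek/tell: each read names its own offset and length,
# which becomes one kXR_read message per call
status, header = f.read(offset=0, size=403)
status, footer = f.read(offset=3872048242, size=313)

f.close()
```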
What a coincidence, I've also been looking at this in the last few days. The 5 MB initial read traces back to uproot5/src/uproot/source/fsspec.py, line 55 at e47934f, because in fsspec the default block size is 5MB: https://github.com/fsspec/filesystem_spec/blob/8a542c8c555614ab91fe07ce27a2d3dd3fd6700f/fsspec/spec.py#L1577. We usually read through a File handle only a few times (the file header, then the TTree header, usually at the end of the file), and the bulk of the reads (a TBranch fetching all its baskets) go through the FileSystem stateless interface (I'll say more on this in a bit), so that handle might deserve a smaller size.
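As a sketch of the knob being discussed (placeholder URL; this assumes fsspec-xrootd is installed so root:// resolves, and whether uproot should forward such a value is exactly the open question):

```python
import fsspec

fs, path = fsspec.core.url_to_fs("root://example.org:1094//path/file.root")  # placeholder

# fsspec's default block_size is ~5 MB (5,242,880 bytes), which matches the
# 5,243,283-byte first read in the logs above; a smaller value keeps header reads cheap
with fs.open(path, "rb", block_size=64 * 1024) as f:
    magic = f.read(4)  # fetches one 64 KiB block instead of 5 MB
```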
So I've been running a skim benchmark where I access the first 50k events out of a few hundred files hosted throughout the US, comparing the new and old xrootd handlers, and write out the data for 96 columns from the input into local parquet files. I find a pretty sizable degradation, shown here in the attached plot.
So we can eventually recover the old performance with known solutions. scikit-hep/fsspec-xrootd#54 will take some more thought on how to best cache file handles in a way that doesn't cause trouble. That said, all this is rather terrible performance. So I dumped the ranges passed to each read call, and after coalescing nearby ranges with

```python
ordered = sorted((start, stop) for ranges in all_ranges for start, stop in ranges)
better = ordered[:1]
for start, stop in ordered[1:]:
    if start - better[-1][1] <= max((stop - start) // 50, 100):
        better[-1] = better[-1][0], stop
    else:
        better.append((start, stop))
```

I get a much shorter list of ranges.

The test code, omitting the cluster creation using lpcjobqueue, is as follows:

```python
import dask
import awkward as ak
import dask_awkward as dak
from coffea import dataset_tools


def analysis(events):
    dataset = events.metadata["dataset"]
    photonSelect = (
        (events.Photon.pt > 18)
        & (abs(events.Photon.eta) < 1.5)
        & (events.Photon.isScEtaEE | events.Photon.isScEtaEB)
        & (events.Photon.cutBased >= 1)
    )
    events = events[
        ak.any(photonSelect, axis=1)
    ]
    skim = ak.zip(
        {
            "Jets": events.Jet,
            "MET": events.MET,
            "Photon": events.Photon,
        },
        depth_limit=1,
    )
    skim_task = dak.to_parquet(
        skim,
        f"root://cmseos.fnal.gov//store/user/ncsmith/skimtest/{dataset}",
        compute=False,
    )
    return skim_task


out, report = dataset_tools.apply_to_fileset(
    analysis,
    dataset_tools.max_files(
        dataset_tools.max_chunks(samples_ready, 1),
        200,
    ),
    uproot_options={
        "allow_read_errors_with_report": True,
        # "handler": uproot.source.xrootd.XRootDSource,
    },
)
```

where samples_ready is the fileset of input samples.
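To make the coalescing heuristic above concrete, here is a small self-contained run on made-up byte ranges (the merge condition tolerates a gap of up to max(2% of the incoming chunk's length, 100 bytes)):

```python
# made-up byte ranges from three hypothetical branches
all_ranges = [[(0, 100), (120, 200)], [(5000, 5050)]]

ordered = sorted((start, stop) for ranges in all_ranges for start, stop in ranges)
better = ordered[:1]
for start, stop in ordered[1:]:
    # merge when the gap to the previous range is small relative to this chunk
    if start - better[-1][1] <= max((stop - start) // 50, 100):
        better[-1] = better[-1][0], stop
    else:
        better.append((start, stop))

print(better)  # [(0, 200), (5000, 5050)] -- the 20-byte gap was absorbed
```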
Thanks for the analysis @nsmith-! 🍰 All of your conclusions match with what I found, with the bonus of numbers to back them up. As for caching open files, I realised it's analogous to the connection pool used by each aiohttp session, so I think it's definitely the right way to go.
Looking at the XRootD logs (#1157 (comment)), it seems like …
It was at first my understanding that … (see Line 165 at e47934f)

(or better yet, round to 1024; this is a small number these days)
Ok, I implemented a file handle cache in scikit-hep/fsspec-xrootd#54, and now the sites that previously had a lot of trouble with frequent file opens are doing better. On the uproot side, we can save one additional open if we default to the stateless form (uproot5/src/uproot/source/fsspec.py, lines 99 to 103 at e47934f). @lobis, is there a reason you prefer a file handle here, maybe for reading local files?
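To illustrate the stateful-versus-stateless distinction being weighed here, a sketch against the generic fsspec API (placeholder URL; assumes fsspec-xrootd is installed so root:// resolves):

```python
import fsspec

fs, path = fsspec.core.url_to_fs("root://example.org:1094//path/file.root")  # placeholder

# stateful: one open handle, POSIX-style seek/read
with fs.open(path, "rb") as f:
    f.seek(100)
    chunk = f.read(50)

# stateless: offset and length on every call, no handle held by the caller
chunk = fs.cat_file(path, start=100, end=150)
```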
Here's a summary of what I've attempted. The legend entries in the plot correspond to progressively more patches to improve this situation: …

Note that for all sites the performance steadily improves, but in some cases most of the improvement may have come from just one of the patches. I don't think any of these patches are harmful in other cases.
All the updates are merged. I'll close this for now. |
Since the move to using fsspec-xrootd, the performance of reading files via XRootD has dropped. For example, here is a script that pulls a few bytes of metadata from an LHCb file:
uproot-perf.py
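The script itself is collapsed above; judging from the invocations in the logs ($pfn plus a source name, printing "Took … seconds"), a minimal sketch of what it does might look like this (the metadata access is a placeholder):

```python
# hypothetical reconstruction of uproot-perf.py: time a small metadata
# read through either the default (fsspec) source or the old XRootD one
import sys
import time

import uproot

pfn, source_name = sys.argv[1], sys.argv[2]
options = {}
if source_name == "xrootd":
    options["handler"] = uproot.source.xrootd.XRootDSource

start = time.monotonic()
with uproot.open(pfn, **options) as f:
    f.keys()  # placeholder: pull a few bytes of metadata
print(f"Took {time.monotonic() - start} seconds")
```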
The effect is less obvious for me with eospublic as I'm only a few milliseconds away from it. If I use data stored further away from me, I get 0.69s vs 2.7s.
I can also see the overhead when pulling larger amounts of data, though it's a smaller effect.
I suspect this is because "vector reads" are no longer being used, so you pay the round-trip latency and overhead for each request.
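For reference, a sketch of what a vector read looks like with the XRootD Python client (placeholder URL; offsets and sizes taken from the logs above): many small ranges travel in a single kXR_readv round trip instead of one request each:

```python
from XRootD import client

f = client.File()
status, _ = f.open("root://example.org:1094//path/file.root")  # placeholder URL
assert status.ok

# one round trip for several (offset, length) chunks
chunks = [(0, 403), (3867678939, 168), (3872048242, 313)]
status, response = f.vector_read(chunks)
assert status.ok  # response holds the data for each requested chunk

f.close()
```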