2022-09-07-#25957.md

A PR by theStack, reviving PR #15845, that uses BIP157 block filters for faster wallet rescans, but only for descriptor wallets.

Notes

Questions

Why would a node operator enable BIP158 filters (-blockfilterindex=1)? Does the motivation make sense?

  • Before this PR, enabling it was mostly altruism, a community service. You offer better privacy to the light clients connected to you, at lower resource usage for yourself (as the server), and clients have no way to DoS the server by each asking it to monitor a unique filter (as they can with BIP37).
    • The BIP37 Bloom filter had the light client provide the bloom filter to its server (the full node), and that was different for each light client (so the server had to remember a bunch of them), whereas with BIP 157/158, the server generates just one for each block, and can send it (the same filter) to ALL of its light clients.
  • This PR may lead to more nodes providing this service, since the incremental cost is smaller to do and you now get the benefit of faster rescans for yourself.
  • Note: You enable the building and maintaining of this index with -blockfilterindex=1, but to provide the BIP 157 peer-to-peer service and actually serve the filters you also have to enable -peerblockfilters.
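The two switches above are separate settings; a minimal bitcoin.conf sketch combining them (both options exist in Bitcoin Core, the comments are my own):

```ini
# build and maintain the BIP158 compact block filter index
blockfilterindex=1
# additionally serve BIP157 filters to peers (requires the index above)
peerblockfilters=1
```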

What downsides, if any, are there to enabling BIP158 filters?

  • They require more disk space because of the overhead that comes with the new index.
  • They require more (client) bandwidth than BIP37 filters, because clients download entire blocks instead of merely uploading a Bloom filter and receiving only the transactions that match it.
    • BIP37 offered a way to download just the matching transactions in blocks. BIP157 does not, since the server doesn't know which transactions the client would need. That is an advantage in itself, as it avoids gratuitously revealing which transactions are interesting to the client (BIP37 has terrible privacy for this reason).
  • For a node operator with adequate CPU, RAM, and disk space, there are not many downsides.
  • Conceptually, BIP158's GCS filter is similar to a Bloom filter (no false negatives, a controllable rate of false positives), but more compact (iirc around 1.3x-1.4x). (source: sipa)
  • The downsides are that GCS filters are write-once (you can't update them once created) and querying is much slower.
    • Bloom filters are effectively O(n) for finding n elements in them.
    • GCS filters are O(m+n) for finding n elements in a filter of m elements.
    • So Bloom filters are way faster if you're only going to do one or a few queries, but as you query for larger and larger numbers of elements, the relative performance downside of a GCS shrinks.
  • Sipa has a writeup on the analysis for the size of GCS filters (which was used to set the BIP158 parameters).
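The complexity difference above can be illustrated with a toy sketch. This is not the real BIP158 Golomb-coded set: a hash set stands in for the Bloom filter, and a plain sorted list stands in for the decoded GCS, but the query pattern is the same, since a GCS must be decoded sequentially and merged against the sorted query values.

```python
# Toy comparison of Bloom-style vs GCS-style membership queries.
# Illustrative only, NOT the actual BIP158 implementation.

def bloom_like_match(filter_set, queries):
    """Roughly O(1) per query: each lookup probes the structure independently."""
    return any(q in filter_set for q in queries)

def gcs_like_match_any(sorted_filter, queries):
    """O(m + n): a single merge pass over the decoded (sorted) filter of
    m elements and the n sorted query values."""
    qs = sorted(queries)
    i = j = 0
    while i < len(sorted_filter) and j < len(qs):
        if sorted_filter[i] == qs[j]:
            return True
        if sorted_filter[i] < qs[j]:
            i += 1
        else:
            j += 1
    return False

filter_elems = sorted([5, 17, 42, 99])
print(bloom_like_match(set(filter_elems), [3, 42]))  # True
print(gcs_like_match_any(filter_elems, [3, 42]))     # True
print(gcs_like_match_any(filter_elems, [1, 2]))      # False
```

For a single query the merge pass still walks the whole filter, which is why one-off lookups favor Bloom filters; batching many queries into one merge amortizes that cost.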

Were you able to set up and run the PR on signet as described in the notes? Did you see a difference in performance with and without -blockfilterindex?

  • It's easier than enabling -blockfilterindex on mainnet: you can build the block filter index on signet in a few minutes. But LarryRuane observed that signet rescans were slower with the PR than without it.
  • This is probably because signet has a lot of empty (or near-empty) blocks. With this PR it ends up using the block filter to check each block (rather than checking each block directly), which takes longer than directly checking an empty (or near-empty) block.
  • It seems that GCSFilter::MatchInternal() is just always going to be slower than reading (nearly) empty blocks.

Bonus: It seems like there could be a threshold on the number of transactions in a block above which the performance boost kicks in, suggesting an optimization. Why is this not true?

  • Even if we know the transaction count, it's a bad metric for how long a block takes to rescan, because the cost also depends on the number of inputs and outputs in each transaction. You'd need to know how many inputs and outputs there are to examine in a block, and that information isn't easily accessible.
  • The sentiment is that it's not worth optimizing. This inverted performance behavior wouldn't occur on mainnet, which is all we really care about.
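A tiny illustration of the point about transaction count (the block contents below are invented): the work a rescan does tracks the number of inputs and outputs it must examine, not the number of transactions, so a transaction-count threshold would misjudge blocks with a few huge transactions.

```python
# Hypothetical block summaries: many small txs vs. a few large ones.
blocks = [
    {"txs": 100, "ins": 120, "outs": 150},  # many tiny transactions
    {"txs": 5,   "ins": 900, "outs": 400},  # a few huge consolidations
]

for b in blocks:
    # A rescan examines each input's prevout and each output's scriptPubKey,
    # so the element count, not the tx count, approximates the work.
    elements = b["ins"] + b["outs"]
    print(f'{b["txs"]} txs -> {elements} elements to examine')
```

The second block has 20x fewer transactions but almost 5x more elements to examine, so a tx-count threshold would get it exactly wrong.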

What is the advantage of descriptor wallets compared to legacy wallets, especially in the creation of the filter set? (Hint: what exact type of data do we need to put into the filter set?)

  • A descriptor wallet already gives us all the scriptPubKeys that we have to look for in blocks. For legacy wallets there was more manual work involved, which was error-prone ("did I really construct all possible scriptPubKeys from the pubkeys?").
  • That makes this PR's implementation simpler: filter-set creation is as simple as putting each key of every DescriptorScriptPubKeyMan's m_map_script_pub_keys into the filter set.
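A hypothetical sketch of that idea: the member name mirrors Bitcoin Core's DescriptorScriptPubKeyMan::m_map_script_pub_keys (a map keyed by scriptPubKey), but the dict-based wallet layout and the byte values below are invented for illustration.

```python
# Toy model: building the rescan filter set from a descriptor wallet.
# Each "ScriptPubKeyMan" is modeled as a dict whose m_map_script_pub_keys
# maps scriptPubKey bytes to a derivation index.

def wallet_filter_set(spk_managers):
    """Union of all scriptPubKeys tracked by the wallet's descriptor SPKMs.
    This is the whole 'filter creation' step from the notes: no manual
    reconstruction of scripts from pubkeys is needed."""
    spks = set()
    for spkm in spk_managers:
        spks.update(spkm["m_map_script_pub_keys"].keys())
    return spks

# Two SPKMs, each tracking one (fake) P2WPKH scriptPubKey.
spkms = [
    {"m_map_script_pub_keys": {b"\x00\x14" + b"\xaa" * 20: 0}},
    {"m_map_script_pub_keys": {b"\x00\x14" + b"\xbb" * 20: 1}},
]
filter_set = wallet_filter_set(spkms)
print(len(filter_set))  # 2
```

Matching a block then reduces to testing this set against the block's BIP158 filter, which is exactly the rescan fast path the PR adds.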