Does bloom filter in SQL even work? #17602

Open
saulius opened this issue Jan 3, 2025 · 2 comments

@saulius

saulius commented Jan 3, 2025

The docs at https://druid.apache.org/docs/latest/development/extensions-core/bloom-filter/ say the following:

Bloom filters can be computed in SQL expressions with the bloom_filter aggregator:
SELECT BLOOM_FILTER(<expression>, <max number of entries>) FROM druid.foo WHERE dim2 = 'abc'
but requires the setting druid.sql.planner.serializeComplexValues to be set to true. Bloom filter results in a SQL response are serialized into a base64 string, which can then be used in subsequent queries as a filter.

I'm trying to do exactly that using the default kttm_rollup dataset, e.g.:

WITH yunowork AS (
  SELECT
    BLOOM_FILTER("ip_address", 10000) AS bf
  FROM druid.kttm_rollup
)
SELECT
  ip_address
FROM druid.kttm_rollup
JOIN yunowork Y ON 1 = 1
WHERE BLOOM_FILTER_TEST(ip_address, Y.bf)

I get the following error:

Error: INVALID_INPUT

Cannot apply 'BLOOM_FILTER_TEST' to arguments of type 'BLOOM_FILTER_TEST(<VARCHAR>, <COMPLEX<BLOOM>>)'. Supported form(s): 'BLOOM_FILTER_TEST(<ANY>, <CHARACTER>)' (line [10], column [7])

druid.sql.planner.serializeComplexValues is set to true, but as far as I understand it's irrelevant.

Am I doing something wrong?

Affected Version

31.0.1 (locally on macOS) and 26.0.0 (cluster on Linux); I assume all versions in between are affected too.

Description

Mostly described above.

@saulius
Author

saulius commented Jan 3, 2025

BLOOM_FILTER_TEST works if I provide the bloom filter encoded as a base64 string, so the issue seems to be related to SQL function interop.
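
Since the base64 form works, one possible workaround is to split the query in two: compute the filter first, then paste the returned string into a second query as a literal. Below is a minimal sketch of that idea in Python, not a tested fix. It assumes the Druid router is reachable at http://localhost:8888 (adjust DRUID_SQL_URL for your cluster), that druid.sql.planner.serializeComplexValues is true so the filter comes back as a base64 string, and that responses use the default object result format. The helper names run_sql, build_filter_query and filter_with_bloom are made up for illustration.

```python
import json
import urllib.request

# Assumption: the Druid router listens here; change this for your cluster.
DRUID_SQL_URL = "http://localhost:8888/druid/v2/sql"


def run_sql(query, url=DRUID_SQL_URL):
    """POST a SQL query to the Druid SQL API and return the parsed JSON rows."""
    payload = json.dumps({"query": query}).encode("utf-8")
    request = urllib.request.Request(
        url, data=payload, headers={"Content-Type": "application/json"}
    )
    with urllib.request.urlopen(request) as response:
        return json.load(response)


def build_filter_query(base64_filter):
    """Build the second query, embedding the serialized filter as a string literal."""
    # Base64 never contains single quotes, but escape them anyway to be safe.
    literal = base64_filter.replace("'", "''")
    return (
        "SELECT ip_address FROM druid.kttm_rollup "
        f"WHERE BLOOM_FILTER_TEST(ip_address, '{literal}')"
    )


def filter_with_bloom(run=run_sql):
    """Step 1: compute the bloom filter. Step 2: reuse it as a base64 literal."""
    rows = run(
        'SELECT BLOOM_FILTER("ip_address", 10000) AS bf FROM druid.kttm_rollup'
    )
    # With serializeComplexValues=true the "bf" column should be a base64 string.
    return run(build_filter_query(rows[0]["bf"]))
```

Calling filter_with_bloom() costs two round trips, but it avoids passing a COMPLEX<BLOOM> value between subqueries, which is the step that fails above.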

@clintropolis
Member

Yeah, this looks like a bug with subquery results not being handled correctly for the bloom filter type. I'll try to look into fixing this, since it does seem useful to support queries like this.
