Query context

The query context is used for various query configuration parameters. Query context parameters can be specified in the following ways:

  • For Druid SQL, context parameters are provided either as a JSON object named context to the HTTP POST API, or as properties to the JDBC connection.
  • For native queries, context parameters are provided as a JSON object named context.
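
As a minimal sketch of the two JSON forms above, the following Python builds a Druid SQL request body and a native query that carry the same context object. The datasource name `wikipedia`, the interval, and the helper names `sql_request_body` and `with_context` are hypothetical and chosen only for illustration. The snippet builds the payloads and does not send them anywhere.

```python
import json


def sql_request_body(sql, context):
    """Build a JSON body for the Druid SQL HTTP POST API with a context object."""
    return {"query": sql, "context": dict(context)}


def with_context(native_query, context):
    """Return a copy of a native query with its `context` object set."""
    query = dict(native_query)
    query["context"] = dict(context)
    return query


# Parameters taken from the general parameters table; the values are examples.
context = {"timeout": 30000, "priority": 10, "useCache": False}

sql_body = sql_request_body("SELECT COUNT(*) FROM wikipedia", context)

native_body = with_context(
    {
        "queryType": "timeseries",
        "dataSource": "wikipedia",  # hypothetical datasource
        "granularity": "all",
        "intervals": ["2016-06-27/2016-06-28"],  # hypothetical interval
        "aggregations": [{"type": "count", "name": "rows"}],
    },
    context,
)

print(json.dumps(sql_body, indent=2))
print(json.dumps(native_body, indent=2))
```

In both cases the context travels as a sibling of the query itself, so the same dictionary can be reused across SQL and native requests.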

Note that a parameter set in the query context overrides both the built-in default value and the value set through the runtime property druid.query.default.context.{property_key} (if set).
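
The precedence described above can be sketched as a simple layered merge. This is an illustration of the ordering only, not Druid's implementation. The function name `effective_context` and the sample property values are assumptions.

```python
PREFIX = "druid.query.default.context."


def effective_context(query_context, runtime_properties, built_in_defaults):
    """Resolve context values: query context > runtime property > built-in default."""
    resolved = dict(built_in_defaults)
    # Runtime properties of the form druid.query.default.context.{property_key}
    # replace the built-in defaults.
    for name, value in runtime_properties.items():
        if name.startswith(PREFIX):
            resolved[name[len(PREFIX):]] = value
    # Values supplied with the query win over everything else.
    resolved.update(query_context)
    return resolved


resolved = effective_context(
    query_context={"priority": 10},
    runtime_properties={
        "druid.query.default.context.priority": 5,
        "druid.query.default.context.useCache": False,
        "druid.server.http.numThreads": 40,  # unrelated property, ignored
    },
    built_in_defaults={"priority": 0, "useCache": True},
)
print(resolved)
```

Here `priority` comes from the query context, while `useCache` falls back to the runtime property because the query did not set it.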

General parameters

Unless otherwise noted, the following parameters apply to all query types.

| property | default | description |
| --- | --- | --- |
| timeout | druid.server.http.defaultQueryTimeout | Query timeout in milliseconds, beyond which unfinished queries will be cancelled. A timeout of 0 means no timeout (up to the server-side maximum query timeout, druid.server.http.maxQueryTimeout). To set the default timeout and maximum timeout, see Broker configuration. |
| priority | 0 | Query priority. Queries with higher priority get precedence for computational resources. |
| lane | null | Query lane, used to control usage limits on classes of queries. See Broker configuration for more details. |
| queryId | auto-generated | Unique identifier given to this query. If a query ID is set or known, it can be used to cancel the query. |
| brokerService | null | Broker service to which this query should be routed. This parameter is honored only by a broker selector strategy of type manual. See Router strategies for more details. |
| useCache | true | Flag indicating whether to leverage the query cache for this query. When set to false, it disables reading from the query cache for this query. When set to true, Apache Druid uses druid.broker.cache.useCache or druid.historical.cache.useCache to determine whether or not to read from the query cache. |
| populateCache | true | Flag indicating whether to save the results of the query to the query cache. Primarily used for debugging. When set to false, it disables saving the results of this query to the query cache. When set to true, Druid uses druid.broker.cache.populateCache or druid.historical.cache.populateCache to determine whether or not to save the results of this query to the query cache. |
| useResultLevelCache | true | Flag indicating whether to leverage the result-level cache for this query. When set to false, it disables reading from the result-level cache for this query. When set to true, Druid uses druid.broker.cache.useResultLevelCache to determine whether or not to read from the result-level query cache. |
| populateResultLevelCache | true | Flag indicating whether to save the results of the query to the result-level cache. Primarily used for debugging. When set to false, it disables saving the results of this query to the result-level cache. When set to true, Druid uses druid.broker.cache.populateResultLevelCache to determine whether or not to save the results of this query to the result-level query cache. |
| bySegment | false | Native queries only. Return "by segment" results. Primarily used for debugging; setting it to true returns results associated with the data segment they came from. |
| finalize | true | Flag indicating whether to "finalize" aggregation results. Primarily used for debugging. For instance, the hyperUnique aggregator will return the full HyperLogLog sketch instead of the estimated cardinality when this flag is set to false. |
| maxScatterGatherBytes | druid.server.http.maxScatterGatherBytes | Maximum number of bytes gathered from data processes such as Historicals and realtime processes to execute a query. Use this parameter to further reduce the maxScatterGatherBytes limit at query time. See Broker configuration for more details. |
| maxQueuedBytes | druid.broker.http.maxQueuedBytes | Maximum number of bytes queued per query before exerting backpressure on the channel to the data server. Similar to maxScatterGatherBytes, except that this one triggers backpressure rather than query failure. Zero means disabled. |
| serializeDateTimeAsLong | false | If true, DateTime is serialized as long in the result returned by the Broker and in the data transportation between the Broker and compute processes. |
| serializeDateTimeAsLongInner | false | If true, DateTime is serialized as long in the data transportation between the Broker and compute processes. |
| enableParallelMerge | true | Enable parallel result merging on the Broker. Note that druid.processing.merge.useParallelMergePool must be enabled for this setting to be set to true. See Broker configuration for more details. |
| parallelMergeParallelism | druid.processing.merge.pool.parallelism | Maximum number of parallel threads to use for parallel result merging on the Broker. See Broker configuration for more details. |
| parallelMergeInitialYieldRows | druid.processing.merge.task.initialYieldNumRows | Number of rows to yield per ForkJoinPool merge task for parallel result merging on the Broker, before forking off a new task to continue merging sequences. See Broker configuration for more details. |
| parallelMergeSmallBatchRows | druid.processing.merge.task.smallBatchNumRows | Size of result batches to operate on in ForkJoinPool merge tasks for parallel result merging on the Broker. See Broker configuration for more details. |
| useFilterCNF | false | If true, Druid will attempt to convert the query filter to Conjunctive Normal Form (CNF). During query processing, columns can be pre-filtered by intersecting the bitmap indexes of all values that match the eligible filters, often greatly reducing the raw number of rows which need to be scanned. But this effect only happens for the top-level filter, or individual clauses of a top-level 'and' filter. As such, filters in CNF potentially have a higher chance of utilizing a large number of bitmap indexes on string columns during pre-filtering. However, this setting should be used with great caution, as it can sometimes have a negative effect on performance, and in some cases the act of computing the CNF of a filter can be expensive. We recommend hand tuning your filters to produce an optimal form if possible, or at least verifying through experimentation that using this parameter actually improves your query performance with no ill effects. |
| secondaryPartitionPruning | true | Enable secondary partition pruning on the Broker. The Broker will always prune unnecessary segments from the input scan based on a filter on time intervals, but if the data is further partitioned with hash or range partitioning, this option will enable additional pruning based on a filter on secondary partition dimensions. |
| enableJoinLeftTableScanDirect | false | This flag applies to queries which have joins. For joins where the left child is a simple scan with a filter, by default Druid will run the scan as a query and then join the results to the right child on the Broker. Setting this flag to true overrides that behavior, and Druid will attempt to push the join to data servers instead. Please note that the flag could be applicable to queries even if there is no explicit join, since queries can be internally translated into a join by the SQL planner. |
| debug | false | Flag indicating whether to enable debugging outputs for the query. When set to false, no additional logs will be produced (logs produced will be entirely dependent on your logging level). When set to true, Druid additionally logs the stack trace of the exception (if any) produced by the query. |
| maxNumericInFilters | -1 | Maximum number of numeric values that can be compared against a string type dimension when the entire SQL WHERE clause of a query translates only to an OR of Bound filters. By default, Druid does not restrict the number of numeric Bound filters on string columns, although this situation may block other queries from running. Set this property to a smaller value to prevent Druid from running queries that have prohibitively long segment processing times. The optimal limit requires some trial and error; we recommend starting with 100. Users who submit a query that exceeds the limit of maxNumericInFilters should rewrite their queries to use strings in the WHERE clause instead of numbers. For example, WHERE someString IN ('123', '456'). This value cannot exceed the system configuration druid.sql.planner.maxNumericInFilters. This value is ignored if druid.sql.planner.maxNumericInFilters is not set explicitly. |
| inSubQueryThreshold | 2147483647 | Threshold for the minimum number of values in an IN clause to convert the query to a JOIN operation on an inlined table rather than a predicate. A threshold of 0 forces usage of an inline table in all cases; a threshold of Integer.MAX_VALUE (2147483647) forces usage of OR in all cases. |
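
Several of the parameters above are described as primarily useful for debugging. The sketch below combines them into one context for a native query, since bySegment applies to native queries only. The constant `DEBUG_CONTEXT`, the helper `debug_variant`, and the sample query are hypothetical. Whether this particular combination suits a given investigation is a judgment call.

```python
import json

# Debug-oriented settings drawn from the general parameters table.
DEBUG_CONTEXT = {
    "useCache": False,                  # do not read from the query cache
    "populateCache": False,             # do not write to the query cache
    "useResultLevelCache": False,       # do not read from the result-level cache
    "populateResultLevelCache": False,  # do not write to the result-level cache
    "finalize": False,                  # return unfinalized aggregation results
    "bySegment": True,                  # native queries only
    "debug": True,                      # enable debugging outputs
}


def debug_variant(native_query):
    """Return a copy of a native query with the debug settings layered on top."""
    query = dict(native_query)
    query["context"] = {**native_query.get("context", {}), **DEBUG_CONTEXT}
    return query


original = {
    "queryType": "timeseries",
    "dataSource": "wikipedia",  # hypothetical datasource
    "granularity": "hour",
    "intervals": ["2016-06-27/2016-06-28"],
    "aggregations": [{"type": "hyperUnique", "name": "users", "fieldName": "user_unique"}],
    "context": {"timeout": 5000},
}

debug_query = debug_variant(original)
print(json.dumps(debug_query["context"], indent=2))
```

Existing context entries such as `timeout` are preserved, and the original query dictionary is left unchanged.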

Druid SQL parameters

See SQL query context for query context parameters specific to Druid SQL queries.

Parameters by query type

Some query types offer context parameters specific to that query type.

TopN

| property | default | description |
| --- | --- | --- |
| minTopNThreshold | 1000 | The top minTopNThreshold local results from each segment are returned for merging to determine the global topN. |
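
To see why the number of local results per segment matters, consider the toy simulation below. It is a simplified model, not Druid's TopN engine: the segments, values, and function names are invented, and merging is assumed to be a plain sum of per-segment counts.

```python
from collections import Counter


def local_top(segment, limit):
    """Top `limit` (value, count) pairs of one segment; ties broken by value."""
    return sorted(segment.items(), key=lambda kv: (-kv[1], kv[0]))[:limit]


def merged_topn(segments, n, local_limit):
    """Merge the local top results of every segment and return the global top n."""
    totals = Counter()
    for segment in segments:
        for value, count in local_top(segment, local_limit):
            totals[value] += count
    return sorted(totals.items(), key=lambda kv: (-kv[1], kv[0]))[:n]


# "c" is never the local winner, but it has the largest total across segments.
segments = [
    {"a": 10, "b": 9, "c": 8},
    {"d": 10, "e": 9, "c": 8},
]

narrow = merged_topn(segments, n=1, local_limit=1)
wide = merged_topn(segments, n=1, local_limit=3)
print(narrow)
print(wide)
```

With only one local result per segment, "c" is dropped before merging and the global answer is wrong. With three local results per segment, the merge sees all of the contributions to "c". Returning more local results trades extra merge work for a more accurate global ranking.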

Timeseries

| property | default | description |
| --- | --- | --- |
| skipEmptyBuckets | false | Disable timeseries zero-filling behavior, so only buckets with results will be returned. |
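
The difference between zero-filling and skipping empty buckets can be illustrated with a small model. The function `timeseries_result` and the sample buckets are hypothetical. Real timeseries results carry timestamps and aggregator maps rather than bare counts.

```python
def timeseries_result(buckets, counts, skip_empty_buckets=False):
    """Return (bucket, count) rows, optionally omitting buckets with no data."""
    rows = []
    for bucket in buckets:
        if bucket in counts:
            rows.append((bucket, counts[bucket]))
        elif not skip_empty_buckets:
            # Zero-filling: emit a row even though no data fell in this bucket.
            rows.append((bucket, 0))
    return rows


hours = ["00:00", "01:00", "02:00", "03:00"]
counts = {"00:00": 7, "03:00": 2}  # no data for 01:00 and 02:00

zero_filled = timeseries_result(hours, counts)
skipped = timeseries_result(hours, counts, skip_empty_buckets=True)
print(zero_filled)
print(skipped)
```

The zero-filled form keeps one row per bucket, which suits charting. The skipped form returns only the two buckets that contain data.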

Join filter

| property | default | description |
| --- | --- | --- |
| enableJoinFilterPushDown | true | Controls whether a join query will attempt filter push down, which reduces the number of rows that have to be compared in a join operation. |
| enableJoinFilterRewrite | true | Controls whether filter clauses that reference non-base table columns will be rewritten into filters on base table columns. |
| enableJoinFilterRewriteValueColumnFilters | false | Controls whether Druid rewrites non-base table filters on non-key columns in the non-base table. Requires a scan of the non-base table. |
| joinFilterRewriteMaxSize | 10000 | The maximum size of the correlated value set used for filter rewrites. Set this limit to prevent excessive memory use. |
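
The idea behind rewriting a filter on a non-key column of the non-base table can be sketched as follows. This is a conceptual illustration only: the table contents and the function `rewrite_value_filter` are invented, and abandoning the rewrite when the correlated value set exceeds the limit is an assumption about the behavior, not something stated above.

```python
def rewrite_value_filter(non_base_rows, wanted_value, max_size):
    """Try to turn a filter on a non-base value column into base-table key values.

    Returns the sorted correlated keys, or None if the set would exceed max_size.
    """
    correlated_keys = set()
    for key, value in non_base_rows:  # this loop is the scan of the non-base table
        if value == wanted_value:
            correlated_keys.add(key)
            if len(correlated_keys) > max_size:
                return None  # assumed behavior: skip the rewrite to bound memory use
    return sorted(correlated_keys)


# Hypothetical non-base table mapping country codes (join key) to regions (value).
regions = [("US", "North America"), ("CA", "North America"), ("FR", "Europe")]

rewritten = rewrite_value_filter(regions, "North America", max_size=10000)
too_large = rewrite_value_filter(regions, "North America", max_size=1)
print(rewritten)
print(too_large)
```

A filter such as `region = 'North America'` on the non-base table can then be applied to the base table as a filter on the join key values `CA` and `US`, so fewer base rows reach the join.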

GroupBy

See the list of GroupBy query context parameters available on the groupBy query page.

Vectorization parameters

The GroupBy and Timeseries query types can run in vectorized mode, which speeds up query execution by processing batches of rows at a time. Not all queries can be vectorized. In particular, vectorization currently has the following requirements:

  • All query-level filters must either be able to run on bitmap indexes or must offer vectorized row-matchers. These include "selector", "bound", "in", "like", "regex", "search", "and", "or", and "not".
  • All filters in filtered aggregators must offer vectorized row-matchers.
  • All aggregators must offer vectorized implementations. These include "count", "doubleSum", "floatSum", "longSum", "longMin", "longMax", "doubleMin", "doubleMax", "floatMin", "floatMax", "longAny", "doubleAny", "floatAny", "stringAny", "hyperUnique", "filtered", "approxHistogram", "approxHistogramFold", and "fixedBucketsHistogram" (with numerical input).
  • All virtual columns must offer vectorized implementations. Currently for expression virtual columns, support for vectorization is decided on a per expression basis, depending on the type of input and the functions used by the expression. See the currently supported list in the expression documentation.
  • For GroupBy: All dimension specs must be "default" (no extraction functions or filtered dimension specs).
  • For GroupBy: No multi-value dimensions.
  • For Timeseries: No "descending" order.
  • Only immutable segments (not real-time).
  • Only table datasources (not joins, subqueries, lookups, or inline datasources).

Other query types (like TopN, Scan, Select, and Search) ignore the "vectorize" parameter and execute without vectorization, even if it is set to "force".

| property | default | description |
| --- | --- | --- |
| vectorize | true | Enables or disables vectorized query execution. Possible values are false (disabled), true (enabled if possible, disabled otherwise, on a per-segment basis), and force (enabled, and groupBy or timeseries queries that cannot be vectorized will fail). The "force" setting is meant to aid in testing, and is not generally useful in production (since real-time segments can never be processed with vectorized execution, any queries on real-time data will fail). This will override druid.query.default.context.vectorize if it's set. |
| vectorSize | 512 | Sets the row batching size for a particular query. This will override druid.query.default.context.vectorSize if it's set. |
| vectorizeVirtualColumns | false | Enables or disables vectorized query processing of queries with virtual columns, layered on top of vectorize (vectorize must also be set to true for a query to utilize vectorization). Possible values are false (disabled), true (enabled if possible, disabled otherwise, on a per-segment basis), and force (enabled, and groupBy or timeseries queries with virtual columns that cannot be vectorized will fail). The "force" setting is meant to aid in testing, and is not generally useful in production. This will override druid.query.default.context.vectorizeVirtualColumns if it's set. |
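
As a rough illustration of what "processing batches of rows at a time" means, the sketch below compares a row-at-a-time aggregation with a batched one whose batch size comes from a context dictionary. It models the control flow only: the function names are hypothetical, and plain Python gains none of the performance benefit that a real vectorized engine gets from batching.

```python
def sum_row_at_a_time(values):
    """Aggregate by visiting one row per loop iteration."""
    total = 0
    for value in values:
        total += value
    return total


def sum_in_batches(values, vector_size):
    """Aggregate by handing fixed-size batches of rows to the aggregator."""
    total = 0
    for start in range(0, len(values), vector_size):
        batch = values[start:start + vector_size]  # the last batch may be shorter
        total += sum(batch)
    return total


# Context values from the table above; vectorSize sets the row batching size.
context = {"vectorize": True, "vectorSize": 512}

rows = list(range(2000))
row_total = sum_row_at_a_time(rows)
batch_total = sum_in_batches(rows, context["vectorSize"])
print(row_total, batch_total)
```

Both paths produce the same result. The batched form makes 4 passes through the outer loop for 2000 rows instead of 2000.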