core/translate: refactor query planner again to be simpler #853

jussisaurio · 2025-02-01T17:41:29Z

Simplify bookkeeping of referenced tables in the query planner

This PR refactors the way we track referenced tables and associated planner operations related to them (scans, index searches, etc).

The problem with what we currently have:

We have a tree data structure called SourceOperator which is either a Join, Scan, Search, Subquery or Nothing.

/**
  A SourceOperator is a Node in the query plan that reads data from a table.
*/
#[derive(Clone, Debug)]
pub enum SourceOperator {
    // Join operator
    // This operator is used to join two source operators.
    // It takes a left and right source operator, a list of predicates to evaluate,
    // and a boolean indicating whether it is an outer join.
    Join {
        id: usize,
        left: Box<SourceOperator>,
        right: Box<SourceOperator>,
        predicates: Option<Vec<ast::Expr>>,
        outer: bool,
        using: Option<ast::DistinctNames>,
    },
    // Scan operator
    // This operator is used to scan a table.
    // It takes a table to scan and an optional list of predicates to evaluate.
    // The predicates are used to filter rows from the table.
    // e.g. SELECT * FROM t1 WHERE t1.foo = 5
    // The iter_dir are uset to indicate the direction of the iterator.
    // The use of Option for iter_dir is aimed at implementing a conservative optimization strategy: it only pushes
    // iter_dir down to Scan when iter_dir is None, to prevent potential result set errors caused by multiple
    // assignments. for more detailed discussions, please refer to https://github.com/penberg/limbo/pull/376
    Scan {
        id: usize,
        table_reference: TableReference,
        predicates: Option<Vec<ast::Expr>>,
        iter_dir: Option<IterationDirection>,
    },
    // Search operator
    // This operator is used to search for a row in a table using an index
    // (i.e. a primary key or a secondary index)
    Search {
        id: usize,
        table_reference: TableReference,
        search: Search,
        predicates: Option<Vec<ast::Expr>>,
    },
    Subquery {
        id: usize,
        table_reference: TableReference,
        plan: Box<SelectPlan>,
        predicates: Option<Vec<ast::Expr>>,
    },
    // Nothing operator
    // This operator is used to represent an empty query.
    // e.g. SELECT * from foo WHERE 0 will eventually be optimized to Nothing.
    Nothing {
        id: usize,
    },
}

Logically joins are a tree, but this is at least marginally bad for performance because each Join has two boxed child operators, and so e.g. for a 3-table query you have, for example, 3 Scan nodes and then 2 Join nodes.
There are other bigger problems too, though, related to code structure. We have been carrying around a separate vector of referenced_tables that columns can refer to by index:

/// A query plan has a list of TableReference objects, each of which represents a table or subquery.
#[derive(Clone, Debug)]
pub struct TableReference {
    /// Table object, which contains metadata about the table, e.g. columns.
    pub table: Table,
    /// The name of the table as referred to in the query, either the literal name or an alias e.g. "users" or "u"
    pub table_identifier: String,
    /// The index of this reference in the list of TableReference objects in the query plan
    /// The reference at index 0 is the first table in the FROM clause, the reference at index 1 is the second table in the FROM clause, etc.
    /// So, the index is relevant for determining when predicates (WHERE, ON filters etc.) should be evaluated.
    pub table_index: usize,
    /// The type of the table reference, either BTreeTable or Subquery
    pub reference_type: TableReferenceType,
}

referenced_tables is used because SQLite joins are an n^tables_len nested loop, and we need to figure out during which loop to evaluate a condition expression. A lot of plumbing in the current code exists for this, e.g. "pushing predicates" in optimizer even though "predicate pushdown" as a query planner concept is an optimization, but in our current system the "pushdown" is really a necessity to move the condition expressions to the correct SourceOperator::predicates vector so that they are evaluated at the right point.
referenced_tables is also used to map identifiers in the query to the correct table, e.g. 'foo' SELECT foo FROM users becomes an ast::Expr::Column { table: 0, .. } if users is the first table in referenced_tables.

In addition to this, we ALSO had a TableReferenceType separately for checking whether the upper-level query is reading from a BTree table or a Subquery.

/// The type of the table reference, either BTreeTable or Subquery
#[derive(Clone, Debug, PartialEq, Eq)]
pub enum TableReferenceType {
    /// A BTreeTable is a table that is stored on disk in a B-tree index.
    BTreeTable,
    /// A subquery.
    Subquery {
        /// The index of the first register in the query plan that contains the result columns of the subquery.
        result_columns_start_reg: usize,
    },
}

...even though we already have an Operator::Subquery that should be able to encode this information, but doesn't, because it's a tree and we refer to things by index in referenced_tables.

Why this is especially stupid

SourceOperator and TableReference are basically just two representations of the same thing, one in tree format and another in vector format. SourceOperator even carries around its own copy of TableReference, even though we ALSO have referenced_tables: Vec<TableReference> 🤡

Note that I'm allowed to call the existing code stupid because I wrote it.

What we can do instead

Basically, we can just fold the concerns from SourceOperator into TableReference and have a list of those in the query plan, one per table, in loop order (outermost loop is 0, and so on).

Funnily enough, when Limbo had very very few features we used to have a Vec of LoopInfos similarly, obviously with a lot less information than now, but for SQLite it's probably the right abstraction. :)

pub struct SelectPlan {
    /// List of table references in loop order, outermost first.
    pub table_references: Vec<TableReference>,
    ...etc...
}

/// A table reference in the query plan.
/// For example, SELECT * FROM users u JOIN products p JOIN (SELECT * FROM users) sub
/// has three table references:
/// 1. operation=Scan, table=users, table_identifier=u, reference_type=BTreeTable, join_info=None
/// 2. operation=Scan, table=products, table_identifier=p, reference_type=BTreeTable, join_info=Some(JoinInfo { outer: false, using: None }),
/// 3. operation=Subquery, table=users, table_identifier=sub, reference_type=Subquery, join_info=None
#[derive(Debug, Clone)]
pub struct TableReference {
    /// The operation that this table reference performs.
    pub op: Operation,
    /// Table object, which contains metadata about the table, e.g. columns.
    pub table: Table,
    /// The name of the table as referred to in the query, either the literal name or an alias e.g. "users" or "u"
    pub identifier: String,
    /// The join info for this table reference if it is the right side of a join (which all except the first table reference have)
    pub join_info: Option<JoinInfo>,
}

And we keep the "operation" part from the "operator", but in a simple form:

pub enum Operation {
    Scan {
        iter_dir: Option<IterationDirection>,
    },
    Search(Search),
    Subquery {
        plan: Box<SelectPlan>,
        result_columns_start_reg: usize
    },
}

Now we don't need to carry around both the operator tree and Vec<TableReference>, because they are the same thing. If something refers to the n'th table, it is just plan.table_references[n].

We also don't need to recurse through the operator tree and usually we can just loop from outermost table to innermost table.

Handling the "where to evaluate a condition expression" problem

You can see I've also removed the predicates vector from Scan and friends. Previously each SourceOperator had a vector of predicates so that we knew at which loop depth to evaluate a condition. Now we align more with what SQLite does -- it puts all the conditions, even the join conditions, in the WHERE clause and adds extra metadata to them:

/// In a query plan, WHERE clause conditions and JOIN conditions are all folded into a vector of JoinAwareConditionExpr.
/// This is done so that we can evaluate the conditions at the correct loop depth.
/// We also need to keep track of whether the condition came from an OUTER JOIN. Take this example:
/// SELECT * FROM users u LEFT JOIN products p ON u.id = 5.
/// Even though the condition only refers to 'u', we CANNOT evaluate it at the users loop, because we need to emit NULL
/// values for the columns of 'p', for EVERY row in 'u', instead of completely skipping any rows in 'u' where the condition is false.
#[derive(Debug, Clone)]
pub struct JoinAwareConditionExpr {
    /// The original condition expression.
    pub expr: ast::Expr,
    /// Is this condition originally from an OUTER JOIN?
    /// If so, we need to evaluate it at the loop of the right table in that JOIN,
    /// regardless of which tables it references.
    /// We also cannot e.g. short circuit the entire query in the optimizer if the condition is statically false.
    pub from_outer_join: bool,
    /// The loop index where to evaluate the condition.
    /// For example, in `SELECT * FROM u JOIN p WHERE u.id = 5`, the condition can already be evaluated at the first loop (idx 0),
    /// because that is the rightmost table that it references.
    pub eval_at_loop: usize,
}

Final notes

I've been wanting to make this refactor for a long time now, but the last straw was when I was making a PR trying to reduce some of the massive amount of allocations happening in the read path currently, and I got stuck because of this Operator + referenced_tables shit getting in the way constantly. So I decided I wanted to get it over with now.

The PR is again very big, but I've artificially split it up into commits that don't individually compile but at least separate the changes a bit for you, dear reader.

- Get rid of SourceOperator tree - Make plan have a Vec of TableReference, and TableReference now contains the information from the old SourceOperator. - Remove `predicates` (conditions) from Table References -- put everything in the WHERE clause like SQLite, and attach metadata to the where clause expressions with JoinAwareConditionExpr struct. - Refactor select_star() to be simpler now that we use a vec, not a tree

- use new TableReference and JoinAwareConditionExpr - add utilities for determining at which loop depth a WHERE condition should be evaluated, now that "operators" do not carry condition expressions inside them anymore.

Now that we do not have a tree of SourceOperators but rather a Vec of TableReferences, we can just use loops instead of recursion for handling the main query loop.

…ary stuff We don't need `push_predicates()` because that never REALLY was a predicate pushdown optimization -- it just pushed WHERE clause condition expressions into the correct SourceOperator nodes in the tree. Now that we don't have a SourceOperator tree anymore and we keep the conditions in the WHERE clause instead, we don't need to "push" anything anymore. Leaves room for ACTUAL predicate pushdown optimizations later :) We also don't need any weird bitmask stuff anymore, and perhaps we never did, to determine where conditions should be evaluated.

jussisaurio force-pushed the operator-vec branch 5 times, most recently from 8edc5c8 to 26eefbd Compare February 1, 2025 21:02

jussisaurio marked this pull request as ready for review February 1, 2025 21:11

jussisaurio added 5 commits February 1, 2025 23:22

Change Display implementation of Plan to work with new data structures

deef42b

planner.rs: refactor from/join + where parsing logic

a3dd5ab

- use new TableReference and JoinAwareConditionExpr - add utilities for determining at which loop depth a WHERE condition should be evaluated, now that "operators" do not carry condition expressions inside them anymore.

select.rs: use new data structures when parsing select

6bc3579

delete.rs: use new data structures when parsing delete

a8b32f4

jussisaurio force-pushed the operator-vec branch from 26eefbd to e0a6527 Compare February 1, 2025 21:23

jussisaurio added 4 commits February 1, 2025 23:37

main_loop.rs: use iteration instead of recursion

b43c6f4

Now that we do not have a tree of SourceOperators but rather a Vec of TableReferences, we can just use loops instead of recursion for handling the main query loop.

subquery.rs: use iteration instead of recursion and simplify

4553a10

Marginal changes to use new data structures and field names

bb17c79

jussisaurio force-pushed the operator-vec branch from e0a6527 to bb17c79 Compare February 1, 2025 21:38

jussisaurio changed the title ~~core/translate: simplify bookkeeping of referenced tables in planner~~ core/translate: refactor query planner again to be simpler Feb 1, 2025

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

core/translate: refactor query planner again to be simpler #853

core/translate: refactor query planner again to be simpler #853

jussisaurio commented Feb 1, 2025 •

edited

Loading

core/translate: refactor query planner again to be simpler #853

Are you sure you want to change the base?

core/translate: refactor query planner again to be simpler #853

Conversation

jussisaurio commented Feb 1, 2025 • edited Loading

Simplify bookkeeping of referenced tables in the query planner

The problem with what we currently have:

Why this is especially stupid

What we can do instead

Handling the "where to evaluate a condition expression" problem

Final notes

jussisaurio commented Feb 1, 2025 •

edited

Loading