[FEA] AST filtering in parquet reader (#13348)
This PR adds AST-based filter predicate pushdown to the parquet reader, starting with row group filtering. The statistics of the columns of each row group are loaded into a device column, and an AST filter is applied to the min and max of each column to select the row groups to read. The user-given AST must first be converted to another AST that can be applied to the min and max values of each column (the 'Statistics AST'). After the row groups are parsed, the user-given AST is applied to the output columns to filter out any remaining rows in the row groups.

A new `column_name_reference` is introduced to help users create ASTs that reference columns by name, since the user may not know the column indices before reading. Because the AST engine accepts only column index references, a transformation is applied to the user-given AST. Two new AST transformation classes are introduced:

1. `named_to_reference_converter` - Converts column name references to column index references
2. `stats_expression_converter` - Converts the above output table filtering AST to the 'Statistics AST'.

Note: `column_name_reference` is supported only for predicate pushdown filtering. It is not supported for other AST operations such as transforms or joins.

- [x] #13472
- [x] Convert column chunk min, max to cudf type column.
- [x] Add AST filter interface to parquet reader options
- [x] Convert AST to Statistics AST
- [x] Apply statistics AST on Stats values to get row_groups
- [x] Apply AST as filter on output columns.

Depends on #13472

Authors:
- Karthikeyan (https://github.com/karthikeyann)

Approvers:
- Mike Wilson (https://github.com/hyperbolic2346)
- Bradley Dice (https://github.com/bdice)
- Ray Douglass (https://github.com/raydouglass)

URL: #13348
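The two-stage filtering described above can be illustrated with a small host-only sketch. It does not use libcudf: `row_group_stats`, `cmp_op`, `may_match`, `select_row_groups`, and `filter_rows` are hypothetical names invented for this illustration. It also assumes a single integer column and a predicate of the form `col <op> value`. The real implementation evaluates ASTs on device columns.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical per-row-group statistics for a single integer column.
struct row_group_stats {
  int min;
  int max;
};

// Hypothetical comparison operators supported by this toy filter.
enum class cmp_op { LESS, GREATER, EQUAL };

// "Statistics" form of the predicate `col <op> value`: evaluated on min/max only.
// Returns true if the row group may contain at least one matching row.
inline bool may_match(row_group_stats const& s, cmp_op op, int value)
{
  assert(s.min <= s.max);  // statistics must describe a valid range
  switch (op) {
    case cmp_op::LESS: return s.min < value;     // some row < value only if min < value
    case cmp_op::GREATER: return s.max > value;  // some row > value only if max > value
    case cmp_op::EQUAL: return s.min <= value && value <= s.max;
  }
  return true;  // unknown operator: keep the row group to stay conservative
}

// Stage 1: select the row groups to read, using statistics only.
inline std::vector<std::size_t> select_row_groups(std::vector<row_group_stats> const& stats,
                                                  cmp_op op,
                                                  int value)
{
  std::vector<std::size_t> selected;
  for (std::size_t i = 0; i < stats.size(); ++i) {
    if (may_match(stats[i], op, value)) { selected.push_back(i); }
  }
  return selected;
}

// The original predicate, evaluated on an actual row value.
inline bool row_matches(int x, cmp_op op, int value)
{
  switch (op) {
    case cmp_op::LESS: return x < value;
    case cmp_op::GREATER: return x > value;
    case cmp_op::EQUAL: return x == value;
  }
  return false;
}

// Stage 2: apply the original predicate to the decoded rows, since a selected
// row group can still contain rows that do not match.
inline std::vector<int> filter_rows(std::vector<int> const& rows, cmp_op op, int value)
{
  std::vector<int> out;
  for (int x : rows) {
    if (row_matches(x, op, value)) { out.push_back(x); }
  }
  return out;
}
```

With row groups covering `[0, 9]`, `[10, 19]`, and `[20, 29]`, the predicate `col > 15` skips the first row group entirely. The second stage is still needed because the `[10, 19]` group contains rows that fail the predicate.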
1 parent 2231b15, commit fa09cca
Showing 21 changed files with 1,635 additions and 69 deletions.
@@ -0,0 +1,64 @@
/*
 * Copyright (c) 2023, NVIDIA CORPORATION.
 *
 * Licensed under the Apache License, Version 2.0 (the "License");
 * you may not use this file except in compliance with the License.
 * You may obtain a copy of the License at
 *
 *     http://www.apache.org/licenses/LICENSE-2.0
 *
 * Unless required by applicable law or agreed to in writing, software
 * distributed under the License is distributed on an "AS IS" BASIS,
 * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
 * See the License for the specific language governing permissions and
 * limitations under the License.
 */
#pragma once

#include <cudf/ast/expressions.hpp>

namespace cudf::ast::detail {
/**
 * @brief Base "visitor" pattern class with the `expression` class for expression transformer.
 *
 * This class can be used to implement recursive traversal of AST tree, and used to validate or
 * translate an AST expression.
 */
class expression_transformer {
 public:
  /**
   * @brief Visit a literal expression.
   *
   * @param expr Literal expression
   * @return Reference wrapper of transformed expression
   */
  virtual std::reference_wrapper<expression const> visit(literal const& expr) = 0;

  /**
   * @brief Visit a column reference expression.
   *
   * @param expr Column reference expression
   * @return Reference wrapper of transformed expression
   */
  virtual std::reference_wrapper<expression const> visit(column_reference const& expr) = 0;

  /**
   * @brief Visit an operation expression.
   *
   * @param expr Operation expression
   * @return Reference wrapper of transformed expression
   */
  virtual std::reference_wrapper<expression const> visit(operation const& expr) = 0;

  /**
   * @brief Visit a column name reference expression.
   *
   * @param expr Column name reference expression
   * @return Reference wrapper of transformed expression
   */
  virtual std::reference_wrapper<expression const> visit(column_name_reference const& expr) = 0;

  virtual ~expression_transformer() {}
};
}  // namespace cudf::ast::detail
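
To show how a transformer built on this kind of interface might convert name references to index references, here is a self-contained sketch that does not depend on libcudf. All types in it (`node`, `index_ref`, `name_ref`, `binary_op`, `transformer`, `name_to_index`) are hypothetical stand-ins invented for illustration. They are not the actual `cudf::ast` classes or the `named_to_reference_converter` from this commit. The sketch assumes the transformer owns every node it creates, which is one reason a `visit` overload can return a `std::reference_wrapper` rather than an owning pointer.

```cpp
#include <cassert>
#include <cstddef>
#include <functional>
#include <memory>
#include <stdexcept>
#include <string>
#include <utility>
#include <vector>

struct node;
struct index_ref;
struct name_ref;
struct binary_op;

using node_ref = std::reference_wrapper<node const>;

// Toy counterpart of the visitor interface: one overload per node type.
struct transformer {
  virtual node_ref visit(index_ref const& n) = 0;
  virtual node_ref visit(name_ref const& n)  = 0;
  virtual node_ref visit(binary_op const& n) = 0;
  virtual ~transformer() = default;
};

// Base node: accept() dispatches to the visit() overload for the concrete type.
struct node {
  virtual node_ref accept(transformer& t) const = 0;
  virtual ~node() = default;
};

struct index_ref : node {
  std::size_t index;
  explicit index_ref(std::size_t i) : index(i) {}
  node_ref accept(transformer& t) const override { return t.visit(*this); }
};

struct name_ref : node {
  std::string name;
  explicit name_ref(std::string n) : name(std::move(n)) {}
  node_ref accept(transformer& t) const override { return t.visit(*this); }
};

struct binary_op : node {
  char op;
  node const& lhs;
  node const& rhs;
  binary_op(char o, node const& l, node const& r) : op(o), lhs(l), rhs(r) {}
  node_ref accept(transformer& t) const override { return t.visit(*this); }
};

// Rewrites every name_ref into an index_ref by looking the name up in a schema.
class name_to_index : public transformer {
 public:
  explicit name_to_index(std::vector<std::string> names) : names_(std::move(names)) {}

  // Index references are already resolved, so they are returned unchanged.
  node_ref visit(index_ref const& n) override { return node_ref{n}; }

  node_ref visit(name_ref const& n) override
  {
    for (std::size_t i = 0; i < names_.size(); ++i) {
      if (names_[i] == n.name) { return keep(std::make_unique<index_ref>(i)); }
    }
    throw std::invalid_argument("unknown column name: " + n.name);
  }

  // Recursively transform both operands, then rebuild the operation.
  node_ref visit(binary_op const& n) override
  {
    node_ref lhs = n.lhs.accept(*this);
    node_ref rhs = n.rhs.accept(*this);
    return keep(std::make_unique<binary_op>(n.op, lhs.get(), rhs.get()));
  }

 private:
  // Stores a newly created node so that returned references stay valid
  // for as long as this transformer is alive.
  node_ref keep(std::unique_ptr<node> p)
  {
    owned_.push_back(std::move(p));
    return node_ref{*owned_.back()};
  }

  std::vector<std::string> names_;
  std::vector<std::unique_ptr<node>> owned_;
};
```

Calling `expr.accept(converter)` on the root of a tree returns a reference to the rewritten root. In this sketch the result must not outlive the `name_to_index` object, because the converter holds the storage for all new nodes.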