Don't use the exact size from the cache for query planning (#1736)
Previously, when the result of a part of a query was already in the cache, the query planner used its exact size as the size estimate. This sounds natural, because the exact size is by definition the best possible estimate. However, under certain circumstances it can lead to poor query plans when the other estimates are off: what matters is not that each individual estimate is accurate, but that the estimates together lead to a reasonable query plan.

NOTE: This is a trivial change. A bigger issue that we might want to address in the future is that it might not be optimal to compute the average multiplicity of a column as the average multiplicity of its distinct elements. Instead, the average of the squares might be more appropriate (note that the sum of the squares of the individual multiplicities is exactly the size of the result one would get when joining that column with itself).
joka921 authored Jan 31, 2025
1 parent 1f38ba3 commit f2562fe
Showing 2 changed files with 8 additions and 12 deletions.
14 changes: 5 additions & 9 deletions src/engine/QueryExecutionTree.cpp
@@ -91,15 +91,11 @@ size_t QueryExecutionTree::getCostEstimate() {
 // _____________________________________________________________________________
 size_t QueryExecutionTree::getSizeEstimate() {
   if (!sizeEstimate_.has_value()) {
-    if (cachedResult_) {
-      AD_CORRECTNESS_CHECK(cachedResult_->isFullyMaterialized());
-      sizeEstimate_ = cachedResult_->idTable().size();
-    } else {
-      // if we are in a unit test setting and there is no QueryExecutionContest
-      // specified it is the rootOperation_'s obligation to handle this case
-      // correctly
-      sizeEstimate_ = rootOperation_->getSizeEstimate();
-    }
+    // Note: Previously we used the exact size instead of the estimate for
+    // results that were already in the cache. This however often led to poor
+    // planning, because the query planner compared exact sizes with estimates,
+    // which led to worse plans than just consistently choosing the estimate.
+    sizeEstimate_ = rootOperation_->getSizeEstimate();
   }
   return sizeEstimate_.value();
 }
6 changes: 3 additions & 3 deletions test/OperationTest.cpp
@@ -242,12 +242,12 @@ TEST(OperationTest, estimatesForCachedResults) {
     [[maybe_unused]] auto res = qet->getResult();
   }
   // The result is now cached inside the static execution context, if we create
-  // the same operation again, the cost estimate is 0 and the size estimate is
-  // exact (3 rows).
+  // the same operation again, the cost estimate is 0. The size estimate doesn't
+  // change (see the `getSizeEstimate` function for details on why).
   {
     auto qet = makeQet();
     EXPECT_EQ(qet->getCacheKey(), qet->getRootOperation()->getCacheKey());
-    EXPECT_EQ(qet->getSizeEstimate(), 3u);
+    EXPECT_EQ(qet->getSizeEstimate(), 24u);
     EXPECT_EQ(qet->getCostEstimate(), 0u);
   }
 }
