Commit
Don't use the exact size from the cache for query planning (#1736)
Until now, when part of a query was already in the cache, the query planner used the exact size of the cached result as its cost estimate. This seems natural, because the exact size is by definition the best possible estimate. However, under special circumstances, it can lead to poor query plans when the other estimates are off: what matters is not that each individual estimate is accurate, but that together the estimates lead to a reasonable query plan.

NOTE: This is a trivial change. A bigger issue that we might want to address in the future is that computing the average multiplicity of a column as the average multiplicity of its distinct elements might not be optimal. The average of the squares might be more appropriate (note that the sum of the squares of the individual multiplicities is exactly the size of the result one would get when joining that column with itself).
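The observation about squared multiplicities can be checked on a toy column. The following is a minimal sketch in Python, not code from this repository; the function names and the sample column are hypothetical and chosen only for illustration. It compares a brute-force self-join count with the sum of the squared multiplicities, and also computes the plain average multiplicity for contrast.

```python
from collections import Counter


def average_multiplicity(column):
    """Plain average: number of entries divided by number of distinct values."""
    counts = Counter(column)
    return len(column) / len(counts)


def sum_of_squared_multiplicities(column):
    """Sum over all distinct values of (multiplicity of that value) squared."""
    counts = Counter(column)
    return sum(c * c for c in counts.values())


def self_join_size(column):
    """Brute-force size of joining the column with itself on equality."""
    return sum(1 for a in column for b in column if a == b)


# Hypothetical skewed column: one frequent value and two rare ones.
column = ["a"] * 8 + ["b", "c"]

exact = self_join_size(column)
from_squares = sum_of_squared_multiplicities(column)

# Estimate of the self-join size that assumes every entry matches
# `average_multiplicity` entries on the other side.
from_average = len(column) * average_multiplicity(column)

print(exact, from_squares, from_average)
```

On this skewed column, the estimate based on the plain average multiplicity is well below the exact self-join size, while the sum of squares matches it exactly. How large the gap is in practice depends on how skewed the real data is.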