Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add PostgresViewEntry #183

Draft
wants to merge 15 commits into
base: main
Choose a base branch
from
Draft

Add PostgresViewEntry #183

wants to merge 15 commits into from

Conversation

Tishj
Copy link
Contributor

@Tishj Tishj commented Feb 20, 2024

In the table query we get both tables, views, materialized views, partitioned tables and foreign tables.
The views are then stored in the catalog as a TableCatalogEntry, which confuses duckdb_tables() and duckdb_views() which is used inside information_schema.tables

So the views showed up as tables, this PR fixes that and adds a test for it

Notable change:

In postgres, when views are created on a dummy binding, with no aliases, their resulting alias becomes "?column?":

(lldb) p view_catalog_entry.names
(duckdb::vector<std::string>) {
  std::__1::vector<std::__1::string, std::__1::allocator<std::__1::string> > = size=1 {
    [0] = "?column?"
  }
}

As a result, our binding will fail because the names don't match.
To get around this, we don't save the entry as "?column?" but instead use the name that DuckDB would attribute to the expression

public:
PostgresViewInfo() {
create_info = make_uniq<CreateViewInfo>();
// create_info->columns.SetAllowDuplicates(true);
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm not sure about this, the Table has a ColumnsList which provides this option, but no such option exists for views because we don't have a ColumnsList, we instead just store the types+names separately

@Tishj
Copy link
Contributor Author

Tishj commented Feb 20, 2024

This is lacking testing still, and there are probably virtual methods that need to be overridden

@Tishj Tishj marked this pull request as draft February 20, 2024 10:25
@Tishj
Copy link
Contributor Author

Tishj commented Mar 4, 2024

The failure is probably related to:

statement ok
CREATE VIEW s1.v1 AS SELECT 42 as j

Our error check is probably not a fan of this, because there is no existing table to reference

Or actually..
Both test_view.test and attach_views.test create or replace v1.
Maybe these are somehow happening concurrently, messing with the persistent postgres database file?

Copy link
Contributor

@Mytherin Mytherin left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can you merge with main? Also one minor comment:

: ViewCatalogEntry(catalog, schema, *info.create_info), postgres_types(std::move(info.postgres_types)),
postgres_names(std::move(info.postgres_names)) {
D_ASSERT(postgres_types.size() == types.size());
approx_num_pages = info.approx_num_pages;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

approx_num_pages should be 0 for a view always (or rather it probably shouldn't be a property at all?) - we use this to partition base tables but that is not possible for views

@Tishj
Copy link
Contributor Author

Tishj commented Mar 23, 2024

@Mytherin This seems like a problem, I can't really find the reason for this behavior

# Not aliased (no expression alias)
statement ok
CREATE VIEW s1.view_aliases.x AS SELECT 'x'

query I
select x from s1.view_aliases.x
----
{'CAST('x' AS VARCHAR)': x}

I've had to replace the CreateViewInfo names with the expression.ToString() if the name was "?column?", this seems to have this effect

@Mytherin
Copy link
Contributor

The fact that we are getting two different names (one from Postgres', one from DuckDB') indicates to me we are binding the same query both in Postgres and in DuckDB. That seems like a more fundamental problem to me. After all - if a view contains anything that DuckDB does not support, the binding in DuckDB will fail, whereas we should still be able to query the view.

Maybe we could try e.g. not pulling the SQL definition of the view into DuckDB so the binding is avoided?

@Mytherin
Copy link
Contributor

An example of a view that works in Postgres but wouldn't work in DuckDB might be:

create table my_table(i int);
create view my_view as select ctid from my_table;

@Tishj
Copy link
Contributor Author

Tishj commented Mar 27, 2024

An example of a view that works in Postgres but wouldn't work in DuckDB might be:

create table my_table(i int);
create view my_view as select ctid from my_table;

You're right, that doesn't work
But that also doesn't work for select ctid from s1.my_table, because we've never registered ctid as being part of my_table in our custom postgres catalog

That's another issue we can solve by turning the columns into separate catalog entries, we could add default entries for things like ctid.
And come to think of it, rowid as well for our internal use

@Tishj
Copy link
Contributor Author

Tishj commented Mar 27, 2024

The problem with this lies in Binder::Bind(BaseTableRef &ref)
Even if I remove the query from the created ViewCatalogEntry, this code looks up the catalog entry, binds it and compares the new results versus the old results, to throw the "Contents of view were altered: types don't match!" error.

(also setting the query to nullptr causes an InternalException here currently because this path assumes the query is always present)
Perhaps we want to just not do this verification step if the saved query is nullptr?

@Tishj
Copy link
Contributor Author

Tishj commented Mar 27, 2024

The problem here is that we first bind the view's query before we hand it off to postgres, so we have to be able to bind it ourselves (possibly delegating to the postgres catalog extension)

Then we create the entry in the postgres catalog, running the same query, then reloading to get the entry as it was created in postgres.
This saves it as "?column?" because that's what postgres assigned to it.

Then in our verification step inside Bind(BaseTableRef &ref) we bind the query again on duckdb's side, which produces a different name for the column.


The "skip view content verification if query is nullptr" solution could work in that case, because we won't bind the query on our side again.
That still leaves the problem of the first bind though, if system columns are referenced that we haven't explicitly made a part of the TableCatalogEntry the bind will fail (as evidenced by the create view my_view as select ctid from my_table example)

One solution there might be decompose the TableCatalogEntry into a catalog set containing columns, and using a DefaultGenerator to create the system columns
Another could be to always add the system columns manually in our (postgres)TableCatalogEntry creation code.

A more generic approach that will likely take the least effort is to add a virtual method on the table catalog entry for column retrieval, so we can hook in and return a column definition for "ctid" and friends, basically a poor mans "DefaultColumnGenerator"


The last idea to make this work is take a similar approach as BindCreateIndex does and let BindCreateView be a virtual method on Catalog so we can override the behavior in the PostgresCatalog
Skipping the initial bind, and also the rebind that is currently done to verify that "View contents were altered" did not occur

@Tishj Tishj self-assigned this Apr 18, 2024
@Tishj
Copy link
Contributor Author

Tishj commented Apr 18, 2024

We're likely going to have to separate out the SelectStatement inside the CreateViewInfo, then construct a postgres query from that, get the names + types from the PGresult object, and convert the type into a LogicalType

Oid PQftype(const PGresult *res,
            int column_number);

That gets us the Oid, then we can run another query to retrieve the information we need to transform PG -> LogicalType

One downside of this is that we have to actually execute the query to get the names+types
I have seen other solutions which involve creating a prepared statement and querying metadata from it, that might be interesting?

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants