Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Enum type support #193

Open
wants to merge 18 commits into
base: main
Choose a base branch
from
Open

Enum type support #193

wants to merge 18 commits into from

Conversation

Tishj
Copy link
Collaborator

@Tishj Tishj commented Sep 19, 2024

This PR adds support for Postgres's ENUM types.

In Postgres ENUM types can not be ephemeral (exist only for the duration of a query), they have to be registered, and thus we can't just recreate the ENUM type when converting back from DuckDB -> Postgres at the end of execution.

This detail caused the biggest development overhead, we need to vendor and inherit from a templated derived class that contains the guts of the DuckDB ENUM type so we can add Postgres's ENUM member oids to it.

template <class T>
class PGDuckDBEnumTypeInfo : public EnumTypeInfoTemplated<T> {
public:
	PGDuckDBEnumTypeInfo(Vector &values_insert_order_p, idx_t dict_size_p, Vector &oid_vec)
	    : EnumTypeInfoTemplated<T>(values_insert_order_p, dict_size_p), oid_vec(oid_vec) {
	}

public:
	const Vector &
	GetMemberOids() const {
		return oid_vec;
	}

private:
	Vector oid_vec;
};

We save the ENUM member oids into this, as this is what has to be returned in the query result.

test/regression/sql/enum.sql Outdated Show resolved Hide resolved
test/regression/sql/enum.sql Show resolved Hide resolved
include/pgduckdb/types/pgduckdb_enum.hpp Show resolved Hide resolved
include/pgduckdb/types/pgduckdb_enum.hpp Show resolved Hide resolved
src/pgduckdb_types.cpp Outdated Show resolved Hide resolved
test/regression/sql/enum.sql Outdated Show resolved Hide resolved
src/pgduckdb_types.cpp Outdated Show resolved Hide resolved
src/pgduckdb_types.cpp Outdated Show resolved Hide resolved
src/pgduckdb_types.cpp Outdated Show resolved Hide resolved
src/pgduckdb_types.cpp Outdated Show resolved Hide resolved
using duckdb::LogicalType;
using duckdb::Vector;

struct PGDuckDBEnum {
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should this be changed to a namespace? Now that this struct doesn't have any fields anymore?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yea possibly, in our codebase we're not too strict on static-membered-structs vs namespaces, I think when we also need a constexpr value here we can't use a namespace but can use a struct, so I almost always go for using a struct

@Tishj
Copy link
Collaborator Author

Tishj commented Sep 24, 2024

I'm waiting on #97 so I can add the tests with a > 'sad' etc, that will involve adding support for looking up type catalog entries, which is probably a nice bonus (but might be better as a separate PR)

Mytherin added a commit to duckdb/duckdb that referenced this pull request Sep 24, 2024
To further override `EnumTypeInfoTemplated` it needs to be either
vendored (bad) or we move it to a header.
Relevant `pg_duckdb` PR: <duckdb/pg_duckdb#193>
@JelteF JelteF added this to the 0.2.0 milestone Sep 30, 2024
@JelteF JelteF added the enhancement New feature or request label Sep 30, 2024
@wuputah
Copy link
Collaborator

wuputah commented Oct 1, 2024

Regarding the 0.2.0 milestone: while it's not a blocker for 0.1.0 but it would be great for this to land, especially considering it's basically done. Use of ENUMs is certainly not uncommon in Postgres.

Comment on lines +21 to +24
// To store additional metadata for a type, DuckDB's LogicalType class contains a 'type_info_' field.
// The type of this is ExtraTypeInfo, ENUMs make use of this in the form of EnumTypeInfo (see
// duckdb/include/common/extra_type_info.hpp). We derive from this further to store the ENUMs connection with the oids
// of Postgres' enum members.
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is a super helpful comment, thanks a lot.

@@ -68,6 +73,56 @@ IsIndexScan(const Path *nodePath) {
return false;
}

optional_ptr<CatalogEntry>
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Let's add a little comment about the assumptions of this function (feel free to correct this comment if my assumptions are incorrect)

Suggested change
optional_ptr<CatalogEntry>
/* Looks up the relevant postgres type and caches it in the SchemaItems.types for the next lookup. This is currently only used to lookup enum types. */
optional_ptr<CatalogEntry>

Comment on lines +113 to +114
dummy_attr.atttypmod = 0;
dummy_attr.attndims = 0;
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This seems incorrect if e.g. it's an array an of enum. I think it would be good to add a test for that, it doesn't have to actually work but at least it should throw a somewhat clear error and not crash.

Also I think it would be good to check here for now that the type is an enum, because that's really the only kind of type that we're comfortable with handling here.

Comment on lines +543 to +549
if (en1->enumsortorder < en2->enumsortorder) {
return true; // v1 < v2
} else if (en1->enumsortorder > en2->enumsortorder) {
return false; // v1 > v2
} else {
return false; // v1 == v2
}
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can be simplified like this:

Suggested change
if (en1->enumsortorder < en2->enumsortorder) {
return true; // v1 < v2
} else if (en1->enumsortorder > en2->enumsortorder) {
return false; // v1 > v2
} else {
return false; // v1 == v2
}
return en1->enumsortorder < en2->enumsortorder;

Comment on lines +11 to +15
// This is medium hacky, we create extra space in the Vector that would normally only hold the strings corresponding to
// the enum's members. In that extra space we store the enum member oids.

// We do this so we can find the OID from the offset (which is what DuckDB stores in a Vector of type ENUM) and return
// that when converting the result from DuckDB -> Postgres
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is not true anymore right? i.e. you switched back to the first with the vendored in template (until the next duckdb release)

#include "duckdb/common/serializer/deserializer.hpp"

// This is copied directly without changes from 'duckdb/src/common/extra_type_info.cpp'
// The reason this is copied is to be able to inherit from it.
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
// The reason this is copied is to be able to inherit from it.
// The reason this is copied is to be able to inherit from it.
// This can be removed once DuckDB 1.2.0 is released, which contains this PR:
// https://github.com/duckdb/duckdb/pull/14038

// that when converting the result from DuckDB -> Postgres

LogicalType
PGDuckDBEnum::CreateEnumType(std::vector<HeapTuple> &enum_members) {
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think it makes sense to merge this function ConvertPostgresEnumToDuckEnum. They are so strongly coupled that I don't think it has any benefit to separate them like this. i.e. ConvertPostgresEnumToDuckEnum is the only caller of this function and that this takes a vector of HeapTuples, which should already be sorted. If we'd keep them separate that requirement on already sorted should be documented in this function its comment, but if we merge the functions such a comment is not necessary.

@JelteF JelteF removed this from the 0.2.0 milestone Dec 10, 2024
@JelteF JelteF added this to the 0.3.0 milestone Dec 10, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
enhancement New feature or request
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants