Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

idxtype: add idxtype package #139786

Merged
merged 4 commits into from
Jan 28, 2025
Merged

idxtype: add idxtype package #139786

merged 4 commits into from
Jan 28, 2025

Conversation

andy-kimball
Copy link
Contributor

@andy-kimball andy-kimball commented Jan 24, 2025

Add new sql/sem/idxtype package that defines an index type enum with values
like forward, inverted, and vector, as well as "capability" functions describing
what the index can or cannot do. Code across SQL needs to check the index
type and previously it used multiple different enums and bools. This commit
continues the process of unifying these mechanisms (future commits will do
even more).

Epic: CRDB-42943

Release note: None

@andy-kimball andy-kimball requested review from a team as code owners January 24, 2025 22:10
@andy-kimball andy-kimball requested review from jeffswenson and removed request for a team January 24, 2025 22:10
@cockroach-teamcity
Copy link
Member

This change is Reviewable

@andy-kimball
Copy link
Contributor Author

andy-kimball commented Jan 24, 2025

Note that there are 3 commits in this PR: the 1st commit contains mostly Foundations changes and the 2nd commit mostly Queries changes. The 3rd commit adds Supports and Allows methods in the idxtype package, so that we consolidate the code that checks index capabilities.

@andy-kimball andy-kimball removed the request for review from jeffswenson January 25, 2025 00:35
Copy link
Collaborator

@mgartner mgartner left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

:lgtm: Thanks for cleaning things up!

Reviewed 101 of 101 files at r1, 40 of 40 files at r2, 13 of 13 files at r3, all commit messages.
Reviewable status: :shipit: complete! 1 of 0 LGTMs obtained (waiting on @andy-kimball and @rafiss)


pkg/sql/sem/idxtype/idxtype.proto line 21 at r1 (raw file):

  FORWARD = 0;
  // INVERTED can contain multiple entries for each row in the table, in order
  // to index collection or composite data types like JSONB, ARRAY, and

nit: missing a word in "... in order to index collection..."?


pkg/sql/sem/idxtype/idxtype.go line 11 at r3 (raw file):

// contains unique keys sorted according to the primary ordering of the table.
// Secondary indexes refer to rows in the primary index by unique key value.
func CanBePrimary(t T) bool {

I'm curious if you thought about making these methods of T instead of functions. It'd make calling them slightly more succinct. Maybe there is an benefit to functions I'm not seeing. Feel free to leave as-is—I don't feel strongly.


pkg/sql/opt/optbuilder/groupby.go line 993 at r3 (raw file):

	for i := 1; i < tab.IndexCount(); i++ {
		index := tab.Index(i)
		if !index.IsUnique() || idxtype.CanBeUnique(index.Type()) {

I think this should be !idxtype.CanBeUnique(index.Type()).


pkg/sql/schemachanger/scbuild/internal/scbuildstmt/create_index.go line 311 at r3 (raw file):

	if columnNode.OpClass != "" && (!lastColIdx || !idxtype.SupportsOpClass(n.Type)) {
		panic(pgerror.New(pgcode.DatatypeMismatch,
			"operator classes are only allowed for the last column of an inverted index"))

nit: "inverted" should be the index type name like above


pkg/sql/schemachanger/scbuild/internal/scbuildstmt/create_index.go line 316 at r3 (raw file):

	if idxtype.AllowExplicitDirection(n.Type) && columnNode.Direction == tree.Descending && lastColIdx {
		panic(pgerror.New(pgcode.FeatureNotSupported,
			"the last column in an inverted index cannot have the DESC option"))

nit: "inverted" should be the index type name like above


pkg/sql/randgen/schema.go line 264 at r3 (raw file):

			continue
		}
		if idxtype.CanBePrimary(indexDef.Type) && pk != nil {

nit: !idxtype.CanBePrimary(indexDef.Type)

@andy-kimball andy-kimball force-pushed the idxtype branch 2 times, most recently from 5730419 to 982504d Compare January 27, 2025 16:29
Copy link
Contributor Author

@andy-kimball andy-kimball left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reviewable status: :shipit: complete! 0 of 0 LGTMs obtained (and 1 stale) (waiting on @mgartner and @rafiss)


pkg/sql/opt/optbuilder/groupby.go line 993 at r3 (raw file):

Previously, mgartner (Marcus Gartner) wrote…

I think this should be !idxtype.CanBeUnique(index.Type()).

You're right. This caused a test failure as well, which shows we have good test coverage here.


pkg/sql/randgen/schema.go line 264 at r3 (raw file):

Previously, mgartner (Marcus Gartner) wrote…

nit: !idxtype.CanBePrimary(indexDef.Type)

Done.


pkg/sql/schemachanger/scbuild/internal/scbuildstmt/create_index.go line 311 at r3 (raw file):

Previously, mgartner (Marcus Gartner) wrote…

nit: "inverted" should be the index type name like above

This one is trickier, b/c it uses "an", where vector needs to use "a". In a future PR where I enable vector indexes, I'll look more closely at this and other cases. For now, I changed the error messages to use "an inverted or vector index".


pkg/sql/schemachanger/scbuild/internal/scbuildstmt/create_index.go line 316 at r3 (raw file):

Previously, mgartner (Marcus Gartner) wrote…

nit: "inverted" should be the index type name like above

Same here.


pkg/sql/sem/idxtype/idxtype.go line 11 at r3 (raw file):

Previously, mgartner (Marcus Gartner) wrote…

I'm curious if you thought about making these methods of T instead of functions. It'd make calling them slightly more succinct. Maybe there is an benefit to functions I'm not seeing. Feel free to leave as-is—I don't feel strongly.

Yeah, I went back and forth. I worry a bit about setting the precedent of having complex methods on a simple idxtype enum. While these methods are simple right now, I worry that people will add more and more complexity over time, e.g. by adding methods that involve coltypes. I was also thinking about a future where we have a much more elaborate index type abstraction, similar to what PG has. But maybe we never get there, and we should just do what is most ergonomic right now - I went ahead and switched it.


pkg/sql/sem/idxtype/idxtype.proto line 21 at r1 (raw file):

Previously, mgartner (Marcus Gartner) wrote…

nit: missing a word in "... in order to index collection..."?

I reworded this.

@andy-kimball
Copy link
Contributor Author

What do people think about defining the alias type IndexType = idxtype.T? While on the one hand it seems to follow the precedent of tree defining its own types, it also means that devs will have the cognitive burden of deciding whether to use, e.g., idxtype.INVERTED vs. tree.IndexTypeInverted. This is not a hard decision in the tree package itself, but it's more subtle in the schema change packages, where both are used. I end up using tree.IndexType where I'm interacting with the AST and idxtype.T where I'm not, but it's really easy to mix them up. Should we just use idxtype directly in the tree package?

Copy link
Collaborator

@mgartner mgartner left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think we should get rid of the alias.

Reviewed 51 of 51 files at r4, 38 of 38 files at r5, 14 of 16 files at r6, 3 of 3 files at r7, all commit messages.
Reviewable status: :shipit: complete! 0 of 0 LGTMs obtained (and 1 stale) (waiting on @rafiss)


pkg/sql/schemachanger/scbuild/internal/scbuildstmt/create_index.go line 311 at r3 (raw file):

Previously, andy-kimball (Andy Kimball) wrote…

This one is trickier, b/c it uses "an", where vector needs to use "a". In a future PR where I enable vector indexes, I'll look more closely at this and other cases. For now, I changed the error messages to use "an inverted or vector index".

SGTM.


pkg/sql/sem/idxtype/idxtype.go line 11 at r3 (raw file):

Previously, andy-kimball (Andy Kimball) wrote…

Yeah, I went back and forth. I worry a bit about setting the precedent of having complex methods on a simple idxtype enum. While these methods are simple right now, I worry that people will add more and more complexity over time, e.g. by adding methods that involve coltypes. I was also thinking about a future where we have a much more elaborate index type abstraction, similar to what PG has. But maybe we never get there, and we should just do what is most ergonomic right now - I went ahead and switched it.

👍


pkg/sql/sem/idxtype/idxtype.proto line 21 at r1 (raw file):

Previously, andy-kimball (Andy Kimball) wrote…

I reworded this.

Ahh I see now that it was grammatically correct the first time, I was confusing "index" as a noun instead of a verb. It's easier to read now though.

Copy link
Collaborator

@rafiss rafiss left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm fine with not having the alias. We definitely should be pretty aggressive with the disallowed_imports in the idxtype/BUILD.bazel file to ensure that it remains a lightweight leaf package.

tree already has its own disallowed list. the new package should at least match that, but perhaps be even more aggressive if it can be.

disallowed_imports_test(
"tree",
disallowed_list = [
"//pkg/kv",
"//pkg/roachpb",
"//pkg/security",
"//pkg/server",
"//pkg/sql/sessiondata",
"//pkg/storage",
"//pkg/util/log",
],
disallowed_prefixes = [
"pkg/sql/catalog",
],

Reviewable status: :shipit: complete! 0 of 0 LGTMs obtained (and 1 stale) (waiting on @andy-kimball)

@andy-kimball andy-kimball requested review from DarrylWong, angles-n-daemons, arjunmahishi, aa-joshi, jbowens and wenyihu6 and removed request for a team January 27, 2025 20:08
Copy link

blathers-crl bot commented Jan 27, 2025

Your pull request contains more than 1000 changes. It is strongly encouraged to split big PRs into smaller chunks.

It looks like your PR touches SQL parser code but doesn't add or edit parser tests. Please make sure you add or edit parser tests if you edit the parser.

🦉 Hoot! I am a Blathers, a bot for CockroachDB. My owner is dev-inf.

Copy link

blathers-crl bot commented Jan 27, 2025

It looks like your PR touches SQL parser code but doesn't add or edit parser tests. Please make sure you add or edit parser tests if you edit the parser.

🦉 Hoot! I am a Blathers, a bot for CockroachDB. My owner is dev-inf.

@andy-kimball
Copy link
Contributor Author

I added a commit that uses idxtype directly in tree. I also added a BUILD.bazel rule to the idxtype package that ensures it stays a leaf package.

@andy-kimball andy-kimball force-pushed the idxtype branch 2 times, most recently from a1711fc to 86d982a Compare January 27, 2025 21:24
Copy link
Collaborator

@rafiss rafiss left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

thanks for this organizational improvement!

@andy-kimball andy-kimball force-pushed the idxtype branch 2 times, most recently from 774b98f to 523ee6c Compare January 27, 2025 23:25
Copy link
Collaborator

@mgartner mgartner left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reviewed 1 of 301 files at r14, 3 of 56 files at r15.
Reviewable status: :shipit: complete! 0 of 0 LGTMs obtained (and 1 stale) (waiting on @aa-joshi, @andy-kimball, @angles-n-daemons, @arjunmahishi, @DarrylWong, @herkolategan, @jbowens, and @wenyihu6)


-- commits line 44 at r18:
nit: Looks like the leaf package check was included in the previous commit, not this one.

@andy-kimball andy-kimball force-pushed the idxtype branch 2 times, most recently from 5993bd5 to 0ffa011 Compare January 28, 2025 00:34
Add new sql/sem/idxtype package that defines an index type enum with values
like forward, inverted, and vector. Code across SQL needs to check the index
type and previously it used multiple different enums and bools. This commit
continues the process of unifying these mechanisms (future commits will do
even more).

Epic: CRDB-42943

Release note: None
Now that we have the new idxtype.T enumeration that specifies the type
of index (e.g. forward, inverted, vector), use that in the opt packages
rather than IsInverted and IsVector boolean functions. This will make
our code more extensible, such that adding new index types is easier,
starting with vector indexes.

Epic: CRDB-42943

Release note: None
Add helper functions that specify the capabilities of various index
types. For example, only FORWARD indexes can be primary or unique,
and only INVERTED and VECTOR indexes support prefix columns.

Use these new helper functions across the SQL packages.

Epic: CRDB-42943

Release note: None
Update the tree package to use idxtype.T directly rather than using an
alias for it. Add a check to the idxtype BUILD.bazel package that ensures
it stays a leaf package, so that the tree package doesn't take on
additional dependencies beyond what idxtype needs.

Epic: CRDB-42943

Release note: None
@andy-kimball
Copy link
Contributor Author

bors=rafiss,mgartner

@mgartner
Copy link
Collaborator

bors r=rafiss,mgartner

@craig craig bot merged commit 5043527 into cockroachdb:master Jan 28, 2025
22 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants