sql/rowenc: reduce index key prefix calls #139256
152b12d to 45ba888
using benchdiff:
we seem to consistently get an improvement for allocs/op for our
rebased with the concurrency updates on master:
we now consistently see both 10k and 1mil improvements in allocs/op
45ba888 to 5570dcb
5570dcb to 0dee148
i'm still not 100% sure why yet, but the inverted index tests were failing due to the shared key prefixes change; specifically, we need to look at a case where the inverted column (?term) has dupes:
when we are creating our batch of index entries, an encoding for (1, [], ...) gets created, making our entries look something like
i tried to see if this was an instance of me assuming something is passed by value instead of by reference, but no luck so far
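One Go pitfall that can produce duplicate-looking entries when a key prefix is shared is slice aliasing through `append`. The thread does not confirm this as the cause here; the sketch below only illustrates the general mechanism. The helper names `appendShared` and `appendCopied` and the 2-byte prefix are hypothetical.

```go
package main

import "fmt"

// appendShared appends suffix directly onto the shared prefix. If prefix has
// spare capacity, the returned slice reuses prefix's backing array.
func appendShared(prefix []byte, suffix byte) []byte {
	return append(prefix, suffix)
}

// appendCopied caps the prefix's capacity with a full slice expression, so
// append has to allocate a fresh backing array for every key.
func appendCopied(prefix []byte, suffix byte) []byte {
	return append(prefix[:len(prefix):len(prefix)], suffix)
}

func main() {
	// Hypothetical 2-byte key prefix with room to grow.
	prefix := make([]byte, 2, 16)
	prefix[0], prefix[1] = 0x01, 0x02

	a := appendShared(prefix, 'a')
	b := appendShared(prefix, 'b')
	// a and b share storage, so the second append overwrote a's suffix.
	fmt.Printf("shared: a=%q b=%q\n", a, b)

	c := appendCopied(prefix, 'c')
	d := appendCopied(prefix, 'd')
	// c and d each own their storage and keep distinct suffixes.
	fmt.Printf("copied: c=%q d=%q\n", c, d)
}
```

If a cached prefix is reused across entries, capping its capacity (or copying it) before appending is one way to keep each entry's key independent.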
0dee148 to 9f62d46
Reviewable status: complete! 0 of 0 LGTMs obtained (waiting on @annrpom)
pkg/sql/rowenc/index_encoding.go
line 1651 at r3 (raw file):
tableDesc catalog.TableDescriptor, indexes []catalog.Index, keyPrefixMap map[descpb.IndexID][]byte,
instead of making this a map, let's make this a slice. the function comment should be updated to say that the `indexes` and the `keyPrefixes` slice should both have the same ordering. i expect that to result in less overhead, since looking up in a slice is cheaper than looking up in a map.
we can create this `keyPrefixes` slice in the same place where we create the `indexesToEncode` slice:
cockroach/pkg/sql/backfill/backfill.go
Lines 775 to 778 in cca9535
    ib.indexesToEncode = ib.added
    if len(ib.predicates) > 0 {
        ib.indexesToEncode = make([]catalog.Index, 0, len(ib.added))
    }
cockroach/pkg/sql/backfill/backfill.go
Lines 944 to 965 in cca9535
    ib.indexesToEncode = ib.indexesToEncode[:0]
    for _, idx := range ib.added {
        if !idx.IsPartial() {
            // If the index is not a partial index, all rows should have
            // an entry.
            ib.indexesToEncode = append(ib.indexesToEncode, idx)
            continue
        }
        // If the index is a partial index, only include it if the
        // predicate expression evaluates to true.
        texpr := ib.predicates[idx.GetID()]
        val, err := eval.Expr(ctx, ib.evalCtx, texpr)
        if err != nil {
            return nil, nil, memUsedPerChunk, err
        }
        if val == tree.DBoolTrue {
            ib.indexesToEncode = append(ib.indexesToEncode, idx)
        }
    }
unrelated to this PR, that second snippet makes me realize how much more expensive this is if any partial indexes are being backfilled. we have to re-populate `ib.indexesToEncode` for each row in that case.
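A minimal sketch of the parallel-slice idea from this comment: the prefix for `indexes[i]` lives at `keyPrefixes[i]`, so the encoder reads it by position instead of doing a map lookup per index. `Index`, `encodeKeys`, and the byte layout are hypothetical stand-ins, not the real `catalog.Index` or `rowenc` APIs.

```go
package main

import "fmt"

// Index is a hypothetical stand-in for catalog.Index.
type Index struct{ ID uint32 }

// encodeKeys builds one key per index by joining the prefix stored at the
// same position with a row-specific suffix. It requires that indexes and
// keyPrefixes have the same length and ordering.
func encodeKeys(indexes []Index, keyPrefixes [][]byte, rowSuffix []byte) ([][]byte, error) {
	if len(indexes) != len(keyPrefixes) {
		return nil, fmt.Errorf("got %d indexes but %d key prefixes", len(indexes), len(keyPrefixes))
	}
	keys := make([][]byte, len(indexes))
	for i := range indexes {
		// Copy into a fresh buffer so keys never alias a shared prefix.
		key := make([]byte, 0, len(keyPrefixes[i])+len(rowSuffix))
		key = append(key, keyPrefixes[i]...)
		key = append(key, rowSuffix...)
		keys[i] = key
	}
	return keys, nil
}

func main() {
	indexes := []Index{{ID: 1}, {ID: 2}}
	keyPrefixes := [][]byte{{0x0a}, {0x0b}}
	keys, err := encodeKeys(indexes, keyPrefixes, []byte{0xff})
	if err != nil {
		panic(err)
	}
	fmt.Printf("%x\n", keys)
}
```

The length check makes the ordering contract explicit at the call boundary, which is the part a map-based signature would not need.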
This patch removes redundant calls to `MakeIndexKeyPrefix` during the construction of `IndexEntry`s by saving each first-time call in a map that we can later look up. Previously, we would make this call for each row; however, as the prefix (table id + index id) for a particular index remains the same, we do not need to do any recomputation.
Epic: CRDB-42901
Fixes: cockroachdb#137798
Release note: None

Epic: none
Release note: None
9f62d46 to 9017591
Reviewable status: complete! 0 of 0 LGTMs obtained (waiting on @rafiss and @yuzefovich)
pkg/sql/rowenc/index_encoding.go
line 1651 at r3 (raw file):
Previously, rafiss (Rafi Shamim) wrote…
ok done; can you check me on this change i made:
cockroach/pkg/sql/backfill/backfill.go
Lines 970 to 972 in 774bc22
    } else {
        ib.keyPrefixes[i] = nil
    }
my thought was that unless we clear out these key prefixes, our `indexes` and `keyPrefixes` slices are not guaranteed to maintain the same ordering as each other (since in the case of partial indexes, we only include the index if the pred evals to true for that row). though this would mean no improvement in the case of any partial indexes being backfilled. maybe there is another way to handle this case with slices, wdyt
Reviewable status: complete! 0 of 0 LGTMs obtained (waiting on @annrpom and @yuzefovich)
pkg/sql/rowenc/index_encoding.go
line 1651 at r3 (raw file):
Previously, annrpom (annie pompa) wrote…
i believe the logic works, but IMO it's hard to understand the code for two reasons:
- making the computation lazy makes it harder to understand which parts of the code will actually compute the value.
- modifying function parameters makes it harder to reason about what the function's inputs and outputs are.
i left a suggestion above
pkg/sql/backfill/backfill.go
line 780 at r5 (raw file):
    // reset in BuildIndexEntriesChunk for every row added.
    ib.indexesToEncode = ib.added
    ib.keyPrefixes = make([][]byte, len(ib.added))
instead of computing the prefixes lazily inside of `EncodeSecondaryIndexes`, they can all just be computed here with an additional loop.
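A rough sketch of what that additional loop could look like. `Index`, `computeKeyPrefixes`, and `makeIndexKeyPrefix` are hypothetical; in particular, `makeIndexKeyPrefix` only stands in for `MakeIndexKeyPrefix`, and its fixed-width byte layout is an assumption made so the example runs on its own.

```go
package main

import (
	"encoding/binary"
	"fmt"
)

// Index is a hypothetical stand-in for catalog.Index.
type Index struct{ ID uint32 }

// makeIndexKeyPrefix is a placeholder for MakeIndexKeyPrefix. The real
// function uses a different encoding; this one just packs both IDs.
func makeIndexKeyPrefix(tableID, indexID uint32) []byte {
	b := make([]byte, 8)
	binary.BigEndian.PutUint32(b[:4], tableID)
	binary.BigEndian.PutUint32(b[4:], indexID)
	return b
}

// computeKeyPrefixes eagerly builds one prefix per added index, keeping the
// same ordering as the input slice, so callers never compute them lazily.
func computeKeyPrefixes(tableID uint32, added []Index) [][]byte {
	keyPrefixes := make([][]byte, len(added))
	for i, idx := range added {
		keyPrefixes[i] = makeIndexKeyPrefix(tableID, idx.ID)
	}
	return keyPrefixes
}

func main() {
	prefixes := computeKeyPrefixes(7, []Index{{ID: 1}, {ID: 2}})
	fmt.Printf("%x\n", prefixes)
}
```

Computing everything up front means the encoding function can treat the prefixes as a read-only input rather than filling them in as a side effect.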
pkg/sql/backfill/backfill.go
line 970 at r5 (raw file):
    if val == tree.DBoolTrue {
        ib.indexesToEncode = append(ib.indexesToEncode, idx)
    } else {
IMO this would be easier to understand if the `keyPrefixes` logic mirrored the `indexesToEncode` logic:
    if len(ib.predicates) > 0 {
        ib.indexesToEncode = ib.indexesToEncode[:0]
        ib.keyPrefixes = ib.keyPrefixes[:0]
        for _, idx := range ib.added {
            if !idx.IsPartial() {
                // If the index is not a partial index, all rows should have
                // an entry.
                ib.indexesToEncode = append(ib.indexesToEncode, idx)
                ib.keyPrefixes = append(ib.keyPrefixes, MakeIndexKeyPrefix(...))
                continue
            }
            // If the index is a partial index, only include it if the
            // predicate expression evaluates to true.
            texpr := ib.predicates[idx.GetID()]
            val, err := eval.Expr(ctx, ib.evalCtx, texpr)
            if err != nil {
                return nil, nil, memUsedPerChunk, err
            }
            if val == tree.DBoolTrue {
                ib.indexesToEncode = append(ib.indexesToEncode, idx)
                ib.keyPrefixes = append(ib.keyPrefixes, MakeIndexKeyPrefix(...))
            }
        }
    }
we can worry about optimizing this for partial indexes later (for example, by saving all of the prefixes somewhere so we can avoid calling `MakeIndexKeyPrefix` for each row).
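One possible shape for that later optimization, sketched under assumptions: keep a full set of prefixes computed once (`allPrefixes`, parallel to `added`), and have the per-row filter copy slice headers into `keyPrefixes` in lockstep with `indexesToEncode`. The `backfiller` struct, `selectForRow`, and the `matches` callback are hypothetical simplifications of the real backfiller and its predicate evaluation.

```go
package main

import "fmt"

// Index is a hypothetical stand-in for catalog.Index.
type Index struct {
	ID      uint32
	Partial bool
}

// backfiller holds the per-row selection state. allPrefixes[i] belongs to
// added[i] and is computed once, before any rows are processed.
type backfiller struct {
	added           []Index
	allPrefixes     [][]byte
	indexesToEncode []Index
	keyPrefixes     [][]byte
}

// selectForRow resets both output slices and refills them together, so that
// keyPrefixes[j] always belongs to indexesToEncode[j]. matches reports
// whether a partial index's predicate holds for the current row.
func (b *backfiller) selectForRow(matches func(Index) bool) {
	b.indexesToEncode = b.indexesToEncode[:0]
	b.keyPrefixes = b.keyPrefixes[:0]
	for i, idx := range b.added {
		if idx.Partial && !matches(idx) {
			continue
		}
		b.indexesToEncode = append(b.indexesToEncode, idx)
		b.keyPrefixes = append(b.keyPrefixes, b.allPrefixes[i])
	}
}

func main() {
	b := &backfiller{
		added:       []Index{{ID: 1}, {ID: 2, Partial: true}, {ID: 3}},
		allPrefixes: [][]byte{{1}, {2}, {3}},
	}
	b.selectForRow(func(Index) bool { return false })
	fmt.Println(len(b.indexesToEncode), len(b.keyPrefixes))
}
```

Because both slices are appended to in the same branch, the ordering guarantee holds by construction, and no prefix is recomputed per row.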
sql/rowenc: reduce index key prefix calls
This patch removes redundant calls to `MakeIndexKeyPrefix` during
the construction of `IndexEntry`s by saving each first-time call in a
map that we can later look up. Previously, we would make this call
for each row; however, as the prefix (table id + index id) for a
particular index remains the same, we do not need to do any
recomputation.
Epic: CRDB-42901
Fixes: #137798
Release note: None
sql/rowexec: run BenchmarkIndexBackfill on-disk
Epic: none
Release note: None