
br/storage: enable async prefetch data #48587

Merged: 21 commits merged on Nov 24, 2023


lance6716 (Contributor) commented Nov 14, 2023

What problem does this PR solve?

Issue Number: close #48781

Problem Summary:

What is changed and how it works?

Offload network reads from the main goroutine of the merge iterator to a background goroutine.


Performance still needs further improvement in the future.

Check List

Tests

  • Unit test
  • Integration test
  • Manual test (add detailed scripts or steps below)
  • No need to test
    • I checked and no code files have been changed.

Use this command to run the specific "benchmark" unit test:

    go test ./br/pkg/lightning/backend/external -v --tags=intest -test.run TestCompareReader --testing-storage-uri "xxx"

On master with ks3

    bench_test.go:414: merge iter read speed for 1258291200 bytes: 109.49 MB/s

This PR with ks3

    bench_test.go:412: merge iter read speed for 1258291200 bytes: 121.49 MB/s

This PR with memstore

    bench_test.go:412: merge iter read speed for 1258291200 bytes: 163.92 MB/s
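For a sense of scale, the numbers above can be worked through: 1258291200 bytes is exactly 1200 MiB, and going from 109.49 to 121.49 MB/s on ks3 is roughly an 11% gain. The helper name relativeGainPct is made up for this sketch; the ratio does not depend on whether the benchmark's "MB" means 10^6 or 2^20 bytes.

```go
package main

import "fmt"

// relativeGainPct returns how much faster after is than before, in percent.
func relativeGainPct(before, after float64) float64 {
	return (after/before - 1) * 100
}

func main() {
	const totalBytes = 1258291200
	fmt.Printf("data read: %d MiB\n", totalBytes/(1<<20))                           // 1200 MiB
	fmt.Printf("ks3, PR vs master: +%.1f%%\n", relativeGainPct(109.49, 121.49))    // +11.0%
	fmt.Printf("PR, memstore vs ks3: +%.1f%%\n", relativeGainPct(121.49, 163.92)) // +34.9%
}
```

Since memstore keeps the data in memory, its figure is presumably closer to what the merge iterator can reach without network latency, which fits the note that performance can still be improved.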

I also checked heap usage: with 64MB set in the unit test, the heap did not exceed that limit.
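One way to spot-check a claim like this from inside a Go test is to sample runtime.MemStats. This is a hedged sketch: the PR does not say how the 64MB figure was configured or measured, and heapAllocWithin is a hypothetical helper. HeapAlloc only reflects allocated heap objects at the moment of sampling, so a real check would sample repeatedly while the merge iterator runs.

```go
package main

import (
	"fmt"
	"runtime"
)

// heapAllocWithin (hypothetical) reports the current HeapAlloc and whether
// it stays within limit bytes.
func heapAllocWithin(limit uint64) (uint64, bool) {
	var ms runtime.MemStats
	runtime.ReadMemStats(&ms) // briefly stops the world; acceptable in tests
	return ms.HeapAlloc, ms.HeapAlloc <= limit
}

func main() {
	const limit = 64 << 20 // "64MB" from the PR, interpreted here as 64 MiB
	used, ok := heapAllocWithin(limit)
	fmt.Printf("heap alloc %d bytes, within limit: %v\n", used, ok)
}
```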

[Screenshot: heap usage during the test]

Side effects

  • Performance regression: Consumes more CPU
  • Performance regression: Consumes more Memory
  • Breaking backward compatibility

Documentation

  • Affects user behaviors
  • Contains syntax changes
  • Contains variable changes
  • Contains experimental features
  • Changes MySQL compatibility

Release note


None

Signed-off-by: lance6716 <[email protected]>
ti-chi-bot added the do-not-merge/invalid-title, do-not-merge/needs-linked-issue, do-not-merge/needs-tests-checked, do-not-merge/work-in-progress, release-note-none, and size/L labels on Nov 14, 2023

tiprow bot commented Nov 14, 2023

Hi @lance6716. Thanks for your PR.

PRs from untrusted users cannot be marked as trusted with /ok-to-test in this repo, meaning untrusted PR authors can never trigger tests themselves. Collaborators can still trigger tests on the PR using /test all.

I understand the commands that are listed here.

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.


codecov bot commented Nov 14, 2023

Codecov Report

Merging #48587 (d731f28) into master (69028f1) will increase coverage by 1.7708%.
Report is 92 commits behind head on master.
The diff coverage is 83.5294%.

@@               Coverage Diff                @@
##             master     #48587        +/-   ##
================================================
+ Coverage   71.4085%   73.1794%   +1.7708%     
================================================
  Files          1404       1438        +34     
  Lines        407209     418451     +11242     
================================================
+ Hits         290782     306220     +15438     
+ Misses        96471      93351      -3120     
+ Partials      19956      18880      -1076     
Flag          Coverage Δ
integration   44.3869% <0.0000%> (?)
unit          71.5862% <83.5294%> (+0.1777%) ⬆️

Flags with carried forward coverage won't be shown.

Components    Coverage Δ
dumpling      53.9663% <ø> (-0.0212%) ⬇️
parser        ∅ <ø> (∅)
br            48.8598% <59.2592%> (-4.2218%) ⬇️

ti-chi-bot added the size/XL label and removed the size/L label on Nov 14, 2023
lance6716 changed the title from "[WIP] br/storage: enable async prefetch data" to "br/storage: enable async prefetch data" on Nov 14, 2023
ti-chi-bot removed the do-not-merge/invalid-title, do-not-merge/work-in-progress, and do-not-merge/needs-tests-checked labels on Nov 14, 2023
lance6716 added the skip-issue-check label on Nov 14, 2023
ywqzzy (Contributor) commented Nov 23, 2023

/cc @ywqzzy

ti-chi-bot requested a review from ywqzzy on November 23, 2023 06:21
buf := r.buf[r.bufIdx]
n, err := r.r.Read(buf)
buf = buf[:n]
r.bufCh <- buf
Contributor:

This might block forever if Close is called before Read. We should also wait for this goroutine to exit on Close.
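A minimal sketch of the shutdown the reviewer asks for, assuming the background goroutine hands chunks over a channel: each send also selects on a closed channel, and Close waits on a sync.WaitGroup until the goroutine has exited. The producer type and the fake "chunk-N" data are hypothetical stand-ins for the PR's reader and its r.r.Read call.

```go
package main

import (
	"fmt"
	"sync"
)

// producer (hypothetical) stands in for the prefetch goroutine's owner.
type producer struct {
	bufCh  chan []byte
	closed chan struct{}
	wg     sync.WaitGroup
}

func newProducer() *producer {
	p := &producer{bufCh: make(chan []byte), closed: make(chan struct{})}
	p.wg.Add(1)
	go p.run()
	return p
}

func (p *producer) run() {
	defer p.wg.Done()
	for i := 0; ; i++ {
		buf := []byte(fmt.Sprintf("chunk-%d", i)) // stands in for r.r.Read(buf)
		select {
		case p.bufCh <- buf:
		case <-p.closed:
			return // Close was called: exit instead of blocking on the send forever
		}
	}
}

// Close signals the goroutine and waits until it has exited.
func (p *producer) Close() {
	close(p.closed)
	p.wg.Wait()
}

func main() {
	p := newProducer()
	fmt.Println(string(<-p.bufCh)) // chunk-0
	p.Close()                      // returns although nobody receives chunk-1
	fmt.Println("closed")
}
```

Note that this Close panics if it is called twice, since it closes p.closed unconditionally.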

Contributor Author

D3Hunter (Contributor) left a comment:

Rest LGTM.

-	return r.r.Close()
+	ret := r.r.Close()
+	r.closeOnce.Do(func() {
+		close(r.closed)
Contributor:

We never close the reader twice, so closeOnce seems unnecessary.

Contributor Author:

I took a quick look at s3.go. In some cases, the underlying reader of s3ObjectReader is closed both at line 893 and at line 911:

tidb/br/pkg/storage/s3.go

Lines 872 to 905 in 3543275

func (r *s3ObjectReader) Read(p []byte) (n int, err error) {
	maxCnt := r.rangeInfo.End + 1 - r.pos
	if maxCnt > int64(len(p)) {
		maxCnt = int64(len(p))
	}
	n, err = r.reader.Read(p[:maxCnt])
	// TODO: maybe we should use !errors.Is(err, io.EOF) here to avoid error lint, but currently, pingcap/errors
	// doesn't implement this method yet.
	if err != nil && errors.Cause(err) != io.EOF && r.retryCnt < maxErrorRetries { //nolint:errorlint
		// if can retry, reopen a new reader and try read again
		end := r.rangeInfo.End + 1
		if end == r.rangeInfo.Size {
			end = 0
		}
		_ = r.reader.Close()
		newReader, _, err1 := r.storage.open(r.ctx, r.name, r.pos, end)
		if err1 != nil {
			log.Warn("open new s3 reader failed", zap.String("file", r.name), zap.Error(err1))
			return
		}
		r.reader = newReader
		r.retryCnt++
		n, err = r.reader.Read(p[:maxCnt])
	}
	r.pos += int64(n)
	return
}

// Close implement the io.Closer interface.
func (r *s3ObjectReader) Close() error {
	return r.reader.Close()
}

So I think we should tolerate callers calling Close() multiple times. However, none of the reader methods are thread-safe, so I will use a bool instead of sync.Once.
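The bool-based variant could look like the following sketch. It mirrors the closeOnce excerpt in forwarding Close to the underlying reader on every call and guarding only close(r.closed); the field name isClosed and the countingCloser test double are hypothetical. As the author notes, it is not safe for concurrent use.

```go
package main

import (
	"fmt"
	"io"
)

// countingCloser is a hypothetical underlying reader that counts Close calls.
type countingCloser struct{ closes int }

func (c *countingCloser) Read([]byte) (int, error) { return 0, io.EOF }
func (c *countingCloser) Close() error             { c.closes++; return nil }

// reader tolerates repeated Close calls. Like its other methods, Close is not
// safe for concurrent use, so a plain bool replaces sync.Once.
type reader struct {
	r        io.ReadCloser
	closed   chan struct{} // tells a background goroutine to stop
	isClosed bool
}

func (r *reader) Close() error {
	ret := r.r.Close() // forwarded on every call, as in the excerpt
	if !r.isClosed {
		r.isClosed = true
		close(r.closed) // closing a channel twice would panic
	}
	return ret
}

func main() {
	c := &countingCloser{}
	r := &reader{r: c, closed: make(chan struct{})}
	_ = r.Close()
	_ = r.Close()         // no panic
	fmt.Println(c.closes) // 2
}
```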

@@ -88,7 +100,12 @@ func (r *Reader) Read(data []byte) (int, error) {
}
}

-// Close implements io.Closer.
+// Close implements io.Closer. Close should not be called concurrently with Read.
Contributor:

It seems most io.Reader implementations are not thread-safe anyway.

Contributor Author:

The io.Reader interface does not restrict how it may be called; for example, the reader returned by io.Pipe is thread-safe. So I want to make callers aware of the thread-safety requirement.

func (r *Reader) Close() error {
-	return r.r.Close()
+	ret := r.r.Close()
Contributor:

I'm not sure whether the underlying reader is safe for calling Close and Read concurrently.

Contributor Author:

I don't think we can assume that, so I added the comment.

require.EqualValues(t, 11, n)
_, err = r.Read(buf)
require.ErrorIs(t, err, io.EOF)
}
Contributor:

Please add a case where len(read buf) > len(source []byte).

Contributor Author

ti-chi-bot added the needs-1-more-lgtm label on Nov 24, 2023

ti-chi-bot bot commented Nov 24, 2023

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: 3pointer, D3Hunter

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

ti-chi-bot added the approved and lgtm labels and removed the needs-1-more-lgtm label on Nov 24, 2023

ti-chi-bot bot commented Nov 24, 2023

[LGTM Timeline notifier]

Timeline:

  • 2023-11-24 02:28:30.212715299 +0000 UTC m=+544138.877941487: ☑️ agreed by D3Hunter.
  • 2023-11-24 02:34:39.402813724 +0000 UTC m=+544508.068039919: ☑️ agreed by 3pointer.

lance6716 (Contributor Author)

/retest


tiprow bot commented Nov 24, 2023

@lance6716: Cannot trigger testing until a trusted user reviews the PR and leaves an /ok-to-test message.

In response to this:

/retest

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.

lance6716 (Contributor Author)

/retest


tiprow bot commented Nov 24, 2023

@lance6716: Cannot trigger testing until a trusted user reviews the PR and leaves an /ok-to-test message.

In response to this:

/retest

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.

ti-chi-bot merged commit c771e8b into pingcap:master on Nov 24, 2023
10 of 11 checks passed
Labels: approved, lgtm, release-note-none, size/XL, skip-issue-check
Development

Successfully merging this pull request may close these issues.

prefetch data from network at background
4 participants