Fix mongo db client to use GridFS API on 16MB exceed file #1077
Modified to create a snapshot of files larger than 16MB and upload them to MongoDB using GridFS.
jiwon.yum seems not to be a GitHub user. You need a GitHub account to be able to sign the CLA. If you already have a GitHub account, please add the email address used for this commit to your account. You have signed the CLA already but the status is still pending? Let us recheck it.
Walkthrough
The changes in this pull request enhance the MongoDB client implementation by modifying the
Actionable comments posted: 3
🧹 Outside diff range and nitpick comments (5)
server/backend/database/testcases/testcases.go (3)
1623-1623: Add documentation for exported function.
Following Go's documentation standards and the pattern used in other test functions, add a comment explaining the purpose of this test function. The comment should begin with the function's own name.
+// CreateLargeSnapshotTest runs the CreateLargeSnapshot test for the given db.
 func CreateLargeSnapshotTest(t *testing.T, db database.Database, projectID types.ID) {
🧰 Tools
🪛 GitHub Check: build
[failure] 1623-1623: exported: exported function CreateLargeSnapshotTest should have comment or be unexported (revive)
1636-1639: Consider using random data for more robust testing.
The current test uses repeating A-Z characters. Consider using crypto/rand to generate random data for more thorough testing.
-largeData := make([]byte, 16*1024*1024+1) // 16MB + 1 byte
-for i := range largeData {
-	largeData[i] = byte('A' + (i % 26)) // A-Z 반복
-}
+largeData := make([]byte, 16*1024*1024+1) // 16MB + 1 byte
+if _, err := rand.Read(largeData); err != nil {
+	t.Fatal(err)
+}
1638-1638: Translate Korean comments to English.
For consistency and maintainability, translate the Korean comments to English.
-	largeData[i] = byte('A' + (i % 26)) // A-Z 반복
+	largeData[i] = byte('A' + (i % 26)) // Repeating A-Z characters
-	// 스냅샷 생성 및 오류 확인
+	// Create snapshot and verify no errors
Also applies to: 1648-1648
server/backend/database/mongo/client.go (2)
1073-1122: Translate code comments to English for consistency
Several comments between lines 1073 and 1122 are in Korean, while the rest of the code comments are in English. For consistency and maintainability, please translate all comments to English.
🧰 Tools
🪛 GitHub Check: build
[failure] 1097-1097: Error return value of uploadStream.Close is not checked (errcheck)
1082-1082: Improve log message for clarity
The log message "16MB over!!!" is not very descriptive. Consider providing a clearer message to help with debugging and maintenance.
Apply this diff to improve the log message:
-log.Println("16MB over!!!")
+log.Printf("Snapshot size exceeds %d bytes; using GridFS for storage", maxSnapshotSize)
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
📒 Files selected for processing (3)
- server/backend/database/mongo/client.go (2 hunks)
- server/backend/database/testcases/testcases.go (1 hunk)
- test/complex/mongo_client_test.go (1 hunk)
🧰 Additional context used
🪛 GitHub Check: build
server/backend/database/mongo/client.go
[failure] 1097-1097: Error return value of uploadStream.Close is not checked (errcheck)
server/backend/database/testcases/testcases.go
[failure] 1623-1623: exported: exported function CreateLargeSnapshotTest should have comment or be unexported (revive)
🔇 Additional comments (1)
test/complex/mongo_client_test.go (1)
175-177: LGTM! Verify the test implementation.
The test case follows the established patterns and is appropriately placed within the sharded DB test suite.
Let's verify the actual test implementation:
✅ Verification successful
Test implementation properly validates GridFS storage for large snapshots
The test implementation is well-designed and correctly validates the core functionality:
- Creates a document with data larger than 16MB (16MB + 1 byte)
- Verifies the snapshot creation succeeds without errors
- Matches the GridFS implementation in the MongoDB client which handles documents > 16MB
The test case effectively covers the PR's objective of fixing snapshot overflow using GridFS.
🏁 Scripts executed
The following scripts were executed for the analysis:
Script:
#!/bin/bash
# Description: Verify the implementation of CreateLargeSnapshotTest
# Check the implementation of CreateLargeSnapshotTest
rg -A 10 "func CreateLargeSnapshotTest" server/backend/database/testcases/
# Verify if the test actually creates a large document
rg -B 2 -A 5 "16.*MB|GridFS" server/backend/database/testcases/
Length of output: 1795
Script:
#!/bin/bash
# Let's verify the complete test implementation and assertions
# Get the full test implementation
rg -B 5 -A 30 "func CreateLargeSnapshotTest" server/backend/database/testcases/testcases.go
# Check if there are any GridFS related implementations in the database package
rg -A 5 "GridFS|gridfs" server/backend/database/
Length of output: 4533
// Create the snapshot and check for errors
err := db.CreateSnapshotInfo(ctx, docRefKey, doc.InternalDocument())
assert.NoError(t, err)
🛠️ Refactor suggestion
Add validation for the stored snapshot.
The test only verifies that the snapshot was created without error. Consider adding validation to ensure the stored snapshot matches the original document.
err := db.CreateSnapshotInfo(ctx, docRefKey, doc.InternalDocument())
assert.NoError(t, err)
+
+// Retrieve and validate the stored snapshot
+snapshot, err := db.FindClosestSnapshotInfo(ctx, docRefKey, 0, false)
+assert.NoError(t, err)
+assert.NotNil(t, snapshot)
+
+// Verify the snapshot contains the large field
+var storedDoc document.InternalDocument
+err = snapshot.LoadSnapshot(&storedDoc)
+assert.NoError(t, err)
+assert.Equal(t, largeData, storedDoc.Root().GetBytes("largeField"))
Committable suggestion skipped: line range outside the PR's diff.
if err != nil {
	return fmt.Errorf("failed to open GridFS upload stream: %w", err)
}
defer uploadStream.Close()
Handle error returned by uploadStream.Close()
The deferred call to uploadStream.Close() does not check for errors. Since Close() can return an error, it's important to handle it to ensure resources are properly released and any potential errors are captured.
Apply this diff to handle the error:
- defer uploadStream.Close()
+ defer func() {
+ if err := uploadStream.Close(); err != nil {
+ log.Printf("Failed to close GridFS upload stream: %v", err)
+ }
+ }()
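The diff above logs the Close error. An alternative, sketched below with the standard library only, is to propagate it through a named return value so the caller sees the failure. This may matter if, as seems likely for an upload stream, Close is where buffered data is flushed. `failingCloser` and `writeAndClose` are hypothetical names, and the fake stream stands in for the GridFS upload stream rather than reproducing the driver's behavior. `errors.Join` requires Go 1.20 or later.

```go
package main

import (
	"errors"
	"fmt"
	"io"
)

// failingCloser is a hypothetical stand-in for an upload stream whose
// Close call fails, for example because the final flush did not succeed.
type failingCloser struct{ written int }

func (f *failingCloser) Write(p []byte) (int, error) {
	f.written += len(p)
	return len(p), nil
}

func (f *failingCloser) Close() error { return errors.New("flush failed") }

// writeAndClose writes data to w and always closes it. A Close error is
// joined into the returned error instead of being logged and dropped.
func writeAndClose(w io.WriteCloser, data []byte) (err error) {
	defer func() {
		if cerr := w.Close(); cerr != nil {
			err = errors.Join(err, fmt.Errorf("close upload stream: %w", cerr))
		}
	}()

	if _, err = w.Write(data); err != nil {
		return fmt.Errorf("write upload stream: %w", err)
	}
	return nil
}

func main() {
	err := writeAndClose(&failingCloser{}, []byte("snapshot"))
	fmt.Println(err)
}
```

With this shape, metadata that refers to the uploaded file can be written only after `writeAndClose` returns nil, which avoids recording a file ID for an upload that never completed.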
Actionable comments posted: 1
🧹 Outside diff range and nitpick comments (5)
server/backend/database/mongo/client.go (3)
1078-1081: Improve constant documentation and logging
- Add documentation for the constant to explain its purpose.
- Enhance the log message to be more descriptive and include the actual size.
Apply this diff:
-const maxSnapshotSize = 16 * 1024 * 1024 // 16MB
+// maxSnapshotSize is MongoDB's maximum document size limit (16MB)
+const maxSnapshotSize = 16 * 1024 * 1024
 if len(snapshot) > maxSnapshotSize {
-	log.Println("16MB over!!!")
+	log.Printf("Snapshot size %d bytes exceeds MongoDB's limit of %d bytes, using GridFS", len(snapshot), maxSnapshotSize)
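One detail worth checking around this threshold: MongoDB's 16MB limit applies to the whole BSON document, including field names and the other fields stored next to the snapshot bytes. A snapshot slightly under 16MB could therefore still fail a regular insert. The sketch below shows a threshold with some headroom; `metadataHeadroom` and its value are assumptions chosen for illustration, not figures from the PR.

```go
package main

import "fmt"

const (
	// maxBSONDocumentSize is MongoDB's limit for a single BSON document.
	maxBSONDocumentSize = 16 * 1024 * 1024
	// metadataHeadroom is a hypothetical allowance for the fields stored
	// alongside the snapshot bytes (IDs, version vector, timestamps).
	metadataHeadroom = 64 * 1024
)

// needsGridFS reports whether a snapshot of the given size should be
// stored in GridFS instead of inline in the snapshot document.
func needsGridFS(snapshotSize int) bool {
	return snapshotSize > maxBSONDocumentSize-metadataHeadroom
}

func main() {
	for _, size := range []int{1024, maxBSONDocumentSize - 1, maxBSONDocumentSize + 1} {
		fmt.Printf("%d bytes -> GridFS: %t\n", size, needsGridFS(size))
	}
}
```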
1090-1090: Improve GridFS filename format
The current filename format could be enhanced to include more metadata for better organization and debugging.
Apply this diff:
-uploadStream, err := bucket.OpenUploadStream(fmt.Sprintf("%s_snapshot", docRefKey.DocID))
+uploadStream, err := bucket.OpenUploadStream(fmt.Sprintf("%s/%s_snapshot_%d", docRefKey.ProjectID, docRefKey.DocID, doc.Checkpoint().ServerSeq))
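Pulling the format string into a small function makes the naming scheme testable on its own. This is a sketch only; `snapshotFileName` and its plain string and integer parameters are hypothetical simplifications of the `docRefKey` and checkpoint values used in the diff.

```go
package main

import "fmt"

// snapshotFileName builds a GridFS filename from the project ID, the
// document ID, and the server sequence of the snapshot. The function and
// its parameter types are hypothetical simplifications for this sketch.
func snapshotFileName(projectID, docID string, serverSeq int64) string {
	return fmt.Sprintf("%s/%s_snapshot_%d", projectID, docID, serverSeq)
}

func main() {
	fmt.Println(snapshotFileName("project-1", "doc-42", 7))
}
```

Including the server sequence means two snapshots of the same document get distinct names, which the original `%s_snapshot` format does not guarantee.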
1115-1126: Consider using transactions for snapshot operations
For data consistency, consider wrapping the snapshot creation (both GridFS and regular) in a transaction to ensure atomicity.
Example transaction implementation:
session, err := c.client.StartSession()
if err != nil {
	return fmt.Errorf("failed to start session: %w", err)
}
defer session.EndSession(ctx)

_, err = session.WithTransaction(ctx, func(sessCtx mongo.SessionContext) (interface{}, error) {
	// Your snapshot creation code here
	return nil, nil
})
server/backend/database/testcases/testcases.go (2)
1623-1623: Add documentation for exported function.
The exported function RunCreateLargeSnapshotTest should have a documentation comment explaining its purpose and behavior, following Go best practices and maintaining consistency with other test functions in this file.
Add this documentation:
+// RunCreateLargeSnapshotTest runs the CreateSnapshotInfo test for large documents for the given db.
 func RunCreateLargeSnapshotTest(t *testing.T, db database.Database, projectID types.ID) {
1636-1639: Define constants and optimize memory usage.
The test uses magic numbers for size calculation and could be optimized:
- Define constants for clarity
- Consider using a more memory-efficient way to generate test data
Refactor the data generation:
+const (
+	megabyte        = 1024 * 1024
+	maxDocumentSize = 16 * megabyte
+)
+
-largeData := make([]byte, 16*1024*1024+1)
-for i := range largeData {
-	largeData[i] = byte('A' + (i % 26))
-}
+// Create a pattern of 26 bytes (A-Z)
+pattern := make([]byte, 26)
+for i := range pattern {
+	pattern[i] = byte('A' + i)
+}
+
+// Create large data by repeating the pattern
+largeData := make([]byte, maxDocumentSize+1)
+for i := 0; i < len(largeData); i += len(pattern) {
+	copy(largeData[i:], pattern)
+}
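The same repeating A-Z content can also be produced with `bytes.Repeat` from the standard library. The sketch below is a shorter alternative to the copy loop in the diff, not a claim about lower memory use; `repeatedAlphabet` is a hypothetical helper name.

```go
package main

import (
	"bytes"
	"fmt"
)

const (
	megabyte        = 1024 * 1024
	maxDocumentSize = 16 * megabyte
)

// repeatedAlphabet returns n bytes consisting of the pattern A-Z repeated.
// It builds enough whole repetitions to cover n, then trims to length.
func repeatedAlphabet(n int) []byte {
	pattern := []byte("ABCDEFGHIJKLMNOPQRSTUVWXYZ")
	repeats := (n + len(pattern) - 1) / len(pattern)
	return bytes.Repeat(pattern, repeats)[:n]
}

func main() {
	data := repeatedAlphabet(maxDocumentSize + 1)
	fmt.Println(len(data), string(data[:5]), string(data[26:29]))
}
```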
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
📒 Files selected for processing (2)
- server/backend/database/mongo/client.go (2 hunks)
- server/backend/database/testcases/testcases.go (1 hunk)
🧰 Additional context used
🪛 golangci-lint (1.61.0)
server/backend/database/mongo/client.go
1094-1094: Error return value of uploadStream.Close is not checked (errcheck)
🔇 Additional comments (2)
server/backend/database/mongo/client.go (2)
1082-1095: Handle error from uploadStream.Close()
The error returned by uploadStream.Close() should be handled as it could indicate issues with resource cleanup.
🧰 Tools
🪛 golangci-lint (1.61.0)
1094-1094: Error return value of uploadStream.Close is not checked (errcheck)
1096-1114: Verify GridFS file cleanup strategy
Consider implementing a cleanup strategy for GridFS files when snapshots are no longer needed to prevent storage bloat.
Run this script to check for potential orphaned GridFS files:
✅ Verification successful
Let me search for any cleanup or deletion related code to verify if there's a strategy in place.
Let me search for garbage collection related code since I found a reference to SnapshotDisableGC in the config.
GridFS cleanup is properly managed through configuration and GC
The codebase has a comprehensive cleanup strategy for snapshots and their associated GridFS files:
- Garbage Collection (GC) is enabled by default (DefaultSnapshotDisableGC = false) and can be controlled via the --backend-snapshot-disable-gc flag
- When GC runs during snapshot creation, it removes unnecessary data using doc.GarbageCollect()
- Additional cleanup is available through the SnapshotWithPurgingChanges configuration, which can delete previous changes when snapshots are created
The implementation follows proper retention practices by:
- Running GC during snapshot creation (as seen in server/packs/snapshots.go)
- Providing configuration options to control cleanup behavior
- Ensuring synchronized changes are properly tracked before deletion
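The points above concern document GC and change purging; this thread does not show whether those paths also delete GridFS files. The orphan check the comment asks for amounts to a set difference between the file IDs present in GridFS and the `snapshot_file_id` values referenced by snapshot records. The sketch below works on in-memory string IDs as an illustration; `orphanedFiles` is a hypothetical helper, and fetching the two ID lists from MongoDB is left out.

```go
package main

import (
	"fmt"
	"sort"
)

// orphanedFiles returns the IDs that exist in GridFS but are not referenced
// by any snapshot record. Both inputs are plain string IDs in this sketch.
func orphanedFiles(gridFSFileIDs, referencedIDs []string) []string {
	referenced := make(map[string]struct{}, len(referencedIDs))
	for _, id := range referencedIDs {
		referenced[id] = struct{}{}
	}

	var orphans []string
	for _, id := range gridFSFileIDs {
		if _, ok := referenced[id]; !ok {
			orphans = append(orphans, id)
		}
	}
	sort.Strings(orphans) // stable output for reporting
	return orphans
}

func main() {
	files := []string{"f1", "f2", "f3"}
	refs := []string{"f1", "f3"}
	fmt.Println(orphanedFiles(files, refs))
}
```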
🏁 Scripts executed
The following scripts were executed for the analysis:
Script:
#!/bin/bash
# Description: Check for GridFS files without corresponding snapshot records
# List all GridFS files and their IDs
ast-grep --pattern 'bucket.OpenUploadStream($$$)'
# Check snapshot collection references
rg -A 5 'snapshot_file_id'
Length of output: 482
Script:
#!/bin/bash
# Search for GridFS deletion/cleanup related code
rg -i "delete.*snapshot|remove.*snapshot|cleanup.*snapshot" -A 5
# Search for any GridFS bucket deletion operations
ast-grep --pattern 'bucket.Delete($$$)'
# Search for snapshot collection deletion operations
rg "collection\(ColSnapshots\).*Delete"
Length of output: 4936
Script:
#!/bin/bash
# Search for snapshot garbage collection implementation
rg "SnapshotDisableGC" -B 5 -A 10
# Search for any GC related functions for snapshots
ast-grep --pattern 'func $_($$$) {
$$$
snapshot
$$$
}'
# Look for time-based cleanup or retention policies
rg -i "retention.*snapshot|gc.*snapshot|cleanup.*snapshot" -A 5
Length of output: 10481
t.Run("store and validate large snapshot test", func(t *testing.T) {
	ctx := context.Background()
	docKey := key.Key(fmt.Sprintf("tests$%s", t.Name()))

	clientInfo, _ := db.ActivateClient(ctx, projectID, t.Name())
	bytesID, _ := clientInfo.ID.Bytes()
	actorID, _ := time.ActorIDFromBytes(bytesID)
	docInfo, _ := db.FindDocInfoByKeyAndOwner(ctx, clientInfo.RefKey(), docKey, true)

	doc := document.New(docKey)
	doc.SetActor(actorID)

	largeData := make([]byte, 16*1024*1024+1)
	for i := range largeData {
		largeData[i] = byte('A' + (i % 26))
	}

	assert.NoError(t, doc.Update(func(root *json.Object, p *presence.Presence) error {
		root.SetBytes("largeField", largeData)
		return nil
	}))

	docRefKey := docInfo.RefKey()

	err := db.CreateSnapshotInfo(ctx, docRefKey, doc.InternalDocument())
	assert.NoError(t, err)
})
🛠️ Refactor suggestion
Enhance test coverage and validation.
The test only verifies that the snapshot was created without error. Consider:
- Validating the stored snapshot matches the original document
- Testing error cases (e.g., even larger documents)
- Adding cleanup of test data
Add validation and error cases:
func RunCreateLargeSnapshotTest(t *testing.T, db database.Database, projectID types.ID) {
- t.Run("store and validate large snapshot test", func(t *testing.T) {
+ t.Run("store large snapshot test", func(t *testing.T) {
// ... existing setup code ...
err := db.CreateSnapshotInfo(ctx, docRefKey, doc.InternalDocument())
assert.NoError(t, err)
+
+ // Validate stored snapshot
+ snapshot, err := db.FindClosestSnapshotInfo(ctx, docRefKey, 0, false)
+ assert.NoError(t, err)
+
+ var storedDoc document.InternalDocument
+ err = snapshot.LoadSnapshot(&storedDoc)
+ assert.NoError(t, err)
+ assert.Equal(t, largeData, storedDoc.Root().GetBytes("largeField"))
})
+
+ t.Run("store extremely large snapshot test", func(t *testing.T) {
+ // Test with even larger document (e.g., 32MB)
+ // Verify appropriate error handling
+ })
+
+ t.Cleanup(func() {
+ // Add cleanup code
+ })
}
Committable suggestion skipped: line range outside the PR's diff.
Extract the BSON max snapshot size into a static constant
Remove unnecessary log
Actionable comments posted: 1
🧹 Outside diff range and nitpick comments (1)
server/backend/database/mongo/client.go (1)
1112-1123: Maintain consistent error handling patterns
The error handling in the else branch could be more consistent with the GridFS branch. Consider wrapping the error with more context.
Apply this change:
-return fmt.Errorf("insert snapshot: %w", err)
+return fmt.Errorf("failed to insert snapshot into collection: %w", err)
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
📒 Files selected for processing (1)
- server/backend/database/mongo/client.go (3 hunks)
🧰 Additional context used
🪛 golangci-lint (1.61.0)
server/backend/database/mongo/client.go
1091-1091: Error return value of uploadStream.Close is not checked (errcheck)
🔇 Additional comments (2)
server/backend/database/mongo/client.go (2)
47-48: LGTM: Well-defined constant for MongoDB's document size limit.
The constant BSONMaxSnapshotSize is appropriately named and correctly set to MongoDB's 16MB document size limit.
1078-1123: LGTM: Clean implementation of GridFS for large snapshots
The implementation successfully:
- Handles snapshots exceeding MongoDB's 16MB limit using GridFS
- Maintains backward compatibility for smaller snapshots
- Properly stores metadata in the snapshots collection
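GridFS gets past the document size limit by storing a file as a sequence of chunk documents, 255 KiB each by default, plus one metadata document. The sketch below imitates that split and reassembly in memory, to show how a snapshot of 16MB plus one byte maps onto chunks. It does not use the MongoDB driver, and `splitChunks` and `joinChunks` are hypothetical names.

```go
package main

import (
	"bytes"
	"fmt"
)

// defaultChunkSize is the default GridFS chunk size of 255 KiB.
const defaultChunkSize = 255 * 1024

// splitChunks cuts data into consecutive slices of at most chunkSize bytes.
// It returns nil when chunkSize is not positive.
func splitChunks(data []byte, chunkSize int) [][]byte {
	if chunkSize <= 0 {
		return nil
	}
	var chunks [][]byte
	for start := 0; start < len(data); start += chunkSize {
		end := start + chunkSize
		if end > len(data) {
			end = len(data)
		}
		chunks = append(chunks, data[start:end])
	}
	return chunks
}

// joinChunks concatenates chunks back into a single byte slice.
func joinChunks(chunks [][]byte) []byte {
	return bytes.Join(chunks, nil)
}

func main() {
	snapshot := bytes.Repeat([]byte{0xAB}, 16*1024*1024+1)
	chunks := splitChunks(snapshot, defaultChunkSize)
	restored := joinChunks(chunks)
	fmt.Println(len(chunks), bytes.Equal(snapshot, restored))
}
```

Each chunk stays far below the document limit, which is why the snapshot record only needs to hold the file ID rather than the bytes themselves.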
🧰 Tools
🪛 golangci-lint (1.61.0)
1091-1091: Error return value of uploadStream.Close is not checked (errcheck)
if len(snapshot) > BSONMaxSnapshotSize {
	db := c.client.Database(c.config.YorkieDatabase)

	// create GridFS bucket
	bucket, err := gridfs.NewBucket(db)
	if err != nil {
		return fmt.Errorf("failed to create GridFS bucket: %w", err)
	}

	uploadStream, err := bucket.OpenUploadStream(fmt.Sprintf("%s_snapshot", docRefKey.DocID))
	if err != nil {
		return fmt.Errorf("failed to open GridFS upload stream: %w", err)
	}
	defer uploadStream.Close()

	_, err = uploadStream.Write(snapshot)
	if err != nil {
		return fmt.Errorf("failed to write to GridFS: %w", err)
	}

	fileID := uploadStream.FileID

	if _, err := c.collection(ColSnapshots).InsertOne(ctx, bson.M{
		"project_id":       docRefKey.ProjectID,
		"doc_id":           docRefKey.DocID,
		"server_seq":       doc.Checkpoint().ServerSeq,
		"lamport":          doc.Lamport(),
		"version_vector":   doc.VersionVector(),
		"snapshot_file_id": fileID, // GridFS file ID
		"created_at":       gotime.Now(),
	}); err != nil {
		return fmt.Errorf("insert snapshot info: %w", err)
	}
Handle error from uploadStream.Close()
While the GridFS implementation is solid, the deferred Close() call should handle potential errors.
Apply this fix:
-defer uploadStream.Close()
+defer func() {
+ if err := uploadStream.Close(); err != nil {
+ log.Printf("Failed to close GridFS upload stream: %v", err)
+ }
+}()
Committable suggestion skipped: line range outside the PR's diff.
Thank you for your contribution. As this appears to be your first contribution, I recommend reviewing our contribution guidelines: https://github.com/yorkie-team/yorkie/blob/main/CONTRIBUTING.md Key focus areas:
I also noticed your PR introduces Snapshot storage using GridFS. We'll need to carefully verify:
yorkie/server/backend/database/database.go Lines 254 to 265 in e863b62
Please update the PR addressing these points. Looking forward to your revisions.
What this PR does / why we need it:
Modified to create a snapshot of files larger than 16MB and upload them to MongoDB using GridFS.
Which issue(s) this PR fixes:
Fixes #267