Add unsafe little endian loaders #1036

klauspost · 2025-01-05T19:49:56Z

Benchmarks without assembly (may be a bit noisy)

deflate:

BEFORE:
github-june-2days-2019.json     gzkp    1       6273951764      1073607045      17441   343.04
github-june-2days-2019.json     gzkp    2       6273951764      1045461954      24258   246.65
github-june-2days-2019.json     gzkp    3       6273951764      1030139729      21752   275.06
github-june-2days-2019.json     gzkp    4       6273951764      992526317       25868   231.29
github-june-2days-2019.json     gzkp    5       6273951764      938015731       28992   206.38
github-june-2days-2019.json     gzkp    6       6273951764      918717756       32863   182.07
github-june-2days-2019.json     gzkp    7       6273951764      924473679       42332   141.34
github-june-2days-2019.json     gzkp    8       6273951764      905294390       53014   112.86
github-june-2days-2019.json     gzkp    9       6273951764      895561157       100686  59.43
github-june-2days-2019.json     gzkp    -2      6273951764      4097019597      12499   478.70
github-june-2days-2019.json     gzkp    -3      6273951764      1175153215      24140   247.85

AFTER:
github-june-2days-2019.json     gzkp    1       6273951764      1073607045      16584   360.79
github-june-2days-2019.json     gzkp    2       6273951764      1045461954      19113   313.04
github-june-2days-2019.json     gzkp    3       6273951764      1030139729      20420   293.00
github-june-2days-2019.json     gzkp    4       6273951764      992526317       23619   253.32
github-june-2days-2019.json     gzkp    5       6273951764      938015731       26842   222.90
github-june-2days-2019.json     gzkp    6       6273951764      918717756       30541   195.90
github-june-2days-2019.json     gzkp    7       6273951764      924473679       43810   136.57
github-june-2days-2019.json     gzkp    8       6273951764      905294390       73933   80.93
github-june-2days-2019.json     gzkp    9       6273951764      895561157       98379   60.82
github-june-2days-2019.json     gzkp    -2      6273951764      4097019597      13439   445.20
github-june-2days-2019.json     gzkp    -3      6273951764      1175153215      22819   262.20

zstd:

github-june-2days-2019.json     zskp    1       6273951764      697439481       9378    637.96
github-june-2days-2019.json     zskp    2       6273951764      610876538       12416   481.87
github-june-2days-2019.json     zskp    3       6273951764      545382443       40775   146.74
github-june-2days-2019.json     zskp    4       6273951764      522934301       114291  52.35

github-june-2days-2019.json     zskp    1       6273951764      697439481       8325    718.69
github-june-2days-2019.json     zskp    2       6273951764      610876538       9905    604.04
github-june-2days-2019.json     zskp    3       6273951764      545382443       29954   199.74
github-june-2days-2019.json     zskp    4       6273951764      522934301       111174  53.82

s2:

github-june-2days-2019.json     s2      1       6273951764      1041705230      522     11443.55
github-june-2days-2019.json     s2      2       6273951764      944873043       1248    4793.24
github-june-2days-2019.json     s2      3       6273951764      826384742       9999    598.37

github-june-2days-2019.json     s2      1       6273951764      1041705230      464     12868.90
github-june-2days-2019.json     s2      2       6273951764      944873043       861     6947.69
github-june-2days-2019.json     s2      3       6273951764      826384742       9335    640.94

Summary by CodeRabbit

New Features
- Introduced a new le package for flexible integer type handling.
- Added new functions for loading and storing binary data in little-endian format.
- Enhanced test coverage with nounsafe build tag in GitHub Actions workflow.
Refactor
- Updated byte loading mechanisms across multiple packages.
- Replaced encoding/binary imports with custom internal/le package.
- Modified bit reader and decoder offset handling.
- Adjusted decoding logic to utilize cursor for state management.
- Removed outdated comments regarding bounds checks in code.
Chores
- Updated build constraints and import statements.
- Refined error handling in decoding processes.
- Adjusted assembly code offsets for improved performance.
- Updated Go version from 1.19 to 1.21 in module file.
Tests
- Simplified error reporting in decompression tests.

Benchmarks pending

coderabbitai

Actionable comments posted: 4

🧹 Nitpick comments (5)

internal/le/le.go (1)

3-5: Flexible Indexer interface.

This union type is a neat approach for generics. Just confirm you need all these integer variants, since broad coverage might allow usage in unintended ways.

internal/le/unsafe_disabled.go (1)

17-19: Load64 fallback.

Same pattern: length checks avoid potential slicing mistakes.
zstd/bitreader.go (2)
22-22: New off field introduced.

Renaming off to byteOffset or cursor might improve clarity.

Here’s a potential diff:
 type bitReader struct {
 	in       []byte
 	value    uint64
-	off      int
+	byteOffset int
 	bitsRead uint8
 }
106-106: finished() check uses offset and bitsRead.

Now that you rely on off == 0, confirm that the read buffer has truly been consumed. If bitsRead < 64 but we haven't set off to zero, partial data might remain.
.github/workflows/go.yml (1)
131-146: Consider parameterizing common test configurations.

The S2 fuzzing tests have repeated patterns with similar configurations. Consider using GitHub Actions matrix strategy to reduce duplication.

Example refactor:
strategy:
  matrix:
    fuzz_test: [FuzzDictBlocks, FuzzEncodingBlocks, FuzzLZ4Block]
    tags: [nounsafe, 'noasm,nounsafe']
steps:
  - name: S2/${{ matrix.fuzz_test }}${{ matrix.tags == 'noasm,nounsafe' && '/noasm' || '' }}
    run: go test -tags=${{ matrix.tags }} -run=none -fuzz=${{ matrix.fuzz_test }} -fuzztime=${{ matrix.fuzz_test == 'FuzzDictBlocks' && '100000x' || '500000x' }} -test.fuzzminimizetime=10ms ./s2/.

📜 Review details

Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 6ad807b and 179af51.

📒 Files selected for processing (15)

.github/workflows/go.yml (4 hunks)
flate/fast_encoder.go (2 hunks)
flate/fuzz_test.go (0 hunks)
internal/le/le.go (1 hunks)
internal/le/unsafe_disabled.go (1 hunks)
internal/le/unsafe_enabled.go (1 hunks)
s2/encode_all.go (1 hunks)
zstd/_generate/gen.go (2 hunks)
zstd/bitreader.go (6 hunks)
zstd/decoder.go (1 hunks)
zstd/matchlen_generic.go (1 hunks)
zstd/seqdec.go (1 hunks)
zstd/seqdec_amd64.s (16 hunks)
zstd/seqdec_generic.go (1 hunks)
zstd/zstd.go (2 hunks)

💤 Files with no reviewable changes (1)

flate/fuzz_test.go

🔇 Additional comments (60)

internal/le/unsafe_enabled.go (3)

1-4: Validate build constraints and selective unsafe usage.

You're targeting 64-bit LE architectures while explicitly excluding nounsafe, purego, and appengine. This approach is valid for performance-critical scenarios but ensure these constraints match your broader build strategy (e.g., are you expecting code coverage or usage in non-64-bit or safe builds?).

Would you like a script to find all references to these build tags across your repository to confirm consistency?

5-9: Import statement is minimal and clear.

Keeping imports confined to just unsafe here is appropriate; ensures you don't accidentally rely on binary.LittleEndian when using the unsafe approach.

17-21: Replicate the same safety check in Load32.

Similar to Load16, consider a safeguard if int(i) + 4 > len(b).

Would you like me to generate a script to scan for all calls to Load32 across the codebase to confirm they provide correct indexing?

internal/le/le.go (1)

1-2: Package definition is concise.

A straightforward approach for an internal package that provides LE loading/storing functionality.

internal/le/unsafe_disabled.go (7)

1-2: Build constraints ensure fallback on non-64-bit or no-unsafe builds.

This is essential to maintain portability across architectures. Good usage of negative build constraints.

3-4: Package declaration.

Good to keep everything under le for consistent naming.

5-7: Fallback to encoding/binary is valid and stable.

Relying on standard library ensures correctness for unsafe_disabled scenarios.

9-11: Load16 fallback.

Similar to unsafe_enabled.go, consider verifying i < len(b) with enough space for 2 bytes. Possibly out-of-scope if all callers guarantee length.

13-15: Load32 fallback.

Check array length if you want robust out-of-bounds handling.

21-23: Store16 fallback.

Ensure len(b) >= 2 in the event of potential misuse.

25-27: Store32 fallback.

Likewise, ensure len(b) >= 4.

zstd/matchlen_generic.go (2)

11-12: Switched from binary.LittleEndian to le package.

Great for consistency with the new le methods.

19-19: Use of Load64 for bitwise comparison.

This is valid for matching prefix lengths. Just ensure a and b each have at least 8 bytes in these loops.

zstd/bitreader.go (5)

12-13: Transition to internal/le for reading integers.

This aligns with the new approach and unifies the code.

37-37: Initialize offset with len(in).

This ensures reading proceeds from the end of the slice. Good usage to align with the bitstream logic.

73-74: Offset-based read logic in fill methods.

You’ve updated each fill function to decrement off by 4 or 8 before loading. Ensure no negative offsets are possible. If off < 4 or off < 8 unexpectedly, we risk an out-of-range read.

If you wish, I can provide a script to search the codebase for any direct calls that might lead to insufficient data prior to calling these methods.

Also applies to: 80-81, 90-100

116-116: remain() reflects offset-based bits.

Implementation is correct given the new off usage, but if partial bytes are left unfilled or if reads were misaligned, we might want a sturdier approach.

123-123: Reset off upon close.

This is helpful to avoid accidental reuse.

zstd/zstd.go (3)

11-12: Import usage looks good.
Switching to the github.com/klauspost/compress/internal/le package is consistent with the broader refactoring effort and aligns with the goal of using specialized utilities for little-endian operations.

114-114: Validate performance gains and potential endianness assumptions.
While le.Load32 may offer faster reads, confirm consistent usage across big-endian environments or confirm that the library explicitly requires little-endian compatibility.

118-118: Similar verification for Load64 usage.
Likewise, ensure that all usage of le.Load64 is tested on platforms or configurations that may not be purely little-endian in hardware.

flate/fast_encoder.go (3)

10-11: Refactor to remove redundant imports.
Removing encoding/binary in favor of the le package helps maintain code consistency and possibly improves performance, but confirm that all references to binary.LittleEndian have been updated correctly elsewhere.

62-62: Check for potential pointer alignment issues.
Using le.Load32 is likely safe, but if there's any pointer arithmetic or alignment concerns, ensure they are properly handled in all build configurations (including with nounsafe).

66-66: Perform thorough testing of 64-bit reads.
Ensure automated tests cover boundary conditions (e.g., reading near slice ends).

zstd/seqdec_generic.go (1)

32-32: Confirm logic correctness of if br.off > 4+((maxOffsetBits+16+16)>>3) condition.
Changing from a length-based check to an offset-based check may affect certain edge cases in decoding, so validate via robust tests, including boundary conditions.

zstd/seqdec.go (1)

248-248: Ensure consistency in offset-based fast path usage.
Similar to seqdec_generic.go, confirm that comparing br.off instead of buffer length is correct for all decoding scenarios.

zstd/decoder.go (1)

326-326: Ensure concurrency safety when resetting offset
This assignment resets the bit reader's offset to zero. Confirm that no other goroutines rely on the previous offset.

s2/encode_all.go (3)

13-14: Import statement is consistent with project structure
The move to the le package aligns with the overall refactoring strategy.

18-18: Validate that le.Load32 handles boundary conditions
Ensure that le.Load32 is safe for edge cases (e.g., near end-of-slice).

22-22: Validate that le.Load64 handles boundary conditions
Ensure that le.Load64 is safe for edge cases (e.g., near end-of-slice).

zstd/_generate/gen.go (2)

160-160: Capture newly introduced offset field
Loading br.Field("off") into brOffset is an important change. Confirm that all references to the old pointer usage are removed.

441-441: Properly storing the offset
Storing brOffset into br.Field("off") ensures the bit reader’s state is consistent upon return.

zstd/seqdec_amd64.s (24)

10-10: Offset-loading ensures updated struct layout
“MOVBQZX 40(CX), BX” suggests a shift in field alignment. Double-check that structures match offsets.

12-12: Initialize SI from new field
Line 12 loads from offset 32, confirming the new layout. Validate the old references’ removal.

302-303: Persist updated bit reader fields
“MOVB BL, 40(AX)” and “MOVQ SI, 32(AX)” store the new bit count and pointer offset back.

338-338: Load bit count from 40(CX)
Reinforces the updated field usage for the bit count.

340-340: Set offset from 32(CX)
Check that the pointer arithmetic remains correct after the struct changes.

601-602: Store updated bit count & offset
Lines 601-602 finalize the bit reader’s state. Verify no concurrency hazards exist.

637-637: Adopt new offset reading
“MOVBQZX 40(BX), DX” reads the offset or bits count from the new field offset. Confirm it matches the structure.

639-639: Initialize CX from 32(BX)
Similar check for the pointer offset field.

887-888: Write back updated fields
Storing AX to 24(CX), DL to 40(CX), and BX to 32(CX) shows consistent usage of the changed offsets.

923-923: Load offset from 40(BX)
Double-check that the bit count references are correct for 64-bit registers.

925-925: Extract pointer from 32(BX)
Ensure alignment constraints are still valid.

1144-1145: Amend registers with new offsets
“Saves AX to 24(CX), DL to 40(CX), BX to 32(CX).” This continues the pattern of storing updated offsets.

1790-1790: Load bit count from 40(CX)
Ensure a correct mask or shift is applied elsewhere to interpret it properly.

1792-1792: Load offset from 32(CX)
Validate concurrency usage if multiple decoders share the same structure.

2284-2285: Write updated offset fields
Check that the values put into 24(AX), 40(AX), and 32(AX) do not conflict with any leftover references to old structure member indices.

2352-2352: Switch to 40(BX) offset
Bit count read from a new place.

2354-2354: Load pointer from 32(BX)
Double-check pointer addition logic downstream.

2804-2805: Use updated fields
Storing AX→24(CX), DL→40(CX), BX→32(CX) ensures consistent state for the bit reader fields with the new layout.

2872-2872: Adopt new offset at 40(CX)
Keep an eye on potential side effects if external code expects the old offset.

2874-2874: Update pointer from 32(CX)
Confirm that all call sites have been updated to match these new offsets.

3468-3469: Apply final offset writes
“MOVB BL, 40(AX)” and “MOVQ SI, 32(AX)” are storing the new bit count and the pointer offset.

3536-3536: Read bits from 40(BX)
A continuing pattern for offset bit count retrieval. No immediate issues.

3538-3538: Pointer offset from 32(BX)
Double-check that pointer manipulation is correct given the new offset scheme.

4090-4091: Complete final offset store
Writes final offset and bit count. Confirm the state is consistent throughout the entire decode.
.github/workflows/go.yml (4)

38-43: LGTM! Good test coverage for nounsafe builds.

The addition of dedicated test steps for nounsafe and nounsafe,noasm combinations ensures comprehensive testing of the codebase without unsafe operations. This is particularly important given the PR's focus on unsafe little endian loaders.

47-47: LGTM! Race condition testing with nounsafe tag.

The race condition tests now include the nounsafe tag, which is essential for detecting potential race conditions in the safe implementation path.

Also applies to: 52-52

202-214: LGTM! Comprehensive fuzzing coverage across packages.

The addition of nounsafe tag to flate, zip, and fse package fuzzing tests ensures consistent testing strategy across all packages. This is particularly important as the unsafe little endian loaders might be used across these packages.

162-183: Verify fuzz test coverage for unsafe operations.

The ZSTD fuzzing tests now include the nounsafe tag, which is good. However, since this PR introduces unsafe little endian loaders, we should ensure that the fuzzing tests adequately cover the new unsafe operations.
✅ Verification successful

Let me gather more information about the unsafe operations and their test coverage.

Let me try a different approach to find the unsafe operations and little endian loaders.

Let me check for the specific little endian loader implementations.

Based on the gathered information, I can now provide a final response:

Fuzz tests adequately cover unsafe operations and little endian loaders

The codebase has comprehensive fuzz testing coverage that specifically addresses the concerns:

The fuzz tests run with the nounsafe tag, ensuring the code is tested without unsafe operations:
go test -tags=nounsafe -run=none -fuzz=FuzzDecodeAll
The tests include multiple variants to cover different scenarios:

FuzzDecodeAll: Tests both low and high memory decoders

FuzzDecoder: Tests concurrent and non-concurrent decoding

FuzzEncoding: Tests encoding with various compression levels

Additional variants with BMI2 disabled (FuzzDecAllNoBMI2, FuzzNoBMI2Dec)

The little endian operations are primarily handled through the standard binary.LittleEndian package, which is well-tested and safe.
🏁 Scripts executed

The following scripts were executed for the analysis:

Script:
#!/bin/bash
# Check if the fuzz tests cover the new unsafe operations
rg -A 5 'FuzzEncoding|FuzzDecodeAll|FuzzDecoder' ./zstd/
Length of output: 1572

Script:
#!/bin/bash
# Check for unsafe operations in the codebase
rg -A 5 'unsafe\.' ./zstd/

# Look for little endian loader implementations
ast-grep --pattern 'func $_(b []byte) uint$_ {
  $$$
}'
Length of output: 86

Script:
#!/bin/bash
# Check for unsafe operations in the codebase with a broader pattern
rg -A 5 'unsafe' ./zstd/

# Look for little endian related code
rg -A 5 'littleEndian|LittleEndian' ./zstd/

# Check the test file content for better context
cat ./zstd/fuzz_test.go
Length of output: 14894

Script:
#!/bin/bash
# Look for little endian related code in decoder files
rg -A 10 'func.*\[\]byte.*uint' ./zstd/

# Check for unsafe operations in decoder files
rg -A 10 'unsafe' ./zstd/decoder*.go
Length of output: 5348

internal/le/unsafe_enabled.go

coderabbitai

Actionable comments posted: 0

🧹 Nitpick comments (1)

flate/stateless.go (1)

156-162: Consider adding documentation and tests for the le package integration.

Since this is a performance-critical change in compression logic:

Add tests specifically for the integration with the le package

Document the performance expectations and safety guarantees

Consider adding benchmarks comparing the old and new implementations

📜 Review details

Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between b959ae3 and 209574e.

📒 Files selected for processing (3)

README.md (1 hunks)
flate/stateless.go (2 hunks)
zstd/README.md (1 hunks)

🧰 Additional context used

🪛 LanguageTool

README.md

[style] ~25-~25: Three successive sentences begin with the same word. Consider rewording the sentence or use a thesaurus to find a synonym.
Context: ...isable all assembly across packages.

Use the links above for more information on...

(ENGLISH_WORD_REPEAT_BEGINNING_RULE)

⏰ Context from checks skipped due to timeout of 90000ms (17)

GitHub Check: build (1.23.x, windows-latest)
GitHub Check: build (1.23.x, macos-latest)
GitHub Check: build (1.23.x, ubuntu-latest)
GitHub Check: build (1.22.x, windows-latest)
GitHub Check: build (1.22.x, macos-latest)
GitHub Check: fuzz-zstd ("noasm,nounsafe")
GitHub Check: build (1.22.x, ubuntu-latest)
GitHub Check: fuzz-zstd (nounsafe)
GitHub Check: build (1.21.x, windows-latest)
GitHub Check: build (1.21.x, macos-latest)
GitHub Check: fuzz-other ("noasm,nounsafe")
GitHub Check: fuzz-s2 ("noasm,nounsafe")
GitHub Check: build (1.21.x, ubuntu-latest)
GitHub Check: fuzz-other (nounsafe)
GitHub Check: Analyze (go)
GitHub Check: fuzz-s2 (nounsafe)
GitHub Check: build-special

🔇 Additional comments (4)

zstd/README.md (1)

9-9: LGTM! Documentation update accurately reflects package features.

The updated description provides clearer guidance on how to disable assembly and unsafe features using build tags.

README.md (1)

17-26: LGTM! Clear and helpful package usage instructions.

The new section provides valuable information about:

How to add the package to a project

Go version compatibility

Build tag options for disabling unsafe and assembly features

🧰 Tools

🪛 LanguageTool

[style] ~25-~25: Three successive sentences begin with the same word. Consider rewording the sentence or use a thesaurus to find a synonym.
Context: ...isable all assembly across packages.

Use the links above for more information on...

(ENGLISH_WORD_REPEAT_BEGINNING_RULE)
flate/stateless.go (2)

7-8: LGTM! Good choice using an internal package for byte operations.

The addition of the internal le package for little-endian operations is well-structured and follows Go conventions.

156-162: Verify safety bounds and performance impact of le package.

While the change simplifies the code, we should verify:

That the le package properly handles bounds checking

The performance impact, as mentioned in the PR description ("Benchmarks pending")

Let's analyze the safety of these changes:

✅ Verification successful

Implementation is safe and follows standard library patterns

The le package maintains safety through:

Standard library encoding/binary implementation on most platforms

Modern unsafe patterns (unsafe.SliceData + unsafe.Add) on 64-bit platforms

Automatic fallback to safe implementation when needed

The change aligns with standard library practices for performance-critical code.
🏁 Scripts executed

The following scripts were executed for the analysis:

Script:
#!/bin/bash
# Check the le package implementation for bounds checking
ast-grep --pattern 'func Load$_(b []byte, $x) $_' | grep -A 10 'func Load'

# Look for any existing tests that verify bounds checking
rg -g '**/*_test.go' 'TestLoad(32|64).*bounds|TestLoad(32|64).*panic'
Length of output: 148

Script:
#!/bin/bash
# Find the le package import
rg -g '*.go' '^import.*".*le"'

# Find files containing le package references
rg -g '*.go' 'le\.(Load32|Load64)'

# Look for test files with broader patterns
rg -g '**/*_test.go' 'Load(32|64)'
Length of output: 1344

Script:
#!/bin/bash
# Find the le package implementation
fd -e go -E '*_test.go' . | xargs grep -l "package le"

# Look for the actual Load32/Load64 implementations
ast-grep --pattern 'func Load$_(b []byte, $x) $_'
Length of output: 215

Script:
#!/bin/bash
# Check both implementations
cat internal/le/unsafe_disabled.go
cat internal/le/unsafe_enabled.go
cat internal/le/le.go
Length of output: 2224

Add unsafe little endian loaders

179af51

Benchmarks pending

coderabbitai bot reviewed Jan 5, 2025

View reviewed changes

internal/le/unsafe_enabled.go Outdated Show resolved Hide resolved

internal/le/unsafe_enabled.go Outdated Show resolved Hide resolved

internal/le/unsafe_enabled.go Outdated Show resolved Hide resolved

internal/le/unsafe_enabled.go Outdated Show resolved Hide resolved

klauspost added 3 commits January 5, 2025 21:08

Try new test matrix

fef4be6

Fix field name.

1471edf

Use in buff0, s2 go decoder.

bd64698