Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Enhanced Directory Pattern Matching Test Coverage #123

Merged
merged 6 commits into from
Jan 13, 2025

Conversation

RyanL2004
Copy link
Contributor

Overview

This PR adds comprehensive test coverage for directory pattern matching in GitIngest, focusing on how different patterns interact with directory structures and file inclusion. Through the development process, discovered several important behaviors of the pattern matching system.

Test Cases

1. Pattern Matching Behavior Discovery

Initially, started with previous assumption that patterns would work similar to common glob patterns:

  • src/* would only match direct children
  • src/ would match recursively
  • Leading slashes would be normalized

However, through testing, discovered the actual behavior:

  • src/* matches all files recursively under src/
  • Leading slash patterns (/src/*) don't match any files
  • The proper pattern for explicit recursive matching is src/**

2. Path Separator Issues

Encountered cross-platform compatibility issues with path separators:

# Initial comparison failed because:
assert 'src\\subdir\\file.txt' == 'src/subdir/file.txt'  # Windows vs Unix paths

Fixed by normalizing paths using Path:

file_paths = {str(Path(f["path"])) for f in files}
expected_paths = {str(Path("src/subfile1.txt")), ...}

3. Test Evolution

Original Tests (Had Issues):

  • Assumed src/* would only match direct children
  • Tried to use leading slash patterns
  • Had path separator inconsistencies
  • Mixed test assertions that made debugging difficult

Improved Tests:

  1. test_include_src_star_pattern:

    • Now correctly verifies src/* matches all files
    • Uses normalized path comparisons
    • Has clearer assertions and error messages
  2. test_include_src_recursive:

    • Replaced src/ with src/** for recursive matching
    • Better documents the intended behavior
    • Uses set comparison for clearer file matching verification
  3. Directory Structure Being Tested:

test_repo/
├── file1.txt
├── file2.py
└── src/
    ├── subfile1.txt
    ├── subfile2.py
    └── subdir/
        └── file_subdir.txt
        └── file_subdir.py

Technical Improvements

1. Path Normalization

  • Added platform-independent path handling
  • Improved test reliability across operating systems
  • Makes tests more maintainable
# Before
file_paths = [f["path"] for f in files]

# After
file_paths = {str(Path(f["path"])) for f in files}

2. Better Test Structure

  • Added descriptive docstrings
  • Improved error messages
  • Used set comparisons for file path verification
  • Added comments explaining test expectations

3. Pattern Matching Documentation

Discovered and documented the actual pattern matching rules:

  • src/* → Matches all files under src/ recursively
  • src/** → Explicit recursive matching
  • Leading slash patterns are not supported

Testing Strategy

Each test verifies:

  1. Pattern matching behavior works as expected
  2. All expected files are included/excluded correctly
  3. Directory structure is properly traversed
  4. Path separators are handled correctly

Impact

These tests:

  • Improve understanding of GitIngest's pattern matching
  • Ensures consistent behavior across platforms
  • Document if it is an actual vs expected behavior
  • Make future pattern matching changes safer

Future Considerations

  1. Consider adding support for:

    • Leading slash patterns
    • Non-recursive src/* behavior
    • More granular pattern matching options
  2. Potential documentation updates:

    • Clarify pattern matching behavior
    • Add examples of supported patterns
    • Document platform-specific considerations

Learning Points

  1. Always verify assumptions about pattern matching behavior
  2. Consider cross-platform compatibility early
  3. Use clear, specific test assertions
  4. Document unexpected behavior
  5. Make tests maintainable and clear

Copy link
Owner

@cyclotruc cyclotruc left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@RyanL2004 Thank you for taking the time to do this

You are right, the behaviour of the exclude/include patterns isn't ideal at the moment, I could use a little rework

As for the windows compatibility, I believe we sould not be adapting the tests to pass on windows: they are meant to not pass on windows because gitingest doesn't fully work on windows, so i think we should remove those changes

Maybe I did not understood fully the fix if so please tell me

@cyclotruc cyclotruc added the changes requested A review requested changes before merging label Jan 12, 2025
@RyanL2004
Copy link
Contributor Author

RyanL2004 commented Jan 12, 2025

@cyclotruc Thank you for the feedback about Windows compatibility !

You're absolutely right, I've removed all the Windows path normalization changes and reverted back to using Unix-style paths in the tests, it signals to users that Windows support isn't complete and by making tests pass on Windows, we might be masking this limitation.

This better reflects the current state of GitIngest where Windows support isn't fully implemented.

Changes I made:

  • Removed Path-based normalization
  • Kept Unix-style path separators (/) in all test assertions
  • Added comments noting that tests expect Unix-style paths
  • Retained the discovered pattern matching behavior tests

The tests now:

  • Convert paths to Unix-style only during test validation (not affecting actual GitIngest functionality)
  • Maintain the original purpose of testing different pattern matching behaviors (src/*, src/**, etc.)
  • Explicitly document that Windows is not supported

Let me know if you'd like me to make any other changes!

Copy link
Owner

@cyclotruc cyclotruc left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thank you very much for the quick changes, merging

@cyclotruc cyclotruc merged commit 1fd741a into cyclotruc:main Jan 13, 2025
8 checks passed
@filipchristiansen filipchristiansen removed the changes requested A review requested changes before merging label Jan 14, 2025
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants