Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

filter: add API for registering Python filters #1237

Merged
merged 9 commits into from
Nov 1, 2023

Conversation

pmrowla
Copy link
Contributor

@pmrowla pmrowla commented Oct 2, 2023

  • Adds pygit2.filter API for registering pure-python git_filter_* filters.
  • Adds pygit2.BlobIO() wrapper for reading raw and filtered blob contents.
    • BlobIO will stream buffered content (instead of loading the entire blob into memory) if the underlying libgit2 filter(s) support the streaming filter API.
  • Adds Repository.get_attr(..., commit=...) to support attr lookup by commit ID
    • The commit specific attrs are already provided to filters in Filter.check() but it is useful for anyone implementing filters to lookup commit specific attributes before actually running the filter (i.e. to pre-fetch LFS objects in a batch rather than streaming them one by one upon checkout)

This allows the user to subclass pygit2.Filter and then register their own Python smudge/clean filters.

In libgit2, you normally need to define both a git_filter and a git_writestream implementation for your filter. This PR abstracts this out into a single pygit2.Filter class so the user does not need to work directly with the git_writestream linked list chain that is invoked when libgit2 does filtering.

def __init__(self):
pass

def check(self, src: FilterSource, attr_values: List[str]):
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Filter.check() is equivalent to git_filter->check()

data: bytes,
src: FilterSource,
write_next: Callable[[bytes], None]
):
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Filter.write() is equivalent to git_writestream->write() (i.e git_filter->stream->write).

write_next writes to the next stream in the linked-list of git_writestream's (where there is one stream for each filter in the chain)

def close(
self,
write_next: Callable[[bytes], None]
):
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Filter.close() is equivalent to git_writestream->close() (i.e. git_filter->stream->close())

pygit2/callbacks.py Outdated Show resolved Hide resolved
pygit2/filter.py Outdated Show resolved Hide resolved
pygit2/filter.py Outdated Show resolved Hide resolved
pygit2/filter.py Outdated Show resolved Hide resolved
pygit2/filter.py Outdated Show resolved Hide resolved
@jdavid
Copy link
Member

jdavid commented Oct 3, 2023

Hi @pmrowla , thanks for the contribution.
Most important is that tests don't pass, apparently it gets stuck in test_blob_filter wit pypy

@pmrowla
Copy link
Contributor Author

pmrowla commented Oct 3, 2023

Thanks for the review, I'll make the suggested changes and take a look into the pypy issue.

I'll likely keep the PR marked as draft for now, I'm still testing to make sure this works with real-world LFS filtering so it's still possible this PR may change a bit.

@pmrowla pmrowla force-pushed the git-filter branch 2 times, most recently from 4b5a6c6 to d9c72e0 Compare October 3, 2023 15:56
src/blob.c Outdated Show resolved Hide resolved
@pmrowla
Copy link
Contributor Author

pmrowla commented Oct 24, 2023

After some investigation, it looks like pypy and cffi don't always mix well if an ffi callback is run outside of the main thread, which is what was causing the pypy tests to hang (since the libgit2 filtering can be done in a separate thread).

I've rewritten the underlying filter/writestream code to done entirely in the _pygit2 C-extension (so this PR no longer uses cffi at all)

@pmrowla pmrowla force-pushed the git-filter branch 3 times, most recently from ebbc5d4 to 522a716 Compare October 24, 2023 10:03
@pmrowla
Copy link
Contributor Author

pmrowla commented Oct 26, 2023

The flaky pypy tests should now be taken care of. The issue was a difference thread.is_alive() behavior in CPython vs pypy. The PR now blocks on explicit threading.Event() to signal that the writer thread is done rather than relying on thread.is_alive() which should resolve the pypy race condition.

@pmrowla pmrowla force-pushed the git-filter branch 2 times, most recently from 4bacc1e to ac8f335 Compare October 31, 2023 05:42
@pmrowla pmrowla marked this pull request as ready for review October 31, 2023 05:43
@pmrowla pmrowla requested a review from jdavid November 1, 2023 06:37
@jdavid jdavid merged commit f0bfd89 into libgit2:master Nov 1, 2023
6 checks passed
@pmrowla pmrowla deleted the git-filter branch November 1, 2023 10:18
@pmrowla
Copy link
Contributor Author

pmrowla commented Nov 20, 2023

@jdavid we'd appreciate it if we could get a release for this pushed whenever you can, thanks!

@jdavid
Copy link
Member

jdavid commented Nov 20, 2023

yes, this week, just have to finish the ongoing PRs

@jdavid
Copy link
Member

jdavid commented Nov 21, 2023

The release is done

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants