How to compute idfs for a custom set of packages #86
Comments
Yeah, that's a good point. If you actually want or need to do it quickly, the process is in https://github.com/ropensci-review-tools/pkgmatch/blob/main/data-raw/release-data-script.R But it's definitely important for further pkg dev to properly expose this kind of functionality. It shall be done... 🚀
@Bisaloo Those commits add a new function, But prior to that, you could just try running it on a local corpus to see what you get, and then passing the results as explicit
This is what I have tried here: https://github.com/epiverse-connect/epiverse-pkgmatch, with the corpus from https://epiverse-connect.r-universe.dev/. It currently only use Overall, the results are reasonably good. Nothing in them is outrageous, and we often find the matches we expect, but we also miss some. I will meet with my colleagues soon to decide whether we go with this approach or something more custom.
I think asking the users to generate their own The I think it's fair to expect users who want to use their own corpus to be able to jump through a couple of well-documented hoops.
It will actually lower the maintenance burden, because it can simply be called to do complete local updates, and so replace the current script. This new function is, I think, a sensible generalization of that. I ran it locally on fewer than 10 repos, and the whole thing only took a minute or two, so it seems to work pretty well. I'll nevertheless address what you're asking for more explicitly as well. Thanks!
Would you like to open a PR from
I am trying to compute idfs for a different corpus but I cannot figure out how.
The docs state that it is the output of pkgmatch_bm25() (pkgmatch/R/similar-pkgs.R, lines 50 to 52 in caa1dad).
But the inputs of pkgmatch_bm25() don't match what I would expect here (I would expect the same inputs as pkgmatch_embeddings_from_pkgs()), and the output doesn't seem to match what pkgmatch_similar_pkgs() is expecting anyway.
In other words, if such a function doesn't exist yet, I would like a function pkgmatch_idfs_from_pkgs(), which would be the equivalent of pkgmatch_embeddings_from_pkgs() for the idfs argument in pkgmatch_similar_pkgs().
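For readers unfamiliar with the term, the idfs discussed here are inverse document frequencies computed over a corpus. What follows is a minimal, generic sketch in Python of how such values can be derived from a custom set of documents. It does not use or reproduce pkgmatch's implementation: the helper name compute_idfs, the toy corpus, and the BM25-style smoothing are all assumptions made for illustration.

```python
import math
from collections import Counter


def compute_idfs(docs):
    """Return {token: idf} for a list of tokenised documents.

    Hypothetical helper for illustration only; not part of pkgmatch.
    """
    n_docs = len(docs)
    # Document frequency: number of documents containing each token at least once.
    doc_freq = Counter(token for doc in docs for token in set(doc))
    # BM25-style smoothed IDF (an assumed variant); the "+ 1.0" inside the
    # logarithm keeps every value positive.
    return {
        token: math.log((n_docs - df + 0.5) / (df + 0.5) + 1.0)
        for token, df in doc_freq.items()
    }


# Toy corpus standing in for package descriptions (made-up text).
corpus = [
    "fit epidemic model".split(),
    "plot epidemic curve".split(),
    "read line list data".split(),
]
idfs = compute_idfs(corpus)
```

Tokens that appear in many documents ("epidemic" above) receive a lower weight than tokens that appear in only one, which is why the values depend on the corpus they were computed from.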