-
Notifications
You must be signed in to change notification settings - Fork 345
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
feat: Add LogisticalRegressionPredictor - rendering type predictor for adaptive crawling #930
Conversation
Add more tests.
Isn't that a bit... weird name? :] |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This seems sound to me, I have just some minor comments. Also please fix typos in the PR title 🙂
# Increased detection_probability_recommendation | ||
prediction = predictor.predict(Request.from_url(url='http://www.aaa.com/some/stuffa', label=label)) | ||
assert prediction.rendering_type == 'static' | ||
assert prediction.detection_probability_recommendation == detection_ratio * 4 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
These exact checks are probably too implementation-dependent. Could we just check that the recommendation is, like, higher than detection_ratio * 2
for the first couple of results so that we don't have to update the tests each time we fine-tune some constant by a tiny bit?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This test is just describing how it actually behaves. Is it requirement or implementation detail? Impossible to say without actually having the requirements. I think it acts nicely as documentation of current behavior and is very easy to change if needed.
If we fine-tune some constant and this test fails, it will be very nice feedback in seeing what those constants actually influence and we can then do informed decision about whether we want to change the test or not.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Not a hill I want to die on, but this particular example seems like an overspecified test. Almost to the point where if it breaks, you just adjust it without a second thought, which undermines the value of such test.
Co-authored-by: Jan Buchar <[email protected]>
I thought about naming my future kid like this. But maybe it is not so great name after all. It is some implementation of abstract |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Nice! And thanks for taking this out from AdaptiveCrawler
PR.
src/crawlee/crawlers/_adaptive_playwright/_rendering_type_predictor.py
Outdated
Show resolved
Hide resolved
src/crawlee/crawlers/_adaptive_playwright/_rendering_type_predictor.py
Outdated
Show resolved
Hide resolved
…ictor.py Co-authored-by: Vlada Dusek <[email protected]>
from jaro import jaro_winkler_metric | ||
from sklearn.linear_model import LogisticRegression |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We should wrap this in a way that the exception tells the user to install the adaptive-playwright
extra. But since this is a private subpackage, maybe it can wait until the adaptive playwirght functionality is made public.
Co-authored-by: Jan Buchar <[email protected]>
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Just a few nits, LGTM otherwise.
src/crawlee/crawlers/_adaptive_playwright/_rendering_type_predictor.py
Outdated
Show resolved
Hide resolved
Co-authored-by: Vlada Dusek <[email protected]>
Description
Add
LogisticalRegressionPredictor
which is going to be defaultRenderingTypePredictor
forAdaptivePlaywrightCrawler
.Issues
Split from: #249