feat(api): make rottentomatoes matching more robust #1265
Changed it to feat, as I would classify this as an enhancement that makes an existing feature better and more robust.
I've done some more testing and I'm pretty happy with the results. I've diff'd the behavior of the new and old logic against the first 998 movies and 327 TV shows on Jellyseerr's respective discover pages, and hand-checked the differences.
For movies, the new logic found and corrected 6 entries in my sample set that either failed to match or were matched to the wrong movie by the old logic, for an absolute improvement of about 0.6%. That's small, but the old logic was already pretty close to 100%, and I think this represents the majority of the room there was to improve.
For TV shows, on the other hand, the new logic found and corrected 15 previously incorrect results in my samples, including high-profile shows like House (2004), for an improvement of nearly 5%. That's much higher than I originally expected, especially since the logic was meant primarily for movies, with TV shows as an afterthought.
I've not yet found a single regression where the new logic fails to match a movie or TV show that the old logic correctly matched, so I'm comfortable marking it as ready for review.
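For anyone who wants to repeat this kind of comparison, here is a minimal sketch of the diffing step. It is not the script I actually ran: the `SampleMatcher` signature and the `diffMatchers` name are made up for illustration, and it assumes each matcher returns some identifier for the RT entry it picked (or `undefined` when nothing matched). It only collects the samples where the two matchers disagree, so those are the ones left to check by hand.

```typescript
// Hypothetical signature: given a sample (e.g. a title/year record),
// return an identifier for the matched RT entry, or undefined if no match.
type SampleMatcher<S> = (sample: S) => string | undefined;

interface MatchDiff<S> {
  sample: S;
  oldResult: string | undefined;
  newResult: string | undefined;
}

// Run both matchers over every sample and keep only the disagreements.
function diffMatchers<S>(
  samples: S[],
  oldMatcher: SampleMatcher<S>,
  newMatcher: SampleMatcher<S>
): MatchDiff<S>[] {
  const diffs: MatchDiff<S>[] = [];
  for (const sample of samples) {
    const oldResult = oldMatcher(sample);
    const newResult = newMatcher(sample);
    if (oldResult !== newResult) {
      diffs.push({ sample, oldResult, newResult });
    }
  }
  return diffs;
}
```

Whether a given difference is a fix or a regression still has to be judged manually, since neither matcher's output is ground truth.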
In case anyone cares, here's a list of media from my test set that's fixed by this PR, broken down by media type and then failure type.
Movies
No valid RT scores exist, but matched to something else's score
Valid RT scores exist, but failed to match to anything
TV Shows
No valid RT scores exist, but matched to something else's score
Valid RT scores exist, but failed to match to anything
Valid RT scores exist, but matched to something else's score
LGTM.
Thanks for the extensive tests
Description
Replaces the RT search result matching logic with a ranking system that should be more robust. Also changes RT search queries to only search for the desired result type (movie or tv) and strips "The" from movie search queries.
From my testing so far, this fixes a handful of mismatched ratings, but there are some matches that this fails on in the same way as the old logic. Usually the examples that still fail are just absurd data, like results with release years that disagree by more than 1 year (e.g., Terrifier, which finished production in 2016 but wasn't widely released until 2018), or bad RT entries (e.g., Nightmare Before Christmas, which has two RT entries: one with all the ratings called Tim Burton's Nightmare Before Christmas, and a dummy entry with no ratings called just Nightmare Before Christmas, which always matches better).
I want to make sure this doesn't make any matches worse, so I'm leaving it as a draft until I have time to do some more comprehensive testing. I've scraped the RT search results for the first ~1000 movies in Jellyseerr's Movies tab and fed them through the old and new logic, but I still need to test TV show matching and changes to the search query.
Gotta name all the magic numbers too.
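To illustrate the general idea of ranking search results rather than taking the first plausible hit, here is a rough sketch. It is not the code in this PR: the `RTCandidate` and `MatchTarget` shapes, the function names, the scoring rules, and every weight below are assumptions made up for the example. It scores each candidate on title and release-year agreement, skips results of the wrong type, and only accepts the top candidate if it clears a minimum score. The weights are named constants rather than inline magic numbers.

```typescript
// Hypothetical shapes; not the actual Rotten Tomatoes response types.
interface RTCandidate {
  title: string;
  year: number;
  type: 'movie' | 'tv';
}

interface MatchTarget {
  title: string;
  year: number;
  type: 'movie' | 'tv';
}

// Illustrative weights only; the values are guesses, not tuned numbers.
const EXACT_TITLE_SCORE = 10;
const PARTIAL_TITLE_SCORE = 4;
const EXACT_YEAR_SCORE = 5;
const ADJACENT_YEAR_SCORE = 2;
const MIN_ACCEPTABLE_SCORE = 10;

// Lowercase and collapse punctuation so "House, M.D." and "house md" compare sanely.
function normalizeTitle(title: string): string {
  return title.toLowerCase().replace(/[^a-z0-9]+/g, ' ').trim();
}

// Drop a leading "The " from a search query ("Them" is left untouched).
function stripLeadingThe(title: string): string {
  return title.replace(/^the\s+/i, '');
}

function scoreCandidate(candidate: RTCandidate, target: MatchTarget): number {
  if (candidate.type !== target.type) return 0;
  const a = normalizeTitle(candidate.title);
  const b = normalizeTitle(target.title);
  let score = 0;
  if (a === b) {
    score += EXACT_TITLE_SCORE;
  } else if (a.length > 0 && b.length > 0 && (a.includes(b) || b.includes(a))) {
    score += PARTIAL_TITLE_SCORE;
  }
  const yearDiff = Math.abs(candidate.year - target.year);
  if (yearDiff === 0) score += EXACT_YEAR_SCORE;
  else if (yearDiff === 1) score += ADJACENT_YEAR_SCORE;
  return score;
}

// Return the highest-scoring candidate, or undefined if none is good enough.
function bestMatch(
  candidates: RTCandidate[],
  target: MatchTarget
): RTCandidate | undefined {
  let best: RTCandidate | undefined;
  let bestScore = 0;
  for (const candidate of candidates) {
    const score = scoreCandidate(candidate, target);
    if (score > bestScore) {
      best = candidate;
      bestScore = score;
    }
  }
  return bestScore >= MIN_ACCEPTABLE_SCORE ? best : undefined;
}
```

With a scheme like this, a partial title hit with the wrong year stays below the threshold, so the lookup returns nothing instead of attaching another title's score.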
To-Dos
pnpm build
pnpm i18n:extract
Issues Fixed or Closed