feat(api): make rottentomatoes matching more robust #1265

Merged (1 commit) on Jan 31, 2025

@benhaney (Contributor) commented Jan 15, 2025

Description

Replaces the RT search result matching logic with a ranking system that should be more robust. Also changes RT search queries to only search for the desired result type (movie or tv) and strips "The" from movie search queries.
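For readers who want a concrete picture of what ranking-based matching can look like, here is a minimal sketch. It is not the code in this PR: the `SearchResult` shape, the function names, and every weight and threshold below are illustrative assumptions, chosen only to show the idea of scoring each candidate and keeping the best one above a cutoff.

```typescript
// Hypothetical result shape for illustration; not the real RT search schema.
interface SearchResult {
  title: string;
  year: number;
  type: 'movie' | 'tv';
}

// Lowercase and drop punctuation so that "AfrAId" and "Afraid" compare equal.
function normalize(title: string): string {
  return title
    .toLowerCase()
    .replace(/[^a-z0-9 ]/g, '')
    .replace(/\s+/g, ' ')
    .trim();
}

// Strip a leading "The " from a search query, as the description mentions.
function toSearchQuery(title: string): string {
  return title.replace(/^the\s+/i, '');
}

// Assumed weights, named rather than left as magic numbers.
const EXACT_TITLE = 6;
const PARTIAL_TITLE = 3;
const SAME_YEAR = 5;
const ADJACENT_YEAR = 3;
const MIN_SCORE = 8; // below this, report "no match" instead of guessing

function scoreResult(result: SearchResult, title: string, year: number): number {
  const a = normalize(result.title);
  const b = normalize(title);
  let score = 0;
  if (a === b) {
    score += EXACT_TITLE;
  } else if (a.length > 0 && b.length > 0 && (a.includes(b) || b.includes(a))) {
    score += PARTIAL_TITLE;
  }
  const yearDiff = Math.abs(result.year - year);
  if (yearDiff === 0) score += SAME_YEAR;
  else if (yearDiff === 1) score += ADJACENT_YEAR;
  return score;
}

// Keep only results of the wanted type, rank them, and return the top one
// if it clears the cutoff.
function bestMatch(
  results: SearchResult[],
  title: string,
  year: number,
  type: 'movie' | 'tv'
): SearchResult | undefined {
  const ranked = results
    .filter((r) => r.type === type)
    .map((r) => ({ result: r, score: scoreResult(r, title, year) }))
    .filter((entry) => entry.score >= MIN_SCORE)
    .sort((x, y) => y.score - x.score);
  return ranked[0]?.result;
}
```

With these assumed weights, an exact title with a release year off by more than one scores below the cutoff, so a same-named entry from a different decade is rejected rather than matched.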

From my testing so far, this fixes a handful of mismatched ratings, but some matches still fail in the same way as with the old logic. The remaining failures usually come down to absurd data: either release years that disagree by more than 1 year (e.g., Terrifier, which finished production in 2016 but wasn't widely released until 2018) or bad RT entries (e.g., Nightmare Before Christmas, which has two RT entries: one called Tim Burton's Nightmare Before Christmas that holds all the ratings, and a dummy entry called just Nightmare Before Christmas that has no ratings and always matches better).

I want to make sure this doesn't make any matches worse, so I'm leaving it as a draft until I have time to do some more comprehensive testing. I've scraped the RT search results for the first ~1000 movies in Jellyseerr's Movies tab and fed them through the old and new logic, but I still need to test TV show matching and changes to the search query.
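The comparison described above can be sketched roughly as follows. This is an assumed harness, not the script actually used for this PR: `Matcher`, `Sample`, and `diffMatchers` are hypothetical names, and the matchers are passed in as plain functions so the same scraped results can be replayed through both versions.

```typescript
// What we are looking for (hypothetical shape).
interface Query {
  title: string;
  year: number;
}

// A matcher takes scraped search results plus a query and returns the ID of
// the result it picked, or undefined when it found no match.
type Matcher<R> = (results: R[], query: Query) => string | undefined;

// One scraped search: the query and the raw results it returned.
interface Sample<R> {
  query: Query;
  results: R[];
}

// A case where the two matchers disagree and a human needs to look.
interface Difference {
  query: Query;
  oldPick: string | undefined;
  newPick: string | undefined;
}

// Replay every sample through both matchers and collect the disagreements.
function diffMatchers<R>(
  samples: Sample<R>[],
  oldMatch: Matcher<R>,
  newMatch: Matcher<R>
): Difference[] {
  const differences: Difference[] = [];
  for (const { query, results } of samples) {
    const oldPick = oldMatch(results, query);
    const newPick = newMatch(results, query);
    if (oldPick !== newPick) {
      differences.push({ query, oldPick, newPick });
    }
  }
  return differences;
}
```

Only the disagreements need hand-checking, which keeps a sample of around a thousand titles manageable.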

Gotta name all the magic numbers too.

To-Dos

  • Successful build pnpm build
  • Translation keys pnpm i18n:extract
  • Database migration (if required)

Issues Fixed or Closed

@fallenbagel changed the title from "fix(api): Make rottentomatoes matching more robust" to "feat(api): make rottentomatoes matching more robust" Jan 15, 2025
@fallenbagel (Owner) commented Jan 15, 2025

Changed it to feat, as I would classify this as an enhancement that makes an existing feature better and more robust.

@benhaney (Contributor Author) commented

I've done some more testing and I'm pretty happy with the results. I diffed the behavior of the new and old logic against the first 998 movies and 327 TV shows on Jellyseerr's respective discover pages and hand-checked the differences.

For movies, the new logic corrected 6 results in my sample set that the old logic either failed to match or matched to the wrong movie, for an absolute improvement of about 0.6%. That's small, but the old logic was already pretty close to 100%, and I think this represents the majority of the room there was to improve.

For TV shows, on the other hand, the new logic corrected 15 previously incorrect results in my samples, including high-profile shows like House (2004), for an improvement of nearly 5%. That's much higher than I originally expected, especially since the logic was meant primarily for movies, with TV shows as an afterthought.

I've not yet found a single regression where the new logic fails to match a movie or TV show that the old logic correctly matched, so I'm comfortable marking it as ready for review.
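As a rough illustration of how the hand-checked differences turn into the figures above, here is a small sketch. The `classify` and `improvementPercent` helpers are hypothetical names, and `expected` stands for the hand-checked correct RT entry (or `undefined` when no valid RT entry exists and nothing should match).

```typescript
type Outcome = 'fixed' | 'regressed' | 'unchanged';

// Compare both picks against the hand-checked answer.
function classify(
  expected: string | undefined,
  oldPick: string | undefined,
  newPick: string | undefined
): Outcome {
  const oldOk = oldPick === expected;
  const newOk = newPick === expected;
  if (!oldOk && newOk) return 'fixed';
  if (oldOk && !newOk) return 'regressed';
  return 'unchanged'; // both right or both wrong
}

// Net change in correct matches, as a percentage of the sample size.
function improvementPercent(fixed: number, regressed: number, total: number): number {
  return ((fixed - regressed) / total) * 100;
}

// Figures reported in this comment: 6 of 998 movies and 15 of 327 TV shows
// fixed, with no regressions found.
console.log(improvementPercent(6, 0, 998).toFixed(1)); // "0.6"
console.log(improvementPercent(15, 0, 327).toFixed(1)); // "4.6"
```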

@benhaney marked this pull request as ready for review January 17, 2025 17:13
@benhaney (Contributor Author) commented Jan 17, 2025

In case anyone cares, here's a list of media from my test set that's fixed by this PR, broken down by media type and then failure type.

Movies

No valid RT scores exist, but matched to something else's score
  • The OctoGames (2022)
  • Making Squid Game: The Challenge (2023)
Valid RT scores exist, but failed to match to anything
  • AfrAId (2024)
  • Double Blind (2024)
  • Lee (2024)
  • The Convert (2024)

TV Shows

No valid RT scores exist, but matched to something else's score
  • Alice (1976)
  • The City (1995)
  • Smile (2002)
  • A Girl's Guide to 21st Century Sex (2006)
  • Masterpiece (2017)
  • Kin (2018)
  • Hudson & Rex (2019)
  • Tamron Hall (2019)
  • Legacy (2020)
Valid RT scores exist, but failed to match to anything
  • Angel (1999)
  • House (2004)
  • Shameless (2011)
  • Girls (2012)
Valid RT scores exist, but matched to something else's score
  • The Hills (2006)
  • Dune: Prophecy (2024)

@gauthier-th (Collaborator) left a comment

LGTM.
Thanks for the extensive tests

@fallenbagel merged commit 907ba6f into fallenbagel:develop Jan 31, 2025
7 checks passed
Development

Successfully merging this pull request may close these issues.

RottenTomatoes title match failure
3 participants