feat(api): make rottentomatoes matching more robust #1265

benhaney · 2025-01-15T18:12:42Z

Description

Replaces the RT search result matching logic with a ranking system that should be more robust. Also changes RT search queries to only search for the desired result type (movie or tv) and strips "The" from movie search queries.
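
To illustrate the general idea (this is not the code in this PR), a ranking-based matcher can score every search result and keep the best one only if it clears a threshold. Everything below is an assumption made for the sketch: the `RTResult` shape, the helper names `stripLeadingThe`, `scoreResult` and `pickBest`, and all of the numeric weights. The type restriction is shown as a filter here, whereas the PR applies it to the search query itself.

```typescript
// Hypothetical shapes and weights, for illustration only. The real logic
// lives in server/api/rating/rottentomatoes.ts and may differ entirely.
type MediaType = 'movie' | 'tv';

interface RTResult {
  title: string;
  year: number;
  type: MediaType;
  hasScore: boolean;
}

// Drop a leading "The " so "The Convert" is searched as "Convert".
function stripLeadingThe(title: string): string {
  return title.replace(/^the\s+/i, '');
}

// Lowercase, drop a leading "The", and collapse punctuation for comparison.
function normalize(title: string): string {
  return stripLeadingThe(title.trim())
    .toLowerCase()
    .replace(/[^a-z0-9]+/g, ' ')
    .trim();
}

// Made-up weights; a real implementation would name and tune these.
const EXACT_TITLE = 10;
const PARTIAL_TITLE = 5;
const SAME_YEAR = 6;
const ADJACENT_YEAR = 4;
const HAS_SCORE = 1;
const MIN_SCORE = 12;

// Higher is better.
function scoreResult(r: RTResult, title: string, year: number): number {
  const want = normalize(title);
  const got = normalize(r.title);
  let score = 0;
  if (got === want) {
    score += EXACT_TITLE;
  } else if (want && got && (got.includes(want) || want.includes(got))) {
    score += PARTIAL_TITLE;
  }
  const yearGap = Math.abs(r.year - year);
  if (yearGap === 0) score += SAME_YEAR;
  else if (yearGap === 1) score += ADJACENT_YEAR;
  if (r.hasScore) score += HAS_SCORE;
  return score;
}

// Return the best-scoring result of the requested type, or undefined
// when nothing is convincing enough to count as a match.
function pickBest(
  results: RTResult[],
  title: string,
  year: number,
  type: MediaType
): RTResult | undefined {
  let best: RTResult | undefined;
  let bestScore = -Infinity;
  for (const r of results) {
    if (r.type !== type) continue;
    const s = scoreResult(r, title, year);
    if (s > bestScore) {
      best = r;
      bestScore = s;
    }
  }
  return bestScore >= MIN_SCORE ? best : undefined;
}
```

With these made-up weights, an exact title needs a release year within 1 year to count as a match, and a partial title needs the same year plus an existing score. Returning `undefined` rather than the least-bad candidate is what prevents "matched to something else's score" failures.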

From my testing so far, this fixes a handful of mismatched ratings, but some matches still fail in the same way as with the old logic. The remaining failures usually come down to absurd data: results whose release years disagree by more than 1 year (e.g., Terrifier, which finished production in 2016 but wasn't widely released until 2018), or bad RT entries (e.g., Nightmare Before Christmas, which has two RT entries: one called Tim Burton's Nightmare Before Christmas that has all the ratings, and a dummy entry called just Nightmare Before Christmas that has no ratings but always matches better).

I want to make sure this doesn't make any matches worse, so I'm leaving it as a draft until I have time to do some more comprehensive testing. I've scraped the RT search results for the first ~1000 movies in Jellyseerr's Movies tab and fed them through the old and new logic, but I still need to test TV show matching and changes to the search query.
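
A comparison like the one described above can be sketched as a small harness that runs two matchers over the same inputs and keeps only the cases where they disagree, so those can be checked by hand. The names `Matcher`, `Diff` and `diffMatchers` are hypothetical and not part of this PR or of Jellyseerr.

```typescript
// Hypothetical harness: compare two matcher functions over the same inputs.
type Matcher<In, Out> = (input: In) => Out | undefined;

interface Diff<In, Out> {
  input: In;
  before: Out | undefined; // what the old logic returned
  after: Out | undefined; // what the new logic returned
}

function diffMatchers<In, Out>(
  inputs: In[],
  oldMatch: Matcher<In, Out>,
  newMatch: Matcher<In, Out>,
  same: (a: Out | undefined, b: Out | undefined) => boolean = (a, b) => a === b
): Diff<In, Out>[] {
  const diffs: Diff<In, Out>[] = [];
  for (const input of inputs) {
    const before = oldMatch(input);
    const after = newMatch(input);
    // Only disagreements need a manual look.
    if (!same(before, after)) diffs.push({ input, before, after });
  }
  return diffs;
}
```

Each entry in the returned list is either a fix or a regression, which is the distinction that has to be made by hand.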

Gotta name all the magic numbers too.

To-Dos

Successful build pnpm build
Translation keys pnpm i18n:extract
Database migration (if required)

Issues Fixed or Closed

Fixes RottenTomatoes title match failure #1249

server/api/rating/rottentomatoes.ts

fallenbagel · 2025-01-15T18:56:12Z

Changed it to feat as I would classify this as an enhancement that makes an existing feature better and more robust

benhaney · 2025-01-17T17:13:25Z

I've done some more testing and I'm pretty happy with the results. I diffed the behavior of the new and old logic against the first 998 movies and 327 TV shows on Jellyseerr's respective discover pages, and hand-checked the differences.

For movies, the new logic corrected 6 entries in my sample set that the old logic either failed to match or matched to the wrong movie, for an absolute improvement of about 0.6%. That's small, but the old logic was already pretty close to 100%, and I think this represents the majority of the room there was to improve.

For TV shows, on the other hand, the new logic corrected 15 previously incorrect results in my sample, including high-profile shows like House (2004), for an improvement of nearly 5%. That's much higher than I originally expected, especially since the logic was meant primarily for movies, with TV shows as an afterthought.
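
For reference, the percentages follow directly from the counts given above. The helper name `improvementPercent` is just for this snippet.

```typescript
// Share of the sample that changed from wrong to right, as a percentage.
function improvementPercent(fixed: number, sampleSize: number): number {
  return (fixed / sampleSize) * 100;
}

const moviePct = improvementPercent(6, 998); // about 0.60
const tvPct = improvementPercent(15, 327); // about 4.59

console.log(moviePct.toFixed(2), tvPct.toFixed(2));
```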

I've not yet found a single regression where the new logic fails to match a movie or tv show that the old logic correctly matched, so I'm comfortable marking it as ready for review.

benhaney · 2025-01-17T17:31:52Z

In case anyone cares, here's a list of media from my test set that's fixed by this PR, broken down by media type and then failure type.

Movies

No valid RT scores exist, but matched to something else's score

The OctoGames (2022)
Making Squid Game: The Challenge (2023)

Valid RT scores exist, but failed to match to anything

AfrAId (2024)
Double Blind (2024)
Lee (2024)
The Convert (2024)

TV Shows

No valid RT scores exist, but matched to something else's score

Alice (1976)
The City (1995)
Smile (2002)
A Girl's Guide to 21st Century Sex (2006)
Masterpiece (2017)
Kin (2018)
Hudson & Rex (2019)
Tamron Hall (2019)
Legacy (2020)

Valid RT scores exist, but failed to match to anything

Angel (1999)
House (2004)
Shameless (2011)
Girls (2012)

Valid RT scores exist, but matched to something else's score

The Hills (2006)
Dune: Prophecy (2024)

gauthier-th

LGTM.
Thanks for the extensive tests

github-advanced-security bot found potential problems Jan 15, 2025

server/api/rating/rottentomatoes.ts (Fixed)

benhaney force-pushed the rt-fixes branch from 4c27b10 to 552dc86 Compare January 15, 2025 18:21

fallenbagel changed the title ~~fix(api): Make rottentomatoes matching more robust~~ feat(api): make rottentomatoes matching more robust Jan 15, 2025

feat(api): make rottentomatoes matching more robust

49ebc9b

benhaney force-pushed the rt-fixes branch from 552dc86 to 49ebc9b Compare January 17, 2025 16:54

benhaney marked this pull request as ready for review January 17, 2025 17:13

benhaney requested review from fallenbagel and gauthier-th as code owners January 17, 2025 17:13

gauthier-th approved these changes Jan 30, 2025


fallenbagel approved these changes Jan 31, 2025


fallenbagel merged commit 907ba6f into fallenbagel:develop Jan 31, 2025
7 checks passed
