Migration indexes #3536

Merged
merged 5 commits into from
Oct 19, 2024

nichwall
Contributor

This PR fixes #3259, #3525, and #3237.

This PR adds migrations for the following indices:

The author and series indexes reduce query time for large databases (more than 20k items) from multiple seconds, or even minutes, to less than a second. I have not yet done much testing with large podcast databases. I am still investigating why some of the book SELECT queries did not improve much, and whether another index could solve that.
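
A minimal sketch of how an index of this kind changes the query plan, assuming SQLite and using Python's built-in `sqlite3` module. The table `book_links`, its columns, and the index name `idx_book_links_author` are hypothetical stand-ins for illustration, not the schema or indexes from this PR.

```python
import sqlite3


def plan(conn, sql, params=()):
    """Return the 'detail' strings from EXPLAIN QUERY PLAN for a statement."""
    rows = conn.execute("EXPLAIN QUERY PLAN " + sql, params).fetchall()
    return [row[3] for row in rows]  # the detail text is the fourth column


conn = sqlite3.connect(":memory:")

# Hypothetical join table (an assumption, not the project's real schema).
conn.execute(
    "CREATE TABLE book_links (id INTEGER PRIMARY KEY, book_id TEXT, author_id TEXT)"
)
conn.executemany(
    "INSERT INTO book_links (book_id, author_id) VALUES (?, ?)",
    [(f"book-{i}", f"author-{i % 500}") for i in range(5000)],
)

query = "SELECT book_id FROM book_links WHERE author_id = ?"

# Without an index, SQLite has to scan the whole table.
before = plan(conn, query, ("author-7",))

# With an index on the filtered column, SQLite can search the index instead.
conn.execute("CREATE INDEX idx_book_links_author ON book_links (author_id)")
after = plan(conn, query, ("author-7",))

print(before)
print(after)
```

Comparing the two printed plans is a quick way to confirm that a new index is actually picked up before running a full benchmark.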

To test the difference in query time, I ran the following on a moderately sized database so that the loop finished in a reasonable amount of time.
Database stats:

  • Authors: 3217
  • Series: 947
  • Books: 5892

Test procedure:

  1. Enabled benchmark logging for each SQL query
  2. Generated a HAR file by navigating around the web client to get a variety of SQL requests
  3. Used a combination of Python/bash scripts to:
    • Delete and copy database from a backup to start at the same point for all tests
    • Start the server
    • Run all requests from HAR file
    • Stop the server
    • Repeat above steps 10 times
    • Copy the log file and rename according to index so we can keep all queries for this specific test separate
    • Parse the log files to build a table comparing worst time of each query for each data set
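
The last step above (parsing the logs into a worst-time comparison) could be sketched roughly as follows. This is not the script used for this PR. The log line shape `<SQL text> Elapsed time: <N>ms` is an assumption, and `worst_times` and `compare` are hypothetical helper names.

```python
import re

# Assumed (hypothetical) log line shape: "<SQL text> Elapsed time: <N>ms".
LINE_RE = re.compile(
    r"^(?P<sql>.+?)\s+Elapsed time:\s*(?P<ms>\d+(?:\.\d+)?)\s*ms\s*$"
)


def worst_times(lines):
    """Return {sql: worst elapsed time in ms} for one log file's lines."""
    worst = {}
    for line in lines:
        match = LINE_RE.match(line)
        if match is None:
            continue  # skip lines that are not benchmark entries
        sql = match.group("sql")
        ms = float(match.group("ms"))
        worst[sql] = max(ms, worst.get(sql, 0.0))
    return worst


def compare(datasets):
    """Build comparison rows from {label: log lines}.

    The first label is treated as the baseline (for example, no indexes).
    Rows are sorted by the baseline's worst time, slowest first. A query
    missing from a data set gets None in that column.
    """
    labels = list(datasets)
    per_set = {label: worst_times(datasets[label]) for label in labels}
    baseline = per_set[labels[0]]
    queries = sorted(baseline, key=baseline.get, reverse=True)
    return [(q, [per_set[label].get(q) for label in labels]) for q in queries]
```

Grouping by the exact SQL text is a simplification: queries that differ only in literal values (such as a library ID) would need to be normalized first to land in the same row.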

I sorted the queries by their runtime without indexes and created the following table. It does not include every set of indexes that was tested, only enough to show the best and worst cases:
[Image: Book_Index_Comparison]

@nichwall
Contributor Author

After some more attempts, I have been unable to find additional indices that speed up the selects on each book. I think that is just related to how many columns are being loaded. I looked at adding indices for feeds and adding the titleIgnorePrefix index back in, but both of those made all queries slower. I'm not sure what other indices to try for these long queries. The longest query, at around 600-700 ms, is below:

SELECT `book`.`id`, `book`.`title`, `book`.`titleIgnorePrefix`, `book`.`subtitle`, `book`.`publishedYear`, `book`.`publishedDate`, `book`.`publisher`, `book`.`description`, `book`.`isbn`, `book`.`asin`, `book`.`language`, `book`.`explicit`, `book`.`abridged`, `book`.`coverPath`, `book`.`duration`, `book`.`narrators`, `book`.`audioFiles`, `book`.`ebookFile`, `book`.`chapters`, `book`.`tags`, `book`.`genres`, `book`.`createdAt`, `book`.`updatedAt`, `libraryItem`.`id` AS `libraryItem.id`, `libraryItem`.`ino` AS `libraryItem.ino`, `libraryItem`.`path` AS `libraryItem.path`, `libraryItem`.`relPath` AS `libraryItem.relPath`, `libraryItem`.`mediaId` AS `libraryItem.mediaId`, `libraryItem`.`mediaType` AS `libraryItem.mediaType`, `libraryItem`.`isFile` AS `libraryItem.isFile`, `libraryItem`.`isMissing` AS `libraryItem.isMissing`, `libraryItem`.`isInvalid` AS `libraryItem.isInvalid`, `libraryItem`.`mtime` AS `libraryItem.mtime`, `libraryItem`.`ctime` AS `libraryItem.ctime`, `libraryItem`.`birthtime` AS `libraryItem.birthtime`, `libraryItem`.`size` AS `libraryItem.size`, `libraryItem`.`lastScan` AS `libraryItem.lastScan`, `libraryItem`.`lastScanVersion` AS `libraryItem.lastScanVersion`, `libraryItem`.`libraryFiles` AS `libraryItem.libraryFiles`, `libraryItem`.`extraData` AS `libraryItem.extraData`, `libraryItem`.`createdAt` AS `libraryItem.createdAt`, `libraryItem`.`updatedAt` AS `libraryItem.updatedAt`, `libraryItem`.`libraryId` AS `libraryItem.libraryId`, `libraryItem`.`libraryFolderId` AS `libraryItem.libraryFolderId`, `libraryItem->feeds`.`id` AS `libraryItem.feeds.id`, `libraryItem->feeds`.`slug` AS `libraryItem.feeds.slug`, `libraryItem->feeds`.`entityType` AS `libraryItem.feeds.entityType`, `libraryItem->feeds`.`entityId` AS `libraryItem.feeds.entityId`, `libraryItem->feeds`.`entityUpdatedAt` AS `libraryItem.feeds.entityUpdatedAt`, `libraryItem->feeds`.`serverAddress` AS `libraryItem.feeds.serverAddress`, `libraryItem->feeds`.`feedURL` AS `libraryItem.feeds.feedURL`, 
`libraryItem->feeds`.`imageURL` AS `libraryItem.feeds.imageURL`, `libraryItem->feeds`.`siteURL` AS `libraryItem.feeds.siteURL`, `libraryItem->feeds`.`title` AS `libraryItem.feeds.title`, `libraryItem->feeds`.`description` AS `libraryItem.feeds.description`, `libraryItem->feeds`.`author` AS `libraryItem.feeds.author`, `libraryItem->feeds`.`podcastType` AS `libraryItem.feeds.podcastType`, `libraryItem->feeds`.`language` AS `libraryItem.feeds.language`, `libraryItem->feeds`.`ownerName` AS `libraryItem.feeds.ownerName`, `libraryItem->feeds`.`ownerEmail` AS `libraryItem.feeds.ownerEmail`, `libraryItem->feeds`.`explicit` AS `libraryItem.feeds.explicit`, `libraryItem->feeds`.`preventIndexing` AS `libraryItem.feeds.preventIndexing`, `libraryItem->feeds`.`coverPath` AS `libraryItem.feeds.coverPath`, `libraryItem->feeds`.`createdAt` AS `libraryItem.feeds.createdAt`, `libraryItem->feeds`.`updatedAt` AS `libraryItem.feeds.updatedAt`, `libraryItem->feeds`.`userId` AS `libraryItem.feeds.userId` FROM `books` AS `book` INNER JOIN `libraryItems` AS `libraryItem` ON `book`.`id` = `libraryItem`.`mediaId` AND (`libraryItem`.`libraryId` = 'a210cdb5-cb8d-4ff1-bd87-34eaefffd218' AND `libraryItem`.`mediaType` = 'book') LEFT OUTER JOIN `feeds` AS `libraryItem->feeds` ON `libraryItem`.`id` = `libraryItem->feeds`.`entityId` AND `libraryItem->feeds`.`entityType` = 'libraryItem' ORDER BY titleIgnorePrefix COLLATE NOCASE ASC LIMIT 630, 35;

@nichwall nichwall marked this pull request as ready for review October 19, 2024 19:36
@advplyr
Owner

advplyr commented Oct 19, 2024

I think that once we can start improving the API, the queries will be simpler and it will be easier to write indexes for them. That data is really helpful, thanks for pulling it together.
The only update I had to make here: since I had already created the indexes manually while testing, the migration was crashing, so I added a check for existing indexes first. This is working well for me.

Thanks!
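
For illustration, a check of that kind could look roughly like the sketch below. It uses Python's `sqlite3` module rather than the project's own migration code, and `index_exists`, `ensure_index`, and the table and index names in the usage lines are hypothetical.

```python
import sqlite3


def index_exists(conn, name):
    """Return True if an index with this name is already in the database."""
    row = conn.execute(
        "SELECT 1 FROM sqlite_master WHERE type = 'index' AND name = ?",
        (name,),
    ).fetchone()
    return row is not None


def ensure_index(conn, name, table, column):
    """Create the index only if it is missing; return True when created."""
    if index_exists(conn, name):
        return False  # e.g. created manually during testing, so skip it
    # Identifiers cannot be bound as parameters, so they are quoted inline.
    conn.execute(f'CREATE INDEX "{name}" ON "{table}" ("{column}")')
    return True


# Usage with a hypothetical table and index name:
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE demo_items (id INTEGER PRIMARY KEY, owner_id TEXT)")
first = ensure_index(conn, "idx_demo_owner", "demo_items", "owner_id")
second = ensure_index(conn, "idx_demo_owner", "demo_items", "owner_id")
```

In plain SQLite, `CREATE INDEX IF NOT EXISTS` gives the same idempotent behavior in a single statement. The explicit lookup is handy when the migration also wants to log whether it skipped the index.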

@advplyr advplyr merged commit 72e59e7 into advplyr:master Oct 19, 2024
5 checks passed
@nichwall nichwall deleted the migration_indexes branch October 19, 2024 21:05
Development

Successfully merging this pull request may close these issues.

[Bug]: ABS crashes with giant ebook libraries