book-recommendations

Using web-scraped Goodreads dataset to find similar books. The notebook in this repository creates a book recommendation function when a title present within the data is entered.

THE DATA : The dataset, 'Complete Book Graph' was sourced from Menting Wan and the UCSD Book Graph site at https://sites.google.com/eng.ucsd.edu/ucsdbookgraph/home Information about the data and how it was collected can be found at https://github.com/MengtingWan/goodreads and from the following papers: Mengting Wan, Julian McAuley, "Item Recommendation on Monotonic Behavior Chains", in RecSys'18. [bibtex] Mengting Wan, Rishabh Misra, Ndapa Nakashole, Julian McAuley, "Fine-Grained Spoiler Detection from Large-Scale Review Corpora", in ACL'19. [bibtex]

IMPORTING THE DATASET: The file (goodreads_books.json.gz) was downloaded and saved in a collection on MongoDB on a local server. Steps to set this up on your own machine are linked here: Get MongoDB installed: https://www.mongodb.com/ Import the dataset to a database/collection: https://www.mongodb.com/docs/guides/server/import/

A NOTE ON MEMORY: The actual query to extract the books from the database requests all books in English and pulls ~700k titles. However, due to memory constaints, the notebook only uses 20k rows of the dataset. It is recommended that any wish to increase the data size would be to batch the data and then run through the code.

Name		Name	Last commit message	Last commit date
Latest commit History 3 Commits
.gitattributes		.gitattributes
Book Recommendations.ipynb		Book Recommendations.ipynb
README.md		README.md

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Repository files navigation

book-recommendations

About

Releases

Packages

Languages

brittmorin/book-recommendations

Folders and files

Latest commit

History

Repository files navigation

book-recommendations

About

Resources

Stars

Watchers

Forks

Releases

Packages 0

Languages

Packages