Meme Tracker

🐸 🐸 🐸 🐸 🐸 🐸 🐸 🐸 🐸 🐸 🐸 🐸 🐸 🐸 🐸 🐸 🐸 🐸 🐸 🐸 🐸 🐸 🐸 🐸 🐸 🐸 🐸 🐸

Description

Meme tracker is a web scraper and image grouper for internet memes.

It scrapes a given URL for images and downloads them. Then it groups similar images together and displays each group as a cluster in a browser.

Installation

  1. Install Pipenv for your user.

    user@pc:~$ pip install --user pipenv
  2. Create a virtual environment for this project and activate it.

    user@pc:~$ pipenv shell
  3. Clone this repo.

    user@pc:~/projects$ git clone git@github.com:Obleskar/meme_tracker.git
  4. Install dependencies.

    user@pc:~/projects/meme_tracker$ pipenv install

Usage

  1. Create a YAML file called spider_config.yaml, place it in the project's root directory, and add a list of URLs to scrape.

    The scraper's currently limited to 4chan boards and 4chan threads.

    user@pc:~/projects/meme_tracker$ vim spider_config.yaml
    urls: [http://boards.4channel.org/v/] 
  2. Run the image scraper.

    If you provided a board, then every image from every thread on the board's first page will be downloaded.

    If you provided a thread, then every image in that thread will be downloaded.

    Press Ctrl+C once to stop scraping after the current downloads have finished, and again to stop scraping immediately.

    user@pc:~/projects/meme_tracker$ scrapy crawl 4chan_images
  3. Launch a local webserver to host the downloaded images.

    user@pc:~/projects/meme_tracker$ python3 show_images.py
  4. Navigate to http://localhost:5000 in a web browser to view the images in a grid.
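
The board-versus-thread behaviour described in step 2 depends on telling the two kinds of URL apart. The project's own spider code is not shown here, so the following is only a minimal sketch of one way to do it. The helper name `classify_4chan_url` and the thread number in the usage note are hypothetical, and it assumes the usual 4chan path layout of `/<board>/` for a board and `/<board>/thread/<number>` for a thread.

```python
from urllib.parse import urlparse


def classify_4chan_url(url: str) -> str:
    """Return "board" or "thread" for a 4chan URL (hypothetical helper)."""
    # Drop empty path segments so a trailing slash does not matter.
    parts = [p for p in urlparse(url).path.split("/") if p]
    if len(parts) == 1:
        # e.g. /v/ -> the board's first page
        return "board"
    if len(parts) >= 3 and parts[1] == "thread" and parts[2].isdigit():
        # e.g. /v/thread/123456 -> a single thread
        return "thread"
    raise ValueError(f"Not a recognised board or thread URL: {url}")
```

For example, `classify_4chan_url("http://boards.4channel.org/v/")` returns `"board"`, while a URL such as `http://boards.4channel.org/v/thread/123456` (a made-up thread number) is classified as `"thread"`. Anything else raises a `ValueError` so that a bad entry in spider_config.yaml fails early.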

Motivation

To provide an easy way for researchers to view daily summaries of images on the internet.

To Do

  • Feat: Scrape image URLs from /v
  • Feat: Download images
  • Feat: Show images in browser
  • Feat: Generate thumbnails
  • Feat: Justify image grid
  • Internal: Change yaml config from dict to list
  • Internal: Write names and locations to database
  • Feat: Add Jupyter notebooks
  • Feat: Add a dhashing notebook
  • Feat: Run dhashing notebook with papermill
  • Feat: Cluster images
  • Feat: Display image cluster "compass" in web browser
  • Feat: Write origin post URLs to the database
  • Feat: Click image to open origin post in new tab
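
The dhashing and clustering items above could be approached roughly as follows. This is a sketch of the general difference-hash idea, not the project's notebook, which is not shown here. All function names and the `max_distance` threshold are hypothetical. It assumes each image has already been converted to grayscale and shrunk to a small grid with one more column than the number of bits wanted per row (9×8 is a common choice), for example with an image library.

```python
from typing import Dict, List, Sequence


def dhash(pixels: Sequence[Sequence[int]]) -> int:
    """Difference hash of a small grayscale grid (rows of brightness values)."""
    bits = 0
    for row in pixels:
        for left, right in zip(row, row[1:]):
            # One bit per horizontal neighbour pair: 1 if brightness drops.
            bits = (bits << 1) | (1 if left > right else 0)
    return bits


def hamming(a: int, b: int) -> int:
    """Number of bit positions in which two hashes differ."""
    return bin(a ^ b).count("1")


def group_by_hash(hashes: Dict[str, int], max_distance: int = 10) -> List[List[str]]:
    """Greedy grouping: an image joins the first group whose first member is close enough."""
    groups: List[List[str]] = []
    for name, value in hashes.items():
        for group in groups:
            if hamming(hashes[group[0]], value) <= max_distance:
                group.append(name)
                break
        else:
            # No existing group was close enough, so start a new one.
            groups.append([name])
    return groups
```

Because the hash only records whether brightness rises or falls between neighbouring pixels, small changes such as recompression or slight brightness shifts tend to leave most bits unchanged, which is why a Hamming-distance threshold can group near-duplicates. The greedy grouping here is deliberately simple; a real clustering step might compare against every member of a group or use a dedicated clustering algorithm.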