A Python package for semantically searching and querying scientific PDFs using vector databases and AI-powered natural language interactions.

p-enel/giantsmind

GiantsMind

GiantsMind is a Python package for working with scientific article PDFs. It lets you parse papers, extract their metadata, and search and query them in natural language. It currently uses LlamaIndex (LlamaParse) to parse PDFs and Claude 3.5 Sonnet for its agents and natural language interaction. You need an API key for each of these services to run the two commands (see Installation).

This is an early version with limited functionality, under active development.

Features

  • PDF document parsing using LlamaParse
  • Metadata extraction from PDF papers (DOI, arXiv ID)
  • Automatic metadata fetching from CrossRef and arXiv APIs
  • Vector database storage for semantic search
  • SQLite database for metadata management
  • Natural language querying using Claude AI
  • Interactive CLI interface
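
As a rough illustration of the metadata extraction feature, the sketch below pulls a DOI and an arXiv ID out of text taken from a PDF. The patterns and the `find_identifiers` name are assumptions made for this example; they are not taken from GiantsMind's source and will not cover every identifier format (old-style arXiv IDs such as `hep-th/9901001` are not handled).

```python
import re

# Hypothetical patterns for illustration only; not GiantsMind's actual code.
DOI_RE = re.compile(r"\b10\.\d{4,9}/[^\s\"<>]+")
ARXIV_RE = re.compile(r"arXiv:(\d{4}\.\d{4,5})(v\d+)?", re.IGNORECASE)


def find_identifiers(text: str) -> dict:
    """Return the first DOI and arXiv ID found in text (None when absent)."""
    doi = DOI_RE.search(text)
    arxiv = ARXIV_RE.search(text)
    return {
        # Trailing punctuation usually belongs to the sentence, not the DOI.
        "doi": doi.group(0).rstrip(".,;)") if doi else None,
        # Keep the base ID and drop the optional version suffix (e.g. "v5").
        "arxiv_id": arxiv.group(1) if arxiv else None,
    }


if __name__ == "__main__":
    sample = "See https://doi.org/10.1038/s41586-020-2649-2. Also arXiv:1706.03762v5"
    print(find_identifiers(sample))
```

An identifier found this way could then be used to look up the full record from CrossRef or arXiv.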

Installation

  1. Clone the repository
  2. Install dependencies:
pip install -e ".[dev]"
  3. Set up environment variables in .env:
LLAMA_API_KEY=<your-llama-api-key>
ANTHROPIC_API_KEY=<your-anthropic-api-key>
DEFAULT_PDF_PATH=<path-to-pdf-folder>
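
Before running either command, it can help to confirm that these variables are visible to Python. The check below is a minimal sketch: the `missing_settings` helper is hypothetical, it reads the process environment rather than the .env file itself, and it assumes all three variables are required, which the README does not state.

```python
import os

# Variable names as listed in the .env example above.
REQUIRED_VARS = ("LLAMA_API_KEY", "ANTHROPIC_API_KEY", "DEFAULT_PDF_PATH")


def missing_settings(env=None) -> list:
    """Return the names of required variables that are unset or empty."""
    env = os.environ if env is None else env
    return [name for name in REQUIRED_VARS if not env.get(name)]


if __name__ == "__main__":
    missing = missing_settings()
    if missing:
        print("Missing settings:", ", ".join(missing))
    else:
        print("All settings found.")
```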

Usage

Parse PDF Papers

giantsmind --parse /path/to/papers

This will:

  • Parse PDFs using LlamaParse
  • Extract and fetch metadata
  • Store content in vector database
  • Save metadata in SQLite database
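
To give a feel for the last step, here is a minimal sketch of saving paper metadata to SQLite with Python's standard `sqlite3` module. The `papers` table, its columns, and the `save_metadata` function are assumptions made for this example; the schema GiantsMind actually uses may differ.

```python
import sqlite3


def save_metadata(conn: sqlite3.Connection, paper: dict) -> None:
    """Insert one paper record, replacing any existing row with the same DOI."""
    # Hypothetical schema for illustration only.
    conn.execute(
        "CREATE TABLE IF NOT EXISTS papers ("
        "doi TEXT PRIMARY KEY, title TEXT, journal TEXT, year INTEGER)"
    )
    conn.execute(
        "INSERT OR REPLACE INTO papers (doi, title, journal, year) "
        "VALUES (:doi, :title, :journal, :year)",
        paper,
    )
    conn.commit()


if __name__ == "__main__":
    conn = sqlite3.connect(":memory:")  # in-memory database for the demo
    save_metadata(
        conn,
        {"doi": "10.0000/example", "title": "An Example Paper",
         "journal": "Example Journal", "year": 2024},
    )
    print(conn.execute("SELECT title, year FROM papers").fetchall())
```

Keying on the DOI means that parsing the same paper twice updates the row instead of duplicating it.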

Interactive Query Mode

giantsmind

This starts an interactive session where you can:

  • Ask questions about papers in natural language
  • Search by metadata (authors, dates, journals)
  • Search paper content semantically
  • Get AI-generated answers with citations
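
Semantic search over paper content generally means ranking stored text chunks by how close their embedding vectors are to the query's embedding. The toy sketch below shows that ranking step with cosine similarity on hand-written vectors. It is an illustration of the general idea, not GiantsMind's implementation: real embeddings come from a model and are stored in the vector database, and the `cosine` and `top_k` names are hypothetical.

```python
import math


def cosine(a, b) -> float:
    """Cosine similarity between two equal-length vectors (0.0 if either is zero)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def top_k(query_vec, chunks, k=2) -> list:
    """Return the texts of the k chunks most similar to the query vector.

    chunks is a list of (text, vector) pairs.
    """
    ranked = sorted(chunks, key=lambda item: cosine(query_vec, item[1]), reverse=True)
    return [text for text, _ in ranked[:k]]


if __name__ == "__main__":
    # Toy 2-dimensional "embeddings"; real ones have hundreds of dimensions.
    chunks = [
        ("methods section", [1.0, 0.0]),
        ("acknowledgements", [0.0, 1.0]),
        ("results section", [0.9, 0.1]),
    ]
    print(top_k([1.0, 0.0], chunks, k=2))
```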

Development

Requirements:

  • Python 3.12+
  • Development dependencies (pip install -e ".[dev]")

Run tests:

pytest

License

BSD 3-Clause License. See LICENSE.txt for details.

Author

Pierre Enel ([[email protected]])
