GiantsMind is a Python package for working with scientific article PDFs. It lets you parse papers, extract their metadata, and search and query them in natural language. It currently uses LlamaParse (from LlamaIndex) to parse PDFs and Claude 3.5 Sonnet for its agents and natural language interaction. You will need an API key for each of these services to run either of the two commands (see Installation).
This is an early version with limited functionality, under active development.
- PDF document parsing using LlamaParse
- Metadata extraction from PDF papers (DOI, arXiv ID)
- Automatic metadata fetching from CrossRef and arXiv APIs
- Vector database storage for semantic search
- SQLite database for metadata management
- Natural language querying using Claude AI
- Interactive CLI interface
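To illustrate the metadata extraction feature, here is a minimal sketch of pulling a DOI or arXiv ID out of text that has already been extracted from a PDF. It is not the package's own code: the function name `find_identifiers` and both regular expressions are assumptions chosen for illustration, and they cover only the common identifier shapes.

```python
import re

# Assumed patterns (not taken from GiantsMind):
# - a DOI of the common form "10.<registrant>/<suffix>"
# - a post-2007 arXiv ID of the form "YYMM.NNNNN", optionally versioned
DOI_RE = re.compile(r"\b10\.\d{4,9}/[^\s\"<>]+", re.IGNORECASE)
ARXIV_RE = re.compile(r"arXiv:\s*(\d{4}\.\d{4,5})(v\d+)?", re.IGNORECASE)


def find_identifiers(text: str) -> dict:
    """Return the first DOI and arXiv ID found in text, or None for each."""
    doi = DOI_RE.search(text)
    arxiv = ARXIV_RE.search(text)
    return {
        # Strip trailing punctuation that often follows a DOI in running text.
        "doi": doi.group(0).rstrip(".,;)") if doi else None,
        # Keep the base ID only; the version suffix is dropped.
        "arxiv_id": arxiv.group(1) if arxiv else None,
    }
```

An identifier found this way could then be used to look the paper up in CrossRef or arXiv.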
Installation:
- Clone the repository
- Install dependencies:
pip install -e ".[dev]"
- Set up environment variables in .env:
LLAMA_API_KEY=<your-llama-api-key>
ANTHROPIC_API_KEY=<your-anthropic-api-key>
DEFAULT_PDF_PATH=<path-to-pdf-folder>
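Before running a command, it can help to confirm that these settings are visible to Python. The sketch below is an assumption-laden check, not part of GiantsMind: `missing_settings` is a hypothetical helper, it treats all three variables as required, and it only inspects the process environment, so the values in .env must already have been loaded or exported.

```python
import os

# Variable names come from the .env example; treating all three as
# required is an assumption made for this sketch.
REQUIRED_VARS = ("LLAMA_API_KEY", "ANTHROPIC_API_KEY", "DEFAULT_PDF_PATH")


def missing_settings(env=os.environ) -> list[str]:
    """Return the names of required variables that are unset or empty."""
    return [name for name in REQUIRED_VARS if not env.get(name)]


if __name__ == "__main__":
    missing = missing_settings()
    if missing:
        print("Missing settings:", ", ".join(missing))
    else:
        print("All settings found.")
```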
Parse a folder of PDFs:
giantsmind --parse /path/to/papers
This will:
- Parse PDFs using LlamaParse
- Extract and fetch metadata
- Store content in vector database
- Save metadata in SQLite database
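The last step, saving metadata in SQLite, might look roughly like the sketch below. The `papers` table, its columns, and the helpers `open_store` and `save_paper` are hypothetical; the schema GiantsMind actually uses may differ.

```python
import sqlite3

# Hypothetical schema for illustration only.
SCHEMA = """
CREATE TABLE IF NOT EXISTS papers (
    paper_id TEXT PRIMARY KEY,   -- DOI or arXiv ID
    title    TEXT NOT NULL,
    authors  TEXT NOT NULL,      -- semicolon-separated names
    journal  TEXT,
    year     INTEGER
)
"""


def open_store(path: str = ":memory:") -> sqlite3.Connection:
    """Open (or create) the metadata database and ensure the table exists."""
    conn = sqlite3.connect(path)
    conn.execute(SCHEMA)
    return conn


def save_paper(conn: sqlite3.Connection, paper: dict) -> None:
    """Insert or update one paper's metadata, keyed by its identifier."""
    conn.execute(
        "INSERT OR REPLACE INTO papers VALUES "
        "(:paper_id, :title, :authors, :journal, :year)",
        paper,
    )
    conn.commit()
```

Keying rows on the DOI or arXiv ID means that parsing the same paper twice updates its row rather than creating a duplicate.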
Query the parsed papers:
giantsmind
This starts an interactive session where you can:
- Ask questions about papers in natural language
- Search by metadata (authors, dates, journals)
- Search paper content semantically
- Get AI-generated answers with citations
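As a rough idea of what semantic search over paper content involves, the sketch below ranks text chunks by cosine similarity to a query vector. It is a toy stand-in: GiantsMind relies on a vector database, and the functions `cosine` and `top_k` and the hand-written vectors here are assumptions for illustration, with no embedding model involved.

```python
import math


def cosine(a, b):
    """Cosine similarity of two equal-length vectors (0.0 if either is zero)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def top_k(query_vec, chunks, k=2):
    """Rank (text, vector) pairs by similarity to the query; return k texts."""
    scored = [(cosine(query_vec, vec), text) for text, vec in chunks]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [text for _, text in scored[:k]]
```

In a real pipeline, the vectors would come from an embedding model and the top-ranked chunks would be passed to the language model as context for an answer with citations.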
Requirements:
- Python 3.12+
- Development dependencies (pip install -e ".[dev]")
Run tests:
pytest
BSD 3-Clause License. See LICENSE.txt for details.
Pierre Enel ([[email protected]])