GiantsMind

GiantsMind is a Python package that provides tools for interacting with scientific article PDFs. It allows you to parse, extract metadata, search, and query scientific papers using natural language. In its current form, it uses LlamaIndex to parse PDFs and Claude Sonnet 3.5 for various agents and natural language interaction. You will need an API key for these services to run the two commands (see Installation)

This is an early version with limited functionalities in active development.

Features

PDF document parsing using Llamaparse
Metadata extraction from PDF papers (DOI, arXiv ID)
Automatic metadata fetching from CrossRef and arXiv APIs
Vector database storage for semantic search
SQLite database for metadata management
Natural language querying using Claude AI
Interactive CLI interface

Installation

Clone the repository
Install dependencies:

pip install -e ".[dev]"

Set up environment variables in .env:

LLAMA_API_KEY=<your-llama-api-key>
ANTHROPIC_API_KEY=<your-anthropic-api-key>
DEFAULT_PDF_PATH=<path-to-pdf-folder>

Usage

Parse PDF Papers

giantsmind --parse /path/to/papers

This will:

Parse PDFs using Llamaparse
Extract and fetch metadata
Store content in vector database
Save metadata in SQLite database

Interactive Query Mode

giantsmind

This starts an interactive session where you can:

Ask questions about papers in natural language
Search by metadata (authors, dates, journals)
Search paper content semantically
Get AI-generated answers with citations

Development

Requirements:

Python 3.12+
Development dependencies (pip install -e ".[dev]")

Run tests:

pytest

License

BSD 3-Clause License. See LICENSE.txt for details.

Author

Pierre Enel ([pierre.enel@gmail.com])

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

README.md

README.md

GiantsMind

Features

Installation

Usage

Parse PDF Papers

Interactive Query Mode

Development

License

Author

Files

README.md

Latest commit

History

README.md

File metadata and controls

GiantsMind

Features

Installation

Usage

Parse PDF Papers

Interactive Query Mode

Development

License

Author