feat: enhance parser domain-agnostic support (#117)
* feat: make parser domain-agnostic to support multiple Git hosts

- added list of known domains/Git hosts in `query_parser.py`
- fixed bug from [#115](#115): corrected case handling for URL components—scheme, domain, username, and repository are case-insensitive, but paths beyond (e.g., file names, branches) are case-sensitive
- implemented `try_domains_for_user_and_repo` in `query_parser.py` to iteratively guess the correct domain until success or supported hosts are exhausted
- added helper functions `_get_user_and_repo_from_path`, `_validate_host`, and `_validate_scheme` in `query_parser.py`
- extended `_parse_repo_source` in `query_parser.py` to be Git host agnostic by using `try_domains_for_user_and_repo`
- added tests `test_parse_url_unsupported_host` and `test_parse_query_with_branch` in `test_query_parser.py`
- created new file `test_git_host_agnostic.py` to verify domain/Git host agnostic behavior
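
The case-handling rule and the domain-guessing loop described in the bullets above can be illustrated with a minimal sketch. This is not the code from this commit: `KNOWN_HOSTS`, `normalize_repo_url`, `guess_host`, and the caller-supplied `repo_exists` check are hypothetical names chosen for the example, and the real `try_domains_for_user_and_repo` may differ in signature and in how it reports failure.

```python
from typing import Callable, Optional
from urllib.parse import urlparse

# Hypothetical host list for illustration; the commit keeps its own list in query_parser.py.
KNOWN_HOSTS = ["github.com", "gitlab.com", "bitbucket.org"]


def normalize_repo_url(url: str) -> str:
    """Lowercase scheme, host, user, and repo; keep the remaining path as written."""
    parsed = urlparse(url)
    parts = parsed.path.strip("/").split("/")
    if len(parts) < 2:
        raise ValueError(f"Expected at least user/repo in path: {url!r}")
    # Only the first two path segments (user, repo) are treated as case-insensitive.
    user, repo, rest = parts[0].lower(), parts[1].lower(), parts[2:]
    path = "/".join([user, repo, *rest])
    return f"{parsed.scheme.lower()}://{parsed.netloc.lower()}/{path}"


def guess_host(user: str, repo: str, repo_exists: Callable[[str], bool]) -> Optional[str]:
    """Try each known host in order; return the first one where the repo is found."""
    for host in KNOWN_HOSTS:
        if repo_exists(f"https://{host}/{user}/{repo}"):
            return host
    # Assumption: signal exhaustion with None; the real helper may raise instead.
    return None
```

Passing the existence check in as a callable keeps the sketch free of network access; in practice it would wrap whatever remote check the parser performs.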
filipchristiansen authored Jan 13, 2025
1 parent 0fd16ba commit dd8f1e0
Showing 22 changed files with 429 additions and 167 deletions.
2 changes: 1 addition & 1 deletion Dockerfile
@@ -20,7 +20,7 @@ FROM python:3.12-slim
ENV PYTHONUNBUFFERED=1
ENV PYTHONDONTWRITEBYTECODE=1

# Install git
# Install Git
RUN apt-get update \
&& apt-get install -y --no-install-recommends git curl\
&& rm -rf /var/lib/apt/lists/*
30 changes: 15 additions & 15 deletions README.md
@@ -11,13 +11,13 @@

Turn any Git repository into a prompt-friendly text ingest for LLMs.

You can also replace `hub` with `ingest` in any GitHub URL to access the coresponding digest
You can also replace `hub` with `ingest` in any GitHub URL to access the coresponding digest.

[gitingest.com](https://gitingest.com/) · [Chrome Extension](https://chromewebstore.google.com/detail/adfjahbijlkjfoicpjkhjicpjpjfaood) · [Firefox Add-on](https://addons.mozilla.org/firefox/addon/gitingest/)
[gitingest.com](https://gitingest.com) · [Chrome Extension](https://chromewebstore.google.com/detail/adfjahbijlkjfoicpjkhjicpjpjfaood) · [Firefox Add-on](https://addons.mozilla.org/firefox/addon/gitingest)

## 🚀 Features

- **Easy code context**: Get a text digest from a git repository URL or a directory
- **Easy code context**: Get a text digest from a Git repository URL or a directory
- **Smart Formatting**: Optimized output format for LLM prompts
- **Statistics about**:
- File and directory structure
@@ -36,11 +36,12 @@ pip install gitingest

<!-- markdownlint-disable MD033 -->
<a href="https://chromewebstore.google.com/detail/adfjahbijlkjfoicpjkhjicpjpjfaood" target="_blank" title="Get Gitingest Extension from Chrome Web Store"><img height="48" src="https://github.com/user-attachments/assets/20a6e44b-fd46-4e6c-8ea6-aad436035753" alt="Available in the Chrome Web Store" /></a>
<a href="https://addons.mozilla.org/firefox/addon/gitingest/" target="_blank" title="Get Gitingest Extension from Firefox Add-ons"><img height="48" src="https://github.com/user-attachments/assets/c0e99e6b-97cf-4af2-9737-099db7d3538b" alt="Get The Add-on for Firefox" /></a>
<a href="https://addons.mozilla.org/firefox/addon/gitingest" target="_blank" title="Get Gitingest Extension from Firefox Add-ons"><img height="48" src="https://github.com/user-attachments/assets/c0e99e6b-97cf-4af2-9737-099db7d3538b" alt="Get The Add-on for Firefox" /></a>
<a href="https://microsoftedge.microsoft.com/addons/detail/nfobhllgcekbmpifkjlopfdfdmljmipf" target="_blank" title="Get Gitingest Extension from Firefox Add-ons"><img height="48" src="https://github.com/user-attachments/assets/204157eb-4cae-4c0e-b2cb-db514419fd9e" alt="Get from the Edge Add-ons" /></a>
<!-- markdownlint-enable MD033 -->

The extension is open source at [lcandy2/gitingest-extension](https://github.com/lcandy2/gitingest-extension).

Issues and feature requests are welcome to the repo.

## 💡 Command line usage
@@ -71,7 +72,7 @@ summary, tree, content = ingest("path/to/directory")
summary, tree, content = ingest("https://github.com/cyclotruc/gitingest")
```

By default, this won't write a file but can be enabled with the `output` argument
By default, this won't write a file but can be enabled with the `output` argument.

## 🌐 Self-host

@@ -87,31 +88,30 @@ By default, this won't write a file but can be enabled with the `output` argumen
docker run -d --name gitingest -p 8000:8000 gitingest
```

The application will be available at `http://localhost:8000`
The application will be available at `http://localhost:8000`.

If you are hosting it on a domain, you can specify the allowed hostnames via env variable `ALLOWED_HOSTS`.

```bash
#Default: "gitingest.com,*.gitingest.com,localhost, 127.0.0.1".
# Default: "gitingest.com, *.gitingest.com, localhost, 127.0.0.1".
ALLOWED_HOSTS="example.com, localhost, 127.0.0.1"
```

## 🛠️ Stack

- [Tailwind CSS](https://tailwindcss.com/) - Frontend
- [Tailwind CSS](https://tailwindcss.com) - Frontend
- [FastAPI](https://github.com/fastapi/fastapi) - Backend framework
- [Jinja2](https://jinja.palletsprojects.com/) - HTML templating
- [Jinja2](https://jinja.palletsprojects.com) - HTML templating
- [tiktoken](https://github.com/openai/tiktoken) - Token estimation
- [apianalytics.dev](https://www.apianalytics.dev/) - Simple Analytics
- [apianalytics.dev](https://www.apianalytics.dev) - Simple Analytics

### Looking for a javascript/node package?
### Looking for a JavaScript/Node package?

Check out the NPM alternative 📦 Repomix: <https://github.com/yamadashy/repomix>

## ✔️ Contributing to Gitingest

Gitingest aims to be friendly for first time contributors, with a simple python and html codebase.
If you need any help while working with the code, reach out to us on [discord](https://discord.com/invite/zerRaGK9EC)
Gitingest aims to be friendly for first time contributors, with a simple python and html codebase. If you need any help while working with the code, reach out to us on [Discord](https://discord.com/invite/zerRaGK9EC).

### Ways to help (non-technical)

@@ -125,7 +125,7 @@ Gitingest aims to be friendly for first time contributors, with a simple python
2. Setup the dev environment (see Development section bellow)
3. Run unit tests with `pytest`
4. Commit your changes and run `pre-commit`
5. Open a pull request on Github for review and feedback
5. Open a pull request on GitHub for review and feedback
6. (Optionnal) Invite project maintainer to your branch for easier collaboration

## 🔧 Development
@@ -161,7 +161,7 @@ Gitingest aims to be friendly for first time contributors, with a simple python
pytest
```

The application should be available at `http://localhost:8000`
The application should be available at `http://localhost:8000`.

### Working on the CLI

2 changes: 1 addition & 1 deletion src/gitingest/__init__.py
@@ -1,4 +1,4 @@
""" Gitingest: A package for ingesting data from git repositories. """
""" Gitingest: A package for ingesting data from Git repositories. """

from gitingest.query_ingestion import run_ingest_query
from gitingest.query_parser import parse_query
4 changes: 2 additions & 2 deletions src/gitingest/cli.py
@@ -14,7 +14,7 @@
@click.option("--max-size", "-s", default=MAX_FILE_SIZE, help="Maximum file size to process in bytes")
@click.option("--exclude-pattern", "-e", multiple=True, help="Patterns to exclude")
@click.option("--include-pattern", "-i", multiple=True, help="Patterns to include")
def main(
async def main(
source: str,
output: str | None,
max_size: int,
@@ -54,7 +54,7 @@ def main(

if not output:
output = "digest.txt"
summary, _, _ = ingest(source, max_size, include_patterns, exclude_patterns, output=output)
summary, _, _ = await ingest(source, max_size, include_patterns, exclude_patterns, output=output)

click.echo(f"Analysis complete! Output written to: {output}")
click.echo("\nSummary:")
4 changes: 2 additions & 2 deletions src/gitingest/exceptions.py
@@ -23,7 +23,7 @@ def __init__(self, pattern: str) -> None:

class AsyncTimeoutError(Exception):
"""
Raised when an async operation exceeds its timeout limit.
Exception raised when an async operation exceeds its timeout limit.
This exception is used by the `async_timeout` decorator to signal that the wrapped
asynchronous function has exceeded the specified time limit for execution.
@@ -38,7 +38,7 @@ def __init__(self, max_files: int) -> None:


class MaxFileSizeReachedError(Exception):
"""Raised when the maximum file size is reached."""
"""Exception raised when the maximum file size is reached."""

def __init__(self, max_size: int):
super().__init__(f"Maximum file size limit ({max_size/1024/1024:.1f}MB) reached.")
4 changes: 3 additions & 1 deletion src/gitingest/query_ingestion.py
@@ -170,7 +170,9 @@ def _read_file_content(file_path: Path) -> str:

def _sort_children(children: list[dict[str, Any]]) -> list[dict[str, Any]]:
"""
Sort children nodes with:
Sort the children nodes of a directory according to a specific order.
Order of sorting:
1. README.md first
2. Regular files (not starting with dot)
3. Hidden files (starting with dot)
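
The diff view cuts the docstring off after the third rule. The three visible rules could be expressed as a sort key, as in the rough sketch below. It assumes each child is a dict with a `name` key and breaks ties alphabetically; both are assumptions, `sort_children_sketch` is a hypothetical name, and any rules hidden by the truncation are not covered.

```python
from typing import Any


def sort_children_sketch(children: list[dict[str, Any]]) -> list[dict[str, Any]]:
    """Order nodes: README.md first, then regular files, then dot-prefixed files."""

    def rank(node: dict[str, Any]) -> tuple[int, str]:
        name = node["name"]  # assumed key; the real node layout may differ
        if name.lower() == "readme.md":
            return (0, name.lower())
        if not name.startswith("."):
            return (1, name.lower())
        return (2, name.lower())

    # sorted() is stable and compares the (group, name) tuples element by element.
    return sorted(children, key=rank)
```

Using a tuple key keeps the grouping and the within-group order in a single pass instead of building and concatenating separate lists.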