Feature Request: Minify Option to Reduce Output Size #26

Open
escorciav opened this issue Dec 13, 2024 · 6 comments
Labels
suggestion New feature or request

Comments

@escorciav

Thank you for the great tool! While using Gitingest, I encountered a challenge with large repositories containing irrelevant or extraneous information in the .txt output. This includes files such as:

  • Jupyter Notebooks with embedded images.
  • CSV files with results or large datasets.
  • (possibly) Binaries or other non-textual data.

These often inflate the file size unnecessarily, making it difficult for downstream tools (e.g., ChatGPT) to process them efficiently.

Feature Request:

  • Add a minify option to exclude or summarize such files in the output. This option could work by:
    • Stripping or summarizing Jupyter Notebooks (e.g., excluding image data or non-code cells).
    • Skipping large files like CSVs or binaries entirely.
    • Allowing configurable file-type or size exclusions (e.g., via an argument like --exclude "*.csv,*.ipynb,*.bin" or --max-file-size 5MB).
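
The exclusion rules above can be sketched as follows. This is only an illustration under stated assumptions: `parse_size` and `should_skip` are hypothetical helpers, not part of Gitingest, and the `--exclude` / `--max-file-size` values are the ones proposed in this request rather than existing flags. Binary units (1 MB = 1024 × 1024 bytes) are assumed.

```python
import fnmatch
import re

# Assumption: binary units; a real tool might prefer decimal ones.
_UNITS = {"B": 1, "KB": 1024, "MB": 1024**2, "GB": 1024**3}


def parse_size(text: str) -> int:
    """Turn a string such as '5MB' into a byte count (hypothetical helper)."""
    match = re.fullmatch(r"\s*(\d+(?:\.\d+)?)\s*([KMG]?B)\s*", text.upper())
    if match is None:
        raise ValueError(f"unrecognised size: {text!r}")
    number, unit = match.groups()
    return int(float(number) * _UNITS[unit])


def should_skip(path: str, size_bytes: int, exclude: str, max_file_size: str) -> bool:
    """Return True if the file matches an exclude pattern or is over the size limit."""
    patterns = [p.strip() for p in exclude.split(",") if p.strip()]
    name = path.rsplit("/", 1)[-1]  # match patterns against the file name only
    if any(fnmatch.fnmatch(name, pattern) for pattern in patterns):
        return True
    return size_bytes > parse_size(max_file_size)
```

For example, `should_skip("data/results.csv", 100, "*.csv,*.ipynb,*.bin", "5MB")` is true because of the pattern match, while a small `src/main.py` would be kept.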

Use Case:
This would be particularly useful for repositories where only code and textual content are relevant, improving the usability and performance of gitingest outputs.

Proposed Approach:
A potential implementation could involve:

  • Scanning file extensions or MIME types to exclude or process specific formats.
  • Adding a size threshold to skip overly large files.
  • Leveraging libraries like nbconvert for Jupyter Notebook minification.
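
As a rough sketch of the notebook part, the example below keeps only the source of code cells and discards outputs (including embedded images) and non-code cells. It deliberately uses the standard `json` module instead of nbconvert, relying on the fact that an `.ipynb` file is JSON with a top-level `cells` list (nbformat 4 assumed). `minify_notebook` is a hypothetical name, and dropping all markdown cells is just one possible policy.

```python
import json


def minify_notebook(raw: str) -> str:
    """Return the code cells of a notebook (nbformat 4 JSON) as plain text."""
    notebook = json.loads(raw)
    chunks = []
    for cell in notebook.get("cells", []):
        if cell.get("cell_type") != "code":
            continue  # drop markdown/raw cells; cell outputs are never read
        source = cell.get("source", "")
        if isinstance(source, list):  # source may be a list of lines
            source = "".join(source)
        if source.strip():
            chunks.append(source.rstrip("\n"))
    return "\n\n".join(chunks)
```

Because outputs are never copied, base64-encoded images stored under a cell's `outputs` key do not reach the result.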

Thank you for considering this enhancement!

@cyclotruc
Owner

Thank you for your feedback.

At the moment, non-text files should be ignored by default. Could you provide an example link where you see this issue?

@escorciav
Author

escorciav commented Dec 13, 2024

Try it out with this repo. I got a file of around 8 MB, which I manually minified to 568 KB.

mlfoundations-open_clip.txt

@cyclotruc added the suggestion (New feature or request) label Dec 13, 2024
@cyclotruc
Owner

OK, this makes sense. I'm going to look into that.

@escorciav
Author

I guess someone is hammering at this ⚒️ 🚀

[Screenshot, 2024-12-17 at 5:19:24 AM]

@cyclotruc
Owner

Don't hesitate to tell me if there's anything you wish you could do that those two settings don't permit.

I plan to add a "minify" option as part of an advanced settings revamp, but some issues are higher priority right now.

@cyclotruc
Owner

@filipchristiansen made this #105

And three weeks after my last answer, I realise that priorities are now shifting towards improving the points you mentioned.
