DataDreamer - v0.2.0 #68

Merged
sokovninn merged 60 commits into main on Nov 12, 2024

sokovninn
Member

This PR includes:

  • Improvements to LuxonisDataset
  • Added logging, tests, and refactoring
  • Reworked GHCR publish actions
  • Added a profanity filter for input class names
  • Added Qwen 2.5 LM as a prompt generator
  • Added SlimSAM for instance segmentation
  • Bug fixes
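
To illustrate the profanity filter item in the list above, here is a minimal sketch of checking input class names against a word list. It is not the code from this PR: the coverage report further down lists the real implementation as datadreamer/prompt_generation/profanity_filter.py, and it may work differently. The names `BLOCKED_WORDS` and `find_blocked_classes` and the whole-token matching rule are assumptions made for this example.

```python
import re

# Hypothetical, deliberately tiny word list used only for this sketch.
BLOCKED_WORDS = {"badword", "worseword"}


def find_blocked_classes(class_names, blocked_words=BLOCKED_WORDS):
    """Return the class names that contain a blocked word as a whole token."""
    flagged = []
    for name in class_names:
        # Lowercase the name and split it into alphanumeric tokens.
        tokens = re.findall(r"[a-z0-9]+", name.lower())
        if any(token in blocked_words for token in tokens):
            flagged.append(name)
    return flagged


classes = ["dog", "BadWord cat", "traffic light"]
print(find_blocked_classes(classes))
```

Matching whole tokens rather than substrings avoids flagging harmless names that merely contain a blocked word, at the cost of missing obfuscated spellings.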

sokovninn and others added 30 commits April 29, 2024 17:00
* fix: replace typing with typing_extensions

* fix: remove deprecated truncation argument

* feature: add luxonis loader with plugin

* feature: modify luxonis dataset with plugin

* style: formatting

* [Automated] Updated coverage badge

* style: remove commented lines

* fix: remove dataset_id from luxonis dataset args

* fix: remove redundant env var check

* [Automated] Updated coverage badge

* fix: empty image_paths with luxonis dataset plugin

* style: formatting

* [Automated] Updated coverage badge

* test: simple luxonis dataset test

* fix: luxonis dataset converter missing attr

* style: formatting

* fix: luxonis dataset converter parent

* [Automated] Updated coverage badge

---------

Co-authored-by: GitHub Actions <[email protected]>
* chore: add gcsfs to requirements.txt

* chore: fix dev image tag
* fix: bbox labels in LuxonisDatasetConverter

* fix: convert split_ratios to tuple for luxonis-ml

* [Automated] Updated coverage badge

---------

Co-authored-by: GitHub Actions <[email protected]>
* fix: use plugin arg to load image paths

* fix: pre-commit formatting

* fix: pass sync_target_directory to loader
* fix: unet from config warning

* feat: add logger

* test: add utils tests

* style: utils tests formatting

* fix: args extension in merge dataset function

* docs: docstrings and return types

* [Automated] Updated coverage badge

* fix: remove axes in bbox visualization

* tests: improve image generation tests

* [Automated] Updated coverage badge

* docs: fix docstrings formatting

* fix: type hints

* tests: replace default ubuntu runner with buildjet runner

* fix: type hint

* test: modify memory computation

* test: round up ram computation

* test: disable output capturing

* test: decrease required ram for demanding tests

* test: 8vcpu buildjet runner

* test: fix buildjet 8cpu runner

* test: fix 8vcpu buildjet

* test: divide tests into core and heavy

* style: tests formatting

* test: rename core tests

* test: run core tests on pull to dev

* test: fix config paths

* [Automated] Updated coverage badge

* test: update tests

* [Automated] Updated coverage badge

* test: run core tests on pr to main

* test: rename heavy test scripts

---------

Co-authored-by: GitHub Actions <[email protected]>
* chore: add manual GHCR publish trigger from any branch

* chore: remove GAR publish on release

* chore: rename GHCR on release publish action

* chore: modify commit hash extraction
* Add safety features

* Fix lm prompt testing

* Correct lm prompt testing

* [Automated] Updated coverage badge

* Rework the safety features

* Add truncation

* Remove truncation

* Update Profanity Filter

* Profanity Filter refactor

---------

Co-authored-by: GitHub Actions <[email protected]>
* Add Qwen2.5 LM as prompt generator

* Format code

* fix: prompt text visualization

* fix: padding_side="left" in Qwen2.5 to remove warning

* docs: update args description

---------

Co-authored-by: Nikita Sokovnin <[email protected]>
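
Some background on the padding_side="left" commit above: decoder-only language models continue a prompt from its last position, so in a right-padded batch the shorter prompts end in padding tokens, and Hugging Face Transformers warns about this during generation. The sketch below shows the difference on plain lists of token ids. The helper `pad_batch` is hypothetical and only mimics what a tokenizer does; with a real tokenizer the setting is typically passed as `padding_side="left"` when loading it.

```python
def pad_batch(sequences, pad_id, padding_side="left"):
    """Pad lists of token ids to equal length; return (padded, attention_mask)."""
    if padding_side not in ("left", "right"):
        raise ValueError("padding_side must be 'left' or 'right'")
    max_len = max(len(seq) for seq in sequences)
    padded, masks = [], []
    for seq in sequences:
        pad = [pad_id] * (max_len - len(seq))
        ones = [1] * len(seq)    # real tokens
        zeros = [0] * len(pad)   # padding positions
        if padding_side == "left":
            padded.append(pad + list(seq))
            masks.append(zeros + ones)
        else:
            padded.append(list(seq) + pad)
            masks.append(ones + zeros)
    return padded, masks


batch = [[11, 12, 13], [21]]
left, left_mask = pad_batch(batch, pad_id=0, padding_side="left")
right, right_mask = pad_batch(batch, pad_id=0, padding_side="right")
print(left)   # every row ends in a real token
print(right)  # the short row ends in padding
```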

github-actions bot commented Oct 29, 2024

☂️ Python Coverage

current status: ✅

Overall Coverage

Lines | Covered | Coverage | Threshold | Status
1693 | 1271 | 75% | 0% | 🟢
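
As a quick check of the overall figure, the percentage follows from the two counts in the row above. This sketch assumes coverage is simply covered lines divided by total lines; whether the report rounds or truncates the result is not stated here.

```python
lines_total = 1693
lines_covered = 1271

# Coverage as a percentage of all measured lines.
coverage = 100 * lines_covered / lines_total
print(f"{coverage:.2f}%")
```

The result is just over 75%, which matches the reported value and is well above the 0% threshold.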

New Files

File Coverage Status
datadreamer/dataset_annotation/slimsam_annotator.py 82% 🟢
datadreamer/prompt_generation/profanity_filter.py 85% 🟢
datadreamer/prompt_generation/qwen2_lm_prompt_generator.py 73% 🟢
datadreamer/utils/bad_words.py 100% 🟢
TOTAL 85% 🟢

Modified Files

File Coverage Status
datadreamer/dataset_annotation/__init__.py 100% 🟢
datadreamer/dataset_annotation/clip_annotator.py 63% 🟢
datadreamer/dataset_annotation/image_annotator.py 88% 🟢
datadreamer/dataset_annotation/owlv2_annotator.py 81% 🟢
datadreamer/dataset_annotation/utils.py 75% 🟢
datadreamer/image_generation/clip_image_tester.py 96% 🟢
datadreamer/image_generation/image_generator.py 86% 🟢
datadreamer/image_generation/sdxl_image_generator.py 25% 🟢
datadreamer/image_generation/sdxl_lightning_image_generator.py 28% 🟢
datadreamer/image_generation/sdxl_turbo_image_generator.py 73% 🟢
datadreamer/pipelines/generate_dataset_from_scratch.py 84% 🟢
datadreamer/prompt_generation/__init__.py 100% 🟢
datadreamer/prompt_generation/lm_prompt_generator.py 60% 🟢
datadreamer/prompt_generation/lm_synonym_generator.py 36% 🟢
datadreamer/prompt_generation/prompt_generator.py 94% 🟢
datadreamer/prompt_generation/synonym_generator.py 93% 🟢
datadreamer/prompt_generation/tinyllama_lm_prompt_generator.py 85% 🟢
datadreamer/utils/base_converter.py 96% 🟢
datadreamer/utils/coco_converter.py 90% 🟢
datadreamer/utils/config.py 100% 🟢
datadreamer/utils/convert_dataset.py 23% 🟢
datadreamer/utils/dataset_utils.py 95% 🟢
datadreamer/utils/luxonis_dataset_converter.py 73% 🟢
datadreamer/utils/merge_raw_datasets.py 75% 🟢
datadreamer/utils/nms.py 75% 🟢
datadreamer/utils/single_label_cls_converter.py 97% 🟢
datadreamer/utils/yolo_converter.py 79% 🟢
TOTAL 77% 🟢

updated for commit: c89ca34 by action🐍


Test Results

  5 files    5 suites   1h 19m 25s ⏱️
 77 tests  60 ✅  17 💤 0 ❌
385 runs  276 ✅ 109 💤 0 ❌

Results for commit 1914f7d.


@klemen1999 klemen1999 left a comment

LGTM

Contributor

@HonzaCuhel HonzaCuhel left a comment

LGTM, but can we please merge this PR first? It contains a fix for using DataDreamer only to annotate images, which aren't necessarily all RGB.

HonzaCuhel and others added 8 commits October 30, 2024 13:01
* Fix: convert images to RGB

* Change the source branch to install

* [Automated] Updated coverage badge

---------

Co-authored-by: GitHub Actions <[email protected]>
* fix: add images with no annotations to LuxonisDataset

* fix(workflow): use branch we run workflow from

* test: confirm current branch in workflow

* fix(workflows): correctly pass branch arg

* fix(workflows): tags

* [Automated] Updated coverage badge

* chore: change min required luxonis-ml version to 0.5.0

---------

Co-authored-by: conorsim <[email protected]>
Co-authored-by: GitHub Actions <[email protected]>
* Add option to keep images with no annotation, by default removing & refactor

* [Automated] Updated coverage badge

* Rename 'keep_empty_images' to 'keep_unlabeled_images'

---------

Co-authored-by: GitHub Actions <[email protected]>
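
The behavior described in the last commit group (images with no annotations are removed by default and kept only when `keep_unlabeled_images` is set) can be sketched as follows. Only the flag name and its default come from the commit messages; the function `filter_unlabeled` and the annotation layout are assumptions made for this example, not DataDreamer's actual code.

```python
def filter_unlabeled(annotations, keep_unlabeled_images=False):
    """Drop images whose annotation has no boxes unless the flag is set.

    `annotations` maps an image file name to a dict with a "boxes" list;
    this layout is an assumption made for the example.
    """
    if keep_unlabeled_images:
        return dict(annotations)
    return {name: ann for name, ann in annotations.items() if ann.get("boxes")}


annotations = {
    "image_0.jpg": {"boxes": [[10, 20, 50, 80]], "labels": [0]},
    "image_1.jpg": {"boxes": [], "labels": []},
}
print(list(filter_unlabeled(annotations)))
print(list(filter_unlabeled(annotations, keep_unlabeled_images=True)))
```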
@sokovninn sokovninn merged commit 23e18d4 into main Nov 12, 2024
6 participants