Why do we have to manually split training and validation images? #45
Yes, why? It is a huge hassle. Can you implement an auto-split procedure using a scikit-learn function (train_test_split)?
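For reference, a minimal sketch of the kind of automatic split being asked about, using scikit-learn's train_test_split on a list of image paths. The folder layout and the 80/20 ratio are illustrative assumptions, not part of the project:

```python
# Hypothetical sketch: split one class folder into training and
# validation file lists with scikit-learn's train_test_split.
import os
from sklearn.model_selection import train_test_split

image_dir = "dataset/class_0"  # assumed layout: one folder per class
images = [os.path.join(image_dir, f) for f in os.listdir(image_dir)
          if f.lower().endswith((".png", ".jpg", ".jpeg"))]

# 20% of the images are held out for validation (assumed fraction).
train_files, valid_files = train_test_split(
    images, test_size=0.2, random_state=42, shuffle=True)
print(f"{len(train_files)} training / {len(valid_files)} validation images")
```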
Hi, I must admit that I deliberately omitted this feature so far. The reason is that I wanted users to choose their validation data wisely. Lots of things can go wrong when capturing a dataset (wrong labeling, badly focused images, lots of blur, dirt on the camera, oversampling one class, …), and the validation set should be chosen and checked with extra care. Furthermore, users almost always want to train models for predicting events in the future, so it makes sense that the validation set is captured after the training set. Similarly, for biomedical applications, trained models still need to work for data from a new experiment or from another patient. If the existing dataset is very large, a random allocation of validation data might be an option. Therefore, I'm now thinking about how to implement the train_test_split you suggested as an option.
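To make the temporal argument above concrete, here is a hedged sketch of a split where the validation images are the most recently captured ones rather than a random subset. Sorting by file modification time is an assumption; a timestamp encoded in the filename would work the same way:

```python
# Hypothetical temporal split: validation = newest images.
import os

def temporal_split(image_dir, valid_fraction=0.2):
    files = [os.path.join(image_dir, f) for f in os.listdir(image_dir)]
    files.sort(key=os.path.getmtime)           # oldest first
    n_valid = max(1, int(len(files) * valid_fraction))
    return files[:-n_valid], files[-n_valid:]  # train, validation
```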
Maybe I'll write a helper tool for automating that split. It does not have to be embedded in the main project; it could just create the required folder structure based on a single input folder (a sketch of such a helper follows below).

The project seems very promising, thank you for the great contribution. I used NVIDIA DIGITS once in this study: https://link.springer.com/article/10.1007/s11694-020-00707-7 However, installing DIGITS is a hassle, especially for my students who have no programming background. I am looking for an alternative to DIGITS that can be easily installed, and then I saw your project. I could not find the legacy GoogLeNet in the predefined list. It is nice to know there is an opportunity to get some support for the program.
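As an illustration of that helper-tool idea, a minimal sketch that takes one input folder containing a subfolder per class and copies the files into train/valid subtrees. The output layout, folder names, and copy semantics are all assumptions:

```python
# Hypothetical folder-splitting helper: src_dir has one subfolder per
# class; dst_dir receives train/<class> and valid/<class> copies.
import os
import random
import shutil

def split_into_folders(src_dir, dst_dir, valid_fraction=0.2, seed=42):
    rng = random.Random(seed)
    for class_name in os.listdir(src_dir):
        class_dir = os.path.join(src_dir, class_name)
        if not os.path.isdir(class_dir):
            continue
        files = os.listdir(class_dir)
        rng.shuffle(files)
        n_valid = max(1, int(len(files) * valid_fraction))
        for subset, names in (("valid", files[:n_valid]),
                              ("train", files[n_valid:])):
            out_dir = os.path.join(dst_dir, subset, class_name)
            os.makedirs(out_dir, exist_ok=True)
            for name in names:
                shutil.copy2(os.path.join(class_dir, name), out_dir)
```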
Sounds like AIDeveloper could be a helpful tool for you. The students just need to download and unzip it. AIDeveloper even works with GPU support. Maybe you have already discovered the "Python" tab within AIDeveloper: there, you can execute any code you want in the same Python environment that is used by AIDeveloper, so packages like tensorflow, scikit-learn, opencv, and so on are available without having to install anything (see the example below).
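For example, one might paste something like the following into that "Python" tab to confirm which bundled packages are available (the version attributes are standard for these libraries; the specific imports are just an illustration):

```python
# Check the packages bundled with AIDeveloper's Python environment.
import tensorflow, sklearn, cv2
print("tensorflow  ", tensorflow.__version__)
print("scikit-learn", sklearn.__version__)
print("opencv      ", cv2.__version__)
```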
Thank you for the information. It looks like the user has a lot of control over it.
Dear Maik, here is my tt_split implementation. It probably needs more error handling. I had near-zero experience with Qt, and this is my first Qt program, but it looks like it does the job :) https://gist.github.com/aferust/55bb70359fdd3148c7e920b02907084a
Thanks for sharing your code!
While I really appreciate the GUI and the splitting code, I think this should be a feature of the software. Having a simple option where you load a class and then set the percentages for training, validation, and testing would be very useful.

I understand the 'garbage in, garbage out' concern, where you would want people to check their images before using them, but I think a tool like this is more about developing skills and a reasonable model, fast. There are also many free and easy ways to collect pretty good data. Sure, is there going to be some garbage in a dataset? Yeah, probably. But if you have a class with 10,000 images and 90% of them are very high quality, then that's good enough accuracy for a model made from a GUI. No one will be making a model in this tool and using it at Google or Facebook, right? This is for developing understanding, hobby models, etc. Having a model that is 70% good isn't bad at all!

Plus, the problem with using an external script for this is that it just makes life harder than it has to be. The model should be continually trained so it gets better and better, and if it's bad, you retrain it or start going through the dataset. I think having the option to split the dataset in the software would help people in that journey.
@DankMemeGuy thanks for your suggestions. I have implemented a (kind of) quick solution: you can now find a new checkbox, 'Validation split (%)'. You can change that fraction on the fly during the training process.
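This is not the actual AIDeveloper code, but as a hedged illustration of what a fractional validation split means in the underlying Keras API, where validation_split reserves the last portion of the provided data:

```python
# Illustration only: Keras holds out the last 20% of the samples
# (before shuffling) when validation_split=0.2 is passed to fit().
import numpy as np
from tensorflow import keras

x = np.random.rand(1000, 32, 32, 3).astype("float32")  # dummy images
y = np.random.randint(0, 2, size=(1000,))               # dummy labels

model = keras.Sequential([
    keras.layers.Flatten(input_shape=(32, 32, 3)),
    keras.layers.Dense(2, activation="softmax"),
])
model.compile(optimizer="adam", loss="sparse_categorical_crossentropy")
model.fit(x, y, epochs=1, validation_split=0.2)  # 20% held out
```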
Thank you very much!!