Skip to content

Latest commit

 

History

History
155 lines (116 loc) · 6.67 KB

Autoencoder.md

File metadata and controls

155 lines (116 loc) · 6.67 KB

Auto-Encoder trained on ImageNet

Train VGG-like and ResNet-like auto-encoder on image dataset like ImageNet

imagenet-autoencoder/reconstruction.jpg at main · Horizon2333/imagenet-autoencoder (github.com)

  1. Project Structure
  2. Install
  3. Data preparing
  4. Train and Evaluate
  5. Tools
  6. Model zoo

Project Structure

$imagenet-autoencoder
    |──figs                 # result images
        |── *.jpg
    |──models
        |──builder.py       # build autoencoder models
        |──resnet.py        # resnet-like autoencoder
        |──vgg.py           # vgg-like autoencoder
    |──run
        |──eval.sh          # command to evaluate single checkpoint
        |──evalall.sh       # command to evaluate all checkpoints in specific folder
        |──train.sh         # command to train auto-encoder
    |──tools                
        |──decode.py        # decode random latent code to images
        |──encode.py        # encode single image to latent code
        |──generate_list.py # generate image list for training
        |──reconstrust.py   # reconstruct the images to see difference
    |──dataloader.py        # dataset and dataloader
    |──eval.py              # evaluate checkpoints
    |──train.py             # train models
    |──utils.py             # other utility function
    |──requirements.txt
    |──Autoencoder.md

Install

  1. Clone the project
git clone https://github.com/Horizon2333/imagenet-autoencoder
cd imagenet-autoencoder
  1. Install dependencies
pip install -r requirements.txt

Data Preparing

Your dataset should looks like:

$your_dataset_path
    |──class1
        |──xxxx.jpg
        |──...
    |──class2
        |──xxxx.jpg
        |──...
    |──...
    |──classN
        |──xxxx.jpg
        |──...

The you can use tools/generate_list.py to generate list of training samples. Here we do not use torchvision.datasets.ImageFolder because it is very slow when dataset is pretty large. You can run

python tools/generate_list.py --name {name your dataset such as caltech256} --path {path to your dataset}

Then two files will be generated under list folder, one *_list.txt save every image path and its class(here no use); one *_name.txt save index of every class and its class name.

Train and Evaluate

For training

bash run/train.sh {model architecture such as vgg16} {you dataset name}
# For example
bash run/train.sh vgg16 caltech256

For evaluating single checkpoint:

bash run/eval.sh {model architecture} {checkpoint path} {dataset name}
# For example
bash run/eval.sh vgg16 results/caltech256-vgg16/099.pth caltech101

For evaluating all checkpoints under specific folder:

bash run/evalall.sh {model architecture} {checkpoints path} {dataset name}
# For example
bash run/evalall.sh vgg16 results/caltech256-vgg16/ caltech101

When all checkpoints are evaluated, a scatter diagram figs/evalall.jpg will be generated to show the evaluate loss trend.

For model architecture, now we support vgg11,vgg13,vgg16,vgg19 and resnet18, resnet34, resnet50, resnet101, resnet152.

Tools

We provide several tools to better visualize the auto-encoder results.

reconstruct.py

Reconstruct images from original one. This code will sample 64 of them and save the comparison results to figs/reconstruction.jpg.

python tools/reconstruct.py --arch {model architecture} --resume {checkpoint path} --val_list {*_list.txt of your dataset}
# For example
python tools/reconstruct.py --arch vgg16 --resume results/caltech256-vgg16/099.pth --val_list caltech101_list.txt

encode.py and decode.py

Encode image to latent code or decode latent code to images.

encode.py can transfer single image to latent code.

python tools/encode.py --arch {model architecture} --resume {checkpoint path} --img_path {image path}
# For example
python tools/encode.py --arch vgg16 --resume results/caltech256-vgg16/099.pth --img_path figs/reconstruction.jpg

decode.py can transform 128 random latent code to images.

python tools/decode.py --arch {model architecture} --resume {checkpoint path} 
# For example
python tools/decode.py --arch vgg16 --resume results/caltech256-vgg16/099.pth

The decoded results will be save as figs/generation.jpg

Model zoo

Dataset VGG11 VGG13 VGG16 VGG19 ResNet18 ResNet34 ResNet50 ResNet101 ResNet152
Caltech256 link link link link link link link link link
Objects365 link link link link
ImageNet link

Note that the size of Objects365 dataset is about half of ImageNet dataset(128 million images, much larger than Caltech256), so the performance may be comparable.