This repository is the result of a research project in the Bachelor's Degree in Data Science and Engineering at Universitat Politècnica de Catalunya (UPC).
It presents an end-to-end approach to mono-to-binaural audio conversion, using 2.5D Visual Sound as the baseline and focusing on the Conv-TasNet architecture.
More information can be found in `paper_mono2binaural_tasnet.pdf`.
(The code has been tested under the following system environment: Ubuntu 18.04.5 LTS, CUDA 11.1, Python 3.6.9, PyTorch 1.6.0.)
- Download the FAIR-Play dataset.
- Generate the frames from the mp4 videos with the script `generate_frames.py`.
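Frame extraction amounts to sampling each mp4 at a fixed rate. Below is a minimal sketch of the idea with ffmpeg, where the 10 fps rate and the folder layout are assumptions; `generate_frames.py` is the authoritative version:

```python
import glob
import os
import subprocess

VIDEO_DIR = "/YOUR_DATA_PATH/FAIR-Play/videos"  # assumed dataset layout
FRAME_DIR = "/YOUR_DATA_PATH/FAIR-Play/frames"
FPS = 10  # assumed extraction rate

for video_path in glob.glob(os.path.join(VIDEO_DIR, "*.mp4")):
    name = os.path.splitext(os.path.basename(video_path))[0]
    out_dir = os.path.join(FRAME_DIR, name)
    os.makedirs(out_dir, exist_ok=True)
    # Decode the video once and write one jpg per sampled frame.
    subprocess.run(
        ["ffmpeg", "-i", video_path, "-vf", f"fps={FPS}",
         os.path.join(out_dir, "%06d.jpg")],
        check=True,
    )
```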
- Set the relative paths to the splits with the script `generate_splits.py`.
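In the 2.5D Visual Sound format, each split is an hdf5 file whose `audio` dataset lists the binaural wav paths, so pointing the splits at your copy of the data comes down to rebasing those paths. A minimal sketch of that step, where the dataset key, file names, and path layout are assumptions; `generate_splits.py` is the authoritative version:

```python
import h5py

SPLIT_FILE = "/YOUR_CODE_PATH/hdf5/split1/train.h5"  # hypothetical split file
NEW_ROOT = "/YOUR_DATA_PATH/FAIR-Play"

with h5py.File(SPLIT_FILE, "r+") as hf:
    # Rebase each stored wav path onto the local dataset root.
    paths = [p.decode() for p in hf["audio"][:]]
    rebased = [NEW_ROOT + "/binaural_audios/" + p.split("/")[-1] for p in paths]
    del hf["audio"]
    hf.create_dataset("audio", data=[p.encode() for p in rebased])
```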
- [OPTIONAL] Preprocess the audio files using `reEncodeAudio.py` to accelerate the training process.
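Re-encoding simply converts each wav once up front so the data loader does not pay the full decoding cost at every epoch. A minimal sketch, assuming a 16 kHz target sample rate and two-channel binaural input; `reEncodeAudio.py` is the authoritative version:

```python
import glob
import librosa
import soundfile as sf

SR = 16000  # assumed target sample rate

for wav_path in glob.glob("/YOUR_DATA_PATH/FAIR-Play/binaural_audios/*.wav"):
    # mono=False preserves both binaural channels; librosa returns (2, n).
    audio, _ = librosa.load(wav_path, sr=SR, mono=False)
    sf.write(wav_path, audio.T, SR)  # soundfile expects (n, channels)
```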
- Use the following command to train a model:

```bash
python3 train.py --hdf5FolderPath /YOUR_CODE_PATH/2.5d_visual_sound/hdf5/ --name mono2binaural --model MODEL_NAME --checkpoints_dir /YOUR_CHECKPOINT_PATH/ --save_epoch_freq 50 --display_freq 10 --save_latest_freq 100 --batchSize 32 --learning_rate_decrease_itr 10 --niter 1000 --lr_visual 0.0001 --lr_audio 0.001 --nThreads 32 --gpu_ids 0,1,2,3,4,5,6,7 --validation_on --validation_freq 100 --validation_batches 50 --tensorboard True --use_visual_info |& tee -a training.log
```
The `--model` parameter refers to either `tasnet` or `audioVisual`. If the model does not fit into GPU memory, use the `--stepBatchSize` parameter.
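If `--stepBatchSize` behaves the way its name suggests, it trades memory for time via gradient accumulation: each batch is processed in smaller chunks whose gradients are summed before a single optimizer step, so the effective batch size is unchanged. A minimal, self-contained sketch of that pattern in PyTorch (the flag's exact semantics in this codebase are an assumption, and the linear model is a stand-in):

```python
import torch
from torch import nn

model = nn.Linear(10, 2)          # stand-in for the audio-visual network
optimizer = torch.optim.Adam(model.parameters(), lr=1e-3)
criterion = nn.MSELoss()
accum_steps = 4                   # e.g. --batchSize 32 run as 4 chunks of 8

optimizer.zero_grad()
for i in range(8):                # stand-in for iterating the data loader
    x, y = torch.randn(8, 10), torch.randn(8, 2)
    loss = criterion(model(x), y) / accum_steps  # scale so gradients average
    loss.backward()                              # grads accumulate across chunks
    if (i + 1) % accum_steps == 0:
        optimizer.step()          # one optimizer step per effective batch
        optimizer.zero_grad()
```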
- Use the following command to test your trained mono2binaural model:

```bash
python3 demo.py --input_audio_path /BINAURAL_AUDIO_PATH --video_frame_path /VIDEO_FRAME_PATH --weights_visual /VISUAL_MODEL_PATH --weights_audio /AUDIO_MODEL_PATH --output_dir_root /YOUR_OUTPUT_DIR/ --input_audio_length 10 --hop_size 0.05 --model MODEL_NAME --use_visual_info
```
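`demo.py` reconstructs a long recording by sliding a short window over the mono input with stride `--hop_size`, predicting a binaural chunk per window, and averaging the overlapping predictions. A minimal sketch of that overlap-and-average step, where the window length, sample rate, and model call are stand-ins:

```python
import numpy as np

def overlap_average(mono, predict, sr=16000, win_s=0.63, hop_s=0.05):
    """Predict a 2-channel chunk per sliding window and average the overlaps."""
    win, hop = int(win_s * sr), int(hop_s * sr)
    out = np.zeros((2, len(mono)))
    count = np.zeros(len(mono))
    for start in range(0, len(mono) - win + 1, hop):
        out[:, start:start + win] += predict(mono[start:start + win])
        count[start:start + win] += 1
    return out / np.maximum(count, 1)  # average where windows overlapped

# Toy usage: a "model" that just duplicates the mono channel.
binaural = overlap_average(np.random.randn(16000 * 10),
                           predict=lambda chunk: np.stack([chunk, chunk]))
```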
- Use the following command for evaluation:

```bash
python3 evaluate.py --results_root /YOUR_RESULTS --normalization True
```
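`evaluate.py` presumably reports the two metrics used by 2.5D Visual Sound, the STFT distance and the envelope distance between the predicted and ground-truth binaural channels. A minimal sketch of both (the function names are mine, and whether `--normalization` rescales the signals beforehand is an assumption):

```python
import numpy as np
import librosa
from scipy.signal import hilbert

def stft_distance(pred, gt):
    """L2 distance between the complex STFTs of the two binaural channels."""
    return sum(np.linalg.norm(librosa.stft(gt[c]) - librosa.stft(pred[c]))
               for c in range(2))

def envelope_distance(pred, gt):
    """L2 distance between the Hilbert envelopes of the two channels."""
    return sum(np.linalg.norm(np.abs(hilbert(gt[c])) - np.abs(hilbert(pred[c])))
               for c in range(2))
```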
This code is mainly based on 2.5D Visual Sound.
The Conv-TasNet implementation is based on Demucs.
The code is CC BY 4.0 licensed, as found in the LICENSE file.