VocSynth: Neural Singing Synthesizer

Introduction

This is a neural network based singing synthesizer, heavily inspired by the "A Neural Parametric Singing Synthesizer Modeling Timbre and Expression from Natural Songs" paper (https://arxiv.org/abs/1704.03809).

This project was implemented using TensorFlow, and unlike the original paper, it is implemented such that it can be trained and used with normal isolated vocal tracks that are readily available instead of manually recorded and annotated tracks. The project also utilizes the Penn Phonetics Lab Forced Aligner (https://github.com/jaekookang/p2fa_py3) as well to enable this.

The training phase requires isolated vocal tracks along with the lyrics of each track as inputs. During inference, an isolated vocal track of a different singer and the corresponding lyrics must be given as inputs, resulting in replacing the vocals with the trained singer's voice and singing style. As an additional input, the alternative musical notes can be given to alter the pitch of the vocals as desired.

There are three models trained and used here.

Harmonic/Spectral Model: This is used to generate the Harmonic Spectral Envelope of the output.
Aperiodic Model: This is used to generate the Aperiodic Spectral Envelope of the output.
Frequency Model: This is used to generate the Frequency of the output.

The above three outputs are used by the Vocoder to generate the final audio output.

Requirements and installation

The project uses Python 3.6.8. The libraries required can be installed by the following command in the project directory.

$ pip install -r requirements.txt

There are some additional requirements as well to run this program.

Installing HTK
Installing Sox
Installing mysys (For Windows only)

The guide for installing the above requirements on Linux/MacOS was directly extracted from here. The Windows installation is more complicated and the guide is included below.

Installing requirements on Linux/MacOS

First, you need to download HTK source code (http://htk.eng.cam.ac.uk/). This HTK installation guide is retrieved from Link. Installation is based on macOS Sierra.

Unzip HTK-3.4.1.tar.gz file

$ tar -xvf HTK-3.4.1.tar.gz

After extracting the tar file, switch to htk directory.

$ cd htk

Compile HTK in the htk directory.

$ export CPPFLAGS=-UPHNALG
$ ./configure --disable-hlmtools --disable-hslab
$ make clean    # necessary if you're not starting from scratch
$ make -j4 all
$ sudo make -j4 install

Note: For macOS, you may need to follow these steps before compiling HTK:

# Add CPPFLAGS
$ export CPPFLAGS=-I/opt/X11/include

# If the above doesn't work, do 
$ ln -s /opt/X11/include/X11 /usr/local/include/X11

# Replace line 21 (#include <malloc.h>) of HTKLib/strarr.c as below
#   include <malloc/malloc.h> 

# Replace line 1650 (labid != splabid) of HTKLib/HRec.c as below
#   labpr != splabid
# This step will prevent "ERROR [+8522] LatFromPaths: Align have dur<=0"
# See: https://speechtechie.wordpress.com/2009/06/12/using-htk-3-4-1-on-mac-os-10-5/

# Compile with options if necessary
$ ./configure
$ make all
$ maek install

The following can be used to install Sox.

$ sudo apt-get install sox

# or in Arch

$ sudo pacman -S sox

# or using brew

$ brew install sox

Windows installation

HTK installation

As a prerequisite, Microsoft Visual Studio is required to be installed in order to build the HTK source. This guide was written based on Visual Studio 2019 Community Edition.
Register for a free account to obtain a license using this link: (http://htk.eng.cam.ac.uk/register.shtml)
Download the Windows source code using here: (http://htk.eng.cam.ac.uk/download.shtml) and extract it.
Open Visual studio, and go to Tools-> Command Line -> Developer Command prompt.
Using the above terminal carry out the steps 4, 5, 7 and 8 (skip 6) in the following link: (http://htk.eng.cam.ac.uk/docs/inst-win.shtml). The steps are included below as well.

# cd into the HTK directory

cd htk

mkdir bin.win32

cd HTKLib
nmake /f htk_htklib_nt.mkf all
cd ..

cd HTKTools
nmake /f htk_htktools_nt.mkf all
cd ..
cd HLMLib
nmake /f htk_hlmlib_nt.mkf all
cd ..
cd HLMTools
nmake /f htk_hlmtools_nt.mkf all
cd ..

Add the above bin.win32 folder to the PATH.

Sox installation

Download and install using the executable installer available here: (https://sourceforge.net/projects/sox/files/latest/download)
Add the installed directory to the PATH

MSYS installation

Download and install MSYS2 using the steps here: (https://www.msys2.org/)
Add the MSYS2-installation-directory/usr/bin to PATH.

How to use

This program can be run using a command line or the GUI.

The input for training the model must be put in the Dataset folder and must follow the guidelines included in the Dataset/README.md

The args.py file contains the setting and preferences for the program and the model.

Using the command line

The program can be simply run by running the main.py file.

$ python main.py

There are multiple options and settings that are included in the args.py file. These options can be set by either including them in the command line when executing main.py or by modifying the default values for each setting in the args.py file. Some of basic settings are included below.

--model_name and --output_name for the names to be used when saving the models and the final output.
--load_data: Once the data is initially read and pre-processed, it is saved in the ProcessedData folder. Enabling this option makes sure that the training data is directly loaded from here instead of reading and pre-processing from the beginning.
--index_name: The name of the index file used for both training and generation (without the extension).
--index_type: The file type (extension) of the index files used. Can be xlsx, xls or csv, and both the training and generation index file type must be the same.
--sp_train --ap_train --f_train : Enabling these make sre the Spectral, Aperiodic and Frequency models are trained respectively. Set any to False if you want to skip training them or if only inference/generation is required.
--sp_cont --ap_cont --f_cont: If a model was already trained (available in the TrainedModels folder), by setting these to True, we can continue training them further. If set to False, the already existing Models will be overwritten and trained from the beginning.
--f_use: During generation, using the frequency model and its output can be skipped. The direct frequency from the input can be used if this option is set to False. However for more matching and natural results it is recommended to use the frequency model as well during inference.
--f_custom: During generation, using the frequency model custom musical notes can be provided to change the tune of the output (refer to the README in the Output folder for more details).
--f_de_tune: Each vocal model trained has a vocal range (pitch range) depending on the singer or the available training data. When this option is enabled the key (pitch) of the output is changed to suit the model better.
--f_smooth: During generation, using the frequency model output may have a noisy/variable output. By smoothing the output noise can be reduced to match the input notes (frequency of the notes). Set the value to 0 for no smoothing, and a value greater than 1 for smoothing (higher the value, more smoothing carried out).
All the other options are related to th pre-processing and the model. Changing these from the default value may cause some unexpected behaviour.

Using the GUI

Simply run the interface.py python file to start the GUI.

$ python interface.py

The GUI can be easily used to train and use the model in an interactive manner. This can be used from the very first step of pre-processing the dataset to training the model to finally generating using that model. The following are some preview screenshots of the GUI.

References

The Neural Parametric Singing Synthesizer paper (https://arxiv.org/abs/1704.03809).
The torch NPSS implementation by @seaniezhao (https://github.com/seaniezhao/torch_npss)
Penn Phonetics Lab Forced Aligner by @jaekookang (https://github.com/jaekookang/p2fa_py3)

Name		Name	Last commit message	Last commit date
Latest commit History 16 Commits
Dataset		Dataset
Output		Output
ProcessedData		ProcessedData
Resources		Resources
TrainedModels		TrainedModels
p2fa		p2fa
.gitignore		.gitignore
README.md		README.md
args.py		args.py
data_handler.py		data_handler.py
extract_features.py		extract_features.py
frequency_tools.py		frequency_tools.py
interface.kv		interface.kv
interface.py		interface.py
interface_tools.py		interface_tools.py
main.py		main.py
model.py		model.py
model_utils.py		model_utils.py
pre_process_data.py		pre_process_data.py
read_data.py		read_data.py
requirements.txt		requirements.txt
synthesize.py		synthesize.py

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Repository files navigation

VocSynth: Neural Singing Synthesizer

Introduction

Requirements and installation

Installing requirements on Linux/MacOS

Windows installation

HTK installation

Sox installation

MSYS installation

How to use

Using the command line

Using the GUI

References

About

Releases

Packages

Languages

daksithj/vocsynth-neural-singing-synthesizer

Folders and files

Latest commit

History

Repository files navigation

VocSynth: Neural Singing Synthesizer

Introduction

Requirements and installation

Installing requirements on Linux/MacOS

Windows installation

HTK installation

Sox installation

MSYS installation

How to use

Using the command line

Using the GUI

References

About

Resources

Stars

Watchers

Forks

Releases

Packages 0

Languages

Packages