
HKU-BAL/Clairvoyante-PyTroch


Clairvoyante-pt

PyTorch version of Clairvoyante.

The main file is clairvoyante/clairvoyante_v3_pytorch.py, which contains the code for the PyTorch model. It exposes the same API as the TensorFlow Clairvoyante model in https://github.com/aquaskyline/Clairvoyante/blob/rbDev/clairvoyante/clairvoyante_v3.py.

The code initialises Clairvoyante with three convolutional layers, two hidden fully connected layers and four output layers. It defines the parameters of these layers and initialises the network's weights using He initialisation.
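The layout described above can be sketched in PyTorch as follows. This is a hypothetical illustration, not the code in clairvoyante_v3_pytorch.py: the class name SketchNet, the channel counts, kernel sizes, hidden widths, the ReLU activations, the input shape (N, 4, 33, 4) and the four head sizes are all assumptions chosen for the example. What it does show is He (Kaiming) initialisation applied to every convolutional and fully connected layer through torch.nn.init.kaiming_normal_.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F


def he_init(module: nn.Module) -> None:
    """He (Kaiming) normal initialisation for conv and linear layers; biases set to zero."""
    if isinstance(module, (nn.Conv2d, nn.Linear)):
        nn.init.kaiming_normal_(module.weight, mode="fan_in", nonlinearity="relu")
        nn.init.zeros_(module.bias)


class SketchNet(nn.Module):
    """Hypothetical 3-conv / 2-FC / 4-head network; all sizes are illustrative only."""

    def __init__(self, in_channels=4, height=33, width=4, head_sizes=(4, 2, 4, 6)):
        super().__init__()
        # Three convolutional layers; 3x3 kernels with padding=1 keep H and W unchanged.
        self.conv1 = nn.Conv2d(in_channels, 16, kernel_size=3, padding=1)
        self.conv2 = nn.Conv2d(16, 32, kernel_size=3, padding=1)
        self.conv3 = nn.Conv2d(32, 48, kernel_size=3, padding=1)
        # Two hidden fully connected layers.
        self.fc1 = nn.Linear(48 * height * width, 336)
        self.fc2 = nn.Linear(336, 168)
        # Four output layers, one per prediction task (head sizes are hypothetical).
        self.heads = nn.ModuleList([nn.Linear(168, n) for n in head_sizes])
        # Apply He initialisation recursively to every submodule.
        self.apply(he_init)

    def forward(self, x: torch.Tensor):
        # x is expected in NCHW layout: (batch, channels, height, width).
        x = F.relu(self.conv1(x))
        x = F.relu(self.conv2(x))
        x = F.relu(self.conv3(x))
        x = torch.flatten(x, start_dim=1)
        x = F.relu(self.fc1(x))
        x = F.relu(self.fc2(x))
        # Raw logits from each head; output activations and losses are applied elsewhere.
        return [head(x) for head in self.heads]


if __name__ == "__main__":
    net = SketchNet()
    outputs = net(torch.zeros(2, 4, 33, 4))
    print([tuple(o.shape) for o in outputs])  # [(2, 4), (2, 2), (2, 4), (2, 6)]
```

Zeroing the biases and scaling weights by sqrt(2 / fan_in) is the standard pairing for ReLU-style activations; if the real model uses a different activation, the nonlinearity argument would change accordingly.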

PyTorch expects convolutional inputs in NCHW format (batch, channels, height, width), whereas TensorFlow uses NHWC by default, so all input tensors must be permuted to NCHW before they are passed to the model.
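A minimal sketch of that permutation, assuming the input batch arrives in TensorFlow's default NHWC layout. The helper name to_nchw and the example shape (N, 33, 4, 4) are assumptions for illustration, not taken from the repository.

```python
import torch


def to_nchw(batch) -> torch.Tensor:
    """Convert an NHWC batch (NumPy array or tensor) to a contiguous NCHW float tensor."""
    x = torch.as_tensor(batch, dtype=torch.float32)
    if x.dim() != 4:
        raise ValueError(f"expected a 4-D NHWC batch, got shape {tuple(x.shape)}")
    # (N, H, W, C) -> (N, C, H, W); contiguous() copies the data into the new memory order.
    return x.permute(0, 3, 1, 2).contiguous()


if __name__ == "__main__":
    nhwc = torch.rand(8, 33, 4, 4)  # hypothetical batch: 8 samples, 33 x 4 grid, 4 channels
    nchw = to_nchw(nhwc)
    print(tuple(nchw.shape))        # (8, 4, 33, 4)
```

Calling contiguous() is optional for most layers, but it avoids surprises with operations such as view() that require contiguous memory.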

Dependencies

Install PyTorch on top of the dependencies and folder structure listed in https://github.com/aquaskyline/Clairvoyante:

pip install torch torchvision

How to use the module

Initialise the model as m in the run function of train.py and callVar.py using

m = {module name}.Net()

Add

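import torch

# Use the first visible GPU if there is one, otherwise fall back to the CPU.
device = torch.device('cuda:0' if torch.cuda.is_available() else 'cpu')
if torch.cuda.device_count() > 0:
    m.to(device)  # move the model's parameters and buffers to the GPU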

to the run function in train.py and callVar.py after initialising the model to use one or more GPUs.
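The snippet above places the model on a single device (cuda:0). If the model is not already wrapped for multi-GPU use, one common option is torch.nn.DataParallel, which replicates the model on every visible GPU and splits each input batch across them. The sketch below is an assumption about how that could be added, not the repository's code; the helper name place_model is hypothetical, and input batches must be moved to the same device before the forward pass.

```python
import torch
import torch.nn as nn


def place_model(model: nn.Module):
    """Move a model to the best available device, wrapping it in DataParallel if >1 GPU is visible."""
    device = torch.device("cuda:0" if torch.cuda.is_available() else "cpu")
    model = model.to(device)
    if torch.cuda.device_count() > 1:
        # Replicates the model on each visible GPU and splits every batch along dim 0.
        model = nn.DataParallel(model)
    return model, device


if __name__ == "__main__":
    m, device = place_model(nn.Linear(16, 4))  # stand-in for the Clairvoyante model
    x = torch.rand(32, 16).to(device)          # inputs must live on the same device as the model
    print(tuple(m(x).shape), device)
```

One consequence of wrapping: the state_dict keys of a DataParallel model are prefixed with "module.", so saving m.module.state_dict() keeps checkpoints loadable without the wrapper.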

GPU

Use the CUDA_VISIBLE_DEVICES environment variable to specify which GPUs to use. This can be done with the command export CUDA_VISIBLE_DEVICES="$i", where $i is the zero-based index of the GPU to be used; a comma-separated list such as "0,1" selects several GPUs. The code supports GPU parallelism. If no GPU is visible, the CPU is used instead.
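The same selection can be made from a Python launcher, provided the variable is set before PyTorch initialises CUDA in the process; setting it before importing torch is the safest choice. This is a hedged sketch, and the helper name select_gpus is hypothetical.

```python
import os


def select_gpus(gpu_ids):
    """Restrict this process to the given GPU indices via CUDA_VISIBLE_DEVICES.

    Must run before torch initialises CUDA (ideally before `import torch`),
    otherwise the setting has no effect on the current process.
    An empty list hides every GPU, so PyTorch falls back to the CPU.
    """
    os.environ["CUDA_VISIBLE_DEVICES"] = ",".join(str(i) for i in gpu_ids)
    return os.environ["CUDA_VISIBLE_DEVICES"]


if __name__ == "__main__":
    select_gpus([0, 1])  # same effect as: export CUDA_VISIBLE_DEVICES="0,1"
    import torch         # imported only after the variable is set, on purpose
    device = torch.device("cuda:0" if torch.cuda.is_available() else "cpu")
    print(f"visible GPUs: {torch.cuda.device_count()}, using {device}")
```

Inside the process, the visible GPUs are renumbered from 0, so with CUDA_VISIBLE_DEVICES="1" the selected card is addressed as cuda:0.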

Folder Structure and Program Descriptions

clairvoyante/ Contains the PyTorch model.
clairvoyante_v3_pytorch.py PyTorch model of Clairvoyante.
clairvoyante_v3_pytorch_test.py Unit tests for the PyTorch model's loss function.
correctVCFs/ Contains the VCFs produced by TensorFlow Clairvoyante and VCFs from the training and testing data sets.
basic_luo_chr21.vcf VCF produced by CallVar using the model produced by demoRun.py.
correct_21.vcf chr21.vcf in the testingData folder.
luo_bam_21.vcf VCF produced by CallVarBam using fullv3-illumina-novoalign-hg001+hg002-hg38/learningRate1e-3.epoch500.
luo_tensor_can_21.vcf VCF produced by CallVar using fullv3-illumina-novoalign-hg001+hg002-hg38/learningRate1e3.epoch500.
ngmlr1_chr19.vcf VCF produced by CallVarBam using fullv3-ont-ngmlr-hg001-hg19.
evalResults/ Each folder contains the results of a different vcf-eval run. The results are in summary/summary.txt within each folder.
TrainBamCPU_chr21/ Comparison between correct_21.vcf and the VCFs made by train.py and CallVarBam.py. (Used in presentation)
basicLuo_correct/ Comparison between VCFs made by train.py and correct_21.vcf. (Used in presentation)
correct_bam/ Comparison between VCFs made by fullv3-illumina-novoalign-hg001+hg002-hg38/learningRate1e3.epoch500 using CallVarBam.py and correct_21.vcf.
luo_correct/ Comparison between VCFs made by fullv3-illumina-novoalign-hg001+hg002-hg38/learningRate1e3.epoch500 using CallVar.py and correct_21.vcf.
ngmlr1_chr19/ Comparison between VCFs made by fullv3-ont-ngmlr-hg001-hg19 using CallVarBam.py and /nas7/yswong/base/hg19_chr19.vcf.gz. (Used in presentation)
trainAll2_chr19/ Second comparison between VCFs produced by CallVarBam using fullv3-ont-ngmlr-hg001-hg19 and /nas7/yswong/base/hg19_chr19.vcf.gz using the GTX 980. (Used in presentation)
trainAll3_chr19/ Comparison between VCFs produced by CallVarBam.py using fullv3-ont-ngmlr-hg001-hg19 and /nas7/yswong/base/hg19_chr19.vcf.gz using the GTX Titan and GTX 1080 Ti with a training batch size of 5000.
trainAll4_chr19/ Comparison between VCFs produced by CallVarBam.py using fullv3-ont-ngmlr-hg001-hg19 and /nas7/yswong/base/hg19_chr19.vcf.gz using the GTX Titan and GTX 1080 Ti with a training batch size of 10000.
trainAll_correct/ Comparison between VCFs produced by CallVarBam.py using fullv3-ont-ngmlr-hg001-hg19 and /nas7/yswong/base/hg19_chr19.vcf.gz using the GTX 980.
pytorchModels/ Each folder is one training experiment and contains the output of that training run; some also contain the model parameters stored in a txt file. All models were trained on /nas7/yswong/trainingData/tensor_all.bin.
trainAll/ Model produced by training using the GTX 980.
trainAll2/ Model produced after training a second time using the GTX 980.
trainAll3_5000PGPU/ Model produced after training using the GTX 1080 Ti and GTX Titan using a training batch size of 5000.
trainAll4_10000PGPU/ Model produced after training using the GTX 1080 Ti and GTX Titan using a training batch size of 10000.
trainAll5_1080Ti/ Output produced after training using the GTX 1080 Ti.
trainAll6_Titan/ Output produced after training using the GTX Titan.
trainAll7_2_1080_Ti/ Output produced after training using two GTX 1080 Ti.
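To compare the runs in evalResults/ side by side, the summary files can be collected with a short script. This sketch assumes each summary/summary.txt follows the RTG Tools vcfeval layout: a whitespace-separated header line beginning with Threshold, a dashed separator, and one row per score threshold, including a row whose threshold is None. If the files differ from that layout, the parsing would need adjusting; the function names are hypothetical.

```python
from pathlib import Path


def parse_vcfeval_summary(text: str) -> dict:
    """Return the 'None'-threshold row (else the last row) of a vcfeval-style summary."""
    lines = [ln.split() for ln in text.splitlines() if ln.strip()]
    header_idx = next((i for i, cols in enumerate(lines) if cols[0] == "Threshold"), None)
    if header_idx is None:
        raise ValueError("no 'Threshold' header line found")
    header = lines[header_idx]
    # Keep only rows with one value per column; this drops the dashed separator line.
    rows = [cols for cols in lines[header_idx + 1:] if len(cols) == len(header)]
    if not rows:
        raise ValueError("no data rows found")
    # Prefer the row with no score threshold applied (threshold 'None').
    chosen = next((r for r in rows if r[0] == "None"), rows[-1])
    return dict(zip(header, chosen))


def collect(eval_root: str = "evalResults") -> dict:
    """Map each folder under eval_root to its parsed summary/summary.txt."""
    results = {}
    for summary in sorted(Path(eval_root).glob("*/summary/summary.txt")):
        results[summary.parent.parent.name] = parse_vcfeval_summary(summary.read_text())
    return results


if __name__ == "__main__":
    for name, row in collect().items():
        print(name, row.get("Precision"), row.get("Sensitivity"), row.get("F-measure"))
```

Values are returned as strings so that nothing is lost in conversion; cast them with float() when sorting or plotting.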