* Clone the repo
$ git clone https://github.com/JDACS4C-IMPROVE/GraphDRP
* Create env
$ conda_env_py37.sh
* Generate required datasets:
choice:
0: create mixed test dataset
1: create saliency map dataset
2: create blind drug dataset
3: create blind cell dataset
$ python preprocess.py --choice 0
* Train model
$ python training.py --model 0 --train_batch 1024 --val_batch 1024 --test_batch 1024 --lr 0.0001 --num_epoch 300 --log_interval 20 --cuda_name "cuda:0" --set drug
* Saliency
$ python saliency_map.py --model 0 --num_feature 10 --processed_data_file "data/processed/GDSC_bortezomib.pt" --model_file "model_GINConvNet_GDSC.model" --cuda_name "cuda:0"
--------
Run
--------
# Preprocess data (create datasets)
# choice: 0: create mixed test dataset, 1: create saliency map dataset, 2: create blind drug dataset, 3: create blind cell dataset
python preprocess.py --choice 0
python preprocess.py --choice 1
python preprocess.py --choice 2
python preprocess.py --choice 3
# Training mixed test experiment
python training.py --model 0 --train_batch 1024 --val_batch 1024 --test_batch 1024 --lr 0.0001 --num_epoch 300 --log_interval 20 --cuda_name "cuda:0"
python training.py --model 1 --train_batch 1024 --val_batch 1024 --test_batch 1024 --lr 0.0001 --num_epoch 300 --log_interval 20 --cuda_name "cuda:0"
python training.py --model 2 --train_batch 1024 --val_batch 1024 --test_batch 1024 --lr 0.0001 --num_epoch 300 --log_interval 20 --cuda_name "cuda:0"
python training.py --model 3 --train_batch 1024 --val_batch 1024 --test_batch 1024 --lr 0.0001 --num_epoch 300 --log_interval 20 --cuda_name "cuda:0"
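The preprocessing and training commands above differ only in --choice and --model, so they can be generated instead of typed out. A minimal sketch, assuming the flags are exactly those listed above; the helper names (preprocess_commands, training_commands, format_command) are made up for illustration and are not part of the repo. It only prints the commands; it does not run them.

```python
import shlex

# Flags copied from the training commands in these notes.
COMMON_FLAGS = [
    "--train_batch", "1024",
    "--val_batch", "1024",
    "--test_batch", "1024",
    "--lr", "0.0001",
    "--num_epoch", "300",
    "--log_interval", "20",
    "--cuda_name", "cuda:0",
]


def preprocess_commands(choices=range(4)):
    """One preprocess.py command per dataset choice (0-3)."""
    return [["python", "preprocess.py", "--choice", str(c)] for c in choices]


def training_commands(models=range(4)):
    """One training.py command per model index (0-3)."""
    return [["python", "training.py", "--model", str(m)] + COMMON_FLAGS
            for m in models]


def format_command(cmd):
    """Join an argument list into a shell-safe command line."""
    return " ".join(shlex.quote(arg) for arg in cmd)


if __name__ == "__main__":
    for cmd in preprocess_commands() + training_commands():
        print(format_command(cmd))
```

To actually launch a command, each argument list could be passed to subprocess.run(cmd, check=True) from the repo root.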
------------------
Data preprocessing
------------------
preprocess.py was adapted from tCNNS (https://github.com/Lowpassfilter/tCNNS-Project/blob/master/data/preprocess.py).
* PANCANCER_Genetic_feature.csv
https://www.cancerrxgene.org/downloads/genetic_features
The "genetic_feature" column contains either a mutation, suffixed with "_mut", or a CNA, prefixed with "cna_"
* PANCANCER_IC.csv
https://www.cancerrxgene.org/downloads/drug_data
Click on Download to get response data
GDSC1 and GDSC2 provide different files
* Cell_list.csv
* Druglist.csv --> 265 drugs
CSV file downloaded from https://www.cancerrxgene.org/downloads/drug_data (click on CSV, not Download)
GDSC1 and GDSC2 provide different files (the paper uses only one file; which one?)
* drug_smiles.csv --> 223 drugs
Generated by func preprocess.py/download_smiles() to contain drugs from Druglist and their SMILES
* pychem_cid.csv and unknow_drug_by_pychem.csv
Generated by func preprocess.py/write_drug_cid()
pychem_cid: molecules retrieved from PubChem
unknow_drug_by_pychem: molecules not found in PubChem
* small_molecule.csv
Downloaded from http://lincs.hms.harvard.edu/db/sm/
This dataset was downloaded in order to find molecules that are present in Druglist but were not retrieved from PubChem
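The naming rule for the "genetic_feature" column in PANCANCER_Genetic_feature.csv ("_mut" suffix for mutations, "cna_" prefix for CNAs) can be checked with a few lines of Python. This is only a sketch: classify_genetic_feature and count_feature_kinds are hypothetical helpers, not functions from preprocess.py, and the feature names in the usage note are illustrative values rather than entries verified against the file.

```python
from collections import Counter


def classify_genetic_feature(name):
    """Return 'cna', 'mutation', or 'other' for one genetic_feature value."""
    if name.startswith("cna_"):
        return "cna"
    if name.endswith("_mut"):
        return "mutation"
    return "other"


def count_feature_kinds(names):
    """Count how many feature names fall into each kind."""
    return Counter(classify_genetic_feature(n) for n in names)
```

For example, count_feature_kinds(["TP53_mut", "cna_pancan1", "KRAS_mut"]) counts two mutations and one CNA. Any value reported as 'other' would indicate a name that does not follow the rule described above.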
--------
Issues
--------
Could not find create_data.py
That becomes a problem when using saliency_map.py
---------------
Learning curves
---------------
Created lc_prep.py from preprocess.py and modified it as needed to generate data for learning curves.
python lc_prep.py
./lc_batch.sh
---------------
CANDLE
---------------
---------------
CSG
---------------
* Prepare raw data using either cross_study_gen or IMP_data
* ./frm_preprocess.sh