-
Notifications
You must be signed in to change notification settings - Fork 0
/
Copy pathATACseq_init
3398 lines (2775 loc) · 168 KB
/
ATACseq_init
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
93
94
95
96
97
98
99
100
101
102
103
104
105
106
107
108
109
110
111
112
113
114
115
116
117
118
119
120
121
122
123
124
125
126
127
128
129
130
131
132
133
134
135
136
137
138
139
140
141
142
143
144
145
146
147
148
149
150
151
152
153
154
155
156
157
158
159
160
161
162
163
164
165
166
167
168
169
170
171
172
173
174
175
176
177
178
179
180
181
182
183
184
185
186
187
188
189
190
191
192
193
194
195
196
197
198
199
200
201
202
203
204
205
206
207
208
209
210
211
212
213
214
215
216
217
218
219
220
221
222
223
224
225
226
227
228
229
230
231
232
233
234
235
236
237
238
239
240
241
242
243
244
245
246
247
248
249
250
251
252
253
254
255
256
257
258
259
260
261
262
263
264
265
266
267
268
269
270
271
272
273
274
275
276
277
278
279
280
281
282
283
284
285
286
287
288
289
290
291
292
293
294
295
296
297
298
299
300
301
302
303
304
305
306
307
308
309
310
311
312
313
314
315
316
317
318
319
320
321
322
323
324
325
326
327
328
329
330
331
332
333
334
335
336
337
338
339
340
341
342
343
344
345
346
347
348
349
350
351
352
353
354
355
356
357
358
359
360
361
362
363
364
365
366
367
368
369
370
371
372
373
374
375
376
377
378
379
380
381
382
383
384
385
386
387
388
389
390
391
392
393
394
395
396
397
398
399
400
401
402
403
404
405
406
407
408
409
410
411
412
413
414
415
416
417
418
419
420
421
422
423
424
425
426
427
428
429
430
431
432
433
434
435
436
437
438
439
440
441
442
443
444
445
446
447
448
449
450
451
452
453
454
455
456
457
458
459
460
461
462
463
464
465
466
467
468
469
470
471
472
473
474
475
476
477
478
479
480
481
482
483
484
485
486
487
488
489
490
491
492
493
494
495
496
497
498
499
500
501
502
503
504
505
506
507
508
509
510
511
512
513
514
515
516
517
518
519
520
521
522
523
524
525
526
527
528
529
530
531
532
533
534
535
536
537
538
539
540
541
542
543
544
545
546
547
548
549
550
551
552
553
554
555
556
557
558
559
560
561
562
563
564
565
566
567
568
569
570
571
572
573
574
575
576
577
578
579
580
581
582
583
584
585
586
587
588
589
590
591
592
593
594
595
596
597
598
599
600
601
602
603
604
605
606
607
608
609
610
611
612
613
614
615
616
617
618
619
620
621
622
623
624
625
626
627
628
629
630
631
632
633
634
635
636
637
638
639
640
641
642
643
644
645
646
647
648
649
650
651
652
653
654
655
656
657
658
659
660
661
662
663
664
665
666
667
668
669
670
671
672
673
674
675
676
677
678
679
680
681
682
683
684
685
686
687
688
689
690
691
692
693
694
695
696
697
698
699
700
701
702
703
704
705
706
707
708
709
710
711
712
713
714
715
716
717
718
719
720
721
722
723
724
725
726
727
728
729
730
731
732
733
734
735
736
737
738
739
740
741
742
743
744
745
746
747
748
749
750
751
752
753
754
755
756
757
758
759
760
761
762
763
764
765
766
767
768
769
770
771
772
773
774
775
776
777
778
779
780
781
782
783
784
785
786
787
788
789
790
791
792
793
794
795
796
797
798
799
800
801
802
803
804
805
806
807
808
809
810
811
812
813
814
815
816
817
818
819
820
821
822
823
824
825
826
827
828
829
830
831
832
833
834
835
836
837
838
839
840
841
842
843
844
845
846
847
848
849
850
851
852
853
854
855
856
857
858
859
860
861
862
863
864
865
866
867
868
869
870
871
872
873
874
875
876
877
878
879
880
881
882
883
884
885
886
887
888
889
890
891
892
893
894
895
896
897
898
899
900
901
902
903
904
905
906
907
908
909
910
911
912
913
914
915
916
917
918
919
920
921
922
923
924
925
926
927
928
929
930
931
932
933
934
935
936
937
938
939
940
941
942
943
944
945
946
947
948
949
950
951
952
953
954
955
956
957
958
959
960
961
962
963
964
965
966
967
968
969
970
971
972
973
974
975
976
977
978
979
980
981
982
983
984
985
986
987
988
989
990
991
992
993
994
995
996
997
998
999
1000
LFY ATACseq Analysis
MONDAY, 10/15/2018
1.
2.
source ~/custom_scripts/removeChrFromSam.sh 6hrDexA.sam ChrC ChrMMove data from BaseSpace to the server - Create Script to join samples from the four lanes
basemount /data/home/runjin/BaseSpace_Mount/
# make directories necessary for script
mkdir scripts
mkdir log_files
mkdir fastq
ls /home/runjin/BaseSpace_Mount/Projects/Wagnerlab_10062018/Samples/*/Files/*R1* | awk -F"/" '{gsub(" ","\\ ",$0)} NR%4==1{printf "cat "$0" "}NR%4==2||NR%4==3{printf $0" "}NR%4==0{split($NF,a,"_"); printf $0"> fastq/"a[1]"_R1.fastq.gz\n"}'| sed -e "s/fastq\/42/fastq\//g" > scripts/rename_fastq_files_R1.sh
ls /home/runjin/BaseSpace_Mount/Projects/Wagnerlab_10062018/Samples/*/Files/*R2* | awk -F"/" '{gsub(" ","\\ ",$0)} NR%4==1{printf "cat "$0" "}NR%4==2||NR%4==3{printf $0" "}NR%4==0{split($NF,a,"_"); printf $0"> fastq/"a[1]"_R2.fastq.gz\n"}'| sed -e "s/fastq\/42/fastq\//g" > scripts/rename_fastq_files_R2.sh
chmod 755 scripts/rename_fastq_files_R1.sh
chmod 755 scripts/rename_fastq_files_R2.sh
qlogin
nohup ./scripts/rename_fastq_files_R1.sh > log_files/rename_fastq_files_R1.log 2>&1 &
nohup ./scripts/rename_fastq_files_R2.sh > log_files/rename_fastq_files_R2.log 2>&1 &
2. Working directory /home/runjin/LFY
TUESDAY, 10/16/2018×
●
Install Trimmomatic
curl -O http://www.usadellab.org/cms/uploads/supplementary/Trimmomatic/Trimmomatic-0.38.zip
●
Sammy's custom scripts
ls /home/wagner-lab/sklasfeld/custom_scripts/
cp /home/wagner-lab/sklasfeld/custom_scripts/name_truSeq-SingleIdx_adapters.py ~/sammy_custom_scripts/
●
Moved Sammy's name script into sammy's custom script
●
And then use the script to find full adapters
python ~/sammy_custom_scripts/name_truSeq-SingleIdx_adapters.py adapters/index.txt > adapters/adapters.txt
●
Run FASTQC on raw files:
awk '{print $1"_adapter\t"$3}' adapters/adapters.txt > adapters/adapterList4FASTQC.txt
mkdir FASTQC
mkdir FASTQC/raw
chmod 755 ~/Downloads/FastQC/fastqc
nohup ~/Downloads/FastQC/fastqc -a adapters/adapterList4FASTQC.txt -o FASTQC/raw fastq/ATAC-24hrDexA_R1.fastq.gz fastq/ATAC-24hrDexA_R2.fastq.gz fastq/ATAC-24hrDexA-Spikein_R1.fastq.gz fastq/ATAC-24hrDexA-Spikein_R2.fastq.gz fastq/ATAC-24hrDexB_R1.fastq.gz fastq/ATAC-24hrDexB_R2.fastq.gz fastq/ATAC-24hrDexB-Spikein_R1.fastq.gz fastq/ATAC-24hrDexB-Spikein_R2.fastq.gz fastq/ATAC-24hrMockA_R1.fastq.gz fastq/ATAC-24hrMockA_R2.fastq.gz fastq/ATAC-24hrMockA-Spikein_R1.fastq.gz fastq/ATAC-24hrMockA-Spikein_R2.fastq.gz fastq/ATAC-24hrMockB_R1.fastq.gz fastq/ATAC-24hrMockB_R2.fastq.gz fastq/ATAC-24hrMockB-Spikein_R1.fastq.gz fastq/ATAC-24hrMockB-Spikein_R2.fastq.gz fastq/ATAC-6hrDexA_R1.fastq.gz fastq/ATAC-6hrDexA_R2.fastq.gz fastq/ATAC-6hrDexB_R1.fastq.gz fastq/ATAC-6hrDexB_R2.fastq.gz fastq/ATAC6hrMockA_R1.fastq.gz fastq/ATAC6hrMockA_R2.fastq.gz fastq/ATAC-6hrMockB_R1.fastq.gz fastq/ATAC-6hrMockB_R2.fastq.gz fastq/MNase-HM1_R1.fastq.gz fastq/MNase-HM1_R2.fastq.gz fastq/MNase-HM2_R1.fastq.gz fastq/MNase-HM2_R2.fastq.gz fastq/MNase-LM1_R1.fastq.gz fastq/MNase-LM1_R2.fastq.gz fastq/MNase-LM2_R1.fastq.gz fastq/MNase-LM2_R2.fastq.gz &
Trim the adapters from reads and clear out low quality sequences
Then I made a script to make adapter fasta files using the command:
cd adapters
awk '{print "echo \\>D701–D712_Adapter > "$1"_adapters.fasta\necho "$3" >> " $1"_adapters.fasta\n"}' adapters.txt > ../scripts/get_adapters.sh
○
Then I ran the customized script
source ../scripts/get_adapters.sh
○
To run Trimmomatic I first made a batch script (Note these parameters were selected based on the example given in the Trimmomatic manual)
cd /home/runjin/LFY/fastq
ls *.gz| awk '$1~"R1"{printf $0"\t"} $1~"R2"{print}'| awk -F"\t" 'BEGIN{print "#!/bin/bash"}{a=$1; sub("_R1.fastq.gz","",a); print "java -jar ~/Downloads/Trimmomatic-0.38/trimmomatic-0.38.jar PE -phred33 -trimlog log_files/trimming/"a"_trimming.log fastq/"$1" fastq/"$2" trimmed_fastq/"a"_R1.paired.fastq trimmed_fastq/"a"_R1.unpaired.fastq trimmed_fastq/"a"_R2.paired.fastq trimmed_fastq/"a"_R2.unpaired.fastq ILLUMINACLIP:adapters/"a"_adapters.fasta:2:30:10 LEADING:3 TRAILING:3 SLIDINGWINDOW:4:15 MINLEN:36 TOPHRED33\nfastqc -f fastq -o FASTQC/trimmed/ trimmed_fastq/"a"_R1.paired.fastq trimmed_fastq/"a"_R2.paired.fastq trimmed_fastq/"a"_R1.unpaired.fastq trimmed_fastq/"a"_R2.unpaired.fastq"}' > ../scripts/trim_adapters.sh
cd ..
chmod 755 scripts/trim_adapters.sh
mkdir trimmed_fastq/
mkdir FASTQC/trimmed/
mkdir log_files/trimming/
nohup ./scripts/trim_adapters.sh > log_files/trimming.log &
●
Install bowtie1 for paired end
cd /home/runjin/Downloads
curl -O https://sourceforge.net/projects/bowtie-bio/files/bowtie/1.2.2/bowtie-1.2.2-linux-x86_64.zip
●
Install MACS2
●
MACS2
○
Installation commands
git clone https://github.com/taoliu/MACS.git
cd MACS
python setup_w_cython.py install --prefix=/home/runjin/local
python setup.py install --prefix /home/runjin/local
●
Path fo MACS2
/home/runjin/Downloads/MACS/bin/macs2 -h
scp /Users/RunJin/Downloads/bowtie-1.2.2-linux-x86_64.zip [email protected]:/home/runjin/Downloads
FRIDAY, 10/19/2018×
●
Run FASTQC on the trimmed reads
nohup ~/Downloads/FastQC/fastqc -a adapters/adapterList4FASTQC.txt -o FASTQC/trimmed trimmed_fastq/ATAC-24hrDexA_R1.paired.fastq trimmed_fastq/ATAC-24hrDexA_R2.paired.fastq trimmed_fastq/ATAC-24hrDexA-Spikein_R1.paired.fastq trimmed_fastq/ATAC-24hrDexA-Spikein_R2.paired.fastq trimmed_fastq/ATAC-24hrDexB_R1.paired.fastq trimmed_fastq/ATAC-24hrDexB_R2.paired.fastq trimmed_fastq/ATAC-24hrDexB-Spikein_R1.paired.fastq trimmed_fastq/ATAC-24hrDexB-Spikein_R2.paired.fastq trimmed_fastq/ATAC-24hrMockA_R1.paired.fastq trimmed_fastq/ATAC-24hrMockA_R2.paired.fastq trimmed_fastq/ATAC-24hrMockA-Spikein_R1.paired.fastq trimmed_fastq/ATAC-24hrMockA-Spikein_R2.paired.fastq trimmed_fastq/ATAC-24hrMockB_R1.paired.fastq trimmed_fastq/ATAC-24hrMockB_R2.paired.fastq trimmed_fastq/ATAC-24hrMockB-Spikein_R1.paired.fastq trimmed_fastq/ATAC-24hrMockB-Spikein_R2.paired.fastq trimmed_fastq/ATAC-6hrDexA_R1.paired.fastq trimmed_fastq/ATAC-6hrDexA_R2.paired.fastq trimmed_fastq/ATAC-6hrDexB_R1.paired.fastq trimmed_fastq/ATAC-6hrDexB_R2.paired.fastq trimmed_fastq/ATAC6hrMockA_R1.paired.fastq trimmed_fastq/ATAC6hrMockA_R2.paired.fastq trimmed_fastq/ATAC-6hrMockB_R1.paired.fastq trimmed_fastq/ATAC-6hrMockB_R2.paired.fastq trimmed_fastq/MNase-HM1_R1.paired.fastq trimmed_fastq/MNase-HM1_R2.paired.fastq trimmed_fastq/MNase-HM2_R1.paired.fastq trimmed_fastq/MNase-HM2_R2.paired.fastq trimmed_fastq/MNase-LM1_R1.paired.fastq trimmed_fastq/MNase-LM1_R2.paired.fastq &
Mapping
●
Build Bowtie Index -- can be skipped
# -f reference is fasta file
bowtie-build -f /home/runjin/Araport11/TAIR10_Chr.all.fasta /home/runjin/Araport11/TAIR10_bowtie1
●
Run Bowtie
○
example bowtie command
# -m suppress all alignments if >1 exist
# --best hits guaranteed best stratum; ties broken by quality
# --no-unal
bowtie --no-unal -S --chunkmbs 200 --best -m 1 /home/runjin/Araport11/TAIR10_bowtie1 -1 trimmed_fastq/6hrDexA_P1.paired.fastq -2 trimmed_fastq/6hrDexA_P2.paired.fastq > mapping/6hrDexA.sam
●
Run bowtie for the first 2 files and see how that goes
nohup bowtie --no-unal -S --chunkmbs 200 --best -m 1 /home/runjin/Araport11/TAIR10_bowtie1 -1 trimmed_fastq/ATAC-24hrDexA_R1.paired.fastq -2 trimmed_fastq/ATAC-24hrDexA_R2.paired.fastq > mapping/ATAC-24hrDexA.sam 2> log_files/mapping/bowtie_ATAC-24hrDexA.log &
●
Make batch script
mkdir mapping
awk 'BEGIN{print "#!/bin/bash\necho \42START\42"} \
{print "nohup bowtie --no-unal -S --chunkmbs 200 \
--best -m 1 /home/runjin/Araport11/TAIR10_bowtie1 \
-1 trimmed_fastq/"$1"_R1.paired.fastq \
-2 trimmed_fastq/"$1"_R2.paired.fastq > mapping/"$1".sam \
2> log_files/mapping/bowtie_"$1".log &"} NR%2==0{print "wait\necho \42FINISHED: "NR/2"/2 WAIT\42"}END{print "wait\necho \42YAY YOU FINALLY DONE!\42"}' \
meta/SRX.list > scripts/mapping.sh
chmod 755 scripts/mapping.sh
mkdir log_files/mapping
nohup ./scripts/mapping.sh > log_files/mapping.log &
●
Move FASTQC files into my own computer
scp -r [email protected]:~/LFY/FASTQC/raw .
scp -r [email protected]:~/LFY/FASTQC/trimmed_HC15 .
●
Check the mapping rates for bowtie -- maybe try different parameters
less log_files/mapping/bowtie_ATAC-24hrDexA.log
●
Check the trimming of reads -- how many get trimmined and how many get lost
less log_files/trimming.log
●
If want to trim more adapters - we could possibly use cutadapt
cutadapt \
-a AGATCGGAAGAGCACACGTCTGAACTCCAGTCAC \
-A AGATCGGAAGAGCGTCGTGTAGGGAAAGAGTGTAGATCTCGGTGGTCGCCGTATCATT \
-o trimmed.1.fastq.gz -p trimmed.2.fastq.gz \
reads.1.fastq.gz reads.2.fastq.gz
●
Run FASTQC on the MNase_LM1 & 2
nohup ~/Downloads/FastQC/fastqc -a adapters/adapterList4FASTQC.txt -o FASTQC/trimmed trimmed_fastq/MNase-LM1_R1.paired.fastq trimmed_fastq/MNase-LM1_R2.paired.fastq trimmed_fastq/MNase-LM2_R1.paired.fastq trimmed_fastq/MNase-LM2_R2.paired.fastq &
●
Re mount base space & get files from it again -- not all of them because we editted the scripts/rename_fastq_files_R1.sh & scripts/rename_fastq_files_R1.sh in nano by # out all the samples that we do not need
basemount /data/home/runjin/BaseSpace_Mount/
nohup ./scripts/rename_fastq_files_R1.sh > log_files/rename_fastq_files_R1.log 2>&1 &
nohup ./scripts/rename_fastq_files_R2.sh > log_files/rename_fastq_files_R2.log 2>&1 &
●
Copy sammy's scripts to mine:
SATURDAY, 10/20/2018×
●
Check mapping rates for all analysis
nano log_files/mapping/bowtie_*.log
A
B
C
D
E
F
G
1
Sample ID Reads processed Reads mapped % mapped Reads unmapped % unmapped % duplicate
2
ATAC-24hrDexA 31173873 5548552 17.80% 22838988 73.26% 8.94%
3
ATAC-24hrDexA-Spikein 47673829 8239950 17.28% 35859107 75.22% 7.50%
4
ATAC-24hrDexB 39380500 6947699 17.64% 29024296 73.70% 8.66%
5
ATAC-24hrDexB-Spikein 37649929 6804466 18.07% 27919741 74.16% 7.77%
6
ATAC-24hrMockA 32649995 6393668 19.58% 23281761 71.31% 9.11%
7
ATAC-24hrMockA-Spikein 28757417 5972593 20.77% 20782269 72.27% 6.96%
8
ATAC-24hrMockB 28489518 5746656 20.17% 20101568 70.56% 9.27%
9
ATAC-24hrMockB-Spikein 32607285 7073689 21.69% 23070178 70.75% 7.55%
10
ATAC-6hrDexA 27889018 9743527 34.94% 14820862 53.14% 11.92%
11
ATAC-6hrDexB 27719206 27719206 35.27% 14688579 52.99% 11.74%
12
ATAC-6hrMockA 24546624 7854967 32.00% 13965293 56.89% 11.11%
13
ATAC-6hrMockB 24646711 7891856 32.02% 13949007 56.60% 11.38%
14
MNase-HM1 47044647 20958040 44.55% 17345285 36.87% 18.58%
15
MNase-HM2 74702910 30281225 40.54% 30808892 41.24% 18.22%
16
Table1
●
Import the following to get_adapters_fasta.py:
#!/usr/bin/env python
import argparse
import sys
import os
parser = argparse.ArgumentParser(description="given a tab delimited\
file of i7 and i5 TruSeq HT adapters return their respective adapters")
parser.add_argument('input_file', help='A text file with three columns. \
SAMPLE ID in column 1, Index 1 (i7 adapters) in column 2, \
and Index 2 (i5 adapters) in column 3.')
parser.add_argument('-o','--out_dir', help='directory where output is placed', \
default=".")
args = parser.parse_args()
D501_D508_1= "AATGATACGGCGACCACCGAGATCTACAC"
D501_D508_2= "TCGTCGGCAGCGTCAGATGTG"
D701_D712_1= "CAAGCAGAAGACGGCATACGAGAT"
D701_D712_2= "GTCTCGTGGGCTCGGAGATGT"
in_file = open(args.input_file, 'r')
for line in in_file:
line = line.rstrip()
fields = line.split('\t')
newfile=("%s/%s_adapters.fasta" % (os.path.abspath(args.out_dir),fields[0]))
f = open(newfile, "w")
adapter1 = (("%s%s%s") % (D701_D712_1, fields[1].strip(), D701_D712_2))
f.write(">i7_Adapter\n")
f.write("%s\n" % adapter1)
adapter2 = (("%s%s") % (D501_D508_1, D501_D508_2))
if len(fields)==3 and len(fields[2])>0 :
adapter2 = (("%s%s%s") % (D501_D508_1, fields[2], D501_D508_2))
f.write(">i5_Adapter\n")
f.write("%s\n" % adapter2)
f.close()
in_file.close()
python ~/custom_scripts/get_adapters_fasta.py adapters/index.txt -o adapters
●
Unmount base space
●
Re mount base space & get files from it again -- not all of them because we editted the scripts/rename_fastq_files_R1.sh & scripts/rename_fastq_files_R1.sh in nano by # out all the samples that we do not need
basemount /data/home/runjin/BaseSpace_Mount/
nohup ./scripts/rename_fastq_files_R1.sh > log_files/rename_fastq_files_R1.log 2>&1 &
nohup ./scripts/rename_fastq_files_R2.sh > log_files/rename_fastq_files_R2.log 2>&1 &
●
Now use the new method trimming method to trim the reads -- with the new trimming fasta files and a HEADCROP of 15bp based on the fastQC files
●
Didn't do well with the HEADCROP -- all of them are droped
●
Need to separately do the job -- try the new adapters first - first see whether that helps
cd /home/runjin/LFY/fastq
ls *.gz| awk '$1~"R1"{printf $0"\t"} $1~"R2"{print}'| awk -F"\t" 'BEGIN{print "#!/bin/bash"}{a=$1; sub("_R1.fastq.gz","",a); print "java -jar ~/Downloads/Trimmomatic-0.38/trimmomatic-0.38.jar PE -phred33 -trimlog log_files/trimming/"a"_trimming.log fastq/"$1" fastq/"$2" trimmed_fastq/"a"_R1.paired.fastq trimmed_fastq/"a"_R1.unpaired.fastq trimmed_fastq/"a"_R2.paired.fastq trimmed_fastq/"a"_R2.unpaired.fastq ILLUMINACLIP:adapters/"a"_adapters.fasta:2:30:10 LEADING:3 TRAILING:3 SLIDINGWINDOW:4:15 MINLEN:36 TOPHRED33\nfastqc -f fastq -o FASTQC/trimmed/ trimmed_fastq/"a"_R1.paired.fastq trimmed_fastq/"a"_R2.paired.fastq trimmed_fastq/"a"_R1.unpaired.fastq trimmed_fastq/"a"_R2.unpaired.fastq"}' > ../scripts/trim_adapters.sh
cd ..
chmod 755 scripts/trim_adapters.sh
mkdir trimmed_fastq/
mkdir FASTQC/trimmed/
mkdir log_files/trimming/
nohup ./scripts/trim_adapters.sh > log_files/trimming.log &
●
Remove previous mapped SAM files from the server -- only left with the MNase_HM1 and HM2 since they are not mount and trimmed using the new method
●
For the MNase_LM1 and LM2 - also used the new trimming with HEADCROP:15 to see whether that makes a differences
●
To do next: try out different bowtie mapping parameters to see whether that makes a differnece
SUNDAY, 10/21/2018×
●
Try trimming with the HEADCROP: 3
○
Maybe the issue i had previously was after headcrop15 they no longer have MINLEN of 36?
cd /home/runjin/LFY/fastq
ls *.gz| awk '$1~"R1"{printf $0"\t"} $1~"R2"{print}'| awk -F"\t" 'BEGIN{print "#!/bin/bash"}{a=$1; sub("_R1.fastq.gz","",a); print "java -jar ~/Downloads/Trimmomatic-0.38/trimmomatic-0.38.jar PE -phred33 fastq/"$1" fastq/"$2" trimmed_fastq/"a"_R1.paired.fastq trimmed_fastq/"a"_R1.unpaired.fastq trimmed_fastq/"a"_R2.paired.fastq trimmed_fastq/"a"_R2.unpaired.fastq ILLUMINACLIP:adapters/"a"_adapters.fasta:2:30:10 LEADING:3 TRAILING:3 SLIDINGWINDOW:4:15 HEADCROP:3 MINLEN:36 TOPHRED33\nfastqc -f fastq -o FASTQC/trimmed/ trimmed_fastq/"a"_R1.paired.fastq trimmed_fastq/"a"_R2.paired.fastq trimmed_fastq/"a"_R1.unpaired.fastq trimmed_fastq/"a"_R2.unpaired.fastq"}' > ../scripts/trim_adapters.sh
cd ..
chmod 755 scripts/trim_adapters.sh
mkdir trimmed_fastq/
mkdir FASTQC/trimmed/
mkdir log_files/trimming/
nohup ./scripts/trim_adapters.sh > log_files/trimming.log &
●
Since the first two trimming 24hrDexA and DexB are done -- try mapping on those two files
●
Build bowtie index first:
bowtie-build -f /home/runjin/Araport11/TAIR10_Chr.all.fasta /home/runjin/Araport11/TAIR10_bowtie1
mkdir mapping
nohup bowtie --no-unal -S --chunkmbs 200 --best -m 1 /home/runjin/Araport11/TAIR10_bowtie1 -1 trimmed_fastq/ATAC-24hrDexA_R1.paired.fastq -2 trimmed_fastq/ATAC-24hrDexA_R2.paired.fastq > mapping/ATAC-24hrDexA.sam &
●
Reget the files from base space
# unmount BaseSpace
basemount --unmount /data/home/runjin/BaseSpace_Mount
basemount --unmount /data/home/runjin/BaseSpace_mount
# remount BaseSpace to update it
basemount /data/home/runjin/BaseSpace_Mount/
nohup ./scripts/rename_fastq_files_R1.sh > log_files/rename_fastq_files_R1.log 2>&1 &
nohup ./scripts/rename_fastq_files_R2.sh > log_files/rename_fastq_files_R2.log 2>&1 &
●
Single line for doing the trimming
java -jar ~/Downloads/Trimmomatic-0.38/trimmomatic-0.38.jar PE -phred33 fastq/ATAC-24hrDexA_R1.fastq.gz fastq/ATAC-24hrDexA_R2.fastq.gz trimmed_HC15_fastq/ATAC-24hrDexA_R1.paired.fastq trimmed_HC15_fastq/ATAC-24hrDexA_R1.unpaired.fastq trimmed_HC15_fastq/ATAC-24hrDexA_R2.paired.fastq trimmed_HC15_fastq/ATAC-24hrDexA_R2.unpaired.fastq ILLUMINACLIP:adapters/ATAC-24hrDexA_adapters.fasta:2:30:10 LEADING:3 TRAILING:3 SLIDINGWINDOW:4:15 HEADCROP:15 MINLEN:24 TOPHRED33
fastqc -f fastq -o FASTQC/trimmed_HC15/ trimmed_HC15_fastq/ATAC-24hrDexA_R1.paired.fastq trimmed_HC15_fastq/ATAC-24hrDexA_R2.paired.fastq trimmed_HC15_fastq/ATAC-24hrDexA_R1.unpaired.fastq trimmed_HC15_fastq/ATAC-24hrDexA_R2.unpaired.fastq
mkdir trimmed_HC15_fastq/
mkdir FASTQC/trimmed_HC15/
mkdir log_files/trimming_HC15/
To do next:
●
Try different trimming parameters -- for example the HEADCROP of 15 and do the mapping following that -- to see whether that makes any difference
java -jar ~/Downloads/Trimmomatic-0.38/trimmomatic-0.38.jar PE -phred33 -trimlog log_files/trimming_HC15/ATAC-24hrDexB_trimming.log fastq/ATAC-24hrDexB_R1.fastq.gz fastq/ATAC-24hrDexB_R2.fastq.gz trimmed_HC15_fastq/ATAC-24hrDexB_R1.paired.fastq trimmed_HC15_fastq/ATAC-24hrDexB_R1.unpaired.fastq trimmed_HC15_fastq/ATAC-24hrDexB_R2.paired.fastq trimmed_HC15_fastq/ATAC-24hrDexB_R2.unpaired.fastq ILLUMINACLIP:adapters/ATAC-24hrDexB_adapters.fasta:2:30:10 LEADING:3 TRAILING:3 SLIDINGWINDOW:4:15 HEADCROP:15 MINLEN:24 TOPHRED33
fastqc -f fastq -o FASTQC/trimmed_HC15/ trimmed_HC15_fastq/ATAC-24hrDexB_R1.paired.fastq trimmed_HC15_fastq/ATAC-24hrDexB_R2.paired.fastq trimmed_HC15_fastq/ATAC-24hrDexB_R1.unpaired.fastq trimmed_HC15_fastq/ATAC-24hrDexB_R2.unpaired.fastq
●
Use HEADCROP of 3 for MNase and use HEADCROP of 15 for ATAC -- then they pass all the quality check
●
Started mapping
mkdir mapping
nohup bowtie --no-unal -S --chunkmbs 200 --best -m 1 /home/runjin/Araport11/TAIR10_bowtie1 -1 trimmed_HC15_fastq/ATAC-24hrDexB_R1.paired.fastq -2 trimmed_HC15_fastq/ATAC-24hrDexB_R2.paired.fastq > mapping/ATAC-24hrDexB.sam &
mkdir mapping_HC15
nohup bowtie --no-unal -S --chunkmbs 200 --best -m 1 /home/runjin/Araport11/TAIR10_bowtie1 -1 trimmed_HC15_fastq/ATAC-24hrDexB_R1.paired.fastq -2 trimmed_fastq/ATAC-24hrDexB_R2.paired.fastq > mapping_HC15/ATAC-24hrDexB.sam &
●
Make custom scripts for mapping the HC15:
awk 'BEGIN{print "#!/bin/bash\necho \42START\42"} \
{print "nohup bowtie --no-unal -S --chunkmbs 200 \
--best -m 1 /home/runjin/Araport11/TAIR10_bowtie1 \
-1 trimmed_HC15_fastq/"$1"_R1.paired.fastq \
-2 trimmed_HC15_fastq/"$1"_R2.paired.fastq > mapping_HC15/"$1".sam \
2> log_files/mapping_HC15/bowtie_"$1".log &"} NR%2==0{print "wait\necho \42FINISHED: "NR/2"/7 WAIT\42"}' \
meta/SRX.list > scripts/mapping_HC15.sh
chmod 755 scripts/mapping_HC15.sh
mkdir log_files/mapping_HC15
nohup ./scripts/mapping_HC15.sh > log_files/mapping_HC15.log &
●
Removed the space between 2 and > and get it to work
nohup bowtie --no-unal -S --chunkmbs 200 --best -m 1 /home/runjin/Araport11/TAIR10_bowtie1 -1 trimmed_HC15_fastq/ATAC-24hrDexA_R1.paired.fastq -2 trimmed_HC15_fastq/ATAC-24hrDexA_R2.paired.fastq > mapping_HC15/ATAC-24hrDexA.sam 2>log_files/mapping_HC15/bowtie_ATAC-24hrDexA.log &
java -jar ~/Downloads/Trimmomatic-0.38/trimmomatic-0.38.jar PE -phred33 -trimlog log_files/trimming_HC15/ATAC-24hrDexB_trimming.log fastq/ATAC-24hrDexB_R1.fastq.gz fastq/ATAC-24hrDexB_R2.fastq.gz trimmed_HC15_fastq/ATAC-24hrDexB_R1.paired.fastq trimmed_HC15_fastq/ATAC-24hrDexB_R1.unpaired.fastq trimmed_HC15_fastq/ATAC-24hrDexB_R2.paired.fastq trimmed_HC15_fastq/ATAC-24hrDexB_R2.unpaired.fastq ILLUMINACLIP:adapters/ATAC-24hrDexB_adapters.fasta:2:30:10 LEADING:3 TRAILING:3 SLIDINGWINDOW:4:15 HEADCROP:15 MINLEN:24 TOPHRED33
fastqc -f fastq -o FASTQC/trimmed_HC15/ trimmed_HC15_fastq/ATAC-24hrDexB_R1.paired.fastq trimmed_HC15_fastq/ATAC-24hrDexB_R2.paired.fastq trimmed_HC15_fastq/ATAC-24hrDexB_R1.unpaired.fastq trimmed_HC15_fastq/ATAC-24hrDexB_R2.unpaired.fastq
MONDAY, 10/22/2018×
●
Try to map the low MNase with the new trimming to see whether that changes anything
●
The trimmed MNase files are in trimmed_fastq
nohup bowtie --no-unal -S --chunkmbs 200 --best -m 1 /home/runjin/Araport11/TAIR10_bowtie1 -1 trimmed_fastq/MNase-LM1_R1.paired.fastq -2 trimmed_fastq/MNase-LM1_R2.paired.fastq > mapping_HC15/MNase-LM1.sam 2> log_files/mapping_HC3/bowtie_MNase-LM1.log &
mkdir log_files/mapping_HC3
●
Try trimming the HM1 & 2
●
FASTQC should be run AFTER the trimming is done
nohup java -jar ~/Downloads/Trimmomatic-0.38/trimmomatic-0.38.jar PE -phred33 fastq/MNase-HM1_R1.fastq.gz fastq/MNase-HM1_R2.fastq.gz trimmed_fastq/MNase-HM1_R1.paired.fastq trimmed_fastq/MNase-HM1_R1.unpaired.fastq trimmed_fastq/MNase-HM1_R2.paired.fastq trimmed_fastq/MNase-HM1_R2.unpaired.fastq ILLUMINACLIP:adapters/MNase-HM1_adapters.fasta:2:30:10 LEADING:3 TRAILING:3 SLIDINGWINDOW:4:15 HEADCROP:3 MINLEN:33 TOPHRED33 &
fastqc -f fastq -o FASTQC/trimmed_HC15/ trimmed_fastq/MNase-HM1_R1.paired.fastq trimmed_fastq/MNase-HM1_R2.paired.fastq trimmed_fastq/MNase-HM1_R1.unpaired.fastq trimmed_fastq/MNase-HM1_R2.unpaired.fastq
nohup java -jar ~/Downloads/Trimmomatic-0.38/trimmomatic-0.38.jar PE -phred33 fastq/MNase-HM2_R1.fastq.gz fastq/MNase-HM2_R2.fastq.gz trimmed_fastq/MNase-HM2_R1.paired.fastq trimmed_fastq/MNase-HM2_R1.unpaired.fastq trimmed_fastq/MNase-HM2_R2.paired.fastq trimmed_fastq/MNase-HM2_R2.unpaired.fastq ILLUMINACLIP:adapters/MNase-HM2_adapters.fasta:2:30:10 LEADING:3 TRAILING:3 SLIDINGWINDOW:4:15 HEADCROP:3 MINLEN:33 TOPHRED33 &
fastqc -f fastq -o FASTQC/trimmed_HC3/ trimmed_HC3_fastq/MNase-HM2_R1.paired.fastq trimmed_HC3_fastq/MNase-HM2_R2.paired.fastq trimmed_HC3_fastq/MNase-HM2_R1.unpaired.fastq trimmed_HC3_fastq/MNase-HM2_R2.unpaired.fastq
●
Try mapping the HC3 and HC15 files for ATAC and see which method gives better results
nohup bowtie --no-unal -S --chunkmbs 200 --best -m 1 /home/runjin/Araport11/TAIR10_bowtie1 -1 trimmed_HC15_fastq/ATAC-24hrDexB_R1.paired.fastq -2 trimmed_HC15_fastq/ATAC-24hrDexB_R2.paired.fastq > mapping_HC15/ATAC-24hrDexB.sam 2> log_files/mapping_HC15/bowtie_ATAC-24hrDexB.log &
nohup bowtie --no-unal -S --chunkmbs 200 --best -m 1 /home/runjin/Araport11/TAIR10_bowtie1 -1 trimmed_HC3_fastq/ATAC-24hrDexA_R1.paired.fastq -2 trimmed_HC3_fastq/ATAC-24hrDexA_R2.paired.fastq > mapping_HC3/ATAC-24hrDexA.sam 2> log_files/mapping_HC3/bowtie_ATAC-24hrDexA.log &
●
Move the FASTQC of the trimmed into my folder
scp -r [email protected]:~/LFY/FASTQC/trimmed_HC3 .
●
Checked FASTQC for MNase LM and HM -- they all look good with HC3 -- move on to mapping -- they are all mapped to HC3
nohup bowtie --no-unal -S --chunkmbs 200 --best -m 1 /home/runjin/Araport11/TAIR10_bowtie1 -1 trimmed_HC3_fastq/MNase-HM1_R1.paired.fastq -2 trimmed_HC3_fastq/MNase-HM1_R2.paired.fastq > mapping_HC3/MNase-HM1.sam 2> log_files/mapping_HC3/bowtie_MNase-HM1.log &
mkdir log_files/mapping_HC3
●
Log into previous node and see what is running
qstat | grep 'runjin'
●
Find the node name where i was running the previous things -- use the following code to log back into the previous node
qlogin -l h=node01.gpc-global
●
And the find my jobs by doing this:
ps -ef | grep runjin
●
Results for LM1
●
Results for ATAC-24hr-DexB is not any better -- with HC = 15
nohup: ignoring input
# reads processed: 30704935
# reads with at least one reported alignment: 13190852 (42.96%)
# reads that failed to align: 11728219 (38.20%)
# reads with alignments suppressed due to -m: 5785864 (18.84%)
Reported 13190852 paired-end alignments
●
Run the following with HM2 and LM2
nohup bowtie --no-unal -S --chunkmbs 200 --best -m 1 /home/runjin/Araport11/TAIR10_bowtie1 -1 trimmed_HC3_fastq/MNase-HM2_R1.paired.fastq -2 trimmed_HC3_fastq/MNase-HM2_R2.paired.fastq > mapping_HC3/MNase-HM2.sam 2> log_files/mapping_HC3/bowtie_MNase-HM2.log &
nohup bowtie --no-unal -S --chunkmbs 200 --best -m 1 /home/runjin/Araport11/TAIR10_bowtie1 -1 trimmed_HC3_fastq/MNase-LM2_R1.paired.fastq -2 trimmed_HC3_fastq/MNase-LM2_R2.paired.fastq > mapping_HC3/MNase-LM2.sam 2> log_files/mapping_HC3/bowtie_MNase-LM2.log &
●
Run the ATAC data with Bob Schmit's parameters: '-v 2 -m 3'
nohup bowtie --no-unal -S --chunkmbs 200 -v 2 -m 3 /home/runjin/Araport11/TAIR10_bowtie1 -1 trimmed_HC3_fastq/ATAC-24hrDexA_R1.paired.fastq -2 trimmed_HC3_fastq/ATAC-24hrDexA_R2.paired.fastq > mapping_HC3_BS/ATAC-24hrDexA.sam 2> log_files/mapping_HC3_BS/bowtie_ATAC-24hrDexA.log &
A
B
C
D
E
F
G
1
Trimming Mapping parameters Reads processed Reads mapped % mapped dropped non unique %
2
ATAC_24hrDexA No HC --best -m 1 31173873 5548552 17.80% 8.94%
3
ATAC_24hrDexA No HC -v 2 -m 3 31173862 6848975 21.97% 3.02%
4
ATAC_24hrDexA HC3 --best -m 1 30938097 5457221 17.64% 9.15%
5
ATAC_24hrDexA HC3 -v 2 -m 3 30938097 6745194 21.80% 3.15%
6
ATAC_24hrDexB No HC --best -m 1 39380500 6947699 17.64% 8.66%
7
ATAC_24hrDexB HC15 --best -m 1 39085647 6282739 16.07% 9.66%
8
MNase_HM1 No HC --best -m 1 47044647 20958040 44.55% 18.58%
9
MNase_HM1 HC3 --best -m 1 47045760 21071154 44.79% 19.34%
10
MNase_HM2 No HC --best -m 1 74702910 30281225 40.54% 18.22%
11
MNase_LM1 HC3 --best -m 1 30704935 13190852 42.96% 18.84%
12
MNase_LM2 HC3 --best -m 1 65029667 24178807 37.18% 17.50%
Table2
●
There is NO need to trim differently -- because during bowtie mapping, we can use -s to skip the first several bases
○
By doing HC -- we have less reads to start with and we have more non-unique mapping rate -- also, the % map is decreased
○
So stick with no HC for trimming
●
Retrim everything with the updated adapter sequences:
d /home/runjin/LFY/fastq
ls *.gz| awk '$1~"R1"{printf $0"\t"} $1~"R2"{print}'| awk -F"\t" 'BEGIN{print "#!/bin/bash"}{a=$1; sub("_R1.fastq.gz","",a); print "java -jar ~/Downloads/Trimmomatic-0.38/trimmomatic-0.38.jar PE -phred33 fastq/"$1" fastq/"$2" trimmed_fastq/"a"_R1.paired.fastq trimmed_fastq/"a"_R1.unpaired.fastq trimmed_fastq/"a"_R2.paired.fastq trimmed_fastq/"a"_R2.unpaired.fastq ILLUMINACLIP:adapters/"a"_adapters.fasta:2:30:10 LEADING:3 TRAILING:3 SLIDINGWINDOW:4:15 MINLEN:36 TOPHRED33\nfastqc -f fastq -o FASTQC/trimmed/ trimmed_fastq/"a"_R1.paired.fastq trimmed_fastq/"a"_R2.paired.fastq trimmed_fastq/"a"_R1.unpaired.fastq trimmed_fastq/"a"_R2.unpaired.fastq"}' > ../scripts/trim_adapters.sh
cd ..
chmod 755 scripts/trim_adapters.sh
mkdir trimmed_fastq/
mkdir FASTQC/trimmed/
mkdir log_files/trimming/
nohup ./scripts/trim_adapters.sh > log_files/trimming.log &
●
Try the trimming without HC but with the new adapter sequences -- to see whether it makes a difference
●
And then try mapping with BS method -- only incrase % mapped marginally
nohup java -jar ~/Downloads/Trimmomatic-0.38/trimmomatic-0.38.jar PE -phred33 fastq/ATAC-24hrDexA_R1.fastq.gz fastq/ATAC-24hrDexA_R2.fastq.gz trimmed_fastq/ATAC-24hrDexA_R1.paired.fastq trimmed_fastq/ATAC-24hrDexA_R1.unpaired.fastq trimmed_fastq/ATAC-24hrDexA_R2.paired.fastq trimmed_fastq/ATAC-24hrDexA_R2.unpaired.fastq ILLUMINACLIP:adapters/ATAC-24hrDexA_adapters.fasta:2:30:10 LEADING:3 TRAILING:3 SLIDINGWINDOW:4:15 MINLEN:36 TOPHRED33 &
fastqc -f fastq -o FASTQC/trimmed/ trimmed_fastq/ATAC-24hrDexA_R1.paired.fastq trimmed_fastq/ATAC-24hrDexA_R2.paired.fastq trimmed_fastq/ATAC-24hrDexA_R1.unpaired.fastq trimmed_fastq/ATAC-24hrDexA_R2.unpaired.fastq
mkdir trimmed_fastq/
mkdir FASTQC/trimmed/
mkdir log_files/trimming/
●
Run the ATAC data with Bob Schmit's parameters: '-v 2 -m 3'
nohup bowtie --no-unal -S --chunkmbs 200 -v 2 -m 3 /home/runjin/Araport11/TAIR10_bowtie1 -1 trimmed_fastq/ATAC-24hrDexA_R1.paired.fastq -2 trimmed_fastq/ATAC-24hrDexA_R2.paired.fastq > mapping_BS/ATAC-24hrDexA.sam 2> log_files/mapping_BS/bowtie_ATAC-24hrDexA.log &
●
Seems to be slightly better -- but the mapping does not make any differences for ATAC
●
Make batch script to trim ALL ATAC files the same way without HC
○
Also did not include the trimming log
cd /home/runjin/LFY/fastq
ls *.gz| awk '$1~"R1"{printf $0"\t"} $1~"R2"{print}'| awk -F"\t" 'BEGIN{print "#!/bin/bash"}{a=$1; sub("_R1.fastq.gz","",a); print "java -jar ~/Downloads/Trimmomatic-0.38/trimmomatic-0.38.jar PE -phred33 fastq/"$1" fastq/"$2" trimmed_fastq/"a"_R1.paired.fastq trimmed_fastq/"a"_R1.unpaired.fastq trimmed_fastq/"a"_R2.paired.fastq trimmed_fastq/"a"_R2.unpaired.fastq ILLUMINACLIP:adapters/"a"_adapters.fasta:2:30:10 LEADING:3 TRAILING:3 SLIDINGWINDOW:4:15 MINLEN:36 TOPHRED33\nfastqc -f fastq -o FASTQC/trimmed/ trimmed_fastq/"a"_R1.paired.fastq trimmed_fastq/"a"_R2.paired.fastq trimmed_fastq/"a"_R1.unpaired.fastq trimmed_fastq/"a"_R2.unpaired.fastq"}' > ../scripts/trim_adapters.sh
cd ..
chmod 755 scripts/trim_adapters.sh
mkdir trimmed_fastq/
mkdir FASTQC/trimmed/
nohup ./scripts/trim_adapters.sh > log_files/trimming.log &
●
Try mapping with BS method -- MNase-HM1 -- see whether it gives better result:
nohup bowtie --no-unal -S --chunkmbs 200 -v 2 -m 3 /home/runjin/Araport11/TAIR10_bowtie1 -1 trimmed_HC3_fastq/MNase-HM1_R1.paired.fastq -2 trimmed_HC3_fastq/MNase-HM1_R2.paired.fastq > mapping_HC3_BS/MNase-HM1.sam 2> log_files/mapping_HC3_BS/bowtie_MNase-HM1.log &
nohup bowtie --no-unal -S --chunkmbs 200 --best -m 1 /home/runjin/Araport11/TAIR10_bowtie1 -1 trimmed_HC3_fastq/MNase-HM1_R1.paired.fastq -2 trimmed_HC3_fastq/MNase-HM1_R2.paired.fastq > mapping_HC3/MNase-HM1.sam 2> log_files/mapping_HC3/bowtie_MNase-HM1.log &
●
Make batch script for mapping using BS method -- for all the trimmed ATACseq
awk 'BEGIN{print "#!/bin/bash\necho \42START\42"} \
{print "nohup bowtie --no-unal -S --chunkmbs 200 \
-v 2 -m 3 /home/runjin/Araport11/TAIR10_bowtie1 \
-1 trimmed_fastq/"$1"_R1.paired.fastq \
-2 trimmed_fastq/"$1"_R2.paired.fastq > mapping_BS/"$1".sam \
2> log_files/mapping_BS/bowtie_"$1".log &"} NR%2==0{print "wait\necho \42FINISHED: "NR/2"/7 WAIT\42"}' \
meta/SRX.list > scripts/mapping_BS.sh
chmod 755 scripts/mapping_BS.sh
mkdir log_files/mapping_BS
nohup ./scripts/mapping_BS.sh > log_files/mapping_BS.log &
TUESDAY, 10/23/2018×
●
Trim the remaining 4 files individually:
nohup java -jar ~/Downloads/Trimmomatic-0.38/trimmomatic-0.38.jar PE -phred33 fastq/ATAC-6hrDexB_R1.fastq.gz fastq/ATAC-6hrDexB_R2.fastq.gz trimmed_fastq/ATAC-6hrDexB_R1.paired.fastq trimmed_fastq/ATAC-6hrDexB_R1.unpaired.fastq trimmed_fastq/ATAC-6hrDexB_R2.paired.fastq trimmed_fastq/ATAC-6hrDexB_R2.unpaired.fastq ILLUMINACLIP:adapters/ATAC-6hrDexB_adapters.fasta:2:30:10 LEADING:3 TRAILING:3 SLIDINGWINDOW:4:15 MINLEN:36 TOPHRED33 &
fastqc -f fastq -o FASTQC/trimmed/ trimmed_fastq/ATAC-6hrDexB_R1.paired.fastq trimmed_fastq/ATAC-6hrDexB_R2.paired.fastq
nohup java -jar ~/Downloads/Trimmomatic-0.38/trimmomatic-0.38.jar PE -phred33 fastq/ATAC-6hrMockB_R1.fastq.gz fastq/ATAC-6hrMockB_R2.fastq.gz trimmed_fastq/ATAC-6hrMockB_R1.paired.fastq trimmed_fastq/ATAC-6hrMockB_R1.unpaired.fastq trimmed_fastq/ATAC-6hrMockB_R2.paired.fastq trimmed_fastq/ATAC-6hrMockB_R2.unpaired.fastq ILLUMINACLIP:adapters/ATAC-6hrMockB_adapters.fasta:2:30:10 LEADING:3 TRAILING:3 SLIDINGWINDOW:4:15 MINLEN:36 TOPHRED33 &
fastqc -f fastq -o FASTQC/trimmed/ trimmed_fastq/ATAC-6hrMockB_R1.paired.fastq trimmed_fastq/ATAC-6hrMockB_R2.paired.fastq
nohup java -jar ~/Downloads/Trimmomatic-0.38/trimmomatic-0.38.jar PE -phred33 fastq/ATAC6hrMockA_R1.fastq.gz fastq/ATAC6hrMockA_R2.fastq.gz trimmed_fastq/ATAC6hrMockA_R1.paired.fastq trimmed_fastq/ATAC6hrMockA_R1.unpaired.fastq trimmed_fastq/ATAC6hrMockA_R2.paired.fastq trimmed_fastq/ATAC6hrMockA_R2.unpaired.fastq ILLUMINACLIP:adapters/ATAC6hrMockA_adapters.fasta:2:30:10 LEADING:3 TRAILING:3 SLIDINGWINDOW:4:15 MINLEN:36 TOPHRED33 &
fastqc -f fastq -o FASTQC/trimmed/ trimmed_fastq/ATAC6hrMockA_R1.paired.fastq trimmed_fastq/ATAC6hrMockA_R2.paired.fastq
●
Make custom scripts for all the mapping
awk 'BEGIN{print "#!/bin/bash\necho \42START\42"} \
{print "nohup bowtie --no-unal -S --chunkmbs 200 \
-v 2 -m 3 /home/runjin/Araport11/TAIR10_bowtie1 \
-1 trimmed_fastq/"$1"_R1.paired.fastq \
-2 trimmed_fastq/"$1"_R2.paired.fastq > mapping_BS/"$1".sam \
2> log_files/mapping_HC15/bowtie_"$1".log &"} NR%2==0{print "wait\necho \42FINISHED: "NR/2"/7 WAIT\42"}' \
meta/SRX.list > scripts/mapping_BS.sh
chmod 755 scripts/mapping_BS.sh
mkdir log_files/mapping_BS
nohup ./scripts/mapping_BS.sh > log_files/mapping_BS.log &
Mapping result summary:
A
B
C
D
E
F
G
H
1
Trimming Mapping parameters Reads processed Reads mapped % mapped dropped non unique %
2
ATAC_24hrDexA No HC -v 2 -m 3 31173862 6848975 21.97% 3.02%
3
ATAC_24hrDexA_Spikein No HC -v 2 -m 3 47673812 9827851 20.61% 2.63%
4
ATAC_24hrDexA_Spikein No HC -v 3 -m 5 47673812 10080840 21.15% 3.80% 252989
5
ATAC_24hrDexB No HC -v 2 -m 3 39380479 8535953 21.68% 2.93%
6
ATAC_24hrDexB_Spikein No HC -v 2 -m 3 37649925 8113689 21.55% 2.70%
7
ATAC_24hrMockA No HC -v 2 -m 3 32649978 7782640 23.84% 3.06%
8
ATAC_24hrMockA_Spikein No HC -v 2 -m 3 28757472 6835486 23.77% 2.46%
9
ATAC_24hrMockB No HC -v 2 -m 3 28489512 6968808 24.46% 3.13%
10
ATAC_24hrMockB_Spikein No HC -v 2 -m 3 32607430 8185941 25.10% 2.61%
11
ATAC_6hrDexA No HC -v 2 -m 3 27889031 11097677 39.79% 4.39%
12
ATAC_6hrDexB No HC -v 2 -m 3 27719182 11105659 40.06% 4.33%
13
ATAC_6hrMockA No HC -v 2 -m 3 24546624 8981360 36.59% 4.07%
14
ATAC_6hrMockB No HC -v 2 -m 3 24646722 9049141 36.72% 4.15%
Table3
●
Install samtools and picard
○
samtools are added by downloading on my computer and move to the server
#unzip samtools
tar xvjf samtools-1.9.tar.bz2
#make PATH to samtools
cd ~
nano .bashrc
# this is added for bowtie
export PATH="/home/runjin/Downloads/bowtie-1.2.2-linux-x86_64/:$PATH"
# this is added for samtools
export PATH="/home/runjin/Downloads/samtools-1.9/:$PATH"
#run .bashrc again to make sure it is working:
source .bashrc
# path to picard
/home/runjin/Downloads/picard/build/libs
# test installation
java -jar /home/runjin/Downloads/picard/build/libs/picard.jar -h
●
Use custom scripts from Kaufmann for filtering and removing ChrFromSam:
create_ATAC_filtering.sh
#!/bin/bash
# **************** 9. Filtering ****************
#===================================
# 1. Remove unmapped, mate unmapped not primary alignment,
# reads failing platform
# 2. Remove low MAPQ reads
# 4. Remove duplicate reads
#===================================
myWorkDIR=/home/runjin/LFY
NTHREADS=2 ## number of threads
miniQ=30
PICARD=$HOME/Downloads/picard/build/libs
maxFragLen=250
minFragLen=150
#PICARD=$HOME/Downloads/picard/src/main/java/picard ## path for Picard tools
echo "#!/bin/bash"
cat $myWorkDIR/meta/SRX.list | while read SRX; do
# Filter bam file, based on FLAG 1804: segment unmapped (4) + next segment in the template unmapped (8) + secondary alignments (256) + not passing filters, such as platform/vendor quality controls (512)$
[[ ! -f $SRX.bam ]] && echo """samtools view -@ ${NTHREADS} -h ${SRX}.sam -o ${SRX}.bam
[[ ! -f $SRX.filter.bam ]] && echo """samtools view -@ ${NTHREADS} -h -f3 -F 1804 -q $miniQ ${SRX}.bam Chr1 Chr2 Chr3 Chr4 Chr5 | \
samtools sort - -o ${SRX}.filter.bam """
[[ ! -f $SRX.filter.bam.bai ]] && echo "samtools index ${SRX}.filter.bam"
[[ ! -f $SRX.dupmark.bam ]] && echo "java -jar $PICARD/picard.jar MarkDuplicates I=${SRX}.filter.bam O=${SRX}.dupmark.bam M=${SRX}.dup.qc VALIDATION_STRINGENCY=LENIENT REMOVE_DUPLICATES=false ASSUME_SOR$
[[ ! -f $SRX.dupmark.sorted.bam ]] && echo "samtools sort ${SRX}.dupmark.bam > ${SRX}.dupmark.sorted.bam"
[[ ! -f $SRX.dupmark.sorted.bai ]] && echo "samtools index ${SRX}.dupmark.sorted.bam"
[[ ! -f $SRX.final.bam ]] && echo "samtools view -@ ${NTHREADS} -F 1804 -b ${SRX}.dupmark.bam > ${SRX}.final.bam"
[[ ! -f $SRX.final.bam.bai ]] && echo "samtools index ${SRX}.final.bam && rm ${SRX}.dupmark.bam"
done
cd /home/runjin/LFY/mapping
source ~/custom_scripts/create_ATAC_filtering.sh > ~/custom_scripts/ATAC_filtering.sh
chmod 755 ~/custom_scripts/ATAC_filtering.sh
nohup ~/custom_scripts/ATAC_filtering.sh > ../log_files/ATAC_filtering.log 2>&1 &
●
get number of mapped reads in the sam file
samtools view -c -F 4 ATAC-24hrDexA.sam
samtools view -c -F 4 ATAC-24hrDexA-Spikein.sam
●
get number of reads (singletons, proper paired & NOT in ChM or ChC) has MAPQ>=30
samtools view -c ATAC-24hrDexA.filter.bam
samtools view -c ATAC-24hrDexA-Spikein.filter.bam
●
get number of PROPER PAIRED reads that mapped, has MAPQ>=30, and no duplicates
samtools view -c -F 4 -f 3 ATAC-24hrDexA.final.bam
samtools view -c -F 4 -f 3 ATAC-24hrDexA-Spikein.final.bam
●
Get average fragment size
samtools view -f 3 -F 4 ATAC-24hrDexA.final.bam| cut -f9| awk '$0>0{x=$0+x;n++}END{print x/n}'
samtools view -f 3 -F 4 ATAC-24hrDexA-Spikein.final.bam| cut -f9| awk '$0>0{x=$0+x;n++}END{print x/n}'
●
Get mad for not sorting
○
So sort the sam files into bam files and index it
samtools sort -@ 2 ATAC-24hrDexA.sam -o ATAC-24hrDexA.bam
samtools index ATAC-24hrDexA.bam
A
B
C
D
E
F
G
1
Mapped reads to start MAPQ>=30, paired, in nucleus MAPQ>=30, paired, in nucleus, no dup. Average fragment size
2
ATAC-24hrDexA 13697950 12881394 11781412 114.184
3
ATAC-24hrDexA-Spikein 19655702 18430342 16343676 118.341
4
ATAC-24hrDexB 17071906 16053752 14521348 113.607
5
ATAC-24hrDexB-Spikein 16227378 15190078 13584286 118.149
6
ATAC-24hrMockA 15565280 14537428 13288272 109.91
7
ATAC-24hrMockA-Spikein 13670972 12995400 9837738 114.007
Table4
●
To change the script for ATAC_filter.sh
○
Go to meta file and change the SRX file
○
Source the create_ATAC_filter.sh again --> call new SRX and chmod 755
○
Start running the filtering in the mapping directory
nohup ~/custom_scripts/ATAC_filtering.sh > ../log_files/ATAC_filtering.log 2>&1 &
●
get number of mapped reads in the sam file
samtools view -c -F 4 ATAC-24hrDexB.sam
samtools view -c -F 4 ATAC-24hrDexB-Spikein.sam
samtools view -c -F 4 ATAC-24hrMockA.sam
samtools view -c -F 4 ATAC-24hrMockA-Spikein.sam
●
get number of reads (singletons, proper paired & NOT in ChM or ChC) has MAPQ>=30
samtools view -c ATAC-24hrDexB.filter.bam
samtools view -c ATAC-24hrDexB-Spikein.filter.bam
samtools view -c ATAC-24hrMockA.filter.bam
samtools view -c ATAC-24hrMockA-Spikein.filter.bam
●
get number of PROPER PAIRED reads that mapped, has MAPQ>=30, and no duplicates
samtools view -c -F 4 -f 3 ATAC-24hrDexB.final.bam
samtools view -c -F 4 -f 3 ATAC-24hrDexB-Spikein.final.bam
samtools view -c -F 4 -f 3 ATAC-24hrMockA.final.bam
samtools view -c -F 4 -f 3 ATAC-24hrMockA-Spikein.final.bam
●
Get average fragment size
samtools view -f 3 -F 4 ATAC-24hrDexB.final.bam| cut -f9| awk '$0>0{x=$0+x;n++}END{print x/n}'
samtools view -f 3 -F 4 ATAC-24hrDexB-Spikein.final.bam| cut -f9| awk '$0>0{x=$0+x;n++}END{print x/n}'
samtools view -f 3 -F 4 ATAC-24hrMockA.final.bam| cut -f9| awk '$0>0{x=$0+x;n++}END{print x/n}'
samtools view -f 3 -F 4 ATAC-24hrMockA-Spikein.final.bam| cut -f9| awk '$0>0{x=$0+x;n++}END{print x/n}'
WEDNESDAY, 10/24/2018×
●
Trimmed HC3 fastq files for MNase & mapping results
A
B
C
D
E
F
G
1
R1 R2 Bowtie parameter Reads processed Reads mapped (%mapped) Dropped non unique
2
MNase_HM1 188183040 188183040 -v 2 -m 3 47045760 25781400 (54.80%) 5.77%
3
MNase_HM2 298831780 298831780 -v 2 -m 3 74707945 38590498 (51.66%) 4.93%
4
MNase_LM1 122819740 122819740 -v 2 -m 3 30704935 16025304 (52.19%) 5.98%
5
MNase_LM2 260118668 260118668 -v 2 -m 3 65029667 30479047 (46.87%) 4.77%
Table5
●
Transferred all the trimmed_fastq files to BOX and external hard drive
●
Started mapping the final two ATAC files
●
Finish up filtring for all the mapped files
nohup bowtie --no-unal -S --chunkmbs 200 -v 2 -m 3 /home/runjin/Araport11/TAIR10_bowtie1 -1 trimmed_fastq/ATAC6hrMockA_R1.paired.fastq -2 trimmed_fastq/ATAC6hrMockA_R2.paired.fastq > mapping_BS/ATAC-6hrMockA.sam 2> log_files/mapping_BS/bowtie_ATAC-6hrMockA.log &
nohup bowtie --no-unal -S --chunkmbs 200 -v 2 -m 3 /home/runjin/Araport11/TAIR10_bowtie1 -1 trimmed_fastq/ATAC-6hrMockB_R1.paired.fastq -2 trimmed_fastq/ATAC-6hrMockB_R2.paired.fastq > mapping_BS/ATAC-6hrMockB.sam 2> log_files/mapping_BS/bowtie_ATAC-6hrMockB.log &
●
Try mapping the spiked-in samples with -v 3 -m 5 to see whether that gives different results:
nohup bowtie --no-unal -S --chunkmbs 200 -v 3 -m 5 /home/runjin/Araport11/TAIR10_bowtie1 -1 trimmed_fastq/ATAC-24hrDexA-Spikein_R1.paired.fastq -2 trimmed_fastq/ATAC-24hrDexA-Spikein_R2.paired.fastq > mapping_BS/ATAC-24hrDexA-Spikein-v3m5.sam 2> log_files/mapping_BS/bowtie_ATAC-24hrDexA-Spikein-v3m5.log &
●
Sample filtering steps:
samtools sort -@ 2 ATAC-24hrMockB.sam -o ATAC-24hrMockB.bam
samtools index ATAC-24hrMockB.bam
samtools view -@ 2 -h -f3 -F 1804 -q 30 ATAC-24hrMockB.bam Chr1 Chr2 Chr3 Chr4 Chr5 | samtools sort -o ATAC-24hrMockB.filter.bam
samtools index ATAC-24hrMockB.filter.bam
java -jar /home/runjin/Downloads/picard/build/libs/picard.jar MarkDuplicates I=ATAC-24hrMockB.filter.bam O=ATAC-24hrMockB.dupmark.bam M=ATAC-24hrMockB.dup.qc VALIDATION_STRINGENCY$
samtools sort ATAC-24hrMockB.dupmark.bam > ATAC-24hrMockB.dupmark.sorted.bam
samtools index ATAC-24hrMockB.dupmark.sorted.bam
samtools view -@ 2 -F 1804 -b ATAC-24hrMockB.dupmark.bam > ATAC-24hrMockB.final.bam
samtools index ATAC-24hrMockB.final.bam && rm ATAC-24hrMockB.dupmark.bam
●
For the filtering -- before doing anything:
module load java/1.8
THURSDAY, 10/25/2018×
●
Mapping of MNase using the -v 2 -m 3 method
cd /home/runjin/LFY
nohup bowtie --no-unal -S --chunkmbs 200 -v 2 -m 3 /home/runjin/Araport11/TAIR10_bowtie1 -1 trimmed_HC3_fastq/MNase-HM1_R1.paired.fastq -2 trimmed_HC3_fastq/MNase-HM1_R2.paired.fastq > mapping_BS/MNase-HM1.sam 2> log_files/mapping_BS/bowtie_MNase-HM1.log &
nohup bowtie --no-unal -S --chunkmbs 200 -v 2 -m 3 /home/runjin/Araport11/TAIR10_bowtie1 -1 trimmed_HC3_fastq/MNase-HM2_R1.paired.fastq -2 trimmed_HC3_fastq/MNase-HM2_R2.paired.fastq > mapping_BS/MNase-HM2.sam 2> log_files/mapping_BS/bowtie_MNase-HM2.log &
nohup bowtie --no-unal -S --chunkmbs 200 -v 2 -m 3 /home/runjin/Araport11/TAIR10_bowtie1 -1 trimmed_HC3_fastq/MNase-LM1_R1.paired.fastq -2 trimmed_HC3_fastq/MNase-LM1_R2.paired.fastq > mapping_BS/MNase-LM1.sam 2> log_files/mapping_BS/bowtie_MNase-LM1.log &
nohup bowtie --no-unal -S --chunkmbs 200 -v 2 -m 3 /home/runjin/Araport11/TAIR10_bowtie1 -1 trimmed_HC3_fastq/MNase-LM2_R1.paired.fastq -2 trimmed_HC3_fastq/MNase-LM2_R2.paired.fastq > mapping_BS/MNase-LM2.sam 2> log_files/mapping_BS/bowtie_MNase-LM2.log &
●
Get filtering results using Samtools
A
B
C
D
E
F
G
1
Mapped reads to start MAPQ>=30, paired, in nucleus MAPQ>=30, paired, in nucleus, no dup. Average fragment size
2
ATAC-24hrMockB 13937616 12990626 11954756 110.221
3
ATAC-24hrMockB-Spikein 16371882 15603984 11606516 107.863
4
ATAC-6hrDexA 22195354 21598930 18201424 114.827
5
ATAC-6hrDexB 22211318 21688746 17983100 113.603
6
ATAC-6hrMockA 17962720 17430810 15603468 114.133
7
ATAC-6hrMockB 18098282 17538954 15530098 114.039
8
MNase-HM1 51562800 51030706 42328778 160.312
9
MNase-LM1 32050608 31560990 27932500
Table6
●
get number of mapped reads in the sam file
samtools view -c -F 4 ATAC-24hrMockB.sam
samtools view -c -F 4 ATAC-24hrMockB-Spikein.sam
samtools view -c -F 4 ATAC-6hrDexA.sam
samtools view -c -F 4 ATAC-6hrDexB.sam
samtools view -c -F 4 ATAC-6hrMockA.sam
samtools view -c -F 4 ATAC-6hrMockB.sam
samtools view -c -F 4 MNase-HM1.sam
samtools view -c -F 4 MNase-LM1.sam
●
get number of reads (singletons, proper paired & NOT in ChM or ChC) has MAPQ>=30
samtools view -c ATAC-24hrMockB.filter.bam
samtools view -c ATAC-24hrMockB-Spikein.filter.bam
samtools view -c ATAC-6hrDexA.filter.bam
samtools view -c ATAC-6hrDexB.filter.bam
samtools view -c ATAC-6hrMockA.filter.bam
samtools view -c MNase-HM1.filter.bam
samtools view -c MNase-LM1.filter.bam
●
get number of PROPER PAIRED reads that mapped, has MAPQ>=30, and no duplicates
samtools view -c -F 4 -f 3 ATAC-24hrMockB.final.bam
samtools view -c -F 4 -f 3 ATAC-24hrMockB-Spikein.final.bam
samtools view -c -F 4 -f 3 ATAC-6hrDexA.final.bam
samtools view -c -F 4 -f 3 ATAC-6hrDexB.final.bam
samtools view -c -F 4 -f 3 ATAC-6hrMockA.final.bam
samtools view -c -F 4 -f 3 ATAC-6hrMockB.final.bam
samtools view -c -F 4 -f 3 MNase-HM1.final.bam
samtools view -c -F 4 -f 3 MNase-LM1.final.bam
●
Get average fragment size
samtools view -f 3 -F 4 ATAC-24hrMockB.final.bam| cut -f9| awk '$0>0{x=$0+x;n++}END{print x/n}'
samtools view -f 3 -F 4 ATAC-24hrMockB-Spikein.final.bam| cut -f9| awk '$0>0{x=$0+x;n++}END{print x/n}'
samtools view -f 3 -F 4 ATAC-6hrDexA.final.bam| cut -f9| awk '$0>0{x=$0+x;n++}END{print x/n}'
samtools view -f 3 -F 4 ATAC-6hrDexB.final.bam| cut -f9| awk '$0>0{x=$0+x;n++}END{print x/n}'
samtools view -f 3 -F 4 ATAC-6hrMockA.final.bam| cut -f9| awk '$0>0{x=$0+x;n++}END{print x/n}'
samtools view -f 3 -F 4 ATAC-6hrMockB.final.bam| cut -f9| awk '$0>0{x=$0+x;n++}END{print x/n}'
samtools view -f 3 -F 4 MNase-HM1.final.bam| cut -f9| awk '$0>0{x=$0+x;n++}END{print x/n}'
samtools view -f 3 -F 4 MNase-LM1.final.bam| cut -f9| awk '$0>0{x=$0+x;n++}END{print x/n}'
MONDAY, 10/29/2018×
●
Previous mapping of MNase using -v 2 -m 3 works better than -m 1 -- best
○
Decided to use this parameter: H2 and M2 get error messages: RERUN
●
For the HM1 and LM1, as well as the ATAC-6hrMockA.sam, ATAC-6hrMockB.sam -- run the filtering program
○
Edited the SRX.list
ATAC-6hrMockA
ATAC-6hrMockB
MNase-HM1
MNase-LM1
source ~/custom_scripts/create_ATAC_filtering.sh > ~/custom_scripts/ATAC_filtering.sh
chmod 755 ~/custom_scripts/ATAC_filtering.sh
module load java/1.8
nohup ~/custom_scripts/ATAC_filtering.sh > ../log_files/ATAC_filtering.log 2>&1 &
Steps to visualize in IGV
●
get bigwig files
○
get bigwig files for each replicate
■
example command
●
Copy sammy's custom scrip at the following directory:
cp /home/wagner-lab/sklasfeld/custom_scripts/bam2big_pairedEnd.sh /home/runjin/sammy_custom_scripts
source ~/sammy_custom_scripts/bam2big_pairedEnd.sh ATAC-6hrMockA.final.bam ~/Araport11/TAIR10_chr_count.txt .
●
Build kentUtils for the script and send PATH for it:
/home/runjin/kent/src/utils/
●
get bigGraph files
○
get bedGraph files for each replicate
■
example command
source ~/custom_scripts/bam2big_pairedEnd.sh 6hrDexA.final.bam ~/Araport11/TAIR10_chr_count.txt .
■
batch script
awk 'BEGIN{print "echo \42START\42"}{print "nohup ~/sammy_custom_scripts/bam2big_pairedEnd.sh "$0".final.bam ~/Araport11/TAIR10_chr_count.txt . &"}NR%3==0{x++;print "wait\necho \42WAIT "x"/4 COMPLETE\42"}' ../meta/SRX.list > ../scripts/visualize.sh
chmod 755 ../scripts/visualize.sh
nohup ../scripts/visualize.sh > ../log_files/visualize.log 2>&1 &
○
merge the bedgraph replicates
■
run all at once
awk 'NR%2==1{print "bedtools unionbedg -i "$0".final.bg "}NR%2==0{a=$0; sub("_rep2","",a); printf $0".final.bg > "a".unionbed\n"}' ../meta/SRX.list > ../scripts/union_bedgraph.sh
chmod 755 ../scripts/union_bedgraph.sh
nohup ../scripts/union_bedgraph.sh > ../log_files/union_bedgraph.log 2>&1 &
■
OR run one at a time
nohup bedtools unionbedg -i ATAC-6hrMockA.final.bg ATAC-6hrMockB.final.bg > ATAC-6hrMock.unionbed &
nohup bedtools unionbedg -i ATAC-6hrDexA.final.bg ATAC-6hrDexB.final.bg > ATAC-6hrDex.unionbed &
nohup bedtools unionbedg -i ATAC-24hrDexA.final.bg ATAC-24hrDexB.final.bg > ATAC-24hrDex.unionbed &
nohup bedtools unionbedg -i ATAC-24hrMockA.final.bg ATAC-24hrMockB.final.bg > ATAC-24hrMock.unionbed &
○
get average of each row in the unionbedg (scripts/avg_unionbg.sh) - and make bigwig files for that
awk 'NR%2==0{sub("_rep2","",$0); print "awk \47BEGIN{OFS=\42\\t\42} NR>1{avg=($4+$5)/2; print $1,$2,$3,avg}\47 "$0".unionbed > "$0".bedgraph\nbedGraphToBigWig "$0".bedgraph ~/Araport11/TAIR10_chr_count.txt "$0".bigwig"}' ../meta/SRX.list > ../scripts/avg_unionbg.sh
chmod 755 ../scripts/avg_unionbg.sh
nohup ../scripts/avg_unionbg.sh
#Individual samples look like this:
awk 'BEGIN{OFS="\t"; print "track name=\42 6hr DEX\42 useScore=1"} NR>1{avg=($4+$5)/2; print $1,$2,$3,avg}' 6hrDex.unionbed > 6hrDex.bedgraph
bedGraphToBigWig 6hrDex.bedgraph ~/Araport11/TAIR10_chr_count.txt 6hrDex.bigwig
awk 'BEGIN{OFS="\t"; print "track name=\426hr DEX\42 useScore=1"} NR>1{avg=($4+$5)/2; print $1,$2,$3,avg}' 6hrMock.unionbed > 6hrMock.bedgraph
bedGraphToBigWig 6hrMock.bedgraph ~/Araport11/TAIR10_chr_count.txt 6hrMock.bigwig
TUESDAY, 10/30/2018×
●
Run the filtering for LM2 and HM2
●
For the HM2 and LM2, as well as the ATAC-6hrMockA.sam, ATAC-6hrMockB.sam -- run the filtering program
○
Edited the SRX.list
○
Although it says ATAC -- but it should be good enough for MNase as well
MNase-HM2
MNase-LM2
source ~/custom_scripts/create_ATAC_filtering.sh > ~/custom_scripts/ATAC_filtering.sh
chmod 755 ~/custom_scripts/ATAC_filtering.sh
module load java/1.8
nohup ~/custom_scripts/ATAC_filtering.sh > ../log_files/ATAC_filtering.log 2>&1 &
●
get number of mapped reads in the sam file
samtools view -c -F 4 MNase-HM2.sam
samtools view -c -F 4 MNase-LM2.sam
●
get number of reads (singletons, proper paired & NOT in ChM or ChC) has MAPQ>=30
samtools view -c MNase-HM2.filter.bam
samtools view -c MNase-LM2.filter.bam
●
get number of PROPER PAIRED reads that mapped, has MAPQ>=30, and no duplicates
samtools view -c -F 4 -f 3 MNase-HM2.final.bam
samtools view -c -F 4 -f 3 MNase-LM2.final.bam
●
Get average fragment size
samtools view -f 3 -F 4 MNase-HM2.final.bam| cut -f9| awk '$0>0{x=$0+x;n++}END{print x/n}'
samtools view -f 3 -F 4 MNase-LM2.final.bam| cut -f9| awk '$0>0{x=$0+x;n++}END{print x/n}'
A
B
C
D
E
F
G
1
Mapped reads to start MAPQ>=30, paired, in nucleus MAPQ>=30, paired, in nucleus, no dup. Average fragment size
2
MNase-HM1 51562800 51030706 42328778 160.312
3