Commands used to identify gene models in regions containing identified QTL associated with phenotypic response to soil substrates
This work was performed in the following directory:
ProjDir=/home/groups/harrisonlab/project_files/strawberry_substrate_qtl
mkdir -p $ProjDir
cd $ProjDir
The v4 F. vesca assembly was downloaded along with gene models and annotations
OutDir=assembly/external/F.vesca/Hawaii4/v4
mkdir -p $OutDir
wget -P $OutDir ftp://ftp.bioinfo.wsu.edu/www.rosaceae.org/Fragaria_vesca/Fvesca-genome.v4.0.a1/assembly/Fragaria_vesca_v4.0.a1.fasta.gz
wget -P $OutDir ftp://ftp.bioinfo.wsu.edu/www.rosaceae.org/Fragaria_vesca/Fvesca-genome.v4.0.a1/genes/Fragaria_vesca_v4.0.a1_makerStandard_CDS.fasta.gz
wget -P $OutDir ftp://ftp.bioinfo.wsu.edu/www.rosaceae.org/Fragaria_vesca/Fvesca-genome.v4.0.a1/genes/Fragaria_vesca_v4.0.a1_makerStandard_proteins.fasta.gz
wget -P $OutDir ftp://ftp.bioinfo.wsu.edu/www.rosaceae.org/Fragaria_vesca/Fvesca-genome.v4.0.a1/genes/Fragaria_vesca_V4.0.a1_TE_Library.fasta.gz
wget -P $OutDir ftp://ftp.bioinfo.wsu.edu/www.rosaceae.org/Fragaria_vesca/Fvesca-genome.v4.0.a1/genes/Fragaria_vesca_v4.0.a1.transcripts.gff3.gz
wget -P $OutDir ftp://ftp.bioinfo.wsu.edu/species/Fragaria_vesca/Fvesca-genome.v4.0.a1/functional/Fragaria_vesca_v4.0.a1_IRP.xlsx
wget -P $OutDir ftp://ftp.bioinfo.wsu.edu/species/Fragaria_vesca/Fvesca-genome.v4.0.a1/functional/Fragaria_vesca_v4.0.a1_go.xlsx
wget -P $OutDir ftp://ftp.bioinfo.wsu.edu/species/Fragaria_vesca/Fvesca-genome.v4.0.a1/functional/Fragaria_vesca_v4.0.a1_KEGG_pathways.xlsx
wget -P $OutDir ftp://ftp.bioinfo.wsu.edu/species/Fragaria_vesca/Fvesca-genome.v4.0.a1/functional/Fragaria_vesca_v4.0.a1_KEGG_orthologs.xlsx
Downloaded files were unzipped:
gunzip assembly/external/F.vesca/Hawaii4/v4/*.gz
The assembly .fasta file was edited to remove lines containing only a carriage return between fasta headers and the sequence data:
Assembly=$(ls assembly/external/F.vesca/Hawaii4/v4/Fragaria_vesca_v4.0.a1.fasta)
NewFile=assembly/external/F.vesca/Hawaii4/v4/Fragaria_vesca_v4.0.a1_parsed.fasta
cat $Assembly | grep -v "^\W$" > $NewFile
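As a quick check of this filter, the sketch below applies the same `grep -v "^\W$"` pattern to a toy FASTA file (made-up sequences in a temporary directory, not the real assembly) and confirms that the number of headers is unchanged while the carriage-return-only lines are gone. It assumes GNU grep, where `\W` matches a single non-word character.

```shell
# Toy FASTA with a carriage-return-only line after each header (hypothetical data)
TmpDir=$(mktemp -d)
printf '>Fvb1\n\r\nACGT\n>Fvb2\n\r\nGGCC\n' > "$TmpDir/toy.fasta"

# Same filter as used on the assembly
grep -v "^\W$" "$TmpDir/toy.fasta" > "$TmpDir/toy_parsed.fasta"

# Headers should be preserved; stray single-character lines should be removed
HeadersBefore=$(grep -c '^>' "$TmpDir/toy.fasta")
HeadersAfter=$(grep -c '^>' "$TmpDir/toy_parsed.fasta")
StrayLines=$(grep -c "^\W$" "$TmpDir/toy_parsed.fasta" || true)
echo "headers before: $HeadersBefore, after: $HeadersAfter, stray lines left: $StrayLines"
```

Note that the pattern only matches lines holding exactly one non-word character, so completely empty lines would be left in place.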
The interproscan file was in .xlsx format with three sheets. This was opened on my local machine and each sheet saved individually as a tsv file. These files were copied back up to the cluster and concatenated.
cat assembly/external/F.vesca/Hawaii4/v4/Fragaria_vesca_v4.0.a1_IRP_sheet*.txt | sed 's/\r/\n/g' | grep -v -e 'multiple worksheets' -e "^\W*$" -e "Query.Match.Description" > assembly/external/F.vesca/Hawaii4/v4/Fragaria_vesca_v4.0.a1_IRP.tsv
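The concatenation step can be illustrated on toy data. The sketch below assumes that the exported sheets use carriage returns as line endings (which is what `sed 's/\r/\n/g'` implies) and that each sheet repeats a "Query / Match / Description" header row. The file names, gene names and annotations are hypothetical, and GNU sed is assumed for the `\r` and `\n` escapes.

```shell
# Two toy sheets with carriage-return line endings and a repeated header row
TmpDir=$(mktemp -d)
printf 'Query\tMatch\tDescription\rgeneA\tIPR_x\texample domain 1\r' > "$TmpDir/toy_IRP_sheet1.txt"
printf 'Query\tMatch\tDescription\rgeneB\tIPR_y\texample domain 2\r' > "$TmpDir/toy_IRP_sheet2.txt"

# Same pipeline as above: convert line endings, then drop headers and blank lines
cat "$TmpDir"/toy_IRP_sheet*.txt \
  | sed 's/\r/\n/g' \
  | grep -v -e 'multiple worksheets' -e "^\W*$" -e "Query.Match.Description" \
  > "$TmpDir/toy_IRP.tsv"

cat "$TmpDir/toy_IRP.tsv"
```

Only the two annotation rows should remain, one per line, with the header rows removed.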
Rob Vickerstaff (RV) mapped 31mer SNP probes to the v4 F. vesca assembly using an in-house pipeline. Probes were aligned using BWA, and a python script was then used to remove any probes that didn't match twice to the same genetic map position (maps also generated by RV). This resulted in a .csv file giving probe names and locations.
Small contigs had been removed from the v4 assembly by RV so that only the 7 chromosomal sequences were left.
The SNP location file is at:
ls /home/vicker/octoploid_mapping/vesca_v4_hybrid_map/snp_posns_vesca_v4.csv
The contig names in this file had been renamed. As such the original contig names were restored. The file was also parsed into tsv format:
SnpLocations=$(ls /home/vicker/octoploid_mapping/vesca_v4_hybrid_map/snp_posns_vesca_v4.csv)
OutDir=snp_locations/F.vesca/Hawaii4
mkdir -p $OutDir
NewFile=snp_posns_vesca_v4.tsv
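# FS and OFS are set to tab so that the rewritten lines stay tab-delimited
cat $SnpLocations | sed "s/,/\t/g" | awk 'BEGIN{FS=OFS="\t"} {$2="Fvb"$2; print}' > $OutDir/$NewFile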
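A minimal sketch of this conversion on toy data, assuming a three-column layout of probe name, chromosome number and position (the real column layout is not shown here, and the probe names are made up). In the sketch awk is given tab as both input and output separator, because by default awk rejoins a modified line with single spaces rather than tabs. `tr` is used for the comma-to-tab step so that the example does not depend on GNU sed.

```shell
# Toy probe locations in csv format (hypothetical probe names and positions)
TmpDir=$(mktemp -d)
printf 'AX-0001,1,12345\nAX-0002,7,67890\n' > "$TmpDir/toy_snps.csv"

# Convert commas to tabs and prefix the contig column with "Fvb"
tr ',' '\t' < "$TmpDir/toy_snps.csv" \
  | awk 'BEGIN{FS=OFS="\t"} {$2="Fvb"$2; print}' \
  > "$TmpDir/toy_snps.tsv"

cat "$TmpDir/toy_snps.tsv"
```

Each output line should still have three tab-separated columns, with the second column reading `Fvb1` and `Fvb7`.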
SNP locations are recorded by their leftmost position in the assembly. The probe table was converted to gff format using probes2gff.py:
ProbeTsv=$(ls snp_locations/F.vesca/Hawaii4/snp_posns_vesca_v4.tsv)
Assembly=$(ls assembly/external/F.vesca/Hawaii4/v4/Fragaria_vesca_v4.0.a1_parsed.fasta)
ProgDir=/home/armita/git_repos/emr_repos/scripts/strawberry_substrate_QTL/scripts
$ProgDir/probes2gff.py --probes $ProbeTsv --assembly $Assembly --bp 10000 > snp_locations/F.vesca/Hawaii4/snp_posns_vesca_v4.gff
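Later steps use `cut -f18` on `bedtools intersect -wao` output, which only works if the probe gff has the standard nine tab-separated columns. The sketch below is one way to check that. `check_gff_columns` is a hypothetical helper, and the toy feature line is made up rather than taken from the real probes2gff.py output.

```shell
# Hypothetical helper: count non-comment lines that do not have 9 tab-separated fields
check_gff_columns () {
  awk -F'\t' '!/^#/ && NF != 9 {bad++} END {print bad+0}' "$1"
}

# Toy gff with one well-formed feature line and one truncated line
TmpDir=$(mktemp -d)
printf 'Fvb1\tprobe\tSNP\t2345\t22345\t.\t+\t.\tID=AX-0001\n' > "$TmpDir/toy_good.gff"
printf 'Fvb1\tprobe\tSNP\t2345\n' > "$TmpDir/toy_bad.gff"

GoodCount=$(check_gff_columns "$TmpDir/toy_good.gff")
BadCount=$(check_gff_columns "$TmpDir/toy_bad.gff")
echo "malformed lines: good file $GoodCount, bad file $BadCount"

# On the real data this could be run as:
# check_gff_columns snp_locations/F.vesca/Hawaii4/snp_posns_vesca_v4.gff
```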
Significant QTL were identified by Helen Cockerton and were copied to the following location:
mkdir -p significant_qtl
# cat significant_qtl/All_Significant_QTLhkBF.csv | sed "s/,/\t/g" > significant_qtl/All_Significant_QTLhkBF.tsv
# cat significant_qtl/all_for_Andy.csv | sed "s/,/\t/g" > significant_qtl/All_Significant_QTLhkBF.tsv
# cat significant_qtl/All_Significant_QTLhkBF.tsv | cut -f5 | sed 's/\./-/g' | tail -n+2 | sed 's/"//g' | sort | uniq > significant_qtl/All_Significant_QTLhkBF_headers.txt
cat significant_qtl/all_for_and_two.csv | sed "s/,/\t/g" > significant_qtl/all_significant_revision.tsv
cat significant_qtl/all_significant_revision.tsv | cut -f6 | sed 's/\./-/g' | tail -n+2 | sed 's/"//g' | sort | uniq > significant_qtl/all_significant_revision_headers.txt
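The marker-list step can be tried on toy data. The sketch below assumes, as the command above does, that marker names sit in column 6 under a header row, are wrapped in double quotes, and use dots where the probe gff uses hyphens. The marker names and other column contents are hypothetical.

```shell
# Toy QTL table: header row plus three rows, one marker repeated
TmpDir=$(mktemp -d)
printf 'c1\tc2\tc3\tc4\tc5\tmarker\n' > "$TmpDir/toy_qtl.tsv"
printf 'a\tb\tc\td\te\t"AX.0002"\n' >> "$TmpDir/toy_qtl.tsv"
printf 'a\tb\tc\td\te\t"AX.0001"\n' >> "$TmpDir/toy_qtl.tsv"
printf 'a\tb\tc\td\te\t"AX.0002"\n' >> "$TmpDir/toy_qtl.tsv"

# Same pipeline as above: take column 6, swap dots for hyphens,
# skip the header row, strip quotes and keep unique names
cut -f6 "$TmpDir/toy_qtl.tsv" | sed 's/\./-/g' | tail -n+2 | sed 's/"//g' | sort | uniq > "$TmpDir/toy_headers.txt"

cat "$TmpDir/toy_headers.txt"
```

The resulting list holds one marker name per line, which is the form `grep -w -f` expects when it is used to pull matching features out of the probe gff.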
mkdir -p significant_qtl/F.vesca/Hawaii4
# cat snp_locations/F.vesca/Hawaii4/snp_posns_vesca_v4.gff | grep -w -f significant_qtl/All_Significant_QTLhkBF_headers.txt > significant_qtl/F.vesca/Hawaii4/snp_posns_vesca_v4.gff
cat snp_locations/F.vesca/Hawaii4/snp_posns_vesca_v4.gff | grep -w -f significant_qtl/all_significant_revision_headers.txt > significant_qtl/F.vesca/Hawaii4/snp_posns_revision_vesca_v4.gff
bedtools intersect -wao -a significant_qtl/F.vesca/Hawaii4/snp_posns_revision_vesca_v4.gff -b assembly/external/F.vesca/Hawaii4/v4/Fragaria_vesca_v4.0.a1.transcripts.gff3 | grep -w 'mRNA' > significant_qtl/F.vesca/Hawaii4/genes_in_10Kb_revision.gff
cat significant_qtl/F.vesca/Hawaii4/genes_in_10Kb_revision.gff | cut -f18 | cut -f1 -d ';' | cut -f2 -d '=' > significant_qtl/F.vesca/Hawaii4/genes_in_10Kb_revision_headers.txt
cat assembly/external/F.vesca/Hawaii4/v4/Fragaria_vesca_v4.0.a1_IRP.tsv | grep -w -f significant_qtl/F.vesca/Hawaii4/genes_in_10Kb_revision_headers.txt > significant_qtl/F.vesca/Hawaii4/genes_in_10Kb_revision_IPR.tsv
ProgDir=/home/armita/git_repos/emr_repos/scripts/strawberry_substrate_QTL/scripts
# $ProgDir/probes2iprtable.py --sig_qtl significant_qtl/All_Significant_QTLhkBF.tsv --gene_intersect significant_qtl/F.vesca/Hawaii4/genes_in_10Kb.gff --ipr significant_qtl/F.vesca/Hawaii4/genes_in_10Kb_IPR.tsv > significant_qtl/F.vesca/Hawaii4/annotated_genes_in_10Kb_IPR.tsv
$ProgDir/probes2iprtable3.py --sig_qtl significant_qtl/all_significant_revision.tsv --gene_intersect significant_qtl/F.vesca/Hawaii4/genes_in_10Kb_revision.gff --ipr significant_qtl/F.vesca/Hawaii4/genes_in_10Kb_revision_IPR.tsv > significant_qtl/F.vesca/Hawaii4/annotated_genes_in_10Kb_revision_IPR.tsv
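The gene IDs in this pipeline come from column 18 of the `bedtools intersect -wao` output. With two nine-column gff files as input, each output line holds the nine columns of the probe feature, then the nine columns of the overlapping annotation feature, then the overlap length, so column 18 is the attribute column of the overlapping mRNA. The sketch below runs the same `cut` chain on a single hand-written line; the probe name, gene ID and coordinates are illustrative only.

```shell
# One toy line in the 19-column layout produced by bedtools intersect -wao
ToyLine=$(printf 'Fvb1\tprobe\tSNP\t2345\t22345\t.\t+\t.\tID=AX-0001\tFvb1\tmaker\tmRNA\t3000\t5000\t.\t+\t.\tID=FvH4_1g00010.1;Parent=FvH4_1g00010\t2001')

# Column 18 = attributes of the mRNA; keep the first attribute and strip "ID="
GeneId=$(printf '%s\n' "$ToyLine" | cut -f18 | cut -f1 -d ';' | cut -f2 -d '=')
echo "$GeneId"
```

This relies on `ID=` being the first attribute in the annotation file; if the attribute order differed, the first `cut` on `;` would return a different attribute.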
Significant QTL for fruit morphology were identified by Helen Cockerton and were copied to the following location:
Prefix="fruit_morphology"
mkdir -p significant_qtl_${Prefix}
## from local machine:
# cat significant_qtl/All_Significant_QTLhkBF.csv | sed "s/,/\t/g" > significant_qtl/All_Significant_QTLhkBF.tsv
# scp /Users/armita/Downloads/markers_for_interproscan.csv cluster:/home/groups/harrisonlab/project_files/strawberry_substrate_qtl/significant_qtl_fruit_morphology/.
cat significant_qtl_${Prefix}/*.csv | sed "s/,/\t/g" > significant_qtl_${Prefix}/all_significant_$Prefix.tsv
# cat significant_qtl_${Prefix}/all_significant_$Prefix.tsv | cut -f2 | sed "s/\r//g" | sort | uniq > significant_qtl_${Prefix}/significant_qtl_${Prefix}_tratilist.txt
# for Trait in $(cat significant_qtl_${Prefix}/significant_qtl_${Prefix}_tratilist.txt); do
# echo $Trait
# mkdir significant_qtl_${Prefix}/$Trait
# cat significant_qtl_${Prefix}/all_significant_$Prefix.tsv | grep -w $Trait | cut -f1 | sed 's/\./-/g' | sed "s/\r//g" | sort | uniq > significant_qtl_${Prefix}/$Trait/${Trait}_sig_headers.txt
# done
# for Trait in $(cat significant_qtl_${Prefix}/significant_qtl_${Prefix}_tratilist.txt); do
# echo $Trait
# OutDir=significant_qtl_${Prefix}/$Trait
# # cat $OutDir/${Trait}_sig_headers.txt
#
# cat snp_locations/F.vesca/Hawaii4/snp_posns_vesca_v4.gff | grep -w -f $OutDir/${Trait}_sig_headers.txt > $OutDir/snp_posns_vesca_v4.gff
#
# bedtools intersect -wao -a $OutDir/snp_posns_vesca_v4.gff -b assembly/external/F.vesca/Hawaii4/v4/Fragaria_vesca_v4.0.a1.transcripts.gff3 | grep -w 'mRNA' > $OutDir/genes_in_10Kb.gff
# cat $OutDir/genes_in_10Kb.gff | cut -f18 | cut -f1 -d ';' | cut -f2 -d '=' > $OutDir/genes_in_10Kb_headers.txt
#
# cat assembly/external/F.vesca/Hawaii4/v4/Fragaria_vesca_v4.0.a1_IRP.tsv | grep -w -f $OutDir/genes_in_10Kb_headers.txt > $OutDir/genes_in_10Kb_IPR.tsv
#
# cat $OutDir/genes_in_10Kb_IPR.tsv >> significant_qtl_${Prefix}/
# done
mkdir -p significant_qtl_${Prefix}/F.vesca/Hawaii4
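The next command reads `significant_qtl_${Prefix}/all_significant_${Prefix}_headers.txt`, which is not created by any of the active commands in this section. A minimal sketch of how such a list could be built is given here, assuming that marker names are in column 1 of the tsv file (as in the commented-out per-trait loop) and need the same dot-to-hyphen conversion. `make_marker_list` is a hypothetical helper and the toy data are made up; if the real file has a header row, its first field would also end up in the list.

```shell
# Hypothetical helper: write a sorted, unique marker list from column 1 of a tsv file
make_marker_list () {
  cut -f1 "$1" | sed 's/\./-/g' | tr -d '\r"' | sort | uniq > "$2"
}

# Toy table with Windows line endings and a repeated marker
TmpDir=$(mktemp -d)
printf 'AX.0003\ttraitA\r\nAX.0001\ttraitB\r\nAX.0003\ttraitB\r\n' > "$TmpDir/toy_fruit.tsv"
make_marker_list "$TmpDir/toy_fruit.tsv" "$TmpDir/toy_fruit_headers.txt"
cat "$TmpDir/toy_fruit_headers.txt"

# On the real data this could be run as:
# make_marker_list significant_qtl_${Prefix}/all_significant_$Prefix.tsv significant_qtl_${Prefix}/all_significant_${Prefix}_headers.txt
```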
cat snp_locations/F.vesca/Hawaii4/snp_posns_vesca_v4.gff | grep -w -f significant_qtl_${Prefix}/all_significant_${Prefix}_headers.txt > significant_qtl_${Prefix}/F.vesca/Hawaii4/snp_posns_vesca_v4.gff
bedtools intersect -wao -a significant_qtl_${Prefix}/F.vesca/Hawaii4/snp_posns_vesca_v4.gff -b assembly/external/F.vesca/Hawaii4/v4/Fragaria_vesca_v4.0.a1.transcripts.gff3 | grep -w 'mRNA' > significant_qtl_${Prefix}/F.vesca/Hawaii4/genes_in_10Kb.gff
cat significant_qtl_${Prefix}/F.vesca/Hawaii4/genes_in_10Kb.gff | cut -f18 | cut -f1 -d ';' | cut -f2 -d '=' > significant_qtl_${Prefix}/F.vesca/Hawaii4/genes_in_10Kb_headers.txt
cat assembly/external/F.vesca/Hawaii4/v4/Fragaria_vesca_v4.0.a1_IRP.tsv | grep -w -f significant_qtl_${Prefix}/F.vesca/Hawaii4/genes_in_10Kb_headers.txt > significant_qtl_${Prefix}/F.vesca/Hawaii4/genes_in_10Kb_IPR.tsv
mkdir -p significant_qtl_${Prefix}/F.vesca/Hawaii4
ProgDir=/home/armita/git_repos/emr_repos/scripts/strawberry_substrate_QTL/scripts
$ProgDir/probes2iprtable2.py --sig_qtl significant_qtl_${Prefix}/all_significant_$Prefix.tsv --gene_intersect significant_qtl_${Prefix}/F.vesca/Hawaii4/genes_in_10Kb.gff --ipr significant_qtl_${Prefix}/F.vesca/Hawaii4/genes_in_10Kb_IPR.tsv > significant_qtl_${Prefix}/F.vesca/Hawaii4/annotated_genes_in_10Kb_IPR.tsv