harrisonlab/strawberry_substrate_QTL

Strawberry_substrate_QTL

Commands used to identify gene models in regions containing identified QTL associated with phenotypic response to soil substrates

This work was performed in the following directory:

ProjDir=/home/groups/harrisonlab/project_files/strawberry_substrate_qtl
mkdir -p $ProjDir
cd $ProjDir

0. Preparing genome and SNP location data

0.1 Strawberry genome

The v4 F. vesca assembly was downloaded along with its gene models and functional annotations:

  OutDir=assembly/external/F.vesca/Hawaii4/v4
  mkdir -p $OutDir
  wget -P $OutDir ftp://ftp.bioinfo.wsu.edu/www.rosaceae.org/Fragaria_vesca/Fvesca-genome.v4.0.a1/assembly/Fragaria_vesca_v4.0.a1.fasta.gz
  wget -P $OutDir ftp://ftp.bioinfo.wsu.edu/www.rosaceae.org/Fragaria_vesca/Fvesca-genome.v4.0.a1/genes/Fragaria_vesca_v4.0.a1_makerStandard_CDS.fasta.gz
  wget -P $OutDir ftp://ftp.bioinfo.wsu.edu/www.rosaceae.org/Fragaria_vesca/Fvesca-genome.v4.0.a1/genes/Fragaria_vesca_v4.0.a1_makerStandard_proteins.fasta.gz
  wget -P $OutDir ftp://ftp.bioinfo.wsu.edu/www.rosaceae.org/Fragaria_vesca/Fvesca-genome.v4.0.a1/genes/Fragaria_vesca_V4.0.a1_TE_Library.fasta.gz
  wget -P $OutDir ftp://ftp.bioinfo.wsu.edu/www.rosaceae.org/Fragaria_vesca/Fvesca-genome.v4.0.a1/genes/Fragaria_vesca_v4.0.a1.transcripts.gff3.gz
  wget -P $OutDir ftp://ftp.bioinfo.wsu.edu/species/Fragaria_vesca/Fvesca-genome.v4.0.a1/functional/Fragaria_vesca_v4.0.a1_IRP.xlsx
  wget -P $OutDir ftp://ftp.bioinfo.wsu.edu/species/Fragaria_vesca/Fvesca-genome.v4.0.a1/functional/Fragaria_vesca_v4.0.a1_go.xlsx
  wget -P $OutDir ftp://ftp.bioinfo.wsu.edu/species/Fragaria_vesca/Fvesca-genome.v4.0.a1/functional/Fragaria_vesca_v4.0.a1_KEGG_pathways.xlsx
  wget -P $OutDir ftp://ftp.bioinfo.wsu.edu/species/Fragaria_vesca/Fvesca-genome.v4.0.a1/functional/Fragaria_vesca_v4.0.a1_KEGG_orthologs.xlsx

Downloaded files were unzipped:

  gunzip assembly/external/F.vesca/Hawaii4/v4/*.gz

The assembly .fasta file was edited to remove lines containing only a carriage return between the FASTA headers and the sequence data:

  Assembly=$(ls assembly/external/F.vesca/Hawaii4/v4/Fragaria_vesca_v4.0.a1.fasta)
  NewFile=assembly/external/F.vesca/Hawaii4/v4/Fragaria_vesca_v4.0.a1_parsed.fasta
  cat $Assembly | grep -v "^\W$" > $NewFile
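To illustrate what the filter above does: in GNU grep, `\W` matches a single non-word character, so `^\W$` selects lines that consist of exactly one such character (for example a lone carriage return), and `-v` drops them. A minimal sketch on a made-up toy file; the `demo_fasta` directory and its contents are hypothetical and not part of the project:

```shell
# Toy FASTA (hypothetical data): a carriage-return-only line sits
# between the header and the sequence.
mkdir -p demo_fasta
printf '>Fvb1\n\r\nACGT\nTTGA\n' > demo_fasta/toy.fasta

# Same filter as above: drop lines made of one non-word character.
grep -v "^\W$" demo_fasta/toy.fasta > demo_fasta/toy_parsed.fasta

# Count lines before and after filtering.
wc -l < demo_fasta/toy.fasta
wc -l < demo_fasta/toy_parsed.fasta
```

Truly empty lines, and carriage returns at the end of sequence lines, are left untouched by this pattern.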

The InterProScan annotation file was in .xlsx format with three sheets. It was opened on my local machine and each sheet was saved individually as a tab-separated text file. These files were copied back up to the cluster and concatenated:

cat assembly/external/F.vesca/Hawaii4/v4/Fragaria_vesca_v4.0.a1_IRP_sheet*.txt | sed 's/\r/\n/g' | grep -v -e 'multiple worksheets' -e "^\W*$" -e "Query.Match.Description" > assembly/external/F.vesca/Hawaii4/v4/Fragaria_vesca_v4.0.a1_IRP.tsv
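The pipeline above can be tried on a small made-up sheet. This sketch assumes GNU sed and that the exported sheets use bare carriage returns as line endings, which is what the `sed 's/\r/\n/g'` step suggests. The gene ID, InterPro accession, description and `demo_ipr` directory are placeholders.

```shell
mkdir -p demo_ipr
# One header row, one data row and one empty row, separated by carriage returns.
printf 'Query\tMatch\tDescription\rFvH4_1g00010\tIPR000001\tExample domain\r\r' > demo_ipr/toy_IRP_sheet1.txt

# Convert carriage returns to newlines, then drop the worksheet notice,
# blank lines and the repeated column-header row.
cat demo_ipr/toy_IRP_sheet*.txt \
  | sed 's/\r/\n/g' \
  | grep -v -e 'multiple worksheets' -e "^\W*$" -e "Query.Match.Description" \
  > demo_ipr/toy_IRP.tsv

cat demo_ipr/toy_IRP.tsv
```

Only the data row should remain; the `.` in `Query.Match.Description` matches the tab between column names.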

0.2 SNP probe locations

Rob Vickerstaff (RV) mapped 31-mer SNP probes to the v4 F. vesca assembly using an in-house pipeline. Probes were aligned with BWA, and a Python script then removed any probes that didn't match twice to the same genetic map position (the maps were also generated by RV). This resulted in a .csv file giving probe names and locations.

Small contigs had been removed from the v4 assembly by RV so that only the 7 chromosomal sequences were left.

The SNP location file is at:

ls /home/vicker/octoploid_mapping/vesca_v4_hybrid_map/snp_posns_vesca_v4.csv

The contig names in this file had been renamed, so the original contig names were restored by prefixing them with "Fvb". The file was also parsed into tsv format:

SnpLocations=$(ls /home/vicker/octoploid_mapping/vesca_v4_hybrid_map/snp_posns_vesca_v4.csv)
OutDir=snp_locations/F.vesca/Hawaii4
mkdir -p $OutDir
NewFile=snp_posns_vesca_v4.tsv
cat $SnpLocations | sed "s/,/\t/g" | awk '$2="Fvb"$2' > $OutDir/$NewFile

SNP locations are marked by their leftmost position in the assembly.
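One detail of the awk step above: assigning to `$2` makes awk rebuild the line using its output field separator, which defaults to a single space, so the tabs introduced by sed are not retained. If tab-delimited output is needed, a variant that sets both separators could look like the sketch below. The three-column layout (marker, chromosome number, position), the marker names and the `demo_snp` directory are assumptions for illustration.

```shell
mkdir -p demo_snp
printf 'AX-0001,1,12345\nAX-0002,7,500\n' > demo_snp/toy_snp_posns.csv

# Form used above: fields are re-joined with single spaces.
cat demo_snp/toy_snp_posns.csv | sed "s/,/\t/g" | awk '$2="Fvb"$2' > demo_snp/toy_space.txt

# Variant: split and join on tabs so the output stays tab-delimited.
cat demo_snp/toy_snp_posns.csv | sed "s/,/\t/g" \
  | awk 'BEGIN{FS=OFS="\t"} {$2="Fvb"$2; print}' > demo_snp/toy_tab.tsv

head -n 1 demo_snp/toy_space.txt demo_snp/toy_tab.tsv
```

Whether the difference matters depends on how probes2gff.py splits its input lines, which is not shown here.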

0.3 Creating a gff file showing X bp around these SNP locations

  ProbeTsv=$(ls snp_locations/F.vesca/Hawaii4/snp_posns_vesca_v4.tsv)
  Assembly=$(ls assembly/external/F.vesca/Hawaii4/v4/Fragaria_vesca_v4.0.a1_parsed.fasta)
  ProgDir=/home/armita/git_repos/emr_repos/scripts/strawberry_substrate_QTL/scripts
  $ProgDir/probes2gff.py --probes $ProbeTsv --assembly $Assembly --bp 10000 > snp_locations/F.vesca/Hawaii4/snp_posns_vesca_v4.gff
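probes2gff.py is an in-house script and its source is not reproduced here. As a rough idea of the kind of conversion involved, the sketch below turns a tab-delimited probe table into 9-column GFF features spanning 10,000 bp either side of each SNP. The input layout (marker, contig, position), the source and type values and the `ID=` attribute are all assumptions. The real script is also given the assembly, presumably to bound windows by contig length; this sketch only clips windows at position 1.

```shell
mkdir -p demo_gff
printf 'AX-0001\tFvb1\t12345\nAX-0002\tFvb7\t500\n' > demo_gff/toy_snp_posns.tsv

# Write one GFF feature per probe: position +/- bp, never starting below 1.
awk -v bp=10000 'BEGIN{FS=OFS="\t"} {
  start = $3 - bp; if (start < 1) start = 1
  print $2, "sketch", "region", start, $3 + bp, ".", ".", ".", "ID=" $1
}' demo_gff/toy_snp_posns.tsv > demo_gff/toy_snp_posns.gff

cat demo_gff/toy_snp_posns.gff
```

With these toy values the first window runs from 2345 to 22345 and the second from 1 to 10500.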

1. Substrate analysis

1.1 Identifying significant QTL

Significant QTL were identified by Helen Cockerton and were copied to the following location:

mkdir -p significant_qtl
# cat significant_qtl/All_Significant_QTLhkBF.csv | sed "s/,/\t/g" > significant_qtl/All_Significant_QTLhkBF.tsv

# cat significant_qtl/all_for_Andy.csv | sed "s/,/\t/g" > significant_qtl/All_Significant_QTLhkBF.tsv
# cat significant_qtl/All_Significant_QTLhkBF.tsv | cut -f5 | sed 's/\./-/g' | tail -n+2 | sed 's/"//g' | sort | uniq > significant_qtl/All_Significant_QTLhkBF_headers.txt

cat significant_qtl/all_for_and_two.csv | sed "s/,/\t/g" > significant_qtl/all_significant_revision.tsv
cat significant_qtl/all_significant_revision.tsv | cut -f6 | sed 's/\./-/g' | tail -n+2 | sed 's/"//g' | sort | uniq > significant_qtl/all_significant_revision_headers.txt
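To make the header-extraction pipeline above concrete, here it is on a made-up table. The sketch assumes, as the commands suggest, that marker names sit in column 6, may be quoted, and use `.` where the SNP location file uses `-`. The column names, values, marker names and `demo_qtl` directory are hypothetical.

```shell
mkdir -p demo_qtl
printf 'trait,lg,pos,lod,pve,marker\nyield,1A,10,4.2,8,"AX.0002"\nyield,1A,12,3.9,7,"AX.0001"\nsize,2B,30,5.0,9,"AX.0002"\n' > demo_qtl/toy.csv

# Commas to tabs, as in the command above.
cat demo_qtl/toy.csv | sed "s/,/\t/g" > demo_qtl/toy.tsv

# Column 6, dots to dashes, skip the header row, strip quotes, de-duplicate.
cat demo_qtl/toy.tsv | cut -f6 | sed 's/\./-/g' | tail -n+2 | sed 's/"//g' | sort | uniq > demo_qtl/toy_headers.txt

cat demo_qtl/toy_headers.txt
```

Replacing every comma with a tab only works while no field contains a comma of its own.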

1.2 Identifying genes near QTL

mkdir -p significant_qtl/F.vesca/Hawaii4

# cat snp_locations/F.vesca/Hawaii4/snp_posns_vesca_v4.gff | grep -w -f significant_qtl/All_Significant_QTLhkBF_headers.txt > significant_qtl/F.vesca/Hawaii4/snp_posns_vesca_v4.gff
cat snp_locations/F.vesca/Hawaii4/snp_posns_vesca_v4.gff | grep -w -f significant_qtl/all_significant_revision_headers.txt > significant_qtl/F.vesca/Hawaii4/snp_posns_revision_vesca_v4.gff

bedtools intersect -wao -a significant_qtl/F.vesca/Hawaii4/snp_posns_revision_vesca_v4.gff -b assembly/external/F.vesca/Hawaii4/v4/Fragaria_vesca_v4.0.a1.transcripts.gff3 | grep -w 'mRNA' > significant_qtl/F.vesca/Hawaii4/genes_in_10Kb_revision.gff
cat significant_qtl/F.vesca/Hawaii4/genes_in_10Kb_revision.gff |  cut -f18 | cut -f1 -d ';' | cut -f2 -d '=' > significant_qtl/F.vesca/Hawaii4/genes_in_10Kb_revision_headers.txt

cat assembly/external/F.vesca/Hawaii4/v4/Fragaria_vesca_v4.0.a1_IRP.tsv | grep -w -f significant_qtl/F.vesca/Hawaii4/genes_in_10Kb_revision_headers.txt > significant_qtl/F.vesca/Hawaii4/genes_in_10Kb_revision_IPR.tsv
ProgDir=/home/armita/git_repos/emr_repos/scripts/strawberry_substrate_QTL/scripts
# $ProgDir/probes2iprtable.py --sig_qtl significant_qtl/All_Significant_QTLhkBF.tsv --gene_intersect significant_qtl/F.vesca/Hawaii4/genes_in_10Kb.gff --ipr significant_qtl/F.vesca/Hawaii4/genes_in_10Kb_IPR.tsv > significant_qtl/F.vesca/Hawaii4/annotated_genes_in_10Kb_IPR.tsv
$ProgDir/probes2iprtable3.py --sig_qtl significant_qtl/all_significant_revision.tsv --gene_intersect significant_qtl/F.vesca/Hawaii4/genes_in_10Kb_revision.gff --ipr significant_qtl/F.vesca/Hawaii4/genes_in_10Kb_revision_IPR.tsv > significant_qtl/F.vesca/Hawaii4/annotated_genes_in_10Kb_revision_IPR.tsv
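The `cut -f18` in the commands above relies on the layout of `bedtools intersect -wao` output: for two 9-column GFF inputs, each line holds the 9 fields of the `-a` feature, the 9 fields of the overlapping `-b` feature, and the overlap length in bp, so field 18 is the attributes column of the gene model. The sketch below applies the same extraction to two hand-written lines rather than running bedtools. All feature values, IDs and the `demo_intersect` directory are made up.

```shell
mkdir -p demo_intersect
# Hypothetical SNP window (the -a feature), reused for both rows.
A='Fvb1\tsketch\tregion\t2345\t22345\t.\t.\t.\tID=AX-0001'
printf "${A}\tFvb1\tmaker\tgene\t5000\t8000\t.\t+\t.\tID=gene1;Name=gene1\t3001\n" > demo_intersect/toy_wao.txt
printf "${A}\tFvb1\tmaker\tmRNA\t5000\t8000\t.\t+\t.\tID=gene1.t1;Parent=gene1\t3001\n" >> demo_intersect/toy_wao.txt

# Keep mRNA rows, then take the ID value from the attributes column (field 18).
cat demo_intersect/toy_wao.txt | grep -w 'mRNA' | cut -f18 | cut -f1 -d ';' | cut -f2 -d '=' > demo_intersect/toy_headers.txt

cat demo_intersect/toy_headers.txt
```

The extraction assumes that `ID=` is the first attribute of each mRNA record; if another key came first, that value would be returned instead.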

2. Fruit shape

2.1 Identifying significant QTL

Significant QTL were identified by Helen Cockerton and were copied to the following location:

Prefix="fruit_morphology"
mkdir -p significant_qtl_${Prefix}
## from local machine:
# cat significant_qtl/All_Significant_QTLhkBF.csv | sed "s/,/\t/g" > significant_qtl/All_Significant_QTLhkBF.tsv
# scp /Users/armita/Downloads/markers_for_interproscan.csv cluster:/home/groups/harrisonlab/project_files/strawberry_substrate_qtl/significant_qtl_fruit_morphology/.
cat significant_qtl_${Prefix}/*.csv | sed "s/,/\t/g" > significant_qtl_${Prefix}/all_significant_$Prefix.tsv
# cat significant_qtl_${Prefix}/all_significant_$Prefix.tsv | cut -f2 | sed "s/\r//g" | sort | uniq > significant_qtl_${Prefix}/significant_qtl_${Prefix}_tratilist.txt
# for Trait in $(cat significant_qtl_${Prefix}/significant_qtl_${Prefix}_tratilist.txt); do
#   echo $Trait
#   mkdir significant_qtl_${Prefix}/$Trait
#   cat significant_qtl_${Prefix}/all_significant_$Prefix.tsv | grep -w $Trait | cut -f1 | sed 's/\./-/g' | sed "s/\r//g" | sort | uniq > significant_qtl_${Prefix}/$Trait/${Trait}_sig_headers.txt
# done
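The commands in section 2.2 read `significant_qtl_${Prefix}/all_significant_${Prefix}_headers.txt`, which none of the uncommented commands above create. By analogy with step 1.1 and the commented-out loop, it could be produced roughly as follows. This is a sketch, not the command that was actually run: it assumes marker names are in column 1 (as in the commented-out `cut -f1`), and it runs on made-up data inside a hypothetical `demo_fruit` directory.

```shell
Prefix="fruit_morphology"
DemoDir=demo_fruit/significant_qtl_${Prefix}
mkdir -p $DemoDir
# Toy table with Windows line endings: marker name, trait.
printf 'AX.0002\tshape\r\nAX.0001\tshape\r\nAX.0002\twidth\r\n' > $DemoDir/all_significant_$Prefix.tsv

# Unique marker names, with '.' converted to '-' (presumably to match the SNP location file).
cat $DemoDir/all_significant_$Prefix.tsv | cut -f1 | sed 's/\./-/g' | sed "s/\r//g" | sort | uniq > $DemoDir/all_significant_${Prefix}_headers.txt

cat $DemoDir/all_significant_${Prefix}_headers.txt
```

If the real table has a header row, a `tail -n+2` step as in section 1.1 would be needed as well.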

2.2 Identifying genes near QTL

# for Trait in $(cat significant_qtl_${Prefix}/significant_qtl_${Prefix}_tratilist.txt); do
#   echo $Trait
#   OutDir=significant_qtl_${Prefix}/$Trait
#   # cat $OutDir/${Trait}_sig_headers.txt
#
#   cat snp_locations/F.vesca/Hawaii4/snp_posns_vesca_v4.gff | grep -w -f $OutDir/${Trait}_sig_headers.txt > $OutDir/snp_posns_vesca_v4.gff
#
#   bedtools intersect -wao -a $OutDir/snp_posns_vesca_v4.gff -b assembly/external/F.vesca/Hawaii4/v4/Fragaria_vesca_v4.0.a1.transcripts.gff3 | grep -w 'mRNA' > $OutDir/genes_in_10Kb.gff
#   cat $OutDir/genes_in_10Kb.gff | cut -f18 | cut -f1 -d ';' | cut -f2 -d '=' > $OutDir/genes_in_10Kb_headers.txt
#
#   cat assembly/external/F.vesca/Hawaii4/v4/Fragaria_vesca_v4.0.a1_IRP.tsv | grep -w -f $OutDir/genes_in_10Kb_headers.txt > $OutDir/genes_in_10Kb_IPR.tsv
#
#   cat $OutDir/genes_in_10Kb_IPR.tsv >> significant_qtl_${Prefix}/
# done

mkdir -p significant_qtl_${Prefix}/F.vesca/Hawaii4

cat snp_locations/F.vesca/Hawaii4/snp_posns_vesca_v4.gff | grep -w -f significant_qtl_${Prefix}/all_significant_${Prefix}_headers.txt > significant_qtl_${Prefix}/F.vesca/Hawaii4/snp_posns_vesca_v4.gff

bedtools intersect -wao -a significant_qtl_${Prefix}/F.vesca/Hawaii4/snp_posns_vesca_v4.gff -b assembly/external/F.vesca/Hawaii4/v4/Fragaria_vesca_v4.0.a1.transcripts.gff3 | grep -w 'mRNA' > significant_qtl_${Prefix}/F.vesca/Hawaii4/genes_in_10Kb.gff
cat significant_qtl_${Prefix}/F.vesca/Hawaii4/genes_in_10Kb.gff |  cut -f18 | cut -f1 -d ';' | cut -f2 -d '=' > significant_qtl_${Prefix}/F.vesca/Hawaii4/genes_in_10Kb_headers.txt

cat assembly/external/F.vesca/Hawaii4/v4/Fragaria_vesca_v4.0.a1_IRP.tsv | grep -w -f significant_qtl_${Prefix}/F.vesca/Hawaii4/genes_in_10Kb_headers.txt > significant_qtl_${Prefix}/F.vesca/Hawaii4/genes_in_10Kb_IPR.tsv

ProgDir=/home/armita/git_repos/emr_repos/scripts/strawberry_substrate_QTL/scripts
$ProgDir/probes2iprtable2.py --sig_qtl significant_qtl_${Prefix}/all_significant_$Prefix.tsv --gene_intersect significant_qtl_${Prefix}/F.vesca/Hawaii4/genes_in_10Kb.gff --ipr significant_qtl_${Prefix}/F.vesca/Hawaii4/genes_in_10Kb_IPR.tsv > significant_qtl_${Prefix}/F.vesca/Hawaii4/annotated_genes_in_10Kb_IPR.tsv
