812-Final-Assignment

NCBI gene IDs for 100 nifH genes were gathered in nifh_accessions.csv.
gene_IDs.R isolated the gene IDs in integer form.
The gene IDs were then copied and pasted into the loop in gene_id_parser.sh.
gene_id_parser.sh retrieved the gene sequence for each gene ID and wrote them all to a FASTA file called nifh_sequences1.fasta. For the pipeline, see: sequence_generator.sh
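
The retrieval step can be sketched roughly as follows. This is a minimal illustration and not the contents of gene_id_parser.sh: the function names `efetch_url` and `fetch_all`, the `EFETCH_DB` variable, and the choice of the `nuccore` database are all assumptions made for the example, and the IDs are read from a file instead of being pasted into the loop.

```shell
#!/usr/bin/env bash
# Hypothetical sketch: fetch one FASTA record per ID from NCBI E-utilities.

# Build an efetch URL for a single ID.
# Assumption: the IDs resolve in the database named by EFETCH_DB (default: nuccore).
efetch_url() {
  printf 'https://eutils.ncbi.nlm.nih.gov/entrez/eutils/efetch.fcgi?db=%s&id=%s&rettype=fasta&retmode=text' \
    "${EFETCH_DB:-nuccore}" "$1"
}

# Fetch every ID listed in file $1 (one per line) and write the records to file $2.
fetch_all() {
  local ids_file=$1 out=$2 id
  : > "$out"                              # start with an empty output file
  while IFS= read -r id || [ -n "$id" ]; do
    [ -z "$id" ] && continue              # skip blank lines
    curl -fsS "$(efetch_url "$id")" >> "$out"
    sleep 1                               # stay well under NCBI's request rate limit
  done < "$ids_file"
}

# Example call (requires network access; ids.txt is a hypothetical file name):
# fetch_all ids.txt nifh_sequences1.fasta
```

Reading the IDs from a file avoids editing the script each time the ID list changes.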

msa.R took nifh_sequences1.fasta as input and produced the multiple sequence alignment out.fasta.
The MView program was used to produce the visualizations in figure1.html and figure2.html.

Take the out.fasta file and filter out sequences with >60% identity to create DMinput.fasta.
Use DMinput.fasta to calculate the distance matrix and construct a phylogenetic tree using distance_matrix_and_tree.r.
Running distance_matrix_and_tree.r produces a folder with four PDFs: DistMat(indel).pdf, DistMat(JC69&K80&K81&TN93).pdf, and DistMat(TS&TV).pdf show the distance matrices, and Phylogeny.pdf shows the phylogenetic tree.
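
To make the identity threshold and the distance calculation concrete, here is a small sketch that computes percent identity and the JC69 distance, d = -(3/4) ln(1 - 4p/3), for one pair of aligned sequences, where p is the proportion of differing sites. It is an illustration only and not what distance_matrix_and_tree.r does: the function name `pairwise_stats` is hypothetical, columns containing a gap are simply skipped, and only the JC69 model is shown.

```shell
#!/usr/bin/env bash
# Hypothetical sketch: percent identity and JC69 distance for two aligned,
# equal-length sequences. Columns where either sequence has a gap ("-") are skipped.
pairwise_stats() {
  awk -v a="$1" -v b="$2" 'BEGIN {
    a = toupper(a); b = toupper(b)
    n = 0; diff = 0
    for (i = 1; i <= length(a); i++) {
      x = substr(a, i, 1); y = substr(b, i, 1)
      if (x == "-" || y == "-") continue   # ignore gapped columns
      n++
      if (x != y) diff++
    }
    if (n == 0) { print "NA NA"; exit }    # no comparable columns
    p = diff / n                           # proportion of differing sites
    # JC69: d = -(3/4) ln(1 - 4p/3), written as (3/4) ln(1 / (1 - 4p/3));
    # the distance is undefined once p reaches 0.75.
    if (p < 0.75) d = sprintf("%.4f", 0.75 * log(1 / (1 - 4 * p / 3)))
    else d = "Inf"
    printf "%.1f %s\n", 100 * (1 - p), d   # prints: <percent identity> <JC69 distance>
  }'
}

pairwise_stats "ACGTACGT" "ACGTACGA"   # prints: 87.5 0.1367
```

A pair like the one above, at 87.5% identity, would exceed the 60% threshold used when creating DMinput.fasta.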

