- Load the DNA sequence `fishes.fna.gz` using functions from the `seqinr` package and the `Biostrings` package. Note the differences between the created variables.
- Next, focus on the `Biostrings` package. Practice working with the loaded data:
  - Check the number of loaded sequences: `length(seq)`
  - Determine the length of each sequence: `width(seq)` (use `width(seq[1])` for the first sequence only)
  - View the sequence names (FASTA headers): `names(seq)`
  - Assign the first sequence, including its name, to the variable `seq1`: `seq1 <- seq[1]`
  - Assign the first sequence, without its name, to the variable `seq1_sequence`: `seq1_sequence <- seq[[1]]`
  - Assign the first sequence as a character string to the variable `seq1_string`: `seq1_string <- toString(seq[1])`
  - Learn more about the `XStringSet` class and the `Biostrings` package: `help(XStringSet)`
- Globally align the two selected sequences using the BLOSUM62 matrix, a gap opening cost of -1 and a gap extension cost of 1.
- Practice working with regular expressions:
  - Create a list of names, e.g.: `names_list <- c("anna", "jana", "kamil", "norbert", "pavel", "petr", "stanislav", "zuzana")`
  - Search for the name `jana`: `grep("jana", names_list, perl = TRUE)`
  - Search for all names containing the letter `n` at least once: `grep("n+", names_list, perl = TRUE)`
  - Search for all names containing the letters `nn`: `grep("n{2}", names_list, perl = TRUE)`
  - Search for all names starting with `n`: `grep("^n", names_list, perl = TRUE)`
  - Search for the names `Anna` or `Jana`, ignoring case because the list is lowercase: `grep("Anna|Jana", names_list, perl = TRUE, ignore.case = TRUE)`
  - Search for names starting with `z` and ending with `a`: `grep("^z.*a$", names_list, perl = TRUE)`
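
The regular-expression searches above can also be combined with the `value` argument and with `grepl()`. The following is a minimal base-R sketch using the same `names_list`; the result variable names (`idx_n_start`, `with_nn`, `z_to_a`, `anna_jana`) are made up for illustration.

```r
# Toy vector copied from the exercise.
names_list <- c("anna", "jana", "kamil", "norbert", "pavel", "petr", "stanislav", "zuzana")

# By default grep() returns the positions of the matching elements.
idx_n_start <- grep("^n", names_list, perl = TRUE)

# value = TRUE returns the matching strings instead of their positions.
with_nn <- grep("n{2}", names_list, perl = TRUE, value = TRUE)

# grepl() returns a logical vector, which is convenient for subsetting.
z_to_a <- names_list[grepl("^z.*a$", names_list, perl = TRUE)]

# ignore.case = TRUE lets a capitalised pattern match the lowercase data.
anna_jana <- grep("Anna|Jana", names_list, perl = TRUE,
                  ignore.case = TRUE, value = TRUE)

print(idx_n_start)
print(with_nn)
print(z_to_a)
print(anna_jana)
```

Returning values rather than indices is often easier to read while experimenting; the indices are more useful when the same positions are needed to subset another object, such as a set of sequences.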
- Load an amplicon sequencing run from a 454 Junior machine, `fishes.fna.gz`.
- Get the sequences of the sample that is tagged by the forward and reverse MID `ACGAGTGCGT` (avoid conditional statements).
- How many sequences are there in the sample?
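
One way to select a sample without conditional statements is logical indexing. The sketch below uses base R on a few invented toy reads rather than the real `fishes.fna.gz` data, and it assumes that the reverse MID appears at the 3' end of a read as the reverse complement of the listed MID; that assumption should be checked against the actual run. The helper `rev_comp()` and the read names are hypothetical.

```r
# Hypothetical toy reads standing in for the sequences in fishes.fna.gz.
reads <- c(
  r1 = "ACGAGTGCGTTTGACCAACGCACTCGT",   # forward MID + insert + reverse MID
  r2 = "ACGAGTGCGTGGGTTTACGCACTCGT",    # forward MID + insert + reverse MID
  r3 = "ACGCTCGACATTGACCATGTCGAGCGT",   # tagged with a different MID
  r4 = "ACGAGTGCGTTTGACCA"              # forward MID only
)
mid <- "ACGAGTGCGT"

# Reverse complement of a single string in base R.
# Assumption: the reverse MID is read on the opposite strand.
rev_comp <- function(x) {
  paste(rev(strsplit(chartr("ACGT", "TGCA", x), "")[[1]]), collapse = "")
}
mid_rc <- rev_comp(mid)

# Logical indexing replaces if/else: keep the reads that start with the MID
# and end with its reverse complement.
in_sample <- startsWith(reads, mid) & endsWith(reads, mid_rc)
sample_reads <- reads[in_sample]

# Number of sequences in the sample (equivalent to sum(in_sample)).
n_sample <- length(sample_reads)
print(n_sample)
```

With `Biostrings`, the same logical vector could be used to subset a `DNAStringSet` directly, and `reverseComplement()` would replace the hand-written helper.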
- Create a function `Demultiplexer()` for demultiplexing of sequencing data.
- Input:
  - a string with the path to a FASTA file
  - a list of forward MIDs
  - a list of reverse MIDs
  - a list of sample labels
- Output:
  - FASTA files that are named after the samples and contain the sequences of each sample without MIDs (perform MID trimming)
  - a table named `report.txt` containing the sample names and the number of sequences each sample has
- Check the functionality again on the `fishes.fna.gz` file; the list of samples and MIDs can be found in the corresponding table `fishes_MIDs.csv`.
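
The input and output described above could be wired together roughly as follows. This is only a sketch under several assumptions, not a reference solution: it uses base R instead of `Biostrings` so that it runs without Bioconductor, it matches MIDs exactly, it assumes the reverse MID sits at the 3' end as a reverse complement, and it writes files with a `.fasta` extension. The helpers `rev_comp()` and `read_fasta()` and the `out_dir` argument are hypothetical additions.

```r
# Reverse complement of each string in a character vector.
rev_comp <- function(x) {
  vapply(strsplit(chartr("ACGT", "TGCA", x), ""),
         function(ch) paste(rev(ch), collapse = ""), character(1))
}

# Minimal FASTA reader; readLines() also opens gzip-compressed files.
# Assumes every header is followed by at least one sequence line.
read_fasta <- function(path) {
  lines <- readLines(path)
  lines <- lines[nzchar(lines)]
  is_header <- startsWith(lines, ">")
  record <- cumsum(is_header)                 # record index of every line
  seqs <- tapply(lines[!is_header], record[!is_header], paste, collapse = "")
  setNames(as.character(seqs), sub("^>", "", lines[is_header]))
}

Demultiplexer <- function(fasta_path, forward_mids, reverse_mids,
                          sample_labels, out_dir = ".") {
  reads <- read_fasta(fasta_path)
  reverse_rc <- rev_comp(reverse_mids)
  counts <- integer(length(sample_labels))
  for (i in seq_along(sample_labels)) {
    n_fwd <- nchar(forward_mids[i])
    n_rev <- nchar(reverse_rc[i])
    # A read belongs to sample i if it carries both MIDs without overlap.
    hit <- startsWith(reads, forward_mids[i]) &
      endsWith(reads, reverse_rc[i]) &
      nchar(reads) >= n_fwd + n_rev
    # MID trimming: drop the forward MID at the start and the reverse MID at the end.
    trimmed <- substr(reads[hit], n_fwd + 1, nchar(reads[hit]) - n_rev)
    # One "header newline sequence" string per read; empty for samples with no reads.
    fasta_lines <- sprintf(">%s\n%s", names(trimmed), trimmed)
    writeLines(fasta_lines, file.path(out_dir, paste0(sample_labels[i], ".fasta")))
    counts[i] <- sum(hit)
  }
  report <- data.frame(sample = sample_labels, sequences = counts)
  write.table(report, file.path(out_dir, "report.txt"),
              sep = "\t", quote = FALSE, row.names = FALSE)
  invisible(report)
}
```

For the real data, the MIDs and labels would be read from `fishes_MIDs.csv` first (for example with `read.csv()`); its column names are not given here, so the call has to be adapted to the actual table. In a `Biostrings`-based version, `readDNAStringSet()` and `writeXStringSet()` would take over the file handling.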
Basic Git settings
- Configure the Git editor: `git config --global core.editor notepad`
- Configure your name and email address: `git config --global user.name "Zuzana Nova"` and `git config --global user.email [email protected]`
- Check current settings: `git config --global --list`
- Create a fork on your GitHub account. On the GitHub page of this repository, find the Fork button in the upper right corner.
- Clone the forked repository from your GitHub page to your computer: `git clone <fork repository address>`
- In the local repository, set a new remote for the project repository: `git remote add upstream https://github.com/mpa-prg/exercise_02.git`
Create a new commit and send new changes to your remote repository.
- Add a file to a new commit: `git add <file_name>`
- Create a new commit, enter the commit message, save the file and close it: `git commit`
- Send the new commit to your GitHub repository: `git push origin main`