Project 1c: Predicting Genetic Interactions

For this project, you will implement and run the featurization and random forest procedure described in Yu, et al. (Cell Systems, 2016) on the S. cerevisiae (baker's yeast) data from Costanzo, et al. (Science, 2010).
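
The featurization step can be sketched as follows. This assumes the term-count ("ontotype") scheme, in which each set in the hierarchy becomes one feature whose value is the number of the pair's two genes that belong to that set (0, 1, or 2). Check Yu, et al. for the exact definition before relying on it. The function name `ontotype_features` and the toy genes `g1` to `g3` are hypothetical.

```python
import numpy as np


def ontotype_features(pairs, gene_sets):
    """Hypothetical helper: one feature per gene set (hierarchy term).

    pairs: list of (gene_a, gene_b) tuples.
    gene_sets: list of sets of gene names, one set per term.
    Returns an (n_pairs, n_terms) int array where entry [i, j] counts how
    many of the two genes in pair i belong to term j.
    """
    features = np.zeros((len(pairs), len(gene_sets)), dtype=np.int8)
    for i, (a, b) in enumerate(pairs):
        for j, genes in enumerate(gene_sets):
            # Booleans add as integers: True + True == 2.
            features[i, j] = (a in genes) + (b in genes)
    return features


# Toy example with made-up genes and terms.
gene_sets = [{"g1", "g2"}, {"g2", "g3"}, {"g1", "g2", "g3"}]
pairs = [("g1", "g2"), ("g1", "g3")]
X = ontotype_features(pairs, gene_sets)
# X.tolist() is [[2, 1, 2], [1, 1, 2]]
```

The rows of `X`, paired with the corresponding interaction scores as targets, could then be passed to a random forest regressor such as scikit-learn's `RandomForestRegressor`. The double loop costs O(pairs × terms). On the full hierarchy, a gene-by-term 0/1 membership matrix `M` gives the same features as `M[a] + M[b]` and is much faster.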

Data

The input data for your algorithm consists of a matrix of genetic interaction scores for pairs of genes and a hierarchy of gene sets. The genetic interactions are stored as a square NumPy matrix, with a corresponding file that lists the gene names for the rows/columns in order. The hierarchy is stored in a tab-separated text file, where each line lists the genes (leaves) in one set (internal node) of the hierarchy.
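
A minimal loader for these three files might look like the sketch below. Several details are assumptions, because the README does not pin them down. The matrix is assumed to be saved with `np.save` (`.npy`). The gene file is assumed to hold one name per line. Every tab-separated field of a hierarchy line is assumed to be a gene, with no leading set name. Unmeasured pairs are assumed to be `NaN`. All function names are hypothetical. Adjust these details to match the files in data/examples.

```python
import numpy as np


def load_interactions(matrix_path, genes_path):
    """Load the square score matrix and its row/column gene labels.

    Assumes a .npy matrix and a text file with one gene name per line.
    """
    scores = np.load(matrix_path)
    with open(genes_path) as fh:
        genes = [line.strip() for line in fh if line.strip()]
    if scores.shape != (len(genes), len(genes)):
        raise ValueError("matrix shape does not match the gene list")
    return scores, genes


def load_hierarchy(path):
    """Read one gene set per line; each non-empty tab-separated field is a gene."""
    with open(path) as fh:
        return [
            {g for g in line.rstrip("\n").split("\t") if g}
            for line in fh
            if line.strip()
        ]


def labeled_pairs(scores, genes):
    """Yield (gene_a, gene_b, score) for each measured pair.

    Only the upper triangle is read, so each unordered pair appears once.
    NaN entries are treated as unmeasured and skipped (an assumption).
    """
    rows, cols = np.triu_indices(len(genes), k=1)
    for i, j in zip(rows, cols):
        if not np.isnan(scores[i, j]):
            yield genes[i], genes[j], float(scores[i, j])
```

Reading only the upper triangle assumes the matrix is symmetric. If it is not, you need to decide how to combine the two orientations of each pair.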

Example data

You can find a small example dataset for your project in data/examples.

Real data

You will need to download real data for your project and process it into the same format as the example data. You will create an S. cerevisiae hierarchy from the Gene Ontology.
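
One way to turn Gene Ontology annotations into gene sets is to propagate each gene up the ontology graph, so that a term's set contains every gene annotated to it or to any of its descendant terms. The sketch below assumes you have already parsed the ontology into a `parents` dict (term ID to list of parent term IDs) and the yeast annotations into an `annotations` dict (gene to set of directly annotated term IDs). Parsing the OBO and annotation files is not shown. The function names and toy term IDs `T1` to `T3` are hypothetical. Whether to filter terms by size or merge terms with identical gene sets is left open.

```python
from collections import defaultdict


def ancestors(term, parents, cache):
    """Return the term plus every term reachable through parent edges (memoized)."""
    if term not in cache:
        result = {term}
        for parent in parents.get(term, ()):
            result |= ancestors(parent, parents, cache)
        cache[term] = result
    return cache[term]


def propagate(annotations, parents):
    """Map each term to all genes annotated to it or to any descendant term."""
    cache = {}
    term_genes = defaultdict(set)
    for gene, terms in annotations.items():
        for term in terms:
            for anc in ancestors(term, parents, cache):
                term_genes[anc].add(gene)
    return dict(term_genes)


def write_hierarchy(term_genes, path):
    """Write one tab-separated gene set per line, in sorted term order."""
    with open(path, "w") as fh:
        for term in sorted(term_genes):
            fh.write("\t".join(sorted(term_genes[term])) + "\n")


# Toy ontology: T3 is a child of T2, which is a child of T1.
parents = {"T3": ["T2"], "T2": ["T1"]}
annotations = {"g1": {"T3"}, "g2": {"T2"}}
term_genes = propagate(annotations, parents)
# term_genes == {"T3": {"g1"}, "T2": {"g1", "g2"}, "T1": {"g1", "g2"}}
```

Memoizing `ancestors` means each term's ancestor set is computed only once, even though many genes share the same annotated terms.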

For genetic interaction data, Yu, et al. used the ~3 million interactions from Costanzo, et al. (Science, 2010). Because 3 million interactions are probably too many to process in a short period of time, please use the data from Collins, et al. (Nature, 2007) instead. I've already preprocessed the data, and you can download it from this link.
