sanjusinha7/project1c-predicting-genetic-interactions

 
 


Project 1c: Predicting Genetic Interactions

For this project, you will implement and run the featurization and random forest procedure described in Yu, et al. (Cell Systems, 2016) on the S. cerevisiae (baker's yeast) data from Costanzo, et al. (Science, 2010).

Data

The input data for your algorithm consists of a matrix of genetic interaction scores for pairs of genes and a hierarchy of gene sets. The genetic interactions are stored as a square NumPy matrix, with a companion file that lists the gene names for the rows and columns (in the same order). The hierarchy is stored in a tab-separated text file, where each line lists the genes (leaves) in one set (internal node) of the hierarchy.
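To make the two file formats concrete, here is a minimal sketch of reading the hierarchy and the gene-name list and checking them against each other. It assumes that every tab-separated field on a hierarchy line is a gene name (no leading set identifier) and that the gene-name file has one name per line. The function names and the toy gene identifiers are hypothetical, and in-memory strings stand in for the real files.

```python
import csv
import io


def read_hierarchy(handle):
    """Return one frozenset of gene names per non-empty line.

    Assumption: every tab-separated field on a line is a gene name.
    """
    gene_sets = []
    for row in csv.reader(handle, delimiter="\t"):
        genes = [field.strip() for field in row if field.strip()]
        if genes:
            gene_sets.append(frozenset(genes))
    return gene_sets


def read_gene_names(handle):
    """Return gene names in file order (assumed: one name per line)."""
    return [line.strip() for line in handle if line.strip()]


def genes_missing_from_matrix(gene_sets, gene_names):
    """List hierarchy genes that have no row/column in the matrix."""
    known = set(gene_names)
    in_hierarchy = set().union(*gene_sets) if gene_sets else set()
    return sorted(in_hierarchy - known)


# Toy stand-ins for the hierarchy file and the gene-name file.
hierarchy_text = "YAL001C\tYAL002W\nYAL002W\tYAL003W\tYAL005C\n"
names_text = "YAL001C\nYAL002W\nYAL003W\n"

gene_sets = read_hierarchy(io.StringIO(hierarchy_text))
gene_names = read_gene_names(io.StringIO(names_text))
missing = genes_missing_from_matrix(gene_sets, gene_names)
print(len(gene_sets), "gene sets;", "missing from matrix:", missing)
```

A check like this is useful before featurization, because a gene that appears in the hierarchy but has no row in the interaction matrix cannot contribute any scores.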

Example data

You can find a small example dataset for your project in data/examples.

Real data

You will need to download real data for your project and process it into the same format as the example data. You will create an S. cerevisiae hierarchy from the Gene Ontology.
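One possible way to turn ontology annotations into the hierarchy format is sketched below. It assumes you have already parsed the ontology into a term-to-parents mapping and the annotations into a gene-to-terms mapping. It then propagates each gene up to every ancestor term and writes one tab-separated line per term. The term identifiers, gene names, function names, and the `min_size` filter are all hypothetical choices for illustration, not part of the assignment.

```python
from collections import defaultdict


def propagate_annotations(parents, direct_annotations):
    """Map each term to the genes annotated to it or to any descendant.

    parents: dict mapping a term to a list of its parent terms.
    direct_annotations: dict mapping a gene to its directly annotated terms.
    """

    def collect_ancestors(term, seen):
        # Depth-first walk up the hierarchy; `seen` also guards against
        # visiting a shared ancestor more than once.
        for parent in parents.get(term, ()):
            if parent not in seen:
                seen.add(parent)
                collect_ancestors(parent, seen)
        return seen

    term_to_genes = defaultdict(set)
    for gene, terms in direct_annotations.items():
        for term in terms:
            for reached in collect_ancestors(term, {term}):
                term_to_genes[reached].add(gene)
    return term_to_genes


def format_hierarchy(term_to_genes, min_size=2):
    """Return one tab-separated line of sorted gene names per kept term.

    Assumption: terms with fewer than `min_size` genes are dropped.
    """
    lines = []
    for term in sorted(term_to_genes):
        genes = sorted(term_to_genes[term])
        if len(genes) >= min_size:
            lines.append("\t".join(genes))
    return lines


# Toy ontology: two child terms under a single root (hypothetical IDs).
parents = {"T:child_a": ["T:root"], "T:child_b": ["T:root"], "T:root": []}
direct = {"GENE1": ["T:child_a"], "GENE2": ["T:child_a"], "GENE3": ["T:child_b"]}

term_to_genes = propagate_annotations(parents, direct)
hierarchy_lines = format_hierarchy(term_to_genes)
print("\n".join(hierarchy_lines))
```

Which relationship types to follow when building the parent mapping (for example, only `is_a`, or also `part_of`) is a design decision this sketch leaves open.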

For genetic interaction data, Yu, et al. used the ~3 million interactions from Costanzo, et al. (Science, 2010). Because 3 million interactions is probably too many for you to process in a short period of time, please use the data from Collins, et al. (Nature, 2007) instead. I've already preprocessed the data, and you can download it from this link.
