Skip to content

Commit

Permalink
Update README.md
Browse files Browse the repository at this point in the history
  • Loading branch information
kaitlin309 authored Oct 28, 2022
1 parent 041d04c commit 4d3d708
Showing 1 changed file with 58 additions and 1 deletion.
59 changes: 58 additions & 1 deletion README.md
Original file line number Diff line number Diff line change
@@ -1,2 +1,59 @@
# RNAcompete_classifier
This repository contains the code and data for training the RNAcompete experiment classifier used in Ray et al. 2022
This repository contains the code and data for training the RNAcompete experiment classifier used in Ray *et al.* 2022. To reproduce the results in the paper, follow the instructions below.

## File Descriptions

`RNAcompete_LR_classifier.py`: Contains the Python script that trains the classifier, tests the classifier, and applies the classifier to all ucRBP experiments.

`training_set.txt`: Tab-delimited file containing the feature matrix for training. Each line is an individual RNAcompete experiment. The first column (class) indicates whether it was a passed (1) or failed (0) experiment. Subsequent columns contain values for each of the features as described in the paper.

`testing_set.txt`: Tab-delimited file with the same format as `training_set.txt`. This file contains the 40 experiments used to test the classifier.

`ucrbp_experiment_features.txt`: Tab-delimited file with the same format as `training_set.txt` and `testing_set.txt`, EXCEPT for the 'class' column. Contains feature values for all ucRBP experiments (full-length ucRBPs and truncated constructs).

`train_IDs.txt`: Contains the RNAcompete experiment identifiers for all experiments in `training_set.txt`.

`test_IDs.txt`: Contains the RNAcompete experiment identifiers for all experiments in `testing_set.txt`.

`ucrbp_IDs.txt`: Contains the RNAcompete experiment identifiers for all experiments in `ucrbp_experiment_features.txt`. IDs in this file correspond to IDs in Supplementary Table 1.

For all `*IDs.txt` files, the IDs correspond line-by-line to the values in `*_set.txt` and `ucrbp_experiment_features.txt`. For example, to directly view the feature values for ucRBP experiments by ID, the files can be merged as follows:

`paste ucrbp_IDs.txt ucrbp_experiment_features.txt`

## Train & Apply the Classifier

Clone the repository and ensure the following requirements are installed:

- python3 version 3.8
- numpy version 1.19.2
- sklearn version 0.23.2
- skopt version 0.8.1
- pickleshare version 0.7.5

Run the following from within the directory:

`python RNAcompete_LR_classifier.py`

## Output File Descriptions

`LR_model.sav`: Contains a "pickled" version of the logistic regression model trained in `RNAcompete_LR_classifier.py`. This model can be loaded in python by running the following:
```
import pickle
lr = pickle.load(open('LR_model.sav', 'rb'))
```

`LR_coefficients.txt`: Tab-delimited file containing one line for each feature along with the weight of the feature in the model.

`test_set_AUROC.txt`: Contains the AUROC for the test set.

`test_set_probability_estimates.txt`: Contains one line per test set experiment with the probability estimate that the experiment belongs to class 1 (passed experiments).

`ucrbp_probability_estimates.txt`: Contains one line per ucRBP experiment with the probability estimate that the experiment belongs to class 1 (passed experiments).

For both `*_probability_estimates.txt` files, the probability estimates correspond line-by-line to the associated `*IDs.txt` files (and thus the files containing the features as well). For example, to directly view the probability estimates for ucRBP experiments by ID, the files can be merged as follows:

`paste ucrbp_IDs.txt ucrbp_probability_estimates.txt`



0 comments on commit 4d3d708

Please sign in to comment.