Skip to content

Files

Latest commit

 

History

History
151 lines (122 loc) · 6.2 KB

README.rst

File metadata and controls

151 lines (122 loc) · 6.2 KB

GuiltyTargets Development Build Status

This is a tool for therapeutic target prioritization using network representation learning. If you find it useful in your work, please consider citing [muslu2020].

[muslu2020]Muslu, Ö., Hoyt, C. T., De Lacerda, M. P., Hofmann-Apitius, M., & Fröhlich, H. (2020). GuiltyTargets: Prioritization of Novel Therapeutic Targets with Deep Network Representation Learning. IEEE/ACM Transactions on Computational Biology and Bioinformatics.

Installation Current version on PyPI Stable Supported Python Versions MIT License

Download this repository, go to the directory it resides and run:

$ git clone https://github.com/phanein/deepwalk.git
$ cd deepwalk
$ pip install .
$ cd ..
$ # Install GAT2VEC, which depends on DeepWalk
$ git clone https://github.com/ozlemmuslu/GAT2VEC.git gat2vec
$ cd gat2vec
$ pip install .
$ cd ..
$ # Actually install GuiltyTargets
$ git clone https://github.com/guiltytargets/guiltytargets.git
$ cd guiltytargets
$ pip install -e .

NOTE: Please see the issue #6 if you cannot install opentargets. We currently do not have a solution for this

Usage

After that, you can use it as a library in Python

import guiltytargets

guiltytargets.run(
    input_directory,
    targets_path,
    ppi_graph_path,
    dge_path,
    auc_output_path,
    probs_output_path,
    max_adj_p=max_padj,
    max_log2_fold_change=lfc_cutoff * -1,
    min_log2_fold_change=lfc_cutoff,
    entrez_id_header=entrez_id_name,
    log2_fold_change_header=log_fold_change_name,
    adj_p_header=adjusted_p_value_name,
    base_mean_header=base_mean_name,
    entrez_delimiter=split_char,
    ppi_edge_min_confidence=confidence_cutoff,
 )

This will create files in paths auc_output_path and probs_output_path, where the former shows the AUC values of cross validation and the latter shows the predicted targets.

The parameters are explained below. A use case can be found under https://github.com/GuiltyTargets/reproduction

INPUT FILES

There are 3 files which are necessary to run this program. All input files should be found under input_directory

  1. ppi_graph_path: A path to a file containing a protein-protein interaction network in the format of:

    source_entrez_id

    target_entrez_id

    confidence

    216

    216

    0.76

    3679

    1134

    0.73

    55607

    71

    0.65

    5552

    960

    0.63

    2886

    2064

    0.90

    5058

    2064

    0.73

    1742

    2064

    0.87

    An example of such a network can be found under http://cbdm-01.zdv.uni-mainz.de/~mschaefer/hippie/download.php

  2. dge_path: A path to a file containing an experiment, in tsv format. Rows show individual entries, columns are the values of the following properties:

    • Base mean
    • Log fold change
    • Adjusted p value
    • Entrez id
The file may contain other columns too, but the indices and names of the above columns must be entered to the configuration file.
  1. targets_path: A path to a file containing a list of Entrez ids of known targets, in the format of

    ... code-block:: sh

    1742 3996 150 152 151

OPTIONS

The options that should be set are:

  • max_adj_p: Maximum value for adjusted p-value for a gene to be considered differentially expressed.
  • max_log2_fold_change: Maximum value for log2 fold change for a gene to be considered differentially expressed
  • min_log2_fold_change: Minimum value for log2 fold change for a gene to be considered differentially expressed
  • ppi_edge_min_confidence: Minimum confidence score for the edges in PPI network.
  • entrez_id_header: The column name for the Entrez id in the differential expression file.
  • log2_fold_change_header: The column name for the log2 fold change in the differential expression file.
  • adj_p_header: The column name for the adjusted p-value in the differential expression file.
  • base_mean_header: The column name for the base mean in the differential expression file.
  • entrez_delimiter: If there is more than one Entrez id per row in the diff. expr. file, the separator betweem them.

OUTPUTS

  • *_gat2vec.emb: Embedding file
  • *_na.adjlist: Adjacency list of undirected bipartite graph
  • *_graph.adjlist: Adjacency list of the structural graph (PPI)
  • labels_maped.txt: Labels (drug target/not for the disease)
  • probs.tsv: Probabilities assigned by the classifier whether the Entrez gene is a possible target(class 1) or not (class 0)
  • auc.tsv: The results of the cross validation. The targets are ranked based on the class 1 probabilities