---
title: "scanMiR: a biochemically-based toolkit for versatile and efficient microRNA target prediction"
subtitle: "Supplementary Methods"
date: "`r format(Sys.time(), '%d %B, %Y')`"
author:
- "Michael Soutschek"
- "Fridolin Gross"
- "Gerhard Schratt"
- "Pierre-Luc Germain"
output: pdf_document
---
# Aggregation into predicted transcript repression
The aggregation of multiple binding sites into predicted transcript repression
is done mostly according to the model by
[McGeary, Lin et al. (2019)](https://dx.doi.org/10.1126/science.aav1741).
Briefly, multiple miRNA binding sites are assumed to have an additive effect on
the transcript's decay rate that is proportional to their occupancy. The
occupancy of AGO-bound miRNA $g$ on an mRNA $m$ with $p$ binding sites in the
open reading frame (ORF) and $q$ in the 3' untranslated region (UTR) is given
by the following equation:
$$
N_{m,g} =
\sum_{i=1}^{p}\left(\frac{a_g}{a_g + c_{\text{ORF}}
K_{d,i}^{\text{ORF}}}\right) +
\sum_{j=1}^{q}\left(\frac{a_g}{a_g + K_{d,j}^{\text{3'UTR}}}\right)
$$
where $a_g$ is the relative concentration of unbound AGO-miRNA complexes for
miRNA $g$, and $c_{\text{ORF}}$ is the penalty factor for sites that are found
within the ORF. scanMiR also includes a coefficient $e$ accounting for the
effect of the 3' alignment:
$$
N_{m,g} =
\sum_{i=1}^{p}\left(\frac{a_g}{a_g + e_{i}c_{\text{ORF}}
K_{d,i}^{\text{ORF}}}\right) +
\sum_{j=1}^{q}\left(\frac{a_g}{a_g + e_{j}K_{d,j}^{\text{3'UTR}}}\right)
$$
where $e$ is the exponential of the product of the 3' alignment score and a
global parameter ($p3$). The 3' alignment score roughly corresponds to the
number of matched nucleotides (additionally penalizing gaps), and it is by
default set to 0 if the number of matches is below 3 (adapted from the
observations in Grimson et al., 2007) and capped to a maximum of 8 in order to
account for sites with higher scores possibly leading to target-directed miRNA
degradation (TDMD).
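The occupancy calculation above can be sketched as follows. This is a minimal illustration in Python, not the scanMiR implementation (which is written in R). The function names, the `(kd, in_orf, score_3p)` site representation and the parameter values in the usage note are assumptions, and the rule "set to 0 below 3 matches" is simplified here by applying the threshold to the score itself.

```python
import math

def alignment_coefficient(score_3p, p3, min_score=3, max_score=8):
    """Return e = exp(p3 * score), with the 3' alignment score
    zeroed below min_score and capped at max_score."""
    if score_3p < min_score:
        score_3p = 0
    score_3p = min(score_3p, max_score)
    return math.exp(p3 * score_3p)

def occupancy(sites, a_g, c_orf, p3):
    """Sum the occupancy N_{m,g} over all sites of one transcript.

    Each site is a tuple (kd, in_orf, score_3p); ORF sites have their
    dissociation constant multiplied by the penalty factor c_orf.
    """
    total = 0.0
    for kd, in_orf, score_3p in sites:
        e = alignment_coefficient(score_3p, p3)
        penalty = c_orf if in_orf else 1.0
        total += a_g / (a_g + e * penalty * kd)
    return total
```

For instance, `occupancy([(0.01, False, 0), (0.01, True, 0)], a_g=0.01, c_orf=1.0, p3=0.0)` sums two half-occupied sites. The values are purely illustrative and are not the fitted parameters.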
The repression by miRNA $g$ can then be understood as the ratio between its
occupancy and a background occupancy term, in which the dissociation constants
are set to that of nonspecifically bound sites (i.e. $K_d = 1.0$). More
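specifically, McGeary et al. (2019) model the repression as:
$$
\text{repression} = \log\left(\frac{1 + b\cdot N_{m,g,\text{background}}}
{1 + b\cdot N_{m,g}}\right)
$$
with the background occupancy given by:
$$
N_{m,g,\text{background}} =
\sum_{i=1}^{p}\left(\frac{a_g}{a_g + c_{\text{ORF}}}\right) +
\sum_{j=1}^{q}\left(\frac{a_g}{a_g + 1}\right)
$$
where $b$ can be interpreted as the additional repression caused by a single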
bound AGO. Because UTR and ORF lengths have been reported to influence the
efficacy of repression (Agarwal et al., 2015; Hausser et al., 2009), scanMiR
allows for an additional term to take these effects into account:
$$
\text{repression}_{\text{adj}} = \text{repression}\cdot
(1+f\cdot\text{UTR.length}+h\cdot\text{ORF.length})
$$
$\text{UTR.length}$ and $\text{ORF.length}$ are linearly normalized so that 0 and 1 are
respectively the 5\% and 95\% quantiles of the distribution of lengths (see
Garcia et al. 2011; Agarwal et al. 2015). This adjustment overall leads to
slightly improved correlations with observed repression (Suppl. Fig. 7B-C).
Given that the effect is small except for some extreme transcripts, these
parameters are however set to 0 by default.
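As a rough sketch of how these pieces fit together, the Python snippet below computes the background occupancy, the repression and the length-adjusted repression. It assumes the log-ratio form of McGeary et al. (2019), $\log\left((1 + b N_{\text{background}})/(1 + b N)\right)$, and a linear-interpolation quantile; all function names are hypothetical and this is not the scanMiR code.

```python
import math

def background_occupancy(n_orf_sites, n_utr_sites, a_g, c_orf):
    """Occupancy when every site has the nonspecific K_d of 1."""
    return (n_orf_sites * a_g / (a_g + c_orf)
            + n_utr_sites * a_g / (a_g + 1.0))

def repression(n, n_bg, b):
    """Log-ratio repression (assumed form); negative when n > n_bg."""
    return math.log((1.0 + b * n_bg) / (1.0 + b * n))

def quantile(values, q):
    """Quantile by linear interpolation between order statistics."""
    s = sorted(values)
    pos = q * (len(s) - 1)
    lo = math.floor(pos)
    hi = min(lo + 1, len(s) - 1)
    return s[lo] + (pos - lo) * (s[hi] - s[lo])

def normalize_length(length, all_lengths):
    """Map the 5% quantile of lengths to 0 and the 95% quantile to 1."""
    q05 = quantile(all_lengths, 0.05)
    q95 = quantile(all_lengths, 0.95)
    return (length - q05) / (q95 - q05)

def adjusted_repression(rep, utr_norm, orf_norm, f=0.0, h=0.0):
    """Apply the length adjustment; f = h = 0 leaves rep unchanged."""
    return rep * (1.0 + f * utr_norm + h * orf_norm)
```

With the default `f = h = 0`, `adjusted_repression` returns the unadjusted value, mirroring the default described above.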
While $b$, $c$, $p3$, $f$ and $h$ are considered global parameters (i.e. the
same for different miRNAs and transcripts and also across experimental
contexts), $a$ is expected to be different for each miRNA in a given
experimental condition. Values for parameters $b$ (= 1.77) and $c$ (= -1.71)
were therefore obtained from the Biochemical Plus model of McGeary et al.
(2019), which was optimized with the experimentally determined RBNS data.
Parameters $p3$, $f$ and $h$ were globally fitted to maximize the correlation
with miRNA transfection experiments from McGeary et al. (2019).
As shown by McGeary et al. (2019), the performance of the biochemical model is
robust to changes in parameter $a$ of several orders of magnitude (see also
Suppl. Fig. 3A-B). Furthermore, repression predictions obtained with the
globally optimized $a$ value still outperform TargetScan substantially
(see Fig. 3 & Suppl. Fig. 2). Therefore, scanMiR provides reasonable default
parameters, although users can easily provide their own parameter values.
# Comparison of predicted and observed repression
For HeLa and HEK cell lines, we used the transcriptome reconstructions and
quantifications provided by the authors in the GEO series (respectively in
series GSE140217 and GSE140218). Given the absence of controls, both expression
and predicted repression were normalized as described in McGeary et al. (2019).
TargetScan7 values were obtained by running the TargetScan7 Python scripts of
Kathy Lin (https://github.com/kslin/targetscan) with default parameters and conservation files supplied in the Git repository. The 100way Multi-Species Alignment (MAF) file of the custom reconstructed HEK transcripts of McGeary et al. (2019) was obtained by downloading alignments from the UCSC Genome Browser with mafFetch and then further processing these with “Stitch MAF blocks” from Galaxy (Blankenberg et al., 2011) and R. Common species identifiers were obtained from the NCBI Taxonomy Browser. miRNA seed and family information was downloaded from the TargetScan7 homepage (http://www.targetscan.org/vert_72/) and further processed in R to conform to the example input files. TargetScan8 occupancy scores were downloaded from the TargetScan8 homepage (http://www.targetscan.org/vert_80/).
To assess the performance of scanMiR in another species, we downloaded two datasets of miRNA knockout studies performed in mice (Amin et al., 2015; Eichhorn et al., 2014). Reads were mapped to the GRCm38 genome and subsequently counted with Salmon 1.3.0 (Patro et al., 2017). logFC values were obtained with edgeR (v. 3.32) (Robinson et al., 2010), filtering for expressed transcripts with the filterByExpr function and normalizing with a weighted trimmed mean of the logarithmic expression ratios of individual samples (TMM). For the miR-122 knockout dataset from Eichhorn et al. (2014), a common negative binomial dispersion was estimated, since the authors performed RNA sequencing with only one replicate. To correlate the logFCs of these two datasets with scanMiR repression predictions, we considered only transcripts that met all of the following criteria: they constitute 90\% of the expressed transcripts of their gene in the respective setting; they are expressed above 10 TPM (Amin et al., 2015) or 0.5 TPM (Eichhorn et al., 2014); they are reported as representative transcripts in TargetScan; and they are supported by at least five 3p-seq tags (http://www.targetscan.org/mmu_80/mmu_80_data_download/Gene_info.txt.zip). Mouse TargetScan8 scores were downloaded from http://www.targetscan.org/mmu_80/ and are based on the custom TargetScan mouse 3’UTR annotations.
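The transcript selection described above can be sketched as a simple filter. This is a minimal illustration in Python rather than the R code used for the analysis. The record fields (`tpm`, `gene_tpm`, `representative`, `tags_3p`) and the example values are hypothetical, and reading the 90\% criterion as a share of the gene's total TPM is an assumption.

```python
def keep_transcript(tx, min_tpm, min_share=0.9, min_tags=5):
    """Return True if a transcript record passes all four criteria."""
    # Share of the gene's total expression carried by this transcript
    # (assumed interpretation of the 90% criterion).
    share = tx["tpm"] / tx["gene_tpm"] if tx["gene_tpm"] > 0 else 0.0
    return (
        share >= min_share
        and tx["tpm"] > min_tpm
        and tx["representative"]
        and tx["tags_3p"] >= min_tags
    )

# Hypothetical records, for illustration only.
transcripts = [
    {"id": "tx1", "tpm": 50.0, "gene_tpm": 52.0, "representative": True, "tags_3p": 12},
    {"id": "tx2", "tpm": 50.0, "gene_tpm": 100.0, "representative": True, "tags_3p": 12},
    {"id": "tx3", "tpm": 0.8, "gene_tpm": 0.8, "representative": True, "tags_3p": 7},
]

# Dataset-specific expression thresholds: 10 TPM and 0.5 TPM.
kept_strict = [t["id"] for t in transcripts if keep_transcript(t, min_tpm=10.0)]
kept_lenient = [t["id"] for t in transcripts if keep_transcript(t, min_tpm=0.5)]
```

Only the expression threshold differs between the two datasets, so it is passed as an argument while the other cut-offs keep their defaults.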
# Other external datasets used
miRNA expression changes upon TDMD knockout in induced mouse neurons at day 10 of differentiation were downloaded from the supplementary data (Data S2) of Shi et al. (2020). Expression changes with an adjusted p-value smaller than $10^{-5}$ were considered significant (consistent with the 17 significantly changing miRNAs in Fig. 4B of Shi et al. 2020). Corresponding transcript expression levels were obtained from the study of Whipple et al. (2020). Sequenced reads were mapped to GRCm38 and afterwards counted using Salmon (Patro et al., 2017). Putative TDMD sites are shown on transcripts with expression levels of at least 10 TPM in more than one wild-type sample of neurons at day 10 of differentiation (Fig. 5A, Supplementary Table 1).
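The two thresholds described above (adjusted p-value below 1e-5, and at least 10 TPM in more than one wild-type sample) amount to the following filters. This is a minimal sketch in Python; the function names and the dictionary-based inputs are assumptions, not the format of the original data.

```python
def significant_mirnas(padj_by_mirna, alpha=1e-5):
    """Names of miRNAs whose adjusted p-value is below alpha."""
    return [name for name, padj in padj_by_mirna.items() if padj < alpha]

def expressed_transcripts(tpm_by_transcript, min_tpm=10.0, min_samples=2):
    """Transcripts with at least min_tpm in at least min_samples samples.

    'More than one wild-type sample' is encoded as min_samples=2.
    """
    return [
        tx for tx, tpms in tpm_by_transcript.items()
        if sum(t >= min_tpm for t in tpms) >= min_samples
    ]
```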
For the circular RNA scans, we used full circular RNA reconstructions from Zhang et al. (2021). The authors kindly provided a GTF file of the coordinates, which also contains back-splice junction (BSJ) counts. We extracted the corresponding spliced sequences and appended the first 11 nucleotides of each sequence to its end, to enable the identification of sites spanning the back-splice junction.
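The junction-spanning extension can be illustrated in a few lines. This is a sketch in Python under the assumption that each circular RNA is available as a plain spliced sequence string; the function name and the toy sequence are hypothetical.

```python
def extend_across_bsj(spliced_seq, overlap=11):
    """Append the first `overlap` nucleotides to the end of the sequence,
    so that a linear scan also covers the back-splice junction."""
    return spliced_seq + spliced_seq[:overlap]

# Toy example: the motif below only exists across the junction.
circ = "ACGTACGTAAAACCCCGGGG"
extended = extend_across_bsj(circ)
motif = "GGGGACGT"
found_linear = motif in circ        # False: the motif wraps around the end
found_extended = motif in extended  # True: visible after the extension
```

Sites that lie entirely within the appended nucleotides duplicate sites at the start of the sequence, so a real pipeline would need to avoid counting them twice.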
miRNA expression levels in the brain (Fig. 5B, Fig. 6B) were calculated from the supplementary information of Chiang et al. (2010).
# References
Agarwal, V., Bell, G. W., Nam, J. W., & Bartel, D. P. (2015). Predicting effective microRNA target sites in mammalian mRNAs. eLife, 4, 1–38. https://doi.org/10.7554/eLife.05005
Amin, N. D., Bai, G., Klug, J. R., Bonanomi, D., Pankratz, M. T., Gifford, W. D., Hinckley, C. A., Sternfeld, M. J., Driscoll, S. P., Dominguez, B., Lee, K. F., Jin, X., & Pfaff, S. L. (2015). Loss of motoneuron-specific microRNA-218 causes systemic neuromuscular failure. Science, 350(6267), 1525–1529. https://doi.org/10.1126/science.aad2509
Chiang, H. R., Schoenfeld, L. W., Ruby, J. G., Auyeung, V. C., Spies, N., Baek, D., Johnston, W. K., Russ, C., Luo, S., Babiarz, J. E., Blelloch, R., Schroth, G. P., Nusbaum, C., & Bartel, D. P. (2010). Mammalian microRNAs: Experimental evaluation of novel and previously annotated genes. Genes & Development, 24(10), 992–1009. https://doi.org/10.1101/gad.1884710
Eichhorn, S. W., Guo, H., McGeary, S. E., Rodriguez-Mias, R. A., Shin, C., Baek, D., Hsu, S., Ghoshal, K., Villén, J., & Bartel, D. P. (2014). mRNA Destabilization Is the Dominant Effect of Mammalian MicroRNAs by the Time Substantial Repression Ensues. Molecular Cell, 56(1), 104–115. https://doi.org/10.1016/j.molcel.2014.08.028
Garcia, D. M., Baek, D., Shin, C., Bell, G. W., Grimson, A., & Bartel, D. P. (2011). Weak seed-pairing stability and high target-site abundance decrease the proficiency of lsy-6 and other microRNAs. Nature Structural & Molecular Biology, 18(10), 1139–1146. https://doi.org/10.1038/nsmb.2115
Grimson, A., Farh, K. K. H., Johnston, W. K., Garrett-Engele, P., Lim, L. P., & Bartel, D. P. (2007). MicroRNA Targeting Specificity in Mammals: Determinants beyond Seed Pairing. Molecular Cell, 27(1), 91–105. https://doi.org/10.1016/j.molcel.2007.06.017
Hausser, J., Landthaler, M., Jaskiewicz, L., Gaidatzis, D., & Zavolan, M. (2009). Relative contribution of sequence and structure features to the mRNA binding of Argonaute/EIF2C–miRNA complexes and the degradation of miRNA targets. Genome Research, 19(11), 2009–2020. https://doi.org/10.1101/gr.091181.109
McGeary, S. E., Lin, K. S., Shi, C. Y., Pham, T. M., Bisaria, N., Kelley, G. M., & Bartel, D. P. (2019). The biochemical basis of microRNA targeting efficacy. Science, 366(6472). https://doi.org/10.1126/science.aav1741
Blankenberg, D., Taylor, J., Nekrutenko, A., & The Galaxy Team. (2011). Making whole genome multiple alignments usable for biologists. Bioinformatics, 27(17), 2426–2428. https://doi.org/10.1093/bioinformatics/btr398
Patro, R., Duggal, G., Love, M. I., Irizarry, R. A., & Kingsford, C. (2017). Salmon provides fast and bias-aware quantification of transcript expression. Nature Methods, 14(4), 417–419. https://doi.org/10.1038/nmeth.4197
Robinson, M. D., McCarthy, D. J., & Smyth, G. K. (2010). edgeR: A Bioconductor package for differential expression analysis of digital gene expression data. Bioinformatics, 26(1), 139–140. https://doi.org/10.1093/bioinformatics/btp616
Shi, C. Y., Kingston, E., Kleaveland, B., Lin, D. H., Stubna, M. W., & Bartel, D. P. (2020). The ZSWIM8 ubiquitin ligase mediates target-directed microRNA degradation. Science, 370(6523), eabc9359. https://doi.org/10.1126/science.abc9359
Whipple, A. J., Breton-Provencher, V., Jacobs, H. N., Chitta, U. K., Sur, M., & Sharp, P. A. (2020). Imprinted Maternally Expressed microRNAs Antagonize Paternally Driven Gene Programs in Neurons. Molecular Cell, 1–11. https://doi.org/10.1016/j.molcel.2020.01.020
Zhang, J., Hou, L., Zuo, Z., Ji, P., Zhang, X., Xue, Y., & Zhao, F. (2021). Comprehensive profiling of circular RNAs with nanopore sequencing and CIRI-long. Nature Biotechnology, 39(7), 836–845. https://doi.org/10.1038/s41587-021-00842-6