\chapter{Experimental transcriptome}
\label{chap:r:tinats}
\minitoc
The previous chapter addressed the altered genes and pathways in the \dnmtchip genotype on the grounds of annotated genes and transcripts. The source of this annotation was release~\num{84} of the \emphdatabasename{NCBI Reference Sequence Database (RefSeq)}, published on September 11, 2017. Although the \emphdatabasename{RefSeq} collection does include alternatively spliced transcripts, pseudogenes and alternative haplotypes as well as provisional entries, its aim of being a non-redundant database limits its ability to represent the true transcriptional complexity of every cell type.
The \emphcollectionname{FANTOM} projects had first shed light on widespread cell-type specific promoter usage\cite{Carninci2005,Faulkner2008}, and frequent aberrant alternative splicing across tumors was demonstrated later\cite{Kahles2018}. Furthermore, fusion proteins of \proteinnamehuman{Mll} (\mllfp) affect the transcriptional machinery\cite{Slany2016} and epigenetically repressed cryptic promoters can be activated upon hypomethylation\cite{Brocks2017}. Therefore, we assumed that the \emphdatabasename{RefSeq} annotation might not adequately reflect the situation in \mllafnine cells and set out to generate an experimentally determined transcriptome.
\section{Assembly of non-reference transcripts}
\label{chap:r:tinats:stringtie}\label{chap:r:tinats:denovoexpression}\label{chap:r:tinats:denovomethylation}
\fyfrank
In principle, there are two approaches to establish a custom transcriptome from RNA-seq data. One can opt for a true de novo assembly\cite{Haas2013}, which combines overlapping sequencing reads into longer continuous genomic sequences (so-called contigs). However, for most bioinformatic approaches to the problem (like \emph{De Bruijn graphs}), there is a trade-off between resolving ambiguity and computational demands such as run time and memory consumption. Thus, for common model organisms, for which reference genomes exist, it is generally preferable and more precise to align the reads to the genome first and reconstruct the transcriptome from those alignments\cite{Liao2013,Liao2014,Maretty2014}. After alignment \supple, we employed \emphsoftwarename{StringTie}\cite{Pertea2015} to reconstruct the transcripts and used a custom script to retain only those assemblies that were supported by a 5' CAGE-seq signal.
We were able to reconstruct \num{43597} elongated transcripts from CAGE-seq confirmed transcription start sites. Of these, \num{12850} (\SI{29.47}{\percent}) were ultimately considered for downstream analyses; the remainder comprised, for example, artificial mergers of reads originating from different samples. As before with reference transcripts, considerably more transcripts were differentially expressed between \kithi and \kitlow cells ($n\,=\,3686$) than between the genotypes \dnmtchip and \dnmtwt ($n\,=\,519$).
Of these, \SI{25}{\percent} and \SI{17}{\percent}, respectively, were non-reference transcripts. The most common non-reference transcripts were unique, intergenic transcripts with no direct relation to annotated genes. These were mostly short ($<$\SI{2}{\kilo b}), unspliced fragments typically framed by SINEs or LINEs, possibly pseudogenes, sequences of viral origin or transposons. Neither their expression nor their number increased in \dnmtchip; therefore, they were probably not responsible for the impairment of self-renewal, although their exact role remained elusive \supple.
Published literature had suggested that DNA hypomethylation is capable of reactivating cryptic, dormant promoters, which were referred to as \emph{treatment-induced non-annotated transcription start sites} (TINATs)\cite{Brocks2017}.
We detected only a few transcripts with novel splice junctions (\textbf{\texttt{j}}), none of which was differentially expressed and thus of interest for this project. We also could not identify a non-reference transcript class whose transcripts were predominantly initiated from hypomethylated, reactivated promoters in \dnmtchip \supple. Not even the promoters of the few truly differentially expressed non-reference transcripts exhibited a pronounced hypomethylation in \dnmtchip \supple. Our data therefore did not corroborate the widespread joining of RNAs originating from cryptic promoters to regular reference transcripts during splicing, as described before \cite{Brocks2017}.
\section{Isolated transcriptional initiation events}
\label{chap:r:tinats:denovolocalization}
Because we observed many isolated transcription initiation events in CAGE-seq that could not be extended to full-length transcripts by RNA-seq, we were concerned that the combined CAGE-seq/RNA-seq approach underestimated the number of TINATs. For example, we observed a strong CAGE-seq signal (and hypomethylation) right at the site of the cryptic promoter in the \genenamemouse{Dapk1} gene, which had been described in the original TINAT publication\cite{Brocks2017}, but the RNA-seq data did not allow for a successful elongation into a de novo assembled transcript. It seemed that our de novo assembled transcriptome of \mllafnine leukemia comprised mostly recurrent, faithfully reconstructed transcripts at the expense of most random, rare RNAs originating from TINAT-like initiation events.
\begin{figure}[!htb]
\centering
\includegraphics[width=\textwidth]{figures/output/chromatin/gam_hic_features/tinatdensplot_LSChicpc1overall.pdf}
\caption{Genomic localization of transcriptional initiation. The tag clusters were assigned the respective first principal component of HPC-7 murine blood stem/progenitor cell Hi-C uniquely mapping interaction data. TSS were separated according to specific occurrence, overlap with the Fantom~5 reference as well as classification as enhancer or promoter. Black arrows emphasize the unusual enrichment of robust or facultative heterochromatic localizations in the wild-type specific, unannotated clusters.}
\label{fig:tinatdensplot_LSChicpc1overall.pdf}
\end{figure}
Therefore, we loosened the criteria and focused solely on the CAGE-seq data to elaborate on aberrant transcriptional initiation, although \mllfp are known to affect the elongation of transcripts rather than their initiation\cite{Mohan2010}. In total, we could identify \num{140267} tag clusters, of which \num{45259} overlapped known promoters from the \emphcollectionname{Fantom~5} reference, while \num{95008} were unique. Two-thirds of the unannotated sites were specific to one of the two leukemic genotypes.
To explore the genomic localization of the tag clusters, we assigned each cluster the first principal component, which distinguishes \emph{active/permissive} from \emph{inactive/inert} chromatin compartments\cite{Lieberman-Aiden2009}, of Hi-C chromatin interaction data generated in the HPC-7 murine blood stem/progenitor cell model\dissrefpage{chap:r:gam:chromatin:hic}\cite{Wilson2016}. While basically all annotated tag clusters, irrespective of their specificity, were exclusively located in the open chromatin regions \reffigure{fig:tinatdensplot_LSChicpc1overall.pdf}{, top row}, we noticed an abnormal enrichment of \dnmtwt-specific, unannotated TSS clusters in typically inert heterochromatic regions. Since the decompaction of chromatin is a prerequisite of active transcription, this pointed towards either an increased flexibility of the chromatin structure or a more readily initiated transcription\reffigure{fig:tinatdensplot_LSChicpc1overall.pdf}{, bottom row}.
\section{Summary}
\label{chap:r:tinats:degenesummary}
This and the previous chapter focused on the transcriptome of \mllafnine leukemia and in particular on the changes induced by hypomethylation as a consequence of \genenamemouse{Dnmt1} reduction. Based on the reference transcripts, however, neither general promoter hypomethylation\dissrefpage{chap:r:transcription:promhypooverall} nor a putative elongation bias\dissrefpage{chap:r:transcription:elongation} had profound effects on the transcriptome.
Just \num{730} genes were consistently and significantly differentially expressed between the two genotypes and only \num{40} exhibited a promoter hypomethylation in conjunction with an upregulation \dissrefpage{chap:r:degenes:genotypecontrast}. Nevertheless, we saw enrichment of the genes in some interesting pathways\dissrefpage{chap:r:degenes:genotypecontrast:pathways}, which could possibly explain the observed differences in self-renewal, tumor growth and senescence. Yet a link to methylation could be established neither directly nor with the help of a comprehensive buffer domain analysis, which aimed for the identification of crucial regulatory genes linked to cell identity\dissrefpage{chap:r:degenes:bufferdomains}.
Subsequently, we hoped that the reconstruction of non-reference transcripts might shed light on the ramifications of DNA hypomethylation in the \dnmtchip mouse model. Although we detected a relevant number of non-reference transcripts \dissref{chap:r:tinats:stringtie}, they were not regulated by methylation\supple and no superordinate mechanism could be elucidated.
Importantly, there was little indication that TINAT transcripts\cite{Brocks2017} would commonly be spliced to reference RNAs\footnote{just \num{11} transcripts in class \textbf{\texttt{j}}\cite{Pertea2016}, none of which was differentially expressed}, although technical limitations might have caused us to underestimate the true extent. While stray transcription of full-length transcripts was virtually absent\dissrefpage{chap:r:transcription:strayreference}, the purely CAGE-seq based approach resulted in the identification of thousands of non-annotated \dnmtwt-specific TSS clusters in heterochromatic areas of the genome \dissref{chap:r:tinats:denovolocalization}.
Although the latter was definitely a finding of great interest, its interpretation remained challenging. Possibly, the samples were switched and these clusters were in fact confined to the \dnmtchip genotype, which would resonate well with the pronounced hypomethylation of LADs and an explanatory model involving TINATs. Alternatively, these clusters were indeed absent in \dnmtchip and reflected a diminished cellular plasticity. In the latter case, hypomorphic \mllafnine LSCs would, according to a model by the Feinberg lab\cite{Pujadas2012}, respond poorly when challenged by variable conditions, resulting in a survival and self-renewal bias.