Pinpointing molecular mechanisms of long non-coding RNAs: at the crossroads of biochemistry and genetics
Abstract
Ribonucleic acid (RNA) is one of the major molecular biopolymers in all known forms of life. Although originally known for its crucial roles in directing diverse aspects of translation, it is now known that its functions extend to many, if not all, areas of cellular biology. This is illustrated by the fact that although >80% of the genome is pervasively transcribed, protein-coding genes account for only 2% of the genome, which raises the question of what function, if any, these non-coding transcripts have. Since the sequencing of the human genome, advancements in massively parallel sequencing and RNA extraction methods have revealed an unprecedented complexity of the non-coding transcriptome. These non-coding RNAs (ncRNAs) comprise a highly diverse group with high structural and functional variation. Long non-coding RNAs (lncRNAs) are of particular interest, as they comprise the bulk of the transcriptome in terms of number of genes. Although some lncRNAs have been extensively studied and characterized, most lncRNAs have long been regarded as transcriptional noise, owing in part to their generally low expression levels. However, over the last three decades, pioneering research has demonstrated that lncRNA transcripts can have important, if not essential, functions in cellular biology, and it has become clear that the non-coding part of the transcriptome adds several layers of regulation to the central dogma. In addition, dysregulated expression of lncRNAs can contribute to a wide range of human diseases. The highly tissue-specific expression profile of many of these lncRNA genes makes the human lncRNome an untapped source of potential targets for precision medicine, and translating these insights to the clinic may lead to exciting new avenues for diagnostic, prognostic, and therapeutic applications. A major hurdle in lncRNA research remains the elucidation of the molecular mechanisms of these transcripts.
However, insights into these mechanisms may prove very useful for developing treatment modalities that are synergistic with existing therapy, or for predicting and countering potential resistance mechanisms. A lncRNA rarely acts as a sole effector molecule, but rather functions by interacting with other biomolecular entities such as DNA or proteins. Although technological advances have recently been made to interrogate an RNA’s interactome, these methods have proven difficult to implement and each has its own benefits and drawbacks. Most methods to characterize an RNA’s interactome rely on RNA pulldown, combining various cross-linking strategies with biotinylated tiling probes complementary to the RNA of interest (e.g. iDRiP-MS, RAP-MS, and ChIRP-MS). Although these methods are conceptually the same, small technical differences between them lead to large differences in the size and content of the identified interactome. Drawing upon the protein-protein interaction (PPI) field, a strong case can be made for the need for orthogonal methods to create high-confidence sets of interaction candidates. We combined comprehensive identification of RNA-binding proteins by mass spectrometry (ChIRP-MS) and RNA-BioID to create a high-confidence set of RNA-binding proteins (RBPs) for several lncRNAs. To evaluate these methods, we applied them to HOX Transcript Antisense Intergenic RNA (HOTAIR). HOTAIR is a well-known lncRNA that is expressed from the HOXC gene cluster and has been shown to regulate the expression of the HOXD gene cluster in trans during developmental limb patterning. Overexpression of HOTAIR has been associated with a metastasis-promoting phenotype in breast and ovarian cancer. Although HOTAIR is known to bind the PRC2 and LSD1 protein complexes for its mode of action, its interactome has not yet been mapped with an unbiased and comprehensive method.
We intersected the proteins that were significantly enriched by both methods and identified, among others, the hallmark interactors, the PRC2 and REST/CoREST chromatin remodeler complexes, but surprisingly also subunits of mitochondrial ribosomes (MRPLs). We further investigated the MRPL-HOTAIR interaction and show that it does not occur in mitochondria. After optimizing ChIRP-MS on HOTAIR, we applied the method to NESPR, a neuroblastoma-specific lncRNA that we are actively dissecting at a mechanistic level. Neuroblastoma is a childhood cancer derived from the sympathoadrenal (SA) lineage and is characterized by adrenergic and mesenchymal cell identities. We screened lncRNAs associated with the adrenergic identity and prioritized NESPR because of its adrenergic-specific expression profile, its association with clinical parameters, and foremost its genomic location. NESPR is abundantly expressed from the super-enhancer of PHOX2B, a crucial adrenergic-specific cell identity gene. Knockdown of the NESPR fraction significantly decreases cellular survival and induces an apoptotic program. RNA-sequencing upon NESPR knockdown revealed a significant reduction in the expression of PHOX2B. 4C-sequencing showed that the NESPR and PHOX2B loci form an insulated neighborhood in adrenergic but not mesenchymal cells, indicating that NESPR regulates PHOX2B in cis. We show that perturbation of NESPR does not impact the chromatin looping between the PHOX2B and NESPR loci, nor do we find evidence for direct binding of NESPR to DNA at enhancer elements in the loop. However, RNA-sequencing data from NESPR and PHOX2B knockdown experiments revealed a PHOX2B-independent function for NESPR. After stringent filtering, ChIRP-sequencing revealed a single in trans target gene, ESRRG. Consistent with a gene-regulatory role, we show that NESPR interacts with proteins that are involved in genome stability and in the transcription machinery.