LNCipedia

LNCipedia 5.2 screenshot

The human genome is pervasively transcribed, producing vast amounts of RNA transcripts, of which the majority does not encode protein 1. Long non-coding RNAs (lncRNAs) are typically defined as non-coding RNA transcripts longer than 200 nucleotides. First regarded as transcriptional noise, lncRNAs are now known to exhibit diverse functions through a wide array of mechanisms 2 3. In addition, deregulation of lncRNAs is associated with diseases including cancer 4.

The fundamental question of how many lncRNAs are embedded in the human genome has proven difficult to answer. While some studies report large transcriptomes containing over 90 000 lncRNAs 5, more conservative resources such as GENCODE annotate only 16 000 lncRNAs 6. The challenges in the annotation of lncRNAs have led to the creation of several specialised lncRNA databases. Notable examples are LncRNAWiki, a wiki-based resource that combines computational and manual curation 7 8, and NONCODE, a lncRNA annotation database covering 17 species, of which human and mouse have the highest number of annotations 9. Of note, lncRNA annotation is not limited to human or laboratory animal species. The domestic-animal lncRNA database ALDB, for instance, stores pig, chicken and cow lncRNAs 10. And even though lncRNAs are often regarded as evolutionarily new, plant lncRNAs have also been discovered and are catalogued in the PLncDB database 11. A recent valuable effort by the European Bioinformatics Institute (EMBL-EBI) aims to unite all non-coding RNA annotation databases into a single compendium, RNAcentral 12.

While the advent of massively parallel RNA sequencing technologies drastically accelerated the identification of novel lncRNAs, functional annotation is lagging behind. In addition, the lack of official gene names for many lncRNAs makes it increasingly difficult to keep track of what is currently known about a particular lncRNA. Several research groups have therefore turned to manual literature curation to annotate lncRNAs with functional evidence or aberrant expression in disease contexts. Notable examples of such datasets are Lnc2Cancer 13, LncRNADisease 14, the recently published pan-cancer lncRNA co-expression atlas LncMAP 15 and the Mammal ncRNA Disease Repository (MNDR), which stores 3213 mammalian lncRNAs associated with diseases 16. Despite these clear advances in lncRNA annotation, current resources are unfortunately still incomplete and plagued with inaccurate transcript and gene models, with important consequences for the lncRNA research field 17. In addition, the coding potential of numerous genes is still debated, and as such the fundamental differentiation between coding and non-coding RNA remains troublesome 18.

In 2012, we released LNCipedia, a database to collect human lncRNA sequences and annotation 19. Central to LNCipedia is the merging of redundant transcripts across the different data sources and the grouping of transcripts into genes, resulting in a highly consistent database. Through regular updates, LNCipedia offers a complete set of human lncRNAs without compromising the quality of the annotations. An example of this is the high-confidence gene set introduced in LNCipedia 3 20, a subset of the database containing lncRNAs that lack coding potential by any metric. Here, we describe the development and novel features of LNCipedia 5, the latest update of the database. Following the release of valuable resources such as FANTOM CAT 21, we expanded our database with new lncRNAs. In addition to several small improvements, we introduced an improved filtering pipeline and support for official HGNC gene names. Importantly, an extensive manual literature curation effort resulted in the annotation of 2482 lncRNA publications, providing insights into the functions of 1555 human lncRNAs.
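The merging and grouping idea described above can be illustrated with a minimal Python sketch. This is not the LNCipedia pipeline: the redundancy rule (identical chromosome, strand and exon chain) and the gene-grouping rule (overlapping genomic span on the same strand) are simplifying assumptions made for illustration, and the `Transcript` type, the function names and the `resource_A`/`resource_B` labels are hypothetical.

```python
from collections import defaultdict
from typing import NamedTuple


class Transcript(NamedTuple):
    source: str   # originating resource (hypothetical label)
    chrom: str
    strand: str
    exons: tuple  # sorted tuple of (start, end) pairs


def merge_redundant(transcripts):
    """Collapse transcripts sharing chromosome, strand and exon chain.

    Assumption: "redundant" means an identical exon structure.
    Returns a dict mapping each unique structure to its set of sources.
    """
    merged = defaultdict(set)
    for t in transcripts:
        merged[(t.chrom, t.strand, t.exons)].add(t.source)
    return dict(merged)


def group_into_genes(structures):
    """Group unique transcripts whose spans overlap on the same strand.

    Assumption: span overlap is used as a stand-in for a real gene model.
    """
    genes = []
    ordered = sorted(structures, key=lambda s: (s[0], s[1], s[2][0][0]))
    for chrom, strand, exons in ordered:
        start, end = exons[0][0], exons[-1][1]
        last = genes[-1] if genes else None
        if (last and last["chrom"] == chrom and last["strand"] == strand
                and start <= last["end"]):
            # Overlaps the current gene: extend it and attach the transcript.
            last["end"] = max(last["end"], end)
            last["transcripts"].append(exons)
        else:
            genes.append({"chrom": chrom, "strand": strand,
                          "start": start, "end": end,
                          "transcripts": [exons]})
    return genes


transcripts = [
    Transcript("resource_A", "chr1", "+", ((100, 200), (300, 400))),
    Transcript("resource_B", "chr1", "+", ((100, 200), (300, 400))),  # duplicate
    Transcript("resource_B", "chr1", "+", ((350, 500),)),
    Transcript("resource_A", "chr1", "-", ((100, 200),)),
    Transcript("resource_A", "chr1", "+", ((1000, 1200),)),
]

merged = merge_redundant(transcripts)
genes = group_into_genes(merged.keys())
```

In this toy input, the two identical transcripts collapse into one entry that remembers both sources, and the overlapping plus-strand transcripts end up in the same gene while the minus-strand transcript forms its own.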

References


  1. Djebali S., Davis C.A., Merkel A., Dobin A., Lassmann T., Mortazavi A., Tanzer A., Lagarde J., Lin W., Schlesinger F. et al. Landscape of transcription in human cells. Nature. 2012; 489:101–108.

  2. Marchese F.P., Raimondi I., Huarte M. The multidimensional mechanisms of long noncoding RNA function. Genome Biol. 2017; 18:206.

  3. Mattick J.S. The state of long non-coding RNA biology. Noncoding RNA. 2018; 4:17.

  4. Huarte M. The emerging role of lncRNAs in cancer. Nat. Med. 2015; 21:1253–1261.

  5. Iyer M.K., Niknafs Y.S., Malik R., Singhal U., Sahu A., Hosono Y., Barrette T.R., Prensner J.R., Evans J.R., Zhao S. et al. The landscape of long noncoding RNAs in the human transcriptome. Nat. Genet. 2015; 47:199–208.

  6. Harrow J., Frankish A., Gonzalez J.M., Tapanari E., Diekhans M., Kokocinski F., Aken B.L., Barrell D., Zadissa A., Searle S. et al. GENCODE: the reference human genome annotation for The ENCODE Project. Genome Res. 2012; 22:1760–1774.

  7. Ma L., Li A., Zou D., Xu X., Xia L., Yu J., Bajic V.B., Zhang Z. LncRNAWiki: harnessing community knowledge in collaborative curation of human long non-coding RNAs. Nucleic Acids Res. 2015; 43:D187–D192.

  8. BIG Data Center Members. Database resources of the BIG data center in 2018. Nucleic Acids Res. 2018; 46:D14–D20.

  9. Fang S., Zhang L., Guo J., Niu Y., Wu Y., Li H., Zhao L., Li X., Teng X., Sun X. et al. NONCODEV5: a comprehensive annotation database for long non-coding RNAs. Nucleic Acids Res. 2018; 46:D308–D314.

  10. Li A., Zhang J., Zhou Z., Wang L., Liu Y., Liu Y. ALDB: a domestic-animal long noncoding RNA database. PLoS One. 2015; 10:e0124003.

  11. Jin J., Liu J., Wang H., Wong L., Chua N.H. PLncDB: plant long non-coding RNA database. Bioinformatics. 2013; 29:1068–1071.

  12. The RNAcentral Consortium. RNAcentral: a comprehensive database of non-coding RNA sequences. Nucleic Acids Res. 2017; 45:D128–D134.

  13. Ning S., Zhang J., Wang P., Zhi H., Wang J., Liu Y., Gao Y., Guo M., Yue M., Wang L. et al. Lnc2Cancer: a manually curated database of experimentally supported lncRNAs associated with various human cancers. Nucleic Acids Res. 2015; 44:D980–D985.

  14. Chen G., Wang Z., Wang D., Qiu C., Liu M., Chen X., Zhang Q., Yan G., Cui Q. LncRNADisease: a database for long-non-coding RNA-associated diseases. Nucleic Acids Res. 2012; 41:D983–D986.

  15. Li Y., Li L., Wang Z., Pan T., Sahni N., Jin X., Wang G., Li J., Zheng X., Zhang Y. et al. LncMAP: Pan-cancer atlas of long noncoding RNA-mediated transcriptional network perturbations. Nucleic Acids Res. 2018; 46:1113–1123.

  16. Cui T., Zhang L., Huang Y., Yi Y., Tan P., Zhao Y., Hu Y., Xu L., Li E., Wang D. MNDR v2.0: an updated resource of ncRNA-disease associations in mammals. Nucleic Acids Res. 2018; 46:D371–D374.

  17. Uszczynska-Ratajczak B., Lagarde J., Frankish A., Guigó R., Johnson R. Towards a complete map of the human long non-coding RNA transcriptome. Nat. Rev. Genet. 2018; 19:535–548.

  18. Abascal F., Juan D., Jungreis I., Martinez L., Rigau M., Rodriguez J.M., Vazquez J., Tress M.L. Loose ends: almost one in five human genes still have unresolved coding status. Nucleic Acids Res. 2018; 46:7070–7084.

  19. Volders P.J., Helsens K., Wang X., Menten B., Martens L., Gevaert K., Vandesompele J., Mestdagh P. LNCipedia: a database for annotated human lncRNA transcript sequences and structures. Nucleic Acids Res. 2013; 41:D246–D251.

  20. Volders P.J., Verheggen K., Menschaert G., Vandepoele K., Martens L., Vandesompele J., Mestdagh P. An update on LNCipedia: a database for annotated human lncRNA sequences. Nucleic Acids Res. 2015; 43:D174–D180.

  21. Hon C.C., Ramilowski J.A., Harshbarger J., Bertin N., Rackham O.J., Gough J., Denisenko E., Schmeier S., Poulsen T.M., Severin J. et al. An atlas of human long non-coding RNAs with accurate 5′ ends. Nature. 2017; 543:199–204.

Pieter-Jan Volders
PostDoctoral Fellow

LncRNA aficionado working with transcriptomics and proteomics

Pieter Mestdagh
Professor

Studying non-coding RNAs in cancer.

Jo Vandesompele
Professor

RNA addict trying to connect all the dots

Jasper Anckaert
Bioinformatician

The real Jasper