1. Dataset and processing of NGS data
1.1 Dataset
1.2 Data Processing
2. Search Menu
2.1 How-to search
2.2 Search result: Brief info.
2.3 Search result: Global Expr.
2.4 Search result: Diff. Expr.
2.5 Search result: Protein Int.
3. Coexpression & Function Menu
3.1 Step1. Search.
3.2 Step2. Expression Heatmap.
3.3 Step3. Gene set analysis.
4. Data Browser & Diff. Expr. Menu
4.1 Step1. Search.
4.2 Step2. Dataset Browser.
4.3 Step3. Differential Expression.
5. Protein Binding Menu
5.1 Step1. Search.
5.2 Step2. Heatmap.
5.3 Step3. Sample information.
6. Cons. & Corr. lncRNA Menu
6.1 Step1. Search.
6.2 Step2. Dataset.
6.3 Step3. Coexpression.


1. Dataset and processing of NGS data
1.1 Dataset
    lncRNAtor annotation follows ENSEMBL for six organisms: human, mouse, zebrafish, fly, worm, and yeast. It covers the lncRNA biotypes defined in ENSEMBL and additionally includes lncRNAs from HGNC, MGI, and lncRNAdb. For example, HOTAIR is a well-known lncRNA, but ENSEMBL classifies it as antisense. Samples were collected from public NGS datasets in GEO, ENCODE, and TCGA. The lncRNAtor TCGA dataset is RNA-Seq level 2, downloaded in May 2013. lncRNAtor supports two NGS platforms: RNA-Seq and CLIP-Seq. In our system, CLIP-Seq refers to all kinds of protein-binding NGS techniques, including RIP-Seq and PAR-CLIP. Statistics and data counts are provided on the Document > Statistics page.
1.2 Data Processing
    RNA-Seq data were processed sequentially: raw data quality check, quality filtering, alignment, quantification, and DEG test. For TCGA, the DEG test was performed using level 3 read counts (Fig. 1).
    CLIP-Seq data processing was the same as for RNA-Seq up to the alignment step. After alignment, protein binding peaks were detected using RIP-Seeker, and lncRNA-protein interactions were identified. For dataset and sample counts, please visit the Document > Statistics menu.
 
 
Figure 1. Flowchart of NGS processing

2. Search Menu
2.1 How-to search
    Enter a keyword in the lncRNAtor search bar. The keyword can be an official gene name, an alias, or an ENSEMBL ID. Please make sure to use the auto-complete function: after typing the keyword, select the ID you want from the drop-down box.

    Search results are divided into tabs.
2.2 Search result: Brief info.
  1. Sequence annotation information: basic annotation derived from ENSEMBL
    A. Icons link to a browser for gene structure, multiple alignment, and conservation scores.
  2. Coding potential: ORF prediction, homology, and coding potential prediction
  3. Conservation scores: conservation scores based on genomic structure.
2.3 Search result: Global Expr.
    Browse global lncRNA expression in human and mouse. Human expression is shown for various normal/tumor samples from the TCGA dataset, and mouse expression for various tissues from the ENCODE Project.
2.4 Search result: Diff. Expr.
    All RNA-Seq datasets in which the searched lncRNA shows significant differential expression are listed, so the user can quickly see the conditions under which the lncRNA is expressed. Titles link to the ‘Data Browser & Diff. Expr.’ menu, and icons link to the ‘Coexpression & Function’ menu.
2.5 Search result: Protein Int.
    All CLIP-Seq datasets in which the searched lncRNA shows significant binding to a specific protein are listed.

3. Coexpression & Function Menu
3.1 Step1. Search.
  1. Select a dataset and ID type.
  2. Enter a search keyword (lncRNA: gene name or ENSEMBL ID; protein coding gene: ENSEMBL ID).
  3. Click an ID in the drop-down box.
3.2 Step2. Expression Heatmap.
  1. Download the coexpression analysis CSV file.
  2. Filter out isolated genes using Reactome PPI interactions.
  3. Search for a gene name.
  4. Click the arrow next to the 'Corr' or 'P-value' column name to sort the data in ascending or descending order.
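
The sorting offered by the 'Corr' and 'P-value' column arrows can also be reproduced offline on the downloaded CSV file. The following is a minimal Python sketch. The column names `Gene`, `Corr`, and `P-value` and the sample rows are assumptions made for illustration; the layout of the actual lncRNAtor file may differ.

```python
import csv
import io


def sort_coexpression(csv_text, column="Corr", descending=True):
    """Return the CSV rows sorted numerically by the given column."""
    rows = list(csv.DictReader(io.StringIO(csv_text)))
    return sorted(rows, key=lambda r: float(r[column]), reverse=descending)


# Hypothetical file content, for illustration only.
example = """Gene,Corr,P-value
GENE_A,0.42,0.03
GENE_B,0.91,0.0001
GENE_C,-0.75,0.002
"""

# Strongest positive correlation first.
for row in sort_coexpression(example, "Corr"):
    print(row["Gene"], row["Corr"], row["P-value"])
```

Calling `sort_coexpression(example, "P-value", descending=False)` lists the most significant genes first.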
3.3 Step3. Gene set analysis.
  1. Apply the Reactome PPI filter option to the test gene set.
  2. Select the gene set size, from the top 100 to the top 500.
  3. Select the gene set type: GO BP, GO CC, GO MF, or KEGG.
  4. Sort the analysis results by any field.
  5. Enter a keyword to search for a specific gene set term.
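
This document does not state which statistical test lncRNAtor uses for gene set analysis. As an illustration only, a common way to test whether a gene set (for example, a GO term) is over-represented among the top coexpressed genes is the hypergeometric test. The Python sketch below is written under that assumption; the function name and all numbers are hypothetical.

```python
from math import comb


def hypergeom_pvalue(population, annotated, selected, overlap):
    """P(X >= overlap) when `selected` genes are drawn without replacement
    from `population` genes, `annotated` of which belong to the gene set."""
    upper = min(annotated, selected)
    total = comb(population, selected)
    tail = sum(
        comb(annotated, k) * comb(population - annotated, selected - k)
        for k in range(overlap, upper + 1)
    )
    return tail / total


# Hypothetical numbers: 20000 background genes, a gene set of 150 genes,
# the top 100 coexpressed genes selected, 8 of which are in the gene set.
print(hypergeom_pvalue(20000, 150, 100, 8))
```

A small p-value suggests that the gene set is enriched among the selected genes. Multiple-testing correction across gene sets is not covered by this sketch.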
 

4. Data Browser & Diff. Expr. Menu
4.1 Step1. Search.
  1. Search for datasets matching the input keyword.
4.2 Step2. Dataset Browser.
  1. Browse and select a dataset in the category tree.
  2. Select a dataset found in Step 1.
4.3 Step3. Differential Expression.
  1. Download the differential expression CSV file.
  2. 3. 4. & 5. Reselect the p-value scale and gene type to browse expression, then click the reset button. To search the dataset heatmap (4), enter a gene name, then click reset or press Enter.
  1. Click an icon to move to the ‘Coexpression & Function’ menu and browse the coexpressed protein coding gene set of the lncRNA.
  2. Sample groups and ID information.
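
The p-value and gene type selection described above can be mimicked on the downloaded differential expression CSV file. The following Python sketch assumes hypothetical column names (`Gene`, `Type`, `P-value`) and made-up rows; the real file may use different headers.

```python
import csv
import io


def filter_differential_expression(csv_text, max_pvalue, gene_type=None):
    """Keep rows with P-value <= max_pvalue and, optionally, a given gene type."""
    kept = []
    for row in csv.DictReader(io.StringIO(csv_text)):
        if float(row["P-value"]) > max_pvalue:
            continue
        if gene_type is not None and row["Type"] != gene_type:
            continue
        kept.append(row)
    return kept


# Hypothetical file content, for illustration only.
example_de = """Gene,Type,P-value
GENE_A,lncRNA,0.0004
GENE_B,protein_coding,0.02
GENE_C,lncRNA,0.3
"""

for row in filter_differential_expression(example_de, 0.05, "lncRNA"):
    print(row["Gene"], row["P-value"])
```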
 

5. Protein Binding Menu
5.1 Step1. Search.
  1. Select the organism, RNA-binding protein type, and protein name.
5.2 Step2. Heatmap.
  1. Download the protein-binding assay p-value CSV file.
  2. Search for a gene name.
  3. Sort rows by detected gene count.
5.3 Step3. Sample information.
  1. Sample information for the heatmap columns.
 

6. Cons. & Corr. lncRNA Menu
6.1 Step1. Search.
  1. Select an organism.
  2. Enter a gene name of interest.
6.2 Step2. Dataset.
  1. Select a gene of interest.
6.3 Step3. Coexpression.
  1. Browse tissue-specific coexpression between human and mouse.
 


1.1 lncRNA count for each DB source
Organism                  ENSEMBL annotation   lncRNAtor  lncRNAtor   ENSEMBL Ver. 70  ENSEMBL Ver. 70  Organism-specific DB  lncRNAdb
                          version              Gene       Transcript  Gene             Transcript       Gene                  Gene
Homo sapiens              GRCh37 version 70    14051      24195       13238            23858            1575 (HGNC)           124
Mus musculus              GRCm28 version 70    4030       5860        3914             5791             1328 (MGI)            93
Danio rerio               DR ZV9 version 72    1666       2561        1666             2561             -                     7
Drosophila melanogaster   BDGP5 version 70     501        657         538 (ncRNA)      692              -                     5
Caenorhabditis elegans    WBcel215 version 70  1312       1317        22760 (ncRNA)    22765            -                     1
Saccharomyces cerevisiae  EP4 version 70       15         15          15 (ncRNA)       15               -                     -
♦ For 3 organisms (Drosophila melanogaster, Caenorhabditis elegans, Saccharomyces cerevisiae), ENSEMBL provides ncRNA instead of lncRNA. lncRNAtor filtered out sequences
   with length < 200.
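
The length filter mentioned in the note above can be sketched in a few lines of Python. This is an illustration only: the transcript IDs and sequences are made up, and the sketch assumes that "length" means the number of nucleotides in the transcript sequence.

```python
def keep_long_transcripts(transcripts, min_length=200):
    """Drop transcripts shorter than min_length nucleotides."""
    return {tid: seq for tid, seq in transcripts.items() if len(seq) >= min_length}


# Hypothetical transcripts, for illustration only.
transcripts = {
    "TRANSCRIPT_1": "A" * 150,   # shorter than 200, filtered out
    "TRANSCRIPT_2": "A" * 200,   # exactly 200, kept
    "TRANSCRIPT_3": "A" * 1200,  # kept
}

print(sorted(keep_long_transcripts(transcripts)))
```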

2.1 Data collection count
Sample (dataset)          GEO RNA-Seq      GEO CLIP-Seq     ENCODE RNA-Seq   ENCODE CLIP-Seq  TCGA RNA-Seq     Total
Homo sapiens              68 (8)           136 (23)         0 (0)            0 (0)            4523 (133)       4714 (164)
Mus musculus              4 (1)            92 (8)           142 (3)          0 (0)            0 (0)            199 (12)
Drosophila melanogaster   0 (0)            4 (1)            129 (54)         60 (1)           0 (0)            190 (56)
Caenorhabditis elegans    0 (0)            4 (1)            117 (8)          0 (0)            0 (0)            121 (9)
Saccharomyces cerevisiae  0 (0)            23 (2)           0 (0)            0 (0)            0 (0)            13 (2)
Total                     72 (9)           259 (35)         388 (65)         60 (1)           4523 (133)       5237 (243)

2.2 TCGA cancer types and samples
Cancer type                                                       Abbreviation  No. of samples
Acute Myeloid Leukemia                                            LAML          167
Adrenocortical carcinoma                                          ACC           -
Bladder Urothelial Carcinoma                                      BLCA          45
Brain Lower Grade Glioma                                          LGG           220
Breast invasive carcinoma                                         BRCA          894
Cervical squamous cell carcinoma and endocervical adenocarcinoma  CESC          39
Colon adenocarcinoma                                              COAD          190
Esophageal carcinoma                                              ESCA          -
Glioblastoma multiforme                                           GBM           169
Head and Neck squamous cell carcinoma                             HNSC          341
Kidney Chromophobe                                                KICH          91
Kidney renal clear cell carcinoma                                 KIRC          552
Kidney renal papillary cell carcinoma                             KIRP          101
Liver hepatocellular carcinoma                                    LIHC          65
Lung adenocarcinoma                                               LUAD          291
Lung squamous cell carcinoma                                      LUSC          391
Lymphoid Neoplasm Diffuse Large B-cell Lymphoma                   DLBL          -
Mesothelioma                                                      MESO          -
Ovarian serous cystadenocarcinoma                                 OV            266
Pancreatic adenocarcinoma                                         PAAD          41
Pheochromocytoma and Paraganglioma                                PCPG          -
Prostate adenocarcinoma                                           PRAD          220
Rectum adenocarcinoma                                             READ          71
Sarcoma                                                           SARC          -
Skin Cutaneous Melanoma                                           SKCM          284
Stomach adenocarcinoma                                            STAD          -
Thyroid carcinoma                                                 THCA          93
Uterine Carcinosarcoma                                            UCS           -
Uterine Corpus Endometrioid Carcinoma                             UCEC          314
Total                                                                           4845
♦ The TCGA RNA-Seq level 2 dataset was downloaded on 3rd May 2013.

3.1 lncRNA and binding protein count
Ensembl Gene ID  Gene           Count  Binding proteins
ENSG00000225840  RN18S1         39     DGCR8, RTCB, Fip1, CPSF100, hnRNPA2/B1, FMR1, CPSF6, LIN28A, IGF2BP2, CPSF7, CPSF30, hnRNPH, PTB, LIN28B, IGF2BP3, hnRNPU, ZC3H7B, HuR, LIN28, CCNT1, hnRNPA1, QKI, hnRNPM, TNRC6A, NUDT21, CstF64tau, ALKBH5, TNRC6C, AGO2, AGO3, ELG, CPSF73, TNRC6B, CstF64, MOV10, hnRNPF, AGO4, Pumilio2, PAPD5
ENSG00000215417  MIR17HG        27     hnRNPF, RTCB, CstF64, TNRC6A, CPSF160, FMR1, IGF2BP3, hnRNPU, HuR, PTB, NUDT21, hnRNPA2/B1, AGO4, DGCR8, LIN28B, eIF4AIII, TNRC6B, AGO2, IGF2BP2, AGO1, CstF64tau, IGF2BP1, Fip1, CPSF73, ALKBH5, hnRNPA1, TNRC6C
ENSG00000229807  XIST           25     hnRNPU, CPSF160, DGCR8, hnRNPF, PTB, CCNT1, MOV10, ALKBH5, RTCB, CstF64tau, hnRNPA2/B1, FMR1, hnRNPA1, ZC3H7B, NUDT21, hnRNPM, CPSF100, Fip1, CPSF7, CPSF73, eIF4AIII, AGO2, LIN28, HuR, CPSF6
ENSG00000178977  LINC00324      23     hnRNPU, PTB, DGCR8, hnRNPF, hnRNPM, AGO2, ELG, CstF64tau, hnRNPA2/B1, hnRNPA1, MOV10, ALKBH5, NUDT21, HuR, RTCB, Fip1, AGO4, CPSF6, AGO3, CPSF160, FMR1, eIF4AIII, CPSF73
ENSG00000231607  DLEU2          23     CstF64, HuR, hnRNPF, IGF2BP1, TNRC6C, CPSF100, AGO2, hnRNPU, hnRNPA2/B1, AGO4, DGCR8, Pumilio2, eIF4AIII, TNRC6A, hnRNPA1, PAPD5, IGF2BP2, MOV10, CPSF7, CPSF73, PTB, Fip1, CstF64tau
ENSG00000225733  AC090937.2     22     hnRNPU, hnRNPF, PAPD5, hnRNPH, NUDT21, HuR, IGF2BP1, FMR1, hnRNPA2/B1, hnRNPA1, AGO2, CstF64tau, CPSF100, Fip1, eIF4AIII, CPSF30, RTCB, CPSF73, ALKBH5, CCNT1, PTB, MOV10
ENSG00000015479  MATR3          21     Fip1, hnRNPA1, hnRNPA2/B1, eIF4AIII, hnRNPM, FMR1, LIN28B, hnRNPU, CPSF6, HuR, AGO2, CPSF100, MOV10, PTB, IGF2BP1, CPSF7, hnRNPF, CPSF73, DGCR8, NUDT21, IGF2BP3
ENSG00000232656  IDI2-AS1       21     IGF2BP2, PTB, NUDT21, hnRNPU, hnRNPA2/B1, Fip1, FMR1, IGF2BP3, hnRNPM, hnRNPF, CstF64tau, MOV10, CPSF73, HuR, eIF4AIII, AGO2, CCNT1, hnRNPA1, IGF2BP1, CPSF6, CPSF7
ENSG00000236756  C10orf103      21     MOV10, CCNT1, hnRNPF, CPSF73, CPSF6, eIF4AIII, PTB, HuR, CstF64tau, hnRNPU, NUDT21, hnRNPM, AGO2, Fip1, CPSF30, RBM4, hnRNPA1, IGF2BP2, hnRNPA2/B1, CPSF7, LIN28B
ENSG00000258441  CTD-2552B11.4  21     CPSF30, IGF2BP1, AGO1, ALKBH5, PTB, RTCB, hnRNPF, AGO2, LIN28B, CstF64tau, hnRNPA2/B1, CPSF73, MOV10, hnRNPU, Fip1, eIF4AIII, HuR, hnRNPA1, CPSF6, FMR1, NUDT21
ENSG00000224078  SNHG14         20     PAPD5, HIF2a, eIF4AIII, LIN28, hnRNPF, FMR1, AGO2, NUDT21, hnRNPU, ALKBH5, hnRNPA1, CstF64tau, CPSF73, hnRNPA2/B1, LIN28B, HuR, Fip1, CCNT1, MOV10, DGCR8
ENSG00000225578  NCBP2-AS1      20     hnRNPF, HuR, Fip1, IGF2BP1, AGO2, FMR1, PAPD5, PTB, hnRNPA2/B1, ALKBH5, CPSF30, CstF64tau, hnRNPA1, CPSF73, NUDT21, CPSF100, hnRNPU, MOV10, eIF4AIII, hnRNPH
ENSG00000235954  TTC28-AS1      20     Fip1, AGO2, eIF4AIII, PTB, hnRNPU, CstF64tau, PAPD5, HuR, HIF2a, NUDT21, DGCR8, hnRNPF, FMR1, hnRNPA1, LIN28B, hnRNPM, hnRNPA2/B1, MOV10, CPSF7, CPSF73
ENSG00000236901  MIR600HG       20     eIF4AIII, DGCR8, Fip1, NUDT21, PAPD5, CPSF160, hnRNPA1, CstF64, FMR1, hnRNPM, AGO2, hnRNPH, CstF64tau, MOV10, PTB, hnRNPA2/B1, hnRNPF, hnRNPU, CPSF73, ALKBH5
ENSG00000240498  CDKN2B-AS      20     FMR1, hnRNPH, hnRNPM, LIN28B, ALKBH5, hnRNPA1, CstF64tau, Fip1, PAPD5, HuR, hnRNPU, CPSF73, RTCB, MOV10, hnRNPA2/B1, PTB, DGCR8, NUDT21, hnRNPF, AGO2
ENSG00000244124  ATP1B3-AS1     20     MOV10, AGO2, hnRNPH, Fip1, NUDT21, PTB, hnRNPU, hnRNPA2/B1, RTCB, eIF4AIII, CstF64tau, PAPD5, HuR, CPSF73, ALKBH5, FMR1, hnRNPA1, CPSF30, CPSF100, hnRNPF
ENSG00000245532  NEAT1          20     hnRNPU, eIF4AIII, CstF64tau, hnRNPH, PTB, hnRNPF, AGO2, CPSF7, AGO3, CstF64, hnRNPA2/B1, MOV10, hnRNPA1, AGO4, LIN28B, NUDT21, HuR, Fip1, hnRNPM, CPSF73
FBgn0052252      CR32252        20     mub, Rbp1, eIF3-S4, RnpS1, Srp54, msi, tra2, Cbp20, x16, sqd, elav, Cnot4, Prp6, RpS3, Rm62, ps, pMK33, Prp5, Hrb87F, snRNP-U1-70K
ENSG00000132204  LINC00470      19     Fip1, PTB, AGO2, PAPD5, hnRNPF, CPSF100, hnRNPM, FMR1, RTCB, CPSF6, CCNT1, MOV10, hnRNPU, hnRNPA2/B1, hnRNPA1, CPSF7, CPSF30, eIF4AIII, NUDT21
ENSG00000223882  ABCC5-AS1      19     CstF64tau, ALKBH5, hnRNPA1, CPSF100, hnRNPH, Fip1, CPSF6, MOV10, CPSF30, AGO2, PAPD5, CPSF73, NUDT21, FMR1, eIF4AIII, hnRNPU, PTB, HuR, hnRNPA2/B1
ENSG00000231194  FARP1-AS1      19     CPSF100, IGF2BP1, PTB, LIN28B, eIF4AIII, Fip1, CPSF73, PAPD5, FMR1, HuR, CstF64, hnRNPA2/B1, NUDT21, CCNT1, AGO2, hnRNPA1, hnRNPF, hnRNPU, MOV10

3.2 Protein types and sample counts. Datasets were collected from ENCODE and GEO CLIP-Seq.
Organism                  Types          No. of proteins  Samples  Protein names
Homo sapiens              binding        10               11       HIF2a, IGF2BP1, IGF2BP2, IGF2BP3, Pumilio2, QKI, RBM4, TNRC6A, TNRC6B, TNRC6C
                          disease        1                6        FMR1
                          epigenetic     1                1        MOV10
                          microRNA       12               31       AGO1, AGO2, AGO3, AGO4, DGCR8, LIN28A, LIN28B, Pumilio2, QKI, TNRC6A, TNRC6B, TNRC6C
                          misc           4                4        ALKBH5, C17orf85, C22orf28, ZC3H7B
                          transcription  21               71       CCNT1, CFIm25, CFIm59, CFIm68, CPSF100, CPSF160, CPSF30, CPSF73, CstF64, CstF64tau, eIF4AIII, Fip1, hnRNPA1, hnRNPA2/B1, hnRNPF, hnRNPH, hnRNPM, hnRNPU, HuR, PAPD5, PTB
Mus musculus              binding        1                2        CIRBP
                          disease        2                4        FUS/TLS, TDP43
                          epigenetic     2                8        Ezh2, IgG
                          microRNA       2                33       Ago2, LIN28A
                          transcription  2                6        CstF64, Mbnl1
Drosophila melanogaster   binding        29               53       B52, Cbp20, CG17838, CG6227, CG6841, CG8636, Cnot4, elav, Fmr1, Hrb87F, msi, mub, pMK33, ps, qkr54B, Rbp1, Rm62, RnpS1, Rox8, RpS3, SF2, Smn, snRNP-U1-70K, sqd, Srp54, tra2, U2af50, Upf1, x16
                          transcription  1                4        UAP56
Caenorhabditis elegans    translation    1                4        GLD1
Saccharomyces cerevisiae  binding        1                2        Puf3
                          transcription  2                2        Nrd1, Puf3, Rpb2

4.1 Conserved lncRNA and expression correlation status
Dataset           Organism                  Gene  Expr. Gene  Cor > 0.5  lncRNA DB
Mammalian organs  Gallus gallus             10    1           0          0
                  Macaca mulatta            154   53          20         0
                  Monodelphis domestica     11    4           0          0
                  Mus musculus              217   159         72         3
                  Ornithorhynchus anatinus  3     1           0          0
                  Pan troglodytes           175   35          12         0
Mouse ENCODE      Mus musculus              223   207         76         2
♦ Expr. Gene: Expressed gene
♦ Cor > 0.5: correlation > 0.5
♦ lncRNA DB: overlap with the 35 lncRNA DB genes conserved in human
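
The "Cor > 0.5" column counts genes whose expression correlates between species. The type of correlation is not stated here; the Python sketch below assumes a Pearson correlation over matched tissues and uses made-up expression values, so it is only an illustration of the idea.

```python
from math import sqrt


def pearson(x, y):
    """Pearson correlation coefficient of two equally long value lists."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("need two lists of equal length >= 2")
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for constant values")
    return cov / sqrt(var_x * var_y)


# Hypothetical expression of one gene in the same five tissues of two species.
human = [12.0, 3.5, 0.8, 25.1, 7.4]
mouse = [10.2, 2.9, 1.5, 19.8, 9.0]

r = pearson(human, mouse)
print(r, r > 0.5)  # the gene would be counted if r exceeds 0.5
```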

5.1 Summary of datasets used for examining expression distribution.
ID  Organism   Source & ID                  Sample size  Title
1   Human      TCGA BLCA                    45           Bladder Normal vs Tumor
2   Human      TCGA KICH                    91           Kidney chromophobe Normal vs Tumor
3   Human      GSE22260                     30           Prostate normal and cancer
4   Human      GSE32424                     12           Identification of a Novel Angiogenesis and Tumor Suppressor Gene Rab25 in Esophageal Squamous Cell Carcinoma
5   Human      GSE29155                     11           LNCaP vs PrEC
6   Mouse      ENCODE Mouse CSHL            94           Mouse CSHL Various Tissues
7   Worm       ENCODE C. elegans N2 stage   48           C. elegans N2 Early Embryo Developmental Stage
8   Fruitfly   ENCODE polyA                 9            D. Melanogaster PolyA Developmental Stage
9   Zebrafish  GSE30603                     12           Sanger Zebrafish Sequencing

5.2 Statistics of expressed transcripts among lncRNA and protein coding genes.
ID. Dataset             Annotated transcripts     lncRNA              Protein coding
                        lncRNA   Protein coding   FPKM > 0  FPKM > 1  FPKM > 0  FPKM > 1
1. Human TCGA BLCA      24154    82262            1756      584       24190     15741
2. Human TCGA KICH                                1890      648       24603     15669
3. Human GSE22260                                 2513      2248      23025     18298
4. Human GSE32424                                 1310      1048      18938     13864
5. Human GSE29155                                 1185      1067      16088     12937
6. Mouse ENCODE         5791     46884            2135      414       36964     16286
7. Worm ENCODE          13173123481792050017052
8. Fruitfly ENCODE      65526950292123601531
9. Zebrafish GSE30603   2561     46176            1552      684       30110     15422
♦ The number of expressed transcripts is the average value over all samples in each dataset.
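
As the note above says, each value is the number of transcripts above an FPKM cut-off, averaged over the samples of a dataset. A minimal Python sketch of that calculation follows; the FPKM values are made up and the function name is hypothetical.

```python
def mean_expressed_count(fpkm_by_sample, threshold):
    """Average, over samples, of the number of transcripts with FPKM > threshold."""
    counts = [sum(1 for value in sample if value > threshold)
              for sample in fpkm_by_sample]
    return sum(counts) / len(counts)


# Hypothetical FPKM values: one inner list per sample, one value per transcript.
samples = [
    [0.0, 0.5, 2.0, 10.0],
    [0.0, 0.0, 1.5, 3.0],
]

print(mean_expressed_count(samples, 0))  # transcripts with FPKM > 0
print(mean_expressed_count(samples, 1))  # transcripts with FPKM > 1
```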

5.3 Number of coexpressed protein coding genes for 5 representative lncRNAs.
Dataset  H19 (C 1), R 2))  MALAT1 (C, R)  NEAT1 (C, R)  CYFIP1 (C, R)  GASS (C, R)
1. Human TCGA BLCA (Bladder urothelial carcinoma: Normal vs Tumor)  40115221714168318619
2. Human TCGA KICH (Kidney chromophobe: Normal vs Tumor)  5137931249435606899731367
3. Human GSE22260 (prostate cancer vs normal)  163334698917528175305414
4. Human GSE32424 (Identification of a Novel Angiogenesis and Tumor Suppressor Gene Rab25 in Esophageal Squamous Cell Carcinoma)  761310115741215030549
6. Mouse ENCODE (Mouse CSHL Various Tissues)  20851444685581065048748479
1) Numbers in the C column indicate the gene count of correlation p-value < 0.05.
2) Numbers in the R column indicate the gene count that pass the REACTOME filtering condition.
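
The two counts defined in the notes above can be illustrated with a short Python sketch. It is written under two assumptions that this document does not confirm: that the REACTOME filter is applied to the genes that already pass the p-value cut-off, and that a gene passes the filter when it has at least one Reactome PPI partner among those genes (that is, it is not isolated). Gene names, p-values, and interactions are made up.

```python
def count_c_and_r(pvalues, ppi_edges, alpha=0.05):
    """Return (C, R): C = genes with correlation p-value < alpha,
    R = those C genes linked to another C gene by a PPI edge."""
    significant = {gene for gene, p in pvalues.items() if p < alpha}
    connected = set()
    for a, b in ppi_edges:
        if a != b and a in significant and b in significant:
            connected.update((a, b))
    return len(significant), len(connected)


# Hypothetical correlation p-values and PPI edges, for illustration only.
pvalues = {"GENE_1": 0.01, "GENE_2": 0.04, "GENE_3": 0.20, "GENE_4": 0.001}
ppi_edges = [("GENE_1", "GENE_2"), ("GENE_3", "GENE_4")]

print(count_c_and_r(pvalues, ppi_edges))
```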

5.4 Expression distribution and cut-off

5.5 Protein-coding gene vs lncRNA