ncRNA | Common Features in lncRNA Annotation and Classification: A Survey

CONC | 2006 | SVM | Eukaryotes (both protein-coding and non-coding genes) | peptide length, amino acid composition, predicted secondary structure content, mean hydrophobicity, percentage of residues exposed to solvent, sequence compositional entropy, number of homologues, alignment entropy | 10-fold CV on protein-coding: F1-score 97.4%, precision 97.1%, recall 97.8%; on non-coding: F1-score 94.5%, precision 95.2%, recall 93.8% | 12.4
CPC | 2007 | SVM | Eukaryotes (both protein-coding and non-coding genes) | ORF features (quality, coverage, integrity), number of BLASTX hits, hit score, frame score | 10-fold CV: 95.77%; accuracy on Rfam database (non-coding): 98.62%, RNADB (non-coding): 91.5%, EMBL CDS (protein-coding): 99.08%; accuracy in lncRNA detection: 76.2% | 131.8
PORTRAIT | 2009 | SVM | Species neutral; case study on Paracoccidioides brasiliensis and 5 other fungi | ORF length, isoelectric point, hydropathy, compositional entropy | Accuracy: 91.9%, specificity: 95%, sensitivity: 86.4% | 10.2
CNCI | 2013 | SVM | Vertebrates, plants, orangutan | adjacent nucleotide triplets, sequence score, codon bias, most-like CDS (MLCDS), length percentage, score distance | 10-fold CV accuracy on human: 97.3%; minimum average error for vertebrates: < 0.1, plants: 0.24 | 111.3
CPAT | 2013 | Logistic regression | human | ORF length, ORF-to-transcript length ratio, Fickett score, hexamer usage bias | 10-fold accuracy: 99%, precision: 96% | 135.6
iSeeRNA | 2013 | SVM | human, mouse | frequency of six k-mers (GC, CT, TAG, TGT, ACG, TCG), conservation score, ORF length and proportion | Accuracy in lncRNA detection: human 96.1%, mouse 94.2%; accuracy in protein-coding gene detection: human 94.7%, mouse 92.7% | 19.5
PLEK | 2014 | SVM | 11 vertebrates | k-mer frequency (k = 1 to 5) | 10-fold CV accuracy: 95.6% | 50.5
lncRScan-SVM | 2015 | SVM | human, mouse | sum of exon lengths, frequency of exons, mean exon length, standard deviation of stop codon frequency, txCdsPredict | Two test sets: (A) random protein-coding and lncRNA sequences, (B) only dissimilar sequences. Accuracy on set A: human 91.54%, mouse 92.21%; on set B: human 91.45%, mouse 92.2%. MCC on set A: human 83.17%, mouse 84.59%; on set B: human 82.99%, mouse 84.69%. AUC on set A: human 96.39%, mouse 96.62%; on set B: human 96.39%, mouse 96.64% | 13.2
LncRNA-ID | 2015 | Random forests | human, mouse | ORF-related features, ribosomal interaction features, protein conservation scores | Specificity: human 95.28%, mouse 92.1%; recall: human 96.28%, mouse 94.45%; accuracy: human 95.78%, mouse 93.28% | 12.7
COME | 2016 | Random forest | human, mouse, nematode, fruit fly, Arabidopsis | GC content, DNA sequence conservation, protein conservation, polyA abundance, RNA secondary structure conservation, ORF score, expression specificity score | Accuracy: human 93.7%, Arabidopsis 98.3%, mouse 89.8%, nematode 98.9%, fruit fly 98.4% | 16.2
DeepLNC | 2016 | Deep neural network | human | k-mer combinations (k = 2 to 5) | 10-fold CV accuracy: 98.07%, MCC: 96%, recall: 98.98%, precision: 97.14%, AUC: 99.3% | 12.4
FEELnc | 2017 | Random forests | human, mouse | ORF features (coverage, length), sequence length, coding potential score, frequency-based k-mer score | Accuracy: human 91.9%, mouse 93.9%; sensitivity: human 92.3%, mouse 93.8%; specificity: human 91.5%, mouse 94.1%; F-score: human 91.9%, mouse 95.6%; MCC: human 83.8%, mouse 85.6% | 49.5
CPC2 | 2017 | Random forest | Species neutral; trained and tested on animals and plants (both protein-coding and non-coding genes) | ORF features (quality, coverage, integrity), Fickett score, isoelectric point | Accuracy: 96.1%, specificity: 97%, recall: 95.2%; accuracy in lncRNA detection: 94.2% | 97.3
PlncPRO | 2017 | Random forest | plants | 64 k-mer frequencies, ORF coverage, ORF score, BLASTX hits, significance, total bit score, frame entropy | | 13.8
lncRNAnet | 2018 | Convolutional neural network, recurrent neural network | human, mouse | sequence, ORF features (length, coverage, indicator) | 5-fold accuracy: 99%; accuracy: human 91.79%, mouse 91.83%; specificity: human 87.66%, mouse 89.03%; sensitivity: human 95.91%, mouse 94.63%; AUC: human 96.72%, mouse 96.67%. Test results are also available for 11 different species and for experimental NGS data | 22
lncADeep | 2018 | Deep belief network | human, mouse | ORF features (length, coverage, hexamer score of longest ORF, entropy density profile), UTR coverage, GC content of UTRs, Fickett score, HMMER index | lncRNA detection from full-length mRNA transcripts: precision 97.2%, recall 98.1%, average harmonic mean 97.7%; from both full- and partial-length mRNA transcripts: precision 94.5%, recall 93.8%, average harmonic mean 94.2%; from partial-length mRNA transcripts: precision 90.3%, recall 93.8%, average harmonic mean 92% | 22.6
LncFinder | 2018 | SVM | Trained on human; tested on human, mouse, wheat, zebrafish, chicken | genomic distance to lncRNA, genomic distance to protein-coding transcript, distance ratio, EIIP value | 10-fold CV accuracy: 96.87% | 17.6
BASiNET | 2018 | Decision tree on complex networks | datasets from PLEK and CPC2 | average shortest path, average betweenness centrality, average degree, assortativity, maximum degree, minimum degree, clustering coefficient, motif frequency | | 8.6
CREMA | 2018 | Random forest | human, mouse, rice, Arabidopsis | length, GC content, hexamer score, alignment identity, ratio of alignment length to mRNA length, ratio of alignment length to ORF length, transposable elements, sequence divergence from transposable element, ORF length, Fickett score | | 11
CNIT | 2019 | SVM | 11 animal species, 26 plant species | max_score of MLCDS, standard deviation of MLCDS scores and MLCDS lengths, frequency of 64 codons | Accuracy: human 98%, mouse 95%, zebrafish 93%, fruit fly 93%, Arabidopsis 98% | 20.5
PLIT | 2019 | Random forest | plants: Arabidopsis, soybean, rice, tomato, sorghum, grapevine, maize | transcript length, GC content, Fickett score, hexamer score, maximum ORF length, ORF coverage, mean ORF coverage, codon bias | AUC: 93.3% for all species except S. bicolor (75%) and Arabidopsis (85%) | 8.5
LGC | 2019 | Feature relationship | human, mouse, zebrafish, nematode, rice, tomato | GC content, ORF length, coding potential score | 10-fold CV accuracy: human 94.5%, mouse 93.6%, zebrafish 88.4%, nematode 93.3%, tomato 93.3%, rice 96.3% | 12.5
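ORF length and ORF coverage (ORF length divided by transcript length) recur across the table, for example in CPAT, CPC2, FEELnc, lncRNAnet, PLIT, LGC and CREMA. The sketch below shows one simple way to compute them. It assumes that an ORF runs from an ATG to the first in-frame stop codon on the forward strand only. Each tool defines ORFs in its own way, for instance in how it treats incomplete ORFs, so this is an illustration and not any tool's implementation.

```python
# Minimal sketch of ORF-length / ORF-coverage features.
# Assumption: ORF = ATG ... first in-frame stop (TAA/TAG/TGA), forward strand only.

STOP_CODONS = {"TAA", "TAG", "TGA"}


def longest_orf(seq: str) -> tuple[int, int]:
    """Return (start, end) of the longest complete ATG..stop ORF in the three
    forward frames; end is exclusive and includes the stop codon.
    Returns (0, 0) when no complete ORF exists."""
    seq = seq.upper().replace("U", "T")
    best = (0, 0)
    for frame in range(3):
        start = None
        for i in range(frame, len(seq) - 2, 3):
            codon = seq[i:i + 3]
            if start is None and codon == "ATG":
                start = i  # first ATG opens a candidate ORF in this frame
            elif start is not None and codon in STOP_CODONS:
                if (i + 3) - start > best[1] - best[0]:
                    best = (start, i + 3)
                start = None  # search for the next ATG after this stop
    return best


def orf_features(seq: str) -> dict[str, float]:
    """ORF length in nucleotides and ORF coverage (ORF length / transcript length)."""
    start, end = longest_orf(seq)
    length = end - start
    coverage = length / len(seq) if seq else 0.0
    return {"orf_length": length, "orf_coverage": coverage}


features = orf_features("CCATGAAATAGCC")  # orf_length == 9, orf_coverage == 9 / 13
```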
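Several tools rely on sequence-composition features. PLEK uses k-mer frequencies for k = 1 to 5, DeepLNC uses k = 2 to 5, PlncPRO uses 64 k-mer (trinucleotide) frequencies, and COME, CREMA, PLIT and LGC use GC content. The sketch below computes plain normalized k-mer frequencies with a step-1 sliding window, plus GC content. Any tool-specific weighting or windowing is deliberately left out, so these are simplified versions of the features rather than the tools' exact definitions.

```python
from itertools import product


def kmer_frequencies(seq: str, k: int) -> list[float]:
    """Relative frequencies of all 4**k DNA k-mers (lexicographic ACGT order),
    counted with a step-1 sliding window. Windows containing characters other
    than A/C/G/T (e.g. N) are skipped."""
    seq = seq.upper().replace("U", "T")
    kmers = ["".join(p) for p in product("ACGT", repeat=k)]
    index = {kmer: i for i, kmer in enumerate(kmers)}
    counts = [0] * len(kmers)
    total = 0
    for i in range(len(seq) - k + 1):
        j = index.get(seq[i:i + k])
        if j is not None:
            counts[j] += 1
            total += 1
    return [c / total if total else 0.0 for c in counts]


def kmer_profile(seq: str, k_max: int = 5) -> list[float]:
    """Concatenate frequency vectors for k = 1..k_max
    (4 + 16 + 64 + 256 + 1024 = 1364 values when k_max = 5)."""
    profile: list[float] = []
    for k in range(1, k_max + 1):
        profile.extend(kmer_frequencies(seq, k))
    return profile


def gc_content(seq: str) -> float:
    """Fraction of G and C among the A/C/G/T characters (U treated as T)."""
    seq = seq.upper().replace("U", "T")
    acgt = [c for c in seq if c in "ACGT"]
    return sum(c in "GC" for c in acgt) / len(acgt) if acgt else 0.0
```

A fixed-length vector like this, with one entry per possible k-mer, is what lets transcripts of very different lengths be fed to the same SVM or neural network.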
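CPAT's hexamer usage bias, and the hexamer scores listed for lncADeep, CREMA and PLIT, compare how often each in-frame hexamer appears in coding versus non-coding training sequences. Below is a minimal log-likelihood-ratio sketch. The score is the mean of log(F_coding(h) / F_noncoding(h)) over the in-frame hexamers of an ORF. Two assumptions apply. First, the frequency tables are built from in-frame hexamers of training sequences. Second, hexamers missing from either table are skipped. CPAT's own treatment of zero counts and its normalization may differ. The tables in the test are toy values and do not come from any trained model.

```python
import math
from collections import Counter


def hexamer_frequencies(seqs: list[str]) -> dict[str, float]:
    """Relative frequencies of in-frame hexamers (step 3) across training sequences."""
    counts: Counter[str] = Counter()
    for s in seqs:
        s = s.upper().replace("U", "T")
        for i in range(0, len(s) - 5, 3):
            counts[s[i:i + 6]] += 1
    total = sum(counts.values())
    return {h: c / total for h, c in counts.items()} if total else {}


def hexamer_score(orf: str, coding_freq: dict[str, float],
                  noncoding_freq: dict[str, float]) -> float:
    """Mean log-likelihood ratio over in-frame hexamers; > 0 leans coding."""
    orf = orf.upper().replace("U", "T")
    log_ratios = []
    for i in range(0, len(orf) - 5, 3):
        hexamer = orf[i:i + 6]
        fc = coding_freq.get(hexamer, 0.0)
        fn = noncoding_freq.get(hexamer, 0.0)
        if fc > 0 and fn > 0:  # assumption: skip hexamers unseen in either table
            log_ratios.append(math.log(fc / fn))
    return sum(log_ratios) / len(log_ratios) if log_ratios else 0.0
```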
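CONC and PORTRAIT both list compositional entropy as a feature. A common definition is the Shannon entropy of the residue or nucleotide composition, H = -Σ p_i log2 p_i, where p_i is the relative frequency of symbol i. The sketch computes that quantity for any string, nucleotide or amino acid. The exact variant each tool uses, for instance the log base or whether it works on the translated peptide, is an assumption here.

```python
import math
from collections import Counter


def compositional_entropy(seq: str) -> float:
    """Shannon entropy (bits) of the symbol composition of seq.
    0.0 for a homopolymer; log2(alphabet size) for a perfectly even composition."""
    if not seq:
        return 0.0
    counts = Counter(seq.upper())
    n = len(seq)
    return -sum((c / n) * math.log2(c / n) for c in counts.values())
```

Low-complexity sequences, such as repeats or poly-A stretches, score near zero, while a balanced composition approaches the maximum of 2 bits for nucleotides or about 4.32 bits for the 20 amino acids.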
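LncFinder lists an EIIP value among its features. EIIP stands for electron-ion interaction pseudopotential, and it maps each nucleotide to a number, which turns a transcript into a numeric signal. A period-3 component in that signal is a known signature of protein-coding sequence. The sketch maps a sequence to EIIP values and evaluates the signal's Fourier power at frequency 1/3, which is bin N/3 when the length N is a multiple of 3. It assumes the commonly cited EIIP values A = 0.1260, C = 0.1340, G = 0.0806 and T = 0.1335. LncFinder's actual EIIP-derived features are more elaborate than this single quantity.

```python
import cmath

# Assumption: commonly cited EIIP values per nucleotide.
EIIP = {"A": 0.1260, "C": 0.1340, "G": 0.0806, "T": 0.1335}


def eiip_signal(seq: str) -> list[float]:
    """Map A/C/G/T (U treated as T) to EIIP values; other characters are dropped."""
    seq = seq.upper().replace("U", "T")
    return [EIIP[c] for c in seq if c in EIIP]


def period3_power(seq: str) -> float:
    """|sum_n x[n] * exp(-2*pi*i*n/3)|**2 for the EIIP signal x (0.0 if empty)."""
    x = eiip_signal(seq)
    total = sum(v * cmath.exp(-2j * cmath.pi * n / 3) for n, v in enumerate(x))
    return abs(total) ** 2
```

Because the power grows with sequence length, comparing transcripts of different lengths would need some normalization. That choice is left open here.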
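The performance column mixes accuracy, sensitivity (recall), specificity, precision, F-score, MCC and AUC. All of these except AUC follow directly from a confusion matrix. The sketch computes them, assuming lncRNA is the positive class; tools differ in which class they treat as positive. AUC is left out because it needs ranked prediction scores rather than a single confusion matrix.

```python
import math
from dataclasses import dataclass


@dataclass
class Confusion:
    tp: int  # lncRNAs predicted as lncRNA (assumed positive class)
    fp: int  # protein-coding transcripts predicted as lncRNA
    tn: int  # protein-coding transcripts predicted as protein-coding
    fn: int  # lncRNAs predicted as protein-coding


def _ratio(num: float, den: float) -> float:
    return num / den if den else 0.0


def metrics(c: Confusion) -> dict[str, float]:
    mcc_den = math.sqrt((c.tp + c.fp) * (c.tp + c.fn) * (c.tn + c.fp) * (c.tn + c.fn))
    return {
        "accuracy": _ratio(c.tp + c.tn, c.tp + c.fp + c.tn + c.fn),
        "sensitivity": _ratio(c.tp, c.tp + c.fn),  # also called recall
        "specificity": _ratio(c.tn, c.tn + c.fp),
        "precision": _ratio(c.tp, c.tp + c.fp),
        "f1": _ratio(2 * c.tp, 2 * c.tp + c.fp + c.fn),
        "mcc": _ratio(c.tp * c.tn - c.fp * c.fn, mcc_den),
    }


m = metrics(Confusion(tp=90, fp=20, tn=80, fn=10))  # accuracy 0.85, sensitivity 0.9
```

MCC ranges from -1 to 1. An MCC reported as a percentage in the table, such as 83.17%, corresponds to 0.8317 on this scale.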
