Tag: K-mer
A graph-based genome and pan-genome variation of the model plant Setaria
Variation and evolution in Setaria. We collected genome-wide resequencing data for 630 wild (S. viridis), 829 landrace and 385 modern cultivated accessions from the Setaria genus with an average sequencing depth of ~15×, of which 1,004 were newly generated and 840 were from previous studies¹⁶,²¹ (Supplementary Table 1). After aligning…
A self-transmissible plasmid from a hyperthermophile that facilitates genetic modification of diverse Archaea
Biomedicines | Free Full-Text | High-Accuracy ncRNA Function Prediction via Deep Learning Using Global and Local Sequence Information
1. Introduction. In recent years, growing access to high-throughput transcriptome sequencing technologies has led to the discovery of an increasing number of novel transcripts from various species. The majority of these transcripts are non-coding ribonucleic acid (ncRNA) molecules, short sequences of RNA that, with the exception of a small…
Characterization of metagenome-assembled genomes from the International Space Station | Microbiome
Metagenome-assembled bacterial genomes. Out of the 42 ISS metagenomes submitted to NCBI, only the PMA-treated metagenomes (n = 21), representing viable/intact cells, were used for generating bacterial MAGs. Characteristics of the MAGs (n = 46), such as genome size (2.6 to 6.6 Mb), completeness, contamination percentage, average mean coverage, number…
Illumina Complete Long Reads software analysis workflow for human WGS
Introduction. Next-generation sequencing (NGS) enables scientists to decipher the genome for a deeper understanding of biology. Proven Illumina sequencing by synthesis (SBS) chemistry combined with award-winning DRAGEN secondary analysis delivers whole-genome sequencing (WGS) data with outstanding accuracy.¹,² DRAGEN Multigenome (graph) further improves mapping accuracy in challenging regions by ~50%.¹ Still,…
The inchworm process failed. Trinity running error.
Hello everyone, I’m trying to perform a de novo transcriptome assembly using Trinity and am having many issues. The last time, I got the Inchworm error attached:
** Warning, Trinity cannot determine which version of Java is being used. Version 1.7 is required…
Chromosome-level genome assemblies from two sandalwood species provide insights into the evolution of the Santalales
Genome sequencing and assembly. We sequenced and assembled genomes for the sandalwood species S. album and S. yasi (Fig. 1). In total, ~23 Gb and ~25 Gb of clean short reads of S. album and S. yasi, respectively, were obtained for the genomic survey (Supplementary Tables 1 and 2). According to k-mer analysis, the…
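K-mer-based genome surveys commonly estimate haploid genome size as the total number of k-mers divided by the depth of the main (homozygous) peak of the k-mer spectrum. A minimal sketch, assuming a histogram of (multiplicity, number of distinct k-mers) pairs such as the two-column output of a k-mer histogram tool; the error cutoff and the toy histogram are hypothetical values chosen for illustration.

```python
def estimate_genome_size(hist: dict[int, int], error_cutoff: int = 5) -> float:
    """Estimate genome size as total k-mers / peak depth.

    hist maps multiplicity -> number of distinct k-mers with that multiplicity.
    Multiplicities <= error_cutoff are treated as sequencing errors and ignored
    (hypothetical cutoff; in practice it is read off the spectrum's first trough).
    """
    kept = {m: n for m, n in hist.items() if m > error_cutoff}
    # The homozygous peak is the multiplicity shared by the most distinct k-mers.
    peak_depth = max(kept, key=kept.get)
    # Each distinct k-mer seen m times contributes m k-mer occurrences.
    total_kmers = sum(m * n for m, n in kept.items())
    return total_kmers / peak_depth
```

For the toy histogram `{1: 5000, 2: 800, 20: 100, 30: 300, 40: 100}`, the peak is at 30×, the non-error k-mers total 15,000, and the estimate is 500 bp. Real surveys (and tools built for this) also model heterozygosity and repeats.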
Comparative genome features and secondary metabolite biosynthetic potential of Kutzneria chonburiensis and other species of the genus Kutzneria
CircSSNN: circRNA-binding site prediction via sequence self-attention neural networks with pre-normalization | BMC Bioinformatics
Datasets. To verify the effectiveness of CircSSNN, we adopted 37 circRNA datasets as benchmark datasets, following the baselines we compared against [15, 16]. We first downloaded the datasets from the circRNA interactome database (circinteractome.nia.nih.gov/). Subsequently, we obtained 335,976 positive samples and 335,976 negative samples following the process of iCircRBP-DHN [17]…
An unusual tandem kinase fusion protein confers leaf rust resistance in wheat
Plant material. Bread wheat accessions Transfer (TA5524), WL711, TA5605, Ae. umbellulata accession TA1851 and Ae. triuncialis accession TA10438 were obtained from the Wheat Genetics Resource Center (WGRC). TcLr9 (Transfer/6*Thatcher) is a near-isogenic line carrying Lr9 from Transfer in the genetic background of the susceptible wheat line Thatcher. TcLr9 and TA5605…
A preliminary study of the use of MinION sequencing to specifically detect Shiga toxin-producing Escherichia coli in culture swipes containing multiple serovars of this species
RPI-EDLCN: An Ensemble Deep Learning Framework Based on Capsule Network for ncRNA-Protein Interaction Prediction
Noncoding RNAs (ncRNAs) play crucial roles in many cellular activities by interacting with proteins. Identification of ncRNA-protein interactions (ncRPIs) is key to understanding the function of ncRNAs. Although a number of computational methods for predicting ncRPIs have been developed, the problem of predicting ncRPIs remains challenging. It has always…
Hybrids of RNA viruses and viroid-like elements replicate in fungi
Ribozyme search of the Sequence Read Archive. Observing that ribozymes are short enough to be captured on a single short sequencing read (less than 100 nt), we reasoned that it would be possible to screen large volumes of sequencing data to identify libraries potentially containing ribozyme agents. To this end we adapted…
A high-quality chromosomal-level genome assembly of Greater Scaup (Aythya marila)
Ethics statement. All animal experimental procedures were approved by the Biomedical Ethics Committee of Qufu Normal University (approval number: 2022001). Sampling and sequencing. The experimental sample was a wounded male duck found during a wild bird survey in Jiangsu, China, which died unexpectedly during rescue. We dissected the sample and…
Error in genome mapping by BWA tools in Linux
$ gmap_build -D:\btau8refflat.gtf
Unknown option: D:btau8refflat.gtf
-k flag not specified, so building main hash table with default 15-mers
-j flag not specified, so building regional hash tables with default 6-mers
gmap_build: Builds a gmap database for a genome to be used by GMAP or GSNAP. Part of GMAP package, version…
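The log mentions that gmap_build builds a hash table of 15-mers by default. As a rough illustration of that idea (not GMAP's actual index format), a k-mer index maps each k-mer to the positions where it occurs, so exact seed hits from a read imply candidate alignment start positions. The function names and the toy sequence are hypothetical.

```python
from collections import defaultdict


def build_kmer_index(genome: str, k: int = 15) -> dict[str, list[int]]:
    """Map every k-mer in `genome` to the 0-based positions where it starts."""
    index: dict[str, list[int]] = defaultdict(list)
    for i in range(len(genome) - k + 1):
        index[genome[i:i + k]].append(i)
    return dict(index)


def seed_hits(read: str, index: dict[str, list[int]], k: int) -> set[int]:
    """Return candidate genome start positions implied by exact k-mer seed hits."""
    starts = set()
    for offset in range(len(read) - k + 1):
        for pos in index.get(read[offset:offset + k], []):
            # A k-mer found at genome position `pos` and read offset `offset`
            # suggests the read starts at `pos - offset`.
            starts.add(pos - offset)
    return starts
```

With `k = 4`, the read `GTTTG` yields the single candidate start 6 in `ACGTACGTTTGCA`; an aligner would then verify candidates with a full alignment.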
Introducing GPMeta: Ultrarapid GPU-accelerate | EurekAlert!
[Image: Runtime of GPMeta versus existing solutions. Credit: BGI Genomics]
Metagenomic sequencing (mNGS) is a powerful diagnostic tool to detect causative pathogens in clinical microbiological testing. Rapid and accurate classification of metagenomic sequences is a critical procedure for pathogen identification in the dry-lab step of mNGS tests. However, this…
Phenotypic and Genetic Analysis of KPC-49
Introduction. The worldwide dissemination of carbapenem-resistant Enterobacteriaceae (CRE), particularly carbapenem-resistant K. pneumoniae (CRKP), poses a significant risk to public health. CRKP can cause various infections, such as urinary tract infections, bloodstream infections, and pneumonia, leading to high morbidity and mortality.¹ Prevention and control of K. pneumoniae infection are becoming more…
kallisto bootstrap / conda installation problem
I have used kallisto in the past, but am now struggling to get it to work on a new computer (MacBook M1). When I install kallisto using brew and try to run kallisto quant, I get an error about bootstraps not being generated: ‘Warning: kallisto…
An apicomplexan parasite drives the collapse of the bay scallop population in New York
Inference of phylogenetic trees directly from raw sequencing reads using Read2Tree
State-of-the-art phylogenomic pipelines require many steps, which can be both time consuming and error prone (Fig. 1a). With Read2Tree, we directly process raw sequencing reads and reconstruct sequence alignments for conventional tree inference methods (Fig. 1b and Supplementary Fig. 1). We start by aligning raw reads to nucleotide sequences derived…
Co-evolution of large inverted repeats and G-quadruplex DNA in fungal mitochondria may facilitate mitogenome stability: the case of Malassezia
NGS: Sequence QC – Texas A&M HPRC
Evaluation
FastQC
GCATemplates available: grace, terra
module spider FastQC
After running FastQC via the command line, you can ssh to an HPRC cluster with X11 forwarding enabled (the -X option) and view the images using the eog tool. From your desktop:
ssh -X username@grace.hprc.tamu.edu
From your FastQC working…
removing lines of code from a function?
I’m working on a project for a bioinformatics class. We are given several DNA strings and an integer k. The project’s goal is to identify a k-mer motif that minimises the total of the Hamming distances between the motif and each DNA string. So, first, look at…
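The task described is the median string problem: find the k-mer Pattern minimising d(Pattern, Dna), the sum over all strings of the smallest Hamming distance between Pattern and any k-mer of that string. A brute-force sketch that tries all 4^k candidates (practical only for small k); the function names are hypothetical and not taken from the original post.

```python
from itertools import product


def hamming(a: str, b: str) -> int:
    """Number of positions at which equal-length strings a and b differ."""
    return sum(x != y for x, y in zip(a, b))


def distance_to_strings(pattern: str, dna: list[str]) -> int:
    """Sum, over all strings, of the smallest Hamming distance between
    `pattern` and any k-mer of that string."""
    k = len(pattern)
    return sum(
        min(hamming(pattern, text[i:i + k]) for i in range(len(text) - k + 1))
        for text in dna
    )


def median_string(dna: list[str], k: int) -> str:
    """Try all 4**k k-mers and return the first one with minimal total distance."""
    best, best_dist = "", float("inf")
    for letters in product("ACGT", repeat=k):
        pattern = "".join(letters)
        d = distance_to_strings(pattern, dna)
        if d < best_dist:
            best, best_dist = pattern, d
    return best
```

Several k-mers can tie for the minimum, so any of them is a valid answer; this version returns the first in lexicographic order.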
Comprehensive benchmark and architectural analysis of deep learning models for nanopore sequencing basecalling | Genome Biology
Benchmark setup. We first developed a basecalling benchmarking framework enabling new and existing basecalling algorithms to be easily compared. Moreover, our benchmark facilitates the study of individual components of basecallers, as different combinations of basecaller components can readily be evaluated. The framework is divided into two main components: (i) standardized…
Super-pangenome analyses highlight genomic diversity and structural variation across wild and cultivated tomato species
The Biostar Herald for Monday, April 03, 2023
The Biostar Herald publishes user submitted links of bioinformatics relevance. It aims to provide a summary of interesting and relevant information you may have missed. You too can submit links here. This edition of the Herald was brought to you by contribution from Istvan Albert, and was edited by Istvan…
poor classification using qiime2 – User Support
Good morning, I am having some difficulty getting results, even though my pipeline has not changed. Specifically, I obtain a rather poor classification: half of the sequences are assigned only to Bacteria or OD1 (and, in addition, the number of OTUs is very low, e.g. 900). So I think…
Chromosome-level genome assembly of the critically endangered Baer’s pochard (Aythya baeri)
Ethics statement. All animal handling and experimental procedures were approved by the Qufu Normal University Biomedical Ethics Committee (approval number: 2022001). Sample and sequencing. Baer’s pochard tissue for whole-genome sequencing was obtained from a dead individual that had strayed into a fishing net in Shandong (China). The muscle tissue that…
Multi-faceted metagenomic analysis of spacecraft associated surfaces reveal planetary protection relevant microbial composition
2023 Mar 22;18(3):e0282428. doi: 10.1371/journal.pone.0282428. eCollection 2023. Sarah K Highlander (1), Jason M Wood (2), John D Gillece (1, 3), Megan Folkerts (1), Viacheslav Fofanov (3, 4), Tara Furstenau (3), Nitin K Singh (2), Lisa Guan (2), Arman Seuylemezian (2), James N Benardini (2), David M Engelthaler …
Preadapted to adapt: underpinnings of adaptive plasticity revealed by the downy brome genome
Functional metagenomics uncovers nitrile-hydrolysing enzymes in a coal metagenome
Introduction. Organic compounds containing a cyano (-C≡N) group are known as nitriles and are widely distributed in the natural environment. Plants generate them in various forms, such as ricinine, phenylacetonitrile, cyanogenic glycosides, and β-cyanoalanine (Sewell et al., 2003). Anthropogenic activities have substantially influenced the production of vast quantities of nitrile…
Diagnostic Performance of mNGS in Detecting IAI
Introduction. An intra-abdominal abscess is a collection of pus or infected fluid located inside or near the liver, kidneys, pancreas, spleen, or other abdominal organs.¹ Unlike skin abscesses with obvious signs of redness and swelling,² intra-abdominal abscesses occur less frequently and are often difficult to identify, of which patients may…
Jellyfish Output
Hi, I used the command below to count the k-mers in a genome:
jellyfish count -m 3 -s 5M -L 1 B_amyloliquefaciens_CIAD-IB72.fna -o 3mers_jellyfish_output/B_amyloliquefaciens_CIAD-IB72_3.jf --text
and here is part of the output that I get: I want my output to only include the k-mers and their…
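If the goal is just a plain two-column list of k-mers and counts, the counting itself is simple enough to sketch directly for a small genome. A minimal version, assuming a (multi-)FASTA input and counting forward-strand k-mers only (jellyfish counts non-canonical k-mers unless -C is given); the helper names are hypothetical, and this is not a replacement for jellyfish on large inputs.

```python
from collections import Counter

VALID_BASES = frozenset("ACGT")


def read_fasta(path: str) -> list[str]:
    """Return the sequences in a FASTA file, uppercased, one string per record."""
    seqs, current = [], []
    with open(path) as handle:
        for line in handle:
            line = line.strip()
            if line.startswith(">"):
                if current:
                    seqs.append("".join(current))
                current = []
            elif line:
                current.append(line.upper())
    if current:
        seqs.append("".join(current))
    return seqs


def count_kmers(seqs: list[str], k: int) -> Counter:
    """Count every k-mer made only of A/C/G/T; k-mers containing N etc. are skipped."""
    counts: Counter = Counter()
    for seq in seqs:
        for i in range(len(seq) - k + 1):
            kmer = seq[i:i + k]
            if VALID_BASES.issuperset(kmer):
                counts[kmer] += 1
    return counts
```

For example, `count_kmers(read_fasta("B_amyloliquefaciens_CIAD-IB72.fna"), 3)` returns a Counter whose items can be printed as one `kmer count` pair per line.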
QUAST Genome Assembly Quality Assessment
Genetic variation studies often involve analyzing samples from a previously studied species. For instance, it is often of interest to examine the genomes of various cultivars, strains, or populations of the same species. In such cases, it may be necessary to perform de novo DNA-Seq assembly to obtain the genome of the…
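Among the contiguity statistics QUAST reports is N50: the length L such that contigs of length at least L together cover at least half of the total assembly length. A minimal sketch of that definition (QUAST also reports related metrics such as NG50 and L50); the contig lengths in the note are made up.

```python
def n50(lengths: list[int]) -> int:
    """Return N50: walking contigs from longest to shortest, the length at which
    the running total first reaches half of the total assembly length."""
    total = sum(lengths)
    running = 0
    for length in sorted(lengths, reverse=True):
        running += length
        # Compare doubled sums to avoid rounding issues with odd totals.
        if 2 * running >= total:
            return length
    raise ValueError("no contigs given")
```

For contigs of 100, 80, 50, 30, 20 and 10 bp (290 bp in total), the two longest contigs already reach 180 bp, which is at least 145 bp, so N50 = 80.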
implementation of k-mer counting from krakenuniq to kraken2
Unless I am reading this wrong, the authors say that KrakenUniq is only compatible with Kraken 1 databases, not Kraken 2. You may be able to choose a classifier using the advice on that page.
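KrakenUniq's distinguishing feature is that, alongside read counts, it reports how many distinct k-mers were observed for each taxon, which helps flag taxa supported by many reads but few unique k-mers. KrakenUniq estimates these counts with HyperLogLog sketches; the simplified sketch below instead uses exact Python sets and attributes all k-mers of a read to the read's assigned taxon. The (read, taxon) pairs are assumed inputs, not real Kraken output.

```python
from collections import defaultdict


def unique_kmers_per_taxon(assignments: list[tuple[str, str]], k: int) -> dict[str, int]:
    """Given (read_sequence, assigned_taxon) pairs, count distinct k-mers per taxon.

    Exact sets are used for clarity; KrakenUniq uses HyperLogLog cardinality
    estimates instead so that memory stays bounded.
    """
    kmers_by_taxon: dict[str, set[str]] = defaultdict(set)
    for read, taxon in assignments:
        for i in range(len(read) - k + 1):
            kmers_by_taxon[taxon].add(read[i:i + k])
    return {taxon: len(kmers) for taxon, kmers in kmers_by_taxon.items()}
```

Duplicate reads raise a taxon's read count but not its distinct k-mer count, which is exactly the signal this metric is meant to capture.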
Hybrid de novo genome assembly and comparative genomics of three different isolates of Gnomoniopsis castaneae
Detection of Streptococcus pyogenes M1UK in Australia and characterization of the mutation driving enhanced expression of superantigen SpeA
MMseqs error: Filter prefilter died
Hi all, I’m testing MMseqs to assign taxonomy. Everything runs smoothly until the taxonomy assignment step, where I get: “Error: orf filter prefilter died”. Has anyone experienced this and knows a workaround? I posted this issue on MMseqs’ GitHub page but got…
A wheat kinase and immune receptor form host-specificity barriers against the blast fungus
Wheat blast, caused by Pyricularia oryzae (syn. Magnaporthe oryzae) pathotype Triticum, was first identified in Brazil in 1985 (ref. 1). The pathogen subsequently spread to cause epidemics in other regions of Brazil and neighbouring countries, including Bolivia and Paraguay². Outbreaks of wheat blast occurred in Bangladesh in 2016, and the…
K-mer sequencing vs sequence-alignment
I am a complete rookie in bioinformatics, so please bear with me and use simple language (I am a computer scientist) 🙂 How can we use k-mers to find out whether a gene is similar to our query string? For example: We have a reference…
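One alignment-free answer to this question: break both sequences into their sets of k-mers and measure the overlap, for instance with the Jaccard index |A ∩ B| / |A ∪ B|. Similar sequences share many k-mers; unrelated ones share few. A minimal sketch; the default k = 5 and the example sequences are arbitrary choices.

```python
def kmer_set(seq: str, k: int) -> set[str]:
    """All distinct substrings of length k in seq."""
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}


def jaccard(a: str, b: str, k: int = 5) -> float:
    """Jaccard similarity of the k-mer sets of a and b
    (0.0 = no shared k-mers, 1.0 = identical k-mer sets)."""
    sa, sb = kmer_set(a, k), kmer_set(b, k)
    union = sa | sb
    if not union:
        # Both sequences are shorter than k: no k-mers to compare.
        return 0.0
    return len(sa & sb) / len(union)
```

For example, `ACGTT` and `ACGTA` with k = 3 share `ACG` and `CGT` out of four distinct 3-mers, giving 0.5. A single substitution can change up to k overlapping k-mers, so a smaller k tolerates mismatches better while a larger k is more specific.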
Genetic mapping of microbial and host traits reveals production of immunomodulatory lipids by Akkermansia muciniphila in the murine gut
Animal studies. Animal care and study protocols were approved by the AAALAC-accredited Institutional Animal Care and Use Committee of the College of Agricultural and Life Sciences at the University of Wisconsin-Madison (UW-Madison). All experiments with mice were performed under protocols approved by the UW-Madison Animal Care and Use Committee (Protocol number…
comparing two metagenomics data sets
Hello all, I have a shotgun metagenomic dataset of paired-end reads (20 samples). I want to compare my data to another published dataset that is available online, but I am unsure how to do it. Please let me know if you have any ideas. Thanks…
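For whole metagenomes, exact k-mer sets get large, so tools such as Mash compare small MinHash sketches instead, which estimate the Jaccard index between the k-mer content of two datasets. A toy bottom-s MinHash sketch, assuming each dataset fits in memory as a list of reads; the BLAKE2 hash, k and sketch size are illustrative choices, not Mash's actual parameters.

```python
import hashlib
import heapq


def kmer_hash(kmer: str) -> int:
    """Deterministic 64-bit hash of a k-mer."""
    return int.from_bytes(hashlib.blake2b(kmer.encode(), digest_size=8).digest(), "big")


def sketch(reads: list[str], k: int = 21, size: int = 1000) -> set[int]:
    """Bottom-`size` MinHash sketch: the smallest hash values over all distinct k-mers."""
    hashes = {kmer_hash(r[i:i + k]) for r in reads for i in range(len(r) - k + 1)}
    return set(heapq.nsmallest(size, hashes))


def estimate_jaccard(s1: set[int], s2: set[int], size: int = 1000) -> float:
    """Estimate Jaccard: the fraction of the union's `size` smallest hashes
    that appear in both sketches."""
    union_bottom = heapq.nsmallest(size, s1 | s2)
    if not union_bottom:
        return 0.0
    shared = sum(1 for h in union_bottom if h in s1 and h in s2)
    return shared / len(union_bottom)
```

When the sketch size exceeds the number of distinct k-mers, the estimate equals the exact Jaccard index; with larger datasets it becomes an approximation whose accuracy improves with sketch size.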
Draft genomes of Blastocystis subtypes from human samples of Colombia | Parasites & Vectors
A chromosome-level genome assembly of Plantago ovata
Genome assembly and chromosome identification. A Plantago ovata reference genome was generated using a total of 5.98 million PacBio long reads (7 cells, 40.21 Gb, N50 = 10.45 kb, length range 50 bp–121.17 kb) and 636.5 million (47.74 Gb) Hi-C short reads. PacBio reads were used to assemble contigs, while Hi-C reads were used to achieve a chromosome-level assembly. The final…
Annelid functional genomics reveal the origins of bilaterian life cycles
Need help understanding reference transcriptome and where to download
Hello, apologies for a pretty elementary question. I tried my best to answer it using online resources, but I find many tutorials/explanations out there difficult to understand. I am trying to quantify human RNA-seq data using salmon. The reason I am using salmon is because I would like to perform…
Determinants of associations between codon and amino acid usage patterns of microbial communities and the environment inferred based on a cross-biome metagenomic analysis
Data collection. Metagenomic project information was collected from the MGnify metagenomic database³¹. Currently (September 2021), microbiome data (sequence, taxonomic, and functional information, etc.) of 325,323 environmental samples can be found in this database. Often, microbes from similar ecological communities have been studied by different groups at different times and locations…
In Silico Validation Of NcRNA-ncRNA Interaction Sites With NcRNAs Represented By K-mers Features
A recent catalogue of the human transcriptome, the CHESS database, assembled from RNA sequencing experiments as part of the Genotype-Tissue Expression (GTEx) Project, reported more non-coding RNA genes (21,856) than protein-coding genes (21,306), revealing an unexpectedly vast amount of transcriptional noise (Pertea et al., 2018). In this study, we introduce…
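Representing an ncRNA by k-mer features usually means a fixed-length vector with one entry per possible k-mer (4^k entries), holding that k-mer's normalised frequency in the sequence. A minimal sketch assuming RNA input over {A, C, G, U} and lexicographic feature order; the exact feature set used in the study is not specified here.

```python
from itertools import product


def kmer_features(seq: str, k: int = 3, alphabet: str = "ACGU") -> list[float]:
    """Return the normalised frequency of each of the len(alphabet)**k k-mers,
    in lexicographic order of `alphabet`. k-mers with other letters are skipped."""
    vocab = ["".join(p) for p in product(alphabet, repeat=k)]
    index = {kmer: i for i, kmer in enumerate(vocab)}
    counts = [0] * len(vocab)
    for i in range(len(seq) - k + 1):
        j = index.get(seq[i:i + k])
        if j is not None:
            counts[j] += 1
    total = sum(counts)
    return [c / total for c in counts] if total else [0.0] * len(vocab)
```

For `AAAC` with k = 2 the vector has 16 entries, with 2/3 at `AA` and 1/3 at `AC`. Vectors for several values of k are often concatenated before being passed to a classifier.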
Genomic signatures associated with maintenance of genome stability and venom turnover in two parasitoid wasps
Genomic features of two Anastatus wasps, A. japonicus and A. fulloi. We employed PacBio high-fidelity (HiFi) long-read sequencing and Illumina short-read sequencing technologies to generate high-quality contigs for two Anastatus wasps, A. japonicus and A. fulloi (Supplementary Tables 1 and 2). These contigs were further scaffolded using Hi-C libraries to…
Building a Simulated Metagenomic Dataset
(tags: ‘JPL: Genetic Inventory Project’) Here we’ll create a simulated metagenomic dataset for controlled testing. This dataset was used to determine the Kraken 2 confidence score that best…
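A controlled dataset of this kind can be sketched by sampling fixed-length reads from a set of reference genomes in proportion to chosen relative abundances, so the true origin of every read is known. A minimal error-free sketch; the genome names, abundances and read length are hypothetical, and dedicated read simulators also model sequencing errors, quality scores and paired ends.

```python
import random


def simulate_reads(genomes: dict[str, str], abundances: dict[str, float],
                   n_reads: int, read_len: int = 150,
                   seed: int = 0) -> list[tuple[str, str]]:
    """Sample (source_name, read) pairs.

    Each read comes from a genome chosen with probability proportional to its
    abundance weight, starting at a uniformly random position.
    """
    rng = random.Random(seed)  # fixed seed makes the dataset reproducible
    names = list(genomes)
    weights = [abundances[name] for name in names]
    reads = []
    for _ in range(n_reads):
        name = rng.choices(names, weights=weights, k=1)[0]
        genome = genomes[name]
        start = rng.randrange(len(genome) - read_len + 1)
        reads.append((name, genome[start:start + read_len]))
    return reads
```

Here the weights are read proportions; if abundances are meant per cell, they would first need to be multiplied by genome length. Because every read keeps its source label, classifier output at different confidence settings can be scored directly against the truth.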
KGDCMI: A New Approach for Predicting circRNA-miRNA Interactions From Multi-Source Information Extraction and Deep Learning
Xin-Fei Wang et al. Front Genet. 2022. doi: 10.3389/fgene.2022.958096. eCollection 2022. Affiliations: 1 School of Information Engineering, Xijing University, Xi’an, China. 2 College of Grassland and Environment Sciences, Xinjiang Agricultural University, Urumqi, China. 3 School of Computer Science, Northwestern Polytechnical University, Xi’an, China.
Mapping reads using kallisto – rna seq analysis
Hi, I’m trying to map reads to a reference genome using kallisto for RNA-seq analysis in the terminal on a Mac, and the following command keeps running for hours without finishing. I’m not exactly sure where I’ve gone wrong. kallisto index…
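kallisto builds its index from transcript sequences and pseudoaligns reads to them, using k-mers (k = 31 by default) to find the set of transcripts each read is compatible with. A toy illustration of that compatibility idea, intersecting the transcript sets of a read's k-mers; kallisto's real index is a coloured de Bruijn graph with many optimisations, and the transcripts and small k here are made up.

```python
from collections import defaultdict


def build_index(transcripts: dict[str, str], k: int) -> dict[str, set[str]]:
    """Map each k-mer to the set of transcript names containing it."""
    index: dict[str, set[str]] = defaultdict(set)
    for name, seq in transcripts.items():
        for i in range(len(seq) - k + 1):
            index[seq[i:i + k]].add(name)
    return dict(index)


def pseudoalign(read: str, index: dict[str, set[str]], k: int) -> set[str]:
    """Intersect the transcript sets of the read's k-mers found in the index.

    k-mers absent from the index are ignored in this toy version; an empty
    result means no transcript is compatible with the read.
    """
    compatible = None
    for i in range(len(read) - k + 1):
        hits = index.get(read[i:i + k])
        if hits is None:
            continue
        compatible = set(hits) if compatible is None else compatible & hits
    return compatible or set()
```

With transcripts `t1 = ACGTACGGA` and `t2 = ACGTACTTT` and k = 4, the read `CGTACGG` is compatible only with `t1`, while `ACGTAC` is compatible with both; kallisto's quantification step then distributes such ambiguous reads across transcripts.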
Mitogenome-wise codon usage pattern from comparative analysis of the first mitogenome of Blepharipa sp. (Muga uzifly) with other Oestroid flies
Outcome of DNA sequencing, assembly, and validation. In this study, total DNA was first isolated from the finely chopped, full-grown pupa of Blepharipa sp. Both the NanoDrop spectrophotometer (1294 ng/μl) and the Qubit fluorometer (732.8 ng/μl) indicated that the concentration of total DNA in the sample was at an optimum level for mitochondrial DNA enrichment. The TapeStation profile showed…
Reference-based alignment using MUSKET
I’m running MUSKET on my dataset trimmed_data.tar.gz using 1000, 2000, and 4000 threads on an HPC cluster. I’ve been unable to obtain any results because the software seems to run for a very long time.
./../musket-1.1/musket -k 90 600000000 -p 1000 -zlib 9 -ino…
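The second value passed to -k in the command above (600000000) is an estimate of the number of distinct k-mers. Memory-saving structures such as Bloom filters, which many k-mer tools use, are sized from an estimate like this. Assuming a standard Bloom filter (an assumption for illustration; consult MUSKET's own documentation for how it actually uses the value), the optimal number of bits for n items at false-positive rate p is m = -n ln p / (ln 2)^2, with about (m/n) ln 2 hash functions.

```python
import math


def bloom_filter_size(n_items: int, fp_rate: float) -> tuple[int, int]:
    """Return (bits, hash_functions) for a standard Bloom filter holding
    n_items at the target false-positive rate, using the usual optimal formulas."""
    bits = math.ceil(-n_items * math.log(fp_rate) / (math.log(2) ** 2))
    hashes = max(1, round(bits / n_items * math.log(2)))
    return bits, hashes
```

For n = 600,000,000 and p = 0.01 this gives about 5.75 × 10^9 bits (roughly 720 MB) and 7 hash functions.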
CRISPR-VAE: A Method for Explaining CRISPR/Cas12a Predictions, and an Efficiency-aware gRNA Sequence Generator
Abstract. Deep learning has shown great promise in the prediction of gRNA efficiency, which helps optimize engineered gRNAs and has thus greatly improved the usage of CRISPR-Cas systems in genome editing. However, the black-box predictions of deep learning methods do not provide adequate explanation of the factors…
Pangenome-based genome inference allows efficient and accurate genotyping across a wide spectrum of variant classes
Sequencing data. We used publicly available sequencing data from the GIAB consortium⁴⁵, 1000 Genomes Project high-coverage data⁴⁶ and the Human Genome Structural Variation Consortium (HGSVC)⁴. All datasets include only samples consented for public dissemination of the full genomes. Statistics and reproducibility. For generating the assemblies, we used all 14 samples for…
SAWRPI: A Stacking Ensemble Framework With Adaptive Weight for Predicting ncRNA-Protein Interactions Using Sequence Information
Zhong-Hao Ren et al. Front Genet. 2022. doi: 10.3389/fgene.2022.839540. eCollection 2022. Affiliations: 1 School of Information Engineering, Xijing University, Xi’an, China. 2 School of Computer Science, Northwestern Polytechnical University, Xi’an, China.
Should I trim adapter sequences and filter by phred score, before alignment by salmon? : bioinformatics
First, trimming adapters is definitely necessary, as they are essentially a form of contamination. For quality trimming and filtering, I would highly recommend reading “Trimming of sequence reads alters RNA-Seq gene expression estimates”. Essentially, they show that aggressive trimming is a problem. To quote from the Conclusions: The…
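Since the cited study argues against aggressive trimming, a gentle option is to remove only low-quality bases from the 3' end. A minimal sketch, assuming Phred+33 (Sanger/Illumina 1.8+) FASTQ quality encoding; the cutoff of 20 is an illustrative value rather than a recommendation from the post, and dedicated trimmers use more robust algorithms than this simple tail trim.

```python
def phred_scores(quality: str, offset: int = 33) -> list[int]:
    """Decode a FASTQ quality string (Phred+33 by default) into integer scores."""
    return [ord(ch) - offset for ch in quality]


def trim_3prime(seq: str, quality: str, cutoff: int = 20) -> tuple[str, str]:
    """Drop bases from the 3' end while their Phred score is below `cutoff`."""
    scores = phred_scores(quality)
    end = len(scores)
    while end > 0 and scores[end - 1] < cutoff:
        end -= 1
    return seq[:end], quality[:end]
```

In Phred+33, `I` encodes Q40, `5` encodes Q20 and `#` encodes Q2, so `trim_3prime("ACGTA", "II5##")` keeps `("ACG", "II5")`.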
Frontiers | Machine Learning and Deep Learning Applications in Metagenomic Taxonomy and Functional Annotation
Introduction. The study of microbial environments has benefited from the sequencing revolution, as technological improvements have decreased DNA sequencing costs and increased the number of sequenced bases. For approximately 20 years (depending on how we define the term metagenomics), it has allowed researchers to decipher the microbial composition…
Role of mobile genetic elements in the global dissemination of the carbapenem resistance gene blaNDM
Using AnnoTree to Get More Assignments, Faster, in DIAMOND+MEGAN Microbiome Analysis
INTRODUCTION. Next-generation sequencing (NGS) has revolutionized many areas of biological research (1, 2), providing ever-more data at an ever-decreasing cost. One such area is microbiome research, the study of microbes in their theater of activity using metagenomic sequencing (3). Here, deep short-read sequencing and the improving performance of long-read sequencing are…
Petabase-scale sequence alignment catalyses viral discovery
Serratus alignment architecture. Serratus (v0.3.0) (github.com/ababaian/serratus) is an open-source cloud-infrastructure designed for ultra-high-throughput sequence alignment against a query sequence or pangenome (Extended Data Fig. 1). Serratus compute costs are dependent on search parameters (expanded discussion available: github.com/ababaian/serratus/wiki/pangenome_design). The nucleotide vertebrate viral pangenome search (bowtie2, database size: 79.8 MB) reached processing rates…
A Fast, Memory-Efficient, and Accurate Mechanism to Find Fuzzy Seed Matches
BLEND is a mechanism that can efficiently find fuzzy seed matches between sequences to significantly improve the performance and accuracy, while reducing the memory usage, of two important applications: 1) finding overlapping reads and 2) read mapping. Finding fuzzy seed matches enables BLEND to find both 1) exact-matching seeds…
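BLEND's central idea is to hash seeds with SimHash, a locality-sensitive hash, so that similar (not only identical) seeds can receive the same hash value. A toy SimHash over the overlapping sub-k-mers ("tokens") of a seed; the token length, 32-bit width and BLAKE2 token hash are illustrative assumptions, not BLEND's exact scheme.

```python
import hashlib


def token_hash(token: str, bits: int = 32) -> int:
    """Deterministic `bits`-bit hash of a string token."""
    digest = hashlib.blake2b(token.encode(), digest_size=8).digest()
    return int.from_bytes(digest, "big") & ((1 << bits) - 1)


def simhash(seed: str, token_len: int = 3, bits: int = 32) -> int:
    """SimHash of a seed: each overlapping token votes +1/-1 on every bit of its
    hash; the output bit is 1 where the vote total is positive."""
    votes = [0] * bits
    for i in range(len(seed) - token_len + 1):
        h = token_hash(seed[i:i + token_len], bits)
        for b in range(bits):
            votes[b] += 1 if (h >> b) & 1 else -1
    return sum(1 << b for b in range(bits) if votes[b] > 0)


def hamming_bits(a: int, b: int) -> int:
    """Number of differing bits between two hash values."""
    return bin(a ^ b).count("1")
```

Seeds that share most of their tokens share most of the votes, so their SimHash values tend to differ in few bits, and when the values coincide exactly, similar seeds can be matched with an ordinary hash-table lookup.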
ncRNA | Free Full-Text | Common Features in lncRNA Annotation and Classification: A Survey
CONC (2006). Method: SVM. Data: eukaryotes (both protein-coding and non-coding genes). Features: peptide length, amino acid composition, predicted secondary structure content, mean hydrophobicity, percentage of residues exposed to solvent, sequence compositional entropy, number of homologues, alignment entropy. Performance (10-fold CV): on protein-coding, F1-score 97.4%, precision 97.1%, recall 97.8%; on non-coding, F1-score:…
Interpreting k-mer spectrum
Could someone provide an intuitive description of how to interpret a k-mer spectrum? I understand that the plot shows how many k-mers appear a certain number of times, but could someone describe what valuable information we can gain from visualizing k-mer counts this way?…
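Concretely, the spectrum is a histogram of a histogram: first count how often each distinct k-mer occurs, then count how many distinct k-mers share each occurrence count. A minimal sketch with toy reads. In real data, a spike at multiplicity 1 to 2 usually reflects sequencing errors, the main peak sits near the k-mer coverage of the genome, and in a diploid genome a smaller peak at half that depth points to heterozygosity.

```python
from collections import Counter


def kmer_spectrum(reads: list[str], k: int) -> dict[int, int]:
    """Return {multiplicity: number of distinct k-mers seen that many times}."""
    kmer_counts = Counter(
        read[i:i + k] for read in reads for i in range(len(read) - k + 1)
    )
    # Second-level histogram: how many distinct k-mers have each count.
    return dict(sorted(Counter(kmer_counts.values()).items()))
```

For the reads `ACGTA`, `ACGTA` and `TTTTT` with k = 3, three distinct 3-mers occur twice and `TTT` occurs three times, giving `{2: 3, 3: 1}`.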
Assembling all transcripts for an individual gene? (using single sequence to seed the assembly)
Let’s say I have a candidate gene and I believe that in an individual sample, the genome sequence differs from the reference, which then interferes with alignment. Is there a way for me to do…
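One approach to this is seed-driven targeted assembly: start from the known sequence and repeatedly extend it by the base whose k-mer is best supported by the reads. A minimal greedy sketch of rightward extension only; k, the `min_count` support threshold and the toy reads are assumptions, and real targeted assemblers also handle branches (alternative transcripts), leftward extension and sequencing errors.

```python
from collections import Counter


def extend_right(seed: str, reads: list[str], k: int = 5, min_count: int = 2,
                 max_len: int = 10_000) -> str:
    """Greedily extend `seed` to the right using k-mer counts from `reads`.

    At each step the last k-1 bases are tried with A/C/G/T appended; the
    best-supported k-mer wins if it occurs at least `min_count` times.
    Extension stops when support is too low or a k-mer repeats (a loop).
    """
    counts = Counter(r[i:i + k] for r in reads for i in range(len(r) - k + 1))
    contig = seed
    seen = {contig[-k:]}
    while len(contig) < max_len:
        suffix = contig[-(k - 1):]
        best = max("ACGT", key=lambda base: counts[suffix + base])
        next_kmer = suffix + best
        if counts[next_kmer] < min_count or next_kmer in seen:
            break
        seen.add(next_kmer)
        contig += best
    return contig
```

With the seed `ACGTT` and overlapping reads covering `ACGTTGCATGCCA` at least twice, the extension recovers `ACGTTGCATGCCA` and stops where read support ends. Choosing only the single best base is what makes this greedy; keeping every well-supported branch is how alternative transcripts would be recovered.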
DNA Sequence Classification Based on Milvus
Introduction. DNA sequencing is widely used in both academic research and practical applications, such as gene traceability, species identification, and disease diagnosis. As many industries seek more intelligent and efficient research methods, artificial intelligence has attracted much attention, especially from the biological and medical domains. More and…
Where can I get, or how can I make, a mappability track for the hg38 assembly?
Lucky you, @manojmumar_bhosale: I worked on a similar problem recently and therefore have a bash script you can use. Required tools: GEM library (from here) and UCSC’s wigToBigWig (from here; I chose the binary for Linux 64…
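A mappability track records, for each position, how uniquely the k-mer starting there occurs in the genome; tools such as GEM compute this efficiently for whole genomes, including with mismatches. A toy exact-match sketch of the idea, assuming a single short sequence and forward-strand k-mers only; each value is 1/(number of occurrences), one common way of expressing mappability.

```python
from collections import Counter


def mappability(genome: str, k: int) -> list[float]:
    """For each start position, 1 / (number of exact occurrences of its k-mer).

    1.0 means the k-mer is unique in the genome; lower values mean it is repeated.
    """
    starts = range(len(genome) - k + 1)
    occurrences = Counter(genome[i:i + k] for i in starts)
    return [1 / occurrences[genome[i:i + k]] for i in starts]
```

In `ACGTACGA` with k = 3, `ACG` occurs twice, so positions 0 and 4 get 0.5 and every other position gets 1.0. A genome-wide version would also include reverse-complement matches before the values are written out as a wig file.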
a k-mer counter in Rust using the rust-bio and rayon crates
Tool: krust. I hope this isn’t inappropriate as a Share a Tool post; it’s more about getting feedback on this project and seeing if anybody here is interested in it (github.com/suchapalaver/krust). It’s a k-mer counter written in Rust. I’ve…
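A common trick in fast k-mer counters is to pack each k-mer into an integer using 2 bits per base, updating it with a shift and a mask as the window slides, so counting hashes integers instead of strings. This is a Python illustration of that general technique, not code from krust; the encoding A=0, C=1, G=2, T=3 is an assumption.

```python
from collections import Counter

ENCODE = {"A": 0, "C": 1, "G": 2, "T": 3}


def count_packed_kmers(seq: str, k: int) -> Counter:
    """Count k-mers as 2-bit-packed integers using a rolling update.

    Any non-ACGT base resets the window, so k-mers spanning it are skipped.
    """
    mask = (1 << (2 * k)) - 1  # keeps only the last k bases (2k bits)
    counts: Counter = Counter()
    value, length = 0, 0
    for base in seq:
        code = ENCODE.get(base)
        if code is None:
            value, length = 0, 0
            continue
        value = ((value << 2) | code) & mask
        length += 1
        if length >= k:
            counts[value] += 1
    return counts


def decode(value: int, k: int) -> str:
    """Turn a packed k-mer back into its string form."""
    return "".join("ACGT"[(value >> (2 * (k - 1 - i))) & 3] for i in range(k))
```

For example, `ACG` packs to 0b000110 = 6. In a compiled language with a parallel iterator library, per-thread counts of these integers can be merged at the end.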
Prevalence and Molecular Characteristics Based on Whole Genome Sequenc
Introduction. Tuberculosis, caused by Mycobacterium tuberculosis, remains one of the top 10 causes of death worldwide and the leading cause of death from a single infectious agent (ranking above HIV/AIDS).¹ In 2020, the World Health Organization (WHO) reported that 7.1 million people with tuberculosis were newly diagnosed and notified in 2019,…
The Biostar Herald for Tuesday, August 17, 2021
The Biostar Herald publishes user submitted links of bioinformatics relevance. It aims to provide a summary of interesting and relevant information you may have missed. You too can submit links here. This edition of the Herald was brought to you by contribution from Istvan Albert, and was edited by lakhujanivijay,…
k-mer counters – presence/absence matrix
Hi lizabe, you’re right that this tutorial is out of date. The --matrix option is no longer valid as an option to jellyfish count. However, I don’t think its original intent was to do what you wanted anyway. It doesn’t write out a binary…
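If the aim is a k-mer presence/absence matrix across samples, one option is to collect each sample's k-mers separately and then combine them over the union of all k-mers. A minimal in-memory sketch, assuming each sample is given as a list of sequences; the sample names are hypothetical, and for real datasets one would merge sorted per-sample k-mer dumps rather than hold everything in Python sets.

```python
def kmer_set(seqs: list[str], k: int) -> set[str]:
    """All distinct k-mers across a sample's sequences."""
    return {s[i:i + k] for s in seqs for i in range(len(s) - k + 1)}


def presence_absence(samples: dict[str, list[str]],
                     k: int) -> tuple[list[str], dict[str, list[int]]]:
    """Return (sample_names, {kmer: [1/0 per sample]}) over the union of all k-mers."""
    names = list(samples)
    sets = [kmer_set(samples[name], k) for name in names]
    all_kmers = sorted(set().union(*sets))
    matrix = {kmer: [int(kmer in s) for s in sets] for kmer in all_kmers}
    return names, matrix
```

For samples `s1 = ["ACGT"]` and `s2 = ["CGTT"]` with k = 3, the rows are `ACG: [1, 0]`, `CGT: [1, 1]` and `GTT: [0, 1]`.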