Metagenomic profiling from sequencing data aims to resolve the composition of a microbial sample at the lower taxonomic ranks, such as species and strains. Deep taxonomic profiling, which involves accurate estimation of strain-level abundances, enables precise quantification of microbial composition and plays a crucial role in various downstream analyses. Existing tools primarily focus on strain or subspecies identification and limit abundance estimation to the species level. Quantifying the abundances of the identified strains is challenging and remains largely unaddressed by existing approaches. We propose a novel algorithm, MAGE (Microbial Abundance GaugE), for accurately identifying constituent strains and quantifying strain-level relative abundances. For accurate profiling, MAGE uses read mapping information and performs a novel local search-based profiling guided by a constrained optimization based on maximum likelihood estimation. Unlike existing approaches, which often rely on strain-specific markers and homology information for deep profiling, MAGE works solely with read mapping information, that is, the set of target strains from the reference collection for each mapped read. As part of MAGE, we provide an alignment-free, k-mer-based read mapper that uses a compact and comprehensive index constructed using FM-index and R-index. We use a variety of evaluation metrics to validate the quality of abundance estimation. We performed several experiments using a variety of datasets, and MAGE exhibited superior performance compared to the existing tools on a wide range of performance metrics.
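To make the notion of maximum likelihood abundance estimation from read mapping information concrete, the following is a minimal sketch and not MAGE's actual method. It substitutes a plain expectation-maximization (EM) loop for the local search and constrained optimization described above, and it assumes that each read is equally likely to arise from any strain it maps to, ignoring genome length and mapping quality. The function name `estimate_abundances` and its parameters are hypothetical.

```python
from collections import Counter


def estimate_abundances(read_hits, strains, iterations=200, tol=1e-9):
    """Estimate relative strain abundances with a simple EM loop.

    read_hits: iterable of collections, one per mapped read, each holding the
               names of the strains that read maps to (assumed non-empty and
               drawn from `strains`).
    strains:   list of candidate strain names.
    NOTE: illustrative stand-in only; this is not the MAGE algorithm.
    """
    # Collapse reads with identical target sets so each pattern is visited once.
    patterns = Counter(frozenset(hits) for hits in read_hits)
    total_reads = sum(patterns.values())

    # Start from a uniform abundance vector.
    theta = {s: 1.0 / len(strains) for s in strains}

    for _ in range(iterations):
        expected = {s: 0.0 for s in strains}
        # E-step: split each read fractionally among its target strains,
        # in proportion to the current abundance estimates.
        for hits, count in patterns.items():
            denom = sum(theta[s] for s in hits)
            for s in hits:
                expected[s] += count * theta[s] / denom
        # M-step: renormalise expected read counts into abundances.
        updated = {s: expected[s] / total_reads for s in strains}
        change = max(abs(updated[s] - theta[s]) for s in strains)
        theta = updated
        if change < tol:
            break
    return theta


# Toy input: 60 reads unique to strain A, 20 unique to B, 20 shared by A and B.
reads = [{"A"}] * 60 + [{"B"}] * 20 + [{"A", "B"}] * 20
abundances = estimate_abundances(reads, ["A", "B", "C"])
```

On this toy input the shared reads are divided in the same 3:1 ratio as the unique reads, so the estimate converges to roughly 0.75 for A and 0.25 for B, while strain C, which receives no reads, drops to zero.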
Competing Interest Statement
The authors have declared no competing interest.