Metagenomic assembly, removing redundant contigs.

What is the purpose of the step described in the following excerpt from a paper that uses metagenomic assemblies:

Redundancies of sequences from the same organism within the metagenome
were removed by clustering all contigs at 95% identity with CD-hit
v4.6.6 (72), and only the longest contig per cluster was kept
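
For concreteness, the step in the excerpt can be sketched roughly as below. This is only a toy illustration, not CD-HIT itself: the function names `ungapped_identity` and `dereplicate`, the example contigs, and the crude ungapped identity measure are all assumptions made for the sketch, whereas CD-HIT uses short-word filtering and real alignments. It only shows the general idea of visiting contigs from longest to shortest and keeping a contig only if it is not at least 95% identical to one already kept.

```python
def ungapped_identity(short_seq, long_seq):
    """Toy identity: best fraction of matching positions when short_seq is
    slid along long_seq without gaps (a stand-in for a real alignment)."""
    best_matches = 0
    for offset in range(len(long_seq) - len(short_seq) + 1):
        matches = sum(1 for a, b in zip(short_seq, long_seq[offset:]) if a == b)
        best_matches = max(best_matches, matches)
    return best_matches / len(short_seq)


def dereplicate(contigs, threshold=0.95):
    """Greedy clustering: process contigs from longest to shortest and keep a
    contig only if it does not match an already kept (longer) representative."""
    representatives = {}
    by_length = sorted(contigs.items(), key=lambda kv: len(kv[1]), reverse=True)
    for name, seq in by_length:
        for rep_seq in representatives.values():
            if ungapped_identity(seq, rep_seq) >= threshold:
                break  # redundant with a longer contig, so it is dropped
        else:
            representatives[name] = seq  # starts a new cluster
    return representatives


# Hypothetical contigs: contig_2 is an exact fragment of contig_1.
contigs = {
    "contig_1": "ACGTACGTTAGCCGATAGGCTTAACGGATC",
    "contig_2": "CGTTAGCCGATAGGCTTAAC",
    "contig_3": "TTTTTTTTTTCCCCCCCCCC",
}
kept = dereplicate(contigs)
print(list(kept))  # ['contig_1', 'contig_3']
```

Here `contig_2` falls into the cluster represented by the longer `contig_1` and is discarded, which is the "only the longest contig per cluster was kept" part of the excerpt.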

I understand what is being done, but I don’t know why. I could see it being useful before binning, perhaps to reduce computation time, but otherwise I am not sure. Would it reduce annotation time, or something similar?

The excerpt appears in the paper in question, in the methods section “Metagenome sequencing and assembly”:



