UniFrac and Phylogenetic Methods with Full-length 16S-ITS-23S Sequences

I have a question that I’m guessing most people here haven’t had to deal with: how to analyze sequencing data that spans the 16S rRNA gene, the internal transcribed spacer (ITS), and the 23S rRNA gene (that is, the full rRNA operon).

Because most bacteria have multiple copies of the rRNA operon, and the copies can differ from one another, sequencing beyond the 16S rRNA gene can give more granular information about which bacteria are present. The ITS region can be highly variable, so it can help distinguish strains within the same species. The ITS can also encode tRNA genes. However, not every copy of the rRNA operon has an ITS of the same length.

That leads to my question: is it still appropriate to use the full operon to build phylogenies, i.e., for use with UniFrac? I ask because the presence of a tRNA gene in one copy of the operon but not in another copy from the same bacterium doesn’t reflect the kind of evolutionary history between the two sequences that a tree would suggest. Would it therefore be improper to use such a tree? The alternative is to run an in silico PCR to extract only the 16S region and build a tree from that alone. This avoids the problem posed by the ITS, but it also discards a lot of data, along with the benefit of sequencing this larger amplicon rather than just the full 16S gene.
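
To make the in silico PCR alternative concrete, here is a minimal sketch of trimming an operon-length read down to its 16S portion. Everything in it is an assumption for illustration: the function names are hypothetical, the default primers are the commonly used 27F and 1492R sequences (which may not match the primers relevant to a given dataset), and matching is exact apart from IUPAC degenerate bases, with no mismatch tolerance. A dedicated tool would be the better choice for real data.

```python
import re

# IUPAC nucleotide codes mapped to regex character classes.
IUPAC = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "R": "[AG]", "Y": "[CT]", "S": "[CG]", "W": "[AT]",
    "K": "[GT]", "M": "[AC]", "B": "[CGT]", "D": "[AGT]",
    "H": "[ACT]", "V": "[ACG]", "N": "[ACGT]",
}
COMPLEMENT = str.maketrans("ACGTRYSWKMBDHVN", "TGCAYRSWMKVHDBN")


def reverse_complement(seq):
    """Reverse-complement a DNA string, including IUPAC degenerate codes."""
    return seq.upper().translate(COMPLEMENT)[::-1]


def primer_regex(primer):
    """Turn a (possibly degenerate) primer into a regex fragment."""
    return "".join(IUPAC[base] for base in primer.upper())


def extract_16s(read, fwd="AGAGTTTGATCMTGGCTCAG", rev="GGTTACCTTGTTACGACTT"):
    """Return the span from the forward primer through the reverse primer site.

    The defaults are 27F / 1492R (an assumption, not the original poster's
    primers). Both read orientations are tried; None means no amplicon found.
    """
    pattern = re.compile(
        primer_regex(fwd) + ".*?" + primer_regex(reverse_complement(rev))
    )
    for candidate in (read.upper(), reverse_complement(read)):
        match = pattern.search(candidate)
        if match:
            return match.group(0)
    return None


# Toy read: 16S-like stretch followed by a stand-in for ITS/23S sequence.
toy_16s = "AGAGTTTGATCCTGGCTCAG" + "ACGTACGTAC" + reverse_complement("GGTTACCTTGTTACGACTT")
toy_read = toy_16s + "TTTTGGGGCCCCAAAA"
print(extract_16s(toy_read) == toy_16s)
```

The trimmed sequences could then be aligned and used to build the 16S-only tree, while the untrimmed reads are kept for strain-level comparisons that do not depend on the tree.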

