Marking duplicates in coordinate-sorted BAM files
As mentioned in the documentation (gatk.broadinstitute.org/hc/en-us/articles/360037224932?page=1#comment_4406762304155), it is ideal to provide queryname-sorted BAM files for marking duplicates. Will it be a computationally intensive process if I provide coordinate-sorted BAM files instead?
First, I sorted the unmapped and mapped BAM files by queryname, merged them, and then sorted the merged file by coordinate. Can these coordinate-sorted merged BAM files be used to mark duplicates with MarkDuplicatesSpark? Also, should I subsequently run SetNmMdAndUqTags before running BQSR? Please advise.
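For reference, here is a rough sketch of the two steps I plan to run after merging, written as a small Python helper that only builds the command lines and does not execute anything. The file names (`sample.merged.bam`, `ref.fasta`) and the `build_pipeline` helper are hypothetical placeholders, and I am only using the basic `-I`, `-O` and `-R` arguments; whether coordinate-sorted input is a sensible choice here is exactly what I am asking.

```python
import shlex


def build_pipeline(merged_bam, reference, prefix):
    """Build (but do not run) the GATK commands for the planned steps.

    All paths are hypothetical placeholders; adjust them to your own data.
    """
    marked = f"{prefix}.markdup.bam"
    tagged = f"{prefix}.markdup.tagged.bam"
    return [
        # Step 1: mark duplicates on the merged, coordinate-sorted BAM.
        ["gatk", "MarkDuplicatesSpark", "-I", merged_bam, "-O", marked],
        # Step 2: fix NM, MD and UQ tags against the reference before BQSR.
        ["gatk", "SetNmMdAndUqTags", "-R", reference, "-I", marked, "-O", tagged],
    ]


if __name__ == "__main__":
    # Print the commands so they can be reviewed before running them by hand.
    for cmd in build_pipeline("sample.merged.bam", "ref.fasta", "sample"):
        print(shlex.join(cmd))
```

The output of the second command would then be the input to BQSR.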