MAPQ filtering for clinical applications

A discussion recently arose about how one ought to filter on MAPQ in a clinical setting, i.e., where an NGS sample is being processed in order to produce a result for a patient who has an unknown or hypothesised diagnosis. The result could obviously be key to that diagnosis.

A friend suggested that a MAPQ of 20 would be a sufficient cutoff, whereas I argued that it ought to be as high as 60. Another colleague implied that my high cutoff didn't make sense because each region of the genome is covered by reads of varying MAPQ and, as I assume s/he meant, there would be many high-MAPQ reads over each region anyway.

Keep in mind that BWA is being used, which produces MAPQ in the range 0-60. Also, in practice I drop to as low as MAPQ 40 in clinical pipelines and then rely on a range of other metrics to ensure that only true variants are called, with confirmation by Sanger sequencing.
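To put the cutoffs being debated on a common scale: the SAM specification defines MAPQ as a Phred-scaled value, -10 * log10(probability that the mapping position is wrong). The sketch below simply inverts that definition for 20, 40 and 60. The function name is my own, and the numbers are the nominal probabilities implied by the definition; how well an aligner's reported MAPQ is calibrated against them is a separate question.

```python
def mapq_to_error_prob(mapq: int) -> float:
    """Nominal probability that a read is mapped to the wrong position,
    from the SAM definition MAPQ = -10 * log10(P(wrong))."""
    return 10 ** (-mapq / 10)


# The three cutoffs discussed in this post.
for cutoff in (20, 40, 60):
    p = mapq_to_error_prob(cutoff)
    print(f"MAPQ {cutoff}: about 1 in {round(1 / p):,} reads nominally misplaced")
```

On this nominal scale, going from 20 to 40 and from 40 to 60 is a 100-fold reduction in the implied mismapping probability each time.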

For the record: >50% of the genome exhibits a high level of homology, and there are certain regions that will simply never attain a MAPQ >30 because of it. Look at the CYP genes, for example. Some of their exons just cannot be reliably sequenced using the standard NGS protocols. Some reads do map to these highly homologous regions, but at MAPQ 60 you may get coverage of only around 10 or 20 there, whereas other, less homologous regions may get >1000.
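One way to see this effect for a given position is to count the depth that survives each MAPQ cutoff. The following is a minimal sketch in plain Python, assuming reads have already been reduced to (start, end, MAPQ) records. The `Read` type, the function names and the toy reads are all made up for illustration and are not taken from any real sample or library.

```python
from typing import Dict, Iterable, NamedTuple, Sequence


class Read(NamedTuple):
    start: int  # 0-based, inclusive
    end: int    # 0-based, exclusive
    mapq: int


def depth_at(reads: Iterable[Read], pos: int, min_mapq: int) -> int:
    """Number of reads covering `pos` with MAPQ >= min_mapq."""
    return sum(1 for r in reads if r.start <= pos < r.end and r.mapq >= min_mapq)


def depth_by_cutoff(reads: Sequence[Read], pos: int,
                    cutoffs: Sequence[int] = (0, 20, 40, 60)) -> Dict[int, int]:
    """Depth at `pos` under each MAPQ cutoff."""
    return {c: depth_at(reads, pos, c) for c in cutoffs}


# Hypothetical reads over a homologous region: most map ambiguously.
toy_reads = [
    Read(0, 100, 60),
    Read(10, 110, 60),
    Read(20, 120, 25),
    Read(30, 130, 0),
    Read(200, 300, 60),  # does not cover position 50
]

print(depth_by_cutoff(toy_reads, pos=50))
```

Running this kind of tally across a target region shows where a strict cutoff leaves too little depth to call anything, which is exactly where the other QC metrics and orthogonal confirmation have to carry the result.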

Remember that this is a clinical setting where a result can change a person's life. As the analyst, would you sign your name on a clinical report, a document type that has legal weight, knowing that you let these low-MAPQ reads through?

A second issue also arose: that of putting too much focus on MAPQ. Of course, there are countless other QC metrics to use, but MAPQ is one of the first to be applied and therefore one of the most important. If you get it wrong, a lot of your results may end up being false positives.

Cheers for any comments!
