Coverage drops in fastq alignment against custom Immunoglobulin reference

Coverage drops in fastq alignment against custom Immunoglobulin reference


I am working on Hiseq2000/2500 single end reads on RNASeq leukemia samples.
I am interested in aligning all the reads beloging to the Immunoglobulin genes (Ig) for further analysis.
The task is difficult for two main reasons:

  • Final Ig genes are the result of a “collage” of genomic regions and alignment is difficult;
  • Ig are hypermutated, meaning that there are a lot of mutations interfering with the alignment;

I decided to develop my own pipeline since those available are inappropriate.
I post a brief resume of the workflow:

  • Star alignment agains hg38;
  • Extraction of unmapped reads;
  • Extraction of reads mapping to the Ig locus;
  • De novo construction of the Ig with Trinity;
  • Alignment of de novo sequences with IgBlast;
  • Select the “correct” Ig;
  • Align sequences against the custom Ig reference with bowtie2 (–very-sensitive-local).

I’ve analyzed around 100 samples, 70% of those align correctly displaying a uniform coverage.
The 30% of samples show a non-uniform coverage profile (See Fig. attached). As you can see, there is the common 3′ bias with the coverage increasing at the end of the Ig. However, there is a huge drop in the middle of the Ig and I cannot understand whether is a kind of artifact lacking those reads or is the alignment parameters wrong.

enter image description here
To resume, I cannot understand how is possible to observe different coverage profiles for samples belonging to the very same experiment.
Any suggestion will be helpful





Read more here: Source link