I’m running CNVkit in amplicon mode on a set of tumor BAM files generated with a small amplicon panel of 45 genes. The panel includes just one gene on chrX and none on chrY. My reference was built from 10 normal male samples sequenced with the same panel.
Initially I ran the pipeline using -y at all appropriate steps. The reference samples were correctly inferred to be male, but almost all of my tumor samples were also treated as male. I then reran the same pipeline without -y. The reference samples are still inferred to be male, but now I get a far more even split of male and female across the sample set.
I prefer the output without the -y.
My understanding is that without -y the chrX coverage for male reference samples is doubled, i.e. shifted by +1 in log2. I confirmed that the diploid chrX log2 ratios are +1 relative to the haploid chrX ratios. Since only one chrX region and no chrY region is covered, it seems that automatic determination of sample gender will be confounded whenever there is a CN change in that X region.
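To make the arithmetic behind my concern concrete, here is a minimal sketch in plain Python. It is not CNVkit code and does not reproduce how CNVkit infers sex; the function name `expected_log2` and the copy numbers are my own illustration, and it assumes a pure sample with no noise or bias.

```python
import math

def expected_log2(sample_copies, reference_copies):
    """Idealized log2 ratio of a region, given sample and reference copy numbers."""
    return math.log2(sample_copies / reference_copies)

# Assumed chrX copy number of the reference under each setting.
HAPLOID_REF = 1   # chrX treated as single-copy (male reference, -y)
DIPLOID_REF = 2   # chrX treated as two-copy (no -y)

# Hypothetical samples: label -> true chrX copy number in the one covered region.
samples = {
    "male, neutral": 1,
    "female, neutral": 2,
    "female, single-copy loss": 1,
    "male, single-copy gain": 2,
}

for label, copies in samples.items():
    vs_haploid = expected_log2(copies, HAPLOID_REF)
    vs_diploid = expected_log2(copies, DIPLOID_REF)
    print(f"{label:26s} vs haploid ref: {vs_haploid:+.1f}   vs diploid ref: {vs_diploid:+.1f}")
```

With a single chrX region, a female sample with a one-copy loss gives the same idealized value as a neutral male sample, and a male sample with a one-copy gain looks like a neutral female, which is the ambiguity I am worried about.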
Are there any special considerations I should bear in mind with only one gene on chrX and no genes on chrY sequenced?
How, if at all, would incorrect gender identification for a sample affect CN calls on chrX?