consensus fasta

Hi everyone!
I’m new in bioinformatics and also in informatics so I’m struggling a bit trying to learn on my own.

At the moment I’m having some difficulties trying to generate a fasta file from a bam file of a complete human genome.

I’ve red on the internet that a way to obtain what I’m trying to get is to follow this command:

call variants

bcftools mpileup -Ou -f reference.fa alignments.bam | bcftools call -mv -Oz -o calls.vcf.gz
bcftools index calls.vcf.gz

normalize indels

bcftools norm -f reference.fa calls.vcf.gz -Ob -o calls.norm.bcf

filter adjacent indels within 5bp

bcftools filter –IndelGap 5 calls.norm.bcf -Ob -o calls.norm.flt-indels.bcf

apply variants to create consensus sequence

cat reference.fa | bcftools consensus -m lowcoveragesites.bed calls.vcf.gz > consensus.fa

1)Let me start by saying that I find a bit weird that, according to the command above, I have to generate normalized and filtered files (call.norm.bcf, call.norm.filtr-indels.bcf) but never use them to create the consensus. Is there something wrong that the command?

But that being said, I’m following step by step the command.
2)I’m having some issues with the overlapping variants.
I constantly receive the error message:
“The site 3:60771641 overlaps with another variant, skipping…”
for different position in my genome.
Is there a way to solution it?

3)Moreover, I was wondering if the bcftools consensus was considering the Heterozygous alleles inserting at the specific position the letters for snp, according to the IUPAC, so I started searching for R,W,Y,… in my consensus genome and I only found 3 heterozygous sites with make me think something went wrong.

Can someone help me with my work? Do you know how I can easily get a consensus fasta from my bam?

Thank you !

Read more here: Source link