Pacbio de-novo assembly


Hi,
Recently I received PacBio HiFi reads, generated in CCS mode, for a plant whole-genome de novo assembly.
The sequencing facility sent me two file types: fastq.gz and BAM.
I am confused about two things.

  1. My understanding is that PacBio sequencing output is in BAM format by default. However, I suspect the BAM file I received is not the raw file but was produced by the CCS software with a command like "ccs movie.subreads.bam movie.ccs.bam".
  2. I used hifiasm to generate primary contigs from the fastq file. Now I want to align the HiFi reads back to the assembly and filter out contigs with a read depth close to 0, and also align the contigs to plant mitochondrial and chloroplast genome sequences to detect organellar contigs. I am not sure which aligner to use. I have come across pbalign and minimap2 for this purpose.

I am new to working with PacBio data. Please let me know if you have any suggestions.
Note: this is WGS HiFi data (not RNA-seq data).



pbalign is outdated. pbmm2 is its spiritual successor and is really just a wrapper for minimap2:
github.com/PacificBiosciences/pbmm2

pbmm2 can align either a bam OR a fasta/fastq to a reference genome

Is your CCS BAM file suffixed with *.ccs.bam or *.hifi_reads.bam? If the latter, it is the HiFi subset of the CCS reads. If the former, it is likely to contain ALL CCS reads in the dataset, not just those passing the Q20 filter.
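If the file name does not settle it, a rough check on the fastq can help. The sketch below is only an approximation: it estimates each read's quality from its per-base quality string, whereas the Q20 cutoff is normally applied to the predicted read accuracy stored in the BAM. The function names and the 20.0 threshold parameter are my own, not part of any PacBio tool.

```python
import math


def read_fastq(handle):
    """Yield (name, sequence, quality_string) from a 4-line-per-record FASTQ handle."""
    while True:
        header = handle.readline()
        if not header.strip():
            return  # end of file (or trailing blank line)
        seq = handle.readline().rstrip("\n")
        handle.readline()  # '+' separator line
        qual = handle.readline().rstrip("\n")
        yield header[1:].split()[0], seq, qual


def mean_phred(qual, offset=33):
    """Average the per-base error probabilities, then convert back to a Phred score."""
    if not qual:
        return 0.0
    probs = [10 ** (-(ord(c) - offset) / 10) for c in qual]
    mean_err = sum(probs) / len(probs)
    return -10 * math.log10(mean_err)


def count_below_q(handle, min_q=20.0):
    """Return (total_reads, reads_with_estimated_quality_below_min_q)."""
    total = low = 0
    for _name, _seq, qual in read_fastq(handle):
        total += 1
        if mean_phred(qual) < min_q:
            low += 1
    return total, low


# Example use on a gzipped file (the file name is hypothetical):
#   import gzip
#   with gzip.open("reads.fastq.gz", "rt") as fh:
#       print(count_below_q(fh))
```

If a noticeable share of reads falls below the threshold, the fastq was probably made from all CCS reads rather than the HiFi subset.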

Working from just the fasta/fastq should be fine for what you’re trying to do. There is often additional information in the BAM files necessary for certain analyses (kinetics/basemods) but for your purpose working with the fasta/fastqs should be sufficient.

Don’t worry about the raw BAM file, it has no data that is useful for you at this stage. The CCS reads are what you need.

Use minimap2 to align your reads. It has several presets, and map-hifi is the one intended for HiFi reads:

   Preset:
    -x STR       preset (always applied before other options; see minimap2.1 for details) []

                 - map-pb/map-ont - PacBio CLR/Nanopore vs reference mapping
                 - map-hifi - PacBio HiFi reads vs reference mapping
                 - ava-pb/ava-ont - PacBio/Nanopore read overlap
                 - asm5/asm10/asm20 - asm-to-ref mapping, for ~0.1/1/5% sequence divergence
                 - splice/splice:hq - long-read/Pacbio-CCS spliced alignment
                 - sr - genomic short-read mapping
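
To make the two filtering steps from the question concrete, here is a minimal sketch. It assumes the alignments were produced with commands along the lines shown in the comments; the file names, the asm5 choice for the organelle comparison, and both thresholds are placeholders to adjust, not recommendations. The first function reads the per-contig table printed by `samtools coverage`, and the second reads minimap2's PAF output with the contigs as the query.

```python
# Assumed upstream commands (file names are hypothetical):
#   minimap2 -ax map-hifi assembly.fa reads.fastq.gz | samtools sort -o reads_vs_asm.bam -
#   samtools coverage reads_vs_asm.bam > coverage.tsv
#   minimap2 -x asm5 organelles.fa assembly.fa > contigs_vs_organelles.paf


def low_depth_contigs(coverage_lines, max_mean_depth=1.0):
    """Return contig names whose mean depth is at or below max_mean_depth.

    Expects the tab-separated `samtools coverage` table, whose columns are:
    rname, startpos, endpos, numreads, covbases, coverage, meandepth, ...
    """
    flagged = []
    for line in coverage_lines:
        if not line.strip() or line.startswith("#"):
            continue  # skip blank lines and the header
        fields = line.rstrip("\n").split("\t")
        name, mean_depth = fields[0], float(fields[6])
        if mean_depth <= max_mean_depth:
            flagged.append(name)
    return flagged


def organellar_contigs(paf_lines, min_fraction=0.5):
    """Return contigs whose length is mostly covered by hits to organelle references.

    Expects PAF with contigs as the query (columns 1-4: name, length, start, end).
    """
    lengths, spans = {}, {}
    for line in paf_lines:
        if not line.strip():
            continue
        f = line.split("\t")
        lengths[f[0]] = int(f[1])
        spans.setdefault(f[0], []).append((int(f[2]), int(f[3])))
    hits = []
    for name, intervals in spans.items():
        # Merge overlapping query intervals so shared bases are counted once.
        covered, cur_start, cur_end = 0, None, None
        for start, end in sorted(intervals):
            if cur_end is None or start > cur_end:
                if cur_end is not None:
                    covered += cur_end - cur_start
                cur_start, cur_end = start, end
            else:
                cur_end = max(cur_end, end)
        covered += cur_end - cur_start
        if covered / lengths[name] >= min_fraction:
            hits.append(name)
    return hits
```

Both functions take any iterable of lines, so an open file handle such as `open("coverage.tsv")` can be passed directly. Contigs flagged by either function are candidates for a closer look rather than automatic removal.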

