CNVKit does not output all the accessible regions in the targets bed file

CNVKit does not output all the accessible regions in the targets bed file

1

Hello everybody,

I am using CNVkit on my data using hg38 as reference. The command that I am using is the following:

cnvkit.py batch sample.bam  -n control.bam -m wgs -f reference.fasta  --target-avg-size 1000 --output-dir results/

So, on the CNVkit documentation page is reported that

The batch --method wgs option uses the given reference genome’s sequencing-accessible regions (“access” BED) as the “targets” – these will be calculated on the fly if not provided.

Problem is, when it calculates accesibile regions from the reference, there are a lot of them listed in the log file that are not present in the final .bed output, and I cannot figure out why. Those are the outputs I get on a short example:

Log file:

WGS protocol: recommend '--annotate' option (e.g. refFlat.txt) to help locate genes in output files.
NC_000001.10: Scanning for accessible regions
        Accessible region NC_000001.10:10000-177417 (size 167417)
        Accessible region NC_000001.10:227417-267719 (size 40302)
        Accessible region NC_000001.10:317719-471368 (size 153649)
        Accessible region NC_000001.10:521368-2634220 (size 2112852)
        Accessible region NC_000001.10:2684220-3845268 (size 1161048)
        Accessible region NC_000001.10:3995268-13052998 (size 9057730)
        Accessible region NC_000001.10:13102998-13219912 (size 116914)
        Accessible region NC_000001.10:13319912-13557162 (size 237250)
        Accessible region NC_000001.10:13607162-17125658 (size 3518496)
        Accessible region NC_000001.10:17175658-29878082 (size 12702424)
        Accessible region NC_000001.10:30028082-103863906 (size 73835824)
        Accessible region NC_000001.10:103913906-120697156 (size 16783250)
        Accessible region NC_000001.10:120747156-120936695 (size 189539)
        Accessible region NC_000001.10:121086695-121485434 (size 398739)
        Accessible region NC_000001.10:142535434-142731022 (size 195588)
        Accessible region NC_000001.10:142781022-142967761 (size 186739)
        Accessible region NC_000001.10:143117761-143292816 (size 175055)
        Accessible region NC_000001.10:143342816-143544525 (size 201709)
        Accessible region NC_000001.10:143644525-143771002 (size 126477)
        Accessible region NC_000001.10:143871002-144095783 (size 224781)
        Accessible region NC_000001.10:144145783-144224481 (size 78698)
        Accessible region NC_000001.10:144274481-144401744 (size 127263)
        Accessible region NC_000001.10:144451744-144622413 (size 170669)
        Accessible region NC_000001.10:144672413-144710724 (size 38311)
        Accessible region NC_000001.10:144810724-145833118 (size 1022394)
        Accessible region NC_000001.10:145883118-146164650 (size 281532)
        Accessible region NC_000001.10:146214650-146253299 (size 38649)
        Accessible region NC_000001.10:146303299-148026038 (size 1722739)
        Accessible region NC_000001.10:148176038-148361358 (size 185320)
        Accessible region NC_000001.10:148511358-148684147 (size 172789)
        Accessible region NC_000001.10:148734147-148954460 (size 220313)
        Accessible region NC_000001.10:149004460-149459645 (size 455185)
        Accessible region NC_000001.10:149509645-205922707 (size 56413062)
        Accessible region NC_000001.10:206072707-206332221 (size 259514)
        Accessible region NC_000001.10:206482221-223747846 (size 17265625)
        Accessible region NC_000001.10:223797846-235192211 (size 11394365)
        Accessible region NC_000001.10:235242211-248908210 (size 13665999)
        Accessible region NC_000001.10:249058210-249240621 (size 182411)
NT_113878.1: Scanning for accessible regions
        Accessible region NT_113878.1:0-106433 (size 106433)
NT_167207.1: Scanning for accessible regions
        Accessible region NT_167207.1:0-547496 (size 547496)
NC_000002.11: Scanning for accessible regions
        Accessible region NC_000002.11:10000-3529312 (size 3519312)
        Accessible region NC_000002.11:3579312-5018788 (size 1439476)
        Accessible region NC_000002.11:5118788-16279724 (size 11160936)
        Accessible region NC_000002.11:16329724-21153113 (size 4823389)
        Accessible region NC_000002.11:21178113-31705550 (size 10527437)
        Accessible region NC_000002.11:31705551-31725939 (size 20388)
        Accessible region NC_000002.11:31726790-31816827 (size 90037)
        Accessible region NC_000002.11:31816828-31816854 (size 26)
        Accessible region NC_000002.11:31816855-31816858 (size 3)
        Accessible region NC_000002.11:31816859-33092197 (size 1275338)
        Accessible region NC_000002.11:33093197-33141692 (size 48495)
        Accessible region NC_000002.11:33142692-87668206 (size 54525514)
        Accessible region NC_000002.11:87718206-89630436 (size 1912230)
        Accessible region NC_000002.11:89830436-90321525 (size 491089)
        Accessible region NC_000002.11:90371525-90545103 (size 173578)
        Accessible region NC_000002.11:91595103-92326171 (size 731068)
        Accessible region NC_000002.11:95326171-110109337 (size 14783166)
        Accessible region NC_000002.11:110251337-149690582 (size 39439245)
        Accessible region NC_000002.11:149790582-234003741 (size 84213159)
        Accessible region NC_000002.11:234053741-239801978 (size 5748237)
        Accessible region NC_000002.11:239831978-240784132 (size 952154)
        Accessible region NC_000002.11:240809132-243102476 (size 2293344)
        Accessible region NC_000002.11:243152476-243189373 (size 36897)
NC_000003.11: Scanning for accessible regions
        Accessible region NC_000003.11:60000-17137943 (size 17077943)
NT_113878.1: Joining over small gaps
NT_167207.1: Joining over small gaps
Wrote GCF_000001405.25_GRCh37.p13_genomic.bed with 2 regions

So, multuple accessible regions are present for sequences NC_000001.10 , NC_000002.11 and NC_000003.11, but only regions for NT_113878.1 and NT_167207.1 are reported:

NT_113878.1     0       106433
NT_167207.1     0       547496

Why is that so? I cannot really figure out.


CNVkit


cnv


bed


target


wgs

• 36 views

Could you post the chromosome names in the bam file and your bed file? They’re probably not matching.


Login
before adding your answer.

Traffic: 2711 users visited in the last hour

Read more here: Source link