Tag: chrX
Counting Isoforms from Sam File
I am attempting to count the number of reads of each isoform of the Rt-GEF gene in Drosophila across multiple SAM files. My SAM files are currently formatted so that reads are listed by coordinates on chromosomes, such as VH00562:14:AAANWG5HV:1:1101:26828:1568 1:N:0:ACTAAGAT+GCGGTTGT 99 chrX 5075295…
Making pairwise matrix using ChIP-Seq peak binding matrix in R
Hi everyone, I have a question about R. How do I convert this
            Protein1 Protein2 Protein3
chr1_1564   1 0 0
chr3_9087   0 1 1
chr4_877671 1 1 0
chr9_90988  0 1 1
chr11_87676 1 0 0
chrX_1546   0 1 1
to this
         Protein1 Protein2 Protein3
Protein1 3 1 0
Protein2 1…
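One way to read the target table is as a co-occurrence count: entry (i, j) is the number of peaks bound by both protein i and protein j, i.e. the product of the transposed matrix with itself. The sketch below assumes that interpretation (the desired output is truncated in the post, so only its first row can be checked) and uses plain Python rather than R.

```python
# Binary peak-by-protein matrix copied from the question.
proteins = ["Protein1", "Protein2", "Protein3"]
binding = {
    "chr1_1564":   [1, 0, 0],
    "chr3_9087":   [0, 1, 1],
    "chr4_877671": [1, 1, 0],
    "chr9_90988":  [0, 1, 1],
    "chr11_87676": [1, 0, 0],
    "chrX_1546":   [0, 1, 1],
}


def cooccurrence(rows, n):
    """Return the n x n matrix M^T M for a list of 0/1 rows of length n."""
    counts = [[0] * n for _ in range(n)]
    for row in rows:
        for i in range(n):
            for j in range(n):
                counts[i][j] += row[i] * row[j]
    return counts


pairwise = cooccurrence(list(binding.values()), len(proteins))
for name, row in zip(proteins, pairwise):
    print(name, *row)
```

The first printed row is `Protein1 3 1 0`, which agrees with the visible part of the desired output. The diagonal holds the number of peaks per protein.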
Problem while working with sequenza
Hi, I’m trying to work with sequenza in order to calculate HRD score of a sample using WES data. When I run sequenza, I get a message saying that “chromosomes are out of order”, and I don’t know how…
CONTRA for CNV detection. troubleshooting
Hi Biostars Users, I would like to know if you have encountered this issue while working with the CONTRA tool for detecting CNVs in targeted NGS. My analysis is “stopping” at the binning process with no error message. I have checked all the…
How to calculate GC content of reads that mapped to a specific gene?
Hello, I aligned my data with STAR and I have the BAM files. I would like to calculate the GC content of reads that mapped to a specific gene. I have found this thread Gc Content…
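A minimal sketch of the calculation itself, assuming the reads overlapping the gene have already been extracted as SAM text (for example by piping `samtools view` with a region into the script). The function name and the two example records are made up for illustration; column 10 of a SAM record is the read sequence.

```python
def gc_fraction(sam_lines):
    """Pooled GC fraction over the SEQ field (column 10) of SAM records."""
    gc = total = 0
    for line in sam_lines:
        if line.startswith("@"):          # skip header lines
            continue
        fields = line.rstrip("\n").split("\t")
        if len(fields) < 11 or fields[9] == "*":
            continue                      # malformed record or no stored sequence
        seq = fields[9].upper()
        gc += seq.count("G") + seq.count("C")
        total += sum(seq.count(base) for base in "ACGT")  # ignore N and others
    return gc / total if total else 0.0


# Two hypothetical records, not taken from any real alignment.
example = [
    "@HD\tVN:1.6\tSO:coordinate",
    "r1\t0\tchrX\t100\t60\t8M\t*\t0\t0\tGGCCAATT\tIIIIIIII",
    "r2\t16\tchrX\t120\t60\t8M\t*\t0\t0\tGCGCGCNN\tIIIIIIII",
]
print(round(gc_fraction(example), 3))
```

This pools bases across reads; a per-read distribution would need one value per record instead.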
How to mark as QC fail reads with specific CIGARs
Before describing the problem, here is some context. I am developing an amplicon NGS bioinformatics pipeline. I keep the primers to do the alignment, then I use samtools-ampliconclip to mask the primers and finally I use Pisces to call the variants. The problem is that occasionally, reads come with a large…
bash – MergeBamAlignment error – Bioinformatics Stack Exchange
I am doing the alignment of samples following the GATK pipeline, and running MergeBamAlignment like this: MergeBamAlignment -ALIGNED $path/file.unsorted.bam -UNMAPPED $path/file.unmapped.bam -O $path/file.merged.bam -R $references/GRCh38.primary_assembly.genome.fa -SO coordinate -TMP_DIR $path/tmp/ I got this error: Exception in thread "main" java.lang.IllegalArgumentException: Do not use this function to merge dictionaries with different sequences in them….
Pisces doesn’t like high-quality reads when there is a soft-clip affecting the full read.
When using Pisces, I get the following error. System.Exception: RACP2-6poolv4_P5-A_FINAL_SORTED.bam: Error processing chr 'chr7': Failed to process variants for MN01972:49:000H5KYMY:1:11102:26356:2968 … 150S ---> System.Exception: Failed to process variants for MN01972:49:000H5KYMY:1:11102:26356:2968 … 150S at Pisces.Domain.Logic.CandidateVariantFinder.ProcessCigarOps(Read alignment, String refChromosome, Int32 readStartPosition, String chromosomeName) at Pisces.Logic.SomaticVariantCaller.Execute() at Pisces.Processing.Logic.BaseGenomeProcessor.ProcessByBam(BamWorkRequest workRequest, String chrName) --- End…
Failed to process variants for MN01972:49:000H5KYMY:1:11102:26356:2968
I get this error from time to time when running Pisces (variant calling). Do you have any idea what may be causing it? System.Exception: RACP2-6poolv4_P5-A_FINAL_SORTED.bam: Error processing chr 'chr7': Failed to process variants for MN01972:49:000H5KYMY:1:11102:26356:2968 … 150S…
bcftools guess ploidy
Hi, I am trying to use bcftools guess-ploidy to check gender. This is how I tried to use it: bcftools view sample.vcf.gz -r chrX:2699521-154931043 | bcftools +guess-ploidy >guess_ploidy.output The VCF contains whole genome sequencing data for around 50 samples and covers all chromosomes. However…
count number of GC for a given genomic ranges
I have a GRanges object with start and end coordinates, and I want to count the number of GC bases in each stretch of DNA. However, I’m not able to subset the BSgenome properly to do that. Here’s my approach. > gr_pro GRanges object with 3 ranges and 2 metadata columns: seqnames ranges strand…
count number of GC in DNA region
I have a GRanges object with start and end coordinates, and I want to count the number of GC bases in each stretch of DNA. However, I’m not able to subset the BSgenome properly to do that. Here’s my approach. > gr_pro GRanges object with 3 ranges and 2 metadata columns: seqnames ranges strand…
Liftedover vcf header/contig compatibility
I have a collaborator that has lifted over their hg19 files to hg38 using Crossmap. The first step in the workflow they need to run is a simple bcftools filter for variant quality. They are getting an unknown file type error. Are there any obvious problems with this header that…
GATK BaseRecalibrator known-sites vcf file
Hi, I am trying to run GATK’s BaseRecalibrator on a BAM file created with the hg19 reference sequence downloaded from UCSC website. For the --known-sites option I would like to use either a gnomAD .vcf file or a dbSNP .vcf, downloaded from their respective websites. The analysis works if I…
problems in hg19 and b37 compatibility
Hi everybody, A BAM file has been aligned using the hg19 reference genome. Thus, the chromosome notation is [chrM, chr1, chr2, chr3, chr4, …, chrX, chrY]. I want to look for PMs using MuTect, which requires as input VCF files from dbSNP and COSMIC. In these VCF files the chromosome notation is…
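A rough sketch of one common workaround, renaming the contigs of the VCF data lines so they match the BAM. It assumes the VCFs use b37-style names (1, 2, ..., X, Y, MT); the post is cut off before stating this, so treat it as an assumption. The mapping table and function name are hypothetical. Renaming only relabels contigs: the hg19 chrM sequence is not the same as the b37 MT (rCRS) sequence, so mitochondrial records should not be carried over this way.

```python
# Assumed b37-style primary contig names mapped to UCSC hg19-style names.
PRIMARY = [str(i) for i in range(1, 23)] + ["X", "Y"]
B37_TO_UCSC = {name: "chr" + name for name in PRIMARY}
B37_TO_UCSC["MT"] = "chrM"  # label only; the underlying sequences differ


def rename_vcf_line(line, mapping=B37_TO_UCSC):
    """Rewrite the CHROM column of a VCF data line; headers pass through."""
    if line.startswith("#"):
        return line
    chrom, sep, rest = line.partition("\t")
    # Contigs missing from the mapping (e.g. unplaced scaffolds) are left as-is.
    return mapping.get(chrom, chrom) + sep + rest


print(rename_vcf_line("X\t100\t.\tA\tG\t.\tPASS\t.\n"), end="")
```

The `##contig` header lines are not rewritten here, so a real conversion would also need to update or regenerate the header.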
GATK Mutect2 Input files reference and features have incompatible contigs: No overlapping contigs found.
Hi, I am following the GATK best practices pipeline for variant calling starting from targeted sequencing bam and bai files using the hg19 reference. When running GATK Mutect2, I got the following error: A USER ERROR has occurred: Input files reference and features have incompatible contigs: No overlapping contigs found. reference…
PLINK1.9 check sex problem
Hello! I use plink1.9 with the --check-sex option to check sex in data with only one sample. I use --read-freq with a file containing frequencies of variants from chrX of 1000G (obtained by --freq). My problem is that for all samples I have F=-1 and SNPSEX=2….
Where can I download the length of short and long arms for each chromosome
Where can I download the length of the short and long arm for each chromosome? Thank you
Download cytoband file from UCSC: And summarise using R:
library(data.table)
x <- fread("http://hgdownload.cse.ucsc.edu/goldenpath/hg19/database/cytoBand.txt.gz",…
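The same summary can be sketched in plain Python, assuming the UCSC cytoBand layout of tab-separated chrom, start, end, band name and stain, where the band name begins with `p` or `q`. The function name is hypothetical and the four rows below are invented toy values, not real hg19 coordinates.

```python
def arm_lengths(cytoband_lines):
    """Return {(chrom, arm): length} where arm is 'p' or 'q'."""
    spans = {}
    for line in cytoband_lines:
        fields = line.rstrip("\n").split("\t")
        if len(fields) < 4 or fields[3][:1] not in ("p", "q"):
            continue                      # skip rows without a p/q band name
        chrom, start, end, arm = fields[0], int(fields[1]), int(fields[2]), fields[3][0]
        lo, hi = spans.get((chrom, arm), (start, end))
        spans[(chrom, arm)] = (min(lo, start), max(hi, end))
    # Band coordinates are 0-based half-open, so the length is end minus start.
    return {key: hi - lo for key, (lo, hi) in spans.items()}


# Toy rows for a made-up chromosome "chrT".
toy = [
    "chrT\t0\t100\tp11\tgneg",
    "chrT\t100\t250\tp10\tacen",
    "chrT\t250\t400\tq10\tacen",
    "chrT\t400\t1000\tq11\tgpos50",
]
print(arm_lengths(toy))
```

For the real file, the lines could be read with `gzip.open(path, "rt")` after downloading it.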
GATK CollectReadCounts UnsatisfiedLinkError hdf5
I’m trying to run GATK’s CollectReadCounts. It looks like it is running properly, but then it stops and gives the following error message: java.lang.UnsatisfiedLinkError: /tmp/libjhdf5.2.11.06741640809408344263.so: /tmp/libjhdf5.2.11.06741640809408344263.so: failed to map segment from shared object: Operation not permitted Can’t seem to find anyone with the same problem anywhere on…
object of type ‘NoneType’ has no len()
I ran Truvari to benchmark two VCF files: truvari bench -b NA12878_S1.genome.vcf.gz -c b1.vcf.gz -o out. However, it gives the following error. The first VCF file, which I found on the Internet, contains format, info, filter, contig and maxdepth headers. The second VCF file is the output of…
How to determine the exact version of hg38 if I have only the FASTA file
I have a FASTA file which contains hg38 assembly. It contains the primary contigs, alt contigs, decoy, HLA, mito. How do I determine the exact version of hg38 based on the FASTA? Here some…
How to add conditional annotation from other column of R dataframe in ggplot2
I’m trying to make a plot from a dataframe of over 300k rows where the peaks and valleys will be annotated from another column rather than the x and y. How can I do that? Dataframe : chr start end.x StoZ.x Hscore end.y StoZ.y Tier Gene 1 chr1 1 10000…
Answer: Estimate sizes of repeats in a especific Gene
Tell me if I’m in the way. I have the CRAM file and the respective CRAI (index). So I just ran the SAM like this, clipping my area of interest: > $ samtools view -b NG1PSZ7BE9.mm2.sortdup.bqsr.cram "chrX:147912050-147912110" > result.bam Then I indexed the .bam file: > $ samtools index result.bam…
how can I generate a VCF (in hg38 coords) of differences between hg38 and CHM13?
I downloaded s3-us-west-2.amazonaws.com/human-pangenomics/pangenomes/freeze/freeze1/minigraph/hprc-v1.0-minigraph-grch38.gfa.gz which contains hg38, chm13, and other assemblies, and now am trying to use vg to generate a VCF with the variants in CHM13 relative to hg38. After converting to vg format, by running vg convert <(gunzip -c hprc-v1.0-minigraph-grch38.gfa.gz) > hprc-v1.0-minigraph-grch38.vg, I tried a few different variations of…
How to Diagnose Fragile X from Whole Genome Sequencing
Following this post. Perhaps you can indicate if I’m on the right track, although I’m a complete amateur now. We know that the FMR1 gene is located in: chrX:147,911,919-147,951,125 Its size is: 39,207 bases So I exactly “snipped” the above sequence, which is exactly 39,207 characters from a 30x Whole-Genome…
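The quoted size follows from treating the region as a 1-based, inclusive interval, so the length is end minus start plus one. A small check of that arithmetic, with a hypothetical helper name:

```python
def region_length(region):
    """Length of a 1-based inclusive region such as 'chrX:1,000-2,000'."""
    _chrom, _, span = region.replace(",", "").partition(":")
    start, _, end = span.partition("-")
    return int(end) - int(start) + 1


fmr1_length = region_length("chrX:147,911,919-147,951,125")
print(fmr1_length)  # 147951125 - 147911919 + 1 = 39207
```

This matches the 39,207 bases stated in the post.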
Annotate vcf file using GNOMAD
Hi, I use a loop for that. Something like this to inspire you:
# Enter folder where gnomAD data are here:
gnomAD="/path/to/gnomAD/database/release/3.1.2/gnomad.genomes.v3.1.2.sites."
# Enter the folder where your results are and will be annotated further
cd /path/to/your/results/folder/
# Enter the name of the final results’ file from SnpSift
ann="results.ann.gnomAD.genomes.v3.1.2.vcf"
#…
How come on T2T human genome assembly the PAR regions of chrY are bigger than the corresponding regions on chrX?
As you can see on the PAR file listed here: s3-us-west-2.amazonaws.com/human-pangenomics/T2T/CHM13/assemblies/analysis_set/chm13v2.0_PAR.bed If you download the file you see the coordinates for PAR1 listed are: PAR1 chrX 0 2394410 chrY…
Double Alleles in gVCF file for ChrX, ChrY & ChrM
Hello, I understand that in males the X, Y and mitochondrial chromosomes should be represented by a single allele, but I have a gVCF file that has 2 alleles (homozygous or heterozygous) for each position on these chromosomes, mostly with GT:…
Bed files of narrowPeak in ENCODE
I found that the Pvalue column is all -1 in the narrowpeak.bed file I downloaded from ENCODE. According to the manual provided by ENCODE, that means the Pvalues of all peaks are unassigned. Does anyone know what’s the meaning of that? Thanks a…
Error in genome mapping by BWA tools in Linux
$ gmap_build -D:\btau8refflat.gtf Unknown option: D:btau8refflat.gtf -k flag not specified, so building main hash table with default 15-mers -j flag not specified, so building regional hash tables with default 6-mers gmap_build: Builds a gmap database for a genome to be used by GMAP or GSNAP. Part of GMAP package, version…
Normalization of ChIP-seq results with deepTools
Hello everyone! I’m trying to visualize ChIP-seq data with deepTools. I used the following commands: bamCoverage (normalization) -> bigwigCompare (normalization against input) -> computeMatrix -> plotProfile As a result I got this figure. Green and yellow plots are for treatment sample replicates, blue…
unrecognized arguments using deepTools bamCoverage
Hello everyone! I’m trying to visualize ChIP-seq data with deepTools but I am facing errors when I use optional arguments for normalization. Commands like: bamCoverage -b file.bam -o file.bigWig -of bigwig --normalizeUsing RPGC --effectiveGenomeSize 2862010578 --ignoreForNormalization chrX chrY bamCompare -b1 file-1.bam -b2 file-2.bam -o files-1-2.bigWig -of…
Help in replicating LDSC heritability estimates
Hi, I am trying to replicate the heritability estimates based on the insomnia GWAS summary statistics using LDSC. However, I have encountered a problem as my estimates seem to be only about half of the original estimates listed in Table S1. Despite my efforts to locate the error, I have…
VEP-like tool for sequence ontology and HGVS annotation of VCF files
Mehari is a software package for annotating VCF files with variant effect/consequence. The program uses hgvs-rs for projecting genomic variants to transcripts and proteins and thus has high prediction quality. Other popular tools offering variant effect/consequence prediction include: Mehari offers predictions that aim to mirror VariantValidator, the gold standard for…
ChrX allele frequency in males and females
Hi, I have a question about the allele frequency output files (.frq) from VCFtools. My question is specific to chromosome X variants. I am writing out the allele frequencies for males and females separately to identify the variants with different allele frequencies….
deepTools multiBigwigSummary “Invalid interval bounds” error
I’m trying to bin 1x normalized ATAC-seq bigWig (generated by the bamCoverage function) with the multiBigwigSummary function in deepTools with the intention of clustering several ATAC-seq samples with deepTools’ plotCorrelation function. My bamCoverage commands look like this: > bamCoverage -b input.bam -o output.SeqDepthNorm.bw -p "max" --effectiveGenomeSize 2805636331 --normalizeUsing RPGC -ignore…
Converting dbSNP VCF to work with RefSeq chromosome ID
Hello everyone! I’ve been trying to use GATK with updated version of the human genome as the GATK files are outdated by ten years. I’ve downloaded NCBI reference GCF_000001405.40.fna, which is GRCh38.p14 For dbSNP version, I’ve downloaded GCF_000001405.40.gz , which is also GRCh38.p14 When extracting the contig names from my…
Bedtools multicov output file error
Hello, I ran bedtools multicov for 5 bam files. The output file is corrupted after a certain point, i.e. from chr13 to chrM the output is generated as below: tail multicov.txt output – error 6500 chrX 140406100 140406600 chrX 140406200 140406700 chrX…
Fasta file and GTF file for STAR alignment
Hello there, The top-level fasta file will include chromosomes, regions not assembled into chromosomes and N padded haplotype/patch regions. See more here: ftp.ensembl.org/pub/release-92/fasta/mus_musculus/dna/README. If you are only looking for reference genome assembly chromosome level sequences then use the primary_assembly.fa file. The…
VCF input error using locateVariants command (VariantAnnotation)
Hi Everyone! I have a problem with using rDGIdb looking for possible drug candidates. I want to use the option of VCF input and I am following the code described in bioconductor.org/packages/release/bioc/vignettes/rDGIdb/inst/doc/vignette.pdf. However it always gives me an error in the end: library(VariantAnnotation) library(TxDb.Hsapiens.UCSC.hg19.knownGene) library(org.Hs.eg.db) txdb <- TxDb.Hsapiens.UCSC.hg19.knownGene vcf <-…
Systematic and benchmarking studies of pipelines for mammal WGBS data in the novel NGS platform | BMC Bioinformatics
Comparison of read level and improving the mapping efficiency according to trimming Since the generation of high-quality WGBS data ultimately impacts the quantification and interpretation of Cs methylation levels, it is indispensable to monitor the raw data quality and interrogate the appropriate pre-processing step to cleanse data [1]. To avoid…
getTable ignores query ranges
Hi everyone. I have a set of mouse SNPs (~974) from GRCm39 that I’m trying to get either GERP or UCSC Conservation scores on. To do this, I’m using rtracklayer to try to query the ranges of the SNP and return the multiz35way conservation score. However, when I do this,…
Datasets | TogoVar
Variant frequencies for which you can apply for use of individual-level data∗1 to the NBDC human databases∗2 Click the links at the Included controlled-access datasets to apply for use of individual-level data ∗1:fastq/bam/cel files and/or lists of genotype data etc.∗2:Japanese Genotype-phenotype Archive (JGA) / AMED Genome group sharing Database (AGD)…
Scatter Gather principle by chromosome on Gatk
Hi all, On a quest to optimize my GATK pipeline, I came across the scatter-gather principle, so I did the following: pids= for chr in chr1 chr2 chr3 chr4 chr5 chr6 chr7 chr8 chr9 chr10 chr11 chr12 chr13 chr14 chr15 chr16 chr17 chr18 chr19 chr20…
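For comparison, here is a rough Python sketch of the scatter step, with one job per chromosome run through a bounded worker pool. The post is truncated before naming the GATK tool, so `HaplotypeCaller`, the file names and the helper functions below are assumptions for illustration only; the `-R`, `-I`, `-L` and `-O` arguments are the usual reference, input, interval and output options.

```python
import subprocess
from concurrent.futures import ThreadPoolExecutor

CHROMS = [f"chr{i}" for i in range(1, 23)] + ["chrX", "chrY"]


def build_command(chrom, bam="sample.bam", ref="ref.fa"):
    """Command line for one shard; tool and file names are placeholders."""
    return ["gatk", "HaplotypeCaller", "-R", ref, "-I", bam,
            "-L", chrom, "-O", f"sample.{chrom}.vcf.gz"]


def scatter(chroms, run=subprocess.run, workers=4):
    """Run one job per chromosome, at most `workers` at a time."""
    with ThreadPoolExecutor(max_workers=workers) as pool:
        # pool.map keeps results in the same order as `chroms`.
        return list(pool.map(lambda c: run(build_command(c), check=True), chroms))


# Nothing is executed on import; call scatter(CHROMS) to launch the jobs.
```

Passing `check=True` makes a failed shard raise instead of being silently skipped. The gather step (merging the per-chromosome outputs in chromosome order) is left out here.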
Help with PCA plot from SNPRelate
Hello, BioStars community! I am working on a dataset of 50 samples (between cases and controls) with genotyping data for ~200 SNPs. I ran SNPRelate PCA analysis without adding population data and tried to plot its results. My PCA plot seems a bit “strange” since the observations did not come together…
Split multiallelic SNPs to biallelic from vcf
Dear all, I have a particular vcf file like this, chrX 29 . G A,T . PASS AC=1,1;AN=3 GT:DP:HF:CILOW:CIUP:SDP 0/1/2:4839:0.003,0.001:0.002,0.0:0.005,0.003:14;0,4;2 I tried various tools to split this, but I get the following results, so the FORMAT and INFO lines are identical. chrX 29 . G A . PASS AC=1,1;AN=3;OLD_MULTIALLELIC=chrM:899:G/A/T GT:DP:HF:CILOW:CIUP:SDP…
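To make the expected behaviour concrete, here is a small sketch of splitting the site-level columns of such a record so that each output line keeps only its own ALT allele and its own per-allele INFO value. It assumes `AC` is declared with one value per ALT allele (Number=A) and deliberately ignores the FORMAT and sample columns; the function name is hypothetical.

```python
def split_multiallelic(fields, per_alt_keys=("AC",)):
    """Split the first 8 VCF columns into one record per ALT allele."""
    chrom, pos, vid, ref, alt, qual, flt, info = fields[:8]
    records = []
    for i, allele in enumerate(alt.split(",")):
        new_info = []
        for item in info.split(";"):
            key, eq, value = item.partition("=")
            if eq and key in per_alt_keys:
                value = value.split(",")[i]   # keep only this allele's value
            new_info.append(key + eq + value)
        records.append([chrom, pos, vid, ref, allele, qual, flt, ";".join(new_info)])
    return records


site = ["chrX", "29", ".", "G", "A,T", ".", "PASS", "AC=1,1;AN=3"]
for record in split_multiallelic(site):
    print("\t".join(record))
```

With this input both output lines carry `AC=1`, rather than the unchanged `AC=1,1` shown in the post. Per-allele FORMAT fields would need the same treatment.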
Tools for Normalizing and Comparing ChIP-seq Samples
data(H3K27Ac, package = "MAnorm2")
attr(H3K27Ac, "metaInfo")
## Make a comparison between GM12891 and GM12892 cell lines and create an MA
## plot on the comparison results.
# Perform MA normalization and construct bioConds to represent the two cell
# lines.
norm <- normalize(H3K27Ac, 5:6, 10:11)
norm <- normalize(norm, 7:8, 12:13)…
Produce PCA bi-plot for 1000 Genomes Phase III
Note1 – Previous version: Produce PCA bi-plot for 1000 Genomes Phase III in VCF format (old) Note2 – this data is for hg19 / GRCh37 Note3 – GRCh38 data is available HERE The tutorial has been updated based on the 1000 Genomes Phase III imputed genotypes. The original tutorial was…
MAPQ (Mapping quality) of 0 for most reads from BWA-MEM2 (with no secondary alignment or other apparent reason)
Hello, I got a very weird output from BWA-MEM2: most of the reads have a mapping quality of 0, even though there is no secondary alignment or anything else suspicious. I got sequencing data that was aligned with Novoalign to hg18, the data was bam files. I needed to realign…
Transcription Factor Functions
I have been wondering if there is a database that contains information about whether a transcription factor has a repressive or an activating or a context-dependent function. If you performed a whole-genome FIMO scan, selecting those hits that match your…
Fasta.fai file error
Hi, I have been struggling with an error in bedtools intersect. The command I am trying to run is as follows bedtools intersect -a sorted.vcf -b nstd166.GRCh38.variant_call_chr.vcf.gz -wo -sorted -f 0.8 -r -g Homo_sapiens_assembly38.fasta.fai For some of the files that I am assessing, I don’t get…
What is the difference between GRCh37 and hs37? And hg19?
This is what I have found so far. Please correct me if I am wrong. GRCh37 w/o patches includes the primary assembly (22 autosomal, X, Y, and non-chromosomal supercontigs) and alternate scaffolds, but not a reference mitogenome. Non-chromosomal supercontigs are the unlocalized and unplaced scaffolds. The rCRS reference mitogenome in…
To find total genes on favorable chromosome
How can I find how many genes exist on each chromosome?
Counting from GENCODE for the vM27 mouse reference:
wget ftp.ebi.ac.uk/pub/databases/gencode/Gencode_mouse/release_M27/gencode.vM27.annotation.gtf.gz
#/ In R:
library(rtracklayer)
gtf <- rtracklayer::import("~/gencode.vM27.annotation.gtf.gz")
table(as.character(seqnames(gtf[gtf$type=="gene"])))
chr1 chr10 chr11 chr12 chr13 chr14 chr15 chr16…
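The same tally can be sketched without R by reading the GTF as text and counting rows whose third column is `gene`. The function name is hypothetical and the three records are invented placeholders, not GENCODE entries.

```python
from collections import Counter


def genes_per_chromosome(gtf_lines):
    """Count GTF records of feature type 'gene' per sequence name."""
    counts = Counter()
    for line in gtf_lines:
        if line.startswith("#"):          # skip comment/header lines
            continue
        fields = line.split("\t")
        if len(fields) >= 3 and fields[2] == "gene":
            counts[fields[0]] += 1
    return counts


# Made-up records for illustration only.
toy_gtf = [
    "##description: toy annotation",
    'chr1\tTOY\tgene\t100\t900\t.\t+\t.\tgene_id "g1";',
    'chr1\tTOY\ttranscript\t100\t900\t.\t+\t.\tgene_id "g1";',
    'chrX\tTOY\tgene\t50\t500\t.\t-\t.\tgene_id "g2";',
]
print(dict(genes_per_chromosome(toy_gtf)))
```

For a gzipped annotation, the lines could come from `gzip.open(path, "rt")`.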
Platypus
Hi, I’m super new to WGS and bioinformatics, but I’m a classic software data scientist, so I know enough to be annoying. I’m using Platypus to call variants on 100X WGS via Nebula Genomics. I found an odd series of calls and am not sure if this is…
gender determination and chrX CN calls
I’m running CNVKit in amplicon mode on a set of tumor bam files generated with a small amplicon panel of 45 genes. The panel includes just one gene on chrX, and none on chrY. My reference is generated by 10 normal male samples…
Default CNV call thresholds for haplotype chromosomes
Hi, I am confused about CNV calling. The default thresholds are -1.1 => 0, -0.25 => 1, 0.2 => 2, 0.7 => 3 for discrete copy number. But these thresholds don’t work for chrY and chrX. What is…
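As a point of reference, this is how the quoted thresholds map a log2 ratio to an integer copy number, assuming each threshold is an inclusive upper bound and anything above the last one is reported as 4. That reading, and the function name, are assumptions; the sketch covers only the diploid case and does not address the chrX/chrY adjustment asked about.

```python
from bisect import bisect_left

# Thresholds quoted in the post: -1.1 => 0, -0.25 => 1, 0.2 => 2, 0.7 => 3.
THRESHOLDS = [-1.1, -0.25, 0.2, 0.7]


def copy_number(log2_ratio, thresholds=THRESHOLDS):
    """Number of thresholds strictly below the ratio, used as the copy number."""
    return bisect_left(thresholds, log2_ratio)


for ratio in (-2.0, -1.0, 0.0, 0.5, 1.5):
    print(ratio, copy_number(ratio))
```

A single-copy region measured against a two-copy reference has an expected log2 ratio of log2(1/2) = -1, which lands in the copy number 1 bin here.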