Tag: chrX

how to test for differential expression in samples where a global increase in gene expression is expected

As the title suggests, I’m wondering what the best way is to test each gene in a count matrix containing two groups, where one group is expected to have a global increase in gene expression. I need to use spike-in normalized RPKM data, so from my understanding of DESeq, it…


Methylation Analysis Tutorial in R_part1

The code and approaches that I share here are those I am using to analyze TCGA methylation data. At the bottom of the page, you can find references used to make this tutorial. If you are coming from a computer background, please bear with a geneticist who tried to code…


overlapping duplicate dispersed_repeat feature in stringtie

Hi. I got the following error when I use stringtie with a RepeatMasker annotation GFF file and RNA-seq BAM files that are already sorted with samtools. GFF Error: overlapping duplicate dispersed_repeat feature (ID=461) GFF Error: overlapping duplicate dispersed_repeat feature (ID=712) GFF Error: overlapping…


Advanced Emerging Techniques for Forensic DNA Analysis: STRs, SNPs, and mtDNA Analysis

Alshehhi A, Almarzooqi A, Alhammadi K, Werghi N, Tay GK, Alsafar H (2023) Advancement in human face prediction using DNA. Genes (Basel) 14:136. https://doi.org/10.3390/genes14010136
Amorim A, Fernandes T, Taveira N (2019) Mitochondrial DNA in human identification: a review. PeerJ 7:e7314. https://doi.org/10.7717/peerj.7314 …


Fetching subsets with slow5curl and samtools

From docs/slow5curl.md in the GenTechGp/gtgseq repository. Sections: Installing necessary tools; Example: Fetching a subset of reads; Example: Fetching and basecalling a subset of…


Longitudinal detection of circulating tumor DNA

Analysis of Roche KAPA Target Enrichment kit experimental data obtained on an Illumina sequencing system is most frequently performed using a variety of publicly available, open-source analysis tools. The typical variant calling analysis workflow consists of sequencing read quality assessment, read filtering, mapping against the reference genome, duplicate removal, coverage…


Handling male samples chrX vcf genotype from 1000G high-coverage 30x

Hello, I am working with the VCF files from the 1000G project high-coverage (30x) release. I do not completely understand how the authors have handled the genotypes of male individuals in the non-pseudoautosomal chrX regions. The genotypes in the…


CRISPR-broad: combined design of multi-targeting gRNAs and broad, multiplex target finding

CRISPR-broad framework We developed a procedural pipeline for detecting gRNAs and implemented this in Python as a standalone application (Fig. 1a). For speeding up gRNA selection, we employed multithreading and used big data Python module Pandas. This allowed splitting millions of short sequences for mapping and processing large numbers of uncompressed…


human genome – How many Ns and ns in GRCh37 / GRCh38 per ‘canonical’ chromosome?

This is kind of pedantic, but I’m not sure where to look… For GRCh38 (and a lot of work…) I have the following…

Chr   Length       Ns          ns
chr1  248,956,422  18,475,229  181
chr2  242,193,529  1,645,291   10
chr3  198,295,559  195,420     4
chr4  190,214,555  461,888     0
chr5  181,538,259  272,881     0
chr6  170,805,979  727,255…
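One way to reproduce a table like this is to tally characters per FASTA record. The sketch below is a minimal illustration, not the poster's method: the function name `count_n_per_sequence` and the demo records are made up, and it assumes a plain (uncompressed) FASTA read line by line.

```python
from collections import Counter
from typing import Dict, Iterable, Tuple


def count_n_per_sequence(lines: Iterable[str]) -> Dict[str, Tuple[int, int, int]]:
    """Return {sequence name: (length, count of 'N', count of 'n')} for FASTA text."""
    totals: Dict[str, Counter] = {}
    name = None
    for line in lines:
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            # Keep only the first word of the header, e.g. ">chr1 desc" -> "chr1".
            name = line[1:].split()[0]
            totals[name] = Counter()
        elif name is not None:
            # Counter.update on a string counts each character.
            totals[name].update(line)
    return {seq: (sum(c.values()), c["N"], c["n"]) for seq, c in totals.items()}


if __name__ == "__main__":
    # Tiny made-up FASTA, not real genome data.
    demo = [">chrA demo", "NNNNacgt", "ACGTnnAC", ">chrB", "ACGT"]
    for seq, (length, upper_n, lower_n) in count_n_per_sequence(demo).items():
        print(seq, length, upper_n, lower_n)
```

For a real genome, passing an open file handle instead of the demo list should work the same way, since the function only iterates over lines.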


Quickly retrieve reference genome sequence within python

Hi all, for a project I’m working on, I need to be able to quickly retrieve the sequence at a given 2kb window of the reference genome hg38 within Python. The windows I need might not be consecutive (i.e. one thread might…
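Random access to a window is usually done by seeking with a FASTA index rather than scanning the file. The sketch below illustrates that idea with the standard library only; `build_index` and `fetch` are hypothetical helper names, coordinates are 0-based half-open, and it assumes every sequence line within a record has the same width. Existing libraries such as pysam (`FastaFile.fetch`) or pyfaidx offer the same kind of indexed lookup and are probably the more practical choice.

```python
import os
import tempfile
from typing import Dict, Tuple

# (sequence length, byte offset of first base, bases per line, bytes per line)
FaiEntry = Tuple[int, int, int, int]


def build_index(fasta_path: str) -> Dict[str, FaiEntry]:
    """Build a small in-memory index, similar in spirit to a .fai file."""
    index: Dict[str, FaiEntry] = {}
    name = None
    length = offset = line_bases = line_bytes = 0
    pos = 0
    with open(fasta_path, "rb") as handle:
        for raw in handle:
            if raw.startswith(b">"):
                if name is not None:
                    index[name] = (length, offset, line_bases, line_bytes)
                name = raw[1:].split()[0].decode()
                offset = pos + len(raw)
                length = line_bases = line_bytes = 0
            elif name is not None:
                bases = len(raw.rstrip(b"\r\n"))
                if line_bases == 0:
                    line_bases, line_bytes = bases, len(raw)
                length += bases
            pos += len(raw)
    if name is not None:
        index[name] = (length, offset, line_bases, line_bytes)
    return index


def fetch(fasta_path: str, index: Dict[str, FaiEntry], name: str, start: int, end: int) -> str:
    """Return bases [start, end) of one sequence by seeking to the right bytes."""
    length, offset, line_bases, line_bytes = index[name]
    end = min(end, length)
    if start >= end:
        return ""
    first = offset + (start // line_bases) * line_bytes + start % line_bases
    last = offset + (end // line_bases) * line_bytes + end % line_bases
    with open(fasta_path, "rb") as handle:
        handle.seek(first)
        chunk = handle.read(last - first)
    # Drop the line breaks that fall inside the window.
    return chunk.replace(b"\n", b"").replace(b"\r", b"").decode()


if __name__ == "__main__":
    # Made-up miniature FASTA standing in for a real reference.
    with tempfile.NamedTemporaryFile("w", suffix=".fa", delete=False) as fh:
        fh.write(">chr1 demo\nACGTACGTAC\nGGGGCCCCTT\nAA\n>chr2\nTTTT\n")
        demo_path = fh.name
    demo_index = build_index(demo_path)
    print(fetch(demo_path, demo_index, "chr1", 8, 14))
    os.remove(demo_path)
```

Because each lookup is a single seek and read, non-consecutive windows cost about the same as consecutive ones, and the index can be built once and shared across threads.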


Mosaic chromosomal alterations in blood across ancestries using whole-genome sequencing

Study population We included 67,390 participants from 19 TOPMed studies: Genetics of Cardiometabolic Health in the Amish (n = 1,109) (ref. 32), Atherosclerosis Risk in Communities Study (n = 3,780) (ref. 33), Barbados Genetics Asthma Study (n = 980), Mount Sinai BioMe Biobank (n = 9,392) (ref. 34), Coronary Artery Risk Development in Young Adults (n = 3,293) (ref. 35),…


Hey guys, I’m having a problem when using GATK4 BQSR. This dbSNP VCF file has chromosomes notated as 1, 2, … but my reference contigs are chr1, chr2, …: incompatibility in contigs

anilkumar@ak-omen-laptop:~/NGStools/gatk-4.4.0.0$ gatk --java-options "-DGATK_STACKTRACE_ON_USER_EXCEPTION=true" BaseRecalibrator -I "/media/anilkumar/My Passport/CRC/fastq/C_4_mkdp.bam" -R "/media/anilkumar/My Passport/CRC/fastq/hg19.fa" --known-sites "/media/anilkumar/My Passport/CRC/fastq/dbsnp_138.b37.vcf" --known-sites "/media/anilkumar/My Passport/CRC/fastq/Mills_and_1000G_gold_standard.indels.b37.vcf" --known-sites "/media/anilkumar/My Passport/CRC/fastq/1000G_phase1.indels.b37.vcf" -O "/media/anilkumar/My Passport/CRC/fastq/C_4_bqsr.table"
Using GATK jar /home/anilkumar/NGStools/gatk-4.4.0.0/gatk-package-4.4.0.0-local.jar
Running: java -Dsamjdk.use_async_io_read_samtools=false -Dsamjdk.use_async_io_write_samtools=true -Dsamjdk.use_async_io_write_tribble=false -Dsamjdk.compression_level=2 -DGATK_STACKTRACE_ON_USER_EXCEPTION=true -jar /home/anilkumar/NGStools/gatk-4.4.0.0/gatk-package-4.4.0.0-local.jar BaseRecalibrator -I /media/anilkumar/My Passport/CRC/fastq/C_4_mkdp.bam -R /media/anilkumar/My Passport/CRC/fastq/hg19.fa --known-sites /media/anilkumar/My Passport/CRC/fastq/dbsnp_138.b37.vcf --known-sites /media/anilkumar/My Passport/CRC/fastq/Mills_and_1000G_gold_standard.indels.b37.vcf --known-sites…
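The command above mixes a UCSC-style reference (hg19.fa, contigs named chr1, chr2, …) with b37-style known-sites files (contigs named 1, 2, …). One possible workaround is to rename the contigs in the VCF text. The sketch below is only an illustration under stated assumptions: `rename_vcf_lines` and `to_ucsc` are hypothetical names, only 1 to 22, X, Y and MT are renamed, and other contigs (for example GL000191.1) are left untouched because their hg19 names are not a simple prefix change. Note also that the b37 MT sequence and the UCSC hg19 chrM sequence are different mitochondrial references, so renaming MT to chrM may not be appropriate for every analysis.

```python
import re
from typing import Iterable, Iterator

# Contigs that differ only by the "chr" prefix between b37 and UCSC hg19.
_SIMPLE = {str(i) for i in range(1, 23)} | {"X", "Y"}
_CONTIG_HEADER = re.compile(r"^(##contig=<ID=)([^,>]+)")


def to_ucsc(name: str) -> str:
    """Map a b37-style contig name to a UCSC-style one where that is a simple rename."""
    if name in _SIMPLE:
        return "chr" + name
    if name == "MT":
        return "chrM"  # assumption: see the caveat about differing sequences
    return name


def rename_vcf_lines(lines: Iterable[str]) -> Iterator[str]:
    """Yield VCF lines with contig names rewritten in headers and records."""
    for line in lines:
        if line.startswith("##contig="):
            yield _CONTIG_HEADER.sub(lambda m: m.group(1) + to_ucsc(m.group(2)), line)
        elif line.startswith("#"):
            yield line
        else:
            chrom, sep, rest = line.partition("\t")
            yield to_ucsc(chrom) + sep + rest


if __name__ == "__main__":
    demo = [
        "##fileformat=VCFv4.1\n",
        "##contig=<ID=1,length=249250621>\n",
        "#CHROM\tPOS\tID\tREF\tALT\tQUAL\tFILTER\tINFO\n",
        "1\t100\trs0\tA\tG\t.\t.\t.\n",
    ]
    print("".join(rename_vcf_lines(demo)), end="")
```

The renamed file would still need to be re-indexed, and the contig order and lengths should match the reference dictionary for GATK to accept it.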


How to deal with reads which CIGAR is [0-9]+S

I am developing an NGS bioinformatics pipeline to analyze somatic amplicon data. After mapping my reads, I soft-clip both ends of my reads because these regions are the primers and I don’t want them to be used for the variant calling. I soft-clip these regions with this simple code apptainer…
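A read whose CIGAR is a single soft-clip operation matches the pattern in the title, `[0-9]+S`. One option, sketched below, is to set the SAM "not passing quality controls" flag bit (0x200) on such records so that downstream tools can skip them. This is an assumption about a possible fix, not something the post confirms: `flag_fully_clipped` is a hypothetical helper working on SAM text lines, and whether a given variant caller honours the flag has to be checked separately.

```python
import re

FULLY_SOFT_CLIPPED = re.compile(r"^[0-9]+S$")
QC_FAIL = 0x200  # SAM flag bit: not passing quality controls


def flag_fully_clipped(sam_line: str) -> str:
    """Return the SAM line with the QC-fail bit set if its CIGAR is entirely soft-clipped."""
    if sam_line.startswith("@"):
        return sam_line  # header lines pass through unchanged
    newline = "\n" if sam_line.endswith("\n") else ""
    fields = sam_line.rstrip("\n").split("\t")
    # Field 2 is FLAG and field 6 is CIGAR in the SAM format.
    if len(fields) >= 11 and FULLY_SOFT_CLIPPED.match(fields[5]):
        fields[1] = str(int(fields[1]) | QC_FAIL)
    return "\t".join(fields) + newline


if __name__ == "__main__":
    # Made-up record for illustration only.
    demo = "read1\t99\tchr7\t100\t60\t150S\t=\t300\t350\t" + "A" * 150 + "\t" + "I" * 150
    print(flag_fully_clipped(demo).split("\t")[1])
```

The function could be applied to the output of `samtools view -h` and piped back into `samtools view -b`, so the BAM is rewritten without losing any reads.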


Counting Isoforms from Sam File

I am attempting to count the number of reads of each isoform of the Rt-GEF gene in Drosophila across multiple SAM files. My SAM files are currently formatted so that reads are listed by coordinates on chromosomes, such as VH00562:14:AAANWG5HV:1:1101:26828:1568 1:N:0:ACTAAGAT+GCGGTTGT 99 chrX 5075295…


Making pairwise matrix using ChIP-Seq peak binding matrix in R

Hi everyone, I have a query about R. How do I convert this

             Protein1 Protein2 Protein3
chr1_1564    1        0        0
chr3_9087    0        1        1
chr4_877671  1        1        0
chr9_90988   0        1        1
chr11_87676  1        0        0
chrX_1546    0        1        1

to this

          Protein1 Protein2 Protein3
Protein1  3        1        0
Protein2  1…
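The target table is the co-occurrence matrix of the binding matrix, that is, the transpose of the matrix multiplied by the matrix itself (in R this is what `crossprod(m)` computes). The question is about R; the sketch below shows the same computation in plain Python, with the hypothetical function name `cooccurrence` and the six example peaks from the post.

```python
from typing import List, Sequence


def cooccurrence(rows: Sequence[Sequence[int]]) -> List[List[int]]:
    """Given 0/1 rows (one per peak, one column per protein), return M^T * M."""
    n_cols = len(rows[0])
    return [
        [sum(row[i] * row[j] for row in rows) for j in range(n_cols)]
        for i in range(n_cols)
    ]


if __name__ == "__main__":
    # Peaks from the post, columns are Protein1, Protein2, Protein3.
    binding = [
        [1, 0, 0],  # chr1_1564
        [0, 1, 1],  # chr3_9087
        [1, 1, 0],  # chr4_877671
        [0, 1, 1],  # chr9_90988
        [1, 0, 0],  # chr11_87676
        [0, 1, 1],  # chrX_1546
    ]
    for name, counts in zip(["Protein1", "Protein2", "Protein3"], cooccurrence(binding)):
        print(name, *counts)
```

The diagonal holds the number of peaks bound by each protein, and each off-diagonal cell holds the number of peaks shared by a pair.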


Problem while working with sequenza

Hi, I’m trying to work with sequenza in order to calculate the HRD score of a sample using WES data. When I run sequenza, I get a message saying that “chromosomes are out of order”, and I don’t know how…


CONTRA for CNV detection. troubleshooting

Hi Biostars users, I would like to know if you have encountered this issue while working with the CONTRA tool for detecting CNVs in targeted NGS. My analysis is “stopping” at the binning process with no error message. I have checked all the…


How to calculate GC content of reads that mapped to a specific gene?

Hello, I aligned my data with STAR and I have the BAM files. I would like to calculate the GC content of reads that mapped to a specific gene. I have found this thread Gc Content…
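A rough way to do this is to take the reads overlapping the gene's interval and compute the fraction of G and C bases in their sequences. The sketch below is a simplified illustration over SAM text lines (for example from `samtools view`); `gc_of_reads_in_region` is a hypothetical name, coordinates are 1-based inclusive, and it counts all bases of each overlapping read, including soft-clipped bases and spliced reads that merely span the interval.

```python
import re
from typing import Iterable, Optional

_CIGAR_OP = re.compile(r"(\d+)([MIDNSHP=X])")
_REF_CONSUMING = set("MDN=X")  # CIGAR operations that advance along the reference


def reference_span(cigar: str) -> int:
    """Number of reference bases covered by an alignment with this CIGAR."""
    return sum(int(n) for n, op in _CIGAR_OP.findall(cigar) if op in _REF_CONSUMING)


def gc_of_reads_in_region(
    sam_lines: Iterable[str], chrom: str, start: int, end: int
) -> Optional[float]:
    """GC fraction over all bases of reads overlapping chrom:start-end, or None if no reads."""
    gc = total = 0
    for line in sam_lines:
        if line.startswith("@"):
            continue
        fields = line.rstrip("\n").split("\t")
        if len(fields) < 11 or fields[2] != chrom:
            continue
        if fields[5] == "*" or fields[9] == "*":
            continue  # unaligned record or sequence not stored
        pos = int(fields[3])
        read_end = pos + reference_span(fields[5]) - 1
        if read_end < start or pos > end:
            continue
        seq = fields[9].upper()
        gc += seq.count("G") + seq.count("C")
        total += len(seq)
    return gc / total if total else None


if __name__ == "__main__":
    # Made-up reads for illustration only.
    demo = [
        "r1\t0\tchrX\t100\t60\t4M\t*\t0\t0\tGGCC\tIIII",
        "r2\t0\tchrX\t500\t60\t4M\t*\t0\t0\tAAAA\tIIII",
    ]
    print(gc_of_reads_in_region(demo, "chrX", 90, 600))
```

Restricting the input to the gene first, with a region query on the indexed BAM, keeps the amount of text to parse small.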


How to mark as QC fail reads with specific CIGARs

Before the problem, some context. I am developing an amplicon NGS bioinformatics pipeline. I keep the primers to do the alignment, then I use samtools-ampliconclip to mask the primers, and finally I use Pisces to call the variants. The problem is that occasionally, reads come with a large…


bash – MergeBamAlignment error – Bioinformatics Stack Exchange

I am doing the alignment of samples following the GATK pipeline, and running MergeBamAlignment like this: MergeBamAlignment -ALIGNED $path/file.unsorted.bam -UNMAPPED $path/file.unmapped.bam -O $path/file.merged.bam -R $references/GRCh38.primary_assembly.genome.fa -SO coordinate -TMP_DIR $path/tmp/ I got this error: Exception in thread "main" java.lang.IllegalArgumentException: Do not use this function to merge dictionaries with different sequences in them….


Pisces doesn’t like high-quality reads when there is a soft-clip affecting the full read.

When using Pisces, I get the following error. System.Exception: RACP2-6poolv4_P5-A_FINAL_SORTED.bam: Error processing chr 'chr7': Failed to process variants for MN01972:49:000H5KYMY:1:11102:26356:2968 … 150S ---> System.Exception: Failed to process variants for MN01972:49:000H5KYMY:1:11102:26356:2968 … 150S at Pisces.Domain.Logic.CandidateVariantFinder.ProcessCigarOps(Read alignment, String refChromosome, Int32 readStartPosition, String chromosomeName) at Pisces.Logic.SomaticVariantCaller.Execute() at Pisces.Processing.Logic.BaseGenomeProcessor.ProcessByBam(BamWorkRequest workRequest, String chrName) --- End…


Failed to process variants for MN01972:49:000H5KYMY:1:11102:26356:2968

I get this error from time to time when running Pisces (variant calling). Do you have any idea what may be causing it? System.Exception: RACP2-6poolv4_P5-A_FINAL_SORTED.bam: Error processing chr 'chr7': Failed to process variants for MN01972:49:000H5KYMY:1:11102:26356:2968 … 150S…


bcftools guess ploidy

Hi, I am trying to use bcftools guess-ploidy to check gender. This is how I tried to use it: bcftools view sample.vcf.gz -r chrX:2699521-154931043 | bcftools +guess-ploidy > guess_ploidy.output The VCF has data for around 50 samples from whole genome sequencing and covers all chromosomes. However…


count number of GC for a given genomic ranges

I have a GRanges object with start and end, but I want to count the number of GC bases in a stretch of DNA. However, I’m not able to properly subset the BSgenome to do that. Here’s my approach. > gr_pro GRanges object with 3 ranges and 2 metadata columns: seqnames ranges strand…


count number of GC in DNA region

I have a GRanges object with start and end, but I want to count the number of GC bases in a stretch of DNA. However, I’m not able to properly subset the BSgenome to do that. Here’s my approach. > gr_pro GRanges object with 3 ranges and 2 metadata columns: seqnames ranges strand…


Liftedover vcf header/contig compatibility

I have a collaborator who has lifted over their hg19 files to hg38 using CrossMap. The first step in the workflow they need to run is a simple bcftools filter for variant quality. They are getting an unknown file type error. Are there any obvious problems with this header that…


GATK BaseRecalibrator known-sites vcf file

Hi, I am trying to run GATK’s BaseRecalibrator on a BAM file created with the hg19 reference sequence downloaded from the UCSC website. For the --known-sites option I would like to use either a gnomAD .vcf file or a dbSNP .vcf, downloaded from their respective websites. The analysis works if I…


problems in hg19 and b37 compatibility

Hi everybody, a BAM file has been aligned using the hg19 reference genome. Thus, the chromosome notation is [chrM, chr1, chr2, chr3, chr4, …, chrX, chrY]. I want to look for PMs using MuTect, which requires VCF files from dbSNP and COSMIC as input. In these VCF files the chromosome notation is…


GATK Mutect2 Input files reference and features have incompatible contigs: No overlapping contigs found.

Hi, I am following the GATK best practices pipeline for variant calling, starting from targeted sequencing BAM and BAI files using the hg19 reference. When running GATK Mutect2 I got the following error: A USER ERROR has occurred: Input files reference and features have incompatible contigs: No overlapping contigs found. reference…


PLINK1.9 check sex problem

Hello! I use PLINK 1.9 with the --check-sex option to check sex in data with only one sample. I use --read-freq with a file containing frequencies of variants from chrX of 1000G (obtained with --freq). My problem is that for all samples I have F=-1 and SNPSEX=2….


Where can I download the length of short and long arms for each chromosome

Where can I download the length of the short and long arm for each chromosome? Thank you. Download the cytoband file from UCSC, and summarise using R: library(data.table) x <- fread("http://hgdownload.cse.ucsc.edu/goldenpath/hg19/database/cytoBand.txt.gz",…
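The answer above uses R; the same summary can be sketched in Python. The UCSC cytoBand table has the columns chrom, chromStart, chromEnd, name and gieStain, with band names beginning with "p" or "q", so summing band lengths per arm gives the arm lengths. The function name `arm_lengths` and the demo rows below are made up for illustration; real input would be the tab-separated rows of cytoBand.txt.

```python
from collections import defaultdict
from typing import Dict, Iterable, Sequence, Tuple


def arm_lengths(rows: Iterable[Sequence[str]]) -> Dict[str, Tuple[int, int]]:
    """Return {chrom: (p arm length, q arm length)} from cytoBand-style rows."""
    arms: Dict[str, Dict[str, int]] = defaultdict(lambda: {"p": 0, "q": 0})
    for chrom, start, end, band, _stain in rows:
        arm = band[:1]  # "p36.33" -> "p"; empty band names are skipped
        if arm in ("p", "q"):
            # Starts are 0-based and ends are exclusive, so end - start is the band length.
            arms[chrom][arm] += int(end) - int(start)
    return {chrom: (v["p"], v["q"]) for chrom, v in arms.items()}


if __name__ == "__main__":
    # Made-up bands, not real hg19 coordinates.
    demo = [
        "chrT\t0\t100\tp11.2\tgneg",
        "chrT\t100\t150\tp11.1\tacen",
        "chrT\t150\t200\tq11.1\tacen",
        "chrT\t200\t400\tq11.2\tgpos50",
    ]
    print(arm_lengths(line.split("\t") for line in demo))
```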


GATK CollectReadCounts UnsatisfiedLinkError hdf5

I’m trying to run GATK’s CollectReadCounts. It looks like it is running properly, but then it stops and gives the following error message: java.lang.UnsatisfiedLinkError: /tmp/libjhdf5.2.11.06741640809408344263.so: /tmp/libjhdf5.2.11.06741640809408344263.so: failed to map segment from shared object: Operation not permitted Can’t seem to find anyone with the same problem anywhere on…


object of type ‘NoneType’ has no len()

I run Truvari to benchmark 2 VCF files: truvari bench -b NA12878_S1.genome.vcf.gz -c b1.vcf.gz -o out. However, it gives the following error. The first VCF file, which I found on the Internet, contains format, info, filter, contig and maxdepth headers. The second VCF file is the output of…


How to determine the exact version of hg38 if I have only the FASTA file

I have a FASTA file which contains the hg38 assembly. It contains the primary contigs, alt contigs, decoy, HLA, mito. How do I determine the exact version of hg38 based on the FASTA? Here some…


How to add conditional annotation from other column of R dataframe in ggplot2

I’m trying to make a plot from a dataframe of over 300k rows where the peaks and valleys will be annotated from another column rather than the x and y. How can I do that? Dataframe: chr start end.x StoZ.x Hscore end.y StoZ.y Tier Gene 1 chr1 1 10000…


Answer: Estimate sizes of repeats in a specific gene

Tell me if I’m on the right track. I have the CRAM file and the respective CRAI (index). So I just ran samtools like this, clipping my area of interest: > $ samtools view -b NG1PSZ7BE9.mm2.sortdup.bqsr.cram "chrX:147912050-147912110" > result.bam Then I indexed the .bam file: > $ samtools index result.bam…


how can I generate a VCF (in hg38 coords) of differences between hg38 and CHM13?

I downloaded https://s3-us-west-2.amazonaws.com/human-pangenomics/pangenomes/freeze/freeze1/minigraph/hprc-v1.0-minigraph-grch38.gfa.gz which contains hg38, chm13, and other assemblies, and now am trying to use vg to generate a VCF with the variants in CHM13 relative to hg38. After converting to vg format, by running vg convert <(gunzip -c hprc-v1.0-minigraph-grch38.gfa.gz) > hprc-v1.0-minigraph-grch38.vg, I tried a few different variations of…


How to Diagnose Fragile X from Whole Genome Sequencing

Following this post. Perhaps you can indicate if I’m on the right track, although I’m a complete amateur now. We know that the FMR1 gene is located in: chrX:147,911,919-147,951,125 Its size is: 39,207 bases So I exactly “snipped” the above sequence, which is exactly 39,207 characters from a 30x Whole-Genome…


Annotate vcf file using GNOMAD

Hi, I use a loop for that. Something like this to inspire you:
# Enter folder where gnomAD data are here:
gnomAD="/path/to/gnomAD/database/release/3.1.2/gnomad.genomes.v3.1.2.sites."
# Enter the folder where your results are and will be annotated further
cd /path/to/your/results/folder/
# Enter the name of the final results' file from SnpSift
ann="results.ann.gnomAD.genomes.v3.1.2.vcf"
#…


How come on T2T human genome assembly the PAR regions of chrY are bigger than the corresponding regions on chrX?

As you can see on the PAR file listed here: https://s3-us-west-2.amazonaws.com/human-pangenomics/T2T/CHM13/assemblies/analysis_set/chm13v2.0_PAR.bed If you download the file you see the coordinates for PAR1 listed are: PAR1 chrX 0 2394410 chrY…


Double Alleles in gVCF file for ChrX, ChrY & ChrM

Hello, I understand that for men the X, Y, and mitochondrial chromosomes should be represented by a single allele, but I have a gVCF file that has 2 alleles (homozygous or heterozygous) for each position on these chromosomes, mostly with GT:…


Bed files of narrowPeak in ENCODE

I found that the p-value column is all -1 in the narrowPeak BED file I downloaded from ENCODE. According to the manual provided by ENCODE, that means the p-values of all peaks are unassigned. Does anyone know what that means? Thanks a…


error in Genome Mapping by BWA tools in Linux

$ gmap_build -D:\btau8refflat.gtf Unknown option: D:btau8refflat.gtf -k flag not specified, so building main hash table with default 15-mers -j flag not specified, so building regional hash tables with default 6-mers gmap_build: Builds a gmap database for a genome to be used by GMAP or GSNAP. Part of GMAP package, version…


Normalization of ChIP-seq results with deepTools

Hello everyone! I’m trying to visualize ChIP-seq data with deepTools. I used the following commands: bamCoverage (normalization) -> bigwigCompare (normalization against input) -> computeMatrix -> plotProfile As a result I got this figure. Green and yellow plots are for treatment sample replicates, blue…


unrecognized arguments using deepTools bamCoverage

Hello everyone! I’m trying to visualize ChIP-seq data with deepTools but I get errors when I use optional arguments for normalization. Commands like: bamCoverage -b file.bam -o file.bigWig -of bigwig --normalizeUsing RPGC --effectiveGenomeSize 2862010578 --ignoreForNormalization chrX chrY bamCompare -b1 file-1.bam -b2 file-2.bam -o files-1-2.bigWig -of…


Help in replicating LDSC heritability estimates

Hi, I am trying to replicate the heritability estimates based on the insomnia GWAS summary statistics using LDSC. However, I have encountered a problem as my estimates seem to be only about half of the original estimates listed in Table S1. Despite my efforts to locate the error, I have…


VEP-like tool for sequence ontology and HGVS annotation of VCF files

Mehari is a software package for annotating VCF files with variant effect/consequence. The program uses hgvs-rs for projecting genomic variants to transcripts and proteins and thus has high prediction quality. Other popular tools offering variant effect/consequence prediction include: Mehari offers predictions that aim to mirror VariantValidator, the gold standard for…


ChrX allele frequency in males and females

Hi, I have a question about the allele frequency output files (.frq) from VCFtools. My question is specific to chromosome X variants. I am writing out the allele frequencies for males and females separately to identify the variants with different allele frequencies….


deepTools multiBigwigSummary “Invalid interval bounds” error

I’m trying to bin 1x normalized ATAC-seq bigWig files (generated by the bamCoverage function) with the multiBigwigSummary function in deepTools, with the intention of clustering several ATAC-seq samples with deepTools’ plotCorrelation function. My bamCoverage commands look like this: > bamCoverage -b input.bam -o output.SeqDepthNorm.bw -p "max" --effectiveGenomeSize 2805636331 --normalizeUsing RPGC -ignore…


Converting dbSNP VCF to work with RefSeq chromosome ID

Hello everyone! I’ve been trying to use GATK with an updated version of the human genome, as the GATK files are outdated by ten years. I’ve downloaded the NCBI reference GCF_000001405.40.fna, which is GRCh38.p14. For the dbSNP version, I’ve downloaded GCF_000001405.40.gz, which is also GRCh38.p14. When extracting the contig names from my…


Bedtools multicov output file error

Hello, I ran bedtools multicov for 5 BAM files. The output file is corrupted after a certain point, i.e. from chr13 to chrM the output is generated as below: tail multicov.txt output – error 6500 chrX 140406100 140406600 chrX 140406200 140406700 chrX…


Fasta file and GTF file for STAR alignment

Hello there, the top-level FASTA file will include chromosomes, regions not assembled into chromosomes and N-padded haplotype/patch regions. See more here: ftp://ftp.ensembl.org/pub/release-92/fasta/mus_musculus/dna/README. If you are only looking for reference genome assembly chromosome-level sequences, then use the primary_assembly.fa file. The…


VCF input error using locateVariant command (VariantAnnotation)

Hi Everyone! I have a problem with using rDGIdb looking for possible drug candidates. I want to use the option of VCF input and I am following the code described in https://bioconductor.org/packages/release/bioc/vignettes/rDGIdb/inst/doc/vignette.pdf. However it always gives me an error in the end: library(VariantAnnotation) library(TxDb.Hsapiens.UCSC.hg19.knownGene) library(org.Hs.eg.db) txdb <- TxDb.Hsapiens.UCSC.hg19.knownGene vcf <-…


Systematic and benchmarking studies of pipelines for mammal WGBS data in the novel NGS platform | BMC Bioinformatics

Comparison of read level and improving the mapping efficiency according to trimming Since the generation of high-quality WGBS data ultimately impacts the quantification and interpretation of Cs methylation levels, it is indispensable to monitor the raw data quality and interrogate the appropriate pre-processing step to cleanse data [1]. To avoid…


getTable ignores query ranges

Hi everyone. I have a set of mouse SNPs (~974) from GRCm39 that I’m trying to get either GERP or UCSC Conservation scores on. To do this, I’m using rtracklayer to try to query the ranges of the SNP and return the multiz35way conservation score. However, when I do this,…


Datasets | TogoVar

Variant frequencies for which you can apply for use of individual-level data∗1 to the NBDC human databases∗2 Click the links at the Included controlled-access datasets to apply for use of individual-level data ∗1:fastq/bam/cel files and/or lists of genotype data etc.∗2:Japanese Genotype-phenotype Archive (JGA) / AMED Genome group sharing Database (AGD)…


Scatter Gather principle by chromosome on Gatk

Hi all, on a quest to optimize a GATK pipeline, I came across the scatter-gather principle, so I did the following: pids= for chr in chr1 chr2 chr3 chr4 chr5 chr6 chr7 chr8 chr9 chr10 chr11 chr12 chr13 chr14 chr15 chr16 chr17 chr18 chr19 chr20…
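The shell loop above starts one job per chromosome and collects the process IDs. The same scatter-gather pattern can be sketched in Python with a worker pool: run one task per chromosome, then gather the results in the original order. Everything named here is hypothetical (`scatter_gather`, `fake_call`, the output file names); a real worker would launch the per-chromosome command, for example through `subprocess.run`, and return the path of its output.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, Dict, Sequence


def scatter_gather(
    chromosomes: Sequence[str], worker: Callable[[str], str], max_workers: int = 4
) -> Dict[str, str]:
    """Run worker(chrom) for every chromosome in parallel and gather results in input order."""
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        # pool.map yields results in the order of the inputs, not completion order.
        return dict(zip(chromosomes, pool.map(worker, chromosomes)))


def fake_call(chrom: str) -> str:
    """Stand-in for a per-chromosome job; returns a made-up output file name."""
    return f"{chrom}.calls.vcf.gz"


if __name__ == "__main__":
    outputs = scatter_gather(["chr1", "chr2", "chrX"], fake_call)
    # Gather step: the per-chromosome outputs would be merged in this order.
    print(list(outputs.values()))
```

Threads are enough here because the heavy work would happen in external processes; `max_workers` caps how many run at once, which a bare shell loop with `&` does not.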


Help with PCA plot from SNPRelate

Hello, BioStars community! I am working on a dataset of 50 samples (split between cases and controls) with genotyping data for ~200 SNPs. I ran SNPRelate PCA analysis without adding population data and tried to plot its results. My PCA plot seems a bit “strange” since the observations did not come together…


Split multiallelic SNPs to biallelic from vcf

Dear all, I have a particular VCF file like this:
chrX 29 . G A,T . PASS AC=1,1;AN=3 GT:DP:HF:CILOW:CIUP:SDP 0/1/2:4839:0.003,0.001:0.002,0.0:0.005,0.003:14;0,4;2
I tried various tools to split this, but I get the following results, so the FORMAT and INFO fields are identical:
chrX 29 . G A . PASS AC=1,1;AN=3;OLD_MULTIALLELIC=chrM:899:G/A/T GT:DP:HF:CILOW:CIUP:SDP…


Tools for Normalizing and Comparing ChIP-seq Samples

data(H3K27Ac, package = "MAnorm2")
attr(H3K27Ac, "metaInfo")
## Make a comparison between GM12891 and GM12892 cell lines and create an MA
## plot on the comparison results.
# Perform MA normalization and construct bioConds to represent the two cell
# lines.
norm <- normalize(H3K27Ac, 5:6, 10:11)
norm <- normalize(norm, 7:8, 12:13)…


Produce PCA bi-plot for 1000 Genomes Phase III

Note1 – Previous version: Produce PCA bi-plot for 1000 Genomes Phase III in VCF format (old)
Note2 – this data is for hg19 / GRCh37
Note3 – GRCh38 data is available HERE
The tutorial has been updated based on the 1000 Genomes Phase III imputed genotypes. The original tutorial was…


MAPQ (Mapping quality) of 0 for most reads from BWA-MEM2 (with no secondary alignment or other apparent reason)

Hello, I got a very weird output from BWA-mem2 – most of the reads have mapping quality of 0, even though there is no secondary alignment or anything else suspicious. I got sequencing data that was aligned with Novoalign to hg18, the data was bam files. I needed to realign…


Transcription Factor Functions

I have been wondering if there is a database that contains information about whether a transcription factor has a repressive or an activating or a context-dependent function. If you performed a whole-genome FIMO scan, selecting those hits that match your…


Fasta.fai file error

Hi, I have been struggling with an error in bedtools intersect. The command I am trying to run is as follows: bedtools intersect -a sorted.vcf -b nstd166.GRCh38.variant_call_chr.vcf.gz -wo -sorted -f 0.8 -r -g Homo_sapiens_assembly38.fasta.fai For some of the files that I am assessing, I don’t get…


What is the difference between GRCh37 and hs37? And hg19?

This is what I have found so far. Please correct me if I am wrong. GRCh37 w/o patches includes the primary assembly (22 autosomal, X, Y, and non-chromosomal supercontigs) and alternate scaffolds, but not a reference mitogenome. Non-chromosomal supercontigs are the unlocalized and unplaced scaffolds. The rCRS reference mitogenome in…


To find total genes on favorable chromosome

How can I find how many genes exist on each chromosome?
Counting from GENCODE for the vM27 mouse reference:
wget http://ftp.ebi.ac.uk/pub/databases/gencode/Gencode_mouse/release_M27/gencode.vM27.annotation.gtf.gz
#/ In R:
library(rtracklayer)
gtf <- rtracklayer::import("~/gencode.vM27.annotation.gtf.gz")
table(as.character(seqnames(gtf[gtf$type=="gene"])))
chr1 chr10 chr11 chr12 chr13 chr14 chr15 chr16…


Platypus

Hi, I’m super new to WGS and bioinformatics, but I’m a classic software data scientist, so I know enough to be annoying. I’m using Platypus to call variants on 100X WGS via Nebula Genomics. I found an odd series of calls and am not sure if this is…


gender determination and chrX CN calls

I’m running CNVKit in amplicon mode on a set of tumor BAM files generated with a small amplicon panel of 45 genes. The panel includes just one gene on chrX, and none on chrY. My reference is generated by 10 normal male samples…


Default CNV call thresholds for haplotype chromosomes

Hi, I am confused about CNV calling. The default thresholds are -1.1 => 0, -0.25 => 1, 0.2 => 2, 0.7 => 3 for discrete copy number. But these thresholds don’t work for chrY and chrX. What is…
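To make the quoted thresholds concrete, the sketch below maps a log2 ratio to a discrete copy number by finding the first threshold the value does not exceed. It only illustrates the four cut-offs given in the post; the function name is hypothetical, treating each boundary as inclusive is an assumption, and a return value of 4 stands for "4 or more" here.

```python
from bisect import bisect_left
from typing import Sequence

# Cut-offs quoted in the post: log2 <= -1.1 -> 0, <= -0.25 -> 1, <= 0.2 -> 2, <= 0.7 -> 3.
THRESHOLDS = (-1.1, -0.25, 0.2, 0.7)


def copy_number_from_log2(log2_ratio: float, thresholds: Sequence[float] = THRESHOLDS) -> int:
    """Return the index of the first threshold >= log2_ratio (4 means above all thresholds)."""
    return bisect_left(thresholds, log2_ratio)


if __name__ == "__main__":
    for value in (-1.5, -1.0, 0.0, 0.5, 1.2):
        print(value, copy_number_from_log2(value))
```

This shows one likely source of the mismatch: the cut-offs are centred on log2 = 0 meaning two copies. A single-copy chrX or chrY measured against a reference with the same single copy also sits near log2 = 0 and would be labelled 2, so haploid chromosomes probably need either shifted thresholds or a ploidy-aware adjustment.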
