Tag: hg38

ZP77 – YFull YTree Info

SNPs currently defining R-ZP77: ZP77 / FGC6562

Columns: Sample ID, Country / Language, Info, Ref, File, Testing company, Statistics, Status
YF008362 —— R-ZP77* —— Hg19 .BAM FTDNA (Y500) 41X, 13.8 Mbp, 165 bp
YF067652 Unknown R-BY40744 —— Hg38 .BAM FTDNA (Y700) 36X, 18.7 Mbp, 151…

Download full list of SNPs and their coordinates in hg38

What is the best / standard place to get a full list of SNPs and their coordinates in hg38? I downloaded the SNPsnap database, but just realized that those coordinates are in hg19. I’m trying to figure out how…

htseq-count -t gene not working

I found a small problem. When I set "-t gene", the read is marked "__no_feature", but when I set "-t exon", the read is marked "ENSG00000276104". The gene ENSG00000276104 is a single-exon gene, so I don’t know why this happens. Read: "TGTCTGTGGCGGTGGGATCCCGCGGCCGTGTTTTCCTGGTGGCCCGGCCGTGCCTGAGGTTTCTCCCCGAGCCGCCGCCTCTGCGGGCTCCCGGGTGCCCTTGCCCTCGCGGTCCCCGGCCCTCGCCCGTCTGTGCCCTCTTCCCCGCCCGCCGATCCTCTTCTTCCCCCCGAGCGGCTCACCGGCTTCACGTCCGTTGGTGGCCCCGCCTGGGAC". I had aligned to hg38 by…

Bioconductor – ChIPQC

This package is for version 3.1 of Bioconductor; for the stable, up-to-date release version, see ChIPQC. Quality metrics for ChIPseq data. Bioconductor version: 3.1. Author: Tom Carroll, Wei Liu, Ines de Santiago, Rory Stark. Maintainer: Tom Carroll <tc.infomatics at gmail.com>, Rory Stark <rory.stark…

hg38 Import custom reference upload error

Our version of TS is 5.12.2. When trying to upload a new custom reference FASTA (downloaded from the NCBI FTP site, ftp.ncbi.nlm.nih.gov/genomes/all/GCA/000/001/405/GCA_000001405.15_GRCh38/seqs_for_alignment_pipelines.ucsc_ids/GCA_000001405.15_GRCh38_no_alt_analysis_set.fna.gz, gunzipped and renamed to hg38.fasta) through "Import custom reference" in the interface, an error occurs: "uploaded file size is incorrect" (to be honest, the error was not shown in the logs because of a TypeError…

help with CrossMap

Hello all, I would really appreciate your help, as I am new to working with different file builds and have hit a setback lifting a VCF file from build hg38 to hg19. In essence, when using CrossMap the chromosome value gets altered. For example, below is the…

Systems biology analysis of human genomes points to key pathways conferring spina bifida risk

Significance Genetic investigations of most structural birth defects, including spina bifida (SB), congenital heart disease, and craniofacial anomalies, have been underpowered for genome-wide association studies because of their rarity, genetic heterogeneity, incomplete penetrance, and environmental influences. Our systems biology strategy to investigate SB predisposition controls for population stratification and avoids…

Padding out a GVCF file with 1000G exomes to get gatk VariantRecalibrator working with a small sample

I’ve got sequencing data for a small 500 bp amplicon from a few samples. GATK best principles suggest running VariantRecalibrator on the GVCF files I generate. I’m trying to get this working, but I get an error about “Found annotations with zero variances”. Reading the gatk manual and other posts…

computeMatrix in deeptool is Running with no result

Hi all, I wonder if someone can help me by explaining what to pass to the -R <bed file> argument of the command below? computeMatrix scale-regions -S <bigwig file(s)> -R <bed file> -b 1000 What I did, for example: I download…

NoClassDefFoundError: htsjdk/samtools/util/IntervalTree

When I run the circm6A (github.com/canceromics/circm6a) example code:

cd ../..
java -Xmx16g -jar circm6a.jar -ip test_data/HeLa_eluate_rep_1.chr22.bam -input test_data/HeLa_input_rep_1.chr22.bam -r test_data/gencode_chr22.gtf -g test_data/hg38_chr22.fa -o test_data/example_Hela

The following error occurred:

Start at 2021-12-12 16:33:26
Exception in thread "main" java.lang.NoClassDefFoundError: htsjdk/samtools/util/IntervalTree
at main.Method.loadGenes(Method.java:200)
at main.Method.run(Method.java:66)
at main.Main.main(Main.java:9)
Caused by: java.lang.ClassNotFoundException: htsjdk.samtools.util.IntervalTree…

transcripts are not true in TxDb.Hsapiens.UCSC.hg38.knownGene

Hello, I used TxDb.Hsapiens.UCSC.hg38.knownGene/GenomicFeatures to retrieve gene promoters and other genomic features. Here is the code:

library('TxDb.Hsapiens.UCSC.hg38.knownGene')
txdb <- TxDb.Hsapiens.UCSC.hg38.knownGene
PR <- promoters(txdb, upstream=2000, downstream=0)

But when I take a look at the PR results: it…
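
As a rough illustration of what a call such as promoters(txdb, upstream=2000, downstream=0) is expected to compute, the Python sketch below derives a strand-aware promoter interval from a transcript's start and end. The function name `promoter_interval` is hypothetical, and the 1-based inclusive coordinate convention is an assumption meant to mirror how GenomicRanges documents promoters(); it is not the package's code.

```python
def promoter_interval(tx_start, tx_end, strand, upstream=2000, downstream=0):
    """Return (start, end) of the promoter, 1-based inclusive (assumed convention).

    The transcription start site (TSS) is tx_start on the '+' strand and
    tx_end on the '-' strand. The interval covers `upstream` bases before
    the TSS and `downstream` bases starting at the TSS, in transcript direction.
    """
    if strand == "+":
        tss = tx_start
        return tss - upstream, tss + downstream - 1
    if strand == "-":
        tss = tx_end
        return tss - downstream + 1, tss + upstream
    raise ValueError(f"unsupported strand: {strand!r}")


if __name__ == "__main__":
    # Hypothetical transcript spanning 10000..15000 on each strand
    print(promoter_interval(10000, 15000, "+"))
    print(promoter_interval(10000, 15000, "-"))
```

Note that with downstream=0 the interval stops just before the TSS, so the TSS base itself is not included.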

gatk VariantRecalibrator positional argument error

I’m trying to recalibrate my VCF using gatk VariantRecalibrator, but keep getting the error "Illegal argument value: Positional arguments were provided". I don’t know what this means or how to correct it! Here’s my call: gatk VariantRecalibrator -R "/Volumes/Seagate Expansion Drive/refs/hg38/gatk download/Homo_sapiens_assembly38.fasta" -V "$OUT"/results/variants/"$SN".norm.vcf.gz -AS --resource hapmap,known=false,training=true,truth=true,prior=15.0: "/Volumes/Seagate…

What is the single nucleotide polymorphism database ( dbsnp )?

The Single Nucleotide Polymorphism Database (dbSNP) is a free public archive for genetic variation within and across different species developed and hosted by the National Center for Biotechnology Information (NCBI) in collaboration with the National Human Genome Research Institute (NHGRI). Furthermore, are there any databases for single nucleotide polymorphisms? As there…

The Biostar Herald for Tuesday, September 21, 2021

The Biostar Herald publishes user submitted links of bioinformatics relevance. It aims to provide a summary of interesting and relevant information you may have missed. You too can submit links here. This edition of the Herald was brought to you by contribution from Istvan Albert, and was edited by Istvan…

I can’t get a dosage file using PLINK

Hi, I have been trying to get a dosage file from VCF, map and fam files. For that, I have written this bash script: plink --fam plink.fam --map plink.map --dosage one.vcf --write-dosage However, I got this error: --dosage: Reading from one.vcf. Error: Line 1 of one.vcf has fewer tokens…

What is the codification in genestrand 1 and 2?

Hi there, I’m doing some peak annotation using ChIPseeker:

library(ChIPseeker)
library(TxDb.Hsapiens.UCSC.hg38.knownGene)
library(clusterProfiler)
library(annotables)
library(org.Hs.eg.db)
txdb <- TxDb.Hsapiens.UCSC.hg38.knownGene
peaks= readPeakFile("peaks_", header = F)
peakAnno <- annotatePeak(peaks, tssRegion=c(-3000, 3000), TxDb=txdb, annoDb="org.Hs.eg.db")
peaks_annot <- as.data.frame(peakAnno)

In my annotation file, "geneStrand" is codified as…

Best tools for calling structural variants from 2 assemblies?

Dear community, I have the FASTA files of 2 assemblies of the human genome (for example hg19 and hg38). What would be the best tools to call structural variants from these 2 FASTA files? Most of the tools I know…

python – snakemake multiple parameters for multiple input and single output in snakemake. CombineGVCFs gatk problem

I have written a rule for CombineGVCFs in gatk4. The rule is as follows:

all_gvcf = get_all_gvcf_list()
rule cohort:
    input:
        all_gvcf_list = all_gvcf,
        ref="/data/refgenome/hg38.fa",
        interval_list = prefix+"/bedfiles/hg38.interval_list",
    params:
        extra = "--variant",
    output:
        prefix+"/vcf/cohort.g.vcf",
    shell:
        "gatk CombineGVCFs -R {input.ref} {params.extra} {input.all_gvcf_list} -O {output} --tmp-dir=/data/tmp -L {input.interval_list}"

all_gvcf is the dataset for…
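
The excerpt is cut off before the actual problem is stated, so the following is only a guess at the likely snag: when a list of inputs is expanded in a shell command, a single --variant flag ends up followed by every path, whereas each GVCF needs its own flag. A minimal Python sketch of building one flag per file is shown below; the helper name `variant_args` and the example paths are hypothetical.

```python
import shlex


def variant_args(gvcf_paths, flag="--variant"):
    """Build 'flag path' pairs, one per GVCF, joined into a single string.

    Each path is shell-quoted so that paths containing spaces survive
    being pasted into a shell command.
    """
    return " ".join(f"{flag} {shlex.quote(path)}" for path in gvcf_paths)


if __name__ == "__main__":
    # Hypothetical per-sample GVCF paths
    print(variant_args(["s1.g.vcf", "s2.g.vcf", "s3.g.vcf"]))
```

Because Snakemake files are Python, a helper along these lines could plausibly be called from a params function in the rule, though that wiring is an assumption and is not taken from the post.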

Alternate nucleotide is more frequent than reference nucleotide. OMG I’m dizzy. How do I stop the twirl?

This is due to the fact that the very reference genomes that we use for re-alignment are themselves based on individuals who carry rare risk alleles. Thus, when we call variants against these genomes, we are, at many loci, comparing against rare disease risk alleles. As the best/worst example (depending…

mixing hg38 and GRCh38 during variant calling

Hello everyone! I’ve been working on a variant calling pipeline for WES data and used a mix of hg38 and GRCh38 reference files after reading that hg38 is just an abbreviation of GRCh38, and that they refer to the same thing. But…

SNP exon region UCSC

How can I get SNPs in only the exon regions of the genome with UCSC? UCSC returns all the SNPs in the gene region, and there is no filter option to get only the exon regions. Thanks.

ZhaozzReal/SNV_IPA: Detect SNV-associated intronic polyadenylation events from standard RNAseq data

Description: Somatic single nucleotide variants (SNVs) in the cancer genome affect gene expression through various mechanisms depending on their genomic location. In this study, we found that somatic SNVs near splice sites are associated with abnormal intronic polyadenylation (IPA). Here we give examples to show how to detect SNV-associated IPA…

Where can I get, or how can I make, a mappability track for the hg38 assembly?

Lucky you, @manojmumar_bhosale: I worked on a similar problem recently and therefore have a bash script you can use. Required tools: GEM library from here, UCSC’s wigToBigWig from here (I chose the binary for Linux 64…

How to load user-defined genome in IGV-webapp

I would like to create a session in IGV-webapp using an HTML file. The following works with pre-defined genomes (e.g. genome: "hg38"), but I would like to load my own genome. Is there a way to achieve this? <!DOCTYPE html> <html lang="en">…

UCSC knownCanonical hg19 vs. hg38

Hello, We have an FAQ page that covers this topic (genome.ucsc.edu/FAQ/FAQgenes.html#singledownload). As posted by ATpoint, it boils down to different datasets and different approaches. hg19 knownCanonical was last updated in 2013 and built primarily from RefSeq and GenBank sequences and a few other sources. One isoform was identified from each…

Get rsID for a list of SNPs in an entire GWAS sumstats file

Here is a fairly efficient way to do this, assuming hg38, and with BEDOPS and standard Unix tools installed.

$ bedmap --echo --echo-map-id --delim '\t' <(awk '{n=split($0,a,/[:_]/); print "chr"a[1]"\t"a[2]"\t"a[2]+1"\t"a[3]"/"a[4];}' sumstats.txt | sort-bed -) <(wget -qO- hgdownload.cse.ucsc.edu/goldenPath/hg38/database/snp150.txt.gz | gunzip -c | cut -f2-5 | sort-bed -) > answer.bed

This gets around making…
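
For readers less familiar with awk, the splitting step in the command can be sketched in Python. This assumes the sumstats identifiers look like "1:12345_A_G" (chromosome, position, two alleles), which is inferred from the split pattern /[:_]/ and not stated in the excerpt; the function name `sumstats_id_to_bed` is hypothetical, and joining the two alleles with a slash is likewise an assumption about the intended output.

```python
import re


def sumstats_id_to_bed(variant_id):
    """Turn 'chrom:pos_ref_alt' into a tab-separated BED-like line.

    Mirrors the awk step: split on ':' or '_', prefix the chromosome with
    'chr', and emit the interval pos .. pos+1 with 'ref/alt' as the ID column.
    """
    chrom, pos, ref, alt = re.split(r"[:_]", variant_id.strip())[:4]
    pos = int(pos)
    return f"chr{chrom}\t{pos}\t{pos + 1}\t{ref}/{alt}"


if __name__ == "__main__":
    # Hypothetical identifier in the assumed layout
    print(sumstats_id_to_bed("1:12345_A_G"))
```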

UCSC liftover

Hi, I’m using UCSC liftOver to convert hg19 to hg38, and I got a result that I don’t understand. Feb. 2009 (GRCh37/hg19) → Dec. 2013 (GRCh38/hg38): chr1:120904787 → chr1:143905854. Dec. 2013 (GRCh38/hg38) → Feb. 2009 (GRCh37/hg19): chr1:143905854 → chr1:149400430. (I didn’t check "Allow multiple output regions".)…

Paired-end reads reported without mates: how to play matchmaker?

Hi Everyone, I am currently looking at Acute Myeloid Leukemia (AML) paired-end WGS samples from the TARGET data ocg.cancer.gov/programs/target/target-methods#3241. A bioinformatician in our group remapped the samples from hg19 to hg38. Unfortunately, we do not have any copies of the hg19 version anymore. However, when I try to run anything…

Coverage drops in fastq alignment against custom Immunoglobulin reference

I am working on HiSeq 2000/2500 single-end reads from RNA-Seq leukemia samples. I am interested in aligning all the reads belonging to the immunoglobulin (Ig) genes for further analysis. The task is difficult for two main reasons: Final Ig genes…

vcf file analysis

Hello everyone, I have 22 VCF files, one per chromosome. They were in genome build hg19, so I did a liftover and converted them to the hg38 genome build. Now I need just the chrom and position values from these VCF files, and to merge them together into a…

Bioconductor – BSgenome.Hsapiens.UCSC.hg38.dbSNP151.major

DOI: 10.18129/B9.bioc.BSgenome.Hsapiens.UCSC.hg38.dbSNP151.major
Full genome sequences for Homo sapiens (UCSC version hg38, based on GRCh38.p12) with injected major alleles (dbSNP151). Bioconductor version: Release (3.13). Full genome sequences for Homo sapiens (Human) as provided by UCSC (hg38, based on GRCh38.p12) with major allele injected from dbSNP151, and stored in Biostrings…

tool or database to convert Gene ID to genomic position

Hello. I have lots of pseudogene IDs like LOC100431174, but none of the methods below worked for me to find their genomic position "offline". I need a table or package to do it offline, without querying a webpage. Methods I…

unable to find chromosome in SAM header

I am using featureCounts to try to create a count table for some RNA-Seq data I collected using an Oxford Nanopore platform. I have .sam files aligned with minimap2, and am running the following command to try to get a count…

miRNAseq analysis not shown adapter sequence and huge N’s content

Hi there, this is my third time doing miRNA sequencing analysis, so I do not have much experience with this… I have 18 human semen samples (I also have no experience with this type of sample). I have been reading a lot…

Predicting and characterizing a cancer dependency map of tumors with deep learning

INTRODUCTION The development of novel cancer therapies requires knowledge of specific biological pathways to target individual tumors and eradicate cancer cells. Toward this goal, the landscape of genetic vulnerabilities of cancer, or the cancer dependency map, is being systematically profiled. Using RNA interference (RNAi) loss-of-function screens, Marcotte et al. (1),…

liftover using genome browser

Hello everyone, I have a file in the hg38 build. I want to do a liftover to change it to hg19. I thought of using the liftOver tool from the UCSC Genome Browser. I realise that the input file should be in BED format. My file has only…
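
The excerpt is truncated before it says what the file contains. Assuming it holds a chromosome name and a 1-based position per variant (an assumption), one way to produce BED lines is sketched below. BED intervals are 0-based and half-open, so a single 1-based position p becomes the interval p-1 to p. The function name `position_to_bed` is hypothetical.

```python
def position_to_bed(chrom, pos):
    """Convert a 1-based position into a 3-column, tab-separated BED line.

    BED uses 0-based, half-open intervals, so position p maps to [p-1, p).
    """
    pos = int(pos)
    if pos < 1:
        raise ValueError("positions are expected to be 1-based (>= 1)")
    return f"{chrom}\t{pos - 1}\t{pos}"


if __name__ == "__main__":
    # Hypothetical input rows: (chromosome, 1-based position)
    for chrom, pos in [("chr1", 1000), ("chr2", 2500)]:
        print(position_to_bed(chrom, pos))
```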

VariantRecalibrator no positional argument is defined for this tool.

Hi, I am trying to run the following command: gatk VariantRecalibrator -R genome.fa -V all.Sample.SNP.vcf.gz --trust-all-polymorphic -tranche 100.0 -tranche 99.95 -tranche 99.9 -tranche 99.8 -tranche 99.6 -tranche 99.5 -tranche 99.4 -tranche 99.3 -tranche 99.0 -tranche 98.0 -tranche 97.0 -tranche 90.0 -an MQRankSum -an ReadPosRankSum -an FS -an MQ -an SOR…

Get chromosome sizes from fasta file

Hello, I’m wondering whether there is a program that can calculate chromosome sizes from any FASTA file? The idea is to generate a tab-delimited file like the one expected by bedtools genomecov, for example. I know there’s the fetchChromSizes program from UCSC, but…
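
One possible approach, sketched in plain Python with no external dependencies, is to sum the sequence line lengths under each FASTA header and write "name<TAB>length" rows. The function names `fasta_sizes` and `write_sizes` are hypothetical, and the sketch assumes an uncompressed FASTA whose record name is the first whitespace-delimited token of the header.

```python
def fasta_sizes(lines):
    """Return {sequence_name: length} from an iterable of FASTA lines.

    The name is the first whitespace-delimited token after '>'.
    Insertion order of the dict follows the order of records in the file.
    """
    sizes = {}
    name = None
    for line in lines:
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            fields = line[1:].split()
            name = fields[0] if fields else ""
            sizes[name] = 0
        elif name is not None:
            sizes[name] += len(line)
    return sizes


def write_sizes(fasta_path, out_path):
    """Write a two-column, tab-separated chromosome sizes file."""
    with open(fasta_path) as fasta:
        sizes = fasta_sizes(fasta)
    with open(out_path, "w") as out:
        for name, length in sizes.items():
            out.write(f"{name}\t{length}\n")


if __name__ == "__main__":
    demo = [">chr1 example", "ACGT", "AC", ">chr2", "GGG"]
    print(fasta_sizes(demo))
```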

Contig chr1 given as location, but this contig isn’t present in the Fasta sequence dictionary

Badly formed genome unclippedLoc: Contig chr1 given as location, but this contig isn’t present in the Fasta sequence dictionary. Hi everyone, I’m trying to run Mutect2 on WES cancer data. However, since their resource bundle only supports hg19, it seems I cannot proceed (I want to compare it with Strelka2…
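
Errors of this kind often come down to one file naming contigs "chr1" while the reference dictionary uses "1" (or the reverse), although the truncated excerpt does not confirm that this is the cause here. A small Python sketch for spotting such mismatches between two lists of contig names follows; the function name `contig_mismatches` is hypothetical, and only the "chr" prefix difference is checked.

```python
def contig_mismatches(query_contigs, reference_contigs):
    """Report query contigs that are missing from the reference.

    Returns {missing_name: suggested_name_or_None}, where the suggestion is
    the same name with the 'chr' prefix added or removed, if the reference
    contains that variant.
    """
    reference = set(reference_contigs)
    report = {}
    for name in query_contigs:
        if name in reference:
            continue
        alternative = name[3:] if name.startswith("chr") else "chr" + name
        report[name] = alternative if alternative in reference else None
    return report


if __name__ == "__main__":
    # Hypothetical contig lists, e.g. from a BAM header and a FASTA .dict
    print(contig_mismatches(["chr1", "chrX", "chrUn"], ["1", "X", "MT"]))
```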

Using MACS2 parameters

Trying to reproduce a Galaxy training in the Linux CLI. I’ve come up with the following commands for the peak calling with MACS2. Am I on the right track? The Galaxy parameters are: The macs2 command can be: macs2 callpeak -t input_file.bed -n macs_output -g 50818468 --nomodel --shift…

Non-repeat human genome dataset

Could anyone please point me to where I could find a dataset of non-repeat sequences for the human reference genome? I’m not sure if it’s still regarded as true, but I saw that possibly 2/3 of the human genome contains repeats. Is there a place…

VCF file phasing by SHAPEIT

Hi everybody, I would like to phase (just phasing, not imputation) a VCF file containing about 1100 individuals (a given human population) derived from whole genome sequencing; the VCF file was obtained with GATK. From what I found, SHAPEIT is the most commonly used tool; according to its manual, it requires a genetic map for phasing, however,…

Finding 16 mer not present in GRCh38

Thanks for the question – it has kept me busy this Sunday morning / afternoon. As implied by others, this poses a computational challenge but is not insurmountable. For motif searching generally, I usually use AWK. My approach here was to: generate all possible k-mers of the chosen size (run…
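
The "generate all possible k-mers, then look for the ones that never occur" idea can be sketched in Python for small k. This is only an illustration and not the AWK approach from the answer: enumerating all 4^16 (about 4.3 billion) 16-mers this way would not be practical in memory or time, and the sketch looks at one strand only. The function name `absent_kmers` is hypothetical.

```python
from itertools import product


def absent_kmers(sequence, k):
    """List every k-mer over A/C/G/T that does not occur in `sequence`.

    Only the given strand is scanned; reverse complements are not considered.
    Results are in lexicographic order of the alphabet 'ACGT'.
    """
    sequence = sequence.upper()
    seen = {sequence[i:i + k] for i in range(len(sequence) - k + 1)}
    all_kmers = ("".join(letters) for letters in product("ACGT", repeat=k))
    return [kmer for kmer in all_kmers if kmer not in seen]


if __name__ == "__main__":
    # Toy sequence; a real genome would need a streaming, memory-aware method
    print(absent_kmers("AACCGGTT", 2))
```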

question about running CIRI-full

I’m using CIRI-full to calculate the full-length sequence of circRNAs, and I can run the test data set successfully, but I can’t run my own data. Running the test data set: java -jar ../CIRI-full.jar Pipeline -1 test_1.fq.gz -2 test_2.fq.gz -a test_anno.gtf -r test_ref.fa -d test_output/…

VCF to 23andMe format and changing Ensembl reference: help needed for understanding VCF

Hello, I am trying to convert my Nebula Genomics report to 23andMe format. I have two problems: Nebula uses the human build 38 Ensembl reference and 23andMe uses build 37. I was thinking of writing a Python script, but I have some doubts: My plan was to change the genotype according…
