Tag: hg19

Download full list of SNPs and their coordinates in hg38

What is the best / standard place to get a full list of SNPs and their coordinates in hg38? I downloaded the SNPsnap database, but just realized that those coordinates are in hg19. I’m trying to figure out how…

Bioconductor – RiboCrypt

DOI: 10.18129/B9.bioc.RiboCrypt     Interactive visualization in genomics Bioconductor version: Release (3.14) R Package for interactive visualization and browsing NGS data. It contains a browser for both transcript and genomic coordinate view. In addition a QC and general metaplots are included, among others differential translation plots and gene expression plots….

Bwa on multiple processor

Hi Guys, When I am trying to run bwa mem on multiple processor, I am getting error as : > mpirun -np 16 bwa mem hg19-agilent.fasta R1.fastq R2.fastq | samtools sort -o aln.bam [M::bwa_idx_load_from_disk] read 0 ALT contigs [M::bwa_idx_load_from_disk] read 0 ALT contigs [M::bwa_idx_load_from_disk] read 0 ALT contigs [M::bwa_idx_load_from_disk] read…

Bioconductor – derfinder (development version)

DOI: 10.18129/B9.bioc.derfinder     This is the development version of derfinder; for the stable release version, see derfinder. Annotation-agnostic differential expression analysis of RNA-seq data at base-pair resolution via the DER Finder approach Bioconductor version: Development (3.15) This package provides functions for annotation-agnostic differential expression analysis of RNA-seq data. Two…

Alignment report

Hi Guys, I did alignment of R1 and R2 fastq files with reference genome using bwa mem and got bam file. Now, I want to check whether the alignment is done correctly and alignment percentage, coverage etc. I run following command: bwa mem hg19.fasta R1.fastq R2.fastq | samtools…

Bioconductor – ChIPQC

    This package is for version 3.1 of Bioconductor; for the stable, up-to-date release version, see ChIPQC. Quality metrics for ChIPseq data Bioconductor version: 3.1 Quality metrics for ChIPseq data Author: Tom Carroll, Wei Liu, Ines de Santiago, Rory Stark Maintainer: Tom Carroll <tc.infomatics at gmail.com>, Rory Stark <rory.stark…

identical(current_classes, .UCSC_TXCOL2CLASS) is not TRUE

@mikhail-dozmorov-23744: Hi, the GenomicFeatures::makeTxDbFromUCSC function fails with: library(GenomicFeatures) > hg19.refseq.db <- makeTxDbFromUCSC(genome="hg19", table="refGene") Download the refGene table … Error in .fetch_UCSC_txtable(genome(session), tablename, transcript_ids = transcript_ids) : identical(current_classes, .UCSC_TXCOL2CLASS) is not TRUE OK The…

QIAGEN Bioinformatics Manuals

The Reference Data Manager The QIAGEN Sets Reference Data Library tab gives access to the reference data used with the CLC Haplotype Calling plugin ready-to-use workflow. From the wizard you can download and configure the reference data. For the full documentation relating to QIAGEN Sets, please see the QIAGEN Sets…

help with CrossMap

Hello all, I would really appreciate your help as I am new to working with different file builds and having a setback lifting a vcf file from build hg38 to hg19. In essence, using CrossMap the chromosome value gets altered. Like for example, below is the…

From where to get a comprehensive list of genes with gene start, gene end and chromosome for build 37?

Hi all, I am trying to annotate list of genes with gene start, gene end (build37) and chromosome. I mapped most of the genes from a list downloaded from Biomart/UCSC,…

Bioconductor – ProteoDisco

DOI: 10.18129/B9.bioc.ProteoDisco     Generation of customized protein variant databases from genomic variants, splice-junctions and manual sequences Bioconductor version: Release (3.14) ProteoDisco is an R package to facilitate proteogenomics studies. It houses functions to create customized (mutant) protein databases based on user-submitted genomic variants, splice-junctions, fusion genes and manual transcript…

How to convert bedgraph file with bins into GRanges object?

You could convert your bedGraph bins from hg18 to hg19 using liftover, so you can overlap them with your peaks. You would read them into a GRanges object, then hand this to the liftover function to translate from hg18 to hg19, then unlist the results to get back a regular…

Why single cell R2 fastq have no read identified by bowtie2 ?

When we input R2 fastq.gz into bowtie2, human sequence was not removed (${base}_host_removed is zero). for i in $(find ./ -type f -name ".fastq.gz" | while read F; do basename $F | rev | cut -c…

Generating Multiple Species Alignment Of Novel Transcripts For Phylocsf

Short version: How would you go about generating multiple species alignments of novel transcripts from bos taurus (assembly UMD3.1) with human/mouse/dog for use with PhyloCSF? Context and what I’ve tried so far: Through a sequencing experiment, our lab has identified a large set of new transcripts in Bos taurus. We…

How to call LOH with FreeC

Good morning, I am trying to infer loss of heterozygosity (LOH) from WGS data using Freec. For this purpose, I am using these parameters in the “[BAF]” section of the configuration file: [BAF] makePileup = My_somaticVCF.vcf.gz fastaFile = hg19.fa SNPfile = hg19_snp142.SingleDiNucl.1based.txt.gz When…

What is the single nucleotide polymorphism database ( dbsnp )?

The Single Nucleotide Polymorphism Database (dbSNP) is a free public archive for genetic variation within and across different species developed and hosted by the National Center for Biotechnology Information (NCBI) in collaboration with the National Human Genome Research Institute (NHGRI). Furthermore, are there any databases for single nucleotide polymorphisms? As there…

SNP2TFBS

Viewing variants that affect TF binding – Results
SNP identifier: rs1800629 (dbSNP); Chrom id (Feb 2009 GRCh37/hg19): NC_000006.11 (chr6); SNP position: 31543031; NB. of TF factors: 1
Columns: TF name, PWM score on Ref, PWM score on Alt, Score difference, Low Score Thr, High Score Thr
MZF1_1-4, 1024, …

Bioconductor – Rariant

    This package is for version 3.0 of Bioconductor; for the stable, up-to-date release version, see Rariant. Identification and Assessment of Single Nucleotide Variants through Shifts in Non-Consensus Base Call Frequencies Bioconductor version: 3.0 The ‘Rariant’ package identifies single nucleotide variants from sequencing data based on the difference of…

Bioconductor – BSgenome.Hsapiens.UCSC.hg19

    This package is for version 3.2 of Bioconductor; for the stable, up-to-date release version, see BSgenome.Hsapiens.UCSC.hg19. Full genome sequences for Homo sapiens (UCSC version hg19) Bioconductor version: 3.2 Full genome sequences for Homo sapiens (Human) as provided by UCSC (hg19, Feb. 2009) and stored in Biostrings objects. Author:…

‘Deprecated’ Error with ngs.plot.r after sys admin update Bioconductor

Loading R libraries…..Done Configuring variables… Using database: /home/yensin/software/ngsplot/database/hg19/hg19.ensembl.genebody.protein_coding.RData Done Analyze bam files and calculate coverage Warning message: ‘isNotPrimaryRead’ is deprecated. Use ‘isSecondaryAlignment’ instead. See help("Deprecated") ………………………………………………………………………………………………………………………………………………………………………………….Done Plotting figures… Error in seq.default(min.e, max.e, length.out = ncolor + 1) : ‘from’ cannot be NA, NaN or infinite Calls: plotheat -> ColorBreaks -> seq ->…

Best tools for calling structural variants from 2 assemblies?

Dear community, I have the fasta files of 2 assemblies of the human genome (for example hg19 and hg38). What would be the best tools to call structural variants from these 2 fasta files? Most of the tools I know…

How can I find reads for specific elements in a bam file?

Hi, I have a specific set of 1,009 elements in a bed file that I am interested in. I also have bam files which I would like to process to know the number of reads for these specific elements (for comparison purposes). I understand some simple uses of samtools commands,…

difference between treat_pileup and bdgcmp fold enrichment tracks macs2

Hello, I created a bigwig file from a treat_pileup.bdg file generated by macs2 and also used treat_pileup.bdg and control_lambda.bdg with macs2 bdgcmp. Here is my code: macs2 callpeak -t sample.bam -c sample_input.bam -g hs -f BAM -q 0.001 --bdg --outdir /folder…

Convert UCSC isoform ID to Ensembl transcript ID

Hello everyone, I have a few UCSC isoform IDs and I would like to convert them to the corresponding Ensembl transcript IDs. I have tried to use some online conversion tools (such as DAVID), looked up the UCSC annotation files, but…

Gene coordinates for hg19

Hi, is there a list which gives for each gene its starting coordinate (chr:pos) and its ending one with respect to the hg19 reference genome? I have a list of positions on hg19 expressed as chr:pos and I have to assign each one to the…

Bioconductor – FunciSNP

DOI: 10.18129/B9.bioc.FunciSNP     This package is for version 3.11 of Bioconductor; for the stable, up-to-date release version, see FunciSNP. Integrating Functional Non-coding Datasets with Genetic Association Studies to Identify Candidate Regulatory SNPs Bioconductor version: 3.11 FunciSNP integrates information from GWAS, 1000genomes and chromatin feature to identify functional SNP in…

Why may BOLT-LMM and SAIGE (quantitative, linear-mixed model) yield different results when ran on the absolutely the same dataset?

As a validation experiment, I have run the same GWAS of a quantitative phenotype derived from the UKBiobank, alongside the genomic data from the UKBiobank, once using the program BOLT-LMM and once using SAIGE linear mixed model (with selected quantitative trait tag). I wanted to see if the results would…

Alternate nucleotide is more frequent than reference nucleotide. OMG I’m dizzy. How do I stop the twirl?

This is due to the fact that the very reference genomes that we use for re-alignment are themselves based on individuals who carry rare risk alleles. Thus, when we call variants against these genomes, we are, at many loci, comparing against rare disease risk alleles. As the best/worst example (depending…

Bioconductor – ChIPComp

    This package is for version 3.4 of Bioconductor; for the stable, up-to-date release version, see ChIPComp. Quantitative comparison of multiple ChIP-seq datasets Bioconductor version: 3.4 ChIPComp detects differentially bound sharp binding sites across multiple conditions considering matching control. Author: Hao Wu, Li Chen, Zhaohui S.Qin, Chi Wang Maintainer:…

Exon coordinates and sequence

I did it like that: 1- Download refGene.txt.gz and hg19.fasta from the UCSC goldenpath. (Note: convert hg19.2bit to hg19.fa using twoBitToFa.) 2- Create a bed file with exon coordinates using my awk script // to_transcript.awk BEGIN { OFS="\t" } { name=$2 name2=$13 sens = $4 =="+" ?…

UCSC Gene Table Exon Frames Generating Stop Codons

Hi, I’m using UCSC gene tables, and I am running into trouble with interpreting exon frames. In some cases, using the exon frame from the tables creates stop codons, which shouldn’t be happening in coding regions. As an example, from the hg19 gene NM_001369291 on chromosome 22, I have this…

Answer: Highly mapped to introns

I think your problem is that your bed file doesn’t match the genome/gtf you used. I think it’s too old. My $gtf is the version 104 one like yours. zcat hg19_Ensembl_gene.bed.gz | head chr1 **66999065** 67210057 **ENST00000237247** 0 + 67000041 67208778 0 27 25,123,64,25,84,57,55,176,12,12,25,52,86,93,75,501,81,128,127,60,112,156,133,203,65,165,1302, 0,863,92464,99687,100697,106394,109427,110161,127130,134147,137612,138561,139898,143621,146295,148486,150724,155765,156807,162051,185911,195881,200365,205952,207275,207889,209690, grep ENST00000237247 $gtf 1 havana…

Converting between UCSC id and gene symbol with bioconductor annotation resources

You need to use the Homo.sapiens package to make that mapping. > library(Homo.sapiens) Loading required package: AnnotationDbi Loading required package: stats4 Loading required package: BiocGenerics Loading required package: parallel Attaching package: ‘BiocGenerics’ The following objects are masked from ‘package:parallel’: clusterApply, clusterApplyLB, clusterCall, clusterEvalQ, clusterExport, clusterMap, parApply, parCapply, parLapply, parLapplyLB, parRapply,…

Highly mapped to introns

Hi, I am analyzing RNA-seq data from human blood samples. I checked the read distribution using RSeQC read_distribution after mapping by STAR. Usually, I get more than 80% of reads mapped to exons. However, at this time, the result showed only several % were mapped…

High tumor mutation burden and DNA repair gene mutations

Introduction Anaplastic lymphoma kinase (ALK)‑fusion genes represent a small but important part of oncogenic driver mutations in NSCLC, accounting for approximately 3%‑7% of all cases worldwide.1,2 Small molecule tyrosine kinase inhibitors (TKIs) are the standard therapy for ALK-rearranged NSCLC. Crizotinib, a first-generation TKI, is the most widely used targeted drug…

Produce PCA bi-plot for 1000 Genomes Phase III

Note1 – Previous version: Produce PCA bi-plot for 1000 Genomes Phase III in VCF format (old) Note2 – this data is for hg19 / GRCh37 Note3 – GRCh38 data is available HERE The tutorial has been updated based on the 1000 Genomes Phase III imputed genotypes. The original tutorial was…

Bioconductor – ramr

DOI: 10.18129/B9.bioc.ramr     Detection of Rare Aberrantly Methylated Regions in Array and NGS Data Bioconductor version: Release (3.13) ramr is an R package for detection of low-frequency aberrant methylation events in large data sets obtained by methylation profiling using array or high-throughput bisulfite sequencing. In addition, package provides functions…

Bioconductor – methylationArrayAnalysis

DOI: 10.18129/B9.bioc.methylationArrayAnalysis     This package is for version 3.11 of Bioconductor; for the stable, up-to-date release version, see methylationArrayAnalysis. A cross-package Bioconductor workflow for analysing methylation array data. Bioconductor version: 3.11 Methylation in the human genome is known to be associated with development and disease. The Illumina Infinium methylation…

tabix for ID column

Hello, I’m looking for something similar to tabix. But instead of looking for information within a given region, I would like to use the values in the ID column for quick lookup. So for example I would like to take the compressed dbSNP file, index…

MAPQ (Mapping quality) of 0 for most reads from BWA-MEM2 (with no secondary alignment or other apparent reason)

Hello, I got a very weird output from BWA-mem2 – most of the reads have mapping quality of 0, even though there is no secondary alignment or anything else suspicious. I got sequencing data that was aligned with Novoalign to hg18, the data was bam files. I needed to realign…

Bioconductor – wateRmelon

DOI: 10.18129/B9.bioc.wateRmelon     This package is for version 3.11 of Bioconductor; for the stable, up-to-date release version, see wateRmelon. Illumina 450 methylation array normalization and metrics Bioconductor version: 3.11 15 flavours of betas and three performance metrics, with methods for objects produced by methylumi and minfi packages. Author: Leonard…

getting different value list from GATK gc content and CANOES

I was trying to run the code from this paper, “A machine-learning approach for accurate detection of copy-number variants from exome sequencing”. I need to get data from GATK GC content and CANOES and combine them, but I got a…

UCSC knownCanonical hg19 vs. hg38

Hello, We have an FAQ page that covers this topic (genome.ucsc.edu/FAQ/FAQgenes.html#singledownload). As posted by ATpoint, it boils down to different datasets and different approaches. hg19 knownCanonical was last updated in 2013 and built primarily from RefSeq and GenBank sequences and a few other sources. One isoform was identified from each…

Get rsID for a list of SNPs in an entire GWAS sumstats file

Here is a fairly efficient way to do this, assuming hg38 and that BEDOPS and standard Unix tools are installed. $ bedmap --echo --echo-map-id --delim '\t' <(awk '{n=split($0,a,/[:_]/); print "chr"a[1]"\t"a[2]"\t"a[2]+1"\t"a[3]"/"a[4];}' sumstats.txt | sort-bed -) <(wget -qO- hgdownload.cse.ucsc.edu/goldenPath/hg38/database/snp150.txt.gz | gunzip -c | cut -f2-5 | sort-bed -) > answer.bed This gets around making…
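
The awk step in the command above can be sketched in Python as follows. This is only an illustration, assuming variant IDs of the form chrom:pos_ref_alt (for example 1:12345_A_G, a made-up ID) and that the two alleles are joined with a slash in the output; the function name variant_to_bed is hypothetical and not part of BEDOPS.

```python
import re


def variant_to_bed(variant_id: str) -> str:
    """Convert 'chrom:pos_ref_alt' into a tab-separated BED-like record."""
    # Split on ':' and '_' just like awk's split($0, a, /[:_]/).
    chrom, pos, ref, alt = re.split(r"[:_]", variant_id.strip())[:4]
    start = int(pos)
    # The interval is one base wide: [pos, pos + 1).
    return "\t".join((f"chr{chrom}", str(start), str(start + 1), f"{ref}/{alt}"))


if __name__ == "__main__":
    # Hypothetical ID used purely for demonstration.
    print(variant_to_bed("1:12345_A_G"))
```

The resulting lines would still need to be sorted (sort-bed in the original command) before being passed to bedmap.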

UCSC liftover

Hi, I’m using UCSC liftover to convert hg19 to hg38. The result came out in a way that I don’t understand. Feb. 2009 (GRCh37/hg19) → Dec. 2013 (GRCh38/hg38) – chr1:120904787 → chr1:143905854 Dec. 2013 (GRCh38/hg38) → Feb. 2009 (GRCh37/hg19) – chr1:143905854 → chr1:149400430 (I didn’t check “Allow multiple output regions”.)…

Paired-end reads reported without mates: how to play matchmaker?

Hi Everyone, I am currently looking at Acute Myeloid Leukemia (AML) paired-end WGS samples from the TARGET data ocg.cancer.gov/programs/target/target-methods#3241. A bioinformatician in our group remapped the samples from hg19 to hg38. Unfortunately, we do not have any copies of the hg19 version anymore. However, when I try to run anything…

Separate vcf file creation for matched tumor-normal samples

I have received 8 matched normal tumor vcf files from our collaborators. For some reason, they didn’t provide the sequence bam files and called the variants themselves (by aligning with the reference hg19 genome for both pairs separately). Basically, I have…

Missense Variant on hg19

Hello everybody, I am using plink to do some statistical studies on a SNP set. I would like to use only missense variants, and I have the IDs of my SNPs of interest. Can someone suggest how I can download a database of homo…

karyoploteR: uncircle your genomes

Hi all, I’d like to present karyoploteR, an R/Bioconductor package we have developed to plot any data on any genome in non-circular layouts. The goal of this project was to develop a tool as flexible as Circos, but easier to use and representing genomes as straight lines instead of circles,…

Aligning Multiple paired end files together

Hi All, I have 72 paired-end .fastq files which I need to align using BWA. It’s paired-end data and my files are named sam_001_1.fastq sam_001_2.fastq sam_002_1.fastq sam_002_2.fastq and so on. Since it’s paired-end data…
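
One way to approach the pairing described above is to group the files by sample name and emit one bwa mem command per pair. The following is a minimal sketch, assuming the sam_NNN_1.fastq / sam_NNN_2.fastq naming from the question; the function names, the reference.fa placeholder and the .sam output names are hypothetical.

```python
from pathlib import Path


def pair_fastqs(names):
    """Group '<sample>_1.fastq' / '<sample>_2.fastq' names into (sample, R1, R2)."""
    mates = {}
    for name in names:
        base = Path(name).name
        if not base.endswith(".fastq"):
            continue  # ignore anything that is not a FASTQ file
        sample, _, mate = base[: -len(".fastq")].rpartition("_")
        if mate in ("1", "2"):
            mates.setdefault(sample, {})[mate] = name
    # Keep only samples for which both mates were found.
    return [(s, m["1"], m["2"]) for s, m in sorted(mates.items()) if len(m) == 2]


def bwa_commands(pairs, reference="reference.fa"):
    """Build one 'bwa mem' command line per read pair (placeholder reference)."""
    return [f"bwa mem {reference} {r1} {r2} > {sample}.sam" for sample, r1, r2 in pairs]


if __name__ == "__main__":
    files = ["sam_001_1.fastq", "sam_001_2.fastq", "sam_002_1.fastq", "sam_002_2.fastq"]
    for cmd in bwa_commands(pair_fastqs(files)):
        print(cmd)
```

The commands are only printed here, so they can be inspected before being run or written to a job script.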

vcf file analysis

Hello everyone, I have 22 vcf file for each chr. They were in genome build hg19 so I did a liftover and converted them to the hg38 genome build. Now I need just the chrom and position values from these vcf files and to merge them together into a…

Gene mutation analysis in papillary thyroid carcinoma

Introduction Thyroid tumors are the most common malignant tumors of the endocrine system, and their incidence has been increasing in the recent decades. Currently, there are some target drugs that can effectively treat PTC, and next-generation sequencing (NGS) can be used for targeted therapy. In order to make better informed…

how to seperate names using awk

I have a file like this:
"""
qboundary.0|hg19|chr10:1080001-1280001
boundary.2|hg19|chr10:3040001-3240001
boundary.4|hg19|chr10:4760001-4960001
"""
How can I quickly use awk to make it look like this (separated by TAB):
"""
chr10 1080001 1280001
chr10 3040001 3240001
chr10 4760001 4960001
"""
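
The conversion asked about above can be sketched in Python as follows, assuming every line has the form name|build|chrom:start-end as in the example; the function name boundary_to_bed is hypothetical.

```python
def boundary_to_bed(line: str) -> str:
    """Turn 'name|build|chrom:start-end' into 'chrom<TAB>start<TAB>end'."""
    # The last '|'-separated field holds the region as chrom:start-end.
    region = line.strip().split("|")[-1]
    chrom, span = region.split(":")
    start, end = span.split("-")
    return "\t".join((chrom, start, end))


if __name__ == "__main__":
    for raw in ("boundary.2|hg19|chr10:3040001-3240001",
                "boundary.4|hg19|chr10:4760001-4960001"):
        print(boundary_to_bed(raw))
```

In awk itself, a comparable approach would be to split on all three delimiters at once, for example awk -F'[|:-]' -v OFS='\t' '{print $3,$4,$5}', under the same assumption about the line format.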

Get Rs Number Based On Position (6 million SNPs)

I know this question has sort of been asked before….but I need to know which method would be the most efficient way to get the Rs numbers based on position (hg19). I’ve considered looping through two files, the .txt file…

gatk, ref and alt percentages .

Hello everyone, I need some info regarding how to get the percentage of REF and ALT nucleotide sequence in my data. I am using gatk and currently not getting REF and ALT percentages. The command I am using for the gatk vcf file…

Bowtie2 hg19 reference for gatk MuTect

Hello, I understand that the suggested aligner to use with GATK is bwa. If I want to use Bowtie2 as the aligner, which reference file should I be using? The reference in GATK bundle (Homo_sapiens_assembly19.fasta) does not seem to work with Bowtie2 and…

How I do lift over a Plink bim file from Hg18 to Hg19.

I’ve got some very old SNP data from Data Dryad. The BIM files use coordinates from Hg18, but my dataset uses coordinates from Hg19. I was wondering if anyone knows how to liftover coordinates in a…

liftover using genome browser

Hello everyone, I have a file which is in the hg38 build. I want to do a liftover and change it to hg19. I thought of using the liftover tool from the UCSC genome browser. I realise that the input file should be in bed format. My file has only…

Pericentromeric noncoding RNA changes DNA binding of CTCF and inflammatory gene expression in senescence and cancer

Significance During the aging process, senescent cells secrete inflammatory factors, causing various age-related pathologies. Thus, controlling the senescence-associated secretory phenotype (SASP) can tremendously benefit human health. Although SASP seems to be induced by the alteration of chromosomal organization, its underlying mechanism remains unclear. Here, it has been revealed that noncoding…

BSgenomes for HIV viruses

Dear Biostars users, I wonder if there are BSgenomes available for HIV viruses? I am trying to identify clusters from CLIP-seq data mapping to the HIV genome with wavClusteR. I am stuck at one step as below: `require(BSgenome.Hsapiens.UCSC.hg19) wavclusters <- filterClusters( clusters = clusters, highConfSub =…

Calling variants on reads with MAPQ=0 on HaplotypeCaller or bcftools mpileup

I am working with about 500 samples of human exome data. I used hg19 to align my reads and ran a standard best-practices GATK workflow, only to realise later that a small 1 Mb locus has not mapped properly due…

What is the difference between GRCh37 and hs37? And hg19?

This is what I have found so far. Please correct me if I am wrong. GRCh37 w/o patches includes the primary assembly (22 autosomal, X, Y, and non-chromosomal supercontigs) and alternate scaffolds, but not a reference mitogenome. Non-chromosomal supercontigs are the unlocalized and unplaced scaffolds. The rCRS reference mitogenome in…

Non-repeat human genome dataset

Could anyone please point me to where I could find a dataset of non-repeat sequences for the human ref genome. I’m not sure if it’s still regarded as true, but I saw that possibly 2/3 of the human genome contains repeats. Is there a place…

extract entire header from BED file to FASTA

Hi, is there any way one can extract the entire header from a BED file while using the bedtools getfasta command and write it in the FASTA output? I have tried using bedtools getfasta -fi hg19.fa -bed file.bed -fo test.fasta -fullHeader but…

HOMER hg19 not found in config.txt

Hi! I am trying to run findMotifs.pl from HOMER, in order to detect some regulatory motifs in a set of fasta sequences. When I type: findMotifs.pl sequences.fasta hg19 . I get the following error: !!! hg19 not found in /mnt/lustre/scratch/home/programs/HOMER/.//config.txt Try typing “perl /mnt/lustre/scratch/home/programs/HOMER//.//configureHomer.pl -list” to see available promoter sets…

makeblastdb Bus error

Problem: $makeblastdb -in $reference -parse_seqids -title "hg19" -dbtype nucl Building a new DB, current time: 08/16/2021 14:21:54 New DB name: New DB title: hg19 Sequence type: Nucleotide Deleted existing Nucleotide BLAST database named Keep MBits: T Maximum file size: 1000000000B Bus error Version: $makeblastdb -version makeblastdb:…

bedGraphToBigWig Tutorial and Report

It is easy to run into errors in the bedGraphToBigWig process, and I want to save newcomers some time. The following procedure should work well in most situations. 1. The bedGraph should have no header before sorting: awk 'NR!=1' input.bedGraph > input.deheader.bedGraph 2. The bedGraph should be sorted: sort…
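
The first two steps above (dropping the header line, then sorting) can be sketched in Python as follows. This is an illustration only, assuming a tab-separated bedGraph whose first line is a header and a sort by chromosome name and then numeric start; the function name clean_bedgraph is hypothetical.

```python
def clean_bedgraph(lines):
    """Drop the first (header) line and sort records by chromosome, then start."""
    # Mirrors awk 'NR!=1': the first line is discarded unconditionally.
    rows = [ln.rstrip("\n").split("\t") for ln in lines[1:] if ln.strip()]
    # Chromosome names compare as plain strings, start positions as integers.
    rows.sort(key=lambda fields: (fields[0], int(fields[1])))
    return ["\t".join(fields) for fields in rows]


if __name__ == "__main__":
    # Made-up records used purely for demonstration.
    example = [
        "track type=bedGraph\n",
        "chr2\t0\t10\t1.0\n",
        "chr1\t100\t200\t2.0\n",
        "chr1\t5\t50\t3.0\n",
    ]
    print("\n".join(clean_bedgraph(example)))
```

This reads the whole file into memory, which is fine for a sketch but may not suit very large bedGraph files.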

So many variants detected.

Dear All, I have done variant calling on germline data that has a single sample of each individual and two genes. I did the following steps, but after checking the results I found too many variants. After HaplotypeCaller (step 6) I found 140900 known variants, and the…

Is subtelomeric region and pericentromeric region defined in human genome?

I’ve been trying to see if there are any coordinates for these but haven’t had much luck. I saw a bunch of people defining it by +-2MB around the centromere gap and 30kb away from the telomere. I was wondering if…

The usage of sed

sed -e 's/_scATAC_hg19_noDup_noMT.bam//g' -e 's/\/directory\/to\/singleCell\///g' bamlist.txt | sed -e 's/\//\t/g' | awk 'OFS="\t"{print $2}' | tr '\n' '\t' > header.txt This replacement command is too complex. Can someone explain what this means?

align using file.ht2

I downloaded the indexed UCSC hg19 file in my terminal, and when I uncompressed it I found two files, genome.5.ht2 and genome.8.ht2. Every time I try to align my samples against the indexed file, this error shows up: [e::bwa_idx_load_from_disk] fail to locate the index files…

I am converting the fq.gz. files (which are the results of the mgi study) to bam files to view on igv.

Hey everyone, before I start, apologies for any inconvenience caused by my wrong or inappropriate use of terms. I have had some failures with bwa mem lately. As I…

Histone marks enrichment analysis

Hello everyone, here’s my question: I have a bed file of human genomic coordinates (hg19), and I would like to know whether ChIP-seq peaks for specific histone marks (such as those from ENCODE) are significantly more represented within my test regions compared to a background…
