Tag: hg19

Where to find vcf of dbsnp build 144 ?

Hi everyone, I have zipped VCF files that I would like to annotate using hg19 dbSNP 144. I have BED files for each chromosome but, based on other Biostar answers (How to add rsIDs to VCF?), it seems it is easier…

Why do Illumina 850k/EPIC arrays ignore CpGs which are “GC” in the forward strand?

CpGs are symmetrical, in that a CG sequence on the forward strand is hybridized to a GC (read 3′ to 5′) on the reverse strand, and the dinucleotides on both opposing strands are CpG dinucleotides which can be methylated. Conversely, CpGs can…
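
The strand symmetry described in this excerpt can be checked with a few lines of Python. This is only an illustrative sketch: the helper names `reverse_complement` and `cpg_sites` are hypothetical, and the code says nothing about how the array probes themselves are designed.

```python
# Hypothetical helpers illustrating CpG strand symmetry; not Illumina's probe logic.
_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq):
    """Return the opposite strand of seq, read 5' to 3'."""
    return seq.upper().translate(_COMPLEMENT)[::-1]


def cpg_sites(seq):
    """Return (forward_C, reverse_C) pairs of 0-based forward-strand
    coordinates for every CG dinucleotide in seq.

    The forward-strand cytosine sits at i; the reverse-strand cytosine
    pairs with the guanine at i + 1.
    """
    seq = seq.upper()
    return [(i, i + 1) for i in range(len(seq) - 1) if seq[i:i + 2] == "CG"]


if __name__ == "__main__":
    print(reverse_complement("CG"))  # a CpG reads as CG on both strands
    print(reverse_complement("GC"))  # a GpC reads as GC on both strands
    print(cpg_sites("ACGTGC"))       # only the CG at index 1 is reported
```

Read 5′ to 3′, a "GC" on the forward strand is still "GC" on the reverse strand, so it is a GpC on both strands and is never a CpG.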

Bioconductor – SNPlocs.Hsapiens.dbSNP155.GRCh37 (development version)

DOI: 10.18129/B9.bioc.SNPlocs.Hsapiens.dbSNP155.GRCh37
This is the development version of SNPlocs.Hsapiens.dbSNP155.GRCh37; to use it, please install the devel version of Bioconductor. Human SNP locations and alleles extracted from dbSNP Build 155 and placed on the GRCh37/hg19 assembly. Bioconductor version: Development (3.16). The 929,496,192 SNPs in this package were extracted from…

Human hg38 chr7:73,678,750-73,740,129 UCSC Genome Browser v435

Use drop-down controls below and press refresh to alter tracks displayed. Tracks with lots of items will automatically be displayed in more compact modes. Custom Tracks: H3K27ac Meta NeuN SCZ; H3K27ac NeuN SCZ del_CRD; H3K27ac NeuN SCZ del_CRD_del_peaks; H3K27ac Tissue; H3K27ac Tissue BD; H3K27ac Tissue BD del_CRD; H3K27ac Tissue BD…

A7993 – YFull YTree Info

R-A7993 – YFull YTree Info SNPs currently defining R-A7993 A7993     Sample ID Country / Language Info Ref File Testing company Statistics Status YF063745 —— R-A7993 R-A7993*, R-FGC59783* Hg38 .BAM FTDNA (Y700) 30X, 18.6 Mbp, 151 bp YF015291 Germany (Rheinland-Pfalz) R-A7993 R-A7993*, R-FGC59783* Hg38 .BAM FTDNA (Y500) 28X, 12.1 Mbp,…

Tophat2 Multiple alignment discrepancy – SEQanswers

Dear All, I am trying to get unique sequences when aligning to hg19 using tophat2. When I set the parameter “Maximum number of alignments to be allowed” to 1, I got 25% of my data as “multiple alignment” sequences, while I got 58% when I set the parameter to 20. As…

iPSCs derived from infertile men carrying complex genetic abnormalities can generate primordial germ-like cells

Patients and controls: Patient 1 was 38 years old and consulted for infertility after he and his partner had been trying to conceive for 2 years. The patient was the first child of unrelated parents, and he had four brothers and five sisters whose fertility status could not be determined…

Bronchoalveolar Lavage Fluid for Metagenomic Sequencing

Introduction: Severe pneumonia is one of the most common causes of infectious diseases among patients in the intensive care unit (ICU), and this can lead to various complications and high mortality [1–3]. Timely and accurate pathogen diagnoses are crucial for appropriate antimicrobial therapy and improved clinical outcomes. However, the low detection…

In this study, low-coverage whole-genome sequencing of cfDNA was performed to examine blood plasma from patients with spine metastasis

An analysis pipeline was built and validated to evaluate the CNV status in the cfDNA, in order to determine whether the CIN score, that has…

How to modify VCF file?

Hi community, I have a question: the SNP positions in my VCF file are from GRCh37/hg19, and I need to change them to GRCh38. So, I used UCSC liftOver to replace the hg19 positions with GRCh38 positions and deleted some SNPs, then sorted by position and saved to a new vcf…

extendedSequences length is not the required for DeepCpf1 (34bp)

Hi, I’m using CRISPRseek dev v. 1.35.2, installed from GitHub (hukai916/CRISPRseek). I wanted to calculate the CFD and the gRNA efficacy of a Cas12 sgRNA (my_sgrna.fa file) using Deep Cpf1.
my_sgrna.fa, TTTT (PAM) + sgRNA (20bp): >sgrna1 TTTTTGTCTTTAGACTATAAGTGC
Command: offTargetAnalysis(inputFilePath = "my_sgrna.fa", format = "fasta", header = FALSE, exportAllgRNAs =…

YP5260 – YFull YTree Info

Sample ID Country / Language Info Ref File Testing company Statistics Status I7021 Mongolia (Bulgan) C-F15910 C-F15910*, C-Y507 Hg19 .BAM Ancient 3X, 20.2 Mbp, 40 bp NEO249 Russia (Chukotskiy avtonomnyy okrug) C-F15910* —— Hg19 .BAM Ancient 1X, 7.2 Mbp, 81 bp I11696 Mongolia (Bulgan) C-Y507 —— Hg19 .BAM Ancient 2X,…

BY3 – YFull YTree Info

J-BY3 – YFull YTree Info SNPs currently defining J-BY3 BY3 / FGC15184     Sample ID Country / Language Info Ref File Testing company Statistics Status YF016315 —— J-FGC15174 J-FGC15174*, J-FGC15168*, J-FT258574 Hg38 .BAM FTDNA (Y500) 23X, 12.0 Mbp, 151 bp YF068400 Sudan (Janūb Kurdufān) J-FGC38453* —— Hg38 .BAM FTDNA (Y700)…

Allelic expression imbalance of PIK3CA mutations is frequent in breast cancer and prognostically significant

Subjects: Normal breast and tumor samples were obtained with written informed consent from donors and appropriate approval from local ethical committees, with the detailed information described in the respective original publications: normal tissue [9], METABRIC [14], TCGA [35]. Differential allelic expression analysis: DNA and total RNA from 64 samples of normal breast…

YP3952 – YFull YTree Info

Q-YP3952 – YFull YTree Info Sample ID Country / Language Info Ref File Testing company Statistics Status YF073154 Russia (Chechenskaya Respublika) / Chechen Q-YP3952* —— Hg38 .BAM FTDNA (Y700) 33X, 18.2 Mbp, 151 bp YF092378 Russia (Chechenskaya Respublika) / Chechen Q-BZ87 —— Hg38 .BAM FTDNA (Y700) 55X, 18.5 Mbp, 151…

Genome hg19 not found in homer config

I want to use Homer to do annotation. After I input "annotatePeaks.pl 31512_TH0_D0.bed hg19 > 31512_TH0_D0.ann.txt", it shows "!!!!Genome hg19 not found in /home/jenny/NGStools/homer/.//config.txt". I used the command "perl /home/jenny/NGStools/homer/configureHomer.pl -install hg19" to install it, and I am sure I installed hg19 for Homer. Then input…

Detailed differences between sambamba and samtools

Three months ago, my first post in the newcomers' group, “Do false-positive mutations appear because duplicate marking is not enough?”, described the problem that supplementary reads are not marked as duplicates by GATK MarkDuplicates. Afterwards, in response to this question, I began…

Z697 – YFull YTree Info

R-Z697 – YFull YTree Info SNPs currently defining R-Z697 Z697     Sample ID Country / Language Info Ref File Testing company Statistics Status YF009397 Sweden (Västra Götalands län) R-Z697* —— Hg19 .BAM FTDNA (Y500) 81X, 14.4 Mbp, 165 bp YF084333 Italy (Chieti) R-FT285492 —— Hg38 .BAM Dante Labs 14X, 23.4…

Transcription Start Site

What are the best databases to check out the transcription start sites of specific genes in the human genome?
wget -q -O - "http://hgdownload.cse.ucsc.edu/goldenPath/hg19/database/wgEncodeGencodeBasicV19.txt.gz" | gunzip -c | awk '(int($7)< int($8)) {if($4=="+") {printf("%s\t%d\t%d\t%s\t%s\n",$3,$7,int($7)+1,$2,$4);}else {printf("%s\t%d\t%d\t%s\t%s\n",$3,int($8)-3,$8,$2,$4);}}'
chr1 69090 69091 ENST00000335137.3 +
chr1 139306 139309 ENST00000423372.3…
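
For comparison, here is a minimal Python sketch of the same idea of reducing each transcript row to a one-base interval. It assumes the UCSC genePred layout with a leading bin column (bin, name, chrom, strand, txStart, txEnd, cdsStart, cdsEnd, …) and 0-based half-open coordinates. Under that assumption the awk command above reads columns 7 and 8 (the CDS bounds), whereas this sketch uses txStart and txEnd. The function name `tss_bed_rows` and the sample rows are hypothetical.

```python
def tss_bed_rows(lines):
    """Yield (chrom, start, end, name, strand) one-base TSS intervals.

    Assumes tab-separated genePred rows with a leading bin column:
    bin, name, chrom, strand, txStart, txEnd, ...
    Coordinates are treated as 0-based, half-open (BED style).
    """
    for line in lines:
        fields = line.rstrip("\n").split("\t")
        name, chrom, strand = fields[1], fields[2], fields[3]
        tx_start, tx_end = int(fields[4]), int(fields[5])
        if strand == "+":
            # Transcription starts at the leftmost base of the transcript.
            start, end = tx_start, tx_start + 1
        else:
            # On the minus strand transcription starts at the rightmost base.
            start, end = tx_end - 1, tx_end
        yield (chrom, start, end, name, strand)


if __name__ == "__main__":
    # Made-up rows for illustration only, not real annotation records.
    demo = [
        "585\tTX_PLUS\tchr1\t+\t100\t500\t150\t450",
        "585\tTX_MINUS\tchr1\t-\t1000\t2000\t1100\t1900",
    ]
    for row in tss_bed_rows(demo):
        print("\t".join(str(x) for x in row))
```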

CTS1346 – YFull YTree Info

Sample ID Country / Language Info Ref File Testing company Statistics Status HGDP01351 China, People’s Republic of O-F3607* —— Hg38 .BAM Scientific 16X, 23.6 Mbp, 151 bp YF079316 —— O-Y224790 —— Hg19 .BAM 23mofang 58X, 21.3 Mbp, 150 bp HG00583 China, People’s Republic of O-Y224790 —— Hg19 .BAM Scientific ——…

A114 – YFull YTree Info

R-A114 – YFull YTree Info SNPs currently defining R-A114 FGC78244     A114(H)     H Sample ID Country / Language Info Ref File Testing company Statistics Status YF067576 France (Ille-et-Vilaine) R-A114* —— Hg19 .BAM Dante Labs 12X, 23.0 Mbp, 151 bp YF088360 United States (Virginia) R-CTS4466* —— Hg38 .BAM FTDNA (Y700)…

Sam file is not written

Dear all, It writes the following in the log file:
[08-02 01:26:25] Running Step 2: BWA … bwa_wrap /work/pathology/s206442/dbet_project/hg19/hg19.fa Output3/out_1.valid.fastq 6 Output3/out_1.valid.sam 0
Running BWA on trimmed reads …
bwa mem -t 6 /work/pathology/s206442/dbet_project/hg19/hg19.fa Output3/out_1.valid.fastq | samtools view -h -F 2048 - > Output3/out_1.valid.sam
However, the sam file size is…

F13864 – YFull YTree Info

Sample ID Country / Language Info Ref File Testing company Statistics Status ERS5240131 Singapore C-F13864* —— Hg19 .BAM Scientific 7X, 22.9 Mbp, 150 bp YF076683 China, People’s Republic of (Shandong) C-F13864* —— Hg19 .BAM 23mofang 57X, 21.2 Mbp, 150 bp YF071813 —— C-F13864* —— Hg19 .BAM 23mofang 21X, 21.8 Mbp,…

L1193 – YFull YTree Info

I-L1193 – YFull YTree Info SNPs currently defining I-L1193 L1193     FGC87558     Y72031     Sample ID Country / Language Info Ref File Testing company Statistics Status ASH1 Ireland (Tipperary) I-L1193* —— Hg19 .BAM Ancient 1X, 10.5 Mbp, 101 bp PB581 Ireland (Clare) I-L1193* —— Hg19 .BAM Ancient 2X, 15.8…

subpopulations available in MafH5.gnomAD.v3.1.1.GRCh38

(Posted by @b14a6f0d.) Are subpopulation MAFs available for gnomAD v3.1.1 with any package, like they are in MafDb.gnomAD.r2.1.hs37d5? I’m trying to use GenomicScores to obtain all variants in a genomic range with MAF in any subpopulation >= cutoff. I tried…

Y18411 – YFull YTree Info

J-Y18411 – YFull YTree Info Sample ID Country / Language Info Ref File Testing company Statistics Status YF072520 Albania J-BY111710 —— Hg19 .BAM Dante Labs 10X, 22.8 Mbp, 151 bp YF067307 Palestine (Nablus) J-BY111710 —— Hg38 .BAM FTDNA (Y700) 34X, 18.7 Mbp, 151 bp NA20827 Italy (Firenze) J-CTS3330 —— Hg19…

M8498 – YFull YTree Info

B-M8498 – YFull YTree Info Sample ID Country / Language Info Ref File Testing company Statistics Status YF004283 Saudi Arabia B-M8498* —— Hg19 .BAM FTDNA (Y500) 43X, 13.7 Mbp, 165 bp HGDP00992 Namibia B-M7650* —— Hg38 .BAM Scientific 18X, 23.5 Mbp, 151 bp YF013963 —— B-Y82361 —— Hg38 .BAM FTDNA…

FGC15109 – YFull YTree Info

I-FGC15109 – YFull YTree Info SNPs currently defining I-FGC15109 FGC15109     Sample ID Country / Language Info Ref File Testing company Statistics Status SZ43 Hungary (Somogy) I-BY138* —— Hg19 .BAM Ancient 8X, 22.8 Mbp, 32 bp YF010533 —— I-BY138* —— Hg19 .BAM FTDNA (Y500) 73X, 14.9 Mbp, 165 bp YF019250…

bedtools -u not giving unique files

The following are the steps I'm following. The first step, extracting the sample using a BED file (here the BED file is the input BED file converted to Hg38), is:
tabix -h -R Hg19_to_Hg38_sorted.bed.gz gnomad.genomes.v{g_version}.hgdp_tgp.chr{chr}.vcf.bgz | perl {vcftools} -c {sample_name} > {sample_name}_out.vcf
Output ({sample_name}_out.vcf):
chr2 113982416 rs56177103 TATAAAATAAAATAAA…

BTG2 gene predicts poor outcome in PT-DLBCL

Introduction: Primary testicular diffuse large B-cell lymphoma (PT-DLBCL) is a rare and aggressive form of mature B-cell lymphoma [1–3]. PT-DLBCL is the most common type of testicular tumor in men aged over 60 and is characterized by painless uni- or bilateral testicular masses with infrequent constitutional symptoms [4–6]. PT-DLBCL shows significant extranodal tropism,…

FGC19851 – YFull YTree Info

R-FGC19851 – YFull YTree Info SNPs currently defining R-FGC19851 FGC19851     Sample ID Country / Language Info Ref File Testing company Statistics Status YF072967 United States (Georgia) R-FGC19851* —— Hg38 .BAM FTDNA (Y700) 34X, 18.7 Mbp, 151 bp YF009427 —— R-FGC65264* —— Hg19 .BAM FTDNA (Y500) 38X, 12.8 Mbp, 165…

FGC35106 – YFull YTree Info

Sample ID Country / Language Info Ref File Testing company Statistics Status YF016938 Saudi Arabia (Ar Riyāḍ) J-FGC35106 YF081770 | J-FGC35106*, J-FGC58682* Hg38 .BAM FTDNA (Y500) 30X, 11.5 Mbp, 151 bp YF016937 Saudi Arabia (Ar Riyāḍ) J-FGC35106 YF081769 | J-FGC35106*, J-FGC58682* Hg38 .BAM FTDNA (Y500) 37X, 12.5 Mbp, 151 bp…

YP4024 – YFull YTree Info

Sample ID Country / Language Info Ref File Testing company Statistics Status ERS2478532 Turkmenistan Q-YP4024* —— Hg19 .BAM Scientific 17X, 16.7 Mbp, 151 bp YF006625 Russia (Tomskaya oblast’) / Selkup Q-YP4024* —— Hg19 .BAM FTDNA (Y500) 67X, 14.8 Mbp, 165 bp DA162 Russia (Severnaya Osetiya-Alaniya, Respublika) Q-BZ5214* —— Hg19 .BAM…

Y570 – YFull YTree Info

Sample ID Country / Language Info Ref File Testing company Statistics Status AF2 —— Q-Y570 Q-Y570*, Q-F746* Hg19 .BAM Ancient 1X, 1.3 Mbp, 94 bp YF093124 —— Q-M120* —— Hg38 .BAM Nebula Genomics 57X, 23.6 Mbp, 150 bp Kolyma1 Russia (Sakha, Respublika [Yakutiya]) Q-Y222276* —— Hg19 .BAM Ancient 7X, 13.4…

Use the TCGAbiolinks package to download TCGA data

For downloading TCGA data, the RTCGA package is probably better in terms of ease of use, and because it ships data that have already been downloaded, it is relatively stable to use. But for the same reason, there is no guarantee that the data are up to date. The TCGAbiolinks package is…

Bioconductor – r3Cseq

This package is for version 3.3 of Bioconductor; for the stable, up-to-date release version, see r3Cseq. Analysis of Chromosome Conformation Capture and Next-generation Sequencing (3C-seq). Bioconductor version: 3.3. This package is an implementation of data analysis for the long-range interactions from 3C-seq assay. Author: Supat Thongjuea, MRC Molecular…

PF6747 – YFull YTree Info

E-PF6747 – YFull YTree Info Sample ID Country / Language Info Ref File Testing company Statistics Status YF010216 Azerbaijan (Qəbələ) E-PF6747* —— Hg19 .BAM FTDNA (Y500) 50X, 13.7 Mbp, 165 bp YF064736 Egypt (Al Minūfīyah) E-FT97857* —— Hg38 .BAM FTDNA (Y700) 35X, 18.5 Mbp, 151 bp YF093064 Yemen (Tā’izz) E-Y280593…

Z2039 – YFull YTree Info

Sample ID Country / Language Info Ref File Testing company Statistics Status YF003382 Finland (Länsi-Suomen lääni) I-Z2040* —— Hg19 .BAM FTDNA (Y500) 47X, 13.3 Mbp, 165 bp YF067917 Ireland I-FGC69701* —— Hg19 .BAM Dante Labs 9X, 22.9 Mbp, 151 bp YF078735 Belarus (Vicebskaja voblasc’) / Polish I-FGC69702 —— Hg38 .VCF…

Extract longest transcript or longest CDS transcript from GTF annotation file or gencode transcripts fasta file.

There are four types of methods to extract the longest transcript, or the longest CDS region with the longest transcript, from a transcripts FASTA file or a GTF file. 1. Extract longest transcript from gencode transcripts fasta file. 2. Extract longest transcript from gtf format annotation file based on gencode/ensembl/ucsc database. 3. Extract longest CDS region with longest…
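
As a rough illustration of the second method in this list (longest transcript per gene from a GTF file), the following Python sketch sums exon lengths per transcript and keeps the longest one for each gene. It is not the tool's own implementation. It assumes standard nine-column GTF lines whose attribute column carries quoted `gene_id` and `transcript_id` values, and the function name `longest_transcripts` is hypothetical.

```python
import re
from collections import defaultdict

# Matches GTF attributes of the form: key "value";
_ATTR = re.compile(r'(\S+) "([^"]*)"')


def longest_transcripts(gtf_lines):
    """Return {gene_id: (transcript_id, exonic_length)} for the longest
    transcript of each gene, measured as summed exon length.

    Ties keep the transcript seen first in the input.
    """
    lengths = defaultdict(int)  # (gene_id, transcript_id) -> exonic length
    for line in gtf_lines:
        if line.startswith("#"):
            continue
        fields = line.rstrip("\n").split("\t")
        if len(fields) < 9 or fields[2] != "exon":
            continue
        attrs = dict(_ATTR.findall(fields[8]))
        key = (attrs["gene_id"], attrs["transcript_id"])
        # GTF coordinates are 1-based and inclusive.
        lengths[key] += int(fields[4]) - int(fields[3]) + 1

    best = {}
    for (gene, transcript), length in lengths.items():
        if gene not in best or length > best[gene][1]:
            best[gene] = (transcript, length)
    return best
```

Measuring by summed exon length rather than genomic span is a design choice in this sketch. A "longest CDS" variant would sum `CDS` features instead of `exon` features.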

BY7447 – YFull YTree Info

E-BY7447 – YFull YTree Info SNPs currently defining E-BY7447 BY7447     Sample ID Country / Language Info Ref File Testing company Statistics Status YF075635 Yemen (Al Bayḑā’) E-FT183181 —— Hg38 .BAM FTDNA (Y700) 39X, 18.2 Mbp, 151 bp YF067501 Yemen (Şan’ā’) E-FT183181 —— Hg38 .BAM FTDNA (Y700) 44X, 18.8 Mbp,…

can not upload GTF file to UCSC genomebrowser

We are unable to reproduce the error you are seeing and we also recently experienced temporary issues with our site. Please let us know if you are still having this problem. Post by Gang Wei: Dear manager of UCSC Genome Browser, Glad to write to you. I’m now using UCSC genome browser to check…

DF109 – YFull YTree Info

Sample ID Country / Language Info Ref File Testing company Statistics Status YF016926 Ireland R-DF109 R-DF109*, R-A18726* Hg38 .BAM FTDNA (Y500) 27X, 12.7 Mbp, 165 bp YF016394 United States (Ohio) R-DF109 R-DF109*, R-A18726* Hg38 .BAM FTDNA (Y500) 34X, 11.9 Mbp, 151 bp YF011566 Ireland (Mayo) R-DF109 R-DF109*, R-A18726*, R-FGC23742* Hg38…

ZP77 – YFull YTree Info

R-ZP77 – YFull YTree Info SNPs currently defining R-ZP77 ZP77 / FGC6562     Sample ID Country / Language Info Ref File Testing company Statistics Status YF008362 —— R-ZP77* —— Hg19 .BAM FTDNA (Y500) 41X, 13.8 Mbp, 165 bp YF067652 Unknown R-BY40744 —— Hg38 .BAM FTDNA (Y700) 36X, 18.7 Mbp, 151…

Convert DNAStringSet to a list of elements in R? (Error in seq[[1]][["seq"]] : subscript out of bounds in R)

I have a bed file which contains DNA sequence information as follows:
track name="194" description="194 methylation (sites)" color=0,60,120 useScore=1
chr1 15864 15866 FALSE 894 +
chr1 534241 534243 FALSE 921 -
chr1 710096 710098 FALSE 729 +
chr1 714176 714178 FALSE 12 -
chr1 720864 720866 FALSE 988 -…

Download full list of SNPs and their coordinates in hg38

What is the best / standard place to get a full list of SNPs and their coordinates in hg38? I downloaded the SNPsnap database, but just realized that those coordinates are in hg19. I’m trying to figure out how…

Bioconductor – RiboCrypt

DOI: 10.18129/B9.bioc.RiboCrypt
Interactive visualization in genomics. Bioconductor version: Release (3.14). R Package for interactive visualization and browsing NGS data. It contains a browser for both transcript and genomic coordinate view. In addition a QC and general metaplots are included, among others differential translation plots and gene expression plots….

Bwa on multiple processor

Hi Guys, When I try to run bwa mem on multiple processors, I get the following error:
> mpirun -np 16 bwa mem hg19-agilent.fasta R1.fastq R2.fastq | samtools sort -o aln.bam
[M::bwa_idx_load_from_disk] read 0 ALT contigs
[M::bwa_idx_load_from_disk] read 0 ALT contigs
[M::bwa_idx_load_from_disk] read 0 ALT contigs
[M::bwa_idx_load_from_disk] read…

Bioconductor – derfinder (development version)

DOI: 10.18129/B9.bioc.derfinder
This is the development version of derfinder; for the stable release version, see derfinder. Annotation-agnostic differential expression analysis of RNA-seq data at base-pair resolution via the DER Finder approach. Bioconductor version: Development (3.15). This package provides functions for annotation-agnostic differential expression analysis of RNA-seq data. Two…

Alignment report

Hi Guys, I aligned R1 and R2 fastq files to the reference genome using bwa mem and got a bam file. Now I want to check whether the alignment was done correctly, along with the alignment percentage, coverage, etc. I ran the following command: bwa mem hg19.fasta R1.fastq R2.fastq | samtools…

Bioconductor – ChIPQC

This package is for version 3.1 of Bioconductor; for the stable, up-to-date release version, see ChIPQC. Quality metrics for ChIPseq data. Bioconductor version: 3.1. Author: Tom Carroll, Wei Liu, Ines de Santiago, Rory Stark. Maintainer: Tom Carroll <tc.infomatics at gmail.com>, Rory Stark <rory.stark…

identical(current_classes, .UCSC_TXCOL2CLASS) is not TRUE

GenomicFeatures::makeTxDbFromUCSC failing with an error: identical(current_classes, .UCSC_TXCOL2CLASS) is not TRUE (posted by @mikhail-dozmorov-23744)
Hi, the GenomicFeatures::makeTxDbFromUCSC function fails with:
library(GenomicFeatures)
> hg19.refseq.db <- makeTxDbFromUCSC(genome="hg19", table="refGene")
Download the refGene table … Error in .fetch_UCSC_txtable(genome(session), tablename, transcript_ids = transcript_ids) : identical(current_classes, .UCSC_TXCOL2CLASS) is not TRUE
OK The…

QIAGEN Bioinformatics Manuals

The Reference Data Manager The QIAGEN Sets Reference Data Library tab gives access to the reference data used with the CLC Haplotype Calling plugin ready-to-use workflow. From the wizard you can download and configure the reference data. For the full documentation relating to QIAGEN Sets, please see the QIAGEN Sets…

help with CrossMap

Hello all, I would really appreciate your help, as I am new to working with different genome builds and have hit a setback lifting a VCF file from build hg38 to hg19. In essence, when using CrossMap the chromosome value gets altered. For example, below is the…

From where to get a comprehensive list of genes with gene start, gene end and chromosome for build 37?

Hi all, I am trying to annotate a list of genes with gene start, gene end (build 37) and chromosome. I mapped most of the genes from a list downloaded from Biomart/UCSC,…

Bioconductor – ProteoDisco

DOI: 10.18129/B9.bioc.ProteoDisco
Generation of customized protein variant databases from genomic variants, splice-junctions and manual sequences. Bioconductor version: Release (3.14). ProteoDisco is an R package to facilitate proteogenomics studies. It houses functions to create customized (mutant) protein databases based on user-submitted genomic variants, splice-junctions, fusion genes and manual transcript…

How to convert bedgraph file with bins into GRanges object?

You could convert your bedGraph bins from hg18 to hg19 using liftover, so you can overlap them with your peaks. You would read them into a GRanges object, then hand this to the liftover function to translate from hg18 to hg19, then unlist the results to get back a regular…

Why single cell R2 fastq have no read identified by bowtie2 ?

When we input R2 fastq.gz into bowtie2, the human sequence was not removed (${base}_host_removed is zero). for i in $(find ./ -type f -name ".fastq.gz" | while read F; do basename $F | rev | cut -c…

Generating Multiple Species Alignment Of Novel Transcripts For Phylocsf

Short version: How would you go about generating multiple species alignments of novel transcripts from bos taurus (assembly UMD3.1) with human/mouse/dog for use with PhyloCSF? Context and what I’ve tried so far: Through a sequencing experiment, our lab has identified a large set of new transcripts in Bos taurus. We…

How to call LOH with FreeC

Good morning, I am trying to infer loss of heterozygosity (LOH) from WGS data using FreeC. For this purpose, I am using these parameters in the "[BAF]" section of the configuration file: [BAF] makePileup = My_somaticVCF.vcf.gz fastaFile = hg19.fa SNPfile = hg19_snp142.SingleDiNucl.1based.txt.gz When…

What is the single nucleotide polymorphism database ( dbsnp )?

The Single Nucleotide Polymorphism Database (dbSNP) is a free public archive for genetic variation within and across different species, developed and hosted by the National Center for Biotechnology Information (NCBI) in collaboration with the National Human Genome Research Institute (NHGRI). Furthermore, are there any databases for single nucleotide polymorphisms? As there…

SNP2TFBS

Viewing variants that affect TF binding: Results
SNP identifier: rs1800629 (dbSNP)
Chrom id (Feb 2009 GRCh37/hg19): NC_000006.11 (chr6)
SNP position: 31543031
NB. of TF factors: 1
Columns: TF name, PWM score on Ref, PWM score on Alt, Score difference, Low Score Thr, High Score Thr
MZF1_1-4, 1024, ….

Bioconductor – Rariant

This package is for version 3.0 of Bioconductor; for the stable, up-to-date release version, see Rariant. Identification and Assessment of Single Nucleotide Variants through Shifts in Non-Consensus Base Call Frequencies. Bioconductor version: 3.0. The ‘Rariant’ package identifies single nucleotide variants from sequencing data based on the difference of…

Bioconductor – BSgenome.Hsapiens.UCSC.hg19

This package is for version 3.2 of Bioconductor; for the stable, up-to-date release version, see BSgenome.Hsapiens.UCSC.hg19. Full genome sequences for Homo sapiens (UCSC version hg19). Bioconductor version: 3.2. Full genome sequences for Homo sapiens (Human) as provided by UCSC (hg19, Feb. 2009) and stored in Biostrings objects. Author:…

‘Deprecated’ Error with ngs.plot.r after sys admin update Bioconductor

Loading R libraries ... Done
Configuring variables ... Using database: /home/yensin/software/ngsplot/database/hg19/hg19.ensembl.genebody.protein_coding.RData
Done
Analyze bam files and calculate coverage
Warning message: ‘isNotPrimaryRead’ is deprecated. Use ‘isSecondaryAlignment’ instead. See help("Deprecated")
... Done
Plotting figures ... Error in seq.default(min.e, max.e, length.out = ncolor + 1) : ‘from’ cannot be NA, NaN or infinite
Calls: plotheat -> ColorBreaks -> seq ->…

Best tools for calling structural variants from 2 assemblies?

Dear community, I have the fasta files of 2 assemblies of the human genome (for example hg19 and hg38). What would be the best tools to call structural variants from these 2 fasta files? Most of the tools I know…

How can I find reads for specific elements in a bam file?

Hi, I have a specific set of 1,009 elements in a bed file that I am interested in. I also have bam files which I would like to process to know the number of reads for these specific elements (for comparison purposes). I understand some simple uses of samtools commands,…

difference between treat_pileup and bdgcmp fold enrichment tracks macs2

Hello, I created a bigWig file from a treat_pileup.bdg file generated by macs2 and also used treat_pileup.bdg and control_lambda.bdg with macs2 bdgcmp. Here is my code: macs2 callpeak -t sample.bam -c sample_input.bam -g hs -f BAM -q 0.001 --bdg --outdir /folder…

Convert UCSC isoform ID to Ensembl transcript ID

Hello everyone, I have a few UCSC isoform IDs and I would like to convert them to the corresponding Ensembl transcript IDs. I have tried to use some online conversion tools (such as DAVID), looked up the UCSC annotation files, but…

Gene coordinates for hg19

Hi, is there a list which gives, for each gene, its start coordinate (chr:pos) and its end coordinate with respect to the hg19 reference genome? I have a list of positions on hg19 expressed as chr:pos and I have to assign each one to the…

Bioconductor – FunciSNP

DOI: 10.18129/B9.bioc.FunciSNP
This package is for version 3.11 of Bioconductor; for the stable, up-to-date release version, see FunciSNP. Integrating Functional Non-coding Datasets with Genetic Association Studies to Identify Candidate Regulatory SNPs. Bioconductor version: 3.11. FunciSNP integrates information from GWAS, 1000genomes and chromatin feature to identify functional SNP in…

Why may BOLT-LMM and SAIGE (quantitative, linear-mixed model) yield different results when ran on the absolutely the same dataset?

As a validation experiment, I have run the same GWAS of a quantitative phenotype derived from the UKBiobank, alongside the genomic data from the UKBiobank, once using the program BOLT-LMM and once using SAIGE linear mixed model (with selected quantitative trait tag). I wanted to see if the results would…

Alternate nucleotide is more frequent than reference nucleotide. OMG I’m dizzy. How do I stop the twirl?

This is due to the fact that the very reference genomes that we use for re-alignment are themselves based on individuals who carry rare risk alleles. Thus, when we call variants against these genomes, we are, at many loci, comparing against rare disease risk alleles. As the best/worst example (depending…

Bioconductor – ChIPComp

    This package is for version 3.4 of Bioconductor; for the stable, up-to-date release version, see ChIPComp. Quantitative comparison of multiple ChIP-seq datasets Bioconductor version: 3.4 ChIPComp detects differentially bound sharp binding sites across multiple conditions considering matching control. Author: Hao Wu, Li Chen, Zhaohui S.Qin, Chi Wang Maintainer:…

Continue Reading Bioconductor – ChIPComp

Exon coordinates and sequence

I did it like this: 1- Download refGene.txt.gz and hg19.fasta from the UCSC goldenpath (note: convert hg19.2bit to hg19.fa using twoBitToFa). 2- Create a bed file with exon coordinates using my awk script // to_transcript.awk BEGIN { OFS="\t" } { name=$2 name2=$13 sens = $4 == "+" ? …
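The awk script in the excerpt is cut off, so the following is not the author's code. It is a minimal sketch of the same idea, assuming the usual refGene.txt column layout (name in column 2, chrom in column 3, strand in column 4, exonStarts and exonEnds in columns 10 and 11, name2 in column 13). The function name `refgene_exons` and the test record are hypothetical.

```python
def refgene_exons(line):
    """Turn one refGene.txt row into a list of per-exon tuples.

    Each tuple is (chrom, start, end, name, name2, exon_number, strand).
    Exons are numbered in genomic order, regardless of strand.
    """
    fields = line.rstrip("\n").split("\t")
    name, chrom, strand = fields[1], fields[2], fields[3]
    # exonStarts / exonEnds are comma-separated lists with a trailing comma.
    starts = [int(x) for x in fields[9].split(",") if x]
    ends = [int(x) for x in fields[10].split(",") if x]
    name2 = fields[12]
    return [
        (chrom, start, end, name, name2, number, strand)
        for number, (start, end) in enumerate(zip(starts, ends), start=1)
    ]


# Made-up record with two exons, for illustration only.
example_row = "\t".join([
    "585", "NM_TEST", "chr1", "+", "100", "500", "150", "450",
    "2", "100,300,", "200,500,", "0", "GENE_X", "cmpl", "cmpl", "0,1,",
])
```

Each tuple can be written out as a tab-separated BED-like line and then used to pull the exon sequence from the FASTA file.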

Continue Reading Exon coordinates and sequence

UCSC Gene Table Exon Frames Generating Stop Codons

Hi, I’m using UCSC gene tables, and I am running into trouble with interpreting exon frames. In some cases, using the exon frame from the tables creates stop codons, which shouldn’t be happening in coding regions. As an example, from the hg19 gene NM_001369291 on chromosome 22, I have this…

Continue Reading UCSC Gene Table Exon Frames Generating Stop Codons

Answer: Highly mapped to introns

I think your problem is that your bed file doesn’t match the genome/gtf you used. I think it’s too old. My $gtf is the version 104 one like yours. zcat hg19_Ensembl_gene.bed.gz | head chr1 **66999065** 67210057 **ENST00000237247** 0 + 67000041 67208778 0 27 25,123,64,25,84,57,55,176,12,12,25,52,86,93,75,501,81,128,127,60,112,156,133,203,65,165,1302, 0,863,92464,99687,100697,106394,109427,110161,127130,134147,137612,138561,139898,143621,146295,148486,150724,155765,156807,162051,185911,195881,200365,205952,207275,207889,209690, grep ENST00000237247 $gtf 1 havana…

Continue Reading Answer: Highly mapped to introns

Converting between UCSC id and gene symbol with bioconductor annotation resources

You need to use the Homo.sapiens package to make that mapping. > library(Homo.sapiens) Loading required package: AnnotationDbi Loading required package: stats4 Loading required package: BiocGenerics Loading required package: parallel Attaching package: ‘BiocGenerics’ The following objects are masked from ‘package:parallel’: clusterApply, clusterApplyLB, clusterCall, clusterEvalQ, clusterExport, clusterMap, parApply, parCapply, parLapply, parLapplyLB, parRapply,…

Continue Reading Converting between UCSC id and gene symbol with bioconductor annotation resources

Highly mapped to introns

Hi, I am analyzing RNA-seq data from human blood samples. I checked the read distribution using RSeQC read_distribution after mapping with STAR. Usually, I get more than 80% of reads mapped to exons. However, this time the result showed that only a few percent were mapped…

Continue Reading Highly mapped to introns

High tumor mutation burden and DNA repair gene mutations

Introduction Anaplastic lymphoma kinase (ALK)‑fusion genes represent a small but important part of oncogenic driver mutations in NSCLC, accounting for approximately 3%‑7% of all cases worldwide.1,2 Small molecule tyrosine kinase inhibitors (TKIs) are the standard therapy for ALK-rearranged NSCLC. Crizotinib, a first-generation TKI, is the most widely used targeted drug…

Continue Reading High tumor mutation burden and DNA repair gene mutations

Produce PCA bi-plot for 1000 Genomes Phase III

Note1 – Previous version: Produce PCA bi-plot for 1000 Genomes Phase III in VCF format (old) Note2 – this data is for hg19 / GRCh37 Note3 – GRCh38 data is available HERE The tutorial has been updated based on the 1000 Genomes Phase III imputed genotypes. The original tutorial was…

Continue Reading Produce PCA bi-plot for 1000 Genomes Phase III

Bioconductor – ramr

DOI: 10.18129/B9.bioc.ramr     Detection of Rare Aberrantly Methylated Regions in Array and NGS Data Bioconductor version: Release (3.13) ramr is an R package for detection of low-frequency aberrant methylation events in large data sets obtained by methylation profiling using array or high-throughput bisulfite sequencing. In addition, package provides functions…

Continue Reading Bioconductor – ramr

Bioconductor – methylationArrayAnalysis

DOI: 10.18129/B9.bioc.methylationArrayAnalysis     This package is for version 3.11 of Bioconductor; for the stable, up-to-date release version, see methylationArrayAnalysis. A cross-package Bioconductor workflow for analysing methylation array data. Bioconductor version: 3.11 Methylation in the human genome is known to be associated with development and disease. The Illumina Infinium methylation…

Continue Reading Bioconductor – methylationArrayAnalysis

tabix for ID column

Hello, I’m looking for something similar to tabix. But instead of looking up information within a given region, I would like to use the values in the ID column for quick lookup. So for example I would like to take the compressed dbSNP file, index…

Continue Reading tabix for ID column

MAPQ (Mapping quality) of 0 for most reads from BWA-MEM2 (with no secondary alignment or other apparent reason)

Hello, I got a very weird output from BWA-MEM2: most of the reads have a mapping quality of 0, even though there is no secondary alignment or anything else suspicious. The sequencing data I received had been aligned with Novoalign to hg18 and came as BAM files. I needed to realign…

Continue Reading MAPQ (Mapping quality) of 0 for most reads from BWA-MEM2 (with no secondary alignment or other apparent reason)

Bioconductor – wateRmelon

DOI: 10.18129/B9.bioc.wateRmelon     This package is for version 3.11 of Bioconductor; for the stable, up-to-date release version, see wateRmelon. Illumina 450 methylation array normalization and metrics Bioconductor version: 3.11 15 flavours of betas and three performance metrics, with methods for objects produced by methylumi and minfi packages. Author: Leonard…

Continue Reading Bioconductor – wateRmelon

getting different value list from GATK gc content and CANOES

I was trying to run the code from the paper “A machine-learning approach for accurate detection of copy-number variants from exome sequencing”. I need to get data from GATK GC content and CANOES and combine them, but I got a…

Continue Reading getting different value list from GATK gc content and CANOES

UCSC knownCanonical hg19 vs. hg38

Hello, We have an FAQ page that covers this topic (genome.ucsc.edu/FAQ/FAQgenes.html#singledownload). As posted by ATpoint, it boils down to different datasets and different approaches. hg19 knownCanonical was last updated in 2013 and built primarily from RefSeq and GenBank sequences and a few other sources. One isoform was identified from each…

Continue Reading UCSC knownCanonical hg19 vs. hg38

Get rsID for a list of SNPs in an entire GWAS sumstats file

Here is a fairly efficient way to do this, assuming hg38 and that BEDOPS and standard Unix tools are installed. $ bedmap --echo --echo-map-id --delim '\t' <(awk '{n=split($0,a,/[:_]/); print "chr"a[1]"\t"a[2]"\t"a[2]+1"\t"a[3]"/"a[4];}' sumstats.txt | sort-bed -) <(wget -qO- hgdownload.cse.ucsc.edu/goldenPath/hg38/database/snp150.txt.gz | gunzip -c | cut -f2-5 | sort-bed -) > answer.bed This gets around making…

Continue Reading Get rsID for a list of SNPs in an entire GWAS sumstats file

UCSC liftover

Hi, I’m using UCSC liftover to convert hg19 to hg38, and I got a result that I don’t understand. Feb. 2009 (GRCh37/hg19) → Dec. 2013 (GRCh38/hg38): chr1:120904787 → chr1:143905854. Dec. 2013 (GRCh38/hg38) → Feb. 2009 (GRCh37/hg19): chr1:143905854 → chr1:149400430. (I didn’t check “Allow multiple output regions”.)…

Continue Reading UCSC liftover

Paired-end reads reported without mates: how to play matchmaker?

Hi Everyone, I am currently looking at Acute Myeloid Leukemia (AML) paired-end WGS samples from the TARGET data ocg.cancer.gov/programs/target/target-methods#3241. A bioinformatician in our group remapped the samples from hg19 to hg38. Unfortunately, we do not have any copies of the hg19 version anymore. However, when I try to run anything…

Continue Reading Paired-end reads reported without mates: how to play matchmaker?

Separate vcf file creation for matched tumor-normal samples

I have received 8 matched tumor-normal VCF files from our collaborators. For some reason, they didn’t provide the sequence BAM files and called the variants themselves (by aligning against the hg19 reference genome for both samples of each pair separately). Basically, I have…

Continue Reading Separate vcf file creation for matched tumor-normal samples

Missense Variant on hg19

Hello everybody, I am using plink to do some statistical analyses on a SNP set. I would like to use only missense variants, and I have the IDs of my SNPs of interest. Can someone suggest how I can download a database of homo…

Continue Reading Missense Variant on hg19

karyoploteR: uncircle your genomes

Hi all, I’d like to present karyoploteR, an R/Bioconductor package we have developed to plot any data on any genome in non-circular layouts. The goal of this project was to develop a tool as flexible as Circos, but easier to use and representing genomes as straight lines instead of circles,…

Continue Reading karyoploteR: uncircle your genomes

Aligning Multiple paired end files together

Hi all, I have 72 paired-end .fastq files that I need to align using BWA. The data are paired-end and my files are named sam_001_1.fastq, sam_001_2.fastq, sam_002_1.fastq, sam_002_2.fastq, and so on. Since it is paired-end data…
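One way to approach the pairing step described above is to group file names by sample before calling the aligner. The sketch below assumes the `<sample>_1.fastq` / `<sample>_2.fastq` naming from the question; the function name `pair_fastq` is hypothetical, and samples with a missing mate are silently skipped.

```python
import re
from collections import defaultdict

# Assumed naming scheme: <sample>_1.fastq and <sample>_2.fastq
PAIR_PATTERN = re.compile(r"^(?P<sample>.+)_(?P<read>[12])\.fastq$")


def pair_fastq(filenames):
    """Return sorted (sample, read1_file, read2_file) tuples for complete pairs."""
    mates = defaultdict(dict)
    for filename in filenames:
        match = PAIR_PATTERN.match(filename)
        if match:
            mates[match.group("sample")][match.group("read")] = filename
    return [
        (sample, reads["1"], reads["2"])
        for sample, reads in sorted(mates.items())
        if "1" in reads and "2" in reads
    ]
```

Each resulting tuple could then be handed to one `bwa mem` invocation with the two mate files as the read arguments.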

Continue Reading Aligning Multiple paired end files together

vcf file analysis

Hello everyone, I have 22 VCF files, one for each chromosome. They were in genome build hg19, so I did a liftover and converted them to the hg38 genome build. Now I need just the chrom and position values from these VCF files and merge them together into a…
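Pulling only CHROM and POS out of several VCF files, as asked above, needs no VCF library: the two values are the first two tab-separated columns of every non-header line. The following is a minimal sketch; `chrom_pos` and `merge_chrom_pos` are hypothetical names, and the inputs are any iterables of text lines (for example open file handles).

```python
def chrom_pos(lines):
    """Yield (chrom, pos) from VCF text lines, skipping headers and blank lines."""
    for line in lines:
        if line.startswith("#") or not line.strip():
            continue
        chrom, pos = line.split("\t", 2)[:2]
        yield chrom, int(pos)


def merge_chrom_pos(line_sources):
    """Concatenate (chrom, pos) pairs from several VCF line sources, in order."""
    merged = []
    for lines in line_sources:
        merged.extend(chrom_pos(lines))
    return merged
```

For compressed files, each source could be opened with `gzip.open(path, "rt")` from the standard library and passed in directly.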

Continue Reading vcf file analysis

Gene mutation analysis in papillary thyroid carcinoma

Introduction Thyroid tumors are the most common malignant tumors of the endocrine system, and their incidence has been increasing in the recent decades. Currently, there are some target drugs that can effectively treat PTC, and next-generation sequencing (NGS) can be used for targeted therapy. In order to make better informed…

Continue Reading Gene mutation analysis in papillary thyroid carcinoma

how to seperate names using awk

I have a file like this: """ qboundary.0|hg19|chr10:1080001-1280001 boundary.2|hg19|chr10:3040001-3240001 boundary.4|hg19|chr10:4760001-4960001 """ How can I quickly use awk to make it look like this (separated by TAB): """ chr10 1080001 1280001 chr10 3040001 3240001 chr10 4760001 4960001 """
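The question asks for awk; the same split can be shown in Python to make the logic explicit: take the last `|`-separated field, then break it at `:` and `-`. This is only a sketch, and `boundary_to_tsv` is a hypothetical helper name.

```python
def boundary_to_tsv(line):
    """Convert 'name|build|chrom:start-end' into 'chrom<TAB>start<TAB>end'."""
    region = line.strip().split("|")[-1]   # e.g. 'chr10:1080001-1280001'
    chrom, span = region.split(":")
    start, end = span.split("-")
    return "\t".join((chrom, start, end))


def convert_lines(lines):
    """Apply boundary_to_tsv to every non-empty line."""
    return [boundary_to_tsv(line) for line in lines if line.strip()]
```

The first field (`boundary.0`, `qboundary.0`, and so on) is never inspected, so any prefix before the first `|` is ignored.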

Continue Reading how to seperate names using awk

Get Rs Number Based On Position (6 million SNPs)

I know this question has sort of been asked before, but I need to know which method would be the most efficient way to get the Rs numbers based on position (hg19). I’ve considered looping through two files, the .txt file…
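Instead of looping through two files against each other, one common approach is to load the reference positions into a hash table once and then look up each query position in constant time. The sketch below assumes the reference data fit in memory. The identifiers in the test are placeholders, not real rsIDs, and `load_rs_index` and `annotate` are hypothetical names.

```python
def load_rs_index(records):
    """Build a {(chrom, pos): rsid} dictionary from (chrom, pos, rsid) records."""
    return {(chrom, pos): rsid for chrom, pos, rsid in records}


def annotate(positions, index, missing="."):
    """Attach an rsID (or the `missing` marker) to each (chrom, pos) query."""
    return [(chrom, pos, index.get((chrom, pos), missing))
            for chrom, pos in positions]
```

With several million SNPs the dictionary approach trades memory for speed; if memory is the constraint, sorting both files and using an interval tool is the usual alternative.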

Continue Reading Get Rs Number Based On Position (6 million SNPs)

gatk, ref and alt percentages .

Hello everyone, I need some info on how to get the percentage of REF and ALT nucleotides in my data. I am using gatk and am currently not getting REF and ALT percentages. The command I am using for the gatk vcf file…
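The command in the excerpt is truncated, so the following does not reproduce it. As a sketch of one possible approach, and assuming the VCF carries a per-sample AD (allelic depths) FORMAT value such as `30,10` (REF depth first, then each ALT), the percentages can be derived directly from that value. The function name `allele_percentages` is hypothetical.

```python
def allele_percentages(ad_field):
    """Return [REF %, ALT1 %, ...] from an AD string such as '30,10'.

    Returns None when the value is missing ('.') or the total depth is zero.
    """
    values = ad_field.strip().split(",")
    if any(v in (".", "") for v in values):
        return None
    depths = [int(v) for v in values]
    total = sum(depths)
    if total == 0:
        return None
    return [100.0 * depth / total for depth in depths]
```

If the file has no AD value, this calculation is not possible from the VCF alone and the depths would have to come from the alignments instead.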

Continue Reading gatk, ref and alt percentages .