Tag: hg19
How to modify VCF file?
Hi community, I have a question: the SNP positions in my VCF file are from GRCh37/hg19, and I need to change them to GRCh38. So I used UCSC liftOver to replace the hg19 positions with GRCh38 positions, deleted some SNPs, then sorted by position and saved to a new VCF…
extendedSequences length is not the required for DeepCpf1 (34bp)
Hi, I’m using CRISPRseek dev v. 1.35.2, installed from github (hukai916/CRISPRseek). I wanted to calculate the CFD, and the grna efficacy of a Cas12 sgRNA (my_sgrna.fa file) using Deep Cpf1. my_sgrna.fa, TTTT (PAM) + sgRNA (20bp): >sgrna1 TTTTTGTCTTTAGACTATAAGTGC Command: offTargetAnalysis(inputFilePath = “my_sgrna.fa”, format = “fasta”, header = FALSE, exportAllgRNAs =…
YP5260 – YFull YTree Info
Sample ID Country / Language Info Ref File Testing company Statistics Status I7021 Mongolia (Bulgan) C-F15910 C-F15910*, C-Y507 Hg19 .BAM Ancient 3X, 20.2 Mbp, 40 bp NEO249 Russia (Chukotskiy avtonomnyy okrug) C-F15910* —— Hg19 .BAM Ancient 1X, 7.2 Mbp, 81 bp I11696 Mongolia (Bulgan) C-Y507 —— Hg19 .BAM Ancient 2X,…
BY3 – YFull YTree Info
J-BY3 – YFull YTree Info SNPs currently defining J-BY3 BY3 / FGC15184 Sample ID Country / Language Info Ref File Testing company Statistics Status YF016315 —— J-FGC15174 J-FGC15174*, J-FGC15168*, J-FT258574 Hg38 .BAM FTDNA (Y500) 23X, 12.0 Mbp, 151 bp YF068400 Sudan (Janūb Kurdufān) J-FGC38453* —— Hg38 .BAM FTDNA (Y700)…
Allelic expression imbalance of PIK3CA mutations is frequent in breast cancer and prognostically significant
Subjects Normal breast and tumor samples were obtained with the written informed consent from donors and appropriate approval from local ethical committees, with the detailed information described in the respective original publications: normal tissue9, METABRIC14, TCGA35. Differential allelic expression analysis DNA and total RNA from 64 samples of normal breast…
YP3952 – YFull YTree Info
Q-YP3952 – YFull YTree Info Sample ID Country / Language Info Ref File Testing company Statistics Status YF073154 Russia (Chechenskaya Respublika) / Chechen Q-YP3952* —— Hg38 .BAM FTDNA (Y700) 33X, 18.2 Mbp, 151 bp YF092378 Russia (Chechenskaya Respublika) / Chechen Q-BZ87 —— Hg38 .BAM FTDNA (Y700) 55X, 18.5 Mbp, 151…
Genome hg19 not found in homer config
I want to use HOMER for annotation. After I run "annotatePeaks.pl 31512_TH0_D0.bed hg19 > 31512_TH0_D0.ann.txt", it shows "!!!!Genome hg19 not found in /home/jenny/NGStools/homer/.//config.txt". I used the command "perl /home/jenny/NGStools/homer/configureHomer.pl -install hg19" to install it, and I am sure I installed hg19 for HOMER. Then input…
Detailed differences between sambamba and samtools
Three months ago, in my first post in the new-student group, "Do false-positive mutations appear because duplicates are not marked thoroughly enough?", I described the problem that supplementary reads are not marked as duplicates by GATK MarkDuplicates. Afterwards, in response to this question, I began…
Z697 – YFull YTree Info
R-Z697 – YFull YTree Info SNPs currently defining R-Z697 Z697 Sample ID Country / Language Info Ref File Testing company Statistics Status YF009397 Sweden (Västra Götalands län) R-Z697* —— Hg19 .BAM FTDNA (Y500) 81X, 14.4 Mbp, 165 bp YF084333 Italy (Chieti) R-FT285492 —— Hg38 .BAM Dante Labs 14X, 23.4…
Transcription Start Site
What are the best databases to check out the transcription start sites of specific genes in the human genome? wget -q -O - "http://hgdownload.cse.ucsc.edu/goldenPath/hg19/database/wgEncodeGencodeBasicV19.txt.gz" | gunzip -c | awk '(int($7) < int($8)) {if($4=="+") {printf("%s\t%d\t%d\t%s\t%s\n",$3,$7,int($7)+1,$2,$4);}else {printf("%s\t%d\t%d\t%s\t%s\n",$3,int($8)-3,$8,$2,$4);}}' chr1 69090 69091 ENST00000335137.3 + chr1 139306 139309 ENST00000423372.3…
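As a rough illustration of the strand logic in the command above, the sketch below derives a 1 bp TSS interval in BED coordinates from a transcript's strand and its start/end. The function name `tss_interval` and the tuple input are assumptions made for this example, the coordinates are assumed to be 0-based and half-open (genePred style), and it uses a 1 bp window on both strands rather than the 3 bp window the quoted awk prints for minus-strand records.

```python
def tss_interval(chrom, strand, tx_start, tx_end):
    """Return a 1 bp BED interval (chrom, start, end) marking the TSS.

    Assumes tx_start/tx_end are 0-based, half-open coordinates
    (an assumption of this sketch, not something the post states).
    """
    if strand == "+":
        # Plus strand: transcription begins at the leftmost base.
        return (chrom, tx_start, tx_start + 1)
    if strand == "-":
        # Minus strand: transcription begins at the rightmost base.
        return (chrom, tx_end - 1, tx_end)
    raise ValueError(f"unexpected strand: {strand!r}")


if __name__ == "__main__":
    # Hypothetical records, not taken from any real annotation.
    for rec in [("chr1", "+", 1000, 5000), ("chr1", "-", 1000, 5000)]:
        print(*tss_interval(*rec), sep="\t")
```

Keeping the strand test in one small function makes it easy to swap in a wider window if a promoter-sized region is wanted instead of a single base.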
CTS1346 – YFull YTree Info
Sample ID Country / Language Info Ref File Testing company Statistics Status HGDP01351 China, People’s Republic of O-F3607* —— Hg38 .BAM Scientific 16X, 23.6 Mbp, 151 bp YF079316 —— O-Y224790 —— Hg19 .BAM 23mofang 58X, 21.3 Mbp, 150 bp HG00583 China, People’s Republic of O-Y224790 —— Hg19 .BAM Scientific ——…
A114 – YFull YTree Info
R-A114 – YFull YTree Info SNPs currently defining R-A114 FGC78244 A114(H) H Sample ID Country / Language Info Ref File Testing company Statistics Status YF067576 France (Ille-et-Vilaine) R-A114* —— Hg19 .BAM Dante Labs 12X, 23.0 Mbp, 151 bp YF088360 United States (Virginia) R-CTS4466* —— Hg38 .BAM FTDNA (Y700)…
Sam file is not written
Dear all, the log file shows the following: [08-02 01:26:25] Running Step 2: BWA … bwa_wrap /work/pathology/s206442/dbet_project/hg19/hg19.fa Output3/out_1.valid.fastq 6 Output3/out_1.valid.sam 0 Running BWA on trimmed reads … bwa mem -t 6 /work/pathology/s206442/dbet_project/hg19/hg19.fa Output3/out_1.valid.fastq | samtools view -h -F 2048 - > Output3/out_1.valid.sam However, the sam file size is…
F13864 – YFull YTree Info
Sample ID Country / Language Info Ref File Testing company Statistics Status ERS5240131 Singapore C-F13864* —— Hg19 .BAM Scientific 7X, 22.9 Mbp, 150 bp YF076683 China, People’s Republic of (Shandong) C-F13864* —— Hg19 .BAM 23mofang 57X, 21.2 Mbp, 150 bp YF071813 —— C-F13864* —— Hg19 .BAM 23mofang 21X, 21.8 Mbp,…
L1193 – YFull YTree Info
I-L1193 – YFull YTree Info SNPs currently defining I-L1193 L1193 FGC87558 Y72031 Sample ID Country / Language Info Ref File Testing company Statistics Status ASH1 Ireland (Tipperary) I-L1193* —— Hg19 .BAM Ancient 1X, 10.5 Mbp, 101 bp PB581 Ireland (Clare) I-L1193* —— Hg19 .BAM Ancient 2X, 15.8…
subpopulations available in MafH5.gnomAD.v3.1.1.GRCh38
Are subpopulation MAFs available for gnomAD v3.1.1 with any package, like they are in MafDb.gnomAD.r2.1.hs37d5? I’m trying to use GenomicScores to obtain all variants in a genomic range with MAF in any subpopulation >= cutoff. I tried…
Y18411 – YFull YTree Info
J-Y18411 – YFull YTree Info Sample ID Country / Language Info Ref File Testing company Statistics Status YF072520 Albania J-BY111710 —— Hg19 .BAM Dante Labs 10X, 22.8 Mbp, 151 bp YF067307 Palestine (Nablus) J-BY111710 —— Hg38 .BAM FTDNA (Y700) 34X, 18.7 Mbp, 151 bp NA20827 Italy (Firenze) J-CTS3330 —— Hg19…
M8498 – YFull YTree Info
B-M8498 – YFull YTree Info Sample ID Country / Language Info Ref File Testing company Statistics Status YF004283 Saudi Arabia B-M8498* —— Hg19 .BAM FTDNA (Y500) 43X, 13.7 Mbp, 165 bp HGDP00992 Namibia B-M7650* —— Hg38 .BAM Scientific 18X, 23.5 Mbp, 151 bp YF013963 —— B-Y82361 —— Hg38 .BAM FTDNA…
FGC15109 – YFull YTree Info
I-FGC15109 – YFull YTree Info SNPs currently defining I-FGC15109 FGC15109 Sample ID Country / Language Info Ref File Testing company Statistics Status SZ43 Hungary (Somogy) I-BY138* —— Hg19 .BAM Ancient 8X, 22.8 Mbp, 32 bp YF010533 —— I-BY138* —— Hg19 .BAM FTDNA (Y500) 73X, 14.9 Mbp, 165 bp YF019250…
bedtools -u not giving unique files
These are the steps I’m following. The first step, extracting the sample using a bed file, is this (here the bed file is the input bed file converted to Hg38): tabix -h -R Hg19_to_Hg38_sorted.bed.gz gnomad.genomes.v{g_version}.hgdp_tgp.chr{chr}.vcf.bgz | perl {vcftools} -c {sample_name} > {sample_name}_out.vcf Output ({sample_name}_out.vcf): chr2 113982416 rs56177103 TATAAAATAAAATAAA…
BTG2 gene predicts poor outcome in PT-DLBCL
Introduction Primary testicular diffuse large B-cell lymphoma (PT-DLBCL) is a rare and aggressive form of mature B-cell lymphoma.1–3 PT-DLBCL was the most common type of testicular tumor in men aged over 60 and characterized by painless uni- or bilateral testicular masses with infrequent constitutional symptoms.4–6 PT-DLBCL shows significant extranodal tropism,…
FGC19851 – YFull YTree Info
R-FGC19851 – YFull YTree Info SNPs currently defining R-FGC19851 FGC19851 Sample ID Country / Language Info Ref File Testing company Statistics Status YF072967 United States (Georgia) R-FGC19851* —— Hg38 .BAM FTDNA (Y700) 34X, 18.7 Mbp, 151 bp YF009427 —— R-FGC65264* —— Hg19 .BAM FTDNA (Y500) 38X, 12.8 Mbp, 165…
FGC35106 – YFull YTree Info
Sample ID Country / Language Info Ref File Testing company Statistics Status YF016938 Saudi Arabia (Ar Riyāḍ) J-FGC35106 YF081770 | J-FGC35106*, J-FGC58682* Hg38 .BAM FTDNA (Y500) 30X, 11.5 Mbp, 151 bp YF016937 Saudi Arabia (Ar Riyāḍ) J-FGC35106 YF081769 | J-FGC35106*, J-FGC58682* Hg38 .BAM FTDNA (Y500) 37X, 12.5 Mbp, 151 bp…
YP4024 – YFull YTree Info
Sample ID Country / Language Info Ref File Testing company Statistics Status ERS2478532 Turkmenistan Q-YP4024* —— Hg19 .BAM Scientific 17X, 16.7 Mbp, 151 bp YF006625 Russia (Tomskaya oblast’) / Selkup Q-YP4024* —— Hg19 .BAM FTDNA (Y500) 67X, 14.8 Mbp, 165 bp DA162 Russia (Severnaya Osetiya-Alaniya, Respublika) Q-BZ5214* —— Hg19 .BAM…
Y570 – YFull YTree Info
Sample ID Country / Language Info Ref File Testing company Statistics Status AF2 —— Q-Y570 Q-Y570*, Q-F746* Hg19 .BAM Ancient 1X, 1.3 Mbp, 94 bp YF093124 —— Q-M120* —— Hg38 .BAM Nebula Genomics 57X, 23.6 Mbp, 150 bp Kolyma1 Russia (Sakha, Respublika [Yakutiya]) Q-Y222276* —— Hg19 .BAM Ancient 7X, 13.4…
Use the TCGAbiolinks package to download TCGA data
For downloading TCGA data, the RTCGA package is probably the easier one to use, and because it ships data that has already been downloaded, it is relatively stable. For the same reason, however, there is no guarantee that the data is up to date. The TCGAbiolinks package is…
Bioconductor – r3Cseq
This package is for version 3.3 of Bioconductor; for the stable, up-to-date release version, see r3Cseq. Analysis of Chromosome Conformation Capture and Next-generation Sequencing (3C-seq) Bioconductor version: 3.3 This package is an implementation of data analysis for the long-range interactions from 3C-seq assay. Author: Supat Thongjuea, MRC Molecular…
PF6747 – YFull YTree Info
E-PF6747 – YFull YTree Info Sample ID Country / Language Info Ref File Testing company Statistics Status YF010216 Azerbaijan (Qəbələ) E-PF6747* —— Hg19 .BAM FTDNA (Y500) 50X, 13.7 Mbp, 165 bp YF064736 Egypt (Al Minūfīyah) E-FT97857* —— Hg38 .BAM FTDNA (Y700) 35X, 18.5 Mbp, 151 bp YF093064 Yemen (Tā’izz) E-Y280593…
Z2039 – YFull YTree Info
Sample ID Country / Language Info Ref File Testing company Statistics Status YF003382 Finland (Länsi-Suomen lääni) I-Z2040* —— Hg19 .BAM FTDNA (Y500) 47X, 13.3 Mbp, 165 bp YF067917 Ireland I-FGC69701* —— Hg19 .BAM Dante Labs 9X, 22.9 Mbp, 151 bp YF078735 Belarus (Vicebskaja voblasc’) / Polish I-FGC69702 —— Hg38 .VCF…
Extract longest transcript or longest CDS transcript from GTF annotation file or gencode transcripts fasta file.
There are four types of methods to extract the longest transcript, or the longest CDS region with the longest transcript, from a transcripts fasta file or GTF file. 1. Extract longest transcript from gencode transcripts fasta file. 2. Extract longest transcript from gtf format annotation file based on gencode/ensembl/ucsc database. 3. Extract longest CDS region with longest…
BY7447 – YFull YTree Info
E-BY7447 – YFull YTree Info SNPs currently defining E-BY7447 BY7447 Sample ID Country / Language Info Ref File Testing company Statistics Status YF075635 Yemen (Al Bayḑā’) E-FT183181 —— Hg38 .BAM FTDNA (Y700) 39X, 18.2 Mbp, 151 bp YF067501 Yemen (Şan’ā’) E-FT183181 —— Hg38 .BAM FTDNA (Y700) 44X, 18.8 Mbp,…
can not upload GTF file to UCSC genomebrowser
We are unable to reproduce the error you are seeing and we also recently experienced temporary issues with our site. Please let us know if you are still having this problem. Post by Gang Wei: Dear manager of UCSC Genome Browser, Glad to write to you. I’m now using UCSC genome browser to check…
DF109 – YFull YTree Info
Sample ID Country / Language Info Ref File Testing company Statistics Status YF016926 Ireland R-DF109 R-DF109*, R-A18726* Hg38 .BAM FTDNA (Y500) 27X, 12.7 Mbp, 165 bp YF016394 United States (Ohio) R-DF109 R-DF109*, R-A18726* Hg38 .BAM FTDNA (Y500) 34X, 11.9 Mbp, 151 bp YF011566 Ireland (Mayo) R-DF109 R-DF109*, R-A18726*, R-FGC23742* Hg38…
ZP77 – YFull YTree Info
R-ZP77 – YFull YTree Info SNPs currently defining R-ZP77 ZP77 / FGC6562 Sample ID Country / Language Info Ref File Testing company Statistics Status YF008362 —— R-ZP77* —— Hg19 .BAM FTDNA (Y500) 41X, 13.8 Mbp, 165 bp YF067652 Unknown R-BY40744 —— Hg38 .BAM FTDNA (Y700) 36X, 18.7 Mbp, 151…
Convert DNAStringSet to a list of elements in R? (Error in seq[[1]][[“seq”]] : subscript out of bounds in R)
I have a bed file which contains DNA sequence information as follows: track name="194" description="194 methylation (sites)" color=0,60,120 useScore=1 chr1 15864 15866 FALSE 894 + chr1 534241 534243 FALSE 921 - chr1 710096 710098 FALSE 729 + chr1 714176 714178 FALSE 12 - chr1 720864 720866 FALSE 988 -…
Download full list of SNPs and their coordinates in hg38
What is the best / standard place to get a full list of SNPs and their coordinates in hg38? I downloaded the SNPsnap database, but just realized that those coordinates are in hg19. I’m trying to figure out how…
Bioconductor – RiboCrypt
DOI: 10.18129/B9.bioc.RiboCrypt Interactive visualization in genomics Bioconductor version: Release (3.14) R Package for interactive visualization and browsing NGS data. It contains a browser for both transcript and genomic coordinate view. In addition a QC and general metaplots are included, among others differential translation plots and gene expression plots….
BWA on multiple processors
Hi Guys, When I try to run bwa mem on multiple processors, I get the following error: > mpirun -np 16 bwa mem hg19-agilent.fasta R1.fastq R2.fastq | samtools sort -o aln.bam [M::bwa_idx_load_from_disk] read 0 ALT contigs [M::bwa_idx_load_from_disk] read 0 ALT contigs [M::bwa_idx_load_from_disk] read 0 ALT contigs [M::bwa_idx_load_from_disk] read…
Bioconductor – derfinder (development version)
DOI: 10.18129/B9.bioc.derfinder This is the development version of derfinder; for the stable release version, see derfinder. Annotation-agnostic differential expression analysis of RNA-seq data at base-pair resolution via the DER Finder approach Bioconductor version: Development (3.15) This package provides functions for annotation-agnostic differential expression analysis of RNA-seq data. Two…
Alignment report
Hi Guys, I aligned R1 and R2 fastq files to the reference genome using bwa mem and got a bam file. Now, I want to check whether the alignment was done correctly, along with the alignment percentage, coverage, etc. I ran the following command: bwa mem hg19.fasta R1.fastq R2.fastq | samtools…
Bioconductor – ChIPQC
This package is for version 3.1 of Bioconductor; for the stable, up-to-date release version, see ChIPQC. Quality metrics for ChIPseq data Bioconductor version: 3.1 Quality metrics for ChIPseq data Author: Tom Carroll, Wei Liu, Ines de Santiago, Rory Stark Maintainer: Tom Carroll <tc.infomatics at gmail.com>, Rory Stark <rory.stark…
identical(current_classes, .UCSC_TXCOL2CLASS) is not TRUE
Hi, the GenomicFeatures::makeTxDbFromUCSC function fails with: library(GenomicFeatures) > hg19.refseq.db <- makeTxDbFromUCSC(genome="hg19", table="refGene") Download the refGene table … Error in .fetch_UCSC_txtable(genome(session), tablename, transcript_ids = transcript_ids) : identical(current_classes, .UCSC_TXCOL2CLASS) is not TRUE OK The…
QIAGEN Bioinformatics Manuals
The Reference Data Manager The QIAGEN Sets Reference Data Library tab gives access to the reference data used with the CLC Haplotype Calling plugin ready-to-use workflow. From the wizard you can download and configure the reference data. For the full documentation relating to QIAGEN Sets, please see the QIAGEN Sets…
help with CrossMap
Hello all, I would really appreciate your help as I am new to working with different file builds and am having a setback lifting a VCF file from build hg38 to hg19. In essence, using CrossMap the chromosome value gets altered. For example, below is the…
From where to get a comprehensive list of genes with gene start, gene end and chromosome for build 37?
Hi all, I am trying to annotate a list of genes with gene start, gene end (build 37) and chromosome. I mapped most of the genes from a list downloaded from Biomart/UCSC,…
Bioconductor – ProteoDisco
DOI: 10.18129/B9.bioc.ProteoDisco Generation of customized protein variant databases from genomic variants, splice-junctions and manual sequences Bioconductor version: Release (3.14) ProteoDisco is an R package to facilitate proteogenomics studies. It houses functions to create customized (mutant) protein databases based on user-submitted genomic variants, splice-junctions, fusion genes and manual transcript…
How to convert bedgraph file with bins into GRanges object?
You could convert your bedGraph bins from hg18 to hg19 using liftover, so you can overlap them with your peaks. You would read them into a GRanges object, then hand this to the liftover function to translate from hg18 to hg19, then unlist the results to get back a regular…
Why single cell R2 fastq have no read identified by bowtie2 ?
When we input R2 fastq.gz into bowtie2, human sequence was not removed ( ${base}_host_removed is zero). for i in $(find ./ -type f -name “.fastq.gz” | while read F; do basename $F | rev | cut -c…
Generating Multiple Species Alignment Of Novel Transcripts For Phylocsf
Short version: How would you go about generating multiple species alignments of novel transcripts from bos taurus (assembly UMD3.1) with human/mouse/dog for use with PhyloCSF? Context and what I’ve tried so far: Through a sequencing experiment, our lab has identified a large set of new transcripts in Bos taurus. We…
How to call LOH with FreeC
Good morning, I am trying to infer loss of heterozygosity (LOH) from WGS data using FreeC. For this purpose, I am using these parameters in the "[BAF]" section of the configuration file: [BAF] makePileup = My_somaticVCF.vcf.gz fastaFile = hg19.fa SNPfile = hg19_snp142.SingleDiNucl.1based.txt.gz When…
Why all those reads in bam files were unmapped?
ftp.sra.ebi.ac.uk/vol1/run/ERR358/ERR3580717/scrEXT030_hg19_S11_L001.bam ftp.sra.ebi.ac.uk/vol1/run/ERR358/ERR3580718/scrEXT030_hg19_S11_L002.bam ftp.sra.ebi.ac.uk/vol1/run/ERR358/ERR3580719/scrEXT030_hg19_S11_L003.bam ftp.sra.ebi.ac.uk/vol1/run/ERR358/ERR3580720/scrEXT030_hg19_S11_L004.bam ftp.sra.ebi.ac.uk/vol1/run/ERR358/ERR3580721/scrEXT030_hg19_S11_L005.bam ftp.sra.ebi.ac.uk/vol1/run/ERR358/ERR3580722/scrEXT030_hg19_S11_L006.bam For those bam files, samtools reports all reads as unmapped. When I checked the bam files, I found that the flag of every read was 4. samtools view -f 4 scrEXT030_hg19_S15_L002.bam | cut -f1 >…
What is the single nucleotide polymorphism database ( dbsnp )?
The Single Nucleotide Polymorphism Database (dbSNP) is a free public archive for genetic variation within and across different species developed and hosted by the National Center for Biotechnology Information (NCBI) in collaboration with the National Human Genome Research Institute (NHGRI). Furthermore, are there any databases for single nucleotide polymorphisms? As there…
SNP2TFBS
SNP2TFBS Viewing variants that affect TF binding – Results – SNP identifier Chrom id (Feb 2009 GRCh37/hg19) SNP position NB. of TF factors rs1800629 dbSNP NC_000006.11 (chr6) 31543031 1 TF name PWM score on Ref PWM score on Alt Score difference Low Score Thr High Score Thr MZF1_1-4 1024 ….
Bioconductor – Rariant
This package is for version 3.0 of Bioconductor; for the stable, up-to-date release version, see Rariant. Identification and Assessment of Single Nucleotide Variants through Shifts in Non-Consensus Base Call Frequencies Bioconductor version: 3.0 The ‘Rariant’ package identifies single nucleotide variants from sequencing data based on the difference of…
Bioconductor – BSgenome.Hsapiens.UCSC.hg19
This package is for version 3.2 of Bioconductor; for the stable, up-to-date release version, see BSgenome.Hsapiens.UCSC.hg19. Full genome sequences for Homo sapiens (UCSC version hg19) Bioconductor version: 3.2 Full genome sequences for Homo sapiens (Human) as provided by UCSC (hg19, Feb. 2009) and stored in Biostrings objects. Author:…
‘Deprecated’ Error with ngs.plot.r after sys admin update Bioconductor
Loading R libraries…..Done Configuring variables… Using database: /home/yensin/software/ngsplot/database/hg19/hg19.ensembl.genebody.protein_coding.RData Done Analyze bam files and calculate coverage Warning message: ‘isNotPrimaryRead’ is deprecated. Use ‘isSecondaryAlignment’ instead. See help(“Deprecated”) …Done Plotting figures…Error in seq.default(min.e, max.e, length.out = ncolor + 1) : ‘from’ cannot be NA, NaN or infinite Calls: plotheat -> ColorBreaks -> seq ->…
Best tools for calling structural variants from 2 assemblies?
Dear community, I have the fasta files of 2 assemblies of the human genome (for example hg19 and hg38). What would be the best tools to call structural variants from these 2 fasta files? Most of the tools I know…
How can I find reads for specific elements in a bam file?
Hi, I have a specific set of 1,009 elements in a bed file that I am interested in. I also have bam files which I would like to process to know the number of reads for these specific elements (for comparison purposes). I understand some simple uses of samtools commands,…
difference between treat_pileup and bdgcmp fold enrichment tracks macs2
Hello, I created a bigwig file from a treat_pileup.bdg file generated by macs2 and also used treat_pileup.bdg and control_lambda.bdg with macs2 bdgcmp. Here is my code: macs2 callpeak -t sample.bam -c sample_input.bam -g hs -f BAM -q 0.001 --bdg --outdir /folder…
Convert UCSC isoform ID to Ensembl transcript ID
Hello everyone, I have a few UCSC isoform IDs and I would like to convert them to the corresponding Ensembl transcript IDs. I have tried to use some online conversion tools (such as DAVID), looked up the UCSC annotation files, but…
Gene coordinates for hg19
Hi, is there a list which gives for each gene its starting coordinate (chr:pos) and its ending one with respect to the hg19 reference genome? I have a list of positions on hg19 expressed as chr:pos and I have to assign each one to the…
Bioconductor – FunciSNP
DOI: 10.18129/B9.bioc.FunciSNP This package is for version 3.11 of Bioconductor; for the stable, up-to-date release version, see FunciSNP. Integrating Functional Non-coding Datasets with Genetic Association Studies to Identify Candidate Regulatory SNPs Bioconductor version: 3.11 FunciSNP integrates information from GWAS, 1000genomes and chromatin feature to identify functional SNP in…
Why may BOLT-LMM and SAIGE (quantitative, linear-mixed model) yield different results when run on exactly the same dataset?
As a validation experiment, I have run the same GWAS of a quantitative phenotype derived from the UKBiobank, alongside the genomic data from the UKBiobank, once using the program BOLT-LMM and once using SAIGE linear mixed model (with selected quantitative trait tag). I wanted to see if the results would…
Alternate nucleotide is more frequent than reference nucleotide. OMG I’m dizzy. How do I stop the twirl?
This is due to the fact that the very reference genomes that we use for re-alignment are themselves based on individuals who carry rare risk alleles. Thus, when we call variants against these genomes, we are, at many loci, comparing against rare disease risk alleles. As the best/worst example (depending…
Bioconductor – ChIPComp
This package is for version 3.4 of Bioconductor; for the stable, up-to-date release version, see ChIPComp. Quantitative comparison of multiple ChIP-seq datasets Bioconductor version: 3.4 ChIPComp detects differentially bound sharp binding sites across multiple conditions considering matching control. Author: Hao Wu, Li Chen, Zhaohui S.Qin, Chi Wang Maintainer:…
Exon coordinates and sequence
I did it like that: 1- Download refGene.txt.gz and hg19.fasta from the UCSC goldenpath. ( note: convert hg19.2bit to hg19.fa using twoBitToFa ) 2- Create a bed file with exon coordinates using my awk script // to_transcript.awk BEGIN { OFS ="\t" } { name=$2 name2=$13 sens = $4 =="+" ?…
UCSC Gene Table Exon Frames Generating Stop Codons
Hi, I’m using UCSC gene tables, and I am running into trouble with interpreting exon frames. In some cases, using the exon frame from the tables creates stop codons, which shouldn’t be happening in coding regions. As an example, from the hg19 gene NM_001369291 on chromosome 22, I have this…
Answer: Highly mapped to introns
I think your problem is that your bed file doesn’t match the genome/gtf you used. I think it’s too old. My $gtf is the version 104 one like yours. zcat hg19_Ensembl_gene.bed.gz | head chr1 66999065 67210057 ENST00000237247 0 + 67000041 67208778 0 27 25,123,64,25,84,57,55,176,12,12,25,52,86,93,75,501,81,128,127,60,112,156,133,203,65,165,1302, 0,863,92464,99687,100697,106394,109427,110161,127130,134147,137612,138561,139898,143621,146295,148486,150724,155765,156807,162051,185911,195881,200365,205952,207275,207889,209690, grep ENST00000237247 $gtf 1 havana…
Converting between UCSC id and gene symbol with bioconductor annotation resources
You need to use the Homo.sapiens package to make that mapping. > library(Homo.sapiens) Loading required package: AnnotationDbi Loading required package: stats4 Loading required package: BiocGenerics Loading required package: parallel Attaching package: ‘BiocGenerics’ The following objects are masked from ‘package:parallel’: clusterApply, clusterApplyLB, clusterCall, clusterEvalQ, clusterExport, clusterMap, parApply, parCapply, parLapply, parLapplyLB, parRapply,…
Highly mapped to introns
Hi, I am analyzing RNA-seq data from human blood samples. I checked the read distribution using RSeQC read_distribution after mapping with STAR. Usually, I get more than 80% of reads mapped to exons. This time, however, the result showed only a few percent were mapped…
High tumor mutation burden and DNA repair gene mutations
Introduction Anaplastic lymphoma kinase (ALK)‑fusion genes represent a small but important part of oncogenic driver mutations in NSCLC, accounting for approximately 3%‑7% of all cases worldwide.1,2 Small molecule tyrosine kinase inhibitors (TKIs) are the standard therapy for ALK-rearranged NSCLC. Crizotinib, a first-generation TKI, is the most widely used targeted drug…
Produce PCA bi-plot for 1000 Genomes Phase III
Note1 – Previous version: Produce PCA bi-plot for 1000 Genomes Phase III in VCF format (old) Note2 – this data is for hg19 / GRCh37 Note3 – GRCh38 data is available HERE The tutorial has been updated based on the 1000 Genomes Phase III imputed genotypes. The original tutorial was…
Bioconductor – ramr
DOI: 10.18129/B9.bioc.ramr Detection of Rare Aberrantly Methylated Regions in Array and NGS Data Bioconductor version: Release (3.13) ramr is an R package for detection of low-frequency aberrant methylation events in large data sets obtained by methylation profiling using array or high-throughput bisulfite sequencing. In addition, package provides functions…
Bioconductor – methylationArrayAnalysis
DOI: 10.18129/B9.bioc.methylationArrayAnalysis This package is for version 3.11 of Bioconductor; for the stable, up-to-date release version, see methylationArrayAnalysis. A cross-package Bioconductor workflow for analysing methylation array data. Bioconductor version: 3.11 Methylation in the human genome is known to be associated with development and disease. The Illumina Infinium methylation…
tabix for ID column
Hello, I’m looking for something similar to tabix. But instead of looking for information within a given region, I would like to use the values in the ID column for quick lookup. So for example I would like to take the compressed dbSNP file, index…
MAPQ (Mapping quality) of 0 for most reads from BWA-MEM2 (with no secondary alignment or other apparent reason)
Hello, I got a very weird output from BWA-mem2 – most of the reads have mapping quality of 0, even though there is no secondary alignment or anything else suspicious. I got sequencing data that was aligned with Novoalign to hg18, the data was bam files. I needed to realign…
Bioconductor – wateRmelon
DOI: 10.18129/B9.bioc.wateRmelon This package is for version 3.11 of Bioconductor; for the stable, up-to-date release version, see wateRmelon. Illumina 450 methylation array normalization and metrics Bioconductor version: 3.11 15 flavours of betas and three performance metrics, with methods for objects produced by methylumi and minfi packages. Author: Leonard…
getting different value list from GATK gc content and CANOES
I was trying to run the code from the paper “A machine-learning approach for accurate detection of copy-number variants from exome sequencing”. I need to get data from GATK GC content and CANOES and combine them, but I got a…
UCSC knownCanonical hg19 vs. hg38
Hello, We have an FAQ page that covers this topic (genome.ucsc.edu/FAQ/FAQgenes.html#singledownload). As posted by ATpoint, it boils down to different datasets and different approaches. hg19 knownCanonical was last updated in 2013 and built primarily from RefSeq and GenBank sequences and a few other sources. One isoform was identified from each…
Get rsID for a list of SNPs in an entire GWAS sumstats file
Here is a fairly efficient way to do this, assuming hg38 and that BEDOPS and standard Unix tools are installed. $ bedmap --echo --echo-map-id --delim '\t' <(awk '{n=split($0,a,/[:_]/); print "chr"a[1]"\t"a[2]"\t"a[2]+1"\t"a[3]"/"a[4];}' sumstats.txt | sort-bed -) <(wget -qO- hgdownload.cse.ucsc.edu/goldenPath/hg38/database/snp150.txt.gz | gunzip -c | cut -f2-5 | sort-bed -) > answer.bed This gets around making…
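The awk step in that pipeline splits each variant ID on ":" and "_" and emits a BED-like row. A minimal Python sketch of the same conversion follows, assuming IDs of the form chrom:pos_ref_alt (for example "1:12345_A_G"); that ID layout and the function name `variant_id_to_bed` are assumptions made for illustration, not something the excerpt spells out.

```python
import re


def variant_id_to_bed(variant_id):
    """Convert an ID like '1:12345_A_G' to (chrom, start, end, 'ref/alt').

    Mirrors the awk step: split on ':' or '_', prefix 'chr', and use
    pos and pos+1 as the interval bounds.
    """
    fields = re.split(r"[:_]", variant_id.strip())
    if len(fields) < 4:
        raise ValueError(f"cannot parse variant ID: {variant_id!r}")
    chrom, pos, ref, alt = fields[:4]
    start = int(pos)
    return (f"chr{chrom}", start, start + 1, f"{ref}/{alt}")


if __name__ == "__main__":
    # Hypothetical ID used only to show the output shape.
    print(*variant_id_to_bed("1:12345_A_G"), sep="\t")
```

The rows would still need sorting before being handed to a tool such as bedmap, which the original pipeline does with sort-bed.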
UCSC liftover
Hi, I’m using UCSC liftover to convert hg19 to hg38. I got a result that I don’t understand. Feb. 2009 (GRCh37/hg19) → Dec. 2013 (GRCh38/hg38) – chr1:120904787 → chr1:143905854 Dec. 2013 (GRCh38/hg38) → Feb. 2009 (GRCh37/hg19) – chr1:143905854 → chr1:149400430 (I didn’t check “Allow multiple output regions”.)…
Paired-end reads reported without mates: how to play matchmaker?
Hi Everyone, I am currently looking at Acute Myeloid Leukemia (AML) paired-end WGS samples from the TARGET data ocg.cancer.gov/programs/target/target-methods#3241. A bioinformatician in our group remapped the samples from hg19 to hg38. Unfortunately, we do not have any copies of the hg19 version anymore. However, when I try to run anything…
Separate vcf file creation for matched tumor-normal samples
I have received 8 matched normal tumor vcf files from our collaborators. For some reason, they didn’t provide the sequence bam files and called the variants themselves (by aligning with the reference hg19 genome for both pairs separately). Basically, I have…
Missense Variant on hg19
Hello everybody, I am using plink to run some statistical analyses on a SNP set. I would like to use only missense variants, and I have the IDs of my SNPs of interest. Can someone suggest how I can download a database of homo…
karyoploteR: uncircle your genomes
Hi all, I’d like to present karyoploteR, an R/Bioconductor package we have developed to plot any data on any genome in non-circular layouts. The goal of this project was to develop a tool as flexible as Circos, but easier to use and representing genomes as straight lines instead of circles,…
Aligning Multiple paired end files together
Hi All, I have 72 paired-end .fastq files that I need to align using BWA. Since it’s paired-end data, my files are named sam_001_1.fastq sam_001_2.fastq sam_002_1.fastq sam_002_2.fastq and so on. Since it’s paired-end data…
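One way to approach the pairing, as a rough sketch: derive each mate's name from the `_1.fastq` file and print one `bwa mem` command per sample. The function name `pair_fastq_cmds`, the reference name `hg19.fa` and the `.sam` output naming are illustrative assumptions, not taken from the original post. The sketch only prints the commands, so they can be inspected before anything is run.

```shell
# Hypothetical helper: print one "bwa mem" command per read pair (dry run).
# Usage: pair_fastq_cmds <reference.fa> <fastq files...>
pair_fastq_cmds() {
    ref=$1
    shift
    for r1 in "$@"; do
        # Only start from the first mate; skip *_2.fastq and anything else.
        case $r1 in
            *_1.fastq) ;;
            *) continue ;;
        esac
        sample=${r1%_1.fastq}      # e.g. sam_001
        r2=${sample}_2.fastq       # assumed name of the second mate
        printf 'bwa mem %s %s %s > %s.sam\n' "$ref" "$r1" "$r2" "$sample"
    done
}
```

For example, `pair_fastq_cmds hg19.fa sam_*_1.fastq` would list the commands for all samples; piping that output to `sh` would run them, assuming the reference has already been indexed with `bwa index`.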
vcf file analysis
Hello everyone, I have 22 vcf files, one for each chromosome. They were in genome build hg19, so I did a liftover and converted them to genome build hg38. Now I need just the chrom and position values from these vcf files and want to merge them together into a…
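A minimal sketch of the extraction step, assuming uncompressed, TAB-delimited VCF files: skip every header line (those starting with `#`) and keep the first two columns, CHROM and POS. The function name `vcf_chrom_pos` and the file names in the usage note are hypothetical.

```shell
# Hypothetical helper: print CHROM and POS (TAB-separated) for every record
# in the given VCF files (or stdin), skipping header lines that start with '#'.
vcf_chrom_pos() {
    awk -F'\t' -v OFS='\t' '!/^#/ { print $1, $2 }' "$@"
}
```

Something like `vcf_chrom_pos chr*.hg38.vcf > all_positions.tsv` would then merge all files into one table. Note that the shell expands the glob in lexical order (chr1, chr10, chr11, ...), so the result may need sorting afterwards, and gzipped files would first need to go through `gunzip -c`.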
Gene mutation analysis in papillary thyroid carcinoma
Introduction Thyroid tumors are the most common malignant tumors of the endocrine system, and their incidence has been increasing in recent decades. Currently, there are some targeted drugs that can effectively treat PTC, and next-generation sequencing (NGS) can be used to guide targeted therapy. In order to make better informed…
How to separate names using awk
I have a file like this: “”” qboundary.0|hg19|chr10:1080001-1280001 boundary.2|hg19|chr10:3040001-3240001 boundary.4|hg19|chr10:4760001-4960001 “”” How can I quickly use awk to make it look like this (separated by TAB): “”” chr10 1080001 1280001 chr10 3040001 3240001 chr10 4760001 4960001 “””
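One possible approach, sketched here under the assumption that every line has the form `name|build|chrom:start-end`, is to let awk treat `|`, `:` and `-` all as field separators and print fields 3 to 5. The function name `split_boundaries` is made up for illustration.

```shell
# Hypothetical helper: turn "name|build|chrom:start-end" lines into
# TAB-separated "chrom start end" lines.
split_boundaries() {
    # -F'[|:-]' splits on any of '|', ':' or '-', so the fields are:
    # $1=name  $2=build  $3=chrom  $4=start  $5=end
    awk -F'[|:-]' -v OFS='\t' '{ print $3, $4, $5 }' "$@"
}
```

It can be called as `split_boundaries input.txt > output.tsv`. This would break if the name or the chromosome label itself contained `-` or `:`, so it is only a sketch for input shaped like the example.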
Get Rs Number Based On Position (6 million SNPs)
I know this question has sort of been asked before, but I need to know which method would be the most efficient way to get the Rs numbers based on position (hg19). I’ve considered looping through two files, the .txt file…
gatk, ref and alt percentages
Hello everyone, I need some information on how to get the percentages of REF and ALT nucleotides in my data. I am using gatk and am currently not getting REF and ALT percentages. The command I am using for the gatk vcf file…
Bowtie2 hg19 reference for gatk MuTect
Hello, I understand that the suggested aligner to use with GATK is bwa. If I want to use Bowtie2 as the aligner, which reference file should I be using? The reference in GATK bundle (Homo_sapiens_assembly19.fasta) does not seem to work with Bowtie2 and…
How do I lift over a Plink bim file from Hg18 to Hg19?
I’ve got some very old SNP data from Data Dryad. The BIM files use coordinates from Hg18, but my dataset uses coordinates from Hg19. I was wondering if anyone knows how to liftover coordinates in a…
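As a hedged sketch of a first step: liftOver works on BED intervals, so each BIM row (chromosome, variant ID, cM, 1-based position, two alleles) can be turned into a 0-based, half-open BED row that carries the variant ID in column 4. The function name `bim_to_bed` is hypothetical, and the mapping of PLINK's numeric codes 23, 24 and 26 to X, Y and M is an assumption about how the data were encoded; other codes are passed through with a `chr` prefix.

```shell
# Hypothetical helper: convert PLINK .bim rows to BED rows for liftOver.
# BIM columns: chrom, variant ID, cM, bp position (1-based), allele1, allele2.
bim_to_bed() {
    awk -v OFS='\t' '
        # Assumed PLINK numeric chromosome codes -> UCSC-style names.
        BEGIN { map["23"] = "X"; map["24"] = "Y"; map["26"] = "M" }
        {
            chrom = ($1 in map) ? map[$1] : $1
            # BED is 0-based half-open: a SNP at position p spans [p-1, p).
            print "chr" chrom, $4 - 1, $4, $2
        }' "$@"
}
```

The output of `bim_to_bed data.bim > data.hg18.bed` could then be passed to the UCSC `liftOver` tool with an hg18-to-hg19 chain file, and the lifted positions written back into the dataset, for instance with PLINK's `--update-map`. Variants that fail to lift would need to be dropped first.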
liftover using genome browser
Hello everyone, I have a file in the hg38 build. I want to do a liftover and change it to hg19. I thought of using the liftover tool from the UCSC genome browser. I realise that the input file should be in BED format. My file has only…
Pericentromeric noncoding RNA changes DNA binding of CTCF and inflammatory gene expression in senescence and cancer
Significance During the aging process, senescent cells secrete inflammatory factors, causing various age-related pathologies. Thus, controlling the senescence-associated secretory phenotype (SASP) can tremendously benefit human health. Although SASP seems to be induced by the alteration of chromosomal organization, its underlying mechanism remains unclear. Here, it has been revealed that noncoding…
BSgenomes for HIV viruses
Dear Biostars users, I wonder if there are BSgenomes available for HIV viruses? I am trying to identify clusters from CLIP-seq data mapping to the HIV genome with wavClusteR. I am stuck at one step, as below: `require(BSgenome.Hsapiens.UCSC.hg19) wavclusters <- filterClusters( clusters = clusters, highConfSub =…
Calling variants on reads with MAPQ=0 on HaplotypeCaller or bcftools mpileup
I am working with about 500 samples of human exome data. I used hg19 to align my reads and ran a standard best-practices GATK workflow. Only later did I realise that a small 1 Mb locus had not mapped properly due…
What is the difference between GRCh37 and hs37? And hg19?
This is what I have found so far. Please correct me if I am wrong. GRCh37 w/o patches includes the primary assembly (22 autosomal, X, Y, and non-chromosomal supercontigs) and alternate scaffolds, but not a reference mitogenome. Non-chromosomal supercontigs are the unlocalized and unplaced scaffolds. The rCRS reference mitogenome in…
Non-repeat human genome dataset
Could anyone please point me to where I could find a dataset of non-repeat sequences for the human ref genome? I’m not sure if it’s still regarded as true, but I saw that possibly 2/3 of the human genome contains repeats. Is there a place…
extract entire header from BED file to FASTA
Hi, is there any way to extract the entire header from a BED file when using the bedtools getfasta command and write it to the FASTA output? I have tried using bedtools getfasta -fi hg19.fa -bed file.bed -fo test.fasta -fullHeader but…