Tag: hg38

Index of /goldenPath/macEug2/vsHg38/reciprocalBest

This directory contains reciprocal-best netted chains for macEug2-hg38.
- macEug2.hg38.rbest.net.gz: macEug2-referenced recip.best net to hg38.
- macEug2.hg38.rbest.chain.gz: chains extracted from the recip.best net. These can be passed to the liftOver program to translate coords from macEug2 to hg38 through the recip.best net.
- hg38.macEug2.rbest.net.gz: hg38-referenced recip.best net.
- hg38.macEug2.rbest.chain.gz: recip.best…
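As a rough sketch of how one of these chain files might be handed to liftOver, the snippet below assembles the command line in Python. The argument order (input file, chain file, output file, unmapped file) follows liftOver's usage message; the BED file names are hypothetical, and only the chain file name comes from the listing above.

```python
def build_liftover_command(bed_in, chain, bed_out, unmapped):
    """Return the argv list in liftOver's order: oldFile map.chain newFile unMapped."""
    return ["liftOver", bed_in, chain, bed_out, unmapped]


# Hypothetical BED file names; the chain file is the macEug2 -> hg38 one listed above.
command = build_liftover_command(
    "regions.macEug2.bed",
    "macEug2.hg38.rbest.chain.gz",
    "regions.hg38.bed",
    "regions.unmapped.bed",
)
print(" ".join(command))

# To actually run it (requires the liftOver binary on PATH):
#   import subprocess
#   subprocess.run(command, check=True)
```

Records that cannot be mapped through the reciprocal-best net end up in the last file, so it is worth checking its size after a run.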

YP5260 – YFull YTree Info

Sample ID Country / Language Info Ref File Testing company Statistics Status I7021 Mongolia (Bulgan) C-F15910 C-F15910*, C-Y507 Hg19 .BAM Ancient 3X, 20.2 Mbp, 40 bp NEO249 Russia (Chukotskiy avtonomnyy okrug) C-F15910* —— Hg19 .BAM Ancient 1X, 7.2 Mbp, 81 bp I11696 Mongolia (Bulgan) C-Y507 —— Hg19 .BAM Ancient 2X,…

08 Comparing visualization results of different annotation software

In the first two sections, we compared the use of different VCF annotation software and converted the annotated results into MAF format. Because SnpEff annotation results cannot be converted to MAF, we will next compare the results of ANNOVAR, VEP, and GATK Funcotator…

BY3 – YFull YTree Info

J-BY3 – YFull YTree Info SNPs currently defining J-BY3 BY3 / FGC15184     Sample ID Country / Language Info Ref File Testing company Statistics Status YF016315 —— J-FGC15174 J-FGC15174*, J-FGC15168*, J-FT258574 Hg38 .BAM FTDNA (Y500) 23X, 12.0 Mbp, 151 bp YF068400 Sudan (Janūb Kurdufān) J-FGC38453* —— Hg38 .BAM FTDNA (Y700)…

Allelic expression imbalance of PIK3CA mutations is frequent in breast cancer and prognostically significant

Subjects Normal breast and tumor samples were obtained with the written informed consent from donors and appropriate approval from local ethical committees, with the detailed information described in the respective original publications: normal tissue9, METABRIC14, TCGA35. Differential allelic expression analysis DNA and total RNA from 64 samples of normal breast…

YP3952 – YFull YTree Info

Q-YP3952 – YFull YTree Info Sample ID Country / Language Info Ref File Testing company Statistics Status YF073154 Russia (Chechenskaya Respublika) / Chechen Q-YP3952* —— Hg38 .BAM FTDNA (Y700) 33X, 18.2 Mbp, 151 bp YF092378 Russia (Chechenskaya Respublika) / Chechen Q-BZ87 —— Hg38 .BAM FTDNA (Y700) 55X, 18.5 Mbp, 151…

GeneActivity without Fragments file in Seurat for Integrating scRNA-seq and scATAC-seq

Hi all, I am new to R and Seurat, and I am following Seurat tutorials to find anchors between RNA-seq and ATAC-seq data according to: Combining the two tutorials is difficult for a cell line data set I am using for SNARE-seq Human here. I managed to run the following…

Variant #0000255165 (NC_000010.10:g.123278248A>G, FGFR2(NM_000141.4):c.939+1245T>C) – Global Variome shared LOVD

Variant #0000255165 (NC_000010.10:g.123278248A>G, FGFR2(NM_000141.4):c.939+1245T>C) Chromosome 10 Allele Unknown Affects function (as reported) Probably does not affect function Affects function (by curator) Not classified Classification method – Clinical classification likely benign DNA change (genomic) (Relative to hg19 / GRCh37) g.123278248A>G DNA change (hg38) g.121518734A>G Published as FGFR2(NM_022970.3):c.1035T>C (p.Y345=) ISCN – DB-ID FGFR2_000119 Variant remarks VKGL data sharing initiative Nederland Reference – ClinVar ID – dbSNP ID – Origin CLASSIFICATION record Segregation –…
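The genomic descriptions in this record follow the HGVS pattern reference:g.positionREF>ALT. A minimal sketch of pulling the pieces apart in Python is shown below; the regular expression and function name are illustrative only (they are not part of LOVD) and handle nothing beyond simple single-base substitutions like the one in this entry.

```python
import re

# Matches simple genomic substitutions such as NC_000010.10:g.123278248A>G.
HGVS_GENOMIC_SUBSTITUTION = re.compile(
    r"^(?P<accession>[A-Z]{2}_\d+\.\d+):g\.(?P<position>\d+)"
    r"(?P<ref>[ACGT])>(?P<alt>[ACGT])$"
)


def parse_genomic_substitution(description):
    """Split an HGVS genomic substitution into its parts, or return None."""
    match = HGVS_GENOMIC_SUBSTITUTION.match(description)
    if match is None:
        return None
    parts = match.groupdict()
    parts["position"] = int(parts["position"])
    return parts


print(parse_genomic_substitution("NC_000010.10:g.123278248A>G"))
```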

Parse a file of strings in python separated by newline into a json array

I don’t see where you’re actually reading from the file in the first place. You have to read your path_text.txt before you can format it correctly, right? with open('path_text.txt', 'r', encoding='utf-8') as myfile: content = myfile.read().splitlines() This will give you ['/gp/oi/eu/gatk/inputs/NA12878_24RG_med.hg38.bam', '/gp/oi/eu/gatk/inputs/NA12878_24RG_small.hg38.bam'] in content. Now if you want to write this…
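A minimal sketch of the whole round trip, assuming the goal is simply a JSON array of the non-empty lines. The helper name lines_to_json_array is hypothetical; it accepts any iterable of lines, so an open file handle for path_text.txt could be passed instead of the example list.

```python
import json


def lines_to_json_array(lines):
    """Return a JSON array string built from the non-empty, stripped lines."""
    paths = [line.strip() for line in lines if line.strip()]
    return json.dumps(paths, indent=2)


# The two paths quoted in the answer above, as they would appear in the file.
example = [
    "/gp/oi/eu/gatk/inputs/NA12878_24RG_med.hg38.bam\n",
    "/gp/oi/eu/gatk/inputs/NA12878_24RG_small.hg38.bam\n",
]
print(lines_to_json_array(example))

# With a real file:
#   with open("path_text.txt", "r", encoding="utf-8") as handle:
#       print(lines_to_json_array(handle))
```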

Z697 – YFull YTree Info

R-Z697 – YFull YTree Info SNPs currently defining R-Z697 Z697     Sample ID Country / Language Info Ref File Testing company Statistics Status YF009397 Sweden (Västra Götalands län) R-Z697* —— Hg19 .BAM FTDNA (Y500) 81X, 14.4 Mbp, 165 bp YF084333 Italy (Chieti) R-FT285492 —— Hg38 .BAM Dante Labs 14X, 23.4…

Y140591 – YFull YTree Info

R-Y140591 – YFull YTree Info Sample ID Country / Language Info Ref File Testing company Statistics Status YF067865 Germany R-Y140591* —— Hg38 .BAM FTDNA (Y700) 52X, 18.7 Mbp, 151 bp YF076495 Germany R-FT167842 —— Hg38 .BAM FTDNA (Y700) 49X, 18.3 Mbp, 151 bp YF067633 Germany R-FT167842 —— Hg38 .BAM FTDNA…

CTS1346 – YFull YTree Info

Sample ID Country / Language Info Ref File Testing company Statistics Status HGDP01351 China, People’s Republic of O-F3607* —— Hg38 .BAM Scientific 16X, 23.6 Mbp, 151 bp YF079316 —— O-Y224790 —— Hg19 .BAM 23mofang 58X, 21.3 Mbp, 150 bp HG00583 China, People’s Republic of O-Y224790 —— Hg19 .BAM Scientific ——…

A114 – YFull YTree Info

R-A114 – YFull YTree Info SNPs currently defining R-A114 FGC78244     A114(H)     H Sample ID Country / Language Info Ref File Testing company Statistics Status YF067576 France (Ille-et-Vilaine) R-A114* —— Hg19 .BAM Dante Labs 12X, 23.0 Mbp, 151 bp YF088360 United States (Virginia) R-CTS4466* —— Hg38 .BAM FTDNA (Y700)…

F13864 – YFull YTree Info

Sample ID Country / Language Info Ref File Testing company Statistics Status ERS5240131 Singapore C-F13864* —— Hg19 .BAM Scientific 7X, 22.9 Mbp, 150 bp YF076683 China, People’s Republic of (Shandong) C-F13864* —— Hg19 .BAM 23mofang 57X, 21.2 Mbp, 150 bp YF071813 —— C-F13864* —— Hg19 .BAM 23mofang 21X, 21.8 Mbp,…

‘No genomes installed!’ error from getREF

I was trying to use the getPlotSetArray() function, but I got the error ‘No genomes installed!’ from the getREF function. I dug into the problem, and it turns out that in the latest version of the BSgenome package the output of the function BSgenome::installed.genomes(splitNameParts=TRUE) changed from: pkgname organism provider provider_version…

Building custom hg38 – alt contigs

I am exploring modifications of hg38 like these: github.com/mebbert/Dark_and_Camouflaged_genes Starting from the regular bcbio hg38 data installation Masking hg38.fa using bedtools maskfasta Generating indexes using bcbio_setup_genome.py for seq and bwa as described in the manual The bwa directory then contains ├── bwa │   ├── hg38_masked.fa.amb │   ├── hg38_masked.fa.ann │   ├──…

L1193 – YFull YTree Info

I-L1193 – YFull YTree Info SNPs currently defining I-L1193 L1193     FGC87558     Y72031     Sample ID Country / Language Info Ref File Testing company Statistics Status ASH1 Ireland (Tipperary) I-L1193* —— Hg19 .BAM Ancient 1X, 10.5 Mbp, 101 bp PB581 Ireland (Clare) I-L1193* —— Hg19 .BAM Ancient 2X, 15.8…

3 -tag XM” failed! when running rsem-calculate-expression

Dear sir, When I ran "rsem-calculate-expression --paired-end --alignments -p 8 input.bam gencodev22 ./out", I got the error message: rsem-parse-alignments ../bowtie2/hg38 ./rsem-out.temp/rsem-out ./rsem-out.stat/rsem-out /NGS_Storage/Debbie/RNA-seq/variant_calling_20210602/RNA-leukemia002A-906.para.bam 3 -tag XM Read A00355:209:H3KTLDSX2:2:2606:24677:17425: The adjacent two lines do not represent the two mates of a paired-end read! (RSEM assumes the two mates of a paired-end read should…

At ABRF Meeting, T2T Consortium Describes Improvements of Complete Human Genome

PALM SPRINGS, Calif. — Researchers from the Telomere-to-Telomere (T2T) Consortium have generated an assembly of a complete human reference genome that could lead to better variant calling in the clinic and inform new studies of cell biology. The results of the project were presented by Karen Miga, an investigator at…

Y18411 – YFull YTree Info

J-Y18411 – YFull YTree Info Sample ID Country / Language Info Ref File Testing company Statistics Status YF072520 Albania J-BY111710 —— Hg19 .BAM Dante Labs 10X, 22.8 Mbp, 151 bp YF067307 Palestine (Nablus) J-BY111710 —— Hg38 .BAM FTDNA (Y700) 34X, 18.7 Mbp, 151 bp NA20827 Italy (Firenze) J-CTS3330 —— Hg19…

Difference between knownGene and wgEncodeGencodeCompV39

Hi: I am a bit confused about the relationship/difference between knownGene and wgEncodeGencodeCompV39 in the UCSC Table Browser. Does anyone know the precise difference between them? They can both be downloaded from the goldenPath page. knownGene: The schema is here, which does NOT match the file (knownGene.txt.gz) I downloaded. According to…

Bioconductor Package Installation

When I try to install the gtf for hg38 BiocManager::install("TxDb.Hsapiens.UCSC.hg38.knownGene") I get the following error: 'getOption("repos")' replaces Bioconductor standard repositories, see '?repositories' for details replacement repositories: CRAN: cran.rstudio.com/ Bioconductor version 3.14 (BiocManager 1.30.16), R 4.1.2 (2021-11-01) Installing package(s) 'TxDb.Hsapiens.UCSC.hg38.knownGene' Error in readRDS(dest) : error reading from connection Per stackoverflow.com/questions/67455984/getoptionrepos-replaces-bioconductor-standard-repositories-see-reposito I…

M8498 – YFull YTree Info

B-M8498 – YFull YTree Info Sample ID Country / Language Info Ref File Testing company Statistics Status YF004283 Saudi Arabia B-M8498* —— Hg19 .BAM FTDNA (Y500) 43X, 13.7 Mbp, 165 bp HGDP00992 Namibia B-M7650* —— Hg38 .BAM Scientific 18X, 23.5 Mbp, 151 bp YF013963 —— B-Y82361 —— Hg38 .BAM FTDNA…

FGC15109 – YFull YTree Info

I-FGC15109 – YFull YTree Info SNPs currently defining I-FGC15109 FGC15109     Sample ID Country / Language Info Ref File Testing company Statistics Status SZ43 Hungary (Somogy) I-BY138* —— Hg19 .BAM Ancient 8X, 22.8 Mbp, 32 bp YF010533 —— I-BY138* —— Hg19 .BAM FTDNA (Y500) 73X, 14.9 Mbp, 165 bp YF019250…

bedtools -u not giving unique files

These are the steps I’m following: The first step, extracting the sample using a BED file, is this (here the BED file is the input BED file converted to Hg38): tabix -h -R Hg19_to_Hg38_sorted.bed.gz gnomad.genomes.v{g_version}.hgdp_tgp.chr{chr}.vcf.bgz | perl {vcftools} -c {sample_name} > {sample_name}_out.vcf' output({sample_name}_out.vcf') chr2 113982416 rs56177103 TATAAAATAAAATAAA…

Pathway analysis of RNAseq data using goseq package

Hello, I have finished the RNA seq analysis and I am trying to perform some pathway analysis. I have used the gage package and I was looking online about another package called goseq that takes into account length bias. However, when I run the code I get an error. How…

FGC19851 – YFull YTree Info

R-FGC19851 – YFull YTree Info SNPs currently defining R-FGC19851 FGC19851     Sample ID Country / Language Info Ref File Testing company Statistics Status YF072967 United States (Georgia) R-FGC19851* —— Hg38 .BAM FTDNA (Y700) 34X, 18.7 Mbp, 151 bp YF009427 —— R-FGC65264* —— Hg19 .BAM FTDNA (Y500) 38X, 12.8 Mbp, 165…

Transcriptional kinetics and molecular functions of long noncoding RNAs

Ethical compliance The research carried out in this study has been approved by the Swedish Board of Agriculture, Jordbruksverket: N343/12. Cell culture Mouse primary fibroblasts were derived from adult (>10 weeks old) CAST/EiJ × C57BL/6J or C57BL/6J × CAST/EiJ mice by skinning, mincing and culturing tail explants (for at least 10 d) in DMEM high…

FGC35106 – YFull YTree Info

Sample ID Country / Language Info Ref File Testing company Statistics Status YF016938 Saudi Arabia (Ar Riyāḍ) J-FGC35106 YF081770 | J-FGC35106*, J-FGC58682* Hg38 .BAM FTDNA (Y500) 30X, 11.5 Mbp, 151 bp YF016937 Saudi Arabia (Ar Riyāḍ) J-FGC35106 YF081769 | J-FGC35106*, J-FGC58682* Hg38 .BAM FTDNA (Y500) 37X, 12.5 Mbp, 151 bp…

Bioconductor – TAPseq

DOI: 10.18129/B9.bioc.TAPseq     This package is for version 3.12 of Bioconductor; for the stable, up-to-date release version, see TAPseq. Targeted scRNA-seq primer design for TAP-seq Bioconductor version: 3.12 Design primers for targeted single-cell RNA-seq used by TAP-seq. Create sequence templates for target gene panels and design gene-specific primers using…

YP4024 – YFull YTree Info

Sample ID Country / Language Info Ref File Testing company Statistics Status ERS2478532 Turkmenistan Q-YP4024* —— Hg19 .BAM Scientific 17X, 16.7 Mbp, 151 bp YF006625 Russia (Tomskaya oblast’) / Selkup Q-YP4024* —— Hg19 .BAM FTDNA (Y500) 67X, 14.8 Mbp, 165 bp DA162 Russia (Severnaya Osetiya-Alaniya, Respublika) Q-BZ5214* —— Hg19 .BAM…

Bioconductor – branchpointer

DOI: 10.18129/B9.bioc.branchpointer     Prediction of intronic splicing branchpoints Bioconductor version: Release (3.14) Predicts branchpoint probability for sites in intronic branchpoint windows. Queries can be supplied as intronic regions; or to evaluate the effects of mutations, SNPs. Author: Beth Signal Maintainer: Beth Signal <b.signal at garvan.org.au> Citation (from within R,…

Y570 – YFull YTree Info

Sample ID Country / Language Info Ref File Testing company Statistics Status AF2 —— Q-Y570 Q-Y570*, Q-F746* Hg19 .BAM Ancient 1X, 1.3 Mbp, 94 bp YF093124 —— Q-M120* —— Hg38 .BAM Nebula Genomics 57X, 23.6 Mbp, 150 bp Kolyma1 Russia (Sakha, Respublika [Yakutiya]) Q-Y222276* —— Hg19 .BAM Ancient 7X, 13.4…

Use the TCGAbiolinks package to download TCGA data

For downloading TCGA data, the RTCGA package is probably better in terms of ease of use, and because it ships data that has already been downloaded, it is relatively stable to use. For the same reason, however, there is no guarantee that the data is current. The TCGAbiolinks package is…

PF6747 – YFull YTree Info

E-PF6747 – YFull YTree Info Sample ID Country / Language Info Ref File Testing company Statistics Status YF010216 Azerbaijan (Qəbələ) E-PF6747* —— Hg19 .BAM FTDNA (Y500) 50X, 13.7 Mbp, 165 bp YF064736 Egypt (Al Minūfīyah) E-FT97857* —— Hg38 .BAM FTDNA (Y700) 35X, 18.5 Mbp, 151 bp YF093064 Yemen (Tā’izz) E-Y280593…

Comprehensive circRNA Analyses in Human Vertebrae of GIOP and Its Molecular Mechanism

Circular RNAs (circRNAs) are a novel class of noncoding RNAs that play important roles in human diseases. However, the regulation of circRNAs in glucocorticoid-induced osteoporosis (GIOP) has not been reported. In this study, we performed high-throughput sequencing to identify altered circRNAs in the vertebrae from GIOP patients. A total of…

Variant #0000726648 (NC_000017.10:g.7100169G>A, ACADVL(NM_000018.3):c.-23135G>A) – Global Variome shared LOVD

Variant #0000726648 (NC_000017.10:g.7100169G>A, ACADVL(NM_000018.3):c.-23135G>A) Chromosome 17 Allele Unknown Affects function (as reported) Effect unknown Affects function (by curator) Not classified Classification method – Clinical classification VUS DNA change (genomic) (Relative to hg19 / GRCh37) g.7100169G>A DNA change (hg38) – Published as DLG4(NM_001321075.2):c.990C>T (p.G330=) ISCN – DB-ID DLG4_000038 Variant remarks VKGL data sharing initiative Nederland Reference – ClinVar ID – dbSNP ID – Origin CLASSIFICATION record Segregation – Frequency – Re-site –…

Variant #0000803285 (NC_000007.13:g.92730753A>G, SAMD9(NM_017654.3):c.4658T>C) – Global Variome shared LOVD

Variant #0000803285 (NC_000007.13:g.92730753A>G, SAMD9(NM_017654.3):c.4658T>C) Chromosome 7 Allele Unknown Affects function (as reported) Effect unknown Affects function (by curator) Not classified Classification method – Clinical classification VUS DNA change (genomic) (Relative to hg19 / GRCh37) g.92730753A>G DNA change (hg38) – Published as SAMD9(NM_017654.3):c.4658T>C (p.I1553T), SAMD9(NM_017654.4):c.4658T>C (p.I1553T) ISCN – DB-ID SAMD9_000024 See all 3 reported entries Variant remarks VKGL data sharing initiative Nederland Reference – ClinVar ID – dbSNP ID – Origin CLASSIFICATION…

Z2039 – YFull YTree Info

Sample ID Country / Language Info Ref File Testing company Statistics Status YF003382 Finland (Länsi-Suomen lääni) I-Z2040* —— Hg19 .BAM FTDNA (Y500) 47X, 13.3 Mbp, 165 bp YF067917 Ireland I-FGC69701* —— Hg19 .BAM Dante Labs 9X, 22.9 Mbp, 151 bp YF078735 Belarus (Vicebskaja voblasc’) / Polish I-FGC69702 —— Hg38 .VCF…

BY7447 – YFull YTree Info

E-BY7447 – YFull YTree Info SNPs currently defining E-BY7447 BY7447     Sample ID Country / Language Info Ref File Testing company Statistics Status YF075635 Yemen (Al Bayḑā’) E-FT183181 —— Hg38 .BAM FTDNA (Y700) 39X, 18.2 Mbp, 151 bp YF067501 Yemen (Şan’ā’) E-FT183181 —— Hg38 .BAM FTDNA (Y700) 44X, 18.8 Mbp,…

DF109 – YFull YTree Info

Sample ID Country / Language Info Ref File Testing company Statistics Status YF016926 Ireland R-DF109 R-DF109*, R-A18726* Hg38 .BAM FTDNA (Y500) 27X, 12.7 Mbp, 165 bp YF016394 United States (Ohio) R-DF109 R-DF109*, R-A18726* Hg38 .BAM FTDNA (Y500) 34X, 11.9 Mbp, 151 bp YF011566 Ireland (Mayo) R-DF109 R-DF109*, R-A18726*, R-FGC23742* Hg38…

ZP77 – YFull YTree Info

R-ZP77 – YFull YTree Info SNPs currently defining R-ZP77 ZP77 / FGC6562     Sample ID Country / Language Info Ref File Testing company Statistics Status YF008362 —— R-ZP77* —— Hg19 .BAM FTDNA (Y500) 41X, 13.8 Mbp, 165 bp YF067652 Unknown R-BY40744 —— Hg38 .BAM FTDNA (Y700) 36X, 18.7 Mbp, 151…

Download full list of SNPs and their coordinates in hg38

What is the best / standard place to get a full list of SNPs and their coordinates in hg38? I downloaded the SNPsnap database, but just realized that those coordinates are in hg19. I’m trying to figure out how…

htseq-count -t gene not working

I found a small problem. When I set '-t gene', the read is marked '__no_feature'. But when I set '-t exon', the read is marked 'ENSG00000276104'. The gene 'ENSG00000276104' is a single-exon gene. I don’t know why this happens. Read: 'TGTCTGTGGCGGTGGGATCCCGCGGCCGTGTTTTCCTGGTGGCCCGGCCGTGCCTGAGGTTTCTCCCCGAGCCGCCGCCTCTGCGGGCTCCCGGGTGCCCTTGCCCTCGCGGTCCCCGGCCCTCGCCCGTCTGTGCCCTCTTCCCCGCCCGCCGATCCTCTTCTTCCCCCCGAGCGGCTCACCGGCTTCACGTCCGTTGGTGGCCCCGCCTGGGAC'. I had aligned to hg38 by…

Bioconductor – ChIPQC

    This package is for version 3.1 of Bioconductor; for the stable, up-to-date release version, see ChIPQC. Quality metrics for ChIPseq data Bioconductor version: 3.1 Quality metrics for ChIPseq data Author: Tom Carroll, Wei Liu, Ines de Santiago, Rory Stark Maintainer: Tom Carroll <tc.infomatics at gmail.com>, Rory Stark <rory.stark…

hg38 Import custom reference upload error

Our version of TS is 5.12.2. When trying to upload a new custom reference FASTA (downloaded from NCBI, ftp.ncbi.nlm.nih.gov/genomes/all/GCA/000/001/405/GCA_000001405.15_GRCh38/seqs_for_alignment_pipelines.ucsc_ids/GCA_000001405.15_GRCh38_no_alt_analysis_set.fna.gz, gunzipped and renamed to hg38.fasta) through 'Import custom reference' in the interface, an error occurs: 'uploaded file size is incorrect' (to be honest the error was not shown in logs, because of TypeError…

help with CrossMap

Hello all, I would really appreciate your help, as I am new to working with different genome builds and have hit a setback lifting a VCF file from build hg38 to hg19. In essence, when using CrossMap the chromosome value gets altered. For example, below is the…

Systems biology analysis of human genomes points to key pathways conferring spina bifida risk

Significance Genetic investigations of most structural birth defects, including spina bifida (SB), congenital heart disease, and craniofacial anomalies, have been underpowered for genome-wide association studies because of their rarity, genetic heterogeneity, incomplete penetrance, and environmental influences. Our systems biology strategy to investigate SB predisposition controls for population stratification and avoids…

Padding out a GVCF file with 1000G exomes to get gatk VariantRecalibrator working with a small sample

I’ve got sequencing data for a small 500 bp amplicon from a few samples. GATK best principles suggest running VariantRecalibrator on the GVCF files I generate. I’m trying to get this working, but I get an error about “Found annotations with zero variances”. Reading the gatk manual and other posts…

computeMatrix in deeptool is Running with no result

Hi all, I wonder if someone can help me by explaining what to pass to the -R <bed file> argument of the command below? computeMatrix scale-regions -S <bigwig file(s)> -R <bed file> -b 1000 What I did, for example: I download…

NoClassDefFoundError: htsjdk/samtools/util/IntervalTree

When I run the circm6A (github.com/canceromics/circm6a) example code: cd ../.. java -Xmx16g -jar circm6a.jar -ip test_data/HeLa_eluate_rep_1.chr22.bam -input test_data/HeLa_input_rep_1.chr22.bam -r test_data/gencode_chr22.gtf -g test_data/hg38_chr22.fa -o test_data/example_Hela The following error occurred: Start at 2021-12-12 16:33:26 Exception in thread 'main' java.lang.NoClassDefFoundError: htsjdk/samtools/util/IntervalTree at main.Method.loadGenes(Method.java:200) at main.Method.run(Method.java:66) at main.Main.main(Main.java:9) Caused by: java.lang.ClassNotFoundException: htsjdk.samtools.util.IntervalTree…

transcripts are not true in TxDb.Hsapiens.UCSC.hg38.knownGene

Hello, I used TxDb.Hsapiens.UCSC.hg38.knownGene/GenomicFeatures to retrieve gene promoters and other genomic features. Here is the code: library('TxDb.Hsapiens.UCSC.hg38.knownGene') txdb <- TxDb.Hsapiens.UCSC.hg38.knownGene PR <- promoters(txdb, upstream=2000, downstream=0) But when I take a look at the PR results: it…

gatk VariantRecalibrator positional argument error

I’m trying to recalibrate my VCF using gatk VariantRecalibrator, but keep getting the error 'Illegal argument value: Positional arguments were provided'. I don’t know what this means or how to correct it! Here’s my call: gatk VariantRecalibrator -R "/Volumes/Seagate Expansion Drive/refs/hg38/gatk download/Homo_sapiens_assembly38.fasta" -V "$OUT"/results/variants/"$SN".norm.vcf.gz -AS --resource hapmap,known=false,training=true,truth=true,prior=15.0: "/Volumes/Seagate…

What is the single nucleotide polymorphism database ( dbsnp )?

The Single Nucleotide Polymorphism Database (dbSNP) is a free public archive for genetic variation within and across different species developed and hosted by the National Center for Biotechnology Information (NCBI) in collaboration with the National Human Genome Research Institute (NHGRI). Furthermore, are there any databases for single nucleotide polymorphisms? As there…

Removing uncovered transcripts from multi FASTA reference file

Hi everyone 🙂 For RNA-seq analyses, I have a reference file containing multiple transcript sequences (it's a subset of the NCBI human hg38 transcriptome). I found that some of the transcripts are not even covered by a single read (especially if…
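One possible approach, sketched in Python under stated assumptions: the question does not say how coverage was measured, so this sketch assumes a per-reference table in the four-column layout printed by samtools idxstats (name, length, mapped reads, unmapped reads). Both function names are hypothetical.

```python
def covered_ids_from_idxstats(idxstats_lines):
    """Collect reference names that have at least one mapped read."""
    covered = set()
    for line in idxstats_lines:
        fields = line.rstrip("\n").split("\t")
        # Skip malformed rows and the '*' row that counts unplaced reads.
        if len(fields) < 3 or fields[0] == "*":
            continue
        if int(fields[2]) > 0:
            covered.add(fields[0])
    return covered


def filter_fasta(fasta_lines, covered_ids):
    """Yield only the FASTA records whose ID is in covered_ids."""
    keep = False
    for line in fasta_lines:
        if line.startswith(">"):
            # The record ID is the header text up to the first whitespace.
            fields = line[1:].split()
            keep = bool(fields) and fields[0] in covered_ids
        if keep:
            yield line


idxstats = ["tx1\t100\t5\t0\n", "tx2\t200\t0\t0\n", "*\t0\t0\t3\n"]
fasta = [">tx1 first transcript\n", "ACGT\n", ">tx2\n", "GGGG\n"]
print("".join(filter_fasta(fasta, covered_ids_from_idxstats(idxstats))), end="")
```

Because both helpers work on iterables of lines, they can be fed open file handles directly, which keeps memory use low for large references.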

The Biostar Herald for Tuesday, September 21, 2021

The Biostar Herald publishes user submitted links of bioinformatics relevance. It aims to provide a summary of interesting and relevant information you may have missed. You too can submit links here. This edition of the Herald was brought to you by contribution from Istvan Albert, and was edited by Istvan…

I can’t get a dosage file using PLINK

Hi, I have been trying to get a dosage file from VCF, map and fam files. For that, I have written this bash script: plink --fam plink.fam --map plink.map --dosage one.vcf --write-dosage However, I got this error: --dosage: Reading from one.vcf. Error: Line 1 of one.vcf has fewer tokens…

What is the codification in genestrand 1 and 2?

Hi there, I’m doing some peak annotation using ChIPseeker: library(ChIPseeker) library(TxDb.Hsapiens.UCSC.hg38.knownGene) library(clusterProfiler) library(annotables) library(org.Hs.eg.db) txdb <- TxDb.Hsapiens.UCSC.hg38.knownGene peaks = readPeakFile("peaks_", header = F) peakAnno <- annotatePeak(peaks, tssRegion=c(-3000, 3000), TxDb=txdb, annoDb="org.Hs.eg.db") peaks_annot <- as.data.frame(peakAnno) In my annotation file "geneStrand" is codified as…

Best tools for calling structural variants from 2 assemblies?

Dear community, I have the fasta files of 2 assemblies of the human genome (for example hg19 and hg38). What would be the best tools to call structural variants from these 2 fasta files? Most of the tools I know…

python – snakemake multiple parameters for multiple input and single output in snakemake. CombineGVCFs gatk problem

I have written a rule for CombineGVCFs in gatk4. The rule is as follows: all_gvcf = get_all_gvcf_list() rule cohort: input: all_gvcf_list = all_gvcf, ref="/data/refgenome/hg38.fa", interval_list = prefix+"/bedfiles/hg38.interval_list", params: extra = "--variant", output: prefix+"/vcf/cohort.g.vcf", shell: "gatk CombineGVCFs -R {input.ref} {params.extra} {input.all_gvcf_list} -O {output} --tmp-dir=/data/tmp -L {input.interval_list}" all_gvcf is the dataset for…
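In the rule as quoted, the single flag in params.extra is followed by a space-separated list of files, so only the first gVCF would be preceded by the variant flag. One way around that, sketched here as an assumption about what the poster needs, is to build a flag/path pair for every file. The helper name variant_args is hypothetical.

```python
import shlex


def variant_args(gvcf_paths):
    """Build '--variant <path>' once per gVCF, quoting paths for the shell."""
    return " ".join(f"--variant {shlex.quote(path)}" for path in gvcf_paths)


# Hypothetical file names standing in for the list from get_all_gvcf_list().
print(variant_args(["s1.g.vcf", "s2.g.vcf", "s3.g.vcf"]))

# Inside the Snakemake rule this could be wired in through a params function, e.g.
#   params: variants=lambda wildcards, input: variant_args(input.all_gvcf_list)
# and referenced in the shell command as {params.variants}.
```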

Alternate nucleotide is more frequent than reference nucleotide. OMG I’m dizzy. How do I stop the twirl?

This is due to the fact that the very reference genomes that we use for re-alignment are themselves based on individuals who carry rare risk alleles. Thus, when we call variants against these genomes, we are, at many loci, comparing against rare disease risk alleles. As the best/worst example (depending…

mixing hg38 and GRCh38 during variant calling

Hello everyone! I’ve been working on a variant calling pipeline for WES data and used a mix of hg38 and GRCh38 reference files after reading that hg38 is just an abbreviation of GRCh38, and that they refer to the same thing. But…

SNP exon region UCSC

How can I get SNPs in only the exon regions of the genome with UCSC? UCSC returns all SNPs in the gene region, and there is no filter option to get only the exon regions. Thanks.

ZhaozzReal/SNV_IPA: Detect SNV-associated intronic polyadenylation events from standard RNAseq data

Description Somatic single nucleotide variants (SNVs) in cancer genomes affect gene expression through various mechanisms depending on their genomic location. In this study, we found that somatic SNVs near splice sites are associated with abnormal intronic polyadenylation (IPA). Here we give examples to show how to detect SNV-associated IPA…

Where can I get, or how can I make, a mappability track for the hg38 assembly?

Lucky you @manojmumar_bhosale, I worked on a similar problem recently and therefore have a bash script you can use. Required tools: GEM library from here UCSC’s wigToBigWig from here (I chose binary for Linux 64…

How to load user-defined genome in IGV-webapp

I would like to create a session in IGV-webapp using an HTML file. The following works with pre-defined genomes (e.g. genome: "hg38"), but I would like to load my own genome. Is there a way to achieve this? <!DOCTYPE html> <html lang="en">…

UCSC knownCanonical hg19 vs. hg38

Hello, We have an FAQ page that covers this topic (genome.ucsc.edu/FAQ/FAQgenes.html#singledownload). As posted by ATpoint, it boils down to different datasets and different approaches. hg19 knownCanonical was last updated in 2013 and built primarily from RefSeq and GenBank sequences and a few other sources. One isoform was identified from each…

Continue Reading UCSC knownCanonical hg19 vs. hg38

Get rsID for a list of SNPs in an entire GWAS sumstats file

Here is a fairly efficient way to do this, assuming hg38 and that BEDOPS and standard Unix tools are installed. $ bedmap --echo --echo-map-id --delim '\t' <(awk '{n=split($0,a,/[:_]/); print "chr"a[1]"\t"a[2]"\t"a[2]+1"\t"a[3]"/"a[4];}' sumstats.txt | sort-bed -) <(wget -qO- hgdownload.cse.ucsc.edu/goldenPath/hg38/database/snp150.txt.gz | gunzip -c | cut -f2-5 | sort-bed -) > answer.bed This gets around making…

Continue Reading Get rsID for a list of SNPs in an entire GWAS sumstats file
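
The awk step in the command above splits each line on ":" or "_" and prints a BED-style record. A minimal Python sketch of that one step follows. It assumes each variant ID looks like `1:12345_A_G` (the excerpt does not show the real input) and that the two alleles are joined with a slash; the function name `variant_id_to_bed` is hypothetical.

```python
import re


def variant_id_to_bed(variant_id):
    """Convert an ID such as '1:12345_A_G' into a BED-like tuple.

    Assumed ID layout: chrom:pos_ref_alt. Mirrors the awk step:
    start = pos, end = pos + 1, name = ref/alt.
    """
    fields = re.split(r"[:_]", variant_id.strip())
    if len(fields) < 4:
        raise ValueError(f"unexpected variant ID: {variant_id!r}")
    chrom, pos, ref, alt = fields[:4]
    start = int(pos)
    return (f"chr{chrom}", start, start + 1, f"{ref}/{alt}")


if __name__ == "__main__":
    # Print one tab-separated BED line for a made-up ID.
    print("\t".join(str(x) for x in variant_id_to_bed("1:12345_A_G")))
```

The output would still need sorting (the original pipes it through sort-bed) before it can be mapped against the dbSNP table.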

UCSC liftover

Hi, I’m using UCSC liftover to convert hg19 to hg38. I got a result that I don’t understand. Feb. 2009 (GRCh37/hg19) → Dec. 2013 (GRCh38/hg38) – chr1:120904787 → chr1:143905854; Dec. 2013 (GRCh38/hg38) → Feb. 2009 (GRCh37/hg19) – chr1:143905854 → chr1:149400430 (I didn’t check “Allow multiple output regions”.)…

Continue Reading UCSC liftover

Paired-end reads reported without mates: how to play matchmaker?

Hi Everyone, I am currently looking at Acute Myeloid Leukemia (AML) paired-end WGS samples from the TARGET data ocg.cancer.gov/programs/target/target-methods#3241. A bioinformatician in our group remapped the samples from hg19 to hg38. Unfortunately, we do not have any copies of the hg19 version anymore. However, when I try to run anything…

Continue Reading Paired-end reads reported without mates: how to play matchmaker?

Coverage drops in fastq alignment against custom Immunoglobulin reference

I am working on HiSeq 2000/2500 single-end reads from RNA-Seq leukemia samples. I am interested in aligning all the reads belonging to the Immunoglobulin (Ig) genes for further analysis. The task is difficult for two main reasons: Final Ig genes…

Continue Reading Coverage drops in fastq alignment against custom Immunoglobulin reference

vcf file analysis

Hello everyone, I have 22 VCF files, one per chromosome. They were in genome build hg19, so I did a liftover and converted them to genome build hg38. Now I need just the chrom and position values from these VCF files, merged together into a…

Continue Reading vcf file analysis
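
For the question above, a minimal sketch of pulling the CHROM and POS columns out of several VCF files and writing them to one file. The output format is an assumption (the excerpt is cut off before naming it), so a two-column tab-separated file is used here; `iter_chrom_pos` and `merge_chrom_pos` are hypothetical helper names.

```python
import gzip


def iter_chrom_pos(vcf_path):
    """Yield (chrom, pos) for every record in a plain or gzipped VCF."""
    opener = gzip.open if str(vcf_path).endswith(".gz") else open
    with opener(vcf_path, "rt") as handle:
        for line in handle:
            # Skip meta lines, the #CHROM header line, and blank lines.
            if line.startswith("#") or not line.strip():
                continue
            chrom, pos = line.split("\t", 2)[:2]
            yield chrom, int(pos)


def merge_chrom_pos(vcf_paths, out_path):
    """Write chrom<TAB>pos for all files, in the order given; return the row count."""
    count = 0
    with open(out_path, "w") as out:
        for path in vcf_paths:
            for chrom, pos in iter_chrom_pos(path):
                out.write(f"{chrom}\t{pos}\n")
                count += 1
    return count
```

A call such as `merge_chrom_pos(["chr1.vcf", "chr2.vcf"], "positions.tsv")` would concatenate the per-chromosome files; the file names are placeholders.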

Bioconductor – BSgenome.Hsapiens.UCSC.hg38.dbSNP151.major

DOI: 10.18129/B9.bioc.BSgenome.Hsapiens.UCSC.hg38.dbSNP151.major     Full genome sequences for Homo sapiens (UCSC version hg38, based on GRCh38.p12) with injected major alleles (dbSNP151) Bioconductor version: Release (3.13) Full genome sequences for Homo sapiens (Human) as provided by UCSC (hg38, based on GRCh38.p12) with major allele injected from dbSNP151, and stored in Biostrings…

Continue Reading Bioconductor – BSgenome.Hsapiens.UCSC.hg38.dbSNP151.major

tool or database to convert Gene ID to genomic position

Hello. I have lots of pseudogene IDs like LOC100431174, but none of the methods below worked for me to find their genomic positions “offline”. I need a table or package to do it offline without querying a web page. Methods I…

Continue Reading tool or database to convert Gene ID to genomic position

unable to find chromosome in SAM header

I am using featureCounts to try to create a count table for some RNA-Seq data I collected using an Oxford Nanopore platform. I have .sam files aligned with minimap2, and am running the following command to try to get a count…

Continue Reading unable to find chromosome in SAM header
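
One way to investigate an error like the one in the title above is to compare the sequence names in the SAM header with those in the annotation file. The sketch below assumes the annotation is a GTF and that a naming mismatch is the cause, which the excerpt does not confirm; all function names are hypothetical.

```python
def sam_header_contigs(sam_path):
    """Collect the SN: names from the @SQ lines of a SAM header."""
    names = set()
    with open(sam_path) as handle:
        for line in handle:
            if not line.startswith("@"):
                break  # header lines come first; stop at the first alignment
            if line.startswith("@SQ"):
                for field in line.rstrip("\n").split("\t")[1:]:
                    if field.startswith("SN:"):
                        names.add(field[3:])
    return names


def gtf_contigs(gtf_path):
    """Collect the sequence names from column 1 of a GTF file."""
    names = set()
    with open(gtf_path) as handle:
        for line in handle:
            if line.startswith("#") or not line.strip():
                continue
            names.add(line.split("\t", 1)[0])
    return names


def missing_from_header(sam_path, gtf_path):
    """Return GTF sequence names that have no @SQ entry in the SAM header."""
    return sorted(gtf_contigs(gtf_path) - sam_header_contigs(sam_path))
```

If the returned list contains names such as `chr1` while the header uses `1` (or the other way round), the two files follow different naming conventions.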

miRNAseq analysis not shown adapter sequence and huge N’s content

Hi there, this is my third time doing miRNA sequencing analysis, so I do not have much experience with it. I have 18 human semen samples (I also have no experience with this sample type), and I have been reading a lot…

Continue Reading miRNAseq analysis not shown adapter sequence and huge N’s content

Predicting and characterizing a cancer dependency map of tumors with deep learning

INTRODUCTION The development of novel cancer therapies requires knowledge of specific biological pathways to target individual tumors and eradicate cancer cells. Toward this goal, the landscape of genetic vulnerabilities of cancer, or the cancer dependency map, is being systematically profiled. Using RNA interference (RNAi) loss-of-function screens, Marcotte et al. (1),…

Continue Reading Predicting and characterizing a cancer dependency map of tumors with deep learning

liftover using genome browser

Hello everyone, I have a file in the hg38 build. I want to do a liftover and change it to hg19. I thought of using the liftover tool from the UCSC genome browser. I realise that the input file should be in BED format. My file has only…

Continue Reading liftover using genome browser

VariantRecalibrator no positional argument is defined for this tool.

Hi, I am trying to run the following command: gatk VariantRecalibrator -R genome.fa -V all.Sample.SNP.vcf.gz --trust-all-polymorphic -tranche 100.0 -tranche 99.95 -tranche 99.9 -tranche 99.8 -tranche 99.6 -tranche 99.5 -tranche 99.4 -tranche 99.3 -tranche 99.0 -tranche 98.0 -tranche 97.0 -tranche 90.0 -an MQRankSum -an ReadPosRankSum -an FS -an MQ -an SOR…

Continue Reading VariantRecalibrator no positional argument is defined for this tool.

Get chromosome sizes from fasta file

Hello, I’m wondering whether there is a program that could calculate chromosome sizes from any FASTA file? The idea is to generate a tab-delimited file like the one expected by bedtools genomecov, for example. I know there’s the fetchChromSizes program from UCSC, but…

Continue Reading Get chromosome sizes from fasta file
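
As an illustration of the task described above, a minimal pure-Python sketch that reads an uncompressed FASTA file and writes a two-column name/length table of the kind bedtools genomecov takes as its genome file. The function names are hypothetical, and the sequence name is assumed to be the first whitespace-delimited word of each header line.

```python
def fasta_chrom_sizes(fasta_path):
    """Return {sequence_name: length} in file order for a FASTA file."""
    sizes = {}
    name = None
    with open(fasta_path) as handle:
        for line in handle:
            line = line.strip()
            if not line:
                continue
            if line.startswith(">"):
                # Use the first word of the header as the sequence name.
                parts = line[1:].split()
                name = parts[0] if parts else ""
                sizes[name] = 0
            elif name is not None:
                sizes[name] += len(line)
    return sizes


def write_chrom_sizes(fasta_path, out_path):
    """Write name<TAB>length lines, one per sequence."""
    with open(out_path, "w") as out:
        for name, length in fasta_chrom_sizes(fasta_path).items():
            out.write(f"{name}\t{length}\n")
```

If samtools is available, `samtools faidx` produces a `.fai` index whose first two columns carry the same information.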

Contig chr1 given as location, but this contig isn’t present in the Fasta sequence dictionary

Hi everyone, I’m trying to run Mutect2 for WES cancer data. However, since their Resource bundle only supports hg19, it seems I cannot proceed (I want to compare it with Strelka2…

Continue Reading Contig chr1 given as location, but this contig isn’t present in the Fasta sequence dictionary

Using MACS2 parameters

Trying to reproduce a Galaxy training in the Linux CLI. I’ve come up with the following commands for the peak calling with MACS2. Am I on the right track? The Galaxy parameters are- macs2 command can be- macs2 callpeak -t input_file.bed -n macs_output -g 50818468 --nomodel --shift…

Continue Reading Using MACS2 parameters

Non-repeat human genome dataset

Could anyone please point me to where I could find a dataset of non-repeat sequences for the human reference genome? I’m not sure if it’s still regarded as true, but I saw that possibly 2/3 of the human genome contains repeats. Is there a place…

Continue Reading Non-repeat human genome dataset

VCF file phasing by SHAPEIT

Hi everybody, I would like to phase (just phasing, not imputation) a VCF file containing about 1100 individuals (a given human population) derived from whole genome sequencing; the VCF file was obtained with GATK. From what I found, SHAPEIT is the most commonly used tool; according to its manual, it requires a genetic map for phasing, however,…

Continue Reading VCF file phasing by SHAPEIT

Finding 16 mer not present in GRCh38

Thanks for the question – it has kept me busy this Sunday morning / afternoon. As implied by others, this poses a computational challenge but is not insurmountable. For motif searching generally, I usually use AWK. My approach here was to: generate all possible k-mers of the chosen size (run…

Continue Reading Finding 16 mer not present in GRCh38
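
The first step named in the answer above is to generate all possible k-mers of the chosen size. The answer uses AWK; the sketch below swaps in Python and only illustrates the idea on a toy sequence. It checks one strand only and holds every observed k-mer in memory, so it is not a workable approach for k = 16 on a whole genome, where there are 4^16 (about 4.3 billion) candidates. The function name is hypothetical.

```python
from itertools import product


def absent_kmers(sequence, k):
    """Return all DNA k-mers (alphabet ACGT) that do not occur in `sequence`.

    Only the given strand is scanned; reverse complements are ignored.
    """
    sequence = sequence.upper()
    seen = {sequence[i:i + k] for i in range(len(sequence) - k + 1)}
    absent = []
    # product() enumerates candidates in lexicographic order.
    for letters in product("ACGT", repeat=k):
        kmer = "".join(letters)
        if kmer not in seen:
            absent.append(kmer)
    return absent
```

For example, `absent_kmers("AAAA", 1)` returns `["C", "G", "T"]`.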

question about running CIRI-full

I’m using CIRI-full to calculate the full-length sequences of circRNAs, and I can run the test data set successfully, but I can’t run it on my own data. Running the test data set: java -jar ../CIRI-full.jar Pipeline -1 test_1.fq.gz -2 test_2.fq.gz -a test_anno.gtf -r test_ref.fa -d test_output/…

Continue Reading question about running CIRI-full

VCF to 23 and Me format and changing ensamble reference help needed for underestanding VCF

Hello, I am trying to convert my Nebula Genomics report to 23andMe format. I have two problems: Nebula uses human reference build 38 and 23andMe uses build 37. I was thinking of writing a Python script, but I have some doubts. My plan was to change the genotype according…

Continue Reading VCF to 23 and Me format and changing ensamble reference help needed for underestanding VCF