Tag: GRCH38

variant – Where should you put your cache for ensembl-vep using conda

I’ve installed vep in conda like so:
conda install ensembl-vep=105.0-0
And then I installed the human cache like so:
vep_install -a cf -s homo_sapiens -y GRCh38 -c /mnt/gpfs/live/rd01__/ritd-ag-project-rd018o-mdflo13/refs/vep --CONVERT
But when I try and run vep I get an error:
vep --dir_cache /mnt/gpfs/live/rd01__/ritd-ag-project-rd018o-mdflo13/refs/vep -i /mnt/gpfs/live/rd01__/ritd-ag-project-rd018o-mdflo13/data/test/manual/results/variants/cohort.norm_recalibrated.vcf -o /mnt/gpfs/live/rd01__/ritd-ag-project-rd018o-mdflo13/data/test/manual/results/variants/cohort.norm_recalibrated_vep.vcf
Am I doing…
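The excerpt is cut off before the error message, so the actual cause is not known here. One thing that may be worth checking, offered only as a sketch: `--dir_cache` sets where VEP looks for a cache, while `--cache` (or `--offline`) is what tells VEP to use one. The helper below prints such a command line instead of running it; the function name `build_vep_cmd` and the paths are hypothetical placeholders, and `--assembly GRCh38` is assumed to match the `-y GRCh38` used at install time.

```shell
#!/bin/sh
# Hypothetical helper: print (not run) a VEP command that reads a local cache.
# Arguments: cache directory, input VCF, output VCF (all placeholders).
build_vep_cmd() {
    cache_dir=$1
    input_vcf=$2
    output_vcf=$3
    # --cache switches cache lookup on; --dir_cache says where the cache lives;
    # --vcf asks for VCF-formatted output to match the .vcf output name.
    printf 'vep --cache --dir_cache %s --assembly GRCh38 -i %s -o %s --vcf\n' \
        "$cache_dir" "$input_vcf" "$output_vcf"
}

# Dry run with placeholder paths: review the printed line before executing it.
build_vep_cmd /path/to/refs/vep cohort.vcf cohort_vep.vcf
```

Printing the command first makes it easy to confirm that the cache directory is the same one passed to `vep_install -c`.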


linux – How to fix Perl from anaconda not installing bioperl? Bailing out the installation for BioPerl

vep -i examples/homo_sapiens_GRCh38.vcf --database
Can't locate Bio/PrimarySeqI.pm in @INC (you may need to install the Bio::PrimarySeqI module) (@INC contains: /home/youssef/anaconda3/envs/ngs1/share/ensembl-vep-88.9-0/modules /home/youssef/anaconda3/envs/ngs1/share/ensembl-vep-88.9-0 /home/youssef/anaconda3/envs/ngs1/lib/site_perl/5.26.2/x86_64-linux-thread-multi /home/youssef/anaconda3/envs/ngs1/lib/site_perl/5.26.2 /home/youssef/anaconda3/envs/ngs1/lib/5.26.2/x86_64-linux-thread-multi /home/youssef/anaconda3/envs/ngs1/lib/5.26.2 .) at /home/youssef/anaconda3/envs/ngs1/share/ensembl-vep-88.9-0/Bio/EnsEMBL/Slice.pm line 75. BEGIN failed--compilation aborted at /home/youssef/anaconda3/envs/ngs1/share/ensembl-vep-88.9-0/Bio/EnsEMBL/Slice.pm line 75. Compilation failed in require at /home/youssef/anaconda3/envs/ngs1/share/ensembl-vep-88.9-0/Bio/EnsEMBL/Feature.pm line 84. BEGIN failed--compilation aborted at /home/youssef/anaconda3/envs/ngs1/share/ensembl-vep-88.9-0/Bio/EnsEMBL/Feature.pm…


links to Ensembl GRCh37 – gitmetadata

Open Targets Genetics reports GRCh38 coordinates, but the “External references” section points to GRCh37 (grch37.ensembl.org) rather than GRCh38 (www.ensembl.org): genetics.opentargets.org/variant/8_102432699_T_C Was this a deliberate decision (e.g. we don’t have the rsID in GRCh38 for some reason, other)? If so, we need to make this clear in the docs. If not, we…


Failure to detect mutations in U2AF1 due to changes in the GRCh38 reference sequence

Materials and Methods: Genomic data was collected as part of the MDS National History Study or The Cancer Genome Atlas project and consented appropriately under those protocols [8] (Sekeres M.A., Gore S.D., Stablein D.M., DiFronzo N., Abel G.A., DeZern A.E., Troy J.D., Rollison D.E., Thomas J.W., Waclawiw M.A., Liu J.J.…


Protocols

Normalization data transformation protocol: 10x Genomics Visium sequencing data were aligned and quantified using the Space Ranger Software Suite (version 1.0.0, 10x Genomics Inc) using the GRCh38 human reference genome (official Cell Ranger reference, version 3.0.0). Spots were manually aligned to the paired H&E images by 10x Genomics. Nucleic acid…


[lh3/minimap2] Memory leak when using Python and threads

The program align.py uses mappy to align reads in Python using multiple worker threads. After loading the index, the memory usage jumps up quickly to >20Gb and then continues to climb steadily through 40Gb and beyond. This issue was first discovered in bonito and isolated to mappy. The data flow…


VEP issue: ERROR: Cache assembly version (GRCh37) and database or selected assembly version (GRCh38) do not match

Describe the issue: VEP gives errors even though my query and reference have the same assembly version. Command: $ ./vep -i examples/homo_sapiens_GRCh37.vcf --cache --refseq Cache reference details while running install.pl: ? 458 NB: Remember to use --refseq when running the VEP with this cache! downloading ftp.ensembl.org/pub/release-104/variation/indexed_vep_cache/homo_sapiens_refseq_vep_104_GRCh37.tar.gz unpacking homo_sapiens_refseq_vep_104_GRCh37.tar.gz converting cache, this may…


hg38 Import custom reference upload error

Our version of TS is 5.12.2. When trying to upload a new custom reference FASTA (downloaded from NCBI, ftp.ncbi.nlm.nih.gov/genomes/all/GCA/000/001/405/GCA_000001405.15_GRCh38/seqs_for_alignment_pipelines.ucsc_ids/GCA_000001405.15_GRCh38_no_alt_analysis_set.fna.gz, gunzipped and renamed to hg38.fasta) through “Import custom reference” in the interface, an error occurs: “uploaded file size is incorrect” (to be honest the error was not shown in logs, because of TypeError…


Bioconductor – BSgenome.Hsapiens.NCBI.GRCh38

DOI: 10.18129/B9.bioc.BSgenome.Hsapiens.NCBI.GRCh38 This package is for version 3.11 of Bioconductor; for the stable, up-to-date release version, see BSgenome.Hsapiens.NCBI.GRCh38. Full genome sequences for Homo sapiens (GRCh38). Bioconductor version: 3.11. Full genome sequences for Homo sapiens (Human) as provided by NCBI (GRCh38, 2013-12-17) and stored in Biostrings objects. Author: The…


Attempting to generate a bam.bai file but the output is not readable

Hi, I am new to exome sequencing, and have tried to follow tutorials on the subject. I am stuck at the samtools index stage because the output files are in a non-human-readable format and I believe…


Why single cell R2 fastq have no read identified by bowtie2 ?

When we input the R2 fastq.gz into bowtie2, the human sequence was not removed (${base}_host_removed is zero). for i in $(find ./ -type f -name ".fastq.gz" | while read F; do basename $F | rev | cut -c…


Trouble running vcf2bam jvarkit tool

I am trying to use the tool called vcf2bam from jvarkit on a server and I have the following 2 files: GRCh38_latest_genomic.fna (the file is of format FASTQ) and 00-common_all.vcf. I used samtools faidx and also picard CreateSequenceDictionary, but when I try…


cellranger count DETECT_COUNT_CHEMISTRY (failed)

I am learning scRNA-seq and the tutorial I follow uses a dataset (1k PBMCs from a healthy donor) from the 10x Genomics website. I downloaded the FASTQ and reference transcriptome files and ran the following command: cellranger-6.1.1/cellranger count --id pbmc_1k_v2_example --transcriptome /home/murat/Share/single_cell/refdata-gex-GRCh38-2020-A --fastqs /home/murat/Share/single_cell/pbmc_1k_v2_fastqs I get the following message: Martian Runtime…


Is the Ensembl GRCh38 genome assembly more up to date than the UniProtKB online database?

Dear all, I am working with a list of Ensembl accession codes for a desired group of proteins. I have downloaded the protein annotations related to the genome assembly GRCh38. I fetched the genomic coordinates from the UniProtKB API service using the Ensembl accession codes. The service provides a protein annotation…


Alternate nucleotide is more frequent than reference nucleotide. OMG I’m dizzy. How do I stop the twirl?

This is due to the fact that the very reference genomes that we use for re-alignment are themselves based on individuals who carry rare risk alleles. Thus, when we call variants against these genomes, we are, at many loci, comparing against rare disease risk alleles. As the best/worst example (depending…


mixing hg38 and GRCh38 during variant calling

Hello everyone! I’ve been working on a variant calling pipeline for WES data and used a mix of hg38 and GRCh38 reference files after reading that hg38 is just an abbreviation of GRCh38, and that they refer to the same thing. But…


snpEFF not able to download GRCH38 ?

Hi, why is snpEff not able to download GRCh38? It always shows an error, but it works well with the GRCh37 reference. Thanks for your comments. likithreddy@Curium:~/Downloads/snpEff_latest_core/snpEff$ java -jar snpEff.jar download GRCh38.76 java.lang.RuntimeException: Property: 'GRCh38.76.genome' not found at org.snpeff.interval.Genome.<init>(Genome.java:106) at org.snpeff.snpEffect.Config.readGenomeConfig(Config.java:681) at org.snpeff.snpEffect.Config.readConfig(Config.java:649) at…


Highly mapped to introns

Hi, I am analyzing RNA-seq data from human blood samples. I checked the read distribution using RSeQC read_distribution after mapping with STAR. Usually, I get more than 80% of reads mapped to exons. However, this time, the result showed only a few percent were mapped…


Editing header of a fasta file

Hello everybody, I’ve been using sed, but only for simple steps, and now I can’t do this: I have this header: >ENSP00000451042.1 pep chromosome:GRCh38:14:22438547:22438554:1 gene:ENSG00000223997.1 transcript:ENST00000415118.1 gene_biotype:TR_D_gene transcript_biotype:TR_D_gene gene_symbol:TRDD1 description:T cell receptor delta diversity 1 [Source:HGNC Symbol;Acc:HGNC:12254] and I would like to obtain this:…


Phasing with SHAPEIT

Edit June 7, 2020: The code below is for pre-phasing with SHAPEIT2. For phased imputation using the output of SHAPEIT2 and ultimate production of phased VCFs, see my answer here: A: ERROR: You must specify a valid interval for imputation using the -int argument, So, the steps are usually: pre-phasing…


Produce PCA bi-plot for 1000 Genomes Phase III

Note1 – Previous version: Produce PCA bi-plot for 1000 Genomes Phase III in VCF format (old) Note2 – this data is for hg19 / GRCh37 Note3 – GRCh38 data is available HERE The tutorial has been updated based on the 1000 Genomes Phase III imputed genotypes. The original tutorial was…


Annotate Structural variants with population specific allele frequency values

Hi, has anyone tried filtering structural variants based on population-specific allele frequency (AF) values (for example gnomAD-SV or phase 3 1000 Genomes SV)? I have a set of SVs that I detected using a multipronged approach. For prioritising variants,…


Where can I get ?or how can I make a mappability track for hg38 assembly

Lucky you @manojmumar_bhosale, I worked on a similar problem recently and therefore have a bash script you can use. Required tools: GEM library from here; UCSC’s wigToBigWig from here (I chose binary for Linux 64…


Running htseq-count to “grab” long non coding gene_id names

Hi all, I am new to bioinformatics, so bear with me. I am trying to find long non-coding RNAs from RNA-seq data. When I checked the human GTF file, there were 2 different types of long non-coding RNA, “lnc_RNA” and “lncRNA”,…


UCSC liftover

Hi, I’m using UCSC liftover to convert hg19 to hg38 and got a result that I don’t understand. Feb. 2009 (GRCh37/hg19) → Dec. 2013 (GRCh38/hg38) – chr1:120904787 → chr1:143905854 Dec. 2013 (GRCh38/hg38) → Feb. 2009 (GRCh37/hg19) – chr1:143905854 → chr1:149400430 (I didn’t check “Allow multiple output regions”.)…


Fasta.fai file error

Hi, I have been struggling with an error in bedtools intersect. The command I am trying to run is as follows: bedtools intersect -a sorted.vcf -b nstd166.GRCh38.variant_call_chr.vcf.gz -wo -sorted -f 0.8 -r -g Homo_sapiens_assembly38.fasta.fai For some of the files that I am assessing, I don’t get…


How to pass custom software specific variables to nf-core/sarek nextflow pipeline?

I’m attempting to call whole-genome variants using the nf-core/sarek nextflow pipeline. In the QC step there is an option that invokes trim_galore quality trimming, but I don’t know how to pass my custom adapters to be cut as well….


List of human protein coding genes with given name (known function?)

Hello, to put it simply, I am doing differential expression analysis on human RNA-seq data and I want to focus my analysis on genes that are: 1) Protein coding, so no SNOR or MIR 2) Genes with a…


Bioconductor – BSgenome.Hsapiens.UCSC.hg38.dbSNP151.major

DOI: 10.18129/B9.bioc.BSgenome.Hsapiens.UCSC.hg38.dbSNP151.major Full genome sequences for Homo sapiens (UCSC version hg38, based on GRCh38.p12) with injected major alleles (dbSNP151). Bioconductor version: Release (3.13). Full genome sequences for Homo sapiens (Human) as provided by UCSC (hg38, based on GRCh38.p12) with major allele injected from dbSNP151, and stored in Biostrings…


cellranger count help

These were the original data from the sequencing company, which I then compressed into two files, R1 and R2. /data01/chenyu/sc/cellranger-6.1.1/cellranger count --id=cellranger_szdxb015 --fastqs=/data01/chenyu/sc/sortData/191527A_SZdxb01_5 --sample=SZdxb015 --transcriptome=/data01/chenyu/sc/refdata-gex-GRCh38-2020-A --localcores=20 --nosecondary Error occurred: FASTQ header mismatch detected at line 4 of input files "/data01/chenyu/sc/sortData/191527A_SZdxb01_5/SZdxb015_S1_L001_R1_001.fastq.gz" and "/data01/chenyu/sc/sortData/191527A_SZdxb01_5/SZdxb015_S1_L001_R2_001.fastq.gz": file: "/data01/chenyu/sc/sortData/191527A_SZdxb01_5/SZdxb015_S1_L001_R1_001.fastq.gz", line: 4…


How to download the Homo_sapiens.GRCh38.100.gtf and Homo_sapiens.GRCh38.dna.primary_assembly.fa files for my analysis?

I am trying to perform STAR alignment and I need the reference files for indexing. I would like to know how to download the Homo_sapiens.GRCh38.100.gtf and Homo_sapiens.GRCh38.dna.primary_assembly.fa files so that I can use my following code for indexing…


Linearize fasta files

Program versions used: BBMap v. 38.32; Seqtk v. 1.3-r106; Seqkit v. 0.8.1; Perl v. 5.16.3; Python v. 3.6.6; sed v. 2.2.2
$ time (cat Homo_sapiens.GRCh38.dna.primary_assembly.fa > /dev/null) real 0m1.050s user 0m0.002s sys 0m1.045s
With BBMap – reformat.sh
$ time reformat.sh -Xmx40g in=Homo_sapiens.GRCh38.dna.primary_assembly.fa fastawrap=0) java -ea -Xmx40g -cp bbmap/current/ jgi.ReformatReads…
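For readers unfamiliar with the term, "linearizing" a FASTA file means joining each record's wrapped sequence lines into a single line under its header. The sketch below shows the idea with plain awk, which the excerpt does not list among its benchmarked tools; the function name `linearize_fasta` and the demo records are made up for illustration, and no timing claims are implied.

```shell
#!/bin/sh
# linearize_fasta: read FASTA on stdin and write one header line followed by
# one unwrapped sequence line per record on stdout.
linearize_fasta() {
    awk '
        /^>/ {
            # A new header: close the previous sequence line if one is open.
            if (seq_open) printf "\n"
            print
            seq_open = 0
            next
        }
        {
            # A sequence chunk: append it without a trailing newline.
            printf "%s", $0
            seq_open = 1
        }
        END {
            # Terminate the last sequence line.
            if (seq_open) printf "\n"
        }
    '
}

# Demo on two tiny made-up records.
printf '>demo1\nACGT\nTTGA\n>demo2\nGG\nCC\n' | linearize_fasta
```

On a real assembly the function would be used as `linearize_fasta < Homo_sapiens.GRCh38.dna.primary_assembly.fa > linear.fa`, where the output file name is a placeholder.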


Bioconductor Forum

James W. MacDonald 57k 1 week, 5 days ago United States Answer: Biomart’s getBM returns no genes for an existing GO-term in grch38, and less the Michael Love 33k 1 week, 6 days ago United States Answer: Normalizing 5′ Nascent RNA-seq data to identify differentially expressed transcr Kevin Blighe 3.3k 2 weeks, 2 days ago Republic…


Need suggestions about pathogenicity prediction of gdc level 3 SNV file

Hi, I am trying to figure out which tool is most accurate in terms of pathogenicity prediction of TCGA SNVs level 3 data. TCGA offers SIFT, PolyPhen, and IMPACT scores for different kinds of mutations. SIFT, and PolyPhen cover mainly “Missense Mutation”, while IMPACT categorizes every kind of mutation into…


What is the difference between GRCh37 and hs37? And hg19?

This is what I have found so far. Please correct me if I am wrong. GRCh37 w/o patches includes the primary assembly (22 autosomal, X, Y, and non-chromosomal supercontigs) and alternate scaffolds, but not a reference mitogenome. Non-chromosomal supercontigs are the unlocalized and unplaced scaffolds. The rCRS reference mitogenome in…


Disappearing CB, the bam tag after samtools sort -t CB

I’ve been trying to set up an analysis pipeline for RNA velocity in AWS EC2. I used one of the 10x datasets, 10k Peripheral blood mononuclear cells (PBMCs) from a healthy donor, Single Indexed, as a test model to set up the pipeline. For speed and cost saving, I first used samtools…


Finding differentially expressed lncRNA

Hi all, I’m trying to find differentially expressed lncRNAs between two groups (of humans). I wanted to use the following pipeline: trimmomatic --> stringtie/cufflinks --> Cuffmerge/stringtie merge --> FEELnc to find lncRNAs. To find diff. expressed transcripts I want to use the following pipeline: trimmomatic -->…


Finding 16 mer not present in GRCh38

Thanks for the question – it has kept me busy this Sunday morning / afternoon. As implied by others, this poses a computational challenge but is not insurmountable. For motif searching generally, I usually use AWK. My approach here was to: generate all possible k-mers of the chosen size (run…
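The first step named in the excerpt, generating all possible k-mers of a chosen size, can be sketched with awk by counting in base 4. This is only an illustration of that one step, not the author's script (which the excerpt truncates); the function name `all_kmers` is hypothetical.

```shell
#!/bin/sh
# all_kmers K: print every DNA k-mer of length K, one per line, in
# lexicographic order (A < C < G < T), by treating each index as a
# K-digit base-4 number.
all_kmers() {
    k=$1
    awk -v k="$k" '
        BEGIN {
            split("A C G T", base, " ")
            total = 4 ^ k
            for (i = 0; i < total; i++) {
                n = i
                kmer = ""
                # Peel off base-4 digits, least significant first,
                # and prepend the matching nucleotide.
                for (j = 0; j < k; j++) {
                    kmer = base[(n % 4) + 1] kmer
                    n = int(n / 4)
                }
                print kmer
            }
        }
    '
}

# Small demo: prints the 16 dinucleotides, AA through TT.
all_kmers 2
```

For k = 16 this enumerates 4^16 (about 4.3 billion) k-mers, so in practice the output would be streamed into the next step rather than written to disk in one piece.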
