Tag: GRCH38
Unexpected Ensembl-vep results
Hi. I got a VCF from an individual who shows symptoms of a known disease with known mutations. I ran it through Ensembl-vep, expecting to find some of those mutations in the results, yet all the consequences in the results are "intergenic_variant". The command I used was: --cache…
How to modify VCF file?
Hi community, I have a question: the SNP positions in my VCF file are on GRCh37/hg19, and I need to convert them to GRCh38. So I used UCSC liftOver to replace the hg19 positions with GRCh38 positions, deleted some SNPs, then sorted by position and saved the result to a new VCF…
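After liftOver, records are no longer guaranteed to be in coordinate order, so the body has to be re-sorted before compressing and indexing (on real files, bcftools sort does this). A minimal Python sketch of the sorting step, assuming an uncompressed, tab-separated VCF held in memory as a list of lines; it orders contigs by the ##contig header lines when present and otherwise by a hypothetical fallback order (1 to 22, X, Y, M):

```python
import re
from typing import List


def _fallback_key(chrom: str):
    """Hypothetical fallback order: 1..22 numerically, then X, Y, M/MT, then other contigs by name."""
    name = chrom[3:] if chrom.lower().startswith("chr") else chrom
    if name.isdigit():
        return (0, int(name), "")
    special = {"X": 0, "Y": 1, "M": 2, "MT": 2}
    if name in special:
        return (1, special[name], "")
    return (2, 0, name)


def sort_vcf_lines(lines: List[str]) -> List[str]:
    """Keep header lines in place and sort body records by (contig, POS)."""
    header = [l for l in lines if l.startswith("#")]
    body = [l for l in lines if l.strip() and not l.startswith("#")]

    # Prefer the contig order declared in ##contig=<ID=...> lines, when present.
    rank = {}
    for l in header:
        m = re.match(r"##contig=<ID=([^,>]+)", l)
        if m:
            rank.setdefault(m.group(1), len(rank))

    def key(record: str):
        chrom, pos = record.split("\t", 2)[:2]
        # Declared contigs sort first, in header order; undeclared ones follow.
        contig = (0, rank[chrom], 0, "") if chrom in rank else (1,) + _fallback_key(chrom)
        return contig, int(pos)

    return header + sorted(body, key=key)
```

Note that the ##contig lines themselves still describe GRCh37 after a liftOver of positions alone, so they should be replaced with GRCh38 contig definitions as well.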
python – Matching two files (VCF to MAF) using dictionaries, and appending the contents
annotation_file
##INFO=<ID=ClinVar_CLNSIG,Number=.,xxx
##INFO=<ID=ClinVar_CLNREVSTAT,Number=.,yyy
##INFO=<ID=ClinVar_CLNDN,Number=.zzz
#CHROM POS ID REF ALT QUAL FILTER INFO
chr1 10145 . AAC A 101.83 . AC=2;AF=0.067;AN=30;aaa
chr1 10146 . AC A 98.25 . AC=2;AF=0.083;AN=24;bbb
chr1 10146 . AC * 79.25 . AC=2;AF=0.083;AN=24;ccc
chr1 10439 . AC A 81.33 . AC=1;AF=0.008333;AN=120;ddd
chr1 10450 . T G 53.09…
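One way to do the matching is to index the annotation VCF in a dictionary keyed on (chrom, pos, ref, alt) and look up each MAF row in it. A minimal Python sketch with hypothetical helper names; the MAF column names (Chromosome, Start_Position, Reference_Allele, Tumor_Seq_Allele2) follow the GDC MAF convention and the rows are assumed to be read into dicts (e.g. with csv.DictReader). The VCF is assumed to be tab-separated. Only SNVs match reliably this way, because MAF writes indels without the VCF padding base (and with a shifted position), so indels would need an extra conversion step:

```python
from typing import Dict, Iterable, List


def parse_info(info: str) -> Dict[str, object]:
    """Split a VCF INFO string like 'AC=2;AF=0.067;DB' into a dict (flags map to True)."""
    out: Dict[str, object] = {}
    for item in info.split(";"):
        if not item or item == ".":
            continue
        key, sep, value = item.partition("=")
        out[key] = value if sep else True
    return out


def index_vcf(vcf_lines: Iterable[str]) -> Dict[tuple, Dict[str, object]]:
    """Map (chrom, pos, ref, alt) -> INFO dict; multi-allelic ALTs get one key per allele."""
    index = {}
    for line in vcf_lines:
        if line.startswith("#") or not line.strip():
            continue
        chrom, pos, _id, ref, alt, _qual, _filt, info = line.rstrip("\n").split("\t")[:8]
        parsed = parse_info(info)
        for allele in alt.split(","):
            index[(chrom, int(pos), ref, allele)] = parsed
    return index


def annotate_maf(maf_rows: List[dict], index: dict, fields: List[str]) -> List[dict]:
    """Append the requested INFO fields to each MAF row; '.' when there is no match."""
    for row in maf_rows:
        key = (row["Chromosome"], int(row["Start_Position"]),
               row["Reference_Allele"], row["Tumor_Seq_Allele2"])
        info = index.get(key, {})
        for f in fields:
            row[f] = info.get(f, ".")
    return maf_rows
```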
Seven Bridges, Brazilian Researchers Applying Graph Analysis to Build Diverse Reference Genome
CHICAGO – The latest of Seven Bridges Genomics’ efforts to diversify reference genomes is its largest and perhaps most complex to date, an attempt to address the Brazilian population. The Charlestown, Massachusetts-based bioinformatics company recently joined with the University of São Paulo (USP), the Associação Genomas Brasil (Brazil Genome Association),…
Standard for aligning smallRNA to a reference human rRNA?
Hi, I need to label some smallRNA sequences that I know are rRNA fragments. I know that in mRNA analyses these are discarded by aligning to the human genome and filtering out multimapped reads, but I need to try to pin…
Obtain equivalent variant ids (chr-pos-ref-alt) for GRCh37 and GRCh38
Hi all, I want to obtain the GRCh37 equivalent of a GRCh38 variant ID (chr-pos-ref-alt). This is to deal with some variants that lift over poorly. For example, the variant gnomad.broadinstitute.org/variant/10-17838942-A-G?dataset=gnomad_r3 has two equivalents in GRCh37. I want to…
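The liftover itself has to come from a tool or resource, but comparing its output requires parsing IDs consistently (with or without a chr prefix) and keeping every equivalent when one GRCh38 variant maps to several GRCh37 IDs. A minimal Python sketch with hypothetical helper names; the (GRCh38, GRCh37) ID pairs are assumed to come from whatever liftover source you use, and each ID is assumed to contain exactly four dash-separated fields:

```python
from collections import defaultdict
from typing import Dict, Iterable, List, NamedTuple, Tuple


class VariantId(NamedTuple):
    chrom: str
    pos: int
    ref: str
    alt: str


def parse_variant_id(vid: str) -> VariantId:
    """Parse a gnomAD-style 'chrom-pos-ref-alt' ID such as '10-17838942-A-G'."""
    chrom, pos, ref, alt = vid.split("-")
    # Drop a leading 'chr' so 'chr10-...' and '10-...' compare equal.
    if chrom.lower().startswith("chr"):
        chrom = chrom[3:]
    return VariantId(chrom, int(pos), ref.upper(), alt.upper())


def format_variant_id(v: VariantId) -> str:
    return f"{v.chrom}-{v.pos}-{v.ref}-{v.alt}"


def group_equivalents(pairs: Iterable[Tuple[str, str]]) -> Dict[str, List[str]]:
    """Collect every GRCh37 ID reported for each GRCh38 ID (one-to-many mappings)."""
    groups = defaultdict(set)
    for id38, id37 in pairs:
        key = format_variant_id(parse_variant_id(id38))
        groups[key].add(format_variant_id(parse_variant_id(id37)))
    return {k: sorted(v) for k, v in groups.items()}
```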
Genetic and chemotherapeutic influences on germline hypermutation
DNM filtering in the 100,000 Genomes Project. We analysed de novo mutations (DNMs) called in 13,949 parent–offspring trios from 12,609 families from the rare disease programme of the 100,000 Genomes Project. The rare disease cohort includes individuals with a wide array of diseases, including neurodevelopmental disorders, cardiovascular disorders, renal and urinary tract disorders, ophthalmological…
RefSeq Release 212 is available!
RefSeq Release 212 is now available online, from the FTP site and through NCBI's Entrez Programming Utilities (E-utilities). This full release incorporates genomic, transcript, and protein data available as of May 2, 2022, and contains 314,915,153 records, including 229,417,182 proteins, 44,805,833 RNAs, and sequences from 119,373 organisms. The release is provided in several directories…
Latest dbSNP VCF
This is the directory you're looking for: ftp.ncbi.nih.gov/snp/redesign/latest_release/VCF/
curl -s ftp.ncbi.nih.gov/snp/redesign/latest_release/VCF/GCF_000001405.39.gz | zcat | head
##fileformat=VCFv4.2
##fileDate=20210513
##source=dbSNP
##dbSNP_BUILD_ID=155
##reference=GRCh38.p13
##phasing=partial
##INFO=<ID=RS,Number=1,Type=Integer,Description="dbSNP ID (i.e. rs number)">
##INFO=<ID=GENEINFO,Number=1,Type=String,Description="Pairs each of gene symbol:gene id. The gene symbol and id are delimited by a colon (:) and each pair is delimited by a…
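Each ##INFO meta-line in that header declares one INFO key (ID, Number, Type, and a double-quoted Description that may itself contain commas). A minimal Python sketch that collects them into a dictionary keyed by ID, assuming well-formed VCF 4.x meta-lines; the helper names are hypothetical and this is not how any particular VCF library parses headers:

```python
import re
from typing import Dict, Iterable

# key=value pairs; a value is either a double-quoted string (may contain commas) or a bare token
_FIELD = re.compile(r'(\w+)=("(?:[^"\\]|\\.)*"|[^,>]*)')


def parse_info_header(line: str) -> Dict[str, str]:
    """Parse one '##INFO=<...>' meta-line into a dict of its key/value fields."""
    if not line.startswith("##INFO=<"):
        raise ValueError("not an INFO meta-line")
    body = line[len("##INFO=<"):].rstrip().rstrip(">")
    fields = {}
    for key, value in _FIELD.findall(body):
        if value.startswith('"'):
            value = value[1:-1].replace('\\"', '"')  # drop the surrounding quotes
        fields[key] = value
    return fields


def info_fields(header_lines: Iterable[str]) -> Dict[str, Dict[str, str]]:
    """Map INFO ID -> its parsed declaration, skipping all other header lines."""
    parsed = (parse_info_header(l) for l in header_lines if l.startswith("##INFO=<"))
    return {f["ID"]: f for f in parsed}
```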
On a reference pan-genome model (Part II)
On 12 July 2019, I wrote a blog post on a potential reference pan-genome model. I had more thoughts in mind but didn't write about them because they were immature. Nonetheless, a few readers raised questions related to those immature thoughts, so I decided to add this "Part II" as…
HTseq-Count: Long processing time
Hi everyone, I'm processing BAM files with htseq-count and it takes a very long time to produce the counts for each file. The data are paired-end reads (around 50 million sequences each). It takes 75 minutes to count one pair; is that normal? Thanks. htseq-count --max-reads-in-buffer=24000000000…
Pangenome-based genome inference allows efficient and accurate genotyping across a wide spectrum of variant classes
Sequencing data. We used publicly available sequencing data from the GIAB consortium [45], 1000 Genomes Project high-coverage data [46] and the Human Genome Structural Variation Consortium (HGSVC) [4]. All datasets include only samples consented for public dissemination of the full genomes. Statistics and reproducibility. For generating the assemblies, we used all 14 samples for…
Use RSEM and Bowtie2 to align paired-end sequences
I want to use rsem-calculate-expression with the bowtie2 aligner to align paired-end sequences under the following conditions: 2 processors; generate the BAM file very fast; bowtie2 sensitivity; append gene/transcript names. My code:
rsem-refseq-extract-primary-assembly GCF_000001405.31_GRCh38.p5_genomic.fna GCF_000001405.31_GRCh38.p5_genomic.primary_assembly.fna
rsem-prepare-reference --gff3 GCF_000001405.31_GRCh38.p5_genomic.gff --bowtie2 --bowtie2-path /bowtie2-2.4.5-py39hd2f7db1_2 --trusted-sources…
Complete sequence of human genome published in landmark achievement
Researchers belonging to the Telomere-to-Telomere (T2T) consortium have published the complete sequence of the human genome, filling in gaps present in previous versions. Previously published sequences accounted for 92% of the human genome and were…
Mapped reference id is not an id of the genome file genome_nowhitespace.fa
Hi everyone, I'm trying to run the nf-co.re/smrnaseq pipeline and I'm having a problem with miRDeep2.
Command: nextflow run nf-core/smrnaseq -profile ijcluster --input /home/794_both.fastq.gz --outdir /home/results --genome GRCh38 --protocol qiaseq --mature mirbase.org/ftp/CURRENT/mature.fa.gz --hairpin mirbase.org/ftp/CURRENT/hairpin.fa.gz
Error message: Command…
Human genome: they manage to decipher its “grey zone”
An international research consortium has published the first complete sequence of the human genome, which reveals new genes and will shed light on hereditary diseases and human evolution, scientific sources reported yesterday. Until now, all genomic studies have used as their reference the human genome sequence produced more…
subpopulations available in MafH5.gnomAD.v3.1.1.GRCh38
Are subpopulation MAFs available for gnomAD v3.1.1 in any package, as they are in MafDb.gnomAD.r2.1.hs37d5? I'm trying to use GenomicScores to obtain all variants in a genomic range with MAF >= cutoff in any subpopulation. I tried…
human genome files
Hi all, I have two questions. What is the main difference between the two genome files (Homo_sapiens.GRCh38.dna.primary_assembly.fa and Homo_sapiens.GRCh38.dna.fa) in the Ensembl database? Which one should I use for whole-exome sequencing alignment? I used Homo_sapiens.GRCh38.dna.fa for the alignment, and later on,…
A genome-scale screen for synthetic drivers of T cell proliferation
Abramson, J. S. et al. Transcend NHL 001: immunotherapy with the CD19-directed CAR T-cell product JCAR017 results in high complete response rates in relapsed or refractory B-cell non-Hodgkin lymphoma. Blood 128, 4192–4192 (2016).
Shifrut, E. et al. Genome-wide CRISPR screens in primary human T cells reveal key regulators…
Transcriptional kinetics and molecular functions of long noncoding RNAs
Ethical compliance. The research carried out in this study has been approved by the Swedish Board of Agriculture, Jordbruksverket: N343/12. Cell culture. Mouse primary fibroblasts were derived from adult (>10 weeks old) CAST/EiJ × C57BL/6J or C57BL/6J × CAST/EiJ mice by skinning, mincing and culturing tail explants (for at least 10 d) in DMEM high…
Accelerating minimap2 for long-read sequencing applications on modern CPUs
Chaisson, M. J. et al. Multi-platform discovery of haplotype-resolved structural variation in human genomes. Nat. Commun. 10, 1–16 (2019).
Conesa, A. et al. A survey of best practices for RNA-seq data analysis. Genome Biol. 17, 1–19 (2016).
Beyter, D. et al. Long-read sequencing of…
Extracellular circulating miRNAs as stress-related signature to search and rescue dogs
Study approval was provided by the Research Ethics Committee of the University of Perugia (report n.2018-21 of 11/12/2018) according to Italian Ministry of Health legislation [18]. All methods were carried out following relevant guidelines and regulations, and the study was carried out in compliance with the ARRIVE guidelines. Informed consent is…
rs532111960 RefSNP Report – dbSNP
The Variant Details tab shows known variant placements on genomic sequences: chromosomes (NC_), RefSeqGene, pseudogenes or genomic regions (NG_), and in a separate table: on transcripts (NM_) and protein sequences (NP_). The corresponding transcript and protein locations are listed in adjacent lines, along with molecular consequences from Sequence Ontology. When…
Use the TCGAbiolinks package to download TCGA data
For downloading TCGA data, the RTCGA package should be easier to use, and because it ships already-downloaded data, it is relatively stable to use. But for the same reason, there is no guarantee that the data are up to date. The TCGAbiolinks package is…
Vertical stratification of the air microbiome in the lower troposphere
Significance Large-scale meteorological and biological data demonstrate the vertical stratification of airborne biomass. The previously described diel cycle of airborne microorganisms is shown to disappear at height. Atmospheric turbulence and stratification are shown to be defining factors for the scale and boundaries, dynamics, and natural variability of airborne biomass, resulting…
Extract longest transcript or longest CDS transcript from GTF annotation file or GENCODE transcripts FASTA file.
There are four methods to extract the longest transcript, or the longest CDS region with the longest transcript, from a transcripts FASTA file or a GTF file:
1. Extract the longest transcript from a GENCODE transcripts FASTA file.
2. Extract the longest transcript from a GTF annotation file based on the GENCODE/Ensembl/UCSC database.
3. Extract the longest CDS region with the longest…
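For method 1, a minimal Python sketch, assuming GENCODE-style pipe-delimited headers in which the first field is the transcript ID and the second is the gene ID (as in the GENCODE transcripts FASTA). It measures length from the sequence itself rather than from a header field; the function names are illustrative and not taken from the original post:

```python
from typing import Dict, Iterable, Iterator, Tuple


def read_fasta(lines: Iterable[str]) -> Iterator[Tuple[str, str]]:
    """Yield (header, sequence) pairs; the header excludes the leading '>'."""
    header, chunks = None, []
    for line in lines:
        line = line.rstrip("\n")
        if line.startswith(">"):
            if header is not None:
                yield header, "".join(chunks)
            header, chunks = line[1:], []
        elif header is not None:
            chunks.append(line.strip())
    if header is not None:
        yield header, "".join(chunks)


def longest_transcript_per_gene(lines: Iterable[str]) -> Dict[str, Tuple[str, str]]:
    """Map gene ID -> (header, sequence) of its longest transcript.

    Assumes field 0 of the '|'-separated header is the transcript ID and
    field 1 is the gene ID; on ties the first transcript seen is kept.
    """
    best: Dict[str, Tuple[str, str]] = {}
    for header, seq in read_fasta(lines):
        gene_id = header.split("|")[1]
        if gene_id not in best or len(seq) > len(best[gene_id][1]):
            best[gene_id] = (header, seq)
    return best
```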
rs9789283 RefSNP Report – dbSNP
Ensembl VEP gnomAD annotated allele frequencies different from gnomAD browser
I’ve annotated some variants using VEP, and was looking at the minor allele frequencies. Some of the variants had very different MAFs in the annotation than I expected (I expected MAF < 1%, whereas some annotated MAFs were >50%). I looked up the same variants on the gnomAD v3 browser,…
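Frequency annotations of this kind usually report the frequency of the ALT allele, which is not the same as the minor allele frequency. If that is what the annotation reports here (an assumption, not a diagnosis of this post), a value above 0.5 simply means the ALT allele is the more common one and the reference base is the minor allele. A minimal Python sketch of the conversion for a biallelic site; the helper names are hypothetical:

```python
def minor_allele_frequency(alt_af: float) -> float:
    """Convert a biallelic ALT allele frequency into a minor allele frequency."""
    if not 0.0 <= alt_af <= 1.0:
        raise ValueError(f"allele frequency out of range: {alt_af}")
    return min(alt_af, 1.0 - alt_af)


def alt_is_major(alt_af: float) -> bool:
    """True when ALT is the more common allele, i.e. the reference base is the minor one."""
    return alt_af > 0.5
```

For multi-allelic sites, the per-ALT frequencies no longer sum with the reference frequency to 1 in such a simple way, so this two-allele conversion does not apply directly.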
bwa: 2 FASTQ files to 1 SAM
I have this problem, please help me. I'm also trying this on macOS Catalina. I am creating a SAM file from 2 FASTQ files with bwa, using the following command:
bwa mem -t 2 GRCh38.primary_assembly.genome.fa.gz V350019555_L03_B5GHUMqcnrRAABA-556_1.fq.gz V350019555_L03_B5GHUMqcnrRAABA-556_2.fq.gz > V350019555_L03_B5GHUMqcnrRAABA-556.sam…
Seven technologies to watch in 2022
The Telomere-to-Telomere Consortium is sequencing whole chromosomes. From gene editing to protein-structure determination to quantum computing, here are seven technologies that are likely to have an impact on science in the year ahead.
Fully finished genomes
Roughly one-tenth of the human genome remained uncharted when genomics researchers…
variant – Where should you put your cache for ensembl-vep using conda
I've installed VEP with conda like so:
conda install ensembl-vep=105.0-0
And then I installed the human cache like so:
vep_install -a cf -s homo_sapiens -y GRCh38 -c /mnt/gpfs/live/rd01__/ritd-ag-project-rd018o-mdflo13/refs/vep --CONVERT
But when I try to run vep, I get an error:
vep --dir_cache /mnt/gpfs/live/rd01__/ritd-ag-project-rd018o-mdflo13/refs/vep -i /mnt/gpfs/live/rd01__/ritd-ag-project-rd018o-mdflo13/data/test/manual/results/variants/cohort.norm_recalibrated.vcf -o /mnt/gpfs/live/rd01__/ritd-ag-project-rd018o-mdflo13/data/test/manual/results/variants/cohort.norm_recalibrated_vep.vcf
Am I doing…
linux – How to fix Perl from Anaconda not installing BioPerl? Bailing out the installation for BioPerl
vep -i examples/homo_sapiens_GRCh38.vcf --database
Can't locate Bio/PrimarySeqI.pm in @INC (you may need to install the Bio::PrimarySeqI module) (@INC contains: /home/youssef/anaconda3/envs/ngs1/share/ensembl-vep-88.9-0/modules /home/youssef/anaconda3/envs/ngs1/share/ensembl-vep-88.9-0 /home/youssef/anaconda3/envs/ngs1/lib/site_perl/5.26.2/x86_64-linux-thread-multi /home/youssef/anaconda3/envs/ngs1/lib/site_perl/5.26.2 /home/youssef/anaconda3/envs/ngs1/lib/5.26.2/x86_64-linux-thread-multi /home/youssef/anaconda3/envs/ngs1/lib/5.26.2 .) at /home/youssef/anaconda3/envs/ngs1/share/ensembl-vep-88.9-0/Bio/EnsEMBL/Slice.pm line 75.
BEGIN failed--compilation aborted at /home/youssef/anaconda3/envs/ngs1/share/ensembl-vep-88.9-0/Bio/EnsEMBL/Slice.pm line 75.
Compilation failed in require at /home/youssef/anaconda3/envs/ngs1/share/ensembl-vep-88.9-0/Bio/EnsEMBL/Feature.pm line 84.
BEGIN failed--compilation aborted at /home/youssef/anaconda3/envs/ngs1/share/ensembl-vep-88.9-0/Bio/EnsEMBL/Feature.pm…
links to Ensembl GRCh37 – gitmetadata
Open Targets Genetics reports GRCh38 coordinates, but the "External references" section points to GRCh37 (grch37.ensembl.org) rather than GRCh38 (www.ensembl.org): genetics.opentargets.org/variant/8_102432699_T_C Was this a deliberate decision (e.g. we don't have the rsID in GRCh38 for some reason, or something else)? If so, we need to make this clear in the docs. If not, we…
Failure to detect mutations in U2AF1 due to changes in the GRCh38 reference sequence
Materials and Methods. Genomic data were collected as part of the MDS Natural History Study or The Cancer Genome Atlas project and consented appropriately under those protocols [8]. Reference 8: Sekeres M.A., Gore S.D., Stablein D.M., DiFronzo N., Abel G.A., DeZern A.E., Troy J.D., Rollison D.E., Thomas J.W., Waclawiw M.A., Liu J.J.…
Protocols
Normalization / data transformation protocol: 10x Genomics Visium sequencing data were aligned and quantified using the Space Ranger Software Suite (version 1.0.0, 10x Genomics Inc) with the GRCh38 human reference genome (official Cell Ranger reference, version 3.0.0). Spots were manually aligned to the paired H&E images by 10x Genomics.
Nucleic acid…
[lh3/minimap2] Memory leak when using Python and threads
The program align.py uses mappy to align reads in Python using multiple worker threads. After the index is loaded, memory usage jumps quickly to >20 GB and then climbs steadily through 40 GB and beyond. This issue was first discovered in bonito and isolated to mappy. The data flow…
VEP issue: ERROR: Cache assembly version (GRCh37) and database or selected assembly version (GRCh38) do not match
Describe the issue: VEP gives errors even though my query and reference have the same assembly version.
Command: ./vep -i examples/homo_sapiens_GRCh37.vcf --cache --refseq
Cache reference details while running INSTALL.pl:
? 458
NB: Remember to use --refseq when running the VEP with this cache!
downloading ftp.ensembl.org/pub/release-104/variation/indexed_vep_cache/homo_sapiens_refseq_vep_104_GRCh37.tar.gz
unpacking homo_sapiens_refseq_vep_104_GRCh37.tar.gz
converting cache, this may…
hg38 Import custom reference upload error
Our version of TS is 5.12.2. When trying to upload a new custom reference FASTA (downloaded from NCBI: ftp.ncbi.nlm.nih.gov/genomes/all/GCA/000/001/405/GCA_000001405.15_GRCh38/seqs_for_alignment_pipelines.ucsc_ids/GCA_000001405.15_GRCh38_no_alt_analysis_set.fna.gz, gunzipped and renamed to hg38.fasta) through "Import custom reference" in the interface, an error occurs: "uploaded file size is incorrect" (to be honest, the error was not shown in the logs, because of TypeError…
Bioconductor – BSgenome.Hsapiens.NCBI.GRCh38
DOI: 10.18129/B9.bioc.BSgenome.Hsapiens.NCBI.GRCh38
This package is for version 3.11 of Bioconductor; for the stable, up-to-date release version, see BSgenome.Hsapiens.NCBI.GRCh38.
Full genome sequences for Homo sapiens (GRCh38)
Bioconductor version: 3.11
Full genome sequences for Homo sapiens (Human) as provided by NCBI (GRCh38, 2013-12-17) and stored in Biostrings objects.
Author: The…
Attempting to generate a bam.bai file but the output is not readable
Hi, I am new to exome sequencing and have tried to follow tutorials on the subject. I am stuck at the samtools index stage because the output files are in a non-human-readable format and I believe…
Why does a single-cell R2 FASTQ have no reads identified by bowtie2?
When we input the R2 fastq.gz into bowtie2, human sequence was not removed (${base}_host_removed is zero).
for i in $(find ./ -type f -name ".fastq.gz" | while read F; do basename $F | rev | cut -c…
Trouble running vcf2bam jvarkit tool
I am trying to use the vcf2bam tool from jvarkit on a server, with the following 2 files: GRCh38_latest_genomic.fna (a FASTA file) and 00-common_all.vcf. I used samtools faidx and also Picard CreateSequenceDictionary, but when I try…
cellranger count DETECT_COUNT_CHEMISTRY (failed)
I am learning scRNA-seq, and the tutorial I am following uses a dataset (1k PBMCs from a healthy donor) from the 10x Genomics website. I downloaded the FASTQ and reference transcriptome files and ran the following command:
cellranger-6.1.1/cellranger count --id pbmc_1k_v2_example --transcriptome /home/murat/Share/single_cell/refdata-gex-GRCh38-2020-A --fastqs /home/murat/Share/single_cell/pbmc_1k_v2_fastqs
I get the following message:
Martian Runtime…
Is the Ensembl GRCh38 genome assembly more up to date than the UniProtKB online database?
Dear all, I am working with a list of Ensembl accession codes for a group of proteins of interest. I have downloaded the protein annotations for the GRCh38 genome assembly. I fetched the genomic coordinates from the UniProtKB API service using the Ensembl accession codes. The service provides a protein annotation…
Alternate nucleotide is more frequent than reference nucleotide. OMG I’m dizzy. How do I stop the twirl?
This is due to the fact that the very reference genomes that we use for re-alignment are themselves based on individuals who carry rare risk alleles. Thus, when we call variants against these genomes, we are, at many loci, comparing against rare disease risk alleles. As the best/worst example (depending…
mixing hg38 and GRCh38 during variant calling
Hello everyone! I've been working on a variant calling pipeline for WES data and used a mix of hg38 and GRCh38 reference files after reading that hg38 is just an abbreviation of GRCh38, and that they refer to the same thing. But…
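Even when two reference files share the GRCh38 primary-assembly coordinates, they can differ in contig naming (UCSC-style chr1 to chrM versus Ensembl-style 1 to MT) and in which alt, decoy and unplaced contigs they include, and tools often fail or silently drop records when names do not match. A minimal Python sketch that flags mixed naming across inputs (for example the contig names from each .fai, VCF ##contig lines or BAM @SQ lines); the helper names are hypothetical, and it only handles primary chromosomes, not alt or unplaced contig names:

```python
from typing import Iterable


def ucsc_to_ensembl(name: str) -> str:
    """'chr1' -> '1', 'chrM' -> 'MT'; primary chromosomes only."""
    if name == "chrM":
        return "MT"
    return name[3:] if name.startswith("chr") else name


def ensembl_to_ucsc(name: str) -> str:
    """'1' -> 'chr1', 'MT' -> 'chrM'; primary chromosomes only."""
    if name == "MT":
        return "chrM"
    return name if name.startswith("chr") else "chr" + name


def naming_style(contigs: Iterable[str]) -> str:
    """Return 'ucsc', 'ensembl', 'mixed', or 'empty' for a collection of contig names."""
    styles = {"ucsc" if c.startswith("chr") else "ensembl" for c in contigs}
    if not styles:
        return "empty"
    return styles.pop() if len(styles) == 1 else "mixed"
```

Running naming_style on the combined contig names of every input before variant calling is a quick way to catch a mixed setup early.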
snpEff not able to download GRCh38?
Hi, why is snpEff not able to download GRCh38? It always shows an error, but it works well with the GRCh37 reference. Thanks for your comments.
likithreddy@Curium:~/Downloads/snpEff_latest_core/snpEff$ java -jar snpEff.jar download GRCh38.76
java.lang.RuntimeException: Property: 'GRCh38.76.genome' not found
at org.snpeff.interval.Genome.<init>(Genome.java:106)
at org.snpeff.snpEffect.Config.readGenomeConfig(Config.java:681)
at org.snpeff.snpEffect.Config.readConfig(Config.java:649)
at…
Highly mapped to introns
Hi, I am analyzing RNA-seq data from human blood samples. I checked the read distribution using RSeQC read_distribution after mapping with STAR. Usually, I get more than 80% of reads mapped to exons. However, this time the result showed that only a few percent were mapped…
Editing header of a fasta file
Hello everybody, I've been using sed, but only for simple steps, and now I can't do this. I have this header:
>ENSP00000451042.1 pep chromosome:GRCh38:14:22438547:22438554:1 gene:ENSG00000223997.1 transcript:ENST00000415118.1 gene_biotype:TR_D_gene transcript_biotype:TR_D_gene gene_symbol:TRDD1 description:T cell receptor delta diversity 1 [Source:HGNC Symbol;Acc:HGNC:12254]
and I would like to obtain this:…
Phasing with SHAPEIT
Edit June 7, 2020: The code below is for pre-phasing with SHAPEIT2. For phased imputation using the output of SHAPEIT2, and the final production of phased VCFs, see my answer here: "A: ERROR: You must specify a valid interval for imputation using the -int argument". So, the steps are usually: pre-phasing…
Produce PCA bi-plot for 1000 Genomes Phase III
Note 1: Previous version: Produce PCA bi-plot for 1000 Genomes Phase III in VCF format (old)
Note 2: This data is for hg19 / GRCh37
Note 3: GRCh38 data is available HERE
The tutorial has been updated based on the 1000 Genomes Phase III imputed genotypes. The original tutorial was…
Annotate Structural variants with population specific allele frequency values
Hi, has anyone tried filtering structural variants based on population-specific allele frequency (AF) values (for example gnomAD-SV or phase 3 1000 Genomes SVs)? I have a set of SVs that I detected using a multipronged approach. For prioritising variants,…
Where can I get, or how can I make, a mappability track for the hg38 assembly?
Lucky you @manojmumar_bhosale, I worked on a similar problem recently and therefore have a bash script you can use. Required tools: GEM library from here; UCSC's wigToBigWig from here (I chose the binary for Linux 64…
Running htseq-count to “grab” long non coding gene_id names
Hi all, I'm new to bioinformatics, so bear with me. I am trying to find long non-coding RNAs in RNA-seq data. When I checked the human GTF file, there were 2 different types of long non-coding RNA, "lnc_RNA" and "lncRNA",…
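Which attribute carries which spelling depends on the annotation source, so one option is to accept both labels when building the list of lncRNA gene IDs and then use that list to subset the htseq-count table. A minimal Python sketch with hypothetical helper names; the biotype attribute names it checks (gene_type, gene_biotype, transcript_type, transcript_biotype) are an assumption to adjust to your GTF:

```python
import re
from typing import Dict, Iterable, Set

# key "value" pairs in GTF column 9, e.g. gene_id "ENSG..."; gene_type "lncRNA";
_ATTR = re.compile(r'([A-Za-z_]\w*)\s+"([^"]*)"')
LNC_LABELS = {"lncRNA", "lnc_RNA"}
BIOTYPE_KEYS = ("gene_type", "gene_biotype", "transcript_type", "transcript_biotype")


def parse_attributes(column9: str) -> Dict[str, str]:
    """Parse the GTF attribute column into a dict (last value wins for repeated keys)."""
    return dict(_ATTR.findall(column9))


def lnc_gene_ids(gtf_lines: Iterable[str]) -> Set[str]:
    """gene_id values of records whose biotype attribute is 'lncRNA' or 'lnc_RNA'."""
    ids = set()
    for line in gtf_lines:
        if line.startswith("#") or not line.strip():
            continue
        cols = line.rstrip("\n").split("\t")
        if len(cols) < 9:
            continue
        attrs = parse_attributes(cols[8])
        gene_id = attrs.get("gene_id")
        if gene_id and any(attrs.get(k) in LNC_LABELS for k in BIOTYPE_KEYS):
            ids.add(gene_id)
    return ids
```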
UCSC liftover
Hi, I'm using UCSC liftOver to convert hg19 to hg38, and I got a result I don't understand:
Feb. 2009 (GRCh37/hg19) → Dec. 2013 (GRCh38/hg38): chr1:120904787 → chr1:143905854
Dec. 2013 (GRCh38/hg38) → Feb. 2009 (GRCh37/hg19): chr1:143905854 → chr1:149400430
(I didn't check "Allow multiple output regions".)…
Fasta.fai file error
Hi, I have been struggling with an error in bedtools intersect. The command I am trying to run is:
bedtools intersect -a sorted.vcf -b nstd166.GRCh38.variant_call_chr.vcf.gz -wo -sorted -f 0.8 -r -g Homo_sapiens_assembly38.fasta.fai
For some of the files that I am assessing, I don't get…
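Two separate requirements are packed into that command. With -f 0.8 -r, a pair is reported only when the overlap covers at least 80% of both the A and the B feature; with -sorted plus -g, both inputs are expected to follow the chromosome order of the genome file, so a contig that is missing from the .fai or out of order (for example chr-prefixed names in one file but not the other) is one possible cause of the error. A minimal Python sketch of both checks, with hypothetical helper names; it works on BED-style half-open intervals, so VCF records would first need their end coordinates derived (not shown):

```python
from typing import Dict, Iterable, Optional, Tuple

Interval = Tuple[str, int, int]  # (chrom, start, end), BED-style half-open


def reciprocal_overlap(a: Interval, b: Interval, fraction: float = 0.8) -> bool:
    """True when the overlap covers at least `fraction` of both a and b."""
    if a[0] != b[0]:
        return False
    overlap = min(a[2], b[2]) - max(a[1], b[1])
    if overlap <= 0:
        return False
    return overlap >= fraction * (a[2] - a[1]) and overlap >= fraction * (b[2] - b[1])


def contig_order(fai_lines: Iterable[str]) -> Dict[str, int]:
    """Map contig name -> rank using the first column of a .fai index."""
    names = [line.split("\t")[0] for line in fai_lines if line.strip()]
    return {name: i for i, name in enumerate(names)}


def first_unsorted(records: Iterable[Tuple[str, int]],
                   order: Dict[str, int]) -> Optional[Tuple[str, int]]:
    """Return the first (chrom, pos) that is absent from `order` or breaks its sort order, else None."""
    prev = None
    for chrom, pos in records:
        if chrom not in order:
            return chrom, pos
        key = (order[chrom], pos)
        if prev is not None and key < prev:
            return chrom, pos
        prev = key
    return None
```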
How to pass custom software specific variables to nf-core/sarek nextflow pipeline?
I'm attempting to call whole-genome variants using the nf-core/sarek Nextflow pipeline. In the QC step there is an option that invokes trim_galore quality trimming, but I don't know how to pass my custom adapters to be cut as well…
List of human protein coding genes with given name (known function?)
Hello, to put it simply, I am doing differential expression analysis on human RNA-seq data and I want to focus my analysis on genes that are: 1) protein coding, so no SNOR or MIR; 2) genes with a…
Bioconductor – BSgenome.Hsapiens.UCSC.hg38.dbSNP151.major
DOI: 10.18129/B9.bioc.BSgenome.Hsapiens.UCSC.hg38.dbSNP151.major
Full genome sequences for Homo sapiens (UCSC version hg38, based on GRCh38.p12) with injected major alleles (dbSNP151)
Bioconductor version: Release (3.13)
Full genome sequences for Homo sapiens (Human) as provided by UCSC (hg38, based on GRCh38.p12) with major allele injected from dbSNP151, and stored in Biostrings…
cellranger count help
These were the original data from the sequencing company; I then compressed these files into two files, R1 and R2.
/data01/chenyu/sc/cellranger-6.1.1/cellranger count --id=cellranger_szdxb015 --fastqs=/data01/chenyu/sc/sortData/191527A_SZdxb01_5 --sample=SZdxb015 --transcriptome=/data01/chenyu/sc/refdata-gex-GRCh38-2020-A --localcores=20 --nosecondary
An error occurred: FASTQ header mismatch detected at line 4 of input files "/data01/chenyu/sc/sortData/191527A_SZdxb01_5/SZdxb015_S1_L001_R1_001.fastq.gz" and "/data01/chenyu/sc/sortData/191527A_SZdxb01_5/SZdxb015_S1_L001_R2_001.fastq.gz": file: "/data01/chenyu/sc/sortData/191527A_SZdxb01_5/SZdxb015_S1_L001_R1_001.fastq.gz", line: 4…
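A header mismatch generally means that record N in R1 and record N in R2 do not name the same read. One plausible cause here (an assumption, since the compression step is not shown) is that several source files were concatenated into R1 and R2 in different orders. A minimal Python sketch to locate the first mismatching pair; the helper names are hypothetical, and it assumes strict 4-line FASTQ records whose read name is the first token of the header, optionally ending in /1 or /2:

```python
import gzip
from itertools import zip_longest
from typing import Iterable, Iterator, Optional, Tuple


def read_names(lines: Iterable[str]) -> Iterator[Tuple[int, str]]:
    """Yield (line_number, read_name) for each FASTQ header line."""
    for i, line in enumerate(lines):
        if i % 4:
            continue  # only every 4th line is a header in strict 4-line FASTQ
        token = line.split(maxsplit=1)[0]
        if not token.startswith("@"):
            raise ValueError(f"line {i + 1} is not a FASTQ header: {token!r}")
        name = token[1:]
        if name.endswith(("/1", "/2")):
            name = name[:-2]
        yield i + 1, name


def first_mismatch(r1_lines: Iterable[str],
                   r2_lines: Iterable[str]) -> Optional[Tuple[int, Optional[str], Optional[str]]]:
    """Return (line_number, r1_name, r2_name) for the first pair whose names differ, or None."""
    for a, b in zip_longest(read_names(r1_lines), read_names(r2_lines)):
        if a is None or b is None or a[1] != b[1]:
            line_no = (a or b)[0]
            return line_no, a[1] if a else None, b[1] if b else None
    return None


def first_mismatch_gz(r1_path: str, r2_path: str):
    """Same check on gzipped FASTQ files, streamed line by line."""
    with gzip.open(r1_path, "rt") as r1, gzip.open(r2_path, "rt") as r2:
        return first_mismatch(r1, r2)
```

If the first mismatch appears right at the start of the files, the concatenation order is the first thing to check; a mismatch in the middle suggests one input file is truncated or missing from one side.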
How to download the Homo_sapiens.GRCh38.100.gtf and Homo_sapiens.GRCh38.dna.primary_assembly.fa files for my analysis?
I am trying to perform STAR alignment and I need the reference files for indexing. I would like to know how to download the Homo_sapiens.GRCh38.100.gtf and Homo_sapiens.GRCh38.dna.primary_assembly.fa files so that I can use them with my following indexing code…
Linearize fasta files
Program versions used: BBMap v. 38.32, Seqtk v. 1.3-r106, Seqkit v. 0.8.1, Perl v. 5.16.3, Python v. 3.6.6, sed v. 2.2.2.
$ time (cat Homo_sapiens.GRCh38.dna.primary_assembly.fa > /dev/null)
real 0m1.050s
user 0m0.002s
sys 0m1.045s
With BBMap (reformat.sh):
$ time (reformat.sh -Xmx40g in=Homo_sapiens.GRCh38.dna.primary_assembly.fa fastawrap=0)
java -ea -Xmx40g -cp bbmap/current/ jgi.ReformatReads…
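For comparison with the benchmarked tools, a plain-Python linearizer could look like the sketch below. It streams the input and emits each sequence on a single line without holding a whole chromosome in memory; it is a minimal illustration, not one of the tools timed above, and is likely to be slower than compiled tools such as seqtk. The function names are hypothetical:

```python
import sys
from typing import Iterable, Iterator


def linearize_fasta(lines: Iterable[str]) -> Iterator[str]:
    """Yield output chunks so that each FASTA sequence ends up on one line.

    Sequence lines are passed through as they arrive (no per-record buffering);
    a newline is emitted only when the next header or the end of input is reached.
    """
    in_seq = False
    for line in lines:
        line = line.rstrip("\r\n")
        if line.startswith(">"):
            if in_seq:
                yield "\n"  # terminate the previous sequence line
            yield line + "\n"
            in_seq = False
        elif line:
            yield line
            in_seq = True
    if in_seq:
        yield "\n"


def main() -> None:
    """Read FASTA from stdin and write the linearized FASTA to stdout."""
    sys.stdout.writelines(linearize_fasta(sys.stdin))
```

A script that calls main() can be run as python linearize.py < Homo_sapiens.GRCh38.dna.primary_assembly.fa > linear.fa (the script name is hypothetical). A header with no sequence is written without an empty sequence line after it.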
Bioconductor Forum
Answer: Biomart's getBM returns no genes for an existing GO-term in grch38, and less the… (James W. MacDonald)
Answer: Normalizing 5′ Nascent RNA-seq data to identify differentially expressed transcr… (Michael Love)
Kevin Blighe…
Need suggestions about pathogenicity prediction of gdc level 3 SNV file
Hi, I am trying to figure out which tool is most accurate for pathogenicity prediction of TCGA level 3 SNV data. TCGA offers SIFT, PolyPhen, and IMPACT scores for different kinds of mutations. SIFT and PolyPhen mainly cover "Missense Mutation", while IMPACT categorizes every kind of mutation into…
What is the difference between GRCh37 and hs37? And hg19?
This is what I have found so far. Please correct me if I am wrong. GRCh37 without patches includes the primary assembly (22 autosomes, X, Y, and non-chromosomal supercontigs) and alternate scaffolds, but not a reference mitogenome. Non-chromosomal supercontigs are the unlocalized and unplaced scaffolds. The rCRS reference mitogenome in…
Disappearing CB, the bam tag after samtools sort -t CB
I've been trying to set up an analysis pipeline for RNA velocity on AWS EC2. I used one of the 10x datasets, 10k Peripheral blood mononuclear cells (PBMCs) from a healthy donor, Single Indexed, as a test case for setting up the pipeline. For speed and cost savings, I first used samtools…
Finding differentially expressed lncRNA
Hi all, I'm trying to find differentially expressed lncRNAs between two groups (of humans). I wanted to use the following pipeline to find lncRNAs: trimmomatic -> stringtie/cufflinks -> Cuffmerge/stringtie merge -> FEELnc. To find differentially expressed transcripts, I want to use the following pipeline: trimmomatic ->…
Finding 16 mer not present in GRCh38
Thanks for the question – it has kept me busy this Sunday morning / afternoon. As implied by others, this poses a computational challenge but is not insurmountable. For motif searching generally, I usually use AWK. My approach here was to: generate all possible k-mers of the chosen size (run…
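The answer above enumerates candidate k-mers and searches for them with AWK. A different technique, sketched below in Python, is to mark every k-mer that does occur in a bitset indexed by its 2-bit encoding (A=0, C=1, G=2, T=3) and then report the unset bits. For k = 16 the bitset needs 4^16 bits (512 MiB), which is feasible, but a pure-Python pass over roughly 3 Gb of sequence would be very slow, so treat this as an illustration on small inputs. Counting reverse complements as present and skipping windows that contain N are assumptions about what "not present" should mean; the function names are hypothetical:

```python
from typing import Iterable, List, Optional

_CODE = {"A": 0, "C": 1, "G": 2, "T": 3}


def present_kmers(seqs: Iterable[str], k: int, both_strands: bool = True) -> bytearray:
    """Bitset with bit i set when the k-mer whose 2-bit encoding is i occurs in seqs."""
    n = 4 ** k
    seen = bytearray((n + 7) // 8)
    mask = n - 1
    top = 2 * (k - 1)
    for seq in seqs:
        fwd = rev = run = 0
        for base in seq.upper():
            code = _CODE.get(base)
            if code is None:  # N or another ambiguity code: restart the window
                fwd = rev = run = 0
                continue
            fwd = ((fwd << 2) | code) & mask
            # Reverse complement: the complement of the newest base becomes the first letter.
            rev = (rev >> 2) | ((3 - code) << top)
            run += 1
            if run >= k:
                seen[fwd >> 3] |= 1 << (fwd & 7)
                if both_strands:
                    seen[rev >> 3] |= 1 << (rev & 7)
    return seen


def decode(code: int, k: int) -> str:
    """Turn a 2-bit encoded k-mer back into its bases."""
    return "".join("ACGT"[(code >> (2 * (k - 1 - i))) & 3] for i in range(k))


def absent_kmers(seqs: Iterable[str], k: int, both_strands: bool = True,
                 limit: Optional[int] = None) -> List[str]:
    """k-mers (in A<C<G<T order) that never occur in seqs, optionally stopping after `limit`."""
    seen = present_kmers(seqs, k, both_strands)
    missing = []
    for code in range(4 ** k):
        if not (seen[code >> 3] >> (code & 7)) & 1:
            missing.append(decode(code, k))
            if limit is not None and len(missing) >= limit:
                break
    return missing
```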