Tag: GCA

Convert list of Accession Numbers to Full Taxonomy

Using NCBI Entrez Direct:

$ esearch -db assembly -query "GCA_000005845" | elink -target taxonomy | efetch -format native -mode xml | grep ScientificName | awk -F ">|<" 'BEGIN{ORS=", ";}{print $3;}'
Escherichia coli str. K-12 substr. MG1655, cellular organisms, Bacteria, Proteobacteria, Gammaproteobacteria, Enterobacterales, Enterobacteriaceae, Escherichia, Escherichia coli, Escherichia coli K-12,

If…
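
The excerpt shows a single accession, while the title asks about a list. The sketch below is one way the same pipeline might be wrapped for reuse. The function names lineage_from_xml and accession_to_lineage and the file accessions.txt are hypothetical, not part of the original post. It assumes EDirect is installed and that efetch prints each ScientificName element on its own line, as the grep in the excerpt implies.

```shell
# Turn efetch taxonomy XML (stdin) into one comma-separated line of names.
# Each <ScientificName> line is split on '<' or '>', so the name is field 3.
lineage_from_xml() {
    grep 'ScientificName' \
        | awk -F '>|<' '{ names = names (NR > 1 ? ", " : "") $3 } END { print names }'
}

# Hypothetical wrapper around the pipeline from the excerpt.
# Needs EDirect (esearch, elink, efetch) and network access.
accession_to_lineage() {
    esearch -db assembly -query "$1" \
        | elink -target taxonomy \
        | efetch -format native -mode xml \
        | lineage_from_xml
}

# Usage (accessions.txt is a hypothetical file, one accession per line).
# stdin is redirected from /dev/null so the EDirect tools do not consume
# the rest of the accession list inside the loop:
#   while read -r acc; do
#       printf '%s\t%s\n' "$acc" "$(accession_to_lineage "$acc" < /dev/null)"
#   done < accessions.txt
```

Unlike the ORS approach in the excerpt, this version joins the names inside awk, so the output has no trailing comma.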

[lh3/minimap2] Memory leak when using Python and threads

The program align.py uses mappy to align reads in Python using multiple worker threads. After loading the index, memory usage quickly jumps to >20 GB and then continues to climb steadily through 40 GB and beyond. This issue was first discovered in bonito and isolated to mappy. The data flow…

MARS seq alignment

Hello everyone, I'm new here and also new to the field. I was asked to create a pipeline for RNA-seq, and after two months of teaching myself how to work with each tool I'm stuck on the program STAR. What I'm trying to do for now…

hg38 Import custom reference upload error

Our version of TS is 5.12.2. When trying to upload a new custom reference FASTA (downloaded from NCBI, ftp.ncbi.nlm.nih.gov/genomes/all/GCA/000/001/405/GCA_000001405.15_GRCh38/seqs_for_alignment_pipelines.ucsc_ids/GCA_000001405.15_GRCh38_no_alt_analysis_set.fna.gz, gunzipped and renamed to hg38.fasta) through “Import custom reference” in the interface, an error occurs: “uploaded file size is incorrect” (to be honest the error was not shown in logs, because of TypeError…

some signatures that have been suppressed by RefSeq/GenBank are not in wort

For example, the following accessions have been “suppressed” in RefSeq and GenBank, and do not appear in wort (/group/ctbrowngrp/irber/data/wort-data/wort-genomes/sigs/<accession>.sig). I checked newer version numbers and still did not find these accessions (e.g. GCA_900474135*sig). Looking on GenBank, I see the record has been suppressed/removed: www.ncbi.nlm.nih.gov/assembly/GCF_900474135.2/ GCA_900474135.2 GCA_002798115.1 GCA_001308105.1 I couldn’t find…

Attempting to generate a bam.bai file but the output is not readable

Hi, I am new to exome sequencing and have tried to follow tutorials on the subject. I am stuck at the samtools index stage because the output files are in a non-human-readable format and I believe…

Gene ID conversion for pathway enrichment analysis of differentially expressed genes

Hello Community Members, I am having trouble setting up pathway enrichment analysis for the differentially expressed genes because of problems with gene IDs. I tried using DAVID, but the species that I am using is not listed…

18S gene not present in genome assembly?

I am designing PCR primers to amplify a region of the 18S rRNA gene of Penicillium expansum. As the template for primer design, I use the consensus sequence of a multiple sequence alignment of 18S sequences obtained from the SILVA database. When…

High tumor mutation burden and DNA repair gene mutations

Introduction: Anaplastic lymphoma kinase (ALK)‑fusion genes represent a small but important part of oncogenic driver mutations in NSCLC, accounting for approximately 3%‑7% of all cases worldwide.1,2 Small molecule tyrosine kinase inhibitors (TKIs) are the standard therapy for ALK-rearranged NSCLC. Crizotinib, a first-generation TKI, is the most widely used targeted drug…

read count to gene

I am using this command to get read counts per gene with bedtools intersect: samtools view -Shu -q10 -@ 20 UE-2955-CMLib12_sorted.bam | bedtools intersect -c -a GCA_900659725.1_ASM90065972v1_genomic.gff -b stdin > UE-2955-CMLib{i}_intersect_counts2.bed The command works for other files but not for one file. Which…

Is there a way to get a GenBank assembly ID from a nuccore ID using NCBI eutils?

Hello, I am trying to get the assembly ID (e.g. GCA_000312685.1) for the nuccore ID (CP003157.1) using eutils. For a WGS nuccore ID such as CAJY00000000 you can get the assembly…

How to get the sequence differences between multiple bacterial genomes

I am working on some closely related bacterial species (complete genomes from NCBI). I would like to extract the sequence differences between them. To be more specific, I want to find unique sequences (50-100 nt) in each of…

Plasmid-Encoded VIM-2-Producing Pseudomonas stutzeri | IDR

Introduction: Pseudomonas stutzeri is an aerobic, nonfermenting, active, Gram-negative oxidase-positive bacterium with unique colony morphology.1,2 Burri and Stutzer first described it in 1895,3 and the specific metabolic properties, such as denitrification, degradation of aromatic compounds, and nitrogen fixation, distinguish it from other pseudomonad species.2,4 Historically, P. stutzeri was not commonly…

From amino acid sequence to DNA sequence using Reverse Translate software

Hi. I have identified an amino acid sequence motif using MEME. Now I want to know the DNA sequence corresponding to this amino acid sequence: KDEKIKEIFEDLAKEERNHY. It seems that the only software that does this conversion is…

Download Assembled Genomes NOT in RefSeq

I have a list of NCBI accession numbers and I need to download the genome sequence (or assemblies) in FASTA format. I can accomplish this for genomes that are present in RefSeq using the following command: esearch -db nuccore -query GCF_900343155.1 | efetch…
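
One possible starting point for GenBank-only (GCA) assemblies is the NCBI genomes FTP tree, whose layout is visible in the hg38 path quoted in another excerpt on this page (genomes/all/GCA/000/001/405/...). The sketch below only derives that directory prefix from an accession. The function name assembly_ftp_dir is hypothetical, and the https:// scheme is an assumption. The final subdirectory also contains the assembly name (e.g. GCA_000001405.15_GRCh38), which cannot be derived from the accession alone and has to be looked up.

```shell
# Build the genomes/all directory prefix for an assembly accession.
# The nine digits after the underscore are split into three groups of three.
# The function body runs in a subshell so its variables stay local.
assembly_ftp_dir() (
    acc="$1"
    prefix="${acc%%_*}"     # GCA or GCF
    digits="${acc#*_}"      # e.g. 000001405.15
    digits="${digits%%.*}"  # drop the version suffix: 000001405
    path="$(printf '%s\n' "$digits" | sed 's#^\(...\)\(...\)\(...\)$#\1/\2/\3#')"
    printf 'https://ftp.ncbi.nlm.nih.gov/genomes/all/%s/%s/\n' "$prefix" "$path"
)

# Usage:
#   assembly_ftp_dir GCA_000001405.15
```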

How to trim a GFF3 file based on specific coordinates?

Hi, I would like to create a GFF3 file containing information only for specific coordinates from the chromosome level GFF3 file. I know how to extract gene and CDS info separately but don’t know how to do trimming based…
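
A minimal sketch of coordinate-based filtering with awk, under two assumptions the question does not confirm: "trimming" means keeping every feature that overlaps the region (features are kept whole, and coordinates are not shifted), and the file has no embedded ##FASTA section. The function name gff3_region and the file names in the usage comment are hypothetical.

```shell
# Keep GFF3 records on one sequence that overlap [start, end]
# (1-based, inclusive, as in GFF3 columns 4 and 5).
# Lines starting with '#' (header and directives) are passed through unchanged.
gff3_region() {
    awk -F '\t' -v seqid="$1" -v start="$2" -v end="$3" '
        /^#/ { print; next }
        $1 == (seqid "") && $4 + 0 <= end + 0 && $5 + 0 >= start + 0
    '
}

# Usage (hypothetical file names):
#   gff3_region chr1 150000 250000 < genome.gff3 > region.gff3
```

Requiring a feature to lie entirely inside the region would instead use $4 >= start and $5 <= end.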
