Tag: GCA

Mitogenome-wise codon usage pattern from comparative analysis of the first mitogenome of Blepharipa sp. (Muga uzifly) with other Oestroid flies

Outcome of DNA sequencing, assembly, and validation. In this study, total DNA was first isolated from the finely chopped, full-grown pupa of Blepharipa sp. Both the NanoDrop spectrophotometer (1294 ng/μl) and the Qubit fluorometer (732.8 ng/μl) indicated that the concentration of total DNA in the sample was at an optimum level for mitochondrial DNA enrichment. The TapeStation profile showed…

How can I add a python’s ggplot object to a matplot grid?

I think the solution would be to first draw the ggplot part. Then obtain the matplotlib figure object via plt.gcf() and the axes via plt.gca(). Resize the ggplot axes to fit into a grid and finally draw the rest of the matplotlib plots to that figure. import ggplot as gp…

Using AnnoTree to Get More Assignments, Faster, in DIAMOND+MEGAN Microbiome Analysis

INTRODUCTION. Next-generation sequencing (NGS) has revolutionized many areas of biological research (1, 2), providing ever-more data at an ever-decreasing cost. One such area is microbiome research, the study of microbes in their theater of activity using metagenomic sequencing (3). Here, deep short-read sequencing, and improving performance of long-read sequencing, are…

API for NCBI Accession ID (GenBank or RefSeq) generation from a list of species names? : bioinformatics

TL;DR: How can I convert a list of species names (common or scientific) into a corresponding list of NCBI database accession IDs, so that I can download each species’ reference genome from NCBI? The NCBI URLs are not compatible with common or scientific names, so a regular Python script for web scraping that…
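
One possible starting point, sketched here under assumptions rather than taken from the thread: NCBI's E-utilities expose an ESearch endpoint that accepts a free-text term, so a species name can be turned into a query URL against the Assembly database. The `[Organism]` field tag, the JSON return mode, and the example species are illustrative choices, and ESearch returns internal UIDs, so a follow-up call (for example ESummary) would still be needed to obtain accession IDs.

```python
from urllib.parse import urlencode

# Base address of the NCBI E-utilities service.
EUTILS_BASE = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils"


def assembly_search_url(species_name: str) -> str:
    """Build an ESearch URL that looks up a species in the Assembly database.

    The "[Organism]" field tag is an assumption about how the name should
    be searched; common names may need a different query.
    """
    params = {
        "db": "assembly",
        "term": f"{species_name}[Organism]",
        "retmode": "json",
    }
    return f"{EUTILS_BASE}/esearch.fcgi?{urlencode(params)}"


# Example species names (illustrative input list).
species = ["Escherichia coli", "Penicillium expansum"]
urls = [assembly_search_url(name) for name in species]

for url in urls:
    print(url)
```

The sketch only builds the URLs; fetching them and handling NCBI's rate limits is left out on purpose.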

Amur ide genome – assembly database

Assembly GCA_900092035.1. Assembly ID: – Source: GenBank Assembly (ID GCA_900092035.1) Description: – Molecule type: – Submitter: – Organism: – Synonyms: – Assembly type: – Assembly level: – Assembly method: – Genome representation: – Excluded from RefSeq: – RefSeq category: – GenBank assembly accession: – RefSeq assembly accession: – RefSeq…

GATK HaplotypeCaller with interval list

I am trying to use the -L option of GATK HaplotypeCaller to call SNPs and short InDels within an interval list. The content of my interval list file (top8snp.interval_list) is as follows:

12 33029845 33030845 + rs24767598
13 40586682 40587682 + rs24748362
18 24373857 24374857 + rs8856159
21 50381146 50382146 +…
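
As a rough illustration of the five-column rows quoted in the post (contig, start, end, strand, name), the sketch below parses them and prints each one in the `contig:start-end` form that GATK's -L option also accepts on the command line. It does not attempt to diagnose the poster's actual problem, which is cut off in the excerpt; the class and function names are made up for this example.

```python
from typing import NamedTuple


class Interval(NamedTuple):
    contig: str
    start: int
    end: int
    strand: str
    name: str


def parse_interval_line(line: str) -> Interval:
    """Split one whitespace-separated row into its five fields."""
    contig, start, end, strand, name = line.split()
    return Interval(contig, int(start), int(end), strand, name)


def to_gatk_string(interval: Interval) -> str:
    """Format an interval as 'contig:start-end'."""
    return f"{interval.contig}:{interval.start}-{interval.end}"


# The three complete rows quoted in the post.
rows = [
    "12 33029845 33030845 + rs24767598",
    "13 40586682 40587682 + rs24748362",
    "18 24373857 24374857 + rs8856159",
]
intervals = [parse_interval_line(row) for row in rows]

for interval in intervals:
    print(interval.name, to_gatk_string(interval))
```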

Convert list of Accession Numbers to Full Taxonomy

Using NCBI Entrez Direct:

$ esearch -db assembly -query "GCA_000005845" | elink -target taxonomy | efetch -format native -mode xml | grep ScientificName | awk -F ">|<" 'BEGIN{ORS=", ";}{print $3;}'
Escherichia coli str. K-12 substr. MG1655, cellular organisms, Bacteria, Proteobacteria, Gammaproteobacteria, Enterobacterales, Enterobacteriaceae, Escherichia, Escherichia coli, Escherichia coli K-12,

If…
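
The grep/awk step in that pipeline pulls every ScientificName element out of the taxonomy XML. A hedged Python equivalent is sketched below using the standard library XML parser. The `SAMPLE_XML` string is a hand-written, heavily abbreviated stand-in for real efetch output, not a copy of it, and the function name is invented for this example.

```python
import xml.etree.ElementTree as ET

# Abbreviated, hand-written stand-in for efetch taxonomy XML (assumption:
# real output contains many more elements per Taxon).
SAMPLE_XML = """<TaxaSet>
  <Taxon>
    <ScientificName>Escherichia coli str. K-12 substr. MG1655</ScientificName>
    <LineageEx>
      <Taxon><ScientificName>cellular organisms</ScientificName></Taxon>
      <Taxon><ScientificName>Bacteria</ScientificName></Taxon>
      <Taxon><ScientificName>Proteobacteria</ScientificName></Taxon>
    </LineageEx>
  </Taxon>
</TaxaSet>"""


def scientific_names(xml_text: str) -> list:
    """Return the text of every ScientificName element, in document order."""
    root = ET.fromstring(xml_text)
    return [element.text for element in root.iter("ScientificName")]


names = scientific_names(SAMPLE_XML)
print(", ".join(names))
```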

[lh3/minimap2] Memory leak when using Python and threads

The program align.py uses mappy to align reads in Python using multiple worker threads. After loading the index, the memory usage jumps up quickly to >20Gb and then continues to climb steadily through 40Gb and beyond. This issue was first discovered in bonito and isolated to mappy. The data flow…

MARS seq alignment

Hello everyone, I am new here and also new to the field. I was asked to create a pipeline for RNA-seq, and after two months of teaching myself how to interact with each program, I'm stuck with STAR. What I'm trying to do for now…

hg38 Import custom reference upload error

Our version of TS is 5.12.2. When trying to upload a new custom reference FASTA (downloaded from NCBI at ftp.ncbi.nlm.nih.gov/genomes/all/GCA/000/001/405/GCA_000001405.15_GRCh38/seqs_for_alignment_pipelines.ucsc_ids/GCA_000001405.15_GRCh38_no_alt_analysis_set.fna.gz, gunzipped and renamed to hg38.fasta) through “Import custom reference” in the interface, an error occurs: “uploaded file size is incorrect” (to be honest, the error was not shown in the logs because of a TypeError…
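
The download path in that post follows a visible pattern: the nine digits of the assembly accession are split into three-digit directory levels under genomes/all. The sketch below rebuilds the directory from an accession and an assembly name, inferred only from the path quoted above; the function name is made up, and the pattern is treated as an assumption rather than a documented guarantee.

```python
import re


def assembly_ftp_dir(accession: str, assembly_name: str) -> str:
    """Rebuild the genomes/all directory for an assembly accession.

    Assumes the layout seen in the quoted path: prefix, then the nine
    accession digits in three chunks, then '<accession>_<assembly name>'.
    """
    match = re.fullmatch(r"(GC[AF])_(\d{9})\.(\d+)", accession)
    if match is None:
        raise ValueError(f"not an assembly accession: {accession!r}")
    prefix, digits, _version = match.groups()
    chunks = [digits[i:i + 3] for i in range(0, 9, 3)]
    parts = ["ftp.ncbi.nlm.nih.gov/genomes/all", prefix, *chunks,
             f"{accession}_{assembly_name}"]
    return "/".join(parts)


path = assembly_ftp_dir("GCA_000001405.15", "GRCh38")
print(path)
```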

some signatures that have been suppressed by RefSeq/GenBank are not in wort

For example, the following accessions have been “suppressed” in RefSeq and GenBank, and do not appear in wort (/group/ctbrowngrp/irber/data/wort-data/wort-genomes/sigs/<accession>.sig). I checked newer version numbers and still did not find these accessions (e.g. GCA_900474135*sig). Looking on GenBank, I see the record has been suppressed/removed: www.ncbi.nlm.nih.gov/assembly/GCF_900474135.2/

GCA_900474135.2
GCA_002798115.1
GCA_001308105.1

I couldn’t find…
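
The check described in that issue (look for `<accession>*sig` regardless of version number) can be sketched as a small helper. This is an illustration only: the function name is invented, and the `available` list is a hypothetical directory listing, not the real contents of the wort directory.

```python
from fnmatch import fnmatch


def missing_signatures(accessions, sig_filenames):
    """Return accessions with no matching '<accession without version>*sig' file."""
    missing = []
    for accession in accessions:
        base = accession.split(".")[0]      # drop the version suffix
        pattern = f"{base}*sig"             # same wildcard style as in the post
        if not any(fnmatch(name, pattern) for name in sig_filenames):
            missing.append(accession)
    return missing


# Accessions quoted in the post, plus one extra accession for contrast.
wanted = ["GCA_900474135.2", "GCA_002798115.1", "GCA_001308105.1",
          "GCA_000005845.2"]
# Hypothetical directory listing (assumption for this sketch).
available = ["GCA_000005845.2.sig"]

print(missing_signatures(wanted, available))
```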

Attempting to generate a bam.bai file but the output is not readable

Hi, I am new to exome sequencing and have tried to follow tutorials on the subject. I am stuck at the samtools index stage because the output files are in a non-human-readable format and I believe…

Gene ID conversion for pathway enrichment analysis of differentially expressed genes

Hello Community Members, I am having trouble setting up pathway enrichment analysis for the differentially expressed genes because of problems with gene IDs. I tried using DAVID, but the species that I am using is not listed…

18S gene not present in genome assembly?

I am designing PCR primers to amplify a region of the 18S rRNA gene of Penicillium expansum. As the template for primer design, I use the consensus sequence of a multiple sequence alignment of 18S sequences obtained from the SILVA database. When…

High tumor mutation burden and DNA repair gene mutations

Introduction. Anaplastic lymphoma kinase (ALK)‑fusion genes represent a small but important part of oncogenic driver mutations in NSCLC, accounting for approximately 3%‑7% of all cases worldwide.1,2 Small molecule tyrosine kinase inhibitors (TKIs) are the standard therapy for ALK-rearranged NSCLC. Crizotinib, a first-generation TKI, is the most widely used targeted drug…

read count to gene

I am using this command to get read counts per gene with bedtools intersect: samtools view -Shu -q10 -@ 20 UE-2955-CMLib12_sorted.bam | bedtools intersect -c -a GCA_900659725.1_ASM90065972v1_genomic.gff -b stdin > UE-2955-CMLib{i}_intersect_counts2.bed The command works for other files but not for one file. Which…
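
With the -c option, bedtools intersect appends an overlap count as an extra column to each feature line from the -a file, so a GFF input yields ten tab-separated columns. The sketch below reads such lines and collects the count for each feature of type gene. The sample lines and the function name are invented for illustration, and keying on the `ID=` attribute is an assumption about the annotation file.

```python
def gene_counts(lines):
    """Map gene ID -> read count from 'bedtools intersect -c' output on a GFF."""
    counts = {}
    for line in lines:
        if not line.strip() or line.startswith("#"):
            continue  # skip blank lines and comment/header lines
        fields = line.rstrip("\n").split("\t")
        if len(fields) < 10 or fields[2] != "gene":
            continue  # keep only gene features that carry a count column
        attributes, count = fields[8], int(fields[9])
        # Use the ID attribute when present, else fall back to the raw column.
        gene_id = next(
            (kv[3:] for kv in attributes.split(";") if kv.startswith("ID=")),
            attributes,
        )
        counts[gene_id] = count
    return counts


# Invented sample lines in GFF-plus-count layout (not real data).
sample = [
    "##gff-version 3",
    "chr1\tsrc\tgene\t100\t500\t.\t+\t.\tID=gene1;Name=abc\t42",
    "chr1\tsrc\tCDS\t100\t500\t.\t+\t0\tID=cds1;Parent=gene1\t40",
    "chr1\tsrc\tgene\t900\t1500\t.\t-\t.\tID=gene2\t0",
]
print(gene_counts(sample))
```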

Is there a way to get the GenBank assembly ID from a nuccore ID using NCBI eutils?

Hello, I am trying to get the assembly ID (e.g. GCA_000312685.1) for the nuccore ID (CP003157.1) using eutils. For a WGS nuccore ID such as CAJY00000000 you can get the assembly…
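
One avenue worth trying, offered as an assumption and not as the answer from the thread: the E-utilities ELink endpoint follows links from one Entrez database to another, so a request with `dbfrom=nuccore` and `db=assembly` may return the linked assembly record. The sketch only builds the request URL; whether ELink resolves this particular record, and in what form the ID must be supplied, would need to be checked against NCBI.

```python
from urllib.parse import urlencode

EUTILS_BASE = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils"


def nuccore_to_assembly_url(nuccore_id: str) -> str:
    """Build an ELink URL from a nuccore record to the Assembly database."""
    params = {"dbfrom": "nuccore", "db": "assembly", "id": nuccore_id}
    return f"{EUTILS_BASE}/elink.fcgi?{urlencode(params)}"


link_url = nuccore_to_assembly_url("CP003157.1")
print(link_url)
```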

How to get the sequence differences between multiple bacterial genomes

I am working on some closely related bacterial species (complete genomes from NCBI). I would like to extract the sequence differences between them. To be more specific, I want to find unique sequences (50-100 nt) in each of…
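
A naive way to think about "unique sequences in each genome" is as k-mers that occur in one genome and in none of the others. The sketch below does this with plain Python sets on toy sequences. It is an assumption-laden illustration, not the method from the thread: it ignores reverse complements, holds everything in memory, and would need a dedicated k-mer tool for real genomes at k = 50 to 100.

```python
def kmers(sequence: str, k: int) -> set:
    """Return the set of all substrings of length k."""
    return {sequence[i:i + k] for i in range(len(sequence) - k + 1)}


def unique_kmers(genomes: dict, k: int) -> dict:
    """For each genome, return the k-mers found in no other genome."""
    kmer_sets = {name: kmers(seq, k) for name, seq in genomes.items()}
    unique = {}
    for name, own in kmer_sets.items():
        others = set().union(
            *(s for other, s in kmer_sets.items() if other != name)
        )
        unique[name] = own - others
    return unique


# Toy sequences standing in for genomes (invented for illustration).
toy = {"A": "ACGTAC", "B": "ACGTTT"}
result = unique_kmers(toy, k=4)
print(result)
```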

Plasmid-Encoded VIM-2-Producing Pseudomonas stutzeri | IDR

Introduction. Pseudomonas stutzeri is an aerobic, nonfermenting, active, Gram-negative oxidase-positive bacterium with unique colony morphology.1,2 Burri and Stutzer first described it in 1895,3 and the specific metabolic properties, such as denitrification, degradation of aromatic compounds, and nitrogen fixation, distinguish it from other pseudomonad species.2,4 Historically, P. stutzeri was not commonly…

From amino acid sequence to DNA sequence using Reverse Translate software

Hi. I have identified an amino acid sequence motif using MEME. Now I want to know the DNA sequence corresponding to this amino acid sequence: KDEKIKEIFEDLAKEERNHY. It seems that the only software that does this conversion is…
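
Because the genetic code is degenerate, a peptide has many possible coding sequences, so any reverse translation picks one of them. The sketch below maps each amino acid to a single codon from the standard genetic code and translates the result back as a sanity check. The choice of one codon per residue is arbitrary and made up for this example; it is not the output of the Reverse Translate tool and does not reflect any organism's codon usage.

```python
# One arbitrarily chosen codon per amino acid (standard genetic code).
CODON_FOR = {
    "A": "GCT", "C": "TGT", "D": "GAT", "E": "GAA", "F": "TTT",
    "G": "GGT", "H": "CAT", "I": "ATT", "K": "AAA", "L": "CTG",
    "M": "ATG", "N": "AAT", "P": "CCT", "Q": "CAA", "R": "CGT",
    "S": "TCT", "T": "ACT", "V": "GTT", "W": "TGG", "Y": "TAT",
}
# Inverse lookup, valid only for the codons chosen above.
AMINO_ACID_FOR = {codon: aa for aa, codon in CODON_FOR.items()}


def reverse_translate(protein: str) -> str:
    """Return one possible DNA sequence encoding the protein."""
    return "".join(CODON_FOR[aa] for aa in protein)


def translate(dna: str) -> str:
    """Translate DNA built from the codons in CODON_FOR back to protein."""
    return "".join(AMINO_ACID_FOR[dna[i:i + 3]] for i in range(0, len(dna), 3))


motif = "KDEKIKEIFEDLAKEERNHY"
dna = reverse_translate(motif)
print(dna)
```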

Download Assembled Genomes NOT in RefSeq

I have a list of NCBI accession numbers and I need to download the genome sequence (or assemblies) in FASTA format. I can accomplish this for genomes that are present in RefSeq using the following command: esearch -db nuccore -query GCF_900343155.1 | efetch…

How to trim a GFF3 file based on specific coordinates?

Hi, I would like to create a GFF3 file containing information only for specific coordinates from the chromosome-level GFF3 file. I know how to extract gene and CDS info separately but don’t know how to do trimming based…
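
A minimal sketch of coordinate-based trimming, assuming that "trimming" means keeping only features that lie entirely inside a given window on one sequence. GFF3 is tab-separated with the sequence ID in column 1 and 1-based start and end in columns 4 and 5. The function name and sample lines are invented, and features that only partly overlap the window are dropped here, which may or may not be what the poster wants.

```python
def trim_gff3(lines, seqid, start, end):
    """Keep header lines and features fully inside seqid:start-end."""
    kept = []
    for line in lines:
        if line.startswith("#"):
            kept.append(line)  # preserve directives and comments
            continue
        fields = line.rstrip("\n").split("\t")
        if len(fields) != 9:
            continue  # skip malformed or blank lines
        feature_start, feature_end = int(fields[3]), int(fields[4])
        if fields[0] == seqid and feature_start >= start and feature_end <= end:
            kept.append(line)
    return kept


# Invented sample records (not real annotation data).
records = [
    "##gff-version 3",
    "chr1\t.\tgene\t1000\t2000\t.\t+\t.\tID=g1",
    "chr1\t.\tgene\t5000\t6000\t.\t+\t.\tID=g2",
    "chr2\t.\tgene\t1200\t1800\t.\t+\t.\tID=g3",
]
trimmed = trim_gff3(records, "chr1", 900, 2500)
print("\n".join(trimmed))
```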
