Tag: FASTA

Downloading genes from NCBI in fasta format

I’m pretty new to bioinformatics. I need to download FASTA sequences of several genes. The list of genes I have assembled consists of 140 genes, so I’d rather do this through code than download each gene manually from the NCBI database. The genes belong to 2 different organisms (bacteria)…
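One way to script a download like this is NCBI's E-utilities `efetch` endpoint, which returns FASTA text for nucleotide accessions. The following is a minimal sketch, assuming the gene list has already been resolved to nucleotide accession numbers. The two accessions are placeholders, and `batches` and `build_efetch_url` are hypothetical helper names. It only builds the request URLs and leaves the download itself to the caller.

```python
from urllib.parse import urlencode

# Base URL of the NCBI E-utilities efetch service.
EFETCH = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils/efetch.fcgi"


def batches(items, size):
    """Yield consecutive chunks of at most `size` items."""
    for start in range(0, len(items), size):
        yield items[start:start + size]


def build_efetch_url(accessions, db="nuccore"):
    """Return an efetch URL requesting FASTA text for the given accessions."""
    params = {
        "db": db,
        "id": ",".join(accessions),
        "rettype": "fasta",
        "retmode": "text",
    }
    return EFETCH + "?" + urlencode(params)


# Placeholder accessions (assumption); replace with the real list of 140.
accessions = ["NM_000554.6", "NC_000913.3"]
urls = [build_efetch_url(chunk) for chunk in batches(accessions, 50)]
```

Each URL can then be fetched with `urllib.request.urlopen` and the response written to a `.fasta` file. NCBI asks clients without an API key to stay at or below three requests per second, so a short pause between batches is advisable.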

Compressing BAM, SAM, CRAM | Genozip

How good is Genozip at compressing BAM files? See Benchmarks. Compressing a BAM, SAM or CRAM file: In the rest of this page we will give examples of BAM files. Genozip is also capable of compressing SAM files, and with some limitations, CRAM files as well.…

genbank sequence format

This document is an overview of the Entrez databases, with general information on … If you are not sure that the “Save” option in your program will do this for you, use “Save As”. In Excel, select “Save As” from the File menu. … optimizations to reduce memory…

DashBio Alignment Chart/Sequence Viewer – Dash Python

Hello, I am using Dash Bio to view multiple sequences that are read from a FASTA file. Everything seems to work fine, but I am wondering if there is a way to change the numbering that appears at the bottom of the sequence chart. The sequences I am reading follow a…

Unexpected absence of ribosomal protein genes from metagenome-assembled genomes

Hug LA, Baker BJ, Anantharaman K, Brown CT, Probst AJ, Castelle CJ, et al. A new view of the tree of life. Nat Microbiol. 2016;1:16048. Castelle CJ, Wrighton KC, Thomas BC, Hug LA, Brown CT, Wilkins MJ, et al. Genomic expansion of domain archaea highlights roles…

Collapse multifasta file by specific chromosome names

I have a multifasta file with unique identifiers (‘SUBJECT.1’, ‘SUBJECT.2’, etc.) like this: >SUBJECT.1.1:1203-2742(+) AAATTT >SUBJECT.1:354-700(+) CCCGGG >SUBJECT.2:789-2000(+) GGGCCC >SUBJECT.2:2012-2742(+) TTTAAA How would I extract every line that’s associated with each unique identifier and concatenate them together to form an output file…
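A minimal sketch of the collapsing step described in this question, in plain Python. It assumes the grouping key is the header text before the first colon, which the excerpt does not confirm. `collapse_fasta` is a hypothetical name. The sample data uses `SUBJECT.1` for the first record, whereas the excerpt's first header reads `SUBJECT.1.1` and would form its own group under this assumption.

```python
def collapse_fasta(lines):
    """Concatenate sequences whose headers share the text before the first ':'."""
    merged = {}
    key = None
    for raw in lines:
        line = raw.strip()
        if not line:
            continue
        if line.startswith(">"):
            # Assumption: '>SUBJECT.1:354-700(+)' belongs to group 'SUBJECT.1'.
            key = line[1:].split(":", 1)[0]
            merged.setdefault(key, [])
        elif key is not None:
            merged[key].append(line)
    # Join the pieces in the order they appeared in the input.
    return {name: "".join(parts) for name, parts in merged.items()}


example = [
    ">SUBJECT.1:1203-2742(+)", "AAATTT",
    ">SUBJECT.1:354-700(+)", "CCCGGG",
    ">SUBJECT.2:789-2000(+)", "GGGCCC",
    ">SUBJECT.2:2012-2742(+)", "TTTAAA",
]
collapsed = collapse_fasta(example)
for name, seq in collapsed.items():
    print(f">{name}\n{seq}")
```

Pieces are joined in input order, not coordinate order. If the fragments must follow their genomic coordinates, they would need to be sorted by the start position in the header first.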

Expression level of mutant genes in RNAseq data

Hello, I have WES data from matched tumor and normal samples and mutants called from these data (in MAF files). From my understanding, if I sequence the tumor sample RNA, and run a routine RNAseq data analysis pipeline, the counts I…

Screen.seqs removed all the sequences – Commands in mothur

Hira October 30, 2022, 8:20am #1 After making contigs, here is the summary of my data. And the summary of contigs report: mothur > screen.seqs(fasta=stability.trim.contigs.fasta, count=stability.contigs.count_table, maxambig=0, maxlength=275, maxhomop=8) It took 28 secs to screen 1896836 sequences, removed 1896836. /******************************************/ Running command: remove.seqs(accnos=/users/hiraabid/desktop/mothur/Paddy_Fish_NGS_RawData/stability.trim.contigs.bad.accnos.temp, count=/users/hiraabid/desktop/mothur/Paddy_Fish_NGS_RawData/stability.contigs.count_table) Removed 1896836 sequences from /users/hiraabid/desktop/mothur/Paddy_Fish_NGS_RawData/stability.contigs.count_table. [WARNING]: /users/hiraabid/desktop/mothur/Paddy_Fish_NGS_RawData/stability.contigs.count_table contains only…

Converting Bam file to Fasta (Zipped)

I would like to convert .bam files to fq.gz (gzipped FASTQ files) for paired reads. bedtools bamtofastq seems to be a commonly recommended method; I have also seen samtools fastq as a possible alternative. bedtools bamtofastq -i inputfile.bam -fq outputR1.fq -fq2 outputR2.fq samtools…

‘SeqRecord’ object has no attribute ‘transcribe’

I am learning how to use Python and I need to get the RNA sequence from the DNA sequences of a multi-FASTA file, but when I try to do it I get the same error. Here is my code: from Bio import SeqIO…
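In Biopython, `transcribe()` is a method of `Seq`, and a `SeqRecord` holds its `Seq` in the `.seq` attribute, so the usual fix for this error is to call `record.seq.transcribe()`. Because the poster's code is cut off here, what follows is only a dependency-free sketch of the same operation in plain Python, not their script. `read_fasta` and `transcribe` are hypothetical helper names.

```python
def read_fasta(lines):
    """Yield (header, sequence) pairs from an iterable of FASTA lines."""
    header, chunks = None, []
    for raw in lines:
        line = raw.strip()
        if line.startswith(">"):
            if header is not None:
                yield header, "".join(chunks)
            header, chunks = line[1:], []
        elif line:
            chunks.append(line)
    # Emit the final record, which has no following header to trigger it.
    if header is not None:
        yield header, "".join(chunks)


def transcribe(dna):
    """Return the RNA form of a coding-strand DNA string (T -> U)."""
    return dna.replace("T", "U").replace("t", "u")


records = [">seq1", "ATGCTT", "GGA", ">seq2", "ttag"]
rna = {name: transcribe(seq) for name, seq in read_fasta(records)}
```

The distinction the error message points at is the same in both versions: the record is a container with an identifier and a sequence, and transcription applies to the sequence, not to the container.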

Loading multiple sequence alignment fasta file for ggmsa

Hello, I am trying to use ggmsa to visualize my amino acid multiple sequence alignment (MSA) but the ggmsa() function doesn’t recognize my MSA. Any thoughts how I can correct this? I have been loading my MSA with the following… align…

Question 1.docx – Question 1 a) What is the first listed RefSeq mRNA sequence for human CRX? Provide your answer in “FASTA” format. Answer: Homo sapiens

Question 1a) What is the first listed RefSeq mRNA sequence for human CRX? Provide your answer in “FASTA” format.
Answer: Homo sapiens cone-rod homeobox (CRX), mRNA. NCBI Reference Sequence: NM_000554.6
>NM_000554.6 Homo sapiens cone-rod homeobox (CRX), mRNA
CCTTCAGCCTCTGCTGTCTGGCCGCTCTGTCTAGGTCCTGGGCCACGGGAGAGCCCCGTCCCTCCTTTCTGAAGGCCCCCTGACTTGGGCCTCAGTGTCCCCGAAGATCATGATGGCGTATATGAACCCGGGGCCCCACTATTCTGTCAACGCCTTGGCCCTAAGTGGCCCCAGTGTGGATCTGATGCACCAGGCTGTGCCCTACCCAAGCGCCCCCAGGAAGCAGCGGCGGGAGCGCACCACCTTCACCCGGAGCCAACTGGAGGAGCTGGAGGCACTGTTTGCCAAGACCCAGTACCCAGACGTCTATGCCCGTGAGGAGGTGGCTCTGAAGATCAATCTGCCTGAGTCCAGGGTTCAGGTTTGGTTCAAGAACCGGAGGGCTAAATGCAGGCAGCAGCGACAGCAGCAGAAACAGCAGCAGCAGCCCCCAGGGGGCCAGGCCAAGGCCCGGCCTGCCAAGAGGAAGGCGGGCACGTCCCCAAGACCCTCCACAGATGTGTGTCCAGACCCTCTGGGCATCTCAGATTCCTACAGTCCCCCTCTGCCCGGCCCCTCAGGCTCCCCAACCACGGCAGTGGCCACTGTGTCCATCTGGAGCCCAGCCTCAGAGTCCCCTTTGCCTGAGGCGCAGCGGGCTGGGCTGGTGGCCTCAGGGCCGTCTCTGACCTCCGCCCCCTATGCCATGACCTACGCCCCGGCCTCCGCTTTCTGCTCTTCCCCCTCCGCCTATGGGTCTCCGAGCTCCTATTTCAGCGGCCTAGACCCCTACCTTTCTCCCATGGTGCCCCAGCTAGGGGGCCCGGCTCTTAGCCCCCTCTCTGGCCCCTCCGTGGGACCTTCCCTGGCCCAGTCCCCCACCTCCCTATCAGGCCAGAGCTATGGCGCCTACAGCCCCGTGGATAGCTTGGAATTCAAGGACCCCACGGGCACCTGGAAATTCACCTACAATCCCATGGACCCTCTGGACTACAAGGATCAGAGTGCCTGGAAGTTTCAGATCTTGTAGAGGACGCAGTCTCCATCTCTCTCCATCGGGCCTCGGGACCC

Number of sequences in RefSeq.

Dear colleagues, I cannot understand this. When I download all the genomic sequences from the RefSeq database, after counting, I see that there are far fewer records than presented in the release (123394 organisms ftp.ncbi.nlm.nih.gov/refseq/release/release-notes/RefSeq-release214.txt). What am I doing wrong? 1. wget ftp.ncbi.nlm.nih.gov/genomes/refseq/assembly_summary_refseq.txt 2….

How to read the AUGUSTUS results

Hi, everyone. I used AUGUSTUS to make a gene prediction for a certain fish, but I don’t know how to read the results. I mainly do not understand the following three points. 1. What does “sequence number” mean? How is it different from scaffold number? 2. What does “none” mean? Does it…

align kmers to reference genome

Dear all, I have a FASTA file with a list of 25-mers and I am trying to align it to the reference genome ref.fa using bowtie2. I ran bowtie2 -x ref -f ref_25mers.fa -S ref_25mers.sam but it gives the result: 1 reads; of these:…

File homo_ref.faa does not exist

I got FASTA output by using the following code in R, and I need to read the FASTA file (homo_ref.faa) that I obtained with this code using “makeblastdb -in homo_ref.faa -dbtype prot” in the terminal. But I get “BLAST options error: File homo_ref.faa does not exist”. How would you…

Building a Simulated Metagenomic Dataset

Tags: ‘JPL: Genetic Inventory Project’. Here we’ll create a simulated metagenomic dataset for controlled testing. This dataset was used to determine the Kraken 2 confidence score that best…

As of July 2015, the VCFtools project has been moved to github! Please visit the new website here: vcftools.github.io/man_0112a.html

NAME: VCFtools v0.1.12a - Utilities for the variant call format (VCF) and binary variant call format (BCF). SYNOPSIS: vcftools [ --vcf FILE | --gzvcf FILE | --bcf FILE]…

Manually adding genes of interest to transcriptome for RNASeq DE?

I have been given some RNASeq reads by a collaborator and have been asked to assess whether there is differential expression between treatments in 6 genes of interest to my collaborator. Unfortunately, this is a non-model organism and so…

Job – Principal Biostatistician/Bioinformatics job at Kenya Medical Research

Vacancy title: Principal Biostatistician/Bioinformatics [Type: FULL TIME, Industry: Research, Category: Research] Jobs at: Kenya Medical Research – KEMRI. Deadline of this Job: 06 October 2022. Duty Station: Within Kenya, Kisumu, East Africa. Summary: Date Posted: Tuesday, September 20, 2022, Base Salary: Not Disclosed…

The genus Serratia revisited by genomics

Merlino, C. P. Bartolomeo Bizio’s letter to the most eminent priest, Angelo Bellani, concerning the phenomenon of the red-colored polenta [translated from the Italian]. J. Bacteriol. 9, 527–543 (1924). Grimont, P. A. D. & Dulong de Rosnay, H. L. C. Numerical study of 60 strains of Serratia. J. Gen. Microbiol….

What is the Difference Between FASTA and FASTQ

The key difference between FASTA and FASTQ is that FASTA is a text-based format that only stores nucleotide or protein sequences, while FASTQ is a text-based format that stores both sequence and associated sequence quality values. Bioinformatics is a field that uses different software to analyse and understand biological data,…
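The difference can be made concrete with a short sketch that reads one FASTQ record, converts it to FASTA, and decodes the quality string that FASTA has no place for. It assumes the common four-line record layout and Phred+33 quality encoding. The function names and the sample read are illustrative.

```python
def parse_fastq(lines):
    """Yield (name, sequence, quality) from four-line FASTQ records."""
    lines = [line.rstrip("\n") for line in lines]
    for i in range(0, len(lines) - 3, 4):
        header, seq, plus, qual = lines[i:i + 4]
        if not header.startswith("@") or not plus.startswith("+"):
            raise ValueError(f"malformed FASTQ record starting at line {i + 1}")
        yield header[1:], seq, qual


def phred_scores(qual, offset=33):
    """Decode a quality string into integer Phred scores (Phred+33 assumed)."""
    return [ord(char) - offset for char in qual]


def to_fasta(name, seq):
    """Format one record as FASTA; the quality values are simply dropped."""
    return f">{name}\n{seq}"


fastq = ["@read1", "ACGT", "+", "II5!"]
name, seq, qual = next(parse_fastq(fastq))
fasta_text = to_fasta(name, seq)
scores = phred_scores(qual)
```

Going from FASTQ to FASTA loses information, since the per-base scores are discarded. The reverse conversion is only possible by inventing placeholder qualities.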

Loading reference genome from BSgenome

I am trying to run an analysis via the MutationalPatterns package. The first step is to install the BSgenome package, and then load a reference genome from BSgenome: library("BSgenome") ref_genome <- "BSgenome.Mmusculus.UCSC.mm39" library(ref_genome, character.only = TRUE) When I run my script, it gets hung…

Live-seq enables temporal transcriptomic recording of single cells

Biological materials RAW264.7, 293T and HeLa cells were obtained from ATCC. RAW264.7 cells with Tnf-mCherry reporter and relA-GFP fusion protein (RAW-G9 clone) were kindly provided by I.D.C. Fraser (National Institutes of Health). The IBA cell line derived from the stromal vascular fraction of interscapular brown adipose tissue of young male…

mapping – STAR error in snakemake pipeline: “EXITING because of FATAL ERROR: could not open genome file”

I’m trying to use a 2 pass STAR mapping strategy (also explained here informatics.fas.harvard.edu/rsem-example-on-odyssey.html), but I’m getting an error. I’ve read through this page [https://github.com/alexdobin/STAR/issues/181] and I have a similar issue, but the discussed solutions don’t seem to help. Perhaps this is more a snakemake issue rather than a STAR…

Patrick Murphy Bulk RNA-Seq – HackMD

Patrick Murphy bulk RNA-Seq Analysis. 1. Introduction: This is a bulk RNA-Seq project, which includes human data….

Freebayes-parallel with large bam file – individual threads running for >6 days

Context: I’m trying to call variants on a sequencing project using pooled genotyping-by-sequencing. Pools consist of 94 samples each, alongside a number of individuals. Sequence data was demultiplexed and then aligned to a reference genome using hisat2, and the resultant bams were merged with samtools merge. The problem bam is…

How to identify DNA sequences with ambiguous nucleotides such as N, Y, R, W.. in a multifasta file and then remove these sequences with Biopython

Dear Biostars, My request is about filtering and curating several multifastas. For instance, I have downloaded about 150 complete genomes from NCBI belonging…
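A minimal sketch of the filtering step, written in plain Python instead of the Biopython the title asks for, so that it runs without extra dependencies. It assumes that "ambiguous" means any character other than A, C, G or T, and that records are already available as `(header, sequence)` pairs. With Biopython, the same check could be applied to `str(record.seq)` for each record returned by `SeqIO.parse`.

```python
UNAMBIGUOUS = set("ACGT")


def has_ambiguity(seq):
    """Return True if the sequence contains anything other than A, C, G or T."""
    return not set(seq.upper()) <= UNAMBIGUOUS


def split_records(records):
    """Split (header, sequence) pairs into kept and removed lists."""
    kept, removed = [], []
    for header, seq in records:
        (removed if has_ambiguity(seq) else kept).append((header, seq))
    return kept, removed


# Illustrative records (assumption), not data from the post.
records = [
    ("genome1", "ACGTACGT"),
    ("genome2", "ACGNNACG"),
    ("genome3", "acgtRYacgt"),
    ("genome4", "ttgaca"),
]
kept, removed = split_records(records)
```

Keeping the removed records in a separate list makes it easy to report which genomes were dropped and why, which helps when curating a downloaded set.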

bioinformatics – How to specify amount of characters per line on a fasta file

I have a fasta file that looks like this:
>abc
AGAATTCGTCTTGCTCTATTCACCCTTACTTTTCTTCTTGCCCGTTCTCTTTCTTAGTATGAATCCAGTA
TGCCTGCCTGTAATTGTTGCGCCCTACCTCTTTTGGCTGGCGGCTATTGCCGCCTCGTGTTTCACGGCCT
CAGTTAGTACCGTTGTGACCGCCACCGGCTTGGCCCTCTCACTTCTACTCTTGGCAGCAGTGGCCAGCTC
ATATGCCGCTGCACAAAGGAAACTGCTGACACCGGTGACAGTGCTTACTGCGGTTGTCACTTGTGAGTAC
However, I need the file to have 60 characters per line. It should look like this:
>abc
AGAATTCGTCTTGCTCTATTCACCCTTACTTTTCTTCTTGCCCGTTCTCTTTCTTAGTAT
GAATCCAGTATGCCTGCCTGTAATTGTTGCGCCCTACCTCTTTTGGCTGGCGGCTATTGC
CGCCTCGTGTTTCACGGCCTCAGTTAGTACCGTTGTGACCGCCACCGGCTTGGCCCTCTC
ACTTCTACTCTTGGCAGCAGTGGCCAGCTCATATGCCGCTGCACAAAGGAAACTGCTGAC
I tried to use fold -w 60 myfile.fasta > out.fa to change my file but…
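The excerpt is cut off before it says what went wrong, but one likely reason `fold` alone falls short is that it wraps each input line independently, so 70-character lines come out as alternating 60- and 10-character lines. A minimal Python sketch of the alternative, assuming a standard FASTA layout: join each record's sequence lines first, then cut the joined string at the desired width. `rewrap_fasta` is a hypothetical name.

```python
def rewrap_fasta(lines, width=60):
    """Return FASTA lines with every sequence re-wrapped to `width` characters."""
    out, chunks = [], []

    def flush():
        # Join the pending sequence lines, then cut them at the new width.
        seq = "".join(chunks)
        out.extend(seq[i:i + width] for i in range(0, len(seq), width))
        chunks.clear()

    for raw in lines:
        line = raw.strip()
        if line.startswith(">"):
            flush()           # finish the previous record, if any
            out.append(line)  # headers are copied through unchanged
        elif line:
            chunks.append(line)
    flush()
    return out


wrapped = rewrap_fasta([">abc", "A" * 70, "C" * 70])
```

To apply it to a file, the lines of an open file handle can be passed in directly and the result written out with `"\n".join(...)`.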

Mapping reads using kallisto – rna seq analysis

Hi, I’m trying to map reads to a reference genome using kallisto for RNA-seq analysis in the terminal on a Mac, and the following command keeps loading for hours and won’t run. I’m not exactly sure where I’ve gone wrong. kallisto index…

Lh3 Minimap2 Issues

Issue Title | State | Comments | Created Date | Updated Date
Mapping reads against multi references. Any proposition? | open | 0 | 2022-06-28 | 2022-06-30
Inversion between tandem repeats yields misalignment | closed | 1 | 2022-06-21 | 2022-06-30
use minimap2 to extract mitochondrial reads from genome assembly | open | 0 | 2022-06-20 | 2022-06-30
Asking for #301 to be reopened | closed | 0…

Ubuntu Manpage: bamfillquery – fill query sequences into BAM files

Provided by: biobambam2_2.0.179+ds-1_amd64 NAME bamfillquery – fill query sequences into BAM files SYNOPSIS bamfillquery [options] <in.bam queries.fasta >out.bam DESCRIPTION bamfillquery reads a SAM/BAM/CRAM file and a FastA file, copies the sequences found in the FastA file into the query sequence field of the SAM/BAM/CRAM file and writes the resulting data…

Ubuntu Manpage: samtools targetcut – cut fosmid regions (for fosmid pool only)

Provided by: samtools_1.13-2_amd64 NAME samtools targetcut – cut fosmid regions (for fosmid pool only) SYNOPSIS samtools targetcut [-Q minBaseQ] [-i inPenalty] [-0 em0] [-1 em1] [-2 em2] [-f ref] in.bam DESCRIPTION This command identifies target regions by examining the continuity of read depth, computes haploid consensus sequences of targets and…

Pooled shRNA Library Screening to Identify Factors that Modulate a Drug Resistance Phenotype

High-throughput RNA interference (RNAi) screening using a pool of lentiviral shRNAs can be a tool to detect therapeutically relevant synthetic lethal targets in malignancies. We provide a pooled shRNA screening approach to investigate the epigenetic effectors in acute myeloid leukemia (AML). The overall goal of the following video is to…

extendedSequences length is not the required for DeepCpf1 (34bp)

Hi, I’m using CRISPRseek dev v. 1.35.2, installed from GitHub (hukai916/CRISPRseek). I wanted to calculate the CFD and the gRNA efficacy of a Cas12 sgRNA (my_sgrna.fa file) using DeepCpf1. my_sgrna.fa, TTTT (PAM) + sgRNA (20bp): >sgrna1 TTTTTGTCTTTAGACTATAAGTGC Command: offTargetAnalysis(inputFilePath = "my_sgrna.fa", format = "fasta", header = FALSE, exportAllgRNAs =…

Screen.seqs result varying – Commands in mothur

I have a data set of 2×150 reads of 54 pairs of 16S v4 metagenomic sequences from NCBI SRA of gastritis patients. When I previously ran the sequences through mothur, the screen.seqs after silva alignment removed a sufficient number of sequences. mothur > screen.seqs(fasta=current, count=current, start=2, end=13426) Using Ulcer_Donors\stability.trim.contigs.count_table as input file…

man Bio::SeqIO::fasta (3): fasta sequence input/output stream

Bio::SeqIO::fasta(3) fasta sequence input/output stream SYNOPSIS Do not use this module directly. Use it via the Bio::SeqIO class. DESCRIPTION This object can transform Bio::Seq objects to and from fasta flat file databases. FEEDBACK Mailing Lists User feedback is an integral part of the evolution of this and other Bioperl modules….

clustalw and muscle in Biopython

First, try installing Biopython 1.63 from here, it may solve some of your problems. Second, make sure you’re using the latest Python from python.org – you might want to run the installer again just to ensure that none of your files are corrupted, if you’re still getting the same error…

How do I find all Sequence Lengths in a FASTA Dataset without using the Biopython

You really don’t need regular expressions for this.
    header = None
    length = 0
    with open('file.fasta') as fasta:
        for line in fasta:
            # Trim newline
            line = line.rstrip()
            if line.startswith('>'):
                # If we captured one before, print it now
                if header is not None:
                    print(header, length)
                    length = 0
                header…
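The answer is cut off mid-loop, so here is a complete sketch of the same line-by-line idea. It is not the original answer's remaining code. It assumes headers are unique and returns the lengths in a dictionary instead of printing them. `sequence_lengths` is a hypothetical name.

```python
def sequence_lengths(lines):
    """Map each FASTA header (without '>') to its total sequence length."""
    lengths = {}
    header = None
    for raw in lines:
        line = raw.rstrip()
        if line.startswith(">"):
            # A new record starts: remember its header and begin counting.
            header = line[1:]
            lengths[header] = 0
        elif header is not None:
            # Sequence lines may be wrapped, so add up every line.
            lengths[header] += len(line)
    return lengths


example = [">seq1 first\n", "ACGT\n", "AC\n", ">seq2\n", "GGGTTTAAA\n"]
lengths = sequence_lengths(example)
```

For a file on disk, an open handle can be passed directly, for example `with open("file.fasta") as fasta: lengths = sequence_lengths(fasta)`.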

Standard for aligning smallRNA to a reference human rRNA?

Hi, I need to label some smallRNA sequences that I know are rRNA fragments. I know that for mRNA these are discarded by aligning to the human genome and filtering out multimapped reads, but I need to try to pin…

Create a streamlit download_button to download a fasta file from a local Genbank file – Using Streamlit

Hi Streamlit community, I’m building a Streamlit app that allows users to upload a full-record GenBank file and explore its content (gene sequences, protein sequences, etc.) using Biopython. Everything works perfectly except when I try to create a st.download_button() to download the whole genome sequence or a sequence…

Detailed differences between sambamba and samtools

In March, my first post in the newcomers' group, “Do false-positive mutations appear because duplicate marking is insufficient?”, described the problem that supplementary reads are not marked as duplicates by GATK MarkDuplicates. Afterwards, in response to this question, I began…

Reverse complement of fasta file

Reading records separated by > is a nice idea as it gives you the whole chunk at a time. However, here you want to process and merge lines but not the header, thus distinguishing between lines. It is clearer to read line by line. The sequence-line is specific: all caps…
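The line-by-line approach described here can be sketched as follows in Python. The answer's own code is not shown in the excerpt and may well be in another language, so this is an illustration only. Headers are passed through, sequence lines are merged per record, and the merged sequence is reverse-complemented. Characters outside A, C, G and T (such as N) are left as they are.

```python
# Translation table for the four standard bases, in both cases.
COMPLEMENT = str.maketrans("ACGTacgt", "TGCAtgca")


def reverse_complement(seq):
    """Complement every base, then reverse the string."""
    return seq.translate(COMPLEMENT)[::-1]


def revcomp_fasta(lines):
    """Return FASTA lines with each record's sequence reverse-complemented."""
    out, chunks = [], []
    for raw in lines:
        line = raw.strip()
        if line.startswith(">"):
            if chunks:
                out.append(reverse_complement("".join(chunks)))
                chunks = []
            out.append(line)  # the header is kept as-is
        elif line:
            chunks.append(line)
    if chunks:
        out.append(reverse_complement("".join(chunks)))
    return out


result = revcomp_fasta([">s1", "AAAC", "GG", ">s2", "acgt"])
```

Merging the lines before reversing matters: reversing each wrapped line separately would leave the lines in their original order and produce a wrong sequence.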

downloading human rRNA.fasta

I am trying to download a human rRNA.fasta file. Do you know where I can find this file? In one of the older posts in this forum, someone said this file can be found on UCSC, but I did not manage to find it.…

BlastX through Biopython

I have an unknown gene segment in the Human_gene.txt file and I want to run blastx (translated nucleotide) using the blast module of Biopython by making the E-value threshold 0.0001 and displaying the match result of 50 residues of query and subject. I am trying this…

java – Calculating physico-chemical properties of amino acids in Biojava

I need to calculate the number and percentages of polar/non-polar and aliphatic/aromatic/heterocyclic amino acids in a protein sequence that I got from UniProt, using BioJava. I found in the BioJava tutorial how to read FASTA files and implemented that code. But I have no idea how to solve this…

Bioinformatics with basic local alignment search tool (BLAST) and fast alignment (FASTA)

Article, 2014 In: Journal of Bioinformatics and Sequence Analysis, ISSN 2141-2464, Volume 6, 1, Pages 1-6, 2014 DOI:10.5897/ijbc2013.0086 Organisations Abstract Following advances in DNA and protein sequencing, the application of computational approaches in analysing biological data has become a very important aspect of biology. Evaluating similarities between biological sequences…

FastQ_7 April 2022(1) – Copy.pptx – What is the FASTA format? The FASTA format is the “workhorse” of bioinformatics. It is used to represent sequence

The FASTA format is not “officially” defined, even though it carries the majority of data information on living systems. Its origins go back to a software tool called Fasta written by David Lipman (a scientist who later became the director of NCBI) and William R. Pearson of the University of Virginia. The tool itself has (to some…

On a reference pan-genome model (Part II)

12 July 2019 I wrote a blog post on a potential reference pan-genome model. I had more thoughts in my mind. I didn’t write about them because they are immature. Nonetheless, a few readers raised questions related to my immature thoughts, so I decide to add this “Part II” as…

fasta MSA Sequence input/output stream

Bio::AlignIO::fasta(3) fasta MSA Sequence input/output stream SYNOPSIS Do not use this module directly. Use it via the Bio::AlignIO class. DESCRIPTION This object can transform Bio::SimpleAlign objects to and from fasta flat files. This is for the fasta alignment format, not for the FastA sequence analysis program. To process the alignments…

NcbiblastpCommandline alignment results are different from blast webpage

What you are trying to do is fairly simple, and you are complicating it by: 1) not providing your sequences so that someone can reproduce your attempt; 2) giving a result in a form that is impossible to read. Be honest, can you make any sense of the result you…

All vs All blast not self hit? Orthogroup clustering and single copy genome?

Hey guys. Self hit: I have this actually a bit weird question about BLAST. I’ve been doing some work around single-copy genome construction using the reciprocal best BLAST hit (RBBH) method. As I have something like 100+ annotated genomes, I concatenated all annotated CDS into one FASTA and ran makeblastdb with…

Merge.file do not like CAP letters – mothur bugs

Hello, I ran into this problem while running mothur on a server. mothur > merge.files(input=saraCPERF.trim.contigs.unique.good.good.filter.unique .precluster.denovo.vsearch.pick.fasta-combinedphyto.good.filter.unique.precluste r.denovo.vsearch.fasta, output=combined_saraCPERF.fasta) Unable to open combinedphyto.good.filter.unique.precluster.denovo.vsearch.fasta. Trying mothur’s executable directory combinedphyto.good.filter.unique.precluste r.denovo.vsearch.fasta. Unable to open combinedphyto.good.filter.unique.precluster.denovo.vsearch.fasta. Unable to open ▒!q▒cod.filter.unique.precluster.denovo.vsearch.fasta. Trying mot hur’s executable directory ‘qod.filter.unique.precluster.denovo.vsearch.fasta. Unable to open ‘qod.filter.unique.precluster.denovo.vsearch.fasta. free(): double free detected…

Ubuntu Manpage: Bio::Tools::Seg – parse “seg” output

Provided by: libbio-perl-perl_1.7.2-2_all NAME Bio::Tools::Seg – parse “seg” output SYNOPSIS use Bio::Tools::Seg; my $parser = Bio::Tools::Seg->new(-file => 'seg.fasta'); while ( my $f = $parser->next_result ) { if ($f->score < 1.5) { print $f->location->to_FTstring, " is low complexity\n"; } } DESCRIPTION “seg” identifies low-complexity regions on a protein sequence. It is…

LOC125105370 sterile alpha motif domain-containing protein 1-like [Lutra lutra (Eurasian river otter)] – Gene

The following sections contain reference sequences that belong to a specific genome build. This section includes genomic Reference Sequences (RefSeqs) from all assemblies on which this gene is annotated, such as RefSeqs for chromosomes and scaffolds (contigs) from both reference and alternate assemblies. Model RNAs and proteins are also…

Qiime2 Exclude Seqs with FASTQ as query data.

Hello, I am working with FASTQ files and I want to filter them based on alignment with reference sequences in FASTA format. I decided to use QIIME2 for this, so I imported both the FASTA and FASTQ files to the required…

python – How are paths meant to be denoted on for Biopython on mac?

I am trying to run a basic Biopython script to rename sequences within a FASTA file. I have only ever run this on a server; I am trying to do it on my MacBook but I can’t work out what the correct path to the file should be. On the…

How to create a subset FASTA file of proteins of interest based on UniprotKB AC / Accession Numbers –

Hello, I am looking to create a subset FASTA file from an existing FASTA file. The subset file should only include entries with certain accession numbers. I have created a BioIndexed File with the correct number of entries, but I am unsure how to use the getsubset function in this…
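A minimal sketch of the subsetting step in plain Python, independent of whichever toolkit the poster is using. It assumes UniProtKB-style headers such as `>sp|P04637|P53_HUMAN ...`, where the accession is the second pipe-delimited field, and falls back to the first word of the header otherwise. The function names and the short sequences in the sample are placeholders.

```python
def accession_of(header):
    """Extract the accession from a FASTA header line (leading '>' included)."""
    words = header[1:].split()
    first = words[0] if words else ""
    parts = first.split("|")
    # UniProtKB style: db|ACCESSION|ENTRY_NAME
    return parts[1] if len(parts) >= 3 else first


def subset_fasta(lines, wanted):
    """Return only the FASTA lines of records whose accession is in `wanted`."""
    wanted = set(wanted)
    keep = False
    out = []
    for raw in lines:
        line = raw.rstrip("\n")
        if line.startswith(">"):
            # Decide once per record; sequence lines inherit the decision.
            keep = accession_of(line) in wanted
        if keep:
            out.append(line)
    return out


fasta = [
    ">sp|P04637|P53_HUMAN Cellular tumor antigen p53",
    "MEEPQ",  # placeholder residues, not the real sequence
    "SDPSV",
    ">sp|P38398|BRCA1_HUMAN Breast cancer type 1 susceptibility protein",
    "MDLSA",
]
subset = subset_fasta(fasta, ["P38398"])
```

Turning the accession list into a set keeps the lookup fast even when the source FASTA file contains many thousands of entries.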

Issues with searching Swissprot #25

Eddykay310 Hi @cruizperez Please help me understand the problem here and how I can fix it. I have successfully generated my DBs but I get this error during analysis. The .dmnd files do not exist in the folders as the error says but I don’t know how I can generate…

segregating sites calculation fails on gapped sequences #132

Cjfields Author Name: Jason Stajich (@hyphaltip) Original Redmine Issue: 3328, redmine.open-bio.org/issues/3328 Original Date: 2012-02-17 Original Assignee: Bioperl Guts I am Cheng-Ruei Lee, a graduate student in Duke Biology. I’m analyzing many DNA alignments of a plant species. I first used (Bio::PopGen::Utilities -> aln_to_population()) to read in the fasta format alignment,…

Using Rsubread buildindex with GRCh37.p13.genome.fa.gz gives me an error

Hi, I am trying to build the human index using ftp.ebi.ac.uk/pub/databases/gencode/Gencode_human/release_19/GRCh37.p13.genome.fa.gz. I am using Rsubread 2.4.3 and it gives me the following error: //================================= Running ==================================\ || || || Check the integrity of…

ClustalW on Ubuntu – DevDreamz

The section is copied from the Biopython documentation.
    >>> from Bio.Align.Applications import ClustalwCommandline
    >>> cline = ClustalwCommandline("clustalw2", infile="opuntia.fasta")
    >>> print(cline)
    clustalw2 -infile=opuntia.fasta
If you run
    from Bio.Align.Applications import ClustalwCommandline
    cline = ClustalwCommandline("clustalw2", infile="opuntia.fasta")
    print(cline)
it will do 3 things:
1. Import the ClustalwCommandline class from Biopython
2. Create a ClustalwCommandline object
3. Print the object’s string…

Continue Reading ClustalW on Ubuntu – DevDreamz

Append assembly accession to nucleotide accession number in RefSeq Genbank file

Hi everyone, when I want to append the filename to the contig header in a multi-fasta file, I usually use for F in *.fasta; do N=$(basename $F .fasta) ; bbrename.sh in=$F out=${N}_mod.fasta prefix=$F addprefix=t ; done However, this…

Continue Reading Append assembly accession to nucleotide accession number in RefSeq Genbank file

biopython – How can i write only a specific elements of the sequences, that i downloaded using Entrez.efetch, to the file( id and sequence itself)

I’m still a beginner at this. I downloaded 20 sequences from NCBI and my task is to align them with each other, but I need to separate the data that I got using Entrez.efetch so I can use it for the alignment, and I couldn’t write only the specific elements (id and…

Continue Reading biopython – How can i write only a specific elements of the sequences, that i downloaded using Entrez.efetch, to the file( id and sequence itself)

MitotoolPy , shown no error but no results

Has anyone used MitoToolpy (www.mitotool.org/mp.html), a Python script for mitochondrial haplogroup classification? The official documentation claims that it takes only 50 seconds to get the result for one fasta file, but I haven’t gotten the result after running for one…

Continue Reading MitotoolPy , shown no error but no results

Creating local nt blast database : bioinformatics

Hi all, I’m trying to create a local nt blast database. My eventual goal is to create a subset based on a taxonomic group, to be used on a cluster with limited storage space. It seems the only way to do this, though, is to start with the whole database…

Continue Reading Creating local nt blast database : bioinformatics

how to build index for cdna?

Hello, I can build an index for Mus_musculus.GRCm38.dna_sm.toplevel.fa, but when I build one for Mus_musculus.GRCm38.cdna.all.fa, there is a bug: “rsem-extract-reference-transcripts Mus_musculus.GRCm38.cdna.all.fa 0 Mus_musculus.GRCm38.cdna.all.fa.gtf None 0 Mus_musculus.GRCm38.cdna.all” failed! Plase check if you provide correct parameters/options for the pipeline! Traceback (most recent call last): File “../indrops.py”, line 1770, in project.build_transcriptome(args.genome_fasta_gz, args.ensembl_gtf_gz, mode=args.mode) File “../indrops.py”, line…

Continue Reading how to build index for cdna?

moshi4/ANIclustermap: A tool for drawing ANI clustermap between all-vs-all microbial genomes using fastANI & seaborn

Continue Reading moshi4/ANIclustermap: A tool for drawing ANI clustermap between all-vs-all microbial genomes using fastANI & seaborn

FastANI – BioGrids Consortium – Supported Software

FastANI Description: developed for fast alignment-free computation of whole-genome Average Nucleotide Identity (ANI). Installation: use the following command to install this title with the CLI client: $ biogrids-cli install fastani Primary Citation*: C. Jain, L. M. Rodriguez-R, A. M. Phillippy, K. T. Konstantinidis, and S….

Continue Reading FastANI – BioGrids Consortium – Supported Software

What is ClustalW? Tutorial of How to Use ClustalW

ClustalW is a computer tool of significant importance in bioinformatics. Biologists and statisticians use it primarily for multiple sequence alignment. Many versions of ClustalW, released over the course of the algorithm’s development, are available now. How to perform a search on ClustalW? ClustalW homepage 1. Go to…

Continue Reading What is ClustalW? Tutorial of How to Use ClustalW

Convert bedGraph to Homer tag directory?

Hi, I am new to ChIP-seq analysis. When taking published data in .bedGraph format (generated by Homer), is there any way to convert it back to a Homer tag directory (other than aligning from the raw .fasta)? I suppose extracting columns into .bed format and…

Continue Reading Convert bedGraph to Homer tag directory?

Using salmon in Galaxy

Hi everyone. I am executing Salmon in Galaxy in order to carry out gene quantification from mouse RNA-Seq data (6 samples). To do so, I am providing a reference genome (cDNA, in fasta format), the processed reads (in fastqsanger.gz format) of one of these samples (after executing Trim-Galore) and a…

Continue Reading Using salmon in Galaxy

Trimmomatic/ linux system

Hi all, I am trying to remove adapters and clean my RNA-seq .gz files using Trimmomatic, loaded on a Linux system (supercomputer server). I am following the steps for paired-end reads explained in the manual (www.usadellab.org/cms/?page=trimmomatic): java -jar trimmomatic-0.39.jar PE input_forward.fq.gz input_reverse.fq.gz output_forward_paired.fq.gz output_forward_unpaired.fq.gz output_reverse_paired.fq.gz output_reverse_unpaired.fq.gz ILLUMINACLIP:TruSeq3-PE.fa:2:30:10:2:True LEADING:3…

Continue Reading Trimmomatic/ linux system

bioinformatics – how to replace seqIDs in a fasta file with new seqIDs using biopython

I have a fasta file that reads like so: >00009c1cc42953fb4702f6331325c7cc TACGGAGGATGCGAGCGTTATCCGGATTTATTGGGTTTAAAGGGTGCGTAGGCGGGTTGTTAAGTCAGTGGTGAAATCGTGTGGCTCAACCATACGGAGCCATTGAAACTGGCGACCTTGAGTGTAAACGAGGTAGGCGGAATGTGACGTGTAGCGGTGAAATGCTTAGATATGTCACAGAACCCCGATTGCGAAGGCAGCTTACCAGCATACAACTGAC >000118a5e731455e942c61a82a40367a623088d0 AGAGTTTTATCCTGGCTCAGGATGAACGCTAGCGGCAGGCCTAATACATGCAAGTCGGACGGGATCTAAATTTAAGCTTGCTTAAGTTTAGTGAGAGTGGCGCACGGGTGCGTAACGCGTGAGCAACCTACCCATATCAGGGGGATAGCCCGAAGAAATTCGGATTAACACCGCATAACACAGCAATCTCGCATGAGATCACTGTTAAATATTTATAGGATATGGATGGGCTCGCGTGACATTAGCTAGTTGGTAAGGTAACGGCTTACCAAGGCAACGATGTCTAGGGGCTCTGAGAGGAGAATCCCCCACACTGGTACTGAGACACGGACCAGACTCCTACGGGAGGCAGCAGTAAGGATTATTGGTCAATGGAGGGAACTCTGAACCAGCCATGCCGCGTGCAGGATGACTGCCCTATGGGTTGTAAACTGCTTTTGTCTGGGAATAAACCTTGATTCGTGAATCAAGCTGAATGTACCAGAAGAATAAGGATCGGCTAACTCCGTGCCAGCAGCCGCGGTAATACGGAGGATCCGAGCGTTATCCGGATTTATTGGGTTTAAAGGGTGCGTAGGCGGCTTTATAAGTCAGAGGTGAAAGACGGCAGCTTAACTGTCGCAGTGCCTTTGATACTGTATAGCTTGAATATCGTTGAAGATGGCGGAATGAGACAAGTAGCGGTGAAATGCATAGATATGTCTCAGAACTCCGATTGCGAAGGCAGCTGTCTAAGCGGCAATTGACGCTGATGCACGAAAGCGTGGGGATCAAACAGGATTAGATACCCTGGTAGTCCACGCCCTAAACGATGATAACTGGATGTTGGCGATACACAGTCAGCGTCTTAGCGAAAGCGTTAAGTTATCCACCTGGGGAGTACGCCCGCAAGGGTGAAACTCAAAGGAATTGACGGGGGCCCGCACAAGCGGAGGAGCATGTGGTTTAATTCGATGATACGCGAGGAACCTTACCCGGGCTTGAAAGTTAGTGAATGCGACAGAGACGTCTCAGTCCTTCGGGACACGAAACTAGGTGCTGCATGGCTGTCGTCAGCTCGTGCCGTGAGGTGTTGGGTTAAGTCCCGCAACGAGCGCAACCCCTATGTTTAGTTGCCAGCATGTAATGATGGGGACTCTAAACAGACTGCCTGCGTAAGCAGCGAGGAAGGTGGGGACGACGTCAAGTCATCATGGCCCTTACGTCCGGGGCTACACACGTGCTACAATGGATGGTACAGCGGGCAGCTACACAGCAATGTGATGCTAATCTCTAAAAGCCATTCACAGTTCGGATAGGGGTCTGCAACTCGACCCCATGAAGTTGGATTCGCTAGTAATCGCGTATCAGCAATGACGCGGT And I want to basically add microbial taxonomy to the seq IDs like so: d__Bacteria; p__Bacteroidota; c__Bacteroidia; o__Bacteroidales; f__Bacteroidales_RF16_group; g__Bacteroidales_RF16_group; s__uncultured_bacterium|00009c1cc42953fb4702f6331325c7cc d__Bacteria; p__Bacteroidota; c__Bacteroidia; 
o__Sphingobacteriales; f__Sphingobacteriaceae; g__Sphingobacterium; s__uncultured_bacterium|000118a5e731455e942c61a82a40367a623088d0 Where the original seqID is appended to the taxonomy…
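
One way the relabelling described in this question could be sketched, using plain Python rather than the Biopython the asker mentions so that the example stays self-contained. The function name `relabel_fasta` and the in-memory `taxonomy` dictionary are assumptions made for illustration; the sequence in the example is shortened from the excerpt above.

```python
def relabel_fasta(lines, taxonomy):
    """Yield FASTA lines with each header rewritten as '>taxonomy|original_id'.

    `taxonomy` maps an original sequence ID to its taxonomy string.
    Headers whose ID is not in the mapping are passed through unchanged.
    Any description after the ID is dropped from rewritten headers.
    """
    for line in lines:
        line = line.rstrip("\n")
        if line.startswith(">"):
            fields = line[1:].split(None, 1)
            seq_id = fields[0] if fields else ""
            if seq_id in taxonomy:
                yield f">{taxonomy[seq_id]}|{seq_id}"
            else:
                yield line
        else:
            # Sequence lines are copied as they are.
            yield line


# Hypothetical mapping built from the ID and taxonomy shown in the question.
taxonomy = {
    "00009c1cc42953fb4702f6331325c7cc":
        "d__Bacteria; p__Bacteroidota; c__Bacteroidia; o__Bacteroidales; "
        "f__Bacteroidales_RF16_group; g__Bacteroidales_RF16_group; "
        "s__uncultured_bacterium",
}
example = [">00009c1cc42953fb4702f6331325c7cc\n", "TACGGAGGATGCGAGCG\n"]
for out_line in relabel_fasta(example, taxonomy):
    print(out_line)
```

Because the taxonomy strings contain spaces, many FASTA parsers will treat only the text before the first space as the sequence ID, so replacing the spaces may be worth considering before passing the file to downstream tools.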

Continue Reading bioinformatics – how to replace seqIDs in a fasta file with new seqIDs using biopython

Optimize a script that extract features from Fasta file using biopython

Hey, I have a script that extracts features from a large fasta file (1767 MB) using biopython. I am submitting it as a bash job on a remote server via ssh. The job has been running for two days now. Is there a way to optimize my script? I think maybe the problem…

Continue Reading Optimize a script that extract features from Fasta file using biopython

subsample fasta to certain size

Hi there, can anyone suggest a tool or method to extract random 10GB reads with a minimum read length of 1000bp from a huge 100 Gb file? I have 50 different fa.gz files of varying size (20 -100GB) and I would like to subsample fasta with…
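
A rough sketch of one possible approach in plain Python: filter records by length while streaming, and keep a uniform random sample with reservoir sampling. It samples a fixed number of reads `k` rather than a target size in bytes or bases, which is a simplification of what the question asks for; the function names and parameters are assumptions made for illustration.

```python
import random


def read_fasta(lines):
    """Yield (header, sequence) tuples from an iterable of FASTA lines."""
    header, chunks = None, []
    for line in lines:
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            if header is not None:
                yield header, "".join(chunks)
            header, chunks = line[1:], []
        elif header is not None:
            chunks.append(line)
    if header is not None:
        yield header, "".join(chunks)


def reservoir_sample(records, k, min_len=1000, seed=None):
    """Return up to k records with length >= min_len, chosen uniformly at random.

    Uses reservoir sampling, so the input is read once and its total
    number of records does not need to be known in advance.
    """
    rng = random.Random(seed)
    sample = []
    seen = 0  # number of records that passed the length filter so far
    for header, seq in records:
        if len(seq) < min_len:
            continue
        seen += 1
        if len(sample) < k:
            sample.append((header, seq))
        else:
            # Replace a stored record with probability k / seen.
            j = rng.randrange(seen)
            if j < k:
                sample[j] = (header, seq)
    return sample
```

For gzipped input, the lines could come from a handle opened with `gzip.open(path, "rt")`. The sketch holds the sampled sequences in memory, so for a sample of many gigabytes it would likely be better to sample record positions in a first pass and write the chosen records in a second.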

Continue Reading subsample fasta to certain size

“No such file or directory: ‘test.xml”

Biopython NcbiblastpCommandline not working: “No such file or directory: ‘test.xml” 0 from Bio.Blast.Applications import NcbiblastpCommandline blastp=r”C:\NCBI\blast-BLAST_VERSION+\bin\blastp.exe” blastp_cline = NcbiblastpCommandline(blastp, query=r”C:/NCBI/blast-BLAST_VERSION+/bin/test.fasta”, db=r’C:/NCBI/blast-BLAST_VERSION+/bin/bos_protein.fasta’, outfmt=5, evalue=0.00001, out=r”C:/NCBI/blast-BLAST_VERSION+/bin/test.XML”) blastp_cline from Bio.Blast import NCBIXML with open(“test.XML”) as result_handle: E_VALUE_THRESH=0.01 blast_records = NCBIXML.parse(result_handle) blast_record = NCBIXML.read(result_handle) for alignment in blast_record.alignments: for hsp in alignment.hsps: if hsp.expect…

Continue Reading “No such file or directory: ‘test.xml”

How to check Fasta file ASCII characters and fix encoding errors?

I tried building a diamond database but got this error: Error: Error reading input stream at line 180825: Invalid character (ASCII 0) in sequence. How can I fix it? Is there a tool that checks for this and…
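
A minimal sketch, in plain Python, of how such bytes could be located and how NUL bytes (ASCII 0) could be removed. The function names are hypothetical, and whether deleting the NUL bytes is the right repair is an assumption: they can also indicate a truncated or corrupted download, in which case fetching the file again may be the better fix.

```python
def find_invalid_bytes(data_lines):
    """Yield (line_number, column, byte_value) for suspicious bytes.

    `data_lines` is an iterable of bytes objects, for example a file
    opened in binary mode. A byte is flagged when it is a control
    character other than tab, LF, or CR (this includes NUL, ASCII 0),
    or when its value is above 126 (DEL and non-ASCII bytes).
    """
    allowed_controls = {9, 10, 13}
    for line_number, line in enumerate(data_lines, start=1):
        for column, byte in enumerate(line, start=1):
            if (byte < 32 and byte not in allowed_controls) or byte > 126:
                yield line_number, column, byte


def strip_nul(data_lines):
    """Yield each line with NUL bytes removed; other bytes are untouched."""
    for line in data_lines:
        yield line.replace(b"\x00", b"")
```

The line numbers reported this way can be compared with the line number in the error message to confirm that the same position is being flagged.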

Continue Reading How to check Fasta file ASCII characters and fix encoding errors?

Low transcript quantification with Salmon using GRCm39 annotations

Hi everyone, first time working with mouse samples and unfortunately, there are fewer resources available for the latest mouse Ensembl genome than I was expecting. What I’ve done: I performed rRNA depletion on total RNA extracted from mouse tissue and created Illumina libraries using a cDNA synthesis kit with random…

Continue Reading Low transcript quantification with Salmon using GRCm39 annotations

Feature count is very low using htseq-count

Hello all, I ran bbmap on my paired RNA-seq data using the following command: bbmap.sh in1=J2_R1.fastq in2=J2_R2.fastq out=output_J2.sam ref=im4.fasta nodisk The header of the generated sam file is @HD VN:1.4 SO:unsorted @SQ SN:k141_1006 LN:2503 @SQ SN:k141_5512 LN:5393 @SQ SN:k141_4772 LN:4387 @SQ SN:k141_3267 LN:4531…

Continue Reading Feature count is very low using htseq-count

Minimap2 options for Nanopore cDNA direct seq

Hello, I’m working with ONT RNA seq data and I used cDNA direct seq for the sequencing. I want to look for long deletions in mRNAs that are not spliced; for this, I want to use the splice option of…

Continue Reading Minimap2 options for Nanopore cDNA direct seq

Search for specific motif in MEME analysis

Hello! I am looking into using the MEME suite to answer some questions about VDR motifs in L1 genes. I am able to use MEME to search for motifs in my fasta data with the web-based tool, where the command would look…

Continue Reading Search for specific motif in MEME analysis

BBTools – BioGrids Consortium – Supported Software

BBTools Description: a suite of fast, multithreaded bioinformatics tools designed for analysis of DNA and RNA sequence data. BBTools can handle common sequencing file formats such as fastq, fasta, sam, scarf, fasta+qual, compressed or raw, with autodetection of quality encoding and interleaving. Installation: use the following command to…

Continue Reading BBTools – BioGrids Consortium – Supported Software

need to add unique ids with accession number in multiple fasta refseq files

I need to add my unique IDs (that I have created) to accession numbers in fasta files. The unique set of IDs is given in a CSV file, with column 1 holding the unique IDs and column 2 holding the fasta…

Continue Reading need to add unique ids with accession number in multiple fasta refseq files

transcriptome – How to combine multiple .fasta files of primary assembly from Ensembl into one for sequence alignment?

I have some marmoset snRNA reads that I want to align with the reference transcriptome using cellranger. The primary assembly for marmoset is available here, which is broken down into 22 parts. However, cellranger mkref only accepts one .fa file to generate the transcriptome. I tried concatenating all the extracted…

Continue Reading transcriptome – How to combine multiple .fasta files of primary assembly from Ensembl into one for sequence alignment?

How to extract fasta sequences from assembled transcripts generated by Stringtie

Hi all, I used STAR and stringtie for mapping reads to a reference genome and for assembly. As you know, the assembled transcripts generated by stringtie are in gtf format. Now, I want to have the fasta sequence of the assembled transcript….

Continue Reading How to extract fasta sequences from assembled transcripts generated by Stringtie

biopython – How to blastp with fasta file that contains ~50 sequences

I’m trying to blastp multiple amino acid sequences using biopython. I just can’t seem to get it right, and I can’t figure out from the handbook how to do this. I have come up with the following: open(“proteins_PROT.fasta”,”r”) from Bio.Blast.Applications import NcbiblastpCommandline cline = NcbiblastpCommandline(query=”proteins_PROT.fasta”, db=”nr”, evalue=0.001, remote=True, ungapped=True) NcbiblastpCommandline(cmd=’blastp’, query=”proteins_PROT.fasta”,…

Continue Reading biopython – How to blastp with fasta file that contains ~50 sequences

bedtools sample with fastq input and fewer input records than requested

I’m using bedtools sample to sample reads from fastq files. I’d like to submit two feature requests: If the number of requested records is larger than the input I get ERROR: Input file has fewer records than the requested number of output records. I guess this is intentional and not…

Continue Reading bedtools sample with fastq input and fewer input records than requested

UMD Genome group

MaSuRCA assembler. MaSuRCA is whole genome assembly software. It combines the efficiency of the de Bruijn graph and Overlap-Layout-Consensus (OLC) approaches. MaSuRCA can assemble data sets containing only short reads from Illumina sequencing or a mixture of short reads and long reads (Sanger, 454, Pacbio and…

Continue Reading UMD Genome group

nf-core/circrna

circRNA quantification, differential expression analysis and miRNA target prediction of RNA-Seq data Introduction nf-core/circrna is a best-practice analysis pipeline for the quantification, miRNA target prediction and differential expression analysis of circular RNAs in paired-end RNA sequencing data. The pipeline is built using Nextflow, a workflow tool to run tasks across…

Continue Reading nf-core/circrna

Using AnnoTree to Get More Assignments, Faster, in DIAMOND+MEGAN Microbiome Analysis

INTRODUCTION Next-generation sequencing (NGS) has revolutionized many areas of biological research (1, 2), providing ever-more data at an ever-decreasing cost. One such area is microbiome research, the study of microbes in their theater of activity using metagenomic sequencing (3). Here, deep short-read sequencing, and improving performance of long-read sequencing, are…

Continue Reading Using AnnoTree to Get More Assignments, Faster, in DIAMOND+MEGAN Microbiome Analysis

Clustal Processing Massive Dataset

Hello wonderful beings of bioinformatics! I’m new to this world and could use some help. My job is to run multiple sequence alignment on a large dataset. I am looking into the L1 family of genes and wanting to compare 7,525 elements of full length sequences. Each sequence is ~6,000…

Continue Reading Clustal Processing Massive Dataset

Butterfly eyespots evolved via cooption of an ancestral gene-regulatory network that also patterns antennae, legs, and wings

Although the hypothesis of gene-regulatory network (GRN) cooption is a plausible model to explain the origin of morphological novelties (1), there has been limited empirical evidence to show that this mechanism led to the origin of any novel trait. Several hypotheses have been proposed for the origin of butterfly eyespots,…

Continue Reading Butterfly eyespots evolved via cooption of an ancestral gene-regulatory network that also patterns antennae, legs, and wings

Fasta File Python

How do I go about extracting elements from a fasta file? For example, if I want a list of all the IDs and then the length of each sequence in another list, how do I do that in base Python without using any libraries? for line in…
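
A minimal sketch of one way this could be done in base Python, with no libraries. The function name `ids_and_lengths` is an assumption, and the ID is taken to be the text between `>` and the first whitespace, which is a common convention rather than something stated in the question.

```python
def ids_and_lengths(lines):
    """Return two parallel lists: sequence IDs and sequence lengths.

    `lines` is any iterable of text lines, such as an open file.
    Sequences may span several lines; blank lines are ignored.
    """
    ids = []
    lengths = []
    for line in lines:
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            fields = line[1:].split()
            ids.append(fields[0] if fields else "")
            lengths.append(0)  # start counting a new sequence
        elif ids:
            # Add this line's residues to the current sequence.
            lengths[-1] += len(line)
    return ids, lengths
```

With a file on disk, the call would look like `ids, lengths = ids_and_lengths(open(path))`, where `path` is a placeholder for the fasta file name.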

Continue Reading Fasta File Python

Processing two lists of files with snakemake

I want to use snakemake to do bowtie2 mapping of split read files to a reference genome, and I’d like that rule to be integrated in the general workflow. For that purpose, I first defined a rule to create a bowtie index rule build_bowtie_index: input: referenceGenomeFasta output: expand(“{name}.{index}.bt2”, index=range(1,5), name…

Continue Reading Processing two lists of files with snakemake

Correction to: FASTA/Q data compressors for MapReduce-Hadoop genomics: space and time savings made easy | BMC Bioinformatics

Following publication of the original article [1], the authors identified that the affiliations of Giuseppe Cattaneo and Raffaele Giancarlo were interchanged. The correct affiliations are given below. The correct affiliation of Giuseppe Cattaneo is: 2Dipartimento di Informatica, Università di Salerno, Fisciano, Italy. The correct affiliation of Raffaele Giancarlo is: 3Dipartimento…

Continue Reading Correction to: FASTA/Q data compressors for MapReduce-Hadoop genomics: space and time savings made easy | BMC Bioinformatics

r – Displaying BIC and AICs from MSA nucleotide multisequence fasta file

I am trying to replicate MEGA (models/find best DNA protein models) using R. After reading the R AIC and BIC documentation, I can’t understand how to implement it. How can I implement AIC and BIC without having to specify the number of sequences in the fasta file (in case that…

Continue Reading r – Displaying BIC and AICs from MSA nucleotide multisequence fasta file

java – GATK: HaplotypceCaller IntelPairHmm only detecting 1 thread

I can’t seem to get GATK to recognise the number of available threads. I am running GATK (4.2.4.1) in a conda environment which is part of a nextflow (v20.10.0) pipeline I’m writing. For whatever reason, I cannot get GATK to see there is more than one thread. I’ve tried different…

Continue Reading java – GATK: HaplotypceCaller IntelPairHmm only detecting 1 thread