Category: FASTA

Aligning One Protein Sequence With A Multiple Sequence Alignment

Given one protein sequence and a multiple sequence alignment (MSA) of a set of proteins, I want to align the protein sequence with that MSA without changing the MSA. Do you know any tool that is capable of doing this?…

KOBAS on Galaxy local

Hello, I’m trying to set up Galaxy on my PC. I need to use the tools “KOBAS annotate” and “KOBAS identify”. I need help setting up the field “BLAST protein database”, since I don’t understand how to configure it. I tried to download…

how to map Pacbio CCS fastq

I have a PacBio CCS FASTQ like this that I want to map to a genome, and this is my command and its output. I want to know how to solve it. Is this FASTQ correct? Thanks. It might pay to…

Extracting named fasta sequences according to list with Biopython

Hi all, I’m trying to work out a quick script to extract a set of sequence fasta files from a multifasta and write them all to a new, single fasta file. To elaborate, I’ve got a proteome, and I want…

Code for RStudio with TFBSTools package searching for transcription factors in FASTA format DNA sequence

Hello, I want to write code for RStudio to find which transcription factors are found in my sequences. I have DNA sequences in FASTA format with 40 nucleotides. First I want to…

FASTQ to VCF pipeline question

Hello all, I am new to programming within bioinformatics and, long story short, I’m practicing writing pipeline scripts, starting with the fastq to VCF pipeline. I am basically at the point where I went from fastq to sorted-bam files, and as I went to…

Change names of all sequences in a MSA

I have a multiple sequence alignment where the name of each sequence is as follows: >gi|AF266048.1|taxonid|126164|organism|Macromia splendens|seqid|AF266048.1|description|Macromia splendens small subunit ribosomal RNA gene partial sequence; mitochondrial gene for mitochondrial product. What I’d like is to have >Macromia_splendens. I’m not sure how…
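
One possible way to do this renaming, sketched in Python. It assumes every header follows the pipe-delimited layout shown in the question, with a literal `organism` key followed by the species name; the function names `short_name` and `rename_headers` are made up for this illustration and are not from the original thread.

```python
def short_name(header: str) -> str:
    """Return e.g. 'Macromia_splendens' from a pipe-delimited FASTA header.

    Assumes the header contains the literal field 'organism' followed by the
    species name, as in the example above. Headers without that field are
    returned unchanged (minus the leading '>').
    """
    text = header.strip().lstrip(">")
    fields = text.split("|")
    if "organism" not in fields:
        return text
    idx = fields.index("organism")
    if idx + 1 >= len(fields):
        return text
    return fields[idx + 1].strip().replace(" ", "_")


def rename_headers(lines):
    """Yield the lines of an alignment with each header line shortened.

    Sequence lines (including gap characters) pass through untouched.
    """
    for line in lines:
        if line.startswith(">"):
            yield ">" + short_name(line) + "\n"
        else:
            yield line
```

To rewrite a file, the generator can be fed an open input handle and its output passed to `writelines` on an output handle. Note that two sequences from the same species would end up with identical names, which some downstream tools reject.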

How do I get description in BLAST?

Hi, all. I’m trying to get a description for the gene list using BLAST. I created the original database with the following command and ran BLAST, but my output shows a lot of N/A. $ makeblastdb -in caenorhabditis_elegans.PRJNA13758.WBPS16.protein.fa -out elegansdb -dbtype prot…

How to call variant minimum 3 read coverage to make consensus?

I have a query regarding consensus sequence assembly where reference bases are replaced with variants that have a minimum read depth of 3, using bcftools with the command below: bcftools mpileup -f ref.fasta mapped.bam | bcftools call -c |…

EDGE-pro paired end read input

Hi, I am running EDGE-pro for prokaryotic RNA-seq analysis of differential gene expression (ccb.jhu.edu/software/EDGE-pro/). I have paired-end read data. The manual (ccb.jhu.edu/software/EDGE-pro/MANUAL) states: *MANDATORY FILES: -g genome: fasta file containing bacterial genome. If multiple chromosomes/plasmids exist, they must be combined into one file before running…

How to cluster existing multiple sequence alignments to identify homologous clusters

I have a number of existing multiple sequence nucleotide alignments from closely related taxa (two clades which are sisters), and need to align these alignments for analysis. Some are homologous and some not. I think the best way…

What is bigwig file?

Asked by: Vada Ratke Score: 4.7/5 (25 votes) BigWig is a file format for display of dense, continuous data in a genome browser track, created by conversion from Wiggle (WIG) format. BigWig format is described at the UCSC Genome Bioinformatics web site, and the Broad Institute file format guide provides…

What files (fasta, GTF) do I need for RNA seq analysis

I am very new to programming in general, and I’m trying my best to teach myself R for analyzing RNA-seq data we have. I am using this guide and have gotten to the step where I need to…

Download FASTA sequences for known viral reference genomes

Take a look at this report file for viral genomes. I would only need DNA based viruses, and ones that infect humans You can filter/parse out entries you need from it. Then download the genome sequence using EntrezDirect: $ efetch -db nuccore -id NC_030449.1 -format fasta >NC_030449.1 Unidentified circular ssDNA…

Use of aligners (preferably STAR) for read-barcode matching

My goal is to match a custom single cell type library with unknown barcode locations across reads but known sequences: I have made a ‘genome’ of ~70nt of each ‘read’ (17mX70nt contigs) and have aligned putative barcodes (~70kX32nt) against this. I…

Rsubread FeatureCounts return 0.0% assigned

Using featureCounts in the Rsubread package I am getting 0 annotations. I started from raw sequencing data and the Refseq genome and Refseq Genomic GTF files downloaded from here: www.ncbi.nlm.nih.gov/assembly/GCF_000001635.27/ through the download assembly button on the side. I had the top option to RefSeq for both downloads and chose…

Error in fetching the Refseq using Taxonomic ID

I have been trying to extract the reference sequences for the list of taxonomic IDs I have like: Taxon ID 1438843 1421962 1324283 1422107 So, for 1438843, the reference sequence is NC_000962.3 and I need to download this particular reference sequence…

mapping long-reads to a reference library

Hi, I have long PacBio reads and a reference library containing only repeats. I want to map the long reads to the repeat library using bwa mem; is this command correct? bwa index mmm.pacbio.fastq.gz bwa mem mmm.pacbio.fastq.gz repeat-library.fasta | samtools sort…

Extract sequences from a fasta file with specific nucleotide repetition

I have a fasta file named seqs.fa with multiple sequences, i.e.,
>Seq1
GATAGATATCGAATGATC
>Seq2
GATGATAGATCGATGC
I want to grep/extract only those sequences having ATC repeated exactly 2 times, like in Seq1. How can we use grep/sed or the {} method for…
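
The question asks for grep/sed, but as a point of comparison here is a small Python sketch of the same filter. It assumes "repeated exactly 2 times" means the motif occurs exactly twice anywhere in the sequence (not necessarily in tandem), and that records are already available as (name, sequence) pairs; `with_exact_count` is a name invented for this example.

```python
def with_exact_count(records, motif="ATC", n=2):
    """Yield (name, sequence) pairs whose sequence contains `motif` exactly n times.

    `records` is any iterable of (name, sequence) pairs. Occurrences are
    counted without overlap, which is sufficient for a motif like ATC that
    cannot overlap itself.
    """
    for name, seq in records:
        if seq.upper().count(motif.upper()) == n:
            yield name, seq


# The two records from the question.
records = [
    ("Seq1", "GATAGATATCGAATGATC"),
    ("Seq2", "GATGATAGATCGATGC"),
]

for name, seq in with_exact_count(records):
    print(f">{name}\n{seq}")
```

With the two example records, only Seq1 passes the filter, since Seq2 contains ATC once.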

GATK multiple files run error /usr/bin/bash: gatk: command not found

Hello, I’m using ls *.sorted_markduplicates.bam | parallel --progress --eta -j 3 'gatk BaseRecalibrator -I {} -R ../0.Reference/CH-PICR.fasta -O {.}.recal.bam' to run multiple BAM files. But an error has occurred like this: Computer:jobs running/jobs completed/%of started jobs/Average seconds to complete…

Aspera: Failed to authenticate

I tried to download some fasta files from ENA today with the following code: cat fq.txt |while read id; do ascp -QT -l 300m -P33001 -k 1 -v -i /home/tomas/.aspera/connect/etc/asperaweb_id_dsa.openssh era-fasp@${id} . ; done One example line in my fq.txt: fasp.sra.ebi.ac.uk:/vol1/fastq/ERR249/001/ERR2497991/ERR2497991_1.fastq.gz But it comes with ascp:…

adapter trimming using trimmomatic

Hi All, I ran FastQC on my ChIP-seq dataset and a red flag is raised for overrepresented sequences. 29% is this sequence: GATCGGAAGAGCACACGTCTGAACTCCAGTCACACA (possible source: TruSeq adapter). I ran Trimmomatic for both TruSeq2 and TruSeq3, but neither seems to trim anything. Any suggestions? Thanks, Ritu…

Please advise a tutorial/course on genetic data analysis

Hello everyone! I’m by no means a bioinformaticist, but would like to learn some art (my background is chemistry/computer science/machine learning, I do ML-supported drug design). I would like to analyse human genetic data. Specifically, the task is as follows: given…

Bioinformatics Analyst – New York

POSITION RESPONSIBILITIES: The person will:
- Utilize existing pipelines to process and analyze high-throughput sequencing data, including bisulfite sequencing data.
- Manage and organize all bioinformatics sequencing data in the lab, including papillomavirus sequence, microbiome data (both 16S and other), and human genomic data.
- Construct phylogenetic trees.
The individual will be responsible for downloading large Fastq, BAM…

How to merge exons based on gene ids?

Since this is still unanswered, I’ll post an R solution until someone is kind enough to answer with the requested biopython solution. Read the fasta file in. library("Biostrings") library("tidyverse") fasta <- readDNAStringSet("test.fasta") You’ll end up with a Biostrings object of the fasta file. > fasta DNAStringSet object of length 7:…

randomreads.sh adding abundances for metagenomic like distribution

Hi, I have 9 genomes, I would like to produce a metagenome like distribution using randomreads.sh. I concatenated genome fasta files in one reference file. Then, ran as below. ../bbmap/randomreads.sh ref=simplified_catgenome.fasta out1=20M.read1.fastq out2=20M.read2.fastq length=125 paired=t metagenome=t genome=9 reads=20000000 However, I would like…

VCF to fasta incorporating heterozygous sites

Hello, I am trying to generate a consensus fasta file for one sample from an unphased VCF. I have been using bcftools consensus, which works well, but I am running into problems with treating the heterozygous sites. I am not able to adequately…

sam2tsv listing incorrect reference sequence & positions

Duplicate of: github.com/lindenb/jvarkit/issues/190 Hi, can anyone help me resolve the issue I’m having with sam2tsv? It is a nifty piece of software, but I have been encountering issues with the numbering of nucleotides it shows for the reference sequence. Here’s what sam2tsv tells me: The nucleotide string marked…

Mean Length Of Fasta Sequences

Erlang special golfing 213 chars version:

-module(g).
-export([s/0]).
s()->open_port({fd,0,1},[in,binary,{line,256}]),r(0,0),halt().
r(C,L)->receive{_,{_,{_,<<$>:8,_/binary>>}}}->r(C+1,L);{_,{_,{_,Line}}}->r(C,L+size(Line));_->io:format("~p~n",[L/C])end.

Readable but reliable version:

-module(g).
-export([s/0]).
s()->
    P = open_port({fd, 0, 1}, [in, binary, {line, 256}]),
    r(P, 0, 0),
    halt().
r(P, C, L) ->
    receive
        {P, {data, {eol, <<$>:8, _/binary>>}}} -> r(P, C+1, L);
        {P, {data, {eol, Line}}} -> r(P,…
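
For readers who do not use Erlang, the same idea (count header lines, sum the lengths of all other lines, divide) can be sketched in Python. This is an illustrative equivalent, not part of the original answer; `mean_length` is a name chosen here.

```python
def mean_length(lines):
    """Return the mean sequence length of FASTA data given as an iterable of lines.

    Header lines (starting with '>') are counted as records; every other line
    contributes its length, excluding surrounding whitespace, to the total.
    """
    records = 0
    total = 0
    for line in lines:
        if line.startswith(">"):
            records += 1
        else:
            total += len(line.strip())
    if records == 0:
        raise ValueError("no FASTA records found")
    return total / records
```

Because it only iterates over lines, it can be called on an open file handle or on `sys.stdin` without reading the whole file into memory.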

BLAST Results, location on genome

Hi, I’m visualizing a genome, and I have to be able to get from a BLAST search result to the location of the hit. So I’m running a BLAST instance locally, and I have already generated a BLAST db. Searching against this db works, but…

NGSeq/DHPGIndex: This tool is for compressing and indexing pan-genomes and genome sequence collections for scalable sequence and read alignment purposes.

General: This tool is for compressing and indexing pan-genomes and genome sequence collections for scalable sequence and read alignment purposes. The pipeline can be deployed in a cloud computing environment or on a dedicated computing cluster. The tool extends the CHIC aligner gitlab.com/dvalenzu/CHIC with distributed and scalable features. DHPGIndex has been tested…

Error with indexing! STAR unable to access genome file SOS

I keep receiving the error: EXITING because of INPUT ERROR: could not open genomeFastaFile: – Does anyone know why this could be? The code I run: STAR --runMode genomeGenerate --genomeDir /scratch/e51/trial2/align/sequence/STARindex --genomeFastaFiles /scratch/e51/trial2/align/sequence/genome/GRCm39.ens.fa --runThreadN 12 I provide the full…

error in metagenemark

I have installed MetaGeneMark on linux64, and I have run cp gm_key ~/.gm_key. It still gives an error when I run the shell script, and searching online I did not find a similar situation. The shell code is: gmhmmp -m MetaGeneMark_v1.mod -a -k -f G -o gene.gff ./04_metaSPAdes/contigs.fasta…

How to assemble read with a minimum 2 coverage per site

Hi, I have a query regarding read assembly. I have a BAM file and I made a consensus sequence, but I want to make a consensus sequence with a minimum coverage of 2 per site instead of full coverage…

I’m running hhblits and ran into segmentation fault error. I’m not quite sure what is causing the issue. Can someone help me?

I’m running hhblits and ran into a segmentation fault error. I’m not quite sure what is causing the issue. Can someone help me? sh hhblits -i input.fasta -o…

A Tool for Rapid Sequence Comparison

MinHash Sketch is a method of rapidly comparing large strings or sets. In genomics, you can use it like this: 1) Gather all the kmers in a genome. 2) Apply a hash function to them. 3) Keep the 10000 smallest hashcodes and call this set a “sketch”. If you do…
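
The three steps above can be sketched in Python as follows. This is a minimal illustration, not the implementation of any particular tool: the hash function (BLAKE2b truncated to 64 bits), the default k-mer length of 21, and the function names are all assumptions made for the example, and it ignores refinements such as treating a k-mer and its reverse complement as the same item.

```python
import hashlib


def kmers(seq, k):
    """Step 1: yield every k-mer (substring of length k) of the sequence."""
    for i in range(len(seq) - k + 1):
        yield seq[i:i + k]


def hash_kmer(kmer):
    """Step 2: map a k-mer to a 64-bit integer hashcode (hash choice is arbitrary here)."""
    digest = hashlib.blake2b(kmer.encode("ascii"), digest_size=8).digest()
    return int.from_bytes(digest, "big")


def sketch(seq, k=21, size=10000):
    """Step 3: keep the `size` smallest distinct hashcodes as the sketch."""
    hashes = {hash_kmer(km) for km in kmers(seq.upper(), k)}
    return sorted(hashes)[:size]
```

Two genomes sketched with the same `k`, `size`, and hash function can then be compared by looking at how many hashcodes their sketches share, which is far cheaper than comparing the full sequences.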

Twist Bioscience Staff Bioinformatics Engineer, Biopharma

Twist Biopharma is seeking a Bioinformatics Engineer to develop and integrate workflows, analyses, and computational tools involved in the production and research of antibodies and proteins. While you have a broad interest in biotech and related scientific technologies, you also understand that computer science resources must be utilized to reach…

Help with TMHMM

Hi people! I’m trying to install TMHMM on my computer (Ubuntu 18.04), but I get this error when I run tmhmm: (base) iim-unah@iimunah-Precision-7820-Tower:~/tools/tmhmm/bin$ ./tmhmm seq.fasta syntax error at ./tmhmm line 13, near “$opt_basedir:” Unknown regexp modifier “/t” at ./tmhmm line 13, at end of line Unknown…

How to quantify piRNAs ?

Hi, I’m trying to create a piRNA count table from samples enriched with small RNAs (~31 bases) using the piRBase reference. Despite all my reading, I haven’t found a simple way to do that. In the first place I tried to quantify piRNAs with…

kallisto genomebam not showing reads on igv

Hello! I am trying to produce BAM files to load into IGV after kallisto quant with the --genomebam option. After producing the pseudoalignment BAM and loading it into IGV, it is empty. This is my initial command: kallisto quant -i Homo_sapiens.GRCh38.cdna.all.release-100.idx -o pseudo -t 10 --genomebam -g Homo_sapiens.GRCh38.100.gtf -c hg38.chrom.sizes R1.fastq.gz.trim_1.fq.gz…

command not found, in IMPUTE2

Edit June 7, 2020: The code below is for phased imputation using the output of SHAPEIT2 and ultimate production of phased VCFs. For the initial pre-phasing process with SHAPEIT2, see my answer here: Phasing with SHAPEIT So, the steps are usually: pre-phasing into pre-existing haplotypes available from HERE ( Phasing…

Is it possible to use a loop to get the EMBOSS Merger function to work on multiple FASTA files?

Hello all, Previously, I posted about a question in a similar vein (see here) BUT now, 2 weeks later, I think I am nearly there! I plan to update that previous post and explain what I’ve done once I’ve tackled this final bit. (TL;DR my other question: I used the…

Getting the same alignment from needle (emboss) and ggsearch36

I used to run ggsearch36 from the fasta package for NW alignment of proteins and now need to switch to needle from emboss, as ggsearch36 skips low-scoring alignments, whereas needle can return all. I’m looking for the correct configuration to…

Cannot find reasonable band width. Continue anyway

I’m running a bioinformatics tool called LTR_Finder. I ran the command below: ltr_finder $DATA/$organis_name.fna -s $tRNA/${organis_name}_trna.fasta -a $LTR_Finder/ps_scan -w2 -E -C > $ltr_table/$organis_name.table where $organis_name changes from one genome to another. In some genomes I don’t get this message…

Index of /examples/archive/bioinfo/samtools

Samtools/BCFtools/HTSlib: Introduction and Notes

Samtools is a suite of programs for interacting with high-throughput sequencing data. It consists of three separate repositories:
- Samtools: reading/writing/editing/indexing/viewing SAM/BAM/CRAM format
- BCFtools: reading/writing BCF2/VCF/gVCF files and calling/filtering/summarising SNP and short indel sequence variants
- HTSlib: a C library for reading/writing high-throughput sequencing data…

Pfam Superfamilies/Clans Alignments

It is not clear what you want, because Pfam has families and clans but not superfamilies. I don’t know how to download a clan alignment, or even if it is possible, because individual Pfam families that make up a clan are often very different in terms of length…

Ensembl vep. How to filter population frequency less than 1%?

Hi everyone, I have gotten many answers from this site even though I never asked a question; this is my first query. I am working with Ensembl VEP to annotate and filter a VCF file. With this script I…

Using PoolSNP to return non-SNP genotypes

I know that PoolSNP is optimized for variant calling, but surely there’s some way to get it to return a VCF with allele frequency counts at all sites rather than just the SNPs, as can be done with GATK. Is there something I…

finding specific protein for bacteria.

Hi, can anyone please suggest a tool to find the specific or unique protein for a particular bacterium? I downloaded the identical proteins from NCBI and got 1200 protein sequences in a FASTA file. Still, I want to shortlist or reduce the number of…

Split multi-fasta file and keep structure

Hey everyone, I have a multi-fasta file, and when I want to split it into individual fasta files, I use a command like this: cat myfile | awk '{ if (substr($0, 1, 1)==">") {filename=(substr($0,2) ".fna")} print $0 > filename }' However, each individual fasta…

tmhmm installation and running

Hi, I am trying to predict transmembrane proteins using tmhmm-2.0c. I have changed the path of perl as follows: which perl /usr/bin/perl perl -v This is perl 5, version 16, subversion 3 (v5.16.3) built for x86_64-linux-thread-multi (with 44 registered patches, see perl -V for…

Index of /~psgendb/birchhomedir/GenBank/workspace/reads.spadescorr.k99/K99

Name                      Last modified     Size  Description
Parent Directory                            –
assembly_graph.fastg      2021-06-17 16:25  156M
assembly_graph_with_..>   2021-06-17 16:25  78M
before_rr.fasta           2021-06-17 16:24  78M
configs/                  2021-06-17 13:22  –
final.lib_data            2021-06-17 16:25  12K
final_contigs.fasta       2021-06-17 16:25  77M
final_contigs.paths       2021-06-17 16:25  3.3M
intermediate_contigs..>   2021-06-17 16:15  78M
…

Protein filtering for annotation

Hi, I downloaded 1009 proteins from GenBank. After the filtering below I end up with 663 amino acid sequences: $ grep ">" NbenthamianaGenbankAA.fasta | grep -v partial | grep -v like | grep -v unnamed | wc -l 663 However, I noticed many identical descriptions…

Rewriting perl code using bioperl

Code:

#!/usr/bin/perl
use strict;
use warnings;
die "Usage: $0 <fasta_file> <coord_file>\n" unless @ARGV > 0;
my ($fasta_file, $coord_file) = @ARGV;
open FASTA, "<" . $fasta_file;
my $seq_id;
my $dna_string =…

RNAmmer running

Dear, I am Kishor from Shanghai. Recently I have been trying to use RNAmmer, but have yet to run it successfully. I have made two changes in the rnammer script according to the instructions, as follows: my $INSTALL_PATH = "/mnt/genome3/Lab_Users/Kishor/DISK_2/softwares/RNAmmer" for linux HMMSEARCHBINARY="/mnt/genome3/LabUsers/Kishor/DISK2/softwares/hmmer2/hmmer-2.3.2/src/" $PERL = "/usr/bin/perl" I also changed…

Why Does The Chr1.Fa Fasta File Have A Bunch Of Ns And Why Is Some Of The Dna In Lower Case Vs. The Rest In Upper Case?

Hi, I have a couple of questions about the chr1.fa FASTA file at the link below: Q1) Why does the beginning of…

How To Extract A Sequence From A Big (6Gb) Multifasta File ?

I want to extract some sequences by ID from a multifasta file. Using Perl is not possible because it gave an error when indexing the database. Maybe because of its size? Is there any way to this…

Segmentation fault (core dumped) during bwa mem mapping

Hi, I ran bwa mem with trimmed fastq files (ERR2593198) but I saw following error: bwa mem CHO-PICR.fasta ../2.ngsShort/trimmed_ERR2593198_1.fastq ../2.ngsShort/trimmed_ERR2593198_2.fastq [M::bwa_idx_load_from_disk] read 0 ALT contigs @PG ID:bwa PN:bwa VN:0.7.17-r1188 CL:../downloads/bwa-0.7.17/bwa mem CHO-PICR.fasta ../2.ngsShort/trimmed_ERR2593198_1.fastq ../2.ngsShort/trimmed_ERR2593198_2.fastq [M::process] read 92156 sequences (10000179 bp)… Segmentation fault (core dumped) To figure out what’s happening, I…

Can exons from different reading frames coexist in a peptide sequence?

Hello everyone, I am building a program to predict peptide sequences from DNA data. I am still in an early phase, and while running a test I ran into the following problem: 1. I downloaded from NCBI a gene sequence along with its ‘computationally predicted’ corresponding protein sequence….

Exclude specified range of bases from multiple sequences in a FASTA file

Hi, I am trying to eliminate a range of bases from sequences within a FASTA file in multiple places, based on the header ID and positions that I specify. For example, I have a file, A.fa: >ID1 TTGTTCAACGGATCCACCTGTTGCCAAGAGTGCTTCAGTACATTGCTCACGGCTGAATCCCATATCCATCAAAGCACAAGATTTGAATTCACTCGAGGATCTGCTTCGTCGACCATTGGAAATGAAAAAATTACAATTACACATTGAATTTGTAAAGCTTGAAATTAATGAACTTACCAAAATAGATTTGCACACAGAAGCAACAGCTTGGCCGTGTTACAACTTGTAACGGGTAAAGACAAAATCGCTAACAACGGTTGTAGGCCACCATGTTCCACAAATTCACGACA…

How To Add Specific Word To Fasta Header

I have more than 5000 FASTA sequences in a file and want to add a word, for instance phosphate, to the header of every sequence. Please tell me a Perl solution for that.…

blast protein alignment

Posted 28 Sep at 20:44 in Uncategorized. BLAST applied the standard genetic code for Query, translating GTG into valine (V). BLAST is a set of algorithms that attempt to find a short fragment of a query sequence that aligns perfectly with a fragment of…

Regulation of prefrontal patterning and connectivity by retinoic acid

Data reporting: No statistical methods were used to predetermine sample size. Data collection was performed by independent investigators. Prior to data analysis, all experiments were randomized and analysed by independent blinded observers. Analysis of human and macaque transcriptomic data: Developing human and macaque brain RNA-seq data (counts file) with the…

Correct Way To Parse A Fasta File In Python

Hi, I have been wondering about the correct approach in Python, maybe using Biopython, for parsing a fasta file without having to place it in memory (e.g. NOT having to read it into a list, dictionary or fasta class) before…
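
One way to avoid holding the file in memory is a generator that yields one record at a time. The sketch below uses only the standard library and is an illustration rather than the answer given in the thread; `iter_fasta` is a name chosen for this example. (Biopython's `SeqIO.parse` also returns an iterator, so it offers similar streaming behaviour if Biopython is available.)

```python
def iter_fasta(handle):
    """Yield (header, sequence) pairs from an open FASTA file, one record at a time.

    Only the record currently being read is held in memory, so the file is
    never loaded into a list or dictionary as a whole.
    """
    header = None
    chunks = []
    for line in handle:
        line = line.strip()
        if not line:
            continue  # skip blank lines
        if line.startswith(">"):
            if header is not None:
                yield header, "".join(chunks)
            header = line[1:]
            chunks = []
        elif header is not None:
            chunks.append(line)
    if header is not None:
        yield header, "".join(chunks)
```

A typical use would be `with open(path) as fh:` followed by `for header, seq in iter_fasta(fh): ...`, processing or writing out each record before the next one is read.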

How to make BLASTN be aware of short read?

I’m using blastn (anaconda.org/bioconda/blast) to find similar sequences of a target sequence against a FASTA file. But my read is quite short (68 bases). I realised that blastn won’t report any hit. But there is actually a very good one in the FASTA file after checking manually. Here is the…

Protein sequence to Nucleotide sequence

Hello all, I have file1 with a protein sequence and another file with its respective nucleotide codon sequence. Is there any one-liner which looks up each single-letter amino acid in file2, changes the protein sequence to the nucleotide sequence, and saves it as…

A pairwise genome alignment pipeline using LAST and Nextflow

I have prepared a pairwise genome alignment pipeline for Nextflow and plan to submit it to nf-core as a standalone pipeline or a subworkflow. Using tools from the LAST suite, it takes genomes in FASTA format, trains itself to find…

How are two alleles typically represented in a whole genome sequence?

I apologize in advance if this is a silly question – but I am trying to understand how two inherited variants of a gene are represented in typical whole genome sequencing formats (VCF, FASTA/Q). Here is one example…

Produce PCA bi-plot for 1000 Genomes Phase III in VCF format (old)

NB – Update July 29, 2020 – this thread will no longer be watched and, for all intents and purposes, will now be archived NB – Version 2 of tutorial can be found here and should be used going forward –> Produce PCA bi-plot for 1000 Genomes Phase III –…

What tool to take a FASTA sequence and output the amino acid chain?

If I have a sequence contained in a FASTA file, is there a way to display the corresponding chain of amino acids? If the FASTA file contains a genome I may need to first convert to…

install biopython jupyter

like BLAST, ClustalW, FASTA, GenBank, PubMed, ExPASy, SwissProt, and many more. conda install -c conda-forge ipyleaflet With pip: pip install ipyleaflet If you are using the classic Jupyter Notebook >>> import Bio >>> Bio.__version__. For Windows we provide click-and-run installers. The easiest way to install statsmodels is to install it as…

biopython write fasta

Step 1 − Create a file named blast_example.fasta in the Biopython directory and give the below sequence information as input. 3. """Bio.SeqIO support for the "fasta" (aka FastA or Pearson) file format. Then we save this line of text to the output file: Now we have finished all the genes,…

Continue Reading biopython write fasta

genbank submission tutorial

These are just a few of the questions answered in this comprehensive overview of Bayesian approaches to phylogenetics. Finally, make sure the Include Primers box is unchecked, as we are not submitting primers with this sequence. Influenza virus sequences. September, 2008. Please download the current version. Some mitochondrial genomes contain CDS’s that…

Continue Reading genbank submission tutorial

bed2vcf (bedr) package error

bed2vcf (bedr) package error 0 Greetings, I am running bedr package in R to generate vcf file from bed file using reference as one of the arguments. I followed steps below: Sort bed file using bedtools intersect Convert sorted bed file to dataframe using read.table Change datatype for chr positions…

Continue Reading bed2vcf (bedr) package error

Recommended approach for building phylogenetic tree from de novo metagenomes

Recommended approach for building phylogenetic tree from de novo metagenomes 0 I have about 650 metagenome assembled genomes (MAG) that cluster into about 150 unique species level designations from FastANI. For each of the clusters, I have a nearest reference. I want to create a phylogenetic tree of these MAGs….

Continue Reading Recommended approach for building phylogenetic tree from de novo metagenomes

How to use a protein alignment fasta file when using the function cluster of Kmer package

How to use a protein alignment fasta file when using the function cluster of Kmer package 0 Hello everyone, I’m trying to use the function cluster of the kmer package in order to obtain a dendrogram of a large set of protein sequences that are already aligned (fasta). The cluster function requires…

Continue Reading How to use a protein alignment fasta file when using the function cluster of Kmer package

Parsing transcript version in Ensembl mouse annotation

Parsing transcript version in Ensembl mouse annotation 1 Hi all, I aligned some data to an Ensembl transcriptome with novel transcripts. I am trying to lift over the sites from transcriptome to genome, which I have previously done using the R package GenomicRanges. The Ensembl FASTA headers look like this…

Continue Reading Parsing transcript version in Ensembl mouse annotation
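
The excerpt above cuts off before showing the headers, so the following is only a sketch under an assumption: that the first whitespace-delimited token of each header is a transcript ID, optionally followed by `.` and a numeric version. The function name and the example IDs are made up for illustration.

```python
def split_transcript_version(header):
    """Return (stable_id, version) from a FASTA header line.

    version is None when the ID carries no numeric '.N' suffix,
    as may be the case for novel transcripts.
    """
    fields = header.lstrip(">").split()
    if not fields:
        return "", None
    stable_id, dot, version = fields[0].rpartition(".")
    if dot and version.isdigit():
        return stable_id, int(version)
    return fields[0], None


# Illustrative headers only, not taken from the original post.
print(split_transcript_version(">ENSMUST00000000001.4 cdna"))
print(split_transcript_version(">novel_tx_1"))
```

Note that IDs from some assemblers also end in `.N` without that number being a version, so the rule would need adjusting for such names.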

How do you get the result of BLAST like this paper?

Since you’re blasting with ncbi-NR you’re almost there. By changing the blastx output options to outfmt 6 you’re receiving tabular output in avenae.out; you can customise the output columns to also receive the species names. For example, you could run: blastx -query /home/nkarim/avenae/trinity_even_out_dir/Trinity.300.longest.fasta -db /home/nkarim/blast/db/nr -outfmt "6 qseqid sseqid sscinames…

Continue Reading How do you get the result of BLAST like this paper?

Convert coordinates from one cotton genome to another

Convert coordinates from one cotton genome to another 0 Greetings, I have a small multifasta file (target) belonging to a cotton genome. I need to lift over the positions from one cotton genome (target) to another (Reference). The headers of the target fasta file are different than the reference file….

Continue Reading Convert coordinates from one cotton genome to another

Phylogenetic Tree from Massive Multifasta Alignment?

Phylogenetic Tree from Massive Multifasta Alignment? 0 Hi all, I have a very large (~30,000 sequence, each ~17000 bases) multifasta alignment and I am wondering if this is too large to construct a phylogenetic tree? If not, which program would be most appropriate for this use case? Thank you! tree…

Continue Reading Phylogenetic Tree from Massive Multifasta Alignment?

grep command for fasta header

grep command for fasta header 0 I used this command: grep -Fw -A 1 -f header.txt test.fa >test_result.fa But it extracts only one header, not all of the headers that are present in my header.txt file. My header.txt file looks like: hsa_circ_0000006 hsa_circ_0000014 hsa_circ_0000015 hsa_circ_0000042 hsa_circ_0000070 hsa_circ_0000072 hsa_circ_0000131 hsa_circ_0000133 hsa_circ_0000160 hsa_circ_0000175 hsa_circ_0000211…

Continue Reading grep command for fasta header
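
As an alternative to the grep approach in the question above, here is a hedged Python sketch that selects records by ID. It assumes the ID is the first word of each FASTA header and that the ID list has one ID per line; `read_fasta` and `extract_by_id` are illustrative names, not part of any library. IDs are stripped of surrounding whitespace, which also tolerates Windows line endings in the ID list (one possible, but unconfirmed, reason for a partial match with `grep -f`).

```python
def read_fasta(lines):
    """Yield (header, sequence) pairs from an iterable of FASTA lines."""
    header, chunks = None, []
    for line in lines:
        line = line.rstrip()
        if line.startswith(">"):
            if header is not None:
                yield header, "".join(chunks)
            header, chunks = line[1:], []
        elif header is not None and line:
            chunks.append(line)
    if header is not None:
        yield header, "".join(chunks)


def extract_by_id(fasta_lines, wanted_ids):
    """Keep records whose first header word appears in wanted_ids."""
    wanted = {w.strip() for w in wanted_ids if w.strip()}
    return [
        (header, seq)
        for header, seq in read_fasta(fasta_lines)
        if (header.split() or [""])[0] in wanted
    ]


# Tiny in-memory example; with files, pass open file handles instead.
fasta = [">hsa_circ_0000006 demo\n", "ACGT\n", "AC\n", ">other\n", "GG\n"]
for header, seq in extract_by_id(fasta, ["hsa_circ_0000006\r\n"]):
    print(f">{header}\n{seq}")
```

Unlike `grep -A 1`, this keeps the whole sequence even when it is wrapped over several lines.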

genbank database slideshare

Found inside – Page iiThis book describes the historical importance of potato (Solanum tuberosum L.),potato genetic resources and stocks (including S. tuberosum group Phureja DM1-3 516 R44, a unique doubled monoploid homozygous line) used for potato genome … If you continue browsing the site, you agree to the use of…

Continue Reading genbank database slideshare

genbank submission bankit

Submission of sequence data to NCBI archives . Learn more. This post will show you how to… Careers, General: your contact details, authors, publication, data release date, Original or third-party assembly/annotation, Set designation (if applicable) for multiple sequences of the same locus, Nucleotide sequences in FASTA or alignment format, Source…

Continue Reading genbank submission bankit

biopython extract sequence from fasta

My two questions are: What is the simplest way to do this? This unique book shows you how to program with Python, using code examples taken directly from bioinformatics. using python-bloom-filter, just replace the set with seen = BloomFilter(max_elements=10000, error_rate=0.001). This book is suitable for use as a classroom textbook,…

Continue Reading biopython extract sequence from fasta

ncbi genbank submission

This document describes how to use the Severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2) virus command line interface of the GenBank Submission Portal. Found inside – Page 4NCBI builds GenBank primarily from the submission of sequence data from authors and from the bulk submission of expressed sequence tag (EST), genome…

Continue Reading ncbi genbank submission

GREP from multiFasta file and keep headers

GREP from multiFasta file and keep headers 0 Hi there, I am new to coding, consider yourself warned 😀 I have a multifasta file with 3′ UTR sequences of variable length. I would like to extract a 6-mer sequence; AGTCTC with 20 nts upstream and 20 nts downstream (but not…

Continue Reading GREP from multiFasta file and keep headers

Error message running GeMoMa

Error message running GeMoMa 0 I am trying to run GeMoMa for gene annotation using RNA-seq evidence. java -jar -Xms25G -Xmx50G GeMoMa-1.6.1.jar CLI GeMoMaPipeline t=target.fasta s=own a=ref-annotation.gff g=ref.fa outdir=test r=MAPPED ERE.s=FR_UNSTRANDED ERE.m=target-accepted_hits.bam ERE.v=SILENT ERE.c=true tblastn=true Extractor.p=true Extractor.r=true Extractor.s=true Extractor.f=true AnnotationFinalizer.u=YES AnnotationFinalizer.r=NO p=true pc=true pgr=true However, I am getting this error…

Continue Reading Error message running GeMoMa

Cutadapt de-multiplexing does not recognize some barcodes

Hey all, So I am currently dealing with deep-sequencing data of 16S amplicons from multiple variable regions on the 16S gene. I used different sets of primers to amplify different regions and then sequenced them all together, and I am trying to de-multiplex based on those primers using cutadapt and the following…

Continue Reading Cutadapt de-multiplexing does not recognize some barcodes

Problem loading my multifasta

Problem loading my multifasta 0 Hello All, I have a problem loading my multifasta and running rhierbaps. I can see that the fasta.file.name is empty. library(rhierbaps) fasta.file.name <- "clean_hlr.fasta" fasta.file.name <- system.file("extdata", "clean_hlr.fasta", package = "rhierbaps") fasta.file.name [1] "" snp.matrix <- load_fasta(fasta.file.name) Error in load_fasta(fasta.file.name) : Invalid msa or the…

Continue Reading Problem loading my multifasta

Biopython Biopython Statistics & Issues

Issue Title State Comments Created Date Updated Date Closed Date Add .islower() and .isupper() methods to Seq? open 0 2021-09-25 2021-09-18 – Apparently random fluctuations in coverage via CodeCov? open 1 2021-09-23 2021-09-18 – BgzfWriter accepts read-only fileobj open 1 2021-09-23 2021-09-18 – BgzfReader argument `mode` not tested if a…

Continue Reading Biopython Biopython Statistics & Issues

Haplogrep classify error

Haplogrep classify error 0 I am having a strange issue with haplogrep v2.4.0. When I ask it to classify the haplogroup of NCBI Nucleotide record EU558518.1 (www.ncbi.nlm.nih.gov/nuccore/EU558518.1?report=fasta) $haplogrep_path classify --in EU558518.fa --out EU558518 --format fasta It throws the following error: Start Classification… [M::bwa_idx_load_from_disk] read 0 ALT contigs java.lang.NullPointerException at core.Polymorphism.compareTo(Polymorphism.java:526)…

Continue Reading Haplogrep classify error

The BLAST command stops with an error.

The BLAST command stops with an error. 0 Hi all. I would like to evaluate the assembly quality by running BLAST to the fasta file output by Trinity. I ran the command as follows, but this job was stopped with an error message. $ ~/miniconda3/envs/py27/bin/blastx -query /home/nkarim/avenae/trinity_even_out_dir/Trinity.300.longest.fasta -db /home/nkarim/blast/db/nr -outfmt…

Continue Reading The BLAST command stops with an error.

Is there a way to make the BLAST results easier to read?

Is there a way to make the BLAST results easier to read? 1 Hi all. I’m trying to get assembly stat. As part of that, I would like to evaluate the assembly quality by running BLAST to the fasta file output by Trinity. I got the result by blastx, but…

Continue Reading Is there a way to make the BLAST results easier to read?

Python program to find the indexes of Cys in the given multifasta sequences

Python program to find the indexes of Cys in the given multifasta sequences 1 fasta = open('out.fa', 'r+') for line in fasta.read().split('\n'): if line.startswith(">"): header = line print(header) else: indexes = [] for i in range(0, len(line)-1): if line[i] == 'C': indexes.append(i+1) print("Cys : ", indexes) a indexes file given…

Continue Reading Python program to find the indexes of Cys in the given multifasta sequences
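
A possible variant of the approach in the question above, offered as a sketch rather than the poster's solution. It assumes a multi-FASTA in which a sequence may be wrapped over several lines, and reports 1-based positions relative to the whole sequence rather than to each line. It also scans the final residue of each line, which `range(0, len(line)-1)` would skip. The name `cys_positions` is illustrative.

```python
def cys_positions(fasta_lines):
    """Return {header: [1-based positions of 'C']} for each FASTA record."""
    positions = {}
    header, offset = None, 0
    for line in fasta_lines:
        line = line.strip()
        if line.startswith(">"):
            # New record: reset the running offset into the sequence.
            header, offset = line[1:], 0
            positions[header] = []
        elif header is not None:
            for i, residue in enumerate(line, start=1):
                if residue.upper() == "C":
                    positions[header].append(offset + i)
            offset += len(line)
    return positions


# In-memory example; with a file, pass the open handle instead.
example = [">p1", "MCAC", "C", ">p2", "AAA"]
for name, cys in cys_positions(example).items():
    print(name, "Cys :", cys)
```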

Biostrings readDNAStringSet handling of N bases

Hi, I’m using Biostrings readDNAStringSet to read the human genome fasta file : GRCh38.p13.genome.fa (search.genome.fn in the code below). search.genome.set <- Biostrings::readDNAStringSet(search.genome.fn) Then looking at the starts, ends, and widths of the canonical chromosomes (1-22, X, Y, and M): > as.data.frame(search.genome.set@ranges)[1:25,] start end width names 1 1 248956422 248956422 chr1…

Continue Reading Biostrings readDNAStringSet handling of N bases

Tools To Calculate Average Coverage For A Bam File?

Tools To Calculate Average Coverage For A Bam File? 12 I would like to get the average coverage of all the captured bases in a bam file. What would be the best way to do this? What I am looking is a simple one number like 40X. Given that there…

Continue Reading Tools To Calculate Average Coverage For A Bam File?

weighted mean pairwise distance among samples containing multiple sequences?

Weighted pairwise genetic distance between samples from fasta file and count table of sequences? I have a series of samples, all containing multiple DNA sequences. I’m looking to calculate the mean pairwise genetic distance between samples. I’ve figured out a way to do this in R using dist.dna…

Continue Reading weighted mean pairwise distance among samples containing multiple sequences?

How do I get FASTA if i have a protein ID (in 10000’s) ?

How do I get FASTA if i have a protein ID (in 10000’s) ? 3 Hi, I have more than 10,000 protein IDs, and I’m interested in extracting the fasta sequences of all these protein IDs from UniProt. What I have done so far: I have already downloaded all the fasta sequences of…

Continue Reading How do I get FASTA if i have a protein ID (in 10000’s) ?

align_and_estimate_abundance error Trinity

align_and_estimate_abundance error Trinity 0 Hello, I am trying to prepare a reference for alignment and abundance estimation. I have taken the transcriptome fasta file; do I need to use the genomic fasta file or a gtf file? I don’t understand this point. Please guide me. I am using this code perl /cabinfs/opt/applications/trinity/trinityrnaseq-Trinity-v2.5.1/util/align_and_estimate_abundance.pl…

Continue Reading align_and_estimate_abundance error Trinity

Makeblastdb error: file does not exist.

Makeblastdb error: file does not exist. 0 I just downloaded the BLAST command line applications and I would like to make a database of MGE nucleotide sequences that I downloaded from the ACLAME database site as a fasta file. It is labeled as mydb.fasta and I have put the file into a…

Continue Reading Makeblastdb error: file does not exist.