Tag: SED

Exploring structure, microbiota, and metagenome functions of epigean and hypogean black deposits by microscopic, molecular and bioinformatic approaches

Spiro, T. G., Bargar, J. R., Sposito, G. & Tebo, B. M. Bacteriogenic manganese oxides. Acc. Chem. Res. 43, 2–9. doi.org/10.1021/ar800232a (2010).
Roitz, J. S., Flegal, A. R. & Bruland, K. W. The biogeochemical cycling of manganese in San Francisco Bay: Temporal and spatial variations…

use pre-configure, not post-patch, hooks to configure

[Buildroot] [PATCH] package/mbedtls: use pre-configure, not post-patch, hooks to configure
From: Yann E. MORIN @ 2022-08-28 19:54 UTC
To: buildroot; +Cc: Yann…
0 siblings, 0 replies; only message in thread

Bash script to automate htseq-count

Hi everyone, I am trying to write a script to automate htseq-count on a large number of samples. The script runs but then throws the following error: "Please provide 2 arguments". Does anyone see something obvious I am missing?

    #!/bin/bash
    for samples in *.sam
    do
    gtf = "Galaxy135-\[Escherichia_coli_str_k_12_substr_mg1655.GCA_000005845.2.29.gtf\].gtf"
    echo $sample …
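
Two shell details in the script above are the likely cause of the error: `gtf = "…"` with spaces around `=` runs a command called `gtf` instead of assigning a variable, and the loop variable is named `samples` while the body reads `$sample`. The following is a minimal corrected sketch, not the poster's final script. The function name `count_all_sams`, the `HTSEQ` override variable, and the `.counts.txt` output suffix are assumptions made for illustration.

```shell
#!/bin/bash
# Sketch: run htseq-count once per SAM file in the current directory.
# HTSEQ is a hypothetical override so the loop can be dry-run (e.g. HTSEQ=echo).
HTSEQ="${HTSEQ:-htseq-count}"

count_all_sams() {
    # Assignment without spaces around "="; the GTF path comes from the caller.
    local gtf="$1"
    local sample
    for sample in *.sam; do
        # Skip the unexpanded pattern when no .sam files are present.
        [ -e "$sample" ] || continue
        # htseq-count takes the alignment file first, then the annotation file.
        "$HTSEQ" "$sample" "$gtf" > "${sample%.sam}.counts.txt"
    done
}
```

Calling `count_all_sams annotation.gtf` (with a real GTF path) writes one counts file per sample. Passing the GTF as a quoted argument also sidesteps the brackets in the original file name.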

Open Rank Bioinformatics Software Engineer I and Bioinformatics Engineer II in Baltimore, MD for University of Maryland, Baltimore

The Institute for Genome Sciences (IGS), Informatics Resource Center, is recruiting for an Open Rank Bioinformatics Software Engineer I and Bioinformatics Software Engineer II position. This position will be filled as either the Bioinformatics Software Engineer I or II. IGS…

Recent questions tagged fasta – Q&A

How to remove everything before a specific character in a column

I have a big tsv file with 4 columns in the following format:

ERR435678 contig_1 /home/results/file1.txt /home/results/file1.txt
ERR435678 contig_2 /home/results/file2.txt /home/results/file2.txt
ERR435678 contig_3 /home/results/file3.txt /home/results/file3.txt

How can I manipulate only the elements of the third column in a…
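
The excerpt is cut off before it names the character, so the following is only a sketch under an assumption: that the goal is to keep what follows the last `/` in the third column and leave the other columns untouched. The function name `strip_col3_prefix` is hypothetical.

```shell
# Sketch: delete everything up to and including the last "/" in column 3
# of a tab-separated file; all other columns are passed through unchanged.
strip_col3_prefix() {
    awk -F'\t' -v OFS='\t' '{ sub(/.*\//, "", $3); print }' "$@"
}
```

Used as `strip_col3_prefix input.tsv > output.tsv`. For a different character, the `/` inside the regular expression would be replaced accordingly.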

Reproduce incorrect goa generation of MapOf in OpenAPI body

goa version 3.7.2 Design The field name anyobject with Attribute MapOf(String, Any) should describe a map of string to any unknown value. For HTTP requests, the field anyobject can be sent like the following (i.e. any json object): { “anyobject”: { “testkey”: 10, “test1”: “testval”, “myarr”: [ { “nestedk”: {…

Option To Exclude Unmapped Reads From SAM

Currently, unmapped reads are included in the SAM file. I have a scenario where 99% of the reads won’t map to the reference sequences used (i.e. mapping only to a gene family). This creates unnecessarily large files which need to be filtered to reduce their size (e.g. sed). It’d be…

Running rstudio (but not R alone) inside of a conda environment complains about finding Rccp.so

I find that when I try to load some packages in a conda environment, when using RStudio (but not when I’m using R directly), I get an error message about a missing Rcpp.so file. I activate my conda environment (which is running R version 4.1), open RStudio (which I installed…

How can I separate 3 different pieces of information in a column?

For example, the column I have contains a line written Ser25Phe, and I want to split the column named HGVS.Consequence as Ser 25 Phe.
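
The original question is tagged R and gsub; the same split can be sketched with sed in the shell. This assumes every value has the shape letters, digits, letters (as in Ser25Phe). The function name `split_hgvs_protein` is hypothetical.

```shell
# Sketch: insert spaces between the reference residue, the position and the
# alternate residue, e.g. "Ser25Phe" becomes "Ser 25 Phe".
split_hgvs_protein() {
    sed -E 's/([A-Za-z]+)([0-9]+)([A-Za-z]+)/\1 \2 \3/'
}
```

It reads from standard input, so a single column can be piped through it, for example `cut -f3 variants.tsv | split_hgvs_protein` (the file name is a placeholder).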

NCS v1.8.0: OpenThread: nrf-config.h: No such file or directory – Nordic Q&A – Nordic DevZone

I have a project using NCS v1.8.0 where I enabled openthread. For some reason, when trying to build `mbedtls/library/ecjpake.c`, the make command uses `-DMBEDTLS_CONFIG_FILE=”nrf-config.h”` and fails on: > nrf-config.h: No such file or directory I’m not sure why it tries to use this file because `MBEDTLS_CFG_FILE` is set to `config-tls-generic.h`….

Human Caspase 14 (CASP14) CLIA Kit, Cat#EKU09426

Caspase 14 Apoptosis-Related Cysteine Peptidase; Cysteinyl Aspartate Specific Proteinases 14
Intra-Assay: CV
Detection Method: Double-antibody Sandwich
Assay Time: 2.6 hours
Assay Type: Double-antibody Sandwich
Shipping Condition: Ice packs
Storage: Short term: 4°C; Long term: see manual.
Precaution of Use: The Stop Solution is acidic. Do not allow to contact skin…

Exercise Reduces H3K9me3 and Regulates Brain Derived Neurotrophic Factor and GABRA2 in an Age Dependent Manner

FIGURE 6 H3K9me3 repression of BDNF and GABA receptors decreases with age. Comparison of ChIP% input data from young and aged mice. (A) H3K9me3 levels at the BDNF 1 promoter were significantly greater in young…

Regions File Format – ANGSD-wrapper/angsd-wrapper Wiki

ANGSD-wrapper prefers the regions file to be formatted as chr_name:start_position-end_position. Below, we will create a toy BED file as an example and show how we can go from BED file format to ANGSD-wrapper’s regions file format. Create toy BED file Let’s create an example BED file. You can run the…
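
The conversion described above can be sketched with awk. This is an illustration, not the wiki's own command, which is cut off in the excerpt. It assumes a tab-separated BED file and copies the start and end values unchanged; BED starts are 0-based, so whether an offset is needed should be checked against the wiki. The function name `bed_to_regions` is hypothetical.

```shell
# Sketch: turn BED lines (chrom, start, end, ...) into chr_name:start-end,
# skipping comment, track and browser lines.
bed_to_regions() {
    awk -F'\t' '!/^(#|track|browser)/ { print $1 ":" $2 "-" $3 }' "$@"
}
```

Used as `bed_to_regions toy.bed > regions.txt`, where both file names are placeholders.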

[Buildroot] [PATCH 1/2] package/mbedtls3: new package

[Buildroot] [PATCH 1/2] package/mbedtls3: new package
2021-12-28 15:33 Fabrice Fontaine: [PATCH 1/2] package/mbedtls3: new package
2021-12-28 15:33 Fabrice Fontaine: [PATCH 2/2] package/hiawatha: needs mbedtls3
2021-12-28 15:49 Thomas Petazzoni: [PATCH 1/2] package/mbedtls3: new package
0 siblings, 2 replies; 5+ messages in thread From: Fabrice…

LAMMPS failed to install with Intel compiler due to cpio and diffutils

(base) bash-4.2$ spack install lammps%intel +asphere +class2 +kspace +manybody +misc +molecule +mpiio +opt +replica +rigid +user-omp +user-intel ^intel-mkl ^intel-mpi
[+] /opt/intel/cmake (external cmake-3.20.0-3teotjsa6webcsazofydfkk2v3pn2hb6)
[+] /nfs/pdx/home/sdouyeb/spack/opt/spack/linux-centos7-skylake_avx512/intel-19.1.3.304/alsa-lib-1.2.3.2-yctbppr2tcdayclbt32xjgagmmwbct2v
[+] /nfs/pdx/home/sdouyeb/spack/opt/spack/linux-centos7-skylake_avx512/intel-19.1.3.304/libiconv-1.16-ilxedtsoqggtmtbrjxehh6ojznmtdni3
[+] /nfs/pdx/home/sdouyeb/spack/opt/spack/linux-centos7-skylake_avx512/intel-19.1.3.304/yasm-1.3.0-7y4fvxshqkmikii3cvelccaiajj4dboc
[+] /nfs/pdx/home/sdouyeb/spack/opt/spack/linux-centos7-skylake_avx512/intel-19.1.3.304/zlib-1.2.11-k24vg36ubsrvwfgrdipndmpqn4eo5jq7
==>…

phylogenetics – Remove variable sequence component within a tree text file

I have a gene tree file of 436 orthologue genes from 6 species. I want to remove unwanted extensions, as the tree looks messy after visualization. My file looks like: (TRINITY_Clupea_DN5452_c0_g1_i1.p1:0.0824467436,TRINITY_Engraulis_DN43599_c0_g1_i1.p1:0.1634781085)100:0.0876433106,TRINITY_Sardina_DN15766_c0_g1_i2.p1:0.0164132018)……………… What I need: (Clupea_DN5452:0.0824467436,Engraulis_DN43599:0.1634781085)100:0.0876433106,Sardina_DN15766:0.0164132018)……………… As “TRINITY” is identical, I can remove it using sed. But after the species name ids…
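
One way to drop both the fixed prefix and the variable suffix in a single pass is a sed capture group. This is a sketch that assumes every label follows the pattern shown in the question (`TRINITY_<species>_DN<number>_c<n>_g<n>_i<n>.p<n>`); the function name `simplify_trinity_ids` is hypothetical.

```shell
# Sketch: reduce labels such as TRINITY_Clupea_DN5452_c0_g1_i1.p1
# to Clupea_DN5452, leaving branch lengths and support values untouched.
simplify_trinity_ids() {
    sed -E 's/TRINITY_([A-Za-z]+_DN[0-9]+)_c[0-9]+_g[0-9]+_i[0-9]+\.p[0-9]+/\1/g' "$@"
}
```

Used as `simplify_trinity_ids genes.tre > genes.clean.tre` (placeholder file names). Labels that do not match the assumed pattern are left unchanged, which makes mismatches easy to spot afterwards.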

main-armv7-default][science/cp2k-data] Failed for cp2k-data-7.1.0 in stage

You are receiving this mail as a port that you maintain is failing to build on the FreeBSD package build server. Please investigate the failure and submit a PR to fix build. Maintainer: y…@freebsd.org Log URL: beefy12.nyi.freebsd.org/data/main-armv7-default/p772274a15b8b_s0630a06b2a/logs/cp2k-data-7.1.0.log Build URL: beefy12.nyi.freebsd.org/build.html?mastername=main-armv7-default&build=p772274a15b8b_s0630a06b2a Log: =>> Building science/cp2k-data build started at Sun Dec 19…

snakemake truncating shell codes

I’m trying to change the chromosome number notation from [0-9XY] to Chr[0-9XY] using samtools reheader in the shell command of a snakemake rule. rule rename: input: os.path.join(config["input"], "{sample}.bam"), output: os.path.join(config["output"], "new_sample/{sample}_chr.bam") log: os.path.join(config["log"], "samtools/{sample}") shell: "samtools view -H {input} | sed -e 's/SN:([0-9XY]*)/SN:chr1/' -e 's/SN:MT/SN:chrM/'…

Issue with installing QIIME2 2021.11 on Windows 10 – Technical Support

Hi QIIME support team, I’m attempting to install QIIME2 on my Windows 10 machine. I installed Anaconda3, then set up conda to run in Git Bash: echo “. ${PWD}/conda.sh” >> ~/.bashrc Once I restarted Git Bash and activated Conda, I installed python-wget because installation of wget kept getting the following…

bash script not a valid identifier

I am trying to run a bash script, but it gives this error (`$fastq': not a valid identifier).

    #!/bin/bash
    database="kraken2_database"
    fastq="fastq_dir"
    for $fastq in $(ls *_R1.fastq.gz | sed 's/_R1.fastq.gz//')
    do
    kraken2 --db $database --threads 8 --memory-mapping --use-names --confidence 0.1 --report taxonomy_reads/${fastq}_kraken2.tax --paired ${fastq}_R1.fastq.gz…
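
The error comes from `for $fastq in`: a `for` loop takes a bare variable name (`for fastq in …`), and `$fastq` expands to `fastq_dir`'s value before bash checks the name. The sketch below shows the sample-prefix part of the loop without `ls` and `sed`, using a glob and parameter expansion. The function name `list_r1_prefixes` is hypothetical, and the kraken2 call itself is left out.

```shell
# Sketch: print the sample prefix for every *_R1.fastq.gz file in the
# current directory (e.g. "s1" for s1_R1.fastq.gz).
list_r1_prefixes() {
    local r1
    for r1 in *_R1.fastq.gz; do
        # Skip the unexpanded pattern when nothing matches.
        [ -e "$r1" ] || continue
        printf '%s\n' "${r1%_R1.fastq.gz}"
    done
}
```

The prefixes can then drive the real loop, for example `list_r1_prefixes | while read -r sample; do …; done`, with `"${sample}_R1.fastq.gz"` and `"${sample}_R2.fastq.gz"` as the paired inputs.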

NCBI’s Efetch not working

Any help would be much appreciated. My goal is to run the following for loop to generate a list of sample_id (which is actually isolation site) for a list of SRAs. However I get an error (see below) for each and every SRA. for sra in `awk ‘NR>1{print $1}’ metadata.txt`…

Using comm to make a list of files that haven’t yet been processed

I’m using comm to work out which files have already been processed and which are still to do. The input and output filenames are a little different, so I’ve used basename and sed to strip away…

Removing diff syntax from its output

I’m using diff to work out which files have already been processed and which are still to do. The input and output filenames are a little different, so I’ve used basename and sed to strip away the filepath and suffix information, so they…

How to count fastq reads

‘wc’ is faster than awk:

    # yourfile.fastq
    echo $(cat yourfile.fastq|wc -l)/4|bc
    # yourfile.fastq.gz
    echo $(zcat yourfile.fastq.gz|wc -l)/4|bc

For fasta files: grep -c "^>" file.fasta
For fastq files: grep -c "^@" file.fastq
For fastq files: awk '{s++}END{print s/4}' file.fastq

Here is the fancy script in bash: #!/bin/bash…
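
The line-count approach can be wrapped in a small helper that handles both plain and gzipped files. This is a sketch that assumes standard four-line FASTQ records; the function name `count_fastq_reads` is hypothetical. Note that `grep -c "^@"` can overcount, because a quality line may also begin with `@`.

```shell
# Sketch: count reads as (number of lines) / 4, decompressing .gz input.
count_fastq_reads() {
    local file="$1"
    case "$file" in
        *.gz) gzip -dc -- "$file" ;;
        *)    cat -- "$file" ;;
    esac | awk 'END { print NR / 4 }'
}
```

Used as `count_fastq_reads yourfile.fastq` or `count_fastq_reads yourfile.fastq.gz`. A non-integer result signals a truncated or multi-line-wrapped file.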

BEDOPS bedmap confusing file for option

Hi, I’ve been trying to construct a for loop to get mean methylation values across certain genomic features, but for some reason bedmap is treating my reference file as an option, which makes no sense. My for loop is seen below and I…

TCGA transcriptome data to R (DESeq2)

This seems to be a frequently asked question, so here is a robust method to fully recapitulate the counts given by TCGA and port them to DESeq2. Why the long way? Tanya and I noticed that TCGA-Biolinks and Firehose did not generate the full count matrix. ~5-10% of genes were missing…

Run multiple times samtools and sed for a big number of bam files in folder

How can we execute the following commands with bash for a large number of bam files in a folder?

    samtools view -H in.bam > header.sam
    sed -i s/SN:/SN:chr/ header.sam
    sed -i s/SN:chrMT/SN:chrM/ header.sam
    samtools…
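
The commands above can be wrapped in a loop over every BAM file. The excerpt is cut off at the last samtools call, so the sketch assumes it is `samtools reheader`; the function names and the `_chr.bam` output suffix are likewise assumptions. The two sed substitutions are taken from the question and combined into one filter.

```shell
# Sketch: rewrite SN: fields in a SAM header read from standard input
# (1 -> chr1, MT -> chrM), using the two substitutions from the question.
add_chr_prefix() {
    sed -e 's/SN:/SN:chr/' -e 's/SN:chrMT/SN:chrM/'
}

# Sketch: apply the header rewrite to each BAM in the current directory.
# Assumes samtools is on PATH; existing *_chr.bam outputs are skipped so
# the function can be re-run safely.
rechr_all_bams() {
    local bam header
    for bam in *.bam; do
        [ -e "$bam" ] || continue
        case "$bam" in *_chr.bam) continue ;; esac
        header="${bam%.bam}.header.sam"
        samtools view -H "$bam" | add_chr_prefix > "$header"
        samtools reheader "$header" "$bam" > "${bam%.bam}_chr.bam"
        rm -f "$header"
    done
}
```

Running `rechr_all_bams` in the folder leaves the original files untouched and writes one renamed copy per input.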

Bioinformatics Postdoctoral Research Associate, Long Lab job with Benaroya Research Institute at Virginia Mason

Benaroya Research Institute at Virginia Mason (BRI) has a bold mission: Predict, prevent, reverse and cure immune system diseases, from autoimmune disease to cancer to COVID-19. We examine the immune system in both health and disease to understand how disorders start and how to rebalance the immune system back to…

Conversion of fna files to faa

Hi, Can someone tell me how I can convert multiple fna files of bacterial genomes to faa files using the command line? I have downloaded these files from NCBI. I have looked up the sed code but was unable to use it properly….

130releng-armv7-quarterly][math/py-pymc3] Failed for py38-pymc3-3.11.4 in fetch

You are receiving this mail as a port that you maintain is failing to build on the FreeBSD package build server. Please investigate the failure and submit a PR to fix build. Maintainer: y…@freebsd.org Log URL: beefy12.nyi.freebsd.org/data/130releng-armv7-quarterly/46aef99be2ae/logs/py38-pymc3-3.11.4.log Build URL: beefy12.nyi.freebsd.org/build.html?mastername=130releng-armv7-quarterly&build=46aef99be2ae Log: =>> Building math/py-pymc3 build started at Sat Nov 27…

How to replace/fill “Ns” in fasta with reference file having same coordinates

Dear community, Hope you are doing great. As asked in the title, please advise whether there is any way to fill or replace N or Ns in a fasta file with the help of a reference file. For example Fasta…

How to assess structural variation in your genome, and identify jumping transposons

Prerequisites
Data: an annotated genome, long reads, repeat annotation
Software: minimap2, samtools, bedtools (for comparisons only), tabix (for visualization only)

Installation

    /work/gif/remkv6/USDA/04_TEJumper
    conda create -n svim_env --channel bioconda svim
    source activate svim_env

Map your long reads to your genome with minimap
My directory locale…

Replace fasta ID with value from TSV file (sed with special characters)

Command: sed -i "s/^>.*$/>$fastaid/g" output.fasta The desired output is to replace the entire fasta ID with everything stored in the variable $fastaid. Problem is that $fastaid looks like 12456789.AB.25/12/21 and it throws errors due to the special…
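
The slashes in the ID clash with the `/` delimiter of the `s` command. A common workaround, sketched here under the assumption that the ID contains no newline, is to switch to another delimiter and escape the few characters that are still special in the replacement (`&`, `\` and the delimiter itself). The function name `replace_fasta_ids` is hypothetical, and the sketch writes to standard output instead of editing in place.

```shell
# Sketch: replace every FASTA header line with ">" followed by the given ID.
# Usage: replace_fasta_ids NEW_ID [file...]
replace_fasta_ids() {
    local new_id="$1" escaped
    shift
    # Escape characters that are special on the replacement side of s|||.
    escaped=$(printf '%s' "$new_id" | sed 's/[&|\\]/\\&/g')
    sed "s|^>.*\$|>${escaped}|" "$@"
}
```

For example, `replace_fasta_ids "$fastaid" output.fasta > renamed.fasta`, where `renamed.fasta` is a placeholder name.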

Intersect multiple bed and keep all fields

Assuming the inputs are disjoint, here is a modification of a general BEDOPS-based solution from Shane Neph for N input files (for you, three inputs, but this works for the general case of N inputs): $ bedops --everything file1.bed file2.bed … fileN.bed | bedmap --echo-map - | awk '(split($0, a,…

Add Cigar string and Template Length to Read Name

Hi all, I need to convert a BAM file to Fastq format, but I don’t want to lose the Cigar and TLen information. My idea is to edit each read name in the BAM file, by appending both Cigar and…

sed – Delete records for which multiple pattern conditional s across columns awk

I have a file that looks like:

NC_042565.1 RefSeq region 1 114882317 . + . ID=NC_042565.1:1..114882317;Dbxref=taxon:299123;Name=1;chromosome=1;dev-stage=adult;gbkey=Src;genome=chromosome;isolate=Mets1;mol_type=genomic DNA;sex=male;sub-species=domestica;tissue-type=blood
NC_042565.1 Gnomon gene 21625 41521 . - . ID=gene-LCMT2;Dbxref=GeneID:110474964;Name=LCMT2;gbkey=Gene;gene=LCMT2;gene_biotype=protein_coding
NC_042565.1 Gnomon mRNA 21625 41521 . - . ID=rna-XM_021538777.2;Parent=gene-LCMT2;Dbxref=GeneID:110474964,Genbank:XM_021538777.2;Name=XM_021538777.2;gbkey=mRNA;gene=LCMT2;model_evidence=Supporting evidence includes similarity to: 2 ESTs%2C 9 Proteins%2C and 100%25 coverage of the annotated genomic…

Help understanding “sed” command in a loop

Hi everyone, I have 2 questions: 1) I found this script online to run Kraken2 in a loop on paired-end reads. Although I know it works well, because I have compared the results with another loop I have, I am not…

How to parse a data file

Hello all, I have a file "file1.txt" which initially looks like this:

Orthogroup F105 F109 F23 F79 HDV247 T415
OG0006155 F105|108872
OG0006156 F105|114651
OG0006157 F105|115307
OG0006158 F105|121488
OG0006551 F109|843828
OG0006552 F109|844465
OG0006553 F109|845048
OG0006557 F23|102768
OG0006558 F23|106636
OG0006559 F23|108691
OG0006560 F23|108697
OG0006841…

Fast way to extract specific sequences from large fasta

Hi all! I have ~2k text files, each with ~1k protein names (one protein name per line) and I need to extract the sequences of these proteins from a large master fasta file which contains ~5.5 million sequences. I wrote…

DESeq2 interaction terms in 2x2x2 factorial design

Hello, I’m trying to analyze RNASeq data from an multifactorial experimental setup with DESeq2. In brief, the experiment is about multiple stressor effects in stream organisms, whereby different stressors (here: added sediment, increased salinity or reduced flow velocity) are applied to the study organisms, either as single stressors or in…

main-armv6-default][science/cp2k-data] Failed for cp2k-data-7.1.0 in stage

You are receiving this mail as a port that you maintain is failing to build on the FreeBSD package build server. Please investigate the failure and submit a PR to fix build. Maintainer: y…@freebsd.org Log URL: beefy8.nyi.freebsd.org/data/main-armv6-default/pebf5105f9a4d_s23024f004a/logs/cp2k-data-7.1.0.log Build URL: beefy8.nyi.freebsd.org/build.html?mastername=main-armv6-default&build=pebf5105f9a4d_s23024f004a Log: =>> Building science/cp2k-data build started at Fri Oct 29…

main-armv6-default][biology/p5-BioPerl] Failed for p5-BioPerl-1.007007_1 in stage

You are receiving this mail as a port that you maintain is failing to build on the FreeBSD package build server. Please investigate the failure and submit a PR to fix build. Maintainer: sunp…@freebsd.org Log URL: beefy8.nyi.freebsd.org/data/main-armv6-default/pebf5105f9a4d_s23024f004a/logs/p5-BioPerl-1.007007_1.log Build URL: beefy8.nyi.freebsd.org/build.html?mastername=main-armv6-default&build=pebf5105f9a4d_s23024f004a Log: =>> Building biology/p5-BioPerl build started at Fri Oct 29…

“Given ref” field is empty when a ref. allele was in VCF input

Hi there, I’m running VEP using the following command:

    ref="GRCh38.primary_assembly.genome.fa"
    vep="/opt/vep_ensembl/ensembl-vep/vep"
    for ea in *Somatic.hc.vcf
    do
    $vep -i $ea -o vep/"$(echo $ea | sed s/.vcf//)"_VEP.txt --cache --dir_cache "/home/shared/vep_cache/" --assembly GRCh38 --merged --fasta $ref --hgvs --hgvsg…

Remove Gaps from Multiple sequence alignment

I want to remove the columns that contain gaps in the MSA file. Is there any Python code that can help me? Not sure if it’s Python code, but I know that trimAl can be used for this….

Trimming Illumina universal adapters using cutadapt proving insufficient

TL;DR: I have high universal Illumina adapter content in my paired-end RNA-seq reads and trimming with both the original sequence and reverse complement of the universal adapter did not completely remove the adapter content and was only effective for the R2 reads. I am trying to trim adapter sequences from…

Showing off skills for job hunting : bioinformatics

Am I screwing up by not showing off more Python skills in my code base for job applications? I have my BS in molecular biology and have worked in academic labs for about three years now, and I’m looking to move into industry jobs in bioinformatics. Over the past 3…

bioinformatics – using sed to capture groups

Your commands would discard lines containing no | character, and lines where the mouse gene identifier has no version number. I’m not certain this is intended, but it’s a side effect of using sed -n with the p flag on the s command. I’m going to assume that this is…

Prevent DEXSeq breaking transcripts entry when outputting to file

I have multiple DExSeq results files I want to export to tables, but I’m finding that when I output the results directly using write.table() then with large numbers of transcripts for the exon (more than 27), the output is split over two files like: ENSG00000005302 E052 139.860053137518 0.0139936600509854 6.47551836787261 0.0109370410611376…

Extract sequences from a fasta file with specific nucleotide repetition

I have a fasta file named seqs.fa with multiple sequences, i.e.:

>Seq1
GATAGATATCGAATGATC
>Seq2
GATGATAGATCGATGC

I want to grep/extract only those sequences having ATC repeated exactly 2 times, as in Seq1. How can we use grep/sed or the {} method for…
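
Counting occurrences is easier in awk than with grep or sed, because `gsub` returns the number of replacements it made. The sketch below assumes each sequence sits on a single line (as in the example), counts non-overlapping matches, and treats the motif as a regular expression. The function name `seqs_with_exact_motif_count` is hypothetical.

```shell
# Sketch: print FASTA records whose sequence line contains the motif
# exactly COUNT times.
# Usage: seqs_with_exact_motif_count MOTIF COUNT [file...]
seqs_with_exact_motif_count() {
    local motif="$1" count="$2"
    shift 2
    awk -v motif="$motif" -v want="$count" '
        /^>/ { header = $0; next }          # remember the current header
        {
            seq = $0                         # work on a copy of the sequence
            if (gsub(motif, "", seq) == want) print header "\n" $0
        }
    ' "$@"
}
```

For the file in the question this would be called as `seqs_with_exact_motif_count ATC 2 seqs.fa`. Sequences wrapped over several lines would need to be joined first.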

How to get sample names and genotype for SNP in multi-sample VCF file

Last update: May 9, 2021 You must normalise your VCF / BCF first; otherwise, this script will not work as expected. You can do this with: bcftools norm -m-any MyVariants.vcf -Ov > MyVariants.Norm.vcf I probably should explain what’s going on here, too: It is divided into 4 parts (each part…

BBDuk quality filtering not producing expected results

I’m trying to trim/filter low quality reads from paired-end exome-seq data, using BBDuk. I used the command:

    for ea in $files; do
    R1="$ea"
    R2=$(echo $R1 | sed "s/R1/R2/")
    /home/shared/programs/bbmap/bbduk.sh -Xmx1g in1=$R1 in2=$R2 out1="$(echo $ea | sed s/.fastq.gz/_trimmed_filtered.fastq.gz/)" out2="$(echo $(echo $ea | sed…

How to get a list of all KEGG ko terms vs names?

I want to map a list of KEGG terms with their corresponding names. Is it possible to get a list of all KEGG terms and their descriptions/names? Hi. I am not…

Maximizing GROMACS Throughput with Multiple Simulations per GPU Using MPS and MIG

GROMACS, a simulation package for biomolecular systems, is one of the most highly used scientific software applications worldwide, and a key tool in understanding important biological processes including those underlying the current COVID-19 pandemic. In a previous post, we showcased recent optimizations, performed in collaboration with the core development team,…

How To Extract A Sequence From A Big (6Gb) Multifasta File ?

I want to extract some sequences using ID from a multifasta file. Using perl is not possible because it gave an error when indexing the database. Maybe because of its size? Is there any way to this…

snap installation error

Hello everybody, I want to annotate the genomes of fungi and I wanted to install snap. I already installed it on Ubuntu. Now I’m working on Debian, and the installation stops when I run "make"; I get the installation error below: root@debian:software/snap# make make gcc make[1] : on…

Protein sequence to Nucleotide sequence

Hello All, I have file1 with a protein sequence and another file with its respective decoded nucleotide codon sequence. Is there any one-liner which looks up each single-letter amino acid in file2, changes the protein sequence to the nucleotide sequence and saves it as…

Produce PCA bi-plot for 1000 Genomes Phase III in VCF format (old)

NB – Update July 29, 2020 – this thread will no longer be watched and, for all intents and purposes, will now be archived NB – Version 2 of tutorial can be found here and should be used going forward –> Produce PCA bi-plot for 1000 Genomes Phase III –…

Scientists Find Well-preserved Cells and Nucleus Found In 125-million-years-old Dinosaur Fossil

When Richard Attenborough’s character, John Hammond, explained the extraction of prehistoric dinosaur DNA for their de-extinction in 1993’s Jurassic Park, one could not help but marvel at the idea. While mosquitoes containing dinosaur blood, full of DNA, have not been found so far,…

GenomSys – Bioinformatics and genomics

By Luca Trotta on September 29, 2021 As described in last months’ articles, genomic methods are increasingly and effectively used to support diagnostic, preventive, and therapeutic strategies and enhance the development of personalized medicinal approaches. Please find our previous articles here: Genomic Corner  Evolution of genomics Genomics is the study…

Mapping multiples

Hi, I am coming to you for help. I am mapping short and long read files with BWA and MINIMAP2. My problem is that I want to write an if statement that would allow me to choose either BWA if I work with short…

Mean and SD read length from a range of fastq files

Hi all, I’m trying to write some code to generate mean read length data from a range of fastq files. awk '{if(NR%4==2) print NR"\t"$0"\t"length($0)}' HG1.fastq > readLength.txt I’ve got as far as here from looking through other…
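
The same `NR%4==2` test can accumulate the statistics directly instead of printing every length. This is a sketch, not the accepted answer from the thread: it assumes uncompressed four-line FASTQ records and reports the population standard deviation. The function name `fastq_length_stats` is hypothetical.

```shell
# Sketch: print "reads<TAB>mean<TAB>sd" of read lengths over all given
# FASTQ files (or standard input), using running sums.
fastq_length_stats() {
    awk 'NR % 4 == 2 { n++; len = length($0); sum += len; sumsq += len * len }
         END {
             if (n == 0) { print "no reads"; exit 1 }
             mean = sum / n
             var = sumsq / n - mean * mean   # population variance
             if (var < 0) var = 0            # guard against rounding error
             printf "%d\t%.2f\t%.2f\n", n, mean, sqrt(var)
         }' "$@"
}
```

Used as `fastq_length_stats HG1.fastq`, or with several files at once for a pooled figure. For per-file values the function can be called once per file in a loop.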

PhD Student Needed for Machine Learning (Deep Learning and Classical) in Molecular Biology

Several openings are available immediately (or as late as Fall 2022). Looking for a highly motivated PhD student for Computational Biology research, with an algorithm development focus. The Ecological and Evolutionary Signal-processing (EESI) and Informatics lab…

Use grep to loop a command in a script

Hello, I am doing a measurement of the HWE per Population. I have done this already without trouble with 10 populations, but now I’m doing it with 89 populations so I’d like to create a script. I use this command to create a list with all the populations and their…

Gromacs Contact Map | Contact Information Finder

Contact maps using Gromacs? (ResearchGate) I used gmx mdmat in gromacs to create contact maps, but it seems that the mdmat gives the minimum average distance rather than the average centre-of-mass distance.

Filtration of paired reads with both length higher than 50 from bam file

I understand that for filtering reads based on length I can simply use: samtools view -h file.bam | awk 'length($10) > 50 || $1 ~ /^@/' | samtools view -bS - > output.bam But how can…

gnomADc plugin instructions not working for VEP

Hi, I have installed VEP for offline use and I am trying to download the data for the gnomADc plugin. The instructions from VEP are as follows:
genomes="storage.googleapis.com/gnomad-public/release/3.0/coverage/genomes"
genome_coverage_tsv="gnomad.genomes.r3.0.coverage.summary.tsv.bgz"
wget "${genomes}/${genome_coverage_tsv}"
zcat "${genome_coverage_tsv}" | sed -e '1s/^locus/#chrom\tpos/; s/:/\t/' | bgzip > gnomADc.gz
tabix -s 1 -b 2 -e 2 gnomADc.gz…

Continue Reading gnomADc plugin instructions not working for VEP

Sum values in a row and compare the results to a value with awk or sed

I have a file organized as below:
ERR1017187.315 32630:2 0:37 32630:7 0:71 |:| 0:25 32630:10 0:82
ERR1017187.333 32630:2 0:37 32630:3 0:75 |:| 0:117
ERR1017187.336 32630:1 0:37 32630:6 0:73 |:| 0:117
ERR1017187.358 32630:3 0:35 32630:2 0:77 |:|…

Continue Reading Sum values in a row and compare the results to a value with awk or sed

Editing header of a fasta file

Hello everybody, I’ve been using sed, but only for simple steps, and now I can’t do this. I have this header: >ENSP00000451042.1 pep chromosome:GRCh38:14:22438547:22438554:1 gene:ENSG00000223997.1 transcript:ENST00000415118.1 gene_biotype:TR_D_gene transcript_biotype:TR_D_gene gene_symbol:TRDD1 description:T cell receptor delta diversity 1 [Source:HGNC Symbol;Acc:HGNC:12254] and I would like to obtain this:…

Continue Reading Editing header of a fasta file

SURVIVOR merge function usage

Hello, I have an issue with the SURVIVOR merge function: it only shows coincidental cases when I draw a Venn diagram. Can anyone please help me with this? I used SURVIVOR merge and the genComp function to compare two VCFs, but only coincidental cases show up. Below is my code. In bash, SURVIVOR merge…

Continue Reading SURVIVOR merge function usage

Produce PCA bi-plot for 1000 Genomes Phase III

Note1 – Previous version: Produce PCA bi-plot for 1000 Genomes Phase III in VCF format (old) Note2 – this data is for hg19 / GRCh37 Note3 – GRCh38 data is available HERE The tutorial has been updated based on the 1000 Genomes Phase III imputed genotypes. The original tutorial was…

Continue Reading Produce PCA bi-plot for 1000 Genomes Phase III

How to remove the header in fasta file and keep only the desirable part on ubuntu?

Hi all, I have a fasta file with this header: >10005_M12.fastq Otu0001|242290|M1.fastq-M12.fastq-M5.fastq-URTM6.fastq-M7.fastq-M9.fastq I want to remove all the header parts except the OTU (with its number). I used this command: "sed 's/>M.Otu/>Otu/g'…

Continue Reading How to remove the header in fasta file and keep only the desirable part on ubuntu?
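The attempted command is truncated, so this is only a sketch of one way to reduce such a header to its OTU label. It assumes every header contains exactly one token of the form `Otu` followed by digits, as in the example above; `keep_otu_header` is a hypothetical helper name.

```shell
# Sketch: reduce each FASTA header to ">OtuNNNN".
# Assumes one "Otu<digits>" token per header line; headers without such
# a token are left unchanged. Hypothetical helper name.
keep_otu_header() {
    sed -E '/^>/s/^>.*(Otu[0-9]+).*$/>\1/' "$@"
}
```

For example, `keep_otu_header input.fasta > renamed.fasta` would leave sequence lines untouched, since the substitution only applies to lines starting with `>`.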

Edit vcf file 0|0 to 0

I have a VCF file with the GT format given as 0|0, 0|1, 1|1, etc. I would like to convert those to a single number to create a dosage file. Example: editing the VCF so that 0|0 becomes 0, 0|1 becomes 1, 1|1 becomes 2…

Continue Reading Edit vcf file 0|0 to 0
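A minimal sketch of the conversion described above, assuming biallelic sites, sample columns starting at column 10, and GT as the first sub-field of each sample entry. Anything that is not a 0/1 diploid call is written as `NA`, which is a choice made for this example rather than something stated in the post. `gt_to_dosage` is a hypothetical helper name.

```shell
# Sketch: replace each sample genotype with its alternate-allele dosage
# (0|0 -> 0, 0|1 or 1|0 -> 1, 1|1 -> 2). Assumes biallelic sites and GT
# as the first sub-field; other calls become "NA". Hypothetical name.
gt_to_dosage() {
    awk 'BEGIN { FS = OFS = "\t" }
        /^#/ { print; next }                      # keep header lines as-is
        {
            for (i = 10; i <= NF; i++) {
                split($i, parts, ":")             # GT comes before the first ":"
                n = split(parts[1], a, "[|/]")    # phased or unphased separator
                if (n == 2 && a[1] ~ /^[01]$/ && a[2] ~ /^[01]$/)
                    $i = a[1] + a[2]
                else
                    $i = "NA"
            }
            print
        }' "$@"
}
```

Note that the result is no longer a valid VCF, because the FORMAT column still says GT; it is meant as an intermediate table for building a dosage file.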

Problem with BAM file headers

Hello Everyone, I have noticed an issue with my BAM file headers, where the @RG line is either malformed or missing entirely. I think I can sed the files that are malformed, and add the sample names necessary to complete my further analyses…

Continue Reading Problem with BAM file headers

Remove duplicates in fasta files based on a specific value with awk

I have a FASTA file organized as such:
>Prevalence_Sequence_ID:13|ARO_Name:AxyX|ARO:3004143|Detection_Model:Protein Homolog Model
ATGAAGCAAAGAGTCCCTCTACGCACGTTCGTCCTATCTGCCGTATTAATTCTTATTACTGGTTGCTCGAAACCGGAAACCCAACCAGCCGCCGACGCCCCGGCGGAGAT
>Prevalence_Sequence_ID:14|ARO_Name:adeF|ARO:3004143|Detection_Model:Protein Homolog Model
ATGAATATCTCGAAATTCTTCATCGACCGGCCGATCTTCGCCGGCGTGCTTTCGATCCTGGTGTTGCTGGCGGGCATACTGGCCATGTTCCAGCTGCCCATTTCCGAGTACCCGGAAGTGGTGCCGCCGTCGGTGGTGGTGCGCGCGCAGTATCCGGGCGCCAACCCCAAGGTCATCGCCGAAACCGTGGCCTCGCCGCTGGAGGAG
I need to remove sequences that share the same ARO code (such as those above), keeping only one. Is there a…

Continue Reading Remove duplicates in fasta files based on a specific value with awk
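A sketch of one way to keep only the first record per ARO code with awk. It assumes the code appears in the header as `|ARO:` followed by digits, as in the two headers shown, and that "keeping only one" means keeping the first occurrence. `dedup_by_aro` is a hypothetical helper name.

```shell
# Sketch: keep the first FASTA record for each ARO accession.
# Assumes headers contain "|ARO:<digits>"; records whose header has no
# such field are kept. Multi-line sequences are handled. Hypothetical name.
dedup_by_aro() {
    awk '
        /^>/ {
            keep = 0
            if (match($0, /\|ARO:[0-9]+/)) {
                code = substr($0, RSTART + 1, RLENGTH - 1)   # e.g. "ARO:3004143"
                if (!(code in seen)) { seen[code] = 1; keep = 1 }
            } else {
                keep = 1
            }
        }
        keep { print }' "$@"
}
```

The `keep` flag is set on each header line and applies to all sequence lines that follow it, so a dropped record is dropped in full.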

how to replace a list of headers in fasta file that are not in order

This is what my fasta file looks like:
>monCan3F9-B-G1795-Map9
TTTATTATACCCTGAACCCATTAAAA(multiple lines)
>monJX13F48-L-B718-Map1
AAAATTAATTCAGAATTATGTTTG(multiple lines)
. . .
The list of new names is not in the same order as in the fasta file, so…

Continue Reading how to replace a list of headers in fasta file that are not in order
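The format of the list of new names is not visible in the excerpt. The sketch below therefore assumes a hypothetical two-column, tab-separated mapping file (old header, new header, without the leading `>`), which makes the order of the list irrelevant. `rename_fasta_headers` is a hypothetical helper name.

```shell
# Sketch: rename FASTA headers via a lookup table, independent of order.
# $1: mapping file with "old<TAB>new" per line (hypothetical format)
# $2: FASTA file. Headers missing from the mapping are left unchanged.
rename_fasta_headers() {
    awk -F '\t' '
        NR == FNR { new_name[$1] = $2; next }     # first file: load the mapping
        /^>/ {
            id = substr($0, 2)                    # header without the ">"
            if (id in new_name) { print ">" new_name[id]; next }
        }
        { print }' "$1" "$2"
}
```

Because the lookup is keyed on the old name, the mapping file and the FASTA file can list the sequences in any order.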

Average Amino Acid Identity (AAI) analysis manually

Hi all, I need to perform Average Amino Acid Identity (AAI) analysis for 422 genomes using a SLURM system that only allows jobs to run for 3 days. Tools like CompareM can’t finish the job in time. Therefore I wish to run…

Continue Reading Average Amino Acid Identity (AAI) analysis manually

List of human protein coding genes with given name (known function?)

Hello, To put it simply, I am doing differential expression analysis on human RNA-seq data and I want to focus my analysis on genes that are: 1) Protein coding, so no SNOR or MIR 2) Genes with a…

Continue Reading List of human protein coding genes with given name (known function?)

please make the build reproducible

Source: samtools Version: 1.13-1 Severity: wishlist Tags: patch User: reproducible-bui…@lists.alioth.debian.org Usertags: buildpath X-Debbugs-Cc: reproducible-b…@lists.alioth.debian.org Hi, Whilst working on the Reproducible Builds effort [0] we noticed that samtools could not be built reproducibly. This is because it includes the build flags in the binary to include in --version output, but this…

Continue Reading please make the build reproducible

samtools server vs cluster error

Using inspiration from this thread HISAT2 output direct to bam, I’m attempting to run this command. The shell variables in this case represent paths to files/locations that make sense and in fact this command runs fine on my Ubuntu 18.04 LTS server using hisat 2.1 and samtools 1.10 (this seems…

Continue Reading samtools server vs cluster error

Split Fasta file and rename output files with contig names

Hello! I am trying to split a large fasta file (19,336 lines) into individual contigs. The file setup is as follows:
>k141_284136 flag=1 multi=3.0000 len=1875
AGCCTACATTGGCAAGGTACTGCTTTTGTCGCCCATCGTTGGCGAATTTGCTAATGAGAACACACGGAT
>k141_407195 flag=1 multi=5.0000 len=1723
GCCAGTAGTTTTCAGATTTTCAATTACTTTCTTTGCTTCTTTTAACGCAGCCGCAAAGTTGTCATCAAGTTCTCCACCCTGTGCAATATGTTTATATAGAATGCTGCTTACTTTGTCAGCAA
>k141_169332 flag=1 multi=3.0000 len=20
ATTATCCATCCTATTCATCGCTTGATGAAATGTTGCAAAATTCCAAAGATTTTCAGCGTCAAATCGTTCGTATATCCTAATTAAACACCGCTAAAAGTTATGTCTAAGCAATCTTTAA
I am…

Continue Reading Split Fasta file and rename output files with contig names
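A sketch of one way to write each contig to a file named after it. It assumes contig IDs (the header text up to the first whitespace, e.g. `k141_284136`) are unique; the `.fasta` extension and the helper name `split_fasta_by_contig` are choices made for this example.

```shell
# Sketch: write every FASTA record to <outdir>/<contig_id>.fasta.
# Assumes unique contig IDs; a repeated ID would overwrite the earlier file.
# $1: input FASTA, $2: output directory (created if missing).
split_fasta_by_contig() {
    mkdir -p "$2" || return 1
    awk -v dir="$2" '
        /^>/ {
            if (out != "") close(out)             # avoid too many open files
            name = substr($1, 2)                  # ID up to the first whitespace
            gsub(/[^A-Za-z0-9._-]/, "_", name)    # keep the file name safe
            out = dir "/" name ".fasta"
        }
        out != "" { print > out }' "$1"
}
```

Closing each file before opening the next matters for inputs with many contigs, since awk implementations limit the number of simultaneously open files.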

Add words at beginning and end of the same line for the FASTA header line with sed

I have the following lines:
>A_1000
ACTTTCGATCTCTTGTAGATCTGTTCTC…CAC
ACTTTCGATCTCTTGTAGATCTGTTCTC…CAC
I would like to convert the first line as follows:
>INITWORD/A_1000/FINALWORD
ACTTTCGATCTCTTGTAGATCTGTTCTC…CAC
ACTTTCGATCTCTTGTAGATCTGTTCTC…CAC
I found a similar question that did allow me to append…

Continue Reading Add words at beginning and end of the same line for the FASTA header line with sed
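A minimal sed sketch of the header rewrite shown above. It assumes the two words contain no characters that are special to sed in this position (`|`, `&`, or backslashes), because they are pasted directly into the expression. `wrap_fasta_header` is a hypothetical helper name.

```shell
# Sketch: turn ">ID" into ">PREFIX/ID/SUFFIX" on header lines only.
# $1: prefix word, $2: suffix word; remaining arguments are input files
# (stdin if none). The words must not contain "|", "&" or backslashes.
wrap_fasta_header() {
    prefix="$1"
    suffix="$2"
    shift 2
    sed -E "/^>/s|^>(.*)$|>${prefix}/\1/${suffix}|" "$@"
}
```

Using `|` as the sed delimiter avoids having to escape the `/` characters that the target header needs.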

bcftools multiallelic split not working

I am attempting to split multiallelic sites using bcftools norm with the following command: zcat ${inputVcf} | sed 's/AD,Number=./AD,Number=R/g' | sed 's/ADR,Number=./ADR,Number=R/g' | sed 's/ADF,Number=./ADF,Number=R/g' | bcftools norm --fasta-ref ${genomeFa} --check-ref s --multiallelics -any --output ${outputVcf} The sed commands were based on the recommendation from here. However I’m still getting…

Continue Reading bcftools multiallelic split not working

More than one archive specified. Try --help.

Package: routine-update Version: 0.0.6 Severity: important Hi Andreas, when working on making sure the python-biopython watch file was appropriately fixed, I saw routine-update choke with the following error:
$ routine-update
gbp:info: Fetching from default remote for each branch
gbp:info: Branch 'master' is already up to date.
gbp:info: Branch 'pristine-tar' is already up to date.
gbp:info: Branch…

Continue Reading More than one archive specified. Try --help.

Linearize fasta files

Program versions used:
BBMap – v. 38.32
Seqtk – v. 1.3-r106
Seqkit – v. 0.8.1
Perl – v. 5.16.3
Python – v. 3.6.6
sed – v. 2.2.2
$ time (cat Homo_sapiens.GRCh38.dna.primary_assembly.fa > /dev/null)
real 0m1.050s
user 0m0.002s
sys 0m1.045s
With BBMap – reformat.sh
$ time (reformat.sh -Xmx40g in=Homo_sapiens.GRCh38.dna.primary_assembly.fa fastawrap=0)
java -ea -Xmx40g -cp bbmap/current/ jgi.ReformatReads…

Continue Reading Linearize fasta files
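The excerpt lists the tools that were timed but is cut off before any of the awk or sed variants. For reference, here is a minimal awk sketch of what "linearizing" means: each record's wrapped sequence lines are joined into a single line. `linearize_fasta` is a hypothetical helper name, and the sketch assumes Unix line endings.

```shell
# Sketch: join wrapped sequence lines so each record is header + one line.
# Reads the named files, or stdin if none are given. Assumes LF line endings.
linearize_fasta() {
    awk '
        /^>/ {
            if (seq_started) printf "\n"      # finish the previous sequence
            print                             # header line unchanged
            seq_started = 0
            next
        }
        {
            printf "%s", $0                   # append without a newline
            seq_started = 1
        }
        END { if (seq_started) printf "\n" }' "$@"
}
```

This is comparable in effect to the `fastawrap=0` setting passed to reformat.sh in the excerpt, but no timing claim is made for it here.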

Remove whitespaces on fasta files, except on fasta-header

Hey everyone, I have a multi-fasta file like this:
>NC_000914 464618..534825
gtgccttccattttggagcgggaccaaatcgcagcggttctggtaagtgcgagcagggac gtgccttccattttggagcgggaccaaatcgcagcggttctggtaagtgcgagcagggac aaaacgccggccggcttgcgggaccatgcgatattacaactgctcgccacctacggactg aaaacgccggccggcttgcgggaccatgcgatattacaactgctcgccacctacggactg cgatcaggagaaatccgcaacatgcggattgaggatatcgattggcggaccgaaaccatt cgatcaggagaaatccgcaacatgcggattgaggatatcgattggcggaccgaaaccatt
I would like to remove whitespaces from the fasta sequences, but keep the whitespaces on the fasta-headers (>). I use this command sed -i…

Continue Reading Remove whitespaces on fasta files, except on fasta-header
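The command in the excerpt is truncated after `sed -i`, so the following is a sketch rather than a correction of it. It restricts the substitution to lines that do not start with `>`, which leaves header whitespace intact. `strip_sequence_whitespace` is a hypothetical helper name.

```shell
# Sketch: delete all whitespace on sequence lines, leave headers untouched.
# The address /^>/! selects every line that is NOT a header.
strip_sequence_whitespace() {
    sed '/^>/!s/[[:space:]]//g' "$@"
}
```

It writes to stdout (for example `strip_sequence_whitespace in.fasta > out.fasta`) instead of editing in place, which keeps the original file available for comparison.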

bash script

Hello everyone, I have a file like this: RSID1 RSID2 chr1_169894240_G_T_b38 chr1_169894240_G_T_b38 chr1_169894240_G_T_b38 chr1_169891332_G_A_b38 chr1_169891332_G_A_b38 chr1_169891332_G_A_b38 chr1_169661963_G_A_b38 chr1_169661963_G_A_b38 chr1_169661963_G_A_b38 chr1_169697456_A_T_b38 chr1_169697456_A_T_b38 chr1_169697456_A_T_b38 chr1_27636786_T_C_b38 chr1_27636786_T_C_b38 chr1_196651787_C_T_b38 chr1_196651787_C_T_b38 chr6_143501715_T_C_b38 chr6_143501715_T_C_b38 I want to extract info just like: chr1_169894240 chr1_169894240. I don’t want to have other info. I just want…

Continue Reading bash script
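A sketch of one way to trim IDs such as `chr1_169894240_G_T_b38` down to `chr1_169894240`. It assumes whitespace-separated columns and chromosome names without underscores (so it would mis-handle contigs like `chr1_KI270706v1_random`). `trim_variant_ids` is a hypothetical helper name.

```shell
# Sketch: keep only "chrom_position" from IDs of the form
# chrom_position_ref_alt_build. Fields not starting with "chr"
# (such as the RSID1/RSID2 header) are left unchanged.
trim_variant_ids() {
    awk '{
        for (i = 1; i <= NF; i++) {
            n = split($i, p, "_")
            if (n >= 2 && p[1] ~ /^chr/) $i = p[1] "_" p[2]
        }
        print
    }' "$@"
}
```

Modified lines are re-joined with single spaces, which is awk's default output separator.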

Convert a VCF-file in a user specific Format

Hello everyone, I am curious whether it is possible to convert a VCF file (with multiple samples) into a format with 5 columns. The columns should be: Sample ID, position on the chromosome, genotype, number of reads covering the site, QUAL phred-scaled quality…

Continue Reading Convert a VCF-file in a user specific Format

How to convert mapping bam file to fastq without loseing the mapping information

Hi all, I want to turn my RNA mapping data into a library for further analysis. I have bowtie2 mapping data in BAM files, and I currently use bedtools to extract the mapped reads as fastq…

Continue Reading How to convert mapping bam file to fastq without loseing the mapping information

Replace multiple text with corresponding text

Hi, I ran an analysis and the software replaced the bacteria names with codes. I have a txt file as below:
Order  Original Name                             Code
1      Allostreptomyces_psammosilenae_DSM_42178  S1_f1
2      Embleya_hyalina_NBRC_13850                S2_f2
3      Embleya_scabrispora_DSM_41855             S3_f3
Because the analysis involved a few hundred bacteria, it would…

Continue Reading Replace multiple text with corresponding text

The usage of sed

sed -e 's/_scATAC_hg19_noDup_noMT.bam//g' -e 's/\/directory\/to\/singleCell\///g' bamlist.txt | sed -e 's/\//\t/g' | awk 'OFS="\t"{print $2}' | tr '\n' '\t' > header.txt This replacement command is too complex. Can someone explain what this means?

Continue Reading The usage of sed
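One way to read the pipeline is stage by stage. The sketch below assumes the backslashes that the page dropped from the command are restored (`\/` for a literal slash, `\t` for a tab, `\n` for a newline) and that GNU sed is used, since `\t` in a sed replacement is a GNU extension. The function name `bam_list_to_header` and the example paths in the note after the block are hypothetical.

```shell
# Sketch: the pipeline from the question, split into commented stages.
# $1: text file with one BAM path per line (bamlist.txt in the question).
bam_list_to_header() {
    # Stage 1: delete the shared file-name suffix, then the shared
    # leading directory "/directory/to/singleCell/".
    sed -e 's/_scATAC_hg19_noDup_noMT.bam//g' \
        -e 's/\/directory\/to\/singleCell\///g' "$1" |
    # Stage 2: turn every remaining "/" into a tab (GNU sed).
    sed -e 's/\//\t/g' |
    # Stage 3: print the second whitespace-separated field of each line.
    # OFS="\t" is an assignment used as an always-true pattern.
    awk 'OFS="\t"{print $2}' |
    # Stage 4: replace newlines with tabs, giving one tab-separated row.
    tr '\n' '\t'
}
```

With a hypothetical path such as `/directory/to/singleCell/plate1/cellA_scATAC_hg19_noDup_noMT.bam`, stage 1 leaves `plate1/cellA`, stage 2 splits that into two tab-separated fields, and stage 3 keeps `cellA`. The final row therefore lists one name per BAM file and can serve as a table header.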