Tag: SED
NifH database for taxonomic assignment in qiime2 – General Discussion
JThurston (Josh Thurston) January 19, 2024, 8:05pm 1 Hi all! I’m currently working through a Qiime2 pipeline analysing Illumina miseq paired-end amplicon data. I’ve successfully analysed bacterial (16s) amplicons from importing, filtering through to taxonomic assignment. However, I also have amplicon data for a functional gene (nifH; nitrogenase for N2…
Extract fasta sequence from gff3 file
Hi everyone, I have a lot of .gff3 files with the CDS features followed by the fasta sequence. The sequence is separated from the CDS features like this: ##FASTA >NZ_NZ_LR130533.1 I would like to extract all the fasta sequences into new fasta…
Function ‘SubBackward0’ returned nan values in its 1th output – autograd
I am trying to implement a model where the forward function calls an external function that computes the values using the model’s parameters. class R_model(torch.nn.Module): def __init__(self,) : super().__init__() self.Kx = torch.nn.Parameter(torch.randint(-200, -100, (1,)).float()*0.0001) self.Ky = torch.nn.Parameter(torch.randint(-200, -100, (1,)).float()*0.0001) self.Kz = torch.nn.Parameter(torch.randint(-200, -100, (1,)).float()*0.0001) def forward(self): return generate_r(Kx=self.Kx, Ky=self.Ky,…
Species coverage in the NCBI protein NR database ?
Hi Biostars, I am currently trying to build a Eukaryote version of the NCBI NR database and I am not really sure that I fully understand how the NR is implemented. Here is the code that I’m using to do so : #!/usr/bin/bash ############## # DOWNLOAD FULL NR ############## baseURL="https://ftp.ncbi.nlm.nih.gov/blast/db/"…
Pruning with --indep-pairwise with plink 1.9
I’m new to PLINK and I would like to obtain a file with SNPs in approximate linkage equilibrium. Here is my script and the outputs of each step. If someone could tell me if there is an error in the script because at…
STAR output
Hello, I am trying to map with STAR but it is not clear to me why I am not getting the SAM/BAM mapping file. Could you help me? [epola@mazorka alignment_STAR]$ ls -lh total 13M -rw-rw-r-- 1 epola epola 13M Nov 17 12:34 SRR22164928SJ.out.tab -rw------- 1 epola epola…
Pflowtts Pytorch – Open Source Agenda
P-Flow: A Fast and Data-Efficient Zero-Shot TTS through Speech Prompting Authors: Sungwon Kim, Kevin J Shih, Rohan Badlani, Joao Felipe Santos, Evelina Bhakturina, Mikyas Desta, Rafael Valle, Sungroh Yoon, Bryan Catanzaro Affiliations: NVIDIA Status: Generated first sample (Check LJSpeech_Sample_100_epochs.wav) on 11/16/2023. Unofficial implementation of the paper P-Flow: A Fast…
[main_samview] fail to read the header from “human_g1k_v37.annotate.fasta”.
Hi, I tried to add the prefix “chr” to the chromosome names in a fasta file like this: sed 's/^>/>chr/' human_g1k_v37.fasta > human_g1k_v37.annotate.fasta However, after that, I failed to view the header of the new fasta file: samtools view -H human_g1k_v37.annotate.fasta >>> [main_samview] fail to read…
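The header rewrite described in this excerpt (putting `chr` in front of every FASTA record name) can be sketched in Python as well. This is only an illustration of the same substitution, not the poster's method; the function name `add_chr_prefix` and the sample records are made up for the example. As far as I know, `samtools view -H` expects SAM/BAM/CRAM input, so a FASTA file has no header for it to print regardless of how the record names look.

```python
def add_chr_prefix(lines):
    """Yield FASTA lines, inserting 'chr' right after '>' on header lines."""
    for line in lines:
        if line.startswith(">"):
            # Header line: '>1 ...' becomes '>chr1 ...'
            yield ">chr" + line[1:]
        else:
            # Sequence line: passed through unchanged
            yield line


# Hypothetical two-record FASTA used only for demonstration
example = [">1 dna:chromosome", "ACGT", ">MT", "GGCC"]
for out_line in add_chr_prefix(example):
    print(out_line)
```

Sequence lines never start with `>`, so only record names are touched.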
How to swap UMIs?
Hi all, I have a fastq file. This is my UMI: TCAAT. Unfortunately it appears at the end of the line, and after the alignment this information was lost. How can I swap this information? Expected output @NS500595:901:HCCFKAFX5:1:11101:20769:1089_TCAAT 1:N:0:GGGGGGGG+AGATCTCG_TCAAT Input I have @NS500595:901:HCCFKAFX5:1:11101:20769:1089 1:N:0:GGGGGGGG+AGATCTCG_TCAAT ATGTGGGAAACTCGACTGCATAATTTGTGGTAGTGGGGGACTGCGTTCGCGCTTTCCCCT + EEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEAEEEE…
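Going by the input and expected output shown in this excerpt, the UMI after the last underscore of the header comment is copied onto the read ID, so it survives tools that drop the comment. A minimal Python sketch follows, assuming plain four-line FASTQ records; the function names `copy_umi_to_read_id` and `tag_fastq` are hypothetical.

```python
def copy_umi_to_read_id(header):
    """Append the UMI found after the last '_' of the comment to the read ID."""
    read_id, sep, comment = header.partition(" ")
    if "_" not in comment:
        # No comment or no UMI suffix: leave the header untouched
        return header
    umi = comment.rsplit("_", 1)[-1]
    return f"{read_id}_{umi}{sep}{comment}"


def tag_fastq(lines):
    """Rewrite only header lines (every 4th line, starting at index 0)."""
    for index, line in enumerate(lines):
        yield copy_umi_to_read_id(line) if index % 4 == 0 else line


header = "@NS500595:901:HCCFKAFX5:1:11101:20769:1089 1:N:0:GGGGGGGG+AGATCTCG_TCAAT"
print(copy_umi_to_read_id(header))
```

Keying on the line position rather than on a leading `@` avoids misreading quality lines, which may also begin with `@`.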
How To Get Chromosome Position Given Rs Number?
I have a list of a few hundred SNPs given by rs number. I want to get the chromosome and position for each SNP. For example: input: rs4477212 output: chr1:82154 You can download this information…
Comparative genomics and genome-wide SNPs of endangered Eld’s deer provide breeder selection for inbreeding avoidance
De novo genome assemblies and genome annotation We assembled a de novo genome of a seven-year-old male SED from Ubon Ratchathani Zoo using a combination of Illumina short-reads (92.94 × coverage) and PacBio long-reads (61.6 × coverage) (GenBank accession number: JACCHN000000000). Additionally, we used MGI short-reads (52.15 × coverage) to assemble a de novo genome of…
PIGx ChIP-seq pipeline error
Hi Lisa, You also need to modify the gtf annotation file using: sed '/^#/d' annotation_file.gtf > annotation_file_no_header.gtf Best, Alex > On 12. Oct 2022, at 15:07, Bora Uyar <borauy…@gmail.com> wrote: > > You would need to check how your fasta headers look and how the chromosomes are represented in…
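The `sed` command in this reply deletes every line that begins with `#`, which strips the comment header from a GTF file. A short Python sketch of the same filter is below; the function name `drop_comment_lines` and the sample lines are illustrative only.

```python
def drop_comment_lines(lines):
    """Return the lines that do not start with '#', keeping their order."""
    return [line for line in lines if not line.startswith("#")]


# Hypothetical GTF fragment: two header comments followed by one feature row
gtf = [
    "#!genome-build example",
    "#!genome-version example",
    "1\tsource\tgene\t100\t200\t.\t+\t.\tgene_id \"g1\";",
]
print(drop_comment_lines(gtf))
```

Only a `#` in the first column counts, so a `#` inside an attribute field does not cause a feature row to be dropped.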
Fixed effect, random, or both in generalized additive mixed model (GAMM)? – rstudio
Hello, I need help identifying if a predictor variable needs to be a fixed effect, a random effect, or if both are necessary. I understand a fixed effect to mean “a variable of interest” and a random effect to be something that represents a structural component, like a sample design…
r – Problems with Rprofile, dont load at startup
I have R (4.2.2) and RStudio (2023.06.2) installed on a macOS system. Before I updated RStudio I had no problem, but with these versions I don’t know how to load .Rprofile at RStudio startup. The default working directory for RStudio is ~/R and my ~/.Rprofile is at home file.path(Sys.getenv("HOME"),…
ACSA reflects on W Cape socio-economic development as it celebrates 30 years
Airports Company South Africa’s (ACSA) contribution to the economic growth and development of South Africa extends beyond the numbers, telling a story of a key enabler of economic growth, transformation and socio-economic development. Elelwani Tshikovhi As ACSA celebrates its 30th anniversary this year, it is fitting to note its massive…
[main-armv7-default][biology/viennarna] Failed for viennarna-2.6.3 in build
You are receiving this mail as a port that you maintain is failing to build on the FreeBSD package build server. Please investigate the failure and submit a PR to fix build. Maintainer: y…@freebsd.org Log URL: pkg-status.freebsd.org/ampere2/data/main-armv7-default/p08943441f26e_s6e92fc9309/logs/viennarna-2.6.3.log Build URL: pkg-status.freebsd.org/ampere2/build.html?mastername=main-armv7-default&build=p08943441f26e_s6e92fc9309 Log: =>> Building biology/viennarna build started at Mon Oct 16…
Edit fasta header for TSA submission
Hi everyone, I’ve been trying to edit the headers of my fasta file, which I intend to upload to NCBI TSA. I can’t seem to successfully upload my file to TSA and if I’m not mistaken it could be because of the header format….
Help with interpreting GO enrichment results using goseq – usegalaxy.org support
ding66 October 11, 2023, 6:35pm 1 Hi there! I’m new to RNA-Seq results analysis. By following the tutorial “Reference-based RNA-Seq data analysis”, I have so far completed mapping, annotation, and differential expression analysis. Now the next step that I want to do is gene enrichment…
18S taxonomy assignment SILVA database formatting
Hi Bioinformatic community, I would like to classify 18S data (V7) of Fungi with assignTaxonomy from dada2. For that I downloaded SILVA_132_SSURef_tax_silva.fasta.gz from the SILVA website and need to format it, which I do with a Linux command-line one-liner. But some species in the database have a different number…
When nature turns deadly: A look at Abrin
What is abrin? In certain parts of Asia and Australia, grows a flowering plant of the bean family Fabaceae, known as Abrus precatorius, also more commonly as jequirity bean or rosary pea [1]. It’s a delicate, perennial climber but extremely invasive and classified as a weed in several countries. However, what…
linux – Output file name not being correctly named for Bash
The file name format is like this: 4digitnumber_S_R1_001.fastq.gz. To give you an example 3145_S2_R1_001.fastq.gz I’m trying to have my output file name not include _R1_001 part but it keeps including the full file name. I am not sure why it’s not giving me the correct output file name format that…
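For the naming problem in this excerpt, one way to derive an output name without the `_R1_001` part is to strip that suffix with a regular expression. The sketch below is in Python rather than Bash and is only an illustration; the function name `sample_name` is hypothetical, and it assumes the read tag is `_R1_001` or `_R2_001` directly before `.fastq.gz`.

```python
import re


def sample_name(filename):
    """Strip a trailing '_R1_001.fastq.gz' or '_R2_001.fastq.gz' from a file name."""
    return re.sub(r"_R[12]_001\.fastq\.gz$", "", filename)


print(sample_name("3145_S2_R1_001.fastq.gz"))
```

Whatever extension the output needs can then be appended to the returned stem. Names that do not match the pattern come back unchanged.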
Types of blood tests a doctor may order
No single test can confirm that a person has lupus, but a doctor may use several types of blood tests to reach a lupus diagnosis. These include tests to look for antinuclear antibodies (ANA) in the blood. Systemic lupus erythematosus (SLE), or lupus, is a chronic autoimmune disease resulting from…
[main-arm64-default][biology/viennarna] Failed for viennarna-2.6.3 in build
You are receiving this mail as a port that you maintain is failing to build on the FreeBSD package build server. Please investigate the failure and submit a PR to fix build. Maintainer: y…@freebsd.org Log URL: pkg-status.freebsd.org/ampere2/data/main-arm64-default/p85ccf094713a_sc584bb9cac1/logs/viennarna-2.6.3.log Build URL: pkg-status.freebsd.org/ampere2/build.html?mastername=main-arm64-default&build=p85ccf094713a_sc584bb9cac1 Log: =>> Building biology/viennarna build started at Sat Sep 23…
Salmon index not progressing
Hi! I am having an issue with salmon index formation, since I cannot use STAR due to a limited amount of RAM (as per my recent post). I tried to follow this tutorial on how to create a decoy-aware transcriptome as well as doing directly this and I…
Starting Server for Non-Default Users in JupyterHub: 500 Internal Server Error – JupyterHub
I am encountering an issue with starting a server for non-default users in JupyterHub. When attempting to start a server for a user named “mahdi” (or any other non-default user), I receive the following error message in the JupyterHub container logs: [I 2023-09-20 07:40:15.461 JupyterHub provider:659] Creating oauth client jupyterhub-user-mahdi…
failed reading from temporary file
STAR error EXITING because of FATAL ERROR: failed reading from temporary file. Hello all, I’m attempting to use STAR to map some RNA-seq data, but keep getting some sort of error. Here is the command I used: for i in `ls *_clean.fastq | sed 's/_clean.fastq//'`; do STAR --runThreadN 20…
Error executing process > consensus_classification in NanoCLUST
Hello, I’m trying to run NanoCLUST with my 16S sequence data. I run it on a Linux Ubuntu 20.04 machine. When I use the command: nextflow run main.nf -profile docker --reads "*.fastq" --db "db/16S_ribosomal_RNA" --tax "db/taxdb/" I get the following terminal output:…
FASTA file of fixed length
Hi, I have a FASTA file like this: >1 TCAAGAGGGGTGAATGTGTTTCGCATGCACAAGGGACAGGAGTCT >2 ATCAGAGCTGGTGGGGTGGAGAGACAGAAACAAGTGGGAGAAGGT >3 TTATACCTACCTTATAGATAAGGAAATTGAAGCTTATAGAGTTTA >4 ATTTTTCCTTATGATACTCTATTGCCTCTCCATGGATAAAGACAG >5 AAACTCCTGACCTCAGGTGATCCACCTGCCTCGGCCTCCCAAAGT >6 TGCACACCTTCAGAACTGTGAACCAAATAAACCTCTCTTCTTTAAAATTATTCATCCTCT GGTATTCCTTTATAACAA >7 CTCTTGATGTCATTTCACTTCGGATTCTTCTTTAGAAAACTTGGA Every sequence has a fixed length of 45 nt, but some sequences, like sequence no. 6, are longer. There are some more…
Corrupted FASTq files with missing “+” under some sequences.
Hi, I have been trying to recover corrupted fastq files. I had a decompression error: invalid compressed data--crc error. I got around the crc error by using gzrecover and then used seqkit sana to fix sequence inconsistencies. Now, the…
How to remove fasta headers in a multifasta file and write file name as a fasta header?
I have a fasta file named 119XCA.fasta, as shown below: >cellulase ATGCTA >gyrase TGATGCT >16s TAGTATG I need to remove all the fasta headers, keep the sequences one by one and need to…
How to limit fasta header to 40 characters?
I have FASTA headers with long annotation names, but the program it will be run through for proteomics has a limit of roughly 40 characters or else it will crash. The file starts off like this: >TRINITY_DN0_c0_g1_i1.p1 – RecName: Full=E3 ubiquitin-protein…
Cannabis seeds on the move: Exploring shipping policies and trends across seed banks
This article delves deep into the current shipping policies and trends among these seed banks, providing a panoramic understanding of the seed distribution landscape. In recent years, the cannabis industry has burgeoned, primarily due to changing regulations, scientific research, and shifting societal attitudes. One subset of the cannabis industry that…
efetch from NCBI E-utilities returns “curl error s 400 & 500” and takes a very long time
I run this command to download ~4,000 gene sequences for the invA gene for taxonomy# 28901. It works fine for smaller datasets, but … takes a very long time and never finishes for…
Installing R and RStudio on Linux for Data Analysis
R is a versatile programming language and environment designed specifically for data analysis and statistical computing, making it an incredible choice for data-driven work. R has gained significant popularity across the data science, data analysis, data visualization, and statistical communities due to its extensive capabilities and active user community. In…
Rstudio can’t find CMAKE even though it is in /usr/local/bin – Package Management
I have been attempting to install a package that requires cmake. However, RStudio can’t seem to find it for some reason: R version 4.3.1 (2023-06-16) -- "Beagle Scouts" Copyright (C) 2023 The R Foundation for Statistical Computing Platform: x86_64-apple-darwin22.4.0 (64-bit) R is free software and comes with ABSOLUTELY NO WARRANTY….
PacBio 16S pipeline
QIIME2 analysis pipeline 0. activate conda environment conda activate qiime2-2019.10 1. flip sequences to the same direction mkdir raw_data_rc/ mkdir raw_data_cat/ parallel -j 8 'seqtk seq -r {} > raw_data_rc/{/.}_rc.fastq' ::: rawdata/*.fastq parallel -j 8 --link 'cat {1} {2} > raw_data_cat/{1/.}_cat.fastq' ::: rawdata/*.fastq ::: raw_data_rc/*_rc.fastq 2. trim primers mkdir trimmed_reads/…
bamCoverage fails in bam files with large number of small contigs in headers
Hi, I plan to use bamCoverage from deepTools to get bw files, but it looks like the thread is dead as it never finishes (no errors). I have hundreds of thousands of short unlocalized/random contigs in…
vcf file chr notation
I have a single VCF file named 'ALL.wgs.shapeit2_integrated_snvindels_v2a.GRCh38.27022019.sites.vcf.gz'. The issue at hand is that the file uses different chromosomal notation and lacks the 'chr' prefix. Like this: ##fileformat=VCFv4.3 ##FILTER=<ID=PASS,Description="All filters passed"> ##fileDate=11032019_15h52m43s ##source=IGSRpipeline ##reference=ftp://ftp.1000genomes.ebi.ac.uk/vol1/ftp/technical/reference/GRCh38_reference_genome/GRCh38_full_analysis_set_plus_decoy_hla.fa ##contig=<ID=1> ##contig=<ID=2> ##contig=<ID=3> ##contig=<ID=4> ##contig=<ID=5> ##contig=<ID=6> ##contig=<ID=7> ##contig=<ID=8> ##contig=<ID=9> ##contig=<ID=10> ##contig=<ID=11> ##contig=<ID=12> ##contig=<ID=13> ##contig=<ID=14> ##contig=<ID=15> ##contig=<ID=16>…
Apply Plink2 Score – Error Invalid chromosome code
I am trying to run a calculator tool for polygenic scores called pgsc_calc (The Polygenic Score Catalog Calculator pipeline) that runs with nextflow and docker in Linux, with my own VCF file. It fails at step 8: process > PGSCATALOG_PGSCALC:PGSCALC:APPLY_SCORE:PLINK2_SCORE **ERROR ~ Error executing process > 'PGSCATALOG_PGSCALC:PGSCALC:APPLY_SCORE:PLINK2_SCORE (NG13RY1WV.vcf.gz chromosome ALL effect…
Invasion success of a Lessepsian symbiont-bearing foraminifera linked to high dispersal ability, preadaptation and suppression of sexual reproduction
Simberloff, D. et al. Impacts of biological invasions: What’s what and the way forward. Trends Ecol. Evol. 28, 58–66 (2013). Article PubMed Google Scholar Bellard, C., Cassey, P. & Blackburn, T. M. Alien species as a driver of recent extinctions. Biol. Lett. doi.org/10.1098/rsbl.2015.0623 (2016). Article PubMed PubMed Central Google Scholar …
Bash script with command line options gets stuck and doesn’t set default values for variables
I am pretty green when it comes to bash scripts and completely new to command line functionality in bash. I tried my hand at a script which is supposed to be useable both with command line arguments as well as manual setting of variable values, if the user prefers to…
transcripts missing from tx2gene
How can I know the reference transcriptome used in the pre-computed index? You can download the fasta transcriptome file archive (fasta, .fai index and chrome.sizes) used for that index here: refgenomes.databio.org/v3/assets/archive/2230c535660fb4774114bfa966a62f823fdb6d21acf138d4/fasta_txome?tag=default This should get you the table you need $ grep "^>E" 2230c535660fb4774114bfa966a62f823fdb6d21acf138d4.fa |…
Bioinformatics Analyst III, Spatial Biology, CGR job with Frederick National Laboratory for Cancer Research
Bioinformatics Analyst III, Spatial Biology, CGR Job ID: req3667 Employee Type: exempt full-time Division: Clinical Research Program Facility: Rockville: 9615 MedCtrDr Location: 9615 Medical Center Drive, Rockville, MD 20850 USA The Frederick National Laboratory is a Federally Funded Research and Development Center (FFRDC) sponsored by the National Cancer Institute (NCI) and operated by Leidos…
Useful Bash Commands To Handle Fasta Files
You will probably get a lot of different answers because there are many ways to parse fasta files with Bash and tools like grep, awk and sed. Here are some suggestions. To extract ids, just use the following: grep -o -E "^>\w+" file.fasta | tr -d ">" A useful step…
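The id-extraction one-liner quoted in this excerpt keeps the run of word characters that follows `>` at the start of each header. A hedged Python equivalent is sketched below; the function name `fasta_ids` is hypothetical, and like `\w+` in the grep pattern it stops at the first character that is not a letter, digit, or underscore.

```python
import re

# '>' at the start of the line followed by one or more word characters
HEADER_ID = re.compile(r">(\w+)")


def fasta_ids(lines):
    """Collect the leading word-character id from every FASTA header line."""
    ids = []
    for line in lines:
        match = HEADER_ID.match(line)
        if match:
            ids.append(match.group(1))
    return ids


print(fasta_ids([">seq1 some description", "ACGT", ">seq_2", "TTGA"]))
```

Ids containing dots or pipes (for example accession.version names) would be cut short by this pattern, just as they are by the grep version.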
BEDOPS gtf2bed conversion error with Ensembl GTF
You can generate BED files (from e.g. GTF file of the Ensembl release) by executing the following command in Linux Shell: # For genes grep -P "\tgene\t" your_ensembl.gtf | cut -f1,4,5,7,9 | \ sed 's/[[:space:]]/\t/g' | sed 's/[;|"]//g' | \ awk -F $'\t' 'BEGIN { OFS=FS } { print $1,$2-1,$3,$6,".",$4,$10,$12,$14…
HMM gets zero or 1 hits when many more expected
Hi all, My ultimate goal is to understand the phylogeny of a set of restriction-modification enzymes among certain genomes. For this, I have done the following: Downloaded all RM genes DNA sequences into psych_rm_genes.fna from REBASE Cleaned rebase file…
[main-amd64-default][biology/viennarna] Failed for viennarna-2.5.1 in build
You are receiving this mail as a port that you maintain is failing to build on the FreeBSD package build server. Please investigate the failure and submit a PR to fix build. Maintainer: y…@freebsd.org Log URL: pkg-status.freebsd.org/beefy18/data/main-amd64-default/p8fb94260154e_s510fd83138/logs/viennarna-2.5.1.log Build URL: pkg-status.freebsd.org/beefy18/build.html?mastername=main-amd64-default&build=p8fb94260154e_s510fd83138 Log: =>> Building biology/viennarna build started at Sat Jul 15…
Jupyterhub: Kernels in different environments not working – Kernels
Hi, I’m not able to get the xeus-cling and R kernels to work in JupyterLab or notebook started from JupyterHub. My system: Ubuntu 22.04 with miniconda3 installed in /opt. I have an environment for jupyterhub (jupyterhubenv), one for xeus-cling (xeusclingenv) and one for R (R-env). It works when I install “nb_conda_kernels” and start the…
Mapping to mtDNA and then align the unmapped
Hello all, I have aligned my samples against the mitochondrion genome of the species I work with. My idea was that after this I would keep the unmapped ones (which would be the nuclear reads), and then align these against the…
Bug#1040953: bookworm-pu: package sra-sdk/3.0.3+dfsg-6~deb12u1
Package: release.debian.org Severity: normal Tags: bookworm User: release.debian….@packages.debian.org Usertags: pu X-Debbugs-Cc: sra-…@packages.debian.org Control: affects -1 + src:sra-sdk [ Reason ] Per #1039621, the new libngs-jni package accidentally wound up with bad content (unexpanded variables in the key symlink’s source *and* target) that rendered it useless. [ Impact ] This package’s…
Subject:[QIIME2.2023.5] Need help with Qiime2 installation: ResolvePackageNotFound error – Technical Support
Subject: Need help with Qiime2 installation: ResolvePackageNotFound error Dear Qiime2 Community, I hope this message finds you well. I am currently facing an issue during the installation of Qiime2 and would greatly appreciate your assistance in resolving it. During the installation process, after following the Qiime2 instructions, I encountered the…
Extract information from files
Hello, how are you? I have several table files with the following information, separated by tabs: GH5_8 Bacteria Actinoalloteichus fjordicus ADI127-7 APU14662.1 GH5_8 Bacteria Actinoalloteichus hoggarensis DSM 45943 ASO20105.1 GH5_8 Bacteria Actinoalloteichus sp. AHMU CJ021 AUS77477.1 GH5_8 Bacteria Actinoalloteichus sp. GBA129-24 APU20630.1 GH5_8 Bacteria Actinobacteria…
Contig labels in BAM off by 1, how do I fix it?
After a lot of hair-pulling over why strelka wouldn’t run on my bam files, I found that 18 contig labels were wrong, and they’re off by one (picture below). I’ve figured out which labels should be changed…
Running RStudio in own/local Galaxy instance – usegalaxy.org support
Hello, I successfully managed to set up my own local Galaxy instance. Now I am trying to integrate the interactive tools, starting with RStudio. After adding it to the tool_conf.xml and starting RStudio in Galaxy, I get the following error: sed: can’t read /etc/services.d/nginx/run: No such file or directory chmod: changing permissions…
(error) qiime feature-classifier fit-classifier-naive-bayes – Technical Support
I wanted to see V3 and V4 regions as well as all other regions in the Classify taxonomy step, so I downloaded the latest version (138.1) of the data from the Download tab on the SILVA homepage and followed the steps below. However, I got an error message in the ‘qiime…
grep – Keeping DNA sequence after changing FASTA header on command line
I have a FASTA header that looks like this: >7c8250ef-c89f-4d42-9d48-12c8fe245fb2 runid=606f271fc97598006ba5a922136a2c304cef75a5 sampleid=Pool12-1 read=19008 ch=301 start_time=2021-07-03T08:48:18Z barcode=barcode01 And I am able to change it to the desired output here: >7c8250ef-c89f-4d42-9d48-12c8fe245fb2_001 Using this command: grep '^>' 001_old.fasta | cut -d ' ' -f 1,8 | sed '/^>/s/$/_001/' > 001_new.fasta However it completely…
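Because `grep '^>'` passes only the header lines down the pipe, the sequence lines never reach the output file, which matches the problem named in this post's title. A minimal Python sketch that rewrites headers while leaving sequence lines in place is shown below; the function name `rename_headers` is hypothetical, and the suffix `_001` is taken from the example in the excerpt.

```python
def rename_headers(lines, suffix):
    """Keep only the first token of each FASTA header and append a suffix.

    Sequence lines are yielded unchanged so no data is lost.
    """
    for line in lines:
        if line.startswith(">"):
            # '>id key=value ...' -> '>id' + suffix
            yield line.split()[0] + suffix
        else:
            yield line


fasta = [
    ">7c8250ef-c89f-4d42-9d48-12c8fe245fb2 runid=606f271f read=19008 barcode=barcode01",
    "ACGTACGT",
]
for out_line in rename_headers(fasta, "_001"):
    print(out_line)
```

In a real run the lines would come from the input file and be written to the new file one by one.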
randomreads.sh only produces reads for chr1 to chr7
I used randomreads.sh from bbmap to generate reads from a fasta file generated with FastaAlternateReferenceMaker from GATK. It seems that no matter which options I choose, the script stops generating reads at chromosome 7 even though the fasta file contains all contigs from…
how to add the sample name to the end read headers
I would need to add the sample name at the end of all the read headers in that fasta sample. For example I have #Sample1 #>read1 #ATGC #Sample2 #>read1 #ATGC Desired output: #Sample1 #>read1/Sample1 #ATGC #Sample2 #> read1/Sample2…
write output files with default name
I have prepared a shell script file and for the output I want to have a default name, but something is wrong. Can anybody revise this command? calldir=/profile/variant/input/ base=$(echo $sam | sed "s/.sam.*/_sorted/g") sam=/profile/variant/input/s5000W_b2.bam --output $calldir/$(basename $base)_series_call.vcf.gz But in the output the file…
[slurm-users] Need to free up memory for running more than one job on a node
Hello, (This is my first time submitting a question to the list) We have a test-HPC with 1 login node and 2 computer nodes. When we submit 90 jobs onto the test-HPC, we can only run one job per node. We seem to be allocating all memory to the one…
Getting sequences from fastq file using Grep command
I have been trying to get a sequence (e.g. GCGAGCCCCACATCGCCCCCCCGATTGTAATAAATAA) from a fastq file (file.fastq) I have and output a fq file. I have tried the command: grep -A 2 -B 1 'GCGAGCCCCACATCGCCCCCCCGATTGTAATAAATAA' file.fastq | sed '/--/d' > output.fq I got…
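The grep command in this excerpt pulls one line before and two lines after each match, then deletes the `--` group separators. An alternative that works per record is sketched here in Python, assuming well-formed four-line FASTQ records; the function name `records_with_sequence` is hypothetical.

```python
def records_with_sequence(lines, query):
    """Return all lines of the 4-line FASTQ records whose sequence contains query."""
    hits = []
    # Step through the file one complete record (4 lines) at a time
    for start in range(0, len(lines) - 3, 4):
        record = lines[start:start + 4]
        if query in record[1]:  # record[1] is the sequence line
            hits.extend(record)
    return hits


fastq = [
    "@read1", "AAGCGAGCCTT", "+", "IIIIIIIIIII",
    "@read2", "TTTTTTTTTTT", "+", "IIIIIIIIIII",
]
print(records_with_sequence(fastq, "GCGAGCC"))
```

Matching only against the sequence line means a hit can never come from a header or quality line, and no separator lines need to be removed afterwards.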
Grepping through API payloads with Gron
Introduction If you have spent any time reading some of my older articles, you know I am a fan of jq. In my article on How to Exploit APIs with cURL I showed how to parse API responses with it. I went even further when I showed how to extract…
Metagenomic highlight contrasting elevational pattern of bacteria- and fungi-derived compound decompositions in forest soils
Bomble YJ, Lin CY, Amore A, Wei H, Holwerda EK, Ciesielski PN, Donohoe BS, Decker SR, Lynd LR, Himmel ME (2017) Lignocellulose deconstruction in the biosphere. Curr Opin Chem Biol 41:61–70. doi.org/10.1016/j.cbpa.2017.10.013 Article CAS PubMed Google Scholar Cardenas E, Kranabetter JM, Hope G, Maas KR, Hallam S, Mohn WW (2015)…
Computational Scientist/ Spatial Biology – Frederick National Lab for Cancer Research
The Frederick National Laboratory is a Federally Funded Research and Development Center (FFRDC) sponsored by the National Cancer Institute (NCI) and operated by Leidos Biomedical Research, Inc. The lab addresses some of the most urgent and intractable problems in the biomedical sciences in cancer and AIDS, drug development and first-in-human…
A Beginner’s Guide to Perform Molecular Dynamics Simulation of a Membrane Protein using GROMACS — GROMACS tutorials https://tutorials.gromacs.org documentation
Building the protein-membrane system in CHARMM-GUI We are now ready to embed the protein structure in the membrane in the proper location and orientation and construct the membrane composition we desire. To do this, we utilized the CHARMM-GUI input Generator, a handy web-based tool to generate GROMACS inputs for the…
How to extract haplotype data from phased bcf files
Hello, I have filtered/processed phased bcf files from wgs. I would like to extract the haplotype data per sample, so that I have a tab delim file which looks like this: Sample Chr Pos hap1 hap2 AW23 chr1 1234 A…
Docker Error while running nf-core/rnaseq pipeline
I have run an nf-core pipeline with the following parameters: nextflow run nf-core/rnaseq -r 3.10.1 --input samplesheet.csv --outdir outputlatest --fasta chr22_with_ERCC92.fa -profile docker --gtf chr22_with_ERCC92.gtf --skip_multiqc true --skip_dupradar true --skip_stringtie true --aligner star_salmon --pseudo_aligner salmon --max_memory 3.5GB --max_cpus 4 Receiving an error related…
Differentially expressed gene with high log2foldchange by DESeq2; but not meaningful at the individual level
Hi all, I am working with RNA-Seq data from humans (24 cases, 20 controls) to find differentially expressed genes. My RNA-Seq data is unstranded. Here are the commands that I used to align the fastq files: ls *_1P.fastq.gz | parallel --bar -j8 'R2=$(echo {} | sed s/_1/_2/) && out=$(echo {} |…
Trying to Change the Formatting of a Graph in R using ggplot2 – RStudio IDE
I want to modify this output graph so that each Wall and Restored bar is next to each other for each site. The wall and restored are the site types. This is the graph that I would like it to resemble in terms of formatting the bar placements: Here is my…
How to Split 3000 WGS CRAM files into 1Mbp length chunks
Hello, I have 3000 WGS CRAM files and I want to split them into 1Mbp chunks. I want to split with exact genomic coordinate locations, e.g. starting from 1 to 1000000bp, 1000001bp to 2000000bp, 2000001bp to 3000000 etc….
Error in Adding 1000Genomes Ancestral Allele info: Using VCF tools fill-aa
Hi, I am trying to add ancestral alleles to 1000 Genomes Phase3 VCF files. I have used the “human_ancestor_GRCh37_e59.tar.bz2” files as the ancestral allele input. The steps I have used are: cat human_ancestor_3.fa | sed 's,^>.*,>1,' | bgzip…
angular – Cannot get the data from openAPI service – returns undefined
I have this service that I got from processing an openAPI spec file (.yaml) and the thing is I do not get the filtered data I want. Here is the method from my service : /** * Search cases with filters * @param status status of cases to return (all…
Edit and re-head BAM file
Hi there, I have a BAM file which needs to be edited and re-headed. I’m aware of how to do so; the problem is that for some reason the sed command I’m using does not catch the sequence I have to remove… Below,…
Unable to create environment – Technical Support
Tried to create an environment using Conda and was not able to do so. Have copy pasted the message below. Would be grateful to know what the issue is and how to resolve the issue. (base) C:\Users\Mathangi Janakiraman>wget data.qiime2.org/distro/core/qiime2-2023.2-py38-linux-conda.yml --2023-05-11 12:54:47-- data.qiime2.org/distro/core/qiime2-2023.2-py38-linux-conda.yml Resolving data.qiime2.org (data.qiime2.org)... 54.200.1.12 Connecting to data.qiime2.org (data.qiime2.org)|54.200.1.12|:443... connected. ERROR: cannot verify…
No differentially expressed genes after multiple testing correction in mice
Hi all, I am working with the RNA-seq data on mice (group A N=3 vs group B N=3). Mice are littermates, of which group A overexpresses a human transgene which I verified. I have had .cram files from mouse…
Convert Accession Numbers in blast HIT output to Full Taxonomy
I have the Hit table output from a BlastWeb search which presents itself basically like this: M_A00619 | XM_034926345.1 | 100.000 M_A00619 | OV754683.1 | 95.588 M_A00619 | OV754677.1 | 95.588 M_A00619 | OV737695.1 | 95.588 I want to…
Changing a fasta header
Hi, I have an annotated fasta file and I want to add, in the first position after >, the word that follows 'Similar to': >_Anouracaudifer_00017283-RA transcript Name:"Similar to Chid1 Chitinase domain-containing protein 1 (Rattus norvegicus OX=10116)" offset:0 AED:0.30 eAED:0.30 QI:0|0|0|1|1|1|12|0|393 ATGAAGGCGCTCCTGCATGTGCTCTGGCTCACTCTGGCCTGCGGCTCTGCTCACACCACCCTGTCGAAGTCGGATGCCAAGAAGTCTGCCTCCAAGACACTGCAGGAGAAGACTCAGCTCTCAGAGACACCTGTGCAGGACCGGGGTCTGGTGGTAACAGACCCCCGAGCCGAGGACG I want the output…
Changing a fasta header
Hi, I have an annotated fasta file and I want to add, in the first position after >, the word that follows 'Similar to': _Anouracaudifer_00017283-RA transcript Name:"Similar to Chid1 Chitinase domain-containing protein 1 (Rattus norvegicus OX=10116)" offset:0 AED:0.30 eAED:0.30 QI:0|0|0|1|1|1|12|0|393 ATGAAGGCGCTCCTGCATGTGCTCTGGCTCACTCTGGCCTGCGGCTCTGCTCACACCACCCTGTCGAAGTCGGATGCCAAGAAGTCTGCCTCCAAGACACTGCAGGAGAAGACTCAGCTCTCAGAGACACCTGTGCAGGACCGGGGTCTGGTGGTAACAGACCCCCGAGCCGAGGACG I want the output…
Error while running nf-core/rnaseq pipeline
Hello guys! I am trying to run the nf-core/rnaseq pipeline with the following parameters: nextflow run nf-core/rnaseq -r 3.10.1 --input samplesheet.csv --outdir output --fasta chr22_with_ERCC92.fa -profile docker --gtf chr22_with_ERCC92.gtf --max_memory 200GB I keep getting a persistent error: WARN: Got an interrupted exception while taking…
hpc – Slurm – Execute a lot of serial jobs parallel
Batch script to run many serial jobs in parallel on an HPC with Slurm. I want to run a large number of independent serial jobs in parallel using Slurm. However, I run into the limit of 100 jobs that a user can submit. Therefore only 100 jobs are processed simultaneously…
Answer: R scripting
Your code works for me, as long as you (1) fix the column heading to make sure Location is capitalized, and (2) make sure your data frame is actually 3 columns. If it is only a single column, I get the error you get. Your data should be in 3…
find and replace between two files
Hi all, I know there’s a way to do this within Unix, but I cannot figure out how to do it with the functions that I know (grep, sed, awk, cut, paste). I am dealing with output from blast, so I thought I would try to see if anyone in…
Help in replicating LDSC heritability estimates
Hi, I am trying to replicate the heritability estimates based on the insomnia GWAS summary statistics using LDSC. However, I have encountered a problem as my estimates seem to be only about half of the original estimates listed in Table S1. Despite my efforts to locate the error, I have…
Change sequence ID in fastq file generated by bcftools mpileup
Hi everybody! I’m currently working on an HHV8 genetic study and I’m facing an issue with my bcftools command. I want to generate consensus sequences from BAM files using the bcftools mpileup command. However, all ID get…
GFF/GTF file error / featureCounts
Hi all, I am trying to generate a count matrix for sorted BAM files, using featureCounts on Linux. I have a non-model organism (bacteria), so I generated the annotation file using both PROKKA and RAST. I used all of the following files in featureCounts: PROKKA.gff, RAST.gff, RAST.gtf, and a gffread-converted PROKKA.gtf file. But still facing…
Editing fasta headers
Yes, I can help you with this. You can use a scripting language like Python to automate this task. Here is a Python script that you can use to rename the headers of your fasta files: In this code, you need to replace “/path/to/fasta/files/” with the path to the directory…
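The code from that answer is not reproduced in this excerpt. As a minimal sketch of the kind of script it describes, the following assumes one possible renaming rule: each header is prefixed with the base name of the file it came from. That rule, the function names `rename_headers` and `rename_directory`, and the `.fasta` suffix are illustrative assumptions, not the original answer's code.

```python
from pathlib import Path


def rename_headers(fasta_text: str, prefix: str) -> str:
    """Return fasta_text with every header line prefixed by `prefix` and an underscore.

    The prefixing rule is an assumption made for illustration; the excerpt
    does not say which renaming rule the original answer applied.
    """
    out = []
    for line in fasta_text.splitlines():
        if line.startswith(">"):
            # Header line: keep the '>' and insert the prefix right after it.
            out.append(f">{prefix}_{line[1:]}")
        else:
            # Sequence line: leave untouched.
            out.append(line)
    return "".join(line + "\n" for line in out)


def rename_directory(fasta_dir: str, suffix: str = ".fasta") -> None:
    """Rewrite every file ending in `suffix` inside fasta_dir, in place.

    Hypothetical helper: uses each file's stem (name without extension)
    as the header prefix.
    """
    for path in sorted(Path(fasta_dir).glob(f"*{suffix}")):
        path.write_text(rename_headers(path.read_text(), path.stem))
```

Calling `rename_directory("/path/to/fasta/files/")` would rewrite the files in that directory in place, so it is worth trying `rename_headers` on a copy of one file first.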
1000 genomes hg38 with dbSNP rsid
Hi, Anyone know where I can download the latest version of 1000 Genomes, on build hg38, in VCF format (or PLINK format), that ALSO contains the dbSNP RSid in the VCF ID field? I looked at the IGSR website, dbSNP, UCSC, etc. So…
Mapping paired end reads with ngm and samtools, using prefixes and suffixes for creating vcf eventually
So, I have problems with a script for mapping and with creating sam and bam files to eventually get to a vcf. My input files look like this: 262 files, paired reads, with…
Nextflow memory issues custom config -c
Hi all, I am trying to run nextflow on my laptop: nextflow run nf-core/rnaseq \ --input samplesheet.csv \ --genome mm10 \ -profile docker I am having issues with memory: Error executing process > 'NFCORE_RNASEQ:RNASEQ:FASTQC_UMITOOLS_TRIMGALORE:FASTQC (KO_3)' Caused by: Process requirement exceed available memory –…
ggplot2 – ggplot: “No non-missing arguments to min/max; returning Inf”
I’m attempting to recreate this plot (my version: lat/lon by year), but keep getting these warnings after running the ggplot code: sms2 |> mutate(fCYR = factor(CYR)) |> ggplot(aes(x = Longitude, y = Latitude, fill = est, group = fCYR)) + geom_raster(aes(x = Longitude, y = Latitude, fill = est, group…
prefix extraction and preparation for mapping and variant calling
hello humans, I am struggling with a bash script that should actually work as far as I can see. I need to extract prefixes of 262 files in a directory that contains reads. I will map them for later variance…
Xenocell – Error in classify reads
Hello everyone, I’m trying to run Xenocell on my dataset. I have some problems executing the “classify reads” step. The command terminates after starting the classification (“terminate called after throwing an instance of ‘std::ios_base::failure’”), and I don’t know how to fix the error. Any help would be appreciated. Thank you!…
docker – Permissions error running NextFlow RNAseq test pipeline
I’ve been trying to run a minimal example of the NextFlow RNAseq pipeline, like so: nextflow run nf-core/rnaseq -r 3.10.0 -profile test,docker --outdir /home/kai/RNASeq/rnaseq_test/test_output However, this appears to return the error below: Error executing process > 'NFCORE_RNASEQ:RNASEQ:PREPARE_GENOME:GTF2BED (genome_gfp.gtf)' Caused by: Process `NFCORE_RNASEQ:RNASEQ:PREPARE_GENOME:GTF2BED (genome_gfp.gtf)` terminated with an error exit status (1)…
Realigning BAM files to new reference
Hi, I am looking to create a panel of normals for a somatic variant caller. For normals, I have been provided with a set of WES bam files that have been preprocessed according to GATK best practices. However, they have been aligned to a different reference genome than my case samples…
Failure working with the tmp directory
Nextflow Gatk DepthOfCoverage: Failure working with the tmp directory. I have a Nextflow workflow in which the DepthOfCoverage process fails to work with the defined tmp directory (--tmp-dir tmp): process pf_read_depth { tag "tag" scratch true publishDir … input: tuple val(pair_id), path(pf_bam) path refdir output: file("final_${pair_id}.tsv") script: """ samtools index…
Introducing Twilio’s OpenAPI Specification GA
Today, we are thrilled to share the news that we have officially open-sourced the OpenAPI specification for every Twilio API. As a commitment to supporting and streamlining the development process for our users, we have long provided helper libraries and tooling in various popular programming languages and environments. With this…
Command line training – genotoul-bioinfo
The GenoToul bioinformatics platform, Sigenae and SaAB (MIAT) offer a catalog of training sessions. If you need bioinformatics training on tools which are not covered in the existing catalog, please feel free to contact us (please add “Request for training” in the subject of your request). For example we have…
Alignment File Processing | Variant Analysis
Learning objectives: differentiate between query-sorted and coordinate-sorted alignment files; describe and remove duplicate reads; process a raw SAM file for input into a BAM for GATK. The processing of the alignment files (SAM/BAM files) can be done either with samtools or Picard and they are for the most part interchangeable…
PhD Position to Develop Machine Learning Methods for Microbiome Analysis
Looking for a highly motivated PhD student for Computational Biology research, with an algorithm development focus. The Ecological and Evolutionary Signal-processing (EESI) and Informatics lab is doing a restart from the pandemic and will be composed of a dynamic,…
HISAT2 paired end multiple files loop error
Hi, I got stuck running hisat2 in a loop. My input files are here, and here is my loop code: for f in `ls -1 *_1_fp.fastq.gz | sed 's/_1_fp.fastq.gz//' ` do hisat2 -rna-strandness RF -x GRCm39 -1 ${f}_1_fp.fastq.gz -2 ${f}_2_rp.fastq.gz 2> ${f}.log|…