Tag: GREP
Trying to edit VCF file
Hi, I’ve been trying to take some samples out of a file, but it appears it has only taken some of the information out. When I tried to run some R code that works for all the samples, it gave me an…
Filtering mitochondrial reads from ATAC-Seq aligned reads- what to do with reads that have MT in RNEXT field
Hi all, I am trying to filter mitochondrial reads from my ATAC-seq data after trimming with Trimmomatic and then aligning with Bowtie2. After searching through many pipelines, I have found 2 ways that people often do this (both using inputs that are sorted and indexed BAM files): 1) with samtools…
scripts/generate_errors.pl – third_party/github.com/ARMmbed/mbedtls – Git at Google
#!/usr/bin/env perl # Generate error.c # # Usage: ./generate_errors.pl or scripts/generate_errors.pl without arguments, # or generate_errors.pl include_dir data_dir error_file use strict; my ($include_dir, $data_dir, $error_file); if( @ARGV ) { die "Invalid number of arguments" if scalar @ARGV != 3; ($include_dir, $data_dir, $error_file) = @ARGV; -d $include_dir or die "No such…
scripts/ecc-heap.sh – third_party/github.com/ARMmbed/mbedtls – Git at Google
#!/bin/sh # Measure heap usage (and performance) of ECC operations with various values of # the relevant tunable compile-time parameters. # # Usage (preferably on a 32-bit platform): # cmake -D CMAKE_BUILD_TYPE=Release . # scripts/ecc-heap.sh | tee ecc-heap.log set -eu CONFIG_H='include/mbedtls/config.h' if [ -r $CONFIG_H ]; then :; else echo…
What Are The Most Common Stupid Mistakes In Bioinformatics?
While I of course never have stupid mistakes…ahem…I have many “friends” who: forget to check both strands; generate random genomic sites without avoiding masked (NNN) gaps; confuse genome freezes and even species. But I’m sure there are some other very…
VCF header line counting
Hello happy bioinformaticians 🙂 It may be a very simple question, but how can I count the lines (rows) in the header of a VCF? It can be done manually, but I want an accurate result. Thanks, BG
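One way to answer the question above is to count the leading lines that start with `#`. This is a minimal sketch, assuming an uncompressed text VCF whose header comes before all records; the function name `count_vcf_header_lines` and the example lines are illustrative only.

```python
from typing import Iterable


def count_vcf_header_lines(lines: Iterable[str]) -> int:
    """Count the leading lines that start with '#' (## meta lines plus #CHROM)."""
    count = 0
    for line in lines:
        if not line.startswith("#"):
            break  # first data record reached, so the header is over
        count += 1
    return count


# Tiny made-up VCF used only to demonstrate the function.
example_vcf = [
    "##fileformat=VCFv4.2\n",
    "##source=example\n",
    "#CHROM\tPOS\tID\tREF\tALT\tQUAL\tFILTER\tINFO\n",
    "1\t100\t.\tA\tG\t.\t.\t.\n",
]
print(count_vcf_header_lines(example_vcf))
```

For a real file, an open file handle can be passed instead of the list, since the function only iterates over lines.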
[slurm-users] sbatch mem-per-gpu and gres interaction
Hello everybody, I am observing an interaction between the --mem-per-gpu, --cpus-per-gpu and --gres settings in sbatch which I do not understand. Basically, if the job is submitted with --gres=gpu:2 the --mem-per-gpu and --cpus-per-gpu settings appear to be observed. If the job is submitted with --gres=gpu:a100:2 the settings appear to be ignored…
Segmentation fault Biopython pairwise alignment
Hi everybody! I’m working to create my own pairwise sequence alignment program in Python. I use the pairwise2.align command from Biopython. When I use it with small sequences it works. I put the code below (2 for a match, -2 for…
Upgrade to PyTorch 2.0 – DEV Community
Why Upgrade? Upgrade Objectives: Python ≥ 3.8, ≤ 3.11; CUDA ≥ 11.7.0; CUDNN ≥ 8.5.0.96; PyTorch ≥ 2.0.0. “We expect that with PyTorch 2, people will change the way they use PyTorch day-to-day” “Data scientists will be able to do with PyTorch 2.x the same things that they did with 1.x,…
grep value from html file
I have 200 html files that contain information such as Filename, Filetype, Total Sequences etc. Please see the attached screenshot. I need to grep the Filename and Total Sequences from the Value column (in this screenshot I need IGM17-B_S162_read_1.fastq and the value 9237623) and…
Problems with CP2K+PLUMED build
After successfully building the local.psmp architecture of CP2K, I tried to install CP2K+PLUMED. This failed, as I describe below. 1) I built PLUMED from source without errors, using the standard procedure: ./configure --prefix=/storage/home/stm9/group/SOFTWARE/plumed-2.8.2/exe make make install 2) I created an architecture file, local_PLUMED.psmp, by modifying local.psmp,…
r – First I had an error with GLIBCXX_3.4.30 and now I can’t create any more conda environments
I was using R in RStudio under a conda environment with various bioconductor packages. But suddenly I ran into this error when I tried to load a package: ImportError: /home/user/anaconda3/envs/dmcgb/bin/../lib/libstdc++.so.6: version `GLIBCXX_3.4.30' not found (required by /lib/x86_64-linux-gnu/libLLVM-13.so.1) It is not the first time that I have this error so I…
Subset FASTA file for taxonomy (phylum name) in R
Dear all, I would like to subset a FASTA file so that I get the sequences belonging to a certain phylum (in my case: Nematoda). The headers of the FASTA file start with the phylum name, so I thought this…
[slurm-users] Question about PMIX ERROR messages being emitted by some child of srun process
Hi, So I’m testing the use of Open MPI 5.0.0 pre-release with the Slurm/PMIx setup currently on the NERSC Perlmutter system. First off, if I use the PRRte launch system, I don’t see the issue I’m raising here. But, many NERSC users prefer to use the srun “native” launch…
count protein-coding genes per contig
If you have an annotation file, for example the following GTF from human: 1 havana gene 11869 14409 . + . gene_id "ENSG00000223972"; gene_version "5"; gene_name "DDX11L1"; gene_source "havana"; gene_biotype "transcribed_unprocessed_pseudogene"; 1 havana transcript 11869 14409 . + . gene_id "ENSG00000223972"; gene_version "5";…
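The count described above can be sketched as follows. This is a minimal illustration, assuming an Ensembl-style GTF in which gene features carry a `gene_biotype "protein_coding"` attribute; the function name and the `GENE_A`/`GENE_B` records are made up for the demonstration, while the DDX11L1 line follows the excerpt.

```python
from collections import Counter
from typing import Iterable


def count_protein_coding_genes(gtf_lines: Iterable[str]) -> Counter:
    """Return a Counter mapping contig name -> number of protein-coding genes."""
    counts: Counter = Counter()
    for line in gtf_lines:
        if line.startswith("#"):
            continue  # skip GTF comment/header lines
        fields = line.rstrip("\n").split("\t")
        # A GTF record has 9 tab-separated columns; column 3 is the feature type.
        if len(fields) < 9 or fields[2] != "gene":
            continue
        if 'gene_biotype "protein_coding"' in fields[8]:
            counts[fields[0]] += 1
    return counts


gtf_example = [
    '1\thavana\tgene\t11869\t14409\t.\t+\t.\tgene_id "ENSG00000223972"; '
    'gene_biotype "transcribed_unprocessed_pseudogene";\n',
    '1\texample\tgene\t20000\t21000\t.\t+\t.\tgene_id "GENE_A"; gene_biotype "protein_coding";\n',
    '1\texample\ttranscript\t20000\t21000\t.\t+\t.\tgene_id "GENE_A"; gene_biotype "protein_coding";\n',
    '2\texample\tgene\t500\t900\t.\t-\t.\tgene_id "GENE_B"; gene_biotype "protein_coding";\n',
]
gene_counts = count_protein_coding_genes(gtf_example)
for contig, n in sorted(gene_counts.items()):
    print(contig, n)
```

Restricting to `gene` features avoids counting the same gene once per transcript or exon.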
Get protein information from ensemblbacteria using interpro
So I am trying to access protein information using InterPro IDs on ensemblbacteria. I have written MySQL code in R; however, I can’t quite figure out how to get protein information from the IDs programmatically. I have put in…
filter reads in BAM having a tag
Does anyone have a simple solution for filtering reads in a BAM/SAM file that have a certain tag? This came up when trying to filter out reads from 10x without a proper CB tag defined (which is causing trouble in downstream analysis tools). I’m surprised…
Filtering rows based on a list if all the entries of the list correspond to a specific row value
Hi, I have a table with millions of columns and hundreds of rows. There are two columns namely Genome and Gene. I also have a list of genes of interest…
Introducing Slurm | Princeton Research Computing – SLURM Examples –
OUTLINE On all of the cluster systems (except Nobel and Tigressdata), users run programs by submitting scripts to the Slurm job scheduler. A Slurm script must do three things: prescribe the resource requirements for the job, set the environment, and specify the work to be carried out in the form of shell commands. Below…
name ‘torch’ is not defined
Are you encountering the Python error “NameError: name 'torch' is not defined” right now? If you don’t have any idea how to troubleshoot this error, then continue reading. In this article, we’ll show you how you can fix the NameError: name 'torch' is not defined in a simple way. Before we start, this…
Solved GETTING THE SAME ERROR IN while running the line of
GETTING THE SAME ERROR IN while running the line of rule provide the activity names on the dots displayed on the scatter plot rules, library(arules) library(arulesViz) library(data.table) library(ggplot2) data <- fread("severeinjury.csv") data <- data[grep("^23", data$Primary.NAICS.Code),] length(data$Nature) length(data$Part.of.Body) length(data$Event) sum(is.na(data$Nature)) sum(is.na(data$Part.of.Body)) sum(is.na(data$Event)) data <- data[complete.cases(data[, c("Nature", "Part.of.Body", "Event")]),] data[, Nature :=…
Problem extracting the species marker genes from metaphlan4 database – StrainPhlAn
Hi, I met some problems extracting species marker genes from the metaphlan4 database in step 2 of the tutorial on identifying strain transmission events. The version information was as follows: $ strainphlan -v Wed May 3 20:42:37 2023: StrainPhlAn version 4.0.3 (24 Oct 2022) When I run: module load Anaconda3 source activate metaphlan4 module load Bowtie2 filename=$(cat ~/WORKSPACE/result/wzy/mother_infant/metaphlan4/SGB_input_transmission.txt…
Modular Docs – Inference Engine Python demo
This is a preview of the Modular Inference Engine. It is not publicly available yet and APIs are subject to change. If you’re interested, please sign up for early access. The Modular Inference Engine is the world’s fastest unified inference engine, designed to run any TensorFlow or PyTorch model on…
Convert Accession Numbers in blast HIT output to Full Taxonomy
I have the Hit table output from a BlastWeb search which presents itself basically like this: M_A00619 | XM_034926345.1 | 100.000 M_A00619 | OV754683.1 | 95.588 M_A00619 | OV754677.1 | 95.588 M_A00619 | OV737695.1 | 95.588 I want to…
How to demultiplex a pooled fastq sequence file and extract each sample sequences
Hello all, I have a pooled sequence file named “ERR1806550_1.fastq.gz” containing single-end sequences. Now, I want to demultiplex this sequence file and extract 37 sample sequences of my interest from it. These are the barcode sequences…
deep learning – PyTorch not detecting AMD GPU although ROCM installed on Ubuntu 20.04 LTS
OS Version: Ubuntu 20.04 LTS PyTorch Version: 2.0 ROCM version: 5.0.2 I installed a fresh copy of Ubuntu 20.04 LTS on my desktop with AMD Radeon RX 5700 XT GPU. Both ROCM and PyTorch installed fine. However, PyTorch is not able to detect GPU. Any pointers here? $ python -c…
[SOLVED] Runtimeerror: couldnt install torch
We often run into errors like “RuntimeError: couldn’t install torch”. It is one of the most common errors that developers may encounter while running their code. The “RuntimeError: couldn’t install torch” error typically occurs because there is a problem with installing the PyTorch library on your system. We will…
tx2gene.txt : transcript-to-gene mapping file
Hi, I am trying to quantify gene counts from transcript abundance (from kallisto, salmon etc.) using tximport. For that I have to create a transcript-to-gene mapping file. How can I create this? I created one from GCF_013265735.2_USDA_OmykA_1.1_rna.fasta (rainbow trout) from NCBI…
rna seq – Why is there antisense sequence in RNAseq data
I’m looking at RNAseq data from CCLE. The data is paired-end. Take the cell line Hs578T and the gene HRAS as an example. The cell line carries a G12D mutation (c.35G>A), so the change in cds is: ggc ggtgtgggca agagtgcgct g – Wildtype CDS gAc ggtgtgggca agagtgcgct g – Mutant…
pangenome – Create a Venn diagram
Hello, I would like to know if you can help me. I want to make a Venn diagram with the presence and absence data (.Rtab) of roary (example fragment, the real list is about 8000 genes): Gene StrainA StrainB StrainC group_633 1 0…
How do i search a FASTA database by sequence in seqkit?
You could do it using seqkit grep or locate but in this case you should use a proper search program like blat instead.
Question regarding the output of BCFtools merge tool for VCF files
Sorry for repeating the question here, as I did not get enough of an answer last time: I have 6 VCF files that contain SNPs only and were produced by GATK. Each VCF represents one individual animal from breed X, so they are biological replicates. I also have another 6 files from…
Illumina HumanHT-12 V3.0 expression beadchip reading data
Edit November 28, 2020: Further reproducible code: A: GPL6883_HumanRef-8_V3_0_R0_11282963_A (illumina expression beadchip) — Most Illumina ‘chip’ studies that I have seen on GEO do not contain the raw data IDAT files. You can start with the tab-delimited file, but will also require the annotation file (contained in the *_RAW.tar file),…
How to find newly submitted accessions in NCBI
Dear all, I want to automate a process to identify newly submitted plant accessions in NCBI. I am scanning the NCBI FTP server, but I have not yet found any address to locate all SRA accessions. ftp.ncbi.nlm.nih.gov/ Does anybody have an…
find and replace between two files
Hi all, I know there’s a way to do this within Unix, but I cannot figure out how to do it with the functions that I know (grep, sed, awk, cut, paste). I am dealing with output from blast, so I thought I would try to see if anyone in…
zap.sh api scan config
I would like to use zap.sh or zap.jar to scan an OpenAPI API, but I have not had much luck yet. (Docker is not an option.) I have a problem with the API scan using the jar (it is also a problem with zap.sh). I have already installed the required…
Either at.20377 doesn’t exist or the content differs.
Source: at Version: 3.2.5-1 Severity: serious Control: tags -1 bookworm-ignore User: debian…@lists.debian.org Usertags: regression Dear maintainer(s), Your package has an autopkgtest, great. However, it fails on arm(64|el|hf) since September 2022 (and slightly longer on s390x). Can you please investigate the situation and fix it? I copied some of the output…
Using chroot and PAM to hide directories from users on an HPC cluster
I recently needed to make the group’s cluster computing environment available to a third party that was not fully trusted, and needed some isolation (most notably user data under /home), but also needed to provide a normal operating environment (including GPU, Infiniband, SLURM job submission, toolchain management, etc.). After thinking…
PyTorch 2.0 distribution that uses cuda only if available?
Hey folks, after upgrading to torch==2.0 yesterday, I have found that I am no longer able to run torch programs if the system doesn’t have CUDA. Here’s my observation on the various distributions: # PyTorch 2.0 for use with CUDA, CUDA libs get installed as pip deps if unavailable on…
error when attempting to install qiime2 2023.2 – Technical Support
dnfarsi (Dominic Farsi) April 10, 2023, 11:14am 1 Hello I appear to be having a similar issue. However I don’t have the (core dumped). I am using Ubuntu 22.04 LTS and installed miniconda and then natively installed qiime2. When using any qiime command I get this illegal instruction. Interestingly it…
r – Problems installing Biostrings. Failing to install GenomeInfoDb
I have seen this issue recur and have tried many options over the last two days, but none yielded a correct installation of any of these packages. I used BiocManager as suggested in other issues and also tried to install from local source; nothing seems to be working. This issue started…
Users of spack-based GROMACS installations beware of possible performance loss! – User discussions
Hi, It has recently come to our attention that default Spack builds of GROMACS use RelWithDebInfo instead of Release, which is the default in our build system. Due to the lower optimization levels in RelWithDebInfo, such builds will run up to 20% slower than Release builds. Therefore, I strongly recommend to…
Bug#1033820: node-snapdragon: autopkgtest regression: Cannot find module ‘snapdragon-node’
On 4/3/23 21:55, Paul Gevers wrote: > Hi yadd, > > On 03-04-2023 05:42, Yadd wrote: >> I’m unable to reproduce this issue: there is a link that provides >> snapdragon-node inside snapdragon-capture-set: > > I could by running the following on my laptop: > paul@mulciber ~ $ autopkgtest --no-built-binaries node-snapdragon…
VEP-like tool for sequence ontology and HGVS annotation of VCF files
Mehari is a software package for annotating VCF files with variant effect/consequence. The program uses hgvs-rs for projecting genomic variants to transcripts and proteins and thus has high prediction quality. Other popular tools offering variant effect/consequence prediction include: Mehari offers predictions that aim to mirror VariantValidator, the gold standard for…
what does “exp1” mean in the gage() function?
@james-w-macdonald-5106: All the columns except for ‘stat.mean’ come from the input data. As an example, using the help page for gage: data(gse16873) cn=colnames(gse16873) hn=grep('HN',cn, ignore.case =TRUE) dcis=grep('DCIS',cn, ignore.case =TRUE) data(kegg.gs) data(go.gs) #go.gs with the…
Problem with fastq-dump
Hi, I am absolutely new to NGS data analysis and have just started working in CentOS. I installed sratoolkit with the commands: conda create -n sratoolkit_env -y conda activate sratoolkit_env conda install -c bioconda sra-tools -y Then as given in the Biostar Handbook (Bioinformatics Data…
Converting an output de-novo transcriptome assembled with Trinity to a .gff3 file
Hello! I’ve de-novo assembled a transcriptome with Trinity, resulting in Trinity.fasta, whose headers look like this: >TRINITY_DN29256_c0_g1_i1 len=323 path=[0:0-322] Followed, in the next line, by the sequence. To run an external downstream analysis with an R script,…
samtools idxstats not removing ChrM
I am trying to remove ChrM from my ChIP-seq data. Below is my pipeline for one sample up to where I am having the issue (samtools idxstats). The output file from samtools idxstats is the same size as the input so it doesn’t look…
Functional metagenomics uncovers nitrile-hydrolysing enzymes in a coal metagenome
Introduction Cyanide-containing compounds are known as nitriles and are widely distributed in the natural environment. They are generated by different plants in various forms, such as ricinine, phenyl acetonitrile, cyanogenic glycosides, and β -cyanoalanine (Sewell et al., 2003). Anthropogenic activities have substantially influenced the production of vast quantities of nitrile…
how to make a .tbi file of .gtf.gz?
Hello, I have a .gtf.gz file which I am going to use in Python code. The pysam module requires an indexed file for the .gtf.gz. How can I index that file? Thank you in advance.
Please give me a grep command to get Gene IDS and TPM values from a stringtie output gtf file
Hi, Could anyone please give me a grep command to get gene_id and the respective TPM values from a StringTie output file? My result output file looks like the following…
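As an alternative to a grep one-liner, the extraction asked for above can be sketched in a few lines. The excerpt does not show the file, so this assumes the usual StringTie layout in which `transcript` lines carry `gene_id "…"` and `TPM "…"` attributes; the function name and the `STRG.*` example records are hypothetical.

```python
import re
from typing import Iterable, Iterator, Tuple

GENE_ID = re.compile(r'gene_id "([^"]+)"')
TPM = re.compile(r'TPM "([^"]+)"')


def gene_tpm_pairs(gtf_lines: Iterable[str]) -> Iterator[Tuple[str, float]]:
    """Yield (gene_id, TPM) for every transcript line that has both attributes."""
    for line in gtf_lines:
        if line.startswith("#"):
            continue  # skip comment lines at the top of the file
        fields = line.rstrip("\n").split("\t")
        if len(fields) < 9 or fields[2] != "transcript":
            continue  # exon lines carry no TPM attribute
        gene = GENE_ID.search(fields[8])
        tpm = TPM.search(fields[8])
        if gene and tpm:
            yield gene.group(1), float(tpm.group(1))


stringtie_example = [
    "# StringTie version (example)\n",
    'chr1\tStringTie\ttranscript\t100\t900\t1000\t+\t.\t'
    'gene_id "STRG.1"; transcript_id "STRG.1.1"; cov "10.5"; FPKM "2.3"; TPM "4.1";\n',
    'chr1\tStringTie\texon\t100\t400\t1000\t+\t.\t'
    'gene_id "STRG.1"; transcript_id "STRG.1.1"; exon_number "1"; cov "10.5";\n',
    'chr2\tStringTie\ttranscript\t50\t700\t1000\t-\t.\t'
    'gene_id "STRG.2"; transcript_id "STRG.2.1"; cov "3.0"; FPKM "0.7"; TPM "1.25";\n',
]
pairs = list(gene_tpm_pairs(stringtie_example))
for gene_id, tpm in pairs:
    print(gene_id, tpm, sep="\t")
```

Note that this reports one value per transcript; summing TPM per gene_id would be a separate step if gene-level values are wanted.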
Tool To Find Out If Fastq Is In Sanger Or Phred64 Encoding?
Is there a simple tool I can use to quickly find out if a FASTQ file is in Sanger or Phred64 encoding? Ideally something that tells me ‘Encoding XX’ somewhere in the terminal output.
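A common heuristic for the question above is to look at the lowest ASCII code among the quality characters: Sanger (Phred+33) scores start at code 33, Phred+64 scores start at code 64, and the older Solexa+64 scale can go down to 59. The sketch below applies that heuristic to quality strings that have already been pulled out of a FASTQ file; the function name and return labels are illustrative, and a file containing only high-quality bases cannot be classified with certainty this way.

```python
from typing import Iterable


def guess_quality_offset(quality_lines: Iterable[str]) -> str:
    """Guess the FASTQ quality encoding from the lowest quality character seen."""
    codes = [ord(ch) for line in quality_lines for ch in line.strip()]
    if not codes:
        return "unknown"
    lowest = min(codes)
    if lowest < 59:
        return "Phred+33"  # codes below 59 are impossible in the +64 encodings
    if lowest >= 64:
        return "probably Phred+64"  # could still be Phred+33 with only high scores
    return "ambiguous"  # 59-63: Solexa+64, or Phred+33 without low scores


print(guess_quality_offset(["IIIIHHH#5"]))
print(guess_quality_offset(["hhhgffeb"]))
```

Sampling the quality lines of the first few thousand reads is usually enough for the lowest code to show up.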
How to extract phased haplotypes from GATK HaplotypeCaller
I would like to extract the physically phased haplotypes from a VCF file generated by GATK’s HaplotypeCaller on Illumina data of some isolates from different yeast (S. cerevisiae) strains. According to this FAQ: In the format field of a PGT (Pre-Implantation Genetic Testing) VCF, you may find a description similar…
samtools idxstats versus samtools view command
Hi, I have mapped RNA-seq data to the human genome concatenated with a viral genome (26 chromosomes in total) with bowtie and need to get some numbers to calculate FPKM values manually for one viral gene, to retrieve the “total number of reads”…
no module named torch.fx [SOLVED]
In this post, you will learn how to resolve the “ModuleNotFoundError: No module named 'torch.fx'” error, which programmers encounter in Python. Before we get to the solutions, we will first discuss the meaning and usage of ‘torch.fx’. What is torch.fx? The…
Resolving abbreviated bacterial names
Hi community, I am working in the area of text mining and working with full-text articles. I encounter a number of bacterial names and their abbreviated forms also. But I have issues resolving the text for example “M. chelonae is a rapidly growing mycobacterium.” So…
Extract reads within given region, and their mates
Hi there, I want to extract all the reads located inside a given location. I would like to extract also the mates of those reads, because I’m working with PE data. I know that I can extract the reads using samtools…
featureCounts with NCBI T2T not capturing all genes
Hello, My team would greatly appreciate assistance with running featureCounts using the human NCBI T2T assembly (T2T-CHM13v2.0) as a reference; when we run it we end up with nearly 14,000 fewer genes than what the annotation supposedly contains. What (if any) modifications can be made to run Subread or RSubread…
How to extract FASTA headers in R
I have downloaded a reference uniprotkb FASTA file. How can I extract only the FASTA headers of each gene (row-wise) into a CSV file using R?
Rstudio and conda: GLIBCXX_3.4.30 not found – RStudio IDE
I get the following error when running library(stringi) and many other packages in RStudio. I do not get this error if I run R in the terminal. >> library(stringi) Error: package or namespace load failed for ‘stringi’ in dyn.load(file, DLLpath = DLLpath, …): unable to load shared object ‘/home/user/miniconda3/envs/r-4.2/lib/R/library/stringi/libs/stringi.so’: /lib/x86_64-linux-gnu/libstdc++.so.6:…
About next_reneighbour – LAMMPS Development
Dear LAMMPS developers and users, I am struggling to understand the next_reneighbor flag in fixes. A few fixes use this flag and set it to update->ntimestep, which forces reneighboring immediately. I am using a couple of fixes: bond/create and bond/break. My confusion is: does the reneighboring take place when…
Bwa mem different alignment results for the same reference genome
I used a genome A and a combined A+B genome to build two indexes, A.db and AB.db, with bwa. The reads can be aligned to A alone, but in the AB results they align only to the B genome. I…
clusterProfiler for KEGG enrichment (non-model species) Over-Representation Analysis
Hi there! I would like to perform KEGG enrichment with some differentially expressed gene data from RNAseq data. I am working on a non-model organism. I have 1) KEGG to GeneName Mapping head(expr5_FS_final) KEGG unigene_FS 1 K02727 FS_gene_1 2 K17277 FS_gene_3 3 K17307 FS_gene_10 4 K14453 FS_gene_11 5 K14700 FS_gene_11…
LAMMPS hangs with OpenMPI – LAMMPS Installation
Dear all, I am compiling LAMMPS 8Feb23 on an old cluster. Here are the details: OS: Linux “Ubuntu 16.04.4 LTS” 4.13.0-39-generic Compiler: GNU C++ 5.4.0 20160609 with OpenMP not enabled C++ standard: C++11 MPI v3.1: Open MPI v4.1.5, package: Open MPI otello@vikos Distribution, ident: 4.1.5, repo rev: v4.1.5, Feb 23,…
Processing Tandem Repeats Finder (Trf) Output For Downstream Motif Analysis
I have used Tandem Repeats Finder (TRF) for tandem repeat search in my fasta files. Output looks like this: Sequence: ENSG01 Parameters: 2 5 7 80 10 50 2000 1053 1139 4 22.2 4 67 2 62 28 4…
Can someone help me with searching overlapping values between two files?
I have two files. One file (file1.csv) contains a single column with 1200 values, each 11 digits long, for instance: 00000001111 00000001152 etc. Another file (file2.csv) contains 8 columns and consists of several tabs, in the third column there are…
PhD Position to Develop Machine Learning Methods for Microbiome Analysis
Looking for a highly motivated PhD student for Computational Biology research, with an algorithm development focus. The Ecological and Evolutionary Signal-processing (EESI) and Informatics lab is doing a restart from the pandemic and will be composed of a dynamic,…
Using LiftOver to change genomic build
Hi, all – Two questions about using LiftOver: The .bed file changes after using LiftOver. Correct me if I’m wrong, but I can just use the .bim and .fam file from before LiftOver as those do not change? I have used LiftOver to…
Getting a curl: (22) The requested URL returned error: 500 ERROR
Hi dear friends, I am trying to use a list of assemblies (one per line) to download some data from NCBI. Example of the list: GCA_937921735.1 GCA_937897655.1 GCA_902386345.1 GCA_902386385.1 GCA_902386595.1 I am using this command (that I think…
reads aligned concordantly exactly 1 time
Good evening, I’d like to compare the alignment quality of hisat2, bowtie2 and bwa for my files. The first two packages output the percentage of reads aligned concordantly exactly 1 time; bwa does not, because it does not output an alignment summary. The samtools flagstat report is not enough, because it outputs…
Bioconductor, how to select a subset of samples in an ExpressionSet?
I’m working on an R script that downloads gene expression data from GEO, through Bioconductor and the getGEO() function. These commands download all the 436 samples of the repository, but I’m only interested in 157 of them. Precisely, I’m interested in handling only the “samples collection:ch1” column with values “”on…
Automated dbSNP lookup by rsID position, plus genome build liftover
Hola, just passing by to say ‘hi’. Please post bugs / suggestions as comments to this tutorial. rsID to position GRCh38 cat rsids.list rs1296488112 rs1226262848 rs1225501837 rs1484860612 rs1235553513 rs1424506967 cat rsids.list | while read rsid ; do pos=$(curl -sX GET "https://eutils.ncbi.nlm.nih.gov/entrez/eutils/efetch.fcgi?db=snp&id=$rsid&retmode=text&rettype=text" | sed 's/<\//\n/g' | grep -o -P '\<CHRPOS\>.{0,15}' |…
[slurm-users] srun: Job step aborted
Hi all, I’m facing the following issue with a DGX A100 machine: I’m able to allocate resources, but the job fails when I try to execute srun. A detailed analysis of the incident follows: $ salloc -n1 -N1 -p DEBUG -w dgx001 --time=2:0:0 salloc: Granted job allocation 1278…
finding error to run edgeR , please check my code to be helpful for finding error and solving it. difficulty in finding the next steps of the code because of the occurring errors.
library(edgeR) counts <- read.delim("GSE116959_series_matrix.txt", row.names = 1) head(counts) data <- read.table("annotation.txt",header=TRUE , sep = "\t") data head(data) d0<- DGEList(counts=counts , group = factor(counts)) d0 dim(d0) d0.full <- d0 #keep the old one in case we mess up countsPerMillion <- cpm(d0) summary(countsPerMillion) countCheck <- countsPerMillion > 1 head(countCheck) keep <- which(rowSums(countCheck)…
DE Analysis on cells from a patient derived mouse xenograft with high levels of mouse count “contamination”
I am performing a differential expression analysis for collaborators. The overall biological design from my collaborators is as follows: 1) Received patient sample. 2) Amplified patient sample using patient derived xenograft (PDX) in a mouse host. 3) Extracted cells from mouse and enriched for human cells by positive selection using…
sage starts from command line but not from desktop menu
Package: sagemath Version: 9.5-6 Severity: normal X-Debbugs-Cc: jorge.m…@gmail.com Dear Maintainer, Sage fails to start from the graphical menu with “Failed to execute default Terminal Emulator” “Input/output error”. The problem is confined to the menu launcher: sage starts without problem from the command line (e.g. typing “sage -n” from the terminal emulator). …
Stop BLAST from phoning home
Some time back I learned from Devon Ryan on the bird app (no link because I have stopped using said app) that BLAST phones home every time you use it, by default. I was never aware of this until I saw the post and I’m not really a fan of…
extract pattern using grep/sed
Hi, this is my query data: Pm4.1LM10m04850 0.24924Pm4.1LM01m05240 0.02328Pm4.1LM01m11200 -0.02328Pm4.1LM01m11050 0.02899Pm4.1LM03m10920 0.04638Pm4.1LM00m08740 -0.04638Pm4.1LM09m10890 0.18085Pm4.1LM05m02500 0.23509Pm4.1LM03m01390 I want to exclude the digits that are in bold and keep the rest in rows like this: Pm4.1LM10m04850 Pm4.1LM01m05240 Pm4.1LM01m11200 and so on. I tried using the grep command: grep "Pm4.1LM[0-9]"…
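The extraction described above can be sketched with a regular expression that matches only the identifiers and skips the fused numeric values. This assumes every identifier has the shape seen in the sample (`Pm4.1LM`, two digits, `m`, five digits); the function name is illustrative. Note that the dot is escaped here, whereas an unescaped `.` in a grep pattern matches any character.

```python
import re
from typing import List

# Identifier shape inferred from the sample data: Pm4.1LM<2 digits>m<5 digits>
ID_PATTERN = re.compile(r"Pm4\.1LM\d{2}m\d{5}")


def extract_ids(text: str) -> List[str]:
    """Return every identifier found in the text, in order of appearance."""
    return ID_PATTERN.findall(text)


sample = (
    "Pm4.1LM10m04850 0.24924Pm4.1LM01m05240 0.02328Pm4.1LM01m11200 "
    "-0.02328Pm4.1LM01m11050 0.02899Pm4.1LM03m10920"
)
ids = extract_ids(sample)
print("\n".join(ids))
```

Because the pattern fixes the number of digits after `m`, the match stops before the value that is glued onto the next identifier.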
failed // cgroups v1 problem
Hi, I’m experiencing a strange issue related to a CPU swap (8352Y -> 6326) on two of our nodes. I adapted the slurm.conf to accommodate the new CPU: slurm.conf: NodeName=ice27[57-58] CPUs=64 Sockets=2 CoresPerSocket=16 ThreadsPerCore=2 Realmemory=257550 MemSpecLimit=12000 which is also what slurmd -C autodetects: NodeName=ice2758 CPUs=64 Boards=1 SocketsPerBoard=2 CoresPerSocket=16…
Hub Error about SQLite3 Version – Zero to JupyterHub on Kubernetes
sam123 February 8, 2023, 5:01pm 1 Hi there, I rebuilt the Hub Docker image based on Amazon Linux 2. When I tried to run it locally, I got an error about the SQLite version: sqlalchemy.exc.NotSupportedError: (sqlite3.NotSupportedError) deterministic=True requires SQLite 3.8.3 or higher The default SQLite that comes with Amazon Linux 2 is 3.7.17. However, I…
Is My Bam File Sorted ?
You can use the sort order (SO) flag in the header to check if the file has been sorted: % samtools view -H 5_110118_FC62VT6AAXX-hg18-unsort.bam @HD VN:1.0 SO:unsorted % samtools view -H 5_110118_FC62VT6AAXX-hg18-sort.bam @HD VN:1.0 SO:coordinate Unfortunately samtools index will work on both types…
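The header check above can be automated by parsing the `@HD` line of the text that `samtools view -H` prints. This is a minimal sketch that works on the header text only (it does not call samtools itself); the function name is illustrative, and the header only records what the file claims, not whether the reads really are in that order.

```python
def sort_order_from_header(header_text: str) -> str:
    """Return the SO value from the @HD line, or 'unknown' if none is present."""
    for line in header_text.splitlines():
        if line.startswith("@HD"):
            # Fields are tab-separated in SAM; split() also tolerates spaces.
            for field in line.split()[1:]:
                if field.startswith("SO:"):
                    return field[len("SO:"):]
    return "unknown"


sorted_header = "@HD\tVN:1.0\tSO:coordinate\n@SQ\tSN:chr1\tLN:1000\n"
unsorted_header = "@HD\tVN:1.0\tSO:unsorted\n@SQ\tSN:chr1\tLN:1000\n"
print(sort_order_from_header(sorted_header))
print(sort_order_from_header(unsorted_header))
```

Returning "unknown" for a missing field mirrors the SAM specification, where `unknown` is the default sort order.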
linux – Check if folder contains files with extensions and write directories into categories
unary operator expected is because [ and * (in your *fastq.gz) work independently. [ is not shell syntax. [ is a regular command (a builtin in Bash, but still a command) and ] is its last argument, a mandatory one. Anything in between is an argument too. The shell expands…
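The pitfall explained above comes from letting the shell expand a glob inside `[ … ]`. One way to sidestep it is to do the check outside the shell. The sketch below is an assumption about what the original task needs: it sorts the subdirectories of a root folder into those that do and do not contain files ending in a given suffix. The function names, the category labels and the `run1`/`run2` folders are made up for the demonstration.

```python
import tempfile
from pathlib import Path
from typing import Dict, List


def has_files_with_suffix(directory: Path, suffix: str) -> bool:
    """True if the directory directly contains at least one file ending in suffix."""
    return any(p.is_file() and p.name.endswith(suffix) for p in directory.iterdir())


def categorize(root: Path, suffix: str) -> Dict[str, List[str]]:
    """Split the subdirectories of root by whether they hold matching files."""
    result: Dict[str, List[str]] = {"with": [], "without": []}
    for sub in sorted(p for p in root.iterdir() if p.is_dir()):
        key = "with" if has_files_with_suffix(sub, suffix) else "without"
        result[key].append(sub.name)
    return result


# Demonstration on a throwaway directory tree.
with tempfile.TemporaryDirectory() as tmp:
    root = Path(tmp)
    (root / "run1").mkdir()
    (root / "run1" / "sample.fastq.gz").touch()
    (root / "run2").mkdir()
    categories = categorize(root, ".fastq.gz")
print(categories)
```

Because the suffix test is an ordinary string comparison, it behaves the same whether a directory holds zero, one or many matching files.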
Using Entrez Utilities to query the Nucleotide database by collection_date
Greetings, I was wondering if there was a way to query the NCBI nucleotide database using E-utilities by collection_date. In the image below I retrieved GenBank file data. Column E is the collection date. Is the only way to…
molecular modeling – Model coordination complex using GROMACS or CP2K
Here is how you can get the structure from PubChem, read the SMILES into Avogadro2, do a MMFF94 (classical) optimization, and then a single-point energy calculation using NWChem (RHF, 6-31G*). The Avogadro2 Input builder will create a CP2K file if you prefer. For GROMACS, there is extensive documentation that includes…
22.10 – Configuring MySQL for SLURM
I’m having problems getting SLURM (for job scheduling) to work with a MySQL database. I was using this as a reference, but perhaps I misunderstood something in it. If someone can let me know what I’ve missed, that would be great… This is SLURM 21.08 on Ubuntu 22.10. I’m using…
Add Information to Protein Fasta Headers
Hi, I have a protein fasta file whose headers look like ‘>evm.model.chr.9.52’. There are almost 30k+ proteins. I have performed functional annotation and also added all the information to the gene structure we get from EVM. The thing is, in those files I had columns so…
email – Troubleshooting slurm e-mail settings
I am trying to set up a Slurm installation and have reached the e-mail stage. So far I do not receive any mails. I have a working setup using msmtp-mta and msmtp. When I batch a script the slurmctld log shows email msg to **@**: Slurm Job_id=73 Name=example_script.sh Began,…
How to Calculate Allele Frequency from a VCF File?
I have a VCF file with 200 samples (mitochondrial genome of Plasmodium falciparum). Here are a few relevant lines from the actual file: ##INFO=<ID=AC,Number=A,Type=Integer,Description="Allele count in genotypes, for each ALT allele, in the same order as listed"> ##INFO=<ID=AF,Number=A,Type=Float,Description="Allele Frequency, for each ALT…
Tool for aligning short protein sequences
Hi, I have a file that looks like: >ref_frame=1 XFKKNLAFLQKKAKEFSSEQTRANSPTRRELQVWGRDNNSPSEA >ref_frame=2 FLKKIWPSYKKRPKNFLQSRPEPTAPPEESFRSGVETTTPPQKQ >ref_frame=3 F*KKSGLPTKKGQRIFFRADQSQQPHQKRASGLG*RQQLPLRSR >read1_frame=1 FFKKNLAFLQKKAKEFSSEQTRANSPTRRELQVWGRDNNSPSEA >read1_frame=2 FLKKIWPSYKKRPKNFLQSRPEPTAPPEESFRSGVETTTPPQKQ >read1_frame=3 F*KKSGLPTKKGQRIFFRADQSQQPHQKRASGLG*RQQLPLRSR I want to do a protein alignment where I align each read frame against each ref frame. What tool can I use to…
High ref mismatch rate after liftOver from 23andme hg19 to hg38
I lifted some 23andMe files from hg19 to hg38 using the following workflow in R, calling samtools, plink and liftOver: library(tidyverse) #set working directory to data directory trio_wd <- str_glue(here::here(),'/trio/K/') #create file list for raw data file_list <- str_c(trio_wd,dir(trio_wd)) %>% str_extract('genome.+\\d.txt') %>% str_extract('^(?:(?!admix).)+$') %>% unique() %>% {.[!is.na(.)]} %>% str_c(trio_wd,.) #liftover loop…
r – Calibri font on Mac in ggplot
I am on a Mac and need to use the Calibri font for all ggplots, but I cannot make it work with either the extrafont or showtext packages: font_import(prompt = FALSE, pattern = "calibri") returns Scanning ttf files in /Library/Fonts/, /System/Library/Fonts, /System/Library/Fonts/Supplemental, ~/Library/Fonts/ … Extracting .afm files from .ttf files… Error in data.frame(fontfile…
Redirect Stdout From twoBitToFa in Bash
Hi there, I’ve been going round and round trying to figure out a way to redirect stdout to use with seqtk. I’ve read many posts that are similar, so I am confident the answer is out there, but since I’m having such a hard time I figured I might as…
Compressing BAM, SAM, CRAM | Genozip
How good is Genozip at compressing BAM files? See Benchmarks. Compressing a BAM, SAM or CRAM file In the rest of this page we will give examples of BAM files. Genozip is also capable of compressing SAM files, and with some limitations, CRAM files as well. …
lazy loading failed, unable to load shared object rtracklayer.so
Hello! I am analyzing a dataset I created with the 10x Chromium Single Cell Multiome kit. In order to add gene annotation to the ATAC data, I am trying to install and use the “EnsDb.Mmusculus.v79” and “BSgenome.Mmusculus.UCSC.mm10” packages with Bioconductor. The same ERROR has come up repeatedly whenever…
Samtools Convert Sam To Bam With Code Examples
This page shows how to convert a SAM file to BAM with samtools. The code that follows illustrates the approach. # Basic syntax: samtools view -S -b sam_file.sam > bam_file.bam # Where:…
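The basic syntax above can be wrapped in a small script. A minimal sketch in Python, assuming samtools 1.x is on the PATH; the helper names are hypothetical, `-o` is used in place of the shell redirect, and `-S` is left out on the assumption that recent samtools versions detect the input format themselves and accept that flag only for backward compatibility:

```python
import subprocess


def sam_to_bam_command(sam_path, bam_path):
    """Build the argument list for converting SAM to BAM with samtools."""
    # -b requests BAM output; -o names the output file instead of using ">".
    return ["samtools", "view", "-b", "-o", bam_path, sam_path]


def sam_to_bam(sam_path, bam_path):
    """Run the conversion; raises CalledProcessError if samtools fails."""
    subprocess.run(sam_to_bam_command(sam_path, bam_path), check=True)


print(" ".join(sam_to_bam_command("sam_file.sam", "bam_file.bam")))
```

Passing the arguments as a list avoids shell quoting problems when file names contain spaces.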
Number of sequences in RefSeq.
Dear colleagues, there is something I cannot understand. When I download all the genomic sequences from the RefSeq database and count them, I see far fewer records than are reported in the release (123394 organisms, ftp.ncbi.nlm.nih.gov/refseq/release/release-notes/RefSeq-release214.txt). What am I doing wrong? 1. wget ftp.ncbi.nlm.nih.gov/genomes/refseq/assembly_summary_refseq.txt 2….
Subset row-entries according to a list
Hello! I want to subset a selected dataset (a list of entries) from a big data file. I have a list named “contig.list” that looks like this: Contig_339241_4 Contig_1004621_3 Contig_1666_1 Contig_836268_32 Contig_1479_10 Contig_640297_1 Contig_365838_1 .. I want to subset the entries of this…
Rsubread featurecounts
Hi there, I get this error when reading in a BAM file that was generated by pbmm2 align on PacBio data. I have tried to google the error message but there are no results. I wonder if anyone has ideas on what the error…
Merge multiple text files to create a combined dataframe and rename columns in R – General
Hi, I have multiple .txt files (each file contains 4 columns; an identifier Gene column, a raw_counts and other columns). I would like to merge those files into a combined dataframe using the common gene column. I was able to import multiple .txt files together, merge based on identifier column,…
Unable to install bioconda packages in conda environments
From your command line it appears you are on Windows. There are several versions of pybedtools on bioconda; however, if I grep through them, they are all for the Linux platform. If you’re on Windows 10, you could consider setting up the ‘Windows Subsystem for Linux’ (and possibly Xming), installing…