Tag: regex

separate read1 and read2 from merged fastq file and align against reference genome

Hi, I am processing a merged FASTQ file. I used the following command to separate read1s and read2s into separate files for alignment with bwa mem: paste - - - - - - - - <…
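If the merged file is interleaved (read1 and read2 records alternating, four lines per record), the same split can be sketched in Python. This is a minimal sketch under that assumption; `read_records` and `deinterleave` are hypothetical helper names, and the input is assumed to have no wrapped sequence lines.

```python
from itertools import islice
from typing import Iterator, TextIO, Tuple


def read_records(handle: TextIO) -> Iterator[Tuple[str, ...]]:
    """Yield FASTQ records as 4-line tuples (header, sequence, '+', quality)."""
    while True:
        record = tuple(islice(handle, 4))
        if not record:
            return
        if len(record) != 4:
            raise ValueError("truncated FASTQ record")
        yield record


def deinterleave(merged: TextIO, out_r1: TextIO, out_r2: TextIO) -> int:
    """Write alternating records to the read1/read2 outputs; return the number of pairs."""
    pairs = 0
    records = read_records(merged)
    for r1 in records:
        r2 = next(records, None)  # the mate is assumed to follow its read1 directly
        if r2 is None:
            raise ValueError("odd number of records: last read1 has no mate")
        out_r1.writelines(r1)
        out_r2.writelines(r2)
        pairs += 1
    return pairs
```

Usage, with hypothetical file names: open merged.fq for reading and r1.fq / r2.fq for writing, then call deinterleave(merged, r1, r2); the two outputs can be passed to bwa mem as a pair. bwa mem can also read an interleaved file directly with its -p option, which may make the split unnecessary.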

Continue Reading separate read1 and read2 from merged fastq file and align against reference genome

RegEx: Listing all possibilities to build sample code in python

itertools.product() is the way to go here. If your amino acids are all single characters, then for each position in your "regex", just pass a string of the valid amino acids for that position as an argument to itertools.product(). For example, [IG]…D.SG would become the following: p = 'ABCDEFGHIJKLMNOPQRST' # or whatever the…
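A minimal sketch of that idea, assuming the pattern only uses literal letters, "." wildcards and [..] character classes (no quantifiers or alternation). The function name `expand_pattern` and the tiny two-letter alphabet in the example are hypothetical.

```python
from itertools import product
from typing import List


def expand_pattern(pattern: str, alphabet: str) -> List[str]:
    """Expand literals, '.' and [..] classes into every matching string."""
    positions = []
    i = 0
    while i < len(pattern):
        ch = pattern[i]
        if ch == "[":
            end = pattern.index("]", i)
            positions.append(pattern[i + 1:end])  # class: only the listed residues
            i = end + 1
        elif ch == ".":
            positions.append(alphabet)  # wildcard: any residue in the alphabet
            i += 1
        else:
            positions.append(ch)  # literal residue
            i += 1
    # product() takes one string per position and yields one tuple per combination
    return ["".join(combo) for combo in product(*positions)]


print(expand_pattern("[IG].SG", "AC"))  # ['IASG', 'ICSG', 'GASG', 'GCSG']
```

The output size is the product of the choices at each position, so with a 20-letter alphabet every "." multiplies the list by 20.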

Continue Reading RegEx: Listing all possibilities to build sample code in python

Reverse complement of fasta file

Reading records separated by > is a nice idea, as it gives you the whole chunk at a time. However, here you want to process and merge the sequence lines but not the header, so you need to distinguish between lines. It is clearer to read line by line. The sequence line is specific: all caps…
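A line-by-line version can be sketched as follows. It assumes plain DNA (A, C, G, T, N in either case; other characters pass through untranslated) and rewraps sequences at a hypothetical width of 60; the function names are illustrative.

```python
from typing import Iterable, Iterator, List, Optional

# Complement table; characters not listed are left unchanged by str.translate.
COMPLEMENT = str.maketrans("ACGTNacgtn", "TGCANtgcan")


def _emit(header: str, chunks: List[str], width: int) -> Iterator[str]:
    """Yield the header, then the reverse-complemented sequence in fixed-width lines."""
    seq = "".join(chunks).translate(COMPLEMENT)[::-1]
    yield header
    for i in range(0, len(seq), width):
        yield seq[i:i + width]


def reverse_complement_fasta(lines: Iterable[str], width: int = 60) -> Iterator[str]:
    header: Optional[str] = None
    chunks: List[str] = []
    for line in lines:
        line = line.rstrip("\r\n")
        if line.startswith(">"):
            if header is not None:
                yield from _emit(header, chunks, width)  # finish the previous record
            header, chunks = line, []  # headers are kept as-is
        elif line:
            chunks.append(line)  # merge wrapped sequence lines
    if header is not None:
        yield from _emit(header, chunks, width)
```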

Continue Reading Reverse complement of fasta file

python – Pymc3 install issues on windows 10

So I downloaded pymc3 (and uninstalled and reinstalled it a few times), and every time I try to import pymc3 into a Jupyter notebook I get some kind of error. I am guessing that I have an issue with how I am installing pymc3. I followed this tutorial: github.com/pymc-devs/pymc/wiki/Installation-Guide-(Windows). After my…

Continue Reading python – Pymc3 install issues on windows 10

Parsing GenBank file: get locus tag vs product

As your sample GenBank file was incomplete, I went online to find a sample file that could be used in an example, and I found this file. Using this code and the Bio::GenBankParser module, I parsed it, guessing which parts of the structure you were after. In this case, "features"…

Continue Reading Parsing GenBank file: get locus tag vs product

regex for finding gene product from the text

import re
test_str = ' /product="hypothetical protein"'
match = re.search(r'product="([^"]+)"', test_str)
if match:
    print(match.group(1))

How the regex works:
product="   the literal text 'product="'
(           start of group 1 (captured as \1)
[^"]+       one or more characters other than '"' (greedy, matching as many as possible)
)           end of group 1
"           the closing '"'
Read more here:…

Continue Reading regex for finding gene product from the text

Unable to get regex to capture last group

The problem is probably that you're looking for non-overlapping instances of the regex. Methods like findall won't return B, because the match for A consumes the "," before B.
>>> regex.findall("((A:[c1]0.1,B:[c2]0.2),C:[c2]0.3);")
[('(A:[c1]0.1,', '(', 'A', ':', '[c1]', '0.1', ','), (',C:[c2]0.3)', ',', 'C', ':', '[c2]', '0.3', ')')]
Changing the end pattern to…
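One common fix is to check the closing delimiter with a lookahead, which tests for it without consuming it, so the next match can still start at that ",". The pattern below is a hypothetical reconstruction (the original regex is not shown): an opening "(" or ",", a one-letter name, a bracketed label and a branch length.

```python
import re

# (?=[,)]) only asserts that "," or ")" comes next; it does not consume it,
# so the "," after A's branch length is still available as B's opening delimiter.
LEAF = re.compile(r"([(,])([A-Z]):(\[c\d+\])([\d.]+)(?=[,)])")

tree = "((A:[c1]0.1,B:[c2]0.2),C:[c2]0.3);"
print(LEAF.findall(tree))
# [('(', 'A', '[c1]', '0.1'), (',', 'B', '[c2]', '0.2'), (',', 'C', '[c2]', '0.3')]
```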

Continue Reading Unable to get regex to capture last group

python beginner – faster way to find and replace in large file?

You should split your lines into "words" and only look up these words in your dictionary:
>>> re.findall(r"\w+", "CHROMOSOME_IV ncRNA gene 5723085 5723105 . - . ID=Gene:WBGene00045518 CHROMOSOME_IV ncRNA ncRNA 5723085 5723105 . - . Parent=Gene:WBGene00045518")
['CHROMOSOME_IV', 'ncRNA', 'gene', '5723085', '5723105', 'ID', 'Gene', 'WBGene00045518', 'CHROMOSOME_IV', 'ncRNA', 'ncRNA', '5723085', '5723105', 'Parent',…
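A minimal sketch of the replacement step built on that idea: re.sub with a function looks each \w+ token up once in the dictionary, instead of scanning every line once per dictionary key. The mapping contents and the function name `replace_words` are hypothetical.

```python
import re
from typing import Dict

WORD = re.compile(r"\w+")


def replace_words(line: str, mapping: Dict[str, str]) -> str:
    """Replace every word that is a dictionary key; leave all other text untouched."""
    # One average-O(1) dict lookup per token, regardless of the dictionary size.
    return WORD.sub(lambda m: mapping.get(m.group(0), m.group(0)), line)


# Hypothetical mapping; stream a large file line by line with the same call.
print(replace_words("ID=Gene:WBGene00045518", {"WBGene00045518": "geneX"}))  # ID=Gene:geneX
```

Note that \w also matches digits and underscores, so tokens such as CHROMOSOME_IV are looked up as a whole.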

Continue Reading python beginner – faster way to find and replace in large file?

biopython – Identify side chain atoms in BioPandas dataframe

As you suggest, one way of solving your problem would be to select all atoms that don't have backbone atom names. In a PDB file, I believe backbone atoms would be named 'CA', 'HA', 'N', 'HN' or 'H', 'C' and 'O'. Beware of the N-terminal (where the hydrogens would be…
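The selection itself is a set-membership test on atom names. The sketch below uses the backbone names from the answer plus terminal atoms that are an assumption on my part (OXT for the C-terminal oxygen, H1/H2/H3 for N-terminal hydrogens). With BioPandas the same set would presumably be applied to the `atom_name` column of the ATOM table, e.g. `df[~df["atom_name"].isin(BACKBONE)]`.

```python
from typing import Iterable, List

# Backbone names from the answer above, plus assumed terminal atom names.
BACKBONE = {"N", "CA", "C", "O", "HA", "HN", "H", "OXT", "H1", "H2", "H3"}


def side_chain_atoms(atom_names: Iterable[str]) -> List[str]:
    """Return the atom names that are not backbone atoms, preserving order."""
    return [name for name in atom_names if name.strip() not in BACKBONE]


print(side_chain_atoms(["N", "CA", "C", "O", "CB", "CG", "OXT"]))  # ['CB', 'CG']
```

Glycine's alpha hydrogens may be named HA2/HA3 rather than HA, so they would be reported as side chain by this set unless added.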

Continue Reading biopython – Identify side chain atoms in BioPandas dataframe

poem_openapi_derive – Rust

poem-openapi-derive 1.3.0 (docs.rs crate page). License: MIT/Apache-2.0. Owners: sunli829. Dependencies: Inflector ^0.11.4 (normal), darling…

Continue Reading poem_openapi_derive – Rust

How can I separate 3 different pieces of information in a column?

For example, one line in the column reads Ser25Phe, and I want to split the HGVS.Consequence column into Ser, 25 and Phe.
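Three capture groups do the split: a three-letter residue, the position digits, and a second three-letter residue. This Python sketch assumes simple substitutions written with three-letter codes, optionally prefixed with "p."; stops written as "*", frameshifts and other HGVS forms are not handled. The names are hypothetical.

```python
import re
from typing import Optional, Tuple

# (?:p\.)? optional HGVS prefix; then residue, position, residue
PROTEIN_CHANGE = re.compile(r"(?:p\.)?([A-Z][a-z]{2})(\d+)([A-Z][a-z]{2})")


def split_change(change: str) -> Optional[Tuple[str, str, str]]:
    """'Ser25Phe' -> ('Ser', '25', 'Phe'); None if the value does not fit the pattern."""
    m = PROTEIN_CHANGE.fullmatch(change)
    return m.groups() if m else None


print(split_change("Ser25Phe"))  # ('Ser', '25', 'Phe')
```

The same groups should work in R with backreferences, for example sub("([A-Z][a-z]{2})([0-9]+)([A-Z][a-z]{2})", "\\1 \\2 \\3", x).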

Continue Reading How can I separate 3 different pieces of information in a column?

UMItools dedup deduplication taking too much time + RAM

I have some RNA-seq data from miRNAs that I have processed with Bowtie2 (aligning to miRBase). Now, when doing the deduplication with umi_tools dedup, I find that some of the files take a lot of time and RAM to finish (some files take around 3-4 minutes and 4-5 GB of RAM, and some…

Continue Reading UMItools dedup deduplication taking too much time + RAM

FilterTest (BioJava-1.4 API)

org.biojava.bio.search, Interface FilterTest. All known implementing classes: FilterTest.Equals, FilterTest.GreaterThan, FilterTest.LessThan. public interface FilterTest: Class for implementing tests with BlastLikeSearchFilter objects. Several precanned tests are included. Author: David Huen. Nested Class Summary: static class FilterTest.Equals…

Continue Reading FilterTest (BioJava-1.4 API)

node.js – OpenAPI: “request should have required property ‘body'”

I am building out a new endpoint in my application, which uses express-openapi-validator as validator middleware.
/* index.ts */
import * as OpenApiValidator from 'express-openapi-validator';
const whitelistedPaths = [/* regex tested paths */];
app.use(
  OpenApiValidator.middleware({
    apiSpec: './schema/api.json',
    validateRequests: true,
    validateResponses: true,
    ignorePaths: whitelistedPaths,
    validateSecurity: true,
  }),
);
/* … */…

Continue Reading node.js – OpenAPI: “request should have required property ‘body'”

Description, Programming Languages, Similar Projects of Gpt 2 Pytorch

GPT2-Pytorch with Text-Generator. Better Language Models and Their Implications: Our model, called GPT-2 (a successor to GPT), was trained simply to predict the next word in 40GB of Internet text. Due to our concerns about malicious applications of the technology, we are not releasing the trained model. As an experiment…

Continue Reading Description, Programming Languages, Similar Projects of Gpt 2 Pytorch

Conditionals are not supported in this regex diale…

I developed this regex on www.regex101.com, and it seems to work properly. When I copy it into the OpenAPI @Pattern annotation in a Spring Boot 2.5.4 application with springdoc-openapi (tried v1.4.8 and v1.6.1, supporting OpenAPI v3), I get the message: Conditionals are not supported in this…

Continue Reading Conditionals are not supported in this regex diale…

faq – What should I do when my neural network doesn’t learn?

There’s a saying among writers that “All writing is re-writing” — that is, the greater part of writing is revising. For programmers (or at least data scientists) the expression could be re-phrased as “All coding is debugging.” Any time you’re writing code, you need to verify that it works as…

Continue Reading faq – What should I do when my neural network doesn’t learn?

main-arm64-default][devel/RStudio] Failed for RStudio-2021.09.1+372 in build

You are receiving this mail as a port that you maintain is failing to build on the FreeBSD package build server. Please investigate the failure and submit a PR to fix the build. Maintainer: y…@freebsd.org Log URL: ampere2.nyi.freebsd.org/data/main-arm64-default/p7539e33f88ff_s169b368a62/logs/RStudio-2021.09.1+372.log Build URL: ampere2.nyi.freebsd.org/build.html?mastername=main-arm64-default&build=p7539e33f88ff_s169b368a62 Log: =>> Building devel/RStudio build started at Wed Dec 8…

Continue Reading main-arm64-default][devel/RStudio] Failed for RStudio-2021.09.1+372 in build

r – Is there a way to do a negative match using regex sub?

Say I have a vector of strings, g <- c("bunchofstuff>query=true/fun/weird>bunchofstuff", "bunchofstuff>query=animals/octopus/weird>bunchofstuff", "bunchofstuff>query=flowers/sunshine/fun>bunchofstuff", " bunchofstuff>query=fun/true/sunshine>bunchofstuff"), and I want to use sub to erase everything after query= to the end of the string, but only if query= is not followed by true (ideally in any position). As far as I can tell, there isn't a…
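A negative lookahead expresses "not followed by true anywhere in the query". The sketch below is in Python and assumes the query value ends at the next ">", so the lookahead only scans [^>]*; the function name is hypothetical.

```python
import re

# (?![^>]*\btrue\b) fails the match when the word "true" occurs before the next ">",
# so those strings are left unchanged; otherwise ".*" removes everything after "query=".
QUERY_WITHOUT_TRUE = re.compile(r"(query=)(?![^>]*\btrue\b).*")


def strip_unless_true(s: str) -> str:
    return QUERY_WITHOUT_TRUE.sub(r"\1", s, count=1)


print(strip_unless_true("bunchofstuff>query=animals/octopus/weird>bunchofstuff"))
# bunchofstuff>query=
```

In R the same pattern should work with PCRE enabled, e.g. sub("(query=)(?![^>]*\\btrue\\b).*", "\\1", g, perl = TRUE), since the default regex engine does not support lookahead.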

Continue Reading r – Is there a way to do a negative match using regex sub?

Challenging Regex Problem To Address Medical Results …

In this post I go through several common issues with CSV files and fix them using regular expressions. Often as a data scientist you work with large… Testing and improving: developing the right regex on the first try is often difficult. Trial and error is a common approach…

Continue Reading Challenging Regex Problem To Address Medical Results …

How to find sequence patterns in genome?

Hi, I want to find a sequence pattern in a genome. Let's say I want to find the following pattern: (G4N(1-10))5, which translates to 4 guanines followed by 1 to 10 bases (each A, T, G or C), and then this…
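The notation maps directly onto a regex: G{4} for four guanines, [ACGT]{1,10} for 1 to 10 arbitrary bases, and a non-capturing group repeated {5} times. A minimal Python sketch, reporting non-overlapping hits on the given strand only; the function name is hypothetical.

```python
import re
from typing import List, Tuple

# (G4N(1-10))5 : four G's plus 1-10 bases, the whole unit repeated five times
MOTIF = re.compile(r"(?:G{4}[ACGT]{1,10}){5}")


def find_motif(seq: str) -> List[Tuple[int, int]]:
    """Return (start, end) of non-overlapping matches, 0-based, end exclusive."""
    return [(m.start(), m.end()) for m in MOTIF.finditer(seq.upper())]


print(find_motif("GGGGA" * 5))  # [(0, 25)]
```

On the reverse strand the same motif appears as runs of C's, so either search the reverse complement as well or add the mirrored pattern (?:[ACGT]{1,10}C{4}){5}.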

Continue Reading How to find sequence patterns in genome?

Insert size histogram from Picard for Illumina paired-end 150 bp: FR, TANDEM, and both

I've got some low-coverage skim-seq BAM files (1x), and while running QC on them I got some strange results. I ran Picard CollectInsertSizeMetrics. The sequencing was Illumina paired-end, so the orientation should be F-R as usual. But I got insert size histograms showing FR, TANDEM, and…

Continue Reading Insert size histogram from Picard for Illumina paired-end 150 bp: FR, TANDEM, and both

Pound: CMakeLists.txt | Fossies

"Fossies", the Fresh Open Source Software Archive. Member "Pound-3.0.2/CMakeLists.txt" (28 Nov 2021, 3057 Bytes) of package /linux/www/Pound-3.0.2.tgz. As a special service, "Fossies" has tried to format the requested text file into HTML format (style: standard) with prefixed line numbers. Alternatively, you can view it here or…

Continue Reading Pound: CMakeLists.txt | Fossies

CATS is a REST APIs fuzzer and negative testing tool for OpenAPI endpoints.

REST APIs fuzzer and negative testing tool. Run thousands of self-healing API tests within minutes with no coding effort! Comprehensive: tests are generated automatically based on a large number of scenarios. Highly configurable: a high degree of customization to adapt to each context. Self-healing: as tests are generated, any OpenAPI spec change…

Continue Reading CATS is a REST APIs fuzzer and negative testing tool for OpenAPI endpoints.

What is OpenAPI ? – OpenAPI [1]

OpenAPI (formerly known as Swagger) is a standard for declaring RESTful APIs. But why should I use it? In the current context, when we work with APIs (no matter which language is used), we want clean documentation and the ability to share complete documentation of…

Continue Reading What is OpenAPI ? – OpenAPI [1]

How we got to OpenAPI

November 23, 2021. A story about how we went from writing API code ad hoc to a process with a separate repository of API schemas and code generation based on them. TL;DR: We were living with an unstructured WebSocket API of our own design, but realized that it was impossible to…

Continue Reading How we got to OpenAPI

python – Pytorch model dies with a java interrupted exception

I have a pytorch model that dies with an exception. I am running docker on a Mac. 2021-11-22 09:58:25,083 [INFO ] W-9001-deviceidentification_ffda761820ab4a519ef598fb241e28d4-stdout MODEL_LOG – done saving hyperparameters 2021-11-22 09:58:25,162 [INFO ] W-9002-deviceidentification_ffda761820ab4a519ef598fb241e28d4-stdout MODEL_LOG – saving hyperparameters 2021-11-22 09:58:25,185 [INFO ] W-9002-deviceidentification_ffda761820ab4a519ef598fb241e28d4-stdout MODEL_LOG – done saving hyperparameters 2021-11-22 09:58:38,625 [INFO ]…

Continue Reading python – Pytorch model dies with a java interrupted exception

[BUG] Stripping ECMA Regex Leading and Trailing `/` Causes Errors

Description: Python code generation tries to strip leading and trailing / from regex patterns, but the pattern it's using to do so matches when it shouldn't. For example, this pattern in my spec: [0-9a-f]{8}-[0-9a-f]{4}-[0-9a-f]{4}-[0-9a-f]{4}-[0-9a-f]{12} has no leading or trailing slashes, but the code linked above would still…
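A safer check only strips the delimiters when the whole string looks like an ECMA regex literal, /body/flags. This is a hypothetical sketch of such a check, not the generator's actual code; it assumes the flag letters d, g, i, m, s, u, v, y and simply drops any flags.

```python
import re

# Whole-string match: "/", a non-empty body, "/", then only valid flag letters.
ECMA_LITERAL = re.compile(r"/(.+)/([dgimsuvy]*)", re.DOTALL)


def strip_ecma_delimiters(pattern: str) -> str:
    """Return the body of /body/flags; leave any other pattern untouched."""
    m = ECMA_LITERAL.fullmatch(pattern)
    return m.group(1) if m else pattern


uuid = "[0-9a-f]{8}-[0-9a-f]{4}-[0-9a-f]{4}-[0-9a-f]{4}-[0-9a-f]{12}"
print(strip_ecma_delimiters(uuid) == uuid)  # True: no slashes, nothing stripped
```

Anchoring with fullmatch is the key design choice: a search for a slash anywhere, or for only one side, is what lets a pattern without delimiters be modified.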

Continue Reading [BUG] Stripping ECMA Regex Leading and Trailing `/` Causes Errors

Using python FlashText to do pattern matching in nucleotide sequences

Hi all, I'm playing with the idea of using FlashText (instead of regex) to do some pattern finding in nucleotide sequences. The idea came from the massive speed-up described in the post below: dev.to/vi3k6i5/regex-was-taking-5-days-to-run-so-i-built-a-tool-that-did-it-in-15-minutes-c98?ref=codebldr My basic idea is…

Continue Reading Using python FlashText to do pattern matching in nucleotide sequences

java – openapi – regex for not allowing whitespace or hyphen

I am using OpenAPI 3.0.3 to autogenerate my Spring Boot-based REST API… Inside src/main/resources/openapi/schema/PurchaseOrder.yaml:
openapi: '3.0.3'
info:
  title: 'Purchase Order'
  version: '1.0'
paths: {}
components:
  schemas:
    PurchaseOrder:
      title: 'Purchase Order'
      type: 'object'
      properties:
        account:
          type: 'string'
          description: Identifier for account making the purchase
          example: 1
          minLength: 1
          pattern: '^s-$'
So,…
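A pattern that rejects any whitespace or hyphen could be '^[^\s-]+$': a negated character class containing \s and a literal hyphen (placed last so it is not read as a range), repeated one or more times. This is a hypothetical replacement for the pattern above, tried out here with Python's re only; OpenAPI 3.0 describes patterns in the ECMA-262 dialect, and the Java validator may use its own engine, so it is worth re-testing there.

```python
import re

# Suggested YAML value (hypothetical): pattern: '^[^\s-]+$'
# fullmatch is used instead of ^...$ because Python's $ also matches
# just before a trailing newline.
NO_SPACE_OR_HYPHEN = re.compile(r"[^\s-]+")


def is_valid(value: str) -> bool:
    """True if value is non-empty and contains no whitespace and no hyphen."""
    return NO_SPACE_OR_HYPHEN.fullmatch(value) is not None


print([is_valid(v) for v in ["abc123", "a b", "a-b", ""]])  # [True, False, False, False]
```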

Continue Reading java – openapi – regex for not allowing whitespace or hyphen

orf finder

How can I find which frame is producing the final protein? Is there any way to set all the frames?
import re
filename = input('Enter name of file to parse: ')
sequences = []
descr = None
# here is the path of multifasta file
with open(filename) as file:
    line…
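To compare all frames, scan three offsets on the sequence and three on its reverse complement. The sketch below is one simple ORF definition (first ATG to the next in-frame stop, standard stop codons, uppercase ACGT input) and reports the longest ORF; the function names are hypothetical, and reverse-strand coordinates refer to the reverse-complemented sequence.

```python
from typing import Iterator, Optional, Tuple

STOP_CODONS = {"TAA", "TAG", "TGA"}
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def orfs_in_frame(seq: str, frame: int) -> Iterator[Tuple[int, int]]:
    """Yield (start, end) of ATG...stop ORFs in one reading frame (end exclusive)."""
    start = None
    for i in range(frame, len(seq) - 2, 3):
        codon = seq[i:i + 3]
        if start is None and codon == "ATG":
            start = i  # first ATG opens an ORF
        elif start is not None and codon in STOP_CODONS:
            yield start, i + 3  # in-frame stop closes it
            start = None


def longest_orf(seq: str) -> Optional[Tuple[str, int, int, int]]:
    """Return (strand, frame, start, end) of the longest ORF over all six frames."""
    seq = seq.upper()
    best = None
    strands = (("+", seq), ("-", seq.translate(COMPLEMENT)[::-1]))
    for strand, s in strands:
        for frame in range(3):
            for start, end in orfs_in_frame(s, frame):
                if best is None or end - start > best[3] - best[2]:
                    best = (strand, frame, start, end)
    return best


print(longest_orf("ATGAAATAG"))  # ('+', 0, 0, 9)
```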

Continue Reading orf finder

Transform a GTF file into a data frame in R

Hi, I would like to analyse the content of a GTF file. I am quite comfortable with R and dplyr, so I would like to transform my GTF file into a data frame to facilitate my analysis. Does anybody…
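The main work is splitting the eight fixed tab-separated columns and turning the ninth (key "value"; pairs) into separate fields. A Python sketch that builds one dict per line, ready for a data frame constructor such as pandas.DataFrame(rows); it assumes quoted attribute values and keeps only the first value of repeated keys. In R, rtracklayer::import() followed by as.data.frame() is a common route, though I have not checked it against this particular file.

```python
import re
from typing import Dict, Iterable, List, Union

COLUMNS = ["seqname", "source", "feature", "start", "end", "score", "strand", "frame"]
# key "value" pairs from column 9; unquoted values are not captured by this sketch
ATTRIBUTE = re.compile(r'(\S+)\s+"([^"]*)"')


def parse_gtf(lines: Iterable[str]) -> List[Dict[str, Union[str, int]]]:
    rows = []
    for line in lines:
        if not line.strip() or line.startswith("#"):
            continue  # skip blank lines and comment/header lines
        fields = line.rstrip("\r\n").split("\t")
        row: Dict[str, Union[str, int]] = dict(zip(COLUMNS, fields[:8]))
        row["start"] = int(fields[3])
        row["end"] = int(fields[4])
        for key, value in ATTRIBUTE.findall(fields[8]):
            row.setdefault(key, value)  # first value wins for repeated keys such as "tag"
        rows.append(row)
    return rows
```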

Continue Reading Transform a GTF file into a data frame in R

vcftools not outputting log file when run from perl

I am running 325 vcftools commands to generate Fst values, which obviously needs to be automated. An example:
vcftools --vcf big.vcf --weir-fst-pop pop_lists/pop1.txt --weir-fst-pop pop_lists/pop2.txt --out weir_fst_results/pop1_vs_pop2
When I run the commands one by one from the command line, it works fine, i.e. there are…
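325 is the number of unordered pairs among 26 populations (26 * 25 / 2), so the commands can be generated with itertools.combinations. This sketch is in Python rather than Perl, uses hypothetical population names pop1…pop26, and only rebuilds the example command; it does not diagnose the missing log file.

```python
import subprocess
from itertools import combinations
from typing import List, Sequence


def build_commands(pops: Sequence[str], vcf: str = "big.vcf",
                   list_dir: str = "pop_lists",
                   out_dir: str = "weir_fst_results") -> List[List[str]]:
    """One vcftools Weir Fst command per unordered pair of populations."""
    commands = []
    for a, b in combinations(pops, 2):
        commands.append([
            "vcftools", "--vcf", vcf,
            "--weir-fst-pop", f"{list_dir}/{a}.txt",
            "--weir-fst-pop", f"{list_dir}/{b}.txt",
            "--out", f"{out_dir}/{a}_vs_{b}",
        ])
    return commands


def run_all(commands: List[List[str]]) -> None:
    for cmd in commands:
        # A list of arguments (no shell) avoids quoting issues;
        # check=True raises CalledProcessError on a non-zero exit status.
        subprocess.run(cmd, check=True)
```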

Continue Reading vcftools not outputting log file when run from perl

Extract sequences from a fasta file with specific nucleotide repetition

I have a FASTA file named seqs.fa with multiple sequences, e.g.:
>Seq1
GATAGATATCGAATGATC
>Seq2
GATGATAGATCGATGC
I want to grep/extract only those sequences that have ATC repeated exactly 2 times, as in Seq1. How can we use grep/sed or {} method for…
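Because a FASTA record can span several lines, it is simpler to join each sequence first and then count the motif. A Python sketch with hypothetical function names; str.count counts non-overlapping occurrences, which is exact for ATC because it cannot overlap itself.

```python
from typing import Iterable, Iterator, List, Optional, Tuple


def read_fasta(lines: Iterable[str]) -> Iterator[Tuple[str, str]]:
    """Yield (name, sequence) pairs, joining wrapped sequence lines."""
    name: Optional[str] = None
    chunks: List[str] = []
    for line in lines:
        line = line.strip()
        if line.startswith(">"):
            if name is not None:
                yield name, "".join(chunks)
            name, chunks = line[1:], []
        elif line:
            chunks.append(line)
    if name is not None:
        yield name, "".join(chunks)


def with_motif_count(records: Iterable[Tuple[str, str]], motif: str = "ATC",
                     times: int = 2) -> List[Tuple[str, str]]:
    """Keep records whose sequence contains the motif exactly `times` times."""
    return [(name, seq) for name, seq in records if seq.upper().count(motif) == times]


records = read_fasta([">Seq1", "GATAGATATCGAATGATC", ">Seq2", "GATGATAGATCGATGC"])
print(with_motif_count(records))  # [('Seq1', 'GATAGATATCGAATGATC')]
```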

Continue Reading Extract sequences from a fasta file with specific nucleotide repetition

biopython extract sequence from fasta

My two questions are: What is the simplest way to do this? Using python-bloom-filter, just replace the set with seen = BloomFilter(max_elements=10000, error_rate=0.001).
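The set mentioned above is presumably a "seen" collection used to skip duplicates. A minimal sketch of that pattern with a plain set, taking (name, sequence) pairs; the function name is hypothetical. Swapping in a Bloom filter saves memory, but its false positives mean that some distinct sequences may be wrongly treated as already seen.

```python
from typing import Iterable, Iterator, Tuple


def unique_sequences(records: Iterable[Tuple[str, str]]) -> Iterator[Tuple[str, str]]:
    """Yield each (name, sequence) pair whose sequence has not been seen before."""
    seen = set()  # exact membership; a Bloom filter here would trade accuracy for memory
    for name, seq in records:
        if seq not in seen:
            seen.add(seq)
            yield name, seq


print(list(unique_sequences([("a", "AC"), ("b", "AC"), ("c", "GT")])))  # [('a', 'AC'), ('c', 'GT')]
```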

Continue Reading biopython extract sequence from fasta

Invert regex match

Hello, I would like to invert my regex match. Example: sssd;RS=93298723;f. My current regex: RS=\d*. This regex matches RS=93298723; I want to invert the match (see the demo here: regex101.com/r/PGkwA5/1). Thank you. I found it: ([^0-9-RS=]+)
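Another way to "invert" a match is to keep the regex as it is and take everything around it, either by deleting the match or by splitting on it. A negated character class such as [^0-9-RS=] rejects those characters one at a time rather than the sequence RS=…, so it can behave differently on other inputs. A small Python sketch with hypothetical function names:

```python
import re
from typing import List

RS_FIELD = re.compile(r"RS=\d*")


def without_rs(text: str) -> str:
    """Delete the matched field, keeping everything else."""
    return RS_FIELD.sub("", text)


def outside_rs(text: str) -> List[str]:
    """Return the non-empty pieces of text around each match."""
    return [part for part in RS_FIELD.split(text) if part]


print(without_rs("sssd;RS=93298723;f"))  # sssd;;f
print(outside_rs("sssd;RS=93298723;f"))  # ['sssd;', ';f']
```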

Continue Reading Invert regex match

isolate adapter contamination reads from fastq file using python

Hi everyone, I want to extract adapter-contaminated reads from a FASTQ file using Python code, but I am unable to do so. The adapter sequence is: "GATCGGAAGAGCTCGTATGCCGTCTTCTGCTTGAAA". The file contains this data:
@HWUSI-EAS570R_0003:2:50:5038:17424#0/1
CAGCTTCTGTTGATGCTGATTTAATTCCTGCAACTA
+HWUSI-EAS570R_0003:2:50:5038:17424#0/1
hhhhhhhhhhhgghhhhhahhhhhhhhhhhhgfhh[
@HWUSI-EAS570R_0003:2:50:5175:17417#0/1
CACCTTGCTTTATGGGAAAGCGTAACATAACTACAG
+HWUSI-EAS570R_0003:2:50:5175:17417#0/1…
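A simple approach is to read the file four lines at a time and test the sequence line for an exact copy of the adapter's first bases. This is a sketch under stated assumptions: 4-line records, exact matching only (no mismatches), and a hypothetical seed length of 12, so reads ending in a shorter adapter fragment are missed. Dedicated trimmers such as cutadapt handle those cases properly.

```python
from itertools import islice
from typing import Iterator, List, TextIO

ADAPTER = "GATCGGAAGAGCTCGTATGCCGTCTTCTGCTTGAAA"


def contaminated_reads(handle: TextIO, adapter: str = ADAPTER,
                       seed_len: int = 12) -> Iterator[List[str]]:
    """Yield 4-line records whose sequence contains the adapter's first seed_len bases."""
    seed = adapter[:seed_len]
    while True:
        record = list(islice(handle, 4))
        if len(record) < 4:
            return  # end of file (an incomplete trailing record is ignored)
        if seed in record[1]:
            yield record
```

Usage: with open("reads.fq") as fq: write every record from contaminated_reads(fq) to an output file with writelines (file names hypothetical).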

Continue Reading isolate adapter contamination reads from fastq file using python

A regex to convert operon names to genes?

Hi, I would like to convert operon names to gene names (and the reverse). I think this should be possible with a regex, but I'm not fluent enough with regexes to crack it. Conventionally, operons are named like this: genes…
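The naming convention is cut off above, so this sketch assumes the common bacterial style: a lowercase three-letter prefix followed by one capital letter per gene, as in lacZYA for lacZ, lacY and lacA. Names outside that form raise an error; the function names are hypothetical.

```python
import re
from typing import List

# Assumed convention: 3 lowercase letters + one capital letter per gene.
OPERON = re.compile(r"([a-z]{3})([A-Z]+)")
GENE = re.compile(r"([a-z]{3})([A-Z])")


def operon_to_genes(operon: str) -> List[str]:
    m = OPERON.fullmatch(operon)
    if not m:
        raise ValueError(f"not in the assumed operon format: {operon}")
    prefix, letters = m.groups()
    return [prefix + letter for letter in letters]


def genes_to_operon(genes: List[str]) -> str:
    parts = [GENE.fullmatch(g) for g in genes]
    if not parts or any(p is None for p in parts) or len({p.group(1) for p in parts}) != 1:
        raise ValueError("genes do not share one prefix in the assumed format")
    return parts[0].group(1) + "".join(p.group(2) for p in parts)


print(operon_to_genes("lacZYA"))  # ['lacZ', 'lacY', 'lacA']
```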

Continue Reading A regex to convert operon names to genes?

bash script

Hello everyone, I have a file like this: RSID1 RSID2 chr1_169894240_G_T_b38 chr1_169894240_G_T_b38 chr1_169894240_G_T_b38 chr1_169891332_G_A_b38 chr1_169891332_G_A_b38 chr1_169891332_G_A_b38 chr1_169661963_G_A_b38 chr1_169661963_G_A_b38 chr1_169661963_G_A_b38 chr1_169697456_A_T_b38 chr1_169697456_A_T_b38 chr1_169697456_A_T_b38 chr1_27636786_T_C_b38 chr1_27636786_T_C_b38 chr1_196651787_C_T_b38 chr1_196651787_C_T_b38 chr6_143501715_T_C_b38 chr6_143501715_T_C_b38 I want to extract info like this: chr1_169894240 chr1_169894240. I don't want the other info. I just want…
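The wanted part is the first two underscore-separated fields of each ID. The logic is sketched here in Python (the question asks for bash, so treat this as the same idea in another language). It assumes whitespace-separated columns where every ID starts with "chr"; other fields, such as the header, pass through, and output fields are joined with single spaces.

```python
def chrom_pos(variant_id: str) -> str:
    """'chr1_169894240_G_T_b38' -> 'chr1_169894240' (first two '_' fields)."""
    return "_".join(variant_id.split("_")[:2])


def trim_line(line: str) -> str:
    """Shorten every chr* field on a line; leave other fields unchanged."""
    return " ".join(chrom_pos(f) if f.startswith("chr") else f for f in line.split())


print(trim_line("chr1_169894240_G_T_b38 chr1_169894240_G_T_b38"))  # chr1_169894240 chr1_169894240
```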

Continue Reading bash script