Tag: fastq
UMI workflow resulting in bams with empty reads
Hello all, in my NGS workflow for UMI-based reads, I first identified and removed sequencing adapters using BBMerge and Cutadapt: BBMERGE -Xmx1g -ignorejunk in1=SAMPLE_R1 in2=SAMPLE_R2 outa=adapters.fa itn; CUTADAPT -a forward_adapter -A reverse_adapter -o s_2_1_sequence_trimmed_UN.fastq.gz -p s_2_2_sequence_trimmed_UN.fastq.gz SAMPLE_R1 SAMPLE_R2. Then, I converted the trimmed fastq files to an…
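The title mentions BAMs with empty reads. One thing worth checking is whether adapter trimming left zero-length reads behind. As far as I know, cutadapt keeps reads trimmed down to length zero unless a minimum length is set with its `-m`/`--minimum-length` option. The sketch below counts total and zero-length reads in a plain or gzipped FASTQ. It assumes strict 4-line records and is not a full validator.

```python
import gzip
import sys
from typing import Iterator, Tuple


def fastq_records(path: str) -> Iterator[Tuple[str, str, str, str]]:
    """Yield (header, sequence, plus, quality) from a plain or gzipped FASTQ.

    Assumes strict 4-line records; this is a sketch, not a full validator.
    """
    opener = gzip.open if path.endswith(".gz") else open
    with opener(path, "rt") as handle:
        while True:
            header = handle.readline()
            if not header:  # end of file
                return
            seq = handle.readline().rstrip("\n")
            plus = handle.readline().rstrip("\n")
            qual = handle.readline().rstrip("\n")
            yield header.rstrip("\n"), seq, plus, qual


def count_empty_reads(path: str) -> Tuple[int, int]:
    """Return (total_reads, zero_length_reads) for one FASTQ file."""
    total = empty = 0
    for _, seq, _, _ in fastq_records(path):
        total += 1
        if not seq:
            empty += 1
    return total, empty


if __name__ == "__main__":
    # Usage: python count_empty.py trimmed_R1.fastq.gz trimmed_R2.fastq.gz
    for fastq in sys.argv[1:]:
        total, empty = count_empty_reads(fastq)
        print(f"{fastq}\t{total} reads\t{empty} zero-length")
```

If the zero-length count is large, re-running the trimming step with a minimum length set is one option to try before the FASTQ-to-BAM conversion.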
Ubuntu Manpage: gt-encseq-encode – Encode sequence files (FASTA/FASTQ, GenBank, EMBL) efficiently.
Provided by: genometools_1.6.5+ds-2_amd64 NAME gt-encseq-encode – Encode sequence files (FASTA/FASTQ, GenBank, EMBL) efficiently. SYNOPSIS gt encseq encode sequence_file [sequence_file [sequence_file …]] DESCRIPTION -showstats [yes|no] show compression results (default: no) -ssp [yes|no] output sequence separator positions to file (default: yes) -des [yes|no] output sequence descriptions to file (default: yes) -sds [yes|no]…
Remote Software Quality Engineer III – Bioinformatics Job at Natera
JOB TITLE: Software Quality Engineer III – Bioinformatics LOCATION: Remote, USA PRIMARY RESPONSIBILITIES: Perform software verification, define and execute test cases and scenarios required for software quality assurance and regulatory compliance. Perform system analysis, assess risk, and develop strong test strategies by analyzing product design and technical specifications, and by…
Ubuntu Manpage: FastQC – high throughput sequence QC analysis tool
Provided by: fastqc_0.11.9+dfsg-5_all NAME FastQC – high throughput sequence QC analysis tool SYNOPSIS fastqc seqfile1 seqfile2 .. seqfileN fastqc [-o output dir] [--(no)extract] [-f fastq|bam|sam] [-c contaminant file] seqfile1 .. seqfileN DESCRIPTION FastQC reads a set of sequence files and produces from each one a quality control report consisting of…
problem while importing single-end demultiplexed with quality data in qiime2 – Technical Support
MarinaZ (Marina), February 8, 2024, 11:01am: Hello, I've been experiencing the following error: (1/1?) No such option: --imput-path (Possible options: --input-path, --output-path). I have single-end, demultiplexed data with quality scores, and I'm trying to import it into QIIME 2 to create a .qza file. The command that I used and gives…
Sperm-specific histone H1 in highly condensed sperm nucleus of Sargassum horneri
Insight Global hiring Bioinformatics Software Engineer in Tennessee, United States
Job Description: 1) Use Nextflow to build bioinformatics pipelines that take FASTQ or BAM files as input and process them using bioinformatic tools. 2) Write Python/R scripts to process, summarize, and visualize outputs created by other tools. 3) Ensure that the pipeline is modular and flexible, with the ability to…
FASTQ to FASTA Converter
About the tool The FASTA format is a text-based format for representing nucleotide or peptide sequences. The FASTQ format additionally includes the corresponding quality scores. This tool allows you to convert FASTQ files to FASTA. The resulting FASTA file will contain only the sequence data from the input FASTQ file….
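The conversion described above amounts to keeping the header and sequence lines of each 4-line FASTQ record and dropping the '+' and quality lines. A minimal sketch, assuming well-formed 4-line records with single-line sequences:

```python
import sys
from typing import Iterable, Iterator


def fastq_to_fasta(lines: Iterable[str]) -> Iterator[str]:
    """Turn 4-line FASTQ records into 2-line FASTA records; qualities are dropped."""
    it = iter(lines)
    for header in it:
        header = header.rstrip("\n")
        if not header:
            continue  # tolerate stray blank lines between records
        if not header.startswith("@"):
            raise ValueError(f"expected '@' header line, got {header!r}")
        try:
            seq = next(it).rstrip("\n")
            next(it)  # '+' separator line
            next(it)  # quality line (discarded)
        except StopIteration:
            raise ValueError(f"truncated record: {header!r}") from None
        yield f">{header[1:]}\n{seq}\n"


if __name__ == "__main__" and len(sys.argv) > 1:
    # Usage: python fq2fa.py reads.fastq > reads.fasta
    with open(sys.argv[1]) as fh:
        sys.stdout.writelines(fastq_to_fasta(fh))
```

The FASTA header keeps everything after the '@', including any description, so read names stay traceable back to the FASTQ.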
Special Episode 3: PhiX / UMIs / QC
Podcast: Explain Podcast. Published: 09.02.2024. Duration: 01:10:43. Getting the most out of Machines. Chapters: 04:30 PhiX, 14:30 low complexity, 19:30 UMIs, 32:10 FastQC, 43:00 MultiQC, 56:40 PycoQC. PhiX concentrations for loading a validation run: knowledge.illumina.com/instrumentation/general/instrumentation-general-reference_material-list/000001536 Dnatech on why UMIs are used: dnatech.genomecenter.ucdavis.edu/faqs/what-are-umis-and-why-are-they-used-in-high-throughput-sequencing/ BMH learning on UMIs: www.youtube.com/watch?v=sRPMsnhIBK0 FastQC for QC of…
invalid deflate data (invalid code lengths set)
I am trying to trim paired end reads using Trim-Galore. I have made sure that the files match based on the total reads processed in the output txt file from trim-galore. One of the files trimmed correctly but when I try some of the others the total written and quality…
Metagenomic analysis of Mesolithic chewed pitch reveals poor oral health among stone age individuals
The specific environmental/history/collection context: The Huseby Klev materials were unearthed and collected by archaeologists (including two of the co-authors of this article) during the excavation of this coastal hunter-fisher-gatherer site in the 1990s (ref. 50). The material assemblage was rich and well preserved: human bones, animal bones, plant remains and pieces of…
BWA Index Referencing Failed. Possible Reason and Solutions?
Script: echo "STEP 2: Map to reference using BWA-MEM"; #BWA index reference: bwa index ${ref} (the path for the ref variable has been defined); #BWA alignment: bwa mem -t 4 -R "@RG\tID:SRR062634\tPL:ILLUMINA\tSM:SRR062634" ${ref} ${reads}/SRR062634_1.filt.fastq.gz ${reads}/SRR062634_2.filt.fastq.gz > ${aligned_reads}/SRR062634.paired.sam Error: … [BWTIncConstructFromPacked]…
Comparison of capture-based mtDNA sequencing performance between MGI and illumina sequencing platforms in various sample types | BMC Genomics
fastq.gz.fastsanger.gz to fastq.gz in Galaxy and FastQC – usegalaxy.org support
Hi, is it possible to convert fastq.gz.fastsanger.gz to fastq.gz in Galaxy? The data is in fastq.gz.fastsanger.gz format and was generated by the NextSeq 2000. I downloaded the data from Galaxy and want to convert fastq.gz.fastsanger.gz to fastq.gz using Galaxy. I uploaded the dataset, clicked the pencil icon, and went into the Edit…
Reference genome, BWA and right algorithm
Hello, I'm using BWA to create the index for aligning some RNA-seq FASTQ files. First, I downloaded hg38.fa.align.gz from UCSC. Then I ran: gzip -d hg38.fa.align.gz; sudo apt-get install bwa. Here comes the problem. The BWA instructions recommend the bwtsw algorithm, but when I…
How to trim miRNA reads?
Hi there, I am new to bioinformatics. I am trying to prepare fasta.gz files for uploading to CPSS, a web server for miRNA-seq datasets. My data is from the Gene Expression Omnibus (GEO) database. The sample FASTA file looks like this: >SRR1658346.1 HISEQ1:187:D0NWFACXX:3:1101:2565:2050 length=51 ATCATACAAGGACAATTTCTTTTAACGTCGTATGCCGTCTTCTGCTTGNAA >SRR1658346.2 HISEQ1:187:D0NWFACXX:3:1101:2654:2232…
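The sample read above (length=51) contains the stretch TCGTATGCCGTCTTCTGCTTG followed by three trailing bases, which looks like 3' adapter read-through after a short insert. As a minimal sketch, treating that stretch as a hypothetical adapter sequence (the real one should be confirmed from the library kit), exact-match trimming looks like this. Dedicated trimmers such as cutadapt also tolerate sequencing errors inside the adapter, which this sketch does not.

```python
def trim_3prime_adapter(read: str, adapter: str, min_overlap: int = 3) -> str:
    """Remove a 3' adapter by exact matching (no mismatches allowed).

    1. If the full adapter occurs in the read, cut at its first occurrence.
    2. Otherwise, cut the longest adapter prefix (>= min_overlap bases)
       that hangs off the 3' end of the read.
    """
    pos = read.find(adapter)
    if pos != -1:
        return read[:pos]
    min_overlap = max(min_overlap, 1)
    for k in range(min(len(adapter) - 1, len(read)), min_overlap - 1, -1):
        if read.endswith(adapter[:k]):
            return read[:-k]
    return read


# Hypothetical adapter: the stretch visible in the sample read above.
ADAPTER = "TCGTATGCCGTCTTCTGCTTG"
read = "ATCATACAAGGACAATTTCTTTTAACGTCGTATGCCGTCTTCTGCTTGNAA"
print(trim_3prime_adapter(read, ADAPTER))  # ATCATACAAGGACAATTTCTTTTAACG
```

The partial-match branch matters for reads where only the first few adapter bases fit before the read ends; the `min_overlap` cutoff limits how often a short random match trims genuine sequence.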
Trying to understand STAR fastqLog.final.out File
Hello, I am analyzing ribo-seq data and am trying to understand whether my interpretation of STAR's log file is correct. I do not have extensive bioinformatics/computational experience, so it's been a bit difficult trying to understand how to proceed (the guides online are…
My paired end data became single end data after mapping
Dear community, something weird happened to me: my public dataset is obviously paired-end data (as stated in the 'metadata' part of the ENA database, and there are two separate fastq files (R1 & R2) and an index file (I1) per sequencing run). After…
The Biostar Herald for Tuesday, December 19, 2023
The Biostar Herald publishes user submitted links of bioinformatics relevance. It aims to provide a summary of interesting and relevant information you may have missed. You too can submit links here. This edition of the Herald was brought to you by contribution from Mensur Dlakic, Istvan Albert, and was edited…
Blazing the trail to empower agrigenomics research and conservation
Sequencing data for a single human genome, at 30× coverage, takes up to 70 gigabytes of storage. Illumina instruments produced 280 million gigabytes of data in 2021 alone, and by 2025, we’ll need storage capacity for 40 billion gigabytes—and that’s just for human genomes. The Genomics & Bioinformatics Service of…
‘Resources’ object has no attribute ‘tmpdir’
Snakemake error AttributeError: 'Resources' object has no attribute 'tmpdir'. I have built a Snakemake pipeline designed for paired-end reads. I made a trial with single-end reads and got this error. I am not sure it is related to the change of read design, and to…
Finding the paper published from the SRA run files
Hey folks, I have used this code to download my query: esearch -db sra -query '("BACTERIA_NAME"[Organism] OR BACTERIA_NAME[All Fields]) AND "BACTERIA_NAME"[orgn] AND ("strategy wgs"[Properties] AND "library layout paired"[Properties] AND "filetype fastq"[Properties])' | efetch -format runinfo -mode text > first_file.tsv…
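Runinfo output is, as far as I know, comma-separated despite the .tsv name used above, and it usually carries BioProject and Study_Pubmed_id columns. Grouping runs by BioProject and collecting any PubMed IDs gives a starting point for finding the associated paper. A sketch under those assumptions (check the header of your own file; an empty Study_Pubmed_id only means no link was recorded, not that no paper exists):

```python
import csv
import sys
from collections import defaultdict
from typing import Dict, Set, TextIO


def pubmed_ids_by_project(handle: TextIO) -> Dict[str, Set[str]]:
    """Group runinfo rows by BioProject and collect any linked PubMed IDs.

    Assumes a comma-separated runinfo table whose header includes the
    'Run', 'BioProject' and 'Study_Pubmed_id' columns.
    """
    projects: Dict[str, Set[str]] = defaultdict(set)
    for row in csv.DictReader(handle):
        if row.get("Run") in (None, "", "Run"):
            continue  # skip incomplete rows and repeated header lines
        ids = projects[row.get("BioProject") or "unknown"]
        pmid = (row.get("Study_Pubmed_id") or "").strip()
        if pmid:
            ids.add(pmid)
    return dict(projects)


if __name__ == "__main__" and len(sys.argv) > 1:
    # Usage: python runinfo_pmids.py first_file.tsv
    with open(sys.argv[1], newline="") as fh:
        for project, pmids in sorted(pubmed_ids_by_project(fh).items()):
            print(project, ",".join(sorted(pmids)) or "no PubMed ID recorded")
```

BioProject accessions without a PubMed ID can still be looked up on the NCBI BioProject page, which sometimes lists publications added later.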
SnapGene Version 7.1.1
SnapGene 7.1.1 was released on December 18, 2023. Fixes: Fix a regression that could result in searches for queries longer than 4000 bp failing. Ensure files with standard FASTA file extensions are opened as sequences regardless of whether they include a FASTA description. Fixed a crash that could occur when…
Potent latency reversal by Tat RNA-containing nanoparticle enables multi-omic analysis of the HIV-1 reservoir
Participants and blood collection A total of n = 23 HIV-1 seropositive individuals on stably suppressive ART were included in this study (Supplementary Table 1). Participants were recruited at Ghent University Hospital. 2/23 individuals are female, 21/23 are male; the limited representation of female individuals in our study is a direct reflection of…
bwa-mem reproducibility
I have a set of paired-end fastq files, and I run bwa-mem (v0.7.17-r1188) on them with exactly the same parameters, including the same number of threads, on two different computing clusters. I compare the BAM files produced using samtools stats, and the outputs are different…
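To see which numbers actually differ between the two clusters, it can help to compare only the summary section of samtools stats (the lines starting with SN) rather than the whole file, since the comment lines at the top record things such as the command line, which will differ anyway. A sketch, assuming the two stats outputs were saved as text files:

```python
import sys
from typing import Dict, Iterable, List, Tuple


def summary_numbers(lines: Iterable[str]) -> Dict[str, str]:
    """Collect the 'SN' (summary numbers) section of `samtools stats` output."""
    summary: Dict[str, str] = {}
    for line in lines:
        if not line.startswith("SN\t"):
            continue
        # e.g. "SN\treads mapped:\t12345" (an optional "\t# comment" may follow)
        fields = line.rstrip("\n").split("\t")
        summary[fields[1].rstrip(":")] = fields[2]
    return summary


def diff_summaries(a: Dict[str, str], b: Dict[str, str]) -> List[Tuple[str, str, str]]:
    """Return (metric, value_in_a, value_in_b) for every metric that differs."""
    return [
        (key, a.get(key, "NA"), b.get(key, "NA"))
        for key in sorted(set(a) | set(b))
        if a.get(key) != b.get(key)
    ]


if __name__ == "__main__" and len(sys.argv) > 2:
    # Usage: python compare_stats.py cluster1.stats cluster2.stats
    with open(sys.argv[1]) as f1, open(sys.argv[2]) as f2:
        for metric, v1, v2 in diff_summaries(summary_numbers(f1), summary_numbers(f2)):
            print(f"{metric}\t{v1}\t{v2}")
```

Knowing whether the differences are in mapping counts, mismatch rates or insert-size estimates narrows down where the two runs diverge.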
Single-cell RNA-seq workflow
In this tutorial we walk through a typical single-cell RNA-seq analysis using Bioconductor packages. We will try to cover data from different protocols, but some of the EDA/QC steps will be focused on the 10X Genomics Chromium protocol. We start from the output of the Cell Ranger preprocessing software. This…
How to setup the pipeline of the RNA-Seq FASTQ file processing (macOS version)
This is a guide to preparing RNA-Seq FASTQ files for import into Subio Platform on a Mac. If you use a Windows 10 machine, please go to the guide for Windows 10. Subio Platform uses the following tools to process RNA-Seq FASTQ files: fastp to trim adapters and filter low-quality…
Diversity and dissemination of viruses in pathogenic protozoa
Chromosome-level genome assembly of the Stoliczka’s Asian trident bat (Aselliscus stoliczkanus)
bcl2fastq troubleshooting all reads dumped to “Undetermined”
Hi everyone, Another lab ran a single-end sequencing run on a NextSeq for us, but now they can’t properly demultiplex them. I’m trying to see if I can figure it out. I run bcl2fastq (newest version) on the files, but all reads are dumped to Undetermined_S0_L001_R1_001.fastq.gz I’ve got a SampleSheet.csv…
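When every read lands in Undetermined, a quick check is to tally the index sequences written into the read headers of the Undetermined file and compare the most frequent ones with SampleSheet.csv, including their reverse complements (the i5 orientation differs between instruments and workflows). The sketch assumes CASAVA 1.8+ style headers, where the index is the last ':'-separated field of the header comment:

```python
import gzip
import sys
from collections import Counter
from itertools import islice
from typing import Iterable, Iterator, List, Tuple


def index_from_header(header: str) -> str:
    """Index field of a CASAVA 1.8+ style header, e.g.
    '@NB500000:1:HXXXXXXXX:1:11101:1000:1000 1:N:0:ATCACG' -> 'ATCACG'.
    Dual indexes usually appear as 'i7+i5'.
    """
    parts = header.rstrip("\n").split(" ", 1)
    return parts[1].split(":")[-1] if len(parts) == 2 else ""


def reverse_complement(seq: str) -> str:
    return seq.translate(str.maketrans("ACGTN", "TGCAN"))[::-1]


def headers(path: str) -> Iterator[str]:
    """Yield every 4th line (the header) of a gzipped FASTQ."""
    with gzip.open(path, "rt") as handle:
        for i, line in enumerate(handle):
            if i % 4 == 0:
                yield line


def top_indexes(header_lines: Iterable[str], n: int = 10) -> List[Tuple[str, int]]:
    return Counter(index_from_header(h) for h in header_lines).most_common(n)


if __name__ == "__main__" and len(sys.argv) > 1:
    # Usage: python top_indexes.py Undetermined_S0_L001_R1_001.fastq.gz
    # Only the first million reads are inspected, which is usually enough.
    for index, count in top_indexes(islice(headers(sys.argv[1]), 1_000_000)):
        revcomp = "+".join(reverse_complement(part) for part in index.split("+"))
        print(index, count, "revcomp:", revcomp)
```

If the top indexes match the sample sheet only after reverse-complementing one part, the sample sheet orientation is the first thing to revisit.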
DE Jobs – UPMC Bioinformatics Scientist in Pittsburgh, Pennsylvania, United States
UPMC Presbyterian is hiring a full-time Bioinformatics Scientist to support the Molecular & Genomic Pathology Lab! This role will be scheduled for daylight shifts, Monday-Friday. The Molecular & Genomic Pathology Laboratory is a dynamic, state-of-the-art clinical laboratory that prides itself on delivering the highest quality of patient care through cutting-edge…
Conserved and divergent gene regulatory programs of the mammalian neocortex
Nucleus preparation from frozen brain tissue for Chromium single-cell multiome ATAC and gene expression analysis M1 tissue was obtained from three human donors (male, aged 42, 29 and 58 years), three macaque donors (male, aged 6 (Macaca mulatta), 6 (M. mulatta) and 14 (Macaca fascicularis) years), three marmoset (Callithrix jacchus)…
Chromosome-level genome assembly of the Asian spongy moths Lymantria dispar asiatica
QIAseq miRNA 96 Index Kit IL UDI-B (96)
Gel-free miRNA Sample to Insight solution for differential expression analysis and novel discovery using next-generation sequencing. Features: Gel-free miRNA sequencing library prep from as little as 1 ng of total RNA; elimination of adapter dimers and unwanted RNA species, resulting in the highest fidelity and most efficient data; integrated Unique…
Indigenous Australian genomes show deep structure and rich novel variation
Inclusion and ethics: The DNA samples analysed in this project form part of a collection of biospecimens, including historically collected samples, maintained under Indigenous governance by the NCIG (ref. 11) at the John Curtin School of Medical Research at the Australian National University (ANU). NCIG, a statutory body within ANU, was founded…
Single-cell DNA methylome and 3D multi-omic atlas of the adult mouse brain
Mouse brain tissues All experimental procedures using live animals were approved by the Salk Institute Animal Care and Use Committee under protocol number 18-00006. Adult (P56) C57BL/6J male mice were purchased from the Jackson Laboratory at 7 weeks of age and maintained in the Salk animal barrier facility on 12-h dark–light…
Running cellranger on multiple files when R1 and R2 fastqs are in multiple subfolders
Hi, I have downloaded a public dataset and would like to run cellranger count on it. The main folder is CNP000460, containing multiple samples such as CNS0094872; this sample folder has several associated subfolders, each…
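If the files follow the bcl2fastq naming convention (SampleName_S1_L001_R1_001.fastq.gz), one approach is to collect, for each sample, every subfolder that holds its FASTQs and pass them as a comma-separated list to cellranger count via --fastqs together with --sample (check cellranger count --help for your version). Public datasets are often renamed, so the regular expression below is an assumption to adapt:

```python
import os
import re
import sys
from collections import defaultdict
from typing import Dict, Iterable, Iterator, Set

# Assumed bcl2fastq-style names, e.g. CNS0094872_S1_L001_R1_001.fastq.gz
# (the _L00N lane part is optional).
FASTQ_NAME = re.compile(
    r"^(?P<sample>.+?)_S\d+(?:_L\d{3})?_(?P<read>[RI][12])_001\.fastq\.gz$"
)


def walk_files(root: str) -> Iterator[str]:
    """Yield every file path below root, at any depth."""
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            yield os.path.join(dirpath, name)


def fastq_dirs_by_sample(paths: Iterable[str]) -> Dict[str, Set[str]]:
    """Map each sample prefix to the set of folders holding its FASTQ files."""
    dirs: Dict[str, Set[str]] = defaultdict(set)
    for path in paths:
        match = FASTQ_NAME.match(os.path.basename(path))
        if match:
            dirs[match.group("sample")].add(os.path.dirname(path))
    return dict(dirs)


if __name__ == "__main__" and len(sys.argv) > 1:
    # Usage: python fastq_dirs.py CNP000460
    for sample, folders in sorted(fastq_dirs_by_sample(walk_files(sys.argv[1])).items()):
        print(f"{sample}\t{','.join(sorted(folders))}")
```

Each printed line gives a --sample value and a comma-separated --fastqs value; files that do not match the pattern are silently ignored, so it is worth checking that every expected sample appears.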
ubuntu – Medaka: unrecognized command ‘tools’ and samtools not found
When trying to run medaka_consensus on Ubuntu, I get the following error. I installed medaka into a virtualenv to run it on Ubuntu. (medaka) ubuntu:~/medaka$ medaka_consensus -i combined.fastq -d curated.fasta -t -o ~/medaka 10 -m r941_sup_plant_g610 TF_CPP_MIN_LOG_LEVEL is set to '3' [main] unrecognized command 'tools' Attempting to automatically select model version…
Merge overlapping paired end reads from BAM file.
Hi everyone, Using Trimmomatic and then HISAT2, I have aligned 300 RNA fastq samples (NovaSeq6000, RNA sequencing, paired-end, 150bp sequencing). I have found a percentage of overlapping paired end reads (read through) in the 300 .bam files. I found the overlaps…
Construction of a risk stratification model integrating ctDNA to predict response and survival in neoadjuvant-treated breast cancer | BMC Medicine
Long PCRSeq – Microsynth – CH
Explore expanded possibilities with Microsynth’s Long PCRSeq, leveraging the cutting-edge long-read sequencing technology from Oxford Nanopore Technologies (ONT) to sequence clonal linear DNA ranging from 600 bp to 50 kb in length. Conveniently accessible for samples in tubes and 96-well plates, this service builds upon the capabilities of…
What is the troubleshoot for this error: conversion of .SRA to FASTA file on command prompt?
I am getting this error message after using the following code: C:\sratoolkit.3.0.7-win64\sratoolkit.3.0.7-win64\bin>fastq-dump --fasta SRR1658345 Error: 2023-12-11T06:08:04 fastq-dump.3.0.7 err: timeout exhausted while waiting condition within process system module - failed SRR1658345 ============================================================= An error occurred during processing. A report was generated into the file 'C:\Users\Hp/ncbi_error_report.txt'. If the problem persists, you may…
Running STAR aligner on paired-end reads as single-end read
Hi, I am just curious. Can we use the STAR aligner to align paired-end reads as single-end reads? What are the consequences for the output of such an alignment? Yes, STAR…
Search a read by its name in a big fastq.gz file
Dear All, How can I search for a read by part of its name in a big fastq.gz file (size around 13GB)? For example, I would like to search for a read name containing the "VH01677:31:AACCMFHHV:1:1101:6586:25290" string in…
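On the command line, zcat reads.fastq.gz | grep -A 3 "VH01677:31:AACCMFHHV:1:1101:6586:25290" prints each matching line plus the three lines after it. The same idea as a Python sketch, which streams the file instead of decompressing it to disk and only tests header lines (assuming strict 4-line records):

```python
import gzip
import sys
from typing import Optional


def find_read(path: str, name_fragment: str) -> Optional[str]:
    """Stream a FASTQ (gzipped or not) and return the first 4-line record
    whose header line contains name_fragment, or None if nothing matches.

    Only every 4th line is checked, so sequence or quality lines that happen
    to contain the fragment are ignored.
    """
    opener = gzip.open if path.endswith(".gz") else open
    with opener(path, "rt") as handle:
        for i, line in enumerate(handle):
            if i % 4 == 0 and name_fragment in line:
                # next(handle, "") avoids StopIteration on a truncated file
                return line + "".join(next(handle, "") for _ in range(3))
    return None


if __name__ == "__main__" and len(sys.argv) > 2:
    # Usage: python find_read.py reads.fastq.gz VH01677:31:AACCMFHHV:1:1101:6586:25290
    record = find_read(sys.argv[1], sys.argv[2])
    sys.stdout.write(record if record is not None else "not found\n")
```

The function stops at the first match, so on average it reads only part of a 13 GB file; a name that is absent still requires a full pass.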
Snakemake rule error
I have the following rule in snakemake: rule low_coverage_contig_reads: input: bam="data/processed/bam_files/bam/{sample}_{fraction}.bam.bai", output: r1="data/processed/clean_reads/low_cov/low_cov_{sample}_{fraction}_R1.fq.gz", r2="data/processed/clean_reads/low_cov/low_cov_{sample}_{fraction}_R2.fq.gz" threads: 8 params: bam="data/processed/bam_files/bam/{sample}_{fraction}.bam" log: log1="logs/{sample}_{fraction}_low_coverage_reads.log", shell: """ (samtools coverage {params.bam} | awk 'NR > 1 && $7 < 10 {{print $1}}' | tr '\\n' ' ' | samtools view -u {params.bam}…
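In the rule above, samtools coverage writes one line per contig with the columns #rname, startpos, endpos, numreads, covbases, coverage, meandepth, meanbaseq and meanmapq, so awk's $7 < 10 keeps contigs whose mean depth is below 10 (the doubled braces in {{print $1}} are Snakemake's escaping of literal braces in shell strings). A Python sketch of the same filter, handy for testing the threshold outside Snakemake:

```python
import fileinput
import sys
from typing import Iterable, List


def low_depth_contigs(lines: Iterable[str], max_depth: float = 10.0) -> List[str]:
    """Contigs whose meandepth (7th column) is below max_depth.

    Mirrors: samtools coverage in.bam | awk 'NR > 1 && $7 < 10 {print $1}'
    """
    contigs = []
    for line in lines:
        if not line.strip() or line.startswith("#"):
            continue  # skip the header line, like awk's NR > 1
        fields = line.rstrip("\n").split("\t")
        if float(fields[6]) < max_depth:
            contigs.append(fields[0])
    return contigs


if __name__ == "__main__" and len(sys.argv) > 1:
    # Usage: samtools coverage sample.bam | python low_depth.py -
    # Prints space-separated names, like the `tr '\n' ' '` step in the rule.
    print(" ".join(low_depth_contigs(fileinput.input(sys.argv[1:]))))
```

If this prints nothing for a sample, the downstream samtools view in the rule receives no regions, which is worth ruling out when debugging the rule.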
MetaSPAdes genome assembly (shotgun metagenome singleend) – usegalaxy.org support
Busrak, December 11, 2023, 3:31pm: Hi friends, I trimmed metagenomic single-end raw data with Cutadapt using these parameters: maximum error rate 0.3; match times 1; minimum overlap length 3; minimum length 15; max N 0.3; max expected errors 30. Then I aligned the reads to the Gallus gallus host genome with the BBMap tool. I want to assemble the unmapped reads. However, MetaSPAdes…
What apps can read large FASTQ files on Windows?
I have fastq files of around 12 GB in size. I have tried opening them with Sublime Text and Atom, but they are not able to read them. What apps can read really large FASTQ files on Windows?
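General-purpose text editors typically try to load the whole file into memory, which is the likely reason a 12 GB FASTQ will not open. Streaming the file and looking only at the first few records avoids that. A minimal Python sketch (Python runs on Windows; plain or gzipped input and 4-line records assumed):

```python
import gzip
import sys
from itertools import islice
from typing import List


def open_fastq(path: str):
    """Open plain or gzipped FASTQ as text without loading it into memory."""
    return gzip.open(path, "rt") if path.endswith(".gz") else open(path)


def head_fastq(path: str, n_reads: int = 5) -> List[str]:
    """Return the lines of the first n_reads records (4 lines per record)."""
    with open_fastq(path) as handle:
        return list(islice(handle, 4 * n_reads))


def count_reads(path: str) -> int:
    """Count records by streaming line by line (assumes 4-line records)."""
    with open_fastq(path) as handle:
        return sum(1 for _ in handle) // 4


if __name__ == "__main__" and len(sys.argv) > 1:
    # Usage: python peek_fastq.py big.fastq.gz
    sys.stdout.writelines(head_fastq(sys.argv[1]))
    print("reads:", count_reads(sys.argv[1]))
```

Memory use stays small in both functions because only one line is held at a time; counting still has to read the whole file, so it takes a while on 12 GB.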
Are interleaved FASTQ files the same as interlaced FASTQ files?
There are errors in BWA-MEM2…
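As far as I know, "interleaved" and "interlaced" name the same layout: a single FASTQ in which each R1 record is immediately followed by its R2 mate (SPAdes, for example, describes the input of its --12 option as interlaced). bwa mem reads such a file as pairs when given -p. The sketch below checks that a file really alternates mates and splits it in two; it holds everything in memory, so it is only meant for small test files:

```python
from typing import Iterable, List, Tuple

Record = Tuple[str, ...]  # (header, sequence, '+', quality)


def read_records(lines: Iterable[str]) -> List[Record]:
    """Group stripped lines into 4-line records."""
    items = [line.rstrip("\n") for line in lines]
    if len(items) % 4:
        raise ValueError("line count is not a multiple of 4")
    return [tuple(items[i:i + 4]) for i in range(0, len(items), 4)]


def base_name(header: str) -> str:
    """'@read7/1 1:N:0:ACGT' -> 'read7' (drops the comment and a /1 or /2 suffix)."""
    name = header[1:].split()[0]
    return name[:-2] if name.endswith(("/1", "/2")) else name


def deinterleave(records: List[Record]) -> Tuple[List[Record], List[Record]]:
    """Split alternating R1/R2 records, checking that each pair shares a name."""
    if len(records) % 2:
        raise ValueError("odd number of records: not an interleaved pair file")
    r1, r2 = records[0::2], records[1::2]
    for a, b in zip(r1, r2):
        if base_name(a[0]) != base_name(b[0]):
            raise ValueError(f"mates out of order: {a[0]} vs {b[0]}")
    return r1, r2
```

A name mismatch reported here usually means the file is not interleaved at all, or that reads were filtered from one mate file but not the other before interleaving.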
SRA toolkit (NCBI) – sra to fasta
Dear all, at the moment I'm trying to download sequences from the Sequence Read Archive (SRA) at NCBI and put them into FASTA format. For this, I downloaded NCBI's SRA Toolkit and used the following code: set PATH=%PATH%;C:\Users\Admin\Desktop\sratoolkit.2.9.0-win64\sratoolkit.2.9.0-win64\bin prefetch --max-size 100000000…
PacBio subreads.fastq files?
I have downloaded PacBio Iso-Seq data in subreads.fastq format from NCBI. Most Iso-Seq analysis tools require PacBio .bam files as input, which are unavailable from NCBI. I want to perform differential gene expression analysis and alternative splicing analysis. I am confused about the nature…
Insert Size for Illumina GAIIx Paired-End Library from SAM/BAM File
From the fastq data (read 1 and read 2) from the Illumina GAIIx platform (paired-end library), I created SAM and BAM files using BWA. I got the statistics for the number of uniquely paired reads and the total reads mapped to…
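Column 9 of a SAM record (TLEN) holds the observed template length for paired reads, which is the usual starting point for an insert-size estimate. A sketch that takes |TLEN| once per properly paired template (first mate only, skipping secondary and supplementary records) from samtools view output; the flag values are from the SAM specification, and the median is just one reasonable summary:

```python
import fileinput
import sys
from statistics import median
from typing import Iterable, List

PROPER_PAIR = 0x2
FIRST_IN_PAIR = 0x40
SECONDARY = 0x100
SUPPLEMENTARY = 0x800


def insert_sizes(sam_lines: Iterable[str]) -> List[int]:
    """Collect |TLEN| once per properly paired template from SAM text."""
    sizes = []
    for line in sam_lines:
        if line.startswith("@"):
            continue  # header line
        fields = line.split("\t")
        flag, tlen = int(fields[1]), int(fields[8])
        if flag & (SECONDARY | SUPPLEMENTARY):
            continue
        if flag & PROPER_PAIR and flag & FIRST_IN_PAIR and tlen != 0:
            sizes.append(abs(tlen))
    return sizes


if __name__ == "__main__" and len(sys.argv) > 1:
    # Usage: samtools view sample.bam | python insert_size.py -
    sizes = insert_sizes(fileinput.input(sys.argv[1:]))
    if sizes:
        print(f"templates: {len(sizes)}  median insert size: {median(sizes)}")
```

TLEN spans from the leftmost to the rightmost mapped base of the pair, so soft-clipped or trimmed ends make it slightly shorter than the physical fragment.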
Resin acids play key roles in shaping microbial communities during degradation of spruce bark
Bark preparation Spruce bark was obtained from the Iggesund pulp and paper mill (Iggesund, Holmen AB, Sweden), from a bark pile resulting from stripping of spruce logs at the mill after harvest, with the average age of trees at harvest being ~70 years. The bark was left to dry at…
3 Simple Ways to Download FASTQ files | by Vijini Mallawaarachchi | The Computational Biology Magazine | Dec, 2023
A detailed overview of 3 ways to download FASTQ files of SRA runs from NCBI As bioinformaticians, the National Center for Biotechnology Information (NCBI) is one of the most important resources we use to get data. NCBI plays a crucial role in our research community due to its extensive databases…
Haplotype-resolved genome of heterozygous African cassava cultivar TMEB117 (Manihot esculenta)
DNA polymerases in precise and predictable CRISPR/Cas9-mediated chromosomal rearrangements | BMC Biology
Cell culture The human endometrial carcinoma HEC-1-B cells were cultured in the modified Eagle’s medium (MEM) supplemented with 10% fetal bovine serum (FBS) and 1% penicillin-streptomycin at 37°C in a 5% (v/v) CO2 incubator. The human embryonic kidney HEK293T cells were cultured in the Dulbecco’s modified Eagle’s medium (DMEM) supplemented…
Java class error message when using BBDuk
I am trying to run BBDuk to quality-trim and filter my Illumina whole-genome sequences. I have used other trimming scripts before and have not had a problem, although this is my first time preprocessing sequencing data from QuantSeq samples. I…
Ambiguous genes due to aligners and their impact on RNA-seq data analysis
Datasets: To avoid the so-called 'dataset bias' (ref. 20), whereby some datasets are generated with specific structures and the results are therefore 'over-optimistic' (when working with our novel method), we performed the analysis on several real datasets (see Table 4). We used four different datasets from the NCBI…
Metagenome Sequencing – SeqCenter, LLC
Microbiomes contain diverse microbial communities whose interactions not only impact the system residents, but also have profound effects on the ecology, chemistry, and health of their shared host or environment. Capturing the interactions among microbes is essential as we expand our understanding of host-microbe dynamics, evaluate environmental degradation or remediation,…
sam – Discrepancy in Read Counts Between FastQ and BAM Files in Adapter-Trimmed Pipeline
In a FastQ to BAM pipeline where only adapter trimming is performed, I’ve noticed a potential discrepancy in read counts between the initial FastQ files and their resulting BAM file. Specifically, I’m seeking clarification on whether the following statement holds true: “Total number of reads in R1 and R2 FastQ…
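Whether R1 + R2 read counts equal the BAM record count depends on what the BAM contains: secondary (flag 0x100) and supplementary (0x800) records add extra lines for reads that are already present, and some aligners drop unmapped reads unless told to keep them. Counting only primary records (what samtools view -c -F 0x900 reports) is the comparison that should match when no reads were filtered. A sketch:

```python
import fileinput
import sys
from typing import Iterable

SECONDARY = 0x100
SUPPLEMENTARY = 0x800


def count_primary_records(sam_lines: Iterable[str]) -> int:
    """Count SAM records that are neither secondary nor supplementary."""
    n = 0
    for line in sam_lines:
        if line.startswith("@"):
            continue  # header
        flag = int(line.split("\t", 2)[1])
        if not flag & (SECONDARY | SUPPLEMENTARY):
            n += 1
    return n


def count_fastq_reads(lines: Iterable[str]) -> int:
    """Reads in a FASTQ stream, assuming strict 4-line records."""
    return sum(1 for _ in lines) // 4


if __name__ == "__main__" and len(sys.argv) > 3:
    # Usage: samtools view sample.bam | python primary_count.py R1_COUNT R2_COUNT -
    r1, r2 = int(sys.argv[1]), int(sys.argv[2])
    primary = count_primary_records(fileinput.input(sys.argv[3:]))
    print(f"R1+R2={r1 + r2} primary BAM records={primary} match={r1 + r2 == primary}")
```

If the primary count is lower than R1 + R2, the aligner's handling of unmapped reads is the first setting to check.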
Illumina RNA Sequencing – SeqCenter, LLC
RNA, the intermediate stage in the flow of genetic information from DNA to protein, contains many systems of expression and regulation that can be studied through RNA sequencing. While whole genome sequencing reveals information at the DNA level, next generation RNA sequencing (RNA-seq) characterizes the transcriptome and provides additional context…
Very low successfully assigned alignments with feature counts
Hello everyone, I am stuck trying to analyze some single-end RNAseq data from human tissue. My issue is that the alignment with HISAT2 went very well: 94.95% overall alignment rate. However, when I use featureCounts, I get: 5.7% when I set the strandSpecific parameter to 1. 5.3% when I…
Cell ranger multi for demultiplexing FB files and GEX files
I have my feature CSV file, which is a multiplexing capture; I have feature barcode fastq files and GEX fastq files, and I was trying to run cellranger multi, but this error shows up ([error] Deplex Error: No valid tags were…
Sorted bam files are empty after sorting them from bam
Hi, I have been processing all my DNA analysis files in parallel, but I got to a point where about 15 files got stuck on one step. Specifically, I noticed something was wrong because the files…
Fetching subsets with slow5curl and samtools
GitHub repository GenTechGp/gtgseq, docs/slow5curl.md. Sections: Installing necessary tools; Example: Fetching a subset of reads; Example: Fetching and basecalling a subset of…
The role of APOBEC3B in lung tumor evolution and targeted cancer therapy resistance
Cell line and growth assays Cell lines were grown in Roswell Park Memorial Institute-1640 medium (RPMI-1640) with 1% penicillin–streptomycin (10,000 U ml−1) and 10% FBS or in Iscove’s modified Dulbecco’s medium (IMDM) with 1% penicillin–streptomycin (10,000 U ml−1), l-glutamine (200 mM) and 10% FBS in a humidified incubator with 5% CO2 maintained at 37 °C. Drugs…
Borrelia puertoricensis in opossums (Didelphis marsupialis) from Colombia | Parasites & Vectors
Process Truncated fastq file
Dear all, I have 150 bp paired-end mRNA data. For one sample, the FastQC run on the reverse reads (R2) file got up to 95% and then failed with an error message: Failed to process file Sample1-mRNA_R2.fastq.gz uk.ac.babraham.FastQC.Sequence.SequenceFormatException: Ran out of data in the middle…
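"Ran out of data in the middle" usually points to a truncated file, for example from an interrupted download or copy. gzip -t reports whether a .gz stream is intact; the sketch below goes one step further and reports the first record where a FASTQ stops being well-formed (ends mid-record, missing '@' or '+', or mismatched sequence and quality lengths). It assumes 4-line records:

```python
import gzip
import sys
import zlib
from typing import Optional


def check_fastq(path: str) -> Optional[str]:
    """Return None if the file parses as complete 4-line FASTQ records,
    otherwise a short description of the first problem found."""
    opener = gzip.open if path.endswith(".gz") else open
    n = 0
    try:
        with opener(path, "rt") as handle:
            while True:
                header = handle.readline()
                if not header:
                    return None  # clean end of file
                n += 1
                seq, plus, qual = handle.readline(), handle.readline(), handle.readline()
                if not qual:
                    return f"record {n}: file ends mid-record"
                if not header.startswith("@") or not plus.startswith("+"):
                    return f"record {n}: malformed '@' header or '+' line"
                if len(seq.rstrip("\n")) != len(qual.rstrip("\n")):
                    return f"record {n}: sequence and quality lengths differ"
    except (EOFError, zlib.error, OSError) as exc:
        # EOFError: gzip stream cut short; zlib.error/OSError: corrupt or unreadable file
        return f"record {n}: read error ({exc})"


if __name__ == "__main__":
    # Usage: python check_fastq.py Sample1-mRNA_R2.fastq.gz
    for path in sys.argv[1:]:
        print(path, check_fastq(path) or "OK")
```

If the file turns out to be truncated, re-transferring it and comparing checksums (for example with md5sum) against the source is usually the next step.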
4 Fastq files for a single run generated by 10X
Hello, I have a question about 10X-generated FASTQ files. As far as I know, 10X platforms can generate up to 4 FASTQ files: R1, R2, I1 and I2. I need to use the FASTQ files and align them with…
mouse+human cells in 10x scRNA-seq
Hi everyone, I'm analyzing 10x scRNA-seq data generated from xenografts (mouse + human tissues). I have the following workflow to label cells as either mouse or human: Align 10x scRNA-seq data to mouse+human combined genome using cellranger count. Use the file generated by cellranger…
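In a combined reference, gene names typically carry a genome-specific prefix, so one way to finish the labeling step is to compare, per cell barcode, the fraction of UMI counts falling on human versus mouse genes. The prefixes, barcodes and the 0.9 cutoff below are hypothetical placeholders; check the gene names in your reference and choose the threshold from your own data:

```python
from typing import Dict


def label_cell(counts: Dict[str, int], human_prefix: str, mouse_prefix: str,
               min_fraction: float = 0.9) -> str:
    """Label one barcode as 'human', 'mouse', 'mixed' or 'no_counts'.

    counts maps gene name -> UMI count for that barcode. The 0.9 cutoff is
    a hypothetical default, not a Cell Ranger setting.
    """
    human = sum(v for gene, v in counts.items() if gene.startswith(human_prefix))
    mouse = sum(v for gene, v in counts.items() if gene.startswith(mouse_prefix))
    total = human + mouse
    if total == 0:
        return "no_counts"
    if human / total >= min_fraction:
        return "human"
    if mouse / total >= min_fraction:
        return "mouse"
    return "mixed"


if __name__ == "__main__":
    # Toy data with hypothetical barcodes and gene-name prefixes.
    cells = {
        "AAACCTGAGAAACCAT-1": {"GRCh38_ACTB": 95, "mm10_Actb": 5},
        "AAACCTGAGAAACCGC-1": {"GRCh38_ACTB": 40, "mm10_Actb": 60},
    }
    for barcode, counts in cells.items():
        print(barcode, label_cell(counts, "GRCh38_", "mm10_"))
```

Keeping a "mixed" label rather than forcing every barcode into one species makes likely cross-species doublets or ambient-RNA-heavy cells easy to set aside.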
Bam files generated with STAR cause a segmentation fault core dump error when used with another tool
I am mapping RNA-Seq data using STAR with multi-sample two-pass mapping. I first mapped all samples with one pass, then concatenated their SJ.out.tab files and filtered the junctions. I launched the second mapping using this filtered junction file. I used this command to generate the genome: /home/STAR-2.7.10b/bin/Linux_x86_64/STAR \ --runThreadN 10 \…
BIRCH Tutorial – Genome Assembly
References: FastQC: www.bioinformatics.babraham.ac.uk/projects/fastqc/; trim_galore User's Guide; trim_galore manual; SeqKit. 0. Obtain sequencing read files. DATASET: Fakankun I et al. Ph.D. thesis, University of Manitoba (in progress), Rhodosporidium diobovatum. This dataset is a random sample of about 5% of the reads from fungal genomic DNA. raw read…
Yes .. BBMap can do that!
NOTE: This collection was originally posted at SeqAnswers.com. Creating a copy here to preserve the information. Part I is available here: Yes .. BBMap can do that! – Part I: bbmap (aligner), bbduk (scan/trim), repair (fix PE reads) and reformat (format conversions). Part II is available here: Yes .. BBMap can…
Salpa genome and developmental transcriptome analyses reveal molecular flexibility enabling reproductive success in a rapidly changing environment
Common analysis of direct RNA sequencinG CUrrently leads to misidentification of m5C at GCU motifs
Introduction Oxford Nanopore Technologies (ONT) direct RNA sequencing (Fig 1A) enables detection of RNA modifications. A modified base produces an altered electrical current and/or dwell time relative to a canonical base that can be detected with algorithms (Garalde et al, 2018; Smith et al, 2019; Workman et al, 2019). Figure…
Problematic fastq files…How can we trust them?
Hello fellas, a week ago I made another post about an error I was getting while trying to run BBDuk on a number of fastq files. In that case, some records were missing the “+” separator line. After looking a…
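Records with a missing “+” line can be located before running BBDuk with a small structural check. This is a minimal Python sketch that assumes strict four-line FASTQ records (no multi-line sequences). It is not a feature of BBDuk, and `validate_fastq` is a hypothetical name.

```python
from itertools import islice


def validate_fastq(handle):
    """Check a FASTQ stream record by record, assuming strict 4-line records.

    Returns a list of (record_index, problem) tuples; an empty list means
    these checks found nothing wrong.
    """
    problems = []
    lines = iter(handle)  # islice must consume one shared iterator
    record = 0
    while True:
        chunk = [line.rstrip("\r\n") for line in islice(lines, 4)]
        if not chunk:
            break  # clean end of input
        if len(chunk) < 4:
            problems.append((record, "truncated record at end of input"))
            break
        header, seq, plus, qual = chunk
        if not header.startswith("@"):
            problems.append((record, "header line does not start with '@'"))
        if not plus.startswith("+"):
            problems.append((record, "separator line does not start with '+'"))
        if len(seq) != len(qual):
            problems.append((record, "sequence and quality lengths differ"))
        record += 1
    return problems
```

For gzipped input, pass a handle opened with `gzip.open(path, "rt")`. The check works by line position rather than by searching for the next '@'. That is deliberate, because a quality string may itself begin with '@'.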
Salmon indices differences
I am trying to run Salmon locally on prostate cancer samples, and I used this command: salmon quant -i data/2230c535660fb4774114bfa966a62f823fdb6d21acf138d4/salmon_sa_index/default -l A -1 SRR21898893_1.fastq.gz -2 SRR21898893_2.fastq.gz --validateMappings --gcBias -o transcripts_quant I downloaded the pre-built versions of the partial decoy (salmon_partial_sa_index) and full decoy (salmon_sa_index) indices from…
Extracting chimeric reads from mapping
Hello, I am struggling to process and analyse BAM files (from a bwa alignment) to extract the chimeric read alignments. I am aligning human cell line RNA-seq data (paired-end) to a virus, aiming to find the viral integration sites in the genome. For that, after reading a bit here from following…
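bwa mem reports the other pieces of a split (chimeric) read as supplementary alignments (flag 0x800). It lists them in the SA:Z tag of each piece, so filtering on that tag is one way to pull chimeric alignments out. This is a minimal Python sketch that assumes plain-text SAM input, for example the output of `samtools view`. The function name `chimeric_records` and the output fields are hypothetical.

```python
def chimeric_records(sam_lines):
    """Yield alignments that carry an SA:Z tag, i.e. pieces of split reads.

    Input is plain-text SAM; header lines (starting with '@') are skipped.
    """
    for line in sam_lines:
        if line.startswith("@"):
            continue
        fields = line.rstrip("\n").split("\t")
        if len(fields) < 11:
            continue  # not a complete SAM record
        flag = int(fields[1])
        # Optional tags start at column 12 (index 11)
        sa_value = next((tag[5:] for tag in fields[11:] if tag.startswith("SA:Z:")), None)
        if sa_value is None:
            continue
        # SA:Z holds ';'-terminated entries: rname,pos,strand,CIGAR,mapQ,NM
        others = [entry for entry in sa_value.split(";") if entry]
        yield {
            "qname": fields[0],
            "rname": fields[2],
            "pos": int(fields[3]),
            "supplementary": bool(flag & 0x800),
            "sa_entries": others,
            "sa_refs": [entry.split(",")[0] for entry in others],
        }
```

Keeping records whose `rname` differs from the references in `sa_refs` would isolate reads split across two sequences, such as a host chromosome and a viral contig. This is only possible if both are present in the alignment reference.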
Longitudinal detection of circulating tumor DNA
Analysis of Roche KAPA Target Enrichment kit experimental data obtained on an Illumina sequencing system is most frequently performed using a variety of publicly available, open-source analysis tools. The typical variant calling analysis workflow consists of sequencing read quality assessment, read filtering, mapping against the reference genome, duplicate removal, coverage…
ESRP1 controls biogenesis and function of a large abundant multiexon circRNA | Nucleic Acids Research
Abstract While the majority of circRNAs are formed from infrequent back-splicing of exons from protein coding genes, some can be produced at quite high level and in a regulated manner. We describe the regulation, biogenesis and function of circDOCK1(2–27), a large, abundant circular RNA that is highly regulated during epithelial-mesenchymal…
bwa mem hangs after a few thousand reads
I am trying to align a bunch of paired sample fastq files using bwa mem. My original command was: bwa mem -t 8 hg38.fa sample_read1.fq.gz sample_read2.fq.gz > sample_paired.sam I am running this on an HPC cluster. These files have approx. 25 million reads, so I initially anticipated that they might…
How to get unaligned reads and aligned reads into separate files from SAM/BAM?
I have long reads aligned with minimap2 in a SAM file. I want to get my unmapped reads into a file called unmapped.fastq.gz and my aligned reads into a file called mapped.fastq.gz. How can…
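The split hinges on SAM flag bit 0x4 (segment unmapped). The usual command-line route is `samtools fastq` with the flag filters `-f 4` or `-F 4`. For illustration, here is a minimal Python sketch of the same logic on plain-text SAM. `split_mapped_unmapped` and the placeholder quality character are hypothetical choices, not part of minimap2 or samtools.

```python
# Complement table for restoring reverse-strand reads to their sequenced orientation
COMPLEMENT = str.maketrans("ACGTNacgtn", "TGCANtgcan")


def split_mapped_unmapped(sam_lines, placeholder_qual="I"):
    """Split primary SAM records into mapped and unmapped FASTQ strings."""
    mapped, unmapped = [], []
    for line in sam_lines:
        if line.startswith("@"):
            continue  # header line
        fields = line.rstrip("\n").split("\t")
        flag = int(fields[1])
        if flag & 0x900:
            continue  # skip secondary (0x100) and supplementary (0x800) so each read appears once
        name, seq, qual = fields[0], fields[9], fields[10]
        if flag & 0x10:
            # SAM stores reverse-strand reads reverse-complemented
            seq = seq.translate(COMPLEMENT)[::-1]
            qual = qual[::-1]
        if qual == "*":
            # Hypothetical placeholder when no qualities were stored (e.g. FASTA input)
            qual = placeholder_qual * len(seq)
        record = f"@{name}\n{seq}\n+\n{qual}\n"
        (unmapped if flag & 0x4 else mapped).append(record)
    return mapped, unmapped
```

The two lists can then be written with `gzip.open("unmapped.fastq.gz", "wt")` and `gzip.open("mapped.fastq.gz", "wt")`. Skipping secondary and supplementary records keeps each read in exactly one output file.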
Senior Bioinformatics Software Engineer – Land A Remote Job From Top Employers
The Center for Applied Bioinformatics (CAB) at the St. Jude Children’s Research Hospital (SJCRH) is seeking a creative Software Engineer with a strong background in bioinformatics to join our development team to create and maintain our vital analytical infrastructure. The new hire will work closely with a team of computer…
Generate Read counts from bam file
Currently I am working on a project related to LHON disease (a rare mitochondrial disorder that leads to progressive visual loss). I have 9 RNA-seq fastq files, of which 3 are from carriers, 3 from affected individuals and 3 from controls. Data downloaded is…
bam or VCF files from GSE75010
Hi all, I’m planning to run a variant calling analysis using the microarray data GSE75010, which contains GSE75010_RAW.tar and GSE75010_complete_dataset.csv.gz. I used to download the .fastq files using SRA Run numbers through Ubuntu/Linux to get .bam and VCF files. However, this is not the…
Discrepancy in total number of bases in trimmed read1 and read2 files after BBDuk
Hi all, after performing adapter trimming with bbduk.sh, I found from the FastQC quality check that the total number of bases in the read1 file differs from that in the read2 file. Below is the code…
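Adapter and quality trimming act on each mate according to its own sequence, so R1 and R2 base totals generally differ even when pairs are kept in sync. For paired output, the check that matters is that the read counts match. This is a minimal Python sketch that reports both numbers for a FASTQ stream, assuming four-line records. `fastq_totals` is a hypothetical name.

```python
from itertools import islice


def fastq_totals(handle):
    """Return (read_count, base_count) for a 4-line-per-record FASTQ stream."""
    lines = iter(handle)  # one shared iterator so islice advances through the file
    reads = bases = 0
    while True:
        record = list(islice(lines, 4))
        if len(record) < 4:
            break  # end of input (a trailing partial record is ignored)
        reads += 1
        bases += len(record[1].rstrip("\r\n"))
    return reads, bases
```

Running it on both trimmed files, opened with `gzip.open(path, "rt")` for compressed input, should show equal read counts if pairing was preserved. Unequal base totals alone would not indicate a problem.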
low rate of ‘Successfully assigned alignments’
Hello everybody. I’m a newbie in RNA-seq analysis, and I have a situation that I don’t really understand. While working with featureCounts for RNA-seq read quantification, I came across an intriguing issue. The rate of successfully assigned alignments turned out to be unexpectedly low, totalling just 15463270 (7.6%). This was…
Bowtie mapping for single_end read
bowtie --threads 5 -X 1000 -m 1 -v 2 --best --strata --sam IndexedGenome ${DATA_DIR}${SRR_ID}.trim.fastq > ${SAM_DIR}${SRR_ID}.sam Hi All, I am using the above script to map single_end ChIP-Seq reads. The percentage of the aligned reads is around 20%. How can I modify the script…
I got an error when using metawrap for binning
My code: metawrap binning -o bin_out -t 24 -m 200 -a all_contig/all_merge.fasta --metabat2 --maxbin2 --concoct all_fastq/*fastq The error reported is as follows: sorting the SRR10492802 alignment file [bam_sort_core] merging from 24 files and 24 in-memory blocks… [E::sam_hdr_sanitise] Malformed SAM header at line…
Enhanced specificity of Bacillus metataxonomics using a tuf-targeted amplicon sequencing approach
Parte AC, Sardà Carbasse J, Meier-Kolthoff JP, Reimer LC, Göker M. List of Prokaryotic names with standing in nomenclature (LPSN) moves to the DSMZ. Int J Syst Evol Microbiol. 2020;70:5607–12. Saxena AK, Kumar M, Chakdar H, Anuroopa N, Bagyaraj DJ. Bacillus species in soil…
Ancient diversity in host-parasite interaction genes in a model parasitic nematode
Van Valen, L. A new evolutionary law. Evol. Theory 1, 1–30 (1973). Woolhouse, M. E. J., Webster, J. P., Domingo, E., Charlesworth, B. & Levin, B. R. Biological and biomedical implications of the co-evolution of pathogens and their hosts. Nat. Genet. 32, 569–577 (2002)…
Running STAR on fastq file generated from a RNA-seq experiment
Hi, I am new to bioinformatics, especially on the command line. I am trying to run STAR alignment on pairs of fastq.gz files from several samples generated as part of an RNAseq experiment. My goal is to perform splice…
H101 for cervical cancer | DDDT
Introduction Patients with persistent, recurrent, or metastatic (P/R/M) cervical carcinoma respond poorly to treatment despite the best available therapeutic regimens, with a 5-year survival of 17%.1 Most of them are heavily pretreated with chemotherapy and/or radiotherapy, and many patients experience complications related to treatment or advanced disease, which exclude them…
Best practices for unstranded sequences in featureCounts
Hi everyone, I’m using featureCounts to analyze some RNA-Seq data, but I have several doubts about using it with an unstranded library. First, when I analyze SRA sequences or when I don’t know the library type, I use Salmon to infer it with the following command: salmon quant -p 32…
Masurca. Failed to pre-correct Nanopore data, please check your data!
Hi! I have a problem with Masurca. I am doing a hybrid assembly of Nanopore and Illumina reads. My config file looks like this: DATA PE = aa 200 20 ../data/Ldec_illumina/SRR1055545_1.fastq ../data/Ldec_illumina/SRR1055545_2.fastq PE = ab 200 20 ../data/Ldec_illumina/SRR1055547_1.fastq ../data/Ldec_illumina/SRR1055547_2.fastq PE = ac 200 20 ../data/Ldec_illumina/SRR1055548_1.fastq ../data/Ldec_illumina/SRR1055548_2.fastq PE = ad 200 20 ../data/Ldec_illumina/SRR1055549_1.fastq ../data/Ldec_illumina/SRR1055549_2.fastq…
fragments.tsv.gz file in ATAC seq
Hi all, I looked at some ATAC-seq tutorials. They use fragments.tsv.gz at the beginning of the analysis. For my ATAC-seq data, I have fastq, bam and bw files but no fragment file. So the fragment files will be created from fastq files,…
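Fragment files are typically derived from aligned, properly paired reads (the BAM) rather than directly from FASTQ. Each row gives chrom, start, end, barcode and a count. This is a minimal Python sketch under several assumptions. The input is (RNAME, POS, TLEN) taken from properly paired SAM records. The commonly used +4/-5 Tn5 shift is applied, and a single hypothetical barcode "sample1" stands in for bulk data. `fragments_from_pairs` is a hypothetical name, and this is not the exact method of any particular tool.

```python
from collections import Counter


def fragments_from_pairs(pairs, barcode="sample1", plus_shift=4, minus_shift=5):
    """Build fragments.tsv-style rows from properly paired alignments.

    pairs: iterable of (chrom, pos, tlen) from SAM columns RNAME, POS
    (1-based) and TLEN. The +4/-5 Tn5 shift and the single placeholder
    barcode are assumptions for bulk ATAC-seq data.
    """
    counts = Counter()
    for chrom, pos, tlen in pairs:
        if tlen <= 0:
            continue  # count each pair once, from the leftmost mate (positive TLEN)
        start = pos - 1 + plus_shift  # convert to 0-based, then apply the Tn5 shift
        end = pos - 1 + tlen - minus_shift  # half-open end, shifted
        counts[(chrom, start, end, barcode)] += 1
    # Rows: chrom, start, end, barcode, number of read pairs with that fragment
    return [f"{c}\t{s}\t{e}\t{b}\t{n}" for (c, s, e, b), n in sorted(counts.items())]
```

The rows would then be sorted, bgzip-compressed and tabix-indexed before use. The count column is kept because duplicate fragments are collapsed rather than discarded here.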
SSR molecular marker developments and genetic diversity analysis of Zanthoxylum nitidum (Roxb.) DC
Commission, S. P. Pharmacopoeia of the People’s Republic of China (Pharmacopoeia of the People’s Republic of China, 2020). Sun, X. & Sun, F. Shennong Materia Medica Classic (People’s Health Publishing House, 1963). Huang, C. Flora of China Vol. 43 (Science Press, 1997). Hu, J….
Tools for Efficient Retrieval from GEO and SRA Databases | by Denis Odinokov, MBBS, MSc, PMP | Nov, 2023
For downloading data and standardized metadata from GEO (Gene Expression Omnibus) and SRA (Sequence Read Archive), several bioinformatics and command-line tools and scripts are available, primarily hosted on GitHub. ARA: An automated pipeline developed for better sampling of NCBI SRA database records, allowing full…
Genome mining reveals novel biosynthetic gene clusters in entomopathogenic bacteria
Katz, L. & Baltz, R. H. Natural product discovery: Past, present, and future. J. Ind. Microbiol. Biotechnol. 43, 155–176 (2016). Newman, D. J. & Cragg, G. M. Natural products as sources of new drugs from 1981 to 2014. J. Nat. Prod. 79, 629–661 (2016)…
No best K value found
I used the command ./kmergenie <filename.fastq> and got: running histogram estimation Setting maximum kmer length to: 251 bp computing histograms (from k=21 to k=121): ntCard wall-clock time over all k values: 0 seconds fitting model to histograms to estimate best k could not predict a best k…
Salmon index problem
Hello, I’m trying to use Salmon in mapping-based mode. I downloaded the full-decoy Salmon index listed on refgenie using the command refgenie pull hg38/salmon_sa_index, and it downloaded the full folder locally. Now I have this index folder and SRR21898893_1.fastq.gz and SRR21898893_2.fastq.gz…