Tag: CHR

PCA from plink2 for SGDP using a pangenome and DeepVariant

Hi there, I’m doing my first experiments with PCA and UMAP as dimensionality reduction methods to visualize a dataset I’ve been working on. Basically, I took the samples from the SGDP, mapped them to the human pangenome and then called small variants with DeepVariant. I moved on with some…

What Is The Meaning Of The Score In DiffBind’s Occupancy/Overlap Analysis?

I have 3 ChIP-seq biological replicates, each with its own input control. I was interested in using DiffBind to perform some IDR-style analysis, e.g. take only the peaks that come up in more than one sample. experiment <- dba(sampleSheet="exp_samples.csv"); pdf('overlap_venn.pdf'); dba.plotVenn(experiment, experiment$masks$macs); dev.off() I got a nice Venn diagram:…

Solved write R code using ggplot or the ggraph library using

write R code using ggplot or the ggraph library using these variables to create a heatmap Class ‘igraph’ hidden list of 10 $ : num 28 $ : logi FALSE $ : num [ 1 : 344 ]  21  21  21  21  21  21…

Study uncovers vast genomic diversity in Aboriginal Australian communities

In a recent study published in the journal Nature, researchers investigated the previously underrepresented genomic diversity of four Aboriginal Australian communities. They used population-scale whole-genome (WGS) long-read sequencing. Study findings revealed unique alleles comprised of insertion-deletion variants, variable copy number regions, and structural variants, 62% of which are novel to…

Beyond the exome: utility of long-read whole genome sequencing in exome-negative autosomal recessive diseases | Genome Medicine

Our cohort comprises 34 families in which a presumably autosomal recessive disease defied molecular diagnosis by clinical exome sequencing (short-read sequencing-based) and reanalysis performed on the index individual for each family (Fig. 1). The index patient in each family was subjected to an average of 10 × depth lrWGS except for Family F8602…

Why is the number of Minimap2 alignment observations different with CIGAR generation flag?

I am using Minimap2 in Linux to generate a sequence alignment between the Streptomyces coelicolor A3(2) chromosome (ref.fa) and the Mycobacterium tuberculosis chromosome (query.fa). My desired output is a PAF (Pairwise mApping Format) file. The general way to align reference and query sequences with Minimap2 is the following: minimap2 ref.fa…

Methylation Analysis Tutorial in R_part1

The code and approaches that I share here are those I am using to analyze TCGA methylation data. At the bottom of the page, you can find references used to make this tutorial. If you are coming from a computer background, please bear with a geneticist who tried to code…

Converting a .frq file to a data frame?

Hi! I’m a bit new to the whole bioinformatics community and I’m working with MAFs stored in a .frq file that looks like this: CHROM POS N_ALLELES N_CHR {ALLELE:FREQ} AE014298.5 5694 2 0 AAAAAAAAAAAAAACCAGC:-nan AAAAAAAAAAGTTAAAAAAATAAAACCAGC:-nan AE014298.5 51946 2 0 A:-nan G:-nan…
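
The .frq layout quoted above has four fixed columns followed by one ALLELE:FREQ field per allele, so the number of columns varies from row to row. Here is a minimal sketch of reading it into one record per site, written in Python rather than R and using only the standard library. The function name parse_frq and the dictionary keys are hypothetical choices for illustration, and splitting on whitespace is an assumption about the file.

```python
def parse_frq(lines):
    """Parse lines of a .frq allele-frequency table into a list of row dicts."""
    rows = []
    for line in lines:
        fields = line.split()
        # Skip blank lines and the header row.
        if not fields or fields[0] == "CHROM":
            continue
        chrom, pos, n_alleles, n_chr = fields[:4]
        freqs = {}
        for item in fields[4:]:
            # Split on the last colon: allele on the left, frequency on the right.
            allele, freq = item.rsplit(":", 1)
            freqs[allele] = float(freq)  # "-nan" is parsed as NaN
        rows.append({
            "chrom": chrom,
            "pos": int(pos),
            "n_alleles": int(n_alleles),
            "n_chr": int(n_chr),
            "freqs": freqs,
        })
    return rows


# Two lines copied from the excerpt above.
example = [
    "CHROM POS N_ALLELES N_CHR {ALLELE:FREQ}",
    "AE014298.5 51946 2 0 A:-nan G:-nan",
]
rows = parse_frq(example)
```

In the quoted rows N_CHR is 0 and every frequency is -nan, which suggests that no genotypes were counted at those sites, so the NaN values probably reflect the input data rather than a parsing problem.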

r – How to perform t test and plot p-values for comparison between groups on a grouped boxplot (ggplot)?

I have a data frame, as shown below: > dput(filtered_lymph) structure(list(cluster = c(“CD4+ Tcells”, “CD4+ Tcells”, “CD4+ Tcells”, “CD4+ Tcells”, “CD4+ Tcells”, “CD4+ Tcells”, “CD4+ Tcells”, “CD4+ Tcells”, “CD4+ Tcells”, “CD4+ Tcells”, “CD4+ Tcells”, “CD4+ Tcells”, “CD4+ Tcells”, “CD4+ Tcells”, “CD4+ Tcells”, “CD4+ Tcells”, “CD4+ Tcells”, “CD4+ Tcells”, “CD4+ Tcells”,…

Key gene linked to male bias in autism, Tourette’s, and ADHD uncovered

Research has documented a strong male sex bias in attention deficit hyperactivity disorder (ADHD), autism spectrum disorder (ASD), and Tourette syndrome (TS). Among males, the hemizygous nature of chromosome X (Chr X) has been a known vulnerability factor. Still, the characterization of rare genetic variation in Chr X has not…

sparklyr – Databricks Connect v2

Last updated: Thu Dec 7 16:28:19 2023 Intro Databricks Connect enables the interaction with Spark clusters remotely. It is based on Spark Connect, which enables remote connectivity thanks to its new decoupled client-server architecture. This allows users to interact with the Spark cluster without having to run the jobs from…

Failed to open /ROH/.log. Try changing the --out parameter.

When I used this code in R: system("plink --vcf Pakistan.total.vcf --homozyg --homozyg-window-snp 50 --homozyg-snp 50 --homozyg-window-missing 3 --homozyg-kb 100 --homozyg-density 1000 --allow-extra-chr --out /ROH/plink/n") I got this error: Error: Failed to open /ROH/plink/n.log. Try changing the --out parameter. How…
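
A message like this usually means the tool could not create the log file next to the --out prefix. Two common causes, offered here as assumptions because the excerpt does not show the file system, are that the directory in the prefix does not exist or is not writable. Note also that /ROH/plink/n is an absolute path starting at the file system root. A minimal Python sketch of a check to run before the command (check_out_prefix is a hypothetical helper, not part of PLINK):

```python
import os


def check_out_prefix(prefix):
    """Return a list of reasons why <prefix>.log could not be created."""
    problems = []
    # Directory part of the prefix; an empty string means the current directory.
    out_dir = os.path.dirname(prefix) or "."
    if not os.path.isdir(out_dir):
        problems.append(f"directory does not exist: {out_dir}")
    elif not os.access(out_dir, os.W_OK):
        problems.append(f"directory is not writable: {out_dir}")
    return problems
```

If the directory is simply missing, creating it first (for example with os.makedirs(out_dir, exist_ok=True)) or using a relative prefix such as ROH/plink/n may be enough.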

How to properly mock a (Pysam) read

I am creating a custom softclipping tool due to a limitation found in Ampliconclip Issue softclipping reads when they belong and don’t belong to a common amplicon. I am progressing ok in my development and I am developing tests. I found some limitations when I try to mock a Pysam…

how to remove multiple columns from a file in R

Hi all, could anyone please help me remove from the Meth file the columns whose names are present in the DF file? DF <- read.table("DIFF.txt", as.is=T, na.strings="NA", check.names=FALSE) head(DF) x 1 NSE.1.0096 2 NSE.1.0100 3 NSE.1.0121 library(readr) Meth <- read_csv("betas_1.csv", col_types…

Read count vs Depth

Hi! I have RNA-seq short-read sequencing data for 112 dengue samples. I need to know what percentage of the transcriptome is covered by our sequencing reads. I found Bedtools to be an appropriate tool for this. However, I am unable to understand two different outputs from this tool…

Where can I get a list of SNPs mapping overlapping genes in humans?

Given files genes.bed and snps.bed, you could do something like: $ bedmap --echo --echo-map-id --delim '\t' genes.bed snps.bed > answer.bed The file answer.bed will contain the gene annotation and a semicolon-delimited list of SNP identifiers that overlap each gene. In order to get genes.bed, you could use Gencode v44…
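
To make the shape of that output concrete, here is a small pure-Python sketch of the same idea: for each gene interval, collect the identifiers of the SNPs that overlap it and join them with semicolons. It illustrates the overlap logic only and is not how bedmap is implemented. The function name and the toy intervals are hypothetical, and coordinates are assumed to follow the BED convention (0-based start, exclusive end).

```python
def map_snps_to_genes(genes, snps):
    """Attach a semicolon-delimited list of overlapping SNP IDs to each gene.

    Both inputs are lists of (chrom, start, end, name) tuples in BED coordinates.
    """
    result = []
    for g_chrom, g_start, g_end, g_name in genes:
        hits = [
            s_name
            for s_chrom, s_start, s_end, s_name in snps
            # Half-open intervals overlap when each one starts before the other ends.
            if s_chrom == g_chrom and s_start < g_end and s_end > g_start
        ]
        result.append((g_chrom, g_start, g_end, g_name, ";".join(hits)))
    return result


# Toy data (hypothetical gene and SNP names).
genes = [("chr1", 100, 200, "GENE_A"), ("chr1", 150, 300, "GENE_B")]
snps = [
    ("chr1", 120, 121, "rs1"),
    ("chr1", 160, 161, "rs2"),
    ("chr1", 299, 300, "rs3"),
    ("chr2", 160, 161, "rs4"),
]
annotated = map_snps_to_genes(genes, snps)
```

This nested loop compares every gene with every SNP, which is fine for a toy example but slow for genome-scale inputs. Tools that work on sorted BED files avoid that cost.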

ASEReadCounter output wrong number of coverage

Hi, I am using ASEReadCounter to count the number of reads per variant in a BAM file. For some positions, it reports 1 read covered (1 refCount or 1 altCount) although no read covers those positions when I check in…

Animal Microbiome Market Latest Updated Report 2023-2031

PRESS RELEASE Published December 5, 2023 Animal Microbiome Market Assessment worth $ 12.4 Billion by 2030 – Exclusive Report by InsightAce Analytic InsightAce Analytic Pvt. Ltd. announces the release of a market assessment report on the “Global Animal Microbiome Market– by Products (Phage Therapy Product, Functional Food (Probiotics, Prebiotics and…

Issues with Chromosome Encoding and VCF Annotation in dbSNP Alpha Release

Hello, Biostars Community, I am working on creating a custom database of variants using the VCF from the latest dbSNP alpha release available at ftp.ncbi.nih.gov/snp/population_frequency/latest_release/. I have encountered a couple of issues that I’m hoping someone might help me resolve. Firstly, the chromosome encoding uses RefSeq IDs (e.g., NC_000007.12)…
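
For the human nuclear chromosomes, the numeric part of a RefSeq accession encodes the chromosome: NC_000001 to NC_000022 are chromosomes 1 to 22, NC_000023 is X and NC_000024 is Y, while NC_012920 is the mitochondrial genome. A minimal Python sketch of that conversion follows. The function name, the "chr" prefix and the choice to return None for anything else are illustrative assumptions. The version suffix after the dot identifies the assembly release and is ignored here, so the assembly should be checked separately.

```python
import re

# NC_ followed by six digits, a dot and a version number.
_NC_PATTERN = re.compile(r"^NC_(\d{6})\.\d+$")


def refseq_to_chrom(accession):
    """Translate a human RefSeq chromosome accession into a chr-style name."""
    match = _NC_PATTERN.match(accession)
    if match is None:
        return None
    number = int(match.group(1))
    if 1 <= number <= 22:
        return f"chr{number}"
    if number == 23:
        return "chrX"
    if number == 24:
        return "chrY"
    if number == 12920:
        return "chrM"
    return None
```

A two-column table built this way (old name, new name) is the kind of mapping that renaming options such as bcftools annotate --rename-chrs expect.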

vcftools

Hi, I tried this code but I couldn’t get any output. Please guide me to resolve this issue: for i in {1..2}; do vcftools --LROH --vcf Pakistan.total.vcf --out ${i} --chr i; done…
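
One likely cause, stated as an assumption because no output is shown, is that the loop passes the literal letter i to --chr instead of the loop value, so no chromosome matches. The sketch below builds the per-chromosome commands in Python so the substitution is explicit. It only constructs the argument lists and does not run vcftools. build_lroh_commands is a hypothetical helper name.

```python
def build_lroh_commands(vcf_path, chromosomes):
    """Build one vcftools --LROH command (as an argument list) per chromosome."""
    commands = []
    for chrom in chromosomes:
        commands.append([
            "vcftools", "--LROH",
            "--vcf", vcf_path,
            "--chr", str(chrom),  # the loop value, not the literal letter "i"
            "--out", str(chrom),
        ])
    return commands


commands = build_lroh_commands("Pakistan.total.vcf", [1, 2])
```

The chromosome names passed to --chr also have to match the names used in the VCF exactly (for example 1 versus chr1). Each list can be handed to subprocess.run once the names have been checked.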

locuszoom error

Hi there, I downloaded locuszoom in its entirety from their GitHub page. I have everything in order but when I attempt to run the following I get an error: locuszoom_test/bin/locuszoom --metal chr7.snx13.marker.locuszoom.metal --ld chr7.imputed.concat.sorted.nomono.80.recode.maf01.pheno.vcf.2.ld.locuszoom.input --refsnp rs1533245 --prefix chr7.snx13.rs1533245 /share/hennlab/progs/locuszoom_test/bin/../src/m2zfast.py:82: SyntaxWarning: invalid escape sequence '\d' RE_SNP_1000G = re.compile("chr(\d+|[a-zA-z]+):(\d+)$");…
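
The SyntaxWarning quoted above is what recent Python versions print when a regular expression containing \d is written in an ordinary string literal. The usual fix is a raw string. The sketch below shows the same pattern written that way. It also changes the character class from [a-zA-z] to [a-zA-Z], on the assumption that only letters were intended. The helper parse_chr_pos is hypothetical and not part of locuszoom. The warning by itself does not stop a script, so the actual failure may lie in the part of the output that is cut off.

```python
import re

# Raw string: backslashes reach the regex engine unchanged, so no warning is raised.
RE_SNP_1000G = re.compile(r"chr(\d+|[a-zA-Z]+):(\d+)$")


def parse_chr_pos(name):
    """Split a 'chr<name>:<position>' marker into (chromosome, position)."""
    match = RE_SNP_1000G.match(name)
    if match is None:
        return None
    return match.group(1), int(match.group(2))
```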

Longitudinal detection of circulating tumor DNA

Analysis of Roche KAPA Target Enrichment kit experimental data obtained on an Illumina sequencing system is most frequently performed using a variety of publicly available, open-source analysis tools. The typical variant calling analysis workflow consists of sequencing read quality assessment, read filtering, mapping against the reference genome, duplicate removal, coverage…

There are 1 region(s)/phenotype(s) with p-value > 0.1 (not significant).

I’m trying to get a PRS for a very small subset of samples (~400 training, ~200 testing). First I applied C+T to the training data before running PRSice. My .log is: PRSice 2.3.3 (2020-08-05) github.com/choishingwan/PRSice (C) 2016-2020 Shing Wan (Sam) Choi and Paul F. O’Reilly GNU General Public License v3 If you use PRSice in…

r – In karyoploteR, how can I color specific cytobands based on their names?

I would like to ask for some help in using karyoploteR. I am trying to plot some custom genomes with custom cytobands. I want to color each region based on the names. Unfortunately, I was not successful after trying to parse a custom table that would serve as a dictionary….

missing region in the process of annotation

Hi. I am analyzing TCGA methylation data from TCGAbiolinks and I ran into a problem during annotation with annotatr. The TCGA data covers a gene on chromosome 19, but the annotated result is missing one region on chromosome 19. I…

HTseq reports missing attribute name

Hello, I am running this htseq command: htseq-count -r pos -t gene -i gene -s yes -f bam \ /Volumes/cachannel/ZebraFinchBrain/CB-4a_genomemapping/sorted_alignmentcb4a.bam \ /Volumes/cachannel/ZebraFinchBrain/GCF_003957565.2/Taeniopygia_guttata.bTaeGut1_v1.p.110.chr.gff3 > \ /Volumes/cachannel/ZebraFinchBrain/HTSEQ_withautomate/output_counts.txt However I get this error: Error processing GFF file (line 75 of file /Volumes/cachannel/ZebraFinchBrain/GCF_003957565.2/Taeniopygia_guttata.bTaeGut1_v1.p.110.chr.gff3): Feature gene:ENSTGUG00000013637 does not contain…

rstudio – How do I classify universities per state in R?

There are two different aspects of this question, including: Where do I find information about U.S. colleges and universities that includes the state in which the colleges are located? Given a source of information, how do I use R to create a data frame that includes the college name and…

Pruning with --indep-pairwise with plink 1.9

I’m new to PLINK and I would like to obtain a file with SNPs in approximate linkage equilibrium. Here is my script and the outputs of each step. If someone could tell me if there is an error in the script because at…

Help finding the correct file version for dbSNP VCF ID replacement

I tried to use bcftools with dbSNP version 156 to replace the ID field in a reference VCF which originally contains a different position ID format. It seems the bcftools command did not work because the numeric chromosome format in the #CHROM field might not be compatible with bcftools…

Intrinsic deletion at 10q23.31, including the PTEN gene locus, is aggravated upon CRISPR-Cas9-mediated genome engineering in HAP1 cells mimicking cancer profiles

Introduction The CRISPR-Cas system is a widely used genome engineering technology because of its simple programmability, versatile scalability, and targeting efficiency (Wang & Doudna, 2023). Although researchers are rapidly developing CRISPR-Cas9 tools, the biggest challenge remains to overcome undesired on- and off-targeting outcomes. Previous studies have reported unintended genomic alterations,…

How to perform liftover from 38 to 37 in R?

I have some GWAS summary statistics in GRCh38 that I want to lift to GRCh37. I am trying to liftover in R using this code: library(tidyverse) library(magrittr) library(data.table) library(rtracklayer) library(GenomicRanges) rm(list=ls()) gwas_data <- fread("/gwas_sumstats_allchr.txt") chain_file <- "/chain_files/hg38ToHg19.over.chain" chain <- import.chain(chain_file) # Convert to GRanges object (assuming GENPOS is 1-based) gwas_ranges…

Bedtools coverage -hist “all” in chr column

$ bedtools coverage -a A.bed -b B.bed -hist
chr1 0 100 b1 1 + 0 70 100 0.7000000
chr1 0 100 b1 1 + 1 30 100 0.3000000
chr1 100 200 b2 1 - 1 100 100 1.0000000
chr2 0 100 b3…

How to change "CompressedGRangesList" to "GRangesList"

Hi, I am trying to do A/B compartment analysis with minfi, but I got the following error: Error in { : task 1 failed - "is(object, "SummarizedExperiment") is not TRUE" Since I want to use data with hg38 annotation but the `makeGenomicRatioSetFromMatrix` function has only `ilmn12.hg19`, I ran `makeGenomicRatioSetFromMatrix` with…

Getting message while running generateMap.pl from hmmcopy_utils

Hi, I’m trying to get a mappability wig file by running generateMap.pl from hmmcopy_utils. However, I’m getting a message which states: Setting the index via positional argument will be deprecated in a future release. Please use -x option instead. What I…

[main_samview] fail to read the header from “human_g1k_v37.annotate.fasta”.

Hi, I tried to add the prefix "chr" to the chromosome names in a FASTA file like this: sed 's/^>/>chr/' human_g1k_v37.fasta > human_g1k_v37.annotate.fasta However, after that, I failed to view the header of the new FASTA file: samtools view -H human_g1k_v37.annotate.fasta >>> [main_samview] fail to read…
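
The error most likely comes from the second command rather than the sed step: samtools view reads alignment files (SAM, BAM or CRAM), so it cannot find a header in a FASTA file. For FASTA, samtools faidx or a plain grep for lines starting with ">" is the usual way to list sequence names. For completeness, here is a minimal Python sketch of the renaming itself. The function name is hypothetical, and the check that avoids prefixing a name twice is an addition that the sed command does not have.

```python
def add_chr_prefix(fasta_lines):
    """Yield FASTA lines with 'chr' prepended to each sequence name."""
    for line in fasta_lines:
        # Header lines start with '>'; names that already begin with 'chr' are left alone.
        if line.startswith(">") and not line.startswith(">chr"):
            yield ">chr" + line[1:]
        else:
            yield line
```

Because it is a generator, it can be applied to an open file handle line by line without loading the whole genome into memory.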

Clumping with r2=0 and 250kb radius in plink

Hi, I am doing clumping with the following command: plink \ --bfile ${myfilename} \ --keep all_hg38_EUR.ids \ --clump ${trait}_tmp2.txt \ --clump-snp-field SNP \ --clump-field P \ --allow-extra-chr \ --memory 30000 \ --clump-p1 5e-8 \ --clump-r2 0 \ --clump-kb 250 \ --out…

Bgen file not being opened by PRSice

I used the following command to calculate the PRS of a sequenced file coming from a collaborator. I imputed the VCF file, which gave me separate VCF files for each chromosome. I then converted them to bgen and generated bgi and sample files…

Convert Seurat object to anndata

Hi all, I am trying to save my Seurat object to h5ad to use it as AnnData. I did it before using MuDataSeurat, with Seurat_4.3.0.1. seurat_object = CreateSeuratObject(counts = out, project="sample_name") str(seurat_object) 1st str: Formal class 'Seurat' [package "SeuratObject"] with 13 slots ..@ assays :List of 1 .. ..$ RNA:Formal…

How To Get Chromosome Position Given Rs Number?

I have a list of a few hundred SNPs given by rs number. I want to get the chromosome and position for each SNP. For example: input: rs4477212 output: chr1:82154 You can download this information…

Read file with the DSSAT R package – rstudio

Good evening all, I am trying to read an experimental file from my system using the DSSAT R package and I get the error below. I don’t know if there is any expert here who can help me. Here is my reprex: # Load Required Packages library(DSSAT) #> When using the…

human genome – How many Ns and ns in GRCh37 / GRCh38 per ‘canonical’ chromosome?

This is kind of pedantic, but I’m not sure where to look… For GRCh38 (and a lot of work…) I have the following… Chr Length Ns ns chr1 248,956,422 18,475,229 181 chr2 242,193,529 1645,291 10 chr3 198,295,559 195,420 4 chr4 190,214,555 461,888 0 chr5 181,538,259 272,881 0 chr6 170,805,979 727,255…
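
A table like this can be checked by counting upper-case N and lower-case n separately for each FASTA record. Here is a minimal Python sketch using only the standard library. The function name and the output layout are illustrative choices, and it assumes the sequence name is the first word of each header line.

```python
def count_n_per_record(fasta_lines):
    """Count sequence length, 'N' and 'n' for every record in a FASTA stream."""
    counts = {}
    name = None
    for line in fasta_lines:
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            # Use the first word of the header as the record name.
            name = line[1:].split()[0]
            counts[name] = {"length": 0, "N": 0, "n": 0}
        elif name is not None:
            counts[name]["length"] += len(line)
            counts[name]["N"] += line.count("N")
            counts[name]["n"] += line.count("n")
    return counts
```

Passing an open file handle works as well as a list of lines, so a whole assembly can be scanned in one pass.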

Locally annotating SNP IDs and Gene names of called variants

I have GWAS results after variant calling. The VCF file only had CHR (1:22) and POS (12345678 etc.) information, but the ID column is all ".", i.e. it contains no rsIDs. After GWAS analysis I have a list of…

How to get just protein_coding genes using biomart in R

Dear all, I would like help with getting just the protein_coding genes from a gene expression file using biomart. What I have is a file of expression values for all mouse (mm10) genes with Ensembl gene names, and I need to…

Snakemake issue with wrappers

I have issues when running a wrapper of BWA mem with Snakemake. The error message "No module named 'snakemake_wrapper_utils'" appears (see below). However, when checking if the package is installed in Python, I found the following: import snakemake_wrapper_utils print(snakemake_wrapper_utils.__version__) 0.1.0 Did anyone have this problem? Would you know why there…

couldn’t find matching transcriptome, returning non-ranged SummarizedExperiment AND unable to find an inherited method for function 'seqinfo' for signature '"SummarizedExperiment"'

Dear Michael, I have not been able to run tximeta properly. I have read #38 but could not get any clue. The quant.sf files were generated by the latest nf-core RNA-seq pipeline (3.12.0), as the pipeline did not save the Salmon index, I generated it myself. Salmon used by nf-core…

Filtering for primary and secondary reads using sam flags (0 properly paired reads in alignment step)

Hey everybody, I have just performed the alignment of paired-end reads to a reference using bwa mem with the -M flag, ran samtools markdup and flagstat. Flagstat produced the following output: 4985084 + 0 in total (QC-passed reads + QC-failed reads) 1806492 + 0 primary 3178592 + 0 secondary 0…
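
The categories in the flagstat output correspond to bits in the SAM FLAG field: 0x100 marks a secondary alignment, 0x800 a supplementary one, 0x1 a read that is paired in sequencing and 0x2 a read mapped in a proper pair. A short Python sketch of decoding them is below. The function names are hypothetical, while the bit values come from the SAM specification.

```python
FLAG_PAIRED = 0x1           # read is paired in sequencing
FLAG_PROPER_PAIR = 0x2      # read is mapped in a proper pair
FLAG_SECONDARY = 0x100      # secondary alignment
FLAG_SUPPLEMENTARY = 0x800  # supplementary alignment


def classify(flag):
    """Label an alignment as primary, secondary or supplementary."""
    if flag & FLAG_SECONDARY:
        return "secondary"
    if flag & FLAG_SUPPLEMENTARY:
        return "supplementary"
    return "primary"


def is_properly_paired(flag):
    """True only when both the paired bit and the proper-pair bit are set."""
    return bool(flag & FLAG_PAIRED) and bool(flag & FLAG_PROPER_PAIR)
```

On the command line the same filters are usually written with samtools view, where -F 0x100 drops secondary alignments and -f 0x2 keeps properly paired reads. If no read has the paired bit set, one possibility worth checking is whether the mapper was given both FASTQ files as a pair.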

A Bioconductor workflow for processing, evaluating,…

Introduction Proteins are responsible for carrying out a multitude of biological tasks, implementing cellular functionality and determining phenotype. Mass spectrometry (MS)-based expression proteomics allows protein abundance to be quantified and compared between samples. In turn, differential protein abundance can be used to explore how biological systems respond to a perturbation….

No valid chromosomes found! on Michigan Imputation Server

I have two VCF files, one hg19 and one hg38, analysing data from the same participants on two slightly different SNP platforms. Both files have been through the pre-imputation checks. The header (and the first line) of the hg38 version looks like: ##fileformat=VCFv4.3 ##FILTER=<ID=PASS,Description="All filters passed"> ##fileDate=20230906 ##source=PLINKv2.00 ##contig=<ID=chr1,length=248917420>…

fragments file generation via Sinto from CellRanger output

Hi, I am following the instructions for the PASTA package (satijalab.org/seurat/articles/pasta_vignette.html). This package infers alternative polyadenylation usage from scRNA-seq data. Among many input files, it also requires a fragment file. The authors state the following must be…

Combination of RNAseq and RADseq to Identify Physiological and Adaptive Responses to Acidification in the Eastern Oyster (Crassostrea virginica)

Aguilera F, McDougall C, Degnan BM (2017) Co-option and de novo gene evolution underlie molluscan shell diversity. Mol Biol Evol 34(4):779–792. Alexa A, Rahnenfuhrer J (2020) topGO: Enrichment analysis for gene ontology. R package version 2.40.0. Arivalagan J, Yarra T, Marie B,…

problem with bcftools syntax

Hi all! I am having difficulty with creating a bcftools command. I have a .vcf.gz file downloaded from the 1000G site and a CSV file with columns chrom/pos/id/ref/alt. I would like to manipulate the downloaded VCF file so that it uses only the SNPs I…

How do I write a correctly formatted gff3 file in R?

Dear all, I am trying to annotate non-coding RNA in a small RNA-seq dataset. The RNACentral gff3 file that I am using has different chromosome identifiers than the genome assembly. I have loaded the gff3 file in R where I changed the chromosome identifiers using the assembly report and…

Pysam pileup and Rsamtools pileup output discrepancy

I have mRNA sequencing data that I’ve aligned to a genome. Specifically, I am interested in determining the total number of reads at each base and the percentage of occurrences of A, T, G, C, deletions, and insertions at these bases. I have utilized both pysam pileup and Rsamtools pileup…

Lack of correspondence of GFA node IDs to giraffe/call node IDs

I have a GFA graph built with PGGB using several samples. I want to genotype some other samples with short reads using VG Giraffe. After investigating how to generate the corresponding indexes for VG Giraffe from a GFA generated with PGGB, I think I have found a way to do…

Allele numbers and frequencies in X and Y chr?

I’m new to a lot of this and I’ve been looking through the X and Y chromosome regions of a WGS VCF file. I’m confused. How are variant calls made on X and Y? I have a few examples: X…

rstudio – How do I perform a correlation analysis between the years and CO2 emissions per state, in my data?

I’m trying to perform a statistical correlation analysis using RStudio between years and CO2 emissions per state in my data, but I can’t seem to get the coding correct. I’ve added the graph I created using the data I’m trying to perform the statistical analysis on below. Graph of years…

4 Scatterplot

## # A tibble: 6 × 19 ## year month day dep_time sched_dep…¹ dep_d…² arr_t…³ sched…⁴ arr_d…⁵ carrier ## <int> <int> <int> <int> <int> <dbl> <int> <int> <dbl> <chr> ## 1 2013 1 1 517 515 2 830 819 11 UA ## 2 2013 1 1 533 529 4 850…

public databases – Converting VCF format to text for use with PLINK and understanding column mapping

I successfully completed Nature PRS tutorial, which is based on PLINK. Turning to my real data, I downloaded ukb-d-20544_1.vcf.gz. Now I’m facing the problem that I seem to be unable to use it in PLINK or find the correct data format to download at all, and I am a bit…

Plotting gene Expression per pathway

Good morning, I am trying to recreate for my data an image I saw in a journal, as I find it very intuitive. The image is attached below. For my data I have for each GO category a list of named vectors containing the…

r – ggplot2 Graph colors do not match with what I coded

The display of the graph colors does not match with what I coded. It only displays the first color correctly. When it comes to the second color, it just displays grey, regardless of what color code I input. The data is this: A tibble: 2 × 4 work_from_home n prop…

Unable to convert a date formatted as chr to date; appears as datetime – General

I am new to R and RStudio. I am currently trying to convert a column of dates that appear as mm/dd/yyyy hh:mm:ss but are classed as character. I want to turn them into date format, so that the hh:mm:ss is dropped off in a new column. I have tried finding a…
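
The conversion described here has two steps: parse the text using a format string that matches mm/dd/yyyy hh:mm:ss, then keep only the date part. Below is a minimal sketch in Python rather than R. The function name to_date is hypothetical, and it assumes a 24-hour clock with no AM/PM marker. The format codes %m, %d, %Y, %H, %M and %S are the same ones used by R's as.Date and strptime.

```python
from datetime import datetime


def to_date(text):
    """Parse 'mm/dd/yyyy hh:mm:ss' text and return only the calendar date."""
    return datetime.strptime(text, "%m/%d/%Y %H:%M:%S").date()
```

Applied to every value in the column, this produces a new column of dates with the time of day removed.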

What Is The Most Direct Way To Extract The Splice Junctions From A Sam File?

Hi! I need to get the intron coordinates (chr:start-end and the strand) of the spliced reads which are in a SAM file. I have no experience with this kind of format, so I plan to…

Mapping of paired-end ddRADseq results in 0.00% of reads pairing

Hey all, I’m trying to map my RADseq to a reference genome, and none of my paired-end reads are being paired. Forward and reverse reads are both mapping separately, but not pairing. This problem is consistent for all of my samples. Any help would be much appreciated!! I also viewed…

AWS STAR Genome Index Error

Hello, I have been trying to run this line of code for the longest time: STAR --runThreadN 20 --runMode genomeGenerate --genomeDir genomeDir/ --genomeFastaFiles Homo_sapiens.GRCh38.dna.toplevel.fa --sjdbGTFfile Homo_sapiens.GRCh38.110.chr.gtf I first tried running it on my home terminal but then realized that it would take several…

Solved RStudio File Edit Code View Plots Session Build Debug

Transcribed image text: #open csv file system_admin.df <- read.csv(file.choose()) head(system_admin.df) #packages install.packages("ipred") install.packages("caret") library(caret) #question 1 - Create variable names iscompleted #question2 system_admin.df$Completed <-…

Converting from BED to SAF/GFF

I believe that the SAF format uses 1-based coordinates that are closed on both ends. Here is how I reached this conclusion. First make some toy data. $ cat genome.fa >chr1 AATTCCGGAAAATTTTCCCCGGGGAAAAAAAAAAAAAAAAAACCCCCCCCCCCCCCCCCCCCCCCCCCCC $ cat reads.fa >q1 AAAATTTTCCCCGGGGAAAAAAAAAAAAAAAAAACC Map reads to the genome: $ STAR --runMode genomeGenerate --genomeDir test_star --genomeFastaFiles genome.fa --genomeSAindexNbases…
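
If SAF is 1-based and closed on both ends, as argued above, then converting from BED (0-based start, exclusive end) means adding 1 to the start and leaving the end unchanged. Here is a minimal Python sketch of that conversion, using the SAF column order GeneID, Chr, Start, End, Strand. The function name is hypothetical. Falling back to "+" when the BED line has no strand column, and to a coordinate-based name when it has no name column, are assumptions made for illustration.

```python
def bed_to_saf(bed_lines):
    """Convert tab-separated BED lines into SAF lines, including the header."""
    out = ["GeneID\tChr\tStart\tEnd\tStrand"]
    for line in bed_lines:
        # Skip blank lines, comments and browser/track lines.
        if not line.strip() or line.startswith(("#", "track", "browser")):
            continue
        fields = line.rstrip("\n").split("\t")
        chrom, start, end = fields[0], int(fields[1]), int(fields[2])
        name = fields[3] if len(fields) > 3 else f"{chrom}:{start}-{end}"
        strand = fields[5] if len(fields) > 5 else "+"  # assumed default
        # BED start is 0-based, SAF start is 1-based; the end value is unchanged.
        out.append(f"{name}\t{chrom}\t{start + 1}\t{end}\t{strand}")
    return out
```

For example, the BED interval chr1 8 42 becomes Start 9 and End 42 in SAF, and both describe the same 34 bases.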

Creating typewriter-styled maps in {ggplot2}

A couple of months ago I read a blog post by RJ Andrews, in which he described the process of making a map of California using a typewriter. It’s a beautiful map – made using over 2,500 keystrokes, all done by hand. The density of ink for each letter displays…

date – Cleaning up columns to look less messy in a ggplot in R-studio

I have a dataset containing information on a species of spider, where I have 3 different life stages. There are different days data has been collected throughout a total of 2 years, for two different study sites. I am working on a graph in r using ggplot2, but my columns…

QC of genetic data

Hi, I have some genetic data in a bim file. The chromosomes range from 0 to 23 and 26, which I have not come across before. Should the SNPs on chromosome 0 and 26 be removed from the genetic file or left in? Then, I…
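
In PLINK's numeric coding for human data, 1 to 22 are the autosomes, 23 is X, 24 is Y, 25 is the pseudo-autosomal region of X (XY) and 26 is the mitochondrial genome. Code 0 marks variants with no known chromosome. A small Python lookup makes the codes in a .bim file easier to read (decode_plink_chrom is a hypothetical helper name):

```python
# Non-autosomal chromosome codes used by PLINK for human data.
PLINK_SPECIAL_CODES = {0: "unplaced", 23: "X", 24: "Y", 25: "XY", 26: "MT"}


def decode_plink_chrom(code):
    """Translate a numeric PLINK chromosome code into a readable label."""
    code = int(code)
    if 1 <= code <= 22:
        return str(code)
    # Returns None for codes outside the human set.
    return PLINK_SPECIAL_CODES.get(code)
```

Whether to drop chromosome 0 and 26 depends on the analysis. Autosome-only QC pipelines often exclude both, but that is a study design decision rather than a rule.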

Calculating TPM from featureCounts output

Hi all, I have a simple question but just want to double check I’m not doing something stupid. I have paired-end RNA-seq data for which I have used featureCounts to quantify raw counts. I now want to normalize using the TPM formula. I read this…
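
As a reference for the arithmetic, TPM is computed in two steps: divide each gene's count by its length in kilobases to get reads per kilobase (RPK), then scale all RPK values so they sum to one million. Here is a minimal Python sketch. The function name and the dictionary inputs are illustrative, with the lengths assumed to come from the Length column that featureCounts reports.

```python
def tpm(counts, lengths_bp):
    """Compute transcripts per million from raw counts and gene lengths in bp."""
    # Step 1: reads per kilobase for every gene.
    rpk = {gene: counts[gene] / (lengths_bp[gene] / 1000) for gene in counts}
    # Step 2: scale so that the values sum to one million.
    scale = sum(rpk.values()) / 1e6
    if scale == 0:
        return {gene: 0.0 for gene in counts}
    return {gene: value / scale for gene, value in rpk.items()}


# Toy example with hypothetical genes: equal coverage per base gives equal TPM.
example = tpm({"geneA": 10, "geneB": 40}, {"geneA": 1000, "geneB": 4000})
```

For paired-end data counted as fragments the same formula applies, with each fragment contributing one count.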

Continue Reading Calculating TPM from featureCounts output
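
The usual TPM calculation can be written out as follows. This is a minimal sketch, assuming one raw count and one length in base pairs per gene (for example the Length column that featureCounts reports). Using that column rather than an effective length is an assumption, and the function name tpm is hypothetical.

```python
def tpm(counts, lengths_bp):
    """Transcripts per million from raw counts and gene lengths in bp."""
    if len(counts) != len(lengths_bp):
        raise ValueError("counts and lengths_bp must have the same length")
    if any(length <= 0 for length in lengths_bp):
        raise ValueError("gene lengths must be positive")
    # Step 1: reads per kilobase, which corrects for gene length.
    rates = [c / (length / 1000.0) for c, length in zip(counts, lengths_bp)]
    total = sum(rates)
    if total == 0:
        return [0.0] * len(rates)
    # Step 2: rescale so the values of one sample sum to one million.
    return [rate / total * 1e6 for rate in rates]


if __name__ == "__main__":
    # Two genes with the same coverage per kilobase get the same TPM.
    print(tpm([10, 20], [1000, 2000]))
```

The length correction comes before the depth correction, which is what makes TPM values sum to the same total in every sample.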

Plants | Free Full-Text | In Silico RNAseq and Biochemical Analyses of Glucose-6-Phosphate Dehydrogenase (G6PDH) from Sweet Pepper Fruits: Involvement of Nitric Oxide (NO) in Ripening and Modulation

Figure 1. Genomic organization of the CaG6PDH gene family. The gene structure is displayed with exons (green boxes) and introns (black lines). Untranslated regions are shown in grey boxes. Exon–intron regions are drawn at scale.

Continue Reading Plants | Free Full-Text | In Silico RNAseq and Biochemical Analyses of Glucose-6-Phosphate Dehydrogenase (G6PDH) from Sweet Pepper Fruits: Involvement of Nitric Oxide (NO) in Ripening and Modulation

i don’t know error UCSC hg38.fa reference

I ran the following commands: sequenza-utils bam2seqz -p --normal ${RESULTS}/5_variant_calling/${sample}_N.mpileup --tumor ${RESULTS}/5_variant_calling/${sample}_T.mpileup --fasta ${REFER}/hg38.fa -gc ${REFER}/hg38_genome_gc50.wig.gz -o ${RESULTS}/8_seq/${sample}_seqz.gz sequenza-utils seqz_binning --seqz ${RESULTS}/8_seq/${sample}_seqz.gz -w 50 -o ${RESULTS}/8_seq/${sample}_small_seqz.gz The results directory contains chromosome_depth.pdf, gd_plot.pdf, sequenza_extract.RData. In chromosome_depth.pdf the chromosomes are chr 1,10,11, chr11_KI270721v1_random… Why is it in the result file chr 2 ~xy?

Continue Reading i don’t know error UCSC hg38.fa reference

chromosome location

I have some gene IDs with their start positions, end positions, and chromosome numbers. How can I visualize this data in a colorful figure? What you can do is use SnapGene: download the genome (FASTA file), load it into SnapGene, find gene locations based on…

Continue Reading chromosome location

identify DEGs across all conditions and per specific conditions

Hi, I am analyzing a bulk RNA-seq dataset and I want to analyze it using DESeq2. I am very confused, so apologies if it’s a stupid question. My dataset has 12 samples (3 per condition). The conditions are treatment and control, at 2 time points (0 hr, 12 hr). So I wanted to…

Continue Reading identify DEGs across all conditions and per specific conditions

Comparing multiple columns from two files using AWK

Dear all, I need your help to solve the following problem. I have the following two files (indicated as A and B): FILE A: head 129N.final-test_taxid-120686.txt A00270:507:H3KTJDSX5:4:1105:31747:1736 1187 chr1 205559197 60 144M6S = 205559203 152 GAGCATTTAGGCAAGAGAAAGGAACAAAGGGTATCCAAATTGAAAAACAGGAGTCAAATTGTCCCTTTGCAGACAACAGGATTTTACATATAGAAAAATCTAAAAGATCACACACACACACACACACACACACACACACACACACACAAA FFFFFFFFFFFFFFF:FFFFFFFFFFF:FF:FFFF,FFFFF:FFFF:F:FFFFFF:F:FFFFFF:FFFFFF:FF,FFFFFFFFFFFFFFFFFFFFFF::,FFFF::,FFF,F:FFFFFFFFFFFFFFFF,FFFFFFFFFF:,F,FF,FF: NM:i:0 AS:i:288 nn:i:0 tp:A:P cm:i:18 s1:i:120 s2:i:0 de:f:0 rl:i:86 MQ:i:50 MC:Z:102M4I44M ms:i:4398 A00270:507:H3KTJDSX5:4:1105:31747:1736…

Continue Reading Comparing multiple columns from two files using AWK

Program for Overlapping DMRs (Differentially Methylated Regions) Between Groups

I have methylation results from 3 different groups (young, adult, old). I processed them, used mcomp to merge ratio files from the same group, and ran group comparisons (i.e. young_Vs_adult, adult_Vs_old, young_Vs_old). I’m looking for a…

Continue Reading Program for Overlapping DMRs (Differentially Methylated Regions) Between Groups

r code cheat sheet(5) (docx)

Cheat Sheet for R Coding Statistical tests Chi-squared chisq.test( var1 , var2 ) cat("\n") cat("Observed") chisq.test( var1 , var2 )$observed cat("\n") cat("Expected") chisq.test( var1 , var2 )$expected ______________________________ t-test t.test( var1 ~ var2 , data = df, var.equal = TRUE) # dbl~chr is order of variables __________________________________ One-way ANOVA car::Anova(lm(…

Continue Reading r code cheat sheet(5) (docx)

STAR Intron Motif Script Gives Segmentation fault Error

I have the following inputs: # Define input directory containing FASTQ files Input_directory="/path/to/fastq/folder" # Define output directory for STAR output files Output_directory="/path/to/output/directory" # Define paths to reference files Annotation_GTF="/path/to/Zebra/fish/GRCz11.110.chr.gtf" Genome_FASTA="/path/to/soft/masked/Zebra/fish/primary_assembly.fa" Reference="/path/to/soft/masked/STAR/created/reference/only/for/use/with/STAR" # Define the number of threads to use num_threads=4 To…

Continue Reading STAR Intron Motif Script Gives Segmentation fault Error

Highly inflated p-values in GWAS by regenie

I was running a GWAS using REGENIE 3.2.5 on more than 250,000 samples, and the p-values returned are highly inflated with -log10P up to 5000. As a result there were over 10,000 variants called significant under the threshold of p < 5e-8,…

Continue Reading Highly inflated p-values in GWAS by regenie

KidneyGPS: a user-friendly web application to help prioritize kidney function genes and variants based on evidence from genome-wide association studies | BMC Bioinformatics

User interface The user interface of KidneyGPS is organized into five tabs: Three tabs enable the specific search for genes, variants and regions (underlying data structure shown in Additional file 1: Fig. S4): (1) “gene search” tab: search for genes using their gene names (synonyms automatically mapped to their official HGNC…

Continue Reading KidneyGPS: a user-friendly web application to help prioritize kidney function genes and variants based on evidence from genome-wide association studies | BMC Bioinformatics

Filtering VCF to divide with equal sizes

Hello everyone! I have a very large VCF file (>400 GB), and I want to split it for use with VEP. VEP recommends splitting the VCF, so I generated a list of contigs, based on the header, with 3^7 bases for each chromosome….

Continue Reading Filtering VCF to divide with equal sizes

Animal Microbiome Market Revenue Report with Forecast to 2031

PRESS RELEASE Published September 20, 2023 InsightAce Analytic Pvt. Ltd. announces the release of a market assessment report on the “Global Animal Microbiome Market– by Products (Phage Therapy Product, Functional Food (Probiotics, Prebiotics and Others), Sequencing Platforms, Feed Additives/Functional Ingredients, Alternative Proteins and Alternative Feed, and Other Innovative Microbiome Products…

Continue Reading Animal Microbiome Market Revenue Report with Forecast to 2031

Genetic distance in cM from VCF of non-reference species to run Beagle

I’m working with a resequenced genome of a non-reference species. The VCF contains ~7 million SNPs, each with its position on its own chromosome. I have 10.01% missing data, so I need to impute these NAs. I eventually settled on Beagle v5 as a tool,…

Continue Reading Genetic distance in cM from VCF of non-reference species to run Beagle

Create 10,000bp windows for a SNP file and assign each SNP to its respective window

Hello all, I have a tab file with 700k SNPs (results from a BayPass analysis of p-values for genotype x environment association). I want to create 10 kbp window intervals for each chromosome/contig…

Continue Reading Create 10,000bp windows for a SNP file and assign each SNP to its respective window
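
One way to do the window assignment described above is plain integer arithmetic on the position. The sketch below assumes 1-based SNP positions and non-overlapping windows 1-10000, 10001-20000, and so on. The function names and the window label format are hypothetical.

```python
def assign_window(pos, size=10_000):
    """Return the (start, end) of the fixed window containing a 1-based position."""
    if pos < 1:
        raise ValueError("positions are assumed to be 1-based")
    index = (pos - 1) // size  # 0 for positions 1..size, 1 for the next block, ...
    return index * size + 1, (index + 1) * size


def label_snps(snps, size=10_000):
    """Attach a 'chrom:start-end' window label to each (chrom, pos) pair."""
    labelled = []
    for chrom, pos in snps:
        start, end = assign_window(pos, size)
        labelled.append((chrom, pos, f"{chrom}:{start}-{end}"))
    return labelled


if __name__ == "__main__":
    # Made-up positions for illustration only.
    for row in label_snps([("chr1", 1), ("chr1", 10_000), ("chr1", 10_001)]):
        print(row)
```

Because the label includes the chromosome or contig name, SNPs at the same position on different contigs never share a window.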

Using ExomeDepth for GRCH38 processed samples to call CNVs

The only difference would be the annotations: instead of using the bed frames from data(genes.hg19) and data(exons.hg19) in ExomeDepth, I got them from the UCSC Table Browser for hg38 (genome.ucsc.edu/cgi-bin/hgTables). The only columns they contain are: chromosome, start, end, name. Then run as before: change bed.frame = exons.hg19 to the exon…

Continue Reading Using ExomeDepth for GRCH38 processed samples to call CNVs

Filter VCF File by VCF Format Variants

I am trying to filter a VCF file to only include variants that are within another file, which is a txt file with VCF formatted columns (CHR POS REF ALT). I have been having a hard time finding a way to filter…

Continue Reading Filter VCF File by VCF Format Variants
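
A simple way to approach the filtering described above is to load the list into a set of (CHR, POS, REF, ALT) keys and keep only matching VCF records. This sketch assumes a whitespace-delimited list file, a tab-delimited VCF, and exact string matches on the ALT field, so multi-allelic records only match if the list spells ALT the same way. The function names are hypothetical.

```python
def load_keys(list_lines):
    """Build a set of (chrom, pos, ref, alt) keys from the variant list."""
    keys = set()
    for line in list_lines:
        fields = line.split()
        if len(fields) >= 4 and not line.startswith("#"):
            # A header row such as "CHR POS REF ALT" also lands here but matches nothing.
            keys.add((fields[0], fields[1], fields[2], fields[3]))
    return keys


def filter_vcf(vcf_lines, keys):
    """Yield header lines plus the records whose key is in the list."""
    for line in vcf_lines:
        if line.startswith("#"):
            yield line
            continue
        fields = line.rstrip("\n").split("\t")
        # VCF column order is CHROM, POS, ID, REF, ALT, ...
        if len(fields) >= 5 and (fields[0], fields[1], fields[3], fields[4]) in keys:
            yield line


if __name__ == "__main__":
    # Made-up records for illustration only.
    wanted = load_keys(["CHR POS REF ALT", "chr1 100 A G"])
    vcf = ["##fileformat=VCFv4.2\n",
           "chr1\t100\t.\tA\tG\t.\tPASS\t.\n",
           "chr1\t200\t.\tC\tT\t.\tPASS\t.\n"]
    print("".join(filter_vcf(vcf, wanted)), end="")
```

Streaming the VCF line by line keeps memory use tied to the size of the list rather than the size of the VCF.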

Why and how to address this?

Dear All, I’m preparing data for a Mendelian randomization (MR) analysis to assess the causal effect of telomere length on kidney phenotype in UK Biobank (UKB) data. These are the steps I have taken so far: 1. I started by searching for summary data from prior research and found close to 800 SNPs for…

Continue Reading Why and how to address this?

Plink codes showed GWAS results with effect size (beta) and SE as NA: Why and How?

Dear All, I’m preparing data for a Mendelian randomization (MR) analysis to assess the causal effect of telomere length on kidney phenotype using UK Biobank (UKB) data. These are the steps I have taken so far:…

Continue Reading Plink codes showed GWAS results with effect size (beta) and SE as NA: Why and How?

how simply mutation annotation in R?

Hi, I am working with a pre-existing mutation list in a GRanges object. The data have the fields chr, position, ref, alt, and strand. Because our positions are limited to RefGene, I only want to annotate our mutations with: 1) the mutation outcome in protein coding 2)…

Continue Reading how simply mutation annotation in R?

Plant Genomics Market Research Report and Report associated with it Reveals the Latest Trends And Opportunities of this market for period from 2023 to 2030.

PRESS RELEASE Published September 13, 2023 List of reports available with us. Plant Breeding and CRISPR Plants Market Size / CAGR / Sales Revenue   (Request Free Sample Report) Plant Wall Systems Market Size / CAGR / Sales Revenue   (Request Free Sample Report) Soil Enhancers Market Size / CAGR / Sales…

Continue Reading Plant Genomics Market Research Report and Report associated with it Reveals the Latest Trends And Opportunities of this market for period from 2023 to 2030.

Ensembl Release 104 and newer GTF files no longer have genes sorted by position

Following up on my previous post, I dug deeper and want to more precisely describe my “problem”. Up until and including Ensembl Release 103, the GTF files provided had all the gene entries in strictly sorted order (with all the transcript, exon, etc. entries pertaining to a gene entry listed…

Continue Reading Ensembl Release 104 and newer GTF files no longer have genes sorted by position

convert bed12 to sorted gtf

Hello, I’m trying to convert BED12 to a sorted GTF, but the output file ‘Precapture_uniq.gff’ is empty. I’m very new to this work, so I would appreciate any help solving this. awk -f bed12togff Postcapture_uniq_chr.bed12 | sort -k1,1 -k4,4n -k5,5n "$@"…

Continue Reading convert bed12 to sorted gtf

How best can I find transposons in my genome?

My objective is to identify transposons within my genome. To achieve this, I am pursuing a specific approach: aligning my genome to a reference and pinpointing regions that are distinct to my genome, subsequently annotating these regions. My initial attempt involved using the Matcher tool, yielding the following results. Regarding…

Continue Reading How best can I find transposons in my genome?

Segmentation fault error in CONTROL-FREEC

Hi, I am trying to run CONTROL-FREEC on diploid yeast samples to detect CNVs on a department cluster. The config file looks like this: [general] chrLenFile = yeast_chr.len ploidy = 2 window = 150000 #breakPointThreshold = -.002; #coefficientOfVariation = 0.062 chrFiles = /net/smith/vol1/home/student/FREEC-11.6b/data/test_sludge/yeast_files outputDir = /net/smith/vol1/home/student/FREEC-11.6b/data/test_sludge/output #degree=3 [sample] mateFile =…

Continue Reading Segmentation fault error in CONTROL-FREEC

Help with Mofa2

Hi Biostars friends, I am trying to run this tutorial on my multiome data. The error I got means my for loop tried to reach an index that doesn’t seem to exist, but I don’t know how to troubleshoot it. Do you have a suggestion? Thank you so much! for (i in c("distal","promoter")) {…

Continue Reading Help with Mofa2

Invalid .bed file size (expected 9996779 bytes)

Hello Christopher, We have data in PGEN format, and as part of our workflow, we initially created temporary PSAM, PVAR, and PGEN files using the following command: ./plink2 --bgen data.bgen ref-first --sample data.sample --set-missing-var-ids @:#:'\$r':'\$a' --new-id-max-allele-len 99 truncate --make-pgen --out data.intermediate 2. Following this step, we convert these temporary files to…

Continue Reading Invalid .bed file size (expected 9996779 bytes)

r – How do I add a legend indicating significance levels below a ggplot object?

I’m using ggforestplot() to plot the results from my several regression models where some of the data have been imputed with mice(). But for the sake of this MWE, I will use the example data of the ggforestplot() package instead: # Load packages library(ggforestplot) library(tidyverse) # Use the example data…

Continue Reading r – How do I add a legend indicating significance levels below a ggplot object?

The first high-quality chromosome-level genome of Eretmochelys imbricata using HiFi and Hi-C data

Sample collection and DNA extraction An individual E. imbricata was obtained from the sea turtle rescue base on Naozhou Island, Zhanjiang City, Guangdong Province, China. A 10 mL blood sample was drawn from its jugular sinus and rapidly frozen for further analysis. Genomic DNA was extracted from the processed blood samples…

Continue Reading The first high-quality chromosome-level genome of Eretmochelys imbricata using HiFi and Hi-C data

simuations with sim1000g

Hi, I’m trying to simulate the 22 chromosomes with the sim1000G package in R using a for loop. The problem is that the code works well on the first iteration of the loop, but after that it can’t produce the genetic map. This is my code: for (i in c(1:2)) { print("#####")…

Continue Reading simuations with sim1000g