Tag: dplyr

Bioconductor – EasyCellType

DOI: 10.18129/B9.bioc.EasyCellType. Annotate cell types for scRNA-seq data. Bioconductor version: Release (3.16). We developed EasyCellType, which can automatically examine input marker lists obtained from existing software such as Seurat against cell marker databases. Two quantification approaches to annotate cell types are provided: Gene set enrichment analysis (GSEA)…

Continue Reading Bioconductor – EasyCellType

lazy loading failed, unable to load shared object rtracklayer.so

Hello! I am working on analyzing a dataset I created with the 10x Chromium Single Cell Multiome kit. To add gene annotation to the ATAC data, I am trying to install and use the "EnsDb.Mmusculus.v79" and "BSgenome.Mmusculus.UCSC.mm10" packages with Bioconductor. The same ERROR has come up repeatedly whenever…

Continue Reading lazy loading failed, unable to load shared object rtracklayer.so

microarray analysis – gene upregulation/downregulation

Hi guys, I have performed microarray differential expression analysis using the following R commands/script: library("arrayQualityMetrics") > library(GEOquery) > library(oligo) > library(Biobase) > library(affy) > library("splitstackshape") > library("tidyr") > library(dplyr) > celFiles <- list.celfiles() > affyRaw <- read.celfiles(celFiles) Platform design info loaded. Reading in :…

Continue Reading microarray analysis – gene upregulation/downregulation
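
After a limma analysis, up- versus downregulation is usually read from the sign of `logFC` among genes that pass a significance cutoff. A minimal sketch, assuming a `topTable()`-style data frame with `logFC` and `adj.P.Val` columns; the probe names, values, and the 0.05 and |logFC| > 1 thresholds are hypothetical:

```r
# Hypothetical topTable()-style results (values are illustrative only)
res <- data.frame(
  probe     = c("p1", "p2", "p3", "p4"),
  logFC     = c(2.3, -1.8, 0.4, -0.2),
  adj.P.Val = c(0.001, 0.010, 0.030, 0.600)
)

# Assumed thresholds: adjusted p < 0.05 and absolute log2 fold change > 1
p_cut   <- 0.05
lfc_cut <- 1

# Label each probe as up, down, or not significant
res$direction <- ifelse(res$adj.P.Val < p_cut & res$logFC > lfc_cut, "up",
                 ifelse(res$adj.P.Val < p_cut & res$logFC < -lfc_cut, "down",
                        "not significant"))
table(res$direction)
```

The sign of `logFC` follows the contrast you specified (for example, tumour minus normal), so check the contrast before calling a gene "up".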

Divide the data , count, and then also output the id – General

This is not elegant. library(tidyr) library(dplyr) c1<-c(16.6,1.0,10.1,8.6,8.0,17.0,2.4,7.6,5.7,11.6,3.6,2.8,6.3,1.5,2.7,16.7,6.7,5.3,12.5) c2<-1:19 c3<-data.frame(c2,c1) colnames(c3)<-c("id","col") c3 <- mutate(c3,Group=cut(col,breaks = seq(1,19,3),right = FALSE)) c3 #> id col Group #> 1 1 16.6 [16,19) #> 2 2 1.0 [1,4) #> 3 3 10.1 [10,13) #> 4 4 8.6 [7,10) #> 5 5 8.0 [7,10) #> 6 6…

Continue Reading Divide the data , count, and then also output the id – General
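
One way to do the binning, counting, and id listing in a single dplyr pipeline is sketched below. It reuses the values from the question; collapsing the ids into one comma-separated string per bin is an assumption about the desired output:

```r
library(dplyr)

# Data from the question: 19 ids with one numeric value each
c1 <- c(16.6, 1.0, 10.1, 8.6, 8.0, 17.0, 2.4, 7.6, 5.7, 11.6,
        3.6, 2.8, 6.3, 1.5, 2.7, 16.7, 6.7, 5.3, 12.5)
c3 <- data.frame(id = 1:19, col = c1)

binned <- c3 %>%
  # Left-closed bins of width 3: [1,4), [4,7), ..., [16,19)
  mutate(Group = cut(col, breaks = seq(1, 19, 3), right = FALSE)) %>%
  group_by(Group) %>%
  # Count per bin and list the ids that fell into it
  summarise(n = n(), ids = paste(id, collapse = ","))
print(binned)
```

Empty bins such as [13,16) are dropped by `group_by()` by default; passing `.drop = FALSE` to `group_by()` keeps them with a count of zero.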

ggplot2 – How can I create multiple plots from same dataset in R?

Let me first share some dummy data, from which I want to prepare ggplot graphs. library(tidyverse) set.seed(1) sample_size <- 1200 dates <- sample(seq(1,31),sample_size,replace = TRUE) Monthss <- sample(seq(1,12),sample_size,replace = TRUE) hrs <- sample(seq(1,23),sample_size,replace = TRUE) minutes <- sample(seq(1,59),sample_size,replace = TRUE) date_time_vector <- paste0(dates,"-",Monthss,"-",2022," ",hrs,":",minutes) |> lubridate::parse_date_time("dmy HM") Conversion <- sample(c(TRUE,FALSE),sample_size,…

Continue Reading ggplot2 – How can I create multiple plots from same dataset in R?

File homo_ref.faa does not exist

I got FASTA output using the following code in R. I need to read the FASTA file (homo_ref.faa) that I obtained with this code by running "makeblastdb -in homo_ref.faa -dbtype prot" in the terminal, but I get "BLAST options error: File homo_ref.faa does not exist". How would you…

Continue Reading File homo_ref.faa does not exist
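
A common cause of this error is that R wrote the file into its own working directory (`getwd()`) while `makeblastdb` was run from a different directory in the terminal. A minimal sketch, assuming hypothetical sequences, that writes the FASTA file to an explicit location and prints the absolute path to pass to `makeblastdb`:

```r
# Hypothetical protein sequences; in practice these come from your own R code
seqs <- c(prot1 = "MKTAYIAKQR", prot2 = "MSEQNNTEMT")

# Write to an explicit directory instead of relying on the R working directory
out_dir <- tempdir()          # assumption: replace with your own folder
fasta   <- file.path(out_dir, "homo_ref.faa")
writeLines(paste0(">", names(seqs), "\n", seqs), fasta)

# Absolute path that the terminal command can use from any directory
full_path <- normalizePath(fasta)
cat("makeblastdb -in", shQuote(full_path), "-dbtype prot\n")
```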

Microarray DEG scatterplot

Hi, I have found that my selected gene (probe ID 201667_at) is differentially expressed between WDLPS and DDLPS tumour tissue samples after performing microarray DEG analysis. Instead of just a p-value in a table format: Probe ID "201667_at" logFC 10.8205874181535 AveExpr 10.6925705768407 t 82.8808890739766 P.Value 3.10189446528995e-88 adj.P.Val 3.10189446528995e-88…

Continue Reading Microarray DEG scatterplot

Log2FC values slightly higher in some genes after DESeq2 shrinkage

Hi, I have a question about DESeq2 LFC shrinkage: is it possible that some genes have a slightly higher LFC after shrinkage? It happened during my RNA-seq DE analysis; I have very deeply sequenced samples with large base means. I checked the results visually with an MA plot, and it looks fine. I'm mainly…

Continue Reading Log2FC values slightly higher in some genes after DESeq2 shrinkage

Installation | _main.knit

Before you start the installation of R, make sure your computer software is up to date. To get the most out of R, it is usually best to update to the latest operating system your computer supports. On a Mac, you can find About This Mac under the Apple menu. Then, in the Overview tab…

Continue Reading Installation | _main.knit

Bioinformatics Jobs – The Bioinformatics CRO

2021/12/09. Who We Are: Generate Biomedicines, Inc. is a Flagship-backed, privately held biotechnology company on a mission to reimagine the drug discovery process through the use of cutting-edge machine learning techniques. Core to Generate's approach is the development and application of novel machine learning algorithms to solve foundational problems in…

Continue Reading Bioinformatics Jobs – The Bioinformatics CRO

Bioconductor – HPiP

DOI: 10.18129/B9.bioc.HPiP. Host-Pathogen Interaction Prediction. Bioconductor version: Release (3.15). HPiP (Host-Pathogen Interaction Prediction) uses an ensemble learning algorithm for prediction of host-pathogen protein-protein interactions (HP-PPIs) using structural and physicochemical descriptors computed from the amino acid composition of host and pathogen proteins. The proposed package can effectively address data shortages and data…

Continue Reading Bioconductor – HPiP

rstudio – R Package Check(): “All declared Imports should be used”

I'm writing a small package of functions for myself only (not CRAN; on GitHub, but not public), and developing it locally on my computer. Mostly this is me being a newbie at R and learning to write my first package. I'm using devtools, and after load_all() and check(), I have been getting…

Continue Reading rstudio – R Package Check(): “All declared Imports should be used”

Bioconductor – TrajectoryGeometry (development version)

DOI: 10.18129/B9.bioc.TrajectoryGeometry. This is the development version of TrajectoryGeometry; for the stable release version, see TrajectoryGeometry. This Package Discovers Directionality in Time and Pseudo-times Series of Gene Expression Patterns. Bioconductor version: Development (3.16). Given a time series or pseudo-time series of gene expression data, we might wish to…

Continue Reading Bioconductor – TrajectoryGeometry (development version)

R programming language tutorials – Technical Ripon

Are you learning the R programming language? Want to learn how to do more tasks with R? Check out our Do More With R video tutorials below — most with accompanying text articles and code, almost all under 10 minutes. In the table below you can easily search all tutorials…

Continue Reading R programming language tutorials – Technical Ripon

Bioconductor – sSNAPPY (development version)

DOI: 10.18129/B9.bioc.sSNAPPY. This is the development version of sSNAPPY; for the stable release version, see sSNAPPY. Single Sample directioNAl Pathway Perturbation analYsis. Bioconductor version: Development (3.16). A single-sample pathway perturbation testing method for RNA-seq data. The method propagates changes in gene expression down gene-set topologies to compute…

Continue Reading Bioconductor – sSNAPPY (development version)

Recent questions tagged fasta – Q&A


Continue Reading Recent questions tagged fasta – Q&A

Director of Bioinformatics in Chicago, IL for University of Chicago (UC)

Details: Posted: 03-May-22. Location: Chicago, Illinois. Type: Full-time. Salary: Open. Categories: Research – Laboratory/Non-Laboratory Staff/Administrative. Location: Hyde Park Campus. Job Description: Provides technical expertise in the selection, validation, and implementation of the appropriate internal and external data analytic and bioinformatic solutions needed to analyze specimens, process and integrate data, and…

Continue Reading Director of Bioinformatics in Chicago, IL for University of Chicago (UC)

Error in SummarizedExperiment

I have installed DESeq2 version 1.36.0: samples <- colnames(txi$counts) group <- as.factor(c("control","control","control","control","control","diet","diet","diet","diet","diet", "control","control","control","control","control","diet","diet","diet","diet","diet","diet")) coldata <- data.frame(samples, group, stringsAsFactors = F) coldata <- coldata[,c("samples","group")] coldata$samples <- factor(coldata$samples) coldata$group <- factor(coldata$group) rownames(coldata) <- sub("fb", "", rownames(coldata)) all(rownames(coldata$samples) %in% colnames(txi)) all(rownames(coldata) == colnames(txi)) TRUE library(DESeq2) ddsTxi <- DESeqDataSetFromTximport(txi, colData = coldata, design =…

Continue Reading Error in SummarizedExperiment

deseq2 problem

Hi, I am trying to draw a PCA plot with DESeq2, but somehow I cannot use DESeq2 functions. It is really simple code; I will paste it below. > transform <- DESeq2::rlog(eliminated_data, blind = TRUE) Error in (function (classes, fdef, mtable) : unable to find an…

Continue Reading deseq2 problem

GDCprepare of RNAseq counts produces error

Hello everyone! I have been using the TCGAbiolinks package for the last couple of years to access RNA-seq data for the TCGA-LAML project. Just very recently, I noticed that I could no longer use GDCquery to…

Continue Reading GDCprepare of RNAseq counts produces error

Separate exogenous from endogenous transcripts using Salmon RNAseq DTU

Dear friends, We are trying to use Salmon for DTU analysis. We want to separate exogenous from endogenous transcripts by following this post www.biostars.org/p/443701/ and this paper f1000research.com/articles/7-952 We are focusing on a gene called ASCL1 (endo-ASCL1). We transduced cells with lentiviral vector containing ASCL1 ORF only (Lenti-ASCL1). There should…

Continue Reading Separate exogenous from endogenous transcripts using Salmon RNAseq DTU

GDCquery_Maf error

Hi all, I really need some help. I am trying to run GDCquery_Maf, which worked fine until yesterday. Now I get the following error: Error in GDCquery(paste0("TCGA-", tumor), data.category = "Simple Nucleotide Variation", : Please set a valid workflow.type argument…

Continue Reading GDCquery_Maf error

subpopulations available in MafH5.gnomAD.v3.1.1.GRCh38

Are subpopulation MAFs available for gnomAD v3.1.1 with any package, like they are in MafDb.gnomAD.r2.1.hs37d5? I'm trying to use GenomicScores to obtain all variants in a genomic range with MAF in any subpopulation >= cutoff. I tried…

Continue Reading subpopulations available in MafH5.gnomAD.v3.1.1.GRCh38

“Rprofile”to use rstudio on MAC

.Rprofile, again. I received a comment on a recent post that taught me how to send commands from the R console to the terminal using the system() function. As described there, the following is written in the upper-left pane of RStudio: library(stats) library(tidyverse) library(ggplot2) library(GGally) library(patchwork) library(lubridate) library(dplyr) We named…

Continue Reading “Rprofile”to use rstudio on MAC

Bioconductor installation problems

Dear Friends, I have been having considerable difficulty with various Bioconductor package-related actions. Here are my system specifications: macOS Monterey Version: 12.3 (21E230) MacBook Pro (Retina, 13-inch, Early 2015) Memory: 16 GB 1867 MHz DDR3 RStudio 2021.09.0 Build 351 R version 4.1.2 (2021-11-01) -- "Bird Hippie" Copyright (C)…

Continue Reading Bioconductor installation problems

Bioconductor – mirTarRnaSeq

DOI: 10.18129/B9.bioc.mirTarRnaSeq. mirTarRnaSeq. Bioconductor version: Release (3.14). The mirTarRnaSeq R package can be used for interactive mRNA miRNA sequencing statistical analysis. This package utilizes expression or differential expression mRNA and miRNA sequencing results and performs interactive correlation and various GLMs (Regular GLM, Multivariate GLM, and Interaction GLMs) analysis…

Continue Reading Bioconductor – mirTarRnaSeq

ggplot2 – How To Update R Values

I have the below code which runs a 3 month picture of my metrics. I open the saved code, remove “Nov-21” and add “Feb-22”, then delete the first entry for each metric and add “Feb-22” entry to end of each metric (957L, 1208L, 1054L, 476L). Previously, the 3 month picture…

Continue Reading ggplot2 – How To Update R Values

Bioconductor Package Installation

When I try to install the gtf for hg38 with BiocManager::install("TxDb.Hsapiens.UCSC.hg38.knownGene"), I get the following error: 'getOption("repos")' replaces Bioconductor standard repositories, see '?repositories' for details replacement repositories: CRAN: cran.rstudio.com/ Bioconductor version 3.14 (BiocManager 1.30.16), R 4.1.2 (2021-11-01) Installing package(s) 'TxDb.Hsapiens.UCSC.hg38.knownGene' Error in readRDS(dest) : error reading from connection Per stackoverflow.com/questions/67455984/getoptionrepos-replaces-bioconductor-standard-repositories-see-reposito I…

Continue Reading Bioconductor Package Installation

Bioconductor – cytoKernel

DOI: 10.18129/B9.bioc.cytoKernel. Differential expression using kernel-based score test. Bioconductor version: Release (3.14). cytoKernel implements a kernel-based score test to identify differentially expressed features in high-dimensional biological experiments. This approach can be applied across many different high-dimensional biological data including gene expression data and dimensionally reduced cytometry-based marker expression…

Continue Reading Bioconductor – cytoKernel

Correctly place geom_text labels in forest plot with “fill” parameter in ggplot2 – General

Hi There, I want to create annotated publication-ready forest plots comparing different models. How do I ensure that the labels are placed above and tight against the corresponding error bars and do not overlap? Thank you for your help! Here is what I tried: library(tidyverse) #> Warning: package ‘tibble’ was…

Continue Reading Correctly place geom_text labels in forest plot with “fill” parameter in ggplot2 – General

ggplot2loon function – RDocumentation

Examples # NOT RUN { if(interactive()) { p <- ggplot(mtcars, aes(wt, mpg)) + geom_point() g <- ggplot2loon(p) p1 <- ggplot(mtcars) + geom_point(aes(x = wt, y = mpg, colour = factor(gear))) + facet_wrap(~am) g1 <- ggplot2loon(p1) # } # NOT RUN { df <- data.frame( x = rnorm(120, c(0, 2, 4)),…

Continue Reading ggplot2loon function – RDocumentation

Bioconductor – sccomp (development version)

DOI: 10.18129/B9.bioc.sccomp. This is the development version of sccomp; to use it, please install the devel version of Bioconductor. Robust Outlier-aware Estimation of Composition and Heterogeneity for Single-cell Data. Bioconductor version: Development (3.15). A robust and outlier-aware method for testing differential tissue composition from single-cell data. This model…

Continue Reading Bioconductor – sccomp (development version)

quosure dplyr ggplot2 (1) – Code Examples

How to parametrize function calls in dplyr 0.7? The release of dplyr 0.7 includes a major overhaul of programming with dplyr. I read this document carefully, and I am trying to understand how it will impact my use of dplyr. Here is a common idiom I…

Continue Reading quosure dplyr ggplot2 (1) – Code Examples
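
Since dplyr 0.7, a function that takes bare column names can capture them with `enquo()` and splice them back into dplyr verbs with `!!`. A minimal sketch with a hypothetical helper, `group_mean()`, using the built-in `mtcars` data:

```r
library(dplyr)

# Hypothetical helper: mean of `value_col` within each level of `group_col`.
# enquo() captures the bare column name as a quosure; !! unquotes it.
group_mean <- function(df, group_col, value_col) {
  group_col <- enquo(group_col)
  value_col <- enquo(value_col)
  df %>%
    group_by(!!group_col) %>%
    summarise(mean_value = mean(!!value_col))
}

out <- group_mean(mtcars, cyl, mpg)
out
```

With rlang 0.4.0 or later, the `{{ }}` ("curly-curly") operator, as in `group_by({{ group_col }})`, replaces the `enquo()`/`!!` pair.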

Pathway analysis of RNAseq data using goseq package

Hello, I have finished the RNA seq analysis and I am trying to perform some pathway analysis. I have used the gage package and I was looking online about another package called goseq that takes into account length bias. However, when I run the code I get an error. How…

Continue Reading Pathway analysis of RNAseq data using goseq package

A Comprehensive Guide on ggplot2 in R

Introduction: Visualization plays an important role in the decision-making process after analyzing relevant data. Graphical representation highlighting the interdependence of key elements affecting performance is important in this process. There are many libraries in Python and R which provide different options showing…

Continue Reading A Comprehensive Guide on ggplot2 in R

DESeq2 and high prefiltering cutoff

Hi, I am curious about prefiltering with DESeq2. I understand from this site and from reading the DESeq2 vignette that prefiltering is not really necessary, because DESeq2 applies its own stringent filtering. However, I'm seeing better…

Continue Reading DESeq2 and high prefiltering cutoff
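
Prefiltering usually means dropping genes that have too few counts in too few samples. A minimal sketch on a hypothetical count matrix; with a DESeqDataSet you would use `counts(dds)` in place of `counts` and subset with `dds[keep, ]`. Both thresholds are assumptions to tune, with `min_samples` commonly set to the size of the smallest group:

```r
# Hypothetical count matrix: 4 genes x 6 samples (two groups of 3)
counts <- matrix(
  c( 0,  0,  1,  0,  2,  0,
    15, 20, 11, 30, 25, 40,
    12,  0,  0, 10, 11, 14,
     5,  9,  3,  8,  6,  7),
  nrow = 4, byrow = TRUE,
  dimnames = list(paste0("gene", 1:4), paste0("s", 1:6))
)

# Keep genes with a count of at least `min_count` in at least `min_samples` samples
min_count   <- 10
min_samples <- 3
keep <- rowSums(counts >= min_count) >= min_samples
filtered <- counts[keep, , drop = FALSE]
rownames(filtered)
```

A filter like this only removes very low-count genes; the independent filtering that `results()` performs by default still runs afterwards.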

How do I resolve Rd warning “missing file link” when building packages in RStudio?

After building a simple test package to isolate this issue, I receive the following warning when I run Rcmd.exe INSTALL --no-multiarch --with-keep.source simpleTest: * installing to library 'C:/Users/user/Documents/R-dev' * installing *source* package 'simpleTest' ... ** R ** preparing package for lazy loading ** help *** installing help indices converting help…

Continue Reading How do I resolve Rd warning “missing file link” when building packages in RStudio?

Bioconductor – TAPseq

DOI: 10.18129/B9.bioc.TAPseq. This package is for version 3.12 of Bioconductor; for the stable, up-to-date release version, see TAPseq. Targeted scRNA-seq primer design for TAP-seq. Bioconductor version: 3.12. Design primers for targeted single-cell RNA-seq used by TAP-seq. Create sequence templates for target gene panels and design gene-specific primers using…

Continue Reading Bioconductor – TAPseq

r – ggplot2 running for minutes without plotting

I am attempting to plot the vector below, but when I run the function, it just continues to run and does not plot. I waited 5 minutes before I felt uncomfortable and clicked stop in the console. I'm wondering what is going on. Up until this point I have had…

Continue Reading r – ggplot2 running for minutes without plotting

r – trying to make a ggplot with two lines

The trick is to gather the columns you want to map as variables. As I don't know how you want to plot your graph (that is, what goes on the x-axis and y-axis), I made a pseudo plot. For your continuous variable part, you can convert your values to either integer or numeric…

Continue Reading r – trying to make a ggplot with two lines
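
The "gather the columns" idea can be sketched with `pivot_longer()`, the current tidyr replacement for `gather()`. The data frame and column names below are hypothetical:

```r
library(tidyr)
library(ggplot2)

# Hypothetical wide data: one x column and two y columns to draw as two lines
wide <- data.frame(
  year    = 2018:2021,
  sales   = c(10, 12, 15, 14),
  returns = c(2, 3, 2, 4)
)

# Reshape to long format: one row per (year, series) pair
long <- pivot_longer(wide, cols = c(sales, returns),
                     names_to = "series", values_to = "value")

# One coloured line per level of `series`; call print(p) to draw it
p <- ggplot(long, aes(x = year, y = value, colour = series)) +
  geom_line()
```

Because each original column becomes a level of `series`, ggplot2 draws one line per level without a separate `geom_line()` call for each column.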

use tcgabiolinks package to download TCGA data

In terms of ease of use for downloading TCGA data, the RTCGA package should be better, and because it ships already-downloaded data, it is relatively stable to use. But also because the data are pre-downloaded, there is no guarantee that they are up to date. The TCGAbiolinks package is…

Continue Reading use tcgabiolinks package to download TCGA data

r – Is it possible to change a line from a ggplot/geom_line plot depending on what month the datapoint corresponds to?

Yes, it's possible. The easiest way to do it is by creating a vector of your colors, the same length as the number of rows in your data frame, and passing it to the col argument in geom_line(). Here is an example: library(dplyr, warn.conflicts = FALSE) library(ggplot2) library(lubridate, warn.conflicts = FALSE) # create some…

Continue Reading r – Is it possible to change a line from a ggplot/geom_line plot depending on what month the datapoint corresponds to?

R plot color by value in y-axis ggplot

No need for an ifelse statement; just write your condition and then use scale_fill_manual: ggplot(Report, aes(x=Report$Name,y=Report$average_working_hours)) + ggtitle('working hours in July') + ylab(' working hours') + geom_bar(stat = 'identity', aes(fill = Report$average_working_hours > 8)) + theme_gray() + scale_fill_manual(values=c('blue', 'red')) This can be done quite simply using dplyr. First thing I would…

Continue Reading R plot color by value in y-axis ggplot
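
A variant of that answer that refers to columns by bare name inside `aes()` instead of `Report$...`; the ggplot2 documentation advises against using `$` inside `aes()`. The `Report` data below are hypothetical, and `geom_col()` is shorthand for `geom_bar(stat = "identity")`:

```r
library(ggplot2)

# Hypothetical data in the shape described in the question
Report <- data.frame(
  Name = c("Ann", "Bob", "Cai", "Dee"),
  average_working_hours = c(7.5, 8.5, 9.0, 6.0)
)

# The logical condition becomes a two-level discrete fill (FALSE/TRUE)
p <- ggplot(Report, aes(x = Name, y = average_working_hours,
                        fill = average_working_hours > 8)) +
  geom_col() +
  scale_fill_manual(values = c("FALSE" = "blue", "TRUE" = "red"),
                    name = "Over 8 h") +
  ggtitle("working hours in July") +
  ylab("working hours")

# Inspect the computed fill colours without drawing the plot
built <- ggplot_build(p)
built$data[[1]]$fill
```

Naming the values in `scale_fill_manual()` ties each colour to `TRUE` or `FALSE` explicitly instead of relying on level order.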

Bioconductor – pRolocGUI

This package is for version 3.4 of Bioconductor; for the stable, up-to-date release version, see pRolocGUI. Interactive visualisation of spatial proteomics data. Bioconductor version: 3.4. The package pRolocGUI comprises functions to interactively visualise organelle (spatial) proteomics data on the basis of pRoloc, pRolocdata and shiny. Author: Lisa M…

Continue Reading Bioconductor – pRolocGUI

gene ID RNAseq

Hi friends, how can I get the gene numeric ID and HUGO ID with an R script? What script should I use? I have this, but it does not give the numeric ID and HUGO ID: library(biomaRt) library(dplyr) library(tibble) attributeNames <- c("ensembl_gene_id","external_gene_name","HGNC_ID", "chromosome_name","description") filterValues <- rownames(res) Annotations <- getBM(attributes=attributeNames, filters =…

Continue Reading gene ID RNAseq

r – Where does ggplot set the order of the color scheme?

I think this is a reproducible example of what you’re seeing. In the diamonds dataset, the mean price of “Good” diamonds is actually higher than the mean for “Very Good” diamonds. library(dplyr) diamonds %>% group_by(cut) %>% summarize(mean_price = mean(price)) # A tibble: 5 x 2 cut mean_price <ord> <dbl> 1…

Continue Reading r – Where does ggplot set the order of the color scheme?
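
In that example, `cut` is an ordered factor, so ggplot2 orders axes and legends by its factor levels rather than by the summarised values. A minimal sketch showing the level order and one way to reorder the categories by mean price with base R's `reorder()`:

```r
library(dplyr)
library(ggplot2)   # provides the diamonds dataset

mean_price <- diamonds %>%
  group_by(cut) %>%
  summarise(mean_price = mean(price))

# Level order that ggplot2 will use: Fair, Good, Very Good, Premium, Ideal
levels(mean_price$cut)

# Reorder categories by mean price (ascending) for plotting
mean_price$cut_by_price <- reorder(as.character(mean_price$cut),
                                   mean_price$mean_price)
levels(mean_price$cut_by_price)
```

Mapping `cut_by_price` instead of `cut` to the x-axis or colour would then follow mean price rather than the quality grading.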

Tabix file download error eQTL analysis

I am trying to download a tabix file to perform an analysis on an eQTL dataset; however, I get the following error for each file I try from the eQTL Catalogue: library(ggplot2) library(readr) library(coloc) library(GenomicRanges) library(seqminer) tabix_paths = read.delim("https://raw.githubusercontent.com/eQTL-Catalogue/eQTL-Catalogue-resources/master/tabix/tabix_ftp_paths.tsv", sep = "\t", header = TRUE, stringsAsFactors =…

Continue Reading Tabix file download error eQTL analysis

Writing ggplot custom geometry function

stat_accum <- function(mapping = NULL, data = NULL, geom = "point", position = "stack", ..., show.legend = NA, inherit.aes = TRUE) { layer( data = data, mapping = mapping, stat = StatAccum, geom = geom, position = position, show.legend = show.legend, inherit.aes = inherit.aes, params = list( na.rm = na.rm,…

Continue Reading Writing ggplot custom geometry function

Error on final object when generating ggplot objects in for loop with dplyr select()

I want to make many plots using multiple pairs of variables in a dataframe, all with the same x. I store the plots in a named list. For simplicity, below is an example with only 1 variable in each plot. Key to this function is a select() call that is…

Continue Reading Error on final object when generating ggplot objects in for loop with dplyr select()

Using logic in listcols (ggplot2 edition)

I am trying to produce a slightly different result from a purrr::map iteration depending on a condition. Say I have this code producing plots and storing them in a dataframe: library(tidyverse) #> — Attaching packages ————————————— tidyverse 1.3.1 — #> v ggplot2 3.3.5 v purrr 0.3.4 #> v tibble 3.1.6…

Continue Reading Using logic in listcols (ggplot2 edition)

R connection to sqlite – Stackify

SQLite is a file-level database, so referencing it requires a full directory path. Nowhere do you specify the working directory or a full path in the file name. By default, R will use the current working directory returned by getwd(). If the database is not contained in this…

Continue Reading R connection to sqlite – Stackify
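
A minimal sketch with DBI and RSQLite (assumed to be installed) that builds a throwaway database in `tempdir()` and then connects through an absolute path. RSQLite creates a new, empty database when the path does not exist, so a wrong path often surfaces later as a missing-table error; checking the file first makes the problem obvious. The file and table names are hypothetical:

```r
library(DBI)

# Hypothetical database location; use the absolute path to your own .sqlite file
db_path <- file.path(tempdir(), "example.sqlite")

# Create a small example database so the sketch is self-contained
con <- dbConnect(RSQLite::SQLite(), db_path)
dbWriteTable(con, "genes",
             data.frame(id = 1:3, symbol = c("TP53", "EGFR", "MYC")),
             overwrite = TRUE)
dbDisconnect(con)

# Check that the file exists before connecting, then use its absolute path
stopifnot(file.exists(db_path))
con <- dbConnect(RSQLite::SQLite(), normalizePath(db_path))
genes <- dbReadTable(con, "genes")
dbDisconnect(con)
genes
```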

Decimal digits in `Slope Graph` with `ggplot2`

Following a previous question I opened a few weeks ago (Slope Chart – ggplot2), I face another issue concerning the numeric values reported in the graph. Even specifying the decimal digits I need (exactly 3) with either of the two commands: y=round(y, digit = 3) at the end of the code or…

Continue Reading Decimal digits in `Slope Graph` with `ggplot2`
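
`round()` changes the value but not how it is shown as text: when each number is converted to a label on its own, trailing zeros are dropped. Formatting the label as a string keeps exactly three decimals. A minimal sketch with hypothetical values:

```r
y <- c(0.1, 0.12345, 2.5)

# Converted one by one, rounded numbers lose trailing zeros: "0.1" "0.123" "2.5"
as.character(round(y, digits = 3))

# sprintf() returns character labels with exactly three decimals
labels_3dp <- sprintf("%.3f", y)
labels_3dp
```

In the plot, this would be used as, for example, `geom_text(aes(label = sprintf("%.3f", y)))`.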

Bioconductor – RiboCrypt

DOI: 10.18129/B9.bioc.RiboCrypt. Interactive visualization in genomics. Bioconductor version: Release (3.14). R package for interactive visualization and browsing of NGS data. It contains a browser for both transcript and genomic coordinate views. In addition, QC and general metaplots are included, among them differential translation plots and gene expression plots…

Continue Reading Bioconductor – RiboCrypt

Ggplot: heatmap based on two vectors ( R, Ggplot2 )

Problem: I am trying to plot data as a heat map in ggplot2. I understand that you'd normally need x, y, and z coordinates to plot x against y and then color by z. I have found plenty of heat map examples…

Continue Reading Ggplot: heatmap based on two vectors ( R, Ggplot2 )

summary-statistics – Github Help

Fictional company AutosRUs' newest prototype, the MechaCar, is suffering from production troubles that are blocking the manufacturing team's progress. AutosRUs' senior management enlisted assistance from the data analytics team to review the production data for insights that may help the manufacturing team overcome their production issues. User:…

Continue Reading summary-statistics – Github Help

Bioconductor – TBSignatureProfiler (development version)

DOI: 10.18129/B9.bioc.TBSignatureProfiler. This is the development version of TBSignatureProfiler; for the stable release version, see TBSignatureProfiler. Profile RNA-Seq Data Using TB Pathway Signatures. Bioconductor version: Development (3.15). Gene signatures of TB progression, TB disease, and other TB disease states have been validated and published previously. This package aggregates…

Continue Reading Bioconductor – TBSignatureProfiler (development version)

protti source: R/fetch_alphafold_prediction.R

#' Fetch AlphaFold prediction #' #' Fetches atom level data for AlphaFold predictions either for selected proteins or whole #' organisms. #' #' @param uniprot_ids optional, a character vector of UniProt identifiers for which predictions #' should be fetched. This argument is mutually exclusive to the \code{organism_name} argument. #' @param…

Continue Reading protti source: R/fetch_alphafold_prediction.R

Bottom-Up Proteomics and LiP-MS Quality Control and Data Analysis Tools

Peptides are mapped onto PDB structures or AlphaFold prediction based on their positions. This is accomplished by replacing the B-factor information in the structure file with values that allow highlighting of peptides, protein regions or amino acids when the structure is coloured by B-factor. In addition to simply highlighting peptides,…

Continue Reading Bottom-Up Proteomics and LiP-MS Quality Control and Data Analysis Tools

Finding the significance of the overlap between 2 or more gene sets using simulation in R.

TLDR: Example R function to calculate significance of overlap of 2 or more gene sets. genes_all is a vector that contains all genes, and gene_sets takes a list of vectors for each gene set. I encourage people to read the full tutorial and attempt to reproduce the code themselves (especially…

Continue Reading Finding the significance of the overlap between 2 or more gene sets using simulation in R.
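
The simulation approach can be sketched as follows: compare the observed size of the intersection of all sets with the intersections of random sets of the same sizes drawn from the gene universe. This is a hypothetical helper, `overlap_pvalue()`, not the tutorial's own function; it assumes every gene set is a subset of `genes_all` with no duplicates:

```r
# Hypothetical helper: simulation-based p-value for the overlap of 2+ gene sets
overlap_pvalue <- function(genes_all, gene_sets, n_sim = 10000) {
  observed  <- length(Reduce(intersect, gene_sets))
  set_sizes <- lengths(gene_sets)
  simulated <- replicate(n_sim, {
    # Draw random sets of the same sizes from the gene universe
    random_sets <- lapply(set_sizes, function(k) sample(genes_all, k))
    length(Reduce(intersect, random_sets))
  })
  # Add-one correction so the estimated p-value is never exactly zero
  p <- (sum(simulated >= observed) + 1) / (n_sim + 1)
  list(observed = observed, p_value = p)
}

set.seed(1)
genes_all <- paste0("g", 1:1000)
gene_sets <- list(A = genes_all[1:50], B = genes_all[26:75], C = genes_all[30:120])
result <- overlap_pvalue(genes_all, gene_sets, n_sim = 2000)
result
```

The smallest reachable p-value is 1 / (n_sim + 1), so increase `n_sim` when you need finer resolution for very strong overlaps.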

Outliers on DESEq2 Results

I have an RNAseq dataset, where one of the genes I intend to analyze has hundreds of counts ranging from 10 to 12, with a few counts > 9000. I process this data in Deseq2 and get that the gene is differentially expressed across several samples of interest. What can…

Continue Reading Outliers on DESEq2 Results

Bioconductor – conclus

DOI: 10.18129/B9.bioc.conclus. ScRNA-seq Workflow CONCLUS – From CONsensus CLUSters To A Meaningful CONCLUSion. Bioconductor version: Release (3.13). CONCLUS is a tool for robust clustering and positive marker feature selection of single-cell RNA-seq (sc-RNA-seq) datasets. It takes advantage of a consensus clustering approach that greatly simplifies sc-RNA-seq data analysis…

Continue Reading Bioconductor – conclus

Need help to remove NA values from data frame

I have a data frame, and I want to remove the rows that contain NA values in the log2 fold change column. How can I do this in R? Hi Anas, If your data frame…

Continue Reading Need help to remove NA values from data frame
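
A minimal sketch on a hypothetical results table; for a DESeqResults object, convert it first with `as.data.frame(res)`:

```r
# Hypothetical DESeq2-style results table with some NA log2 fold changes
res_df <- data.frame(
  gene           = c("A", "B", "C", "D"),
  log2FoldChange = c(1.2, NA, -0.8, NA),
  padj           = c(0.01, NA, 0.20, 0.50)
)

# Base R: keep only rows whose log2FoldChange is not NA
res_clean <- res_df[!is.na(res_df$log2FoldChange), ]
res_clean
```

The dplyr equivalent is `filter(res_df, !is.na(log2FoldChange))`, and `tidyr::drop_na(res_df, log2FoldChange)` does the same.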

Bioconductor – traviz (development version)

DOI: 10.18129/B9.bioc.traviz. This is the development version of traviz; to use it, please install the devel version of Bioconductor. Trajectory functions for visualization and interpretation. Bioconductor version: Development (3.14). traviz provides a suite of functions to plot trajectory related objects from Bioconductor packages. It allows plotting trajectories in…

Continue Reading Bioconductor – traviz (development version)

Bioconductor – marr

DOI: 10.18129/B9.bioc.marr. Maximum rank reproducibility. Bioconductor version: Release (3.13). marr (Maximum Rank Reproducibility) is a nonparametric approach that detects reproducible signals using a maximal rank statistic for high-dimensional biological data. In this R package, we implement functions that measure the reproducibility of features per sample pair and sample…

Continue Reading Bioconductor – marr

Can cnetplot plot points change shape in clusterprofiler?

I have a cnetplot and I am wondering if it is possible to further categorise the plot by point shape. I have a cnetplot of genes and their interacting pathways, but the genes also have a few…

Continue Reading Can cnetplot plot points change shape in clusterprofiler?

Easy Way To Get 3′ Utr Lengths Of A List Of Genes

Hi, as the title says, I'm wondering if there is any tool available that would allow me to drop in a list of, say, Entrez gene IDs and get their corresponding 3′ UTR lengths. Thanks for…

Continue Reading Easy Way To Get 3′ Utr Lengths Of A List Of Genes

Bioconductor – DuoClustering2018 (development version)

DOI: 10.18129/B9.bioc.DuoClustering2018. This is the development version of DuoClustering2018; for the stable release version, see DuoClustering2018. Data, Clustering Results and Visualization Functions From Duò et al (2018). Bioconductor version: Development (3.14). Preprocessed experimental and simulated scRNA-seq data sets used for evaluation of clustering methods for scRNA-seq data in…

Continue Reading Bioconductor – DuoClustering2018 (development version)

How to combine a data frame with another data frame containing comma-separated values?

For dataframe manipulation, in general, you should look into the dplyr and tidyr packages, they offer endless possibilities if you learn to manipulate them (lots of practice will help). A good and concise cheatsheet is available here. Regarding this problem in particular, something like this should work: library(dplyr) library(tidyr) dfA…

Continue Reading How to combine a data frame with another data frame containing comma-separated values?
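
A minimal sketch of that approach with `tidyr::separate_rows()` and `dplyr::left_join()`, assuming hypothetical data frames `dfA` (comma-separated gene symbols per sample) and `dfB` (one row per gene):

```r
library(dplyr)
library(tidyr)

# Hypothetical inputs
dfA <- data.frame(sample = c("s1", "s2"), genes = c("TP53,EGFR", "MYC"))
dfB <- data.frame(genes = c("TP53", "EGFR", "MYC"),
                  role  = c("suppressor", "oncogene", "oncogene"))

combined <- dfA %>%
  separate_rows(genes, sep = ",") %>%   # one row per sample-gene pair
  mutate(genes = trimws(genes)) %>%     # drop stray spaces, as in "TP53, EGFR"
  left_join(dfB, by = "genes")          # attach per-gene information
combined
```

`left_join()` keeps every sample-gene pair even when a gene is missing from `dfB` (its `role` becomes NA); use `inner_join()` if such rows should be dropped instead.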

Start your Kaggle journey today!

Hello folks. How’re you doing? Hope everything is fine with you. And today, I want to publicly commit myself to the 2 Articles 1-week challenge. Every week I will write two articles on Data Science from my past experiences or my learnings of the week. Ok then, keep all those…

Continue Reading Start your Kaggle journey today!

How to transform the deg gene list from seurat to a gene list input to clusterProfiler compareCluster ?

Sorry for lateness, I wanted to do something similar. This is what I did for reference: Using a Seurat generated gene list for input into ClusterProfiler to see the GO or KEGG terms per cluster. I’ll keep the meat and potatoes of the Seurat vignette in this tutorial: library(dplyr) library(Seurat)…

Continue Reading How to transform the deg gene list from seurat to a gene list input to clusterProfiler compareCluster ?

hjust in geom_cladelab not working properly

I have an issue with the hjust of clade labels when using geom_cladelab. The justification does not change when I specify that the labels should align with the outer circle and set angle = "auto". I have added a reprex of the issue. The problem…

Continue Reading hjust in geom_cladelab not working properly

r is running out of memory

When I try loading gene expression counts downloaded from TCGA (859 samples with raw counts for 60,000 rows), I get this error and R stops for no apparent reason. I checked the memory and found that my memory is nearly…

Continue Reading r is running out of memory

spots not filling the whole tissue image

Issue with Seurat SpatialPlot: spots not filling the whole tissue image. In Seurat, SpatialPlot generates a plot with an enlarged/expanded image of the tissue section compared to the original spot image. This seems to happen with relatively small images that have around 500 spots. I'd…

Continue Reading spots not filling the whole tissue image

lfcShrink problem in many 0 count genes RNA-seq data

Hi, Dr. Love. I posted a question about a weird MA plot or volcano plot of DESeq2 results, and also on Biostars. ATpoint gave a useful answer about too many zero-count genes and prefiltering. It seems that too many zero-count genes cause a problem for LFC shrinkage. And I…

Continue Reading lfcShrink problem in many 0 count genes RNA-seq data

weird MAplot or volcano plot of DESeq2 diff result

Hi, everyone. I found a weird MA plot or volcano plot in my DESeq2 results, and I am wondering whether you can give me some advice. These results come from bulk RNA-seq of two cell types, which I isolated by flow cytometry using two specific markers. I already…

Continue Reading weird MAplot or volcano plot of DESeq2 diff result

Answer: PopGenome – VCF, fasta, GTF and codons still missing

Dear Maciek Hopefully you were able to solve these problems already. I cannot comment on the main set of issues you reported. However, I also encountered the error: `Error in START[!REV, 3] : incorrect number of dimensions` following certain instances of `set.synnonsyn` which I also noticed occurred for genes which…

Continue Reading Answer: PopGenome – VCF, fasta, GTF and codons still missing

How to colour points in cnetplot of clustprofiler?

I have a cnetplot from running enrichment with kegg using clusterprofiler. I have scores input as the fold change but for each gene in the plot they are not varying in colour to show their difference in the fold change score. My dataset is genes of entrez IDs and then…

Continue Reading How to colour points in cnetplot of clustprofiler?

%% error in Rstudio

dc.markers %>% group_by(cluster) %>% top_n(2, wt = avg_logFC) The above code gives an error even after loading the dplyr and Matrix libraries in a Seurat analysis in RStudio. Error: Problem with filter() input ..1. i Input ..1 is top_n_rank(2, avg_logFC). x object 'avg_logFC' not found…

Continue Reading %% error in Rstudio
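
The error says the column `avg_logFC` does not exist. In Seurat v4 and later, `FindAllMarkers()` reports `avg_log2FC` instead, which is a likely cause. A minimal sketch on a hypothetical marker table that picks whichever fold-change column is present and uses `slice_max()`, which supersedes `top_n()` in dplyr 1.0:

```r
library(dplyr)

# Hypothetical FindAllMarkers()-style output (newer column name avg_log2FC)
dc.markers <- data.frame(
  gene       = c("g1", "g2", "g3", "g4", "g5", "g6"),
  cluster    = c(0, 0, 0, 1, 1, 1),
  avg_log2FC = c(2.1, 0.5, 1.7, 3.0, 1.2, 2.4)
)

# Use whichever fold-change column exists in this Seurat version's output
fc_col <- intersect(c("avg_log2FC", "avg_logFC"), colnames(dc.markers))[1]

# Top 2 markers per cluster by fold change
top2 <- dc.markers %>%
  group_by(cluster) %>%
  slice_max(order_by = .data[[fc_col]], n = 2) %>%
  ungroup()
top2
```

If you know which Seurat version produced the table, writing the column name directly (for example `top_n(2, wt = avg_log2FC)`) is simpler.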

Highly used R packages with no Python equivalent

The biggies are obviously DESeq2, limma and edgeR, but they are massive packages doing some very complex statistics, and they also have dependency trees that would need to be considered. Depending on your background, you might want to look into the rtracklayer/GenomicRanges ecosystem. While I personally am not a fan, I…

Continue Reading Highly used R packages with no Python equivalent