Tag: dplyr

r – Where does ggplot set the order of the color scheme?

I think this is a reproducible example of what you’re seeing. In the diamonds dataset, the mean price of “Good” diamonds is actually higher than the mean for “Very Good” diamonds. library(dplyr) diamonds %>% group_by(cut) %>% summarize(mean_price = mean(price)) # A tibble: 5 x 2 cut mean_price <ord> <dbl> 1…

Tabix file download error eQTL analysis

I am trying to download a tabix file to analyze an eQTL dataset, but I get the following error for each file I try from the eQTL Catalogue: library(ggplot2) library(readr) library(coloc) library(GenomicRanges) library(seqminer) tabix_paths = read.delim("https://raw.githubusercontent.com/eQTL-Catalogue/eQTL-Catalogue-resources/master/tabix/tabix_ftp_paths.tsv", sep = "\t", header = TRUE, stringsAsFactors =…

Writing ggplot custom geometry function

stat_accum <- function(mapping = NULL, data = NULL, geom = "point", position = "stack", ..., show.legend = NA, inherit.aes = TRUE) { layer( data = data, mapping = mapping, stat = StatAccum, geom = geom, position = position, show.legend = show.legend, inherit.aes = inherit.aes, params = list( na.rm = na.rm,…

Error on final object when generating ggplot objects in for loop with dplyr select()

I want to make many plots using multiple pairs of variables in a dataframe, all with the same x. I store the plots in a named list. For simplicity, below is an example with only 1 variable in each plot. Key to this function is a select() call that is…

Using logic in listcols (ggplot2 edition)

I am trying to produce a slightly different result from a purrr::map iteration depending on a condition. Say I have this code producing plots and storing them in a dataframe: library(tidyverse) #> — Attaching packages ————————————— tidyverse 1.3.1 — #> v ggplot2 3.3.5 v purrr 0.3.4 #> v tibble 3.1.6…

R connection to sqlite – Stackify

SQLite is a file-level database, so referencing it requires a full directory path. Nowhere do you specify the working directory or a full path in the file name. By default, R will use the current working directory returned by getwd(). If the database is not contained in this…
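
A minimal sketch of the point above, assuming the DBI and RSQLite packages are installed. The directory and the file name `mydata.sqlite` are hypothetical placeholders, not taken from the original question; the idea is only to build the path explicitly instead of relying on getwd().

```r
# Hypothetical location and file name; substitute your own.
db_dir  <- tempdir()
db_path <- file.path(db_dir, "mydata.sqlite")

# A bare file name would be resolved against getwd(), so show both for comparison.
message("Working directory: ", getwd())
message("Database path:     ", db_path)

if (requireNamespace("DBI", quietly = TRUE) &&
    requireNamespace("RSQLite", quietly = TRUE)) {
  # Connecting to a path that does not exist yet creates an empty database file,
  # which is why a wrong path often shows up as "no such table" rather than
  # as a connection error.
  con <- DBI::dbConnect(RSQLite::SQLite(), dbname = db_path)
  DBI::dbWriteTable(con, "demo", data.frame(id = 1:3), overwrite = TRUE)
  tables <- DBI::dbListTables(con)
  DBI::dbDisconnect(con)
}
```

Checking `file.exists(db_path)` before connecting is a cheap way to catch a wrong path early.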

Decimal digits in `Slope Graph` with `ggplot2`

Following a question I opened a few weeks ago (Slope Chart – ggplot2), I face another issue concerning the numeric values reported in the graph. Even when specifying the decimal digits I need (exactly 3) with either of the two commands: y=round(y, digit = 3) at the end of the code or…
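
One likely cause, offered as an assumption since the excerpt is truncated: round() returns a number, and numbers drop trailing zeros when printed as labels. A small base R sketch with made-up values shows the difference between rounding and formatting.

```r
# Made-up values for illustration
y <- c(1.5, 2.25, 3.14159)

# round() returns numbers, so trailing zeros are dropped when they are printed
rounded <- round(y, digits = 3)
print(rounded)

# sprintf() returns character strings with exactly three decimals
labels <- sprintf("%.3f", y)
print(labels)
```

In a plot, the formatted strings would go into the label aesthetic, for example `geom_text(aes(label = sprintf("%.3f", y)))`, rather than rounding the underlying data.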

Bioconductor – RiboCrypt

DOI: 10.18129/B9.bioc.RiboCrypt     Interactive visualization in genomics Bioconductor version: Release (3.14) R Package for interactive visualization and browsing NGS data. It contains a browser for both transcript and genomic coordinate view. In addition a QC and general metaplots are included, among others differential translation plots and gene expression plots….

Ggplot: heatmap based on two vectors ( R, Ggplot2 )

Problem: I am trying to plot data as a heat map in ggplot2. I understand that you'd normally have to have x, y, and z coordinates to plot x against y and then color by z. I have found plenty of heat map examples…

summary-statistics – Github Help

Fictional company AutosRUs' newest prototype, the MechaCar, is suffering from production troubles that are blocking the manufacturing team's progress. AutosRUs' senior management enlisted assistance from the data analytics team to review the production data for insights that may help the manufacturing team overcome their production issues…

Bioconductor – TBSignatureProfiler (development version)

DOI: 10.18129/B9.bioc.TBSignatureProfiler     This is the development version of TBSignatureProfiler; for the stable release version, see TBSignatureProfiler. Profile RNA-Seq Data Using TB Pathway Signatures Bioconductor version: Development (3.15) Gene signatures of TB progression, TB disease, and other TB disease states have been validated and published previously. This package aggregates…

protti source: R/fetch_alphafold_prediction.R

#' Fetch AlphaFold prediction #' #' Fetches atom level data for AlphaFold predictions either for selected proteins or whole #' organisms. #' #' @param uniprot_ids optional, a character vector of UniProt identifiers for which predictions #' should be fetched. This argument is mutually exclusive to the \code{organism_name} argument. #' @param…

Bottom-Up Proteomics and LiP-MS Quality Control and Data Analysis Tools

Peptides are mapped onto PDB structures or AlphaFold prediction based on their positions. This is accomplished by replacing the B-factor information in the structure file with values that allow highlighting of peptides, protein regions or amino acids when the structure is coloured by B-factor. In addition to simply highlighting peptides,…

Finding the significance of the overlap between 2 or more gene sets using simulation in R.

TLDR: Example R function to calculate significance of overlap of 2 or more gene sets. genes_all is a vector that contains all genes, and gene_sets takes a list of vectors for each gene set. I encourage people to read the full tutorial and attempt to reproduce the code themselves (especially…
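
A rough base R sketch of the simulation idea, not the tutorial's own function. The name `simulate_overlap_p`, the `n_sim` argument, and the toy gene names are hypothetical; it assumes each gene set holds unique genes drawn from `genes_all`.

```r
# Hypothetical helper, not the tutorial's own function.
simulate_overlap_p <- function(genes_all, gene_sets, n_sim = 1000) {
  # Observed overlap: genes present in every set
  observed <- length(Reduce(intersect, gene_sets))
  set_sizes <- lengths(gene_sets)

  # Null distribution: draw random sets of the same sizes from the background
  simulated <- replicate(n_sim, {
    random_sets <- lapply(set_sizes, function(n) sample(genes_all, n))
    length(Reduce(intersect, random_sets))
  })

  # Empirical p-value with a +1 correction so it is never exactly zero
  p_value <- (sum(simulated >= observed) + 1) / (n_sim + 1)
  list(observed = observed, p_value = p_value)
}

# Toy data: two sets of 100 genes sharing 50, from a background of 1000
set.seed(1)
genes_all <- paste0("gene", 1:1000)
gene_sets <- list(
  a = genes_all[1:100],
  b = genes_all[51:150]
)
result <- simulate_overlap_p(genes_all, gene_sets, n_sim = 500)
print(result)
```

With only two sets, a hypergeometric test gives an exact answer; simulation is mainly useful when there are three or more sets.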

Outliers on DESeq2 Results

I have an RNA-seq dataset where one of the genes I intend to analyze has hundreds of counts ranging from 10 to 12, with a few counts > 9000. I process this data in DESeq2 and get that the gene is differentially expressed across several samples of interest. What can…

Bioconductor – conclus

DOI: 10.18129/B9.bioc.conclus ScRNA-seq Workflow CONCLUS – From CONsensus CLUSters To A Meaningful CONCLUSion Bioconductor version: Release (3.13) CONCLUS is a tool for robust clustering and positive marker feature selection of single-cell RNA-seq (sc-RNA-seq) datasets. It takes advantage of a consensus clustering approach that greatly simplifies sc-RNA-seq data analysis…

Need help to remove NA values from data frame

I have this data frame and I want to remove the rows that contain NA values in the log2 fold change column. How can I do this in R? Hi Anas, If your data frame…
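
A minimal base R sketch of one way to do this. The toy data frame is invented, and the column name `log2FoldChange` is assumed to match what DESeq2's results() produces; the truncated answer above may take a different route.

```r
# Toy results table; column names are assumed to follow DESeq2's results() output
res_df <- data.frame(
  gene = c("g1", "g2", "g3", "g4"),
  log2FoldChange = c(1.2, NA, -0.8, NA),
  padj = c(0.01, 0.20, NA, 0.03)
)

# Keep only rows where log2FoldChange is not NA
res_clean <- res_df[!is.na(res_df$log2FoldChange), ]
print(res_clean)
```

Unlike `na.omit(res_df)`, this keeps rows that have NA in other columns, such as g3 with its missing padj.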

Bioconductor – traviz (development version)

DOI: 10.18129/B9.bioc.traviz     This is the development version of traviz; to use it, please install the devel version of Bioconductor. Trajectory functions for visualization and interpretation. Bioconductor version: Development (3.14) traviz provides a suite of functions to plot trajectory related objects from Bioconductor packages. It allows plotting trajectories in…

Bioconductor – marr

DOI: 10.18129/B9.bioc.marr Maximum rank reproducibility Bioconductor version: Release (3.13) marr (Maximum Rank Reproducibility) is a nonparametric approach that detects reproducible signals using a maximal rank statistic for high-dimensional biological data. In this R package, we implement functions that measure the reproducibility of features per sample pair and sample…

Can cnetplot plot points change shape in clusterProfiler?

I have a cnetplot and I am wondering whether it is possible to further categorise the plot by point shape. I have a cnetplot of genes and their interacting pathways, but the genes also have a few…

Easy Way To Get 3′ UTR Lengths Of A List Of Genes

Hi, as the title says, I'm wondering if there is any tool that would let me drop in a list of, say, Entrez gene IDs and get their corresponding 3′ UTR lengths? Thanks for…

Bioconductor – DuoClustering2018 (development version)

DOI: 10.18129/B9.bioc.DuoClustering2018     This is the development version of DuoClustering2018; for the stable release version, see DuoClustering2018. Data, Clustering Results and Visualization Functions From Duò et al (2018) Bioconductor version: Development (3.14) Preprocessed experimental and simulated scRNA-seq data sets used for evaluation of clustering methods for scRNA-seq data in…

How to combine a data frame with another data frame containing comma-separated values?

For dataframe manipulation, in general, you should look into the dplyr and tidyr packages, they offer endless possibilities if you learn to manipulate them (lots of practice will help). A good and concise cheatsheet is available here. Regarding this problem in particular, something like this should work: library(dplyr) library(tidyr) dfA…
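
A hedged sketch of the general approach with dplyr and tidyr. The contents of `dfA`, and the second data frame `dfB` with its `ids` column, are invented for illustration and are not the data from the original answer.

```r
library(dplyr)
library(tidyr)

# Hypothetical inputs: dfA has one key per row, dfB packs several keys into one cell
dfA <- data.frame(id = c("a", "b", "c"), value = c(1, 2, 3))
dfB <- data.frame(group = c("g1", "g2"), ids = c("a,b", "c"))

combined <- dfB %>%
  separate_rows(ids, sep = ",") %>%      # one row per comma-separated id
  left_join(dfA, by = c("ids" = "id"))   # attach the matching values from dfA

print(combined)
```

If the separators are followed by spaces, as in "a, b", a pattern such as `sep = ",\\s*"` avoids keys with leading whitespace that would fail to match in the join.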

Start your Kaggle journey today!

Hello folks. How’re you doing? Hope everything is fine with you. And today, I want to publicly commit myself to the 2 Articles 1-week challenge. Every week I will write two articles on Data Science from my past experiences or my learnings of the week. Ok then, keep all those…

How to transform the deg gene list from seurat to a gene list input to clusterProfiler compareCluster ?

Sorry for lateness, I wanted to do something similar. This is what I did for reference: Using a Seurat generated gene list for input into ClusterProfiler to see the GO or KEGG terms per cluster. I’ll keep the meat and potatoes of the Seurat vignette in this tutorial: library(dplyr) library(Seurat)…

hjust in geom_cladelab not working properly

I have an issue with the hjust of clade labels when using geom_cladelab. The justification does not change when specifying the labels to align with the outer circle and having angle = "auto". I have added a reprex of the issue. The problem…

R is running out of memory

When I try to load gene expression counts downloaded from TCGA (859 samples with raw counts, 60000 rows), I get an error and R stops for no apparent reason. I checked the memory and found that my memory is nearly…

spots not filling the whole tissue image

In Seurat, SpatialPlot generates a plot with an enlarged/expanded image of the tissue section compared to the original spot image. This seems to happen on relatively small images with around 500 spots. I'd…

lfcShrink problem in RNA-seq data with many 0 count genes

Hi, Dr. Love. I posted a question about a weird MA plot or volcano plot of a DESeq2 differential expression result, also on Biostars. ATpoint gave a useful answer about too many 0 count genes and prefiltering. It seems that too many 0 count genes cause a problem for lfcShrink. And I…

weird MAplot or volcano plot of DESeq2 diff result

Hi, everyone. I found a weird MA plot or volcano plot in a DESeq2 result, and I am wondering whether you can give me some advice. This differential expression result is from bulk RNA-seq of two cell types. I used two specific markers to isolate these two cell types with a flow cytometer. I already…

Answer: PopGenome – VCF, fasta, GTF and codons still missing

Dear Maciek Hopefully you were able to solve these problems already. I cannot comment on the main set of issues you reported. However, I also encountered the error: `Error in START[!REV, 3] : incorrect number of dimensions` following certain instances of `set.synnonsyn` which I also noticed occurred for genes which…

How to colour points in cnetplot of clusterProfiler?

I have a cnetplot from running KEGG enrichment with clusterProfiler. I supplied the fold changes as scores, but the genes in the plot do not vary in colour to show the differences in their fold change scores. My dataset is genes with Entrez IDs and then…

%>% error in RStudio

dc.markers %>% group_by(cluster) %>% top_n(2, wt = avg_logFC) The above code gives an error in a Seurat analysis in RStudio, even after loading the dplyr and Matrix libraries. Error: Problem with filter() input ..1. i Input ..1 is top_n_rank(2, avg_logFC). x object 'avg_logFC' not found…
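
A possible explanation, stated as an assumption because the excerpt is truncated: from Seurat v4 onward the marker table's fold-change column is named avg_log2FC rather than avg_logFC, so the old name is no longer found. The sketch below uses an invented stand-in for `dc.markers` and picks whichever column is present; it uses dplyr's slice_max(), which supersedes top_n().

```r
library(dplyr)

# Toy stand-in for a marker table; real column names depend on the Seurat version
dc.markers <- data.frame(
  cluster = c(0, 0, 0, 1, 1, 1),
  gene = c("A", "B", "C", "D", "E", "F"),
  avg_log2FC = c(2.1, 0.5, 1.7, 0.9, 3.0, 1.1)
)

# Check which fold-change column is actually present
fc_col <- intersect(c("avg_log2FC", "avg_logFC"), names(dc.markers))[1]

# Top two markers per cluster by fold change
top_markers <- dc.markers %>%
  group_by(cluster) %>%
  slice_max(order_by = .data[[fc_col]], n = 2) %>%
  ungroup()

print(top_markers)
```

Running `names(dc.markers)` on the real object is the quickest way to confirm which column name applies.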
