Tag: dplyr
Bioconductor – TrajectoryGeometry (development version)
DOI: 10.18129/B9.bioc.TrajectoryGeometry This is the development version of TrajectoryGeometry; for the stable release version, see TrajectoryGeometry. This Package Discovers Directionality in Time and Pseudo-times Series of Gene Expression Patterns Bioconductor version: Development (3.16) Given a time series or pseudo-times series of gene expression data, we might wish to…
R programming language tutorials – Technical Ripon
Are you learning the R programming language? Want to learn how to do more tasks with R? Check out our Do More With R video tutorials below — most with accompanying text articles and code, almost all under 10 minutes. In the table below you can easily search all tutorials…
Bioconductor – sSNAPPY (development version)
DOI: 10.18129/B9.bioc.sSNAPPY This is the development version of sSNAPPY; for the stable release version, see sSNAPPY. Single Sample directioNAl Pathway Perturbation analYsis Bioconductor version: Development (3.16) A single-sample pathway perturbation testing method for RNA-seq data. The method propagates changes in gene expression down gene-set topologies to compute…
Recent questions tagged fasta – Q&A
Director of Bioinformatics in Chicago, IL for University of Chicago (UC)
Details Posted: 03-May-22 Location: Chicago, Illinois Type: Full-time Salary: Open Categories: Research – Laboratory/Non-Laboratory Staff/Administrative Location: Hyde Park Campus Job Description: Provides technical expertise in the selection, validation, and implementation of the appropriate internal and external data analytic and bioinformatic solutions needed to analyze specimens, process and integrate data, and…
Error in SummarizedExperiment
I have installed DESeq2 version 1.36.0. samples <- colnames(txi$counts) group <- as.factor(c("control","control","control","control","control","diet","diet","diet","diet","diet", "control","control","control","control","control","diet","diet","diet","diet","diet","diet")) coldata <- data.frame(samples, group, stringsAsFactors = F) coldata <- coldata[,c("samples","group")] coldata$samples <- factor(coldata$samples) coldata$group <- factor(coldata$group) rownames(coldata) <- sub("fb", "", rownames(coldata)) all(rownames(coldata$samples) %in% colnames(txi)) all(rownames(coldata) == colnames(txi)) TRUE library(DESeq2) ddsTxi <- DESeqDataSetFromTximport(txi, colData = coldata, design =…
deseq2 problem
Hi, I am trying to draw a PCA plot with DESeq2, but somehow I cannot use DESeq2 functions. It is really simple code, which I will paste below. > transform <- DESeq2::rlog(eliminated_data, blind = TRUE) Error in (function (classes, fdef, mtable) : unable to find an…
GDCprepare of RNAseq counts produces error
Hello everyone! I have been using the TCGAbiolinks package for the last couple of years to access RNAseq data for the TCGA-LAML project. Just very recently, I noticed that I could no longer use GDCquery to…
Separate exogenous from endogenous transcripts using Salmon RNAseq DTU
Dear friends, We are trying to use Salmon for DTU analysis. We want to separate exogenous from endogenous transcripts by following this post www.biostars.org/p/443701/ and this paper f1000research.com/articles/7-952 We are focusing on a gene called ASCL1 (endo-ASCL1). We transduced cells with lentiviral vector containing ASCL1 ORF only (Lenti-ASCL1). There should…
GDCquery_Maf error
Hi all, I really need some help. I am trying to run GDCquery_Maf, which worked fine until yesterday. Now I get the following error: Error in GDCquery(paste0("TCGA-", tumor), data.category = "Simple Nucleotide Variation", : Please set a valid workflow.type argument…
subpopulations available in MafH5.gnomAD.v3.1.1.GRCh38
Are subpopulation MAFs available for gnomAD v3.1.1 with any package, like they are in MafDb.gnomAD.r2.1.hs37d5? I’m trying to use GenomicScores to obtain all variants in a genomic range with MAF in any subpopulation >= cutoff. I tried…
“Rprofile” to use RStudio on Mac
.Rprofile again: I received a comment on a recent post that taught me how to send commands from the R console to the terminal using the system function, so I used it. As described above, the following is entered in the console in the upper-left corner of RStudio. library(stats) library(tidyverse) library(ggplot2) library(GGally) library(patchwork) library(lubridate) library(dplyr) We named…
Bioconductor installation problems
Dear Friends, I have been having considerable difficulty doing various package-related Bioconductor actions. Here are my system specifications: macOS Monterey Version: 12.3 (21E230) MacBook Pro (Retina, 13-inch, Early 2015) Memory: 16 GB 1867 MHz DDR3 RStudio 2021.09.0 Build 351 R version 4.1.2 (2021-11-01) -- "Bird Hippie" Copyright (C)…
Bioconductor – mirTarRnaSeq
DOI: 10.18129/B9.bioc.mirTarRnaSeq mirTarRnaSeq Bioconductor version: Release (3.14) mirTarRnaSeq R package can be used for interactive mRNA miRNA sequencing statistical analysis. This package utilizes expression or differential expression mRNA and miRNA sequencing results and performs interactive correlation and various GLMs (Regular GLM, Multivariate GLM, and Interaction GLMs ) analysis…
ggplot2 – How To Update R Values
I have the below code which runs a 3 month picture of my metrics. I open the saved code, remove “Nov-21” and add “Feb-22”, then delete the first entry for each metric and add “Feb-22” entry to end of each metric (957L, 1208L, 1054L, 476L). Previously, the 3 month picture…
Bioconductor Package Installation
When I try to install the gtf for hg38 with BiocManager::install("TxDb.Hsapiens.UCSC.hg38.knownGene"), I get the following error: 'getOption("repos")' replaces Bioconductor standard repositories, see '?repositories' for details replacement repositories: CRAN: cran.rstudio.com/ Bioconductor version 3.14 (BiocManager 1.30.16), R 4.1.2 (2021-11-01) Installing package(s) 'TxDb.Hsapiens.UCSC.hg38.knownGene' Error in readRDS(dest) : error reading from connection Per stackoverflow.com/questions/67455984/getoptionrepos-replaces-bioconductor-standard-repositories-see-reposito I…
Bioconductor – cytoKernel
DOI: 10.18129/B9.bioc.cytoKernel Differential expression using kernel-based score test Bioconductor version: Release (3.14) cytoKernel implements a kernel-based score test to identify differentially expressed features in high-dimensional biological experiments. This approach can be applied across many different high-dimensional biological data including gene expression data and dimensionally reduced cytometry-based marker expression…
Correctly place geom_text labels in forest plot with “fill” parameter in ggplot2 – General
Hi There, I want to create annotated publication-ready forest plots comparing different models. How do I ensure that the labels are placed above and tight against the corresponding error bars and do not overlap? Thank you for your help! Here is what I tried: library(tidyverse) #> Warning: package ‘tibble’ was…
ggplot2loon function – RDocumentation
Examples # NOT RUN { if(interactive()) { p <- ggplot(mtcars, aes(wt, mpg)) + geom_point() g <- ggplot2loon(p) p1 <- ggplot(mtcars) + geom_point(aes(x = wt, y = mpg, colour = factor(gear))) + facet_wrap(~am) g1 <- ggplot2loon(p1) # } # NOT RUN { df <- data.frame( x = rnorm(120, c(0, 2, 4)),…
Bioconductor – sccomp (development version)
DOI: 10.18129/B9.bioc.sccomp This is the development version of sccomp; to use it, please install the devel version of Bioconductor. Robust Outlier-aware Estimation of Composition and Heterogeneity for Single-cell Data Bioconductor version: Development (3.15) A robust and outlier-aware method for testing differential tissue composition from single-cell data. This model…
quosure dplyr ggplot2 (1) – Code Examples
How to parametrize function calls in dplyr 0.7? The release of dplyr 0.7 includes a major overhaul of programming with dplyr. I read this document carefully, and I am trying to understand how it will impact my use of dplyr. Here is a common idiom I…
Pathway analysis of RNAseq data using goseq package
Hello, I have finished the RNA seq analysis and I am trying to perform some pathway analysis. I have used the gage package and I was looking online about another package called goseq that takes into account length bias. However, when I run the code I get an error. How…
A Comprehensive Guide on ggplot2 in R
Introduction: Visualization plays an important role in the decision-making process after analyzing relevant data. Graphical representation highlighting the interdependence of key elements affecting performance is important in this process. There are many libraries in Python and R which provide different options showing…
DESeq2 and high prefiltering cutoff
Hi, I am curious about prefiltering with DESeq2. I understand from this site and from reading the DESeq2 vignette that prefiltering is really unnecessary, as DESeq2 applies its own stringent filtering. However, I’m seeing better…
How do I resolve Rd warning “missing file link” when building packages in RStudio?
After building a simple test package to isolate this issue, I receive the following warning when I run Rcmd.exe INSTALL --no-multiarch --with-keep.source simpleTest: * installing to library ‘C:/Users/user/Documents/R-dev’ * installing *source* package ‘simpleTest’ ... ** R ** preparing package for lazy loading ** help *** installing help indices converting help…
Bioconductor – TAPseq
DOI: 10.18129/B9.bioc.TAPseq This package is for version 3.12 of Bioconductor; for the stable, up-to-date release version, see TAPseq. Targeted scRNA-seq primer design for TAP-seq Bioconductor version: 3.12 Design primers for targeted single-cell RNA-seq used by TAP-seq. Create sequence templates for target gene panels and design gene-specific primers using…
r – ggplot2 running for minutes without plotting
I am attempting to plot the below vector, but when I run the function, it just continues to run and does not plot. I have waited 5 minutes before I feel uncomfortable and click stop in the console. Wondering what is going on. Up until this point I have had…
r – trying to make a ggplot with two lines
The trick is to gather the columns you want to map as variables. As I don’t know how you want to plot your graph, that is, what goes on the x-axis and y-axis, I made a mock plot. For the continuous-variable part, you can either convert your values to integer or numeric…
Use the TCGAbiolinks package to download TCGA data
For downloading TCGA data, the RTCGA package is probably the easier option in terms of ease of use, and because it ships data that has already been downloaded, it is relatively stable to use. For the same reason, however, there is no guarantee that the data are up to date. The TCGAbiolinks package is…
r – Is it possible to change a line from a ggplot/geom_line plot depending on what month the datapoint corresponds to?
Yes, it’s possible. The easiest way to do it is by creating a vector of your colors, the same length as the number of rows in your dataframe, and passing it to the col argument in geom_line(). Here is an example: library(dplyr, warn.conflicts = FALSE) library(ggplot2) library(lubridate, warn.conflicts = FALSE) # create some…
R plot color by value in y-axis ggplot
No need for an ifelse statement, just write your condition and then use scale_fill_manual: ggplot(Report, aes(x=Report$Name,y=Report$average_working_hours)) + ggtitle('working hours in July') + ylab(' working hours') + geom_bar(stat = 'identity', aes(fill = Report$average_working_hours > 8)) + theme_gray() + scale_fill_manual(values=c('blue', 'red')) This can be done quite simply using dplyr. First thing I would…
Bioconductor – pRolocGUI
This package is for version 3.4 of Bioconductor; for the stable, up-to-date release version, see pRolocGUI. Interactive visualisation of spatial proteomics data Bioconductor version: 3.4 The package pRolocGUI comprises functions to interactively visualise organelle (spatial) proteomics data on the basis of pRoloc, pRolocdata and shiny. Author: Lisa M…
gene ID RNAseq
Hi friends, how can I get the numeric gene ID and HUGO ID with an R script? What script should I use? I have this, but it does not give the numeric ID and HUGO ID. library(biomaRt) library(dplyr) library(tibble) attributeNames <-c("ensembl_gene_id","external_gene_name","HGNC_ID", "chromosome_name","description") filterValues <- rownames(res) Annotations <- getBM(attributes=attributeNames, filters =…
r – Where does ggplot set the order of the color scheme?
I think this is a reproducible example of what you’re seeing. In the diamonds dataset, the mean price of “Good” diamonds is actually higher than the mean for “Very Good” diamonds. library(dplyr) diamonds %>% group_by(cut) %>% summarize(mean_price = mean(price)) # A tibble: 5 x 2 cut mean_price <ord> <dbl> 1…
Tabix file download error eQTL analysis
I am trying to download a tabix file to perform an analysis on an eQTL dataset; however, I get the following error for each file I try from the eQTL Catalogue: library(ggplot2) library(readr) library(coloc) library(GenomicRanges) library(seqminer) tabix_paths = read.delim("https://raw.githubusercontent.com/eQTL-Catalogue/eQTL-Catalogue-resources/master/tabix/tabix_ftp_paths.tsv", sep = "\t", header = TRUE, stringsAsFactors =…
Writing ggplot custom geometry function
stat_accum <- function(mapping = NULL, data = NULL, geom = "point", position = "stack", ..., show.legend = NA, inherit.aes = TRUE) { layer( data = data, mapping = mapping, stat = StatAccum, geom = geom, position = position, show.legend = show.legend, inherit.aes = inherit.aes, params = list( na.rm = na.rm,…
Error on final object when generating ggplot objects in for loop with dplyr select()
I want to make many plots using multiple pairs of variables in a dataframe, all with the same x. I store the plots in a named list. For simplicity, below is an example with only 1 variable in each plot. Key to this function is a select() call that is…
Using logic in listcols (ggplot2 edition)
I am trying to produce a slightly different result from a purrr::map iteration depending on a condition. Say I have this code producing plots and storing them in a dataframe: library(tidyverse) #> — Attaching packages ————————————— tidyverse 1.3.1 — #> v ggplot2 3.3.5 v purrr 0.3.4 #> v tibble 3.1.6…
R connection to sqlite – Stackify
SQLite is a file-level database, hence referencing it requires a full directory path. Nowhere do you specify the working directory or a full path in the file name. By default, R will use the current working directory returned by getwd(). If database is not contained in this…
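A minimal base R sketch of the path point in the excerpt above. The helper name resolve_db_path and the file name example.sqlite are hypothetical, introduced only for illustration. The connection call is left as a comment because it assumes the DBI and RSQLite packages are installed.

```r
# Hypothetical helper: build the full path to a database file and
# fail early if the file is not where we expect it.
resolve_db_path <- function(db_file, dir = getwd()) {
  path <- file.path(dir, db_file)
  if (!file.exists(path)) {
    stop("Database file not found: ", path)
  }
  normalizePath(path)
}

# Demonstration with a throwaway file in a temporary directory.
demo_dir <- tempdir()
file.create(file.path(demo_dir, "example.sqlite"))
db_path <- resolve_db_path("example.sqlite", dir = demo_dir)

# With DBI and RSQLite installed (an assumption), one could then connect with:
# con <- DBI::dbConnect(RSQLite::SQLite(), db_path)
```

Checking the path before connecting is useful because connecting to a path that does not exist can silently create a new, empty database file instead of raising an error.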
Decimal digits in `Slope Graph` with `ggplot2`
Following a former question I opened a few weeks ago (Slope Chart – ggplot2), I face another issue concerning the numeric values reported in the graph. Even specifying the decimal digits I need (exactly 3) with either of the two commands: y=round(y, digit = 3) at the end of the code or…
Bioconductor – RiboCrypt
DOI: 10.18129/B9.bioc.RiboCrypt Interactive visualization in genomics Bioconductor version: Release (3.14) R Package for interactive visualization and browsing NGS data. It contains a browser for both transcript and genomic coordinate view. In addition a QC and general metaplots are included, among others differential translation plots and gene expression plots….
Ggplot: heatmap based on two vectors ( R, Ggplot2 )
I am trying to plot data as a heat map in ggplot2. I understand that you’d normally have to have x, y, and z coordinates to plot x against y and then color by z. I have found plenty of heat map examples…
summary-statistics – Github Help
Fictional company AutosRUs’ newest prototype, the MechaCar, is suffering from production troubles that are blocking the manufacturing team’s progress. AutosRUs’ senior management enlisted assistance from the data analytics team to review the production data for insights that may help the manufacturing team overcome their production issues. User:…
Bioconductor – TBSignatureProfiler (development version)
DOI: 10.18129/B9.bioc.TBSignatureProfiler This is the development version of TBSignatureProfiler; for the stable release version, see TBSignatureProfiler. Profile RNA-Seq Data Using TB Pathway Signatures Bioconductor version: Development (3.15) Gene signatures of TB progression, TB disease, and other TB disease states have been validated and published previously. This package aggregates…
protti source: R/fetch_alphafold_prediction.R
#' Fetch AlphaFold prediction #' #' Fetches atom level data for AlphaFold predictions either for selected proteins or whole #' organisms. #' #' @param uniprot_ids optional, a character vector of UniProt identifiers for which predictions #' should be fetched. This argument is mutually exclusive to the \code{organism_name} argument. #' @param…
Bottom-Up Proteomics and LiP-MS Quality Control and Data Analysis Tools
Peptides are mapped onto PDB structures or AlphaFold prediction based on their positions. This is accomplished by replacing the B-factor information in the structure file with values that allow highlighting of peptides, protein regions or amino acids when the structure is coloured by B-factor. In addition to simply highlighting peptides,…
Finding the significance of the overlap between 2 or more gene sets using simulation in R.
TLDR: Example R function to calculate significance of overlap of 2 or more gene sets. genes_all is a vector that contains all genes, and gene_sets takes a list of vectors for each gene set. I encourage people to read the full tutorial and attempt to reproduce the code themselves (especially…
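The simulation idea in the excerpt above can be sketched in base R as follows. This is an illustrative sketch, not the tutorial's own function: the name overlap_pvalue, the n_sim parameter and the toy gene names are assumptions, while genes_all and gene_sets follow the argument descriptions in the excerpt.

```r
# Hypothetical sketch: empirical p-value for the overlap of two or more gene sets.
# genes_all: character vector of all genes (the background).
# gene_sets: list of character vectors, one per gene set.
overlap_pvalue <- function(genes_all, gene_sets, n_sim = 1000) {
  # Observed overlap: genes shared by every set.
  observed <- length(Reduce(intersect, gene_sets))
  set_sizes <- lengths(gene_sets)

  # Null distribution: draw random sets of the same sizes from the background.
  simulated <- replicate(n_sim, {
    random_sets <- lapply(set_sizes, function(k) sample(genes_all, k))
    length(Reduce(intersect, random_sets))
  })

  # Add-one correction so the estimate is never exactly zero.
  p_value <- (sum(simulated >= observed) + 1) / (n_sim + 1)
  list(observed = observed, p_value = p_value)
}

# Toy example: two sets of 50 genes that share 25 genes.
set.seed(1)
genes_all <- paste0("gene", 1:1000)
gene_sets <- list(genes_all[1:50], genes_all[26:75])
result <- overlap_pvalue(genes_all, gene_sets, n_sim = 200)
```

The smallest p-value this approach can report is 1 / (n_sim + 1), so n_sim should be chosen with the desired resolution in mind.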
Outliers on DESeq2 Results
I have an RNAseq dataset, where one of the genes I intend to analyze has hundreds of counts ranging from 10 to 12, with a few counts > 9000. I process this data in DESeq2 and find that the gene is differentially expressed across several samples of interest. What can…
Bioconductor – conclus
DOI: 10.18129/B9.bioc.conclus ScRNA-seq Workflow CONCLUS – From CONsensus CLUSters To A Meaningful CONCLUSion Bioconductor version: Release (3.13) CONCLUS is a tool for robust clustering and positive marker features selection of single-cell RNA-seq (sc-RNA-seq) datasets. It takes advantage of a consensus clustering approach that greatly simplifies sc-RNA-seq data analysis…
Need help to remove NA values from data frame
I have this data frame and I want to remove the rows that contain NA values in the log2 fold change column. How can I do this in R? Hi Anas, If your data frame…
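A minimal base R sketch of the task asked about above. The data frame res_df and its values are invented for illustration, and the column name log2FoldChange is an assumption based on DESeq2's usual output naming. This is not necessarily the approach given in the truncated answer.

```r
# Toy results table; values are made up for illustration.
res_df <- data.frame(
  gene = c("geneA", "geneB", "geneC", "geneD"),
  log2FoldChange = c(1.2, NA, -0.7, NA),
  padj = c(0.01, 0.20, NA, 0.03),
  stringsAsFactors = FALSE
)

# Keep only rows where log2FoldChange is not NA.
# Other columns (here padj) may still contain NA.
res_clean <- res_df[!is.na(res_df$log2FoldChange), ]
```

Filtering on a single column in this way differs from na.omit(res_df), which would drop every row that has an NA in any column.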
Bioconductor – traviz (development version)
DOI: 10.18129/B9.bioc.traviz This is the development version of traviz; to use it, please install the devel version of Bioconductor. Trajectory functions for visualization and interpretation. Bioconductor version: Development (3.14) traviz provides a suite of functions to plot trajectory related objects from Bioconductor packages. It allows plotting trajectories in…
Bioconductor – marr
DOI: 10.18129/B9.bioc.marr Maximum rank reproducibility Bioconductor version: Release (3.13) marr (Maximum Rank Reproducibility) is a nonparametric approach that detects reproducible signals using a maximal rank statistic for high-dimensional biological data. In this R package, we implement functions that measure the reproducibility of features per sample pair and sample…
Can cnetplot plot points change shape in clusterprofiler?
I have a cnetplot and I am wondering if it is possible for me to do further categorising of the plot by point shape? I have a cnetplot of genes and their interacting pathways, but the genes also have a few…
Easy Way To Get 3′ Utr Lengths Of A List Of Genes
Hi, as the title says, I’m wondering if there is any tool available that would allow me to drop in a list of, say, Entrez gene IDs and get their corresponding 3′ UTR lengths? Thanks for…
Bioconductor – DuoClustering2018 (development version)
DOI: 10.18129/B9.bioc.DuoClustering2018 This is the development version of DuoClustering2018; for the stable release version, see DuoClustering2018. Data, Clustering Results and Visualization Functions From Duò et al (2018) Bioconductor version: Development (3.14) Preprocessed experimental and simulated scRNA-seq data sets used for evaluation of clustering methods for scRNA-seq data in…
How to combine a data frame with another data frame containing comma-separated values?
For dataframe manipulation, in general, you should look into the dplyr and tidyr packages, they offer endless possibilities if you learn to manipulate them (lots of practice will help). A good and concise cheatsheet is available here. Regarding this problem in particular, something like this should work: library(dplyr) library(tidyr) dfA…
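As a dependency-free illustration of the same idea (split the comma-separated values into one row each, then join), here is a base R sketch. It is not the dplyr/tidyr code from the truncated answer above. The data frames dfA and dfB, their columns and their values are all hypothetical, since the original data are not shown.

```r
# Hypothetical inputs: dfA has one id per row, dfB lists ids as comma-separated strings.
dfA <- data.frame(id = c("a", "b", "c"), value = c(10, 20, 30),
                  stringsAsFactors = FALSE)
dfB <- data.frame(group = c("g1", "g2"), ids = c("a,b", "c"),
                  stringsAsFactors = FALSE)

# Split each comma-separated string into its individual ids.
id_list <- strsplit(dfB$ids, ",", fixed = TRUE)

# One row per (group, id) pair: repeat each group once per id it contains.
dfB_long <- data.frame(
  group = rep(dfB$group, lengths(id_list)),
  id = trimws(unlist(id_list)),
  stringsAsFactors = FALSE
)

# Join the two tables on the shared id column.
combined <- merge(dfA, dfB_long, by = "id")
```

With tidyr and dplyr installed, the splitting step corresponds to tidyr::separate_rows() and the join to dplyr::left_join() or dplyr::inner_join().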
Start your Kaggle journey today!
Hello folks. How’re you doing? Hope everything is fine with you. And today, I want to publicly commit myself to the 2 Articles 1-week challenge. Every week I will write two articles on Data Science from my past experiences or my learnings of the week. Ok then, keep all those…
How to transform the deg gene list from seurat to a gene list input to clusterProfiler compareCluster ?
Sorry for lateness, I wanted to do something similar. This is what I did for reference: Using a Seurat generated gene list for input into ClusterProfiler to see the GO or KEGG terms per cluster. I’ll keep the meat and potatoes of the Seurat vignette in this tutorial: library(dplyr) library(Seurat)…
hjust in geom_cladelab not working properly
I have an issue with the hjust of clade labels when using geom_cladelab. The justification does not change when specifying the labels to align with the outer circle and having angle = “auto”. I have added a reprex of the issue. The problem…
r is running out of memory
When I try to load gene expression counts downloaded from TCGA (I have 859 samples with raw counts in 60000 rows), I get this error and R stops for no apparent reason. I checked the memory and found that my memory is nearly…
spots not filling the whole tissue image
In Seurat, SpatialPlot generates a plot with an enlarged/expanded image of the tissue section as compared to the original spot image. This seems to happen on relatively small images with around 500 spots. I ‘d…
lfcShrink problem in many 0 count genes RNA-seq data
Hi, Dr. Love. I posted a question about a weird MA plot or volcano plot of a DESeq2 diff result here and also on Biostars. ATpoint gave a useful answer about too many 0-count genes and prefiltering. It seems that too many 0-count genes cause a problem for lfcShrink. And I…
weird MAplot or volcano plot of DESeq2 diff result
Hi, everyone. I found a weird MA plot or volcano plot of a DESeq2 result. I am wondering whether you can give me some advice. This diff result is from bulk RNA-seq of two cell types. I use two specific markers to isolate these two cell types using a flow cytometer. I already…
Answer: PopGenome – VCF, fasta, GTF and codons still missing
Dear Maciek Hopefully you were able to solve these problems already. I cannot comment on the main set of issues you reported. However, I also encountered the error: `Error in START[!REV, 3] : incorrect number of dimensions` following certain instances of `set.synnonsyn` which I also noticed occurred for genes which…
How to colour points in cnetplot of clusterProfiler?
I have a cnetplot from running enrichment with kegg using clusterprofiler. I have scores input as the fold change but for each gene in the plot they are not varying in colour to show their difference in the fold change score. My dataset is genes of entrez IDs and then…
%>% error in RStudio
dc.markers %>% group_by(cluster) %>% top_n(2, wt = avg_logFC) The above code gives an error even after loading the dplyr and matrix libraries during Seurat analysis in RStudio. Error: Problem with filter() input ..1. i Input ..1 is top_n_rank(2, avg_logFC). x object ‘avg_logFC’ not found…
Highly used R packages with no Python equivalent
The biggies are obviously DESeq2, limma and edgeR, but they are massive packages doing some very complex statistics, and also have dependency trees that would need to be considered. Depending on your background, you might want to look into the rtracklayer/GenomicRanges eco-system. While I personally am not a fan, I…