I’m trying to annotate some data from a bisulphite experiment, for which I have a GRanges object without any annotation:
GRanges object with 872900 ranges and 0 metadata columns:
seqnames ranges strand
<Rle> <IRanges> <Rle>
[1] chr10 48196 *
[2] chr10 48486 *
[3] chr10 49247 *
[4] chr10 49258 *
[5] chr10 49595 *
... ... ... ...
[872896] chrY 26439351 *
[872897] chrY 56866730 *
[872898] chrY 56871726 *
[872899] chrY 56879863 *
[872900] chrY 56885734 *
-------
seqinfo: 24 sequences from an unspecified genome; no seqlengths
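The printout above is a set of single-base, unstranded positions. A small object with the same shape can be handy for testing annotation code without loading all 872,900 rows. The sketch below reuses the first five positions from the printout. Declaring the genome as "hg19" is an assumption (it matches the TxDb used below). With it set, GenomicRanges can refuse to compare ranges from mismatched assemblies.

```r
suppressPackageStartupMessages(library("GenomicRanges"))

# Single-base positions copied from the printout above (unstranded "*")
cpg <- GRanges(
  seqnames = "chr10",
  ranges = IRanges(start = c(48196, 48486, 49247, 49258, 49595), width = 1),
  strand = "*"
)

# Assumption: the data were aligned to hg19, like TxDb.Hsapiens.UCSC.hg19.knownGene
genome(cpg) <- "hg19"

cpg
```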
I’m pulling annotation data from TxDb.Hsapiens.UCSC.hg19.knownGene to annotate each position with promoters and genes.
Before using nearest(), findOverlaps(), …, I wanted to collect the gene and promoter information in a single object, with sensible names (gene symbols) for the other researchers.
However, when I combine the two objects, an error prevents me from checking the output:
suppressPackageStartupMessages(library("TxDb.Hsapiens.UCSC.hg19.knownGene"))
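# With single.strand.genes.only = FALSE, genes() returns a GRangesList (one element per gene)
genes <- genes(TxDb.Hsapiens.UCSC.hg19.knownGene, columns = "gene_id",
               single.strand.genes.only = FALSE)
genes <- sort(genes)
# promoters() returns a GRanges (one range per transcript); its gene_id column is a CharacterList
promoters <- promoters(TxDb.Hsapiens.UCSC.hg19.knownGene, columns = "gene_id")
promoters <- sort(promoters)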
library("org.Hs.eg.db")
#>
library("GenomicRanges")
# Named character vector: Entrez gene ID (the TxDb gene keys) -> gene symbol
s <- mapIds(org.Hs.eg.db, keys = keys(TxDb.Hsapiens.UCSC.hg19.knownGene),
            keytype = "ENTREZID", column = "SYMBOL")
#> 'select()' returned 1:1 mapping between keys and columns
# Look up the symbol of each gene ID (the names of the GRangesList)
s2 <- sapply(names(genes), function(x, maping) {
  if (length(x) == 0) {
    NA
  } else {
    unique(maping[x])
  }
}, maping = s)
mcols(genes)$symbols <- s2
# gene_id is a CharacterList, so each x is a (possibly empty) character vector of gene IDs
s2 <- sapply(promoters$gene_id, function(x, maping) {
  if (length(x) == 0) {
    NA
  } else {
    unique(maping[x])
  }
}, maping = s)
promoters$symbols <- s2
# Combine the genes GRangesList with the promoters GRanges
gp <- c(genes, promoters)
gp
#> GRangesList object of length 106419:
#> Error in h(simpleError(msg, call)): error in evaluating the argument 'object' in selecting a method for function 'show': error in evaluating the argument 'x' in selecting a method for function 'as.list': subscript is a NSBS object that is incompatible with the current
#> subsetting operation
sessioninfo::session_info()
#> ─ Session info ───────────────────────────────────────────────────────────────
#> setting value
#> version R version 4.2.2 (2022-10-31 ucrt)
#> os Windows 10 x64 (build 19045)
#> system x86_64, mingw32
#> ui RTerm
#> language (EN)
#> collate Spanish_Spain.utf8
#> ctype Spanish_Spain.utf8
#> tz Europe/Paris
#> date 2023-01-03
#> pandoc 2.19.2 @ C:/Program Files/RStudio/resources/app/bin/quarto/bin/tools/ (via rmarkdown)
#>
#> ─ Packages ───────────────────────────────────────────────────────────────────
#> package * version date (UTC) lib source
#> AnnotationDbi * 1.60.0 2022-11-01 [1] Bioconductor
#> assertthat 0.2.1 2019-03-21 [1] CRAN (R 4.2.2)
#> Biobase * 2.58.0 2022-11-01 [1] Bioconductor
#> BiocFileCache 2.6.0 2022-11-01 [1] Bioconductor
#> BiocGenerics * 0.44.0 2022-11-01 [1] Bioconductor
#> BiocIO 1.8.0 2022-11-01 [1] Bioconductor
#> BiocParallel 1.32.4 2022-12-01 [1] Bioconductor
#> biomaRt 2.54.0 2022-11-01 [1] Bioconductor
#> Biostrings 2.66.0 2022-11-01 [1] Bioconductor
#> bit 4.0.5 2022-11-15 [1] CRAN (R 4.2.2)
#> bit64 4.0.5 2020-08-30 [1] CRAN (R 4.2.2)
#> bitops 1.0-7 2021-04-24 [1] CRAN (R 4.2.0)
#> blob 1.2.3 2022-04-10 [1] CRAN (R 4.2.2)
#> cachem 1.0.6 2021-08-19 [1] CRAN (R 4.2.2)
#> cli 3.4.1 2022-09-23 [1] CRAN (R 4.2.2)
#> codetools 0.2-18 2020-11-04 [1] CRAN (R 4.2.2)
#> crayon 1.5.2 2022-09-29 [1] CRAN (R 4.2.2)
#> curl 4.3.3 2022-10-06 [1] CRAN (R 4.2.2)
#> DBI 1.1.3 2022-06-18 [1] CRAN (R 4.2.2)
#> dbplyr 2.2.1 2022-06-27 [1] CRAN (R 4.2.2)
#> DelayedArray 0.24.0 2022-11-01 [1] Bioconductor
#> digest 0.6.31 2022-12-11 [1] CRAN (R 4.2.2)
#> dplyr 1.0.10 2022-09-01 [1] CRAN (R 4.2.2)
#> ellipsis 0.3.2 2021-04-29 [1] CRAN (R 4.2.2)
#> evaluate 0.19 2022-12-13 [1] CRAN (R 4.2.2)
#> fansi 1.0.3 2022-03-24 [1] CRAN (R 4.2.2)
#> fastmap 1.1.0 2021-01-25 [1] CRAN (R 4.2.2)
#> filelock 1.0.2 2018-10-05 [1] CRAN (R 4.2.2)
#> fs 1.5.2 2021-12-08 [1] CRAN (R 4.2.2)
#> generics 0.1.3 2022-07-05 [1] CRAN (R 4.2.2)
#> GenomeInfoDb * 1.34.4 2022-12-01 [1] Bioconductor
#> GenomeInfoDbData 1.2.9 2022-12-20 [1] Bioconductor
#> GenomicAlignments 1.34.0 2022-11-01 [1] Bioconductor
#> GenomicFeatures * 1.50.3 2022-12-12 [1] Bioconductor
#> GenomicRanges * 1.50.2 2022-12-16 [1] Bioconductor
#> glue 1.6.2 2022-02-24 [1] CRAN (R 4.2.2)
#> highr 0.10 2022-12-22 [1] CRAN (R 4.2.2)
#> hms 1.1.2 2022-08-19 [1] CRAN (R 4.2.2)
#> htmltools 0.5.4 2022-12-07 [1] CRAN (R 4.2.2)
#> httr 1.4.4 2022-08-17 [1] CRAN (R 4.2.2)
#> IRanges * 2.32.0 2022-11-01 [1] Bioconductor
#> KEGGREST 1.38.0 2022-11-01 [1] Bioconductor
#> knitr 1.41 2022-11-18 [1] CRAN (R 4.2.2)
#> lattice 0.20-45 2021-09-22 [1] CRAN (R 4.2.2)
#> lifecycle 1.0.3 2022-10-07 [1] CRAN (R 4.2.2)
#> magrittr 2.0.3 2022-03-30 [1] CRAN (R 4.2.2)
#> Matrix 1.5-3 2022-11-11 [1] CRAN (R 4.2.2)
#> MatrixGenerics 1.10.0 2022-11-01 [1] Bioconductor
#> matrixStats 0.63.0 2022-11-18 [1] CRAN (R 4.2.2)
#> memoise 2.0.1 2021-11-26 [1] CRAN (R 4.2.2)
#> org.Hs.eg.db * 3.16.0 2022-12-20 [1] Bioconductor
#> pillar 1.8.1 2022-08-19 [1] CRAN (R 4.2.2)
#> pkgconfig 2.0.3 2019-09-22 [1] CRAN (R 4.2.2)
#> png 0.1-8 2022-11-29 [1] CRAN (R 4.2.2)
#> prettyunits 1.1.1 2020-01-24 [1] CRAN (R 4.2.2)
#> progress 1.2.2 2019-05-16 [1] CRAN (R 4.2.2)
#> purrr 0.3.5 2022-10-06 [1] CRAN (R 4.2.2)
#> R.cache 0.16.0 2022-07-21 [1] CRAN (R 4.2.2)
#> R.methodsS3 1.8.2 2022-06-13 [1] CRAN (R 4.2.0)
#> R.oo 1.25.0 2022-06-12 [1] CRAN (R 4.2.0)
#> R.utils 2.12.2 2022-11-11 [1] CRAN (R 4.2.2)
#> R6 2.5.1 2021-08-19 [1] CRAN (R 4.2.2)
#> rappdirs 0.3.3 2021-01-31 [1] CRAN (R 4.2.2)
#> Rcpp 1.0.9 2022-07-08 [1] CRAN (R 4.2.2)
#> RCurl 1.98-1.9 2022-10-03 [1] CRAN (R 4.2.2)
#> reprex 2.0.2 2022-08-17 [1] CRAN (R 4.2.2)
#> restfulr 0.0.15 2022-06-16 [1] CRAN (R 4.2.2)
#> rjson 0.2.21 2022-01-09 [1] CRAN (R 4.2.0)
#> rlang 1.0.6 2022-09-24 [1] CRAN (R 4.2.2)
#> rmarkdown 2.19 2022-12-15 [1] CRAN (R 4.2.2)
#> Rsamtools 2.14.0 2022-11-01 [1] Bioconductor
#> RSQLite 2.2.19 2022-11-24 [1] CRAN (R 4.2.2)
#> rstudioapi 0.14 2022-08-22 [1] CRAN (R 4.2.2)
#> rtracklayer 1.58.0 2022-11-01 [1] Bioconductor
#> S4Vectors * 0.36.1 2022-12-05 [1] Bioconductor
#> sessioninfo 1.2.2 2021-12-06 [1] CRAN (R 4.2.2)
#> stringi 1.7.8 2022-07-11 [1] CRAN (R 4.2.1)
#> stringr 1.5.0 2022-12-02 [1] CRAN (R 4.2.2)
#> styler 1.8.1 2022-11-07 [1] CRAN (R 4.2.2)
#> SummarizedExperiment 1.28.0 2022-11-01 [1] Bioconductor
#> tibble 3.1.8 2022-07-22 [1] CRAN (R 4.2.2)
#> tidyselect 1.2.0 2022-10-10 [1] CRAN (R 4.2.2)
#> TxDb.Hsapiens.UCSC.hg19.knownGene * 3.2.2 2022-12-20 [1] Bioconductor
#> utf8 1.2.2 2021-07-24 [1] CRAN (R 4.2.2)
#> vctrs 0.5.1 2022-11-16 [1] CRAN (R 4.2.2)
#> withr 2.5.0 2022-03-03 [1] CRAN (R 4.2.2)
#> xfun 0.35 2022-11-16 [1] CRAN (R 4.2.2)
#> XML 3.99-0.13 2022-12-04 [1] CRAN (R 4.2.2)
#> xml2 1.3.3 2021-11-30 [1] CRAN (R 4.2.2)
#> XVector 0.38.0 2022-11-01 [1] Bioconductor
#> yaml 2.3.6 2022-10-18 [1] CRAN (R 4.2.2)
#> zlibbioc 1.44.0 2022-11-01 [1] Bioconductor
#>
#> [1] C:/Users/lrevilla/AppData/Local/Programs/R/R-4.2.2/library
#>
#> ──────────────────────────────────────────────────────────────────────────────
Created on 2023-01-03 with reprex v2.0.2
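In the reprex, genes(..., single.strand.genes.only = FALSE) returns a GRangesList, while promoters() returns a plain GRanges. So c() mixes two different classes, and the printed header shows the result was treated as a GRangesList. That mix is a likely source of the show() error. One alternative, sketched below on made-up toy ranges standing in for the TxDb output, is to combine two GRanges instead. Both get the same metadata columns plus a hypothetical type column that records what each range is. For the real data, the default single.strand.genes.only = TRUE makes genes() return a GRanges, at the cost of dropping genes with exons on both strands or on different chromosomes. The metadata columns would also need matching names, and ideally matching types. For example, promoters$gene_id is a CharacterList.

```r
suppressPackageStartupMessages(library("GenomicRanges"))

# Hypothetical toy features standing in for genes and promoters;
# coordinates, IDs and symbols are made up for illustration only.
toy_genes <- GRanges(
  seqnames = "chr10",
  ranges = IRanges(start = c(45000, 52000), end = c(50000, 60000)),
  strand = c("+", "-"),
  gene_id = c("1001", "1002"),
  symbol = c("GENE_A", "GENE_B")
)
# promoters() on a GRanges keeps the metadata columns (gene_id, symbol)
toy_promoters <- promoters(toy_genes, upstream = 2000, downstream = 200)

# Tag each set so the combined object still records what each range is
toy_genes$type <- "gene"
toy_promoters$type <- "promoter"

# Both are GRanges with identical metadata columns, so c() can combine them
features <- sort(c(toy_genes, toy_promoters))
features
```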
PS: I would appreciate any suggestion that would help me annotate an existing GRanges with annotation data.
I am probably missing something, as I haven’t found any HOWTOs or vignettes documenting how to do this.
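Here is a minimal sketch of annotating the positions directly, on toy data. It runs findOverlaps() against promoters and collapses the hits into one set of symbols per position. It then runs distanceToNearest() against genes. The gene coordinates and symbols are hypothetical placeholders, and so are the column names promoter_symbols, nearest_gene and distance. With the real objects, toy_promoters and toy_genes would be replaced by the promoters and (single-strand) genes built from the TxDb. The lapply() step is the easiest to read, but it may be slow on all 872,900 positions.

```r
suppressPackageStartupMessages(library("GenomicRanges"))

# Toy CpG positions (the first five from the printout above)
cpg <- GRanges("chr10", IRanges(c(48196, 48486, 49247, 49258, 49595), width = 1))

# Hypothetical genes; coordinates and symbols are made up for illustration
toy_genes <- GRanges(
  "chr10",
  IRanges(start = c(49000, 55000), end = c(52000, 60000)),
  strand = c("+", "-"),
  symbol = c("GENE_A", "GENE_B")
)
# Window of 2000 bp upstream and 200 bp downstream of each TSS
toy_promoters <- promoters(toy_genes, upstream = 2000, downstream = 200)

# 1) Promoter overlaps: a position may hit zero, one or several promoters
hits <- findOverlaps(cpg, toy_promoters, ignore.strand = TRUE)
prom_sym <- splitAsList(toy_promoters$symbol[subjectHits(hits)],
                        factor(queryHits(hits), levels = seq_along(cpg)))
# One (possibly empty) set of unique symbols per position
cpg$promoter_symbols <- CharacterList(lapply(prom_sym, unique))

# 2) Nearest gene and its distance (0 when the position lies inside the gene)
d <- distanceToNearest(cpg, toy_genes, ignore.strand = TRUE)
cpg$nearest_gene <- rep(NA_character_, length(cpg))
cpg$nearest_gene[queryHits(d)] <- toy_genes$symbol[subjectHits(d)]
cpg$distance <- rep(NA_integer_, length(cpg))
cpg$distance[queryHits(d)] <- mcols(d)$distance

cpg
```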