Why do Illumina 850k/EPIC arrays ignore CpGs which are “GC” in the forward strand?


CpGs are symmetrical, in that a CG sequence on the forward strand is hybridized to a GC on the reverse strand, and the dinucleotides on both strands are CpGs that can be methylated. Conversely, a CpG can be GC on the forward strand but CG on the reverse strand.

FORWARD -> 5'--CG--3'  [OR]  5'--GC--3' <- FORWARD
REVERSE -> 3'--GC--5'        3'--CG--5' <- REVERSE

The assignment of “forward” and “reverse” strandedness is more or less arbitrary.
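
One way I tried to sanity-check the premise above is to reverse-complement each dinucleotide, so that both strands are read in the same 5'→3' direction. This is only a minimal base-R sketch; the helper name `reverse_complement` is my own and not part of any package.

```r
# Hypothetical helper: reverse-complement a DNA string using base R only.
reverse_complement <- function(seq) {
  # Swap each base for its complement (A<->T, C<->G)
  complemented <- chartr("ACGT", "TGCA", seq)
  # Reverse the order so the result reads 5'->3' on the opposite strand
  paste(rev(strsplit(complemented, "")[[1]]), collapse = "")
}

reverse_complement("CG")  # "CG"
reverse_complement("GC")  # "GC"
```

If I am reading this correctly, a forward-strand GC is still GC on the reverse strand once both are written 5'→3', which may be where my premise breaks down.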

Given the above, why does it seem like the Illumina 850k (aka EPIC) array only profiles methylation from CpGs which are CG in the forward strand, while ignoring CpGs which are GC in the forward strand? I would also love to hear if my premises are wrong.

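# Load the EPIC annotation and the tidyverse without startup messages
suppressPackageStartupMessages({
  library(IlluminaHumanMethylationEPICanno.ilm10b4.hg19)
  library(tidyverse)})
data(IlluminaHumanMethylationEPICanno.ilm10b4.hg19)

# Count probes by the bracketed dinucleotide in Forward_Sequence
# (getAnnotation() comes from minfi)
IlluminaHumanMethylationEPICanno.ilm10b4.hg19 %>%
  getAnnotation() %>%
  as_tibble() %>%
  count(forward_seq=str_extract(Forward_Sequence, "\\[[ATCG]{2}\\]"))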

# Results:
# # A tibble: 3 × 2
#   forward_seq      n
#   <chr>        <int>
# 1 [CA]          2922
# 2 [CG]        862927
# 3 [CT]            10


