How can I extract the expression matrix from a GEO supplementary .txt file?
I want to analyze this dataset (GSE60361) with the Seurat package and also extract the gene expression matrix (with cells in rows and genes in columns). Is that possible? What other tools can be used to obtain the expression matrix for datasets like these (GSE60361, GSE75688)?
There are files in the supplement you can use:
library(data.table)
counts <- data.table::fread("https://ftp.ncbi.nlm.nih.gov/geo/series/GSE60nnn/GSE60361/suppl/GSE60361_C1-3005-Expression.txt.gz")
The gene names are in the first column, but when you try to set them as rownames, R will complain about non-unique values such as “Mar-1”, which means that the authors used Excel (/facepalm) to manipulate or create this file. So you first have to fix the corrupted gene names (see e.g. “These gene symbol are from where? What source?”) and then move the first column to rownames.
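A minimal sketch of that last step, on a toy table rather than the real download. It assumes the first column holds the gene names and uses `make.unique()` only to deduplicate them so they can serve as rownames; this does not recover the original symbols that Excel mangled, which still has to be done separately. The names `gene`, `cell1`, `cell2` and `expr` are made up for illustration.

```r
library(data.table)

# Toy stand-in for the downloaded table (hypothetical values):
# first column = gene names, remaining columns = cells.
# "1-Mar" appears twice, the way Excel-mangled symbols can collide.
counts <- data.table(
  gene  = c("Tspan12", "1-Mar", "1-Mar", "Gad1"),
  cell1 = c(0L, 3L, 1L, 7L),
  cell2 = c(2L, 0L, 4L, 0L)
)

# Pull out the gene names and make them unique;
# make.unique() appends .1, .2, ... to repeated values.
gene_names <- make.unique(counts[[1]])

# Drop the name column, convert to a matrix, and attach the rownames.
expr <- as.matrix(counts[, -1, with = FALSE])
rownames(expr) <- gene_names

print(expr)
```

The result is a genes-by-cells matrix; `t(expr)` would give cells in rows and genes in columns.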
There is another file, this time in Excel format itself (/facepalm2). I have no clue what it is, but it seems to be some kind of per-cell annotation; maybe you can find out:
library(readxl)
download.file("https://ftp.ncbi.nlm.nih.gov/geo/series/GSE60nnn/GSE60361/suppl/GSE60361_spikes_annotation_and_abundance.xlsx", "GSE60361_spikes_annotation_and_abundance.xlsx")
other <- readxl::read_xlsx("GSE60361_spikes_annotation_and_abundance.xlsx")
The Seurat documentation explains how to read a matrix into a Seurat object.
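As a rough sketch of what that looks like, assuming a genes-by-cells count matrix with unique rownames is already in hand. The toy matrix, the gene names `Gene2`/`Gene3`, the cell names, and the project label below are placeholders, not values from the GEO files.

```r
library(Seurat)

# Toy genes-by-cells count matrix standing in for the cleaned GEO table
# (hypothetical values; matrix() fills column by column, one column per cell).
expr <- matrix(
  c(0, 3, 1, 7,
    2, 0, 4, 0,
    5, 1, 0, 2),
  nrow = 4,
  dimnames = list(
    c("Tspan12", "Gene2", "Gene3", "Gad1"),
    c("cell1", "cell2", "cell3")
  )
)

# Seurat expects genes in rows and cells in columns.
obj <- CreateSeuratObject(counts = expr, project = "GSE60361")

# For the layout asked about in the question (cells in rows,
# genes in columns), transpose the matrix.
cells_by_genes <- t(expr)
```

For the real data, `min.cells` and `min.features` in `CreateSeuratObject()` can be used to filter out sparse genes and cells at load time; sensible thresholds depend on the dataset.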