Download all cases with TCGAbiolinks

Hi all, I would like to download the bulk RNA-seq data for all patients in the TCGA-LUAD cohort using TCGAbiolinks. Does this exist as a single matrix?

I have read the package vignette and can download individual cases. However, does TCGAbiolinks support downloading a single matrix covering all the patients?

I ask because if you download similar data from the Xena browser, you get a single 585-column matrix.

I tried this with TCGAbiolinks:

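library(TCGAbiolinks)
library(dplyr)  # provides %>%, arrange(), filter() and select() used further down
test <- GDCquery(project = "TCGA-LUAD", data.category = "Gene expression", data.type = "Gene expression quantification", platform = "Illumina HiSeq", file.type = "results", legacy = TRUE)
dim(getResults(test))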

This results in 600 files.

I tried the code below to see whether one file was much bigger than the others, but that does not appear to be the case, so each of the 600 files seems to hold a single sample rather than a combined matrix:

getResults(test) %>% arrange(desc(file_size)) %>% head(10)

Finally, I looked into the duplicated cases. Some cases have one file for cancer tissue and one for normal tissue (this is OK), but other patients have 2 or 3 files that are all for cancer tissue. Which file should I choose?

# Fetch the results table once instead of on every iteration
res <- getResults(test)
# Patient IDs that occur more than once (unique() avoids repeats for patients with 3+ files)
dups <- unique(res$cases.submitter_id[duplicated(res$cases.submitter_id)])

for (id in dups) {
    print(id)
    print(res %>% filter(cases.submitter_id == id) %>% select(sample_type))
}
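To make the duplicate question more concrete, here is a rough sketch of how the files could be told apart by their TCGA barcodes. A full aliquot barcode has the layout project-TSS-participant-sample+vial-portion+analyte-plate-center, so two tumour files from the same patient normally differ in vial, portion, analyte or plate. The data frame below is a made-up stand-in for getResults(test), the barcodes are invented, and I am assuming the full barcode sits in a column called cases, which may not hold for every version of the package.

```r
# Toy stand-in for getResults(test); these barcodes are invented for illustration.
res_toy <- data.frame(
  cases = c("TCGA-AA-0001-01A-11R-0001-07",
            "TCGA-AA-0001-01A-21R-0002-07",
            "TCGA-AA-0001-11A-01R-0001-07",
            "TCGA-AA-0002-01A-11R-0001-07"),
  stringsAsFactors = FALSE
)

# Fixed-width fields of a TCGA aliquot barcode
res_toy$patient     <- substr(res_toy$cases, 1, 12)   # project-TSS-participant
res_toy$sample_code <- substr(res_toy$cases, 14, 15)  # 01 = primary tumour, 11 = solid tissue normal
res_toy$vial        <- substr(res_toy$cases, 16, 16)
res_toy$portion     <- substr(res_toy$cases, 18, 19)
res_toy$analyte     <- substr(res_toy$cases, 20, 20)  # R = RNA
res_toy$plate       <- substr(res_toy$cases, 22, 25)

# Files per patient and sample type; any count above 1 is an ambiguous case
counts <- table(res_toy$patient, res_toy$sample_code)
print(counts)

# Show the fields that distinguish the files of patients with repeated tumour samples
ambiguous <- rownames(counts)[counts[, "01"] > 1]
print(res_toy[res_toy$patient %in% ambiguous & res_toy$sample_code == "01",
              c("cases", "vial", "portion", "analyte", "plate")])
```

This only shows how the repeated files differ. It does not tell me which one is the right one to keep, which is still my question.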

Any help appreciated, thanks in advance
