Data Manipulation

Hello everyone,

I have an Excel file with three columns. The first column contains drug names, the second contains the names of the cell lines on which the drugs were tested, and the third contains the AUC value. Both drug names and cell line names repeat across rows, and each drug has been tested on a different number of cell lines. I want to reshape this file into a final matrix in which:

Drug names are in the first column (one row per drug), cell line names are in the first row, and the cells are filled with AUC values. Where a drug hasn't been tested on a given cell line, I want "NA".

Thank you in advance!


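# Example data in long format: one row per drug/cell line pair
df <- data.frame(
  drug = rep(paste0('drug', seq(5)), 5),
  cell = rep(paste0('cell', seq(5)), each = 5)
)
# Simulated AUC values (random, so your numbers will differ)
df$auc <- rbinom(25, 100, 0.5)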

> head(df)
   drug  cell auc
1 drug1 cell1  50
2 drug2 cell1  43
3 drug3 cell1  53
4 drug4 cell1  50
5 drug5 cell1  45
6 drug1 cell2  48

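library(tidyr)
# Spread the cell column into one column per cell line, filled with auc.
# Drug/cell pairs missing from df become NA. In tidyr >= 1.0.0 spread() is
# superseded by pivot_wider(names_from = cell, values_from = auc).
output <- df %>% spread(cell, auc)
output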

   drug cell1 cell2 cell3 cell4 cell5
1 drug1    50    48    53    51    47
2 drug2    43    42    55    53    52
3 drug3    53    42    46    46    50
4 drug4    50    54    51    54    53
5 drug5    45    41    46    47    49
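
In the example above every drug is tested on every cell line, so no NA shows up. As a rough base R sketch of the unbalanced case from the question, the toy data below (drug names, cell names and AUC values are all made up for illustration) leaves some drug/cell pairs out. Using mean() to combine repeated pairs is an assumption; pick whatever summary suits your data.

```r
# Hypothetical long-format data: drugB was never tested on cell2,
# and drugC was never tested on cell1.
long <- data.frame(
  drug = c("drugA", "drugA", "drugB", "drugC"),
  cell = c("cell1", "cell2", "cell1", "cell2"),
  auc  = c(0.81, 0.64, 0.92, 0.55),
  stringsAsFactors = FALSE
)

# tapply() builds a drug x cell matrix; combinations that never occur
# are filled with NA. mean() is an assumption for handling repeated
# drug/cell pairs; with one row per pair it returns the value unchanged.
auc_matrix <- tapply(long$auc, list(long$drug, long$cell), mean)
auc_matrix
```

The result is a plain numeric matrix with drugs as row names and cell lines as column names. If the data starts out in Excel, it would first need to be read into a long data frame like the one above (for example with readxl::read_excel()); that step is not shown here.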

