Data Manipulation
Hello everyone,
I have an Excel file with three columns. The first column contains drug names and the second contains the names of the cell lines on which those drugs were tested. Both drug names and cell line names are repeated across multiple rows. The third column contains the AUC value. Each drug has been tested on a different number of cell lines. I want to reshape this file into a final matrix in which:
Drug names are in the first column (one row per drug), cell line names are in the first row, and the cells are filled with AUC values. Where a drug hasn't been tested on a specific cell line, I want "NA" values.
Thank you in advance!
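# Example data in long format: one row per drug/cell line pair
df <- data.frame(
  drug = rep(paste0('drug', seq(5)), 5),
  cell = rep(paste0('cell', seq(5)), each = 5)
)
# Simulated AUC values (random, so your numbers will differ)
df$auc <- rbinom(25, 100, 0.5)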
> head(df)
   drug  cell auc
1 drug1 cell1  50
2 drug2 cell1  43
3 drug3 cell1  53
4 drug4 cell1  50
5 drug5 cell1  45
6 drug1 cell2  48
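library(tidyr)
# Reshape from long to wide: one row per drug, one column per cell line.
# Drug/cell pairs that are absent from df are filled with NA by default.
output <- df %>% spread(cell, auc)
output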
   drug cell1 cell2 cell3 cell4 cell5
1 drug1    50    48    53    51    47
2 drug2    43    42    55    53    52
3 drug3    53    42    46    46    50
4 drug4    50    54    51    54    53
5 drug5    45    41    46    47    49
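The example above has every drug tested on every cell line, so no NA appears in the output. Here is a minimal sketch of the incomplete case from the question, using pivot_wider(), which the tidyr documentation recommends over the superseded spread(). The data frame `long` and its values are made up for illustration, and the column names drug, cell and auc are assumed to match your file.

```r
library(tidyr)

# Hypothetical long-format data: drug2 was never tested on cell3,
# and drug3 was tested only on cell1.
long <- data.frame(
  drug = c("drug1", "drug1", "drug1", "drug2", "drug2", "drug3"),
  cell = c("cell1", "cell2", "cell3", "cell1", "cell2", "cell1"),
  auc  = c(0.50, 0.48, 0.53, 0.43, 0.42, 0.53)
)

# One row per drug, one column per cell line.
# Drug/cell pairs with no row in `long` are filled with NA.
wide <- pivot_wider(long, names_from = cell, values_from = auc)

# Convert the tibble back to a plain data frame for printing or export.
wide <- as.data.frame(wide)
print(wide)
```

If the same drug/cell pair occurs in more than one row, pivot_wider() warns and returns list-columns. In that case you would need to decide how to combine the duplicates, for example by passing values_fn = mean.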