So I’m trying to do GO term enrichment analysis on some custom annotations using the topGO package in R.
I’m following section 4.3 of the user guide, found here.
The data needs to be in the following format (note: the file should have two tab-delimited columns, the second of which lists the corresponding GO terms separated by commas):
068724	GO:0005488, GO:0003774, GO:0001539, GO:0006935, GO:0009288
119608	GO:0005634, GO:0030528, GO:0006355, GO:0045449, GO:0003677, GO:0007275
049239	GO:0016787, GO:0017057, GO:0005975, GO:0005783, GO:0005792, GO:0004345, GO:0005788, GO:0047936, GO:0006098, GO:0005488, GO:0006006, GO:0055114, GO:0016491
067829	GO:0045926, GO:0016616, GO:0000287, GO:0030145, GO:0005739, GO:0000166, GO:0005575, GO:0006099, GO:0005524, GO:0008152, GO:0006102, GO:0005759, GO:0005975, GO:0004449, GO:0055114, GO:0016491
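To make the target format concrete, here is a small Python sketch that reads it back: one gene ID, a tab, then a comma-separated list of GO IDs. The function name parse_gene2go is hypothetical, and treating blank lines as skippable is an assumption. It can double as a quick sanity check on a converted file before loading it in R.

```python
def parse_gene2go(lines):
    """Parse 'gene_id<TAB>GO:1, GO:2, ...' lines into {gene_id: [GO IDs]}.

    Blank lines are skipped; a repeated gene ID overwrites the earlier line.
    """
    mapping = {}
    for line in lines:
        line = line.rstrip("\n")
        if not line.strip():
            continue  # ignore empty lines
        # split only on the first tab: column 1 = gene ID, column 2 = GO list
        gene_id, _, go_field = line.partition("\t")
        if not go_field:
            raise ValueError(f"expected a tab after the gene ID: {line!r}")
        # GO IDs are comma-separated, possibly followed by a space
        mapping[gene_id.strip()] = [t.strip() for t in go_field.split(",") if t.strip()]
    return mapping
```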
However, my data currently looks like this:
QBM89824.1	GO:0072659
QBM86167.1	GO:0070072
QBM87744.1	GO:0031307
QBM87744.1	GO:0045040
QBM87744.1	GO:0070096
QBM87389.1	GO:0000500
QBM87389.1	GO:0042790
QBM85935.1	GO:0035859
QBM85935.1	GO:0050790
QBM85935.1	GO:0005096
QBM85935.1	GO:0042819
QBM85935.1	GO:0032007
I’m having trouble transforming my data into the required format. There are currently over 11k rows, so sorting it out manually isn’t an option. Does anyone know of a method for doing this? I’m comfortable using Python, but not so much R.
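One way to do this in Python, sketched under a few assumptions: each input line holds exactly one protein ID and one GO ID separated by whitespace (tab or space), and there is no header line (a header would be treated as just another ID). The script groups GO terms by ID, drops duplicate terms for the same ID, keeps the order in which IDs first appear, and writes one "ID<TAB>GO:..., GO:..." line per protein. The function names and the script name collapse_go.py are hypothetical.

```python
import sys


def collapse_go_annotations(lines):
    """Group 'protein_id GO_id' lines into {protein_id: [GO IDs]}.

    Fields may be separated by tabs or spaces. Blank or short lines are
    skipped; IDs and GO terms keep their order of first appearance, and
    duplicate GO terms for the same ID are dropped.
    """
    mapping = {}
    for line in lines:
        fields = line.split()
        if len(fields) < 2:
            continue  # skip blank or malformed lines
        protein_id, go_id = fields[0], fields[1]
        # a dict used as an ordered set: keeps insertion order, ignores repeats
        mapping.setdefault(protein_id, {})[go_id] = None
    return {pid: list(terms) for pid, terms in mapping.items()}


def format_mapping(mapping):
    """Render each entry as 'protein_id<TAB>GO:1, GO:2, ...'."""
    return [f"{pid}\t{', '.join(terms)}" for pid, terms in mapping.items()]


def convert_file(in_path, out_path):
    """Read the long two-column file and write the collapsed mapping file."""
    with open(in_path) as fin:
        mapping = collapse_go_annotations(fin)
    with open(out_path, "w") as fout:
        for out_line in format_mapping(mapping):
            fout.write(out_line + "\n")


if __name__ == "__main__" and len(sys.argv) == 3:
    # usage: python collapse_go.py annotations_long.txt gene2go.map
    convert_file(sys.argv[1], sys.argv[2])
```

Using a dict per protein instead of a list keeps the lookup for duplicates constant-time, which matters little at 11k rows but costs nothing. The resulting file should then load in R with topGO's readMappings(), which as far as I know splits on a tab by default and then on commas.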
Thanks in advance!