Subset row-entries according to a list

Hello!

I want to extract a selected set of rows (given as a list of entries) from a large data file. I have a list named “contig.list” that looks like this:

Contig_339241_4
Contig_1004621_3
Contig_1666_1
Contig_836268_32
Contig_1479_10
Contig_640297_1
Contig_365838_1
..

I want to extract the rows for these entries from a large table named “function.tax.ranks” that looks like this:

Contig_339241_4 Taxonomy
Contig_339241_41    Taxonomy
Contig_339241_47    Taxonomy
Contig_1004621_3    Taxonomy
Contig_1004621_30   Taxonomy
Contig_1004621_39   Taxonomy
Contig_1666_1   Taxonomy
Contig_836268_32    Taxonomy
Contig_1479_10  Taxonomy
Contig_1479_100 Taxonomy
Contig_1479_100 Taxonomy
Contig_1479_107 Taxonomy
Contig_640297_1 Taxonomy
Contig_365838_1 Taxonomy
Contig_365838_16    Taxonomy
Contig_365838_17    Taxonomy
..

The resulting output should be:

Contig_339241_4 Taxonomy
Contig_1004621_3    Taxonomy
Contig_1666_1   Taxonomy
Contig_836268_32    Taxonomy
Contig_1479_10  Taxonomy
Contig_640297_1 Taxonomy
Contig_365838_1 Taxonomy

I have tried:

grep -f contig.list function.tax.ranks > contig_taxa.txt

The problem is that the match doesn’t stop at the last digit of the ID: grep treats each list entry as a substring, so every longer ID that starts with it is extracted as well. For example, although my list contains only “Contig_339241_4”, the output also includes “Contig_339241_41” and “Contig_339241_47” (basically all entries matching Contig_339241_4[0-9]). How can I fix this?

Thank you very much in advance!

Regards,
PSP
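
One possible way to get exact matches, sketched here under the assumption that the contig ID is always the first whitespace-separated column of function.tax.ranks and that contig.list holds one ID per line with Unix line endings. Instead of substring matching, awk compares the whole first field against the listed IDs. The sample files and the temporary directory below are made up for illustration; only the file names and the IDs come from the question.

```shell
#!/bin/sh
# Hypothetical sample inputs standing in for the real files.
workdir=$(mktemp -d)
printf '%s\n' Contig_339241_4 Contig_1479_10 > "$workdir/contig.list"
printf '%s\t%s\n' \
  Contig_339241_4 Taxonomy \
  Contig_339241_41 Taxonomy \
  Contig_1479_10 Taxonomy \
  Contig_1479_100 Taxonomy > "$workdir/function.tax.ranks"

# First file (NR==FNR): remember each listed ID as an array key.
# Second file: print only rows whose first column is exactly one of those keys.
awk 'NR==FNR {keep[$1]; next} $1 in keep' \
  "$workdir/contig.list" "$workdir/function.tax.ranks" > "$workdir/contig_taxa.txt"

# Shows the rows for Contig_339241_4 and Contig_1479_10 only.
cat "$workdir/contig_taxa.txt"
```

Rows come out in the order of the table, not of the list. With GNU grep, `grep -w -F -f contig.list function.tax.ranks` may be a shorter alternative: `-F` treats the entries as fixed strings and `-w` requires whole-word matches, so “Contig_339241_4” no longer matches “Contig_339241_41”. That variant would still match an ID appearing as a whole word in another column, which is why the awk version is shown first.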

