Your commands would discard lines containing no | character, and lines where the mouse gene identifier has no version number. I’m not certain this is intended, but it’s a side effect of using sed -n with the p flag on the s command. I’m going to assume that this is unintended.
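To make that side effect concrete, here is a small sketch. The sample lines and file names are made up for illustration, and I'm assuming the commands in question looked roughly like the two sed -n invocations below.

```shell
# Hypothetical sample input; the real file's layout is an assumption.
printf '%s\n' \
    'Gnai3|ENSMUSG00000000001.4' \
    'Cdc45|ENSMUSG00000000028' \
    'no-delimiter-line' >sample.txt

# With -n, a line is only printed when the substitution succeeds (p flag),
# so the line without a | character is silently dropped here.
sed -n 's/.*|//p' sample.txt >after_first.txt

# Same effect again: the ID without a version number contains no dot,
# so the substitution fails and that line is dropped too.
sed -n 's/\..*//p' after_first.txt >after_second.txt

cat after_second.txt
# prints only: ENSMUSG00000000001
```

Of three input lines, only one survives, which is probably not what was wanted.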
Just use two expressions with sed:
sed -e 's/.*|//' -e 's/\..*//' file >newfile
With a grep command that has the non-standard -o option, and assuming that you just want to extract all Ensembl mouse gene stable IDs from the file (and that the file only contains stable IDs that you’d like to extract), you could use
grep -o 'ENSMUSG[[:digit:]]*' file >newfile
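As a rough illustration of how the grep variant behaves, here is a sketch on made-up input (the sample lines and file names are hypothetical). Note that -o is not part of POSIX grep, although GNU and BSD grep both provide it.

```shell
# Hypothetical sample input, including a line with two IDs and a line with none.
printf '%s\n' \
    'Gnai3|ENSMUSG00000000001.4' \
    'Cdc45|ENSMUSG00000000028' \
    'pair|ENSMUSG00000000031.16|ENSMUSG00000000037.17' \
    'no-delimiter-line' >grep_sample.txt

# -o prints each matching part on its own output line; the match stops at
# the first non-digit, so the version suffix is never included.
grep -o 'ENSMUSG[[:digit:]]*' grep_sample.txt >grep_ids.txt

cat grep_ids.txt
# prints:
# ENSMUSG00000000001
# ENSMUSG00000000028
# ENSMUSG00000000031
# ENSMUSG00000000037
```

Unlike the sed command above, this drops lines that contain no ID at all and emits every ID found on a line, which is why it only fits if the file contains nothing but IDs you want.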
You may also use two chained cut commands, each one making the same kind of modification to the data as the corresponding sed substitution earlier in this answer. Because cut splits on a fixed delimiter rather than matching a regular expression, it would probably be quicker, but I doubt you’d see any major speed difference unless your input data is huge.
cut -d '|' -f 2 file | cut -d '.' -f 1 >newfile
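If you want to convince yourself that the cut pipeline and the two-expression sed command agree, a quick check along these lines should do. The sample data and file names are invented, and I'm assuming each line has at most two |-separated fields.

```shell
# Hypothetical sample input with at most two |-separated fields per line.
printf '%s\n' \
    'Gnai3|ENSMUSG00000000001.4' \
    'Cdc45|ENSMUSG00000000028' \
    'no-delimiter-line' >cut_sample.txt

# Two sed expressions: strip up to the last |, then strip from the first dot.
sed -e 's/.*|//' -e 's/\..*//' cut_sample.txt >via_sed.txt

# Two chained cut commands: second |-field, then first .-field.
# Without -s, cut passes lines lacking the delimiter through unchanged.
cut -d '|' -f 2 cut_sample.txt | cut -d '.' -f 1 >via_cut.txt

# cmp is silent and exits 0 when the two files are byte-for-byte identical.
cmp via_sed.txt via_cut.txt && echo 'identical output'
```

The two approaches only diverge when a line has more than two fields: sed keeps whatever follows the last |, while cut -f 2 keeps the second field.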