Why Python code specifically? Unless you want to practice your programming skills, there are good tools out there for this. Also, do you want to remove all gaps (un-align the sequences), or only some of them (e.g. columns with > 50% gaps), or uninformative columns? Still, it is nice to have all the options.
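If you do want to do it in Python, here is a minimal sketch of the partial option: dropping columns by their gap fraction. It assumes '-' is the only gap character and that all sequences have the same aligned length. `read_fasta` and `filter_gappy_columns` are illustrative names, not functions from any library.

```python
"""Sketch: drop alignment columns whose gap fraction exceeds a threshold.

Assumptions: '-' is the only gap character and every sequence has the
same aligned length. Function names are illustrative only.
"""


def read_fasta(text):
    """Parse FASTA text into {header: sequence}, joining wrapped sequence lines."""
    records = {}
    header = None
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            header = line[1:].strip()
            records[header] = []
        elif header is not None:
            records[header].append(line)
    return {h: "".join(parts) for h, parts in records.items()}


def filter_gappy_columns(alignment, max_gap_fraction=0.5, gap="-"):
    """Keep only columns whose fraction of gap characters is <= max_gap_fraction."""
    if not alignment:
        return {}
    seqs = list(alignment.values())
    length = len(seqs[0])
    if any(len(s) != length for s in seqs):
        raise ValueError("sequences have unequal lengths; input is not aligned")
    n = len(seqs)
    keep = [
        i for i in range(length)
        if sum(s[i] == gap for s in seqs) / n <= max_gap_fraction
    ]
    # Rebuild every sequence from the retained column indices, in order.
    return {h: "".join(s[i] for i in keep) for h, s in alignment.items()}


if __name__ == "__main__":
    aln = read_fasta(">s1\nAC-GT-\n>s2\nA--GTT\n>s3\nAC-G-T\n")
    print(filter_gappy_columns(aln, 0.5))  # {'s1': 'ACGT-', 's2': 'A-GTT', 's3': 'ACG-T'}
    print(filter_gappy_columns(aln, 0.0))  # {'s1': 'AG', 's2': 'AG', 's3': 'AG'}
```

With `max_gap_fraction=0.5` it drops columns with more than 50% gaps; `0.0` drops every column that contains a gap, and any threshold below 1.0 also drops columns made only of gaps.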
- Jalview (graphical interface, Edit -> Remove All Gaps)
- trimAl
trimal -nogaps
or trimal -noallgaps
should work either way, depending on what you want (it can be installed via conda): -nogaps removes every column that contains a gap, while -noallgaps removes only the columns made up entirely of gaps. trimAl can also clip your sequence identifiers into a shorter, compatible format. Some older phylogenetic software is darn picky about these (PHYLIP, and thereby ProtTest3, allows at most 10 characters per sequence ID; MrBayes has no length restriction, but the first 15 characters must uniquely identify each sequence), and it looks like you might run into problems with your identifiers. I also have a Perl script that attempts to keep the identifiers unique and readable; let me know if you need that too.

sed '/^[^>]/s/-//g' input_file
should also work as a quick command-line hack without any installation: it deletes every '-' on lines that do not start with '>'. This leaves the sequence lines of the FASTA file with unequal lengths, which most tools are completely fine with; otherwise, pipe the output through EMBOSS seqret to rewrap the sequences.
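For the full un-align case, the sed + seqret pipeline can also be approximated in a few lines of Python. This sketch strips gap characters from sequence lines (like the sed command) and rewraps each record at a fixed width, roughly what piping through seqret achieves. The 60-character default width, the function names and the script name in the usage comment are assumptions, not seqret's exact behaviour.

```python
import sys


def wrap_sequence(seq, width):
    """Split a sequence into consecutive chunks of at most `width` characters."""
    return [seq[i:i + width] for i in range(0, len(seq), width)]


def degap_fasta(lines, gap_chars="-", width=60):
    """Yield FASTA lines with gaps removed and each record rewrapped to `width`.

    Header lines (starting with '>') are passed through unchanged; the sequence
    lines of each record are joined, stripped of `gap_chars`, then rewrapped.
    """
    delete_gaps = str.maketrans("", "", gap_chars)
    chunks = []
    for line in lines:
        line = line.strip()
        if line.startswith(">"):
            # Emit the previous record's sequence before starting a new record.
            yield from wrap_sequence("".join(chunks), width)
            chunks = []
            yield line
        elif line:
            chunks.append(line.translate(delete_gaps))
    yield from wrap_sequence("".join(chunks), width)


if __name__ == "__main__" and len(sys.argv) > 1:
    # Usage (hypothetical script name): python degap.py aligned.fasta > unaligned.fasta
    with open(sys.argv[1]) as handle:
        for out_line in degap_fasta(handle):
            print(out_line)
```

Pass `gap_chars="-."` if your alignment also uses dots for gaps.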
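On the identifier problem, the Perl script mentioned above is not shown here; as a hypothetical Python stand-in (not that script), one could sanitize the names, cut them to PHYLIP's 10 characters and append a counter on collisions, keeping a mapping to restore the original names in the tree afterwards. The sanitizing rule (anything other than letters, digits and '_' becomes '_') is an assumption; adjust it to what your downstream tool accepts.

```python
import re


def shorten_ids(names, max_len=10):
    """Map each identifier to a unique sanitized ID of at most `max_len` characters.

    Assumes the original identifiers are themselves unique. Characters other than
    letters, digits and '_' become '_'; on a collision, the prefix is shortened
    and a numeric suffix appended until the ID is unused.
    """
    mapping = {}
    used = set()
    for name in names:
        base = re.sub(r"[^A-Za-z0-9_]", "_", name)[:max_len] or "seq"
        candidate, counter = base, 1
        while candidate in used:
            suffix = str(counter)
            candidate = base[:max_len - len(suffix)] + suffix
            counter += 1
        used.add(candidate)
        mapping[name] = candidate
    return mapping


if __name__ == "__main__":
    ids = ["sample_ABCDEFG_1", "sample_ABCDEFG_2", "seq|A"]
    # Print a tab-separated short-to-original table for renaming the tree later.
    for original, short in shorten_ids(ids).items():
        print(f"{short}\t{original}")
```

For the MrBayes rule quoted above, `shorten_ids(names, max_len=15)` yields IDs whose first 15 characters are unique by construction.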