Hi, I just wanted to introduce one of my packages, `fuc` ([GitHub](https://github.com/sbslee/fuc)). It has a submodule called `pyvcf` ([API](https://sbslee-fuc.readthedocs.io/en/latest/api.html#module-fuc.api.pyvcf)), which is designed for working with VCF files. It implements `pyvcf.VcfFrame`, which stores VCF data as a `pandas.DataFrame` to allow fast computation and easy manipulation. One of my main goals in writing `pyvcf` was to create a tool that makes VCF data friendlier to ML analyses and data mining. So please check it out if you're interested!
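For anyone wondering what "VCF data as a `pandas.DataFrame`" looks like in principle, here is a minimal pandas-only sketch. It is not how `pyvcf.VcfFrame` is implemented and does not use `fuc` at all; the VCF text, the sample names, and the `read_vcf_text` helper are all made up for illustration.

```python
import io

import pandas as pd

# Made-up VCF content for illustration only (two samples, three records).
VCF_TEXT = """##fileformat=VCFv4.2
##source=example
#CHROM\tPOS\tID\tREF\tALT\tQUAL\tFILTER\tINFO\tFORMAT\tSampleA\tSampleB
chr1\t100\t.\tG\tA\t.\t.\t.\tGT\t0/1\t0/0
chr1\t200\t.\tT\tC\t.\t.\t.\tGT\t1/1\t0/1
chr2\t150\t.\tA\tG\t.\t.\t.\tGT\t0/1\t0/1
"""


def read_vcf_text(text):
    """Hypothetical helper: split VCF text into meta lines and a DataFrame."""
    lines = text.splitlines()
    # Lines starting with '##' are meta information, not tabular data.
    meta = [line for line in lines if line.startswith("##")]
    # The '#CHROM' header line and the records form a tab-separated table.
    body = [line for line in lines if line and not line.startswith("##")]
    df = pd.read_csv(
        io.StringIO("\n".join(body)),
        sep="\t",
        dtype=str,              # keep genotypes such as '0/1' as strings
        keep_default_na=False,  # do not turn placeholder values into NaN
    )
    df = df.rename(columns={"#CHROM": "CHROM"})
    df["POS"] = df["POS"].astype(int)
    return meta, df


meta, df = read_vcf_text(VCF_TEXT)

# Sample columns follow the nine fixed VCF columns (CHROM through FORMAT).
samples = df.columns[9:]

# Count non-reference genotypes per sample with ordinary pandas operations.
non_ref = df[samples].apply(lambda col: col.isin(["0/1", "1/1"]).sum())
print(non_ref)
```

Once the records sit in a DataFrame, filtering, grouping, and counting are plain pandas calls, which is the kind of convenience the DataFrame-backed design is aiming for.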
Below is an example of using `pyvcf` to detect and visualize structural variation from a VCF file.

    from fuc import pyvcf, common
    import matplotlib.pyplot as plt
    import seaborn as sns

    # Start and end coordinates of nine regions each in CYP2D6 and CYP2D7
    cyp2d6_starts = [42522500,42522852,42523448,42523843,42524175,42524785,42525034,42525739,42526613]
    cyp2d6_ends = [42522754,42522994,42523636,42523985,42524352,42524946,42525187,42525911,42526883]
    cyp2d7_starts = [42536213,42536565,42537161,42537543,42537877,42538479,42538728,42539410,42540284]
    cyp2d7_ends = [42536467,42536707,42537349,42537685,42538054,42538640,42538881,42539582,42540576]

    # vcf_file (path to the input VCF) is not defined in this snippet
    vf = pyvcf.VcfFrame.from_file(vcf_file)

    # 2 x 3 grid of panels, unpacked into individual axes
    fig, axes = plt.subplots(2, 3, figsize=(18, 12))
    [[ax1, ax2, ax3], [ax4, ax5, ax6]] = axes

    vf.plot_region('NA18973', ax=ax1, color="tab:green")
    vf.plot_region('HG00276', ax=ax2, color='t …