Extract a fasta sequence multiple times from a list by name

Hi everybody!

I have loaded into R a list of 9K fasta sequences, to which 40K SNPs map, which means some sequences host more than one SNP.

I have an R object (and a vcf as well) with the fasta sequence names and the SNP positions, and I want to use it to get 40K fasta sequences in a separate object: one fasta for each SNP, with duplicated sequences for those hosting more than one SNP.

Later on, I’m going to substitute the SNP position with an N (any base), but so far I’m stuck on the first step: getting a fasta per SNP.

Does anyone have any clue how to do it?

    library(seqinr)  # provides read.fasta()
    library(vcfR)    # provides read.vcfR()
    MyFasta <- read.fasta("Contigs.fa")
    MyVcf <- read.vcfR("Samples.recode.vcf")

    head(MyFasta)
    $NODE_1
    [1] "the whole sequence, nt by nt"
    attr(,"name")
    [1] "NODE_1"
    attr(,"Annot")
    [1] ">NODE_1"
    attr(,"class")
    [1] "SeqFastadna"

    $NODE_2
    [1] "the whole sequence, nt by nt"
    attr(,"name")
    [1] "NODE_2"
    attr(,"Annot")
    [1] ">NODE_2"
    attr(,"class")
    [1] "SeqFastadna"

    $NODE_3
    [1] "the whole sequence, nt by nt"
    attr(,"name")
    [1] "NODE_3"
    attr(,"Annot")
    [1] ">NODE_3"
    attr(,"class")
    [1] "SeqFastadna"

[…]

    head(MyVcf)
    [...]
    [1] "***** Fixed section *****"
          CHROM     POS   ID REF ALT QUAL FILTER
    [1,] "NODE_1" "225" NA "C" "T" NA   "PASS"
    [2,] "NODE_2" "155" NA "T" "G" NA   "PASS"
    [3,] "NODE_2" "194" NA "A" "C" NA   "PASS"

[…]

    [5,] "NODE_3" "285" NA "C" "G" NA   "PASS"
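
For what it's worth, here is a minimal sketch of one way this might be approached, using base R only. It relies on the fact that indexing a named list with a character vector returns one element per name, repeats included. The objects `toy_fasta` and `toy_fix` are made-up stand-ins (assumptions, not the real data) for `MyFasta` and for the fixed section of the vcf, which in a vcfR object should be available as `MyVcf@fix`. The unique names of the form `NODE_POS` and the lowercase `"n"` are also just illustrative choices.

```r
# Toy stand-in for MyFasta (assumption): a named list of character vectors,
# one nucleotide per element, like the output of seqinr::read.fasta().
toy_fasta <- list(
  NODE_1 = c("a", "c", "g", "t", "a", "c"),
  NODE_2 = c("t", "t", "g", "c", "a", "a"),
  NODE_3 = c("g", "g", "c", "a", "t", "t")
)

# Toy stand-in for the fixed section of the vcf (assumption): a character
# matrix with CHROM and POS columns. With the real data: toy_fix <- MyVcf@fix
toy_fix <- cbind(
  CHROM = c("NODE_1", "NODE_2", "NODE_2", "NODE_3"),
  POS   = c("2", "1", "5", "3")
)

chrom <- toy_fix[, "CHROM"]
pos   <- as.integer(toy_fix[, "POS"])  # POS is stored as character

# Step 1: index the list by name. A name that occurs twice in `chrom`
# yields two copies of that sequence, so there is one sequence per SNP.
per_snp <- toy_fasta[chrom]

# Give every copy a unique name (hypothetical scheme: sequence name + position).
names(per_snp) <- paste(chrom, pos, sep = "_")

# Step 2: replace the SNP position of each copy with "n".
masked <- Map(function(s, p) { s[p] <- "n"; s }, per_snp, pos)

length(masked)                         # 4, one entry per SNP
paste(masked$NODE_2_1, collapse = "")  # "ntgcaa"
paste(masked$NODE_2_5, collapse = "")  # "ttgcna"
```

With the real objects, `toy_fasta` and `toy_fix` would be swapped for `MyFasta` and `MyVcf@fix`; the original list is left untouched because R copies on modification. It is probably worth checking first that every name in the CHROM column is present in `names(MyFasta)`, since a missing name would give a `NULL` entry.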
