How to determine the exact version of hg38 if I have only the FASTA file
I have a FASTA file that contains the hg38 assembly. It includes the primary contigs, alt contigs, decoys, HLA sequences, and the mitochondrial genome.
How do I determine the exact version of hg38 based on the FASTA?
Here are some of the headers:
>chr1 AC:CM000663.2 gi:568336023 LN:248956422 rl:Chromosome M5:6aef897c3d6ff0c78aff06ac189178dd AS:GRCh38
>chr2 AC:CM000664.2 gi:568336022 LN:242193529 rl:Chromosome M5:f98db672eb0993dcfdabafe2a882905c AS:GRCh38
>chr3 AC:CM000665.2 gi:568336021 LN:198295559 rl:Chromosome M5:76635a41ea913a405ded820447d067b0 AS:GRCh38
>chr4 AC:CM000666.2 gi:568336020 LN:190214555 rl:Chromosome M5:3210fecf1eb92d5489da4346b3fddc6e AS:GRCh38
>chr5 AC:CM000667.2 gi:568336019 LN:181538259 rl:Chromosome M5:a811b3dc9fe66af729dc0dddf7fa4f13 AS:GRCh38 hm:47309185-49591369
>chr6 AC:CM000668.2 gi:568336018 LN:170805979 rl:Chromosome M5:5691468a67c7e7a7b5f2a3a683792c29 AS:GRCh38
>chr7 AC:CM000669.2 gi:568336017 LN:159345973 rl:Chromosome M5:cc044cc2256a1141212660fb07b6171e AS:GRCh38
>chr8 AC:CM000670.2 gi:568336016 LN:145138636 rl:Chromosome M5:c67955b5f7815a9a1edfaa15893d3616 AS:GRCh38
>chr9 AC:CM000671.2 gi:568336015 LN:138394717 rl:Chromosome M5:6c198acf68b5af7b9d676dfdd531b5de AS:GRCh38
>chr10 AC:CM000672.2 gi:568336014 LN:133797422 rl:Chromosome M5:c0eeee7acfdaf31b770a509bdaa6e51a AS:GRCh38
>chr11 AC:CM000673.2 gi:568336013 LN:135086622 rl:Chromosome M5:1511375dc2dd1b633af8cf439ae90cec AS:GRCh38
>chr12 AC:CM000674.2 gi:568336012 LN:133275309 rl:Chromosome M5:96e414eace405d8c27a6d35ba19df56f AS:GRCh38
>chr13 AC:CM000675.2 gi:568336011 LN:114364328 rl:Chromosome M5:a5437debe2ef9c9ef8f3ea2874ae1d82 AS:GRCh38
>chr14 AC:CM000676.2 gi:568336010 LN:107043718 rl:Chromosome M5:e0f0eecc3bcab6178c62b6211565c807 AS:GRCh38 hm:multiple
>chr15 AC:CM000677.2 gi:568336009 LN:101991189 rl:Chromosome M5:f036bd11158407596ca6bf3581454706 AS:GRCh38
>chr16 AC:CM000678.2 gi:568336008 LN:90338345 rl:Chromosome M5:db2d37c8b7d019caaf2dd64ba3a6f33a AS:GRCh38
>chr17 AC:CM000679.2 gi:568336007 LN:83257441 rl:Chromosome M5:f9a0fb01553adb183568e3eb9d8626db AS:GRCh38
>chr18 AC:CM000680.2 gi:568336006 LN:80373285 rl:Chromosome M5:11eeaa801f6b0e2e36a1138616b8ee9a AS:GRCh38
>chr19 AC:CM000681.2 gi:568336005 LN:58617616 rl:Chromosome M5:85f9f4fc152c58cb7913c06d6b98573a AS:GRCh38 hm:multiple
>chr20 AC:CM000682.2 gi:568336004 LN:64444167 rl:Chromosome M5:b18e6c531b0bd70e949a7fc20859cb01 AS:GRCh38
>chr21 AC:CM000683.2 gi:568336003 LN:46709983 rl:Chromosome M5:974dc7aec0b755b19f031418fdedf293 AS:GRCh38 hm:multiple
>chr22 AC:CM000684.2 gi:568336002 LN:50818468 rl:Chromosome M5:ac37ec46683600f808cdd41eac1d55cd AS:GRCh38 hm:multiple
>chrX AC:CM000685.2 gi:568336001 LN:156040895 rl:Chromosome M5:2b3a55ff7f58eb308420c8a9b11cac50 AS:GRCh38
>chrY AC:CM000686.2 gi:568336000 LN:57227415 rl:Chromosome M5:ce3e31103314a704255f3cd90369ecce AS:GRCh38 hm:10001-2781479,56887903-57217415
>chrM AC:J01415.2 gi:113200490 LN:16569 rl:Mitochondrion M5:c68f52674c9fb33aef52dcf399755519 AS:GRCh38 tp:circular
>chr1_KI270706v1_random AC:KI270706.1 gi:568335410 LN:175055 rg:chr1 rl:unlocalized M5:62def1a794b3e18192863d187af956e6 AS:GRCh38
>chr1_KI270707v1_random AC:KI270707.1 gi:568335409 LN:32032 rg:chr1 rl:unlocalized M5:78135804eb15220565483b7cdd02f3be AS:GRCh38
>chr1_KI270708v1_random AC:KI270708.1 gi:568335408 LN:127682 rg:chr1 rl:unlocalized M5:1e95e047b98ed92148dd84d6c037158c AS:GRCh38
>chr1_KI270709v1_random AC:KI270709.1 gi:568335407 LN:66860 rg:chr1 rl:unlocalized M5:4e2db2933ea96aee8dab54af60ecb37d AS:GRCh38
>chr1_KI270710v1_random AC:KI270710.1 gi:568335406 LN:40176 rg:chr1 rl:unlocalized M5:9949f776680c6214512ee738ac5da289 AS:GRCh38
>chr1_KI270711v1_random AC:KI270711.1 gi:568335405 LN:42210 rg:chr1 rl:unlocalized M5:af383f98cf4492c1f1c4e750c26cbb40 AS:GRCh38
>chr1_KI270712v1_random AC:KI270712.1 gi:568335404 LN:176043 rg:chr1 rl:unlocalized M5:c38a0fecae6a1838a405406f724d6838 AS:GRCh38
>chr1_KI270713v1_random AC:KI270713.1 gi:568335403 LN:40745 rg:chr1 rl:unlocalized M5:cb78d48cc0adbc58822a1c6fe89e3569 AS:GRCh38
>chr1_KI270714v1_random AC:KI270714.1 gi:568335402 LN:41717 rg:chr1 rl:unlocalized M5:42f7a452b8b769d051ad738ee9f00631 AS:GRCh38
>chr2_KI270715v1_random AC:KI270715.1 gi:568335401 LN:161471 rg:chr2 rl:unlocalized M5:b65a8af1d7bbb7f3c77eea85423452bb AS:GRCh38
>chr2_KI270716v1_random AC:KI270716.1 gi:568335400 LN:153799 rg:chr2 rl:unlocalized M5:2828e63b8edc5e845bf48e75fbad2926 AS:GRCh38
>chr3_GL000221v1_random AC:GL000221.1 gi:224183270 LN:155397 rg:chr3 rl:unlocalized M5:3238fb74ea87ae857f9c7508d315babb AS:GRCh38
>chr4_GL000008v2_random AC:GL000008.2 gi:568335399 LN:209709 rg:chr4 rl:unlocalized M5:a999388c587908f80406444cebe80ba3 AS:GRCh38
>chr5_GL000208v1_random AC:GL000208.1 gi:224183050 LN:92689 rg:chr5 rl:unlocalized M5:aa81be49bf3fe63a79bdc6a6f279abf6 AS:GRCh38
>chr9_KI270717v1_random AC:KI270717.1 gi:568335398 LN:40062 rg:chr9 rl:unlocalized M5:796773a1ee67c988b4de887addbed9e7 AS:GRCh38
>chr9_KI270718v1_random AC:KI270718.1 gi:568335397 LN:38054 rg:chr9 rl:unlocalized M5:b0c463c8efa8d64442b48e936368dad5 AS:GRCh38
>chr9_KI270719v1_random AC:KI270719.1 gi:568335396 LN:176845 rg:chr9 rl:unlocalized M5:cd5e932cfc4c74d05bb64e2126873a3a AS:GRCh38
>chr9_KI270720v1_random AC:KI270720.1 gi:568335395 LN:39050 rg:chr9 rl:unlocalized M5:8c2683400a4aeeb40abff96652b9b127 AS:GRCh38
>chr11_KI270721v1_random AC:KI270721.1 gi:568335394 LN:100316 rg:chr11 rl:unlocalized M5:9654b5d3f36845bb9d19a6dbd15d2f22 AS:GRCh38
>chr14_GL000009v2_random AC:GL000009.2 gi:568335393 LN:201709 rg:chr14 rl:unlocalized M5:862f555045546733591ff7ab15bcecbe AS:GRCh38
>chr14_GL000225v1_random AC:GL000225.1 gi:224183274 LN:211173 rg:chr14 rl:unlocalized M5:63945c3e6962f28ffd469719a747e73c AS:GRCh38
>chr14_KI270722v1_random AC:KI270722.1 gi:568335392 LN:194050 rg:chr14 rl:unlocalized M5:51f46c9093929e6edc3b4dfd50d803fc AS:GRCh38
>chr14_GL000194v1_random AC:GL000194.1 gi:224183213 LN:191469 rg:chr14 rl:unlocalized M5:6ac8f815bf8e845bb3031b73f812c012 AS:GRCh38
>chr14_KI270723v1_random AC:KI270723.1 gi:568335391 LN:38115 rg:chr14 rl:unlocalized M5:74a4b480675592095fb0c577c515b5df AS:GRCh38
>chr14_KI270724v1_random AC:KI270724.1 gi:568335390 LN:39555 rg:chr14 rl:unlocalized M5:c3fcb15dddf45f91ef7d94e2623ce13b AS:GRCh38
>chr14_KI270725v1_random AC:KI270725.1 gi:568335389 LN:172810 rg:chr14 rl:unlocalized M5:edc6402e58396b90b8738a5e37bf773d AS:GRCh38
>chr14_KI270726v1_random AC:KI270726.1 gi:568335388 LN:43739 rg:chr14 rl:unlocalized M5:fbe54a3197e2b469ccb2f4b161cfbe86 AS:GRCh38
>chr15_KI270727v1_random AC:KI270727.1 gi:568335387 LN:448248 rg:chr15 rl:unlocalized M5:84fe18a7bf03f3b7fc76cbac8eb583f1 AS:GRCh38
>chr16_KI270728v1_random AC:KI270728.1 gi:568335386 LN:1872759 rg:chr16 rl:unlocalized M5:369ff74cf36683b3066a2ca929d9c40d AS:GRCh38
>chr17_GL000205v2_random AC:GL000205.2 gi:568335385 LN:185591 rg:chr17 rl:unlocalized M5:458e71cd53dd1df4083dc7983a6c82c4 AS:GRCh38
>chr17_KI270729v1_random AC:KI270729.1 gi:568335384 LN:280839 rg:chr17 rl:unlocalized M5:2756f6ee4f5780acce31e995443508b6 AS:GRCh38
>chr17_KI270730v1_random AC:KI270730.1 gi:568335383 LN:112551 rg:chr17 rl:unlocalized M5:48f98ede8e28a06d241ab2e946c15e07 AS:GRCh38
>chr22_KI270731v1_random AC:KI270731.1 gi:568335382 LN:150754 rg:chr22 rl:unlocalized M5:8176d9a20401e8d9f01b7ca8b51d9c08 AS:GRCh38
>chr22_KI270732v1_random AC:KI270732.1 gi:568335381 LN:41543 rg:chr22 rl:unlocalized M5:d837bab5e416450df6e1038ae6cd0817 AS:GRCh38
>chr22_KI270733v1_random AC:KI270733.1 gi:568335380 LN:179772 rg:chr22 rl:unlocalized M5:f1fa05d48bb0c1f87237a28b66f0be0b AS:GRCh38
>chr22_KI270734v1_random AC:KI270734.1 gi:568335379 LN:165050 rg:chr22 rl:unlocalized M5:1d17410ae2569c758e6dd51616412d32 AS:GRCh38
>chr22_KI270735v1_random AC:KI270735.1 gi:568335378 LN:42811 rg:chr22 rl:unlocalized M5:eb6b07b73dd9a47252098ed3d9fb78b8 AS:GRCh38
>chr22_KI270736v1_random AC:KI270736.1 gi:568335377 LN:181920 rg:chr22 rl:unlocalized M5:2ff189f33cfa52f321accddf648c5616 AS:GRCh38
>chr22_KI270737v1_random AC:KI270737.1 gi:568335376 LN:103838 rg:chr22 rl:unlocalized M5:2ea8bc113a8193d1d700b584b2c5f42a AS:GRCh38
>chr22_KI270738v1_random AC:KI270738.1 gi:568335375 LN:99375 rg:chr22 rl:unlocalized M5:854ec525c7b6a79e7268f515b6a9877c AS:GRCh38
>chr22_KI270739v1_random AC:KI270739.1 gi:568335374 LN:73985 rg:chr22 rl:unlocalized M5:760fbd73515fedcc9f37737c4a722d6a AS:GRCh38
>chrY_KI270740v1_random AC:KI270740.1 gi:568335373 LN:37240 rg:chrY rl:unlocalized M5:69e42252aead509bf56f1ea6fda91405 AS:GRCh38
>chrUn_KI270302v1 AC:KI270302.1 gi:568335372 LN:2274 rl:unplaced M5:ee6dff38036f7d03478c70717643196e AS:GRCh38
>chrUn_KI270304v1 AC:KI270304.1 gi:568335371 LN:2165 rl:unplaced M5:9423c1b46a48aa6331a77ab5c702ac9d AS:GRCh38
>chrUn_KI270303v1 AC:KI270303.1 gi:568335370 LN:1942 rl:unplaced M5:2cb746c78e0faa11e628603a4bc9bd58 AS:GRCh38
You could use blastn to align your FASTA reference against the other common FASTA references and find out which one you have:
blastn -query file1.fasta -subject file2.fasta -outfmt 6 -out results.txt
If each pair of corresponding chromosome sequences aligns without gaps in the results file, you have found the correct reference. 🙂
Another quick-and-dirty method is to check the number of contigs you have, their names, and the sequence lengths (e.g. chrM, chrY, and chrX).
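The contig check can be scripted. Below is a minimal Python sketch, not a definitive tool: it summarizes each FASTA as contig name, length, and MD5, then compares two summaries. It assumes the MD5 is taken over the uppercased sequence without line breaks, which is how the M5 tag in headers like yours is normally defined. The function names (fasta_summary, header_tags, compare) and the toy sequences are made up for illustration.

```python
import hashlib
import io


def header_tags(header_line):
    """Parse 'KEY:value' fields from a header such as '>chr1 LN:248956422 M5:...'."""
    fields = header_line.lstrip(">").split()
    return {k: v for k, v in (f.split(":", 1) for f in fields[1:] if ":" in f)}


def fasta_summary(handle):
    """Return {contig_name: (length, md5_hex)} for a FASTA text stream."""
    summary = {}
    name, length, md5 = None, 0, None
    for line in handle:
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            if name is not None:
                summary[name] = (length, md5.hexdigest())
            name = line[1:].split()[0]  # contig name is the first token
            length, md5 = 0, hashlib.md5()
        elif name is not None:
            seq = line.upper()  # assumption: checksum over uppercase bases
            length += len(seq)
            md5.update(seq.encode("ascii"))
    if name is not None:
        summary[name] = (length, md5.hexdigest())
    return summary


def compare(mine, candidate):
    """Sort contig names into identical, differing, and present in only one file."""
    same = sorted(n for n in mine if candidate.get(n) == mine[n])
    differ = sorted(n for n in mine if n in candidate and candidate[n] != mine[n])
    only_mine = sorted(n for n in mine if n not in candidate)
    only_candidate = sorted(n for n in candidate if n not in mine)
    return same, differ, only_mine, only_candidate


# Toy data standing in for two real reference files (hypothetical sequences).
mine = fasta_summary(io.StringIO(">chrA LN:8\nACGT\nacgt\n>chrB\nNNNN\n"))
candidate = fasta_summary(io.StringIO(">chrA\nACGTACGT\n>chrB\nNNNA\n>chrC\nAC\n"))
print(compare(mine, candidate))
```

For real files, pass open("your.fa") and open("candidate.fa") instead of the io.StringIO objects. If every contig lands in the first list and the other three are empty, the two files hold the same sequences under the same names. You can also compare the computed values against the LN and M5 tags returned by header_tags to confirm the file matches its own headers.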