Help with PCA plot from SNPRelate

Hello, BioStars community! I am working on a datase of 50 samples(between cases and controls) with genotyping data for ~200 SNPs. I ran SNPRelate PCA analysis without adding population data and tried to plot its results. My PCA plot seems a bit “strange” since the observations did not come together to form “groups” PCA PLOT
Also, I plotted principal component pairs for the first four PCs, but the dots did not appear on my figure PCA PLOT 2.

The commands I used for this analysis are down here. Would anyone help me to understand my output figures? What can one conclude from such a “non-grouped” PCA plot in terms of population stratification?

SNP pruning based on LD:

  set.seed(1000)
 snpset <- snpgdsLDpruning(genofile, autosome.only=FALSE, ld.threshold=0.2)

Chromosome 1: 45.16%, 14/31

Chromosome 2: 53.57%, 15/28

Chromosome 3: 73.33%, 11/15

 Chromosome 4: 62.50%, 5/8 

Chromosome 5: 58.33%, 7/12 

Chromosome 6:31.11%, 14/45 

Chromosome 7: 63.64%, 7/11 

Chromosome 8: 62.50%, 5/8

Chromosome 9: 50.00%, 1/2 

Chromosome 10: 26.67%, 4/15 

Chromosome 11:66.67%, 6/9 

Chromosome 12: 60.00%, 6/10 

Chromosome 14: 37.50%, 3/8 

Chromosome 15: 66.67%, 2/3 

Chromosome 16: 50.00%, 9/18 

Chromosome 17:100.00%, 4/4 

Chromosome 18: 21.43%, 3/14 

Chromosome 19: 66.67%, 4/6 

Chromosome 20: 40.00%, 2/5 

Chromosome 21: 100.00%, 1/1 

Chromosome 22: 25.00%, 3/12 

Chromosome X: 28.57%, 2/7 

128 markers are selected in total.

> str(snpset)

List of 22

 $ chr1 : int [1:14] 1 5 8 10 11 12 16 18 19 20 ...

 $ chr2 : int [1:15] 32 34 35 38 40 41 43 44 48 49 ...

 $ chr3 : int [1:11] 60 64 66 67 68 69 70 71 72 73 ...

 $ chr4 : int [1:5] 76 78 79 80 81

 $ chr5 : int [1:7] 83 85 87 88 90 92 94

 $ chr6 : int [1:14] 98 99 101 105 110 118 131 132 133 134 ...

 $ chr7 : int [1:7] 141 142 143 144 146 147 150

 $ chr8 : int [1:5] 152 153 154 155 156

 $ chr9 : int 160

 $ chr10: int [1:4] 161 166 167 172

 $ chr11: int [1:6] 177 179 180 181 182 184

 $ chr12: int [1:6] 187 189 190 192 193 194

 $ chr14: int [1:3] 195 196 199

 $ chr15: int [1:2] 203 205

 $ chr16: int [1:9] 206 208 211 214 215 220 221 222 223

 $ chr17: int [1:4] 224 225 226 227

 $ chr18: int [1:3] 228 238 241

 $ chr19: int [1:4] 242 244 245 246

 $ chr20: int [1:2] 249 251

 $ chr21: int 253

 $ chr22: int [1:3] 255 263 264

 $ chrX : int [1:2] 266 271

**#PCA ANALYSIS**

> pca <- snpgdsPCA(genofile, autosome.only=FALSE, snp.id=snpset.id, num.thread=2)


Principal Component Analysis (PCA) on genotypes:

Excluding 0 SNP (monomorphic: TRUE, MAF: NaN, missing rate: NaN)

    # of samples: 50

    # of SNPs: 128

    using 2 threads

    # of principal components: 32

PCA:    the sum of all selected genotypes (0,1,2) = 2377

CPU capabilities: Double-Precision SSE2

Fri May 27 17:39:12 2022    (internal increment: 7124)

[==================================================] 100%, completed, 0s  
Fri May 27 17:39:12 2022    Begin (eigenvalues and eigenvectors)
Fri May 27 17:39:12 2022    Done.

> pc.percent <- pca$varprop*100

> head(round(pc.percent, 2))

[1] 6.96 5.65 5.56 5.19 4.94 4.63

> tab <- data.frame(sample.id = pca$sample.id,
   EV1 = pca$eigenvect[,1],    # the first eigenvector
   EV2 = pca$eigenvect[,2],    # the second eigenvector
   stringsAsFactors = FALSE)

> head(tab)

  sample.id         EV1          EV2

1      Z100 -0.15640307  0.033644598

2      Z110  0.05745959  0.104254360

3      Z112 -0.11022784  0.007529606

4      Z113  0.17145391 -0.259216687

5      Z114 -0.14967317  0.056786244

6      Z125 -0.04177276  0.197420240

#PLOTTING PCA 

> plot(tab$EV2, tab$EV1, xlab="eigenvector 2", ylab="eigenvector 1")

**#Plot the principal component pairs for the first four PCs:**

> lbls <- paste("PC", 1:4, "\n", format(pc.percent[1:4], digits=2), "%", sep="")
> pairs(pca$eigenvect[,1:4], col=tab$pop, labels=lbls)

Read more here: Source link