Effect of difference in sample sizes on p-value

I have created the following figure. [Figure: image not available]

I get a highly significant p-value when comparing the two groups. The two samples are simply lists of numbers (genome sizes). I am sure the large sample sizes play a role in how significant the p-value is. I have the following two questions:

  1. Should I perform any kind of multiple-testing correction here? I would think not, since I am doing just one test.
  2. Is it problematic that the two sample sizes are different? I know that if one were 5 and the other 5000, the test would not be very powerful, but in this case (or a case with similar numbers) would the imbalance lead to spurious p-values? If it would, I can take a random subsample of 341 data points from 'source 2' to make the two groups equal in size.
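To make the subsampling idea in question 2 concrete, here is a minimal sketch of what I have in mind. Everything in it is an assumption for illustration: the data are randomly generated placeholders (not my genome sizes), the size of 1500 for 'source 2' is made up, and the permutation test on the difference in means is a stand-in, not necessarily the test behind the figure. The helper name `permutation_p_value` is hypothetical.

```python
import random
import statistics


def permutation_p_value(a, b, n_perm=1000, seed=0):
    """Two-sided permutation test on the absolute difference in means.

    Group labels are shuffled n_perm times; the p-value is the share of
    shuffles whose difference is at least as large as the observed one.
    """
    rng = random.Random(seed)
    observed = abs(statistics.fmean(a) - statistics.fmean(b))
    pooled = list(a) + list(b)
    n_a = len(a)
    hits = 0
    for _ in range(n_perm):
        rng.shuffle(pooled)
        diff = abs(statistics.fmean(pooled[:n_a]) - statistics.fmean(pooled[n_a:]))
        if diff >= observed:
            hits += 1
    # Add-one correction so the estimated p-value is never exactly zero.
    return (hits + 1) / (n_perm + 1)


# Placeholder data: made-up values standing in for genome sizes.
data_rng = random.Random(42)
source_1 = [data_rng.gauss(5.0, 1.0) for _ in range(341)]
source_2 = [data_rng.gauss(5.3, 1.0) for _ in range(1500)]  # size is an assumption

# Test with all of source 2 (unequal group sizes).
p_full = permutation_p_value(source_1, source_2)

# Test after subsampling source 2 down to the size of source 1.
subsample = data_rng.sample(source_2, len(source_1))
p_sub = permutation_p_value(source_1, subsample)

print(f"all data:   n1={len(source_1)}, n2={len(source_2)}, p={p_full:.4f}")
print(f"subsampled: n1={len(source_1)}, n2={len(subsample)}, p={p_sub:.4f}")
```

The subsampled comparison discards most of 'source 2', so I would expect it to have less power than the comparison on the full data, which is part of why I am unsure whether it is the right thing to do.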

Thank you



