Effect of difference in sample sizes on p-value
I have created the following figure.
I get a pretty significant p-value when comparing the two groups. Each sample is simply a list of numbers that are genome sizes. I am sure the large sample sizes play a role in how significant the p-value is. I have the following two questions:
- Should I perform any kind of multiple-testing correction here? I would think not, since I am doing just one test.
- Is it problematic that the two sample sizes are different? I know that if one were 5 and the other 5000, the test would not be very powerful, but in this case (or a case with similar numbers), would it lead to spurious p-values? If so, I can take a random subsample of 341 data points from ‘source 2’ to make the sizes equal.
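
To make the subsampling idea concrete, here is a minimal sketch of what I have in mind. Everything in it is an assumption for illustration: the data are synthetic placeholders (not my genome sizes), the size of ‘source 2’ (1500) is made up, and the test is Welch's unequal-variance statistic with a normal-approximation p-value, which may not be the test behind the figure. The helper names `welch_z_test` and `cohens_d` are hypothetical.

```python
import math
import random
import statistics


def welch_z_test(a, b):
    """Welch's unequal-variance statistic with a two-sided p-value.

    The p-value uses a normal approximation, which is only reasonable
    when both samples are large (an assumption of this sketch).
    """
    se = math.sqrt(statistics.variance(a) / len(a)
                   + statistics.variance(b) / len(b))
    stat = (statistics.fmean(a) - statistics.fmean(b)) / se
    p = 2 * (1 - statistics.NormalDist().cdf(abs(stat)))
    return stat, p


def cohens_d(a, b):
    """Standardized mean difference using the pooled standard deviation."""
    na, nb = len(a), len(b)
    pooled_var = ((na - 1) * statistics.variance(a)
                  + (nb - 1) * statistics.variance(b)) / (na + nb - 2)
    return (statistics.fmean(a) - statistics.fmean(b)) / math.sqrt(pooled_var)


rng = random.Random(0)  # fixed seed so the sketch is reproducible

# Synthetic placeholder data; the sizes and distributions are made up.
source1 = [rng.gauss(5.0, 1.0) for _ in range(341)]
source2 = [rng.gauss(5.3, 1.0) for _ in range(1500)]

# Test on all the data, with unequal sample sizes.
stat_full, p_full = welch_z_test(source1, source2)

# Test after drawing 341 points from source 2 without replacement.
subsample = rng.sample(source2, k=len(source1))
stat_sub, p_sub = welch_z_test(source1, subsample)

print(f"full data: stat={stat_full:.2f}, p={p_full:.3g}, "
      f"d={cohens_d(source1, source2):.2f}")
print(f"subsample: stat={stat_sub:.2f}, p={p_sub:.3g}, "
      f"d={cohens_d(source1, subsample):.2f}")
```

Printing the effect size next to each p-value is meant to show whether the difference itself changes with subsampling, or only how detectable it is.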