Effect of difference in sample sizes on p-value

Hello,

I have created the following figure.

I get a highly significant p-value when comparing the two groups. Each sample is simply a list of numbers that are genome sizes. I am sure the large sample sizes play a role in how significant the p-value is. I have the following two questions:

- Should I perform any kind of multiple-testing correction here? I would think not, since I am doing just one test.
- Is it problematic that the two sample sizes are different? I know that if one were 5 and the other 5000, the test would not be very powerful, but in this case (or a case with similar numbers) would it lead to spurious p-values? If it would, I can take a random subsample of 341 datapoints from ‘source 2’ to make the two groups equal in size.
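
To make the subsampling idea concrete, here is a minimal sketch of what I have in mind. The data are synthetic placeholders rather than my real genome sizes, the size of 2000 for ‘source 2’ is made up for illustration, and the permutation test on the difference in means is only a stand-in for whatever test is appropriate. The helper name `permutation_p_value` is hypothetical.

```python
import random
import statistics


def permutation_p_value(a, b, n_perm=1000, seed=0):
    """Two-sided permutation test on the absolute difference in means."""
    rng = random.Random(seed)
    observed = abs(statistics.fmean(a) - statistics.fmean(b))
    pooled = list(a) + list(b)
    n_a = len(a)
    hits = 0
    for _ in range(n_perm):
        # Reassign group labels at random, keeping the group sizes fixed.
        rng.shuffle(pooled)
        diff = abs(statistics.fmean(pooled[:n_a]) - statistics.fmean(pooled[n_a:]))
        if diff >= observed:
            hits += 1
    # Add-one correction so the estimated p-value is never exactly zero.
    return (hits + 1) / (n_perm + 1)


rng = random.Random(42)

# Synthetic placeholder data (assumption): NOT the real genome sizes.
source1 = [rng.gauss(5.0, 1.0) for _ in range(341)]
source2 = [rng.gauss(5.3, 1.0) for _ in range(2000)]  # 2000 is a made-up size

# Test on the full, unequal-sized groups.
p_full = permutation_p_value(source1, source2)

# Test after randomly subsampling 'source 2' down to 341 datapoints.
source2_sub = rng.sample(source2, k=len(source1))
p_sub = permutation_p_value(source1, source2_sub)

print(f"full data:  n1={len(source1)}, n2={len(source2)}, p={p_full:.4f}")
print(f"subsampled: n1={len(source1)}, n2={len(source2_sub)}, p={p_sub:.4f}")
```

Comparing `p_full` with `p_sub` would show whether the conclusion depends on the imbalance in group sizes, though the subsampled result will vary with the random seed and uses less of the data.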

Thank you
