I am not sure why my boxplot, created with ggplot's geom_boxplot, shows an upper whisker that looks too short and plots the data point with value = 7 as an outlier in the red “Male” boxplot.
I have analyzed the same data sheet in SPSS and confirmed that Q1, Q3, and the other values shown on the graph match. The SPSS stem-and-leaf plot also shows only one outlier (the value = 8 shown on the graph). From the graph and the SPSS output, I calculated the following for the red “Male” boxplot:
Lower hinge = bottom of box = 25th percentile = 1.00
Upper hinge = top of box = 75th percentile = 3.75
The R documentation for geom_boxplot states that the upper whisker extends from the hinge to the highest value that is within 1.5 * IQR of the hinge. Data beyond the end of the whiskers are outliers and are plotted as points (as specified by Tukey).
IQR = distance between the 25th and 75th percentiles = 3.75 - 1.00 = 2.75
1.5 * IQR = 1.5 * 2.75 = 4.125
Upper hinge + 1.5 * IQR = 3.75 + 4.125 = 7.875
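To make the arithmetic above explicit, here is a minimal Python sketch of the whisker rule as I understand it from the documentation. Only the hinges (1.00 and 3.75) come from my SPSS output. The function name and the sample values are hypothetical placeholders, not my actual data, and this is not how ggplot2 itself computes the hinges.

```python
def upper_whisker(values, lower_hinge, upper_hinge, coef=1.5):
    """Apply the Tukey-style rule to the upper side of a boxplot.

    Returns (limit, whisker_end, outliers), where limit is
    upper_hinge + coef * IQR, whisker_end is the largest value
    at or below that limit, and outliers are the values above it.
    """
    iqr = upper_hinge - lower_hinge
    limit = upper_hinge + coef * iqr
    inside = [v for v in values if v <= limit]
    outliers = [v for v in values if v > limit]
    return limit, max(inside), outliers


# Hypothetical sample values chosen only to illustrate the rule;
# the hinges are the ones reported in the post.
sample = [0, 1, 1, 2, 3, 4, 7, 8]
limit, whisker_end, outliers = upper_whisker(sample, 1.00, 3.75)
print(limit, whisker_end, outliers)
```

Under this rule and these hinges, a value of 7 is below the limit and would sit at the end of the whisker, while 8 would be the only point drawn as an outlier.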
So the upper whisker may extend as far as 7.875, and the point at 7 that is currently drawn as an outlier lies below that limit, so it should be the end of the whisker rather than a separate point. Why is it not displayed that way? Am I missing something here? Thank you!