boxplot – ggplot geom_boxplot showing incorrect upper whisker and additional outlier


I am not sure why my boxplot, created with ggplot's geom_boxplot, shows an incorrect upper whisker and plots the data point with value = 7 as an outlier in the red boxplot for the “Male” group.

I analyzed the same data in SPSS and confirmed that Q1, Q3, and the other values shown on the graph match. The SPSS stem-and-leaf plot also shows only one outlier (the value = 8 shown on the graph). Based on the graph and the calculations made in SPSS, I worked out the following for the red “Male” boxplot:

Lower hinge = bottom of box = 25th percentile = 1.00

Upper hinge = top of box = 75th percentile = 3.75

The R documentation for geom_boxplot states that the upper whisker extends from the hinge to the highest value that is within 1.5 * IQR of the hinge. Data beyond the end of the whiskers are outliers and are plotted as points (as specified by Tukey).

IQR = distance between the 25th and 75th percentiles = 3.75 - 1.00 = 2.75

1.5 * IQR = 1.5 * 2.75 = 4.125

Upper hinge + 1.5 * IQR = 3.75 + 4.125 = 7.875
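
For reference, here is a minimal sketch of the same arithmetic in base R. The hinge values are the ones reported above; the vector `x` is made up for illustration and is not my actual data, so its own quartiles are not meant to match the hinges. It only shows how the whisker end and the outliers follow from the 1.5 * IQR rule quoted from the documentation.

```r
# Hinge values reported in the question for the "Male" group
lower_hinge <- 1.00
upper_hinge <- 3.75

iqr <- upper_hinge - lower_hinge            # 2.75
upper_limit <- upper_hinge + 1.5 * iqr      # 7.875

# Hypothetical values for illustration only (NOT the original data).
# The hinges above are taken from the question, not computed from x.
x <- c(0, 1, 1, 2, 3, 4, 7, 8)

# The whisker ends at the largest observation at or below the limit;
# observations above the limit are drawn as individual points.
whisker_end <- max(x[x <= upper_limit])
outliers <- x[x > upper_limit]

print(c(iqr = iqr, upper_limit = upper_limit, whisker_end = whisker_end))
print(outliers)
```

To compare against what ggplot actually drew, calling `layer_data()` from ggplot2 on the plot object should return the computed box statistics for each group (columns such as `lower`, `upper`, `ymin`, `ymax`, and `outliers`).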

By this calculation, any value up to 7.875 is within 1.5 * IQR of the upper hinge, so the upper whisker should extend to the data point at 7 instead of showing it as an outlier. Why is this not displayed correctly? Am I missing something here? Thank you!
