Data Mining Discussion 2 a

  • The mean is in general affected by outliers.

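True. A major weakness of the mean is its sensitivity to extreme values. Because every observation contributes to the sum, a single unusually large or small value can pull the mean away from where most of the data lie. If a representative measure of the center is desired, it’s important to first identify any outliers, then either remove them (when they are errors) or report a more robust measure such as the median.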
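A small numeric illustration of this sensitivity, using made-up values (the list below is hypothetical, not from any course dataset), might look like the following sketch. It compares the mean and the median before and after a single extreme value is added.

```python
from statistics import mean, median

# Hypothetical sample: five values clustered around 12.
values = [10, 12, 11, 13, 12]

# The same sample with one extreme value appended.
with_outlier = values + [100]

mean_clean = mean(values)                # 58 / 5
mean_outlier = mean(with_outlier)        # 158 / 6
median_clean = median(values)            # middle of 10, 11, 12, 12, 13
median_outlier = median(with_outlier)    # average of the two middle values

print("mean without outlier:  ", mean_clean)
print("mean with outlier:     ", mean_outlier)
print("median without outlier:", median_clean)
print("median with outlier:   ", median_outlier)
```

In this sketch the mean more than doubles when the value 100 is added, while the median stays at 12.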

  • What are the differences between the measures of central tendency and the measures of dispersion?

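Measures of central tendency (such as the mean, median, and mode) describe where the data values are concentrated, whereas measures of dispersion (such as the range, variance, standard deviation, and interquartile range) describe how spread out the data values are around that center.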
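To make the distinction concrete, here is a minimal sketch with two invented datasets that share the same center but differ in spread. The names `tight` and `wide` are purely illustrative, and the population standard deviation is used only as one possible choice of dispersion measure.

```python
from statistics import mean, median, pstdev

# Two hypothetical datasets with the same center but different spread.
tight = [48, 49, 50, 51, 52]
wide = [10, 30, 50, 70, 90]

# Measures of central tendency: identical for both datasets.
mean_tight, mean_wide = mean(tight), mean(wide)
median_tight, median_wide = median(tight), median(wide)

# Measures of dispersion: very different.
range_tight = max(tight) - min(tight)
range_wide = max(wide) - min(wide)
sd_tight = pstdev(tight)   # population standard deviation
sd_wide = pstdev(wide)

print("means:  ", mean_tight, mean_wide)
print("medians:", median_tight, median_wide)
print("ranges: ", range_tight, range_wide)
print("std dev:", round(sd_tight, 2), round(sd_wide, 2))
```

Both datasets have a mean and median of 50, so a measure of central tendency alone cannot tell them apart; only the range and standard deviation reveal that the second is far more spread out.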

  • How would you catalog a boxplot, as a measure of dispersion or as a data visualization aid? Why?

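I would categorize a boxplot as a data visualization aid. A boxplot is a graph that condenses a set of numbers into a simple picture of its summary statistics: the median, the first and third quartiles, the whiskers, and any outliers. It does display dispersion, since the length of the box is the interquartile range, but it is not itself a single numerical measure; it is a way of showing several measures at once. Granted, it’s important to know how to read a boxplot before considering it a simpler method of interpreting data, but it’s nevertheless a useful tool.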
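The numbers that a boxplot draws can be computed directly, as in the sketch below. It uses a hypothetical dataset and assumes the common 1.5 × IQR rule for flagging outliers together with the "inclusive" quartile method from Python's `statistics` module; plotting libraries may use slightly different quartile conventions, so their boxes can differ a little.

```python
from statistics import quantiles

# Hypothetical data with one extreme value.
data = sorted([1, 2, 3, 4, 5, 6, 7, 8, 30])

# Quartiles: the box spans q1 to q3, with a line at the median (q2).
q1, q2, q3 = quantiles(data, n=4, method="inclusive")
iqr = q3 - q1  # interquartile range = length of the box

# Assumed convention: points beyond 1.5 * IQR from the box are outliers.
lower_fence = q1 - 1.5 * iqr
upper_fence = q3 + 1.5 * iqr
outliers = [x for x in data if x < lower_fence or x > upper_fence]

# Whiskers extend to the most extreme points that are not outliers.
inside = [x for x in data if lower_fence <= x <= upper_fence]
whisker_low, whisker_high = min(inside), max(inside)

print("box:     ", q1, q2, q3)
print("IQR:     ", iqr)
print("whiskers:", whisker_low, whisker_high)
print("outliers:", outliers)
```

The IQR here is a measure of dispersion, while the boxplot is the drawing that presents it alongside the median and outliers.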