Comparing Iris Data to the Normal Distribution

mhonaker

The iris dataset

  • This dataset contains some measurements of flower properties of three different species of iris.
  • Real measurements like these often, though not always, approximately follow a normal distribution.
hist(iris$Sepal.Width, breaks = 20, col = "skyblue", xlab = "Sepal Width", main = "")

[Figure: histogram of iris sepal width]

The normal distribution

The normal, or Gaussian, probability density function:
\[f(x;\mu,\sigma)=\frac{1}{\sigma\sqrt{2\pi}} e^{-\frac{(x-\mu)^2}{2\sigma^2}}\]
is often used to approximate real data, such as flower size.
A way to compare some real data to a normal distribution would be quite convenient.
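As a quick sanity check, the Gaussian density can be written out by hand and compared with R's built-in dnorm(). This is only a minimal sketch: the helper name normal_pdf and the grid of evaluation points are assumptions made for illustration and do not come from the original slides.

```r
# Normal density with mean mu and standard deviation sigma, written out by hand
# (normal_pdf is a hypothetical helper name, not part of base R)
normal_pdf <- function(x, mu = 0, sigma = 1) {
  1 / (sigma * sqrt(2 * pi)) * exp(-(x - mu)^2 / (2 * sigma^2))
}

# Sample mean and standard deviation of the iris sepal widths
mu <- mean(iris$Sepal.Width)
sigma <- sd(iris$Sepal.Width)

# A few evaluation points spanning the observed range (arbitrary choice)
x <- seq(2, 4.5, length.out = 6)

# Compare the hand-written density with dnorm() up to numerical tolerance
agrees <- isTRUE(all.equal(normal_pdf(x, mu, sigma), dnorm(x, mean = mu, sd = sigma)))
print(agrees)
```

Using all.equal() rather than == avoids spurious mismatches from floating-point rounding.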

Comparing the ideal to real data

The histogram below shows the real data, centered on its mean; the line is the normal density with mean 0 and the same standard deviation.

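# Center the data on its mean and draw a density-scaled histogram
centered <- iris$Sepal.Width - mean(iris$Sepal.Width)
hist(centered, freq = FALSE, breaks = 20, xlab = "", main = "", col = "lightblue")
# Normal density with mean 0 and the sample standard deviation
x <- seq(-1.5, 2, length.out = 1000)
y <- dnorm(x, mean = 0, sd = sd(iris$Sepal.Width))
lines(x, y, lwd = 3, col = "blue")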

[Figure: histogram of centered sepal width with the normal density overlaid]

A shiny app to do the same thing...

  • The shiny app deployed at http://mhonaker.shinyapps.io/proj2/ does just that.
  • A user can plot one of several data sets and compare it to a normal distribution.
  • Additionally, a user can plot a normal distribution with a mean and standard deviation of their own choosing.
  • Planned improvements include the ability to upload a user-generated dataset.
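The features listed above could be sketched roughly as follows. This is not the code of the deployed app, which is not shown in these slides. The data set choices, the helper names plot_vs_normal and make_app, and the input IDs are all assumptions made for illustration. Only standard shiny functions are used, and the app is launched only in an interactive session with shiny installed.

```r
# Hypothetical sketch; NOT the source of the deployed app.

# Data sets offered in this sketch (an assumption, all from base R)
datasets <- list(
  "Iris sepal width" = iris$Sepal.Width,
  "Old Faithful waiting time" = faithful$waiting,
  "Car mileage (mpg)" = mtcars$mpg
)

# Histogram of mean-centered data with a normal density drawn on top.
# sigma defaults to the sample standard deviation of the data.
plot_vs_normal <- function(values, mu = 0, sigma = sd(values)) {
  centered <- values - mean(values)
  hist(centered, freq = FALSE, breaks = 20, xlab = "", main = "",
       col = "lightblue")
  x <- seq(min(centered), max(centered), length.out = 1000)
  lines(x, dnorm(x, mean = mu, sd = sigma), lwd = 3, col = "blue")
  invisible(c(mean = mu, sd = sigma))  # parameters used for the curve
}

# Build the app object; shiny is only needed when this function is called
make_app <- function() {
  ui <- shiny::fluidPage(
    shiny::selectInput("data", "Data set", choices = names(datasets)),
    shiny::numericInput("mu", "Mean of normal curve", value = 0),
    shiny::numericInput("sigma", "SD of normal curve", value = 1, min = 0.01),
    shiny::plotOutput("plot")
  )
  server <- function(input, output) {
    output$plot <- shiny::renderPlot({
      shiny::req(input$sigma > 0)  # wait for a valid standard deviation
      plot_vs_normal(datasets[[input$data]], input$mu, input$sigma)
    })
  }
  shiny::shinyApp(ui = ui, server = server)
}

# Launch only in an interactive session with shiny installed
if (interactive() && requireNamespace("shiny", quietly = TRUE)) {
  shiny::runApp(make_app())
}
```

Keeping the plotting logic in a plain function means it can be tried at the console, for example plot_vs_normal(iris$Sepal.Width), without starting the app.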