Comparing Iris Data to the Normal Distribution

mhonaker

The iris dataset

This dataset contains some measurements of flower properties of three different species of iris.
Real measurements like these often, though not always, approximately follow a normal distribution.

hist(iris$Sepal.Width, breaks = 20, col = "skyblue", xlab = "Sepal Width", main = "")

[Figure: histogram of iris sepal width]

The normal distribution

The normal, or Gaussian, probability density function:
\[f(x;\mu,\sigma)=\frac{1}{\sigma\sqrt{2\pi}} e^{-\frac{(x-\mu)^2}{2\sigma^2}}\]
is often used to approximate real data, such as flower size.
A way to compare some real data to a normal distribution would be quite convenient.
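
As a quick sanity check, the normal density can be written out term by term and compared with R's built-in dnorm(). This is only a sketch: the helper name normal_pdf is made up for illustration and is not part of the original slides.

```r
# Hand-written normal density (hypothetical helper, for illustration only).
normal_pdf <- function(x, mu = 0, sigma = 1) {
  1 / (sigma * sqrt(2 * pi)) * exp(-(x - mu)^2 / (2 * sigma^2))
}

x <- seq(-3, 3, by = 0.5)

# all.equal() compares within floating-point tolerance.
agrees <- isTRUE(all.equal(normal_pdf(x, mu = 0, sigma = 1),
                           dnorm(x, mean = 0, sd = 1)))
agrees
```

If the two agree, dnorm() can safely stand in for the formula when drawing the curve.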

Comparing the ideal to real data

The histogram below shows the mean-centered sepal widths; the line is the normal density with mean 0 and the same standard deviation.

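# Center the data on zero so it can be compared with a zero-mean normal curve
hist(iris$Sepal.Width - mean(iris$Sepal.Width), freq = FALSE, breaks = 20, xlab = "",
    main = "", col = "lightblue")
# Normal density with mean 0 and the sample standard deviation
x <- seq(-1.5, 2, length.out = 1000)
y <- dnorm(x, mean = 0, sd = sd(iris$Sepal.Width))
lines(x, y, lwd = 3, col = "blue")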

[Figure: histogram of centered sepal width with the normal density curve overlaid]
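
An overlaid curve is a visual comparison only. A complementary check, not part of the original slides, is a normal quantile-quantile plot together with a Shapiro-Wilk test; both come from base R's stats package. The sketch below assumes the same iris$Sepal.Width column and the variable names sw and res are chosen just for this example.

```r
sw <- iris$Sepal.Width

# Q-Q plot: points close to the reference line suggest approximate normality.
qqnorm(sw, main = "", pch = 19, col = "skyblue")
qqline(sw, lwd = 2, col = "blue")

# Shapiro-Wilk test: a small p-value is evidence against normality.
res <- shapiro.test(sw)
res$p.value
```

Neither check proves normality; they only indicate how far the data depart from it.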

A Shiny app to do the same thing...

The Shiny app deployed at http://mhonaker.shinyapps.io/proj2/ does just that.
A user can plot one of several datasets and compare it with a normal distribution.
Additionally, a user can plot a normal distribution with a mean and standard deviation of their own choosing.
Planned improvements include the ability to upload a user-generated dataset.
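
For readers who have not seen Shiny, here is a minimal sketch of how such an app could be wired together. It is not the source of the deployed app: the dataset list, the input and output IDs, and the layout are all assumptions made for illustration.

```r
library(shiny)

# Hypothetical dataset choices; the deployed app's actual list is not given here.
datasets <- list(
  "Iris sepal width" = iris$Sepal.Width,
  "Old Faithful waiting time" = faithful$waiting
)

ui <- fluidPage(
  selectInput("dataset", "Dataset", choices = names(datasets)),
  plotOutput("hist")
)

server <- function(input, output) {
  output$hist <- renderPlot({
    values <- datasets[[input$dataset]]
    centered <- values - mean(values)
    # Histogram of the centered data on a density scale
    hist(centered, freq = FALSE, breaks = 20, xlab = "", main = "",
         col = "lightblue")
    # Normal density with mean 0 and the sample standard deviation
    x <- seq(min(centered), max(centered), length.out = 1000)
    lines(x, dnorm(x, mean = 0, sd = sd(values)), lwd = 3, col = "blue")
  })
}

app <- shinyApp(ui, server)
# To launch it in an interactive session: runApp(app)
```

The plotting code inside renderPlot() is the same histogram-plus-curve idea as before; Shiny simply re-runs it whenever the selected dataset changes.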