
MATH 4720/MSSC 5720 Introduction to Statistics
Human weight follows \(N(\mu, \sigma^2)\)

The number of snowstorms in one year follows \(Poisson(\lambda)\)

\(n\) random variables \(X_1, X_2, \dots, X_n\).
\(X_1, X_2, \dots, X_n\) come from the same distribution.
View \(X_i\) as a data point to be drawn from a population with some distribution, say \(N(\mu, \sigma^2)\).
Assume that \(X_1, X_2, \dots, X_n\) are independent, i.e., the distribution/value of \(X_i\) is not affected by any other \(X_j\).
With the same distribution, \(X_1, X_2, \dots, X_n\) are independent and identically distributed (i.i.d.): \(X_1, X_2, \dots, X_n \stackrel{iid}{\sim} N(\mu, \sigma^2)\)
\((X_1, X_2, \dots, X_n)\) is a random sample of size \(n\) from the population.
Can you provide another statistic?
Sample variance \(\frac{\sum_{i=1}^n \left(X_i - \overline{X}\right)^2}{n-1}\) is a statistic.
Since \(X_1, X_2, \dots, X_n\) are random variables, any transformation or function of \((X_1, X_2, \dots, X_n)\), or statistic, is also a random variable.
The probability distribution of a statistic is called the sampling distribution of that statistic.
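To make the idea of a statistic as a random variable concrete, the sketch below draws several samples from one normal population and computes the sample mean and sample variance of each. The population values (\(\mu = 70\), \(\sigma = 12\)), the sample size, and the seed are hypothetical choices made only for illustration.

```python
import random
import statistics

random.seed(7)  # arbitrary seed, used only so the run is reproducible

mu, sigma = 70.0, 12.0  # hypothetical population mean and SD (assumed values)
n = 10                  # sample size

def draw_sample(size):
    """Draw an i.i.d. sample of the given size from N(mu, sigma^2)."""
    return [random.gauss(mu, sigma) for _ in range(size)]

summaries = []
for rep in range(5):
    sample = draw_sample(n)
    x_bar = statistics.fmean(sample)   # sample mean
    s2 = statistics.variance(sample)   # sample variance, n - 1 denominator
    summaries.append((x_bar, s2))
    print(f"sample {rep + 1}: mean = {x_bar:6.2f}, variance = {s2:7.2f}")
```

Each new sample gives a different value of the same statistic, and that variation is what the sampling distribution describes.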
Does the sample mean \(\overline{X} = \frac{1}{n}\sum_{i=1}^n X_i\) have a sampling distribution?

What are the differences between the sampling distribution of \(\overline{X}\) and the population distribution each individual r.v. \(X_i\) is drawn from?
Sample means \((\overline{X})\) are less variable than individual observations \(X_i\).
Sample means \((\overline{X})\) are closer to normally distributed than individual observations \(X_i\).
Roll a fair die 3 times 🎲🎲 🎲 independently to obtain 3 values from the population \(\{1, 2, 3, 4, 5, 6\}\).
Repeat the process 10,000 times and plot the histogram of the sample means.
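The die experiment can be simulated directly. This is a minimal sketch using only the Python standard library; the seed and the text-based histogram are choices made here for illustration, not part of the original exercise.

```python
import random
import statistics
from collections import Counter

random.seed(4720)  # arbitrary seed, used only so the run is reproducible

n_rolls = 3       # sample size: rolls per sample
n_reps = 10_000   # number of repeated samples

# Each repetition: roll a fair die n_rolls times and record the sample mean.
sample_means = [
    statistics.fmean([random.randint(1, 6) for _ in range(n_rolls)])
    for _ in range(n_reps)
]

# Population values for a single fair die: mean 3.5, variance 35/12.
pop_mean = 3.5
pop_sd = (35 / 12) ** 0.5

print("mean of sample means:", round(statistics.fmean(sample_means), 3))
print("SD of sample means:  ", round(statistics.pstdev(sample_means), 3))
print("population SD:       ", round(pop_sd, 3))

# Text histogram: one '#' per 50 occurrences of each possible sample mean.
counts = Counter(round(m, 2) for m in sample_means)
for value in sorted(counts):
    print(f"{value:5.2f} | " + "#" * (counts[value] // 50))
```

The histogram should be roughly bell-shaped and centered near 3.5, with a spread smaller than that of a single roll.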


Suppose \((X_1, \dots, X_n)\) is a random sample from a population distribution with mean \(\mu\) and standard deviation \(\sigma\).
The mean of the sampling distribution of the sample mean, \(\overline{X} = \frac{\sum_{i=1}^nX_i}{n}\), is \(\mu_{\overline{X}} = \mu\).
The standard deviation of the sampling distribution of the sample mean \(\overline{X}\) is \(\sigma_{\overline{X}} = \frac{\sigma}{\sqrt{n}}\).
If the population distribution is \(N(\mu, \sigma^2)\), the sampling distribution of \(\overline{X}\) is exactly \(N\left(\mu, \frac{\sigma^2}{n} \right)\).
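These properties can be checked numerically. The sketch below assumes a hypothetical normal population with \(\mu = 50\) and \(\sigma = 10\) and samples of size \(n = 25\), so the theoretical standard deviation of \(\overline{X}\) is \(10/\sqrt{25} = 2\). All numeric values and the seed are illustrative assumptions.

```python
import random
import statistics

random.seed(1)  # arbitrary seed, used only so the run is reproducible

mu, sigma = 50.0, 10.0  # hypothetical population mean and SD (assumed values)
n = 25                  # sample size
n_reps = 20_000         # number of simulated samples

# Simulate the sampling distribution of the sample mean.
sample_means = [
    statistics.fmean([random.gauss(mu, sigma) for _ in range(n)])
    for _ in range(n_reps)
]

sim_mean = statistics.fmean(sample_means)
sim_sd = statistics.pstdev(sample_means)

print("simulated mean of X-bar:", round(sim_mean, 2), "| theory:", mu)
print("simulated SD of X-bar:  ", round(sim_sd, 2), "| theory:", sigma / n ** 0.5)
```

The simulated values should land close to \(\mu\) and \(\sigma/\sqrt{n}\), with small differences due to simulation noise.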
For a single random variable \(X \sim N(\mu, \sigma^2)\), \(Z = \frac{X - \mu}{\sigma} \sim N(0, 1)\).
For the sample mean of \(n\) variables, \(\overline{X} \sim N(\mu_{\overline{X}}, \sigma^2_{\overline{X}}) = N(\mu, \frac{\sigma^2}{n})\), and hence
\[Z = \frac{\overline{X} - \mu_{\overline{X}}}{\sigma_{\overline{X}}} = \frac{\overline{X} - \mu}{\sigma/\sqrt{n}} \sim N(0, 1)\]
Psychomotor retardation scores for a group of patients have a normal distribution with a mean of 930 and a standard deviation of 130.
What is the probability that the mean retardation score of a random sample of 20 patients is between 900 and 960?

\[\small \begin{align} P(900 < \overline{X} < 960) &= P\left( \frac{900-930}{130/\sqrt{20}} < \frac{\overline{X}-930}{130/\sqrt{20}} < \frac{960-930}{130/\sqrt{20}}\right)=P(-1.03 < Z < 1.03)\\ &=P(Z < 1.03) - P(Z < -1.03) \end{align}\]
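The remaining step is to evaluate the two standard normal probabilities. As a sketch, this can be done with `statistics.NormalDist` from the Python standard library in place of a normal table; the variable names are chosen here for illustration.

```python
from statistics import NormalDist

mu, sigma, n = 930, 130, 20   # values from the example
se = sigma / n ** 0.5         # standard deviation of the sample mean

z_low = (900 - mu) / se
z_high = (960 - mu) / se

Z = NormalDist()  # standard normal, N(0, 1)

# Using the z-values rounded to two decimals, as in the displayed calculation.
prob_rounded = Z.cdf(1.03) - Z.cdf(-1.03)

# Using the unrounded z-values.
prob_exact = Z.cdf(z_high) - Z.cdf(z_low)

print("z-values:", round(z_low, 2), round(z_high, 2))
print("probability (rounded z):", round(prob_rounded, 4))
print("probability (exact z):  ", round(prob_exact, 4))
```

Both versions give a probability of roughly 0.70.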
If \(X_i \stackrel{iid}{\sim} N(\mu, \sigma^2)\), then \(\overline{X} \sim N\left(\mu, \frac{\sigma^2}{n} \right)\).
What if the population distribution is NOT normal?
The central limit theorem (CLT) gives us the answer!
Central Limit Theorem (CLT):
Suppose \(\overline{X}\) is the mean of a random sample of size \(n\) from a population distribution having mean \(\mu\) and standard deviation \(\sigma < \infty\).
As \(n\) increases, the sampling distribution of \(\overline{X}\) looks more and more like \(N(\mu, \sigma^2/n)\), regardless of the distribution from which we are sampling!
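A small simulation can illustrate the theorem. The sketch below assumes a strongly right-skewed population, the exponential distribution with rate 1 (mean 1, standard deviation 1), and tracks the skewness of the simulated sample means as \(n\) grows. The choice of population, the sample sizes, and the helper names are assumptions made for illustration; a skewness near 0 is consistent with, but does not by itself prove, normality.

```python
import random
import statistics

random.seed(2024)  # arbitrary seed, used only so the run is reproducible

def skewness(values):
    """Sample skewness: the average cubed standardized deviation."""
    m = statistics.fmean(values)
    s = statistics.pstdev(values)
    return statistics.fmean([((v - m) / s) ** 3 for v in values])

def simulate_sample_means(n, n_reps=5_000):
    """Means of n_reps samples of size n from an Exponential(1) population."""
    return [
        statistics.fmean([random.expovariate(1.0) for _ in range(n)])
        for _ in range(n_reps)
    ]

results = {}
for n in (1, 5, 30, 100):
    means = simulate_sample_means(n)
    results[n] = (
        statistics.fmean(means),   # should stay near mu = 1
        statistics.pstdev(means),  # should be near sigma / sqrt(n) = 1 / sqrt(n)
        skewness(means),           # should shrink toward 0 as n grows
    )
    mean_, sd_, skew_ = results[n]
    print(f"n = {n:3d}: mean = {mean_:.3f}, SD = {sd_:.3f}, skewness = {skew_:.3f}")
```

As \(n\) increases, the center stays near \(\mu\), the spread shrinks like \(\sigma/\sqrt{n}\), and the skewness moves toward 0, the value for a normal distribution.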

Many well-developed statistical methods are based on the assumption of a normal distribution.
With the CLT, we can use those methods even if we are sampling from a non-normal distribution, or have no idea what the population distribution is, provided that the sample size is large.
Suppose that selling prices of houses in Milwaukee are known to have a mean of $382,000 and a standard deviation of $150,000.
In 100 randomly selected sales, what is the probability the average selling price is more than $400,000?

Since the sample size is fairly large \((n = 100)\), by the CLT, the sampling distribution of the average selling price is approximately normal with mean $382,000 and SD \(150,000 / \sqrt{100} = 15,000\).
\(P(\overline{X} > 400000) = P\left(\frac{\overline{X} - 382000}{150000/\sqrt{100}} > \frac{400000 - 382000}{150000/\sqrt{100}}\right) \approx P(Z > 1.2)\) where \(Z \sim N(0, 1)\).
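The final probability can be evaluated numerically. This sketch again uses `statistics.NormalDist` from the Python standard library as a stand-in for a normal table and relies on the CLT approximation described above.

```python
from statistics import NormalDist

mu, sigma, n = 382_000, 150_000, 100  # values from the example
se = sigma / n ** 0.5                 # SD of the sample mean: 15,000

z = (400_000 - mu) / se               # standardized cutoff: 1.2

# Upper-tail probability P(Z > z) under the standard normal.
prob = 1 - NormalDist().cdf(z)

print("z =", round(z, 2))
print("P(X-bar > 400,000) is approximately", round(prob, 4))
```

Under the normal approximation, the probability is about 0.115.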