
MATH 4720/MSSC 5720 Introduction to Statistics
Inferential statistics uses sample data to learn about an unknown population.
Idea: Assume the target population follows some distribution but with unknown parameters.
Goal: Learn the unknown parameters of the assumed population distribution.
Two approaches to learning parameters: estimation and hypothesis testing.


If you could use only a single number to guess the unknown population mean \(\mu\), what would you use?
The rule that produces a single number estimating an unknown parameter is called a point estimator.
A point estimator is any function of the data \((X_1, X_2, \dots, X_n)\), viewed as random variables before the data are actually collected.
A point estimate is the value of a point estimator calculated from the collected data and used to estimate a population parameter.
The sample mean \((\overline{X})\) is a statistic and a point estimator for the population mean \(\mu\).
| x1 | x2 | x3 | x4 | x5 | sample mean |
|---|---|---|---|---|---|
| 2.88 | 2.94 | 3.09 | 2.66 | 2.38 | 2.79 |
Why is \(\overline{x}\) not equal to \(\mu\)?
Because of randomness: \(\overline{x}\) depends on which subjects happen to end up in the sample.

| x1 | x2 | x3 | x4 | x5 | sample mean |
|---|---|---|---|---|---|
| 2.35 | 3.4 | 3.97 | 1.54 | 3.5 | 2.95 |
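The two point estimates in the tables can be reproduced in R with `mean()`. The object names `sample1`, `sample2`, `xbar1`, and `xbar2` are illustrative choices, not from the notes.

```r
## The two samples shown in the tables above
sample1 <- c(2.88, 2.94, 3.09, 2.66, 2.38)
sample2 <- c(2.35, 3.4, 3.97, 1.54, 3.5)

## mean() gives the point estimate x-bar for each sample
xbar1 <- mean(sample1)
xbar2 <- mean(sample2)
round(c(xbar1, xbar2), 2)
#> [1] 2.79 2.95
```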
Why do the first and second samples give different sample means?
A point estimator is a random variable with its own sampling distribution.
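A small simulation can make the sampling distribution concrete. This is a sketch under an assumed population: the notes do not say where the two samples came from, so \(N(\mu = 3, \sigma = 1)\) and \(n = 5\) are hypothetical choices for illustration only.

```r
## Hypothetical population N(mu = 3, sigma = 1), chosen only for illustration
set.seed(4720)
mu <- 3
sigma <- 1
n <- 5
n_rep <- 10000

## Draw n_rep samples of size n and keep each sample mean
x_bars <- replicate(n_rep, mean(rnorm(n, mean = mu, sd = sigma)))

## The sample means vary from sample to sample; they center near mu,
## and their standard deviation is close to sigma / sqrt(n)
mean(x_bars)
sd(x_bars)
sigma / sqrt(n)
```

Each simulated sample plays the role of "the first sample" or "the second sample" above; the spread of `x_bars` is the sampling variability of \(\overline{X}\).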

If you want to estimate \(\mu\), do you prefer to report a range of values the parameter might be in, or a single estimate like \(\overline{x}\)?
If you want to catch a fish, do you prefer a spear or a net?


A plausible range of values for \(\mu\) is called a confidence interval (CI).
To construct a CI we need to quantify the variability of our sample mean.
Quantifying this uncertainty requires a measurement of how much we would expect the sample statistic to vary from sample to sample.
👉 Given the same “level of confidence”, the larger the variation of \(\overline{X}\), the wider the CI for \(\mu\).
Do we know the variance of \(\overline{X}\)?
A confidence interval is for a parameter, NOT a statistic.
We never say “The confidence interval of the sample mean \(\overline{X}\) is …”
We say “The confidence interval for the true population mean \(\mu\) is …”
\(\large \overline{x} \pm m = (\overline{x} - m, \overline{x} + m)\)
\(m\) is called the margin of error.
\(\overline{x} - m\) is the lower bound and \(\overline{x} + m\) is the upper bound of the confidence interval.
The point estimate \(\overline{x}\) and margin of error \(m\) can be obtained from known quantities and our data once sampled.
If we want to be very certain that we capture \(\mu\), should we use a wider or a narrower interval? What drawbacks are associated with using a wider interval?

With the sample size fixed, there is a trade-off between precision and reliability.
The confidence level \(1-\alpha\): the proportion of times that the CI contains the population parameter, assuming that the estimation process is repeated a large number of times.
The common choices for the confidence level are 90%, 95%, and 99%.
95% is the most common level because it offers a good balance between precision (width of the CI) and reliability (confidence level).
High reliability and low precision. I am 100% confident that the mean height of Marquette students is between 3’0” and 8’0”. duh…🤷
Low reliability and high precision. I am 20% confident that the mean height of Marquette students is between 5’6” and 5’7”. far from it…🙅
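The trade-off can be seen numerically. The sketch below, assuming a normal sampling distribution and measuring width in units of the standard error \(\sigma/\sqrt{n}\) (set to 1 here for illustration), computes \(z_{\alpha/2}\) and the CI width for several confidence levels.

```r
## Critical value z_{alpha/2} and CI width (in units of sigma / sqrt(n))
## for several confidence levels, with the data held fixed
conf_level <- c(0.20, 0.90, 0.95, 0.99, 1.00)
alpha <- 1 - conf_level
z_crit <- qnorm(alpha / 2, lower.tail = FALSE)
data.frame(conf_level, z_crit, width = 2 * z_crit)
```

Higher confidence always means a larger critical value and a wider interval; at 100% the critical value is infinite, which is the formal version of the "3’0” to 8’0”" joke.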
For a 95% CI, \(\alpha = 0.05\).
Start with the sampling distribution \(\overline{X} \sim N\left(\mu, \frac{\sigma^2}{n}\right)\).
\(\overline{X}\) falls within 1.96 standard deviations (\(1.96\frac{\sigma}{\sqrt{n}}\)) of the population mean \(\mu\) \(95\%\) of the time.
The \(z\)-score 1.96 has 2.5% of the area to its right; it is called a critical value and is denoted \(z_{0.025}\).
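In R, `qnorm()` with `lower.tail = FALSE` returns the point with a given area to its right, so it gives \(z_{0.025}\) directly; `pnorm()` confirms the 95% central area. The variable name `z_0.025` is just an illustrative choice.

```r
## Critical value z_{0.025}: 2.5% of the N(0, 1) area lies to its right
z_0.025 <- qnorm(0.025, lower.tail = FALSE)
z_0.025
#> [1] 1.959964

## The area between -z_{0.025} and z_{0.025} is 0.95
pnorm(z_0.025) - pnorm(-z_0.025)
#> [1] 0.95
```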

\[P\left(\mu-1.96\frac{\sigma}{\sqrt{n}} < \overline{X} < \mu + 1.96\frac{\sigma}{\sqrt{n}} \right) = 0.95\]
Is the interval \(\left(\mu-1.96\frac{\sigma}{\sqrt{n}}, \mu+1.96\frac{\sigma}{\sqrt{n}} \right)\) our confidence interval?
❌ No! We don’t know \(\mu\), the quantity we would like to estimate!
But we’re almost there!

\[ \small P\left(\mu-1.96\frac{\sigma}{\sqrt{n}} < \overline{X} < \mu + 1.96\frac{\sigma}{\sqrt{n}} \right) = 0.95 \iff P\left( \boxed{\overline{X}-1.96\frac{\sigma}{\sqrt{n}} < \mu < \overline{X} + 1.96\frac{\sigma}{\sqrt{n}}} \right) = 0.95\]
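The equivalence comes from rearranging the inequality inside the probability. Because the two events are the same event written two ways, they have the same probability, 0.95:

\[
\begin{aligned}
\mu-1.96\frac{\sigma}{\sqrt{n}} < \overline{X} < \mu + 1.96\frac{\sigma}{\sqrt{n}}
&\iff -1.96\frac{\sigma}{\sqrt{n}} < \overline{X} - \mu < 1.96\frac{\sigma}{\sqrt{n}} && (\text{subtract } \mu)\\
&\iff -1.96\frac{\sigma}{\sqrt{n}} < \mu - \overline{X} < 1.96\frac{\sigma}{\sqrt{n}} && (\text{multiply by } -1 \text{ and flip the inequalities})\\
&\iff \overline{X}-1.96\frac{\sigma}{\sqrt{n}} < \mu < \overline{X} + 1.96\frac{\sigma}{\sqrt{n}} && (\text{add } \overline{X})
\end{aligned}
\]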
With sample data of size \(n\), \(\left(\overline{x}-1.96\frac{\sigma}{\sqrt{n}}, \overline{x} + 1.96\frac{\sigma}{\sqrt{n}}\right)\) is our \(95\%\) CI for \(\mu\) if \(\sigma\) is known to us!
The margin of error \(m = 1.96\frac{\sigma}{\sqrt{n}}\).

Requirements for estimating \(\mu\) when \(\sigma\) is known:
👉 The sample should be a random sample, i.e., all \(X_i\) are drawn from the same population, and \(X_i\) and \(X_j\) are independent for \(i \neq j\).
👉 The population standard deviation \(\sigma\) is known.
👉 The population is normally distributed (\(X_i \sim N(\mu, \sigma^2)\)), or \(n > 30\) so that \(\overline{X}\) is approximately normal by the central limit theorem, or both.

For a 95% CI, \(z_{0.025} \approx 1.96\), so \(\left(\overline{x}-1.96\frac{\sigma}{\sqrt{n}}, \overline{x} + 1.96\frac{\sigma}{\sqrt{n}}\right) = \left(\overline{x}-z_{0.025}\frac{\sigma}{\sqrt{n}}, \overline{x} + z_{0.025}\frac{\sigma}{\sqrt{n}}\right)\).
For a general confidence level \(1-\alpha\), replace \(z_{0.025}\) with \(z_{\alpha/2}\).
Procedure for constructing a \((1-\alpha)100\%\) confidence interval for \(\mu\) when \(\sigma\) is known: check the requirements above, find the critical value \(z_{\alpha/2}\), compute the margin of error \(m = z_{\alpha/2}\frac{\sigma}{\sqrt{n}}\), and report
\[\left( \overline{x} -z_{\alpha/2}\frac{\sigma}{\sqrt{n}}, \, \overline{x} + z_{\alpha/2} \frac{\sigma}{\sqrt{n}}\right)\]
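The procedure can be packaged as a small R function. `z_ci()` is a hypothetical helper written for illustration (it is not part of base R or these notes); it assumes the requirements above hold and takes the summary statistics directly.

```r
## Hypothetical helper: (1 - alpha)100% CI for mu when sigma is known
z_ci <- function(x_bar, sigma, n, alpha = 0.05) {
  z_crit <- qnorm(alpha / 2, lower.tail = FALSE)  # critical value z_{alpha/2}
  m <- z_crit * sigma / sqrt(n)                    # margin of error
  c(lower = x_bar - m, upper = x_bar + m)
}

## Usage with the SBP numbers below: x_bar = 121.5, sigma = 5, n = 16
z_ci(x_bar = 121.5, sigma = 5, n = 16)
```

Returning a named vector keeps the lower and upper bounds easy to read and to pull out with `[["lower"]]` and `[["upper"]]`.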
We want to know the mean systolic blood pressure (SBP) of a population.
Assume that the population distribution is normal with standard deviation 5 mmHg.
We have a random sample of 16 subjects from this population with sample mean 121.5 mmHg.
Estimate the mean SBP with a 95% confidence interval.

```r
## save all information we have
options(digits = 4)  ## print 4 significant digits, matching the output shown
alpha <- 0.05
n <- 16
x_bar <- 121.5
sig <- 5
## 95% CI
## z-critical value
(cri_z <- qnorm(p = alpha / 2, lower.tail = FALSE))
#> [1] 1.96
## margin of error
(m_z <- cri_z * (sig / sqrt(n)))
#> [1] 2.45
## 95% CI for mu when sigma is known
x_bar + c(-1, 1) * m_z
#> [1] 119.1 123.9
```
Construct a 99% CI for the mean SBP. Do you expect to have a wider or narrower interval? Why?
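One way to check your answer, reusing the SBP numbers with \(\alpha = 0.01\). The names `cri_z_99`, `m_z_99`, and `ci_99` are illustrative.

```r
alpha <- 0.01
n <- 16
x_bar <- 121.5
sig <- 5
## z_{0.005} is about 2.576, larger than z_{0.025} (about 1.96)
cri_z_99 <- qnorm(p = alpha / 2, lower.tail = FALSE)
m_z_99 <- cri_z_99 * (sig / sqrt(n))
ci_99 <- x_bar + c(-1, 1) * m_z_99
ci_99
```

The result is roughly (118.3, 124.7), wider than the 95% interval because a larger critical value multiplies the same standard error.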
“We are 95% confident that the mean SBP lies between 119.1 mmHg and 123.9 mmHg.”
Suppose we were able to collect our dataset many times and build the corresponding CIs.
We would expect about 95% of those intervals to contain the true population parameter, here the mean systolic blood pressure.
We never know if in fact 95% of them do, or whether any interval contains the true parameter!
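In a simulation we can play the role of someone who knows \(\mu\) and count how often the intervals capture it. This sketch assumes a normal population with \(\mu = 121.5\) and \(\sigma = 5\); the true mean is made up for illustration, since in practice it is unknown.

```r
## Coverage simulation under an assumed population N(121.5, 5^2)
set.seed(5720)
mu <- 121.5
sigma <- 5
n <- 16
alpha <- 0.05
n_rep <- 10000
z_crit <- qnorm(alpha / 2, lower.tail = FALSE)
m <- z_crit * sigma / sqrt(n)   # same margin of error for every sample

## TRUE when the interval built from one simulated sample contains mu
covered <- replicate(n_rep, {
  x <- rnorm(n, mean = mu, sd = sigma)
  (mean(x) - m < mu) && (mu < mean(x) + m)
})
mean(covered)   # proportion of intervals that captured mu
```

The proportion lands near 0.95, but any single entry of `covered` is just TRUE or FALSE; with real data we never get to see which.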
The population variance is \(\sigma^2 = \frac{\sum_{i=1}^{N}(x_i - \mu)^2}{N}\), where \(N\) is the population size.
Since computing \(\sigma\) requires \(\mu\), it is rare to know \(\sigma\) but not \(\mu\).
We use the Student t distribution to construct a confidence interval for \(\mu\) when \(\sigma\) is unknown.
We still need a random sample, and a normal population or \(n > 30\) (or both).
What is a natural estimator for the unknown \(\sigma\)?
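The natural estimator of \(\sigma\) is the sample standard deviation \(s = \sqrt{\frac{\sum_{i=1}^{n}(x_i - \overline{x})^2}{n-1}}\), which divides by \(n - 1\) rather than \(n\). R's `var()` and `sd()` use this \(n - 1\) definition; the sketch below reuses the first sample from earlier purely as example data.

```r
x <- c(2.88, 2.94, 3.09, 2.66, 2.38)   # first sample from the table above
n <- length(x)
x_bar <- mean(x)

## Sample variance s^2 divides by n - 1; var() uses the same definition
s2_manual <- sum((x - x_bar)^2) / (n - 1)
s2_manual
var(x)

## Sample standard deviation s, the natural estimator of sigma
s <- sd(x)
s

## Dividing by n instead gives a smaller value
sum((x - x_bar)^2) / n
```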
The Student t distribution is symmetric about its mean 0 and bell-shaped, like \(N(0, 1)\).
It has more variability than \(N(0, 1)\) (heavier tails and a lower peak).
Its variability depends on the degrees of freedom, which here are \(df = n - 1\), so it changes with the sample size.
As \(n \rightarrow \infty\) \((df \rightarrow \infty)\), the Student t distribution approaches \(N(0, 1)\).
With the same \(\alpha\), which is larger: \(t_{\alpha, n-1}\) or \(z_{\alpha}\)?
Given the same confidence level \(1-\alpha\), \(t_{\alpha/2, n-1} > z_{\alpha/2}\).
| Confidence level | \(t\), df = 5 | \(t\), df = 15 | \(t\), df = 30 | \(t\), df = 1000 | \(t\), df = \(\infty\) | \(z\) |
|---|---|---|---|---|---|---|
| 90% | 2.02 | 1.75 | 1.70 | 1.65 | 1.64 | 1.64 |
| 95% | 2.57 | 2.13 | 2.04 | 1.96 | 1.96 | 1.96 |
| 99% | 4.03 | 2.95 | 2.75 | 2.58 | 2.58 | 2.58 |
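The table can be reproduced with `qt()` and `qnorm()`; `qt()` accepts `df = Inf`, which gives the normal quantile. Variable names here are illustrative.

```r
conf_levels <- c(0.90, 0.95, 0.99)
dfs <- c(5, 15, 30, 1000, Inf)
p <- 1 - (1 - conf_levels) / 2   # lower-tail probability for upper-tail area alpha/2

## Each column holds t_{alpha/2, df} for one df; the last column is z_{alpha/2}
crit <- sapply(dfs, function(d) qt(p, df = d))
crit <- cbind(crit, qnorm(p))
dimnames(crit) <- list(paste0(conf_levels * 100, "%"),
                       c(paste("t df =", dfs), "z"))
round(crit, 2)
```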
The \((1-\alpha)100\%\) confidence interval for \(\mu\) when \(\sigma\) is unknown is \[\left(\overline{x} - t_{\alpha/2, n-1} \frac{s}{\sqrt{n}}, \overline{x} + t_{\alpha/2, n-1} \frac{s}{\sqrt{n}}\right)\]
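With raw data, the formula can be checked against R's `t.test()`, whose `conf.int` component holds the same one-sample t interval. The first sample from the earlier table is reused here only as example data.

```r
x <- c(2.88, 2.94, 3.09, 2.66, 2.38)   # first sample from the table above
n <- length(x)
alpha <- 0.05

## Interval built directly from the formula
t_crit <- qt(alpha / 2, df = n - 1, lower.tail = FALSE)   # t_{alpha/2, n-1}
m <- t_crit * sd(x) / sqrt(n)
ci_manual <- mean(x) + c(-1, 1) * m

## The same interval reported by t.test()
ci_ttest <- as.numeric(t.test(x, conf.level = 1 - alpha)$conf.int)
ci_manual
ci_ttest
```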
We are more “uncertain” when doing inference about \(\mu\) because we also lack information about \(\sigma\), and replacing it with \(s\) adds uncertainty.
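A simulation shows why the extra width is needed. This sketch assumes a \(N(0, 1)\) population and a small sample size \(n = 5\) (both illustrative choices); it compares plugging \(s\) into the \(z\) formula against using the \(t\) critical value.

```r
## Assumed population N(0, 1) with small samples of size n = 5
set.seed(2024)
mu <- 0; sigma <- 1; n <- 5; alpha <- 0.05; n_rep <- 10000
z_crit <- qnorm(alpha / 2, lower.tail = FALSE)
t_crit <- qt(alpha / 2, df = n - 1, lower.tail = FALSE)

## For each sample, does each interval (both using s) contain mu?
hits <- replicate(n_rep, {
  x <- rnorm(n, mean = mu, sd = sigma)
  se <- sd(x) / sqrt(n)
  c(z = abs(mean(x) - mu) < z_crit * se,
    t = abs(mean(x) - mu) < t_crit * se)
})
rowMeans(hits)   # estimated coverage of each method
```

The \(t\)-based interval covers \(\mu\) close to 95% of the time, while the \(z\) version with \(s\) plugged in falls noticeably short for small \(n\).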
Back to the systolic blood pressure (SBP) example. We have \(n=16\) and \(\overline{x} = 121.5\).
Estimate the mean SBP with a 95% confidence interval, assuming \(\sigma\) is unknown and the sample standard deviation is \(s = 5\).
```r
alpha <- 0.05
n <- 16
x_bar <- 121.5
s <- 5  ## sigma is unknown and s = 5
## t-critical value
(cri_t <- qt(p = alpha / 2, df = n - 1, lower.tail = FALSE))
#> [1] 2.131
## margin of error
(m_t <- cri_t * (s / sqrt(n)))
#> [1] 2.664
## 95% CI for mu when sigma is unknown
x_bar + c(-1, 1) * m_t
#> [1] 118.8 124.2
```
| | Numerical data, \(\sigma\) known | Numerical data, \(\sigma\) unknown |
|---|---|---|
| Parameter of interest | Population mean \(\mu\) | Population mean \(\mu\) |
| Confidence interval | \(\overline{x} \pm z_{\alpha/2} \frac{\sigma}{\sqrt{n}}\) | \(\overline{x} \pm t_{\alpha/2, n-1} \frac{s}{\sqrt{n}}\) |
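The two columns of the summary differ only in the critical value and in which standard deviation is plugged in. `mean_ci()` is a hypothetical helper sketching that choice; it assumes the requirements for each case already hold.

```r
## Hypothetical helper: CI for mu using z if sigma is known, t otherwise
## sd_x is sigma when sigma_known = TRUE, and the sample SD s otherwise
mean_ci <- function(x_bar, n, sd_x, alpha = 0.05, sigma_known = FALSE) {
  crit <- if (sigma_known) {
    qnorm(alpha / 2, lower.tail = FALSE)            # z_{alpha/2}
  } else {
    qt(alpha / 2, df = n - 1, lower.tail = FALSE)   # t_{alpha/2, n-1}
  }
  m <- crit * sd_x / sqrt(n)                        # margin of error
  c(lower = x_bar - m, upper = x_bar + m)
}

## SBP example: same numbers, different assumptions about sigma
mean_ci(121.5, 16, 5, sigma_known = TRUE)    # z interval, about (119.1, 123.9)
mean_ci(121.5, 16, 5, sigma_known = FALSE)   # t interval, about (118.8, 124.2)
```

With everything else equal, the \(t\) interval is wider, matching \(t_{\alpha/2, n-1} > z_{\alpha/2}\).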