MATH 4720/MSSC 5720 Introduction to Statistics
A discrete random variable takes on a finite or countable number of values.
A continuous random variable has infinitely many values, and the collection of values is uncountable.
The probability (mass) function (pf, pmf) of a discrete random variable (r.v.) \(X\) is a function \(P(X = x)\) (or \(p(x)\)) that assigns a probability to every possible value \(x\).
The probability distribution for a discrete r.v. \(X\) displays its probability function.
The display can be a table, graph, or mathematical formula of \(P(X = x)\).
Example: 🪙🪙 Toss a fair coin twice independently and let \(X\) be the number of heads.
| x | 0 | 1 | 2 |
|---|---|---|---|
| P(X = x) | 0.25 | 0.5 | 0.25 |
👉 \(\{X = x\}\) is an event: it corresponds to an event of the underlying experiment.
What is the event that \(\{X = 0\}\) corresponds to?
How do we get \(P(X = 0)\), \(P(X=1)\) and \(P(X=2)\) ?
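One way to answer the two questions above is to list the outcomes of the experiment and count heads. The sketch below assumes the four outcomes HH, HT, TH, TT are equally likely (fair coin, independent tosses); the variable names are illustrative only.

```r
# Enumerate the four equally likely outcomes of two independent fair tosses.
outcomes <- expand.grid(toss1 = c("H", "T"), toss2 = c("H", "T"),
                        stringsAsFactors = FALSE)

# X = number of heads in each outcome.
outcomes$x <- (outcomes$toss1 == "H") + (outcomes$toss2 == "H")

# Each outcome has probability 1/4; add up the probabilities for each value of X.
pmf <- tapply(rep(1 / 4, nrow(outcomes)), outcomes$x, sum)
pmf
```

Here \(\{X = 0\}\) corresponds to the single outcome TT, \(\{X = 1\}\) to HT or TH, and \(\{X = 2\}\) to HH.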
Suppose \(X\) takes values \(x_1, \dots, x_k\) with probabilities \(P(X = x_1), \dots, P(X = x_k)\).
The mean or expected value of \(X\) is the sum of each outcome multiplied by its corresponding probability: \[E(X) := x_1 \times P(X = x_1) + \dots + x_k \times P(X = x_k) = \sum_{i=1}^kx_iP(X=x_i)\]
The Greek letter \(\mu\) may be used in place of the notation \(E(X)\).
👉 The mean of a discrete random variable \(X\) is the weighted average of its possible values \(x\), weighted by their corresponding probabilities.
What is the mean of \(X\) (the number of heads) in the previous example?
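As a quick check, the weighted-average formula can be applied directly to the table for the two-coin example. This is a minimal sketch; the vector names `x` and `p` are chosen here for illustration.

```r
# Values of X and their probabilities, taken from the two-coin table.
x <- c(0, 1, 2)
p <- c(0.25, 0.5, 0.25)

# E(X) = sum of x_i * P(X = x_i)
mu <- sum(x * p)
mu  # 1
```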
Suppose \(X\) takes values \(x_1, \dots , x_k\) with probabilities \(P(X = x_1), \dots, P(X = x_k)\) and expected value \(\mu = E(X)\).
The variance of \(X\), denoted by \(\mathrm{Var}(X)\) or \(\sigma^2\), is \[\small \mathrm{Var}(X) := (x_1 - \mu)^2 \times P(X = x_1) + \dots + (x_k - \mu)^2 \times P(X = x_k) = \sum_{i=1}^k(x_i - \mu)^2P(X=x_i)\]
The standard deviation of \(X\), \(\sigma\), is the square root of the variance.
👉 The variance of a discrete random variable \(X\) is the weighted sum of squared deviations from the mean, weighted by the corresponding probabilities.
What is the variance of \(X\) (the number of heads) in the previous example?
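The variance formula can be evaluated the same way for the two-coin example. A minimal sketch, with illustrative variable names:

```r
# Values of X and their probabilities, taken from the two-coin table.
x <- c(0, 1, 2)
p <- c(0.25, 0.5, 0.25)

mu <- sum(x * p)                 # mean, equal to 1
sigma2 <- sum((x - mu)^2 * p)    # variance: weighted squared deviations
sigma <- sqrt(sigma2)            # standard deviation

sigma2  # 0.5
```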
The probability function \(P(X = x)\) of a binomial r.v. \(X\) is fully determined by the number of trials \(n\) and the probability of success \(\pi\).
Different \((n, \pi)\) pairs generate different binomial probability distributions.
\(X\) is said to follow a binomial distribution with parameters \(n\) and \(\pi\), written as \(\color{blue}{X \sim binomial(n, \pi)}\).
The binomial probability function is \[ \color{blue}{P(X = x \mid n, \pi) = \frac{n!}{x!(n-x)!}\pi^x(1-\pi)^{n-x}, \quad x = 0, 1, 2, \dots, n}\] with mean \(\mu = E(X) = n\pi\) and variance \(\sigma^2 = \mathrm{Var}(X) = n\pi(1-\pi)\).
Toss a fair coin two times independently. Let \(X =\) # of heads. Is \(X\) a binomial r.v.?
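If the two tosses are treated as \(n = 2\) independent trials with success probability \(\pi = 0.5\), the binomial formula should reproduce the coin-toss table. The sketch below codes the formula by hand and compares it with R's built-in `dbinom()`; the helper name `binom_pmf` is hypothetical, introduced only for this illustration.

```r
# Hand-coded binomial probability function (hypothetical helper):
# P(X = x | n, prob) = n! / (x! (n - x)!) * prob^x * (1 - prob)^(n - x)
binom_pmf <- function(x, n, prob) {
  choose(n, x) * prob^x * (1 - prob)^(n - x)
}

x <- 0:2
by_formula <- binom_pmf(x, n = 2, prob = 0.5)
by_dbinom  <- dbinom(x, size = 2, prob = 0.5)

by_formula               # 0.25 0.50 0.25, same as the coin-toss table
binom_mean <- 2 * 0.5              # n * pi
binom_var  <- 2 * 0.5 * (1 - 0.5)  # n * pi * (1 - pi)
```

The mean and variance from the binomial formulas (1 and 0.5) agree with the values obtained from the definitions of \(E(X)\) and \(\mathrm{Var}(X)\).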
Assume that 20% of all drivers have a blood alcohol level above the legal limit. For a random sample of 15 vehicles, compute the probability that:
Exactly 6 of the 15 drivers will exceed the legal limit.
Of the 15 drivers, 6 or more will exceed the legal limit.

Suppose it is a binomial experiment with \(n = 15\) and \(\pi = 0.2\).
Let \(X\) be the number of drivers exceeding the legal limit.
\(X \sim binomial(15, 0.2)\).
\[ \color{blue}{P(X = x \mid n=15, \pi=0.2) = \frac{15!}{x!(15-x)!}(0.2)^x(1-0.2)^{15-x}, \quad x = 0, 1, 2, \dots, 15}\]

Never do this by hand. We compute them using R!
With size as the number of trials and prob as the probability of success:
dbinom(x, size, prob) to compute \(P(X = x)\)
pbinom(q, size, prob) to compute \(P(X \le q)\)
pbinom(q, size, prob, lower.tail = FALSE) to compute \(P(X > q)\)
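Applied to the drivers example with \(X \sim binomial(15, 0.2)\), the two probabilities can be sketched as follows. Note that "6 or more" is \(P(X \ge 6) = P(X > 5)\), so the upper-tail call uses `q = 5`, not 6. The variable names are illustrative.

```r
# P(X = 6): exactly 6 of the 15 drivers exceed the legal limit.
p_exactly_6 <- dbinom(x = 6, size = 15, prob = 0.2)

# P(X >= 6) = P(X > 5): 6 or more drivers exceed the legal limit.
p_6_or_more <- pbinom(q = 5, size = 15, prob = 0.2, lower.tail = FALSE)

# Equivalent check: add up P(X = 6), ..., P(X = 15).
p_6_or_more_sum <- sum(dbinom(x = 6:15, size = 15, prob = 0.2))

round(p_exactly_6, 3)  # about 0.043
round(p_6_or_more, 3)  # about 0.061
```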
👉 Events occur one at a time; two or more events do not occur at the same time or in the same space or spot.
👉 The occurrence of an event in a given period of time or region of space is independent of the occurrence of the event in a nonoverlapping time period or region of space.
👉 \(\lambda\) is constant for any period or region.
Can you find the difference between binomial and Poisson distributions?
Last year there were 4200 births at the University of Wisconsin Hospital. Let \(X\) be the number of births in a given day at the center, and assume \(X \sim Poisson(\lambda)\). Find

With lambda as the mean of the Poisson distribution:
dpois(x, lambda) to compute \(P(X = x)\)
ppois(q, lambda) to compute \(P(X \le q)\)
ppois(q, lambda, lower.tail = FALSE) to compute \(P(X > q)\)
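A minimal sketch of how these functions might be used for the births example. It assumes a 365-day year and a constant daily rate, so \(\lambda = 4200/365\). The two probabilities computed below are illustrative queries chosen here, not necessarily the ones asked in the original exercise.

```r
# Assumed daily rate: 4200 births spread evenly over 365 days.
lambda <- 4200 / 365

# Illustrative query 1: P(X = 10), exactly 10 births in a day.
p_exactly_10 <- dpois(x = 10, lambda = lambda)

# Illustrative query 2: P(X > 15), more than 15 births in a day.
p_more_than_15 <- ppois(q = 15, lambda = lambda, lower.tail = FALSE)

# The upper tail is the complement of the cdf: P(X > 15) = 1 - P(X <= 15).
p_complement <- 1 - ppois(q = 15, lambda = lambda)
```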
A continuous r.v. can take on any value in an interval of the real line.
Instead of a probability function, a continuous r.v. \(X\) has a probability density function (pdf) \(f(x)\) such that for any real values \(a < b\), \[P(a < X < b) = \int_{a}^b f(x) dx\]
The cumulative distribution function (cdf) of \(X\) is defined as \[F(x) := P(X \le x) = \int_{-\infty}^x f(t)dt\]
😎 Luckily we don’t deal with integrals in this course.
A pdf generates a graph called the density curve, which shows the relative likelihood of the random variable across all of its possible values.
\(P(a < X < b) = \int_{a}^b f(x) dx\): The area under the density curve between \(a\) and \(b\).
\(\int_{-\infty}^{\infty} f(x) dx = 1\): The total area under any density curve is equal to 1.
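The two area statements above can be checked numerically without doing any integral by hand. The sketch below uses the standard normal density `dnorm()` as an example pdf (an assumption made for illustration; any valid density would do) and R's numerical integrator `integrate()`.

```r
# Total area under the standard normal density curve: should be 1.
total_area <- integrate(dnorm, lower = -Inf, upper = Inf)$value

# P(a < X < b) as the area under the curve between a = -1 and b = 1.
area_ab <- integrate(dnorm, lower = -1, upper = 1)$value

# The same probability from the cdf: F(b) - F(a).
cdf_diff <- pnorm(1) - pnorm(-1)

total_area
c(area_ab, cdf_diff)
```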
In this course, we will touch on the normal (Gaussian), Student’s t, chi-squared, and F distributions.
Some other common distributions include uniform, exponential, gamma, beta, inverse gamma, Cauchy, etc. (MATH 4700)
Standardization: Convert \(N(\mu, \sigma^2)\) to \(N(0, 1)\).
Why standardization: Put data onto a standardized scale, making comparisons easier!
| Measure | SAT | ACT |
|---|---|---|
| Mean | 1100 | 21 |
| SD | 200 | 6 |
The distributions of SAT and ACT scores are both nearly normal.
Suppose Anna scored 1300 on her SAT and Tommy scored 24 on his ACT. Who performed better?

If \(x\) is an observation from a distribution with mean \(\mu\) and standard deviation \(\sigma\), the standardized value of \(x\) is the so-called \(z\)-score: \[z = \frac{x - \mu}{\sigma}\]
A \(z\)-score tells us how many standard deviations \(x\) falls away from the mean, and in which direction.
If \(X \sim N(\mu, \sigma^2)\), \(Z = \frac{X - \mu}{\sigma}\) follows the standard normal distribution, i.e., \(Z \sim N(0, 1)\).
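Applying the \(z\)-score formula to the SAT/ACT comparison gives a sketch like the following, using the means and standard deviations from the table. The variable names are illustrative.

```r
# Anna: SAT score 1300, with SAT mean 1100 and SD 200.
z_anna <- (1300 - 1100) / 200

# Tommy: ACT score 24, with ACT mean 21 and SD 6.
z_tommy <- (24 - 21) / 6

c(anna = z_anna, tommy = z_tommy)  # 1.0 and 0.5
```

Anna is 1 standard deviation above the SAT mean, while Tommy is 0.5 standard deviations above the ACT mean, so on the standardized scale Anna performed better.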


A value of \(x\) that is 2 standard deviations below \(\mu\) corresponds to \(z = -2\).
\(z = \frac{x -\mu}{\sigma} \iff x = \mu + z\sigma\). If \(z = -2\), \(x = \mu - 2\sigma\).
What fraction of students have an SAT score below Anna’s score of 1300?
This is the same as the percentile Anna is at, which is the percentage of cases that have lower scores than Anna.
Need \(P(X < 1300 \mid \mu = 1100, \sigma = 200)\) or \(P(Z < 1 \mid \mu = 0, \sigma = 1)\).
With mean and sd as the mean and standard deviation of the normal distribution:
pnorm(q, mean, sd) to compute \(P(X \le q)\)
pnorm(q, mean, sd, lower.tail = FALSE) to compute \(P(X > q)\)
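For Anna's percentile, a minimal sketch computes the lower-tail area on the original SAT scale and, equivalently, on the standardized scale with \(z = 1\). The variable names are illustrative.

```r
# P(X < 1300) for X ~ N(1100, 200^2), on the original scale.
p_original <- pnorm(q = 1300, mean = 1100, sd = 200)

# Same probability on the standardized scale: P(Z < 1) for Z ~ N(0, 1).
p_standard <- pnorm(q = 1, mean = 0, sd = 1)

round(p_original, 3)  # about 0.841
```

Both calls return the same area, so roughly 84% of students score below Anna.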

We want the upper tail area, so lower.tail = FALSE!
pnorm(q = 1190, mean = 1100, sd = 200, lower.tail = FALSE)
[1] 0.326
qnorm(p, mean, sd) to get a value of \(X\), \(q\), such that \(P(X \le q) = p\)
qnorm(p, mean, sd, lower.tail = FALSE) to get \(q\) such that \(P(X \ge q) = p\)
What is the 95th percentile for SAT scores?
Here we look for a value \(q\) of the normal random variable, not an area (probability); the area to the left of \(q\) is given as 0.95.
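A minimal sketch of the percentile calculation, assuming SAT scores follow \(N(1100, 200^2)\) as in the table. The result is cross-checked by converting the standard normal quantile back to the SAT scale with \(x = \mu + z\sigma\).

```r
# 95th percentile: the value q with P(X <= q) = 0.95.
q95 <- qnorm(p = 0.95, mean = 1100, sd = 200)

# Same value via the standard normal quantile: x = mu + z * sigma.
q95_from_z <- 1100 + qnorm(0.95) * 200

# Plugging q95 back into the cdf should return 0.95.
check <- pnorm(q95, mean = 1100, sd = 200)

round(q95)  # about 1429
```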