Probability Distributions

MATH 4720/MSSC 5720 Introduction to Statistics

Dr. Cheng-Han Yu
Department of Mathematical and Statistical Sciences
Marquette University

Random Variables

Random Variables

  • Recap: A variable in a data set is a characteristic that varies from one observation to another.
    • A variable can be either categorical or numerical.
    • Numerical variables can be either discrete or continuous.
  • A random variable, usually written as \(X\), is a variable whose possible values are numerical outcomes determined by the chance outcome of a procedure or experiment.
    • Toss a coin 2 times. \(X\) = # of heads.
    • \(X\) = # of accidents on W. Wisconsin Ave. per day.
  • A random variable has a probability distribution associated with it, accounting for its randomness.

Discrete and Continuous Random Variables

  • A discrete random variable takes on a finite or countable number of values.

  • A continuous random variable can take infinitely many values, and the collection of possible values is uncountable.

  • The number of relationships you’ve ever had is a discrete variable because it can be counted and it is finite.
    • If we can further determine the probability that the number is 0, 1, 2, or any possible number, it is a discrete random variable.
  • Height is continuous because it can be any number within a range.
    • If we have a way to quantify the probability that the height is from any value \(a\) to any value \(b\), it is a continuous random variable.

Probability Distributions

  • Discrete Distributions

  • Continuous Distributions

A Statistician Should Know

What We Learn Here

  • Binomial Distribution

  • Poisson Distribution

  • Normal Distribution

Discrete Probability Distributions

Discrete Probability Distribution

  • The probability (mass) function (pf, pmf) of a discrete random variable (r.v.) \(X\) is a function \(P(X = x)\) (or \(p(x)\)) that assigns a probability to every possible value \(x\).

  • The probability distribution for a discrete r.v. \(X\) displays its probability function.

  • The display can be a table, graph, or mathematical formula of \(P(X = x)\).

Example: 🪙🪙 Toss a fair coin twice independently, and let \(X\) be the number of heads.

  • The probability distribution of \(X\) as a table is
x           0      1      2
P(X = x)    0.25   0.5    0.25

👉 Each \(\{X = x\}\) is an event defined by the outcomes of the underlying experiment.

What is the event that \(\{X = 0\}\) corresponds to?

How do we get \(P(X = 0)\), \(P(X = 1)\), and \(P(X = 2)\)?
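One way to answer both questions is to list the four equally likely outcomes (HH, HT, TH, TT) and count the heads in each; for example, \(\{X = 0\}\) is the event \(\{TT\}\). The minimal R sketch below does this; the object names `outcomes`, `x`, and `pmf` are illustrative choices, not part of the course code.

```r
# List the 4 equally likely outcomes of two independent fair tosses
outcomes <- expand.grid(toss1 = c("H", "T"), toss2 = c("H", "T"),
                        stringsAsFactors = FALSE)
# X = number of heads in each outcome
x <- rowSums(outcomes == "H")
# Each outcome has probability 1/4, so P(X = x) = (# outcomes with x heads) / 4
pmf <- table(x) / nrow(outcomes)
pmf
```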

Discrete Probability Distribution as a Graph

  • \(0 \le P(X = x) \le 1\) for every value \(x\) of \(X\).
    • \(x = 0, 1, 2\)
  • \(\sum_{x}P(X=x) = 1\), where \(x\) assumes all possible values.
    • \(P(X=0) + P(X = 1) + P(X = 2) = 1\)
  • The probabilities for a discrete r.v. are additive because \(\{X = a\}\) and \(\{X = b\}\) are disjoint for any possible values \(a \ne b\).
    • \(P(X = 1 \text{ or } 2) = P(\{X = 1\} \cup \{X = 2\}) = P(X = 1) + P(X = 2)\).
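As a quick numerical check of these properties for the coin example, the sketch below stores the table as two R vectors (`x` and `p` are names chosen here for illustration):

```r
# Probability distribution of X = number of heads in two fair tosses
x <- c(0, 1, 2)
p <- c(0.25, 0.5, 0.25)
# Every probability lies between 0 and 1
all(p >= 0 & p <= 1)
# The probabilities sum to 1
sum(p)
# Additivity: P(X = 1 or 2) = P(X = 1) + P(X = 2)
p_1_or_2 <- sum(p[x %in% c(1, 2)])
p_1_or_2
```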

Mean of a Discrete Random Variable

  • Suppose \(X\) takes values \(x_1, \dots, x_k\) with probabilities \(P(X = x_1), \dots, P(X = x_k)\).

  • The mean or expected value of \(X\) is the sum of each outcome multiplied by its corresponding probability: \[E(X) := x_1 \times P(X = x_1) + \dots + x_k \times P(X = x_k) = \sum_{i=1}^kx_iP(X=x_i)\]

  • The Greek letter \(\mu\) may be used in place of the notation \(E(X)\).

👉 The mean of a discrete random variable \(X\) is the average of its possible values \(x\), weighted by their corresponding probabilities.

What is the mean of \(X\) (the number of heads) in the previous example?
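Applying the definition to the coin example gives \(E(X) = 0(0.25) + 1(0.5) + 2(0.25) = 1\). The same weighted sum in R, as a minimal sketch with illustrative vector names `x` and `p`:

```r
x <- c(0, 1, 2)          # possible numbers of heads
p <- c(0.25, 0.5, 0.25)  # P(X = x) from the table
# E(X) = sum of x_i * P(X = x_i)
mu <- sum(x * p)
mu
```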

Variance of a Discrete Random Variable

  • Suppose \(X\) takes values \(x_1, \dots , x_k\) with probabilities \(P(X = x_1), \dots, P(X = x_k)\) and expected value \(\mu = E(X)\).

  • The variance of \(X\), denoted by \(\mathrm{Var}(X)\) or \(\sigma^2\), is \[\small \mathrm{Var}(X) := (x_1 - \mu)^2 \times P(X = x_1) + \dots + (x_k - \mu)^2 \times P(X = x_k) = \sum_{i=1}^k(x_i - \mu)^2P(X=x_i)\]

  • The standard deviation of \(X\), \(\sigma\), is the square root of the variance.

👉 The variance of a discrete random variable \(X\) is the sum of squared deviations from the mean, weighted by their corresponding probabilities.

What is the variance of \(X\) (the number of heads) in the previous example?
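With \(\mu = 1\), the definition gives \(\mathrm{Var}(X) = (0-1)^2(0.25) + (1-1)^2(0.5) + (2-1)^2(0.25) = 0.5\) and \(\sigma = \sqrt{0.5} \approx 0.707\). A minimal R sketch of the same computation (vector names are illustrative):

```r
x <- c(0, 1, 2)
p <- c(0.25, 0.5, 0.25)
mu <- sum(x * p)               # E(X) = 1
# Var(X) = sum of (x_i - mu)^2 * P(X = x_i)
sigma2 <- sum((x - mu)^2 * p)
# Standard deviation is the square root of the variance
sigma <- sqrt(sigma2)
c(variance = sigma2, sd = sigma)
```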

Binomial Distribution

Binomial Experiment and Random Variable

  • A binomial experiment is one that has the following properties:
    1. 👉 The experiment consists of a fixed number \(n\) of identical trials.
    2. 👉 Each trial results in one of exactly two outcomes (success (S) and failure (F)).
    3. 👉 Trials are independent, meaning that the outcome of any trial does not affect the outcome of any other trial.
    4. 👉 The probability of success is constant for all trials.
  • If \(X\) is defined as the number of successes observed in the \(n\) trials, \(X\) is a binomial random variable.
  • The word success just means one of the two outcomes, and does not necessarily mean something good.
  • 😲 We can define drug abuse as a success and no drug abuse as a failure.

Binomial Distribution

  • The probability function \(P(X = x)\) of a binomial r.v. \(X\) can be fully determined by

    • the number of trials \(n\)
    • the probability of success \(\pi\)
  • Different \((n, \pi)\) pairs generate different binomial probability distributions.

  • \(X\) is said to follow a binomial distribution with parameters \(n\) and \(\pi\), written as \(\color{blue}{X \sim binomial(n, \pi)}\).

  • The binomial probability function is \[ \color{blue}{P(X = x \mid n, \pi) = \frac{n!}{x!(n-x)!}\pi^x(1-\pi)^{n-x}, \quad x = 0, 1, 2, \dots, n}\] with mean \(\mu = E(X) = n\pi\) and variance \(\sigma^2 = \mathrm{Var}(X) = n\pi(1-\pi)\).

Tossing a fair coin two times independently. Let \(X =\) # of heads. Is \(X\) a binomial r.v.?
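Yes: the two tosses are \(n = 2\) independent trials, each resulting in a head (success) or a tail (failure) with constant \(\pi = 0.5\), so \(X \sim binomial(2, 0.5)\). The sketch below codes the binomial probability function directly (`binom_pmf` is a hypothetical helper, not a base R function) and checks it against R's `dbinom()` and the earlier table.

```r
# Hypothetical helper: the binomial probability function written out.
# choose(n, x) computes n! / (x! (n - x)!)
binom_pmf <- function(x, n, prob) {
  choose(n, x) * prob^x * (1 - prob)^(n - x)
}
# Two independent fair tosses: n = 2, prob = 0.5
p_hand <- binom_pmf(0:2, n = 2, prob = 0.5)
p_hand
# R's built-in dbinom() should give the same values
all.equal(p_hand, dbinom(0:2, size = 2, prob = 0.5))
```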

Binomial Distribution Example

Assume that 20% of all drivers have a blood alcohol level above the legal limit. For a random sample of 15 vehicles, compute the probability that:

  1. Exactly 6 of the 15 drivers will exceed the legal limit.

  2. Of the 15 drivers, 6 or more will exceed the legal limit.

  • Assume this is a binomial experiment with \(n = 15\) and \(\pi = 0.2\).

  • Let \(X\) be the number of drivers exceeding the legal limit.

  • \(X \sim binomial(15, 0.2)\).

\[ \color{blue}{P(X = x \mid n=15, \pi=0.2) = \frac{15!}{x!(15-x)!}(0.2)^x(1-0.2)^{15-x}, \quad x = 0, 1, 2, \dots, 15}\]
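To see the whole distribution and check the mean and variance formulas numerically, one can evaluate \(P(X = x)\) at every \(x = 0, \dots, 15\) with R's `dbinom()` and then apply the definitions of \(E(X)\) and \(\mathrm{Var}(X)\). As a sketch, the results should match \(n\pi = 3\) and \(n\pi(1-\pi) = 2.4\).

```r
n <- 15
prob <- 0.2
x <- 0:n
p <- dbinom(x, size = n, prob = prob)  # P(X = x) for x = 0, 1, ..., 15
sum(p)                                 # total probability should be 1
mu <- sum(x * p)                       # E(X) from its definition
sigma2 <- sum((x - mu)^2 * p)          # Var(X) from its definition
c(mean = mu, n_pi = n * prob,
  variance = sigma2, n_pi_1_minus_pi = n * prob * (1 - prob))
```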

Binomial Distribution Example \(X \sim binomial(15, 0.2)\)

Binomial Distribution Example

  1. \(\small P(X = 6) = \frac{n!}{x!(n-x)!}\pi^x(1-\pi)^{n-x} = \frac{15!}{6!(15-6)!}(0.2)^6(1-0.2)^{15-6} = 0.043\)
  2. \(\small P(X \ge 6) = p(6) + \dots + p(15) = 1 - P(X \le 5) = 1 - (p(0) + p(1) + \dots + p(5)) = 0.0611\)

Never do this by hand. We compute them using R!

Binomial Example Computation in R

  • With size the number of trials and prob the probability of success,
    • dbinom(x, size, prob) to compute \(P(X = x)\)
    • pbinom(q, size, prob) to compute \(P(X \le q)\)
    • pbinom(q, size, prob, lower.tail = FALSE) to compute \(P(X > q)\)
## 1. P(X = 6)
dbinom(x = 6, size = 15, prob = 0.2) 
[1] 0.043
## 2. P(X >= 6) = 1 - P(X <= 5)
1 - pbinom(q = 5, size = 15, prob = 0.2) 
[1] 0.0611
## 2. P(X >= 6) = P(X > 5)
pbinom(q = 5, size = 15, prob = 0.2, 
       lower.tail = FALSE)  
[1] 0.0611

Binomial(15, 0.2)

plot(x = 0:15, y = dbinom(0:15, size = 15, prob = 0.2), type = 'h', xlab = "x", 
     ylab = "P(X = x)", lwd = 5, main = "Binomial(15, 0.2)")

Poisson Distribution

Poisson Random Variables

  • If we’d like to count the number of occurrences of some event over a unit of time or space (region) and calculate the associated probabilities, we can consider the Poisson distribution.
    • Number of COVID patients arriving at ICU in one hour
    • Number of Marquette students logging onto D2L in one day
    • Number of dandelions per square meter on the Marquette campus
  • If \(X\) is a Poisson r.v., we write \(\color{blue}{X \sim Poisson(\lambda)}\), where the parameter \(\lambda\) is the mean number of occurrences of the event in the interval. Its probability function is \[\color{blue}{P(X = x \mid \lambda) = \frac{\lambda^x e^{-\lambda}}{x!}, \quad x = 0, 1, 2, \dots}\] and both its mean and variance equal \(\lambda\).
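A minimal sketch, using an illustrative \(\lambda = 3\) that is not from the slides: the Poisson formula coded by hand (`pois_pmf` is a hypothetical helper) agrees with R's `dpois()`, and truncating the infinite sums at \(x = 100\) gives a mean and a variance that are both approximately \(\lambda\).

```r
lambda <- 3  # illustrative value only
# Hypothetical helper: Poisson probability function written out
pois_pmf <- function(x, lambda) lambda^x * exp(-lambda) / factorial(x)
# Agrees with R's built-in dpois()
all.equal(pois_pmf(0:10, lambda), dpois(0:10, lambda = lambda))
# X has no upper limit; truncate the sums at x = 100, beyond which the
# remaining probability is negligible for lambda = 3
x <- 0:100
p <- dpois(x, lambda = lambda)
mu <- sum(x * p)
sigma2 <- sum((x - mu)^2 * p)
c(mean = mu, variance = sigma2)  # both approximately lambda
```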

Assumptions and Properties of Poisson Variables

  • 👉 Events occur one at a time; two or more events cannot occur at exactly the same time or location.

  • 👉 The occurrence of an event in a given period of time or region of space is independent of the occurrence of the event in a nonoverlapping time period or region of space.

  • 👉 \(\lambda\) is constant across all periods or regions of the same size.

Can you find the difference between binomial and Poisson distributions?

  • The Poisson distribution
    • is determined by one single parameter \(\lambda\)
    • has possible values \(x = 0, 1, 2, \dots\) with no upper limit (countable), while a binomial variable has possible values \(0, 1, 2, \dots, n\) (finite)

Poisson Distribution Example

Last year there were 4200 births at the University of Wisconsin Hospital. Let \(X\) be the number of births on a given day at the hospital, and assume \(X \sim Poisson(\lambda)\). Find

  1. \(\lambda\), the mean number of births per day.
  2. the probability that on a randomly selected day, there are exactly 10 births.
  3. \(P(X > 10)\).

  1. \(\small \lambda = \frac{\text{Number of births in a year}}{\text{Number of days}} = \frac{4200}{365} = 11.5\)
  2. \(\small P(X = 10 \mid \lambda = 11.5) = \frac{\lambda^x e^{-\lambda}}{x!} = \frac{11.5^{10} e^{-11.5}}{10!} = 0.113\)
  3. \(\small P(X > 10) = p(11) + p(12) + \cdots + p(20) + \cdots\) has no end! Use the complement instead: \(\small P(X > 10) = 1 - P(X \le 10) = 1 - (p(0) + p(1) + p(2) + \dots + p(10))\).
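The complement route needs only 11 terms. As a sanity check, the sketch below also approximates the endless direct sum by stopping at \(x = 200\) (an arbitrary cutoff chosen here; the remaining terms are negligible for \(\lambda \approx 11.5\)), and both routes should agree.

```r
lam <- 4200 / 365
# Complement rule: a finite sum of 11 terms, p(0) + ... + p(10)
p_complement <- 1 - sum(dpois(0:10, lambda = lam))
# Direct sum p(11) + p(12) + ... is endless; stop at x = 200 (arbitrary cutoff)
p_direct <- sum(dpois(11:200, lambda = lam))
c(complement = p_complement, truncated_direct = p_direct)
```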

Poisson Example Computation in R

  • With lambda the mean of the Poisson distribution,
    • dpois(x, lambda) to compute \(P(X = x)\)
    • ppois(q, lambda) to compute \(P(X \le q)\)
    • ppois(q, lambda, lower.tail = FALSE) to compute \(P(X > q)\)
(lam <- 4200 / 365)
[1] 11.5
## P(X = 10)
dpois(x = 10, lambda = lam)  
[1] 0.113
## P(X > 10) = 1 - P(X <= 10)
1 - ppois(q = 10, lambda = lam)  
[1] 0.599
## P(X > 10)
ppois(q = 10, lambda = lam, 
      lower.tail = FALSE) 
[1] 0.599

Poisson(11.5)

  • \(X\) has no upper limit. The graph is truncated at \(x = 24\).
par(mar = c(3.8, 3.8, 1, 0), mgp = c(2.5, 0.5, 0))
plot(0:24, dpois(0:24, lambda = lam), type = 'h', lwd = 5, las = 1,
     ylab = "P(X = x)", xlab = "x", main = "Poisson(11.5)")

Continuous Probability Distributions

Continuous Probability Distributions

  • A continuous r.v. can take on any value in an interval of the real line.

  • Instead of a probability function, a continuous r.v. \(X\) has a probability density function (pdf) \(f(x)\) such that for any real values \(a < b\), \[P(a < X < b) = \int_{a}^b f(x) dx\]

  • The cumulative distribution function (cdf) of \(X\) is defined as \[F(x) := P(X \le x) = \int_{-\infty}^x f(t)dt\]

  • Every pdf must satisfy (1) \(f(x) \ge 0\) for all \(x\); (2) \(\int_{-\infty}^{\infty} f(x) dx = 1\)

😎 Luckily we don’t deal with integrals in this course.
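Software evaluates such integrals numerically. As a sketch, take the hypothetical pdf \(f(x) = 2x\) for \(0 \le x \le 1\) (and 0 otherwise), whose cdf on \([0, 1]\) is \(F(x) = x^2\). Base R's `integrate()` confirms that the total area is 1 and that \(P(0.2 < X < 0.5) = F(0.5) - F(0.2) = 0.21\).

```r
# Hypothetical pdf: f(x) = 2x on [0, 1], and 0 elsewhere
f <- function(x) ifelse(x >= 0 & x <= 1, 2 * x, 0)
# Total area under the density curve (f is 0 outside [0, 1])
total <- integrate(f, lower = 0, upper = 1)$value
# P(0.2 < X < 0.5) as the area under f between 0.2 and 0.5
area <- integrate(f, lower = 0.2, upper = 0.5)$value
# Same probability from the cdf F(x) = x^2 on [0, 1]
cdf_diff <- 0.5^2 - 0.2^2
c(total = total, area = area, cdf_diff = cdf_diff)
```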

Density Curve

  • A pdf generates a graph called the density curve, which shows where the values of a random variable are more or less likely to fall.

  • \(P(a < X < b) = \int_{a}^b f(x) dx\): The area under the density curve between \(a\) and \(b\).

  • \(\int_{-\infty}^{\infty} f(x) dx = 1\): The total area under any density curve is equal to 1.

Commonly Used Continuous Distributions

  • Distribution Applet

  • In this course, we will cover the normal (Gaussian), Student’s t, chi-squared, and F distributions.

  • Some other common distributions include uniform, exponential, gamma, beta, inverse gamma, Cauchy, etc. (MATH 4700)

Normal (Gaussian) Distribution

  • The normal distribution, \(N(\mu, \sigma^2)\), has the pdf given by \[\small f(x) = \frac{1}{\sqrt{2\pi}\sigma}e^{\frac{-(x-\mu)^2}{2\sigma^2}}, \quad -\infty < x < \infty\]
    • Two parameters: mean \(\mu\) and variance \(\sigma^2\) (standard deviation \(\sigma\))
    • Always bell shaped, and symmetric about the mean \(\mu\)
    • When \(\mu = 0\) and \(\sigma = 1\), \(N(0, 1)\) is called standard normal.
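To connect the formula with R, the sketch below codes the pdf directly (`normal_pdf` is a hypothetical helper) and compares it with `dnorm()`. Note that `dnorm()` takes the standard deviation `sd`, not the variance \(\sigma^2\).

```r
# Hypothetical helper: the N(mu, sigma^2) density written out from the formula
normal_pdf <- function(x, mu, sigma) {
  1 / (sqrt(2 * pi) * sigma) * exp(-(x - mu)^2 / (2 * sigma^2))
}
x <- seq(-3, 3, by = 0.5)
# Matches R's built-in dnorm() for the standard normal
all.equal(normal_pdf(x, mu = 0, sigma = 1), dnorm(x, mean = 0, sd = 1))
# Symmetry about the mean: f(mu - d) = f(mu + d)
all.equal(normal_pdf(1100 - 150, 1100, 200), normal_pdf(1100 + 150, 1100, 200))
```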

Normal Density Curves

Standardization and Z-Scores

  • Standardization: Convert \(N(\mu, \sigma^2)\) to \(N(0, 1)\).

  • Why standardization: Put data onto a standardized scale, making comparisons easier!

Measure   SAT    ACT
Mean      1100   21
SD        200    6
  • The distributions of SAT and ACT scores are both nearly normal.

  • Suppose Anna scored 1300 on her SAT and Tommy scored 24 on his ACT. Who performed better?

Standardization and Z-Scores

  • If \(x\) is an observation from a distribution with mean \(\mu\) and standard deviation \(\sigma\), the standardized value of \(x\) is the so-called \(z\)-score: \[z = \frac{x - \mu}{\sigma}\]

  • A \(z\)-score tells us how many standard deviations \(x\) falls away from the mean, and in which direction.

    • Observations larger (smaller) than the mean have positive (negative) \(z\)-scores.
    • A \(z\)-score of -1.2 means that \(x\) is 1.2 standard deviations to the left of (below) the mean.
    • A \(z\)-score of 1.8 means that \(x\) is 1.8 standard deviations to the right of (above) the mean.
  • If \(X \sim N(\mu, \sigma^2)\), \(Z = \frac{X - \mu}{\sigma}\) follows the standard normal distribution, i.e., \(Z \sim N(0, 1)\).
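A quick numerical check of the last bullet, as a sketch using the SAT mean and SD from the table and a few illustrative scores: each score has the same lower-tail probability under \(N(1100, 200^2)\) as its \(z\)-score has under \(N(0, 1)\).

```r
mu <- 1100; sigma <- 200        # SAT mean and SD from the table
x <- c(800, 1000, 1300, 1500)   # a few illustrative scores
z <- (x - mu) / sigma           # z-scores: -1.5, -0.5, 1, 2
z
# P(X <= x) under N(1100, 200^2) equals P(Z <= z) under N(0, 1)
all.equal(pnorm(x, mean = mu, sd = sigma), pnorm(z))
```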

Standardization Illustration

  • \(X - \mu\) shifts the mean from \(\mu\) to 0

  • \(\frac{X - \mu}{\sigma}\) scales the variance from \(\sigma^2\) (here 4) to 1

Standardization Illustration

  • A value of \(x\) that is 2 standard deviations below \(\mu\) corresponds to \(z = -2\).

  • \(z = \frac{x -\mu}{\sigma} \iff x = \mu + z\sigma\). If \(z = -2\), \(x = \mu - 2\sigma\).

SAT and ACT Example

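  • \(z_{A} = \frac{x_{A} - \mu_{SAT}}{\sigma_{SAT}} = \frac{1300-1100}{200} = 1\); \(z_{T} = \frac{x_{T} - \mu_{ACT}}{\sigma_{ACT}} = \frac{24-21}{6} = 0.5\).
  • Anna scored 1 standard deviation above the SAT mean, while Tommy scored only 0.5 standard deviations above the ACT mean, so Anna performed better relative to other test takers.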
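The same arithmetic in R, as a minimal sketch (variable names are illustrative):

```r
# Means and SDs from the SAT/ACT table
z_anna  <- (1300 - 1100) / 200   # Anna's SAT z-score
z_tommy <- (24 - 21) / 6         # Tommy's ACT z-score
c(Anna = z_anna, Tommy = z_tommy)
z_anna > z_tommy                 # TRUE: Anna is further above her test's mean
```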

Finding Tail Areas \(P(X < x)\)

What fraction of students have an SAT score below Anna’s score of 1300?

  • This is Anna’s percentile: the percentage of test takers who scored lower than Anna.

  • Need \(P(X < 1300 \mid \mu = 1100, \sigma = 200)\) or \(P(Z < 1 \mid \mu = 0, \sigma = 1)\).

Finding Tail Areas \(P(X < x)\) in R

  • With mean and sd representing the mean and standard deviation of a normal distribution
    • pnorm(q, mean, sd) to compute \(P(X \le q)\)
    • pnorm(q, mean, sd, lower.tail = FALSE) to compute \(P(X > q)\)

pnorm(1, mean = 0, sd = 1)
[1] 0.841
pnorm(1300, mean = 1100, sd = 200)
[1] 0.841
  • The shaded area represents the 84.1% of SAT test takers who had a \(z\)-score below 1.

SAT Example Cont’d

  • SAT scores follow \(N(1100, 200^2)\). Shannon is an SAT taker, and nothing is known about Shannon’s SAT aptitude. What is the probability that Shannon scores at least 1190 on the SAT?
  • Step 1: State the problem
    • We would like to compute \(P(X \ge 1190)\).
  • Step 2: Draw a picture

  • Step 3: Find the area

We want the upper tail area, so lower.tail = FALSE!

pnorm(q = 1190, mean = 1100, sd = 200, 
      lower.tail = FALSE)
[1] 0.326
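The same upper-tail area can also be found through standardization, as a sketch: \(z = (1190 - 1100)/200 = 0.45\), so \(P(X \ge 1190) = P(Z \ge 0.45)\).

```r
z <- (1190 - 1100) / 200                                       # z-score of 1190
p_z <- pnorm(z, lower.tail = FALSE)                            # P(Z >= 0.45)
p_x <- pnorm(1190, mean = 1100, sd = 200, lower.tail = FALSE)  # P(X >= 1190)
c(z = z, p_z = p_z, p_x = p_x)
```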

Normal Percentiles in R

  • To get the \(100p\)-th percentile (or the \(p\)-quantile \(q\)), given probability \(p\), we use
    • qnorm(p, mean, sd) to get the value \(q\) of \(X\) such that \(P(X \le q) = p\)
    • qnorm(p, mean, sd, lower.tail = FALSE) to get \(q\) such that \(P(X \ge q) = p\)

SAT Example

What is the 95th percentile for SAT scores?

We need a value \(q\) of the normal random variable, not an area (probability); the area below \(q\) is given as 0.95.

  • Step 1: State the problem
    • Find a value \(q\) of the variable such that \(P(X < q) = 0.95\).
  • Step 2: Draw a picture

  • Step 3: Find the quantile

We want the quantile, so use qnorm()!

qnorm(p = 0.95, mean = 1100, sd = 200)
[1] 1429
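Two equivalent routes, shown as a sketch: transform the standard normal 95th percentile with \(x = \mu + z\sigma\), or ask for the value whose upper-tail area is 0.05. `pnorm()` then confirms that the area below the answer is 0.95.

```r
mu <- 1100; sigma <- 200
q95 <- qnorm(p = 0.95, mean = mu, sd = sigma)
# Same value via the standard normal: x = mu + z * sigma
q95_from_z <- mu + qnorm(0.95) * sigma
# Equivalent upper-tail form: P(X >= q) = 0.05
q95_upper <- qnorm(p = 0.05, mean = mu, sd = sigma, lower.tail = FALSE)
# qnorm() inverts pnorm(): the area below q95 is 0.95
pnorm(q95, mean = mu, sd = sigma)
c(q95, q95_from_z, q95_upper)
```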