Exercise 1

Exercises for Exam 1

Exercise
  1. The General Social Survey asked the question, “After an average work day, about how many hours do you have to relax or pursue activities that you enjoy?” to a random sample of 1,155 Americans. The average relaxing time was found to be 1.65 hours. Determine which of the following is an observation, a variable, a sample statistic (value calculated based on the observed sample), or a population parameter.
    1. An American in the sample.
    2. Number of hours spent relaxing after an average work day.
    3. 1.65.
    4. Average number of hours all Americans spend relaxing after an average work day.
# (a) Observation.
# (b) Variable.
# (c) Sample statistic (mean).
# (d) Population parameter (mean).


  1. Identify which value represents the sample mean and which value represents the claimed population mean.
    1. American households spent an average of about $52 in 2007 on Halloween merchandise such as costumes, decorations and candy. To see if this number had changed, researchers conducted a new survey in 2008 before industry numbers were reported. The survey included 1,500 households and found that average Halloween spending was $58 per household.
    2. The average GPA of students in 2001 at a private university was 3.37. A survey on a sample of 203 students from this university yielded an average GPA of 3.59 a decade later.
# (a) Population mean, 2007 = 52; sample mean, 2008 = 58.
# (b) Population mean, 2001 = 3.37; sample mean, a decade later (about 2011) = 3.59.


  1. Data collected at elementary schools in DeKalb County, GA suggest that each year roughly 25% of students miss exactly one day of school, 15% miss 2 days, and 28% miss 3 or more days due to sickness.
    1. What is the probability that a student chosen at random doesn’t miss any days of school due to sickness this year?
    2. What is the probability that a student chosen at random misses no more than one day?
    3. What is the probability that a student chosen at random misses at least one day?
    4. If a parent has two kids at a DeKalb County elementary school, what is the probability that neither kid will miss any school? Note any assumption you must make to answer this question.
    5. If a parent has two kids at a DeKalb County elementary school, what is the probability that both kids will miss some school, i.e. at least one day? Note any assumption you make.
    6. If you made an assumption in part (d) or (e), do you think it was reasonable? If you didn’t make any assumptions, double check your earlier answers.
# (a) P(no misses) = 1 - (0.25 + 0.15 + 0.28) = 0.32
# (b) P(at most 1 miss) = P(no misses) + P(1 miss) = 0.32 + 0.25 = 0.57
# (c) P(at least 1 miss) = P(1 miss) + P(2 misses) + P(3+ misses) 
#                        = 1 - P(no misses)
#                        = 1 - 0.32 = 0.68
# (d) For parts (d) and (e) assume that whether or not one kid misses school is
# independent of the other.
# P(neither miss any) = P(no miss) * P(no miss) = 0.32^2 = 0.1024
# (e) P(both miss some) = P(at least 1 miss) * P(at least 1 miss) = 0.68^2 = 0.4624
# (f) These kids are siblings, and if one gets sick it probably raises the 
# chance that the other one will get sick as well. So whether or not one misses
# school due to sickness is probably not independent of the other.
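
The arithmetic in parts (a) through (e) can be checked with a few lines of R. This is only a sketch: the variable names are illustrative and not part of the original exercise, and parts (d) and (e) rely on the independence assumption that part (f) calls into question.

```r
# Given probabilities of missing exactly 1, exactly 2, and 3 or more days
p_one   <- 0.25
p_two   <- 0.15
p_three <- 0.28

p_none      <- 1 - (p_one + p_two + p_three)  # (a) 0.32
p_at_most1  <- p_none + p_one                 # (b) 0.57
p_at_least1 <- 1 - p_none                     # (c) 0.68

# (d) and (e): ASSUMES the two kids miss school independently of each other
p_neither <- p_none^2       # (d) 0.1024
p_both    <- p_at_least1^2  # (e) 0.4624
```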


  1. In the United States, approximately \(9\%\) of the population have diabetes, while about \(30\%\) of adults have high blood pressure. An estimated \(6\%\) of the population have both diabetes (\(D\)) and hypertension (\(H\)).
    1. What is the probability that a randomly selected American adult either has both diseases (\(D \cap H\)) or has neither of the two diseases (\(D^c \cap H^c\))? (Drawing a Venn diagram may help.)
    2. If a randomly selected American adult has diabetes, what is the probability that they also have hypertension, i.e., \(P(H \mid D)\)? Based on your result, is the event that someone is hypertensive independent of the event that someone has diabetes? Does knowing that someone has diabetes help us predict whether or not they also have hypertension?
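## (a) P(H and D) + P(H^c and D^c) = 6% + (1 - (24% + 6% + 3%)) = 6% + 67% = 73%
## (b) P(H|D) = 6%/9% ≈ 67%. Since P(H|D) ≈ 67% differs from P(H) = 30%, H and D are not independent. Knowing that someone has diabetes raises the probability that they also have hypertension, so it does help with prediction.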
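
As a rough check of both parts, the Venn-diagram arithmetic and the independence test can be written out in R. The variable names below are illustrative assumptions; only the three input percentages come from the exercise.

```r
p_D  <- 0.09  # P(D): has diabetes
p_H  <- 0.30  # P(H): has hypertension
p_DH <- 0.06  # P(D and H): has both

p_union   <- p_D + p_H - p_DH           # P(D or H) = 0.33
p_neither <- 1 - p_union                # P(D^c and H^c) = 0.67
p_both_or_neither <- p_DH + p_neither   # (a) 0.73

p_H_given_D <- p_DH / p_D               # (b) about 0.667

# Independence would require P(D and H) = P(D) * P(H), i.e. P(H | D) = P(H)
independent <- isTRUE(all.equal(p_DH, p_D * p_H))  # FALSE: 0.06 vs 0.027
```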


  1. Assume that females have pulse rates that are normally distributed with a mean of 74.0 beats per minute and a standard deviation of 12.5 beats per minute.
    1. If 1 adult female is randomly selected, find the probability that her pulse rate is less than 80 beats per minute.
    2. If 16 adult females are randomly selected, find the probability that their mean pulse rate is less than 80 beats per minute.
## (a)
pnorm(q = 80, mean = 74, sd = 12.5)
[1] 0.6843863
## (b)
pnorm(q = 80, mean = 74, sd = 12.5/sqrt(16))
[1] 0.9725711


  1. In parts (a) and (b), identify whether the events are disjoint, independent, or neither (events cannot be both disjoint and independent).
    1. You and a randomly selected student from your class both earn A’s in this course.
    2. You and your class study partner both earn A’s in this course.
    3. If two events can occur at the same time, must they be dependent?
# (a) If the class is not graded on a curve, they are independent. If graded on a curve, then neither independent nor disjoint (unless the instructor will only give one A, which is a situation we will ignore in parts (b) and (c)).
# (b) They are probably not independent: if you study together, your study habits would be related, which suggests your course performances are also related.
# (c) No. See the answer to part (a) when the course is not graded on a curve. More generally: if two things are unrelated (independent), then one occurring does not preclude the other from occurring.


  1. A portfolio’s value increases by 18% during a financial boom and by 9% during normal times. It decreases by 12% during a recession. What is the expected return on this portfolio if each scenario is equally likely?
# E(X) = (1/3)(0.18) + (1/3)(0.09) + (1/3)(-0.12) = 0.05, i.e. an expected return of 5%
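
The expected value is a probability-weighted sum of the three returns. A minimal sketch in R, with illustrative vector names that are not part of the original exercise:

```r
# Returns under each scenario; a decrease is entered as a negative return
returns <- c(boom = 0.18, normal = 0.09, recession = -0.12)
probs   <- rep(1/3, 3)  # each scenario is equally likely

expected_return <- sum(returns * probs)  # 0.05
```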


  1. What percentage of data that follow a standard normal distribution \(N(\mu=0, \sigma=1)\) is found in each region? Drawing a normal graph may help.
    1. \(Z < -1.75\)
    2. \(-0.7 < Z < 1.3\)
    3. \(|Z| > 1\)
#(a)
pnorm(-1.75)
[1] 0.04005916
#(b)
pnorm(1.3) - pnorm(-0.7)
[1] 0.6612359
#(c)
pnorm(-1) + (1 - pnorm(1))
[1] 0.3173105


  1. At a university, 13% of students smoke.
    1. Calculate the expected number of smokers in a random sample of 100 students from this university.
    2. The university gym opens at 9 am on Saturday mornings. One Saturday morning at 8:55 am there are 27 students outside the gym waiting for it to open. Should you use the same approach from part (a) to calculate the expected number of smokers among these 27 students?
# (a) E(X) = 100 * 0.13 = 13
# (b) No, these 27 students are not a random sample from the university's
# student population. For example, it might be argued that the proportion of
# smokers among students who go to the gym at 9am on a Saturday morning would
# be lower than the proportion of smokers in the university as a whole.


  1. Head lengths of Virginia opossums follow a normal distribution with mean 104 mm and standard deviation 6 mm.
    1. Compute the \(z\)-scores for opossums with head lengths of 97 mm and 108 mm.
    2. Which observation (97 mm or 108 mm) is more unusual, i.e. less likely to occur, than the other? Why?
#(a)
(97-104)/6
[1] -1.166667
(108-104)/6
[1] 0.6666667
#(b)
# 97 mm is more unusual: its z-score is farther from 0 (|-1.17| > |0.67|).
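
One way to make "more unusual" concrete is to compare how much of the distribution lies at least as far from the mean as each observation. The two-sided tail probability used below is an illustrative choice, not something the exercise prescribes, and the variable names are assumptions.

```r
mu    <- 104  # mean head length (mm)
sigma <- 6    # standard deviation (mm)
x     <- c(97, 108)

z <- (x - mu) / sigma

# Probability of falling at least |z| standard deviations from the mean
tail_prob <- 2 * pnorm(-abs(z))
```

The smaller tail probability (about 0.24 for 97 mm versus about 0.50 for 108 mm) belongs to the observation farther from the mean.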


  1. Suppose weights of the checked baggage of airline passengers follow a nearly normal distribution with mean 45 pounds and standard deviation 3.2 pounds. Most airlines charge a fee for baggage that weighs in excess of 50 pounds. Determine what percent of airline passengers incur this fee.
pnorm(q = 50, mean = 45, sd = 3.2, lower.tail = FALSE)
[1] 0.05908512


  1. Weights of adult human brains are normally distributed. Samples of weights of adult human brains, each of size \(n=15\), are randomly collected and the sample means are found. Is it correct to conclude that the sample means cannot be treated as being from a normal distribution because the sample size is too small?
# No. Because the original population is normally distributed, the sample means
# will be normally distributed for any sample size, not just for n > 30.
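
A small simulation can illustrate this. The population mean and standard deviation below are hypothetical values chosen only for the sketch (the exercise does not give them), and the seed is arbitrary.

```r
set.seed(1)  # arbitrary seed, for reproducibility only

# HYPOTHETICAL population parameters (not given in the exercise)
mu    <- 1400
sigma <- 100
n     <- 15

# Draw 10,000 samples of size 15 and record each sample mean
sample_means <- replicate(10000, mean(rnorm(n, mean = mu, sd = sigma)))

mean(sample_means)  # should be close to mu
sd(sample_means)    # should be close to sigma / sqrt(n)

# Optional visual check: points should fall near the reference line
# qqnorm(sample_means); qqline(sample_means)
```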


  1. Annual incomes of physicians are known to have a distribution that is skewed to the right instead of being normally distributed. Assume that we collect a large (\(n > 30\)) random sample of annual income of physicians. Can the distribution of those incomes in that sample be approximated by a normal distribution because the sample is large?
# No. The sample of annual incomes will tend to have a distribution that is
# skewed to the right, no matter how large the sample is. If we compute the
# sample mean, however, we can treat that value as one draw from an
# approximately normal sampling distribution (by the central limit theorem).
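
The contrast between a sample and its mean can be sketched with a simulation. The log-normal population, its parameters, the sample sizes, and the helper `skewness()` are all assumptions made for illustration; the exercise says only that incomes are right-skewed.

```r
set.seed(1)  # arbitrary seed, for reproducibility only

# HYPOTHETICAL right-skewed income population: log-normal, illustrative parameters
draw_incomes <- function(n) rlnorm(n, meanlog = 12, sdlog = 0.6)

# Simple moment-based skewness, written out to avoid extra packages
skewness <- function(x) mean((x - mean(x))^3) / sd(x)^3

# One large sample of individual incomes: still clearly right-skewed
incomes <- draw_incomes(5000)

# Means of 10,000 samples of size 50: much closer to symmetric
sample_means <- replicate(10000, mean(draw_incomes(50)))

skewness(incomes)       # well above 0
skewness(sample_means)  # much closer to 0
```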