Take-home Exam 2

Take-home Exam for Weeks 7 to 11

Important

Due Wednesday, Nov 12, 11:59 PM

I recognize the importance of personal integrity in all aspects of life and work. I commit myself to truthfulness, honor and responsibility, by which I earn the respect of others. I support the development of good character and commit myself to uphold the highest standards of academic integrity as an important aspect of personal integrity. My commitment obliges me to conduct myself according to the Marquette University Honor Code.

Exam Problems (50 points)

  1. [30 points] A Marquette professor in the College of Education believes that Marquette engineering students are talented and that their mean ACT score is greater than 30. She would like to collect evidence for her claim and conduct a hypothesis test. The Marquette data center director said that the ACT scores of engineering majors follow a normal distribution with population variance 9.
    (a) Is the claim an \(H_0\) or \(H_1\) claim? Write down \(H_0\) and \(H_1\) based on her claim. Is this a left-tailed, right-tailed, or two-tailed test? Please explain.
    (b) Due to COVID-19, Marquette’s research budget has been cut. The professor’s assistant can only collect a small random sample of size 16. Are one-sample \(z\)-test and/or \(t\)-test procedures appropriate with such a small sample size? If yes, why? If not, add conditions to make the procedures appropriate.
    (c) The assistant has never taken MATH 4720 and only remembers that the professor said 31.7 is the threshold value to compare with the observed sample mean \(\bar{x}\). Should she reject \(H_0\) when \(\bar{x} > 31.7\) or when \(\bar{x} < 31.7\)? Please explain.
    (d) Given the rejection region in (c), write down the definition of the Type I error rate in this problem and determine its value.
    (e) If the \(p\)-value is 0.1, what is the value of the observed sample mean \(\bar{x}\)?
    (f) Based on (c), (d), and (e), is \(H_0\) rejected?
# ------------
## (a) 
# ------------
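# H1 claim: "mean ACT score is greater than 30" is a strict inequality, so it goes in H1
# right-tailed, because H1 points to values of mu above 30
# H0: mu <= 30
# H1: mu > 30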

# ------------
## (b) 
# ------------
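# yes: the population is normal, so x_bar is exactly normally distributed for any n, including n = 16
# sigma^2 = 9 is known, so the z-test applies; the t-test is also valid under normality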

# ------------
## (c) 
# ------------
# x_bar > 31.7 because it is a right-tailed test

# ------------
## (d)
# ------------
# type I error rate = P(reject H0 | H0 is true)
mu0 <- 30; x_bar_cv <- 31.7; sig2 <- 9; n <- 16
pnorm((x_bar_cv - mu0) / sqrt(sig2 / n), lower.tail = FALSE)
[1] 0.0117053
# ------------
## (e)
# ------------
p_val <- 0.1
qnorm(p_val, mean = mu0, sd = sqrt(sig2 / n), lower.tail = FALSE)
[1] 30.96116
# ------------
## (f)
# ------------
# Do not reject H0: x_bar = 30.96 < 31.7 (equivalently, p-value 0.1 > Type I error rate 0.0117)
# There is insufficient evidence to support the claim that engineering students have a mean ACT score greater than 30.

# ------------
## (g) 
# ------------
# type II error rate = P(do not reject H0 | H0 is false)
x_bar_cv_new <- 31.5
mu1 <- 31
pnorm((x_bar_cv_new - mu1) / sqrt(sig2 / n))
[1] 0.7475075
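
The calculations in parts (d) through (g) all use the same quantity: the probability that \(\bar{x}\) exceeds a cutoff for a given true mean. The sketch below collects them in one place and checks that the cutoff rule and the \(p\)-value rule give the same decision. The helper `reject_prob` and the variable names are hypothetical and not part of the original solution; the numbers come from the problem statement and the solution above.

```r
# Values from the problem statement; helper and variable names are hypothetical.
mu0 <- 30              # mean under H0
se  <- sqrt(9 / 16)    # standard error of x_bar: sqrt(sigma^2 / n) = 0.75

# P(x_bar > cutoff) when the true mean is mu, i.e. the rejection probability
reject_prob <- function(cutoff, mu) {
  pnorm(cutoff, mean = mu, sd = se, lower.tail = FALSE)
}

# Part (d): Type I error rate for the rule "reject when x_bar > 31.7"
alpha <- reject_prob(31.7, mu = mu0)

# Part (e): the sample mean whose right-tailed p-value is 0.1
x_bar_obs <- qnorm(0.1, mean = mu0, sd = se, lower.tail = FALSE)

# Part (f): two equivalent decision rules
reject_by_cutoff <- x_bar_obs > 31.7   # compare x_bar with the threshold
reject_by_pvalue <- 0.1 < alpha        # compare the p-value with alpha

# Part (g): Type II error rate with cutoff 31.5 and true mean 31
beta <- 1 - reject_prob(31.5, mu = 31)

print(c(alpha = alpha, x_bar_obs = x_bar_obs, beta = beta))
print(c(reject_by_cutoff = reject_by_cutoff, reject_by_pvalue = reject_by_pvalue))
```

Both rules should agree, because \(\bar{x} > 31.7\) holds exactly when the right-tailed \(p\)-value falls below the Type I error rate implied by that cutoff.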


  2. [20 points] A corporation in Chicago makes insulation shields for electrical wires using three types of machines. The company wants to evaluate the variation in the inside diameters of the shields produced by machines A, B, and C. A quality control engineer randomly selects shields produced by each of the machines and records the inside diameter of each shield (in millimeters), as shown in Table 1. The data can be downloaded at shield_machine.csv. She wants to determine whether the mean diameters of the shields produced by the three machines differ.
    (a) Check normality and homogeneity of variance using QQ-plots and boxplots. Comment on your plots.
    (b) Based on your findings in (a), would it be appropriate to proceed with an analysis of variance (ANOVA)? Please explain.
    (c) If your answer is YES in (b), conduct ANOVA on the original data. If NOT, apply the natural log (log with base \(e = 2.71828...\)) transformation to the data and show that the transformed data satisfy the ANOVA assumptions. Then conduct ANOVA on the transformed data. Show the ANOVA table and determine whether the mean diameters differ.
Table 1: Sampled data of the inside diameter of shields produced by machine A, B and C.
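| Shield | Machine A | Machine B | Machine C |
|-------:|----------:|----------:|----------:|
| 1  | 31.0 | 34.2 | 39.3 |
| 2  | 7.0  | 39.8 | 36.2 |
| 3  | 8.2  | 15.8 | 47.8 |
| 4  | 9.1  | 29.8 | 58.7 |
| 5  | 14.4 | 61.1 | 82.3 |
| 6  | 10.5 | 31.1 | 30.8 |
| 7  | 10.5 | 18.3 | 11.0 |
| 8  | 5.1  | 10.6 | 15.0 |
| 9  | 18.2 | 11.2 | 46.3 |
| 10 | 25.2 | 20.0 | 36.4 |
| 11 | 6.5  | 26.9 | 54.5 |
| 12 | 13.4 | 52.6 | 25.3 |
| 13 | 23.7 | 65.2 | 24.6 |
| 14 | 17.0 | 37.3 | 90.5 |
| 15 | 8.6  | 19.0 | 64.0 |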
# ------------
## (a)
# ------------
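shield_machine <- read.csv("shield_machine.csv")
# column 1 holds the diameters stacked by machine (15 per machine);
# reshape to a 15 x 3 matrix with one column per machine (A, B, C)
shield_machine_new <- matrix(shield_machine[, 1], 15, 3)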

car::qqPlot(shield_machine_new[, 1])

[1]  1 10
car::qqPlot(shield_machine_new[, 2])

[1] 13  5
car::qqPlot(shield_machine_new[, 3])

[1] 14  5
# Not perfectly normal, but the departures are mild enough to proceed

boxplot(shield_machine_new)

## variances are not homogeneous: the spread increases from machine A to machine C

# ------------
## (b)
# ------------
## Not appropriate: ANOVA requires normality of the data and homogeneous variances,
## and the boxplots show that the variances differ across machines.

# ------------
## (c)
# ------------
apply(shield_machine_new, 2, var)
[1]  59.37781 296.37210 524.69314
apply(shield_machine_new, 2, mean)
[1] 13.89333 31.52667 44.18000
apply(shield_machine_new, 2, var)/ (apply(shield_machine_new, 2, mean)^2)
[1] 0.3076177 0.2981818 0.2688153
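## var / mean^2 is roughly constant (about 0.3) across machines, i.e. the standard
## deviation is proportional to the mean, so a log transformation should stabilize the variance
## log-transformation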
shield_machine_log <- shield_machine
shield_machine_log[, 1] <- log(shield_machine_log[, 1])
lm_res_log <- lm(diameter ~ machine, data = shield_machine_log)
anova(lm_res_log)
Analysis of Variance Table

Response: diameter
          Df Sum Sq Mean Sq F value    Pr(>F)    
machine    2 10.456  5.2278  16.404 5.435e-06 ***
Residuals 42 13.385  0.3187                      
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## p-value = 5.435e-06 is far below any conventional significance level: reject H0 and
## conclude that the mean (log) diameters are not all equal across the three machines
## check homogeneity of variance on the log scale: the three variances should now be similar
shield_machine_log_new <- apply(shield_machine_new, 2, log)
apply(shield_machine_log_new, 2, var)
[1] 0.2871122 0.3234484 0.3454946
boxplot(shield_machine_log_new)

# fitted_val_log <- matrix(lm_res_log$fitted.values, 15, 3)
# colnames(fitted_val_log) <- c("A", "B", "C")
# car::qqPlot(lm_res_log$residuals, pch = 19, id = FALSE, ylab = "residuals", 
#             main = "Normal Probability Plot for Residuals")
# plot(lm_res_log$fitted.values, lm_res_log$residuals, xlab = "Fitted Value",
#      ylab = "Residual", main = "Versus Fits", pch = 19, col = "red")
# abline(h = 0)

## another way
# aov_res <- aov(diameter ~ machine, data = shield_machine_log)
# summary(aov_res)
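
The boxplots in (a) and (c) give a visual check of equal variances. As a numerical supplement, the sketch below types in the data from Table 1 (assuming it matches shield_machine.csv), compares the per-machine variances before and after the log transformation, and runs Bartlett's test on both scales. Bartlett's test is not part of the original solution and is sensitive to non-normality, so treat it as a rough cross-check only. The object names (`shields`, `var_raw`, `p_log`, and so on) are hypothetical.

```r
# Data typed in from Table 1 (assumed to match shield_machine.csv)
A <- c(31.0, 7.0, 8.2, 9.1, 14.4, 10.5, 10.5, 5.1,
       18.2, 25.2, 6.5, 13.4, 23.7, 17.0, 8.6)
B <- c(34.2, 39.8, 15.8, 29.8, 61.1, 31.1, 18.3, 10.6,
       11.2, 20.0, 26.9, 52.6, 65.2, 37.3, 19.0)
C <- c(39.3, 36.2, 47.8, 58.7, 82.3, 30.8, 11.0, 15.0,
       46.3, 36.4, 54.5, 25.3, 24.6, 90.5, 64.0)

# Long format: one row per shield, with a factor identifying the machine
shields <- data.frame(
  diameter = c(A, B, C),
  machine  = factor(rep(c("A", "B", "C"), each = 15))
)
shields$log_diameter <- log(shields$diameter)

# Sample variance per machine on the raw and log scales
var_raw <- tapply(shields$diameter, shields$machine, var)
var_log <- tapply(shields$log_diameter, shields$machine, var)

# Bartlett's test of equal variances on each scale
p_raw <- bartlett.test(diameter ~ machine, data = shields)$p.value
p_log <- bartlett.test(log_diameter ~ machine, data = shields)$p.value

# One-way ANOVA on the log scale, as in the solution above
anova_log <- anova(lm(log_diameter ~ machine, data = shields))
f_stat <- anova_log[["F value"]][1]

print(rbind(raw = var_raw, log = var_log))
print(c(p_raw = p_raw, p_log = p_log))
print(anova_log)
```

If Table 1 matches the CSV file, the variances and the ANOVA table should reproduce the output shown in the solution, and Bartlett's test should flag unequal variances only on the raw scale.
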
  3. AI Usage Declaration. Using GenAI is permitted for this course. If you choose to use GenAI to assist with your exam, you must include a brief statement documenting your use. Please provide the following information:
    1. Why/How I Used AI: Why do you need to use GenAI? Which tool did you use? Describe your prompts or questions. What and how did you ask the AI to help you?
    2. Generated Output: Include a screenshot or excerpt (copy and paste) of the AI’s response.
    3. How I Used the Output: Did you revise it? Did you use it directly, or compare it with your answers? What decisions did you make based on the output?