Homework 1

Data Collection and Data Type

Important

Due Friday, Sep 5, 11:59 PM

Homework Questions

  1. Data Type. Identify each of the following as numerical or categorical data.
    1. The names of the pharmaceutical companies that manufacture aspirin tablets
    2. The colors of pills
    3. The weights of aspirin tablets
  # (a) Categorical
  # (b) Categorical
  # (c) Numerical


  1. Levels of Measurement. Identify the level of measurement used in each of the following.
    1. The weights of people in a sample of Marquette engineering students.
    2. A physician’s descriptions of “abstains from alcohol, light drinker, moderate drinker, heavy drinker.”
    3. Tree classifications of “oak, maple, elm.”
    4. Bob measures time in days, with 0 corresponding to his birth date. The day before his birth is -1, the day after his birth is +1, and so on. Bob has converted the dates of major historical events to his numbering system. What is the level of measurement of these numbers?
  # (a) Ratio
  # (b) Ordinal
  # (c) Nominal
  # (d) Interval
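
Answer (d) can be checked with a little arithmetic. The sketch below is only an illustration: the birth date, the two event dates, and the alternative zero point are hypothetical values chosen for the example. It shows that differences between Bob's day numbers do not depend on where 0 sits, while ratios do, which is what separates interval data from ratio data.

```python
from datetime import date

# Hypothetical values, chosen only for illustration.
BIRTH = date(1990, 1, 1)         # Bob's day 0
OTHER_ZERO = date(1989, 12, 22)  # a different, equally arbitrary day 0
EVENT_A = date(1990, 1, 11)
EVENT_B = date(1990, 1, 21)


def day_number(d: date, zero: date) -> int:
    """Days since the chosen zero point; negative means before it."""
    return (d - zero).days


a, b = day_number(EVENT_A, BIRTH), day_number(EVENT_B, BIRTH)
a2, b2 = day_number(EVENT_A, OTHER_ZERO), day_number(EVENT_B, OTHER_ZERO)

# Differences are meaningful: the gap between the events is the same
# no matter which day is labeled 0.
gap_bob = b - a
gap_other = b2 - a2

# Ratios are not meaningful: they change when the zero point moves,
# because 0 does not mean "no time at all".
ratio_bob = b / a
ratio_other = b2 / a2

print(gap_bob, gap_other)      # equal gaps
print(ratio_bob, ratio_other)  # different ratios
```

By contrast, weights in part (a) have a true zero, so a statement such as "twice as heavy" keeps its meaning under any change of units.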


  1. Discrete vs Continuous. Determine whether the data are discrete or continuous.
    1. The length of stay (in days) for each baby in a sample of babies born in Wisconsin.
    2. Several subjects are randomly selected and their heights are recorded.
    3. From a data set, we see that a female had an arm circumference of 32.49 cm.
    4. A sample of married couples is randomly selected and the number of children in each family is recorded.
  # (a) Discrete
  # (b) Continuous
  # (c) Continuous
  # (d) Discrete


  1. Sampling Method. Identify which of these types of sampling is used: random, stratified, or cluster.
    1. Dr. Yu surveys his statistics class by identifying groups of males and females, then randomly selecting 5 students from each of those two groups.
    2. Dr. Yu conducts a survey by randomly selecting 3 different classes at Marquette and surveying all of the students as they leave those classes.
    3. 532 subjects were randomly assigned to (1) regular exercise or (2) no exercise groups to study the effectiveness of exercise in lowering blood pressure.
  # (a) Stratified
  # (b) Cluster
  # (c) Random
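
The difference between answers (a) and (b) can be sketched in code. This is a minimal illustration, not a description of how Dr. Yu ran the surveys: the roster, the class labels, and the helper names `stratified_sample` and `cluster_sample` are all hypothetical. Stratified sampling draws a few units from every group; cluster sampling draws a few whole groups and keeps every unit in them.

```python
import random


def stratified_sample(units, stratum_of, n_per_stratum, rng):
    """Randomly pick n_per_stratum units from EVERY stratum."""
    strata = {}
    for u in units:
        strata.setdefault(stratum_of(u), []).append(u)
    return [u for members in strata.values()
            for u in rng.sample(members, n_per_stratum)]


def cluster_sample(units, cluster_of, n_clusters, rng):
    """Randomly pick n_clusters clusters and keep ALL units in them."""
    clusters = {}
    for u in units:
        clusters.setdefault(cluster_of(u), []).append(u)
    chosen = rng.sample(sorted(clusters), n_clusters)
    return [u for c in chosen for u in clusters[c]]


# Hypothetical roster of (student_id, sex, class_label) records.
roster = [(i, "F" if i % 2 else "M", f"class_{i % 5}") for i in range(60)]

rng = random.Random(0)  # arbitrary seed so the run is repeatable

# (a) 5 students from each of the two sex groups -> stratified.
strat = stratified_sample(roster, lambda s: s[1], 5, rng)

# (b) 3 whole classes, every student in them surveyed -> cluster.
clus = cluster_sample(roster, lambda s: s[2], 3, rng)

print(len(strat), len(clus))
```

In the stratified sample every group is represented by design; in the cluster sample only the selected classes appear, but they appear in full.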


  1. Study Type. Determine whether the study is an experiment or an observational study, then identify a major problem with this study.
    1. In a survey conducted by USA Today, 1072 Internet users chose to respond to the question: “How often do you seek medical information online?” 38% of the respondents said “frequently.”
    2. The Physicians’ Health Study involved 22,071 male physicians. Based on random selections, 11,037 of them were treated with aspirin and the other 11,034 were given placebos. The study was stopped early because it became clear that aspirin reduced the risk of myocardial infarctions by a substantial amount.
  # (a) Observational study: this is a voluntary response (self-selected) sample, so the respondents may not be representative and the results may be biased. The question was posted in an e-newsletter, so the sample was biased from the beginning.
  # (b) Experiment: the subjects were male physicians only, so the results may not generalize. It would be better to include men and women who are not physicians as well.
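
What makes (b) an experiment is that the researchers, not the subjects, decided who received aspirin, and they did so at random. The sketch below shows one way such an assignment could be done. It is an assumption for illustration only: the problem does not describe the study's actual randomization procedure, and the participant IDs and seed are hypothetical. Only the group sizes come from the problem statement.

```python
import random

N_PHYSICIANS = 22_071  # total subjects, from the problem statement
N_ASPIRIN = 11_037     # aspirin group size, from the problem statement

rng = random.Random(2024)        # arbitrary seed so the run is repeatable
ids = list(range(N_PHYSICIANS))  # hypothetical participant IDs
rng.shuffle(ids)                 # random order: chance decides the groups

aspirin = set(ids[:N_ASPIRIN])   # first block receives aspirin
placebo = set(ids[N_ASPIRIN:])   # everyone else receives the placebo

print(len(aspirin), len(placebo))
```

Random assignment balances the two groups on average, but it does not address the problem noted in (b): the pool being assigned still contains male physicians only.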


  1. (MSSC) UK baby names. The visualization below shows the number of baby girls born in the United Kingdom (comprised of England & Wales, Northern Ireland, and Scotland) who were given the name “Fiona” over the years.
    1. List the variables you believe were necessary to create this visualization.
    2. Indicate whether each variable is numerical or categorical. If numerical, identify as continuous or discrete. If categorical, indicate if the variable is ordinal.

  # (a) Year, number of baby girls named Fiona born in that year, nation. 
  # (b) Year (numerical, discrete), number of baby girls named Fiona born in that year (numerical, discrete), nation (categorical, nominal).
  1. AI Usage Declaration. Using GenAI is permitted for this course. If you choose to use GenAI to assist with your homework, you must include a brief statement documenting your use. Please provide the following information:
    1. Why/How I Used AI: Why do you need to use GenAI? Which tool did you use? Describe your prompts or questions. What and how did you ask the AI to help you?
    2. Generated Output: Include a screenshot or excerpt (copy and paste) of the AI’s response.
    3. How I Used the Output: Did you revise it? Did you use it directly, or compare it with your answers? What decisions did you make based on the output?