The Mid-Term Project is worth 100 points. Please download this document to your computer and save it using the naming convention specified in the course syllabus. For the Mid-Term Project you will be using the MM207 Student Data Set, the survey codebook, and StatCrunch as necessary. You should enter your answers/responses directly after the question. There is no need to retype the project. After completing and saving the project, submit your project in the Mid-Term Drop Box.
In the course, go to Unit 4 -> Instructor Graded Project -> StatCrunch to access the MM207 Student Data Set. When the page loads, you will need to click on Data Set on the left side of the page. You do not need a StatCrunch ID or a password to access the data set; simply click on Data Set to load the data file.
Name: Paul Montano
Unit 4 Mid Term

1. Identify the implied population in the following situation.

According to a recent report, it was found that Onglyza, in combination with diet and exercise, is effective for treating diabetes.

The individuals receiving treatment for diabetes make up the implied population.

2. Identify the type of statistical study conducted in the following scenario.

A recent telephone survey conducted by Gallup polled 1,018 adults, with 22% of the respondents indicating that they had smoked cigarettes in the previous week.

– The statistical study conducted in this scenario is an observational study, specifically a sample survey: the 1,018 adults were only asked a question, and no treatment was applied. The result is a sample proportion based on a single variable with a binary answer choice (Yes or No), which indicates that the data are nominal.

3. In the given scenario, what is the statistic and the corresponding parameter it would estimate?

According to the National Highway Traffic Safety Administration, 75% of drivers aged 70 and over in a recent study of 460 individuals had uncorrected vision problems.

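The statistic in this scenario is the 75% of the 460 drivers studied who had uncorrected vision problems. The parameter it estimates is the proportion of all drivers aged 70 and over who have uncorrected vision problems. To estimate that parameter with inferential statistics, you use the sampling distribution of the sample proportion, which depends on the sample size and yields a margin of error around the statistic.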
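As an illustration of how the statistic relates to the parameter, the sketch below computes an approximate margin of error for the sample proportion. It assumes a simple random sample, the normal approximation, and a 95% confidence level (z = 1.96); none of these is stated in the question, and the function name is hypothetical.

```python
import math

def proportion_margin_of_error(p_hat, n, z=1.96):
    """Normal-approximation margin of error for a sample proportion.

    z = 1.96 corresponds to an assumed 95% confidence level.
    """
    standard_error = math.sqrt(p_hat * (1 - p_hat) / n)
    return z * standard_error

p_hat = 0.75  # statistic: share of sampled drivers with uncorrected vision problems
n = 460       # sample size from the question

margin = proportion_margin_of_error(p_hat, n)
lower, upper = p_hat - margin, p_hat + margin

print(f"statistic (sample proportion): {p_hat:.2f}")
print(f"approximate interval for the parameter: {lower:.3f} to {upper:.3f}")
```

Under these assumptions the margin of error is about 4 percentage points, so the population proportion would plausibly lie between roughly 71% and 79%.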

4. The MM207 Student Data Set was collected using a random sampling method. This method ensured that each student had an equal opportunity to take the class and complete the questionnaire. Random sampling is considered the most effective way to gather a general outlook because it is impossible to survey everyone at Kaplan.

5. In the MM207 Student Data Set, there is one discrete variable and one continuous variable. Let me explain:

a) Discrete: Q17 represents a finite number of options that a person can have. For example, you cannot have 0.5 pets or any fractional amount of pets.

b) Continuous: On the other hand, Q3 does not have a finite number of options and can take on any value, including non-whole numbers or fractions. A continuous variable encompasses an infinite number of possible values.

6. Identify the following variables from the MM207 Student Data Set:

a) A variable measured at the nominal level of measurement.

– Gender is a nominal variable: it has no inherent order and a fixed list of possible outcomes. Nominal data are commonly summarized with frequencies and proportions.

b) A variable measured at the ratio level of measurement.

In Q3, height is measured in inches, which is a ratio-level variable. Ratio is the most precise level of measurement: it has a true zero and allows both whole numbers and fractions. Ratio data are commonly summarized with the mean, median, and standard deviation.

c) A variable measured at the ordinal level of measurement.

Q10 is an ordinal variable. Like nominal data, ordinal data can be displayed as frequencies and proportions. Unlike nominal data, the categories have a meaningful order, and ordinal data are sometimes summarized with means as well.

7. Approximately 49% of the students represented in the data set are between the ages of 29 and 45 inclusive. This calculation is based on the fact that 86 of the 175 people in the data set fall within this age range.
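The arithmetic behind that percentage can be checked with a few lines of Python. The counts are the ones quoted in the answer; the variable names are only illustrative.

```python
in_range = 86   # students aged 29 to 45 inclusive, as counted in the answer
total = 175     # total number of people in the data set, as stated in the answer

percentage = 100 * in_range / total
print(f"{percentage:.1f}% of students are aged 29 to 45")
```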

8. Prepare suitable graphs for the following variables, in order to present a report on MM207 statistics students at Kaplan University to the Kaplan Board of Trustees.

a) The various majors of students enrolled in MM207.

b) The amount of time that students in MM207 dedicate to schoolwork.

9. Using the range rule of thumb, estimate the standard deviation for the number of credit hours students in this sample are taking and the shoe sizes of the females in the class. Then compute the actual standard deviation using StatCrunch and compare the results.

a) Does the range rule of thumb overestimate or underestimate the standard deviation for number of credit hours?

– The range rule of thumb overestimates the standard deviation for credit hours. The estimated standard deviation is 3.75, while the actual standard deviation is 2.56.

b) The range rule of thumb slightly overestimates the standard deviation for shoe sizes of females, with an estimated standard deviation of 1.75 compared to the actual standard deviation of 1.29.
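The comparison in parts a) and b) can be sketched in Python. The range rule of thumb estimates the standard deviation as range / 4. The ranges used below (15 for credit hours and 7 for female shoe sizes) are assumptions back-computed from the quoted estimates of 3.75 and 1.75; they are not stated in the answers. The actual standard deviations are the StatCrunch values quoted above.

```python
def range_rule_estimate(data_range):
    """Range rule of thumb: standard deviation is roughly range / 4."""
    return data_range / 4

# Ranges are ASSUMED (4 x the quoted estimates); actual_sd values come from the answers.
cases = {
    "credit hours": {"range": 15.0, "actual_sd": 2.56},
    "female shoe sizes": {"range": 7.0, "actual_sd": 1.29},
}

results = {}
for name, case in cases.items():
    estimate = range_rule_estimate(case["range"])
    actual = case["actual_sd"]
    verdict = "overestimates" if estimate > actual else "underestimates"
    results[name] = (estimate, verdict)
    print(f"{name}: estimate {estimate:.2f} vs actual {actual:.2f} "
          f"-> rule of thumb {verdict} by {estimate - actual:.2f}")
```

In both cases the estimate is larger than the StatCrunch value, which matches the conclusions given in a) and b).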

c) The impact of the distribution’s shape on the conclusion lies in both data sets falling within the rule of thumb estimate and the actual standard deviation. Both data sets have a larger range due to multiple data samples falling outside the standard deviation.

10. Compare the variability in the number of hours spent on school work (Q11) and watching television (Q14) using measures of center and measures of variability. Determine which variable has greater variability and provide an explanation for your reasoning.

b) What is the value at the 10th percentile for the number of hours on school work?
c) What is the value at the 90th percentile for the number of hours watching television?
Summary statistics:

Column  n    Mean   Variance  Std. dev.  Std. err.  Median  Range  Min  Max  Q1  Q3
Q14     173  7.54   35.30     5.94       0.45       6       25     0    25   3   10
Q11     171  17.15  82.21     9.07       0.69       16      42     3    45   10  20

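a) The standard deviation for question 14 (watching television) is 5.94, and the standard deviation for question 11 (school work) is 9.07; the value 0.69 is the standard error for question 11, not its standard deviation. Question 11 also has the larger range (42 compared to 25) and the larger mean and median (17.15 and 16 compared to 7.54 and 6). Both the standard deviation and the range indicate that hours spent on school work have greater variability.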
b) When entering the optional 10th percentile in StatCrunch for the number of hours on school work, the result shown is 6.
c) When entering the optional 90th percentile in StatCrunch for the number of hours watching television, the result shown is 16.
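For the variability comparison in question 10, the sketch below rechecks the summary table and adds one further measure, the coefficient of variation (standard deviation divided by mean). The coefficient of variation is not part of the original answer; it is included only as an assumed, optional way to compare spread relative to each variable's center. All numbers are taken from the StatCrunch summary table.

```python
import math

# Values copied from the StatCrunch summary table.
summary = {
    "Q11 (school work)": {"n": 171, "mean": 17.15, "variance": 82.21, "sd": 9.07},
    "Q14 (television)": {"n": 173, "mean": 7.54, "variance": 35.30, "sd": 5.94},
}

cv = {}
for name, s in summary.items():
    sd_from_variance = math.sqrt(s["variance"])   # should match the reported sd
    std_err = s["sd"] / math.sqrt(s["n"])         # standard error of the mean
    cv[name] = s["sd"] / s["mean"]                # coefficient of variation
    print(f"{name}: sd={s['sd']:.2f} (sqrt of variance={sd_from_variance:.2f}), "
          f"std. err.={std_err:.2f}, CV={cv[name]:.2f}")
```

In absolute terms, school work has the larger standard deviation (9.07 versus 5.94). Relative to its mean, television viewing is more spread out, so which variable counts as "more variable" depends on whether the comparison is absolute or relative.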
