STAtistics / Mr. Hansen
Name: _______________________
General Instructions: Raise your hand if you have a question. Write answers in the space provided. If you need additional room, write "OVER" and use reverse side. You may find the provided z table to be helpful.
1. In any normal distribution, approximately what percentage of the data values are between 1.5 and 2.5 standard deviations below the mean? Show some work for partial credit.
(A) 5
(D) 8
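The z table lookup for question 1 can be cross-checked numerically. The following is a minimal sketch using `statistics.NormalDist` from the Python standard library; the variable names are illustrative and not part of the test.

```python
from statistics import NormalDist

# Standard normal distribution (mean 0, s.d. 1). Any normal distribution
# gives the same area once values are expressed as z scores.
std_normal = NormalDist(mu=0.0, sigma=1.0)

# Area between 2.5 and 1.5 standard deviations below the mean:
# P(-2.5 < Z < -1.5) = Phi(-1.5) - Phi(-2.5)
area = std_normal.cdf(-1.5) - std_normal.cdf(-2.5)
percent_between = 100 * area

print(f"{percent_between:.1f}% of values lie in that band")
```

The same subtraction of two table entries is the "work" the question asks for.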
2. Suppose that we have a univariate distribution in which 95% of the values lie within plus or minus 2 standard deviations of the mean. Can we conclude that the distribution is normal?
(A) No. Although the Empirical Rule is satisfied, the Empirical Rule describes what must be true if we have a normal distribution, not conversely.
(B) No, since the Empirical Rule is not satisfied.
(C) Yes, since the Empirical Rule is satisfied.
(D) Yes, even though the Empirical Rule is not satisfied.
(E) Yes. Although checking the Empirical Rule is helpful, and although the Empirical Rule appears to be satisfied in this case, we must always look at the data.
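To make the logic of question 2 concrete, here is a small sketch built on a made-up data set (the values below are hypothetical, chosen only for illustration). It takes only three distinct values, so it is plainly not normal, and the code checks what fraction of it falls within 2 standard deviations of the mean.

```python
from statistics import fmean, pstdev

# Hypothetical data set of 1000 values: 2.5% at -1, 95% at 0, 2.5% at +1.
values = [-1.0] * 25 + [0.0] * 950 + [1.0] * 25

mean = fmean(values)
sd = pstdev(values)  # population standard deviation

# Count the values lying within plus or minus 2 s.d. of the mean.
within_two_sd = sum(1 for v in values if abs(v - mean) <= 2 * sd)
fraction_within = within_two_sd / len(values)

print(f"mean = {mean}, s.d. = {sd:.4f}, fraction within 2 s.d. = {fraction_within}")
```

Here the s.d. is about 0.22, so only the zeros fall inside the band, and they make up exactly 95% of the data.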
3. What is the meaning of the formula
(A) In order to compute a z score, divide the difference between the sample mean and the population mean by the number of standard deviations.
(B) In order to compute a z score, divide the difference between the observed data point and the population mean by the number of standard deviations.
(C) In order to compute a z score, divide the difference between the sample mean and the population mean by the size of the s.d., thus giving an answer that is dimensionless (i.e., has no units).
(D) In order to compute a z score, divide the difference between the observed data point and the population mean by the size of the population s.d., thus giving an answer that is dimensionless (i.e., has no units).
(E) In order to compute a z score, divide the difference between the observed data point and the population mean by the size of the sample s.d., thus giving an answer that is dimensionless (i.e., has no units).
4. On his math SAT, Bill scores 620. What is this as a percentile if the mean is 510 and the s.d. is 98?
(A) 72
(D) 87
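The two steps in question 4 (convert to a z score, then look up the cumulative area) can be sketched as follows. This assumes the scores are normally distributed, which the question does not state outright but which the provided z table suggests; the variable names are illustrative.

```python
from statistics import NormalDist

mean_score = 510
sd_score = 98
bill_score = 620

# Step 1: standardize the score.
z = (bill_score - mean_score) / sd_score

# Step 2: cumulative area to the left of z.
# ASSUMPTION: scores follow a normal distribution.
percentile = 100 * NormalDist().cdf(z)

print(f"z = {z:.2f}, percentile = {percentile:.0f}")
```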
5. On her verbal SAT, Mildred scores 484. What is this as a z score if the mean is 500 and the s.d. is 102?
(A) 0
(D) 0.16
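The arithmetic in question 5 is a single standardization. A minimal sketch, with `z_score` as a hypothetical helper name:

```python
def z_score(observed, mean, sd):
    """Return how many standard deviations `observed` lies from `mean`."""
    return (observed - mean) / sd


mildred_z = z_score(484, 500, 102)

# A score below the mean gives a negative z score.
print(round(mildred_z, 2))
```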
Part II: Short Answer.
Fill in each blank with the word, symbol, or phrase that best fits.
6. Sketch a “good” residual plot for a linear regression and then, underneath, a “bad” residual plot. What should the bad plot be telling us? (Write a sentence.)
7. For males, greater height is associated with reduced comfort in airplane seats. If both quantities are expressed as quantitative variables, we would say that height and comfort level are _____________________ associated.
8. In mathematics, the term parameter means an “adjustable constant” or a value that determines the shape of a relationship between quantities. In statistics, the term parameter means __________________________________________ .
9. The two parameters of a normal distribution, in the mathematical sense, are ______________ , which measures central tendency, and ____________ , which measures dispersion. In other words, these two values together will determine the position and width of the curve.
10. In a linear regression, the value _____ , known as the coefficient of determination, tells us what part (as a decimal fraction) of the variation in the response variable can be explained by variation in the ______________ variable.
11. If r = –.847, we would say that there is a ___________ ___________ ___________ association between the variables in our scatterplot.
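Questions 10 and 11 both turn on the link between the correlation coefficient and the fraction of variation explained. As a small illustration using the value of r from question 11:

```python
r = -0.847

# Squaring r gives the fraction of variation explained; the sign
# (direction of the association) is lost in the process.
r_squared = r ** 2

print(f"r = {r}, r squared = {r_squared:.3f}")
```

So a correlation of this size corresponds to roughly 72% of the variation being explained, whichever direction the association runs.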
Part III. Essay Questions.
12. Describe exactly what is meant by control, randomization, and replication in connection with experimental design. Approximately one or two sentences for each should suffice.
Part IV. Regression Exercise.
Show adequate work and circle your answers. In any problem requiring a graph, be sure to show units and the name of the variable on each axis.
13. The Commonwealth of Verhoovia has determined, over a period of years, that real GP (gross product, a measure of economic output) seems to be negatively related to the income tax rate as a percentage. (“Real” means that the effects of inflation have been canceled out through the use of a constant index value for the currency.) The following data are provided.
Year | Real GP (billions of ...) | Income tax rate (%)
1981 | 1,518 | 14.0
1982 | 1,465 | 15.5
1983 | 1,488 | 16.0
1984 | 1,529 | 15.0
1985 | 1,622 | 13.0
1986 | 1,669 | 12.5
1987 | 1,720 | 11.5
1988 | 1,799 | 11.5
1989 | 1,835 | 11.0
1990 | 1,918 | 12.0
1991 | 1,878 | 12.8
1992 | 1,943 | 12.0
1993 | 1,966 | 12.0
1994 | 2,015 | 11.0
1995 | 2,060 | 10.5
1996 | 2,092 | 10.0
(a) What fraction of the variation in real GP can be explained by the variation in income tax rate? ________
What fraction of the variation in income tax rate can be explained by the variation in real GP? ________
(b) Compute the linear correlation coefficient between tax rate and real GP, and identify your answer using proper notation. ______ = ______________
Is this strong or weak? _________ Negative or positive? __________
(c) Compute the linear correlation coefficient between calendar year and real GP, and identify your answer using proper notation. _______ = ______________
Is this strong or weak? _________ Negative or positive? __________
(d) Use linear regression to estimate the 1997 Verhoovian real GP using a model based on calendar year as the sole explanatory variable. Show adequate work as discussed in class. Circle your answer and give proper units.
(e) Use linear regression to estimate the 1997 Verhoovian real GP using a model based on income tax rate as the sole explanatory variable. (The income tax rate for 1997 was 9.5%.) Show adequate work as discussed in class. Circle your answer and give proper units.
(f) Explain briefly why both (d) and (e) produce unreliable estimates.
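The calculations in problem 13 are normally done with a calculator's regression function, but they can be sketched in plain Python as a cross-check. The data are copied from the table above. The helper `fit_line` and all variable names are illustrative, not part of the test, and the least-squares formulas are the standard textbook ones rather than any particular method prescribed in class.

```python
from math import sqrt

years = list(range(1981, 1997))
real_gp = [1518, 1465, 1488, 1529, 1622, 1669, 1720, 1799,
           1835, 1918, 1878, 1943, 1966, 2015, 2060, 2092]
tax_rate = [14.0, 15.5, 16.0, 15.0, 13.0, 12.5, 11.5, 11.5,
            11.0, 12.0, 12.8, 12.0, 12.0, 11.0, 10.5, 10.0]


def fit_line(x, y):
    """Least-squares line y = intercept + slope * x, plus the correlation r."""
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    syy = sum((yi - y_bar) ** 2 for yi in y)
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    slope = sxy / sxx
    intercept = y_bar - slope * x_bar
    r = sxy / sqrt(sxx * syy)
    return intercept, slope, r


# Parts (a)-(c): correlation, and r squared as the fraction explained.
a_tax, b_tax, r_tax = fit_line(tax_rate, real_gp)
a_year, b_year, r_year = fit_line(years, real_gp)
print(f"tax rate vs. real GP: r = {r_tax:.3f}, r squared = {r_tax ** 2:.3f}")
print(f"year vs. real GP:     r = {r_year:.3f}, r squared = {r_year ** 2:.3f}")

# Parts (d) and (e): plug the 1997 values into each fitted line.
gp_1997_from_year = a_year + b_year * 1997
gp_1997_from_tax = a_tax + b_tax * 9.5
print(f"1997 estimate from year:     {gp_1997_from_year:.0f}")
print(f"1997 estimate from tax rate: {gp_1997_from_tax:.0f}")
```

Because r squared does not depend on which variable is treated as explanatory, both blanks in part (a) come from the same number. Note also that both 1997 inputs (the year 1997 and the 9.5% rate) lie outside the range of the observed data.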