AP Statistics / Mr. Hansen
Test #2, 10/21/2003 (Chapters 3 and 4)

Name: _______________________

General Instructions: Raise your hand if you have a question. Write answers in the space provided. If you need additional room, write "OVER" and use the reverse side. You may find the provided formula sheet and z table helpful.

Part I: Multiple Choice (8 points for each correct answer, 0 points for each omission, –2 points for a wrong guess).
Circle the letter of the best choice. Because of the penalty for wrong answers, it is not to your advantage to guess unless you can positively rule out one or more choices. If you guess, a random guess is recommended.
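
To see why, the expected score from a blind guess can be worked out from the scoring rule above. This is only a sketch: it assumes five equally likely choices per question, and the symbol k (the number of choices you have ruled out) is introduced here purely for illustration.

```latex
% Expected score from a random guess among the 5 - k remaining choices:
% one of them earns +8, the other 4 - k earn -2.
\[
E[\text{score}] \;=\; \frac{1}{5-k}\,(8) \;+\; \frac{4-k}{5-k}\,(-2) \;=\; \frac{2k}{5-k}
\]
% k = 0:  E = 0      (blind guessing gains nothing on average)
% k = 1:  E = 0.5
% k = 2:  E = 4/3
```

With no choices eliminated, the expected gain is zero, so guessing helps only once at least one choice has been ruled out.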

1.   In any normal distribution, approximately what percentage of the data values are between 1.5 and 2.5 standard deviations below the mean?

(A) 5
(B) 6
(C) 7
(D) 8
(E) 9

 

 

 

2.   Suppose that we have a univariate distribution in which 95% of the values lie within plus or minus 2 standard deviations of the mean. Can we conclude that the distribution is normal?

 

(A) No. Although the Empirical Rule is satisfied, the Empirical Rule describes what must be true if we have a normal distribution, not conversely.

(B) No, since the Empirical Rule is not satisfied.

(C) Yes, since the Empirical Rule is satisfied.

(D) Yes, even though the Empirical Rule is not satisfied.

(E) Yes. Although checking the Empirical Rule is helpful, and although the Empirical Rule appears to be satisfied in this case, we must always look at the data.

 

 

3.   What is the meaning of the formula z = (x − μ)/σ?

 

 

 

(A) In order to compute a z score, divide the difference between the sample mean and the population mean by the number of standard deviations.

 

(B) In order to compute a z score, divide the difference between the observed data point and the population mean by the number of standard deviations.

 

(C) In order to compute a z score, divide the difference between the sample mean and the population mean by the size of the s.d., thus giving an answer that is dimensionless (i.e., has no units).

 

(D) In order to compute a z score, divide the difference between the observed data point and the population mean by the size of the s.d., thus giving an answer that is dimensionless (i.e., has no units).

 

(E) None of these.

 

 

4.   On his math SAT, Bill scores 620. What is this as a percentile if the mean is 510 and the s.d. is 98?

(A) 72
(B) 77
(C) 82
(D) 87
(E) 92

 

 

 

5.   On her verbal SAT, Mildred scores 484. What is this as a z score if the mean is 500 and the s.d. is 102?

(A) 0
(B) −0.16
(C) −16
(D) 0.16
(E) 16

 

 

 

[The following question was changed from the original. Taleb’s most famous book did not come out until 2007.]

 

 

6.   In class, we discussed the essential points of Nassim Nicholas Taleb’s bestselling tome The Black Swan, in which the author trashes people who use statistics for economic forecasting. Taleb, himself a multimillionaire as a result of shrewd contrarian investments that paid off big in 1987 and again in 2008, could be characterized as a believer in the concept of . . .

(A) fat tails
(B) Gaussian models
(C) extrapolation
(D) quantitative risk analysis
(E) credit-default swaps


Part II: Short Answer (2 pts./blank, 20 pts. total).
Fill in each blank with the word, symbol, or phrase that best fits.

7.   For males, greater height is associated with reduced comfort in airplane seats. If both quantities are expressed as quantitative variables, we would say that height and comfort level are _____________________ associated.

 

8.   In mathematics, the term parameter means an “adjustable constant” or a value that determines the shape of a relationship between quantities. In statistics, the term parameter means __________________________________________ .

 

9.   The two parameters of a normal distribution, in the mathematical sense, are ______________ , which measures central tendency, and ____________ , which measures dispersion. In other words, these two values together will determine the position and width of the curve. Do these two quantities also serve as parameters in the statistical sense of the word? ______________

 

10.  In a linear regression, the value _____ , known as the coefficient of determination, tells us what part (as a decimal fraction) of the variation in the response variable can be explained by variation in the ______________ variable.

 

11.  If r = –.847, we would say that there is a ___________  ___________  ___________  association between the variables in our scatterplot.

 

Part III: Essay (10 pts.).

12. If r = .0015, can we conclude that there are no patterns in our data? Try to mention at least two types of patterns that could be present even though the r value is so low. Complete sentences are not required, and you may be very concise, especially if you use visual aids.

 

 

 

 

 

 

 

 

 

 

 

 

 


Part IV: Regression Exercise (22 pts.).
Show adequate work and circle your answers. In any problem requiring a graph, be sure to show units and the name of the variable on each axis.

 

13. The Commonwealth of Verhoovia has determined, over a period of years, that real GP (gross product, a measure of economic output) seems to be negatively related to the income tax rate (expressed as a percentage). (“Real” means that the effects of inflation have been canceled out through the use of a constant index value for the currency.) The following data are provided.

 

 


Year    Real GP (billions of constant currency units)    Income tax rate (%)
1981    1,518                                            14.0
1982    1,465                                            15.5
1983    1,488                                            16.0
1984    1,529                                            15.0
1985    1,622                                            13.0
1986    1,669                                            12.5
1987    1,720                                            11.5
1988    1,799                                            11.5
1989    1,835                                            11.0
1990    1,918                                            12.0
1991    1,878                                            12.8
1992    1,943                                            12.0
1993    1,966                                            12.0
1994    2,015                                            11.0
1995    2,060                                            10.5
1996    2,092                                            10.0

 

(a)  What fraction of the variation in real GP can be explained by the variation in income tax rate? ________

      What fraction of the variation in income tax rate can be explained by the variation in real GP? ________

 

(b)  Compute the linear correlation coefficient between tax rate and real GP, and identify your answer using proper notation.
      ______ = ______________

Is this strong or weak? _________ Negative or positive? __________

 

(c)  Compute the linear correlation coefficient between calendar year and real GP, and identify your answer using proper notation.

 

      _______ = ______________

Is this strong or weak? _________ Negative or positive? __________

 

(d)  Use linear regression to estimate the 1997 Verhoovian real GP using a model based on calendar year as the sole explanatory variable. Show adequate work as discussed in class. Circle your answer and give proper units.

 

 

 

 

 

(e)  Use linear regression to estimate the 1997 Verhoovian real GP using a model based on income tax rate as the sole explanatory variable. (The income tax rate for 1997 was 9.5%.) Show adequate work as discussed in class. Circle your answer and give proper units.

 

 

 

 

 

(f)   Explain briefly why both (d) and (e) produce unreliable estimates.

 

 

(g)  The use of economic statistics to “prove” that one tax rate policy choice is better for the economy than another is commonplace. Incorporate what you have learned from this little exercise into a paragraph describing some of the problems that people who use economic statistics to argue for or against tax rate adjustments (as well as the people who listen to them) should be aware of.