AP Statistics / Mr. Hansen
11/11/2004

Name: _________________________

Test on Chapters 3 and 4
(and recent classroom discussions)

 

Part I. Fill in the blanks (3 pts. each).

 

 

1.

The __________________ of a survey should really be called the “margin of sampling error,” since other types of error are not included and can often be significant. For example, the __________________ of the questions on a survey can play a large role in the outcome of the data collected. Proof of that appears in a Nov. 6, 2004, New York Times op-ed piece written by Gary Langer, the director of polling for ABC News, in which he explained how the recent reporting of “moral values” as a campaign issue suddenly of greater concern to voters may be completely phony, a mere artifact of the choices posed by the exit polls.

 

 

2.

In any linear least-squares regression, the __________________ (each one of which is computed by subtracting the predicted y value from the actual y value) always add up to 0.

 

 

3.

The LSRL is the unique line that minimizes the __________________ of __________________ residuals.

 

 

4.

“Transformations to achieve linearity” is the general procedure for finding a nonlinear function that does a good job of fitting the points on a scatterplot. For example, suppose that we have a good idea that the fit is exponential, i.e., that y ≈ ab^x for suitable constants a and b. We begin by taking the __________________ of both sides (since that is the inverse of exponentiation) and then performing a LSRL fit to estimate slope and intercept values for predicting the __________________ of y.

In a similar way, we could find a curve that would fit any other nonlinear situation, provided the hypothesized predictor function is invertible. Suppose that we have good reason to believe that y ≈ f(x), where f is an invertible function. We begin by applying __________________ to both sides, so that the right-hand side becomes either x or a simple linear function of x. We then apply the __________________ procedure to find constants b0 and b1 for intercept and slope. We now have a model that says f^(-1)(y) ≈ b0 + b1x, and we can apply the function f to both sides. Our final conclusion is that y ≈ f(b0 + b1x), which we can use as our model for prediction purposes.
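For anyone reviewing this problem with software rather than a calculator, the procedure can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the helper name lsrl, the function predict, and the data values are made up for the example and are not part of the test. The data are generated from y = 2(1.5)^x, so the recovered constants should come out close to a = 2 and b = 1.5.

```python
import math

def lsrl(xs, ys):
    """Return (intercept, slope) of the least-squares line of ys on xs.
    Hypothetical helper written for this sketch."""
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    sxx = sum((x - x_bar) ** 2 for x in xs)
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return y_bar - slope * x_bar, slope

# Made-up illustrative data lying exactly on y = 2 * 1.5^x
xs = [0, 1, 2, 3, 4, 5]
ys = [2 * 1.5 ** x for x in xs]

# Transform y, then fit a straight line to the transformed values
b0, b1 = lsrl(xs, [math.log(y) for y in ys])

# Undo the transformation: ln y = b0 + b1*x  means  y = e^b0 * (e^b1)^x
a, b = math.exp(b0), math.exp(b1)

def predict(x):
    """Prediction on the original (untransformed) scale."""
    return a * b ** x

print(a, b, predict(6))
```

With real data the points will not lie exactly on the curve, so a residual plot of the transformed data is still needed to judge whether the model is reasonable.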

 

 

5.

In any LSRL model, the point __________________ must lie on the graph of the predictor line, even if that point is not present in the data shown on the scatterplot. Suppose, however, that the point mentioned is actually a data point. (That could happen, although it is rare in real-world data sets.) In that case, how likely is the point to be a regression outlier? __________________ How likely is the point to be an influential observation? __________________ (For each of the last two blanks, please answer with “totally impossible,” “unlikely,” “somewhat likely,” “very likely,” or “virtually certain.”)

 

 

6.

Notation check: The standard deviation of the explanatory variable in a scatterplot is denoted ___________ , and the standard deviation of the response variable is denoted ___________ . The predicted value of the response variable is denoted ___________ , while the actual value is denoted ___________ .

 

 

7.

Let lowercase letters a, b, etc. denote the parameters of a curve fitting. A quadratic fit has the general equation __________________ , while a power fit has the general equation __________________ .

 

 

8.

A difference that is too large to be plausibly explained by chance alone is said to be ____________________________________ .

 

 

 

 

 

 

 

Part II. Essays (12 pts. each). Complete sentences are not required. A literate, clear presentation is required, however.

 

 

9.(a)

In the 2000 presidential election, the popular vote for Gore exceeded that for Bush by a statistically significant margin. However, when the electoral votes were aggregated, Bush won by a slim margin. Explain how this phenomenon is an example of a statistical paradox we have studied. (In other words, don’t merely give the vocabulary term; also explain why the term is appropriate.)

 

 

 

 

 

 

 

 

 

 

(b)

What are the coefficients r and r² in the LSRL context? Give their full names and describe, in approximately one sentence each, what each one signifies.

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

Part III. Free response (24 pts. total).

 

 

Problems 10-14 refer to the following table. Show work underneath or on a blank sheet of paper.

 

 

 

 

 

Men’s Shoe Size     Weight (lbs.)
8.5                 105
9                   110
9.5                 120
10                  130
11                  152
11.5                165
12                  175
12.5                190
13                  200
14                  222

 

 

 

 

 

 

 

 

10.

Make a scatterplot in which weight is the response variable.

 

11.

State 3 models (equations) that would be of possible value as predictors. Compute the parameters of each model, and identify the models by name. No work need be shown.

 

12.

Determine which of your 3 models is “best” in the sense of being most useful and most in accordance with the physical processes underlying the data. Support your answer with words, equations, and/or diagrams, whichever is appropriate.

 

13.

Predict the weight of a man whose shoe size is 10.5, to the nearest pound.

 

14.

Predict the shoe size associated with a man who weighs 180 lbs. Give your answer to the nearest tenth.
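For checking calculator work on Problems 10-14 afterward, the following Python sketch fits three candidate models to the table above. It is one possible approach, not an answer key: the helper lsrl, the three predictor function names, and the choice of linear, exponential, and power models are assumptions made for illustration. The exponential and power fits are obtained by fitting a line to transformed data, not by a direct nonlinear fit.

```python
import math

def lsrl(xs, ys):
    """Return (intercept, slope) of the least-squares line of ys on xs.
    Hypothetical helper written for this sketch."""
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    sxx = sum((x - x_bar) ** 2 for x in xs)
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return y_bar - slope * x_bar, slope

# Data copied from the table in Part III
sizes = [8.5, 9, 9.5, 10, 11, 11.5, 12, 12.5, 13, 14]
weights = [105, 110, 120, 130, 152, 165, 175, 190, 200, 222]

log_sizes = [math.log(x) for x in sizes]
log_weights = [math.log(y) for y in weights]

# Linear model: weight = b0 + b1 * size
lin_b0, lin_b1 = lsrl(sizes, weights)

# Exponential model: ln(weight) = c0 + c1 * size
exp_c0, exp_c1 = lsrl(sizes, log_weights)

# Power model: ln(weight) = d0 + d1 * ln(size)
pow_d0, pow_d1 = lsrl(log_sizes, log_weights)

def linear(x):
    return lin_b0 + lin_b1 * x

def exponential(x):
    return math.exp(exp_c0 + exp_c1 * x)

def power(x):
    return math.exp(pow_d0 + pow_d1 * math.log(x))

# Predicted weight at shoe size 10.5 under each candidate model
for name, model in [("linear", linear), ("exponential", exponential), ("power", power)]:
    print(name, round(model(10.5)))
```

Which model is “best” still has to be argued from residual plots and from the physical reasoning requested in Problem 12; the code only supplies the parameter estimates.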