Test on Chapters 3 and 4

AP Statistics / Mr. Hansen
11/11/2004

Name: _________________________

Test on Chapters 3 and 4
(and recent classroom discussions)

		Part I. Fill in the blanks (3 pts. each).

1.		The __________________ of a survey should really be called the “margin of sampling error,” since other types of error are not included and can often be significant. For example, the __________________ of the questions on a survey can play a large role in the outcome of the data collected. Proof of that appears in a Nov. 6, 2004, New York Times op-ed piece written by Gary Langer, the director of polling for ABC News, in which he explained how the recent reporting of “moral values” as a campaign issue suddenly of greater concern to voters may be completely phony, a mere artifact of the choices posed by the exit polls.

2.		In any linear least-squares regression, the __________________ (each one of which is computed by subtracting the predicted y value from the actual y value) always add up to 0.

3.		The LSRL is the unique line that minimizes the __________________ of __________________ residuals.

4.		“Transformations to achieve linearity” is the general procedure for finding a nonlinear function that does a good job of fitting the points on a scatterplot. For example, suppose that we have a good idea that the fit is exponential, i.e., that y ≈ ab^x for suitable constants a and b. We begin by taking the __________________ of both sides (since that is the inverse of exponentiation) and then perform a LSRL fit to estimate slope and intercept values for predicting the __________________ of y. In a similar way, we could find a curve that would fit any other nonlinear situation, provided the hypothesized predictor function is invertible. Suppose that we have good reason to believe that y ≈ f(x), where f is an invertible function. We begin by applying __________________ to both sides, so that the right-hand side becomes either x or a simple linear function of x. We then apply the __________________ procedure to find constants b₀ and b₁ for the intercept and slope. We now have a model that says f⁻¹(y) ≈ b₀ + b₁x. Applying f to both sides, our final conclusion is that y ≈ f(b₀ + b₁x), which we can use as our model for prediction purposes.

5.		In any LSRL model, the point __________________ must lie on the graph of the predictor line, even if that point is not present in the data shown on the scatterplot. Suppose, however, that the point mentioned is actually a data point. (That could happen, although it is rare in real-world data sets.) In that case, how likely is the point to be a regression outlier? __________________ How likely is the point to be an influential observation? __________________ (For each of the last two blanks, please answer with “totally impossible,” “unlikely,” “somewhat likely,” “very likely,” or “virtually certain.”)

6.		Notation check: The standard deviation of the explanatory variable in a scatterplot is denoted ___________ , and the standard deviation of the response variable is denoted ___________ . The predicted value of the response variable is denoted ___________ , while the actual value is denoted ___________ .

7.		Let lower case letters a, b, etc. denote the parameters of a curve fit. A quadratic fit has the general equation __________________ , while a power fit has the general equation __________________ .

8.		A difference that is too large to be plausibly explained by chance alone is said to be ____________________________________ .



		Part II. Essays (12 pts. each). Complete sentences are not required. A literate, clear presentation is required, however.

9.(a)		In the 2000 presidential election, the popular vote for Gore exceeded that for Bush by a statistically significant margin. However, when the electoral votes were aggregated, Bush won by a slim margin. Explain how this phenomenon is an example of a statistical paradox we have studied. (In other words, don’t merely give the vocabulary term; also explain why the term is appropriate.)





(b)		What are the coefficients r and r² in the LSRL context? Give their full names and describe, in approximately one sentence each, what each one signifies.





	Part III. Free response (24 pts. total).
	Problems 10-14 refer to the following table. Show work underneath or on a blank sheet of paper.

	Men’s Shoe Size		Weight (lbs.)
	8.5		105
	9		110
	9.5		120
	10		130
	11		152
	11.5		165
	12		175
	12.5		190
	13		200
	14		222

10.	Make a scatterplot in which weight is the response variable.
11.	State 3 models (equations) that would be of possible value as predictors. Compute the parameters of each model, and identify the models by name. No work need be shown.
12.	Determine which of your 3 models is “best” in the sense of being most useful and most in accordance with the physical processes underlying the data. Support your answer with words, equations, and/or diagrams, whichever is appropriate.
13.	Predict the weight of a man whose shoe size is 10.5, to the nearest pound.
14.	Predict the shoe size associated with a man who weighs 180 lbs. Give your answer to the nearest tenth.