STAtistics / Mr. Hansen

Name: _______________________________________

10/18/2010

Elapsed time (should be 50 min., or 75 min. w/ extra time): _____

 

Test #2: Take-Home Version

Please read: Calculator is OK throughout. Point values are shown in parentheses. If a blank is provided, give the short answer that fits best. If a gap is provided, provide justification/explanation to show that you know what you are doing.

 

1.
(3+2)

The symbol  stands for _________________________________________ . Is this statistic resistant, or is it influenced by outliers? _____________________

 

 

2.
(6+5)

In the case of univariate (1-variable) data, which is what we were working with in Chapter 1, outliers could be identified by the following rule of thumb:

___________________________________________________________________________

___________________________________________________________________________

___________________________________________________________________________

___________________________________________________________________________

How do we identify outliers in a LSRL setting (i.e., when scatterplots are involved)?

___________________________________________________________________________

___________________________________________________________________________

___________________________________________________________________________

___________________________________________________________________________

 

 

3.
(2+2)

Sometimes, especially if a data point has an x value that is near the _____________ of the domain of x values, it is possible for a regression outlier to have little or no effect on the LSRL slope or r value. In other words, the point may not be __________________________________ (use the term or phrase that we learned).

 

 

4.
(3+2)

The symbol  represents _________________________________________ . Is it the same as the mean squared deviation from the population mean? (Circle one.) YES  NO

 

 

5.
(2+2)

The LSRL is the unique line that minimizes the __________ of the __________ residuals for a given scatterplot (y versus x).

 

 

6.
(4·2=8)

The term residual refers to __________ minus __________ value. What we mean is this: A data point that far exceeds the value predicted by the LSRL model (or any other regression model: quadratic, exponential, logarithmic, custom, etc.) would have a large (circle one) positive   negative   residual, while a data point that is far below the value predicted by the model would have a large (circle one) positive   negative   residual.

 

 


 

7.

Recall that in class, we fitted a LSRL model to derive a rule of thumb for dating. Here are the raw data that we used:

 

 

 

Man’s Age    Minimum Age for Woman
20           18
25           21
52           35
40           29
70           45
18           16
30           23

 

 

 

 

 

(a)
(8)

State the LSRL. Be sure to define your variables.

 

 

 

 

 

 

 

 

(b)
(10)

Prove that a linear model is appropriate for the domain [18, 70]. Warning: Although you should compute and describe the meaning of the r value here as part of your proof, you need more.

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

(c)
(3)

What is the numeric value of the LSRL slope? __________ Interpret this number in the context of the problem.

___________________________________________________________________________

___________________________________________________________________________

___________________________________________________________________________

 

 

(d)
(2+3)

What is the numeric value of the LSRL y-intercept? __________ Although the intercept is sometimes of interest, this particular model’s intercept is of no particular value all by itself. Why not?

___________________________________________________________________________

___________________________________________________________________________

___________________________________________________________________________

 

 

8.

Suppose that we have a scatterplot and an exponential fit showing an extremely strong exponential correlation between our x and y values.

 

 

(a)
(2+4)

Can we conclude that there is a cause-and-effect relationship between x and y? ____ Give at least two reasons to support your answer. Try to use the terminology used in the textbook reading.

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

(b)
(2)

Can we use the exponential model to compute a believable ŷ if an x value is specified, provided we do not extrapolate? ____

 

 

(c)
(2+2)

Can we use the exponential model to compute a believable x if a y (technically ŷ) value is specified, provided we do not extrapolate? ____ Give a short reason for your answer.

___________________________________________________________________________

 

 

9.
(4+4)

Sketch a scatterplot (with LSRL overlaid) for which the model is ŷ = –.4x + 2, with r = –.02. You will need a few tick marks in order to convince me that your slope is reasonable.


 

10.

Suppose that we have performed a LSRL fit and have sketched the residual plot.

 

 

(a)
(4)

What should a “bowl-shaped” residual plot tell us, even if the r value is close to 1 or –1?

 

 

 

 

 

 

(b)
(3)

What should a wavy (sinusoidal) residual plot tell us?

 

 

 

 

 

 

11.
(15)

Describe what is meant by the term lurking variable. Then describe all the ways you can think of for reducing or eliminating the impact of lurking variables on an experiment.