Name: ____________________________
Take-Home Test on Chapters 12-14 (omitting §14.2)
Ground Rules: Collaboration is acceptable, but copying is not allowed. If you have nearly completed your test (i.e., have made a good stab at all problems, with most of the work done), you can get some assistance from your teacher in class on Monday, May 1. All work shown must be your own. For full credit, work all problems thoroughly, showing sufficient work, hypotheses, assumptions, conclusion in the context of the problem, etc. If you have any questions on what constitutes full credit, send voice mail to 703.599.6624 or e-mail.
1. A disease in the news
As you may know, New York City Mayor Rudolph Giuliani announced on April 27 that he has prostate cancer. On the Web, visit the New England Journal of Medicine (www.nejm.org) and read the abstract of the article that appeared on Dec. 9, 1999, entitled "Immediate Hormonal Therapy Compared with Observation after Radical Prostatectomy and Pelvic Lymphadenectomy in Men with Node-Positive Prostate Cancer."
(a) [Answer part (a) only if your last name starts with a letter from the first half of the alphabet (A-M).]
(b) [Answer part (b) only if your last name starts with a letter from the second half of the alphabet (N-Z).] Is there convincing evidence that immediate antiandrogen therapy (with either goserelin or bilateral orchiectomy) produces a higher success rate, as compared to patients who are merely observed, for prostate cancer patients of the type represented in the study? By "success" we mean that the patient is still alive, shows no evidence that the prostate cancer has returned, and has a normal blood PSA level.
(c) [All students must answer.] Showing all work, compute a 90% confidence interval for the true difference between the death rates, if you did part (a), or success rates, if you did part (b). Is this difference of practical significance? ____
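The interval in part (c) can be checked with a short script. The following is a minimal sketch, assuming the usual large-sample (Wald) interval for a difference of two proportions; the function name `two_proportion_ci` and the counts in the example call are hypothetical placeholders, not figures from the NEJM abstract.

```python
from math import sqrt
from statistics import NormalDist


def two_proportion_ci(x1, n1, x2, n2, level=0.90):
    """Large-sample (Wald) confidence interval for p1 - p2.

    x1, x2 are the counts of "successes" (or deaths) in each group;
    n1, n2 are the group sizes.
    """
    p1, p2 = x1 / n1, x2 / n2
    # Critical value z* that leaves (1 - level) / 2 in each tail.
    z_star = NormalDist().inv_cdf(1 - (1 - level) / 2)
    # Unpooled standard error, as is standard for an interval (not a test).
    se = sqrt(p1 * (1 - p1) / n1 + p2 * (1 - p2) / n2)
    diff = p1 - p2
    return diff - z_star * se, diff + z_star * se


# Hypothetical counts for illustration only (NOT from the article).
low, high = two_proportion_ci(30, 50, 20, 50)
print(f"90% CI for p1 - p2: ({low:.3f}, {high:.3f})")
```

Replace the placeholder counts with the ones reported in the abstract, and check that each group has enough successes and failures for the normal approximation before trusting the result.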
2.
As you may know, there has been an organized effort in recent years to change people’s ideas on how infants should be placed in their cribs. Please visit the Web site for the Journal of the American Medical Association (www.jama.com) and skim the article in the April 26, 2000, issue entitled "Factors Associated With Caregivers' Choice of Infant Sleep Position, 1994-1998: The National Infant Sleep Position Study." You need not read the article completely.
(a) Explain in everyday language what the article means by the terms supine, prone, and lateral. For this and other questions, you may wish to use a string search command to locate terms in context within the article.
(b) Explain in everyday language what the American Academy of Pediatrics (AAP) currently recommends for infant sleep position.
(c) What do you think is meant by "reweighting"? Why was it appropriate to consider performing a weighted analysis with this sample, and why did the researchers ultimately decide not to?
(d) The left side of the article window contains a link for "Index of Figures and Tables." Follow this link so that you can access Table 1 (Frequency of Reported Sleep Position Recommendations). Using only the clues found on the Web page for Table 1, demonstrate how the P-values of 0.002 and 0.10 were obtained in the rightmost column. Show a little bit of work, enough to indicate how you computed these values and to prove that they did not appear by magic. This is a fairly challenging problem that will require some thinking on your part. If you are stuck, leave a message at 703.599.6624 or by e-mail, and a hint will be phoned or e-mailed back to you within 24 hours. (If you want a phone reply, indicate the best time to call. If you want an e-mail reply, be sure to state your e-mail address clearly.)
(e) Roll a fair die, or if none is available, select a random integer from 1 to 6 with your calculator. Don’t use someone else’s roll; use your own! Take the first result you get. Based on the outcome, examine one of the 6 "all sources" rows at the very bottom of the table. (For example, if you rolled a 5, you would examine the row for "Prone and other" recommendations from ALL SOURCES.) Is there evidence of a change across the 3 time points in the likelihood that nighttime infant caregivers have heard this recommendation? Describe what test you will use, check your assumptions, state hypotheses, show sufficient work, compute the P-value, and state a conclusion.
(f) Did the likelihood you examined in part (e) seem to increase or decrease between 1994 and 1997-98?
(g) Explain why your P-value from part (e) does not address the question in part (f).
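One way to check the arithmetic in part (e) is sketched below. It assumes a chi-square test of homogeneity on a 2 x 3 table (heard / did not heard the recommendation, across 3 time points); that choice of test, the function name `chi_square_2x3`, and all counts in the example call are assumptions for illustration, not values from Table 1.

```python
from math import exp


def chi_square_2x3(heard, totals):
    """Chi-square test of homogeneity for a 2 x 3 table.

    heard  -- number who reported the recommendation at each time point
    totals -- number surveyed at each time point
    Returns (chi-square statistic, P-value, smallest expected count).
    """
    not_heard = [t - h for h, t in zip(heard, totals)]
    grand_total = sum(totals)
    chi2 = 0.0
    min_expected = float("inf")
    for row in (heard, not_heard):
        row_total = sum(row)
        for observed, col_total in zip(row, totals):
            expected = row_total * col_total / grand_total
            min_expected = min(min_expected, expected)
            chi2 += (observed - expected) ** 2 / expected
    # df = (2 - 1)(3 - 1) = 2, and for 2 degrees of freedom the
    # upper-tail chi-square probability is exactly exp(-x / 2).
    p_value = exp(-chi2 / 2)
    return chi2, p_value, min_expected


# Hypothetical counts for illustration only (NOT from Table 1).
chi2, p_value, min_expected = chi_square_2x3([120, 150, 180], [400, 400, 400])
print(f"chi-square = {chi2:.2f}, P = {p_value:.5f}, "
      f"smallest expected count = {min_expected:.1f}")
```

The smallest expected count is returned so that the usual rule of thumb for the chi-square approximation can be checked alongside the P-value.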
3. Every teacher’s dream, every student’s nightmare
The following table shows real data taken from the most recent Honors AP Calculus (HappyCal) test. The test consisted of two portions, namely a short memorization portion (25 points) and a much longer multiple-choice and free-response portion (75 points). The second part was much more stressful for the students and was much more difficult to grade. In fact, the second part almost did not make the 5-class-day grading cutoff, but that’s another story.
Student # | Score on Part 1 | Score on Part 2
1 | 21 | 38
2 | 21 | 60
3 | 22.5 | 49
4 | 23 | 53
5 | 23 | 46
6 | 23 | 47
7 | 24 | 53
8 | 25 | 51
9 | 25 | 58
10 | 26 | 59
11 | 26 | 59
12 | 26 | 55
13 | 26 | 54
14 | 26 | 55
15 | 26 | 62
(a) As a teacher, I would love to be able to say that the short (and easy to grade) Part 1 is a valid predictor of how a student will do on Part 2. However, only about ____ % of the variation in Part 2 scores can be explained by ________________________ . The rest of the variation in Part 2 scores is attributable to _________________ . Fill in the blanks and explain (very briefly) what you did to compute the first blank (no work needed): _________________________________
(b) Just because Part 1 is not a very good predictor of Part 2 doesn’t mean that Part 1 is worthless for prediction. In fact, a linear fit shows that we can estimate a student’s Part 2 score (y) from his Part 1 score (x) using the equation ___________________ .
Show the residual plot here:
Do the residuals give any reason to doubt the assumptions for regression inference? Re-read (or read for the first time) the 3 paragraphs with boldface headings on pp.774-775 of your text and the 2 paragraphs that follow, and perform the suggested tests on a separate sheet of paper.
(c) Does your regression model from part (b) have any regression outliers? ___ If so, what are they? _____________________
(d) Does your regression model from part (b) have any influential observations? ___ If so, what are they? ___________________
(e) I happen to know that student #2 is an A+ student who had a fluke low score on Part 1. If I omit him from the regression model, does the fit improve? ____ Why do you say that? __________________ What percentage of variation in Part 2 scores is now explained by variation in Part 1 scores (no work needed)? ______ Does the degree to which the model fits the assumptions for regression inference seem to change for the better? ____
Display the new residual plot:
Display a new histogram, stemplot, or NQP (your choice) of the residuals:
Keeping in mind that we will be making inference about the slope (since that is what is on the AP syllabus), not a prediction interval for a single response, is there any reason to doubt the validity of the assumptions for regression inference? ____
(f) Describe what the new slope means in the context of the problem.
(g) Compute a 95% confidence interval for the true slope of the regression line for the model that omits student #2. (Treat the remaining data points as an SRS of all possible student results on Part 1 and Part 2.) Show all of your work, including the computation of sb, the standard error of the slope. This computation is quite messy unless you use the hint below.
Hint: You may (if you wish) use the following facts to speed up the computation of sb enormously: b is now _______ , and a computer printout from Minitab or SAS or some other package would tell you that the t statistic for the slope is 6.0636. Now that you know b and t, you should be able to compute sb, but if you’re still stuck, Doug Bemis and Ned Bartlett figured out how to do this during F period on 4/27/00. If you ask them nicely, they might help you. While you’re at it, ask Doug to add this feature to his program.
(h) Is there evidence that the true slope is positive? (If you wish, you may perform a significance test and write out all the steps. However, there is a much simpler way of answering this question in a single sentence if you take advantage of your earlier work.)
(i) Is there evidence that the true correlation coefficient is positive? (Again, if you wish, you may perform a significance test and write out all the steps. However, there is a much simpler way of answering this question in a single sentence if you take advantage of your earlier work.)
(j) In the box on p.762 of your text is a formula for the standard error of the slope (sb, or what your book calls SEb). Describe in a short phrase what s represents.
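Hand or calculator work for Problem 3 can be cross-checked with the sketch below, which fits the least-squares line to the table data with and without student #2. It assumes the textbook formula is the usual SEb = s / sqrt(sum of (x - x-bar)^2) with s based on n - 2 degrees of freedom; the helper name `fit_line` and the table-lookup constant `T_STAR` are introduced here for illustration.

```python
from math import sqrt
from statistics import mean

# Scores as listed in the table above (students 1-15, in order).
part1 = [21, 21, 22.5, 23, 23, 23, 24, 25, 25, 26, 26, 26, 26, 26, 26]
part2 = [38, 60, 49, 53, 46, 47, 53, 51, 58, 59, 59, 55, 54, 55, 62]


def fit_line(x, y):
    """Least-squares fit of y on x, plus quantities used for slope inference."""
    n = len(x)
    x_bar, y_bar = mean(x), mean(y)
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    syy = sum((yi - y_bar) ** 2 for yi in y)
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    b = sxy / sxx                  # slope
    a = y_bar - b * x_bar          # intercept
    residuals = [yi - (a + b * xi) for xi, yi in zip(x, y)]
    s = sqrt(sum(e ** 2 for e in residuals) / (n - 2))  # std. error about the line
    se_b = s / sqrt(sxx)           # assumed form of the textbook's SEb
    return {"a": a, "b": b, "r2": sxy ** 2 / (sxx * syy),
            "se_b": se_b, "t": b / se_b, "residuals": residuals}


full = fit_line(part1, part2)

# Omit student #2 (list index 1), as in parts (e)-(g).
x14 = part1[:1] + part1[2:]
y14 = part2[:1] + part2[2:]
reduced = fit_line(x14, y14)

# The hint in part (g): since t = b / sb, it follows that sb = b / t.
se_from_hint = reduced["b"] / 6.0636

# 95% critical value for 12 degrees of freedom, read from a t table.
T_STAR = 2.179
ci = (reduced["b"] - T_STAR * reduced["se_b"],
      reduced["b"] + T_STAR * reduced["se_b"])

for label, fit in (("all 15 students", full), ("omitting student #2", reduced)):
    print(f"{label}: y-hat = {fit['a']:.3f} + {fit['b']:.3f}x, "
          f"r^2 = {fit['r2']:.3f}, t = {fit['t']:.4f}")
print(f"sb from fit = {reduced['se_b']:.4f}, sb from hint = {se_from_hint:.4f}")
print(f"95% CI for slope: ({ci[0]:.3f}, {ci[1]:.3f})")
```

The `residuals` entry of each result can be plotted against `part1` (or shown in a stemplot) to produce the residual displays requested in parts (b) and (e).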
BONUS
Suppose that we have a linear regression model with the following definitions:
1. y (the dependent variable) is estimated by a + bx
2. n denotes the number of data points (ordered pairs)
3. sr denotes the sample standard deviation of the residuals
4. sx denotes the sample standard deviation of the xi’s
5. sb denotes the standard error of the slope as stated in the box on p.762
Prove: ![]()
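As a sketch of how the five definitions fit together, the chain below assumes that the boxed formula on p.762 is the usual SEb = s / sqrt(sum of (x_i - x-bar)^2), with s^2 equal to the sum of squared residuals divided by n - 2, and that both sample standard deviations use the n - 1 divisor. It shows one identity consistent with those definitions; it is not necessarily the exact statement intended for the proof.

```latex
% Assumed starting points (e_i denotes the i-th residual, whose mean is 0
% for a least-squares line with an intercept):
%   s^2   = \frac{1}{n-2} \sum e_i^2
%   s_r^2 = \frac{1}{n-1} \sum e_i^2
%   s_x^2 = \frac{1}{n-1} \sum (x_i - \bar{x})^2
\begin{align*}
s &= s_r \sqrt{\frac{n-1}{n-2}}, \qquad
\sqrt{\sum (x_i - \bar{x})^2} = s_x \sqrt{n-1}, \\[4pt]
s_b &= \frac{s}{\sqrt{\sum (x_i - \bar{x})^2}}
     = \frac{s_r \sqrt{(n-1)/(n-2)}}{s_x \sqrt{n-1}}
     = \frac{s_r}{s_x \sqrt{n-2}} .
\end{align*}
```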