AP Statistics / Mr. Hansen

Name: _________________________________

Test #7 SAMPLE (4/14/99)

ANSWER KEY TO THE TWO SETS OF SAMPLE PROBLEMS (dated 3/18/98 and 4/16/98)

3/18/98 problem set (Sections 7.1, 7.2, 8.1, 8.2)

1. This is a 1-sample t interval (STAT TESTS 8).

2. This is a 1-sample t test (STAT TESTS 2).

3a. This is a 2-sample t test (STAT TESTS 4).

b. There is probably no justification for using pooled variances here. Because the sample standard deviations are quite unequal (3.094 for men, 4.071 for women), it appears unlikely that the population standard deviations are equal. Admittedly, the samples are small enough that a difference of this size could be due to chance, but don’t rely on the F test to settle the question.

c. This is a 2-sample t interval (STAT TESTS 0).

d. Together the samples are large enough (total of 23). There are no outliers in either sample. Normal quantile plots show that the distribution is skewed right for men but essentially normal for women. It’s a judgment call whether the skewness for men is great enough to violate the conditions, but we’re probably safe, especially since the similarity of sample sizes (13 vs. 10) helps to increase the robustness of the 2-sample t procedures.

4a, b. Each of these is a 1-proportion z interval (STAT TESTS A). Verify that the np and n(1–p) conditions are satisfied.

c. If the sample was an SRS from the population of interest (note: this is not stated), then we can say that roughly half (46% ± 3%) of the people favor testing all citizens, while a much larger proportion (87% ± 2%) favor testing doctors and dentists. The stated margins of error (3% and 2%, respectively) reflect possible sampling error and provide 95% confidence.

5. This is a 2-proportion z test (STAT TESTS 6). Verify that the np and n(1–p) conditions are satisfied. Also, we must have 2 independent SRS’s (note: this is not stated). Let p1 = true proportion of the theoretical universe of people that would favor proposal #1 if asked, p2 = true proportion of the theoretical universe of people that would favor proposal #2 if asked. [Minor point: The fact that the questions were posed to subjects drawn from the same actual population is irrelevant here, since the theoretical populations are what is important. If our country’s population is large, it is possible to draw 2 independent SRS’s as required.]
-- H0: p1 = p2
-- Ha: p1 > p2
The 2-proportion z test shows good evidence (phat1=0.753, phat2=0.689, z=2.471, P=0.0067) for rejecting the null hypothesis in favor of the 1-sided alternative. In plain English, the answer to the question posed is, "Yes, at the .01 level of significance, the data do suggest that people are more likely to favor the proposed wording in question #1." However, all of this breaks down if we don’t have 2 independent SRS’s.
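
As a cross-check on the P-value quoted in #5, here is a minimal Python sketch that converts the reported z statistic into a 1-sided P-value. It starts from z=2.471 as given above rather than from the raw counts, which are not reproduced in this key. The variable names are illustrative only.

```python
from statistics import NormalDist

z = 2.471  # test statistic reported in #5

# Ha: p1 > p2 is 1-sided, so the P-value is the area to the right of z
# under the standard normal curve.
p_value = 1 - NormalDist().cdf(z)

print(round(p_value, 4))  # 0.0067
```

Since 0.0067 < 0.01, this agrees with rejecting H0 at the .01 level.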

6. From the t table, the 1-sided P-value lies between 0.025 and 0.05. Therefore, the 2-sided P-value requested must satisfy 0.05 < P < 0.10. Using your TI-83’s tcdf function, you can compute P ≈ 2(0.033235) = 0.06647, but that’s not what the problem asked for.

7. If σ is known (which essentially never happens except in certain fake textbook problems), use z procedures. Otherwise (i.e., 100% of the time in the real world), use t procedures. [Note that the question is asking about estimating the population mean. To estimate a population proportion, you would of course use z procedures since they give a reasonable approximation of the binomial distribution. Or, with your TI-83, there are many occasions where you could literally just use the binomial alone. Trouble is, the binomial is more awkward to work with, so we normally prefer z procedures for proportions whenever the np and n(1–p) conditions hold.]
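
To illustrate the bracketed remark in #7, the following Python sketch compares an exact binomial probability with its normal (z) approximation. The values n=100, p=0.5, and k=45 are hypothetical and do not come from any problem on the test. The continuity correction (the "+ 0.5") is an extra refinement not mentioned above.

```python
from math import comb
from statistics import NormalDist

# Hypothetical setup (not from the test): n trials, success probability p.
n, p = 100, 0.5
k = 45  # we want P(X <= k)

# Exact binomial probability, summing the individual terms.
exact = sum(comb(n, i) * p**i * (1 - p)**(n - i) for i in range(k + 1))

# Normal approximation with a continuity correction (an assumption of
# this sketch, not something the answer key requires).
mu = n * p
sigma = (n * p * (1 - p)) ** 0.5
approx = NormalDist(mu, sigma).cdf(k + 0.5)

print(f"exact binomial: {exact:.4f}")
print(f"normal approx:  {approx:.4f}")
```

Here np and n(1–p) are both 50, so the conditions hold comfortably and the two answers should be close.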

8. From the t table (with df=29), you can read off t*=1.311. Double-check: If you have the INVT program loaded, plug in a cumulative probability of p=0.9 (so as to leave 0.1 in the right-hand tail) and df=29 to get t*=1.311433647.
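
If you have neither a t table nor the INVT program handy, the critical value in #8 can be reproduced with a short Python sketch. This is one possible approach, not the method INVT uses: it integrates the t density numerically (Simpson's rule) and then searches for t* by bisection. The function names are made up for this example.

```python
from math import exp, lgamma, pi, sqrt

def t_pdf(t, df):
    """Density of Student's t distribution with df degrees of freedom."""
    const = exp(lgamma((df + 1) / 2) - lgamma(df / 2)) / sqrt(df * pi)
    return const * (1 + t * t / df) ** (-(df + 1) / 2)

def t_cdf(t, df, steps=2000):
    """Cumulative probability P(T <= t), via Simpson's rule on [0, |t|]."""
    if t < 0:
        return 1 - t_cdf(-t, df, steps)
    h = t / steps  # steps must be even for Simpson's rule
    total = t_pdf(0, df) + t_pdf(t, df)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * t_pdf(i * h, df)
    return 0.5 + total * h / 3

def t_star(p, df):
    """Value t* with P(T <= t*) = p, by bisection (assumes 0.5 <= p < 1)."""
    lo, hi = 0.0, 50.0
    for _ in range(60):
        mid = (lo + hi) / 2
        if t_cdf(mid, df) < p:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2

print(round(t_star(0.9, 29), 3))  # 1.311
```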

9. From the formula at the bottom of p.381 (or from middle of p.577, which is the same thing), the standard error of the sample proportion is 0.0306.

10. We know that the margin of error (bottom of p.577) is z* times sqrt(p(1-p)/n), where z*=1.96. Since p is unknown, use p=0.5 to be conservative (this makes the numerator as large as it can possibly be and hence maximizes the margin of error). We want the maximum margin of error to be less than or equal to 0.05, so solve the inequality m.o.e. ≤ 0.05 for n. Plugging in, we have z* sqrt(p(1-p)/n) ≤ 0.05, or 1.96 sqrt(0.25/n) ≤ 0.05, or n ≥ 384.16. Answer: The sample must be at least 385 people, and it must be drawn from a "large" campus (by rule on top of p.578, at least 3850 students) so that the SRS is nearly independent. [We couldn’t achieve accuracy like this at St. Albans unless we did a census.] You could also memorize the formula on p.583, although it’s one that won’t be given to you on the AP formula sheet. Memorizing is not necessary to solve problems like #10, but it might save time (especially for a multiple-choice problem). The method shown above does demonstrate greater understanding.
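
The arithmetic in #10 can be checked with a few lines of Python. This sketch just solves the inequality for n and rounds up. The variable names are illustrative, and the factor of 10 is the population-size rule cited above.

```python
from math import ceil

z_star = 1.96    # critical value for 95% confidence
p_guess = 0.5    # conservative choice that maximizes p(1-p)
max_moe = 0.05   # largest margin of error we will accept

# Solve z* sqrt(p(1-p)/n) <= max_moe for n.
n_exact = (z_star / max_moe) ** 2 * p_guess * (1 - p_guess)

# A sample size must be a whole number, so always round up.
n_needed = ceil(n_exact)

# The population should be at least 10 times the sample size.
population_min = 10 * n_needed

print(round(n_exact, 2), n_needed, population_min)  # 384.16 385 3850
```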

Bonus. This question was already answered; see the long passage in square brackets in #7 above.

4/16/98 problem set (Sections 8.3, 9.1)

1. This is a chi-square goodness-of-fit problem. Presumably the neighborhood survey is to be carried out in the year 2020; otherwise the problem doesn’t make any sense. Also, our neighborhood survey must be an SRS of the neighborhood. The cell counts are large enough to proceed with the GOF test.
-- H0: The neighborhood population in 2020 conforms to the Census Bureau proportions.
-- Ha: The neighborhood population in 2020 does not conform to the Census Bureau proportions.
Omitting the work (although you must be sure to show it in your writeup), we conclude that our neighborhood appears to have many more children under 5 and many fewer adults 18 to 24 than the "typical" distribution, and there is very strong evidence (χ²=29.544, df=3, P=0.00000172) that the neighborhood population does not conform to the Census Bureau proportions. These values are somewhat flaky since the expected counts don’t add to 100, but they don’t change appreciably if we adjust the expected counts to correct the error.
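
The tiny P-value in #1 can be verified without a chi-square table. The Python sketch below uses a closed-form expression for the upper-tail area that is valid only when df=3. It takes the reported statistic as its input, since the observed and expected counts are not reproduced in this key. The function name is made up for this example.

```python
from math import erfc, exp, pi, sqrt

def chi2_upper_tail_df3(x):
    """Upper-tail area P(X >= x) for chi-square with 3 degrees of freedom."""
    return erfc(sqrt(x / 2)) + sqrt(2 * x / pi) * exp(-x / 2)

p_value = chi2_upper_tail_df3(29.544)
print(f"{p_value:.3g}")  # 1.72e-06
```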

2,3,4,5. These problems are based on Section 9.1 (not studied yet).

6. Note that the wording of the question ("Use a χ² test to determine the probability of getting the outcome we got") is nonsensical. That probability is 1. What the author of the test meant to say was, "Let H0 be as follows: In any month, for any day, a selection number 1-183 is as likely as a selection number 184-366; this property is consistent with a random drawing. Test this hypothesis against Ha: Not all of the months have this property." The cell counts are all satisfactory for chi-square GOF. Conclusion: Although there may be bias that our test did not detect, the chi-square goodness-of-fit test gives no evidence (χ²=15.570, df=11, P=0.158) that the birthday lottery was inconsistent with a random drawing. [However, when we can do inference for regression, as in problem #5, we will have much more to say about the nonrandomness of the birthday lottery.]
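
To double-check the P-value in #6, here is a Python sketch of the chi-square upper-tail area for any odd number of degrees of freedom, which covers df=11. It is a hand-rolled finite series, not the routine your TI-83 uses, and the function name is made up for this example.

```python
from math import erfc, exp, pi, sqrt

def chi2_upper_tail_odd_df(x, df):
    """Upper-tail area P(X >= x) for chi-square with odd df (1, 3, 5, ...)."""
    if df < 1 or df % 2 == 0:
        raise ValueError("this sketch handles odd df only")
    tail = erfc(sqrt(x / 2))               # the df = 1 answer
    term = sqrt(2 * x / pi) * exp(-x / 2)  # first correction term
    for r in range(1, (df - 1) // 2 + 1):
        tail += term
        term *= x / (2 * r + 1)            # next term in the finite series
    return tail

p_value = chi2_upper_tail_odd_df(15.570, 11)
print(round(p_value, 3))  # 0.158
```

A P-value this large is consistent with the conclusion above: the test gives no evidence against a random drawing.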