Answer Key for Sample Test #5
(practice for F period test 2/25/99 and D period test 2/26/99)
revised 2/25/99, 7:30 p.m.

1. sign test
2. distribution-free
3. t procedures
4. robust
5. degrees of freedom
6. 27
7. critical value
8. standard error, or s/√n
9. alternative hypothesis
10. 1-power
11. not statistically significant
12. the procedure will give correct results (i.e., will specify an interval that contains the true value of the population mean) 95% of the time; other answers are possible, as long as you are careful to avoid using the words "chance" or "probability" incorrectly
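The "95% of the time" wording in answer 12 can be checked by simulation. The following is only a sketch: it uses a z interval with known sigma for simplicity, and the population values (mu=50, sigma=10, n=25), the number of trials, and the seed are all made up for illustration.

```python
import random
from statistics import NormalDist, fmean

def coverage_rate(trials=2000, n=25, mu=50.0, sigma=10.0, level=0.95, seed=1):
    """Fraction of simulated z intervals that contain the true mean mu.

    All default values are hypothetical; sigma is assumed known.
    """
    rng = random.Random(seed)
    z_star = NormalDist().inv_cdf(0.5 + level / 2)  # about 1.96 for 95%
    margin = z_star * sigma / n ** 0.5
    hits = 0
    for _ in range(trials):
        sample = [rng.gauss(mu, sigma) for _ in range(n)]
        xbar = fmean(sample)
        if xbar - margin <= mu <= xbar + margin:
            hits += 1
    return hits / trials

rate = coverage_rate()
print(f"fraction of intervals containing mu: {rate:.3f}")
```

The fraction should land near 0.95. Note that it describes the long-run behavior of the procedure, not the probability that any one computed interval is correct.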

13. from 12.603 to 13.797
Required work (using TI-83):
-- df=71; upper p critical value is (by calc.) t*=1.99394; confidence interval is xbar ± t* × s/√n = 13.2 ± 1.99394(2.54/√72) = 13.2 ± 0.597 = (12.603, 13.797)
Required work (using Table E):
-- df=71, but use df=60 to be conservative; upper p critical value is (by table) t*=2.000; confidence interval is xbar ± t* × s/√n = 13.2 ± 2(2.54/√72) = 13.2 ± 0.599 = (12.601, 13.799)
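The arithmetic for both versions of answer 13 can be reproduced with a short sketch. The function name t_interval is made up for illustration, and the critical values are simply typed in from the work above rather than computed.

```python
import math

def t_interval(xbar, s, n, t_star):
    """Return (low, high) for xbar ± t* × s/√n, given a critical value t*."""
    margin = t_star * s / math.sqrt(n)
    return xbar - margin, xbar + margin

# t* = 1.99394 (calculator, df=71) and t* = 2.000 (Table E, df=60)
lo_calc, hi_calc = t_interval(13.2, 2.54, 72, 1.99394)
lo_tab, hi_tab = t_interval(13.2, 2.54, 72, 2.000)

print(f"calculator: ({lo_calc:.3f}, {hi_calc:.3f})")
print(f"table:      ({lo_tab:.3f}, {hi_tab:.3f})")
```

The table-based interval is slightly wider, which is what "conservative" means here.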

14. last line (the "infinity" line); 1.645
Required work (using TI-83):
-- Just say "by calc." (Don't say invNorm.)
Required work (using Table A):
-- Say "interpolated from z table."
Required work (using Table E):
-- Say "from t table, with df=infinity."
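For anyone checking the 1.645 away from a TI-83 or a printed table, the same value can be obtained from the standard normal distribution in Python's standard library. This is only a cross-check, not something to write on the test.

```python
from statistics import NormalDist

# Upper critical value z* that leaves an area of 0.05 in the right tail
z_star = NormalDist().inv_cdf(0.95)
print(round(z_star, 3))
```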

15. 8 subjects
Required work (using TI-83; note, however, that this method may be unfamiliar to you):
-- By equation solver, df=6.983 in order to produce a tail to the right of t=3.0 having an area of 0.01. Thus n=df+1=8.
Required work (using Table E):
-- Using column for right tail probability of 0.01, we see that the smallest df for which a t score of 3 or more would suffice is df=7. Thus n=df+1=8.
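The Table E reasoning for answer 15 can be mimicked without a table. The sketch below integrates the t density numerically (Simpson's rule) to get the right-tail area beyond t=3.0 for each whole-number df, then reports the smallest df whose tail area is 0.01 or less. The helper names and the step count are arbitrary choices made for illustration.

```python
import math

def t_pdf(x, df):
    """Density of the t distribution with df degrees of freedom."""
    c = math.gamma((df + 1) / 2) / (math.sqrt(df * math.pi) * math.gamma(df / 2))
    return c * (1 + x * x / df) ** (-(df + 1) / 2)

def t_upper_tail(t, df, steps=2000):
    """P(T > t) for t >= 0, using Simpson's rule on [0, t] (steps must be even)."""
    h = t / steps
    total = t_pdf(0.0, df) + t_pdf(t, df)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * t_pdf(i * h, df)
    return 0.5 - total * h / 3

def smallest_df(t, tail_area, max_df=100):
    """Smallest whole-number df for which P(T > t) <= tail_area."""
    for df in range(1, max_df + 1):
        if t_upper_tail(t, df) <= tail_area:
            return df
    return None

df_needed = smallest_df(3.0, 0.01)
n_needed = df_needed + 1
print(f"df = {df_needed}, so n = {n_needed}")
```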

16. There is some skewness, but it's a judgment call. If your diagram is accurate, you would get full credit if you said the data are essentially symmetric (stemplot) or essentially normal (normal quantile plot). You would lose a couple of points for asserting normality if all you had drawn was a stemplot, though. As far as defining the skewness as left or right, this is a bad data set for that; the 1-var. stats. show mean < median, a hallmark of left skewness, but a histogram can be built with bin choices that suggest right skewness or even symmetry. It may be best not to discuss the direction of the skewness here, especially since the sample size is so small.

17. between 0.02 and 0.025 (exact value 0.02275), which is significant under the usual α=0.05 level for both one-sided and two-sided z tests
Required work (using TI-83):
-- right tail probability is P=0.02275 (by calc.) for z=2.000; double this to get two-sided probability of P=0.0455; both situations imply statistical significance since P < 0.05=α
Required work (using Table A):
-- right tail probability is P=0.023 (by table) for z=2.0; double this to get two-sided probability of P=0.046; both situations imply statistical significance since P < 0.05=α
Required work (using Table E):
-- from last line (df=infinity), we see that z=2.0 falls between upper p critical values of 0.02 and 0.025; therefore the two-sided probability P falls between 0.04 and 0.05; both situations imply statistical significance since P < 0.05=α
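The one-sided and two-sided probabilities for z=2.0 can be double-checked with the standard normal distribution in Python's standard library. This is a cross-check of the numbers above, not a substitute for the required work.

```python
from statistics import NormalDist

z = 2.0
alpha = 0.05

one_sided = 1 - NormalDist().cdf(z)  # area to the right of z
two_sided = 2 * one_sided            # both tails, by symmetry

print(f"one-sided P = {one_sided:.5f}, significant: {one_sided < alpha}")
print(f"two-sided P = {two_sided:.4f}, significant: {two_sided < alpha}")
```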

18. Assumptions:
-- (1) Subjects are representative (i.e., closely approximate an SRS of patients with inoperable knee injuries). Subjects cannot have been "hand picked" for likelihood of responding to the drug.
-- (2) The knee injuries must be old enough and/or severe enough that they are chronic and relatively stable. Otherwise, the purported effect of the drug will be confounded with time and natural healing.

Solution:
-- Since this is a matched pairs design, add a third column to show differences in scores ("after" minus "before"); a positive number denotes improvement. The new column has a mean of 2.500 and a standard error of s/√n = 2.892822/√20 = 0.646855. A normal quantile plot or stemplot [you should show it here] reveals one outlier and significant granularity, but no pronounced skewness other than the outlier. With n < 40, the outlier makes use of t procedures somewhat risky. However, in this particular case, removing the outlier would cause the P-value to be even lower than it otherwise would be--in other words, the results we get with the outlier are conservative. Therefore we can proceed, having made a conscious decision, with a t test using H0: μ=0 and Ha: μ > 0 to see whether the new drug is effective in raising subjects' knee scores. We compute t = (xbar-μ0)/(std. error) = 2.5/0.646855 = 3.86485. Using df=19 in the t table, we have P<0.001 [or use calc. to get P=0.0005216]. Conclusion: There is strong evidence that the drug, when administered as in this experiment, is effective in raising the knee scores for subjects having inoperable knee injuries.
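The test statistic and P-value in the solution can be reproduced from the summary numbers alone (mean difference 2.5, standard deviation 2.892822, n=20). The sketch below does not use the raw before/after scores, which are not reproduced in this key; the one-sided P-value comes from numerically integrating the t density (Simpson's rule), and the helper names are made up for illustration.

```python
import math

def t_pdf(x, df):
    """Density of the t distribution with df degrees of freedom."""
    c = math.gamma((df + 1) / 2) / (math.sqrt(df * math.pi) * math.gamma(df / 2))
    return c * (1 + x * x / df) ** (-(df + 1) / 2)

def t_upper_tail(t, df, steps=2000):
    """P(T > t) for t >= 0, using Simpson's rule on [0, t] (steps must be even)."""
    h = t / steps
    total = t_pdf(0.0, df) + t_pdf(t, df)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * t_pdf(i * h, df)
    return 0.5 - total * h / 3

# Summary statistics of the "after minus before" differences, from the solution
mean_diff, sd_diff, n = 2.5, 2.892822, 20
mu0 = 0.0  # H0: mean difference is zero

std_error = sd_diff / math.sqrt(n)
t_stat = (mean_diff - mu0) / std_error
p_value = t_upper_tail(t_stat, n - 1)  # one-sided, Ha: mu > 0

print(f"std. error = {std_error:.6f}, t = {t_stat:.5f}, P = {p_value:.7f}")
```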

Shortcomings:
-- The words "when administered as in this experiment" allude to the fact that the experiment was not placebo-controlled and was not double-blind. Whether the observed effect is due to enthusiastic researchers, a placebo effect, or the chemical properties of the drug cannot be determined. In fairness to the researchers, however, a P-value this low is difficult to achieve unless some physiologic effect is present. Perhaps they should treat this study as a pilot project in order to attract a larger grant to perform a placebo-controlled, double-blind experiment on a larger sample.

19. Statistical significance means that the result seen is far enough away from the null hypothesis (i.e., the assertion of "no effect") that we conclude that chance alone would rarely produce such a result. The result itself, however, could be very small in a real-world practical sense. For example, the mean height of people in Maryland probably differs from that of people in Virginia--and with large samples, we might even be able to obtain extremely low P-values to say that chance alone would rarely produce such a difference. However, the difference itself is very likely minuscule and of little value in public policy discussions. Would any city council change its building code appreciably if Marylanders were known to be 0.2 inches taller on average? Probably not.

20. This is not a very good question. The phrase "when H0 is false" muddies the water unnecessarily, since the P-value is always computed under the assumption that H0 is true (regardless of whether H0 is actually true or not): it is the conditional probability, given H0, of obtaining a result at least as extreme as the one observed. The short answer to the question is that, for a fixed observed difference and standard deviation, the P-value will decrease as n increases.

Reasoning (algebraic approach): The absolute value of the t or z test statistic, depending on which one you are using, has √n in the denominator of the denominator, i.e., varies directly (not inversely) with the square root of n. In other words, as n grows, the t or z score grows, albeit at a slower rate. The larger the t or z score, the smaller the P-value, since there is less of the tail (or two tails in the case of a two-sided test) to take the area of.

Reasoning (geometric approach): Diagram is not shown here, but basically you need a sketch showing how the sampling distribution of sample means gets sharper and narrower as n grows. This has the effect of reducing the area that lies at a fixed distance from μ0 on either one or two sides, depending on the type of test you are performing. But, of course, said area is precisely what we call the P-value of the test.
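Both lines of reasoning can be illustrated numerically. The sketch below holds the observed sample mean, the hypothesized mean, and sigma fixed and lets only n change; the numbers (xbar=10.5, μ0=10, sigma=2) are made up, and a one-sided z test with known sigma is assumed for simplicity.

```python
import math
from statistics import NormalDist

def one_sided_p(xbar, mu0, sigma, n):
    """Right-tail P-value of a z test; sigma is assumed known."""
    z = (xbar - mu0) / (sigma / math.sqrt(n))  # z grows like the square root of n
    return 1 - NormalDist().cdf(z)

sample_sizes = (10, 40, 160)
p_values = [one_sided_p(10.5, 10.0, 2.0, n) for n in sample_sizes]

for n, p in zip(sample_sizes, p_values):
    print(f"n = {n:3d}: P = {p:.4f}")
```

Each fourfold increase in n doubles the z score, so the P-value shrinks each time even though the observed difference of 0.5 never changes.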


Last updated: 25 Feb 1999