AP Statistics / Mr. Hansen
5/2/2005

Name: ___________KEY___________

AP Free-Response Practice I (100 points)

 

Instructions: Show adequate justification for each answer.

 

 

1.(a)

Explain briefly why, in a linear regression t test, testing for the alternative hypothesis that the true slope is positive is equivalent to testing for the alternative hypothesis that the true r value (usually called ρ) is positive.

 

 

 

In any real-world LSRL, sx and sy are both positive. Thus b1 = r · sy/sx [on AP formula sheet] implies that b1 and r have the same sign. Slope (statistic b1, parameter β) is > 0 if and only if corr. coeff. (statistic r, parameter ρ) is > 0.
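A quick numerical check of the sign argument can be sketched in Python. This is only an illustration: the data values and the helper name slope_and_r are made up for the example and are not part of the original problem.

```python
import math
from statistics import mean, stdev

def slope_and_r(xs, ys):
    """Return (b1, r) for the least-squares regression of ys on xs."""
    n = len(xs)
    mx, my = mean(xs), mean(ys)
    sx, sy = stdev(xs), stdev(ys)              # sample s.d.s, both > 0 for real data
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    r = sxy / ((n - 1) * sx * sy)              # sample correlation coefficient
    b1 = sxy / sum((x - mx) ** 2 for x in xs)  # least-squares slope
    return b1, r

# Hypothetical data, invented for illustration only.
xs = [1, 2, 3, 4, 5]
ys = [2.1, 1.9, 3.2, 3.8, 5.1]
b1, r = slope_and_r(xs, ys)

print(math.isclose(b1, r * stdev(ys) / stdev(xs)))  # checks b1 = r * sy/sx
print((b1 > 0) == (r > 0))                          # checks that the signs agree
```

Because sy/sx is a positive multiplier, any data set with a negative r would show a negative slope in the same way.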

 

 

(b)

Determine the true expected probabilities and counts that would occur if the data shown below had come from a normal distribution with mean 500 and s.d. 100. No work is required.

 

(from 297 randomly chosen SAT math scores)

 

 

score < 350             20
350 ≤ score < 475      100
475 ≤ score < 575      115
score ≥ 575             62
                     _______
                       297

 

 

 

 

By calc., probabilities are .067, .334, .372, and .227, respectively. [Check: should add to 1.] Multiply through by 297 to get expected counts: 19.842, 99.3425, 110.5075, and 67.308. [Check: should add to 297.]
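For anyone who wants to reproduce the calculator work, here is a minimal sketch using Python's standard library. The variable names are my own; the cut points, mean, s.d., and sample size come from the problem.

```python
from statistics import NormalDist

dist = NormalDist(mu=500, sigma=100)   # the hypothesized N(500, 100) distribution
cuts = [350, 475, 575]                 # boundaries between the four score classes
n = 297                                # number of SAT math scores

# Cumulative probabilities at the boundaries, padded with 0 and 1 at the ends.
cdf = [0.0] + [dist.cdf(c) for c in cuts] + [1.0]

# Class probability = difference of consecutive cumulative probabilities.
probs = [hi - lo for lo, hi in zip(cdf, cdf[1:])]
expected = [n * p for p in probs]

for p, e in zip(probs, expected):
    print(f"{p:.3f}  {e:.4f}")
```

The probabilities should sum to 1 and the expected counts to 297, which is the same check suggested above.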

 

 

(c)

Perform a goodness-of-fit test to assess whether the data above might have come from a normal distribution with mean 500 and s.d. 100 (the null hypothesis) or from some other type of distribution (the alternative). You must state and verify assumptions for full credit.

 

 

 

H0: data come from N(500, 100)
Ha: data do not come from N(500, 100)
Assumptions:
   SRS (“randomly” was given; must assume SRS) ✓
   all expected counts ≥ 1, no more than 20% < 5 ✓ [all are above 19, in fact]
Test statistic:
χ² = Σ (obs. – exp.)²/exp.
= (20 – 19.842)²/19.842 + (100 – 99.3425)²/99.3425 + (115 – 110.5075)²/110.5075 + (62 – 67.308)²/67.308
= .607
P-value = .895
Conclusion: There is no evidence (df = 3, χ² = .607, P = .895) to refute the claim that the data come from the N(500, 100) distribution.
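
The test statistic and P-value can be checked with a short Python sketch. To keep it free of outside libraries, the P-value uses a closed-form upper-tail formula that is valid only for 3 degrees of freedom; the function name chi_sq_sf_df3 is my own, and on the exam a calculator's χ² cdf would be used instead.

```python
import math

observed = [20, 100, 115, 62]
expected = [19.842, 99.3425, 110.5075, 67.308]

# Chi-square statistic: sum of (obs - exp)^2 / exp over the four classes.
chi_sq = sum((o - e) ** 2 / e for o, e in zip(observed, expected))

def chi_sq_sf_df3(x):
    """Upper-tail probability P(X > x) for a chi-square variable with 3 df.

    Closed form for df = 3 only: erfc(sqrt(x/2)) + sqrt(2x/pi) * exp(-x/2).
    """
    return math.erfc(math.sqrt(x / 2)) + math.sqrt(2 * x / math.pi) * math.exp(-x / 2)

p_value = chi_sq_sf_df3(chi_sq)
print(round(chi_sq, 3), round(p_value, 3))
```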


 

2.

The FloatFloat.com competition in May had 778 possible chances to win. The 45 student contestants had between 1 and 32 chances each, and 6 winners were randomly chosen.

 

 

(a)

Explain why a simulation is the most appropriate method for computing a student’s probability of being a winner.

 

 

 

Thorough answer: It is too hard to analyze all the possibilities that can occur. Although it is easy to calculate the probability of being the first winner chosen (simply divide the # of chances by 778), there are many possible cases to consider regarding the probability of being the second winner. For each of those, there are many cases to consider regarding the probability of being the third winner, and so on. The multiplication of cases through all 6 levels is overwhelming. To analyze the entire probability tree could require more than 33 million paths! Even a computer program designed to perform this analysis would probably produce an incorrect answer unless the program had been painstakingly debugged. By contrast, a computer program to perform a simulation would be easy to write and execute.

Short answer: Trials are not independent! Simulation is much more effective than a priori analysis in cases such as this.

Pessimistic answer: Simulation is the only procedure that is likely to produce a valid answer. Consider, for example, that 7 out of 7 intelligent AP Statistics students were unable to find a valid answer in part (b).

 

 

(b)

In a simplified contest in which the 45 contestants each have 1 chance to win (i.e., 45 total numbers, instead of 778), compute the probability that a given student (Max) is one of the 6 winners.

 

 

 

Common-sense answer: Since everyone has the same probability of success on each drawing, we can treat the process as equivalent to a single drawing from a pool of 45, where there are 6 winning numbers. Answer: 6/45 ≈ .1333.

Recommended approach: P(Max wins) = 1 – P(Max loses 6 times in a row)
= 1 – (44/45)(43/44)(42/43)(41/42)(40/41)(39/40) = 1 – 39/45 [after cancellation]
= 6/45 as before. This method is most in keeping with the spirit of the AP, and this is the method that I would most recommend learning. The “common-sense” approach can sometimes lead you astray in other problems.

Incorrect but popular answer:
1/45 + 1/44 + 1/43 + 1/42 + 1/41 + 1/40 ≈ .1414
This cannot be true, since if the expected number of wins is .1414 for each of the 45 contestants, then the expected total number of winners would be 45(.1414) = 6.363, a contradiction: there are always exactly 6 winners.

Note that the probability of winning on the second drawing, for example, is not 1/44 but rather 1/45. Here is the proof:
P(Max wins on second try) = P(Max loses first ∩ Max wins second)
= P(Max loses first) · P(Max wins second | Max loses first)
= (44/45)(1/44) = 1/45.

Clarification: When we compute P(Max wins on second try) in this way, we are referring to an unconditional probability. Yes, the conditional probability of winning on the second drawing given that Max did not win the first is 1/44, but that is an answer to a question that was not posed. We need to have mutually exclusive events in order to apply the addition rule. Here is a more detailed approach:
P(Max wins) = P(Max first wins on 1 ∪ Max first wins on 2 ∪ . . . ∪ Max first wins on 6)
= P(Max wins 1 ∪ (Max loses 1 ∩ wins 2) ∪ (Max loses 1,2 ∩ wins 3) ∪ (Max loses 1,2,3 ∩ wins 4) ∪ (Max loses 1,2,3,4 ∩ wins 5) ∪ (Max loses 1,2,3,4,5 ∩ wins 6)).
Note that these 6 events truly are mutually exclusive, which means that we can apply the addition rule. The difficulty is that most people have a tendency to confuse the intersection symbol (∩) with the conditional probability symbol ( | ). I know I certainly had trouble with that when I was a student. Each of the 6 events in this union has probability 1/45. For example, here is one of them:
P(Max loses 1,2,3,4 ∩ wins 5)
= P(Max loses 1,2,3,4) · P(Max wins 5 | Max loses 1,2,3,4)
= [(44/45)(43/44)(42/43)(41/42)] · (1/41)
= (41/45)(1/41)
= 1/45 as claimed.
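
The three calculations in this part can be verified with exact fractions. The sketch below is only a check, and the names (p_first_win_on, p_wrong, and so on) are mine; it compares the complement-rule answer, the sum over the 6 mutually exclusive events, and the popular incorrect sum.

```python
from fractions import Fraction
from math import prod

n, k = 45, 6   # 45 contestants with 1 chance each, 6 drawings

# Complement rule: P(Max loses all 6 drawings) = (44/45)(43/44)...(39/40).
p_lose_all = prod(Fraction(n - 1 - i, n - i) for i in range(k))
p_win = 1 - p_lose_all

def p_first_win_on(j):
    """Unconditional P(Max loses drawings 1..j-1 and wins drawing j)."""
    p_lose_earlier = prod(Fraction(n - 1 - i, n - i) for i in range(j - 1))
    return p_lose_earlier * Fraction(1, n - (j - 1))

# Addition rule over the 6 mutually exclusive events.
p_sum = sum(p_first_win_on(j) for j in range(1, k + 1))

# Incorrect approach: adding conditional probabilities 1/45 + 1/44 + ... + 1/40.
p_wrong = sum(Fraction(1, n - i) for i in range(k))

print(p_win, p_sum, float(p_wrong))
```

Both correct methods give the same fraction, and each term of the sum is 1/45, while the incorrect sum comes out larger.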

 

 

(c)

Describe but do not execute a simulation to estimate the probability that Max is one of the 6 winners in the real FloatFloat contest. Max has 32 chances to win, and notional (fake) data are shown below.

 

 

 

 

Student        # of chances

Sharpe               1
Shoes                1
Armstrong           15
Baker               16
Clark               14
   ·                 ·
   ·                 ·
   ·                 ·
Lemi                22
Max                 32
                 _______
Total              778

 

 

 

 

 

 

 

 

 

Consider the numbers 001 through 778, and assign the appropriate block of winning numbers to each contestant: 001 = Sharpe, 002 = Shoes, 003-017 = Armstrong, etc. Use a random digit table, selecting 3 digits at a time. If the number selected is 001 through 778, count a win for the appropriate contestant; ignore 000, 779-999, and any number corresponding to someone who has already been chosen as one of the 6 winners. Continue until 6 winners are chosen. If Max is among them, tally a “SUCCESS”; if not, tally a “FAILURE.” Repeat the process many times. Our estimate of P(Max wins) is SUCCESSES divided by (SUCCESSES + FAILURES).
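
Although the question says not to execute the simulation, the described procedure could be carried out by computer along the following lines. This is a rough sketch under stated assumptions: the function name and parameters are my own, a random integer from 1 through the total replaces the random digit table (so there is nothing outside the range to ignore), and because the full list of chances is not given here, the usage example runs the simplified contest from part (b) with placeholder names.

```python
import random
from bisect import bisect_left
from itertools import accumulate

def estimate_win_probability(chances, target, winners=6, trials=10_000, seed=None):
    """Estimate P(target is among the winners) by simulating many contests.

    chances: dict mapping contestant name -> number of chances (hypothetical input).
    """
    names = list(chances)
    if winners > len(names):
        raise ValueError("cannot choose more winners than there are contestants")
    rng = random.Random(seed)
    # upper[i] is the last number in contestant i's block, e.g. 1, 2, 17, ...
    upper = list(accumulate(chances[name] for name in names))
    total = upper[-1]
    successes = 0
    for _ in range(trials):
        chosen = set()
        while len(chosen) < winners:
            number = rng.randint(1, total)            # one drawing: 1 through total
            name = names[bisect_left(upper, number)]  # owner of that number's block
            chosen.add(name)                          # repeat winners are ignored
        if target in chosen:
            successes += 1                            # tally a SUCCESS
    return successes / trials                         # SUCCESSES / (SUCCESSES + FAILURES)

# Simplified contest from part (b): 45 contestants, 1 chance each (placeholder names).
simple = {f"student{i}": 1 for i in range(45)}
estimate = estimate_win_probability(simple, "student0", trials=20_000, seed=1)
print(estimate)   # should land near 6/45
```

With the real list of chances (Max = 32 out of 778) passed in as the dictionary, the same function would give the estimate asked about in this part.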