Monthly Schedule

(STAtistics, Period C)

M 2/1/010	HW due: Read pp. 525-529; write #9.53, 9.54, 9.57. The third problem, which is the hardest, is set up and partially solved for you below. You may copy my work without penalty if you wish. 9.57. We want n such that a 95% C.I. for p has m.o.e. ≤ .05. We know [from Friday’s class] that m.o.e. = (s.e.)(crit. value), and here, s.e. = $\sqrt{pq/n}$. Since we do not know p and q, we have to make a guess at pq. Note, however, that the numerator is maximized when p = q = .5. [Try it! There are no other choices for p and q that produce a larger value. That is because q = 1 − p. Therefore, pq = p(1 − p) = p − p², and as you learned in precal, that is a quadratic (parabola) opening down. The vertex occurs where p = .5.] Therefore, we know $\sqrt{pq/n} \le \sqrt{.25/n} = .5/\sqrt{n}$, and we want this final expression, when multiplied by the critical value, to be at most .05. The critical value is found in the very last line of the table on the inside back cover of your textbook (labeled “z critical values”). Solve the inequality for n, and that’s that.
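	If you would like to check your algebra for #9.57 numerically, here is a minimal Python sketch. It is not part of the assignment. It assumes the table value z = 1.96 for 95% confidence and the worst-case guess p = q = .5 described above; the function names are invented for illustration.

```python
import math

Z_95 = 1.96  # z critical value for 95% confidence (assumed table value)


def pq(p):
    """Return p*q with q = 1 - p, the numerator inside the s.e. formula."""
    return p * (1 - p)


def sample_size_for_proportion(moe, z=Z_95, p=0.5):
    """Smallest n satisfying z * sqrt(p*q/n) <= moe.

    The default p = .5 is the worst case, since it maximizes p*q.
    """
    n_exact = (z / moe) ** 2 * pq(p)
    return math.ceil(n_exact)  # always round up in problems of this type


# Scan a grid of candidate p values to confirm that pq peaks at p = .5.
grid = [i / 100 for i in range(101)]
best_p = max(grid, key=pq)
print("pq is largest at p =", best_p, "where pq =", pq(best_p))

# Required sample size for a margin of error of .05.
print("n =", sample_size_for_proportion(0.05))
```

	Using p = .5 is deliberately conservative: any other value of p makes pq smaller, so the sample size found this way is large enough no matter what the true p turns out to be.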
T 2/2/010	HW due: Study for Wednesday’s test. I would strongly recommend that you work several odd-numbered problems from the chapter review exercises. There is no additional written work that will be collected, because you were given rather short notice for the upcoming test. However, the test cannot be held later in the week because of NAEP and other scheduling difficulties.
	Per your request, here is another worked example problem of the “find the required sample size” genre: You have been hired as a researcher to determine the mean number of mature trees that Chevy Chase residents have on each lot. Most lots are between an eighth of an acre and half an acre, although there are a few outliers. The standard deviation in tree counts per lot appears to be approximately 2.4, based on pilot testing. How large a random sample is needed to estimate, with 95% confidence, the mean tree count per lot to the nearest tenth of a tree (i.e., with an error of less than .05)? Solution: This problem, in order to be done “correctly,” would require a spreadsheet or specialized software, since you would need to use t distributions where the degrees of freedom are adjustable. You can’t simply solve an inequality to find the answer, since each t distribution has a slightly different shape. You would need to use an iterative process, which is easy on a spreadsheet, but that would go beyond the scope of our course and beyond anything you would be asked to do on the AP exam. However, since we know we will be using a large sample in order to get such high accuracy, the normal curve is a reasonable approximation for the t curves that we “ought” to be using. Check assumptions: 1. Do we have an SRS? Yes. (We will go to the office of the recorder of deeds and will use a random number generator to choose lots from the database.) 2. Is the population normal? No! There is surely some skewness (right skewness, most likely, if there are a few large, heavily wooded lots). However, with a large sample (n > 40), only extreme skewness would prevent the z curve given by $N(\mu, \sigma/\sqrt{n})$ from being a reasonable model for the sampling distribution of x̄. Proceed with caution. Since σ is unknown (as is always the case in real-world situations), we will use s as an estimate of σ. Set up an inequality: m.o.e. = (s.e.)(z) < .05 is what we want to be true.
(You can write the word “want” over the less-than sign to make this clear.) Substituting s = 2.4 and z = 1.96 gives $(2.4/\sqrt{n})(1.96) < .05$, or $4.704/\sqrt{n} < .05$. Taking reciprocals,* $\sqrt{n}/4.704 > 20$, so $\sqrt{n} > 94.08$ and n > 8851.05. Note 1 (*): Did you catch how the inequality sign flips when we take reciprocals? You were supposed to have learned that in Algebra II. There is a reason that Algebra II and Precal are prerequisites for our class. Note 2: We must always round up in problems of this type. Answer: A sample size of 8852 lots will be required. The cost of performing a tree survey on such a large number of properties would be unaffordably high. Moreover, since 8852 is more than the number of housing units in Chevy Chase, the question is ill-posed. There is no way to use sampling to obtain the required accuracy; the only option is to perform a census of all trees on all lots in Chevy Chase, which would be extremely expensive. Modified problem: The client has decided that accuracy to the nearest tree (i.e., error of no more than .5) will be acceptable. Does this reduce the cost? Solution: Indeed it does. Since m.o.e. follows an inverse square root law (this is true in both proportion problems and sample mean problems, by the way), enlarging the m.o.e. by a factor of 10 allows us to reduce the sample size by a factor of 100. Revised Answer: A sample size of 89 lots will be required. This is still considered a “large” sample, which means that the normal approximation is reasonably accurate, even though the t distribution would be better. If you feel worried, you might want to increase the sample size (say, to 100) to improve the robustness of the estimate. Also note that the quality of the estimate, as measured by the size of the m.o.e., depends on the sample s.d. (s) that you found during the pilot test. What if the pilot test was inaccurate? What if the real population s.d. is more like 5 trees instead of 2.4 trees? To guard against this sort of difficulty, you would perform what is called a sensitivity analysis on the value of s.
If s doubles, you will need to increase your sample size by a factor of 4, because the required n is proportional to s². Bottom line? If your client can afford to pay for sampling 150 lots, that would probably be a good thing, since it would leave some margin for a moderately larger s. Remember, though, that research is expensive. Performing a tree survey on 150 lots would cost thousands of dollars for the data gathering alone, even in a region as compact and easy to survey as Chevy Chase. You could save money by performing a satellite imagery survey (Google Earth or equivalent), but even that would cost a fair amount of money, and your counts might not be as accurate.
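	For those who want to experiment, the sample-size arithmetic and the sensitivity analysis on s can be sketched in a few lines of Python. This is only the normal-approximation version (z = 1.96 from the table), not the iterative t-based method mentioned above, and the function name is invented for illustration.

```python
import math

Z_95 = 1.96  # table value for 95% confidence, as in the worked example


def sample_size_for_mean(s, moe, z=Z_95):
    """Smallest n with z * s / sqrt(n) < moe, using the normal approximation.

    Solving for n gives n > (z * s / moe)**2, which we round up.
    """
    return math.ceil((z * s / moe) ** 2)


# Original problem: error of less than .05 trees.
print("m.o.e. .05:", sample_size_for_mean(2.4, 0.05))

# Modified problem: error of no more than .5 trees.
print("m.o.e. .5: ", sample_size_for_mean(2.4, 0.5))

# Sensitivity analysis: how does the required n respond to the guess for s?
for s in (2.4, 3.0, 4.0, 4.8, 5.0):
    print("s =", s, "-> n =", sample_size_for_mean(s, 0.5))
```

	Because s is squared in the formula, the required n grows much faster than s does. Doubling s from 2.4 to 4.8 roughly quadruples the sample size.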
W 2/3/010	Test (100 pts.) on all material covered since the midterm exam, plus one question (regarding LSRL and residual plots) that was fumbled by numerous people on the midterm. Textbook passages upon which the greatest focus will be placed are pp. 461-529. Note: The most probable outcome for today is that the start of school will be delayed because of snow. The temperature will warm up quickly, and I think it is unlikely that the entire day will be canceled. If we have a short period, I will shorten the test appropriately. As noted above, it would be difficult to move the test to Thursday. If you wish to take the test from 2:15 to 3:15 in the Math Lab, I will provide an alternate version. You must come to class at the scheduled time for roll call, even if you wish to take the test later in the day.
Th 2/4/010	LOCATION NOTICE: Class today will be held in SB-202, a.k.a. the freshman study hall room, because of NAEP testing in our usual classroom. No additional HW is due today.
F 2/5/010	Double Quiz (20 pts.) on previously assigned textbook reading, pp. 461-529. No additional HW is due today.
M 2/8/010	No school because of the Snowpocalypse! However, there will be an assignment for Tuesday, Wednesday, and Thursday, regardless of whether or not St. Albans is in session. Be sure to check here each day by 3:00 p.m. for the following day’s assignment. If time permits, I may post some additional video links or other resources to help you learn the material without having the benefit of classroom discussion. If the textbook and other materials prove inadequate, please see my contact information and call me on my 24-hour number.
T 2/9/010	HW due: Read pp. 531-534 and this web page; then (in view of what you have just read), read pp. 531-534 a second time. Write #10.1, 10.3, 10.4, 10.9, 10.12. Note: This assignment is due Tuesday and will be scanned when we return to school, whenever that might be.
W 2/10/010	HW due: Read pp. 537-548; write #10.19, 10.23. Note: This assignment is due Wednesday and will be scanned when we return to school, whenever that might be.
Th 2/11/010	HW due: Read pp. 537-548 a second time and the PHASTPC instructions; write up Example 10.12 (pp. 546-547) using the PHASTPC format, referring to your textbook as necessary. Note: This assignment is due Thursday and will be scanned when we return to school, whenever that might be.
F 2/12/010	No school (teacher work day).
M 2/15/010	No school (holiday).
T 2/16/010	Normal school day.
W 2/17/010	No additional HW due today. The three assignments from last week will be collected, however.
Th 2/18/010	HW due: Read pp. 550-558; write #10.39 using PHASTPC format.
F 2/19/010	No additional HW is due. However, make sure that all your previously assigned problems are in truly wonderful shape.
M 2/22/010	HW due: Read pp. 562-567; skip pp. 568-570; read pp. 571 (bottom)-574 (top), plus the summary on pp. 575-576. Write #10.65 (with sketches), 10.77, 10.91. The answers to #10.65 are given in the back of the book, but if you copy them without making sketches, you will learn nothing. Below is an example of a problem I assigned in 2007. Look at the sketches and try to understand what is going on. Given: Let H₀: μ = 6 and H_a: μ < 6 be our hypotheses, and let s = 4.8. Let the sample size be n = 25. We will use α = .05 for part (a) only. Remember, α is a “cutoff value” for P, i.e., the value of P below which we will reject H₀. (a) Draw a sketch and estimate the power of the test against the alternative μ = 5.3. In other words, how effective is the test at avoiding Type II error if the true value of μ happens to be 5.3? Your sketch should include two sampling distributions (null and alternative), as well as a clear vertical line that separates the “reject” and “do not reject” regions. (b) Draw a sketch to show how power changes if we allow a greater level of Type I error, and if we shift the alternative hypothesis slightly upward, but if everything else stays the same. Solution to part (a): Begin by sketching two sampling distributions for x̄. One distribution (the “H₀ distribution”) is centered on 6, while the other distribution (the “H_a distribution”) is centered on 5.3. To find the power, we begin by computing s.e. = $s/\sqrt{n} = 4.8/\sqrt{25} = .96$. We can estimate our power by using the z distribution. [A more exact method requires t distributions or the curves from the omitted reading on p. 568, but both of those methods are outside the scope of the AP syllabus.] The cutoff value for significance, shown by the bold vertical line in the sketch, is 4.42094 [calculator keystrokes: invNorm(.05,6,.96)]. If x̄ falls to the left of that cutoff line, we will reject H₀, whereas if x̄ falls to the right of that cutoff line, we will fail to reject H₀.
Since power may be defined as the portion of the alternative distribution that lies within the “reject” zone, we can estimate power to be .180 [calculator keystrokes: normalcdf(-99999,4.42094,5.3,.96)]. Even though this is only an approximation based on using z (when we should really be using t), we can “guesstimate” the power to be approx. 20%, which means that the probability of Type II error, β, is approximately 80% for the 5.3 alternative. That is low power! However, you can see that the test would be much more powerful against a lower alternative such as 4.3, since then most of the alternative distribution would be safely in the “reject” zone. Remember that you cannot show calculator keystrokes on your writeup unless they are Xed out. Note: If you simply cannot handle the normalcdf calculations, you can estimate power from careful sketch work alone. The important thing is that you must make your sketch so that it respects the horizontal axis values. In particular, s.e. must be drawn to scale. Solution to part (b): We know that there is usually a tradeoff between Type I and Type II error. Specifically, we know that allowing Type I probability to increase will allow Type II error to decrease, thus increasing power. (One exception to this tradeoff rule is an increase in n, which makes it possible to reduce both the probability of Type I error and the probability of Type II error.) In this problem, however, the alternative value of μ has shifted to the right. The diagram, as drawn, shows that power (shaded) has stayed exactly the same. We may summarize this concept by saying that if the alternative of interest is moved closer to the null hypothesis, then power will decrease unless we accept a higher probability of Type I error, in which case we may be able to keep power the same.
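	If you want to double-check the part (a) numbers away from your calculator, the following Python sketch mirrors the invNorm and normalcdf keystrokes using the standard library's NormalDist class. Like the solution above, it uses the z approximation rather than t; the variable and function names are invented for illustration.

```python
from statistics import NormalDist

mu_0 = 6.0             # mean under H0
se = 4.8 / 25 ** 0.5   # s.e. = s / sqrt(n) = .96
alpha = 0.05           # Type I error probability for part (a)

# Cutoff for a left-tailed test: same role as invNorm(.05, 6, .96).
cutoff = NormalDist(mu_0, se).inv_cdf(alpha)


def power_against(mu_a):
    """Area of the alternative distribution lying in the 'reject' zone.

    Same role as normalcdf(-99999, cutoff, mu_a, se).
    """
    return NormalDist(mu_a, se).cdf(cutoff)


power = power_against(5.3)
beta = 1 - power  # probability of Type II error against the 5.3 alternative

print("cutoff:", round(cutoff, 5))
print("power against 5.3:", round(power, 3))
print("beta against 5.3: ", round(beta, 3))
print("power against 4.3:", round(power_against(4.3), 3))
```

	Notice that the cutoff depends only on H₀ and α, while power depends on where the alternative distribution sits relative to that cutoff. That is why the lower alternative of 4.3 gives much higher power with no change to the test itself.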
T 2/23/010	No additional HW due. In class: Review, catch up, and get all of your questions answered.
W 2/24/010	No additional HW due. However, now that you have had plenty of time to think about the previously assigned problems, especially the sketches that were due on 2/22, I expect them to be quite good.
Th 2/25/010	HW due: Execute the following problems as “button pushers,” i.e., as if they were posed as AP multiple-choice questions where work did not count. In other words, simply use your calculator to answer the questions posed. In some cases you must also write a thoughtful sentence or two. Do these problems: #9.62, 9.64, 9.66, 10.84, 10.92. For the first three, note that each one requires an interpretation sentence (e.g., “We are 90% confident that . . .”) as part of your answer. For the last two, note that each one requires an answer (yes or no) as well as a P-value and a decision regarding H₀. This is a short homework assignment, and if you know what you are doing, you can finish it in 10-15 minutes. However, an intelligent student who wishes to maximize his performance on Monday’s test will also work on a selection of odd-numbered problems each night, regardless of whether they have been assigned or not. Why? Simply as a way to get instant feedback. The name of the game is learning. Note: Because there is no work shown this time, it will be virtually impossible for me to determine whether you have done your own work. If you copy someone’s work, you are not only committing an honor offense (which, on this assignment, can be neither detected nor proved) but also cheating yourself, since you are losing out on the chance to hone and improve your skills in advance of the test.
F 2/26/010	HW due: Write #10.60, 10.62, 10.64, and the problem below. For the first 3, you must do at least 2 of them as full PHA(S)TPC procedures. (I recommend doing all 3 as PHA(S)TPC, but if time is short, you may do one of them as a button-pusher without penalty.) Remember that in your assumptions, you should identify the test you are using. Additional problem: By means of sketch(es) and a short paragraph of explanation, show that constant 75% power against flexible alternatives leads to an increase in Type I error probability for alternatives that are closer to the null-hypothesis value of the parameter in either a one-tailed or a two-tailed test. In other words, let power be fixed at .75, and consider an alternative that has a certain probability of Type I error. Show that a different alternative that also has .75 power but is centered closer to p₀ (or μ₀, as the case may be) will have a greater α than the value of α associated with the first alternative. By your request, detailed solutions for this assignment are now available. Please see instructions in the 3/1 calendar entry.
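	A numerical illustration may help with the additional problem, although it is no substitute for your own sketches. The Python sketch below handles only the left-tailed case with the z approximation. The values μ₀ = 6 and s.e. = .96 are borrowed from the 2/22 example purely as assumptions for illustration, and the function name is invented.

```python
from statistics import NormalDist


def alpha_for_fixed_power(mu_a, mu_0=6.0, se=0.96, power=0.75):
    """Type I error probability of a left-tailed z test whose cutoff is
    placed so that the power against mu_a equals `power`.

    mu_0 and se are illustrative assumptions, not part of the assignment.
    """
    # Cutoff with `power` of the alternative distribution to its left.
    cutoff = NormalDist(mu_a, se).inv_cdf(power)
    # Alpha is the area of the null distribution to the left of that cutoff.
    return NormalDist(mu_0, se).cdf(cutoff)


# Alternatives listed from farthest to closest to the null value of 6.
alternatives = (3.0, 4.3, 5.3, 5.8)
alphas = [alpha_for_fixed_power(mu_a) for mu_a in alternatives]

for mu_a, a in zip(alternatives, alphas):
    print("alternative", mu_a, "-> alpha", round(a, 4))
```

	With power pinned at .75, the cutoff must slide toward the null value as the alternative does, so more of the null distribution falls in the “reject” zone and α grows.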

Last updated: 08 Mar 2010