Monthly Schedule

(STAtistics, Period C)

 

Spring break.

 

M 4/5/010

HW due: Read pp. 689-700, 702-710, 713-723, and 737-740. Note: Skip Activity 13.1, and ignore everything on p. 740 after the line that says, “Standardized residual plot.” This is a total of 36 pages of reading spread over 16 days, or an average of 2.25 pages per day. You can do this. Reading notes are required, as always. This is the last textbook reading assignment of the year. Hooray!

I can simplify your life somewhat by saying that everything you need to know about checking assumptions for the LSRL t test can be found on row E of this handout. Note that this gives you only 3 things to check, instead of 4 as indicated in the tan box at the top of p. 707.

In class: Double Quiz (20 pts.) on the reading assignment. Both quizzes will be closed book, open notes. Any handwritten notes you care to use will be acceptable.

Quiz I will consist of one or two questions designed to test whether you actually did the assigned reading, including all the examples. For example, anyone who did the reading would surely remember a few salient facts about Example 13.1 (pp. 693-694) and might be able to answer the following question correctly:

When wrestlers stand on their heads immediately before weigh-in, according to the study cited in the textbook, the apparent weight loss (y) associated with wrestler weight before headstand (x) exhibits which of the following behaviors?

I.              There is a positive linear association between x and y.
II.            There is a negative linear association between x and y.
III.           There is no statistically significant linear association between x and y, because y is essentially constant.
IV.           The standard deviation of LSRL residuals is approximately constant for any value of x.

(A)          I only.
(B)           II only.
(C)           III only.
(D)          I and IV only.
(E)           III and IV only.

In order to answer this question correctly, you would need both (1) a vague recollection of the example, based on your memory of an appropriate upward-sloping linear model, implying that statement I is true, and (2) knowledge of the entire chapter, not merely Example 13.1, since in the very last line of p. 713 you are told that constant standard deviation of the random deviations from the LSRL is a feature of appropriate models, and from that you can conclude that statement IV is true. (Note: Your book uses e on p. 713 as a synonym for residual.) Correct answer: D.

In other words, although the question is reasonable, it would require careful reading of the entire 36 pages, and you might even need to read them a second time in order to place everything into context in your mind.

  • Will you need to do this in college? Yes.
  • Is this level of reading required from you in high school? Perhaps not often enough. However, I want you to succeed in college.

 

Quiz II will consist of a data set for which you need to determine (a) the df of the LSRL t test, (b) the null hypothesis for an appropriate test, and (c,d,e) the verification of the assumptions. A completely worked example is given below.

Example problem: Mr. Hansen has noted that for the top 24 lunch skippers in the Upper School during the first semester, there seems to be a slight linear trend in their second semester skippiness. Here are the data:

 

 

 

Semester 1   Semester 2
   18.0          5.0
   17.0          4.0
   17.0          5.5
   16.5          9.0
   15.5          9.0
   14.0          4.0
   14.0          0.0
   13.0          2.0
   12.0          5.5
   11.0          4.0
   11.0          6.0
   11.0          8.0
   10.5          5.0
   10.0          4.0
   10.0          5.0
    9.0          2.0
    8.0          1.0
    8.0          2.0
    8.0          6.3
    8.0          6.5
    8.0          5.0
    8.0          3.0
    8.0          9.0
    8.0          1.0

 

 

 

Answer Key

(a) Since there are 24 data points, df = n − 2 = 22 for the LSRL t test.

(b) H0: β = 0, where β denotes the “true slope” of the linear relationship between S1 and S2 lunch skips.

(c) Assumption 1 [adapted from tan box at top of p. 707; see also row E near the bottom of the STAT TESTS summary handout]: Is the true relationship linear? A scatterplot [omitted here to save space, but you must show it] has r = .2097 and suggests a weak positive linear association between x and y, where x = S1 unexcused skips, y = S2 unexcused skips. The residual plot [omitted here to save space, but you must show it] shows no obvious patterns, suggesting that the linear fit is appropriate. We judge Assumption 1 to be satisfied, although some people might disagree. Proceed with caution.
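If you want to check the quoted r value away from the calculator, the following is a minimal Python sketch. It assumes the 24 data pairs are matched in the order listed in the table, and the function name `correlation` is just an illustrative choice, not anything from the textbook or the calculator.

```python
from math import sqrt

# Lunch-skip data from the table above (x = Semester 1, y = Semester 2),
# assumed to be paired in the order listed.
s1 = [18.0, 17.0, 17.0, 16.5, 15.5, 14.0, 14.0, 13.0, 12.0, 11.0, 11.0, 11.0,
      10.5, 10.0, 10.0, 9.0, 8.0, 8.0, 8.0, 8.0, 8.0, 8.0, 8.0, 8.0]
s2 = [5.0, 4.0, 5.5, 9.0, 9.0, 4.0, 0.0, 2.0, 5.5, 4.0, 6.0, 8.0,
      5.0, 4.0, 5.0, 2.0, 1.0, 2.0, 6.3, 6.5, 5.0, 3.0, 9.0, 1.0]


def correlation(x, y):
    """Pearson correlation coefficient r for paired lists x and y."""
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    sxy = sum((a - x_bar) * (b - y_bar) for a, b in zip(x, y))
    sxx = sum((a - x_bar) ** 2 for a in x)
    syy = sum((b - y_bar) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)


r = correlation(s1, s2)
print(f"r = {r:.4f}")  # should agree with the r = .2097 quoted above
```

A small r like this does not by itself settle Assumption 1. The scatterplot and residual plot still have to be shown and discussed.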

[The scatterplot, residual plot, and r value should all be shown for full credit, and you must state what it is about the plots that makes you think the true relationship is linear. “No pattern in residuals, hence good linear fit” or similar words must be written so that the AP graders know that you know what you are talking about. Merely sketching the plot and expecting the grader to fill in the gaps does not work. You have to write down an actual justification of the evidence you are presenting.]

(d) Assumption 2: Do the residuals have a standard deviation that does not depend on x? To answer this question, we refer to the residual plot found above and note that the vertical spread of the residuals does not really seem to get larger or smaller as x changes. This is a judgment call, of course. What we do not see is “flanging” or “flaring” in the residual plot [e.g., Figure 13.15(c) on p. 717], which would indicate a violation of the assumption. We judge Assumption 2 to be satisfied.

(e) Assumption 3: Are the residuals normally distributed? [Your book suggests on p. 714 that we should try to calculate the standardized residuals using software but admits in the second paragraph from the bottom of the page that these computations can be “tedious.” Because we have no easy way to compute standardized residuals, we will use plain residuals as suggested in Figure 13.14(b) on p. 716. This is acceptable for the AP exam.] For the data set given, the LSRL is ŷ = 2.8643 + 0.1574x, and the residuals are −.698, −1.541, −.0406, 3.5381, 3.6956, −1.068, −5.068, −2.911, .74656, −.596, 1.404, 3.404, .48269, −.4386, .56141, −2.281, −3.124, −2.124, 2.1763, 2.3763, .87626, −1.124, 4.8763, −3.124.
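The residuals listed above can be reproduced from the data table. Here is a minimal Python sketch of the least-squares computation, assuming the data pairs are matched in the order listed. The helper name `least_squares_line` is hypothetical, and the calculator's STAT CALC 8 remains the method expected in class.

```python
# Lunch-skip data (x = Semester 1, y = Semester 2), paired in the order listed.
s1 = [18.0, 17.0, 17.0, 16.5, 15.5, 14.0, 14.0, 13.0, 12.0, 11.0, 11.0, 11.0,
      10.5, 10.0, 10.0, 9.0, 8.0, 8.0, 8.0, 8.0, 8.0, 8.0, 8.0, 8.0]
s2 = [5.0, 4.0, 5.5, 9.0, 9.0, 4.0, 0.0, 2.0, 5.5, 4.0, 6.0, 8.0,
      5.0, 4.0, 5.0, 2.0, 1.0, 2.0, 6.3, 6.5, 5.0, 3.0, 9.0, 1.0]


def least_squares_line(x, y):
    """Return (b0, b1), the intercept and slope of the LSRL y-hat = b0 + b1*x."""
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    sxx = sum((a - x_bar) ** 2 for a in x)
    sxy = sum((a - x_bar) * (b - y_bar) for a, b in zip(x, y))
    b1 = sxy / sxx
    b0 = y_bar - b1 * x_bar
    return b0, b1


b0, b1 = least_squares_line(s1, s2)

# Residual = observed y minus predicted y, in the same order as the data.
residuals = [y - (b0 + b1 * x) for x, y in zip(s1, s2)]

print(f"y-hat = {b0:.4f} + {b1:.4f}x")
print([round(e, 3) for e in residuals])
```

The residuals of any LSRL sum to zero (up to rounding), which gives a quick sanity check on the list.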

[Recall that the list called RESID is created automatically every time you perform a STAT CALC 8 operation. To view the RESID list, if it is not already shown on your STAT EDIT display, you can proceed as follows:

1. Press STAT EDIT.
2. Scroll all the way to the right until your cursor is on a blank line above a column.
3. Press 2nd LIST.
4. On the NAMES menu, highlight RESID and press ENTER twice.

The RESID list can be checked for normality either by histogram, which is acceptable for the AP exam, or by NQP, a normal quantile plot. The NQP approach is better because it makes it easier to assess the skewness, if any. If you have forgotten how to make a histogram or an NQP, you will need to review your notes from the first semester. We did many examples of both.]

Since the NQP of the residuals [omitted here to save space, but you should show it] shows a nearly perfect compliance with normality, we judge Assumption 3 to be satisfied.
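For anyone curious about what the NQP actually plots, here is a minimal Python sketch. It pairs each sorted residual with the z-score a perfectly normal sample would be expected to have at that rank. The plotting position (i − 0.5)/n is one common convention and is an assumption here, since your calculator may use a slightly different one. The function names are illustrative only.

```python
from math import sqrt
from statistics import NormalDist

# Residuals as listed in part (e).
residuals = [-0.698, -1.541, -0.0406, 3.5381, 3.6956, -1.068, -5.068, -2.911,
             0.74656, -0.596, 1.404, 3.404, 0.48269, -0.4386, 0.56141, -2.281,
             -3.124, -2.124, 2.1763, 2.3763, 0.87626, -1.124, 4.8763, -3.124]


def normal_quantile_points(values):
    """Return (value, z) pairs for a normal quantile plot.

    Assumes plotting positions (i - 0.5)/n for i = 1..n.
    """
    n = len(values)
    ordered = sorted(values)
    z_scores = [NormalDist().inv_cdf((i - 0.5) / n) for i in range(1, n + 1)]
    return list(zip(ordered, z_scores))


def straightness(points):
    """Correlation between the plotted coordinates; near 1 means nearly straight."""
    xs = [p[0] for p in points]
    ys = [p[1] for p in points]
    n = len(points)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    sxy = sum((a - x_bar) * (b - y_bar) for a, b in zip(xs, ys))
    sxx = sum((a - x_bar) ** 2 for a in xs)
    syy = sum((b - y_bar) ** 2 for b in ys)
    return sxy / sqrt(sxx * syy)


points = normal_quantile_points(residuals)
print(f"NQP straightness (correlation): {straightness(points):.3f}")
```

The correlation is only a rough numerical stand-in for "looks like a straight line." On the AP exam you still need to sketch the plot and say in words what you see.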

(f) This will not be on your quiz, but after doing all this work, wouldn’t you like to know if there is statistically significant evidence of a positive linear association between S1 and S2 lunch skips? Here is the PHA(S)TPC writeup:

Let β = “true slope” of the linear relationship between S1 and S2 lunch skips, for the 24 skippiest S1 students.

H0: β = 0
Ha: β > 0

Assumptions for LSRL t test:

(1) Is the true relationship linear? Yes, based on scatterplot, residual plot, and r = .2097 as described above. [All three must be shown, and words such as “Lack of pattern in residual plot suggests linear fit is valid” must be written so that the grader knows what it is about the residual plot that makes you so certain.]

(2) Do the residuals have s.d. that does not depend on x? Yes, based on resid. plot [resid. plot must be shown]. The “spread” of residuals does not “flange” or “flare out” as x gets larger or smaller. [You need to write some words to that effect. Do not expect the resid. plot to speak for itself.]

(3) Are the residuals normally distributed? Yes, based on NQP of residuals [NQP of residuals should be shown]. Straight line implies good normality.

Test statistic: t = 1.0058 [by LinRegTTest]


P = .1627

Conclusion: There is no evidence (t = 1.0058, df = 22, P = .1627) that the true slope of the regression line associating the 24 worst first-semester lunch skippers with their second-semester results is positive. The slight upward trend seen in the data can be plausibly explained by chance alone.
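The t statistic and one-sided P-value above come from LinRegTTest. As a cross-check, here is a minimal Python sketch that recomputes both from the data. Because the Python standard library has no t distribution, the tail area is found by numerically integrating the t density with Simpson's rule. That is a stand-in technique chosen for this sketch, not what the calculator does internally, and the function names are hypothetical.

```python
from math import gamma, pi, sqrt

# Lunch-skip data (x = Semester 1, y = Semester 2), paired in the order listed.
s1 = [18.0, 17.0, 17.0, 16.5, 15.5, 14.0, 14.0, 13.0, 12.0, 11.0, 11.0, 11.0,
      10.5, 10.0, 10.0, 9.0, 8.0, 8.0, 8.0, 8.0, 8.0, 8.0, 8.0, 8.0]
s2 = [5.0, 4.0, 5.5, 9.0, 9.0, 4.0, 0.0, 2.0, 5.5, 4.0, 6.0, 8.0,
      5.0, 4.0, 5.0, 2.0, 1.0, 2.0, 6.3, 6.5, 5.0, 3.0, 9.0, 1.0]


def slope_t_statistic(x, y):
    """Return (t, df) for testing H0: true slope = 0."""
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    sxx = sum((a - x_bar) ** 2 for a in x)
    sxy = sum((a - x_bar) * (b - y_bar) for a, b in zip(x, y))
    b1 = sxy / sxx
    b0 = y_bar - b1 * x_bar
    sse = sum((b - (b0 + b1 * a)) ** 2 for a, b in zip(x, y))
    se_b1 = sqrt(sse / (n - 2)) / sqrt(sxx)  # standard error of the slope
    return b1 / se_b1, n - 2


def t_upper_tail(t, df, steps=2000):
    """P(T > t) for a t distribution, via Simpson's rule from 0 to t."""
    c = gamma((df + 1) / 2) / (sqrt(df * pi) * gamma(df / 2))

    def density(u):
        return c * (1 + u * u / df) ** (-(df + 1) / 2)

    h = t / steps  # steps must be even for Simpson's rule
    total = density(0.0) + density(t)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * density(i * h)
    return 0.5 - total * h / 3


t, df = slope_t_statistic(s1, s2)
p = t_upper_tail(t, df)
print(f"t = {t:.4f}, df = {df}, P = {p:.4f}")
```

The upper tail is used because Ha is one-sided (positive slope). A two-sided test would double this P-value.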

[There is no need to show the computation of s_b1, the standard error of the slope, even though a complicated formula for it is given on your AP formula sheet. That AP formula is not useful, and you can cross it out. If you ever need to know s_b1, you can easily compute it by a much easier method. Since t is defined to be b1/s_b1, you can do a bit of algebra to obtain s_b1 = b1/t. Therefore, since both b1 and t are readily produced by calc. and/or displayed on a statistical software printout, you should be able to find s_b1 if you ever need to. By the way, do not confuse s_b1 with the value called s that your calculator displays when you perform STAT TESTS LinRegTTest. The calculator’s so-called s value is irrelevant for our purposes here. If you are curious about what s means in this context, ask me and I’ll explain how s is related to the standard deviation of the residuals, but that is not anything you need to know for the AP exam.]
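To see that the shortcut really does give the standard error of the slope, here is a minimal Python sketch comparing it with the long way. The "long" formula used here is the standard textbook one (residual standard deviation divided by the square root of the sum of squared x deviations), which is assumed to match the formula sheet. The value t = 1.0058 is taken from the writeup above.

```python
from math import sqrt

# Lunch-skip data (x = Semester 1, y = Semester 2), paired in the order listed.
s1 = [18.0, 17.0, 17.0, 16.5, 15.5, 14.0, 14.0, 13.0, 12.0, 11.0, 11.0, 11.0,
      10.5, 10.0, 10.0, 9.0, 8.0, 8.0, 8.0, 8.0, 8.0, 8.0, 8.0, 8.0]
s2 = [5.0, 4.0, 5.5, 9.0, 9.0, 4.0, 0.0, 2.0, 5.5, 4.0, 6.0, 8.0,
      5.0, 4.0, 5.0, 2.0, 1.0, 2.0, 6.3, 6.5, 5.0, 3.0, 9.0, 1.0]

n = len(s1)
x_bar = sum(s1) / n
y_bar = sum(s2) / n
sxx = sum((x - x_bar) ** 2 for x in s1)
sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(s1, s2))
b1 = sxy / sxx              # sample slope
b0 = y_bar - b1 * x_bar     # sample intercept

# Long way: sqrt(SSE / (n - 2)) divided by sqrt(sum of squared x deviations).
sse = sum((y - (b0 + b1 * x)) ** 2 for x, y in zip(s1, s2))
se_long = sqrt(sse / (n - 2)) / sqrt(sxx)

# Shortcut: divide the slope by the t statistic reported by LinRegTTest.
t_reported = 1.0058
se_short = b1 / t_reported

print(f"long way: {se_long:.4f}   shortcut: {se_short:.4f}")
```

The two values agree to within the rounding of the reported t statistic.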

 

T 4/6/010

HW due: (1) Review your reading assignment and update your reading notes so that the content is fresh in your mind. (2) Finish your CFU, carefully correcting the quibble(s) found among your answers. Note that there may be more than one error, even if I mentioned only one in class. You may confer with your assigned partner, but you may not copy anyone’s wording for the last question.

Graham and Ben are exempt from the second part of this assignment.

 

W 4/7/010

HW due: For the data set in the 4/5 calendar entry, verify all three of the assumptions. Try to do this without referring to my work. (You will learn more that way. You can and should compare your answers when finished.)

Quiz will be over material similar to Monday’s “disaster” quiz. I am thinking positive thoughts and hoping that a majority of students do much better on this quiz. There will be a small prize for the most improved student.

 

Th 4/8/010

HW due: Write #13.64 a-f (omit part g) and start preparing for the Must-Pass Quiz by beginning a study sheet of written notes. You will be graded on the quality of your study sheet. For example, something that says “#19 no/yes” will not qualify for credit. However, the following more thoughtful notes would qualify:

19. We know that r is unaffected by choice of units, because r is standardized (hence dimensionless). However, b0 and b1 must be expressed in the units of the problem and will change if the units change. For example, if x is ocean depth and y is pressure, b0 would be measured in y units (e.g., lbs./in.²), while b1 would be measured in (y units)/(x units), namely something like pounds per square inch per foot. If these are changed to metric units, then b0 and b1 would have to be converted using the appropriate conversion factors (or recomputed from metric source data, which might be just as easy).
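The point of the sample note can be demonstrated numerically. The following minimal Python sketch uses made-up depth and pressure readings (purely hypothetical numbers, chosen only for illustration) and refits the line after converting feet to meters and psi to kilopascals. The function name `fit` is likewise an illustrative choice.

```python
from math import sqrt

# Hypothetical readings, invented for illustration only.
depth_ft = [10.0, 20.0, 30.0, 40.0, 50.0]
pressure_psi = [19.0, 23.8, 27.9, 32.6, 36.9]

FT_TO_M = 0.3048         # meters per foot
PSI_TO_KPA = 6.894757    # kilopascals per psi


def fit(x, y):
    """Return (b0, b1, r) for the least-squares line of y on x."""
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    sxx = sum((a - x_bar) ** 2 for a in x)
    syy = sum((b - y_bar) ** 2 for b in y)
    sxy = sum((a - x_bar) * (b - y_bar) for a, b in zip(x, y))
    b1 = sxy / sxx
    return y_bar - b1 * x_bar, b1, sxy / sqrt(sxx * syy)


b0_imp, b1_imp, r_imp = fit(depth_ft, pressure_psi)

depth_m = [d * FT_TO_M for d in depth_ft]
pressure_kpa = [p * PSI_TO_KPA for p in pressure_psi]
b0_met, b1_met, r_met = fit(depth_m, pressure_kpa)

print(f"r:  {r_imp:.6f} (imperial)  vs  {r_met:.6f} (metric)")
print(f"b1: {b1_imp:.4f} psi/ft     vs  {b1_met:.4f} kPa/m")
print(f"b0: {b0_imp:.4f} psi        vs  {b0_met:.4f} kPa")
```

The slope picks up the factor (kPa per psi)/(m per ft) and the intercept picks up the factor kPa per psi, while r comes out the same in either system.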

Note: I understand that you may not need notes for most of the questions. That is fine. Simply make notes for the questions for which you need a little extra reinforcement. For today, only a start is expected. Please keep a time log.

 

F 4/9/010

HW due: Continue working on your notes for the Must-Pass Quiz. Keep a time log. This assignment is due by 3:30 p.m. Thursday for those who are attending today’s field trip.

In class: Work the following χ² two-way test. People who missed class are expected to have this problem in their notes, worked.

A census of Schmilson H.S. has revealed the following categorical data:



Are hair color and eye color related in some way? Perform a suitable test and analyze the results.

You are expected to have a worked version of this problem in your class notes for Monday. Copying from somebody who was present is acceptable, since these are class notes, not homework. To speed the computation of the contributions to χ², we used an excerpt from the CSDELUXE program equivalent to the following. Note that every square-bracketed entry, e.g., [A], must be entered by using the MATRX function.

0→F
sum(dim([A]){1,0})→R
sum(dim([A]){0,1})→C
For(I,1,R,1)
For(J,1,C,1)
[A](I,J)→O
[B](I,J)→E
(O-E)²/E→[C](I,J)
If E<5
Then
F+1→F
End
End
End

(Each → is entered with the STO► key.)


Note: Before running the program, you need to have matrix [A] populated with data, matrix [B] populated with expected counts (either computed manually or produced automatically by the STAT TESTS χ²-Test procedure), and matrix [C] defined with the same dimensions as [A] and [B].

After you have run the program above, F will contain the number of expected counts that are less than 5. Remember, this count should not exceed 20% of the number of entries in the matrix, which in this case would be 20% of 16, or about 3.
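For comparison, here is a minimal Python sketch of the same bookkeeping. Unlike the calculator excerpt, it also computes the expected counts from the row and column totals instead of requiring them in advance. The 2 × 2 table is made up purely for illustration (it is not the Schmilson H.S. data), and the function name is hypothetical.

```python
def chi_square_contributions(observed):
    """Return (expected, contributions, small) for a two-way table.

    expected[i][j]      = (row total)(column total)/(grand total)
    contributions[i][j] = (O - E)^2 / E
    small               = number of expected counts below 5
    """
    row_totals = [sum(row) for row in observed]
    col_totals = [sum(col) for col in zip(*observed)]
    grand_total = sum(row_totals)

    expected = [[r * c / grand_total for c in col_totals] for r in row_totals]
    contributions = [
        [(o - e) ** 2 / e for o, e in zip(obs_row, exp_row)]
        for obs_row, exp_row in zip(observed, expected)
    ]
    small = sum(1 for row in expected for e in row if e < 5)
    return expected, contributions, small


# Hypothetical 2 x 2 table of counts, for illustration only.
table = [[10, 20],
         [30, 40]]

expected, contributions, small = chi_square_contributions(table)
chi_square = sum(sum(row) for row in contributions)

print("expected counts:", expected)
print("contributions:  ", [[round(c, 4) for c in row] for row in contributions])
print(f"chi-square = {chi_square:.4f}, expected counts below 5: {small}")
```

The count of small expected cells plays the role of F in the calculator program and is judged against the same 20% guideline.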

 

M 4/12/010

HW due: Write #12.45, 12.46. Show all steps. Note: I will allow all students to submit this assignment on Monday, even the students who attended Friday’s field trip.

 

T 4/13/010

HW due: Continue reviewing for your Must-Pass Quiz. Prepare additional notes and/or work a selection of odd-numbered problems from Chapters 12 and 13. Keep a time log in either case.

 

W 4/14/010

Test (100 pts.) on Chapters 12 and 13.

The useful CSDELUXE program is available for TI-GRAPH LINK or TI CONNECT download, or you can visit me at Math Lab to receive a copy by direct cable transfer. CSDELUXE covers both g.o.f. and 2-way situations, and it even helps you with checking the expected counts to see if they are large enough. You will want this program!

Today’s test will emphasize the following topic areas:

 

  • χ² g.o.f. tests
  • χ² 2-way tests (both for homogeneity of proportions and for independence)
  • LSRL t tests
  • Confidence intervals for the LSRL slope.


You will be given a standard AP formula sheet to use during the test. However, please note that one of the most useful formulas of all, namely  should be memorized, since it is not on the AP formula sheet.

 

Th 4/15/010

Test (100 pts.) over same material as yesterday.

If you take both tests, only the higher of the two scores will count toward your quarter average. If you wish to skip one day or the other in order to use the time for studying for other classes, you must make special arrangements with me in advance. Face-to-face is preferred. If you send me an e-mail and do not receive a reply, you are not approved to skip a test, and you must appear for roll call both Wednesday and Thursday as usual.

 

F 4/16/010

HW due: Write Wednesday’s free-response portion completely (showing work). Set a timer for 25 minutes to make it challenging, and then write everything beyond that point in a different color.

In class: The Must-Pass Quiz begins, for real, with a 5-question written SRS for a quiz grade.

 

M 4/19/010

HW due: Show clear, written evidence of work toward AP review (all students, even those who are not taking the exam). A mixture of free-response problem solutions, written out as if they were from a real exam, would be preferred. Multiple-choice questions, if used as part of your review, must include some legible work. Keep a time log.

In class: The Must-Pass Quiz continues with another quiz grade. Robbie gets an automatic 10 on this quiz, because he passed last Friday. Congratulations, Robbie!

 

T 4/20/010

Diversity Day (no class).

 

W 4/21/010

HW due: Do parts (a) and (b) of the following problem.

Problem: It is claimed that the SAT I math exam (SATM) is scored and normed each year so that the population approximately satisfies the N(500, 100) distribution. [If you have forgotten what this notation means, then I expect you to look it up.] Suppose that an SRS of 1648 SATM student papers shows the following score distribution:



There are some anomalies. For example, by using normal probabilities, you can easily determine that the expected number of scores above 750 would be 10.23355, and yet 20 people in the sample—almost double the expected number—managed to score that high. [If you can’t determine 10.23355 as the expected count for scores above 750, then e-mail me or call me immediately at 703-599-6624, since you will not be able to solve the problem.]
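The expected count quoted above is simply the sample size times a normal tail probability. Here is a minimal Python sketch of that computation using the standard library's `statistics.NormalDist`. The helper `expected_count` is a hypothetical name, and the result may differ from the calculator's value in the last digit or two because of rounding.

```python
from statistics import NormalDist

satm = NormalDist(mu=500, sigma=100)  # the claimed N(500, 100) population
n = 1648                              # sample size from the problem


def expected_count(low, high):
    """Expected number of sampled scores strictly between low and high."""
    return n * (satm.cdf(high) - satm.cdf(low))


# Scores above 750: upper tail beyond z = (750 - 500)/100 = 2.5.
expected_above_750 = n * (1 - satm.cdf(750))
print(f"expected count above 750: {expected_above_750:.4f}")

# The same helper works for any score interval, e.g. 500 to 600.
print(f"expected count from 500 to 600: {expected_count(500, 600):.2f}")
```

Repeating this for each score interval in the table gives the full list of expected counts needed for part (a).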

(a) Let the null hypothesis be that the population really is N(500, 100), and use a suitable test with α = .05 to see if there is any evidence that the claim is false. (Show all steps.) Note that you cannot use the NQP approach, nor even the empirical rule, since the raw data are not given.

(b) Analyze the contributions to χ² to see where the largest areas of possible concern may be.

In class: Survey critique.

 

Th 4/22/010

HW due: Adjust the mean and s.d. in the null hypothesis for yesterday’s problem so that statistical significance is assured. Show your work, as you did yesterday. Do both part (a) and part (b).

 

F 4/23/010

HW due: Nothing new, but previously assigned problems may be re-scanned, and accuracy is required for problems that have been gone over in class. If there is anyone who does not have homework today, I will send an e-mail to his advisor strongly recommending that the student not sit for the AP Statistics exam. (Rationale: If a student cannot do a few straightforward homework problems, I cannot understand why such a student would want to sit in an exam room for more than 3 hours on a warm afternoon in May.)

In class: The Must-Pass Quiz continues with another quiz grade.

 

M 4/26/010

Phi Beta Kappa Day (no school).

 

T 4/27/010

The Must-Pass Quiz continues. Connor and Eric go down in flames, for the moment.

 

 



Last updated: 28 Apr 2010