Monthly Schedule

(STAtistics, Period B)

T 1/3/012

Classes resume.

In class: Review.

 

W 1/4/012

Review Quiz #1: Focus on notation, terminology, exploratory data analysis, and linear least-squares regression.

 

Th 1/5/012

Review Quiz #2: Focus on curve fitting, probability, and positive predictive value (PPV). For a good example of a PPV problem, look at #17-20 from the 12/12/2005 test. (Note: You are not yet expected to know the terms “null hypothesis,” “Type I error,” and “Type II error,” but you are expected to know “false positive,” “false negative,” and how to compute PPV.) Be sure to work the problems before you check the solution key.
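
As a reminder of how a PPV computation goes, here is a minimal sketch. The prevalence, sensitivity, and specificity values are hypothetical, chosen only for illustration; they are not taken from the 12/12/2005 test.

```python
def ppv(prevalence, sensitivity, specificity):
    """Positive predictive value: P(condition present | test is positive)."""
    true_pos = prevalence * sensitivity               # has condition, tests positive
    false_pos = (1 - prevalence) * (1 - specificity)  # no condition, tests positive anyway
    return true_pos / (true_pos + false_pos)

# Hypothetical screening test: 1% prevalence, 95% sensitivity, 90% specificity.
example_ppv = ppv(0.01, 0.95, 0.90)
print(f"PPV = {example_ppv:.3f}")
```

With these made-up numbers, fewer than 1 in 10 positive results is a true positive, because false positives from the large healthy group swamp the true positives.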

 

F 1/6/012

Review Quiz #3: Focus on curve fitting and residual plots.

 

Exam Prep

The following resources are provided to help you prepare for Monday’s midterm exam:

 

·         College Board website
          – AP Stat home page: course description, 2 full-length sample exams, study guides, and more
          – Sample exam questions

·         Test (11/15/2011) on Probability and Transformations (class mean = 70%)
          – Blank copy of in-class portion
          – Blank copy of take-home portion
          – Solution key for in-class portion
          – Solution key for take-home portion

·         Big Quiz (12/21/2011) on Random Variables (class mean = 55.7/60, or 93%)
          – Blank copy
          – Solution key

·         Please send e-mail to Mr. Hansen over the weekend (double underscore in subject line, please) if you have any questions.

 

M 1/9/012

Midterm Exam, 8:00 a.m., MH-315. If MH-315 proves to be too small, we may move to another nearby room, but we will start in MH-315.

Format will be a reduced-scale AP exam, as follows:

Part I. Multiple Choice. There will be 22 questions in 50 minutes. Work is not graded in this part of the exam. There is no penalty for wrong guesses, since the penalty was phased out in 2011.

Part II. Free Response. There will be one multi-part question of standard length (13 minutes) and one “project-type” question that requires in-depth thinking and writing (25 minutes).

Examples of all these types of questions can be found on the College Board website (see link at bottom of STAtistics Zone webpage) or in your Barron’s review book. You are responsible for all information covered in the textbook through Chapter 7, except for a few passages that we skipped here and there, such as §7.8. If you have any doubt about the reading assignments, please check the “archives” link at the top of the schedule.

We should be finished with the exam by about 9:30 a.m. In my experience, most students have ample time during Part I and run out of time in Part II. However, there is no extended time. All scores will be based on a modified “AP curve” in which an A (AP score of 5) generally requires a score in the mid-70s. As always, Mr. Hansen will use judgment to ensure that the curved scores make sense in context.

What to Bring to the Exam

 

Pencil is REQUIRED for Part I. You may use either pencil or pen for Part II.

 

A graphing calculator is required throughout. Bring spare batteries, since if your calculator dies partway through, you will be out of luck.

 

Scratch paper is not allowed.

 

A standard AP formula sheet will be provided for you. Examples are available both in your Barron’s book and in the Course Description PDF file (available on the College Board website).

 

One question you often see on the AP exam is one that asks you to “interpret the slope of the linear regression in the context of the problem.” For example, you may have an LSRL that uses speed (mph) as the explanatory variable and fuel economy (mpg) as the response variable, and maybe the slope is –0.28. Here is the wording you would be expected to provide for full credit:

For each additional mile per hour of speed, the linear model predicts a fuel economy reduction of 0.28 miles per gallon.

Note that all of the following are incorrect:

“Miles per gallon are predicted to be –0.28 times the speed in mph.” (No, that’s not what the LSRL says.)
“Each additional unit of x is associated with a reduction of 0.28 in y.” (Not in context of problem. Fail!)
“Miles per gallon decreases by 0.28 for each additional mile per hour.” (No, that’s only what the model predicts.)
“Speed is predicted to decrease by 0.28 mph for each additional mile per gallon.” (No, that’s backwards.)
“Miles per gallon are predicted to change by 0.28 for each additional mph in speed.” (Vague, sign omitted.)
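
To see why the first incorrect wording fails, here is a small sketch of the speed/fuel-economy example. The slope of –0.28 comes from the example; the intercept of 45 is hypothetical, since the example does not give one.

```python
SLOPE = -0.28      # mpg per additional mph, from the example
INTERCEPT = 45.0   # hypothetical; the example does not state an intercept

def predicted_mpg(speed_mph):
    """Fuel economy predicted by the LSRL at a given speed."""
    return INTERCEPT + SLOPE * speed_mph

# The slope is the predicted CHANGE in mpg for one additional mph.
change = predicted_mpg(61) - predicted_mpg(60)
print(f"Predicted change per additional mph: {change:.2f} mpg")

# The LSRL does not say that mpg equals -0.28 times speed.
print(f"Model prediction at 60 mph: {predicted_mpg(60):.1f} mpg")
print(f"-0.28 times 60: {SLOPE * 60:.1f}")
```

The predicted change is the same for any pair of speeds one mph apart, whatever the intercept happens to be.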

 

W 1/18/012

Classes resume.

In class: Discussion of midterm exam.

Evening: Attend The Prep School Negro at Trapier Theater.

 

Th 1/19/012

No additional HW due. This is in order to provide time for you to attend Wednesday night’s screening.

 

F 1/20/012

HW due: Fill out your “I/E/P” (Incorrect/Essentially Correct/Partially Correct) ratings, using the scoring rubrics found here (Question 1) and here (Question 6). If you wish to see the original text of the questions, check here and here.

 

M 1/23/012

HW due: Read pp. 445-449 twice. This is not super-exciting reading, but it concerns the most important abstract concept of the year, the concept of a sampling distribution. There may be an open-notes quiz today to check for reading comprehension.

 

T 1/24/012

HW due: Read the following paragraph and take some reading notes. Then read pp. 451-459 and do the exercise at the bottom.

Reading Assignment to Supplement pp. 451-459:

AP Statistics is concerned with 4 principal topic areas, namely

          – exploratory data analysis
          – design of experiments and surveys
          – probability (including simulations and random variables)
          – inferential statistics

We did the first 3 in the first semester. The last topic will take us the rest of the year. Inferential statistics deals with many questions, but the most important goal is this: We use statistics in order to estimate parameters.

The two most important abstract concepts in the second half of the course are sampling distributions and statistical significance. You need to write your own definition, in your own words, of “sampling distribution,” but basically it should be something like this: the set of all possible values that a statistic can have when samples of a fixed size are chosen from some population of interest. As for statistical significance, that is a determination that a difference is too large to be plausibly explained by chance alone, or to put it another way, that a statistic is too far out in the tail of the purported sampling distribution to be plausibly explained by chance alone.
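
If it helps to make “sampling distribution” concrete, here is a rough simulation sketch. The population (a skewed set of 20,000 values), the sample size of 25, and the number of repetitions are all assumptions made up for illustration. A simulation like this only approximates the sampling distribution, since it draws many samples rather than all possible samples.

```python
import random
import statistics

rng = random.Random(2012)   # fixed seed so the run is repeatable

# Hypothetical right-skewed population with mean near 10.
population = [rng.expovariate(1 / 10) for _ in range(20_000)]
pop_mean = statistics.fmean(population)
pop_sd = statistics.pstdev(population)

n = 25   # fixed sample size
# Draw many samples of size n and record the statistic (the sample mean) each time.
sample_means = [statistics.fmean(rng.sample(population, n)) for _ in range(5_000)]

print(f"Population mean: {pop_mean:.2f}")
print(f"Mean of the sample means: {statistics.fmean(sample_means):.2f}")
print(f"Population s.d. / sqrt(n): {pop_sd / n ** 0.5:.2f}")
print(f"S.d. of the sample means: {statistics.stdev(sample_means):.2f}")
```

The collection of sample means is centered near the population mean, and its spread is close to the population standard deviation divided by the square root of n.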

Statistical significance means that, as far as we can tell, the observed difference in means (t distribution) or proportions (z distribution for 1 or 2, χ² distribution for 3 or more) is most likely not a fluke. If we have statistical significance, we are saying that the observed difference is meaningful in some sense.

Warning: Statistical significance cannot be found hiding in a pool of data. We are not on a “witch hunt” for statistical significance, poking around patiently until we find some statistical significance somewhere. If we did that, we would be committing the “Texas sharpshooter fallacy,” a.k.a. the “data mining fallacy.” The reason is that any sufficiently large set of data will contain, by chance alone, some interesting patterns. We have to assert in advance that a pattern exists, then gather data to prove that the pattern exists to a degree that could not plausibly be explained by chance alone.

If you are familiar with Babe Ruth and the legend of the “called shot,” you understand exactly what the distinction is. If Babe Ruth had said, at the beginning of the season, “I will hit some home runs,” there would be no story. Boring! No significance! Since he was a power hitter (also a power striker-outer, but that is another story), it was a safe bet that he would hit some home runs. However, for him to point to a certain place in the outfield, as he did (according to the legend), and then hit a home run to that precise location, moments later, was certainly impressive.

In the same way, we would have some persuasive proof if we assert that a certain pattern exists, then gather data to show that the pattern exists to a degree that cannot be plausibly explained by chance alone. However, finding a few statistics that are “statistically significant” in a large data set is not surprising at all.
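
The last point can be checked with a short simulation sketch. Everything here is hypothetical: 200 comparisons, groups of 50, and a simple z cutoff of 1.96 standing in for a formal test. Both groups in every comparison are drawn from the same population, so every “significant” result is a fluke.

```python
import random
import statistics

rng = random.Random(154)   # fixed seed so the run is repeatable
NUM_COMPARISONS = 200      # hypothetical number of patterns we go hunting through
GROUP_SIZE = 50

def z_statistic(group_a, group_b):
    """Two-sample z statistic for a difference in means."""
    se = (statistics.variance(group_a) / len(group_a)
          + statistics.variance(group_b) / len(group_b)) ** 0.5
    return (statistics.fmean(group_a) - statistics.fmean(group_b)) / se

false_alarms = 0
for _ in range(NUM_COMPARISONS):
    # Both groups come from the SAME population, so there is no real difference.
    a = [rng.gauss(0, 1) for _ in range(GROUP_SIZE)]
    b = [rng.gauss(0, 1) for _ in range(GROUP_SIZE)]
    if abs(z_statistic(a, b)) > 1.96:   # roughly the 5% two-sided cutoff
        false_alarms += 1

print(f"{false_alarms} of {NUM_COMPARISONS} comparisons look 'significant'")
```

Since about 5% of comparisons cross the cutoff by chance alone, we should expect something like 10 false alarms out of 200, even though no real pattern exists anywhere in the data.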

Exercise:

The green box at the top of p. 457 says that the CLT can be safely applied if n > 30. However, some distributions have such extreme skewness and/or “fat tails” that even n = 40 or 50 would not suffice for approximate normality of x̄. At the end of the following paragraph, your textbook says, “In practice, however, few population distributions are likely to be this badly behaved.” Comment briefly, taking the crash of 2008 into account.
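
For anyone who wants to see a badly behaved distribution in action, here is a simulation sketch. The lognormal population is an assumption chosen only as a convenient stand-in for extreme skewness and fat tails; it is not a model of financial markets or of anything in the textbook. The sketch compares the skewness of sample means (n = 40) from a uniform population and from the lognormal one. A skewness near 0 is what approximate normality would require.

```python
import random
import statistics

rng = random.Random(2008)   # fixed seed so the run is repeatable
N, REPS = 40, 4000          # sample size and number of simulated samples

def skewness(values):
    """Average cubed z-score; near 0 for a symmetric distribution."""
    m = statistics.fmean(values)
    s = statistics.pstdev(values)
    return statistics.fmean([((v - m) / s) ** 3 for v in values])

def means_of_samples(draw):
    """Simulate REPS sample means, each from N independent draws."""
    return [statistics.fmean([draw() for _ in range(N)]) for _ in range(REPS)]

tame_means = means_of_samples(lambda: rng.uniform(0, 1))
wild_means = means_of_samples(lambda: rng.lognormvariate(0, 2))  # hypothetical fat-tailed population

skew_tame = skewness(tame_means)
skew_wild = skewness(wild_means)
print(f"Skewness of sample means, uniform population: {skew_tame:.2f}")
print(f"Skewness of sample means, lognormal population: {skew_wild:.2f}")
```

For the uniform population the sample means look symmetric, while for the lognormal population they remain strongly right-skewed even at n = 40.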

 

W 1/25/012

Quiz on recent class discussions. It is a safe bet, for example, that you will have to state the CLT.

HW due: Mark notational changes to the tan box on p. 465 as described below; read pp. 461-466; write #8.20.

Note: When reading pp. 461-466, remember that the notation used by our textbook (and many other college-level textbooks) is to let p = sample proportion (statistic), π = population proportion = probability (parameter). This is consistent with the general rule that Roman letters are used for statistics, while Greek letters are used for parameters. However, the AP exam uses a different notational convention, which is that p̂ = sample proportion (statistic), p = population proportion = probability (parameter). Therefore, you need to translate all occurrences of p in your textbook into p̂ when reading, and translate all occurrences of π into p. You can make the translation in your head, or you can mark the changes in your textbook. For tomorrow, you are required only to make the changes to the tan box on p. 465. Please make the changes in pencil, in case you wish to sell your used textbook at the end of the year.

 

Th 1/26/012

HW due: Read summary on pp. 469-470; write #8.22, 8.30, 8.37, and the question below.

Question: Why do we care about sampling distributions? What value could they possibly have?

 

F 1/27/012

HW due: Make sure yesterday’s written work is complete (matching against worked examples if necessary). Then, read pp. 475-480, 482-484 (middle). In the green box on p. 484, rewrite π as p and the formula as


 

M 1/30/012

No additional HW due. Please get plenty of sleep, recharge your batteries, and do something kind for a stranger.

 

T 1/31/012

HW due: Write review problems on p. 470 (#8.32, 8.33, 8.34, 8.36), plus full corrections of #8.30 and 8.37 from the solutions listed below.

8.30.

Assume that sample is an SRS of likely voters [must be stated].
Check: Is N ≥ 10n? Yes, assuming there are at least 10(500) = 5000 voters in the district. ✓
Check: Is np ≥ 10? Yes, since 500(0.48) = 240 >> 10. ✓
Check: Is nq ≥ 10? Yes, since 500(0.52) = 260 >> 10. ✓



μ_p̂ = p = 0.48 and σ_p̂ = √(pq/n) = √((0.48)(0.52)/500) ≈ 0.022343.

Thus we can use the N(0.48, 0.022343) distribution to approximate the sampling distribution of p̂.
 by calc.
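
The checks and the standard deviation in #8.30 can be reproduced with a few lines. This is only a sketch of the arithmetic, using n = 500 and p = 0.48 from the solution; the variable names are made up here.

```python
import math

n, p = 500, 0.48   # sample size and population proportion from the solution
q = 1 - p

# Rule-of-thumb checks for the normal approximation.
conditions_ok = n * p >= 10 and n * q >= 10

# Standard deviation of the sampling distribution of the sample proportion.
sd_p_hat = math.sqrt(p * q / n)

print(f"np = {n * p:.0f}, nq = {n * q:.0f}, conditions met: {conditions_ok}")
print(f"s.d. of sampling distribution = {sd_p_hat:.6f}")
```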


8.37

Assume that the 50 shoppers constitute an SRS of all possible shoppers.
Check: Is population distribution normal? (Doesn’t matter, since n = 50 > 30.)
Check: Is population distribution free of extreme skewness or outliers? (Must assume!) ✓



By the CLT, the sampling distribution of x̄ is closely approximated by N(100, 4.24264).
Total amount spent exceeds $5300 when the mean for all 50 exceeds $5300/50 = $106.
P(x̄ > 106) ≈ 0.0786 by calc.
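
If you would rather check the #8.37 tail probability without a graphing calculator, here is a minimal sketch using Python's standard library. It takes the N(100, 4.24264) approximation and the $5300 total from the solution as given; the variable names are made up here.

```python
from statistics import NormalDist

# Approximate sampling distribution of the sample mean, as stated in the solution.
sampling_dist = NormalDist(mu=100, sigma=4.24264)

# A total above $5300 for 50 shoppers means a sample mean above $106.
threshold = 5300 / 50

# Upper-tail probability that the sample mean exceeds the threshold.
prob = 1 - sampling_dist.cdf(threshold)
print(f"P(sample mean > {threshold:.0f}) = {prob:.4f}")
```

The threshold sits about 1.41 standard deviations above the mean, so the tail probability comes out a little under 8%.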


 

 



Last updated: 27 Feb 2012