Monthly Schedule

(STAtistics, Period B)

W 9/8/010

First day of class. Do not write in your textbook. Please return it immediately to Ms. Ulasevich in the bookstore. She has ordered the correct textbook by Peck, Olsen, and Devore (the vendor made an error) and will have a replacement for you in about a week. We will do non-textbook assignments in the meantime.

 

Th 9/9/010

HW due: Watch video topic #3B (Greek alphabet for statistics), and study the quiz-prep material below.

There will be two 10-point quizzes. The first will be on the Greek alphabet, and the second will be based on yesterday’s class discussion. Here are the questions for the second quiz, each followed by its answer (bracketed portions are optional).

 

1. What is mathematics? [The science of abstraction:] the study of patterns.

 

2. Are data plural? Yes, always.

 

3. What are data? Facts that are either categorical or quantitative.

 

4. What is statistics? The study of data [often considered to be a branch of applied mathematics, though increasingly seen as a separate field].

 

5. What is a statistic? A number computed from data.

 

6. What is a parameter? A number that describes a population. [Also, in mathematics, economics, engineering, game theory, and many other fields, the term “parameter” often means an adjustable constant.]

 

7. What is our course about? Variability.

 

8. Say more . . . We use statistics to estimate parameters. Knowing how variable a statistic is allows us to say how confident we are about our estimate. [Point estimates are for sissies.]

 

9. What is a distribution? The set of all possible quantitative outcomes. [Note: We never use the word “distribution” when referring to categorical data. Categorical data are often depicted with bar graphs, a topic that you probably understood fully before you were a teenager. Quantitative data require histograms, not bar graphs.]

 

10. What are some common ways of depicting distributions? Boxplot, modified boxplot, stemplot, histogram, [and perhaps most useful of all, the relative frequency histogram, in which the probabilities are shown adding up to 1].

 

11. So how is it that we could use a statistic to estimate a parameter? We consider the “DOAPS”: the hypothetical distribution of all possibilities for a statistic. [If the DOAPS has high variability, we cannot estimate the parameter very accurately, but if the DOAPS has low variability, we can estimate the parameter within a narrow confidence interval.]

 

[Bonus Question #12. Do we ever know exactly what the “DOAPS” looks like? No, and that is the hardest thing about our subject. It takes most people several months to get comfortable with the concept. People who have never studied statistics have no clue. The technical name for DOAPS is “sampling distribution,” and that is the term you should use on your AP exam. Most people can understand the term “distribution,” because a distribution is easy to understand from looking at a histogram, but the term “sampling distribution” is tricky. The distinction is that in a sampling distribution, the quantities in the distribution are statistics instead of data values.]
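We can never see the real DOAPS, but we can simulate one to get a feel for it. The Python sketch below is only an illustration under stated assumptions: the population of 1000 commute times is made up (hypothetical), samples are drawn with replacement, and the function name simulate_doaps is our own. For each sample size n it collects 2000 sample means, so every quantity in the simulated distribution is a statistic, not a data value.

```python
import random
import statistics

def simulate_doaps(population, n, reps=2000, seed=1):
    """Approximate the DOAPS (sampling distribution) of the sample mean:
    draw `reps` random samples of size n, with replacement, and return their means."""
    rng = random.Random(seed)
    return [statistics.fmean(rng.choices(population, k=n)) for _ in range(reps)]

# Hypothetical population: 1000 made-up commute times from 10 to 30 minutes.
population = [10 + (i % 21) for i in range(1000)]
mu = statistics.fmean(population)   # the parameter we pretend not to know

for n in (5, 25, 100):
    means = simulate_doaps(population, n)
    print(f"n = {n:3d}: mean of the means = {statistics.fmean(means):.2f}, "
          f"SD of the means = {statistics.stdev(means):.2f}, mu = {mu:.2f}")
```

The standard deviation of the simulated means should shrink as n grows. That is the low-variability case in question 11, where a statistic can pin down the parameter within a narrow interval.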

 

F 9/10/010

HW due:

1. Watch the Simpson’s Paradox video and take some notes. Reading notes will be graded. They may be brief and cryptic, but you must have some notes of your own. For full credit, follow the formatting requirements.

2. Read Mr. Hansen’s absence and tardiness policies. A quiz is possible either today or next week.

 

M 9/13/010

Quiz (10 pts.) on last Friday’s video. A second quiz on the absence and tardiness policies is likely.

HW due:

1. Read the following paragraphs:

We use the lower case letter s to denote standard deviation as a statistic, and we use σ (lower case sigma) to denote standard deviation as a parameter. This convention of using Roman letters for statistics and Greek letters for parameters is common in our subject, though there are a few exceptions.

The formula for s, which you will never need to memorize, is $s = \sqrt{\frac{\sum (x_i - \bar{x})^2}{n - 1}}$, where n = number of data values and x̄ is the mean of the data. Your calculator computes s for you.

We will ignore the value that the calculator gives for σx, with one exception that we will encounter in a few months. For all practical purposes, however, you can simply ignore the σx value shown by the calculator.

The calculator does not have the ability to show a lower case s with a subscript of x. Therefore, your calculator shows standard deviation as Sx. However, you will never write it that way. Always write it as s.

2. Write out the answers to the following questions concerning Bill Bulldog’s morning commuting times over the course of 2 weeks, which were as follows (in minutes):

14, 17, 14, 15.5, 15, 15, 19, 14, 15, 18

(a) Sketch a histogram, a stemplot, and a boxplot of these data. You supposedly learned all of these in previous classes. If you have forgotten what the terms mean, look them up online.

(b) Compute the mean, mode, and median of the data. Use the notation x̄ to label your mean, the notation Q2 to label your median, and the word “mode” to label your mode. Again, you were taught the terms mean, mode, and median in previous classes.

(c) Use the technique demonstrated in class on Friday to compute s. No work is required, but give your answer in the following format:

s = _____ minutes
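For anyone who wants to double-check the calculator, here is a minimal Python sketch (Python 3.8 or later assumed). The helper sample_sd is a hypothetical name of our own that spells out the formula for s. In the standard library, statistics.stdev divides by n − 1 (this is s), while statistics.pstdev divides by n (this matches what the calculator labels σx, the value we ignore).

```python
import math
import statistics

# Bill Bulldog's commuting times (minutes) over 2 weeks
times = [14, 17, 14, 15.5, 15, 15, 19, 14, 15, 18]

def sample_sd(data):
    """s = square root of (sum of squared deviations from x-bar) / (n - 1)."""
    n = len(data)
    xbar = sum(data) / n
    return math.sqrt(sum((x - xbar) ** 2 for x in data) / (n - 1))

xbar = statistics.mean(times)
q2 = statistics.median(times)
modes = statistics.multimode(times)   # lists every value tied for most frequent
s = sample_sd(times)
sigma_x = statistics.pstdev(times)    # divides by n, like the calculator's σx

print("x-bar =", xbar, "; Q2 =", q2, "; mode(s) =", modes)
print(f"s = {s:.2f} minutes (statistics.stdev gives {statistics.stdev(times):.2f})")
print(f"σx = {sigma_x:.2f}, the value we ignore")
```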

 

T 9/14/010

HW due: Prepare for the quiz that was supposed to occur yesterday. Then answer the question below.

1. Throckmorton’s 11 closest friends have the following heights:

5'2", 5'7", 5'9", 5'10", 5'10", 5'10", 6'0", 6'0", 6'2", 6'4", 7'2".

(a) Convert all values to inches, and make a histogram and a boxplot.

(b) Use the STAT CALC 1 L1 ENTER technique (1-Var Stats) that we learned last Friday to compute the min., the first quartile, the median, the third quartile, and the max. (Scroll your display with the down-arrow to see all five values.) These five values are called the five-number summary of the data. Write the five-number summary.

(c) The IQR (interquartile range) equals Q3 − Q1. Compute the IQR for this data set.

(d) In class on Monday, we defined an outlier to be any value that is more than (1.5 · IQR) above Q3 or more than (1.5 · IQR) below Q1. Write one or two sentences to explain clearly why Throckmorton’s shortest friend is not an outlier, but his tallest friend is. Be clear and specific. (Don’t write something bland and vague like, “The 62-inch friend does not meet the definition of outlier, but the 86-inch friend does.” Instead, explain exactly why the definition is or is not satisfied, using specific numbers.)

(e) Based on your judgment of the histogram and/or boxplot, do these data exhibit skewness? (Either “yes” or “no” will be accepted as an answer. You must explain your reasoning.)

(f) One rough measure of skewness is the difference between the median and the mean. How close are the median and the mean in this example? Show your work, and be sure to use correct notation (symbols, that is) when referring to the median and mean.

(g) Occasionally we may find it useful to compute a trimmed mean, which for our purposes is the mean after removing any outliers. Compute the trimmed mean for this data set, and write a sentence or two to assess whether or not the trimmed mean gives a truer sense of the central tendency of the data set than x̄ does.
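Parts (b) through (d) and (g) can be checked with a short Python sketch. One assumption deserves a loud label: the helper five_number_summary (our own name) finds Q1 and Q3 as the medians of the lower and upper halves, leaving out the overall median when n is odd, which we believe is how the TI-83/84 computes quartiles. Other software, including Python’s statistics.quantiles, may use a different rule and report slightly different quartiles.

```python
import statistics

# Heights as (feet, inches), converted to inches
feet_inches = [(5, 2), (5, 7), (5, 9), (5, 10), (5, 10), (5, 10),
               (6, 0), (6, 0), (6, 2), (6, 4), (7, 2)]
heights = [12 * ft + inch for ft, inch in feet_inches]

def five_number_summary(data):
    """Return (min, Q1, median, Q3, max). Q1 and Q3 are medians of the lower and
    upper halves, with the overall median left out when n is odd (assumed TI-83/84 rule)."""
    xs = sorted(data)
    n = len(xs)
    lower, upper = xs[: n // 2], xs[(n + 1) // 2 :]
    return xs[0], statistics.median(lower), statistics.median(xs), statistics.median(upper), xs[-1]

mn, q1, q2, q3, mx = five_number_summary(heights)
iqr = q3 - q1
low_fence, high_fence = q1 - 1.5 * iqr, q3 + 1.5 * iqr   # outlier cutoffs from (d)
outliers = [h for h in heights if h < low_fence or h > high_fence]
kept = [h for h in heights if low_fence <= h <= high_fence]

print("five-number summary:", (mn, q1, q2, q3, mx))
print("IQR =", iqr, "; fences =", (low_fence, high_fence), "; outliers =", outliers)
print("x-bar =", round(statistics.mean(heights), 2),
      "; trimmed mean (outliers removed) =", round(statistics.mean(kept), 2))
```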

 

W 9/15/010

HW due:

1. Since we are still waiting for our textbooks to arrive, let’s all listen to Monday night’s 60-minute-long radio program (scroll to the bottom of the page to see the audio link) or, if you are in a hurry, read the transcript. The program itself is better than the transcript, in my opinion, since it gives a good feeling for the central role that statistics will play in U.S. educational reform. Also, make sure you are ready for possible quizzes on both this radio program and last week’s video. Both quizzes will be “open notes.”

2. Today and every Wednesday until December, there is a recurring assignment to read the “Quick Study” segments in The Washington Post. These segments (there are usually three each week) give a brief glimpse of how statistics is used in medical research. If you do not receive a newspaper at home, or if you are unable to read the copy in the STA Ellison Library, here is a link to the most recent online version (sometimes the article is omitted from the print edition for space reasons). This article is dated 9/7/2010, and this time there is only one segment instead of the usual three. Again, your quiz on this material will be “open notes.”

 

Th 9/16/010

HW due:

1. Be ready for a possible quiz on the Post “Quick Study” article.

2. Below are the aggregate quiz results for students who have taken all four 10-point quizzes.

26, 30.5, 32, 32, 33, 33.5, 34, 34.5, 34.5, 35.5, 36, 37.5, 38, 39, 40

(a) Make a histogram, a boxplot, and a stemplot for these data.

(b) Compute the median, standard deviation, and five-number summary for the data. Use proper notation.

(c) One measure of skewness is the direction in which the mean deviates from the median. If the mean is pulled lower than the median (by a long left tail of the distribution), we say that the distribution is skew left. If the mean is pulled higher than the median (by a long right tail), we say that the distribution is skew right. Show that this distribution has little skewness by this measure.

(d) Another way of gauging skewness is to look at the boxplot and see which tail is longer. By this measure, in what direction does the skewness lie?
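Here is a similar Python sketch for the quiz totals, using the same halves-of-the-data rule for quartiles (again, an assumption about how the calculator does it). It prints s along with the two skewness gauges from (c) and (d): the mean minus the median, and the lengths of the two boxplot whiskers.

```python
import statistics

scores = [26, 30.5, 32, 32, 33, 33.5, 34, 34.5, 34.5, 35.5, 36, 37.5, 38, 39, 40]

xs = sorted(scores)
n = len(xs)
q1 = statistics.median(xs[: n // 2])         # lower half (overall median left out when n is odd)
q2 = statistics.median(xs)
q3 = statistics.median(xs[(n + 1) // 2 :])   # upper half
xbar = statistics.mean(xs)
s = statistics.stdev(xs)                     # sample standard deviation s

print("s =", round(s, 2))
# Gauge from (c): which way, and how far, does the mean sit from the median?
print("x-bar - Q2 =", round(xbar - q2, 2))
# Gauge from (d): compare the two whiskers of a plain (unmodified) boxplot
left_whisker, right_whisker = q1 - xs[0], xs[-1] - q3
print("left whisker =", left_whisker, "; right whisker =", right_whisker)
```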

3. Below are Mr. Hansen’s morning commuting data for the first half of September. All times are in military format (HHMM) to the nearest half minute.



(a) Explain why it is not possible to work with the data in their current format, not even if the fractions are converted to decimal values (for example, coding the start time for 9/2 as 702.5). Hint: Look at all of the data.

(b) What would you recommend as a standard method of “coding” the data for use in statistics and graphs? Many correct answers are possible.

(c) If we are looking for patterns in the data, what should we do with anomalous data or missing data values? (See, for example, the data for 9/8.) Explain your reasoning.

(d) Look for patterns in the data. List two or three interesting patterns that you observe. (No, the fact that Mr. Hansen nearly always takes the same route to work is not interesting.)

(e) Carefully describe several statistics that would help us find interesting patterns in the data. Try to go beyond the obvious ones (mean, median, standard deviation, IQR), though you are certainly free to use some of those if you wish. Don’t simply say “mean,” though, since that is vague. Mean of what?

 

F 9/17/010

Catch-up day (no additional HW due; use the time to get caught up). A quiz on recent class discussions is likely.

 

M 9/20/010

HW due: Consider the updated commuting data below, and answer the questions that follow.



1. Code the start times as “minutes after 0600.” For example, the first entry would be 73, and the last entry would be 49. Record your column of data on your paper and in your calculator’s L1 list.

2. Code the arrival times in the same manner. The first entry would be 116, and the last entry would be 84. Record your column of data on your paper and in your calculator’s L2 list.

3. Compute the elapsed times by punching L2 − L1 STO L3 ENTER. Record the values from column L3 on your paper so that you have a convenient record of them.

4. Plot a scatterplot (i.e., xy-plot) having L1 on the x-axis and L3 on the y-axis. You learned how to do this in precal. [If you have forgotten the steps, they begin with 2nd STATPLOT ENTER On. Then highlight the first pictograph, the one that looks like a tiny scatterplot, press ENTER, set Xlist to L1, set Ylist to L3, set Mark to the square, and press ZOOM 9.]

5. Record a rough sketch of your scatterplot on your homework paper. Plot all 12 ordered pairs. (You can plot them as dots to save time.) As for the axes, your required minimum labeling is at least 2 x-values, at least 2 y-values, description of each variable (in this case, “departure time” for the x-axis and “elapsed time” for the y-axis), and the units (in parentheses). Thus the x-axis should be labeled with at least two values (the min. and max. will suffice) and the following label:

Departure Time (minutes after 0600)

Similarly, the y-axis should be labeled with at least two values (again, the min. and max. will suffice) and the following label:

Elapsed Time (minutes)

6. Make sure that your calculator’s diagnostics are turned on. You will want to leave them turned on throughout the entire school year. (If you buy a new calculator, or if you change the backup battery on your calculator, you need to remember to re-enable diagnostics.) The keystrokes for the DiagnosticOn command are as follows:

2nd
CATALOG (the 2nd function of the 0 key)
D (the x⁻¹ key; note that the alpha lock is automatically enabled for you)
(now press the down-arrow key about 8 times to highlight DiagnosticOn)
ENTER
ENTER

7. Perform a linear regression of L3 upon L1, and write down the linear correlation coefficient, r. [Again, you learned this skill in precal, but if you have forgotten the steps, here they are: STAT CALC 8 L1, L3, Y1 ENTER. The commas are required, and the way you get Y1 is by using VARS Y-VARS Function Y1.]

8. Press ZOOM 9 to see the linear regression line overlaid on your scatterplot. Add this regression line to the scatterplot sketch you made in exercise 5 above.

9. Mr. Hansen believes that the later he leaves for STA in the morning, the longer the trip takes. Do Mr. Hansen’s data prove this claim? Do they corroborate this claim? Do they refute this claim? Explain your answer briefly.

In class: Watch and discuss Dr. Hans Rosling’s TED video.

 

T 9/21/010

HW due:

1. Pick up your textbook at the bookstore between 3:00 and 3:30 on Monday. If you are unable to do this before the bookstore closes at 3:30, then come in early Tuesday morning to do your reading in the Math Lab. (I will have one or two books you can borrow.)

2. Read pp. 1-19 for review purposes. Reading notes are required, as always, but you can zip efficiently through this reading. For example, most of pp. 2-3 can be scanned quickly, since the examples, while somewhat interesting, do not contain anything you would be expected to remember. The bottom of p. 3 is much more important and should be read carefully. Pay special attention to definitions, tan and green boxes, and section subheadings. There are a few new terms sprinkled here and there (for example, in the middle of p. 14) that you are responsible for knowing. An open-notes quiz is possible.

3. Several students did not finish their HW that was due yesterday. If you are one of those people (e.g., if you failed to compute r or failed to plot the regression line overlaid on your scatterplot), you are expected to make those additions. The assignment may be picked up and graded a second time.

 

W 9/22/010

HW due: Prepare for a quiz on last week’s Quick Study, as well as this week’s Quick Study. There is only one capsule article for each week. Also read pp. 27-46 in your textbook (through the bottom of the shaded box).

 

Th 9/23/010

HW due: Read pp. 46 (middle)-49, 51-54, 61-63; write p. 67 #2.58, 2.60.

 

F 9/24/010

HW due: Read pp. 75-83, 87-92, 97-104 (top); write p. 116 #3.29.

 

M 9/27/010

HW due: Read pp. 104-113, 117-124 (top), 127-130; write p. 138 #3.56. If you have some extra time, you may wish to look at an actual Mr. Hansen test from 2008. This year, if I were to recycle portions of the test (as is likely), I would add some questions on Simpson’s Paradox, animated scatterplots, and educational reform. Since we have not yet studied z scores and normal quantile plots, I would also delete the following questions from the sample test:

8(b)
10 [the answer is z]
11(b)
12(b)
13

 

T 9/28/010

Test (100 pts.) on all material so far, except for the “Quick Study” articles. Note that anything discussed in class, including the technical meaning of error, the meaning of acronyms in the news such as IRB and VAA, and the whimsical but quite useful regression rule , is fair game for the test. You can earn a good score by paying attention in class, but to get an A+, you will also need to review and reflect upon your class notes.

 

W 9/29/010

No additional HW due, but older assignments may be spot-checked. The “Quick Study” quiz that would normally occur today has been postponed to tomorrow so that you have a day off after your test.

 

Th 9/30/010

HW due: Read pp. 147-156; change all occurrences of p to p̂ in the tan box on p. 155; write p. 157 #4.6, p. 159 #4.10, and the required reading notes below.

[Copy this paragraph and the table below it into your reading notes. The paragraphs after the table are optional.]

Notation: Our textbook uses p for sample proportion and π for population proportion (i.e., probability). This is consistent with the general rule that Roman letters are for statistics and Greek letters are for parameters. Many other textbooks follow this convention. However, the AP exam uses p̂, not p, to denote the sample proportion, and p, not π, to denote probability. To make matters even more confusing, the same letter is used for a completely different purpose in the second half of the course. Here is a handy translation table:

Source | sample proportion | population proportion (probability) | P-value of a statistical test (second semester only)

many college-level texts | lower case p | π | lower case p

our textbook | lower case p | π | capital P

TI-83/84 calculators | p̂ | lower case p | lower case p

what Mr. Hansen wants | p̂ | lower case p | capital P

AP exam | p̂ | lower case p | capital P


[Optional additional notes.] The only notation that everyone seems to agree on is the way to indicate the probability that a named event (such as event A) occurs. That is universally shown as P(A), using a capital letter P and parentheses surrounding the letter of the event.

For example, if we wish to compute the probability of getting a king and an ace, in that order, when 2 cards are drawn without replacement from a well-shuffled deck, the answer would be
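$$P(K_1 \text{ and } A_2) = P(K_1) \cdot P(A_2 \mid K_1) = \frac{4}{52} \cdot \frac{4}{51} = \frac{16}{2652} = \frac{4}{663} \approx 0.0060$$

where $K_1$ is the event “king on the first draw” and $A_2$ is the event “ace on the second draw.”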
The vertical bar symbol, | , is read as “given.” Note that $\frac{4}{52} \cdot \frac{4}{52}$ would give the wrong answer, since we were told that the cards are drawn without replacement.
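To see the difference between drawing with and without replacement, here is a small Python sketch. Fraction gives exact answers, and a simulation of our own (the function name and seed are arbitrary choices) deals 2 cards from a shuffled 52-card deck many times. The deck is simplified to kings, aces, and “other,” since only kings and aces matter here.

```python
import random
from fractions import Fraction

# Exact answer: P(K1) * P(A2 | K1), cards drawn without replacement
p_exact = Fraction(4, 52) * Fraction(4, 51)
# Wrong answer: treats the second draw as if the first card had been put back
p_wrong = Fraction(4, 52) * Fraction(4, 52)

def simulate_king_then_ace(trials=100_000, seed=2010):
    """Estimate P(king on draw 1 and ace on draw 2) by repeated dealing."""
    rng = random.Random(seed)
    deck = ["K"] * 4 + ["A"] * 4 + ["other"] * 44     # 52 cards
    hits = 0
    for _ in range(trials):
        first, second = rng.sample(deck, 2)          # 2 cards, no replacement, in order
        if first == "K" and second == "A":
            hits += 1
    return hits / trials

print(p_exact, float(p_exact))   # 4/663 as a fraction, then as a decimal
print(p_wrong)                   # 1/169
print(simulate_king_then_ace())  # should land close to 4/663
```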

In class: Surprise visit from 14-year-old Davidson Fellow, Meredith Lehmann of La Jolla, California.

 

 



Last updated: 01 Oct 2010