F 10/1/010
|
No class (Form VI retreat).
|
|
M 10/4/010
|
No additional HW is due.
However, please make sure that your previously assigned problems are well
organized and as complete as possible.
|
|
T 10/5/010
|
HW due: Read pp. 159-192
including all examples but skipping over the pages of exercises. This is a
net of about 25 pages, but you can spread it over two nights. Then write the
following questions, showing your work where appropriate. (Yes, I realize
that some of the computations are easy enough to do in your head. That
doesn’t matter. You still need to “teach” the material to the reader as
illustrated in class yesterday.)
pp. 167-168 #21a*bc and #4.23 [see note below]
p. 175 #4.31
pp. 183-184 #4.36, #4.40, #4.41.
* For #21a on p. 168, go ahead and actually compute the mean when combining
Los Osos and Morro Bay. This is called a weighted
mean or a weighted average.
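If the idea of a weighted mean is new, here is a minimal Python sketch of the computation. The group sizes and group means below are made-up placeholders, not the Los Osos and Morro Bay values from the textbook, and the function name weighted_mean is simply a label chosen for this illustration.

```python
# Weighted mean of two group means, weighting each mean by its group size.
# NOTE: every number below is a hypothetical placeholder, not textbook data.

def weighted_mean(means, weights):
    """Return sum(w * m) / sum(w) for paired means and weights."""
    if len(means) != len(weights):
        raise ValueError("means and weights must have the same length")
    total_weight = sum(weights)
    if total_weight == 0:
        raise ValueError("weights must not sum to zero")
    return sum(m * w for m, w in zip(means, weights)) / total_weight

# Hypothetical example: group A has 20 observations with mean 10.0,
# group B has 30 observations with mean 15.0.
combined = weighted_mean([10.0, 15.0], [20, 30])
print(combined)  # 13.0
```

Note that the combined mean (13.0) sits closer to the larger group's mean than the simple average of the two means (12.5) would.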
In class: Substitute teacher will show you this 27-minute
video on normal distributions. You
are required to watch the video twice.
If the substitute teacher needs help getting popup windows to display, I
trust that you can lend a hand. The video is #4 in the series of 26
statistics videos entitled “Against All Odds.”
On the second pass, please ask the substitute teacher to skip over the
beginning credits (the first 1:45 of the video), the segment featuring the
Boston Beanstalks (from about 15:21 to 17:31), and the long interview with
Stephen Jay Gould (20:29 to 24:45). Gould died in 2002, for what it’s worth.
When you make those cuts, the running time will be 27 minutes for the first
pass and 19 minutes for the second pass. Therefore, you should be able to accomplish
both viewings during the 50-minute class period.
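The running-time arithmetic above can be checked with a short Python sketch. It assumes the full video runs exactly 27:00 (the text says only "27-minute") and that the opening credits run from 0:00 to 1:45.

```python
# Check the running times quoted above, working in seconds.

def to_seconds(mmss):
    """Convert an 'M:SS' timestamp to a number of seconds."""
    minutes, seconds = mmss.split(":")
    return int(minutes) * 60 + int(seconds)

FULL_LENGTH = 27 * 60  # assumption: the video is exactly 27 minutes long

# Segments to skip on the second pass, as (start, end) timestamps.
cuts = [("0:00", "1:45"), ("15:21", "17:31"), ("20:29", "24:45")]
skipped = sum(to_seconds(end) - to_seconds(start) for start, end in cuts)

second_pass = FULL_LENGTH - skipped
total = FULL_LENGTH + second_pass

print(skipped)                   # 491 seconds skipped (8:11)
print(round(second_pass / 60))   # 19 minutes for the second pass
print(round(total / 60))         # 46 minutes for both passes
```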
If for some reason you are unable to watch the video a second time at school,
I expect you to watch it a second time at home. This is a great way to learn!
You may be a little bored, but you will also notice that you pick up details
on the second time that you missed the first time.
Note-taking during the video is encouraged, since if there is a quiz at a
later date on the video, you will be permitted to use your notes.
|
|
W 10/6/010
|
No additional written HW is
due, but be sure that all of your previously assigned problems are complete
and well organized. Make sure you have read through p. 192, with reading
notes.
|
|
Th 10/7/010
|
Quiz (10 pts.) on last
Tuesday’s Quick Study and this
Tuesday’s Quick Study.
HW due: Read pp. 199-207; write pp. 207-208 #5.1, 5.2, 5.4.
|
|
F 10/8/010
|
No school (faculty
professional day).
|
|
M 10/11/010
|
No school (holiday).
|
|
T 10/12/010
|
HW due: There are two
required parts. People who turned in a randomization methodology last
Thursday are exempt from having to redo #1 (even if it was not very good),
but everyone needs to do #2.
1. Write up a randomization
methodology for choosing groups and group leaders. There are to be 6 groups,
with 3 students in each group except for a single group of size 4.
Student names are as follows: Alex, Andrei, Andrew, Brennan, Chick, Daniel,
Dominique, Edward, Jamie, Jordan, Julien, Justin, Nick R., Nick S., Ousmane,
Phineas, Preston, Tip, Zeke.
Important: Your methodology must be
neat and legible. Somewhere between a half page and a full page will be
required, depending on how elaborate your steps are. For example, writing
“Draw names from a hat; the first 6 names chosen will determine the group
leaders” is not nearly enough to write, because two people executing the same
instructions from the same sequence of drawn names could easily come up with
many different reasonable ways of interpreting the instructions. One person
may fill group 1 before going on to groups 2 through 6, whereas another
person may fill all the groups simultaneously (in the manner of dealing cards
from a deck). One person may stir up the names in the hat before drawing a
new one, and another may leave the names in the hat in essentially their
original order (thus biasing the drawing). One person may write only first
names on the slips of paper, and another may write full legal names. One person may
discard each entry as it is drawn, while another may return names to the hat
each time and ignore duplications if they occur. One person may decide that
the final name assigned is determined by default, while another may insist on
drawing the final name as a cross-check to make sure that no errors have
occurred. One person may fill the even-numbered groups first, while another
may fill the groups in reverse order from 6 to 1, and a third person may wait
until the end to number the groups. One person may perform the drawing in
secret, while another insists on having the entire process observed by the
class so that everyone can agree that the process was fair. One person may
use numbers instead of names, taking care to underline the 6 so that it is
distinguishable from the 9, and another may insist on using full legal names.
One person may specify that the slips of paper are of uniform size and shape,
while another specifies that the pieces of paper are randomly sized but are
numbered or marked in random order in order to preserve randomness.
Do you see how many choices remain? You can probably come up with even more
questions that are not answered by the initial terse instruction.
If your instructions are in any way unclear, you will need to rewrite them to
remove all ambiguity. Also, please try to come up with something more
creative than drawing slips of paper from a hat.
If you submitted a methodology writeup last Thursday, you are exempt from
this task (even if your writeup was not very good), but you are welcome to
try again anyway. The skill to learn here is the skill of organizing your
rules clearly, so that any reasonable person could implement the rules or,
equivalently, read the rules and then act as a judge to determine whether
somebody else had implemented them properly. If the answer to the question,
“Is this permitted by the rules?” is, “I don’t know; let’s ask Mr. Hansen for
a ruling,” then your methodology is not yet clear enough.
Does this skill have real-world value? Definitely . . . if you plan on going
into engineering, law, computer science, medicine, business management,
accounting, or almost any other field where following certain rules and
procedures is important. As you become more senior in your field, your job
will consist less of following
rules and more of writing rules and
guidelines for other people in the profession to follow.
“But wait, Mr. Hansen! I want to be an artist, a small-business entrepreneur,
or a professional athlete. This is a waste of my time!” Admittedly, the skill
of writing detailed rules is generally not important in those professions.
However, I still want you to learn the skill, if for no other reason than to
help you recognize when other people have done it poorly.
2. Read pp. 210-217 (taking care to
ignore the tan box on p. 213, which we will never use); write pp. 217-218 #5.17.
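For part 1 above, here is one illustration of what "no ambiguity" can look like when the rules are written as a program. This is only a sketch, not the methodology Mr. Hansen is asking for: it relies on Python's pseudo-random shuffle, the seed value is an arbitrary assumption (in practice it would have to be agreed on publicly beforehand), and the rules "group 6 is the group of 4" and "first name placed is the leader" are choices made for this example.

```python
import random

# Roster copied from the assignment (19 students), in the order given there.
STUDENTS = [
    "Alex", "Andrei", "Andrew", "Brennan", "Chick", "Daniel", "Dominique",
    "Edward", "Jamie", "Jordan", "Julien", "Justin", "Nick R.", "Nick S.",
    "Ousmane", "Phineas", "Preston", "Tip", "Zeke",
]

def assign_groups(students, seed):
    """Shuffle once with a fixed seed, then cut the shuffled list into blocks.

    Rule 1: groups 1-5 receive 3 names each, in shuffled order;
            group 6 receives the last 4 names.
    Rule 2: the first name placed in each group is that group's leader.
    """
    rng = random.Random(seed)   # private generator, so the result is repeatable
    order = list(students)      # copy, leaving the roster itself untouched
    rng.shuffle(order)
    sizes = [3, 3, 3, 3, 3, 4]
    groups, start = [], 0
    for size in sizes:
        groups.append(order[start:start + size])
        start += size
    return groups

groups = assign_groups(STUDENTS, seed=2010)  # seed value is an arbitrary assumption
for number, members in enumerate(groups, start=1):
    print(f"Group {number}: leader {members[0]}; members {', '.join(members)}")
```

Two people running this with the same roster and the same seed get identical groups, which is exactly the property the written methodology needs to have.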
|
|
W 10/13/010
|
Quiz (10 pts.) on yesterday’s Quick
Study. By now you should know how to score high on these quizzes. Forget
about memorizing trivia. Instead, read the article with your new “statistics
student” eyes. What are the important aspects of the article from a
statistics standpoint? What is missing, if anything? What is the conclusion,
and is it statistically significant?
HW due: Each group must submit an exploratory data analysis project proposal
(approximately one page) with milestones. If you cannot think of something
exciting to do, you can always analyze my commuting data, which now has
nearly six weeks’ worth of morning drama distilled down to a single table.
Ground rules: No exemption if your
group leader is absent today (or yesterday, for that matter). Life goes on.
Choose a deputy and soldier forth. If you don’t know or don’t remember what
group you are in, use some of that networking skill that your generation is
so savvy with.
More ground rules: Plan on
collecting at least 30 data points for at least 2 variables of interest (more
would be better). Describe how you would conduct univariate (histogram,
boxplot, etc.) and bivariate (scatterplot, regression, etc.) data analyses in
your search for interesting patterns. If you have any research questions that
would guide or motivate your data explorations, you may mention them as well,
but that is not required at this point. For the most part, we will postpone
doing inferential statistics until the second semester.
Opinion surveys are permitted, but if you gather opinions, be sure to collect
at least two quantitative variables as well so that you will be able to run
some linear regressions.
When listing your milestones, give the estimated date by which each of the
following will be complete: final methodology document, data gathering,
spreadsheet compilation, consultation with Mr. Hansen, first draft and
feedback from Mr. Hansen (optional), final draft, and project submission.
Your final report should be 3-5 pages, plus appendices for your raw data (one
row per record), survey instruments (if any), and any artifacts of the
data-gathering process such as consent forms, data recording forms, and
instruction forms. There will be a small point deduction for each spelling
and typographical error, as well as a deduction for use of color graphics.
(Any good report should still be able to tell its story with black-and-white
graphics, even if they do not look quite as appealing.)
The job of the group leader is to keep everyone working productively and to
be the “heavy” when people miss meetings. The group leader must attach a
short report to the final report stating what each person’s contribution was (in
specific detail) and recommending a
division of points.
|
|
Th 10/14/010
|
HW due: Read first tan box
on p. 228 and the review material below; write p. 234 #5.37, 5.38.
You were taught how to make a residual plot in precalculus. In case you have
forgotten, the instructions are reproduced below.
Residual Plot Quick Review
Every time you perform a regression on your calculator (STAT CALC followed by
any choice except for 1 or 2), the special list called RESID is recomputed
for you. Beware that if you need the values in the RESID list, you should
save them to another list beforehand, since they will be “clobbered” the next
time you perform a regression.
Since residuals are a type of “error” in estimation, the entries in the RESID
list are computed according to the error formula we discussed earlier in the
year: actual minus predicted. In other words, $\text{residual} = y - \hat{y}$. For linear
regression, this is the same as the formula your book gives at the top of p.
222. However, note that a residual can be defined for any type of regression, not merely linear regression. The formula
is always the same for any type of regression: $\text{residual} = y - \hat{y}$.
The general process of finding a curve (i.e., mathematical model) to fit a
set of data points in a scatterplot is called curve fitting. The most common type of curve fitting is linear
regression (STAT CALC 8), and in fact if you get an MBA you will seldom do
any other type of curve fitting. The “curve” in this case would simply be a
straight line, which is called the least-squares regression line or LSRL for
short. The term “least squares” refers to the fact that the LSRL is the
unique line that minimizes the sum of the squared residuals.
If you understand the definition of residual as “actual minus predicted” or
“observed minus expected,” you should understand why we say that a positive residual indicates a data
value that is above the curve, while a negative
residual indicates a data value that is below the curve. Being above the
curve, after all, means that y is
greater than $\hat{y}$, which means that the residual, namely $y - \hat{y}$, is positive. By contrast, being below the curve means
that y is less than $\hat{y}$, which would make $y - \hat{y}$ negative.
In any LSRL, the sum of the residuals must always equal 0. This fact can be
proved mathematically, though the proof is beyond the scope of our course.
Note that there are other types of linear regression for which the residuals
do not necessarily add up to 0, and in nonlinear regression
(quadratic, exponential, sinusoidal, etc.), the residuals almost never add up
to 0.
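The two LSRL facts above (the residuals sum to 0, and the LSRL has the smallest sum of squared residuals) can be checked numerically. The sketch below uses the standard slope and intercept formulas rather than the calculator, and the five data points are invented for illustration.

```python
# Least-squares regression line (LSRL) from the standard formulas:
#   slope b = sum((x - xbar)(y - ybar)) / sum((x - xbar)^2)
#   intercept a = ybar - b * xbar
# The data are hypothetical, chosen only for illustration.

def lsrl(xs, ys):
    """Return (intercept a, slope b) of the least-squares line y-hat = a + b*x."""
    n = len(xs)
    xbar = sum(xs) / n
    ybar = sum(ys) / n
    sxy = sum((x - xbar) * (y - ybar) for x, y in zip(xs, ys))
    sxx = sum((x - xbar) ** 2 for x in xs)
    b = sxy / sxx
    a = ybar - b * xbar
    return a, b

def residuals(xs, ys, a, b):
    """Residual = actual minus predicted = y - (a + b*x)."""
    return [y - (a + b * x) for x, y in zip(xs, ys)]

xs = [1, 2, 3, 4, 5]
ys = [2.1, 3.9, 6.2, 7.8, 10.1]   # hypothetical data

a, b = lsrl(xs, ys)
res = residuals(xs, ys, a, b)
sse = sum(r ** 2 for r in res)

print(abs(sum(res)) < 1e-9)   # True: the residuals sum to 0 (up to rounding)

# Nudging the intercept or slope away from the LSRL values
# always makes the sum of squared residuals larger.
for da, db in [(0.1, 0), (-0.1, 0), (0, 0.05), (0, -0.05)]:
    other = residuals(xs, ys, a + da, b + db)
    print(sum(r ** 2 for r in other) > sse)   # True each time
```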
You could, if you had to, use the residual formula to compute the residuals
one by one and store them into a list for plotting on a scatterplot. You
could, but we never do, since that would be a waste of time. Instead, we
typically create the residuals at the same time we perform a linear
regression. The keystrokes are
STAT CALC 8 L1,L2,Y1 ENTER
2nd STAT PLOT ENTER
Choose “On” and choose the first icon (indicating a scatterplot chart type).
Set Xlist to L1 and Ylist to RESID.
Finally, press ZOOM 9 and copy the dots (reasonably accurately) onto your
paper.
Important: After plotting your points, press the WINDOW key so that you can
tell what the bounds of your graph are in the x and y directions.
Label each axis (both x and y) with the following pieces of
information:
1. At least 2 numeric values (min. and max. will suffice)
2. Name of variable
3. Units in parentheses.
For example, in a graph of height as a function of shoe size, you would label
your x-axis with values from, say,
6 to 14, plus the following words:
Shoe Size (adult male units, U.S.)
You would then label the y-axis
with values from, say, 55 to 84, plus the following words:
Height (inches)
As you remember from precalculus, a residual plot that shows a random pattern
with no “flaring” left to right is desirable. Such a random pattern would
suggest that the regression fit you found is an appropriate fit to the data,
especially if the $r^2$
value is close to 1 and the absolute size of the residuals is small.
A residual plot that shows a bowl-shaped, dome-shaped, or wavy pattern is
considered bad. These are all indications that your regression model is inappropriate.
The most common boo-boo is to use a linear regression when a power or
exponential fit would be more appropriate. You may have a high $r^2$ value if you do this,
but it doesn’t matter! You will still be marked down by the AP graders, since
the residual plot would have a strong bowl-shaped look to it.
[This is the origin, by the way, of the famous catch phrase invented by
Eduard Ferrer: “That’s BOWL-SHAPED, Mr. Hansen!”]
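To see the "high $r^2$ but bowl-shaped residuals" situation in numbers, here is a small Python sketch. It fits a straight line to data that are actually exponential; the data (y = 2 · 1.2^x for x = 0 through 10) are invented for illustration, and the least-squares formulas stand in for the calculator's STAT CALC 8.

```python
# Fit a straight line to exponential data, then inspect r^2 and residual signs.
# Data are hypothetical: y = 2 * 1.2**x for x = 0, 1, ..., 10.

xs = list(range(11))
ys = [2 * 1.2 ** x for x in xs]

n = len(xs)
xbar = sum(xs) / n
ybar = sum(ys) / n
sxx = sum((x - xbar) ** 2 for x in xs)
syy = sum((y - ybar) ** 2 for y in ys)
sxy = sum((x - xbar) * (y - ybar) for x, y in zip(xs, ys))

b = sxy / sxx                   # LSRL slope
a = ybar - b * xbar             # LSRL intercept
r2 = sxy ** 2 / (sxx * syy)     # coefficient of determination

# Residual = actual minus predicted.
res = [y - (a + b * x) for x, y in zip(xs, ys)]
signs = "".join("+" if r > 0 else "-" for r in res)

print(round(r2, 2))   # 0.94, which looks impressive on its own
print(signs)          # positive at both ends, negative in the middle: a bowl
```

The linear fit earns a high $r^2$, yet the residuals are positive on the left, negative through the middle, and positive again on the right. That systematic pattern is what signals that a line is the wrong model here.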
|
|
F 10/15/010
|
Test (100 pts.) on all material covered so far this year
(through p. 217, plus the first tan box on p. 228), except for “Quick Study”
readings and the radio broadcast.
|
|
M 10/18/010
|
HW due at start of class:
Re-do your entire test from last Friday,
including all parts of #12 and #13, and record how long it takes you. You may
confer with other students if you wish, but the learning benefit is greater
if you push yourself to do the best possible job on your own before you break
down and call somebody or look in the textbook. The test may be re-graded for
accuracy as a double (or triple) homework assignment. Grammar, neatness, and
legibility count.
Also, group leaders must submit
final versions of group project methodology statements today. (These can be
turned in after school if necessary, but no later than 3:15 p.m.) You are
welcome to get started earlier on data gathering if you have verbal approval
from Mr. Hansen.
Groups:
Justin, Nick S., Andrei, Edward (sport vs. satisfaction)
Jordan, Dominique, Julien (personality impressions related to height)
Tip, Ousmane, Preston (original topic nixed; replacement TBD)
Brennan, Jamie, Zeke (political leanings by ZIP code)
Chick, Alex, Nick R. (workload and math grade by school)
Andrew, Phineas, Daniel (bag-lugging weight vs. STA form)
|
|
T 10/19/010
|
HW due: Read pp. 238-252
and work on group project.
|
|
W 10/20/010
|
HW due: Rework your test
from last week, especially all three
parts of the question regarding randomization, replication, and control.
I saw many wrong answers when I spot-checked.
|
|
Th 10/21/010
|
HW due: Write p. 253 #5.54.
Also, prepare for another “Quick Study” quiz from the Monday,
October 18, article that appeared in The
Washington Post.
|
|
F 10/22/010
|
HW due: Write p. 253 #5.55
(companion problem to yesterday’s assignment) and work on group project. Both
#5.54 and #5.55 are likely to be collected and/or spot-checked.
|
|
M 10/25/010
|
HW due: Before starting the
take-home test, read pp. 221-227. There is no new material here, and all of
the terminology (residual, residual plot, influential observation, etc.) has
been covered in class, but you may find the textbook reading to be helpful as
you prepare to take the test. Take some reading notes, as always.
Take-Home
Test (20 pts.) is due at start of class. Click the link at left to
view the test, and print it out. Before starting, make sure you have done all
the textbook reading (including the assignment immediately above), and treat
the test like an in-class test. The only way you will get better at taking
tests is by practicing. Taking the test with your textbook or notes open is
not a good assessment of what you have learned, since you will not fully come
to “know what you don’t know” (metaknowledge).
Find a clear work area where you will not be disturbed for at least 50
minutes. Your calculator is permitted, as always, but you should take the
test without using any notes. Record your time in the upper right corner of
the first page, in the space provided.
After you have finished the test, you may extend and revise your answers if
you wish. Use a different color of pencil (or a pen) for this purpose. I will
permit you to work together if you wish, but as I have explained repeatedly
in class, outright copying or
paraphrasing is prohibited and may be treated as an honor violation.
|
|
T 10/26/010
|
HW due: Work on group project.
Leaders may be required to give another update today.
|
|
W 10/27/010
|
HW due: Work on group
projects, and prepare for the Quick
Study quiz.
|
|
Th 10/28/010
|
HW due: Make sure you are
still prepared for yesterday’s quiz, and work on your group projects.
In class: Work on the following “Excelcise” (Excel exercise). There is a time
limit of 5 minutes, and everyone must eventually pass. My best time, with
practice, is 2:10. It is unlikely that anyone will pass on the first day, but
who knows? Anyone who is proficient at text messaging should be able, at
least in theory, to learn these steps. Mouse usage is permitted, but you will
find that the less you use your mouse, the less time you will waste.
1. Begin a new workbook in Excel (File / New, or Ctrl+N if you
are in a real hurry).
2. Enter the following four headings in row 1, columns A through D:
SubjectID NumSiblings NumLivingGrandparents NumExGirlfriends
3. Make row 1 bold, not only for the four existing entries, but also for any
future column headings in row 1.
4. Freeze the worksheet panes so that row 1 and column 1 will always be
visible. (Place cursor in cell B2 and issue the Window / Freeze
Panes command, or Alt+W F. Note that if you accidentally put your cursor in
cell A2 instead of B2, freezing the panes will freeze row 1 but no columns.)
5. Place the numbers 1 through 4952 in cells A2 through A4953. Leave these
values as values, not as formulas.
6. Directly underneath the subject numbers, type the following in rows 4954
through 4965, leaving row 4961 blank:
mean
s.d.
min
Q1
median
Q3
max
Correlations (r values)
NumSiblings
NumLivingGrandparents
NumExGirlfriends
7. Apply bold formatting to rows 4954 through 4965.
8. Highlight columns A-D, and use Format / Column / Autofit
Selection (Alt+O C A) to widen them appropriately.
9. Fill cells B2 through D4953 with random integers between 0 and 4,
inclusive. The formula is
=INT(RAND()*5)
10. Replace cells B2 through D4953 with their values. (Keystrokes: Ctrl+C,
Alt+E S V Enter.)
11. Create suitable formulas in cells B4954 through D4960. Hint: Do column B first, then copy to
the right.
12. Create named ranges from the names in cells B1 through D1. (Keystrokes:
Ctrl+Home, UpArrow, highlight B1 through D1, Shift+Ctrl+DownArrow, press
UpArrow 7 times while still holding Shift, Alt+I N C Enter.)
13. Create suitable correlation formulas in cells B4963 through D4965. You
can make one formula in cell B4963 and then copy it:
=CORREL(INDIRECT($A4963),INDIRECT(B$1))
14. As a check on your work, observe that the five-number summaries for all
three data columns are 0,1,2,3,4. Also observe that the sample means are all
close to 2, and the sample standard deviations are all close to $\sqrt{2} \approx 1.41$. Finally, observe that the correlations are 1 for each
column when paired with itself, but the correlations are all close to 0 when
each column is paired with a different column. Correlations have the same
value regardless of the order in which the columns are specified.
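The checks in step 14 can be previewed outside Excel with a short Python simulation. This is a sketch under stated assumptions, not a substitute for the timed exercise: rng.randrange(5) plays the role of =INT(RAND()*5), the seed is arbitrary, the "inclusive" quartile method is used on the assumption that it matches Excel's QUARTILE function, and the helper correl is written by hand here to mirror CORREL.

```python
import math
import random
import statistics

N = 4952                      # number of subjects, as in step 5
rng = random.Random(4952)     # seed is an arbitrary assumption, for repeatability

names = ["NumSiblings", "NumLivingGrandparents", "NumExGirlfriends"]
# Each column: N random integers from 0 through 4, inclusive.
data = {name: [rng.randrange(5) for _ in range(N)] for name in names}

def five_number_summary(values):
    """Return [min, Q1, median, Q3, max] using inclusive quartiles."""
    q1, med, q3 = statistics.quantiles(values, n=4, method="inclusive")
    return [min(values), q1, med, q3, max(values)]

def correl(xs, ys):
    """Pearson correlation coefficient r, computed from its definition."""
    xbar, ybar = statistics.fmean(xs), statistics.fmean(ys)
    sxy = sum((x - xbar) * (y - ybar) for x, y in zip(xs, ys))
    sxx = sum((x - xbar) ** 2 for x in xs)
    syy = sum((y - ybar) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)

for name in names:
    col = data[name]
    print(name, statistics.fmean(col), statistics.stdev(col),
          five_number_summary(col))

# Correlation matrix: 1 on the diagonal, near 0 elsewhere, and symmetric.
for row in names:
    print(row, [round(correl(data[row], data[col]), 3) for col in names])
```

The target of about 1.41 for the standard deviation comes from the fact that equally likely values 0, 1, 2, 3, 4 have mean 2 and variance (4 + 1 + 0 + 1 + 4)/5 = 2.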
|
|
F 10/29/010
|
End of Q1. All Mathcross puzzles, group projects, and
any other graded items must be submitted by 3:00 p.m. today.
|
|