AP Statistics / Mr. Hansen |
Name: _________________________ |
The “Must-Pass” Quiz: Partial Answer Key
1.* |
A number computed from
data. [You should provide examples.] |
|
|
2.* |
A number that describes a
population. [You should provide examples.] |
|
|
3.* |
An “adjustable constant”
that defines the nature of a mathematical model, much as a tuning knob or
volume slider adjusts the output of a television or radio. |
|
|
4. |
Uniform: min and max [also
need to know whether distrib. is discrete or
continuous] |
|
|
5. |
Uniform: flat line in
relative frequency histogram |
|
|
6. |
Range is a single number for the spread of values
in a column of data: range = max – min. People who say things like “the range
is from 28 to 75” are misusing the term in its statistical sense. |
|
|
7. |
IQR (interquartile range) = Q3 – Q1. Use STAT
CALC 1 to get the 5-number summary, then subtract Q1 from Q3. |
|
|
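For those who like to cross-check the calculator, here is a minimal Python sketch of #6 and #7 (the data column is made up; Python's quartile convention can differ slightly from the TI-83's on some data sets):

```python
# Minimal sketch (made-up data): five-number summary, range, and IQR.
# NOTE: statistics.quantiles uses the "exclusive" method by default,
# which may not match the TI-83's quartiles for every data set.
import statistics

data = [12, 15, 15, 18, 21, 24, 30, 33, 35, 41, 75]
q1, _, q3 = statistics.quantiles(data, n=4)

five_number = {"min": min(data), "Q1": q1,
               "median": statistics.median(data),
               "Q3": q3, "max": max(data)}
iqr = q3 - q1                          # IQR = Q3 - Q1
data_range = max(data) - min(data)     # range is a SINGLE number (#6)

print(five_number)                     # Q1 = 15.0, median = 24, Q3 = 35.0
print("range =", data_range, " IQR =", iqr)   # range = 63  IQR = 20.0
```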
8. (a) |
Easiest way is to make a
modified boxplot, then TRACE to see the points (use arrow keys). Outliers are
more than 1.5 × IQR below Q1 or more than 1.5 × IQR above Q3. |
(b) |
No rule of thumb—just judge
visually. Outliers have “large” residuals. |
|
|
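The 1.5 × IQR fence rule from part (a) can be sketched in Python (quartile values and data are made up for illustration):

```python
# Sketch of the 1.5 * IQR outlier fences from part (a).
# Quartile values and data are made up for illustration.
def outlier_fences(q1, q3):
    """Outliers lie below the lower fence or above the upper fence."""
    iqr = q3 - q1
    return q1 - 1.5 * iqr, q3 + 1.5 * iqr

low, high = outlier_fences(q1=15, q3=35)            # IQR = 20
outliers = [x for x in [12, 18, 24, 33, 75] if x < low or x > high]
print(low, high, outliers)      # -15.0 65.0 [75]
```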
9. |
Explanatory, response. |
|
|
10. |
Mean squared error = pop. variance (mean squared deviation from the mean). Sample
variance is different, since denom. is n
– 1 instead of n. |
|
|
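The n versus n – 1 distinction in #10 can be verified with a short Python sketch (data are made up):

```python
# #10 in code: population variance divides by n, sample variance by n - 1.
# (Made-up data.)
import statistics

data = [2, 4, 4, 4, 5, 5, 7, 9]
mu = statistics.fmean(data)                                    # 5.0
pop_var = sum((x - mu) ** 2 for x in data) / len(data)         # denom. n
samp_var = sum((x - mu) ** 2 for x in data) / (len(data) - 1)  # denom. n - 1

print(pop_var, statistics.pvariance(data))    # 4.0 4.0
print(samp_var)                               # 32/7, about 4.571
```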
11. |
Pop. s.d. (σ) = square root of pop. variance. |
|
|
12. |
In a normal distribution (required), the distribution curve is
bell-shaped, satisfies the 68-95-99.7 rule, and has inflection points at μ – σ
and μ + σ. |
|
|
13. |
Lack of symmetry. Right
skewness means the central hump dribbles out to the right, forcing mean > median,
since mean is less resistant to extreme values. Left skewness is the
opposite, forcing mean < median. Easy ways to detect skewness involve
looking at histogram, boxplot, or stemplot to see where the tail is longer.
If you use NPP, trace dots from left to right; if they bend to left, plot
shows left skewness, but if they bend to right, plot shows right skewness. |
|
|
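A quick Python illustration of the mean > median effect described in #13 (data are made up, with a long right tail):

```python
# #13 in code: a right-skewed column of (made-up) data has mean > median,
# because the mean is less resistant to the extreme value in the tail.
import statistics

right_skewed = [1, 2, 2, 3, 3, 3, 4, 4, 5, 30]   # tail dribbles out right
print(statistics.fmean(right_skewed))    # 5.7 (pulled toward the tail)
print(statistics.median(right_skewed))   # 3.0 (resistant)
```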
14. |
Easiest way is to look for a
pattern that is not straight in NPP. If you are a glutton for punishment (as
on #4 from the Chap. 13-14 free response), you can use |
|
|
15.* |
Linear least-squares. [It
is not sufficient to say linear, because the LSRL is not the only type of
linear regression. For example, there is the median-median line, which is
useful in some situations and which is more resistant than the LSRL.] |
|
|
16. |
Slope, since it estimates
how many response units will increase (or decrease) for each additional
explanatory unit. Intercept is less crucial, even meaningless in some
contexts. |
|
|
17. |
Linear correlation
coefficient. Signed strength of linear pattern (–1 = pure negative linear
association, 0 = no linear association,
+1 = pure positive linear association). Use STAT CALC 8 and make sure your
Diagnostics are on (2nd CATALOG DiagnosticOn). |
|
|
18. |
Coefficient of
determination. Tells what portion of the variation in one variable can be
explained by variation in the other. If r
= .8, then 64% of the variation in y
(or x) can be explained by
variation in x (or y). It is also acceptable to say that
64% of the variation in the response variable (y) is explained by the LSRL model. [The other 36% is due to
randomness or other factors.] |
|
|
19. |
No; yes. |
|
|
20. |
No; yes. |
|
|
21. |
STAT CALC 8, or with
formulas 6 and 8 on first page of AP formula sheet. (Never use formula 5.) |
|
|
22. |
[See LSRL Top Ten.] |
|
|
23. |
Resid. = actual – predicted = y – ŷ. |
|
|
24. |
[See LSRL Top Ten.] |
|
|
25. |
Regression outlier and influential observation are not synonyms.
A point can be a regression outlier (large residual), but if it is near the
center of the x values, it is
usually not influential. Similarly, a point can be influential (large effect
on slope or r if removed) but have
only a small residual, meaning the point is not an outlier. It is also
possible for a point to be both influential and an outlier. |
|
|
26. |
b0 = value of response if explanatory variable (x value) is set to 0 |
|
|
27. |
Random variable (discrete
or continuous). [You should provide examples.] |
|
|
28. |
r.v., mean (expected value), E(X). |
|
|
29. |
r.v., variance, Var(X), σ². |
|
|
30. |
sum, sum, means; yes; mean
of difference equals difference of means |
|
|
31. |
sum, sum, variances; true only for independent r.v.’s;
variance of difference (assuming indep. r.v.’s) equals sum of variances |
|
|
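The rules in #30 and #31 can be checked exactly with a short Python sketch (the two independent r.v.'s are made up):

```python
# #30-31 in code: E(X - Y) = E(X) - E(Y) always, but Var(X - Y) =
# Var(X) + Var(Y) only because X and Y are independent.  Exact check
# over the joint distribution of two made-up r.v.'s.
from itertools import product

X = {1: 0.5, 2: 0.5}        # value: probability
Y = {0: 0.25, 4: 0.75}

def mean(d):
    return sum(v * p for v, p in d.items())

def var(d):
    m = mean(d)
    return sum(p * (v - m) ** 2 for v, p in d.items())

diff = {}                   # distribution of X - Y under independence
for (vx, px), (vy, py) in product(X.items(), Y.items()):
    diff[vx - vy] = diff.get(vx - vy, 0.0) + px * py

print(mean(diff), mean(X) - mean(Y))   # -1.5 -1.5: difference of means
print(var(diff), var(X) + var(Y))      # 3.25 3.25: SUM of variances
```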
32. |
scalar (i.e., a constant),
scalar, |
|
|
33. |
r: no
change |
|
|
34. |
Standardized
(dimensionless) representation of a data point, in s.d.’s. |
|
|
35. |
events, mutually exclusive;
independence; no; independence of A
and B means P(A|B) = P(A), which is not at
all the same as mutual exclusivity (P(A and B) = 0). |
|
|
36.* |
The aspect of probability
that we care most about is sampling distributions. If we understand the
sampling distribution of a statistic, we can determine how statistically significant
a result is. Without this, we would never know whether experiments or
clinical trials of new drugs were showing anything of value or were merely
“flukes.” |
|
|
37. |
Sampling distribution of |
|
|
38. |
standard error, s.e. |
|
|
39.* |
Law of large numbers. WRONG: If
|
|
|
40. |
Central Limit Theorem. |
|
|
41. |
P-value, test;
principles of good experimental design; [add your personal description,
incorporating control, randomization of
assignment, and replication; if
you wish, add blocking (a form of
control that reaches its ultimate expression in the case of matched pairs)] |
|
|
42-52. |
[Research on your own,
please.] |
|
|
53. |
Two-tailed, since if the
experiment goes the wrong way (as sometimes occurs in science), there will
still be the possibility of making an inference. All decisions regarding
methodology are supposed to be made before
any data-gathering occurs. (Otherwise, people could say that the methodology
was tailored toward achieving a low P-value.
In theory, the experiment should be repeatable, so that anyone following the
same methodology would likely reach a similar conclusion.) |
|
|
54. |
It is possible to write a
true sentence using the words probability
and confidence interval. However,
it is also very easy to make an error along the way. That is why it is much
better to say, “We are 95% confident that the true proportion of voters
favoring Smedley is between 48% and 54%,” not anything involving probability.
Probability is a technical term meaning long-run relative frequency, and it
cannot be haphazardly misused in the way laypeople misuse it. |
|
|
55. |
We cannot prove H0. All we can do is judge
whether the evidence against it is “sufficient to reject” or “insufficient to
reject.” |
|
|
56. |
We can sometimes gather
overwhelming evidence that H0
can be rejected in favor of Ha.
In the real world, even in a court of law, that is good enough. (Of course,
in the world of mathematics, that is not considered a proof—one of the
reasons that mathematicians and statisticians do not consider themselves to
be equivalent.) |
|
|
57.* |
[You’d better know this by
now!] |
|
|
58.* |
inferential, use statistics
to estimate parameters |
|
|
59. |
[I think everyone can do
this.] |
|
|
60. |
Blocking
is a form of control in which similar experimental units are grouped (for
example, by age or gender) before being randomly assigned to treatment
groups. |
|
|
61. |
The first one (unequal
proportions) is for a 2-prop. z confidence interval, and the second one is usually for a 2-prop. z test. |
|
|
62. |
False: If there are matched
pairs, you have only one sample (namely, a column of differences). |
|
|
63. |
Bias = any situation in which
the expected value of a statistic does not equal the parameter being
estimated. Selection bias refers to
a methodology that produces samples that are systematically different from
the population in a way that causes a parameter to be systematically underestimated
or overestimated. An SRS is not biased; although an SRS often fails to match
the population, the differences are random
differences, not systematic
differences. |
|
|
64. |
|
|
|
65. |
[I hope you have thought about
this. This is a personal matter, but what I do is first to decide whether
there are proportions involved or not. Then, do we have 1 sample, matched
pairs (also 1 sample), or 2 real samples? Or is this a |
|
|
66-74. |
[See TI-83 STAT TESTS Summary.] |
|
|
75. |
“Since P = ____ , which is less than α, we reject H0.” |
|
|
76. |
“Since P = ____ , which is greater than α, we fail to reject H0.” |
|
|
77. |
“We are XX% confident that the
true ... is between YY and ZZ.” Be sure to phrase the “...” in the context of
the problem, e.g., “true mean boiling point,” “true difference in voter
preference proportions,” “true mean improvement in test scores,” etc. |
|
|
78. |
Compute C.I. using TI-83.
Then punch upper–lower, i.e., VARS 5 TEST I – VARS 5 TEST H, divide result by
2 and STO into M (for m.o.e.). You can then write your C.I. as est. ± M. |
|
|
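The keystrokes in #78 amount to simple arithmetic; here is a Python sketch with made-up interval endpoints:

```python
# #78 in code: m.o.e. = (upper - lower) / 2, center = (upper + lower) / 2,
# so the interval can be written as est. +/- m.o.e.  Endpoints are made up.
lower, upper = 0.48, 0.54           # e.g., a 95% C.I. for a proportion
moe = (upper - lower) / 2
est = (upper + lower) / 2
print(f"{est:.2f} +/- {moe:.2f}")   # 0.51 +/- 0.03
```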
79. |
[See AP formula sheet.] |
|
|
80. |
Since |
|
|
81.* |
convenience, anecdotal;
voluntary response bias |
|
|
82.* |
Not really. For example,
the m.o.e. (at a 95% confidence level) of a 1300-person poll will be about 3
percentage points, regardless of whether the poll is taken in California or
in Wyoming. You do not need a larger sample to get the same accuracy
in California, even though the population of California is about 39 million,
more than 65 times larger than that of Wyoming. |
|
|
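The claim in #82 can be checked numerically. This sketch assumes the usual conservative large-sample formula (p = 0.5 gives the widest interval, and 1.96 is the 95% critical value); notice that the population size appears nowhere:

```python
# #82 in code: the 95% m.o.e. for a proportion depends on the sample size n,
# not on the population size.  Uses the conservative p = 0.5 large-sample
# formula with the 95% critical value 1.96.
import math

def moe_95(n, p=0.5):
    return 1.96 * math.sqrt(p * (1 - p) / n)

print(round(moe_95(1300), 3))   # 0.027 -- about 3 points, CA or WY alike
```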
83. |
No; q; yes. |
|
|
84. |
Simple random sample; a
sample chosen so that every possible subset of size n is equally likely to be
selected. |
|
|
85. |
SRS, since bias can
invalidate the results quite easily. Normality of population is not an issue in
large samples (courtesy of CLT), since normality of the sampling distribution
rescues us. |
|
|
86. |
Marginal probabilities =
fractions involving row or column totals divided by grand total. Conditional
probabilities = fractions involving individual cells divided by a row or
column total. Both are usually concerned with categorical data in 2-way
tables. |
|
|
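A Python sketch of the distinction in #86 (the 2-way table counts are made up):

```python
# #86 in code: marginal vs. conditional probabilities from a 2-way table
# of categorical data (counts are made up).
table = {("male", "yes"): 30, ("male", "no"): 20,
         ("female", "yes"): 10, ("female", "no"): 40}

grand_total = sum(table.values())                             # 100
male_total = table[("male", "yes")] + table[("male", "no")]   # row total: 50

p_male = male_total / grand_total                       # marginal: row / grand
p_yes_given_male = table[("male", "yes")] / male_total  # conditional: cell / row

print(p_male, p_yes_given_male)   # 0.5 0.6
```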
87.* |
Just because an effect is
not plausibly caused by chance alone does not mean that it is large enough to
be of any real-world significance. The reverse situation is also possible. In
the presidential election of 2000, a 0.01% difference in vote totals in
Florida (a margin of no statistical significance whatsoever) was enough to
permit George W. Bush to defeat Al Gore in Florida and thereby in the overall
election, even though Bush lost to Gore by more than half a million votes
nationwide. George W. Bush’s election was thus a chance outcome, not
indicative of any general trend in voting preferences, but it had a huge
effect on U.S. history. |
|
|
88.* |
[I think everyone knows
this. In fact, you probably knew it before you took the course.] |
|
|
89.* |
Only a controlled
experiment is considered convincing. In situations (e.g., smoking in humans)
where it is not ethical to run a controlled experiment, various types of
observational and correlative studies can suggest, but not prove, a
cause-and-effect link. |
|
|
90.* |
[Everyone probably knows
this. Remember to discuss placebo effect and hidden bias.] |
|
|
91. |
Yes; perhaps many new
employees have been hired. |
|
|
92. |
Yes; the relative mix of
employee categories could be a lurking variable. Perhaps there are now
proportionally more employees in the higher-paid job categories, so that the
weighted average salary has increased even while each category has had cuts
in mean salaries. This would be an example of Simpson’s Paradox. |
|
|
93.* |
Using deceptive
(“gee-whiz”) graphs, changing the subject, confusing correlation with
causation, using inappropriate averages (e.g., mean with highly skewed
distributions), citing anecdotal data, using biased samples, concealing the
wording of a survey question, computing absurd precision with qualitative
data (e.g., “74% more beautiful skin!”), etc., etc. |
|
|
94.* |
Who says so? How do they
know? What’s missing? Did somebody change the subject? Does it make sense?
(For example, a claim that a child is kidnapped every 30 seconds in America
is absurd, since that would be more than a million children per year.) |
|
|
95. |
The last one. Statisticians
are mostly from mathematical or scientific backgrounds, which means we are on
a quest for truth. Our clients may mangle, misuse, and abuse our conclusions,
but we try very hard not to do that ourselves. |
|
|
96. |
Nobody knows. The statement
is usually attributed to Mark Twain, although he himself credited it to
Benjamin Disraeli. |
|
|
97.* |
Odds in favor = ratio of
favorable to unfavorable outcomes. |
|
|
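A small Python sketch of the odds definition in #97 (the example numbers are made up):

```python
# #97 in code: odds in favor = ratio of favorable to unfavorable outcomes.
# Example numbers are made up.
def odds_in_favor(p):
    """Probability p -> ratio of favorable to unfavorable."""
    return p / (1 - p)

def prob_from_odds(favorable, unfavorable):
    return favorable / (favorable + unfavorable)

print(odds_in_favor(0.75))     # 3.0, i.e., odds of 3 to 1
print(prob_from_odds(3, 1))    # 0.75
```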
98. |
For the individual player
who plays a few dozen hands of blackjack or pulls the arm on a slot machine a
few dozen times, the sampling distribution of net outcomes is relatively
short and wide, meaning that a good portion of the sampling distribution can
spill into positive territory even though the mean is negative. This is why
it is not rare for people to return home from Las Vegas as winners. (Fewer
than half are this lucky, but since the lucky ones are usually the only ones
who say anything, it is easy to get a false impression that winning is
common.) |
|
|
99. |
Confounding means that
group membership affects the response variable in a way that makes
determining cause and effect difficult or impossible. Because the designers
of a study or an experiment are often unaware of this relationship, confounding
variables (a.k.a. confounding factors or confounders) are sometimes called “lurking
variables.” Confounding makes us unable to see what portion of the response,
if any, can be attributed to the explanatory variable and not to the lurking
variables. |
|
|
100. |
It is true that poker has
chance elements. However, almost all games (such as golf, for example) have
chance elements: a puff of wind, a sneeze by a spectator, a bad bounce off a
yardage marker. In the long run, however, the most skillful players of poker
and of golf come out on top. In a poker tournament involving many rounds,
everyone’s luck will be approximately the same. The skill required to bluff
convincingly and wager appropriately will guarantee that many of the same
elite poker players end up battling each other at the World Series of Poker
each year. |