AP Statistics / Mr. Hansen
Name: _________________________

The “Must-Pass” Quiz for 2003: Partial Answer Key
1.* A number computed from data. [You should provide examples.]
2.* A number that describes a population. [You should provide examples.]
3.* An “adjustable constant” that defines the nature of a mathematical model, much as a tuning knob or volume slider adjusts the output of a television or radio.
4. Uniform: min and max [also need to know whether distrib. is discrete or continuous]
5. Uniform: flat line in relative frequency histogram
6. Range is a single number for the spread of values in a column of data: range = max – min. People who say things like “the range is from 28 to 75” are misusing the term in its statistical sense.
7. IQR (interquartile range) = Q3 – Q1. Use STAT CALC 1 to get the 5-number summary, then subtract: Q3 – Q1.
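For anyone who wants to double-check the calculator, here is a minimal Python sketch of the five-number-summary approach (the data column is invented; statistics.quantiles needs Python 3.8+, and its quartile convention may differ slightly from the TI-83’s):

    import statistics

    data = [28, 31, 35, 41, 44, 52, 58, 63, 70, 75]  # invented data column

    # quantiles(..., n=4) returns the three cut points [Q1, median, Q3]
    q1, median, q3 = statistics.quantiles(data, n=4)
    iqr = q3 - q1
    print("Q1 =", q1, " Q3 =", q3, " IQR =", iqr)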
8. (a) Easiest way is to make a modified boxplot, then TRACE to see the points (use arrow keys). Outliers are more than 1.5 × IQR below Q1 or more than 1.5 × IQR above Q3.

(b) No rule of thumb—just judge visually. Outliers have “large” residuals.
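The 1.5 × IQR fences from part (a) are also easy to compute directly; this sketch plants one artificial outlier in an invented data set:

    import statistics

    data = [28, 31, 35, 41, 44, 52, 58, 63, 70, 75, 140]  # 140 is a planted outlier
    q1, _, q3 = statistics.quantiles(data, n=4)
    iqr = q3 - q1
    low_fence, high_fence = q1 - 1.5 * iqr, q3 + 1.5 * iqr
    outliers = [x for x in data if x < low_fence or x > high_fence]
    print("fences:", low_fence, high_fence, "-> outliers:", outliers)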
9.* Explanatory, response.
10. Mean squared error = pop. variance (mean squared deviation from the mean). Sample variance is different, since denom. is n – 1 instead of n.
11. Pop. s.d. (σ) and sample s.d. (s) are measures of data dispersion (“spread”). Use STAT CALC 1 to compute, never the formula on the AP formula sheet. Technically, σ equals the square root of MSE (square root of pop. variance), and s equals the square root of sample variance.
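The n vs. n – 1 distinction in #10–11 is easy to see in Python’s statistics module, which implements both versions (data invented):

    import statistics

    data = [2, 4, 4, 4, 5, 5, 7, 9]

    print(statistics.pvariance(data), statistics.pstdev(data))  # divide by n:     MSE and sigma
    print(statistics.variance(data), statistics.stdev(data))    # divide by n - 1: s^2 and s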
12. In a normal distribution (required), the distribution curve is bell-shaped, satisfies the 68-95-99.7 rule, and has inflection points at ±1σ from the mean.
13. Lack of symmetry. Right skewness means the central hump dribbles out to the right, forcing mean > median, since the mean is less resistant to extreme values. Left skewness is the opposite, forcing mean < median. Easy ways to detect skewness involve looking at a histogram, boxplot, or stemplot to see where the tail is longer. If you use an NQP, trace the dots from left to right; if they bend to the left, the plot shows left skewness, but if they bend to the right, the plot shows right skewness.
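A quick numeric demonstration of the mean > median effect of right skewness (the data are invented, with one long right tail):

    import statistics

    right_skewed = [1, 2, 2, 3, 3, 3, 4, 4, 25]  # tail dribbles out to the right

    print("mean   =", statistics.mean(right_skewed))    # pulled toward the tail (about 5.2)
    print("median =", statistics.median(right_skewed))  # resistant: stays at 3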
14. Easiest way is to look for a pattern that is not straight in an NQP. If you are a glutton for punishment (as on #4 from the Chap. 13-14 free response), you can use a χ² g.o.f. test to check for departures from expected bin counts. There are also several standard “canned” tests that are beyond the scope of AP Statistics.
15.* Linear.
16. Slope, since it estimates how many units the response variable increases (or decreases) for each additional unit of the explanatory variable. The intercept is less crucial, even meaningless in some contexts.
17. Linear correlation coefficient. Signed strength of linear pattern (–1 = pure negative linear association, 0 = no linear association, +1 = pure positive linear association). Use STAT CALC 8 and make sure your Diagnostics are on (2nd CATALOG DiagnosticOn).
18. Coefficient of determination. Tells what portion of the variation in one variable can be explained by variation in the other. If r = .8, then 64% of the variation in y (or x) can be explained by variation in x (or y).
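Here is a minimal cross-check of #17–18 away from the calculator (invented data; statistics.correlation needs Python 3.10+). Note that r² is literally just the square of r:

    import statistics

    x = [1, 2, 3, 4, 5, 6]
    y = [2.1, 3.9, 6.2, 7.8, 10.1, 11.9]  # invented, roughly linear

    r = statistics.correlation(x, y)  # Pearson's r
    print("r   =", round(r, 4))
    print("r^2 =", round(r * r, 4))   # coefficient of determination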
19. No; yes.
20. No; yes.
21. STAT CALC 8, or with formulas 6 and 8 on the first page of the AP formula sheet. (Never use formula 5.)
22. [See LSRL Top Ten.]
23. Resid. = y – yhat (i.e., actual y – predicted y). Resid. plot is a scatterplot with RESID on the y-axis and either the x variable or yhat on the x-axis. (It doesn’t matter which, since x and yhat are linearly related.) In beginning statistics courses, we usually make the resid. plot with x on the x-axis and RESID on the y-axis, but there was at least one AP exam that had yhat values on the x-axis of the resid. plot. Don’t let that bother you.
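A sketch of computing residuals by hand (invented data; statistics.linear_regression needs Python 3.10+):

    import statistics

    x = [1, 2, 3, 4, 5]
    y = [2.0, 4.1, 5.9, 8.2, 9.8]  # invented, nearly linear

    fit = statistics.linear_regression(x, y)              # least-squares line
    y_hat = [fit.slope * xi + fit.intercept for xi in x]  # predicted values
    residuals = [yi - yh for yi, yh in zip(y, y_hat)]     # resid = y - yhat
    print(residuals)

Plotting these residuals against x (or against y_hat) gives the resid. plot described above.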
24. [See LSRL Top Ten.]
25. Regression outlier and influential observation are not synonyms. A point can be a regression outlier (large residual), but if it is near the center of the x values, it is usually not influential. Similarly, a point can be influential (large effect on slope or r if removed) but have only a small residual, meaning the point is not an outlier. It is also possible for a point to be both influential and an outlier.
26. b0 = predicted value of the response if the explanatory variable (x value) is set to 0
27. Random variable (discrete or continuous). [You should provide examples.]
28. r.v., Σpixi, mean, expected value
29. r.v., variance, Var(X), σX²; square root, √Var(X), σX
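#28–29 in code form, for a discrete r.v. given as value/probability pairs (the distribution is invented):

    values = [0, 1, 2, 3]
    probs  = [0.1, 0.3, 0.4, 0.2]  # invented distribution; must sum to 1

    mean = sum(p * x for x, p in zip(values, probs))                # E(X) = sum of pi*xi
    var  = sum(p * (x - mean) ** 2 for x, p in zip(values, probs))  # Var(X)
    sd   = var ** 0.5                                               # sigma_X = sqrt(Var(X))
    print(mean, var, sd)  # 1.7, 0.81, 0.9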
30. sum, sum, means; yes; mean of difference equals difference of means
31. sum, sum, variances; true only for independent r.v.’s; variance of difference (assuming indep. r.v.’s) equals sum of variances
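A simulation sketch of #31 (seed and parameters arbitrary): for independent X and Y, the variance of the difference comes out close to the sum of the variances:

    import random
    import statistics

    random.seed(1)
    N = 100_000
    x = [random.gauss(10, 3) for _ in range(N)]  # Var(X) = 9
    y = [random.gauss(5, 4) for _ in range(N)]   # Var(Y) = 16, independent of X

    diff = [xi - yi for xi, yi in zip(x, y)]
    print(statistics.pvariance(diff))  # should be close to 9 + 16 = 25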
32. scalar (i.e., a constant), scalar, σX; yes
33. r: no change
34. Standardized (dimensionless) representation of a data point, in s.d.’s.
35. events, mutually exclusive; independence; no; independence of A and B means P(A|B) = P(A), which is not at all the same as P(A ∩ B) = 0
36.* The aspect of probability that we care most about is sampling distributions. If we understand the sampling distribution of a statistic, we can determine how statistically significant a result is. Without this, we would never know whether experiments or clinical trials of new drugs were showing anything of value or were merely “flukes.”
37. Sampling distribution of xbar or diff. of means: follows z if σ is known (rare), otherwise t.
38. s.e.; no idea; yes
39.* Law of large numbers. WRONG: If phat < p, then the proportion of successes will start to increase until we “catch up.” (Or, if phat > p, the proportion of successes will start to decrease until we are “back down to the correct value.”) These are both wrong, because what really happens is that the effect of any finite collection of observations becomes diluted as n → ∞. A coin has no memory, no desire to set things right, and no ability to iron out past discrepancies. Nevertheless, the proportion of heads—even if the coin is biased—will, over time, approach whatever the true probability is.
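A simulation of #39 (biased coin with invented p = 0.6; seed arbitrary). The running proportion wanders toward p not because the coin “catches up” but because early flips get diluted:

    import random

    random.seed(42)
    p = 0.6  # true probability of heads (invented)
    heads = 0
    for n in range(1, 100_001):
        heads += random.random() < p  # one flip; True counts as 1
        if n in (10, 100, 1_000, 10_000, 100_000):
            print(n, heads / n)       # proportion drifts toward 0.6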
40. Central limit theorem.
41. P-value, test; principles of good experimental design; [add your personal description]
42-52. [Research on your own, please.]
53. Two-tailed, since if the experiment goes the wrong way (as sometimes occurs in science), there will still be the possibility of making an inference. All decisions regarding methodology are supposed to be made before any data-gathering occurs. (Otherwise, people could say that the methodology was tailored toward achieving a low P-value. In theory, the experiment should be repeatable, so that anyone following the same methodology would likely reach a similar conclusion.)
54. It is possible to write a true sentence using the words probability and confidence interval. However, it is also very easy to make an error along the way. That is why it is much better to say, “We are 95% confident that the true proportion of voters favoring candidate Smedley is between 48% and 54%,” not anything involving probability. Probability is a technical term meaning long-run relative frequency, and it cannot be haphazardly misused in the way laypeople misuse it.
55. We cannot prove H0. All we can do is judge whether the evidence against it is “sufficient to reject” or “insufficient to reject.”
56. We can sometimes gather overwhelming evidence that H0 can be rejected in favor of Ha. In the real world, even in a court of law, that is good enough. (Of course, in the world of mathematics, that is not considered a proof—one of the reasons that mathematicians and statisticians do not consider themselves to be equivalent.)
57.* [You’d better know this by now!]
58.* inferential, use statistics to estimate parameters
59. [I think everyone can do this.]
60. Always use the first one, never the second.
61. The first one (unequal proportions) is for a 2-prop. z confidence interval, and the second one is usually for a 2-prop. z test.
62. False: If there are matched pairs, you really have only one sample (namely, a column of differences).
63. Systematic departure from randomness, i.e., a methodology that produces samples that are systematically different from the population in a way that causes a parameter to be systematically underestimated or overestimated. An SRS is not biased; although an SRS often fails to match the population, the differences are random differences, not systematic differences. Systematic means that there are flaws in the methodology.
64. xbar is an unbiased estimator of μ; i.e., E(xbar) = μxbar = μ
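A simulation sketch of #64 (population and seed invented): the average of many sample means lands essentially on μ:

    import random
    import statistics

    random.seed(7)
    population = [random.gauss(50, 10) for _ in range(10_000)]  # invented population
    mu = statistics.mean(population)

    sample_means = [statistics.mean(random.sample(population, 25))
                    for _ in range(2_000)]
    print(mu, statistics.mean(sample_means))  # the two agree closely: E(xbar) = mu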
65. [I hope you have thought about this. This is a personal matter, but what I do is first to decide whether there are proportions involved or not. Then, do we have 1 sample, matched pairs (also 1 sample), or 2 real samples? Or is this a χ² problem? And if so, are we comparing against fixed proportions (g.o.f.) or looking for differences across a 2-way table?]
66-74. [See TI-83 STAT TESTS Summary.]
75. “There is strong evidence that ...” It is a good idea to list the test statistic, n or df, and the P-value in parentheses. Be sure to phrase the conclusion in the context of the problem.
76. “There is insufficient evidence that ...” It is a good idea to list the test statistic, n or df, and the P-value in parentheses. Be sure to phrase the conclusion in the context of the problem.
77. “We are XX% confident that the true ... is between YY and ZZ.” Be sure to phrase the “...” in the context of the problem, e.g., “true mean boiling point,” “true difference in voter preference proportions,” “true mean improvement in test scores,” etc.
78. Compute C.I. using TI-83. Then punch upper – lower, i.e., VARS 5 TEST I – VARS 5 TEST H, divide the result by 2, and STO into M (for m.o.e.). You can then write your C.I. as est. ± M. Depending on the problem, “est.” will be xbar, phat, xbar1 – xbar2, or phat1 – phat2.
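The same endpoint arithmetic, sketched in Python (the interval endpoints are invented stand-ins for the calculator output):

    lower, upper = 0.48, 0.54        # C.I. endpoints from the calculator
    m = (upper - lower) / 2          # margin of error M
    est = (upper + lower) / 2        # point estimate (center of the interval)
    print(f"{est:.3f} +/- {m:.3f}")  # C.I. written as est. +/- M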
79. [See AP formula sheet.]
80. Since t = (b1 – 0)/s.e. = b1/sb1 in the LSRL t-test, sb1 = b1/t.
81.* convenience, anecdotal; voluntary response bias
82.* No. [We talked about this on the very first day of class and on numerous occasions since then. Please provide an illustrative example.]
83. No; q; yes.
84. Simple random sample; a sample chosen so that every possible subset of size n is equally likely to be selected.
85. SRS, since bias can invalidate the results quite easily. Normality of population is not an issue in large samples (courtesy of the CLT), since normality of the sampling distribution rescues us.
86. Marginal probabilities = fractions involving row or column totals divided by the grand total. Conditional probabilities = fractions involving individual cells divided by a row or column total. Both are usually concerned with categorical data in 2-way tables.
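A small worked version of #86 (the 2-way table counts are invented):

    # invented 2-way table: rows = gender, columns = opinion
    table = {
        "male":   {"favor": 30, "oppose": 20},
        "female": {"favor": 25, "oppose": 25},
    }

    grand_total = sum(sum(row.values()) for row in table.values())  # 100
    p_male = sum(table["male"].values()) / grand_total              # marginal: 0.50
    p_favor_given_male = (table["male"]["favor"]
                          / sum(table["male"].values()))            # conditional: 0.60
    print(p_male, p_favor_given_male)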
87.* Just because an effect is not plausibly caused by chance alone does not mean that it is large enough to be of any real-world significance.
88.* [I think everyone knows this. In fact, you probably knew it before you ever signed up for the course.]
89.* Only a controlled experiment is considered convincing. In situations (e.g., smoking in humans) where it is not ethical to run a controlled experiment, various types of observational and correlative studies can suggest, but not prove, a cause-and-effect link.
90.* [Everyone probably knows this. Remember to discuss placebo effect and hidden bias.]
91. Yes; perhaps many new employees have been hired.
92. Yes; the relative mix of employee categories could be a lurking variable. Perhaps there are now proportionally more employees in the higher-paid job categories, so that the weighted average salary has increased even while each category has had cuts in mean salaries. This would be an example of Simpson’s Paradox.
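A tiny numeric illustration of #92 (all salaries and headcounts invented): each category’s mean salary falls, yet the overall mean rises because the mix shifts toward the higher-paid category:

    # (category mean salary, headcount) pairs; all numbers invented
    before = [(40_000, 80), (90_000, 20)]  # mostly lower-paid employees
    after  = [(38_000, 30), (85_000, 70)]  # both means lower, mix shifted

    def overall_mean(groups):
        total = sum(salary * n for salary, n in groups)
        return total / sum(n for _, n in groups)

    print(overall_mean(before))  # 50,000.0
    print(overall_mean(after))   # 70,900.0 -> higher overall despite the cuts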
93.* Using deceptive (“gee-whiz”) graphs, changing the subject, confusing correlation with causation, using inappropriate averages (e.g., mean with highly skewed distributions), citing anecdotal data, using biased samples, concealing the wording of a survey question, computing absurd precision with qualitative data (e.g., “74% more beautiful skin!”), etc., etc.
94.* Who says so? How do they know? Did somebody change the subject? Is the result credible? (For example, a claim that a child is kidnapped every 30 seconds in ...
95. The last one. Statisticians are mostly from mathematical or scientific backgrounds, which means we are on a quest for truth. Our clients may mangle, misuse, and abuse our conclusions, but we try very hard not to do that ourselves.
96. Nobody knows. The statement is usually attributed to Mark Twain, although he himself credited it to Benjamin Disraeli.