AP Statistics / Mr. Hansen
Name: ________________________________
Test #2, Oct. 28 and 29, 1998
Part I (40 points): Terminology.
Fill in the blanks. For questions 2-10, use one of the terms from the list. Questions 1-10 are worth 4 points each.
anecdotal evidence
bar graph / segmented bar graph
bias
blind test / double-blind test
blocking (block design)
case
categorical variable
census
common response
confounding
control group
correlation (strength and direction)
correlation vs. causation
experiment
explanatory variable
exploratory data analysis
exponential growth
factor
level
lurking variable(s)
matching / matched pairs
parameter
placebo effect
prospective study
quantitative variable
response (or nonresponse) bias
response variable
sampling
sampling distribution
Simpson’s paradox
simulation
SRS
statistic
statistical inference (3 principles):
   1. controlling the variables
   2. randomization
   3. replication
statistical significance
strata
stratified random sample
subject
transformations to achieve linearity
two-way / three-way table
variability of sampling distribution
wording of questions
1. The sampling distribution of a statistic is defined to be _______________________________
_______________________________________________________________________
_______________________________________________________________________ .
2. In a study purporting to show that cigarette smokers' mortality increases with the number of years they have smoked, mortality rate is the __________________ .
3-4. The values xbar [meaning x with a bar over it] and s are examples of __________________ , while µ and σ are examples of __________________ .
5. When designing an experiment, we sometimes find it advantageous to let each location (or subject) serve as its own control. For example, in an experimental design for testing a new type of body lotion, we may apply the new lotion to one arm of the subject, and a placebo lotion to the other arm, with the choice of left or right arm determined randomly. This type of experimental design is called __________________ .
6-8. Because human subjects (as well as researchers) are easily swayed into seeing positive results from experimental treatments (a phenomenon known as the __________________ ), it is crucial that any scientifically valid experiment include a __________________ that receives no actual treatment. Neither the subjects nor the researchers who meet with the subjects should know who is getting a real treatment and who is getting a fake treatment; in other words, the experiment should be __________________ .
9. If three different lotions are tested at two doses each, and if subjects are stratified by gender, then gender and lotion are examples of experimental factors, and the dosages are examples of __________________ of one of the factors.
10. An interesting fact that most non-statisticians are completely unaware of is that the standard deviation of a __________________ depends on the sample size, not on the population size (provided the population is much larger than the sample).
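A quick numerical check of the claim in question 10, as a sketch under assumptions not stated in the test: an SRS drawn without replacement, an illustrative population proportion p = 0.5, and the standard finite-population-correction formula for the s.d. of phat. The function name `sd_phat` is hypothetical.

```python
import math

def sd_phat(p: float, n: int, N: int) -> float:
    """S.d. of the sampling distribution of phat for an SRS of size n drawn
    without replacement from a population of size N."""
    # Finite population correction: close to 1 whenever N is much larger than n.
    fpc = math.sqrt((N - n) / (N - 1))
    return math.sqrt(p * (1 - p) / n) * fpc

p = 0.5  # illustrative population proportion (an assumption)

# Same sample size, population sizes differing by a factor of 30,000:
for N in (10_000, 1_000_000, 300_000_000):
    print(f"N = {N:>11,}  n = 100  s.d. = {sd_phat(p, 100, N):.4f}")

# Quadrupling the sample size, by contrast, cuts the s.d. in half:
print(f"N = {1_000_000:>11,}  n = 400  s.d. = {sd_phat(p, 400, 1_000_000):.4f}")
```

The s.d. stays at about 0.05 for every population size shown, while going from n = 100 to n = 400 halves it.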
Part II (30 points): Regression Analysis.
11. If r² = 0.818, if the correlation is negative, and if s_x = 0.28 and s_y = 0.35, then the slope of the regression line when regressing Y on X is ____________ .
12. Consider changing problem 11 so that we regress X on Y.
(a) How does r change? _________________
(b) How does the slope change? (Compute the new value.)
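A sketch of the arithmetic behind questions 11 and 12, using the standard least-squares slope formula b = r·s_y/s_x (the formula is standard, though the test does not state it). Regressing X on Y swaps the roles of s_x and s_y, while r is symmetric in X and Y.

```python
import math

r_squared = 0.818
s_x, s_y = 0.28, 0.35

# The correlation is stated to be negative, so take the negative square root.
r = -math.sqrt(r_squared)

# Least-squares slope when regressing Y on X: b = r * s_y / s_x
slope_y_on_x = r * s_y / s_x

# Regressing X on Y: r is unchanged, but the two s.d.'s trade places.
slope_x_on_y = r * s_x / s_y

print(f"r = {r:.4f}")
print(f"slope (Y on X) = {slope_y_on_x:.4f}")
print(f"slope (X on Y) = {slope_x_on_y:.4f}")

# Sanity check: the product of the two slopes equals r^2.
product = slope_y_on_x * slope_x_on_y
print(f"product of slopes = {product:.4f}")
```

Note that the X-on-Y slope is not the reciprocal of the Y-on-X slope unless r² = 1.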
13. Does weak linear correlation mean that there are no patterns in your data? Try to raise at least two issues that need to be considered.
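Two issues behind question 13 can be illustrated with made-up data (the numbers below are hypothetical, chosen only for illustration): a strong curved pattern can have r near 0, and a single outlier can wreck r for points that otherwise lie exactly on a line. The helper `pearson_r` is written out by hand rather than taken from a library.

```python
import math

def pearson_r(xs, ys):
    """Pearson correlation coefficient r."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)

# Issue 1: a perfect but curved pattern (y = x^2, symmetric about 0).
xs = [-3, -2, -1, 0, 1, 2, 3]
ys = [x * x for x in xs]
r_curve = pearson_r(xs, ys)

# Issue 2: nine points exactly on y = x plus one wild outlier.
xs2 = list(range(1, 11))
ys2 = list(range(1, 10)) + [-30]
r_outlier = pearson_r(xs2, ys2)

print(f"r for y = x^2:      {r_curve:.3f}")
print(f"r with one outlier: {r_outlier:.3f}")
```

The curved data have r = 0 even though y is completely determined by x, and the outlier drags r for an otherwise perfect line down to about -0.31. Both patterns are obvious on a scatterplot, which is why plotting comes before computing r.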
14. Can a regression line be used for predicting yhat based on x whenever r² is large, or must there be a true causal link between X and Y in order for the prediction to be valid? Explain. [Here, yhat means y with a ^ over it.]
Part III (30 points): Essays.
Questions 15-17 are worth 10 points each.
15. Recently the Bigg Tobacco Corporation (BTC) has been accused of targeting minors in its marketing campaigns. A consumer group gathered anonymous data revealing that the incidence of under-18 smoking (i.e., the fraction of the under-18 population that smokes regularly) has increased over the past year from 13% to 15% in towns where BTC has placed billboards. BTC, using the same raw data, countered with the following two-way table showing that smoking incidence has actually dropped:
Smoking incidence            | Last year | This year
Age 0-13                     | 11%       | 10%
Above age 13 but below 18    | 18%       | 16%
Note: Assume that all the samples were large enough and random enough to make the figures shown in the table accurate.
(a) Make some segmented bar graphs ("100%" bars) comparing last year’s proportion of "smokers vs. nonsmokers" with this year’s proportion. (In other words, make two bars showing the change for ages 0-13, and two more bars showing the change for 13<age<18.) Use the data from the table above.
(b) Explain why your graphs (which BTC plans to use in its P.R. campaign defending itself) are misleading. What is the likely source of the discrepancy? Explain.
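One way to probe the discrepancy in question 15 is to ask what age mix would reconcile the consumer group's overall figures with BTC's table. A sketch, assuming the 13% and 15% overall rates are simply population-weighted averages of the two age-group rates (overall = w·young + (1 - w)·older); the function `young_share` is hypothetical.

```python
def young_share(overall: float, young_rate: float, older_rate: float) -> float:
    """Fraction w of the under-18 population aged 0-13 implied by
    overall = w * young_rate + (1 - w) * older_rate."""
    return (older_rate - overall) / (older_rate - young_rate)

last_year = young_share(overall=0.13, young_rate=0.11, older_rate=0.18)
this_year = young_share(overall=0.15, young_rate=0.10, older_rate=0.16)

print(f"share aged 0-13, last year: {last_year:.3f}")
print(f"share aged 0-13, this year: {this_year:.3f}")
```

Under that assumption the share of the under-18 population in the 0-13 group would have fallen from about 71% to about 17%. Because the older group smokes at a higher rate in both years, shifting the mix toward it raises the overall rate even though each age group's rate fell: a Simpson's paradox with age composition as the lurking variable.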
16. The Kookoo Cola Company (KCC) wishes to see if its new recipe is enjoyed by consumers. The company sets up a big "KCC" booth in a shopping mall, gives out free unidentified cola (the new recipe) to all those who stop at the booth, and asks them to verbally rate the taste on a scale of 0 to 10. Based on the outcome of this test—79% rating the taste as "7" or higher—the company decides to spend millions of dollars introducing this new product nationwide. Please criticize this decision mercilessly. (List as many objections as you can, including a short reason. The first one has been done for you as an example.)
1. bias due to voluntary selection—only passersby who are willing to stop will try the cola, and this might bias the sample against shy people, for example
17.(a) Describe how you would use Table B [the random digit table] to simulate the sampling distribution of phat, the proportion of heads in a sample of 10 coin flips, used as an estimate of the probability of getting heads on a flip. [Here, phat means p with a ^ over it.]
17.(b) Starting at row 127 of Table B, compute at least 9 data points and plot them on a stemplot. To make grading easier, please mark the 10 flips on Table B that you used for each data point, and list the outcomes (fractions) here, too, in the order that they occur:
____ , ____ , ____ , ____ , ____ , ____ , ____ , ____ , ____ [do a few more if time permits]
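A sketch of one possible digit-to-flip scheme for question 17, assuming the rule 0-4 = heads and 5-9 = tails (any rule giving each outcome five of the ten digits would do). Table B itself is not reproduced here, so pseudorandom digits stand in for it; the seed and the function name `phats_from_digits` are hypothetical.

```python
import random
from collections import Counter

def phats_from_digits(digits, flips_per_sample=10):
    """Convert random digits to sample proportions of heads.
    Assumed rule: 0-4 = heads, 5-9 = tails, so each flip has P(heads) = 1/2."""
    phats = []
    for start in range(0, len(digits) - flips_per_sample + 1, flips_per_sample):
        block = digits[start:start + flips_per_sample]
        heads = sum(1 for d in block if d <= 4)
        phats.append(heads / flips_per_sample)
    return phats

# Stand-in for Table B: pseudorandom digits (seed 127 is arbitrary,
# chosen only so the run is repeatable).
rng = random.Random(127)
digits = [rng.randrange(10) for _ in range(90)]  # 9 samples x 10 flips

phats = phats_from_digits(digits)
print(phats)

# Quick text tally of the simulated sampling distribution of phat
# (a dot-plot-style display, not a true stemplot).
for value, count in sorted(Counter(phats).items()):
    print(f"{value:.1f} | {'*' * count}")
```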
ALTERNATE QUESTION 17 USED FOR SOME STUDENTS
(a) The Kryptonite College soccer team is so dominant in its league that it decides in advance how many goals it will allow the other team to score in a match, and then simply makes sure to beat the other team by a few goals. The number of goals permitted to the other team is determined at random and is never more than 6. (In other words, the opponent’s goals can be 0, 1, 2, 3, 4, 5, or 6, each equally likely.) Briefly describe a simulation process to estimate the mean number of goals scored against Kryptonite College per soccer match, using samples of size 2.
(b) Carry out the process you described in part (a) and make a stemplot of the first 9 values you get for xbar in the sampling distribution. Start at row 141 of Table B, and circle the digits you use for each sample of size 2. For ease in grading, please also list your 9 means in order here: _____ , _____ , _____ , _____ , _____ , _____ , _____ , _____ , _____ .
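A sketch of one possible process for parts (a) and (b), assuming the rule that digits 0-6 are read directly as goals and 7, 8, 9 are skipped (which keeps the seven outcomes equally likely). Row 141 of Table B is not reproduced here, so pseudorandom digits stand in for it; `draw_goals` and `sample_means` are hypothetical names.

```python
import random

def draw_goals(next_digit, count):
    """Read digits until `count` usable values are collected.
    Assumed rule: 0-6 = that many goals against; 7, 8, 9 are skipped."""
    goals = []
    while len(goals) < count:
        d = next_digit()
        if d <= 6:
            goals.append(d)
    return goals

def sample_means(values, sample_size=2):
    """Average consecutive, non-overlapping groups of `sample_size` values."""
    return [
        sum(values[i:i + sample_size]) / sample_size
        for i in range(0, len(values) - sample_size + 1, sample_size)
    ]

# Stand-in for row 141 of Table B (seed is arbitrary, for repeatability).
rng = random.Random(141)
goals = draw_goals(lambda: rng.randrange(10), count=9 * 2)  # 9 samples of size 2
xbars = sample_means(goals)
print(xbars)
```

Skipping 7, 8, and 9 rather than, say, mapping them to other goal counts is what keeps each of the values 0 through 6 equally likely.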
(c) Your stemplot shows considerable variability. How large would your samples have to be (i.e., how many digits would you have to average at a time) to reduce the variability of the sampling distribution by a factor of 4? Sample size = ______________
[Note: We usually use standard deviation to measure this variability. We can answer (c) without calculating either the s.d. of the data (i.e., the distribution of goals scored) or the s.d. of the sampling distribution of xbar. Since the s.d. of a sampling distribution is inversely proportional to the square root of the sample size (think, for example, of the formula we learned later in the semester for the s.d. of phat, or of σ/√n for the s.d. of xbar), we must increase the sample size by a factor of 16, from 2 to 32, to reduce the s.d. by a factor of 4.]
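For this particular population the inverse-root argument can also be checked exactly. A sketch, assuming the goals in a sample are independent draws (as in the random-digit simulation): the values 0-6, each with probability 1/7, have mean 3 and s.d. 2, so the s.d. of xbar is 2/√n. The helper `sd_of_xbar` is hypothetical.

```python
import math

outcomes = range(7)  # goals 0-6, each with probability 1/7

mu = sum(outcomes) / 7
sigma = math.sqrt(sum((g - mu) ** 2 for g in outcomes) / 7)

def sd_of_xbar(n: int) -> float:
    """S.d. of the sampling distribution of xbar for samples of size n."""
    return sigma / math.sqrt(n)

print(f"mean = {mu}, s.d. = {sigma}")
print(f"s.d. of xbar, n = 2:  {sd_of_xbar(2):.4f}")
print(f"s.d. of xbar, n = 32: {sd_of_xbar(32):.4f}")
print(f"ratio: {sd_of_xbar(2) / sd_of_xbar(32):.4f}")
```

The ratio is √(32/2) = √16 = 4, matching the note: the value of σ cancels, which is why (c) can be answered without computing it.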