AP Statistics / Mr. Hansen
Name: _______________________
What are the hallmarks and differences?
The normal (z) distribution is a continuous distribution that arises in many
natural processes. "Continuous" means that between any two data
values we could (at least in theory) find another data value. For example,
men's heights vary continuously and are the result of so many tiny random
influences that the overall distribution of men's heights in
The bell-shaped normal curve has probabilities that are found as the area
between any two z values. You can use
either Table A in your textbook or the normalcdf
function on your calculator as a way of finding these normal probabilities.
Not all natural processes produce normal distributions. For example, incomes in
Here are some example problems. Make sure that you are familiar with BOTH
METHODS for solving each problem.
Example 1.
What percentage of men are between 5'10" and 6'1" if men's heights in inches follow the N(69, 3) distribution?

Method 1: The z score for 5'10" is .3333, and the z score for 6'1" is 1.3333, both by the z = (x − μ)/σ formula. By Table A, the area to the left of z = .3333 is about .63 (double-check me, please), and the area to the left of z = 1.3333 is about .909. Therefore, the area between z = .3333 and z = 1.3333 is .909 – .63, or approx. .28. Answer: 28%.
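As a cross-check on Example 1, here is a minimal Python sketch in the spirit of the calculator's normalcdf. Python is not part of this handout; the sketch assumes the standard-library `statistics.NormalDist` class, and the variable names are made up for illustration.

```python
from statistics import NormalDist

# Men's heights in inches, modeled as N(69, 3) as in Example 1.
heights = NormalDist(mu=69, sigma=3)

low, high = 70, 73  # 5'10" = 70 inches, 6'1" = 73 inches

# z scores from z = (x - mu) / sigma
z_low = (low - heights.mean) / heights.stdev
z_high = (high - heights.mean) / heights.stdev

# Area under the normal curve between the two heights
p_between = heights.cdf(high) - heights.cdf(low)

print(f"z scores: {z_low:.4f} and {z_high:.4f}")
print(f"P({low} < X < {high}) = {p_between:.4f}")
```

The result should agree with the Table A answer to about two decimal places; any small difference comes from rounding z to table precision.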
Example 2.
At what percentile for height is a man who is 5 feet, 4 and a half inches?

Method 1: His z score is –1.5 since he is 1.5 s.d.'s below the mean. If you can't do this in your head, use the formula z = (x − μ)/σ = (64.5 − 69)/3 = –1.5.
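To turn the z score of Example 2 into a percentile, one option is the short Python sketch below. It assumes the standard-library `statistics.NormalDist` class (not something this handout uses), and the variable names are illustrative only.

```python
from statistics import NormalDist

heights = NormalDist(mu=69, sigma=3)  # N(69, 3), heights in inches

x = 64.5  # 5 feet, 4 and a half inches

z = (x - heights.mean) / heights.stdev  # z = (x - mu) / sigma
percentile = heights.cdf(x)             # area to the left of x

print(f"z = {z:.2f}")
print(f"Proportion of men shorter than {x} inches: {percentile:.4f}")
```

The area to the left is what Table A lists for z = –1.50 (.0668), so this man is at roughly the 7th percentile.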
Example 3.
How tall must a man be to be at the 90th percentile for height?

Method 1: Look in the body of Table A for an entry that is close to 90%. We find it (very closely) for z = 1.28. Use the equation z = (x − μ)/σ to solve for x, the man's height. I will omit the algebra, but please do this yourself. Answer: 72.84 inches.
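The "work backward from an area" step in Example 3 can also be sketched in Python. This assumes the standard-library `statistics.NormalDist` class and its `inv_cdf` method; the variable names are hypothetical.

```python
from statistics import NormalDist

mu, sigma = 69, 3  # N(69, 3), heights in inches

# z value with 90% of the area to its left (standard normal curve)
z_90 = NormalDist().inv_cdf(0.90)

# Solve z = (x - mu) / sigma for x
height_90 = mu + z_90 * sigma

print(f"z for the 90th percentile: {z_90:.4f}")
print(f"Height at the 90th percentile: {height_90:.2f} inches")
```

Because the software does not round z to 1.28 first, the last digit may differ slightly from an answer built from Table A.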
The central limit theorem (CLT) says that the sampling distribution of xbar will
approach a normal distribution, namely N(μ, σ/√n), if the sample size is large. Thus we can use the z tables for many types of problems that
seemingly have nothing to do with normally distributed data, as long as the
sample size is large enough.
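A small simulation can make the CLT statement concrete. The sketch below is only an illustration under assumptions that are not in this handout: the population is exponential with mean 1 and s.d. 1 (strongly skewed, definitely not normal), the sample size is 50, and the seed is arbitrary.

```python
import random
from statistics import mean, stdev

random.seed(1)  # arbitrary seed so the run is repeatable

n = 50              # sample size (assumed for illustration)
num_samples = 2000  # number of samples drawn

# Each entry is xbar for one sample from an exponential population
# with mean 1 and s.d. 1.
sample_means = [
    mean(random.expovariate(1.0) for _ in range(n))
    for _ in range(num_samples)
]

predicted_sd = 1.0 / n ** 0.5  # sigma / sqrt(n)
observed_mean = mean(sample_means)
observed_sd = stdev(sample_means)

print(f"Mean of the xbar values: {observed_mean:.3f} (CLT predicts 1)")
print(f"S.d. of the xbar values: {observed_sd:.3f} (CLT predicts {predicted_sd:.3f})")
```

A histogram of `sample_means` would look roughly bell-shaped even though the individual data values are not.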
BINOMIAL DISTRIBUTION
A binomial distribution is very different from
a normal distribution, and yet if the sample size is large enough, the shapes
will be quite similar.
The key difference is that a binomial distribution is discrete, not continuous.
In other words, it is NOT always possible to find a data value between two data
values: nothing lies between neighboring counts such as 4 and 5.
The requirements for a binomial distribution are as follows:
1) The r.v. of interest is the count of successes in n trials
2) The number of trials (or sample size), n, is fixed
3) Trials are independent, with fixed value p = P(success on a trial)
4) There are only two possible outcomes on each trial, called
"success" and "failure." (This is where the "bi"
prefix in "binomial" comes from. If there were several possible
outcomes, we would need to use a multinomial distribution to account for them,
but we don't study multinomial distributions in the beginning AP Statistics
course.)
Consider X = number of sixes when a
fair die is rolled 31 times.
Is X a binomial r.v.? Let us check...
1) X counts the number of successes
(sixes) in 31 trials. CHECK!
2) The sample size (31) is fixed. CHECK!
3) Trials are independent, with p = P(six) = 1/6, a
fixed value. CHECK!
4) There are only two possible outcomes on each trial. Either we get a six
(success), or we fail to get a six (failure). We say
p = 1/6 and q = 5/6. CHECK!
Since X is binomial, we say X follows the B(31, 1/6) distribution. Do you
see why X is discrete? X could equal 4, or 5, or 6, for
example, but there is no way that X
could ever equal 4.25 or 4.37. (Note, however, that the mean and s.d. of X could
have messy decimal values.)
You can find the relative frequency distribution for X by making a histogram as follows:
For the X = 0 bin, graph a bar of
height binompdf(31,1/6,0).
For the X = 1 bin, graph a bar of
height binompdf(31,1/6,1).
For the X = 2 bin, graph a bar of
height binompdf(31,1/6,2).
For the X = 3 bin, graph a bar of
height binompdf(31,1/6,3).
For the X = 4 bin, graph a bar of
height binompdf(31,1/6,4).
For the X = 5 bin, graph a bar of
height binompdf(31,1/6,5).
[And so on.] You really should do this at least once in your life. Each year, I
give a HW exercise to do something similar to this, though with a smaller n.
The fast way to get the histogram, and please do this now, is to punch in the
following keystrokes (note that seq means 2nd LIST
OPS 5):
seq(X,X,0,31,1)→L1
seq(binompdf(31,1/6,X),X,0,31,1)→L2
At this point, you can use STAT EDIT to read off the various probabilities. For
example, the probability of getting 0 sixes in 31 rolls is .00351. The
probability of getting 1 six in 31 rolls is .02177. The probability of getting
2 sixes in 31 rolls is .0653. I hope you are checking these numbers to make
sure they are correct.
Are you?
The shorthand notation we use when making a writeup
for other people to read is as follows:
P(X=0) = .00351
P(X=1) = .02177
P(X=2) = .0653
[and so on].
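If you want to check these list values away from the calculator, the two seq commands can be mimicked in Python. This is a rough sketch, not part of the handout: the function name `binompdf` is borrowed from the calculator for readability, and it is built on the standard-library `math.comb`.

```python
from math import comb

def binompdf(n, p, k):
    """P(X = k) for X following B(n, p), by the binomial formula."""
    return comb(n, k) * p**k * (1 - p)**(n - k)

n, p = 31, 1/6

L1 = list(range(0, n + 1))            # like seq(X,X,0,31,1)
L2 = [binompdf(n, p, x) for x in L1]  # like seq(binompdf(31,1/6,X),X,0,31,1)

# Print the first few rows, as you would read them off in STAT EDIT.
for x in L1[:6]:
    print(f"P(X={x}) = {L2[x]:.5f}")
```

The probabilities in `L2` should add up to 1, which is a handy sanity check on any probability list.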
Now enter the following keystrokes:
2nd STATPLOT 4 ENTER (same as PlotsOff)
2nd STATPLOT 1 On
Highlight the "histogram" (third icon), set Xlist
to L1, Freq to L2.
WINDOW Xmin=0, Xmax=31, Xscl=1, Ymin=0, Ymax=.3, Yscl=1, Xres=1
GRAPH
You should see a binomial distribution. It is "stairsteppy"—not smooth like a normal curve. And yet,
the shape is quite similar to the familiar normal shape. For large values of n, a binomial distribution is so close
to normal that we can use the z
(normal) curve as an approximation.
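For a rough look at the same "stairsteppy" shape without a calculator, here is a text-only Python sketch. The scale of one # per .005 of probability is an arbitrary choice made for this illustration.

```python
from math import comb

n, p = 31, 1/6

# P(X = x) for x = 0, 1, ..., 31
probs = [comb(n, x) * p**x * (1 - p)**(n - x) for x in range(n + 1)]

# One '#' per .005 of probability (arbitrary scale for this sketch).
bars = ["#" * round(pr / 0.005) for pr in probs]

# Bars beyond X = 15 would be empty at this scale, so stop there.
for x in range(16):
    print(f"{x:2d} | {bars[x]}")
```

Turn your head sideways and the bars rise to a peak near X = 5 and then trail off to the right, much like the calculator's histogram.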
Our rules of thumb for knowing when the normal approximation to the binomial is
valid are as follows:
np must be at least 10, AND
nq must be at least 10.
In our example, nq = 31(5/6) ≈ 25.8 is certainly big enough, but np = 31(1/6) ≈ 5.2 is not. Therefore, the normal approximation to the
binomial will not be very accurate in our example.
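The rule of thumb is easy to express as a tiny check. The helper name `normal_approx_ok` below is hypothetical, and the cutoff of 10 is the one stated in this handout.

```python
def normal_approx_ok(n, p, threshold=10):
    """Return True when both np and nq are at least the threshold."""
    q = 1 - p
    return n * p >= threshold and n * q >= threshold

# The dice example: np is only about 5.2, so the check fails.
print(normal_approx_ok(31, 1/6))

# A sample of 100 with p = .15: np = 15 and nq = 85, so the check passes.
print(normal_approx_ok(100, 0.15))
```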
To find the mean and s.d. of X, you can punch
STAT CALC 1 L1,L2 ENTER
The mean is 5.167, and the s.d. is 2.075. Note that you could also have found these by using the formula E(X) = μ_X = np = 31(1/6) = 5.167 for mean, and the formula σ = √(npq) = √((31)(1/6)(5/6)) = 2.075 for standard deviation. When finding these on a free-response problem, you should show those formulas and then do the STAT CALC 1 L1,L2 as a double-check if time permits.
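Both routes to the mean and s.d. (from the probability list and from the formulas) can be compared in a few lines of Python. This is a sketch with made-up variable names, not the calculator's own 1-Var Stats routine.

```python
from math import comb, sqrt

n, p = 31, 1/6
q = 1 - p

xs = range(n + 1)
probs = [comb(n, x) * p**x * q**(n - x) for x in xs]

# Route 1: weight each value of X by its probability (what the lists give).
mean_from_list = sum(x * pr for x, pr in zip(xs, probs))
var_from_list = sum((x - mean_from_list) ** 2 * pr for x, pr in zip(xs, probs))
sd_from_list = sqrt(var_from_list)

# Route 2: the binomial formulas np and sqrt(npq).
mean_formula = n * p
sd_formula = sqrt(n * p * q)

print(f"From the list:     mean = {mean_from_list:.3f}, s.d. = {sd_from_list:.3f}")
print(f"From the formulas: mean = {mean_formula:.3f}, s.d. = {sd_formula:.3f}")
```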
Does it make sense that the expected value (a.k.a. mean) of X is 5.167? I think so, since in 31
rolls we would expect a little more than 5 to be sixes.
Does it make sense for the s.d. to be about 2? Yes;
since the shape is roughly normal, we can see from the histogram that most of
the time (at least 2/3 of the time), we get an answer of 5 plus or minus 2
(i.e., 3, 4, 5, 6, or 7). Note that you could not use this "empirical
rule" if the shape were distinctly non-normal.
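The "at least 2/3 of the time" claim can be checked directly by adding up the probabilities for 3 through 7 sixes. A minimal Python sketch, with illustrative names:

```python
from math import comb

n, p = 31, 1/6

def pmf(k):
    """P(X = k) for X following B(31, 1/6)."""
    return comb(n, k) * p**k * (1 - p)**(n - k)

# P(3 <= X <= 7), i.e., within about one s.d. of the mean
p_within = sum(pmf(k) for k in range(3, 8))

print(f"P(3 <= X <= 7) = {p_within:.4f}")
print(p_within > 2/3)
```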
Here are some more example problems.
Example 4.
In 31 rolls, what is the probability of getting no sixes?

Solution: P(X=0) = q^31 = .00351.
Example 5.
In 31 rolls, what is the probability of getting at least one six?

Solution: P(X ≥ 1) = 1 – P(X < 1) = 1 – P(X = 0) = 1 – .00351 = .9965.
Example 6.
In 31 rolls, what is the probability of getting at least 5 sixes?

Solution: P(X ≥ 5) = 1 – P(X < 5) = 1 – P(X ≤ 4) = 1 – .39355 by calc. = .606. [Note: The value .39355 for P(X ≤ 4) is obtained by punching binomcdf(31,1/6,4), but you cannot write binomcdf on your paper.]
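The "at least" problems in Examples 5 and 6 both use the complement rule, which the following Python sketch mirrors. The function names are borrowed from the calculator for readability; the implementations are hand-rolled from `math.comb` and are not the calculator's own code.

```python
from math import comb

def binompdf(n, p, k):
    """P(X = k) for X following B(n, p)."""
    return comb(n, k) * p**k * (1 - p)**(n - k)

def binomcdf(n, p, k):
    """P(X <= k): add up the probabilities for 0, 1, ..., k."""
    return sum(binompdf(n, p, i) for i in range(k + 1))

n, p = 31, 1/6

# Example 5: P(X >= 1) = 1 - P(X = 0)
p_at_least_1 = 1 - binomcdf(n, p, 0)

# Example 6: P(X >= 5) = 1 - P(X <= 4)
p_at_least_5 = 1 - binomcdf(n, p, 4)

print(f"P(X >= 1) = {p_at_least_1:.4f}")
print(f"P(X >= 5) = {p_at_least_5:.4f}")
```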
Example 7.
In 31 rolls, what is the probability of getting exactly 2, 3, or 4 sixes?

Solution: P(X = 2, 3, or 4) = P(X=2) + P(X=3) + P(X=4) = .065297... + .12624... + .1767... by calc. = .368. [Be sure to round only at the very end. Dots signify additional accuracy beyond the accuracy shown on paper. Answers were obtained by binompdf(31,1/6,2), binompdf(31,1/6,3), and binompdf(31,1/6,4), but you cannot show that.]
Example 8.
In 31 rolls, what is the probability of getting more than 3 sixes but fewer than 10 sixes?

Solution: P(3 < X < 10) = P(X ≤ 9) – P(X ≤ 3) = .97515... – .21681... by calc. = .758. [Again we used binomcdf to find the intermediate answers, but we cannot write binomcdf.]
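Examples 7 and 8 are both "range" problems. Here is a hedged Python sketch of the two approaches used above (adding individual probabilities, and subtracting two cumulative probabilities). The helper names are illustrative.

```python
from math import comb

n, p = 31, 1/6

def pmf(k):
    """P(X = k) for X following B(31, 1/6)."""
    return comb(n, k) * p**k * (1 - p)**(n - k)

def cdf(k):
    """P(X <= k)."""
    return sum(pmf(i) for i in range(k + 1))

# Example 7: P(X = 2) + P(X = 3) + P(X = 4)
p_2_to_4 = pmf(2) + pmf(3) + pmf(4)

# Example 8: P(3 < X < 10) = P(X <= 9) - P(X <= 3)
p_4_to_9 = cdf(9) - cdf(3)

print(f"P(X = 2, 3, or 4) = {p_2_to_4:.3f}")
print(f"P(3 < X < 10)     = {p_4_to_9:.3f}")
```

Notice that the subtraction uses P(X ≤ 3), not P(X ≤ 4), because "more than 3" means that X = 4 must stay in the answer.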
Example 9.
In 31 rolls, what is the most likely number of sixes?

Solution: Look at lists L1 and L2. The greatest probability value is .19088, and that occurs when X = 5. Answer: 5.
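Scanning the list for its largest entry, as in Example 9, takes one line in Python. The sketch below rebuilds the probability list first so that it runs on its own; the names are made up.

```python
from math import comb

n, p = 31, 1/6
probs = [comb(n, x) * p**x * (1 - p)**(n - x) for x in range(n + 1)]

# Index of the largest probability = most likely number of sixes
most_likely = max(range(n + 1), key=lambda x: probs[x])

print(f"Most likely number of sixes: {most_likely}")
print(f"Its probability: {probs[most_likely]:.5f}")
```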
Example 10.
In 3.5 million rolls of a fair die, what is the probability of getting somewhere between 583,000 and 584,000 sixes, inclusive?

Solution: Here the sample size is so huge that (depending on the model of calculator you are using) you may choke it if you try to enter binomcdf(3500000,1/6,584000) – binomcdf(3500000,1/6,582999). Clearly, the normal approximation to the binomial is a much better method.
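One way the normal approximation in Example 10 might be carried out is sketched below in Python. Two things here are assumptions rather than part of this handout: the use of the standard-library `statistics.NormalDist` class, and the continuity correction (widening the interval by .5 on each side because X is a whole-number count).

```python
from math import sqrt
from statistics import NormalDist

n, p = 3_500_000, 1/6
q = 1 - p

mu = n * p               # mean of X, np
sigma = sqrt(n * p * q)  # s.d. of X, sqrt(npq)

approx = NormalDist(mu, sigma)

# Continuity correction (an assumption of this sketch): treat the
# counts 583,000 through 584,000 as the interval 582,999.5 to 584,000.5.
p_between = approx.cdf(584_000.5) - approx.cdf(582_999.5)

print(f"mean = {mu:.2f}, s.d. = {sigma:.2f}")
print(f"Approximate P(583000 <= X <= 584000) = {p_between:.4f}")
```

With n this large, leaving out the continuity correction changes the answer only slightly, so either version is a reasonable approximation.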
Example 11.
Suppose that 15% of the people in a city are slippery. Explain why the count of slippery people in an SRS of 100 people from this city is not binomial.

Solution: An SRS is sampling WITHOUT replacement, i.e., not independent trials. We must have independent trials for the count X of slippery people to be a binomial r.v. However, if the city is "large" (by which we mean that the population is at least 10 times the sample), the distinction between SRS and independent trials can be ignored. By this rule of thumb, we could use binomial methods if the city had at least 1000 people. [If the city has at least 1000 people, note that since np = 100(.15) = 15 > 10, and nq = 100(.85) = 85 > 10, we could also use the normal approximation to the binomial if we so desired.]
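To see how little the "without replacement" issue matters at the 10-times cutoff, the sketch below compares one binomial probability with the matching exact probability for sampling without replacement (the hypergeometric distribution, which is not covered in this handout). The city of exactly 1000 people with 150 slippery residents is a hypothetical example chosen to sit right at the cutoff.

```python
from math import comb

population = 1000  # hypothetical city size, exactly 10 times the sample
slippery = 150     # 15% of the hypothetical population
sample = 100
k = 15             # count of slippery people whose probability we compare

# Rule of thumb from the text: population at least 10 times the sample
meets_rule = population >= 10 * sample

# Binomial model: treats the 100 draws as independent trials with p = .15
p = slippery / population
binomial_prob = comb(sample, k) * p**k * (1 - p)**(sample - k)

# Exact model for an SRS (no replacement): hypergeometric probability
exact_prob = (
    comb(slippery, k) * comb(population - slippery, sample - k)
    / comb(population, sample)
)

print(f"Rule of thumb satisfied: {meets_rule}")
print(f"Binomial P(X = {k}):  {binomial_prob:.4f}")
print(f"Exact SRS P(X = {k}): {exact_prob:.4f}")
```

The two probabilities should be close but not identical, and the gap shrinks as the city gets larger relative to the sample.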
SUMMARY
Normal distributions are continuous and have a special bell shape.
Binomial distributions are discrete ("stairsteppy"); they are close to normal only if the sample size satisfies np ≥ 10 and nq ≥ 10.
Normal distributions arise in three general areas:
1) Natural processes where the data value (e.g., height) is the result of many
small random inputs.
2) Sampling distribution of xbar, where either the underlying distribution is normal or
(more commonly) where the sample size is large enough for the CLT to take
effect. Rules of thumb are on p.606 of textbook.
3) Repeated measurement of a fixed phenomenon (e.g., the orbital period of
Mars, the mass of a moon rock, or the height of a mountain). Most phenomena
cannot be measured precisely—even if we have an accurate pan balance or laser
range finder or whatever, there will always be some uncertainty or error in our
measurement. For this reason, the normal distribution is sometimes called the
"error function." However, #3 is really just a special case of #1.
Binomial distributions arise whenever the r.v.
of interest is the count of successes in a fixed number (n) of independent trials. The four rules are listed near the
beginning of the “binomial distribution” section, before the second set of
example problems.