AP Statistics / Mr. Hansen
Name: _________________________
The Statistics of GPS Signal Detection
Supplement to Dr. Sullivan’s Presentation
One extremely useful fact
that Dr. Sullivan stated quickly was that noise is Gaussian. In other words, noise is normally distributed. For the values of Boltzmann’s constant and
the parameters (temperature and bandwidth) relevant to the problem, noise in nanovolts follows the N(0, 600) distribution, where 600 is the s.d. in nV. I think
that if he had stated this explicitly, much of the confusion would have been avoided.
Strip away all the engineering words for a moment. You do know how to sketch a normal distribution that is centered at 0
with s.d. 600, don’t you? That is the basis of the first
method of signal detection that Dr. Sullivan discussed. You have N(0, 600), and
after adding in a 70 nV GPS signal, you have N(70, 600). Clearly, there is a great
deal of overlap, making identification difficult. For example, what does 66 nV signify? It could easily have
come from either distribution, since 66 is in the “fat part” of both
distributions.
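If you would like to see just how severe the overlap is, here is a quick check in Python. (The SciPy library and its norm functions are my addition for illustration; Dr. Sullivan used none of this.)

    from scipy.stats import norm

    # Density of a single 66 nV reading under each hypothesis
    f_noise  = norm.pdf(66, loc=0,  scale=600)   # noise only: N(0, 600)
    f_signal = norm.pdf(66, loc=70, scale=600)   # noise + signal: N(70, 600)
    print(f_noise, f_signal)   # both approx. 0.00066; nearly identical

The two densities agree to about two significant figures, which is why a single reading tells you almost nothing.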
However, with a sample size of n = 4092, the s.d. becomes much more manageable. Using the formula we learned last week, σ_x̄ = σ/√n = 600/√4092 ≈ 9.380 nV.
(Note: This is the s.d. of the sampling
distribution of the sample mean, not of the signal itself.)
Dr. Sullivan drew two normal sampling distributions on the board, with a
vertical line at 34.89 separating a region marked “NOT IN SYNC” on the left and
“SYNC” on the right. What this meant was that only .0001 of the area of the
left curve would be to the right of 34.89. [Calculator keystrokes: invNorm(.9999,0,600/√(4092)).] He assumed you understood this, since nobody
stopped him. In fact, this is well within the range of what you should have
been capable of understanding, although he probably did cover it too fast for you to keep up.
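If you would rather verify the 34.89 nV cutoff on a computer than on the TI, here is a sketch in the same Python/SciPy style, where norm.ppf plays the role of invNorm:

    from math import sqrt
    from scipy.stats import norm

    n = 4092
    sigma_xbar = 600 / sqrt(n)     # s.d. of the sampling distribution
    print(sigma_xbar)              # approx. 9.380 nV

    # Cutoff leaving only .0001 of the noise-only curve to its right,
    # i.e., invNorm(.9999,0,600/sqrt(4092))
    cutoff = norm.ppf(0.9999, loc=0, scale=sigma_xbar)
    print(cutoff)                  # approx. 34.89 nV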
I went to Dr. Sullivan at the end to obtain his figures for α and β under the second method. Although his figures differed from mine, I
think the values below are correct. Here are the loose ends that were unclear
or omitted because of time constraints:
Second method: Replace the A/D converter with a signum fcn. [note: technically (signum + 1)/2] that returns 0 if voltage is < 0, 1 if voltage is ≥ 0. For 4092 samples (1023 per ms, running for 4 ms) of pure noise, the binomial r.v. X = (sum of the 0’s and 1’s, i.e., the tally of 1’s) satisfies μ_X = np = 4092(.5) = 2046 and σ_X = √(np(1 − p)) = √(4092(.5)(.5)) ≈ 31.984. [Both formulas are from the formula sheet near the end of your book.]
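As a check on the arithmetic, here are the same two textbook formulas in the Python sketch (again my addition, not part of the presentation):

    from math import sqrt

    n, p = 4092, 0.5                   # pure noise: each sample is 1 with prob. .5
    mu_X    = n * p                    # 2046
    sigma_X = sqrt(n * p * (1 - p))    # approx. 31.984
    print(mu_X, sigma_X)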
If the 70 nV GPS signal is
added to the noise, then the output is no longer centered at 0 nV but rather at 70 nV, with the
same s.d. of 600 nV. Now P(voltage > 0)
is no longer .5 but rather normalcdf(0,99999,70,600) ≈ .5464, as calculated by Paul J. [Thank you, Paul. I
was starting to worry that nobody had remembered anything from the first
quarter.]
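Paul’s figure is easy to reproduce. In the Python sketch, norm.sf (the upper-tail area) plays the role of normalcdf(0,99999,70,600):

    from scipy.stats import norm

    # P(voltage > 0) when the 70 nV signal shifts the noise to N(70, 600)
    p_signal = norm.sf(0, loc=70, scale=600)
    print(p_signal)    # approx. .5464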
We could construct a new binomial r.v.
Y, defined as the tally of 0’s and
1’s for 4092 samples, except this time with the 70 nV
GPS signal added in. [The difference is that X assumed nothing but noise. Y
assumes the presence of a weak signal on top of the noise.] We get μ_Y = np = 4092(.5464) ≈ 2236 and σ_Y = √(np(1 − p)) = √(4092(.5464)(.4536)) ≈ 31.846.
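These are the same formulas as before, now with p = .5464. A quick check in the Python sketch:

    from math import sqrt

    n, p = 4092, 0.5464                # signal present: P(sample = 1) = .5464
    mu_Y    = n * p                    # approx. 2236
    sigma_Y = sqrt(n * p * (1 - p))    # approx. 31.846
    print(mu_Y, sigma_Y)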
Since 4092 is a large number, the binomial distribs. of X and Y are approx. normal. The cutoff value
for X such that 99.99% of the area
lies to the left is 2165. [Calculator keystrokes: invNorm(.9999,2046,31.984).]
In other words, if there is nothing but noise, the probability of obtaining X ≥ 2165 through chance alone is less than .0001. Therefore, if we obtain X ≥ 2165, we can reasonably conclude that a GPS signal is present. The number .0001 is what Dr. Sullivan called the α level of the test, or what we would call P(Type I error),
the “false positive” probability. The probability of concluding that a GPS
signal is present, given that nothing but noise is really there, is .0001.
On the other hand, the mere fact that the tally from the signum fcn. is below 2165 does not
guarantee that no signal is present. Since Y
is approx. N(2236, 31.846), it is
possible through chance alone to obtain Y
< 2165 even if a GPS signal is present. That probability, what Dr. Sullivan
called the β level of the test, is
approx. normalcdf(–99999,2165,2236,31.846) = .0129. We would call this the
“false negative” probability, or P(Type II error).
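The matching β computation, with norm.cdf playing the role of normalcdf(–99999,2165,2236,31.846):

    from scipy.stats import norm

    # beta = P(Y < 2165 | signal present), normal approximation
    beta = norm.cdf(2165, loc=2236, scale=31.846)
    print(beta)    # approx. .0129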
In testing situations, we often face tradeoffs between Type I and Type II error
probabilities. Here, the penalty associated with a Type II error is minor,
since if the GPS receiver’s processor thinks no signal is present, it will
probably continue acquiring for a few more minutes before giving up altogether.
On the other hand, the penalty associated with Type I error is significant,
since it could lead the receiver to process bogus data after erroneously
concluding that a GPS signal was present. (One could imagine a jetliner using
GPS for navigation being sent horribly off course as a result.) For this reason,
it is important to set α extremely low, perhaps even lower than .0001. There is no free lunch, though, since lowering α causes β to increase.
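You can watch the tradeoff happen numerically by pushing α down and recomputing β. (The α values below other than .0001 are my own choices for illustration.)

    from scipy.stats import norm

    # As alpha is pushed lower, the cutoff moves right and beta grows
    for alpha in (1e-4, 1e-5, 1e-6):
        cutoff = norm.ppf(1 - alpha, loc=2046, scale=31.984)
        beta   = norm.cdf(cutoff, loc=2236, scale=31.846)
        print(alpha, round(cutoff), round(beta, 4))
    # alpha = 1e-4 -> beta approx. .013
    # alpha = 1e-5 -> beta approx. .046
    # alpha = 1e-6 -> beta approx. .117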
After all the students had left the room, Dr. Sullivan said he thought someone
would ask why we don’t simply increase n
as a way of reducing both α and β. In most AP-type problems, that is a standard strategy. In fact, if
someone asks, “How can the probability of Type I and Type II error both be reduced, without tradeoff?” the stock answer is,
“Increase the sample size.” However, in the case of GPS, taking more than about
4 ms of samples introduces additional problems with time slewing. Remember, the
satellites are constantly moving, which means that the signal pulses will be
Doppler-shifted depending on the motion of each satellite relative to the
ground-based GPS user. In plain English, that means that the blips will be
slightly closer together or farther apart as time passes, and the difference
from the start of the sample to the end of the sample can be large enough to
cause real problems.
The second method is cheaper (a signum fcn. costs less than a full A/D converter), but it comes with a performance penalty. Here are the comparative figures:
Method 1 (comparing two sampling distributions of x̄, the mean voltage level): α = .0001, β = .00009.
Method 2 (comparing two binomial distributions): α = .0001, β = .0129.
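Both β figures can be reproduced from the numbers above: Method 1 uses the 34.89 nV cutoff against the N(70, 9.380) sampling distribution, and Method 2 uses the 2165 cutoff against N(2236, 31.846).

    from math import sqrt
    from scipy.stats import norm

    # Method 1: sampling distribution of x-bar when the signal is present
    beta1 = norm.cdf(34.89, loc=70, scale=600 / sqrt(4092))
    # Method 2: normal approximation to the binomial Y
    beta2 = norm.cdf(2165, loc=2236, scale=31.846)
    print(beta1, beta2)    # approx. .00009 and .0129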
Bottom line: Method 2 is less expensive on the hardware side but will lead to
many more false negatives, i.e., failure to identify a GPS signal when one is
present. As a result, systems designed using the second method will take
longer, on average, to achieve GPS signal lock.