AP Statistics / Mr. Hansen
Group Project #1A: Exploratory Data Analysis of Your Own Data
Deadlines: Draft proposal is due, in writing, by start of class
Wednesday,
Ground Rules: At least 50 data points, with at least two variables for
each, are required. That means 50 net, after throwing out unusable data, so
collecting 60 is recommended. Although you may wish to focus on categorical data, at least
one of your variables must be quantitative. You may choose to look for
relationships (or lack thereof) among the variables, or you may compare the
distributions of a single variable across multiple populations. In general,
more data and more variables will make your project more interesting and will give
you a wider range of patterns you can unearth in your data. No animal
experimentation, no illegal or unethical activities, and no direct involvement
with human subjects (unless they give you a signed consent form) are permitted.
For example, you may not spray Form III students as they come out of class and
see if there is a linear correlation between their height and the number of
meters that they chase you.
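As a rough illustration of the 50-net rule and of a check for linear correlation, here is a minimal Python sketch. The function names (clean_pairs, pearson_r, meets_ground_rule) and the definition of "unusable" (a blank or non-numeric value) are assumptions made for this example only; your group may have other reasons to discard a data point.

```python
import math

MIN_VALID = 50  # ground rule: at least 50 usable data points, net


def clean_pairs(rows):
    """Keep only (x, y) rows in which both values are present and numeric.

    ASSUMPTION: "unusable" means blank, None, non-numeric, or NaN.
    """
    valid = []
    for x, y in rows:
        try:
            fx, fy = float(x), float(y)
        except (TypeError, ValueError):
            continue  # blank cell, "n/a", None, etc.
        if math.isnan(fx) or math.isnan(fy):
            continue
        valid.append((fx, fy))
    return valid


def meets_ground_rule(rows):
    """True if at least MIN_VALID rows survive cleaning."""
    return len(clean_pairs(rows)) >= MIN_VALID


def pearson_r(pairs):
    """Linear correlation coefficient r for a list of (x, y) pairs.

    Raises ZeroDivisionError if either variable has no variation.
    """
    n = len(pairs)
    mean_x = sum(x for x, _ in pairs) / n
    mean_y = sum(y for _, y in pairs) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in pairs)
    sxx = sum((x - mean_x) ** 2 for x, _ in pairs)
    syy = sum((y - mean_y) ** 2 for _, y in pairs)
    return sxy / math.sqrt(sxx * syy)
```

Running meets_ground_rule on your raw rows before you start the analysis tells you whether you still need to collect more data.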
Warning: Your milestones should strike a balance between being aggressive (so
that your group stays working) and realistic (so that you don’t overstress
yourselves). Choose a topic that is feasible within your time constraint, and
propose no more than what you can deliver.
Sample Proposal With Milestone Chart:
"We propose to
interview 60 randomly selected STA students to see if there is a strong linear
correlation between shoe size and height. After allowing for difficulty in
locating people and having to discard some data points for clerical errors, we
anticipate having at least 50 valid ordered pairs. We will obtain subjects’
consent to be polled for this information and will keep the consent forms on
file for Mr. Hansen’s inspection. We will analyze shoe size and height
separately for skewness, normality, gaps, outliers,
etc., and we will describe the clusters and/or general trends that appear. We will
compare our overall results with the results of a Web search for the underlying
mechanism (e.g., there may be a linear correlation for people in certain age
brackets but other types of relationships for the population as a whole). We
will also state which variable is explanatory and which is response, and we
will note any likely lurking variables that we encounter."
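The one-variable checks that a proposal like this promises (skewness, outliers) could be sketched in Python as shown here. This is only one possible approach, under stated assumptions: quartiles are taken as the medians of the lower and upper halves of the sorted data, outliers are flagged with the common 1.5 x IQR rule, and the skewness hint simply compares the mean with the median. The function names are hypothetical.

```python
import statistics


def five_number_summary(values):
    """Return (min, Q1, median, Q3, max). Needs at least two values.

    ASSUMPTION: Q1 and Q3 are the medians of the lower and upper halves,
    leaving out the middle value when the count is odd.
    """
    data = sorted(values)
    n = len(data)
    half = n // 2
    lower = data[:half]
    upper = data[half + n % 2:]
    return (data[0], statistics.median(lower), statistics.median(data),
            statistics.median(upper), data[-1])


def iqr_outliers(values):
    """Values lying more than 1.5 * IQR below Q1 or above Q3."""
    _, q1, _, q3, _ = five_number_summary(values)
    iqr = q3 - q1
    low_fence = q1 - 1.5 * iqr
    high_fence = q3 + 1.5 * iqr
    return [v for v in values if v < low_fence or v > high_fence]


def skew_hint(values):
    """Crude direction-of-skew hint based on mean versus median."""
    mean = statistics.mean(values)
    median = statistics.median(values)
    if mean > median:
        return "possibly skewed right"
    if mean < median:
        return "possibly skewed left"
    return "roughly symmetric"
```

A hint like this is no substitute for looking at a histogram or boxplot; it only suggests where to look.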
F 9/27: Proposal and list of milestones finalized
M 9/30: Meeting after school to decide who will purchase supplies, who will print fliers, etc.
W 10/2: Comments back from Mr. Hansen / final logistic plans made after school
M 10/7: Preliminary data gathering complete
W 10/9: Raw data spreadsheet and first draft of report submitted to Mr. Hansen for review
T 10/15: Comments back from Mr. Hansen (note delay because of 4-day weekend)
W 10/16: Second draft circulated to all members for final markups
Th 10/17: Spell check, data cleanup, and final printout complete
F 10/18: Final project writeup submitted to Mr. Hansen
Warning: Use this only as an example. Please don’t parrot the
wording. In previous years, some groups lost points because they said they
would do all the things listed in the paragraph above, and then they ran out of
time and couldn’t complete them all. You may change the number of
milestones and their dates as you see fit. However, after your project
milestones are approved, any slippage of dates should be reported immediately
to your teacher. Do not appear suddenly on the due date and say, "We can’t
turn it in today. Could we have another week?" A week’s delay may be
reasonable if you have kept me informed of legitimate stumbling blocks, but it
is not reasonable as a last-minute surprise.
You may meet with me at any time for informal
feedback on the progress of your project. However, if you would like formal
markups (i.e., a thorough critique of what is strong and what is weak in your
project), you must allow me 48 hours (not counting weekends or school holidays)
for turnaround time, so build that into your schedule. For example, something
submitted at 5:00 p.m. Friday would be returned to your group leader by 5:00
p.m. the following Tuesday. If I take longer than that, the additional slippage
will not penalize you in any way.
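When you build the turnaround time into your schedule, the arithmetic can be sketched in Python as follows. This sketch reads "48 hours, not counting weekends or school holidays" as two school days at the same time of day, and it assumes the submission itself happens on a school day. The function name markup_due and the holidays parameter are hypothetical; supply your own holiday dates.

```python
from datetime import date, datetime, timedelta


def markup_due(submitted, holidays=frozenset(), school_days=2):
    """Return the date and time by which formal markups should come back.

    ASSUMPTION: 48 hours means two school days, so Saturdays, Sundays,
    and any dates listed in `holidays` are skipped entirely.
    """
    due = submitted
    remaining = school_days
    while remaining > 0:
        due += timedelta(days=1)
        # weekday() is 0 for Monday through 6 for Sunday
        if due.weekday() < 5 and due.date() not in holidays:
            remaining -= 1
    return due
```

For example, markup_due(datetime(2024, 9, 27, 17, 0)) starts from a Friday at 5:00 p.m. and returns the following Tuesday at 5:00 p.m., which matches the example above. Passing holidays={date(2024, 9, 30)} would push the result to Wednesday.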
Warning: It is extremely difficult to pin people down on schedules, so take
that into account when setting up your meetings and milestones. Group leaders
may want to institute a system of sanctions (e.g., 5 points for a missed
meeting, 15 points for defaulting on a required section of the writeup, or whatever).
1. Be sure to allow enough time to gather your data.
(Estimate the time you think it should take, and double it. Data
collection is time-consuming.)
2. If you gather data by interviewing subjects, be sure
to record names so that you can go back later if necessary to get more
information. You may wish to start with a pilot study of a few friends so that
you can get a feeling for what variables are important. Then, if you can guess
at some of the lurking variables (e.g., age, year in school), you can ask those
questions up front to save time. (Later in the course we will call this process
blocking or block design. The purpose is to reduce variation in
the pools of subjects so that the effect we are looking for becomes clearer.)
3. If you gather opinion data, proceed cautiously.
Wording is tricky, and questions must be posed exactly the same way (preferably
in writing) to all subjects.
4. Categorical variables are OK, but if you use them,
they will require different types of visual aids (e.g., two-way tables and 100%
bar charts in addition to scatterplots). Remember
that you also need to have at least one quantitative variable for this project.
5. If you use a survey, questionnaire, or tabulation
sheet, be sure to include a blank copy in your final report.
6. I will ask to see your raw data spreadsheet early, as
well as in your final report. Microsoft Excel format is recommended.
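For groups working with categorical variables, a two-way table and the row percentages behind a 100% bar chart can be tabulated with a few lines of Python. This is a minimal sketch, not a required method; the function names are hypothetical, and the input is assumed to be a list of (row category, column category) pairs, one pair per subject.

```python
from collections import Counter


def two_way_table(pairs):
    """Build a nested dict of counts: table[row_category][column_category]."""
    counts = Counter(pairs)
    rows = sorted({r for r, _ in pairs})
    cols = sorted({c for _, c in pairs})
    # Counter returns 0 for combinations that never occurred
    return {r: {c: counts[(r, c)] for c in cols} for r in rows}


def row_percentages(table):
    """Convert each row of counts to percentages that total 100.

    These are the segment heights of a 100% (segmented) bar chart.
    """
    result = {}
    for r, cells in table.items():
        total = sum(cells.values())
        result[r] = {c: 100 * n / total for c, n in cells.items()}
    return result
```

Comparing the percentage rows side by side shows whether the distribution of one categorical variable differs across the groups defined by the other.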
1. Length is not important. Clarity, interest, and
relevance are. I tend to give higher scores to shorter reports with meat,
as opposed to longer reports with a lot of fluff.
2.
3. Be sure to include a printout of your raw data. An
Excel table is the most convenient way to do this, but the choice is yours. IMPORTANT
NOTE: If you use human subjects, their raw data should be identified in the
printout only by subject number, not by name. The only time I will see their
names is when you show me the signed consent sheets.
4. If your diagram tells the story, cite it in the text
but let the picture do most of the talking. Assume that your reader is a Scientific
American or Smithsonian reader: intelligent, though not necessarily
an expert in statistics.
5. Number your figures (Fig. 1, Fig. 2, etc.) and use a
consistent citation style for any external sources. The library has a guideline
on "Electronic Footnote Citations" on the east wall, underneath the
lunar phase chart.
1. Keep your group working productively.
Assign tasks, or resolve disputes if two people want the same task. It’s OK to
be laid-back if you wish, but be prepared to step in and take charge if things
are bogging down.
2. You are responsible for submitting the proposal and
milestone chart. You are also the person ultimately responsible for the quality
of the final product. That doesn’t mean you have to write everything yourself,
but it does mean that you have to juggle other people’s schedules and make things
come together.
3. If people shirk their responsibilities, you may need
to use small sanctions (a few points here, a few points there) to encourage
them to do the right thing. Since 1998, only a few groups have had this
problem, so let’s hope we don’t run into it too often.
4. There are 300 points possible in a 3-person group: 30
for proposal, 30 for raw data, 10 for group leader
report (see below), 200 for final report, and 30 for timeliness/adherence to
milestone chart. (If you inform me early of any slippage, as you should,
penalties for missing milestones will usually be relatively light, but if I
decide your schedule has slipped excessively, you may lose more than just the
30 points reserved for timeliness.) The 200 final report points are subdivided
as follows: 50 for interest, 80 for technical accuracy, 35 for quality of
writing (including spelling and grammar), and 35 for format, style, and
neatness. Groups that have more or fewer than 3 members will have points
prorated accordingly.
5. As group leader you must prepare a short report (one
paragraph) justifying the point split you feel is correct for your group. Last
year many group leaders, though not all, opted for an even split. If your group
leader report is missing or inadequate, your personal score will be reduced by
10 points. The group leader report is not necessarily the final say, but in
most cases I will support any group leader’s decision provided it is based on the
work accomplished. (One year, a group tried to divert points from people who
already had a solid "A" average to help someone else raise his grade.
This is not permitted.) If your schedule slips excessively and you have not
taken steps as group leader to gain control of your members, you may lose more
points than they do.
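Group leaders who want to check their arithmetic before writing the point-split report could use a sketch like the following. The point values come from item 4 above. The proration rule is an assumption: the handout says only that points are "prorated accordingly," and this sketch reads that as 100 points per member. The names prorated_total and split_is_valid are hypothetical.

```python
# Point values for a 3-person group, taken from item 4 above
GROUP_POINTS = {
    "proposal": 30,
    "raw data": 30,
    "group leader report": 10,
    "final report": 200,
    "timeliness": 30,
}

# Subdivision of the 200 final report points
FINAL_REPORT_POINTS = {
    "interest": 50,
    "technical accuracy": 80,
    "quality of writing": 35,
    "format, style, and neatness": 35,
}


def prorated_total(members):
    """Total points available to a group of the given size.

    ASSUMPTION: proration is linear, i.e. 300 points for 3 members
    scales to 100 points per member.
    """
    return sum(GROUP_POINTS.values()) * members // 3


def split_is_valid(split):
    """True if a proposed split hands out exactly the prorated total.

    `split` maps each member's name to the points proposed for that member.
    """
    return sum(split.values()) == prorated_total(len(split))
```

An even split for a 3-person group, such as {"A": 100, "B": 100, "C": 100}, passes split_is_valid; so does any uneven split that still totals 300.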