AP Statistics / Mr. Hansen
Group Project #1: Exploratory Data Analysis
Deadlines: Proposal is due, in writing, by start of class Wednesday,
9/15/2004. In your proposal, you will describe what you intend to do. In your
milestone chart (due Friday, 9/17/2004), you will list several milestones with
projected dates of completion. All proposals are subject to approval by me. Try
to schedule your milestones so that you can turn in your final writeup on or before Friday, 10/1/2004. Longer projects may
be permitted if you can justify a need for more time in your milestone chart.
Ground Rules: At least 50 data points are required—and since that
means “net, after throwing out unusable data,” 60 are recommended. Although you
may wish to focus on categorical data, at least one of your variables must be
quantitative. You may choose to look for relationships (or lack thereof) in the
data, or you may compare the distributions of a single variable across multiple
populations. In general, more data and more variables will make your project
more interesting and will give you a wider range of patterns you can unearth in
your data. No animal experimentation, no illegal or unethical activities, and
no direct involvement with human subjects (unless they give you a signed
consent form) are permitted. For example, you may not spray Form III students
as they come out of class and see if there is a linear correlation between
their height and the number of meters that they chase you.
Warning: Your milestones should strike a balance between being aggressive (so
that your group stays working) and realistic (so that you don’t overstress
yourselves). Choose a topic that is feasible within your time constraint, and
propose no more than what you can deliver.
Sample Proposal With Milestone Chart:
We propose to
interview 60 randomly selected STA students to see if there is a strong linear
correlation between shoe size and height. After allowing for difficulty in
locating people and having to discard some data points for clerical errors, we
anticipate having at least 50 valid ordered pairs. We will obtain subjects’
consent to be polled for this information and will keep the consent forms on
file for Mr. Hansen’s inspection. We will analyze shoe size and height
separately for skewness, normality, gaps, outliers,
etc., and we will describe the clusters and/or general trends that appear. We
will compare our overall results with the results of a Web search for the
underlying mechanism (e.g., there may be a linear correlation for people in
certain age brackets but other types of relationships for the population as a
whole). We will also state which variable is explanatory and which is response,
and we will note any likely lurking variables that we encounter.
W 9/15: Proposal submitted in writing
F 9/17: Revised proposal and milestones submitted in writing
M 9/20: Meeting after school to decide who will purchase supplies, who will print fliers, etc.
T 9/21: Meeting with Mr. Hansen after class; final logistic plans made after school
Th 9/23: Preliminary data gathering complete
F 9/24: Raw data spreadsheet and first draft of report submitted to Mr. Hansen for review
T 9/28: Comments back from Mr. Hansen (we allow 2 full school days for turnaround)
W 9/29: Spell check, data cleanup, and final printout complete; project submitted to Mr. Hansen
Warning: Use this only as an example. Please feel free to make
extensive changes or to revamp the wording completely. In previous years, some
groups lost points because they said they would do all the things listed in the
paragraph above, and then they ran out of time and couldn’t complete them all.
You may change the number of milestones and the dates for milestones as you see
fit. However, after your project milestones are approved, any slippage of dates
should be reported immediately. Do not appear suddenly on the due date and say,
“We can’t turn it in today--could we have another week?” A week’s delay may be
reasonable if you have kept me informed of legitimate stumbling blocks, but it
is not reasonable as a last-minute surprise.
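For groups wondering what the analysis in the sample proposal might look like in practice, here is a minimal Python sketch. It computes the linear correlation coefficient r for paired data and flags possible outliers. The data values are made up purely for illustration, the function names are hypothetical, and the 1.5 × IQR outlier rule is an assumption (this handout does not prescribe a specific rule). Quartile conventions also differ slightly between software packages and calculators, so treat borderline cases with care.

```python
from statistics import mean, quantiles

def pearson_r(xs, ys):
    """Linear correlation coefficient r for paired data."""
    mx, my = mean(xs), mean(ys)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / (sxx * syy) ** 0.5

def iqr_outliers(values):
    """Values beyond 1.5 * IQR from the quartiles (an assumed rule of thumb)."""
    q1, _, q3 = quantiles(values, n=4)
    iqr = q3 - q1
    low, high = q1 - 1.5 * iqr, q3 + 1.5 * iqr
    return [v for v in values if v < low or v > high]

# Made-up illustrative data, NOT real measurements.
shoe_size = [8, 9, 10, 11, 12]
height_in = [64, 67, 68, 71, 72]

r = pearson_r(shoe_size, height_in)
print(f"r = {r:.3f}")
print(iqr_outliers([1, 2, 3, 4, 5, 6, 7, 8, 100]))
```

A real project would use at least 50 valid ordered pairs and would examine graphs (histograms, boxplots, scatterplots) as well as numerical summaries.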
You may meet with me at any time for informal
feedback on the progress of your project. However, if you would like formal
markups (i.e., a thorough critique of what is strong and what is weak in your
project), you must allow me 48 hours (not counting weekends or school holidays)
for turnaround time, so build that into your schedule. For example, something
submitted at 5:00 p.m. Friday would be returned to your group leader by 5:00
p.m. the following Tuesday. If I take longer than that, the additional slippage
will not penalize you in any way.
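To help with building the 48-hour turnaround into a milestone chart, the following sketch computes the latest expected return time for a formal markup. It assumes the submission happens on a school day and counts the 48 hours as two full school days, skipping Saturdays, Sundays, and any dates in a holiday set that you supply. The function name and the holiday date are hypothetical, chosen only for illustration.

```python
from datetime import date, datetime, timedelta

def markup_due(submitted, holidays=frozenset()):
    """Return the time two school days (48 hours) after `submitted`.

    Weekend days and dates listed in `holidays` do not count.
    Assumes `submitted` itself falls on a school day.
    """
    due = submitted
    school_days_left = 2
    while school_days_left > 0:
        due += timedelta(days=1)
        is_weekend = due.weekday() >= 5  # 5 = Saturday, 6 = Sunday
        if not is_weekend and due.date() not in holidays:
            school_days_left -= 1
    return due

friday = datetime(2004, 9, 24, 17, 0)  # 5:00 p.m. on a Friday
print(markup_due(friday))              # 2004-09-28 17:00:00 (Tuesday)

# Hypothetical holiday on Monday 9/27, for illustration only.
print(markup_due(friday, holidays={date(2004, 9, 27)}))
```

The first call reproduces the Friday-to-Tuesday example above; the second shows how a school holiday would push the return date back by one more day.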
Warning: Because it is hard to pin people down on schedules, take that into
account when setting up your meetings and milestones. Group leaders may want to
institute a system of sanctions (e.g., 5 points for a missed meeting, 15 points
for defaulting on a required section of the writeup,
or whatever).
1. Be sure to allow enough time to gather your data.
(Estimate the time you think it should take, and double it. Data
collection is time-consuming.)
2. If you gather data by interviewing subjects, be sure
to record names so that you can go back later if necessary to get more
information. You may wish to start with a pilot study of a few friends so that
you can get a feeling for what variables are important. Then, if you can guess
at some of the lurking variables (e.g., age, year in school), you can ask those
questions up front to save time. (Later in the course we will call this process
blocking or block design. The purpose is to reduce variation in
the pools of subjects so that the effect we are looking for becomes clearer.)
3. If you gather opinion data, proceed cautiously.
Wording is tricky, and questions must be posed exactly the same way (preferably
in writing) to all subjects.
4. Categorical variables are OK, but if you use them,
they will require different types of visual aids (e.g., two-way tables and 100%
bar charts, as opposed to histograms, boxplots, and scatterplots). Remember that you also need to have at least
one quantitative variable for this project.
5. If you use a survey, questionnaire, consent form, or
tabulation sheet, be sure to include these in your final report.
6. I will ask to see your raw data spreadsheet early, as
well as in your final report. Microsoft Excel format is recommended. Each
subject (or “data point” if you are not using human subjects) should be on a
separate row, and each variable should be in a separate column.
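The row-per-subject, column-per-variable layout can also be checked by a short program. The sketch below is one possible approach, not a required tool: it reads a small made-up table in CSV form (which Excel can export), discards rows with missing or non-numeric entries in the quantitative columns, and reports whether the net count meets the 50-point minimum. The column names, the data values, and the function name are all hypothetical.

```python
import csv
import io

MIN_POINTS = 50  # required net count after discarding unusable data

# Made-up sample: one subject per row, one variable per column.
RAW = """subject,shoe_size,height_in,grade
1,9.5,68,10
2,11,71,12
3,,66,9
4,10,sixty-nine,11
"""

def load_valid_rows(text, numeric_columns):
    """Split rows into usable ones and the subject numbers of discarded ones."""
    valid, discarded = [], []
    for row in csv.DictReader(io.StringIO(text)):
        try:
            for col in numeric_columns:
                row[col] = float(row[col])  # fails on blanks and typos
        except (ValueError, TypeError):
            discarded.append(row["subject"])
            continue
        valid.append(row)
    return valid, discarded

valid, discarded = load_valid_rows(RAW, ["shoe_size", "height_in"])
print(len(valid), "usable rows; discarded subjects:", discarded)
print("Enough data?", len(valid) >= MIN_POINTS)
```

Identifying subjects by number rather than by name keeps the same file usable for the printout in the final report.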
1. Length is not important. Clarity, interest, and
relevance are. I tend to give higher scores to shorter reports with meat,
as opposed to longer reports with a lot of fluff.
2. About 2-3 pages of text should suffice. You will, of
course, have additional pages for attachments, figures, raw data printouts, scatterplots, histograms, and questionnaires if used. If
you put your figures “inline” with the text, which is preferred, then the
report will probably be longer than 3 pages.
3. Be sure to include a printout of your raw data. An
Excel table is the most convenient way to do this, but the choice is yours. IMPORTANT
NOTE: If you use human subjects, their raw data should be identified in the
printout only by subject number, not by name. The only time I will see their
names is when you show me the signed consent sheets.
4. If your diagram tells the story, cite it in the text
but let the picture do most of the talking. Assume that your reader is a Scientific
American or Smithsonian reader: intelligent, though not necessarily
an expert in statistics.
5. Number your figures (Fig. 1, Fig. 2, etc.) and use a
consistent citation style for any external sources. Ask the librarians in the
Ellison Library for assistance with electronic footnote citations if you use
Web, CD-ROM, or DVD-ROM sources.
1. Keep your group working productively.
Assign tasks, or resolve disputes if two people want the same task. It’s OK to
be laid-back if you wish, but be prepared to step in and take charge if things
are bogging down.
2. You are responsible for submitting the proposal and
milestone chart. You are also the person ultimately responsible for the quality
of the final product. That doesn’t mean you have to write everything yourself,
but it does mean that you have to juggle other people’s schedules and make things
come together.
3. If people shirk their responsibilities, you may need
to use small sanctions (a few points here, a few points there) to encourage
them to do the right thing. Since 1998, only a few groups have had this
problem, so let’s hope we don’t run into it too often.
4. There are 300 points possible: 30 for proposal, 30 for
raw data, 10 for group leader report (see below), 200
for final report, and 30 for timeliness/adherence to milestone chart. (If you
inform me early of any slippage, as you should, penalties for missing
milestones will be relatively light, but if I decide your schedule has slipped
excessively, you may lose more than just the 30 points reserved for
timeliness.) The 200 final report points are subdivided as follows: 50 for
interest, 80 for technical accuracy, 35 for quality of writing (including
spelling and grammar), and 35 for format, style, and neatness. Note: This means that the project is
worth 150 points per student: a test and a half!
5. As group leader you must prepare a short report (one
paragraph) justifying the point split you feel is correct for your group. Provide details of who did what, and to
what extent. Merely writing “I feel we both worked equally on the project” is
not sufficient. Last year many group leaders, though not all, opted for an even
split. If your group leader report is missing or inadequate, your personal
score will be reduced by 10 points. The group leader report is not necessarily
the final say, but in most cases I will support any group leader’s decision
provided it is based on the work accomplished. (One year, a group tried to
divert points from people who already had a solid “A”
average to help someone else raise his grade. This is not permitted.) If your
schedule slips excessively and you have not taken steps as group leader to gain
control of your members, you may lose more points than they do.
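The point arithmetic above can be checked with a few lines of Python. The sketch below confirms that the five components sum to 300 and that the final report subdivision sums to 200. It then shows one possible way to divide a group's earned points by relative weights. How a split is actually applied is decided by Mr. Hansen; the `split_points` function, the member labels, and the 270-point example are hypothetical illustrations only.

```python
PROJECT_POINTS = {
    "proposal": 30,
    "raw data": 30,
    "group leader report": 10,
    "final report": 200,
    "timeliness": 30,
}

FINAL_REPORT_POINTS = {
    "interest": 50,
    "technical accuracy": 80,
    "quality of writing": 35,
    "format, style, and neatness": 35,
}

def split_points(earned_total, weights):
    """Divide earned points in proportion to each member's weight."""
    total_weight = sum(weights.values())
    return {name: earned_total * w / total_weight for name, w in weights.items()}

print(sum(PROJECT_POINTS.values()))       # 300
print(sum(FINAL_REPORT_POINTS.values()))  # 200

# Even split of a perfect score between two members: 150 points each.
print(split_points(300, {"leader": 1, "partner": 1}))

# Hypothetical uneven split of 270 earned points, weighted 55 to 45.
print(split_points(270, {"leader": 55, "partner": 45}))
```

Any uneven weighting would still need the written justification described in item 5, based on the work each person actually did.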