AP Statistics / Mr. Hansen
Group Project #1: Exploratory Data Analysis
Deadlines:
Proposal is due, in writing, by start of class on Monday, 10/2/00. In your proposal, you will describe what you intend to do and you will list approximately 6 to 12 milestones with projected dates of completion. All proposals are subject to approval by me. Allow about two weeks altogether. Slightly longer periods are acceptable if you have a particularly interesting topic to pursue, or if you intend to submit early versions to me for formal comment.
Ground Rules: At least 50 data points, with at least two variables (or populations) for each, are required, and that means net, after throwing out unusable data. You may choose to look for relationships (or lack thereof) among the variables, or you may compare the distributions of a single variable across multiple populations. No animal experimentation, no illegal or unethical activities, and no direct involvement with human subjects (unless they give you a signed consent form) are permitted. For example, you may not spray Form III students as they come out of class and see if there is a linear correlation between their height and the number of meters that they chase you.
Warning: Your milestones should strike a balance between being aggressive (so that your group stays working) and realistic (so that you don’t overstress yourselves). Choose a topic that is feasible within your time constraint, and propose no more than what you can deliver.
Sample Proposal With Milestone Chart:
"We propose to interview 60 randomly selected STA students to see if there is a strong linear correlation between shoe size and height. After allowing for difficulty in locating people and having to discard some data points for clerical errors, we anticipate having at least 50 valid ordered pairs. We will obtain subjects’ consent to be polled for this information and will keep the consent forms on file for Mr. Hansen’s inspection. We will analyze shoe size and height separately for skewness, normality, gaps, outliers, etc., and we will describe the clusters and/or general trends that appear. We will compare our overall results with the results of a Web search for the underlying mechanism (e.g., there may be a linear correlation for people in certain age brackets but other types of relationships for the population as a whole). We will also state which variable is explanatory and which is response, and we will note any likely lurking variables that we encounter."
M 10/2: Proposal and list of milestones submitted
T 10/3: Meeting after school to decide who will purchase supplies, who will print fliers, etc.
W 10/4: Comments back from Mr. Hansen / final logistic plans made after school
M 10/9: Preliminary data gathering complete (note: no class today or tomorrow)
W 10/11: Raw data spreadsheet and first draft of report submitted to Mr. Hansen for review
F 10/13: Comments back from Mr. Hansen
M 10/16: Second draft circulated to all members for final markups
T 10/17: Spell check, data cleanup, and final printout complete
W 10/18: Project writeup submitted to Mr. Hansen
The list above is only an example; you may adjust your milestones and final due date as you see fit. However, once your project milestones are approved, any slippage of dates should be reported to me immediately. Do not appear suddenly on the due date and say, "We can’t turn it in today. Could we have another week?" A week’s delay may be reasonable if you have kept me informed of legitimate stumbling blocks, but it is not reasonable as a last-minute surprise.
You may meet with me at any time for informal feedback on the progress of your project. However, if you would like formal markups (i.e., a thorough critique of what is strong and what is weak in your project), you must allow me 48 hours (not counting weekends or school holidays) for turnaround time, so build that into your schedule. For example, something submitted at 5:00 p.m. Friday would be returned to your group leader by 5:00 p.m. the following Tuesday. If I take longer than that, the additional slippage will not penalize you in any way.
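For scheduling purposes, the 48-hour turnaround rule can be sketched in a few lines of Python. This is only an illustration: the function name `markup_due` and the `holidays` parameter are made up for this sketch, and it assumes that the clock simply pauses for every hour that falls on a weekend or school holiday.

```python
from datetime import datetime, timedelta


def markup_due(submitted, hours=48, holidays=()):
    """Estimate when formal markups are due.

    Assumption (hypothetical helper, not an official rule): the clock
    runs only during hours that begin on a school day, so weekends and
    any dates listed in `holidays` are skipped entirely.
    """
    current = submitted
    remaining = hours
    while remaining > 0:
        is_school_day = current.weekday() < 5 and current.date() not in holidays
        if is_school_day:
            remaining -= 1  # this hour counts toward the 48
        current += timedelta(hours=1)
    return current


# The example from the text: submitted at 5:00 p.m. on a Friday.
due = markup_due(datetime(2000, 10, 13, 17, 0))
print(due.strftime("%A %I:%M %p"))  # prints "Tuesday 05:00 PM"
```

If your schedule includes days with no class, pass those dates in `holidays` so that your milestone chart leaves enough room for the turnaround.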
Warning: It is extremely difficult to pin people down on schedules, so take that into account when setting up your meetings and milestones. Group leaders may want to institute a system of sanctions (e.g., 5 points for a missed meeting, 15 points for defaulting on a required section of the writeup, or whatever).
1. If you get data from a printed source or from the Web, this process may take only a few minutes, but if you gather data by hand, be sure to allow enough time. (Estimate the time you think it should take, and double it. Data collection is time-consuming.)
2. If you gather data by interviewing subjects, be sure to record names so that you can go back later if necessary to get more information. You may wish to start with a pilot study of a few friends so that you can get a feeling for what variables are important. Then, if you can guess at some of the lurking variables (e.g., age, year in school), you can ask those questions up front to save time. (Later in the course we will call this process blocking or block design. The purpose is to reduce variation in the pools of subjects so that the effect we are looking for becomes more clear.)
3. If you gather opinion data, proceed cautiously. Wording is tricky, and questions must be posed exactly the same way (preferably in writing) to all subjects.
4. Categorical variables are OK, but if you use them, they will require different types of visual aids (e.g., two-way tables and 100% bar charts in addition to scatterplots). For simplicity, you may wish to stick to quantitative variables for this project.
5. If you use a survey, questionnaire, or tabulation sheet, be sure to include a blank copy in your final report.
6. I will ask to see your raw data early, as well as in your final report. You can simplify our lives by e-mailing your data file to me as soon as you have it. Excel or Lotus 1-2-3 format is preferred but not required.
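Once the raw data are in hand, a first pass at two quantitative variables might look like the sketch below. It is one possible approach, not a required method: the helper names are made up for illustration, the outlier check assumes the common 1.5 × IQR rule of thumb, and the numbers are made-up placeholders rather than real measurements (a real project needs at least 50 valid data points).

```python
import statistics


def five_number_summary(values):
    """Return (min, Q1, median, Q3, max) for a list of numbers."""
    q1, median, q3 = statistics.quantiles(values, n=4, method="inclusive")
    return min(values), q1, median, q3, max(values)


def iqr_outliers(values):
    """Flag values more than 1.5 * IQR outside the quartiles (rule of thumb)."""
    _, q1, _, q3, _ = five_number_summary(values)
    fence = 1.5 * (q3 - q1)
    return [v for v in values if v < q1 - fence or v > q3 + fence]


def pearson_r(xs, ys):
    """Pearson linear correlation coefficient for paired data."""
    mean_x, mean_y = statistics.mean(xs), statistics.mean(ys)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / (sxx * syy) ** 0.5


# Placeholder values for illustration only, NOT real data.
shoe_sizes = [7, 8, 8.5, 9, 9.5, 10, 10.5, 11, 12, 13]
heights_in = [64, 66, 67, 68, 69, 70, 71, 72, 74, 76]

print("Shoe size summary:", five_number_summary(shoe_sizes))
print("Height summary:", five_number_summary(heights_in))
print("Possible height outliers:", iqr_outliers(heights_in))
print("Correlation r:", round(pearson_r(shoe_sizes, heights_in), 3))
```

A spreadsheet or graphing calculator will do the same job; the point is to examine each variable separately before looking at the relationship between them.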
1. Length is not important. Clarity, interest, and relevance are. I tend to give higher scores to shorter reports with meat, as opposed to longer reports with a lot of fluff.
2. Two to three pages (plus attachments for raw data printouts, scatterplots, and questionnaires if used) should suffice.
3. Be sure to include your raw data. An Excel table is the most convenient way to do this, but the choice is yours. IMPORTANT NOTE: If you use human subjects, their raw data should be identified in the report only by subject number, not by name. The only time I will see their names is when you show me the signed consent sheets.
4. If your diagram tells the story, cite it in the text but let the picture do most of the talking. Assume that your reader is a Scientific American or Smithsonian reader: intelligent, though not necessarily an expert in statistics.
5. Number your figures (Fig. 1, Fig. 2, etc.) and use a consistent citation style for any external sources. The library has a guideline on "Electronic Footnote Citations" on the east wall, underneath the lunar phase chart.
1. Keep your group working productively. Assign tasks, or resolve disputes if two people want the same task. It’s OK to be laid-back if you wish, but be prepared to step in and take charge if things are bogging down.
2. You are responsible for submitting the proposal and milestone chart on Monday, 10/2. You are also the person ultimately responsible for the quality of the final product. That doesn’t mean you have to write everything yourself, but it does mean that you have to juggle other people’s schedules and make things come together.
3. If people shirk their responsibilities, you may need to use small sanctions (a few points here, a few points there) to encourage them to do the right thing. Last year only a few groups had this problem, so let’s hope we don’t run into it too often.
4. There are 300 points possible: 30 for proposal, 30 for raw data, 10 for group leader report (see below), 200 for final report, and 30 for timeliness/adherence to milestone chart. (If you inform me early of any slippage, as you should, penalties for missing milestones will usually be relatively light, but if I decide your schedule has slipped excessively, you may lose more than just the 30 points reserved for timeliness.) The 200 final report points are subdivided as follows: 50 for interest, 80 for technical accuracy, 35 for quality of writing (including spelling and grammar), and 35 for format, style, and neatness.
5. As group leader you must prepare a short report (one paragraph) justifying the point split you feel is correct for your group. Last year many group leaders, though not all, opted for an even split. If your group leader report is missing or inadequate, your personal score will be reduced by 10 points. The group leader report is not necessarily the final say, but in most cases I will support any group leader’s decision provided it is based on merit, not need. (Two years ago, one group tried unsuccessfully to divert points from people who already had a solid "A" average to help someone else raise his grade.) If your schedule slips excessively and you have not taken steps as group leader to gain control of your members, you may lose more points than they do.
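The point breakdown in item 4 can be tallied as a quick check. The sketch below only restates the numbers given above; the dictionary names are made up for illustration.

```python
# Point values copied from item 4; variable names are hypothetical.
project_points = {
    "proposal": 30,
    "raw data": 30,
    "group leader report": 10,
    "final report": 200,
    "timeliness/adherence to milestone chart": 30,
}

final_report_points = {
    "interest": 50,
    "technical accuracy": 80,
    "quality of writing": 35,
    "format, style, and neatness": 35,
}

total = sum(project_points.values())
print("Total points possible:", total)  # prints 300

# The final-report subscores should add up to the 200 final-report points.
print("Final report subtotal:", sum(final_report_points.values()))  # prints 200

# Share of the total carried by each component, as a percentage.
for component, points in project_points.items():
    print(f"{component}: {points} points ({100 * points / total:.1f}%)")
```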