AP Statistics / Mr. Hansen
9/10/2003

 

Group Project #1: Exploratory Data Analysis

 

Deadlines: Proposal is due, in writing, by the start of class Thursday, 9/11/2003. In your proposal, you will describe what you intend to do. In your milestone chart (due Friday, 9/12/2003), you will list several milestones with projected dates of completion. All proposals are subject to approval by me. Schedule your milestones so that you can turn in your final writeup on or before Wednesday, 9/24/2003.

Ground Rules: At least 50 data points are required, and that means 50 net, after you throw out unusable data, so 60 are recommended. Although you may wish to focus on categorical data, at least one of your variables must be quantitative. You may choose to look for relationships (or lack thereof) in the data, or you may compare the distributions of a single variable across multiple populations. In general, more data and more variables will make your project more interesting and will give you a wider range of patterns you can unearth in your data. No animal experimentation, no illegal or unethical activities, and no direct involvement with human subjects (unless they give you a signed consent form) are permitted. For example, you may not spray Form III students as they come out of class and see if there is a linear correlation between their height and the number of meters they chase you.
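If both of your variables are quantitative, Pearson's correlation coefficient r is the usual single-number summary of how strong a linear relationship is. The following is a minimal Python sketch of that formula, offered as an illustration rather than a required tool; the function name pearson_r is hypothetical, and Excel's CORREL function or your calculator should give the same value.

```python
import math

def pearson_r(xs, ys):
    """Return Pearson's correlation coefficient r for paired data xs, ys."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two lists of equal length with at least 2 pairs")
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Sum of cross-products of deviations, and sums of squared deviations
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    if sxx == 0 or syy == 0:
        raise ValueError("r is undefined when either variable is constant")
    return sxy / math.sqrt(sxx * syy)
```

Keep in mind that r measures only linear association: a strongly curved pattern can still produce an r near 0, so look at the scatterplot before interpreting the number.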

Warning: Your milestones should strike a balance between being aggressive (so that your group keeps working) and realistic (so that you don’t overstress yourselves). Choose a topic that is feasible within your time constraint, and propose no more than you can deliver.

Sample Proposal With Milestone Chart:

"We propose to interview 60 randomly selected STA students to see if there is a strong linear correlation between shoe size and height. After allowing for difficulty in locating people and having to discard some data points for clerical errors, we anticipate having at least 50 valid ordered pairs. We will obtain subjects’ consent to be polled for this information and will keep the consent forms on file for Mr. Hansen’s inspection. We will analyze shoe size and height separately for skewness, normality, gaps, outliers, etc., and we will describe the clusters and/or general trends that appear. We will compare our overall results with the results of a Web search for the underlying mechanism (e.g., there may be a linear correlation for people in certain age brackets but other types of relationships for the population as a whole). We will also state which variable is explanatory and which is response, and we will note any likely lurking variables that we encounter."

                Th 9/11: Proposal submitted in writing
                F 9/12: Revised proposal and milestones submitted in writing
                M 9/15: Meeting after school to decide who will purchase supplies, who will print fliers, etc.
                T 9/16: Meeting with Mr. Hansen after class; final logistical plans made after school
                Th 9/18: Preliminary data gathering complete
                F 9/19: Raw data spreadsheet and first draft of report submitted to Mr. Hansen for review
                T 9/23: Comments back from Mr. Hansen (we allow 2 full school days for turnaround)
                W 9/24: Spell check, data cleanup, and final printout complete; project submitted to Mr. Hansen

Warning: Use this only as an example. Please feel free to make extensive changes or to revamp the wording completely. In previous years, some groups lost points because they said they would do all the things listed in the paragraph above, and then they ran out of time and couldn’t complete them all. You may change the number of milestones and the dates for milestones as you see fit. However, after your project milestones are approved, any slippage of dates should be reported immediately to your teacher. Do not appear suddenly on the due date and say, "We can’t turn it in today. Could we have another week?" A week’s delay may be reasonable if you have kept me informed of legitimate stumbling blocks, but it is not reasonable as a last-minute surprise.

You may meet with me at any time for informal feedback on the progress of your project. However, if you would like formal markups (i.e., a thorough critique of what is strong and what is weak in your project), you must allow me 48 hours (not counting weekends or school holidays) for turnaround time, so build that into your schedule. For example, something submitted at 5:00 p.m. Friday would be returned to your group leader by 5:00 p.m. the following Tuesday. If I take longer than that, the additional slippage will not penalize you in any way.
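The 48-hour rule can be checked mechanically when you plan milestones. The sketch below assumes that "48 hours" means clock hours counted only on weekdays that are not school holidays, and that times are plain datetimes with no time zone; markup_due and its holidays parameter are hypothetical names, not part of any official schedule tool.

```python
from datetime import datetime, timedelta

def markup_due(submitted, hours=48, holidays=()):
    """Return when `hours` of turnaround have elapsed after `submitted`,
    counting clock time only on weekdays whose dates are not in `holidays`."""
    t = submitted
    remaining = timedelta(hours=hours)
    while remaining > timedelta(0):
        # Midnight at the start of the next calendar day
        next_midnight = datetime.combine(t.date() + timedelta(days=1), datetime.min.time())
        if t.weekday() < 5 and t.date() not in holidays:  # Mon=0 ... Fri=4
            step = min(remaining, next_midnight - t)
            t += step
            remaining -= step
        else:
            t = next_midnight  # skip the rest of a weekend day or holiday
    return t
```

For the example in the paragraph above, markup_due(datetime(2003, 9, 12, 17, 0)) returns 5:00 p.m. on Tuesday, 9/16/2003. Passing a Monday holiday in holidays pushes the return to Wednesday at the same time.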

Warning: Because it is hard to pin people down on schedules, take that into account when setting up your meetings and milestones. Group leaders may want to institute a system of sanctions (e.g., 5 points for a missed meeting, 15 points for defaulting on a required section of the writeup, or whatever).

Data-Gathering Hints:

1. Be sure to allow enough time to gather your data. (Estimate the time you think it should take, and double it. Data collection is time-consuming.)

2. If you gather data by interviewing subjects, be sure to record names so that you can go back later if necessary to get more information. You may wish to start with a pilot study of a few friends so that you can get a feeling for what variables are important. Then, if you can guess at some of the lurking variables (e.g., age, year in school), you can ask those questions up front to save time. (Later in the course we will call this process blocking or block design. The purpose is to reduce variation in the pools of subjects so that the effect we are looking for becomes clearer.)

3. If you gather opinion data, proceed cautiously. Wording is tricky, and questions must be posed exactly the same way (preferably in writing) to all subjects.

4. Categorical variables are OK, but if you use them, they will require different types of visual aids (e.g., two-way tables and 100% bar charts, as opposed to histograms, boxplots, and scatterplots). Remember that you also need to have at least one quantitative variable for this project.

5. If you use a survey, questionnaire, consent form, or tabulation sheet, be sure to include these in your final report.

6. I will ask to see your raw data spreadsheet early, as well as in your final report. Microsoft Excel format is recommended. Remember, as we discussed in class on Wednesday 9/10, that each subject (“case”) should be on a separate row, and each variable should be in a separate column.
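Hints 4 and 6 can be illustrated together. In this minimal Python sketch, each case is one record (a row) and each variable is one field (a column), identified by subject number rather than by name; the same records are then counted into a two-way table, and each row is converted to percentages of its row total, which is what a 100% bar chart displays. The column names (including dominant_hand) and the function names are hypothetical placeholders for whatever your group actually records.

```python
import csv
import io
from collections import Counter

# Hypothetical column names: one column per variable, subject number instead of name.
FIELDS = ["subject_id", "year_in_school", "dominant_hand", "height_cm"]

def to_csv_text(cases):
    """Return CSV text with one row per case (a dict) and one column per variable."""
    buf = io.StringIO()
    writer = csv.DictWriter(buf, fieldnames=FIELDS)  # an unexpected key raises ValueError
    writer.writeheader()
    writer.writerows(cases)
    return buf.getvalue()

def two_way_table(cases, row_var, col_var):
    """Count the cases in each (row category, column category) cell.
    `cases` should be a list, since it is scanned more than once."""
    counts = Counter((case[row_var], case[col_var]) for case in cases)
    rows = sorted({case[row_var] for case in cases})
    cols = sorted({case[col_var] for case in cases})
    return {r: {c: counts[(r, c)] for c in cols} for r in rows}

def row_percentages(table):
    """Convert each row's counts to percentages of that row's total."""
    percents = {}
    for r, row in table.items():
        total = sum(row.values())
        percents[r] = {c: 100 * n / total for c, n in row.items()}
    return percents
```

To save a real file instead of a string, open it with open(path, "w", newline="") as the csv module documentation recommends; Excel can open the resulting .csv file directly.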

Report-Writing Hints:

1. Length is not important. Clarity, interest, and relevance are. I tend to give higher scores to shorter reports with meat, as opposed to longer reports with a lot of fluff.

2. Two to three pages (plus attachments for raw data printouts, scatterplots, histograms, and questionnaires if used) should suffice.

3. Be sure to include a printout of your raw data. An Excel table is the most convenient way to do this, but the choice is yours. IMPORTANT NOTE: If you use human subjects, their raw data should be identified in the printout only by subject number, not by name. The only time I will see their names is when you show me the signed consent sheets.

4. If your diagram tells the story, cite it in the text but let the picture do most of the talking. Assume that your reader is a Scientific American or Smithsonian reader: intelligent, though not necessarily an expert in statistics.

5. Number your figures (Fig. 1, Fig. 2, etc.) and use a consistent citation style for any external sources. Ask the librarians in the Ellison Library for assistance with electronic footnote citations if you use Web, CD-ROM, or DVD-ROM sources.

For Group Leaders Only:

1. Keep your group working productively. Assign tasks, or resolve disputes if two people want the same task. It’s OK to be laid-back if you wish, but be prepared to step in and take charge if things are bogging down.

2. You are responsible for submitting the proposal and milestone chart. You are also the person ultimately responsible for the quality of the final product. That doesn’t mean you have to write everything yourself, but it does mean that you have to juggle other people’s schedules and make things come together.

3. If people shirk their responsibilities, you may need to use small sanctions (a few points here, a few points there) to encourage them to do the right thing. Since 1998, only a few groups have had this problem, so let’s hope we don’t run into it too often.

4. There are 300 points possible: 30 for proposal, 30 for raw data, 10 for group leader report (see below), 200 for final report, and 30 for timeliness/adherence to the milestone chart. (If you inform me early of any slippage, as you should, penalties for missing milestones will be relatively light, but if I decide your schedule has slipped excessively, you may lose more than just the 30 points reserved for timeliness.) The 200 final report points are subdivided as follows: 50 for interest, 80 for technical accuracy, 35 for quality of writing (including spelling and grammar), and 35 for format, style, and neatness. Note: This means that the project is worth 150 points per student: a test and a half!

5. As group leader you must prepare a short report (one paragraph) justifying the point split you feel is correct for your group. Provide details of who did what, and to what extent. Merely writing “I feel we both worked equally on the project” is not sufficient. Last year many group leaders, though not all, opted for an even split. If your group leader report is missing or inadequate, your personal score will be reduced by 10 points. The group leader report is not necessarily the final say, but in most cases I will support any group leader’s decision provided it is based on the work accomplished. (One year, a group tried to divert points from people who already had a solid "A" average to help someone else raise his grade. This is not permitted.) If your schedule slips excessively and you have not taken steps as group leader to gain control of your members, you may lose more points than they do.
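The point values in item 4 can be checked quickly. This sketch assumes, as the 150-points-per-student figure implies, a two-person group whose score is divided according to the shares the group leader justifies in item 5; per_student_points is a hypothetical helper, not an official grading formula.

```python
# Point values as listed in item 4 above.
COMPONENTS = {
    "proposal": 30,
    "raw data": 30,
    "group leader report": 10,
    "final report": 200,
    "timeliness": 30,
}
FINAL_REPORT = {
    "interest": 50,
    "technical accuracy": 80,
    "quality of writing": 35,
    "format, style, and neatness": 35,
}

def per_student_points(total, shares):
    """Split a group total according to fractional shares that sum to 1."""
    if abs(sum(shares) - 1) > 1e-9:
        raise ValueError("shares must sum to 1")
    return [total * s for s in shares]

# The components add up to 300, and the final report breakdown adds up to 200.
assert sum(COMPONENTS.values()) == 300
assert sum(FINAL_REPORT.values()) == COMPONENTS["final report"]
```

With an even split, per_student_points(300, [0.5, 0.5]) gives [150.0, 150.0], matching the "test and a half" figure.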