AP Statistics / Mr. Hansen
9/16/2002 (due 9/18/2002)

Name: _________________________
Project #1

Project #1: Exploratory Data Analysis

 

This assignment is due in final form at the start of class Wednesday, 9/18/2002.

The group leader is responsible for submitting the project (one copy, stapled neatly) plus a paragraph in which a point split is justified. There are 100 points available per group, which would be 50 per person if an equal division is warranted. If the group leader’s report is missing or does not adequately justify the split that is requested, the group leader will lose 5 points.

Note: The group leader’s report must contain some details about who did what. A bare statement that “we both worked equally hard” does not adequately justify the split, and the group leader will lose the 5 points.

1.

Log in under one partner’s name, open the spreadsheet (FAKEDATA.xls), and save it to your personal area under \\BONEBOX\USERS on the network.

2.

Use the Excel on-line help facility to figure out how to set “worksheet titles” so that the top 3 rows remain fixed and do not scroll when you scroll the rest of the screen. Raise your hand when you have accomplished this task. Mr. Hansen’s initials: ______

3.

Create a formula in cell E4 that sums the two cells to the left (C4 and D4). Copy this formula, highlight to the end of the spreadsheet (SHIFT+CTRL+END), and paste.

4.

Highlight the cells in column E and use either F11 or the Chart Wizard to create a time series (a.k.a. time plot, to use your textbook’s term). Mr. Hansen’s initials: ______

5.

Change the format of the time series from bars to a line without markers. Mr. Hansen’s initials: ______

6.

The time series looks quite random, doesn’t it? Paste a second column onto the chart, formed by generating random values between 2 and 13. (The RAND function generates random values between 0 and 1.) Mr. Hansen’s initials: ______
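The scaling idea behind this step can be sketched outside Excel. The Python fragment below is only an illustration under stated assumptions: it assumes the random values should be spread uniformly over the interval from 2 to 13, and the name random_column is made up for this example. A value between 0 and 1 is stretched by the width of the interval (13 - 2 = 11) and then shifted up by 2, which in a worksheet cell would presumably look like =2+11*RAND().

```python
import random


def random_column(n, low=2.0, high=13.0, seed=None):
    """Return n pseudo-random values spread uniformly between low and high."""
    rng = random.Random(seed)  # an optional seed makes the sketch reproducible
    # rng.random() plays the role of Excel's RAND(): a value in [0, 1).
    # Multiplying by the interval width and adding low rescales it.
    return [low + (high - low) * rng.random() for _ in range(n)]


# Hypothetical usage: five random values between 2 and 13.
print(random_column(5, seed=2002))
```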

Briefly describe any differences you see between the original time series and the random values:




7.

Use the FREQUENCY function and the Chart Wizard to create a histogram of column E. You will need to consult the on-line help for the FREQUENCY function in order to adapt the examples shown to your needs. Note that an “array formula” must be entered by highlighting an entire range and then pressing SHIFT+CTRL+ENTER (instead of the usual ENTER) to create the formula. Show Mr. Hansen your histogram before proceeding (initials: _____). AFTER YOUR HISTOGRAM HAS BEEN APPROVED, please make a printout.
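As a rough cross-check on what FREQUENCY returns, the counting rule can be imitated in a few lines of Python. This is a sketch, not Excel's own code: the function name frequency and the sample numbers are invented for illustration. It follows the documented behavior that each class counts the values greater than the previous bin limit and less than or equal to its own limit, with one extra class at the end for values above the last limit.

```python
from bisect import bisect_left


def frequency(data, bins):
    """Count values per class: previous limit < x <= limit, plus one overflow class."""
    bins = sorted(bins)
    counts = [0] * (len(bins) + 1)  # extra slot for values above the last limit
    for x in data:
        # bisect_left returns the index of the first limit that is >= x,
        # or len(bins) when x exceeds every limit (the overflow class).
        counts[bisect_left(bins, x)] += 1
    return counts


# Hypothetical data with class limits 2, 4, and 6.
print(frequency([1, 2, 2, 3, 5, 9], [2, 4, 6]))  # [3, 1, 1, 1]
```

The result has one more entry than the list of bin limits, which is why the array formula in Excel is normally entered over a range one cell longer than the bin range.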

8.

Use the terminology we discussed in class to describe the distribution of values in column E. Write your description directly on the printout you made in step 7.

9.

Click the tab for Sheet 2 of your workbook. On Sheet 2, create formulas to compute each of the statistics shown below for the data in column E. For example, you can use the MEDIAN function to find the median and other functions under the “Statistical” category to find the rest. Label your formulas in some neat, clear fashion. Mr. Hansen’s initials: ______

sample size with symbol: _____ = ____________________
five-number summary: ________________________
mean with symbol: _____ = ____________________
sample s.d. with symbol: _____ = ____________________
sample variance with symbol: _____ = ____________________
IQR = ____________________
range = ____________________
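
The statistics listed above can be double-checked with a short Python sketch. Several things here are assumptions: the function name summarize and the sample data are invented, and the quartiles use the “inclusive” interpolation method, which is believed to match Excel's QUARTILE function but may differ slightly from the median-of-each-half rule in a textbook. The standard deviation and variance use the sample (n - 1) definitions.

```python
from statistics import mean, median, quantiles, stdev, variance


def summarize(data):
    """Return the summary statistics requested in step 9 for a list of numbers."""
    values = sorted(data)
    # Quartiles by linear interpolation (assumed to match Excel's QUARTILE).
    q1, _, q3 = quantiles(values, n=4, method="inclusive")
    return {
        "n": len(values),
        "five_number": (values[0], q1, median(values), q3, values[-1]),
        "mean": mean(values),
        "sample_sd": stdev(values),           # divides by n - 1
        "sample_variance": variance(values),  # divides by n - 1
        "iqr": q3 - q1,
        "range": values[-1] - values[0],
    }


# Hypothetical data, not taken from FAKEDATA.xls.
for name, value in summarize([1, 2, 3, 4, 5, 6, 7, 8, 9]).items():
    print(name, value)
```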

10.

Draw a modified boxplot of the data. Attach numeric labels to the outliers, Q1, M, and Q3. Do a rough sketch in the area below, and then recopy it neatly on the reverse side of your histogram at home or after class. Neatness counts.
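
Before sketching, it may help to compute the pieces of the plot. The Python sketch below assumes the usual 1.5 × IQR rule for flagging outliers (check that this matches the rule used in class), uses the same “inclusive” quartile method assumed to match Excel's QUARTILE, and uses an invented function name and invented data. Whiskers extend to the most extreme values that are not outliers.

```python
from statistics import median, quantiles


def modified_boxplot_parts(data):
    """Return quartiles, whisker ends, and outliers under the 1.5 x IQR rule."""
    values = sorted(data)
    q1, _, q3 = quantiles(values, n=4, method="inclusive")
    iqr = q3 - q1
    low_fence = q1 - 1.5 * iqr
    high_fence = q3 + 1.5 * iqr
    # Values inside the fences determine where the whiskers end.
    inside = [x for x in values if low_fence <= x <= high_fence]
    return {
        "q1": q1,
        "median": median(values),
        "q3": q3,
        "whiskers": (inside[0], inside[-1]),
        "outliers": [x for x in values if x < low_fence or x > high_fence],
    }


# Hypothetical data with one unusually large value.
print(modified_boxplot_parts([1, 2, 3, 4, 5, 6, 7, 8, 30]))
```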









11.

In cell F3 of Sheet 1, type the words “Day of Week” (without a period). Create a date serial number formula in cell F4 by typing the formula =DATE(2002,[entry from col. A],[entry from col. B]). Apply the custom format dddd to cell F4 before proceeding. Mr. Hansen’s initials: ______

12.

In cell G4, type the formula =MOD(F4,7) in order to generate a whole number from 0 to 6 that denotes the day of the week (in Excel’s default 1900 date system, 0 = Saturday, 1 = Sunday, 2 = Monday, and so on through 6 = Friday). If the MOD formula displays as a day of the week instead of a number, reset its cell format to General. In cell H4, type the formula =E4 in order to create a duplicate copy of the sleep totals. Now copy the formulas in cells F4, G4, and H4 all the way to the bottom of the data set. If you have forgotten how to do this, refer back to the instructions for step 3. Mr. Hansen’s initials: ______
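
To see why the remainder after dividing by 7 identifies the day of the week, the date serial number can be reproduced in Python. This sketch assumes Excel's default 1900 date system, in which serial numbers for dates from March 1900 onward equal the number of days since December 30, 1899. The names excel_serial, day_code, and DAY_NAMES are invented for this example.

```python
from datetime import date

# Assumed day zero of Excel's 1900 date system (valid for dates after Feb. 1900).
EXCEL_EPOCH = date(1899, 12, 30)

# Remainder of the serial number after dividing by 7, mapped to a day name.
DAY_NAMES = ["Saturday", "Sunday", "Monday", "Tuesday",
             "Wednesday", "Thursday", "Friday"]


def excel_serial(d):
    """Return the date serial number for a datetime.date."""
    return (d - EXCEL_EPOCH).days


def day_code(d):
    """Return the 0-to-6 code that =MOD(serial,7) would produce."""
    return excel_serial(d) % 7


assigned = date(2002, 9, 16)  # the date on this handout
print(excel_serial(assigned), day_code(assigned), DAY_NAMES[day_code(assigned)])
# 37515 2 Monday
```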

13.

Replace all the cells in columns G and H with values. (Copy, followed by Edit / Paste Special / Values.) Mr. Hansen’s initials: ______

14.

Compute the mean trace, median trace, and IQR, by day of week, for the data in column H. You may find it useful to sort columns G and H. Summarize your findings in the table below.

Monday mean = ________, median = ________, IQR = ________
Tuesday mean = ________, median = ________, IQR = ________
Wednesday mean = ________, median = ________, IQR = ________
Thursday mean = ________, median = ________, IQR = ________
Friday mean = ________, median = ________, IQR = ________
Saturday mean = ________, median = ________, IQR = ________
Sunday mean = ________, median = ________, IQR = ________
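
The grouping that sorting columns G and H accomplishes by hand can also be sketched in Python. The function name summarize_by_day and the sample values are invented, and the IQR again relies on the “inclusive” quartile method assumed to match Excel's QUARTILE. Each group needs at least two values for the quartiles to be computed.

```python
from collections import defaultdict
from statistics import mean, median, quantiles


def summarize_by_day(day_codes, values):
    """Group values by day code and return mean, median, and IQR for each group."""
    groups = defaultdict(list)
    for code, value in zip(day_codes, values):
        groups[code].append(value)

    summary = {}
    for code in sorted(groups):
        q1, _, q3 = quantiles(groups[code], n=4, method="inclusive")
        summary[code] = {
            "mean": mean(groups[code]),
            "median": median(groups[code]),
            "iqr": q3 - q1,
        }
    return summary


# Hypothetical day codes (column G) and totals (column H).
codes = [2, 2, 2, 3, 3, 3]
totals = [6, 7, 8, 5, 9, 10]
for code, stats in summarize_by_day(codes, totals).items():
    print(code, stats)
```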

15.

Perform “smoothing” on the original data in column E by computing the 7-day and 14-day moving averages. Plot these two time series on the same set of axes, and make a printout. Mr. Hansen’s initials: ______
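
A moving average replaces each value with the average of a window of consecutive values, which damps day-to-day noise. The Python sketch below assumes a simple unweighted average over each run of consecutive days; the handout does not say whether the window should trail or be centered on a given day, so that choice is left open. The function name moving_average and the sample numbers are invented.

```python
def moving_average(values, window):
    """Return the unweighted average of each run of `window` consecutive values."""
    if window < 1 or window > len(values):
        raise ValueError("window must be between 1 and the number of values")
    # One average per window position, so the result is shorter than the input
    # by window - 1 entries.
    return [
        sum(values[i:i + window]) / window
        for i in range(len(values) - window + 1)
    ]


# Hypothetical data: a 3-value window over five numbers.
print(moving_average([1, 2, 3, 4, 5], 3))  # [2.0, 3.0, 4.0]
```

A 14-day window smooths more heavily than a 7-day window, and each smoothed series has fewer points than the original, so the two plotted lines will not span the full date range.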

16.

Write a paragraph or two (below, or on the reverse side of your smoothed plots) in which you describe the nature of the data. Discuss any trends or cyclical patterns that you observed. Refer to statistics in your writing. Clarity, spelling, and grammar count.