Introduction to Statistics: Coursework
Instructions
This coursework tests your basic statistical modelling skills using spreadsheet software, as well as your awareness of how probability calculations, estimation and regression work in practice. Present your answers in an essay/report format, using a word processor. In writing your report, please:
· state and explain all assumptions on which your answers are based;
· clearly indicate your answers/recommendations;
· support any answers with the appropriate calculations;
· include selected printouts of the formulae underlying computed values. Although you will also submit the Excel file, your report is a stand-alone document: a reader should not need to look at the Excel file to understand your analysis, findings and recommendations;
· make adequate use of the Excel calculations in the report. The key data/findings need to be included in the report and referenced appropriately, i.e. the relevant cell/table/range in the relevant tab of the Excel file should be cited at the point in the report where it should be consulted.
The report has a maximum length of 6 pages, including any appendices. Penalties will be applied for longer submissions; you are required to develop your judgement on what is and isn't important. Ten percent of the total mark is allocated to the quality of the presentation, and these marks are distributed among the questions.
Deadline: The coursework is to be submitted on Moodle no later than 5.00 pm on Monday 7th April 2014. You will need to submit a Word document with the report (see instructions above) and an Excel file with the calculations.
Notes:
This coursework is your own (individual) work. Any student found guilty of plagiarism will be penalised. Standard penalties for late submissions are applicable.
The table below represents data for the profits, sales, average shop size and number of product lines sold by the 20 branches of a retailing company. You have been asked to analyse the data, using the Data Analysis tool in Excel, and make recommendations, including the following:
a) Summarise the distribution of profits of the twenty branches and comment on the results, including identification of any particularly good or poorly performing branches.
b) Identify whether there is evidence that the average number of lines stocked per branch is significantly different from 150.
c) Identify whether there is a significant difference between the profits of two groups of branches, split by the level of sales, with the threshold being £600,000.
d) Based on this sample, provide a 98% confidence interval for the profits of the twenty branches, and comment on the outcome.
Profit (£000s) | Sales (£000s) | Size (000s sq. ft.) | Lines |
---|---|---|---|
77.5 | 613.9 | 3.2 | 80 |
91 | 217.4 | 4.3 | 200 |
20.7 | 900.9 | 3.1 | 164 |
40.8 | 673.4 | 1.5 | 150 |
45.8 | 424.7 | 3.2 | 69 |
41.1 | 542.2 | 1.8 | 128 |
47.5 | 564.6 | 2.5 | 75 |
80.4 | 662.1 | 3.1 | 182 |
16.5 | 583.6 | 4.2 | 126 |
22.3 | 720.2 | 0.6 | 164 |
40.8 | 881.5 | 1.8 | 145 |
68.1 | 227.7 | 0.8 | 130 |
17.7 | 807.4 | 3.8 | 154 |
66.2 | 656.4 | 0.3 | 124 |
31.3 | 632.8 | 2.3 | 142 |
15 | 548.5 | 5 | 178 |
67.8 | 533.6 | 1.5 | 173 |
55 | 147.5 | 1.7 | 199 |
8.6 | 311.4 | 3.8 | 98 |
16.5 | 450.1 | 4.6 | 148 |
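For readers who want to cross-check their Excel output, parts b) to d) can be sketched in Python using only the standard library. This is a minimal sketch under stated assumptions, not the required method (the coursework asks for Excel's Data Analysis tool): it reads part d) as a confidence interval for the mean profit, uses Welch's unequal-variance statistic for part c), and takes the critical value 2.539 from a standard t-table (two-sided 98%, 19 degrees of freedom). All function and variable names are illustrative.

```python
import math
import statistics

# Data transcribed from the table above (profit and sales in £000s).
profit = [77.5, 91, 20.7, 40.8, 45.8, 41.1, 47.5, 80.4, 16.5, 22.3,
          40.8, 68.1, 17.7, 66.2, 31.3, 15, 67.8, 55, 8.6, 16.5]
sales = [613.9, 217.4, 900.9, 673.4, 424.7, 542.2, 564.6, 662.1, 583.6, 720.2,
         881.5, 227.7, 807.4, 656.4, 632.8, 548.5, 533.6, 147.5, 311.4, 450.1]
lines = [80, 200, 164, 150, 69, 128, 75, 182, 126, 164,
         145, 130, 154, 124, 142, 178, 173, 199, 98, 148]


def standard_error(sample):
    """Sample standard deviation divided by sqrt(n)."""
    return statistics.stdev(sample) / math.sqrt(len(sample))


def one_sample_t(sample, mu0):
    """t statistic for H0: population mean equals mu0."""
    return (statistics.mean(sample) - mu0) / standard_error(sample)


def welch_t(a, b):
    """Two-sample t statistic without assuming equal variances (assumption)."""
    se = math.sqrt(statistics.variance(a) / len(a) + statistics.variance(b) / len(b))
    return (statistics.mean(a) - statistics.mean(b)) / se


def mean_ci(sample, t_crit):
    """Confidence interval for the mean, given a t critical value."""
    centre = statistics.mean(sample)
    half_width = t_crit * standard_error(sample)
    return centre - half_width, centre + half_width


# Assumption: value read from a standard t-table (two-sided 98%, 19 df).
T_CRIT_98_DF19 = 2.539

# b) Is the mean number of lines different from 150?
t_lines = one_sample_t(lines, 150)

# c) Split branches at the £600,000 sales threshold and compare profits.
high_sales = [p for p, s in zip(profit, sales) if s > 600]
low_sales = [p for p, s in zip(profit, sales) if s <= 600]
t_groups = welch_t(high_sales, low_sales)

# d) 98% confidence interval for the mean profit.
ci_low, ci_high = mean_ci(profit, T_CRIT_98_DF19)

print(f"t (lines vs 150): {t_lines:.3f}")
print(f"t (high vs low sales profit): {t_groups:.3f}")
print(f"98% CI for mean profit: ({ci_low:.2f}, {ci_high:.2f})")
```

Each t statistic still has to be compared with the appropriate critical value (or converted to a p-value) before any conclusion about significance is drawn.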
Question 2: (30%)
A statistician is trying to find out whether there is a relationship between the number of hours of study and exam results, or whether exam results are random. The first part of his study was to generate random numbers of study hours and exam grades, which will be compared to the actual results once the exams have been sat. The randomly generated data are given in the table below:
Student | QM Study Hours | QM Exam Grade | Acting. Study Hours | Acting. Exam Grade |
---|---|---|---|---|
1 | 25 | 80.0 | 45 | 96.0 |
2 | 5 | 8.0 | 3 | 29.0 |
3 | 18 | 16.0 | 14 | 46.0 |
4 | 29 | 87.0 | 49 | 32.5 |
5 | 17 | 95.0 | 11 | 61.0 |
6 | 39 | 14.0 | 19 | 61.0 |
7 | 49 | 9.5 | 43 | 12.0 |
8 | 25 | 10.0 | 44 | 46.5 |
9 | 6 | 35.0 | 0 | 95.0 |
10 | 22 | 58.0 | 38 | 66.0 |
11 | 37 | 8.0 | 38 | 24.0 |
12 | 31 | 89.0 | 50 | 62.0 |
13 | 18 | 57.0 | 17 | 60.5 |
14 | 45 | 22.0 | 33 | 10.0 |
15 | 5 | 39.5 | 33 | 61.0 |
16 | 4 | 90.0 | 42 | 36.0 |
17 | 17 | 23.0 | 45 | 39.0 |
18 | 29 | 55.0 | 34 | 86.0 |
19 | 16 | 74.0 | 15 | 87.0 |
20 | 22 | 29.0 | 29 | 100.0 |
21 | 28 | 69.5 | 49 | 76.0 |
22 | 6 | 27.0 | 39 | 50.5 |
23 | 21 | 12.0 | 31 | 55.0 |
24 | 4 | 82.0 | 14 | 67.5 |
25 | 24 | 26.0 | 10 | 94.0 |
26 | 12 | 13.0 | 17 | 33.0 |
27 | 4 | 23.0 | 34 | 85.0 |
28 | 38 | 24.5 | 45 | 33.0 |
29 | 31 | 77.0 | 3 | 19.0 |
30 | 5 | 71.0 | 44 | 38.0 |
Required:
a) Summarise the distribution of expected grades for both exams, according to the data given.
b) Construct a 95% confidence interval for the exam marks for each of the subjects. Is there a significant difference between them?
c) By constructing a regression model for each of the subjects, indicate for which subject study hours have a higher impact on exam grades. Do you think this result is significant?
d) For the best regression model in the previous question, identify whether a better model can be developed by splitting the data into students who study more than 20 hours versus students who study fewer than 20 hours.
e) Without further calculations, discuss whether you believe these results will be replicated when data is collected for students that actually sat the exam.
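As a cross-check on the Excel regression output for part c), a simple least-squares fit can be sketched in Python with the standard library. This is an illustrative sketch only: the helper name `fit_line` is an assumption, the data are transcribed from the table above, and the slope and R-squared it prints say nothing about significance on their own (that requires the standard error of the slope or the regression p-value).

```python
import statistics

# Data transcribed from the table above, in student order 1 to 30.
qm_hours = [25, 5, 18, 29, 17, 39, 49, 25, 6, 22, 37, 31, 18, 45, 5,
            4, 17, 29, 16, 22, 28, 6, 21, 4, 24, 12, 4, 38, 31, 5]
qm_grade = [80, 8, 16, 87, 95, 14, 9.5, 10, 35, 58, 8, 89, 57, 22, 39.5,
            90, 23, 55, 74, 29, 69.5, 27, 12, 82, 26, 13, 23, 24.5, 77, 71]
act_hours = [45, 3, 14, 49, 11, 19, 43, 44, 0, 38, 38, 50, 17, 33, 33,
             42, 45, 34, 15, 29, 49, 39, 31, 14, 10, 17, 34, 45, 3, 44]
act_grade = [96, 29, 46, 32.5, 61, 61, 12, 46.5, 95, 66, 24, 62, 60.5, 10, 61,
             36, 39, 86, 87, 100, 76, 50.5, 55, 67.5, 94, 33, 85, 33, 19, 38]


def fit_line(x, y):
    """Least-squares fit of y = intercept + slope * x.

    Returns (slope, intercept, r_squared).
    """
    mean_x, mean_y = statistics.mean(x), statistics.mean(y)
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    r_squared = sxy ** 2 / (sxx * syy)
    return slope, intercept, r_squared


for name, hours, grade in [("QM", qm_hours, qm_grade),
                           ("Acting.", act_hours, act_grade)]:
    slope, intercept, r_squared = fit_line(hours, grade)
    print(f"{name}: grade = {intercept:.2f} + {slope:.3f} * hours, "
          f"R^2 = {r_squared:.3f}")
```

For part d), the same function could be applied separately to the students above and below the 20-hour threshold and the resulting fits compared.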
Question 3: (30%)
You have been asked by the Association of Car Manufacturers to analyse the data they have collected, which is contained in the attached Excel file called "Data File Q3 IF1202 CW14".
Required:
a) Prepare a summary table with the correlations between all the variables and discuss which variables are highly correlated and which are not.
b) Assuming you were going to conduct a multiple regression, identify and justify:
1. Which variable you believe will be the dependent variable; and
2. Which independent variables you believe will be useful in a multiple regression.
c) Construct a multiple regression model with all independent variables and clearly indicate your regression equation;
d) Indicate and justify which variables are significant and non-significant in the regression model and compare with your answer to part b2) above;
e) Construct another multiple regression model including only the significant variables from the model in c) above and discuss whether it is a better model or not.
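The correlation summary table in part a) can be sketched as follows. The contents of the Q3 data file are not reproduced here, so the column names and values below are purely hypothetical placeholders chosen so that the code runs; only the pairwise Pearson correlation logic is meant to carry over to the real data.

```python
import math

# HYPOTHETICAL placeholder columns: replace with the variables in the Q3 file.
data = {
    "var_a": [1.0, 2.0, 3.0, 4.0, 5.0],
    "var_b": [2.1, 3.9, 6.2, 8.1, 9.8],
    "var_c": [9.0, 7.5, 8.0, 4.0, 5.0],
}


def pearson(x, y):
    """Pearson correlation coefficient between two equal-length samples."""
    mean_x = sum(x) / len(x)
    mean_y = sum(y) / len(y)
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    return sxy / math.sqrt(sxx * syy)


def correlation_matrix(columns):
    """Nested dict of pairwise correlations, keyed by column name."""
    names = list(columns)
    return {r: {c: pearson(columns[r], columns[c]) for c in names} for r in names}


matrix = correlation_matrix(data)
for row_name, row in matrix.items():
    cells = "  ".join(f"{value:6.3f}" for value in row.values())
    print(f"{row_name:>6}  {cells}")
```

In Excel, the Correlation option of the Data Analysis tool produces the equivalent table.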
Subject | Business |
Due By (Pacific Time) | 04/07/2014 12:00 am |