Statistics Assignment
Answer 1: Mean, Mode, Variance and Standard Deviation
The following data represent business startup costs (thousands of dollars) for shops.
X1 = startup costs for pizza
X2 = startup costs for bakery/donuts
X3 = startup costs for shoe stores
X4 = startup costs for gift shops
X5 = startup costs for pet stores
Statistics | X1 | X2 | X3 | X4 | X5 |
Mean | 83 | 92.0909091 | 72.3 | 87 | 51.63 |
Mode | 35 | #N/A | #N/A | 100 | 30 |
Median | 80 | 87 | 70 | 97.5 | 49 |
Maxima | 140 | 160 | 125 | 150 | 110 |
Minima | 35 | 40 | 35 | 35 | 20 |
Range | 105 | 120 | 90 | 115 | 90 |
Variance | 1165.166667 | 1512.690909 | 983.7888889 | 1289.111111 | 733.05 |
Standard Deviation | 34.13453774 | 38.89332731 | 31.36540911 | 35.9041935 | 27.0749 |
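As a quick consistency check on the table above, the range should equal the maximum minus the minimum, and the standard deviation should equal the square root of the variance. The following minimal sketch uses only the values reported in the table; the dictionary layout and the function name are illustrative assumptions, not part of the Excel output.

```python
import math

# Values copied from the summary table above (thousands of dollars).
table = {
    "X1": {"max": 140, "min": 35, "range": 105, "variance": 1165.166667, "sd": 34.13453774},
    "X2": {"max": 160, "min": 40, "range": 120, "variance": 1512.690909, "sd": 38.89332731},
    "X3": {"max": 125, "min": 35, "range": 90, "variance": 983.7888889, "sd": 31.36540911},
    "X4": {"max": 150, "min": 35, "range": 115, "variance": 1289.111111, "sd": 35.9041935},
    "X5": {"max": 110, "min": 20, "range": 90, "variance": 733.05, "sd": 27.0749},
}

def check_row(stats, tol=1e-3):
    """Return True when range = max - min and sd = sqrt(variance)."""
    range_ok = stats["max"] - stats["min"] == stats["range"]
    sd_ok = abs(math.sqrt(stats["variance"]) - stats["sd"]) < tol
    return range_ok and sd_ok

results = {name: check_row(stats) for name, stats in table.items()}
print(results)
```

Every business type passes both checks, so the reported range, variance and standard deviation are mutually consistent.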
Answer 2: Frequency and relative frequency distributions for each business type
X1 Business (Frequency distribution)
Bin | Frequency |
25 | 0 |
50 | 3 |
75 | 2 |
100 | 4 |
125 | 3 |
150 | 1 |
More | 0 |
X2 Business (Frequency distribution)
Bin | Frequency |
25 | 0 |
50 | 2 |
75 | 2 |
100 | 4 |
125 | 1 |
150 | 1 |
More | 1 |
X3 Business (Frequency distribution)
Bin | Frequency |
25 | 0 |
50 | 4 |
75 | 2 |
100 | 2 |
125 | 2 |
150 | 0 |
More | 0 |
X4 Business (Frequency distribution)
Bin | Frequency |
25 | 0 |
50 | 3 |
75 | 1 |
100 | 4 |
125 | 1 |
150 | 1 |
More | 0 |
X5 Business (Frequency distribution)
Bin | Frequency |
25 | 3 |
50 | 6 |
75 | 4 |
100 | 2 |
125 | 1 |
150 | 0 |
More | 0 |
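The bin values in these tables act as upper limits: Excel's Histogram tool is understood to count a value in the first bin whose limit is greater than or equal to it, and anything above the last limit goes to "More". The sketch below reproduces that rule under this assumption. The sample costs are hypothetical and are not the assignment data, and the function name is illustrative.

```python
from bisect import bisect_left

BINS = [25, 50, 75, 100, 125, 150]  # upper limits used in the tables above

def frequency_table(values, bins=BINS):
    """Count values per bin, treating each bin value as an inclusive upper limit."""
    counts = [0] * (len(bins) + 1)  # the last slot is the "More" row
    for v in values:
        # bisect_left returns the index of the first bin limit >= v
        counts[bisect_left(bins, v)] += 1
    labels = [str(b) for b in bins] + ["More"]
    return list(zip(labels, counts))

# Hypothetical startup costs (thousands of dollars), for illustration only.
sample = [20, 50, 51, 75, 149, 160]
for label, count in frequency_table(sample):
    print(f"{label} | {count} |")
```

Note that a value of exactly 50 falls in the "50" bin, while 51 falls in the "75" bin.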
X1 business (relative frequency distribution)
Bin | Relative frequency |
25 | 0 |
50 | 0.230769231 |
75 | 0.153846154 |
100 | 0.307692308 |
125 | 0.230769231 |
150 | 0.076923077 |
More | 0 |
X2 business (relative frequency distribution)
Bin | Relative frequency |
25 | 0 |
50 | 0.181818182 |
75 | 0.181818182 |
100 | 0.363636364 |
125 | 0.090909091 |
150 | 0.090909091 |
More | 0.090909091 |
X3 business (relative frequency distribution)
Bin | Relative frequency |
25 | 0 |
50 | 0.4 |
75 | 0.2 |
100 | 0.2 |
125 | 0.2 |
150 | 0 |
More | 0 |
X4 business (relative frequency distribution)
Bin | Relative frequency |
25 | 0 |
50 | 0.3 |
75 | 0.1 |
100 | 0.4 |
125 | 0.1 |
150 | 0.1 |
More | 0 |
X5 business (relative frequency distribution)
Bin | Relative frequency |
25 | 0.1875 |
50 | 0.375 |
75 | 0.25 |
100 | 0.125 |
125 | 0.0625 |
150 | 0 |
More | 0 |
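Each relative frequency above is the bin frequency divided by the total number of observations for that business. The following sketch recomputes the relative frequencies from the frequency counts reported in Answer 2; the variable and function names are illustrative.

```python
# Frequency counts copied from the tables above
# (bins 25, 50, 75, 100, 125, 150, More).
frequencies = {
    "X1": [0, 3, 2, 4, 3, 1, 0],
    "X2": [0, 2, 2, 4, 1, 1, 1],
    "X3": [0, 4, 2, 2, 2, 0, 0],
    "X4": [0, 3, 1, 4, 1, 1, 0],
    "X5": [3, 6, 4, 2, 1, 0, 0],
}

def relative_frequencies(counts):
    """Divide each count by the total so that the results sum to 1."""
    total = sum(counts)
    return [c / total for c in counts]

relative = {name: relative_frequencies(c) for name, c in frequencies.items()}
for name, values in relative.items():
    print(name, [round(v, 4) for v in values])
```

The totals of the counts (13, 11, 10, 10 and 16) also agree with the group counts shown in the ANOVA summary.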
Answer 3: Discussion of the results obtained in Answers 1 and 2
From the calculations in Answer 1, X2 has the highest mean startup cost (92.09) and X5 the lowest (51.63), so starting an X2 business costs considerably more on average than starting an X5 business. X2 also has the largest spread of values among the five businesses (range 120, standard deviation 38.89), which indicates that its mean is less representative of individual startup costs than the means of the other groups.
A large spread suggests that individual startup costs within a business type differ widely from one another. On the other hand, X3 and X5 have the smallest range (90 each), which means that startup costs for these two businesses are more consistent from shop to shop. In contrast, X2 contains both low and high values (40 to 160), so its data are widely spread around the mean.
Similarly, in Answer 2 the frequency and relative frequency distributions of X2 are widely spread, and X2 is the only business with a value above 150. Most X2 values fall in the middle bins, with a few small and a few large values. Because the mean (92.09) is greater than the median (87) and the longer tail lies on the high side, the X2 data appear to be slightly skewed to the right.
The shape of each distribution indicates how well the average (mean) summarizes the data. X5, for example, has a roughly mound-shaped distribution that peaks in the 25 to 50 bin, with progressively fewer values in the higher bins.
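A rough indication of the direction of skew can be obtained by comparing the mean with the median. The sketch below applies this rule of thumb to the values from the Answer 1 table. It is only a heuristic and can mislead for small samples such as these; the function name is an illustrative assumption.

```python
# (mean, median) copied from the Answer 1 table (thousands of dollars).
summary = {
    "X1": (83.0, 80.0),
    "X2": (92.0909091, 87.0),
    "X3": (72.3, 70.0),
    "X4": (87.0, 97.5),
    "X5": (51.63, 49.0),
}

def skew_hint(mean, median):
    """Rule of thumb: mean above median hints at right skew, below at left skew."""
    if mean > median:
        return "right"
    if mean < median:
        return "left"
    return "symmetric"

hints = {name: skew_hint(mean, median) for name, (mean, median) in summary.items()}
print(hints)
```

Under this heuristic only X4, whose mean (87) lies below its median (97.5), hints at a left skew.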
Answer 4: Testing a significant difference in starting cost of business
Anova: Single Factor | |||||||
SUMMARY | |||||||
Groups | Count | Sum | Average | Variance | |||
X1 | 13 | 1079 | 83 | 1165.167 |
X2 | 11 | 1013 | 92.09091 | 1512.691 |
X3 | 10 | 723 | 72.3 | 983.7889 |
X4 | 10 | 870 | 87 | 1289.111 |
X5 | 16 | 826 | 51.625 | 733.05 |
ANOVA | |||||||
Source of Variation | SS | df | MS | F | P-value | F crit | |
Between Groups | 14298.22 | 4 | 3574.556 | 3.246336 | 0.018391 | 2.539689 | |
Within Groups | 60560.76 | 55 | 1101.105 | ||||
Total | 74858.98 | 59 | |||||
The above ANOVA table is used to test whether there is a significant difference in the startup cost of the businesses. The null hypothesis is that the mean startup cost is the same for all five business types. The F value (3.246) is greater than the critical F value (2.540), and the P-value (0.018) is less than 0.05, so the null hypothesis is rejected.
So, it can be stated that there is a significant difference in mean startup cost between these types of business.
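The ANOVA table can be rebuilt from the group counts, averages and variances in the SUMMARY block. The following sketch does this with the standard one-way ANOVA formulas; the function name is illustrative, and the critical value is simply copied from the Excel output rather than computed.

```python
# (count, mean, variance) for each group, copied from the SUMMARY table above.
groups = [
    (13, 83.0, 1165.167),
    (11, 92.09091, 1512.691),
    (10, 72.3, 983.7889),
    (10, 87.0, 1289.111),
    (16, 51.625, 733.05),
]

def one_way_anova(groups):
    """Return SS between, SS within, both degrees of freedom and the F statistic."""
    n_total = sum(n for n, _, _ in groups)
    grand_mean = sum(n * mean for n, mean, _ in groups) / n_total
    ss_between = sum(n * (mean - grand_mean) ** 2 for n, mean, _ in groups)
    ss_within = sum((n - 1) * var for n, _, var in groups)
    df_between = len(groups) - 1
    df_within = n_total - len(groups)
    f_stat = (ss_between / df_between) / (ss_within / df_within)
    return ss_between, ss_within, df_between, df_within, f_stat

ss_b, ss_w, df_b, df_w, f_stat = one_way_anova(groups)
F_CRIT = 2.539689  # critical value reported by Excel at the 0.05 level
reject_null = f_stat > F_CRIT
print(round(ss_b, 2), round(ss_w, 2), df_b, df_w, round(f_stat, 4), reject_null)
```

The recomputed sums of squares and F statistic agree with the Excel output up to rounding of the inputs.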
All Greens Franchise
The data (X1, X2, X3, X4, X5, X6) are for each franchise store.
X1 = annual net sales/$1000
X2 = number sq. ft./1000
X3 = inventory/$1000
X4 = amount spent on advertising/$1000
X5 = size of sales district/1000 families
X6 = number of competing stores in district
Answer 1: MS Excel output and estimated regression equation
SUMMARY OUTPUT | |||||||||
Regression Statistics | |||||||||
Multiple R | 0.996583914 | ||||||||
R Square | 0.993179497 | ||||||||
Adjusted R Square | 0.991555568 | ||||||||
Standard Error | 17.64924165 | ||||||||
Observations | 27 | ||||||||
ANOVA | |||||||||
Source | df | SS | MS | F | Significance F |
Regression | 5 | 952538.9415 | 190507.7883 | 611.5903672 | 5.3973E-22 | ||||
Residual | 21 | 6541.410344 | 311.4957306 | ||||||
Total | 26 | 959080.3519 | |||||||
Term | Coefficients | Standard Error | t Stat | P-value | Lower 95% | Upper 95% | Lower 95.0% | Upper 95.0% |
Intercept | -18.8594 | 30.1502 | -0.6255 | 0.538 | -81.5602 | 43.8414 | -81.5602 | 43.8414 | |
X Variable 1 | 16.2016 | 3.5444 | 4.5710 | 0.000 | 8.8305 | 23.5726 | 8.8305 | 23.5726 | |
X Variable 2 | 0.1746 | 0.0576 | 3.0315 | 0.006 | 0.0548 | 0.2944 | 0.0548 | 0.2944 | |
X Variable 3 | 11.5263 | 2.5321 | 4.5521 | 0.000 | 6.2605 | 16.7921 | 6.2605 | 16.7921 | |
X Variable 4 | 13.5803 | 1.7705 | 7.6705 | 0.000 | 9.8984 | 17.2622 | 9.8984 | 17.2622 | |
X Variable 5 | -5.3110 | 1.7054 | -3.1142 | 0.005 | -8.8576 | -1.7643 | -8.8576 | -1.7643 |
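Regression Equation:
Y = m1*X1 + m2*X2 + m3*X3 + m4*X4 + m5*X5 + C
Y = annual sales; X1 to X5 here denote X Variable 1 to X Variable 5 in the output above (area, inventory, advertising spending, size of sales district and number of competing stores); C is the intercept.
Y = 16.20*area + 0.17*inventory + 11.53*advertising spending + 13.58*size of sales district - 5.31*number of competing stores - 18.86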
Answer 2: Determining how well the model fits the data
R-squared is used to determine how well the model fits the data; it measures how close the data are to the fitted regression line.
If R-squared is 0%, the regression model explains none of the variability of the response around its mean. If R-squared is 100%, the model explains all of that variability. The regression output above shows an R-squared value of 0.993179, or 99.32%, which means that the model fits the data very well.
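The fit statistics in the regression output can be recomputed from the sums of squares in the ANOVA block. The following sketch uses the standard formulas for R-squared, adjusted R-squared and the standard error of the regression; the variable names are illustrative.

```python
import math

# Values copied from the regression ANOVA output in Answer 1.
ss_regression = 952538.9415
ss_residual = 6541.410344
ss_total = 959080.3519
n_obs = 27        # observations
n_predictors = 5  # X Variable 1 to X Variable 5

# Share of the total variability in sales explained by the model.
r_squared = ss_regression / ss_total

# Adjusted R-squared penalizes the number of predictors.
df_residual = n_obs - n_predictors - 1
adjusted_r_squared = 1 - (1 - r_squared) * (n_obs - 1) / df_residual

# Standard error of the regression is the square root of the residual mean square.
standard_error = math.sqrt(ss_residual / df_residual)

print(round(r_squared, 6), round(adjusted_r_squared, 6), round(standard_error, 4))
```

The results agree with the R Square, Adjusted R Square and Standard Error rows of the Excel output.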
Answer 3: Testing the hypothesis (no significant relationship between the dependent and any independent variables)
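The hypothesis test is conducted to determine whether there is a significant relationship between the dependent variable and the independent variables. The null hypothesis is that there is no significant relationship. If the P-value is less than 0.05, the null hypothesis is rejected and the variable has a significant relationship with annual sales. If the P-value is greater than 0.05, the null hypothesis is accepted and there is no significant relationship between the two variables. For the model as a whole, Significance F in the ANOVA output (5.3973E-22) is far below 0.05, so the hypothesis that none of the independent variables is related to annual sales is also rejected.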
Dependent and independent variables | P-value | Null Hypothesis (Rejected or Accepted) |
Annual sales and area | 0.000 | Rejected |
Annual sales and inventory | 0.006 | Rejected |
Annual sales and advertising spending | 0.000 | Rejected |
Annual sales and size of sales district | 0.000 | Rejected |
Annual sales and Competing Stores | 0.005 | Rejected |
Answer 4: Interpret individual slope coefficients
Variable | Slope | Interpretation |
Area | 16.2016 | Holding the other variables constant, mean annual sales rise by 16.2016 (in $1000) for each additional 1000 sq. ft. of area. |
Inventory | 0.1746 | Holding the other variables constant, mean annual sales rise by 0.1746 (in $1000) for each additional $1000 of inventory. |
Advertising spending | 11.5263 | Holding the other variables constant, mean annual sales rise by 11.5263 (in $1000) for each additional $1000 spent on advertising. |
Size of sales district | 13.5803 | Holding the other variables constant, mean annual sales rise by 13.5803 (in $1000) for each additional 1000 families in the sales district. |
Number of competing stores | -5.3110 | Holding the other variables constant, mean annual sales fall by 5.3110 (in $1000) for each additional competing store. |
From the above table, it can be interpreted that area, size of sales district and advertising spending have the largest positive slopes, so a one-unit change in each of them is associated with the largest change in sales, while a one-unit change in inventory has the smallest impact. The number of competing stores is the only variable with a negative slope, so sales fall as competition increases.
Answer 5: Construct a 95% confidence interval for the slope coefficients of individual variables
Variable | Lower 95.0% | Upper 95.0% |
Intercept | -81.5602 | 43.8414 |
Area | 8.8305 | 23.5726 |
Inventory | 0.0548 | 0.2944 |
Advertising spending | 6.2605 | 16.7921 |
Size (sales district) | 9.8984 | 17.2622 |
Competing stores | -8.8576 | -1.7643 |
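Each interval above is the coefficient plus or minus the critical t value times its standard error. The following sketch recomputes the intervals from the coefficients and standard errors in the Answer 1 output. The critical value 2.0796 (two-tailed, 5%, 21 residual degrees of freedom) is taken from standard t tables and is an assumption, since it does not appear in the Excel output; the function name is illustrative.

```python
# Assumed critical value: two-tailed 5% t value for 21 degrees of freedom.
T_CRIT = 2.0796

# (coefficient, standard error) copied from the regression output in Answer 1.
estimates = {
    "Intercept": (-18.8594, 30.1502),
    "Area": (16.2016, 3.5444),
    "Inventory": (0.1746, 0.0576),
    "Advertising spending": (11.5263, 2.5321),
    "Size (sales district)": (13.5803, 1.7705),
    "Competing stores": (-5.3110, 1.7054),
}

def confidence_interval(coef, se, t_crit=T_CRIT):
    """Return (lower, upper) bounds of the 95% confidence interval."""
    margin = t_crit * se
    return coef - margin, coef + margin

intervals = {name: confidence_interval(c, se) for name, (c, se) in estimates.items()}
for name, (low, high) in intervals.items():
    print(f"{name} | {low:.4f} | {high:.4f} |")
```

An interval that does not contain zero, as is the case for all five slopes here, corresponds to a coefficient that is significant at the 5% level.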
Answer 6: Testing the estimated slope coefficients to determine the significance of individual variables
To identify the significance of each individual variable, a hypothesis test is carried out on its estimated slope coefficient. The null hypothesis is that the variable is not significant. If the absolute value of the t statistic is greater than the critical value, the null hypothesis is rejected. If the absolute value of the t statistic is less than the critical value, the null hypothesis is accepted.
t-Test: Paired Two Sample for Means (Area)
Statistic | Sales | Area |
Mean | 286.5740741 | 3.32592593 |
Variance | 36887.70584 | 4.044301983 |
Observations | 27 | 27 |
Pearson Correlation | 0.894092081 | |
Hypothesized Mean Difference | 0 | |
df | 26 | |
t Stat | 7.735497275 | |
P(T<=t) one-tail | 1.65162E-08 | |
t Critical one-tail | 1.705617901 | |
P(T<=t) two-tail | 3.30324E-08 | |
t Critical two-tail | 2.055529418 |
t-Test: Paired Two Sample for Means (Inventory)
Statistic | Sales | Inventory |
Mean | 286.574074 | 387.4815 | |
Variance | 36887.7058 | 36545.11 | |
Observations | 27 | 27 | |
Pearson Correlation | 0.94550363 | ||
Hypothesized Mean Difference | 0 | ||
df | 26 | ||
t Stat | -8.28771958 | ||
P(T<=t) one-tail | 4.5308E-09 | ||
t Critical one-tail | 1.7056179 | ||
P(T<=t) two-tail | 9.0617E-09 | ||
t Critical two-tail | 2.05552942 |
t-Test: Paired Two Sample for Means (Advertising spending)
Statistic | Sales | Advertising spending |
Mean | 286.5741 | 8.099999982 | |
Variance | 36887.71 | 14.24692313 | |
Observations | 27 | 27 | |
Pearson Correlation | 0.914024 | ||
Hypothesized Mean Difference | 0 | ||
df | 26 | ||
t Stat | 7.671559 | ||
P(T<=t) one-tail | 1.92E-08 | ||
t Critical one-tail | 1.705618 | ||
P(T<=t) two-tail | 3.85E-08 | ||
t Critical two-tail | 2.055529 |
t-Test: Paired Two Sample for Means (Size)
Statistic | Sales | Size |
Mean | 286.5741 | 9.692593 | |
Variance | 36887.71 | 26.41994 | |
Observations | 27 | 27 | |
Pearson Correlation | 0.953683 | ||
Hypothesized Mean Difference | 0 | ||
df | 26 | ||
t Stat | 7.686851 | ||
P(T<=t) one-tail | 1.85E-08 | ||
t Critical one-tail | 1.705618 | ||
P(T<=t) two-tail | 3.71E-08 | ||
t Critical two-tail | 2.055529 |
t-Test: Paired Two Sample for Means (Competing stores)
Statistic | Sales | No. of competing stores |
Mean | 286.5741 | 7.740741 | |||||||||
Variance | 36887.71 | 23.96866 | |||||||||
Observations | 27 | 27 | |||||||||
Pearson Correlation | -0.91224 | ||||||||||
Hypothesized Mean Difference | 0 | ||||||||||
df | 26 | ||||||||||
t Stat | 7.371908 | ||||||||||
P(T<=t) one-tail | 3.96E-08 | ||||||||||
t Critical one-tail | 1.705618 | ||||||||||
P(T<=t) two-tail | 7.91E-08 | ||||||||||
t Critical two-tail | 2.055529 | ||||||||||
Variables | t-stat | t-critical | Accepted or Rejected | Significance of individual variables | |||||||
Area | 7.735497275 | 2.055529 | Rejected | Statistically significant | |||||||
Inventory | -8.28771958 | 2.055529 | Accepted | Not significant | |||||||
Advertising spending | 7.671559 | 2.055529 | Rejected | Statistically significant | |||||||
Size | 7.686851 | 2.055529 | Rejected | Statistically significant | |||||||
Competing stores | 7.371908 | 2.055529 | Rejected | Statistically significant | |||||||
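The table above marks inventory as the only variable that is not significant in this model, which is why it is dropped in the re-estimated model. This conclusion should be treated with caution: the absolute value of the inventory t statistic (8.29) is greater than the critical value (2.0555), and the regression output in Answer 1 reports a P-value of 0.006 for inventory, both of which point to a significant variable.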
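The paired t-tests above compare the mean of sales with the mean of each predictor. A different and commonly used test for a slope coefficient, shown here only for comparison and not as the method used above, divides each coefficient by its standard error and compares the absolute value with the critical t value. The sketch takes the coefficients and standard errors from the Answer 1 output. The critical value 2.0796 (two-tailed, 5%, 21 degrees of freedom) is an assumption taken from standard t tables, and the function name is illustrative.

```python
# Assumed critical value: two-tailed 5% t value for 21 degrees of freedom.
T_CRIT = 2.0796

# (coefficient, standard error) copied from the regression output in Answer 1.
slopes = {
    "Area": (16.2016, 3.5444),
    "Inventory": (0.1746, 0.0576),
    "Advertising spending": (11.5263, 2.5321),
    "Size of sales district": (13.5803, 1.7705),
    "Competing stores": (-5.3110, 1.7054),
}

def slope_t_test(coef, se, t_crit=T_CRIT):
    """Return the t statistic and whether |t| exceeds the critical value."""
    t_stat = coef / se
    return t_stat, abs(t_stat) > t_crit

results = {name: slope_t_test(c, se) for name, (c, se) in slopes.items()}
for name, (t_stat, significant) in results.items():
    print(f"{name} | {t_stat:.4f} | {'Rejected' if significant else 'Accepted'} |")
```

The t statistics obtained this way match the t Stat column of the regression output up to rounding, and under this test every slope, including inventory, exceeds the critical value in absolute terms.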
Answer 7: Removing the variable judged not significant and re-estimating the model by regression analysis
SUMMARY OUTPUT | ||||||||
Regression Statistics | ||||||||
Multiple R | 0.995085241 | |||||||
R Square | 0.990194637 | |||||||
Adjusted R Square | 0.988411844 | |||||||
Standard Error | 20.67511795 | |||||||
Observations | 27 | |||||||
ANOVA | ||||||||
Source | df | SS | MS | F | Significance F |
Regression | 4 | 949676.2208 | 237419.0552 | 555.4175271 | 9.5799E-22 | |||
Residual | 22 | 9404.131054 | 427.4605024 | |||||
Total | 26 | 959080.3519 | ||||||
Term | Coefficients | Standard Error | t Stat | P-value | Lower 95% | Upper 95% | Lower 95.0% | Upper 95.0% |
Intercept | -39.46002205 | 34.41055873 | -1.146741683 | 0.263807827 | -110.823153 | 31.90311 | -110.823 | 31.90311 |
X Variable 1 | 20.444 | 3.814801407 | 5.359095937 | 2.21824E-05 | 12.5324729 | 28.3553 | 12.53247 | 28.3553 |
X Variable 2 | 16.966 | 2.092787626 | 8.10695865 | 4.73185E-08 | 12.6259669 | 21.30632 | 12.62597 | 21.30632 |
X Variable 3 | 15.673 | 1.90985556 | 8.206359798 | 3.85791E-08 | 11.7121639 | 19.63376 | 11.71216 | 19.63376 |
X Variable 4 | -4.043301284 | 1.936828415 | -2.087588788 | 0.048629066 | -8.06003755 | -0.02657 | -8.06004 | -0.02657 |
Answer 8: Using the above model to predict annual sales for a franchise with the following characteristics:
Floor area = 1,000 sq. ft.
Families in the area of operation = 5000
Competitors =2
Inventory= $150,000
Advertising expenses =$5,000
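Equation: Annual sales = Y = m1*X1 + m2*X2 + m3*X3 + m4*X4 + C
m1 = 20.44 (area, in 1000 sq. ft.)
m2 = 16.97 (advertising spending, in $1000)
m3 = 15.67 (size of sales district, in 1000 families)
m4 = -4.04 (number of competing stores)
Intercept C = -39.46
Inventory is not part of the re-estimated model, so the inventory figure is not used. Because the variables are measured in thousands, the inputs are area = 1, advertising spending = 5, size of sales district = 5 and number of competing stores = 2.
Annual sales = 20.44*area + 16.97*advertising spending + 15.67*size of sales district - 4.04*number of competing stores - 39.46
= 20.44*1 + 16.97*5 + 15.67*5 - 4.04*2 - 39.46
= 20.44 + 84.85 + 78.35 - 8.08 - 39.46
= 136.10 (in $1000)
= about $136,100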
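The prediction can also be evaluated in a few lines of code. The sketch below takes the coefficients and their signs directly from the re-estimated regression output in Answer 7 and assumes that X Variable 1 to X Variable 4 are area, advertising spending, size of sales district and number of competing stores, in that order. It also assumes that the inputs are expressed in the units of the data set (thousands of sq. ft., thousands of dollars, thousands of families). The names used are illustrative.

```python
# Coefficients copied from the re-estimated regression output in Answer 7.
INTERCEPT = -39.46002205
COEFFICIENTS = {
    "area": 20.444,               # floor area, in 1000 sq. ft.
    "advertising": 16.966,        # advertising spending, in $1000
    "district_size": 15.673,      # size of sales district, in 1000 families
    "competitors": -4.043301284,  # number of competing stores
}

def predict_sales(inputs):
    """Return predicted annual net sales in $1000 for one franchise."""
    return INTERCEPT + sum(COEFFICIENTS[name] * value for name, value in inputs.items())

# 1,000 sq. ft., $5,000 advertising, 5,000 families, 2 competitors,
# converted to the units of the data set.
franchise = {"area": 1.0, "advertising": 5.0, "district_size": 5.0, "competitors": 2}
predicted = predict_sales(franchise)
print(f"Predicted annual sales: {predicted:.2f} (in $1000)")
```

Keeping the unit conversion explicit in the input dictionary makes it easier to see that each value matches the scale on which the model was fitted.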