Business statistics
Task 1
1). Find mean, median, mode, range, variance and standard deviation separately for every type of business
Statistics | X1 | X2 | X3 | X4 | X5 |
Mean | 83.00 | 92.09 | 72.30 | 87.00 | 51.63 |
Mode | 35 | #N/A | #N/A | 100 | 30 |
Median | 80 | 87 | 70 | 97.5 | 49 |
Standard Deviation | 34.13 | 38.89 | 31.37 | 35.90 | 27.07 |
Maxima | 140 | 160 | 125 | 150 | 110 |
Minima | 35 | 40 | 35 | 35 | 20 |
Variance | 1165.166667 | 1512.6909 | 983.79 | 1289.111 | 733.05 |
Range | 105 | 120 | 90 | 115 | 90 |
2). Construct for every type of business:
a). A frequency & relative frequency distributions
Frequency Distributions For X1:
Bin | Frequency | Relative Frequency |
0-30 | 0 | 0.0000 |
31-60 | 4 | 0.3077 |
61-90 | 4 | 0.3077 |
91-120 | 3 | 0.2308 |
121-150 | 2 | 0.1538 |
More | 0 | 0.0000 |
Frequency Distributions For X2:
Bin | Frequency | Relative Frequency |
0-30 | 0 | 0 |
31-60 | 3 | 0.272727 |
61-90 | 4 | 0.363636 |
91-120 | 2 | 0.181818 |
121-150 | 1 | 0.090909 |
151-180 | 1 | 0.090909 |
More | 0 | 0 |
Frequency Distributions For X3:
Bin | Frequency | Relative Frequency |
0-30 | 0 | 0.0 |
31-60 | 4 | 0.4 |
61-90 | 3 | 0.3 |
91-120 | 2 | 0.2 |
121-150 | 1 | 0.1 |
More | 0 | 0.0 |
Frequency Distributions For X4:
Bin | Frequency | Relative Frequency |
0-30 | 0 | 0.0 |
31-60 | 3 | 0.3 |
61-90 | 1 | 0.1 |
91-120 | 5 | 0.5 |
121-150 | 1 | 0.1 |
151-180 | 0 | 0.0 |
More | 0 | 0.0 |
Frequency Distributions For X5:
Bin | Frequency | Relative Frequency |
0-30 | 6 | 0.375 |
31-60 | 5 | 0.3125 |
61-90 | 4 | 0.25 |
91-120 | 1 | 0.0625 |
More | 0 | 0 |
b). Relative frequency histogram
Frequency Distributions For X1:
Frequency Distributions For X2:
Frequency Distributions For X3:
Frequency Distributions For X4:
Frequency Distributions For X5:
3). Discuss the results achieved in the 1st and 2nd part
From the 1st part, it is illustrated that business X2 signifies the highest average start-up ending whereas it is least for business X5. On the other hand, in businesses, the largest spread value among the data sets is X2 which denotes that the means is not as representative of data. The reason behind it is that there are large variations among individual scores. At the same time, lower range businesses are X3 and X5 which shows mean is as representative of data. Despite this, Business X2 has a high standard deviation and variance which further demonstrates that there is widespread around the mean because the outliners have high or low values comparatively. Also, the business X5 data set reflects less variability relative to its mean.
From the 2nd part, it is concluded by the frequency and relative frequency distributions that business X2 start-up costs are widely spread due to the outliners. In business X2 the data’s distribution is skewed to left as the majority of values are small with certain larger ones in the data set. Further, the plot shape is skewed to the left as the outliners have changed the results of the data analysis in the data by impacting the value of the mean. Yet, the data set for the business X5 is distributed normally because the data is around the means of the data set.
4). Check whether there is a significant difference in the starting costs of these businesses.
ANOVA: Single Factor | ||||||
SUMMARY | ||||||
Groups | Count | Sum | Average | Variance | ||
Column 1 | 13 | 1079 | 83 | 1165.166667 | ||
Column 2 | 11 | 1013 | 92.09091 | 1512.690909 | ||
Column 3 | 10 | 723 | 72.3 | 983.7888889 | ||
Column 4 | 10 | 870 | 87 | 1289.111111 | ||
Column 5 | 16 | 826 | 51.625 | 733.05 | ||
ANOVA | ||||||
Source of Variation | SS | df | MS | F | P-value | F crit |
Between Groups | 14298.22 | 4 | 3574.556 | 3.24633618 | 0.018391 | 2.539689 |
Within Groups | 60560.76 | 55 | 1101.105 | |||
Total | 74858.98 | 59 |
From the table presented above, it can be elucidated that the critical value of F is less than the F value as well as p-value is < 0.05 which signifies the cut off for the significance. If the p-value is < 0.05 then it is depicted that there is a significant difference in the data set. For these types of businesses, there is a significant difference.
Task 2
1). Represent the output obtained from MS Excel and also write the estimated equation of the regression
SUMMARY OUTPUT | ||||||||
Regression Statistics | ||||||||
Multiple R | 0.996584 | |||||||
R Square | 0.993179 | |||||||
Adjusted R Square | 0.991556 | |||||||
Standard Error | 17.64924 | |||||||
Observations | 27 | |||||||
ANOVA | ||||||||
df | SS | MS | F | Significance F | ||||
Regression | 5 | 952538.9415 | 190507.8 | 611.5903672 | 5.4E-22 | |||
Residual | 21 | 6541.410344 | 311.4957 | |||||
Total | 26 | 959080.3519 | ||||||
Coefficients | Standard Error | t Stat | P-value | Lower 95% | Upper 95% | Lower 95.0% | Upper 95.0% | |
Intercept | -18.8594 | 30.1502 | -0.6255 | 0.538 | -81.5602 | 43.8414 | -81.5602 | 43.8414 |
X Variable 1 | 16.2016 | 3.5444 | 4.5710 | 0.000 | 8.8305 | 23.5726 | 8.8305 | 23.5726 |
X Variable 2 | 0.1746 | 0.0576 | 3.0315 | 0.006 | 0.0548 | 0.2944 | 0.0548 | 0.2944 |
X Variable 3 | 11.5263 | 2.5321 | 4.5521 | 0.000 | 6.2605 | 16.7921 | 6.2605 | 16.7921 |
X Variable 4 | 13.5803 | 1.7705 | 7.6705 | 0.000 | 9.8984 | 17.2622 | 9.8984 | 17.2622 |
X Variable 5 | -5.3110 | 1.7054 | -3.1142 | 0.005 | -8.8576 | -1.7643 | -8.8576 | -1.7643 |
Sales = 16.20*area+0.17* inventory + 11.53* advertising spending + 13.58*size of sales district + 5.31 * number of competing stores -18.86
2). Explain how the model fits the data so well?
The R-squared is measured to determine how well the model fits the data so well. It further demonstrates how data is so close to the fitted regression line. If the R-squared is 0% then it will demonstrate that the regression model explains none of the variability of data around means. Yet, if it is near 100% it shows that model explains the variability of data around the mean. It can be signified from the table above that value of R-squared is 99.31% or 0.9931 means the model fits the data better.
3). Test the hypothesis which says that there is no significant relationship between any of the independent and dependent variables.
There is no significant relationship between the area and dependent (annual sales)
In relation to this hypothesis, the p-value is < 0.05 which signify that the means sample provides enough evidence which shows that for the entire population the null hypothesis can be rejected. So, the null hypothesis can be rejected.
There is no significant relationship between the inventory and dependent (annual sales)
In this variable, the value of p is 0.006 which is < 0.05 which illustrates that the null hypothesis is rejected.
There is no significant relationship between the advertising spending and dependent (annual sales)
In this variable, the value of p is 0.006 which is < 0.05 which illustrates that the null hypothesis is rejected.
There is no significant relationship between the size of the sales district and dependent (annual sales)
In this variable, the value of p is 0.000 which is < 0.05 which illustrates that the null hypothesis is rejected.
There is no significant relationship between the number of competing stores and dependent (annual sales)
In this variable, the value of p is 0.005 which is < 0.05 which illustrates that the null hypothesis is rejected.
4). Interpret coefficients of individual slope
Variable | Slope | Interpretation |
Area | 16.20 | The rate of change of the conditional mean of sales with respect to the area is about 16.20. |
Advertising spending | 11.53 | The rate of change of the conditional mean of sales with respect to advertising spending is about 11.53. |
Inventory | 0.17 | The rate of change of the conditional mean of sales with respect to inventory is about 0.17. |
Number of competing stores | 5.31 | The rate of change of the conditional mean of sales with respect to a number of competing stores is about 5.31. |
Size of sales district | 13.58 | The rate of change of the conditional mean of sales with respect to the size of the sales district is about 13.58. |
From the table presented above, it can be demonstrated that in sales there is a maximum change which is due to the variations in-store area follow up by the size of district advertising spending and sales as well. Additionally, there is less effect of the inventory on the volume of sales for the firm.
5). Construct a 95% confidence interval for the slope coefficients of individual variables
Variable | Lower 95.0% | Upper 95.0% |
Sales | -81.5602 | 43.8414 |
Area | 8.8305 | 23.5726 |
Inventory | 0.0548 | 0.2944 |
Advertising spending | 6.2605 | 16.7921 |
Size of sales district | 9.8984 | 17.2622 |
Number of competing stores | -8.8576 | -1.7643 |
6). Test the estimated slope coefficients for individual variables for significance
If the critical value of t is less than the absolute value, then the null hypothesis is not accepted. If the absolute value is larger than the critical value of t then the null hypothesis is accepted.
Variables | t-stat | t-critical | Reject or accept | Significance |
Area | 7.7354 | 2.0555 | Rejected | Statistically significant |
Size of sales market | 7.6869 | 2.0555 | Rejected | Statistically significant |
Inventory | -8.2877 | 2.0555 | Accepted | Not significant |
No. of competing stores | 7.3719 | 2.0555 | Rejected | Statistically significant |
Adv, spending | 7.6716 | 2.0555 | Rejected | Statistically significant |
From the above table, it can be depicted that inventory is not a significant variable in the model.
7). Re-estimate the model and also remove all insignificant variables
After removing inventory that is an insignificant variable, the regression analysis is given as below:
SUMMARY OUTPUT | ||||||||
Regression Statistics | ||||||||
Multiple R | 0.9950852 | |||||||
R Square | 0.9901946 | |||||||
Adjusted R Square | 0.9884118 | |||||||
Standard Error | 20.675118 | |||||||
Observations | 27 | |||||||
ANOVA | ||||||||
df | SS | MS | F | Significance F | ||||
Regression | 4 | 949676.2208 | 237419.1 | 555.4175271 | 9.58E-22 | |||
Residual | 22 | 9404.131054 | 427.4605 | |||||
Total | 26 | 959080.3519 | ||||||
Coefficients | Standard Error | t Stat | P-value | Lower 95% | Upper 95% | Lower 95.0% | Upper 95.0% | |
Intercept | -39.46002 | 34.41055873 | -1.14674 | 0.263807827 | -110.823 | 31.90311 | -110.8231527 | 31.90310863 |
X Variable 1 | 20.443887 | 3.814801407 | 5.359096 | 2.21824E-05 | 12.53247 | 28.3553 | 12.53247286 | 28.35530058 |
X Variable 2 | 16.966143 | 2.092787626 | 8.106959 | 4.73185E-08 | 12.62597 | 21.30632 | 12.62596687 | 21.30631862 |
X Variable 3 | 15.672962 | 1.90985556 | 8.20636 | 3.85791E-08 | 11.71216 | 19.63376 | 11.7121639 | 19.63375988 |
X Variable 4 | -4.043301 | 1.936828415 | -2.08759 | 0.048629066 | -8.06004 | -0.02657 | -8.060037553 | -0.026565015 |
8). Predict annual sales for a franchisee with 1,000 sq ft floor area, by using the model from part (g), and $150,000 inventory, $5,000 spent on advertising, 5,000 families in the area of operation and 2 competitors.
We can draw the below equation from the question7,
Y=m 1 X 1 + m 2 X 2 + m 3 X 3 + m 4 X 4+ m5 X 5 + C
Sales = 20.44*area + 16.97* advertising spending + 15.67*size of sales district + 4.04 * number of competing stores -39.46
= 20.44*1000+ 16.97*5000 + 15.67*5000+4.04*2 -39.46
= 20440+84850+ 78350+8.08-39.46
= 183609
=183.69 /$1000