laptop

Business statistics

Task 1

1). Find mean, median, mode, range, variance and standard deviation separately for every type of business

Statistics X1 X2 X3 X4 X5
Mean 83.00 92.09 72.30 87.00 51.63
Mode 35 #N/A #N/A 100 30
Median 80 87 70 97.5 49
Standard Deviation 34.13 38.89 31.37 35.90 27.07
Maxima 140 160 125 150 110
Minima 35 40 35 35 20
Variance 1165.166667 1512.6909 983.79 1289.111 733.05
Range 105 120 90 115 90

2). Construct for every type of business:

a). A frequency & relative frequency distributions

Frequency Distributions For X1:

Bin Frequency Relative Frequency
0-30 0 0.0000
31-60 4 0.3077
61-90 4 0.3077
91-120 3 0.2308
121-150 2 0.1538
More 0 0.0000

Frequency Distributions For X2:

Bin Frequency Relative Frequency
0-30 0 0
31-60 3 0.272727
61-90 4 0.363636
91-120 2 0.181818
121-150 1 0.090909
151-180 1 0.090909
More 0 0

Frequency Distributions For X3:

Bin Frequency Relative Frequency
0-30 0 0.0
31-60 4 0.4
61-90 3 0.3
91-120 2 0.2
121-150 1 0.1
More 0 0.0

Frequency Distributions For X4:

Bin Frequency Relative Frequency
0-30 0 0.0
31-60 3 0.3
61-90 1 0.1
91-120 5 0.5
121-150 1 0.1
151-180 0 0.0
More 0 0.0

Frequency Distributions For X5:

Bin Frequency Relative Frequency
0-30 6 0.375
31-60 5 0.3125
61-90 4 0.25
91-120 1 0.0625
More 0 0

Get Assignment Help from Industry Expert Writers (1)

b). Relative frequency histogram

Frequency Distributions For X1:

Frequency Distributions For X2:

Frequency Distributions For X3:

Frequency Distributions For X4:

Frequency Distributions For X5:

3). Discuss the results achieved in the 1st and 2nd part

From the 1st part, it is illustrated that business X2 signifies the highest average start-up ending whereas it is least for business X5. On the other hand, in businesses, the largest spread value among the data sets is X2 which denotes that the means is not as representative of data. The reason behind it is that there are large variations among individual scores. At the same time, lower range businesses are X3 and X5 which shows mean is as representative of data. Despite this, Business X2 has a high standard deviation and variance which further demonstrates that there is widespread around the mean because the outliners have high or low values comparatively. Also, the business X5 data set reflects less variability relative to its mean.

From the 2nd part, it is concluded by the frequency and relative frequency distributions that business X2 start-up costs are widely spread due to the outliners.  In business X2 the data’s distribution is skewed to left as the majority of values are small with certain larger ones in the data set. Further, the plot shape is skewed to the left as the outliners have changed the results of the data analysis in the data by impacting the value of the mean. Yet, the data set for the business X5 is distributed normally because the data is around the means of the data set.

4). Check whether there is a significant difference in the starting costs of these businesses.

ANOVA: Single Factor
SUMMARY
Groups Count Sum Average Variance
Column 1 13 1079 83 1165.166667
Column 2 11 1013 92.09091 1512.690909
Column 3 10 723 72.3 983.7888889
Column 4 10 870 87 1289.111111
Column 5 16 826 51.625 733.05
ANOVA
Source of Variation SS df MS F P-value F crit
Between Groups 14298.22 4 3574.556 3.24633618 0.018391 2.539689
Within Groups 60560.76 55 1101.105
Total 74858.98 59

From the table presented above, it can be elucidated that the critical value of F is less than the F value as well as p-value is < 0.05 which signifies the cut off for the significance. If the p-value is < 0.05 then it is depicted that there is a significant difference in the data set. For these types of businesses, there is a significant difference.

Task 2

Get Assignment Help from Industry Expert Writers (1)

1). Represent the output obtained from MS Excel and also write the estimated equation of the regression

SUMMARY OUTPUT
Regression Statistics
Multiple R 0.996584
R Square 0.993179
Adjusted R Square 0.991556
Standard Error 17.64924
Observations 27
ANOVA
  df SS MS F Significance F
Regression 5 952538.9415 190507.8 611.5903672 5.4E-22
Residual 21 6541.410344 311.4957
Total 26 959080.3519
  Coefficients Standard Error t Stat P-value Lower 95% Upper 95% Lower 95.0% Upper 95.0%
Intercept -18.8594 30.1502 -0.6255 0.538 -81.5602 43.8414 -81.5602 43.8414
X Variable 1 16.2016 3.5444 4.5710 0.000 8.8305 23.5726 8.8305 23.5726
X Variable 2 0.1746 0.0576 3.0315 0.006 0.0548 0.2944 0.0548 0.2944
X Variable 3 11.5263 2.5321 4.5521 0.000 6.2605 16.7921 6.2605 16.7921
X Variable 4 13.5803 1.7705 7.6705 0.000 9.8984 17.2622 9.8984 17.2622
X Variable 5 -5.3110 1.7054 -3.1142 0.005 -8.8576 -1.7643 -8.8576 -1.7643

Sales = 16.20*area+0.17* inventory + 11.53* advertising spending + 13.58*size of sales district + 5.31 * number of competing stores -18.86

2). Explain how the model fits the data so well?

The R-squared is measured to determine how well the model fits the data so well. It further demonstrates how data is so close to the fitted regression line. If the R-squared is 0% then it will demonstrate that the regression model explains none of the variability of data around means. Yet, if it is near 100% it shows that model explains the variability of data around the mean. It can be signified from the table above that value of R-squared is 99.31% or 0.9931 means the model fits the data better.

3). Test the hypothesis which says that there is no significant relationship between any of the independent and dependent variables.

There is no significant relationship between the area and dependent (annual sales)

In relation to this hypothesis, the p-value is < 0.05 which signify that the means sample provides enough evidence which shows that for the entire population the null hypothesis can be rejected. So, the null hypothesis can be rejected.

There is no significant relationship between the inventory and dependent (annual sales)

In this variable, the value of p is 0.006 which is < 0.05 which illustrates that the null hypothesis is rejected.

There is no significant relationship between the advertising spending and dependent (annual sales)

In this variable, the value of p is 0.006 which is < 0.05 which illustrates that the null hypothesis is rejected.

There is no significant relationship between the size of the sales district and dependent (annual sales)

In this variable, the value of p is 0.000 which is < 0.05 which illustrates that the null hypothesis is rejected.

There is no significant relationship between the number of competing stores and dependent (annual sales)

In this variable, the value of p is 0.005 which is < 0.05 which illustrates that the null hypothesis is rejected.

4). Interpret coefficients of individual slope

Variable Slope Interpretation
Area 16.20 The rate of change of the conditional mean of sales with respect to the area is about 16.20.
Advertising spending 11.53 The rate of change of the conditional mean of sales with respect to advertising spending is about 11.53.
Inventory 0.17 The rate of change of the conditional mean of sales with respect to inventory is about 0.17.
Number of competing stores 5.31 The rate of change of the conditional mean of sales with respect to a number of competing stores is about 5.31.
Size of sales district 13.58 The rate of change of the conditional mean of sales with respect to the size of the sales district is about 13.58.

From the table presented above, it can be demonstrated that in sales there is a maximum change which is due to the variations in-store area follow up by the size of district advertising spending and sales as well. Additionally, there is less effect of the inventory on the volume of sales for the firm.

5). Construct a 95% confidence interval for the slope coefficients of individual variables

Variable Lower 95.0% Upper 95.0%
Sales -81.5602 43.8414
Area 8.8305 23.5726
Inventory 0.0548 0.2944
Advertising spending 6.2605 16.7921
Size of sales district 9.8984 17.2622
Number of competing stores -8.8576 -1.7643

6). Test the estimated slope coefficients for individual variables for significance

If the critical value of t is less than the absolute value, then the null hypothesis is not accepted. If the absolute value is larger than the critical value of t then the null hypothesis is accepted.

 Variables t-stat t-critical Reject or accept Significance
Area 7.7354 2.0555 Rejected Statistically significant
Size of sales market 7.6869 2.0555 Rejected Statistically significant
Inventory -8.2877 2.0555 Accepted Not significant
No. of competing stores 7.3719 2.0555 Rejected Statistically significant
Adv, spending 7.6716 2.0555 Rejected Statistically significant

From the above table, it can be depicted that inventory is not a significant variable in the model.

7). Re-estimate the model and also remove all insignificant variables

After removing inventory that is an insignificant variable, the regression analysis is given as below:

SUMMARY OUTPUT
Regression Statistics
Multiple R 0.9950852
R Square 0.9901946
Adjusted R Square 0.9884118
Standard Error 20.675118
Observations 27
ANOVA
  df SS MS F Significance F
Regression 4 949676.2208 237419.1 555.4175271 9.58E-22
Residual 22 9404.131054 427.4605
Total 26 959080.3519
  Coefficients Standard Error t Stat P-value Lower 95% Upper 95% Lower 95.0% Upper 95.0%
Intercept -39.46002 34.41055873 -1.14674 0.263807827 -110.823 31.90311 -110.8231527 31.90310863
X Variable 1 20.443887 3.814801407 5.359096 2.21824E-05 12.53247 28.3553 12.53247286 28.35530058
X Variable 2 16.966143 2.092787626 8.106959 4.73185E-08 12.62597 21.30632 12.62596687 21.30631862
X Variable 3 15.672962 1.90985556 8.20636 3.85791E-08 11.71216 19.63376 11.7121639 19.63375988
X Variable 4 -4.043301 1.936828415 -2.08759 0.048629066 -8.06004 -0.02657 -8.060037553 -0.026565015

8). Predict annual sales for a franchisee with 1,000 sq ft floor area, by using the model from part (g), and $150,000 inventory, $5,000 spent on advertising, 5,000 families in the area of operation and 2 competitors.

  We can draw the below equation from the question7,

Y=m 1 X 1 + m 2 X 2 + m 3 X 3 + m 4 X 4+ m5 X 5 + C

Sales = 20.44*area + 16.97* advertising spending + 15.67*size of sales district + 4.04 * number of competing stores -39.46

= 20.44*1000+ 16.97*5000 + 15.67*5000+4.04*2 -39.46

= 20440+84850+ 78350+8.08-39.46

= 183609

=183.69 /$1000

 

 

 

 

 

 

Leave a Comment