Business statistics

Task 1

1). Find mean, median, mode, range, variance and standard deviation separately for every type of business

Statistics            X1            X2           X3        X4         X5
Mean                  83.00         92.09        72.30     87.00      51.63
Mode                  35            #N/A         #N/A      100        30
Median                80            87           70        97.5       49
Standard Deviation    34.13         38.89        31.37     35.90      27.07
Maxima                140           160          125       150        110
Minima                35            40           35        35         20
Variance              1165.166667   1512.6909    983.79    1289.111   733.05
Range                 105           120          90        115        90
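The raw start-up cost data are not reproduced here, so the measures above cannot be recomputed directly. As a minimal sketch of how each measure is obtained, the following uses a small hypothetical list of costs (`sample_costs` is made up for illustration and is not one of the five data sets). It assumes the sample variance (divisor n - 1) and returns `None` for the mode when no value repeats, which is the case reported as #N/A above.

```python
import statistics
from collections import Counter


def describe(values):
    """Return the summary measures asked for in part 1 for one list of costs."""
    # Most frequent value and how often it occurs.
    value, count = Counter(values).most_common(1)[0]
    return {
        "mean": statistics.mean(values),
        "median": statistics.median(values),
        # No repeated value means there is no mode (shown as #N/A in the table).
        "mode": value if count > 1 else None,
        "range": max(values) - min(values),
        # Sample variance and standard deviation (divisor n - 1).
        "variance": statistics.variance(values),
        "std_dev": statistics.stdev(values),
    }


# Hypothetical data, for illustration only.
sample_costs = [35, 45, 35, 80, 100]
summary = describe(sample_costs)

for name, result in summary.items():
    print(f"{name:>8}: {result}")
```

Calling `describe` once per business would give one column of the table.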

2). Construct for every type of business:

a). Frequency and relative frequency distributions

Frequency Distributions For X1:

Bin        Frequency   Relative Frequency
0-30       0           0.0000
31-60      4           0.3077
61-90      4           0.3077
91-120     3           0.2308
121-150    2           0.1538
More       0           0.0000

Frequency Distributions For X2:

Bin        Frequency   Relative Frequency
0-30       0           0
31-60      3           0.272727
61-90      4           0.363636
91-120     2           0.181818
121-150    1           0.090909
151-180    1           0.090909
More       0           0

 

Frequency Distributions For X3:

Bin        Frequency   Relative Frequency
0-30       0           0.0
31-60      4           0.4
61-90      3           0.3
91-120     2           0.2
121-150    1           0.1
More       0           0.0

Frequency Distributions For X4:

Bin        Frequency   Relative Frequency
0-30       0           0.0
31-60      3           0.3
61-90      1           0.1
91-120     5           0.5
121-150    1           0.1
151-180    0           0.0
More       0           0.0

Frequency Distributions For X5:

Bin        Frequency   Relative Frequency
0-30       6           0.375
31-60      5           0.3125
61-90      4           0.25
91-120     1           0.0625
More       0           0
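Each relative frequency is the bin frequency divided by the total number of observations. The sketch below reproduces the X1 column from its bin counts; the function and variable names are illustrative only.

```python
def relative_frequencies(counts):
    """Divide each bin count by the total number of observations."""
    total = sum(counts)
    return [count / total for count in counts]


# Bin labels and frequencies taken from the X1 table.
bins = ["0-30", "31-60", "61-90", "91-120", "121-150"]
x1_counts = [0, 4, 4, 3, 2]
x1_relative = relative_frequencies(x1_counts)

for label, count, share in zip(bins, x1_counts, x1_relative):
    print(f"{label:<8} {count:>3} {share:.4f}")
```

The same function applies to the other four businesses by swapping in their bin counts.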

b). Relative frequency histogram

[Relative frequency histograms for X1, X2, X3, X4 and X5: figures not reproduced]

3). Discuss the results achieved in the 1st and 2nd part

The first part shows that business X2 has the highest average start-up cost (92.09), whereas X5 has the lowest (51.63). X2 also has the largest range (120), which suggests that its mean is less representative of the data because the individual values vary widely. The businesses with the smallest range are X3 and X5 (both 90), so their means are more representative of the data. In addition, X2 has the highest standard deviation (38.89) and variance (1512.69), which confirms a wide spread around the mean caused by unusually high or low values (outliers). Business X5 has the smallest standard deviation (27.07) and variance (733.05), so its values lie closest to the mean in absolute terms.

The frequency and relative frequency distributions in the second part show that the start-up costs of business X2 are widely spread because of outliers. The X2 distribution is skewed to the right: most values fall in the lower bins, with a few larger ones that pull the mean (92.09) above the median (87). These outliers therefore affect the results of the analysis through their impact on the mean. By contrast, the X5 data lie closer to the mean of the data set (mean 51.63, median 49), so its distribution is nearer to normal.

4). Check whether there is a significant difference in the starting costs of these businesses.

Anova: Single Factor

SUMMARY
Groups      Count   Sum    Average    Variance
Column 1    13      1079   83         1165.166667
Column 2    11      1013   92.09091   1512.690909
Column 3    10      723    72.3       983.7888889
Column 4    10      870    87         1289.111111
Column 5    16      826    51.625     733.05

ANOVA
Source of Variation   SS         df   MS         F            P-value    F crit
Between Groups        14298.22   4    3574.556   3.24633618   0.018391   2.539689
Within Groups         60560.76   55   1101.105
Total                 74858.98   59

From the table above, the F value (3.246) is greater than the critical value of F (2.540), and the p-value (0.018) is below the 0.05 significance level. A p-value below 0.05 indicates a significant difference between the group means, so there is a significant difference in the starting costs of these types of businesses.
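The ANOVA figures can be checked from the SUMMARY rows alone, since the between-group and within-group sums of squares depend only on each group's count, mean and variance. The following is a rough sketch of that check, not the Excel procedure itself; the function name is illustrative, and the critical value is copied from the F crit column rather than computed.

```python
# (count, average, variance) for Columns 1 to 5, taken from the SUMMARY table.
groups = [
    (13, 83.0, 1165.166667),
    (11, 92.09091, 1512.690909),
    (10, 72.3, 983.7888889),
    (10, 87.0, 1289.111111),
    (16, 51.625, 733.05),
]
F_CRIT = 2.539689  # copied from the ANOVA table, not computed here


def one_way_anova(groups):
    """Return (SS between, SS within, df between, df within, F)."""
    n_total = sum(n for n, _, _ in groups)
    grand_mean = sum(n * mean for n, mean, _ in groups) / n_total
    # Between groups: weighted squared distance of each mean from the grand mean.
    ss_between = sum(n * (mean - grand_mean) ** 2 for n, mean, _ in groups)
    # Within groups: each sample variance scaled back up by its n - 1.
    ss_within = sum((n - 1) * var for n, _, var in groups)
    df_between = len(groups) - 1
    df_within = n_total - len(groups)
    f_stat = (ss_between / df_between) / (ss_within / df_within)
    return ss_between, ss_within, df_between, df_within, f_stat


ss_between, ss_within, df_between, df_within, f_stat = one_way_anova(groups)
print(f"SS between = {ss_between:.2f}, SS within = {ss_within:.2f}")
print(f"F = {f_stat:.4f}, exceeds F crit: {f_stat > F_CRIT}")
```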

Task 2

1). Represent the output obtained from MS Excel and also write the estimated equation of regression

SUMMARY OUTPUT

Regression Statistics
Multiple R           0.996584
R Square             0.993179
Adjusted R Square    0.991556
Standard Error       17.64924
Observations         27

ANOVA
              df   SS            MS         F             Significance F
Regression    5    952538.9415   190507.8   611.5903672   5.4E-22
Residual      21   6541.410344   311.4957
Total         26   959080.3519

                Coefficients   Standard Error   t Stat    P-value   Lower 95%   Upper 95%   Lower 95.0%   Upper 95.0%
Intercept       -18.8594       30.1502          -0.6255   0.538     -81.5602    43.8414     -81.5602      43.8414
X Variable 1    16.2016        3.5444           4.5710    0.000     8.8305      23.5726     8.8305        23.5726
X Variable 2    0.1746         0.0576           3.0315    0.006     0.0548      0.2944      0.0548        0.2944
X Variable 3    11.5263        2.5321           4.5521    0.000     6.2605      16.7921     6.2605        16.7921
X Variable 4    13.5803        1.7705           7.6705    0.000     9.8984      17.2622     9.8984        17.2622
X Variable 5    -5.3110        1.7054           -3.1142   0.005     -8.8576     -1.7643     -8.8576       -1.7643

 

Sales = 16.20*area + 0.17*inventory + 11.53*advertising spending + 13.58*size of sales district - 5.31*number of competing stores - 18.86

2). Explain how well the model fits the data

R-squared measures how well the model fits the data, that is, how close the observations lie to the fitted regression line. An R-squared of 0% means that the regression model explains none of the variability of the data around the mean, whereas a value near 100% means that it explains almost all of that variability. In the output above, R-squared is 0.9932 (99.32%), so the model explains about 99.3% of the variation in sales and fits the data very well.
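R-squared and adjusted R-squared can be recovered from the ANOVA block of the regression output. The sketch below uses the sums of squares, the number of observations (27) and the number of predictors (5) from that output; the variable names are illustrative.

```python
# Values taken from the regression output (ANOVA block).
ss_regression = 952538.9415
ss_residual = 6541.410344
n_obs = 27
n_predictors = 5

ss_total = ss_regression + ss_residual

# Share of the total variation in sales explained by the model.
r_squared = ss_regression / ss_total

# Adjusted R-squared penalises the number of predictors.
adjusted_r_squared = 1 - (1 - r_squared) * (n_obs - 1) / (n_obs - n_predictors - 1)

print(f"R Square          = {r_squared:.6f}")
print(f"Adjusted R Square = {adjusted_r_squared:.6f}")
```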

3). Test the hypothesis which says that there is no significant relationship between any of independent and dependent variables.

There is no significant relationship between the area and the dependent variable (annual sales)

For this variable, the p-value is 0.000 (to three decimal places), which is < 0.05. The sample therefore provides enough evidence to reject the null hypothesis for the population.

There is no significant relationship between the inventory and the dependent variable (annual sales)

For this variable, the p-value is 0.006, which is < 0.05, so the null hypothesis is rejected.

There is no significant relationship between the advertising spending and the dependent variable (annual sales)

For this variable, the p-value is 0.000, which is < 0.05, so the null hypothesis is rejected.

There is no significant relationship between the size of sales district and the dependent variable (annual sales)

For this variable, the p-value is 0.000, which is < 0.05, so the null hypothesis is rejected.

There is no significant relationship between the number of competing stores and the dependent variable (annual sales)

For this variable, the p-value is 0.005, which is < 0.05, so the null hypothesis is rejected.
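The p-values in the regression output come from t statistics, each of which is the coefficient divided by its standard error. As a rough check, the sketch below recomputes them from the coefficient table of the five-variable model and compares their absolute values with 2.0796. That figure is the two-tailed 5% critical value for 21 residual degrees of freedom, taken from standard t tables; it is an assumption of this sketch and does not appear in the Excel output.

```python
# Two-tailed 5% critical value of t for 21 degrees of freedom (standard tables).
T_CRITICAL = 2.0796

# (coefficient, standard error) from the five-variable regression output.
estimates = {
    "area": (16.2016, 3.5444),
    "inventory": (0.1746, 0.0576),
    "advertising spending": (11.5263, 2.5321),
    "size of sales district": (13.5803, 1.7705),
    "number of competing stores": (-5.3110, 1.7054),
}

# t statistic = coefficient / standard error.
t_stats = {name: coef / se for name, (coef, se) in estimates.items()}

# Reject the null hypothesis of a zero slope when |t| exceeds the critical value.
rejected = {name: abs(t) > T_CRITICAL for name, t in t_stats.items()}

for name, t in t_stats.items():
    print(f"{name:<28} t = {t:8.4f}  reject H0: {rejected[name]}")
```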

4). Interpret coefficients of individual slope

Variable                      Slope    Interpretation
Area                          16.20    The rate of change of the conditional mean of sales with respect to area is about 16.20.
Advertising spending          11.53    The rate of change of the conditional mean of sales with respect to advertising spending is about 11.53.
Inventory                     0.17     The rate of change of the conditional mean of sales with respect to inventory is about 0.17.
Number of competing stores    -5.31    The rate of change of the conditional mean of sales with respect to number of competing stores is about -5.31.
Size of sales district        13.58    The rate of change of the conditional mean of sales with respect to size of sales district is about 13.58.

From the table above, the largest change in sales per unit comes from store area, followed by the size of the sales district and advertising spending. The number of competing stores has a negative effect on sales, and inventory has the smallest effect on the firm's sales volume.

5). Construct a 95% confidence interval for the slope coefficients of individual variables

Variable                      Lower 95.0%   Upper 95.0%
Intercept                     -81.5602      43.8414
Area                          8.8305        23.5726
Inventory                     0.0548        0.2944
Advertising spending          6.2605        16.7921
Size of sales district        9.8984        17.2622
Number of competing stores    -8.8576       -1.7643
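Each interval above is the coefficient plus or minus the critical t value times the standard error. A minimal sketch for the area coefficient follows, again assuming the tabulated critical value 2.0796 for 21 residual degrees of freedom (27 observations minus 5 slopes and the intercept); the function name is illustrative.

```python
def confidence_interval(coefficient, standard_error, t_critical=2.0796):
    """Return (lower, upper) bounds of a two-sided 95% interval.

    t_critical defaults to the two-tailed 5% value for 21 degrees of
    freedom, taken from standard t tables (an assumption of this sketch).
    """
    margin = t_critical * standard_error
    return coefficient - margin, coefficient + margin


# Coefficient and standard error for area from the regression output.
area_lower, area_upper = confidence_interval(16.2016, 3.5444)
print(f"Area: {area_lower:.4f} to {area_upper:.4f}")

# Number of competing stores: both bounds stay below zero.
stores_lower, stores_upper = confidence_interval(-5.3110, 1.7054)
print(f"Competing stores: {stores_lower:.4f} to {stores_upper:.4f}")
```

An interval that excludes zero, as both of these do, agrees with rejecting the null hypothesis of a zero slope at the 5% level.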

 

6). Test the estimated slope coefficients for individual variables for significance

If the absolute value of the t-statistic is larger than the critical value of t, the null hypothesis is rejected. If the absolute value is smaller than the critical value of t, the null hypothesis is accepted.

Variables                  t-stat    t-critical   Reject or accept   Significance
Area                       7.7354    2.0555       Rejected           Statistically significant
Size of sales market       7.6869    2.0555       Rejected           Statistically significant
Inventory                  -8.2877   2.0555       Accepted           Not significant
No. of competing stores    7.3719    2.0555       Rejected           Statistically significant
Adv. spending              7.6716    2.0555       Rejected           Statistically significant

According to the table above, inventory is not a significant variable in the model.

7). Re-estimate the model and also remove all insignificant variables

After removing inventory, the insignificant variable, the regression output is as follows:

SUMMARY OUTPUT

Regression Statistics
Multiple R           0.9950852
R Square             0.9901946
Adjusted R Square    0.9884118
Standard Error       20.675118
Observations         27

ANOVA
              df   SS            MS         F             Significance F
Regression    4    949676.2208   237419.1   555.4175271   9.58E-22
Residual      22   9404.131054   427.4605
Total         26   959080.3519

                Coefficients   Standard Error   t Stat     P-value       Lower 95%   Upper 95%   Lower 95.0%    Upper 95.0%
Intercept       -39.46002      34.41055873      -1.14674   0.263807827   -110.823    31.90311    -110.8231527   31.90310863
X Variable 1    20.443887      3.814801407      5.359096   2.21824E-05   12.53247    28.3553     12.53247286    28.35530058
X Variable 2    16.966143      2.092787626      8.106959   4.73185E-08   12.62597    21.30632    12.62596687    21.30631862
X Variable 3    15.672962      1.90985556       8.20636    3.85791E-08   11.71216    19.63376    11.7121639     19.63375988
X Variable 4    -4.043301      1.936828415      -2.08759   0.048629066   -8.06004    -0.02657    -8.060037553   -0.026565015

8). Using the model from part 7, predict annual sales for a franchisee with 1,000 sq ft floor area, $150,000 inventory, $5,000 spent on advertising, 5,000 families in the area of operation and 2 competitors.

The re-estimated model in question 7 has four independent variables, so the equation takes the form

Y = m1 X1 + m2 X2 + m3 X3 + m4 X4 + C

Sales = 20.44*area + 16.97*advertising spending + 15.67*size of sales district - 4.04*number of competing stores - 39.46

= 20.44*1000 + 16.97*5000 + 15.67*5000 - 4.04*2 - 39.46

= 20440 + 84850 + 78350 - 8.08 - 39.46

= 183592.46

= 183.59 /$1000
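The substitution can be sketched as a small function. The coefficients are rounded to two decimals from the four-variable regression output, with the sign of each taken from that table (the coefficient for competing stores there is -4.0433). The inputs are entered in the same raw units as in the substitution above; whether the underlying data were recorded in those units is not stated, so treat this as an illustration of the arithmetic only. The names below are illustrative, and inventory is omitted because it is not part of the re-estimated model.

```python
# Rounded coefficients from the re-estimated (four-variable) model.
COEFFICIENTS = {
    "area": 20.44,
    "advertising": 16.97,
    "district_size": 15.67,
    "competitors": -4.04,
}
INTERCEPT = -39.46


def predict_sales(area, advertising, district_size, competitors):
    """Evaluate the fitted equation for one franchisee."""
    inputs = {
        "area": area,
        "advertising": advertising,
        "district_size": district_size,
        "competitors": competitors,
    }
    return INTERCEPT + sum(COEFFICIENTS[name] * value for name, value in inputs.items())


# Inputs in the raw units used in the substitution (an assumption of this sketch).
prediction = predict_sales(area=1000, advertising=5000, district_size=5000, competitors=2)
print(f"Predicted sales: {prediction:.2f}")
```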

 

 

 

 

 

 
