Statistics Assignment

Task 1

Answer 1: Mean, Mode, Variance and Standard Deviation

The following data represent business startup costs (thousands of dollars) for shops.

X1 = startup costs for pizza

X2 = startup costs for bakery/donuts

X3 = startup costs for shoe stores

X4 = startup costs for gift shops

X5 = startup costs for pet stores

Statistics            X1            X2            X3            X4            X5
Mean                  83            92.0909091    72.3          87            51.63
Mode                  35            #N/A          #N/A          100           30
Median                80            87            70            97.5          49
Maxima                140           160           125           150           110
Minima                35            40            35            35            20
Range                 105           120           90            115           90
Variance              1165.166667   1512.690909   983.7888889   1289.111111   733.05
Standard Deviation    34.13453774   38.89332731   31.36540911   35.9041935    27.0749
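
The rows of the table above can also be computed outside Excel. The following is a minimal Python sketch, assuming sample (n - 1) variance. The raw startup costs are not reproduced in this write-up, so the `sample` list is hypothetical, and the name `describe` is chosen only for this illustration.

```python
import math
import statistics
from collections import Counter


def describe(values):
    """Summary statistics matching the rows of the table (sample variance, n - 1)."""
    counts = Counter(values)
    top = max(counts.values())
    # Like a spreadsheet #N/A: report no mode when no value repeats.
    # Picking the smallest tied value is an arbitrary choice for this sketch.
    mode = None if top == 1 else min(v for v, c in counts.items() if c == top)
    variance = statistics.variance(values)  # divides by n - 1
    return {
        "mean": statistics.mean(values),
        "mode": mode,
        "median": statistics.median(values),
        "max": max(values),
        "min": min(values),
        "range": max(values) - min(values),
        "variance": variance,
        "std_dev": math.sqrt(variance),
    }


# Hypothetical startup costs (thousands of dollars), NOT the assignment data.
sample = [20, 30, 30, 40, 60]
summary = describe(sample)
for name, value in summary.items():
    print(f"{name:>8}: {value}")
```

The standard deviation is the square root of the variance, which is also how the last two rows of the table relate to each other.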

Answer 2: For each business type, construct

  1. a) frequency and relative frequency distributions

X1 Business (Frequency distribution)

Bin     Frequency
25      0
50      3
75      2
100     4
125     3
150     1
More    0

X2 Business (Frequency distribution)

Bin     Frequency
25      0
50      2
75      2
100     4
125     1
150     1
More    1

X3 Business (Frequency distribution)

Bin     Frequency
25      0
50      4
75      2
100     2
125     2
150     0
More    0

X4 Business (Frequency distribution)

Bin     Frequency
25      0
50      3
75      1
100     4
125     1
150     1
More    0

X5 Business (Frequency distribution)

Bin     Frequency
25      3
50      6
75      4
100     2
125     1
150     0
More    0
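
The binning behind these tables can be sketched in Python. This is a minimal illustration, assuming each bin value is an inclusive upper bound (a value is counted in the first bin whose bound it does not exceed) and that "More" collects everything above 150. The `sample` values are hypothetical because the raw data are not listed here.

```python
from bisect import bisect_left

BINS = [25, 50, 75, 100, 125, 150]  # upper bounds used in the tables above


def frequency_distribution(values, bins=BINS):
    """Count values per bin; the last slot is the 'More' overflow bin."""
    counts = [0] * (len(bins) + 1)
    for v in values:
        # bisect_left returns the first bin whose upper bound is >= v,
        # so a value equal to a bound falls in that bound's bin.
        counts[bisect_left(bins, v)] += 1
    return counts


# Hypothetical costs (thousands of dollars), NOT the assignment data.
sample = [35, 40, 80, 100, 140, 160]
labels = [str(b) for b in BINS] + ["More"]
for label, count in zip(labels, frequency_distribution(sample)):
    print(f"{label:<5} {count}")
```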

  1. b) A relative frequency histogram

X1 business (relative frequency distribution)

Bin     Relative frequency
25      0
50      0.230769231
75      0.153846154
100     0.307692308
125     0.230769231
150     0.076923077
More    0

X2 business (relative frequency distribution)

Bin     Relative frequency
25      0
50      0.181818182
75      0.181818182
100     0.363636364
125     0.090909091
150     0.090909091
More    0.090909091

X3 business (relative frequency distribution)

Bin     Relative frequency
25      0
50      0.4
75      0.2
100     0.2
125     0.2
150     0
More    0

X4 business (relative frequency distribution)

Bin     Relative frequency
25      0
50      0.3
75      0.1
100     0.4
125     0.1
150     0.1
More    0

X5 business (relative frequency distribution)

Bin     Relative frequency
25      0.1875
50      0.375
75      0.25
100     0.125
125     0.0625
150     0
More    0
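
Each relative frequency is the bin count divided by the total number of observations for that business. A short Python sketch, using the X1 and X5 counts from the frequency tables in part a) (the function name is chosen for this illustration only):

```python
# Frequencies copied from the X1 and X5 tables in part a) (bins 25..150, then More).
frequencies = {
    "X1": [0, 3, 2, 4, 3, 1, 0],
    "X5": [3, 6, 4, 2, 1, 0, 0],
}


def relative_frequencies(counts):
    """Divide each bin count by the total number of observations."""
    total = sum(counts)
    return [c / total for c in counts]


relative = {name: relative_frequencies(c) for name, c in frequencies.items()}
for name, shares in relative.items():
    print(name, [round(s, 9) for s in shares])
```

The relative frequencies of each business add up to 1, which is a quick check on the tables.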

Answer 3: Discussing result obtained from parts (answer) 1 and 2

From the calculations in Answer 1, X2 businesses (bakery/donuts) have the highest mean startup cost (92.09) and X5 businesses (pet stores) the lowest (51.63), so X2 spends more on starting up than X5. X2 also has the largest spread of the five businesses (range 120, standard deviation 38.89), which indicates that its mean is less representative of the individual values.

A large spread suggests that the individual startup costs within a business type differ widely from one another. X3 and X5 have the smallest range (90), so their startup costs vary less than those of the other businesses. In contrast, X2 contains both low and high values (40 to 160), so its data are widely spread around the mean.

In Answer 2, the frequency and relative frequency distributions of X2 are also widely spread, and X2 is the only business with a value in the "More" bin (above 150). Most X2 values are close to the average, with a few small and a few large ones. The tail toward the high values, together with a mean (92.09) above the median (87), indicates that the X2 distribution is skewed slightly to the right.

The shape of a distribution shows how well the average (mean) represents the data. The X5 data set is roughly normally distributed, with its mean (51.63) close to its median (49).

Answer 4: Testing a significant difference in starting cost of business

Anova: Single Factor

SUMMARY
Groups      Count   Sum     Average     Variance
Column 1    13      1079    83          1165.167
Column 2    11      1013    92.09091    1512.691
Column 3    10      723     72.3        983.7889
Column 4    10      870     87          1289.111
Column 5    16      826     51.625      733.05

ANOVA
Source of Variation   SS          df    MS          F           P-value     F crit
Between Groups        14298.22    4     3574.556    3.246336    0.018391    2.539689
Within Groups         60560.76    55    1101.105
Total                 74858.98    59

The ANOVA table tests whether the mean startup cost differs between the five types of business. The null hypothesis is that all five means are equal. The F value (3.2463) is greater than F crit (2.5397), and the P-value (0.0184) is less than 0.05, so the null hypothesis is rejected at the 5% significance level.

So, it can be stated that there is a significant difference in the starting cost among these types of business.
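
The F value can be rebuilt from the SUMMARY figures alone. The sketch below is a minimal Python check, assuming the reported variances are sample variances. It only recomputes F and compares it with the F crit value already given in the table; it does not compute the P-value.

```python
# (count, sum, variance) per group, copied from the SUMMARY table above.
groups = [
    (13, 1079, 1165.167),
    (11, 1013, 1512.691),
    (10, 723, 983.7889),
    (10, 870, 1289.111),
    (16, 826, 733.05),
]

n_total = sum(n for n, _, _ in groups)
grand_mean = sum(s for _, s, _ in groups) / n_total

# Between-groups sum of squares: spread of group means around the grand mean.
ss_between = sum(n * (s / n - grand_mean) ** 2 for n, s, _ in groups)
# Within-groups sum of squares: each sample variance times its (n - 1).
ss_within = sum((n - 1) * var for n, _, var in groups)

df_between = len(groups) - 1
df_within = n_total - len(groups)
f_stat = (ss_between / df_between) / (ss_within / df_within)

F_CRIT = 2.539689  # critical value reported in the ANOVA table
print(f"F = {f_stat:.4f}, reject H0: {f_stat > F_CRIT}")
```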

Task 2

All Greens Franchise

The data (X1, X2, X3, X4, X5, X6) are for each franchise store.

X1 = annual net sales/$1000

X2 = number sq. ft./1000

X3 = inventory/$1000

X4 = amount spent on advertising/$1000

X5 = size of sales district/1000 families

X6 = number of competing stores in district

Answer 1: MS Excel output and estimated regression equation

SUMMARY OUTPUT

Regression Statistics
Multiple R           0.996583914
R Square             0.993179497
Adjusted R Square    0.991555568
Standard Error       17.64924165
Observations         27

ANOVA
              df    SS             MS             F              Significance F
Regression    5     952538.9415    190507.7883    611.5903672    5.3973E-22
Residual      21    6541.410344    311.4957306
Total         26    959080.3519

                Coefficients   Standard Error   t Stat    P-value   Lower 95%   Upper 95%   Lower 95.0%   Upper 95.0%
Intercept       -18.8594       30.1502          -0.6255   0.538     -81.5602    43.8414     -81.5602      43.8414
X Variable 1    16.2016        3.5444           4.5710    0.000     8.8305      23.5726     8.8305        23.5726
X Variable 2    0.1746         0.0576           3.0315    0.006     0.0548      0.2944      0.0548        0.2944
X Variable 3    11.5263        2.5321           4.5521    0.000     6.2605      16.7921     6.2605        16.7921
X Variable 4    13.5803        1.7705           7.6705    0.000     9.8984      17.2622     9.8984        17.2622
X Variable 5    -5.3110        1.7054           -3.1142   0.005     -8.8576     -1.7643     -8.8576       -1.7643

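Regression Equation:

Y = m1*X1 + m2*X2 + m3*X3 + m4*X4 + m5*X5 + C

Y = annual sales ($1000). Here X1 to X5 are the five independent variables in the order of the Excel output (X Variable 1 to X Variable 5), m1 to m5 are their slope coefficients and C is the intercept.

Y = 16.20*area + 0.17*inventory + 11.53*advertising spending + 13.58*size of sales district - 5.31*number of competing stores - 18.86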

Answer 2: Determining how well the model fits the data

R-squared is used to measure how well the model fits the data. It indicates how close the data are to the fitted regression line.

An R-squared of 0% means that the regression model explains none of the variability of the response around its mean. An R-squared of 100% means that the model explains all of that variability. The output above shows an R-squared of 0.993179 (99.32%), which means that the model fits the data well.
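
The fit statistics in the Excel output follow directly from the sums of squares in its ANOVA block. A minimal Python sketch of that arithmetic, using only the values reported in Answer 1 (variable names are chosen for this illustration):

```python
# Values copied from the regression ANOVA block in Answer 1.
ss_regression = 952538.9415
ss_residual = 6541.410344
n_obs = 27        # observations
n_predictors = 5  # independent variables in the model

ss_total = ss_regression + ss_residual
df_residual = n_obs - n_predictors - 1

# Share of the total variability explained by the regression.
r_squared = ss_regression / ss_total
# Adjusted R-squared penalises the number of predictors.
adj_r_squared = 1 - (1 - r_squared) * (n_obs - 1) / df_residual
# Standard error of the regression is the root of the residual mean square.
std_error = (ss_residual / df_residual) ** 0.5

print(f"R Square          = {r_squared:.6f}")
print(f"Adjusted R Square = {adj_r_squared:.6f}")
print(f"Standard Error    = {std_error:.4f}")
```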

Answer 3: Testing the hypothesis (no significant relationship between the dependent and any independent variables)

The hypothesis test checks whether there is a significant relationship between the dependent variable and each independent variable. The null hypothesis is that there is no significant relationship. If the P-value is less than 0.05, the null hypothesis is rejected and the variable has a significant relationship with annual sales. If the P-value is greater than 0.05, the null hypothesis is accepted (not rejected) and no significant relationship is found.

Dependent and independent variables        P-value   Null hypothesis (Rejected or Accepted)
Annual sales and area                      0.000     Rejected
Annual sales and inventory                 0.006     Rejected
Annual sales and advertising spending      0.000     Rejected
Annual sales and size of sales district    0.000     Rejected
Annual sales and competing stores          0.005     Rejected

Answer 4: Interpret individual slope coefficients

Variable                     Slope      Interpretation
Area                         16.2016    The rate of change of the mean value of sales with respect to area is 16.2016.
Inventory                    0.1746     The rate of change of the mean value of sales with respect to inventory is 0.1746.
Advertising spending         11.5263    The rate of change of the mean value of sales with respect to advertising spending is 11.5263.
Size of sales district       13.5803    The rate of change of the mean value of sales with respect to size of sales district is 13.5803.
Number of competing stores   -5.3110    The rate of change of the mean value of sales with respect to the number of competing stores is -5.3110.

From the table above, area, size of sales district and advertising spending have the largest slopes, so a one-unit change in each of them is associated with the largest change in sales, while inventory has the smallest impact per unit. The number of competing stores is the only variable with a negative slope: each additional competing store is associated with a decrease of 5.3110 ($1000) in annual sales. Each slope is interpreted with the other variables held constant.

Answer 5: Construct a 95% confidence interval for the slope coefficients of individual variables

Variable                 Lower 95.0%   Upper 95.0%
Intercept                -81.5602      43.8414
Area                     8.8305        23.5726
Inventory                0.0548        0.2944
Advertising spending     6.2605        16.7921
Size (sales district)    9.8984        17.2622
Competing stores         -8.8576       -1.7643
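
Each interval is the coefficient plus or minus the critical t value times its standard error. The Python sketch below reproduces the limits from the coefficients and standard errors in Answer 1. The critical value 2.0796 (two-tailed, 95%, 21 residual degrees of freedom) is an assumption taken from a standard t-table, since the Excel output does not print it.

```python
# Two-tailed 95% critical t for 21 degrees of freedom (assumed, from a t-table).
T_CRIT = 2.0796

# (coefficient, standard error) copied from the regression output in Answer 1.
estimates = {
    "Area": (16.2016, 3.5444),
    "Inventory": (0.1746, 0.0576),
    "Advertising spending": (11.5263, 2.5321),
    "Size (sales district)": (13.5803, 1.7705),
    "Competing stores": (-5.3110, 1.7054),
}


def confidence_interval(coef, std_err, t_crit=T_CRIT):
    """Return (lower, upper) limits: coef -/+ t_crit * std_err."""
    margin = t_crit * std_err
    return coef - margin, coef + margin


intervals = {name: confidence_interval(*pair) for name, pair in estimates.items()}
for name, (lower, upper) in intervals.items():
    print(f"{name:<22} {lower:9.4f} {upper:9.4f}")
```

An interval that does not contain zero, as is the case for all five slopes here, agrees with a P-value below 0.05 for that coefficient.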

Answer 6: Testing the estimated slope coefficients to determine the significance of individual variables

To identify the significance of an individual variable, a hypothesis test is carried out on its estimated slope coefficient. If the absolute value of the t-statistic is greater than the critical value, the null hypothesis is rejected. If it is less than the critical value, the null hypothesis is accepted.

t-Test: Paired Two Sample for Means (Area)

                                Sales          Area
Mean                            286.5740741    3.32592593
Variance                        36887.70584    4.044301983
Observations                    27             27
Pearson Correlation             0.894092081
Hypothesized Mean Difference    0
df                              26
t Stat                          7.735497275
P(T<=t) one-tail                1.65162E-08
t Critical one-tail             1.705617901
P(T<=t) two-tail                3.30324E-08
t Critical two-tail             2.055529418

t-Test: Paired Two Sample for Means (Inventory)

                                Sales          Inventory
Mean                            286.574074     387.4815
Variance                        36887.7058     36545.11
Observations                    27             27
Pearson Correlation             0.94550363
Hypothesized Mean Difference    0
df                              26
t Stat                          -8.28771958
P(T<=t) one-tail                4.5308E-09
t Critical one-tail             1.7056179
P(T<=t) two-tail                9.0617E-09
t Critical two-tail             2.05552942

t-Test: Paired Two Sample for Means (Advertising spending)

                                Sales          Advertising spending
Mean                            286.5741       8.099999982
Variance                        36887.71       14.24692313
Observations                    27             27
Pearson Correlation             0.914024
Hypothesized Mean Difference    0
df                              26
t Stat                          7.671559
P(T<=t) one-tail                1.92E-08
t Critical one-tail             1.705618
P(T<=t) two-tail                3.85E-08
t Critical two-tail             2.055529

t-Test: Paired Two Sample for Means (Size)

                                Sales          Size
Mean                            286.5741       9.692593
Variance                        36887.71       26.41994
Observations                    27             27
Pearson Correlation             0.953683
Hypothesized Mean Difference    0
df                              26
t Stat                          7.686851
P(T<=t) one-tail                1.85E-08
t Critical one-tail             1.705618
P(T<=t) two-tail                3.71E-08
t Critical two-tail             2.055529

t-Test: Paired Two Sample for Means (Competing stores)

                                Sales          No. of competing stores
Mean                            286.5741       7.740741
Variance                        36887.71       23.96866
Observations                    27             27
Pearson Correlation             -0.91224
Hypothesized Mean Difference    0
df                              26
t Stat                          7.371908
P(T<=t) one-tail                3.96E-08
t Critical one-tail             1.705618
P(T<=t) two-tail                7.91E-08
t Critical two-tail             2.055529

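Variables               t-stat         t-critical   Accepted or Rejected   Significance of individual variables
Area                    7.735497275    2.055529     Rejected               Statistically significant
Inventory               -8.28771958    2.055529     Rejected               Statistically significant
Advertising spending    7.671559       2.055529     Rejected               Statistically significant
Size                    7.686851       2.055529     Rejected               Statistically significant
Competing stores        7.371908       2.055529     Rejected               Statistically significant

Comparing absolute values, the t-statistic of every variable, including inventory (|-8.2877| = 8.2877), exceeds the critical value of 2.0555, so all five null hypotheses are rejected in these tests. Inventory is the only variable with a negative t-statistic, and in the regression output of Answer 1 it has the smallest slope t-statistic (3.0315), which makes it the weakest of the five variables in this model.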
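
The t Stat of a paired test can be recovered from the summary figures that Excel prints. The Python sketch below does this for the Area test, assuming the usual identity for the variance of a difference of two correlated variables. The function name is hypothetical and chosen only for this illustration.

```python
import math


def paired_t_from_summary(mean_a, mean_b, var_a, var_b, corr, n):
    """Paired t statistic from summary figures.

    Uses Var(A - B) = Var(A) + Var(B) - 2 * corr * sd(A) * sd(B).
    """
    var_diff = var_a + var_b - 2 * corr * math.sqrt(var_a * var_b)
    std_err = math.sqrt(var_diff / n)
    return (mean_a - mean_b) / std_err


# Figures copied from the "Area" paired t-test output above.
t_area = paired_t_from_summary(
    mean_a=286.5740741, mean_b=3.32592593,
    var_a=36887.70584, var_b=4.044301983,
    corr=0.894092081, n=27,
)

T_CRITICAL_TWO_TAIL = 2.055529418  # reported in the same output
print(f"t = {t_area:.4f}, reject H0: {abs(t_area) > T_CRITICAL_TWO_TAIL}")
```

The comparison uses the absolute value of t, as the decision rule in this answer requires.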

Answer 7: Removing the least significant variable (inventory) and re-estimating the model by regression analysis

SUMMARY OUTPUT

Regression Statistics
Multiple R           0.995085241
R Square             0.990194637
Adjusted R Square    0.988411844
Standard Error       20.67511795
Observations         27

ANOVA
              df    SS             MS             F              Significance F
Regression    4     949676.2208    237419.0552    555.4175271    9.5799E-22
Residual      22    9404.131054    427.4605024
Total         26    959080.3519

                Coefficients    Standard Error   t Stat          P-value        Lower 95%      Upper 95%   Lower 95.0%   Upper 95.0%
Intercept       -39.46002205    34.41055873      -1.146741683    0.263807827    -110.823153    31.90311    -110.823      31.90311
X Variable 1    20.444          3.814801407      5.359095937     2.21824E-05    12.5324729     28.3553     12.53247      28.3553
X Variable 2    16.966          2.092787626      8.10695865      4.73185E-08    12.6259669     21.30632    12.62597      21.30632
X Variable 3    15.673          1.90985556       8.206359798     3.85791E-08    11.7121639     19.63376    11.71216      19.63376
X Variable 4    -4.043301284    1.936828415      -2.087588788    0.048629066    -8.06003755    -0.02657    -8.06004      -0.02657
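
In this output, each t Stat is the coefficient divided by its standard error. The Python sketch below recomputes those ratios from the re-estimated output and compares them with a critical value. The value 2.0739 (two-tailed, 95%, 22 residual degrees of freedom) is an assumption taken from a standard t-table, not a figure printed by Excel.

```python
# Two-tailed 95% critical t for 22 degrees of freedom (assumed, from a t-table).
T_CRIT = 2.0739

# (coefficient, standard error) copied from the re-estimated output above.
coefficients = {
    "Intercept": (-39.46002205, 34.41055873),
    "X Variable 1": (20.444, 3.814801407),
    "X Variable 2": (16.966, 2.092787626),
    "X Variable 3": (15.673, 1.90985556),
    "X Variable 4": (-4.043301284, 1.936828415),
}

# t statistic of each term: coefficient divided by its standard error.
t_stats = {name: coef / se for name, (coef, se) in coefficients.items()}
# A term is significant at the 5% level when |t| exceeds the critical value.
significant = {name: abs(t) > T_CRIT for name, t in t_stats.items()}

for name, t in t_stats.items():
    print(f"{name:<13} t = {t:8.4f}  significant: {significant[name]}")
```

All four slopes pass this check, X Variable 4 only narrowly, which agrees with its P-value of 0.0486. The intercept does not.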

Answer 8: Using the above model to predict annual sales for a franchise with a floor area of 1,000 sq ft:

Families in the area of operation = 5,000

Competitors = 2

Inventory = $150,000

Advertising expenses = $5,000

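Equation: Annual sales = Y = m1*X1 + m2*X2 + m3*X3 + m4*X4 + C

The re-estimated model has four independent variables. Inventory was removed in Answer 7, so it does not enter the prediction.

m1 = 20.44 (area)

m2 = 16.97 (advertising spending)

m3 = 15.67 (size of sales district)

m4 = -4.04 (number of competing stores)

Intercept = -39.46

Area, advertising spending and size of sales district are measured in thousands, so area = 1 (1,000 sq ft), advertising spending = 5 ($5,000) and size of sales district = 5 (5,000 families). The number of competing stores is 2.

Annual sales = 20.44*area + 16.97*advertising spending + 15.67*size of sales district - 4.04*number of competing stores - 39.46

= 20.44*1 + 16.97*5 + 15.67*5 - 4.04*2 - 39.46

= 20.44 + 84.85 + 78.35 - 8.08 - 39.46

= 136.10 ($1000)

= $136,100 (approximately)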
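
The prediction can be written as a small Python function. This is a sketch under two assumptions: the inputs are converted to the units in which the variables are defined in Task 2 (thousands of sq ft, thousands of dollars, thousands of families), and the signs of the coefficients are taken from the re-estimated regression output in Answer 7. The function and constant names are chosen for this illustration only.

```python
# Coefficients rounded as in Answer 8, signs as in the Answer 7 output.
INTERCEPT = -39.46
COEF_AREA = 20.44         # per 1,000 sq ft
COEF_ADVERTISING = 16.97  # per $1,000 of advertising
COEF_DISTRICT = 15.67     # per 1,000 families
COEF_COMPETITORS = -4.04  # per competing store


def predict_annual_sales(area_sq_ft, advertising_usd, families, competitors):
    """Predicted annual net sales in thousands of dollars."""
    return (
        INTERCEPT
        + COEF_AREA * area_sq_ft / 1000
        + COEF_ADVERTISING * advertising_usd / 1000
        + COEF_DISTRICT * families / 1000
        + COEF_COMPETITORS * competitors
    )


prediction = predict_annual_sales(
    area_sq_ft=1000, advertising_usd=5000, families=5000, competitors=2
)
print(f"Predicted annual sales: {prediction:.2f} thousand dollars")
```

Inventory is not an argument because it is not part of the re-estimated model.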
