Statistics Assignment for Business Decisions

Task 1

1. Descriptive Statistics

The tables below show the descriptive statistics for all variables:

Income ($1000s):

Statistic             Value
Mean                  43.48
Standard Error        2.057785614
Median                42
Mode                  54
Standard Deviation    14.55074162
Sample Variance       211.7240816
Kurtosis              -1.247719422
Skewness              0.095855639
Range                 46
Minimum               21
Maximum               67
Sum                   2174
Count                 50

Household Size:

Statistic             Value
Mean                  3.42
Standard Error        0.245930138
Median                3
Mode                  2
Standard Deviation    1.738988681
Sample Variance       3.024081633
Kurtosis              -0.722808552
Skewness              0.527895977
Range                 6
Minimum               1
Maximum               7
Sum                   171
Count                 50

Amount Charged ($):

Statistic             Value
Mean                  3963.86
Standard Error        132.023387
Median                4090
Mode                  3890
Standard Deviation    933.5463219
Sample Variance       871508.7351
Kurtosis              -0.742482171
Skewness              -0.128860064
Range                 3814
Minimum               1864
Maximum               5678
Sum                   198193
Count                 50

Interpretation:

Based on the descriptive statistics, the average income of the consumers is approximately $43,480, and the average household size is between 3 and 4 (mean 3.42), with household sizes in the sample ranging from 1 to 7. In addition, the average amount charged by the credit card users is approximately $3,964.

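At the same time, the coefficient of variation (standard deviation divided by the mean) is about 33% for income, 24% for amount charged and 51% for household size, so all three variables show considerable spread around their means. In addition, the skewness values lie well within ±1 and the kurtosis values are negative (between -1.25 and -0.72), which indicates roughly symmetric distributions that are somewhat flatter than a normal distribution. The data for all variables are therefore acceptable for empirical use.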
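The spread measures discussed above can be recomputed from the reported means and standard deviations. The following is a minimal sketch, assuming only the summary figures from the tables (the raw data are not reproduced here); the function and variable names are illustrative.

```python
import math

# (mean, standard deviation, count) copied from the descriptive statistics tables
summary = {
    "Income": (43.48, 14.55074162, 50),
    "Household Size": (3.42, 1.738988681, 50),
    "Amount Charged": (3963.86, 933.5463219, 50),
}


def coefficient_of_variation(mean, sd):
    """Standard deviation expressed as a fraction of the mean."""
    return sd / mean


def standard_error(sd, n):
    """Standard error of the mean: sd / sqrt(n)."""
    return sd / math.sqrt(n)


# Map each variable to (coefficient of variation, standard error)
results = {
    name: (coefficient_of_variation(mean, sd), standard_error(sd, n))
    for name, (mean, sd, n) in summary.items()
}

for name, (cv, se) in results.items():
    print(f"{name}: CV = {cv:.1%}, standard error = {se:.4f}")
```

The standard errors computed this way should agree with the "Standard Error" rows in the tables, which is a quick consistency check on the reported output.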

The correlations between the variables are shown below:

Variables                          Correlation
Income - Household size            17.25%
Income - Amount charged            63.08%
Amount charged - Household size    75.28%

The above statistics show that there is a significant positive correlation between income and amount charged (63.08%) and between household size and amount charged (75.28%), whereas the correlation between income and household size is weak (17.25%).
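Whether a correlation of this size is statistically significant can be checked with the usual t test for a correlation coefficient, t = r·sqrt(n - 2) / sqrt(1 - r²). The sketch below assumes n = 50 (the count reported in the descriptive statistics) and uses an approximate two-tailed 5% critical value for 48 degrees of freedom taken from a standard t table; that critical value is an assumption, not a figure from the assignment.

```python
import math


def correlation_t_stat(r, n):
    """t statistic for H0: population correlation = 0, with n - 2 degrees of freedom."""
    return r * math.sqrt(n - 2) / math.sqrt(1 - r ** 2)


n = 50  # sample size from the descriptive statistics
correlations = {
    "Income vs household size": 0.1725,
    "Income vs amount charged": 0.6308,
    "Amount charged vs household size": 0.7528,
}

# Assumption: two-tailed critical t for 48 degrees of freedom at the 5% level
# (about 2.011 in a standard t table).
T_CRITICAL = 2.011

t_stats = {pair: correlation_t_stat(r, n) for pair, r in correlations.items()}

for pair, t in t_stats.items():
    verdict = "significant" if abs(t) > T_CRITICAL else "not significant"
    print(f"{pair}: t = {t:.2f} ({verdict})")
```

For a single predictor, this t statistic is the same quantity as the slope's t Stat in a simple regression, so the two amount-charged results can be cross-checked against the regression output in the next section.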

2. Estimated regression equations

Annual income and credit card charges:

SUMMARY OUTPUT

Regression Statistics
Multiple R           0.630780826
R Square             0.39788445
Adjusted R Square    0.385340376
Standard Error       731.902474
Observations         50

ANOVA
              df    SS             MS             F              Significance F
Regression    1     16991228.91    16991228.91    31.71891773    9.10311E-07
Residual      48    25712699.11    535681.2315
Total         49    42703928.02

                Coefficients    Standard Error    t Stat         P-value        Lower 95%      Upper 95%    Lower 95.0%    Upper 95.0%
Intercept       2204.240517     329.1340306       6.697090887    2.14344E-08    1542.472207    2866.009     1542.472       2866.009
X Variable 1    40.46962932     7.185715961       5.631955054    9.10311E-07    26.02177931    54.91748     26.02178       54.91748

Regression equation:

y= mx+ c

y= 40.4696x + 2204.24

Here,

x = Annual income

y = Annual credit card charges

Household size and credit card charges:

SUMMARY OUTPUT

Regression Statistics
Multiple R           0.752853835
R Square             0.566788897
Adjusted R Square    0.557763666
Standard Error       620.8162594
Observations         50

ANOVA
              df    SS             MS             F              Significance F
Regression    1     24204112.28    24204112.28    62.80048437    2.86236E-10
Residual      48    18499815.74    385412.8279
Total         49    42703928.02

                Coefficients    Standard Error    t Stat         P-value        Lower 95%      Upper 95%    Lower 95.0%    Upper 95.0%
Intercept       2581.644082     195.269886        13.22090228    1.287E-17      2189.027669    2974.26      2189.028       2974.26
X Variable 1    404.1567013     50.99977822       7.924675664    2.86236E-10    301.6147764    506.6986     301.6148       506.6986

Regression equation:

y= mx+ c

y= 404.157x + 2581.64

Here,

x = Household Size

y = Annual credit card charges

The regression analysis indicates that household size is a better predictor of annual credit card charges than income. R² for income is 0.3979, meaning that approximately 40% of the variation in the amount charged can be explained by annual income.

In contrast, R² for household size is approximately 0.57, implying that about 57% of the variation in the amount charged can be explained by household size.
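The R² values compared above are the ratio of the regression sum of squares to the total sum of squares in each ANOVA table. A minimal sketch of that calculation, using the SS figures reported in the two summary outputs (names are illustrative):

```python
# Sums of squares copied from the two ANOVA tables
SS_TOTAL = 42703928.02
ss_regression = {
    "Annual income": 16991228.91,
    "Household size": 24204112.28,
}


def r_squared(ss_reg, ss_total):
    """Share of total variation explained by the regression."""
    return ss_reg / ss_total


r2 = {name: r_squared(ss, SS_TOTAL) for name, ss in ss_regression.items()}

# The predictor with the larger R-squared explains more of the variation
best = max(r2, key=r2.get)

for name, value in r2.items():
    print(f"{name}: R-squared = {value:.4f}")
print(f"Better single predictor: {best}")
```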

3. Estimated regression equation with both independent variables

SUMMARY OUTPUT

Regression Statistics
Multiple R           0.908501824
R Square             0.825375565
Adjusted R Square    0.817944738
Standard Error       398.3249315
Observations         50

ANOVA
              df    SS             MS             F              Significance F
Regression    2     35246778.72    17623389.36    111.0745228    1.54692E-18
Residual      47    7457149.298    158662.751
Total         49    42703928.02

                Coefficients    Standard Error    t Stat         P-value        Lower 95%      Upper 95%    Lower 95.0%    Upper 95.0%
Intercept       1305.033885     197.770988        6.598712469    3.32392E-08    907.1699825    1702.898     907.17         1702.898
X Variable 1    33.12195539     3.970237444       8.342562845    7.88598E-11    25.13486801    41.10904     25.13487       41.10904
X Variable 2    356.3402032     33.22039979       10.72654771    3.17247E-14    289.5093801    423.171      289.5094       423.171

y=m1x1 + m2x2 + c

y= 33.12 x1 + 356.34 x2 + 1305.03

Where,

y= Amount Charged

x1 = Income

x2 = Household Size

The above regression analysis shows that R² is 0.8254, indicating that household size and income together explain about 82.54% of the variation in amount charged. This is considerably more than either variable explains on its own (about 40% for income and 57% for household size). The standard error of the combined model (398.32) is also lower than that of either single-variable model (731.90 and 620.82), which indicates an improvement in the regression model.
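Because adding a predictor never lowers R², the adjusted R² is a fairer basis for comparing the three models. The sketch below applies the standard formula, 1 - (1 - R²)(n - 1)/(n - k - 1), to the R Square values reported in the summary outputs; the model labels are illustrative.

```python
def adjusted_r_squared(r2, n, k):
    """Adjusted R-squared for a model with k predictors fitted to n observations."""
    return 1 - (1 - r2) * (n - 1) / (n - k - 1)


N = 50  # observations in every model

# (R Square, number of predictors) from the three summary outputs
models = {
    "Income only": (0.39788445, 1),
    "Household size only": (0.566788897, 1),
    "Income + household size": (0.825375565, 2),
}

adjusted = {name: adjusted_r_squared(r2, N, k) for name, (r2, k) in models.items()}

for name, value in adjusted.items():
    print(f"{name}: adjusted R-squared = {value:.4f}")
```

The results should match the "Adjusted R Square" rows in the outputs, and the two-variable model remains clearly ahead after the penalty for the extra predictor.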

4. Predicted annual credit card charge

y=m1x1 + m2x2 + c

y= 33.12 x1 + 356.34 x2 + 1305.03

y= 33.12 *40 + 356.34 *3 + 1305.03

y= 1324.8 + 1069.02 + 1305.03

y= $3698.85

The predicted annual credit card charge for a three-person household with an annual income of $40,000 is approx $3,699.
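The same prediction can be written as a small function, which makes it easy to try other incomes and household sizes. This is a sketch using the rounded coefficients from the equation above; the function name is hypothetical, and income is entered in thousands of dollars as in the data.

```python
def predict_charge(income_thousands, household_size):
    """Predicted annual credit card charge ($) from the two-variable regression equation."""
    return 33.12 * income_thousands + 356.34 * household_size + 1305.03


# Three-person household with an annual income of $40,000
prediction = predict_charge(40, 3)
print(f"Predicted annual charge: ${prediction:,.2f}")
```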

5. Other independent variables

The number of credit cards held could be added to the model as an independent variable to determine its relationship with the amount charged by credit card users, since there may be a significant relationship between holding multiple cards and the amount charged. The age and gender of the consumers may also be significant in determining the amount charged.

This is because younger and female consumers may be likely to purchase more, which would increase the amount charged. In addition, purchasing options, including online and offline modes, can be considered to determine the buying patterns of customers.

 Task 2

Activity 1:

Activity 2:

(a) Histograms

(b) Descriptive Statistics

Descriptive Stats    HI001 FINAL EXAM    HI001 ASSIGNMENT 01    HI001 ASSIGNMENT 02
Mean                 31.72               17.21                  15.46
SD                   6.75                1.99                   2.31
Min                  0                   8                      8
Max                  45                  22                     21

Descriptive Stats    HI002 FINAL EXAM    HI002 ASSIGNMENT 01    HI002 ASSIGNMENT 02
Mean                 26.50               17.82                  12.42
SD                   5.91                3.44                   1.99
Min                  0                   4                      4
Max                  40                  22                     16

Descriptive Stats    HI003 FINAL EXAM    HI003 ASSIGNMENT 01    HI003 ASSIGNMENT 02
Mean                 25.99               18.19                  13.54
SD                   8.27                3.91                   1.76
Min                  4                   10                     8
Max                  43                  30                     20

 Activity 3:

(a) Correlation

Variables                                    Correlation
HI001 FINAL EXAM - HI001 ASSIGNMENT 01       9.26%
HI001 ASSIGNMENT 02 - HI001 ASSIGNMENT 01    65.94%
HI002 FINAL EXAM - HI001 ASSIGNMENT 02       -3.74%
HI002 ASSIGNMENT 02 - HI002 ASSIGNMENT 01    54.90%
HI003 ASSIGNMENT 02 - HI003 ASSIGNMENT 01    51.98%
HI002 ASSIGNMENT 02 - HI002 FINAL EXAM       36.26%
HI003 FINAL EXAM - HI002 ASSIGNMENT 01       1.55%
HI003 ASSIGNMENT 01 - HI002 FINAL EXAM       -6.00%
HI003 FINAL EXAM - HI001 ASSIGNMENT 01       23.17%
HI001 FINAL EXAM - HI003 FINAL EXAM          12.19%
HI002 FINAL EXAM - HI001 FINAL EXAM          4.92%

(b) Results:

Variables                                    Correlation    Positive/Negative    Strong/Weak    Significance value
HI001 FINAL EXAM - HI001 ASSIGNMENT 01       9.26%          Positive             Weak           Not significant
HI001 ASSIGNMENT 02 - HI001 ASSIGNMENT 01    65.94%         Positive             Strong         Significant
HI002 FINAL EXAM - HI001 ASSIGNMENT 02       -3.74%         Negative             Weak           Not significant
HI002 ASSIGNMENT 02 - HI002 ASSIGNMENT 01    54.90%         Positive             Strong         Significant
HI003 ASSIGNMENT 02 - HI003 ASSIGNMENT 01    51.98%         Positive             Strong         Significant
HI002 ASSIGNMENT 02 - HI002 FINAL EXAM       36.26%         Positive             Weak           Not significant
HI003 FINAL EXAM - HI002 ASSIGNMENT 01       1.55%          Positive             Weak           Not significant
HI003 ASSIGNMENT 01 - HI002 FINAL EXAM       -6.00%         Negative             Weak           Not significant
HI003 FINAL EXAM - HI001 ASSIGNMENT 01       23.17%         Positive             Weak           Not significant
HI001 FINAL EXAM - HI003 FINAL EXAM          12.19%         Positive             Weak           Not significant
HI002 FINAL EXAM - HI001 FINAL EXAM          4.92%          Positive             Weak           Not significant

The above table shows that three pairs are strongly and significantly correlated: HI001 ASSIGNMENT 02 - HI001 ASSIGNMENT 01, HI002 ASSIGNMENT 02 - HI002 ASSIGNMENT 01 and HI003 ASSIGNMENT 02 - HI003 ASSIGNMENT 01.

This means that students who received good marks in HI001 ASSIGNMENT 01 also tended to achieve good marks in HI001 ASSIGNMENT 02. The same pattern holds for the two assignments in HI002 and in HI003.
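The direction and strength labels in the results table can be reproduced with a simple rule. The sketch below assumes a cut-off of 0.5 for a "strong" correlation; that threshold is inferred from the labels in the table, not stated in the assignment, and the function name is hypothetical.

```python
# Assumption: |r| >= 0.5 is labelled "Strong", anything smaller "Weak"
# (inferred from the results table, not stated in the assignment).
STRONG_THRESHOLD = 0.5


def describe(r):
    """Return (direction, strength) labels for a correlation coefficient r."""
    direction = "Positive" if r > 0 else "Negative"
    strength = "Strong" if abs(r) >= STRONG_THRESHOLD else "Weak"
    return direction, strength


# A few pairs from the correlation table, expressed as fractions
pairs = {
    "HI001 ASSIGNMENT 02 - HI001 ASSIGNMENT 01": 0.6594,
    "HI002 FINAL EXAM - HI001 ASSIGNMENT 02": -0.0374,
    "HI002 ASSIGNMENT 02 - HI002 FINAL EXAM": 0.3626,
}

labels = {pair: describe(r) for pair, r in pairs.items()}

for pair, (direction, strength) in labels.items():
    print(f"{pair}: {direction}, {strength}")
```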

Task 3

1. Descriptive Statistics

 

Medical Study 1

Groups            Count    Sum    Average    Variance
Florida           20       111    5.55       4.576316
New York          20       160    8          4.842105
North Carolina    20       141    7.05       8.05

Medical Study 2

Groups            Count    Sum    Average    Variance
Florida           20       290    14.5       10.05263
New York          20       305    15.25      17.03947
North Carolina    20       279    13.95      8.681579

From the above descriptive statistics, it can be interpreted that healthy individuals in New York have a higher mean depression score (8.00) than those in Florida (5.55) and North Carolina (7.05).

In addition, individuals with a chronic health condition such as arthritis, hypertension and/or a heart ailment have similar mean depression scores in all three locations (13.95 to 15.25). In every location, these scores are higher than those of healthy individuals.

2. Analysis of variance

Study 1:

Hypothesis Formulation:

H0: µ1 = µ2 = µ3

There is no difference in the mean depression score of healthy people across the three locations.

Ha: Not all of µ1, µ2 and µ3 are equal

There is a significant difference in the mean depression score of healthy people in at least one of the three locations.

Where,

µ1= the mean depression score of healthy people in Florida

µ2= the mean depression score of healthy people in New York

µ3= the mean depression score of healthy people in North Carolina

Rejection Rule: The null hypothesis is rejected if the p-value ≤ 0.05.

ANOVA Single Factor:

ANOVA
Source of Variation    SS          df    MS             F           P-value    F crit
Between Groups         61.03333    2     30.51666667    5.240886    0.00814    3.158842719
Within Groups          331.9       57    5.822807018
Total                  392.9333    59

Interpretation:

Here, the p-value (0.00814) is less than 0.05, so the null hypothesis is rejected. Therefore, the mean depression score of healthy people differs significantly across the three locations.

Study 2:

ANOVA Single Factor:

ANOVA
Source of Variation    SS             df    MS          F           P-value     F crit
Between Groups         17.03333333    2     8.516667    0.714212    0.493906    3.158843
Within Groups          679.7          57    11.92456
Total                  696.7333333    59

Interpretation:

Here, the p-value (0.4939) is greater than 0.05, so the null hypothesis is not rejected. The mean depression score of individuals with a chronic health condition is not significantly different across the three locations.
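Both F statistics can be reproduced from the group summary tables alone, since a single-factor ANOVA only needs each group's count, mean and variance. The following is a minimal sketch under that assumption; it compares F with the F crit value reported in the ANOVA tables instead of computing a p-value, and the function name is illustrative.

```python
def one_way_anova(groups):
    """Single-factor ANOVA from (count, mean, variance) summaries.

    Returns (ss_between, ss_within, f_statistic).
    """
    total_n = sum(n for n, _, _ in groups)
    grand_mean = sum(n * mean for n, mean, _ in groups) / total_n
    ss_between = sum(n * (mean - grand_mean) ** 2 for n, mean, _ in groups)
    ss_within = sum((n - 1) * var for n, _, var in groups)
    df_between = len(groups) - 1
    df_within = total_n - len(groups)
    f_statistic = (ss_between / df_between) / (ss_within / df_within)
    return ss_between, ss_within, f_statistic


# (count, average, variance) for Florida, New York, North Carolina
studies = {
    "Study 1": [(20, 5.55, 4.576316), (20, 8.0, 4.842105), (20, 7.05, 8.05)],
    "Study 2": [(20, 14.5, 10.05263), (20, 15.25, 17.03947), (20, 13.95, 8.681579)],
}

F_CRIT = 3.158843  # F crit reported in both ANOVA tables (df = 2 and 57)

anova = {name: one_way_anova(groups) for name, groups in studies.items()}

for name, (ss_b, ss_w, f_stat) in anova.items():
    decision = "reject H0" if f_stat > F_CRIT else "do not reject H0"
    print(f"{name}: SS between = {ss_b:.4f}, SS within = {ss_w:.4f}, "
          f"F = {f_stat:.4f} ({decision})")
```

Comparing F with F crit leads to the same decisions as the p-value rule used above: the null hypothesis is rejected in Study 1 and not rejected in Study 2.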

3. Conclusions

Based on the above analysis, it can be concluded that in Study 1 the mean depression score of healthy individuals is related to location, because the scores differ significantly between locations. Healthy individuals in New York have the highest mean depression score of the three locations.

In Study 2, the mean depression score of individuals with a chronic health condition is not related to location, as the scores are similar in all three locations.

