HI6007 Statistics Assignment for Business Decisions
The table below presents the descriptive statistics for the three variables: income, household size and amount charged.
Descriptive Statistics | Income ($1000s) | Household Size | Amount Charged ($) |
Mean | 43.48 | 3.42 | 3963.86 |
Standard Error | 2.057785614 | 0.245930138 | 132.023387 |
Median | 42 | 3 | 4090 |
Mode | 54 | 2 | 3890 |
Standard Deviation | 14.55074162 | 1.738988681 | 933.5463219 |
Sample Variance | 211.7240816 | 3.024081633 | 871508.7351 |
Kurtosis | -1.247719422 | -0.722808552 | -0.742482171 |
Skewness | 0.095855639 | 0.527895977 | -0.128860064 |
Range | 46 | 6 | 3814 |
Minimum | 21 | 1 | 1864 |
Maximum | 67 | 7 | 5678 |
Sum | 2174 | 171 | 198193 |
Count | 50 | 50 | 50 |
The descriptive statistics show that Consumer Research, Inc. took a sample of 50 consumers to identify the characteristics that can be used to predict the amount charged by credit card users. The average annual income is approximately $43,480, the average household size is between 3 and 4 people, and the average amount charged is approximately $3,964.
The amount charged varies considerably across users: the standard deviation is about $934, which gives a coefficient of variation of roughly 24% of the mean. The maximum amount charged is $5,678 and the minimum is $1,864.
Kurtosis describes how heavy the tails of a distribution are relative to the normal distribution. A negative value, as found for all three variables, indicates a flatter distribution with lighter tails than the normal, whereas a large positive value indicates a sharper peak and heavier tails (Tang and Zhang, 2013). A kurtosis value within about +/-1 is commonly treated as acceptable for empirical use. Skewness measures the asymmetry of the values around the mean, and a value within +/-1 is likewise treated as acceptable.
From the table, the skewness of every variable lies well within +/-1 (between -0.13 and 0.53), so all variables pass the acceptability level for empirical use. The kurtosis of household size and amount charged also lies within +/-1, while the kurtosis of income (-1.25) falls slightly outside it.
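The standard error and coefficient of variation in this discussion can be checked directly from the reported means and standard deviations. The short Python sketch below is only an illustration: the function names are hypothetical, and the inputs are copied from the descriptive statistics table.

```python
import math

# Means and standard deviations copied from the descriptive statistics table.
N = 50
stats = {
    "Income": {"mean": 43.48, "sd": 14.55074162},
    "Household Size": {"mean": 3.42, "sd": 1.738988681},
    "Amount Charged": {"mean": 3963.86, "sd": 933.5463219},
}


def standard_error(sd, n):
    """Standard error of the mean: sd / sqrt(n)."""
    return sd / math.sqrt(n)


def coefficient_of_variation(mean, sd):
    """Standard deviation expressed as a fraction of the mean."""
    return sd / mean


for name, s in stats.items():
    se = standard_error(s["sd"], N)
    cv = coefficient_of_variation(s["mean"], s["sd"])
    print(f"{name}: SE = {se:.4f}, CV = {cv:.1%}")
```

The computed standard errors should agree with the Standard Error row of the table, which is a quick way to confirm that the table was produced from a sample of 50.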
The table below shows the correlations between the variables:
Correlation | Income | Household size | Amount charged |
Income | 1 | | |
Household size | 0.172533 | 1 | |
Amount charged | 0.630781 | 0.752853835 | 1 |
Amount charged is positively and significantly correlated with both income (r = 0.63) and household size (r = 0.75). The correlation with household size is the stronger of the two. Income and household size are only weakly correlated with each other (r = 0.17).
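Whether a correlation is significant can be tested with the statistic t = r * sqrt(n - 2) / sqrt(1 - r^2), which has n - 2 degrees of freedom. The sketch below applies it to the three correlations in the table. The critical value of about 2.011 (two-tailed, 5% level, 48 degrees of freedom) is taken from a standard t table and is an assumption, not part of the original output.

```python
import math

N = 50
# Approximate two-tailed 5% critical value for 48 degrees of freedom
# (assumed from a standard t table, not from the original output).
T_CRITICAL = 2.011

correlations = {
    ("Income", "Household size"): 0.172533,
    ("Income", "Amount charged"): 0.630781,
    ("Household size", "Amount charged"): 0.752853835,
}


def correlation_t_stat(r, n):
    """t statistic for testing H0: population correlation = 0."""
    return r * math.sqrt(n - 2) / math.sqrt(1 - r * r)


for (a, b), r in correlations.items():
    t = correlation_t_stat(r, N)
    verdict = "significant" if abs(t) > T_CRITICAL else "not significant"
    print(f"{a} vs {b}: r = {r:.3f}, t = {t:.3f} ({verdict})")
```

For a simple regression with one independent variable, this t statistic is the same as the t Stat reported for the slope, so the two correlations involving amount charged can be cross-checked against the regression output.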
- Develop estimated regression equations, first using annual income as the independent variable and then using household size as the independent variable. Which variable is the better predictor of annual credit card charges? Discuss your findings.
Regression analysis for annual income and credit card charges:
SUMMARY OUTPUT
Regression Statistics
Multiple R | 0.630780826 |
R Square | 0.39788445 |
Adjusted R Square | 0.385340376 |
Standard Error | 731.902474 |
Observations | 50 |
ANOVA
Source of Variation | df | SS | MS | F | Significance F |
Regression | 1 | 16991228.91 | 16991228.91 | 31.71891773 | 9.10311E-07 |
Residual | 48 | 25712699.11 | 535681.2315 | | |
Total | 49 | 42703928.02 | | | |
Term | Coefficients | Standard Error | t Stat | P-value | Lower 95% | Upper 95% | Lower 95.0% | Upper 95.0% |
Intercept | 2204.240517 | 329.1340306 | 6.697090887 | 2.14344E-08 | 1542.472207 | 2866.009 | 1542.472 | 2866.009 |
X Variable 1 | 40.46962932 | 7.185715961 | 5.631955054 | 9.10311E-07 | 26.02177931 | 54.91748 | 26.02178 | 54.91748 |
Regression equation:
Y = mX + c
Y = 40.47X + 2204.24
Here,
X = Annual income (in $1000s)
Y = Annual credit card charges
Regression analysis for household size and credit card charges:
SUMMARY OUTPUT
Regression Statistics
Multiple R | 0.752853835 |
R Square | 0.566788897 |
Adjusted R Square | 0.557763666 |
Standard Error | 620.8162594 |
Observations | 50 |
ANOVA
Source of Variation | df | SS | MS | F | Significance F |
Regression | 1 | 24204112.28 | 24204112.28 | 62.80048437 | 2.86236E-10 |
Residual | 48 | 18499815.74 | 385412.8279 | | |
Total | 49 | 42703928.02 | | | |
Term | Coefficients | Standard Error | t Stat | P-value | Lower 95% | Upper 95% | Lower 95.0% | Upper 95.0% |
Intercept | 2581.644082 | 195.269886 | 13.22090228 | 1.287E-17 | 2189.027669 | 2974.26 | 2189.028 | 2974.26 |
X Variable 1 | 404.1567013 | 50.99977822 | 7.924675664 | 2.86236E-10 | 301.6147764 | 506.6986 | 301.6148 | 506.6986 |
Regression equation:
Y= m X+ c
Y= 404.16X + 2581.64
Here,
X = Household Size
Y = Annual credit card charges
From the above regression analysis, the p-value is 9.10E-07 and R2 is approximately 0.40 when income is the independent variable, so the relationship is statistically significant at the 5% level.
This means that about 40% of the variation in amount charged can be explained by annual income. With household size as the independent variable, the p-value is 2.86E-10 and R2 is approximately 0.57, so household size explains about 57% of the variation in amount charged.
Household size is therefore the better predictor of annual credit card charges: it explains about 57% of the variation in amount charged, against only about 40% for annual income, and its standard error of the estimate is lower (620.82 versus 731.90).
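The R2, F and standard error values used in this comparison all follow from the sums of squares in the two ANOVA tables. The sketch below recomputes them as a cross-check. The function name `fit_summary` is hypothetical, and the inputs are copied from the regression output above.

```python
import math

N = 50  # observations in each regression

# Regression and residual sums of squares copied from the two ANOVA tables.
models = {
    "Income": {"ss_regression": 16991228.91, "ss_residual": 25712699.11},
    "Household Size": {"ss_regression": 24204112.28, "ss_residual": 18499815.74},
}


def fit_summary(ss_regression, ss_residual, n, k=1):
    """Return R squared, F and the standard error of the estimate.

    k is the number of independent variables in the model.
    """
    ss_total = ss_regression + ss_residual
    ms_regression = ss_regression / k
    ms_residual = ss_residual / (n - k - 1)
    return {
        "r_squared": ss_regression / ss_total,
        "f": ms_regression / ms_residual,
        "standard_error": math.sqrt(ms_residual),
    }


for name, ss in models.items():
    result = fit_summary(ss["ss_regression"], ss["ss_residual"], N)
    print(
        f"{name}: R2 = {result['r_squared']:.4f}, "
        f"F = {result['f']:.2f}, SE = {result['standard_error']:.2f}"
    )
```

Because both models share the same total sum of squares, the model with the larger regression sum of squares necessarily has the higher R2 and the smaller standard error.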
- Develop an estimated regression equation with annual income and household size as the independent variables. Discuss your findings.
SUMMARY OUTPUT
Regression Statistics
Multiple R | 0.908501824 |
R Square | 0.825375565 |
Adjusted R Square | 0.817944738 |
Standard Error | 398.3249315 |
Observations | 50 |
ANOVA
Source of Variation | df | SS | MS | F | Significance F |
Regression | 2 | 35246778.72 | 17623389.36 | 111.0745228 | 1.54692E-18 |
Residual | 47 | 7457149.298 | 158662.751 | | |
Total | 49 | 42703928.02 | | | |
Term | Coefficients | Standard Error | t Stat | P-value | Lower 95% | Upper 95% | Lower 95.0% | Upper 95.0% |
Intercept | 1305.033885 | 197.770988 | 6.598712469 | 3.32392E-08 | 907.1699825 | 1702.898 | 907.17 | 1702.898 |
X Variable 1 | 33.12195539 | 3.970237444 | 8.342562845 | 7.88598E-11 | 25.13486801 | 41.10904 | 25.13487 | 41.10904 |
X Variable 2 | 356.3402032 | 33.22039979 | 10.72654771 | 3.17247E-14 | 289.5093801 | 423.171 | 289.5094 | 423.171 |
Regression equation:
Y = m1 X1 + m2 X2 + c
Y = 33.12 X1 + 356.34 X2 + 1305.03
Here,
Y = Amount Charged
X1 = Income (in $1000s)
X2 = Household Size
Based on the above regression analysis, R2 is 0.8254, which means that about 82.5% of the variation in amount charged can be explained by income and household size together.
Both independent variables remain highly significant in the combined model (p-values of 7.89E-11 for income and 3.17E-14 for household size). The standard error of the estimate falls to 398.32, compared with 731.90 and 620.82 for the two simple linear regression models, which shows that the combined model fits the data better.
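When models with different numbers of independent variables are compared, adjusted R2 is the fairer measure because it penalises each added variable. The sketch below recomputes it from the reported R Square values using the standard formula 1 - (1 - R2)(n - 1)/(n - k - 1). The function name is hypothetical.

```python
N = 50  # observations

# R Square values and number of independent variables (k) from the three outputs.
models = {
    "Income only": {"r_squared": 0.39788445, "k": 1},
    "Household size only": {"r_squared": 0.566788897, "k": 1},
    "Income and household size": {"r_squared": 0.825375565, "k": 2},
}


def adjusted_r_squared(r_squared, n, k):
    """Adjusted R squared for a model with k independent variables."""
    return 1 - (1 - r_squared) * (n - 1) / (n - k - 1)


for name, m in models.items():
    adj = adjusted_r_squared(m["r_squared"], N, m["k"])
    print(f"{name}: R2 = {m['r_squared']:.4f}, adjusted R2 = {adj:.4f}")
```

The two-variable model keeps a clear advantage after the adjustment, which supports the conclusion that adding the second variable is a real improvement.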
- What is the predicted annual credit card charge for a three-person household with an annual income of $40,000?
It can be calculated from the regression equation with both independent variables, with income expressed in thousands of dollars (X1 = 40) and household size X2 = 3:
Y= 33.12 X1 + 356.34 X2 + 1305.03
Y= 33.12 *40 + 356.34 *3 + 1305.03
= 1324.8 + 1069.02 + 1305.03
= $3698.85
The predicted annual credit card charge for a three-person household with an annual income of $40,000 is about $3,699.
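The same calculation can be wrapped in a small function so that other households can be evaluated. This is a minimal sketch using the rounded coefficients from the equation above. The function name `predict_charge` is hypothetical, and income must be entered in thousands of dollars.

```python
# Rounded coefficients from the two-variable regression equation.
INTERCEPT = 1305.03
COEF_INCOME = 33.12      # per $1000 of annual income
COEF_HOUSEHOLD = 356.34  # per household member


def predict_charge(income_thousands, household_size):
    """Predicted annual credit card charge in dollars."""
    return (
        INTERCEPT
        + COEF_INCOME * income_thousands
        + COEF_HOUSEHOLD * household_size
    )


prediction = predict_charge(income_thousands=40, household_size=3)
print(f"Predicted annual charge: ${prediction:,.2f}")
```

Predictions are most reliable inside the range of the sample, that is, incomes of $21,000 to $67,000 and household sizes of 1 to 7.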
- Discuss the need for other independent variables that could be added to the model. What additional variables might be helpful?
The standard error of the estimate is still large (about $398), so additional independent variables could improve the model. The following variables might be helpful:
Number of credit cards: The number of credit cards a consumer holds is easy to measure. Consumers with more cards can be expected to charge higher amounts. The number of cards may also be only weakly correlated with income and household size, so it would add new information to the model (Little and Rubin, 2014).
Purchasing options: This variable captures whether a consumer prefers to pay by cash or by credit card. It reflects buying patterns shaped by security concerns and cultural factors.
Average age and gender ratio of a household: This demographic information can help to improve the existing model. It is often assumed that younger people and women make more purchases, including by credit card, so these variables may refine the model and improve its accuracy. Both are easy to collect and are largely independent of the other variables.
(A)
Correlations between ten different pairs of variables have been calculated to evaluate students' marks across different assessments. The pairs and their correlations are shown below:
Correlation
Variable 1 | Variable 2 | Correlation |
HI001 FINAL EXAM | HI002 FINAL EXAM | 5% |
HI001 ASSIGNMENT 01 | HI001 ASSIGNMENT 02 | 66% |
HI002 ASSIGNMENT 01 | HI002 ASSIGNMENT 02 | 55% |
HI003 ASSIGNMENT 01 | HI003 ASSIGNMENT 02 | 52% |
HI001 FINAL EXAM | HI003 FINAL EXAM | 12% |
HI002 FINAL EXAM | HI003 FINAL EXAM | 12% |
HI001 ASSIGNMENT 01 | HI002 ASSIGNMENT 01 | -13% |
HI001 ASSIGNMENT 02 | HI002 ASSIGNMENT 02 | -4% |
HI002 ASSIGNMENT 01 | HI003 ASSIGNMENT 01 | -23% |
HI002 ASSIGNMENT 02 | HI003 ASSIGNMENT 02 | -11% |
(B)
- Positive or Negative Correlated
Variable 1 | Variable 2 | Correlated |
HI001 FINAL EXAM | HI002 FINAL EXAM | Positive |
HI001 ASSIGNMENT 01 | HI001 ASSIGNMENT 02 | Positive |
HI002 ASSIGNMENT 01 | HI002 ASSIGNMENT 02 | Positive |
HI003 ASSIGNMENT 01 | HI003 ASSIGNMENT 02 | Positive |
HI001 FINAL EXAM | HI003 FINAL EXAM | Positive |
HI002 FINAL EXAM | HI003 FINAL EXAM | Positive |
HI001 ASSIGNMENT 01 | HI002 ASSIGNMENT 01 | Negative |
HI001 ASSIGNMENT 02 | HI002 ASSIGNMENT 02 | Negative |
HI002 ASSIGNMENT 01 | HI003 ASSIGNMENT 01 | Negative |
HI002 ASSIGNMENT 02 | HI003 ASSIGNMENT 02 | Negative |
- Weak or Strong Correlation
Variable 1 | Variable 2 | Correlation |
HI001 FINAL EXAM | HI002 FINAL EXAM | Weak |
HI001 ASSIGNMENT 01 | HI001 ASSIGNMENT 02 | Strong |
HI002 ASSIGNMENT 01 | HI002 ASSIGNMENT 02 | Strong |
HI003 ASSIGNMENT 01 | HI003 ASSIGNMENT 02 | Strong |
HI001 FINAL EXAM | HI003 FINAL EXAM | Weak |
HI002 FINAL EXAM | HI003 FINAL EXAM | Weak |
HI001 ASSIGNMENT 01 | HI002 ASSIGNMENT 01 | Weak |
HI001 ASSIGNMENT 02 | HI002 ASSIGNMENT 02 | Weak |
HI002 ASSIGNMENT 01 | HI003 ASSIGNMENT 01 | Weak |
HI002 ASSIGNMENT 02 | HI003 ASSIGNMENT 02 | Weak |
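The direction and strength labels in the two tables above can be reproduced with a simple rule. In the sketch below, the cut-off of 0.5 for a strong correlation is an assumption: the report does not state its threshold, but any value between 0.23 and 0.52 gives the same labels. The function names are hypothetical.

```python
# Assumed cut-off for a "strong" correlation; the report does not state one.
STRONG_THRESHOLD = 0.5

# Correlations copied from the correlation table, written as proportions.
pairs = [
    ("HI001 FINAL EXAM", "HI002 FINAL EXAM", 0.05),
    ("HI001 ASSIGNMENT 01", "HI001 ASSIGNMENT 02", 0.66),
    ("HI002 ASSIGNMENT 01", "HI002 ASSIGNMENT 02", 0.55),
    ("HI003 ASSIGNMENT 01", "HI003 ASSIGNMENT 02", 0.52),
    ("HI001 FINAL EXAM", "HI003 FINAL EXAM", 0.12),
    ("HI002 FINAL EXAM", "HI003 FINAL EXAM", 0.12),
    ("HI001 ASSIGNMENT 01", "HI002 ASSIGNMENT 01", -0.13),
    ("HI001 ASSIGNMENT 02", "HI002 ASSIGNMENT 02", -0.04),
    ("HI002 ASSIGNMENT 01", "HI003 ASSIGNMENT 01", -0.23),
    ("HI002 ASSIGNMENT 02", "HI003 ASSIGNMENT 02", -0.11),
]


def direction(r):
    """Label the sign of a correlation."""
    if r > 0:
        return "Positive"
    if r < 0:
        return "Negative"
    return "None"


def strength(r, threshold=STRONG_THRESHOLD):
    """Label the magnitude of a correlation against the assumed cut-off."""
    return "Strong" if abs(r) >= threshold else "Weak"


for var1, var2, r in pairs:
    print(f"{var1} | {var2} | {direction(r)} | {strength(r)}")
```

With this rule, only the three pairs of assignments within the same unit are labelled strong, which matches the table.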
- Significance Value
Variable 1 | Variable 2 | Significance Value |
HI001 FINAL EXAM | HI002 FINAL EXAM | Low |
HI001 ASSIGNMENT 01 | HI001 ASSIGNMENT 02 | High |
HI002 ASSIGNMENT 01 | HI002 ASSIGNMENT 02 | High |
HI003 ASSIGNMENT 01 | HI003 ASSIGNMENT 02 | High |
HI001 FINAL EXAM | HI003 FINAL EXAM | Low |
HI002 FINAL EXAM | HI003 FINAL EXAM | Low |
HI001 ASSIGNMENT 01 | HI002 ASSIGNMENT 01 | Low |
HI001 ASSIGNMENT 02 | HI002 ASSIGNMENT 02 | Low |
HI002 ASSIGNMENT 01 | HI003 ASSIGNMENT 01 | Low |
HI002 ASSIGNMENT 02 | HI003 ASSIGNMENT 02 | Low |
- The information above shows that the significance value plays a vital role in establishing the relationship between different variables. A high significance value supports a firm conclusion about the relationship and helps in making sound decisions (Cohen, et al., 2013).
In addition, the data show both positive and negative relationships between the variables. A positive correlation with high significance indicates that students who scored well in one assessment also tended to score well in the other. The strength of the correlation, weak or strong, determines the significance value (Guiso, et al., 2015). The strong correlations are all found between the two assignments within the same unit, which indicates that performance is consistent within a unit but is not closely related across units.
- Use descriptive statistics to summarize the data from the two studies. What are your preliminary observations about the depression scores?
Medical Study 1
Descriptive Statistics | Florida | New York | North Carolina |
Mean | 5.55 | 8 | 7.05 |
Standard Error | 0.478347 | 0.492042 | 0.634428877 |
Median | 6 | 8 | 7.5 |
Mode | 7 | 8 | 8 |
Standard Deviation | 2.139233 | 2.200478 | 2.837252192 |
Sample Variance | 4.576316 | 4.842105 | 8.05 |
Kurtosis | -1.06219 | 0.626432 | -0.904925496 |
Skewness | -0.27356 | 0.625687 | -0.056188269 |
Range | 7 | 9 | 9 |
Minimum | 2 | 4 | 3 |
Maximum | 9 | 13 | 12 |
Sum | 111 | 160 | 141 |
Count | 20 | 20 | 20 |
In medical study 1, people in New York have a higher mean depression score (8.00) than people in Florida (5.55) and North Carolina (7.05).
Medical Study 2
Descriptive Statistics | Florida | New York | North Carolina |
Mean | 14.5 | 15.25 | 13.95 |
Standard Error | 0.708965146 | 0.923024206 | 0.65884668 |
Median | 14.5 | 14.5 | 14 |
Mode | 17 | 14 | 12 |
Standard Deviation | 3.170588522 | 4.127889737 | 2.946451925 |
Sample Variance | 10.05263158 | 17.03947368 | 8.681578947 |
Kurtosis | -0.340799481 | -0.0301367 | -0.592052134 |
Skewness | 0.280721497 | 0.525352494 | -0.041733773 |
Range | 12 | 15 | 11 |
Minimum | 9 | 9 | 8 |
Maximum | 21 | 24 | 19 |
Sum | 290 | 305 | 279 |
Count | 20 | 20 | 20 |
In study 2, individuals with a chronic health condition such as arthritis, hypertension, and/or heart ailment have similarly high depression scores in all three locations. Comparing the two studies, individuals 65 years of age or older with chronic health conditions have higher depression scores than healthy individuals (Tang and Zhang, 2013).
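The comparison between the two studies can be made concrete by pooling the three locations in each study. The sketch below computes the overall mean score per study from the Sum and Count rows of the two tables. The function name `overall_mean` is hypothetical.

```python
# (sum of scores, number of respondents) per location, from the Sum and Count rows.
study_1 = {"Florida": (111, 20), "New York": (160, 20), "North Carolina": (141, 20)}
study_2 = {"Florida": (290, 20), "New York": (305, 20), "North Carolina": (279, 20)}


def overall_mean(groups):
    """Pooled mean across all locations in one study."""
    total = sum(score_sum for score_sum, _ in groups.values())
    count = sum(n for _, n in groups.values())
    return total / count


mean_1 = overall_mean(study_1)
mean_2 = overall_mean(study_2)
print(f"Study 1 overall mean: {mean_1:.2f}")
print(f"Study 2 overall mean: {mean_2:.2f}")
print(f"Difference: {mean_2 - mean_1:.2f}")
```

The pooled mean in study 2 is roughly double that in study 1, which is consistent with the preliminary observation that individuals with chronic health conditions report higher depression scores.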
- Use analysis of variance on both data sets. State the hypotheses being tested in each case. What are your conclusions?
Medical Study 1:
Hypothesis Formulation:
H0: µ1=µ2=µ3
H0 indicates no significant difference in the mean depression score of healthy people in the three locations.
Ha: Not all of µ1, µ2 and µ3 are equal
Ha indicates that the mean depression score of healthy people differs in at least one of the three locations.
Here,
µ1= the mean depression score of healthy people in Florida
µ2= the mean depression score of healthy people in New York
µ3= the mean depression score of healthy people in North Carolina
Rejection Rule: The null hypothesis is rejected if the calculated value of the F statistic ≥ the F critical value or, equivalently, if the p-value ≤ 0.05.
ANOVA Single Factor:
ANOVA
Source of Variation | SS | df | MS | F | P-value | F crit |
Between Groups | 61.03333 | 2 | 30.51666667 | 5.240886 | 0.00814 | 3.158842719 |
Within Groups | 331.9 | 57 | 5.822807018 | | | |
Total | 392.9333 | 59 | | | | |
Conclusion:
Here, F (5.24) is greater than F crit (3.16) and the p-value (0.008) is less than 0.05, so the null hypothesis is rejected. The sample provides enough evidence to support the claim that there is a significant difference in the mean depression score of healthy people in the three locations.
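Medical Study 2:
The hypotheses and rejection rule are the same as in Medical Study 1, with µ1, µ2 and µ3 now denoting the mean depression scores of individuals with a chronic health condition in Florida, New York and North Carolina.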
ANOVA Single Factor:
ANOVA
Source of Variation | SS | df | MS | F | P-value | F crit |
Between Groups | 17.03333333 | 2 | 8.516667 | 0.714212 | 0.493906 | 3.158843 |
Within Groups | 679.7 | 57 | 11.92456 | | | |
Total | 696.7333333 | 59 | | | | |
Conclusion:
Here, F (0.714) is less than F crit (3.16) and the p-value (0.494) is greater than 0.05, so the null hypothesis is not rejected. The sample does not provide enough evidence to conclude that there is a significant difference in the mean depression score of individuals having a chronic health condition such as arthritis, hypertension, and/or heart ailment in the three locations (Shipley, 2016).
From the results, it can be inferred that in medical study 1 the mean depression score is related to geographical location, as it differs across locations. In addition, people from New York have a higher depression score than individuals from the other two locations.
In medical study 2, it can be inferred that the mean depression score of individuals 65 years of age or older with a chronic health condition is not linked to location. The mean depression scores are similar in all three geographical locations.
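Both F statistics can be reproduced from the descriptive statistics alone, because each location has the same sample size. The sketch below is an illustration under that equal-group-size assumption: it rebuilds the between-group and within-group sums of squares from the group means and sample variances and compares F with the critical value from the ANOVA output. The function name `one_way_anova` is hypothetical.

```python
# Critical value copied from the ANOVA output (alpha = 0.05, df = 2 and 57).
F_CRIT = 3.158842719
N_PER_GROUP = 20

# Means and sample variances for Florida, New York and North Carolina.
study_1 = {"means": [5.55, 8.0, 7.05], "variances": [4.576316, 4.842105, 8.05]}
study_2 = {
    "means": [14.5, 15.25, 13.95],
    "variances": [10.05263158, 17.03947368, 8.681578947],
}


def one_way_anova(means, variances, n_per_group):
    """One-way ANOVA from summary statistics, assuming equal group sizes."""
    k = len(means)
    grand_mean = sum(means) / k  # valid only because the groups are equal in size
    ss_between = n_per_group * sum((m - grand_mean) ** 2 for m in means)
    ss_within = (n_per_group - 1) * sum(variances)
    df_between = k - 1
    df_within = k * (n_per_group - 1)
    f_stat = (ss_between / df_between) / (ss_within / df_within)
    return ss_between, ss_within, f_stat


for name, study in (("Study 1", study_1), ("Study 2", study_2)):
    ss_b, ss_w, f_stat = one_way_anova(
        study["means"], study["variances"], N_PER_GROUP
    )
    decision = "reject H0" if f_stat >= F_CRIT else "do not reject H0"
    print(f"{name}: SSB = {ss_b:.2f}, SSW = {ss_w:.2f}, F = {f_stat:.3f} ({decision})")
```

The recomputed sums of squares should match the Between Groups and Within Groups rows of the two ANOVA tables, and the decisions match the conclusions stated above.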
References
Cohen, J., Cohen, P., West, S. G. and Aiken, L. S. (2013) Applied multiple regression/correlation analysis for the behavioral sciences. UK: Routledge.
Guiso, L., Sapienza, P. and Zingales, L. (2015) The value of corporate culture. Journal of Financial Economics, 117(1), pp. 60-76.
Little, R. J. and Rubin, D. B. (2014) Statistical analysis with missing data. USA: John Wiley & Sons.
Shipley, B. (2016) Cause and Correlation in Biology: A User’s Guide to Path Analysis, Structural Equations and Causal Inference with R. Cambridge University Press.
Tang, Q. Y. and Zhang, C. X. (2013) Data Processing System (DPS) software with experimental design, statistical analysis and data mining developed for use in entomological research. Insect Science, 20(2), pp. 254-260.