# ASSIGNMENT FOR RESEARCH METHODS AND DATA ANALYSIS

Exercise 1a

Case Processing Summary

| | Valid N | Valid Percent | Missing N | Missing Percent | Total N | Total Percent |
|---|---|---|---|---|---|---|
| cholesterol values in mmol/L | 44 | 100.0% | 0 | 0.0% | 44 | 100.0% |

All 44 cases are valid and there are no missing values, so the normality of the distribution of cholesterol values is assessed on the full sample.

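Descriptives: cholesterol values in mmol/L

| Measure | Statistic | Std. Error |
|---|---|---|
| Mean | 5.9932 | .23706 |
| 95% Confidence Interval for Mean, Lower Bound | 5.5151 | |
| 95% Confidence Interval for Mean, Upper Bound | 6.4713 | |
| 5% Trimmed Mean | 6.0162 | |
| Median | 5.7500 | |
| Variance | 2.473 | |
| Std. Deviation | 1.57250 | |
| Minimum | 3.00 | |
| Maximum | 8.50 | |
| Range | 5.50 | |
| Interquartile Range | 2.50 | |
| Skewness | -.184 | .357 |
| Kurtosis | -.980 | .702 |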

From the descriptive values, the mean cholesterol value is 5.99 with a standard error of 0.24, and the 5% trimmed mean is 6.02. The median, variance and standard deviation of the cholesterol values are 5.75, 2.473 and 1.572 respectively.
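The standard error and the 95% confidence interval in the Descriptives table can be reproduced from the mean, standard deviation and sample size. The following is a minimal sketch rather than the SPSS procedure itself; the critical value `T_CRIT_43` is an assumption taken from a standard t table for 43 degrees of freedom and does not appear in the output.

```python
import math

# Summary values copied from the Descriptives table.
n = 44
mean = 5.9932
sd = 1.57250

# Assumed two-sided 95% critical value of Student's t for n - 1 = 43
# degrees of freedom (from a standard t table, not from the SPSS output).
T_CRIT_43 = 2.0167

# Standard error of the mean: SD divided by the square root of n.
std_error = sd / math.sqrt(n)

# Confidence interval: mean plus or minus the critical value times the SE.
margin = T_CRIT_43 * std_error
ci_lower = mean - margin
ci_upper = mean + margin

print(f"Standard error: {std_error:.5f}")
print(f"95% CI for the mean: {ci_lower:.4f} to {ci_upper:.4f}")
```

Under these assumptions the result should agree with the reported standard error and confidence bounds to the printed precision.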

For the data to be treated as normally distributed, the skewness and kurtosis values need to be approximately zero. The skewness is -0.184 (standard error 0.357) and the kurtosis is -0.980 (standard error 0.702). Both lie within about two standard errors of zero, which is consistent with a normal distribution.
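One common way to judge whether skewness and kurtosis are "approximately zero" is to divide each statistic by its standard error and compare the ratio with a cut-off. The sketch below assumes a cut-off of 1.96 (a rule of thumb, not something stated in the SPSS output), and the helper name `z_ratio` is hypothetical.

```python
# Rule-of-thumb cut-off for the ratio (an assumption, roughly the 5% level).
THRESHOLD = 1.96


def z_ratio(statistic, std_error):
    """Return the statistic divided by its standard error."""
    return statistic / std_error


# Values copied from the Descriptives table.
skew_z = z_ratio(-0.184, 0.357)
kurt_z = z_ratio(-0.980, 0.702)

for name, z in (("skewness", skew_z), ("kurtosis", kurt_z)):
    verdict = "within" if abs(z) < THRESHOLD else "outside"
    print(f"{name}: z = {z:.3f} ({verdict} +/-{THRESHOLD})")
```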

Tests of Normality

| | Kolmogorov-Smirnov (a) Statistic | df | Sig. | Shapiro-Wilk Statistic | df | Sig. |
|---|---|---|---|---|---|---|
| cholesterol values in mmol/L | .105 | 44 | .200 (*) | .959 | 44 | .121 |

Note (*): This is a lower bound of the true significance. Note (a): Lilliefors Significance Correction.

The Shapiro-Wilk test is a test of normality that assesses whether a random sample has been drawn from a normal distribution (Hanusz, et al., 2016). The Kolmogorov-Smirnov test (KS test) helps to determine whether two distributions differ significantly (Hassani & Silva, 2015); here it compares the sample with a normal distribution. Normality has been tested using both the Shapiro-Wilk test and the KS test.

The null hypothesis of a normal distribution is rejected if the significance value is less than 0.05, which would indicate a non-normal distribution. The KS significance value for cholesterol is 0.200, which is greater than 0.05, so the null hypothesis is not rejected and the values are consistent with a normal distribution.

From the Shapiro-Wilk test, the significance value for cholesterol is 0.121, which is greater than 0.05. The cholesterol values can therefore be treated as normally distributed. The Shapiro-Wilk test is also considered particularly useful when the number of observations is small.


The histogram shows that the values range between 3 and 9, which reflects a continuous measurement. The standard deviation is 1.57 for the sample of 44, and the histogram is consistent with the median of 5.75.

In the normal Q-Q plot of cholesterol values, the observed values lie close to the reference line, which indicates a normal distribution. Taking each of these checks together, the cholesterol values are approximately normally distributed.

The detrended normal Q-Q plot shows how far the observed values deviate from the values expected under a normal distribution. The pattern of the points supports the approximately normal distribution of the cholesterol values.

In the boxplot, the line across the box marks the median of 5.75, and the observed values span the range of 3 to 8.5.

Exercise 1b

Statistics: cholesterol values in mmol/L

| Measure | Value |
|---|---|
| N Valid | 44 |
| N Missing | 0 |
| Mean | 5.9932 |
| Median | 5.7500 |
| Mode | 5.50 |
| Std. Deviation | 1.57250 |
| Variance | 2.473 |
| Range | 5.50 |
| Minimum | 3.00 |
| Maximum | 8.50 |

From the statistics, there are no missing values in the cholesterol data (44 valid cases), and the mean, mode and median of the dataset are 5.99, 5.50 and 5.75 respectively.

The variance of the sample is 2.47, and the maximum and minimum values are 8.5 and 3 respectively. Moreover, the 95% confidence interval for the mean cholesterol value runs from 5.52 to 6.47 (see the descriptives in Exercise 1a), which is also reflected in the figure below.

As in Exercise 1a, the histogram shows that the values range between 3 and 9, which reflects a continuous measurement. The standard deviation is 1.57 for the sample of 44, and the histogram is consistent with the median of 5.75.

The true population mean is expected to lie between the lower and upper limits of the 95% confidence interval, an interpretation that relies on the approximately normal distribution of the data.

Exercise 2a

Paired Samples Statistics

| Pair | Variable | Mean | N | Std. Deviation | Std. Error Mean |
|---|---|---|---|---|---|
| Pair 1 | aerobic capacity for group 1 | 46.3462 | 13 | 8.13794 | 2.25706 |
| | aerobic capacity for group 2 | 34.3692 | 13 | 6.86590 | 1.90426 |
| Pair 2 | systolic blood pressure for group 1 | 122.7692 | 13 | 3.72276 | 1.03251 |
| | systolic blood pressure for group 2 | 136.5385 | 13 | 5.79677 | 1.60774 |
| Pair 3 | body fat for group 1 | 21.5000 | 13 | 5.13566 | 1.42438 |
| | body fat for group 2 | 32.1154 | 13 | 3.68077 | 1.02086 |

To test the hypothesis, the participants' measurements are arranged in three pairs: aerobic capacity, systolic blood pressure (SBP) and body fat. With a constant sample size of 13, the mean and standard deviation of aerobic capacity for the healthy weight group are 46.35 and 8.14 respectively. The mean and standard deviation of SBP for the overweight group are 136.54 and 5.80 respectively.

Paired Samples Correlations

| Pair | Variables | N | Correlation | Sig. |
|---|---|---|---|---|
| Pair 1 | aerobic capacity for group 1 & aerobic capacity for group 2 | 13 | .087 | .778 |
| Pair 2 | systolic blood pressure for group 1 & systolic blood pressure for group 2 | 13 | .134 | .663 |
| Pair 3 | body fat for group 1 & body fat for group 2 | 13 | -.179 | .559 |

From the paired samples correlations, the correlations for aerobic capacity (group 1 & group 2) and systolic blood pressure (group 1 & group 2) are 0.087 and 0.134, with significance values of 0.778 and 0.663 respectively. The correlation and significance value for body fat (group 1 & group 2) are -0.179 and 0.559 respectively. None of the three correlations is statistically significant.

Paired Samples Test (paired differences)

| Pair | Mean | Std. Deviation | Std. Error Mean | 95% CI of the Difference, Lower | 95% CI of the Difference, Upper | t | df | Sig. (2-tailed) |
|---|---|---|---|---|---|---|---|---|
| Pair 1: aerobic capacity for group 1 - aerobic capacity for group 2 | 11.97692 | 10.18194 | 2.82396 | 5.82404 | 18.12981 | 4.241 | 12 | .001 |
| Pair 2: systolic blood pressure for group 1 - systolic blood pressure for group 2 | -13.76923 | 6.45696 | 1.79084 | -17.67113 | -9.86733 | -7.689 | 12 | .000 |
| Pair 3: body fat for group 1 - body fat for group 2 | -10.61538 | 6.83177 | 1.89479 | -14.74378 | -6.48699 | -5.602 | 12 | .000 |

Paired sample tests are normally carried out to evaluate whether the mean difference between two data sets is zero or not (Skyttberg, et al., 2018). From the paired samples test, the t values of 4.241, -7.689 and -5.602 all have significance values at or below 0.001, so the mean difference between the two groups is statistically significant for each of the three pairs.

The standard errors of the mean difference for the three pairs are 2.82, 1.79 and 1.89 respectively. The true mean difference is expected to lie between the lower and upper limits of the 95% confidence interval, and none of the three intervals includes zero.
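The t values in the Paired Samples Test table follow directly from the mean difference, its standard deviation and the number of pairs. The sketch below illustrates that arithmetic; the function name `paired_t` and the dictionary layout are assumptions made for the example, and the inputs are copied from the table.

```python
import math


def paired_t(mean_diff, sd_diff, n):
    """Return (standard error, t statistic, degrees of freedom)."""
    std_error = sd_diff / math.sqrt(n)
    return std_error, mean_diff / std_error, n - 1


# (mean difference, SD of the differences) for each pair, n = 13.
pairs = {
    "aerobic capacity": (11.97692, 10.18194),
    "systolic blood pressure": (-13.76923, 6.45696),
    "body fat": (-10.61538, 6.83177),
}

results = {name: paired_t(m, s, 13) for name, (m, s) in pairs.items()}

for name, (se, t, df) in results.items():
    print(f"{name}: SE = {se:.5f}, t({df}) = {t:.3f}")
```

The same function applies to the retest comparison in Exercise 2b by passing its mean difference and standard deviation.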

Exercise 2b

Paired Samples Statistics

| Pair | Variable | Mean | N | Std. Deviation | Std. Error Mean |
|---|---|---|---|---|---|
| Pair 1 | aerobic capacity for group 2 | 34.3692 | 13 | 6.86590 | 1.90426 |
| | group 2 retest for AC | 36.4077 | 13 | 5.14546 | 1.42709 |

For the overweight group, the mean aerobic capacity rose from 34.37 to 36.41 and the standard deviation changed from 6.87 to 5.15, which demonstrates a slight change after training.

Paired Samples Correlations

| Pair | Variables | N | Correlation | Sig. |
|---|---|---|---|---|
| Pair 1 | aerobic capacity for group 2 & group 2 retest for AC | 13 | .912 | .000 |

For the sample of 13, the correlation between the aerobic capacity of group 2 and its retest is 0.912, which is statistically significant (p < 0.001).

Paired Samples Test (paired differences)

| Pair | Mean | Std. Deviation | Std. Error Mean | 95% CI of the Difference, Lower | 95% CI of the Difference, Upper | t | df | Sig. (2-tailed) |
|---|---|---|---|---|---|---|---|---|
| Pair 1: aerobic capacity for group 2 - group 2 retest for AC | -2.03846 | 3.02587 | .83923 | -3.86698 | -.20995 | -2.429 | 12 | .032 |

The true mean difference is expected to lie between -3.87 and -0.21 (95% confidence interval). Because this interval does not include zero, the change is statistically significant (t = -2.429, p = 0.032). The mean difference is -2.04 with a standard deviation of 3.03.

Exercise 3a

Correlations

| | | systolic bp | diastolic bp | age |
|---|---|---|---|---|
| systolic bp | Pearson Correlation | 1 | .725 (**) | .223 |
| | Sig. (2-tailed) | | .000 | .124 |
| | N | 49 | 49 | 49 |
| diastolic bp | Pearson Correlation | .725 (**) | 1 | .365 (**) |
| | Sig. (2-tailed) | .000 | | .010 |
| | N | 49 | 49 | 49 |
| age | Pearson Correlation | .223 | .365 (**) | 1 |
| | Sig. (2-tailed) | .124 | .010 | |
| | N | 49 | 49 | 49 |

Note (**): Correlation is significant at the 0.01 level (2-tailed).

Exercise 3b

Based on the bivariate correlation table above, diastolic and systolic blood pressure have the highest correlation at 0.725 (p < 0.001), followed by diastolic blood pressure and age at 0.365 (p = 0.010). The correlation between systolic blood pressure and age is 0.223 (p = 0.124), which is not statistically significant.
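The significance of a Pearson correlation can be checked by converting r into a t statistic with n - 2 degrees of freedom. The sketch below assumes a two-sided 5% critical value of about 2.012 for 47 degrees of freedom, taken from a standard t table; the name `correlation_t` is hypothetical.

```python
import math

# Assumed two-sided 5% critical value of t for 47 degrees of freedom.
T_CRIT_47 = 2.012


def correlation_t(r, n):
    """Convert a Pearson r into a t statistic with n - 2 df."""
    return r * math.sqrt((n - 2) / (1 - r ** 2))


# Correlations copied from the table, n = 49 for each pair.
t_sbp_dbp = correlation_t(0.725, 49)
t_dbp_age = correlation_t(0.365, 49)
t_sbp_age = correlation_t(0.223, 49)

for label, t in (("systolic vs diastolic", t_sbp_dbp),
                 ("diastolic vs age", t_dbp_age),
                 ("systolic vs age", t_sbp_age)):
    flag = "significant" if abs(t) > T_CRIT_47 else "not significant"
    print(f"{label}: t = {t:.3f} ({flag} at the 5% level)")
```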

Exercise 4a

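Descriptives

| Diet | Factor level | N | Mean | Std. Deviation | Std. Error | 95% CI Lower Bound | 95% CI Upper Bound | Minimum | Maximum |
|---|---|---|---|---|---|---|---|---|---|
| Atkins Diet | 25 | 2 | 54.50 | 36.062 | 25.500 | -269.51 | 378.51 | 29 | 80 |
| | 32 | 1 | 37.00 | . | . | . | . | 37 | 37 |
| | 36 | 1 | 40.00 | . | . | . | . | 40 | 40 |
| | 39 | 1 | 48.00 | . | . | . | . | 48 | 48 |
| | 40 | 1 | 65.00 | . | . | . | . | 65 | 65 |
| | 41 | 1 | 66.00 | . | . | . | . | 66 | 66 |
| | 49 | 2 | 57.00 | 4.243 | 3.000 | 18.88 | 95.12 | 54 | 60 |
| | 50 | 1 | 58.00 | . | . | . | . | 58 | 58 |
| | 51 | 1 | 63.00 | . | . | . | . | 63 | 63 |
| | 52 | 1 | 57.00 | . | . | . | . | 57 | 57 |
| | 58 | 2 | 55.00 | 7.071 | 5.000 | -8.53 | 118.53 | 50 | 60 |
| | 60 | 1 | 70.00 | . | . | . | . | 70 | 70 |
| | 61 | 1 | 70.00 | . | . | . | . | 70 | 70 |
| | 72 | 1 | 73.00 | . | . | . | . | 73 | 73 |
| | Total | 17 | 57.65 | 13.527 | 3.281 | 50.69 | 64.60 | 29 | 80 |
| 5:2 Diet | 25 | 2 | 39.00 | 29.698 | 21.000 | -227.83 | 305.83 | 18 | 60 |
| | 32 | 1 | 25.00 | . | . | . | . | 25 | 25 |
| | 36 | 1 | 28.00 | . | . | . | . | 28 | 28 |
| | 39 | 1 | 44.00 | . | . | . | . | 44 | 44 |
| | 40 | 1 | 45.00 | . | . | . | . | 45 | 45 |
| | 41 | 1 | 59.00 | . | . | . | . | 59 | 59 |
| | 49 | 2 | 46.00 | 5.657 | 4.000 | -4.82 | 96.82 | 42 | 50 |
| | 50 | 1 | 54.00 | . | . | . | . | 54 | 54 |
| | 51 | 1 | 68.00 | . | . | . | . | 68 | 68 |
| | 52 | 1 | 47.00 | . | . | . | . | 47 | 47 |
| | 58 | 2 | 55.50 | .707 | .500 | 49.15 | 61.85 | 55 | 56 |
| | 60 | 1 | 63.00 | . | . | . | . | 63 | 63 |
| | 61 | 1 | 55.00 | . | . | . | . | 55 | 55 |
| | 72 | 1 | 75.00 | . | . | . | . | 75 | 75 |
| | Total | 17 | 49.65 | 15.137 | 3.671 | 41.86 | 57.43 | 18 | 75 |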

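ANOVA

| Diet | Source | Sum of Squares | df | Mean Square | F | Sig. |
|---|---|---|---|---|---|---|
| Atkins Diet | Between Groups | 1559.382 | 13 | 119.952 | .263 | .963 |
| | Within Groups | 1368.500 | 3 | 456.167 | | |
| | Total | 2927.882 | 16 | | | |
| 5:2 Diet | Between Groups | 2751.382 | 13 | 211.645 | .694 | .724 |
| | Within Groups | 914.500 | 3 | 304.833 | | |
| | Total | 3665.882 | 16 | | | |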

Exercise 4b

Null hypothesis: There is no difference in the preference of a particular diet.

Alternate hypothesis: There is a significant difference in the preference of a particular diet.

A one-way Anova test helps to find out whether there is a difference between the means of independent groups (Kim, 2017). Based on the one-way Anova results, the significance values for the Atkins diet (0.963) and the 5:2 diet (0.724) in relation to the low-calorie diet are both higher than the significance level of 0.05.

Therefore, the null hypothesis is not rejected. Since the significance value from the Anova test is above 0.05 for both variables, there is no need for post hoc tests.

Exercise 5a

Descriptives: systolic BP

| Group | N | Mean | Std. Deviation | Std. Error | 95% CI Lower Bound | 95% CI Upper Bound | Minimum | Maximum |
|---|---|---|---|---|---|---|---|---|
| 6 weeks | 10 | 187.4000 | 12.94604 | 4.09390 | 178.1390 | 196.6610 | 170.00 | 210.00 |
| 12 weeks | 7 | 179.0000 | 14.23610 | 5.38074 | 165.8338 | 192.1662 | 159.00 | 200.00 |
| 24 weeks | 6 | 163.5000 | 4.96991 | 2.02896 | 158.2844 | 168.7156 | 155.00 | 169.00 |
| Total | 23 | 178.6087 | 15.06271 | 3.14079 | 172.0951 | 185.1223 | 155.00 | 210.00 |

ANOVA: systolic BP

| Source | Sum of Squares | df | Mean Square | F | Sig. |
|---|---|---|---|---|---|
| Between Groups | 2143.578 | 2 | 1071.789 | 7.527 | .004 |
| Within Groups | 2847.900 | 20 | 142.395 | | |
| Total | 4991.478 | 22 | | | |

Post Hoc Tests

Multiple Comparisons (Dependent Variable: systolic BP, Tukey HSD)

| (I) Group | (J) Group | Mean Difference (I-J) | Std. Error | Sig. | 95% CI Lower Bound | 95% CI Upper Bound |
|---|---|---|---|---|---|---|
| 6 weeks | 12 weeks | 8.40000 | 5.88062 | .346 | -6.4779 | 23.2779 |
| | 24 weeks | 23.90000 (*) | 6.16214 | .003 | 8.3099 | 39.4901 |
| 12 weeks | 6 weeks | -8.40000 | 5.88062 | .346 | -23.2779 | 6.4779 |
| | 24 weeks | 15.50000 | 6.63887 | .074 | -1.2962 | 32.2962 |
| 24 weeks | 6 weeks | -23.90000 (*) | 6.16214 | .003 | -39.4901 | -8.3099 |
| | 12 weeks | -15.50000 | 6.63887 | .074 | -32.2962 | 1.2962 |

Note (*): The mean difference is significant at the 0.05 level.

Exercise 5b

From the results of the Anova test (F = 7.527, p = 0.004), it can be said that there is a statistically significant difference in systolic blood pressure between the levels of the independent variable, i.e. between exercise lasting 6 weeks, 12 weeks and 24 weeks.
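The F ratio in the Anova table is the between-groups mean square divided by the within-groups mean square, where each mean square is a sum of squares divided by its degrees of freedom. A minimal sketch of that arithmetic, with variable names chosen for illustration and values copied from the table:

```python
# Sums of squares and degrees of freedom from the systolic BP Anova table.
ss_between, df_between = 2143.578, 2
ss_within, df_within = 2847.900, 20

# Mean square = sum of squares / degrees of freedom.
ms_between = ss_between / df_between
ms_within = ss_within / df_within

# F is the ratio of the two mean squares.
f_ratio = ms_between / ms_within

print(f"MS between = {ms_between:.3f}")
print(f"MS within  = {ms_within:.3f}")
print(f"F({df_between}, {df_within}) = {f_ratio:.3f}")
```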

The post hoc Tukey HSD test shows where the difference lies. Only the comparison between 6 weeks and 24 weeks is statistically significant (mean difference 23.90, p = 0.003). The comparisons between 6 weeks and 12 weeks (p = 0.346) and between 12 weeks and 24 weeks (p = 0.074) have significance values above 0.05 and are therefore not statistically significant.

Exercise 6a

Between-Subjects Factors

| Factor | Value | Value Label | N |
|---|---|---|---|
| Main Sport participated in | 1 | Runner | 8 |
| | 2 | Rower | 8 |

Multivariate Tests (a)

| Effect | Test | Value | F | Hypothesis df | Error df | Sig. |
|---|---|---|---|---|---|---|
| heartrate | Pillai’s Trace | .971 | 37.635 (b) | 7.000 | 8.000 | .000 |
| | Wilks’ Lambda | .029 | 37.635 (b) | 7.000 | 8.000 | .000 |
| | Hotelling’s Trace | 32.931 | 37.635 (b) | 7.000 | 8.000 | .000 |
| | Roy’s Largest Root | 32.931 | 37.635 (b) | 7.000 | 8.000 | .000 |
| heartrate * sport | Pillai’s Trace | .641 | 2.038 (b) | 7.000 | 8.000 | .170 |
| | Wilks’ Lambda | .359 | 2.038 (b) | 7.000 | 8.000 | .170 |
| | Hotelling’s Trace | 1.783 | 2.038 (b) | 7.000 | 8.000 | .170 |
| | Roy’s Largest Root | 1.783 | 2.038 (b) | 7.000 | 8.000 | .170 |

Note (a): Design: Intercept + sport. Within Subjects Design: heartrate. Note (b): Exact statistic.

Mauchly’s Test of Sphericity (a) (Measure: MEASURE_1)

| Within Subjects Effect | Mauchly’s W | Approx. Chi-Square | df | Sig. | Epsilon (b): Greenhouse-Geisser | Epsilon (b): Huynh-Feldt | Epsilon (b): Lower-bound |
|---|---|---|---|---|---|---|---|
| heartrate | .001 | 81.311 | 27 | .000 | .411 | .565 | .143 |

Tests the null hypothesis that the error covariance matrix of the orthonormalized transformed dependent variables is proportional to an identity matrix. Note (a): Design: Intercept + sport. Within Subjects Design: heartrate. Note (b): May be used to adjust the degrees of freedom for the averaged tests of significance. Corrected tests are displayed in the Tests of Within-Subjects Effects table.

Tests of Within-Subjects Effects (Measure: MEASURE_1)

| Source | Correction | Type III Sum of Squares | df | Mean Square | F | Sig. |
|---|---|---|---|---|---|---|
| heartrate | Sphericity Assumed | 15918.094 | 7 | 2274.013 | 87.152 | .000 |
| | Greenhouse-Geisser | 15918.094 | 2.876 | 5535.544 | 87.152 | .000 |
| | Huynh-Feldt | 15918.094 | 3.956 | 4023.623 | 87.152 | .000 |
| | Lower-bound | 15918.094 | 1.000 | 15918.094 | 87.152 | .000 |
| heartrate * sport | Sphericity Assumed | 190.844 | 7 | 27.263 | 1.045 | .405 |
| | Greenhouse-Geisser | 190.844 | 2.876 | 66.366 | 1.045 | .381 |
| | Huynh-Feldt | 190.844 | 3.956 | 48.240 | 1.045 | .392 |
| | Lower-bound | 190.844 | 1.000 | 190.844 | 1.045 | .324 |
| Error(heartrate) | Sphericity Assumed | 2557.063 | 98 | 26.092 | | |
| | Greenhouse-Geisser | 2557.063 | 40.259 | 63.516 | | |
| | Huynh-Feldt | 2557.063 | 55.386 | 46.168 | | |
| | Lower-bound | 2557.063 | 14.000 | 182.647 | | |

Tests of Within-Subjects Contrasts (Measure: MEASURE_1)

| Source | heartrate contrast | Type III Sum of Squares | df | Mean Square | F | Sig. |
|---|---|---|---|---|---|---|
| heartrate | Linear | 3043.006 | 1 | 3043.006 | 42.968 | .000 |
| | Quadratic | 30.006 | 1 | 30.006 | 2.137 | .166 |
| | Cubic | 6436.031 | 1 | 6436.031 | 148.967 | .000 |
| | Order 4 | .274 | 1 | .274 | .017 | .899 |
| | Order 5 | 2849.144 | 1 | 2849.144 | 133.314 | .000 |
| | Order 6 | 16.751 | 1 | 16.751 | 4.524 | .052 |
| | Order 7 | 3542.881 | 1 | 3542.881 | 271.684 | .000 |
| heartrate * sport | Linear | 17.357 | 1 | 17.357 | .245 | .628 |
| | Quadratic | 44.024 | 1 | 44.024 | 3.135 | .098 |
| | Cubic | 62.546 | 1 | 62.546 | 1.448 | .249 |
| | Order 4 | 24.961 | 1 | 24.961 | 1.516 | .238 |
| | Order 5 | 6.322 | 1 | 6.322 | .296 | .595 |
| | Order 6 | 5.046 | 1 | 5.046 | 1.363 | .263 |
| | Order 7 | 30.587 | 1 | 30.587 | 2.346 | .148 |
| Error(heartrate) | Linear | 991.494 | 14 | 70.821 | | |
| | Quadratic | 196.613 | 14 | 14.044 | | |
| | Cubic | 604.862 | 14 | 43.204 | | |
| | Order 4 | 230.485 | 14 | 16.463 | | |
| | Order 5 | 299.203 | 14 | 21.372 | | |
| | Order 6 | 51.839 | 14 | 3.703 | | |
| | Order 7 | 182.566 | 14 | 13.040 | | |

Tests of Between-Subjects Effects (Measure: MEASURE_1, Transformed Variable: Average)

| Source | Type III Sum of Squares | df | Mean Square | F | Sig. |
|---|---|---|---|---|---|
| Intercept | 3292819.531 | 1 | 3292819.531 | 5790.345 | .000 |
| sport | 3549.031 | 1 | 3549.031 | 6.241 | .026 |
| Error | 7961.437 | 14 | 568.674 | | |

Exercise 6b

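Mauchly’s test of sphericity is significant (p < 0.001), so sphericity cannot be assumed and the corrected rows of the Tests of Within-Subjects Effects table should be read. Under every correction, the effect of heart rate across the eight measurements is significant (F = 87.152, p < 0.001). In the Tests of Within-Subjects Contrasts table, the linear, cubic, order 5 and order 7 contrasts for heart rate have significance values below 0.05, while the quadratic, order 4 and order 6 contrasts do not. The heartrate * sport interaction has significance values higher than 0.05 throughout, so it is not significant. The test of between-subjects effects shows that the significance value of sport is 0.026, which is less than 0.05. From the above graph of estimated marginal means against heart rate, it is observed that the rowers showed higher heart rates in comparison to runners.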
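The corrected rows of the Tests of Within-Subjects Effects table multiply both the effect and the error degrees of freedom by epsilon, which leaves F unchanged and alters only the degrees of freedom used to find the significance value. The sketch below illustrates this for the Greenhouse-Geisser row; the function name `corrected_f` is hypothetical, and because the printed epsilon (.411) is rounded, the results match the table only approximately.

```python
# Values copied from the Tests of Within-Subjects Effects table.
ss_effect, df_effect = 15918.094, 7
ss_error, df_error = 2557.063, 98

# Greenhouse-Geisser epsilon as printed in Mauchly's table (rounded).
epsilon_gg = 0.411


def corrected_f(ss_eff, ss_err, df_eff, df_err, epsilon):
    """Scale both df by epsilon and recompute the F ratio."""
    adj_df_eff = epsilon * df_eff
    adj_df_err = epsilon * df_err
    f_value = (ss_eff / adj_df_eff) / (ss_err / adj_df_err)
    return adj_df_eff, adj_df_err, f_value


gg_df_effect, gg_df_error, gg_f = corrected_f(
    ss_effect, ss_error, df_effect, df_error, epsilon_gg
)

print(f"Corrected effect df: {gg_df_effect:.3f}")
print(f"Corrected error df:  {gg_df_error:.3f}")
print(f"F: {gg_f:.3f}")
```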

References

Hanusz, Z., Tarasinska, J. & Zielinski, W., 2016. Shapiro-Wilk test with known mean. REVSTAT-Statistical Journal, 14(1), pp. 89-100.

Hassani, H. & Silva, E., 2015. A Kolmogorov-Smirnov based test for comparing the predictive accuracy of two sets of forecasts. Econometrics, 3(3), pp. 590-609.

Kim, T., 2017. Understanding one-way ANOVA using conceptual figures. Korean Journal of Anesthesiology, 70(1), p. 22.

Skyttberg, N., Chen, R. & Koch, S., 2018. Man vs machine in emergency medicine–a study on the effects of manual and automatic vital sign documentation on data quality and perceived workload, using observational paired sample data and questionnaires. BMC, 18(1), p. 54.