Essential Statistical Concepts: Selecting the Appropriate Analysis for Your Data

Rihab Weghlani
7 min read · Feb 15, 2023


The proper use of statistics is crucial for drawing accurate conclusions and making informed decisions. Understanding the basics of statistics and knowing how to choose the appropriate statistical test for your data is essential to ensure the validity of your results. Whether you are a scientist, business professional, or student, having a strong grasp of statistics and its applications is critical for conducting successful research and making data-driven decisions.

Before choosing an appropriate statistical test for your study, it is important to consider the following factors:

1. The nature of the data: Determine whether your data is quantitative or qualitative, as this dictates which type of test you should use.

Quantitative data is data that can be expressed in numerical form and can be subjected to mathematical operations such as addition, subtraction, averaging, etc. Examples of quantitative data include age, height, weight, and income.

Qualitative data is descriptive data that cannot be expressed in numerical form. Examples of qualitative data include sex, emotions or feelings, and the appearance, color, or shape of objects.

2. The distribution of the data: Understanding the distribution of your data will help you choose the right test. For example, if your data follow a normal distribution, a parametric test may be more appropriate, while if they do not, a non-parametric test may be more suitable.

The main emphasis of this article will be on the techniques employed for managing continuous data.

What is the normal (Gaussian) distribution in statistics?

The sampling distribution of a statistic is a crucial factor in statistical testing. When samples of size n are drawn from a normal population N(µ, σ²), the sample mean X̄ is distributed as N(µ, σ²/n). Sample size plays a vital role when the population is not normal: a small sample can yield a markedly non-normal distribution of the sample mean, but as the sample size increases, the distribution of the sample means conforms more and more closely to the normal curve; a common rule of thumb is that a sample size of about 30 is sufficient in practice. This is the Central Limit Theorem: as the sample size increases, the distribution of the sample means approaches a normal distribution.
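As a quick illustration of the Central Limit Theorem, the short Python sketch below (using NumPy and SciPy on simulated data; the exponential population and the sample sizes are just illustrative choices) draws repeated samples from a deliberately skewed population and shows the skewness of the sample means shrinking toward zero as the sample size grows:

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)
# A deliberately skewed population (exponential, clearly not normal)
population = rng.exponential(scale=2.0, size=100_000)

for n in (2, 5, 30):
    # Draw 10,000 samples of size n and keep each sample's mean
    means = rng.choice(population, size=(10_000, n)).mean(axis=1)
    print(f"n={n:>2}: skewness of the sample means = {stats.skew(means):.3f}")
```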

To determine if the data follow a normal distribution, tests such as the Shapiro-Wilk test, Anderson-Darling test, or Kolmogorov-Smirnov test can be used. Also, inspecting the data visually is an easy way to determine if it is close to being normally distributed. This can be done by constructing a histogram or a normal quantile-quantile (Q-Q) plot. If the data is nearly normally distributed, the histogram should display a bell-shaped curve, and the Q-Q plot should be roughly linear. If the data follow a normal distribution, parametric tests can be used for data analysis.
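Here is a minimal sketch of both approaches in Python, using scipy.stats and matplotlib on simulated data (the "heights" sample is invented for illustration):

```python
import numpy as np
import matplotlib.pyplot as plt
from scipy import stats

rng = np.random.default_rng(1)
data = rng.normal(loc=170, scale=10, size=200)  # e.g. simulated heights in cm

# Formal test: Shapiro-Wilk (null hypothesis: the data are normal)
w, p = stats.shapiro(data)
print(f"Shapiro-Wilk: W = {w:.3f}, p = {p:.3f}")  # large p: no evidence against normality

# Visual checks: histogram and normal Q-Q plot
fig, (ax1, ax2) = plt.subplots(1, 2, figsize=(9, 3.5))
ax1.hist(data, bins=20)
ax1.set_title("Histogram (should look bell-shaped)")
stats.probplot(data, dist="norm", plot=ax2)  # Q-Q plot (should be roughly linear)
plt.show()
```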

Figure: a histogram whose shape closely follows the form of a bell curve (source: https://mathbitsnotebook.com/Algebra2/Statistics/STnormalDistribution.html)

Parametric tests:

T-test:

The T-test is used to compare the means of two groups. There are two types of t-tests: the independent t-test, which is used when the two groups are independent, and the paired t-test, which is used when the two groups are dependent.

Example “Psychological Study”: A psychologist wants to determine if a new therapy technique significantly reduces anxiety levels compared to a traditional therapy technique. They randomly assign patients to receive either the new or traditional therapy and track their anxiety levels before and after treatment. A t-test can be used to determine if there is a significant difference in anxiety reduction between the two groups.
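A minimal sketch of this analysis in Python, with made-up anxiety-reduction scores (pre minus post) for each group; scipy.stats.ttest_ind runs the independent t-test:

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(2)
# Hypothetical anxiety-reduction scores (pre minus post) for each patient
new_therapy = rng.normal(loc=12, scale=4, size=40)
traditional = rng.normal(loc=9, scale=4, size=40)

# Independent t-test: do the mean reductions differ between the groups?
t_stat, p_value = stats.ttest_ind(new_therapy, traditional)
print(f"t = {t_stat:.2f}, p = {p_value:.4f}")
# For dependent groups (e.g. the same patients measured twice),
# use stats.ttest_rel instead.
```

If the two groups were suspected of having unequal variances, passing equal_var=False would run Welch's version of the test instead.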

One-Way ANOVA:

Analysis of Variance (ANOVA) is used to determine if there is a significant difference between the means of three or more groups. The process begins with an omnibus test, the F-test, which checks whether there is any difference in means among the groups being studied. The null hypothesis for this test is that there is no difference between the means, meaning that all groups have the same mean. If the omnibus test fails to reject the null hypothesis, there is no evidence of a difference between the group means, and they are treated as the same. However, if the test is significant, further analysis is conducted using post-hoc tests, such as the Tukey test, the Duncan test, or the Bonferroni correction, to identify which specific groups differ.

For example, in an agricultural experiment, a farmer wants to test the effect of a new fertilizer on crop yields and is curious about whether applying the fertilizer at different rates will make a difference. He sets up an experiment where he grows crops in three different plots of land. On one plot, he applies the fertilizer at a rate of 10 pounds per acre. On the second plot, he applies it at a rate of 20 pounds per acre. The third plot receives no fertilizer and serves as the control. After the crops have grown, he measures the yield from each plot.

To determine if the fertilizer and the different rates of application have a significant impact on crop yields, the farmer performs an ANOVA test. The ANOVA will tell him whether the mean yields of the three plots differ, and post-hoc tests will tell him which plots have significantly different means.
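The sketch below mirrors the farmer's experiment with hypothetical yield numbers: scipy.stats.f_oneway runs the omnibus F-test, and pairwise_tukeyhsd from statsmodels performs the Tukey post-hoc comparisons when the F-test is significant:

```python
import numpy as np
from scipy import stats
from statsmodels.stats.multicomp import pairwise_tukeyhsd

rng = np.random.default_rng(3)
# Hypothetical yields (e.g. bushels per acre) for the three plots
control = rng.normal(loc=100, scale=8, size=30)
rate_10 = rng.normal(loc=108, scale=8, size=30)
rate_20 = rng.normal(loc=115, scale=8, size=30)

# Omnibus F-test: do the three means differ at all?
f_stat, p_value = stats.f_oneway(control, rate_10, rate_20)
print(f"F = {f_stat:.2f}, p = {p_value:.4f}")

# If significant, Tukey's post-hoc test shows which pairs differ
if p_value < 0.05:
    yields = np.concatenate([control, rate_10, rate_20])
    labels = ["control"] * 30 + ["10 lb/acre"] * 30 + ["20 lb/acre"] * 30
    print(pairwise_tukeyhsd(yields, labels))
```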

MANOVA:

This statistical technique analyzes the influence of one or more grouping variables on several outcome variables at once. Groups are compared based on the mean values of linear combinations of the outcome variables, and the goal is to determine whether the groups differ significantly in the effect of the grouping variables on the outcome variables. This provides a more comprehensive understanding of the relationship between the grouping and outcome variables.

Compared to ANOVA, MANOVA has several advantages. First, analyzing multiple dependent variables in a single experiment increases the likelihood of identifying the most important factor. Second, MANOVA reduces the risk of Type I errors that would arise from conducting multiple separate ANOVA tests, and it can uncover differences that individual ANOVAs might miss.

For example, a medical scientist is interested in evaluating the effectiveness of a new medication for a chronic illness on two aspects of patients' health: discomfort levels and mobility. The scientist randomly assigns 100 patients with the illness to either a control group or a treatment group. The control group receives the standard treatment, while the treatment group receives the new medication. After three months, the scientist assesses each patient's discomfort and mobility levels using two different measures. A MANOVA can then test whether the two groups differ on the two outcomes considered jointly.
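A minimal sketch of this design using the MANOVA class from statsmodels, on simulated discomfort and mobility scores (the group means and spreads are invented for illustration):

```python
import numpy as np
import pandas as pd
from statsmodels.multivariate.manova import MANOVA

rng = np.random.default_rng(4)
n = 50  # patients per group

# Simulated outcomes; the effect sizes here are purely illustrative
df = pd.DataFrame({
    "group": ["control"] * n + ["treatment"] * n,
    "discomfort": np.concatenate([rng.normal(6.0, 1.5, n), rng.normal(4.5, 1.5, n)]),
    "mobility": np.concatenate([rng.normal(50, 10, n), rng.normal(58, 10, n)]),
})

# One grouping variable, two outcome variables analyzed jointly
fit = MANOVA.from_formula("discomfort + mobility ~ group", data=df)
print(fit.mv_test())  # Wilks' lambda, Pillai's trace, etc.
```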

Pearson Correlation:

Pearson correlation measures the statistical relationship between two continuous variables. In psychology, this type of correlation is used to examine the relationship between various personality traits, attitudes, behaviors, and other variables. A positive correlation occurs when individuals with high scores on one variable also tend to have high scores on the other, while a negative correlation occurs when high scores on one variable are associated with low scores on the other. Pearson's correlation coefficient ranges from -1 to 1, with -1 signifying a perfect negative correlation, 0 signifying no correlation, and 1 indicating a perfect positive correlation.
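For example, scipy.stats.pearsonr computes the coefficient along with a p-value; here on two synthetic trait scores constructed to be positively related (the variable names are just placeholders):

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(5)
# Two synthetic trait scores constructed to be positively related
extraversion = rng.normal(size=100)
sociability = 0.6 * extraversion + rng.normal(scale=0.8, size=100)

r, p = stats.pearsonr(extraversion, sociability)
print(f"r = {r:.2f}, p = {p:.4f}")  # r near +0.6 expected by construction
```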

NB: If the data in your study do not follow a normal distribution, there are several solutions to consider. One option is to increase the sample size, but if that is not feasible, data transformation techniques, such as taking the square root or logarithm of the observations, can be used to make the distribution more normal. Another solution is to use non-parametric tests, which do not assume the normality of the data.
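The snippet below illustrates the transformation idea on simulated right-skewed data: the Shapiro-Wilk p-value rises (less evidence against normality) after a square-root or log transform. Note that these transforms require non-negative (square root) or strictly positive (log) values:

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(6)
# Simulated right-skewed, strictly positive data
skewed = rng.lognormal(mean=0.0, sigma=0.8, size=200)

# Shapiro-Wilk p-values before and after transforming
for name, values in [("raw", skewed), ("sqrt", np.sqrt(skewed)), ("log", np.log(skewed))]:
    w, p = stats.shapiro(values)
    print(f"{name:>4}: Shapiro-Wilk p = {p:.3f}")  # log is exactly normal here by construction
```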

There are several nonparametric tests available, including the following (a short code sketch follows the list):

The Mann-Whitney U test, also referred to as the Wilcoxon rank-sum test or the Wilcoxon-Mann-Whitney test, is a nonparametric alternative to the independent t-test. This test is used to compare the differences between two independent groups.

The Wilcoxon signed-rank test is a nonparametric alternative to the paired t-test and is used to compare two paired (non-independent) groups or two sets of repeated measurements within a subject.

The Kruskal-Wallis test is a nonparametric alternative to the one-way analysis of variance (ANOVA). Both tests are used to compare a continuous dependent variable between three or more independent groups; rather than comparing means directly, the Kruskal-Wallis test compares the groups' rank distributions.

The Spearman rank correlation is a commonly used nonparametric alternative to Pearson’s correlation for analyzing the relationship between two continuous or ordinal variables. This test measures the association between the two variables by determining the rank of the values of each variable, rather than using the actual values.
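As referenced above, here is a combined sketch calling each of these four tests from scipy.stats on small synthetic samples; in practice you would substitute your own data:

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(7)
a, b = rng.normal(0.0, 1, 30), rng.normal(0.5, 1, 30)     # two independent groups
pre, post = rng.normal(5, 1, 25), rng.normal(4.4, 1, 25)  # paired measurements
g1, g2, g3 = (rng.normal(m, 1, 20) for m in (0.0, 0.3, 0.8))  # three groups

print(stats.mannwhitneyu(a, b))   # Mann-Whitney U (two independent groups)
print(stats.wilcoxon(pre, post))  # Wilcoxon signed-rank (paired groups)
print(stats.kruskal(g1, g2, g3))  # Kruskal-Wallis (3+ independent groups)
print(stats.spearmanr(a, b))      # Spearman rank correlation
```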

Types of Statistical Tests (Rihab Weghlani)


Thank you for reading!

If you found this post enjoyable, a clap 👏 would be greatly appreciated. I welcome any questions or feedback, and feel free to share this on Facebook, Twitter, or LinkedIn.
