9  Independent Sample t-test

This chapter will cover the independent t-test. Although more details follow, in short, an independent t-test is a statistical method used to determine if there is a significant difference between the means of two independent groups. Unlike the z-test, the independent t-test is used when the population variances are unknown and typically estimated from the sample data. It calculates a t-statistic, which measures how many standard deviations the difference between the two sample means is from the expected difference (usually zero). By comparing this t-statistic to a critical value from the t-distribution, researchers can determine whether the observed difference is statistically significant. Independent t-tests are commonly used in studies with smaller sample sizes or when population parameters are not available.

9.1 Some Additional Details

The t-test is used to compare two groups, which is a categorical (independent) variable (e.g., male versus female; time 1 versus time 2; treatment versus control) on some continuous outcome (dependent) variable.

NEED TO ADD CHAPTER ON RESEARCH DESIGN OF BUILD THESE CONCEPTS INTO SCIENTIFIC PROCESS. In most experiments, we aim to both randomly sample participants from the population and randomly assign them to different groups. This process helps ensure that the groups are comparable, so any differences in the outcome can be attributed to the grouping variable. In other words, any changes in the dependent variable (DV) are assumed to result from the independent variable (IV).

If you recall, the null hypothesis typically purports that there is no difference/association. Thus, imagine we are comparing the means two groups: 1 and 2. The null hypothesis states:

\(H_0: \mu_1 = \mu_2\)

and the alternative hypothesis states (for a two-sided test)

\(H_1: \mu_1 \neq \mu_2\)

In the last chapter we learned about the z-test, which carries a somewhat unrealistic assumptions: that we know some population variance parameter. Most times we do not know the population standard deviation or variance, so we must estimate it using our sample data. Furthermore, in last chapter we simulated the distribution of a sample’s means over and over to demonstrate central limit theorem. In a t-test, we are no longer interested in one mean, but two. We also have two potential standard deviations. Thus, our analysis is slightly different.

9.2 Betcha’ can’t eat just two…?

Imagine we wanted to model the average weight of bags of two different flavours of Lay’s Potato Chips. Let’s use our scientific method, as discussed in an earlier chapter, to conduct some science.

Specifically, you have a new theory: Lays purposely puts fewer chips in a bag of Ketchup Chips than Regular Chips because the seasoning in Ketchup Chips costs more to produce. Based on this, you hypothesize that 200g bags of Ketchup Chips weigh less than 200g bags of Regular Chips.

9.3 Step 1. Generate your hypotheses

Your hypothesis can be translated into a statistical hypothesis, represented as the null and alternative hypotheses:

\(H_0: \mu_{ketchup} = \mu_{regular}\) (The average weight of Ketchup Chips is equal to that of Regular Chips.) \(H_1: \mu_{ketchup} < \mu_{regular}\) (The average weight of Ketchup Chips is less than that of Regular Chips.)

You email Lays, and they respond similarly to before: “All our bags weigh, on average, 200g regardless of flavor! Also, we don’t know the standard deviations of ALL the flavours…measure them yourself! And, oh…stop emailing us!”

9.4 Step 2. Designing a study

Determined to investigate further, you plan to purchase two cases of chips (15 bags each) directly from Lays: one case of Ketchup Chips and one case of Regular Chips. Your total sample size is 30 bags–15 Ketchup, 15 Regular.

Once you have the chips, you will weigh each bag using a professionally calibrated scale, ensuring the weights reflect only the chips themselves, excluding the bags. You decide to use null hypothesis significance testing (NHST) to analyze your data with a significance level of \(\alpha = .05\).

Prior to conducting your research, you submit your research plan to the Grenfell Campus research ethics board, which approves your study and classified it as low-risk.

9.5 Step 3. Conducting your study

You follow through with your research plan. You get the following data:

BagID Flavour Weight
1 Ketchup 203.46
2 Ketchup 202.53
3 Ketchup 189.37
4 Ketchup 190.32
5 Ketchup 203.45
6 Ketchup 199.86
7 Ketchup 200.60
8 Ketchup 197.85
9 Ketchup 191.77
10 Ketchup 186.04
11 Ketchup 193.80
12 Ketchup 194.04
13 Ketchup 194.23
14 Ketchup 189.66
15 Ketchup 198.12
16 Regular 206.96
17 Regular 201.44
18 Regular 209.53
19 Regular 195.01
20 Regular 204.16
21 Regular 206.17
22 Regular 198.25
23 Regular 200.89
24 Regular 197.88
25 Regular 198.52
26 Regular 191.28
27 Regular 205.95
28 Regular 191.04
29 Regular 201.97
30 Regular 213.08

And we can summarize the data:

Flavour Mean Min Max SD
Ketchup 195.673 186.04 203.46 5.615633
Regular 201.475 191.04 213.08 6.359418

It is also helpful to visualize the data using a graph/plot. There are several options for a t-test (see an earlier chapter regarding ways to visualize data). For now, we will create a dot plot that has a dot for each bag of chips. The y-axis represent the weight, and the x-axis represent the flavour. I have added some slight x-axis movement within each flavour to prevent dots from overlapping.

Think about it

How do you make sense of this figure? Are there trends you see? If you have to guess, without having access to a formal analysis, would the flavours weight the same?

9.6 Step 4. Analyzing Data

If you recall the concept of the distribution of sample means from the z-test chapter, you know that sample means vary due to random sampling. For example, even if a population mean is 200g with a standard deviation of 6g, taking a sample from that population might not give us exactly 200g every time due to this variability.

We can apply this same logic to the differences between two groups’ means. Even if the true means of both groups are identical, sampling variability will cause the observed difference between the sample means to deviate from 0. In other words, we may see some differences between the two groups’ sample means, even if both are drawn from the same population or populations with identical means.

A t-test helps us assess whether the observed difference between these sample means is large enough to be statistically significant. By using a pre-specified significance level (like \(\alpha = 0.05\)), we can determine whether the difference is so large that it’s unlikely to have occurred if the groups indeed came from the same population—suggesting that the two groups’ means are different.

We will need several pieces of information prior to analyzing our results.

9.7 Deriving the Distribution of Differences in Sample Means

The distribution of differences in sample means, known as the sampling distribution of the difference between two means, is derived from the individual sampling distributions of each group’s mean.

9.7.1 1. Sampling Distribution of Each Group’s Mean

For each group in an independent t-test, the sampling distribution of the sample mean is normally distributed (or approximately normal for large enough sample sizes, due to the Central Limit Theorem). The mean of each group’s sampling distribution is the population mean (\(\mu\)), and the variability is represented by the standard error of the mean.

  • For Group 1, the standard error of the mean (SE1) is:

\(SE_1 = \frac{s_1}{\sqrt{n_1}}\)

where \(s_1\) is the sample standard deviation of Group 1, and \(n_1\) is the sample size.

  • For Group 2, the standard error of the mean (SE2) is:

\(SE_2 = \frac{s_2}{\sqrt{n_2}}\)

9.7.2 2. Sampling Distribution of the Difference Between Means

To obtain the sampling distribution of the difference between the two sample means (\(\bar{X}_1 - \bar{X}_2\)), we combine the two individual sampling distributions. The mean of the difference between the two sample means is the difference between the population means:

\(\mu_{(\bar{X}_1 - \bar{X}_2)} = \mu_1 - \mu_2\)

If the null hypothesis (\(H_0\)) is true (i.e., the two population means are equal), then the mean difference will be 0 (\(\mu_1 - \mu_2 = 0\)).

9.7.3 3. Standard Error of the Difference Between Means

The variability (spread) of the distribution of the difference between means is captured by the standard error of the difference. This is calculated by combining the standard errors of both groups. If we assume that the variances of the two groups are equal, we use the pooled standard deviation to compute the standard error of the difference:

\(SE_{\bar{X}_1 - \bar{X}_2} = \sqrt{\left(\frac{s_p^2}{n_1}\right) + \left(\frac{s_p^2}{n_2}\right)}\)

Where \(s_p\) is the pooled standard deviation, and \(n_1\) and \(n_2\) are the sample sizes of the two groups.

The pooled standard deviation (\(s_p\)) is calculated as follows. It’s often called the summation form):

\(s_p = \sqrt{\frac{(n_1 - 1)s_1^2 + (n_2 - 1)s_2^2}{n_1 + n_2 - 2}}\)

or, an alternative formula for pooled variance follows. You can take the square root of the following to obtain the former. It is often called the weighted variance form:

\(s^2_p = \frac{\Sigma(x_{i1}-\bar{x_1})^2+\Sigma(x_{i2}-\bar{x_2})^2}{n_1+n_2-2}\)

9.7.4 4. t-Distribution

The sampling distribution of the difference between means follows a t-distribution when the population variances are unknown. The degrees of freedom (df) for this t-distribution are:

\(df = n_1 + n_2 - 2\)

This t-distribution is used because we are estimating the population variances from the sample data, and it accounts for the uncertainty associated with that estimation.

9.7.5 5. Calculating the t-statistic

The t-statistic is calculated by comparing the observed difference between the two sample means to the expected difference under the null hypothesis. The formula is:

\[ t = \frac{\bar{X}_1 - \bar{X}_2}{SE_{\bar{X}_1 - \bar{X}_2}} \]

Where: - \(\bar{X}_1\) and \(\bar{X}_2\) are the sample means for Group 1 and Group 2, - \(SE_{\bar{X}_1 - \bar{X}_2}\) is the standard error of the difference between the two means, calculated using the pooled standard deviation.

The t-statistic tells us how many standard errors the observed difference between means is away from 0 (the expected difference under the null hypothesis). A large t-value suggests that the difference between the means is unlikely to have occurred by chance, leading to the potential rejection of the null hypothesis.

9.7.5.1 Assumptions

There are several assumptions we must have when testing from the t distribution.

  • First, the data are continuous. For our purposes, this will be interval or ratio data.
  • Second, the data are randomly sampled.
  • And third, the variance of each group is similar.

9.7.6 Welch’s t-test

So far we have used formulas from Student’s t-test. Specifically, Welch’s t-test is an alternative test that is more robust to unequal group variances and smaller sample sizes. Welch’s alters the denominator for the t-test in the equation to:

\(\sqrt{s^2_{\bar{x1}}+s^2_{\bar{x2}}}\) where

\(s^2_{xi}=\frac{s_i}{\sqrt{n_i}}\)

Thus, the overall equation for Welch’s t-test is:

\(t=\frac{\bar{X}_1-\bar{X}_2}{\sqrt{s^2_{\bar{x1}}+s^2_{\bar{x2}}}}\)

Furthermore, Welch’s t-test alters the degrees of freedom (v) to:

\(v \approx \frac{\left(\frac{s^2_1}{n_1}+\frac{s^2_2}{n_2}\right)^2}{\frac{s^4_1}{n_1^2v_1}+\frac{s^4_2}{n_2^2v_2}}\)

Importantly, there are no major disadvantages to using Welch’s versus Student’s and you should probably use it in your own research. R’s function t.test() automatically uses Welch’s t-test.

For the purposes of this course, we will use Student’s t-test for our hand calculations. However, you can use Welch’s t-test for any analyses conducted using statistical software.

9.7.7 Effect Size

Cohen’s d is the standard effect size estimate for a t-test. It provides us with an estimate of the standardized mean difference. It is:

\(d=\frac{\overline{X}_1-\overline{X}_2}{s_{pooled}}\)

This is a standardized effect size that be compared across groups of metrics. Please review the chapter that discussed how to best determine meaningful effect sizes. However, Cohen suggested the following cut-offs:

  • Small - \(d = .2\)
  • Medium - \(d = .5\)
  • Large - \(d = .8\)

9.7.8 Ketchup a rip-off?

Let’s apply this to our chips example. We have all the data to calculate our t-statistic.

Flavour Mean Min Max SD
Ketchup 195.6733 186.04 203.46 5.615633
Regular 201.4753 191.04 213.08 6.359418

We calculate our squared differences between each bag and the mean of that group, which will be needed later. Not that in the following table, the last column is the squared difference between the weight of a bag of chips (\(X\)) and the mean of that bag’s GROUP (\(\bar{X}\); ketchup or regular):

BagID Flavour Weight (x - x̅)^2
1 Ketchup 203.46 60.6321778
2 Ketchup 202.53 47.0138778
3 Ketchup 189.37 39.7320111
4 Ketchup 190.32 28.6581778
5 Ketchup 203.45 60.4765444
6 Ketchup 199.86 17.5281778
7 Ketchup 200.60 24.2720444
8 Ketchup 197.85 4.7378778
9 Ketchup 191.77 15.2360111
10 Ketchup 186.04 92.8011111
11 Ketchup 193.80 3.5093778
12 Ketchup 194.04 2.6677778
13 Ketchup 194.23 2.0832111
14 Ketchup 189.66 36.1601778
15 Ketchup 198.12 5.9861778
16 Regular 206.96 30.0815684
17 Regular 201.44 0.0012484
18 Regular 209.53 64.8776551
19 Regular 195.01 41.8005351
20 Regular 204.16 7.2074351
21 Regular 206.17 22.0398951
22 Regular 198.25 10.4027751
23 Regular 200.89 0.3426151
24 Regular 197.88 12.9264218
25 Regular 198.52 8.7339951
26 Regular 191.28 103.9448218
27 Regular 205.95 20.0226418
28 Regular 191.04 108.8961818
29 Regular 201.97 0.2446951
30 Regular 213.08 134.6682884

So, for bag 1:

\((X-\bar{X})^2=(203.46 - 195.6733)^2=60.63\)

Let’s fill in the missing data to compute our t-statistic. We have:

\(\bar{X_1} =\) 195.67;

\(\bar{X_2} =\) 201.48;

\(s^2_p = \frac{\Sigma(X_{i1}-\bar{X_1})^2+\Sigma(X_{i2}-\bar{X_2})^2}{n_1+n_2-2} = \frac{441.49+556.19}{15+15-2} = \frac{1007.67}{28} = 35.99\)

and, therefore:

\(t = \frac{195.6733 - 201.4753} {\sqrt{\frac{35.99}{15}+\frac{35.99}{15}}} = \frac{-5.802}{2.190586} = -2.6486\)

You may wish to look the p-value of the resulting test up in a critical value table. However, most likely you will use statistical software to provide you with an exact p-value. The following is the formal results of our t-test:


    Two Sample t-test

data:  Weight by Flavour
t = -2.6487, df = 28, p-value = 0.01313
alternative hypothesis: true difference in means between group Ketchup and group Regular is not equal to 0
95 percent confidence interval:
 -10.289135  -1.314865
sample estimates:
mean in group Ketchup mean in group Regular 
             195.6733              201.4753 

9.7.9 Cohen’s D

Recall that:

\(d=\frac{\overline{x}_1-\overline{x}_2}{s_{pooled}}\)

From our above means and pooled variance, we have:

\(d=\frac{195.67-201.48}{\sqrt{35.99}}=-0.968\)

9.8 Step 5: Write up your results/conclusions

A two Sample t-test testing the difference of weight of bags of chips by Flavour suggest that Ketchup chips (\(\bar{X}= 195.67\) weigh less than Regular chips (\(\bar{X}) = 201.48\)). The results suggests that the effect is statistically significant, and large \(\bar{X}_{diff}= -5.80, 95\% CI [-10.29,-1.31]\), \(t(28) = -2.65, p = .013\), Cohen’s d = -1.00, 95% CI [-1.78, -0.21]$.

9.9 Conclusion

an independent t-test is a statistical method used to determine if there is a significant difference between the means of two independent groups. Unlike the z-test, the independent t-test is used when the population variances are unknown and typically estimated from the sample data. It calculates a t-statistic, which measures how many standard deviations the difference between the two sample means is from the expected difference (usually zero). By comparing this t-statistic to a critical value from the t-distribution, researchers can determine whether the observed difference is statistically significant.

Building your toolbox.
Name Uses Number of IVs Number of DVs IV DV Assumptions Hypotheses Effect Size
z-test Compare one group's mean to a population mean. 0 (No IVs) 1 None or Categorical (e.g., Group) Continuous Normality, known population variance Null: Mean of group equals population mean, Alternative: Mean of group differs from population mean Cohen's d
Independent t-test Compare means between two independent groups. 1 (Categorical, e.g., Group) 1 Categorical (2 groups) Continuous Normality, equal variances (for Student's t-test), independence Null: Means of the two groups are equal, Alternative: Means of the two groups differ Cohen's d (or Hedges' g)
Practice Questions
  1. Practice Question: Calculate the degree of freedom that would be used in Welch’s t-test on the chip data aboce.

  2. Practice Question: Calculate Student’s t statistics for the following data comparing Sour Cream and Onion (SCO) chips to Salt and Vinegar (SV).

Bag_ID Flavour Weight
1 SCO 198.3
2 SCO 192.1
3 SCO 204.8
4 SCO 201.6
5 SCO 198.3
6 SCO 196.6
7 SV 193.7
8 SV 197.4
9 SV 199.2
10 SV 198.3
11 SV 213.0
12 SV 205.8
Warning: The dot-dot notation (`..density..`) was deprecated in ggplot2 3.4.0.
ℹ Please use `after_stat(density)` instead.