10  Repeated Measures t-test

This chapter will cover the repeated measures t-test, a statistical method used to determine if there is a significant difference between the means of two related, repeated, or dependent groups. Unlike the independent t-test, which compares two separate groups, the repeated measures t-test is used when the same participants are measured under different conditions or at multiple points in time. It calculates a t-statistic based on the differences between paired scores, which allows researchers to determine whether any observed changes are statistically significant.

10.1 Some Additional Details

The repeated measures t-test is appropriate when the data are naturally paired, such as when a group of participants is measured before and after an intervention, or when the same participants complete two different experimental conditions.

The null hypothesis posits that the mean does not change between the two time points or conditions. For the same individuals measured at two time points, it states:

\(H_0: \Delta\mu = 0\)

where \(\Delta\mu\) (delta mu) is the difference between the population means of the same group of participants at the two time points or under the two conditions. The alternative hypothesis (for a two-sided test) states:

\(H_1: \Delta\mu \ne 0\)

In this context, we’re testing whether there is a statistically significant difference in the mean scores of the same group of participants at two time points or under two conditions, rather than comparing two independent groups.

10.1.1 Key Assumptions

A repeated measures t-test can be conducted under certain assumptions. We will explore these in more detail later, but in short:

1. The data are continuous

The dependent variable should be at the interval or ratio level.

2. Difference scores are normally distributed

The differences between paired scores should follow a normal distribution.

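3. Independence of observations between pairs

Each observation in one condition should correspond to exactly one observation in the other condition (from the same participant), and each participant's pair of scores should be independent of every other participant's pair.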

10.2 Therapy for Reducing Anxiety

Imagine a researcher who wants to test the effectiveness of a new anxiety-reduction therapy. The researcher plans to recruit individuals diagnosed with generalized anxiety disorder (GAD) and measure their anxiety levels before and after they complete the therapy program. The researcher believes that the therapy will reduce participants’ anxiety levels. However, they decide that a two-tailed test would be best, in case the new therapy program worsens anxiety.

10.3 Step 1. Generate Hypotheses

We can translate this into a statistical hypothesis. For a repeated measures t-test, we are interested in whether the mean difference between the two sets of measurements (pre-therapy and post-therapy anxiety levels) is significantly different from zero. Thus, our hypotheses are as follows:

\(H_0: \Delta\mu = 0\)

\(H_1: \Delta\mu \neq 0\)

10.4 Step 2. Designing the Study

In brief, the method for this study is:

Participants: Participants were recruited through posters placed at a local hospital. Interested individuals completed an anxiety questionnaire, and those with a score of 50 or above on the anxiety measurement met criteria for participation. Those receiving psychological services outside of the study were excluded.

Measures: Anxiety was measured using the Anxiety Questionnaire for Adults (AQA). The questionnaire consists of 20 items, designed to evaluate the frequency and intensity of anxiety symptoms experienced over the past week. Each item is rated on a 5-point Likert scale, ranging from 1 (not at all) to 5 (very often). Higher total scores indicate greater levels of anxiety. The AQA has demonstrated strong psychometric properties in community and clinical samples.

Procedure: Participants for the study were recruited through community centers and online platforms, where they were provided with information about the study’s purpose, procedures, and potential risks and benefits. Interested individuals who met the inclusion criteria (ages 18 and older) were screened via a brief eligibility questionnaire.

Upon obtaining informed consent, participants were administered the Anxiety Questionnaire for Adults (AQA) in a controlled environment. The questionnaire was presented either individually or in small groups, depending on the setting. The administration of the AQA took approximately 10-15 minutes.

Included participants were enrolled in a six-week therapy program. Participants completed individual therapy once per week for the six weeks with a doctoral-level clinical psychologist. Participants who missed more than two sessions were excluded from analysis.

The ethics review board at Grenfell Campus reviewed the project and ethics submission and approved the study.

10.5 Step 3. Conducting the Study

The study was completed as described; a final sample size of 20 was used. The following data were obtained:

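Anxiety Scores Before and After Therapy

| Participant ID | Pre-Therapy Score | Post-Therapy Score |
|---|---|---|
| 1 | 57.1 | 49.1 |
| 2 | 66.9 | 54.7 |
| 3 | 72.4 | 60.0 |
| 4 | 55.8 | 38.7 |
| 5 | 59.7 | 41.6 |
| 6 | 71.1 | 63.5 |
| 7 | 64.8 | 56.8 |
| 8 | 60.2 | 39.4 |
| 9 | 74.9 | 44.9 |
| 10 | 66.5 | 53.5 |
| 11 | 60.4 | 49.0 |
| 12 | 62.9 | 43.0 |
| 13 | 57.6 | 40.3 |
| 14 | 74.3 | 63.2 |
| 15 | 74.6 | 62.0 |
| 16 | 60.3 | 45.3 |
| 17 | 59.5 | 41.6 |
| 18 | 64.3 | 53.0 |
| 19 | 59.0 | 43.1 |
| 20 | 63.2 | 39.1 |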

10.6 Step 4. Analyzing Data

To assess the significance of the mean difference in anxiety scores, we calculate the t-statistic for the paired differences. This requires four quantities, computed in turn:

  1. Mean Difference

First, we need the mean of the differences between the pre- and post-therapy scores. Because the scores are paired, we can obtain it by subtracting one condition’s mean from the other’s. Here, the mean of the pre-therapy scores is 64.275 and the mean of the post-therapy scores is 49.09. Thus:

\(\bar{D} = \bar{X}_{pre} - \bar{X}_{post}\)

And for this study:

\(\bar{D} = 64.275-49.09=15.185\)
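The arithmetic for this step can be reproduced in a few lines of Python. This is a minimal sketch using only the standard library; the names `pre`, `post`, and `mean_diff` are illustrative and are not part of the original analysis.

```python
from statistics import mean

# Anxiety scores from the Step 3 table (participants 1-20, in order).
pre = [57.1, 66.9, 72.4, 55.8, 59.7, 71.1, 64.8, 60.2, 74.9, 66.5,
       60.4, 62.9, 57.6, 74.3, 74.6, 60.3, 59.5, 64.3, 59.0, 63.2]
post = [49.1, 54.7, 60.0, 38.7, 41.6, 63.5, 56.8, 39.4, 44.9, 53.5,
        49.0, 43.0, 40.3, 63.2, 62.0, 45.3, 41.6, 53.0, 43.1, 39.1]

mean_pre = mean(pre)               # mean of the pre-therapy scores
mean_post = mean(post)             # mean of the post-therapy scores
mean_diff = mean_pre - mean_post   # pre minus post, so positive = anxiety dropped

print(f"pre = {mean_pre:.3f}, post = {mean_post:.3f}, difference = {mean_diff:.3f}")
```

Subtracting post from pre is a convention choice: a positive mean difference then corresponds to lower anxiety after therapy.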

  2. Standard Deviation of Differences

Next, we need to calculate the standard deviation of these differences. To do so, we first need a difference score (\(D_i\)) for each person. For example, the difference score for person 1 (\(D_1\)) is:

\(D_1 = 57.1-49.1=8.0\)

We would do this for each individual. These are:

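Anxiety Scores Before and After Therapy

| Participant ID | Pre-Therapy Score | Post-Therapy Score | Difference |
|---|---|---|---|
| 1 | 57.1 | 49.1 | 8.0 |
| 2 | 66.9 | 54.7 | 12.2 |
| 3 | 72.4 | 60.0 | 12.4 |
| 4 | 55.8 | 38.7 | 17.1 |
| 5 | 59.7 | 41.6 | 18.1 |
| 6 | 71.1 | 63.5 | 7.6 |
| 7 | 64.8 | 56.8 | 8.0 |
| 8 | 60.2 | 39.4 | 20.8 |
| 9 | 74.9 | 44.9 | 30.0 |
| 10 | 66.5 | 53.5 | 13.0 |
| 11 | 60.4 | 49.0 | 11.4 |
| 12 | 62.9 | 43.0 | 19.9 |
| 13 | 57.6 | 40.3 | 17.3 |
| 14 | 74.3 | 63.2 | 11.1 |
| 15 | 74.6 | 62.0 | 12.6 |
| 16 | 60.3 | 45.3 | 15.0 |
| 17 | 59.5 | 41.6 | 17.9 |
| 18 | 64.3 | 53.0 | 11.3 |
| 19 | 59.0 | 43.1 | 15.9 |
| 20 | 63.2 | 39.1 | 24.1 |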
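The difference column can be generated rather than typed by hand. The sketch below assumes the scores are held in two equal-length lists in participant order (the names `pre`, `post`, and `diffs` are illustrative) and also checks that the mean of the differences equals the difference of the condition means.

```python
# Anxiety scores from the table above (participants 1-20, in order).
pre = [57.1, 66.9, 72.4, 55.8, 59.7, 71.1, 64.8, 60.2, 74.9, 66.5,
       60.4, 62.9, 57.6, 74.3, 74.6, 60.3, 59.5, 64.3, 59.0, 63.2]
post = [49.1, 54.7, 60.0, 38.7, 41.6, 63.5, 56.8, 39.4, 44.9, 53.5,
        49.0, 43.0, 40.3, 63.2, 62.0, 45.3, 41.6, 53.0, 43.1, 39.1]

# One difference score per participant (pre minus post).
# Rounding to one decimal removes floating-point noise such as 12.200000000000003.
diffs = [round(a - b, 1) for a, b in zip(pre, post)]

mean_of_diffs = sum(diffs) / len(diffs)
diff_of_means = sum(pre) / len(pre) - sum(post) / len(post)

print(diffs)
print(f"mean of differences = {mean_of_diffs:.3f}")
print(f"difference of means = {diff_of_means:.3f}")
```

Pairing with `zip` relies on both lists being in the same participant order; if the rows were shuffled independently, the difference scores would be meaningless.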

Next, we calculate the standard deviation of these difference scores. Note that the mean of the difference scores (\(\bar{D} = 15.185\)) is the same as the mean of the pre-therapy scores minus the mean of the post-therapy scores.

\[ s_D = \sqrt{\frac{\sum_{i=1}^n (D_i - \bar{D})^2}{n - 1}} \]

where:

  • \(D_i\) represents each individual difference,
  • \(\bar{D}\) is the mean of the differences,
  • \(n\) is the number of paired observations.

For us, this works out to be:

  • \(\sum (D_i - \bar{D})^2 = 612.2855\)
  • \(n-1=20-1=19\)

Thus:

\(s_D = \sqrt{\frac{612.2855}{19}} = \sqrt{32.226}=5.677\)
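The standard deviation formula can be checked directly from the 20 difference scores. This is a small sketch, not the original analysis code; it computes the sum of squared deviations by hand and compares the result with `statistics.stdev`, which also divides by \(n - 1\).

```python
from math import sqrt
from statistics import stdev

# Difference scores (pre minus post) from the table above.
diffs = [8.0, 12.2, 12.4, 17.1, 18.1, 7.6, 8.0, 20.8, 30.0, 13.0,
         11.4, 19.9, 17.3, 11.1, 12.6, 15.0, 17.9, 11.3, 15.9, 24.1]

n = len(diffs)
d_bar = sum(diffs) / n                       # mean of the differences
ss = sum((d - d_bar) ** 2 for d in diffs)    # sum of squared deviations
s_d = sqrt(ss / (n - 1))                     # sample standard deviation

print(f"SS = {ss:.4f}, variance = {ss / (n - 1):.3f}, s_D = {s_d:.3f}")
print(f"statistics.stdev gives {stdev(diffs):.3f}")
```

For these data the sum of squares comes to about 612.29 and \(s_D\) to about 5.677.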

  3. Standard Error of the Mean Difference: The standard error of the mean difference is calculated as follows:

\(SE_D = \frac{\sqrt{\frac{\sum_{i=1}^n (D_i-\bar{D})^2}{n-1}}}{\sqrt{n}}= \frac{s_{D}}{\sqrt{n}}\)

where:

  • \(s_{D}\) is the standard deviation of the differences
  • \(n\) is the number of paired observations.

Thus, our SE can be calculated as follows:

\(SE_D = \frac{5.677}{\sqrt{20}}=1.27\)
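As a quick check, the standard error can be computed from the difference scores in one step. The helper name `standard_error` is hypothetical and used only for illustration.

```python
from math import sqrt
from statistics import stdev

# Difference scores (pre minus post) from the table above.
diffs = [8.0, 12.2, 12.4, 17.1, 18.1, 7.6, 8.0, 20.8, 30.0, 13.0,
         11.4, 19.9, 17.3, 11.1, 12.6, 15.0, 17.9, 11.3, 15.9, 24.1]


def standard_error(values):
    """Standard error of the mean: sample SD divided by sqrt(n)."""
    return stdev(values) / sqrt(len(values))


se_d = standard_error(diffs)
print(f"SE_D = {se_d:.4f}")
```

Carrying the unrounded value (about 1.2694) into later steps, rather than 1.27, avoids small rounding discrepancies in the t-statistic.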

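  4. Calculate the t-Statistic: The t-statistic is calculated by dividing the mean difference by the standard error:

\(t = \frac{\bar{D}}{SE_D}=\frac{15.185}{1.2694}=11.96\)

with \(df = n - 1 = 19\) degrees of freedom. (The unrounded standard error, 1.2694, is used here to avoid rounding error.)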
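Putting the steps together, a paired t-statistic is the mean of the difference scores divided by their standard error. The function below is a minimal sketch with a hypothetical name (`paired_t`); it is not the software used for the formal analysis.

```python
from math import sqrt
from statistics import mean, stdev


def paired_t(before, after):
    """Return (t, df) for a paired t-test on two equal-length sequences."""
    diffs = [b - a for b, a in zip(before, after)]   # before minus after
    n = len(diffs)
    se = stdev(diffs) / sqrt(n)                      # standard error of the mean difference
    return mean(diffs) / se, n - 1


pre = [57.1, 66.9, 72.4, 55.8, 59.7, 71.1, 64.8, 60.2, 74.9, 66.5,
       60.4, 62.9, 57.6, 74.3, 74.6, 60.3, 59.5, 64.3, 59.0, 63.2]
post = [49.1, 54.7, 60.0, 38.7, 41.6, 63.5, 56.8, 39.4, 44.9, 53.5,
        49.0, 43.0, 40.3, 63.2, 62.0, 45.3, 41.6, 53.0, 43.1, 39.1]

t_stat, df = paired_t(pre, post)
print(f"t({df}) = {t_stat:.3f}")
```

Note that the denominator is the standard error (\(s_D / \sqrt{n}\)), not the standard deviation; dividing by \(s_D\) instead yields Cohen’s d rather than t.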

A formal analysis would result in:


    Paired t-test

data:  df_anxiety$Pre_Therapy and df_anxiety$Post_Therapy
t = 11.963, df = 19, p-value = 2.731e-10
alternative hypothesis: true mean difference is not equal to 0
95 percent confidence interval:
 12.5282 17.8418
sample estimates:
mean difference 
         15.185 

The t-test output provides the t-statistic, degrees of freedom, and p-value. If the p-value is below the significance level (\(\alpha=.05\)), we reject the null hypothesis: a mean difference between pre- and post-therapy scores this large would be unlikely if the null were true (i.e., if the therapy had no effect on anxiety). Here, \(p = 2.731 \times 10^{-10}\), which is far below .05.
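The confidence interval and p-value in the output can be approximated from the summary statistics alone. The sketch below makes two assumptions that are not in the original text: the critical value 2.093 is taken from a standard t table (two-sided, \(\alpha = .05\), \(df = 19\)), and the p-value is obtained by numerically integrating the t density with Simpson's rule, which is adequate for this example but is not how statistical software computes it. The function names are hypothetical.

```python
from math import gamma, pi, sqrt


def t_pdf(x, df):
    """Density of Student's t distribution with df degrees of freedom."""
    coef = gamma((df + 1) / 2) / (sqrt(df * pi) * gamma(df / 2))
    return coef * (1 + x * x / df) ** (-(df + 1) / 2)


def two_sided_p(t, df, width=200.0, steps=20_000):
    """Approximate two-sided p-value: Simpson's rule on [|t|, |t| + width].

    steps must be even; the tail beyond |t| + width is ignored, which is
    negligible for moderate df but would be poor for very small df.
    """
    a = abs(t)
    h = width / steps
    total = t_pdf(a, df) + t_pdf(a + width, df)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * t_pdf(a + i * h, df)
    return 2 * total * h / 3


mean_diff, se_d, df = 15.185, 1.2694, 19   # summary values from the steps above
t_crit = 2.093                             # assumed: t-table value for df = 19

t_stat = mean_diff / se_d
ci_low = mean_diff - t_crit * se_d
ci_high = mean_diff + t_crit * se_d
p_value = two_sided_p(t_stat, df)

print(f"t({df}) = {t_stat:.2f}, 95% CI [{ci_low:.2f}, {ci_high:.2f}], p = {p_value:.3e}")
```

The interval is the mean difference plus or minus the critical value times the standard error, so it excludes zero whenever the two-sided test rejects at the same \(\alpha\).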

10.6.1 Effect Size: Cohen’s d

For a paired-samples t-test, Cohen’s d provides an estimate of the standardized mean difference. Cohen’s d for repeated measures is calculated as follows:

\(d = \frac{\bar{D}}{s_{D}}\)

For us:

\(d = \frac{15.185}{5.677}=2.67\)
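The same calculation in Python, as a brief sketch: Cohen’s d for paired data, as defined above, is the mean of the difference scores divided by their standard deviation. The variable names are illustrative.

```python
from statistics import mean, stdev

# Difference scores (pre minus post) from the table in Step 4.
diffs = [8.0, 12.2, 12.4, 17.1, 18.1, 7.6, 8.0, 20.8, 30.0, 13.0,
         11.4, 19.9, 17.3, 11.1, 12.6, 15.0, 17.9, 11.3, 15.9, 24.1]

# Standardized mean difference: how many SDs of the differences the mean shift spans.
d = mean(diffs) / stdev(diffs)
print(f"Cohen's d = {d:.2f}")
```

Unlike the t-statistic, d does not divide by \(\sqrt{n}\), so it does not grow simply because more participants are added.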

Note that Cohen’s d tends to overestimate the population effect in small samples, so some statistical software applies a correction. Specifically, Hedges’ g is recommended when dealing with a small sample size. In fact, there is little downside to using Hedges’ g, as it is nearly identical to Cohen’s d in larger samples. Our software reports Cohen’s d as:

Cohen's d |       95% CI
------------------------
2.67      | [1.72, 3.62]
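To illustrate how the small-sample correction behaves, the sketch below applies a commonly used approximation to the Hedges correction factor, \(J \approx 1 - \frac{3}{4\,df - 1}\), with \(df = n - 1\) for paired data. This is an assumption made for illustration: software packages may use the exact gamma-function form or a different choice of degrees of freedom, so their reported g can differ slightly. The function name is hypothetical.

```python
def hedges_correction(df):
    """Approximate small-sample correction factor J (assumed form: 1 - 3 / (4*df - 1))."""
    return 1 - 3 / (4 * df - 1)


d = 2.67   # Cohen's d from the output above
n = 20     # number of pairs

j = hedges_correction(n - 1)
g = d * j
print(f"J = {j:.3f}, Hedges' g = {g:.2f}")

# The correction shrinks toward 1 as the sample grows, so g approaches d.
for pairs in (10, 20, 50, 200):
    print(pairs, round(hedges_correction(pairs - 1), 4))
```

With 20 pairs the factor is 0.96, so g is about 4% smaller than d; with 200 pairs the factor is above 0.996 and the two are practically indistinguishable.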

This standardized effect size allows us to determine the practical significance of the results. Cohen suggested interpreting d values as follows:

  • Small: (\(d = 0.2\))
  • Medium: (\(d = 0.5\))
  • Large: (\(d = 0.8\))

10.7 Step 5: Write up your results/conclusions

A paired t-test was used to evaluate the efficacy of the therapy by testing the difference between the pre-therapy and post-therapy anxiety scores. Anxiety scores were lower after therapy, and a difference this large would be unlikely given a true null hypothesis, \(\bar{D}= 15.19\), \(95\% CI [12.53, 17.84]\), \(t(19) = 11.96\), \(p < .001\). Additionally, the effect is considered large, \(d = 2.67\), \(95\% CI [1.72, 3.62]\).

10.8 Building Your Toolbox

| Name | Uses | Number of IVs | Number of DVs | IV | DV | Assumptions | Hypotheses | Effect Size |
|---|---|---|---|---|---|---|---|---|
| z-test | Compare one group's mean to a population mean. | 0 (No IVs) | 1 | None or Categorical (e.g., Group) | Continuous | Normality, known population variance | Null: Mean of group equals population mean; Alternative: Mean of group differs from population mean | Cohen's d |
| Independent t-test | Compare means between two independent groups. | 1 (Categorical, e.g., Group) | 1 | Categorical (2 groups) | Continuous | Normality, equal variances (for Student's t-test), independence | Null: Means of the two groups are equal; Alternative: Means of the two groups differ | Cohen's d (or Hedges' g) |
| Repeated Measures t-test | Compare means within the same group at two time points or under two conditions. | 1 (Categorical, e.g., Time Point) | 1 | Categorical (2 time points or conditions, 1 group) | Continuous | Normality of differences, independence between pairs | Null: Means at the two time points are equal; Alternative: Means at the two time points differ | Cohen's d (for paired samples) |