4  Statistical Models

In psychological research, a statistical model is a mathematical framework used to represent, analyze, and make predictions about data. It is often used to explain relationships between variables, such as the effect of one or more independent variables (predictors) on a dependent variable (outcome). Statistical models help researchers to:

1. Summarize Data

They provide a simplified representation of complex data, making it easier to understand and interpret patterns or trends.

2. Test Hypotheses

Researchers use statistical models to evaluate whether there are significant relationships between variables, or whether observed patterns in the data could have occurred by chance.

3. Make Predictions

Based on the relationships identified in the data, models can predict future outcomes or behaviors.

When we use theory and generate hypotheses, we can translate our hypotheses into statistical models to test. We are attempting to create a model of a real-world phenomenon. For example, consider the following theory proposed by Edwin Shneidman: suicide is caused by psychache (unbearable mental pain). Imagine this was truly how suicide worked in the real world. Consider the following three possible models:

Each researcher could collect data to test how well their hypothesis fits the data that they collect. Each hypothesis can be represented as a model that is statistically testable.

Each researcher would collect slightly different data and analyse it differently. If the model fits the data well, it provides support for the hypothesis and theory. If it does not fit the data well, it likely does not accurately represent the real-world phenomenon of interest. For example, if researcher 3 collected genetic data and the presence of the hypothetical gene did not lead to suicide in some individuals, it would indicate a poor model fit. The results of statistical analyses that test your model can give an indication of fit.

Models can be complex or simple. Again, the research question and hypothesis precede the research design–and, subsequently, the model.

4.1 A Basic Model

Let’s try to model the mean height of psychology professors (in centimeters). You cannot measure all the psych professors in the world. Instead, you go to the Arts and Sciences Building at Grenfell Campus and measure the heights of four of your psychology professors. You get the following data.

Name    Height (cm)
Tyler   181
Steve   190
Jenny   173
Cindy   158

The average height of these professors is 175.5cm. This mean is a model. The model can be represented as:

\(x_i = \overline{x} + e_i\)

Here, \(x_i\) represents the height of professor \(i\); \(\overline{x}\) represents the sample mean height of the professors; and \(e_i\) represents the difference between that professor's height and the mean, or the error. These errors are also sometimes referred to as residuals.

We can assess how well the model fits with the data we collected. For our model, it would make sense to try to calculate how large our \(e_i\)s are, as these represent the model error. If our model does a poor job, errors will be higher compared to a model that does a good job.

4.2 Deviations

One method to assess how well the model, our mean, fits the data is to compare how different our data are from the model. You now know that these differences are model errors. We can subtract the mean from each value to create a numerical representation of this fit. For example, Tyler is 181cm tall. Our model suggests that the average height is 175.5cm. We can calculate the deviation here as:

\(e_i = (x_i - \overline{x}) = (181 - 175.5) = 5.5\)

The following are the deviations for each individual.

Name    Deviation (cm)
Tyler   5.5
Steve   14.5
Jenny   -2.5
Cindy   -17.5
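The deviations in the table above can be reproduced with a few lines of Python. This is only a minimal sketch using the four heights from this chapter; the variable names (`heights`, `mean_height`, `errors`) are our own choices for illustration.

```python
# Heights (cm) of the four professors measured in this chapter.
heights = {"Tyler": 181, "Steve": 190, "Jenny": 173, "Cindy": 158}

# The model: every professor is predicted to be as tall as the sample mean.
mean_height = sum(heights.values()) / len(heights)

# Errors (residuals): e_i = x_i - mean.
errors = {name: x - mean_height for name, x in heights.items()}

print("Mean height:", mean_height)
for name, e in errors.items():
    print(name, e)
```

A positive error means the professor is taller than the model predicts; a negative error means shorter.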

If we sum all the errors across our data, we get \(\sum{e_i}=5.5 + 14.5 + (-2.5) + (-17.5) = 0\). What? That can’t be right. Does this mean that this is the perfect model? No. The positive and negative errors simply cancel each other out. In many models, including any model based on the mean, the sum of raw residuals will equal 0:

\(\sum_{i=1}^n{e_i}=0\)
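For the mean model, a short derivation shows why this must happen. It uses only the definition of the sample mean, \(\overline{x} = \frac{1}{n}\sum_{i=1}^n{x_i}\), which implies \(\sum_{i=1}^n{x_i} = n\overline{x}\):

\(\sum_{i=1}^n{e_i} = \sum_{i=1}^n{(x_i - \overline{x})} = \sum_{i=1}^n{x_i} - n\overline{x} = n\overline{x} - n\overline{x} = 0\)

So a sum of zero tells us nothing about how good the model is; it is a built-in property of the mean.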

There is a way to bypass this statistical conundrum: square each error before summing, so that positive and negative errors can no longer cancel.

4.3 Variance and Standard Deviation

We may effectively model the fit of our mean model with the variance and standard deviation. These are extremely important in statistics so it’s imperative to become familiar with them (see a previous chapter where these are explained in detail).

Above we calculated the deviation of each score. The variance is, in essence, the average squared difference between a score and its mean. For a population with mean \(\mu\) and \(N\) scores:

\(\sigma^2 = {\frac{\sum\limits_{i = 1}^N {\left( {x_i - \mu} \right)^2 }}{N} }\)

But for a sample, our equation is (see last chapter for the rationale):

\(s^2 = {\frac{\sum\limits_{i = 1}^N {\left( {x_i - \bar x} \right)^2 }}{N-1} }\)

This equation simply means we add up all the squared differences between each score and the mean and divide by one less than the number of scores. So, the squared deviations are:

Name    Deviation (cm)   Squared deviation (cm²)
Tyler   5.5              30.25
Steve   14.5             210.25
Jenny   -2.5             6.25
Cindy   -17.5            306.25

We then add up the squared deviations, \(30.25+210.25+6.25+306.25=553\), and divide by the number of scores (with the sample adjustment to \(N-1\)), \(4-1=3\), to get:

\(s^2 = {\frac{\sum\limits_{i = 1}^N {\left( {x_i - \bar x} \right)^2 }}{N-1} } = \frac{30.25+210.25+6.25+306.25}{4-1} = \frac{553}{3}=184.33\)

Thus, the sample variance of the heights of psychology professors is \(184.33\) (in squared centimeters). The standard deviation is simply the square root of the variance:

\(s = \sqrt{{\frac{\sum\limits_{i = 1}^N {\left( {x_i - \bar x} \right)^2 }}{N-1} }}=\sqrt{184.33}=13.58\)

While you might think that the standard deviation (SD) is the average absolute difference between a score and the mean, it is not. For example, the SD of our heights is 13.58. But the average absolute deviation is, in fact, \(\frac{|5.5| + |14.5 |+ |-2.5| + |-17.5|}{4} = \frac{40}{4} = 10\). It is most likely helpful to think of the variance as the average squared deviation and the SD as the square root of the variance.
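The calculations in this section can be checked with Python's built-in `statistics` module. This is a sketch rather than a required method: `variance` and `stdev` use the sample formula (dividing by \(N-1\)), while `pvariance` uses the population formula (dividing by \(N\)). The variable names are our own.

```python
import statistics

# Heights (cm) of the four professors.
heights = [181.0, 190.0, 173.0, 158.0]

mean_height = statistics.mean(heights)

# Sample variance and SD: sum of squared deviations divided by N - 1.
sample_var = statistics.variance(heights)
sample_sd = statistics.stdev(heights)

# Population variance for comparison: same sum divided by N.
pop_var = statistics.pvariance(heights)

# Average absolute deviation: (5.5 + 14.5 + 2.5 + 17.5) / 4.
mean_abs_dev = sum(abs(x - mean_height) for x in heights) / len(heights)

print("Sample variance:", round(sample_var, 2))
print("Sample SD:", round(sample_sd, 2))
print("Population variance:", pop_var)
print("Average absolute deviation:", mean_abs_dev)
```

Note that the SD and the average absolute deviation are different quantities, because squaring gives large deviations more weight before the square root is taken.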

Try This

Instead of using the mean in the above model, use a value of 190cm. Our new model would be:

\(x_i = 190 + e_i\)

Calculate the errors, variance and SD using this new model. Was the variance higher, the same or lower?

Which model seemed better? The one using the mean or 190cm?

Name    Height (cm)   New deviation (cm)   New squared deviation (cm²)
Tyler   181           -9                   81
Steve   190           0                    0
Jenny   173           -17                  289
Cindy   158           -32                  1024

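The sum of these new squared deviations is 1394.

The variance of these is \(\frac{1394}{3} = 464.67\).

The SD is \(\sqrt{464.67} = 21.56\).

Both values are higher than under the mean model (184.33 and 13.58), so the model using the mean fits these data better than the model using 190cm.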
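The comparison between the two models can be sketched in Python as follows. The helper `sum_squared_errors` is a hypothetical name introduced here for illustration, and we keep the \(N-1\) divisor for both models to match the calculations above.

```python
# Heights (cm) of the four professors.
heights = [181, 190, 173, 158]


def sum_squared_errors(data, model_value):
    """Sum of squared deviations of the data from a single-value model."""
    return sum((x - model_value) ** 2 for x in data)


mean_height = sum(heights) / len(heights)

# Total squared error under each candidate model.
sse_mean = sum_squared_errors(heights, mean_height)
sse_190 = sum_squared_errors(heights, 190)

for label, sse in [("mean (175.5)", sse_mean), ("190", sse_190)]:
    variance = sse / (len(heights) - 1)  # same N - 1 divisor as in the text
    sd = variance ** 0.5
    print(label, sse, round(variance, 2), round(sd, 2))
```

Trying other values in place of 190 gives the same pattern: no single value produces a smaller sum of squared errors for these data than the mean does.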

4.4 Advanced Models

While above we have simply modeled a mean, later chapters will build up to more advanced models, such as:

\(y_i=\beta_0+x_{1i}\beta_1+x_{2i}\beta_2+x_{3i}\beta_3+e_i\)

Don’t be intimidated. This is a whole lot like your classic high school \(y=mx+b\), with an intercept and some slopes. More to come. For now, a brief overview of some potential models will do. I will note that many common statistical models fall under the broader umbrella of the general linear model (GLM). Some common types of statistical models in psychological research include:

1. Linear regression models

These assess the relationship between one or more independent variables and a continuous dependent variable. For example, predicting levels of anxiety based on hours of sleep.

2. ANOVA (Analysis of Variance)

This model compares the means of different groups to determine if they are significantly different from one another, often used in experimental studies.

3. Structural equation modeling (SEM)

SEM is a more complex statistical model that can evaluate multiple relationships between variables simultaneously, including latent (unmeasured) variables.

Statistical models are essential for drawing conclusions about psychological phenomena, helping researchers identify patterns, test theoretical models, and inform practice. We will revisit the idea of models throughout each chapter that follows.
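To connect the mean model from earlier in this chapter with the general equation in this section, here is a small Python sketch. The function `predict` is a hypothetical helper written for illustration, and the coefficients in the final example are arbitrary numbers, not estimates from any data. With no predictors, the linear model reduces to an intercept, and setting that intercept to the sample mean reproduces the errors from the basic model.

```python
def predict(intercept, slopes, predictors):
    """Return b0 + b1*x1 + b2*x2 + ... for a single observation."""
    return intercept + sum(b * x for b, x in zip(slopes, predictors))


# Heights (cm) of the four professors from the basic model.
heights = {"Tyler": 181, "Steve": 190, "Jenny": 173, "Cindy": 158}
mean_height = sum(heights.values()) / len(heights)

# Intercept-only model: no slopes, no predictors, so the prediction is the mean.
residuals = {
    name: y - predict(mean_height, [], []) for name, y in heights.items()
}
print(residuals)

# Arbitrary illustration with three predictors (made-up coefficients):
# 2.0 + 0.5*4 + (-1.0)*3 + 3.0*1
example = predict(2.0, [0.5, -1.0, 3.0], [4, 3, 1])
print(example)
```

Later chapters cover how the intercept and slopes are actually estimated from data; here they are simply supplied by hand.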

Practice Problems
  1. Calculate the mean, variance, and standard deviation for both the height (in cm) and weight (in kg) of these NHL players.

Player            Height (cm)   Weight (kg)
Connor McDavid    185.4         99.0
Auston Matthews   190.5         93.0
Sidney Crosby     180.0         91.0
Alex Ovechkin     191.0         107.9

  2. Write out the model for NHL height.

  3. What are the \(e_i\) values for each player when modeling their height?

Answers

  1. Calculate the mean, variance, and standard deviation for both height and weight.

Statistic     Value
Mean_Height   186.725000
SD_Height     5.148058
var_Height    26.502500
Mean_Weight   97.725000
SD_Weight     7.587435
var_Weight    57.569167

  2. Write out the model for NHL height.

\(height_i=\overline{x}_{height}+e_i\)

  3. What are the \(e_i\) values for each player when modeling their height?

Player            Height (cm)   e_i (cm)
Connor McDavid    185.4         -1.325
Auston Matthews   190.5         3.775
Sidney Crosby     180.0         -6.725
Alex Ovechkin     191.0         4.275