Thought bubble: maybe refugees are bad for your mental health, but this paper doesn't prove it
Controlling for time 1 doesn't usually work
Because I once foolishly wrote some papers on intergroup relations, Google Scholar likes to show me what it thinks of as similar papers. “An experimental study of the process of felt understanding in intergroup relations: Japanese and Chinese relations in Japan”… “Ethnic-racial identity and attitude change: assessments of outgroup and diversity attitudes among adolescents in Sweden”… “Intergroup contact via the head: does wearing a furry hat increase positive prosociality towards Orthodox Jews?” (I made up that one.) I treat most of these as standard social psychology flotsam: this research will never stop being published, and nobody will ever care.
Today, one looked interesting. “Effects of refugee settlement on citizens: A prospective longitudinal study of associations between perceived intergroup threat and mental health.”
Now I am pretty open to that research, in two ways. First, I think there ought to be more research about effects of migration which go beyond wages and employment. There’s a kind of stale dichotomy in this research, where anti-migration sentiment is driven either by worries about losing your job, or simply by xenophobia. But since that sentiment is quite persistent, it seems worth understanding it better. Second, I find it quite plausible that immigration or ethnic diversity indeed does affect people’s mental health. There is some evidence that it affects happiness, for instance (but in which direction?) So, could be an interesting paper!
Unfortunately, the results here aren’t convincing; we’ve missed a chance to learn something. Let me explain why. The researchers asked some questions of 280 participants in Amsterdam. Here’s the core result in a picture:
Here, “symbolic threat” is measured by a questionnaire about refugees, with items like “refugees should learn to adapt to the values and norms of the Dutch society as soon as they arrive”. (They also used another questionnaire measure called “realistic threat”, but the results were about the same, so I’ll ignore it.) Mental health comes from a standard 14-item questionnaire measuring emotional and psychological well-being.
Those arrows represent effects, measured by a pair of regressions, one on symbolic threat and one on mental health. The result is that symbolic threat at time 1 negatively predicts mental health at time 2, controlling for mental health at time 1.
The authors’ interpretation is that symbolic threat at time 1 is causing mental health at time 2 to get worse. In particular, the reason they control for mental health at time 1 is to rule out reverse causation. Maybe people with bad mental health are more scared of refugees — that seems plausible. But even controlling for initial mental health, people who were worried about refugees had worse mental health later! This is their story.
The problem is that mental health is measured with error, using a short 14-item questionnaire. As a result, we have only a noisy measure of mental health at time 1. Controlling for the noisy measure doesn’t fully control for true mental health, so the correlation of mental health at time 2 with perceived refugee threat could still be driven by true underlying mental health at time 1.
In case you think this is a theoretical worry, I’ll create a simple example for you using code. The code is below, for R nerds. If you don’t understand it, it doesn’t matter: the comments after # tell you what it is doing.
# This function estimates the effect of refugee threat on
# mental health.
# n is the number of subjects in your experiment.
# real_effect is the true effect.
# mh_noise is the amount of error in the measure of
# mental health.
estimate_effect <- function(n, real_effect, mh_noise) {
  # Real mental health is random:
  real_mh <- rnorm(n)
  # When we measure it at t1, we add some more random noise:
  mh_t1 <- real_mh + rnorm(n, sd = mh_noise)
  # The same at t2:
  mh_t2 <- real_mh + rnorm(n, sd = mh_noise)
  # Perceived refugee threat is correlated with
  # real mental health, plus noise:
  refugee_threat <- real_mh + rnorm(n)
  # And at t2, mental health is affected by the
  # level of refugee threat:
  mh_t2 <- mh_t2 + real_effect * refugee_threat
  # Regress measured mental health at time 2
  # on refugee threat, controlling for
  # measured mental health at time 1:
  result <- lm(mh_t2 ~ mh_t1 + refugee_threat)
  # Get the estimate and 95% confidence interval
  # for refugee_threat from our result:
  result <- broom::tidy(result, conf.int = TRUE)
  result <- result[result$term == "refugee_threat",
                   c("estimate", "conf.low", "conf.high")]
  return(unlist(result))
}
In this code, we create a little virtual world, where perceived refugee threat is affected by real mental health, real mental health is measured with some error, and mental health at time 2 may also be affected by refugee threat. Then we take the data from our virtual world and run a regression on it, just as the authors did with their real data.
What happens if we create a world in which there is no effect of refugee threat on mental health? We’ll run an experiment with 300 subjects (the actual paper had 280).
estimate_effect(n = 300, real_effect = 0, mh_noise = 0.5)
estimate conf.low conf.high
0.09124411 0.01981220 0.16267602
Uh oh. We estimated an effect of about 0.09, and our confidence interval runs from 0.02 to 0.16, which excludes the true effect of zero.
Was that chance? To test that, we can rerun the same virtual experiment 100 times and plot the result. (Code is at the end of this post.)
Every dot shows a single estimated effect of refugee threat on mental health, in a virtual world where the true effect is zero. Every line shows the estimated 95% confidence interval. Every line is red, because all the confidence intervals exclude the true, zero effect. We have 100 out of 100 false positives.
Maybe that’s just because we had a very noisy measure of mental health? Our true mental health was chosen randomly with a standard deviation of 1, and our noise had 0.5 standard deviations. Let’s rerun it with a less noisy measure, where the noise is only 0.2. (This means that true mental health correlates at about 0.98 with measured mental health, i.e. almost perfectly.)
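If you want to check that correlation figure, it follows directly from the simulation’s setup: measured mental health is true mental health (standard deviation 1) plus independent noise, so the correlation is 1 over the square root of 1 plus the noise variance. A quick sketch (my arithmetic, not from the paper):

```r
# Correlation between true and measured mental health, when
# measured = true + noise and Var(true) = 1:
cor_true_measured <- function(noise_sd) 1 / sqrt(1 + noise_sd^2)
cor_true_measured(0.5)  # original noise level: ~0.89
cor_true_measured(0.2)  # reduced noise level: ~0.98
```

So even our “very noisy” original measure correlates at about 0.89 with the truth, which is better than many real questionnaires manage.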
99 out of 100 estimates are still positive, even though the true effect is zero, and 64 out of 100 confidence intervals exclude zero.
What if we ran a bigger experiment?
Unfortunately, that just makes things worse. Having a larger N doesn’t get rid of the bias; it just makes us estimate the wrong thing more precisely.
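You can see why more data doesn’t help by working out what the regression converges to. In the simulated world the population variances and covariances of the regressors (measured mental health at t1, refugee threat) and the outcome (measured mental health at t2) are known exactly, so solving the normal equations gives the large-sample coefficient on refugee threat. This is my back-of-envelope calculation using the simulation’s parameters, not anything from the paper:

```r
# Large-sample coefficient on refugee_threat when the true effect is zero.
# In the simulation: Var(real_mh) = 1, Var(mh_t1) = 1 + mh_noise^2,
# Var(refugee_threat) = 2, and all the cross-covariances equal Var(real_mh).
asymptotic_bias <- function(mh_noise) {
  v <- 1 + mh_noise^2             # variance of measured mental health
  S <- matrix(c(v, 1,
                1, 2), nrow = 2)  # var-cov matrix of (mh_t1, refugee_threat)
  c_vec <- c(1, 1)                # covariances of regressors with mh_t2
  solve(S, c_vec)[2]              # second coefficient: refugee_threat
}
asymptotic_bias(0.5)  # ~0.17
asymptotic_bias(0.2)  # ~0.04
```

So with measurement noise of 0.5, the coefficient settles near 0.17 even when the truth is zero; a bigger sample just pins down that wrong number more tightly.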
Lastly, what if there really is an effect? Now, our regression correctly says the effect isn’t zero. But it still always overestimates the true effect:
By the way, this is exactly the same problem I pointed out for spanking research, where people predict bad behaviour at time 2 from physical punishment at time 1, controlling for bad behaviour at time 1. The politics of the two results are different, but the scientific problems are the same. Social science means fighting fair: I can’t criticize the spanking research without criticizing this stuff too.
What have we learned? Well, if you control for things using noisy measures, don’t expect to get the right answer. But I think statisticians knew that already. You see, I am a statistical clown, a mud-eating peasant in the world of statistics, and even I knew that.
So I think what we learn is: weak statistical methods are still widespread in psychology, and this makes their results less trustworthy and their research less useful. These guys are asking an interesting question, and they gathered useful data! But it went to waste because their statistics weren’t solid.
In other words, there is still a lot of unrealized value from teaching solid statistics to social scientists.
If you liked this, you might enjoy my book Wyclif’s Dust: Western Cultures from the Printing Press to the Present. It’s available from Amazon, and you can read more about it here.
I also write Lapwing, a more intimate newsletter about my family history.
Code for those plots:
replicate_effect <- function(n, real_effect, mh_noise) {
  # Run the virtual experiment 100 times:
  reps <- replicate(
    100,
    estimate_effect(n = n, real_effect = real_effect, mh_noise = mh_noise)
  )
  # One row per replicate, with columns estimate, conf.low, conf.high:
  as.data.frame(t(reps))
}
library(ggplot2)
plot_effect <- function(n, real_effect, mh_noise) {
  reps <- replicate_effect(n = n, real_effect = real_effect, mh_noise = mh_noise)
  # A replicate is a false positive if its confidence interval
  # excludes the true effect (on either side):
  reps$false_pos <- reps$conf.low > real_effect | reps$conf.high < real_effect
  false_positives <- sum(reps$false_pos)
  ggplot(reps, aes(x = estimate, y = seq_len(nrow(reps)), color = false_pos)) +
    geom_pointrange(aes(xmin = conf.low, xmax = conf.high), alpha = 0.4) +
    geom_vline(xintercept = real_effect, color = "black", linetype = "dashed") +
    theme_minimal() +
    theme(
      panel.grid.major.y = element_blank(),
      panel.grid.minor.y = element_blank(),
      legend.position = "none"
    ) +
    scale_color_manual(values = c("FALSE" = "grey50", "TRUE" = "red3")) +
    labs(y = "Replicate",
         title = "Estimating mental health effect of refugee threat",
         subtitle = glue::glue("n = {n}
real effect = {real_effect}
mental health measurement error = {mh_noise}
false positives = {false_positives}/100"))
}